Run a Script

This example shows you how to create a job that runs a “Hello World” Python script. Although a Python script is used here, you could also run Bash or Shell scripts. The Logging service log and log group are defined in the infrastructure, and the output of the script appears in the logs.

Python

Suppose you would like to run the following “Hello World” Python script named job_script.py.

print("Hello World")

First, initiate a job with a job name:

from ads.jobs import Job
job = Job(name="Job Name")

Next, you specify the desired infrastructure to run the job. If you are in a notebook session, ADS can automatically fetch the infrastructure configuration and use it for the job. If you aren’t in a notebook session, or you want to customize the infrastructure, you can specify it using the methods of the DataScienceJob class:

from ads.jobs import DataScienceJob

job.with_infrastructure(
  DataScienceJob()
  .with_log_group_id("<log_group_ocid>")
  .with_log_id("<log_ocid>")
  # The following infrastructure configurations are optional
  # if you are in an OCI data science notebook session.
  # The configurations of the notebook session will be used as defaults
  .with_compartment_id("<compartment_ocid>")
  .with_project_id("<project_ocid>")
  .with_subnet_id("<subnet_ocid>")
  .with_shape_name("VM.Standard.E3.Flex")
  .with_shape_config_details(memory_in_gbs=16, ocpus=1) # Applicable only for the flexible shapes
  .with_block_storage_size(50)
)

In this example, the job runs a Python script, so the ScriptRuntime() class is used to define the runtime. The path to the script is specified using the .with_source() method:

from ads.jobs import ScriptRuntime
job.with_runtime(
  ScriptRuntime().with_source("job_script.py")
)

Finally, you create and run the job. The .run() method returns a job run object, whose OCID is available as job_run.id:

job.create()
job_run = job.run()

Alternatively, you can load an existing job run from its OCID:

from ads.jobs import DataScienceJobRun
job_run = DataScienceJobRun.from_ocid(job_run.id)

The .watch() method is useful to monitor the progress of the job run:

job_run.watch()

After the job run completes successfully, you can find the output of the script in the logs, provided you configured logging.
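For example, once the run finishes you could check its state and pull the collected log messages. This is a minimal sketch; it assumes that your ADS version exposes the status property and the logs() method on the job run object:

# Check the lifecycle state of the run, for example SUCCEEDED.
# Assumes job_run is the DataScienceJobRun from the previous step.
print(job_run.status)

# Fetch the log records collected for this run (requires logging to be
# configured on the job). Each record is assumed to carry a "message" field.
for record in job_run.logs():
    print(record.get("message"))  # expect "Hello World" among the output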

YAML

You could also initialize a job directly from a YAML string. For example, to create a job identical to the preceding example, you could simply run the following:

job = Job.from_string("""
kind: job
spec:
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      compartmentId: <compartment_ocid>
      projectId: <project_ocid>
      subnetId: <subnet_ocid>
      shapeName: VM.Standard.E3.Flex
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      blockStorageSize: 50
  name: <resource_name>
  runtime:
    kind: runtime
    type: script
    spec:
      scriptPathURI: job_script.py
""")

Command Line Arguments

If the Python script that you want to run as a job requires CLI arguments, use the .with_argument() method to pass the arguments to the job.

Python

Suppose you want to run the following Python script named job_script_argument.py:

import sys
print("Hello " + str(sys.argv[1]) + " and " + str(sys.argv[2]))

This example runs a job with CLI arguments:

job = Job()
job.with_infrastructure(
  DataScienceJob()
  .with_log_id("<log_id>")
  .with_log_group_id("<log_group_id>")
)

# Pass the CLI arguments in using `with_argument` when defining the runtime
job.with_runtime(
  ScriptRuntime()
  .with_source("job_script_argument.py")
  .with_argument("<first_argument>", "<second_argument>")
)

job.create()
job_run = job.run()

After the job is created and the run starts, you can use the .watch() method to monitor its progress:

job_run.watch()

This job run prints out Hello <first_argument> and <second_argument>.

YAML

You could create the preceding example job with the following YAML file:

kind: job
spec:
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      compartmentId: <compartment_ocid>
      projectId: <project_ocid>
      subnetId: <subnet_ocid>
      shapeName: VM.Standard.E3.Flex
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      blockStorageSize: 50
  runtime:
    kind: runtime
    type: script
    spec:
      args:
      - <first_argument>
      - <second_argument>
      scriptPathURI: job_script_argument.py
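If you save this YAML to a file, you can load it back into a job object. A minimal sketch, assuming the file is saved as job.yaml and that Job.from_yaml accepts a local path through its uri parameter:

from ads.jobs import Job

# job.yaml is a hypothetical file name holding the YAML above.
job = Job.from_yaml(uri="job.yaml")
job.create()
job_run = job.run()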

Environment Variables

Similarly, if the script you want to run requires environment variables, you pass them in as key-value pairs using the .with_environment_variable() method. The script then accesses them through the os.environ dictionary.

Python

Suppose you want to run the following Python script named job_script_env.py:

import os
print("Hello " + os.environ["KEY1"] + " and " + os.environ["KEY2"])

This example runs a job with environment variables:

job = Job()
job.with_infrastructure(
  DataScienceJob()
  .with_log_group_id("<log_group_ocid>")
  .with_log_id("<log_ocid>")
  # The following infrastructure configurations are optional
  # if you are in an OCI data science notebook session.
  # The configurations of the notebook session will be used as defaults
  .with_compartment_id("<compartment_ocid>")
  .with_project_id("<project_ocid>")
  .with_subnet_id("<subnet_ocid>")
  .with_shape_name("VM.Standard.E3.Flex")
  .with_shape_config_details(memory_in_gbs=16, ocpus=1)
  .with_block_storage_size(50)
)

job.with_runtime(
  ScriptRuntime()
  .with_source("job_script_env.py")
  .with_environment_variable(KEY1="<first_value>", KEY2="<second_value>")
)
job.create()
job_run = job.run()

You can watch the progress of the job run using the .watch() method:

job_run.watch()

This job run prints out Hello <first_value> and <second_value>.

YAML

You could create the preceding example job with the following YAML file:

kind: job
spec:
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      compartmentId: <compartment_ocid>
      projectId: <project_ocid>
      subnetId: <subnet_ocid>
      shapeName: VM.Standard.E3.Flex
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      blockStorageSize: 50
  runtime:
    kind: runtime
    type: script
    spec:
      env:
      - name: KEY1
        value: <first_value>
      - name: KEY2
        value: <second_value>
      scriptPathURI: job_script_env.py

ScriptRuntime YAML Schema

kind:
  required: true
  type: string
  allowed:
    - runtime
type:
  required: true
  type: string
  allowed:
    - script
spec:
  required: true
  type: dict
  schema:
    args:
      nullable: true
      required: false
      type: list
      schema:
        type: string
    conda:
      nullable: false
      required: false
      type: dict
      schema:
        slug:
          required: true
          type: string
        type:
          allowed:
            - service
          required: true
          type: string
    env:
      nullable: true
      required: false
      type: list
      schema:
        type: dict
        schema:
          name:
            type: string
          value:
            type:
              - number
              - string
    scriptPathURI:
      required: true
      type: string
    entrypoint:
      required: false
      type: string
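
For reference, here is a minimal runtime specification that conforms to this schema. It is an illustrative sketch that reuses the placeholder convention from the earlier examples; <service_conda_slug> stands in for the slug of a service conda environment:

kind: runtime
type: script
spec:
  conda:
    type: service
    slug: <service_conda_slug>
  scriptPathURI: job_script.py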