Run a Script

This example shows you how to create a job running “Hello World” Python scripts. Although Python scripts are used here, you could also run Bash or Shell scripts. The Logging service log and log group are defined in the infrastructure. The output of the script appear in the logs.

Python

Suppose you would like to run the following “Hello World” python script named job_script.py.

print("Hello World")

First, initiate a job with a job name:

from ads.jobs import Job
job = Job(name="Job Name")

Next, you specify the desired infrastructure to run the job. If you are in a notebook session, ADS can automatically fetch the infrastructure configurations and use them for the job. If you aren’t in a notebook session or you want to customize the infrastructure, you can specify them using the methods from the DataScienceJob class:

from ads.jobs import DataScienceJob

job.with_infrastructure(
  DataScienceJob()
  .with_log_id("<log_id>")
  .with_log_group_id("<log_group_id>")
)

In this example, it is a Python script so the ScriptRuntime() class is used to define the name of the script using the .with_source() method:

from ads.jobs import ScriptRuntime
job.with_runtime(
  ScriptRuntime().with_source("job_script.py")
)

Finally, you create and run the job, which gives you access to the job_run.id:

job.create()
job_run = job.run()

Additionally, you can acquire the job run using the OCID:

from ads.jobs import DataScienceJobRun
job_run = DataScienceJobRun.from_ocid(job_run.id)

The .watch() method is useful to monitor the progress of the job run:

job_run.watch()

After the job has been created and runs successfully, you can find the output of the script in the logs if you configured logging.

YAML

You could also initialize a job directly from a YAML string. For example, to create a job identical to the preceding example, you could simply run the following:

job = Job.from_string(f"""
kind: job
spec:
  infrastructure:
    kind: infrastructure
    spec:
      jobInfrastructureType: STANDALONE
      jobType: DEFAULT
      logGroupId: <log_group_id>
      logId: <log_id>
    type: dataScienceJob
  name: <resource_name>
  runtime:
    kind: runtime
    spec:
      scriptPathURI: job_script.py
    type: python
""")

Command Line Arguments

If the Python script that you want to run as a job requires CLI arguments, use the .with_argument() method to pass the arguments to the job.

Python

Suppose you want to run the following python script named job_script_argument.py:

import sys
print("Hello " + str(sys.argv[1]) + " and " + str(sys.argv[2]))

This example runs a job with CLI arguments:

job = Job()
job.with_infrastructure(
  DataScienceJob()
  .with_log_id("<log_id>")
  .with_log_group_id("<log_group_id>")
)

# The CLI argument can be passed in using `with_argument` when defining the runtime
job.with_runtime(
  ScriptRuntime()
    .with_source("job_script_argument.py")
    .with_argument("<first_argument>", "<second_argument>")
  )

job.create()
job_run = job.run()

After the job run is created and run, you can use the .watch() method to monitor its progress:

job_run.watch()

This job run prints out Hello <first_argument> and <second_argument>.

YAML

You can define a job with a YAML string. In order to define a job identical to the preceding job, you could use the following before running job.create() and job.run():

job = Job.from_yaml(f"""
kind: job
spec:
  infrastructure:
    kind: infrastructure
    spec:
      jobInfrastructureType: STANDALONE
      jobType: DEFAULT
      logGroupId: <log_group_id>
      logId: <log_id>
    type: dataScienceJob
  runtime:
    kind: runtime
    spec:
      args:
      - <first_argument>
      - <second_argument>
      scriptPathURI: job_script_argument.py
    type: python
""")

Environment Variables

Similarly, if the script you want to run requires environment variables, you also pass them in using the .with_environment_variable() method. The key-value pair of the environment variable are passed in using the .with_environment_variable() method, and are accessed in the Python script using the os.environ dictionary.

Python

Suppose you want to run the following python script named job_script_env.py:

import os
import sys
print("Hello " + os.environ["KEY1"] + " and " + os.environ["KEY2"])""")

This example runs a job with environment variables:

job = Job()
job.with_infrastructure(
  DataScienceJob()
  .with_log_group_id(<"log_group_id">)
  .with_log_id(<"log_id">)
)

job.with_runtime(
  ScriptRuntime()
  .with_source("job_script_env.py")
  .with_environment_variable(KEY1="<first_value>", KEY2="<second_value>")
)
job.create()
job_run = job.run()

You can watch the progress of the job run using the .watch() method:

job_run.watch()

This job run print sout Hello <first_value> and <second_value>.

YAML

The next example shows the equivalent way to create a job from a YAML string:

job = Job.from_yaml(f"""
kind: job
spec:
  infrastructure:
    kind: infrastructure
    spec:
      jobInfrastructureType: STANDALONE
      jobType: DEFAULT
      logGroupId: <log_group_id>
      logId: <log_id>
    type: dataScienceJob
  name: null
  runtime:
    kind: runtime
    spec:
      env:
      - name: KEY1
        value: <first_value>
      - name: KEY2
              value: <second_value>
      scriptPathURI: job_script_env.py
    type: python
""")

ScriptRuntime YAML Schema

kind:
  allowed:
    - runtime
  required: true
  type: string
spec:
  required: true
  schema:
    args:
      nullable: true
      required: false
      schema:
        type: string
      type: list
    conda:
      nullable: false
      required: false
      schema:
        slug:
          required: true
          type: string
        type:
          allowed:
            - service
          required: true
          type: string
      type: dict
    env:
      required: false
      schema:
        type: dict
      type: list
    freeform_tag:
      required: false
      type: dict
    scriptPathURI:
      required: true
      type: string
    entrypoint:
      required: false
      type: string
  type: dict
type:
  allowed:
    - script
  required: true
  type: string