Quick Start#

Prerequisite

Before creating a job, ensure that you have policies configured for Data Science resources.

See IAM Policies and About Data Science Policies.

Define a Job#

In ADS, a job is defined by Infrastructure and Runtime. The Data Science Job infrastructure is configured through a DataScienceJob instance. The runtime can be an instance of a runtime class, such as the PythonRuntime used in the example below.

Here is an example of how to define a Python job.

Note that a job can be defined either using Python APIs or YAML. See the next section for how to load and save the job with YAML.

Python:

from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
        # A maximum of 5 file systems can be mounted per job.
        .with_storage_mount(
          {
            "src" : "<mount_target_ip_address>@<export_path>",
            "dest" : "<destination_path>/<destination_directory_name>"
          }, # mount oci file storage to path "<destination_path>/<destination_directory_name>"
          {
            "src" : "oci://<bucket_name>@<namespace>/<prefix>",
            "dest" : "<destination_directory_name>"
          } # mount oci object storage to path "/mnt/<destination_directory_name>"
        )
    )
    .with_runtime(
        PythonRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch110_p38_cpu_v1")
        # Source code of the job, can be local or remote.
        .with_source("path/to/script.py")
        # Environment variable
        .with_environment_variable(NAME="Welcome to OCI Data Science.")
        # Command line argument
        .with_argument(greeting="Good morning")
    )
)

YAML:

kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
      storageMount:
      - src: <mount_target_ip_address>@<export_path>
        dest: <destination_path>/<destination_directory_name>
      - src: oci://<bucket_name>@<namespace>/<prefix>
        dest: <destination_directory_name>
  runtime:
    kind: runtime
    type: python
    spec:
      args:
      - --greeting
      - Good morning
      conda:
        slug: pytorch110_p38_cpu_v1
        type: service
      env:
      - name: NAME
        value: Welcome to OCI Data Science.
      scriptPathURI: path/to/script.py

The PythonRuntime is designed for Running a Python Workload. The source code is specified by with_source() (path/to/script.py). It can be a script, a Jupyter notebook, a folder, or a zip file. The source code location can be local or remote, including an HTTP URL or OCI Object Storage. An example Python script is available in the Data Science AI Sample GitHub Repository.
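
For instance, here is a minimal sketch of pointing with_source() at remote code. The entrypoint keyword used for the zip archive, and the bucket/prefix paths, are assumptions for illustration and may differ in your ADS version:

from ads.jobs import PythonRuntime

# Remote script stored in OCI Object Storage (hypothetical bucket and prefix).
runtime = PythonRuntime().with_source("oci://bucket_name@namespace/prefix/script.py")

# For a folder or zip archive, an entrypoint is typically needed so the job
# knows which file to execute (the entrypoint keyword is assumed here).
runtime = PythonRuntime().with_source(
    "oci://bucket_name@namespace/prefix/code.zip",
    entrypoint="code/main.py",  # hypothetical path inside the archive
)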

For more details, see Infrastructure and Runtime configurations. You can also Run a Notebook, Run a Script and Run Code from Git Repo.

YAML#

A job can be defined using YAML, as shown in the “YAML” tab in the example above. Here are some examples of loading and saving YAML job configurations:

# Load a job from a YAML file
job = Job.from_yaml(uri="oci://bucket_name@namespace/path/to/job.yaml")

# Save a job to a YAML file
job.to_yaml(uri="oci://bucket_name@namespace/path/to/job.yaml")

# Save a job to YAML in a string
yaml_string = job.to_yaml()

# Load a job from a YAML string
job = Job.from_yaml("""
kind: job
spec:
  infrastructure:
    kind: infrastructure
    ...
""")

The uri can be a local file path or a remote location supported by fsspec, including OCI object storage.

With the YAML file, you can create and run the job with ADS CLI:

ads opctl run -f your_job.yaml

For more details on ads opctl, see Working with the CLI.

The job infrastructure, runtime and job run also support YAML serialization/deserialization.
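
For example, here is a quick sketch of round-tripping a runtime configuration. It assumes the infrastructure and runtime objects expose the same to_yaml()/from_yaml() interface as Job:

from ads.jobs import PythonRuntime

# Serialize the runtime of an existing job to a YAML string
# (interface assumed to mirror Job.to_yaml()).
runtime_yaml = job.runtime.to_yaml()

# Re-create the runtime object from the YAML string.
runtime = PythonRuntime.from_yaml(runtime_yaml)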

Run a Job and Monitor outputs#

Once the job is defined or loaded from YAML, you can call the create() method to create the job on OCI. To start a job run, you can call the run() method, which returns a DataScienceJobRun instance. Once the job or job run is created, its OCID can be accessed through job.id or run.id.

Note

Once a job is created, if you change the configuration, you will need to re-create the job for the new configuration to take effect.

# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()

The watch() method is useful for monitoring the progress of the job run. It streams the logs to the terminal and returns once the job is finished. Logging configurations are required for this method to show logs. Here is an example of the logs:

Job OCID: <job_ocid>
Job Run OCID: <job_run_ocid>
2023-02-27 15:58:01 - Job Run ACCEPTED
2023-02-27 15:58:11 - Job Run ACCEPTED, Infrastructure provisioning.
2023-02-27 15:59:06 - Job Run ACCEPTED, Infrastructure provisioned.
2023-02-27 15:59:29 - Job Run ACCEPTED, Job run bootstrap starting.
2023-02-27 16:01:08 - Job Run ACCEPTED, Job run bootstrap complete. Artifact execution starting.
2023-02-27 16:01:18 - Job Run IN_PROGRESS, Job run artifact execution in progress.
2023-02-27 16:01:11 - Good morning, your environment variable has value of (Welcome to OCI Data Science.)
2023-02-27 16:01:11 - Job Run 02-27-2023-16:01:11
2023-02-27 16:01:11 - Job Done.
2023-02-27 16:01:22 - Job Run SUCCEEDED, Job run artifact execution succeeded. Infrastructure de-provisioning.

Load Existing Job or Job Run#

You can load an existing job or job run using the OCID from OCI:

from ads.jobs import Job, DataScienceJobRun

# Load a job
job = Job.from_datascience_job("<job_ocid>")

# Load a job run
job_run = DataScienceJobRun.from_ocid("<job_run_ocid>")

List Existing Jobs or Job Runs#

To get a list of existing jobs in a specific compartment:

from ads.jobs import Job

# Get a list of jobs in a specific compartment.
jobs = Job.datascience_job("<compartment_ocid>")

With a Job object, you can get a list of job runs:

# Get a list of job runs for a specific job.
runs = job.run_list()
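
For example, assuming run_list() returns the same DataScienceJobRun objects used elsewhere in this guide, you can iterate over them directly:

# Print the OCID of each run of the job.
for run in job.run_list():
    print(run.id)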

Delete a Job or Job Run#

You can delete a job or job run by calling the delete() method.

# Delete a job and the corresponding job runs.
job.delete()
# Delete a job run
run.delete()

You can also cancel a job run:

run.cancel()

Variable Substitution#

When defining a job or starting a job run, you can use environment variable substitution in names and in the output_uri argument of the with_output() method.

For example, the following job specifies the name based on the environment variable DATASET_NAME, and output_uri based on the environment variable JOB_RUN_OCID:

Python:

from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="Training on ${DATASET_NAME}")
    .with_infrastructure(
        DataScienceJob()
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    )
    .with_runtime(
        PythonRuntime()
        .with_service_conda("pytorch110_p38_cpu_v1")
        .with_environment_variable(DATASET_NAME="MyData")
        .with_source("local/path/to/training_script.py")
        .with_output("output", "oci://bucket_name@namespace/prefix/${JOB_RUN_OCID}")
    )
)

YAML:

kind: job
spec:
  name: Training on ${DATASET_NAME}
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      compartmentId: <compartment_ocid>
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
  runtime:
    kind: runtime
    type: python
    spec:
      conda:
        slug: pytorch110_p38_cpu_v1
        type: service
      env:
      - name: DATASET_NAME
        value: MyData
      outputDir: output
      outputUri: oci://bucket_name@namespace/prefix/${JOB_RUN_OCID}
      scriptPathURI: local/path/to/training_script.py

Note that JOB_RUN_OCID is an environment variable provided by the service after the job run is created. It can be used in the output_uri but not in the job name, since the job name is resolved before the job run exists.
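
As a sketch of how the substitution plays out at run time, you can override DATASET_NAME when starting a run; the name and env_var keyword arguments of run() are assumptions here and may differ in your ADS version:

# Start a run; ${DATASET_NAME} in the run name is substituted from the
# overridden environment variable (keyword arguments assumed).
run = job.run(
    name="Run for ${DATASET_NAME}",
    env_var={"DATASET_NAME": "Titanic"},
)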
