Infrastructure and Runtime#

This page describes the infrastructure and runtime configurations that define a Data Science Job.

Example#

The following example configures the infrastructure and runtime to run a Python script.

  • Python
  • YAML
from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        PythonRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch110_p38_cpu_v1")
        # Source code of the job, can be local or remote.
        .with_source("path/to/script.py")
        # Environment variable
        .with_environment_variable(NAME="Welcome to OCI Data Science.")
        # Command line argument
        .with_argument(greeting="Good morning")
    )
)
kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: python
    spec:
      args:
      - --greeting
      - Good morning
      conda:
        slug: pytorch110_p38_cpu_v1
        type: service
      env:
      - name: NAME
        value: Welcome to OCI Data Science.
      scriptPathURI: path/to/script.py

Infrastructure#

The Data Science Job infrastructure is defined by a DataScienceJob instance. For example:

  • Python
  • YAML
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    # Configure logging for getting the job run outputs.
    .with_log_group_id("<log_group_ocid>")
    # Log resource will be auto-generated if log ID is not specified.
    .with_log_id("<log_ocid>")
    # If you are in an OCI data science notebook session,
    # the following configurations are not required.
    # Configurations from the notebook session will be used as defaults.
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_subnet_id("<subnet_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    # Shape config details are applicable only for the flexible shapes.
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    # Minimum/Default block storage size is 50 (GB).
    .with_block_storage_size(50)
)
kind: infrastructure
type: dataScienceJob
spec:
  blockStorageSize: 50
  compartmentId: <compartment_ocid>
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
  projectId: <project_ocid>
  shapeConfigDetails:
    memoryInGBs: 16
    ocpus: 1
  shapeName: VM.Standard.E3.Flex
  subnetId: <subnet_ocid>

When creating a DataScienceJob instance, the following configurations are required:

  • Compartment ID

  • Project ID

  • Compute Shape

The following configurations are optional:

  • Block Storage Size, defaults to 50 (GB)

  • Log Group ID

  • Log ID

For more details about the mandatory and optional parameters, see DataScienceJob.
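
For instance, a minimal sketch using only the required configurations (shape config details are included here because the example shape is flexible; all OCIDs are placeholders):

from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    # Required for flexible shapes only.
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
)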

Using Configurations from Notebook#

If you are creating a job from an OCI Data Science Notebook Session, the same infrastructure configurations from the notebook session will be used as defaults, including:

  • Compartment ID

  • Project ID

  • Subnet ID

  • Compute Shape

  • Block Storage Size

You can initialize the DataScienceJob with the logging configurations and override the other options as needed. For example:

  • Python
  • YAML
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
    # Use a GPU shape for the job,
    # regardless of the shape used by the notebook session
    .with_shape_name("VM.GPU3.1")
    # compartment ID, project ID, subnet ID and block storage will be
    # the same as the ones set in the notebook session
)
kind: infrastructure
type: dataScienceJob
spec:
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
  shapeName: VM.GPU3.1

Compute Shapes#

The DataScienceJob class provides two static methods to obtain the supported compute shapes:

  • You can get a list of the currently supported compute shapes by calling instance_shapes().

  • You can get a list of the shapes available for fast launch by calling fast_launch_shapes(). Specifying a fast launch shape allows your job to start as fast as possible. Both methods are shown in the sketch after this list.
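
For example, a minimal sketch printing the shape names. This assumes you are authenticated (e.g., in a notebook session, where the compartment is picked up automatically; an optional compartment OCID can otherwise be passed) and that the returned shape objects expose a name attribute:

from ads.jobs import DataScienceJob

# All compute shapes currently supported by Data Science jobs.
for shape in DataScienceJob.instance_shapes():
    print(shape.name)

# The subset of shapes available for fast launch.
for shape in DataScienceJob.fast_launch_shapes():
    print(shape.name)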

Networking#

Data Science Jobs offer two types of networking: default networking (managed egress) and custom networking. Default networking allows job runs to access the public internet through a NAT gateway and OCI services through a service gateway, both of which are configured automatically. Custom networking requires you to specify a subnet ID; you can control network access through the subnet and its security lists.

If you specify a subnet ID, your job is configured with custom networking. Otherwise, default networking is used. Note that when you are in a Data Science Notebook Session, the same networking configuration is used by default. You can set the networking type explicitly by calling with_job_infrastructure_type(), as shown below.
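
For example, a minimal sketch selecting default networking explicitly (this assumes "ME_STANDALONE" is the managed egress infrastructure type value accepted by the service):

from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    # Managed egress (default networking): no subnet ID is needed.
    .with_job_infrastructure_type("ME_STANDALONE")
)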

Logging#

Logging is not required to create a job. However, enabling logging is highly recommended for debugging and monitoring.

In the preceding example, both the log OCID and corresponding log group OCID are specified with the DataScienceJob instance. If your administrator configured the permission for you to search for logging resources, you can skip specifying the log group OCID because ADS can automatically retrieve it.

If you specify only the log group OCID and no log OCID, a new Log resource is automatically created within the log group to store the logs. See also ADS Logging.
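
For example, a minimal sketch relying on the auto-created log resource (the OCID is a placeholder):

from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    # Only the log group is specified; a Log resource is created in it
    # automatically to store the job run outputs.
    .with_log_group_id("<log_group_ocid>")
)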

With logging configured, you can call the watch() method to stream the logs.
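
For example, assuming job is the Job defined earlier with logging configured:

# Start a job run and stream its logs until the run finishes.
job_run = job.run()
job_run.watch()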

Runtime#

The runtime of a job defines the source code of your workload, environment variables, command line arguments, and other configurations of the environment in which the workload runs.

Depending on the source code, ADS provides different types of runtime for defining a data science job, including PythonRuntime, ScriptRuntime, GitPythonRuntime, NotebookRuntime, and ContainerRuntime.
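
These runtime classes can be imported from ads.jobs, for example:

from ads.jobs import (
    PythonRuntime,
    ScriptRuntime,
    GitPythonRuntime,
    NotebookRuntime,
    ContainerRuntime,
)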

Environment Variables#

You can set environment variables for a runtime by calling with_environment_variable(). Environment variables enclosed by ${...} will be substituted. For example:

  • Python
  • YAML
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_environment_variable(
        HOST="10.0.0.1",
        PORT="443",
        URL="http://${HOST}:${PORT}/path/",
        ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
        MISSING_VAR="This is ${UNDEFINED}",
        VAR_WITH_DOLLAR="$10",
        DOUBLE_DOLLAR="$$10"
    )
)
kind: runtime
type: python
spec:
  env:
  - name: HOST
    value: 10.0.0.1
  - name: PORT
    value: '443'
  - name: URL
    value: http://${HOST}:${PORT}/path/
  - name: ESCAPED_URL
    value: http://$${HOST}:$${PORT}/path/
  - name: MISSING_VAR
    value: This is ${UNDEFINED}
  - name: VAR_WITH_DOLLAR
    value: $10
  - name: DOUBLE_DOLLAR
    value: $$10

Running the following code:

for k, v in runtime.environment_variables.items():
    print(f"{k}: {v}")

will show the following environment variables for the runtime:

HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10

Note that:

  • You can use $$ to escape the substitution.

  • An undefined variable enclosed by ${...} is not substituted and remains as-is.

  • A double dollar sign $$ is substituted by a single dollar sign $.

See also: Service Provided Environment Variables

Command Line Arguments#

The command line arguments for running your script or function can be configured by calling with_argument(). For example:

  • Python
  • YAML
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    .with_argument(
        "arg1", "arg2",
        key1="val1",
        key2="val2"
    )
)
kind: runtime
type: python
spec:
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py
  args:
  - arg1
  - arg2
  - --key1
  - val1
  - --key2
  - val2

will configure the job to call your script as follows:

python script.py arg1 arg2 --key1 val1 --key2 val2

You can call with_argument() multiple times to add arguments in the order you want. You can check runtime.args to see the added arguments.

Here are a few more examples:

  • Python
  • YAML
runtime = PythonRuntime()
runtime.with_argument(key1="val1", key2="val2")
runtime.with_argument("pos1")
kind: runtime
type: python
spec:
  args:
  - --key1
  - val1
  - --key2
  - val2
  - pos1
print(runtime.args)
# ['--key1', 'val1', '--key2', 'val2', 'pos1']
  • Python
  • YAML
runtime = PythonRuntime()
runtime.with_argument("pos1")
runtime.with_argument(key1="val1", key2="val2.1 val2.2")
runtime.with_argument("pos2")
kind: runtime
type: python
spec:
  args:
  - pos1
  - --key1
  - val1
  - --key2
  - val2.1 val2.2
  - pos2
print(runtime.args)
# ['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']
  • Python
  • YAML
runtime = PythonRuntime()
runtime.with_argument("pos1")
runtime.with_argument(key1=None, key2="val2")
runtime.with_argument("pos2")
kind: runtime
type: python
spec:
  args:
  - pos1
  - --key1
  - --key2
  - val2
  - pos2
print(runtime.args)
# ["pos1", "--key1", "--key2", "val2", "pos2"]

Conda Environment#

You can configure a conda environment for running your workload. Use the slug name to specify a conda environment provided by the Data Science service. For example, to use the TensorFlow conda environment:

  • Python
  • YAML
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    # Use slug name for conda environment provided by data science service
    .with_service_conda("tensorflow28_p38_cpu_v1")
)
kind: runtime
type: python
spec:
  conda:
    type: service
    slug: tensorflow28_p38_cpu_v1
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py

You can also use a custom conda environment published to OCI Object Storage by passing its URI to with_custom_conda(), for example:

  • Python
  • YAML
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    .with_custom_conda("oci://bucket@namespace/conda_pack/pack_name")
)
kind: runtime
type: python
spec:
  conda:
    type: published
    uri: oci://bucket@namespace/conda_pack/pack_name
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py

By default, ADS will try to determine the region based on the authenticated API key or resource principal. If your custom conda environment is stored in a different region, you can specify the region when calling with_custom_conda().
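
For example, a minimal sketch passing the region explicitly. The keyword name region and its value are assumptions shown for illustration; use the region of the bucket holding your conda pack:

from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    # Conda pack stored in a bucket in another region (hypothetical value).
    .with_custom_conda(
        "oci://bucket@namespace/conda_pack/pack_name",
        region="us-ashburn-1",
    )
)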

For more details on custom conda environment, see Publishing a Conda Environment to an Object Storage Bucket in Your Tenancy.

Override Configurations#

When you call ads.jobs.Job.run(), a new job run is started with the configuration defined in the job. You may want to override parts of that configuration for a particular run. For example, you can customize the job run display name, override command line arguments, specify additional environment variables, and add freeform tags:

job_run = job.run(
    name="<my_job_run_name>",
    args="new_arg --new_key new_val",
    env_var={"new_env": "new_val"},
    freeform_tags={"new_tag": "new_tag_val"}
)