Infrastructure and Runtime

This page describes the infrastructure and runtime configurations that define a Data Science Job.

Example

The following example configures the infrastructure and runtime to run a Python script.

  • Python
  • YAML
from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
        # A maximum of 5 file systems can be mounted per job.
        .with_storage_mount(
          {
            "src" : "<mount_target_ip_address>@<export_path>",
            "dest" : "<destination_path>/<destination_directory_name>"
          }, # mount oci file storage to path "<destination_path>/<destination_directory_name>"
          {
            "src" : "oci://<bucket_name>@<namespace>/<prefix>",
            "dest" : "<destination_directory_name>"
          } # mount oci object storage to path "/mnt/<destination_directory_name>"
        )
    )
    .with_runtime(
        PythonRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch110_p38_cpu_v1")
        # Source code of the job, can be local or remote.
        .with_source("path/to/script.py")
        # Environment variable
        .with_environment_variable(NAME="Welcome to OCI Data Science.")
        # Command line argument
        .with_argument(greeting="Good morning")
    )
)
kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
      storageMount:
      - src: <mount_target_ip_address>@<export_path>
        dest: <destination_path>/<destination_directory_name>
      - src: oci://<bucket_name>@<namespace>/<prefix>
        dest: <destination_directory_name>
  runtime:
    kind: runtime
    type: python
    spec:
      args:
      - --greeting
      - Good morning
      conda:
        slug: pytorch110_p38_cpu_v1
        type: service
      env:
      - name: NAME
        value: Welcome to OCI Data Science.
      scriptPathURI: path/to/script.py

Infrastructure

The Data Science Job infrastructure is defined by a DataScienceJob instance. For example:

  • Python
  • YAML
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    # Configure logging for getting the job run outputs.
    .with_log_group_id("<log_group_ocid>")
    # Log resource will be auto-generated if log ID is not specified.
    .with_log_id("<log_ocid>")
    # If you are in an OCI data science notebook session,
    # the following configurations are not required.
    # Configurations from the notebook session will be used as defaults.
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_subnet_id("<subnet_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    # Shape config details are applicable only for the flexible shapes.
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    # Minimum/Default block storage size is 50 (GB).
    .with_block_storage_size(50)
    # A maximum of 5 file systems can be mounted per job.
    .with_storage_mount(
      {
        "src" : "<mount_target_ip_address>@<export_path>",
        "dest" : "<destination_path>/<destination_directory_name>"
      }, # mount oci file storage to path "<destination_path>/<destination_directory_name>"
      {
        "src" : "oci://<bucket_name>@<namespace>/<prefix>",
        "dest" : "<destination_directory_name>"
      } # mount oci object storage to path "/mnt/<destination_directory_name>"
    )
)
kind: infrastructure
type: dataScienceJob
spec:
  blockStorageSize: 50
  compartmentId: <compartment_ocid>
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
  projectId: <project_ocid>
  shapeConfigDetails:
    memoryInGBs: 16
    ocpus: 1
  shapeName: VM.Standard.E3.Flex
  subnetId: <subnet_ocid>
  storageMount:
  - src: <mount_target_ip_address>@<export_path>
    dest: <destination_path>/<destination_directory_name>
  - src: oci://<bucket_name>@<namespace>/<prefix>
    dest: <destination_directory_name>

When creating a DataScienceJob instance, the following configurations are required:

  • Compartment ID

  • Project ID

  • Compute Shape

The following configurations are optional:

  • Block Storage Size, defaults to 50 (GB)

  • Log Group ID

  • Log ID

For more details about the mandatory and optional parameters, see DataScienceJob.
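
For example, a minimal sketch with only the required configurations (all OCIDs are placeholders):

from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    # Required: compartment, project and compute shape.
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    # Flexible shapes also need shape config details.
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
)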

Using Configurations from Notebook

If you are creating a job from an OCI Data Science Notebook Session, the same infrastructure configurations from the notebook session will be used as defaults, including:

  • Compartment ID

  • Project ID

  • Subnet ID

  • Compute Shape

  • Block Storage Size

You can initialize the DataScienceJob with the logging configurations and override the other options as needed. For example:

  • Python
  • YAML
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
    # Use a GPU shape for the job,
    # regardless of the shape used by the notebook session
    .with_shape_name("VM.GPU3.1")
    # compartment ID, project ID, subnet ID and block storage will be
    # the same as the ones set in the notebook session
)
kind: infrastructure
type: dataScienceJob
spec:
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
  shapeName: VM.GPU3.1

Compute Shapes

The DataScienceJob class provides two static methods for obtaining the supported compute shapes:

  • You can get a list of currently supported compute shapes by calling instance_shapes().

  • You can get a list of shapes that are available for fast launch by calling fast_launch_shapes(). Specifying a fast launch shape allows your job to start as quickly as possible.
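
For example, a minimal sketch of calling these methods (the compartment OCID is a placeholder, and the name attribute on the returned shape summaries is an assumption about the underlying OCI SDK models):

from ads.jobs import DataScienceJob

# List the compute shapes supported for jobs in a compartment.
for shape in DataScienceJob.instance_shapes(compartment_id="<compartment_ocid>"):
    print(shape.name)

# List the shapes available for fast launch in the same compartment.
for shape in DataScienceJob.fast_launch_shapes(compartment_id="<compartment_ocid>"):
    print(shape.name)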

Networking

Data Science Jobs offer two types of networking: default networking (managed egress) and custom networking. Default networking allows job runs to access the public internet through a NAT gateway and OCI services through a service gateway, both of which are configured automatically. Custom networking requires you to specify a subnet ID; you can then control network access through the subnet and its security lists.

If you specify a subnet ID, your job is configured with custom networking. Otherwise, default networking is used.

Note that when you are in a Data Science Notebook Session, the same networking configuration is used by default. You can set the networking type explicitly by calling with_job_infrastructure_type(). For example, if you are using custom networking in the notebook session but would like to use default networking for the job:

  • Python
  • YAML
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
    # Use default networking,
    # regardless of the networking used by the notebook session
    .with_job_infrastructure_type("ME_STANDALONE")
    # compartment ID, project ID, compute shape and block storage will be
    # the same as the ones set in the notebook session
)
kind: infrastructure
type: dataScienceJob
spec:
  jobInfrastructureType: ME_STANDALONE
  logGroupId: <log_group_ocid>
  logId: <log_ocid>

Logging

Logging is not required to create the job. However, it is highly recommended to enable logging for debugging and monitoring.

In the preceding example, both the log OCID and the corresponding log group OCID are specified in the DataScienceJob instance. If your administrator has configured the permission for you to search for logging resources, you can skip specifying the log group OCID because ADS can retrieve it automatically.

If you specify only the log group OCID and no log OCID, a new Log resource is automatically created within the log group to store the logs. See also ADS Logging.
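
For example, a sketch that configures only the log group and lets the Log resource be created automatically:

from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    # Only the log group is specified;
    # a Log resource is created automatically in this log group.
    .with_log_group_id("<log_group_ocid>")
)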

With logging configured, you can call the watch() method on a job run to stream its logs.
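
For example, a minimal sketch, assuming job is a Job configured with the infrastructure and runtime shown at the top of this page:

# Start a job run and stream its logs to standard output.
run = job.run()
run.watch()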

Mounting File Systems

Data Science Jobs support mounting multiple types of file systems; see Data Science Job Mounting File Systems. A maximum of 5 file systems can be mounted for each Data Science Job. You can specify the file systems to be mounted by calling with_storage_mount(). For each file system, pass a dictionary with src and dest as keys. For src, pass <mount_target_ip_address>@<export_path> to mount OCI File Storage, or oci://<bucket_name>@<namespace>/<prefix> to mount OCI Object Storage. The value of dest indicates the path and directory at which to mount the file system and must be in the format <destination_path>/<destination_directory_name>. The <destination_directory_name> is required while the <destination_path> is optional. If provided, the <destination_path> must start with the character /. If not provided, the file system is mounted to /mnt/<destination_directory_name> by default.

  • Python
  • YAML
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
    .with_storage_mount(
      {
        "src" : "<mount_target_ip_address>@<export_path>",
        "dest" : "<destination_path>/<destination_directory_name>"
      }, # mount oci file storage to path "<destination_path>/<destination_directory_name>"
      {
        "src" : "oci://<bucket_name>@<namespace>/<prefix>",
        "dest" : "<destination_directory_name>"
      } # mount oci object storage to path "/mnt/<destination_directory_name>"
    )
)
kind: infrastructure
type: dataScienceJob
spec:
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
  storageMount:
  - src: <mount_target_ip_address>@<export_path>
    dest: <destination_path>/<destination_directory_name>
  - src: oci://<bucket_name>@<namespace>/<prefix>
    dest: <destination_directory_name>

Runtime

The runtime of a job defines the source code of your workload, environment variables, command line arguments, and other configurations of the environment in which the workload runs.

Depending on the source code, ADS provides different runtime types for defining a data science job, such as PythonRuntime for Python scripts and GitPythonRuntime for source code in a Git repository.

Environment Variables

You can set environment variables for a runtime by calling with_environment_variable(). Environment variables enclosed by ${...} will be substituted. For example:

  • Python
  • YAML
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_environment_variable(
        HOST="10.0.0.1",
        PORT="443",
        URL="http://${HOST}:${PORT}/path/",
        ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
        MISSING_VAR="This is ${UNDEFINED}",
        VAR_WITH_DOLLAR="$10",
        DOUBLE_DOLLAR="$$10"
    )
)
kind: runtime
type: python
spec:
  env:
  - name: HOST
    value: 10.0.0.1
  - name: PORT
    value: '443'
  - name: URL
    value: http://${HOST}:${PORT}/path/
  - name: ESCAPED_URL
    value: http://$${HOST}:$${PORT}/path/
  - name: MISSING_VAR
    value: This is ${UNDEFINED}
  - name: VAR_WITH_DOLLAR
    value: $10
  - name: DOUBLE_DOLLAR
    value: $$10
Iterating over runtime.environment_variables:

for k, v in runtime.environment_variables.items():
    print(f"{k}: {v}")

shows the following values for the runtime:

HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10

Note that:

  • You can use $$ to escape the substitution.

  • An undefined variable enclosed by ${...} is left unchanged (no substitution is performed).

  • Double dollar signs $$ will be substituted by a single dollar sign $.

See also: Service Provided Environment Variables

Command Line Arguments

The command line arguments for running your script or function can be configured by calling with_argument(). For example:

  • Python
  • YAML
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    .with_argument(
        "arg1", "arg2",
        key1="val1",
        key2="val2"
    )
)
kind: runtime
type: python
spec:
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py
  args:
  - arg1
  - arg2
  - --key1
  - val1
  - --key2
  - val2

will configure the job to call your script as:

python script.py arg1 arg2 --key1 val1 --key2 val2

You can call with_argument() multiple times to set the arguments in your desired order. You can check runtime.args to see the added arguments.

Here are a few more examples:

  • Python
  • YAML
runtime = PythonRuntime()
runtime.with_argument(key1="val1", key2="val2")
runtime.with_argument("pos1")
kind: runtime
type: python
spec:
  args:
  - --key1
  - val1
  - --key2
  - val2
  - pos1
print(runtime.args)
# ['--key1', 'val1', '--key2', 'val2', 'pos1']
  • Python
  • YAML
runtime = PythonRuntime()
runtime.with_argument("pos1")
runtime.with_argument(key1="val1", key2="val2.1 val2.2")
runtime.with_argument("pos2")
kind: runtime
type: python
spec:
  args:
  - pos1
  - --key1
  - val1
  - --key2
  - val2.1 val2.2
  - pos2
print(runtime.args)
# ['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']
  • Python
  • YAML
runtime = PythonRuntime()
runtime.with_argument("pos1")
runtime.with_argument(key1=None, key2="val2")
runtime.with_argument("pos2")
kind: runtime
type: python
spec:
  args:
  - pos1
  - --key1
  - --key2
  - val2
  - pos2
print(runtime.args)
# ["pos1", "--key1", "--key2", "val2", "pos2"]

Conda Environment

You can configure a Conda Environment for running your workload. You can use the slug name to specify a conda environment provided by the data science service. For example, to use the TensorFlow conda environment:

  • Python
  • YAML
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    # Use slug name for conda environment provided by data science service
    .with_service_conda("tensorflow28_p38_cpu_v1")
)
kind: runtime
type: python
spec:
  conda:
    type: service
    slug: tensorflow28_p38_cpu_v1
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py

You can also use a custom conda environment published to OCI Object Storage by passing its URI to with_custom_conda(), for example:

  • Python
  • YAML
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    .with_custom_conda("oci://bucket@namespace/conda_pack/pack_name")
)
kind: runtime
type: python
spec:
  conda:
    type: published
    uri: oci://bucket@namespace/conda_pack/pack_name
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py

By default, ADS will try to determine the region based on the authenticated API key or resource principal. If your custom conda environment is stored in a different region, you can specify the region when calling with_custom_conda().
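
For example, a sketch that passes the region when the conda pack bucket lives in another region (the region value is a placeholder, and the keyword name region is an assumption about the with_custom_conda() signature):

from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    # Specify the region of the bucket holding the published conda pack.
    .with_custom_conda(
        "oci://bucket@namespace/conda_pack/pack_name",
        region="<region_identifier>",
    )
)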

For more details on custom conda environment, see Publishing a Conda Environment to an Object Storage Bucket in Your Tenancy.

Override Configurations

When you call ads.jobs.Job.run(), a new job run starts with the configuration defined in the job. You may want to override that configuration with custom values. For example, you can customize the job run display name, override command line arguments, specify additional environment variables, and add freeform tags:

job_run = job.run(
    name="<my_job_run_name>",
    args="new_arg --new_key new_val",
    env_var={"new_env": "new_val"},
    freeform_tags={"new_tag": "new_tag_val"}
)