Infrastructure and Runtime¶
This page describes the infrastructure and runtime configurations that define a Data Science Job.
Example¶
The following example configures the infrastructure and runtime to run a Python script.
from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
        # A maximum of 5 file systems can be mounted for a job.
        .with_storage_mount(
            {
                "src": "<mount_target_ip_address>@<export_path>",
                "dest": "<destination_path>/<destination_directory_name>"
            },  # Mount OCI File Storage to "<destination_path>/<destination_directory_name>".
            {
                "src": "oci://<bucket_name>@<namespace>/<prefix>",
                "dest": "<destination_directory_name>"
            }   # Mount OCI Object Storage to "/mnt/<destination_directory_name>".
        )
    )
    .with_runtime(
        PythonRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch110_p38_cpu_v1")
        # Source code of the job, can be local or remote.
        .with_source("path/to/script.py")
        # Environment variable
        .with_environment_variable(NAME="Welcome to OCI Data Science.")
        # Command line argument
        .with_argument(greeting="Good morning")
    )
)
The same job expressed in YAML:

kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
      storageMount:
      - src: <mount_target_ip_address>@<export_path>
        dest: <destination_path>/<destination_directory_name>
      - src: oci://<bucket_name>@<namespace>/<prefix>
        dest: <destination_directory_name>
  runtime:
    kind: runtime
    type: python
    spec:
      args:
      - --greeting
      - Good morning
      conda:
        slug: pytorch110_p38_cpu_v1
        type: service
      env:
      - name: NAME
        value: Welcome to OCI Data Science.
      scriptPathURI: path/to/script.py
Infrastructure¶
The Data Science Job infrastructure is defined by a DataScienceJob instance. For example:
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    # Configure logging for getting the job run outputs.
    .with_log_group_id("<log_group_ocid>")
    # Log resource will be auto-generated if log ID is not specified.
    .with_log_id("<log_ocid>")
    # If you are in an OCI data science notebook session,
    # the following configurations are not required.
    # Configurations from the notebook session will be used as defaults.
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_subnet_id("<subnet_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    # Shape config details are applicable only for the flexible shapes.
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    # Minimum/Default block storage size is 50 (GB).
    .with_block_storage_size(50)
    # A maximum of 5 file systems can be mounted for a job.
    .with_storage_mount(
        {
            "src": "<mount_target_ip_address>@<export_path>",
            "dest": "<destination_path>/<destination_directory_name>"
        },  # Mount OCI File Storage to "<destination_path>/<destination_directory_name>".
        {
            "src": "oci://<bucket_name>@<namespace>/<prefix>",
            "dest": "<destination_directory_name>"
        }   # Mount OCI Object Storage to "/mnt/<destination_directory_name>".
    )
)
The same infrastructure expressed in YAML:

kind: infrastructure
type: dataScienceJob
spec:
  blockStorageSize: 50
  compartmentId: <compartment_ocid>
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
  projectId: <project_ocid>
  shapeConfigDetails:
    memoryInGBs: 16
    ocpus: 1
  shapeName: VM.Standard.E3.Flex
  subnetId: <subnet_ocid>
  storageMount:
  - src: <mount_target_ip_address>@<export_path>
    dest: <destination_path>/<destination_directory_name>
  - src: oci://<bucket_name>@<namespace>/<prefix>
    dest: <destination_directory_name>
When creating a DataScienceJob instance, the following configurations are required:
Compartment ID
Project ID
Compute Shape
The following configurations are optional:
Block Storage Size, defaults to 50 (GB)
Log Group ID
Log ID
For more details about the mandatory and optional parameters, see DataScienceJob.
Using Configurations from Notebook¶
If you are creating a job from an OCI Data Science Notebook Session, the same infrastructure configurations from the notebook session will be used as defaults, including:
Compartment ID
Project ID
Subnet ID
Compute Shape
Block Storage Size
You can initialize the DataScienceJob with the logging configurations and override the other options as needed. For example:
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
    # Use a GPU shape for the job,
    # regardless of the shape used by the notebook session.
    .with_shape_name("VM.GPU3.1")
    # Compartment ID, project ID, subnet ID, and block storage will be
    # the same as the ones set in the notebook session.
)
The corresponding YAML:

kind: infrastructure
type: dataScienceJob
spec:
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
  shapeName: VM.GPU3.1
Compute Shapes¶
The DataScienceJob class provides two static methods to obtain the supported compute shapes:
You can get a list of the currently supported compute shapes by calling instance_shapes().
You can get a list of the shapes that are available for fast launch by calling fast_launch_shapes(). Specifying a fast launch shape allows your job to start as quickly as possible.
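For example, the following sketch lists the shape names; it assumes both methods accept an optional compartment_id (omit it to use the compartment from your environment) and return shape summaries exposing a name attribute:

from ads.jobs import DataScienceJob

# Compute shapes currently supported by Data Science jobs.
for shape in DataScienceJob.instance_shapes(compartment_id="<compartment_ocid>"):
    print(shape.name)

# The subset of shapes available for fast launch.
for shape in DataScienceJob.fast_launch_shapes(compartment_id="<compartment_ocid>"):
    print(shape.name)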
Networking¶
Data Science Job offers two types of networking: default networking (managed egress) and custom networking. Default networking allows job runs to access the public internet through a NAT gateway and OCI services through a service gateway, both of which are configured automatically. Custom networking requires you to specify a subnet ID; you control the network access through the subnet and its security lists.
If you specify a subnet ID, your job is configured to use custom networking. Otherwise, default networking is used.
Note that when you are in a Data Science Notebook Session, the same networking configuration is used by default. You can specify the networking type manually by calling with_job_infrastructure_type(). For example, if you are using custom networking in the notebook session but you would like to use default networking for the job:
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
    # Use default networking,
    # regardless of the networking used by the notebook session.
    .with_job_infrastructure_type("ME_STANDALONE")
    # Compartment ID, project ID, compute shape, and block storage will be
    # the same as the ones set in the notebook session.
)
The corresponding YAML:

kind: infrastructure
type: dataScienceJob
spec:
  jobInfrastructureType: ME_STANDALONE
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
Logging¶
Logging is not required to create a job. However, it is highly recommended to enable logging for debugging and monitoring. In the preceding example, both the log OCID and the corresponding log group OCID are specified with the DataScienceJob instance. If your administrator configured the permission for you to search for logging resources, you can skip specifying the log group OCID because ADS can automatically retrieve it. If you specify only the log group OCID and no log OCID, a new Log resource is automatically created within the log group to store the logs; see also ADS Logging.
With logging configured, you can call the watch() method on a job run to stream the logs.
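For example, a minimal sketch that creates the job, starts a run, and streams its logs, assuming job is a Job instance configured with logging as shown above:

# Create the job resource and start a job run.
job.create()
job_run = job.run()
# Stream the job run logs to standard output until the run finishes.
job_run.watch()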
Mounting File Systems¶
Data Science Job supports mounting multiple types of file systems; see Data Science Job Mounting File Systems. A maximum of 5 file systems can be mounted for each Data Science Job. You can specify a list of file systems to be mounted by calling with_storage_mount(). For each file system to be mounted, you need to pass a dictionary with src and dest as keys. For example, you can pass <mount_target_ip_address>@<export_path> as the value for src to mount OCI File Storage, or oci://<bucket_name>@<namespace>/<prefix> to mount OCI Object Storage. The value of dest indicates the path and directory to which you want to mount the file system, and it must be in the format <destination_path>/<destination_directory_name>. The <destination_directory_name> is required, while the <destination_path> is optional. The <destination_path> must start with the character / if provided. If it is omitted, the file system is mounted to /mnt/<destination_directory_name> by default.
from ads.jobs import DataScienceJob

infrastructure = (
    DataScienceJob()
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
    .with_storage_mount(
        {
            "src": "<mount_target_ip_address>@<export_path>",
            "dest": "<destination_path>/<destination_directory_name>"
        },  # Mount OCI File Storage to "<destination_path>/<destination_directory_name>".
        {
            "src": "oci://<bucket_name>@<namespace>/<prefix>",
            "dest": "<destination_directory_name>"
        }   # Mount OCI Object Storage to "/mnt/<destination_directory_name>".
    )
)
The corresponding YAML:

kind: infrastructure
type: dataScienceJob
spec:
  logGroupId: <log_group_ocid>
  logId: <log_ocid>
  storageMount:
  - src: <mount_target_ip_address>@<export_path>
    dest: <destination_path>/<destination_directory_name>
  - src: oci://<bucket_name>@<namespace>/<prefix>
    dest: <destination_directory_name>
Runtime¶
The runtime of a job defines the source code of your workload, environment variables, command line arguments, and other configurations for the environment that runs the workload.
Depending on the source code, ADS provides different types of runtime for defining a data science job, including:
PythonRuntime, for Python code stored locally, in OCI Object Storage, or in another remote location supported by fsspec. See Run a Python Workload.
GitPythonRuntime, for Python code from a Git repository. See Run Code from Git Repo, and the sketch after this list.
NotebookRuntime, for a single Jupyter notebook stored locally, in OCI Object Storage, or in another remote location supported by fsspec. See Run a Notebook.
ScriptRuntime, for bash or shell scripts stored locally, in OCI Object Storage, or in another remote location supported by fsspec. See Run a Script.
ContainerRuntime, for container images.
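As a quick illustration, here is a minimal GitPythonRuntime sketch; the repository URL, branch, and entrypoint path are placeholders to replace with your own:

from ads.jobs import GitPythonRuntime

runtime = (
    GitPythonRuntime()
    # Clone the repository and check out the given branch.
    .with_source("https://github.com/<owner>/<repo>.git", branch="main")
    # The script to run, relative to the root of the repository.
    .with_entrypoint("path/to/entry.py")
    # A conda environment providing the dependencies of your code.
    .with_service_conda("pytorch110_p38_cpu_v1")
)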
Environment Variables¶
You can set environment variables for a runtime by calling with_environment_variable(). Environment variables enclosed by ${...} will be substituted. For example:
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_environment_variable(
        HOST="10.0.0.1",
        PORT="443",
        URL="http://${HOST}:${PORT}/path/",
        ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
        MISSING_VAR="This is ${UNDEFINED}",
        VAR_WITH_DOLLAR="$10",
        DOUBLE_DOLLAR="$$10"
    )
)
The corresponding YAML:

kind: runtime
type: python
spec:
  env:
  - name: HOST
    value: 10.0.0.1
  - name: PORT
    value: '443'
  - name: URL
    value: http://${HOST}:${PORT}/path/
  - name: ESCAPED_URL
    value: http://$${HOST}:$${PORT}/path/
  - name: MISSING_VAR
    value: This is ${UNDEFINED}
  - name: VAR_WITH_DOLLAR
    value: $10
  - name: DOUBLE_DOLLAR
    value: $$10
Running this code:

for k, v in runtime.environment_variables.items():
    print(f"{k}: {v}")

will show the following environment variables for the runtime:
HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10
Note that:
You can use $$ to escape the substitution.
An undefined variable enclosed by ${...} is ignored and left unchanged.
Double dollar signs $$ are substituted by a single $.
See also: Service Provided Environment Variables
Command Line Arguments¶
The command line arguments for running your script or function can be configured by calling with_argument(). For example:
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    .with_argument(
        "arg1", "arg2",
        key1="val1",
        key2="val2"
    )
)
The corresponding YAML:

kind: runtime
type: python
spec:
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py
  args:
  - arg1
  - arg2
  - --key1
  - val1
  - --key2
  - val2
This configures the job to call your script as:
python script.py arg1 arg2 --key1 val1 --key2 val2
You can call with_argument() multiple times to add arguments in your desired order. You can check runtime.args to see the added arguments.
Here are a few more examples:
runtime = PythonRuntime()
runtime.with_argument(key1="val1", key2="val2")
runtime.with_argument("pos1")
The corresponding YAML:

kind: runtime
type: python
spec:
  args:
  - --key1
  - val1
  - --key2
  - val2
  - pos1
print(runtime.args)
# ['--key1', 'val1', '--key2', 'val2', 'pos1']
runtime = PythonRuntime()
runtime.with_argument("pos1")
runtime.with_argument(key1="val1", key2="val2.1 val2.2")
runtime.with_argument("pos2")
The corresponding YAML:

kind: runtime
type: python
spec:
  args:
  - pos1
  - --key1
  - val1
  - --key2
  - val2.1 val2.2
  - pos2
print(runtime.args)
# ['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']
runtime = PythonRuntime()
runtime.with_argument("pos1")
runtime.with_argument(key1=None, key2="val2")
runtime.with_argument("pos2")
The corresponding YAML:

kind: runtime
type: python
spec:
  args:
  - pos1
  - --key1
  - --key2
  - val2
  - pos2
print(runtime.args)
# ["pos1", "--key1", "--key2", "val2", "pos2"]
Conda Environment¶
You can configure a conda environment for running your workload. You can use the slug name to specify a conda environment provided by the Data Science service. For example, to use the TensorFlow conda environment:
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    # Use the slug name for a conda environment provided by the Data Science service.
    .with_service_conda("tensorflow28_p38_cpu_v1")
)
The corresponding YAML:

kind: runtime
type: python
spec:
  conda:
    type: service
    slug: tensorflow28_p38_cpu_v1
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py
You can also use a custom conda environment published to OCI Object Storage by passing its uri to with_custom_conda(), for example:
from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    .with_custom_conda("oci://bucket@namespace/conda_pack/pack_name")
)
The corresponding YAML:

kind: runtime
type: python
spec:
  conda:
    type: published
    uri: oci://bucket@namespace/conda_pack/pack_name
  scriptPathURI: oci://bucket_name@namespace/path/to/script.py
By default, ADS will try to determine the region based on the authenticated API key or resource principal. If your custom conda environment is stored in a different region, you can specify the region when calling with_custom_conda().
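For example, a sketch specifying the region explicitly; the region identifier is a placeholder:

from ads.jobs import PythonRuntime

runtime = (
    PythonRuntime()
    .with_source("oci://bucket_name@namespace/path/to/script.py")
    # Specify the region of the bucket holding the published conda pack.
    .with_custom_conda(
        "oci://bucket@namespace/conda_pack/pack_name",
        region="<region_identifier>",
    )
)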
For more details on custom conda environments, see Publishing a Conda Environment to an Object Storage Bucket in Your Tenancy.
Override Configurations¶
When you call ads.jobs.Job.run(), a new job run is started with the configuration defined in the job. You may want to override some of that configuration for a specific run. For example, you can customize the job run display name, override command line arguments, specify additional environment variables, and add freeform tags:
job_run = job.run(
    name="<my_job_run_name>",
    args="new_arg --new_key new_val",
    env_var={"new_env": "new_val"},
    freeform_tags={"new_tag": "new_tag_val"}
)