Quick Start¶
Prerequisite
Before creating a job, ensure that you have policies configured for Data Science resources.
See IAM Policies and About Data Science Policies.
Define a Job¶
In ADS, a job is defined by Infrastructure and Runtime.
The Data Science Job infrastructure is configured through a DataScienceJob instance.
The runtime can be an instance of:

- PythonRuntime: for Python code stored locally, in OCI Object Storage, or at another remote location supported by fsspec. See Run a Python Workload.
- GitPythonRuntime: for Python code from a Git repository. See Run Code from Git Repo.
- NotebookRuntime: for a single Jupyter notebook stored locally, in OCI Object Storage, or at another remote location supported by fsspec. See Run a Notebook.
- ScriptRuntime: for bash or shell scripts stored locally, in OCI Object Storage, or at another remote location supported by fsspec. See Run a Script.
- ContainerRuntime: for container images.
Here is an example of how to define and run a Python job.
Note that a job can be defined either using Python APIs or YAML. See the next section for how to load and save the job with YAML.
from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI Data Science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details apply only to flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # The minimum (and default) block storage size is 50 GB.
        .with_block_storage_size(50)
        # A maximum of 5 file systems can be mounted per job.
        .with_storage_mount(
            {
                "src": "<mount_target_ip_address>@<export_path>",
                "dest": "<destination_path>/<destination_directory_name>",
            },  # Mount OCI File Storage to "<destination_path>/<destination_directory_name>"
            {
                "src": "oci://<bucket_name>@<namespace>/<prefix>",
                "dest": "<destination_directory_name>",
            },  # Mount OCI Object Storage to "/mnt/<destination_directory_name>"
        )
    )
    .with_runtime(
        PythonRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch110_p38_cpu_v1")
        # Source code of the job; can be local or remote.
        .with_source("path/to/script.py")
        # Environment variable
        .with_environment_variable(NAME="Welcome to OCI Data Science.")
        # Command line argument
        .with_argument(greeting="Good morning")
    )
)
kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
      storageMount:
      - src: <mount_target_ip_address>@<export_path>
        dest: <destination_path>/<destination_directory_name>
      - src: oci://<bucket_name>@<namespace>/<prefix>
        dest: <destination_directory_name>
  runtime:
    kind: runtime
    type: python
    spec:
      args:
      - --greeting
      - Good morning
      conda:
        slug: pytorch110_p38_cpu_v1
        type: service
      env:
      - name: NAME
        value: Welcome to OCI Data Science.
      scriptPathURI: path/to/script.py
The PythonRuntime is designed for Running a Python Workload. The source code is specified by with_source() (path/to/script.py in this example). It can be a script, a Jupyter notebook, a folder, or a zip file. The source code location can be local or remote, including HTTP URLs and OCI Object Storage. An example Python script is available in the Data Science AI Samples GitHub repository.
For more details, see Infrastructure and Runtime configurations. You can also Run a Notebook, Run a Script, and Run Code from Git Repo.
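For reference, a script consuming the greeting argument and the NAME environment variable from the example above might look like the following. This is a hypothetical sketch, not the linked sample: the job runtime passes .with_argument(greeting=...) as a --greeting command line argument and .with_environment_variable(NAME=...) as an environment variable.

```python
import argparse
import os


def main():
    # Parse the command line argument passed via .with_argument(greeting=...)
    parser = argparse.ArgumentParser()
    parser.add_argument("--greeting", default="Hello")
    args = parser.parse_args()

    # Read the environment variable set via .with_environment_variable(NAME=...)
    name = os.environ.get("NAME", "")
    print(f"{args.greeting}, your environment variable has value of ({name})")


if __name__ == "__main__":
    main()
```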
YAML¶
A job can be defined using YAML, as shown in the “YAML” tab in the example above. Here are some examples of loading and saving job configurations with YAML:
# Load a job from a YAML file
job = Job.from_yaml(uri="oci://bucket_name@namespace/path/to/job.yaml")
# Save a job to a YAML file
job.to_yaml(uri="oci://bucket_name@namespace/path/to/job.yaml")
# Save a job to YAML in a string
yaml_string = job.to_yaml()
# Load a job from a YAML string
job = Job.from_yaml("""
kind: job
spec:
  infrastructure:
    kind: infrastructure
    ...
""")
The uri can be a local file path or a remote location supported by fsspec, including OCI Object Storage.
With the YAML file, you can create and run the job with ADS CLI:
ads opctl run -f your_job.yaml
For more details on ads opctl, see Working with the CLI.
The job infrastructure, runtime, and job run also support YAML serialization/deserialization.
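For instance, a runtime can round-trip through YAML on its own. The fragment below is a sketch of a standalone runtime specification matching the PythonRuntime in the example above (the slug and script path are the placeholder values from that example); a spec like this could be loaded back with PythonRuntime.from_yaml().

```yaml
kind: runtime
type: python
spec:
  conda:
    slug: pytorch110_p38_cpu_v1
    type: service
  scriptPathURI: path/to/script.py
```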
Run a Job and Monitor outputs¶
Once the job is defined or loaded from YAML, you can call the create() method to create the job on OCI. To start a job run, call the run() method, which returns a DataScienceJobRun instance. Once the job or job run is created, its OCID can be accessed through job.id or run.id.
Note
Once a job is created, any change to its configuration requires re-creating the job.
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()
The watch() method is useful for monitoring the progress of a job run. It streams the logs to the terminal and returns once the job run finishes. Logging must be configured for this method to show logs. Here is an example of the logs:
Job OCID: <job_ocid>
Job Run OCID: <job_run_ocid>
2023-02-27 15:58:01 - Job Run ACCEPTED
2023-02-27 15:58:11 - Job Run ACCEPTED, Infrastructure provisioning.
2023-02-27 15:59:06 - Job Run ACCEPTED, Infrastructure provisioned.
2023-02-27 15:59:29 - Job Run ACCEPTED, Job run bootstrap starting.
2023-02-27 16:01:08 - Job Run ACCEPTED, Job run bootstrap complete. Artifact execution starting.
2023-02-27 16:01:18 - Job Run IN_PROGRESS, Job run artifact execution in progress.
2023-02-27 16:01:11 - Good morning, your environment variable has value of (Welcome to OCI Data Science.)
2023-02-27 16:01:11 - Job Run 02-27-2023-16:01:11
2023-02-27 16:01:11 - Job Done.
2023-02-27 16:01:22 - Job Run SUCCEEDED, Job run artifact execution succeeded. Infrastructure de-provisioning.
Load Existing Job or Job Run¶
You can load an existing job or job run using the OCID from OCI:
from ads.jobs import Job, DataScienceJobRun
# Load a job
job = Job.from_datascience_job("<job_ocid>")
# Load a job run
job_run = DataScienceJobRun.from_ocid("<job_run_ocid>")
List Existing Jobs or Job Runs¶
To get a list of existing jobs in a specific compartment:
from ads.jobs import Job
# Get a list of jobs in a specific compartment.
jobs = Job.datascience_job("<compartment_ocid>")
With a Job
object, you can get a list of job runs:
# Gets a list of job runs for a specific job.
runs = job.run_list()
Delete a Job or Job Run¶
You can delete a job or job run by calling the delete() method.
# Delete a job and the corresponding job runs.
job.delete()
# Delete a job run
run.delete()
You can also cancel a job run:
run.cancel()
Variable Substitution¶
When defining a job or starting a job run, you can use environment variable substitution in the job name and in the output_uri argument of the with_output() method. For example, the following job specifies its name based on the environment variable DATASET_NAME, and its output_uri based on the environment variable JOB_RUN_OCID:
from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="Training on ${DATASET_NAME}")
    .with_infrastructure(
        DataScienceJob()
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    )
    .with_runtime(
        PythonRuntime()
        .with_service_conda("pytorch110_p38_gpu_v1")
        .with_environment_variable(DATASET_NAME="MyData")
        .with_source("local/path/to/training_script.py")
        .with_output("output", "oci://bucket_name@namespace/prefix/${JOB_RUN_OCID}")
    )
)
kind: job
spec:
  name: Training on ${DATASET_NAME}
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      compartmentId: <compartment_ocid>
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
  runtime:
    kind: runtime
    type: python
    spec:
      conda:
        slug: pytorch110_p38_gpu_v1
        type: service
      env:
      - name: DATASET_NAME
        value: MyData
      outputDir: output
      outputUri: oci://bucket_name@namespace/prefix/${JOB_RUN_OCID}
      scriptPathURI: local/path/to/training_script.py
Note that JOB_RUN_OCID is an environment variable provided by the service after the job run is created. It is available for the output_uri but cannot be used in the job name.
See also: