Run a Python Workload¶
The PythonRuntime is designed for running a Python workload. You can configure the environment variables, command line arguments, and conda environment as described in Infrastructure and Runtime. This section shows the additional enhancements provided by PythonRuntime.
Example¶
Here is an example of defining and running a job using PythonRuntime:
from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        PythonRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch110_p38_cpu_v1")
        # The job artifact can be a single Python script, a directory, or a zip file.
        .with_source("local/path/to/code_dir")
        # Environment variable
        .with_environment_variable(NAME="Welcome to OCI Data Science.")
        # Command line arguments: arg1 --key arg2
        .with_argument("arg1", key="arg2")
        # Set the working directory.
        # When using a directory as source, the default working dir is the parent of code_dir.
        # The working dir should be a relative path beginning from the source directory (code_dir).
        .with_working_dir("code_dir")
        # The entrypoint is applicable only when the source is a directory or zip file.
        # The entrypoint should be a path relative to the working dir.
        # Here my_script.py is a file in the code_dir/my_package directory.
        .with_entrypoint("my_package/my_script.py")
        # Add an additional Python path, relative to the working dir (code_dir/other_packages).
        .with_python_path("other_packages")
        # Copy files in "code_dir/output" to object storage after the job finishes.
        .with_output("output", "oci://bucket_name@namespace/path/to/dir")
    )
)
The same job can also be defined in YAML:

kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: python
    spec:
      args:
      - arg1
      - --key
      - arg2
      conda:
        slug: pytorch110_p38_cpu_v1
        type: service
      entrypoint: my_package/my_script.py
      env:
      - name: NAME
        value: Welcome to OCI Data Science.
      outputDir: output
      outputUri: oci://bucket_name@namespace/path/to/dir
      pythonPath:
      - other_packages
      scriptPathURI: local/path/to/code_dir
      workingDir: code_dir
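If you save the YAML to a file, you can load it back into a Job object. A minimal sketch, assuming the definition is saved locally as my_job.yaml:

from ads.jobs import Job

# Load the job definition from a YAML file.
job = Job.from_yaml(uri="my_job.yaml")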
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()
The PythonRuntime uses a driver script from ADS for the job run. It performs additional operations before and after invoking your code. You can examine the driver script by downloading the job artifact from the OCI Console.
Source Code¶
In the with_source() method, you can specify the location of your source code. The location can be a local path or a remote URI supported by fsspec. For example, you can specify files on OCI Object Storage using a URI like oci://bucket@namespace/path/to/prefix. ADS will use the authentication method configured by ads.set_auth() to fetch the files and upload them as the job artifact.
The source code can be a single file, a compressed file/archive (zip/tar), or a folder. When the source code is a compressed file/archive (zip/tar) or a folder, you also need to specify the entrypoint using with_entrypoint(). The entrypoint should be a path relative to the working directory. The entrypoint can be a Python script or a Jupyter notebook.
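For example, a minimal sketch pulling source code from object storage (the bucket, namespace, folder name, and entrypoint path are placeholders, and the remote folder is assumed to follow the same layout rules described in the sections below):

import ads
from ads.jobs import PythonRuntime

# Use resource principal (or another method supported by ads.set_auth)
# to fetch the source code and upload it as the job artifact.
ads.set_auth("resource_principal")

runtime = (
    PythonRuntime()
    .with_service_conda("pytorch110_p38_cpu_v1")
    # A folder on OCI Object Storage as the source code.
    .with_source("oci://bucket@namespace/path/to/my_source_code")
    # Entrypoint relative to the working directory.
    .with_entrypoint("my_source_code/my_entrypoint.py")
)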
Working Directory¶
The working directory of your workload can be configured by with_working_dir(). By default, PythonRuntime will create a code directory as the working directory in the job run to store your source code (job artifact), for example, /home/datascience/decompressed_artifact/code.

When the entrypoint is a Jupyter notebook, the working directory for the code running in the notebook will be the directory containing the notebook. When the entrypoint is not a notebook, the working directory depends on the source code.
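Relative paths in your code resolve against this working directory. A quick way to check it at runtime (a sketch; the printed path shows the default case described above):

import os

# The current working directory of the job run,
# e.g., /home/datascience/decompressed_artifact/code
print(os.getcwd())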
File Source Code¶
If your source code is a single file, for example, my_script.py, the file structure in the job run will look like:
code  <---This is the working directory
└── my_script.py
You can refer to your script as ./my_script.py.
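A minimal sketch for this case (the conda slug is reused from the example above; no entrypoint is needed for a single file):

runtime = (
    PythonRuntime()
    .with_service_conda("pytorch110_p38_cpu_v1")
    # A single Python script as the source code.
    .with_source("local/path/to/my_script.py")
)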
Folder Source Code¶
If your source code is a folder, for example, my_source_code, ADS will compress the folder as the job artifact. In the job run, it will be decompressed under the working directory. The file structure in the job run will look like:
code  <---This is the working directory
└── my_source_code
    ├── my_module.py
    └── my_entrypoint.py
In this case, the working directory is the parent of your source code folder. You will need to specify the entrypoint as my_source_code/my_entrypoint.py.
runtime = (
    PythonRuntime()
    .with_source("path/to/my_source_code")
    .with_entrypoint("my_source_code/my_entrypoint.py")
)
Alternatively, you can specify the working directory as my_source_code and the entrypoint as my_entrypoint.py:
runtime = (
    PythonRuntime()
    .with_source("path/to/my_source_code")
    .with_working_dir("my_source_code")
    .with_entrypoint("my_entrypoint.py")
)
Archive Source Code¶
If your source code is a zip/tar file, the files in the archive will be decompressed under the working directory. The file structure in the job run depends on whether your archive has a top-level directory. For example, you can inspect the structure of your zip file by running the unzip -l command:
unzip -l my_source_code.zip
This will give you output similar to the following:
Archive:  path/to/my_source_code.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  02-22-2023 16:38   my_source_code/
     1803  02-22-2023 16:38   my_source_code/my_module.py
       91  02-22-2023 16:38   my_source_code/my_entrypoint.py
---------                     -------
     1894                     3 files
In this case, a top-level directory my_source_code/ is present in the archive. The file structure in the job run will look like:
code  <---This is the working directory
└── my_source_code
    ├── my_module.py
    └── my_entrypoint.py
This is the same as when you specify a local folder as the source code. You can configure the entrypoint and working directory as in the examples above.
If a top-level directory is not present, the output for the archive will look like the following:
Archive:  path/to/my_source_code.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     1803  02-22-2023 16:38   my_module.py
       91  02-22-2023 16:38   my_entrypoint.py
---------                     -------
     1894  2 files
In this case, the file structure in the job run will look like:
code  <---This is the working directory
├── my_module.py
└── my_entrypoint.py
You can then specify the entrypoint with the filename directly:
runtime = (
    PythonRuntime()
    .with_source("path/to/my_source_code.zip")
    .with_entrypoint("my_entrypoint.py")
)
Python Paths¶
The working directory is added to the Python paths automatically. You can call with_python_path() to add additional Python paths as needed. The paths should be relative paths from the working directory.
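For example, a sketch reusing the layout from the example at the top of this section (code_dir/other_packages is an assumed folder holding extra modules):

runtime = (
    PythonRuntime()
    .with_source("local/path/to/code_dir")
    .with_working_dir("code_dir")
    .with_entrypoint("my_package/my_script.py")
    # code_dir/other_packages is added to the Python paths in the job run,
    # so modules under it can be imported directly in my_script.py.
    .with_python_path("other_packages")
)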
Outputs¶
The with_output() method allows you to specify the output path output_path in the job run and a remote URI (output_uri). Files in the output_path are copied to the remote output URI after the job run finishes successfully. Note that the output_path should be a path relative to the working directory. An OCI Object Storage location can be specified in the format of oci://bucket_name@namespace/path/to/dir. Please make sure you configure the IAM policy to allow the job run dynamic group to use object storage.
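For example, a policy statement along the following lines grants the access (the dynamic group and compartment names are placeholders for your own setup):

Allow dynamic-group <your_dynamic_group> to manage objects in compartment <your_compartment>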