Run a Python Workload#

The PythonRuntime is designed for running a Python workload. You can configure the environment variables, command line arguments, and conda environment as described in Infrastructure and Runtime. This section shows the additional enhancements provided by PythonRuntime.

Example#

Here is an example to define and run a job using PythonRuntime:

  • Python
  • YAML
from ads.jobs import Job, DataScienceJob, PythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        PythonRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch110_p38_cpu_v1")
        # The job artifact can be a single Python script, a directory or a zip file.
        .with_source("local/path/to/code_dir")
        # Environment variable
        .with_environment_variable(NAME="Welcome to OCI Data Science.")
        # Command line argument, arg1 --key arg2
        .with_argument("arg1", key="arg2")
        # Set the working directory
        # When using a directory as source, the default working dir is the parent of code_dir.
        # Working dir should be a relative path beginning from the source directory (code_dir)
        .with_working_dir("code_dir")
        # The entrypoint is applicable only to directory or zip file as source
        # The entrypoint should be a path relative to the working dir.
        # Here my_script.py is a file in the code_dir/my_package directory
        .with_entrypoint("my_package/my_script.py")
        # Add an additional Python path, relative to the working dir (code_dir/other_packages).
        .with_python_path("other_packages")
        # Copy files in "code_dir/output" to object storage after job finishes.
        .with_output("output", "oci://bucket_name@namespace/path/to/dir")
    )
)
kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: python
    spec:
      args:
      - arg1
      - --key
      - arg2
      conda:
        slug: pytorch110_p38_cpu_v1
        type: service
      entrypoint: my_package/my_script.py
      env:
      - name: NAME
        value: Welcome to OCI Data Science.
      outputDir: output
      outputUri: oci://bucket_name@namespace/path/to/dir
      pythonPath:
      - other_packages
      scriptPathURI: local/path/to/code_dir
      workingDir: code_dir
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()

The PythonRuntime uses a driver script from ADS for the job run. It performs additional operations before and after invoking your code. You can examine the driver script by downloading the job artifact from the OCI Console.

Source Code#

In the with_source() method, you can specify the location of your source code. The location can be a local path or a remote URI supported by fsspec. For example, you can specify files on OCI object storage using a URI like oci://bucket@namespace/path/to/prefix. ADS will use the authentication method configured by ads.set_auth() to fetch the files and upload them as the job artifact.

The source code can be a single file, a compressed file/archive (zip/tar), or a folder. When the source code is a compressed file/archive or a folder, you must also specify the entrypoint using with_entrypoint(). The entrypoint should be a path relative to the working directory.

The entrypoint can be a Python script or a Jupyter Notebook.

Working Directory#

The working directory of your workload can be configured by with_working_dir(). By default, PythonRuntime will create a code directory as the working directory in the job run to store your source code (job artifact), for example /home/datascience/decompressed_artifact/code.

When the entrypoint is a Jupyter notebook, the working directory for the code running in the notebook will be the directory containing the notebook.

When the entrypoint is not a notebook, the working directory depends on the source code.

File Source Code#

If your source code is a single file, for example, my_script.py, the file structure in the job run will look like:

code  <---This is the working directory
└── my_script.py

You can refer to your script as ./my_script.py.
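For illustration, a minimal hypothetical entrypoint my_script.py like the one below shows how the environment variable and command line arguments configured on the runtime (via with_environment_variable() and with_argument()) reach your code:

```python
# my_script.py (hypothetical): reads the environment variable and the
# command line arguments configured on the PythonRuntime.
import os
import sys


def main(argv):
    # Set on the runtime with .with_environment_variable(NAME=...).
    name = os.environ.get("NAME", "")
    # Arguments arrive as a flat list, e.g. ["arg1", "--key", "arg2"].
    print(f"NAME={name}")
    print(f"args={argv}")
    return name, argv


if __name__ == "__main__":
    main(sys.argv[1:])
```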

Folder Source Code#

If your source code is a folder, for example my_source_code, ADS will compress the folder as the job artifact. In the job run, it is decompressed under the working directory. The file structure in the job run will look like:

code  <---This is the working directory
└── my_source_code
    ├── my_module.py
    └── my_entrypoint.py

In this case, the working directory is the parent of your source code folder. You will need to specify the entrypoint as my_source_code/my_entrypoint.py.

runtime = (
  PythonRuntime()
  .with_source("path/to/my_source_code")
  .with_entrypoint("my_source_code/my_entrypoint.py")
)

Alternatively, you can specify the working directory as my_source_code and the entrypoint as my_entrypoint.py:

runtime = (
  PythonRuntime()
  .with_source("path/to/my_source_code")
  .with_working_dir("my_source_code")
  .with_entrypoint("my_entrypoint.py")
)

Archive Source Code#

If your source code is a zip/tar file, the files in the archive will be decompressed under the working directory. The file structure in the job run depends on whether your archive has a top level directory. For example, you can inspect the structure of your zip file by running the unzip -l command:

unzip -l my_source_code.zip

This will give you outputs similar to the following:

Archive:  path/to/my_source_code.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  02-22-2023 16:38   my_source_code/
     1803  02-22-2023 16:38   my_source_code/my_module.py
       91  02-22-2023 16:38   my_source_code/my_entrypoint.py
---------                     -------
     1894                     3 files

In this case, a top level directory my_source_code/ is present in the archive. The file structure in the job run will look like:

code  <---This is the working directory
└── my_source_code
    ├── my_module.py
    └── my_entrypoint.py

This is the same as the case when you specify a local folder as the source code. You can configure the entrypoint and working directory as in the examples above.
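Besides unzip -l, you can also inspect an archive programmatically. The sketch below uses the standard zipfile module (top_level_dir is a hypothetical helper) to detect whether an archive keeps its files under a single top level directory, which determines how the entrypoint path should be written:

```python
# Sketch: detect whether a zip archive has a single top level directory.
import io
import zipfile


def top_level_dir(fileobj):
    """Return the single top level directory name, or None if files
    sit directly at the root of the archive."""
    with zipfile.ZipFile(fileobj) as zf:
        names = [n for n in zf.namelist() if n.strip("/")]
    roots = {n.split("/", 1)[0] for n in names}
    if len(roots) == 1:
        root = roots.pop()
        # Every entry must be the directory itself or live under it.
        if all(n == root + "/" or n.startswith(root + "/") for n in names):
            return root
    return None


# Build a sample archive in memory with a top level directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("my_source_code/my_module.py", "x = 1\n")
    zf.writestr("my_source_code/my_entrypoint.py", "print('hi')\n")
buf.seek(0)
print(top_level_dir(buf))  # my_source_code
```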

If a top level directory is not present, the output for the archive will look like the following:

Archive:  path/to/my_source_code.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     1803  02-22-2023 16:38   my_module.py
       91  02-22-2023 16:38   my_entrypoint.py
---------                     -------
     1894                     2 files

In this case, the file structure in the job run will look like:

code  <---This is the working directory
├── my_module.py
└── my_entrypoint.py

You can then specify the entrypoint with the filename directly:

runtime = (
  PythonRuntime()
  .with_source("path/to/my_source_code.zip")
  .with_entrypoint("my_entrypoint.py")
)

Python Paths#

The working directory is added to the Python paths automatically. You can call with_python_path() to add additional Python paths as needed. The paths should be relative to the working directory.
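To illustrate the effect, the sketch below simulates what the job run does with a relative Python path: it resolves the path against the working directory and puts it on sys.path so the entrypoint can import modules from it. The layout, the helpers.py module, and the demo_python_path function are hypothetical.

```python
# Sketch: what adding a relative Python path does in the job run.
# Hypothetical layout:
#   <working_dir>/
#   └── other_packages/
#       └── helpers.py
import os
import sys
import tempfile


def demo_python_path():
    working_dir = tempfile.mkdtemp()
    pkg_dir = os.path.join(working_dir, "other_packages")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "helpers.py"), "w") as f:
        f.write("def greet():\n    return 'hello from other_packages'\n")

    # with_python_path("other_packages") resolves the path relative to
    # the working directory and puts it on sys.path, roughly like this:
    sys.path.insert(0, pkg_dir)

    import helpers  # the entrypoint can now import from the added path

    return helpers.greet()
```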

Outputs#

The with_output() method allows you to specify an output path (output_path) in the job run and a remote URI (output_uri). Files in the output_path are copied to the remote output URI after the job run finishes successfully. Note that the output_path should be a path relative to the working directory.

An OCI object storage location can be specified in the format oci://bucket_name@namespace/path/to/dir. Make sure you configure the IAM policy to allow the job run's dynamic group to use object storage.
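Conceptually, the copy step behaves like the sketch below, which uses a local target directory for illustration; in a real job run the files go to the configured oci:// URI instead. The copy_outputs and demo functions are hypothetical names, not part of the ADS API.

```python
# Sketch of the post-run output copy, assuming a local target directory.
import os
import shutil
import tempfile


def copy_outputs(working_dir, output_path, target_dir):
    """Copy files from working_dir/output_path (a path relative to the
    working directory, as with_output() requires) into target_dir."""
    src = os.path.join(working_dir, output_path)
    os.makedirs(target_dir, exist_ok=True)
    for name in os.listdir(src):
        shutil.copy2(os.path.join(src, name), os.path.join(target_dir, name))
    return sorted(os.listdir(target_dir))


def demo():
    # Simulate a job run working directory with an "output" folder.
    working_dir = tempfile.mkdtemp()
    out_dir = os.path.join(working_dir, "output")
    os.makedirs(out_dir)
    with open(os.path.join(out_dir, "result.txt"), "w") as f:
        f.write("done\n")
    return copy_outputs(working_dir, "output", tempfile.mkdtemp())
```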