Run Code from Git Repo

The GitPythonRuntime allows you to run source code from a Git repository as a job.

PyTorch Example

The following example shows how to run the PyTorch Neural Network Example, which trains a third-order polynomial to predict y = sin(x).
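For context, the tutorial script models y as a cubic in x with four learnable coefficients and fits them by minimizing the squared error against sin(x) over a set of sample points x_i. This is our reading of the tutorial; check polynomial_nn.py for the exact setup:

```latex
\hat{y}(x) = a + b\,x + c\,x^{2} + d\,x^{3},
\qquad
\mathcal{L}(a, b, c, d) = \sum_{i} \bigl(\hat{y}(x_i) - \sin(x_i)\bigr)^{2}
```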

Python:
from ads.jobs import Job, DataScienceJob, GitPythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # A log resource is created automatically if the log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        GitPythonRuntime()
        .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch19_p37_gpu_v1")
        # Specify the git repository
        # Optionally, you can specify the branch or commit
        .with_source("https://github.com/pytorch/tutorials.git")
        # Entrypoint is a relative path from the root of the git repo.
        .with_entrypoint("beginner_source/examples_nn/polynomial_nn.py")
        # Copy files in "beginner_source/examples_nn" to object storage after the job finishes.
        .with_output(
            output_dir="beginner_source/examples_nn",
            output_uri="oci://bucket_name@namespace/path/to/dir"
        )
    )
)

YAML:

kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: gitPython
    spec:
      conda:
        slug: pytorch19_p37_gpu_v1
        type: service
      entrypoint: beginner_source/examples_nn/polynomial_nn.py
      env:
      - name: GREETINGS
        value: Welcome to OCI Data Science
      outputDir: beginner_source/examples_nn
      outputUri: oci://bucket_name@namespace/path/to/dir
      url: https://github.com/pytorch/tutorials.git

To create the job and start a run:

# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()

Git Repository

To configure the GitPythonRuntime, you must specify the source code URL and the entrypoint. The repository's default branch is used unless you specify a different branch or commit in the with_source() method.

For a public repository, we recommend the “http://” or “https://” URL. Authentication may be required for the SSH URL even if the repository is public.
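If you copied the SSH URL of a public repository, the following hypothetical helper (not part of ADS) rewrites the common scp-like SSH form into the corresponding HTTPS URL. It assumes the host serves the same repository path over HTTPS, which holds for GitHub and most hosted Git services:

```python
import re

# Matches the scp-like SSH form, e.g. "git@github.com:owner/repo.git".
_SCP_LIKE = re.compile(r"^(?P<user>[^@/]+)@(?P<host>[^:/]+):(?P<path>.+)$")


def ssh_to_https(url: str) -> str:
    """Rewrite an scp-like SSH Git URL to HTTPS (hypothetical helper).

    URLs that are not in the scp-like SSH form are returned unchanged.
    """
    match = _SCP_LIKE.match(url)
    if match is None:
        return url
    return f"https://{match.group('host')}/{match.group('path')}"


print(ssh_to_https("git@github.com:pytorch/tutorials.git"))
```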

To use a private repository, you must first save an SSH key to OCI Vault as a secret and provide its secret_ocid when calling with_source(). For more information about creating and using secrets, see Managing Secret with Vault. For a repository on GitHub, you can set up a GitHub deploy key as the secret.

Note: Git 2.3 or later is required to use a private repository.
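Git 2.3 is the release that introduced the GIT_SSH_COMMAND environment variable, which is presumably why it is the minimum for SSH-based access. The sketch below (hypothetical helper names, not ADS code) parses the output of `git --version` and checks it against that minimum:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_GIT_VERSION = (2, 3)


def parse_git_version(output: str) -> Tuple[int, int]:
    """Extract (major, minor) from text such as 'git version 2.39.2'."""
    match = re.search(r"git version (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"Unrecognized git version output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def installed_git_version() -> Optional[Tuple[int, int]]:
    """Return the local Git (major, minor), or None if git is not on PATH."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(["git", "--version"], capture_output=True, text=True, check=True)
    return parse_git_version(result.stdout)


def supports_private_repo(version: Tuple[int, int]) -> bool:
    # Tuple comparison: (2, 39) >= (2, 3) is True, (2, 2) >= (2, 3) is False.
    return version >= MIN_GIT_VERSION
```

Calling `supports_private_repo(installed_git_version())` in the environment where the code runs would tell you whether that Git installation meets the requirement (guard against `None` first).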

Entrypoint

The entrypoint specifies how the source code is invoked. The with_entrypoint() method supports the following arguments:

  • path: Required. The relative path of the script/module from the root of the Git repository.

  • func: Optional. The function in the script specified by path to call. If you don’t specify it, then the script specified by path is run as a Python script in a subprocess.
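To illustrate the two modes, here is a minimal sketch (not the ADS implementation; `run_entrypoint` and `repo_root` are hypothetical names) of how a runner could resolve `path` and `func`: with `func`, it imports the file as a module and calls the function in-process; without it, it runs the file as a script in a subprocess. The "--key value" flattening of keyword arguments follows the YAML example later in this section:

```python
import importlib.util
import subprocess
import sys
from pathlib import Path


def run_entrypoint(repo_root, path, func=None, args=(), kwargs=None):
    """Hypothetical runner mirroring with_entrypoint(path, func=...).

    `path` is relative to `repo_root`. With `func`, the file is imported and the
    function is called in-process; without it, the file runs in a subprocess.
    """
    kwargs = kwargs or {}
    script = Path(repo_root) / path
    if func is None:
        # Script mode: positional args first, then each keyword as "--key value".
        cli = [str(arg) for arg in args]
        for key, value in kwargs.items():
            cli += [f"--{key}", str(value)]
        return subprocess.run([sys.executable, str(script), *cli], check=True).returncode
    # Function mode: load the file as a module, then look up and call `func`.
    spec = importlib.util.spec_from_file_location(script.stem, str(script))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, func)(*args, **kwargs)
```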

The arguments for the entrypoint can be specified through with_argument(). When running a script, the arguments are passed in as command-line arguments; see Runtime Command Line Arguments for more details. When running a function, the arguments are passed to the function call.
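In script mode, the entry script reads its arguments from sys.argv. A hypothetical script receiving with_argument("arg1", "arg2", key1="val1", key2="val2") could parse them with argparse as sketched below; the option names simply mirror the keyword names of this example:

```python
import argparse


def parse_args(argv=None):
    """Parse job arguments; with argv=None, argparse reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Hypothetical job entry script")
    # Positional arguments, e.g. from with_argument("arg1", "arg2", ...).
    parser.add_argument("positional", nargs="*")
    # Keyword arguments arrive as "--key1 val1 --key2 val2".
    parser.add_argument("--key1")
    parser.add_argument("--key2")
    return parser.parse_args(argv)


# Simulate the command line explicitly; inside the job, call parse_args() instead.
ns = parse_args(["arg1", "arg2", "--key1", "val1", "--key2", "val2"])
print(ns.positional, ns.key1, ns.key2)  # ['arg1', 'arg2'] val1 val2
```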

The following example shows how to define a runtime that uses a Python function from a Git repository as the entrypoint. Here, my_function is a function in the my_source/my_module.py module.

Python:
from ads.jobs import GitPythonRuntime

runtime = (
  GitPythonRuntime()
  .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
  # Specify the service conda environment by slug name.
  .with_service_conda("pytorch19_p37_gpu_v1")
  # Specify the git repository
  .with_source("https://example.com/your_repository.git")
  # Entrypoint is a relative path from the root of the git repo.
  .with_entrypoint("my_source/my_module.py", func="my_function")
  .with_argument("arg1", "arg2", key1="val1", key2="val2")
)

YAML:

kind: runtime
type: gitPython
spec:
  args:
  - arg1
  - arg2
  - --key1
  - val1
  - --key2
  - val2
  conda:
    slug: pytorch19_p37_gpu_v1
    type: service
  entryFunction: my_function
  entrypoint: my_source/my_module.py
  env:
  - name: GREETINGS
    value: Welcome to OCI Data Science
  url: https://example.com/your_repository.git

The function will be called as my_function("arg1", "arg2", key1="val1", key2="val2").

The arguments can be strings, lists of strings, or dicts containing only strings.
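In the YAML form, keyword arguments are flattened into the args list as "--key value" pairs. Assuming that convention, as shown in the YAML above, the hypothetical helper below recovers the function-call form from the flat list. It covers only the plain-string case and assumes every "--key" token is followed by exactly one value:

```python
from typing import Dict, List, Tuple


def split_cli_args(argv: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Split ["a", "--k", "v"] into (["a"], {"k": "v"}) (hypothetical helper)."""
    positional: List[str] = []
    keywords: Dict[str, str] = {}
    tokens = iter(argv)
    for token in tokens:
        if token.startswith("--"):
            # The next token is the value for this keyword.
            value = next(tokens, None)
            if value is None:
                raise ValueError(f"Missing value for {token}")
            keywords[token[2:]] = value
        else:
            positional.append(token)
    return positional, keywords


args, kwargs = split_cli_args(["arg1", "arg2", "--key1", "val1", "--key2", "val2"])
# Corresponds to my_function("arg1", "arg2", key1="val1", key2="val2").
print(args, kwargs)  # ['arg1', 'arg2'] {'key1': 'val1', 'key2': 'val2'}
```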

GitPythonRuntime also supports a Jupyter notebook as the entrypoint. Arguments are not used when the entrypoint is a notebook.

Working Directory

By default, the working directory is the root of the Git repository. You can change it with with_working_dir(), using a path relative to the root of the Git repository.

Note that the entrypoint should always be specified as a path relative to the root of the Git repository, regardless of the working directory. Python paths and the output directory should be specified relative to the working directory.
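The following sketch shows which base each path is resolved against. The clone location /home/datascience/repo and the working directory beginner_source are hypothetical values chosen for illustration:

```python
from pathlib import PurePosixPath

# Hypothetical values for illustration only.
repo_root = PurePosixPath("/home/datascience/repo")       # where the repo is cloned
working_dir = "beginner_source"                            # with_working_dir(...)
entrypoint = "beginner_source/examples_nn/polynomial_nn.py"  # always from the repo root
output_dir = "examples_nn"                                 # relative to the working dir

cwd = repo_root / working_dir
script = repo_root / entrypoint  # resolved from the repo root, not from cwd
outputs = cwd / output_dir       # resolved from the working directory

print(cwd)      # /home/datascience/repo/beginner_source
print(script)   # /home/datascience/repo/beginner_source/examples_nn/polynomial_nn.py
print(outputs)  # /home/datascience/repo/beginner_source/examples_nn
```

With this working directory, output_dir="examples_nn" points at the same folder that the PyTorch example addresses as "beginner_source/examples_nn" under the default working directory.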

Python Paths

The working directory is added to the Python paths automatically. You can call with_python_path() to add more Python paths as needed. These paths should be relative to the working directory.
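As a sketch of that rule, the hypothetical function below builds the list of Python path entries from the working directory and the relative paths passed to with_python_path(). Placing the working directory first is an assumption made for illustration:

```python
from pathlib import PurePosixPath
from typing import List


def build_python_path(working_dir: str, extra_paths: List[str]) -> List[str]:
    """Hypothetical: the working directory, then each with_python_path() entry
    resolved against the working directory (not the repository root)."""
    base = PurePosixPath(working_dir)
    return [str(base)] + [str(base / path) for path in extra_paths]


print(build_python_path("/home/datascience/repo", ["src", "libs/common"]))
```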

Outputs

The with_output() method allows you to specify an output directory (output_dir) in the job run and a remote URI (output_uri). Files in output_dir are copied to the remote URI after the job run finishes successfully. Note that output_dir should be a path relative to the working directory.

An OCI Object Storage location can be specified in the format oci://bucket_name@namespace/path/to/dir. Make sure you configure an IAM policy that allows the job run's dynamic group to use Object Storage.
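The sketch below splits such a URI into bucket, namespace, and prefix, and maps files found under output_dir to object names under that prefix. The helper names are hypothetical, and the assumption that the directory layout under output_dir is preserved below the prefix is ours, not documented behavior:

```python
from pathlib import PurePosixPath
from typing import List, Tuple
from urllib.parse import urlparse


def parse_oci_uri(uri: str) -> Tuple[str, str, str]:
    """Split "oci://bucket@namespace/prefix" into (bucket, namespace, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(f"Expected oci://bucket@namespace/path, got {uri!r}")
    bucket, namespace = parsed.netloc.split("@", 1)
    return bucket, namespace, parsed.path.strip("/")


def object_names(relative_files: List[str], prefix: str) -> List[str]:
    """Map paths relative to output_dir to object names under the prefix."""
    return [str(PurePosixPath(prefix) / rel) for rel in relative_files]


bucket, namespace, prefix = parse_oci_uri("oci://bucket_name@namespace/path/to/dir")
print(bucket, namespace, prefix)  # bucket_name namespace path/to/dir
print(object_names(["polynomial_nn.py"], prefix))  # ['path/to/dir/polynomial_nn.py']
```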

Metadata

The GitPythonRuntime updates metadata as free-form tags of the job run after the job run finishes. The following tags are added automatically:

  • commit: The Git commit ID.

  • method: The entry function or method.

  • module: The entry script or module.

  • outputs: The prefix of the output files in Object Storage.

  • repo: The URL of the Git repository.

The new values overwrite any existing tags. If you want to skip the metadata update, set skip_metadata_update to True when initializing the runtime:

from ads.jobs import GitPythonRuntime

runtime = GitPythonRuntime(skip_metadata_update=True)
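The overwrite rule can be pictured as a dictionary merge in which the runtime's values win for keys that already exist. This is a sketch only: `merge_run_tags` and the tag values are hypothetical, and it assumes tags under other keys are left untouched:

```python
def merge_run_tags(existing, runtime_tags, skip_metadata_update=False):
    """Runtime values overwrite tags with the same key; other keys are kept (assumed)."""
    if skip_metadata_update:
        # Mirrors GitPythonRuntime(skip_metadata_update=True): tags stay as they were.
        return dict(existing)
    return {**existing, **runtime_tags}


existing = {"repo": "https://example.com/old.git", "team": "research"}  # hypothetical tags
runtime_tags = {
    "commit": "<commit_id>",
    "method": "my_function",
    "module": "my_source/my_module.py",
    "outputs": "<output_prefix>",
    "repo": "https://example.com/your_repository.git",
}
merged = merge_run_tags(existing, runtime_tags)
print(merged["repo"], merged["team"])
```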