Run a Git Repo
The ADS GitPythonRuntime class allows you to run source code from a Git repository as a Data Science job. The next example shows how to run the PyTorch Neural Network example, which trains a third-order polynomial to predict y=sin(x).
Python
To configure the GitPythonRuntime, you must specify the source code url and the entrypoint path. Similar to PythonRuntime, you can specify a service conda environment, environment variables, and CLI arguments. In this example, the pytorch19_p37_gpu_v1 service conda environment is used. Assuming you are running this example in a Data Science notebook session, only the log ID and log group ID need to be configured for the DataScienceJob object. See Data Science Jobs for more details about configuring the infrastructure.
from ads.jobs import Job, DataScienceJob, GitPythonRuntime

job = (
    Job()
    .with_infrastructure(
        DataScienceJob()
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
        # The following infrastructure configurations are optional
        # if you are in an OCI data science notebook session.
        # The configurations of the notebook session will be used as defaults
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)  # Applicable only for the flexible shapes
        .with_block_storage_size(50)
    )
    .with_runtime(
        GitPythonRuntime()
        .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
        .with_service_conda("pytorch19_p37_gpu_v1")
        .with_source("https://github.com/pytorch/tutorials.git")
        .with_entrypoint("beginner_source/examples_nn/polynomial_nn.py")
        .with_output(
            output_dir="~/Code/tutorials/beginner_source/examples_nn",
            output_uri="oci://BUCKET_NAME@BUCKET_NAMESPACE/PREFIX"
        )
    )
)

# Create the job with OCI
job.create()
# Run the job and stream the outputs
job_run = job.run().watch()
The default branch from the Git repository is used unless you specify a different branch or commit in the .with_source() method.
For a public repository, we recommend the “http://” or “https://” URL. Authentication may be required for the SSH URL even if the repository is public.
To use a private repository, you must first save an SSH key to an OCI Vault as a secret, and provide the secret_ocid to the with_source() method. See Managing Secret with Vault. For example, you could use a GitHub Deploy Key.
The entry point specifies how the source code is invoked. The .with_entrypoint() method has the following arguments:
func: Optional. The function in the script specified by path to call. If you don’t specify it, then the script specified by path is run as a Python script in a subprocess.
path: Required. The relative path for the script, module, or file to start the job.
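The two invocation modes can be pictured with the standard library alone. The following is a minimal sketch, not the ADS implementation: run_entrypoint is a hypothetical helper that runs the file at path as a script in a subprocess, or, when func is given, imports the file and calls that function. The entry.py file and its greet function are made up for the illustration.

```python
import importlib.util
import subprocess
import sys
import tempfile
from pathlib import Path


def run_entrypoint(path, func=None):
    """Hypothetical helper that mimics the two entry point modes."""
    if func is None:
        # No function given: run the file as a Python script in a subprocess.
        completed = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout
    # Function given: import the file as a module and call the function.
    spec = importlib.util.spec_from_file_location(Path(path).stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, func)()


# A throwaway script standing in for the entry point of a cloned repository.
workdir = Path(tempfile.mkdtemp())
script = workdir / "entry.py"
script.write_text(
    "def greet():\n"
    "    return 'hello from greet'\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print('hello from script')\n"
)

script_output = run_entrypoint(script)             # script mode
function_result = run_entrypoint(script, "greet")  # function mode
```

In script mode the code under the main guard runs; in function mode only the named function is called, so a file can support both styles.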
With the GitPythonRuntime class, you can save the output files from the job run to Object Storage using with_output(). By default, the source code is cloned to the ~/Code directory. In the example, the files in the examples_nn directory are copied to the Object Storage location specified by the output_uri parameter. The output_uri parameter should have this format:
oci://BUCKET_NAME@BUCKET_NAMESPACE/PREFIX
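To make the three parts of this format concrete, here is a small sketch that splits such a URI into its bucket name, namespace, and prefix. parse_output_uri and OutputLocation are hypothetical names introduced for illustration and are not part of ADS. The bucket and namespace values are placeholders.

```python
from typing import NamedTuple


class OutputLocation(NamedTuple):
    bucket: str
    namespace: str
    prefix: str


def parse_output_uri(uri):
    """Hypothetical helper: split oci://BUCKET@NAMESPACE/PREFIX into parts."""
    scheme = "oci://"
    if not uri.startswith(scheme):
        raise ValueError(f"Expected a URI starting with {scheme!r}: {uri!r}")
    # Everything before the first "/" names the bucket and namespace.
    location, _, prefix = uri[len(scheme):].partition("/")
    bucket, at, namespace = location.partition("@")
    if not (bucket and at and namespace):
        raise ValueError(f"Expected BUCKET_NAME@BUCKET_NAMESPACE in {uri!r}")
    return OutputLocation(bucket, namespace, prefix)


# Placeholder bucket and namespace names, for illustration only.
location = parse_output_uri("oci://my-bucket@my-namespace/jobs/polynomial_nn")
```

A check like this can catch a malformed output_uri before a job is created, rather than after the run has finished.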
The GitPythonRuntime also supports these additional configurations:
The .with_python_path() method allows you to add additional Python paths to the runtime. By default, the code directory checked out from Git is added to sys.path. Additional Python paths are appended before the code directory is appended.
The .with_argument() method allows you to pass arguments to invoke the script or function. For running a script, the arguments are passed in as CLI arguments. For running a function, the list and dict JSON serializable objects are supported and are passed into the function.
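The difference between the two argument styles matters when you write the entry file itself. The sketch below shows one way an entry file could accept both: train takes a list and a dict directly, as in function mode, and main parses the same information from CLI arguments, as in script mode. The names train, main, --layers, and --lr are hypothetical, and how ADS maps the values given to .with_argument() onto the command line is not shown here.

```python
import argparse


def train(layers, options):
    """Function mode: receives a list and a dict directly."""
    return {"layers": list(layers), "learning_rate": options.get("lr", 0.001)}


def main(argv=None):
    """Script mode: receives the same information as CLI arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--layers", type=int, nargs="+", default=[8])
    parser.add_argument("--lr", type=float, default=0.001)
    args = parser.parse_args(argv)
    return train(args.layers, {"lr": args.lr})


# Function mode: list and dict arguments are passed straight through.
from_function = train([16, 8], {"lr": 0.01})

# Script mode: the same values arrive as strings on the command line.
# A real script would call main() under an `if __name__ == "__main__":` guard.
from_cli = main(["--layers", "16", "8", "--lr", "0.01"])
```

Keeping the parsing in main and the logic in train means the same file works whether or not func is specified for the entry point.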
The GitPythonRuntime updates metadata in the free-form tags of the job run after the job run finishes. The following tags are added automatically:
commit: The Git commit ID.
method: The entry function or method.
module: The entry script or module.
outputs: The prefix of the output files in Object Storage.
repo: The URL of the Git repository.
The new values overwrite any existing tags. If you want to skip the metadata update, set skip_metadata_update to True when initializing the runtime:
runtime = GitPythonRuntime(skip_metadata_update=True)
YAML
You could create the preceding example job with the following YAML file:
kind: job
spec:
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      compartmentId: <compartment_ocid>
      projectId: <project_ocid>
      subnetId: <subnet_ocid>
      shapeName: VM.Standard.E3.Flex
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      blockStorageSize: 50
  name: git_example
  runtime:
    kind: runtime
    type: gitPython
    spec:
      entrypoint: beginner_source/examples_nn/polynomial_nn.py
      outputDir: ~/Code/tutorials/beginner_source/examples_nn
      outputUri: oci://BUCKET_NAME@BUCKET_NAMESPACE/PREFIX
      url: https://github.com/pytorch/tutorials.git
      conda:
        slug: pytorch19_p37_gpu_v1
        type: service
      env:
        - name: GREETINGS
          value: Welcome to OCI Data Science
GitPythonRuntime YAML Schema
kind:
  required: true
  type: string
  allowed:
    - runtime
type:
  required: true
  type: string
  allowed:
    - gitPython
spec:
  required: true
  type: dict
  schema:
    args:
      type: list
      nullable: true
      required: false
      schema:
        type: string
    branch:
      nullable: true
      required: false
      type: string
    commit:
      nullable: true
      required: false
      type: string
    codeDir:
      required: false
      type: string
    conda:
      nullable: false
      required: false
      type: dict
      schema:
        slug:
          required: true
          type: string
        type:
          required: true
          type: string
          allowed:
            - service
    entryFunction:
      nullable: true
      required: false
      type: string
    entrypoint:
      required: false
      type:
        - string
        - list
    env:
      nullable: true
      required: false
      type: list
      schema:
        type: dict
        schema:
          name:
            type: string
          value:
            type:
              - number
              - string
    outputDir:
      required: false
      type: string
    outputUri:
      required: false
      type: string
    pythonPath:
      nullable: true
      required: false
      type: list
    url:
      required: false
      type: string
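As a reading aid for the schema, the sketch below hand-checks a runtime document against a few of its rules: the fixed kind and type values, the string fields under spec (with branch, commit, and entryFunction allowed to be null), and the conda block. check_runtime is a hypothetical helper written for illustration. It covers only part of the schema and is not how ADS validates YAML.

```python
NULLABLE_STRINGS = ("branch", "commit", "entryFunction")
PLAIN_STRINGS = ("codeDir", "outputDir", "outputUri", "url")


def check_runtime(doc):
    """Return a list of problems found in a gitPython runtime document."""
    problems = []
    if doc.get("kind") != "runtime":
        problems.append("kind must be 'runtime'")
    if doc.get("type") != "gitPython":
        problems.append("type must be 'gitPython'")
    spec = doc.get("spec")
    if not isinstance(spec, dict):
        problems.append("spec is required and must be a dict")
        return problems
    for key in NULLABLE_STRINGS + PLAIN_STRINGS:
        if key not in spec:
            continue  # every field under spec is optional
        value = spec[key]
        if value is None and key in NULLABLE_STRINGS:
            continue  # these fields are marked nullable in the schema
        if not isinstance(value, str):
            problems.append(f"spec.{key} must be a string")
    if "conda" in spec:
        conda = spec["conda"]
        if not (
            isinstance(conda, dict)
            and isinstance(conda.get("slug"), str)
            and conda.get("type") == "service"
        ):
            problems.append("spec.conda needs a string slug and type 'service'")
    return problems


# The runtime section of the example job, written as a Python dict.
example = {
    "kind": "runtime",
    "type": "gitPython",
    "spec": {
        "url": "https://github.com/pytorch/tutorials.git",
        "entrypoint": "beginner_source/examples_nn/polynomial_nn.py",
        "conda": {"slug": "pytorch19_p37_gpu_v1", "type": "service"},
    },
}
problems = check_runtime(example)
```

An empty list means none of the checked rules were violated; it does not guarantee that the document satisfies the full schema.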