Run Code from Git Repo
The GitPythonRuntime allows you to run source code from a Git repository as a job.
PyTorch Example
The following example shows how to run the PyTorch neural network example that trains a third-order polynomial to predict y=sin(x).
from ads.jobs import Job, DataScienceJob, GitPythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        GitPythonRuntime()
        .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch19_p37_gpu_v1")
        # Specify the git repository
        # Optionally, you can specify the branch or commit
        .with_source("https://github.com/pytorch/tutorials.git")
        # Entrypoint is a relative path from the root of the git repo.
        .with_entrypoint("beginner_source/examples_nn/polynomial_nn.py")
        # Copy files in "beginner_source/examples_nn" to object storage after job finishes.
        .with_output(
            output_dir="beginner_source/examples_nn",
            output_uri="oci://bucket_name@namespace/path/to/dir"
        )
    )
)
kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: gitPython
    spec:
      conda:
        slug: pytorch19_p37_gpu_v1
        type: service
      entrypoint: beginner_source/examples_nn/polynomial_nn.py
      env:
        - name: GREETINGS
          value: Welcome to OCI Data Science
      outputDir: beginner_source/examples_nn
      outputUri: oci://bucket_name@namespace/path/to/dir
      url: https://github.com/pytorch/tutorials.git
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()
Git Repository
To configure the GitPythonRuntime, you must specify the source code url and the entrypoint.
The default branch from the Git repository is used unless you specify a different branch or commit in the with_source() method.
For a public repository, we recommend the “http://” or “https://” URL. Authentication may be required for the SSH URL even if the repository is public.
To use a private repository, you must first save an SSH key to OCI Vault as a secret, and provide the secret_ocid when calling with_source().
For more information about creating and using secrets, see Managing Secret with Vault.
For a repository on GitHub, you can set up a GitHub Deploy Key and save it as the secret.
Git Version for Private Repository
Git 2.3 or later is required to use a private repository.
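To check whether an environment meets this requirement, the output of `git --version` can be parsed and compared against 2.3. The following is a minimal sketch using only the standard library; `parse_git_version` and `supports_private_repo` are hypothetical helper names for illustration and are not part of ADS.

```python
import re


def parse_git_version(version_output: str) -> tuple:
    """Extract the numeric version from `git --version` output as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError(f"Unrecognized git version string: {version_output!r}")
    # A missing patch component (for example "2.3") is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def supports_private_repo(version_output: str) -> bool:
    """Return True when the reported Git version is 2.3 or later."""
    return parse_git_version(version_output) >= (2, 3, 0)
```

The input string would typically come from running `git --version` in the same environment where the job run clones the repository.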
Entrypoint
The entrypoint specifies how the source code is invoked.
The with_entrypoint() method supports the following arguments:
path: Required. The relative path of the script/module from the root of the Git repository.
func: Optional. The function in the script specified by path to call. If you don’t specify it, then the script specified by path is run as a Python script in a subprocess.
The arguments for the entrypoint can be specified through with_argument().
For running a script, the arguments are passed in as command line arguments.
See Runtime Command Line Arguments for more details.
For running a function, the arguments are passed into the function call.
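For the script case, the mapping from with_argument() inputs to command line arguments can be illustrated with a short sketch. It reproduces the shape of the `args` list shown in the YAML example that follows (positional values first, then each keyword as `--key value`). The helper name `to_command_line_args` is hypothetical, and this is an illustration of the mapping rather than the ADS implementation.

```python
def to_command_line_args(*args, **kwargs):
    """Flatten positional and keyword arguments into a command line argument list."""
    # Positional arguments are passed through in order.
    cli_args = [str(arg) for arg in args]
    # Each keyword argument becomes a "--key" flag followed by its value.
    for key, value in kwargs.items():
        cli_args.extend([f"--{key}", str(value)])
    return cli_args


cli_args = to_command_line_args("arg1", "arg2", key1="val1", key2="val2")
```

A script entrypoint would then read these values from `sys.argv` or with `argparse`.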
The following example shows how you can define a runtime using a Python function from a Git repository as the entrypoint.
Here my_function is a function in the my_source/my_module.py module.
runtime = (
    GitPythonRuntime()
    .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
    # Specify the service conda environment by slug name.
    .with_service_conda("pytorch19_p37_gpu_v1")
    # Specify the git repository
    .with_source("https://example.com/your_repository.git")
    # Entrypoint is a relative path from the root of the git repo.
    .with_entrypoint("my_source/my_module.py", func="my_function")
    .with_argument("arg1", "arg2", key1="val1", key2="val2")
)
kind: runtime
type: gitPython
spec:
  args:
    - arg1
    - arg2
    - --key1
    - val1
    - --key2
    - val2
  conda:
    slug: pytorch19_p37_gpu_v1
    type: service
  entryFunction: my_function
  entrypoint: my_source/my_module.py
  env:
    - name: GREETINGS
      value: Welcome to OCI Data Science
  url: https://example.com/your_repository.git
The function will be called as my_function("arg1", "arg2", key1="val1", key2="val2").
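To make the function case concrete, the sketch below loads a Python file by path and calls a named function with positional and keyword arguments. It only illustrates the idea under stated assumptions: `call_entry_function` is a hypothetical helper, the temporary file stands in for my_source/my_module.py in a cloned repository, and the way ADS actually loads the module may differ.

```python
import importlib.util
import tempfile
from pathlib import Path


def call_entry_function(script_path, func_name, *args, **kwargs):
    """Load a Python file as a module and call one of its functions."""
    spec = importlib.util.spec_from_file_location("entry_module", str(script_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, func_name)(*args, **kwargs)


# Stand-in for my_source/my_module.py (hypothetical function body).
MODULE_SOURCE = '''
def my_function(first, second, key1=None, key2=None):
    return f"{first} {second} {key1} {key2}"
'''

with tempfile.TemporaryDirectory() as repo_dir:
    script = Path(repo_dir) / "my_module.py"
    script.write_text(MODULE_SOURCE)
    result = call_entry_function(
        script, "my_function", "arg1", "arg2", key1="val1", key2="val2"
    )
```

Because the values arrive through a function call rather than the command line, the entry function receives them as ordinary Python arguments.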
The arguments can be strings, lists of strings, or dicts containing only strings.
GitPythonRuntime also supports a Jupyter notebook as the entrypoint.
Arguments are not used when the entrypoint is a notebook.
Working Directory
By default, the working directory is the root of the Git repository.
You can change it by calling with_working_dir() with a relative path from the root of the Git repository.
Note that the entrypoint should always be specified as a relative path from the root of the Git repository, regardless of the working directory. The Python paths and output directory should be specified relative to the working directory.
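The two reference points (repository root for the entrypoint, working directory for Python paths and the output directory) can be easy to mix up. The sketch below resolves each relative path against the right base. The helper `resolve_job_paths`, the clone location `/tmp/repo`, and the directory names `lib` and `outputs` are all hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath


def resolve_job_paths(repo_root, entrypoint, working_dir=".", python_paths=(), output_dir=None):
    """Resolve the runtime's relative paths to absolute paths inside the clone."""
    root = PurePosixPath(repo_root)
    cwd = root / working_dir
    return {
        # The entrypoint is always relative to the repository root.
        "entrypoint": str(root / entrypoint),
        "working_dir": str(cwd),
        # Python paths and the output directory are relative to the working directory.
        "python_paths": [str(cwd / path) for path in python_paths],
        "output_dir": str(cwd / output_dir) if output_dir is not None else None,
    }


paths = resolve_job_paths(
    repo_root="/tmp/repo",  # hypothetical clone location
    entrypoint="my_source/my_module.py",
    working_dir="my_source",
    python_paths=["lib"],
    output_dir="outputs",
)
```

In this sketch the entrypoint keeps its `my_source/` prefix even though the working directory is `my_source`, while `lib` and `outputs` resolve underneath the working directory.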
Python Paths
By default, the working directory is the root of the Git repository.
The working directory is added to the Python paths automatically.
You can call with_python_path() to add additional Python paths as needed.
The paths should be relative paths from the working directory.
Outputs
The with_output() method allows you to specify the output path output_dir in the job run and a remote URI (output_uri).
Files in the output_dir are copied to the remote output URI after the job run finishes successfully.
Note that the output_dir should be a path relative to the working directory.
An OCI Object Storage location can be specified in the format oci://bucket_name@namespace/path/to/dir.
Make sure you configure the IAM policy to allow the job run dynamic group to use Object Storage.
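The three parts of the Object Storage URI (bucket, namespace, and path prefix) can be separated with the standard library, which is a convenient way to validate an output_uri before creating a job. This is a minimal sketch; `parse_oci_uri` is a hypothetical helper and not an ADS function.

```python
from urllib.parse import urlparse


def parse_oci_uri(uri):
    """Split oci://bucket_name@namespace/prefix into its three components."""
    parsed = urlparse(uri)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(f"Expected oci://bucket_name@namespace/path, got {uri!r}")
    # The part before "@" is the bucket; the part after it is the namespace.
    bucket, namespace = parsed.netloc.split("@", 1)
    return {
        "bucket": bucket,
        "namespace": namespace,
        "prefix": parsed.path.lstrip("/"),
    }


location = parse_oci_uri("oci://bucket_name@namespace/path/to/dir")
```

A URI that uses a different scheme or omits the `@namespace` part raises a ValueError in this sketch.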
Metadata
The GitPythonRuntime updates metadata as free-form tags of the job run after the job run finishes. The following tags are added automatically:
commit: The Git commit ID.
method: The entry function or method.
module: The entry script or module.
outputs: The prefix of the output files in Object Storage.
repo: The URL of the Git repository.
The new values overwrite any existing tags.
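The overwrite behavior can be pictured as a dictionary update in which the runtime's values win on conflicting keys and unrelated tags are kept. The sketch below is only an illustration: `merge_freeform_tags` is a hypothetical helper, the `owner` tag and all tag values are made up, and the exact value formats that the runtime writes are an assumption.

```python
def merge_freeform_tags(existing_tags, runtime_tags):
    """Return the tags after a metadata update; runtime values win on conflicts."""
    merged = dict(existing_tags)
    merged.update(runtime_tags)
    return merged


# Hypothetical tags already present on the job run.
existing = {"owner": "data-team", "commit": "stale-value"}
# Illustrative values for the five tags listed above.
runtime_tags = {
    "commit": "<commit_id>",
    "method": "my_function",
    "module": "my_source/my_module.py",
    "outputs": "oci://bucket_name@namespace/path/to/dir",
    "repo": "https://example.com/your_repository.git",
}
tags = merge_freeform_tags(existing, runtime_tags)
```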
If you want to skip the metadata update, set skip_metadata_update to True when initializing the runtime:
runtime = GitPythonRuntime(skip_metadata_update=True)