Run a Notebook¶
The NotebookRuntime allows you to run a single Jupyter notebook as a job.
TensorFlow Example¶
The following example shows you how to run the TensorFlow 2 quick start for beginners notebook from the internet and save the results to OCI Object Storage. The notebook path points to the raw file link from GitHub. To run the example, ensure that you have internet access to retrieve the notebook:
from ads.jobs import Job, DataScienceJob, NotebookRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        NotebookRuntime()
        .with_notebook(
            path="https://raw.githubusercontent.com/tensorflow/docs/master/site/en/tutorials/customization/basics.ipynb",
            encoding="utf-8",
        )
        .with_service_conda("tensorflow28_p38_cpu_v1")
        .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
        .with_exclude_tag(["ignore", "remove"])
        .with_output("oci://bucket_name@namespace/path/to/dir")
    )
)
The same job can be defined in YAML:

kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: notebook
    spec:
      conda:
        slug: tensorflow28_p38_cpu_v1
        type: service
      env:
      - name: GREETINGS
        value: Welcome to OCI Data Science
      excludeTags:
      - ignore
      - remove
      notebookEncoding: utf-8
      notebookPathURI: https://raw.githubusercontent.com/tensorflow/docs/master/site/en/tutorials/customization/basics.ipynb
      outputUri: oci://bucket_name@namespace/path/to/dir
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()
# Download the notebook back to local
run.download("/path/to/local/dir")
Working Directory¶
An empty directory is created in the job run as the working directory for running the notebook. All relative paths used in the notebook are resolved against this working directory.
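For instance, a relative path written in a notebook cell resolves against that working directory. A minimal sketch (the file name here is illustrative):

```python
import os
from pathlib import Path

# In the job run, the notebook starts in an empty working directory,
# so a relative path in a cell resolves against it.
relative_output = Path("outputs/result.txt")

# The resolved location is the working directory joined with the relative path.
resolved = Path(os.getcwd()) / relative_output

# Writing to the relative path creates the file under the working directory,
# where it is picked up by the output upload after the run finishes.
relative_output.parent.mkdir(parents=True, exist_ok=True)
relative_output.write_text("saved relative to the working directory")

print(resolved.resolve() == relative_output.resolve())  # True
```

Because the upload step copies the whole working directory, any files your notebook writes to relative paths are included in the saved outputs.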
Download the Outputs¶
If you specify the output location using with_output(), all files in the working directory, including the notebook with outputs, are saved to the output location (oci://bucket_name@namespace/path/to/dir) after the job finishes running. You can download the outputs by calling the download() method.
Exclude Cells¶
The NotebookRuntime also allows you to exclude cells from being processed in a job run by tagging them and passing the tags to the with_exclude_tag() method. For example, you could do exploratory data analysis and visualization in a notebook, but exclude the visualization cells when running the notebook as a job. To tag cells in a notebook, see Adding tags using notebook interfaces. The with_exclude_tag() method takes a list of tags as its argument. Cells with any matching tags are excluded from the job run. In the example above, cells tagged ignore or remove are excluded.
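Under the hood, cell tags live in the cell metadata of the .ipynb JSON. A sketch of what a tagged cell looks like and how tag matching works (the cell source is illustrative):

```python
# A notebook cell as it appears in the .ipynb JSON; the "tags" list in the
# cell metadata is what with_exclude_tag() matches against.
cell = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {"tags": ["ignore"]},
    "outputs": [],
    "source": ["df.plot()  # visualization cell to exclude from the job run"],
}

# A cell is excluded when any of its tags appears in the exclude list.
exclude_tags = {"ignore", "remove"}
excluded = bool(exclude_tags & set(cell["metadata"].get("tags", [])))
print(excluded)  # True
```

A cell without a "tags" key, or with tags outside the exclude list, runs normally.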
Notebook with Dependencies¶
If your notebook needs extra dependencies, such as custom modules or data files, you can use PythonRuntime or GitPythonRuntime and set your notebook as the entrypoint. Here is an example of running the minGPT demo notebook:
from ads.jobs import Job, DataScienceJob, GitPythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        GitPythonRuntime()
        # Use service conda pack
        .with_service_conda("pytorch110_p38_gpu_v1")
        # Specify training source code from GitHub
        .with_source(url="https://github.com/karpathy/minGPT.git")
        # Entrypoint is a relative path from the root of the Git repository
        .with_entrypoint("demo.ipynb")
    )
)
The same job can be defined in YAML:

kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: gitPython
    spec:
      conda:
        slug: pytorch110_p38_gpu_v1
        type: service
      entrypoint: demo.ipynb
      url: https://github.com/karpathy/minGPT.git
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()