Run a Notebook#

The NotebookRuntime allows you to run a single Jupyter notebook as a job.

TensorFlow Example#

The following example shows how to run a TensorFlow 2 tutorial notebook from the internet and save the results to OCI Object Storage. The notebook path points to the raw file link on GitHub. To run the example, ensure that the job has internet access to retrieve the notebook:

  • Python
  • YAML
from ads.jobs import Job, DataScienceJob, NotebookRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        NotebookRuntime()
        .with_notebook(
            path="https://raw.githubusercontent.com/tensorflow/docs/master/site/en/tutorials/customization/basics.ipynb",
            encoding='utf-8'
        )
        .with_service_conda("tensorflow28_p38_cpu_v1")
        .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
        .with_exclude_tag(["ignore", "remove"])
        .with_output("oci://bucket_name@namespace/path/to/dir")
    )
)
kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: notebook
    spec:
      conda:
        slug: tensorflow28_p38_cpu_v1
        type: service
      env:
      - name: GREETINGS
        value: Welcome to OCI Data Science
      excludeTags:
      - ignore
      - remove
      notebookEncoding: utf-8
      notebookPathURI: https://raw.githubusercontent.com/tensorflow/docs/master/site/en/tutorials/customization/basics.ipynb
      outputUri: oci://bucket_name@namespace/path/to/dir
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()
# Download the outputs, including the executed notebook, to a local directory
run.download("/path/to/local/dir")

Working Directory#

An empty directory is created in the job run as the working directory for running the notebook. All relative paths used in the notebook are resolved against this working directory.
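As a minimal sketch of what this means for notebook code, a cell that writes to a relative path ends up writing inside the working directory. The helper name save_metrics, the file name metrics.json, and the metric value are hypothetical placeholders. A temporary directory stands in for the working directory that the job run would create:

```python
import json
import os
import tempfile


def save_metrics(metrics, filename="metrics.json"):
    """Write metrics to a relative path, which resolves against the current working directory."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(metrics, f)
    return os.path.abspath(filename)


# Simulate the empty working directory (in a job run it is created for you).
original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as workdir:
    os.chdir(workdir)
    try:
        # Placeholder metric value, for illustration only.
        saved_path = save_metrics({"accuracy": 0.9})
        saved_in_workdir = (
            os.path.dirname(os.path.realpath(saved_path)) == os.path.realpath(workdir)
        )
    finally:
        # Leave the temporary directory before it is removed.
        os.chdir(original_dir)

print(saved_in_workdir)
```

Because the file lands in the working directory, it is among the files that are saved when an output location is configured.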

Download the Outputs#

If you specify an output location using with_output(), all files in the working directory, including the notebook with its outputs, are saved to that location (oci://bucket_name@namespace/path/to/dir in the example above) after the job finishes running. You can then download the outputs by calling the download() method on the job run.
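The output location follows the pattern oci://<bucket>@<namespace>/<path>. The sketch below shows one way to split such a URI into its parts, for example to check the value before passing it to with_output(). The helper split_oci_uri is hypothetical and is not part of ADS. It relies only on the Python standard library:

```python
from urllib.parse import urlparse


def split_oci_uri(uri):
    """Split an oci://bucket@namespace/path URI into (bucket, namespace, prefix).

    Hypothetical helper for illustration; not an ADS function.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(f"Expected oci://<bucket>@<namespace>/<path>, got: {uri}")
    bucket, namespace = parsed.netloc.split("@", 1)
    # Drop the leading slash so the prefix is relative to the bucket root.
    prefix = parsed.path.lstrip("/")
    return bucket, namespace, prefix


bucket, namespace, prefix = split_oci_uri("oci://bucket_name@namespace/path/to/dir")
print(bucket, namespace, prefix)
```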

Exclude Cells#

The NotebookRuntime also allows you to exclude cells from a job run by specifying tags with the with_exclude_tag() method. For example, you might do exploratory data analysis and visualization in a notebook, and then exclude the visualization cells when running the notebook as a job.

To tag cells in a notebook, see Adding tags using notebook interfaces.

The with_exclude_tag() method takes a list of tags as its argument. Cells with any matching tag are excluded from the job run. In the example above, cells tagged ignore or remove are excluded.
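To illustrate the matching rule, the sketch below filters a notebook represented as a plain dictionary. In the notebook JSON format, tags are stored as a list under each cell's metadata "tags" key. The function exclude_tagged_cells and the sample cells are hypothetical; this is not how ADS implements the feature, only a small model of "any matching tag excludes the cell":

```python
def exclude_tagged_cells(notebook, exclude_tags):
    """Return a copy of the notebook dict without cells that carry any excluded tag."""
    excluded = set(exclude_tags)
    kept = [
        cell
        for cell in notebook.get("cells", [])
        if not excluded.intersection(cell.get("metadata", {}).get("tags", []))
    ]
    return {**notebook, "cells": kept}


# Hypothetical notebook content, for illustration only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "df = load_data()"},
        {"cell_type": "code", "metadata": {"tags": ["ignore"]}, "source": "df.plot()"},
        {"cell_type": "code", "metadata": {"tags": ["remove", "slow"]}, "source": "df.describe()"},
        {"cell_type": "code", "metadata": {"tags": ["train"]}, "source": "model.fit(df)"},
    ],
}

filtered = exclude_tagged_cells(notebook, ["ignore", "remove"])
remaining_sources = [cell["source"] for cell in filtered["cells"]]
print(remaining_sources)
```

A cell with several tags is dropped as soon as one of them matches, which is why the cell tagged both remove and slow does not survive the filter.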

Notebook with Dependencies#

If your notebook needs extra dependencies such as custom modules or data files, you can use PythonRuntime or GitPythonRuntime and set your notebook as the entrypoint.

Here is an example of running the minGPT demo notebook.

  • Python
  • YAML
from ads.jobs import Job, DataScienceJob, GitPythonRuntime

job = (
    Job(name="My Job")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        GitPythonRuntime()
        # Use service conda pack
        .with_service_conda("pytorch110_p38_gpu_v1")
        # Specify training source code from GitHub
        .with_source(url="https://github.com/karpathy/minGPT.git")
        # Entrypoint is a relative path from the root of the Git repository
        .with_entrypoint("demo.ipynb")
    )
)
kind: job
spec:
  name: "My Job"
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    spec:
      blockStorageSize: 50
      compartmentId: <compartment_ocid>
      jobInfrastructureType: STANDALONE
      logGroupId: <log_group_ocid>
      logId: <log_ocid>
      projectId: <project_ocid>
      shapeConfigDetails:
        memoryInGBs: 16
        ocpus: 1
      shapeName: VM.Standard.E3.Flex
      subnetId: <subnet_ocid>
  runtime:
    kind: runtime
    type: gitPython
    spec:
      conda:
        slug: pytorch110_p38_gpu_v1
        type: service
      entrypoint: demo.ipynb
      url: https://github.com/karpathy/minGPT.git
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs
run.watch()