Run a Notebook

In some cases, you may want to run an existing JupyterLab notebook as a job. You can do this using the NotebookRuntime() object.

The next example show you how to run an the TensorFlow 2 quick start for beginner notebook from the internet and save the results to OCI Object Storage. The notebook path points to the raw file link from GitHub. To run the following example, ensure that you have internet access to retrieve the notebook:

Python

from ads.jobs import Job, DataScienceJob, NotebookRuntime

job = (
    Job()
    .with_infrastructure(
        DataScienceJob()
        .with_log_id("<log_id>")
        .with_log_group_id("<log_group_id>")
    )
    .with_runtime(
        NotebookRuntime()
        .with_notebook(path="https://raw.githubusercontent.com/tensorflow/docs/master/site/en/tutorials/customization/basics.ipynb")
        .with_service_conda(tensorflow26_p37_cpu_v2")
        .with_output("oci://bucket_name@namespace/path/to/dir")
    )

job.create()
run = job.run().watch()

After the notebook finishes running, the notebook with results are saved to oci://bucket_name@namespace/path/to/dir. You can download the output by calling the download() method.

run.download("/path/to/local/dir")

The NotebookRuntime also allows you to use exclusion tags, which lets you exclude cells from a job run. For example, you could use these tags to do exploratory data analysis, and then train and evaluate your model in a notebook. Then you could use that same notebook to only build future models that are trained on a different dataset. So the job run only has to execute the cells that are related to training the model, and not the exploratory data analysis or model evaluation.

You tag the cells in the notebook, and then specify the tags using the .with_exclude_tag() method. Cells with any matching tags are excluded from the job run. For example, if you tagged cells with ignore and remove, you can pass in a list of the two tags to the method and those cells are excluded from the code that is executed as part of the job run. To tag cells in a notebook, see Adding tags using notebook interfaces.

job.with_runtime(
    NotebookRuntime()
    .with_notebook("path/to/notebook")
    .with_exclude_tag(["ignore", "remove"])
)

YAML

You could use the following YAML to create the same job:

kind: job
spec:
  infrastructure:
    kind: infrastructure
    spec:
      jobInfrastructureType: STANDALONE
      jobType: DEFAULT
      logGroupId: <log_group_id>
      logId: <log.id>
    type: dataScienceJob
  runtime:
    kind: runtime
    spec:
      conda:
        slug: tensorflow26_p37_cpu_v1
        type: service
      notebookPathURI: {path_to_nb}
    type: notebook

NotebookRuntime Schema

kind:
allowed:
    - runtime
required: true
type: string
spec:
required: true
schema:
    args:
    nullable: true
    required: false
    schema:
        type: string
    type: list
    conda:
    nullable: false
    required: false
    schema:
        slug:
        required: true
        type: string
        type:
        allowed:
            - service
        required: true
        type: string
    type: dict
    env:
    required: false
    schema:
        type: dict
    type: list
    excludeTags:
    required: false
    type: list
    freeform_tag:
    required: false
    type: dict
    notebookPathURI:
    required: false
    type: string
    outputUri:
    required: false
    type: string
type: dict
type:
allowed:
    - notebook
required: true
type: string