Local Pipeline Execution
Your pipeline can be executed locally to facilitate development and troubleshooting. Each pipeline step is executed in its own local container.
Prerequisites
Restrictions
Your pipeline steps are subject to the same restrictions as local jobs.
They are also subject to these additional restrictions:
- Pipeline steps must be of kind customScript.
- Custom container images are not yet supported. You must use the development container image with a conda environment.
Configuring Local Pipeline Orchestrator
Use ads opctl configure. Refer to the local_backend.ini description in the configuration instructions.
Most importantly, max_parallel_containers controls how many pipeline steps may be executed in parallel on your machine. Your pipeline DAG may allow multiple steps to run in parallel, but your local machine may not have enough CPU cores or memory to run them all simultaneously.
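For example, to cap the orchestrator at two concurrent step containers, the relevant local_backend.ini entry might look like this (a minimal sketch; any section header and the remaining keys should follow the configuration instructions, since only max_parallel_containers is described on this page):

    max_parallel_containers = 2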
Running your Pipeline
Local pipeline execution requires you to define your pipeline in a YAML file. Refer to the YAML examples here.
Then, invoke the following command to run your pipeline.
ads opctl run --backend local --file my_pipeline.yaml --source-folder /path/to/my/pipeline/step/files
Parameter explanation:
- --backend local: Run the pipeline locally using Docker containers.
- --file my_pipeline.yaml: The YAML file defining your pipeline.
- --source-folder /path/to/my/pipeline/step/files: The local directory containing the files used by your pipeline steps. This directory is mounted into the container as a volume. Defaults to the current working directory if no value is provided.
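Because --source-folder defaults to the current working directory, the following is equivalent to the command above, assuming my_pipeline.yaml also lives in that directory (paths are illustrative):

    cd /path/to/my/pipeline/step/files
    ads opctl run --backend local --file my_pipeline.yaml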
Source folder and relative paths
If your pipeline step runtimes are of type script or notebook, the paths in your YAML files must be relative to the --source-folder. Pipeline steps using a runtime of type python can define their own working directory, which is mounted into the step's container instead.
For example, suppose your YAML file looked like this:
kind: pipeline
spec:
  displayName: example
  dag:
  - (step_1, step_2) >> step_3
  stepDetails:
  - kind: customScript
    spec:
      description: A step running a notebook
      name: step_1
      runtime:
        kind: runtime
        spec:
          conda:
            slug: myconda_p38_cpu_v1
            type: service
          notebookEncoding: utf-8
          notebookPathURI: step_1_files/my-notebook.ipynb
        type: notebook
  - kind: customScript
    spec:
      description: A step running a shell script
      name: step_2
      runtime:
        kind: runtime
        spec:
          conda:
            slug: myconda_p38_cpu_v1
            type: service
          scriptPathURI: step_2_files/my-script.sh
        type: script
  - kind: customScript
    spec:
      description: A step running a python script
      name: step_3
      runtime:
        kind: runtime
        spec:
          conda:
            slug: myconda_p38_cpu_v1
            type: service
          workingDir: /step_3/custom/working/dir
          scriptPathURI: my-python.py
        type: python
type: pipeline
And suppose the pipeline is executed locally with the following command:
ads opctl run --backend local --file my_pipeline.yaml --source-folder /my/files
- step_1 uses a notebook runtime. The container for step_1 will mount the /my/files directory. The /my/files/step_1_files/my-notebook.ipynb notebook file will be converted into a Python script and executed in the container.
- step_2 uses a script runtime. The container for step_2 will mount the /my/files directory. The /my/files/step_2_files/my-script.sh shell script will be executed in the container.
- step_3 uses a python runtime. Instead of mounting the /my/files directory specified by --source-folder, the /step_3/custom/working/dir directory will be mounted into the container. The /step_3/custom/working/dir/my-python.py script will be executed in the container.
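For reference, this example assumes a local file layout along these lines (a sketch reconstructed from the paths above):

    /my/files
    ├── step_1_files
    │   └── my-notebook.ipynb
    └── step_2_files
        └── my-script.sh

    /step_3/custom/working/dir
    └── my-python.py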
Viewing container output and orchestration messages
When a container is running, you can use the docker logs command to view its output. See https://docs.docker.com/engine/reference/commandline/logs/.
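For instance, you could locate the step's container and stream its output with standard Docker commands (the container ID below is a placeholder):

    # List running containers to find the one executing your pipeline step
    docker ps
    # Stream that container's stdout/stderr; --follow keeps the stream open
    docker logs --follow <container-id>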
Alternatively, you can use the --debug parameter to print each container's stdout/stderr messages to your shell. Note that Python buffers output by default, so you may see output written to the shell in bursts. If you want to see output displayed in real time for a particular step, specify a non-zero value for the PYTHONUNBUFFERED environment variable in your step's runtime specification. For example:
- kind: customScript
  spec:
    description: A step running a shell script
    name: step_1
    runtime:
      kind: runtime
      spec:
        conda:
          slug: myconda_p38_cpu_v1
          type: service
        scriptPathURI: my-script.sh
        env:
          PYTHONUNBUFFERED: 1
      type: script
Pipeline steps can run in parallel, so you may want each step to prefix its log output, making it easy to distinguish which lines come from which step.
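One simple approach, sketched here for a hypothetical shell script step, is to echo a step-specific prefix on every line:

    #!/bin/bash
    # my-script.sh for step_1: tag each log line with the step name
    echo "[step_1] starting work"
    echo "[step_1] work complete"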
When the --debug parameter is specified, the CLI also outputs pipeline orchestration messages. These include messages about which steps are being started and a summary of each step's result when the pipeline finishes execution.