Local Pipeline Execution#
Your pipeline can be executed locally to facilitate development and troubleshooting. Each pipeline step is executed in its own local container.
Your pipeline steps are subject to the same restrictions as local jobs.
They are also subject to these additional restrictions:
Pipeline steps must be of kind
Custom container images are not yet supported. You must use the development container image with a conda environment.
Configuring Local Pipeline Orchestrator#
Configure the local pipeline orchestrator with ads opctl configure. Refer to the local_backend.ini description in the configuration instructions.
max_parallel_containers controls how many pipeline steps may run in parallel on your machine. Your pipeline DAG may allow multiple steps to run in parallel, but your local machine may not have enough CPU cores or memory to run them all simultaneously.
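To illustrate how a parallelism cap interacts with a DAG, here is a minimal sketch. It is not how ADS schedules steps internally; the function name plan_batches and the batch-by-batch model are assumptions made for illustration. It uses a three-step DAG in which step_3 depends on step_1 and step_2.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_batches(dependencies, max_parallel):
    """Group steps into batches of at most max_parallel steps.

    dependencies maps each step name to the set of steps it depends on.
    This is a simplified, hypothetical model: a batch only starts after
    the previous batch has finished.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waiting = []   # steps whose dependencies are met but that have not run yet
    batches = []
    while sorter.is_active():
        # Sort for a deterministic order; get_ready() gives no ordering guarantee.
        waiting = sorted(waiting + list(sorter.get_ready()))
        batch, waiting = waiting[:max_parallel], waiting[max_parallel:]
        batches.append(batch)
        sorter.done(*batch)  # mark the batch as finished so dependents unlock
    return batches


dag = {"step_3": {"step_1", "step_2"}}
print(plan_batches(dag, 2))  # [['step_1', 'step_2'], ['step_3']]
print(plan_batches(dag, 1))  # [['step_1'], ['step_2'], ['step_3']]
```

With a cap of 2, the two independent steps share a batch; with a cap of 1, every step runs on its own even though the DAG would allow more parallelism.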
Running your Pipeline#
Local pipeline execution requires you to define your pipeline in a YAML file. Refer to the YAML examples here.
Then invoke the following command to run your pipeline:
ads opctl run --backend local --file my_pipeline.yaml --source-folder /path/to/my/pipeline/step/files
Parameter explanation:
- --backend local: Run the pipeline locally using Docker containers.
- --file my_pipeline.yaml: The YAML file defining your pipeline.
- --source-folder /path/to/my/pipeline/step/files: The local directory containing the files used by your pipeline steps. This directory is mounted into the container as a volume. Defaults to the current working directory if no value is provided.
Source folder and relative paths#
If your pipeline step runtimes are of type script or notebook, the paths in your YAML file must be relative to the directory given by --source-folder.
Pipeline steps using a runtime of type python can define their own working directory, which is mounted into the step’s container instead.
For example, suppose your yaml file looked like this:
kind: pipeline
spec:
  displayName: example
  dag:
  - (step_1, step_2) >> step_3
  stepDetails:
  - kind: customScript
    spec:
      description: A step running a notebook
      name: step_1
      runtime:
        kind: runtime
        spec:
          conda:
            slug: myconda_p38_cpu_v1
            type: service
          notebookEncoding: utf-8
          notebookPathURI: step_1_files/my-notebook.ipynb
        type: notebook
  - kind: customScript
    spec:
      description: A step running a shell script
      name: step_2
      runtime:
        kind: runtime
        spec:
          conda:
            slug: myconda_p38_cpu_v1
            type: service
          scriptPathURI: step_2_files/my-script.sh
        type: script
  - kind: customScript
    spec:
      description: A step running a python script
      name: step_3
      runtime:
        kind: runtime
        spec:
          conda:
            slug: myconda_p38_cpu_v1
            type: service
          workingDir: /step_3/custom/working/dir
          scriptPathURI: my-python.py
        type: python
type: pipeline
And suppose the pipeline is executed locally with the following command:
ads opctl run --backend local --file my_pipeline.yaml --source-folder /my/files
step_1 uses a notebook runtime. The container for step_1 will mount the /my/files directory into the container. The /my/files/step_1_files/my-notebook.ipynb notebook file will be converted into a python script and executed in the container.
step_2 uses a script runtime. The container for step_2 will mount the /my/files directory into the container. The /my/files/step_2_files/my-script.sh shell script will be executed in the container.
step_3 uses a python runtime. Instead of mounting the /my/files directory specified by --source-folder, the /step_3/custom/working/dir directory will be mounted into the container. The /step_3/custom/working/dir/my-python.py script will be executed in the container.
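The path rules in this example can be summarized in a short sketch. The function resolve_step_file and its arguments are hypothetical and not part of ADS; the sketch also assumes that a python runtime without a workingDir falls back to the source folder, which the text above does not state.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_step_file(
    runtime_type: str,
    path_uri: str,
    source_folder: str,
    working_dir: Optional[str] = None,
) -> str:
    """Return the host path of the file a step will execute.

    Hypothetical helper that mirrors the rules described above:
    notebook and script runtimes resolve paths against the source folder,
    while a python runtime with a workingDir resolves against that directory.
    """
    if runtime_type == "python" and working_dir:
        base = working_dir
    else:
        # Assumption: python steps without a workingDir use the source folder.
        base = source_folder
    return str(PurePosixPath(base) / path_uri)


print(resolve_step_file("notebook", "step_1_files/my-notebook.ipynb", "/my/files"))
# /my/files/step_1_files/my-notebook.ipynb
print(resolve_step_file("script", "step_2_files/my-script.sh", "/my/files"))
# /my/files/step_2_files/my-script.sh
print(resolve_step_file("python", "my-python.py", "/my/files", "/step_3/custom/working/dir"))
# /step_3/custom/working/dir/my-python.py
```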
Viewing container output and orchestration messages#
When a container is running, you can use the docker logs command to view its output. See https://docs.docker.com/engine/reference/commandline/logs/
Alternatively, you can use the --debug parameter to print each container’s stdout/stderr messages to your shell. Note that Python buffers output by default, so you may see output written to the shell in bursts. If you want to see output displayed in real time for a particular step, set the PYTHONUNBUFFERED environment variable to a non-empty value (such as 1) in your step’s runtime specification. For example:
- kind: customScript
  spec:
    description: A step running a shell script
    name: step_1
    runtime:
      kind: runtime
      spec:
        conda:
          slug: myconda_p38_cpu_v1
          type: service
        scriptPathURI: my-script.sh
        env:
          PYTHONUNBUFFERED: 1
      type: script
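If you prefer not to change the environment, a Python step can also flush its own output explicitly. The following is a minimal sketch; the emit helper is a hypothetical name, not an ADS function.

```python
import sys


def emit(message: str, stream=None) -> None:
    """Write one line and flush immediately so it is not held in a buffer."""
    target = stream if stream is not None else sys.stdout
    # flush=True forces the line out even when stdout is block-buffered.
    print(message, file=target, flush=True)


emit("step_1: starting")
```

This only affects lines written through the helper; output from libraries that write to stdout directly is still subject to buffering.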
Pipeline steps can run in parallel. You may want your pipeline steps to prefix their log output to easily distinguish which lines of output are coming from which step.
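One way to prefix output inside a Python step is the standard logging module. This is a minimal sketch; the helper name make_step_logger and the prefix format are assumptions chosen for illustration.

```python
import logging
import sys


def make_step_logger(step_name: str, stream=None) -> logging.Logger:
    """Return a logger that starts every line with the step name in brackets."""
    logger = logging.getLogger(step_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines from the root logger
    if not logger.handlers:   # do not attach a second handler on repeated calls
        handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
        handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


log = make_step_logger("step_1")
log.info("loading data")  # prints: [step_1] INFO loading data
```

Using the step name as the logger name keeps the prefix consistent with the name field in the pipeline YAML, which makes interleaved output from parallel steps easier to filter.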
When the --debug parameter is specified, the CLI also outputs pipeline orchestration messages. These include messages about which steps are being started and a summary of each step’s result when the pipeline finishes execution.