ads.jobs package
Submodules
ads.jobs.ads_job module
- class ads.jobs.ads_job.Job(name: Optional[str] = None, infrastructure=None, runtime=None)
Bases:
Builder
Represents a Job containing infrastructure and runtime.
Example
Here is an example of creating and running a job:

from ads.jobs import Job, DataScienceJob, ScriptRuntime

# Define an OCI Data Science job to run a Python script
job = (
    Job(name="<job_name>")
    .with_infrastructure(
        DataScienceJob()
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        .with_block_storage_size(50)
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
    )
    .with_runtime(
        ScriptRuntime()
        .with_source("oci://bucket_name@namespace/path/to/script.py")
        .with_service_conda("tensorflow26_p37_cpu_v2")
        .with_environment_variable(ENV="value")
        .with_argument("argument", key="value")
        .with_freeform_tag(tag_name="tag_value")
    )
)

# Create and run the job
run = job.create().run()
# Stream the job run outputs
run.watch()
If you are in an OCI notebook session and would like to reuse its infrastructure configuration, the configuration can be simplified. Here is another example of creating and running a Jupyter notebook as a job:

from ads.jobs import Job, DataScienceJob, NotebookRuntime

# Define an OCI Data Science job to run a Jupyter Python notebook
job = (
    Job(name="<job_name>")
    .with_infrastructure(
        # The same configurations as the OCI notebook session will be used.
        DataScienceJob()
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
    )
    .with_runtime(
        NotebookRuntime()
        .with_notebook("path/to/notebook.ipynb")
        .with_service_conda("tensorflow26_p37_cpu_v2")
        # Saves the notebook with outputs to OCI object storage.
        .with_output("oci://bucket_name@namespace/path/to/dir")
    )
).create()

# Run and monitor the job
run = job.run().watch()
# Download the notebook and outputs to a local directory
run.download(to_dir="path/to/local/dir/")
See also
https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/jobs/index.html
Initializes a job.
- The infrastructure and runtime can be configured when initializing the job, or by calling with_infrastructure() and with_runtime().
The infrastructure should be a subclass of ADS job Infrastructure, e.g., DataScienceJob, DataFlow. The runtime should be a subclass of ADS job Runtime, e.g., PythonRuntime, ScriptRuntime.
- Parameters
name (str, optional) – The name of the job, by default None. If it is None, a default name may be generated by the infrastructure, depending on the implementation. For an OCI Data Science job, the default name contains the job artifact name and a timestamp. If there is no artifact, a randomly generated, easy-to-remember name with a timestamp will be used, like ‘strange-spider-2022-08-17-23:55.02’.
infrastructure (Infrastructure, optional) – Job infrastructure, by default None
runtime (Runtime, optional) – Job runtime, by default None.
- create(**kwargs) Job
Creates the job on the infrastructure.
- Returns
The job instance (self)
- Return type
Job
- static dataflow_job(compartment_id: Optional[str] = None, **kwargs) List[Job]
List data flow jobs under a given compartment.
- Parameters
compartment_id (str) – compartment id
kwargs – additional keyword arguments
- Returns
list of Job instances
- Return type
List[Job]
- static datascience_job(compartment_id: Optional[str] = None, **kwargs) List[DataScienceJob]
Lists the existing data science jobs in the compartment.
- Parameters
compartment_id (str) – The compartment ID for listing the jobs. This is optional if running in an OCI notebook session. The jobs in the same compartment of the notebook session will be returned.
- Returns
A list of Job objects.
- Return type
list
- delete() None
Deletes the job from the infrastructure.
- download(to_dir: str, output_uri=None, **storage_options)
Downloads files from remote output URI to local.
- Parameters
to_dir (str) – Local directory to which the files will be downloaded to.
output_uri (str, optional) – The remote URI from which the files will be downloaded, by default None. If output_uri is not specified, this method will try to get the output_uri from the runtime.
storage_options – Extra keyword arguments for a particular storage connection. This method uses fsspec to download the files from the remote URI. storage_options will be passed into fsspec.open_files().
- Returns
The job instance (self)
- Return type
Job
- Raises
AttributeError – The output_uri is not specified and the runtime is not configured with output_uri.
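The fallback from the output_uri argument to the runtime configuration can be sketched in plain Python (resolve_output_uri is an illustrative helper, not part of the ADS API):

```python
def resolve_output_uri(output_uri, runtime):
    """Illustrative sketch of the fallback described above, not the
    actual ADS implementation."""
    if output_uri:
        return output_uri
    # Fall back to the output URI configured on the runtime, if any.
    uri = getattr(runtime, "output_uri", None)
    if not uri:
        raise AttributeError(
            "output_uri is not specified and the runtime is not "
            "configured with an output_uri."
        )
    return uri
```

Under this sketch, download(to_dir) would first resolve the URI, then use fsspec.open_files() with any storage_options to copy the files into to_dir.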
- static from_dataflow_job(job_id: str) Job
Create a Data Flow job given a job id.
- Parameters
job_id (str) – id of the job
- Returns
a Job instance
- Return type
Job
- static from_datascience_job(job_id) Job
Loads a data science job from OCI.
- Parameters
job_id (str) – OCID of an existing data science job.
- Returns
A job instance.
- Return type
Job
- classmethod from_dict(config: dict) Job
Initializes a job from a dictionary containing the configurations.
- Parameters
config (dict) – A dictionary containing the infrastructure and runtime specifications.
- Returns
A job instance
- Return type
Job
- Raises
NotImplementedError – If the type of the infrastructure or runtime is not supported.
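As a sketch, the configuration dictionary mirrors the YAML representation produced by to_dict(); the kind/type/spec layout below follows that convention, while the specific field names and values are illustrative placeholders:

```python
# Hypothetical configuration dictionary; the kind/type/spec layout follows
# the YAML convention used by ADS jobs, and the values are placeholders.
config = {
    "kind": "job",
    "spec": {
        "name": "<job_name>",
        "infrastructure": {
            "kind": "infrastructure",
            "type": "dataScienceJob",
            "spec": {
                "logGroupId": "<log_group_ocid>",
                "logId": "<log_ocid>",
            },
        },
        "runtime": {
            "kind": "runtime",
            "type": "script",
            "spec": {
                "scriptPathURI": "oci://bucket_name@namespace/path/to/script.py",
                "conda": {"type": "service", "slug": "tensorflow26_p37_cpu_v2"},
            },
        },
    },
}

# job = Job.from_dict(config)  # requires ads and OCI credentials
```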
- property id: str
The ID of the job. For jobs running on OCI, this is the OCID.
- Returns
ID of the job.
- Return type
str
- property infrastructure: Union[DataScienceJob, DataFlow]
The job infrastructure.
- Returns
Job infrastructure.
- Return type
Infrastructure
- property kind: str
The kind of the object as shown in YAML.
- Returns
“job”
- Return type
str
- property name: str
The name of the job. For jobs running on OCI, this is the display name.
- Returns
The name of the job.
- Return type
str
- run(name=None, args=None, env_var=None, freeform_tags=None, wait=False) Union[DataScienceJobRun, DataFlowRun]
Runs the job.
- Parameters
name (str, optional) – Name of the job run, by default None. The infrastructure handles the naming of the job run. For a Data Science job, if a name is not provided, a default name will be generated containing the job name and the timestamp of the run. If there is no artifact, a randomly generated, easy-to-remember name with a timestamp will be used, like ‘strange-spider-2022-08-17-23:55.02’.
args (str, optional) – Command line arguments for the job run, by default None. This will override the configurations on the job. If this is None, the args from the job configuration will be used.
env_var (dict, optional) – Additional environment variables for the job run, by default None
freeform_tags (dict, optional) – Freeform tags for the job run, by default None
wait (bool, optional) – Indicate if this method call should wait for the job run. By default False, this method returns as soon as the job run is created. If this is set to True, this method will stream the job logs and wait until it finishes, similar to job.run().watch().
- Returns
A job run instance, depending on the infrastructure.
- Return type
Job Run Instance
- run_list(**kwargs) list
Gets a list of runs of the job.
- Returns
A list of job run instances, the actual object type depends on the infrastructure.
- Return type
list
- property runtime: Runtime
The job runtime.
- Returns
The job runtime
- Return type
Runtime
- status() str
Status of the job
- Returns
Status of the job
- Return type
str
- to_dict() dict
Serialize the job specifications to a dictionary.
- Returns
A dictionary containing job specifications.
- Return type
dict
- with_infrastructure(infrastructure) Job
Sets the infrastructure for the job.
- Parameters
infrastructure (Infrastructure) – Job infrastructure.
- Returns
The job instance (self)
- Return type
Job
ads.jobs.builders.runtimes.python_runtime module
- class ads.jobs.builders.runtimes.python_runtime.CondaRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
Runtime
Represents a job runtime with conda pack
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_CONDA = 'conda'
- CONST_CONDA_REGION = 'region'
- CONST_CONDA_SLUG = 'slug'
- CONST_CONDA_TYPE = 'type'
- CONST_CONDA_TYPE_CUSTOM = 'published'
- CONST_CONDA_TYPE_SERVICE = 'service'
- CONST_CONDA_URI = 'uri'
- attribute_map = {'conda': 'conda', 'env': 'env', 'freeformTags': 'freeform_tags'}
- property conda: dict
The conda pack specification
- Returns
A dictionary with “type” and “slug” as keys.
- Return type
dict
- with_custom_conda(uri: str, region: Optional[str] = None)
Specifies the custom conda pack for running the job
- Parameters
uri (str) – The OCI object storage URI for the conda pack, e.g. “oci://your_bucket@namespace/object_name.” In the Environment Explorer of an OCI notebook session, this is shown as the “source” of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials:
* For API Key, config[“region”] is used.
* For Resource Principal, signer.region is used.
This is required if the conda pack is stored in a different region.
- Returns
The runtime instance.
- Return type
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters
slug (str) – The slug name of the service conda pack
- Returns
The runtime instance.
- Return type
self
- class ads.jobs.builders.runtimes.python_runtime.DataFlowNotebookRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
DataFlowRuntime
,NotebookRuntime
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- convert(overwrite=False)
- class ads.jobs.builders.runtimes.python_runtime.DataFlowRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_ARCHIVE_BUCKET = 'archiveBucket'
- CONST_ARCHIVE_URI = 'archiveUri'
- CONST_CONDA_AUTH_TYPE = 'condaAuthType'
- CONST_CONFIGURATION = 'configuration'
- CONST_SCRIPT_BUCKET = 'scriptBucket'
- CONST_SCRIPT_PATH = 'scriptPathURI'
- property archive_bucket: str
Bucket to save archive zip
- property archive_uri
The Uri of archive zip
- attribute_map = {'archiveUri': 'archive_uri', 'condaAuthType': 'conda_auth_type', 'configuration': 'configuration', 'env': 'env', 'freeformTags': 'freeform_tags', 'scriptBucket': 'script_bucket', 'scriptPathURI': 'script_path_uri'}
- property configuration: dict
Configuration for Spark
- convert(**kwargs)
- property script_bucket: str
Bucket to save script
- property script_uri: str
The URI of the source code
- with_archive_bucket(bucket) DataFlowRuntime
Set the object storage bucket to save the archive zip, in case the archive URI given is local.
- Parameters
bucket (str) – name of the bucket
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_archive_uri(uri: str) DataFlowRuntime
Set archive uri (which is a zip file containing dependencies).
- Parameters
uri (str) – uri to the archive zip
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_conda(conda_spec: Optional[dict] = None)
- with_configuration(config: dict) DataFlowRuntime
Set Configuration for Spark.
- Parameters
config (dict) – dictionary of configuration details https://spark.apache.org/docs/latest/configuration.html#available-properties. Example: { “spark.app.name” : “My App Name”, “spark.shuffle.io.maxRetries” : “4” }
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_custom_conda(uri: str, region: Optional[str] = None, auth_type: Optional[str] = None)
Specifies the custom conda pack for running the job
- Parameters
uri (str) – The OCI object storage URI for the conda pack, e.g. “oci://your_bucket@namespace/object_name.” In the Environment Explorer of an OCI notebook session, this is shown as the “source” of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials:
* For API Key, config[“region”] is used.
* For Resource Principal, signer.region is used.
This is required if the conda pack is stored in a different region.
auth_type (str, optional) – Defaults to “resource_principal”. One of “resource_principal”, “api_keys”, “instance_principal”, etc. The auth mechanism used to read the conda pack URI provided.
- Returns
The runtime instance.
- Return type
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_script_bucket(bucket) DataFlowRuntime
Set the object storage bucket to save the script, in case the script URI given is local.
- Parameters
bucket (str) – name of the bucket
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_script_uri(path) DataFlowRuntime
Set script uri.
- Parameters
path (str) – URI of the script.
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters
slug (str) – The slug name of the service conda pack
- Returns
The runtime instance.
- Return type
self
- class ads.jobs.builders.runtimes.python_runtime.GitPythonRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
,_PythonRuntimeMixin
Represents a job runtime with source code from git repository
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_BRANCH = 'branch'
- CONST_COMMIT = 'commit'
- CONST_GIT_SSH_SECRET_ID = 'gitSecretId'
- CONST_GIT_URL = 'url'
- CONST_SKIP_METADATA = 'skipMetadataUpdate'
- attribute_map = {'branch': 'branch', 'commit': 'commit', 'conda': 'conda', 'entryFunction': 'entry_function', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'gitSecretId': 'git_secret_id', 'outputDir': 'output_dir', 'outputUri': 'output_uri', 'pythonPath': 'python_path', 'skipMetadataUpdate': 'skip_metadata_update', 'url': 'url'}
- property branch: str
Git branch name.
- property commit: str
Git commit ID (SHA1 hash)
- property skip_metadata_update
Indicate if the metadata update should be skipped after the job run
By default, the job run metadata will be updated with the following freeform tags:
* repo: The URL of the Git repository
* commit: The Git commit ID
* module: The entry script/module
* method: The entry function/method
* outputs: The prefix of the output files in object storage.
This update step also requires resource principals to have the permission to update the job run.
- Returns
True if the metadata update will be skipped. Otherwise False.
- Return type
bool
- property ssh_secret_ocid
The OCID of the OCI Vault secret storing the Git SSH key.
- property url: str
URL of the Git repository.
- with_argument(*args, **kwargs)
Specifies the arguments for running the script/function.
When running a Python script, the arguments will be the command line arguments. For example, with_argument("arg1", "arg2", key1="val1", key2="val2") will generate the command line arguments: "arg1 arg2 --key1 val1 --key2 val2".
When running a function, the arguments will be passed into the function. Arguments can also be a list, dict or any JSON serializable object. For example, with_argument("arg1", "arg2", key1=["val1a", "val1b"], key2="val2") will be passed in as your_function("arg1", "arg2", key1=["val1a", "val1b"], key2="val2").
- Returns
The runtime instance.
- Return type
self
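The translation from with_argument() calls to command line arguments described above can be sketched in plain Python (build_cmd_args is an illustrative helper, not part of ADS):

```python
import json

def build_cmd_args(*args, **kwargs):
    """Illustrative sketch: render positional and keyword arguments as
    command line arguments, as described above."""
    parts = [str(a) for a in args]
    for key, value in kwargs.items():
        parts.append(f"--{key}")
        # Non-string values (list, dict, ...) need serialization;
        # JSON is used here for illustration.
        parts.append(value if isinstance(value, str) else json.dumps(value))
    return " ".join(parts)

print(build_cmd_args("arg1", "arg2", key1="val1", key2="val2"))
# arg1 arg2 --key1 val1 --key2 val2
```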
- with_source(url: str, branch: Optional[str] = None, commit: Optional[str] = None, secret_ocid: Optional[str] = None)
Specifies the Git repository and branch/commit for the job source code.
- Parameters
url (str) – URL of the Git repository.
branch (str, optional) – Git branch name, by default None, the default branch will be used.
commit (str, optional) – Git commit ID (SHA1 hash), by default None, the most recent commit will be used.
secret_ocid (str) – The secret OCID storing the SSH key content for checking out the Git repository.
- Returns
The runtime instance.
- Return type
self
- class ads.jobs.builders.runtimes.python_runtime.NotebookRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
Represents a job runtime with Jupyter notebook
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_EXCLUDE_TAG = 'excludeTags'
- CONST_NOTEBOOK_ENCODING = 'notebookEncoding'
- CONST_NOTEBOOK_PATH = 'notebookPathURI'
- CONST_OUTPUT_URI = 'outputURI'
- attribute_map = {'conda': 'conda', 'env': 'env', 'excludeTags': 'exclude_tags', 'freeformTags': 'freeform_tags', 'notebookEncoding': 'notebook_encoding', 'notebookPathURI': 'notebook_path_uri', 'outputURI': 'output_uri'}
- property exclude_tag: list
A list of cell tags indicating cells to be excluded from the job
- property notebook_encoding: str
The encoding of the notebook
- property notebook_uri: str
The URI of the notebook
- property output_uri: list
URI for storing the output notebook and files
- with_exclude_tag(*tags)
Specifies the cell tags in the notebook to exclude cells from the job script.
- Parameters
*tags (list) – A list of tags (strings).
- Returns
The runtime instance.
- Return type
self
- with_notebook(path: str, encoding='utf-8')
Specifies the notebook to be converted to python script and run as a job.
- Parameters
path (str) – The path of the Jupyter notebook
- Returns
The runtime instance.
- Return type
self
- with_output(output_uri: str)
Specifies the output URI for storing the output notebook and files.
- Parameters
output_uri (str) – URI for storing the output notebook and files. For example, oci://bucket@namespace/path/to/dir
- Returns
The runtime instance.
- Return type
self
- class ads.jobs.builders.runtimes.python_runtime.PythonRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
ScriptRuntime
,_PythonRuntimeMixin
Represents a job runtime using ADS driver script to run Python code
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_WORKING_DIR = 'workingDir'
- attribute_map = {'conda': 'conda', 'entryFunction': 'entry_function', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'outputDir': 'output_dir', 'outputUri': 'output_uri', 'pythonPath': 'python_path', 'scriptPathURI': 'script_path_uri', 'workingDir': 'working_dir'}
- with_working_dir(working_dir: str)
Specifies the working directory in the job run. By default, the working directory will be the directory containing the user code (the job artifact directory). This can be changed by specifying a path relative to the job artifact directory.
- Parameters
working_dir (str) – The path of the working directory. This can be a relative path from the job artifact directory.
- Returns
The runtime instance.
- Return type
self
- property working_dir: str
The working directory for the job run.
- class ads.jobs.builders.runtimes.python_runtime.ScriptRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
Represents job runtime with scripts and conda pack
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_ENTRYPOINT = 'entrypoint'
- CONST_SCRIPT_PATH = 'scriptPathURI'
- attribute_map = {'conda': 'conda', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'scriptPathURI': 'script_path_uri'}
- property entrypoint: str
The relative path of the script to be set as entrypoint when source is a zip/tar/directory.
- property script_uri: str
The URI of the source code
- property source_uri: str
The URI of the source code
- with_entrypoint(entrypoint: str)
Specify the entrypoint for the job
- Parameters
entrypoint (str) – The relative path of the script to be set as entrypoint when source is a zip/tar/directory.
- Returns
The runtime instance.
- Return type
self
- with_script(uri: str)
Specifies the source code script for the job
- with_source(uri: str, entrypoint: Optional[str] = None)
Specifies the source code for the job
- Parameters
uri (str) – URI of the source code, which can be a script (.py/.sh), a zip/tar file, or a directory containing the scripts/modules. If the source code is a single file, the URI can be any URI supported by fsspec, including http://, https:// and OCI object storage, for example oci://your_bucket@your_namespace/path/to/script.py. If the source code is a directory, only a local directory is supported.
entrypoint (str, optional) – The relative path of the script to be set as entrypoint when source is a zip/tar/directory. By default None. This is not needed when the source is a single script.
- Returns
The runtime instance.
- Return type
self
ads.jobs.builders.infrastructure.dataflow module
- class ads.jobs.builders.infrastructure.dataflow.DataFlow(spec: Optional[dict] = None, **kwargs)
Bases:
Infrastructure
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_BUCKET_URI = 'logs_bucket_uri'
- CONST_COMPARTMENT_ID = 'compartment_id'
- CONST_CONFIG = 'configuration'
- CONST_DRIVER_SHAPE = 'driver_shape'
- CONST_DRIVER_SHAPE_CONFIG = 'driver_shape_config'
- CONST_EXECUTE = 'execute'
- CONST_EXECUTOR_SHAPE = 'executor_shape'
- CONST_EXECUTOR_SHAPE_CONFIG = 'executor_shape_config'
- CONST_ID = 'id'
- CONST_LANGUAGE = 'language'
- CONST_MEMORY_IN_GBS = 'memory_in_gbs'
- CONST_METASTORE_ID = 'metastore_id'
- CONST_NUM_EXECUTORS = 'num_executors'
- CONST_OCPUS = 'ocpus'
- CONST_SPARK_VERSION = 'spark_version'
- CONST_WAREHOUSE_BUCKET_URI = 'warehouse_bucket_uri'
- attribute_map = {'compartment_id': 'compartmentId', 'configuration': 'configuration', 'driver_shape': 'driverShape', 'driver_shape_config': 'driverShapeConfig', 'execute': 'execute', 'executor_shape': 'executorShape', 'executor_shape_config': 'executorShapeConfig', 'id': 'id', 'logs_bucket_uri': 'logsBucketUri', 'memory_in_gbs': 'memoryInGBs', 'metastore_id': 'metastoreId', 'num_executors': 'numExecutors', 'ocpus': 'ocpus', 'spark_version': 'sparkVersion', 'warehouse_bucket_uri': 'warehouseBucketUri'}
- create(runtime: DataFlowRuntime, **kwargs) DataFlow
Create a Data Flow job given a runtime.
- Parameters
runtime – runtime to bind to the Data Flow job
kwargs – additional keyword arguments
- Returns
a Data Flow job instance
- Return type
DataFlow
- delete()
Delete a Data Flow job and cancel its associated runs.
- Return type
None
- classmethod from_dict(config: dict) DataFlow
Load a Data Flow job instance from a dictionary of configurations.
- Parameters
config (dict) – dictionary of configurations
- Returns
a Data Flow job instance
- Return type
DataFlow
- classmethod from_id(id: str) DataFlow
Load a Data Flow job given an id.
- Parameters
id (str) – id of the Data Flow job to load
- Returns
a Data Flow job instance
- Return type
DataFlow
- property job_id: Optional[str]
The OCID of the job
- classmethod list_jobs(compartment_id: Optional[str] = None, **kwargs) List[DataFlow]
List Data Flow jobs in a given compartment.
- Parameters
compartment_id (str) – id of that compartment
kwargs – additional keyword arguments for filtering jobs
- Returns
list of Data Flow jobs
- Return type
List[DataFlow]
- property name: str
Display name of the job
- run(name: Optional[str] = None, args: Optional[List[str]] = None, env_vars: Optional[Dict[str, str]] = None, freeform_tags: Optional[Dict[str, str]] = None, wait: bool = False, **kwargs) DataFlowRun
Run a Data Flow job.
- Parameters
name (str, optional) – Name of the run. If a name is not provided, a randomly generated, easy-to-remember name with a timestamp will be used, like ‘strange-spider-2022-08-17-23:55.02’.
args (List[str], optional) – list of command line arguments
env_vars (Dict[str, str], optional) – dictionary of environment variables (not used for data flow)
freeform_tags (Dict[str, str], optional) – freeform tags
wait (bool, optional) – whether to wait for a run to terminate
kwargs – additional keyword arguments
- Returns
a DataFlowRun instance
- Return type
DataFlowRun
- run_list(**kwargs) List[DataFlowRun]
List runs associated with a Data Flow job.
- Parameters
kwargs – additional arguments for filtering runs.
- Returns
list of DataFlowRun instances
- Return type
List[DataFlowRun]
- to_dict() dict
Serialize job to a dictionary.
- Returns
serialized job as a dictionary
- Return type
dict
- to_yaml() str
Serializes the object into YAML string.
- Returns
YAML stored in a string.
- Return type
str
- with_compartment_id(id: str) DataFlow
Set compartment id for a Data Flow job.
- Parameters
id (str) – compartment id
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_configuration(configs: dict) DataFlow
Set configuration for a Data Flow job.
- Parameters
configs (dict) – dictionary of configurations
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_driver_shape(shape: str) DataFlow
Set driver shape for a Data Flow job.
- Parameters
shape (str) – driver shape
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_driver_shape_config(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataFlow
Sets the driver shape config details of Data Flow job infrastructure. Specify only when a flex shape is selected. For example VM.Standard.E3.Flex allows the memory_in_gbs and cpu count to be specified.
- Parameters
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns
the Data Flow instance itself.
- Return type
DataFlow
- with_execute(exec: str) DataFlow
Set command for spark-submit.
- Parameters
exec (str) – str of commands
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_executor_shape(shape: str) DataFlow
Set executor shape for a Data Flow job.
- Parameters
shape (str) – executor shape
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_executor_shape_config(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataFlow
Sets the executor shape config details of Data Flow job infrastructure. Specify only when a flex shape is selected. For example VM.Standard.E3.Flex allows the memory_in_gbs and cpu count to be specified.
- Parameters
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns
the Data Flow instance itself.
- Return type
DataFlow
- with_id(id: str) DataFlow
Set id for a Data Flow job.
- Parameters
id (str) – id of a job
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_language(lang: str) DataFlow
Set language for a Data Flow job.
- Parameters
lang (str) – language for the job
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_logs_bucket_uri(uri: str) DataFlow
Set logs bucket uri for a Data Flow job.
- Parameters
uri (str) – uri to logs bucket
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_metastore_id(id: str) DataFlow
Set Hive metastore id for a Data Flow job.
- Parameters
id (str) – metastore id
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_num_executors(n: int) DataFlow
Set number of executors for a Data Flow job.
- Parameters
n (int) – number of executors
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_spark_version(ver: str) DataFlow
Set the Spark version for a Data Flow job. Currently supported versions are 2.4.4, 3.0.2 and 3.2.1. Documentation: https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#before_you_begin
- Parameters
ver (str) – spark version
- Returns
the Data Flow instance itself
- Return type
DataFlow
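Putting the builder methods and the attribute_map above together, a serialized Data Flow infrastructure specification (as produced by to_dict()/to_yaml()) might look like the following sketch; the kind/type labels and the nesting are assumptions inferred from the attribute map, and all values are placeholders:

```yaml
# Illustrative sketch of a serialized Data Flow infrastructure spec;
# key names follow the camelCase side of attribute_map, values are placeholders.
kind: infrastructure
type: dataFlow
spec:
  compartmentId: <compartment_ocid>
  driverShape: VM.Standard.E4.Flex
  executorShape: VM.Standard.E4.Flex
  numExecutors: 2
  sparkVersion: 3.2.1
  logsBucketUri: oci://<bucket_name>@<namespace>/
```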
- class ads.jobs.builders.infrastructure.dataflow.DataFlowApp(config: Optional[dict] = None, signer: Optional[Signer] = None, client_kwargs: Optional[dict] = None, **kwargs)
Bases:
OCIModelMixin
,Application
Initializes a service/resource with an OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed into the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters
config (dict, optional) – OCI API key config dictionary, by default None.
signer (oci.signer.Signer, optional) – OCI authentication signer, by default None.
client_kwargs (dict, optional) – Additional keyword arguments for initializing the OCI client.
- property client: DataFlowClient
OCI client
- create() DataFlowApp
Create a Data Flow application.
- Returns
a DataFlowApp instance
- Return type
DataFlowApp
- delete() None
Delete a Data Flow application.
- Return type
None
- classmethod init_client(**kwargs) DataFlowClient
Initializes the OCI client specified in the “client” keyword argument. Sub-classes should override this method and call cls._init_client(client=OCI_CLIENT).
- Parameters
**kwargs – Additional keyword arguments for initializing the OCI client.
- Return type
An instance of OCI client.
- to_yaml() str
Serializes the object into YAML string.
- Returns
YAML stored in a string.
- Return type
str
- class ads.jobs.builders.infrastructure.dataflow.DataFlowLogs(run_id)
Bases:
object
- property application
- property driver
- property executor
- class ads.jobs.builders.infrastructure.dataflow.DataFlowRun(config: Optional[dict] = None, signer: Optional[Signer] = None, client_kwargs: Optional[dict] = None, **kwargs)
Bases:
OCIModelMixin
,Run
,RunInstance
Initializes a service/resource with an OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed into the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters
config (dict, optional) – OCI API key config dictionary, by default None.
signer (oci.signer.Signer, optional) – OCI authentication signer, by default None.
client_kwargs (dict, optional) – Additional keyword arguments for initializing the OCI client.
- TERMINATED_STATES = ['CANCELED', 'FAILED', 'SUCCEEDED']
- property client: DataFlowClient
OCI client
- create() DataFlowRun
Create a Data Flow run.
- Returns
a DataFlowRun instance
- Return type
- delete() None
Cancel a Data Flow run if it is not yet terminated.
- Return type
None
- classmethod init_client(**kwargs) DataFlowClient
Initializes the OCI client specified in the “client” keyword argument. Subclasses should override this method and call cls._init_client(client=OCI_CLIENT).
- Parameters
**kwargs – Additional keyword arguments for initializing the OCI client.
- Return type
An instance of OCI client.
- property logs: DataFlowLogs
Shows the logs from a run. There are three types of logs: application, driver, and executor, each with stdout and stderr captured separately. To access each type of log:
>>> dfr.logs.application.stdout
>>> dfr.logs.driver.stderr
- Returns
an instance of DataFlowLogs
- Return type
- property run_details_link
Link to run details page in OCI console
- Returns
html display
- Return type
DisplayHandle
- property status: str
Show status (lifecycle state) of a run.
- Returns
status of the run
- Return type
str
- to_yaml() str
Serializes the object into YAML string.
- Returns
YAML stored in a string.
- Return type
str
- wait(interval: int = 3) DataFlowRun
Wait for a run to terminate.
- Parameters
interval (int, optional) – Time interval in seconds to wait before probing the run status again, by default 3.
- Returns
a DataFlowRun instance
- Return type
- watch(interval: int = 3) DataFlowRun
This is an alias of the wait() method. It waits for a run to terminate.
- Parameters
interval (int, optional) – Time interval in seconds to wait before probing the run status again, by default 3.
- Returns
a DataFlowRun instance
- Return type
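The wait()/watch() methods can be combined with the logs property to monitor a run to completion. A hedged sketch, where dfr is assumed to be an existing DataFlowRun instance (for example, the return value of create()):

```python
# Sketch: monitoring an existing DataFlowRun instance "dfr"
# (assumed to have been created already) until it terminates.
dfr = dfr.watch(interval=5)    # alias of wait(); polls every 5 seconds
print(dfr.status)              # one of CANCELED, FAILED, SUCCEEDED
print(dfr.logs.driver.stdout)  # driver stdout after completion
```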
ads.jobs.builders.infrastructure.dsc_job module
- class ads.jobs.builders.infrastructure.dsc_job.DSCJob(artifact: Optional[Union[str, Artifact]] = None, **kwargs)
Bases:
OCIDataScienceMixin
,Job
Represents an OCI Data Science Job. This class contains all attributes of the oci.data_science.models.Job model. The main purpose of this class is to link the oci.data_science.models.Job model and the related client methods, mainly linking the Job model (payload) to the Create/Update/Get/List/Delete methods.
A DSCJob can be initialized by unpacking the properties stored in a dictionary (payload):
job_properties = {
    "display_name": "my_job",
    "job_infrastructure_configuration_details": {"shape_name": "VM.MY_SHAPE"}
}
job = DSCJob(**job_properties)
The properties can also be an OCI REST API payload, in which the keys are in camel case format.
job_payload = {
    "projectId": "<project_ocid>",
    "compartmentId": "<compartment_ocid>",
    "displayName": "<job_name>",
    "jobConfigurationDetails": {
        "jobType": "DEFAULT",
        "commandLineArguments": "pos_arg1 pos_arg2 --key1 val1 --key2 val2",
        "environmentVariables": {
            "KEY1": "VALUE1",
            "KEY2": "VALUE2",
            # User specifies conda env via env var
            "CONDA_ENV_TYPE": "service",
            "CONDA_ENV_SLUG": "mlcpuv1"
        }
    },
    "jobInfrastructureConfigurationDetails": {
        "jobInfrastructureType": "STANDALONE",
        "shapeName": "VM.Standard.E3.Flex",
        "jobShapeConfigDetails": {
            "memoryInGBs": 16,
            "ocpus": 1
        },
        "blockStorageSizeInGBs": "100",
        "subnetId": "<subnet_ocid>"
    }
}
job = DSCJob(**job_payload)
Initialize a DSCJob object.
- Parameters
artifact (str or Artifact) – Job artifact, which can be a path or an Artifact object. Defaults to None.
kwargs – Same as kwargs in oci.data_science.models.Job. Keyword arguments are passed into OCI Job model to initialize the properties.
- DEFAULT_INFRA_TYPE = 'ME_STANDALONE'
- property artifact: Union[str, Artifact]
Job artifact.
- Returns
When creating a job, this can be a path or an Artifact object. When loading the job from OCI, this will be the filename of the job artifact.
- Return type
str or Artifact
- create() DSCJob
Creates the job on the OCI Data Science platform
- Returns
The DSCJob instance (self), which allows chaining additional methods.
- Return type
- delete() DSCJob
Deletes the job and the corresponding job runs.
- Returns
The DSCJob instance (self), which allows chaining additional methods.
- Return type
- download_artifact(artifact_path: str) DSCJob
Downloads the artifact from OCI
- Parameters
artifact_path (str) – Local path to store the job artifact.
- Returns
The DSCJob instance (self), which allows chaining additional methods.
- Return type
- classmethod from_ocid(ocid) DSCJob
Gets a job by OCID
- Parameters
ocid (str) – The OCID of the job.
- Returns
An instance of DSCJob.
- Return type
- load_properties_from_env() None
Loads default properties from the environment
- run(**kwargs) DataScienceJobRun
Runs the job
- Parameters
**kwargs – Keyword arguments for initializing a Data Science Job Run. The keys can be any keys supported by OCI JobConfigurationDetails and JobRun, including:
* hyperparameter_values: dict(str, str)
* environment_variables: dict(str, str)
* command_line_arguments: str
* maximum_runtime_in_minutes: int
* display_name: str
If display_name is not specified, it will be generated as "<JOB_NAME>-run-<TIMESTAMP>".
- Returns
An instance of DSCJobRun, which can be used to monitor the job run.
- Return type
DSCJobRun
- run_list(**kwargs) list[DataScienceJobRun]
Lists the runs of this job.
- Parameters
**kwargs – Keyword arguments to be passed into the OCI list_job_runs() for filtering the job runs.
- Returns
A list of DSCJobRun objects
- Return type
list
- upload_artifact(artifact_path: Optional[str] = None) DSCJob
Uploads the job artifact to OCI
- Parameters
artifact_path (str, optional) – Local path to the job artifact file to be uploaded, by default None. If artifact_path is None, the path in self.artifact will be used.
- Returns
The DSCJob instance (self), which allows chaining additional methods.
- Return type
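Taken together, the methods above cover the lifecycle of a DSCJob. A hedged sketch, assuming default authentication; the OCIDs are placeholders:

```python
from ads.jobs.builders.infrastructure.dsc_job import DSCJob

# Sketch: DSCJob lifecycle using the methods documented above.
# OCIDs are placeholders; default authentication is assumed.
job = DSCJob(
    display_name="my_job",
    project_id="<project_ocid>",
    compartment_id="<compartment_ocid>",
    job_infrastructure_configuration_details={"shape_name": "VM.MY_SHAPE"},
)
job.create()                          # create the job on OCI Data Science
run = job.run(display_name="my_run")  # start a job run
print([r.status for r in job.run_list()])
job.delete()                          # delete the job and its runs
```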
- ads.jobs.builders.infrastructure.dsc_job.DSCJobRun
alias of
DataScienceJobRun
- class ads.jobs.builders.infrastructure.dsc_job.DataScienceJob(spec: Optional[Dict] = None, **kwargs)
Bases:
Infrastructure
Represents the OCI Data Science Job infrastructure.
Initializes a data science job infrastructure
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_BLOCK_STORAGE = 'blockStorageSize'
- CONST_COMPARTMENT_ID = 'compartmentId'
- CONST_DISPLAY_NAME = 'displayName'
- CONST_JOB_INFRA = 'jobInfrastructureType'
- CONST_JOB_TYPE = 'jobType'
- CONST_LOG_GROUP_ID = 'logGroupId'
- CONST_LOG_ID = 'logId'
- CONST_MEMORY_IN_GBS = 'memoryInGBs'
- CONST_OCPUS = 'ocpus'
- CONST_PROJECT_ID = 'projectId'
- CONST_SHAPE_CONFIG_DETAILS = 'shapeConfigDetails'
- CONST_SHAPE_NAME = 'shapeName'
- CONST_SUBNET_ID = 'subnetId'
- attribute_map = {'blockStorageSize': 'block_storage_size', 'compartmentId': 'compartment_id', 'displayName': 'display_name', 'jobInfrastructureType': 'job_infrastructure_type', 'jobType': 'job_type', 'logGroupId': 'log_group_id', 'logId': 'log_id', 'projectId': 'project_id', 'shapeConfigDetails': 'shape_config_details', 'shapeName': 'shape_name', 'subnetId': 'subnet_id'}
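To illustrate what attribute_map encodes, the sketch below translates the camelCase keys of a specification into the snake_case names used by the OCI Python SDK models. The helper function is illustrative only, not the ADS internals:

```python
# The map below is copied from DataScienceJob.attribute_map.
attribute_map = {
    "blockStorageSize": "block_storage_size",
    "compartmentId": "compartment_id",
    "displayName": "display_name",
    "jobInfrastructureType": "job_infrastructure_type",
    "jobType": "job_type",
    "logGroupId": "log_group_id",
    "logId": "log_id",
    "projectId": "project_id",
    "shapeConfigDetails": "shape_config_details",
    "shapeName": "shape_name",
    "subnetId": "subnet_id",
}

def to_snake_case(spec: dict) -> dict:
    """Rename known camelCase keys; leave unknown keys untouched."""
    return {attribute_map.get(key, key): value for key, value in spec.items()}

spec = {"displayName": "my_job", "shapeName": "VM.Standard.E3.Flex"}
print(to_snake_case(spec))
# → {'display_name': 'my_job', 'shape_name': 'VM.Standard.E3.Flex'}
```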
- property block_storage_size: int
Block storage size for the job
- property compartment_id: Optional[str]
The compartment OCID
- create(runtime, **kwargs) DataScienceJob
Creates a job with runtime.
- Parameters
runtime (Runtime) – An ADS job runtime.
- Returns
The DataScienceJob instance (self)
- Return type
- delete() None
Deletes a job
- classmethod from_dsc_job(dsc_job: DSCJob) DataScienceJob
Initialize a DataScienceJob instance from a DSCJob
- Parameters
dsc_job (DSCJob) – An instance of DSCJob
- Returns
An instance of DataScienceJob
- Return type
- classmethod from_id(job_id: str) DataScienceJob
Gets an existing job using Job OCID
- Parameters
job_id (str) – Job OCID
- Returns
An instance of DataScienceJob
- Return type
- classmethod instance_shapes(compartment_id: Optional[str] = None) list
Lists the supported shapes for running jobs in a compartment.
- Parameters
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If not specified, the compartment ID of the notebook session will be used.
- Returns
A list of dictionaries containing the information of the supported shapes.
- Return type
list
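For instance, instance_shapes() can be used to check which shapes are available before configuring a job. A sketch; the compartment OCID is a placeholder, and the "name" key of each shape dictionary is an assumption:

```python
from ads.jobs import DataScienceJob

# Sketch: listing supported shapes in a compartment.
# The "name" key of each shape dictionary is an assumption.
shapes = DataScienceJob.instance_shapes(compartment_id="<compartment_ocid>")
flex_shapes = [s for s in shapes if "Flex" in s.get("name", "")]
print(flex_shapes)
```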
- property job_id: Optional[str]
The OCID of the job
- property job_infrastructure_type: Optional[str]
Job infrastructure type
- property job_type: Optional[str]
Job type
- classmethod list_jobs(compartment_id: Optional[str] = None, **kwargs) List[DataScienceJob]
Lists all jobs in a compartment.
- Parameters
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If not specified, the compartment ID of the notebook session will be used.
**kwargs – Keyword arguments to be passed into OCI list_jobs API for filtering the jobs.
- Returns
A list of DataScienceJob objects.
- Return type
List[DataScienceJob]
- property log_group_id: str
Log group OCID of the data science job
- Returns
Log group OCID
- Return type
str
- property log_id: str
Log OCID for the data science job.
- Returns
Log OCID
- Return type
str
- property name: str
Display name of the job
- payload_attribute_map = {'blockStorageSize': 'job_infrastructure_configuration_details.block_storage_size_in_gbs', 'compartmentId': 'compartment_id', 'displayName': 'display_name', 'jobInfrastructureType': 'job_infrastructure_configuration_details.job_infrastructure_type', 'jobType': 'job_configuration_details.job_type', 'logGroupId': 'job_log_configuration_details.log_group_id', 'logId': 'job_log_configuration_details.log_id', 'projectId': 'project_id', 'shapeConfigDetails': 'job_infrastructure_configuration_details.job_shape_config_details', 'shapeName': 'job_infrastructure_configuration_details.shape_name', 'subnetId': 'job_infrastructure_configuration_details.subnet_id'}
- property project_id: Optional[str]
Project OCID
- run(name=None, args=None, env_var=None, freeform_tags=None, wait=False) DataScienceJobRun
Runs the job on the OCI Data Science service.
- Parameters
name (str, optional) – The name of the job run, by default None.
args (str, optional) – Command line arguments for the job run, by default None.
env_var (dict, optional) – Environment variables for the job run, by default None.
freeform_tags (dict, optional) – Freeform tags for the job run, by default None.
wait (bool, optional) – Indicates whether this method should wait for the run to finish before returning, by default False.
- Returns
A Data Science Job Run instance.
- Return type
DSCJobRun
- run_list(**kwargs) List[DataScienceJobRun]
Gets a list of job runs.
- Parameters
**kwargs – Keyword arguments for filtering the job runs. These arguments will be passed to OCI API.
- Returns
A list of job runs.
- Return type
List[DSCJobRun]
- property shape_config_details: Dict
The details for the job run shape configuration.
- shape_config_details_attribute_map = {'memoryInGBs': 'memory_in_gbs', 'ocpus': 'ocpus'}
- property shape_name: Optional[str]
Shape name
- snake_to_camel_map = {'block_storage_size_in_gbs': 'blockStorageSize', 'compartment_id': 'compartmentId', 'display_name': 'displayName', 'job_infrastructure_type': 'jobInfrastructureType', 'job_shape_config_details': 'shapeConfigDetails', 'job_type': 'jobType', 'log_group_id': 'logGroupId', 'log_id': 'logId', 'project_id': 'projectId', 'shape_name': 'shapeName', 'subnet_id': 'subnetId'}
- static standardize_spec(spec)
- property status: Optional[str]
Status of the job.
- Returns
Status of the job.
- Return type
str
- property subnet_id: str
Subnet ID
- with_block_storage_size(size_in_gb: int) DataScienceJob
Sets the block storage size in GB
- Parameters
size_in_gb (int) – Block storage size in GB
- Returns
The DataScienceJob instance (self)
- Return type
- with_compartment_id(compartment_id: str) DataScienceJob
Sets the compartment OCID
- Parameters
compartment_id (str) – The compartment OCID
- Returns
The DataScienceJob instance (self)
- Return type
- with_job_infrastructure_type(infrastructure_type: str) DataScienceJob
Sets the job infrastructure type
- Parameters
infrastructure_type (str) – Job infrastructure type as string
- Returns
The DataScienceJob instance (self)
- Return type
- with_job_type(job_type: str) DataScienceJob
Sets the job type
- Parameters
job_type (str) – Job type as string
- Returns
The DataScienceJob instance (self)
- Return type
- with_log_group_id(log_group_id: str) DataScienceJob
Sets the log group OCID for the data science job. If log group ID is specified but log ID is not, a new log resource will be created automatically for each job run to store the logs.
- Parameters
log_group_id (str) – Log Group OCID
- Returns
The DataScienceJob instance (self)
- Return type
- with_log_id(log_id: str) DataScienceJob
Sets the log OCID for the data science job. If the log ID is specified, setting the log group ID (with_log_group_id()) is not strictly needed, as ADS will look up the log group ID automatically. However, this lookup may require additional permissions and may not be available for a newly created log group. Specifying both the log ID (with_log_id()) and the log group ID (with_log_group_id()) avoids the lookup and speeds up job creation.
- Parameters
log_id (str) – Log resource OCID.
- Returns
The DataScienceJob instance (self)
- Return type
- with_project_id(project_id: str) DataScienceJob
Sets the project OCID
- Parameters
project_id (str) – The project OCID
- Returns
The DataScienceJob instance (self)
- Return type
- with_shape_config_details(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataScienceJob
Sets the details for the job run shape configuration. Specify this only when a flexible shape is selected. For example, the VM.Standard.E3.Flex shape allows the memory size and OCPU count to be specified.
- Parameters
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns
The DataScienceJob instance (self)
- Return type
- with_shape_name(shape_name: str) DataScienceJob
Sets the shape name for running the job
- Parameters
shape_name (str) – Shape name
- Returns
The DataScienceJob instance (self)
- Return type
- with_subnet_id(subnet_id: str) DataScienceJob
Sets the subnet ID
- Parameters
subnet_id (str) – Subnet ID
- Returns
The DataScienceJob instance (self)
- Return type
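The with_* builder methods above can be chained into a complete infrastructure definition, mirroring the example at the top of this page; the OCIDs are placeholders:

```python
from ads.jobs import DataScienceJob

# Sketch: chaining the with_* builders into one infrastructure definition.
# OCIDs are placeholders.
infra = (
    DataScienceJob()
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_subnet_id("<subnet_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    .with_block_storage_size(50)
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
)
```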
- class ads.jobs.builders.infrastructure.dsc_job.DataScienceJobRun(config: Optional[dict] = None, signer: Optional[Signer] = None, client_kwargs: Optional[dict] = None, **kwargs)
Bases:
OCIDataScienceMixin
,JobRun
,RunInstance
Represents a Data Science Job run
Initializes a service/resource with the OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed into the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters
config (dict, optional) – OCI API key config dictionary, by default None.
signer (oci.signer.Signer, optional) – OCI authentication signer, by default None.
client_kwargs (dict, optional) – Additional keyword arguments for initializing the OCI client.
- TERMINAL_STATES = ['SUCCEEDED', 'FAILED', 'CANCELED', 'DELETED']
- cancel() DataScienceJobRun
Cancels a job run. This method will wait for the job run to be canceled before returning.
- Returns
The job run instance.
- Return type
self
- create() DataScienceJobRun
Creates a job run
- download(to_dir)
Downloads files from job run output URI to local.
- Parameters
to_dir (str) – Local directory to which the files will be downloaded.
- Returns
The job run instance (self)
- Return type
- property log_group_id: str
The log group ID from OCI logging service containing the logs from the job run.
- property log_id: str
The log ID from OCI logging service containing the logs from the job run.
- property logging: OCILog
The OCILog object containing the logs from the job run
- logs(limit: Optional[int] = None) list
Gets the logs of the job run.
- Parameters
limit (int, optional) – The maximum number of log records to return, by default None. When None, all logs are returned.
- Returns
A list of log records. Each log record is a dictionary with the following keys: id, time, message.
- Return type
list
- property status: str
Lifecycle status
- Returns
Status in a string.
- Return type
str
- to_yaml() str
Serializes the object into YAML string.
- Returns
YAML stored in a string.
- Return type
str
- watch(interval: float = 3) DataScienceJobRun
Watches the job run until it finishes. Before the job run starts, this method outputs the job run status. Once the job starts running, the logs are streamed until the job succeeds, fails, or is cancelled.
- Parameters
interval (float, optional) – Time interval in seconds between each request to update the logs, by default 3.
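A hedged sketch tying the monitoring methods above together, where run is assumed to be an existing DataScienceJobRun instance (for example, returned by DSCJob.run()):

```python
# Sketch: monitoring an existing job run "run" (assumed created already).
run.watch(interval=5)            # stream logs until a terminal state
print(run.status)                # lifecycle state, e.g. "SUCCEEDED"
for record in run.logs(limit=10):
    # each record is a dict with keys: id, time, message
    print(record["time"], record["message"])
run.download(to_dir="path/to/local/dir")  # copy output files locally
```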