ads.jobs package
Submodules
ads.jobs.ads_job module
- class ads.jobs.ads_job.Job(name: Optional[str] = None, infrastructure=None, runtime=None)
Bases:
Builder
Represents a Job containing infrastructure and runtime.
Example
Here is an example of creating and running a job:

from ads.jobs import Job, DataScienceJob, ScriptRuntime

# Define an OCI Data Science job to run a Python script
job = (
    Job(name="<job_name>")
    .with_infrastructure(
        DataScienceJob()
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        .with_block_storage_size(50)
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
    )
    .with_runtime(
        ScriptRuntime()
        .with_source("oci://bucket_name@namespace/path/to/script.py")
        .with_service_conda("tensorflow26_p37_cpu_v2")
        .with_environment_variable(ENV="value")
        .with_argument("argument", key="value")
        .with_freeform_tag(tag_name="tag_value")
    )
)

# Create and run the job
run = job.create().run()
# Stream the job run outputs
run.watch()
If you are in an OCI notebook session and would like to reuse its infrastructure configuration, the configuration can be simplified. Here is another example of creating and running a Jupyter notebook as a job:

from ads.jobs import Job, DataScienceJob, NotebookRuntime

# Define an OCI Data Science job to run a Jupyter Python notebook
job = (
    Job(name="<job_name>")
    .with_infrastructure(
        # The same configurations as the OCI notebook session will be used.
        DataScienceJob()
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
    )
    .with_runtime(
        NotebookRuntime()
        .with_notebook("path/to/notebook.ipynb")
        .with_service_conda("tensorflow26_p37_cpu_v2")
        # Saves the notebook with outputs to OCI object storage.
        .with_output("oci://bucket_name@namespace/path/to/dir")
    )
).create()

# Run and monitor the job
run = job.run().watch()
# Download the notebook and outputs to a local directory
run.download(to_dir="path/to/local/dir/")
See also
https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/jobs/index.html
Initializes a job.
- The infrastructure and runtime can be configured when initializing the job, or by calling with_infrastructure() and with_runtime().
The infrastructure should be a subclass of ADS job Infrastructure, e.g., DataScienceJob, DataFlow. The runtime should be a subclass of ADS job Runtime, e.g., PythonRuntime, ScriptRuntime.
- Parameters
name (str, optional) – The name of the job, by default None. If it is None, a default name may be generated by the infrastructure, depending on the implementation. For an OCI Data Science job, the default name contains the job artifact name and a timestamp. If there is no artifact, a randomly generated, easy-to-remember name with a timestamp will be used, like ‘strange-spider-2022-08-17-23:55.02’.
infrastructure (Infrastructure, optional) – Job infrastructure, by default None
runtime (Runtime, optional) – Job runtime, by default None.
- create(**kwargs) Job
Creates the job on the infrastructure.
- Returns
The job instance (self)
- Return type
Job
- static dataflow_job(compartment_id: Optional[str] = None, **kwargs) List[Job]
List data flow jobs under a given compartment.
- Parameters
compartment_id (str) – compartment id
kwargs – additional keyword arguments
- Returns
list of Job instances
- Return type
List[Job]
- static datascience_job(compartment_id: Optional[str] = None, **kwargs) List[DataScienceJob]
Lists the existing data science jobs in the compartment.
- Parameters
compartment_id (str) – The compartment ID for listing the jobs. This is optional if running in an OCI notebook session. The jobs in the same compartment of the notebook session will be returned.
- Returns
A list of Job objects.
- Return type
list
- delete() None
Deletes the job from the infrastructure.
- download(to_dir: str, output_uri=None, **storage_options)
Downloads files from remote output URI to local.
- Parameters
to_dir (str) – Local directory to which the files will be downloaded to.
output_uri (str, optional) – The remote URI from which the files will be downloaded, by default None. If output_uri is not specified, this method will try to get the output_uri from the runtime.
storage_options – Extra keyword arguments for a particular storage connection. This method uses fsspec to download the files from the remote URI. storage_options will be passed into fsspec.open_files().
- Returns
The job instance (self)
- Return type
Job
- Raises
AttributeError – The output_uri is not specified and the runtime is not configured with output_uri.
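The fallback from the output_uri argument to the runtime configuration can be sketched in plain Python (resolve_output_uri is an illustrative helper, not part of the ADS API):

```python
def resolve_output_uri(output_uri, runtime):
    """Illustrative sketch of the fallback described above, not the
    actual ADS implementation."""
    if output_uri:
        return output_uri
    # Fall back to the output URI configured on the runtime, if any.
    uri = getattr(runtime, "output_uri", None)
    if not uri:
        raise AttributeError(
            "output_uri is not specified and the runtime is not "
            "configured with an output_uri."
        )
    return uri
```

Under this sketch, download(to_dir) would first resolve the URI, then use fsspec.open_files() with any storage_options to copy the files into to_dir.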
- static from_dataflow_job(job_id: str) Job
Create a Data Flow job given a job id.
- Parameters
job_id (str) – id of the job
- Returns
a Job instance
- Return type
Job
- static from_datascience_job(job_id) Job
Loads a data science job from OCI.
- Parameters
job_id (str) – OCID of an existing data science job.
- Returns
A job instance.
- Return type
Job
- classmethod from_dict(config: dict) Job
Initializes a job from a dictionary containing the configurations.
- Parameters
config (dict) – A dictionary containing the infrastructure and runtime specifications.
- Returns
A job instance
- Return type
Job
- Raises
NotImplementedError – If the type of the infrastructure or runtime is not supported.
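As a sketch, the configuration dictionary mirrors the YAML representation produced by to_dict(); the kind/type/spec layout below follows that convention, while the specific field names and values are illustrative placeholders:

```python
# Hypothetical configuration dictionary; the kind/type/spec layout follows
# the YAML convention used by ADS jobs, and the values are placeholders.
config = {
    "kind": "job",
    "spec": {
        "name": "<job_name>",
        "infrastructure": {
            "kind": "infrastructure",
            "type": "dataScienceJob",
            "spec": {
                "logGroupId": "<log_group_ocid>",
                "logId": "<log_ocid>",
            },
        },
        "runtime": {
            "kind": "runtime",
            "type": "script",
            "spec": {
                "scriptPathURI": "oci://bucket_name@namespace/path/to/script.py",
                "conda": {"type": "service", "slug": "tensorflow26_p37_cpu_v2"},
            },
        },
    },
}

# job = Job.from_dict(config)  # requires ads and OCI credentials
```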
- property id: str
The ID of the job. For jobs running on OCI, this is the OCID.
- Returns
ID of the job.
- Return type
str
- property infrastructure: Union[DataScienceJob, DataFlow]
The job infrastructure.
- Returns
Job infrastructure.
- Return type
Infrastructure
- property kind: str
The kind of the object as shown in YAML.
- Returns
“job”
- Return type
str
- property name: str
The name of the job. For jobs running on OCI, this is the display name.
- Returns
The name of the job.
- Return type
str
- run(name=None, args=None, env_var=None, freeform_tags=None, wait=False) Union[DataScienceJobRun, DataFlowRun]
Runs the job.
- Parameters
name (str, optional) – Name of the job run, by default None. The infrastructure handles the naming of the job run. For a Data Science job, if a name is not provided, a default name will be generated containing the job name and the timestamp of the run. If there is no artifact, a randomly generated, easy-to-remember name with a timestamp will be used, like ‘strange-spider-2022-08-17-23:55.02’.
args (str, optional) – Command line arguments for the job run, by default None. This will override the configurations on the job. If this is None, the args from the job configuration will be used.
env_var (dict, optional) – Additional environment variables for the job run, by default None
freeform_tags (dict, optional) – Freeform tags for the job run, by default None
wait (bool, optional) – Indicate if this method call should wait for the job run. By default False, this method returns as soon as the job run is created. If this is set to True, this method will stream the job logs and wait until it finishes, similar to job.run().watch().
- Returns
A job run instance, depending on the infrastructure.
- Return type
Job Run Instance
- run_list(**kwargs) list
Gets a list of runs of the job.
- Returns
A list of job run instances, the actual object type depends on the infrastructure.
- Return type
list
- property runtime: Runtime
The job runtime.
- Returns
The job runtime
- Return type
Runtime
- status() str
Status of the job
- Returns
Status of the job
- Return type
str
- to_dict() dict
Serialize the job specifications to a dictionary.
- Returns
A dictionary containing job specifications.
- Return type
dict
- with_infrastructure(infrastructure) Job
Sets the infrastructure for the job.
- Parameters
infrastructure (Infrastructure) – Job infrastructure.
- Returns
The job instance (self)
- Return type
Job
ads.jobs.builders.runtimes.python_runtime module
- class ads.jobs.builders.runtimes.python_runtime.CondaRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
Runtime
Represents a job runtime with conda pack
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_CONDA = 'conda'
- CONST_CONDA_REGION = 'region'
- CONST_CONDA_SLUG = 'slug'
- CONST_CONDA_TYPE = 'type'
- CONST_CONDA_TYPE_CUSTOM = 'published'
- CONST_CONDA_TYPE_SERVICE = 'service'
- CONST_CONDA_URI = 'uri'
- attribute_map = {'conda': 'conda', 'env': 'env', 'freeformTags': 'freeform_tags'}
- property conda: dict
The conda pack specification
- Returns
A dictionary with “type” and “slug” as keys.
- Return type
dict
- with_custom_conda(uri: str, region: Optional[str] = None)
Specifies the custom conda pack for running the job
- Parameters
uri (str) – The OCI object storage URI for the conda pack, e.g. “oci://your_bucket@namespace/object_name.” In the Environment Explorer of an OCI notebook session, this is shown as the “source” of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials:
* For API Key, config[“region”] is used.
* For Resource Principal, signer.region is used.
This is required if the conda pack is stored in a different region.
- Returns
The runtime instance.
- Return type
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters
slug (str) – The slug name of the service conda pack
- Returns
The runtime instance.
- Return type
self
- class ads.jobs.builders.runtimes.python_runtime.DataFlowNotebookRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
DataFlowRuntime
,NotebookRuntime
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- convert(overwrite=False)
- class ads.jobs.builders.runtimes.python_runtime.DataFlowRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_ARCHIVE_BUCKET = 'archiveBucket'
- CONST_ARCHIVE_URI = 'archiveUri'
- CONST_CONDA_AUTH_TYPE = 'condaAuthType'
- CONST_CONFIGURATION = 'configuration'
- CONST_SCRIPT_BUCKET = 'scriptBucket'
- CONST_SCRIPT_PATH = 'scriptPathURI'
- property archive_bucket: str
Bucket to save archive zip
- property archive_uri
The Uri of archive zip
- attribute_map = {'archiveUri': 'archive_uri', 'condaAuthType': 'conda_auth_type', 'configuration': 'configuration', 'env': 'env', 'freeformTags': 'freeform_tags', 'scriptBucket': 'script_bucket', 'scriptPathURI': 'script_path_uri'}
- property configuration: dict
Configuration for Spark
- convert(**kwargs)
- property script_bucket: str
Bucket to save script
- property script_uri: str
The URI of the source code
- with_archive_bucket(bucket) DataFlowRuntime
Set the object storage bucket to save the archive zip, in case the archive URI given is local.
- Parameters
bucket (str) – name of the bucket
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_archive_uri(uri: str) DataFlowRuntime
Set archive uri (which is a zip file containing dependencies).
- Parameters
uri (str) – uri to the archive zip
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_conda(conda_spec: Optional[dict] = None)
- with_configuration(config: dict) DataFlowRuntime
Set Configuration for Spark.
- Parameters
config (dict) – dictionary of configuration details https://spark.apache.org/docs/latest/configuration.html#available-properties. Example: { “spark.app.name” : “My App Name”, “spark.shuffle.io.maxRetries” : “4” }
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_custom_conda(uri: str, region: Optional[str] = None, auth_type: Optional[str] = None)
Specifies the custom conda pack for running the job
- Parameters
uri (str) – The OCI object storage URI for the conda pack, e.g. “oci://your_bucket@namespace/object_name.” In the Environment Explorer of an OCI notebook session, this is shown as the “source” of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials:
* For API Key, config[“region”] is used.
* For Resource Principal, signer.region is used.
This is required if the conda pack is stored in a different region.
auth_type (str, optional) – Defaults to “resource_principal”. One of “resource_principal”, “api_keys”, “instance_principal”, etc. The auth mechanism used to read the conda pack URI provided.
- Returns
The runtime instance.
- Return type
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_script_bucket(bucket) DataFlowRuntime
Set the object storage bucket to save the script, in case the script URI given is local.
- Parameters
bucket (str) – name of the bucket
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_script_uri(path) DataFlowRuntime
Set script uri.
- Parameters
path (str) – URI of the script.
- Returns
runtime instance itself
- Return type
DataFlowRuntime
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters
slug (str) – The slug name of the service conda pack
- Returns
The runtime instance.
- Return type
self
- class ads.jobs.builders.runtimes.python_runtime.GitPythonRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
,_PythonRuntimeMixin
Represents a job runtime with source code from git repository
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_BRANCH = 'branch'
- CONST_COMMIT = 'commit'
- CONST_GIT_SSH_SECRET_ID = 'gitSecretId'
- CONST_GIT_URL = 'url'
- CONST_SKIP_METADATA = 'skipMetadataUpdate'
- attribute_map = {'branch': 'branch', 'commit': 'commit', 'conda': 'conda', 'entryFunction': 'entry_function', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'gitSecretId': 'git_secret_id', 'outputDir': 'output_dir', 'outputUri': 'output_uri', 'pythonPath': 'python_path', 'skipMetadataUpdate': 'skip_metadata_update', 'url': 'url'}
- property branch: str
Git branch name.
- property commit: str
Git commit ID (SHA1 hash)
- property skip_metadata_update
Indicate if the metadata update should be skipped after the job run
By default, the job run metadata will be updated with the following freeform tags:
* repo: The URL of the Git repository
* commit: The Git commit ID
* module: The entry script/module
* method: The entry function/method
* outputs: The prefix of the output files in object storage.
This update step also requires resource principals to have the permission to update the job run.
- Returns
True if the metadata update will be skipped. Otherwise False.
- Return type
bool
- property ssh_secret_ocid
The OCID of the OCI Vault secret storing the Git SSH key.
- property url: str
URL of the Git repository.
- with_argument(*args, **kwargs)
Specifies the arguments for running the script/function.
When running a Python script, the arguments will be the command line arguments. For example, with_argument("arg1", "arg2", key1="val1", key2="val2") will generate the command line arguments: "arg1 arg2 --key1 val1 --key2 val2".
When running a function, the arguments will be passed into the function. Arguments can also be a list, dict or any JSON serializable object. For example, with_argument("arg1", "arg2", key1=["val1a", "val1b"], key2="val2") will be passed in as your_function("arg1", "arg2", key1=["val1a", "val1b"], key2="val2").
- Returns
The runtime instance.
- Return type
self
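The translation from with_argument() calls to command line arguments described above can be sketched in plain Python (build_cmd_args is an illustrative helper, not part of ADS):

```python
import json

def build_cmd_args(*args, **kwargs):
    """Illustrative sketch: render positional and keyword arguments as
    command line arguments, as described above."""
    parts = [str(a) for a in args]
    for key, value in kwargs.items():
        parts.append(f"--{key}")
        # Non-string values (list, dict, ...) need serialization;
        # JSON is used here for illustration.
        parts.append(value if isinstance(value, str) else json.dumps(value))
    return " ".join(parts)

print(build_cmd_args("arg1", "arg2", key1="val1", key2="val2"))
# arg1 arg2 --key1 val1 --key2 val2
```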
- with_source(url: str, branch: Optional[str] = None, commit: Optional[str] = None, secret_ocid: Optional[str] = None)
Specifies the Git repository and branch/commit for the job source code.
- Parameters
url (str) – URL of the Git repository.
branch (str, optional) – Git branch name, by default None, the default branch will be used.
commit (str, optional) – Git commit ID (SHA1 hash), by default None, the most recent commit will be used.
secret_ocid (str) – The secret OCID storing the SSH key content for checking out the Git repository.
- Returns
The runtime instance.
- Return type
self
- class ads.jobs.builders.runtimes.python_runtime.NotebookRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
Represents a job runtime with Jupyter notebook
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_EXCLUDE_TAG = 'excludeTags'
- CONST_NOTEBOOK_ENCODING = 'notebookEncoding'
- CONST_NOTEBOOK_PATH = 'notebookPathURI'
- CONST_OUTPUT_URI = 'outputURI'
- attribute_map = {'conda': 'conda', 'env': 'env', 'excludeTags': 'exclude_tags', 'freeformTags': 'freeform_tags', 'notebookEncoding': 'notebook_encoding', 'notebookPathURI': 'notebook_path_uri', 'outputURI': 'output_uri'}
- property exclude_tag: list
A list of cell tags indicating cells to be excluded from the job
- property notebook_encoding: str
The encoding of the notebook
- property notebook_uri: str
The URI of the notebook
- property output_uri: list
URI for storing the output notebook and files
- with_exclude_tag(*tags)
Specifies the cell tags in the notebook to exclude cells from the job script.
- Parameters
*tags (list) – A list of tags (strings).
- Returns
The runtime instance.
- Return type
self
- with_notebook(path: str, encoding='utf-8')
Specifies the notebook to be converted to python script and run as a job.
- Parameters
path (str) – The path of the Jupyter notebook
- Returns
The runtime instance.
- Return type
self
- with_output(output_uri: str)
Specifies the output URI for storing the output notebook and files.
- Parameters
output_uri (str) – URI for storing the output notebook and files. For example, oci://bucket@namespace/path/to/dir
- Returns
The runtime instance.
- Return type
self
- class ads.jobs.builders.runtimes.python_runtime.PythonRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
ScriptRuntime
,_PythonRuntimeMixin
Represents a job runtime using ADS driver script to run Python code
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_WORKING_DIR = 'workingDir'
- attribute_map = {'conda': 'conda', 'entryFunction': 'entry_function', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'outputDir': 'output_dir', 'outputUri': 'output_uri', 'pythonPath': 'python_path', 'scriptPathURI': 'script_path_uri', 'workingDir': 'working_dir'}
- with_working_dir(working_dir: str)
Specifies the working directory in the job run. By default, the working directory will be the directory containing the user code (the job artifact directory). This can be changed by specifying a path relative to the job artifact directory.
- Parameters
working_dir (str) – The path of the working directory. This can be a relative path from the job artifact directory.
- Returns
The runtime instance.
- Return type
self
- property working_dir: str
The working directory for the job run.
- class ads.jobs.builders.runtimes.python_runtime.ScriptRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
Represents job runtime with scripts and conda pack
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_ENTRYPOINT = 'entrypoint'
- CONST_SCRIPT_PATH = 'scriptPathURI'
- attribute_map = {'conda': 'conda', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'scriptPathURI': 'script_path_uri'}
- property entrypoint: str
The relative path of the script to be set as entrypoint when source is a zip/tar/directory.
- property script_uri: str
The URI of the source code
- property source_uri: str
The URI of the source code
- with_entrypoint(entrypoint: str)
Specify the entrypoint for the job
- Parameters
entrypoint (str) – The relative path of the script to be set as entrypoint when source is a zip/tar/directory.
- Returns
The runtime instance.
- Return type
self
- with_script(uri: str)
Specifies the source code script for the job
- with_source(uri: str, entrypoint: Optional[str] = None)
Specifies the source code for the job
- Parameters
uri (str) – URI of the source code, which can be a script (.py/.sh), a zip/tar file, or a directory containing the scripts/modules. If the source code is a single file, the URI can be any URI supported by fsspec, including http://, https:// and OCI object storage, for example oci://your_bucket@your_namespace/path/to/script.py. If the source code is a directory, only a local directory is supported.
entrypoint (str, optional) – The relative path of the script to be set as entrypoint when source is a zip/tar/directory. By default None. This is not needed when the source is a single script.
- Returns
The runtime instance.
- Return type
self
ads.jobs.builders.infrastructure.dataflow module
- class ads.jobs.builders.infrastructure.dataflow.DataFlow(spec: Optional[dict] = None, **kwargs)
Bases:
Infrastructure
Initialize the object with specifications.
User can either pass in the specification as a dictionary or through keyword arguments.
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_BUCKET_URI = 'logs_bucket_uri'
- CONST_COMPARTMENT_ID = 'compartment_id'
- CONST_CONFIG = 'configuration'
- CONST_DRIVER_SHAPE = 'driver_shape'
- CONST_DRIVER_SHAPE_CONFIG = 'driver_shape_config'
- CONST_EXECUTE = 'execute'
- CONST_EXECUTOR_SHAPE = 'executor_shape'
- CONST_EXECUTOR_SHAPE_CONFIG = 'executor_shape_config'
- CONST_ID = 'id'
- CONST_LANGUAGE = 'language'
- CONST_MEMORY_IN_GBS = 'memory_in_gbs'
- CONST_METASTORE_ID = 'metastore_id'
- CONST_NUM_EXECUTORS = 'num_executors'
- CONST_OCPUS = 'ocpus'
- CONST_SPARK_VERSION = 'spark_version'
- CONST_WAREHOUSE_BUCKET_URI = 'warehouse_bucket_uri'
- attribute_map = {'compartment_id': 'compartmentId', 'configuration': 'configuration', 'driver_shape': 'driverShape', 'driver_shape_config': 'driverShapeConfig', 'execute': 'execute', 'executor_shape': 'executorShape', 'executor_shape_config': 'executorShapeConfig', 'id': 'id', 'logs_bucket_uri': 'logsBucketUri', 'memory_in_gbs': 'memoryInGBs', 'metastore_id': 'metastoreId', 'num_executors': 'numExecutors', 'ocpus': 'ocpus', 'spark_version': 'sparkVersion', 'warehouse_bucket_uri': 'warehouseBucketUri'}
- create(runtime: DataFlowRuntime, **kwargs) DataFlow
Create a Data Flow job given a runtime.
- Parameters
runtime – runtime to bind to the Data Flow job
kwargs – additional keyword arguments
- Returns
a Data Flow job instance
- Return type
DataFlow
- delete()
Delete a Data Flow job and cancel its associated runs.
- Return type
None
- classmethod from_dict(config: dict) DataFlow
Load a Data Flow job instance from a dictionary of configurations.
- Parameters
config (dict) – dictionary of configurations
- Returns
a Data Flow job instance
- Return type
DataFlow
- classmethod from_id(id: str) DataFlow
Load a Data Flow job given an id.
- Parameters
id (str) – id of the Data Flow job to load
- Returns
a Data Flow job instance
- Return type
DataFlow
- property job_id: Optional[str]
The OCID of the job
- classmethod list_jobs(compartment_id: Optional[str] = None, **kwargs) List[DataFlow]
List Data Flow jobs in a given compartment.
- Parameters
compartment_id (str) – id of that compartment
kwargs – additional keyword arguments for filtering jobs
- Returns
list of Data Flow jobs
- Return type
List[DataFlow]
- property name: str
Display name of the job
- run(name: Optional[str] = None, args: Optional[List[str]] = None, env_vars: Optional[Dict[str, str]] = None, freeform_tags: Optional[Dict[str, str]] = None, wait: bool = False, **kwargs) DataFlowRun
Run a Data Flow job.
- Parameters
name (str, optional) – Name of the run. If a name is not provided, a randomly generated, easy-to-remember name with a timestamp will be used, like ‘strange-spider-2022-08-17-23:55.02’.
args (List[str], optional) – list of command line arguments
env_vars (Dict[str, str], optional) – dictionary of environment variables (not used for data flow)
freeform_tags (Dict[str, str], optional) – freeform tags
wait (bool, optional) – whether to wait for a run to terminate
kwargs – additional keyword arguments
- Returns
a DataFlowRun instance
- Return type
DataFlowRun
- run_list(**kwargs) List[DataFlowRun]
List runs associated with a Data Flow job.
- Parameters
kwargs – additional arguments for filtering runs.
- Returns
list of DataFlowRun instances
- Return type
List[DataFlowRun]
- to_dict() dict
Serialize job to a dictionary.
- Returns
serialized job as a dictionary
- Return type
dict
- to_yaml() str
Serializes the object into YAML string.
- Returns
YAML stored in a string.
- Return type
str
- with_compartment_id(id: str) DataFlow
Set compartment id for a Data Flow job.
- Parameters
id (str) – compartment id
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_configuration(configs: dict) DataFlow
Set configuration for a Data Flow job.
- Parameters
configs (dict) – dictionary of configurations
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_driver_shape(shape: str) DataFlow
Set driver shape for a Data Flow job.
- Parameters
shape (str) – driver shape
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_driver_shape_config(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataFlow
Sets the driver shape config details of Data Flow job infrastructure. Specify only when a flex shape is selected. For example VM.Standard.E3.Flex allows the memory_in_gbs and cpu count to be specified.
- Parameters
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns
the Data Flow instance itself.
- Return type
DataFlow
- with_execute(exec: str) DataFlow
Set command for spark-submit.
- Parameters
exec (str) – str of commands
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_executor_shape(shape: str) DataFlow
Set executor shape for a Data Flow job.
- Parameters
shape (str) – executor shape
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_executor_shape_config(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataFlow
Sets the executor shape config details of Data Flow job infrastructure. Specify only when a flex shape is selected. For example VM.Standard.E3.Flex allows the memory_in_gbs and cpu count to be specified.
- Parameters
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns
the Data Flow instance itself.
- Return type
DataFlow
- with_id(id: str) DataFlow
Set id for a Data Flow job.
- Parameters
id (str) – id of a job
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_language(lang: str) DataFlow
Set language for a Data Flow job.
- Parameters
lang (str) – language for the job
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_logs_bucket_uri(uri: str) DataFlow
Set logs bucket uri for a Data Flow job.
- Parameters
uri (str) – uri to logs bucket
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_metastore_id(id: str) DataFlow
Set Hive metastore id for a Data Flow job.
- Parameters
id (str) – metastore id
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_num_executors(n: int) DataFlow
Set number of executors for a Data Flow job.
- Parameters
n (int) – number of executors
- Returns
the Data Flow instance itself
- Return type
DataFlow
- with_spark_version(ver: str) DataFlow
Set the Spark version for a Data Flow job. Currently supported versions are 2.4.4, 3.0.2 and 3.2.1. Documentation: https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#before_you_begin
- Parameters
ver (str) – spark version
- Returns
the Data Flow instance itself
- Return type
DataFlow
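Putting the builder methods and the attribute_map above together, a serialized Data Flow infrastructure specification (as produced by to_dict()/to_yaml()) might look like the following sketch; the kind/type labels and the nesting are assumptions inferred from the attribute map, and all values are placeholders:

```yaml
# Illustrative sketch of a serialized Data Flow infrastructure spec;
# key names follow the camelCase side of attribute_map, values are placeholders.
kind: infrastructure
type: dataFlow
spec:
  compartmentId: <compartment_ocid>
  driverShape: VM.Standard.E4.Flex
  executorShape: VM.Standard.E4.Flex
  numExecutors: 2
  sparkVersion: 3.2.1
  logsBucketUri: oci://<bucket_name>@<namespace>/
```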
- class ads.jobs.builders.infrastructure.dataflow.DataFlowApp(config: Optional[dict] = None, signer: Optional[Signer] = None, client_kwargs: Optional[dict] = None, **kwargs)
Bases:
OCIModelMixin
,Application
Initializes a service/resource with an OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed into the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters
config (dict, optional) – OCI API key config dictionary, by default None.
signer (oci.signer.Signer, optional) – OCI authentication signer, by default None.
client_kwargs (dict, optional) – Additional keyword arguments for initializing the OCI client.
- property client: DataFlowClient
OCI client
- create() DataFlowApp
Create a Data Flow application.
- Returns
a DataFlowApp instance
- Return type
DataFlowApp
- delete() None
Delete a Data Flow application.
- Return type
None
- classmethod init_client(**kwargs) DataFlowClient
Initializes the OCI client specified in the “client” keyword argument. Sub-classes should override this method and call cls._init_client(client=OCI_CLIENT).
- Parameters
**kwargs – Additional keyword arguments for initializing the OCI client.
- Return type
An instance of OCI client.
- to_yaml() str
Serializes the object into YAML string.
- Returns
YAML stored in a string.
- Return type
str
- class ads.jobs.builders.infrastructure.dataflow.DataFlowLogs(run_id)
Bases:
object
- property application
- property driver
- property executor
- class ads.jobs.builders.infrastructure.dataflow.DataFlowRun(config: Optional[dict] = None, signer: Optional[Signer] = None, client_kwargs: Optional[dict] = None, **kwargs)
Bases:
OCIModelMixin
,Run
,RunInstance
Initializes a service/resource with an OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed into the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters
config (dict, optional) – OCI API key config dictionary, by default None.
signer (oci.signer.Signer, optional) – OCI authentication signer, by default None.
client_kwargs (dict, optional) – Additional keyword arguments for initializing the OCI client.
- TERMINATED_STATES = ['CANCELED', 'FAILED', 'SUCCEEDED']
- property client: DataFlowClient
OCI client
- create() DataFlowRun
Create a Data Flow run.
- Returns
a DataFlowRun instance
- Return type
- delete() None
Cancel a Data Flow run if it is not yet terminated.
- Return type
None
- classmethod init_client(**kwargs) DataFlowClient
Initializes the OCI client specified in the “client” keyword argument. Subclasses should override this method and call cls._init_client(client=OCI_CLIENT).
- Parameters
**kwargs – Additional keyword arguments for initializing the OCI client.
- Return type
An instance of OCI client.
- property logs: DataFlowLogs
Shows the logs from a run. There are three types of logs: application, driver, and executor, each with stdout and stderr captured separately. To access each type of log:
>>> dfr.logs.application.stdout
>>> dfr.logs.driver.stderr
- Returns
an instance of DataFlowLogs
- Return type
- property run_details_link
Link to run details page in OCI console
- Returns
html display
- Return type
DisplayHandle
- property status: str
Show status (lifecycle state) of a run.
- Returns
status of the run
- Return type
str
- to_yaml() str
Serializes the object into YAML string.
- Returns
YAML stored in a string.
- Return type
str
- wait(interval: int = 3) DataFlowRun
Wait for a run to terminate.
- Parameters
interval (int, optional) – Time interval in seconds to wait before probing the run status again, by default 3.
- Returns
a DataFlowRun instance
- Return type
- watch(interval: int = 3) DataFlowRun
This is an alias of the wait() method. It waits for a run to terminate.
- Parameters
interval (int, optional) – Time interval in seconds to wait before probing the run status again, by default 3.
- Returns
a DataFlowRun instance
- Return type
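The wait()/watch() methods can be combined with the logs property to monitor a run to completion. A hedged sketch, where dfr is assumed to be an existing DataFlowRun instance (for example, the return value of create()):

```python
# Sketch: monitoring an existing DataFlowRun instance "dfr"
# (assumed to have been created already) until it terminates.
dfr = dfr.watch(interval=5)    # alias of wait(); polls every 5 seconds
print(dfr.status)              # one of CANCELED, FAILED, SUCCEEDED
print(dfr.logs.driver.stdout)  # driver stdout after completion
```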
ads.jobs.builders.infrastructure.dsc_job module
- class ads.jobs.builders.infrastructure.dsc_job.DSCJob(artifact: Optional[Union[str, Artifact]] = None, **kwargs)
Bases:
OCIDataScienceMixin
,Job
Represents an OCI Data Science Job. This class contains all attributes of the oci.data_science.models.Job model. The main purpose of this class is to link the oci.data_science.models.Job model and the related client methods, mainly linking the Job model (payload) to the Create/Update/Get/List/Delete methods.
A DSCJob can be initialized by unpacking the properties stored in a dictionary (payload):
job_properties = {
    "display_name": "my_job",
    "job_infrastructure_configuration_details": {"shape_name": "VM.MY_SHAPE"}
}
job = DSCJob(**job_properties)
The properties can also be an OCI REST API payload, in which the keys are in camel case format.
job_payload = {
    "projectId": "<project_ocid>",
    "compartmentId": "<compartment_ocid>",
    "displayName": "<job_name>",
    "jobConfigurationDetails": {
        "jobType": "DEFAULT",
        "commandLineArguments": "pos_arg1 pos_arg2 --key1 val1 --key2 val2",
        "environmentVariables": {
            "KEY1": "VALUE1",
            "KEY2": "VALUE2",
            # User specifies conda env via env var
            "CONDA_ENV_TYPE": "service",
            "CONDA_ENV_SLUG": "mlcpuv1"
        }
    },
    "jobInfrastructureConfigurationDetails": {
        "jobInfrastructureType": "STANDALONE",
        "shapeName": "VM.Standard.E3.Flex",
        "jobShapeConfigDetails": {
            "memoryInGBs": 16,
            "ocpus": 1
        },
        "blockStorageSizeInGBs": "100",
        "subnetId": "<subnet_ocid>"
    }
}
job = DSCJob(**job_payload)
Initialize a DSCJob object.
- Parameters
artifact (str or Artifact) – Job artifact, which can be a path or an Artifact object. Defaults to None.
kwargs – Same as kwargs in oci.data_science.models.Job. Keyword arguments are passed into OCI Job model to initialize the properties.
- DEFAULT_INFRA_TYPE = 'ME_STANDALONE'
- property artifact: Union[str, Artifact]
Job artifact.
- Returns
When creating a job, this can be a path or an Artifact object. When loading the job from OCI, this will be the filename of the job artifact.
- Return type
str or Artifact
- create() DSCJob
Creates the job on the OCI Data Science platform
- Returns
The DSCJob instance (self), which allows chaining additional methods.
- Return type
- delete() DSCJob
Deletes the job and the corresponding job runs.
- Returns
The DSCJob instance (self), which allows chaining additional methods.
- Return type
- download_artifact(artifact_path: str) DSCJob
Downloads the artifact from OCI
- Parameters
artifact_path (str) – Local path to store the job artifact.
- Returns
The DSCJob instance (self), which allows chaining additional methods.
- Return type
- classmethod from_ocid(ocid) DSCJob
Gets a job by OCID
- Parameters
ocid (str) – The OCID of the job.
- Returns
An instance of DSCJob.
- Return type
- load_properties_from_env() None
Loads default properties from the environment
- run(**kwargs) DataScienceJobRun
Runs the job
- Parameters
**kwargs – Keyword arguments for initializing a Data Science Job Run. The keys can be any keys supported by OCI JobConfigurationDetails and JobRun, including:
* hyperparameter_values: dict(str, str)
* environment_variables: dict(str, str)
* command_line_arguments: str
* maximum_runtime_in_minutes: int
* display_name: str
If display_name is not specified, it will be generated as "<JOB_NAME>-run-<TIMESTAMP>".
- Returns
An instance of DSCJobRun, which can be used to monitor the job run.
- Return type
DSCJobRun
- run_list(**kwargs) list[DataScienceJobRun]
Lists the runs of this job.
- Parameters
**kwargs – Keyword arguments to be passed into the OCI list_job_runs() for filtering the job runs.
- Returns
A list of DSCJobRun objects
- Return type
list
- upload_artifact(artifact_path: Optional[str] = None) DSCJob
Uploads the job artifact to OCI
- Parameters
artifact_path (str, optional) – Local path to the job artifact file to be uploaded, by default None. If artifact_path is None, the path in self.artifact will be used.
- Returns
The DSCJob instance (self), which allows chaining additional methods.
- Return type
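Taken together, the methods above cover the lifecycle of a DSCJob. A hedged sketch, assuming default authentication; the OCIDs are placeholders:

```python
from ads.jobs.builders.infrastructure.dsc_job import DSCJob

# Sketch: DSCJob lifecycle using the methods documented above.
# OCIDs are placeholders; default authentication is assumed.
job = DSCJob(
    display_name="my_job",
    project_id="<project_ocid>",
    compartment_id="<compartment_ocid>",
    job_infrastructure_configuration_details={"shape_name": "VM.MY_SHAPE"},
)
job.create()                          # create the job on OCI Data Science
run = job.run(display_name="my_run")  # start a job run
print([r.status for r in job.run_list()])
job.delete()                          # delete the job and its runs
```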
- ads.jobs.builders.infrastructure.dsc_job.DSCJobRun
alias of
DataScienceJobRun
- class ads.jobs.builders.infrastructure.dsc_job.DataScienceJob(spec: Optional[Dict] = None, **kwargs)
Bases:
Infrastructure
Represents the OCI Data Science Job infrastructure.
Initializes a data science job infrastructure
- Parameters
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_BLOCK_STORAGE = 'blockStorageSize'
- CONST_COMPARTMENT_ID = 'compartmentId'
- CONST_DISPLAY_NAME = 'displayName'
- CONST_JOB_INFRA = 'jobInfrastructureType'
- CONST_JOB_TYPE = 'jobType'
- CONST_LOG_GROUP_ID = 'logGroupId'
- CONST_LOG_ID = 'logId'
- CONST_MEMORY_IN_GBS = 'memoryInGBs'
- CONST_OCPUS = 'ocpus'
- CONST_PROJECT_ID = 'projectId'
- CONST_SHAPE_CONFIG_DETAILS = 'shapeConfigDetails'
- CONST_SHAPE_NAME = 'shapeName'
- CONST_SUBNET_ID = 'subnetId'
- attribute_map = {'blockStorageSize': 'block_storage_size', 'compartmentId': 'compartment_id', 'displayName': 'display_name', 'jobInfrastructureType': 'job_infrastructure_type', 'jobType': 'job_type', 'logGroupId': 'log_group_id', 'logId': 'log_id', 'projectId': 'project_id', 'shapeConfigDetails': 'shape_config_details', 'shapeName': 'shape_name', 'subnetId': 'subnet_id'}
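To illustrate what attribute_map encodes, the sketch below translates the camelCase keys of a specification into the snake_case names used by the OCI Python SDK models. The helper function is illustrative only, not the ADS internals:

```python
# The map below is copied from DataScienceJob.attribute_map.
attribute_map = {
    "blockStorageSize": "block_storage_size",
    "compartmentId": "compartment_id",
    "displayName": "display_name",
    "jobInfrastructureType": "job_infrastructure_type",
    "jobType": "job_type",
    "logGroupId": "log_group_id",
    "logId": "log_id",
    "projectId": "project_id",
    "shapeConfigDetails": "shape_config_details",
    "shapeName": "shape_name",
    "subnetId": "subnet_id",
}

def to_snake_case(spec: dict) -> dict:
    """Rename known camelCase keys; leave unknown keys untouched."""
    return {attribute_map.get(key, key): value for key, value in spec.items()}

spec = {"displayName": "my_job", "shapeName": "VM.Standard.E3.Flex"}
print(to_snake_case(spec))
# → {'display_name': 'my_job', 'shape_name': 'VM.Standard.E3.Flex'}
```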
- property block_storage_size: int
Block storage size for the job
- property compartment_id: Optional[str]
The compartment OCID
- create(runtime, **kwargs) DataScienceJob
Creates a job with runtime.
- Parameters
runtime (Runtime) – An ADS job runtime.
- Returns
The DataScienceJob instance (self)
- Return type
- delete() None
Deletes a job
- classmethod from_dsc_job(dsc_job: DSCJob) DataScienceJob
Initialize a DataScienceJob instance from a DSCJob
- Parameters
dsc_job (DSCJob) – An instance of DSCJob
- Returns
An instance of DataScienceJob
- Return type
- classmethod from_id(job_id: str) DataScienceJob
Gets an existing job using Job OCID
- Parameters
job_id (str) – Job OCID
- Returns
An instance of DataScienceJob
- Return type
- classmethod instance_shapes(compartment_id: Optional[str] = None) list
Lists the supported shapes for running jobs in a compartment.
- Parameters
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If not specified, the compartment ID of the notebook session will be used.
- Returns
A list of dictionaries containing the information of the supported shapes.
- Return type
list
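For instance, instance_shapes() can be used to check which shapes are available before configuring a job. A sketch; the compartment OCID is a placeholder, and the "name" key of each shape dictionary is an assumption:

```python
from ads.jobs import DataScienceJob

# Sketch: listing supported shapes in a compartment.
# The "name" key of each shape dictionary is an assumption.
shapes = DataScienceJob.instance_shapes(compartment_id="<compartment_ocid>")
flex_shapes = [s for s in shapes if "Flex" in s.get("name", "")]
print(flex_shapes)
```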
- property job_id: Optional[str]
The OCID of the job
- property job_infrastructure_type: Optional[str]
Job infrastructure type
- property job_type: Optional[str]
Job type
- classmethod list_jobs(compartment_id: Optional[str] = None, **kwargs) List[DataScienceJob]
Lists all jobs in a compartment.
- Parameters
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If not specified, the compartment ID of the notebook session will be used.
**kwargs – Keyword arguments to be passed into OCI list_jobs API for filtering the jobs.
- Returns
A list of DataScienceJob objects.
- Return type
List[DataScienceJob]
- property log_group_id: str
Log group OCID of the data science job
- Returns
Log group OCID
- Return type
str
- property log_id: str
Log OCID for the data science job.
- Returns
Log OCID
- Return type
str
- property name: str
Display name of the job
- payload_attribute_map = {'blockStorageSize': 'job_infrastructure_configuration_details.block_storage_size_in_gbs', 'compartmentId': 'compartment_id', 'displayName': 'display_name', 'jobInfrastructureType': 'job_infrastructure_configuration_details.job_infrastructure_type', 'jobType': 'job_configuration_details.job_type', 'logGroupId': 'job_log_configuration_details.log_group_id', 'logId': 'job_log_configuration_details.log_id', 'projectId': 'project_id', 'shapeConfigDetails': 'job_infrastructure_configuration_details.job_shape_config_details', 'shapeName': 'job_infrastructure_configuration_details.shape_name', 'subnetId': 'job_infrastructure_configuration_details.subnet_id'}
- property project_id: Optional[str]
Project OCID
- run(name=None, args=None, env_var=None, freeform_tags=None, wait=False) DataScienceJobRun
Runs the job on the OCI Data Science service.
- Parameters
name (str, optional) – The name of the job run, by default None.
args (str, optional) – Command line arguments for the job run, by default None.
env_var (dict, optional) – Environment variables for the job run, by default None.
freeform_tags (dict, optional) – Freeform tags for the job run, by default None.
wait (bool, optional) – Indicates whether this method should wait for the run to finish before returning, by default False.
- Returns
A Data Science Job Run instance.
- Return type
DSCJobRun
- run_list(**kwargs) List[DataScienceJobRun]
Gets a list of job runs.
- Parameters
**kwargs – Keyword arguments for filtering the job runs. These arguments will be passed to OCI API.
- Returns
A list of job runs.
- Return type
List[DSCJobRun]
- property shape_config_details: Dict
The details for the job run shape configuration.
- shape_config_details_attribute_map = {'memoryInGBs': 'memory_in_gbs', 'ocpus': 'ocpus'}
- property shape_name: Optional[str]
Shape name
- snake_to_camel_map = {'block_storage_size_in_gbs': 'blockStorageSize', 'compartment_id': 'compartmentId', 'display_name': 'displayName', 'job_infrastructure_type': 'jobInfrastructureType', 'job_shape_config_details': 'shapeConfigDetails', 'job_type': 'jobType', 'log_group_id': 'logGroupId', 'log_id': 'logId', 'project_id': 'projectId', 'shape_name': 'shapeName', 'subnet_id': 'subnetId'}
- static standardize_spec(spec)
- property status: Optional[str]
Status of the job.
- Returns
Status of the job.
- Return type
str
- property subnet_id: str
Subnet ID
- with_block_storage_size(size_in_gb: int) DataScienceJob
Sets the block storage size in GB
- Parameters
size_in_gb (int) – Block storage size in GB
- Returns
The DataScienceJob instance (self)
- Return type
- with_compartment_id(compartment_id: str) DataScienceJob
Sets the compartment OCID
- Parameters
compartment_id (str) – The compartment OCID
- Returns
The DataScienceJob instance (self)
- Return type
- with_job_infrastructure_type(infrastructure_type: str) DataScienceJob
Sets the job infrastructure type
- Parameters
infrastructure_type (str) – Job infrastructure type as string
- Returns
The DataScienceJob instance (self)
- Return type
- with_job_type(job_type: str) DataScienceJob
Sets the job type
- Parameters
job_type (str) – Job type as string
- Returns
The DataScienceJob instance (self)
- Return type
- with_log_group_id(log_group_id: str) DataScienceJob
Sets the log group OCID for the data science job. If log group ID is specified but log ID is not, a new log resource will be created automatically for each job run to store the logs.
- Parameters
log_group_id (str) – Log Group OCID
- Returns
The DataScienceJob instance (self)
- Return type
- with_log_id(log_id: str) DataScienceJob
Sets the log OCID for the data science job. If the log ID is specified, setting the log group ID (with_log_group_id()) is not strictly needed, as ADS will look up the log group ID automatically. However, this lookup may require additional permissions and may not be available for a newly created log group. Specifying both the log ID (with_log_id()) and the log group ID (with_log_group_id()) avoids the lookup and speeds up job creation.
- Parameters
log_id (str) – Log resource OCID.
- Returns
The DataScienceJob instance (self)
- Return type
- with_project_id(project_id: str) DataScienceJob
Sets the project OCID
- Parameters
project_id (str) – The project OCID
- Returns
The DataScienceJob instance (self)
- Return type
- with_shape_config_details(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataScienceJob
Sets the details for the job run shape configuration. Specify this only when a flexible shape is selected. For example, the VM.Standard.E3.Flex shape allows the memory size and OCPU count to be specified.
- Parameters
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns
The DataScienceJob instance (self)
- Return type
- with_shape_name(shape_name: str) DataScienceJob
Sets the shape name for running the job
- Parameters
shape_name (str) – Shape name
- Returns
The DataScienceJob instance (self)
- Return type
- with_subnet_id(subnet_id: str) DataScienceJob
Sets the subnet ID
- Parameters
subnet_id (str) – Subnet ID
- Returns
The DataScienceJob instance (self)
- Return type
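The with_* builder methods above can be chained into a complete infrastructure definition, mirroring the example at the top of this page; the OCIDs are placeholders:

```python
from ads.jobs import DataScienceJob

# Sketch: chaining the with_* builders into one infrastructure definition.
# OCIDs are placeholders.
infra = (
    DataScienceJob()
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_subnet_id("<subnet_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    .with_block_storage_size(50)
    .with_log_group_id("<log_group_ocid>")
    .with_log_id("<log_ocid>")
)
```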
- class ads.jobs.builders.infrastructure.dsc_job.DataScienceJobRun(config: Optional[dict] = None, signer: Optional[Signer] = None, client_kwargs: Optional[dict] = None, **kwargs)
Bases:
OCIDataScienceMixin
,JobRun
,RunInstance
Represents a Data Science Job run
Initializes a service/resource with the OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed into the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters
config (dict, optional) – OCI API key config dictionary, by default None.
signer (oci.signer.Signer, optional) – OCI authentication signer, by default None.
client_kwargs (dict, optional) – Additional keyword arguments for initializing the OCI client.
- TERMINAL_STATES = ['SUCCEEDED', 'FAILED', 'CANCELED', 'DELETED']
- cancel() DataScienceJobRun
Cancels a job run. This method will wait for the job run to be canceled before returning.
- Returns
The job run instance.
- Return type
self
- create() DataScienceJobRun
Creates a job run
- download(to_dir)
Downloads files from job run output URI to local.
- Parameters
to_dir (str) – Local directory to which the files will be downloaded.
- Returns
The job run instance (self)
- Return type
- property log_group_id: str
The log group ID from OCI logging service containing the logs from the job run.
- property log_id: str
The log ID from OCI logging service containing the logs from the job run.
- property logging: OCILog
The OCILog object containing the logs from the job run
- logs(limit: Optional[int] = None) list
Gets the logs of the job run.
- Parameters
limit (int, optional) – The maximum number of log records to return, by default None. When None, all logs are returned.
- Returns
A list of log records. Each log record is a dictionary with the following keys: id, time, message.
- Return type
list
- property status: str
Lifecycle status
- Returns
Status in a string.
- Return type
str
- to_yaml() str
Serializes the object into YAML string.
- Returns
YAML stored in a string.
- Return type
str
- watch(interval: float = 3) DataScienceJobRun
Watches the job run until it finishes. Before the job run starts, this method outputs the job run status. Once the job starts running, the logs are streamed until the job succeeds, fails, or is cancelled.
- Parameters
interval (float, optional) – Time interval in seconds between each request to update the logs, by default 3.
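A hedged sketch tying the monitoring methods above together, where run is assumed to be an existing DataScienceJobRun instance (for example, returned by DSCJob.run()):

```python
# Sketch: monitoring an existing job run "run" (assumed created already).
run.watch(interval=5)            # stream logs until a terminal state
print(run.status)                # lifecycle state, e.g. "SUCCEEDED"
for record in run.logs(limit=10):
    # each record is a dict with keys: id, time, message
    print(record["time"], record["message"])
run.download(to_dir="path/to/local/dir")  # copy output files locally
```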