ads.jobs.builders.infrastructure package¶
Submodules¶
ads.jobs.builders.infrastructure.base module¶
- class ads.jobs.builders.infrastructure.base.Infrastructure(spec: Dict | None = None, **kwargs)[source]¶
Bases:
Builder
Base class for job infrastructure.
To initialize the object, the user can pass in the specification either as a dictionary or through keyword arguments.
- Parameters:
- create(runtime: Runtime, **kwargs)[source]¶
Create/deploy a job on the infrastructure.
- Parameters:
runtime (Runtime) – a runtime object
kwargs (dict) – additional arguments
- property kind: str¶
Kind of the object to be stored in YAML. All infrastructure objects will have “infrastructure” as kind. Subclasses will have different types.
- classmethod list_jobs(**kwargs) list [source]¶
List jobs from the infrastructure.
- Parameters:
kwargs (keyword arguments for filtering the results)
- Returns:
list of infrastructure objects, each representing a job from the infrastructure.
- Return type:
- run(name: str | None = None, args: str | None = None, env_var: dict | None = None, freeform_tags: dict | None = None, defined_tags: dict | None = None, wait: bool = False)[source]¶
Runs a job on the infrastructure.
- Parameters:
name (str, optional) – The name of the job run, by default None
args (str, optional) – Command line arguments for the job run, by default None.
env_var (dict, optional) – Environment variables for the job run, by default None.
freeform_tags (dict, optional) – Freeform tags for the job run, by default None.
defined_tags (dict, optional) – Defined tags for the job run, by default None.
wait (bool, optional) – Indicate if this method should wait for the run to finish before it returns, by default False.
ads.jobs.builders.infrastructure.dataflow module¶
- class ads.jobs.builders.infrastructure.dataflow.DataFlow(spec: dict | None = None, **kwargs)[source]¶
Bases:
Infrastructure
To initialize the object, the user can pass in the specification either as a dictionary or through keyword arguments.
- Parameters:
- CONST_BUCKET_URI = 'logs_bucket_uri'¶
- CONST_COMPARTMENT_ID = 'compartment_id'¶
- CONST_CONFIG = 'configuration'¶
- CONST_DEFINED_TAGS = 'defined_tags'¶
- CONST_DRIVER_SHAPE = 'driver_shape'¶
- CONST_DRIVER_SHAPE_CONFIG = 'driver_shape_config'¶
- CONST_EXECUTE = 'execute'¶
- CONST_EXECUTOR_SHAPE = 'executor_shape'¶
- CONST_EXECUTOR_SHAPE_CONFIG = 'executor_shape_config'¶
- CONST_FREEFORM_TAGS = 'freeform_tags'¶
- CONST_ID = 'id'¶
- CONST_LANGUAGE = 'language'¶
- CONST_MEMORY_IN_GBS = 'memory_in_gbs'¶
- CONST_METASTORE_ID = 'metastore_id'¶
- CONST_NUM_EXECUTORS = 'num_executors'¶
- CONST_OCPUS = 'ocpus'¶
- CONST_POOL_ID = 'pool_id'¶
- CONST_PRIVATE_ENDPOINT_ID = 'private_endpoint_id'¶
- CONST_SPARK_VERSION = 'spark_version'¶
- CONST_WAREHOUSE_BUCKET_URI = 'warehouse_bucket_uri'¶
- attribute_map = {'compartment_id': 'compartmentId', 'configuration': 'configuration', 'defined_tags': 'definedTags', 'driver_shape': 'driverShape', 'driver_shape_config': 'driverShapeConfig', 'execute': 'execute', 'executor_shape': 'executorShape', 'executor_shape_config': 'executorShapeConfig', 'freeform_tags': 'freeformTags', 'id': 'id', 'logs_bucket_uri': 'logsBucketUri', 'memory_in_gbs': 'memoryInGBs', 'metastore_id': 'metastoreId', 'num_executors': 'numExecutors', 'ocpus': 'ocpus', 'pool_id': 'poolId', 'private_endpoint_id': 'privateEndpointId', 'spark_version': 'sparkVersion', 'warehouse_bucket_uri': 'warehouseBucketUri'}¶
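The attribute_map above translates the snake_case keys of the Python interface into the camelCase keys of the OCI REST payload. A minimal sketch of that translation, using a hypothetical to_payload helper and a subset of the map:

```python
# Subset of the attribute_map shown above.
ATTRIBUTE_MAP = {
    "compartment_id": "compartmentId",
    "driver_shape": "driverShape",
    "num_executors": "numExecutors",
    "spark_version": "sparkVersion",
}

def to_payload(spec: dict) -> dict:
    """Rename snake_case spec keys to their camelCase payload keys.

    Keys without a mapping pass through unchanged.
    """
    return {ATTRIBUTE_MAP.get(key, key): value for key, value in spec.items()}

spec = {"compartment_id": "<compartment_ocid>", "num_executors": 2}
print(to_payload(spec))
# {'compartmentId': '<compartment_ocid>', 'numExecutors': 2}
```

The same key renaming applies in reverse when extracting a specification from an existing service resource.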
- create(runtime: DataFlowRuntime, **kwargs) DataFlow [source]¶
Create a Data Flow job given a runtime.
- Parameters:
runtime – runtime to bind to the Data Flow job
kwargs – additional keyword arguments
- Returns:
a Data Flow job instance
- Return type:
- classmethod from_dict(config: dict) DataFlow [source]¶
Load a Data Flow job instance from a dictionary of configurations.
- init(**kwargs) DataFlow [source]¶
Initializes a starter specification for the DataFlow.
- Returns:
The DataFlow instance (self)
- Return type:
- classmethod list_jobs(compartment_id: str | None = None, **kwargs) List[DataFlow] [source]¶
List Data Flow jobs in a given compartment.
- run(name: str | None = None, args: List[str] | None = None, env_vars: Dict[str, str] | None = None, freeform_tags: Dict[str, str] | None = None, defined_tags: Dict[str, Dict[str, object]] | None = None, wait: bool = False, **kwargs) DataFlowRun [source]¶
Run a Data Flow job.
- Parameters:
name (str, optional) – The name of the run. If a name is not provided, a randomly generated, easy-to-remember name with a timestamp will be used, e.g. ‘strange-spider-2022-08-17-23:55.02’.
args (List[str], optional) – list of command line arguments
env_vars (Dict[str, str], optional) – dictionary of environment variables (not used for data flow)
freeform_tags (Dict[str, str], optional) – freeform tags
defined_tags (Dict[str, Dict[str, object]], optional) – defined tags
wait (bool, optional) – whether to wait for a run to terminate
kwargs – additional keyword arguments
- Returns:
a DataFlowRun instance
- Return type:
- run_list(**kwargs) List[DataFlowRun] [source]¶
List runs associated with a Data Flow job.
- Parameters:
kwargs – additional arguments for filtering runs.
- Returns:
list of DataFlowRun instances
- Return type:
List[DataFlowRun]
- to_dict(**kwargs) dict [source]¶
Serialize job to a dictionary.
- Returns:
serialized job as a dictionary
- Return type:
- to_yaml(**kwargs) str [source]¶
Serializes the object into YAML string.
- Returns:
YAML stored in a string.
- Return type:
- with_defined_tag(**kwargs) DataFlow [source]¶
Sets defined tags
- Returns:
The DataFlow instance (self)
- Return type:
- with_driver_shape_config(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataFlow [source]¶
Sets the driver shape config details of the Data Flow job infrastructure. Specify only when a flex shape is selected. For example, VM.Standard.E3.Flex allows the memory_in_gbs and ocpus to be specified.
- with_executor_shape_config(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataFlow [source]¶
Sets the executor shape config details of the Data Flow job infrastructure. Specify only when a flex shape is selected. For example, VM.Standard.E3.Flex allows the memory_in_gbs and ocpus to be specified.
- with_freeform_tag(**kwargs) DataFlow [source]¶
Sets freeform tags
- Returns:
The DataFlow instance (self)
- Return type:
- with_private_endpoint_id(private_endpoint_id: str) DataFlow [source]¶
Set the private endpoint ID for a Data Flow job infrastructure.
- with_spark_version(ver: str) DataFlow [source]¶
Sets the Spark version for a Data Flow job. Currently supported versions are 2.4.4, 3.0.2 and 3.2.1. Documentation: https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#before_you_begin
- class ads.jobs.builders.infrastructure.dataflow.DataFlowApp(config: dict | None = None, signer: Signer | None = None, client_kwargs: dict | None = None, **kwargs)[source]¶
Bases:
OCIModelMixin, Application
Initializes a service/resource with the OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed to the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters:
- property client: DataFlowClient¶
OCI client
- create() DataFlowApp [source]¶
Create a Data Flow application.
- Returns:
a DataFlowApp instance
- Return type:
- classmethod init_client(**kwargs) DataFlowClient [source]¶
Initializes the OCI client specified in the “client” keyword argument. Subclasses should override this method and call cls._init_client(client=OCI_CLIENT).
- Parameters:
**kwargs – Additional keyword arguments for initializing the OCI client.
- Return type:
An instance of OCI client.
- class ads.jobs.builders.infrastructure.dataflow.DataFlowLogs(run_id)[source]¶
Bases:
object
- property application¶
- property driver¶
- property executor¶
- class ads.jobs.builders.infrastructure.dataflow.DataFlowRun(config: dict | None = None, signer: Signer | None = None, client_kwargs: dict | None = None, **kwargs)[source]¶
Bases:
OCIModelMixin, Run, RunInstance
Initializes a service/resource with the OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed to the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters:
- TERMINATED_STATES = ['CANCELED', 'FAILED', 'SUCCEEDED']¶
- cancel() DataFlowRun [source]¶
Cancel a Data Flow run if it is not yet terminated. Will be executed synchronously.
- Returns:
The dataflow run instance.
- Return type:
self
- property client: DataFlowClient¶
OCI client
- create() DataFlowRun [source]¶
Create a Data Flow run.
- Returns:
a DataFlowRun instance
- Return type:
- delete() DataFlowRun [source]¶
Cancel and delete a Data Flow run if it is not yet terminated. Will be executed asynchronously.
- Returns:
The dataflow run instance.
- Return type:
self
- classmethod init_client(**kwargs) DataFlowClient [source]¶
Initializes the OCI client specified in the “client” keyword argument. Subclasses should override this method and call cls._init_client(client=OCI_CLIENT).
- Parameters:
**kwargs – Additional keyword arguments for initializing the OCI client.
- Return type:
An instance of OCI client.
- property logs: DataFlowLogs¶
Show logs from a run. There are three types of logs: application log, driver log and executor log, each with stdout and stderr separately. To access each type of logs:
>>> dfr.logs.application.stdout
>>> dfr.logs.driver.stderr
- Returns:
an instance of DataFlowLogs
- Return type:
- property status: str¶
Show status (lifecycle state) of a run.
- Returns:
status of the run
- Return type:
- to_yaml() str [source]¶
Serializes the object into YAML string.
- Returns:
YAML stored in a string.
- Return type:
- wait(interval: int = 3) DataFlowRun [source]¶
Wait for a run to terminate.
- Parameters:
interval (int, optional) – interval to wait before probing again
- Returns:
a DataFlowRun instance
- Return type:
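The polling that wait() performs can be sketched as follows; get_status is a hypothetical stand-in for fetching the run's lifecycle state from the service:

```python
import time

TERMINATED_STATES = ["CANCELED", "FAILED", "SUCCEEDED"]

def wait_for_run(get_status, interval: int = 3) -> str:
    """Poll until the run reaches a terminal state, then return that state."""
    status = get_status()
    while status not in TERMINATED_STATES:
        time.sleep(interval)  # wait before probing again
        status = get_status()
    return status

# Simulate a run that terminates on the third probe.
states = iter(["ACCEPTED", "IN_PROGRESS", "SUCCEEDED"])
print(wait_for_run(lambda: next(states), interval=0))
# SUCCEEDED
```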
- watch(interval: int = 3) DataFlowRun [source]¶
This is an alias of wait() method. It waits for a run to terminate.
- Parameters:
interval (int, optional) – interval to wait before probing again
- Returns:
a DataFlowRun instance
- Return type:
ads.jobs.builders.infrastructure.dsc_job module¶
- class ads.jobs.builders.infrastructure.dsc_job.DSCJob(artifact: str | Artifact | None = None, **kwargs)[source]¶
Bases:
OCIDataScienceMixin, Job
Represents an OCI Data Science Job. This class contains all attributes of oci.data_science.models.Job. The main purpose of this class is to link the oci.data_science.models.Job model and the related client methods, mainly linking the Job model (payload) to the Create/Update/Get/List/Delete methods.
A DSCJob can be initialized by unpacking the properties stored in a dictionary (payload):
job_properties = {
    "display_name": "my_job",
    "job_infrastructure_configuration_details": {"shape_name": "VM.MY_SHAPE"}
}
job = DSCJob(**job_properties)
The properties can also be OCI REST API payload, in which the keys are in camel format.
job_payload = {
    "projectId": "<project_ocid>",
    "compartmentId": "<compartment_ocid>",
    "displayName": "<job_name>",
    "jobConfigurationDetails": {
        "jobType": "DEFAULT",
        "commandLineArguments": "pos_arg1 pos_arg2 --key1 val1 --key2 val2",
        "environmentVariables": {
            "KEY1": "VALUE1",
            "KEY2": "VALUE2",
            # User specifies conda env via env var
            "CONDA_ENV_TYPE": "service",
            "CONDA_ENV_SLUG": "mlcpuv1"
        }
    },
    "jobInfrastructureConfigurationDetails": {
        "jobInfrastructureType": "STANDALONE",
        "shapeName": "VM.Standard.E3.Flex",
        "jobShapeConfigDetails": {
            "memoryInGBs": 16,
            "ocpus": 1
        },
        "blockStorageSizeInGBs": "100",
        "subnetId": "<subnet_ocid>"
    }
}
job = DSCJob(**job_payload)
Initialize a DSCJob object.
- Parameters:
- CONST_DEFAULT_BLOCK_STORAGE_SIZE = 50¶
- DEFAULT_INFRA_TYPE = 'ME_STANDALONE'¶
- create() DSCJob [source]¶
Create the job on OCI Data Science platform
- Returns:
The DSCJob instance (self), which allows chaining additional methods.
- Return type:
- delete(force_delete: bool = False) DSCJob [source]¶
Deletes the job and the corresponding job runs.
- Parameters:
force_delete (bool, optional, defaults to False) – By default, the deletion fails when associated job runs are in progress. If force_delete is set to True, the job runs will be canceled first and then the job will be deleted. In this case, the deletion has to wait until the job runs have been canceled.
- Returns:
The DSCJob instance (self), which allows chaining additional methods.
- Return type:
- run(**kwargs) DataScienceJobRun [source]¶
Runs the job
- Parameters:
**kwargs – Keyword arguments for initializing a Data Science Job Run. The keys can be any keys supported by OCI JobConfigurationDetails, OcirContainerJobEnvironmentConfigurationDetails and JobRun, including:
- hyperparameter_values: dict(str, str)
- environment_variables: dict(str, str)
- command_line_arguments: str
- maximum_runtime_in_minutes: int
- display_name: str
- freeform_tags: dict(str, str)
- defined_tags: dict(str, dict(str, object))
- image: str
- cmd: list[str]
- entrypoint: list[str]
- image_digest: str
- image_signature_id: str
If display_name is not specified, it will be generated as "<JOB_NAME>-run-<TIMESTAMP>".
- Returns:
An instance of DSCJobRun, which can be used to monitor the job run.
- Return type:
DSCJobRun
- run_list(**kwargs) list[DataScienceJobRun] [source]¶
Lists the runs of this job.
- Parameters:
**kwargs – Keyword arguments to be passed into the OCI list_job_runs() for filtering the job runs.
- Returns:
A list of DSCJobRun objects
- Return type:
- ads.jobs.builders.infrastructure.dsc_job.DSCJobRun¶
alias of
DataScienceJobRun
- class ads.jobs.builders.infrastructure.dsc_job.DataScienceJob(spec: Dict | None = None, **kwargs)[source]¶
Bases:
Infrastructure
Represents the OCI Data Science Job infrastructure.
To configure the infrastructure for a Data Science Job:
infrastructure = (
    DataScienceJob()
    # Configure logging for getting the job run outputs.
    .with_log_group_id("<log_group_ocid>")
    # Log resource will be auto-generated if log ID is not specified.
    .with_log_id("<log_ocid>")
    # If you are in an OCI data science notebook session,
    # the following configurations are not required.
    # Configurations from the notebook session will be used as defaults.
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_subnet_id("<subnet_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    # Shape config details are applicable only for the flexible shapes.
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    # Minimum/Default block storage size is 50 (GB).
    .with_block_storage_size(50)
    # A list of file systems to be mounted
    .with_storage_mount(
        {
            "src": "<mount_target_ip_address>:<export_path>",
            "dest": "<destination_directory_name>",
        }
    )
    # Tags
    .with_freeform_tag(my_tag="my_value")
    .with_defined_tag(**{"Operations": {"CostCenter": "42"}})
)
Initializes a data science job infrastructure
- Parameters:
- CONST_BLOCK_STORAGE = 'blockStorageSize'¶
- CONST_COMPARTMENT_ID = 'compartmentId'¶
- CONST_DEFINED_TAGS = 'definedTags'¶
- CONST_DISPLAY_NAME = 'displayName'¶
- CONST_FREEFORM_TAGS = 'freeformTags'¶
- CONST_JOB_INFRA = 'jobInfrastructureType'¶
- CONST_JOB_TYPE = 'jobType'¶
- CONST_LOG_GROUP_ID = 'logGroupId'¶
- CONST_LOG_ID = 'logId'¶
- CONST_MEMORY_IN_GBS = 'memoryInGBs'¶
- CONST_OCPUS = 'ocpus'¶
- CONST_PROJECT_ID = 'projectId'¶
- CONST_SHAPE_CONFIG_DETAILS = 'shapeConfigDetails'¶
- CONST_SHAPE_NAME = 'shapeName'¶
- CONST_STORAGE_MOUNT = 'storageMount'¶
- CONST_SUBNET_ID = 'subnetId'¶
- attribute_map = {'blockStorageSize': 'block_storage_size', 'compartmentId': 'compartment_id', 'definedTags': 'defined_tags', 'displayName': 'display_name', 'freeformTags': 'freeform_tags', 'jobInfrastructureType': 'job_infrastructure_type', 'jobType': 'job_type', 'logGroupId': 'log_group_id', 'logId': 'log_id', 'projectId': 'project_id', 'shapeConfigDetails': 'shape_config_details', 'shapeName': 'shape_name', 'storageMount': 'storage_mount', 'subnetId': 'subnet_id'}¶
- auth = {}¶
- build() DataScienceJob [source]¶
Loads default values from the environment for the job infrastructure. Should be implemented by subclasses.
- create(runtime, **kwargs) DataScienceJob [source]¶
Creates a job with runtime.
- Parameters:
runtime (Runtime) – An ADS job runtime.
- Returns:
The DataScienceJob instance (self)
- Return type:
- classmethod fast_launch_shapes(compartment_id: str | None = None, **kwargs) list [source]¶
Lists the supported fast launch shapes for running jobs in a compartment.
- Parameters:
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If this is not specified, the compartment ID of the notebook session will be used.
- Returns:
A list of oci.data_science.models.FastLaunchJobConfigSummary objects containing the information of the supported shapes.
- Return type:
Examples
To get a list of shape names:
shapes = DataScienceJob.fast_launch_shapes(
    compartment_id=os.environ["PROJECT_COMPARTMENT_OCID"]
)
shape_names = [shape.shape_name for shape in shapes]
- classmethod from_dsc_job(dsc_job: DSCJob) DataScienceJob [source]¶
Initialize a DataScienceJob instance from a DSCJob
- Parameters:
dsc_job (DSCJob) – An instance of DSCJob
- Returns:
An instance of DataScienceJob
- Return type:
- classmethod from_id(job_id: str) DataScienceJob [source]¶
Gets an existing job using Job OCID
- Parameters:
job_id (str) – Job OCID
- Returns:
An instance of DataScienceJob
- Return type:
- init(**kwargs) DataScienceJob [source]¶
Initializes a starter specification for the DataScienceJob.
- Returns:
The DataScienceJob instance (self)
- Return type:
- classmethod instance_shapes(compartment_id: str | None = None, **kwargs) list [source]¶
Lists the supported shapes for running jobs in a compartment.
- Parameters:
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If this is not specified, the compartment ID of the notebook session will be used.
- Returns:
A list of oci.data_science.models.JobShapeSummary objects containing the information of the supported shapes.
- Return type:
Examples
To get a list of shape names:
shapes = DataScienceJob.instance_shapes(
    compartment_id=os.environ["PROJECT_COMPARTMENT_OCID"]
)
shape_names = [shape.name for shape in shapes]
- classmethod list_jobs(compartment_id: str | None = None, **kwargs) List[DataScienceJob] [source]¶
Lists all jobs in a compartment.
- Parameters:
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If this is not specified, the compartment ID of the notebook session will be used.
**kwargs – Keyword arguments to be passed into OCI list_jobs API for filtering the jobs.
- Returns:
A list of DataScienceJob objects.
- Return type:
List[DataScienceJob]
- property log_group_id: str¶
Log group OCID of the data science job
- Returns:
Log group OCID
- Return type:
- payload_attribute_map = {'blockStorageSize': 'job_infrastructure_configuration_details.block_storage_size_in_gbs', 'compartmentId': 'compartment_id', 'displayName': 'display_name', 'jobInfrastructureType': 'job_infrastructure_configuration_details.job_infrastructure_type', 'jobType': 'job_configuration_details.job_type', 'logGroupId': 'job_log_configuration_details.log_group_id', 'logId': 'job_log_configuration_details.log_id', 'projectId': 'project_id', 'shapeConfigDetails': 'job_infrastructure_configuration_details.job_shape_config_details', 'shapeName': 'job_infrastructure_configuration_details.shape_name', 'subnetId': 'job_infrastructure_configuration_details.subnet_id'}¶
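Unlike the flat attribute_map, the values in payload_attribute_map are dotted paths into the nested job payload. A minimal sketch of how such a path can be applied, using a hypothetical set_payload_value helper and a subset of the map:

```python
# Subset of the payload_attribute_map shown above.
PAYLOAD_ATTRIBUTE_MAP = {
    "blockStorageSize": "job_infrastructure_configuration_details.block_storage_size_in_gbs",
    "logGroupId": "job_log_configuration_details.log_group_id",
    "compartmentId": "compartment_id",
}

def set_payload_value(payload: dict, key: str, value) -> None:
    """Walk the dotted path for key and set value in the nested payload."""
    *parents, leaf = PAYLOAD_ATTRIBUTE_MAP[key].split(".")
    node = payload
    for parent in parents:
        node = node.setdefault(parent, {})
    node[leaf] = value

payload = {}
set_payload_value(payload, "blockStorageSize", 100)
set_payload_value(payload, "compartmentId", "<compartment_ocid>")
# payload is now:
# {'job_infrastructure_configuration_details': {'block_storage_size_in_gbs': 100},
#  'compartment_id': '<compartment_ocid>'}
```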
- run(name=None, args=None, env_var=None, freeform_tags=None, defined_tags=None, wait=False, **kwargs) DataScienceJobRun [source]¶
Runs a job on OCI Data Science.
- Parameters:
name (str, optional) – The name of the job run, by default None.
args (str, optional) – Command line arguments for the job run, by default None.
env_var (dict, optional) – Environment variables for the job run, by default None.
freeform_tags (dict, optional) – Freeform tags for the job run, by default None
defined_tags (dict, optional) – Defined tags for the job run, by default None
wait (bool, optional) – Indicate if this method should wait for the run to finish before it returns, by default False.
kwargs – additional keyword arguments
- Returns:
A Data Science Job Run instance.
- Return type:
- run_list(**kwargs) List[DataScienceJobRun] [source]¶
Gets a list of job runs.
- Parameters:
**kwargs – Keyword arguments for filtering the job runs. These arguments will be passed to OCI API.
- Returns:
A list of job runs.
- Return type:
List[DSCJobRun]
- shape_config_details_attribute_map = {'memoryInGBs': 'memory_in_gbs', 'ocpus': 'ocpus'}¶
- snake_to_camel_map = {'block_storage_size_in_gbs': 'blockStorageSize', 'compartment_id': 'compartmentId', 'display_name': 'displayName', 'job_infrastructure_type': 'jobInfrastructureType', 'job_shape_config_details': 'shapeConfigDetails', 'job_type': 'jobType', 'log_group_id': 'logGroupId', 'log_id': 'logId', 'project_id': 'projectId', 'shape_name': 'shapeName', 'subnet_id': 'subnetId'}¶
- property storage_mount: List[dict]¶
File systems that have been mounted for the data science job.
- Returns:
A list of file systems that have been mounted
- Return type:
- storage_mount_type_dict = {'FILE_STORAGE': <class 'ads.common.dsc_file_system.OCIFileStorage'>, 'OBJECT_STORAGE': <class 'ads.common.dsc_file_system.OCIObjectStorage'>}¶
- with_block_storage_size(size_in_gb: int) DataScienceJob [source]¶
Sets the block storage size in GB
- Parameters:
size_in_gb (int) – Block storage size in GB
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_compartment_id(compartment_id: str) DataScienceJob [source]¶
Sets the compartment OCID
- Parameters:
compartment_id (str) – The compartment OCID
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_defined_tag(**kwargs) DataScienceJob [source]¶
Sets defined tags
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_freeform_tag(**kwargs) DataScienceJob [source]¶
Sets freeform tags
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_job_infrastructure_type(infrastructure_type: str) DataScienceJob [source]¶
Sets the job infrastructure type
- Parameters:
infrastructure_type (str) – Job infrastructure type as string
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_job_type(job_type: str) DataScienceJob [source]¶
Sets the job type
- Parameters:
job_type (str) – Job type as string
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_log_group_id(log_group_id: str) DataScienceJob [source]¶
Sets the log group OCID for the data science job. If log group ID is specified but log ID is not, a new log resource will be created automatically for each job run to store the logs.
- Parameters:
log_group_id (str) – Log Group OCID
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_log_id(log_id: str) DataScienceJob [source]¶
Sets the log OCID for the data science job. If the log ID is specified, setting the log group ID (with_log_group_id()) is not strictly needed: ADS will look up the log group ID automatically. However, this may require additional permissions, and the lookup may not be available for a newly created log group. Specifying both the log ID (with_log_id()) and the log group ID (with_log_group_id()) avoids this lookup and speeds up job creation.
- Parameters:
log_id (str) – Log resource OCID.
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_project_id(project_id: str) DataScienceJob [source]¶
Sets the project OCID
- Parameters:
project_id (str) – The project OCID
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_shape_config_details(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataScienceJob [source]¶
Sets the details for the job run shape configuration. Specify only when a flex shape is selected. For example, VM.Standard.E3.Flex allows the memory_in_gbs and ocpus to be specified.
- Parameters:
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_shape_name(shape_name: str) DataScienceJob [source]¶
Sets the shape name for running the job
- Parameters:
shape_name (str) – Shape name
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_storage_mount(*storage_mount: List[dict]) DataScienceJob [source]¶
Sets the file systems to be mounted for the data science job. A maximum of 5 file systems can be mounted for a single data science job.
- Parameters:
storage_mount (List[dict]) – A list of file systems to be mounted.
- Returns:
The DataScienceJob instance (self)
- Return type:
- with_subnet_id(subnet_id: str) DataScienceJob [source]¶
Sets the subnet ID
- Parameters:
subnet_id (str) – Subnet ID
- Returns:
The DataScienceJob instance (self)
- Return type:
- class ads.jobs.builders.infrastructure.dsc_job.DataScienceJobRun(config: dict | None = None, signer: Signer | None = None, client_kwargs: dict | None = None, **kwargs)[source]¶
Bases:
OCIDataScienceMixin, JobRun, RunInstance
Represents a Data Science Job run
Initializes a service/resource with the OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither is specified, the client will be initialized with ads.common.auth.default_signer. If both are specified, both will be passed to the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters:
- TERMINAL_STATES = ['SUCCEEDED', 'FAILED', 'CANCELED', 'DELETED']¶
- cancel(wait_for_completion: bool = True) DataScienceJobRun [source]¶
Cancels a job run
- Parameters:
wait_for_completion (bool) – Whether to wait for job run to be cancelled before proceeding. Defaults to True.
- Returns:
The job run instance.
- Return type:
self
- create() DataScienceJobRun [source]¶
Creates a job run
- download(to_dir)[source]¶
Downloads files from job run output URI to local.
- Parameters:
to_dir (str) – Local directory to which the files will be downloaded to.
- Returns:
The job run instance (self)
- Return type:
- property exit_code¶
The exit code of the job run, parsed from the lifecycle details. None will be returned if the job run has not finished, or failed without an exit code; 0 will be returned if the job run succeeded.
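The semantics above can be sketched with a hypothetical helper; the message format and regular expression here are assumptions for illustration, not the exact format reported by the service:

```python
import re

def parse_exit_code(lifecycle_state, lifecycle_details):
    """Mirror the documented semantics: 0 on success, None while running
    or when no exit code is reported, else the code from the details."""
    if lifecycle_state == "SUCCEEDED":
        return 0
    match = re.search(r"exit code (\d+)", lifecycle_details or "")
    return int(match.group(1)) if match else None

print(parse_exit_code("SUCCEEDED", None))    # 0
print(parse_exit_code("IN_PROGRESS", None))  # None
print(parse_exit_code("FAILED", "Job run failed with exit code 21"))  # 21
```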
- property log_group_id: str¶
The log group ID from OCI logging service containing the logs from the job run.
- to_yaml() str [source]¶
Serializes the object into YAML string.
- Returns:
YAML stored in a string.
- Return type:
- wait(interval: float = 3)[source]¶
Waits for the job run until it finishes.
- Parameters:
interval (float) – Time interval in seconds between each request to update the logs. Defaults to 3 (seconds).
- watch(interval: float = 3, wait: float = 90) DataScienceJobRun [source]¶
Watches the job run until it finishes. Before the job starts running, this method outputs the job run status. Once the job starts running, the logs will be streamed until the job run succeeds, fails, or is cancelled.
- Parameters:
interval (float) – Time interval in seconds between each request to update the logs. Defaults to 3 (seconds).
wait (float) – Time in seconds to keep updating the logs after the job run finished. It may take some time for logs to appear in OCI logging service after the job run is finished. Defaults to 90 (seconds).
ads.jobs.builders.infrastructure.dsc_job_runtime module¶
Contains classes for conversion between ADS runtimes and the OCI Data Science Job implementation. This module is for ADS developers only. In this module, a payload is defined as a dictionary for initializing a DSCJob object. The DSCJob can be initialized with the same arguments used for initializing oci.data_science.models.Job, plus an “artifact” argument for the job artifact.
The payload also contains infrastructure information. The conversion from a runtime to a payload is called translate in this module; the conversion from a DSCJob to a runtime is called extract.
- class ads.jobs.builders.infrastructure.dsc_job_runtime.CondaRuntimeHandler(data_science_job)[source]¶
Bases:
RuntimeHandler
Runtime Handler for CondaRuntime
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- CONST_CONDA_BUCKET = 'CONDA_ENV_BUCKET'¶
- CONST_CONDA_NAMESPACE = 'CONDA_ENV_NAMESPACE'¶
- CONST_CONDA_OBJ_NAME = 'CONDA_ENV_OBJECT_NAME'¶
- CONST_CONDA_REGION = 'CONDA_ENV_REGION'¶
- CONST_CONDA_SLUG = 'CONDA_ENV_SLUG'¶
- CONST_CONDA_TYPE = 'CONDA_ENV_TYPE'¶
- RUNTIME_CLASS¶
alias of
CondaRuntime
- class ads.jobs.builders.infrastructure.dsc_job_runtime.ContainerRuntimeHandler(data_science_job)[source]¶
Bases:
RuntimeHandler
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- CMD_DELIMITER = ','¶
- RUNTIME_CLASS¶
alias of
ContainerRuntime
- static split_args(args: str) list [source]¶
Splits the cmd or entrypoint arguments for a BYOC job into a list. BYOC jobs use environment variables to store the values of cmd and entrypoint. In the values, a comma (,) is used to separate cmd or entrypoint arguments. In YAML, the arguments are formatted into a list (exec form).
>>> ContainerRuntimeHandler.split_args("/bin/bash")
['/bin/bash']
>>> ContainerRuntimeHandler.split_args("-c,echo Hello World")
['-c', 'echo Hello World']
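The behavior in the doctest above amounts to splitting on the comma delimiter. A minimal sketch (a simplification; the real implementation may handle additional edge cases):

```python
CMD_DELIMITER = ","

def split_args(args):
    """Split a comma-delimited cmd/entrypoint string into an exec-form list."""
    if not args:
        return []
    return [part for part in args.split(CMD_DELIMITER) if part]

print(split_args("/bin/bash"))            # ['/bin/bash']
print(split_args("-c,echo Hello World"))  # ['-c', 'echo Hello World']
```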
- class ads.jobs.builders.infrastructure.dsc_job_runtime.DataScienceJobRuntimeManager(data_science_job)[source]¶
Bases:
RuntimeHandler
This class is used by the DataScienceJob infrastructure to handle the runtime conversion. The translate() method determines the actual runtime handler by matching the RUNTIME_CLASS. The extract() method determines the actual runtime handler by checking if the runtime can be extracted. The order in runtime_handlers is used for extraction until a runtime is extracted; a RuntimeHandler at the top of the list has higher priority. If a runtime is a special case of another runtime, its handler should be placed with higher priority.
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- extract(dsc_job)[source]¶
Extract the runtime from an OCI data science job object.
This method determines the actual runtime handler by checking if the runtime can be extracted.
- runtime_handlers = [<class 'ads.jobs.builders.infrastructure.dsc_job_runtime.ContainerRuntimeHandler'>, <class 'ads.jobs.builders.infrastructure.dsc_job_runtime.PyTorchDistributedRuntimeHandler'>, <class 'ads.jobs.builders.infrastructure.dsc_job_runtime.GitPythonRuntimeHandler'>, <class 'ads.jobs.builders.infrastructure.dsc_job_runtime.NotebookRuntimeHandler'>, <class 'ads.jobs.builders.infrastructure.dsc_job_runtime.PythonRuntimeHandler'>, <class 'ads.jobs.builders.infrastructure.dsc_job_runtime.ScriptRuntimeHandler'>]¶
- class ads.jobs.builders.infrastructure.dsc_job_runtime.GitPythonRuntimeHandler(data_science_job)[source]¶
Bases:
CondaRuntimeHandler
Runtime Handler for GitPythonRuntime
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- CONST_ENTRYPOINT = 'GIT_ENTRYPOINT'¶
- CONST_ENTRY_FUNCTION = 'ENTRY_FUNCTION'¶
- CONST_GIT_BRANCH = 'GIT_BRANCH'¶
- CONST_GIT_CODE_DIR = 'CODE_DIR'¶
- CONST_GIT_COMMIT = 'GIT_COMMIT'¶
- CONST_GIT_SSH_SECRET_ID = 'GIT_SECRET_OCID'¶
- CONST_GIT_URL = 'GIT_URL'¶
- CONST_JOB_ENTRYPOINT = 'JOB_RUN_ENTRYPOINT'¶
- CONST_OUTPUT_DIR = 'OUTPUT_DIR'¶
- CONST_OUTPUT_URI = 'OUTPUT_URI'¶
- CONST_PYTHON_PATH = 'PYTHON_PATH'¶
- CONST_SKIP_METADATA = 'SKIP_METADATA_UPDATE'¶
- CONST_WORKING_DIR = 'WORKING_DIR'¶
- PATH_DELIMITER = ':'¶
- RUNTIME_CLASS¶
alias of
GitPythonRuntime
- SPEC_MAPPINGS = {'branch': 'GIT_BRANCH', 'commit': 'GIT_COMMIT', 'entryFunction': 'ENTRY_FUNCTION', 'entrypoint': 'GIT_ENTRYPOINT', 'gitSecretId': 'GIT_SECRET_OCID', 'outputDir': 'OUTPUT_DIR', 'outputUri': 'OUTPUT_URI', 'pythonPath': 'PYTHON_PATH', 'url': 'GIT_URL', 'workingDir': 'WORKING_DIR'}¶
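A SPEC_MAPPINGS table like the one above maps runtime spec keys to the environment variable names used in the OCI API payload. The following is a minimal illustrative sketch of how such a table could drive the translation, not the actual ADS implementation; only a subset of the mappings is shown.

```python
# Subset of the GitPythonRuntimeHandler mapping shown above.
SPEC_MAPPINGS = {
    "branch": "GIT_BRANCH",
    "commit": "GIT_COMMIT",
    "url": "GIT_URL",
}


def translate_spec(runtime_spec):
    """Convert runtime spec keys into environment variables."""
    env = {}
    for key, env_name in SPEC_MAPPINGS.items():
        if key in runtime_spec:
            env[env_name] = str(runtime_spec[key])
    return env
```

For example, a spec with a Git URL and branch would translate into GIT_URL and GIT_BRANCH entries in the job's environment.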
- exception ads.jobs.builders.infrastructure.dsc_job_runtime.IncompatibleRuntime[source]¶
Bases:
Exception
Represents an exception raised when a runtime is not compatible with the OCI data science job configuration. This exception is designed to be raised during the extraction of a runtime from an OCI data science job. The data science job does not explicitly contain information about the type of the ADS runtime. Each runtime handler should determine whether the configuration of the job can be converted to its runtime. This exception should be raised during the extract() call if the configuration cannot be converted. The RuntimeManager uses this exception to determine whether a conversion is successful.
- class ads.jobs.builders.infrastructure.dsc_job_runtime.NotebookRuntimeHandler(data_science_job)[source]¶
Bases:
CondaRuntimeHandler
Runtime Handler for NotebookRuntime
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- CONST_ENTRYPOINT = 'JOB_RUN_ENTRYPOINT'¶
- CONST_EXCLUDE_TAGS = 'NOTEBOOK_EXCLUDE_TAGS'¶
- CONST_NOTEBOOK_ENCODING = 'NOTEBOOK_ENCODING'¶
- CONST_NOTEBOOK_NAME = 'JOB_RUN_NOTEBOOK'¶
- CONST_OUTPUT_URI = 'OUTPUT_URI'¶
- RUNTIME_CLASS¶
alias of
NotebookRuntime
- SPEC_MAPPINGS = {'excludeTags': 'NOTEBOOK_EXCLUDE_TAGS', 'notebookEncoding': 'NOTEBOOK_ENCODING', 'outputUri': 'OUTPUT_URI'}¶
- class ads.jobs.builders.infrastructure.dsc_job_runtime.PyTorchDistributedRuntimeHandler(data_science_job)[source]¶
Bases:
PythonRuntimeHandler
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- CONST_COMMAND = 'OCI__LAUNCH_CMD'¶
- CONST_DEEPSPEED = 'OCI__DEEPSPEED'¶
- CONST_WORKER_COUNT = 'OCI__WORKER_COUNT'¶
- GIT_SPEC_MAPPINGS = {'OCI__RUNTIME_GIT_BRANCH': 'branch', 'OCI__RUNTIME_GIT_COMMIT': 'commit', 'OCI__RUNTIME_GIT_SECRET_ID': 'gitSecretId', 'OCI__RUNTIME_URI': 'url'}¶
- RUNTIME_CLASS¶
alias of
PyTorchDistributedRuntime
- SPEC_MAPPINGS = {'command': 'OCI__LAUNCH_CMD', 'entryFunction': 'ENTRY_FUNCTION', 'entrypoint': 'CODE_ENTRYPOINT', 'outputDir': 'OUTPUT_DIR', 'outputUri': 'OUTPUT_URI', 'pythonPath': 'PYTHON_PATH', 'workingDir': 'WORKING_DIR'}¶
- class ads.jobs.builders.infrastructure.dsc_job_runtime.PythonRuntimeHandler(data_science_job)[source]¶
Bases:
CondaRuntimeHandler
Runtime Handler for PythonRuntime
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- CONST_CODE_ENTRYPOINT = 'CODE_ENTRYPOINT'¶
- CONST_ENTRY_FUNCTION = 'ENTRY_FUNCTION'¶
- CONST_JOB_ENTRYPOINT = 'JOB_RUN_ENTRYPOINT'¶
- CONST_OUTPUT_DIR = 'OUTPUT_DIR'¶
- CONST_OUTPUT_URI = 'OUTPUT_URI'¶
- CONST_PYTHON_PATH = 'PYTHON_PATH'¶
- CONST_WORKING_DIR = 'WORKING_DIR'¶
- PATH_DELIMITER = ':'¶
- RUNTIME_CLASS¶
alias of
PythonRuntime
- SPEC_MAPPINGS = {'command': 'OCI__LAUNCH_CMD', 'entryFunction': 'ENTRY_FUNCTION', 'entrypoint': 'CODE_ENTRYPOINT', 'outputDir': 'OUTPUT_DIR', 'outputUri': 'OUTPUT_URI', 'pythonPath': 'PYTHON_PATH', 'workingDir': 'WORKING_DIR'}¶
- class ads.jobs.builders.infrastructure.dsc_job_runtime.RuntimeHandler(data_science_job)[source]¶
Bases:
object
Base class for Runtime Handler.
Each runtime handler should define the RUNTIME_CLASS to be the runtime it can handle.
Each runtime handler is initialized with a DataScienceJob instance. This instance is a reference, so modifications to it will be visible to users.
Each runtime handler exposes two methods: translate() and extract(). In this class, the method name signals the direction of the conversion. Methods whose names start with "translate" handle the conversion from an ADS runtime to the OCI API payload. Methods whose names start with "extract" handle the conversion from an OCI data science job to an ADS runtime. This base class defines the default handling for translate() and extract(). Each subclass can override the two methods to provide additional handling. Alternatively, a subclass can override a sub-method that is called by translate() or extract(). For example, _translate_env() handles the conversion of environment variables from an ADS runtime to the OCI API payload.
See the individual methods for more details.
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- extract(dsc_job)[source]¶
Extract the runtime from an OCI data science job object. This method calls the following sub-methods: _extract_tags(), _extract_args(), _extract_envs(), _extract_artifact() and _extract_runtime_minutes(). Each of these methods returns a dict specifying part of the runtime. The dictionaries are combined before initializing the runtime. A subclass can override one or more of these methods.
- class ads.jobs.builders.infrastructure.dsc_job_runtime.ScriptRuntimeHandler(data_science_job)[source]¶
Bases:
CondaRuntimeHandler
Runtime Handler for ScriptRuntime
Initialize the runtime handler.
- Parameters:
data_science_job (DataScienceJob) – An instance of the DataScienceJob to be created or extracted from.
- CONST_ENTRYPOINT = 'JOB_RUN_ENTRYPOINT'¶
- RUNTIME_CLASS¶
alias of
ScriptRuntime
ads.jobs.builders.infrastructure.utils module¶
- ads.jobs.builders.infrastructure.utils.get_value(obj, attr, default=None)[source]¶
Gets a copy of the value from a nested dictionary or an object with nested attributes.
- Parameters:
obj – An object or a dictionary
attr – Attributes as a string separated by dots (.)
default – Default value to be returned if attribute is not found.
- Returns:
A copy of the attribute value. For dict or list, a deepcopy will be returned.
- Return type:
Any
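The semantics described above can be approximated with a minimal sketch: walk a dot-separated path over dicts or object attributes, and deep-copy dict/list results so callers cannot mutate the source. This is an illustration of the documented behavior, not the actual ADS implementation.

```python
import copy


def get_value(obj, attr, default=None):
    """Fetch a nested value by a dot-separated path, copying mutables."""
    value = obj
    for key in attr.split("."):
        if hasattr(value, "get"):        # dict-like: look up the key
            value = value.get(key, default)
        elif hasattr(value, key):        # object: read the attribute
            value = getattr(value, key)
        else:
            return default
    if isinstance(value, (dict, list)):  # return a copy, not a reference
        return copy.deepcopy(value)
    return value
```

For example, `get_value({"a": {"b": [1, 2]}}, "a.b")` returns a new list, so appending to the result leaves the original dictionary unchanged.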