ads.jobs package

Subpackages

Submodules

ads.jobs.ads_job module

class ads.jobs.ads_job.Job(name: Optional[str] = None, infrastructure=None, runtime=None)

Bases: Builder

Represents a Job containing infrastructure and runtime.

Example

Here is an example for creating and running a job:

from ads.jobs import Job, DataScienceJob, ScriptRuntime
# Define an OCI Data Science job to run a python script
job = (
    Job(name="<job_name>")
    .with_infrastructure(
        DataScienceJob()
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        .with_block_storage_size(50)
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
    )
    .with_runtime(
        ScriptRuntime()
        .with_source("oci://bucket_name@namespace/path/to/script.py")
        .with_service_conda("tensorflow26_p37_cpu_v2")
        .with_environment_variable(ENV="value")
        .with_argument("argument", key="value")
        .with_freeform_tag(tag_name="tag_value")
    )
)
# Create and Run the job
run = job.create().run()
# Stream the job run outputs
run.watch()

If you are in an OCI notebook session and you would like to use the same infrastructure configurations, the infrastructure configuration can be simplified. Here is another example of creating and running a jupyter notebook as a job:

from ads.jobs import Job, DataScienceJob, NotebookRuntime
# Define an OCI Data Science job to run a jupyter Python notebook
job = (
    Job(name="<job_name>")
    .with_infrastructure(
        # The same configurations as the OCI notebook session will be used.
        DataScienceJob()
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
    )
    .with_runtime(
        NotebookRuntime()
        .with_notebook("path/to/notebook.ipynb")
        .with_service_conda("tensorflow26_p37_cpu_v2")
        # Saves the notebook with outputs to OCI object storage.
        .with_output("oci://bucket_name@namespace/path/to/dir")
    )
).create()
# Run and monitor the job
run = job.run().watch()
# Download the notebook and outputs to local directory
run.download(to_dir="path/to/local/dir/")

ads.jobs.cli module

ads.jobs.env_var_parser module

class ads.jobs.env_var_parser.EnvVarInterpolation

Bases: ExtendedInterpolation

before_get(parser, section, option, value, defaults)

before_read(parser, section, option, value)

before_set(parser, section: str, option: str, value: str) → str

before_write(parser, section, option, value)

ads.jobs.env_var_parser.escape(s: str) → str

ads.jobs.env_var_parser.parse(env_var: Union[Dict, List[dict]]) → dict

Parses the environment variables and performs substitutions. This also converts Kubernetes-style environment variables from a list to a dictionary.

Parameters:

env_var (dict or list) –

Environment variables specified as a list or a dictionary. If env_var is a list, it should be in the format of:

[{"name": "ENV_NAME_1", "value": "ENV_VALUE_1"}, {"name": "ENV_NAME_2", "value": "ENV_VALUE_2"}]

Returns:

Environment variable as a dictionary.

Return type:

dict
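
As a rough illustration of the two steps described above (list-to-dictionary conversion, then substitution), the sketch below uses only the standard library. It is not the ADS implementation: to_env_dict and substitute are hypothetical helper names, and the ${NAME} reference syntax is an assumption based on the ExtendedInterpolation base class of EnvVarInterpolation.

```python
from configparser import ConfigParser, ExtendedInterpolation
from typing import Dict, List, Union


def to_env_dict(env_var: Union[Dict[str, str], List[Dict[str, str]]]) -> Dict[str, str]:
    """Convert a Kubernetes-style list of {"name", "value"} items to a dict."""
    if isinstance(env_var, dict):
        return dict(env_var)
    return {item["name"]: item["value"] for item in env_var}


def substitute(env: Dict[str, str]) -> Dict[str, str]:
    """Resolve ${NAME} references between variables (assumed syntax)."""
    parser = ConfigParser(interpolation=ExtendedInterpolation())
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_dict({"env": env})
    return {name: parser.get("env", name) for name in env}


env = to_env_dict(
    [
        {"name": "APP_ROOT", "value": "/opt/app"},
        {"name": "APP_BIN", "value": "${APP_ROOT}/bin"},
    ]
)
resolved = substitute(env)
print(resolved)  # {'APP_ROOT': '/opt/app', 'APP_BIN': '/opt/app/bin'}
```

With the real package, the equivalent call would be ads.jobs.env_var_parser.parse(env_var) with the same list or dictionary input.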

ads.jobs.extension module

ads.jobs.extension.dataflow(line, cell=None)

ads.jobs.extension.dataflow_log(options, args)

ads.jobs.extension.dataflow_run(options, args, cell)

ads.jobs.extension.load_ipython_extension(ipython)

ads.jobs.serializer module

class ads.jobs.serializer.Serializable

Bases: ABC

Base class that represents a serializable item.

to_dict(self) → dict: Serializes the object into a dictionary.

from_dict(cls, obj_dict) → cls: Returns an instance of the class instantiated from the dictionary provided.

_write_to_file(s, uri, \*\*kwargs): Write string s into location specified by uri

_read_from_file(uri, \*\*kwargs): Returns contents from location specified by URI

to_json(self, uri=None, \*\*kwargs): Returns object serialized as a JSON string

from_json(cls, json_string=None, uri=None, \*\*kwargs): Creates an object from JSON string provided or from URI location containing JSON string

to_yaml(self, uri=None, \*\*kwargs): Returns object serialized as a YAML string

from_yaml(cls, yaml_string=None, uri=None, \*\*kwargs): Creates an object from YAML string provided or from URI location containing YAML string

from_string(cls, obj_string: str = None, uri=None, \*\*kwargs): Creates an object from string provided or from URI location containing string

abstract classmethod from_dict(obj_dict: dict)

Returns an instance of the class instantiated by the dictionary provided

Parameters:: obj_dict (dict) – Dictionary representation of the object

classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs)

Creates an object from JSON string provided or from URI location containing JSON string

Parameters:

json_string (string, optional) – JSON string. Defaults to None.
uri (string, optional) – URI location of file containing JSON string. Defaults to None.
decoder (callable, optional) – Custom decoder. Defaults to simple JSON decoder.
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this should be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.

Raises:

ValueError – Raised if neither string nor uri is provided

Returns:

Returns instance of the class

Return type:

cls

classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.cyaml.CSafeLoader'>, **kwargs)

Creates an object from string provided or from URI location containing string

Parameters:

obj_string (str, optional) – String representing the object
uri (str, optional) – URI location of file containing string. Defaults to None.
loader (callable, optional) – Custom YAML loader. Defaults to CLoader/SafeLoader.
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this should be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.

Returns:

Returns instance of the class

Return type:

cls

classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.cyaml.CSafeLoader'>, **kwargs)

Creates an object from YAML string provided or from URI location containing YAML string

Parameters:

yaml_string (string, optional) – YAML string. Defaults to None.
uri (string, optional) – URI location of file containing YAML string. Defaults to None.
loader (callable, optional) – Custom YAML loader. Defaults to CLoader/SafeLoader.
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this should be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.

Raises:

ValueError – Raised if neither string nor uri is provided

Returns:

Returns instance of the class

Return type:

cls

abstract to_dict() → dict: Serializes instance of class into a dictionary

to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) → str

Returns object serialized as a JSON string

Parameters:

uri (string, optional) – URI location to save the JSON string. Defaults to None.
encoder (callable, optional) – Encoder for custom data structures. Defaults to JSONEncoder.
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this should be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.

Returns:

Serialized version of object

Return type:

string

to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.cyaml.CSafeDumper'>, **kwargs) → str

Returns object serialized as a YAML string

Parameters:

uri (string, optional) – URI location to save the YAML string. Defaults to None.
dumper (callable, optional) – Custom YAML Dumper. Defaults to CDumper/SafeDumper.
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this should be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.

Returns:

Serialized version of object

Return type:

string
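
The relationship between the abstract to_dict/from_dict pair and the JSON helpers can be pictured with the following minimal sketch. It is an illustration only, not the ADS source: SerializableSketch and Point are hypothetical names, the built-in open() stands in for fsspec.open() (so only local paths work), and returning the string even when uri is given is an assumption.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class SerializableSketch(ABC):
    """Hypothetical stand-in showing how the JSON helpers can build on to_dict/from_dict."""

    @abstractmethod
    def to_dict(self) -> dict:
        """Serialize the instance into a dictionary."""

    @classmethod
    @abstractmethod
    def from_dict(cls, obj_dict: dict):
        """Build an instance from a dictionary."""

    def to_json(self, uri: Optional[str] = None, encoder=json.JSONEncoder) -> str:
        json_string = json.dumps(self.to_dict(), cls=encoder)
        if uri:
            # Assumption: a local file path; the documented class uses fsspec.open().
            with open(uri, "w") as f:
                f.write(json_string)
        return json_string

    @classmethod
    def from_json(cls, json_string: Optional[str] = None, uri: Optional[str] = None,
                  decoder=json.JSONDecoder):
        if json_string is not None:
            return cls.from_dict(json.loads(json_string, cls=decoder))
        if uri is not None:
            with open(uri) as f:
                return cls.from_dict(json.load(f, cls=decoder))
        raise ValueError("Provide either json_string or uri.")


class Point(SerializableSketch):
    """Tiny concrete subclass used only to exercise the sketch."""

    def __init__(self, x: int, y: int):
        self.x, self.y = x, y

    def to_dict(self) -> dict:
        return {"x": self.x, "y": self.y}

    @classmethod
    def from_dict(cls, obj_dict: dict):
        return cls(**obj_dict)


original = Point(1, 2)
restored = Point.from_json(original.to_json())
print(restored.to_dict())  # {'x': 1, 'y': 2}
```

A subclass only has to implement the two dictionary methods; the string and file handling is inherited, which is the same division of labour the method list above describes.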

ads.jobs.utils module

class ads.jobs.utils.DataFlowConfig(path: Optional[str] = None, oci_profile: Optional[str] = None)

Bases: Application

Create a DataFlowConfig object. If a path to config file is given it is loaded from the path.

Parameters:

path (str, optional) – path to configuration file, by default None
oci_profile (str, optional) – oci profile to use, by default None

LANGUAGE_JAVA = 'JAVA': A constant which can be used with the language property of an Application. This constant has a value of “JAVA”

LANGUAGE_PYTHON = 'PYTHON': A constant which can be used with the language property of an Application. This constant has a value of “PYTHON”

LANGUAGE_SCALA = 'SCALA': A constant which can be used with the language property of an Application. This constant has a value of “SCALA”

LANGUAGE_SQL = 'SQL': A constant which can be used with the language property of an Application. This constant has a value of “SQL”

LIFECYCLE_STATE_ACTIVE = 'ACTIVE': A constant which can be used with the lifecycle_state property of an Application. This constant has a value of “ACTIVE”

LIFECYCLE_STATE_DELETED = 'DELETED': A constant which can be used with the lifecycle_state property of an Application. This constant has a value of “DELETED”

LIFECYCLE_STATE_INACTIVE = 'INACTIVE': A constant which can be used with the lifecycle_state property of an Application. This constant has a value of “INACTIVE”

TYPE_BATCH = 'BATCH': A constant which can be used with the type property of an Application. This constant has a value of “BATCH”

TYPE_SESSION = 'SESSION': A constant which can be used with the type property of an Application. This constant has a value of “SESSION”

TYPE_STREAMING = 'STREAMING': A constant which can be used with the type property of an Application. This constant has a value of “STREAMING”

property application_log_config

Gets the application_log_config of this Application.

Returns:: The application_log_config of this Application.
Return type:: oci.data_flow.models.ApplicationLogConfig

property archive_bucket

Bucket to save archive zip. Also accepts a prefix in the format of oci://<bucket-name>@<namespace>/<prefix>.

Returns:: archive bucket (path)
Return type:: str
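
The oci://<bucket-name>@<namespace>/<prefix> form accepted by this property can be taken apart with the standard library, as in the sketch below. The helper split_oci_uri, the OciPath tuple, and the sample bucket and namespace values are hypothetical and are not part of ads.jobs.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class OciPath(NamedTuple):
    bucket: str
    namespace: str
    prefix: str


def split_oci_uri(uri: str) -> OciPath:
    """Split oci://<bucket-name>@<namespace>/<prefix> into its three parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(f"Expected oci://<bucket-name>@<namespace>/<prefix>, got: {uri}")
    # Split netloc by hand: urlparse's hostname attribute would lowercase the namespace.
    bucket, _, namespace = parsed.netloc.partition("@")
    return OciPath(bucket, namespace, parsed.path.lstrip("/"))


path = split_oci_uri("oci://my-bucket@my-namespace/archives/job1")
print(path.bucket, path.namespace, path.prefix)  # my-bucket my-namespace archives/job1
```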

property archive_uri

Gets the archive_uri of this Application. A comma separated list of one or more archive files as Oracle Cloud Infrastructure URIs. For example, oci://path/to/a.zip,oci://path/to/b.zip. An Oracle Cloud Infrastructure URI of an archive.zip file containing custom dependencies that may be used to support the execution of a Python, Java, or Scala application. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.

Returns:: The archive_uri of this Application.
Return type:: str

property arguments

Gets the arguments of this Application. The arguments passed to the running application as command line arguments. An argument is either plain text or a placeholder. Placeholders are replaced using values from the parameters map. Each placeholder specified must be represented in the parameters map, else the request (POST or PUT) will fail with an HTTP 400 status code. Placeholders are specified as ${name}, where name is the name of the parameter. Example: ["--input", "${input_file}", "--name", "John Doe"] If "input_file" has a value of "mydata.xml", then the value above will be translated to --input mydata.xml --name "John Doe"

Returns:: The arguments of this Application.
Return type:: list[str]
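
To make the placeholder rule concrete, the sketch below mimics the substitution locally, following the ${input_file} example above. It only approximates the behaviour described: the actual replacement happens inside the Data Flow service, resolve_arguments is a hypothetical helper, and string.Template is a stand-in whose edge-case rules may differ from the service's. A missing parameter raises ValueError here, where the service is documented to answer with an HTTP 400.

```python
from string import Template
from typing import Dict, List


def resolve_arguments(arguments: List[str], parameters: Dict[str, str]) -> List[str]:
    """Replace ${name} placeholders in each argument with values from parameters."""
    resolved = []
    for argument in arguments:
        try:
            resolved.append(Template(argument).substitute(parameters))
        except KeyError as missing:
            # Local analogue of the documented failure for an unmapped placeholder.
            raise ValueError(f"No parameter supplied for placeholder {missing}") from None
    return resolved


arguments = ["--input", "${input_file}", "--name", "John Doe"]
parameters = {"input_file": "mydata.xml"}
print(resolve_arguments(arguments, parameters))
# ['--input', 'mydata.xml', '--name', 'John Doe']
```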

property class_name

Gets the class_name of this Application. The class for the application.

Returns:: The class_name of this Application.
Return type:: str

property compartment_id

[Required] Gets the compartment_id of this Application. The OCID of a compartment.

Returns:: The compartment_id of this Application.
Return type:: str

property configuration

Gets the configuration of this Application. The Spark configuration passed to the running process. See https://spark.apache.org/docs/latest/configuration.html#available-properties. Example: { “spark.app.name” : “My App Name”, “spark.shuffle.io.maxRetries” : “4” } Note: Not all Spark properties are permitted to be set. Attempting to set a property that is not allowed to be overwritten will cause a 400 status to be returned.

Returns:: The configuration of this Application.
Return type:: dict(str, str)

property defined_tags

Gets the defined_tags of this Application. Defined tags for this resource. Each key is predefined and scoped to a namespace. For more information, see Resource Tags. Example: {“Operations”: {“CostCenter”: “42”}}

Returns:: The defined_tags of this Application.
Return type:: dict(str, dict(str, object))

property description

Gets the description of this Application. A user-friendly description.

Returns:: The description of this Application.
Return type:: str

property display_name

[Required] Gets the display_name of this Application. A user-friendly name. This name is not necessarily unique.

Returns:: The display_name of this Application.
Return type:: str

property driver_shape

[Required] Gets the driver_shape of this Application. The VM shape for the driver. Sets the driver cores and memory.

Returns:: The driver_shape of this Application.
Return type:: str

property driver_shape_config

Gets the driver_shape_config of this Application.

Returns:: The driver_shape_config of this Application.
Return type:: oci.data_flow.models.ShapeConfig

property execute

Gets the execute of this Application. The input used for spark-submit command. For more details see https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit. Supported options include --class, --files, --jars, --conf, --py-files, and main application file with arguments. Example: --jars oci://path/to/a.jar,oci://path/to/b.jar --files oci://path/to/a.json,oci://path/to/b.csv --py-files oci://path/to/a.py,oci://path/to/b.py --conf spark.sql.crossJoin.enabled=true --class org.apache.spark.examples.SparkPi oci://path/to/main.jar 10 Note: If execute is specified together with applicationId, className, configuration, fileUri, language, arguments, parameters during application create/update, or run create/submit, Data Flow service will use derived information from execute input only.

Returns:: The execute of this Application.
Return type:: str

property executor_shape

[Required] Gets the executor_shape of this Application. The VM shape for the executors. Sets the executor cores and memory.

Returns:: The executor_shape of this Application.
Return type:: str

property executor_shape_config

Gets the executor_shape_config of this Application.

Returns:: The executor_shape_config of this Application.
Return type:: oci.data_flow.models.ShapeConfig

property file_uri

[Required] Gets the file_uri of this Application. An Oracle Cloud Infrastructure URI of the file containing the application to execute. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.

Returns:: The file_uri of this Application.
Return type:: str

property freeform_tags

Gets the freeform_tags of this Application. Free-form tags for this resource. Each tag is a simple key-value pair with no predefined name, type, or namespace. For more information, see Resource Tags. Example: {“Department”: “Finance”}

Returns:: The freeform_tags of this Application.
Return type:: dict(str, str)

property id

[Required] Gets the id of this Application. The application ID.

Returns:: The id of this Application.
Return type:: str

property idle_timeout_in_minutes

Gets the idle_timeout_in_minutes of this Application. The timeout value in minutes used to manage Runs. A Run would be stopped after inactivity for this amount of time period. Note: This parameter is currently only applicable for Runs of type SESSION. Default value is 2880 minutes (2 days)

Returns:: The idle_timeout_in_minutes of this Application.
Return type:: int

property language

[Required] Gets the language of this Application. The Spark language.

Allowed values for this property are: “SCALA”, “JAVA”, “PYTHON”, “SQL”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.

Returns:: The language of this Application.
Return type:: str

property lifecycle_state

[Required] Gets the lifecycle_state of this Application. The current state of this application.

Allowed values for this property are: “ACTIVE”, “DELETED”, “INACTIVE”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.

Returns:: The lifecycle_state of this Application.
Return type:: str

property logs_bucket_uri

Gets the logs_bucket_uri of this Application. An Oracle Cloud Infrastructure URI of the bucket where the Spark job logs are to be uploaded. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.

Returns:: The logs_bucket_uri of this Application.
Return type:: str

property max_duration_in_minutes

Gets the max_duration_in_minutes of this Application. The maximum duration in minutes for which an Application should run. Data Flow Run would be terminated once it reaches this duration from the time it transitions to IN_PROGRESS state.

Returns:: The max_duration_in_minutes of this Application.
Return type:: int

property metastore_id

Gets the metastore_id of this Application. The OCID of OCI Hive Metastore.

Returns:: The metastore_id of this Application.
Return type:: str

property num_executors

[Required] Gets the num_executors of this Application. The number of executor VMs requested.

Returns:: The num_executors of this Application.
Return type:: int

property owner_principal_id

[Required] Gets the owner_principal_id of this Application. The OCID of the user who created the resource.

Returns:: The owner_principal_id of this Application.
Return type:: str

property owner_user_name

Gets the owner_user_name of this Application. The username of the user who created the resource. If the username of the owner does not exist, null will be returned and the caller should refer to the ownerPrincipalId value instead.

Returns:: The owner_user_name of this Application.
Return type:: str

property parameters

Gets the parameters of this Application. An array of name/value pairs used to fill placeholders found in properties like Application.arguments. The name must be a string of one or more word characters (a-z, A-Z, 0-9, _). The value can be a string of 0 or more characters of any kind. Example: [ { name: “iterations”, value: “10”}, { name: “input_file”, value: “mydata.xml” }, { name: “variable_x”, value: “${x}”} ]

Returns:: The parameters of this Application.
Return type:: list[oci.data_flow.models.ApplicationParameter]

property private_endpoint_id

Gets the private_endpoint_id of this Application. The OCID of a private endpoint.

Returns:: The private_endpoint_id of this Application.
Return type:: str

property script_bucket

Bucket to save user script. Also accepts a prefix in the format of oci://<bucket-name>@<namespace>/<prefix>.

Returns:: script bucket (path)
Return type:: str

property spark_version

[Required] Gets the spark_version of this Application. The Spark version utilized to run the application.

Returns:: The spark_version of this Application.
Return type:: str

property time_created

[Required] Gets the time_created of this Application. The date and time an application was created, expressed in RFC 3339 timestamp format. Example: 2018-04-03T21:10:29.600Z

Returns:: The time_created of this Application.
Return type:: datetime
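
The property itself already returns a datetime. When the same value is encountered as a string, for example in serialized output, a timestamp in the form shown above can be parsed with the standard library. This is a small sketch assuming Python 3.7 or later, where %z accepts the trailing Z; the variable names are illustrative.

```python
from datetime import datetime, timezone

stamp = "2018-04-03T21:10:29.600Z"  # example value from the property description

# %f reads the fractional seconds, %z reads the "Z" suffix as UTC.
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")

print(parsed.isoformat())  # 2018-04-03T21:10:29.600000+00:00
```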

property time_updated

[Required] Gets the time_updated of this Application. The date and time an application was updated, expressed in RFC 3339 timestamp format. Example: 2018-04-03T21:10:29.600Z

Returns:: The time_updated of this Application.
Return type:: datetime

property type

Gets the type of this Application. The Spark application processing type.

Allowed values for this property are: “BATCH”, “STREAMING”, “SESSION”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.

Returns:: The type of this Application.
Return type:: str

property warehouse_bucket_uri

Gets the warehouse_bucket_uri of this Application. An Oracle Cloud Infrastructure URI of the bucket to be used as default warehouse directory for BATCH SQL runs. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.

Returns:: The warehouse_bucket_uri of this Application.
Return type:: str

ads.jobs.utils.get_dataflow_config(path=None, oci_profile=None)
