ads.jobs package
Subpackages
Submodules
ads.jobs.ads_job module
- class ads.jobs.ads_job.Job(name: Optional[str] = None, infrastructure=None, runtime=None)
Bases:
Builder
Represents a Job defined by infrastructure and runtime.
Examples
Here is an example for creating and running a job:
    from ads.jobs import Job, DataScienceJob, PythonRuntime

    # Define an OCI Data Science job to run a python script
    job = (
        Job(name="<job_name>")
        .with_infrastructure(
            DataScienceJob()
            # Configure logging for getting the job run outputs.
            .with_log_group_id("<log_group_ocid>")
            # Log resource will be auto-generated if log ID is not specified.
            .with_log_id("<log_ocid>")
            # If you are in an OCI data science notebook session,
            # the following configurations are not required.
            # Configurations from the notebook session will be used as defaults.
            .with_compartment_id("<compartment_ocid>")
            .with_project_id("<project_ocid>")
            .with_subnet_id("<subnet_ocid>")
            .with_shape_name("VM.Standard.E3.Flex")
            # Shape config details are applicable only for the flexible shapes.
            .with_shape_config_details(memory_in_gbs=16, ocpus=1)
            # Minimum/Default block storage size is 50 (GB).
            .with_block_storage_size(50)
        )
        .with_runtime(
            PythonRuntime()
            # Specify the service conda environment by slug name.
            .with_service_conda("pytorch110_p38_cpu_v1")
            # The job artifact can be a single Python script, a directory or a zip file.
            .with_source("local/path/to/code_dir")
            # Environment variable
            .with_environment_variable(NAME="Welcome to OCI Data Science.")
            # Command line argument, arg1 --key arg2
            .with_argument("arg1", key="arg2")
            # Set the working directory
            # When using a directory as source, the default working dir is the parent of code_dir.
            # Working dir should be a relative path beginning from the source directory (code_dir)
            .with_working_dir("code_dir")
            # The entrypoint is applicable only to directory or zip file as source
            # The entrypoint should be a path relative to the working dir.
            # Here my_script.py is a file in the code_dir/my_package directory
            .with_entrypoint("my_package/my_script.py")
            # Add an additional Python path, relative to the working dir (code_dir/other_packages).
            .with_python_path("other_packages")
            # Copy files in "code_dir/output" to object storage after job finishes.
            .with_output("output", "oci://bucket_name@namespace/path/to/dir")
        )
    )

    # Create and Run the job
    run = job.create().run()
    # Stream the job run outputs
    run.watch()
If you are in an OCI notebook session and you would like to use the same infrastructure configurations, the infrastructure configuration can be simplified. Here is another example of creating and running a jupyter notebook as a job:
    from ads.jobs import Job, DataScienceJob, NotebookRuntime

    # Define an OCI Data Science job to run a jupyter Python notebook
    job = (
        Job(name="<job_name>")
        .with_infrastructure(
            # The same configurations as the OCI notebook session will be used.
            DataScienceJob()
            .with_log_group_id("<log_group_ocid>")
            .with_log_id("<log_ocid>")
        )
        .with_runtime(
            NotebookRuntime()
            .with_notebook("path/to/notebook.ipynb")
            .with_service_conda("tensorflow28_p38_cpu_v1")
            # Saves the notebook with outputs to OCI object storage.
            .with_output("oci://bucket_name@namespace/path/to/dir")
        )
    ).create()

    # Run and monitor the job
    run = job.run().watch()
    # Download the notebook and outputs to local directory
    run.download(to_dir="path/to/local/dir/")
See also
https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/jobs/index.html
Initializes a job.
- The infrastructure and runtime can be configured when initializing the job,
or by calling with_infrastructure() and with_runtime().
The infrastructure should be a subclass of ADS job Infrastructure, e.g., DataScienceJob, DataFlow. The runtime should be a subclass of ADS job Runtime, e.g., PythonRuntime, NotebookRuntime.
- Parameters:
name (str, optional) – The name of the job, by default None. If it is None, a default name may be generated by the infrastructure, depending on the implementation of the infrastructure. For an OCI data science job, the default name contains the job artifact name and a timestamp. If there is no artifact, a random, easy-to-remember name with a timestamp is generated, like ‘strange-spider-2022-08-17-23:55.02’.
infrastructure (Infrastructure, optional) – Job infrastructure, by default None
runtime (Runtime, optional) – Job runtime, by default None.
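The default-name format described above can be illustrated with a small sketch. This is not the ADS implementation: the word lists and the `default_job_name` helper are hypothetical, and only the timestamp layout is taken from the example name ‘strange-spider-2022-08-17-23:55.02’.

```python
import random
import re
from datetime import datetime

# Hypothetical word lists; the actual vocabulary used by the service is not documented here.
ADJECTIVES = ["strange", "gentle", "brave"]
NOUNS = ["spider", "falcon", "otter"]

# Shape of the example name in the text: <word>-<word>-YYYY-MM-DD-HH:MM.SS
NAME_PATTERN = re.compile(r"[a-z]+-[a-z]+-\d{4}-\d{2}-\d{2}-\d{2}:\d{2}\.\d{2}")


def default_job_name(now: datetime, rng: random.Random) -> str:
    """Builds an easy-to-remember name followed by a timestamp."""
    words = f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}"
    return f"{words}-{now.strftime('%Y-%m-%d-%H:%M.%S')}"


name = default_job_name(datetime(2022, 8, 17, 23, 55, 2), random.Random(0))
print(name)
```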
- attribute_map = {}
- create(**kwargs) Job
Creates the job on the infrastructure.
- Returns:
The job instance (self)
- Return type:
Job
- static dataflow_job(compartment_id: Optional[str] = None, **kwargs) List[Job]
List data flow jobs under a given compartment.
- Parameters:
compartment_id (str) – compartment id
kwargs – additional keyword arguments
- Returns:
list of Job instances
- Return type:
List[Job]
- static datascience_job(compartment_id: Optional[str] = None, **kwargs) List[DataScienceJob]
Lists the existing data science jobs in the compartment.
- Parameters:
compartment_id (str) – The compartment ID for listing the jobs. This is optional if running in an OCI notebook session. The jobs in the same compartment of the notebook session will be returned.
- Returns:
A list of Job objects.
- Return type:
list
- delete() None
Deletes the job from the infrastructure.
- download(to_dir: str, output_uri=None, **storage_options)
Downloads files from remote output URI to local.
- Parameters:
to_dir (str) – Local directory to which the files will be downloaded.
output_uri (str, optional) – The remote URI from which the files will be downloaded, by default None. If output_uri is not specified, this method will try to get the output_uri from the runtime.
storage_options – Extra keyword arguments for the particular storage connection. This method uses fsspec to download the files from the remote URI. storage_options will be passed into fsspec.open_files().
- Returns:
The job instance (self)
- Return type:
Job
- Raises:
AttributeError – The output_uri is not specified and the runtime is not configured with output_uri.
- static from_dataflow_job(job_id: str) Job
Create a Data Flow job given a job id.
- Parameters:
job_id (str) – id of the job
- Returns:
a Job instance
- Return type:
Job
- static from_datascience_job(job_id) Job
Loads a data science job from OCI.
- Parameters:
job_id (str) – OCID of an existing data science job.
- Returns:
A job instance.
- Return type:
Job
- classmethod from_dict(config: dict) Job
Initializes a job from a dictionary containing the configurations.
- Parameters:
config (dict) – A dictionary containing the infrastructure and runtime specifications.
- Returns:
A job instance
- Return type:
Job
- Raises:
NotImplementedError – If the type of the infrastructure or runtime is not supported.
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from a JSON string or from a URI location containing a JSON string.
- Parameters:
json_string (str, optional) – JSON string, by default None.
uri (str, optional) – URI location of the file containing the JSON string, by default None.
decoder (callable, optional) – Decoder for custom data structures, by default json.JSONDecoder.
kwargs (dict) – Keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections, consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Both json_string and uri are empty, or the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Type[Self]
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property id: str
The ID of the job. For jobs running on OCI, this is the OCID.
- Returns:
ID of the job.
- Return type:
str
- property infrastructure: Union[DataScienceJob, DataFlow]
The job infrastructure.
- Returns:
Job infrastructure.
- Return type:
Union[DataScienceJob, DataFlow]
- property kind: str
The kind of the object as shown in YAML.
- Returns:
“job”
- Return type:
str
- property name: str
The name of the job. For jobs running on OCI, this is the display name.
- Returns:
The name of the job.
- Return type:
str
- run(name=None, args=None, env_var=None, freeform_tags=None, wait=False) Union[DataScienceJobRun, DataFlowRun]
Runs the job.
- Parameters:
name (str, optional) – Name of the job run, by default None. The infrastructure handles the naming of the job run. For a data science job, if a name is not provided, a default name will be generated containing the job name and the timestamp of the run. If there is no artifact, a random, easy-to-remember name with a timestamp is generated, like ‘strange-spider-2022-08-17-23:55.02’.
args (str, optional) – Command line arguments for the job run, by default None. This will override the configurations on the job. If this is None, the args from the job configuration will be used.
env_var (dict, optional) – Additional environment variables for the job run, by default None
freeform_tags (dict, optional) – Freeform tags for the job run, by default None
wait (bool, optional) – Indicate if this method call should wait for the job run. By default False, this method returns as soon as the job run is created. If this is set to True, this method will stream the job logs and wait until it finishes, similar to job.run().watch().
- Returns:
A job run instance, depending on the infrastructure.
- Return type:
Job Run Instance
Examples
To run a job and override the configurations:
    job_run = job.run(
        name="<my_job_run_name>",
        args="new_arg --new_key new_val",
        env_var={"new_env": "new_val"},
        freeform_tags={"new_tag": "new_tag_val"}
    )
- run_list(**kwargs) list
Gets a list of runs of the job.
- Returns:
A list of job run instances, the actual object type depends on the infrastructure.
- Return type:
list
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
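Returning self from a setter is what makes calls like `.with_infrastructure(...).with_runtime(...)` chainable. The following is a minimal sketch of that pattern, assuming a plain dictionary as the backing store; `SpecBuilder` is a hypothetical stand-in and not the ADS Builder class.

```python
from typing import Any, Dict, Optional


class SpecBuilder:
    """Hypothetical stand-in that keeps properties in a spec dictionary."""

    def __init__(self) -> None:
        self._spec: Dict[str, Any] = {}

    def set_spec(self, k: str, v: Any) -> "SpecBuilder":
        # Store the property and return self so calls can be chained.
        self._spec[k] = v
        return self

    def get_spec(self, key: str, default: Optional[Any] = None) -> Any:
        # Fall back to the default when the property does not exist.
        return self._spec.get(key, default)


builder = (
    SpecBuilder()
    .set_spec("name", "my-job")
    .set_spec("shape", "VM.Standard.E3.Flex")
)
print(builder.get_spec("name"), builder.get_spec("missing", "fallback"))
```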
- status() str
Status of the job
- Returns:
Status of the job
- Return type:
str
- to_dict() dict
Serialize the job specifications to a dictionary.
- Returns:
A dictionary containing job specifications.
- Return type:
dict
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as shown in YAML.
This implementation returns the class name with the first letter converted to lower case.
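The naming rule can be sketched in a few lines. The `type_name` helper and the placeholder class below are hypothetical and only illustrate the rule as described; they are not taken from the ADS source.

```python
def type_name(cls: type) -> str:
    """Returns the class name with only its first letter lower-cased."""
    name = cls.__name__
    return name[:1].lower() + name[1:]


class DataScienceJob:
    """Empty placeholder used for illustration; not the ads.jobs class."""


result = type_name(DataScienceJob)
print(result)
```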
- with_infrastructure(infrastructure) Job
Sets the infrastructure for the job.
- Parameters:
infrastructure (Infrastructure) – Job infrastructure.
- Returns:
The job instance (self)
- Return type:
Job
ads.jobs.cli module
ads.jobs.env_var_parser module
- class ads.jobs.env_var_parser.EnvVarInterpolation
Bases:
ExtendedInterpolation
Modified version of ExtendedInterpolation to ignore errors
https://github.com/python/cpython/blob/main/Lib/configparser.py
- before_get(parser, section, option, value, defaults)
- before_read(parser, section, option, value)
- before_set(parser, section: str, option: str, value: str) str
- before_write(parser, section, option, value)
- ads.jobs.env_var_parser.escape(s: str) str
- ads.jobs.env_var_parser.parse(env_var: Union[Dict, List[dict]]) dict
Parses the environment variables and performs substitutions. This also converts Kubernetes-style environment variables from a list to a dictionary.
- Parameters:
env_var (dict or list) –
Environment variables specified as a list or a dictionary. If env_var is a list, it should be in the format of:
[{"name": "ENV_NAME_1", "value": "ENV_VALUE_1"}, {"name": "ENV_NAME_2", "value": "ENV_VALUE_2"}]
- Returns:
Environment variable as a dictionary.
- Return type:
dict
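As a rough illustration of the list-to-dictionary conversion and of substitution that tolerates unknown names, here is a sketch that uses `string.Template.safe_substitute` as a stand-in. The actual module is built on a modified `configparser.ExtendedInterpolation`, so the `parse_env` helper below is an assumption-laden approximation and its behavior may differ in detail.

```python
from string import Template
from typing import Dict, List, Union


def parse_env(env_var: Union[Dict[str, str], List[Dict[str, str]]]) -> Dict[str, str]:
    """Hypothetical approximation of ads.jobs.env_var_parser.parse."""
    if isinstance(env_var, list):
        # Kubernetes style: [{"name": ..., "value": ...}, ...]
        env = {item["name"]: item["value"] for item in env_var}
    else:
        env = dict(env_var)
    # safe_substitute leaves unknown ${NAME} placeholders untouched instead of raising.
    return {key: Template(value).safe_substitute(env) for key, value in env.items()}


parsed = parse_env(
    [
        {"name": "ROOT", "value": "/opt"},
        {"name": "BIN", "value": "${ROOT}/bin"},
        {"name": "OTHER", "value": "${MISSING}"},
    ]
)
print(parsed)
```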
ads.jobs.extension module
- ads.jobs.extension.dataflow(line, cell=None)
- ads.jobs.extension.dataflow_log(options, args)
- ads.jobs.extension.dataflow_run(options, args, cell)
- ads.jobs.extension.load_ipython_extension(ipython)
ads.jobs.serializer module
- ads.jobs.serializer.Self
Special type to represent the current enclosed class.
This type is used by factory class methods or when a method returns self.
alias of TypeVar(‘Self’, bound=Serializable)
- class ads.jobs.serializer.Serializable
Bases:
ABC
Base class that represents a serializable item.
- abstract classmethod from_dict(obj_dict: dict)
Initialize an instance of the class from a dictionary
- Parameters:
obj_dict (dict) – Dictionary representation of the object
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from a JSON string or from a URI location containing a JSON string.
- Parameters:
json_string (str, optional) – JSON string, by default None.
uri (str, optional) – URI location of the file containing the JSON string, by default None.
decoder (callable, optional) – Decoder for custom data structures, by default json.JSONDecoder.
kwargs (dict) – Keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections, consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Both json_string and uri are empty, or the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Type[Self]
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- abstract to_dict() dict
Serializes an instance of class into a dictionary
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
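The division of labor in this class (subclasses supply `to_dict` and `from_dict`, the base class layers string formats on top) can be sketched with the standard library alone. `JsonSerializable` and `Point` below are hypothetical; the sketch covers only the in-memory JSON path and omits the uri, decoder, and YAML handling of the real class.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class JsonSerializable(ABC):
    """Hypothetical, reduced analogue of a serializable base class."""

    @classmethod
    @abstractmethod
    def from_dict(cls, obj_dict: dict):
        """Builds an instance from a dictionary."""

    @abstractmethod
    def to_dict(self) -> dict:
        """Serializes the instance into a dictionary."""

    def to_json(self) -> str:
        return json.dumps(self.to_dict())

    @classmethod
    def from_json(cls, json_string: Optional[str] = None):
        if not json_string:
            raise ValueError("json_string is empty")
        # json.JSONDecodeError is a subclass of ValueError, so invalid JSON also raises ValueError.
        return cls.from_dict(json.loads(json_string))


class Point(JsonSerializable):
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    @classmethod
    def from_dict(cls, obj_dict: dict) -> "Point":
        return cls(obj_dict["x"], obj_dict["y"])

    def to_dict(self) -> dict:
        return {"x": self.x, "y": self.y}


restored = Point.from_json(Point(1, 2).to_json())
print(restored.to_dict())
```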
ads.jobs.utils module
- class ads.jobs.utils.DataFlowConfig(path: Optional[str] = None, oci_profile: Optional[str] = None)
Bases:
Application
Create a DataFlowConfig object. If a path to config file is given it is loaded from the path.
- Parameters:
path (str, optional) – path to configuration file, by default None
oci_profile (str, optional) – oci profile to use, by default None
- LANGUAGE_JAVA = 'JAVA'
A constant which can be used with the language property of an Application. This constant has a value of “JAVA”
- LANGUAGE_PYTHON = 'PYTHON'
A constant which can be used with the language property of an Application. This constant has a value of “PYTHON”
- LANGUAGE_SCALA = 'SCALA'
A constant which can be used with the language property of an Application. This constant has a value of “SCALA”
- LANGUAGE_SQL = 'SQL'
A constant which can be used with the language property of an Application. This constant has a value of “SQL”
- LIFECYCLE_STATE_ACTIVE = 'ACTIVE'
A constant which can be used with the lifecycle_state property of an Application. This constant has a value of “ACTIVE”
- LIFECYCLE_STATE_DELETED = 'DELETED'
A constant which can be used with the lifecycle_state property of an Application. This constant has a value of “DELETED”
- LIFECYCLE_STATE_INACTIVE = 'INACTIVE'
A constant which can be used with the lifecycle_state property of an Application. This constant has a value of “INACTIVE”
- TYPE_BATCH = 'BATCH'
A constant which can be used with the type property of an Application. This constant has a value of “BATCH”
- TYPE_SESSION = 'SESSION'
A constant which can be used with the type property of an Application. This constant has a value of “SESSION”
- TYPE_STREAMING = 'STREAMING'
A constant which can be used with the type property of an Application. This constant has a value of “STREAMING”
- property application_log_config
Gets the application_log_config of this Application.
- Returns:
The application_log_config of this Application.
- Return type:
oci.data_flow.models.ApplicationLogConfig
- property archive_bucket
Bucket to save archive zip. Also accepts a prefix in the format of oci://<bucket-name>@<namespace>/<prefix>.
- Returns:
archive bucket (path)
- Return type:
str
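To make the oci://<bucket-name>@<namespace>/<prefix> layout concrete, the sketch below splits such a URI into its parts with the standard library. `OciPath` and `parse_oci_uri` are hypothetical names for illustration; ADS has its own URI handling, which this does not reproduce.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class OciPath(NamedTuple):
    bucket: str
    namespace: str
    prefix: str


def parse_oci_uri(uri: str) -> OciPath:
    """Splits oci://<bucket-name>@<namespace>/<prefix> into its components."""
    parsed = urlparse(uri)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(f"Not an OCI object storage URI: {uri}")
    # Split the authority manually so the namespace keeps its original case.
    bucket, namespace = parsed.netloc.split("@", 1)
    return OciPath(bucket, namespace, parsed.path.lstrip("/"))


path = parse_oci_uri("oci://my-bucket@my-namespace/archives/v1")
print(path)
```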
- property archive_uri
Gets the archive_uri of this Application. A comma separated list of one or more archive files as Oracle Cloud Infrastructure URIs. For example, oci://path/to/a.zip,oci://path/to/b.zip. An Oracle Cloud Infrastructure URI of an archive.zip file containing custom dependencies that may be used to support the execution of a Python, Java, or Scala application. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.
- Returns:
The archive_uri of this Application.
- Return type:
str
- property arguments
Gets the arguments of this Application. The arguments passed to the running application as command line arguments. An argument is either plain text or a placeholder. Placeholders are replaced using values from the parameters map. Each placeholder specified must be represented in the parameters map, else the request (POST or PUT) will fail with an HTTP 400 status code. Placeholders are specified as ${name}, where name is the name of the parameter. Example: [ "--input", "${input_file}", "--name", "John Doe" ] If "input_file" has a value of "mydata.xml", then the value above will be translated to --input mydata.xml --name "John Doe"
- Returns:
The arguments of this Application.
- Return type:
list[str]
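The placeholder substitution described for arguments can be sketched locally as follows. The substitution itself is performed by the Data Flow service; `fill_arguments` is a hypothetical helper, and raising KeyError here merely mirrors the documented rule that every placeholder must be present in the parameters map.

```python
import re
from typing import Dict, List

# Matches placeholders of the form ${name}, where name consists of word characters.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def fill_arguments(arguments: List[str], parameters: Dict[str, str]) -> List[str]:
    """Replaces ${name} placeholders with values from the parameters map."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in parameters:
            # The service would reject such a request with an HTTP 400 status code.
            raise KeyError(f"No parameter supplied for placeholder: {name}")
        return parameters[name]

    return [_PLACEHOLDER.sub(replace, argument) for argument in arguments]


filled = fill_arguments(
    ["--input", "${input_file}", "--name", "John Doe"],
    {"input_file": "mydata.xml"},
)
print(filled)
```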
- property class_name
Gets the class_name of this Application. The class for the application.
- Returns:
The class_name of this Application.
- Return type:
str
- property compartment_id
[Required] Gets the compartment_id of this Application. The OCID of a compartment.
- Returns:
The compartment_id of this Application.
- Return type:
str
- property configuration
Gets the configuration of this Application. The Spark configuration passed to the running process. See https://spark.apache.org/docs/latest/configuration.html#available-properties. Example: { “spark.app.name” : “My App Name”, “spark.shuffle.io.maxRetries” : “4” } Note: Not all Spark properties are permitted to be set. Attempting to set a property that is not allowed to be overwritten will cause a 400 status to be returned.
- Returns:
The configuration of this Application.
- Return type:
dict(str, str)
- property defined_tags
Gets the defined_tags of this Application. Defined tags for this resource. Each key is predefined and scoped to a namespace. For more information, see Resource Tags. Example: {“Operations”: {“CostCenter”: “42”}}
- Returns:
The defined_tags of this Application.
- Return type:
dict(str, dict(str, object))
- property description
Gets the description of this Application. A user-friendly description.
- Returns:
The description of this Application.
- Return type:
str
- property display_name
[Required] Gets the display_name of this Application. A user-friendly name. This name is not necessarily unique.
- Returns:
The display_name of this Application.
- Return type:
str
- property driver_shape
[Required] Gets the driver_shape of this Application. The VM shape for the driver. Sets the driver cores and memory.
- Returns:
The driver_shape of this Application.
- Return type:
str
- property driver_shape_config
Gets the driver_shape_config of this Application.
- Returns:
The driver_shape_config of this Application.
- Return type:
oci.data_flow.models.ShapeConfig
- property execute
Gets the execute of this Application. The input used for spark-submit command. For more details see https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit. Supported options include --class, --file, --jars, --conf, --py-files, and main application file with arguments. Example: --jars oci://path/to/a.jar,oci://path/to/b.jar --files oci://path/to/a.json,oci://path/to/b.csv --py-files oci://path/to/a.py,oci://path/to/b.py --conf spark.sql.crossJoin.enabled=true --class org.apache.spark.examples.SparkPi oci://path/to/main.jar 10
Note: If execute is specified together with applicationId, className, configuration, fileUri, language, arguments, parameters during application create/update, or run create/submit, Data Flow service will use derived information from execute input only.
- Returns:
The execute of this Application.
- Return type:
str
- property executor_shape
[Required] Gets the executor_shape of this Application. The VM shape for the executors. Sets the executor cores and memory.
- Returns:
The executor_shape of this Application.
- Return type:
str
- property executor_shape_config
Gets the executor_shape_config of this Application.
- Returns:
The executor_shape_config of this Application.
- Return type:
oci.data_flow.models.ShapeConfig
- property file_uri
[Required] Gets the file_uri of this Application. An Oracle Cloud Infrastructure URI of the file containing the application to execute. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.
- Returns:
The file_uri of this Application.
- Return type:
str
- property freeform_tags
Gets the freeform_tags of this Application. Free-form tags for this resource. Each tag is a simple key-value pair with no predefined name, type, or namespace. For more information, see Resource Tags. Example: {“Department”: “Finance”}
- Returns:
The freeform_tags of this Application.
- Return type:
dict(str, str)
- property id
[Required] Gets the id of this Application. The application ID.
- Returns:
The id of this Application.
- Return type:
str
- property idle_timeout_in_minutes
Gets the idle_timeout_in_minutes of this Application. The timeout value in minutes used to manage Runs. A Run would be stopped after inactivity for this amount of time period. Note: This parameter is currently only applicable for Runs of type SESSION. Default value is 2880 minutes (2 days)
- Returns:
The idle_timeout_in_minutes of this Application.
- Return type:
int
- property language
[Required] Gets the language of this Application. The Spark language.
Allowed values for this property are: “SCALA”, “JAVA”, “PYTHON”, “SQL”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.
- Returns:
The language of this Application.
- Return type:
str
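The fallback for unrecognized values can be pictured with a short sketch. The allowed values are those listed above; `normalize_language` is a hypothetical helper, not a function of the OCI SDK.

```python
# Allowed values as listed for the language property.
ALLOWED_LANGUAGES = {"SCALA", "JAVA", "PYTHON", "SQL"}


def normalize_language(value: str) -> str:
    """Maps any value outside the allowed set to 'UNKNOWN_ENUM_VALUE'."""
    return value if value in ALLOWED_LANGUAGES else "UNKNOWN_ENUM_VALUE"


known = normalize_language("PYTHON")
unknown = normalize_language("R")
print(known, unknown)
```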
- property lifecycle_state
[Required] Gets the lifecycle_state of this Application. The current state of this application.
Allowed values for this property are: “ACTIVE”, “DELETED”, “INACTIVE”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.
- Returns:
The lifecycle_state of this Application.
- Return type:
str
- property logs_bucket_uri
Gets the logs_bucket_uri of this Application. An Oracle Cloud Infrastructure URI of the bucket where the Spark job logs are to be uploaded. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.
- Returns:
The logs_bucket_uri of this Application.
- Return type:
str
- property max_duration_in_minutes
Gets the max_duration_in_minutes of this Application. The maximum duration in minutes for which an Application should run. Data Flow Run would be terminated once it reaches this duration from the time it transitions to IN_PROGRESS state.
- Returns:
The max_duration_in_minutes of this Application.
- Return type:
int
- property metastore_id
Gets the metastore_id of this Application. The OCID of OCI Hive Metastore.
- Returns:
The metastore_id of this Application.
- Return type:
str
- property num_executors
[Required] Gets the num_executors of this Application. The number of executor VMs requested.
- Returns:
The num_executors of this Application.
- Return type:
int
- property owner_principal_id
[Required] Gets the owner_principal_id of this Application. The OCID of the user who created the resource.
- Returns:
The owner_principal_id of this Application.
- Return type:
str
- property owner_user_name
Gets the owner_user_name of this Application. The username of the user who created the resource. If the username of the owner does not exist, null will be returned and the caller should refer to the ownerPrincipalId value instead.
- Returns:
The owner_user_name of this Application.
- Return type:
str
- property parameters
Gets the parameters of this Application. An array of name/value pairs used to fill placeholders found in properties like Application.arguments. The name must be a string of one or more word characters (a-z, A-Z, 0-9, _). The value can be a string of 0 or more characters of any kind. Example: [ { name: “iterations”, value: “10”}, { name: “input_file”, value: “mydata.xml” }, { name: “variable_x”, value: “${x}”} ]
- Returns:
The parameters of this Application.
- Return type:
list[oci.data_flow.models.ApplicationParameter]
- property private_endpoint_id
Gets the private_endpoint_id of this Application. The OCID of a private endpoint.
- Returns:
The private_endpoint_id of this Application.
- Return type:
str
- property script_bucket
Bucket to save user script. Also accepts a prefix in the format of oci://<bucket-name>@<namespace>/<prefix>.
- Returns:
script bucket (path)
- Return type:
str
- property spark_version
[Required] Gets the spark_version of this Application. The Spark version utilized to run the application.
- Returns:
The spark_version of this Application.
- Return type:
str
- property time_created
[Required] Gets the time_created of this Application. The date and time an application was created, expressed in RFC 3339 timestamp format. Example: 2018-04-03T21:10:29.600Z
- Returns:
The time_created of this Application.
- Return type:
datetime
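For reference, a timestamp string in the format of the example can be turned into a timezone-aware datetime as sketched below. The `parse_timestamp` helper is hypothetical and handles only the UTC "Z" form with fractional seconds, not every variant that RFC 3339 allows; the property itself already returns a datetime.

```python
from datetime import datetime, timezone


def parse_timestamp(text: str) -> datetime:
    """Parses timestamps such as 2018-04-03T21:10:29.600Z as UTC."""
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    # The trailing Z denotes UTC, so attach the UTC timezone explicitly.
    return naive.replace(tzinfo=timezone.utc)


created = parse_timestamp("2018-04-03T21:10:29.600Z")
print(created.isoformat())
```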
- property time_updated
[Required] Gets the time_updated of this Application. The date and time an application was updated, expressed in RFC 3339 timestamp format. Example: 2018-04-03T21:10:29.600Z
- Returns:
The time_updated of this Application.
- Return type:
datetime
- property type
Gets the type of this Application. The Spark application processing type.
Allowed values for this property are: “BATCH”, “STREAMING”, “SESSION”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.
- Returns:
The type of this Application.
- Return type:
str
- property warehouse_bucket_uri
Gets the warehouse_bucket_uri of this Application. An Oracle Cloud Infrastructure URI of the bucket to be used as default warehouse directory for BATCH SQL runs. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.
- Returns:
The warehouse_bucket_uri of this Application.
- Return type:
str
- ads.jobs.utils.get_dataflow_config(path=None, oci_profile=None)
Module contents
- class ads.jobs.ContainerRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
Runtime
Represents a container job runtime
To define container runtime:
>>> ContainerRuntime() >>> .with_image("iad.ocir.io/<your_tenancy>/<your_image>") >>> .with_cmd("sleep 5 && echo Hello World") >>> .with_entrypoint(["/bin/sh", "-c"]) >>> .with_environment_variable(MY_ENV="MY_VALUE")
Alternatively, you can define the entrypoint and cmd along with the image.
>>> runtime = (
...     ContainerRuntime()
...     .with_image(
...         "iad.ocir.io/<your_tenancy>/<your_image>",
...         entrypoint=["/bin/sh", "-c"],
...         cmd="sleep 5 && echo Hello World",
...     )
...     .with_environment_variable(MY_ENV="MY_VALUE")
... )
The entrypoint and cmd can be either “exec form” or “shell form” (See references). The exec form is used when a list is passed in. The shell form is used when a space separated string is passed in.
When using the ContainerRuntime with OCI Data Science Job, the exec form is recommended. For most images, when the entrypoint is set to ["/bin/sh", "-c"], cmd can be a string as if you are running a shell command.
References
https://docs.docker.com/engine/reference/builder/#entrypoint
https://docs.docker.com/engine/reference/builder/#cmd
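To make the distinction between the two forms concrete, here is a minimal sketch of how Docker documents the interpretation of an ENTRYPOINT or CMD value on Linux images. The helper `to_argv` is hypothetical; it illustrates the Docker semantics only and is not how ADS processes these values internally.

```python
from typing import List, Union


def to_argv(value: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper mirroring Docker's exec form and shell form."""
    if isinstance(value, list):
        # Exec form: the list is used as the argument vector unchanged.
        return list(value)
    # Shell form: Docker runs the whole string through "/bin/sh -c".
    return ["/bin/sh", "-c", value]


print(to_argv(["python", "main.py"]))
print(to_argv("sleep 5 && echo Hello World"))
```

The exec form avoids an implicit shell, which is one reason it tends to be the more predictable choice.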
To initialize the object, user can either pass in the specification as a dictionary or through keyword arguments.
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
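The precedence rule for spec and kwargs can be sketched as a plain dictionary merge. The function `merge_spec` is a hypothetical stand-in written for illustration; the actual constructor may do more than this.

```python
from typing import Any, Dict, Optional


def merge_spec(spec: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Dict[str, Any]:
    """Combine a spec dictionary with keyword arguments; kwargs win on conflict."""
    merged = dict(spec or {})  # copy, so the caller's dictionary is not modified
    merged.update(kwargs)      # values from kwargs override values from spec
    return merged


print(merge_spec({"image": "from-spec", "cmd": "echo hi"}, image="from-kwargs"))
```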
- CONST_ARGS = 'args'
- CONST_CMD = 'cmd'
- CONST_ENTRYPOINT = 'entrypoint'
- CONST_ENV_VAR = 'env'
- CONST_IMAGE = 'image'
- CONST_MAXIMUM_RUNTIME_IN_MINUTES = 'maximumRuntimeInMinutes'
- CONST_TAG = 'freeformTags'
- property args: list
Command line arguments
- attribute_map = {'cmd': 'cmd', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'image': 'image'}
- property cmd: str
Command of the container job
- property entrypoint: str
Entrypoint of the container job
- property environment_variables: dict
Environment variables
- Returns:
The runtime environment variables. The returned dictionary is a copy.
- Return type:
dict
- property envs: dict
Environment variables
- property freeform_tags: dict
- classmethod from_dict(obj_dict: dict) Self
Initialize the object from a Python dictionary
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string, by default None.
uri (str, optional) – URI location of file containing JSON string, by default None.
decoder (callable, optional) – Decoder for custom data structures, by default json.JSONDecoder.
kwargs (dict) – Keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if both json_string and uri are empty, or if the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Self
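The input handling described for from_json (a JSON string or a URI, with a ValueError when neither is usable) can be approximated as follows. This is only a sketch: `load_json` is a hypothetical helper, and it reads local paths with the built-in open() rather than fsspec.open().

```python
import json
from typing import Optional


def load_json(json_string: Optional[str] = None, uri: Optional[str] = None) -> dict:
    """Hypothetical stand-in: parse JSON from a string or from a local file path."""
    if json_string:
        # json.JSONDecodeError is a subclass of ValueError, so invalid JSON
        # also surfaces as a ValueError to the caller.
        return json.loads(json_string)
    if uri:
        with open(uri, encoding="utf-8") as f:
            return json.load(f)
    raise ValueError("Provide either a JSON string or a URI location.")


print(load_json(json_string='{"kind": "runtime", "spec": {}}'))
```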
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property image: str
The container image
- property kind: str
Kind of the object to be stored in YAML. All runtime implementations will have “runtime” as kind. Subclass will have different types.
- property maximum_runtime_in_minutes: int
Maximum runtime in minutes
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- to_dict() dict
Converts the object to dictionary with kind, type and spec as keys.
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as showing in YAML
- with_argument(*args, **kwargs) Self
Adds command line arguments to the runtime.
This method can be called (chained) multiple times to add various arguments.
- Parameters:
args – Positional arguments. In a single method call, positional arguments are always added before keyword arguments. You can call with_argument() to add positional arguments after keyword arguments.
kwargs – Keyword arguments. To add a keyword argument without value, set the value to None.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- Raises:
ValueError – Keyword arguments with space in a key.
Examples
>>> runtime = Runtime().with_argument(key1="val1", key2="val2").with_argument("pos1")
>>> print(runtime.args)
["--key1", "val1", "--key2", "val2", "pos1"]
>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1="val1", key2="val2.1 val2.2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']
>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1=None, key2="val2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
["pos1", "--key1", "--key2", "val2", "pos2"]
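The ordering rules for with_argument() (positional values first within a call, keyword arguments rendered as --key value, a None value producing a bare flag, and keys with spaces rejected) can be reproduced with a small standalone function. `add_arguments` is a hypothetical helper written for this sketch, not the library's implementation.

```python
from typing import List, Optional


def add_arguments(existing: List[str], *args: str, **kwargs: Optional[str]) -> List[str]:
    """Append arguments using the ordering rules described for with_argument()."""
    result = list(existing)
    # Within a single call, positional arguments come before keyword arguments.
    result.extend(str(a) for a in args)
    for key, value in kwargs.items():
        if " " in key:
            raise ValueError(f"Argument key {key!r} must not contain a space.")
        result.append(f"--{key}")
        # A value of None adds the key as a flag without a value.
        if value is not None:
            result.append(str(value))
    return result


args = add_arguments([], "pos1")
args = add_arguments(args, key1=None, key2="val2")
args = add_arguments(args, "pos2")
print(args)
```

Calling the helper repeatedly, as chained with_argument() calls would, is what allows a positional argument to follow keyword arguments.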
- with_cmd(cmd: str) ContainerRuntime
Specifies the command for the container job.
- Parameters:
cmd (str) – Command for the container job
- Returns:
The runtime instance.
- Return type:
ContainerRuntime
- with_entrypoint(entrypoint: Union[str, list]) ContainerRuntime
Specifies the entrypoint for the container job.
- Parameters:
entrypoint (str or list) – Entrypoint for the container job
- Returns:
The runtime instance.
- Return type:
ContainerRuntime
- with_environment_variable(**kwargs) Self
Sets environment variables
Environment variables enclosed by ${...} will be substituted.
You can use $$ to escape the substitution.
Undefined variable enclosed by ${} will be ignored.
Double dollar signs $$ will be substituted by a single one $.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
Examples
>>> runtime = (
...     PythonRuntime()
...     .with_environment_variable(
...         HOST="10.0.0.1",
...         PORT="443",
...         URL="http://${HOST}:${PORT}/path/",
...         ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
...         MISSING_VAR="This is ${UNDEFINED}",
...         VAR_WITH_DOLLAR="$10",
...         DOUBLE_DOLLAR="$$10"
...     )
... )
>>> for k, v in runtime.environment_variables.items():
...     print(f"{k}: {v}")
HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10
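The substitution rules for environment variables (${NAME} replaced when defined, left alone when undefined, and $$ collapsed to a single $) can be sketched with one regular expression. The function `substitute` is hypothetical and only approximates the documented behaviour; the library's own implementation is not shown here.

```python
import re
from typing import Dict

# Match either an escaped "$$" or a braced reference such as "${HOST}".
_PATTERN = re.compile(r"\$\$|\$\{(\w+)\}")


def substitute(envs: Dict[str, str]) -> Dict[str, str]:
    """Resolve ${NAME} references between the given variables."""

    def resolve(value: str) -> str:
        def repl(match: "re.Match[str]") -> str:
            if match.group(0) == "$$":
                return "$"  # an escaped dollar sign becomes a single one
            # Undefined names are left exactly as written.
            return envs.get(match.group(1), match.group(0))

        return _PATTERN.sub(repl, value)

    return {key: resolve(value) for key, value in envs.items()}


resolved = substitute(
    {
        "HOST": "10.0.0.1",
        "PORT": "443",
        "URL": "http://${HOST}:${PORT}/path/",
        "ESCAPED_URL": "http://$${HOST}:$${PORT}/path/",
        "MISSING_VAR": "This is ${UNDEFINED}",
        "VAR_WITH_DOLLAR": "$10",
        "DOUBLE_DOLLAR": "$$10",
    }
)
for key, value in resolved.items():
    print(f"{key}: {value}")
```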
- with_freeform_tag(**kwargs) Self
Sets freeform tag
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_image(image: str, entrypoint: Optional[Union[str, list]] = None, cmd: Optional[str] = None) ContainerRuntime
Specify the image for the container job.
- Parameters:
image (str) – The container image, e.g. iad.ocir.io/<your_tenancy>/<your_image>:<your_tag>
entrypoint (str or list, optional) – Entrypoint for the job, by default None (the entrypoint defined in the image will be used).
cmd (str, optional) – Command for the job, by default None.
- Returns:
The runtime instance.
- Return type:
ContainerRuntime
- with_maximum_runtime_in_minutes(maximum_runtime_in_minutes: int) Self
Sets maximum runtime in minutes
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- class ads.jobs.DataFlow(spec: Optional[dict] = None, **kwargs)
Bases:
Infrastructure
To initialize the object, user can either pass in the specification as a dictionary or through keyword arguments.
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_BUCKET_URI = 'logs_bucket_uri'
- CONST_COMPARTMENT_ID = 'compartment_id'
- CONST_CONFIG = 'configuration'
- CONST_DRIVER_SHAPE = 'driver_shape'
- CONST_DRIVER_SHAPE_CONFIG = 'driver_shape_config'
- CONST_EXECUTE = 'execute'
- CONST_EXECUTOR_SHAPE = 'executor_shape'
- CONST_EXECUTOR_SHAPE_CONFIG = 'executor_shape_config'
- CONST_ID = 'id'
- CONST_LANGUAGE = 'language'
- CONST_MEMORY_IN_GBS = 'memory_in_gbs'
- CONST_METASTORE_ID = 'metastore_id'
- CONST_NUM_EXECUTORS = 'num_executors'
- CONST_OCPUS = 'ocpus'
- CONST_PRIVATE_ENDPOINT_ID = 'private_endpoint_id'
- CONST_SPARK_VERSION = 'spark_version'
- CONST_WAREHOUSE_BUCKET_URI = 'warehouse_bucket_uri'
- attribute_map = {'compartment_id': 'compartmentId', 'configuration': 'configuration', 'driver_shape': 'driverShape', 'driver_shape_config': 'driverShapeConfig', 'execute': 'execute', 'executor_shape': 'executorShape', 'executor_shape_config': 'executorShapeConfig', 'id': 'id', 'logs_bucket_uri': 'logsBucketUri', 'memory_in_gbs': 'memoryInGBs', 'metastore_id': 'metastoreId', 'num_executors': 'numExecutors', 'ocpus': 'ocpus', 'private_endpoint_id': 'privateEndpointId', 'spark_version': 'sparkVersion', 'warehouse_bucket_uri': 'warehouseBucketUri'}
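The attribute_map pairs each snake_case specification key with its camelCase counterpart. The sketch below uses a subset of that map to rename the keys of a dictionary; the helper `to_camel_keys` and the sample values are hypothetical, and how the library applies the map internally is not described in this reference.

```python
# Subset of the attribute_map listed for DataFlow.
ATTRIBUTE_MAP = {
    "compartment_id": "compartmentId",
    "driver_shape": "driverShape",
    "num_executors": "numExecutors",
    "spark_version": "sparkVersion",
}


def to_camel_keys(spec: dict) -> dict:
    """Rename known snake_case keys; keys missing from the map pass through unchanged."""
    return {ATTRIBUTE_MAP.get(key, key): value for key, value in spec.items()}


print(to_camel_keys({"driver_shape": "VM.Standard.E3.Flex", "num_executors": 2}))
```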
- create(runtime: DataFlowRuntime, **kwargs) DataFlow
Create a Data Flow job given a runtime.
- Parameters:
runtime – runtime to bind to the Data Flow job
kwargs – additional keyword arguments
- Returns:
a Data Flow job instance
- Return type:
DataFlow
- delete()
Delete a Data Flow job and cancel associated runs.
- Return type:
None
- classmethod from_dict(config: dict) DataFlow
Load a Data Flow job instance from a dictionary of configurations.
- Parameters:
config (dict) – dictionary of configurations
- Returns:
a Data Flow job instance
- Return type:
DataFlow
- classmethod from_id(id: str) DataFlow
Load a Data Flow job given an id.
- Parameters:
id (str) – id of the Data Flow job to load
- Returns:
a Data Flow job instance
- Return type:
DataFlow
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string, by default None.
uri (str, optional) – URI location of file containing JSON string, by default None.
decoder (callable, optional) – Decoder for custom data structures, by default json.JSONDecoder.
kwargs (dict) – Keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if both json_string and uri are empty, or if the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Self
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property job_id: Optional[str]
The OCID of the job
- property kind: str
Kind of the object to be stored in YAML. All infrastructure implementations will have “infrastructure” as kind. Subclass will have different types.
- classmethod list_jobs(compartment_id: Optional[str] = None, **kwargs) List[DataFlow]
List Data Flow jobs in a given compartment.
- Parameters:
compartment_id (str) – id of that compartment
kwargs – additional keyword arguments for filtering jobs
- Returns:
list of Data Flow jobs
- Return type:
List[DataFlow]
- property name: str
Display name of the job
- run(name: Optional[str] = None, args: Optional[List[str]] = None, env_vars: Optional[Dict[str, str]] = None, freeform_tags: Optional[Dict[str, str]] = None, wait: bool = False, **kwargs) DataFlowRun
Run a Data Flow job.
- Parameters:
name (str, optional) – name of the run. If a name is not provided, an easy-to-remember random name with a timestamp is generated, like ‘strange-spider-2022-08-17-23:55.02’.
args (List[str], optional) – list of command line arguments
env_vars (Dict[str, str], optional) – dictionary of environment variables (not used for data flow)
freeform_tags (Dict[str, str], optional) – freeform tags
wait (bool, optional) – whether to wait for a run to terminate
kwargs – additional keyword arguments
- Returns:
a DataFlowRun instance
- Return type:
DataFlowRun
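The default run name quoted for the name parameter, ‘strange-spider-2022-08-17-23:55.02’, suggests an adjective, a noun and a timestamp joined by hyphens. The sketch below generates names of that shape. The word lists, the function `random_run_name` and the exact timestamp format are assumptions inferred from that single example, not taken from the library.

```python
import random
import re
from datetime import datetime
from typing import Optional

# Hypothetical word lists; the real vocabulary is not documented here.
ADJECTIVES = ["strange", "gentle", "brave"]
NOUNS = ["spider", "otter", "falcon"]

# Shape inferred from the documented example name.
NAME_PATTERN = re.compile(r"[a-z]+-[a-z]+-\d{4}-\d{2}-\d{2}-\d{2}:\d{2}\.\d{2}")


def random_run_name(now: Optional[datetime] = None) -> str:
    """Build a name such as 'strange-spider-2022-08-17-23:55.02'."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d-%H:%M.%S")
    return f"{random.choice(ADJECTIVES)}-{random.choice(NOUNS)}-{stamp}"


name = random_run_name(datetime(2022, 8, 17, 23, 55, 2))
print(name)
```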
- run_list(**kwargs) List[DataFlowRun]
List runs associated with a Data Flow job.
- Parameters:
kwargs – additional arguments for filtering runs.
- Returns:
list of DataFlowRun instances
- Return type:
List[DataFlowRun]
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- to_dict() dict
Serialize job to a dictionary.
- Returns:
serialized job as a dictionary
- Return type:
dict
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml() str
Serializes the object into YAML string.
- Returns:
YAML stored in a string.
- Return type:
str
- property type: str
The type of the object as showing in YAML.
This implementation returns the class name with the first letter converted to lower case.
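The naming rule for the type property can be shown with a placeholder class. The class below is only a stand-in that reuses the name DataFlow for illustration; it is not the ADS implementation.

```python
class DataFlow:
    """Placeholder class used only to demonstrate the naming rule."""

    @property
    def type(self) -> str:
        name = self.__class__.__name__
        # Lower-case the first letter and keep the rest of the class name unchanged.
        return name[0].lower() + name[1:]


print(DataFlow().type)
```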
- with_compartment_id(id: str) DataFlow
Set compartment id for a Data Flow job.
- Parameters:
id (str) – compartment id
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_configuration(configs: dict) DataFlow
Set configuration for a Data Flow job.
- Parameters:
configs (dict) – dictionary of configurations
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_driver_shape(shape: str) DataFlow
Set driver shape for a Data Flow job.
- Parameters:
shape (str) – driver shape
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_driver_shape_config(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataFlow
Sets the driver shape config details of Data Flow job infrastructure. Specify only when a flex shape is selected. For example VM.Standard.E3.Flex allows the memory_in_gbs and cpu count to be specified.
- Parameters:
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns:
the Data Flow instance itself.
- Return type:
DataFlow
- with_execute(exec: str) DataFlow
Set command for spark-submit.
- Parameters:
exec (str) – str of commands
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_executor_shape(shape: str) DataFlow
Set executor shape for a Data Flow job.
- Parameters:
shape (str) – executor shape
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_executor_shape_config(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataFlow
Sets the executor shape config details of Data Flow job infrastructure. Specify only when a flex shape is selected. For example VM.Standard.E3.Flex allows the memory_in_gbs and cpu count to be specified.
- Parameters:
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns:
the Data Flow instance itself.
- Return type:
DataFlow
- with_id(id: str) DataFlow
Set id for a Data Flow job.
- Parameters:
id (str) – id of a job
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_language(lang: str) DataFlow
Set language for a Data Flow job.
- Parameters:
lang (str) – language for the job
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_logs_bucket_uri(uri: str) DataFlow
Set logs bucket uri for a Data Flow job.
- Parameters:
uri (str) – uri to logs bucket
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_metastore_id(id: str) DataFlow
Set Hive metastore id for a Data Flow job.
- Parameters:
id (str) – metastore id
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_num_executors(n: int) DataFlow
Set number of executors for a Data Flow job.
- Parameters:
n (int) – number of executors
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_private_endpoint_id(private_endpoint_id: str) DataFlow
Set the private endpoint ID for a Data Flow job infrastructure.
- Parameters:
private_endpoint_id (str) – The OCID of a private endpoint.
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- with_spark_version(ver: str) DataFlow
Set spark version for a Data Flow job. Currently supported versions are 2.4.4, 3.0.2 and 3.2.1.
Documentation: https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#before_you_begin
- Parameters:
ver (str) – spark version
- Returns:
the Data Flow instance itself
- Return type:
DataFlow
- class ads.jobs.DataFlowNotebookRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
DataFlowRuntime, NotebookRuntime
To initialize the object, user can either pass in the specification as a dictionary or through keyword arguments.
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_ARCHIVE_BUCKET = 'archiveBucket'
- CONST_ARCHIVE_URI = 'archiveUri'
- CONST_ARGS = 'args'
- CONST_CONDA = 'conda'
- CONST_CONDA_AUTH_TYPE = 'condaAuthType'
- CONST_CONDA_REGION = 'region'
- CONST_CONDA_SLUG = 'slug'
- CONST_CONDA_TYPE = 'type'
- CONST_CONDA_TYPE_CUSTOM = 'published'
- CONST_CONDA_TYPE_SERVICE = 'service'
- CONST_CONDA_URI = 'uri'
- CONST_CONFIGURATION = 'configuration'
- CONST_ENTRYPOINT = 'entrypoint'
- CONST_ENV_VAR = 'env'
- CONST_EXCLUDE_TAG = 'excludeTags'
- CONST_MAXIMUM_RUNTIME_IN_MINUTES = 'maximumRuntimeInMinutes'
- CONST_NOTEBOOK_ENCODING = 'notebookEncoding'
- CONST_NOTEBOOK_PATH = 'notebookPathURI'
- CONST_OUTPUT_URI = 'outputUri'
- CONST_OUTPUT_URI_ALT = 'outputURI'
- CONST_OVERWRITE = 'overwrite'
- CONST_SCRIPT_BUCKET = 'scriptBucket'
- CONST_SCRIPT_PATH = 'scriptPathURI'
- CONST_SOURCE = 'source'
- CONST_TAG = 'freeformTags'
- property archive_bucket: str
Bucket to save archive zip
- property archive_uri
The Uri of archive zip
- property args: list
Command line arguments
- attribute_map = {'archiveUri': 'archive_uri', 'condaAuthType': 'conda_auth_type', 'configuration': 'configuration', 'env': 'env', 'freeformTags': 'freeform_tags', 'overwrite': 'overwrite', 'scriptBucket': 'script_bucket', 'scriptPathURI': 'script_path_uri'}
- property conda: dict
The conda environment specification.
For service conda environment, the specification contains:
type, the type of the conda environment. This is always service for service conda environment.
slug, the slug of the conda environment.
For custom conda environment, the specification contains:
type, the type of the conda environment. This is always published for custom conda environment.
uri, the uri of the conda environment, e.g. oci://bucket@namespace/prefix/to/conda
region, the region of the bucket in which the conda environment is stored. By default, ADS will determine the region based on the authenticated API key or resource principal. This is only needed if your conda environment is stored in a different region.
- Returns:
A dictionary containing the conda environment specifications.
- Return type:
dict
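The two shapes of the conda specification can be written out as plain dictionaries. The helper functions below are hypothetical; the keys and the type values follow the description of this property, the slug is the one used in the Job example of this reference, and the region value is a made-up sample.

```python
from typing import Optional


def service_conda_spec(slug: str) -> dict:
    """Specification for a service conda environment."""
    return {"type": "service", "slug": slug}


def custom_conda_spec(uri: str, region: Optional[str] = None) -> dict:
    """Specification for a custom (published) conda environment."""
    spec = {"type": "published", "uri": uri}
    # The region is only needed when the bucket is in a different region.
    if region:
        spec["region"] = region
    return spec


print(service_conda_spec("pytorch110_p38_cpu_v1"))
print(custom_conda_spec("oci://bucket@namespace/prefix/to/conda", region="us-ashburn-1"))
```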
- property configuration: dict
Configuration for Spark
- convert(overwrite=False)
- property environment_variables: dict
Environment variables
- Returns:
The runtime environment variables. The returned dictionary is a copy.
- Return type:
dict
- property envs: dict
Environment variables
- property exclude_tag: list
A list of cell tags indicating cells to be excluded from the job
- property freeform_tags: dict
- classmethod from_dict(obj_dict: dict) Self
Initialize the object from a Python dictionary
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string, by default None.
uri (str, optional) – URI location of file containing JSON string, by default None.
decoder (callable, optional) – Decoder for custom data structures, by default json.JSONDecoder.
kwargs (dict) – Keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if both json_string and uri are empty, or if the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Self
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property kind: str
Kind of the object to be stored in YAML. All runtime implementations will have “runtime” as kind. Subclass will have different types.
- property maximum_runtime_in_minutes: int
Maximum runtime in minutes
- property notebook: str
The path of the notebook relative to the source.
- property notebook_encoding: str
The encoding of the notebook
- property notebook_uri: str
The URI of the notebook
- property output_uri: list
URI for storing the output notebook and files
- property overwrite: str
Whether to overwrite the existing script in object storage (script bucket).
- property script_bucket: str
Bucket to save script
- property script_uri: str
The URI of the source code
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- property source: str
The source code location.
- to_dict() dict
Converts the object to dictionary with kind, type and spec as keys.
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as showing in YAML
- with_archive_bucket(bucket) DataFlowRuntime
Set object storage bucket to save the archive zip, in case archive uri given is local.
- Parameters:
bucket (str) – name of the bucket
- Returns:
runtime instance itself
- Return type:
DataFlowRuntime
- with_archive_uri(uri: str) DataFlowRuntime
Set archive uri (which is a zip file containing dependencies).
- Parameters:
uri (str) – uri to the archive zip
- Returns:
runtime instance itself
- Return type:
DataFlowRuntime
- with_argument(*args, **kwargs) Self
Adds command line arguments to the runtime.
This method can be called (chained) multiple times to add various arguments.
- Parameters:
args – Positional arguments. In a single method call, positional arguments are always added before keyword arguments. You can call with_argument() to add positional arguments after keyword arguments.
kwargs – Keyword arguments. To add a keyword argument without value, set the value to None.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- Raises:
ValueError – Keyword arguments with space in a key.
Examples
>>> runtime = Runtime().with_argument(key1="val1", key2="val2").with_argument("pos1")
>>> print(runtime.args)
["--key1", "val1", "--key2", "val2", "pos1"]
>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1="val1", key2="val2.1 val2.2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']
>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1=None, key2="val2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
["pos1", "--key1", "--key2", "val2", "pos2"]
- with_conda(conda_spec: Optional[dict] = None)
- with_configuration(config: dict) DataFlowRuntime
Set Configuration for Spark.
- Parameters:
config (dict) – dictionary of configuration details https://spark.apache.org/docs/latest/configuration.html#available-properties. Example: { “spark.app.name” : “My App Name”, “spark.shuffle.io.maxRetries” : “4” }
- Returns:
runtime instance itself
- Return type:
DataFlowRuntime
- with_custom_conda(uri: str, region: Optional[str] = None, auth_type: Optional[str] = None)
Specifies the custom conda pack for running the job
- Parameters:
uri (str) – The OCI object storage URI for the conda pack, e.g. “oci://your_bucket@namespace/object_name.” In the Environment Explorer of an OCI notebook session, this is shown as the “source” of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials: for API Key, config[“region”] is used; for Resource Principal, signer.region is used. This is required if the conda pack is stored in a different region.
auth_type (str, (="resource_principal")) – One of “resource_principal”, “api_keys”, “instance_principal”, etc. Auth mechanism used to read the conda pack URI provided.
- Returns:
The runtime instance.
- Return type:
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_environment_variable(**kwargs) Self
Sets environment variables
Environment variables enclosed by ${...} will be substituted.
- You can use $$ to escape the substitution.
- Undefined variable enclosed by ${} will be ignored.
- Double dollar signs $$ will be substituted by a single one $.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
Examples
>>> runtime = (
...     PythonRuntime()
...     .with_environment_variable(
...         HOST="10.0.0.1",
...         PORT="443",
...         URL="http://${HOST}:${PORT}/path/",
...         ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
...         MISSING_VAR="This is ${UNDEFINED}",
...         VAR_WITH_DOLLAR="$10",
...         DOUBLE_DOLLAR="$$10"
...     )
... )
>>> for k, v in runtime.environment_variables.items():
...     print(f"{k}: {v}")
HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10
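The substitution rules can be sketched in plain Python. The helper substitute_env_vars and its regular expression are hypothetical and are not the ADS implementation. The sketch assumes a single, non-recursive pass in which references are resolved against the values as originally supplied.

```python
import re

# Matches either an escaped dollar sign ("$$") or a "${NAME}" reference.
_TOKEN = re.compile(r"\$\$|\$\{([^}]*)\}")


def substitute_env_vars(envs):
    """Return a copy of envs with ${NAME} references resolved (hypothetical helper)."""

    def replace(match):
        if match.group(0) == "$$":
            return "$"  # "$$" collapses to a single "$"
        name = match.group(1)
        # Undefined references are left untouched.
        return envs[name] if name in envs else match.group(0)

    return {key: _TOKEN.sub(replace, value) for key, value in envs.items()}


resolved = substitute_env_vars(
    {
        "HOST": "10.0.0.1",
        "PORT": "443",
        "URL": "http://${HOST}:${PORT}/path/",
        "ESCAPED_URL": "http://$${HOST}:$${PORT}/path/",
        "MISSING_VAR": "This is ${UNDEFINED}",
        "DOUBLE_DOLLAR": "$$10",
    }
)
print(resolved["URL"])
```

A lone dollar sign that is not followed by a brace or another dollar sign (for example "$10") matches neither alternative of the pattern, so it passes through unchanged.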
- with_exclude_tag(*tags) NotebookRuntime
Specifies the cell tags in the notebook to exclude cells from the job script.
- Parameters:
*tags (list) – A list of tags (strings).
- Returns:
The runtime instance.
- Return type:
self
- with_freeform_tag(**kwargs) Self
Sets freeform tag
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_maximum_runtime_in_minutes(maximum_runtime_in_minutes: int) Self
Sets maximum runtime in minutes
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_notebook(path: str, encoding='utf-8') NotebookRuntime
Specifies the notebook to be run as a job. Use this method if you would like to run a single notebook. Use the with_source() method if you would like to run a notebook with additional dependency files.
- Parameters:
path (str) – The path of the Jupyter notebook
encoding (str) – The encoding for opening the notebook. Defaults to utf-8.
- Returns:
The runtime instance.
- Return type:
self
- with_output(output_uri: str) NotebookRuntime
Specifies the output URI for storing the output notebook and files. All files in the directory containing the notebook will be saved.
- Parameters:
output_uri (str) – URI for a directory storing the output notebook and files. For example, oci://bucket@namespace/path/to/dir
- Returns:
The runtime instance.
- Return type:
self
- with_overwrite(overwrite: bool) DataFlowRuntime
Whether to overwrite the existing script in object storage (script bucket). If the Object Storage bucket already contains a script with the same name, it will be overwritten with the new one when the overwrite flag is set to True.
- Parameters:
overwrite (bool) – Whether to overwrite the existing script in object storage (script bucket).
- Returns:
The DataFlowRuntime instance (self).
- Return type:
DataFlowRuntime
- with_script_bucket(bucket) DataFlowRuntime
Sets the object storage bucket for saving the script when the given script URI is a local path.
- Parameters:
bucket (str) – Name of the bucket.
- Returns:
The runtime instance itself.
- Return type:
DataFlowRuntime
- with_script_uri(path: str) DataFlowRuntime
Sets the script URI.
- Parameters:
path (str) – URI of the script.
- Returns:
The runtime instance itself.
- Return type:
DataFlowRuntime
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters:
slug (str) – The slug name of the service conda pack
- Returns:
The runtime instance.
- Return type:
self
- with_source(uri: str, notebook: str, encoding='utf-8')
Specify source code directory containing the notebook and dependencies for the job. Use this method if you would like to run a notebook with additional dependency files. Use the with_notebook() method if you would like to run a single notebook.
In the following example, the local folder “path/to/source” contains the notebook and dependencies. The local path of the notebook is “path/to/source/relative/path/to/notebook.ipynb”:
runtime.with_source(uri="path/to/source", notebook="relative/path/to/notebook.ipynb")
- Parameters:
uri (str) – URI of the source code directory. This can be local or on OCI object storage.
notebook (str) – The relative path of the notebook from the source URI.
encoding (str) – The encoding for opening the notebook. Defaults to utf-8.
- Returns:
The runtime instance.
- Return type:
Self
- class ads.jobs.DataFlowRun(config: Optional[dict] = None, signer: Optional[Signer] = None, client_kwargs: Optional[dict] = None, **kwargs)
Bases:
OCIModelMixin, Run, RunInstance
Initializes a service/resource with OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither of them is specified, the client will be initialized with ads.common.auth.default_signer. If both of them are specified, both of them will be passed into the OCI client, and the authentication will be determined by OCI Python SDK.
- Parameters:
config (dict, optional) – OCI API key config dictionary, by default None.
signer (oci.signer.Signer, optional) – OCI authentication signer, by default None.
client_kwargs (dict, optional) – Additional keyword arguments for initializing the OCI client.
- CONS_COMPARTMENT_ID = 'compartment_id'
- LANGUAGE_JAVA = 'JAVA'
A constant which can be used with the language property of a Run. This constant has a value of “JAVA”
- LANGUAGE_PYTHON = 'PYTHON'
A constant which can be used with the language property of a Run. This constant has a value of “PYTHON”
- LANGUAGE_SCALA = 'SCALA'
A constant which can be used with the language property of a Run. This constant has a value of “SCALA”
- LANGUAGE_SQL = 'SQL'
A constant which can be used with the language property of a Run. This constant has a value of “SQL”
- LIFECYCLE_STATE_ACCEPTED = 'ACCEPTED'
A constant which can be used with the lifecycle_state property of a Run. This constant has a value of “ACCEPTED”
- LIFECYCLE_STATE_CANCELED = 'CANCELED'
A constant which can be used with the lifecycle_state property of a Run. This constant has a value of “CANCELED”
- LIFECYCLE_STATE_CANCELING = 'CANCELING'
A constant which can be used with the lifecycle_state property of a Run. This constant has a value of “CANCELING”
- LIFECYCLE_STATE_FAILED = 'FAILED'
A constant which can be used with the lifecycle_state property of a Run. This constant has a value of “FAILED”
- LIFECYCLE_STATE_IN_PROGRESS = 'IN_PROGRESS'
A constant which can be used with the lifecycle_state property of a Run. This constant has a value of “IN_PROGRESS”
- LIFECYCLE_STATE_STOPPED = 'STOPPED'
A constant which can be used with the lifecycle_state property of a Run. This constant has a value of “STOPPED”
- LIFECYCLE_STATE_STOPPING = 'STOPPING'
A constant which can be used with the lifecycle_state property of a Run. This constant has a value of “STOPPING”
- LIFECYCLE_STATE_SUCCEEDED = 'SUCCEEDED'
A constant which can be used with the lifecycle_state property of a Run. This constant has a value of “SUCCEEDED”
- OCI_MODEL_PATTERN = 'oci.[^.]+\\.models[\\..*]?'
- TERMINATED_STATES = ['CANCELED', 'FAILED', 'SUCCEEDED']
- TYPE_BATCH = 'BATCH'
A constant which can be used with the type property of a Run. This constant has a value of “BATCH”
- TYPE_SESSION = 'SESSION'
A constant which can be used with the type property of a Run. This constant has a value of “SESSION”
- TYPE_STREAMING = 'STREAMING'
A constant which can be used with the type property of a Run. This constant has a value of “STREAMING”
- property application_id
[Required] Gets the application_id of this Run. The application ID.
- Returns:
The application_id of this Run.
- Return type:
str
- property application_log_config
Gets the application_log_config of this Run.
- Returns:
The application_log_config of this Run.
- Return type:
oci.data_flow.models.ApplicationLogConfig
- property archive_uri
Gets the archive_uri of this Run. A comma separated list of one or more archive files as Oracle Cloud Infrastructure URIs. For example, oci://path/to/a.zip,oci://path/to/b.zip. An Oracle Cloud Infrastructure URI of an archive.zip file containing custom dependencies that may be used to support the execution of a Python, Java, or Scala application. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.
- Returns:
The archive_uri of this Run.
- Return type:
str
- property arguments
Gets the arguments of this Run. The arguments passed to the running application as command line arguments. An argument is either plain text or a placeholder. Placeholders are replaced using values from the parameters map. Each placeholder specified must be represented in the parameters map, otherwise the request (POST or PUT) will fail with an HTTP 400 status code. Placeholders are specified as ${name}, where name is the name of the parameter. Example: [ “--input”, “${input_file}”, “--name”, “John Doe” ] If “input_file” has a value of “mydata.xml”, then the value above will be translated to --input mydata.xml --name “John Doe”
- Returns:
The arguments of this Run.
- Return type:
list[str]
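The placeholder replacement described for arguments can be sketched as follows. This is an illustration only: resolve_arguments is a hypothetical helper, the real substitution is performed by the Data Flow service, and the sketch represents each parameter as a plain dict with name and value keys rather than as an OCI model object.

```python
import re

# A placeholder is "${name}", where name consists of word characters.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_arguments(arguments, parameters):
    """Fill ${name} placeholders in arguments from a list of name/value pairs."""
    values = {item["name"]: item["value"] for item in parameters}

    def replace(match):
        name = match.group(1)
        if name not in values:
            # The service is documented to reject such requests with HTTP 400;
            # this sketch raises a ValueError instead.
            raise ValueError(f"No parameter named '{name}'")
        return values[name]

    return [_PLACEHOLDER.sub(replace, argument) for argument in arguments]


resolved = resolve_arguments(
    ["--input", "${input_file}", "--name", "John Doe"],
    [{"name": "input_file", "value": "mydata.xml"}],
)
print(resolved)
```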
- property auth: dict
The ADS authentication config used to initialize the client. This auth has the same format as those obtained by calling functions in ads.common.auth. The config is a dict containing the following key-value pairs:
- config: The config loaded from oci_config.
- signer: The signer object created from the API keys.
- client_kwargs: The client_kwargs that was passed in as an input parameter.
- cancel() DataFlowRun
Cancel a Data Flow run if it is not yet terminated. Will be executed synchronously.
- Returns:
The dataflow run instance.
- Return type:
self
- static check_compartment_id(compartment_id: Optional[str]) str
Checks whether a compartment ID has a value, and returns the value of the NB_SESSION_COMPARTMENT_OCID environment variable if it is not specified.
- Parameters:
compartment_id (str) – Compartment OCID or None
- Returns:
Compartment OCID
- Return type:
str
- Raises:
ValueError – compartment_id is not specified and NB_SESSION_COMPARTMENT_OCID environment variable is not set
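The fallback behavior can be sketched in a few lines. The function resolve_compartment_id and its environ parameter are hypothetical and only mirror the documented rule; they are not the ADS implementation.

```python
import os
from typing import Mapping, Optional


def resolve_compartment_id(
    compartment_id: Optional[str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return compartment_id, falling back to NB_SESSION_COMPARTMENT_OCID."""
    # environ is injectable so the sketch can be exercised without touching os.environ.
    environ = os.environ if environ is None else environ
    resolved = compartment_id or environ.get("NB_SESSION_COMPARTMENT_OCID")
    if not resolved:
        raise ValueError(
            "compartment_id is not specified and "
            "NB_SESSION_COMPARTMENT_OCID is not set."
        )
    return resolved


# Placeholder OCIDs, used only for illustration.
fake_env = {"NB_SESSION_COMPARTMENT_OCID": "<notebook_compartment_ocid>"}
print(resolve_compartment_id(None, environ=fake_env))
```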
- property class_name
Gets the class_name of this Run. The class for the application.
- Returns:
The class_name of this Run.
- Return type:
str
- property client: DataFlowClient
OCI client
- property compartment_id
[Required] Gets the compartment_id of this Run. The OCID of a compartment.
- Returns:
The compartment_id of this Run.
- Return type:
str
- config = None
- property configuration
Gets the configuration of this Run. The Spark configuration passed to the running process. See https://spark.apache.org/docs/latest/configuration.html#available-properties. Example: { “spark.app.name” : “My App Name”, “spark.shuffle.io.maxRetries” : “4” } Note: Not all Spark properties are permitted to be set. Attempting to set a property that is not allowed to be overwritten will cause a 400 status to be returned.
- Returns:
The configuration of this Run.
- Return type:
dict(str, str)
- create() DataFlowRun
Create a Data Flow run.
- Returns:
a DataFlowRun instance
- Return type:
DataFlowRun
- classmethod create_instance(*args, **kwargs)
Creates an instance using the same authentication as the class or an existing instance. If this method is called by a class, the default ADS authentication method will be used. If this method is called by an instance, the authentication method set in the instance will be used.
- property data_read_in_bytes
Gets the data_read_in_bytes of this Run. The data read by the run in bytes.
- Returns:
The data_read_in_bytes of this Run.
- Return type:
int
- property data_written_in_bytes
Gets the data_written_in_bytes of this Run. The data written by the run in bytes.
- Returns:
The data_written_in_bytes of this Run.
- Return type:
int
- property defined_tags
Gets the defined_tags of this Run. Defined tags for this resource. Each key is predefined and scoped to a namespace. For more information, see Resource Tags. Example: {“Operations”: {“CostCenter”: “42”}}
- Returns:
The defined_tags of this Run.
- Return type:
dict(str, dict(str, object))
- delete() DataFlowRun
Cancel and delete a Data Flow run if it is not yet terminated. Will be executed asynchronously.
- Returns:
The dataflow run instance.
- Return type:
self
- classmethod deserialize(data: dict, to_cls: Optional[str] = None)
Deserialize data
- Parameters:
data (dict) – A dictionary containing the data to be deserialized.
to_cls (str) – The name of the OCI model class to be initialized using the data. The OCI model class must be from the same OCI service of the OCI client (self.client). Defaults to None, the parent OCI model class name will be used if current class is inherited from an OCI model. If parent OCI model class is not found or not from the same OCI service, the data will be returned as is.
- property display_name
Gets the display_name of this Run. A user-friendly name. This name is not necessarily unique.
- Returns:
The display_name of this Run.
- Return type:
str
- property driver_shape
[Required] Gets the driver_shape of this Run. The VM shape for the driver. Sets the driver cores and memory.
- Returns:
The driver_shape of this Run.
- Return type:
str
- property driver_shape_config
Gets the driver_shape_config of this Run.
- Returns:
The driver_shape_config of this Run.
- Return type:
oci.data_flow.models.ShapeConfig
- property execute
Gets the execute of this Run. The input used for spark-submit command. For more details see https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit. Supported options include --class, --file, --jars, --conf, --py-files, and main application file with arguments. Example: --jars oci://path/to/a.jar,oci://path/to/b.jar --files oci://path/to/a.json,oci://path/to/b.csv --py-files oci://path/to/a.py,oci://path/to/b.py --conf spark.sql.crossJoin.enabled=true --class org.apache.spark.examples.SparkPi oci://path/to/main.jar 10
Note: If execute is specified together with applicationId, className, configuration, fileUri, language, arguments, parameters during application create/update, or run create/submit, Data Flow service will use derived information from execute input only.
- Returns:
The execute of this Run.
- Return type:
str
- property executor_shape
[Required] Gets the executor_shape of this Run. The VM shape for the executors. Sets the executor cores and memory.
- Returns:
The executor_shape of this Run.
- Return type:
str
- property executor_shape_config
Gets the executor_shape_config of this Run.
- Returns:
The executor_shape_config of this Run.
- Return type:
oci.data_flow.models.ShapeConfig
- property file_uri
[Required] Gets the file_uri of this Run. An Oracle Cloud Infrastructure URI of the file containing the application to execute. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.
- Returns:
The file_uri of this Run.
- Return type:
str
- static flatten(data: dict) dict
Flattens a nested dictionary.
- Parameters:
data (dict) – A nested dictionary.
- Returns:
The flattened dictionary.
- Return type:
dict
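As an illustration of what flattening a nested dictionary means, here is a minimal sketch. The key separator and the depth handling used by ADS are not stated in this reference, so the sketch assumes dotted keys and full recursion; flatten_dict is a hypothetical name.

```python
def flatten_dict(data, parent_key="", sep="."):
    """Flatten nested dictionaries into one level (assumes dotted keys)."""
    flat = {}
    for key, value in data.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries and merge their flattened items.
            flat.update(flatten_dict(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {"display_name": "my-run", "driver_shape_config": {"ocpus": 1, "memory_in_gbs": 16}}
print(flatten_dict(nested))
```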
- property freeform_tags
Gets the freeform_tags of this Run. Free-form tags for this resource. Each tag is a simple key-value pair with no predefined name, type, or namespace. For more information, see Resource Tags. Example: {“Department”: “Finance”}
- Returns:
The freeform_tags of this Run.
- Return type:
dict(str, str)
- classmethod from_dict(data)
Initialize an instance from a dictionary.
- Parameters:
data (dict) – A dictionary containing the properties to initialize the class.
- classmethod from_oci_model(oci_instance)
Initialize an instance from an instance of OCI model.
- Parameters:
oci_instance – An instance of an OCI model.
- classmethod from_ocid(ocid: str)
Initializes an object from OCID
- Parameters:
ocid (str) – The OCID of the object
- property id
[Required] Gets the id of this Run. The ID of a run.
- Returns:
The id of this Run.
- Return type:
str
- property idle_timeout_in_minutes
Gets the idle_timeout_in_minutes of this Run. The timeout value in minutes used to manage Runs. A Run is stopped after being inactive for this amount of time. Note: This parameter is currently only applicable for Runs of type SESSION. Default value is 2880 minutes (2 days).
- Returns:
The idle_timeout_in_minutes of this Run.
- Return type:
int
- classmethod init_client(**kwargs) DataFlowClient
Initializes the OCI client specified in the “client” keyword argument. Subclasses should override this method and call cls._init_client(client=OCI_CLIENT).
- Parameters:
**kwargs – Additional keyword arguments for initializing the OCI client.
- Return type:
An instance of OCI client.
- kwargs = None
- property language
[Required] Gets the language of this Run. The Spark language.
Allowed values for this property are: “SCALA”, “JAVA”, “PYTHON”, “SQL”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.
- Returns:
The language of this Run.
- Return type:
str
- property lifecycle_details
Gets the lifecycle_details of this Run. The detailed messages about the lifecycle state.
- Returns:
The lifecycle_details of this Run.
- Return type:
str
- property lifecycle_state
[Required] Gets the lifecycle_state of this Run. The current state of this run.
Allowed values for this property are: “ACCEPTED”, “IN_PROGRESS”, “CANCELING”, “CANCELED”, “FAILED”, “SUCCEEDED”, “STOPPING”, “STOPPED”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.
- Returns:
The lifecycle_state of this Run.
- Return type:
str
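The handling of unrecognized values can be pictured with a short sketch. The function normalize_lifecycle_state is hypothetical; the actual mapping is done by the OCI Python SDK when it deserializes a service response.

```python
# Allowed values as listed for the lifecycle_state property.
KNOWN_LIFECYCLE_STATES = (
    "ACCEPTED",
    "IN_PROGRESS",
    "CANCELING",
    "CANCELED",
    "FAILED",
    "SUCCEEDED",
    "STOPPING",
    "STOPPED",
)


def normalize_lifecycle_state(value):
    """Map any value outside the known set to 'UNKNOWN_ENUM_VALUE'."""
    return value if value in KNOWN_LIFECYCLE_STATES else "UNKNOWN_ENUM_VALUE"


print(normalize_lifecycle_state("IN_PROGRESS"))
print(normalize_lifecycle_state("SOME_FUTURE_STATE"))
```

Code that branches on the state should therefore include a default case, since a newer service version may return a state that an older SDK does not recognize.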
- classmethod list_resource(compartment_id: Optional[str] = None, limit: int = 0, **kwargs) list
Generic method to list OCI resources
- Parameters:
compartment_id (str) – Compartment ID of the OCI resources. Defaults to None. If compartment_id is not specified, the value of the NB_SESSION_COMPARTMENT_OCID environment variable will be used.
limit (int) – The maximum number of items to return. Defaults to 0, in which case all items will be returned.
**kwargs – Additional keyword arguments to filter the resource. The kwargs are passed into OCI API.
- Returns:
A list of OCI resources
- Return type:
list
- Raises:
NotImplementedError – List method is not supported or implemented.
- load_properties_from_env()
Loads properties from the environment
- property logs: DataFlowLogs
Show logs from a run. There are three types of logs: application log, driver log and executor log, each with stdout and stderr separately. To access each type of logs:
>>> dfr.logs.application.stdout
>>> dfr.logs.driver.stderr
- Returns:
An instance of DataFlowLogs.
- Return type:
DataFlowLogs
- property logs_bucket_uri
Gets the logs_bucket_uri of this Run. An Oracle Cloud Infrastructure URI of the bucket where the Spark job logs are to be uploaded. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.
- Returns:
The logs_bucket_uri of this Run.
- Return type:
str
- property max_duration_in_minutes
Gets the max_duration_in_minutes of this Run. The maximum duration in minutes for which an Application should run. Data Flow Run would be terminated once it reaches this duration from the time it transitions to IN_PROGRESS state.
- Returns:
The max_duration_in_minutes of this Run.
- Return type:
int
- property metastore_id
Gets the metastore_id of this Run. The OCID of OCI Hive Metastore.
- Returns:
The metastore_id of this Run.
- Return type:
str
- property name: str
Gets the name of the object.
- property num_executors
[Required] Gets the num_executors of this Run. The number of executor VMs requested.
- Returns:
The num_executors of this Run.
- Return type:
int
- property opc_request_id
Gets the opc_request_id of this Run. Unique Oracle assigned identifier for the request. If you need to contact Oracle about a particular request, please provide the request ID.
- Returns:
The opc_request_id of this Run.
- Return type:
str
- property owner_principal_id
Gets the owner_principal_id of this Run. The OCID of the user who created the resource.
- Returns:
The owner_principal_id of this Run.
- Return type:
str
- property owner_user_name
Gets the owner_user_name of this Run. The username of the user who created the resource. If the username of the owner does not exist, null will be returned and the caller should refer to the ownerPrincipalId value instead.
- Returns:
The owner_user_name of this Run.
- Return type:
str
- property parameters
Gets the parameters of this Run. An array of name/value pairs used to fill placeholders found in properties like Application.arguments. The name must be a string of one or more word characters (a-z, A-Z, 0-9, _). The value can be a string of 0 or more characters of any kind. Example: [ { name: “iterations”, value: “10”}, { name: “input_file”, value: “mydata.xml” }, { name: “variable_x”, value: “${x}”} ]
- Returns:
The parameters of this Run.
- Return type:
list[oci.data_flow.models.ApplicationParameter]
- property private_endpoint_dns_zones
Gets the private_endpoint_dns_zones of this Run. An array of DNS zone names. Example: [ “app.examplecorp.com”, “app.examplecorp2.com” ]
- Returns:
The private_endpoint_dns_zones of this Run.
- Return type:
list[str]
- property private_endpoint_id
Gets the private_endpoint_id of this Run. The OCID of a private endpoint.
- Returns:
The private_endpoint_id of this Run.
- Return type:
str
- property private_endpoint_max_host_count
Gets the private_endpoint_max_host_count of this Run. The maximum number of hosts to be accessed through the private endpoint. This value is used to calculate the relevant CIDR block and should be a multiple of 256. If the value is not a multiple of 256, it is rounded up to the next multiple of 256. For example, 300 is rounded up to 512.
- Returns:
The private_endpoint_max_host_count of this Run.
- Return type:
int
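The rounding rule for private_endpoint_max_host_count amounts to a ceiling to the next multiple of 256. A minimal sketch follows; round_up_host_count is a hypothetical helper, since the actual rounding is applied by the service.

```python
def round_up_host_count(count, block=256):
    """Round count up to the next multiple of block (256 by default)."""
    if count <= 0:
        raise ValueError("count must be a positive integer")
    # Negative floor division implements an integer ceiling.
    return -(-count // block) * block


for requested in (1, 256, 300):
    print(requested, round_up_host_count(requested))
```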
- property private_endpoint_nsg_ids
Gets the private_endpoint_nsg_ids of this Run. An array of network security group OCIDs.
- Returns:
The private_endpoint_nsg_ids of this Run.
- Return type:
list[str]
- property private_endpoint_subnet_id
Gets the private_endpoint_subnet_id of this Run. The OCID of a subnet.
- Returns:
The private_endpoint_subnet_id of this Run.
- Return type:
str
- property run_details_link: str
Link to run details page in OCI console
- Returns:
The link to the details page in OCI console.
- Return type:
str
- property run_duration_in_milliseconds
Gets the run_duration_in_milliseconds of this Run. The duration of the run in milliseconds.
- Returns:
The run_duration_in_milliseconds of this Run.
- Return type:
int
- serialize()
Serialize the model to a dictionary that is ready to be sent to OCI API.
- Returns:
A dictionary that is ready to be sent to OCI API.
- Return type:
dict
- signer = None
- property spark_version
[Required] Gets the spark_version of this Run. The Spark version utilized to run the application.
- Returns:
The spark_version of this Run.
- Return type:
str
- property status: str
Show status (lifecycle state) of a run.
- Returns:
status of the run
- Return type:
str
- sync(merge_strategy: MergeStrategy = MergeStrategy.OVERRIDE)
Refreshes the properties of the object from OCI
- property time_created
[Required] Gets the time_created of this Run. The date and time an application was created, expressed in RFC 3339 timestamp format. Example: 2018-04-03T21:10:29.600Z
- Returns:
The time_created of this Run.
- Return type:
datetime
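The property itself returns a datetime. If the same timestamp is encountered as a string, for example in serialized data, it can be parsed with the standard library. The sketch below is an assumption-laden illustration: parse_rfc3339_utc is a hypothetical helper, and it only handles the UTC form with fractional seconds and a trailing Z shown in the example.

```python
from datetime import datetime, timezone


def parse_rfc3339_utc(timestamp: str) -> datetime:
    """Parse a timestamp such as 2018-04-03T21:10:29.600Z into an aware datetime."""
    # The trailing "Z" denotes UTC; strptime treats it as a literal here,
    # so the timezone is attached explicitly afterwards.
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_rfc3339_utc("2018-04-03T21:10:29.600Z")
print(created.isoformat())
```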
- property time_updated
[Required] Gets the time_updated of this Run. The date and time an application was updated, expressed in RFC 3339 timestamp format. Example: 2018-04-03T21:10:29.600Z
- Returns:
The time_updated of this Run.
- Return type:
datetime
- to_dict(flatten: bool = False) dict
Converts the properties to a dictionary
- Parameters:
flatten (bool) – Whether to flatten the dictionary. (Default value = False)
- to_oci_model(oci_model)
Converts the object into an instance of OCI data model.
- Parameters:
oci_model (class or str) – The OCI model to be converted to. This can be a string of the model name.
type_mapping (dict) – A dictionary mapping the models.
- Returns:
An instance of the oci_model.
- to_yaml() str
Serializes the object into YAML string.
- Returns:
YAML stored in a string.
- Return type:
str
- property total_o_cpu
Gets the total_o_cpu of this Run. The total number of oCPU requested by the run.
- Returns:
The total_o_cpu of this Run.
- Return type:
int
- property type
Gets the type of this Run. The Spark application processing type.
Allowed values for this property are: “BATCH”, “STREAMING”, “SESSION”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.
- Returns:
The type of this Run.
- Return type:
str
- type_mappings = None
- update_from_oci_model(oci_model_instance, merge_strategy: MergeStrategy = MergeStrategy.OVERRIDE)
Updates the properties from OCI model with the same properties.
- Parameters:
oci_model_instance – An instance of OCI model, which should have the same properties of this class.
- wait(interval: int = 3) DataFlowRun
Wait for a run to terminate.
- Parameters:
interval (int, optional) – interval to wait before probing again
- Returns:
a DataFlowRun instance
- Return type:
DataFlowRun
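Waiting for a run to terminate is, conceptually, a polling loop over the lifecycle state. The following minimal sketch shows the idea. It is not the ADS implementation: wait_until_terminated and get_status are hypothetical names, the interval is assumed to be in seconds, and a simulated sequence of states stands in for a real run.

```python
import time

# Same values as DataFlowRun.TERMINATED_STATES listed in this reference.
TERMINATED_STATES = ["CANCELED", "FAILED", "SUCCEEDED"]


def wait_until_terminated(get_status, interval=3, sleep=time.sleep):
    """Poll get_status() until it returns a terminated state, then return it."""
    status = get_status()
    while status not in TERMINATED_STATES:
        sleep(interval)  # wait before probing again
        status = get_status()
    return status


# Simulated sequence of lifecycle states, used here instead of a real run.
_states = iter(["ACCEPTED", "IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
final_status = wait_until_terminated(lambda: next(_states), interval=0)
print(final_status)
```

Passing the sleep function as a parameter keeps the loop easy to test, because a no-op can be substituted for time.sleep.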
- property warehouse_bucket_uri
Gets the warehouse_bucket_uri of this Run. An Oracle Cloud Infrastructure URI of the bucket to be used as default warehouse directory for BATCH SQL runs. See https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/hdfsconnector.htm#uriformat.
- Returns:
The warehouse_bucket_uri of this Run.
- Return type:
str
- watch(interval: int = 3) DataFlowRun
This is an alias of the wait() method. It waits for a run to terminate.
- Parameters:
interval (int, optional) – interval to wait before probing again
- Returns:
a DataFlowRun instance
- Return type:
DataFlowRun
- class ads.jobs.DataFlowRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
To initialize the object, pass in the specification either as a dictionary or through keyword arguments.
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
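The precedence rule between spec and kwargs can be sketched with plain dictionaries. This is an illustration only: merge_spec is a hypothetical helper, and any key-name conversion that ADS may apply to keyword arguments is ignored here.

```python
def merge_spec(spec=None, **kwargs):
    """Combine spec and kwargs; values from kwargs win on duplicate keys."""
    merged = dict(spec or {})  # copy so the caller's dictionary is not modified
    merged.update(kwargs)
    return merged


spec = {"scriptBucket": "<bucket_name>", "overwrite": False}
print(merge_spec(spec, overwrite=True))
```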
- CONST_ARCHIVE_BUCKET = 'archiveBucket'
- CONST_ARCHIVE_URI = 'archiveUri'
- CONST_ARGS = 'args'
- CONST_CONDA = 'conda'
- CONST_CONDA_AUTH_TYPE = 'condaAuthType'
- CONST_CONDA_REGION = 'region'
- CONST_CONDA_SLUG = 'slug'
- CONST_CONDA_TYPE = 'type'
- CONST_CONDA_TYPE_CUSTOM = 'published'
- CONST_CONDA_TYPE_SERVICE = 'service'
- CONST_CONDA_URI = 'uri'
- CONST_CONFIGURATION = 'configuration'
- CONST_ENV_VAR = 'env'
- CONST_MAXIMUM_RUNTIME_IN_MINUTES = 'maximumRuntimeInMinutes'
- CONST_OVERWRITE = 'overwrite'
- CONST_SCRIPT_BUCKET = 'scriptBucket'
- CONST_SCRIPT_PATH = 'scriptPathURI'
- CONST_TAG = 'freeformTags'
- property archive_bucket: str
Bucket to save archive zip
- property archive_uri
The Uri of archive zip
- property args: list
Command line arguments
- attribute_map = {'archiveUri': 'archive_uri', 'condaAuthType': 'conda_auth_type', 'configuration': 'configuration', 'env': 'env', 'freeformTags': 'freeform_tags', 'overwrite': 'overwrite', 'scriptBucket': 'script_bucket', 'scriptPathURI': 'script_path_uri'}
- property conda: dict
The conda environment specification.
For service conda environment, the specification contains:
- type, the type of the conda environment. This is always service for service conda environment.
- slug, the slug of the conda environment.
For custom conda environment, the specification contains:
- type, the type of the conda environment. This is always published for custom conda environment.
- uri, the uri of the conda environment, e.g. oci://bucket@namespace/prefix/to/conda
- region, the region of the bucket in which the conda environment is stored. By default, ADS will determine the region based on the authenticated API key or resource principal. This is only needed if your conda environment is stored in a different region.
- Returns:
A dictionary containing the conda environment specifications.
- Return type:
dict
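The two shapes of the conda specification can be written out as plain dictionaries. The sketch below is illustrative: the slug, URI, and region values are placeholders, and validate_conda_spec is a hypothetical helper that only checks the keys named above.

```python
# Service conda environment: type is always "service".
service_conda = {"type": "service", "slug": "pytorch110_p38_cpu_v1"}

# Custom conda environment: type is always "published"; region is optional.
custom_conda = {
    "type": "published",
    "uri": "oci://bucket@namespace/prefix/to/conda",
    "region": "us-ashburn-1",  # placeholder; only needed for a different region
}

_REQUIRED_KEYS = {"service": {"type", "slug"}, "published": {"type", "uri"}}


def validate_conda_spec(spec):
    """Return the conda type if spec has the keys required for that type."""
    conda_type = spec.get("type")
    if conda_type not in _REQUIRED_KEYS:
        raise ValueError(f"Unknown conda type: {conda_type!r}")
    missing = _REQUIRED_KEYS[conda_type] - set(spec)
    if missing:
        raise ValueError(f"Missing keys for {conda_type} conda: {sorted(missing)}")
    return conda_type


print(validate_conda_spec(service_conda), validate_conda_spec(custom_conda))
```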
- property configuration: dict
Configuration for Spark
- convert(**kwargs)
- property environment_variables: dict
Environment variables
- Returns:
The runtime environment variables. The returned dictionary is a copy.
- Return type:
dict
- property envs: dict
Environment variables
- property freeform_tags: dict
- classmethod from_dict(obj_dict: dict) Self
Initialize the object from a Python dictionary
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string. Defaults to None.
uri (str, optional) – URI location of file containing JSON string. Defaults to None.
decoder (callable, optional) – Custom decoder. Defaults to simple JSON decoder.
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this should be config=”path/to/.oci/config”. For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if both json_string and uri are empty, or if the input is not a valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Self
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property kind: str
Kind of the object to be stored in YAML. All runtime implementations will have “runtime” as kind. Subclass will have different types.
- property maximum_runtime_in_minutes: int
Maximum runtime in minutes
- property overwrite: str
Whether to overwrite the existing script in object storage (script bucket).
- property script_bucket: str
Bucket to save script
- property script_uri: str
The URI of the source code
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- to_dict() dict
Converts the object to dictionary with kind, type and spec as keys.
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as shown in YAML
- with_archive_bucket(bucket) DataFlowRuntime
Set object storage bucket to save the archive zip, in case archive uri given is local.
- Parameters:
bucket (str) – name of the bucket
- Returns:
runtime instance itself
- Return type:
DataFlowRuntime
- with_archive_uri(uri: str) DataFlowRuntime
Set archive uri (which is a zip file containing dependencies).
- Parameters:
uri (str) – uri to the archive zip
- Returns:
runtime instance itself
- Return type:
DataFlowRuntime
- with_argument(*args, **kwargs) Self
Adds command line arguments to the runtime.
This method can be called (chained) multiple times to add various arguments.
- Parameters:
args – Positional arguments. In a single method call, positional arguments are always added before keyword arguments. You can call with_argument() to add positional arguments after keyword arguments.
kwargs – Keyword arguments. To add a keyword argument without value, set the value to None.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- Raises:
ValueError – Raised if the key of a keyword argument contains a space.
Examples
>>> runtime = Runtime().with_argument(key1="val1", key2="val2").with_argument("pos1")
>>> print(runtime.args)
['--key1', 'val1', '--key2', 'val2', 'pos1']
>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1="val1", key2="val2.1 val2.2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']
>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1=None, key2="val2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', '--key2', 'val2', 'pos2']
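The ordering rules above can be illustrated without ADS installed. The following is a minimal sketch, not the library's implementation; the helper name append_arguments is hypothetical and only mirrors the documented behavior (positional arguments first within one call, a None value producing a bare flag, and a ValueError for keys containing a space).

```python
from typing import Any, List


def append_arguments(existing: List[str], *args: Any, **kwargs: Any) -> List[str]:
    """Return a new argument list that follows the documented ordering rules."""
    result = list(existing)
    # Within a single call, positional arguments are added before keyword arguments.
    result.extend(str(arg) for arg in args)
    for key, value in kwargs.items():
        if " " in key:
            raise ValueError(f"Argument key {key!r} cannot contain a space.")
        result.append(f"--{key}")
        # A value of None adds the keyword without a value.
        if value is not None:
            result.append(str(value))
    return result


# Chain three calls to place a positional argument after keyword arguments.
cli_args = append_arguments([], "pos1")
cli_args = append_arguments(cli_args, key1=None, key2="val2")
cli_args = append_arguments(cli_args, "pos2")
print(cli_args)  # ['pos1', '--key1', '--key2', 'val2', 'pos2']
```

Calling the helper once per group of arguments is what lets a positional argument follow keyword arguments, which matches the chaining pattern shown in the examples.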
- with_conda(conda_spec: Optional[dict] = None)
- with_configuration(config: dict) DataFlowRuntime
Set Configuration for Spark.
- Parameters:
config (dict) – Dictionary of Spark configuration properties. The available properties are listed at https://spark.apache.org/docs/latest/configuration.html#available-properties. Example: {"spark.app.name": "My App Name", "spark.shuffle.io.maxRetries": "4"}
- Returns:
runtime instance itself
- Return type:
DataFlowRuntime
- with_custom_conda(uri: str, region: Optional[str] = None, auth_type: Optional[str] = None)
Specifies the custom conda pack for running the job
- Parameters:
uri (str) – The OCI object storage URI for the conda pack, e.g. “oci://your_bucket@namespace/object_name”. In the Environment Explorer of an OCI notebook session, this is shown as the “source” of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials: for API Key, config[“region”] is used; for Resource Principal, signer.region is used. This is required if the conda pack is stored in a different region.
auth_type (str, optional) – One of “resource_principal”, “api_keys”, “instance_principal”, etc. The auth mechanism used to read the conda pack URI provided.
- Returns:
The runtime instance.
- Return type:
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_environment_variable(**kwargs) Self
Sets environment variables
Environment variables enclosed by ${...} will be substituted. You can use $$ to escape the substitution. Undefined variables enclosed by ${} will be ignored. Double dollar signs $$ will be substituted by a single one $.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
Examples
>>> runtime = (
...     PythonRuntime()
...     .with_environment_variable(
...         HOST="10.0.0.1",
...         PORT="443",
...         URL="http://${HOST}:${PORT}/path/",
...         ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
...         MISSING_VAR="This is ${UNDEFINED}",
...         VAR_WITH_DOLLAR="$10",
...         DOUBLE_DOLLAR="$$10"
...     )
... )
>>> for k, v in runtime.environment_variables.items():
...     print(f"{k}: {v}")
HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10
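The substitution rules can be reproduced with a small regular expression. This is a sketch under stated assumptions rather than the ADS implementation: the function name substitute is hypothetical, and references are resolved in a single pass against the raw values passed in, which is enough to reproduce the example above.

```python
import re
from typing import Dict

# Match either an escaped dollar sign ($$) or a ${NAME} reference.
_PATTERN = re.compile(r"\$\$|\$\{(\w+)\}")


def substitute(variables: Dict[str, str]) -> Dict[str, str]:
    """Resolve ${NAME} references between the given variables (single pass)."""

    def replace(match: "re.Match[str]") -> str:
        if match.group(0) == "$$":
            # Double dollar signs collapse into a single one.
            return "$"
        # Undefined names are left untouched.
        return variables.get(match.group(1), match.group(0))

    return {key: _PATTERN.sub(replace, value) for key, value in variables.items()}


resolved = substitute(
    {
        "HOST": "10.0.0.1",
        "PORT": "443",
        "URL": "http://${HOST}:${PORT}/path/",
        "ESCAPED_URL": "http://$${HOST}:$${PORT}/path/",
        "MISSING_VAR": "This is ${UNDEFINED}",
        "VAR_WITH_DOLLAR": "$10",
        "DOUBLE_DOLLAR": "$$10",
    }
)
for key, value in resolved.items():
    print(f"{key}: {value}")
```

Because the pattern tries $$ before ${NAME}, an escaped reference such as $${HOST} is reduced to the literal text ${HOST} instead of being expanded.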
- with_freeform_tag(**kwargs) Self
Sets freeform tag
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_maximum_runtime_in_minutes(maximum_runtime_in_minutes: int) Self
Sets maximum runtime in minutes
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_overwrite(overwrite: bool) DataFlowRuntime
Whether to overwrite the existing script in object storage (script bucket). If the Object Storage bucket already contains a script with the same name, it will be overwritten with the new one when the overwrite flag is True.
- Parameters:
overwrite (bool) – Whether to overwrite the existing script in object storage (script bucket).
- Returns:
The DataFlowRuntime instance (self).
- Return type:
DataFlowRuntime
- with_script_bucket(bucket) DataFlowRuntime
Set object storage bucket to save the script, in case script uri given is local.
- Parameters:
bucket (str) – name of the bucket
- Returns:
runtime instance itself
- Return type:
DataFlowRuntime
- with_script_uri(path: str) DataFlowRuntime
Set script uri.
- Parameters:
path (str) – uri to the script
- Returns:
runtime instance itself
- Return type:
DataFlowRuntime
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters:
slug (str) – The slug name of the service conda pack
- Returns:
The runtime instance.
- Return type:
self
- class ads.jobs.DataScienceJob(spec: Optional[Dict] = None, **kwargs)
Bases:
Infrastructure
Represents the OCI Data Science Job infrastructure.
To configure the infrastructure for a Data Science Job:
infrastructure = (
    DataScienceJob()
    # Configure logging for getting the job run outputs.
    .with_log_group_id("<log_group_ocid>")
    # Log resource will be auto-generated if log ID is not specified.
    .with_log_id("<log_ocid>")
    # If you are in an OCI data science notebook session,
    # the following configurations are not required.
    # Configurations from the notebook session will be used as defaults.
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_subnet_id("<subnet_ocid>")
    .with_shape_name("VM.Standard.E3.Flex")
    # Shape config details are applicable only for the flexible shapes.
    .with_shape_config_details(memory_in_gbs=16, ocpus=1)
    # Minimum/Default block storage size is 50 (GB).
    .with_block_storage_size(50)
)
Initializes a data science job infrastructure
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
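The precedence rule between spec and kwargs can be pictured as a dictionary merge. The sketch below is illustrative only: merge_spec is a hypothetical helper, and it assumes the keys in spec and kwargs are already written in the same form (the camelCase names listed in the constants that follow).

```python
from typing import Any, Dict, Optional


def merge_spec(spec: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Dict[str, Any]:
    """Combine a spec dictionary with keyword arguments; kwargs win on conflicts."""
    merged = dict(spec or {})
    # Later updates overwrite earlier values, so kwargs take precedence.
    merged.update(kwargs)
    return merged


merged = merge_spec(
    {"shapeName": "VM.Standard.E3.Flex", "blockStorageSize": 50},
    blockStorageSize=100,
)
print(merged)  # {'shapeName': 'VM.Standard.E3.Flex', 'blockStorageSize': 100}
```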
- CONST_BLOCK_STORAGE = 'blockStorageSize'
- CONST_COMPARTMENT_ID = 'compartmentId'
- CONST_DISPLAY_NAME = 'displayName'
- CONST_JOB_INFRA = 'jobInfrastructureType'
- CONST_JOB_TYPE = 'jobType'
- CONST_LOG_GROUP_ID = 'logGroupId'
- CONST_LOG_ID = 'logId'
- CONST_MEMORY_IN_GBS = 'memoryInGBs'
- CONST_OCPUS = 'ocpus'
- CONST_PROJECT_ID = 'projectId'
- CONST_SHAPE_CONFIG_DETAILS = 'shapeConfigDetails'
- CONST_SHAPE_NAME = 'shapeName'
- CONST_SUBNET_ID = 'subnetId'
- attribute_map = {'blockStorageSize': 'block_storage_size', 'compartmentId': 'compartment_id', 'displayName': 'display_name', 'jobInfrastructureType': 'job_infrastructure_type', 'jobType': 'job_type', 'logGroupId': 'log_group_id', 'logId': 'log_id', 'projectId': 'project_id', 'shapeConfigDetails': 'shape_config_details', 'shapeName': 'shape_name', 'subnetId': 'subnet_id'}
- property block_storage_size: int
Block storage size for the job
- build() DataScienceJob
- property compartment_id: Optional[str]
The compartment OCID
- create(runtime, **kwargs) DataScienceJob
Creates a job with runtime.
- Parameters:
runtime (Runtime) – An ADS job runtime.
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- delete() None
Deletes a job
- classmethod fast_launch_shapes(compartment_id: Optional[str] = None, **kwargs) list
Lists the supported fast launch shapes for running jobs in a compartment.
- Parameters:
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If this is not specified, the compartment ID of the notebook session will be used.
- Returns:
A list of oci.data_science.models.FastLaunchJobConfigSummary objects containing the information of the supported shapes.
- Return type:
list
Examples
To get a list of shape names:
import os
from ads.jobs import DataScienceJob
shapes = DataScienceJob.fast_launch_shapes(
    compartment_id=os.environ["PROJECT_COMPARTMENT_OCID"]
)
shape_names = [shape.shape_name for shape in shapes]
- classmethod from_dict(obj_dict: dict) Self
Initialize the object from a Python dictionary
- classmethod from_dsc_job(dsc_job: DSCJob) DataScienceJob
Initialize a DataScienceJob instance from a DSCJob
- Parameters:
dsc_job (DSCJob) – An instance of DSCJob
- Returns:
An instance of DataScienceJob
- Return type:
DataScienceJob
- classmethod from_id(job_id: str) DataScienceJob
Gets an existing job using Job OCID
- Parameters:
job_id (str) – Job OCID
- Returns:
An instance of DataScienceJob
- Return type:
DataScienceJob
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string, by default None
uri (str, optional) – URI location of file containing JSON string, by default None
decoder (callable, optional) – Decoder for custom data structures, by default json.JSONDecoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”. For other storage connections consider e.g. host, port, username, password, etc.
- Returns:
Object initialized from JSON data.
- Return type:
Self
- Raises:
ValueError – Raised if both json_string and uri are empty, or if the input is not valid JSON.
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- classmethod instance_shapes(compartment_id: Optional[str] = None, **kwargs) list
Lists the supported shapes for running jobs in a compartment.
- Parameters:
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If this is not specified, the compartment ID of the notebook session will be used.
- Returns:
A list of oci.data_science.models.JobShapeSummary objects containing the information of the supported shapes.
- Return type:
list
Examples
To get a list of shape names:
import os
from ads.jobs import DataScienceJob
shapes = DataScienceJob.instance_shapes(
    compartment_id=os.environ["PROJECT_COMPARTMENT_OCID"]
)
shape_names = [shape.name for shape in shapes]
- property job_id: Optional[str]
The OCID of the job
- property job_infrastructure_type: Optional[str]
Job infrastructure type
- property job_type: Optional[str]
Job type
- property kind: str
Kind of the object to be stored in YAML. All infrastructure implementations will have “infrastructure” as kind. Subclasses will have different types.
- classmethod list_jobs(compartment_id: Optional[str] = None, **kwargs) List[DataScienceJob]
Lists all jobs in a compartment.
- Parameters:
compartment_id (str, optional) – The compartment ID for running the jobs, by default None. This is optional in an OCI Data Science notebook session. If this is not specified, the compartment ID of the notebook session will be used.
**kwargs – Keyword arguments to be passed into OCI list_jobs API for filtering the jobs.
- Returns:
A list of DataScienceJob objects.
- Return type:
List[DataScienceJob]
- property log_group_id: str
Log group OCID of the data science job
- Returns:
Log group OCID
- Return type:
str
- property log_id: str
Log OCID for the data science job.
- Returns:
Log OCID
- Return type:
str
- property name: str
Display name of the job
- payload_attribute_map = {'blockStorageSize': 'job_infrastructure_configuration_details.block_storage_size_in_gbs', 'compartmentId': 'compartment_id', 'displayName': 'display_name', 'jobInfrastructureType': 'job_infrastructure_configuration_details.job_infrastructure_type', 'jobType': 'job_configuration_details.job_type', 'logGroupId': 'job_log_configuration_details.log_group_id', 'logId': 'job_log_configuration_details.log_id', 'projectId': 'project_id', 'shapeConfigDetails': 'job_infrastructure_configuration_details.job_shape_config_details', 'shapeName': 'job_infrastructure_configuration_details.shape_name', 'subnetId': 'job_infrastructure_configuration_details.subnet_id'}
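The values in payload_attribute_map are dotted paths into a nested payload. As a rough illustration of how such paths can be read, the sketch below walks a nested dictionary one path segment at a time. The helper get_by_path and the sample payload are hypothetical; only the map entries are copied from the listing above.

```python
from typing import Any, Dict

# A subset of payload_attribute_map, copied from the listing above.
PAYLOAD_ATTRIBUTE_MAP = {
    "shapeName": "job_infrastructure_configuration_details.shape_name",
    "logGroupId": "job_log_configuration_details.log_group_id",
    "projectId": "project_id",
}


def get_by_path(payload: Dict[str, Any], dotted_path: str, default: Any = None) -> Any:
    """Follow a dotted path through nested dictionaries."""
    current: Any = payload
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Hypothetical payload shaped like the dotted paths in the map.
payload = {
    "project_id": "<project_ocid>",
    "job_infrastructure_configuration_details": {"shape_name": "VM.Standard.E3.Flex"},
    "job_log_configuration_details": {"log_group_id": "<log_group_ocid>"},
}

spec = {key: get_by_path(payload, path) for key, path in PAYLOAD_ATTRIBUTE_MAP.items()}
print(spec)
```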
- property project_id: Optional[str]
Project OCID
- run(name=None, args=None, env_var=None, freeform_tags=None, wait=False) DataScienceJobRun
Runs the job on OCI Data Science.
- Parameters:
name (str, optional) – The name of the job run, by default None.
args (str, optional) – Command line arguments for the job run, by default None.
env_var (dict, optional) – Environment variable for the job run, by default None
freeform_tags (dict, optional) – Freeform tags for the job run, by default None
wait (bool, optional) – Indicate if this method should wait for the run to finish before it returns, by default False.
- Returns:
A Data Science Job Run instance.
- Return type:
DataScienceJobRun
- run_list(**kwargs) List[DataScienceJobRun]
Gets a list of job runs.
- Parameters:
**kwargs – Keyword arguments for filtering the job runs. These arguments will be passed to OCI API.
- Returns:
A list of job runs.
- Return type:
List[DataScienceJobRun]
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- property shape_config_details: Dict
The details for the job run shape configuration.
- shape_config_details_attribute_map = {'memoryInGBs': 'memory_in_gbs', 'ocpus': 'ocpus'}
- property shape_name: Optional[str]
Shape name
- snake_to_camel_map = {'block_storage_size_in_gbs': 'blockStorageSize', 'compartment_id': 'compartmentId', 'display_name': 'displayName', 'job_infrastructure_type': 'jobInfrastructureType', 'job_shape_config_details': 'shapeConfigDetails', 'job_type': 'jobType', 'log_group_id': 'logGroupId', 'log_id': 'logId', 'project_id': 'projectId', 'shape_name': 'shapeName', 'subnet_id': 'subnetId'}
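A mapping like snake_to_camel_map can be applied with a dictionary comprehension. The following sketch is not taken from the library: to_camel_keys is a hypothetical helper, and it assumes that keys missing from the map should pass through unchanged.

```python
from typing import Any, Dict

# A subset of snake_to_camel_map, copied from the listing above.
SNAKE_TO_CAMEL_MAP = {
    "block_storage_size_in_gbs": "blockStorageSize",
    "compartment_id": "compartmentId",
    "shape_name": "shapeName",
    "subnet_id": "subnetId",
}


def to_camel_keys(data: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known snake_case keys; unknown keys are kept as they are."""
    return {SNAKE_TO_CAMEL_MAP.get(key, key): value for key, value in data.items()}


converted = to_camel_keys(
    {"shape_name": "VM.Standard.E3.Flex", "block_storage_size_in_gbs": 50, "ocpus": 1}
)
print(converted)  # {'shapeName': 'VM.Standard.E3.Flex', 'blockStorageSize': 50, 'ocpus': 1}
```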
- static standardize_spec(spec)
- property status: Optional[str]
Status of the job.
- Returns:
Status of the job.
- Return type:
str
- property subnet_id: str
Subnet ID
- to_dict() dict
Converts the object to dictionary with kind, type and spec as keys.
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as shown in YAML.
This implementation returns the class name with the first letter converted to lower case.
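To make the kind, type and spec layout concrete, the sketch below builds such a dictionary by hand and round-trips it through JSON using only the standard library. It does not call ADS; type_name is a hypothetical helper that mirrors the lower-casing rule described above, and the spec values are placeholders.

```python
import json


def type_name(class_name: str) -> str:
    """Lower-case the first letter of a class name, e.g. for the type field."""
    return class_name[:1].lower() + class_name[1:]


# Hand-written dictionary following the kind/type/spec layout.
document = {
    "kind": "infrastructure",
    "type": type_name("DataScienceJob"),
    "spec": {"shapeName": "VM.Standard.E3.Flex", "blockStorageSize": 50},
}

serialized = json.dumps(document, indent=2)
restored = json.loads(serialized)
print(restored["type"])  # dataScienceJob
```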
- with_block_storage_size(size_in_gb: int) DataScienceJob
Sets the block storage size in GB
- Parameters:
size_in_gb (int) – Block storage size in GB
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_compartment_id(compartment_id: str) DataScienceJob
Sets the compartment OCID
- Parameters:
compartment_id (str) – The compartment OCID
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_job_infrastructure_type(infrastructure_type: str) DataScienceJob
Sets the job infrastructure type
- Parameters:
infrastructure_type (str) – Job infrastructure type as string
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_job_type(job_type: str) DataScienceJob
Sets the job type
- Parameters:
job_type (str) – Job type as string
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_log_group_id(log_group_id: str) DataScienceJob
Sets the log group OCID for the data science job. If log group ID is specified but log ID is not, a new log resource will be created automatically for each job run to store the logs.
- Parameters:
log_group_id (str) – Log Group OCID
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_log_id(log_id: str) DataScienceJob
Sets the log OCID for the data science job. If log ID is specified, setting the log group ID (with_log_group_id()) is not strictly needed. ADS will look up the log group ID automatically. However, this may require additional permissions, and the lookup may not be available for a newly created log group. Specifying both log ID (with_log_id()) and log group ID (with_log_group_id()) avoids the lookup and speeds up the job creation.
- Parameters:
log_id (str) – Log resource OCID.
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_project_id(project_id: str) DataScienceJob
Sets the project OCID
- Parameters:
project_id (str) – The project OCID
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_shape_config_details(memory_in_gbs: float, ocpus: float, **kwargs: Dict[str, Any]) DataScienceJob
Sets the details for the job run shape configuration. Specify only when a flex shape is selected. For example VM.Standard.E3.Flex allows the memory_in_gbs and cpu count to be specified.
- Parameters:
memory_in_gbs (float) – The size of the memory in GBs.
ocpus (float) – The OCPUs count.
kwargs – Additional keyword arguments.
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_shape_name(shape_name: str) DataScienceJob
Sets the shape name for running the job
- Parameters:
shape_name (str) – Shape name
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- with_subnet_id(subnet_id: str) DataScienceJob
Sets the subnet ID
- Parameters:
subnet_id (str) – Subnet ID
- Returns:
The DataScienceJob instance (self)
- Return type:
DataScienceJob
- class ads.jobs.DataScienceJobRun(config: Optional[dict] = None, signer: Optional[Signer] = None, client_kwargs: Optional[dict] = None, **kwargs)
Bases:
OCIDataScienceMixin, JobRun, RunInstance
Represents a Data Science Job run
Initializes a service/resource with OCI client as a property. If config or signer is specified, it will be used to initialize the OCI client. If neither of them is specified, the client will be initialized with ads.common.auth.default_signer. If both of them are specified, both will be passed into the OCI client, and the authentication will be determined by the OCI Python SDK.
- Parameters:
config (dict, optional) – OCI API key config dictionary, by default None.
signer (oci.signer.Signer, optional) – OCI authentication signer, by default None.
client_kwargs (dict, optional) – Additional keyword arguments for initializing the OCI client.
- CONS_COMPARTMENT_ID = 'compartment_id'
- LIFECYCLE_STATE_ACCEPTED = 'ACCEPTED'
A constant which can be used with the lifecycle_state property of a JobRun. This constant has a value of “ACCEPTED”
- LIFECYCLE_STATE_CANCELED = 'CANCELED'
A constant which can be used with the lifecycle_state property of a JobRun. This constant has a value of “CANCELED”
- LIFECYCLE_STATE_CANCELING = 'CANCELING'
A constant which can be used with the lifecycle_state property of a JobRun. This constant has a value of “CANCELING”
- LIFECYCLE_STATE_DELETED = 'DELETED'
A constant which can be used with the lifecycle_state property of a JobRun. This constant has a value of “DELETED”
- LIFECYCLE_STATE_FAILED = 'FAILED'
A constant which can be used with the lifecycle_state property of a JobRun. This constant has a value of “FAILED”
- LIFECYCLE_STATE_IN_PROGRESS = 'IN_PROGRESS'
A constant which can be used with the lifecycle_state property of a JobRun. This constant has a value of “IN_PROGRESS”
- LIFECYCLE_STATE_NEEDS_ATTENTION = 'NEEDS_ATTENTION'
A constant which can be used with the lifecycle_state property of a JobRun. This constant has a value of “NEEDS_ATTENTION”
- LIFECYCLE_STATE_SUCCEEDED = 'SUCCEEDED'
A constant which can be used with the lifecycle_state property of a JobRun. This constant has a value of “SUCCEEDED”
- OCI_MODEL_PATTERN = 'oci.[^.]+\\.models[\\..*]?'
- TERMINAL_STATES = ['SUCCEEDED', 'FAILED', 'CANCELED', 'DELETED']
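A common use of TERMINAL_STATES is deciding when to stop polling a job run. The sketch below shows the idea with a stand-in state source, so it runs without OCI access. wait_for_terminal_state and its parameters are hypothetical; in real use, the get_state callable would be something that refreshes the run and returns its lifecycle_state.

```python
import time
from typing import Callable, Optional

# Copied from the constant listed above.
TERMINAL_STATES = ["SUCCEEDED", "FAILED", "CANCELED", "DELETED"]


def wait_for_terminal_state(
    get_state: Callable[[], str],
    interval: float = 1.0,
    timeout: Optional[float] = None,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"Still in state {state!r} after {timeout} seconds.")
        time.sleep(interval)


# Stand-in for reading the lifecycle state of a real job run.
states = iter(["ACCEPTED", "IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
final_state = wait_for_terminal_state(lambda: next(states), interval=0.01)
print(final_state)  # SUCCEEDED
```

Note that NEEDS_ATTENTION and CANCELING are not in TERMINAL_STATES, so a loop like this keeps polling while a run is in either of those states.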
- property auth: dict
The ADS authentication config used to initialize the client. This auth has the same format as those obtained by calling functions in ads.common.auth. The config is a dict containing the following key-value pairs:
config: The config loaded from oci_config.
signer: The signer object created from the API keys.
client_kwargs: The client_kwargs that was passed in as an input parameter.
- cancel() DataScienceJobRun
Cancels a job run. This method will wait for the job run to be canceled before returning.
- Returns:
The job run instance.
- Return type:
self
- static check_compartment_id(compartment_id: Optional[str]) str
Checks if a compartment ID has a value and returns the value from the NB_SESSION_COMPARTMENT_OCID environment variable if it is not specified.
- Parameters:
compartment_id (str) – Compartment OCID or None
- Returns:
Compartment OCID
- Return type:
str
- Raises:
ValueError – compartment_id is not specified and NB_SESSION_COMPARTMENT_OCID environment variable is not set
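The fallback described for check_compartment_id can be sketched in a few lines. This is an illustration of the documented behavior, not the library code; resolve_compartment_id is a hypothetical name, and the error message is made up for the example.

```python
import os
from typing import Optional

ENV_VAR = "NB_SESSION_COMPARTMENT_OCID"


def resolve_compartment_id(compartment_id: Optional[str] = None) -> str:
    """Return the given compartment OCID, or fall back to the environment variable."""
    if compartment_id:
        return compartment_id
    fallback = os.environ.get(ENV_VAR)
    if not fallback:
        raise ValueError(
            f"compartment_id is not specified and {ENV_VAR} is not set."
        )
    return fallback


# An explicit value always wins over the environment variable.
print(resolve_compartment_id("<compartment_ocid>"))
```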
- property client: DataScienceClient
OCI client
- property client_composite: DataScienceClientCompositeOperations
- property compartment_id
[Required] Gets the compartment_id of this JobRun. The OCID of the compartment where you want to create the job.
- Returns:
The compartment_id of this JobRun.
- Return type:
str
- config = None
- create() DataScienceJobRun
Creates a job run
- classmethod create_instance(*args, **kwargs)
Creates an instance using the same authentication as the class or an existing instance. If this method is called by a class, the default ADS authentication method will be used. If this method is called by an instance, the authentication method set in the instance will be used.
- property created_by
[Required] Gets the created_by of this JobRun. The OCID of the user who created the job run.
- Returns:
The created_by of this JobRun.
- Return type:
str
- property defined_tags
Gets the defined_tags of this JobRun. Defined tags for this resource. Each key is predefined and scoped to a namespace. See Resource Tags. Example: {“Operations”: {“CostCenter”: “42”}}
- Returns:
The defined_tags of this JobRun.
- Return type:
dict(str, dict(str, object))
- delete()
Deletes the resource
- classmethod deserialize(data: dict, to_cls: Optional[str] = None)
Deserialize data
- Parameters:
data (dict) – A dictionary containing the data to be deserialized.
to_cls (str) – The name of the OCI model class to be initialized using the data. The OCI model class must be from the same OCI service of the OCI client (self.client). Defaults to None, the parent OCI model class name will be used if current class is inherited from an OCI model. If parent OCI model class is not found or not from the same OCI service, the data will be returned as is.
- property display_name
Gets the display_name of this JobRun. A user-friendly display name for the resource.
- Returns:
The display_name of this JobRun.
- Return type:
str
- download(to_dir)
Downloads files from job run output URI to local.
- Parameters:
to_dir (str) – Local directory to which the files will be downloaded.
- Returns:
The job run instance (self)
- Return type:
DataScienceJobRun
- static flatten(data: dict) dict
Flattens a nested dictionary.
- Parameters:
data (dict) – A nested dictionary.
- Returns:
The flattened dictionary.
- Return type:
dict
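This reference does not state which key format the flattened dictionary uses. As one possible interpretation, the sketch below joins nested keys with a dot. Both the function flatten_dict and the dotted-key convention are assumptions made for illustration, so the actual output of flatten() may differ.

```python
from typing import Any, Dict


def flatten_dict(data: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dictionaries, joining nested keys with a separator."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries; empty ones contribute nothing.
            flat.update(flatten_dict(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {
    "display_name": "my-run",
    "log_details": {"log_group_id": "<log_group_ocid>", "log_id": "<log_ocid>"},
}
print(flatten_dict(nested))
```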
- property freeform_tags
Gets the freeform_tags of this JobRun. Free-form tags for this resource. Each tag is a simple key-value pair with no predefined name, type, or namespace. See Resource Tags. Example: {“Department”: “Finance”}
- Returns:
The freeform_tags of this JobRun.
- Return type:
dict(str, str)
- classmethod from_dict(data)
Initialize an instance from a dictionary.
- Parameters:
data (dict) – A dictionary containing the properties to initialize the class.
- classmethod from_oci_model(oci_instance)
Initialize an instance from an instance of OCI model.
- Parameters:
oci_instance – An instance of an OCI model.
- classmethod from_ocid(ocid: str)
Initializes an object from OCID
- Parameters:
ocid (str) – The OCID of the object
- property id
[Required] Gets the id of this JobRun. The OCID of the job run.
- Returns:
The id of this JobRun.
- Return type:
str
- classmethod init_client(**kwargs) DataScienceClient
Initializes the OCI client specified in the “client” keyword argument. Subclasses should override this method and call cls._init_client(client=OCI_CLIENT)
- Parameters:
**kwargs – Additional keyword arguments for initializing the OCI client.
- Return type:
An instance of OCI client.
- property job_configuration_override_details
[Required] Gets the job_configuration_override_details of this JobRun.
- Returns:
The job_configuration_override_details of this JobRun.
- Return type:
oci.data_science.models.JobConfigurationDetails
- property job_id
[Required] Gets the job_id of this JobRun. The OCID of the job.
- Returns:
The job_id of this JobRun.
- Return type:
str
- property job_infrastructure_configuration_details
[Required] Gets the job_infrastructure_configuration_details of this JobRun.
- Returns:
The job_infrastructure_configuration_details of this JobRun.
- Return type:
oci.data_science.models.JobInfrastructureConfigurationDetails
- property job_log_configuration_override_details
Gets the job_log_configuration_override_details of this JobRun.
- Returns:
The job_log_configuration_override_details of this JobRun.
- Return type:
oci.data_science.models.JobLogConfigurationDetails
- kwargs = None
- property lifecycle_details
Gets the lifecycle_details of this JobRun. Details of the state of the job run.
- Returns:
The lifecycle_details of this JobRun.
- Return type:
str
- property lifecycle_state
[Required] Gets the lifecycle_state of this JobRun. The state of the job run.
Allowed values for this property are: “ACCEPTED”, “IN_PROGRESS”, “FAILED”, “SUCCEEDED”, “CANCELING”, “CANCELED”, “DELETED”, “NEEDS_ATTENTION”, ‘UNKNOWN_ENUM_VALUE’. Any unrecognized values returned by a service will be mapped to ‘UNKNOWN_ENUM_VALUE’.
- Returns:
The lifecycle_state of this JobRun.
- Return type:
str
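The handling of unrecognized values can be illustrated as a simple membership check. The sketch below is not the OCI SDK code; normalize_state is a hypothetical helper that applies the rule stated above to a plain string.

```python
# Allowed values copied from the description above.
ALLOWED_STATES = {
    "ACCEPTED",
    "IN_PROGRESS",
    "FAILED",
    "SUCCEEDED",
    "CANCELING",
    "CANCELED",
    "DELETED",
    "NEEDS_ATTENTION",
}


def normalize_state(value: str) -> str:
    """Map any value outside the allowed set to 'UNKNOWN_ENUM_VALUE'."""
    return value if value in ALLOWED_STATES else "UNKNOWN_ENUM_VALUE"


print(normalize_state("SUCCEEDED"))  # SUCCEEDED
print(normalize_state("PAUSED"))     # UNKNOWN_ENUM_VALUE
```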
- classmethod list_resource(compartment_id: Optional[str] = None, limit: int = 0, **kwargs) list
Generic method to list OCI resources
- Parameters:
compartment_id (str) – Compartment ID of the OCI resources. Defaults to None. If compartment_id is not specified, the value of NB_SESSION_COMPARTMENT_OCID in environment variable will be used.
limit (int) – The maximum number of items to return. Defaults to 0, in which case all items will be returned.
**kwargs – Additional keyword arguments to filter the resource. The kwargs are passed into OCI API.
- Returns:
A list of OCI resources
- Return type:
list
- Raises:
NotImplementedError – List method is not supported or implemented.
- load_properties_from_env()
Loads properties from the environment
- property log_details
Gets the log_details of this JobRun.
- Returns:
The log_details of this JobRun.
- Return type:
oci.data_science.models.JobRunLogDetails
- property log_group_id: str
The log group ID from OCI logging service containing the logs from the job run.
- property log_id: str
The log ID from OCI logging service containing the logs from the job run.
- logs(limit: Optional[int] = None) list
Gets the logs of the job run.
- Parameters:
limit (int, optional) – Limit the number of logs to be returned. Defaults to None. All logs will be returned.
- Returns:
A list of log records. Each log record is a dictionary with the following keys: id, time, message.
- Return type:
list
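Since each log record is a dictionary with id, time and message keys, the returned list is easy to post-process. The sketch below formats such records as plain text. The sample records are made up, format_logs is a hypothetical helper, and the timestamps are assumed to be ISO 8601 strings in a uniform format (the actual type of the time value is not stated here).

```python
from typing import Dict, List


def format_logs(records: List[Dict[str, str]]) -> str:
    """Sort log records by time and render one 'time message' line per record."""
    # ISO 8601 strings in the same format sort chronologically as plain text.
    ordered = sorted(records, key=lambda record: record["time"])
    return "\n".join(f"{record['time']} {record['message']}" for record in ordered)


# Hypothetical records shaped like the documented return value.
records = [
    {"id": "b", "time": "2023-01-01T00:00:05Z", "message": "Job finished."},
    {"id": "a", "time": "2023-01-01T00:00:01Z", "message": "Job started."},
]
print(format_logs(records))
```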
- property name: str
Gets the name of the object.
- property project_id
[Required] Gets the project_id of this JobRun. The OCID of the project to associate the job with.
- Returns:
The project_id of this JobRun.
- Return type:
str
- property run_details_link: str
Link to run details page in OCI console
- Returns:
The link to the details page in OCI console.
- Return type:
str
- serialize()
Serializes the model to a dictionary that is ready to be sent to the OCI API.
- Returns:
A dictionary that is ready to be sent to the OCI API.
- Return type:
dict
- signer = None
- property status: str
Lifecycle status
- Returns:
Status in a string.
- Return type:
str
- sync(merge_strategy: MergeStrategy = MergeStrategy.OVERRIDE)
Refreshes the properties of the object from OCI
- property time_accepted
[Required] Gets the time_accepted of this JobRun. The date and time the job run was accepted in the timestamp format defined by RFC3339.
- Returns:
The time_accepted of this JobRun.
- Return type:
datetime
- property time_finished
Gets the time_finished of this JobRun. The date and time the job run request was finished in the timestamp format defined by RFC3339.
- Returns:
The time_finished of this JobRun.
- Return type:
datetime
- property time_started
Gets the time_started of this JobRun. The date and time the job run request was started in the timestamp format defined by RFC3339.
- Returns:
The time_started of this JobRun.
- Return type:
datetime
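Since time_started and time_finished are datetime values, the elapsed time of a run can be computed by subtraction. This is a sketch with a hypothetical helper run_duration; it assumes time_finished may be missing while the run is still in progress.

```python
from datetime import datetime, timedelta
from typing import Optional


def run_duration(time_started: Optional[datetime],
                 time_finished: Optional[datetime]) -> Optional[timedelta]:
    """Return finished - started, or None if either timestamp is missing."""
    if time_started is None or time_finished is None:
        return None
    return time_finished - time_started
```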
- to_dict(flatten: bool = False) dict
Converts the properties to a dictionary
- Parameters:
flatten – (Default value = False)
- to_oci_model(oci_model)
Converts the object into an instance of OCI data model.
- Parameters:
oci_model (class or str) – The OCI model to be converted to. This can be a string of the model name.
type_mapping (dict) – A dictionary mapping the models.
- Returns:
An instance of the oci_model
- to_yaml() str
Serializes the object into YAML string.
- Returns:
YAML stored in a string.
- Return type:
str
- type_mappings = None
- update_from_oci_model(oci_model_instance, merge_strategy: MergeStrategy = MergeStrategy.OVERRIDE)
Updates the properties from OCI model with the same properties.
- Parameters:
oci_model_instance – An instance of OCI model, which should have the same properties of this class.
- watch(interval: float = 3, wait: float = 90) DataScienceJobRun
Watches the job run until it finishes. Before the job starts running, this method outputs the job run status. Once the job starts running, the logs are streamed until the job has succeeded, failed or been cancelled.
- Parameters:
interval (float) – Time interval in seconds between each request to update the logs. Defaults to 3 (seconds).
wait (float) – Time in seconds to keep updating the logs after the job run finished. It may take some time for logs to appear in OCI logging service after the job run is finished. Defaults to 90 (seconds).
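The role of interval can be pictured as a simple polling loop. The sketch below is not the ADS implementation: poll_until_done, the injected get_status callable and the set of final states are assumptions, and the trailing wait period for late-arriving logs is not modeled.

```python
import time
from typing import Callable, List


def poll_until_done(get_status: Callable[[], str],
                    interval: float = 3,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll `get_status` every `interval` seconds until a final state is seen."""
    final_states = {"SUCCEEDED", "FAILED", "CANCELED"}  # assumed final states
    seen = []
    while True:
        status = get_status()
        seen.append(status)
        if status in final_states:
            return seen
        sleep(interval)
```

Passing sleep as a parameter keeps the loop easy to exercise without real delays.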
- class ads.jobs.GitPythonRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime, _PythonRuntimeMixin
Represents a job runtime with source code from git repository
Example:
runtime = (
    GitPythonRuntime()
    .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
    # Specify the service conda environment by slug name.
    .with_service_conda("pytorch19_p37_gpu_v1")
    # Specify the git repository
    # Optionally, you can specify the branch or commit
    .with_source("https://github.com/pytorch/tutorials.git")
    # Entrypoint is a relative path from the root of the git repo.
    .with_entrypoint("beginner_source/examples_nn/polynomial_nn.py")
    # Copy files in "beginner_source/examples_nn" to object storage after job finishes.
    .with_output(
        output_dir="beginner_source/examples_nn",
        output_uri="oci://bucket_name@namespace/path/to/dir"
    )
)
To initialize the object, user can either pass in the specification as a dictionary or through keyword arguments.
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
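The precedence rule between spec and kwargs can be sketched in plain Python. The function merge_spec is hypothetical; it only illustrates that a value passed as a keyword argument replaces the value stored under the same key in spec.

```python
from typing import Dict, Optional


def merge_spec(spec: Optional[Dict] = None, **kwargs) -> Dict:
    """Combine `spec` and keyword arguments; kwargs win on duplicate keys."""
    merged = dict(spec or {})
    merged.update(kwargs)
    return merged
```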
- CONST_ARGS = 'args'
- CONST_BRANCH = 'branch'
- CONST_COMMIT = 'commit'
- CONST_CONDA = 'conda'
- CONST_CONDA_REGION = 'region'
- CONST_CONDA_SLUG = 'slug'
- CONST_CONDA_TYPE = 'type'
- CONST_CONDA_TYPE_CUSTOM = 'published'
- CONST_CONDA_TYPE_SERVICE = 'service'
- CONST_CONDA_URI = 'uri'
- CONST_ENTRYPOINT = 'entrypoint'
- CONST_ENTRY_FUNCTION = 'entryFunction'
- CONST_ENV_VAR = 'env'
- CONST_GIT_SSH_SECRET_ID = 'gitSecretId'
- CONST_GIT_URL = 'url'
- CONST_MAXIMUM_RUNTIME_IN_MINUTES = 'maximumRuntimeInMinutes'
- CONST_OUTPUT_DIR = 'outputDir'
- CONST_OUTPUT_URI = 'outputUri'
- CONST_PYTHON_PATH = 'pythonPath'
- CONST_SKIP_METADATA = 'skipMetadataUpdate'
- CONST_TAG = 'freeformTags'
- CONST_WORKING_DIR = 'workingDir'
- property args: list
Command line arguments
- attribute_map = {'branch': 'branch', 'commit': 'commit', 'conda': 'conda', 'entryFunction': 'entry_function', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'gitSecretId': 'git_secret_id', 'outputDir': 'output_dir', 'outputUri': 'output_uri', 'pythonPath': 'python_path', 'skipMetadataUpdate': 'skip_metadata_update', 'url': 'url', 'workingDir': 'working_dir'}
- property branch: str
Git branch name.
- property commit: str
Git commit ID (SHA1 hash)
- property conda: dict
The conda environment specification.
For service conda environment, the specification contains:
type, the type of the conda environment. This is always service for service conda environment.
slug, the slug of the conda environment.
For custom conda environment, the specification contains:
type, the type of the conda environment. This is always published for custom conda environment.
uri, the uri of the conda environment, e.g. oci://bucket@namespace/prefix/to/conda
region, the region of the bucket in which the conda environment is stored. By default, ADS will determine the region based on the authenticated API key or resource principal. This is only needed if your conda environment is stored in a different region.
- Returns:
A dictionary containing the conda environment specifications.
- Return type:
dict
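The two shapes of the conda specification can be written out as plain dictionaries. The helper names below are hypothetical; the keys and the type values ('service' and 'published') follow the CONST_CONDA_* constants listed for this class.

```python
from typing import Dict, Optional


def service_conda_spec(slug: str) -> Dict[str, str]:
    """Specification for a service conda environment."""
    return {"type": "service", "slug": slug}


def custom_conda_spec(uri: str, region: Optional[str] = None) -> Dict[str, str]:
    """Specification for a custom (published) conda environment."""
    spec = {"type": "published", "uri": uri}
    if region:  # only needed when the bucket is in a different region
        spec["region"] = region
    return spec
```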
- property entry_function: str
The name of the entry function in the entry script
- property entry_script: str
The path of the entry script
- property environment_variables: dict
Environment variables
- Returns:
The runtime environment variables. The returned dictionary is a copy.
- Return type:
dict
- property envs: dict
Environment variables
- property freeform_tags: dict
- classmethod from_dict(obj_dict: dict) Self
Initialize the object from a Python dictionary
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string. Defaults to None.
uri (str, optional) – URI location of file containing JSON string. Defaults to None.
decoder (callable, optional) – Custom decoder. Defaults to simple JSON decoder.
kwargs (dict) – Keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if neither json_string nor uri is provided, or if the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Self
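The "string or URI" contract described above can be sketched with the standard library. The function load_json is hypothetical and, as a simplifying assumption, treats uri as a local file path, whereas the real method opens it through fsspec.open(). Note that json.JSONDecodeError is a subclass of ValueError, so invalid JSON also surfaces as a ValueError here.

```python
import json
from typing import Optional


def load_json(json_string: Optional[str] = None, uri: Optional[str] = None) -> dict:
    """Parse JSON from a string, or from a local file path given as `uri`."""
    if json_string:
        return json.loads(json_string)
    if uri:
        # Assumption: a plain local path instead of an fsspec-supported URI.
        with open(uri, encoding="utf-8") as f:
            return json.load(f)
    raise ValueError("Must provide either json_string or uri")
```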
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property kind: str
Kind of the object to be stored in YAML. All runtime implementations have “runtime” as kind. Subclasses have different types.
- property maximum_runtime_in_minutes: int
Maximum runtime in minutes
- property output_dir: str
Directory in the Job run container for saving output files generated in the job
- property output_uri: str
OCI object storage URI prefix for saving output files generated in the job
- property python_path
Additional python paths for running the source code.
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
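The chaining behaviour of set_spec, together with the default handling of get_spec, can be sketched with a small stand-in class. SpecHolder is hypothetical and stores its properties in a plain dictionary; it is not the ADS Builder class.

```python
from typing import Any, Optional


class SpecHolder:
    """Hypothetical stand-in for a builder that stores its properties in a dict."""

    def __init__(self) -> None:
        self._spec = {}

    def set_spec(self, k: str, v: Any) -> "SpecHolder":
        self._spec[k] = v
        return self  # returning self is what makes chaining possible

    def get_spec(self, key: str, default: Optional[Any] = None) -> Any:
        return self._spec.get(key, default)
```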
- property skip_metadata_update
Indicate if the metadata update should be skipped after the job run
By default, the job run metadata will be updated with the following freeform tags:
* repo: The URL of the Git repository
* commit: The Git commit ID
* module: The entry script/module
* method: The entry function/method
* outputs: The prefix of the output files in object storage.
This update step also requires resource principals to have the permission to update the job run.
- Returns:
True if the metadata update will be skipped. Otherwise False.
- Return type:
bool
- property ssh_secret_ocid: str
The OCID of the OCI Vault secret storing the Git SSH key.
- to_dict() dict
Converts the object to dictionary with kind, type and spec as keys.
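The shape of such a dictionary can be sketched as follows. The helper wrap_spec is hypothetical; the kind value "runtime" follows the kind property, the spec keys follow the CONST_* constants of this class, and the type value "gitPython" is an assumed placeholder rather than a confirmed value.

```python
import json


def wrap_spec(kind: str, type_name: str, spec: dict) -> dict:
    """Build a dictionary with kind, type and spec as keys."""
    return {"kind": kind, "type": type_name, "spec": spec}


example = wrap_spec(
    "runtime",
    "gitPython",  # assumed placeholder for the runtime type
    {
        "url": "https://github.com/pytorch/tutorials.git",
        "entrypoint": "beginner_source/examples_nn/polynomial_nn.py",
    },
)
print(json.dumps(example, indent=2))
```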
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as shown in YAML
- property url: str
URL of the Git repository.
- with_argument(*args, **kwargs) Self
Adds command line arguments to the runtime.
This method can be called (chained) multiple times to add various arguments.
- Parameters:
args – Positional arguments. In a single method call, positional arguments are always added before keyword arguments. You can call with_argument() to add positional arguments after keyword arguments.
kwargs – Keyword arguments. To add a keyword argument without value, set the value to None.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- Raises:
ValueError – Raised if the key of a keyword argument contains a space.
Examples
>>> runtime = Runtime().with_argument(key1="val1", key2="val2").with_argument("pos1")
>>> print(runtime.args)
['--key1', 'val1', '--key2', 'val2', 'pos1']

>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1="val1", key2="val2.1 val2.2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']

>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1=None, key2="val2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', '--key2', 'val2', 'pos2']
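The ordering rules shown in these examples can be reproduced with a short stand-alone function. This is a sketch of the documented behaviour, not the ADS source: add_arguments is a hypothetical name, and converting values with str() is an assumption.

```python
from typing import List, Optional


def add_arguments(current: List[str], *args, **kwargs: Optional[str]) -> List[str]:
    """Return a new argument list: positional values first, then --key value pairs."""
    result = list(current)
    result.extend(str(a) for a in args)
    for key, value in kwargs.items():
        if " " in key:
            raise ValueError(f"Argument key must not contain a space: {key!r}")
        result.append(f"--{key}")
        if value is not None:  # None means a flag without a value
            result.append(str(value))
    return result
```

Calling the function once per with_argument() call, and feeding the previous result back in as current, mirrors the effect of chaining.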
- with_custom_conda(uri: str, region: Optional[str] = None)
Specifies the custom conda pack for running the job. Make sure you have configured the IAM policy for the job run to access the conda environment.
- Parameters:
uri (str) – The OCI object storage URI for the conda pack, e.g. “oci://your_bucket@namespace/object_name.” In the Environment Explorer of an OCI notebook session, this is shown as the “source” of the conda pack.
region (str, optional) –
The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials:
For API Key, config[“region”] is used.
For Resource Principal, signer.region is used.
This is required if the conda pack is stored in a different region.
- Returns:
The runtime instance.
- Return type:
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_entrypoint(path: str, func: Optional[str] = None)
Specifies the entrypoint for the job. The entrypoint can be a script or a function in a script.
- Parameters:
path (str) – The relative path for the script/module starting the job.
func (str, optional) – The function name in the script for starting the job, by default None. If this is not specified, the script will be run with python command in a subprocess.
- Returns:
The runtime instance.
- Return type:
self
- with_environment_variable(**kwargs) Self
Sets environment variables
Environment variables enclosed by ${...} will be substituted.
You can use $$ to escape the substitution.
Undefined variable enclosed by ${} will be ignored.
Double dollar signs $$ will be substituted by a single one $.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
Examples
>>> runtime = (
...     PythonRuntime()
...     .with_environment_variable(
...         HOST="10.0.0.1",
...         PORT="443",
...         URL="http://${HOST}:${PORT}/path/",
...         ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
...         MISSING_VAR="This is ${UNDEFINED}",
...         VAR_WITH_DOLLAR="$10",
...         DOUBLE_DOLLAR="$$10"
...     )
... )
>>> for k, v in runtime.environment_variables.items():
...     print(f"{k}: {v}")
HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10
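The substitution rules can be reproduced with a single regular expression. The sketch below is not the ADS implementation; substitute is a hypothetical function that expands ${NAME} from the other variables in the same call, turns $$ into $, and leaves undefined references untouched. It performs one level of substitution only.

```python
import re
from typing import Dict

# Matches either an escaped dollar sign ($$) or a ${NAME} reference.
_PATTERN = re.compile(r"\$\$|\$\{(\w+)\}")


def substitute(variables: Dict[str, str]) -> Dict[str, str]:
    """Expand ${NAME} references and unescape $$ in each value."""

    def expand(value: str) -> str:
        def replace(match: "re.Match") -> str:
            if match.group(0) == "$$":
                return "$"
            # Undefined names are left untouched.
            return variables.get(match.group(1), match.group(0))

        return _PATTERN.sub(replace, value)

    return {key: expand(value) for key, value in variables.items()}
```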
- with_freeform_tag(**kwargs) Self
Sets freeform tag
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_maximum_runtime_in_minutes(maximum_runtime_in_minutes: int) Self
Sets maximum runtime in minutes
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_output(output_dir: str, output_uri: str)
Specifies the outputs of the job. The output files in output_dir will be copied to remote output_uri when the job is finished.
- Parameters:
output_dir (str) – Path to the output directory in the job run. This path should be a relative path from the working directory. The source code should write all outputs into this directory.
output_uri (str) – The OCI object storage URI prefix for saving the output files. For example, oci://bucket_name@namespace/path/to/directory
- Returns:
The runtime instance.
- Return type:
Self
- with_python_path(*python_paths)
Specifies additional python paths for running the source code.
- Parameters:
*python_paths – Additional python path(s) for running the source code. Each path should be a relative path from the working directory.
- Returns:
The runtime instance.
- Return type:
self
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters:
slug (str) – The slug name of the service conda pack
- Returns:
The runtime instance.
- Return type:
self
- with_source(url: str, branch: Optional[str] = None, commit: Optional[str] = None, secret_ocid: Optional[str] = None)
Specifies the Git repository and branch/commit for the job source code.
- Parameters:
url (str) – URL of the Git repository.
branch (str, optional) – Git branch name, by default None, the default branch will be used.
commit (str, optional) – Git commit ID (SHA1 hash), by default None, the most recent commit will be used.
secret_ocid (str) – The secret OCID storing the SSH key content for checking out the Git repository.
- Returns:
The runtime instance.
- Return type:
self
- with_working_dir(working_dir: str)
Specifies the working directory in the job run. By default, the working directory will be the directory containing the user code (job artifact directory). This can be changed by specifying a relative path to the job artifact directory.
- Parameters:
working_dir (str) – The path of the working directory. This can be a relative path from the job artifact directory.
- Returns:
The runtime instance.
- Return type:
self
- property working_dir: str
The working directory for the job run.
- class ads.jobs.Job(name: Optional[str] = None, infrastructure=None, runtime=None)
Bases:
Builder
Represents a Job defined by infrastructure and runtime.
Examples
Here is an example for creating and running a job:
from ads.jobs import Job, DataScienceJob, PythonRuntime

# Define an OCI Data Science job to run a python script
job = (
    Job(name="<job_name>")
    .with_infrastructure(
        DataScienceJob()
        # Configure logging for getting the job run outputs.
        .with_log_group_id("<log_group_ocid>")
        # Log resource will be auto-generated if log ID is not specified.
        .with_log_id("<log_ocid>")
        # If you are in an OCI data science notebook session,
        # the following configurations are not required.
        # Configurations from the notebook session will be used as defaults.
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.Standard.E3.Flex")
        # Shape config details are applicable only for the flexible shapes.
        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
        # Minimum/Default block storage size is 50 (GB).
        .with_block_storage_size(50)
    )
    .with_runtime(
        PythonRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch110_p38_cpu_v1")
        # The job artifact can be a single Python script, a directory or a zip file.
        .with_source("local/path/to/code_dir")
        # Environment variable
        .with_environment_variable(NAME="Welcome to OCI Data Science.")
        # Command line argument, arg1 --key arg2
        .with_argument("arg1", key="arg2")
        # Set the working directory
        # When using a directory as source, the default working dir is the parent of code_dir.
        # Working dir should be a relative path beginning from the source directory (code_dir)
        .with_working_dir("code_dir")
        # The entrypoint is applicable only to directory or zip file as source
        # The entrypoint should be a path relative to the working dir.
        # Here my_script.py is a file in the code_dir/my_package directory
        .with_entrypoint("my_package/my_script.py")
        # Add an additional Python path, relative to the working dir (code_dir/other_packages).
        .with_python_path("other_packages")
        # Copy files in "code_dir/output" to object storage after job finishes.
        .with_output("output", "oci://bucket_name@namespace/path/to/dir")
    )
)

# Create and Run the job
run = job.create().run()
# Stream the job run outputs
run.watch()
If you are in an OCI notebook session and you would like to use the same infrastructure configurations, the infrastructure configuration can be simplified. Here is another example of creating and running a Jupyter notebook as a job:
from ads.jobs import Job, DataScienceJob, NotebookRuntime

# Define an OCI Data Science job to run a jupyter Python notebook
job = (
    Job(name="<job_name>")
    .with_infrastructure(
        # The same configurations as the OCI notebook session will be used.
        DataScienceJob()
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
    )
    .with_runtime(
        NotebookRuntime()
        .with_notebook("path/to/notebook.ipynb")
        .with_service_conda("tensorflow28_p38_cpu_v1")
        # Saves the notebook with outputs to OCI object storage.
        .with_output("oci://bucket_name@namespace/path/to/dir")
    )
).create()

# Run and monitor the job
run = job.run().watch()
# Download the notebook and outputs to local directory
run.download(to_dir="path/to/local/dir/")
See also
https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/jobs/index.html
Initializes a job.
The infrastructure and runtime can be configured when initializing the job, or by calling with_infrastructure() and with_runtime().
The infrastructure should be a subclass of ADS job Infrastructure, e.g., DataScienceJob, DataFlow. The runtime should be a subclass of ADS job Runtime, e.g., PythonRuntime, NotebookRuntime.
- Parameters:
name (str, optional) – The name of the job, by default None. If it is None, a default name may be generated by the infrastructure, depending on the implementation of the infrastructure. For an OCI data science job, the default name contains the job artifact name and a timestamp. If there is no artifact, a random, easy-to-remember name with a timestamp is generated, like ‘strange-spider-2022-08-17-23:55.02’.
infrastructure (Infrastructure, optional) – Job infrastructure, by default None
runtime (Runtime, optional) – Job runtime, by default None.
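The format of the generated default name can be sketched as follows. This is an illustration only: random_job_name and the word lists are hypothetical, and the timestamp pattern is inferred from the single example 'strange-spider-2022-08-17-23:55.02'.

```python
import random
from datetime import datetime

# Hypothetical word lists; the words ADS actually uses are not documented here.
ADJECTIVES = ["strange", "quiet", "bright"]
NOUNS = ["spider", "river", "falcon"]


def random_job_name(now: datetime, rng: random.Random) -> str:
    """Build a name shaped like 'strange-spider-2022-08-17-23:55.02'."""
    stamp = now.strftime("%Y-%m-%d-%H:%M.%S")
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}-{stamp}"
```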
- attribute_map = {}
- create(**kwargs) Job
Creates the job on the infrastructure.
- Returns:
The job instance (self)
- Return type:
Job
- static dataflow_job(compartment_id: Optional[str] = None, **kwargs) List[Job]
List data flow jobs under a given compartment.
- Parameters:
compartment_id (str) – compartment id
kwargs – additional keyword arguments
- Returns:
list of Job instances
- Return type:
List[Job]
- static datascience_job(compartment_id: Optional[str] = None, **kwargs) List[DataScienceJob]
Lists the existing data science jobs in the compartment.
- Parameters:
compartment_id (str) – The compartment ID for listing the jobs. This is optional if running in an OCI notebook session. The jobs in the same compartment of the notebook session will be returned.
- Returns:
A list of Job objects.
- Return type:
list
- delete() None
Deletes the job from the infrastructure.
- download(to_dir: str, output_uri=None, **storage_options)
Downloads files from remote output URI to local.
- Parameters:
to_dir (str) – Local directory to which the files will be downloaded.
output_uri (str, optional) – The remote URI from which the files will be downloaded. Defaults to None. If output_uri is not specified, this method will try to get the output_uri from the runtime.
storage_options – Extra keyword arguments for particular storage connection. This method uses fsspec to download the files from remote URI. storage_options will be passed into fsspec.open_files().
- Returns:
The job instance (self)
- Return type:
Job
- Raises:
AttributeError – The output_uri is not specified and the runtime is not configured with output_uri.
- static from_dataflow_job(job_id: str) Job
Create a Data Flow job given a job id.
- Parameters:
job_id (str) – id of the job
- Returns:
a Job instance
- Return type:
Job
- static from_datascience_job(job_id) Job
Loads a data science job from OCI.
- Parameters:
job_id (str) – OCID of an existing data science job.
- Returns:
A job instance.
- Return type:
Job
- classmethod from_dict(config: dict) Job
Initializes a job from a dictionary containing the configurations.
- Parameters:
config (dict) – A dictionary containing the infrastructure and runtime specifications.
- Returns:
A job instance
- Return type:
Job
- Raises:
NotImplementedError – If the type of the infrastructure or runtime is not supported.
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string. Defaults to None.
uri (str, optional) – URI location of file containing JSON string. Defaults to None.
decoder (callable, optional) – Custom decoder. Defaults to simple JSON decoder.
kwargs (dict) – Keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if neither json_string nor uri is provided, or if the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Self
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property id: str
The ID of the job. For jobs running on OCI, this is the OCID.
- Returns:
ID of the job.
- Return type:
str
- property infrastructure: Union[DataScienceJob, DataFlow]
The job infrastructure.
- Returns:
Job infrastructure.
- Return type:
Union[DataScienceJob, DataFlow]
- property kind: str
The kind of the object as shown in YAML.
- Returns:
“job”
- Return type:
str
- property name: str
The name of the job. For jobs running on OCI, this is the display name.
- Returns:
The name of the job.
- Return type:
str
- run(name=None, args=None, env_var=None, freeform_tags=None, wait=False) Union[DataScienceJobRun, DataFlowRun]
Runs the job.
- Parameters:
name (str, optional) – Name of the job run, by default None. The infrastructure handles the naming of the job run. For a data science job, if a name is not provided, a default name will be generated containing the job name and the timestamp of the run. If there is no artifact, a random, easy-to-remember name with a timestamp is generated, like ‘strange-spider-2022-08-17-23:55.02’.
args (str, optional) – Command line arguments for the job run, by default None. This will override the configurations on the job. If this is None, the args from the job configuration will be used.
env_var (dict, optional) – Additional environment variables for the job run, by default None
freeform_tags (dict, optional) – Freeform tags for the job run, by default None
wait (bool, optional) – Indicate if this method call should wait for the job run. By default False, this method returns as soon as the job run is created. If this is set to True, this method will stream the job logs and wait until it finishes, similar to job.run().watch().
- Returns:
A job run instance, depending on the infrastructure.
- Return type:
Job Run Instance
Examples
To run a job and override the configurations:
job_run = job.run(
    name="<my_job_run_name>",
    args="new_arg --new_key new_val",
    env_var={"new_env": "new_val"},
    freeform_tags={"new_tag": "new_tag_val"}
)
- run_list(**kwargs) list
Gets a list of runs of the job.
- Returns:
A list of job run instances, the actual object type depends on the infrastructure.
- Return type:
list
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- status() str
Status of the job
- Returns:
Status of the job
- Return type:
str
- to_dict() dict
Serialize the job specifications to a dictionary.
- Returns:
A dictionary containing job specifications.
- Return type:
dict
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as shown in YAML.
This implementation returns the class name with the first letter converted to lower case.
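The naming rule described above amounts to lower-casing only the first character of the class name. A minimal sketch, using a hypothetical helper type_from_class_name:

```python
def type_from_class_name(class_name: str) -> str:
    """Lower-case only the first letter of a class name."""
    return class_name[:1].lower() + class_name[1:]


print(type_from_class_name("Job"))
```

Slicing with [:1] rather than indexing with [0] keeps the function safe for an empty string.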
- with_infrastructure(infrastructure) Job
Sets the infrastructure for the job.
- Parameters:
infrastructure (Infrastructure) – Job infrastructure.
- Returns:
The job instance (self)
- Return type:
Job
- class ads.jobs.NotebookRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
Represents a job runtime with Jupyter notebook
To run a job with a single Jupyter notebook, you can define the runtime as:
runtime = (
    NotebookRuntime()
    .with_notebook(
        path="https://raw.githubusercontent.com/tensorflow/docs/master/site/en/tutorials/customization/basics.ipynb",
        encoding='utf-8'
    )
    .with_service_conda("tensorflow28_p38_cpu_v1")
    .with_environment_variable(GREETINGS="Welcome to OCI Data Science")
    .with_exclude_tag(["ignore", "remove"])
    .with_output("oci://bucket_name@namespace/path/to/dir")
)
Note that the notebook path can be local or remote path supported by fsspec, including OCI object storage path like
oci://bucket@namespace/path/to/notebook
To initialize the object, user can either pass in the specification as a dictionary or through keyword arguments.
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
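The precedence rule for spec and kwargs can be pictured with a small sketch. The helper `merge_spec` is hypothetical and only models the documented rule that a keyword argument wins over the same key in spec; it is not the ADS constructor.

```python
from typing import Any, Dict, Optional


def merge_spec(spec: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Dict[str, Any]:
    """Combine a spec dictionary with keyword arguments; kwargs win on conflicts."""
    merged = dict(spec or {})  # copy so the caller's dictionary is not mutated
    merged.update(kwargs)      # keyword arguments override matching spec keys
    return merged


spec = {"entrypoint": "main.py", "source": "path/to/source"}
result = merge_spec(spec, entrypoint="other.py")
print(result)  # {'entrypoint': 'other.py', 'source': 'path/to/source'}
```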
- CONST_ARGS = 'args'
- CONST_CONDA = 'conda'
- CONST_CONDA_REGION = 'region'
- CONST_CONDA_SLUG = 'slug'
- CONST_CONDA_TYPE = 'type'
- CONST_CONDA_TYPE_CUSTOM = 'published'
- CONST_CONDA_TYPE_SERVICE = 'service'
- CONST_CONDA_URI = 'uri'
- CONST_ENTRYPOINT = 'entrypoint'
- CONST_ENV_VAR = 'env'
- CONST_EXCLUDE_TAG = 'excludeTags'
- CONST_MAXIMUM_RUNTIME_IN_MINUTES = 'maximumRuntimeInMinutes'
- CONST_NOTEBOOK_ENCODING = 'notebookEncoding'
- CONST_NOTEBOOK_PATH = 'notebookPathURI'
- CONST_OUTPUT_URI = 'outputUri'
- CONST_OUTPUT_URI_ALT = 'outputURI'
- CONST_SOURCE = 'source'
- CONST_TAG = 'freeformTags'
- property args: list
Command line arguments
- attribute_map = {'conda': 'conda', 'entrypoint': 'entrypoint', 'env': 'env', 'excludeTags': 'exclude_tags', 'freeformTags': 'freeform_tags', 'notebookEncoding': 'notebook_encoding', 'notebookPathURI': 'notebook_path_uri', 'outputUri': 'output_uri', 'source': 'source'}
- property conda: dict
The conda environment specification.
For service conda environment, the specification contains:
type, the type of the conda environment. This is always service for service conda environment.
slug, the slug of the conda environment.
For custom conda environment, the specification contains:
type, the type of the conda environment. This is always published for custom conda environment.
uri, the uri of the conda environment, e.g. oci://bucket@namespace/prefix/to/conda
region, the region of the bucket in which the conda environment is stored. By default, ADS will determine the region based on the authenticated API key or resource principal. This is only needed if your conda environment is stored in a different region.
- Returns:
A dictionary containing the conda environment specifications.
- Return type:
dict
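The two dictionary shapes described for the conda property can be sketched as plain Python dictionaries. The key and type strings come from the CONST_CONDA_* values listed above; the helper functions are hypothetical, and omitting region when it is None is an assumption made for this sketch.

```python
from typing import Dict, Optional


def service_conda_spec(slug: str) -> Dict[str, str]:
    """Build the dictionary shape described for a service conda environment."""
    return {"type": "service", "slug": slug}


def custom_conda_spec(uri: str, region: Optional[str] = None) -> Dict[str, str]:
    """Build the dictionary shape described for a custom (published) conda environment."""
    spec = {"type": "published", "uri": uri}
    if region is not None:
        # Assumption: region is only stored when it is given explicitly.
        spec["region"] = region
    return spec


service = service_conda_spec("pytorch110_p38_cpu_v1")
custom = custom_conda_spec("oci://bucket@namespace/prefix/to/conda")
```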
- property environment_variables: dict
Environment variables
- Returns:
The runtime environment variables. The returned dictionary is a copy.
- Return type:
dict
- property envs: dict
Environment variables
- property exclude_tag: list
A list of cell tags indicating cells to be excluded from the job
- property freeform_tags: dict
- classmethod from_dict(obj_dict: dict) Self
Initialize the object from a Python dictionary
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string, by default None
uri (str, optional) – URI location of file containing JSON string, by default None
decoder (callable, optional) – Decoder for custom data structures, by default json.JSONDecoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if both json_string and uri are empty, or if the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Self
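The input handling documented for from_json() can be approximated with the standard library alone. This is a minimal sketch under stated assumptions: `load_json` is a hypothetical helper, the built-in open() on a local path stands in for fsspec.open(), and giving json_string priority over uri is an assumption rather than documented behavior.

```python
import json
from typing import Any, Optional


def load_json(json_string: Optional[str] = None, uri: Optional[str] = None) -> Any:
    """Parse JSON from a string or, failing that, from a local file path."""
    if json_string:
        text = json_string
    elif uri:
        # Simplification: the documented method uses fsspec.open() here.
        with open(uri, "r", encoding="utf-8") as f:
            text = f.read()
    else:
        raise ValueError("Both json_string and uri are empty.")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError("The input is not a valid JSON.") from exc


parsed = load_json('{"kind": "runtime"}')
```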
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property kind: str
Kind of the object to be stored in YAML. All runtime implementations will have “runtime” as kind. Subclass will have different types.
- property maximum_runtime_in_minutes: int
Maximum runtime in minutes
- property notebook: str
The path of the notebook relative to the source.
- property notebook_encoding: str
The encoding of the notebook
- property notebook_uri: str
The URI of the notebook
- property output_uri: list
URI for storing the output notebook and files
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
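Because set_spec() returns self, calls can be chained, and get_spec() falls back to a default for missing properties. The class `SpecHolder` below is a hypothetical minimal container that mimics this documented contract; it is not the ADS base class.

```python
from typing import Any, Dict, Optional


class SpecHolder:
    """Hypothetical minimal container illustrating get_spec() and set_spec()."""

    def __init__(self) -> None:
        self._spec: Dict[str, Any] = {}

    def set_spec(self, k: str, v: Any) -> "SpecHolder":
        self._spec[k] = v
        return self  # returning self is what enables method chaining

    def get_spec(self, key: str, default: Optional[Any] = None) -> Any:
        return self._spec.get(key, default)


holder = SpecHolder().set_spec("entrypoint", "main.py").set_spec("args", ["pos1"])
print(holder.get_spec("entrypoint"))      # main.py
print(holder.get_spec("missing", "n/a"))  # n/a
```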
- property source: str
The source code location.
- to_dict() dict
Converts the object to dictionary with kind, type and spec as keys.
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
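The return convention described for to_yaml() (None when a uri is given, the serialized text otherwise) can be sketched as follows. To stay dependency-free, this sketch serializes with the standard json module instead of YAML and writes to a local path with open() instead of fsspec.open(); the helper `dump` and the sample dictionary are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def dump(data: Dict[str, Any], uri: Optional[str] = None) -> Optional[str]:
    """Serialize data; write it to uri and return None, or return the text."""
    text = json.dumps(data, indent=2)
    if uri is None:
        return text
    # Simplification: a local path and open() stand in for fsspec.open().
    with open(uri, "w", encoding="utf-8") as f:
        f.write(text)
    return None


spec = {"kind": "runtime", "spec": {"env": []}}
as_text = dump(spec)  # no uri: the serialized text is returned

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "runtime.json")
    returned = dump(spec, uri=path)  # uri given: the text is written and None is returned
    with open(path, encoding="utf-8") as f:
        saved = json.load(f)
```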
- property type: str
The type of the object as shown in YAML
- with_argument(*args, **kwargs) Self
Adds command line arguments to the runtime.
This method can be called (chained) multiple times to add various arguments.
- Parameters:
args – Positional arguments. In a single method call, positional arguments are always added before keyword arguments. You can call with_argument() to add positional arguments after keyword arguments.
kwargs – Keyword arguments. To add a keyword argument without value, set the value to None.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- Raises:
ValueError – Keyword arguments with space in a key.
Examples
>>> runtime = Runtime().with_argument(key1="val1", key2="val2").with_argument("pos1")
>>> print(runtime.args)
['--key1', 'val1', '--key2', 'val2', 'pos1']

>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1="val1", key2="val2.1 val2.2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']

>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1=None, key2="val2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', '--key2', 'val2', 'pos2']
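The ordering rules in the examples above can be reproduced with a small stand-alone class. `ArgumentList` is a hypothetical stand-in that mimics the documented behavior (positional arguments first within a call, `--key value` pairs for keyword arguments, a bare flag when the value is None, and a ValueError for keys containing a space); it is a sketch, not the ADS implementation.

```python
from typing import List, Optional


class ArgumentList:
    """Hypothetical stand-in that mimics the documented with_argument() behavior."""

    def __init__(self) -> None:
        self.args: List[str] = []

    def with_argument(self, *args: str, **kwargs: Optional[str]) -> "ArgumentList":
        # Within a single call, positional arguments are appended first.
        self.args.extend(str(arg) for arg in args)
        for key, value in kwargs.items():
            if " " in key:
                raise ValueError(f"Argument key must not contain a space: {key!r}")
            self.args.append(f"--{key}")
            # A value of None yields a flag without a value.
            if value is not None:
                self.args.append(str(value))
        return self


runtime = (
    ArgumentList()
    .with_argument("pos1")
    .with_argument(key1=None, key2="val2")
    .with_argument("pos2")
)
print(runtime.args)  # ['pos1', '--key1', '--key2', 'val2', 'pos2']
```

Calling the method once per group, as here, is what lets a positional argument follow a keyword argument.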
- with_custom_conda(uri: str, region: Optional[str] = None)
Specifies the custom conda pack for running the job. Make sure you have configured the IAM policy for the job run to access the conda environment.
- Parameters:
uri (str) – The OCI object storage URI for the conda pack, e.g. "oci://your_bucket@namespace/object_name". In the Environment Explorer of an OCI notebook session, this is shown as the "source" of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials:
For API Key, config["region"] is used.
For Resource Principal, signer.region is used.
This is required if the conda pack is stored in a different region.
- Returns:
The runtime instance.
- Return type:
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_environment_variable(**kwargs) Self
Sets environment variables
Environment variables enclosed by ${...} will be substituted.
You can use $$ to escape the substitution.
Undefined variable enclosed by ${} will be ignored.
Double dollar signs $$ will be substituted by a single one $.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
Examples
>>> runtime = (
...     PythonRuntime()
...     .with_environment_variable(
...         HOST="10.0.0.1",
...         PORT="443",
...         URL="http://${HOST}:${PORT}/path/",
...         ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
...         MISSING_VAR="This is ${UNDEFINED}",
...         VAR_WITH_DOLLAR="$10",
...         DOUBLE_DOLLAR="$$10"
...     )
... )
>>> for k, v in runtime.environment_variables.items():
...     print(f"{k}: {v}")
HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10
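The substitution rules shown in the example above can be modeled with a single regular expression pass. This is a minimal sketch, assuming references are resolved against the raw values passed in the same call; the function `substitute` is hypothetical and the real ADS logic may differ in detail.

```python
import re
from typing import Dict

# Matches either an escaped dollar sign ($$) or a ${NAME} reference.
_PATTERN = re.compile(r"\$\$|\$\{([^}]*)\}")


def substitute(variables: Dict[str, str]) -> Dict[str, str]:
    """Expand ${NAME} references among the given variables in a single pass."""

    def replace(match: "re.Match[str]") -> str:
        if match.group(0) == "$$":
            return "$"  # a double dollar sign collapses to a single one
        name = match.group(1)
        # Undefined names are left untouched.
        return variables.get(name, match.group(0))

    return {key: _PATTERN.sub(replace, value) for key, value in variables.items()}


envs = substitute({
    "HOST": "10.0.0.1",
    "PORT": "443",
    "URL": "http://${HOST}:${PORT}/path/",
    "ESCAPED_URL": "http://$${HOST}:$${PORT}/path/",
    "MISSING_VAR": "This is ${UNDEFINED}",
    "VAR_WITH_DOLLAR": "$10",
    "DOUBLE_DOLLAR": "$$10",
})
for key, value in envs.items():
    print(f"{key}: {value}")
```

Because the `$$` alternative is tried first at each position, `$${HOST}` is consumed as an escape before `${HOST}` can be read as a reference.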
- with_exclude_tag(*tags) NotebookRuntime
Specifies the cell tags in the notebook to exclude cells from the job script.
- Parameters:
*tags (list) – A list of tags (strings).
- Returns:
The runtime instance.
- Return type:
self
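To picture what excluding cells by tag means, the sketch below filters the cells of a notebook that has already been loaded as a dictionary. It assumes the standard Jupyter notebook JSON layout, where tags are stored under each cell's metadata.tags; the function `drop_tagged_cells` and the sample notebook are hypothetical and do not show how ADS converts the notebook.

```python
from typing import Any, Dict, Iterable


def drop_tagged_cells(notebook: Dict[str, Any], exclude_tags: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the notebook without cells carrying any excluded tag."""
    excluded = set(exclude_tags)
    kept = [
        cell
        for cell in notebook.get("cells", [])
        if not excluded.intersection(cell.get("metadata", {}).get("tags", []))
    ]
    return {**notebook, "cells": kept}


notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["ignore"]}, "source": ["print('debug')"]},
        {"cell_type": "code", "metadata": {}, "source": ["print('train')"]},
        {"cell_type": "markdown", "metadata": {"tags": ["remove", "note"]}, "source": ["scratch"]},
    ],
    "nbformat": 4,
}
filtered = drop_tagged_cells(notebook, ["ignore", "remove"])
print(len(filtered["cells"]))  # 1
```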
- with_freeform_tag(**kwargs) Self
Sets freeform tag
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_maximum_runtime_in_minutes(maximum_runtime_in_minutes: int) Self
Sets maximum runtime in minutes
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_notebook(path: str, encoding='utf-8') NotebookRuntime
Specifies the notebook to be run as a job. Use this method if you would like to run a single notebook. Use the with_source() method if you would like to run a notebook with additional dependency files.
- Parameters:
path (str) – The path of the Jupyter notebook
encoding (str) – The encoding for opening the notebook. Defaults to utf-8.
- Returns:
The runtime instance.
- Return type:
self
- with_output(output_uri: str) NotebookRuntime
Specifies the output URI for storing the output notebook and files. All files in the directory containing the notebook will be saved.
- Parameters:
output_uri (str) – URI for a directory storing the output notebook and files. For example, oci://bucket@namespace/path/to/dir
- Returns:
The runtime instance.
- Return type:
self
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters:
slug (str) – The slug name of the service conda pack
- Returns:
The runtime instance.
- Return type:
self
- with_source(uri: str, notebook: str, encoding='utf-8')
Specify source code directory containing the notebook and dependencies for the job. Use this method if you would like to run a notebook with additional dependency files. Use the with_notebook() method if you would like to run a single notebook.
In the following example, the local folder "path/to/source" contains the notebook and dependencies. The local path of the notebook is "path/to/source/relative/path/to/notebook.ipynb":
runtime.with_source(uri="path/to/source", notebook="relative/path/to/notebook.ipynb")
- Parameters:
uri (str) – URI of the source code directory. This can be local or on OCI object storage.
notebook (str) – The relative path of the notebook from the source URI.
encoding (str) – The encoding for opening the notebook. Defaults to utf-8.
- Returns:
The runtime instance.
- Return type:
Self
- class ads.jobs.PythonRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
ScriptRuntime, _PythonRuntimeMixin
Represents a job runtime using ADS driver script to run Python code
Example:
runtime = (
    PythonRuntime()
    # Specify the service conda environment by slug name.
    .with_service_conda("pytorch110_p38_cpu_v1")
    # The job artifact can be a single Python script, a directory or a zip file.
    .with_source("local/path/to/code_dir")
    # Environment variable
    .with_environment_variable(NAME="Welcome to OCI Data Science.")
    # Command line argument, arg1 --key arg2
    .with_argument("arg1", key="arg2")
    # Set the working directory
    # When using a directory as source, the default working dir is the parent of code_dir.
    # Working dir should be a relative path beginning from the source directory (code_dir)
    .with_working_dir("code_dir")
    # The entrypoint is applicable only to directory or zip file as source
    # The entrypoint should be a path relative to the working dir.
    # Here my_script.py is a file in the code_dir/my_package directory
    .with_entrypoint("my_package/my_script.py")
    # Add an additional Python path, relative to the working dir (code_dir/other_packages).
    .with_python_path("other_packages")
    # Copy files in "code_dir/output" to object storage after job finishes.
    .with_output("output", "oci://bucket_name@namespace/path/to/dir")
)
To initialize the object, you can pass in the specification either as a dictionary or through keyword arguments.
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_ARGS = 'args'
- CONST_CONDA = 'conda'
- CONST_CONDA_REGION = 'region'
- CONST_CONDA_SLUG = 'slug'
- CONST_CONDA_TYPE = 'type'
- CONST_CONDA_TYPE_CUSTOM = 'published'
- CONST_CONDA_TYPE_SERVICE = 'service'
- CONST_CONDA_URI = 'uri'
- CONST_ENTRYPOINT = 'entrypoint'
- CONST_ENTRY_FUNCTION = 'entryFunction'
- CONST_ENV_VAR = 'env'
- CONST_MAXIMUM_RUNTIME_IN_MINUTES = 'maximumRuntimeInMinutes'
- CONST_OUTPUT_DIR = 'outputDir'
- CONST_OUTPUT_URI = 'outputUri'
- CONST_PYTHON_PATH = 'pythonPath'
- CONST_SCRIPT_PATH = 'scriptPathURI'
- CONST_TAG = 'freeformTags'
- CONST_WORKING_DIR = 'workingDir'
- property args: list
Command line arguments
- attribute_map = {'conda': 'conda', 'entryFunction': 'entry_function', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'outputDir': 'output_dir', 'outputUri': 'output_uri', 'pythonPath': 'python_path', 'scriptPathURI': 'script_path_uri', 'workingDir': 'working_dir'}
- property conda: dict
The conda environment specification.
For service conda environment, the specification contains:
type, the type of the conda environment. This is always service for service conda environment.
slug, the slug of the conda environment.
For custom conda environment, the specification contains:
type, the type of the conda environment. This is always published for custom conda environment.
uri, the uri of the conda environment, e.g. oci://bucket@namespace/prefix/to/conda
region, the region of the bucket in which the conda environment is stored. By default, ADS will determine the region based on the authenticated API key or resource principal. This is only needed if your conda environment is stored in a different region.
- Returns:
A dictionary containing the conda environment specifications.
- Return type:
dict
- property entry_function: str
The name of the entry function in the entry script
- property entry_script: str
The path of the entry script
- property entrypoint: str
The relative path of the script to be set as entrypoint when source is a zip/tar/directory.
- property environment_variables: dict
Environment variables
- Returns:
The runtime environment variables. The returned dictionary is a copy.
- Return type:
dict
- property envs: dict
Environment variables
- property freeform_tags: dict
- classmethod from_dict(obj_dict: dict) Self
Initialize the object from a Python dictionary
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string, by default None
uri (str, optional) – URI location of file containing JSON string, by default None
decoder (callable, optional) – Decoder for custom data structures, by default json.JSONDecoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if both json_string and uri are empty, or if the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Self
- classmethod from_string(obj_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- classmethod from_yaml(yaml_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property kind: str
Kind of the object to be stored in YAML. All runtime implementations will have “runtime” as kind. Subclass will have different types.
- property maximum_runtime_in_minutes: int
Maximum runtime in minutes
- property output_dir: str
Directory in the Job run container for saving output files generated in the job
- property output_uri: str
OCI object storage URI prefix for saving output files generated in the job
- property python_path
Additional python paths for running the source code.
- property script_uri: str
The URI of the source code
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- property source_uri: str
The URI of the source code
- to_dict() dict
Converts the object to dictionary with kind, type and spec as keys.
- to_json(uri: ~typing.Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: ~typing.Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config=”path/to/.oci/config”.
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as shown in YAML
- with_argument(*args, **kwargs) Self
Adds command line arguments to the runtime.
This method can be called (chained) multiple times to add various arguments.
- Parameters:
args – Positional arguments. In a single method call, positional arguments are always added before keyword arguments. You can call with_argument() to add positional arguments after keyword arguments.
kwargs – Keyword arguments. To add a keyword argument without value, set the value to None.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- Raises:
ValueError – Keyword arguments with space in a key.
Examples
>>> runtime = Runtime().with_argument(key1="val1", key2="val2").with_argument("pos1")
>>> print(runtime.args)
['--key1', 'val1', '--key2', 'val2', 'pos1']

>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1="val1", key2="val2.1 val2.2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']

>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1=None, key2="val2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', '--key2', 'val2', 'pos2']
- with_custom_conda(uri: str, region: Optional[str] = None)
Specifies the custom conda pack for running the job. Make sure you have configured the IAM policy for the job run to access the conda environment.
- Parameters:
uri (str) – The OCI object storage URI for the conda pack, e.g. "oci://your_bucket@namespace/object_name". In the Environment Explorer of an OCI notebook session, this is shown as the "source" of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials:
For API Key, config["region"] is used.
For Resource Principal, signer.region is used.
This is required if the conda pack is stored in a different region.
- Returns:
The runtime instance.
- Return type:
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_entrypoint(entrypoint: str)
Specify the entrypoint for the job
- Parameters:
entrypoint (str) – The relative path of the script to be set as entrypoint when source is a zip/tar/directory.
- Returns:
The runtime instance.
- Return type:
self
- with_environment_variable(**kwargs) Self
Sets environment variables
Environment variables enclosed by ${...} will be substituted.
You can use $$ to escape the substitution.
Undefined variable enclosed by ${} will be ignored.
Double dollar signs $$ will be substituted by a single one $.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
Examples
>>> runtime = (
...     PythonRuntime()
...     .with_environment_variable(
...         HOST="10.0.0.1",
...         PORT="443",
...         URL="http://${HOST}:${PORT}/path/",
...         ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
...         MISSING_VAR="This is ${UNDEFINED}",
...         VAR_WITH_DOLLAR="$10",
...         DOUBLE_DOLLAR="$$10"
...     )
... )
>>> for k, v in runtime.environment_variables.items():
...     print(f"{k}: {v}")
HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10
- with_freeform_tag(**kwargs) Self
Sets freeform tag
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_maximum_runtime_in_minutes(maximum_runtime_in_minutes: int) Self
Sets maximum runtime in minutes
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_output(output_dir: str, output_uri: str)
Specifies the outputs of the job. The output files in output_dir will be copied to remote output_uri when the job is finished.
- Parameters:
output_dir (str) – Path to the output directory in the job run. This path should be a relative path from the working directory. The source code should write all outputs into this directory.
output_uri (str) – The OCI object storage URI prefix for saving the output files. For example, oci://bucket_name@namespace/path/to/directory
- Returns:
The runtime instance.
- Return type:
Self
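The copy step described for with_output() can be pictured as a mapping from files under output_dir to destinations under output_uri. The sketch below only plans that mapping on the local file system. It assumes that relative paths are preserved under the URI prefix, which this reference does not state; the function `plan_uploads` and the file names are hypothetical.

```python
import os
import tempfile
from typing import Dict


def plan_uploads(working_dir: str, output_dir: str, output_uri: str) -> Dict[str, str]:
    """Map each file under output_dir to a destination URI under output_uri."""
    root = os.path.join(working_dir, output_dir)  # output_dir is relative to the working dir
    prefix = output_uri.rstrip("/")
    plan = {}
    for folder, _, files in os.walk(root):
        for name in files:
            local_path = os.path.join(folder, name)
            # Use forward slashes so the relative path can be appended to the URI prefix.
            relative = os.path.relpath(local_path, root).replace(os.sep, "/")
            plan[relative] = f"{prefix}/{relative}"
    return plan


with tempfile.TemporaryDirectory() as work:
    os.makedirs(os.path.join(work, "output", "plots"))
    for rel in ("metrics.json", os.path.join("plots", "loss.png")):
        with open(os.path.join(work, "output", rel), "w") as f:
            f.write("")
    uploads = plan_uploads(work, "output", "oci://bucket_name@namespace/path/to/dir")
```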
- with_python_path(*python_paths)
Specifies additional python paths for running the source code.
- Parameters:
*python_paths – Additional python path(s) for running the source code. Each path should be a relative path from the working directory.
- Returns:
The runtime instance.
- Return type:
self
- with_script(uri: str)
Specifies the source code script for the job
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters:
slug (str) – The slug name of the service conda pack
- Returns:
The runtime instance.
- Return type:
self
- with_source(uri: str, entrypoint: Optional[str] = None)
Specifies the source code for the job
- Parameters:
uri (str) – URI to the source code, which can be a (.py/.sh) script, a zip/tar file or a directory containing the scripts/modules. If the source code is a single file, URI can be any URI supported by fsspec, including http://, https:// and OCI object storage (for example, oci://your_bucket@your_namespace/path/to/script.py). URI can also be a folder or a zip file containing the source code. In that case, entrypoint is required.
entrypoint (str, optional) – The relative path of the script to be set as entrypoint when source is a zip/tar/directory. By default None. This is not needed when the source is a single script.
- Returns:
The runtime instance.
- Return type:
self
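The rule that an entrypoint is needed for a folder or archive source, but not for a single script, can be expressed as a small check. This sketch is an assumption-laden illustration: `requires_entrypoint` and the suffix list are hypothetical, and the check only recognizes local directories and file-name suffixes.

```python
import os
import tempfile

# Hypothetical list of archive suffixes used only for this illustration.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")


def requires_entrypoint(uri: str) -> bool:
    """Return True when the source is a local directory or looks like an archive."""
    return os.path.isdir(uri) or uri.lower().endswith(ARCHIVE_SUFFIXES)


single_script = requires_entrypoint("oci://your_bucket@your_namespace/path/to/script.py")
archive = requires_entrypoint("local/path/to/code.zip")
with tempfile.TemporaryDirectory() as code_dir:
    directory = requires_entrypoint(code_dir)
```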
- with_working_dir(working_dir: str)
Specifies the working directory in the job run. By default, the working directory will be the directory containing the user code (job artifact directory). This can be changed by specifying a relative path to the job artifact directory.
- Parameters:
working_dir (str) – The path of the working directory. This can be a relative path from the job artifact directory.
- Returns:
The runtime instance.
- Return type:
self
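The relationship between the job artifact directory, the working directory, the entrypoint and additional Python paths can be sketched with plain path joins. The function `resolve_paths` and the location `/job/artifact` are hypothetical; the sketch only follows the documented rule that the entrypoint and Python paths are relative to the working directory, which is itself relative to the job artifact directory.

```python
import posixpath
from typing import Dict, List, Union


def resolve_paths(
    artifact_dir: str, working_dir: str, entrypoint: str, python_paths: List[str]
) -> Dict[str, Union[str, List[str]]]:
    """Resolve the working dir, entrypoint and Python paths to absolute paths."""
    work = posixpath.normpath(posixpath.join(artifact_dir, working_dir))
    return {
        "working_dir": work,
        "entrypoint": posixpath.join(work, entrypoint),
        "python_path": [posixpath.join(work, p) for p in python_paths],
    }


resolved = resolve_paths(
    "/job/artifact",            # hypothetical job artifact directory
    "code_dir",                 # as passed to with_working_dir()
    "my_package/my_script.py",  # as passed to with_entrypoint()
    ["other_packages"],         # as passed to with_python_path()
)
print(resolved["entrypoint"])  # /job/artifact/code_dir/my_package/my_script.py
```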
- property working_dir: str
The working directory for the job run.
- class ads.jobs.ScriptRuntime(spec: Optional[Dict] = None, **kwargs)
Bases:
CondaRuntime
Represents job runtime with scripts and conda pack.
This runtime is designed to define job artifacts and configurations supported by OCI Data Science Jobs natively. It can be used with any script type that is supported by OCI Data Science Jobs, including shell scripts and Python scripts.
To run a script with all dependencies contained in a local folder:
runtime = (
    ScriptRuntime()
    # Specify the service conda environment by slug name.
    .with_service_conda("pytorch110_p38_cpu_v1")
    # The job artifact can be a single Python script, a directory or a zip file.
    .with_source("local/path/to/code_dir")
    # Environment variable
    .with_environment_variable(NAME="Welcome to OCI Data Science.")
    # Command line argument
    .with_argument("100 linux \"hi there\"")
    # The entrypoint is applicable only to directory or zip file as source
    # The entrypoint should be a path relative to the working dir.
    # Here my_script.sh is a file in the code_dir/my_package directory
    .with_entrypoint("my_package/my_script.sh")
)
References
https://docs.oracle.com/en-us/iaas/data-science/using/jobs-artifact.htm
To initialize the object, you can pass in the specification either as a dictionary or through keyword arguments.
- Parameters:
spec (dict, optional) – Object specification, by default None
kwargs (dict) – Specification as keyword arguments. If spec contains the same key as the one in kwargs, the value from kwargs will be used.
- CONST_ARGS = 'args'
- CONST_CONDA = 'conda'
- CONST_CONDA_REGION = 'region'
- CONST_CONDA_SLUG = 'slug'
- CONST_CONDA_TYPE = 'type'
- CONST_CONDA_TYPE_CUSTOM = 'published'
- CONST_CONDA_TYPE_SERVICE = 'service'
- CONST_CONDA_URI = 'uri'
- CONST_ENTRYPOINT = 'entrypoint'
- CONST_ENV_VAR = 'env'
- CONST_MAXIMUM_RUNTIME_IN_MINUTES = 'maximumRuntimeInMinutes'
- CONST_SCRIPT_PATH = 'scriptPathURI'
- CONST_TAG = 'freeformTags'
- property args: list
Command line arguments
- attribute_map = {'conda': 'conda', 'entrypoint': 'entrypoint', 'env': 'env', 'freeformTags': 'freeform_tags', 'scriptPathURI': 'script_path_uri'}
- property conda: dict
The conda environment specification.
For service conda environment, the specification contains:
type, the type of the conda environment. This is always service for service conda environment.
slug, the slug of the conda environment.
For custom conda environment, the specification contains:
type, the type of the conda environment. This is always published for custom conda environment.
uri, the uri of the conda environment, e.g. oci://bucket@namespace/prefix/to/conda
region, the region of the bucket in which the conda environment is stored. By default, ADS will determine the region based on the authenticated API key or resource principal. This is only needed if your conda environment is stored in a different region.
- Returns:
A dictionary containing the conda environment specifications.
- Return type:
dict
- property entrypoint: str
The relative path of the script to be set as entrypoint when source is a zip/tar/directory.
- property environment_variables: dict
Environment variables
- Returns:
The runtime environment variables. The returned dictionary is a copy.
- Return type:
dict
- property envs: dict
Environment variables
- property freeform_tags: dict
- classmethod from_dict(obj_dict: dict) Self
Initialize the object from a Python dictionary
- classmethod from_json(json_string: ~typing.Optional[str] = None, uri: ~typing.Optional[str] = None, decoder: callable = <class 'json.decoder.JSONDecoder'>, **kwargs) Self
Creates an object from JSON string provided or from URI location containing JSON string
- Parameters:
json_string (str, optional) – JSON string. Defaults to None.
uri (str, optional) – URI location of file containing JSON string. Defaults to None.
decoder (callable, optional) – Custom decoder. Defaults to simple JSON decoder.
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this should be config="path/to/.oci/config". For other storage connections consider e.g. host, port, username, password, etc.
- Raises:
ValueError – Raised if both json_string and uri are empty, or if the input is not valid JSON.
- Returns:
Object initialized from JSON data.
- Return type:
Type[Self]
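The string-or-URI behaviour described above can be sketched with the standard library. This is only an illustration, not the ADS implementation: `load_json` is a hypothetical function, it reads a local path instead of going through fsspec, and preferring `json_string` when both inputs are given is an assumption.

```python
import json
from pathlib import Path
from typing import Optional


def load_json(json_string: Optional[str] = None, uri: Optional[str] = None) -> dict:
    """Hypothetical stand-in: parse JSON from a string or from a local file path."""
    if json_string:
        text = json_string  # assumption: the string wins if both are given
    elif uri:
        text = Path(uri).read_text()  # the real method opens the URI with fsspec
    else:
        raise ValueError("Must provide either json_string or uri.")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError("The input is not valid JSON.") from exc


print(load_json('{"kind": "runtime", "spec": {}}'))
```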
- classmethod from_string(obj_string: Optional[str] = None, uri: Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML/JSON string or URI location containing the YAML/JSON
- Parameters:
obj_string (str, optional) – YAML/JSON string, by default None
uri (str, optional) – URI location of file containing YAML/JSON, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML/JSON.
- Return type:
Self
- classmethod from_yaml(yaml_string: Optional[str] = None, uri: Optional[str] = None, loader: callable = <class 'yaml.loader.SafeLoader'>, **kwargs) Self
Initializes an object from YAML string or URI location containing the YAML
- Parameters:
yaml_string (str, optional) – YAML string, by default None
uri (str, optional) – URI location of file containing YAML, by default None
loader (callable, optional) – Custom YAML loader, by default yaml.SafeLoader
- Returns:
Object initialized from the YAML.
- Return type:
Self
- Raises:
ValueError – Raised if neither string nor uri is provided
- get_spec(key: str, default: Optional[Any] = None) Any
Gets the value of a specification property
- Parameters:
key (str) – The name of the property.
default (Any, optional) – The default value to be used, if the property does not exist, by default None.
- Returns:
The value of the property.
- Return type:
Any
- property kind: str
Kind of the object to be stored in YAML. All runtime implementations have “runtime” as their kind. Subclasses have different types.
- property maximum_runtime_in_minutes: int
Maximum runtime in minutes
- property script_uri: str
The URI of the source code
- set_spec(k: str, v: Any) Self
Sets a specification property for the object.
- Parameters:
k (str) – key, the name of the property.
v (Any) – value, the value of the property.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
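The get/set pattern and the method chaining it enables can be illustrated with a small stand-in class. This is a sketch under assumptions: `SpecHolder` is hypothetical and only mirrors the documented signatures of `set_spec` and `get_spec`, not the actual ADS storage logic.

```python
from typing import Any, Optional


class SpecHolder:
    """Hypothetical stand-in for a builder that keeps its properties in a dict."""

    def __init__(self) -> None:
        self._spec: dict = {}

    def set_spec(self, k: str, v: Any) -> "SpecHolder":
        self._spec[k] = v
        return self  # returning self is what makes chaining possible

    def get_spec(self, key: str, default: Optional[Any] = None) -> Any:
        # Fall back to the default when the property was never set.
        return self._spec.get(key, default)


holder = (
    SpecHolder()
    .set_spec("entrypoint", "main.py")
    .set_spec("freeformTags", {"team": "ml"})
)
print(holder.get_spec("entrypoint"))
print(holder.get_spec("conda", default={}))
```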
- property source_uri: str
The URI of the source code
- to_dict() dict
Converts the object to dictionary with kind, type and spec as keys.
- to_json(uri: Optional[str] = None, encoder: callable = <class 'json.encoder.JSONEncoder'>, **kwargs) str
Returns the object serialized as a JSON string
- Parameters:
uri (str, optional) – URI location to save the JSON string, by default None
encoder (callable, optional) – Encoder for custom data structures, by default json.JSONEncoder
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config".
- Returns:
object serialized as a JSON string
- Return type:
str
- to_yaml(uri: Optional[str] = None, dumper: callable = <class 'yaml.dumper.SafeDumper'>, **kwargs) Optional[str]
Returns object serialized as a YAML string
- Parameters:
uri (str, optional) – URI location to save the YAML string, by default None
dumper (callable, optional) – Custom YAML Dumper, by default yaml.SafeDumper
kwargs (dict) – keyword arguments to be passed into fsspec.open(). For OCI object storage, this can be config="path/to/.oci/config".
- Returns:
If uri is specified, None will be returned. Otherwise, the yaml content will be returned.
- Return type:
Union[str, None]
- property type: str
The type of the object as shown in YAML
- with_argument(*args, **kwargs) Self
Adds command line arguments to the runtime.
This method can be called (chained) multiple times to add various arguments.
- Parameters:
args – Positional arguments. In a single method call, positional arguments are always added before keyword arguments. You can call with_argument() to add positional arguments after keyword arguments.
kwargs – Keyword arguments. To add a keyword argument without value, set the value to None.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- Raises:
ValueError – Raised if a keyword argument has a space in its key.
Examples
>>> runtime = Runtime().with_argument(key1="val1", key2="val2").with_argument("pos1")
>>> print(runtime.args)
['--key1', 'val1', '--key2', 'val2', 'pos1']

>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1="val1", key2="val2.1 val2.2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', 'val1', '--key2', 'val2.1 val2.2', 'pos2']

>>> runtime = Runtime()
>>> runtime.with_argument("pos1")
>>> runtime.with_argument(key1=None, key2="val2")
>>> runtime.with_argument("pos2")
>>> print(runtime.args)
['pos1', '--key1', '--key2', 'val2', 'pos2']
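The ordering rules in these examples can be reproduced with a short stand-in. This is a sketch, not the ADS source: `ArgumentList` is a hypothetical class that only models the documented behaviour (positional arguments first within one call, `None` producing a flag without a value, and a `ValueError` for keys containing a space).

```python
class ArgumentList:
    """Hypothetical stand-in that mimics the documented with_argument() rules."""

    def __init__(self):
        self.args = []

    def with_argument(self, *args, **kwargs):
        # Reject keys with spaces before changing any state.
        for key in kwargs:
            if " " in key:
                raise ValueError(f"Argument key {key!r} cannot contain a space.")
        # Within one call, positional arguments come before keyword arguments.
        self.args.extend(args)
        for key, value in kwargs.items():
            self.args.append(f"--{key}")
            if value is not None:  # None means "flag without a value"
                self.args.append(value)
        return self


demo = (
    ArgumentList()
    .with_argument("pos1")
    .with_argument(key1=None, key2="val2")
    .with_argument("pos2")
)
print(demo.args)
```

Because positional arguments always come first within a single call, placing one after a keyword argument requires a second call, as the chained form shows.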
- with_custom_conda(uri: str, region: Optional[str] = None)
Specifies the custom conda pack for running the job. Make sure you have configured the IAM policy for the job run to access the conda environment.
- Parameters:
uri (str) – The OCI object storage URI for the conda pack, e.g. “oci://your_bucket@namespace/object_name”. In the Environment Explorer of an OCI notebook session, this is shown as the “source” of the conda pack.
region (str, optional) – The region of the bucket storing the custom conda pack, by default None. If region is not specified, ADS will use the region from your authentication credentials:
For API Key, config["region"] is used.
For Resource Principal, signer.region is used.
This is required if the conda pack is stored in a different region.
- Returns:
The runtime instance.
- Return type:
self
See also
https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm
- with_entrypoint(entrypoint: str)
Specify the entrypoint for the job
- Parameters:
entrypoint (str) – The relative path of the script to be set as entrypoint when source is a zip/tar/directory.
- Returns:
The runtime instance.
- Return type:
self
- with_environment_variable(**kwargs) Self
Sets environment variables
Environment variables enclosed by ${...} will be substituted.
You can use $$ to escape the substitution.
An undefined variable enclosed by ${} will be ignored and left unchanged.
Double dollar signs $$ will be substituted by a single one, $.
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
Examples
>>> runtime = (
...     PythonRuntime()
...     .with_environment_variable(
...         HOST="10.0.0.1",
...         PORT="443",
...         URL="http://${HOST}:${PORT}/path/",
...         ESCAPED_URL="http://$${HOST}:$${PORT}/path/",
...         MISSING_VAR="This is ${UNDEFINED}",
...         VAR_WITH_DOLLAR="$10",
...         DOUBLE_DOLLAR="$$10"
...     )
... )
>>> for k, v in runtime.environment_variables.items():
...     print(f"{k}: {v}")
HOST: 10.0.0.1
PORT: 443
URL: http://10.0.0.1:443/path/
ESCAPED_URL: http://${HOST}:${PORT}/path/
MISSING_VAR: This is ${UNDEFINED}
VAR_WITH_DOLLAR: $10
DOUBLE_DOLLAR: $10
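The substitution rules can be approximated with a single regular expression. This is a minimal sketch, not the ADS implementation: `substitute` is a hypothetical function, and it resolves references against the raw input values in one pass, which is an assumption about how nested references behave.

```python
import re

# Matches either an escaped dollar sign ("$$") or a braced reference ("${NAME}").
_PATTERN = re.compile(r"\$\$|\$\{(\w+)\}")


def substitute(envs: dict) -> dict:
    """Hypothetical sketch of the ${...} / $$ rules described above."""

    def replace(match):
        if match.group(0) == "$$":
            return "$"  # a double dollar sign collapses to a single one
        name = match.group(1)
        if name in envs:
            return str(envs[name])  # defined variable: substitute its raw value
        return match.group(0)  # undefined variable: leave the text unchanged

    return {key: _PATTERN.sub(replace, value) for key, value in envs.items()}


resolved = substitute(
    {
        "HOST": "10.0.0.1",
        "PORT": "443",
        "URL": "http://${HOST}:${PORT}/path/",
        "ESCAPED_URL": "http://$${HOST}:$${PORT}/path/",
        "MISSING_VAR": "This is ${UNDEFINED}",
        "VAR_WITH_DOLLAR": "$10",
        "DOUBLE_DOLLAR": "$$10",
    }
)
for key, value in resolved.items():
    print(f"{key}: {value}")
```

A lone `$` that is not followed by `$` or `{` matches neither alternative, which is why `$10` passes through untouched.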
- with_freeform_tag(**kwargs) Self
Sets freeform tag
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_maximum_runtime_in_minutes(maximum_runtime_in_minutes: int) Self
Sets maximum runtime in minutes
- Returns:
This method returns self to support chaining methods.
- Return type:
Self
- with_script(uri: str)
Specifies the source code script for the job
- with_service_conda(slug: str)
Specifies the service conda pack for running the job
- Parameters:
slug (str) – The slug name of the service conda pack
- Returns:
The runtime instance.
- Return type:
self
- with_source(uri: str, entrypoint: Optional[str] = None)
Specifies the source code for the job
- Parameters:
uri (str) – URI to the source code, which can be a (.py/.sh) script, a zip/tar file or a directory containing the scripts/modules. If the source code is a single file, the URI can be any URI supported by fsspec, including http://, https:// and OCI object storage, for example: oci://your_bucket@your_namespace/path/to/script.py. The URI can also be a folder or a zip file containing the source code. In that case, entrypoint is required.
entrypoint (str, optional) – The relative path of the script to be set as entrypoint when source is a zip/tar/directory. By default None. This is not needed when the source is a single script.
- Returns:
The runtime instance.
- Return type:
self