ads.hpo package

Submodules

ads.hpo.distributions module

class ads.hpo.distributions.CategoricalDistribution(choices: Sequence[Union[None, bool, int, float, str]])

Bases: Distribution

A categorical distribution.

Parameters: choices – Parameter value candidates. It is recommended to restrict the types of the choices to the following: None, bool, int, float and str.
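The type recommendation above can be checked before a CategoricalDistribution is built. The following is a minimal sketch; the helper check_choices is hypothetical and not part of ADS. It reports any choice that falls outside None, bool, int, float and str.

```python
from typing import Any, List, Sequence

# Types recommended for CategoricalDistribution choices.
RECOMMENDED_TYPES = (type(None), bool, int, float, str)


def check_choices(choices: Sequence[Any]) -> List[Any]:
    """Return the choices whose type is outside the recommended set (hypothetical helper)."""
    return [c for c in choices if not isinstance(c, RECOMMENDED_TYPES)]


print(check_choices(["l1", "l2", None, 0.5, True]))  # []
print(check_choices(["rbf", (1, 2)]))  # [(1, 2)]
```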

class ads.hpo.distributions.DiscreteUniformDistribution(low: float, high: float, step: float)

Bases: Distribution

A discretized uniform distribution in the linear domain.

Note

If the range \([\mathsf{low}, \mathsf{high}]\) is not divisible by \(\mathsf{step}\), \(\mathsf{high}\) will be replaced with the maximum of \(k \times \mathsf{step} + \mathsf{low} \lt \mathsf{high}\), where \(k\) is an integer.

Parameters

low (float) – Lower endpoint of the range of the distribution. low is included in the range.
high (float) – Upper endpoint of the range of the distribution. high is included in the range.
step (float) – A discretization step.
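The rule in the note can be reproduced with a short computation. This is a sketch of the stated rule only, not the ADS implementation, and the helper names are hypothetical. When the range is divisible by step, high is returned unchanged. Plain floating-point arithmetic can misbehave for steps such as 0.1 that are not exactly representable.

```python
import math
from typing import List


def adjusted_high(low: float, high: float, step: float) -> float:
    """Largest low + k * step (k an integer) that does not exceed high."""
    k = math.floor((high - low) / step)
    return low + k * step


def grid(low: float, high: float, step: float) -> List[float]:
    """Every value a discrete uniform distribution with these bounds can take."""
    top = adjusted_high(low, high, step)
    n = round((top - low) / step)
    return [low + i * step for i in range(n + 1)]


print(adjusted_high(0.0, 10.0, 3.0))  # 9.0
print(grid(0.0, 10.0, 3.0))  # [0.0, 3.0, 6.0, 9.0]
```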

class ads.hpo.distributions.Distribution(dist)

Bases: object

Defines the abstract base class for hyperparameter search distributions.

get_distribution(): Returns the distribution.

class ads.hpo.distributions.DistributionEncode(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause an OverflowError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.

default(dist: Distribution) → Dict[str, Any]

Implement this method in a subclass such that it returns a serializable object for o (named dist in this signature), or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

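from json import JSONEncoder

# Example subclass (the class name is illustrative)
class IterableEncoder(JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)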

static from_json(json_object: Dict[Any, Any])

class ads.hpo.distributions.IntLogUniformDistribution(low: float, high: float, step: float = 1)

Bases: Distribution

A uniform distribution on integers in the log domain.

Parameters

low – Lower endpoint of the range of the distribution. low is included in the range.
high – Upper endpoint of the range of the distribution. high is included in the range.
step – A step for spacing between values.
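To illustrate what "uniform on integers in the log domain" means, here is one common way to draw such values: sample log-uniformly over the widened interval [low - 0.5, high + 0.5] and round. This is an assumption for illustration, not the ADS sampler. It handles step = 1 only and assumes low >= 1.

```python
import math
import random


def sample_int_log_uniform(low: int, high: int, rng: random.Random) -> int:
    """Draw an integer in [low, high] whose logarithm is roughly uniform.

    Assumes step == 1 and low >= 1, so that low - 0.5 is positive.
    """
    value = math.exp(rng.uniform(math.log(low - 0.5), math.log(high + 0.5)))
    # Round to the nearest integer and clamp to guard against edge rounding.
    return min(max(round(value), low), high)


rng = random.Random(0)
samples = [sample_int_log_uniform(1, 1000, rng) for _ in range(10_000)]
# Small values dominate: far more than 10% of draws fall below 100.
frac_below_100 = sum(s < 100 for s in samples) / len(samples)
print(frac_below_100)
```

On a linear scale only about 10% of draws would fall below 100. Here the share is much larger, because each decade receives a comparable share of the log range.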

class ads.hpo.distributions.IntUniformDistribution(low: float, high: float, step: float = 1)

Bases: Distribution

A uniform distribution on integers.

Note

If the range \([\mathsf{low}, \mathsf{high}]\) is not divisible by \(\mathsf{step}\), \(\mathsf{high}\) will be replaced with the maximum of \(k \times \mathsf{step} + \mathsf{low} \lt \mathsf{high}\), where \(k\) is an integer.

Parameters

low – Lower endpoint of the range of the distribution. low is included in the range.
high – Upper endpoint of the range of the distribution. high is included in the range.
step – A step for spacing between values.

class ads.hpo.distributions.LogUniformDistribution(low: float, high: float)

Bases: Distribution

A uniform distribution in the log domain.

Parameters

low – Lower endpoint of the range of the distribution. low is included in the range.
high – Upper endpoint of the range of the distribution. high is excluded from the range.
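The difference between the linear and log domains is easiest to see by sampling. The sketch below assumes the standard definition of a log-uniform draw (uniform in [log(low), log(high)], then exponentiated); it is not the ADS sampler. A log-uniform distribution is typically used for values such as learning rates that span several orders of magnitude.

```python
import math
import random
from typing import List


def sample_uniform(low: float, high: float, rng: random.Random) -> float:
    """Linear domain: sub-intervals of equal width are equally likely."""
    return rng.uniform(low, high)


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Log domain: sub-intervals of equal ratio are equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
lin_draws: List[float] = [sample_uniform(1e-4, 1.0, rng) for _ in range(10_000)]
log_draws: List[float] = [sample_log_uniform(1e-4, 1.0, rng) for _ in range(10_000)]

# Share of draws below 1e-2: roughly 1% on the linear scale, roughly 50% on the log scale.
frac_lin = sum(x < 1e-2 for x in lin_draws) / len(lin_draws)
frac_log = sum(x < 1e-2 for x in log_draws) / len(log_draws)
print(frac_lin, frac_log)
```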

class ads.hpo.distributions.UniformDistribution(low: float, high: float)

Bases: Distribution

A uniform distribution in the linear domain.

Parameters

low – Lower endpoint of the range of the distribution. low is included in the range.
high – Upper endpoint of the range of the distribution. high is excluded from the range.

ads.hpo.distributions.decode(s: str)

Decodes a string to an object.

Parameters: s (str) – The string to decode into a distribution object.
Returns: The decoded object.
Return type: Distribution or Dict

ads.hpo.distributions.encode(o: Distribution) → str

Encodes a distribution to a string.

Parameters: o (Distribution) – The distribution to encode.
Returns: The distribution encoded as a JSON string, using DistributionEncode.
Return type: str
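The encode/decode pair follows the standard json pattern: a JSONEncoder subclass whose default method turns a distribution into a dict, and an object_hook that rebuilds it. The sketch below uses only the standard library. The JSON layout ({"name", "params"}), the registry and all class and function names are hypothetical stand-ins, not the ADS format. As with decode above, a dict that does not describe a distribution comes back as a plain dict.

```python
import json
from typing import Any, Dict


class Distribution:
    """Hypothetical stand-in for a distribution base class."""

    def __init__(self, **params: Any) -> None:
        self.params = params


class UniformDistribution(Distribution):
    def __init__(self, low: float, high: float) -> None:
        super().__init__(low=low, high=high)


# Maps a class name back to a class when decoding (hypothetical).
REGISTRY = {"UniformDistribution": UniformDistribution}


class DistributionEncoder(json.JSONEncoder):
    def default(self, o: Any) -> Any:
        if isinstance(o, Distribution):
            return {"name": type(o).__name__, "params": o.params}
        # Let the base class raise TypeError for anything else.
        return super().default(o)


def from_json(obj: Dict[str, Any]) -> Any:
    """object_hook: rebuild a distribution if the dict looks like one."""
    if set(obj) == {"name", "params"} and obj["name"] in REGISTRY:
        return REGISTRY[obj["name"]](**obj["params"])
    return obj


def encode_dist(o: Distribution) -> str:
    return json.dumps(o, cls=DistributionEncoder)


def decode_dist(s: str) -> Any:
    return json.loads(s, object_hook=from_json)


text = encode_dist(UniformDistribution(0.0, 1.0))
print(text)  # {"name": "UniformDistribution", "params": {"low": 0.0, "high": 1.0}}
restored = decode_dist(text)
print(type(restored).__name__, restored.params)  # UniformDistribution {'low': 0.0, 'high': 1.0}
```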

ads.hpo.search_cv module

class ads.hpo.search_cv.ADSTuner(model, strategy='perfunctory', scoring=None, cv=5, study_name=None, storage=None, load_if_exists=True, random_state=None, loglevel=20, n_jobs=1, X=None, y=None)

Bases: BaseEstimator

Hyperparameter search with cross-validation.

Returns a hyperparameter tuning object.

Parameters

model – Object to use to fit the data. This is assumed to implement the scikit-learn estimator or pipeline interface.
strategy – 'perfunctory', 'detailed', or a dictionary/mapping of hyperparameters to their distributions. If 'perfunctory', picks a few relatively more important hyperparameters to tune. If 'detailed', extends to a larger search space. If a dict, defines a user-defined search space: keys are hyperparameters and values are distributions. Distributions are assumed to implement the ADS distribution interface.
scoring (Optional[Union[Callable[..., float], str]]) – String or callable to evaluate the predictions on the validation data. If None, score on the estimator is used.
cv (int) – Integer to specify the number of folds in a CV splitter. If the estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used; otherwise, sklearn.model_selection.KFold is used.
study_name (str) – Name of the current experiment for the ADSTuner object. One ADSTuner object can only be attached to one study_name.
storage – Database URL (e.g. sqlite:///example.db). Defaults to sqlite:////tmp/hpo_*.db.
load_if_exists – Flag to control the behavior to handle a conflict of study names. In the case where a study named study_name already exists in the storage, a DuplicatedStudyError is raised if load_if_exists is set to False. Otherwise, the existing one is returned.
random_state – Seed of the pseudo random number generator. If int, this is the seed used by the random number generator. If None, the global random state from numpy.random is used.
loglevel – Logging level. Can be logging.NOTSET, logging.INFO, logging.DEBUG, or logging.WARNING.
n_jobs (int) – Number of parallel jobs. -1 means using all processors.
X (TwoDimArrayLikeType, i.e. Union[List[List[float]], np.ndarray, pd.DataFrame, spmatrix, ADSData]) – Training data.
y (Union[OneDimArrayLikeType, TwoDimArrayLikeType], optional) – Target, where OneDimArrayLikeType is Union[List[float], np.ndarray, pd.Series] and TwoDimArrayLikeType is Union[List[List[float]], np.ndarray, pd.DataFrame, spmatrix, ADSData].

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.svm import SVC

tuner = ADSTuner(
                SVC(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )

X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])

property best_index: returns: Index which corresponds to the best candidate parameter setting. :rtype: int

property best_params: returns: Parameters of the best trial. :rtype: Dict[str, Any]

property best_score: returns: Mean cross-validated score of the best estimator. :rtype: float

best_scores(n: int = 5, reverse: bool = True)

Returns the best scores from the study.

Parameters

n (int) – The maximum number of results to show. Defaults to 5. If None or negative, returns all results.
reverse (bool) – Whether to reverse the sort order so results are in descending order. Defaults to True.

Returns

List of the best scores

Return type

list[float or int]

Raises

ValueError –
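How n and reverse interact can be sketched on a plain list of trial scores. This is not ADSTuner's code: it works on a list instead of a study. It also assumes that reverse=False sorts ascending and then truncates to n.

```python
from typing import List, Optional


def best_scores(scores: List[float], n: Optional[int] = 5, reverse: bool = True) -> List[float]:
    """Mimic the documented n/reverse semantics on a list of trial scores."""
    ordered = sorted(scores, reverse=reverse)
    if n is None or n < 0:
        return ordered  # None or negative: return every score
    return ordered[:n]


trial_scores = [0.91, 0.87, 0.95, 0.89, 0.93, 0.90]
print(best_scores(trial_scores, n=3))  # [0.95, 0.93, 0.91]
print(best_scores(trial_scores, n=2, reverse=False))  # [0.87, 0.89]
print(len(best_scores(trial_scores, n=None)))  # 6
```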

get_status()

Returns the status of the current tuning process.

Alias for the property status.

Returns: The status of the process
Return type: Status

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

tuner = ADSTuner(
                SGDClassifier(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )
tuner.search_space({'max_iter': 100})
X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
tuner.get_status()

halt()

Halt the current running tuning process.

Returns: Nothing
Return type: None
Raises: InvalidStateTransition – Raised if there is no running process to halt.

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

tuner = ADSTuner(
                SGDClassifier(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )
tuner.search_space({'max_iter': 100})
X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
tuner.halt()

is_completed()

Returns: True if the ADSTuner instance has completed; False otherwise.
Return type: bool

is_halted()

Returns: True if the ADSTuner instance is halted; False otherwise.
Return type: bool

is_running()

Returns: True if the ADSTuner instance is running; False otherwise.
Return type: bool

is_terminated()

Returns: True if the ADSTuner instance has been terminated; False otherwise.
Return type: bool

property n_trials: returns: Number of completed trials. Alias for trial_count. :rtype: int

static optimizer(study_name, pruner, sampler, storage, load_if_exists, objective_func, global_start, global_stop, **kwargs)

Static method for running the ADSTuner tuning process.

Parameters

study_name (str) – The name of the study.
pruner – The pruning method for pruning trials.
sampler – The sampling method used for tuning.
storage (str) – Storage endpoint.
load_if_exists (bool) – Load existing study if it exists.
objective_func – The objective function to be maximized.
global_start (multiprocessing.Value) – The global start time.
global_stop (multiprocessing.Value) – The global stop time.
kwargs (dict) – Keyword/value pairs passed into the optimize process.

Raises

Exception – Raised for any exceptions thrown by the underlying optimization process

Returns

Nothing

Return type

None

plot_best_scores(best=True, inferior=True, time_interval=1, fig_size=(800, 500))

Plot optimization history of all trials in a study.

Parameters

best – Controls whether to plot the lines for the best scores so far.
inferior – Controls whether to plot the dots for the actual objective scores.
time_interval – How often (in seconds) the plot refreshes to check on new trial results.
fig_size (tuple) – Width and height of the figure.

Returns

Nothing.

Return type

None

plot_contour_scores(params=None, time_interval=1, fig_size=(800, 500))

Contour plot of the scores.

Parameters

params (Optional[List[str]]) – Parameter list to visualize. Defaults to all.
time_interval (float) – Time interval for the plot. Defaults to 1.
fig_size (tuple[int, int]) – Figure size. Defaults to (800, 500).

Returns

Nothing.

Return type

None

plot_edf_scores(time_interval=1, fig_size=(800, 500))

Plot the EDF (empirical distribution function) of the scores.

Only completed trials are used.

Parameters

time_interval (float) – Time interval for the plot. Defaults to 1.
fig_size (tuple[int, int]) – Figure size. Defaults to (800, 500).

Returns

Nothing.

Return type

None
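The empirical distribution function plotted here is F(x) = (number of completed-trial scores <= x) / n, where n is the number of completed trials. The sketch below computes that definition with the standard library. It is not the plotting code, and it assumes a non-empty list of scores.

```python
from bisect import bisect_right
from typing import Callable, List


def edf(scores: List[float]) -> Callable[[float], float]:
    """Return F where F(x) is the fraction of scores less than or equal to x."""
    ordered = sorted(scores)
    n = len(ordered)

    def F(x: float) -> float:
        # bisect_right counts how many sorted scores are <= x.
        return bisect_right(ordered, x) / n

    return F


F = edf([0.80, 0.85, 0.85, 0.92])
print(F(0.79), F(0.85), F(0.92))  # 0.0 0.75 1.0
```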

plot_intermediate_scores(time_interval=1, fig_size=(800, 500))

Plot intermediate values of all trials in a study.

Parameters

time_interval (float) – Time interval for the plot. Defaults to 1.
fig_size (tuple[int, int]) – Figure size. Defaults to (800, 500).

Returns

Nothing.

Return type

None

plot_parallel_coordinate_scores(params=None, time_interval=1, fig_size=(800, 500))

Plot the high-dimensional parameter relationships in a study.

Note that if a parameter contains missing values, a trial with missing values is not plotted.

Parameters

params (Optional[List[str]]) – Parameter list to visualize. Defaults to all.
time_interval (float) – Time interval for the plot. Defaults to 1.
fig_size (tuple[int, int]) – Figure size. Defaults to (800, 500).

Returns

Nothing.

Return type

None

plot_param_importance(importance_evaluator='Fanova', time_interval=1, fig_size=(800, 500))

Plot hyperparameter importances.

Parameters

importance_evaluator (str) – Importance evaluator. Valid values: “Fanova”, “MeanDecreaseImpurity”. Defaults to “Fanova”.
time_interval (float) – How often the plot refreshes to check on new trial results.
fig_size (tuple) – Width and height of the figure.

Raises

NotImplementedError – Raised for unsupported importance evaluators.

Returns

Nothing.

Return type

None

resume()

Resume the current halted tuning process.

Returns: Nothing
Return type: None

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

tuner = ADSTuner(
                SGDClassifier(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )
tuner.search_space({'max_iter': 100})
X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
tuner.halt()
tuner.resume()

property score_remaining

Returns: The difference between the best score and the optimal score.
Return type: float
Raises: ExitCriterionError – Raised if there is no score-based criterion for tuning.

property scoring_name: returns: Scoring name. :rtype: str

search_space(strategy=None, overwrite=False)

Returns the search space. If strategy is not passed in, returns the existing search space. When strategy is passed in, overwrites the existing search space if overwrite is set to True; otherwise, only updates the existing search space.

Parameters

strategy (Union[str, dict], optional) – 'perfunctory', 'detailed', or a dictionary/mapping of the hyperparameters to their distributions. If 'perfunctory', picks a few relatively more important hyperparameters to tune. If 'detailed', extends to a larger search space. If a dict, defines a user-defined search space: keys are parameters and values are distributions. Distributions are assumed to implement the ADS distribution interface.
overwrite (bool, optional) – Ignored when strategy is None. Otherwise, the search space is overwritten if overwrite is True and updated if it is False.

Returns

A mapping of the hyperparameters and their distributions.

Return type

dict

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

tuner = ADSTuner(
                SGDClassifier(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )
tuner.search_space({'max_iter': 100})
X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
tuner.search_space()
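The difference between updating and overwriting is the usual dictionary semantics. The sketch below applies the documented rule to plain dicts. It does not call ADS, it covers dict strategies only (not 'perfunctory' or 'detailed'), and the placeholder strings stand in for real distributions. The helper apply_strategy is hypothetical.

```python
from typing import Any, Dict, Optional


def apply_strategy(current: Dict[str, Any], strategy: Optional[Dict[str, Any]] = None,
                   overwrite: bool = False) -> Dict[str, Any]:
    """Mimic the documented update/overwrite rule on plain dictionaries."""
    if strategy is None:
        return current  # no strategy: return the existing space unchanged
    if overwrite:
        return dict(strategy)  # replace the whole space
    merged = dict(current)
    merged.update(strategy)  # change only the keys that were passed in
    return merged


space = {"alpha": "dist-A", "penalty": "dist-B"}
print(apply_strategy(space, {"max_iter": 100}))
# {'alpha': 'dist-A', 'penalty': 'dist-B', 'max_iter': 100}
print(apply_strategy(space, {"max_iter": 100}, overwrite=True))
# {'max_iter': 100}
```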

property sklearn_steps: returns: Search space which corresponds to the best candidate parameter setting. :rtype: int

property status: returns: The status of the current tuning process. :rtype: Status

terminate()

Terminate the current tuning process.

Returns: Nothing
Return type: None

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

tuner = ADSTuner(
                SGDClassifier(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )
tuner.search_space({'max_iter': 100})
X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
tuner.terminate()

property time_elapsed

Returns the time in seconds that the HPO process has been searching.

Returns: The number of seconds the HPO process has been searching.
Return type: int

property time_remaining

Returns the number of seconds remaining in the study.

Returns: Number of seconds remaining in the budget; 0 if complete or terminated.
Return type: int
Raises: ExitCriterionError – Error is raised if time has not been included in the budget.
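The documented rules for time_remaining can be sketched as follows: raise if no time budget was set, return 0 once the study is finished, otherwise return the budget minus the elapsed time, clamped at 0. This is not ADS code. The local ExitCriterionError only stands in for the real exception, the function signature is hypothetical, and the sketch keeps float seconds while ADS documents an int.

```python
import time
from typing import Optional


class ExitCriterionError(Exception):
    """Stand-in for the ADS exception of the same name."""


def time_remaining(budget_seconds: Optional[float], start: float, finished: bool,
                   now: Optional[float] = None) -> float:
    """Seconds left in a time budget, following the documented rules."""
    if budget_seconds is None:
        raise ExitCriterionError("no time budget was set for this study")
    if finished:
        return 0.0  # complete or terminated studies have nothing left
    now = time.time() if now is None else now
    return max(0.0, budget_seconds - (now - start))


print(time_remaining(60, start=1000.0, finished=False, now=1045.0))  # 15.0
print(time_remaining(60, start=1000.0, finished=False, now=1100.0))  # 0.0
```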

property time_since_resume

Returns the number of seconds since the process was resumed from a halt.

Returns: The number of seconds since the process was last resumed.
Return type: int
Raises: NoRestartError – Raised if the process has been terminated, or was never halted and then resumed.

property trial_count: returns: Number of completed trials. Alias for trial_count. :rtype: int

property trials: returns: Trial data up to this point. :rtype: pandas.DataFrame

trials_export(file_uri, metadata=None, script_dict={'model': None, 'scoring': None})

Export the metadata, as well as the files needed to reconstruct the ADSTuner object, to object storage. Data is not stored. To resume the same ADSTuner object from object storage and continue tuning from previous trials, you must provide the dataset.

Parameters

file_uri (str) – Object storage path, ‘oci://bucketname@namespace/filepath/on/objectstorage’. For example, oci://test_bucket@ociodsccust/tuner/test.zip
metadata (str, optional) – User defined metadata
script_dict (dict, optional) – Script paths for model and scoring. This is only recommended for unsupported models and user-defined scoring functions. You can store the model and scoring function in a dictionary with keys model and scoring and the respective paths as values. The model and scoring scripts must import necessary libraries for the script to run. The model and scoring variables must be set to your model and scoring function.

Returns

Nothing

Return type

None

Example:

# Print out a list of supported models
from ads.hpo.ads_search_space import model_list
print(model_list)

# Example script_dict
script_dict = {'model': '/home/datascience/advanced-ds/notebooks/scratch/ADSTunerV2/mymodel.py',
               'scoring': '/home/datascience/advanced-ds/notebooks/scratch/ADSTunerV2/customized_scoring.py'}

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

tuner = ADSTuner(
                SGDClassifier(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )
tuner.search_space({'max_iter': 100})
X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)], synchronous=True)
tuner.trials_export('oci://<bucket_name>@<namespace>/tuner/test.zip')
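A scoring script referenced by script_dict must import what it needs and set a module-level variable named scoring. The sketch below shows what such a file might contain. It assumes the scikit-learn scorer convention, a callable taking (estimator, X, y) and returning a float where higher is better. Whether ADSTuner expects exactly this signature is an assumption, and the function name plain_accuracy is hypothetical.

```python
# customized_scoring.py (hypothetical contents)
from typing import Any, Sequence


def plain_accuracy(estimator: Any, X: Sequence, y: Sequence) -> float:
    """Fraction of correct predictions, computed without external libraries."""
    predictions = estimator.predict(X)
    correct = sum(int(p == t) for p, t in zip(predictions, y))
    return correct / len(y)


# The module-level name `scoring` must refer to the scoring function.
scoring = plain_accuracy


# Quick local check with a dummy model (not needed in the real script).
class ConstantModel:
    def predict(self, X):
        return [1 for _ in X]


print(scoring(ConstantModel(), [[0], [1], [2], [3]], [1, 0, 1, 1]))  # 0.75
```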

classmethod trials_import(file_uri, delete_zip_file=True, target_file_path=None)

Import the database file from object storage.

Parameters

file_uri (str) – ‘oci://bucketname@namespace/filepath/on/objectstorage’ Example: ‘oci://<bucket_name>@<namespace>/tuner/test.zip’
delete_zip_file (bool, defaults to True, optional) – Whether to delete the zip file afterwards.
target_file_path (str, optional) – The path where the zip file will be saved. For example, ‘/home/datascience/myfile.zip’.

Returns

ADSTuner object

Return type

ADSTuner

Examples

>>> from ads.hpo.stopping_criterion import *
>>> from ads.hpo.search_cv import ADSTuner
>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import SGDClassifier
>>> X, y = load_iris(return_X_y=True)
>>> tuner = ADSTuner.trials_import('oci://<bucket_name>@<namespace>/tuner/test.zip')
>>> tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)], synchronous=True)

property trials_remaining

Returns: The number of trials remaining in the budget.
Return type: int
Raises: ExitCriterionError – Raised if the current tuner does not include a trials-based exit condition.

tune(X=None, y=None, exit_criterion=[], loglevel=None, synchronous=False)

Run hyperparameter tuning until one of the exit_criterion is met. The default is to run 50 trials.

Parameters

X (TwoDimArrayLikeType, Union[List[List[float]], np.ndarray, pd.DataFrame, spmatrix, ADSData]) – Training data.
y (Union[OneDimArrayLikeType, TwoDimArrayLikeType], optional) – Target, where OneDimArrayLikeType is Union[List[float], np.ndarray, pd.Series] and TwoDimArrayLikeType is Union[List[List[float]], np.ndarray, pd.DataFrame, spmatrix, ADSData].
exit_criterion (list, optional) – A list of ADS stopping criteria: ScoreValue(), NTrials(), or TimeBudget(). For example, [ScoreValue(0.96), NTrials(40), TimeBudget(10)]. Tuning exits as soon as any of the stopping criteria in the list is satisfied. By default, the run stops after 50 trials.
loglevel (int, optional) – Log level.
synchronous (bool, optional) – Whether to tune synchronously. Defaults to False.

Returns

Nothing

Return type

None

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.svm import SVC

tuner = ADSTuner(
                SVC(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )
tuner.search_space({'max_iter': 100})
X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])

wait()

Wait for the current tuning process to finish running.

Returns: Nothing
Return type: None

Example:

from ads.hpo.stopping_criterion import *
from ads.hpo.search_cv import ADSTuner
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

tuner = ADSTuner(
                SGDClassifier(),
                strategy='detailed',
                scoring='f1_weighted',
                random_state=42
            )
tuner.search_space({'max_iter': 100})
X, y = load_iris(return_X_y=True)
tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
tuner.wait()

exception ads.hpo.search_cv.DuplicatedStudyError

Bases: Exception

DuplicatedStudyError is raised when a new tuner process is created with a study name that already exists in storage.

exception ads.hpo.search_cv.ExitCriterionError

Bases: Exception

ExitCriterionError is raised when an attempt is made to check exit status for a different exit type than the tuner was initialized with. For example, if an HPO study has an exit criterion based on the number of trials and a request is made for the time remaining, which is a different exit criterion, an exception is raised.

exception ads.hpo.search_cv.InvalidStateTransition

Bases: Exception

InvalidStateTransition is raised when an invalid transition request is made, such as calling halt without a running process.

exception ads.hpo.search_cv.NoRestartError

Bases: Exception

NoRestartError is raised when an attempt is made to check how many seconds have transpired since the HPO process was last resumed from a halt. This can happen if the process has been terminated or it was never halted and then resumed to begin with.

class ads.hpo.search_cv.State(value)

Bases: Enum

An enumeration.

COMPLETED = 5

HALTED = 3

INITIATED = 1

RUNNING = 2

TERMINATED = 4
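The lifecycle methods (halt, resume, terminate) can be read as transitions between these states. The sketch below mirrors the State values with a standard Enum and a hypothetical transition table. Only "halt requires a running process" comes from the documentation above. The other allowed source states are assumptions, and the local InvalidStateTransition only stands in for the ADS exception.

```python
from enum import Enum


class State(Enum):
    INITIATED = 1
    RUNNING = 2
    HALTED = 3
    TERMINATED = 4
    COMPLETED = 5


class InvalidStateTransition(Exception):
    """Stand-in for the ADS exception of the same name."""


# Hypothetical table: action -> (states it may be called from, resulting state).
TRANSITIONS = {
    "halt": ({State.RUNNING}, State.HALTED),
    "resume": ({State.HALTED}, State.RUNNING),
    "terminate": ({State.RUNNING, State.HALTED}, State.TERMINATED),
}


def transition(current: State, action: str) -> State:
    allowed_from, target = TRANSITIONS[action]
    if current not in allowed_from:
        raise InvalidStateTransition(f"cannot {action} from {current.name}")
    return target


state = transition(State.RUNNING, "halt")
print(state)  # State.HALTED
print(transition(state, "resume"))  # State.RUNNING
```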

ads.hpo.stopping_criterion module

class ads.hpo.stopping_criterion.NTrials(n_trials: int)

Bases: object

Exit based on number of trials.

Parameters: n_trials (int) – Number of trials (sets of hyperparameters tested). If None, there is no limitation on the number of trials.
Returns: NTrials object
Return type: NTrials

class ads.hpo.stopping_criterion.ScoreValue(score: float)

Bases: object

Exit if the score is greater than or equal to the threshold.

Parameters: score (float) – The threshold for exiting the tuning process. If a trial value is greater than or equal to score, the process exits.
Returns: ScoreValue object
Return type: ScoreValue

class ads.hpo.stopping_criterion.TimeBudget(seconds: float)

Bases: object

Exit based on the number of seconds.

Parameters: seconds (float) – Time limit, in seconds. If None, there is no time limit.
Returns: TimeBudget object
Return type: TimeBudget
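Together with tune, these criteria follow an "exit when any criterion is met" rule, with a default of 50 trials when the list is empty. The sketch below models that rule with stand-in classes. MaxTrials, MinScore and MaxSeconds only play the roles of NTrials, ScoreValue and TimeBudget; they are hypothetical and do not reproduce the ADS classes' internals. The None meaning "no limit" follows the parameter descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class Progress:
    """Snapshot of a tuning run (hypothetical)."""

    n_completed: int
    best_score: float
    seconds_elapsed: float


class Criterion(Protocol):
    def met(self, progress: Progress) -> bool: ...


@dataclass
class MaxTrials:  # plays the role of NTrials
    n_trials: Optional[int]

    def met(self, progress: Progress) -> bool:
        # None means no limit on the number of trials.
        return self.n_trials is not None and progress.n_completed >= self.n_trials


@dataclass
class MinScore:  # plays the role of ScoreValue
    score: float

    def met(self, progress: Progress) -> bool:
        return progress.best_score >= self.score


@dataclass
class MaxSeconds:  # plays the role of TimeBudget
    seconds: Optional[float]

    def met(self, progress: Progress) -> bool:
        # None means no time limit.
        return self.seconds is not None and progress.seconds_elapsed >= self.seconds


def should_exit(criteria: List[Criterion], progress: Progress) -> bool:
    """Stop as soon as any criterion is met; fall back to 50 trials if none are given."""
    active = criteria or [MaxTrials(50)]
    return any(c.met(progress) for c in active)


criteria = [MinScore(0.96), MaxTrials(40), MaxSeconds(10)]
print(should_exit(criteria, Progress(12, 0.97, 3.0)))  # True (score threshold reached)
print(should_exit(criteria, Progress(12, 0.90, 3.0)))  # False
print(should_exit([], Progress(50, 0.5, 1.0)))  # True (default of 50 trials)
```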
