ads.hpo package
Submodules
ads.hpo.distributions module
- class ads.hpo.distributions.CategoricalDistribution(choices: Sequence[Union[None, bool, int, float, str]])
Bases: Distribution
A categorical distribution.
- Parameters
choices – Parameter value candidates. It is recommended to restrict the types of the choices to the following: None, bool, int, float and str.
- class ads.hpo.distributions.DiscreteUniformDistribution(low: float, high: float, step: float)
Bases: Distribution
A discretized uniform distribution in the linear domain.
Note
If the range \([\mathsf{low}, \mathsf{high}]\) is not divisible by \(\mathsf{step}\), \(\mathsf{high}\) will be replaced with the maximum of \(k \times \mathsf{step} + \mathsf{low} \lt \mathsf{high}\), where \(k\) is an integer.
- Parameters
low (float) – Lower endpoint of the range of the distribution. low is included in the range.
high (float) – Upper endpoint of the range of the distribution. high is included in the range.
step (float) – A discretization step.
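The truncation rule in the note above can be checked with a few lines of plain Python. This is a minimal sketch, not the ADS implementation: `adjusted_high` and `discrete_grid` are hypothetical helper names, and exact fractions are used so that the division is not affected by floating-point rounding.

```python
from fractions import Fraction
import math


def adjusted_high(low, high, step):
    """Largest value of the form low + k * step (k an integer) that does not exceed high."""
    low, high, step = Fraction(str(low)), Fraction(str(high)), Fraction(str(step))
    k = math.floor((high - low) / step)  # number of whole steps that fit in the range
    return low + k * step


def discrete_grid(low, high, step):
    """All candidate values low, low + step, ..., adjusted_high(low, high, step)."""
    top = adjusted_high(low, high, step)
    low_f, step_f = Fraction(str(low)), Fraction(str(step))
    n_steps = int((top - low_f) / step_f)
    return [float(low_f + k * step_f) for k in range(n_steps + 1)]


# [0, 1] is not divisible by a step of 0.3, so high becomes 0.9.
print(float(adjusted_high(0.0, 1.0, 0.3)))  # 0.9
print(discrete_grid(0.0, 1.0, 0.3))          # [0.0, 0.3, 0.6, 0.9]
```

When the range is divisible by the step (for example low=0, high=10, step=2), high is left unchanged.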
- class ads.hpo.distributions.Distribution(dist)
Bases: object
Defines the abstract base class for hyperparameter search distributions.
- get_distribution()
Returns the distribution.
- class ads.hpo.distributions.DistributionEncode(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: JSONEncoder
Constructor for JSONEncoder, with sensible defaults.
If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.
If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.
If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause an OverflowError). Otherwise, no such check takes place.
If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.
If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.
If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects that can't otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.
- default(dist: Distribution) Dict[str, Any]
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
- static from_json(json_object: Dict[Any, Any])
- class ads.hpo.distributions.IntLogUniformDistribution(low: float, high: float, step: float = 1)
Bases: Distribution
A uniform distribution on integers in the log domain.
- Parameters
low – Lower endpoint of the range of the distribution. low is included in the range.
high – Upper endpoint of the range of the distribution. high is included in the range.
step – A step for spacing between values.
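A uniform distribution on integers in the log domain makes small values far more likely than a linear uniform distribution would. The sketch below shows one common way to sample such a distribution: widen [low, high] by 0.5 on each side, sample uniformly in log space, exponentiate, and round. It is offered as background, not as a description of how ADS samples. It assumes low >= 1 and step == 1, and `sample_int_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_int_log_uniform(low: int, high: int, rng: random.Random) -> int:
    """Draw an integer in [low, high] whose logarithm is roughly uniform.

    Sketch only: assumes low >= 1 and step == 1. Widening the interval by 0.5
    on each side gives the endpoints a fair share of the mass after rounding.
    """
    log_low = math.log(low - 0.5)
    log_high = math.log(high + 0.5)
    value = round(math.exp(rng.uniform(log_low, log_high)))
    return min(max(value, low), high)  # clamp values that round past either endpoint


rng = random.Random(0)
draws = [sample_int_log_uniform(1, 1000, rng) for _ in range(10_000)]
small = sum(d <= 9 for d in draws) / len(draws)
print(f"share of draws in 1..9: {small:.2f}")  # far above the ~1% a linear uniform would give
```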
- class ads.hpo.distributions.IntUniformDistribution(low: float, high: float, step: float = 1)
Bases: Distribution
A uniform distribution on integers.
Note
If the range \([\mathsf{low}, \mathsf{high}]\) is not divisible by \(\mathsf{step}\), \(\mathsf{high}\) will be replaced with the maximum of \(k \times \mathsf{step} + \mathsf{low} \lt \mathsf{high}\), where \(k\) is an integer.
- Parameters
low – Lower endpoint of the range of the distribution. low is included in the range.
high – Upper endpoint of the range of the distribution. high is included in the range.
step – A step for spacing between values.
- class ads.hpo.distributions.LogUniformDistribution(low: float, high: float)
Bases: Distribution
A uniform distribution in the log domain.
- Parameters
low – Lower endpoint of the range of the distribution. low is included in the range.
high – Upper endpoint of the range of the distribution. high is excluded from the range.
- class ads.hpo.distributions.UniformDistribution(low: float, high: float)
Bases: Distribution
A uniform distribution in the linear domain.
- Parameters
low – Lower endpoint of the range of the distribution. low is included in the range.
high – Upper endpoint of the range of the distribution. high is excluded from the range.
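The difference between UniformDistribution and LogUniformDistribution is easiest to see by sampling both over the same range. This is a minimal stdlib sketch of the two sampling rules, not the ADS sampler; `sample_uniform` and `sample_log_uniform` are hypothetical helpers. Both draw from a half-open interval, matching the "high is excluded" wording above (up to floating-point rounding).

```python
import math
import random


def sample_uniform(low: float, high: float, rng: random.Random) -> float:
    # Linear domain: sub-intervals of equal width are equally likely.
    return low + (high - low) * rng.random()  # random() is in [0.0, 1.0)


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    # Log domain: sub-intervals of equal ratio (e.g. one decade) are equally likely.
    log_low, log_high = math.log(low), math.log(high)
    return math.exp(log_low + (log_high - log_low) * rng.random())


rng = random.Random(42)
lin_draws = [sample_uniform(1e-4, 1.0, rng) for _ in range(10_000)]
log_draws = [sample_log_uniform(1e-4, 1.0, rng) for _ in range(10_000)]

# Share of draws below 1e-2: about 1% for the linear draw, about 50% for the log draw.
lin_share = sum(x < 1e-2 for x in lin_draws) / len(lin_draws)
log_share = sum(x < 1e-2 for x in log_draws) / len(log_draws)
print(lin_share, log_share)
```

This is why log-domain distributions are usually preferred for parameters such as learning rates or regularization strengths that span several orders of magnitude.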
- ads.hpo.distributions.decode(s: str)
Decodes a string into a distribution object.
- Parameters
s (str) – The string being decoded to a distribution object
- Returns
The decoded distribution (or dictionary)
- Return type
Distribution or Dict
- ads.hpo.distributions.encode(o: Distribution) str
Encodes a distribution to a string
- Parameters
o (Distribution) – The distribution to encode
- Returns
The distribution encoded as a string
- Return type
str (DistributionEncode)
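The encode and decode pair round-trips a distribution through JSON. The sketch below shows the general pattern with the standard json module: a JSONEncoder subclass whose default() tags the object with its class name, and an object_hook that rebuilds it. It is not the ADS implementation. The stand-in UniformDistribution dataclass, the "__dist__" tag key, and the local encode/decode/from_json functions are hypothetical, and the JSON layout that DistributionEncode actually produces may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UniformDistribution:
    # Hypothetical stand-in for ads.hpo.distributions.UniformDistribution.
    low: float
    high: float


_REGISTRY = {"UniformDistribution": UniformDistribution}


class DistributionEncoder(json.JSONEncoder):
    def default(self, o):
        # Tag known distribution objects with their class name so they can be rebuilt.
        if isinstance(o, tuple(_REGISTRY.values())):
            return {"__dist__": type(o).__name__, **asdict(o)}
        return super().default(o)  # raises TypeError for anything else


def from_json(obj: dict):
    # object_hook: called for every decoded JSON object, innermost first.
    name = obj.pop("__dist__", None)
    return _REGISTRY[name](**obj) if name else obj


def encode(o) -> str:
    return json.dumps(o, cls=DistributionEncoder)


def decode(s: str):
    return json.loads(s, object_hook=from_json)


space = {"C": UniformDistribution(0.1, 10.0)}
text = encode(space)
print(text)                   # {"C": {"__dist__": "UniformDistribution", "low": 0.1, "high": 10.0}}
print(decode(text) == space)  # True
```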
ads.hpo.search_cv module
- class ads.hpo.search_cv.ADSTuner(model, strategy='perfunctory', scoring=None, cv=5, study_name=None, storage=None, load_if_exists=True, random_state=None, loglevel=20, n_jobs=1, X=None, y=None)
Bases: BaseEstimator
Hyperparameter search with cross-validation.
Returns a hyperparameter tuning object
- Parameters
model – Object to use to fit the data. This is assumed to implement the scikit-learn estimator or pipeline interface.
strategy – perfunctory, detailed, or a dictionary/mapping of hyperparameters and their distributions. If perfunctory, picks a few relatively more important hyperparameters to tune. If detailed, extends to a larger search space. If dict, uses a user-defined search space: a dictionary where keys are hyperparameters and values are distributions. Distributions are assumed to implement the ADS distribution interface.
scoring (Optional[Union[Callable[..., float], str]]) – String or callable to evaluate the predictions on the validation data. If None, score on the estimator is used.
cv (int) – Integer to specify the number of folds in a CV splitter. If estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. Otherwise, sklearn.model_selection.KFold is used.
study_name (str) – Name of the current experiment for the ADSTuner object. One ADSTuner object can only be attached to one study_name.
storage – Database URL (for example, sqlite:///example.db). Defaults to sqlite:////tmp/hpo_*.db.
load_if_exists – Flag to control the behavior to handle a conflict of study names. In the case where a study named study_name already exists in the storage, a DuplicatedStudyError is raised if load_if_exists is set to False. Otherwise, the existing one is returned.
random_state – Seed of the pseudo random number generator. If int, this is the seed used by the random number generator. If None, the global random state from numpy.random is used.
loglevel – Log level. Can be logging.NOTSET, logging.INFO, logging.DEBUG, or logging.WARNING.
n_jobs (int) – Number of parallel jobs. -1 means using all processors.
X (TwoDimArrayLikeType, Union[List[List[float]], np.ndarray, pd.DataFrame, spmatrix, ADSData]) – Training data.
y (Union[OneDimArrayLikeType, TwoDimArrayLikeType], optional) – Target, where OneDimArrayLikeType is Union[List[float], np.ndarray, pd.Series] and TwoDimArrayLikeType is Union[List[List[float]], np.ndarray, pd.DataFrame, spmatrix, ADSData].
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.svm import SVC
    tuner = ADSTuner(
        SVC(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
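The cv description encodes a splitter-selection rule: stratify only for classifiers with a binary or multiclass target. In scikit-learn, the same rule is applied by sklearn.model_selection.check_cv when cv is an integer; whether ADSTuner calls it is not stated here. The stdlib sketch below only illustrates the decision. `pick_cv_splitter` is hypothetical, it handles 1-D targets only, and its test for "binary or multiclass" is a deliberate simplification of scikit-learn's target-type check.

```python
from numbers import Integral
from typing import Sequence


def pick_cv_splitter(is_classifier: bool, y: Sequence) -> str:
    """Name of the splitter the cv rule would select (sketch only).

    Simplification (assumption): a 1-D y counts as binary/multiclass when every
    label is a string, an integer, or an integral float such as 2.0.
    """
    def is_discrete(v) -> bool:
        return isinstance(v, (str, Integral)) or (isinstance(v, float) and v.is_integer())

    if is_classifier and all(is_discrete(v) for v in y):
        return "StratifiedKFold"
    return "KFold"


print(pick_cv_splitter(True, [0, 1, 2, 1, 0]))   # StratifiedKFold
print(pick_cv_splitter(False, [0, 1, 0, 1]))     # KFold (not a classifier)
print(pick_cv_splitter(True, [0.25, 1.5, 3.0]))  # KFold (continuous target)
```

Stratification keeps the class proportions roughly equal across folds, which matters for small or imbalanced datasets such as the 150-sample iris set used in the example.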
- property best_index
- Returns
Index which corresponds to the best candidate parameter setting.
- Return type
int
- property best_params
- Returns
Parameters of the best trial.
- Return type
Dict[str, Any]
- property best_score
- Returns
Mean cross-validated score of the best estimator.
- Return type
float
- best_scores(n: int = 5, reverse: bool = True)
Return the best scores from the study
- Parameters
n (int) – The maximum number of results to show. Defaults to 5. If None or negative, all results are returned.
reverse (bool) – Whether to reverse the sort order so results are in descending order. Defaults to True.
- Returns
List of the best scores
- Return type
list[float or int]
- Raises
ValueError –
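The n and reverse parameters of best_scores follow ordinary sort-and-slice semantics. The sketch below reproduces them on a plain list of scores; it is a hypothetical stand-in, since the real method reads scores from the study, and it does not model the undocumented ValueError case.

```python
from typing import List, Optional


def best_scores(scores: List[float], n: Optional[int] = 5, reverse: bool = True) -> List[float]:
    # reverse=True sorts in descending order, so the highest scores come first.
    ordered = sorted(scores, reverse=reverse)
    # None or a negative n means "return everything".
    if n is None or n < 0:
        return ordered
    return ordered[:n]


trial_scores = [0.91, 0.87, 0.95, 0.78, 0.93, 0.89]
print(best_scores(trial_scores, n=3))                 # [0.95, 0.93, 0.91]
print(best_scores(trial_scores, n=2, reverse=False))  # [0.78, 0.87]
print(len(best_scores(trial_scores, n=None)))         # 6
```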
- get_status()
Return the status of the current tuning process.
Alias for the property status.
- Returns
The status of the process
- Return type
Status
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    tuner = ADSTuner(
        SGDClassifier(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    tuner.search_space({'max_iter': 100})
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
    tuner.get_status()
- halt()
Halt the current running tuning process.
- Returns
Nothing
- Return type
None
- Raises
InvalidStateTransition – Raised if there is no running tuning process to halt.
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    tuner = ADSTuner(
        SGDClassifier(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    tuner.search_space({'max_iter': 100})
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
    tuner.halt()
- is_completed()
- Returns
True if the ADSTuner instance has completed; False otherwise.
- Return type
bool
- is_terminated()
- Returns
True if the ADSTuner instance has been terminated; False otherwise.
- Return type
bool
- property n_trials
- Returns
Number of completed trials. Alias for trial_count.
- Return type
int
- static optimizer(study_name, pruner, sampler, storage, load_if_exists, objective_func, global_start, global_stop, **kwargs)
Static method for running ADSTuner tuning process
- Parameters
study_name (str) – The name of the study.
pruner – The pruning method for pruning trials.
sampler – The sampling method used for tuning.
storage (str) – Storage endpoint.
load_if_exists (bool) – Load existing study if it exists.
objective_func – The objective function to be maximized.
global_start (multiprocessing.Value) – The global start time.
global_stop (multiprocessing.Value) – The global stop time.
kwargs (dict) – Keyword/value pairs passed into the optimize process.
- Raises
Exception – Raised for any exceptions thrown by the underlying optimization process
- Returns
Nothing
- Return type
None
- plot_best_scores(best=True, inferior=True, time_interval=1, fig_size=(800, 500))
Plot optimization history of all trials in a study.
- Parameters
best – Controls whether to plot the lines for the best scores so far.
inferior – Controls whether to plot the dots for the actual objective scores.
time_interval – How often (in seconds) the plot refreshes to check for new trial results.
fig_size (tuple) – Width and height of the figure.
- Returns
Nothing.
- Return type
None
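The "best scores so far" line is a running maximum over the trial scores, since the objective is maximized. A minimal sketch of that computation with the standard library follows; `best_so_far` is a hypothetical helper, not the plotting code ADS uses.

```python
from itertools import accumulate
from typing import List


def best_so_far(scores: List[float]) -> List[float]:
    # Running maximum: entry i is the best score among trials 0..i.
    return list(accumulate(scores, max))


trial_scores = [0.80, 0.85, 0.83, 0.90, 0.88]
print(best_so_far(trial_scores))  # [0.8, 0.85, 0.85, 0.9, 0.9]
```

The inferior dots correspond to the raw trial_scores themselves; flat stretches in the running maximum show trials that did not improve on the best result.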
- plot_contour_scores(params=None, time_interval=1, fig_size=(800, 500))
Contour plot of the scores.
- Parameters
params (Optional[List[str]]) – Parameter list to visualize. Defaults to all.
time_interval (float) – Time interval for the plot. Defaults to 1.
fig_size (tuple[int, int]) – Figure size. Defaults to (800, 500).
- Returns
Nothing.
- Return type
None
- plot_edf_scores(time_interval=1, fig_size=(800, 500))
Plot the EDF (empirical distribution function) of the scores.
Only completed trials are used.
- Parameters
time_interval (float) – Time interval for the plot. Defaults to 1.
fig_size (tuple[int, int]) – Figure size. Defaults to (800, 500).
- Returns
Nothing.
- Return type
None
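The empirical distribution function plotted here gives, for each score x, the fraction of completed trials whose score is at most x. The sketch below computes those points with the standard library; `empirical_cdf` is a hypothetical helper and not part of ADS.

```python
from bisect import bisect_right
from typing import List, Tuple


def empirical_cdf(scores: List[float]) -> List[Tuple[float, float]]:
    # For each distinct score x, the fraction of trials whose score is <= x.
    ordered = sorted(scores)
    n = len(ordered)
    return [(x, bisect_right(ordered, x) / n) for x in sorted(set(ordered))]


completed = [0.70, 0.90, 0.80, 0.90]
print(empirical_cdf(completed))  # [(0.7, 0.25), (0.8, 0.5), (0.9, 1.0)]
```

A curve that rises steeply near the top score indicates that many trials reached similar results.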
- plot_intermediate_scores(time_interval=1, fig_size=(800, 500))
Plot intermediate values of all trials in a study.
- Parameters
time_interval (float) – Time interval for the plot. Defaults to 1.
fig_size (tuple[int, int]) – Figure size. Defaults to (800, 500).
- Returns
Nothing.
- Return type
None
- plot_parallel_coordinate_scores(params=None, time_interval=1, fig_size=(800, 500))
Plot the high-dimensional parameter relationships in a study.
Note that if a parameter contains missing values, a trial with missing values is not plotted.
- Parameters
params (Optional[List[str]]) – Parameter list to visualize. Defaults to all.
time_interval (float) – Time interval for the plot. Defaults to 1.
fig_size (tuple[int, int]) – Figure size. Defaults to (800, 500).
- Returns
Nothing.
- Return type
None
- plot_param_importance(importance_evaluator='Fanova', time_interval=1, fig_size=(800, 500))
Plot hyperparameter importances.
- Parameters
importance_evaluator (str) – Importance evaluator. Valid values: “Fanova”, “MeanDecreaseImpurity”. Defaults to “Fanova”.
time_interval (float) – How often (in seconds) the plot refreshes to check for new trial results.
fig_size (tuple) – Width and height of the figure.
- Raises
NotImplementedError – Raised for unsupported importance evaluators.
- Returns
Nothing.
- Return type
None
- resume()
Resume the current halted tuning process.
- Returns
Nothing
- Return type
None
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    tuner = ADSTuner(
        SGDClassifier(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    tuner.search_space({'max_iter': 100})
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
    tuner.halt()
    tuner.resume()
- property score_remaining
- Returns
The difference between the best score and the optimal score.
- Return type
float
- Raises
ExitCriterionError – Error is raised if there is no score-based criteria for tuning.
- property scoring_name
- Returns
Scoring name.
- Return type
str
- search_space(strategy=None, overwrite=False)
Returns the search space. If strategy is not passed in, the existing search space is returned. When strategy is passed in, the existing search space is overwritten if overwrite is True; otherwise, it is only updated.
- Parameters
strategy (Union[str, dict], optional) – perfunctory, detailed, or a dictionary/mapping of the hyperparameters and their distributions. If perfunctory, picks a few relatively more important hyperparameters to tune. If detailed, extends to a larger search space. If dict, uses a user-defined search space: a dictionary where keys are parameters and values are distributions. Distributions are assumed to implement the ADS distribution interface.
overwrite (bool, optional) – Ignored when strategy is None. Otherwise, the search space is overwritten if overwrite is True and updated if it is False.
- Returns
A mapping of the hyperparameters and their distributions.
- Return type
dict
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    tuner = ADSTuner(
        SGDClassifier(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    tuner.search_space({'max_iter': 100})
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
    tuner.search_space()
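The overwrite flag of search_space chooses between replacing the mapping and merging into it. The sketch below models the three documented cases with plain dictionaries. It is a hypothetical stand-in: `merge_search_space` is not an ADS function, and the string values are placeholders for real distribution objects.

```python
from typing import Any, Dict, Optional


def merge_search_space(current: Dict[str, Any], strategy: Optional[Dict[str, Any]] = None,
                       overwrite: bool = False) -> Dict[str, Any]:
    if strategy is None:      # nothing passed in: the existing space is returned unchanged
        return dict(current)
    if overwrite:             # overwrite=True: the new mapping replaces everything
        return dict(strategy)
    updated = dict(current)   # overwrite=False: only the named keys are added or changed
    updated.update(strategy)
    return updated


space = {"alpha": "<alpha distribution>", "penalty": "<penalty distribution>"}
print(merge_search_space(space, {"max_iter": 100}))
# {'alpha': '<alpha distribution>', 'penalty': '<penalty distribution>', 'max_iter': 100}
print(merge_search_space(space, {"max_iter": 100}, overwrite=True))  # {'max_iter': 100}
```

This is why the example above, which calls search_space({'max_iter': 100}) without overwrite, keeps the rest of the detailed search space.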
- property sklearn_steps
- Returns
Search space which corresponds to the best candidate parameter setting.
- Return type
int
- property status
- Returns
The status of the current tuning process.
- Return type
Status
- terminate()
Terminate the current tuning process.
- Returns
Nothing
- Return type
None
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    tuner = ADSTuner(
        SGDClassifier(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    tuner.search_space({'max_iter': 100})
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
    tuner.terminate()
- property time_elapsed
Returns the time, in seconds, that the HPO process has been searching.
- Returns
The number of seconds the HPO process has been searching
- Return type
int
- property time_remaining
Returns the number of seconds remaining in the study.
- Returns
Number of seconds remaining in the budget; 0 if the study is complete or terminated
- Return type
int
- Raises
ExitCriterionError – Error is raised if time has not been included in the budget.
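A simplified model of time_remaining follows. It assumes elapsed time is measured from a single start timestamp and ignores any time spent halted, which the documentation above does not specify. The remaining budget is the TimeBudget limit minus the elapsed time, floored at 0, and an error is raised when no time budget was configured. `time_remaining` and the ExitCriterionError class here are local stand-ins, not the ADS objects.

```python
import time
from typing import Optional


class ExitCriterionError(Exception):
    """Local stand-in for ads.hpo.search_cv.ExitCriterionError."""


def time_remaining(budget_seconds: Optional[float], start: float, finished: bool,
                   now: Optional[float] = None) -> int:
    if budget_seconds is None:  # no TimeBudget among the exit criteria
        raise ExitCriterionError("time is not part of the exit criteria")
    if finished:                # completed or terminated studies have nothing left
        return 0
    now = time.time() if now is None else now
    return max(0, int(budget_seconds - (now - start)))


print(time_remaining(60, start=100.0, finished=False, now=130.0))  # 30
print(time_remaining(60, start=100.0, finished=False, now=200.0))  # 0 (budget used up)
print(time_remaining(60, start=100.0, finished=True, now=130.0))   # 0
```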
- property time_since_resume
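Returns the number of seconds since the process was last resumed from a halt.
- Returns
The number of seconds since the process was last resumed
- Return type
int
- Raises
NoRestartError – Raised if the process was never halted and then resumed, or if it has been terminated.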
- property trial_count
- Returns
Number of completed trials. Alias for n_trials.
- Return type
int
- property trials
- Returns
Trial data up to this point.
- Return type
pandas.DataFrame
- trials_export(file_uri, metadata=None, script_dict={'model': None, 'scoring': None})
Export the metadata, as well as the files needed to reconstruct the ADSTuner object, to object storage. The data itself is not stored: to resume the same ADSTuner object from object storage and continue tuning from previous trials, you have to provide the dataset.
- Parameters
file_uri (str) – Object storage path, ‘oci://bucketname@namespace/filepath/on/objectstorage’. For example, oci://test_bucket@ociodsccust/tuner/test.zip
metadata (str, optional) – User defined metadata
script_dict (dict, optional) – Script paths for model and scoring. This is only recommended for unsupported models and user-defined scoring functions. You can store the model and scoring function in a dictionary with keys model and scoring and the respective paths as values. The model and scoring scripts must import necessary libraries for the script to run. The model and scoring variables must be set to your model and scoring function.
- Returns
Nothing
- Return type
None
Example:
    # Print out a list of supported models
    from ads.hpo.ads_search_space import model_list
    print(model_list)
    # Example script_dict value
    {'model': '/home/datascience/advanced-ds/notebooks/scratch/ADSTunerV2/mymodel.py',
     'scoring': '/home/datascience/advanced-ds/notebooks/scratch/ADSTunerV2/customized_scoring.py'}
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    tuner = ADSTuner(
        SGDClassifier(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    tuner.search_space({'max_iter': 100})
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)], synchronous=True)
    tuner.trials_export('oci://<bucket_name>@<namespace>/tuner/test.zip')
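The script_dict contract says each script must import what it needs and bind a module-level name: model in the model script, scoring in the scoring script. The sketch below writes a hypothetical scoring script and reads the scoring name back with runpy.run_path. How ADS actually loads these scripts is not documented here, so runpy is used purely as an illustration; the script contents, accuracy_scorer, and the AlwaysZero toy estimator are all hypothetical. The (estimator, X, y) signature follows the scikit-learn scorer convention.

```python
import os
import runpy
import tempfile

# Hypothetical contents of the file referenced by script_dict["scoring"].
# It defines a scorer and binds it to the module-level name `scoring`.
SCORING_SCRIPT = '''
def accuracy_scorer(estimator, X, y):
    predictions = estimator.predict(X)
    return sum(p == t for p, t in zip(predictions, y)) / len(y)

scoring = accuracy_scorer
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "customized_scoring.py")
    with open(path, "w") as f:
        f.write(SCORING_SCRIPT)
    # Execute the script and collect its globals (illustrative loader, an assumption).
    namespace = runpy.run_path(path)


class AlwaysZero:
    """Toy estimator standing in for a fitted model."""

    def predict(self, X):
        return [0] * len(X)


score_fn = namespace["scoring"]
print(score_fn(AlwaysZero(), [[1], [2], [3], [4]], [0, 1, 0, 0]))  # 0.75
```

A model script follows the same pattern, ending with a line such as model = SomeEstimator(...).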
- classmethod trials_import(file_uri, delete_zip_file=True, target_file_path=None)
Import the database file from the object storage
- Parameters
file_uri (str) – ‘oci://bucketname@namespace/filepath/on/objectstorage’ Example: ‘oci://<bucket_name>@<namespace>/tuner/test.zip’
delete_zip_file (bool, defaults to True, optional) – Whether to delete the zip file afterwards.
target_file_path (str, optional) – The path where the zip file will be saved. For example, ‘/home/datascience/myfile.zip’.
- Returns
ADSTuner object
- Return type
ADSTuner
Examples
>>> from ads.hpo.stopping_criterion import *
>>> from ads.hpo.search_cv import ADSTuner
>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import SGDClassifier
>>> X, y = load_iris(return_X_y=True)
>>> tuner = ADSTuner.trials_import('oci://<bucket_name>@<namespace>/tuner/test.zip')
>>> tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)], synchronous=True)
- property trials_remaining
- Returns
The number of trials remaining in the budget.
- Return type
int
- Raises
ExitCriterionError – Raised if the current tuner does not include a trials-based exit condition.
- tune(X=None, y=None, exit_criterion=[], loglevel=None, synchronous=False)
Run hyperparameter tuning until one of the exit_criterion conditions is met. The default is to run 50 trials.
- Parameters
X (TwoDimArrayLikeType, Union[List[List[float]], np.ndarray, pd.DataFrame, spmatrix, ADSData]) – Training data.
y (Union[OneDimArrayLikeType, TwoDimArrayLikeType], optional) – Target, where OneDimArrayLikeType is Union[List[float], np.ndarray, pd.Series] and TwoDimArrayLikeType is Union[List[List[float]], np.ndarray, pd.DataFrame, spmatrix, ADSData].
exit_criterion (list, optional) – A list of ADS stopping criteria: ScoreValue(), NTrials(), or TimeBudget(). For example, [ScoreValue(0.96), NTrials(40), TimeBudget(10)]. Tuning exits as soon as any of the stopping criteria in the exit_criterion list is satisfied. By default, the run stops after 50 trials.
loglevel (int, optional) – Log level.
synchronous (boolean, optional) – Tune synchronously or not. Defaults to False
- Returns
Nothing
- Return type
None
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.svm import SVC
    tuner = ADSTuner(
        SVC(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    tuner.search_space({'max_iter': 100})
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
- wait()
Wait for the current tuning process to finish running.
- Returns
Nothing
- Return type
None
Example:
    from ads.hpo.stopping_criterion import *
    from ads.hpo.search_cv import ADSTuner
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    tuner = ADSTuner(
        SGDClassifier(),
        strategy='detailed',
        scoring='f1_weighted',
        random_state=42
    )
    tuner.search_space({'max_iter': 100})
    X, y = load_iris(return_X_y=True)
    tuner.tune(X=X, y=y, exit_criterion=[TimeBudget(1)])
    tuner.wait()
- exception ads.hpo.search_cv.DuplicatedStudyError
Bases: Exception
DuplicatedStudyError is raised when a new tuner process is created with a study name that already exists in storage.
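DuplicatedStudyError pairs with the load_if_exists constructor flag: an existing study is reused when the flag is True and rejected when it is False. The sketch below models that rule with an in-memory dictionary in place of the database named by storage. `create_or_load_study` and `Study` are hypothetical, and the exception is a local stand-in with the same name.

```python
from typing import Dict


class DuplicatedStudyError(Exception):
    """Local stand-in for ads.hpo.search_cv.DuplicatedStudyError."""


class Study:
    def __init__(self, name: str) -> None:
        self.name = name


def create_or_load_study(storage: Dict[str, Study], study_name: str,
                         load_if_exists: bool = True) -> Study:
    # `storage` is a plain dict here; ADSTuner uses a database URL instead.
    if study_name in storage:
        if not load_if_exists:
            raise DuplicatedStudyError(f"study {study_name!r} already exists")
        return storage[study_name]  # reuse the existing study
    storage[study_name] = Study(study_name)
    return storage[study_name]


db: Dict[str, Study] = {}
first = create_or_load_study(db, "svc-iris")
again = create_or_load_study(db, "svc-iris")  # load_if_exists=True returns the same study
print(again is first)  # True
try:
    create_or_load_study(db, "svc-iris", load_if_exists=False)
except DuplicatedStudyError as err:
    print("raised:", err)  # raised: study 'svc-iris' already exists
```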
- exception ads.hpo.search_cv.ExitCriterionError
Bases: Exception
ExitCriterionError is raised when an attempt is made to check exit status for a different exit type than the tuner was initialized with. For example, if an HPO study has an exit criterion based on the number of trials and a request is made for the time remaining, which is a different exit criterion, an exception is raised.
- exception ads.hpo.search_cv.InvalidStateTransition
Bases: Exception
InvalidStateTransition is raised when an invalid transition request is made, such as calling halt without a running process.
- exception ads.hpo.search_cv.NoRestartError
Bases: Exception
NoRestartError is raised when an attempt is made to check how many seconds have transpired since the HPO process was last resumed from a halt. This can happen if the process has been terminated or it was never halted and then resumed to begin with.
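InvalidStateTransition and NoRestartError together describe a small lifecycle: halt only a running process, and measure time since resume only after a resume that has not been terminated. The sketch below models that lifecycle as a tiny state machine. It is an illustration, not ADSTuner's code: the State names and the TunerLifecycle class are hypothetical, the exception classes are local stand-ins, and the rule that resume requires a halted process is an assumption.

```python
import time
from enum import Enum
from typing import Optional


class InvalidStateTransition(Exception):
    """Local stand-in for ads.hpo.search_cv.InvalidStateTransition."""


class NoRestartError(Exception):
    """Local stand-in for ads.hpo.search_cv.NoRestartError."""


class State(Enum):  # hypothetical state names
    RUNNING = "running"
    HALTED = "halted"
    TERMINATED = "terminated"


class TunerLifecycle:
    def __init__(self) -> None:
        self.state = State.RUNNING
        self.last_resume: Optional[float] = None  # set only by resume()

    def halt(self) -> None:
        if self.state is not State.RUNNING:
            raise InvalidStateTransition(f"cannot halt from {self.state.value}")
        self.state = State.HALTED

    def resume(self) -> None:
        if self.state is not State.HALTED:
            raise InvalidStateTransition(f"cannot resume from {self.state.value}")
        self.state = State.RUNNING
        self.last_resume = time.monotonic()

    def terminate(self) -> None:
        self.state = State.TERMINATED
        self.last_resume = None

    def time_since_resume(self) -> float:
        if self.state is State.TERMINATED or self.last_resume is None:
            raise NoRestartError("process was never resumed or has been terminated")
        return time.monotonic() - self.last_resume


tuner = TunerLifecycle()
try:
    tuner.resume()  # not halted yet
except InvalidStateTransition as err:
    print("rejected:", err)  # rejected: cannot resume from running
tuner.halt()
tuner.resume()
print(tuner.time_since_resume() >= 0)  # True
tuner.terminate()
try:
    tuner.time_since_resume()
except NoRestartError:
    print("no restart to measure")
```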
ads.hpo.stopping_criterion module
- class ads.hpo.stopping_criterion.NTrials(n_trials: int)
Bases: object
Exit based on number of trials.
- Parameters
n_trials (int) – Number of trials (sets of hyperparameters tested). If None, there is no limitation on the number of trials.
- Returns
NTrials object
- Return type
NTrials
- class ads.hpo.stopping_criterion.ScoreValue(score: float)
Bases: object
Exit if the score is greater than or equal to the threshold.
- Parameters
score (float) – The threshold for exiting the tuning process. If a trial value is greater than or equal to score, the process exits.
- Returns
ScoreValue object
- Return type
ScoreValue
- class ads.hpo.stopping_criterion.TimeBudget(seconds: float)
Bases: object
Exit based on the number of seconds.
- Parameters
seconds (float) – Time limit, in seconds. If None, there is no time limit.
- Returns
TimeBudget object
- Return type
TimeBudget
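When several of these criteria are passed to tune through exit_criterion, tuning stops as soon as any one of them is satisfied. The sketch below models that "any" rule with local dataclasses that mirror the constructor arguments documented above. They are hypothetical stand-ins, not the ads.hpo.stopping_criterion classes, and `should_exit` is not an ADS function. Because the objective is maximized, checking the best score against ScoreValue is equivalent to checking whether any trial value reached the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# Local stand-ins mirroring the documented constructor arguments.
@dataclass
class NTrials:
    n_trials: Optional[int]


@dataclass
class ScoreValue:
    score: float


@dataclass
class TimeBudget:
    seconds: Optional[float]


Criterion = Union[NTrials, ScoreValue, TimeBudget]


def should_exit(criteria: List[Criterion], trials_done: int,
                best_score: Optional[float], elapsed: float) -> bool:
    """True as soon as ANY criterion in the list is satisfied; None means no limit."""
    for c in criteria:
        if isinstance(c, NTrials) and c.n_trials is not None and trials_done >= c.n_trials:
            return True
        if isinstance(c, ScoreValue) and best_score is not None and best_score >= c.score:
            return True
        if isinstance(c, TimeBudget) and c.seconds is not None and elapsed >= c.seconds:
            return True
    return False


criteria = [ScoreValue(0.96), NTrials(40), TimeBudget(10)]
print(should_exit(criteria, trials_done=12, best_score=0.93, elapsed=4.0))  # False
print(should_exit(criteria, trials_done=12, best_score=0.97, elapsed=4.0))  # True (score)
print(should_exit(criteria, trials_done=40, best_score=0.93, elapsed=4.0))  # True (trials)
```

The documented default of 50 trials corresponds to should_exit([NTrials(50)], ...) in this model.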