ads.automl package

Submodules

ads.automl.driver module

class ads.automl.driver.AutoML(training_data, validation_data=None, provider=None, baseline='dummy', client=None)

Bases: object

Creates an automatic machine learning (AutoML) object.

Parameters
  • training_data (ADSData instance) – The data used to train the model.

  • validation_data (ADSData instance) – Optional data used for validation. Defaults to None.

  • provider (None or object of ads.automl.provider.AutoMLProvider) – If None, the default OracleAutoMLProvider will be used to generate the model

  • baseline (None, "dummy", or object of ads.common.model.ADSModel (Default is "dummy")) –

    • If None, then no baseline is created.

    • If "dummy", then a DummyClassifier or DummyRegressor is used.

    • If an object, then the provided estimator is used. This estimator must include a preprocessing step in its pipeline that handles categorical data.

  • client – Dask Client to use (optional)

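Examples

>>> from ads.automl.driver import AutoML
>>> from ads.automl.provider import OracleAutoMLProvider
>>> # ds is assumed to be an existing dataset object that provides train_test_split()
>>> train, test = ds.train_test_split()
>>> olabs_automl = OracleAutoMLProvider()
>>> model, baseline = AutoML(train, provider=olabs_automl).train()

train(**kwargs)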

Returns a fitted AutoML model and a fitted baseline model.

Parameters

kwargs (dict, optional) – Keyword arguments passed to the provider’s train method

Returns

  • model (object of ads.common.model.ADSModel) – The trained AutoML model

  • baseline (object of ads.common.model.ADSModel) – The baseline model to compare against

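Examples

>>> from ads.automl.driver import AutoML
>>> from ads.automl.provider import OracleAutoMLProvider
>>> # ds is assumed to be an existing dataset object that provides train_test_split()
>>> train, test = ds.train_test_split()
>>> olabs_automl = OracleAutoMLProvider()
>>> model, baseline = AutoML(train, provider=olabs_automl).train()

ads.automl.driver.get_ml_task_type(X, y, classes)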

Gets the ML task type and returns it.

Parameters
  • X (Dataframe) – The training dataframe

  • y (Dataframe) – The testing dataframe

  • classes (List) – A list of classes

Returns

A task type such as REGRESSION or MULTI_CLASS_CLASSIFICATION

Return type

ml_task_type
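
The following is a minimal sketch of how a task type could be chosen from a list of classes. It is not the implementation of get_ml_task_type: the TaskType enum and guess_task_type function are hypothetical local names, the text-classification variants are left out, and the rule itself (no classes means regression, two classes means binary) is an assumption made for illustration.

```python
from enum import Enum, auto


class TaskType(Enum):
    """Local stand-in for the task-type constants (hypothetical, not the ADS enum)."""
    REGRESSION = auto()
    BINARY_CLASSIFICATION = auto()
    MULTI_CLASS_CLASSIFICATION = auto()


def guess_task_type(classes):
    """Assumed heuristic: choose a task type from the list of classes."""
    if not classes:
        # No discrete classes were supplied: treat the target as continuous.
        return TaskType.REGRESSION
    if len(set(classes)) == 2:
        return TaskType.BINARY_CLASSIFICATION
    # Any other number of distinct classes is treated as multi-class in this sketch.
    return TaskType.MULTI_CLASS_CLASSIFICATION


print(guess_task_type(["yes", "no"]).name)
print(guess_task_type(["a", "b", "c"]).name)
print(guess_task_type(None).name)
```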

ads.automl.provider module

class ads.automl.provider.AutoMLFeatureSelection(msg)

Bases: object

fit(X)

Fits the feature selection transformer.

Parameters

X (Dataframe or list-like) – A Dataframe or list-like object holding the data to fit on

Returns

self – The fitted estimator

Return type

Estimator

transform(X)

Runs the feature selection transform function and returns the result.

Parameters

X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed

Returns

X – The transformed Dataframe.

Return type

Dataframe or list-like

class ads.automl.provider.AutoMLPreprocessingTransformer(msg)

Bases: object

fit(X)

Fits the preprocessing transformer.

Parameters

X (Dataframe or list-like) – A Dataframe or list-like object holding the data to fit on

Returns

self – The fitted estimator

Return type

Estimator

transform(X)

Runs the preprocessing transform function and returns the result.

Parameters

X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed

Returns

X – The transformed Dataframe.

Return type

Dataframe or list-like
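
AutoMLFeatureSelection and AutoMLPreprocessingTransformer share the same fit(X)/transform(X) contract: fit returns the fitted object and transform returns the transformed data. The sketch below illustrates that contract with a hypothetical ColumnSelector class operating on plain lists of dicts. It is not part of ADS and makes no claim about what the ADS transformers do internally.

```python
class ColumnSelector:
    """Hypothetical transformer: keeps only the requested columns."""

    def __init__(self, keep):
        self.keep = list(keep)
        self.fitted_ = False

    def fit(self, X):
        # X is a list of dict rows; check that the requested columns exist.
        missing = [c for c in self.keep if X and c not in X[0]]
        if missing:
            raise KeyError(f"columns not found: {missing}")
        self.fitted_ = True
        # Returning self allows fit(X).transform(X) chaining.
        return self

    def transform(self, X):
        if not self.fitted_:
            raise RuntimeError("call fit() before transform()")
        return [{c: row[c] for c in self.keep} for row in X]


rows = [
    {"age": 31, "city": "Oslo", "id": 1},
    {"age": 45, "city": "Lima", "id": 2},
]
selected = ColumnSelector(keep=["age", "city"]).fit(rows).transform(rows)
print(selected)
```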

class ads.automl.provider.AutoMLProvider

Bases: ABC

Abstract Base Class defining the structure of an AutoML solution. The solution needs to implement train() and get_transformer_pipeline().

property est

Returns the estimator.

The estimator can be a standard sklearn estimator or any object that implements the methods of (BaseEstimator, RegressorMixin) for regression or (BaseEstimator, ClassifierMixin) for classification.

Returns

est

Return type

An instance of estimator

abstract get_transformer_pipeline()

Returns a list of transformers representing the transformations done on data before model prediction.

This method is optional to implement, and is used only for visualizing transformations on data using ADSModel#visualize_transforms().

Returns

transformers_list

Return type

list of transformers implementing fit and transform

setup(X_train, y_train, ml_task_type, X_valid=None, y_valid=None, class_names=None, client=None)

Sets up the arguments for the AutoML instance.

Parameters
  • X_train (DataFrame) – Training features

  • y_train (DataFrame) – Training labels

  • ml_task_type (ml_task_type) – One of ml_task_type.{REGRESSION, BINARY_CLASSIFICATION, MULTI_CLASS_CLASSIFICATION, BINARY_TEXT_CLASSIFICATION, MULTI_CLASS_TEXT_CLASSIFICATION}

  • X_valid (DataFrame) – Validation features

  • y_valid (DataFrame) – Validation labels

  • class_names (list) – Unique values in y_train

  • client (object) – Dask client instance for distributed execution

abstract train(**kwargs)

Calls fit on estimator.

This method is expected to set the ‘est’ property.

Parameters
  • kwargs (dict, optional) – kwargs to decide the estimator and arguments for the fit method
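
The structure described for AutoMLProvider (an abstract train() that sets the est property, plus an abstract get_transformer_pipeline()) can be sketched with the standard abc module. ProviderSketch, ConstantProvider and ConstantEstimator below are hypothetical stand-ins written for illustration only. They do not import or subclass the real ADS classes.

```python
from abc import ABC, abstractmethod


class ProviderSketch(ABC):
    """Stand-in for an AutoML provider base class (not the ADS class itself)."""

    def __init__(self):
        self._est = None

    @property
    def est(self):
        # Exposes whatever estimator train() stored.
        return self._est

    @abstractmethod
    def train(self, **kwargs):
        """Fit an estimator and store it so that the est property returns it."""

    @abstractmethod
    def get_transformer_pipeline(self):
        """Return the list of transformers applied before prediction."""


class ConstantEstimator:
    """Toy estimator that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


class ConstantProvider(ProviderSketch):
    def train(self, **kwargs):
        # kwargs decide how the estimator is built in this toy example.
        self._est = ConstantEstimator(kwargs.get("value", 0))
        return self

    def get_transformer_pipeline(self):
        return []  # no preprocessing steps in this sketch


provider = ConstantProvider().train(value=7)
print(provider.est.predict([[1], [2], [3]]))

try:
    ProviderSketch()  # abstract methods are not implemented
except TypeError as exc:
    print("cannot instantiate:", type(exc).__name__)
```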

class ads.automl.provider.BaselineAutoMLProvider(est)

Bases: AutoMLProvider

Generates a baseline model using the Zero Rule algorithm by default. For a classification predictive modeling problem where a categorical value is predicted, the Zero Rule algorithm predicts the class value that has the most observations in the training dataset.

Parameters

est (BaselineModel) – An estimator that supports the fit/predict/predict_proba interface. By default, DummyClassifier/DummyRegressor are used as estimators
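
As an illustration of the Zero Rule idea described above, here is a minimal pure-Python sketch. ZeroRuleClassifier is a hypothetical class, not the estimator ADS uses, and returning the class priors from predict_proba is a choice made for this sketch rather than documented behavior.

```python
from collections import Counter


class ZeroRuleClassifier:
    """Minimal Zero Rule sketch: always predicts the most frequent training label."""

    def fit(self, X, y):
        counts = Counter(y)
        self.classes_ = sorted(counts)
        # The class with the most observations in the training labels.
        self.majority_ = counts.most_common(1)[0][0]
        # Class frequencies, in the same order as classes_.
        self.priors_ = [counts[c] / len(y) for c in self.classes_]
        return self

    def predict(self, X):
        # The features are ignored entirely; only the row count matters.
        return [self.majority_ for _ in X]

    def predict_proba(self, X):
        return [list(self.priors_) for _ in X]


X_train = [[0], [1], [2], [3], [4]]
y_train = ["spam", "ham", "ham", "ham", "spam"]
clf = ZeroRuleClassifier().fit(X_train, y_train)
print(clf.predict([[10], [11]]))
print(clf.classes_, clf.predict_proba([[10]]))
```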

decide_estimator(**kwargs)

Decides which type of BaselineModel to generate.

Returns

Model – A baseline model generated for the particular ML task being performed

Return type

BaselineModel

get_transformer_pipeline()

Returns a list of transformers representing the transformations done on data before model prediction.

This method is used only for visualizing transformations on data using ADSModel#visualize_transforms().

Returns

transformers_list

Return type

list of transformers implementing fit and transform

train(**kwargs)

Calls fit on estimator.

This method is expected to set the ‘est’ property.

Parameters
  • kwargs (dict, optional) – kwargs to decide the estimator and arguments for the fit method

class ads.automl.provider.BaselineModel(est)

Bases: object

A BaselineModel object that supports the fit/predict/predict_proba/transform interface. Labels (y) are encoded using DataFrameLabelEncoder.

fit(X, y)

Fits the baseline estimator.

Parameters
  • X (Dataframe or list-like) – A Dataframe or list-like object holding the data to fit on

  • y (Dataframe, Series, or list-like) – A Dataframe, Series, or list-like object holding the labels

Returns

The fitted estimator

Return type

estimator

predict(X)

Runs the baseline’s predict function and returns the result.

Parameters

X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on

Returns

A list of predictions performed on the input data.

Return type

List

predict_proba(X)

Runs the baseline’s predict_proba function and returns the result.

Parameters

X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on

Returns

A list of probabilities of being part of a class

Return type

List

transform(X)

Runs the baseline’s transform function and returns the result.

Parameters

X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed

Returns

The transformed Dataframe. Currently, no transformation is performed by the default Baseline Estimator.

Return type

Dataframe or list-like

class ads.automl.provider.OracleAutoMLProvider(n_jobs=-1, loglevel=None, logger_override=None, model_n_jobs: int = 1)

Bases: AutoMLProvider, ABC

The Oracle AutoML Provider automatically provides a tuned ML pipeline that best models the given training dataset and prediction task.

Parameters
  • n_jobs (int) – Specifies the degree of parallelism for Oracle AutoML. -1 (default) means that AutoML will use all available cores.

  • loglevel (int) – The verbosity of output for Oracle AutoML. Can be specified using the Python logging module (https://docs.python.org/3/library/logging.html#logging-levels).

  • model_n_jobs (int, optional) – Specifies the model parallelism used by AutoML. This will be passed to the underlying model it is training. Defaults to 1.
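
The loglevel parameter takes an integer, and the Python logging module exposes its named levels as plain integers. The short sketch below only prints those constants; the commented-out constructor call follows the signature shown above and is not executed here.

```python
import logging

# Named logging levels are plain integers, so either form can be used
# wherever an int level is expected.
for name in ("DEBUG", "INFO", "WARNING", "ERROR"):
    print(name, getattr(logging, name))

loglevel = logging.WARNING

# Per the signature documented above (not run in this sketch):
# provider = OracleAutoMLProvider(n_jobs=-1, loglevel=loglevel)
```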

get_transformer_pipeline()

Returns a list of transformers representing the transformations done on data before model prediction.

This method is used only for visualizing transformations on data using ADSModel#visualize_transforms().

Returns

transformers_list

Return type

list of transformers implementing fit and transform

print_summary(max_rows=None, sort_column='Mean Validation Score', ranking_table_only=False)

Prints a summary of the Oracle AutoML Pipeline in the last train() call.

Parameters
  • max_rows (int) – Number of trials to print. Pass in None to print all trials

  • sort_column (string) – Column to sort results by. Must be one of [‘Algorithm’, ‘#Samples’, ‘#Features’, ‘Mean Validation Score’, ‘Hyperparameters’, ‘All Validation Scores’, ‘CPU Time’]

  • ranking_table_only (bool) – Table to be displayed. Pass in False to display the complete table. Pass in True to display the ranking table only.

print_trials(max_rows=None, sort_column='Mean Validation Score')

Prints all trials executed by the Oracle AutoML Pipeline in the last train() call.

Parameters
  • max_rows (int) – Number of trials to print. Pass in None to print all trials

  • sort_column (string) – Column to sort results by. Must be one of [‘Algorithm’, ‘#Samples’, ‘#Features’, ‘Mean Validation Score’, ‘Hyperparameters’, ‘All Validation Scores’, ‘CPU Time’]

selected_model_name()

Returns the name of the model selected by AutoML.

selected_score_label()

Returns the name of the score_metric used in train().

train(**kwargs)

Train the Oracle AutoML Pipeline. This looks at the training data, and identifies the best set of features, the best algorithm and the best set of hyperparameters for this data. A model is then generated, trained on this data and returned.

Parameters
  • score_metric (str, callable) – Score function (or loss function) with signature score_func(y, y_pred, **kwargs), or a string naming one of the predefined scikit-learn scoring values listed at https://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values

  • random_state (int) – Random seed used by AutoML

  • model_list (list of str) – Models that will be evaluated by the Pipeline. Supported models:

    • Classification: AdaBoostClassifier, DecisionTreeClassifier, ExtraTreesClassifier, KNeighborsClassifier, LGBMClassifier, LinearSVC, LogisticRegression, RandomForestClassifier, SVC, XGBClassifier

    • Regression: AdaBoostRegressor, DecisionTreeRegressor, ExtraTreesRegressor, KNeighborsRegressor, LGBMRegressor, LinearSVR, LinearRegression, RandomForestRegressor, SVR, XGBRegressor

  • time_budget (float, optional) – Time budget in seconds where 0 means no time budget constraint (best effort)

  • min_features (int, float, list, optional (default: 1)) – Minimum number of features to keep. Acceptable values:

    • If int, 0 < min_features <= n_features

    • If float, 0 < min_features <= 1.0

    • If list, names of features to keep, for example [‘a’, ‘b’] means keep features ‘a’ and ‘b’
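
A callable passed as score_metric is expected to follow the signature score_func(y, y_pred, **kwargs). The sketch below shows one such function in plain Python. The name negative_mean_absolute_error is hypothetical, and negating the error assumes that larger scores are treated as better, which this reference does not state.

```python
def negative_mean_absolute_error(y, y_pred, **kwargs):
    """Custom scorer sketch matching score_func(y, y_pred, **kwargs)."""
    if len(y) != len(y_pred):
        raise ValueError("y and y_pred must have the same length")
    total = sum(abs(t - p) for t, p in zip(y, y_pred))
    # Negated so that a smaller error yields a larger score (an assumption).
    return -total / len(y)


score = negative_mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
print(score)

# Following the parameter description above, such a function would be passed as
# train(score_metric=negative_mean_absolute_error); that call is not run here.
```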

Returns

self

Return type

object

visualize_adaptive_sampling_trials()

Visualize the trials for Adaptive Sampling.

visualize_algorithm_selection_trials(ylabel=None)

Plot the scores predicted by Algorithm Selection for each algorithm. The horizontal line shows the average score across all algorithms. Algorithms below the line are colored turquoise, whereas those with a score higher than the mean are colored teal. The orange bar shows the algorithm with the highest predicted score. The error bar is +/- one standard error.

Parameters

ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.
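
The quantities named in the plot description (a per-algorithm score, the average across algorithms, and an error bar of one standard error) can be worked through by hand. The scores below are made-up numbers, and taking the standard error over repeated validation scores is an assumption, since this reference does not say what the error bar is computed from.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical validation scores per algorithm (made-up numbers for illustration).
scores = {
    "LogisticRegression": [0.80, 0.82, 0.84],
    "RandomForestClassifier": [0.88, 0.90, 0.92],
    "KNeighborsClassifier": [0.70, 0.75, 0.80],
}

means = {name: mean(vals) for name, vals in scores.items()}
# Standard error of the mean: sample standard deviation divided by sqrt(n).
std_errors = {name: stdev(vals) / sqrt(len(vals)) for name, vals in scores.items()}

overall_mean = mean(means.values())   # the horizontal line in the plot
best = max(means, key=means.get)      # the bar with the highest score
below = [name for name, m in means.items() if m < overall_mean]

for name in scores:
    print(f"{name}: {means[name]:.3f} +/- {std_errors[name]:.3f}")
print("average across algorithms:", round(overall_mean, 3))
print("highest score:", best)
print("below average:", below)
```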

visualize_feature_selection_trials(ylabel=None)

Visualize the feature selection trials taken to arrive at the optimal set of features. The orange line shows the optimal number of features chosen by Feature Selection.

Parameters

ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.

visualize_tuning_trials(ylabel=None)

Visualize (plot) the hyperparameter tuning trials taken to arrive at the optimal hyperparameters. Each trial in the plot represents a particular hyperparameter combination.

Parameters

ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.

Module contents