ads.automl package

Submodules

ads.automl.driver module

class ads.automl.driver.AutoML(training_data, validation_data=None, provider=None, baseline='dummy', client=None)

Bases: object

Creates an automatic machine learning (AutoML) object.

Parameters

training_data (ADSData instance) –
validation_data (ADSData instance) –
provider (None or object of ads.automl.provider.AutoMLProvider) – If None, the default OracleAutoMLProvider will be used to generate the model
baseline (None, "dummy", or object of ads.common.model.ADSModel (Default is "dummy")) –
- If None, then no baseline is created,
- If "dummy", then the DummyClassifier or DummyRegressor is used
- If Object, then whatever estimator is provided will be used.
This estimator must include a part of its pipeline which does preprocessing to handle categorical data
client – Dask Client to use (optional)

Examples

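>>> from ads.automl.driver import AutoML
>>> from ads.automl.provider import OracleAutoMLProvider
>>> # ds is assumed to be an existing ADS dataset
>>> train, test = ds.train_test_split()
>>> olabs_automl = OracleAutoMLProvider()
>>> model, baseline = AutoML(train, provider=olabs_automl).train()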

train(**kwargs)

Returns a fitted automl model and a fitted baseline model.

Parameters

kwargs (dict, optional) – kwargs passed to provider’s train method

Returns

model (object of ads.common.model.ADSModel) – the trained automl model
baseline (object of ads.common.model.ADSModel) – the baseline model to compare

Examples

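>>> from ads.automl.driver import AutoML
>>> from ads.automl.provider import OracleAutoMLProvider
>>> # ds is assumed to be an existing ADS dataset
>>> train, test = ds.train_test_split()
>>> olabs_automl = OracleAutoMLProvider()
>>> model, baseline = AutoML(train, provider=olabs_automl).train()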

ads.automl.driver.get_ml_task_type(X, y, classes)

Gets the ML task type and returns it.

Parameters

X (Dataframe) – The training dataframe
y (Dataframe) – The testing dataframe
classes (List) – A list of classes

Returns

A particular task type like REGRESSION, MULTI_CLASS_CLASSIFICATION…

Return type

ml_task_type
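As a rough illustration of what a task-type decision can look like, the sketch below picks a task type from the list of classes alone. The names TaskType and infer_task_type are hypothetical, the rule is an assumption, and the sketch ignores X and y; the library's own logic may differ.

```python
from enum import Enum


class TaskType(Enum):
    """Hypothetical stand-in for the library's ml_task_type values."""
    REGRESSION = "REGRESSION"
    BINARY_CLASSIFICATION = "BINARY_CLASSIFICATION"
    MULTI_CLASS_CLASSIFICATION = "MULTI_CLASS_CLASSIFICATION"


def infer_task_type(classes):
    """Guess a task type from the class labels (assumed rule, not the ADS one)."""
    if not classes:
        # No discrete classes: treat the target as continuous.
        return TaskType.REGRESSION
    if len(set(classes)) > 2:
        return TaskType.MULTI_CLASS_CLASSIFICATION
    return TaskType.BINARY_CLASSIFICATION


task = infer_task_type(["yes", "no"])
```

Here task is TaskType.BINARY_CLASSIFICATION because exactly two distinct classes were supplied.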

ads.automl.provider module

class ads.automl.provider.AutoMLFeatureSelection(msg)

Bases: object

fit(X)

Fits the baseline estimator

Parameters: X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
Returns: Self – The fitted estimator
Return type: Estimator

transform(X)

Runs the baseline's transform function and returns the result

Parameters: X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed
Returns: X – The transformed Dataframe.
Return type: Dataframe or list-like

class ads.automl.provider.AutoMLPreprocessingTransformer(msg)

Bases: object

fit(X)

Fits the preprocessing Transformer

Parameters: X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
Returns: Self – The fitted estimator
Return type: Estimator

transform(X)

Runs the preprocessing transform function and returns the result

Parameters: X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed
Returns: X – The transformed Dataframe.
Return type: Dataframe or list-like
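The fit/transform contract used by the two classes above (fit returns the fitted object, transform returns the transformed data) can be illustrated with a toy transformer. MeanCenterer is a hypothetical class written for this sketch and works on plain lists of rows; it is not part of the package.

```python
class MeanCenterer:
    """Toy transformer: subtracts the column mean learned during fit."""

    def fit(self, X):
        # X is a list of rows; learn one mean per column.
        n_rows = len(X)
        self.means_ = [sum(column) / n_rows for column in zip(*X)]
        return self  # returning self allows fit(X).transform(X) chaining

    def transform(self, X):
        # Build a new list of rows; the input is left unchanged.
        return [
            [value - mean for value, mean in zip(row, self.means_)]
            for row in X
        ]


rows = [[1.0, 10.0], [3.0, 30.0]]
centered = MeanCenterer().fit(rows).transform(rows)
```

Because fit returns self, the two calls chain in one expression, which is the same pattern a pipeline of transformers relies on.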

class ads.automl.provider.AutoMLProvider

Bases: ABC

Abstract Base Class defining the structure of an AutoML solution. The solution needs to implement train() and get_transformer_pipeline().

property est

Returns the estimator.

The estimator can be a standard sklearn estimator or any object that implements methods from (BaseEstimator, RegressorMixin) for regression or (BaseEstimator, ClassifierMixin) for classification.

Returns: est
Return type: An instance of estimator

abstract get_transformer_pipeline()

Returns a list of transformers representing the transformations done on data before model prediction.

This method is optional to implement, and is used only for visualizing transformations on data using ADSModel#visualize_transforms().

Returns: transformers_list
Return type: list of transformers implementing fit and transform

setup(X_train, y_train, ml_task_type, X_valid=None, y_valid=None, class_names=None, client=None)

Setup arguments to the AutoML instance.

Parameters

X_train (DataFrame) – Training features
y_train (DataFrame) – Training labels
ml_task_type (One of ml_task_type.{REGRESSION, BINARY_CLASSIFICATION, MULTI_CLASS_CLASSIFICATION, BINARY_TEXT_CLASSIFICATION, MULTI_CLASS_TEXT_CLASSIFICATION}) –
X_valid (DataFrame) – Validation features
y_valid (DataFrame) – Validation labels
class_names (list) – Unique values in y_train
client (object) – Dask client instance for distributed execution

abstract train(**kwargs)

Calls fit on estimator.

This method is expected to set the ‘est’ property.

Parameters

kwargs (dict, optional) – kwargs to decide the estimator and arguments for the fit method
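The shape of a provider, with abstract train() and get_transformer_pipeline() methods and an est property set by train(), can be sketched with the standard abc module. ProviderSketch and MeanProvider are hypothetical stand-ins that only mirror the documented interface; they do not subclass the real AutoMLProvider, and the simplified setup() signature is an assumption.

```python
from abc import ABC, abstractmethod


class ProviderSketch(ABC):
    """Stand-in mirroring the documented provider shape (not the real class)."""

    def __init__(self):
        self._est = None

    @property
    def est(self):
        return self._est

    def setup(self, X_train, y_train):
        # Simplified: the documented setup() takes more arguments.
        self.X_train, self.y_train = X_train, y_train

    @abstractmethod
    def train(self, **kwargs):
        """Expected to fit an estimator and set the 'est' property."""

    @abstractmethod
    def get_transformer_pipeline(self):
        """Expected to return a list of objects implementing fit and transform."""


class MeanProvider(ProviderSketch):
    """Hypothetical provider whose 'estimator' is just the mean label."""

    def train(self, **kwargs):
        self._est = sum(self.y_train) / len(self.y_train)
        return self

    def get_transformer_pipeline(self):
        return []  # no preprocessing in this toy example


provider = MeanProvider()
provider.setup(X_train=[[1], [2], [3]], y_train=[2.0, 4.0, 6.0])
provider.train()
```

Instantiating ProviderSketch directly raises TypeError, which is how an abstract base class enforces that subclasses supply both methods.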

class ads.automl.provider.BaselineAutoMLProvider(est)

Bases: AutoMLProvider

Generates a baseline model using the Zero Rule algorithm by default. For a classification predictive modeling problem where a categorical value is predicted, the Zero Rule algorithm predicts the class value that has the most observations in the training dataset.

Parameters: est (BaselineModel) – An estimator that supports the fit/predict/predict_proba interface. By default, DummyClassifier/DummyRegressor are used as estimators
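To make the Zero Rule idea concrete, here is a minimal pure-Python sketch for classification. ZeroRuleClassifier is a hypothetical name used only for illustration; per the description above, the provider itself defaults to DummyClassifier/DummyRegressor.

```python
from collections import Counter


class ZeroRuleClassifier:
    """Always predicts the most frequent label seen during fit."""

    def fit(self, X, y):
        # The features are ignored; only the label distribution matters.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One prediction per input row, always the majority label.
        return [self.majority_ for _ in X]


X_train = [[0], [1], [2], [3]]
y_train = ["spam", "ham", "spam", "spam"]
model = ZeroRuleClassifier().fit(X_train, y_train)
predictions = model.predict([[10], [11]])
```

Any trained model should beat this baseline; if it does not, the features are probably carrying little signal for the task.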

decide_estimator(**kwargs)

Decides which type of BaselineModel to generate.

Returns: Model – A baseline model generated for the particular ML task being performed
Return type: BaselineModel

get_transformer_pipeline()

Returns a list of transformers representing the transformations done on data before model prediction.

This method is used only for visualizing transformations on data using ADSModel#visualize_transforms().

Returns: transformers_list
Return type: list of transformers implementing fit and transform

train(**kwargs)

Calls fit on estimator.

This method is expected to set the ‘est’ property.

Parameters

kwargs (dict, optional) – kwargs to decide the estimator and arguments for the fit method

class ads.automl.provider.BaselineModel(est)

Bases: object

A BaselineModel object that supports fit/predict/predict_proba/transform interface. Labels (y) are encoded using DataFrameLabelEncoder.

fit(X, y)

Fits the baseline estimator.

Parameters

X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
y (Dataframe, Series, or list-like) – A Dataframe, series, or list-like object holding the labels

Returns

The fitted estimator

Return type

estimator

predict(X)

Runs the baseline's predict function and returns the result.

Parameters: X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
Returns: A list of predictions performed on the input data.
Return type: List

predict_proba(X)

Runs the baseline's predict_proba function and returns the result.

Parameters: X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
Returns: A list of probabilities of being part of a class
Return type: List

transform(X)

Runs the baseline's transform function and returns the result.

Parameters: X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed
Returns: The transformed Dataframe. Currently, no transformation is performed by the default Baseline Estimator.
Return type: Dataframe or list-like

class ads.automl.provider.OracleAutoMLProvider(n_jobs=-1, loglevel=None, logger_override=None, model_n_jobs: int = 1)

Bases: AutoMLProvider, ABC

The Oracle AutoML Provider automatically provides a tuned ML pipeline that best models the given training dataset and prediction task.

Parameters

n_jobs (int) – Specifies the degree of parallelism for Oracle AutoML. -1 (default) means that AutoML will use all available cores.
loglevel (int) – The verbosity of output for Oracle AutoML. Can be specified using the Python logging module (https://docs.python.org/3/library/logging.html#logging-levels).
model_n_jobs (int, optional; defaults to 1) – Specifies the model parallelism used by AutoML. This will be passed to the underlying model it is training.
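The constructor arguments above take plain integers, which the short sketch below illustrates. resolve_n_jobs is a hypothetical helper showing one way the -1 convention could map to a core count; how Oracle AutoML resolves it internally is not stated here. The constructor call is left as a comment because it needs the ads package.

```python
import logging
import os


def resolve_n_jobs(n_jobs):
    """Hypothetical helper: map -1 to the number of available cores."""
    if n_jobs == -1:
        # os.cpu_count() can return None, so fall back to a single core.
        return os.cpu_count() or 1
    return n_jobs


# Logging levels are plain integers, so they can be passed as loglevel.
quiet_level = logging.ERROR    # 40
verbose_level = logging.DEBUG  # 10

# Not executed here because it needs the ads package:
# provider = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
```

Passing logging.ERROR keeps the output short, while logging.DEBUG gives the most detailed output.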

get_transformer_pipeline()

Returns a list of transformers representing the transformations done on data before model prediction.

This method is used only for visualizing transformations on data using ADSModel#visualize_transforms().

Returns: transformers_list
Return type: list of transformers implementing fit and transform

print_summary(max_rows=None, sort_column='Mean Validation Score', ranking_table_only=False)

Prints a summary of the Oracle AutoML Pipeline in the last train() call.

Parameters

max_rows (int) – Number of trials to print. Pass in None to print all trials
sort_column (string) – Column to sort results by. Must be one of [‘Algorithm’, ‘#Samples’, ‘#Features’, ‘Mean Validation Score’, ‘Hyperparameters’, ‘All Validation Scores’, ‘CPU Time’]
ranking_table_only (bool) – Table to be displayed. Pass in False to display the complete table. Pass in True to display the ranking table only.

print_trials(max_rows=None, sort_column='Mean Validation Score')

Prints all trials executed by the Oracle AutoML Pipeline in the last train() call.

Parameters

max_rows (int) – Number of trials to print. Pass in None to print all trials
sort_column (string) – Column to sort results by. Must be one of [‘Algorithm’, ‘#Samples’, ‘#Features’, ‘Mean Validation Score’, ‘Hyperparameters’, ‘All Validation Scores’, ‘CPU Time’]

selected_model_name(): Returns the name of the model selected by AutoML.

selected_score_label(): Returns the name of the score_metric used in train().

train(**kwargs)

Train the Oracle AutoML Pipeline. This looks at the training data, and identifies the best set of features, the best algorithm and the best set of hyperparameters for this data. A model is then generated, trained on this data and returned.

Parameters

score_metric (str, callable) – Score function (or loss function) with signature score_func(y, y_pred, **kwargs), or a string naming one of the predefined scorers listed at https://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values
random_state (int) – Random seed used by AutoML
model_list (list of str) – Models that will be evaluated by the Pipeline. Supported models:
- Classification: AdaBoostClassifier, DecisionTreeClassifier, ExtraTreesClassifier, KNeighborsClassifier, LGBMClassifier, LinearSVC, LogisticRegression, RandomForestClassifier, SVC, XGBClassifier
- Regression: AdaBoostRegressor, DecisionTreeRegressor, ExtraTreesRegressor, KNeighborsRegressor, LGBMRegressor, LinearSVR, LinearRegression, RandomForestRegressor, SVR, XGBRegressor
time_budget (float, optional) – Time budget in seconds where 0 means no time budget constraint (best effort)
min_features (int, float, list, optional (default: 1)) – Minimum number of features to keep. Acceptable values:
- If int, 0 < min_features <= n_features
- If float, 0 < min_features <= 1.0
- If list, names of features to keep, for example ['a', 'b'] means keep features 'a' and 'b'
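A callable score_metric only has to follow the score_func(y, y_pred, **kwargs) signature given above. The sketch below defines a simple accuracy function (accuracy_score_func is a name chosen for this example) and collects keyword arguments using only the documented parameter names; the specific values are illustrative assumptions. The train() call is left as a comment because it needs the ads package and a dataset.

```python
def accuracy_score_func(y, y_pred, **kwargs):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for truth, guess in zip(y, y_pred) if truth == guess)
    return matches / len(y)


# Keyword arguments built from the documented parameter names.
train_kwargs = {
    "score_metric": accuracy_score_func,
    "random_state": 42,
    "model_list": ["LogisticRegression", "RandomForestClassifier"],
    "time_budget": 60,    # seconds; 0 would mean no time budget constraint
    "min_features": 0.5,  # float form: 0 < min_features <= 1.0
}

# Not executed here because it needs the ads package and a dataset:
# model, baseline = AutoML(train, provider=OracleAutoMLProvider()).train(**train_kwargs)

score = accuracy_score_func([1, 0, 1, 1], [1, 1, 1, 0])
```

In this toy call two of the four predictions match, so score is 0.5.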

Returns

self

Return type

object

visualize_adaptive_sampling_trials(): Visualize the trials for Adaptive Sampling.

visualize_algorithm_selection_trials(ylabel=None)

Plot the scores predicted by Algorithm Selection for each algorithm. The horizontal line shows the average score across all algorithms. Algorithms below the line are colored turquoise, whereas those with a score higher than the mean are colored teal. The orange bar shows the algorithm with the highest predicted score. The error bar is +/- one standard error.

Parameters: ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.
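The quantities behind such a plot (the mean line, the bar colors, and the error bars) can be worked through by hand. The scores below are made up purely for illustration, and computing the standard error from per-fold scores is an assumption of this sketch rather than a description of how the library derives its error bars.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up per-algorithm scores, for illustration only.
cv_scores = {
    "LogisticRegression": [0.80, 0.82, 0.84],
    "RandomForestClassifier": [0.88, 0.90, 0.92],
    "SVC": [0.70, 0.75, 0.80],
    "LGBMClassifier": [0.85, 0.86, 0.87],
}

mean_scores = {name: mean(values) for name, values in cv_scores.items()}
overall_mean = mean(mean_scores.values())  # height of the horizontal line
best = max(mean_scores, key=mean_scores.get)


def bar_color(name):
    """Orange for the best algorithm, teal above the mean, turquoise below."""
    if name == best:
        return "orange"
    return "teal" if mean_scores[name] > overall_mean else "turquoise"


def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return stdev(values) / sqrt(len(values))


colors = {name: bar_color(name) for name in cv_scores}
errors = {name: standard_error(values) for name, values in cv_scores.items()}
```

With these numbers the overall mean is about 0.83, so RandomForestClassifier is orange, LGBMClassifier is teal, and the other two are turquoise.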

visualize_feature_selection_trials(ylabel=None)

Visualize the feature selection trials taken to arrive at the optimal set of features. The orange line shows the optimal number of features chosen by Feature Selection.

Parameters: ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.

visualize_tuning_trials(ylabel=None)

Visualize (plot) the hyperparameter tuning trials taken to arrive at the optimal hyperparameters. Each trial in the plot represents a particular hyperparameter combination.

Parameters: ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.
