ads.automl package
Submodules
ads.automl.driver module
- class ads.automl.driver.AutoML(training_data, validation_data=None, provider=None, baseline='dummy', client=None)
Bases:
object
Creates an automatic machine learning (AutoML) object.
- Parameters
training_data (ADSData instance) –
validation_data (ADSData instance) –
provider (None or object of ads.automl.provider.AutoMLProvider) – If None, the default OracleAutoMLProvider will be used to generate the model
baseline (None, "dummy", or object of ads.common.model.ADSModel (Default is "dummy")) –
If None, then no baseline is created.
If “dummy”, then DummyClassifier or DummyRegressor is used.
If an object, then the provided estimator is used.
This estimator must include a preprocessing step in its pipeline that handles categorical data.
client – Dask Client to use (optional)
Examples
>>> train, test = ds.train_test_split()
>>> olabs_automl = OracleAutoMLProvider()
>>> model, baseline = AutoML(train, provider=olabs_automl).train()
- train(**kwargs)
Returns a fitted automl model and a fitted baseline model.
- Parameters
kwargs (dict, optional) – kwargs passed to provider’s train method
- Returns
model (object of ads.common.model.ADSModel) – the trained automl model
baseline (object of ads.common.model.ADSModel) – the baseline model to compare
Examples
>>> train, test = ds.train_test_split()
>>> olabs_automl = OracleAutoMLProvider()
>>> model, baseline = AutoML(train, provider=olabs_automl).train()
- ads.automl.driver.get_ml_task_type(X, y, classes)
Gets the ML task type and returns it.
- Parameters
X (Dataframe) – The training dataframe
y (Dataframe) – The testing dataframe
classes (List) – A list of classes
- Returns
A particular task type like REGRESSION, MULTI_CLASS_CLASSIFICATION…
- Return type
ml_task_type
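The rule that maps the inputs to a task type is not spelled out here. Purely as an illustration, and not the ADS implementation, a rule keyed on the number of classes might look like the sketch below. The function name `guess_task_type`, the `is_text` flag, and the convention that an empty class list means regression are all assumptions; only the task-type names come from this page.

```python
def guess_task_type(classes, is_text=False):
    """Hypothetical rule: pick a task-type name from the number of classes."""
    # Assumption: no class labels means the target is continuous.
    if not classes:
        return "REGRESSION"
    prefix = "BINARY" if len(classes) == 2 else "MULTI_CLASS"
    middle = "_TEXT" if is_text else ""
    return f"{prefix}{middle}_CLASSIFICATION"


binary_task = guess_task_type([0, 1])
multi_task = guess_task_type(["a", "b", "c"])
text_task = guess_task_type(["pos", "neg"], is_text=True)
regression_task = guess_task_type([])
```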
ads.automl.provider module
- class ads.automl.provider.AutoMLFeatureSelection(msg)
Bases:
object
- fit(X)
Fits the feature selection estimator
- Parameters
X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
- Returns
Self – The fitted estimator
- Return type
Estimator
- transform(X)
Runs the feature selection transform function and returns the result
- Parameters
X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed
- Returns
X – The transformed Dataframe.
- Return type
Dataframe or list-like
- class ads.automl.provider.AutoMLPreprocessingTransformer(msg)
Bases:
object
- fit(X)
Fits the preprocessing Transformer
- Parameters
X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
- Returns
Self – The fitted estimator
- Return type
Estimator
- transform(X)
Runs the preprocessing transform function and returns the result
- Parameters
X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed
- Returns
X – The transformed Dataframe.
- Return type
Dataframe or list-like
- class ads.automl.provider.AutoMLProvider
Bases:
ABC
Abstract Base Class defining the structure of an AutoML solution. The solution needs to implement train() and get_transformer_pipeline().
- property est
Returns the estimator.
The estimator can be a standard sklearn estimator or any object that implements the methods of (BaseEstimator, RegressorMixin) for regression or (BaseEstimator, ClassifierMixin) for classification.
- Returns
est
- Return type
An instance of estimator
- abstract get_transformer_pipeline()
Returns a list of transformers representing the transformations done on data before model prediction.
This method is optional to implement, and is used only for visualizing transformations on data using ADSModel#visualize_transforms().
- Returns
transformers_list
- Return type
list of transformers implementing fit and transform
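A list of transformers that each implement fit and transform can be applied in sequence, with each step consuming the previous step's output. A minimal sketch follows, using hypothetical toy transformers (`FillMissing`, `Scale`) and a hypothetical helper `run_pipeline`; none of these are ADS classes.

```python
class FillMissing:
    """Replace None with the mean of the observed values."""

    def fit(self, X):
        observed = [x for x in X if x is not None]
        self.fill_ = sum(observed) / len(observed)
        return self

    def transform(self, X):
        return [self.fill_ if x is None else x for x in X]


class Scale:
    """Divide by the maximum absolute value seen during fit."""

    def fit(self, X):
        # Fall back to 1.0 so an all-zero column is not divided by zero.
        self.scale_ = max(abs(x) for x in X) or 1.0
        return self

    def transform(self, X):
        return [x / self.scale_ for x in X]


def run_pipeline(transformers, X):
    """Fit each transformer on the current data, then pass its output on."""
    for step in transformers:
        X = step.fit(X).transform(X)
    return X


transformed = run_pipeline([FillMissing(), Scale()], [2.0, None, 4.0])
```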
- setup(X_train, y_train, ml_task_type, X_valid=None, y_valid=None, class_names=None, client=None)
Setup arguments to the AutoML instance.
- Parameters
X_train (DataFrame) – Training features
y_train (DataFrame) – Training labels
ml_task_type (One of ml_task_type.{REGRESSION, BINARY_CLASSIFICATION, MULTI_CLASS_CLASSIFICATION, BINARY_TEXT_CLASSIFICATION, MULTI_CLASS_TEXT_CLASSIFICATION}) –
X_valid (DataFrame) – Validation features
y_valid (DataFrame) – Validation labels
class_names (list) – Unique values in y_train
client (object) – Dask client instance for distributed execution
- abstract train(**kwargs)
Calls fit on estimator.
This method is expected to set the ‘est’ property.
- Parameters
kwargs (dict, optional) – kwargs to decide the estimator and arguments for the fit method
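The contract described for this class (implement train() and get_transformer_pipeline(), and have train() set est) can be sketched with a stand-in abstract class. The sketch does not import ADS: `ProviderSketch`, `MeanProvider`, and `_MeanEstimator` are hypothetical names that only mirror the shape described here, and setup() is simplified to two arguments.

```python
from abc import ABC, abstractmethod


class ProviderSketch(ABC):
    """Stand-in for the provider contract; not the ADS class itself."""

    def __init__(self):
        self._est = None
        self.X_train = None
        self.y_train = None

    def setup(self, X_train, y_train):
        # Simplified: the documented setup() accepts more arguments.
        self.X_train, self.y_train = X_train, y_train

    @property
    def est(self):
        return self._est

    @abstractmethod
    def train(self, **kwargs):
        """Fit an estimator and store it so that `est` returns it."""

    @abstractmethod
    def get_transformer_pipeline(self):
        """Return a list of objects implementing fit and transform."""


class _MeanEstimator:
    """Toy estimator that predicts the mean of the training labels."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MeanProvider(ProviderSketch):
    def train(self, **kwargs):
        # Setting _est here is what makes the `est` property usable.
        self._est = _MeanEstimator().fit(self.X_train, self.y_train)
        return self

    def get_transformer_pipeline(self):
        return []  # no preprocessing in this toy example


provider = MeanProvider()
provider.setup([[1], [2], [3]], [10.0, 20.0, 30.0])
provider.train()
predictions = provider.est.predict([[4], [5]])
```

Because both methods are declared abstract, instantiating `ProviderSketch` directly raises a TypeError; only subclasses that define both can be created.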
- class ads.automl.provider.BaselineAutoMLProvider(est)
Bases:
AutoMLProvider
Generates a baseline model using the Zero Rule algorithm by default. For a classification predictive modeling problem where a categorical value is predicted, the Zero Rule algorithm predicts the class value that has the most observations in the training dataset.
- Parameters
est (BaselineModel) – An estimator that supports the fit/predict/predict_proba interface. By default, DummyClassifier/DummyRegressor are used as estimators
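The Zero Rule idea can be sketched in a few lines of plain Python. This is an illustration only, not the code ADS runs: the function names `zero_rule_classifier` and `zero_rule_regressor` are made up here, and predicting the training mean for regression is an assumption (a common Zero Rule convention; the description above covers only the classification case).

```python
from collections import Counter
from statistics import mean


def zero_rule_classifier(y_train):
    """Return a predictor that always outputs the most frequent training label."""
    # most_common(1) yields [(label, count)] for the majority class.
    majority_label = Counter(y_train).most_common(1)[0][0]
    return lambda X: [majority_label for _ in X]


def zero_rule_regressor(y_train):
    """Return a predictor that always outputs the training mean (assumed convention)."""
    constant = mean(y_train)
    return lambda X: [constant for _ in X]


predict_class = zero_rule_classifier(["spam", "ham", "ham", "ham", "spam"])
predict_value = zero_rule_regressor([1.0, 2.0, 3.0])

# The features are ignored entirely; only the number of rows matters.
class_predictions = predict_class([[0.1], [0.7], [0.3]])
value_predictions = predict_value([[0.1], [0.7]])
```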
- decide_estimator(**kwargs)
Decides which type of BaselineModel to generate.
- Returns
Model – A baseline model generated for the particular ML task being performed
- Return type
- get_transformer_pipeline()
Returns a list of transformers representing the transformations done on data before model prediction.
This method is used only for visualizing transformations on data using ADSModel#visualize_transforms().
- Returns
transformers_list
- Return type
list of transformers implementing fit and transform
- train(**kwargs)
Calls fit on estimator.
This method is expected to set the ‘est’ property.
- Parameters
kwargs (dict, optional) – kwargs to decide the estimator and arguments for the fit method
- class ads.automl.provider.BaselineModel(est)
Bases:
object
A BaselineModel object that supports fit/predict/predict_proba/transform interface. Labels (y) are encoded using DataFrameLabelEncoder.
- fit(X, y)
Fits the baseline estimator.
- Parameters
X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
y (Dataframe, Series, or list-like) – A Dataframe, Series, or list-like object holding the labels
- Returns
The fitted estimator
- Return type
estimator
- predict(X)
Runs the baseline's predict function and returns the result.
- Parameters
X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
- Returns
A list of predictions performed on the input data.
- Return type
List
- predict_proba(X)
Runs the baseline's predict_proba function and returns the result.
- Parameters
X (Dataframe or list-like) – A Dataframe or list-like object holding data to be predicted on
- Returns
A list of probabilities of being part of a class
- Return type
List
- transform(X)
Runs the baseline's transform function and returns the result.
- Parameters
X (Dataframe or list-like) – A Dataframe or list-like object holding data to be transformed
- Returns
The transformed Dataframe. Currently, no transformation is performed by the default Baseline Estimator.
- Return type
Dataframe or list-like
- class ads.automl.provider.OracleAutoMLProvider(n_jobs=-1, loglevel=None, logger_override=None, model_n_jobs: int = 1)
Bases:
AutoMLProvider, ABC
The Oracle AutoML Provider automatically provides a tuned ML pipeline that best models the given training dataset and prediction task.
- Parameters
n_jobs (int) – Specifies the degree of parallelism for Oracle AutoML. -1 (default) means that AutoML will use all available cores.
loglevel (int) – The verbosity of output for Oracle AutoML. Can be specified using the Python logging module (https://docs.python.org/3/library/logging.html#logging-levels).
model_n_jobs (int, optional, defaults to 1) – Specifies the model parallelism used by AutoML. This will be passed to the underlying model it is training.
- get_transformer_pipeline()
Returns a list of transformers representing the transformations done on data before model prediction.
This method is used only for visualizing transformations on data using ADSModel#visualize_transforms().
- Returns
transformers_list
- Return type
list of transformers implementing fit and transform
- print_summary(max_rows=None, sort_column='Mean Validation Score', ranking_table_only=False)
Prints a summary of the Oracle AutoML Pipeline in the last train() call.
- Parameters
max_rows (int) – Number of trials to print. Pass in None to print all trials
sort_column (string) – Column to sort results by. Must be one of [‘Algorithm’, ‘#Samples’, ‘#Features’, ‘Mean Validation Score’, ‘Hyperparameters’, ‘All Validation Scores’, ‘CPU Time’]
ranking_table_only (bool) – Table to be displayed. Pass in False to display the complete table. Pass in True to display the ranking table only.
- print_trials(max_rows=None, sort_column='Mean Validation Score')
Prints all trials executed by the Oracle AutoML Pipeline in the last train() call.
- Parameters
max_rows (int) – Number of trials to print. Pass in None to print all trials
sort_column (string) – Column to sort results by. Must be one of [‘Algorithm’, ‘#Samples’, ‘#Features’, ‘Mean Validation Score’, ‘Hyperparameters’, ‘All Validation Scores’, ‘CPU Time’]
- selected_model_name()
Returns the name of the model selected by AutoML.
- selected_score_label()
Returns the name of the score_metric used in train().
- train(**kwargs)
Train the Oracle AutoML Pipeline. This looks at the training data, and identifies the best set of features, the best algorithm and the best set of hyperparameters for this data. A model is then generated, trained on this data and returned.
- Parameters
score_metric (str, callable) – Score function (or loss function) with signature score_func(y, y_pred, **kwargs), or a string specified as in https://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values
random_state (int) – Random seed used by AutoML
model_list (list of str) – Models that will be evaluated by the Pipeline. Supported models:
Classification: AdaBoostClassifier, DecisionTreeClassifier, ExtraTreesClassifier, KNeighborsClassifier, LGBMClassifier, LinearSVC, LogisticRegression, RandomForestClassifier, SVC, XGBClassifier
Regression: AdaBoostRegressor, DecisionTreeRegressor, ExtraTreesRegressor, KNeighborsRegressor, LGBMRegressor, LinearSVR, LinearRegression, RandomForestRegressor, SVR, XGBRegressor
time_budget (float, optional) – Time budget in seconds where 0 means no time budget constraint (best effort)
min_features (int, float, list, optional (default: 1)) – Minimum number of features to keep. Acceptable values:
If int, 0 < min_features <= n_features
If float, 0 < min_features <= 1.0
If list, names of features to keep; for example, [‘a’, ‘b’] means keep features ‘a’ and ‘b’
- Returns
self
- Return type
object
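As a sketch of the callable form of score_metric, the function below follows the documented signature score_func(y, y_pred, **kwargs). The function name `mean_absolute_error_score` and the concrete values in `train_kwargs` are illustrative assumptions. This section does not state whether a custom callable is treated as a score (higher is better) or a loss, so confirm that before relying on it.

```python
def mean_absolute_error_score(y, y_pred, **kwargs):
    """Mean absolute error between true and predicted values."""
    if len(y) != len(y_pred):
        raise ValueError("y and y_pred must have the same length")
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)


# Hypothetical keyword arguments assembled from the parameters listed above.
train_kwargs = {
    "score_metric": mean_absolute_error_score,
    "random_state": 42,
    "model_list": ["LinearRegression", "RandomForestRegressor"],
    "time_budget": 60,    # seconds; 0 would mean no time budget constraint
    "min_features": 0.5,  # float form: 0 < min_features <= 1.0
}

# Call the metric directly to check that it behaves as expected.
score = train_kwargs["score_metric"]([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])
```

A dictionary like this would be unpacked into the provider's train(**train_kwargs) call, or into AutoML(...).train(**train_kwargs), which passes its kwargs on to the provider's train method.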
- visualize_adaptive_sampling_trials()
Visualize the trials for Adaptive Sampling.
- visualize_algorithm_selection_trials(ylabel=None)
Plot the scores predicted by Algorithm Selection for each algorithm. The horizontal line shows the average score across all algorithms. Algorithms below the line are colored turquoise, whereas those with a score higher than the mean are colored teal. The orange bar shows the algorithm with the highest predicted score. The error bar is +/- one standard error.
- Parameters
ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.
- visualize_feature_selection_trials(ylabel=None)
Visualize the feature selection trials taken to arrive at the optimal set of features. The orange line shows the optimal number of features chosen by Feature Selection.
- Parameters
ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.
- visualize_tuning_trials(ylabel=None)
Visualize (plot) the hyperparameter tuning trials taken to arrive at the optimal hyperparameters. Each trial in the plot represents a particular hyperparameter combination.
- Parameters
ylabel (str, optional) – Label for the y-axis. Defaults to the scoring metric.