ads.evaluations package

Submodules

ads.evaluations.evaluation_plot module

class ads.evaluations.evaluation_plot.EvaluationPlot

Bases: object

EvaluationPlot holds the data and methods used to create and output evaluation plots

baseline(bool)

whether to plot the null model or zero information model

baseline_kwargs(dict)

keyword arguments for the baseline plot

color_wheel(dict)

color information used by the plot

font_sz(dict)

dictionary of font sizes used in the plot

perfect(bool)

determines whether a “perfect” classifier curve is displayed

perfect_kwargs(dict)

parameters for the perfect classifier for precision/recall curves

prob_type(str)

model type, i.e. classification or regression

get_legend_labels(legend_labels)

Renders the legend labels on the plot

plot(evaluation, plots, num_classes, perfect, baseline, legend_labels)

Generates the evaluation plot

baseline = None
baseline_kwargs = {'c': '.2', 'ls': '--'}
color_wheel = ['teal', 'blueviolet', 'forestgreen', 'peru', 'y', 'dodgerblue', 'r']
double_overlay_plots = ['pr_and_roc_curve', 'lift_and_gain_chart']
font_sz = {'l': 14, 'm': 12, 's': 10, 'xl': 16, 'xs': 8}
classmethod get_legend_labels(legend_labels)

Gets the legend labels, resolves any conflicts such as length, and renders the labels for the plot

Parameters:

legend_labels (dict) – key/value dictionary containing legend label data

Return type:

Nothing

Examples

EvaluationPlot.get_legend_labels({'class_0': 'green', 'class_1': 'yellow', 'class_2': 'red'})

perfect = None
perfect_kwargs = {'color': 'gold', 'label': 'Perfect Classifier', 'ls': '--'}
classmethod plot(evaluation, plots, num_classes, perfect=False, baseline=True, legend_labels=None)

Generates the evaluation plot

Parameters:
  • evaluation (DataFrame) – DataFrame with models as columns and metrics as rows.

  • plots (str) – The plot type based on class attribute prob_type.

  • num_classes (int) – The number of classes for the model.

  • perfect (bool, optional) – Whether to display the curve of a perfect classifier. Default value is False.

  • baseline (bool, optional) – Whether to display the curve of the baseline, featureless model. Default value is True.

  • legend_labels (dict, optional) – Legend labels dictionary. Default value is None. If legend_labels is not specified, class names will be used for plots.

Return type:

Nothing
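
A minimal usage sketch (hedged: it assumes evaluation is the metrics DataFrame assembled by an ADSEvaluator for a binary classifier, and that class attributes such as prob_type have already been populated by that evaluator):

>>> EvaluationPlot.plot(evaluation, plots=['roc_curve', 'pr_curve'],
...                     num_classes=2, perfect=True, baseline=True)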

prob_type = None
single_overlay_plots = ['lift_chart', 'gain_chart', 'roc_curve', 'pr_curve']

ads.evaluations.evaluator module

class ads.evaluations.evaluator.ADSEvaluator(test_data, models, training_data=None, positive_class=None, legend_labels=None, show_full_name=False, classes=None, classification_threshold=50)

Bases: object

ADS Evaluator class. This class holds field and methods for creating and using ADS evaluator objects.

evaluations

list of evaluations.

Type:

list[DataFrame]

is_classifier

Whether the dataset looks like a classification problem (versus regression).

Type:

bool

legend_labels

Dictionary of legend labels. Defaults to None.

Type:

dict

metrics_to_show

Names of metrics to show.

Type:

list[str]

models

List of model objects built using ADSModel.from_estimator().

Type:

list[ads.common.model.ADSModel]

positive_class

The class to report metrics for in a binary dataset, treated as the positive class.

Type:

str or int

show_full_name

Whether to show the name of the evaluator in relevant contexts.

Type:

bool

test_data

Test data to evaluate model on.

Type:

ads.common.data.ADSData

training_data

Training data to evaluate the model on.

Type:

ads.common.data.ADSData

Positive_Class_names

Class attribute listing the ways to represent positive classes

Type:

list

add_metrics(func, names)

Adds the listed metrics to the evaluator object it is called on

del_metrics(names)

Removes listed metrics from the evaluator object it is called on

add_models(models, show_full_name)

Adds the listed models to the evaluator object

del_models(names)

Removes the listed models from the evaluator object

show_in_notebook(plots, use_training_data, perfect, baseline, legend_labels)

Visualizes evaluation plots in the notebook

calculate_cost(tn_weight, fp_weight, fn_weight, tp_weight, use_training_data)

Returns a cost associated with the input weights

Creates an ads evaluator object.

Parameters:
  • test_data (ads.common.data.ADSData instance) – Test data to evaluate model on. The object can be built using ADSData.build().

  • models (list[ads.common.model.ADSModel]) – The object can be built using ADSModel.from_estimator(). Maximum length of the list is 3

  • training_data (ads.common.data.ADSData instance, optional) – Training data to evaluate model on and compare metrics against test data. The object can be built using ADSData.build()

  • positive_class (str or int, optional) – The class to report metrics for binary dataset. If the target classes are True or False, positive_class will be set to True by default. If the dataset is multiclass or multilabel, this will be ignored.

  • legend_labels (dict, optional) – Dictionary of legend labels. Defaults to None. If legend_labels is not specified, class names will be used for plots.

  • show_full_name (bool, optional) – Show the name of the evaluator object. Defaults to False.

  • classes (List or None, optional) – A List of the possible labels for y, when evaluating a classification use case

  • classification_threshold (int, defaults to 50) – The maximum number of unique values that y must have to qualify as classification. If this threshold is exceeded, Evaluator assumes the model is regression.

Examples

>>> train, test = ds.train_test_split()
>>> model1 = MyModelClass1.train(train)
>>> model2 = MyModelClass2.train(train)
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> legend_labels={'class_0': 'one', 'class_1': 'two', 'class_2': 'three'}
>>> multi_evaluator = ADSEvaluator(test, models=[model1, model2],
...             legend_labels=legend_labels)
class EvaluationMetrics(ev_test, ev_train, use_training=False, less_is_more=None, precision=4)

Bases: object

Class holding evaluation metrics.

ev_test

evaluation test metrics

Type:

list

ev_train

evaluation training metrics

Type:

list

use_training

use training data

Type:

bool

less_is_more

list of metrics for which a lower value indicates better performance

Type:

list

show_in_notebook()

Visualizes evaluation metrics as a color-coded table

DEFAULT_LABELS_MAP = {'accuracy': 'Accuracy', 'auc': 'ROC AUC', 'f1': 'F1', 'hamming_loss': 'Hamming distance', 'kappa_score_': "Cohen's kappa coefficient", 'precision': 'Precision', 'recall': 'Recall'}
property precision
show_in_notebook(labels={'accuracy': 'Accuracy', 'auc': 'ROC AUC', 'f1': 'F1', 'hamming_loss': 'Hamming distance', 'kappa_score_': "Cohen's kappa coefficient", 'precision': 'Precision', 'recall': 'Recall'})

Visualizes evaluation metrics as a color coded table.

Parameters:

labels (dict) – dictionary mapping metric names to the labels used in the display

Return type:

Nothing
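
A hedged sketch, assuming em is an EvaluationMetrics instance produced by an ADSEvaluator (the labels keys shown come from DEFAULT_LABELS_MAP above):

>>> em.show_in_notebook(labels={'accuracy': 'Accuracy', 'f1': 'F1',
...                             'precision': 'Precision', 'recall': 'Recall'})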

Positive_Class_Names = ['yes', 'y', 't', 'true', '1']
add_metrics(funcs, names)

Adds the listed metrics to the evaluator object it is called on.

Parameters:
  • funcs (list) – The list of metric functions to be added. Each function will be provided y_true and y_pred, the true and predicted values for each model.

  • names (list[str]) – The list of metric names corresponding to the functions.

Return type:

Nothing

Examples

>>> def f1(y_true, y_pred):
...    return np.max(y_true - y_pred)
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.add_metrics([f1], ['Max Residual'])
>>> evaluator.metrics
Output table will include the desired metric
add_models(models, show_full_name=False)

Adds the listed models to the evaluator object it is called on.

Parameters:
  • models (list[ADSModel]) – The list of models to be added

  • show_full_name (bool, optional) – Whether to show the full model name. Defaults to False. ** NOT USED **

Return type:

Nothing

Examples

>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.add_models([model3])
calculate_cost(tn_weight, fp_weight, fn_weight, tp_weight, use_training_data=False)

Returns a cost associated with the input weights.

Parameters:
  • tn_weight (int, float) – The weight to assign true negatives in calculating the cost

  • fp_weight (int, float) – The weight to assign false positives in calculating the cost

  • fn_weight (int, float) – The weight to assign false negatives in calculating the cost

  • tp_weight (int, float) – The weight to assign true positives in calculating the cost

  • use_training_data (bool, optional) – Use training data to pull the metrics. Defaults to False

Returns:

DataFrame with the cost calculated for each model

Return type:

pandas.DataFrame

Examples

>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> costs_table = evaluator.calculate_cost(0, 10, 1000, 0)
del_metrics(names)

Removes the listed metrics from the evaluator object it is called on.

Parameters:

names (list[str]) – The list of names of metrics to be deleted. Names can be found by calling evaluator.test_evaluations.index.

Returns:

None

Return type:

None

Examples

>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.del_metrics(['mse'])
>>> evaluator.metrics
Output table will exclude the desired metric
del_models(names)

Removes the listed models from the evaluator object it is called on.

Parameters:

names (list[str]) – The list of model names to be deleted. Names are the model names by default, and assigned internally when conflicts exist. Actual names can be found using evaluator.test_evaluations.columns

Return type:

Nothing

Examples

>>> model3.rename("model3")
>>> evaluator = ADSEvaluator(test, [model1, model2, model3])
>>> evaluator.del_models(['model3'])
property metrics

Returns evaluation metrics

Returns:

HTML representation of a table comparing relevant metrics.

Return type:

metrics

Examples

>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.metrics
Outputs table displaying metrics.
property raw_metrics

Returns the raw metric numbers

Parameters:
  • metrics (list, optional) – Request metrics to pull. Defaults to all.

  • use_training_data (bool, optional) – Use training data to pull metrics. Defaults to False

Returns:

The requested raw metrics for each model. If metrics is None return all.

Return type:

dict

Examples

>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> raw_metrics_dictionary = evaluator.raw_metrics()
show_in_notebook(plots=None, use_training_data=False, perfect=False, baseline=True, legend_labels=None)

Visualize evaluation plots.

Parameters:
  • plots (list, optional) –

    Filter the plots that are displayed. Defaults to None. The name of the plots are as below:

    • regression - residuals_qq, residuals_vs_fitted

    • binary classification - normalized_confusion_matrix, roc_curve, pr_curve

    • multi class classification - normalized_confusion_matrix, precision_by_label, recall_by_label, f1_by_label

  • use_training_data (bool, optional) – Use training data to generate plots. Defaults to False. By default, this method uses test data to generate plots

  • legend_labels (dict, optional) – Rename legend labels used for multi-class classification plots. Defaults to None. The legend_labels dict keys are class names and the values are the display strings. If legend_labels is not specified, class names will be used for plots.

Returns:

Nothing. Outputs several evaluation plots as specified by plots.

Return type:

None

Examples

>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.show_in_notebook()
>>> legend_labels={'class_0': 'green', 'class_1': 'yellow', 'class_2': 'red'}
>>> multi_evaluator = ADSEvaluator(test, [model1, model2],
...             legend_labels=legend_labels)
>>> multi_evaluator.show_in_notebook(plots=["normalized_confusion_matrix",
...             "precision_by_label", "recall_by_label", "f1_by_label"])
class ads.evaluations.evaluator.Evaluator(models: List[GenericModel], X: ArrayLike, y: ArrayLike, y_preds: Optional[List[ArrayLike]] = None, y_scores: Optional[List[ArrayLike]] = None, X_train: Optional[ArrayLike] = None, y_train: Optional[ArrayLike] = None, classes: Optional[List] = None, positive_class: Optional[str] = None, legend_labels: Optional[dict] = None, use_case_type: Optional[UseCaseType] = None)

Bases: object

BETA FEATURE: Evaluator is the new and preferred way to evaluate a model or list of models. It contains a superset of the features of the soon-to-be-deprecated ADSEvaluator.

display()

Shows all plots and metrics within the jupyter notebook.

html()

Returns the raw string of the html report

save(filename)

Saves the html report to the provided file location.

add_model(model)

Adds a model to the existing report. See documentation for more details.

add_metric(metric_fn)

Adds a metric to the existing report. See documentation for more details.

add_plot(plotting_fn)

Adds a plot to the existing report. See documentation for more details.

Creates an ads evaluator object.

Parameters:
  • models (List[ads.model.GenericModel]) – The models to evaluate. The objects can be built using one of the frameworks supported in ads.model.framework

  • X (DataFrame-like) – The data used to make a prediction. Can be set to None if y_preds is given. (And y_scores for more thorough analysis).

  • y (array-like) – The true values corresponding to the input data

  • y_preds (list of array-like, optional) – The predictions from each model in the same order as the models

  • y_scores (list of array-like, optional) – The predict_probas from each model in the same order as the models

  • X_train (DataFrame-like, optional) – The data used to train the model

  • y_train (array-like, optional) – The true values corresponding to the input training data

  • positive_class (str or int, optional) – The class to report metrics for binary dataset. If the target classes are True or False, positive_class will be set to True by default. If the dataset is multiclass or multilabel, this will be ignored.

  • legend_labels (dict, optional) – Dictionary of legend labels. Defaults to None. If legend_labels is not specified, class names will be used for plots.

  • classes (List or None, optional) – A List of the possible labels for y, when evaluating a classification use case

  • use_case_type (str, optional) – The type of problem this model is solving. This can be set during prepare(). Examples: "binary_classification", "regression", "multinomial_classification". The full list of supported types can be found in ads.common.model_metadata.UseCaseType

Examples

>>> import tempfile
>>> from ads.evaluations.evaluator import Evaluator
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from ads.model.framework.sklearn_model import SklearnModel
>>> from ads.common.model_metadata import UseCaseType
>>>
>>> X, y = make_classification(n_samples=1000)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
>>> est = DecisionTreeClassifier().fit(X_train, y_train)
>>> model = SklearnModel(estimator=est, artifact_dir=tempfile.mkdtemp())
>>> model.prepare(
...     inference_conda_env="generalml_p38_cpu_v1",
...     training_conda_env="generalml_p38_cpu_v1",
...     X_sample=X_test,
...     y_sample=y_test,
...     use_case_type=UseCaseType.BINARY_CLASSIFICATION,
... )
>>> report = Evaluator([model], X=X_test, y=y_test)
>>> report.display()
add_models(models: List[GenericModel], y_preds: Optional[List[Any]] = None, y_scores: Optional[List[Any]] = None)

Add a model to an existing Evaluator to avoid re-calculating the values.

Parameters:
  • models (List[ads.model.GenericModel]) – The models to add to the report. The objects can be built using one of the frameworks supported in ads.model.framework

  • y_preds (list of array-like, optional) – The predictions from each model in the same order as the models

  • y_scores (list of array-like, optional) – The predict_probas from each model in the same order as the models

Return type:

self

Examples

>>> evaluator = Evaluator(models = [model1, model2], X=X, y=y)
>>> evaluator.add_models(models = [model3])
display(plots=None, perfect=False, baseline=True, legend_labels=None, precision=4, metrics_labels=None)

Visualize evaluation report.

Parameters:
  • plots (list, optional) –

    Filter the plots that are displayed. Defaults to None. The name of the plots are as below:

    • regression - residuals_qq, residuals_vs_fitted

    • binary classification - normalized_confusion_matrix, roc_curve, pr_curve

    • multi class classification - normalized_confusion_matrix, precision_by_label, recall_by_label, f1_by_label

  • perfect (bool, optional (default False)) – If True, will show how a perfect classifier would perform.

  • baseline (bool, optional (default True)) – If True, will show how a random classifier would perform.

  • legend_labels (dict, optional) – Rename legend labels used for multi-class classification plots. Defaults to None. The legend_labels dict keys are class names and the values are the display strings. If legend_labels is not specified, class names will be used for plots.

  • precision (int, optional (default 4)) – The number of decimal points to show for each score/loss value

  • metrics_labels (List, optional) – The metrics that should be included in the html table.

Returns:

Nothing. Outputs several evaluation plots as specified by plots.

Return type:

None

Examples

>>> evaluator = Evaluator(models=[model1, model2], X=X, y=y)
>>> evaluator.display()
>>> legend_labels={'class_0': 'green', 'class_1': 'yellow', 'class_2': 'red'}
>>> multi_evaluator = Evaluator(models=[model1, model2], X=X, y=y, legend_labels=legend_labels)
>>> multi_evaluator.display(plots=["normalized_confusion_matrix",
...             "precision_by_label", "recall_by_label", "f1_by_label"])
html(plots=None, perfect=False, baseline=True, legend_labels=None, precision=4, metrics_labels=None)

Get raw HTML report.

Parameters:
  • plots (list, optional) –

    Filter the plots that are displayed. Defaults to None. The name of the plots are as below:

    • regression - residuals_qq, residuals_vs_fitted

    • binary classification - normalized_confusion_matrix, roc_curve, pr_curve

    • multi class classification - normalized_confusion_matrix, precision_by_label, recall_by_label, f1_by_label

  • perfect (bool, optional (default False)) – If True, will show how a perfect classifier would perform.

  • baseline (bool, optional (default True)) – If True, will show how a random classifier would perform.

  • legend_labels (dict, optional) – Rename legend labels used for multi-class classification plots. Defaults to None. The legend_labels dict keys are class names and the values are the display strings. If legend_labels is not specified, class names will be used for plots.

  • precision (int, optional (default 4)) – The number of decimal points to show for each score/loss value

  • metrics_labels (List, optional) – The metrics that should be included in the html table.

Returns:

The rendered HTML report as a string.

Return type:

str

Examples

>>> evaluator = Evaluator(models=[model1, model2], X=X, y=y)
>>> raw_html = evaluator.html()
save(filename: str, **kwargs)

Save HTML report.

Parameters:
  • filename (str) – The name and path of where to save the html report.

  • plots (list, optional) –

    Filter the plots that are displayed. Defaults to None. The name of the plots are as below:

    • regression - residuals_qq, residuals_vs_fitted

    • binary classification - normalized_confusion_matrix, roc_curve, pr_curve

    • multi class classification - normalized_confusion_matrix, precision_by_label, recall_by_label, f1_by_label

  • perfect (bool, optional (default False)) – If True, will show how a perfect classifier would perform.

  • baseline (bool, optional (default True)) – If True, will show how a random classifier would perform.

  • legend_labels (dict, optional) – Rename legend labels used for multi-class classification plots. Defaults to None. The legend_labels dict keys are class names and the values are the display strings. If legend_labels is not specified, class names will be used for plots.

  • precision (int, optional (default 4)) – The number of decimal points to show for each score/loss value

  • metrics_labels (List, optional) – The metrics that should be included in the html table.

Returns:

Nothing. Saves the HTML report to the location specified by filename.

Return type:

None

Examples

>>> evaluator = Evaluator(models=[model1, model2], X=X, y=y)
>>> evaluator.save("report.html")

ads.evaluations.statistical_metrics module

class ads.evaluations.statistical_metrics.ModelEvaluator(y_true, y_pred, model_name, classes=None, positive_class=None, y_score=None)

Bases: object

ModelEvaluator takes in the true and predicted values and computes evaluation metrics, returned as a pandas DataFrame

y_true

array-like object holding the true values for the model

y_pred

array-like object holding the predicted values for the model

model_name(str)

the name of the model

classes(list)

list of target classes

positive_class(str)

label for positive outcome from model

y_score

array-like object holding the scores for true values for the model

metrics(dict)

dictionary object holding model data

get_metrics()

Gets the metrics information in a dataframe based on the number of classes

safe_metrics_call(scoring_functions, *args)

Applies sklearn scoring functions to parameters in args

get_metrics()

Gets the metrics information in a dataframe based on the number of classes

Parameters:

self (ModelEvaluator instance) – The ModelEvaluator instance with the metrics.

Returns:

Pandas dataframe containing the metrics

Return type:

pandas.DataFrame
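
A hedged usage sketch, assuming plain Python lists for binary labels and predictions (the columns of the returned DataFrame depend on the number of classes):

>>> from ads.evaluations.statistical_metrics import ModelEvaluator
>>> evaluator = ModelEvaluator(y_true=[1, 0, 1, 1, 0], y_pred=[1, 0, 0, 1, 0],
...                            model_name='my_classifier', classes=[0, 1],
...                            positive_class=1)
>>> metrics_df = evaluator.get_metrics()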

safe_metrics_call(scoring_functions, *args, **kwargs)

Applies the sklearn function in scoring_functions to parameters in args.

Parameters:
  • scoring_functions (dict) – Scoring functions dictionary

  • args – Arguments passed to the sklearn scoring functions

Returns:

Nothing

Raises:

Exception – If an error is encountered applying a sklearn scoring function to the arguments.

Module contents

class ads.evaluations.EvaluatorMixin

Bases: object

evaluate(X: ArrayLike, y: ArrayLike, y_pred: Optional[ArrayLike] = None, y_score: Optional[ArrayLike] = None, X_train: Optional[ArrayLike] = None, y_train: Optional[ArrayLike] = None, classes: Optional[List] = None, positive_class: Optional[str] = None, legend_labels: Optional[dict] = None, perfect: bool = True, filename: Optional[str] = None, use_case_type: Optional[str] = None)

Creates an ads evaluation report.

Parameters:
  • X (DataFrame-like) – The data used to make a prediction. Can be set to None if y_pred is given (and y_score for more thorough analysis).

  • y (array-like) – The true values corresponding to the input data

  • y_pred (array-like, optional) – The predictions from the model

  • y_score (array-like, optional) – The predict_proba output from the model

  • X_train (DataFrame-like, optional) – The data used to train the model

  • y_train (array-like, optional) – The true values corresponding to the input training data

  • classes (List or None, optional) – A List of the possible labels for y, when evaluating a classification use case

  • positive_class (str or int, optional) – The class to report metrics for binary dataset. If the target classes are True or False, positive_class will be set to True by default. If the dataset is multiclass or multilabel, this will be ignored.

  • legend_labels (dict, optional) – Dictionary of legend labels. Defaults to None. If legend_labels is not specified, class names will be used for plots.

  • use_case_type (str, optional) – The type of problem this model is solving. This can be set during prepare(). Examples: "binary_classification", "regression", "multinomial_classification". The full list of supported types can be found in ads.common.model_metadata.UseCaseType

  • filename (str, optional) – If filename is given, the html report will be saved to the location specified.

Examples

>>> import tempfile
>>> from ads.evaluations.evaluator import Evaluator
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from ads.model.framework.sklearn_model import SklearnModel
>>> from ads.common.model_metadata import UseCaseType
>>>
>>> X, y = make_classification(n_samples=1000)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
>>> est = DecisionTreeClassifier().fit(X_train, y_train)
>>> model = SklearnModel(estimator=est, artifact_dir=tempfile.mkdtemp())
>>> model.prepare(
...     inference_conda_env="generalml_p38_cpu_v1",
...     training_conda_env="generalml_p38_cpu_v1",
...     X_sample=X_test,
...     y_sample=y_test,
...     use_case_type=UseCaseType.BINARY_CLASSIFICATION,
... )
>>> model.evaluate(X_test, y_test, filename="report.html")