ads.evaluations package¶
Submodules¶
ads.evaluations.evaluation_plot module¶
- class ads.evaluations.evaluation_plot.EvaluationPlot[source]¶
Bases:
object
EvaluationPlot holds the data and methods used to produce and output evaluation plots
- baseline(bool)¶
whether to plot the null model or zero information model
- baseline_kwargs(dict)¶
keyword arguments for the baseline plot
- color_wheel(dict)¶
color information used by the plot
- font_sz(dict)¶
dictionary of font sizes used in the plots
- perfect(bool)¶
determines whether a “perfect” classifier curve is displayed
- perfect_kwargs(dict)¶
parameters for the perfect classifier for precision/recall curves
- prob_type(str)¶
model type, i.e. classification or regression
- plot(evaluation, plots, num_classes, perfect, baseline, legend_labels)[source]¶
Generates the evaluation plot
- baseline = None¶
- baseline_kwargs = {'c': '.2', 'ls': '--'}¶
- color_wheel = ['teal', 'blueviolet', 'forestgreen', 'peru', 'y', 'dodgerblue', 'r']¶
- double_overlay_plots = ['pr_and_roc_curve', 'lift_and_gain_chart']¶
- font_sz = {'l': 14, 'm': 12, 's': 10, 'xl': 16, 'xs': 8}¶
- classmethod get_legend_labels(legend_labels)[source]¶
Gets the legend labels, resolves any conflicts such as length, and renders the labels for the plot
- Parameters:
legend_labels (dict) – key/value dictionary containing legend label data
- Return type:
Nothing
Examples
>>> EvaluationPlot.get_legend_labels({'class_0': 'green', 'class_1': 'yellow', 'class_2': 'red'})
- perfect = None¶
- perfect_kwargs = {'color': 'gold', 'label': 'Perfect Classifier', 'ls': '--'}¶
- classmethod plot(evaluation, plots, num_classes, perfect=False, baseline=True, legend_labels=None)[source]¶
Generates the evaluation plot
- Parameters:
evaluation (DataFrame) – DataFrame with models as columns and metrics as rows.
plots (str) – The plot type based on class attribute prob_type.
num_classes (int) – The number of classes for the model.
perfect (bool, optional) – Whether to display the curve of a perfect classifier. Defaults to False.
baseline (bool, optional) – Whether to display the curve of the baseline, featureless model. Defaults to True.
legend_labels (dict, optional) – Legend labels dictionary. Defaults to None. If legend_labels is not specified, class names are used for plots.
- Return type:
Nothing
- prob_type = None¶
- single_overlay_plots = ['lift_chart', 'gain_chart', 'roc_curve', 'pr_curve']¶
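The class-level defaults listed above (for example font_sz, baseline_kwargs, and perfect_kwargs) can be overridden before plotting. A minimal sketch, assuming subsequent plots read these class attributes at render time:
>>> from ads.evaluations.evaluation_plot import EvaluationPlot
>>> EvaluationPlot.font_sz['xl'] = 18                            # larger extra-large font size
>>> EvaluationPlot.baseline_kwargs = {'c': '.4', 'ls': ':'}      # lighter, dotted baseline
>>> EvaluationPlot.perfect_kwargs['label'] = 'Ideal Classifier'  # relabel the perfect-classifier curve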
ads.evaluations.evaluator module¶
- class ads.evaluations.evaluator.ADSEvaluator(test_data, models, training_data=None, positive_class=None, legend_labels=None, show_full_name=False, classes=None, classification_threshold=50)[source]¶
Bases:
object
ADS Evaluator class. This class holds fields and methods for creating and using ADS evaluator objects.
- is_classifier¶
Whether the dataset looks like a classification problem (versus regression).
- Type:
bool
- models¶
The object built using ADSModel.from_estimator().
- Type:
list[ads.common.model.ADSModel]
- positive_class¶
The class to report metrics for in a binary dataset, assumed to be the positive (True) class.
- Type:
str or int
- test_data¶
Test data to evaluate model on.
- Type:
ads.common.data.ADSData
- training_data¶
Training data used to evaluate the model and compare metrics against the test data.
- Type:
ads.common.data.ADSData
- show_in_notebook(plots, use_training_data, perfect, baseline, legend_labels)[source]¶
Visualizes evaluation plots in the notebook
- calculate_cost(tn_weight, fp_weight, fn_weight, tp_weight, use_training_data)[source]¶
Returns a cost associated with the input weights
Creates an ads evaluator object.
- Parameters:
test_data (ads.common.data.ADSData instance) – Test data to evaluate model on. The object can be built using ADSData.build().
models (list[ads.common.model.ADSModel]) – The object can be built using ADSModel.from_estimator(). Maximum length of the list is 3
training_data (ads.common.data.ADSData instance, optional) – Training data to evaluate model on and compare metrics against test data. The object can be built using ADSData.build()
positive_class (str or int, optional) – The class to report metrics for in a binary dataset. If the target classes are True and False, positive_class is set to True by default. If the dataset is multiclass or multilabel, this is ignored.
legend_labels (dict, optional) – Dictionary of legend labels keyed by class name. Defaults to None. If legend_labels is not specified, class names are used for plots.
show_full_name (bool, optional) – Show the name of the evaluator object. Defaults to False.
classes (List or None, optional) – A List of the possible labels for y, when evaluating a classification use case
classification_threshold (int, defaults to 50) – The maximum number of unique values that y must have to qualify as classification. If this threshold is exceeded, Evaluator assumes the model is regression.
Examples
>>> train, test = ds.train_test_split()
>>> model1 = MyModelClass1.train(train)
>>> model2 = MyModelClass2.train(train)
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> legend_labels={'class_0': 'one', 'class_1': 'two', 'class_2': 'three'}
>>> multi_evaluator = ADSEvaluator(test, models=[model1, model2],
...                                legend_labels=legend_labels)
- class EvaluationMetrics(ev_test, ev_train, use_training=False, less_is_more=None, precision=4)[source]¶
Bases:
object
Class holding evaluation metrics.
- DEFAULT_LABELS_MAP = {'accuracy': 'Accuracy', 'auc': 'ROC AUC', 'f1': 'F1', 'hamming_loss': 'Hamming distance', 'kappa_score_': "Cohen's kappa coefficient", 'precision': 'Precision', 'recall': 'Recall'}¶
- property precision¶
- show_in_notebook(labels={'accuracy': 'Accuracy', 'auc': 'ROC AUC', 'f1': 'F1', 'hamming_loss': 'Hamming distance', 'kappa_score_': "Cohen's kappa coefficient", 'precision': 'Precision', 'recall': 'Recall'})[source]¶
Visualizes evaluation metrics as a color coded table.
- Parameters:
labels (dict) – Dictionary mapping metric keys to the labels displayed in the table
- Return type:
Nothing
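Examples
A minimal sketch, assuming evaluator is an ADSEvaluator instance and that its metrics property returns this EvaluationMetrics object:
>>> em = evaluator.metrics
>>> em.show_in_notebook(labels={'accuracy': 'Accuracy', 'f1': 'F1 score'})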
- Positive_Class_Names = ['yes', 'y', 't', 'true', '1']¶
- add_metrics(funcs, names)[source]¶
Adds the listed metrics to the evaluator object it is called on.
- Parameters:
funcs (list) – The list of metric functions to add. Each function is called with the true and predicted values for each model (see the example below).
names (list[str]) – The display names for the metrics, in the same order as funcs.
- Return type:
Nothing
Examples
>>> def f1(y_true, y_pred):
...     return np.max(y_true - y_pred)
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.add_metrics([f1], ['Max Residual'])
>>> evaluator.metrics
The output table will include the desired metric.
- add_models(models, show_full_name=False)[source]¶
Adds the listed models to the evaluator object it is called on.
- Parameters:
models (list[ads.common.model.ADSModel]) – The list of models to add.
show_full_name (bool, optional) – Defaults to False.
- Return type:
Nothing
Examples
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.add_models([model3])
- calculate_cost(tn_weight, fp_weight, fn_weight, tp_weight, use_training_data=False)[source]¶
Returns a cost associated with the input weights.
- Parameters:
tn_weight (int, float) – The weight to assign true negatives in calculating the cost
fp_weight (int, float) – The weight to assign false positives in calculating the cost
fn_weight (int, float) – The weight to assign false negatives in calculating the cost
tp_weight (int, float) – The weight to assign true positives in calculating the cost
use_training_data (bool, optional) – Use training data to pull the metrics. Defaults to False
- Returns:
DataFrame with the cost calculated for each model
- Return type:
pandas.DataFrame
Examples
>>> evaluator = ADSEvaluator(test, [model1, model2]) >>> costs_table = evaluator.calculate_cost(0, 10, 1000, 0)
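The exact formula is not spelled out on this page; the natural reading (an assumption, not confirmed here) is a weighted sum of each model's confusion-matrix counts, roughly cost = tn_weight*TN + fp_weight*FP + fn_weight*FN + tp_weight*TP, so the example above charges 10 per false positive and 1000 per false negative.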
- del_metrics(names)[source]¶
Removes the listed metrics from the evaluator object it is called on.
- Parameters:
names (list[str]) – The list of names of metrics to be deleted. Names can be found by calling evaluator.test_evaluations.index.
- Returns:
None
- Return type:
None
Examples
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.del_metrics(['mse'])
>>> evaluator.metrics
The output table will exclude the deleted metric.
- del_models(names)[source]¶
Removes the listed models from the evaluator object it is called on.
- Parameters:
names (list[str]) – The list of model names to delete. Names are the model names by default, and are assigned internally when conflicts exist. Actual names can be found using evaluator.test_evaluations.columns
- Return type:
Nothing
Examples
>>> model3.rename("model3")
>>> evaluator = ADSEvaluator(test, [model1, model2, model3])
>>> evaluator.del_models(['model3'])
- property metrics¶
Returns evaluation metrics
- Returns:
HTML representation of a table comparing relevant metrics.
- Return type:
metrics
Examples
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.metrics
Outputs a table displaying the metrics.
- property raw_metrics¶
Returns the raw metric numbers
- Parameters:
metrics (list, optional) – The names of the raw metrics to return. Defaults to None.
- Returns:
The requested raw metrics for each model. If metrics is None, all are returned.
- Return type:
dict
Examples
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> raw_metrics_dictionary = evaluator.raw_metrics()
- show_in_notebook(plots=None, use_training_data=False, perfect=False, baseline=True, legend_labels=None)[source]¶
Visualize evaluation plots.
- Parameters:
plots (list, optional) –
Filter the plots that are displayed. Defaults to None. The name of the plots are as below:
regression - residuals_qq, residuals_vs_fitted
binary classification - normalized_confusion_matrix, roc_curve, pr_curve
multi class classification - normalized_confusion_matrix, precision_by_label, recall_by_label, f1_by_label
use_training_data (bool, optional) – Use training data, instead of test data, to generate the plots. Defaults to False.
perfect (bool, optional) – If True, will show how a perfect classifier would perform. Defaults to False.
baseline (bool, optional) – If True, will show how a random classifier would perform. Defaults to True.
legend_labels (dict, optional) – Rename the legend labels used for multiclass classification plots. Defaults to None. The dict keys are class names and the values are the display strings. If legend_labels is not specified, class names are used for plots.
- Returns:
Nothing. Outputs several evaluation plots as specified by plots.
- Return type:
None
Examples
>>> evaluator = ADSEvaluator(test, [model1, model2])
>>> evaluator.show_in_notebook()
>>> legend_labels={'class_0': 'green', 'class_1': 'yellow', 'class_2': 'red'}
>>> multi_evaluator = ADSEvaluator(test, [model1, model2],
...                                legend_labels=legend_labels)
>>> multi_evaluator.show_in_notebook(plots=["normalized_confusion_matrix",
...                                         "precision_by_label", "recall_by_label", "f1_by_label"])
- class ads.evaluations.evaluator.Evaluator(models: List[GenericModel], X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], y: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], y_preds: List[_SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]] | None = None, y_scores: List[_SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]] | None = None, X_train: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, y_train: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, classes: List | None = None, positive_class: str | None = None, legend_labels: dict | None = None, use_case_type: UseCaseType | None = None)[source]¶
Bases:
object
BETA FEATURE. Evaluator is the new and preferred way to evaluate a model or list of models. It contains a superset of the features of the soon-to-be-deprecated ADSEvaluator.
- add_model(model)¶
Adds a model to the existing report. See documentation for more details.
- add_metric(metric_fn)¶
Adds a metric to the existing report. See documentation for more details.
- add_plot(plotting_fn)¶
Adds a plot to the existing report. See documentation for more details.
Creates an ads evaluator object.
- Parameters:
models (List[ads.model.GenericModel]) – The models to evaluate. The objects can be built using one of the frameworks supported in ads.model.framework
X (DataFrame-like) – The data used to make a prediction. Can be set to None if y_preds is given (and y_scores, for a more thorough analysis).
y (array-like) – The true values corresponding to the input data
y_preds (list of array-like, optional) – The predictions from each model in the same order as the models
y_scores (list of array-like, optional) – The predict_probas from each model in the same order as the models
X_train (DataFrame-like, optional) – The data used to train the model
y_train (array-like, optional) – The true values corresponding to the input training data
positive_class (str or int, optional) – The class to report metrics for in a binary dataset. If the target classes are True and False, positive_class is set to True by default. If the dataset is multiclass or multilabel, this is ignored.
legend_labels (dict, optional) – Dictionary of legend labels keyed by class name. Defaults to None. If legend_labels is not specified, class names are used for plots.
classes (List or None, optional) – A List of the possible labels for y, when evaluating a classification use case
use_case_type (str, optional) – The type of problem this model is solving. This can be set during prepare(). Examples: “binary_classification”, “regression”, “multinomial_classification”. The full list of supported types can be found in ads.common.model_metadata.UseCaseType.
Examples
>>> import tempfile
>>> from ads.evaluations.evaluator import Evaluator
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from ads.model.framework.sklearn_model import SklearnModel
>>> from ads.common.model_metadata import UseCaseType
>>>
>>> X, y = make_classification(n_samples=1000)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
>>> est = DecisionTreeClassifier().fit(X_train, y_train)
>>> model = SklearnModel(estimator=est, artifact_dir=tempfile.mkdtemp())
>>> model.prepare(
...     inference_conda_env="generalml_p38_cpu_v1",
...     training_conda_env="generalml_p38_cpu_v1",
...     X_sample=X_test,
...     y_sample=y_test,
...     use_case_type=UseCaseType.BINARY_CLASSIFICATION,
... )
>>> report = Evaluator([model], X=X_test, y=y_test)
>>> report.display()
- add_models(models: List[GenericModel], y_preds: List[Any] | None = None, y_scores: List[Any] | None = None)[source]¶
Add a model to an existing Evaluator to avoid re-calculating the values.
- Parameters:
models (List[ads.model.GenericModel]) – The models to add to the report. The objects can be built using one of the frameworks supported in ads.model.framework
y_preds (list of array-like, optional) – The predictions from each model in the same order as the models
y_scores (list of array-like, optional) – The predict_probas from each model in the same order as the models
- Return type:
self
Examples
>>> evaluator = Evaluator(models=[model1, model2], X=X, y=y)
>>> evaluator.add_models(models=[model3])
- display(plots=None, perfect=False, baseline=True, legend_labels=None, precision=4, metrics_labels=None)[source]¶
Visualize evaluation report.
- Parameters:
plots (list, optional) –
Filter the plots that are displayed. Defaults to None. The name of the plots are as below:
regression - residuals_qq, residuals_vs_fitted
binary classification - normalized_confusion_matrix, roc_curve, pr_curve
multi class classification - normalized_confusion_matrix, precision_by_label, recall_by_label, f1_by_label
perfect (bool, optional (default False)) – If True, will show how a perfect classifier would perform.
baseline (bool, optional (default True)) – If True, will show how a random classifier would perform.
legend_labels (dict, optional) – Rename the legend labels used for multiclass classification plots. Defaults to None. The dict keys are class names and the values are the display strings. If legend_labels is not specified, class names are used for plots.
precision (int, optional (default 4)) – The number of decimal places to show for each score/loss value.
metrics_labels (List, optional) – The metrics that should be included in the HTML table.
- Returns:
Nothing. Outputs several evaluation plots as specified by plots.
- Return type:
None
Examples
>>> evaluator = Evaluator(models=[model1, model2], X=X, y=y)
>>> evaluator.display()
>>> legend_labels={'class_0': 'green', 'class_1': 'yellow', 'class_2': 'red'}
>>> multi_evaluator = Evaluator(models=[model1, model2], X=X, y=y,
...                             legend_labels=legend_labels)
>>> multi_evaluator.display(plots=["normalized_confusion_matrix",
...                                "precision_by_label", "recall_by_label", "f1_by_label"])
- html(plots=None, perfect=False, baseline=True, legend_labels=None, precision=4, metrics_labels=None)[source]¶
Get raw HTML report.
- Parameters:
plots (list, optional) –
Filter the plots that are displayed. Defaults to None. The name of the plots are as below:
regression - residuals_qq, residuals_vs_fitted
binary classification - normalized_confusion_matrix, roc_curve, pr_curve
multi class classification - normalized_confusion_matrix, precision_by_label, recall_by_label, f1_by_label
perfect (bool, optional (default False)) – If True, will show how a perfect classifier would perform.
baseline (bool, optional (default True)) – If True, will show how a random classifier would perform.
legend_labels (dict, optional) – Rename the legend labels used for multiclass classification plots. Defaults to None. The dict keys are class names and the values are the display strings. If legend_labels is not specified, class names are used for plots.
precision (int, optional (default 4)) – The number of decimal places to show for each score/loss value.
metrics_labels (List, optional) – The metrics that should be included in the HTML table.
- Returns:
The evaluation report as raw HTML.
- Return type:
str
Examples
>>> evaluator = Evaluator(models=[model1, model2], X=X, y=y)
>>> raw_html = evaluator.html()
- save(filename: str, **kwargs)[source]¶
Save HTML report.
- Parameters:
filename (str) – The name and path of where to save the html report.
plots (list, optional) –
Filter the plots that are displayed. Defaults to None. The name of the plots are as below:
regression - residuals_qq, residuals_vs_fitted
binary classification - normalized_confusion_matrix, roc_curve, pr_curve
multi class classification - normalized_confusion_matrix, precision_by_label, recall_by_label, f1_by_label
perfect (bool, optional (default False)) – If True, will show how a perfect classifier would perform.
baseline (bool, optional (default True)) – If True, will show how a random classifier would perform.
legend_labels (dict, optional) – Rename the legend labels used for multiclass classification plots. Defaults to None. The dict keys are class names and the values are the display strings. If legend_labels is not specified, class names are used for plots.
precision (int, optional (default 4)) – The number of decimal places to show for each score/loss value.
metrics_labels (List, optional) – The metrics that should be included in the HTML table.
- Returns:
Nothing. The HTML report, containing the plots specified by plots, is saved to filename.
- Return type:
None
Examples
>>> evaluator = Evaluator(models=[model1, model2], X=X, y=y)
>>> evaluator.save("report.html")
ads.evaluations.statistical_metrics module¶
- class ads.evaluations.statistical_metrics.ModelEvaluator(y_true, y_pred, model_name, classes=None, positive_class=None, y_score=None)[source]¶
Bases:
object
ModelEvaluator takes in the true and predicted values and returns a pandas DataFrame of evaluation metrics
- y_true¶
- Type:
array-like object holding the true values for the model
- y_pred¶
- Type:
array-like object holding the predicted values for the model
- model_name(str)¶
- Type:
the name of the model
- positive_class(str)¶
- Type:
label for positive outcome from model
- y_score¶
- Type:
array-like object holding the scores for true values for the model
- metrics(dict)¶
- Type:
dictionary object holding model data
- safe_metrics_call(scoring_functions, *args)[source]¶
Applies sklearn scoring functions to parameters in args
- get_metrics()[source]¶
Gets the metrics information in a dataframe based on the number of classes
- Parameters:
self (ModelEvaluator instance) – The ModelEvaluator instance with the metrics.
- Returns:
Pandas dataframe containing the metrics
- Return type:
pandas.DataFrame
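Examples
A minimal, hedged sketch of direct use (the variables y_test, preds, and probas are assumed to exist; ModelEvaluator is typically driven by ADSEvaluator or Evaluator rather than called directly):
>>> from ads.evaluations.statistical_metrics import ModelEvaluator
>>> evaluator = ModelEvaluator(y_true=y_test, y_pred=preds, model_name='my_model',
...                            classes=[0, 1], positive_class=1, y_score=probas)
>>> metrics_df = evaluator.get_metrics()   # pandas DataFrame of metrics for this model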
Module contents¶
- class ads.evaluations.EvaluatorMixin[source]¶
Bases:
object
- evaluate(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], y: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], y_pred: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, y_score: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, X_train: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, y_train: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, classes: List | None = None, positive_class: str | None = None, legend_labels: dict | None = None, perfect: bool = True, filename: str | None = None, use_case_type: str | None = None)[source]¶
Creates an ads evaluation report.
- Parameters:
X (DataFrame-like) – The data used to make a prediction. Can be set to None if y_pred is given (and y_score, for a more thorough analysis).
y (array-like) – The true values corresponding to the input data
y_pred (array-like, optional) – The predictions from the model.
y_score (array-like, optional) – The predict_proba output from the model.
X_train (DataFrame-like, optional) – The data used to train the model
y_train (array-like, optional) – The true values corresponding to the input training data
classes (List or None, optional) – A List of the possible labels for y, when evaluating a classification use case
positive_class (str or int, optional) – The class to report metrics for in a binary dataset. If the target classes are True and False, positive_class is set to True by default. If the dataset is multiclass or multilabel, this is ignored.
legend_labels (dict, optional) – Dictionary of legend labels keyed by class name. Defaults to None. If legend_labels is not specified, class names are used for plots.
use_case_type (str, optional) – The type of problem this model is solving. This can be set during prepare(). Examples: “binary_classification”, “regression”, “multinomial_classification”. The full list of supported types can be found in ads.common.model_metadata.UseCaseType.
filename (str, optional) – If filename is given, the html report will be saved to the location specified.
Examples
>>> import tempfile
>>> from ads.evaluations.evaluator import Evaluator
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from ads.model.framework.sklearn_model import SklearnModel
>>> from ads.common.model_metadata import UseCaseType
>>>
>>> X, y = make_classification(n_samples=1000)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
>>> est = DecisionTreeClassifier().fit(X_train, y_train)
>>> model = SklearnModel(estimator=est, artifact_dir=tempfile.mkdtemp())
>>> model.prepare(
...     inference_conda_env="generalml_p38_cpu_v1",
...     training_conda_env="generalml_p38_cpu_v1",
...     X_sample=X_test,
...     y_sample=y_test,
...     use_case_type=UseCaseType.BINARY_CLASSIFICATION,
... )
>>> model.evaluate(X_test, y_test, filename="report.html")