ads.feature_engineering.accessor.mixin package¶
Submodules¶
ads.feature_engineering.accessor.mixin.correlation module¶
- ads.feature_engineering.accessor.mixin.correlation.cat_vs_cat(df: DataFrame, normal_form: bool = True) DataFrame [source]¶
Calculates the correlation for all pairs of categorical features.
ads.feature_engineering.accessor.mixin.eda_mixin module¶
This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Dataframe. Its purpose-driven methods enable the data scientist to complete an analysis of the dataframe.
From the accessor we have access to the pandas object the user is interacting with as well as corresponding lists of feature types per column.
- class ads.feature_engineering.accessor.mixin.eda_mixin.EDAMixin[source]¶
Bases:
object
- correlation_ratio() DataFrame [source]¶
Generate a Correlation Ratio data frame for all categorical-continuous variable pairs.
- Returns:
Correlation Ratio correlation data frame with the following 3 columns:
Column 1 (name of the first categorical/continuous column)
Column 2 (name of the second categorical/continuous column)
Value (correlation value)
- Return type:
pandas.DataFrame
Note
Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
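As a rough illustration of the statistic behind this method, the correlation ratio (eta) between one categorical and one continuous variable can be computed by hand. This is a minimal sketch, not the ADS implementation: the function name `correlation_ratio`, the sample data, and the choice to return eta rather than eta squared are assumptions made for this example, and it uses only the standard library so it runs without ADS or pandas.

```python
from collections import defaultdict
from math import sqrt


def correlation_ratio(categories, values):
    """Return eta for a categorical sequence and a numeric sequence of equal length.

    Hypothetical helper for illustration only; ADS may compute this differently.
    """
    # Group the numeric values by their category label.
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    n = len(values)
    grand_mean = sum(values) / n

    # Between-group sum of squares: group size times the squared distance
    # of the group mean from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Total sum of squares around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # A constant numeric column has no variance to explain; 0.0 is used here.
    return sqrt(ss_between / ss_total) if ss_total else 0.0


# Made-up sample data: the numeric values differ strongly between the two groups.
sex = ["m", "m", "f", "f"]
fare = [10.0, 12.0, 30.0, 32.0]
eta = correlation_ratio(sex, fare)
print(round(eta, 4))
```

A value close to 1.0 means the category explains most of the variance in the numeric column; a value of 0.0 means the group means are identical.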
- correlation_ratio_plot() Axes [source]¶
Generate a heatmap of the Correlation Ratio correlation for all categorical-continuous variable pairs.
- Returns:
Correlation Ratio correlation plot object that can be updated by the customer
- Return type:
Plot object
- cramersv() DataFrame [source]¶
Generate a Cramer’s V correlation data frame for all categorical variable pairs.
Gives a warning for dropped non-categorical columns.
- Returns:
- Cramer’s V correlation data frame with the following 3 columns:
Column 1 (name of the first categorical column)
Column 2 (name of the second categorical column)
Value (correlation value)
- Return type:
pandas.DataFrame
Note
Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
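For orientation, Cramer's V for a single pair of categorical variables is defined as sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the chi-squared statistic of the r-by-c contingency table and n is the number of observations. The sketch below computes the uncorrected statistic with the standard library only. The function name `cramers_v` and the sample data are assumptions for this example; whether ADS applies a bias correction is not stated here, so results may differ from `cramersv()`.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Uncorrected Cramer's V for two equal-length categorical sequences.

    Hypothetical helper for illustration only; not the ADS implementation.
    """
    n = len(x)
    observed = Counter(zip(x, y))  # cell counts of the contingency table
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    # With a single category on either side the statistic is undefined.
    return sqrt(chi2 / (n * k)) if k > 0 else float("nan")


# Made-up sample data.
perfect = cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"])
independent = cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```

The first pair is perfectly associated and the second is independent, so the two calls mark the ends of the range of the statistic.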
- cramersv_plot() Axes [source]¶
Generate a heatmap of the Cramer’s V correlation for all categorical variable pairs.
Gives a warning for dropped non-categorical columns.
- Returns:
Cramer’s V correlation plot object that can be updated by the customer
- Return type:
Plot object
- feature_count() DataFrame [source]¶
Counts the number of columns for each feature type and for each primary feature type. The Primary column gives the number of columns that have the feature type as their primary feature type.
- Returns:
Dataframe with the number of columns for each feature type and the number of columns for each primary feature type.
- Return type:
pandas.DataFrame
Examples
>>> df.ads.feature_type
{'PassengerId': ['ordinal', 'category'],
 'Survived': ['ordinal'],
 'Pclass': ['ordinal'],
 'Name': ['category'],
 'Sex': ['category']}
>>> df.ads.feature_count()
  Feature Type  Count  Primary
0     category      3        2
1      ordinal      3        3
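The counts in the example above can be reproduced by hand from the feature type mapping. The sketch below assumes that the first entry in each column's list is its primary feature type, which is consistent with the example output but is not confirmed by this page. It uses plain Python rather than the ADS accessor, and the variable names are chosen for this illustration.

```python
from collections import Counter

# Feature type mapping copied from the example above.
feature_types = {
    "PassengerId": ["ordinal", "category"],
    "Survived": ["ordinal"],
    "Pclass": ["ordinal"],
    "Name": ["category"],
    "Sex": ["category"],
}

# Count: how many columns carry each feature type anywhere in their list.
count = Counter(t for types in feature_types.values() for t in types)

# Primary: how many columns have each feature type first in their list
# (assumption: the first entry is the primary feature type).
primary = Counter(types[0] for types in feature_types.values())

# One row per feature type: (Feature Type, Count, Primary).
table = [(t, count[t], primary[t]) for t in sorted(count)]
for row in table:
    print(row)
```

PassengerId contributes to the Count of both types but only to the Primary of ordinal, which is why category has a Count of 3 and a Primary of 2.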
- feature_plot() DataFrame [source]¶
Generates a list of summary plots, one for every column in the dataframe, each based on the most relevant feature type of the column.
- Returns:
Dataframe with 2 columns: 1. Column - feature name 2. Plot - plot object
- Return type:
pandas.DataFrame
- feature_stat() DataFrame [source]¶
Provides a Dataframe of summary statistics.
Feature statistics are computed for each column using the FeatureType summary method.
Examples
>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df.ads.feature_stat().head()
        Column              Metric    Value
0  PassengerId               count  891.000
1  PassengerId                mean  446.000
2  PassengerId  standard deviation  257.354
3  PassengerId      sample minimum    1.000
4  PassengerId      lower quartile  223.500
- Returns:
Dataframe with 3 columns: Column (the feature name), Metric, and Value
- Return type:
pandas.DataFrame
- pearson() DataFrame [source]¶
Generate a Pearson correlation data frame for all continuous variable pairs.
Gives a warning for dropped non-numerical columns.
- Returns:
Pearson correlation data frame with the following 3 columns:
Column 1 (name of the first continuous column)
Column 2 (name of the second continuous column)
Value (correlation value)
- Return type:
pandas.DataFrame
Note
Pairs will be replicated. For example for variables x and y, we’d have (x,y), (y,x) both with same correlation value. We’ll also have (x,x) and (y,y) with value 1.0.
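The replicated long format described in the note can be sketched without ADS. The example below computes the Pearson coefficient for every ordered pair of columns, including each column with itself, and emits one (Column 1, Column 2, Value) row per pair. The helper `pearson_r`, the column names, and the data are assumptions for this illustration; it does not reproduce the warning ADS gives for dropped non-numerical columns.

```python
from itertools import product
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Hypothetical helper for illustration; it assumes neither input is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


# Made-up continuous columns.
columns = {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 9.0]}

# Every ordered pair is emitted, so (x, y) and (y, x) both appear,
# along with the self-pairs (x, x) and (y, y).
rows = [
    (a, b, pearson_r(columns[a], columns[b]))
    for a, b in product(columns, repeat=2)
]
for row in rows:
    print(row)
```

Two columns give four rows. Keeping both orderings makes the result easy to pivot into a square matrix for a heatmap.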
- pearson_plot() Axes [source]¶
Generate a heatmap of the Pearson correlation for all continuous variable pairs.
- Returns:
Pearson correlation plot object that can be updated by the customer
- Return type:
Plot object
- warning() DataFrame [source]¶
Generates a data frame that lists feature specific warnings.
- Returns:
The list of feature specific warnings.
- Return type:
pandas.DataFrame
Examples
>>> df.ads.warning()
    Column  Feature Type  Warning  Message              Metric      Value
    ---------------------------------------------------------------------
0   Age     continuous    Zeros    Age has 38 zeros     Count       38
1   Age     continuous    Zeros    Age has 12.2% zeros  Percentage  12.2%
ads.feature_engineering.accessor.mixin.eda_mixin_series module¶
This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Series. Its purpose-driven methods enable the data scientist to complete a univariate analysis.
From the accessor we have access to the pandas object the user is interacting with as well as corresponding list of feature types.
- class ads.feature_engineering.accessor.mixin.eda_mixin_series.EDAMixinSeries[source]¶
Bases:
object
- feature_plot() Axes [source]¶
For the series generate a summary plot based on the most relevant feature type.
- Returns:
Plot object for the series based on the most relevant feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- feature_stat() DataFrame [source]¶
Provides a Dataframe of summary statistics.
Feature statistics are computed for the series using the FeatureType summary method.
Examples
>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df['Cabin'].ads.feature_stat()
    Metric  Value
0    count    891
1   unique    147
2  missing    687
- Returns:
Dataframe with 2 columns and rows for different metric values
- Return type:
pandas.DataFrame
- warning() DataFrame [source]¶
Generates a data frame that lists feature specific warnings.
- Returns:
The list of feature specific warnings.
- Return type:
pandas.DataFrame
Examples
>>> df["Age"].ads.warning()
    Feature Type  Warning  Message              Metric      Value
    -------------------------------------------------------------
0   continuous    Zeros    Age has 38 zeros     Count       38
1   continuous    Zeros    Age has 12.2% zeros  Percentage  12.2%
ads.feature_engineering.accessor.mixin.feature_types_mixin module¶
This module provides the ADS Feature Types Mixin class, which extends the Pandas Series and Dataframe accessors.
Classes¶
- ADSFeatureTypesMixin
ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.
- class ads.feature_engineering.accessor.mixin.feature_types_mixin.ADSFeatureTypesMixin[source]¶
Bases:
object
ADS Feature Types Mixin class that extends Pandas Series and DataFrame accessors.
- warning_registered(cls) pd.DataFrame [source]¶
Lists registered warnings for registered feature types.
- validator_registered(cls) pd.DataFrame [source]¶
Lists registered validators for registered feature types.
- help(self, prop: str = None) None [source]¶
Help method that prints either a table of available properties or, given a property, prints its docstring.
- help(prop: str | None = None) None [source]¶
Help method that prints either a table of available properties or, given an individual property, prints its docstring.
- Parameters:
prop (str, optional) – The name of the property. Defaults to None.
- Returns:
Nothing.
- Return type:
None
- validator_registered() DataFrame [source]¶
Lists registered validators for registered feature types.
- Returns:
The list of registered validators for registered feature types
- Return type:
pandas.DataFrame
Examples
>>> df.ads.validator_registered()
    Column       Feature Type  Validator        Condition               Handler
    --------------------------------------------------------------------------------------------
0   PhoneNumber  phone_number  is_phone_number  ()                      default_handler
1   PhoneNumber  phone_number  is_phone_number  {'country_code': '+7'}  specific_country_handler
2   CreditCard   credit_card   is_credit_card   ()                      default_handler
>>> df['PhoneNumber'].ads.validator_registered()
    Feature Type  Validator        Condition               Handler
    -------------------------------------------------------------------------------
0   phone_number  is_phone_number  ()                      default_handler
1   phone_number  is_phone_number  {'country_code': '+7'}  specific_country_handler
- warning_registered() DataFrame [source]¶
Lists registered warnings for all registered feature types.
- Returns:
The list of registered warnings for registered feature types.
- Return type:
pandas.DataFrame
Examples
>>> df.ads.warning_registered()
    Column  Feature Type  Warning           Handler
    ------------------------------------------------------------------
0   Age     continuous    zeros             zeros_handler
1   Age     continuous    high_cardinality  high_cardinality_handler
>>> df["Age"].ads.warning_registered()
    Feature Type  Warning           Handler
    ----------------------------------------------------------
0   continuous    zeros             zeros_handler
1   continuous    high_cardinality  high_cardinality_handler