ads.feature_engineering package

Submodules

ads.feature_engineering.exceptions module

exception ads.feature_engineering.exceptions.InvalidFeatureType(tname: str)

Bases: TypeError

exception ads.feature_engineering.exceptions.NameAlreadyRegistered(name: str)

Bases: NameError

exception ads.feature_engineering.exceptions.TypeAlreadyAdded(tname: str)

Bases: TypeError

exception ads.feature_engineering.exceptions.TypeAlreadyRegistered(tname: str)

Bases: TypeError

exception ads.feature_engineering.exceptions.TypeNotFound(tname: str)

Bases: TypeError

exception ads.feature_engineering.exceptions.WarningAlreadyExists(name: str)

Bases: ValueError

exception ads.feature_engineering.exceptions.WarningNotFound(name: str)

Bases: ValueError
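
Because each exception above derives from a builtin exception, calling code can catch either the specific class or its documented base. The sketch below uses local stand-in classes that only mirror the documented bases; the message strings are assumptions and are not taken from the library.

```python
# Stand-in classes that mirror only the documented base classes.
# The real classes live in ads.feature_engineering.exceptions.
class TypeNotFound(TypeError):
    def __init__(self, tname: str):
        # The message text is an assumption made for this sketch.
        super().__init__(f"Type {tname} is not found.")


class WarningNotFound(ValueError):
    def __init__(self, name: str):
        # The message text is an assumption made for this sketch.
        super().__init__(f"Warning {name} is not found.")


def classify(exc: Exception) -> str:
    """Report which builtin handler catches the given exception."""
    try:
        raise exc
    except TypeError:
        return "TypeError"
    except ValueError:
        return "ValueError"


caught = [classify(TypeNotFound("new_type")), classify(WarningNotFound("zeros"))]
print(caught)
```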

ads.feature_engineering.feature_type_manager module

This module manages feature types. It provides functionality to register, unregister, and list feature types.

Classes

FeatureTypeManager

Feature Types Manager class that manages feature types.

Examples

>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...    description="My personal type."
...    pass
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
            Name        Feature Type                                  Description
---------------------------------------------------------------------------------
0     Continuous          continuous          Type representing continuous values.
1       DateTime           date_time           Type representing date and/or time.
2       Category            category  Type representing discrete unordered values.
3        Ordinal             ordinal             Type representing ordered values.
4        NewType            new_type                             My personal type.
>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler
>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler
>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous
class ads.feature_engineering.feature_type_manager.FeatureTypeManager

Bases: object

Feature Types Manager class that manages feature types.

Provides functionality to register, unregister, and list feature types.

feature_type_object(cls, feature_type: Union[FeatureType, str]) FeatureType

Gets a feature type by class object or name.

feature_type_register(cls, feature_type_cls: FeatureType) None

Registers a feature type.

feature_type_unregister(cls, feature_type: Union[FeatureType, str]) None

Unregisters a feature type.

feature_type_reset(cls) None

Resets feature types to the defaults.

feature_type_registered(cls) pd.DataFrame

Lists all registered feature types as a DataFrame.

warning_registered(cls) pd.DataFrame

Lists registered warnings for all registered feature types.

validator_registered(cls) pd.DataFrame

Lists registered validators for all registered feature types.

Examples

>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...    pass
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
            Name      Feature Type                                  Description
-------------------------------------------------------------------------------
0     Continuous        continuous          Type representing continuous values.
1       DateTime         date_time           Type representing date and/or time.
2       Category          category  Type representing discrete unordered values.
3        Ordinal           ordinal             Type representing ordered values.
>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler
>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler
>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous
classmethod feature_type_object(feature_type: Union[FeatureType, str]) FeatureType

Gets a feature type by class object or name.

Parameters:

feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.

Returns:

Found feature type.

Return type:

FeatureType

Raises:
  • TypeNotFound – If the provided feature type is not registered.

  • TypeError – If the provided feature type is not a subclass of FeatureType.

classmethod feature_type_register(feature_type_cls: FeatureType) None

Registers a new feature type.

Parameters:

feature_type_cls (FeatureType) – Subclass of FeatureType to be registered.

Returns:

Nothing.

Return type:

None

Raises:
  • TypeError – Type is not a subclass of FeatureType.

  • TypeError – Type has already been registered.

  • NameError – Name has already been used.

classmethod feature_type_registered() DataFrame

Lists all registered feature types as a DataFrame.

Returns:

The list of feature types in a DataFrame format.

Return type:

pd.DataFrame

classmethod feature_type_reset() None

Resets feature types to the defaults.

Returns:

Nothing.

Return type:

None

classmethod feature_type_unregister(feature_type: Union[FeatureType, str]) None

Unregisters a feature type.

Parameters:

feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.

Returns:

Nothing.

Return type:

None

Raises:

TypeError – If attempting to unregister a default feature type.

classmethod is_type_registered(feature_type: Union[FeatureType, str]) bool

Checks whether the provided feature type is registered in the system.

Parameters:

feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.

Returns:

True if the provided feature type is registered, False otherwise.

Return type:

bool

classmethod validator_registered() DataFrame

Lists registered validators for registered feature types.

Returns:

The list of registered validators for registered feature types in a DataFrame format.

Return type:

pd.DataFrame

Examples

>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler
classmethod warning_registered() DataFrame

Lists registered warnings for all registered feature types.

Returns:

The list of registered warnings for registered feature types in a DataFrame format.

Return type:

pd.DataFrame

Examples

>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler
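
The register, unregister, and reset behaviour documented above, including the documented errors, can be pictured with a small toy registry. This is a sketch under stated assumptions and not the FeatureTypeManager implementation: `Registry`, the stand-in `FeatureType` classes, and the error messages are hypothetical.

```python
from typing import Dict, Type, Union


class FeatureType:
    """Hypothetical stand-in for the library's FeatureType base class."""
    name = "feature_type"


class Continuous(FeatureType):
    name = "continuous"


class Registry:
    """Toy registry used for illustration only."""
    _defaults: Dict[str, Type[FeatureType]] = {"continuous": Continuous}
    _types: Dict[str, Type[FeatureType]] = dict(_defaults)

    @classmethod
    def register(cls, feature_type_cls: Type[FeatureType]) -> None:
        if not (isinstance(feature_type_cls, type)
                and issubclass(feature_type_cls, FeatureType)):
            raise TypeError("Type is not a subclass of FeatureType.")
        if feature_type_cls in cls._types.values():
            raise TypeError("Type has already been registered.")
        if feature_type_cls.name in cls._types:
            raise NameError("Name has already been used.")
        cls._types[feature_type_cls.name] = feature_type_cls

    @classmethod
    def unregister(cls, feature_type: Union[Type[FeatureType], str]) -> None:
        name = feature_type if isinstance(feature_type, str) else feature_type.name
        if name in cls._defaults:
            raise TypeError("Default feature types cannot be unregistered.")
        # Ignoring unknown names silently is an assumption of this sketch.
        cls._types.pop(name, None)

    @classmethod
    def reset(cls) -> None:
        # Drop every custom type and restore the defaults.
        cls._types = dict(cls._defaults)


class NewType(FeatureType):
    name = "new_type"


Registry.register(NewType)
names_after_register = sorted(Registry._types)
Registry.reset()
names_after_reset = sorted(Registry._types)
print(names_after_register, names_after_reset)
```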

ads.feature_engineering.accessor.dataframe_accessor module

The ADS accessor for the Pandas DataFrame. The accessor will be initialized with the pandas object the user is interacting with.

Examples

>>> import pandas as pd
>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
          Column   Feature Type                        Description
------------------------------------------------------------------
0           Name         string    Type representing string values.
1    Credit Card         string    Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
                   Credit Card
------------------------------
0             4532640527811543
class ads.feature_engineering.accessor.dataframe_accessor.ADSDataFrameAccessor(pandas_obj)

Bases: ADSFeatureTypesMixin, EDAMixin, DBAccessMixin, DataLabelingAccessMixin

ADS accessor for the Pandas DataFrame.

columns

The column labels of the DataFrame.

Type:

List[str]

tags(self) Dict[str, List[str]]

Gets the dictionary of user defined tags for the dataframe.

default_type(self) Dict[str, str]

Gets the map of columns and associated default feature type names.

feature_type(self) Dict[str, List[str]]

Gets the list of registered feature types.

feature_type_description(self) pd.DataFrame

Gets the list of registered feature types in a DataFrame format.

sync(self, src: Union[pd.DataFrame, pd.Series]) pd.DataFrame

Syncs feature types of current DataFrame with that from src.

feature_select(self, include: List[Union[FeatureType, str]] = None, exclude: List[Union[FeatureType, str]] = None) pd.DataFrame

Returns a subset of the DataFrame’s columns based on the column feature types.

help(self, prop: str = None) None

Provides docstrings for the available methods and properties.

Examples

>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
          Column   Feature Type                        Description
-------------------------------------------------------------------
0           Name         string    Type representing string values.
1    Credit Card         string    Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
                   Credit Card
------------------------------
0             4532640527811543

Initializes ADS Pandas DataFrame Accessor.

Parameters:

pandas_obj (pandas.DataFrame) – Pandas dataframe

Raises:

ValueError – If provided DataFrame has duplicate columns.

property default_type: Dict[str, str]

Gets the map of columns and associated default feature type names.

Returns:

The dictionary where key is column name and value is the name of default feature type.

Return type:

Dict[str, str]

feature_select(include: Optional[List[Union[FeatureType, str]]] = None, exclude: Optional[List[Union[FeatureType, str]]] = None) DataFrame

Returns a subset of the DataFrame’s columns based on the column feature_types.

Parameters:
  • include (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be included.

  • exclude (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be excluded.

Raises:
  • ValueError – If both include and exclude are empty.

  • ValueError – If include and exclude are used simultaneously.

Returns:

The subset of the frame including the feature types in include and excluding the feature types in exclude.

Return type:

pandas.DataFrame
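
The include and exclude semantics can be sketched on a plain mapping of column names to feature type names. The helper `select_columns` is hypothetical, and its two checks simply follow the Raises section as written above; the real method operates on the DataFrame itself.

```python
from typing import Dict, List, Optional


def select_columns(
    feature_types: Dict[str, List[str]],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Return the column names selected by feature type (illustrative helper)."""
    if not include and not exclude:
        raise ValueError("At least one of include or exclude must be non-empty.")
    if include and exclude:
        raise ValueError("include and exclude cannot be used simultaneously.")
    if include:
        wanted = set(include)
        # Keep columns that carry at least one of the requested types.
        return [col for col, types in feature_types.items() if wanted & set(types)]
    unwanted = set(exclude)
    # Keep columns that carry none of the excluded types.
    return [col for col, types in feature_types.items() if not unwanted & set(types)]


feature_types = {"Name": ["string"], "Credit Card": ["credit_card", "string"]}
included = select_columns(feature_types, include=["credit_card"])
excluded = select_columns(feature_types, exclude=["credit_card"])
print(included, excluded)
```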

property feature_type: Dict[str, List[str]]

Gets the list of registered feature types.

Returns:

The dictionary where key is column name and value is list of associated feature type names.

Return type:

Dict[str, List[str]]

property feature_type_description: DataFrame

Gets the list of registered feature types in a DataFrame format.

Return type:

pandas.DataFrame

Examples

>>> df.ads.feature_type_description
          Column   Feature Type                         Description
-------------------------------------------------------------------
0           City         string    Type representing string values.
1   Phone Number         string    Type representing string values.
info() Any

Gets information about the dataframe.

Returns:

The information about the dataframe.

Return type:

Any

model_schema(max_col_num: int = 2000)

Generates schema from the dataframe.

Parameters:

max_col_num (int, optional) – Defaults to 2000. The maximum number of columns the data may have for the schema to be generated automatically.

Examples

>>> df = pd.read_csv('./orcl_attrition.csv', usecols=['Age', 'Attrition'])
>>> schema = df.ads.model_schema()
>>> schema
Schema:
    - description: Attrition
    domain:
        constraints: []
        stats:
        count: 1470
        unique: 2
        values: String
    dtype: object
    feature_type: String
    name: Attrition
    required: true
    - description: Age
    domain:
        constraints: []
        stats:
        25%: 31.0
        50%: 37.0
        75%: 44.0
        count: 1470.0
        max: 61.0
        mean: 37.923809523809524
        min: 19.0
        std: 9.135373489136732
        values: Integer
    dtype: int64
    feature_type: Integer
    name: Age
    required: true
>>> schema.to_dict()
{'Schema': [{'dtype': 'object',
    'feature_type': 'String',
    'name': 'Attrition',
    'domain': {'values': 'String',
        'stats': {'count': 1470, 'unique': 2},
        'constraints': []},
    'required': True,
    'description': 'Attrition'},
    {'dtype': 'int64',
    'feature_type': 'Integer',
    'name': 'Age',
    'domain': {'values': 'Integer',
        'stats': {'count': 1470.0,
        'mean': 37.923809523809524,
        'std': 9.135373489136732,
        'min': 19.0,
        '25%': 31.0,
        '50%': 37.0,
        '75%': 44.0,
        'max': 61.0},
        'constraints': []},
    'required': True,
    'description': 'Age'}]}
Returns:

The data schema.

Return type:

ads.feature_engineering.schema.Schema

Raises:

ads.feature_engineering.schema.DataSizeTooWide – If the number of columns of input data exceeds max_col_num.

sync(src: Union[DataFrame, Series]) DataFrame

Syncs feature types of current DataFrame with that from src.

Syncs feature types of current dataframe with that from src, where src can be a dataframe or a series. In either case, only columns with matched names are synced.

Parameters:

src (pd.DataFrame | pd.Series) – The source to sync from.

Returns:

Synced dataframe.

Return type:

pandas.DataFrame
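
The rule that only columns with matched names are synced can be illustrated on plain dictionaries. The function `sync_feature_types` is a hypothetical stand-in; the real method works on pandas objects and their attached feature types.

```python
from typing import Dict, List


def sync_feature_types(
    current: Dict[str, List[str]], source: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Copy feature types from source for columns present in both mappings."""
    return {
        col: list(source[col]) if col in source else list(types)
        for col, types in current.items()
    }


current = {"Name": ["string"], "Age": ["integer"]}
source = {"Name": ["name", "string"], "City": ["string"]}
synced = sync_feature_types(current, source)
print(synced)
```

In this sketch "Name" takes the types from the source, "Age" keeps its own types because the source has no such column, and "City" is ignored because the current mapping has no such column.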

property tags: Dict[str, List[str]]

Gets the dictionary of user defined tags for the dataframe. Key is column name and value is list of tag names.

Returns:

The map of columns and associated default tags.

Return type:

Dict[str, List[str]]

ads.feature_engineering.accessor.series_accessor module

The ADS accessor for the Pandas Series. The accessor will be initialized with the pandas object the user is interacting with.

Examples

>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0         string    Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']
class ads.feature_engineering.accessor.series_accessor.ADSSeriesAccessor(pandas_obj: Series)

Bases: ADSFeatureTypesMixin, EDAMixinSeries

ADS accessor for Pandas Series.

name

The name of Series.

Type:

str

tags

The list of tags for the Series.

Type:

List[str]

help(self, prop: str = None) None

Provides docstrings for the available methods and properties.

sync(self, src: Union[pd.DataFrame, pd.Series]) None

Syncs feature types of current series with that from src.

default_type(self) str

Gets the name of default feature type for the series.

feature_type(self) List[str]

Gets the list of registered feature types for the series.

feature_type_description(self) pd.DataFrame

Gets the list of registered feature types in a DataFrame format.

Examples

>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0         string    Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']

Initializes ADS Pandas Series Accessor.

Parameters:

pandas_obj (pd.Series) – The pandas series

property default_type: str

Gets the name of default feature type for the series.

Returns:

The name of default feature type.

Return type:

str

property feature_type: List[str]

Gets the list of registered feature types for the series.

Returns:

Names of feature types.

Return type:

List[str]

Examples

>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('tag for name')]
>>> series.ads.feature_type
['name', 'string', 'tag for name']
property feature_type_description: DataFrame

Gets the list of registered feature types in a DataFrame format.

Returns:

The DataFrame with feature types for this series.

Return type:

pd.DataFrame

Examples

>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('Name tag')]
>>> series.ads.feature_type_description
        Feature Type                               Description
    ----------------------------------------------------------
    0           name            Type representing name values.
    1         string          Type representing string values.
    2        Name tag                                     Tag.
sync(src: Union[DataFrame, Series]) None

Syncs feature types of current series with that from src.

The src could be a dataframe or a series. In either case, only columns with matched names are synced.

Parameters:

src ((pd.DataFrame | pd.Series)) – The source to sync from.

Returns:

Nothing.

Return type:

None

Examples

>>> series = pd.Series(['name1', 'name2', 'name3', None])
>>> series.ads.feature_type = ['name']
>>> series.ads.feature_type
['name', 'string']
>>> series.dropna().ads.feature_type
['string']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['name', 'string']
class ads.feature_engineering.accessor.series_accessor.ADSSeriesValidator(feature_type_list: List[FeatureType], series: Series)

Bases: object

Helper class that invokes registered validators at the series level.

Initializes ADS series validator.

Parameters:
  • feature_type_list (List[FeatureType]) – The list of feature types.

  • series (pd.Series) – The pandas series.

ads.feature_engineering.accessor.mixin.correlation module

ads.feature_engineering.accessor.mixin.correlation.cat_vs_cat(df: DataFrame, normal_form: bool = True) DataFrame

Calculates the correlation of all pairs of categorical features.

ads.feature_engineering.accessor.mixin.correlation.cat_vs_cont(df: DataFrame, categorical_columns, continuous_columns, normal_form: bool = True) DataFrame

Calculates the correlation of all pairs of categorical features and continuous features.

ads.feature_engineering.accessor.mixin.correlation.cont_vs_cont(df: DataFrame, normal_form: bool = True) DataFrame

Calculates the Pearson correlation of all pairs of continuous features in the DataFrame.

ads.feature_engineering.accessor.mixin.eda_mixin module

This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Dataframe. The series of purpose-driven methods enable the data scientist to complete analysis on the dataframe.

From the accessor we have access to the pandas object the user is interacting with as well as corresponding lists of feature types per column.

class ads.feature_engineering.accessor.mixin.eda_mixin.EDAMixin

Bases: object

correlation_ratio() DataFrame

Generate a Correlation Ratio data frame for all categorical-continuous variable pairs.

Returns:

  • pandas.DataFrame

  • Correlation Ratio correlation data frame with the following 3 columns

    1. Column 1 (name of the first categorical/continuous column)

    2. Column 2 (name of the second categorical/continuous column)

    3. Value (correlation value)

Note

Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
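
For reference, the correlation ratio between a categorical and a continuous variable is commonly defined as the square root of the between-group sum of squares divided by the total sum of squares. The sketch below implements that textbook definition with the standard library; whether the library reports this value or a variant of it is not stated here, so treat the function as an illustration only.

```python
from collections import defaultdict
from math import sqrt
from typing import Hashable, Sequence


def correlation_ratio(categories: Sequence[Hashable], values: Sequence[float]) -> float:
    """Textbook correlation ratio (eta) for one categorical/continuous pair."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    grand_mean = sum(values) / len(values)
    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    ss_total = sum((value - grand_mean) ** 2 for value in values)
    return sqrt(ss_between / ss_total) if ss_total else 0.0


# Groups fully determine the value: eta is 1.
eta_separated = correlation_ratio(["a", "a", "b", "b"], [1.0, 1.0, 5.0, 5.0])
# Group means are identical: eta is 0.
eta_mixed = correlation_ratio(["a", "a", "b", "b"], [1.0, 5.0, 1.0, 5.0])
print(eta_separated, eta_mixed)
```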

correlation_ratio_plot() Axes

Generate a heatmap of the Correlation Ratio correlation for all categorical-continuous variable pairs.

Returns:

Correlation Ratio correlation plot object that can be updated by the customer

Return type:

Plot object

cramersv() DataFrame

Generate a Cramer’s V correlation data frame for all categorical variable pairs.

Gives a warning for dropped non-categorical columns.

Returns:

Cramer’s V correlation data frame with the following 3 columns:
  1. Column 1 (name of the first categorical column)

  2. Column 2 (name of the second categorical column)

  3. Value (correlation value)

Return type:

pandas.DataFrame

Note

Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
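
As background, Cramér's V is derived from the chi-squared statistic of the contingency table of two categorical variables. The following sketch computes the uncorrected form with the standard library; the library may apply a bias correction or other adjustments that are not described here, so the numbers are not guaranteed to match its output.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Uncorrected Cramér's V for two categorical sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for r, row_total in row_totals.items():
        for c, col_total in col_totals.items():
            expected = row_total * col_total / n
            # Counter returns 0 for combinations that never occur.
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


v_dependent = cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"])
v_independent = cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(v_dependent, v_independent)
```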

cramersv_plot() Axes

Generate a heatmap of the Cramer’s V correlation for all categorical variable pairs.

Gives a warning for dropped non-categorical columns.

Returns:

Cramer’s V correlation plot object that can be updated by the customer

Return type:

Plot object

feature_count() DataFrame

Counts the number of columns for each feature type and for each primary feature type. The Primary column gives the number of columns for which that feature type is the primary one.

Returns:

DataFrame with the number of columns for each feature type and the number of columns for each primary feature type.

Return type:

pandas.DataFrame

Examples

>>> df.ads.feature_type
{'PassengerId': ['ordinal', 'category'],
'Survived': ['ordinal'],
'Pclass': ['ordinal'],
'Name': ['category'],
'Sex': ['category']}
>>> df.ads.feature_count()
    Feature Type        Count       Primary
0       category            3             2
1        ordinal            3             3
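
The counts in the example above can be reproduced from the `feature_type` dictionary alone, assuming that the primary feature type of a column is the first entry in its list. That assumption matches the example output but is not stated explicitly in this page.

```python
from collections import Counter

feature_type = {
    "PassengerId": ["ordinal", "category"],
    "Survived": ["ordinal"],
    "Pclass": ["ordinal"],
    "Name": ["category"],
    "Sex": ["category"],
}

# Count: number of columns that list the feature type anywhere.
count = Counter(name for types in feature_type.values() for name in types)
# Primary: number of columns that list the feature type first (assumption).
primary = Counter(types[0] for types in feature_type.values() if types)

rows = [(name, count[name], primary[name]) for name in sorted(count)]
print(rows)
```
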
feature_plot() DataFrame

For every column in the DataFrame, generates a summary plot based on the column’s most relevant feature type.

Returns:

DataFrame with the following 2 columns:
  1. Column (feature name)

  2. Plot (plot object)

Return type:

pandas.DataFrame

feature_stat() DataFrame

Provides a DataFrame of summary statistics.

This returns feature stats on each column using FeatureType summary method.

Examples

>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df.ads.feature_stat().head()
        Column              Metric    Value
0  PassengerId               count  891.000
1  PassengerId                mean  446.000
2  PassengerId  standard deviation  257.354
3  PassengerId      sample minimum    1.000
4  PassengerId      lower quartile  223.500
Returns:

DataFrame with 3 columns: Column, Metric, Value.

Return type:

pandas.DataFrame

pearson() DataFrame

Generate a Pearson correlation data frame for all continuous variable pairs.

Gives a warning for dropped non-numerical columns.

Returns:

  • pandas.DataFrame

  • Pearson correlation data frame with the following 3 columns

    1. Column 1 (name of the first continuous column)

    2. Column 2 (name of the second continuous column)

    3. Value (correlation value)

Note

Pairs will be replicated. For example for variables x and y, we’d have (x,y), (y,x) both with same correlation value. We’ll also have (x,x) and (y,y) with value 1.0.
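
The long format described in the note, with both orderings of every pair plus the self-pairs, can be sketched with the standard library. The helper `pearson_r` and the sample data are illustrative and are not the library's code.

```python
from itertools import product
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return float("nan")  # undefined for constant columns
    return cov / (norm_x * norm_y)


columns = {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 9.0]}

# One row per ordered pair: (Column 1, Column 2, Value).
rows: List[Tuple[str, str, float]] = [
    (a, b, pearson_r(columns[a], columns[b]))
    for a, b in product(columns, repeat=2)
]
for row in rows:
    print(row)
```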

pearson_plot() Axes

Generate a heatmap of the Pearson correlation for all continuous variable pairs.

Returns:

Pearson correlation plot object that can be updated by the customer

Return type:

Plot object

warning() DataFrame

Generates a data frame that lists feature specific warnings.

Returns:

The list of feature specific warnings.

Return type:

pandas.DataFrame

Examples

>>> df.ads.warning()
    Column    Feature Type         Warning               Message       Metric    Value
--------------------------------------------------------------------------------------
0      Age      continuous           Zeros      Age has 38 zeros        Count       38
1      Age      continuous           Zeros   Age has 12.2% zeros   Percentage    12.2%
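
A warning handler that yields rows of this shape might look like the sketch below. The function `zeros_warning`, its row layout, and the rule that any zero triggers the warning are assumptions made for illustration; the library's own handler and thresholds are not documented in this page.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[str, int]]


def zeros_warning(name: str, values: Sequence[float]) -> List[Row]:
    """Return Count and Percentage rows when the values contain zeros."""
    zeros = sum(1 for value in values if value == 0)
    if zeros == 0:
        return []
    percentage = round(100 * zeros / len(values), 1)
    return [
        {"Warning": "Zeros", "Message": f"{name} has {zeros} zeros",
         "Metric": "Count", "Value": zeros},
        {"Warning": "Zeros", "Message": f"{name} has {percentage}% zeros",
         "Metric": "Percentage", "Value": f"{percentage}%"},
    ]


rows = zeros_warning("Age", [0, 21, 35, 0, 40, 58, 0, 19])
for row in rows:
    print(row)
```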

ads.feature_engineering.accessor.mixin.eda_mixin_series module

This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Series. The series of purpose-driven methods enable the data scientist to complete univariate analysis.

From the accessor we have access to the pandas object the user is interacting with as well as corresponding list of feature types.

class ads.feature_engineering.accessor.mixin.eda_mixin_series.EDAMixinSeries

Bases: object

feature_plot() Axes

For the series generate a summary plot based on the most relevant feature type.

Returns:

Plot object for the series based on the most relevant feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

feature_stat() DataFrame

Provides a DataFrame of summary statistics.

This returns feature stats on series using FeatureType summary method.

Examples

>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df['Cabin'].ads.feature_stat()
    Metric  Value
0    count    891
1   unique    147
2  missing    687
Returns:

Dataframe with 2 columns and rows for different metric values

Return type:

pandas.DataFrame

warning() DataFrame

Generates a data frame that lists feature specific warnings.

Returns:

The list of feature specific warnings.

Return type:

pandas.DataFrame

Examples

>>> df["Age"].ads.warning()
  Feature Type       Warning               Message         Metric      Value
----------------------------------------------------------------------------
0   continuous         Zeros      Age has 38 zeros          Count         38
1   continuous         Zeros   Age has 12.2% zeros     Percentage      12.2%

ads.feature_engineering.accessor.mixin.feature_types_mixin module

The module that represents the ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.

Classes

ADSFeatureTypesMixin

ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.

class ads.feature_engineering.accessor.mixin.feature_types_mixin.ADSFeatureTypesMixin

Bases: object

ADS Feature Types Mixin class that extends Pandas Series and DataFrame accessors.

warning_registered(cls) pd.DataFrame

Lists registered warnings for registered feature types.

validator_registered(cls) pd.DataFrame

Lists registered validators for registered feature types.

help(self, prop: str = None) None

Help method that prints either a table of available properties or, given a property, returns its docstring.

help(prop: Optional[str] = None) None

Help method that prints either a table of available properties or, given an individual property, returns its docstring.

Parameters:

prop (str, optional) – The name of the property. Defaults to None.

Returns:

Nothing.

Return type:

None

validator_registered() DataFrame

Lists registered validators for registered feature types.

Returns:

The list of registered validators for registered feature types

Return type:

pandas.DataFrame

Examples

>>> df.ads.validator_registered()
         Column     Feature Type        Validator                 Condition                    Handler
------------------------------------------------------------------------------------------------------
0   PhoneNumber    phone_number   is_phone_number                        ()            default_handler
1   PhoneNumber    phone_number   is_phone_number    {'country_code': '+7'}   specific_country_handler
2    CreditCard    credit_card     is_credit_card                        ()            default_handler
>>> df['PhoneNumber'].ads.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
warning_registered() DataFrame

Lists registered warnings for all registered feature types.

Returns:

The list of registered warnings for registered feature types.

Return type:

pandas.DataFrame

Examples

>>> df.ads.warning_registered()
       Column    Feature Type             Warning                    Handler
   -------------------------------------------------------------------------
   0      Age      continuous               zeros              zeros_handler
   1      Age      continuous    high_cardinality   high_cardinality_handler
>>> df["Age"].ads.warning_registered()
       Feature Type             Warning                    Handler
   ---------------------------------------------------------------
   0     continuous               zeros              zeros_handler
   1     continuous    high_cardinality   high_cardinality_handler

ads.feature_engineering.adsstring.common_regex_mixin module

class ads.feature_engineering.adsstring.common_regex_mixin.CommonRegexMixin

Bases: object

property address
property credit_card
property date
property email
property ip
property phone_number_US
property price
redact(fields: Union[List[str], Dict[str, str]]) str

Removes personal information from a string. For example, “Jane’s phone number is 123-456-7890” is turned into “Jane’s phone number is [PHONE_NUMBER_US].”

Parameters:

fields (list(str) | dict) – Either a list of fields to redact, e.g. [‘email’, ‘phone_number_US’], in which case each match is replaced with a capitalized placeholder such as [EMAIL] or [PHONE_NUMBER_US_WITH_EXT]; or a dictionary whose keys are the fields to redact and whose values are the replacement text, e.g. {‘email’: ‘HIDDEN_EMAIL’}.

Returns:

redacted string

Return type:

str

redact_map = {'address': '[ADDRESS]', 'address_with_zip': '[ADDRESS_WITH_ZIP]', 'credit_card': '[CREDIT_CARD]', 'date': '[DATE]', 'email': '[EMAIL]', 'ip': '[IP]', 'ipv6': '[IPV6]', 'link': '[LINK]', 'phone_number_US': '[PHONE_NUMBER_US]', 'phone_number_US_with_ext': '[PHONE_NUMBER_US_WITH_EXT]', 'po_box': '[PO_BOX]', 'price': '[PRICE]', 'ssn': '[SSN]', 'time': '[TIME]', 'zip_code': '[ZIP_CODE]'}
property ssn
property time
property zip_code
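
The two accepted forms of the fields argument of redact can be illustrated with a minimal sketch. This is not the library's implementation: the PATTERNS table and its two simplified regular expressions are assumptions made for illustration, and the standalone redact function below only mimics the list and dictionary behaviour described above.

```python
import re
from typing import Dict, List, Union

# Hypothetical, deliberately simple patterns (assumptions for this sketch only).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone_number_US": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text: str, fields: Union[List[str], Dict[str, str]]) -> str:
    """Replace each match of the requested fields with a placeholder."""
    if isinstance(fields, dict):
        # Dictionary form: the caller chooses the replacement text.
        replacements = dict(fields)
    else:
        # List form: use the upper-cased field name in square brackets.
        replacements = {name: f"[{name.upper()}]" for name in fields}
    for name, placeholder in replacements.items():
        text = PATTERNS[name].sub(placeholder, text)
    return text


print(redact("Jane's phone number is 123-456-7890", ["phone_number_US"]))
print(redact("Mail jane@example.com", {"email": "HIDDEN_EMAIL"}))
```

The list form is convenient when the default placeholders are acceptable; the dictionary form is useful when downstream code expects specific replacement tokens.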

ads.feature_engineering.adsstring.oci_language module

ads.feature_engineering.adsstring.string module

ads.feature_engineering.feature_type.address module

The module that represents an Address feature type.

Classes:
Address

The Address feature type.

class ads.feature_engineering.feature_type.address.Address

Bases: String

Type representing address.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the locations of the given addresses on a map, based on zip code.

Example

>>> from ads.feature_engineering.feature_type.address import Address
>>> import pandas as pd
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
                        '1 Berkeley Street, Boston, MA 67891',
                        '54305 Oxford Street, Seattle, WA 95132',
                        ''])
>>> Address.validator.is_address(address)
0     True
1     True
2     True
3    False
dtype: bool
description = 'Type representing address.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 1
    unique: 3
values: Address
Returns:

Domain based on the Address feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the locations of the given addresses on a map, based on zip code.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_plot()
Returns:

Plot object for the series based on the Address feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      3
2  missing      1
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame
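
The three basic statistics can be reproduced with a short, library-independent sketch. It assumes, based on the example output above, that None, NaN and empty strings count as missing and that unique counts distinct non-missing values; the helper names is_missing and basic_stats are hypothetical and are not part of ADS.

```python
import math


def is_missing(value) -> bool:
    """Treat None, NaN and empty strings as missing (assumption from the examples)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def basic_stats(values) -> dict:
    """Return total count, distinct non-missing values, and missing count."""
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),
        "unique": len(set(present)),
        "missing": len(values) - len(present),
    }


addresses = [
    "1 Miller Drive, New York, NY 12345",
    "1 Berkeley Street, Boston, MA 67891",
    "54305 Oxford Street, Seattle, WA 95132",
    "",
]
print(basic_stats(addresses))
```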

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.address.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pd.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.base module

class ads.feature_engineering.feature_type.base.FeatureBaseType(classname, bases, dictionary)

Bases: type

The helper metaclass that extends the functionality of the FeatureType class.

class ads.feature_engineering.feature_type.base.FeatureBaseTypeMeta(classname, bases, dictionary)

Bases: FeatureBaseType, ABCMeta

The class to provide compatibility between ABC and FeatureBaseType metaclass.

class ads.feature_engineering.feature_type.base.FeatureType

Bases: ABC

Abstract base class for feature types. Default class attributes include name and description. Unless specified explicitly, the name is generated automatically by converting the class name from camel case to snake case.
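
The camel-to-snake conversion of class names can be sketched as follows. The function camel_to_snake is a hypothetical helper using a common two-step regular-expression approach; the library's own conversion may differ in edge cases.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase class name to snake_case."""
    # Insert "_" before a capitalized word that follows any character:
    # "DateTime" -> "Date_Time".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Insert "_" between a lowercase letter or digit and an uppercase letter.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


for class_name in ["NewType", "DateTime", "CreditCard", "GIS"]:
    print(class_name, "->", camel_to_snake(class_name))
```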

description = 'Base feature type.'
name = 'feature_type'
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
class ads.feature_engineering.feature_type.base.Name

Bases: object

class ads.feature_engineering.feature_type.base.Tag(name: str)

Bases: object

Class for free form tags. Name must be specified.

Initialize a tag instance.

Parameters:

name (str) – The name of the tag.

ads.feature_engineering.feature_type.boolean module

The module that represents a Boolean feature type.

Classes:
Boolean

The feature type that represents binary values True/False.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.boolean.Boolean

Bases: FeatureType

Type representing binary values True/False.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in True/False using bars.

Examples

>>> from ads.feature_engineering.feature_type.boolean import Boolean
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> Boolean.validator.is_boolean(s)
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool
description = 'Type representing binary values True/False.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_domain()
constraints:
- expression: $x in [True, False]
    language: python
stats:
    count: 6
    missing: 2
    unique: 2
values: Boolean
Returns:

Domain based on the Boolean feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the counts of observations in True/False using bars.

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Plot object for the series based on the Boolean feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_plot()
static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.boolean.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
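
The element-wise contract of a validator handler (one flag per input value) can be sketched without pandas. The function below is a hypothetical stand-in that reproduces the is_boolean output shown earlier in this module; it is an assumption about the behaviour, not the library's code.

```python
def is_boolean(values):
    """Return one flag per element: True only for genuine bool values."""
    return [isinstance(v, bool) for v in values]


flags = is_boolean([True, False, True, False, float("nan"), None])
print(flags)
```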

ads.feature_engineering.feature_type.category module

The module that represents a Category feature type.

Classes:
Category

The Category feature type.

class ads.feature_engineering.feature_type.category.Category

Bases: FeatureType

Type representing discrete unordered values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing discrete unordered values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_domain()
constraints:
- expression: $x in ['S', 'C', 'Q', '']
    language: python
stats:
    count: 22
    missing: 3
    unique: 3
values: Category
Returns:

Domain based on the Category feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the counts of observations in each categorical bin using bar chart.

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Plot object for the series based on the Category feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_plot()
static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there are any.

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.constant module

The module that represents a Constant feature type.

Classes:
Constant

The Constant feature type.

class ads.feature_engineering.feature_type.constant.Constant

Bases: FeatureType

Type representing constant values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in bars.

description = 'Type representing constant values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 5
    unique: 1
values: Constant
Returns:

Domain based on the Constant feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the counts of observations in bars.

Parameters:

x (pandas.Series) – The feature being shown.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_plot()
Returns:

Plot object for the series based on the Constant feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_stat()
   Metric  Value
0   count      5
1  unique      1
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.continuous module

The module that represents a Continuous feature type.

Classes:
Continuous

The Continuous feature type.

class ads.feature_engineering.feature_type.continuous.Continuous

Bases: FeatureType

Type representing continuous values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using box plot.

description = 'Type representing continuous values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_domain()
constraints: []
stats:
    count: 10.0
    lower quartile: 3.058
    mean: 4.959
    median: 3.81
    missing: 2.0
    sample maximum: 13.32
    sample minimum: 2.25
    skew: 2.175
    standard deviation: 3.62
    upper quartile: 4.908
values: Continuous
Returns:

Domain based on the Continuous feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datasets using box plot.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_plot()
Returns:

Plot object for the series based on the Continuous feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, skew and missing(count).

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_stat()
               Metric   Value
0               count  10.000
1                mean   4.959
2  standard deviation   3.620
3      sample minimum   2.250
4      lower quartile   3.058
5              median   3.810
6      upper quartile   4.908
7      sample maximum  13.320
8                skew   2.175
9             missing   2.000
Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame
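
Most of the continuous statistics can be recomputed with the Python standard library, which helps when checking the figures in the example above. The sketch assumes missing values are excluded before the numeric summaries are computed, that quartiles use linear interpolation, and that the standard deviation is the sample standard deviation; continuous_stats is a hypothetical helper, and skew is omitted.

```python
import math
import statistics


def continuous_stats(values) -> dict:
    """Summarize a list of floats, ignoring None and NaN for the numeric metrics."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    # "inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(present),
        "standard deviation": statistics.stdev(present),
        "sample minimum": min(present),
        "lower quartile": q1,
        "median": median,
        "upper quartile": q3,
        "sample maximum": max(present),
        "missing": len(values) - len(present),
    }


data = [13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, float("nan"), None]
for metric, value in continuous_stats(data).items():
    print(f"{metric}: {value:.3f}")
```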

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.creditcard module

The module that represents a CreditCard feature type.

Classes:
CreditCard

The CreditCard feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

_luhn_checksum(card_number: str) -> float

Implements Luhn algorithm to validate a credit card number.
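
For readers unfamiliar with the Luhn algorithm, a minimal standalone sketch follows. The function name luhn_is_valid and its boolean return value are assumptions made for illustration; the private _luhn_checksum helper listed above may be structured differently.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if the digits of card_number pass the Luhn check."""
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                # Equivalent to summing the two digits of the product.
                digit -= 9
        total += digit
    # A valid number has a digit sum divisible by 10.
    return total % 10 == 0


print(luhn_is_valid("4532640527811543"))
print(luhn_is_valid("79927398710"))
```

The check only detects accidental digit errors; it does not confirm that a card number has actually been issued.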

class ads.feature_engineering.feature_type.creditcard.CreditCard

Bases: String

Type representing credit card numbers.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in each credit card type using bar chart.

Examples

>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> import pandas as pd
>>> s = pd.Series(["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"], name='credit_card')
>>> s.ads.feature_type = ['credit_card']
>>> CreditCard.validator.is_credit_card(s)
0     True
1    False
2     True
3     True
4     True
5     True
Name: credit_card, dtype: bool
description = 'Type representing credit card numbers.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_domain()
constraints: []
stats:
    count: 16
    count_Amex: 5
    count_Diners Club: 2
    count_MasterCard: 3
    count_Visa: 5
    count_missing: 1
    missing: 1
    unique: 15
values: CreditCard
Returns:

Domain based on the CreditCard feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the counts of observations in each credit card type using bar chart.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_plot()
Returns:

Plot object for the series based on the CreditCard feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series)

Generates feature statistics.

Feature statistics include (total)count, unique(count), missing(count) and count of each credit card type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_stat()
              Metric  Value
0              count     16
1             unique     15
2            missing      1
3         count_Amex      5
4         count_Visa      5
5   count_MasterCard      3
6  count_Diners Club      2
7      count_missing      1
Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.creditcard.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.datetime module

The module that represents a DateTime feature type.

Classes:
DateTime

The DateTime feature type.

class ads.feature_engineering.feature_type.datetime.DateTime

Bases: FeatureType

Type representing date and/or time.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datetime datasets using histograms.

Example

>>> from ads.feature_engineering.feature_type.datetime import DateTime
>>> import pandas as pd
>>> s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> DateTime.validator.is_datetime(s)
0     True
1     True
2    False
3     True
Name: datetime, dtype: bool
description = 'Type representing date and/or time.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 8
    missing: 3
    sample maximum: April/15/11
    sample minimum: 3/11/2000
values: DateTime
Returns:

Domain based on the DateTime feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datetime datasets using histograms.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_plot()
Returns:

Plot object for the series based on the DateTime feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, sample maximum, sample minimum, and missing(count) if there are any.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_stat()
           Metric        Value
0           count            8
1  sample maximum  April/15/11
2  sample minimum    3/11/2000
3         missing            3
Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.datetime.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
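
A date/time validator of this kind can be approximated by attempting to parse each value. The sketch below uses only the standard library; the FORMATS tuple is a hypothetical, deliberately short list of layouts chosen to cover the example values in this module, and a real parser would likely accept many more.

```python
from datetime import datetime

# Hypothetical list of accepted layouts (assumption for this sketch only).
FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%B/%d/%Y", "%B/%d/%y")


def is_datetime(value) -> bool:
    """Return True if value is a string that parses with one of FORMATS."""
    if not isinstance(value, str):
        return False
    for fmt in FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            # Try the next layout.
            continue
    return False


values = ["12/12/12", "12/12/13", None, "12/12/14"]
print([is_datetime(v) for v in values])
```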

ads.feature_engineering.feature_type.discrete module

The module that represents a Discrete feature type.

Classes:
Discrete

The Discrete feature type.

class ads.feature_engineering.feature_type.discrete.Discrete

Bases: FeatureType

Type representing discrete values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using box plot.

description = 'Type representing discrete values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_domain()
constraints: []
stats:
    count: 4
    unique: 4
values: Discrete
Returns:

Domain based on the Discrete feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datasets using box plot.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_plot()
Returns:

Plot object for the series based on the Discrete feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_stat()
            discrete
count   4
unique  4
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.document module

The module that represents a Document feature type.

Classes:
Document

The Document feature type.

class ads.feature_engineering.feature_type.document.Document

Bases: FeatureType

Type representing document values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

description = 'Type representing document values.'
classmethod feature_domain()
Returns:

Nothing.

Return type:

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.gis module

The module that represents a GIS feature type.

Classes:
GIS

The GIS feature type.

class ads.feature_engineering.feature_type.gis.GIS

Bases: FeatureType

Type representing geographic information.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the given locations on a map, based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.gis import GIS
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='gis')
>>> s.ads.feature_type = ['gis']
>>> GIS.validator.is_gis(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: gis, dtype: bool
description = 'Type representing geographic information.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: GIS
Returns:

Domain based on the GIS feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the given locations on a map, based on longitude and latitude.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_plot()
Returns:

Plot object for the series based on the GIS feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_stat()
        gis
count   13
unique  10
missing 3
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame
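The three statistics above can be reproduced outside of ADS with the standard library alone. The sketch below is only an illustration, not the library's implementation: the helper names `is_missing` and `basic_stats` are hypothetical, and treating the empty string as missing is an assumption inferred from the example output above (13 values, 3 missing).

```python
import math


def is_missing(value):
    """Return True for None, NaN and the empty string (assumed definition of 'missing')."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == ""


def basic_stats(values):
    """Compute total count, number of distinct non-missing values, and missing count."""
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),
        "unique": len(set(present)),
        "missing": len(values) - len(present),
    }


gis_values = [
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    float("nan"),
    None,
]
stats = basic_stats(gis_values)
print(stats)
```

With the same thirteen values as the example, the sketch yields the same three numbers as the documented output.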

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.gis.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.integer module

The module that represents an Integer feature type.

Classes:
Integer

The Integer feature type.

class ads.feature_engineering.feature_type.integer.Integer

Bases: FeatureType

Type representing integer values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using box plot.

description = 'Type representing integer values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='integer')
>>> s.ads.feature_type = ['integer']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    freq: 2
    missing: 2
    top: true
    unique: 2
values: Integer
Returns:

Domain based on the Integer feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datasets using box plot.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_plot()
Returns:

Plot object for the series based on the Integer feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum and missing(count) if there is any.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_stat()
    Metric              Value
0   count                   7
1   mean                    1
2   standard deviation      1
3   sample minimum          0
4   lower quartile          1
5   median                  1
6   upper quartile          2
7   sample maximum          4
8   missing                 1
Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame
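To make the listed metrics concrete, here is a minimal standard-library sketch that computes them for the same input. It is an illustration under stated assumptions, not the ADS implementation: `numeric_summary` is a hypothetical helper, quartiles use linear interpolation (`method="inclusive"`), and the values are left unrounded, whereas the example output above appears to show whole numbers.

```python
import math
import statistics


def numeric_summary(values):
    """Summarize a list of numbers, ignoring None and NaN for everything except 'count'."""
    present = sorted(v for v in values if v is not None and not math.isnan(v))
    # Quartiles with linear interpolation between the closest ranks.
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(present),
        "standard deviation": statistics.stdev(present),  # sample standard deviation
        "sample minimum": present[0],
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": present[-1],
        "missing": len(values) - len(present),
    }


summary = numeric_summary([1, 0, 1, 2, 3, 4, float("nan")])
for metric, value in summary.items():
    print(f"{metric:<20}{value}")
```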

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address module

The module that represents an IpAddress feature type.

Classes:
IpAddress

The IpAddress feature type.

class ads.feature_engineering.feature_type.ip_address.IpAddress

Bases: FeatureType

Type representing IP Address.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address import IpAddress
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> IpAddress.validator.is_ip_address(s)
0     True
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 3
values: IpAddress
Returns:

Domain based on the IpAddress feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  3
2       missing 2
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.ip_address.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.ip_address_v4 module

The module that represents an IpAddressV4 feature type.

Classes:
IpAddressV4

The IpAddressV4 feature type.

class ads.feature_engineering.feature_type.ip_address_v4.IpAddressV4

Bases: FeatureType

Type representing IP Address V4.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v4 import IpAddressV4
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> IpAddressV4.validator.is_ip_address_v4(s)
0     True
1    False
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address V4.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.NaN, None], name='ip_address_v4')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 4
values: IpAddressV4
Returns:

Domain based on the IpAddressV4 feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  4
2       missing 2
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.ip_address_v4.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.ip_address_v6 module

The module that represents an IpAddressV6 feature type.

Classes:
IpAddressV6

The IpAddressV6 feature type.

class ads.feature_engineering.feature_type.ip_address_v6.IpAddressV6

Bases: FeatureType

Type representing IP Address V6.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v6 import IpAddressV6
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> IpAddressV6.validator.is_ip_address_v6(s)
0    False
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address V6.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address_v6')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 2
values: IpAddressV6
Returns:

Domain based on the IpAddressV6 feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  2
2       missing 2
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.ip_address_v6.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
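The element-wise results shown in the `is_ip_address`, `is_ip_address_v4` and `is_ip_address_v6` examples can be approximated with Python's standard `ipaddress` module. This is a sketch only: the function `looks_like_ip` and its `version` parameter are hypothetical, and whether ADS uses `ipaddress` or a regular expression internally is not stated in this reference.

```python
import ipaddress


def looks_like_ip(value, version=None):
    """Return True if value is a string holding a valid IP address.

    version=None accepts IPv4 and IPv6; version=4 or version=6 restricts the check.
    Non-string values (None, NaN, numbers) are rejected, which is an assumption.
    """
    if not isinstance(value, str):
        return False
    try:
        parsed = ipaddress.ip_address(value)
    except ValueError:
        return False
    return version is None or parsed.version == version


samples = ["192.168.0.1", "2001:db8::", "", float("nan"), None]
any_version = [looks_like_ip(v) for v in samples]
only_v4 = [looks_like_ip(v, version=4) for v in samples]
only_v6 = [looks_like_ip(v, version=6) for v in samples]
print(any_version, only_v4, only_v6, sep="\n")
```

For the five sample values used in the examples above, the three lists match the documented boolean outputs.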

ads.feature_engineering.feature_type.lat_long module

The module that represents a LatLong feature type.

Classes:
LatLong

The LatLong feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.lat_long.LatLong

Bases: String

Type representing longitude and latitude.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.lat_long import LatLong
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='latlong')
>>> s.ads.feature_type = ['lat_long']
>>> LatLong.validator.is_lat_long(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: latlong, dtype: bool
description = 'Type representing longitude and latitute.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> latlong_series = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: LatLong
Returns:

Domain based on the LatLong feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.NaN,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_plot()
Returns:

Plot object for the series based on the LatLong feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.NaN,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_stat()
    Metric  Value
0       count   13
1       unique  10
2       missing 3
Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.lat_long.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
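As an illustration of what such a handler might check, the sketch below validates "latitude, longitude" strings with a regular expression and a range test. Everything here is an assumption made for the example: the helper name `looks_like_lat_long`, the accepted syntax (optional surrounding parentheses and optional whitespace, as seen in the `LatLong` example above), and the bounds of ±90 and ±180 degrees. The actual pattern used by ADS may differ.

```python
import re

# Two signed decimal numbers separated by a comma, with optional whitespace.
_PAIR = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)\s*,\s*([+-]?\d+(?:\.\d+)?)\s*")


def looks_like_lat_long(value):
    """Return True if value is a string such as '51.8, 175.9' or '(51.8,175.9)'."""
    if not isinstance(value, str):
        return False
    text = value.strip()
    # Accept one pair of surrounding parentheses.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    match = _PAIR.fullmatch(text)
    if match is None:
        return False
    latitude, longitude = float(match.group(1)), float(match.group(2))
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


samples = [
    "-18.2193965, -93.587285",
    "-21.0255305, -122.478584",
    "85.103913, 19.405744",
    "82.913736, 178.225672",
    "62.9795085,-66.989705",
    "54.5604395,95.235090",
    "24.2811855,-162.380403",
    "-1.818319,-80.681214",
    None,
    "(51.816119, 175.979008)",
    "(54.3392995,-11.801615)",
]
flags = [looks_like_lat_long(v) for v in samples]
print(flags)
```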

ads.feature_engineering.feature_type.object module

The module that represents an Object feature type.

Classes:
Object

The Object feature type.

class ads.feature_engineering.feature_type.object.Object

Bases: FeatureType

Type representing object.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

description = 'Type representing object.'
classmethod feature_domain()
Returns:

Nothing.

Return type:

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ordinal module

The module that represents an Ordinal feature type.

Classes:
Ordinal

The Ordinal feature type.

class ads.feature_engineering.feature_type.ordinal.Ordinal

Bases: FeatureType

Type representing ordered values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing ordered values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_domain()
constraints:
- expression: $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    language: python
stats:
    count: 10
    missing: 1
    unique: 9
values: Ordinal
Returns:

Domain based on the Ordinal feature type.

Return type:

ads.feature_engineering.schema.Domain
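The constraint in the example output is a Python expression with a `$x` placeholder. The sketch below shows one way such an expression could be built from the distinct non-missing values and then checked against a candidate value. It is not the ADS implementation: `build_constraint` and `satisfies` are hypothetical helpers, and substituting `$x` with `string.Template` before evaluating is an assumption about how the placeholder is meant to be used.

```python
import math
from string import Template


def build_constraint(values):
    """Build a '$x in [...]' expression from the sorted distinct non-missing values."""
    present = sorted({float(v) for v in values if v is not None and not math.isnan(v)})
    return f"$x in {present}"


def satisfies(expression, value):
    """Substitute the value for $x and evaluate the resulting Python expression.

    eval() is used with builtins disabled; only use this on trusted expressions.
    """
    rendered = Template(expression).substitute(x=repr(float(value)))
    return bool(eval(rendered, {"__builtins__": {}}, {}))


expression = build_constraint([1, 2, 3, 4, 5, 6, 7, 8, 9, float("nan")])
print(expression)
print(satisfies(expression, 3), satisfies(expression, 10))
```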

static feature_plot(x: Series) Axes

Shows the counts of observations in each categorical bin using bar chart.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_plot()
Returns:

The bar chart plot object for the series based on the Ordinal feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count), and missing(count) if there is any.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_stat()
    Metric  Value
0       count   10
1       unique  9
2       missing 1
Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.phone_number module

The module that represents a Phone Number feature type.

Classes:
PhoneNumber

The Phone Number feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.phone_number.PhoneNumber

Bases: String

Type representing phone numbers.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

Examples

>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> import pandas as pd
>>> s = pd.Series([None, "1-640-124-5367", "1-573-916-4412"])
>>> PhoneNumber.validator.is_phone_number(s)
0    False
1     True
2     True
dtype: bool
description = 'Type representing phone numbers.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.NaN, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 7
    missing: 4
    unique: 2
values: PhoneNumber
Returns:

Domain based on the PhoneNumber feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.NaN, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_stat()
    Metric  Value
0       count   7
1       unique  2
2       missing 4
Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.phone_number.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
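For orientation, a handler of this kind can be as small as a single regular-expression check. The pattern below is purely an assumption chosen so that the sample values in this section pass (North American style numbers with an optional country prefix); it is not taken from ADS, and `looks_like_phone_number` is a hypothetical name.

```python
import re

# Optional country prefix (e.g. "+1-" or "1-"), then 3-3-4 digits with optional separators.
_PHONE = re.compile(r"(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def looks_like_phone_number(value):
    """Return True if value is a string matching the assumed phone number pattern."""
    return isinstance(value, str) and _PHONE.fullmatch(value) is not None


samples = [None, "1-640-124-5367", "1-573-916-4412"]
flags = [looks_like_phone_number(v) for v in samples]
print(flags)
```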

ads.feature_engineering.feature_type.string module

The module that represents a String feature type.

Classes:
String

The feature type that represents string values.

class ads.feature_engineering.feature_type.string.String

Bases: FeatureType

Type representing string values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using wordcloud.

Example

>>> from ads.feature_engineering.feature_type.string import String
>>> import pandas as pd
>>> s = pd.Series(["Hello", "world", None], name='string')
>>> String.validator.is_string(s)
0     True
1     True
2    False
Name: string, dtype: bool
description = 'Type representing string values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_domain()
constraints: []
stats:
    count: 22
    missing: 3
    unique: 3
values: String
Returns:

Domain based on the String feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datasets using wordcloud.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_plot()
Returns:

Plot object for the series based on the String feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_stat()
    Metric  Value
0       count   22
1       unique  3
2       missing 3
Returns:

Summary statistics of the Series or Dataframe provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.string.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pd.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pd.Series

ads.feature_engineering.feature_type.text module

The module that represents a Text feature type.

Classes:
Text

The Text feature type.

class ads.feature_engineering.feature_type.text.Text

Bases: String

Type representing text values.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using wordcloud.

description = 'Type representing text values.'
classmethod feature_domain()
Returns:

Nothing.

Return type:

None

static feature_plot(x: Series) Axes

Shows distributions of datasets using wordcloud.

Examples

>>> text = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='text')
>>> text.ads.feature_type = ['text']
>>> text.ads.feature_plot()
Returns:

Plot object for the series based on the Text feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.unknown module

The module that represents an Unknown feature type.

Classes:
Unknown

The Unknown feature type.

class ads.feature_engineering.feature_type.unknown.Unknown

Bases: FeatureType

Type representing third-party dtypes.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

description = 'Type representing unknown type.'
classmethod feature_domain()
Returns:

Nothing.

Return type:

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.zip_code module

The module that represents a ZipCode feature type.

Classes:
ZipCode

The ZipCode feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.zip_code.ZipCode

Bases: String

Type representing postal code.

description

The feature type description.

Type:

str

name

The feature type name.

Type:

str

warning

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the geometry distribution based on the location of the zip code.

Example

>>> from ads.feature_engineering.feature_type.zip_code import ZipCode
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["94065", "90210", np.NaN, None], name='zipcode')
>>> ZipCode.validator.is_zip_code(s)
0     True
1     True
2    False
3    False
Name: zipcode, dtype: bool
description = 'Type representing postal code.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 2
    unique: 2
values: ZipCode
Returns:

Domain based on the ZipCode feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the geometry distribution based on the location of the zip code.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_plot()
Returns:

Plot object for the series based on the ZipCode feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_stat()
    Metric  Value
0       count   4
1       unique  2
2       missing 2
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.zip_code.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters:

data (pd.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pd.Series
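A minimal sketch of such a check, assuming US-style codes (five digits, optionally followed by a hyphen and four more digits). Both the pattern and the helper name `looks_like_zip_code` are assumptions for illustration; the rule ADS applies is not spelled out in this reference.

```python
import re

# Assumed format: "94065" or "94065-1234".
_ZIP = re.compile(r"\d{5}(?:-\d{4})?")


def looks_like_zip_code(value):
    """Return True if value is a string in the assumed US ZIP or ZIP+4 format."""
    return isinstance(value, str) and _ZIP.fullmatch(value) is not None


samples = ["94065", "90210", float("nan"), None]
flags = [looks_like_zip_code(v) for v in samples]
print(flags)
```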

ads.feature_engineering.feature_type.handler.feature_validator module

The module that helps to register custom validators for the feature types and to extend registered validators with dispatching based on the specific arguments.

Classes

FeatureValidator

The Feature Validator class to manage custom validators.

FeatureValidatorMethod

The Feature Validator Method class. Extends methods which require dispatching based on the specific arguments.

class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator

Bases: object

The Feature Validator class to manage custom validators.

register(self, name: str, handler: Callable, condition: Union[Tuple, Dict[str, Any]] = None, replace: bool = False) None

Registers new validator.

unregister(self, name: str, condition: Union[Tuple, Dict[str, Any]] = None) None

Unregisters validator.

registered(self) pd.DataFrame

Gets the list of registered validators.

Examples

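>>> import pandas as pd
>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> series = pd.Series(['+1-202-555-0141', '+1-202-555-0142'], name='Phone Number')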
>>> def phone_number_validator(data: pd.Series) -> pd.Series:
...    print("phone_number_validator")
...    return data
>>> def universal_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...    print("universal_phone_number_validator")
...    return data
>>> def us_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...    print("us_phone_number_validator")
...    return data
>>> PhoneNumber.validator.register(name="is_phone_number", handler=phone_number_validator, replace=True)
>>> PhoneNumber.validator.register(name="is_phone_number", handler=universal_phone_number_validator, condition = ('country_code',))
>>> PhoneNumber.validator.register(name="is_phone_number", handler=us_phone_number_validator, condition = {'country_code':'+1'})
>>> PhoneNumber.validator.is_phone_number(series)
    phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> PhoneNumber.validator.is_phone_number(series, country_code = '+7')
    universal_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> PhoneNumber.validator.is_phone_number(series, country_code = '+1')
    us_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> PhoneNumber.validator.registered()
               Validator                 Condition                            Handler
    ---------------------------------------------------------------------------------
    0    is_phone_number                        ()             phone_number_validator
    1    is_phone_number          ('country_code')   universal_phone_number_validator
    2    is_phone_number    {'country_code': '+1'}          us_phone_number_validator
>>> series.ads.validator.is_phone_number()
    phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> series.ads.validator.is_phone_number(country_code = '+7')
    universal_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> series.ads.validator.is_phone_number(country_code = '+1')
    us_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

Initializes the FeatureValidator.

register(name: str, handler: Callable, condition: Optional[Union[Tuple, Dict[str, Any]]] = None, replace: bool = False) None

Registers new validator.

Parameters:
  • name (str) – The validator name.

  • handler (callable) – The handler.

  • condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator.

  • replace (bool) – The flag indicating if the registered validator should be replaced with the new one.

Returns:

Nothing.

Return type:

None

Raises:
  • ValueError – The name is empty or handler is not provided.

  • TypeError – The handler is not callable. The name of the validator is not a string.

  • ValidatorAlreadyExists – The validator is already registered.

registered() DataFrame

Gets the list of registered validators.

Returns:

The list of registered validators.

Return type:

pd.DataFrame

unregister(name: str, condition: Optional[Union[Tuple, Dict[str, Any]]] = None) None

Unregisters validator.

Parameters:
  • name (str) – The name of the validator to be unregistered.

  • condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator to be unregistered.

Returns:

Nothing.

Return type:

None

Raises:
  • TypeError – The name of the validator is not a string.

  • ValidatorNotFound – The validator is not found.

  • ValidatorWithConditionNotFound – The validator with the provided condition is not found.

class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidatorMethod(handler: Callable)

Bases: object

The Feature Validator Method class.

Extends methods which require dispatching based on specific arguments.

register(self, condition: Union[Tuple, Dict[str, Any]], handler: Callable) None

Registers new handler.

unregister(self, condition: Union[Tuple, Dict[str, Any]]) None

Unregisters existing handler.

registered(self) pd.DataFrame

Gets the list of registered handlers.

Initializes the Feature Validator Method.

Parameters:

handler (Callable) – The handler that will be called by default if a suitable one is not found.

register(condition: Union[Tuple, Dict[str, Any]], handler: Callable) None

Registers new handler.

Parameters:
  • condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to register a new handler.

  • handler (Callable) – The handler to be registered.

Returns:

Nothing.

Return type:

None

Raises:

ValueError – If the condition is not provided or is in the wrong format. If the handler is not provided or has the wrong format.

registered() DataFrame

Gets the list of registered handlers.

Returns:

The list of registered handlers.

Return type:

pd.DataFrame

unregister(condition: Union[Tuple, Dict[str, Any]]) None

Unregisters existing handler.

Parameters:

condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to unregister a handler.

Returns:

Nothing.

Return type:

None

Raises:

ValueError – If the condition is not provided or is in the wrong format. If the condition is not registered.
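
The dispatching described above can be illustrated with a small, dependency-free sketch. It is not the ADS implementation: the class name ToyValidatorMethod, the matching order (dict conditions before tuple conditions) and the string return values are assumptions chosen to mirror the behaviour shown in the PhoneNumber example.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Condition = Union[Tuple[str, ...], Dict[str, Any]]


class ToyValidatorMethod:
    """Hypothetical stand-in for condition-based dispatch; not the ADS class."""

    def __init__(self, handler: Callable) -> None:
        self._default = handler  # used when no registered condition matches
        self._handlers: List[Tuple[Condition, Callable]] = []

    def register(self, condition: Condition, handler: Callable) -> None:
        if not condition or not isinstance(condition, (tuple, dict)):
            raise ValueError("Condition must be a non-empty tuple or dict.")
        if not callable(handler):
            raise ValueError("Handler must be callable.")
        self._handlers.append((condition, handler))

    def unregister(self, condition: Condition) -> None:
        for i, (registered, _) in enumerate(self._handlers):
            if registered == condition:
                del self._handlers[i]
                return
        raise ValueError("Condition is not registered.")

    def __call__(self, data, **kwargs):
        # Assumption: exact name/value matches (dict conditions) win first.
        for condition, handler in self._handlers:
            if isinstance(condition, dict) and condition == kwargs:
                return handler(data, **kwargs)
        # Then conditions that only name the expected keyword arguments.
        for condition, handler in self._handlers:
            if isinstance(condition, tuple) and set(condition) == set(kwargs):
                return handler(data, **kwargs)
        return self._default(data, **kwargs)


def default_handler(data):
    return "default"


def any_country(data, country_code):
    return f"any:{country_code}"


def us_only(data, country_code):
    return "us"


is_phone_number = ToyValidatorMethod(default_handler)
is_phone_number.register(("country_code",), any_country)
is_phone_number.register({"country_code": "+1"}, us_only)

numbers = ["+1-202-555-0141", "+1-202-555-0142"]
results = [
    is_phone_number(numbers),
    is_phone_number(numbers, country_code="+7"),
    is_phone_number(numbers, country_code="+1"),
]

# After removing the value-specific handler, '+1' falls back to the tuple match.
is_phone_number.unregister({"country_code": "+1"})
after_unregister = is_phone_number(numbers, country_code="+1")
```

A tuple condition here matches on argument names only, while a dict condition also requires the values to be equal, which is why the '+1' call reaches the more specific handler while it is registered.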

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorAlreadyExists(name: str)

Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorNotFound(name: str)

Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionAlreadyExists(name: str)

Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionNotFound(name: str)

Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.WrongHandlerMethodSignature(handler_name: str, condition: str, handler_signature: str)

Bases: ValueError

ads.feature_engineering.feature_type.handler.feature_warning module

The module that helps to register custom warnings for the feature types.

Classes

FeatureWarning

The Feature Warning class. Provides functionality to register warning handlers and invoke them.

Examples

>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
                    Name                               Handler
    ----------------------------------------------------------
    0         zeros_count          warning_handler_zeros_count
    1    zeros_percentage     warning_handler_zeros_percentage
>>> warning.zeros_count(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
>>> warning.zeros_percentage(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
    1          Zeros   Age has 12.2% zeros     Percentage      12.2%
>>> warning.unregister('zeros_count')
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

class ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning

Bases: object

The Feature Warning class.

Provides functionality to register warning handlers and invoke them.

register(self, name: str, handler: Callable) None

Registers a new warning for the feature type.

unregister(self, name: str) None

Unregisters warning.

registered(self) pd.DataFrame

Gets the list of registered warnings.

Examples

>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
                  Warning                              Handler
    ----------------------------------------------------------
    0         zeros_count          warning_handler_zeros_count
    1    zeros_percentage     warning_handler_zeros_percentage
>>> warning.zeros_count(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
>>> warning.zeros_percentage(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
    1          Zeros   Age has 12.2% zeros     Percentage      12.2%
>>> warning.unregister('zeros_count')
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

Initializes the FeatureWarning.

register(name: str, handler: Callable, replace: bool = False) None

Registers a new warning.

Parameters:
  • name (str) – The warning name.

  • handler (callable) – The handler associated with the warning.

  • replace (bool) – The flag indicating if the registered warning should be replaced with the new one.

Returns:

Nothing

Return type:

None

Raises:
  • ValueError – If warning name is empty or handler not defined.

  • TypeError – If handler is not callable.

  • WarningAlreadyExists – If warning is already registered.

registered() DataFrame

Gets the list of registered warnings.

Returns:

The list of registered warnings in DataFrame format.

Return type:

pd.DataFrame

Examples

>>> warning.registered()
                     Name                               Handler
    -----------------------------------------------------------
    0         zeros_count           warning_handler_zeros_count
    1    zeros_percentage      warning_handler_zeros_percentage

unregister(name: str) None

Unregisters warning.

Parameters:

name (str) – The name of warning to be unregistered.

Returns:

Nothing.

Return type:

None

Raises:
  • ValueError – If warning name is not provided or empty.

  • WarningNotFound – If warning not found.
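
The register/unregister/call cycle documented above can be sketched as follows. This is a simplified stand-in, not the ADS FeatureWarning: the class name ToyWarning, the use of ValueError in place of WarningAlreadyExists and WarningNotFound, and the way handler results are concatenated are assumptions. The two handlers return the same fixed rows as the example in this section.

```python
from typing import Callable, Dict

import pandas as pd

COLUMNS = ["Warning", "Message", "Metric", "Value"]


class ToyWarning:
    """Hypothetical warning registry; a stand-in, not the ADS FeatureWarning."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[pd.Series], pd.DataFrame]] = {}

    def register(self, name: str, handler: Callable, replace: bool = False) -> None:
        if not name or handler is None:
            raise ValueError("Warning name and handler must be provided.")
        if not callable(handler):
            raise TypeError("Handler must be callable.")
        if name in self._handlers and not replace:
            # The real class documents WarningAlreadyExists (a ValueError).
            raise ValueError(f"Warning '{name}' is already registered.")
        self._handlers[name] = handler

    def unregister(self, name: str) -> None:
        if not name:
            raise ValueError("Warning name is not provided.")
        if name not in self._handlers:
            # The real class documents WarningNotFound (a ValueError).
            raise ValueError(f"Warning '{name}' not found.")
        del self._handlers[name]

    def __call__(self, data: pd.Series) -> pd.DataFrame:
        # Run every handler in registration order and stack the results.
        frames = [handler(data) for handler in self._handlers.values()]
        frames = [f for f in frames if f is not None and not f.empty]
        if not frames:
            return pd.DataFrame(columns=COLUMNS)
        return pd.concat(frames, ignore_index=True)


def zeros_count(data: pd.Series) -> pd.DataFrame:
    # Fixed row, mirroring the documented example rather than computing it.
    return pd.DataFrame([["Zeros", "Age has 38 zeros", "Count", 38]], columns=COLUMNS)


def zeros_percentage(data: pd.Series) -> pd.DataFrame:
    return pd.DataFrame(
        [["Zeros", "Age has 12.2% zeros", "Percentage", "12.2%"]], columns=COLUMNS
    )


toy = ToyWarning()
toy.register("zeros_count", zeros_count)
toy.register("zeros_percentage", zeros_percentage)

ages = pd.Series([0, 31, 45], name="Age")
combined = toy(ages)           # one row per registered handler
toy.unregister("zeros_count")
remaining = toy(ages)          # only the percentage row is left
```

Resetting the index on concatenation gives the 0, 1, ... row labels seen in the combined output of the example.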

ads.feature_engineering.feature_type.handler.warnings module

The module with all default warnings provided to the user. These are registered to the relevant feature types directly in the feature type files themselves.

ads.feature_engineering.feature_type.handler.warnings.high_cardinality_handler(s: Series) DataFrame

Warning if the number of unique values (including NaN) in the series is greater than or equal to 15.

Parameters:

s (pd.Series) – Pandas series - column of some feature type.

Returns:

DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the count of unique values.

Return type:

pd.DataFrame

ads.feature_engineering.feature_type.handler.warnings.missing_values_handler(s: Series) DataFrame

Warning for > 5 percent missing values (NaNs) in series.

Parameters:

s (pd.Series) – Pandas series - column of some feature type.

Returns:

DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where the first row is the count of missing values and the second is the percentage of missing values.

Return type:

pd.DataFrame

ads.feature_engineering.feature_type.handler.warnings.skew_handler(s: Series) DataFrame

Warning if absolute value of skew is greater than 1.

Parameters:

s (pd.Series) – Pandas series - column of some feature type, expects continuous values.

Returns:

DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the skew value of that column.

Return type:

pd.DataFrame

ads.feature_engineering.feature_type.handler.warnings.zeros_handler(s: Series) DataFrame

Warning for greater than 10 percent zeros in series.

Parameters:

s (pd.Series) – Pandas series - column of some feature type.

Returns:

DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where the first row is the count of zero values and the second is the percentage of zero values.

Return type:

pd.DataFrame
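
A custom handler that follows the same four-column contract might look like the sketch below. It is an illustration, not the ADS source: the function name toy_zeros_handler, the threshold parameter, the message wording and the choice to return an empty DataFrame when the threshold is not exceeded are all assumptions.

```python
import pandas as pd

COLUMNS = ["Warning", "Message", "Metric", "Value"]


def toy_zeros_handler(s: pd.Series, threshold: float = 10.0) -> pd.DataFrame:
    """Hypothetical handler: warn when more than `threshold` percent are zeros."""
    zero_count = int((s == 0).sum())
    zero_pct = 100.0 * zero_count / len(s) if len(s) else 0.0

    # Assumption: no warning is represented by an empty frame with the same columns.
    if zero_pct <= threshold:
        return pd.DataFrame(columns=COLUMNS)

    # First row reports the count, second row the percentage.
    return pd.DataFrame(
        [
            ["Zeros", f"{s.name} has {zero_count} zeros", "Count", zero_count],
            ["Zeros", f"{s.name} has {zero_pct:.1f}% zeros", "Percentage", f"{zero_pct:.1f}%"],
        ],
        columns=COLUMNS,
    )


ages = pd.Series([0, 0, 1, 2, 3], name="Age")
flagged = toy_zeros_handler(ages)                                 # 2 of 5 values are zero
clean = toy_zeros_handler(pd.Series([1, 2, 3], name="Age"))       # no zeros, no warning
```

A function with this shape could then be passed as the handler argument of the register method described for FeatureWarning.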

Module contents