ads.feature_engineering package

Submodules

ads.feature_engineering.exceptions module

exception ads.feature_engineering.exceptions.InvalidFeatureType(tname: str)

Bases: TypeError

exception ads.feature_engineering.exceptions.NameAlreadyRegistered(name: str)

Bases: NameError

exception ads.feature_engineering.exceptions.TypeAlreadyAdded(tname: str)

Bases: TypeError

exception ads.feature_engineering.exceptions.TypeAlreadyRegistered(tname: str)

Bases: TypeError

exception ads.feature_engineering.exceptions.TypeNotFound(tname: str)

Bases: TypeError

exception ads.feature_engineering.exceptions.WarningAlreadyExists(name: str)

Bases: ValueError

exception ads.feature_engineering.exceptions.WarningNotFound(name: str)

Bases: ValueError
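
Because each exception above derives from a built-in exception, code that only handles the built-in type still catches the more specific error. The sketch below illustrates this with local stand-in classes that mirror the documented names and bases. It does not import ads, and the message wording and the lookup helper are assumptions made for illustration.

```python
# Stand-ins mirroring the documented names and bases; not the ads classes.
class TypeNotFound(TypeError):
    def __init__(self, tname: str):
        # Hypothetical message text; the real wording may differ.
        super().__init__(f"Type {tname} is not found.")


class WarningNotFound(ValueError):
    def __init__(self, name: str):
        super().__init__(f"Warning {name} is not found.")


def lookup(tname: str) -> None:
    """Hypothetical lookup that always fails, for illustration only."""
    raise TypeNotFound(tname)


def handled_as_type_error() -> bool:
    """Show that a handler for the built-in base catches the subclass."""
    try:
        lookup("my_type")
    except TypeError:
        return True
    return False
```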

ads.feature_engineering.feature_type_manager module

This module helps manage feature types. It provides functionality to register, unregister, and list feature types.

Classes

FeatureTypeManager

Feature Types Manager class that manages feature types.

Examples

>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...    description="My personal type."
...    pass
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
            Name        Feature Type                                  Description
---------------------------------------------------------------------------------
0     Continuous          continuous          Type representing continuous values.
1       DateTime           date_time           Type representing date and/or time.
2       Category            category  Type representing discrete unordered values.
3        Ordinal             ordinal             Type representing ordered values.
4        NewType            new_type                             My personal type.
>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler
>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler
>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous
class ads.feature_engineering.feature_type_manager.FeatureTypeManager

Bases: object

Feature Types Manager class that manages feature types.

Provides functionality to register, unregister, and list feature types.

feature_type_object(cls, feature_type: Union[FeatureType, str]) FeatureType

Gets a feature type by class object or name.

feature_type_register(cls, feature_type_cls: FeatureType) None

Registers a feature type.

feature_type_unregister(cls, feature_type: Union[FeatureType, str]) None

Unregisters a feature type.

feature_type_reset(cls) None

Resets feature types to the defaults.

feature_type_registered(cls) pd.DataFrame

Lists all registered feature types as a DataFrame.

warning_registered(cls) pd.DataFrame

Lists registered warnings for all registered feature types.

validator_registered(cls) pd.DataFrame

Lists registered validators for all registered feature types.

Examples

>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...    pass
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
            Name      Feature Type                                  Description
-------------------------------------------------------------------------------
0     Continuous        continuous          Type representing continuous values.
1       DateTime         date_time           Type representing date and/or time.
2       Category          category  Type representing discrete unordered values.
3        Ordinal           ordinal             Type representing ordered values.
>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler
>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler
>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous
classmethod feature_type_object(feature_type: Union[FeatureType, str]) FeatureType

Gets a feature type by class object or name.

Parameters

feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.

Returns

Found feature type.

Return type

FeatureType

Raises
  • TypeNotFound – If the provided feature type is not registered.

  • TypeError – If the provided feature type is not a subclass of FeatureType.

classmethod feature_type_register(feature_type_cls: FeatureType) None

Registers a new feature type.

Parameters

feature_type_cls (FeatureType) – Subclass of FeatureType to be registered.

Returns

Nothing.

Return type

None

Raises
  • TypeError – Type is not a subclass of FeatureType.

  • TypeError – Type has already been registered.

  • NameError – Name has already been used.

classmethod feature_type_registered() DataFrame

Lists all registered feature types as a DataFrame.

Returns

The list of feature types in a DataFrame format.

Return type

pd.DataFrame

classmethod feature_type_reset() None

Resets feature types to the defaults.

Returns

Nothing.

Return type

None

classmethod feature_type_unregister(feature_type: Union[FeatureType, str]) None

Unregisters a feature type.

Parameters

feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.

Returns

Nothing.

Return type

None

Raises

TypeError – If an attempt is made to unregister a default feature type.
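
The registration and unregistration rules documented above can be sketched with a small stand-alone registry. This is not the FeatureTypeManager implementation: the FeatureType and Continuous classes below are stand-ins, MiniRegistry and its method names are hypothetical, and keying the registry by a name class attribute is an assumption.

```python
from typing import Dict, Type, Union


class FeatureType:
    """Stand-in base class; not the ads FeatureType."""
    name = "feature_type"


class Continuous(FeatureType):
    name = "continuous"


class TypeNotFound(TypeError):
    """Stand-in mirroring the documented exception and its base."""


class MiniRegistry:
    """Hypothetical registry illustrating the documented error cases."""

    _defaults: Dict[str, Type[FeatureType]] = {"continuous": Continuous}
    _types: Dict[str, Type[FeatureType]] = dict(_defaults)

    @classmethod
    def register(cls, feature_type_cls: Type[FeatureType]) -> None:
        if not (isinstance(feature_type_cls, type)
                and issubclass(feature_type_cls, FeatureType)):
            raise TypeError("Type is not a subclass of FeatureType.")
        if feature_type_cls in cls._types.values():
            raise TypeError("Type has already been registered.")
        if feature_type_cls.name in cls._types:
            raise NameError("Name has already been used.")
        cls._types[feature_type_cls.name] = feature_type_cls

    @classmethod
    def unregister(cls, feature_type: Union[Type[FeatureType], str]) -> None:
        name = feature_type if isinstance(feature_type, str) else feature_type.name
        if name in cls._defaults:
            raise TypeError("A default feature type cannot be unregistered.")
        if name not in cls._types:
            raise TypeNotFound(name)
        del cls._types[name]

    @classmethod
    def reset(cls) -> None:
        # Drop every custom type and keep only the defaults.
        cls._types = dict(cls._defaults)
```

Defaults are kept in a separate mapping so that reset can restore them and unregister can refuse to remove them.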

classmethod is_type_registered(feature_type: Union[FeatureType, str]) bool

Checks whether the provided feature type is registered in the system.

Parameters

feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.

Returns

True if the provided feature type is registered, False otherwise.

Return type

bool

classmethod validator_registered() DataFrame

Lists registered validators for registered feature types.

Returns

The list of registered validators for registered feature types in a DataFrame format.

Return type

pd.DataFrame

Examples

>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler
classmethod warning_registered() DataFrame

Lists registered warnings for all registered feature types.

Returns

The list of registered warnings for registered feature types in a DataFrame format.

Return type

pd.DataFrame

Examples

>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler

ads.feature_engineering.accessor.dataframe_accessor module

The ADS accessor for the Pandas DataFrame. The accessor will be initialized with the pandas object the user is interacting with.

Examples

>>> import pandas as pd
>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
          Column   Feature Type                        Description
------------------------------------------------------------------
0           Name         string    Type representing string values.
1    Credit Card         string    Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
                   Credit Card
------------------------------
0             4532640527811543
class ads.feature_engineering.accessor.dataframe_accessor.ADSDataFrameAccessor(pandas_obj)

Bases: ADSFeatureTypesMixin, EDAMixin, DBAccessMixin, DataLabelingAccessMixin

ADS accessor for the Pandas DataFrame.

columns

The column labels of the DataFrame.

Type

List[str]

tags(self) Dict[str, List[str]]

Gets the dictionary of user defined tags for the dataframe.

default_type(self) Dict[str, str]

Gets the map of columns and associated default feature type names.

feature_type(self) Dict[str, List[str]]

Gets the list of registered feature types.

feature_type_description(self) pd.DataFrame

Gets the list of registered feature types in a DataFrame format.

sync(self, src: Union[pd.DataFrame, pd.Series]) pd.DataFrame

Syncs feature types of current DataFrame with that from src.

feature_select(self, include: List[Union[FeatureType, str]] = None, exclude: List[Union[FeatureType, str]] = None) pd.DataFrame

Returns a subset of the DataFrame’s columns based on the column feature types.

help(self, prop: str = None) None

Provides docstrings for available methods and properties.

Examples

>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
          Column   Feature Type                        Description
-------------------------------------------------------------------
0           Name         string    Type representing string values.
1    Credit Card         string    Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
                   Credit Card
------------------------------
0             4532640527811543

Initializes ADS Pandas DataFrame Accessor.

Parameters

pandas_obj (pandas.DataFrame) – Pandas dataframe

Raises

ValueError – If provided DataFrame has duplicate columns.

property default_type: Dict[str, str]

Gets the map of columns and associated default feature type names.

Returns

The dictionary where key is column name and value is the name of default feature type.

Return type

Dict[str, str]

feature_select(include: Optional[List[Union[FeatureType, str]]] = None, exclude: Optional[List[Union[FeatureType, str]]] = None) DataFrame

Returns a subset of the DataFrame’s columns based on the column feature_types.

Parameters
  • include (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be included.

  • exclude (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be excluded.

Raises
  • ValueError – If both include and exclude are empty.

  • ValueError – If include and exclude are used simultaneously.

Returns

The subset of the frame including the feature types in include and excluding the feature types in exclude.

Return type

pandas.DataFrame
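
The selection rule and the two ValueError cases documented above can be sketched on a plain mapping of column names to feature type names. The select_columns helper is hypothetical, and treating a column as matching when any of its feature types is listed is an assumption rather than confirmed accessor behaviour.

```python
from typing import Dict, List, Optional


def select_columns(
    feature_types: Dict[str, List[str]],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Return column names chosen by feature type name (hypothetical helper)."""
    if not include and not exclude:
        raise ValueError("Either include or exclude must be provided.")
    if include and exclude:
        raise ValueError("include and exclude cannot be used together.")
    if include:
        wanted = set(include)
        # Keep columns that carry at least one of the wanted feature types.
        return [col for col, types in feature_types.items() if wanted & set(types)]
    unwanted = set(exclude)
    # Keep columns that carry none of the unwanted feature types.
    return [col for col, types in feature_types.items() if not unwanted & set(types)]
```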

property feature_type: Dict[str, List[str]]

Gets the list of registered feature types.

Returns

The dictionary where key is column name and value is list of associated feature type names.

Return type

Dict[str, List[str]]

property feature_type_description: DataFrame

Gets the list of registered feature types in a DataFrame format.

Return type

pandas.DataFrame

Examples

>>> df.ads.feature_type_description
          Column   Feature Type                         Description
-------------------------------------------------------------------
0           City         string    Type representing string values.
1   Phone Number         string    Type representing string values.
info() Any

Gets information about the dataframe.

Returns

The information about the dataframe.

Return type

Any

model_schema(max_col_num: int = 2000)

Generates schema from the dataframe.

Parameters

max_col_num (int, optional. Defaults to 2000) – The maximum number of columns for which a schema is generated automatically.

Examples

>>> df = pd.read_csv('./orcl_attrition.csv', usecols=['Age', 'Attrition'])
>>> schema = df.ads.model_schema()
>>> schema
Schema:
    - description: Attrition
      domain:
        constraints: []
        stats:
          count: 1470
          unique: 2
        values: String
      dtype: object
      feature_type: String
      name: Attrition
      required: true
    - description: Age
      domain:
        constraints: []
        stats:
          25%: 31.0
          50%: 37.0
          75%: 44.0
          count: 1470.0
          max: 61.0
          mean: 37.923809523809524
          min: 19.0
          std: 9.135373489136732
        values: Integer
      dtype: int64
      feature_type: Integer
      name: Age
      required: true
>>> schema.to_dict()
{'Schema': [{'dtype': 'object',
    'feature_type': 'String',
    'name': 'Attrition',
    'domain': {'values': 'String',
        'stats': {'count': 1470, 'unique': 2},
        'constraints': []},
    'required': True,
    'description': 'Attrition'},
    {'dtype': 'int64',
    'feature_type': 'Integer',
    'name': 'Age',
    'domain': {'values': 'Integer',
        'stats': {'count': 1470.0,
        'mean': 37.923809523809524,
        'std': 9.135373489136732,
        'min': 19.0,
        '25%': 31.0,
        '50%': 37.0,
        '75%': 44.0,
        'max': 61.0},
        'constraints': []},
    'required': True,
    'description': 'Age'}]}
Returns

The data schema.

Return type

ads.feature_engineering.schema.Schema

Raises

ads.feature_engineering.schema.DataSizeTooWide – If the number of columns of input data exceeds max_col_num.

sync(src: Union[DataFrame, Series]) DataFrame

Syncs feature types of current DataFrame with that from src.

Syncs feature types of current dataframe with that from src, where src can be a dataframe or a series. In either case, only columns with matched names are synced.

Parameters

src (pd.DataFrame | pd.Series) – The source to sync from.

Returns

Synced dataframe.

Return type

pandas.DataFrame
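
The name-matching rule described above can be sketched on plain dictionaries that map column names to feature type names. The sync_feature_types helper is hypothetical and only models the matching step, not how the accessor stores feature types.

```python
from typing import Dict, List


def sync_feature_types(
    current: Dict[str, List[str]], src: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Copy feature types from src for column names present in both mappings."""
    return {
        # Matching names take the source types; other columns are left as they are.
        col: list(src[col]) if col in src else list(types)
        for col, types in current.items()
    }
```

Columns that exist only in src are ignored, which mirrors the statement that only columns with matched names are synced.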

property tags: Dict[str, List[str]]

Gets the dictionary of user defined tags for the dataframe. Key is column name and value is list of tag names.

Returns

The map of columns and associated default tags.

Return type

Dict[str, List[str]]

ads.feature_engineering.accessor.series_accessor module

The ADS accessor for the Pandas Series. The accessor will be initialized with the pandas object the user is interacting with.

Examples

>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0         string    Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']
class ads.feature_engineering.accessor.series_accessor.ADSSeriesAccessor(pandas_obj: Series)

Bases: ADSFeatureTypesMixin, EDAMixinSeries

ADS accessor for Pandas Series.

name

The name of Series.

Type

str

tags

The list of tags for the Series.

Type

List[str]

help(self, prop: str = None) None

Provides docstrings for available methods and properties.

sync(self, src: Union[pd.DataFrame, pd.Series]) None

Syncs feature types of current series with that from src.

default_type(self) str

Gets the name of default feature type for the series.

feature_type(self) List[str]

Gets the list of registered feature types for the series.

feature_type_description(self) pd.DataFrame

Gets the list of registered feature types in a DataFrame format.

Examples

>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0         string    Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']

Initializes ADS Pandas Series Accessor.

Parameters

pandas_obj (pd.Series) – The pandas series

property default_type: str

Gets the name of default feature type for the series.

Returns

The name of default feature type.

Return type

str

property feature_type: List[str]

Gets the list of registered feature types for the series.

Returns

Names of feature types.

Return type

List[str]

Examples

>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('tag for name')]
>>> series.ads.feature_type
['name', 'string', 'tag for name']
property feature_type_description: DataFrame

Gets the list of registered feature types in a DataFrame format.

Returns

The DataFrame with feature types for this series.

Return type

pd.DataFrame

Examples

>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('Name tag')]
>>> series.ads.feature_type_description
        Feature Type                               Description
    ----------------------------------------------------------
    0           name            Type representing name values.
    1         string          Type representing string values.
    2        Name tag                                     Tag.
sync(src: Union[DataFrame, Series]) None

Syncs feature types of current series with that from src.

The src could be a dataframe or a series. In either case, only columns with matched names are synced.

Parameters

src (pd.DataFrame | pd.Series) – The source to sync from.

Returns

Nothing.

Return type

None

Examples

>>> series = pd.Series(['name1', 'name2', 'name3', None])
>>> series.ads.feature_type = ['name']
>>> series.ads.feature_type
['name', 'string']
>>> series.dropna().ads.feature_type
['string']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['name', 'string']
class ads.feature_engineering.accessor.series_accessor.ADSSeriesValidator(feature_type_list: List[FeatureType], series: Series)

Bases: object

Helper class that invokes registered validators at the series level.

Initializes ADS series validator.

Parameters
  • feature_type_list (List[FeatureType]) – The list of feature types.

  • series (pd.Series) – The pandas series.

ads.feature_engineering.accessor.mixin.correlation module

ads.feature_engineering.accessor.mixin.correlation.cat_vs_cat(df: DataFrame, normal_form: bool = True) DataFrame

Calculates the correlation between all pairs of categorical features.

ads.feature_engineering.accessor.mixin.correlation.cat_vs_cont(df: DataFrame, categorical_columns, continuous_columns, normal_form: bool = True) DataFrame

Calculates the correlation of all pairs of categorical features and continuous features.

ads.feature_engineering.accessor.mixin.correlation.cont_vs_cont(df: DataFrame, normal_form: bool = True) DataFrame

Calculates the Pearson correlation between two columns of the DataFrame.

ads.feature_engineering.accessor.mixin.eda_mixin module

This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Dataframe. The series of purpose-driven methods enable the data scientist to complete analysis on the dataframe.

From the accessor we have access to the pandas object the user is interacting with as well as corresponding lists of feature types per column.

class ads.feature_engineering.accessor.mixin.eda_mixin.EDAMixin

Bases: object

correlation_ratio() DataFrame

Generate a Correlation Ratio data frame for all categorical-continuous variable pairs.

Returns

Correlation Ratio correlation data frame with the following 3 columns:
  1. Column 1 (name of the first categorical/continuous column)

  2. Column 2 (name of the second categorical/continuous column)

  3. Value (correlation value)

Return type

pandas.DataFrame

Note

Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
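
For readers unfamiliar with the correlation ratio, the sketch below computes its textbook form, eta, as the square root of the between-group sum of squares divided by the total sum of squares. Whether the library uses exactly this variant (for example eta rather than eta squared) is an assumption, and the function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def correlation_ratio(categories: Sequence[Hashable], values: Sequence[float]) -> float:
    """Return eta for one categorical and one continuous variable."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    overall_mean = sum(values) / len(values)
    # Spread of the group means around the overall mean, weighted by group size.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - overall_mean) ** 2
        for vals in groups.values()
    )
    # Spread of all observations around the overall mean.
    ss_total = sum((val - overall_mean) ** 2 for val in values)
    return math.sqrt(ss_between / ss_total) if ss_total else 0.0
```

A value near 1 means the category explains most of the variation in the continuous variable, and a value near 0 means it explains almost none.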

correlation_ratio_plot() Axes

Generate a heatmap of the Correlation Ratio correlation for all categorical-continuous variable pairs.

Returns

Correlation Ratio correlation plot object that can be updated by the customer

Return type

Plot object

cramersv() DataFrame

Generate a Cramer’s V correlation data frame for all categorical variable pairs.

Gives a warning for dropped non-categorical columns.

Returns

Cramer’s V correlation data frame with the following 3 columns:
  1. Column 1 (name of the first categorical column)

  2. Column 2 (name of the second categorical column)

  3. Value (correlation value)

Return type

pandas.DataFrame

Note

Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
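
As background, the sketch below computes the uncorrected Cramer’s V from the chi-squared statistic of the contingency table of two categorical sequences. It is a textbook formulation, not the library’s code. Whether the library applies a bias correction is not stated here, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Return the uncorrected Cramer's V for two equally long category sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    x_counts = Counter(x)
    y_counts = Counter(y)
    chi2 = 0.0
    for xv, x_n in x_counts.items():
        for yv, y_n in y_counts.items():
            # Expected cell count if the two variables were independent.
            expected = x_n * y_n / n
            chi2 += (joint.get((xv, yv), 0) - expected) ** 2 / expected
    min_dim = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * min_dim)) if min_dim > 0 else 0.0
```

Pairing a variable with itself gives 1.0 whenever it has at least two categories, which is consistent with the (x,x) and (y,y) entries mentioned in the note.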

cramersv_plot() Axes

Generate a heatmap of the Cramer’s V correlation for all categorical variable pairs.

Gives a warning for dropped non-categorical columns.

Returns

Cramer’s V correlation plot object that can be updated by the customer

Return type

Plot object

feature_count() DataFrame

Counts the number of columns for each feature type and each primary feature type. The Primary column gives the number of columns whose primary feature type is the given type.

Returns

DataFrame with the number of columns for each feature type and the number of columns for each primary feature type.

Return type

pandas.DataFrame

Examples

>>> df.ads.feature_type
{'PassengerId': ['ordinal', 'category'],
'Survived': ['ordinal'],
'Pclass': ['ordinal'],
'Name': ['category'],
'Sex': ['category']}
>>> df.ads.feature_count()
    Feature Type        Count       Primary
0       category            3             2
1        ordinal            3             3
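
The counts in the example above can be reproduced from the feature type mapping with a short stand-alone sketch. It assumes that the primary feature type of a column is the first entry in its list, which is consistent with the example output but is not confirmed here. The function below is a hypothetical helper, not the accessor method.

```python
from collections import Counter
from typing import Dict, List, Tuple


def feature_count(feature_types: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Rows of (Feature Type, Count, Primary), sorted by feature type name."""
    # Count: how many columns list the feature type anywhere.
    count = Counter(t for types in feature_types.values() for t in types)
    # Assumption: the primary type of a column is the first one in its list.
    primary = Counter(types[0] for types in feature_types.values() if types)
    return [(name, count[name], primary[name]) for name in sorted(count)]
```
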
feature_plot() DataFrame

For every column in the dataframe, generates a summary plot based on the most relevant feature type.

Returns

Dataframe with 2 columns:
  1. Column - feature name

  2. Plot - plot object

Return type

pandas.DataFrame

feature_stat() DataFrame

Provides a DataFrame of summary statistics.

Returns feature statistics for each column using the FeatureType summary method.

Examples

>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df.ads.feature_stat().head()
        Column               Metric      Value
0  PassengerId                count    891.000
1  PassengerId                 mean    446.000
2  PassengerId   standard deviation    257.354
3  PassengerId       sample minimum      1.000
4  PassengerId       lower quartile    223.500
Returns

Dataframe with 3 columns: name, metric, value

Return type

pandas.DataFrame

pearson() DataFrame

Generate a Pearson correlation data frame for all continuous variable pairs.

Gives a warning for dropped non-numerical columns.

Returns

Pearson correlation data frame with the following 3 columns:
  1. Column 1 (name of the first continuous column)

  2. Column 2 (name of the second continuous column)

  3. Value (correlation value)

Return type

pandas.DataFrame

Note

Pairs will be replicated. For example for variables x and y, we’d have (x,y), (y,x) both with same correlation value. We’ll also have (x,x) and (y,y) with value 1.0.
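
The long format described in the note, with one row per ordered pair and self-pairs included, can be sketched without pandas. Both function names are hypothetical, and columns are passed as a plain dictionary of equally long numeric sequences.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def pearson_long(columns: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Rows of (Column 1, Column 2, Value) for every ordered pair of columns."""
    return [
        # product(..., repeat=2) yields (x, x), (x, y), (y, x), (y, y).
        (c1, c2, pearson(columns[c1], columns[c2]))
        for c1, c2 in product(columns, repeat=2)
    ]
```

The sketch does not guard against constant columns, for which the denominator is zero.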

pearson_plot() Axes

Generate a heatmap of the Pearson correlation for all continuous variable pairs.

Returns

Pearson correlation plot object that can be updated by the customer

Return type

Plot object

warning() DataFrame

Generates a data frame that lists feature specific warnings.

Returns

The list of feature specific warnings.

Return type

pandas.DataFrame

Examples

>>> df.ads.warning()
    Column    Feature Type         Warning               Message       Metric    Value
--------------------------------------------------------------------------------------
0      Age      continuous           Zeros      Age has 38 zeros        Count       38
1      Age      continuous           Zeros   Age has 12.2% zeros   Percentage    12.2%
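
To illustrate how rows like those above might be produced, the sketch below implements a hypothetical zeros handler that emits a Count row and a Percentage row. The function name, the row layout as dictionaries, and the one-decimal percentage formatting are assumptions modelled on the example output, not the library’s handler.

```python
from typing import Dict, List, Sequence, Union


def zeros_warning(name: str, values: Sequence[float]) -> List[Dict[str, Union[str, int]]]:
    """Return Count and Percentage rows when the values contain zeros."""
    zeros = sum(1 for value in values if value == 0)
    if zeros == 0:
        # No zeros means no warning rows.
        return []
    percentage = f"{100 * zeros / len(values):.1f}%"
    return [
        {"Warning": "Zeros", "Message": f"{name} has {zeros} zeros",
         "Metric": "Count", "Value": zeros},
        {"Warning": "Zeros", "Message": f"{name} has {percentage} zeros",
         "Metric": "Percentage", "Value": percentage},
    ]
```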

ads.feature_engineering.accessor.mixin.eda_mixin_series module

This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Series. The series of purpose-driven methods enable the data scientist to complete univariate analysis.

From the accessor we have access to the pandas object the user is interacting with as well as corresponding list of feature types.

class ads.feature_engineering.accessor.mixin.eda_mixin_series.EDAMixinSeries

Bases: object

feature_plot() Axes

For the series generate a summary plot based on the most relevant feature type.

Returns

Plot object for the series based on the most relevant feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

feature_stat() DataFrame

Provides a DataFrame of summary statistics.

Returns feature statistics for the series using the FeatureType summary method.

Examples

>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df['Cabin'].ads.feature_stat()
    Metric    Value
0    count      891
1   unique      147
2  missing      687
Returns

Dataframe with 2 columns and rows for different metric values

Return type

pandas.DataFrame

warning() DataFrame

Generates a data frame that lists feature specific warnings.

Returns

The list of feature specific warnings.

Return type

pandas.DataFrame

Examples

>>> df["Age"].ads.warning()
  Feature Type       Warning               Message         Metric      Value
 ---------------------------------------------------------------------------
0   continuous         Zeros      Age has 38 zeros          Count         38
1   continuous         Zeros   Age has 12.2% zeros     Percentage      12.2%

ads.feature_engineering.accessor.mixin.feature_types_mixin module

The module that represents the ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.

Classes

ADSFeatureTypesMixin

ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.

class ads.feature_engineering.accessor.mixin.feature_types_mixin.ADSFeatureTypesMixin

Bases: object

ADS Feature Types Mixin class that extends Pandas Series and DataFrame accessors.

warning_registered(cls) pd.DataFrame

Lists registered warnings for registered feature types.

validator_registered(cls) pd.DataFrame

Lists registered validators for registered feature types.

help(self, prop: str = None) None

Help method that prints either a table of available properties or, given a property, returns its docstring.

help(prop: Optional[str] = None) None

Help method that prints either a table of available properties or, given an individual property, returns its docstring.

Parameters

prop (str) – The name of the property.

Returns

Nothing.

Return type

None

validator_registered() DataFrame

Lists registered validators for registered feature types.

Returns

The list of registered validators for registered feature types

Return type

pandas.DataFrame

Examples

>>> df.ads.validator_registered()
         Column     Feature Type        Validator                 Condition                    Handler
------------------------------------------------------------------------------------------------------
0   PhoneNumber    phone_number   is_phone_number                        ()            default_handler
1   PhoneNumber    phone_number   is_phone_number    {'country_code': '+7'}   specific_country_handler
2    CreditCard    credit_card     is_credit_card                        ()            default_handler
>>> df['PhoneNumber'].ads.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
warning_registered() DataFrame

Lists registered warnings for all registered feature types.

Returns

The list of registered warnings for registered feature types.

Return type

pandas.DataFrame

Examples

>>> df.ads.warning_registered()
       Column    Feature Type             Warning                    Handler
   -------------------------------------------------------------------------
   0      Age      continuous               zeros              zeros_handler
   1      Age      continuous    high_cardinality   high_cardinality_handler
>>> df["Age"].ads.warning_registered()
       Feature Type             Warning                    Handler
   ---------------------------------------------------------------
   0     continuous               zeros              zeros_handler
   1     continuous    high_cardinality   high_cardinality_handler

ads.feature_engineering.adsstring.common_regex_mixin module

class ads.feature_engineering.adsstring.common_regex_mixin.CommonRegexMixin

Bases: object

property address
property credit_card
property date
property email
property ip
property phone_number_US
property price
redact(fields: Union[List[str], Dict[str, str]]) str

Removes personal information from a string. For example, “Jane’s phone number is 123-456-7890” is turned into “Jane’s phone number is [PHONE_NUMBER_US]”.

Parameters

fields (list(str) | dict) – Either a list of fields to redact, e.g. ['email', 'phone_number_US'], in which case each match is replaced with the capitalized field name in square brackets, such as [EMAIL] or [PHONE_NUMBER_US_WITH_EXT]; or a dictionary whose keys are the fields to redact and whose values are the replacement text, e.g. {'email': 'HIDDEN_EMAIL'}.

Returns

redacted string

Return type

str

redact_map = {'address': '[ADDRESS]', 'address_with_zip': '[ADDRESS_WITH_ZIP]', 'credit_card': '[CREDIT_CARD]', 'date': '[DATE]', 'email': '[EMAIL]', 'ip': '[IP]', 'ipv6': '[IPV6]', 'link': '[LINK]', 'phone_number_US': '[PHONE_NUMBER_US]', 'phone_number_US_with_ext': '[PHONE_NUMBER_US_WITH_EXT]', 'po_box': '[PO_BOX]', 'price': '[PRICE]', 'ssn': '[SSN]', 'time': '[TIME]', 'zip_code': '[ZIP_CODE]'}
property ssn
property time
property zip_code
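
The two accepted forms of the fields argument of redact can be illustrated with a small standalone sketch. This is not the library's implementation: the PATTERNS table holds deliberately simplified regular expressions chosen for illustration, and only the placeholder convention (upper-cased field name in square brackets, as in redact_map) is taken from the documentation above.

```python
import re

# Simplified, illustrative patterns (assumptions, not the library's expressions).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone_number_US": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text, fields):
    """Replace every match of each requested field with its placeholder."""
    if isinstance(fields, dict):
        # Dict form: the caller supplies the replacement text per field.
        replacements = fields
    else:
        # List form: default placeholder is the upper-cased name in brackets.
        replacements = {name: f"[{name.upper()}]" for name in fields}
    for name, placeholder in replacements.items():
        # A function replacement avoids interpreting backslashes in the placeholder.
        text = PATTERNS[name].sub(lambda _match, p=placeholder: p, text)
    return text


print(redact("Jane's phone number is 123-456-7890.", ["phone_number_US"]))
print(redact("Write to jane@example.com", {"email": "HIDDEN_EMAIL"}))
```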

ads.feature_engineering.adsstring.oci_language module

ads.feature_engineering.adsstring.string module

ads.feature_engineering.feature_type.address module

The module that represents an Address feature type.

Classes:
Address

The Address feature type.

class ads.feature_engineering.feature_type.address.Address

Bases: String

Type representing address.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the locations of the given addresses on a map, based on zip code.

Example

>>> from ads.feature_engineering.feature_type.address import Address
>>> import pandas as pd
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
                        '1 Berkeley Street, Boston, MA 67891',
                        '54305 Oxford Street, Seattle, WA 95132',
                        ''])
>>> Address.validator.is_address(address)
0     True
1     True
2     True
3    False
dtype: bool
description = 'Type representing address.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 1
    unique: 3
values: Address
Returns

Domain based on the Address feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the locations of the given addresses on a map, based on zip code.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_plot()
Returns

Plot object for the series based on the Address feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      3
2  missing      1
Returns

Summary statistics of the Series provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.address.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pd.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series

ads.feature_engineering.feature_type.base module

class ads.feature_engineering.feature_type.base.FeatureBaseType(classname, bases, dictionary)

Bases: type

The helper metaclass that extends the functionality of the FeatureType class.

class ads.feature_engineering.feature_type.base.FeatureBaseTypeMeta(classname, bases, dictionary)

Bases: FeatureBaseType, ABCMeta

The class to provide compatibility between ABC and FeatureBaseType metaclass.
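
Why a combined metaclass is needed can be shown with a generic Python sketch. A class deriving from ABC already has ABCMeta as its metaclass, so a second, unrelated metaclass causes a metaclass conflict unless a metaclass deriving from both is used. RegistryMeta and CombinedMeta below are hypothetical stand-ins, not the ADS classes.

```python
from abc import ABC, ABCMeta


class RegistryMeta(type):
    """Hypothetical custom metaclass that records every class it creates."""

    created = []

    def __init__(cls, classname, bases, dictionary):
        super().__init__(classname, bases, dictionary)
        RegistryMeta.created.append(classname)


class CombinedMeta(RegistryMeta, ABCMeta):
    """Derives from both metaclasses, so it is compatible with ABC."""


class Base(ABC, metaclass=CombinedMeta):
    pass


class Child(Base):
    pass


# Using the custom metaclass directly with ABC fails: neither RegistryMeta
# nor ABCMeta is a subclass of the other.
try:
    class Broken(ABC, metaclass=RegistryMeta):
        pass
    conflict = False
except TypeError:
    conflict = True

print(RegistryMeta.created, conflict)
```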

class ads.feature_engineering.feature_type.base.FeatureType

Bases: ABC

Abstract base class for feature types. Default class attributes include name and description. The name is generated automatically by converting the class name from camel case to snake case unless it is specified explicitly.

description = 'Base feature type.'
name = 'feature_type'
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
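
The camel-case-to-snake-case naming can be sketched as follows. The exact conversion used by the library is not shown in this reference, so the regular expression below is an assumption; it was chosen because it reproduces the names that appear in this documentation (for example DateTime becomes date_time and GIS becomes gis). The helper name camel_to_snake is hypothetical.

```python
import re

# Insert "_" between a lowercase letter or digit and an uppercase letter, and
# between an acronym and a following capitalized word (e.g. "HTTPServer").
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase class name to a snake_case feature type name."""
    return _BOUNDARY.sub("_", name).lower()


for class_name in ("NewType", "DateTime", "CreditCard", "GIS"):
    print(class_name, "->", camel_to_snake(class_name))
```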
class ads.feature_engineering.feature_type.base.Name

Bases: object

class ads.feature_engineering.feature_type.base.Tag(name: str)

Bases: object

Class for free form tags. Name must be specified.

Initialize a tag instance.

Parameters

name (str) – The name of the tag.

ads.feature_engineering.feature_type.boolean module

The module that represents a Boolean feature type.

Classes:
Boolean

The feature type that represents binary values True/False.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.boolean.Boolean

Bases: FeatureType

Type representing binary values True/False.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in True/False using bars.

Examples

>>> from ads.feature_engineering.feature_type.boolean import Boolean
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> Boolean.validator.is_boolean(s)
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool
description = 'Type representing binary values True/False.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_domain()
constraints:
- expression: $x in [True, False]
    language: python
stats:
    count: 6
    missing: 2
    unique: 2
values: Boolean
Returns

Domain based on the Boolean feature type.

Return type

ads.feature_engineering.schema.Domain
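
The constraint in the domain above is a Python expression in which $x stands for a value of the feature. One way such an expression could be checked against a value is sketched below; how ADS evaluates constraints is not described here, so the satisfies helper and the use of string.Template with eval are assumptions made for illustration.

```python
from string import Template


def satisfies(expression: str, value) -> bool:
    """Substitute repr(value) for $x and evaluate the resulting expression."""
    source = Template(expression).substitute(x=repr(value))
    # eval() is used for brevity only; never evaluate untrusted expressions
    # this way. Builtins are removed so only literals and operators work.
    return bool(eval(source, {"__builtins__": {}}, {}))


boolean_constraint = "$x in [True, False]"
print(satisfies(boolean_constraint, True))
print(satisfies(boolean_constraint, None))
```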

static feature_plot(x: Series) Axes

Shows the counts of observations in True/False using bars.

Parameters

x (pandas.Series) – The feature being evaluated.

Returns

Plot object for the series based on the Boolean feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_plot()
static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters

x (pandas.Series) – The feature being evaluated.

Returns

Summary statistics of the Series or Dataframe provided.

Return type

pandas.DataFrame

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.boolean.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series
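
The element-wise result of the Boolean validator shown earlier (True for True/False, False for NaN and None) can be reproduced with a very small sketch. It works on a plain list instead of a pandas.Series and is only an approximation of what the handler does; the library may accept additional encodings that this sketch rejects.

```python
import math


def is_boolean(values):
    """Return one flag per element: True only for genuine bool objects."""
    return [isinstance(value, bool) for value in values]


flags = is_boolean([True, False, True, False, math.nan, None])
print(flags)  # [True, True, True, True, False, False]
```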

ads.feature_engineering.feature_type.category module

The module that represents a Category feature type.

Classes:
Category

The Category feature type.

class ads.feature_engineering.feature_type.category.Category

Bases: FeatureType

Type representing discrete unordered values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing discrete unordered values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_domain()
constraints:
- expression: $x in ['S', 'C', 'Q', '']
    language: python
stats:
    count: 22
    missing: 3
    unique: 3
values: Category
Returns

Domain based on the Category feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the counts of observations in each categorical bin using bar chart.

Parameters

x (pandas.Series) – The feature being evaluated.

Returns

Plot object for the series based on the Category feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_plot()
static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there are any.

Parameters

x (pandas.Series) – The feature being evaluated.

Returns

Summary statistics of the Series or Dataframe provided.

Return type

pandas.DataFrame

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
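
The count, unique, and missing figures in the Category example are consistent with treating None, NaN, and the empty string as missing and counting unique values among the remaining entries. The sketch below reproduces those numbers on a plain list; is_missing and basic_stats are hypothetical helpers, and the missing-value rule is inferred from the example output rather than taken from the library source.

```python
import math


def is_missing(value) -> bool:
    """Treat None, NaN and the empty string as missing (inferred rule)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def basic_stats(values):
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),             # total, including missing entries
        "unique": len(set(present)),      # distinct non-missing values
        "missing": len(values) - len(present),
    }


cat = ['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
       'S', 'S', 'S', 'Q', 'S', 'S', '', math.nan, None]
stats = basic_stats(cat)
print(stats)  # {'count': 22, 'unique': 3, 'missing': 3}
```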

ads.feature_engineering.feature_type.constant module

The module that represents a Constant feature type.

Classes:
Constant

The Constant feature type.

class ads.feature_engineering.feature_type.constant.Constant

Bases: FeatureType

Type representing constant values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in bars.

description = 'Type representing constant values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 5
    unique: 1
values: Constant
Returns

Domain based on the Constant feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the counts of observations in bars.

Parameters

x (pandas.Series) – The feature being shown.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_plot()
Returns

Plot object for the series based on the Constant feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters

x (pandas.Series) – The feature being evaluated.

Returns

Summary statistics of the Series provided.

Return type

pandas.DataFrame

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_stat()
   Metric  Value
0   count      5
1  unique      1
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.continuous module

The module that represents a Continuous feature type.

Classes:
Continuous

The Continuous feature type.

class ads.feature_engineering.feature_type.continuous.Continuous

Bases: FeatureType

Type representing continuous values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using box plot.

description = 'Type representing continuous values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_domain()
constraints: []
stats:
    count: 10.0
    lower quartile: 3.058
    mean: 4.959
    median: 3.81
    missing: 2.0
    sample maximum: 13.32
    sample minimum: 2.25
    skew: 2.175
    standard deviation: 3.62
    upper quartile: 4.908
values: Continuous
Returns

Domain based on the Continuous feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datasets using box plot.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_plot()
Returns

Plot object for the series based on the Continuous feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, skew and missing(count).

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_stat()
               Metric   Value
0               count  10.000
1                mean   4.959
2  standard deviation   3.620
3      sample minimum   2.250
4      lower quartile   3.058
5              median   3.810
6      upper quartile   4.908
7      sample maximum  13.320
8                skew   2.175
9             missing   2.000
Returns

Summary statistics of the Series or Dataframe provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
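
Most of the statistics in the Continuous example can be recomputed with the Python standard library, which helps when checking what each metric means. The sketch below assumes the sample standard deviation and linearly interpolated quartiles (the "inclusive" method of statistics.quantiles); with those choices it agrees with the documented values up to rounding. Skew is omitted because the statistics module has no skewness function. continuous_stats is a hypothetical helper, not part of ADS.

```python
import math
import statistics


def continuous_stats(values):
    # Drop None and NaN before computing the numeric summaries.
    present = sorted(v for v in values if v is not None and not math.isnan(v))
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),                        # includes missing entries
        "mean": statistics.mean(present),
        "standard deviation": statistics.stdev(present),  # sample (n - 1) form
        "sample minimum": present[0],
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": present[-1],
        "missing": len(values) - len(present),
    }


cts = [13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, math.nan, None]
stats = continuous_stats(cts)
for metric, value in stats.items():
    print(f"{metric}: {value:.3f}")
```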

ads.feature_engineering.feature_type.creditcard module

The module that represents a CreditCard feature type.

Classes:
CreditCard

The CreditCard feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

_luhn_checksum(card_number: str) -> float

Implements Luhn algorithm to validate a credit card number.
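
For reference, the Luhn check itself can be sketched in a few lines. This is a generic implementation of the algorithm, not the library's private _luhn_checksum function (whose body and return convention are not shown here); luhn_is_valid is a hypothetical name and returns a plain bool.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if the digits of card_number pass the Luhn check."""
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 when the doubled value exceeds 9.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4532640527811543"))  # True
print(luhn_is_valid("4532640527811544"))  # False
```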

class ads.feature_engineering.feature_type.creditcard.CreditCard

Bases: String

Type representing credit card numbers.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in each credit card type using bar chart.

Examples

>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> import pandas as pd
>>> s = pd.Series(["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"], name='credit_card')
>>> s.ads.feature_type = ['credit_card']
>>> CreditCard.validator.is_credit_card(s)
0     True
1    False
2     True
3     True
4     True
5     True
Name: credit_card, dtype: bool
description = 'Type representing credit card numbers.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_domain()
constraints: []
stats:
    count: 16
    count_Amex: 5
    count_Diners Club: 2
    count_MasterCard: 3
    count_Visa: 5
    count_missing: 1
    missing: 1
    unique: 15
values: CreditCard
Returns

Domain based on the CreditCard feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the counts of observations in each credit card type using bar chart.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_plot()
Returns

Plot object for the series based on the CreditCard feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series)

Generates feature statistics.

Feature statistics include (total)count, unique(count), missing(count) and count of each credit card type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_stat()
              Metric  Value
0              count     16
1             unique     15
2            missing      1
3         count_Amex      5
4         count_Visa      5
5   count_MasterCard      3
6  count_Diners Club      2
7      count_missing      1
Returns

Summary statistics of the Series or Dataframe provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.creditcard.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series

ads.feature_engineering.feature_type.datetime module

The module that represents a DateTime feature type.

Classes:
DateTime

The DateTime feature type.

class ads.feature_engineering.feature_type.datetime.DateTime

Bases: FeatureType

Type representing date and/or time.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datetime datasets using histograms.

Example

>>> from ads.feature_engineering.feature_type.datetime import DateTime
>>> import pandas as pd
>>> s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> DateTime.validator.is_datetime(s)
0     True
1     True
2    False
3     True
Name: datetime, dtype: bool
description = 'Type representing date and/or time.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 8
    missing: 3
    sample maximum: April/15/11
    sample minimum: 3/11/2000
values: DateTime
Returns

Domain based on the DateTime feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datetime datasets using histograms.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_plot()
Returns

Plot object for the series based on the DateTime feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, sample maximum, sample minimum, and missing(count) if there is any.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_stat()
           Metric        Value
0           count            8
1  sample maximum  April/15/11
2  sample minimum    3/11/2000
3         missing            3
Returns

Summary statistics of the Series or Dataframe provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.datetime.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series
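
The DateTime statistics shown earlier report the sample minimum and maximum as the original strings, which implies that the values are compared as dates and not as text. A standalone sketch of that idea follows. The list of formats is an assumption that happens to cover the example data (month names are parsed with %B, which depends on an English locale); the library's own parsing is not shown here, and datetime_stats is a hypothetical helper.

```python
from datetime import datetime

# Assumed formats; just enough to parse the example values.
FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%B/%d/%Y", "%B/%d/%y")


def parse(value):
    """Return a datetime, or None when the value is missing or unparseable."""
    if not isinstance(value, str) or not value:
        return None
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def datetime_stats(values):
    parsed = [(parse(v), v) for v in values]
    present = [(dt, raw) for dt, raw in parsed if dt is not None]
    return {
        "count": len(values),
        # Compare by parsed date, but report the original string.
        "sample maximum": max(present, key=lambda pair: pair[0])[1],
        "sample minimum": min(present, key=lambda pair: pair[0])[1],
        # In this sketch, unparseable strings also count as missing.
        "missing": len(values) - len(present),
    }


x = ['3/11/2000', '3/12/2000', '3/13/2000', '', None, float("nan"),
     'April/13/2011', 'April/15/11']
stats = datetime_stats(x)
print(stats)
```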

ads.feature_engineering.feature_type.discrete module

The module that represents a Discrete feature type.

Classes:
Discrete

The Discrete feature type.

class ads.feature_engineering.feature_type.discrete.Discrete

Bases: FeatureType

Type representing discrete values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using box plot.

description = 'Type representing discrete values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_domain()
constraints: []
stats:
    count: 4
    unique: 4
values: Discrete
Returns

Domain based on the Discrete feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datasets using box plot.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_plot()
Returns

Plot object for the series based on the Discrete feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_stat()
            discrete
count   4
unique  4
Returns

Summary statistics of the Series provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.document module

The module that represents a Document feature type.

Classes:
Document

The Document feature type.

class ads.feature_engineering.feature_type.document.Document

Bases: FeatureType

Type representing document values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

description = 'Type representing document values.'
classmethod feature_domain()
Returns

Nothing.

Return type

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.gis module

The module that represents a GIS feature type.

Classes:
GIS

The GIS feature type.

class ads.feature_engineering.feature_type.gis.GIS

Bases: FeatureType

Type representing geographic information.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the given locations on a map, based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.gis import GIS
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='gis')
>>> s.ads.feature_type = ['gis']
>>> GIS.validator.is_gis(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: gis, dtype: bool
description = 'Type representing geographic information.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: GIS
Returns

Domain based on the GIS feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the given locations on a map, based on longitude and latitude.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_plot()
Returns

Plot object for the series based on the GIS feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_stat()
        gis
count   13
unique  10
missing 3
Returns

Summary statistics of the Series provided.

Return type

pandas.DataFrame
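
The three statistics can be reproduced with a short standard-library sketch. This is not the library's implementation: the helper names `is_missing` and `basic_stats` are hypothetical, plain lists stand in for `pandas.Series`, and treating empty strings as missing is an assumption inferred from the example output above (13 values, 3 missing, 10 unique).

```python
import math


def is_missing(value):
    """Treat None, NaN, and empty strings as missing (an assumption)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == ""


def basic_stats(values):
    """Return the total count, unique count, and missing count."""
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),             # every entry, missing or not
        "unique": len(set(present)),      # distinct non-missing entries
        "missing": len(values) - len(present),
    }


gis_values = [
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    float("nan"),
    None,
]
print(basic_stats(gis_values))  # {'count': 13, 'unique': 10, 'missing': 3}
```

Under the same assumptions, the sketch also reproduces the counts shown for the phone number example later in this section (7 values, 2 unique, 4 missing).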

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.gis.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series

ads.feature_engineering.feature_type.integer module

The module that represents an Integer feature type.

Classes:
Integer

The Integer feature type.

class ads.feature_engineering.feature_type.integer.Integer

Bases: FeatureType

Type representing integer values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using a box plot.

description = 'Type representing integer values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='integer')
>>> s.ads.feature_type = ['integer']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    freq: 2
    missing: 2
    top: true
    unique: 2
values: Integer
Returns

Domain based on the Integer feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datasets using a box plot.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_plot()
Returns

Plot object for the series based on the Integer feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include the total count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, and the missing count, if any.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_stat()
    Metric                  Value
0       count                   7
1       mean                    1
2       standard deviation      1
3       sample minimum          0
4       lower quartile          1
5       median                  1
6       upper quartile          2
7       sample maximum          4
8       missing                 1
Returns

Summary statistics of the Series or DataFrame provided.

Return type

pandas.DataFrame
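
As a rough illustration of the metrics listed above, the following standard-library sketch computes them for the same sample values. The function name `numeric_stats` is hypothetical and a plain list stands in for `pandas.Series`. The example output above shows whole numbers; truncating the floats computed here reproduces them, but the source does not say whether the library truncates, so treat that as an observation rather than documented behavior.

```python
import math
import statistics


def numeric_stats(values):
    """Summarize numeric values, treating None and NaN as missing."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    # method="inclusive" interpolates linearly between the sample min and max.
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(present),
        "standard deviation": statistics.stdev(present),
        "sample minimum": min(present),
        "lower quartile": q1,
        "median": median,
        "upper quartile": q3,
        "sample maximum": max(present),
        "missing": len(values) - len(present),
    }


stats = numeric_stats([1, 0, 1, 2, 3, 4, float("nan")])
for metric, value in stats.items():
    print(f"{metric:<20}{value}")
```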

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address module

The module that represents an IpAddress feature type.

Classes:
IpAddress

The IpAddress feature type.

class ads.feature_engineering.feature_type.ip_address.IpAddress

Bases: FeatureType

Type representing IP Address.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address import IpAddress
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> IpAddress.validator.is_ip_address(s)
0     True
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 3
values: IpAddress
Returns

Domain based on the IpAddress feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count.

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  3
2       missing 2
Returns

Summary statistics of the Series provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.ip_address.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series
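
A minimal sketch of this kind of element-wise check, using Python's standard `ipaddress` module. This is an illustration only, not the library's handler: the function name `is_ip_address` mirrors the validator name used above, but the body is an assumption, and a plain list stands in for `pandas.Series`.

```python
import ipaddress


def is_ip_address(value):
    """Return True if the value parses as an IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False  # None and NaN are not address strings
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


samples = ["192.168.0.1", "2001:db8::", "", float("nan"), None]
print([is_ip_address(v) for v in samples])
# [True, True, False, False, False]
```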

ads.feature_engineering.feature_type.ip_address_v4 module

The module that represents an IpAddressV4 feature type.

Classes:
IpAddressV4

The IpAddressV4 feature type.

class ads.feature_engineering.feature_type.ip_address_v4.IpAddressV4

Bases: FeatureType

Type representing IP Address V4.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v4 import IpAddressV4
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> IpAddressV4.validator.is_ip_address_v4(s)
0     True
1    False
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address V4.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.NaN, None], name='ip_address_v4')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 4
values: IpAddressV4
Returns

Domain based on the IpAddressV4 feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count.

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  4
2       missing 2
Returns

Summary statistics of the Series provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.ip_address_v4.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series

ads.feature_engineering.feature_type.ip_address_v6 module

The module that represents an IpAddressV6 feature type.

Classes:
IpAddressV6

The IpAddressV6 feature type.

class ads.feature_engineering.feature_type.ip_address_v6.IpAddressV6

Bases: FeatureType

Type representing IP Address V6.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v6 import IpAddressV6
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> IpAddressV6.validator.is_ip_address_v6(s)
0    False
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address V6.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address_v6')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 2
values: IpAddressV6
Returns

Domain based on the IpAddressV6 feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count.

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  2
2       missing 2
Returns

Summary statistics of the Series provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.ip_address_v6.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series
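
Judging by the examples, the IpAddressV4 and IpAddressV6 validators accept only one address family each. One way to sketch that split is with the `version` attribute from Python's standard `ipaddress` module. The helper `ip_version` is hypothetical and is not part of the library; a plain list stands in for `pandas.Series`.

```python
import ipaddress


def ip_version(value):
    """Return 4 or 6 for a parseable address string, otherwise None."""
    if not isinstance(value, str):
        return None
    try:
        return ipaddress.ip_address(value).version
    except ValueError:
        return None


samples = ["192.168.0.1", "2001:db8::", "", float("nan"), None]
is_v4 = [ip_version(v) == 4 for v in samples]
is_v6 = [ip_version(v) == 6 for v in samples]
print(is_v4)  # [True, False, False, False, False]
print(is_v6)  # [False, True, False, False, False]
```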

ads.feature_engineering.feature_type.lat_long module

The module that represents a LatLong feature type.

Classes:
LatLong

The LatLong feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.lat_long.LatLong

Bases: String

Type representing longitude and latitude.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.lat_long import LatLong
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='latlong')
>>> s.ads.feature_type = ['lat_long']
>>> LatLong.validator.is_lat_long(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: latlong, dtype: bool
description = 'Type representing longitude and latitute.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> latlong_series = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: LatLong
Returns

Domain based on the LatLong feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.NaN,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_plot()
Returns

Plot object for the series based on the LatLong feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generate feature statistics.

Feature statistics include the total count, the unique count, and the missing count, if any.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.NaN,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_stat()
    Metric  Value
0       count   13
1       unique  10
2       missing 3
Returns

Summary statistics of the Series or DataFrame provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.lat_long.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series
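
The source does not show the pattern the handler uses. The following is a hedged sketch of one plausible check: a regular expression that accepts `lat,long` with optional spaces and optional surrounding parentheses, followed by a range test. The pattern, the range test, and the function body are all assumptions chosen to agree with the LatLong example above.

```python
import re

# Hypothetical pattern: optional parentheses around "lat,long".
# Simplification: unbalanced parentheses such as "(1,2" are also accepted.
_LAT_LONG = re.compile(
    r"\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?"
)


def is_lat_long(value):
    """Return True if the value looks like a valid latitude/longitude pair."""
    if not isinstance(value, str):
        return False
    match = _LAT_LONG.fullmatch(value)
    if match is None:
        return False
    lat, lon = float(match.group(1)), float(match.group(2))
    # Latitude spans [-90, 90] and longitude spans [-180, 180].
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


samples = [
    "-18.2193965, -93.587285",
    "62.9795085,-66.989705",
    None,
    "(51.816119, 175.979008)",
    "",
]
print([is_lat_long(v) for v in samples])
# [True, True, False, True, False]
```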

ads.feature_engineering.feature_type.object module

The module that represents an Object feature type.

Classes:
Object

The Object feature type.

class ads.feature_engineering.feature_type.object.Object

Bases: FeatureType

Type representing object.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

description = 'Type representing object.'
classmethod feature_domain()
Returns

Nothing.

Return type

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ordinal module

The module that represents an Ordinal feature type.

Classes:
Ordinal

The Ordinal feature type.

class ads.feature_engineering.feature_type.ordinal.Ordinal

Bases: FeatureType

Type representing ordered values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the counts of observations in each categorical bin using a bar chart.

description = 'Type representing ordered values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_domain()
constraints:
- expression: $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    language: python
stats:
    count: 10
    missing: 1
    unique: 9
values: Ordinal
Returns

Domain based on the Ordinal feature type.

Return type

ads.feature_engineering.schema.Domain
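
The constraint in the example lists the distinct non-missing values of the series. A small sketch of how such an expression string could be assembled is shown below. The function name `ordinal_constraint` is hypothetical, and the cast to `float` only mimics how a numeric series containing NaN is stored; the source does not describe how the library builds or evaluates the expression.

```python
import math


def ordinal_constraint(values):
    """Build a membership expression over the sorted distinct values."""
    present = sorted(
        {
            float(v)
            for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))
        }
    )
    return f"$x in {present}"


print(ordinal_constraint([1, 2, 3, float("nan")]))
# $x in [1.0, 2.0, 3.0]
```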

static feature_plot(x: Series) Axes

Shows the counts of observations in each categorical bin using a bar chart.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_plot()
Returns

The bar chart plot object for the series based on the Ordinal feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count, if any.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_stat()
    Metric  Value
0       count   10
1       unique  9
2       missing 1
Returns

Summary statistics of the Series or DataFrame provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.phone_number module

The module that represents a Phone Number feature type.

Classes:
PhoneNumber

The Phone Number feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.phone_number.PhoneNumber

Bases: String

Type representing phone numbers.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

Examples

>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> import pandas as pd
>>> s = pd.Series([None, "1-640-124-5367", "1-573-916-4412"])
>>> PhoneNumber.validator.is_phone_number(s)
0    False
1     True
2     True
dtype: bool
description = 'Type representing phone numbers.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.NaN, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 7
    missing: 4
    unique: 2
values: PhoneNumber
Returns

Domain based on the PhoneNumber feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count, if any.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.NaN, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_stat()
    Metric  Value
0       count   7
1       unique  2
2       missing 4
Returns

Summary statistics of the Series or DataFrame provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.phone_number.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pandas.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pandas.Series

ads.feature_engineering.feature_type.string module

The module that represents a String feature type.

Classes:
String

The feature type that represents string values.

class ads.feature_engineering.feature_type.string.String

Bases: FeatureType

Type representing string values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using a word cloud.

Example

>>> from ads.feature_engineering.feature_type.string import String
>>> import pandas as pd
>>> s = pd.Series(["Hello", "world", None], name='string')
>>> String.validator.is_string(s)
0     True
1     True
2    False
Name: string, dtype: bool
description = 'Type representing string values.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_domain()
constraints: []
stats:
    count: 22
    missing: 3
    unique: 3
values: String
Returns

Domain based on the String feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows distributions of datasets using a word cloud.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_plot()
Returns

Plot object for the series based on the String feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count, if any.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_stat()
    Metric  Value
0       count   22
1       unique  3
2       missing 3
Returns

Summary statistics of the Series or DataFrame provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.string.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pd.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pd.Series

ads.feature_engineering.feature_type.text module

The module that represents a Text feature type.

Classes:
Text

The Text feature type.

class ads.feature_engineering.feature_type.text.Text

Bases: String

Type representing text values.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_plot(x: pd.Series) plt.Axes

Shows distributions of datasets using a word cloud.

description = 'Type representing text values.'
classmethod feature_domain()
Returns

Nothing.

Return type

None

static feature_plot(x: Series) Axes

Shows distributions of datasets using a word cloud.

Examples

>>> text = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='text')
>>> text.ads.feature_type = ['text']
>>> text.ads.feature_plot()
Returns

Plot object for the series based on the Text feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.unknown module

The module that represents an Unknown feature type.

Classes:
Unknown

The Unknown feature type.

class ads.feature_engineering.feature_type.unknown.Unknown

Bases: FeatureType

Type representing third-party dtypes.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

description = 'Type representing unknown type.'
classmethod feature_domain()
Returns

Nothing.

Return type

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.zip_code module

The module that represents a ZipCode feature type.

Classes:
ZipCode

The ZipCode feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.zip_code.ZipCode

Bases: String

Type representing postal code.

description

The feature type description.

Type

str

name

The feature type name.

Type

str

warning

Provides functionality to register warnings and invoke them.

Type

FeatureWarning

validator

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes

Shows the geometry distribution based on the location of the zip code.

Example

>>> from ads.feature_engineering.feature_type.zip_code import ZipCode
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["94065", "90210", np.NaN, None], name='zipcode')
>>> ZipCode.validator.is_zip_code(s)
0     True
1     True
2    False
3    False
Name: zipcode, dtype: bool
description = 'Type representing postal code.'
classmethod feature_domain(x: Series) Domain

Generate the domain of the data of this feature type.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 2
    unique: 2
values: ZipCode
Returns

Domain based on the ZipCode feature type.

Return type

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes

Shows the geometry distribution based on the location of the zip code.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_plot()
Returns

Plot object for the series based on the ZipCode feature type.

Return type

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_stat()
    Metric  Value
0       count   4
1       unique  2
2       missing 2
Returns

Summary statistics of the Series provided.

Return type

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.zip_code.default_handler(data: Series, *args, **kwargs) Series

Processes given data and indicates if the data matches requirements.

Parameters

data (pd.Series) – The data to process.

Returns

The logical list indicating if the data matches requirements.

Return type

pd.Series
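
The source does not state which postal formats the handler accepts. As an illustration only, the sketch below checks for the US ZIP format (five digits, optionally followed by a hyphen and four more digits). Both the pattern and the function body are assumptions; a plain list stands in for `pandas.Series`.

```python
import re

# Assumed format: US ZIP (12345) or ZIP+4 (12345-6789).
_US_ZIP = re.compile(r"\d{5}(?:-\d{4})?")


def is_zip_code(value):
    """Return True if the value is a string in US ZIP or ZIP+4 format."""
    return isinstance(value, str) and _US_ZIP.fullmatch(value) is not None


samples = ["94065", "90210", float("nan"), None]
print([is_zip_code(v) for v in samples])
# [True, True, False, False]
```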

ads.feature_engineering.feature_type.handler.feature_validator module

The module that helps to register custom validators for feature types and to extend registered validators with dispatching based on specific arguments.

Classes

FeatureValidator

The Feature Validator class to manage custom validators.

FeatureValidatorMethod

The Feature Validator Method class. Extends methods that require dispatching based on specific arguments.

class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator

Bases: object

The Feature Validator class to manage custom validators.

register(self, name: str, handler: Callable, condition: Union[Tuple, Dict[str, Any]] = None, replace: bool = False) None

Registers new validator.

unregister(self, name: str, condition: Union[Tuple, Dict[str, Any]] = None) None

Unregisters validator.

registered(self) pd.DataFrame

Gets the list of registered validators.

Examples

>>> series = pd.Series(['+1-202-555-0141', '+1-202-555-0142'], name='Phone Number')
>>> def phone_number_validator(data: pd.Series) -> pd.Series:
...    print("phone_number_validator")
...    return data
>>> def universal_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...    print("universal_phone_number_validator")
...    return data
>>> def us_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...    print("us_phone_number_validator")
...    return data
>>> PhoneNumber.validator.register(name="is_phone_number", handler=phone_number_validator, replace=True)
>>> PhoneNumber.validator.register(name="is_phone_number", handler=universal_phone_number_validator, condition = ('country_code',))
>>> PhoneNumber.validator.register(name="is_phone_number", handler=us_phone_number_validator, condition = {'country_code':'+1'})
>>> PhoneNumber.validator.is_phone_number(series)
    phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> PhoneNumber.validator.is_phone_number(series, country_code = '+7')
    universal_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> PhoneNumber.validator.is_phone_number(series, country_code = '+1')
    us_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> PhoneNumber.validator.registered()
               Validator                 Condition                            Handler
    ---------------------------------------------------------------------------------
    0    is_phone_number                        ()             phone_number_validator
    1    is_phone_number          ('country_code')   universal_phone_number_validator
    2    is_phone_number    {'country_code': '+1'}          us_phone_number_validator
>>> series.ads.validator.is_phone_number()
    phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> series.ads.validator.is_phone_number(country_code = '+7')
    universal_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142
>>> series.ads.validator.is_phone_number(country_code = '+1')
    us_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

Initializes the FeatureValidator.

register(name: str, handler: Callable, condition: Optional[Union[Tuple, Dict[str, Any]]] = None, replace: bool = False) None

Registers new validator.

Parameters
  • name (str) – The validator name.

  • handler (callable) – The handler.

  • condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator.

  • replace (bool) – The flag indicating if the registered validator should be replaced with the new one.

Returns

Nothing.

Return type

None

Raises
  • ValueError – The name is empty or handler is not provided.

  • TypeError – The handler is not callable, or the name of the validator is not a string.

  • ValidatorAlreadyExists – The validator is already registered.

registered() DataFrame

Gets the list of registered validators.

Returns

The list of registered validators.

Return type

pd.DataFrame

unregister(name: str, condition: Optional[Union[Tuple, Dict[str, Any]]] = None) None

Unregisters validator.

Parameters
  • name (str) – The name of the validator to be unregistered.

  • condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator to be unregistered.

Returns

Nothing.

Return type

None

Raises
  • TypeError – The name of the validator is not a string.

  • ValidatorNotFound – The validator is not found.

  • ValidatorWithConditionNotFound – The validator with the provided condition is not found.
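
The checks that register and unregister document (name and handler validation, plus the replace flag) can be sketched with a small stand-alone registry. This is only an illustration, not the ADS implementation: MiniRegistry is a hypothetical name, the two exception classes are local stand-ins that mirror the documented ones, and the order in which the checks run is an assumption.

```python
from typing import Callable, Dict, List


class ValidatorAlreadyExists(ValueError):
    """Local stand-in for the documented exception of the same name."""


class ValidatorNotFound(ValueError):
    """Local stand-in for the documented exception of the same name."""


class MiniRegistry:
    """Hypothetical registry that mirrors only the documented checks."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable] = {}

    def register(self, name: str, handler: Callable, replace: bool = False) -> None:
        if not isinstance(name, str):
            raise TypeError("The validator name must be a string.")
        if not name:
            raise ValueError("The validator name is empty.")
        if handler is None:
            raise ValueError("The handler is not provided.")
        if not callable(handler):
            raise TypeError("The handler must be callable.")
        # An existing name is only overwritten when replace=True.
        if name in self._handlers and not replace:
            raise ValidatorAlreadyExists(name)
        self._handlers[name] = handler

    def unregister(self, name: str) -> None:
        if not isinstance(name, str):
            raise TypeError("The validator name must be a string.")
        if name not in self._handlers:
            raise ValidatorNotFound(name)
        del self._handlers[name]

    def names(self) -> List[str]:
        return sorted(self._handlers)
```

Registering the same name twice fails unless replace=True is passed, which is why the first register call in the class examples above sets that flag.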

class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidatorMethod(handler: Callable)

Bases: object

The Feature Validator Method class.

Extends methods that require dispatching based on specific arguments.

register(self, condition: Union[Tuple, Dict[str, Any]], handler: Callable) None

Registers new handler.

unregister(self, condition: Union[Tuple, Dict[str, Any]]) None

Unregisters existing handler.

registered(self) pd.DataFrame

Gets the list of registered handlers.

Initializes the Feature Validator Method.

Parameters

handler (Callable) – The handler that is called by default if no suitable one is found.

register(condition: Union[Tuple, Dict[str, Any]], handler: Callable) None

Registers new handler.

Parameters
  • condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to register a new handler.

  • handler (Callable) – The handler to be registered.

Returns

Nothing.

Return type

None

Raises

ValueError – If the condition is not provided or is in the wrong format, or if the handler is not provided or has the wrong format.

registered() DataFrame

Gets the list of registered handlers.

Returns

The list of registered handlers.

Return type

pd.DataFrame

unregister(condition: Union[Tuple, Dict[str, Any]]) None

Unregisters existing handler.

Parameters

condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to unregister a handler.

Returns

Nothing.

Return type

None

Raises

ValueError – If the condition is not provided, is in the wrong format, or is not registered.
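
The dispatching described for FeatureValidatorMethod can be pictured with a minimal stand-alone sketch. It is not the ADS code: MiniValidatorMethod is a hypothetical name, and the matching rules are assumptions inferred from the phone number example (a dict condition matches exact keyword values and is checked first, a tuple condition matches when the keyword names are the same, and otherwise the default handler runs).

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Condition = Union[Tuple[str, ...], Dict[str, Any]]


class MiniValidatorMethod:
    """Hypothetical dispatcher that picks a handler from keyword arguments."""

    def __init__(self, handler: Callable) -> None:
        self._default = handler
        self._conditional: List[Tuple[Condition, Callable]] = []

    def register(self, condition: Condition, handler: Callable) -> None:
        if not condition or not isinstance(condition, (tuple, dict)):
            raise ValueError("The condition must be a non-empty tuple or dict.")
        if not callable(handler):
            raise ValueError("The handler must be callable.")
        self._conditional.append((condition, handler))

    def unregister(self, condition: Condition) -> None:
        for index, (registered, _) in enumerate(self._conditional):
            if registered == condition:
                del self._conditional[index]
                return
        raise ValueError("The condition is not registered.")

    def __call__(self, data: Any, **kwargs: Any) -> Any:
        # Assumed priority: exact-value (dict) conditions before name-only (tuple) ones.
        for wanted_type in (dict, tuple):
            for condition, handler in self._conditional:
                if not isinstance(condition, wanted_type):
                    continue
                if isinstance(condition, dict):
                    matched = all(
                        key in kwargs and kwargs[key] == value
                        for key, value in condition.items()
                    )
                else:
                    matched = set(condition) == set(kwargs)
                if matched:
                    return handler(data, **kwargs)
        # No condition matched, so fall back to the default handler.
        return self._default(data, **kwargs)


is_phone_number = MiniValidatorMethod(lambda data: "default")
is_phone_number.register(("country_code",), lambda data, country_code: "universal")
is_phone_number.register({"country_code": "+1"}, lambda data, country_code: "us")
```

With these registrations, a call without keyword arguments reaches the default handler, country_code='+7' reaches the tuple-registered handler, and country_code='+1' reaches the dict-registered one, as in the FeatureValidator examples.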

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorAlreadyExists(name: str)

Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorNotFound(name: str)

Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionAlreadyExists(name: str)

Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionNotFound(name: str)

Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.WrongHandlerMethodSignature(handler_name: str, condition: str, handler_signature: str)

Bases: ValueError

ads.feature_engineering.feature_type.handler.feature_warning module

The module that helps to register custom warnings for the feature types.

Classes

FeatureWarning

The Feature Warning class. Provides functionality to register warning handlers and invoke them.

Examples

>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
                    Name                               Handler
    ----------------------------------------------------------
    0         zeros_count          warning_handler_zeros_count
    1    zeros_percentage     warning_handler_zeros_percentage
>>> warning.zeros_percentage(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%
>>> warning.zeros_count(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
    1          Zeros   Age has 12.2% zeros     Percentage      12.2%
>>> warning.unregister('zeros_count')
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

class ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning

Bases: object

The Feature Warning class.

Provides functionality to register warning handlers and invoke them.

register(self, name: str, handler: Callable) None

Registers a new warning for the feature type.

unregister(self, name: str) None

Unregisters warning.

registered(self) pd.DataFrame

Gets the list of registered warnings.

Examples

>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
                  Warning                              Handler
    ----------------------------------------------------------
    0         zeros_count          warning_handler_zeros_count
    1    zeros_percentage     warning_handler_zeros_percentage
>>> warning.zeros_percentage(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%
>>> warning.zeros_count(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
    1          Zeros   Age has 12.2% zeros     Percentage      12.2%
>>> warning.unregister('zeros_count')
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

Initializes the FeatureWarning.

register(name: str, handler: Callable, replace: bool = False) None

Registers a new warning.

Parameters
  • name (str) – The warning name.

  • handler (callable) – The handler associated with the warning.

  • replace (bool) – The flag indicating if the registered warning should be replaced with the new one.

Returns

Nothing.

Return type

None

Raises
  • ValueError – If the warning name is empty or the handler is not defined.

  • TypeError – If the handler is not callable.

  • WarningAlreadyExists – If the warning is already registered.

registered() DataFrame

Gets the list of registered warnings.

Returns

The list of registered warnings in DataFrame format.

Return type

pd.DataFrame

Examples

>>> warning.registered()
                     Name                               Handler
    -----------------------------------------------------------
    0         zeros_count           warning_handler_zeros_count
    1    zeros_percentage      warning_handler_zeros_percentage

unregister(name: str) None

Unregisters warning.

Parameters

name (str) – The name of the warning to be unregistered.

Returns

Nothing.

Return type

None

Raises
  • ValueError – If the warning name is not provided or is empty.

  • WarningNotFound – If the warning is not found.
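
The way a FeatureWarning object exposes each registered handler by name and also runs all of them when called can be sketched without pandas. This is a simplified stand-in under stated assumptions: MiniWarning is a hypothetical class, rows are plain dicts with the four documented column names instead of a DataFrame, and handlers run in registration order.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Handler = Callable[[Any], List[Row]]


class MiniWarning:
    """Hypothetical stand-in: named handlers that can also be run together."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def unregister(self, name: str) -> None:
        del self._handlers[name]

    def __getattr__(self, name: str) -> Handler:
        # Only reached when normal lookup fails, i.e. for handler names.
        handlers = self.__dict__.get("_handlers", {})
        if name in handlers:
            return handlers[name]
        raise AttributeError(name)

    def __call__(self, data: Any) -> List[Row]:
        # Run every registered handler and concatenate the rows.
        rows: List[Row] = []
        for handler in self._handlers.values():
            rows.extend(handler(data))
        return rows


def zeros_count(data: List[float]) -> List[Row]:
    count = sum(1 for value in data if value == 0)
    return [{"Warning": "Zeros", "Message": f"{count} zeros",
             "Metric": "Count", "Value": count}]


def zeros_percentage(data: List[float]) -> List[Row]:
    pct = 100.0 * sum(1 for value in data if value == 0) / len(data)
    return [{"Warning": "Zeros", "Message": f"{pct:.1f}% zeros",
             "Metric": "Percentage", "Value": f"{pct:.1f}%"}]
```

Unlike the hard-coded handlers in the examples above, these two compute their values from the data, which is closer to how a real warning handler would be written.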

ads.feature_engineering.feature_type.handler.warnings module

The module with all default warnings provided to the user. These are registered to the relevant feature types directly in the feature type files themselves.

ads.feature_engineering.feature_type.handler.warnings.high_cardinality_handler(s: Series) DataFrame

Warning if the number of unique values (including NaN) in the series is greater than or equal to 15.

Parameters

s (pd.Series) – Pandas series - column of some feature type.

Returns

DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the count of unique values.

Return type

pd.DataFrame
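
The threshold rule can be illustrated with a plain-Python sketch. It is not the ADS handler: high_cardinality_rows and count_unique are hypothetical helpers, a list stands in for the pandas Series, the warning label and message text are invented for illustration, and returning an empty list below the threshold is an assumption.

```python
import math
from typing import Any, Dict, List, Sequence


def count_unique(values: Sequence[Any]) -> int:
    """Count distinct values, treating all missing values (None/NaN) as one."""
    seen = set()
    has_missing = False
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            has_missing = True
        else:
            seen.add(value)
    return len(seen) + (1 if has_missing else 0)


def high_cardinality_rows(values: Sequence[Any], threshold: int = 15) -> List[Dict[str, Any]]:
    """Return one warning row when the unique count is >= threshold."""
    n_unique = count_unique(values)
    if n_unique < threshold:
        return []
    return [{
        "Warning": "High cardinality",  # illustrative label
        "Message": f"{n_unique} unique values",
        "Metric": "Count",
        "Value": n_unique,
    }]
```

Fourteen distinct values plus a NaN give a unique count of 15 in this sketch, so the warning row is produced.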

ads.feature_engineering.feature_type.handler.warnings.missing_values_handler(s: Series) DataFrame

Warning for more than 5 percent missing values (NaNs) in the series.

Parameters

s (pd.Series) – Pandas series - column of some feature type.

Returns

DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where the first row is the count of missing values and the second is the percentage of missing values.

Return type

pd.DataFrame
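
A minimal sketch of the two-row result, assuming a plain list in place of the pandas Series. The helper name missing_values_rows, the message wording, and the empty result at or below the threshold are all assumptions made for illustration.

```python
import math
from typing import Any, Dict, List, Sequence


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_values_rows(values: Sequence[Any], threshold_pct: float = 5.0) -> List[Dict[str, Any]]:
    """Return a count row and a percentage row when missing values exceed the threshold."""
    total = len(values)
    if total == 0:
        return []
    missing = sum(1 for value in values if is_missing(value))
    pct = 100.0 * missing / total
    if pct <= threshold_pct:  # strictly greater than 5 percent is required
        return []
    return [
        {"Warning": "Missing values", "Message": f"{missing} missing values",
         "Metric": "Count", "Value": missing},
        {"Warning": "Missing values", "Message": f"{pct:.1f}% missing values",
         "Metric": "Percentage", "Value": f"{pct:.1f}%"},
    ]
```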

ads.feature_engineering.feature_type.handler.warnings.skew_handler(s: Series) DataFrame

Warning if the absolute value of the skew is greater than 1.

Parameters

s (pd.Series) – Pandas series - column of some feature type, expects continuous values.

Returns

DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the skew value of that column.

Return type

pd.DataFrame
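
The skew check can be sketched in plain Python. The statistic below is the adjusted Fisher-Pearson sample skewness, G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^1.5; that the ADS handler uses exactly this estimator is an assumption, and sample_skewness and skew_rows are hypothetical names.

```python
import math
from typing import Any, Dict, List, Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness (requires at least 3 values)."""
    n = len(values)
    if n < 3:
        raise ValueError("At least 3 values are required.")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # biased second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # biased third central moment
    if m2 == 0:
        return 0.0  # constant data has no skew
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


def skew_rows(values: Sequence[float], threshold: float = 1.0) -> List[Dict[str, Any]]:
    """Return one warning row when |skew| is greater than the threshold."""
    skew = sample_skewness(values)
    if abs(skew) <= threshold:
        return []
    return [{"Warning": "Skew", "Message": f"Skew is {skew:.2f}",
             "Metric": "Skew", "Value": skew}]
```

For [1, 1, 1, 1, 10] the statistic works out to sqrt(5), roughly 2.24, so a row is produced, while a symmetric sample such as [1, 2, 3, 4, 5] has zero skew and produces none.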

ads.feature_engineering.feature_type.handler.warnings.zeros_handler(s: Series) DataFrame

Warning for more than 10 percent zeros in the series.

Parameters

s (pd.Series) – Pandas series - column of some feature type.

Returns

DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where the first row is the count of zero values and the second is the percentage of zero values.

Return type

pd.DataFrame
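
As a rough illustration of the zeros rule, the sketch below uses a plain list in place of the pandas Series. The name zeros_rows, the message text, and the empty result at or below 10 percent are assumptions and not taken from the ADS source.

```python
from typing import Any, Dict, List, Sequence


def zeros_rows(values: Sequence[Any], threshold_pct: float = 10.0) -> List[Dict[str, Any]]:
    """Return a count row and a percentage row when zeros exceed the threshold."""
    total = len(values)
    if total == 0:
        return []
    zeros = sum(1 for value in values if value == 0)
    pct = 100.0 * zeros / total
    if pct <= threshold_pct:  # strictly greater than 10 percent is required
        return []
    return [
        {"Warning": "Zeros", "Message": f"{zeros} zeros",
         "Metric": "Count", "Value": zeros},
        {"Warning": "Zeros", "Message": f"{pct:.1f}% zeros",
         "Metric": "Percentage", "Value": f"{pct:.1f}%"},
    ]
```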

Module contents