ads.feature_engineering package

Submodules

ads.feature_engineering.exceptions module

exception ads.feature_engineering.exceptions.InvalidFeatureType(tname: str): Bases: TypeError

exception ads.feature_engineering.exceptions.NameAlreadyRegistered(name: str): Bases: NameError

exception ads.feature_engineering.exceptions.TypeAlreadyAdded(tname: str): Bases: TypeError

exception ads.feature_engineering.exceptions.TypeAlreadyRegistered(tname: str): Bases: TypeError

exception ads.feature_engineering.exceptions.TypeNotFound(tname: str): Bases: TypeError

exception ads.feature_engineering.exceptions.WarningAlreadyExists(name: str): Bases: ValueError

exception ads.feature_engineering.exceptions.WarningNotFound(name: str): Bases: ValueError
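
Because each exception above derives from a built-in base class, a caller can catch either the specific exception or its base. The following is a minimal sketch of that pattern. It uses a local stand-in class that mirrors the documented base of TypeNotFound rather than importing ads; the message text and the lookup and safe_lookup helpers are hypothetical.

```python
# Stand-in mirroring the documented base class (TypeError). The real class
# lives in ads.feature_engineering.exceptions; the message text is assumed.
class TypeNotFound(TypeError):
    def __init__(self, tname: str):
        super().__init__(f"Type {tname} is not found.")


def lookup(registry: dict, tname: str):
    """Return a registered entry or raise TypeNotFound (hypothetical helper)."""
    if tname not in registry:
        raise TypeNotFound(tname)
    return registry[tname]


def safe_lookup(registry: dict, tname: str, default=None):
    """Catch the built-in base class, which also catches TypeNotFound."""
    try:
        return lookup(registry, tname)
    except TypeError:
        return default
```

Catching the base class is convenient for broad handlers, while catching TypeNotFound directly keeps unrelated TypeError bugs visible.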

ads.feature_engineering.feature_type_manager module

This module helps manage feature types. It provides functionality to register, unregister, and list feature types.

Classes

FeatureTypeManager
Feature Types Manager class that manages feature types.

Examples

>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...    description="My personal type."
...    pass
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
            Name        Feature Type                                  Description
---------------------------------------------------------------------------------
0     Continuous          continuous          Type representing continuous values.
1       DateTime           date_time           Type representing date and/or time.
2       Category            category  Type representing discrete unordered values.
3        Ordinal             ordinal             Type representing ordered values.
4        NewType            new_type                             My personal type.

>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler

>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler

>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous

class ads.feature_engineering.feature_type_manager.FeatureTypeManager

Bases: object

Feature Types Manager class that manages feature types.

Provides functionality to register, unregister, and list feature types.

feature_type_object(cls, feature_type: Union[FeatureType, str]) → FeatureType: Gets a feature type by class object or name.

feature_type_register(cls, feature_type_cls: FeatureType) → None: Registers a feature type.

feature_type_unregister(cls, feature_type_cls: Union[FeatureType, str]) → None: Unregisters a feature type.

feature_type_reset(cls) → None: Resets feature types to the defaults.

feature_type_registered(cls) → pd.DataFrame: Lists all registered feature types as a DataFrame.

warning_registered(cls) → pd.DataFrame: Lists registered warnings for all registered feature types.

validator_registered(cls) → pd.DataFrame: Lists registered validators for all registered feature types.

Examples

>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...    pass
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
            Name      Feature Type                                  Description
-------------------------------------------------------------------------------
0     Continuous        continuous          Type representing continuous values.
1       DateTime         date_time           Type representing date and/or time.
2       Category          category  Type representing discrete unordered values.
3        Ordinal           ordinal             Type representing ordered values.

>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler

>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler

>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous

classmethod feature_type_object(feature_type: Union[FeatureType, str]) → FeatureType

Gets a feature type by class object or name.

Parameters

feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.

Returns

Found feature type.

Return type

FeatureType

Raises

TypeNotFound – If the provided feature type is not registered.
TypeError – If the provided feature type is not a subclass of FeatureType.

classmethod feature_type_register(feature_type_cls: FeatureType) → None

Registers new feature type.

Parameters

feature_type_cls (FeatureType) – Subclass of FeatureType to be registered.

Returns

Nothing.

Return type

None

Raises

TypeError – Type is not a subclass of FeatureType.
TypeError – Type has already been registered.
NameError – Name has already been used.

classmethod feature_type_registered() → DataFrame

Lists all registered feature types as a DataFrame.

Returns: The list of feature types in a DataFrame format.
Return type: pd.DataFrame

classmethod feature_type_reset() → None

Resets feature types to the defaults.

Returns: Nothing.
Return type: None

classmethod feature_type_unregister(feature_type: Union[FeatureType, str]) → None

Unregisters a feature type.

Parameters: feature_type ((FeatureType | str)) – The FeatureType subclass or a str indicating feature type.
Returns: Nothing.
Return type: None
Raises: TypeError – If an attempt is made to unregister a default feature type.

classmethod is_type_registered(feature_type: Union[FeatureType, str]) → bool

Checks whether the provided feature type is registered in the system.

Parameters: feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.
Returns: True if the provided feature type is registered, False otherwise.
Return type: bool
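
The register, unregister, reset, and lookup semantics described above can be pictured with a toy registry. This is a sketch under stated assumptions, not the ADS implementation: FeatureType and Continuous are local stand-ins, ToyTypeManager and its method names are hypothetical, and types are keyed by class name for simplicity.

```python
from typing import Dict, Type, Union


class FeatureType:
    """Stand-in base class (assumption; not the ADS FeatureType)."""
    description = "Base feature type."


class Continuous(FeatureType):
    description = "Type representing continuous values."


class ToyTypeManager:
    """Toy registry keyed by class name; illustrative only."""

    _defaults: Dict[str, Type[FeatureType]] = {"Continuous": Continuous}
    _registry: Dict[str, Type[FeatureType]] = dict(_defaults)

    @staticmethod
    def _name(feature_type: Union[Type[FeatureType], str]) -> str:
        # Accept either a class object or its name, as the documented API does.
        return feature_type if isinstance(feature_type, str) else feature_type.__name__

    @classmethod
    def register(cls, feature_type_cls: Type[FeatureType]) -> None:
        if not (isinstance(feature_type_cls, type)
                and issubclass(feature_type_cls, FeatureType)):
            raise TypeError("Type is not a subclass of FeatureType.")
        if feature_type_cls.__name__ in cls._registry:
            raise NameError("Name has already been used.")
        cls._registry[feature_type_cls.__name__] = feature_type_cls

    @classmethod
    def unregister(cls, feature_type: Union[Type[FeatureType], str]) -> None:
        name = cls._name(feature_type)
        if name in cls._defaults:
            raise TypeError("Default feature types cannot be unregistered.")
        cls._registry.pop(name, None)

    @classmethod
    def is_registered(cls, feature_type: Union[Type[FeatureType], str]) -> bool:
        return cls._name(feature_type) in cls._registry

    @classmethod
    def reset(cls) -> None:
        # Drop every custom type and restore the defaults.
        cls._registry = dict(cls._defaults)
```

Keeping the defaults in a separate mapping is what makes reset cheap and lets unregister refuse to remove a default type.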

classmethod validator_registered() → DataFrame

Lists registered validators for registered feature types.

Returns: The list of registered validators for registered feature types in a DataFrame format.
Return type: pd.DataFrame

Examples

>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler

classmethod warning_registered() → DataFrame

Lists registered warnings for all registered feature types.

Returns: The list of registered warnings for registered feature types in a DataFrame format.
Return type: pd.DataFrame

Examples

>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler

ads.feature_engineering.accessor.dataframe_accessor module

The ADS accessor for the Pandas DataFrame. The accessor will be initialized with the pandas object the user is interacting with.

Examples

>>> import pandas as pd
>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
          Column   Feature Type                        Description
------------------------------------------------------------------
0           Name         string    Type representing string values.
1    Credit Card         string    Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
                   Credit Card
------------------------------
0             4532640527811543

class ads.feature_engineering.accessor.dataframe_accessor.ADSDataFrameAccessor(pandas_obj)

Bases: ADSFeatureTypesMixin, EDAMixin, DBAccessMixin, DataLabelingAccessMixin

ADS accessor for the Pandas DataFrame.

columns

The column labels of the DataFrame.

Type: List[str]

tags(self) → Dict[str, List[str]]: Gets the dictionary of user defined tags for the dataframe.

default_type(self) → Dict[str, str]: Gets the map of columns and associated default feature type names.

feature_type(self) → Dict[str, List[str]]: Gets the list of registered feature types.

feature_type_description(self) → pd.DataFrame: Gets the list of registered feature types in a DataFrame format.

sync(self, src: Union[pd.DataFrame, pd.Series]) → pd.DataFrame: Syncs feature types of current DataFrame with that from src.

feature_select(self, include: List[Union[FeatureType, str]] = None, exclude: List[Union[FeatureType, str]] = None) → pd.DataFrame: Returns a subset of the DataFrame’s columns based on the column feature types.

help(self, prop: str = None) → None: Provides docstrings for available methods and properties.

Examples

>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
          Column   Feature Type                        Description
-------------------------------------------------------------------
0           Name         string    Type representing string values.
1    Credit Card         string    Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
                   Credit Card
------------------------------
0             4532640527811543

Initializes ADS Pandas DataFrame Accessor.

Parameters: pandas_obj (pandas.DataFrame) – Pandas dataframe
Raises: ValueError – If provided DataFrame has duplicate columns.

property default_type: Dict[str, str]

Gets the map of columns and associated default feature type names.

Returns: The dictionary where key is column name and value is the name of default feature type.
Return type: Dict[str, str]

feature_select(include: Optional[List[Union[FeatureType, str]]] = None, exclude: Optional[List[Union[FeatureType, str]]] = None) → DataFrame

Returns a subset of the DataFrame’s columns based on the column feature_types.

Parameters

include (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be included.
exclude (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be excluded.

Raises

ValueError – If both include and exclude are empty.
ValueError – If include and exclude are used simultaneously.

Returns

The subset of the frame including the feature types in include and excluding the feature types in exclude.

Return type

pandas.DataFrame
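
To make the include and exclude rules concrete, here is a minimal sketch that works on a plain mapping of column names to feature type names, in the same shape as the feature_type property. It is not the ADS implementation: select_columns is a hypothetical helper, it returns column names rather than a DataFrame, and it follows the Raises list above literally by rejecting calls that pass both arguments.

```python
from typing import Dict, List, Optional


def select_columns(
    feature_types: Dict[str, List[str]],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Return the names of columns whose feature types match the filter."""
    if not include and not exclude:
        raise ValueError("At least one of include or exclude must be given.")
    if include and exclude:
        raise ValueError("include and exclude cannot be used together.")
    if include:
        wanted = set(include)
        # Keep a column if any of its feature types is in the include list.
        return [col for col, types in feature_types.items() if wanted & set(types)]
    unwanted = set(exclude)
    # Keep a column only if none of its feature types is in the exclude list.
    return [col for col, types in feature_types.items() if not unwanted & set(types)]
```

With a DataFrame at hand, the returned names could then be used to index it, for example df[names].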

property feature_type: Dict[str, List[str]]

Gets the list of registered feature types.

Returns: The dictionary where key is column name and value is list of associated feature type names.
Return type: Dict[str, List[str]]

property feature_type_description: DataFrame

Gets the list of registered feature types in a DataFrame format.

Return type: pandas.DataFrame

Examples

>>> df.ads.feature_type_description
          Column   Feature Type                         Description
-------------------------------------------------------------------
0           City         string    Type representing string values.
1   Phone Number         string    Type representing string values.

info() → Any

Gets information about the dataframe.

Returns: The information about the dataframe.
Return type: Any

model_schema(max_col_num: int = 2000)

Generates schema from the dataframe.

Parameters: max_col_num (int, optional. Defaults to 2000) – The maximum number of columns for which a schema is generated automatically.

Examples

>>> df = pd.read_csv('./orcl_attrition.csv', usecols=['Age', 'Attrition'])
>>> schema = df.ads.model_schema()
>>> schema
Schema:
- description: Attrition
  domain:
    constraints: []
    stats:
      count: 1470
      unique: 2
    values: String
  dtype: object
  feature_type: String
  name: Attrition
  required: true
- description: Age
  domain:
    constraints: []
    stats:
      25%: 31.0
      50%: 37.0
      75%: 44.0
      count: 1470.0
      max: 61.0
      mean: 37.923809523809524
      min: 19.0
      std: 9.135373489136732
    values: Integer
  dtype: int64
  feature_type: Integer
  name: Age
  required: true
>>> schema.to_dict()
{'Schema': [{'dtype': 'object',
    'feature_type': 'String',
    'name': 'Attrition',
    'domain': {'values': 'String',
        'stats': {'count': 1470, 'unique': 2},
        'constraints': []},
    'required': True,
    'description': 'Attrition'},
    {'dtype': 'int64',
    'feature_type': 'Integer',
    'name': 'Age',
    'domain': {'values': 'Integer',
        'stats': {'count': 1470.0,
        'mean': 37.923809523809524,
        'std': 9.135373489136732,
        'min': 19.0,
        '25%': 31.0,
        '50%': 37.0,
        '75%': 44.0,
        'max': 61.0},
        'constraints': []},
    'required': True,
    'description': 'Age'}]}

Returns: The data schema.
Return type: ads.feature_engineering.schema.Schema
Raises: ads.feature_engineering.schema.DataSizeTooWide – If the number of columns of input data exceeds max_col_num.

sync(src: Union[DataFrame, Series]) → DataFrame

Syncs feature types of current DataFrame with that from src.

Syncs feature types of current dataframe with that from src, where src can be a dataframe or a series. In either case, only columns with matched names are synced.

Parameters: src (pd.DataFrame | pd.Series) – The source to sync from.
Returns: Synced dataframe.
Return type: pandas.DataFrame
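
The name-matching rule can be sketched with plain dictionaries that map column names to feature type names. This is only an illustration of "columns with matched names are synced": sync_feature_types is a hypothetical helper and does not touch pandas objects or the ADS accessor.

```python
from typing import Dict, List


def sync_feature_types(
    target: Dict[str, List[str]], source: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Copy feature types from source for every column name that target shares."""
    return {
        # Matching names take the source types; other columns keep their own.
        column: list(source[column]) if column in source else list(types)
        for column, types in target.items()
    }
```

Columns that exist only in the source are ignored, and columns that exist only in the target are left unchanged.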

property tags: Dict[str, List[str]]

Gets the dictionary of user defined tags for the dataframe. Key is column name and value is list of tag names.

Returns: The map of columns and associated default tags.
Return type: Dict[str, List[str]]

ads.feature_engineering.accessor.series_accessor module

The ADS accessor for the Pandas Series. The accessor will be initialized with the pandas object the user is interacting with.

Examples

>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0         string    Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']

class ads.feature_engineering.accessor.series_accessor.ADSSeriesAccessor(pandas_obj: Series)

Bases: ADSFeatureTypesMixin, EDAMixinSeries

ADS accessor for Pandas Series.

name

The name of Series.

Type: str

tags

The list of tags for the Series.

Type: List[str]

help(self, prop: str = None) → None: Provides docstrings for available methods and properties.

sync(self, src: Union[pd.DataFrame, pd.Series]) → None: Syncs feature types of current series with that from src.

default_type(self) → str: Gets the name of default feature type for the series.

feature_type(self) → List[str]: Gets the list of registered feature types for the series.

feature_type_description(self) → pd.DataFrame: Gets the list of registered feature types in a DataFrame format.

Examples

>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0         string    Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']

Initializes ADS Pandas Series Accessor.

Parameters: pandas_obj (pd.Series) – The pandas series

property default_type: str

Gets the name of default feature type for the series.

Returns: The name of default feature type.
Return type: str

property feature_type: List[str]

Gets the list of registered feature types for the series.

Returns: Names of feature types.
Return type: List[str]

Examples

>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('tag for name')]
>>> series.ads.feature_type
['name', 'string', 'tag for name']

property feature_type_description: DataFrame

Gets the list of registered feature types in a DataFrame format.

Returns: The DataFrame with feature types for this series.
Return type: pd.DataFrame

Examples

>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('Name tag')]
>>> series.ads.feature_type_description
    Feature Type                               Description
----------------------------------------------------------
0           name            Type representing name values.
1         string          Type representing string values.
2       Name tag                                      Tag.

sync(src: Union[DataFrame, Series]) → None

Syncs feature types of current series with that from src.

The src could be a dataframe or a series. In either case, only columns with matched names are synced.

Parameters: src ((pd.DataFrame | pd.Series)) – The source to sync from.
Returns: Nothing.
Return type: None

Examples

>>> series = pd.Series(['name1', 'name2', 'name3', None])
>>> series.ads.feature_type = ['name']
>>> series.ads.feature_type
['name', 'string']
>>> series.dropna().ads.feature_type
['string']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['name', 'string']

class ads.feature_engineering.accessor.series_accessor.ADSSeriesValidator(feature_type_list: List[FeatureType], series: Series)

Bases: object

Helper class that invokes registered validators at the series level.

Initializes ADS series validator.

Parameters

feature_type_list (List[FeatureType]) – The list of feature types.
series (pd.Series) – The pandas series.

ads.feature_engineering.accessor.mixin.correlation module

ads.feature_engineering.accessor.mixin.correlation.cat_vs_cat(df: DataFrame, normal_form: bool = True) → DataFrame: Calculates the correlation between all pairs of categorical features.

ads.feature_engineering.accessor.mixin.correlation.cat_vs_cont(df: DataFrame, categorical_columns, continuous_columns, normal_form: bool = True) → DataFrame: Calculates the correlation of all pairs of categorical features and continuous features.

ads.feature_engineering.accessor.mixin.correlation.cont_vs_cont(df: DataFrame, normal_form: bool = True) → DataFrame: Calculates the Pearson correlation between two columns of the DataFrame.

ads.feature_engineering.accessor.mixin.eda_mixin module

This exploratory data analysis (EDA) mixin is used in the ADS accessor for the Pandas DataFrame. Its purpose-driven methods enable the data scientist to complete analysis on the dataframe.

From the accessor we have access to the pandas object the user is interacting with as well as corresponding lists of feature types per column.

class ads.feature_engineering.accessor.mixin.eda_mixin.EDAMixin

Bases: object

correlation_ratio() → DataFrame

Generate a Correlation Ratio data frame for all categorical-continuous variable pairs.

Returns

pandas.DataFrame
Correlation Ratio correlation data frame with the following 3 columns –
1. Column 1 (name of the first categorical/continuous column)
2. Column 2 (name of the second categorical/continuous column)
3. Value (correlation value)

Note

Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
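
For intuition, the correlation ratio between a categorical and a continuous variable is commonly defined as the square root of the between-group sum of squares divided by the total sum of squares. The sketch below implements that textbook definition in plain Python; whether ADS computes exactly this quantity is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Hashable, Sequence


def correlation_ratio(categories: Sequence[Hashable], values: Sequence[float]) -> float:
    """Textbook correlation ratio (eta) for one categorical/continuous pair."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    grand_mean = sum(values) / len(values)
    # Variation explained by the category: group sizes times squared mean offsets.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    ss_total = sum((value - grand_mean) ** 2 for value in values)
    return sqrt(ss_between / ss_total) if ss_total else 0.0
```

The value is 1.0 when the category fully determines the continuous value and 0.0 when all group means coincide.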

correlation_ratio_plot() → Axes

Generate a heatmap of the Correlation Ratio correlation for all categorical-continuous variable pairs.

Returns: Correlation Ratio correlation plot object that can be updated by the customer
Return type: Plot object

cramersv() → DataFrame

Generate a Cramer’s V correlation data frame for all categorical variable pairs.

Gives a warning for dropped non-categorical columns.

Returns

Cramer’s V correlation data frame with the following 3 columns:

Column 1 (name of the first categorical column)
Column 2 (name of the second categorical column)
Value (correlation value)

Return type

pandas.DataFrame

Note

Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
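
Cramer’s V is usually derived from the chi-squared statistic of the contingency table of the two variables. The following plain-Python sketch uses the uncorrected textbook formula, V = sqrt(chi2 / (n * (min(r, c) - 1))); whether ADS applies a bias correction or other adjustments is not stated here, so treat this as an illustration with a hypothetical function name.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Uncorrected Cramer's V for two categorical sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for row, row_count in row_totals.items():
        for col, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    # With a single category on either side the statistic is undefined; return 0.0.
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0
```

Pairing a variable with itself gives 1.0, which matches the note above about (x,x) and (y,y) pairs.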

cramersv_plot() → Axes

Generate a heatmap of the Cramer’s V correlation for all categorical variable pairs.

Gives a warning for dropped non-categorical columns.

Returns: Cramer’s V correlation plot object that can be updated by the customer
Return type: Plot object

feature_count() → DataFrame

Counts the number of columns for each feature type and the number of columns for which that feature type is primary. The Primary column gives the number of columns to which that feature type is assigned as the primary feature type.

Returns: A DataFrame with the number of columns for each feature type and the number of columns for which each feature type is primary.
Return type: pandas.DataFrame

Examples

>>> df.ads.feature_type
{'PassengerId': ['ordinal', 'category'],
'Survived': ['ordinal'],
'Pclass': ['ordinal'],
'Name': ['category'],
'Sex': ['category']}
>>> df.ads.feature_count()
    Feature Type        Count       Primary
0       category            3             2
1        ordinal            3             3
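
The counts in the example above can be reproduced from the feature_type mapping alone. The sketch below is a plain-Python illustration with a hypothetical helper name; it assumes that the first feature type listed for a column is its primary type, which is consistent with the example but is not stated explicitly.

```python
from collections import Counter
from typing import Dict, List, Tuple


def feature_count(feature_types: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Return (feature type, count, primary) rows sorted by feature type name."""
    count: Counter = Counter()
    primary: Counter = Counter()
    for types in feature_types.values():
        count.update(types)
        if types:
            # Assumption: the first listed feature type is the primary one.
            primary[types[0]] += 1
    return [(name, count[name], primary[name]) for name in sorted(count)]


feature_types = {
    "PassengerId": ["ordinal", "category"],
    "Survived": ["ordinal"],
    "Pclass": ["ordinal"],
    "Name": ["category"],
    "Sex": ["category"],
}
rows = feature_count(feature_types)
# rows == [("category", 3, 2), ("ordinal", 3, 3)]
```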

feature_plot() → DataFrame

For every column in the dataframe, generates a summary plot based on the most relevant feature type.

Returns: Dataframe with 2 columns: 1. Column - feature name 2. Plot - plot object
Return type: pandas.DataFrame

feature_stat() → DataFrame

Provides a DataFrame of summary statistics.

This returns feature stats on each column using FeatureType summary method.

Examples

>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df.ads.feature_stat().head()
        Column              Metric    Value
0  PassengerId               count  891.000
1  PassengerId                mean  446.000
2  PassengerId  standard deviation  257.354
3  PassengerId      sample minimum    1.000
4  PassengerId      lower quartile  223.500

Returns: Dataframe with 3 columns: name, metric, value
Return type: pandas.DataFrame

pearson() → DataFrame

Generate a Pearson correlation data frame for all continuous variable pairs.

Gives a warning for dropped non-numerical columns.

Returns

pandas.DataFrame
Pearson correlation data frame with the following 3 columns –
1. Column 1 (name of the first continuous column)
2. Column 2 (name of the second continuous column)
3. Value (correlation value)

Note

Pairs will be replicated. For example for variables x and y, we’d have (x,y), (y,x) both with same correlation value. We’ll also have (x,x) and (y,y) with value 1.0.
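
The long format with replicated pairs can be sketched in plain Python. The pearson function below is the standard sample correlation formula; pairwise is a hypothetical helper that emits one (Column 1, Column 2, Value) row for every ordered pair, including a column paired with itself. It illustrates the output shape and is not the ADS implementation.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def pairwise(columns: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Rows of (Column 1, Column 2, Value) for every ordered pair of columns."""
    return [
        (first, second, pearson(columns[first], columns[second]))
        for first in columns
        for second in columns
    ]
```

For two columns x and y this yields four rows: (x,x), (x,y), (y,x), and (y,y), with the two mixed pairs sharing one value.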

pearson_plot() → Axes

Generate a heatmap of the Pearson correlation for all continuous variable pairs.

Returns: Pearson correlation plot object that can be updated by the customer
Return type: Plot object

warning() → DataFrame

Generates a data frame that lists feature specific warnings.

Returns: The list of feature specific warnings.
Return type: pandas.DataFrame

Examples

>>> df.ads.warning()
    Column    Feature Type         Warning               Message       Metric    Value
--------------------------------------------------------------------------------------
0      Age      continuous           Zeros      Age has 38 zeros        Count       38
1      Age      continuous           Zeros   Age has 12.2% zeros   Percentage    12.2%
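
A warning handler of this kind can be pictured as a function that inspects a column and returns zero or more rows. The sketch below is a plain-Python stand-in for a zeros check that produces rows shaped like the Warning, Message, Metric, and Value columns above. The function name, the message wording, and the rule that any zero triggers the warning are assumptions.

```python
from typing import List, Sequence, Tuple


def zeros_warning(name: str, values: Sequence[float]) -> List[Tuple[str, str, str, str]]:
    """Return (Warning, Message, Metric, Value) rows when zeros are present."""
    zeros = sum(1 for value in values if value == 0)
    if zeros == 0:
        return []  # Nothing to report for this column.
    percentage = f"{100 * zeros / len(values):.1f}%"
    return [
        ("Zeros", f"{name} has {zeros} zeros", "Count", str(zeros)),
        ("Zeros", f"{name} has {percentage} zeros", "Percentage", percentage),
    ]
```

Returning one row per metric is what lets the same warning appear twice in the table, once as a count and once as a percentage.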

ads.feature_engineering.accessor.mixin.eda_mixin_series module

This exploratory data analysis (EDA) mixin is used in the ADS accessor for the Pandas Series. Its purpose-driven methods enable the data scientist to complete univariate analysis.

From the accessor we have access to the pandas object the user is interacting with as well as corresponding list of feature types.

class ads.feature_engineering.accessor.mixin.eda_mixin_series.EDAMixinSeries

Bases: object

feature_plot() → Axes

For the series generate a summary plot based on the most relevant feature type.

Returns: Plot object for the series based on the most relevant feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

feature_stat() → DataFrame

Provides a DataFrame of summary statistics.

This returns feature stats on series using FeatureType summary method.

Examples

>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df['Cabin'].ads.feature_stat()
    Metric  Value
0    count    891
1   unique    147
2  missing    687

Returns: Dataframe with 2 columns and rows for different metric values
Return type: pandas.DataFrame

warning() → DataFrame

Generates a data frame that lists feature specific warnings.

Returns: The list of feature specific warnings.
Return type: pandas.DataFrame

Examples

>>> df["Age"].ads.warning()
  Feature Type       Warning               Message         Metric      Value
 ---------------------------------------------------------------------------
0   continuous         Zeros      Age has 38 zeros          Count         38
1   continuous         Zeros   Age has 12.2% zeros     Percentage      12.2%

ads.feature_engineering.accessor.mixin.feature_types_mixin module

The module that represents the ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.

Classes

ADSFeatureTypesMixin
ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.

class ads.feature_engineering.accessor.mixin.feature_types_mixin.ADSFeatureTypesMixin

Bases: object

ADS Feature Types Mixin class that extends Pandas Series and DataFrame accessors.

warning_registered(cls) → pd.DataFrame: Lists registered warnings for registered feature types.

validator_registered(cls) → pd.DataFrame: Lists registered validators for registered feature types.

help(self, prop: str = None) → None: Help method that prints either a table of available properties or, given a property, returns its docstring.

help(prop: Optional[str] = None) → None

Help method that prints either a table of available properties or, given an individual property, returns its docstring.

Parameters: prop (str) – The name of the property.
Returns: Nothing.
Return type: None

validator_registered() → DataFrame

Lists registered validators for registered feature types.

Returns: The list of registered validators for registered feature types
Return type: pandas.DataFrame

Examples

>>> df.ads.validator_registered()
         Column     Feature Type        Validator                 Condition                    Handler
------------------------------------------------------------------------------------------------------
0   PhoneNumber    phone_number   is_phone_number                        ()            default_handler
1   PhoneNumber    phone_number   is_phone_number    {'country_code': '+7'}   specific_country_handler
2    CreditCard    credit_card     is_credit_card                        ()            default_handler

>>> df['PhoneNumber'].ads.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler

warning_registered() → DataFrame

Lists registered warnings for all registered feature types.

Returns: The list of registered warnings for registered feature types.
Return type: pandas.DataFrame

Examples

>>> df.ads.warning_registered()
       Column    Feature Type             Warning                    Handler
   -------------------------------------------------------------------------
   0      Age      continuous               zeros              zeros_handler
   1      Age      continuous    high_cardinality   high_cardinality_handler

>>> df["Age"].ads.warning_registered()
       Feature Type             Warning                    Handler
   ---------------------------------------------------------------
   0     continuous               zeros              zeros_handler
   1     continuous    high_cardinality   high_cardinality_handler

ads.feature_engineering.adsstring.common_regex_mixin module

class ads.feature_engineering.adsstring.common_regex_mixin.CommonRegexMixin

Bases: object

property address

property credit_card

property date

property email

property ip

property link

property phone_number_US

property price

redact(fields: Union[List[str], Dict[str, str]]) → str

Removes personal information from a string. For example, “Jane’s phone number is 123-456-7890” becomes “Jane’s phone number is [PHONE_NUMBER_US]”.

Parameters: fields ((list(str) | dict)) – either a list of fields to redact, e.g. [‘email’, ‘phone_number_US’], in which case each match is replaced with the field’s uppercase placeholder, such as [EMAIL] or [PHONE_NUMBER_US_WITH_EXT], or a dictionary where each key is a field to redact and each value is the replacement text, e.g. {‘email’: ‘HIDDEN_EMAIL’}.
Returns: redacted string
Return type: str

redact_map = {'address': '[ADDRESS]', 'address_with_zip': '[ADDRESS_WITH_ZIP]', 'credit_card': '[CREDIT_CARD]', 'date': '[DATE]', 'email': '[EMAIL]', 'ip': '[IP]', 'ipv6': '[IPV6]', 'link': '[LINK]', 'phone_number_US': '[PHONE_NUMBER_US]', 'phone_number_US_with_ext': '[PHONE_NUMBER_US_WITH_EXT]', 'po_box': '[PO_BOX]', 'price': '[PRICE]', 'ssn': '[SSN]', 'time': '[TIME]', 'zip_code': '[ZIP_CODE]'}
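
The two accepted forms of `fields` can be illustrated with a small regex-based sketch. The patterns below are deliberately simplified assumptions for this example and are not the patterns the mixin uses; the standalone `redact` function and `PATTERNS` table are likewise hypothetical.

```python
import re
from typing import Dict, List, Union

# Simplified patterns, assumed for this sketch only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone_number_US": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text: str, fields: Union[List[str], Dict[str, str]]) -> str:
    """Replace every match of the requested fields with a placeholder."""
    if isinstance(fields, dict):
        # Dictionary form: the caller chooses the replacement text.
        replacements = dict(fields)
    else:
        # List form: use the uppercase field name in square brackets.
        replacements = {name: f"[{name.upper()}]" for name in fields}
    for name, replacement in replacements.items():
        # A function replacement avoids backslash interpretation in the text.
        text = PATTERNS[name].sub(lambda _match, r=replacement: r, text)
    return text


phone_redacted = redact("Jane's phone number is 123-456-7890.", ["phone_number_US"])
email_redacted = redact("Write to john.smith@gmail.com today", {"email": "HIDDEN_EMAIL"})
```

With the list form the placeholder matches the corresponding `redact_map` entry, and with the dictionary form the supplied text is used verbatim.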

property ssn

property time

property zip_code

ads.feature_engineering.adsstring.oci_language module

class ads.feature_engineering.adsstring.oci_language.OCILanguage(auth=None)

Bases: object

property absa: DataFrame

property key_phrase: DataFrame

property language_dominant: DataFrame

property ner: DataFrame

property text_classification: DataFrame

ads.feature_engineering.adsstring.string module

class ads.feature_engineering.adsstring.string.ADSString(text: str, language='english')

Bases: str, CommonRegexMixin

Defines an enhanced string class for the purpose of performing NLP tasks. Its functionality can be extended by registering plugins.

plugins

list of plugins that add functionalities to the class.

Type: List

string

plain string

Type: str

Example

>>> from ads.feature_engineering.adsstring.string import ADSString
>>> from ads.feature_engineering.adsstring.oci_language import OCILanguage
>>> ADSString.nlp_backend('nltk')
>>> s = ADSString("Walking my dog on a breezy day is the best.")
>>> s.lower() # regular string methods still work
>>> s.replace("a", "e")
>>> s.nouns
>>> s.parts_of_speech
>>> s = ADSString("get in touch with my associate at john.smith@gmail.com to schedule")
>>> s.email
>>> ADSString.plugin_register(OCILanguage)
>>> s = ADSString("This movie is awesome.")
>>> s.absa

Initializes the class and registers plugins.

Parameters

text (str) – input text
language (str, optional) – language of the text, by default “english”.

Raises

TypeError – input text is not a string.

capitalize()

Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower case.

casefold(): Return a version of the string suitable for caseless comparisons.

center(width, fillchar=' ', /)

Return a centered string of length width.

Padding is done using the specified fill character (default is a space).

count(sub[, start[, end]]) → int: Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

encode(encoding='utf-8', errors='strict')

Encode the string using the codec registered for encoding.

encoding: The encoding in which to encode the string.
errors: The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.

endswith(suffix[, start[, end]]) → bool: Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

expandtabs(tabsize=8)

Return a copy where all tab characters are expanded using spaces.

If tabsize is not given, a tab size of 8 characters is assumed.

find(sub[, start[, end]]) → int

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

format(*args, **kwargs) → str: Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).

format_map(mapping) → str: Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).

help() → None

List available properties.

Parameters: plugin (Any) – registered plugin
Return type: None

index(sub[, start[, end]]) → int

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

isalnum()

Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.

isalpha()

Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.

isascii()

Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.

isdecimal()

Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.

isdigit()

Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there is at least one character in the string.

isidentifier()

Return True if the string is a valid Python identifier, False otherwise.

Use keyword.iskeyword() to test for reserved identifiers such as “def” and “class”.

islower()

Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.

isnumeric()

Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at least one character in the string.

isprintable()

Return True if the string is printable, False otherwise.

A string is printable if all of its characters are considered printable in repr() or if it is empty.

isspace()

Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.

istitle()

Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.

isupper()

Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.

join(iterable, /)

Concatenate any number of strings.

The string whose method is called is inserted in between each given string. The result is returned as a new string.

Example: ‘.’.join([‘ab’, ‘pq’, ‘rs’]) -> ‘ab.pq.rs’

language_model_cache = {}

ljust(width, fillchar=' ', /)

Return a left-justified string of length width.

Padding is done using the specified fill character (default is a space).

lower(): Return a copy of the string converted to lowercase.

lstrip(chars=None, /)

Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.

maketrans(y=None, z=None, /)

Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.

nlp_backend() → None

Set backend for extracting NLP related properties.

Parameters

backend (str, optional) – name of backend, by default ‘nltk’.

Raises

ModuleNotFoundError – module corresponding to backend is not found.
ValueError – input backend is invalid.

Return type

None

partition(sep, /)

Partition the string into three parts using the given separator.

This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing the original string and two empty strings.

plugin_clear() → None: Clears plugins.

plugin_list() → None: List registered plugins.

plugin_register() → None

Register a plugin

Parameters: plugin (Any) – plugin to register
Return type: None

plugins = []

redact(fields: Union[List[str], Dict[str, str]]) → str

Removes personal information from a string. For example, “Jane’s phone number is 123-456-7890” becomes “Jane’s phone number is [PHONE_NUMBER_US]”.

Parameters: fields ((list(str) | dict)) – either a list of fields to redact, e.g. [‘email’, ‘phone_number_US’], in which case each match is replaced with the field’s uppercase placeholder, such as [EMAIL] or [PHONE_NUMBER_US_WITH_EXT], or a dictionary where each key is a field to redact and each value is the replacement text, e.g. {‘email’: ‘HIDDEN_EMAIL’}.
Returns: redacted string
Return type: str

replace(old, new, count=-1, /)

Return a copy with all occurrences of substring old replaced by new.

count
Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.

If the optional argument count is given, only the first count occurrences are replaced.

rfind(sub[, start[, end]]) → int

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

rindex(sub[, start[, end]]) → int

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

rjust(width, fillchar=' ', /)

Return a right-justified string of length width.

Padding is done using the specified fill character (default is a space).

rpartition(sep, /)

Partition the string into three parts using the given separator.

This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing two empty strings and the original string.

rsplit(sep=None, maxsplit=-1)

Return a list of the words in the string, using sep as the delimiter string.

sep
The delimiter according to which the string is split. None (the default value) means split on any whitespace and discard empty strings from the result.

maxsplit
Maximum number of splits to do. -1 (the default value) means no limit.

Splits are done starting at the end of the string and working to the front.

rstrip(chars=None, /)

Return a copy of the string with trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

split(sep=None, maxsplit=-1)

Return a list of the words in the string, using sep as the delimiter string.

sep: The delimiter according to which the string is split. None (the default value) means split on any whitespace and discard empty strings from the result.
maxsplit: Maximum number of splits to do. -1 (the default value) means no limit.

splitlines(keepends=False)

Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and true.

startswith(prefix[, start[, end]]) → bool: Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

property string

strip(chars=None, /)

Return a copy of the string with leading and trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

swapcase(): Convert uppercase characters to lowercase and lowercase characters to uppercase.

title()

Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining cased characters have lower case.

translate(table, /)

Replace each character in the string using the given translation table.

table
Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.

The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.

upper(): Return a copy of the string converted to uppercase.

zfill(width, /)

Pad a numeric string with zeros on the left, to fill a field of the given width.

The string is never truncated.

ads.feature_engineering.adsstring.string.to_adsstring(func: Callable) → Callable

Decorator that converts output of a function to ADSString if it returns a string.

Parameters: func (Callable) – function to decorate
Returns: decorated function
Return type: Callable
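
The idea behind such a decorator can be sketched as follows. `FancyString` is a hypothetical stand-in for an enhanced string class, and `to_fancy_string` is an illustrative re-creation of the pattern, not the ADS source.

```python
import functools
from typing import Any, Callable


class FancyString(str):
    """Hypothetical stand-in for an enhanced string class."""


def to_fancy_string(func: Callable) -> Callable:
    """Convert the wrapped function's result to FancyString if it is a str."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        # Only plain str results are converted; other types pass through.
        if isinstance(result, str) and not isinstance(result, FancyString):
            return FancyString(result)
        return result

    return wrapper


@to_fancy_string
def shout(text: str) -> str:
    return text.upper()


@to_fancy_string
def length(text: str) -> int:
    return len(text)
```

Here `shout("hi")` comes back as a `FancyString`, while `length("hi")` is left as a plain `int`.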

ads.feature_engineering.adsstring.string.wrap_output_string(decorator: Callable) → Callable

Class decorator that applies a decorator to all methods of a class.

Parameters: decorator (Callable) – decorator to apply
Returns: class decorator
Return type: Callable
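
A class decorator of this kind can be sketched as below. The names `apply_to_methods`, `exclaim`, and `Greeter` are hypothetical, and the sketch only wraps plain functions defined directly on the class, which may differ from how the ADS helper selects methods.

```python
import functools
import inspect
from typing import Any, Callable


def exclaim(func: Callable) -> Callable:
    """Example method decorator: append '!' to string results."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        return result + "!" if isinstance(result, str) else result

    return wrapper


def apply_to_methods(decorator: Callable) -> Callable:
    """Build a class decorator that applies `decorator` to each method."""

    def class_decorator(cls: type) -> type:
        for name, attr in list(vars(cls).items()):
            # Skip dunder methods and anything that is not a plain function.
            if inspect.isfunction(attr) and not name.startswith("__"):
                setattr(cls, name, decorator(attr))
        return cls

    return class_decorator


@apply_to_methods(exclaim)
class Greeter:
    def __init__(self, name: str) -> None:
        self.name = name

    def hello(self) -> str:
        return f"Hello, {self.name}"

    def name_length(self) -> int:
        return len(self.name)
```

After decoration, `Greeter("Ada").hello()` returns the decorated string and `name_length()` still returns an integer.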

ads.feature_engineering.feature_type.address module

The module that represents an Address feature type.

Classes:

Address: The Address feature type.

class ads.feature_engineering.feature_type.address.Address

Bases: String

Type representing address.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the location of the given address on a map based on zip code.

Example

>>> from ads.feature_engineering.feature_type.address import Address
>>> import pandas as pd
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
                        '1 Berkeley Street, Boston, MA 67891',
                        '54305 Oxford Street, Seattle, WA 95132',
                        ''])
>>> Address.validator.is_address(address)
0     True
1     True
2     True
3    False
dtype: bool

description = 'Type representing address.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 1
    unique: 3
values: Address

Returns: Domain based on the Address feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the location of the given address on a map based on zip code.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_plot()

Returns: Plot object for the series based on the Address feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      3
2  missing      1

Returns: Summary statistics of the Series provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.address.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pd.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series

ads.feature_engineering.feature_type.base module

class ads.feature_engineering.feature_type.base.FeatureBaseType(classname, bases, dictionary)

Bases: type

The helper metaclass that extends the functionality of the FeatureType class.

class ads.feature_engineering.feature_type.base.FeatureBaseTypeMeta(classname, bases, dictionary)

Bases: FeatureBaseType, ABCMeta

The class to provide compatibility between ABC and FeatureBaseType metaclass.

class ads.feature_engineering.feature_type.base.FeatureType

Bases: ABC

Abstract base class for feature types. Default class attributes include name and description. The name is generated automatically from the class name using camel-case to snake-case conversion unless it is specified explicitly.

description = 'Base feature type.'

name = 'feature_type'

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
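
The automatic naming described for FeatureType can be sketched with a small metaclass. `camel_to_snake`, `AutoNameMeta`, and the example classes are hypothetical and only illustrate the technique; the real metaclass may differ in detail.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase identifier to snake_case."""
    # Split before a capitalized word, e.g. "NewType" -> "New_Type".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split a lowercase letter or digit from a following capital, then lowercase.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


class AutoNameMeta(type):
    """Fill in a `name` attribute from the class name unless one is given."""

    def __new__(mcls, classname, bases, namespace):
        if "name" not in namespace:
            namespace["name"] = camel_to_snake(classname)
        return super().__new__(mcls, classname, bases, namespace)


class BaseType(metaclass=AutoNameMeta):
    description = "Base feature type."


class NewType(BaseType):
    description = "My personal type."


class Renamed(BaseType):
    name = "custom_name"  # an explicit name overrides the generated one
```

With this sketch, `NewType.name` is `"new_type"`, while `Renamed.name` keeps the explicit value.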

class ads.feature_engineering.feature_type.base.Name: Bases: object

class ads.feature_engineering.feature_type.base.Tag(name: str)

Bases: object

Class for free form tags. Name must be specified.

Initialize a tag instance.

Parameters: name (str) – The name of the tag.

ads.feature_engineering.feature_type.boolean module

The module that represents a Boolean feature type.

Classes:

Boolean: The feature type that represents binary values True/False.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.boolean.Boolean

Bases: FeatureType

Type representing binary values True/False.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Show the counts of observations in True/False using bars.

Examples

>>> from ads.feature_engineering.feature_type.boolean import Boolean
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> Boolean.validator.is_boolean(s)
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool

description = 'Type representing binary values True/False.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_domain()
constraints:
- expression: $x in [True, False]
    language: python
stats:
    count: 6
    missing: 2
    unique: 2
values: Boolean

Returns: Domain based on the Boolean feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the counts of observations in True/False using bars.

Parameters: x (pandas.Series) – The feature being evaluated.
Returns: Plot object for the series based on the Boolean feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_plot()

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters: x (pandas.Series) – The feature being evaluated.
Returns: Summary statistics of the Series or Dataframe provided.
Return type: pandas.DataFrame

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.boolean.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series

ads.feature_engineering.feature_type.category module

The module that represents a Category feature type.

Classes:

Category: The Category feature type.

class ads.feature_engineering.feature_type.category.Category

Bases: FeatureType

Type representing discrete unordered values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing discrete unordered values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_domain()
constraints:
- expression: $x in ['S', 'C', 'Q', '']
    language: python
stats:
    count: 22
    missing: 3
    unique: 3
values: Category

Returns: Domain based on the Category feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the counts of observations in each categorical bin using bar chart.

Parameters: x (pandas.Series) – The feature being evaluated.
Returns: Plot object for the series based on the Category feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_plot()

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there are any.

Parameters: x (pandas.Series) – The feature being evaluated.
Returns: Summary statistics of the Series or Dataframe provided.
Return type: pandas.DataFrame

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3
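
The count, unique, and missing figures shown in the examples can be reproduced with a short pure-Python sketch. It assumes, based only on the example outputs in this reference, that None, NaN, and empty strings are all treated as missing and that unique counts distinct non-missing values. `is_missing` and `basic_feature_stat` are hypothetical helpers, not part of ADS.

```python
import math
from typing import Any, Dict, Iterable


def is_missing(value: Any) -> bool:
    """Treat None, NaN, and the empty string as missing (an assumption)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value == ""


def basic_feature_stat(values: Iterable[Any]) -> Dict[str, int]:
    """Return total count, distinct non-missing values, and missing count."""
    values = list(values)
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),
        "unique": len(set(present)),
        "missing": len(values) - len(present),
    }


category_values = ["S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", "S", "S",
                   "S", "S", "S", "Q", "S", "S", "", float("nan"), None]
boolean_values = [True, False, True, False, float("nan"), None]

category_stat = basic_feature_stat(category_values)
boolean_stat = basic_feature_stat(boolean_values)
```

Under these assumptions the sketch yields the same three numbers as the Category and Boolean examples.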

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.constant module

The module that represents a Constant feature type.

Classes:

Constant: The Constant feature type.

class ads.feature_engineering.feature_type.constant.Constant

Bases: FeatureType

Type representing constant values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in bars.

description = 'Type representing constant values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 5
    unique: 1
values: Constant

Returns: Domain based on the Constant feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the counts of observations in bars.

Parameters: x (pandas.Series) – The feature being shown.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_plot()

Returns: Plot object for the series based on the Constant feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters: x (pandas.Series) – The feature being evaluated.
Returns: Summary statistics of the Series provided.
Return type: pandas.DataFrame

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_stat()
   Metric  Value
0   count      5
1  unique      1

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.continuous module

The module that represents a Continuous feature type.

Classes:

Continuous: The Continuous feature type.

class ads.feature_engineering.feature_type.continuous.Continuous

Bases: FeatureType

Type representing continuous values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using box plot.

description = 'Type representing continuous values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_domain()
constraints: []
stats:
    count: 10.0
    lower quartile: 3.058
    mean: 4.959
    median: 3.81
    missing: 2.0
    sample maximum: 13.32
    sample minimum: 2.25
    skew: 2.175
    standard deviation: 3.62
    upper quartile: 4.908
values: Continuous

Returns: Domain based on the Continuous feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datasets using box plot.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_plot()

Returns: Plot object for the series based on the Continuous feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, skew and missing(count).

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_stat()
               Metric   Value
0               count  10.000
1                mean   4.959
2  standard deviation   3.620
3      sample minimum   2.250
4      lower quartile   3.058
5              median   3.810
6      upper quartile   4.908
7      sample maximum  13.320
8                skew   2.175
9             missing   2.000

Returns: Summary statistics of the Series or Dataframe provided.
Return type: pandas.DataFrame
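
Most of the figures in the example above can be checked with the standard library alone. The sketch below assumes that missing values are excluded before computing the statistics, that the standard deviation is the sample (n-1) form, and that quartiles use linear interpolation over the sorted data; skew is left out. `continuous_stat` is a hypothetical helper, not the ADS implementation.

```python
import math
import statistics
from typing import Dict, Iterable, Optional


def continuous_stat(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """Summarize numeric data, ignoring None and NaN for the statistics."""
    values = list(values)
    present = sorted(v for v in values if v is not None and not math.isnan(v))
    # The "inclusive" method interpolates linearly between sorted data points.
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(present),
        "standard deviation": statistics.stdev(present),  # sample (n-1) form
        "sample minimum": present[0],
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": present[-1],
        "missing": len(values) - len(present),
    }


data = [13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, float("nan"), None]
stats = continuous_stat(data)
```

Rounded to three decimals, the values in `stats` agree with the corresponding rows of the example table.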

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.creditcard module

The module that represents a CreditCard feature type.

Classes:

CreditCard: The CreditCard feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.
_luhn_checksum(card_number: str) -> float: Implements Luhn algorithm to validate a credit card number.
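
For readers unfamiliar with the Luhn algorithm, a minimal standalone sketch follows. `luhn_is_valid` is a hypothetical helper that returns a boolean; it is not the module's private `_luhn_checksum` function, whose exact return value is not documented here.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    # Allow spaces and hyphens as visual separators.
    cleaned = card_number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


valid = luhn_is_valid("4532640527811543")
invalid = luhn_is_valid("4532640527811544")
```

The first number is taken from the CreditCard examples in this reference and passes the check; changing its last digit makes it fail.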

class ads.feature_engineering.feature_type.creditcard.CreditCard

Bases: String

Type representing credit card numbers.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in each credit card type using bar chart.

Examples

>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> import pandas as pd
>>> s = pd.Series(["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"], name='credit_card')
>>> s.ads.feature_type = ['credit_card']
>>> CreditCard.validator.is_credit_card(s)
0     True
1    False
2     True
3     True
4     True
5     True
Name: credit_card, dtype: bool

description = 'Type representing credit card numbers.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_domain()
constraints: []
stats:
    count: 16
    count_Amex: 5
    count_Diners Club: 2
    count_MasterCard: 3
    count_Visa: 5
    count_missing: 1
    missing: 1
    unique: 15
values: CreditCard

Returns: Domain based on the CreditCard feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the counts of observations in each credit card type using a bar chart.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_plot()

Returns: Plot object for the series based on the CreditCard feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series)

Generates feature statistics.

Feature statistics include (total)count, unique(count), missing(count) and the count of each credit card type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_stat()
   Metric             Value
0  count              16
1  unique             15
2  missing            1
3  count_Amex         5
4  count_Visa         5
5  count_MasterCard   3
6  count_Diners Club  2
7  count_missing      1

Returns: Summary statistics of the Series or DataFrame provided.
Return type: pandas.DataFrame
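
As a rough illustration of how per-type counts like those above can be derived, the sketch below classifies card numbers by their leading digits and tallies the result. The prefix rules are deliberately simplified assumptions (Visa starts with 4, Amex with 34 or 37, MasterCard with 51 to 55) and are not the rules used by the library, whose example output also reports a Diners Club category. The names `card_type` and `card_stats` are hypothetical.

```python
from collections import Counter

def card_type(number):
    """Classify a card number by prefix (simplified, assumed rules)."""
    if number is None:
        return "missing"
    if number.startswith("4"):
        return "Visa"
    if number[:2] in ("34", "37"):
        return "Amex"
    if number[:2] in ("51", "52", "53", "54", "55"):
        return "MasterCard"
    return "Other"

def card_stats(numbers):
    """Return count, unique, missing, and one count_<type> entry per card type."""
    present = [n for n in numbers if n is not None]
    stats = {
        "count": len(numbers),
        "unique": len(set(present)),
        "missing": len(numbers) - len(present),
    }
    for name, n in Counter(card_type(x) for x in numbers).items():
        stats[f"count_{name}"] = n
    return stats
```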

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.creditcard.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series

ads.feature_engineering.feature_type.datetime module

The module that represents a DateTime feature type.

Classes:

DateTime: The DateTime feature type.

class ads.feature_engineering.feature_type.datetime.DateTime

Bases: FeatureType

Type representing date and/or time.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datetime datasets using histograms.

Example

>>> from ads.feature_engineering.feature_type.datetime import DateTime
>>> import pandas as pd
>>> s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> DateTime.validator.is_datetime(s)
0     True
1     True
2    False
3     True
Name: datetime, dtype: bool

description = 'Type representing date and/or time.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 8
    missing: 3
    sample maximum: April/15/11
    sample minimum: 3/11/2000
values: DateTime

Returns: Domain based on the DateTime feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datetime datasets using histograms.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_plot()

Returns: Plot object for the series based on the DateTime feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, sample maximum, sample minimum, and missing(count) if there are any.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_stat()
   Metric          Value
0  count           8
1  sample maximum  April/15/11
2  sample minimum  3/11/2000
3  missing         3

Returns: Summary statistics of the Series or DataFrame provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.datetime.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series
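
A handler of this kind returns one boolean per element. The sketch below shows the per-element idea with the standard library only. It is an assumption-laden illustration: the function name `looks_like_datetime` and the accepted formats are chosen for this example, and the library may parse dates differently and accept more formats.

```python
from datetime import datetime

# Formats accepted by this sketch only (an assumption, not the library's list).
FORMATS = ("%m/%d/%y", "%m/%d/%Y")

def looks_like_datetime(value, formats=FORMATS):
    """Return True if value is a string that parses with one of the formats."""
    if not isinstance(value, str) or not value.strip():
        return False  # None, NaN and empty strings are treated as invalid
    for fmt in formats:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue  # try the next format
    return False

values = ["12/12/12", "12/12/13", None, "12/12/14"]
flags = [looks_like_datetime(v) for v in values]
```

Applied to a whole column, the same function could be mapped over the values to obtain a boolean mask.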

ads.feature_engineering.feature_type.discrete module

The module that represents a Discrete feature type.

Classes:

Discrete: The Discrete feature type.

class ads.feature_engineering.feature_type.discrete.Discrete

Bases: FeatureType

Type representing discrete values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using a box plot.

description = 'Type representing discrete values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_domain()
constraints: []
stats:
    count: 4
    unique: 4
values: Discrete

Returns: Domain based on the Discrete feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datasets using a box plot.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_plot()

Returns: Plot object for the series based on the Discrete feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_stat()
            discrete
count   4
unique  4

Returns: Summary statistics of the Series provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.document module

The module that represents a Document feature type.

Classes:

Document: The Document feature type.

class ads.feature_engineering.feature_type.document.Document

Bases: FeatureType

Type representing document values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

description = 'Type representing document values.'

classmethod feature_domain()

Returns: Nothing.
Return type: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.gis module

The module that represents a GIS feature type.

Classes:

GIS: The GIS feature type.

class ads.feature_engineering.feature_type.gis.GIS

Bases: FeatureType

Type representing geographic information.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.gis import GIS
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='gis')
>>> s.ads.feature_type = ['gis']
>>> GIS.validator.is_gis(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: gis, dtype: bool

description = 'Type representing geographic information.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: GIS

Returns: Domain based on the GIS feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_plot()

Returns: Plot object for the series based on the GIS feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_stat()
        gis
count   13
unique  10
missing 3

Returns: Summary statistics of the Series provided.
Return type: pandas.DataFrame
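
The three statistics above can be illustrated without pandas. The sketch below assumes, based on the example output, that `None`, NaN and empty strings all count as missing and that `unique` is computed over the remaining values. The helper names `is_missing` and `basic_stats` are hypothetical.

```python
import math

def is_missing(value):
    """Treat None, NaN and blank strings as missing (assumed from the example)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""

def basic_stats(values):
    """Return total count, number of distinct non-missing values, and missing count."""
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),
        "unique": len(set(present)),
        "missing": len(values) - len(present),
    }
```

With this convention, `count` always equals the number of non-missing values plus `missing`.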

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.gis.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series

ads.feature_engineering.feature_type.integer module

The module that represents an Integer feature type.

Classes:

Integer: The Integer feature type.

class ads.feature_engineering.feature_type.integer.Integer

Bases: FeatureType

Type representing integer values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using a box plot.

description = 'Type representing integer values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='integer')
>>> s.ads.feature_type = ['integer']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    freq: 2
    missing: 2
    top: true
    unique: 2
values: Integer

Returns: Domain based on the Integer feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datasets using a box plot.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_plot()

Returns: Plot object for the series based on the Integer feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum and missing(count) if there are any.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_stat()
   Metric              Value
0  count               7
1  mean                1
2  standard deviation  1
3  sample minimum      0
4  lower quartile      1
5  median              1
6  upper quartile      2
7  sample maximum      4
8  missing             1

Returns: Summary statistics of the Series or DataFrame provided.
Return type: pandas.DataFrame
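
For readers who want to see how these metrics relate to the raw values, the following sketch computes them with the standard `statistics` module. It is an illustration under assumptions: the function name `numeric_stats` is hypothetical, quartiles use the inclusive interpolation method, and no rounding is applied, so the values will not match the whole numbers shown in the example table above.

```python
import math
import statistics

def numeric_stats(values):
    """Compute summary metrics over the non-missing values."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    # quantiles(n=4) returns the three cut points: lower quartile, median, upper quartile.
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(present),
        "standard deviation": statistics.stdev(present),  # sample standard deviation
        "sample minimum": min(present),
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": max(present),
        "missing": len(values) - len(present),
    }

stats = numeric_stats([1, 0, 1, 2, 3, 4, float("nan")])
```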

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address module

The module that represents an IpAddress feature type.

Classes:

IpAddress: The IpAddress feature type.

class ads.feature_engineering.feature_type.ip_address.IpAddress

Bases: FeatureType

Type representing IP Address.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address import IpAddress
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> IpAddress.validator.is_ip_address(s)
0     True
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 3
values: IpAddress

Returns: Domain based on the IpAddress feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  3
2       missing 2

Returns: Summary statistics of the Series provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series

ads.feature_engineering.feature_type.ip_address_v4 module

The module that represents an IpAddressV4 feature type.

Classes:

IpAddressV4: The IpAddressV4 feature type.

class ads.feature_engineering.feature_type.ip_address_v4.IpAddressV4

Bases: FeatureType

Type representing IP Address V4.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v4 import IpAddressV4
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> IpAddressV4.validator.is_ip_address_v4(s)
0     True
1    False
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address V4.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.nan, None], name='ip_address_v4')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 4
values: IpAddressV4

Returns: Domain based on the IpAddressV4 feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  4
2       missing 2

Returns: Summary statistics of the Series provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address_v4.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series

ads.feature_engineering.feature_type.ip_address_v6 module

The module that represents an IpAddressV6 feature type.

Classes:

IpAddressV6: The IpAddressV6 feature type.

class ads.feature_engineering.feature_type.ip_address_v6.IpAddressV6

Bases: FeatureType

Type representing IP Address V6.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v6 import IpAddressV6
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> IpAddressV6.validator.is_ip_address_v6(s)
0    False
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address V6.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address_v6')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 2
values: IpAddressV6

Returns: Domain based on the IpAddressV6 feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  2
2       missing 2

Returns: Summary statistics of the Series provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address_v6.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series

ads.feature_engineering.feature_type.lat_long module

The module that represents a LatLong feature type.

Classes:

LatLong: The LatLong feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.lat_long.LatLong

Bases: String

Type representing longitude and latitude.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.lat_long import LatLong
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='latlong')
>>> s.ads.feature_type = ['lat_long']
>>> LatLong.validator.is_lat_long(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: latlong, dtype: bool

description = 'Type representing longitude and latitute.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> latlong_series = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: LatLong

Returns: Domain based on the LatLong feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.nan,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_plot()

Returns: Plot object for the series based on the LatLong feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there are any.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.nan,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_stat()
    Metric  Value
0       count   13
1       unique  10
2       missing 3

Returns: Summary statistics of the Series or DataFrame provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.lat_long.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series
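
The kind of check such a handler performs can be sketched for a single value as follows. The pattern and the range limits are assumptions made for illustration (latitude first, then longitude, optionally wrapped in parentheses, as in the examples above), and the name `looks_like_lat_long` is hypothetical. The library's own pattern is not shown in this reference.

```python
import re

# Two signed decimal numbers separated by a comma, with optional parentheses.
_LAT_LONG = re.compile(
    r"^\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?$"
)

def looks_like_lat_long(value):
    """Return True for strings such as '54.56,95.23' or '(51.8, 175.9)'."""
    if not isinstance(value, str):
        return False  # None and NaN are invalid
    text = value.strip()
    # Parentheses must be either both present or both absent.
    if text.startswith("(") != text.endswith(")"):
        return False
    match = _LAT_LONG.match(text)
    if match is None:
        return False
    lat, lon = float(match.group(1)), float(match.group(2))
    # Latitude lies in [-90, 90] and longitude in [-180, 180].
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```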

ads.feature_engineering.feature_type.object module

The module that represents an Object feature type.

Classes:

Object: The Object feature type.

class ads.feature_engineering.feature_type.object.Object

Bases: FeatureType

Type representing object.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

description = 'Type representing object.'

classmethod feature_domain()

Returns: Nothing.
Return type: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ordinal module

The module that represents an Ordinal feature type.

Classes:

Ordinal: The Ordinal feature type.

class ads.feature_engineering.feature_type.ordinal.Ordinal

Bases: FeatureType

Type representing ordered values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in each categorical bin using a bar chart.

description = 'Type representing ordered values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_domain()
constraints:
- expression: $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    language: python
stats:
    count: 10
    missing: 1
    unique: 9
values: Ordinal

Returns: Domain based on the Ordinal feature type.
Return type: ads.feature_engineering.schema.Domain
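
The constraint expression in the example lists the distinct non-missing values. A minimal sketch of how such an expression could be assembled is given below. The helper names are hypothetical, and this is not the library's implementation. Values are converted to float because a pandas Series of integers that contains NaN is stored as float64, which is why the example shows 1.0 rather than 1.

```python
import math

def allowed_values(values):
    """Return the sorted distinct non-missing values as floats."""
    return sorted({
        float(v) for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    })

def ordinal_constraint(values):
    """Build an expression string of the form '$x in [...]'."""
    return f"$x in {allowed_values(values)}"

def satisfies(candidate, values):
    """Check a candidate against the allowed values, mirroring the expression."""
    return float(candidate) in allowed_values(values)
```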

static feature_plot(x: Series) → Axes

Shows the counts of observations in each categorical bin using a bar chart.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_plot()

Returns: The bar chart plot object for the series based on the Ordinal feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count (reported only if any values are missing).

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_stat()
    Metric  Value
0       count   10
1       unique  9
2       missing 1

Returns: Summary statistics of the Series or Dataframe provided.
Return type: pandas.DataFrame
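The three statistics in the example above can be reproduced with plain pandas. The following is a minimal sketch, not the ADS implementation; the helper name `basic_stat` is hypothetical and only illustrates how `count`, `unique`, and `missing` relate to the input series.

```python
import numpy as np
import pandas as pd


def basic_stat(x: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: count, unique, and missing for a series."""
    missing = int(x.isna().sum())
    rows = [
        ("count", len(x)),             # total rows, including missing ones
        ("unique", int(x.nunique())),  # distinct non-missing values
    ]
    if missing > 0:                    # "missing" is reported only if present
        rows.append(("missing", missing))
    return pd.DataFrame(rows, columns=["Metric", "Value"])


x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name="ordinal")
stats = basic_stat(x)
print(stats)
```

Note that `count` includes the missing entry, which is why it is 10 while `unique` is 9.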

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.phone_number module

The module that represents a Phone Number feature type.

Classes:

PhoneNumber: The Phone Number feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.phone_number.PhoneNumber

Bases: String

Type representing phone numbers.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

Examples

>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> import pandas as pd
>>> s = pd.Series([None, "1-640-124-5367", "1-573-916-4412"])
>>> PhoneNumber.validator.is_phone_number(s)
0    False
1     True
2     True
dtype: bool

description = 'Type representing phone numbers.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.nan, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 7
    missing: 4
    unique: 2
values: PhoneNumber

Returns: Domain based on the PhoneNumber feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count (reported only if any values are missing).

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.nan, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_stat()
    Metric  Value
1       count   7
2       unique  2
3       missing 4

Returns: Summary statistics of the Series or Dataframe provided.
Return type: pandas.DataFrame
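The documented output reports four missing values for a series that contains one empty string, two NaN values, and one None. That suggests empty strings are counted as missing. The sketch below reproduces the numbers under that assumption; it is an inference from the example output, not a description of the ADS internals.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    ["2068866666", "6508866666", "2068866666", "", np.nan, np.nan, None],
    name="phone",
)

# Assumption: empty strings are treated like NaN/None, which is what
# the documented output (missing: 4) suggests.
cleaned = s.mask(s == "")

count = len(s)                       # every row, missing or not
missing = int(cleaned.isna().sum())  # '', NaN, NaN, None
unique = int(cleaned.nunique())      # distinct non-missing values
print(count, missing, unique)
```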

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.phone_number.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pandas.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pandas.Series
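A handler of this shape takes a series and returns a boolean series of the same length. The sketch below shows one way such a handler could look. The regular expression and the function name `is_phone_number_sketch` are assumptions made for illustration; the pattern that ADS actually applies may differ.

```python
import re

import pandas as pd

# Illustrative pattern only: an optional "1" or "+1" prefix followed by
# 3-3-4 digits, with optional "-", "." or space separators.
PHONE_PATTERN = re.compile(r"(\+?1[-. ]?)?\d{3}[-. ]?\d{3}[-. ]?\d{4}")


def is_phone_number_sketch(data: pd.Series) -> pd.Series:
    """Hypothetical handler: True where the value looks like a phone number."""
    return data.apply(
        lambda v: isinstance(v, str) and PHONE_PATTERN.fullmatch(v) is not None
    )


s = pd.Series([None, "1-640-124-5367", "1-573-916-4412", "12345"])
result = is_phone_number_sketch(s)
print(result.tolist())
```

Non-string entries such as None are rejected before the pattern is applied, which matches the False reported for None in the class-level example.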

ads.feature_engineering.feature_type.string module

The module that represents a String feature type.

Classes:

String: The feature type that represents string values.

class ads.feature_engineering.feature_type.string.String

Bases: FeatureType

Type representing string values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using wordcloud.

Example

>>> from ads.feature_engineering.feature_type.string import String
>>> import pandas as pd
>>> s = pd.Series(["Hello", "world", None], name='string')
>>> String.validator.is_string(s)
0     True
1     True
2    False
Name: string, dtype: bool

description = 'Type representing string values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
...         'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_domain()
constraints: []
stats:
    count: 22
    missing: 3
    unique: 3
values: String

Returns: Domain based on the String feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datasets using wordcloud.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
...         'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_plot()

Returns: Plot object for the series based on the String feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count (reported only if any values are missing).

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
...         'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_stat()
    Metric  Value
0       count   22
1       unique  3
2       missing 3

Returns: Summary statistics of the Series or Dataframe provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.string.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pd.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pd.Series

ads.feature_engineering.feature_type.text module

The module that represents a Text feature type.

Classes:

Text: The Text feature type.

class ads.feature_engineering.feature_type.text.Text

Bases: String

Type representing text values.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using wordcloud.

description = 'Type representing text values.'

classmethod feature_domain()

Returns: Nothing.
Return type: None

static feature_plot(x: Series) → Axes

Shows distributions of datasets using wordcloud.

Examples

>>> text = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
...         'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='text')
>>> text.ads.feature_type = ['text']
>>> text.ads.feature_plot()

Returns: Plot object for the series based on the Text feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.unknown module

The module that represents an Unknown feature type.

Classes:

Unknown: The Unknown feature type.

class ads.feature_engineering.feature_type.unknown.Unknown

Bases: FeatureType

Type representing third-party dtypes.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

description = 'Type representing unknown type.'

classmethod feature_domain()

Returns: Nothing.
Return type: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.zip_code module

The module that represents a ZipCode feature type.

Classes:

ZipCode: The ZipCode feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.zip_code.ZipCode

Bases: String

Type representing postal code.

description

The feature type description.

Type: str

name

The feature type name.

Type: str

warning

Provides functionality to register warnings and invoke them.

Type: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the geographic distribution based on the location of each zip code.

Example

>>> from ads.feature_engineering.feature_type.zip_code import ZipCode
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["94065", "90210", np.nan, None], name='zipcode')
>>> ZipCode.validator.is_zip_code(s)
0     True
1     True
2    False
3    False
Name: zipcode, dtype: bool

description = 'Type representing postal code.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 2
    unique: 2
values: ZipCode

Returns: Domain based on the ZipCode feature type.
Return type: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the geographic distribution based on the location of each zip code.

Examples

>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_plot()

Returns: Plot object for the series based on the ZipCode feature type.
Return type: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include the total count, the unique count, and the missing count.

Examples

>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_stat()
    Metric  Value
0       count   4
1       unique  2
2       missing 2

Returns: Summary statistics of the Series provided.
Return type: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.zip_code.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters: data (pd.Series) – The data to process.
Returns: The logical list indicating if the data matches requirements.
Return type: pd.Series
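As a rough illustration of what a zip code handler might do, the sketch below checks string values against a five-digit pattern with an optional four-digit suffix. Both the pattern and the name `is_zip_code_sketch` are assumptions for this example and are not taken from the ADS source.

```python
import re

import pandas as pd

# Illustrative pattern only: five digits with an optional "-NNNN" suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_zip_code_sketch(data: pd.Series) -> pd.Series:
    """Hypothetical handler: True where the value looks like a US zip code."""
    return data.apply(
        lambda v: isinstance(v, str) and ZIP_PATTERN.fullmatch(v) is not None
    )


s = pd.Series(["94065", "90210", float("nan"), None, "9406"], name="zipcode")
flags = is_zip_code_sketch(s)
print(flags.tolist())
```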

ads.feature_engineering.feature_type.handler.feature_validator module

The module that helps to register custom validators for the feature types and to extend registered validators with dispatching based on specific arguments.

Classes

FeatureValidator
The Feature Validator class to manage custom validators.

FeatureValidatorMethod
The Feature Validator Method class. Extends methods that require dispatching based on specific arguments.

class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator

Bases: object

The Feature Validator class to manage custom validators.

register(self, name: str, handler: Callable, condition: Union[Tuple, Dict[str, Any]] = None, replace: bool = False) → None: Registers new validator.

unregister(self, name: str, condition: Union[Tuple, Dict[str, Any]] = None) → None: Unregisters validator.

registered(self) → pd.DataFrame: Gets the list of registered validators.

Examples

>>> series = pd.Series(['+1-202-555-0141', '+1-202-555-0142'], name='Phone Number')

>>> def phone_number_validator(data: pd.Series) -> pd.Series:
...    print("phone_number_validator")
...    return data

>>> def universal_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...    print("universal_phone_number_validator")
...    return data

>>> def us_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...    print("us_phone_number_validator")
...    return data

>>> PhoneNumber.validator.register(name="is_phone_number", handler=phone_number_validator, replace=True)
>>> PhoneNumber.validator.register(name="is_phone_number", handler=universal_phone_number_validator, condition = ('country_code',))
>>> PhoneNumber.validator.register(name="is_phone_number", handler=us_phone_number_validator, condition = {'country_code':'+1'})

>>> PhoneNumber.validator.is_phone_number(series)
    phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> PhoneNumber.validator.is_phone_number(series, country_code = '+7')
    universal_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> PhoneNumber.validator.is_phone_number(series, country_code = '+1')
    us_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> PhoneNumber.validator.registered()
               Validator                 Condition                            Handler
    ---------------------------------------------------------------------------------
    0    is_phone_number                        ()             phone_number_validator
    1    is_phone_number         ('country_code',)   universal_phone_number_validator
    2    is_phone_number    {'country_code': '+1'}          us_phone_number_validator

>>> series.ads.validator.is_phone_number()
    phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> series.ads.validator.is_phone_number(country_code = '+7')
    universal_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> series.ads.validator.is_phone_number(country_code = '+1')
    us_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

Initializes the FeatureValidator.

register(name: str, handler: Callable, condition: Optional[Union[Tuple, Dict[str, Any]]] = None, replace: bool = False) → None

Registers new validator.

Parameters

name (str) – The validator name.
handler (callable) – The handler.
condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator.
replace (bool) – The flag indicating if the registered validator should be replaced with the new one.

Returns

Nothing.

Return type

None

Raises

ValueError – The name is empty or handler is not provided.
TypeError – The handler is not callable, or the name of the validator is not a string.
ValidatorAlreadyExists – The validator is already registered.

registered() → DataFrame

Gets the list of registered validators.

Returns: The list of registered validators.
Return type: pd.DataFrame

unregister(name: str, condition: Optional[Union[Tuple, Dict[str, Any]]] = None) → None

Unregisters validator.

Parameters

name (str) – The name of the validator to be unregistered.
condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator to be unregistered.

Returns

Nothing.

Return type

None

Raises

TypeError – The name of the validator is not a string.
ValidatorNotFound – The validator is not found.
ValidatorWithConditionNotFound – The validator with the provided condition is not found.

class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidatorMethod(handler: Callable)

Bases: object

The Feature Validator Method class.

Extends methods that require dispatching based on specific arguments.

register(self, condition: Union[Tuple, Dict[str, Any]], handler: Callable) → None: Registers new handler.

unregister(self, condition: Union[Tuple, Dict[str, Any]]) → None: Unregisters existing handler.

registered(self) → pd.DataFrame: Gets the list of registered handlers.

Initializes the Feature Validator Method.

Parameters: handler (Callable) – The handler that is called by default if no suitable handler is found.

register(condition: Union[Tuple, Dict[str, Any]], handler: Callable) → None

Registers new handler.

Parameters

condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to register a new handler.
handler (Callable) – The handler to be registered.

Returns

Nothing.

Return type

None

Raises

ValueError – If the condition is not provided or is in the wrong format, or if the handler is not provided or has the wrong format.

registered() → DataFrame

Gets the list of registered handlers.

Returns: The list of registered handlers.
Return type: pd.DataFrame

unregister(condition: Union[Tuple, Dict[str, Any]]) → None

Unregisters existing handler.

Parameters: condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to unregister a handler.
Returns: Nothing.
Return type: None
Raises: ValueError – If the condition is not provided, is in the wrong format, or is not registered.
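The dispatching behaviour described for this class can be sketched in plain Python. The class below is a hypothetical stand-in, not the ADS implementation. It assumes that a dict condition matches on keyword values, a tuple condition matches on keyword names, and that the lookup order is exact values first, then names, then the default handler. That order is inferred from the `is_phone_number` example output shown for FeatureValidator.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Condition = Union[Tuple[str, ...], Dict[str, Any]]


class DispatchingMethod:
    """Hypothetical stand-in for a validator method with conditional handlers."""

    def __init__(self, handler: Callable) -> None:
        self._default = handler
        self._by_values: List[Tuple[Dict[str, Any], Callable]] = []
        self._by_names: List[Tuple[Tuple[str, ...], Callable]] = []

    def register(self, condition: Condition, handler: Callable) -> None:
        if not callable(handler):
            raise ValueError("handler must be callable")
        if isinstance(condition, dict) and condition:
            self._by_values.append((dict(condition), handler))
        elif isinstance(condition, tuple) and condition:
            self._by_names.append((tuple(condition), handler))
        else:
            raise ValueError("condition must be a non-empty tuple or dict")

    def __call__(self, data: Any, **kwargs: Any) -> Any:
        # 1. Most specific: every key/value pair of a dict condition matches.
        for values, handler in self._by_values:
            if all(k in kwargs and kwargs[k] == v for k, v in values.items()):
                return handler(data, **kwargs)
        # 2. Next: the keyword names equal the names in a tuple condition.
        for names, handler in self._by_names:
            if set(names) == set(kwargs):
                return handler(data, **kwargs)
        # 3. Fallback: the default handler given at construction time.
        return self._default(data, **kwargs)


is_phone_number = DispatchingMethod(lambda data: "default")
is_phone_number.register(("country_code",), lambda data, country_code: "universal")
is_phone_number.register({"country_code": "+1"}, lambda data, country_code: "us")

numbers = ["+1-202-555-0141", "+1-202-555-0142"]
calls = [
    is_phone_number(numbers),
    is_phone_number(numbers, country_code="+7"),
    is_phone_number(numbers, country_code="+1"),
]
print(calls)
```

Keeping value-based and name-based conditions in separate lists makes the precedence explicit; a real implementation would also need to report duplicate conditions and mismatched handler signatures, which this sketch leaves out.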

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorAlreadyExists(name: str): Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorNotFound(name: str): Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionAlreadyExists(name: str): Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionNotFound(name: str): Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.WrongHandlerMethodSignature(handler_name: str, condition: str, handler_signature: str): Bases: ValueError

ads.feature_engineering.feature_type.handler.feature_warning module

The module that helps to register custom warnings for the feature types.

Classes

FeatureWarning
The Feature Warning class. Provides functionality to register warning handlers and invoke them.

Examples

>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
                    Name                               Handler
    ----------------------------------------------------------
    0         zeros_count          warning_handler_zeros_count
    1    zeros_percentage     warning_handler_zeros_percentage

>>> warning.zeros_count(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38

>>> warning.zeros_percentage(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
    1          Zeros   Age has 12.2% zeros     Percentage      12.2%

>>> warning.unregister('zeros_count')
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

class ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning

Bases: object

The Feature Warning class.

Provides functionality to register warning handlers and invoke them.

register(self, name: str, handler: Callable) → None: Registers a new warning for the feature type.

unregister(self, name: str) → None: Unregisters warning.

registered(self) → pd.DataFrame: Gets the list of registered warnings.

Examples

>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
                  Warning                              Handler
    ----------------------------------------------------------
    0         zeros_count          warning_handler_zeros_count
    1    zeros_percentage     warning_handler_zeros_percentage

>>> warning.zeros_count(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38

>>> warning.zeros_percentage(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
    1          Zeros   Age has 12.2% zeros     Percentage      12.2%

>>> warning.unregister('zeros_count')
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

Initializes the FeatureWarning.

register(name: str, handler: Callable, replace: bool = False) → None

Registers a new warning.

Parameters

name (str) – The warning name.
handler (callable) – The handler associated with the warning.
replace (bool) – The flag indicating if the registered warning should be replaced with the new one.

Returns

Nothing

Return type

None

Raises

ValueError – If warning name is empty or handler not defined.
TypeError – If handler is not callable.
WarningAlreadyExists – If warning is already registered.

registered() → DataFrame

Gets the list of registered warnings.

Returns: The list of registered warnings in DataFrame format.
Return type: pd.DataFrame

Examples

>>> warning.registered()
                     Name                               Handler
    -----------------------------------------------------------
    0         zeros_count           warning_handler_zeros_count
    1    zeros_percentage      warning_handler_zeros_percentage

unregister(name: str) → None

Unregisters warning.

Parameters

name (str) – The name of the warning to be unregistered.

Returns

Nothing.

Return type

None

Raises

ValueError – If warning name is not provided or empty.
WarningNotFound – If warning not found.
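The register, unregister, and call flow documented above can be approximated with a small registry class. This is a sketch under stated assumptions: the class name `WarningRegistry` is hypothetical, built-in exceptions stand in for WarningAlreadyExists and WarningNotFound, and calling the registry is assumed to concatenate the DataFrames returned by every registered handler.

```python
from typing import Callable, Dict

import pandas as pd

COLUMNS = ["Warning", "Message", "Metric", "Value"]


class WarningRegistry:
    """Hypothetical stand-in that mimics the documented warning workflow."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[pd.Series], pd.DataFrame]] = {}

    def register(self, name: str, handler: Callable, replace: bool = False) -> None:
        if not name or handler is None:
            raise ValueError("Warning name and handler must be provided.")
        if not callable(handler):
            raise TypeError("Handler must be callable.")
        if name in self._handlers and not replace:
            raise ValueError(f"Warning {name!r} is already registered.")
        self._handlers[name] = handler

    def unregister(self, name: str) -> None:
        if not name:
            raise ValueError("Warning name must be provided.")
        if name not in self._handlers:
            raise ValueError(f"Warning {name!r} is not found.")
        del self._handlers[name]

    def __call__(self, data: pd.Series) -> pd.DataFrame:
        # Run every handler and stack the resulting rows into one report.
        frames = [handler(data) for handler in self._handlers.values()]
        if not frames:
            return pd.DataFrame(columns=COLUMNS)
        return pd.concat(frames, ignore_index=True)


def zeros_count(data: pd.Series) -> pd.DataFrame:
    n = int((data == 0).sum())
    return pd.DataFrame(
        [["Zeros", f"{data.name} has {n} zeros", "Count", n]], columns=COLUMNS
    )


ages = pd.Series([0, 1, 0, 3], name="Age")
registry = WarningRegistry()
registry.register("zeros_count", zeros_count)
report = registry(ages)

registry.unregister("zeros_count")
empty_report = registry(ages)
print(report)
```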

ads.feature_engineering.feature_type.handler.warnings module

The module with all default warnings provided to the user. These are registered to the relevant feature types directly in the feature type files themselves.

ads.feature_engineering.feature_type.handler.warnings.high_cardinality_handler(s: Series) → DataFrame

Warning if the number of unique values (including NaN) in the series is greater than or equal to 15.

Parameters: s (pd.Series) – Pandas series - column of some feature type.
Returns: DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the count of unique values.
Return type: pd.DataFrame
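The threshold rule can be illustrated with pandas alone. In this sketch the function name, the warning label, the message text, and the choice to return an empty frame when the rule does not fire are all assumptions; only the threshold of 15 and the inclusion of NaN come from the description above.

```python
import pandas as pd

COLUMNS = ["Warning", "Message", "Metric", "Value"]


def high_cardinality_sketch(s: pd.Series, threshold: int = 15) -> pd.DataFrame:
    """Hypothetical handler: one row when unique values (NaN included) >= threshold."""
    n_unique = int(s.nunique(dropna=False))  # dropna=False counts NaN as a value
    if n_unique < threshold:
        return pd.DataFrame(columns=COLUMNS)
    message = f"{s.name} has {n_unique} unique values"
    return pd.DataFrame(
        [["High cardinality", message, "Count", n_unique]], columns=COLUMNS
    )


# 14 distinct numbers plus one missing value: 15 unique values in total.
high = high_cardinality_sketch(pd.Series(list(range(14)) + [None], name="id"))
low = high_cardinality_sketch(pd.Series(range(14), name="id"))
print(high)
```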

ads.feature_engineering.feature_type.handler.warnings.missing_values_handler(s: Series) → DataFrame

Warning if more than 5 percent of the values in the series are missing (NaN).

Parameters: s (pd.Series) – Pandas series - column of some feature type.
Returns: DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where the first row is the count of missing values and the second is the percentage of missing values.
Return type: pd.DataFrame
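A sketch of the two-row result described above, using only pandas. The function name, label, and message wording are illustrative assumptions, as is returning an empty frame when the share of missing values does not exceed 5 percent.

```python
import pandas as pd

COLUMNS = ["Warning", "Message", "Metric", "Value"]


def missing_values_sketch(s: pd.Series, threshold_pct: float = 5.0) -> pd.DataFrame:
    """Hypothetical handler: count and percentage rows when missing > threshold."""
    n_missing = int(s.isna().sum())
    pct = 100.0 * n_missing / len(s) if len(s) else 0.0
    if pct <= threshold_pct:
        return pd.DataFrame(columns=COLUMNS)
    label = "Missing values"
    return pd.DataFrame(
        [
            [label, f"{s.name} has {n_missing} missing values", "Count", n_missing],
            [label, f"{s.name} has {pct:.1f}% missing values", "Percentage", f"{pct:.1f}%"],
        ],
        columns=COLUMNS,
    )


# One missing value out of four is 25 percent, above the 5 percent threshold.
missing_report = missing_values_sketch(pd.Series([1.0, None, 3.0, 4.0], name="Age"))
complete_report = missing_values_sketch(pd.Series([1.0, 2.0, 3.0], name="Age"))
print(missing_report)
```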

ads.feature_engineering.feature_type.handler.warnings.skew_handler(s: Series) → DataFrame

Warning if absolute value of skew is greater than 1.

Parameters: s (pd.Series) – Pandas series - column of some feature type, expects continuous values.
Returns: DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the skew value of that column.
Return type: pd.DataFrame
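The skew rule can be tried out with `pandas.Series.skew`, which returns the sample skewness. The sketch below assumes that this is the measure being compared against 1; the function name, label, and message are hypothetical.

```python
import pandas as pd

COLUMNS = ["Warning", "Message", "Metric", "Value"]


def skew_sketch(s: pd.Series, threshold: float = 1.0) -> pd.DataFrame:
    """Hypothetical handler: one row when the absolute skew exceeds the threshold."""
    skew = float(s.skew())
    if not abs(skew) > threshold:  # also False when skew is NaN
        return pd.DataFrame(columns=COLUMNS)
    message = f"{s.name} has skew of {skew:.3f}"
    return pd.DataFrame([["Skew", message, "Skew", skew]], columns=COLUMNS)


# A single large outlier produces a strongly right-skewed distribution.
skewed = skew_sketch(pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 1, 100], name="income"))
symmetric = skew_sketch(pd.Series([1, 2, 3, 4, 5], name="income"))
print(skewed)
```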

ads.feature_engineering.feature_type.handler.warnings.zeros_handler(s: Series) → DataFrame

Warning if more than 10 percent of the values in the series are zeros.

Parameters: s (pd.Series) – Pandas series - column of some feature type.
Returns: DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where the first row is the count of zero values and the second is the percentage of zero values.
Return type: pd.DataFrame
