ads.feature_engineering package

Submodules

ads.feature_engineering.exceptions module

exception ads.feature_engineering.exceptions.InvalidFeatureType(tname: str): Bases: TypeError

exception ads.feature_engineering.exceptions.NameAlreadyRegistered(name: str): Bases: NameError

exception ads.feature_engineering.exceptions.TypeAlreadyAdded(tname: str): Bases: TypeError

exception ads.feature_engineering.exceptions.TypeAlreadyRegistered(tname: str): Bases: TypeError

exception ads.feature_engineering.exceptions.TypeNotFound(tname: str): Bases: TypeError

exception ads.feature_engineering.exceptions.WarningAlreadyExists(name: str): Bases: ValueError

exception ads.feature_engineering.exceptions.WarningNotFound(name: str): Bases: ValueError
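
Because each exception derives from a built-in base class, callers can catch either the specific class or its built-in parent. The sketch below uses local stand-in classes that mirror only the documented names and bases; the message wording and the `classify` helper are assumptions made for illustration, not part of the library.

```python
# Stand-ins mirroring the documented bases. The real classes live in
# ads.feature_engineering.exceptions; the message text here is hypothetical.
class TypeNotFound(TypeError):
    def __init__(self, tname: str):
        super().__init__(f"Feature type '{tname}' not found.")


class WarningNotFound(ValueError):
    def __init__(self, name: str):
        super().__init__(f"Warning '{name}' not found.")


def classify(exc: Exception) -> str:
    """Report which built-in exception family catches the given exception."""
    try:
        raise exc
    except TypeError:
        return "TypeError"
    except ValueError:
        return "ValueError"


caught_type = classify(TypeNotFound("my_type"))
caught_value = classify(WarningNotFound("zeros"))
```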

ads.feature_engineering.feature_type_manager module

The module that helps manage feature types. Provides functionality to register, unregister, and list feature types.

Classes

FeatureTypeManager
Feature Types Manager class that manages feature types.

Examples

>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> from ads.feature_engineering.feature_type_manager import FeatureTypeManager
>>> class NewType(FeatureType):
...    description = "My personal type."
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
            Name        Feature Type                                  Description
---------------------------------------------------------------------------------
0     Continuous          continuous          Type representing continuous values.
1       DateTime           date_time           Type representing date and/or time.
2       Category            category  Type representing discrete unordered values.
3        Ordinal             ordinal             Type representing ordered values.
4        NewType            new_type                             My personal type.

>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler

>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler

>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous

class ads.feature_engineering.feature_type_manager.FeatureTypeManager

Bases: object

Feature Types Manager class that manages feature types.

Provides functionality to register, unregister, and list feature types.

feature_type_object(cls, feature_type: Union[FeatureType, str]) → FeatureType: Gets a feature type by class object or name.

feature_type_register(cls, feature_type_cls: FeatureType) → None: Registers a feature type.

feature_type_unregister(cls, feature_type: Union[FeatureType, str]) → None: Unregisters a feature type.

feature_type_reset(cls) → None: Resets feature types to the defaults.

feature_type_registered(cls) → pd.DataFrame: Lists all registered feature types as a DataFrame.

warning_registered(cls) → pd.DataFrame: Lists registered warnings for all registered feature types.

validator_registered(cls) → pd.DataFrame: Lists registered validators for all registered feature types.

Examples

>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...    pass
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
            Name      Feature Type                                  Description
-------------------------------------------------------------------------------
0     Continuous        continuous          Type representing continuous values.
1       DateTime         date_time           Type representing date and/or time.
2       Category          category  Type representing discrete unordered values.
3        Ordinal           ordinal             Type representing ordered values.

>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler

>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler

>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous

classmethod feature_type_object(feature_type: Union[FeatureType, str]) → FeatureType

Gets a feature type by class object or name.

Parameters:

feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.

Returns:

Found feature type.

Return type:

FeatureType

Raises:

TypeNotFound – If the provided feature type is not registered.
TypeError – If the provided feature type is not a subclass of FeatureType.

classmethod feature_type_register(feature_type_cls: FeatureType) → None

Registers a new feature type.

Parameters:

feature_type_cls (FeatureType) – Subclass of FeatureType to be registered.

Returns:

Nothing.

Return type:

None

Raises:

TypeError – Type is not a subclass of FeatureType.
TypeError – Type has already been registered.
NameError – Name has already been used.

classmethod feature_type_registered() → DataFrame

Lists all registered feature types as a DataFrame.

Returns:: The list of feature types in a DataFrame format.
Return type:: pd.DataFrame

classmethod feature_type_reset() → None

Resets feature types to the defaults.

Returns:: Nothing.
Return type:: None

classmethod feature_type_unregister(feature_type: Union[FeatureType, str]) → None

Unregisters a feature type.

Parameters:: feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.
Returns:: Nothing.
Return type:: None
Raises:: TypeError – On an attempt to unregister a default feature type.

classmethod is_type_registered(feature_type: Union[FeatureType, str]) → bool

Checks whether the provided feature type is registered in the system.

Parameters:: feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.
Returns:: True if the provided feature type is registered, False otherwise.
Return type:: bool
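
The register, unregister, and lookup behavior described above can be pictured as a small name-keyed registry. The following is a minimal sketch under stated assumptions: `ToyRegistry`, the stand-in `FeatureType` and `Continuous` classes, and the error messages are hypothetical and do not reproduce the FeatureTypeManager source. It only mirrors the documented rules that non-subclasses and reused names are rejected and that default types cannot be unregistered.

```python
from typing import Dict, Union


class FeatureType:
    """Hypothetical stand-in for the FeatureType base class."""

    name = "feature_type"


class Continuous(FeatureType):
    name = "continuous"


class ToyRegistry:
    """Toy registry for illustration; not the FeatureTypeManager source."""

    _defaults: Dict[str, type] = {"continuous": Continuous}
    _types: Dict[str, type] = dict(_defaults)

    @staticmethod
    def _name_of(feature_type: Union[type, str]) -> str:
        # Accept either a class or its registered name.
        return feature_type if isinstance(feature_type, str) else feature_type.name

    @classmethod
    def register(cls, feature_type_cls: type) -> None:
        if not (isinstance(feature_type_cls, type) and issubclass(feature_type_cls, FeatureType)):
            raise TypeError("Type is not a subclass of FeatureType.")
        if feature_type_cls.name in cls._types:
            raise NameError("Name has already been used.")
        cls._types[feature_type_cls.name] = feature_type_cls

    @classmethod
    def is_registered(cls, feature_type: Union[type, str]) -> bool:
        return cls._name_of(feature_type) in cls._types

    @classmethod
    def unregister(cls, feature_type: Union[type, str]) -> None:
        name = cls._name_of(feature_type)
        if name in cls._defaults:
            raise TypeError("Default feature types cannot be unregistered.")
        del cls._types[name]


class NewType(FeatureType):
    name = "new_type"


ToyRegistry.register(NewType)
registered_after_add = ToyRegistry.is_registered("new_type")
ToyRegistry.unregister(NewType)
registered_after_remove = ToyRegistry.is_registered(NewType)
```

Keying the registry by name is what allows every method to accept either the class or its string name, as the documented signatures do.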

classmethod validator_registered() → DataFrame

Lists registered validators for registered feature types.

Returns:: The list of registered validators for registered feature types in a DataFrame format.
Return type:: pd.DataFrame

Examples

>>> FeatureTypeManager.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler
2    credit_card       is_credit_card                        ()             default_handler

classmethod warning_registered() → DataFrame

Lists registered warnings for all registered feature types.

Returns:: The list of registered warnings for registered feature types in a DataFrame format.
Return type:: pd.DataFrame

Examples

>>> FeatureTypeManager.warning_registered()
    Feature Type             Warning                    Handler
----------------------------------------------------------------------
0     continuous               zeros              zeros_handler
1     continuous    high_cardinality   high_cardinality_handler

ads.feature_engineering.accessor.dataframe_accessor module

The ADS accessor for the Pandas DataFrame. The accessor will be initialized with the pandas object the user is interacting with.

Examples

>>> import pandas as pd
>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
          Column   Feature Type                        Description
------------------------------------------------------------------
0           Name         string    Type representing string values.
1    Credit Card         string    Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
                   Credit Card
------------------------------
0             4532640527811543

class ads.feature_engineering.accessor.dataframe_accessor.ADSDataFrameAccessor(pandas_obj)

Bases: ADSFeatureTypesMixin, EDAMixin, DBAccessMixin, DataLabelingAccessMixin

ADS accessor for the Pandas DataFrame.

columns

The column labels of the DataFrame.

Type:: List[str]

tags(self) → Dict[str, List[str]]: Gets the dictionary of user defined tags for the dataframe.

default_type(self) → Dict[str, str]: Gets the map of columns and associated default feature type names.

feature_type(self) → Dict[str, List[str]]: Gets the list of registered feature types.

feature_type_description(self) → pd.DataFrame: Gets the list of registered feature types in a DataFrame format.

sync(self, src: Union[pd.DataFrame, pd.Series]) → pd.DataFrame: Syncs feature types of current DataFrame with that from src.

feature_select(self, include: List[Union[FeatureType, str]] = None, exclude: List[Union[FeatureType, str]] = None) → pd.DataFrame: Returns a subset of the DataFrame’s columns based on the column feature types.

help(self, prop: str = None) → None: Provides docstrings for available methods and properties.

Examples

>>> import pandas as pd
>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
          Column   Feature Type                        Description
-------------------------------------------------------------------
0           Name         string    Type representing string values.
1    Credit Card         string    Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
                   Credit Card
------------------------------
0             4532640527811543

Initializes ADS Pandas DataFrame Accessor.

Parameters:: pandas_obj (pandas.DataFrame) – Pandas dataframe
Raises:: ValueError – If provided DataFrame has duplicate columns.

property default_type: Dict[str, str]

Gets the map of columns and associated default feature type names.

Returns:: The dictionary where key is column name and value is the name of default feature type.
Return type:: Dict[str, str]

feature_select(include: Optional[List[Union[FeatureType, str]]] = None, exclude: Optional[List[Union[FeatureType, str]]] = None) → DataFrame

Returns a subset of the DataFrame’s columns based on the column feature_types.

Parameters:

include (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be included.
exclude (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be excluded.

Raises:

ValueError – If both of include and exclude are empty
ValueError – If include and exclude are used simultaneously

Returns:

The subset of the frame including the feature types in include and excluding the feature types in exclude.

Return type:

pandas.DataFrame
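
The include/exclude rules above can be sketched on a plain mapping of column names to feature type names. This is an illustration only: `select_columns` is a hypothetical helper, and treating a column as matched when any of its feature types appears in the list is an assumption, not confirmed library behavior.

```python
from typing import Dict, List, Optional


def select_columns(
    feature_types: Dict[str, List[str]],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Return the column names selected by include, or those left after exclude."""
    if not include and not exclude:
        raise ValueError("At least one of include or exclude must be nonempty.")
    if include and exclude:
        raise ValueError("include and exclude cannot be used simultaneously.")
    if include:
        wanted = set(include)
        # Keep a column when any of its feature types is requested (assumption).
        return [col for col, types in feature_types.items() if wanted & set(types)]
    unwanted = set(exclude)
    # Keep a column when none of its feature types is excluded (assumption).
    return [col for col, types in feature_types.items() if not unwanted & set(types)]


types_by_column = {"Name": ["string"], "Credit Card": ["credit_card", "string"]}
included = select_columns(types_by_column, include=["credit_card"])
excluded = select_columns(types_by_column, exclude=["credit_card"])
```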

property feature_type: Dict[str, List[str]]

Gets the list of registered feature types.

Returns:: The dictionary where key is column name and value is list of associated feature type names.
Return type:: Dict[str, List[str]]

property feature_type_description: DataFrame

Gets the list of registered feature types in a DataFrame format.

Return type:: pandas.DataFrame

Examples

>>> df.ads.feature_type_description
          Column   Feature Type                         Description
-------------------------------------------------------------------
0           City         string    Type representing string values.
1   Phone Number         string    Type representing string values.

info() → Any

Gets information about the dataframe.

Returns:: The information about the dataframe.
Return type:: Any

model_schema(max_col_num: int = 2000)

Generates schema from the dataframe.

Parameters:: max_col_num (int, optional. Defaults to 2000) – The maximum number of columns for which the schema is generated automatically.

Examples

>>> df = pd.read_csv('./orcl_attrition.csv', usecols=['Age', 'Attrition'])
>>> schema = df.ads.model_schema()
>>> schema
Schema:
- description: Attrition
  domain:
    constraints: []
    stats:
      count: 1470
      unique: 2
    values: String
  dtype: object
  feature_type: String
  name: Attrition
  required: true
- description: Age
  domain:
    constraints: []
    stats:
      25%: 31.0
      50%: 37.0
      75%: 44.0
      count: 1470.0
      max: 61.0
      mean: 37.923809523809524
      min: 19.0
      std: 9.135373489136732
    values: Integer
  dtype: int64
  feature_type: Integer
  name: Age
  required: true
>>> schema.to_dict()
{'Schema': [{'dtype': 'object',
    'feature_type': 'String',
    'name': 'Attrition',
    'domain': {'values': 'String',
        'stats': {'count': 1470, 'unique': 2},
        'constraints': []},
    'required': True,
    'description': 'Attrition'},
    {'dtype': 'int64',
    'feature_type': 'Integer',
    'name': 'Age',
    'domain': {'values': 'Integer',
        'stats': {'count': 1470.0,
        'mean': 37.923809523809524,
        'std': 9.135373489136732,
        'min': 19.0,
        '25%': 31.0,
        '50%': 37.0,
        '75%': 44.0,
        'max': 61.0},
        'constraints': []},
    'required': True,
    'description': 'Age'}]}

Returns:: data schema.
Return type:: ads.feature_engineering.schema.Schema
Raises:: ads.feature_engineering.schema.DataSizeTooWide – If the number of columns of input data exceeds max_col_num.

sync(src: Union[DataFrame, Series]) → DataFrame

Syncs feature types of current DataFrame with that from src.

Syncs feature types of current dataframe with that from src, where src can be a dataframe or a series. In either case, only columns with matched names are synced.

Parameters:: src (pd.DataFrame | pd.Series) – The source to sync from.
Returns:: Synced dataframe.
Return type:: pandas.DataFrame
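
The rule that only columns with matched names are synced can be sketched with plain dictionaries standing in for the feature type maps of two dataframes. `sync_types` is a hypothetical helper written for this illustration; the accessor itself operates on pandas objects.

```python
from typing import Dict, List


def sync_types(
    current: Dict[str, List[str]], source: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Copy feature types from source for names present in both mappings."""
    return {
        col: list(source[col]) if col in source else list(types)
        for col, types in current.items()
    }


current_types = {"Name": ["string"], "Age": ["integer"]}
source_types = {"Name": ["name", "string"], "City": ["string"]}
synced = sync_types(current_types, source_types)
```

In this sketch "Name" takes the types of the source, "Age" keeps its own types because the source has no such column, and "City" is ignored because the current mapping does not contain it.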

property tags: Dict[str, List[str]]

Gets the dictionary of user defined tags for the dataframe. Key is column name and value is list of tag names.

Returns:: The map of columns and associated default tags.
Return type:: Dict[str, List[str]]

ads.feature_engineering.accessor.series_accessor module

The ADS accessor for the Pandas Series. The accessor will be initialized with the pandas object the user is interacting with.

Examples

>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0         string    Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']

class ads.feature_engineering.accessor.series_accessor.ADSSeriesAccessor(pandas_obj: Series)

Bases: ADSFeatureTypesMixin, EDAMixinSeries

ADS accessor for Pandas Series.

name

The name of Series.

Type:: str

tags

The list of tags for the Series.

Type:: List[str]

help(self, prop: str = None) → None: Provides docstrings for available methods and properties.

sync(self, src: Union[pd.DataFrame, pd.Series]) → None: Syncs feature types of current series with that from src.

default_type(self) → str: Gets the name of default feature type for the series.

feature_type(self) → List[str]: Gets the list of registered feature types for the series.

feature_type_description(self) → pd.DataFrame: Gets the list of registered feature types in a DataFrame format.

Examples

>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0         string    Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']

Initializes ADS Pandas Series Accessor.

Parameters:: pandas_obj (pd.Series) – The pandas series

property default_type: str

Gets the name of default feature type for the series.

Returns:: The name of default feature type.
Return type:: str

property feature_type: List[str]

Gets the list of registered feature types for the series.

Returns:: Names of feature types.
Return type:: List[str]

Examples

>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('tag for name')]
>>> series.ads.feature_type
['name', 'string', 'tag for name']

property feature_type_description: DataFrame

Gets the list of registered feature types in a DataFrame format.

Returns:: The DataFrame with feature types for this series.
Return type:: pd.DataFrame

Examples

>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('Name tag')]
>>> series.ads.feature_type_description
    Feature Type                         Description
----------------------------------------------------
0           name      Type representing name values.
1         string    Type representing string values.
2       Name tag                                Tag.

sync(src: Union[DataFrame, Series]) → None

Syncs feature types of current series with that from src.

The src could be a dataframe or a series. In either case, only columns with matched names are synced.

Parameters:: src (pd.DataFrame | pd.Series) – The source to sync from.
Returns:: Nothing.
Return type:: None

Examples

>>> series = pd.Series(['name1', 'name2', 'name3', None])
>>> series.ads.feature_type = ['name']
>>> series.ads.feature_type
['name', 'string']
>>> series.dropna().ads.feature_type
['string']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['name', 'string']

class ads.feature_engineering.accessor.series_accessor.ADSSeriesValidator(feature_type_list: List[FeatureType], series: Series)

Bases: object

Helper class to invoke registered validators at the series level.

Initializes ADS series validator.

Parameters:

feature_type_list (List[FeatureType]) – The list of feature types.
series (pd.Series) – The pandas series.

ads.feature_engineering.accessor.mixin.correlation module

ads.feature_engineering.accessor.mixin.correlation.cat_vs_cat(df: DataFrame, normal_form: bool = True) → DataFrame: Calculates the correlation of all pairs of categorical features.

ads.feature_engineering.accessor.mixin.correlation.cat_vs_cont(df: DataFrame, categorical_columns, continuous_columns, normal_form: bool = True) → DataFrame: Calculates the correlation of all pairs of categorical features and continuous features.

ads.feature_engineering.accessor.mixin.correlation.cont_vs_cont(df: DataFrame, normal_form: bool = True) → DataFrame: Calculates the Pearson correlation between two columns of the DataFrame.

ads.feature_engineering.accessor.mixin.eda_mixin module

This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Dataframe. The series of purpose-driven methods enable the data scientist to complete analysis on the dataframe.

From the accessor we have access to the pandas object the user is interacting with as well as corresponding lists of feature types per column.

class ads.feature_engineering.accessor.mixin.eda_mixin.EDAMixin

Bases: object

correlation_ratio() → DataFrame

Generate a Correlation Ratio data frame for all categorical-continuous variable pairs.

Returns:

pandas.DataFrame
Correlation Ratio correlation data frame with the following 3 columns –
1. Column 1 (name of the first categorical/continuous column)
2. Column 2 (name of the second categorical/continuous column)
3. Value (correlation value)

Note

Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
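
For orientation, one common definition of the correlation ratio is the square root of the between-group sum of squares divided by the total sum of squares. The sketch below implements that textbook definition in plain Python; it is an assumption that this matches the statistic computed by the library, and `correlation_ratio` here is a standalone illustrative function, not the accessor method.

```python
from collections import defaultdict
from math import sqrt
from typing import Hashable, Sequence


def correlation_ratio(categories: Sequence[Hashable], values: Sequence[float]) -> float:
    """Textbook correlation ratio (eta) between a categorical and a continuous variable."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    overall_mean = sum(values) / len(values)
    # Variation explained by differences between the category means.
    between = sum(
        len(group) * (sum(group) / len(group) - overall_mean) ** 2
        for group in groups.values()
    )
    # Total variation of the continuous variable.
    total = sum((value - overall_mean) ** 2 for value in values)
    return sqrt(between / total) if total else 0.0


fully_explained = correlation_ratio(["a", "a", "b", "b"], [1.0, 1.0, 3.0, 3.0])
unexplained = correlation_ratio(["a", "a", "b", "b"], [1.0, 3.0, 1.0, 3.0])
```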

correlation_ratio_plot() → Axes

Generate a heatmap of the Correlation Ratio correlation for all categorical-continuous variable pairs.

Returns:: Correlation Ratio correlation plot object that can be updated by the customer
Return type:: Plot object

cramersv() → DataFrame

Generate a Cramer’s V correlation data frame for all categorical variable pairs.

Gives a warning for dropped non-categorical columns.

Returns:

Cramer’s V correlation data frame with the following 3 columns:

Column 1 (name of the first categorical column)
Column 2 (name of the second categorical column)
Value (correlation value)

Return type:

pandas.DataFrame

Note

Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
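
As a point of reference, the uncorrected Cramér's V statistic can be computed from the chi-squared statistic of the contingency table. The plain-Python sketch below follows that textbook formula; whether the library applies a bias correction is not stated in this reference, so treat the function as an illustration rather than a reproduction of `cramersv()`.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Uncorrected Cramér's V between two categorical sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for row, row_count in row_totals.items():
        for col, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


dependent = cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"])
independent = cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"])
```

A variable paired with itself yields 1.0 under this formula, which agrees with the (x,x) and (y,y) entries mentioned in the note.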

cramersv_plot() → Axes

Generate a heatmap of the Cramer’s V correlation for all categorical variable pairs.

Gives a warning for dropped non-categorical columns.

Returns:: Cramer’s V correlation plot object that can be updated by the customer
Return type:: Plot object

feature_count() → DataFrame

Counts the number of columns for each feature type and each primary feature type. The Primary column gives the number of columns for which that feature type is the primary feature type.

Returns:: DataFrame with the number of columns for each feature type and for each primary feature type.
Return type:: pandas.DataFrame

Examples

>>> df.ads.feature_type
{'PassengerId': ['ordinal', 'category'],
'Survived': ['ordinal'],
'Pclass': ['ordinal'],
'Name': ['category'],
'Sex': ['category']}
>>> df.ads.feature_count()
    Feature Type        Count       Primary
0       category            3             2
1        ordinal            3             3
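
The counts in the example above can be reproduced from the feature type mapping with a few lines of plain Python. The sketch assumes that the primary feature type of a column is the first entry in its list; that reading is consistent with the documented output but is an inference. `feature_count` here is a standalone illustrative function that returns tuples rather than a DataFrame.

```python
from collections import Counter
from typing import Dict, List, Tuple

feature_types = {
    "PassengerId": ["ordinal", "category"],
    "Survived": ["ordinal"],
    "Pclass": ["ordinal"],
    "Name": ["category"],
    "Sex": ["category"],
}


def feature_count(types_by_column: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Return (feature type, count, primary count) rows sorted by feature type."""
    # Count every feature type assigned to any column.
    count = Counter(t for types in types_by_column.values() for t in types)
    # Count only the first-listed (assumed primary) feature type per column.
    primary = Counter(types[0] for types in types_by_column.values() if types)
    return [(name, count[name], primary.get(name, 0)) for name in sorted(count)]


rows = feature_count(feature_types)
```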

feature_plot() → DataFrame

For every column in the dataframe, generates a summary plot based on the most relevant feature type.

Returns:: Dataframe with 2 columns: 1. Column - feature name 2. Plot - plot object
Return type:: pandas.DataFrame

feature_stat() → DataFrame

Provides a DataFrame of summary statistics.

This returns feature stats on each column using FeatureType summary method.

Examples

>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df.ads.feature_stat().head()
        Column              Metric     Value
0  PassengerId               count   891.000
1  PassengerId                mean   446.000
2  PassengerId  standard deviation   257.354
3  PassengerId      sample minimum     1.000
4  PassengerId      lower quartile   223.500

Returns:: Dataframe with 3 columns: name, metric, value
Return type:: pandas.DataFrame

pearson() → DataFrame

Generate a Pearson correlation data frame for all continuous variable pairs.

Gives a warning for dropped non-numerical columns.

Returns:

pandas.DataFrame
Pearson correlation data frame with the following 3 columns –
1. Column 1 (name of the first continuous column)
2. Column 2 (name of the second continuous column)
3. Value (correlation value)

Note

Pairs will be replicated. For example for variables x and y, we’d have (x,y), (y,x) both with same correlation value. We’ll also have (x,x) and (y,y) with value 1.0.
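
The long format described in the note, with every ordered pair listed, can be sketched in plain Python. The `pearson` and `pearson_long` functions below are hypothetical helpers that apply the standard Pearson formula to a dictionary of columns; they illustrate the shape of the result and are not the accessor implementation.

```python
from itertools import product
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Standard Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ys))
    return covariance / (spread_x * spread_y)


def pearson_long(columns: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Return (Column 1, Column 2, Value) rows for every ordered pair of columns."""
    return [
        (first, second, pearson(columns[first], columns[second]))
        for first, second in product(columns, repeat=2)
    ]


data = {"x": [1.0, 2.0, 3.0, 4.0], "y": [4.0, 3.0, 2.0, 1.0]}
rows = pearson_long(data)
pairs = [(first, second) for first, second, _ in rows]
```

With two columns the result has four rows: (x,x), (x,y), (y,x), and (y,y).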

pearson_plot() → Axes

Generate a heatmap of the Pearson correlation for all continuous variable pairs.

Returns:: Pearson correlation plot object that can be updated by the customer
Return type:: Plot object

warning() → DataFrame

Generates a data frame that lists feature specific warnings.

Returns:: The list of feature specific warnings.
Return type:: pandas.DataFrame

Examples

>>> df.ads.warning()
    Column    Feature Type         Warning               Message       Metric    Value
--------------------------------------------------------------------------------------
0      Age      continuous           Zeros      Age has 38 zeros        Count       38
1      Age      continuous           Zeros   Age has 12.2% zeros   Percentage    12.2%

ads.feature_engineering.accessor.mixin.eda_mixin_series module

This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Series. The series of purpose-driven methods enable the data scientist to complete univariate analysis.

From the accessor we have access to the pandas object the user is interacting with as well as corresponding list of feature types.

class ads.feature_engineering.accessor.mixin.eda_mixin_series.EDAMixinSeries

Bases: object

feature_plot() → Axes

For the series generate a summary plot based on the most relevant feature type.

Returns:: Plot object for the series based on the most relevant feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

feature_stat() → DataFrame

Provides a DataFrame of summary statistics.

This returns feature stats on series using FeatureType summary method.

Examples

>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df['Cabin'].ads.feature_stat()
    Metric  Value
0    count    891
1   unique    147
2  missing    687

Returns:: Dataframe with 2 columns and rows for different metric values
Return type:: pandas.DataFrame

warning() → DataFrame

Generates a data frame that lists feature specific warnings.

Returns:: The list of feature specific warnings.
Return type:: pandas.DataFrame

Examples

>>> df["Age"].ads.warning()
  Feature Type       Warning               Message         Metric      Value
 ---------------------------------------------------------------------------
0   continuous         Zeros      Age has 38 zeros          Count         38
1   continuous         Zeros   Age has 12.2% zeros     Percentage      12.2%
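
The two rows in the example above, one for the count and one for the percentage, suggest a handler of roughly the following shape. This is a hedged sketch: `zeros_warning` is a hypothetical function, the "continuous" label is hard-coded for illustration, and the real handlers return a DataFrame rather than tuples.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, str, str, str, object]


def zeros_warning(name: str, values: Sequence[float]) -> List[Row]:
    """Return Count and Percentage rows when the values contain zeros."""
    zeros = sum(1 for value in values if value == 0)
    if zeros == 0:
        return []
    percentage = f"{100 * zeros / len(values):.1f}%"
    return [
        ("continuous", "Zeros", f"{name} has {zeros} zeros", "Count", zeros),
        ("continuous", "Zeros", f"{name} has {percentage} zeros", "Percentage", percentage),
    ]


rows = zeros_warning("Age", [0, 22, 0, 35, 41, 0, 18, 29])
no_rows = zeros_warning("Age", [22, 35, 41])
```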

ads.feature_engineering.accessor.mixin.feature_types_mixin module

The module that represents the ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.

Classes

ADSFeatureTypesMixin
ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.

class ads.feature_engineering.accessor.mixin.feature_types_mixin.ADSFeatureTypesMixin

Bases: object

ADS Feature Types Mixin class that extends Pandas Series and DataFrame accessors.

warning_registered(cls) → pd.DataFrame: Lists registered warnings for registered feature types.

validator_registered(cls) → pd.DataFrame: Lists registered validators for registered feature types.

help(self, prop: str = None) → None: Help method that prints either a table of available properties or, given a property, returns its docstring.

help(prop: Optional[str] = None) → None

Help method that prints either a table of available properties or, given an individual property, returns its docstring.

Parameters:: prop (str, optional) – The name of the property. Defaults to None.
Returns:: Nothing.
Return type:: None

validator_registered() → DataFrame

Lists registered validators for registered feature types.

Returns:: The list of registered validators for registered feature types
Return type:: pandas.DataFrame

Examples

>>> df.ads.validator_registered()
         Column     Feature Type        Validator                 Condition                    Handler
------------------------------------------------------------------------------------------------------
0   PhoneNumber    phone_number   is_phone_number                        ()            default_handler
1   PhoneNumber    phone_number   is_phone_number    {'country_code': '+7'}   specific_country_handler
2    CreditCard     credit_card    is_credit_card                        ()            default_handler

>>> df['PhoneNumber'].ads.validator_registered()
    Feature Type            Validator                 Condition                     Handler
-------------------------------------------------------------------------------------------
0   phone_number      is_phone_number                        ()             default_handler
1   phone_number      is_phone_number    {'country_code': '+7'}    specific_country_handler

warning_registered() → DataFrame

Lists registered warnings for all registered feature types.

Returns:: The list of registered warnings for registered feature types.
Return type:: pandas.DataFrame

Examples

>>> df.ads.warning_registered()
       Column    Feature Type             Warning                    Handler
   -------------------------------------------------------------------------
   0      Age      continuous               zeros              zeros_handler
   1      Age      continuous    high_cardinality   high_cardinality_handler

>>> df["Age"].ads.warning_registered()
       Feature Type             Warning                    Handler
   ---------------------------------------------------------------
   0     continuous               zeros              zeros_handler
   1     continuous    high_cardinality   high_cardinality_handler

ads.feature_engineering.adsstring.common_regex_mixin module

class ads.feature_engineering.adsstring.common_regex_mixin.CommonRegexMixin

Bases: object

property address

property credit_card

property date

property email

property ip

property link

property phone_number_US

property price

redact(fields: Union[List[str], Dict[str, str]]) → str

Removes personal information from a string. For example, “Jane’s phone number is 123-456-7890” is turned into “Jane’s phone number is [PHONE_NUMBER_US].”

Parameters:: fields (list(str) | dict) – Either a list of fields to redact, e.g. [‘email’, ‘phone_number_US’], in which case each match is replaced with the capitalized field name in brackets, such as [EMAIL] or [PHONE_NUMBER_US_WITH_EXT]; or a dictionary that maps each field to redact to its replacement text, e.g. {‘email’: ‘HIDDEN_EMAIL’}.
Returns:: redacted string
Return type:: str

redact_map = {'address': '[ADDRESS]', 'address_with_zip': '[ADDRESS_WITH_ZIP]', 'credit_card': '[CREDIT_CARD]', 'date': '[DATE]', 'email': '[EMAIL]', 'ip': '[IP]', 'ipv6': '[IPV6]', 'link': '[LINK]', 'phone_number_US': '[PHONE_NUMBER_US]', 'phone_number_US_with_ext': '[PHONE_NUMBER_US_WITH_EXT]', 'po_box': '[PO_BOX]', 'price': '[PRICE]', 'ssn': '[SSN]', 'time': '[TIME]', 'zip_code': '[ZIP_CODE]'}
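The two accepted forms of `fields` can be illustrated with a small stand-alone sketch. The function below is not the ADS implementation; the two regular expressions in `PATTERNS` are deliberately simple assumptions made for this example, since the patterns used by the library are not shown here.

```python
import re
from typing import Dict, List, Union

# Illustrative patterns only (assumed for this sketch).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone_number_US": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text: str, fields: Union[List[str], Dict[str, str]]) -> str:
    """Replace every match of the requested fields in ``text``."""
    if isinstance(fields, dict):
        # Dictionary form: the caller supplies the replacement text.
        replacements = dict(fields)
    else:
        # List form: use the bracketed, upper-cased field name.
        replacements = {name: f"[{name.upper()}]" for name in fields}
    for name, replacement in replacements.items():
        # A function replacement avoids backslash processing in the text.
        text = PATTERNS[name].sub(lambda match, r=replacement: r, text)
    return text
```

With the list form, `redact("Jane's phone number is 123-456-7890", ["phone_number_US"])` yields the bracketed placeholder; with the dictionary form, the supplied text is used instead.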

property ssn

property time

property zip_code

ads.feature_engineering.adsstring.oci_language module

ads.feature_engineering.adsstring.string module

ads.feature_engineering.feature_type.address module

The module that represents an Address feature type.

Classes:

Address: The Address feature type.

class ads.feature_engineering.feature_type.address.Address

Bases: String

Type representing address.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the location of the given address on a map, based on its zip code.

Example

>>> from ads.feature_engineering.feature_type.address import Address
>>> import pandas as pd
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
                        '1 Berkeley Street, Boston, MA 67891',
                        '54305 Oxford Street, Seattle, WA 95132',
                        ''])
>>> Address.validator.is_address(address)
0     True
1     True
2     True
3    False
dtype: bool

description = 'Type representing address.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 1
    unique: 3
values: Address

Returns:: Domain based on the Address feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the location of the given address on a map, based on its zip code.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_plot()

Returns:: Plot object for the series based on the Address feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      3
2  missing      1

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame
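The three basic metrics can be reproduced with a short plain-Python sketch. It assumes, based only on the example outputs in this reference, that `None`, NaN, and the empty string all count as missing; `is_missing` and `basic_stats` are hypothetical helper names, and the code works on a plain list rather than a pandas Series.

```python
import math
from typing import Any, Dict, Iterable


def is_missing(value: Any) -> bool:
    # Assumption: None, NaN, and "" are treated as missing values.
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def basic_stats(values: Iterable[Any]) -> Dict[str, int]:
    values = list(values)
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),                    # total number of entries
        "unique": len(set(present)),             # distinct non-missing values
        "missing": len(values) - len(present),   # entries treated as missing
    }


address = [
    "1 Miller Drive, New York, NY 12345",
    "1 Berkeley Street, Boston, MA 67891",
    "54305 Oxford Street, Seattle, WA 95132",
    "",
]
stats = basic_stats(address)
```

For the address list this gives a count of 4, 3 unique values, and 1 missing value, matching the table above.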

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.address.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pd.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.base module

class ads.feature_engineering.feature_type.base.FeatureBaseType(classname, bases, dictionary)

Bases: type

The helper metaclass to extend the functionality of the FeatureType class.

class ads.feature_engineering.feature_type.base.FeatureBaseTypeMeta(classname, bases, dictionary)

Bases: FeatureBaseType, ABCMeta

The class to provide compatibility between ABC and FeatureBaseType metaclass.

class ads.feature_engineering.feature_type.base.FeatureType

Bases: ABC

Abstract base class for feature types. Default class attributes include name and description. The name is generated automatically by camel-to-snake conversion of the class name unless specified explicitly.

description = 'Base feature type.'

name = 'feature_type'

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
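The camel-to-snake conversion mentioned in the FeatureType description can be sketched with two regular-expression substitutions. This is a common recipe, not necessarily the one ADS uses; `camel_to_snake` is a hypothetical helper name.

```python
import re


def camel_to_snake(name: str) -> str:
    # First pass: put "_" before a capitalized word that follows any character.
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Second pass: put "_" between a lower-case letter or digit and an upper-case letter.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()
```

Under this recipe `NewType` becomes `new_type` and `DateTime` becomes `date_time`, which agrees with the names listed by the feature type manager; an all-caps name such as `GIS` is simply lower-cased.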

class ads.feature_engineering.feature_type.base.Name: Bases: object

class ads.feature_engineering.feature_type.base.Tag(name: str)

Bases: object

Class for free form tags. Name must be specified.

Initialize a tag instance.

Parameters:: name (str) – The name of the tag.

ads.feature_engineering.feature_type.boolean module

The module that represents a Boolean feature type.

Classes:

Boolean: The feature type that represents binary values True/False.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.boolean.Boolean

Bases: FeatureType

Type representing binary values True/False.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in True/False using bars.

Examples

>>> from ads.feature_engineering.feature_type.boolean import Boolean
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> Boolean.validator.is_boolean(s)
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool

description = 'Type representing binary values True/False.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_domain()
constraints:
- expression: $x in [True, False]
  language: python
stats:
    count: 6
    missing: 2
    unique: 2
values: Boolean

Returns:: Domain based on the Boolean feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the counts of observations in True/False using bars.

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Plot object for the series based on the Boolean feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_plot()

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.boolean.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.category module

The module that represents a Category feature type.

Classes:

Category: The Category feature type.

class ads.feature_engineering.feature_type.category.Category

Bases: FeatureType

Type representing discrete unordered values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing discrete unordered values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_domain()
constraints:
- expression: $x in ['S', 'C', 'Q', '']
  language: python
stats:
    count: 22
    missing: 3
    unique: 3
values: Category

Returns:: Domain based on the Category feature type.
Return type:: ads.feature_engineering.schema.Domain
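The constraint shown in the domain output is a Python expression in which `$x` stands for a candidate value. One way such an expression could be checked is sketched below with `string.Template`. `satisfies` is a hypothetical helper, not part of the ADS API, and `eval` is used purely for illustration; it should not be applied to untrusted expressions.

```python
from string import Template
from typing import Any


def satisfies(expression: str, value: Any) -> bool:
    # Substitute the repr of the value for "$x" ...
    source = Template(expression).substitute(x=repr(value))
    # ... and evaluate the result with no builtins available.
    return bool(eval(source, {"__builtins__": {}}, {}))


expression = "$x in ['S', 'C', 'Q', '']"
in_domain = satisfies(expression, "S")       # value listed in the constraint
out_of_domain = satisfies(expression, "X")   # value not listed
```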

static feature_plot(x: Series) → Axes

Shows the counts of observations in each categorical bin using bar chart.

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Plot object for the series based on the Category feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_plot()

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there are any.

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.constant module

The module that represents a Constant feature type.

Classes:

Constant: The Constant feature type.

class ads.feature_engineering.feature_type.constant.Constant

Bases: FeatureType

Type representing constant values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in bars.

description = 'Type representing constant values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 5
    unique: 1
values: Constant

Returns:: Domain based on the Constant feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the counts of observations in bars.

Parameters:: x (pandas.Series) – The feature being shown.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_plot()

Returns:: Plot object for the series based on the Constant feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_stat()
   Metric  Value
0   count      5
1  unique      1

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.continuous module

The module that represents a Continuous feature type.

Classes:

Continuous: The Continuous feature type.

class ads.feature_engineering.feature_type.continuous.Continuous

Bases: FeatureType

Type representing continuous values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using box plot.

description = 'Type representing continuous values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_domain()
constraints: []
stats:
    count: 10.0
    lower quartile: 3.058
    mean: 4.959
    median: 3.81
    missing: 2.0
    sample maximum: 13.32
    sample minimum: 2.25
    skew: 2.175
    standard deviation: 3.62
    upper quartile: 4.908
values: Continuous

Returns:: Domain based on the Continuous feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datasets using box plot.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_plot()

Returns:: Plot object for the series based on the Continuous feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, skew and missing(count).

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_stat()
               Metric   Value
0               count  10.000
1                mean   4.959
2  standard deviation   3.620
3      sample minimum   2.250
4      lower quartile   3.058
5              median   3.810
6      upper quartile   4.908
7      sample maximum  13.320
8                skew   2.175
9             missing   2.000

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame
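Most of the figures in the table above can be cross-checked with the standard library alone. The sketch below assumes the quartiles use linear interpolation between data points (the "inclusive" method of `statistics.quantiles`) and that the standard deviation is the sample standard deviation; skew is left out. `continuous_stats` is a hypothetical helper, not the ADS implementation.

```python
import math
import statistics
from typing import Dict, Iterable, Optional


def continuous_stats(values: Iterable[Optional[float]]) -> Dict[str, float]:
    values = list(values)
    # Drop None and NaN before computing the numeric summaries.
    present = [v for v in values if v is not None and not math.isnan(v)]
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(present),
        "standard deviation": statistics.stdev(present),  # sample (n - 1) form
        "sample minimum": min(present),
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": max(present),
        "missing": len(values) - len(present),
    }


cts = [13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, float("nan"), None]
stats = continuous_stats(cts)
```

Rounded to three decimals, the values in `stats` agree with the corresponding rows of the table.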

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.creditcard module

The module that represents a CreditCard feature type.

Classes:

CreditCard: The CreditCard feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.
_luhn_checksum(card_number: str) -> float: Implements Luhn algorithm to validate a credit card number.
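For readers unfamiliar with the Luhn algorithm, a textbook version is sketched below. It is not the module's private `_luhn_checksum` function, whose body is not shown in this reference; `luhn_is_valid` is a hypothetical name and returns a boolean rather than a checksum value.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if ``card_number`` passes the Luhn check."""
    if not card_number.isdecimal():
        return False
    total = 0
    # Walk the digits from right to left, doubling every second one.
    for position, char in enumerate(reversed(card_number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                # Same as adding the two digits of the doubled value.
                digit -= 9
        total += digit
    # A valid number has a digit sum divisible by 10.
    return total % 10 == 0
```

The first card number used in the examples of this module, "4532640527811543", passes this check.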

class ads.feature_engineering.feature_type.creditcard.CreditCard

Bases: String

Type representing credit card numbers.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in each credit card type using bar chart.

Examples

>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> import pandas as pd
>>> s = pd.Series(["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"], name='credit_card')
>>> s.ads.feature_type = ['credit_card']
>>> CreditCard.validator.is_credit_card(s)
0     True
1    False
2     True
3     True
4     True
5     True
Name: credit_card, dtype: bool

description = 'Type representing credit card numbers.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_domain()
constraints: []
stats:
    count: 16
    count_Amex: 5
    count_Diners Club: 2
    count_MasterCard: 3
    count_Visa: 5
    count_missing: 1
    missing: 1
    unique: 15
values: CreditCard

Returns:: Domain based on the CreditCard feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the counts of observations in each credit card type using bar chart.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_plot()

Returns:: Plot object for the series based on the CreditCard feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series)

Generates feature statistics.

Feature statistics include (total)count, unique(count), missing(count) and the count of each credit card type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_stat()
              Metric  Value
0              count     16
1             unique     15
2            missing      1
3         count_Amex      5
4         count_Visa      5
5   count_MasterCard      3
6  count_Diners Club      2
7      count_missing      1

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.creditcard.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.datetime module

The module that represents a DateTime feature type.

Classes:

DateTime: The DateTime feature type.

class ads.feature_engineering.feature_type.datetime.DateTime

Bases: FeatureType

Type representing date and/or time.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datetime datasets using histograms.

Example

>>> from ads.feature_engineering.feature_type.datetime import DateTime
>>> import pandas as pd
>>> s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> DateTime.validator.is_datetime(s)
0     True
1     True
2    False
3     True
Name: datetime, dtype: bool

description = 'Type representing date and/or time.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 8
    missing: 3
    sample maximum: April/15/11
    sample minimum: 3/11/2000
values: DateTime

Returns:: Domain based on the DateTime feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datetime datasets using histograms.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_plot()

Returns:: Plot object for the series based on the DateTime feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, sample maximum, sample minimum, and missing(count) if there is any.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_stat()
           Metric        Value
0           count            8
1  sample maximum  April/15/11
2  sample minimum    3/11/2000
3         missing            3

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.datetime.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series
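A handler of this kind returns one boolean per entry. The sketch below shows the idea for date strings in plain Python, assuming a small, hypothetical list of accepted formats chosen to cover the examples in this module; the parsing rules of the ADS handler are not shown in this reference and may differ.

```python
from datetime import datetime
from typing import Any, List

# Hypothetical formats, picked to cover the example values in this module.
FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%B/%d/%Y", "%B/%d/%y")


def looks_like_datetime(value: Any) -> bool:
    # Missing values and non-strings never match.
    if not isinstance(value, str) or not value:
        return False
    for fmt in FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def check_all(values: List[Any]) -> List[bool]:
    # Element-wise result, analogous to the boolean Series in the example.
    return [looks_like_datetime(v) for v in values]


result = check_all(["12/12/12", "12/12/13", None, "12/12/14"])
```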

ads.feature_engineering.feature_type.discrete module

The module that represents a Discrete feature type.

Classes:

Discrete: The Discrete feature type.

class ads.feature_engineering.feature_type.discrete.Discrete

Bases: FeatureType

Type representing discrete values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using box plot.

description = 'Type representing discrete values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_domain()
constraints: []
stats:
    count: 4
    unique: 4
values: Discrete

Returns:: Domain based on the Discrete feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datasets using box plot.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_plot()

Returns:: Plot object for the series based on the Discrete feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_stat()
            discrete
count   4
unique  4

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.document module

The module that represents a Document feature type.

Classes:

Document: The Document feature type.

class ads.feature_engineering.feature_type.document.Document

Bases: FeatureType

Type representing document values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

description = 'Type representing document values.'

classmethod feature_domain()

Returns:: Nothing.
Return type:: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.gis module

The module that represents a GIS feature type.

Classes:

GIS: The GIS feature type.

class ads.feature_engineering.feature_type.gis.GIS

Bases: FeatureType

Type representing geographic information.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the location of the given address on a map, based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.gis import GIS
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='gis')
>>> s.ads.feature_type = ['gis']
>>> GIS.validator.is_gis(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: gis, dtype: bool

description = 'Type representing geographic information.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: GIS

Returns:: Domain based on the GIS feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the location of the given address on a map, based on longitude and latitude.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_plot()

Returns:: Plot object for the series based on the GIS feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_stat()
        gis
count   13
unique  10
missing 3

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame
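
The three statistics above can be illustrated without pandas. The following is a minimal sketch, not the ADS implementation: `is_missing` and `summarize` are hypothetical helpers, and treating the empty string as missing is an assumption inferred from the documented example output.

```python
import math
from typing import Any, Dict, Iterable


def is_missing(value: Any) -> bool:
    """Treat None, NaN and the empty string as missing (an assumption)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def summarize(values: Iterable[Any]) -> Dict[str, int]:
    """Return the total count, the number of distinct present values,
    and the number of missing values."""
    items = list(values)
    present = [v for v in items if not is_missing(v)]
    return {
        "count": len(items),
        "unique": len(set(present)),
        "missing": len(items) - len(present),
    }


coordinates = [
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    math.nan,
    None,
]
print(summarize(coordinates))  # {'count': 13, 'unique': 10, 'missing': 3}
```

Note that `count` covers every entry, including the missing ones, which matches the totals shown in the example above.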

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.gis.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.integer module

The module that represents an Integer feature type.

Classes:

Integer: The Integer feature type.

class ads.feature_engineering.feature_type.integer.Integer

Bases: FeatureType

Type representing integer values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using box plot.

description = 'Type representing integer values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='integer')
>>> s.ads.feature_type = ['integer']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    freq: 2
    missing: 2
    top: true
    unique: 2
values: Integer

Returns:: Domain based on the Integer feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datasets using box plot.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_plot()

Returns:: Plot object for the series based on the Integer feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, and missing(count) if there are any.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_stat()
               Metric  Value
0               count      7
1                mean      1
2  standard deviation      1
3      sample minimum      0
4      lower quartile      1
5              median      1
6      upper quartile      2
7      sample maximum      4
8             missing      1

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame
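
The metrics listed above can be approximated with the standard library alone. The sketch below is illustrative and is not how ADS computes them: `describe_numeric` is a hypothetical helper, and the inclusive quantile method is an assumption, chosen because it interpolates linearly between data points. It keeps fractional values, whereas the example output above shows whole numbers.

```python
import math
import statistics
from typing import Dict, Iterable, Optional


def describe_numeric(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """Summarize numeric data, ignoring None and NaN for everything
    except the total count and the missing count."""
    items = list(values)
    present = sorted(v for v in items if v is not None and not math.isnan(v))
    # Three cut points that split the sorted data into four equal groups.
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(items),
        "mean": statistics.fmean(present),
        "standard deviation": statistics.stdev(present),
        "sample minimum": present[0],
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": present[-1],
        "missing": len(items) - len(present),
    }


summary = describe_numeric([1, 0, 1, 2, 3, 4, math.nan])
for metric, value in summary.items():
    print(f"{metric:>20}  {value}")
```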

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address module

The module that represents an IpAddress feature type.

Classes:

IpAddress: The IpAddress feature type.

class ads.feature_engineering.feature_type.ip_address.IpAddress

Bases: FeatureType

Type representing IP Address.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address import IpAddress
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> IpAddress.validator.is_ip_address(s)
0     True
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 3
values: IpAddress

Returns:: Domain based on the IpAddress feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  3
2       missing 2

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series
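
As a rough illustration of the kind of element-wise check such a handler performs, the sketch below uses the standard library `ipaddress` module. It is not the ADS implementation: `is_ip_address` here is a hypothetical standalone function, and the optional `version` argument is an assumption added to show how IPv4-only or IPv6-only checks could be expressed.

```python
import ipaddress
from typing import Any, Optional


def is_ip_address(value: Any, version: Optional[int] = None) -> bool:
    """Return True if value is a string holding a valid IP address.

    When version is 4 or 6, only addresses of that version are accepted.
    """
    if not isinstance(value, str):
        return False  # None, NaN and other non-strings never match
    try:
        parsed = ipaddress.ip_address(value)
    except ValueError:
        return False
    return version is None or parsed.version == version


samples = ["192.168.0.1", "2001:db8::", "", float("nan"), None]
print([is_ip_address(v) for v in samples])
# [True, True, False, False, False]
print([is_ip_address(v, version=4) for v in samples])
# [True, False, False, False, False]
```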

ads.feature_engineering.feature_type.ip_address_v4 module

The module that represents an IpAddressV4 feature type.

Classes:

IpAddressV4: The IpAddressV4 feature type.

class ads.feature_engineering.feature_type.ip_address_v4.IpAddressV4

Bases: FeatureType

Type representing IP Address V4.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v4 import IpAddressV4
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> IpAddressV4.validator.is_ip_address_v4(s)
0     True
1    False
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address V4.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.NaN, None], name='ip_address_v4')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 4
values: IpAddressV4

Returns:: Domain based on the IpAddressV4 feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  4
2       missing 2

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address_v4.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.ip_address_v6 module

The module that represents an IpAddressV6 feature type.

Classes:

IpAddressV6: The IpAddressV6 feature type.

class ads.feature_engineering.feature_type.ip_address_v6.IpAddressV6

Bases: FeatureType

Type representing IP Address V6.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v6 import IpAddressV6
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> IpAddressV6.validator.is_ip_address_v6(s)
0    False
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address V6.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address_v6')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 2
values: IpAddressV6

Returns:: Domain based on the IpAddressV6 feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  2
2       missing 2

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ip_address_v6.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.lat_long module

The module that represents a LatLong feature type.

Classes:

LatLong: The LatLong feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.lat_long.LatLong

Bases: String

Type representing longitude and latitude.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.lat_long import LatLong
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='latlong')
>>> s.ads.feature_type = ['lat_long']
>>> LatLong.validator.is_lat_long(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: latlong, dtype: bool

description = 'Type representing longitude and latitute.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> latlong_series = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: LatLong

Returns:: Domain based on the LatLong feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.NaN,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_plot()

Returns:: Plot object for the series based on the LatLong feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.NaN,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_stat()
    Metric  Value
0       count   13
1       unique  10
2       missing 3

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.lat_long.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series
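
To make the idea of "matches requirements" concrete, here is a minimal sketch of a coordinate-pair check. It is an illustration under stated assumptions, not the pattern ADS uses: `looks_like_lat_long` is a hypothetical function, the value is assumed to be ordered latitude first and longitude second (as the sample data above suggests), and the range limits of ±90 and ±180 degrees are an added assumption.

```python
import re
from typing import Any

# Two signed decimal numbers separated by a comma, with optional spaces.
_PAIR = re.compile(r"^([-+]?\d+(?:\.\d+)?)\s*,\s*([-+]?\d+(?:\.\d+)?)$")


def looks_like_lat_long(value: Any) -> bool:
    """Return True for strings such as '51.8, 175.9' or '(51.8, 175.9)'."""
    if not isinstance(value, str):
        return False  # None and NaN are not coordinate strings
    text = value.strip()
    # Accept one balanced pair of surrounding parentheses.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1].strip()
    match = _PAIR.match(text)
    if match is None:
        return False
    latitude, longitude = (float(group) for group in match.groups())
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


samples = ["-18.2193965, -93.587285", "(51.816119, 175.979008)", None, "95.0, 10.0"]
print([looks_like_lat_long(v) for v in samples])
# [True, True, False, False]
```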

ads.feature_engineering.feature_type.object module

The module that represents an Object feature type.

Classes:

Object: The Object feature type.

class ads.feature_engineering.feature_type.object.Object

Bases: FeatureType

Type representing object.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

description = 'Type representing object.'

classmethod feature_domain()

Returns:: Nothing.
Return type:: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.ordinal module

The module that represents an Ordinal feature type.

Classes:

Ordinal: The Ordinal feature type.

class ads.feature_engineering.feature_type.ordinal.Ordinal

Bases: FeatureType

Type representing ordered values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing ordered values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_domain()
constraints:
- expression: $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    language: python
stats:
    count: 10
    missing: 1
    unique: 9
values: Ordinal

Returns:: Domain based on the Ordinal feature type.
Return type:: ads.feature_engineering.schema.Domain
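
The constraint in the example above is a Python membership expression in which `$x` stands for a candidate value. The sketch below shows one way such an expression could be built and checked. It is an assumption for illustration only and does not describe how ADS evaluates constraints; `build_membership_expression` and `satisfies` are hypothetical helpers.

```python
import math
from string import Template
from typing import Any, Iterable


def build_membership_expression(values: Iterable[Any]) -> str:
    """Build a '$x in [...]' expression from the distinct non-missing values."""
    distinct = sorted(
        {
            v
            for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))
        }
    )
    return f"$x in {distinct}"


def satisfies(expression: str, value: Any) -> bool:
    """Substitute value for $x and evaluate the resulting expression.

    eval is used only because the expression is generated locally;
    builtins are disabled to keep the evaluation narrow.
    """
    code = Template(expression).substitute(x=repr(value))
    return bool(eval(code, {"__builtins__": {}}, {}))


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, math.nan]
expression = build_membership_expression(data)
print(expression)
# $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(satisfies(expression, 3.0), satisfies(expression, 10.0))
# True False
```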

static feature_plot(x: Series) → Axes

Shows the counts of observations in each categorical bin using bar chart.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_plot()

Returns:: The bar chart plot object for the series based on the Ordinal feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count), and missing(count) if there is any.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_stat()
    Metric  Value
0       count   10
1       unique  9
2       missing 1

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.phone_number module

The module that represents a Phone Number feature type.

Classes:

PhoneNumber: The Phone Number feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.phone_number.PhoneNumber

Bases: String

Type representing phone numbers.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

Examples

>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> import pandas as pd
>>> s = pd.Series([None, "1-640-124-5367", "1-573-916-4412"])
>>> PhoneNumber.validator.is_phone_number(s)
0    False
1     True
2     True
dtype: bool

description = 'Type representing phone numbers.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.NaN, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 7
    missing: 4
    unique: 2
values: PhoneNumber

Returns:: Domain based on the PhoneNumber feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.NaN, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_stat()
    Metric  Value
0       count   7
1       unique  2
2       missing 4

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.phone_number.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.string module

The module that represents a String feature type.

Classes:

String: The feature type that represents string values.

class ads.feature_engineering.feature_type.string.String

Bases: FeatureType

Type representing string values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using wordcloud.

Example

>>> from ads.feature_engineering.feature_type.string import String
>>> import pandas as pd
>>> s = pd.Series(["Hello", "world", None], name='string')
>>> String.validator.is_string(s)
0     True
1     True
2    False
Name: string, dtype: bool

description = 'Type representing string values.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_domain()
constraints: []
stats:
    count: 22
    missing: 3
    unique: 3
values: String

Returns:: Domain based on the String feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows distributions of datasets using wordcloud.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_plot()

Returns:: Plot object for the series based on the String feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_stat()
    Metric  Value
0       count   22
1       unique  3
2       missing 3

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.string.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.text module

The module that represents a Text feature type.

Classes:

Text: The Text feature type.

class ads.feature_engineering.feature_type.text.Text

Bases: String

Type representing text values.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_plot(x: pd.Series) → plt.Axes: Shows distributions of datasets using wordcloud.

description = 'Type representing text values.'

classmethod feature_domain()

Returns:: Nothing.
Return type:: None

static feature_plot(x: Series) → Axes

Shows distributions of datasets using wordcloud.

Examples

>>> text = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='text')
>>> text.ads.feature_type = ['text']
>>> text.ads.feature_plot()

Returns:: Plot object for the series based on the Text feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.unknown module

The module that represents an Unknown feature type.

Classes:

Unknown: The Unknown feature type.

class ads.feature_engineering.feature_type.unknown.Unknown

Bases: FeatureType

Type representing third-party dtypes.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

description = 'Type representing unknown type.'

classmethod feature_domain()

Returns:: Nothing.
Return type:: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.zip_code module

The module that represents a ZipCode feature type.

Classes:

ZipCode: The ZipCode feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.zip_code.ZipCode

Bases: String

Type representing postal code.

description

The feature type description.

Type:: str

name

The feature type name.

Type:: str

warning

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes: Shows the geometry distribution based on the location of the zip code.

Example

>>> from ads.feature_engineering.feature_type.zip_code import ZipCode
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["94065", "90210", np.NaN, None], name='zipcode')
>>> ZipCode.validator.is_zip_code(s)
0     True
1     True
2    False
3    False
Name: zipcode, dtype: bool

description = 'Type representing postal code.'

classmethod feature_domain(x: Series) → Domain

Generate the domain of the data of this feature type.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 2
    unique: 2
values: ZipCode

Returns:: Domain based on the ZipCode feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes

Shows the geometry distribution based on the location of the zip code.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_plot()

Returns:: Plot object for the series based on the ZipCode feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_stat()
    Metric  Value
0       count   4
1       unique  2
2       missing 2

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>

ads.feature_engineering.feature_type.zip_code.default_handler(data: Series, *args, **kwargs) → Series

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.handler.feature_validator module

The module that helps to register custom validators for the feature types and to extend registered validators with dispatching based on specific arguments.

Classes

FeatureValidator
The Feature Validator class to manage custom validators.

FeatureValidatorMethod
The Feature Validator Method class. Extends methods that require dispatching based on specific arguments.

class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator

Bases: object

The Feature Validator class to manage custom validators.

register(self, name: str, handler: Callable, condition: Union[Tuple, Dict[str, Any]] = None, replace: bool = False) → None: Registers new validator.

unregister(self, name: str, condition: Union[Tuple, Dict[str, Any]] = None) → None: Unregisters validator.

registered(self) → pd.DataFrame: Gets the list of registered validators.
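
The `condition` argument accepts either a tuple of argument names or a dictionary of argument values. The sketch below illustrates one plausible way to dispatch on such conditions, using only the standard library. It is not the ADS implementation: `ConditionalDispatcher` is a hypothetical class, and the precedence rule (value match over name match over default) is an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Condition = Union[Tuple[str, ...], Dict[str, Any]]


class ConditionalDispatcher:
    """Hypothetical registry that picks a handler from keyword arguments."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Condition, Callable[..., Any]]] = []

    def register(self, handler: Callable[..., Any], condition: Condition = ()) -> None:
        """Store a handler together with the condition that selects it."""
        self._entries.append((condition, handler))

    @staticmethod
    def _rank(condition: Condition, kwargs: Dict[str, Any]) -> int:
        """Return -1 for no match, otherwise a specificity score."""
        if not condition:
            return 0  # default handler: always eligible
        if isinstance(condition, dict):
            # Every listed argument must be present with the exact value.
            matched = all(k in kwargs and kwargs[k] == v for k, v in condition.items())
            return 2 if matched else -1
        # Tuple condition: every listed argument name must be present.
        return 1 if all(name in kwargs for name in condition) else -1

    def __call__(self, data: Any, **kwargs: Any) -> Any:
        ranked = [(self._rank(c, kwargs), h) for c, h in self._entries]
        rank, handler = max(ranked, key=lambda pair: pair[0], default=(-1, None))
        if rank < 0 or handler is None:
            raise LookupError("no handler matches the given arguments")
        return handler(data, **kwargs)


dispatch = ConditionalDispatcher()
dispatch.register(lambda data, **kw: "default")
dispatch.register(lambda data, **kw: "any country", condition=("country_code",))
dispatch.register(lambda data, **kw: "US", condition={"country_code": "+1"})

numbers = ["+1-202-555-0141", "+1-202-555-0142"]
print(dispatch(numbers))                      # default
print(dispatch(numbers, country_code="+7"))   # any country
print(dispatch(numbers, country_code="+1"))   # US
```

Ranking by specificity keeps the registration order irrelevant, so a general handler registered late does not shadow a more specific one.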

Examples

>>> series = pd.Series(['+1-202-555-0141', '+1-202-555-0142'], name='Phone Number')

>>> def phone_number_validator(data: pd.Series) -> pd.Series:
...    print("phone_number_validator")
...    return data

>>> def universal_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...    print("universal_phone_number_validator")
...    return data

>>> def us_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...    print("us_phone_number_validator")
...    return data

>>> PhoneNumber.validator.register(name="is_phone_number", handler=phone_number_validator, replace=True)
>>> PhoneNumber.validator.register(name="is_phone_number", handler=universal_phone_number_validator, condition = ('country_code',))
>>> PhoneNumber.validator.register(name="is_phone_number", handler=us_phone_number_validator, condition = {'country_code':'+1'})

>>> PhoneNumber.validator.is_phone_number(series)
    phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> PhoneNumber.validator.is_phone_number(series, country_code = '+7')
    universal_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> PhoneNumber.validator.is_phone_number(series, country_code = '+1')
    us_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> PhoneNumber.validator.registered()
               Validator                 Condition                            Handler
    ---------------------------------------------------------------------------------
    0    is_phone_number                        ()             phone_number_validator
    1    is_phone_number        ('country_code',)   universal_phone_number_validator
    2    is_phone_number    {'country_code': '+1'}          us_phone_number_validator

>>> series.ads.validator.is_phone_number()
    phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> series.ads.validator.is_phone_number(country_code = '+7')
    universal_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

>>> series.ads.validator.is_phone_number(country_code = '+1')
    us_phone_number_validator
    0     +1-202-555-0141
    1     +1-202-555-0142

Initializes the FeatureValidator.

register(name: str, handler: Callable, condition: Optional[Union[Tuple, Dict[str, Any]]] = None, replace: bool = False) → None

Registers new validator.

Parameters:

name (str) – The validator name.
handler (callable) – The handler.
condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator.
replace (bool) – The flag indicating if the registered validator should be replaced with the new one.

Returns:

Nothing.

Return type:

None

Raises:

ValueError – The name is empty or handler is not provided.
TypeError – The handler is not callable. The name of the validator is not a string.
ValidatorAlreadyExists – The validator is already registered.

registered() → DataFrame

Gets the list of registered validators.

Returns:: The list of registered validators.
Return type:: pd.DataFrame

unregister(name: str, condition: Optional[Union[Tuple, Dict[str, Any]]] = None) → None

Unregisters validator.

Parameters:

name (str) – The name of the validator to be unregistered.
condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator to be unregistered.

Returns:

Nothing.

Return type:

None

Raises:

TypeError – The name of the validator is not a string.
ValidatorNotFound – The validator is not found.
ValidatorWithConditionNotFound – The validator with the provided condition is not found.

class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidatorMethod(handler: Callable)

Bases: object

The Feature Validator Method class.

Extends methods that require dispatching based on specific arguments.

register(self, condition: Union[Tuple, Dict[str, Any]], handler: Callable) → None: Registers new handler.

unregister(self, condition: Union[Tuple, Dict[str, Any]]) → None: Unregisters existing handler.

registered(self) → pd.DataFrame: Gets the list of registered handlers.

Initializes the Feature Validator Method.

Parameters:: handler (Callable) – The handler that will be called by default if no suitable one is found.

register(condition: Union[Tuple, Dict[str, Any]], handler: Callable) → None

Registers new handler.

Parameters:

condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to register a new handler.
handler (Callable) – The handler to be registered.

Returns:

Nothing.

Return type:

None

Raises:

ValueError – If the condition is not provided or is in the wrong format, or if the handler is not provided or has the wrong format.

registered() → DataFrame

Gets the list of registered handlers.

Returns:: The list of registered handlers.
Return type:: pd.DataFrame

unregister(condition: Union[Tuple, Dict[str, Any]]) → None

Unregisters existing handler.

Parameters:: condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to unregister a handler.
Returns:: Nothing.
Return type:: None
Raises:: ValueError – If the condition is not provided, is in the wrong format, or is not registered.
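
The dispatching behaviour described for this class can be sketched in plain Python. The class `DispatchingMethod` below is a hypothetical stand-in, not the ADS implementation: it assumes that a dict condition matches on argument names and values, a tuple condition matches on argument names only, and the default handler is used when nothing else matches.

```python
class DispatchingMethod:
    """Hypothetical stand-in for a method that dispatches on keyword arguments."""

    def __init__(self, handler):
        self._default = handler
        self._by_names = {}   # frozenset of argument names -> handler
        self._by_values = {}  # frozenset of (name, value) pairs -> handler

    def register(self, condition, handler):
        if not callable(handler):
            raise ValueError("handler must be callable")
        if isinstance(condition, dict) and condition:
            self._by_values[frozenset(condition.items())] = handler
        elif isinstance(condition, tuple) and condition:
            self._by_names[frozenset(condition)] = handler
        else:
            raise ValueError("condition must be a non-empty tuple or dict")

    def __call__(self, data, **kwargs):
        # Most specific match first: names and values, then names only, then default.
        handler = self._by_values.get(frozenset(kwargs.items()))
        if handler is None:
            handler = self._by_names.get(frozenset(kwargs))
        if handler is None:
            handler = self._default
        return handler(data, **kwargs)


def default_validator(data):
    return "default"


def any_country(data, country_code):
    return "any country"


def us_only(data, country_code):
    return "US only"


is_phone_number = DispatchingMethod(default_validator)
is_phone_number.register(("country_code",), any_country)
is_phone_number.register({"country_code": "+1"}, us_only)

numbers = ["+1-202-555-0141", "+1-202-555-0142"]
results = [
    is_phone_number(numbers),
    is_phone_number(numbers, country_code="+7"),
    is_phone_number(numbers, country_code="+1"),
]
print(results)
```

Checking the value-based conditions before the name-based ones is a design choice of this sketch; it keeps the most specific handler from being shadowed by a more general one.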

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorAlreadyExists(name: str): Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorNotFound(name: str): Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionAlreadyExists(name: str): Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionNotFound(name: str): Bases: ValueError

exception ads.feature_engineering.feature_type.handler.feature_validator.WrongHandlerMethodSignature(handler_name: str, condition: str, handler_signature: str): Bases: ValueError

ads.feature_engineering.feature_type.handler.feature_warning module

The module that helps to register custom warnings for the feature types.

Classes

FeatureWarning
The Feature Warning class. Provides functionality to register warning handlers and invoke them.

Examples

>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
                    Name                               Handler
    ----------------------------------------------------------
    0         zeros_count          warning_handler_zeros_count
    1    zeros_percentage     warning_handler_zeros_percentage

>>> warning.zeros_count(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38

>>> warning.zeros_percentage(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
    1          Zeros   Age has 12.2% zeros     Percentage      12.2%

>>> warning.unregister('zeros_count')
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

class ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning

Bases: object

The Feature Warning class.

Provides functionality to register warning handlers and invoke them.

register(self, name: str, handler: Callable) → None: Registers a new warning for the feature type.

unregister(self, name: str) → None: Unregisters warning.

registered(self) → pd.DataFrame: Gets the list of registered warnings.

Examples

>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...    return pd.DataFrame(
...        [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...        columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
                  Warning                              Handler
    ----------------------------------------------------------
    0         zeros_count          warning_handler_zeros_count
    1    zeros_percentage     warning_handler_zeros_percentage

>>> warning.zeros_count(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38

>>> warning.zeros_percentage(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros      Age has 38 zeros          Count         38
    1          Zeros   Age has 12.2% zeros     Percentage      12.2%

>>> warning.unregister('zeros_count')
>>> warning(data_series)
             Warning               Message         Metric      Value
    ----------------------------------------------------------------
    0          Zeros   Age has 12.2% zeros     Percentage      12.2%

Initializes the FeatureWarning.

register(name: str, handler: Callable, replace: bool = False) → None

Registers a new warning.

Parameters:

name (str) – The warning name.
handler (callable) – The handler associated with the warning.
replace (bool) – The flag indicating if the registered warning should be replaced with the new one.

Returns:

Nothing

Return type:

None

Raises:

ValueError – If warning name is empty or handler not defined.
TypeError – If handler is not callable.
WarningAlreadyExists – If warning is already registered.

registered() → DataFrame

Gets the list of registered warnings.

Returns:: The list of registered warnings in DataFrame format.
Return type:: pd.DataFrame

Examples

>>> warning.registered()
                     Name                               Handler
    -----------------------------------------------------------
    0         zeros_count           warning_handler_zeros_count
    1    zeros_percentage      warning_handler_zeros_percentage

unregister(name: str) → None

Unregisters warning.

Parameters:

name (str) – The name of warning to be unregistered.

Returns:

Nothing.

Return type:

None

Raises:

ValueError – If warning name is not provided or empty.
WarningNotFound – If warning not found.
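
The register, invoke and unregister cycle documented for this class can be sketched as a small named-handler registry. Everything below is a hypothetical illustration rather than the ADS implementation: the class `WarningRegistry` uses lists of dictionaries in place of DataFrames, and it raises the built-in ValueError where the library raises its own WarningAlreadyExists and WarningNotFound subclasses.

```python
class WarningRegistry:
    """Hypothetical stand-in: named handlers that each return a list of row dicts."""

    def __init__(self):
        self._handlers = {}  # insertion order is preserved

    def register(self, name, handler, replace=False):
        if not name or handler is None:
            raise ValueError("Warning name and handler must be provided.")
        if not callable(handler):
            raise TypeError("Handler must be callable.")
        if name in self._handlers and not replace:
            raise ValueError(f"Warning {name!r} is already registered.")
        self._handlers[name] = handler

    def unregister(self, name):
        if not name:
            raise ValueError("Warning name is not provided.")
        if name not in self._handlers:
            raise ValueError(f"Warning {name!r} not found.")
        del self._handlers[name]

    def __call__(self, data):
        # Invoke every registered handler and concatenate the rows they return.
        rows = []
        for handler in self._handlers.values():
            rows.extend(handler(data))
        return rows


def zeros_count(data):
    count = sum(1 for v in data if v == 0)
    return [{"Warning": "Zeros", "Metric": "Count", "Value": count}]


def zeros_percentage(data):
    pct = 100 * sum(1 for v in data if v == 0) / len(data)
    return [{"Warning": "Zeros", "Metric": "Percentage", "Value": pct}]


warning = WarningRegistry()
warning.register("zeros_count", zeros_count)
warning.register("zeros_percentage", zeros_percentage)

ages = [0, 31, 0, 45]
both = warning(ages)               # rows from both handlers, in registration order
warning.unregister("zeros_count")
remaining = warning(ages)          # only the percentage row is left
print(both)
print(remaining)
```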

ads.feature_engineering.feature_type.handler.warnings module

The module with all default warnings provided to the user. These are registered to the relevant feature types directly in the feature type files themselves.

ads.feature_engineering.feature_type.handler.warnings.high_cardinality_handler(s: Series) → DataFrame

Warning if the number of unique values (including NaN) in the series is greater than or equal to 15.

Parameters:: s (pd.Series) – Pandas series - column of some feature type.
Returns:: DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the count of unique values.
Return type:: pd.DataFrame
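
The threshold rule can be sketched with a plain list in place of a Series. The function name `high_cardinality_rows` and the row labels are hypothetical; only the threshold of 15 and the inclusion of NaN in the unique count come from the description above. The sketch assumes all missing entries count as one distinct value.

```python
import math


def high_cardinality_rows(values, threshold=15):
    """Return one warning row when the distinct-value count reaches the threshold."""
    # Map None and every float NaN to one shared marker so that all missing
    # entries together count as a single distinct value.
    missing = object()
    distinct = {
        missing if v is None or (isinstance(v, float) and math.isnan(v)) else v
        for v in values
    }
    if len(distinct) < threshold:
        return []
    return [{
        "Warning": "High cardinality",
        "Message": f"{len(distinct)} unique values",
        "Metric": "Count",
        "Value": len(distinct),
    }]


below = high_cardinality_rows(list(range(14)))
at_threshold = high_cardinality_rows(list(range(14)) + [None, float("nan")])
print(below)
print(at_threshold)
```

Fourteen distinct numbers stay below the threshold, while adding missing entries brings the distinct count to 15 and produces a single row.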

ads.feature_engineering.feature_type.handler.warnings.missing_values_handler(s: Series) → DataFrame

Warning if more than 5 percent of the values in the series are missing (NaN).

Parameters:: s (pd.Series) – Pandas series - column of some feature type.
Returns:: DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where the first row is the count of missing values and the second is the percentage of missing values.
Return type:: pd.DataFrame
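
A minimal sketch of this rule, again with a plain list standing in for the Series. The function name `missing_values_rows`, the row labels and the rounding are assumptions; the 5 percent threshold and the two-row shape (count, then percentage) follow the description above.

```python
import math


def missing_values_rows(values, threshold_pct=5.0):
    """Return a count row and a percentage row when missing values exceed the threshold."""
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    pct = 100.0 * missing / len(values) if values else 0.0
    if pct <= threshold_pct:  # strictly greater than the threshold is required
        return []
    return [
        {"Warning": "Missing values", "Metric": "Count", "Value": missing},
        {"Warning": "Missing values", "Metric": "Percentage", "Value": round(pct, 2)},
    ]


exactly_five_pct = missing_values_rows([1.0] * 19 + [None])
ten_pct = missing_values_rows([1.0] * 18 + [None, float("nan")])
print(exactly_five_pct)
print(ten_pct)
```

Because the comparison is strict, a series with exactly 5 percent missing values produces no warning rows.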

ads.feature_engineering.feature_type.handler.warnings.skew_handler(s: Series) → DataFrame

Warning if absolute value of skew is greater than 1.

Parameters:: s (pd.Series) – Pandas series - column of some feature type, expects continuous values.
Returns:: DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the skew value of that column.
Return type:: pd.DataFrame
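
The skew rule can be sketched with the standard library. The estimator below is the adjusted Fisher-Pearson sample skewness, which is assumed here to be comparable to what `pd.Series.skew()` reports; the function names and row labels are hypothetical, and the section does not state which estimator the ADS handler uses.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness; needs at least 3 non-constant values."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def skew_rows(values, threshold=1.0):
    """Return one warning row when the absolute skew exceeds the threshold."""
    if len(values) < 3 or len(set(values)) < 2:
        return []  # skewness is undefined for tiny or constant samples
    skew = sample_skewness(values)
    if abs(skew) <= threshold:
        return []
    return [{
        "Warning": "Skew",
        "Message": f"Skew is {skew:.3f}",
        "Metric": "Skew",
        "Value": skew,
    }]


symmetric = skew_rows([1, 2, 3, 4, 5])
right_skewed = skew_rows([1, 1, 1, 1, 10])
print(symmetric)
print(right_skewed)
```

A symmetric sample has a skew near zero and yields no rows, while a sample with one large outlier exceeds the threshold.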

ads.feature_engineering.feature_type.handler.warnings.zeros_handler(s: Series) → DataFrame

Warning if more than 10 percent of the values in the series are zeros.

Parameters:: s (pd.Series) – Pandas series - column of some feature type.
Returns:: DataFrame with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where the first row is the count of zero values and the second is the percentage of zero values.
Return type:: pd.DataFrame
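
A minimal sketch of the zeros rule, with a plain list in place of the Series. The function name `zeros_rows`, the message wording and the row labels are assumptions; the 10 percent threshold and the two-row shape follow the description above.

```python
def zeros_rows(values, threshold_pct=10.0):
    """Return a count row and a percentage row when zeros exceed the threshold."""
    zeros = sum(1 for v in values if v == 0)
    pct = 100.0 * zeros / len(values) if values else 0.0
    if pct <= threshold_pct:  # strictly greater than the threshold is required
        return []
    return [
        {"Warning": "Zeros", "Message": f"{zeros} zeros",
         "Metric": "Count", "Value": zeros},
        {"Warning": "Zeros", "Message": f"{pct:.1f}% zeros",
         "Metric": "Percentage", "Value": pct},
    ]


exactly_ten_pct = zeros_rows([0] + [5] * 9)
twenty_pct = zeros_rows([0, 0] + [5] * 8)
print(exactly_ten_pct)
print(twenty_pct)
```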
