ads.feature_engineering package
Submodules
ads.feature_engineering.exceptions module
- exception ads.feature_engineering.exceptions.InvalidFeatureType(tname: str)
Bases:
TypeError
- exception ads.feature_engineering.exceptions.NameAlreadyRegistered(name: str)
Bases:
NameError
- exception ads.feature_engineering.exceptions.TypeAlreadyAdded(tname: str)
Bases:
TypeError
- exception ads.feature_engineering.exceptions.TypeAlreadyRegistered(tname: str)
Bases:
TypeError
- exception ads.feature_engineering.exceptions.TypeNotFound(tname: str)
Bases:
TypeError
- exception ads.feature_engineering.exceptions.WarningAlreadyExists(name: str)
Bases:
ValueError
- exception ads.feature_engineering.exceptions.WarningNotFound(name: str)
Bases:
ValueError
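Because each exception derives from a built-in class, a caller can catch either the specific exception or its built-in base. The sketch below illustrates this with stand-in classes that mirror only the names and bases listed above; the message text and the `lookup` helper are assumptions made for illustration, not the library's code.

```python
# Stand-in classes mirroring the documented names and bases.
# The message format is hypothetical.
class TypeNotFound(TypeError):
    def __init__(self, tname: str):
        super().__init__(f"Type {tname} is not found.")


class NameAlreadyRegistered(NameError):
    def __init__(self, name: str):
        super().__init__(f"Name {name} is already registered.")


def lookup(registry: dict, tname: str):
    """Hypothetical helper: return the entry or raise TypeNotFound."""
    if tname not in registry:
        raise TypeNotFound(tname)
    return registry[tname]


registry = {"continuous": "Continuous"}

try:
    lookup(registry, "phone_number")
except TypeError as err:  # the built-in base also catches TypeNotFound
    caught = type(err).__name__

print(caught)
```

Catching the built-in base keeps calling code working even when it does not import the exceptions module, at the cost of also catching unrelated `TypeError`s.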
ads.feature_engineering.feature_type_manager module
This module helps manage feature types. It provides functionality to register, unregister, and list feature types.
Classes
- FeatureTypeManager
Feature Types Manager class that manages feature types.
Examples
>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...     description = "My personal type."
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
   Name        Feature Type  Description
---------------------------------------------------------------------------------
0  Continuous  continuous    Type representing continuous values.
1  DateTime    date_time     Type representing date and/or time.
2  Category    category      Type representing discrete unordered values.
3  Ordinal     ordinal       Type representing ordered values.
4  NewType     new_type      My personal type.
>>> FeatureTypeManager.warning_registered()
   Feature Type  Warning           Handler
----------------------------------------------------------------------
0  continuous    zeros             zeros_handler
1  continuous    high_cardinality  high_cardinality_handler
>>> FeatureTypeManager.validator_registered()
   Feature Type  Validator        Condition               Handler
-------------------------------------------------------------------------------------------
0  phone_number  is_phone_number  ()                      default_handler
1  phone_number  is_phone_number  {'country_code': '+7'}  specific_country_handler
2  credit_card   is_credit_card   ()                      default_handler
>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous
- class ads.feature_engineering.feature_type_manager.FeatureTypeManager
Bases:
object
Feature Types Manager class that manages feature types.
Provides functionalities to register, unregister, list feature types.
- feature_type_object(cls, feature_type: Union[FeatureType, str]) FeatureType
Gets a feature type by class object or name.
- feature_type_register(cls, feature_type_cls: FeatureType) None
Registers a feature type.
- feature_type_unregister(cls, feature_type: Union[FeatureType, str]) None
Unregisters a feature type.
- feature_type_reset(cls) None
Resets feature types to the defaults.
- feature_type_registered(cls) pd.DataFrame
Lists all registered feature types as a DataFrame.
- warning_registered(cls) pd.DataFrame
Lists registered warnings for all registered feature types.
- validator_registered(cls) pd.DataFrame
Lists registered validators for all registered feature types.
Examples
>>> from ads.feature_engineering.feature_type.base import FeatureType
>>> class NewType(FeatureType):
...     pass
>>> FeatureTypeManager.feature_type_register(NewType)
>>> FeatureTypeManager.feature_type_registered()
   Name        Feature Type  Description
-------------------------------------------------------------------------------
0  Continuous  continuous    Type representing continuous values.
1  DateTime    date_time     Type representing date and/or time.
2  Category    category      Type representing discrete unordered values.
3  Ordinal     ordinal       Type representing ordered values.
>>> FeatureTypeManager.warning_registered()
   Feature Type  Warning           Handler
----------------------------------------------------------------------
0  continuous    zeros             zeros_handler
1  continuous    high_cardinality  high_cardinality_handler
>>> FeatureTypeManager.validator_registered()
   Feature Type  Validator        Condition               Handler
-------------------------------------------------------------------------------------------
0  phone_number  is_phone_number  ()                      default_handler
1  phone_number  is_phone_number  {'country_code': '+7'}  specific_country_handler
2  credit_card   is_credit_card   ()                      default_handler
>>> FeatureTypeManager.feature_type_unregister(NewType)
>>> FeatureTypeManager.feature_type_reset()
>>> FeatureTypeManager.feature_type_object('continuous')
Continuous
- classmethod feature_type_object(feature_type: Union[FeatureType, str]) FeatureType
Gets a feature type by class object or name.
- Parameters
feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.
- Returns
The feature type that was found.
- Return type
FeatureType
- Raises
TypeNotFound – If the provided feature type is not registered.
TypeError – If the provided feature type is not a subclass of FeatureType.
- classmethod feature_type_register(feature_type_cls: FeatureType) None
Registers a new feature type.
- Parameters
feature_type_cls (FeatureType) – Subclass of FeatureType to be registered.
- Returns
Nothing.
- Return type
None
- Raises
TypeError – Type is not a subclass of FeatureType.
TypeError – Type has already been registered.
NameError – Name has already been used.
- classmethod feature_type_registered() DataFrame
Lists all registered feature types as a DataFrame.
- Returns
The list of feature types in a DataFrame format.
- Return type
pd.DataFrame
- classmethod feature_type_reset() None
Resets feature types to the defaults.
- Returns
Nothing.
- Return type
None
- classmethod feature_type_unregister(feature_type: Union[FeatureType, str]) None
Unregisters a feature type.
- Parameters
feature_type (FeatureType | str) – The FeatureType subclass or a str indicating feature type.
- Returns
Nothing.
- Return type
None
- Raises
TypeError – On an attempt to unregister a default feature type.
- classmethod is_type_registered(feature_type: Union[FeatureType, str]) bool
Checks whether the provided feature type is registered in the system.
- Parameters
feature_type (Union[FeatureType, str]) – The FeatureType subclass or a str indicating feature type.
- Returns
True if the provided feature type is registered, False otherwise.
- Return type
bool
- classmethod validator_registered() DataFrame
Lists registered validators for registered feature types.
- Returns
The list of registered validators for registered feature types in a DataFrame format.
- Return type
pd.DataFrame
Examples
>>> FeatureTypeManager.validator_registered()
   Feature Type  Validator        Condition               Handler
-------------------------------------------------------------------------------------------
0  phone_number  is_phone_number  ()                      default_handler
1  phone_number  is_phone_number  {'country_code': '+7'}  specific_country_handler
2  credit_card   is_credit_card   ()                      default_handler
- classmethod warning_registered() DataFrame
Lists registered warnings for all registered feature types.
- Returns
The list of registered warnings for registered feature types in a DataFrame format.
- Return type
pd.DataFrame
Examples
>>> FeatureTypeManager.warning_registered()
   Feature Type  Warning           Handler
----------------------------------------------------------------------
0  continuous    zeros             zeros_handler
1  continuous    high_cardinality  high_cardinality_handler
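The register, unregister, and reset behaviour documented for this class can be pictured with a toy registry. The sketch below is not the ADS implementation: `ToyManager`, its method names, and the stand-in `FeatureType` base are hypothetical, and only the error conditions listed under Raises are modelled.

```python
from typing import Dict, Type, Union


class FeatureType:
    """Stand-in base class; only `name` and `description` are modelled."""
    name = "feature_type"
    description = "Base type."


class Continuous(FeatureType):
    name = "continuous"
    description = "Type representing continuous values."


class ToyManager:
    """Toy registry illustrating the documented behaviour."""
    _defaults: Dict[str, Type[FeatureType]] = {"continuous": Continuous}
    _types: Dict[str, Type[FeatureType]] = dict(_defaults)

    @classmethod
    def register(cls, feature_type_cls: Type[FeatureType]) -> None:
        if not (isinstance(feature_type_cls, type)
                and issubclass(feature_type_cls, FeatureType)):
            raise TypeError("Type is not a subclass of FeatureType.")
        if feature_type_cls in cls._types.values():
            raise TypeError("Type has already been registered.")
        if feature_type_cls.name in cls._types:
            raise NameError("Name has already been used.")
        cls._types[feature_type_cls.name] = feature_type_cls

    @classmethod
    def unregister(cls, feature_type: Union[Type[FeatureType], str]) -> None:
        name = feature_type if isinstance(feature_type, str) else feature_type.name
        if name in cls._defaults:
            raise TypeError("Cannot unregister a default feature type.")
        cls._types.pop(name, None)

    @classmethod
    def reset(cls) -> None:
        # Drop every custom type and restore the defaults.
        cls._types = dict(cls._defaults)

    @classmethod
    def is_registered(cls, feature_type: Union[Type[FeatureType], str]) -> bool:
        name = feature_type if isinstance(feature_type, str) else feature_type.name
        return name in cls._types


class NewType(FeatureType):
    name = "new_type"
    description = "My personal type."


ToyManager.register(NewType)
print(ToyManager.is_registered("new_type"))
ToyManager.reset()
print(ToyManager.is_registered(NewType))
```

Keeping the defaults in a separate mapping is what makes both the reset and the "cannot unregister a default type" check straightforward in this sketch.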
ads.feature_engineering.accessor.dataframe_accessor module
The ADS accessor for the Pandas DataFrame. The accessor will be initialized with the pandas object the user is interacting with.
Examples
>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
Column Feature Type Description
------------------------------------------------------------------
0 Name string Type representing string values.
1 Credit Card string Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
Credit Card
-------------------------------
0 4532640527811543
- class ads.feature_engineering.accessor.dataframe_accessor.ADSDataFrameAccessor(pandas_obj)
Bases:
ADSFeatureTypesMixin
,EDAMixin
,DBAccessMixin
,DataLabelingAccessMixin
ADS accessor for the Pandas DataFrame.
- columns
The column labels of the DataFrame.
- Type
List[str]
- tags(self) Dict[str, str]
Gets the dictionary of user defined tags for the dataframe.
- default_type(self) Dict[str, str]
Gets the map of columns and associated default feature type names.
- feature_type(self) Dict[str, List[str]]
Gets the list of registered feature types.
- feature_type_description(self) pd.DataFrame
Gets the list of registered feature types in a DataFrame format.
- sync(self, src: Union[pd.DataFrame, pd.Series]) pd.DataFrame
Syncs feature types of current DataFrame with that from src.
- feature_select(self, include: List[Union[FeatureType, str]] = None, exclude: List[Union[FeatureType, str]] = None) pd.DataFrame
Returns a subset of the DataFrame's columns based on the column feature types.
- help(self, prop: str = None) None
Provides docstrings for the available methods and properties.
Examples
>>> import pandas as pd
>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
   Column       Feature Type  Description
-------------------------------------------------------------------
0  Name         string        Type representing string values.
1  Credit Card  string        Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
   Credit Card
------------------------------
0  4532640527811543
Initializes ADS Pandas DataFrame Accessor.
- Parameters
pandas_obj (pandas.DataFrame) – Pandas dataframe
- Raises
ValueError – If provided DataFrame has duplicate columns.
- property default_type: Dict[str, str]
Gets the map of columns and associated default feature type names.
- Returns
The dictionary where key is column name and value is the name of default feature type.
- Return type
Dict[str, str]
- feature_select(include: Optional[List[Union[FeatureType, str]]] = None, exclude: Optional[List[Union[FeatureType, str]]] = None) DataFrame
Returns a subset of the DataFrame’s columns based on the column feature_types.
- Parameters
include (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be included.
exclude (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be excluded.
- Raises
ValueError – If both include and exclude are empty.
ValueError – If include and exclude are used simultaneously.
- Returns
The subset of the frame including the feature types in include and excluding the feature types in exclude.
- Return type
pandas.DataFrame
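The selection rules above can be sketched on a plain mapping of column names to feature type names. This is an illustration only: `select_columns` is a hypothetical helper, and the matching rule (a column matches when any of its feature types is listed) is an assumption rather than documented behaviour. The two `ValueError` cases follow the Raises section.

```python
from typing import Dict, List, Optional


def select_columns(
    feature_types: Dict[str, List[str]],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Return the column names kept by an include or exclude filter."""
    if not include and not exclude:
        raise ValueError("Either include or exclude must be provided.")
    if include and exclude:
        raise ValueError("include and exclude cannot be used together.")
    if include:
        wanted = set(include)
        # Keep a column when it carries at least one wanted type.
        return [col for col, types in feature_types.items() if wanted & set(types)]
    unwanted = set(exclude)
    # Keep a column when it carries none of the unwanted types.
    return [col for col, types in feature_types.items() if not unwanted & set(types)]


feature_types = {
    "Name": ["string"],
    "Credit Card": ["credit_card", "string"],
}
print(select_columns(feature_types, include=["credit_card"]))
print(select_columns(feature_types, exclude=["credit_card"]))
```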
- property feature_type: Dict[str, List[str]]
Gets the list of registered feature types.
- Returns
The dictionary where key is column name and value is list of associated feature type names.
- Return type
Dict[str, List[str]]
- property feature_type_description: DataFrame
Gets the list of registered feature types in a DataFrame format.
- Return type
pandas.DataFrame
Examples
>>> df.ads.feature_type_description
   Column        Feature Type  Description
-------------------------------------------------------------------
0  City          string        Type representing string values.
1  Phone Number  string        Type representing string values.
- info() Any
Gets information about the dataframe.
- Returns
The information about the dataframe.
- Return type
Any
- model_schema(max_col_num: int = 2000)
Generates schema from the dataframe.
- Parameters
max_col_num (int, optional. Defaults to 2000) – The maximum number of columns for which the schema is generated automatically.
Examples
>>> df = pd.read_csv('./orcl_attrition.csv', usecols=['Age', 'Attrition'])
>>> schema = df.ads.model_schema()
>>> schema
Schema:
- description: Attrition
  domain:
    constraints: []
    stats:
      count: 1470
      unique: 2
    values: String
  dtype: object
  feature_type: String
  name: Attrition
  required: true
- description: Age
  domain:
    constraints: []
    stats:
      25%: 31.0
      50%: 37.0
      75%: 44.0
      count: 1470.0
      max: 61.0
      mean: 37.923809523809524
      min: 19.0
      std: 9.135373489136732
    values: Integer
  dtype: int64
  feature_type: Integer
  name: Age
  required: true
>>> schema.to_dict()
{'Schema': [{'dtype': 'object',
             'feature_type': 'String',
             'name': 'Attrition',
             'domain': {'values': 'String',
                        'stats': {'count': 1470, 'unique': 2},
                        'constraints': []},
             'required': True,
             'description': 'Attrition'},
            {'dtype': 'int64',
             'feature_type': 'Integer',
             'name': 'Age',
             'domain': {'values': 'Integer',
                        'stats': {'count': 1470.0, 'mean': 37.923809523809524, 'std': 9.135373489136732, 'min': 19.0, '25%': 31.0, '50%': 37.0, '75%': 44.0, 'max': 61.0},
                        'constraints': []},
             'required': True,
             'description': 'Age'}]}
- Returns
The data schema.
- Return type
ads.feature_engineering.schema.Schema
- Raises
ads.feature_engineering.schema.DataSizeTooWide – If the number of columns of input data exceeds max_col_num.
- sync(src: Union[DataFrame, Series]) DataFrame
Syncs feature types of the current DataFrame with those from src, where src can be a DataFrame or a Series. In either case, only columns with matching names are synced.
- Parameters
src (pd.DataFrame | pd.Series) – The source to sync from.
- Returns
Synced dataframe.
- Return type
pandas.DataFrame
- property tags: Dict[str, List[str]]
Gets the dictionary of user defined tags for the dataframe. Key is column name and value is list of tag names.
- Returns
The map of columns and associated default tags.
- Return type
Dict[str, List[str]]
ads.feature_engineering.accessor.series_accessor module
The ADS accessor for the Pandas Series. The accessor will be initialized with the pandas object the user is interacting with.
Examples
>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> import pandas as pd
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
Feature Type Description
----------------------------------------------------
0 string Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']
- class ads.feature_engineering.accessor.series_accessor.ADSSeriesAccessor(pandas_obj: Series)
Bases:
ADSFeatureTypesMixin
,EDAMixinSeries
ADS accessor for Pandas Series.
- name
The name of Series.
- Type
str
- tags
The list of tags for the Series.
- Type
List[str]
- help(self, prop: str = None) None
Provides docstrings for the available methods and properties.
- sync(self, src: Union[pd.DataFrame, pd.Series]) None
Syncs feature types of current series with that from src.
- default_type(self) str
Gets the name of default feature type for the series.
- feature_type(self) List[str]
Gets the list of registered feature types for the series.
- feature_type_description(self) pd.DataFrame
Gets the list of registered feature types in a DataFrame format.
Examples
>>> import pandas as pd
>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
   Feature Type  Description
----------------------------------------------------
0  string        Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']
Initializes ADS Pandas Series Accessor.
- Parameters
pandas_obj (pd.Series) – The pandas series
- property default_type: str
Gets the name of default feature type for the series.
- Returns
The name of default feature type.
- Return type
str
- property feature_type: List[str]
Gets the list of registered feature types for the series.
- Returns
Names of feature types.
- Return type
List[str]
Examples
>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('tag for name')]
>>> series.ads.feature_type
['name', 'string', 'tag for name']
- property feature_type_description: DataFrame
Gets the list of registered feature types in a DataFrame format.
- Returns
The DataFrame with feature types for this series.
- Return type
pd.DataFrame
Examples
>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('Name tag')]
>>> series.ads.feature_type_description
   Feature Type  Description
----------------------------------------------------------
0  name          Type representing name values.
1  string        Type representing string values.
2  Name tag      Tag.
- sync(src: Union[DataFrame, Series]) None
Syncs feature types of current series with that from src.
The src could be a dataframe or a series. In either case, only columns with matched names are synced.
- Parameters
src (pd.DataFrame | pd.Series) – The source to sync from.
- Returns
Nothing.
- Return type
None
Examples
>>> series = pd.Series(['name1', 'name2', 'name3', None])
>>> series.ads.feature_type = ['name']
>>> series.ads.feature_type
['name', 'string']
>>> series.dropna().ads.feature_type
['string']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['name', 'string']
- class ads.feature_engineering.accessor.series_accessor.ADSSeriesValidator(feature_type_list: List[FeatureType], series: Series)
Bases:
object
Helper class that invokes registered validators at the series level.
Initializes ADS series validator.
- Parameters
feature_type_list (List[FeatureType]) – The list of feature types.
series (pd.Series) – The pandas series.
ads.feature_engineering.accessor.mixin.correlation module
- ads.feature_engineering.accessor.mixin.correlation.cat_vs_cat(df: DataFrame, normal_form: bool = True) DataFrame
Calculates the correlation of all pairs of categorical features.
- ads.feature_engineering.accessor.mixin.correlation.cat_vs_cont(df: DataFrame, categorical_columns, continuous_columns, normal_form: bool = True) DataFrame
Calculates the correlation of all pairs of categorical features and continuous features.
- ads.feature_engineering.accessor.mixin.correlation.cont_vs_cont(df: DataFrame, normal_form: bool = True) DataFrame
Calculates the Pearson correlation between two columns of the DataFrame.
ads.feature_engineering.accessor.mixin.eda_mixin module
This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Dataframe. The series of purpose-driven methods enable the data scientist to complete analysis on the dataframe.
From the accessor we have access to the pandas object the user is interacting with as well as corresponding lists of feature types per column.
- class ads.feature_engineering.accessor.mixin.eda_mixin.EDAMixin
Bases:
object
- correlation_ratio() DataFrame
Generate a Correlation Ratio data frame for all categorical-continuous variable pairs.
- Returns
Correlation Ratio correlation data frame with the following 3 columns:
Column 1 (name of the first categorical/continuous column)
Column 2 (name of the second categorical/continuous column)
Value (correlation value)
- Return type
pandas.DataFrame
Note
Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
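For readers unfamiliar with the statistic, the correlation ratio for one categorical-continuous pair is commonly defined as the square root of the between-group sum of squares divided by the total sum of squares. The sketch below computes it in plain Python under that textbook definition; whether the library reports this value or its square is not stated here, and `correlation_ratio` is a hypothetical standalone helper.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def correlation_ratio(categories: Sequence[Hashable], values: Sequence[float]) -> float:
    """Textbook correlation ratio (eta) for one categorical/continuous pair."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    overall_mean = sum(values) / len(values)
    # Variation explained by the category means.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    # Total variation of the continuous variable.
    ss_total = sum((v - overall_mean) ** 2 for v in values)
    return math.sqrt(ss_between / ss_total) if ss_total else 0.0


cats = ["a", "a", "a", "b", "b", "b"]
vals = [1, 2, 3, 4, 5, 6]
eta = correlation_ratio(cats, vals)
print(round(eta, 4))
```

A value near 1 means the category almost determines the continuous value; a value near 0 means the group means are nearly identical.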
- correlation_ratio_plot() Axes
Generate a heatmap of the Correlation Ratio correlation for all categorical-continuous variable pairs.
- Returns
Correlation Ratio correlation plot object that can be updated by the customer
- Return type
Plot object
- cramersv() DataFrame
Generate a Cramer’s V correlation data frame for all categorical variable pairs.
Gives a warning for dropped non-categorical columns.
- Returns
Cramer’s V correlation data frame with the following 3 columns:
Column 1 (name of the first categorical column)
Column 2 (name of the second categorical column)
Value (correlation value)
- Return type
pandas.DataFrame
Note
Pairs will be replicated. For example for variables x and y, we would have (x,y), (y,x) both with same correlation value. We will also have (x,x) and (y,y) with value 1.0.
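As a reminder of what the statistic measures, Cramér's V for one pair of categorical variables is derived from the chi-squared statistic of their contingency table. The sketch below uses the uncorrected textbook formula, V = sqrt(chi2 / (n * (min(r, c) - 1))); the library may apply a bias correction that is not described here, and `cramers_v` is a hypothetical standalone helper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Uncorrected Cramér's V for two equally long categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row_totals = Counter(x)
    col_totals = Counter(y)

    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


x = ["a", "a", "b", "b"]
print(cramers_v(x, ["u", "u", "v", "v"]))  # perfectly associated
print(cramers_v(x, ["u", "v", "u", "v"]))  # independent
```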
- cramersv_plot() Axes
Generate a heatmap of the Cramer’s V correlation for all categorical variable pairs.
Gives a warning for dropped non-categorical columns.
- Returns
Cramer’s V correlation plot object that can be updated by the customer
- Return type
Plot object
- feature_count() DataFrame
Counts the number of columns for each feature type and each primary feature type. The Primary column gives the number of columns whose primary feature type is the given type.
- Returns
DataFrame with the number of columns for each feature type and the number of columns for each primary feature type.
- Return type
pandas.DataFrame
Examples
>>> df.ads.feature_type
{'PassengerId': ['ordinal', 'category'],
'Survived': ['ordinal'],
'Pclass': ['ordinal'],
'Name': ['category'],
'Sex': ['category']}
>>> df.ads.feature_count()
   Feature Type  Count  Primary
0  category      3      2
1  ordinal       3      3
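The counts in the example can be reproduced from the feature type mapping alone. The sketch below assumes that a column's primary feature type is the first entry in its list, which is consistent with the example output but is an inference; `feature_count` here is a hypothetical standalone function returning tuples rather than a DataFrame.

```python
from collections import Counter
from typing import Dict, List, Tuple


def feature_count(feature_types: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Return (feature type, count, primary count) rows sorted by type name."""
    count = Counter()
    primary = Counter()
    for types in feature_types.values():
        count.update(types)          # every type assigned to the column
        if types:
            primary[types[0]] += 1   # assumption: first entry is the primary type
    return [(name, count[name], primary[name]) for name in sorted(count)]


feature_types = {
    "PassengerId": ["ordinal", "category"],
    "Survived": ["ordinal"],
    "Pclass": ["ordinal"],
    "Name": ["category"],
    "Sex": ["category"],
}
for row in feature_count(feature_types):
    print(row)
```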
- feature_plot() DataFrame
For every column in the dataframe, generates a list of summary plots based on the most relevant feature type.
- Returns
Dataframe with 2 columns: 1. Column - feature name 2. Plot - plot object
- Return type
pandas.DataFrame
- feature_stat() DataFrame
Provides a DataFrame of summary statistics.
Returns feature statistics for each column using the FeatureType summary method.
Examples
>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df.ads.feature_stat().head()
   Column       Metric              Value
0  PassengerId  count               891.000
1  PassengerId  mean                446.000
2  PassengerId  standard deviation  257.354
3  PassengerId  sample minimum      1.000
4  PassengerId  lower quartile      223.500
- Returns
Dataframe with 3 columns: name, metric, value
- Return type
pandas.DataFrame
- pearson() DataFrame
Generate a Pearson correlation data frame for all continuous variable pairs.
Gives a warning for dropped non-numerical columns.
- Returns
Pearson correlation data frame with the following 3 columns:
Column 1 (name of the first continuous column)
Column 2 (name of the second continuous column)
Value (correlation value)
- Return type
pandas.DataFrame
Note
Pairs will be replicated. For example for variables x and y, we’d have (x,y), (y,x) both with same correlation value. We’ll also have (x,x) and (y,y) with value 1.0.
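The long three-column layout with replicated pairs can be illustrated without pandas. The sketch below computes the Pearson coefficient for every ordered pair of columns, so (x, y), (y, x), (x, x), and (y, y) all appear; `pearson_r` and `pearson_long` are hypothetical helpers and the tuple rows stand in for DataFrame rows.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pearson_long(columns: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """One (Column 1, Column 2, Value) row per ordered pair of columns."""
    return [
        (c1, c2, pearson_r(columns[c1], columns[c2]))
        for c1, c2 in product(columns, repeat=2)
    ]


columns = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 9]}
rows = pearson_long(columns)
for row in rows:
    print(row)
```

The replicated layout is convenient for heatmaps, since every cell of the square grid has its own row.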
- pearson_plot() Axes
Generate a heatmap of the Pearson correlation for all continuous variable pairs.
- Returns
Pearson correlation plot object that can be updated by the customer
- Return type
Plot object
- warning() DataFrame
Generates a data frame that lists feature specific warnings.
- Returns
The list of feature specific warnings.
- Return type
pandas.DataFrame
Examples
>>> df.ads.warning()
   Column  Feature Type  Warning  Message              Metric      Value
--------------------------------------------------------------------------------------
0  Age     continuous    Zeros    Age has 38 zeros     Count       38
1  Age     continuous    Zeros    Age has 12.2% zeros  Percentage  12.2%
ads.feature_engineering.accessor.mixin.eda_mixin_series module
This exploratory data analysis (EDA) Mixin is used in the ADS accessor for the Pandas Series. The series of purpose-driven methods enable the data scientist to complete univariate analysis.
From the accessor we have access to the pandas object the user is interacting with as well as corresponding list of feature types.
- class ads.feature_engineering.accessor.mixin.eda_mixin_series.EDAMixinSeries
Bases:
object
- feature_plot() Axes
For the series generate a summary plot based on the most relevant feature type.
- Returns
Plot object for the series based on the most relevant feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- feature_stat() DataFrame
Provides a DataFrame of summary statistics.
Returns feature statistics for the series using the FeatureType summary method.
Examples
>>> df = pd.read_csv('~/advanced-ds/tests/vor_datasets/vor_titanic.csv')
>>> df['Cabin'].ads.feature_stat()
   Metric   Value
0  count    891
1  unique   147
2  missing  687
- Returns
Dataframe with 2 columns and rows for different metric values
- Return type
pandas.DataFrame
- warning() DataFrame
Generates a data frame that lists feature specific warnings.
- Returns
The list of feature specific warnings.
- Return type
pandas.DataFrame
Examples
>>> df["Age"].ads.warning()
   Feature Type  Warning  Message              Metric      Value
---------------------------------------------------------------------------
0  continuous    Zeros    Age has 38 zeros     Count       38
1  continuous    Zeros    Age has 12.2% zeros  Percentage  12.2%
ads.feature_engineering.accessor.mixin.feature_types_mixin module
The module that represents the ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.
Classes
- ADSFeatureTypesMixin
ADS Feature Types Mixin class that extends Pandas Series and Dataframe accessors.
- class ads.feature_engineering.accessor.mixin.feature_types_mixin.ADSFeatureTypesMixin
Bases:
object
ADS Feature Types Mixin class that extends Pandas Series and DataFrame accessors.
- warning_registered(cls) pd.DataFrame
Lists registered warnings for registered feature types.
- validator_registered(cls) pd.DataFrame
Lists registered validators for registered feature types.
- help(self, prop: str = None) None
Help method that prints either a table of available properties or, given a property, returns its docstring.
- help(prop: Optional[str] = None) None
Help method that prints either a table of available properties or, given an individual property, returns its docstring.
- Parameters
prop (str) – The Name of property.
- Returns
Nothing.
- Return type
None
- validator_registered() DataFrame
Lists registered validators for registered feature types.
- Returns
The list of registered validators for registered feature types
- Return type
pandas.DataFrame
Examples
>>> df.ads.validator_registered()
   Column       Feature Type  Validator        Condition               Handler
------------------------------------------------------------------------------------------------------
0  PhoneNumber  phone_number  is_phone_number  ()                      default_handler
1  PhoneNumber  phone_number  is_phone_number  {'country_code': '+7'}  specific_country_handler
2  CreditCard   credit_card   is_credit_card   ()                      default_handler
>>> df['PhoneNumber'].ads.validator_registered()
   Feature Type  Validator        Condition               Handler
-------------------------------------------------------------------------------------------
0  phone_number  is_phone_number  ()                      default_handler
1  phone_number  is_phone_number  {'country_code': '+7'}  specific_country_handler
- warning_registered() DataFrame
Lists registered warnings for all registered feature types.
- Returns
The list of registered warnings for registered feature types.
- Return type
pandas.DataFrame
Examples
>>> df.ads.warning_registered()
   Column  Feature Type  Warning           Handler
-------------------------------------------------------------------------
0  Age     continuous    zeros             zeros_handler
1  Age     continuous    high_cardinality  high_cardinality_handler
>>> df["Age"].ads.warning_registered()
   Feature Type  Warning           Handler
---------------------------------------------------------------
0  continuous    zeros             zeros_handler
1  continuous    high_cardinality  high_cardinality_handler
ads.feature_engineering.adsstring.common_regex_mixin module
- class ads.feature_engineering.adsstring.common_regex_mixin.CommonRegexMixin
Bases:
object
- property address
- property credit_card
- property date
- property email
- property ip
- property link
- property phone_number_US
- property price
- redact(fields: Union[List[str], Dict[str, str]]) str
Removes personal information from a string. For example, “Jane’s phone number is 123-456-7890” is turned into “Jane’s phone number is [PHONE_NUMBER_US].”
- Parameters
fields (list(str) | dict) – Either a list of fields to redact, e.g. [‘email’, ‘phone_number_US’], in which case each match is replaced with the capitalized field name in brackets, such as [EMAIL] or [PHONE_NUMBER_US_WITH_EXT], or a dictionary where each key is a field to redact and each value is the replacement text, e.g. {‘email’: ‘HIDDEN_EMAIL’}.
- Returns
redacted string
- Return type
str
- redact_map = {'address': '[ADDRESS]', 'address_with_zip': '[ADDRESS_WITH_ZIP]', 'credit_card': '[CREDIT_CARD]', 'date': '[DATE]', 'email': '[EMAIL]', 'ip': '[IP]', 'ipv6': '[IPV6]', 'link': '[LINK]', 'phone_number_US': '[PHONE_NUMBER_US]', 'phone_number_US_with_ext': '[PHONE_NUMBER_US_WITH_EXT]', 'po_box': '[PO_BOX]', 'price': '[PRICE]', 'ssn': '[SSN]', 'time': '[TIME]', 'zip_code': '[ZIP_CODE]'}
- property ssn
- property time
- property zip_code
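The two accepted forms of the fields argument to redact can be sketched with the standard `re` module. The patterns below are simplified, hypothetical stand-ins (the library's own regular expressions are not shown in this reference), and `redact` is written as a free function rather than a method.

```python
import re
from typing import Dict, List, Union

# Hypothetical, simplified patterns used only for this illustration.
PATTERNS = {
    "phone_number_US": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str, fields: Union[List[str], Dict[str, str]]) -> str:
    """Replace matches of the named fields with placeholder text."""
    if isinstance(fields, dict):
        # Caller supplies the replacement text per field.
        replacements = fields
    else:
        # Default placeholder: the upper-cased field name in brackets.
        replacements = {name: f"[{name.upper()}]" for name in fields}
    for name, token in replacements.items():
        text = PATTERNS[name].sub(token, text)
    return text


print(redact("Jane's phone number is 123-456-7890", ["phone_number_US"]))
print(redact("write to john.smith@gmail.com today", {"email": "HIDDEN_EMAIL"}))
```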
ads.feature_engineering.adsstring.oci_language module
ads.feature_engineering.adsstring.string module
- class ads.feature_engineering.adsstring.string.ADSString(text: str, language='english')
Bases:
str
,CommonRegexMixin
Defines an enhanced string class for the purpose of performing NLP tasks. Its functionality can be extended by registering plugins.
- plugins
list of plugins that add functionalities to the class.
- Type
List
- string
plain string
- Type
str
Example
>>> ADSString.nlp_backend('nltk')
>>> s = ADSString("Walking my dog on a breezy day is the best.")
>>> s.lower() # regular string methods still work
>>> s.replace("a", "e")
>>> s.nouns
>>> s.parts_of_speech
>>> s = ADSString("get in touch with my associate at john.smith@gmail.com to schedule")
>>> s.email
>>> ADSString.plugin_register(OCILanguage)
>>> s = ADSString("This movie is awesome.")
>>> s.absa
Initializes the class and registers plugins.
- Parameters
text (str) – input text
language (str, optional) – language of the text, by default “english”.
- Raises
TypeError – input text is not a string.
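Subclassing `str` is what lets regular string methods keep working while extra properties are added. The sketch below shows the general pattern in plain Python; `EnhancedString`, its email pattern, and the list return type are hypothetical and do not reproduce the library's class.

```python
import re


class EnhancedString(str):
    """Toy str subclass that adds one regex-backed property."""

    def __new__(cls, text: str, language: str = "english"):
        if not isinstance(text, str):
            raise TypeError("input text is not a string.")
        obj = super().__new__(cls, text)
        obj.language = language  # extra state lives on the instance
        return obj

    @property
    def email(self):
        # Simplified, hypothetical pattern for illustration only.
        return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", self)


s = EnhancedString("get in touch with my associate at john.smith@gmail.com to schedule")
print(s.upper()[:3])  # inherited str methods still work
print(s.email)
```

Note that inherited methods such as `upper()` return a plain `str` in this sketch, so the extra properties are only available on the original object unless those methods are overridden.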
- capitalize()
Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
- casefold()
Return a version of the string suitable for caseless comparisons.
- center(width, fillchar=' ', /)
Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
- count(sub[, start[, end]]) int
Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
- encode(encoding='utf-8', errors='strict')
Encode the string using the codec registered for encoding.
- encoding
The encoding in which to encode the string.
- errors
The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
- endswith(suffix[, start[, end]]) bool
Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
- expandtabs(tabsize=8)
Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
- find(sub[, start[, end]]) int
Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
- format(*args, **kwargs) str
Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
- format_map(mapping) str
Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
- help() None
List available properties.
- Parameters
plugin (Any) – registered plugin
- Return type
None
- index(sub[, start[, end]]) int
Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
- isalnum()
Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
- isalpha()
Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
- isascii()
Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
- isdecimal()
Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
- isdigit()
Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
- isidentifier()
Return True if the string is a valid Python identifier, False otherwise.
Use keyword.iskeyword() to test for reserved identifiers such as “def” and “class”.
- islower()
Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
- isnumeric()
Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
- isprintable()
Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
- isspace()
Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
- istitle()
Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
- isupper()
Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
- join(iterable, /)
Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
- language_model_cache = {}
- ljust(width, fillchar=' ', /)
Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
- lower()
Return a copy of the string converted to lowercase.
- lstrip(chars=None, /)
Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
- maketrans(y=None, z=None, /)
Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
- nlp_backend() None
Set backend for extracting NLP related properties.
- Parameters
backend (str, optional) – name of backend, by default ‘nltk’.
- Raises
ModuleNotFoundError – module corresponding to backend is not found.
ValueError – input backend is invalid.
- Return type
None
- partition(sep, /)
Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
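The difference between splitting at the first and at the last separator is easiest to see side by side. The short sketch below uses a plain str and made-up sample text; these are the standard str methods that ADSString inherits.

```python
# Sample text is illustrative only.
text = "key=value=more"

# partition() splits at the FIRST occurrence of the separator.
first = text.partition("=")

# rpartition() splits at the LAST occurrence of the separator.
last = text.rpartition("=")

# When the separator is absent, the original string ends up at
# opposite ends of the 3-tuple.
absent_left = text.partition("|")
absent_right = text.rpartition("|")

print(first)         # ('key', '=', 'value=more')
print(last)          # ('key=value', '=', 'more')
print(absent_left)   # ('key=value=more', '', '')
print(absent_right)  # ('', '', 'key=value=more')
```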
- plugin_clear() None
Clears plugins.
- plugin_list() None
List registered plugins.
- plugin_register() None
Register a plugin
- Parameters
plugin (Any) – plugin to register
- Return type
None
- plugins = []
- redact(fields: Union[List[str], Dict[str, str]]) str
Remove personal information from a string. For example, “Jane’s phone number is 123-456-7890” is turned into “Jane’s phone number is [PHONE_NUMBER_US]”.
- Parameters
fields ((list(str) | dict)) – either a list of fields to redact, e.g. [‘email’, ‘phone_number_US’], in which case the redacted text is replaced with capitalized word like [EMAIL] or [PHONE_NUMBER_US_WITH_EXT], or a dictionary where key is a field to redact and value is the replacement text, e.g., {‘email’: ‘HIDDEN_EMAIL’}.
- Returns
redacted string
- Return type
str
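The two accepted shapes of fields (a list of field names, or a dict mapping field names to replacement text) can be illustrated with a small standard-library sketch. The function name, the PATTERNS table and both regular expressions below are hypothetical simplifications written for this example; the patterns the library actually uses are not shown in this document.

```python
import re
from typing import Dict, List, Union

# Hypothetical, deliberately simple patterns (assumption, not the library's).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone_number_US": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact_sketch(text: str, fields: Union[List[str], Dict[str, str]]) -> str:
    """Replace every match of the requested fields in text."""
    if isinstance(fields, dict):
        # Dict form: the caller supplies the replacement text per field.
        replacements = dict(fields)
    else:
        # List form: default to the upper-cased field name in brackets.
        replacements = {name: f"[{name.upper()}]" for name in fields}
    for name, replacement in replacements.items():
        text = PATTERNS[name].sub(replacement, text)
    return text


by_list = redact_sketch("Jane's phone number is 123-456-7890", ["phone_number_US"])
by_dict = redact_sketch("write to john.smith@gmail.com today", {"email": "HIDDEN_EMAIL"})
print(by_list)  # Jane's phone number is [PHONE_NUMBER_US]
print(by_dict)  # write to HIDDEN_EMAIL today
```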
- replace(old, new, count=-1, /)
Return a copy with all occurrences of substring old replaced by new.
- count
Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
- rfind(sub[, start[, end]]) int
Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
- rindex(sub[, start[, end]]) int
Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
- rjust(width, fillchar=' ', /)
Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
- rpartition(sep, /)
Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
- rsplit(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string.
- sep
The delimiter according to which to split the string. None (the default value) means split according to any whitespace, and discard empty strings from the result.
- maxsplit
Maximum number of splits to do. -1 (the default value) means no limit.
Splits are done starting at the end of the string and working to the front.
- rstrip(chars=None, /)
Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
- split(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string.
- sep
The delimiter according to which to split the string. None (the default value) means split according to any whitespace, and discard empty strings from the result.
- maxsplit
Maximum number of splits to do. -1 (the default value) means no limit.
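The effect of maxsplit and of the default sep=None is shown in the short example below, using plain str values chosen for illustration. split counts splits from the left, while rsplit counts them from the right.

```python
dotted = "a.b.c.d"

# With maxsplit=1, only one split is made: from the left for split(),
# from the right for rsplit().
from_left = dotted.split(".", 1)
from_right = dotted.rsplit(".", 1)

# sep=None splits on runs of whitespace and drops empty strings.
words = "  one   two ".split()

# An explicit separator keeps the empty strings between adjacent separators.
fields = "one,,two".split(",")

print(from_left)   # ['a', 'b.c.d']
print(from_right)  # ['a.b.c', 'd']
print(words)       # ['one', 'two']
print(fields)      # ['one', '', 'two']
```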
- splitlines(keepends=False)
Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
- startswith(prefix[, start[, end]]) bool
Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
- property string
- strip(chars=None, /)
Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
- swapcase()
Convert uppercase characters to lowercase and lowercase characters to uppercase.
- title()
Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
- translate(table, /)
Replace each character in the string using the given translation table.
- table
Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
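A translation table is normally built with str.maketrans and then passed to translate. The sketch below shows the three-argument form and the dictionary form on sample strings chosen for illustration.

```python
# Three-argument form: map a->x, b->y, c->z and delete every "!".
table = str.maketrans("abc", "xyz", "!")
swapped = "aabbcc!".translate(table)

# Dictionary form: character keys are converted to ordinals;
# a value of None deletes the character.
cleanup = str.maketrans({"-": " ", "_": None})
cleaned = "snake_case-name".translate(cleanup)

print(swapped)  # xxyyzz
print(cleaned)  # snakecase name
```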
- upper()
Return a copy of the string converted to uppercase.
- zfill(width, /)
Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
- ads.feature_engineering.adsstring.string.to_adsstring(func: Callable) Callable
Decorator that converts output of a function to ADSString if it returns a string.
- Parameters
func (Callable) – function to decorate
- Returns
decorated function
- Return type
Callable
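The idea behind such a decorator can be sketched with the standard library alone. MyString below is a hypothetical stand-in for ADSString, and to_mystring is an illustrative name; this is a sketch of the technique, not the library's implementation.

```python
import functools
from typing import Callable


class MyString(str):
    """Hypothetical stand-in for an enhanced string class."""


def to_mystring(func: Callable) -> Callable:
    """Wrap func so that plain str results are converted to MyString."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Only str results are converted; everything else passes through.
        if isinstance(result, str) and not isinstance(result, MyString):
            return MyString(result)
        return result

    return wrapper


@to_mystring
def shout(text: str) -> str:
    return text.upper()


@to_mystring
def length(text: str) -> int:
    return len(text)


print(type(shout("hi")).__name__)   # MyString
print(type(length("hi")).__name__)  # int
```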
- ads.feature_engineering.adsstring.string.wrap_output_string(decorator: Callable) Callable
Class decorator that applies a decorator to all methods of a class.
- Parameters
decorator (Callable) – decorator to apply
- Returns
class decorator
- Return type
Callable
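A class decorator of this kind takes a function decorator and returns a callable that rewrites the methods of a class. The sketch below is a simplified illustration with hypothetical names (apply_to_methods, exclaim, Greeter). It only wraps plain functions defined directly on the class, so inherited methods, which a str subclass would also need to cover, are out of scope here.

```python
import functools
from typing import Callable


def apply_to_methods(decorator: Callable) -> Callable:
    """Return a class decorator that applies decorator to a class's own methods."""

    def class_decorator(cls):
        # Iterate over a copy because the class is modified in the loop.
        for name, attr in list(vars(cls).items()):
            if callable(attr) and not name.startswith("__"):
                setattr(cls, name, decorator(attr))
        return cls

    return class_decorator


def exclaim(func: Callable) -> Callable:
    """Example decorator: append "!" to str results, leave others alone."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result + "!" if isinstance(result, str) else result

    return wrapper


@apply_to_methods(exclaim)
class Greeter:
    def hello(self, name: str) -> str:
        return f"hello {name}"

    def count(self, name: str) -> int:
        return len(name)


print(Greeter().hello("ann"))  # hello ann!
print(Greeter().count("ann"))  # 3
```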
ads.feature_engineering.feature_type.address module
The module that represents an Address feature type.
- Classes:
- Address
The Address feature type.
- class ads.feature_engineering.feature_type.address.Address
Bases:
String
Type representing address.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows the location of the given address on a map, based on its zip code.
Example
>>> from ads.feature_engineering.feature_type.address import Address
>>> import pandas as pd
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
...                      '1 Berkeley Street, Boston, MA 67891',
...                      '54305 Oxford Street, Seattle, WA 95132',
...                      ''])
>>> Address.validator.is_address(address)
0     True
1     True
2     True
3    False
dtype: bool
- description = 'Type representing address.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
...                      '1 Berkeley Street, Boston, MA 67891',
...                      '54305 Oxford Street, Seattle, WA 95132',
...                      ''],
...                     name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 1
    unique: 3
values: Address
- Returns
Domain based on the Address feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows the location of the given address on a map, based on its zip code.
Examples
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
...                      '1 Berkeley Street, Boston, MA 67891',
...                      '54305 Oxford Street, Seattle, WA 95132',
...                      ''],
...                     name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_plot()
- Returns
Plot object for the series based on the Address feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count).
Examples
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
...                      '1 Berkeley Street, Boston, MA 67891',
...                      '54305 Oxford Street, Seattle, WA 95132',
...                      ''],
...                     name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      3
2  missing      1
- Returns
Summary statistics of the Series provided.
- Return type
pandas.DataFrame
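The counts in the Address example (count 4, unique 3, missing 1) are consistent with counting the empty string as missing and leaving missing values out of the unique count. The standard-library sketch below works under that assumption. is_missing and basic_stats are hypothetical names, and the library's own computation, which operates on a pandas Series, may differ.

```python
import math
from typing import Any, Dict, Iterable


def is_missing(value: Any) -> bool:
    # Assumption: None, NaN and the empty string all count as missing.
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def basic_stats(values: Iterable[Any]) -> Dict[str, int]:
    """Return total count, unique non-missing count and missing count."""
    values = list(values)
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),
        "unique": len(set(present)),
        "missing": len(values) - len(present),
    }


addresses = [
    "1 Miller Drive, New York, NY 12345",
    "1 Berkeley Street, Boston, MA 67891",
    "54305 Oxford Street, Seattle, WA 95132",
    "",
]
address_stats = basic_stats(addresses)
print(address_stats)  # {'count': 4, 'unique': 3, 'missing': 1}
```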
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.address.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pd.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.base module
- class ads.feature_engineering.feature_type.base.FeatureBaseType(classname, bases, dictionary)
Bases:
type
The helper metaclass that extends the functionality of the FeatureType class.
- class ads.feature_engineering.feature_type.base.FeatureBaseTypeMeta(classname, bases, dictionary)
Bases:
FeatureBaseType, ABCMeta
The class to provide compatibility between ABC and FeatureBaseType metaclass.
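A combined metaclass is needed because ABC already has a metaclass (ABCMeta), and Python refuses to create a class whose bases carry unrelated metaclasses. The sketch below shows the general pattern with hypothetical names (RegistryMeta, RegistryABCMeta); what FeatureBaseType actually does when a class is created is not shown in this document.

```python
from abc import ABC, ABCMeta, abstractmethod


class RegistryMeta(type):
    """Hypothetical custom metaclass that records every class it creates."""

    registry = []

    def __init__(cls, classname, bases, dictionary):
        super().__init__(classname, bases, dictionary)
        RegistryMeta.registry.append(classname)


# Using the custom metaclass directly with ABC fails with a metaclass conflict.
try:
    class Broken(ABC, metaclass=RegistryMeta):
        pass
    conflict = False
except TypeError:
    conflict = True


class RegistryABCMeta(RegistryMeta, ABCMeta):
    """Combined metaclass: derives from both, so the conflict disappears."""


class Base(ABC, metaclass=RegistryABCMeta):
    @abstractmethod
    def describe(self) -> str:
        ...


class Concrete(Base):
    def describe(self) -> str:
        return "concrete"


print(conflict)               # True
print(RegistryMeta.registry)  # ['Base', 'Concrete']
```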
- class ads.feature_engineering.feature_type.base.FeatureType
Bases:
ABC
Abstract base class for feature types. Default class attributes include name and description. Unless specified, the name is generated automatically from the class name by camel-case to snake-case conversion.
- description = 'Base feature type.'
- name = 'feature_type'
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
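The camel-to-snake naming described for FeatureType explains why a class called NewType is registered as new_type and DateTime as date_time. A minimal sketch of such a conversion follows; camel_to_snake is a hypothetical helper written for this example, and the library's own conversion may treat edge cases such as runs of capitals differently.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert "_" between a lowercase letter or digit and an uppercase letter."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


for class_name in ("NewType", "DateTime", "CreditCard", "Continuous"):
    print(class_name, "->", camel_to_snake(class_name))
# NewType -> new_type
# DateTime -> date_time
# CreditCard -> credit_card
# Continuous -> continuous
```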
- class ads.feature_engineering.feature_type.base.Name
Bases:
object
- class ads.feature_engineering.feature_type.base.Tag(name: str)
Bases:
object
Class for free form tags. Name must be specified.
Initialize a tag instance.
- Parameters
name (str) – The name of the tag.
ads.feature_engineering.feature_type.boolean module
The module that represents a Boolean feature type.
- Classes:
- Boolean
The feature type that represents binary values True/False.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes given data and indicates if the data matches requirements.
- class ads.feature_engineering.feature_type.boolean.Boolean
Bases:
FeatureType
Type representing binary values True/False.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Show the counts of observations in True/False using bars.
Examples
>>> from ads.feature_engineering.feature_type.boolean import Boolean
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> Boolean.validator.is_boolean(s)
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool
- description = 'Type representing binary values True/False.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_domain()
constraints:
- expression: $x in [True, False]
  language: python
stats:
    count: 6
    missing: 2
    unique: 2
values: Boolean
- Returns
Domain based on the Boolean feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows the counts of observations in True/False using bars.
- Parameters
x (pandas.Series) – The feature being evaluated.
- Returns
Plot object for the series based on the Boolean feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
Examples
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_plot()
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count).
- Parameters
x (pandas.Series) – The feature being evaluated.
- Returns
Summary statistics of the Series or Dataframe provided.
- Return type
pandas.DataFrame
Examples
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.boolean.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.category module
The module that represents a Category feature type.
- Classes:
- Category
The Category feature type.
- class ads.feature_engineering.feature_type.category.Category
Bases:
FeatureType
Type representing discrete unordered values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows the counts of observations in each categorical bin using bar chart.
- description = 'Type representing discrete unordered values.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S',
...                  'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                 name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_domain()
constraints:
- expression: $x in ['S', 'C', 'Q', '']
  language: python
stats:
    count: 22
    missing: 3
    unique: 3
values: Category
- Returns
Domain based on the Category feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows the counts of observations in each categorical bin using bar chart.
- Parameters
x (pandas.Series) – The feature being evaluated.
- Returns
Plot object for the series based on the Category feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
Examples
>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S',
...                  'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                 name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_plot()
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count) if there are any.
- Parameters
x (pandas.Series) – The feature being evaluated.
- Returns
Summary statistics of the Series or Dataframe provided.
- Return type
pandas.DataFrame
Examples
>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S',
...                  'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                 name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.constant module
The module that represents a Constant feature type.
- Classes:
- Constant
The Constant feature type.
- class ads.feature_engineering.feature_type.constant.Constant
Bases:
FeatureType
Type representing constant values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows the counts of observations in bars.
- description = 'Type representing constant values.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Example
>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 5
    unique: 1
values: Constant
- Returns
Domain based on the Constant feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows the counts of observations in bars.
- Parameters
x (pandas.Series) – The feature being shown.
Examples
>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_plot()
- Returns
Plot object for the series based on the Constant feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count).
- Parameters
x (pandas.Series) – The feature being evaluated.
- Returns
Summary statistics of the Series provided.
- Return type
pandas.DataFrame
Examples
>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_stat()
   Metric  Value
0   count      5
1  unique      1
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.continuous module
The module that represents a Continuous feature type.
- Classes:
- Continuous
The Continuous feature type.
- class ads.feature_engineering.feature_type.continuous.Continuous
Bases:
FeatureType
Type representing continuous values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows distributions of datasets using box plot.
- description = 'Type representing continuous values.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, np.nan, None],
...                 name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_domain()
constraints: []
stats:
    count: 10.0
    lower quartile: 3.058
    mean: 4.959
    median: 3.81
    missing: 2.0
    sample maximum: 13.32
    sample minimum: 2.25
    skew: 2.175
    standard deviation: 3.62
    upper quartile: 4.908
values: Continuous
- Returns
Domain based on the Continuous feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows distributions of datasets using box plot.
Examples
>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, np.nan, None],
...                 name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_plot()
- Returns
Plot object for the series based on the Continuous feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, skew and missing(count).
Examples
>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, np.nan, None],
...                 name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_stat()
               Metric   Value
0               count  10.000
1                mean   4.959
2  standard deviation   3.620
3      sample minimum   2.250
4      lower quartile   3.058
5              median   3.810
6      upper quartile   4.908
7      sample maximum  13.320
8                skew   2.175
9             missing   2.000
- Returns
Summary statistics of the Series or Dataframe provided.
- Return type
pandas.DataFrame
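Most of the figures in the Continuous example can be reproduced with the Python standard library, which helps when checking how missing values and quartiles are treated. The sketch below is a cross-check, not the library's implementation. Two assumptions are made: missing values are excluded before computing the statistics, and quartiles use linear interpolation (method="inclusive"), which happens to match the values shown. Skew is left out because the statistics module has no function for it.

```python
import math
import statistics

raw = [13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, float("nan"), None]

# Assumption: None and NaN are dropped before the statistics are computed.
present = sorted(v for v in raw if v is not None and not math.isnan(v))

# Quartiles with linear interpolation between neighbouring data points.
q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")

summary = {
    "count": len(raw),
    "mean": statistics.mean(present),
    "standard deviation": statistics.stdev(present),  # sample standard deviation
    "sample minimum": present[0],
    "lower quartile": q1,
    "median": median,
    "upper quartile": q3,
    "sample maximum": present[-1],
    "missing": len(raw) - len(present),
}

for metric, value in summary.items():
    print(f"{metric:>20}: {value:.3f}")
```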
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.creditcard module
The module that represents a CreditCard feature type.
- Classes:
- CreditCard
The CreditCard feature type.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes given data and indicates if the data matches requirements.
- _luhn_checksum(card_number: str) -> float
Implements Luhn algorithm to validate a credit card number.
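The Luhn algorithm doubles every second digit counting from the right, subtracts 9 from any doubled value above 9, and sums all digits; a number passes when the sum is divisible by 10. The sketch below is a generic implementation of that algorithm. luhn_checksum and is_luhn_valid are hypothetical names, and the library's private _luhn_checksum may differ in signature and return type.

```python
def luhn_checksum(card_number: str) -> int:
    """Return the Luhn sum modulo 10 for a string of digits."""
    digits = [int(ch) for ch in card_number]
    total = 0
    # Walk from the rightmost digit; every second digit is doubled.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10


def is_luhn_valid(card_number: str) -> bool:
    """A number is valid when it is all digits and its checksum is 0."""
    return card_number.isdigit() and luhn_checksum(card_number) == 0


print(is_luhn_valid("4532640527811543"))  # True
print(is_luhn_valid("4111111111111112"))  # False
```

The first number is taken from the CreditCard examples in this module; the second is a well-known test number with its last digit changed so that the check fails.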
- class ads.feature_engineering.feature_type.creditcard.CreditCard
Bases:
String
Type representing credit card numbers.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows the counts of observations in each credit card type using bar chart.
Examples
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> import pandas as pd
>>> s = pd.Series(["4532640527811543", None, "4556929308150929",
...                "4539944650919740", "4485348152450846", "4556593717607190"],
...               name='credit_card')
>>> s.ads.feature_type = ['credit_card']
>>> CreditCard.validator.is_credit_card(s)
0     True
1    False
2     True
3     True
4     True
5     True
Name: credit_card, dtype: bool
- description = 'Type representing credit card numbers.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> visa = [
...     "4532640527811543",
...     None,
...     "4556929308150929",
...     "4539944650919740",
...     "4485348152450846",
...     "4556593717607190",
... ]
>>> mastercard = [
...     "5334180299390324",
...     "5111466404826446",
...     "5273114895302717",
...     "5430972152222336",
...     "5536426859893306",
... ]
>>> amex = [
...     "371025944923273",
...     "374745112042294",
...     "340984902710890",
...     "375767928645325",
...     "370720852891659",
... ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list, name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_domain()
constraints: []
stats:
    count: 16
    count_Amex: 5
    count_Diners Club: 2
    count_MasterCard: 3
    count_Visa: 5
    count_missing: 1
    missing: 1
    unique: 15
values: CreditCard
- Returns
Domain based on the CreditCard feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows the counts of observations in each credit card type using bar chart.
Examples
>>> visa = [
...     "4532640527811543",
...     None,
...     "4556929308150929",
...     "4539944650919740",
...     "4485348152450846",
...     "4556593717607190",
... ]
>>> mastercard = [
...     "5334180299390324",
...     "5111466404826446",
...     "5273114895302717",
...     "5430972152222336",
...     "5536426859893306",
... ]
>>> amex = [
...     "371025944923273",
...     "374745112042294",
...     "340984902710890",
...     "375767928645325",
...     "370720852891659",
... ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list, name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_plot()
- Returns
Plot object for the series based on the CreditCard feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series)
Generates feature statistics.
Feature statistics include (total)count, unique(count), missing(count) and count of each credit card type.
Examples
>>> visa = [
...     "4532640527811543",
...     None,
...     "4556929308150929",
...     "4539944650919740",
...     "4485348152450846",
...     "4556593717607190",
... ]
>>> mastercard = [
...     "5334180299390324",
...     "5111466404826446",
...     "5273114895302717",
...     "5430972152222336",
...     "5536426859893306",
... ]
>>> amex = [
...     "371025944923273",
...     "374745112042294",
...     "340984902710890",
...     "375767928645325",
...     "370720852891659",
... ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list, name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_stat()
              Metric  Value
0              count     16
1             unique     15
2            missing      1
3         count_Amex      5
4         count_Visa      5
5   count_MasterCard      3
6  count_Diners Club      2
7      count_missing      1
- Returns
Summary statistics of the Series or Dataframe provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.creditcard.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.datetime module
The module that represents a DateTime feature type.
- Classes:
- DateTime
The DateTime feature type.
- class ads.feature_engineering.feature_type.datetime.DateTime
Bases:
FeatureType
Type representing date and/or time.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows distributions of datetime datasets using histograms.
Example
>>> from ads.feature_engineering.feature_type.datetime import DateTime
>>> import pandas as pd
>>> s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> DateTime.validator.is_datetime(s)
0     True
1     True
2    False
3     True
Name: datetime, dtype: bool
- description = 'Type representing date and/or time.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan,
...                'April/13/2011', 'April/15/11'],
...               name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 8
    missing: 3
    sample maximum: April/15/11
    sample minimum: 3/11/2000
values: DateTime
- Returns
Domain based on the DateTime feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows distributions of datetime datasets using histograms.
Examples
>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan,
...                'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_plot()
- Returns
Plot object for the series based on the DateTime feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, sample maximum, sample minimum, and missing(count) if there is any.
Examples
>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan,
...                'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_stat()
    Metric            Value
0   count             8
1   sample maximum    April/15/11
2   sample minimum    3/11/2000
3   missing           3
- Returns
Summary statistics of the Series or DataFrame provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.datetime.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
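The handler contract described above (a Series in, a boolean Series of the same length out) can be illustrated with plain pandas. This is a minimal sketch and not the ADS implementation: `sketch_datetime_handler` and `_parses_as_datetime` are hypothetical names, and parsing each value with `pd.Timestamp` is an assumption made for illustration.

```python
import pandas as pd


def _parses_as_datetime(value) -> bool:
    """Return True if `value` is a non-empty string that pandas can parse."""
    if not isinstance(value, str) or not value.strip():
        return False
    try:
        pd.Timestamp(value)
        return True
    except (ValueError, TypeError):
        return False


def sketch_datetime_handler(data: pd.Series, *args, **kwargs) -> pd.Series:
    """Hypothetical stand-in for a default handler: one boolean per element."""
    return data.apply(_parses_as_datetime).astype(bool)


s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name="datetime")
mask = sketch_datetime_handler(s)  # boolean Series aligned with `s`
```

Because the result shares the index of the input, it can be used directly as a mask, for example `s[mask]`.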
ads.feature_engineering.feature_type.discrete module
The module that represents a Discrete feature type.
- Classes:
- Discrete
The Discrete feature type.
- class ads.feature_engineering.feature_type.discrete.Discrete
Bases:
FeatureType
Type representing discrete values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows distributions of datasets using box plot.
- description = 'Type representing discrete values.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> discrete_numbers = pd.Series([35, 25, 13, 42], name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_domain()
constraints: []
stats:
    count: 4
    unique: 4
values: Discrete
- Returns
Domain based on the Discrete feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows distributions of datasets using box plot.
Examples
>>> discrete_numbers = pd.Series([35, 25, 13, 42], name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_plot()
- Returns
Plot object for the series based on the Discrete feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count).
Examples
>>> discrete_numbers = pd.Series([35, 25, 13, 42], name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_stat()
        discrete
count   4
unique  4
- Returns
Summary statistics of the Series provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.document module
The module that represents a Document feature type.
- Classes:
- Document
The Document feature type.
- class ads.feature_engineering.feature_type.document.Document
Bases:
FeatureType
Type representing document values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- description = 'Type representing document values.'
- classmethod feature_domain()
- Returns
Nothing.
- Return type
None
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.gis module
The module that represents a GIS feature type.
- Classes:
- GIS
The GIS feature type.
- class ads.feature_engineering.feature_type.gis.GIS
Bases:
FeatureType
Type representing geographic information.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows the location of the given address on a map based on longitude and latitude.
Example
>>> from ads.feature_engineering.feature_type.gis import GIS
>>> import pandas as pd
>>> s = pd.Series([
...     "-18.2193965, -93.587285", "-21.0255305, -122.478584",
...     "85.103913, 19.405744", "82.913736, 178.225672",
...     "62.9795085,-66.989705", "54.5604395,95.235090",
...     "24.2811855,-162.380403", "-1.818319,-80.681214",
...     None, "(51.816119, 175.979008)", "(54.3392995,-11.801615)",
... ], name='gis')
>>> s.ads.feature_type = ['gis']
>>> GIS.validator.is_gis(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: gis, dtype: bool
- description = 'Type representing geographic information.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> gis = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='gis')
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: GIS
- Returns
Domain based on the GIS feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows the location of the given address on a map based on longitude and latitude.
Examples
>>> gis = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='gis')
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_plot()
- Returns
Plot object for the series based on the GIS feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count).
Examples
>>> gis = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='gis')
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_stat()
         gis
count    13
unique   10
missing  3
- Returns
Summary statistics of the Series provided.
- Return type
pandas.DataFrame
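The three statistics named above (count, unique, missing) can be approximated with plain pandas. The following is a rough sketch rather than the ADS implementation: `sketch_count_stats` is a hypothetical helper, and treating empty strings as missing is an assumption inferred from the example output, where `""`, NaN and `None` together give a missing count of 3.

```python
import numpy as np
import pandas as pd


def sketch_count_stats(x: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: total count, unique non-missing values, missing count."""
    # Assumption: empty strings count as missing, alongside None and NaN.
    cleaned = x.mask(x == "")
    return pd.DataFrame(
        {x.name: [len(x), cleaned.nunique(dropna=True), int(cleaned.isna().sum())]},
        index=["count", "unique", "missing"],
    )


sample = pd.Series(
    ["69.196241,-125.017615", "5.2272595,-143.465712",
     "69.196241,-125.017615", "", np.nan, None],
    name="gis",
)
stats = sketch_count_stats(sample)
```

Here `count` includes missing entries, while `unique` is computed over the non-missing values only.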
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.gis.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.integer module
The module that represents an Integer feature type.
- Classes:
- Integer
The Integer feature type.
- class ads.feature_engineering.feature_type.integer.Integer
Bases:
FeatureType
Type representing integer values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows distributions of datasets using box plot.
- description = 'Type representing integer values.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series([True, False, True, False, np.nan, None], name='integer')
>>> s.ads.feature_type = ['integer']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    freq: 2
    missing: 2
    top: true
    unique: 2
values: Integer
- Returns
Domain based on the Integer feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows distributions of datasets using box plot.
Examples
>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_plot()
- Returns
Plot object for the series based on the Integer feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, and missing(count) if there is any.
Examples
>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_stat()
    Metric                Value
0   count                 7
1   mean                  1
2   standard deviation    1
3   sample minimum        0
4   lower quartile        1
5   median                1
6   upper quartile        2
7   sample maximum        4
8   missing               1
- Returns
Summary statistics of the Series or DataFrame provided.
- Return type
pandas.DataFrame
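The metrics listed for the Integer type map closely onto what `pandas.Series.describe()` reports. The sketch below builds a comparable table with plain pandas; it is not the ADS implementation, the metric labels are copied from the example above, and the values ADS prints may be rounded or cast differently from the raw floats computed here.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name="integer")

# describe() computes mean, std, min, quartiles and max over non-missing values.
described = x.describe()

summary = pd.DataFrame(
    {
        "Metric": [
            "count", "mean", "standard deviation", "sample minimum",
            "lower quartile", "median", "upper quartile", "sample maximum",
            "missing",
        ],
        "Value": [
            len(x),                 # total count, including missing entries
            described["mean"],
            described["std"],
            described["min"],
            described["25%"],
            described["50%"],
            described["75%"],
            described["max"],
            int(x.isna().sum()),    # number of missing entries
        ],
    }
)
```

Note that `described["count"]` would exclude the missing entry, which is why the total count is taken from `len(x)` in this sketch.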
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.ip_address module
The module that represents an IpAddress feature type.
- Classes:
- IpAddress
The IpAddress feature type.
- class ads.feature_engineering.feature_type.ip_address.IpAddress
Bases:
FeatureType
Type representing IP Address.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
Example
>>> from ads.feature_engineering.feature_type.ip_address import IpAddress
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> IpAddress.validator.is_ip_address(s)
0     True
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
- description = 'Type representing IP Address.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::',
...                np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 3
values: IpAddress
- Returns
Domain based on the IpAddress feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count).
Examples
>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::',
...                np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_stat()
    Metric     Value
0   count      6
1   unique     3
2   missing    2
- Returns
Summary statistics of the Series provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.ip_address.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
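A validation of this kind can be sketched with the standard-library `ipaddress` module. This is an illustrative approximation, not the ADS handler: `sketch_ip_handler` and its `version` parameter are hypothetical, and ADS may use a different matching strategy.

```python
import ipaddress

import numpy as np
import pandas as pd


def _is_ip(value, version=None) -> bool:
    """Return True if `value` is a valid IP address string (optionally v4 or v6 only)."""
    if not isinstance(value, str):
        return False
    try:
        parsed = ipaddress.ip_address(value)
    except ValueError:
        return False
    return version is None or parsed.version == version


def sketch_ip_handler(data: pd.Series, version=None) -> pd.Series:
    """Hypothetical handler: one boolean per element of `data`."""
    return data.apply(lambda v: _is_ip(v, version)).astype(bool)


s = pd.Series(["192.168.0.1", "2001:db8::", "", np.nan, None], name="ip_address")
any_version = sketch_ip_handler(s)
only_v4 = sketch_ip_handler(s, version=4)
only_v6 = sketch_ip_handler(s, version=6)
```

Passing `version=4` or `version=6` restricts the check to one address family, which mirrors the split between the IpAddress, IpAddressV4 and IpAddressV6 feature types.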
ads.feature_engineering.feature_type.ip_address_v4 module
The module that represents an IpAddressV4 feature type.
- Classes:
- IpAddressV4
The IpAddressV4 feature type.
- class ads.feature_engineering.feature_type.ip_address_v4.IpAddressV4
Bases:
FeatureType
Type representing IP Address V4.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
Example
>>> from ads.feature_engineering.feature_type.ip_address_v4 import IpAddressV4
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> IpAddressV4.validator.is_ip_address_v4(s)
0     True
1    False
2    False
3    False
4    False
Name: ip_address, dtype: bool
- description = 'Type representing IP Address V4.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4',
...                np.nan, None], name='ip_address_v4')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 4
values: IpAddressV4
- Returns
Domain based on the IpAddressV4 feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count).
Examples
>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4',
...                np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_stat()
    Metric     Value
0   count      6
1   unique     4
2   missing    2
- Returns
Summary statistics of the Series provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.ip_address_v4.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.ip_address_v6 module
The module that represents an IpAddressV6 feature type.
- Classes:
- IpAddressV6
The IpAddressV6 feature type.
- class ads.feature_engineering.feature_type.ip_address_v6.IpAddressV6
Bases:
FeatureType
Type representing IP Address V6.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
Example
>>> from ads.feature_engineering.feature_type.ip_address_v6 import IpAddressV6
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> IpAddressV6.validator.is_ip_address_v6(s)
0    False
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
- description = 'Type representing IP Address V6.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::',
...                np.nan, None], name='ip_address_v6')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 2
values: IpAddressV6
- Returns
Domain based on the IpAddressV6 feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count).
Examples
>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::',
...                np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_stat()
    Metric     Value
0   count      6
1   unique     2
2   missing    2
- Returns
Summary statistics of the Series provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.ip_address_v6.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.lat_long module
The module that represents a LatLong feature type.
- Classes:
- LatLong
The LatLong feature type.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes given data and indicates if the data matches requirements.
- class ads.feature_engineering.feature_type.lat_long.LatLong
Bases:
String
Type representing longitude and latitude.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows the location of the given address on a map based on longitude and latitude.
Example
>>> from ads.feature_engineering.feature_type.lat_long import LatLong
>>> import pandas as pd
>>> s = pd.Series([
...     "-18.2193965, -93.587285", "-21.0255305, -122.478584",
...     "85.103913, 19.405744", "82.913736, 178.225672",
...     "62.9795085,-66.989705", "54.5604395,95.235090",
...     "24.2811855,-162.380403", "-1.818319,-80.681214",
...     None, "(51.816119, 175.979008)", "(54.3392995,-11.801615)",
... ], name='latlong')
>>> s.ads.feature_type = ['lat_long']
>>> LatLong.validator.is_lat_long(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: latlong, dtype: bool
- description = 'Type representing longitude and latitute.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> latlong_series = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='latlong')
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: LatLong
- Returns
Domain based on the LatLong feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows the location of the given address on a map based on longitude and latitude.
Examples
>>> latlong_series = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='latlong')
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_plot()
- Returns
Plot object for the series based on the LatLong feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generate feature statistics.
Feature statistics include (total)count, unique(count) and missing(count) if there is any.
Examples
>>> latlong_series = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='latlong')
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_stat()
    Metric     Value
0   count      13
1   unique     10
2   missing    3
- Returns
Summary statistics of the Series or DataFrame provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.lat_long.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.object module
The module that represents an Object feature type.
- Classes:
- Object
The Object feature type.
- class ads.feature_engineering.feature_type.object.Object
Bases:
FeatureType
Type representing object.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- description = 'Type representing object.'
- classmethod feature_domain()
- Returns
Nothing.
- Return type
None
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.ordinal module
The module that represents an Ordinal feature type.
- Classes:
- Ordinal
The Ordinal feature type.
- class ads.feature_engineering.feature_type.ordinal.Ordinal
Bases:
FeatureType
Type representing ordered values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows the counts of observations in each categorical bin using bar chart.
- description = 'Type representing ordered values.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_domain()
constraints:
- expression: $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
  language: python
stats:
    count: 10
    missing: 1
    unique: 9
values: Ordinal
- Returns
Domain based on the Ordinal feature type.
- Return type
ads.feature_engineering.schema.Domain
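The constraint in the example above lists the distinct non-missing values of the series. The values appear as floats because a NaN in the data forces pandas to store the integers as `float64`. The sketch below rebuilds a similar expression string with plain pandas; how ADS constructs or evaluates the expression internally is not shown in this reference, so `satisfies` is a hypothetical stand-in for that check.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name="ordinal")

# Distinct non-missing values, sorted; floats because NaN forces float64.
allowed = sorted(x.dropna().unique().tolist())
expression = f"$x in {allowed}"


def satisfies(value) -> bool:
    """Hypothetical membership check mirroring the `$x in [...]` expression."""
    return value in allowed
```

Since `3 == 3.0` in Python, integer inputs still pass the membership test against the float list.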
- static feature_plot(x: Series) Axes
Shows the counts of observations in each categorical bin using bar chart.
Examples
>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_plot()
- Returns
The bar chart plot object for the series based on the Ordinal feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count), and missing(count) if there is any.
Examples
>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_stat()
    Metric     Value
0   count      10
1   unique     9
2   missing    1
- Returns
Summary statistics of the Series or DataFrame provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.phone_number module
The module that represents a Phone Number feature type.
- Classes:
- PhoneNumber
The Phone Number feature type.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes given data and indicates if the data matches requirements.
- class ads.feature_engineering.feature_type.phone_number.PhoneNumber
Bases:
String
Type representing phone numbers.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
Examples
>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> import pandas as pd
>>> s = pd.Series([None, "1-640-124-5367", "1-573-916-4412"])
>>> PhoneNumber.validator.is_phone_number(s)
0    False
1     True
2     True
dtype: bool
- description = 'Type representing phone numbers.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.nan,
...                np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 7
    missing: 4
    unique: 2
values: PhoneNumber
- Returns
Domain based on the PhoneNumber feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count) if there is any.
Examples
>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.nan,
...                np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_stat()
    Metric     Value
1   count      7
2   unique     2
3   missing    4
- Returns
Summary statistics of the Series or DataFrame provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.phone_number.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.string module
The module that represents a String feature type.
- Classes:
- String
The feature type that represents string values.
- class ads.feature_engineering.feature_type.string.String
Bases:
FeatureType
Type representing string values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows distributions of datasets using wordcloud.
Example
>>> from ads.feature_engineering.feature_type.string import String
>>> import pandas as pd
>>> s = pd.Series(["Hello", "world", None], name='string')
>>> String.validator.is_string(s)
0     True
1     True
2    False
Name: string, dtype: bool
- description = 'Type representing string values.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S',
...                     'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                    name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_domain()
constraints: []
stats:
    count: 22
    missing: 3
    unique: 3
values: String
- Returns
Domain based on the String feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows distributions of datasets using wordcloud.
Examples
>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S',
...                     'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                    name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_plot()
- Returns
Plot object for the series based on the String feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include (total)count, unique(count) and missing(count) if there is any.
Examples
>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S',
...                     'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                    name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_stat()
    Metric     Value
0   count      22
1   unique     3
2   missing    3
- Returns
Summary statistics of the Series or DataFrame provided.
- Return type
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.string.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pandas.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pandas.Series
ads.feature_engineering.feature_type.text module
The module that represents a Text feature type.
- Classes:
- Text
The Text feature type.
- class ads.feature_engineering.feature_type.text.Text
Bases:
String
Type representing text values.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes
Shows distributions of datasets using wordcloud.
- description = 'Type representing text values.'
- classmethod feature_domain()
- Returns
Nothing.
- Return type
None
- static feature_plot(x: Series) Axes
Shows distributions of datasets using wordcloud.
Examples
>>> text = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S',
...                   'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                  name='text')
>>> text.ads.feature_type = ['text']
>>> text.ads.feature_plot()
- Returns
Plot object for the series based on the Text feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.unknown module
The module that represents an Unknown feature type.
- Classes:
- Unknown
The Unknown feature type.
- class ads.feature_engineering.feature_type.unknown.Unknown
Bases:
FeatureType
Type representing third-party dtypes.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- Type
FeatureValidator
- description = 'Type representing unknown type.'
- classmethod feature_domain()
- Returns
Nothing.
- Return type
None
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
ads.feature_engineering.feature_type.zip_code module
The module that represents a ZipCode feature type.
- Classes:
- ZipCode
The ZipCode feature type.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes given data and indicates if the data matches requirements.
- class ads.feature_engineering.feature_type.zip_code.ZipCode
Bases:
String
Type representing postal code.
- description
The feature type description.
- Type
str
- name
The feature type name.
- Type
str
- warning
Provides functionality to register warnings and invoke them.
- Type
FeatureWarning
- validator
Provides functionality to register validators and invoke them.
- Type
FeatureValidator
- feature_stat(x: pd.Series) pd.DataFrame
Generates feature statistics.
- feature_plot(x: pd.Series) plt.Axes
Shows the geographic distribution based on the location of the zip codes.
Example
>>> from ads.feature_engineering.feature_type.zip_code import ZipCode
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["94065", "90210", np.nan, None], name='zipcode')
>>> ZipCode.validator.is_zip_code(s)
0 True
1 True
2 False
3 False
Name: zipcode, dtype: bool
- description = 'Type representing postal code.'
- classmethod feature_domain(x: Series) Domain
Generate the domain of the data of this feature type.
Examples
>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 2
    unique: 2
values: ZipCode
- Returns
Domain based on the ZipCode feature type.
- Return type
ads.feature_engineering.schema.Domain
- static feature_plot(x: Series) Axes
Shows the geographic distribution based on the location of the zip codes.
Examples
>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_plot()
- Returns
Plot object for the series based on the ZipCode feature type.
- Return type
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame
Generates feature statistics.
Feature statistics include the total count, the unique count and the missing count.
Examples
>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_stat()
Metric Value
0 count 4
1 unique 2
2 missing 2
- Returns
Summary statistics of the Series provided.
- Return type
Pandas Dataframe
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>
- ads.feature_engineering.feature_type.zip_code.default_handler(data: Series, *args, **kwargs) Series
Processes given data and indicates if the data matches requirements.
- Parameters
data (pd.Series) – The data to process.
- Returns
The logical list indicating if the data matches requirements.
- Return type
pd.Series
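As an illustration of a handler that returns such a logical list, here is a minimal plain-Python sketch. The pattern (five digits with an optional `-NNNN` suffix) and the name `is_zip_code_list` are assumptions made for this example; the library's own handler works on a `pd.Series` and its exact matching rule is not stated in this reference.

```python
import re

# Assumed pattern: US ZIP (five digits) with an optional ZIP+4 suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_zip_code_list(values):
    """Return one boolean per value; non-string values never match."""
    return [
        isinstance(v, str) and ZIP_PATTERN.fullmatch(v) is not None
        for v in values
    ]


mask = is_zip_code_list(["94065", "90210", float("nan"), None])
print(mask)
```

For the same four values as the `ZipCode.validator.is_zip_code` example, the mask marks the two strings as valid and the two missing entries as invalid.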
ads.feature_engineering.feature_type.handler.feature_validator module
The module that helps to register custom validators for the feature types and to extend registered validators with dispatching based on specific arguments.
Classes
- FeatureValidator
The Feature Validator class to manage custom validators.
- FeatureValidatorMethod
The Feature Validator Method class. Extends methods that require dispatching based on specific arguments.
- class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator
Bases:
object
The Feature Validator class to manage custom validators.
- register(self, name: str, handler: Callable, condition: Union[Tuple, Dict[str, Any]] = None, replace: bool = False) None
Registers new validator.
- unregister(self, name: str, condition: Union[Tuple, Dict[str, Any]] = None) None
Unregisters validator.
- registered(self) pd.DataFrame
Gets the list of registered validators.
Examples
>>> series = pd.Series(['+1-202-555-0141', '+1-202-555-0142'], name='Phone Number')
>>> def phone_number_validator(data: pd.Series) -> pd.Series:
...     print("phone_number_validator")
...     return data
>>> def universal_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...     print("universal_phone_number_validator")
...     return data
>>> def us_phone_number_validator(data: pd.Series, country_code) -> pd.Series:
...     print("us_phone_number_validator")
...     return data
>>> PhoneNumber.validator.register(name="is_phone_number", handler=phone_number_validator, replace=True)
>>> PhoneNumber.validator.register(name="is_phone_number", handler=universal_phone_number_validator, condition = ('country_code',))
>>> PhoneNumber.validator.register(name="is_phone_number", handler=us_phone_number_validator, condition = {'country_code':'+1'})
>>> PhoneNumber.validator.is_phone_number(series)
phone_number_validator
0 +1-202-555-0141
1 +1-202-555-0142
>>> PhoneNumber.validator.is_phone_number(series, country_code = '+7')
universal_phone_number_validator
0 +1-202-555-0141
1 +1-202-555-0142
>>> PhoneNumber.validator.is_phone_number(series, country_code = '+1')
us_phone_number_validator
0 +1-202-555-0141
1 +1-202-555-0142
>>> PhoneNumber.validator.registered()
Validator Condition Handler
---------------------------------------------------------------------------------
0 is_phone_number () phone_number_validator
1 is_phone_number ('country_code') universal_phone_number_validator
2 is_phone_number {'country_code': '+1'} us_phone_number_validator
>>> series.ads.validator.is_phone_number()
phone_number_validator
0 +1-202-555-0141
1 +1-202-555-0142
>>> series.ads.validator.is_phone_number(country_code = '+7')
universal_phone_number_validator
0 +1-202-555-0141
1 +1-202-555-0142
>>> series.ads.validator.is_phone_number(country_code = '+1')
us_phone_number_validator
0 +1-202-555-0141
1 +1-202-555-0142
Initializes the FeatureValidator.
- register(name: str, handler: Callable, condition: Optional[Union[Tuple, Dict[str, Any]]] = None, replace: bool = False) None
Registers new validator.
- Parameters
name (str) – The validator name.
handler (callable) – The handler.
condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator.
replace (bool) – The flag indicating if the registered validator should be replaced with the new one.
- Returns
Nothing.
- Return type
None
- Raises
ValueError – The name is empty or handler is not provided.
TypeError – The handler is not callable. The name of the validator is not a string.
ValidatorAlreadyExists – The validator is already registered.
- registered() DataFrame
Gets the list of registered validators.
- Returns
The list of registered validators.
- Return type
pd.DataFrame
- unregister(name: str, condition: Optional[Union[Tuple, Dict[str, Any]]] = None) None
Unregisters validator.
- Parameters
name (str) – The name of the validator to be unregistered.
condition (Union[Tuple, Dict[str, Any]]) – The condition for the validator to be unregistered.
- Returns
Nothing.
- Return type
None
- Raises
TypeError – The name of the validator is not a string.
ValidatorNotFound – The validator is not found.
ValidatorWithConditionNotFound – The validator with the provided condition is not found.
- class ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidatorMethod(handler: Callable)
Bases:
object
The Feature Validator Method class.
Extends methods that require dispatching based on specific arguments.
- register(self, condition: Union[Tuple, Dict[str, Any]], handler: Callable) None
Registers new handler.
- unregister(self, condition: Union[Tuple, Dict[str, Any]]) None
Unregisters existing handler.
- registered(self) pd.DataFrame
Gets the list of registered handlers.
Initializes the Feature Validator Method.
- Parameters
handler (Callable) – The handler that is called by default if no suitable one is found.
- register(condition: Union[Tuple, Dict[str, Any]], handler: Callable) None
Registers new handler.
- Parameters
condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to register a new handler.
handler (Callable) – The handler to be registered.
- Returns
Nothing.
- Return type
None
- Raises
ValueError – If condition not provided or provided in the wrong format. If handler not provided or has wrong format.
- registered() DataFrame
Gets the list of registered handlers.
- Returns
The list of registered handlers.
- Return type
pd.DataFrame
- unregister(condition: Union[Tuple, Dict[str, Any]]) None
Unregisters existing handler.
- Parameters
condition (Union[Tuple, Dict[str, Any]]) – The condition which will be used to unregister a handler.
- Returns
Nothing.
- Return type
None
- Raises
ValueError – If condition not provided or provided in the wrong format. If condition not registered.
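The condition-based dispatching described above can be pictured with a small plain-Python sketch. It is a hypothetical stand-in, not the library's code: the class name `ConditionalDispatcher` and the lookup order (exact name/value match first, then argument names only, then the default handler) are assumptions chosen to reproduce the behaviour shown in the phone-number example.

```python
class ConditionalDispatcher:
    """Hypothetical sketch of a method that dispatches on keyword arguments."""

    def __init__(self, default_handler):
        self._default = default_handler
        self._by_names = {}   # frozenset of kwarg names -> handler
        self._by_values = {}  # frozenset of (name, value) pairs -> handler

    def register(self, condition, handler):
        if not callable(handler):
            raise ValueError("handler must be callable")
        if isinstance(condition, dict) and condition:
            # Dict condition: argument names AND values must match.
            self._by_values[frozenset(condition.items())] = handler
        elif isinstance(condition, tuple) and condition:
            # Tuple condition: only the argument names must match.
            self._by_names[frozenset(condition)] = handler
        else:
            raise ValueError("condition must be a non-empty tuple or dict")

    def __call__(self, data, **kwargs):
        # Assumes keyword values are hashable (strings in this example).
        handler = self._by_values.get(frozenset(kwargs.items()))
        if handler is None:
            handler = self._by_names.get(frozenset(kwargs))
        if handler is None:
            handler = self._default
        return handler(data, **kwargs)


def default(data):
    return "phone_number_validator"


def any_country(data, country_code):
    return "universal_phone_number_validator"


def us_only(data, country_code):
    return "us_phone_number_validator"


is_phone_number = ConditionalDispatcher(default)
is_phone_number.register(("country_code",), any_country)
is_phone_number.register({"country_code": "+1"}, us_only)

results = [
    is_phone_number([]),
    is_phone_number([], country_code="+7"),
    is_phone_number([], country_code="+1"),
]
print(results)
```

The three calls resolve to the default, the name-only and the name-and-value handler respectively, mirroring the order of the documented `is_phone_number` calls.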
- exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorAlreadyExists(name: str)
Bases:
ValueError
- exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorNotFound(name: str)
Bases:
ValueError
- exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionAlreadyExists(name: str)
Bases:
ValueError
- exception ads.feature_engineering.feature_type.handler.feature_validator.ValidatorWithConditionNotFound(name: str)
Bases:
ValueError
- exception ads.feature_engineering.feature_type.handler.feature_validator.WrongHandlerMethodSignature(handler_name: str, condition: str, handler_signature: str)
Bases:
ValueError
ads.feature_engineering.feature_type.handler.feature_warning module
The module that helps to register custom warnings for the feature types.
Classes
- FeatureWarning
The Feature Warning class. Provides functionality to register warning handlers and invoke them.
Examples
>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
... return pd.DataFrame(
... [['Zeros', 'Age has 38 zeros', 'Count', 38]],
... columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
... return pd.DataFrame(
... [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
... columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
Name Handler
----------------------------------------------------------
0 zeros_count warning_handler_zeros_count
1 zeros_percentage warning_handler_zeros_percentage
>>> warning.zeros_count(data_series)
Warning Message Metric Value
----------------------------------------------------------------
0 Zeros Age has 38 zeros Count 38
>>> warning.zeros_percentage(data_series)
Warning Message Metric Value
----------------------------------------------------------------
0 Zeros Age has 12.2% zeros Percentage 12.2%
>>> warning(data_series)
Warning Message Metric Value
----------------------------------------------------------------
0 Zeros Age has 38 zeros Count 38
1 Zeros Age has 12.2% zeros Percentage 12.2%
>>> warning.unregister('zeros_count')
>>> warning(data_series)
Warning Message Metric Value
----------------------------------------------------------------
0 Zeros Age has 12.2% zeros Percentage 12.2%
- class ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning
Bases:
object
The Feature Warning class.
Provides functionality to register warning handlers and invoke them.
- register(self, name: str, handler: Callable) None
Registers a new warning for the feature type.
- unregister(self, name: str) None
Unregisters warning.
- registered(self) pd.DataFrame
Gets the list of registered warnings.
Examples
>>> warning = FeatureWarning()
>>> def warning_handler_zeros_count(data):
...     return pd.DataFrame(
...         [['Zeros', 'Age has 38 zeros', 'Count', 38]],
...         columns=['Warning', 'Message', 'Metric', 'Value'])
>>> def warning_handler_zeros_percentage(data):
...     return pd.DataFrame(
...         [['Zeros', 'Age has 12.2% zeros', 'Percentage', '12.2%']],
...         columns=['Warning', 'Message', 'Metric', 'Value'])
>>> warning.register(name="zeros_count", handler=warning_handler_zeros_count)
>>> warning.register(name="zeros_percentage", handler=warning_handler_zeros_percentage)
>>> warning.registered()
Name Handler
----------------------------------------------------------
0 zeros_count warning_handler_zeros_count
1 zeros_percentage warning_handler_zeros_percentage
>>> warning.zeros_count(data_series)
Warning Message Metric Value
----------------------------------------------------------------
0 Zeros Age has 38 zeros Count 38
>>> warning.zeros_percentage(data_series)
Warning Message Metric Value
----------------------------------------------------------------
0 Zeros Age has 12.2% zeros Percentage 12.2%
>>> warning(data_series)
Warning Message Metric Value
----------------------------------------------------------------
0 Zeros Age has 38 zeros Count 38
1 Zeros Age has 12.2% zeros Percentage 12.2%
>>> warning.unregister('zeros_count')
>>> warning(data_series)
Warning Message Metric Value
----------------------------------------------------------------
0 Zeros Age has 12.2% zeros Percentage 12.2%
Initializes the FeatureWarning.
- register(name: str, handler: Callable, replace: bool = False) None
Registers a new warning.
- Parameters
name (str) – The warning name.
handler (callable) – The handler associated with the warning.
replace (bool) – The flag indicating if the registered warning should be replaced with the new one.
- Returns
Nothing
- Return type
None
- Raises
ValueError – If warning name is empty or handler not defined.
TypeError – If handler is not callable.
WarningAlreadyExists – If warning is already registered.
- registered() DataFrame
Gets the list of registered warnings.
- Returns
The list of registered warnings in DataFrame format.
- Return type
pd.DataFrame
Examples
>>> warning.registered()
Name Handler
-----------------------------------------------------------
0 zeros_count warning_handler_zeros_count
1 zeros_percentage warning_handler_zeros_percentage
- unregister(name: str) None
Unregisters warning.
- Parameters
name (str) – The name of warning to be unregistered.
- Returns
Nothing.
- Return type
None
- Raises
ValueError – If warning name is not provided or empty.
WarningNotFound – If warning not found.
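The register, invoke and unregister cycle documented above can be sketched without pandas. The class `WarningRegistry` below is hypothetical: handlers return lists of row dictionaries instead of DataFrames, and the built-in `ValueError` stands in for `WarningAlreadyExists` and `WarningNotFound`.

```python
class WarningRegistry:
    """Hypothetical sketch of a warning registry; rows are plain dicts."""

    def __init__(self):
        self._handlers = {}  # dict keeps registration order

    def register(self, name, handler, replace=False):
        if not name:
            raise ValueError("warning name is empty")
        if not callable(handler):
            raise TypeError("handler is not callable")
        if name in self._handlers and not replace:
            raise ValueError(f"warning {name!r} is already registered")
        self._handlers[name] = handler

    def unregister(self, name):
        if not name:
            raise ValueError("warning name is empty")
        if name not in self._handlers:
            raise ValueError(f"warning {name!r} not found")
        del self._handlers[name]

    def __call__(self, data):
        # Invoke every registered handler and concatenate the rows.
        rows = []
        for handler in self._handlers.values():
            rows.extend(handler(data))
        return rows


def zeros_count(data):
    n = sum(1 for v in data if v == 0)
    return [{"Warning": "Zeros", "Metric": "Count", "Value": n}]


def zeros_percentage(data):
    pct = 100 * sum(1 for v in data if v == 0) / len(data)
    return [{"Warning": "Zeros", "Metric": "Percentage", "Value": f"{pct:.1f}%"}]


warning = WarningRegistry()
warning.register("zeros_count", zeros_count)
warning.register("zeros_percentage", zeros_percentage)
before = warning([0, 1, 0, 2])
warning.unregister("zeros_count")
after = warning([0, 1, 0, 2])
print(before)
print(after)
```

Calling the registry runs all handlers in registration order; after `zeros_count` is unregistered, only the percentage row remains.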
ads.feature_engineering.feature_type.handler.warnings module
The module with all default warnings provided to the user. These are registered to the relevant feature types directly in the feature type files themselves.
- ads.feature_engineering.feature_type.handler.warnings.high_cardinality_handler(s: Series) DataFrame
Warning if the number of unique values (including NaN) in the series is greater than or equal to 15.
- Parameters
s (pd.Series) – Pandas series - column of some feature type.
- Returns
Dataframe with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the count of unique values.
- Return type
pd.DataFrame
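A minimal plain-Python sketch of the threshold rule stated above (at least 15 unique values, with missing values counted as one of them). The function name, the warning label and the message text are hypothetical; the library's handler takes a `pd.Series` and returns a DataFrame.

```python
import math


def high_cardinality_rows(values, threshold=15):
    """Return one warning row if the number of distinct values reaches the threshold."""
    distinct = set()
    for v in values:
        # Collapse None and NaN into a single "missing" bucket so it counts once.
        if v is None or (isinstance(v, float) and math.isnan(v)):
            distinct.add(None)
        else:
            distinct.add(v)
    n_unique = len(distinct)
    if n_unique < threshold:
        return []
    return [{
        "Warning": "High cardinality",
        "Message": f"{n_unique} unique values",
        "Metric": "Count",
        "Value": n_unique,
    }]


below = high_cardinality_rows(list(range(14)))
at_threshold = high_cardinality_rows(list(range(14)) + [float("nan"), None])
print(below, at_threshold)
```

Fourteen distinct numbers stay below the threshold, while adding missing entries brings the count to 15 and triggers the single-row warning.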
- ads.feature_engineering.feature_type.handler.warnings.missing_values_handler(s: Series) DataFrame
Warning if more than 5 percent of the values in the series are missing (NaN).
- Parameters
s (pd.Series) – Pandas series - column of some feature type.
- Returns
Dataframe with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where first row is count of missing values and second is percentage of missing values.
- Return type
pd.DataFrame
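The rule above (strictly more than 5 percent missing, reported as a count row and a percentage row) can be sketched in plain Python. The function name, the label and the message wording are hypothetical, and lists of dictionaries stand in for the DataFrame the library returns.

```python
import math


def missing_values_rows(values, threshold_pct=5.0):
    """Return a count row and a percentage row if too many values are missing."""
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    pct = 100.0 * n_missing / len(values) if values else 0.0
    if pct <= threshold_pct:
        return []
    return [
        {"Warning": "Missing values", "Message": f"{n_missing} missing values",
         "Metric": "Count", "Value": n_missing},
        {"Warning": "Missing values", "Message": f"{pct:.1f}% missing values",
         "Metric": "Percentage", "Value": f"{pct:.1f}%"},
    ]


exactly_five_pct = missing_values_rows([1] * 19 + [None])
ten_pct = missing_values_rows([1] * 9 + [float("nan")])
print(exactly_five_pct, ten_pct)
```

One missing value out of twenty is exactly 5 percent and does not trigger the warning; one out of ten does.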
- ads.feature_engineering.feature_type.handler.warnings.skew_handler(s: Series) DataFrame
Warning if absolute value of skew is greater than 1.
- Parameters
s (pd.Series) – Pandas series - column of some feature type, expects continuous values.
- Returns
Dataframe with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 1 row, which lists the skew value of that column.
- Return type
pd.DataFrame
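To make the skew threshold concrete, here is a plain-Python sketch that computes the bias-adjusted sample skewness and applies the "absolute value greater than 1" rule. Which skewness estimator the library's handler uses is an assumption, as are the function names and the message text.

```python
import math


def sample_skewness(values):
    """Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def skew_rows(values, threshold=1.0):
    """Return one warning row if |skew| exceeds the threshold."""
    skew = sample_skewness(values)
    if abs(skew) <= threshold:
        return []
    return [{"Warning": "Skew", "Message": f"Skew is {skew:.3f}",
             "Metric": "Skew", "Value": skew}]


symmetric = skew_rows([1, 2, 3, 4, 5])
right_skewed = skew_rows([1, 1, 1, 1, 10])
print(symmetric, right_skewed)
```

A symmetric sample has zero skew and produces no rows, while a sample with one large outlier exceeds the threshold and produces one row.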
- ads.feature_engineering.feature_type.handler.warnings.zeros_handler(s: Series) DataFrame
Warning if more than 10 percent of the values in the series are zeros.
- Parameters
s (pd.Series) – Pandas series - column of some feature type.
- Returns
Dataframe with 4 columns ‘Warning’, ‘Message’, ‘Metric’, ‘Value’ and 2 rows, where first row is count of zero values and second is percentage of zero values.
- Return type
pd.DataFrame
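The zeros rule (strictly more than 10 percent zeros, reported as a count row and a percentage row) follows the same shape. The sketch below is plain Python with hypothetical names and message text, not the library's handler.

```python
def zeros_rows(values, threshold_pct=10.0):
    """Return a count row and a percentage row if too many values are zero."""
    n_zeros = sum(1 for v in values if v == 0)
    pct = 100.0 * n_zeros / len(values) if values else 0.0
    if pct <= threshold_pct:
        return []
    return [
        {"Warning": "Zeros", "Message": f"{n_zeros} zeros",
         "Metric": "Count", "Value": n_zeros},
        {"Warning": "Zeros", "Message": f"{pct:.1f}% zeros",
         "Metric": "Percentage", "Value": f"{pct:.1f}%"},
    ]


exactly_ten_pct = zeros_rows([0] + [1] * 9)
twenty_pct = zeros_rows([0, 0] + [1] * 8)
print(exactly_ten_pct, twenty_pct)
```

One zero in ten values sits exactly at 10 percent and is not reported; two zeros in ten values are.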