ads.feature_engineering.feature_type package¶
Subpackages¶
- ads.feature_engineering.feature_type.adsstring package
- Subpackages
- Submodules
- ads.feature_engineering.feature_type.adsstring.common_regex_mixin module
CommonRegex
CommonRegexMixin
CommonRegexMixin.address
CommonRegexMixin.credit_card
CommonRegexMixin.date
CommonRegexMixin.email
CommonRegexMixin.ip
CommonRegexMixin.link
CommonRegexMixin.phone_number_US
CommonRegexMixin.price
CommonRegexMixin.redact()
CommonRegexMixin.redact_map
CommonRegexMixin.ssn
CommonRegexMixin.time
CommonRegexMixin.zip_code
- ads.feature_engineering.feature_type.adsstring.oci_language module
- ads.feature_engineering.feature_type.adsstring.string module
ADSString
ADSString.capitalize()
ADSString.casefold()
ADSString.center()
ADSString.count()
ADSString.description
ADSString.encode()
ADSString.endswith()
ADSString.expandtabs()
ADSString.find()
ADSString.format()
ADSString.format_map()
ADSString.help()
ADSString.index()
ADSString.isalnum()
ADSString.isalpha()
ADSString.isascii()
ADSString.isdecimal()
ADSString.isdigit()
ADSString.isidentifier()
ADSString.islower()
ADSString.isnumeric()
ADSString.isprintable()
ADSString.isspace()
ADSString.istitle()
ADSString.isupper()
ADSString.join()
ADSString.language_model_cache
ADSString.ljust()
ADSString.lower()
ADSString.lstrip()
ADSString.maketrans()
ADSString.nlp_backend()
ADSString.partition()
ADSString.plugin_clear()
ADSString.plugin_list()
ADSString.plugin_register()
ADSString.plugins
ADSString.redact()
ADSString.removeprefix()
ADSString.removesuffix()
ADSString.replace()
ADSString.rfind()
ADSString.rindex()
ADSString.rjust()
ADSString.rpartition()
ADSString.rsplit()
ADSString.rstrip()
ADSString.split()
ADSString.splitlines()
ADSString.startswith()
ADSString.string
ADSString.strip()
ADSString.swapcase()
ADSString.title()
ADSString.translate()
ADSString.upper()
ADSString.validator
ADSString.warning
ADSString.zfill()
to_adsstring()
wrap_output_string()
- Module contents
- ads.feature_engineering.feature_type.handler package
- Submodules
- ads.feature_engineering.feature_type.handler.feature_validator module
- ads.feature_engineering.feature_type.handler.feature_warning module
- ads.feature_engineering.feature_type.handler.warnings module
- Module contents
Submodules¶
ads.feature_engineering.feature_type.address module¶
The module that represents an Address feature type.
- Classes:
- Address
The Address feature type.
- class ads.feature_engineering.feature_type.address.Address[source]¶
Bases:
String
Type representing address.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows the location of the given address on a map based on its zip code.
Example
>>> from ads.feature_engineering.feature_type.address import Address
>>> import pandas as pd
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345', '1 Berkeley Street, Boston, MA 67891', '54305 Oxford Street, Seattle, WA 95132', ''])
>>> Address.validator.is_address(address)
0     True
1     True
2     True
3    False
dtype: bool
- description = 'Type representing address.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345', '1 Berkeley Street, Boston, MA 67891', '54305 Oxford Street, Seattle, WA 95132', ''], name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_domain()
constraints: []
stats:
  count: 4
  missing: 1
  unique: 3
values: Address
- Returns:
Domain based on the Address feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the location of the given address on a map based on its zip code.
Examples
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345', '1 Berkeley Street, Boston, MA 67891', '54305 Oxford Street, Seattle, WA 95132', ''], name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_plot()
- Returns:
Plot object for the series based on the Address feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
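`feature_plot` places each address on a map by its zip code. As a minimal sketch of that first step, assuming US-style addresses that end in a five-digit zip code (optionally followed by a ZIP+4 suffix), the code can be pulled out with pandas' `Series.str.extract`. The helper name `extract_zip_code` and the regex are illustrative assumptions, not the ADS implementation.

```python
import pandas as pd


def extract_zip_code(addresses: pd.Series) -> pd.Series:
    """Hypothetical helper: return the trailing 5-digit US zip code, or NaN."""
    # Five digits, an optional "-1234" ZIP+4 suffix, then only whitespace to the end.
    # Rows without such a match (for example, empty strings) come back as NaN.
    return addresses.str.extract(r"(\d{5})(?:-\d{4})?\s*$", expand=False)


address = pd.Series([
    "1 Miller Drive, New York, NY 12345",
    "1 Berkeley Street, Boston, MA 67891",
    "54305 Oxford Street, Seattle, WA 95132",
    "",
])
print(extract_zip_code(address).tolist())
```

Anchoring the pattern at the end of the string keeps leading house numbers such as `54305` from being mistaken for the zip code.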
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, the unique count, and the missing count.
Examples
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345', '1 Berkeley Street, Boston, MA 67891', '54305 Oxford Street, Seattle, WA 95132', ''], name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      3
2  missing      1
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
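The `feature_stat` outputs on this page for Address, Boolean, Category and Constant follow one count/unique/missing pattern. The sketch below reproduces that pattern with plain pandas. `basic_feature_stat` is a hypothetical name. Two of its rules are inferred from the examples rather than taken from the ADS source: empty strings count as missing, and the missing row is omitted when nothing is missing.

```python
import numpy as np
import pandas as pd


def basic_feature_stat(x: pd.Series) -> pd.DataFrame:
    """Hypothetical sketch of a count/unique/missing summary."""
    # Assumption: NaN/None and empty (or whitespace-only) strings count as missing.
    missing_mask = x.isna() | (x.astype(str).str.strip() == "")
    stats = {
        "count": len(x),                            # total number of rows
        "unique": int(x[~missing_mask].nunique()),  # distinct non-missing values
    }
    missing = int(missing_mask.sum())
    if missing:  # report the missing count only when something is missing
        stats["missing"] = missing
    return pd.DataFrame({"Metric": list(stats), "Value": list(stats.values())})


flags = pd.Series([True, False, True, False, np.nan, None], name="bool")
print(basic_feature_stat(flags))
```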
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.address.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates whether each value matches the requirements.
- Parameters:
data (pd.Series) – The data to process.
- Returns:
A boolean Series indicating whether each value matches the requirements.
- Return type:
pandas.Series
ads.feature_engineering.feature_type.base module¶
- class ads.feature_engineering.feature_type.base.FeatureBaseType(classname, bases, dictionary)[source]¶
Bases:
type
The helper metaclass that extends the functionality of the FeatureType class.
- class ads.feature_engineering.feature_type.base.FeatureBaseTypeMeta(classname, bases, dictionary)[source]¶
Bases:
FeatureBaseType, ABCMeta
The class to provide compatibility between ABC and FeatureBaseType metaclass.
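The compatibility problem this class addresses is a general Python rule: a class cannot use a custom metaclass and also inherit from `ABC` unless one metaclass derives from the other. The sketch below reproduces the conflict and the usual fix with hypothetical names (`HelperMeta`, `CombinedMeta`); it illustrates the pattern and is not the ADS source.

```python
from abc import ABC, ABCMeta, abstractmethod


class HelperMeta(type):
    """Hypothetical stand-in for a helper metaclass such as FeatureBaseType."""

    def __new__(mcs, classname, bases, dictionary):
        cls = super().__new__(mcs, classname, bases, dictionary)
        cls.created_by_helper = True  # example of "extended functionality"
        return cls


def make_conflicting_class():
    # ABC's metaclass is ABCMeta. HelperMeta is unrelated to it, so Python
    # cannot pick a most-derived metaclass and raises TypeError.
    class Broken(ABC, metaclass=HelperMeta):
        pass

    return Broken


class CombinedMeta(HelperMeta, ABCMeta):
    """Derives from both metaclasses, which resolves the conflict."""


class Base(ABC, metaclass=CombinedMeta):
    @abstractmethod
    def describe(self) -> str: ...


class Concrete(Base):
    def describe(self) -> str:
        return "concrete"


print(Concrete().describe(), Concrete.created_by_helper)
```

`Base` keeps ABC behaviour (it cannot be instantiated while `describe` is abstract), and every subclass still passes through `HelperMeta.__new__`.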
- class ads.feature_engineering.feature_type.base.FeatureType[source]¶
Bases:
ABC
Abstract base class for feature types. The default class attributes include name and description. The name is auto-generated from the class name using camel-case to snake-case conversion unless specified.
- description = 'Base feature type.'¶
- name = 'feature_type'¶
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
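The type names used with `.ads.feature_type` on this page ('feature_type', 'credit_card', 'ip_address', 'date_time', 'gis') are consistent with a camel-to-snake conversion of the class names. A common regex-based conversion is sketched below. `camel_to_snake` is a hypothetical helper, and the ADS implementation may treat edge cases differently.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase class name to snake_case (illustrative sketch)."""
    # Put "_" before a capitalised word that follows any character: "CreditCard" -> "Credit_Card".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Put "_" between a lowercase letter or digit and an uppercase letter: "fooBar" -> "foo_Bar".
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


for class_name in ["FeatureType", "CreditCard", "IpAddress", "DateTime", "GIS"]:
    print(class_name, "->", camel_to_snake(class_name))
```

Because the first rule only fires before an uppercase letter followed by lowercase letters, an all-caps acronym such as `GIS` stays together as `gis`.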
ads.feature_engineering.feature_type.boolean module¶
The module that represents a Boolean feature type.
- Classes:
- Boolean
The feature type that represents binary values True/False.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes the given data and indicates whether each value matches the requirements.
- class ads.feature_engineering.feature_type.boolean.Boolean[source]¶
Bases:
FeatureType
Type representing binary values True/False.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows the counts of True and False observations using bars.
Examples
>>> from ads.feature_engineering.feature_type.boolean import Boolean
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> Boolean.validator.is_boolean(s)
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool
- description = 'Type representing binary values True/False.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_domain()
constraints:
- expression: $x in [True, False]
  language: python
stats:
  count: 6
  missing: 2
  unique: 2
values: Boolean
- Returns:
Domain based on the Boolean feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the counts of True and False observations using bars.
- Parameters:
x (pandas.Series) – The feature being evaluated.
- Returns:
Plot object for the series based on the Boolean feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
Examples
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_plot()
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, the unique count, and the missing count.
- Parameters:
x (pandas.Series) – The feature being evaluated.
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
Examples
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.boolean.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates whether each value matches the requirements.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating whether each value matches the requirements.
- Return type:
pandas.Series
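A handler with this signature returns one boolean per input value. The sketch below is a hypothetical Boolean handler, not the ADS source. It reproduces the `Boolean.validator.is_boolean` output from the Boolean class example by testing membership in `[True, False]`, the same set that appears in the Boolean domain constraint.

```python
import numpy as np
import pandas as pd


def boolean_handler_sketch(data: pd.Series, *args, **kwargs) -> pd.Series:
    """Hypothetical handler: True where a value is True or False, False otherwise."""
    # Mirrors the domain constraint `$x in [True, False]`; NaN and None fall outside it.
    return data.isin([True, False])


s = pd.Series([True, False, True, False, np.nan, None], name="bool")
print(boolean_handler_sketch(s).tolist())
```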
ads.feature_engineering.feature_type.category module¶
The module that represents a Category feature type.
- Classes:
- Category
The Category feature type.
- class ads.feature_engineering.feature_type.category.Category[source]¶
Bases:
FeatureType
Type representing discrete unordered values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows the counts of observations in each categorical bin using a bar chart.
- description = 'Type representing discrete unordered values.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_domain()
constraints:
- expression: $x in ['S', 'C', 'Q', '']
  language: python
stats:
  count: 22
  missing: 3
  unique: 3
values: Category
- Returns:
Domain based on the Category feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the counts of observations in each categorical bin using a bar chart.
- Parameters:
x (pandas.Series) – The feature being evaluated.
- Returns:
Plot object for the series based on the Category feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
Examples
>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_plot()
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, the unique count, and the missing count (if any values are missing).
- Parameters:
x (pandas.Series) – The feature being evaluated.
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
Examples
>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
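In the Category `feature_domain` example, the constraint lists every observed category, including the empty string, in order of first appearance, while NaN and None are excluded. A minimal sketch that produces the same expression string is shown below. It assumes this is how the list is derived. `category_constraint_sketch` is a hypothetical name, not an ADS function.

```python
import numpy as np
import pandas as pd


def category_constraint_sketch(x: pd.Series) -> str:
    """Hypothetical: build an `$x in [...]` expression from the observed categories."""
    # dropna() removes NaN/None; unique() keeps the order of first appearance.
    categories = x.dropna().unique().tolist()
    return f"$x in {categories}"


cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S',
                 'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
print(category_constraint_sketch(cat))
```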
ads.feature_engineering.feature_type.constant module¶
The module that represents a Constant feature type.
- Classes:
- Constant
The Constant feature type.
- class ads.feature_engineering.feature_type.constant.Constant[source]¶
Bases:
FeatureType
Type representing constant values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- description = 'Type representing constant values.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_domain()
constraints: []
stats:
  count: 5
  unique: 1
values: Constant
- Returns:
Domain based on the Constant feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the counts of observations as bars.
- Parameters:
x (pandas.Series) – The feature being shown.
Examples
>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_plot()
- Returns:
Plot object for the series based on the Constant feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, the unique count, and the missing count.
- Parameters:
x (pandas.Series) – The feature being evaluated.
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
Examples
>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_stat()
   Metric  Value
0   count      5
1  unique      1
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
ads.feature_engineering.feature_type.continuous module¶
The module that represents a Continuous feature type.
- Classes:
- Continuous
The Continuous feature type.
- class ads.feature_engineering.feature_type.continuous.Continuous[source]¶
Bases:
FeatureType
Type representing continuous values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- description = 'Type representing continuous values.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_domain()
constraints: []
stats:
  count: 10.0
  lower quartile: 3.058
  mean: 4.959
  median: 3.81
  missing: 2.0
  sample maximum: 13.32
  sample minimum: 2.25
  skew: 2.175
  standard deviation: 3.62
  upper quartile: 4.908
values: Continuous
- Returns:
Domain based on the Continuous feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the distribution of the data using a box plot.
Examples
>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_plot()
- Returns:
Plot object for the series based on the Continuous feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, skew, and the missing count.
Examples
>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_stat()
               Metric   Value
0               count  10.000
1                mean   4.959
2  standard deviation   3.620
3      sample minimum   2.250
4      lower quartile   3.058
5              median   3.810
6      upper quartile   4.908
7      sample maximum  13.320
8                skew   2.175
9             missing   2.000
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
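The Continuous summary can be approximated with standard pandas reductions. In the sketch below (hypothetical `continuous_stat_sketch`, not ADS code), `count` includes missing rows as in the example, the standard deviation is the sample value (`ddof=1`), and the quartiles use pandas' default linear interpolation. With the example data, the mean, median, quartiles, extremes and standard deviation agree with the values shown above to within rounding.

```python
import numpy as np
import pandas as pd


def continuous_stat_sketch(x: pd.Series) -> pd.DataFrame:
    """Hypothetical reproduction of the Continuous summary with pandas."""
    values = x.dropna()
    stats = {
        "count": len(x),                      # includes missing rows, as in the example
        "mean": values.mean(),
        "standard deviation": values.std(),   # sample standard deviation (ddof=1)
        "sample minimum": values.min(),
        "lower quartile": values.quantile(0.25),
        "median": values.median(),
        "upper quartile": values.quantile(0.75),
        "sample maximum": values.max(),
        "skew": values.skew(),
        "missing": int(x.isna().sum()),
    }
    # DataFrame.round only touches numeric columns, so "Metric" is left as-is.
    return pd.DataFrame({"Metric": list(stats), "Value": list(stats.values())}).round(3)


cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, np.nan, None], name="continuous")
print(continuous_stat_sketch(cts))
```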
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
ads.feature_engineering.feature_type.creditcard module¶
The module that represents a CreditCard feature type.
- Classes:
- CreditCard
The CreditCard feature type.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes the given data and indicates whether each value matches the requirements.
- _luhn_checksum(card_number: str) -> float
Implements the Luhn algorithm to validate a credit card number.
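The Luhn check is a standard algorithm. Starting from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9), and the number is valid when the digit sum is divisible by 10. A minimal sketch follows. It returns an `int` checksum rather than the `float` in the listed signature, and it is not the ADS `_luhn_checksum` source.

```python
def luhn_checksum(card_number: str) -> int:
    """Return the Luhn checksum; 0 means the number passes the check."""
    digits = [int(ch) for ch in card_number if ch.isdigit()]  # ignore spaces and dashes
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10


def is_luhn_valid(card_number: str) -> bool:
    return luhn_checksum(card_number) == 0


print(is_luhn_valid("4532640527811543"))  # True: Visa number used in the examples in this section
print(is_luhn_valid("4532640527811544"))  # False: last digit altered
```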
- class ads.feature_engineering.feature_type.creditcard.CreditCard[source]¶
Bases:
String
Type representing credit card numbers.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows the counts of observations in each credit card type using bar chart.
Examples
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> import pandas as pd
>>> s = pd.Series(["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"], name='credit_card')
>>> s.ads.feature_type = ['credit_card']
>>> CreditCard.validator.is_credit_card(s)
0     True
1    False
2     True
3     True
4     True
5     True
Name: credit_card, dtype: bool
- description = 'Type representing credit card numbers.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> visa = ["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"]
>>> mastercard = ["5334180299390324", "5111466404826446", "5273114895302717", "5430972152222336", "5536426859893306"]
>>> amex = ["371025944923273", "374745112042294", "340984902710890", "375767928645325", "370720852891659"]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list, name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_domain()
constraints: []
stats:
  count: 16
  count_Amex: 5
  count_Diners Club: 2
  count_MasterCard: 3
  count_Visa: 5
  count_missing: 1
  missing: 1
  unique: 15
values: CreditCard
- Returns:
Domain based on the CreditCard feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the counts of observations in each credit card type using bar chart.
Examples
>>> visa = ["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"]
>>> mastercard = ["5334180299390324", "5111466404826446", "5273114895302717", "5430972152222336", "5536426859893306"]
>>> amex = ["371025944923273", "374745112042294", "340984902710890", "375767928645325", "370720852891659"]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list, name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_plot()
- Returns:
Plot object for the series based on the CreditCard feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series)[source]¶
Generates feature statistics.
Feature statistics include the total count, the unique count, the missing count, and the count of each credit card type.
Examples
>>> visa = ["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"]
>>> mastercard = ["5334180299390324", "5111466404826446", "5273114895302717", "5430972152222336", "5536426859893306"]
>>> amex = ["371025944923273", "374745112042294", "340984902710890", "375767928645325", "370720852891659"]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list, name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_stat()
              Metric  Value
0              count     16
1             unique     15
2            missing      1
3         count_Amex      5
4         count_Visa      5
5   count_MasterCard      3
6  count_Diners Club      2
7      count_missing      1
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.creditcard.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates whether each value matches the requirements.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating whether each value matches the requirements.
- Return type:
pandas.Series
ads.feature_engineering.feature_type.datetime module¶
The module that represents a DateTime feature type.
- Classes:
- DateTime
The DateTime feature type.
- class ads.feature_engineering.feature_type.datetime.DateTime[source]¶
Bases:
FeatureType
Type representing date and/or time.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows distributions of datetime datasets using histograms.
Example
>>> from ads.feature_engineering.feature_type.datetime import DateTime
>>> import pandas as pd
>>> s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> DateTime.validator.is_datetime(s)
0     True
1     True
2    False
3     True
Name: datetime, dtype: bool
- description = 'Type representing date and/or time.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> s.ads.feature_domain()
constraints: []
stats:
  count: 8
  missing: 3
  sample maximum: April/15/11
  sample minimum: 3/11/2000
values: DateTime
- Returns:
Domain based on the DateTime feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows distributions of datetime datasets using histograms.
Examples
>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_plot()
- Returns:
Plot object for the series based on the DateTime feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, sample maximum, sample minimum, and the missing count (if any values are missing).
Examples
>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_stat()
           Metric        Value
0           count            8
1  sample maximum  April/15/11
2  sample minimum    3/11/2000
3         missing            3
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.datetime.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates whether each value matches the requirements.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating whether each value matches the requirements.
- Return type:
pandas.Series
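A DateTime handler has to decide, value by value, whether something can be read as a date. The sketch below is a hypothetical handler (`datetime_handler_sketch`, not the ADS source) that uses `pd.to_datetime` as the parser. That parser choice is an assumption, and the ADS validator may accept a different set of formats. On the `DateTime.validator.is_datetime` example data it yields the same True/False pattern.

```python
import pandas as pd


def datetime_handler_sketch(data: pd.Series, *args, **kwargs) -> pd.Series:
    """Hypothetical handler: True where a value can be parsed as a date/time."""

    def parses(value) -> bool:
        if isinstance(value, str):
            if not value.strip():
                return False  # blank strings never count as dates
        elif pd.isna(value):
            return False  # None, NaN and NaT are missing, not dates
        try:
            pd.to_datetime(value)
        except (ValueError, TypeError, OverflowError):
            return False
        return True

    # Parse element by element so one odd format does not affect the others.
    return data.map(parses)


s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name="datetime")
print(datetime_handler_sketch(s).tolist())
```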
ads.feature_engineering.feature_type.discrete module¶
The module that represents a Discrete feature type.
- Classes:
- Discrete
The Discrete feature type.
- class ads.feature_engineering.feature_type.discrete.Discrete[source]¶
Bases:
FeatureType
Type representing discrete values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- description = 'Type representing discrete values.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> discrete_numbers = pd.Series([35, 25, 13, 42], name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_domain()
constraints: []
stats:
  count: 4
  unique: 4
values: Discrete
- Returns:
Domain based on the Discrete feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the distribution of the data using a box plot.
Examples
>>> discrete_numbers = pd.Series([35, 25, 13, 42], name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_plot()
- Returns:
Plot object for the series based on the Discrete feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, the unique count, and the missing count.
Examples
>>> discrete_numbers = pd.Series([35, 25, 13, 42], name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_stat()
        discrete
count          4
unique         4
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
ads.feature_engineering.feature_type.document module¶
The module that represents a Document feature type.
- Classes:
- Document
The Document feature type.
- class ads.feature_engineering.feature_type.document.Document[source]¶
Bases:
FeatureType
Type representing document values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- description = 'Type representing document values.'¶
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
ads.feature_engineering.feature_type.gis module¶
The module that represents a GIS feature type.
- Classes:
- GIS
The GIS feature type.
- class ads.feature_engineering.feature_type.gis.GIS[source]¶
Bases:
FeatureType
Type representing geographic information.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows the given locations on a map based on their latitude and longitude.
Example
>>> from ads.feature_engineering.feature_type.gis import GIS
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285", "-21.0255305, -122.478584", "85.103913, 19.405744", "82.913736, 178.225672", "62.9795085,-66.989705", "54.5604395,95.235090", "24.2811855,-162.380403", "-1.818319,-80.681214", None, "(51.816119, 175.979008)", "(54.3392995,-11.801615)"], name='gis')
>>> s.ads.feature_type = ['gis']
>>> GIS.validator.is_gis(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: gis, dtype: bool
- description = 'Type representing geographic information.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> gis = pd.Series(["69.196241,-125.017615", "5.2272595,-143.465712", "-33.9855425,-153.445155", "43.340610,86.460554", "24.2811855,-162.380403", "2.7849025,-7.328156", "45.033805,157.490179", "-1.818319,-80.681214", "-44.510428,-169.269477", "-56.3344375,-166.407038", "", np.nan, None], name='gis')
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_domain()
constraints: []
stats:
  count: 13
  missing: 3
  unique: 10
values: GIS
- Returns:
Domain based on the GIS feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the given locations on a map based on their latitude and longitude.
Examples
>>> gis = pd.Series(["69.196241,-125.017615", "5.2272595,-143.465712", "-33.9855425,-153.445155", "43.340610,86.460554", "24.2811855,-162.380403", "2.7849025,-7.328156", "45.033805,157.490179", "-1.818319,-80.681214", "-44.510428,-169.269477", "-56.3344375,-166.407038", "", np.nan, None], name='gis')
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_plot()
- Returns:
Plot object for the series based on the GIS feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, the unique count, and the missing count.
Examples
>>> gis = pd.Series(["69.196241,-125.017615", "5.2272595,-143.465712", "-33.9855425,-153.445155", "43.340610,86.460554", "24.2811855,-162.380403", "2.7849025,-7.328156", "45.033805,157.490179", "-1.818319,-80.681214", "-44.510428,-169.269477", "-56.3344375,-166.407038", "", np.nan, None], name='gis')
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_stat()
         gis
count     13
unique    10
missing    3
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.gis.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates whether each value matches the requirements.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating whether each value matches the requirements.
- Return type:
pandas.Series
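The `GIS.validator.is_gis` example in the GIS class documentation accepts "lat, lon" strings with or without parentheses and rejects None. A hypothetical handler with the same signature as `default_handler` could check that pattern with a regular expression and range-check the coordinates. The regex, the range check and the name `gis_handler_sketch` are assumptions for illustration; ADS may validate differently.

```python
import re

import pandas as pd

# "lat, lon" with optional surrounding parentheses (their balance is not checked).
_COORDINATE = re.compile(r"^\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?$")


def gis_handler_sketch(data: pd.Series, *args, **kwargs) -> pd.Series:
    """Hypothetical handler: True where a value looks like a valid lat/lon pair."""

    def is_coordinate(value) -> bool:
        if not isinstance(value, str):
            return False  # None, NaN and other non-strings are rejected
        match = _COORDINATE.match(value.strip())
        if match is None:
            return False
        lat, lon = float(match.group(1)), float(match.group(2))
        # Latitude must lie in [-90, 90] and longitude in [-180, 180].
        return -90 <= lat <= 90 and -180 <= lon <= 180

    return data.map(is_coordinate)


s = pd.Series(["-18.2193965, -93.587285", "(51.816119, 175.979008)", None, "", "95.0, 10.0"], name="gis")
print(gis_handler_sketch(s).tolist())
```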
ads.feature_engineering.feature_type.integer module¶
The module that represents an Integer feature type.
- Classes:
- Integer
The Integer feature type.
- class ads.feature_engineering.feature_type.integer.Integer[source]¶
Bases:
FeatureType
Type representing integer values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- description = 'Type representing integer values.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series([True, False, True, False, np.nan, None], name='integer')
>>> s.ads.feature_type = ['integer']
>>> s.ads.feature_domain()
constraints: []
stats:
  count: 6
  freq: 2
  missing: 2
  top: true
  unique: 2
values: Integer
- Returns:
Domain based on the Integer feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the distribution of the data using a box plot.
Examples
>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_plot()
- Returns:
Plot object for the series based on the Integer feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include the total count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, and the missing count (if any values are missing).
Examples
>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_stat()
               Metric  Value
0               count      7
1                mean      1
2  standard deviation      1
3      sample minimum      0
4      lower quartile      1
5              median      1
6      upper quartile      2
7      sample maximum      4
8             missing      1
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
ads.feature_engineering.feature_type.ip_address module¶
The module that represents an IpAddress feature type.
- Classes:
- IpAddress
The IpAddress feature type.
- class ads.feature_engineering.feature_type.ip_address.IpAddress[source]¶
Bases:
FeatureType
Type representing IP Address.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
Example
>>> from ads.feature_engineering.feature_type.ip_address import IpAddress
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> IpAddress.validator.is_ip_address(s)
0     True
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
- description = 'Type representing IP Address.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 3
values: IpAddress
- Returns:
Domain based on the IpAddress feature type.
- Return type:
Domain
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include total count, unique count, and missing count.
Examples
>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      3
2  missing      2
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
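The count, unique and missing metrics can be sketched with plain pandas. This is an illustration, not the ADS code: `count_unique_missing` is a hypothetical name, and it assumes, as the String and PhoneNumber examples in this reference imply, that empty strings are counted as missing alongside `NaN` and `None`.

```python
import pandas as pd


def count_unique_missing(x: pd.Series) -> pd.DataFrame:
    """Hypothetical count/unique/missing summary (not the ADS source)."""
    # Treat NaN/None and empty strings as missing (assumption from the examples).
    missing = x.isna() | x.map(lambda v: isinstance(v, str) and v == "")
    present = x[~missing]
    return pd.DataFrame(
        {
            "Metric": ["count", "unique", "missing"],
            # count is the total length; unique counts distinct present values
            "Value": [len(x), present.nunique(), int(missing.sum())],
        }
    )


ips = pd.Series(
    ["2002:db8::", "192.168.0.1", "2001:db8::", "2002:db8::", float("nan"), None],
    name="ip_address",
)
print(count_unique_missing(ips))
```

Applied to the IpAddress example Series, the sketch reports 6 values, 3 distinct addresses and 2 missing entries.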
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.ip_address.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates, for each value, whether it matches the requirements of the feature type.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating, for each value, whether it matches the requirements.
- Return type:
pandas.Series
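The kind of element-wise check a handler like this performs can be sketched with the standard-library `ipaddress` module. This is a stand-in under stated assumptions, not the ADS logic (which may use a different parser or pattern); `ip_address_handler` is a hypothetical name.

```python
import ipaddress

import pandas as pd


def ip_address_handler(data: pd.Series) -> pd.Series:
    """Hypothetical IP address check (not the ADS source)."""

    def is_ip(value) -> bool:
        if not isinstance(value, str):
            return False  # NaN, None and other non-string values
        try:
            ipaddress.ip_address(value)  # accepts IPv4 and IPv6 text
        except ValueError:
            return False  # unparsable text, including the empty string
        return True

    # map keeps the index and name of the input Series
    return data.map(is_ip).astype(bool)


s = pd.Series(["192.168.0.1", "2001:db8::", "", float("nan"), None], name="ip_address")
print(ip_address_handler(s))
```

On the Series from the IpAddress example, this yields True, True, False, False, False, the same pattern as `IpAddress.validator.is_ip_address`.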
ads.feature_engineering.feature_type.ip_address_v4 module¶
The module that represents an IpAddressV4 feature type.
- Classes:
- IpAddressV4
The IpAddressV4 feature type.
- class ads.feature_engineering.feature_type.ip_address_v4.IpAddressV4[source]¶
Bases:
FeatureType
Type representing IP Address V4.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
Example
>>> from ads.feature_engineering.feature_type.ip_address_v4 import IpAddressV4
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> IpAddressV4.validator.is_ip_address_v4(s)
0     True
1    False
2    False
3    False
4    False
Name: ip_address, dtype: bool
- description = 'Type representing IP Address V4.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.nan, None], name='ip_address_v4')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 4
values: IpAddressV4
- Returns:
Domain based on the IpAddressV4 feature type.
- Return type:
Domain
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include total count, unique count, and missing count.
Examples
>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      4
2  missing      2
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.ip_address_v4.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates, for each value, whether it matches the requirements of the feature type.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating, for each value, whether it matches the requirements.
- Return type:
pandas.Series
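Because `ipaddress.ip_address` reports the `version` of the parsed address, the same approach can separate IPv4 from IPv6 values, which is the distinction the IpAddressV4 and IpAddressV6 types draw. This is again a hypothetical sketch: `ip_version_handler` is not part of ADS, and ADS may validate these values differently.

```python
import ipaddress

import pandas as pd


def ip_version_handler(data: pd.Series, version: int) -> pd.Series:
    """Hypothetical check: version=4 mimics IpAddressV4, version=6 mimics IpAddressV6."""

    def matches(value) -> bool:
        if not isinstance(value, str):
            return False  # missing or non-string values never match
        try:
            # Parse first, then compare the detected IP version.
            return ipaddress.ip_address(value).version == version
        except ValueError:
            return False  # not an IP address at all

    return data.map(matches).astype(bool)


s = pd.Series(["192.168.0.1", "2001:db8::", "", float("nan"), None], name="ip_address")
print(ip_version_handler(s, 4))
print(ip_version_handler(s, 6))
```

With the example Series, version 4 flags only the first value and version 6 flags only the second, matching the IpAddressV4 and IpAddressV6 validator examples.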
ads.feature_engineering.feature_type.ip_address_v6 module¶
The module that represents an IpAddressV6 feature type.
- Classes:
- IpAddressV6
The IpAddressV6 feature type.
- class ads.feature_engineering.feature_type.ip_address_v6.IpAddressV6[source]¶
Bases:
FeatureType
Type representing IP Address V6.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
Example
>>> from ads.feature_engineering.feature_type.ip_address_v6 import IpAddressV6
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> IpAddressV6.validator.is_ip_address_v6(s)
0    False
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
- description = 'Type representing IP Address V6.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address_v6')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 2
values: IpAddressV6
- Returns:
Domain based on the IpAddressV6 feature type.
- Return type:
Domain
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include total count, unique count, and missing count.
Examples
>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.ip_address_v6.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates, for each value, whether it matches the requirements of the feature type.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating, for each value, whether it matches the requirements.
- Return type:
pandas.Series
ads.feature_engineering.feature_type.lat_long module¶
The module that represents a LatLong feature type.
- Classes:
- LatLong
The LatLong feature type.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes given data and indicates if the data matches requirements.
- class ads.feature_engineering.feature_type.lat_long.LatLong[source]¶
Bases:
String
Type representing longitude and latitude.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows the locations of the given points on a map, based on their latitude and longitude.
Example
>>> from ads.feature_engineering.feature_type.lat_long import LatLong
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285", "-21.0255305, -122.478584",
...                "85.103913, 19.405744", "82.913736, 178.225672",
...                "62.9795085,-66.989705", "54.5604395,95.235090",
...                "24.2811855,-162.380403", "-1.818319,-80.681214", None,
...                "(51.816119, 175.979008)", "(54.3392995,-11.801615)"],
...               name='latlong')
>>> s.ads.feature_type = ['lat_long']
>>> LatLong.validator.is_lat_long(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: latlong, dtype: bool
- description = 'Type representing longitude and latitute.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> latlong_series = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='latlong')
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: LatLong
- Returns:
Domain based on the LatLong feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the locations of the given points on a map, based on their latitude and longitude.
Examples
>>> latlong_series = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='latlong')
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_plot()
- Returns:
Plot object for the series based on the LatLong feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include total count, unique count, and missing count (if there are any missing values).
Examples
>>> latlong_series = pd.Series([
...     "69.196241,-125.017615", "5.2272595,-143.465712",
...     "-33.9855425,-153.445155", "43.340610,86.460554",
...     "24.2811855,-162.380403", "2.7849025,-7.328156",
...     "45.033805,157.490179", "-1.818319,-80.681214",
...     "-44.510428,-169.269477", "-56.3344375,-166.407038",
...     "", np.nan, None,
... ], name='latlong')
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_stat()
    Metric  Value
0    count     13
1   unique     10
2  missing      3
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.lat_long.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates, for each value, whether it matches the requirements of the feature type.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating, for each value, whether it matches the requirements.
- Return type:
pandas.Series
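A LatLong check could parse each string as a `latitude,longitude` pair (optionally wrapped in parentheses, as in the validator example) and range-check both numbers. The regular expression, the range limits and the name `lat_long_handler` are assumptions for illustration; ADS may accept or reject other formats.

```python
import re

import pandas as pd

# Hypothetical pattern: "lat,long" with optional spaces and optional parentheses.
# It does not enforce balanced parentheses; it is only a sketch.
_LAT_LONG = re.compile(r"^\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?$")


def lat_long_handler(data: pd.Series) -> pd.Series:
    """Hypothetical latitude/longitude check (not the ADS source)."""

    def is_lat_long(value) -> bool:
        if not isinstance(value, str):
            return False  # missing values never match
        match = _LAT_LONG.match(value.strip())
        if match is None:
            return False
        lat, lon = float(match.group(1)), float(match.group(2))
        # Latitude lies in [-90, 90] and longitude in [-180, 180].
        return -90 <= lat <= 90 and -180 <= lon <= 180

    return data.map(is_lat_long).astype(bool)


print(lat_long_handler(pd.Series(["(51.816119, 175.979008)", "95.0, 10.0", None])))
```

Keeping the range check separate from the pattern means a string such as `"95.0, 10.0"` is well formed yet still rejected, because 95 is not a valid latitude.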
ads.feature_engineering.feature_type.object module¶
The module that represents an Object feature type.
- Classes:
- Object
The Object feature type.
- class ads.feature_engineering.feature_type.object.Object[source]¶
Bases:
FeatureType
Type representing object.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- description = 'Type representing object.'¶
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
ads.feature_engineering.feature_type.ordinal module¶
The module that represents an Ordinal feature type.
- Classes:
- Ordinal
The Ordinal feature type.
- class ads.feature_engineering.feature_type.ordinal.Ordinal[source]¶
Bases:
FeatureType
Type representing ordered values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows the counts of observations in each categorical bin using a bar chart.
- description = 'Type representing ordered values.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_domain()
constraints:
- expression: $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
  language: python
stats:
    count: 10
    missing: 1
    unique: 9
values: Ordinal
- Returns:
Domain based on the Ordinal feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the counts of observations in each categorical bin using a bar chart.
Examples
>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_plot()
- Returns:
The bar chart plot object for the series based on the Ordinal feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include total count, unique count, and missing count (if there are any missing values).
Examples
>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_stat()
    Metric  Value
0    count     10
1   unique      9
2  missing      1
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
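The constraint in the Ordinal `feature_domain` example lists the distinct non-missing values in ascending order. A hypothetical helper that reproduces that expression string is sketched below; it is not the ADS code, and `ordinal_constraint` is an illustrative name. The values appear as floats in the example only because the `NaN` makes the Series `float64`.

```python
import pandas as pd


def ordinal_constraint(x: pd.Series) -> str:
    """Hypothetical rebuild of the '$x in [...]' constraint expression."""
    # Distinct non-missing values as Python scalars, sorted ascending.
    values = sorted(x.dropna().unique().tolist())
    return f"$x in {values}"


x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, float("nan")], name="ordinal")
print(ordinal_constraint(x))
```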
ads.feature_engineering.feature_type.phone_number module¶
The module that represents a Phone Number feature type.
- Classes:
- PhoneNumber
The Phone Number feature type.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes given data and indicates if the data matches requirements.
- class ads.feature_engineering.feature_type.phone_number.PhoneNumber[source]¶
Bases:
String
Type representing phone numbers.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
Examples
>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> import pandas as pd
>>> s = pd.Series([None, "1-640-124-5367", "1-573-916-4412"])
>>> PhoneNumber.validator.is_phone_number(s)
0    False
1     True
2     True
dtype: bool
- description = 'Type representing phone numbers.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.nan, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 7
    missing: 4
    unique: 2
values: PhoneNumber
- Returns:
Domain based on the PhoneNumber feature type.
- Return type:
Domain
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include total count, unique count, and missing count (if there are any missing values).
Examples
>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.nan, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_stat()
    Metric  Value
0    count      7
1   unique      2
2  missing      4
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.phone_number.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates, for each value, whether it matches the requirements of the feature type.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating, for each value, whether it matches the requirements.
- Return type:
pandas.Series
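A phone number check is typically pattern based. The sketch below assumes a North American format (an optional leading `1` or `+1`, then 3-3-4 digits with optional `-`, `.` or space separators and optional parentheses around the area code). The pattern and the name `phone_number_handler` are hypothetical, and the rule ADS applies may differ.

```python
import re

import pandas as pd

# Hypothetical North American pattern, e.g. "1-640-124-5367" or "2068866666".
_PHONE = re.compile(r"^(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")


def phone_number_handler(data: pd.Series) -> pd.Series:
    """Hypothetical phone number check (not the ADS source)."""

    def is_phone(value) -> bool:
        # Only non-empty strings that fully match the pattern count as phone numbers.
        return isinstance(value, str) and _PHONE.match(value.strip()) is not None

    return data.map(is_phone).astype(bool)


print(phone_number_handler(pd.Series([None, "1-640-124-5367", "1-573-916-4412"])))
```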
ads.feature_engineering.feature_type.string module¶
The module that represents a String feature type.
- Classes:
- String
The feature type that represents string values.
- class ads.feature_engineering.feature_type.string.String[source]¶
Bases:
FeatureType
Type representing string values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
Example
>>> from ads.feature_engineering.feature_type.string import String
>>> import pandas as pd
>>> s = pd.Series(["Hello", "world", None], name='string')
>>> String.validator.is_string(s)
0     True
1     True
2    False
Name: string, dtype: bool
- description = 'Type representing string values.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S',
...                     'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                    name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_domain()
constraints: []
stats:
    count: 22
    missing: 3
    unique: 3
values: String
- Returns:
Domain based on the String feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the distribution of the data using a word cloud.
Examples
>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S',
...                     'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                    name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_plot()
- Returns:
Plot object for the series based on the String feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include total count, unique count, and missing count (if there are any missing values).
Examples
>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S',
...                     'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                    name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.string.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates, for each value, whether it matches the requirements of the feature type.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating, for each value, whether it matches the requirements.
- Return type:
pandas.Series
ads.feature_engineering.feature_type.text module¶
The module that represents a Text feature type.
- Classes:
- Text
The Text feature type.
- class ads.feature_engineering.feature_type.text.Text[source]¶
Bases:
String
Type representing text values.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- description = 'Type representing text values.'¶
- static feature_plot(x: Series) Axes [source]¶
Shows the distribution of the data using a word cloud.
Examples
>>> text = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S',
...                   'S', 'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None],
...                  name='text')
>>> text.ads.feature_type = ['text']
>>> text.ads.feature_plot()
- Returns:
Plot object for the series based on the Text feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
ads.feature_engineering.feature_type.unknown module¶
The module that represents an Unknown feature type.
- Classes:
- Unknown
The Unknown feature type.
- class ads.feature_engineering.feature_type.unknown.Unknown[source]¶
Bases:
FeatureType
Type representing third-party dtypes.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- description = 'Type representing unknown type.'¶
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
ads.feature_engineering.feature_type.zip_code module¶
The module that represents a ZipCode feature type.
- Classes:
- ZipCode
The ZipCode feature type.
- Functions:
- default_handler(data: pd.Series) -> pd.Series
Processes given data and indicates if the data matches requirements.
- class ads.feature_engineering.feature_type.zip_code.ZipCode[source]¶
Bases:
String
Type representing postal code.
- warning¶
Provides functionality to register warnings and invoke them.
- Type:
FeatureWarning
- validator¶
Provides functionality to register validators and invoke them.
- feature_plot(x: pd.Series) plt.Axes [source]¶
Shows the geographic distribution of the data based on the location of each zip code.
Example
>>> from ads.feature_engineering.feature_type.zip_code import ZipCode
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["94065", "90210", np.nan, None], name='zipcode')
>>> ZipCode.validator.is_zip_code(s)
0     True
1     True
2    False
3    False
Name: zipcode, dtype: bool
- description = 'Type representing postal code.'¶
- classmethod feature_domain(x: Series) Domain [source]¶
Generate the domain of the data of this feature type.
Examples
>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 2
    unique: 2
values: ZipCode
- Returns:
Domain based on the ZipCode feature type.
- Return type:
Domain
- static feature_plot(x: Series) Axes [source]¶
Shows the geographic distribution of the data based on the location of each zip code.
Examples
>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_plot()
- Returns:
Plot object for the series based on the ZipCode feature type.
- Return type:
matplotlib.axes._subplots.AxesSubplot
- static feature_stat(x: Series) DataFrame [source]¶
Generates feature statistics.
Feature statistics include total count, unique count, and missing count.
Examples
>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      2
2  missing      2
- Returns:
Summary statistics of the Series provided.
- Return type:
pandas.DataFrame
- validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶
- warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
- ads.feature_engineering.feature_type.zip_code.default_handler(data: Series, *args, **kwargs) Series [source]¶
Processes the given data and indicates, for each value, whether it matches the requirements of the feature type.
- Parameters:
data (pandas.Series) – The data to process.
- Returns:
A boolean Series indicating, for each value, whether it matches the requirements.
- Return type:
pandas.Series
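A zip code check can likewise be sketched with a regular expression for US ZIP and ZIP+4 codes. The pattern and the name `zip_code_handler` are assumptions, not the ADS rule. Because the sketch only accepts strings, numeric values such as those in the `feature_domain` example above would be rejected by it.

```python
import re

import pandas as pd

# Hypothetical US pattern: five digits, optionally followed by "-" and four digits.
_ZIP = re.compile(r"^\d{5}(?:-\d{4})?$")


def zip_code_handler(data: pd.Series) -> pd.Series:
    """Hypothetical zip code check (not the ADS source)."""

    def is_zip(value) -> bool:
        # Missing and non-string values never match.
        return isinstance(value, str) and _ZIP.match(value.strip()) is not None

    return data.map(is_zip).astype(bool)


print(zip_code_handler(pd.Series(["94065", "90210", float("nan"), None], name="zipcode")))
```

On the ZipCode validator example this yields True, True, False, False.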
Module contents¶
- Address
Type representing address.
- Boolean
Type representing binary values True/False.
- Category
Type representing discrete unordered values.
- Constant
Type representing constant values.
- Continuous
Type representing continuous values.
- CreditCard
Type representing credit card numbers.
- DateTime
Type representing date and/or time.
- Document
Type representing document values.
- Discrete
Type representing discrete values.
- FeatureType
Base class for all feature types.
- GIS
Type representing geographic information.
- Integer
Type representing integer values.
- IpAddress
Type representing IP Address.
- IpAddressV4
Type representing IP Address V4.
- IpAddressV6
Type representing IP Address V6.
- LatLong
Type representing longitude and latitude.
- Object
Type representing object.
- Ordinal
Type representing ordered values.
- PhoneNumber
Type representing phone numbers.
- String
Type representing string values.
- Tag
Free form tag.
- Text
Type representing text values.
- ZipCode
Type representing postal code.
- Unknown
Type representing third-party dtypes.