ads.feature_engineering.feature_type package#

Subpackages#

Submodules#

ads.feature_engineering.feature_type.address module#

The module that represents an Address feature type.

Classes:
Address

The Address feature type.

class ads.feature_engineering.feature_type.address.Address[source]#

Bases: String

Type representing address.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the location of the given address on a map based on zip code.

Example

>>> from ads.feature_engineering.feature_type.address import Address
>>> import pandas as pd
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
                        '1 Berkeley Street, Boston, MA 67891',
                        '54305 Oxford Street, Seattle, WA 95132',
                        ''])
>>> Address.validator.is_address(address)
0     True
1     True
2     True
3    False
dtype: bool
description = 'Type representing address.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 1
    unique: 3
values: Address
Returns:

Domain based on the Address feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows the location of the given address on a map based on zip code.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_plot()
Returns:

Plot object for the series based on the Address feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_stat()
    Metric  Value
0       count   4
1       unique  3
2       missing 1
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame
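The three metrics above can be illustrated without ADS or pandas. The following is a minimal sketch, not the library's implementation: the helper names `is_missing` and `basic_stats` are hypothetical, and treating `None`, NaN and empty strings as missing is an assumption chosen because it matches the counts shown in the example output.

```python
import math


def is_missing(value):
    """Treat None, NaN and blank strings as missing (an assumption of this sketch)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def basic_stats(values):
    """Return the total count, the distinct non-missing count and the missing count."""
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),                    # every row, missing or not
        "unique": len(set(present)),             # distinct non-missing values
        "missing": len(values) - len(present),   # rows flagged by is_missing
    }


addresses = [
    "1 Miller Drive, New York, NY 12345",
    "1 Berkeley Street, Boston, MA 67891",
    "54305 Oxford Street, Seattle, WA 95132",
    "",
]
stats = basic_stats(addresses)
print(stats)
```

Under these assumptions the sketch gives a count of 4, 3 unique values and 1 missing value for the address series, in line with the table above.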

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.address.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pd.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.base module#

class ads.feature_engineering.feature_type.base.FeatureBaseType(classname, bases, dictionary)[source]#

Bases: type

The helper metaclass to extend functionality of FeatureType class.

class ads.feature_engineering.feature_type.base.FeatureBaseTypeMeta(classname, bases, dictionary)[source]#

Bases: FeatureBaseType, ABCMeta

The class to provide compatibility between ABC and FeatureBaseType metaclass.
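Why a combined metaclass is needed can be shown with plain Python. The sketch below is illustrative only: `RegistryMeta`, `RegistryABCMeta`, `Base` and `Concrete` are hypothetical names, and the registry behaviour is an assumption made for the example, not a description of what FeatureBaseType does.

```python
from abc import ABC, ABCMeta, abstractmethod


class RegistryMeta(type):
    """Hypothetical custom metaclass that records every class it creates."""

    registry = {}

    def __init__(cls, classname, bases, dictionary):
        super().__init__(classname, bases, dictionary)
        RegistryMeta.registry[classname] = cls


# Using the custom metaclass directly on an ABC subclass fails, because
# neither RegistryMeta nor ABCMeta derives from the other.
try:
    class Broken(ABC, metaclass=RegistryMeta):
        pass
except TypeError as exc:
    conflict = str(exc)


class RegistryABCMeta(RegistryMeta, ABCMeta):
    """Derives from both metaclasses, which resolves the conflict."""


class Base(ABC, metaclass=RegistryABCMeta):
    @abstractmethod
    def describe(self):
        ...


class Concrete(Base):
    def describe(self):
        return "concrete"


print(sorted(RegistryMeta.registry))
print(Concrete().describe())
```

The combined metaclass keeps both behaviours: classes are still registered, and `Base` still cannot be instantiated while `describe` is abstract.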

class ads.feature_engineering.feature_type.base.FeatureType[source]#

Bases: ABC

Abstract base class for feature types. Default class attributes include name and description. The name is auto-generated from the class name using camel-case to snake-case conversion unless specified.
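A camel-case to snake-case conversion of this kind can be sketched with two regular-expression passes. This is a common recipe and an assumption for illustration; `camel_to_snake` is a hypothetical helper, not the function ADS uses. It is consistent with the names that appear in the examples on this page, such as 'credit_card', 'date_time' and 'gis'.

```python
import re


def camel_to_snake(name):
    """Convert a CamelCase class name to snake_case."""
    # Insert "_" before a capitalized word that follows any character:
    # "CreditCard" -> "Credit_Card". All-caps names such as "GIS" are untouched.
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Insert "_" between a lowercase letter or digit and an uppercase letter.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


for class_name in ["FeatureType", "CreditCard", "DateTime", "GIS"]:
    print(class_name, "->", camel_to_snake(class_name))
```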

description = 'Base feature type.'#
name = 'feature_type'#
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
class ads.feature_engineering.feature_type.base.Name[source]#

Bases: object

class ads.feature_engineering.feature_type.base.Tag(name: str)[source]#

Bases: object

Class for free form tags. Name must be specified.

Initialize a tag instance.

Parameters:

name (str) – The name of the tag.

ads.feature_engineering.feature_type.boolean module#

The module that represents a Boolean feature type.

Classes:
Boolean

The feature type that represents binary values True/False.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.boolean.Boolean[source]#

Bases: FeatureType

Type representing binary values True/False.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the counts of observations in True/False using bars.

Examples

>>> from ads.feature_engineering.feature_type.boolean import Boolean
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> Boolean.validator.is_boolean(s)
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool
description = 'Type representing binary values True/False.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_domain()
constraints:
- expression: $x in [True, False]
    language: python
stats:
    count: 6
    missing: 2
    unique: 2
values: Boolean
Returns:

Domain based on the Boolean feature type.

Return type:

ads.feature_engineering.schema.Domain
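The constraint in the domain above is a Python expression in which `$x` stands for a candidate value. One way such an expression could be checked is sketched below. How ADS evaluates constraints is not stated here, so the `satisfies` helper and the use of `eval` are assumptions for illustration, and `eval` should only ever see trusted expressions.

```python
def satisfies(expression, value):
    """Evaluate a constraint expression with ``$x`` bound to ``value``."""
    code = expression.replace("$x", "x")
    # Builtins are disabled; the expression can only see the name ``x``.
    return bool(eval(code, {"__builtins__": {}}, {"x": value}))


constraint = {"expression": "$x in [True, False]", "language": "python"}
candidates = [True, False, None, "yes"]
results = [satisfies(constraint["expression"], v) for v in candidates]
print(results)
```

In this sketch the two boolean values satisfy the constraint, while `None` and the string do not.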

static feature_plot(x: Series) Axes[source]#

Shows the counts of observations in True/False using bars.

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Plot object for the series based on the Boolean feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_plot()
static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  2
2       missing 2
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.boolean.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.category module#

The module that represents a Category feature type.

Classes:
Category

The Category feature type.

class ads.feature_engineering.feature_type.category.Category[source]#

Bases: FeatureType

Type representing discrete unordered values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing discrete unordered values.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_domain()
constraints:
- expression: $x in ['S', 'C', 'Q', '']
    language: python
stats:
    count: 22
    missing: 3
    unique: 3
values: Category
Returns:

Domain based on the Category feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows the counts of observations in each categorical bin using bar chart.

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Plot object for the series based on the Category feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_plot()
static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there are any.

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_stat()
    Metric  Value
0       count   22
1       unique  3
2       missing 3
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.constant module#

The module that represents a Constant feature type.

Classes:
Constant

The Constant feature type.

class ads.feature_engineering.feature_type.constant.Constant[source]#

Bases: FeatureType

Type representing constant values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the counts of observations in bars.

description = 'Type representing constant values.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 5
    unique: 1
values: Constant
Returns:

Domain based on the Constant feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows the counts of observations in bars.

Parameters:

x (pandas.Series) – The feature being shown.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_plot()
Returns:

Plot object for the series based on the Constant feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters:

x (pandas.Series) – The feature being evaluated.

Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_stat()
    Metric  Value
0       count   5
1       unique  1
validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.continuous module#

The module that represents a Continuous feature type.

Classes:
Continuous

The Continuous feature type.

class ads.feature_engineering.feature_type.continuous.Continuous[source]#

Bases: FeatureType

Type representing continuous values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows distributions of datasets using box plot.

description = 'Type representing continuous values.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_domain()
constraints: []
stats:
    count: 10.0
    lower quartile: 3.058
    mean: 4.959
    median: 3.81
    missing: 2.0
    sample maximum: 13.32
    sample minimum: 2.25
    skew: 2.175
    standard deviation: 3.62
    upper quartile: 4.908
values: Continuous
Returns:

Domain based on the Continuous feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows distributions of datasets using box plot.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_plot()
Returns:

Plot object for the series based on the Continuous feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, skew and missing(count).

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.NaN, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_stat()
    Metric                  Value
0       count                   10.000
1       mean                    4.959
2       standard deviation          3.620
3       sample minimum          2.250
4       lower quartile          3.058
5       median                  3.810
6       upper quartile          4.908
7       sample maximum          13.320
8       skew                    2.175
9       missing                 2.000
Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame
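Most of the metrics in the table above can be reproduced with the Python standard library. The sketch below is an independent illustration under stated assumptions, not the ADS implementation: quartiles use the "inclusive" (linear interpolation) method, the standard deviation is the sample standard deviation, and skew is left out because the `statistics` module does not provide it.

```python
import math
import statistics

raw = [13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, float("nan"), None]

# Drop None and NaN before computing numeric statistics.
clean = [v for v in raw if v is not None and not math.isnan(v)]

# Quartile cut points with linear interpolation between data points.
lower, median, upper = statistics.quantiles(clean, n=4, method="inclusive")

summary = {
    "count": len(raw),                               # includes missing rows
    "mean": statistics.mean(clean),
    "standard deviation": statistics.stdev(clean),   # sample standard deviation
    "sample minimum": min(clean),
    "lower quartile": lower,
    "median": median,
    "upper quartile": upper,
    "sample maximum": max(clean),
    "missing": len(raw) - len(clean),
}
for metric, value in summary.items():
    print(f"{metric}: {value:.3f}")
```

With these choices the mean, median and quartiles agree with the example output to the displayed precision.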

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.creditcard module#

The module that represents a CreditCard feature type.

Classes:
CreditCard

The CreditCard feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

_luhn_checksum(card_number: str) -> float

Implements Luhn algorithm to validate a credit card number.
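For readers unfamiliar with the Luhn algorithm, here is a minimal standalone sketch. It is not the module's `_luhn_checksum`; the function names and the integer return value are choices made for this example.

```python
def luhn_checksum(card_number):
    """Return the Luhn sum modulo 10; 0 means the number passes the check."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(card_number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10


def is_valid_card(card_number):
    """Accept only digit strings whose Luhn checksum is zero."""
    return card_number.isdigit() and luhn_checksum(card_number) == 0


print(is_valid_card("4532640527811543"))
print(is_valid_card("4532640527811544"))
```

The first number is taken from the examples on this page and passes the check; changing its last digit makes the check fail.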

class ads.feature_engineering.feature_type.creditcard.CreditCard[source]#

Bases: String

Type representing credit card numbers.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the counts of observations in each credit card type using bar chart.

Examples

>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> import pandas as pd
>>> s = pd.Series(["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"], name='credit_card')
>>> s.ads.feature_type = ['credit_card']
>>> CreditCard.validator.is_credit_card(s)
0     True
1    False
2     True
3     True
4     True
5     True
Name: credit_card, dtype: bool
description = 'Type representing credit card numbers.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_domain()
constraints: []
stats:
    count: 16
    count_Amex: 5
    count_Diners Club: 2
    count_MasterCard: 3
    count_Visa: 5
    count_missing: 1
    missing: 1
    unique: 15
values: CreditCard
Returns:

Domain based on the CreditCard feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows the counts of observations in each credit card type using bar chart.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_plot()
Returns:

Plot object for the series based on the CreditCard feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series)[source]#

Generates feature statistics.

Feature statistics include (total)count, unique(count), missing(count) and count of each credit card type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_stat()
    Metric              Value
0       count               16
1       unique              15
2       missing             1
3       count_Amex              5
4       count_Visa              5
5       count_MasterCard        3
6       count_Diners Club       2
7       count_missing       1
Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.creditcard.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.datetime module#

The module that represents a DateTime feature type.

Classes:
DateTime

The DateTime feature type.

class ads.feature_engineering.feature_type.datetime.DateTime[source]#

Bases: FeatureType

Type representing date and/or time.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows distributions of datetime datasets using histograms.

Example

>>> from ads.feature_engineering.feature_type.datetime import DateTime
>>> import pandas as pd
>>> s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> DateTime.validator.is_datetime(s)
0     True
1     True
2    False
3     True
Name: datetime, dtype: bool
description = 'Type representing date and/or time.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 8
    missing: 3
    sample maximum: April/15/11
    sample minimum: 3/11/2000
values: DateTime
Returns:

Domain based on the DateTime feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows distributions of datetime datasets using histograms.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_plot()
Returns:

Plot object for the series based on the DateTime feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total)count, sample maximum, sample minimum, and missing(count), if any.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_stat()
    Metric              Value
0       count               8
1       sample maximum      April/15/11
2       sample minimum      3/11/2000
3       missing             3
Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame
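The example output reports the minimum and maximum as the original strings, ordered by the dates they represent. A standard-library sketch of that idea follows. The list of accepted formats and the `parse_date` helper are assumptions for this example only; ADS may parse dates differently, and month names such as "April" depend on the active locale.

```python
import math
from datetime import datetime

# Formats assumed for this sketch; they cover the sample values below.
FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%B/%d/%Y", "%B/%d/%y")


def parse_date(value):
    """Return a datetime for the first matching format, or None."""
    if not isinstance(value, str):
        return None  # covers None and NaN
    for fmt in FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


raw = ["3/11/2000", "3/12/2000", "3/13/2000", "", None, math.nan,
       "April/13/2011", "April/15/11"]

# Keep (parsed date, original string) pairs for values that parse.
valid = [(parse_date(v), v) for v in raw if parse_date(v) is not None]

summary = {
    "count": len(raw),
    "sample maximum": max(valid, key=lambda pair: pair[0])[1],
    "sample minimum": min(valid, key=lambda pair: pair[0])[1],
    "missing": len(raw) - len(valid),  # here: anything that does not parse
}
print(summary)
```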

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.datetime.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.discrete module#

The module that represents a Discrete feature type.

Classes:
Discrete

The Discrete feature type.

class ads.feature_engineering.feature_type.discrete.Discrete[source]#

Bases: FeatureType

Type representing discrete values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows distributions of datasets using box plot.

description = 'Type representing discrete values.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_domain()
constraints: []
stats:
    count: 4
    unique: 4
values: Discrete
Returns:

Domain based on the Discrete feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows distributions of datasets using box plot.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_plot()
Returns:

Plot object for the series based on the Discrete feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_stat()
            discrete
count   4
unique  4
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.document module#

The module that represents a Document feature type.

Classes:
Document

The Document feature type.

class ads.feature_engineering.feature_type.document.Document[source]#

Bases: FeatureType

Type representing document values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

description = 'Type representing document values.'#
classmethod feature_domain()[source]#
Returns:

Nothing.

Return type:

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.gis module#

The module that represents a GIS feature type.

Classes:
GIS

The GIS feature type.

class ads.feature_engineering.feature_type.gis.GIS[source]#

Bases: FeatureType

Type representing geographic information.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.gis import GIS
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='gis')
>>> s.ads.feature_type = ['gis']
>>> GIS.validator.is_gis(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: gis, dtype: bool
description = 'Type representing geographic information.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: GIS
Returns:

Domain based on the GIS feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_plot()
Returns:

Plot object for the series based on the GIS feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_stat()
        gis
count   13
unique  10
missing 3
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.gis.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.integer module#

The module that represents an Integer feature type.

Classes:
Integer

The Integer feature type.

class ads.feature_engineering.feature_type.integer.Integer[source]#

Bases: FeatureType

Type representing integer values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows distributions of datasets using box plot.

description = 'Type representing integer values.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.NaN, None], name='integer')
>>> s.ads.feature_type = ['integer']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    freq: 2
    missing: 2
    top: true
    unique: 2
values: Integer
Returns:

Domain based on the Integer feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows distributions of datasets using box plot.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_plot()
Returns:

Plot object for the series based on the Integer feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, and missing (count), if any.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_stat()
    Metric              Value
0   count               7
1   mean                1
2   standard deviation  1
3   sample minimum      0
4   lower quartile      1
5   median              1
6   upper quartile      2
7   sample maximum      4
8   missing             1
Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame
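
The metrics listed above can be approximated with the standard library alone. The following is a minimal sketch, not the ADS implementation: the helper names `is_missing` and `integer_stats` are hypothetical, a plain list stands in for the Series, and the quartile interpolation method is an assumption. The documented output appears to show whole-number values, so the figures printed here may differ in precision.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def integer_stats(values):
    """Return the metrics named above for a list of numbers."""
    present = sorted(v for v in values if not is_missing(v))
    # method="inclusive" interpolates linearly between the sample min and max.
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),  # total rows, missing included
        "mean": statistics.mean(present),
        "standard deviation": statistics.stdev(present),
        "sample minimum": present[0],
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": present[-1],
        "missing": len(values) - len(present),
    }


stats = integer_stats([1, 0, 1, 2, 3, 4, float("nan")])
for metric, value in stats.items():
    print(f"{metric}: {value}")
```

Here "count" includes the missing entry, which is consistent with the count of 7 in the example above.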

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.ip_address module#

The module that represents an IpAddress feature type.

Classes:
IpAddress

The IpAddress feature type.

class ads.feature_engineering.feature_type.ip_address.IpAddress[source]#

Bases: FeatureType

Type representing IP Address.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address import IpAddress
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> IpAddress.validator.is_ip_address(s)
0     True
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 3
values: IpAddress
Returns:

Domain based on the IpAddress feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, unique (count), and missing (count).

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  3
2       missing 2
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.ip_address.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
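
As an illustration of what such a handler checks, here is a minimal sketch built on Python's standard `ipaddress` module. It is not the ADS implementation: the function `is_ip_address` below is a hypothetical stand-alone helper, and a plain list stands in for the Series.

```python
import ipaddress


def is_ip_address(value):
    """Return True when value is a string holding a valid IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False  # None, NaN and other non-strings never match
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


sample = ["192.168.0.1", "2001:db8::", "", float("nan"), None]
flags = [is_ip_address(v) for v in sample]
print(flags)
```

For this sample the flags agree with the `is_ip_address` validator output shown in the class example: only the first two entries are accepted.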

ads.feature_engineering.feature_type.ip_address_v4 module#

The module that represents an IpAddressV4 feature type.

Classes:
IpAddressV4

The IpAddressV4 feature type.

class ads.feature_engineering.feature_type.ip_address_v4.IpAddressV4[source]#

Bases: FeatureType

Type representing IP Address V4.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v4 import IpAddressV4
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> IpAddressV4.validator.is_ip_address_v4(s)
0     True
1    False
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address V4.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.NaN, None], name='ip_address_v4')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 4
values: IpAddressV4
Returns:

Domain based on the IpAddressV4 feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, unique (count), and missing (count).

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  4
2       missing 2
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.ip_address_v4.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.ip_address_v6 module#

The module that represents an IpAddressV6 feature type.

Classes:
IpAddressV6

The IpAddressV6 feature type.

class ads.feature_engineering.feature_type.ip_address_v6.IpAddressV6[source]#

Bases: FeatureType

Type representing IP Address V6.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v6 import IpAddressV6
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> IpAddressV6.validator.is_ip_address_v6(s)
0    False
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool
description = 'Type representing IP Address V6.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address_v6')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 2
values: IpAddressV6
Returns:

Domain based on the IpAddressV6 feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, unique (count), and missing (count).

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.NaN, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_stat()
    Metric  Value
0       count   6
1       unique  2
2       missing 2
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.ip_address_v6.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
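
The V4 and V6 feature types split the same input by protocol version. One way to picture that split, assuming the standard `ipaddress` module as a stand-in for the library's own check (the helper `ip_version` is hypothetical):

```python
import ipaddress


def ip_version(value):
    """Return 4 or 6 for a valid address string, otherwise None."""
    if not isinstance(value, str):
        return None
    try:
        return ipaddress.ip_address(value).version
    except ValueError:
        return None


sample = ["192.168.0.1", "2001:db8::", "", None]
is_v4 = [ip_version(v) == 4 for v in sample]
is_v6 = [ip_version(v) == 6 for v in sample]
print(is_v4)
print(is_v6)
```

Each valid address is accepted by exactly one of the two checks, and empty or missing entries are rejected by both.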

ads.feature_engineering.feature_type.lat_long module#

The module that represents a LatLong feature type.

Classes:
LatLong

The LatLong feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.lat_long.LatLong[source]#

Bases: String

Type representing longitude and latitude.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.lat_long import LatLong
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='latlong')
>>> s.ads.feature_type = ['lat_long']
>>> LatLong.validator.is_lat_long(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: latlong, dtype: bool
description = 'Type representing longitude and latitute.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> latlong_series = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.NaN,
    None
    ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: LatLong
Returns:

Domain based on the LatLong feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.NaN,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_plot()
Returns:

Plot object for the series based on the LatLong feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, unique (count), and missing (count), if any.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.NaN,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_stat()
    Metric  Value
0       count   13
1       unique  10
2       missing 3
Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.lat_long.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
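
The class example accepts coordinate pairs with or without surrounding parentheses and with optional whitespace after the comma. The sketch below shows one way to express such a check. The pattern, the range limits, and the function `is_lat_long` are assumptions made for illustration; the pattern ADS actually uses is not shown in this reference.

```python
import re

_NUMBER = r"[-+]?\d+(?:\.\d+)?"
_PAIR = re.compile(rf"\s*({_NUMBER})\s*,\s*({_NUMBER})\s*")


def is_lat_long(value):
    """Return True for 'lat,long' strings, optionally wrapped in parentheses."""
    if not isinstance(value, str):
        return False  # None and NaN are never valid coordinates
    text = value.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    match = _PAIR.fullmatch(text)
    if match is None:
        return False
    lat, lon = float(match.group(1)), float(match.group(2))
    # Assumed limits: latitude within [-90, 90], longitude within [-180, 180].
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


sample = ["-18.2193965, -93.587285", "62.9795085,-66.989705", None, "(51.816119, 175.979008)"]
flags = [is_lat_long(v) for v in sample]
print(flags)
```

The range check goes beyond pure pattern matching, so a well-formed pair such as "95.0,10.0" is rejected in this sketch.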

ads.feature_engineering.feature_type.object module#

The module that represents an Object feature type.

Classes:
Object

The Object feature type.

class ads.feature_engineering.feature_type.object.Object[source]#

Bases: FeatureType

Type representing object.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

description = 'Type representing object.'#
classmethod feature_domain()[source]#
Returns:

Nothing.

Return type:

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.ordinal module#

The module that represents an Ordinal feature type.

Classes:
Ordinal

The Ordinal feature type.

class ads.feature_engineering.feature_type.ordinal.Ordinal[source]#

Bases: FeatureType

Type representing ordered values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing ordered values.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_domain()
constraints:
- expression: $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    language: python
stats:
    count: 10
    missing: 1
    unique: 9
values: Ordinal
Returns:

Domain based on the Ordinal feature type.

Return type:

ads.feature_engineering.schema.Domain
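
The constraint in the example output lists the distinct non-missing values of the series. A minimal sketch of how such an expression could be assembled follows; the helper `ordinal_constraint` is hypothetical, and a plain list stands in for the Series.

```python
import math


def ordinal_constraint(values):
    """Build a '$x in [...]' expression from the distinct non-missing values."""
    distinct = sorted(
        {float(v) for v in values
         if v is not None and not (isinstance(v, float) and math.isnan(v))}
    )
    return f"$x in {distinct}", distinct


expression, allowed = ordinal_constraint([1, 2, 3, 4, 5, 6, 7, 8, 9, float("nan")])
print(expression)
# Checking a candidate value against the constraint is then a membership test.
print(3.0 in allowed, 10.0 in allowed)
```

The values are rendered as floats because a numeric series containing NaN is stored with a floating-point dtype in pandas.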

static feature_plot(x: Series) Axes[source]#

Shows the counts of observations in each categorical bin using bar chart.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_plot()
Returns:

The bar chart plot object for the series based on the Ordinal feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, unique (count), and missing (count), if any.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_stat()
    Metric  Value
0       count   10
1       unique  9
2       missing 1
Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.phone_number module#

The module that represents a Phone Number feature type.

Classes:
PhoneNumber

The Phone Number feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.phone_number.PhoneNumber[source]#

Bases: String

Type representing phone numbers.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

Examples

>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> import pandas as pd
>>> s = pd.Series([None, "1-640-124-5367", "1-573-916-4412"])
>>> PhoneNumber.validator.is_phone_number(s)
0    False
1     True
2     True
dtype: bool
description = 'Type representing phone numbers.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.NaN, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 7
    missing: 4
    unique: 2
values: PhoneNumber
Returns:

Domain based on the PhoneNumber feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, unique (count), and missing (count), if any.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.NaN, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_stat()
    Metric  Value
0       count   7
1       unique  2
2       missing 4
Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame
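
Judging from the example outputs in this reference, "count" covers every entry, "missing" covers None, NaN, and empty strings, and "unique" counts distinct non-missing values. The sketch below reproduces those three figures under that reading. It is an inference from the examples, not the ADS implementation, and `is_missing` and `basic_stats` are hypothetical helpers.

```python
import math


def is_missing(value):
    """None, float NaN and the empty string count as missing in this sketch."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def basic_stats(values):
    """Return count, unique and missing for a list of values."""
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),  # every entry, missing included
        "unique": len(set(present)),  # distinct non-missing values
        "missing": len(values) - len(present),
    }


phones = ["2068866666", "6508866666", "2068866666", "", float("nan"), float("nan"), None]
stats = basic_stats(phones)
print(stats)
```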

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.phone_number.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pandas.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.string module#

The module that represents a String feature type.

Classes:
String

The feature type that represents string values.

class ads.feature_engineering.feature_type.string.String[source]#

Bases: FeatureType

Type representing string values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows distributions of datasets using wordcloud.

Example

>>> from ads.feature_engineering.feature_type.string import String
>>> import pandas as pd
>>> s = pd.Series(["Hello", "world", None], name='string')
>>> String.validator.is_string(s)
0     True
1     True
2    False
Name: string, dtype: bool
description = 'Type representing string values.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_domain()
constraints: []
stats:
    count: 22
    missing: 3
    unique: 3
values: String
Returns:

Domain based on the String feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows distributions of datasets using wordcloud.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_plot()
Returns:

Plot object for the series based on the String feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, unique (count), and missing (count), if any.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_stat()
    Metric  Value
0       count   22
1       unique  3
2       missing 3
Returns:

Summary statistics of the Series or DataFrame provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.string.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pd.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series

ads.feature_engineering.feature_type.text module#

The module that represents a Text feature type.

Classes:
Text

The Text feature type.

class ads.feature_engineering.feature_type.text.Text[source]#

Bases: String

Type representing text values.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows distributions of datasets using wordcloud.

description = 'Type representing text values.'#
classmethod feature_domain()[source]#
Returns:

Nothing.

Return type:

None

static feature_plot(x: Series) Axes[source]#

Shows distributions of datasets using wordcloud.

Examples

>>> text = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.NaN, None], name='text')
>>> text.ads.feature_type = ['text']
>>> text.ads.feature_plot()
Returns:

Plot object for the series based on the Text feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.unknown module#

The module that represents an Unknown feature type.

Classes:
Unknown

The Unknown feature type.

class ads.feature_engineering.feature_type.unknown.Unknown[source]#

Bases: FeatureType

Type representing third-party dtypes.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

description = 'Type representing unknown type.'#
classmethod feature_domain()[source]#
Returns:

Nothing.

Return type:

None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#

ads.feature_engineering.feature_type.zip_code module#

The module that represents a ZipCode feature type.

Classes:
ZipCode

The ZipCode feature type.

Functions:
default_handler(data: pd.Series) -> pd.Series

Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.zip_code.ZipCode[source]#

Bases: String

Type representing postal code.

description#

The feature type description.

Type:

str

name#

The feature type name.

Type:

str

warning#

Provides functionality to register warnings and invoke them.

Type:

FeatureWarning

validator#

Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) pd.DataFrame[source]#

Generates feature statistics.

feature_plot(x: pd.Series) plt.Axes[source]#

Shows the geometry distribution based on the location of the zip code.

Example

>>> from ads.feature_engineering.feature_type.zip_code import ZipCode
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["94065", "90210", np.NaN, None], name='zipcode')
>>> ZipCode.validator.is_zip_code(s)
0     True
1     True
2    False
3    False
Name: zipcode, dtype: bool
description = 'Type representing postal code.'#
classmethod feature_domain(x: Series) Domain[source]#

Generate the domain of the data of this feature type.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 2
    unique: 2
values: ZipCode
Returns:

Domain based on the ZipCode feature type.

Return type:

ads.feature_engineering.schema.Domain

static feature_plot(x: Series) Axes[source]#

Shows the geometry distribution based on the location of the zip code.

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_plot()
Returns:

Plot object for the series based on the ZipCode feature type.

Return type:

matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) DataFrame[source]#

Generates feature statistics.

Feature statistics include (total) count, unique (count), and missing (count).

Examples

>>> zipcode = pd.Series([94065, 90210, np.NaN, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_stat()
    Metric  Value
0       count   4
1       unique  2
2       missing 2
Returns:

Summary statistics of the Series provided.

Return type:

pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>#
warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>#
ads.feature_engineering.feature_type.zip_code.default_handler(data: Series, *args, **kwargs) Series[source]#

Processes given data and indicates if the data matches requirements.

Parameters:

data (pd.Series) – The data to process.

Returns:

The logical list indicating if the data matches requirements.

Return type:

pandas.Series
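
For illustration, a zip code check of this kind can be written as a short pattern match. The sketch assumes US-style codes (five digits with an optional four-digit extension) supplied as strings. Both the pattern and the function `is_zip_code` are assumptions, since the rule ADS applies is not described in this reference.

```python
import re

# Five digits with an optional four-digit extension, e.g. "94065" or "94065-1234".
_US_ZIP = re.compile(r"\d{5}(?:-\d{4})?")


def is_zip_code(value):
    """Return True when value is a string in the assumed US zip code format."""
    if not isinstance(value, str):
        return False  # None, NaN and numeric values are rejected here
    return _US_ZIP.fullmatch(value) is not None


sample = ["94065", "90210", float("nan"), None]
flags = [is_zip_code(v) for v in sample]
print(flags)
```

Numeric input, such as the integer series used in the `feature_domain` example, would need converting to strings before a check like this one could accept it.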

Module contents#

Address

Type representing address.

Boolean

Type representing binary values True/False.

Category

Type representing discrete unordered values.

Constant

Type representing constant values.

Continuous

Type representing continuous values.

CreditCard

Type representing credit card numbers.

DateTime

Type representing date and/or time.

Document

Type representing document values.

Discrete

Type representing discrete values.

FeatureType

Base class for all feature types.

GIS

Type representing geographic information.

Integer

Type representing integer values.

IpAddress

Type representing IP Address.

IpAddressV4

Type representing IP Address V4.

IpAddressV6

Type representing IP Address V6.

LatLong

Type representing longitude and latitude.

Object

Type representing object.

Ordinal

Type representing ordered values.

PhoneNumber

Type representing phone numbers.

String

Type representing string values.

Tag

Free form tag.

Text

Type representing text values.

ZipCode

Type representing postal code.

Unknown

Type representing third-party dtypes.