ads.feature_engineering.feature_type package¶

Submodules¶

ads.feature_engineering.feature_type.address module¶

The module that represents an Address feature type.

Classes:

Address: The Address feature type.

class ads.feature_engineering.feature_type.address.Address[source]¶

Bases: String

Type representing address.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the location of the given address on a map based on zip code.

Example

>>> from ads.feature_engineering.feature_type.address import Address
>>> import pandas as pd
>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
                        '1 Berkeley Street, Boston, MA 67891',
                        '54305 Oxford Street, Seattle, WA 95132',
                        ''])
>>> Address.validator.is_address(address)
0     True
1     True
2     True
3    False
dtype: bool

description = 'Type representing address.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 1
    unique: 3
values: Address

Returns:: Domain based on the Address feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows the location of the given address on a map based on zip code.

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_plot()

Returns:: Plot object for the series based on the Address feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> address = pd.Series(['1 Miller Drive, New York, NY 12345',
              '1 Berkeley Street, Boston, MA 67891',
              '54305 Oxford Street, Seattle, WA 95132',
              ''],
           name='address')
>>> address.ads.feature_type = ['address']
>>> address.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      3
2  missing      1

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame
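
The count, unique, and missing figures above can be reproduced with a few lines of plain Python. This is a minimal sketch rather than the ADS implementation: the helper name `basic_stats` is hypothetical, and it assumes that None, NaN, and empty strings are the values treated as missing, which is consistent with the example output.

```python
import math


def basic_stats(values):
    """Return count, unique and missing for a list of raw values.

    Assumption: None, float NaN and '' are treated as missing.
    """
    def is_missing(v):
        if v is None or v == "":
            return True
        return isinstance(v, float) and math.isnan(v)

    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),                   # total number of observations
        "unique": len(set(present)),            # distinct non-missing values
        "missing": len(values) - len(present),  # observations treated as missing
    }


addresses = [
    "1 Miller Drive, New York, NY 12345",
    "1 Berkeley Street, Boston, MA 67891",
    "54305 Oxford Street, Seattle, WA 95132",
    "",
]
print(basic_stats(addresses))
```

Under the same assumption, the helper also matches the Boolean example elsewhere on this page, where six observations include two missing values.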

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.address.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pd.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series
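
A handler of this kind returns one boolean per observation. The sketch below illustrates the idea on a plain list; the rule it applies (a two-letter state code followed by a 5-digit ZIP at the end of the string) and the names `looks_like_address` and `address_handler` are assumptions made for illustration, and the rule ADS actually uses may differ.

```python
import re

# Hypothetical rule: a two-letter state code followed by a 5-digit ZIP
# (optionally ZIP+4) at the end of the string. ADS may use a different rule.
_ZIP_TAIL = re.compile(r"\b[A-Z]{2}\s+\d{5}(?:-\d{4})?\s*$")


def looks_like_address(value):
    """Return True when a single value matches the hypothetical rule."""
    return isinstance(value, str) and _ZIP_TAIL.search(value) is not None


def address_handler(data):
    """Return one boolean per observation, in the original order."""
    return [looks_like_address(v) for v in data]


sample = [
    "1 Miller Drive, New York, NY 12345",
    "1 Berkeley Street, Boston, MA 67891",
    "54305 Oxford Street, Seattle, WA 95132",
    "",
]
print(address_handler(sample))
```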

ads.feature_engineering.feature_type.base module¶

class ads.feature_engineering.feature_type.base.FeatureBaseType(classname, bases, dictionary)[source]¶

Bases: type

The helper metaclass to extend functionality of FeatureType class.

class ads.feature_engineering.feature_type.base.FeatureBaseTypeMeta(classname, bases, dictionary)[source]¶

Bases: FeatureBaseType, ABCMeta

The class to provide compatibility between ABC and FeatureBaseType metaclass.

class ads.feature_engineering.feature_type.base.FeatureType[source]¶

Bases: ABC

Abstract base class for feature types. Default class attributes include name and description. The name is generated automatically using camel-to-snake conversion unless specified.

description = 'Base feature type.'¶

name = 'feature_type'¶

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶
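
The automatic naming described for FeatureType can be pictured as a metaclass that fills in `name` when a class body does not define it. The following is a sketch of that idea only: `camel_to_snake`, `AutoNameMeta`, and the `Zip` class are hypothetical names, and the conversion rule is a common regular-expression idiom rather than the code ADS ships.

```python
import re


def camel_to_snake(name):
    """Convert a CamelCase class name to snake_case."""
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


class AutoNameMeta(type):
    """Hypothetical metaclass: sets `name` when the class body omits it."""

    def __new__(mcs, classname, bases, dictionary):
        dictionary.setdefault("name", camel_to_snake(classname))
        return super().__new__(mcs, classname, bases, dictionary)


class CreditCard(metaclass=AutoNameMeta):
    pass


class Zip(metaclass=AutoNameMeta):
    name = "zip_code"  # an explicit name overrides the generated one


print(CreditCard.name, Zip.name)
```

The generated names agree with the strings used elsewhere on this page, such as 'credit_card', 'date_time', and 'gis'.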

class ads.feature_engineering.feature_type.base.Name[source]¶: Bases: object

class ads.feature_engineering.feature_type.base.Tag(name: str)[source]¶

Bases: object

Class for free form tags. Name must be specified.

Initialize a tag instance.

Parameters:: name (str) – The name of the tag.

ads.feature_engineering.feature_type.boolean module¶

The module that represents a Boolean feature type.

Classes:

Boolean: The feature type that represents binary values True/False.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.boolean.Boolean[source]¶

Bases: FeatureType

Type representing binary values True/False.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the counts of observations in True/False using bars.

Examples

>>> from ads.feature_engineering.feature_type.boolean import Boolean
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> Boolean.validator.is_boolean(s)
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool

description = 'Type representing binary values True/False.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_domain()
constraints:
- expression: $x in [True, False]
    language: python
stats:
    count: 6
    missing: 2
    unique: 2
values: Boolean

Returns:: Domain based on the Boolean feature type.
Return type:: ads.feature_engineering.schema.Domain
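
The constraints entry in the domain output pairs a Python expression with a `$x` placeholder. A minimal sketch of how such an expression could be checked against individual values follows. The `satisfies` helper is hypothetical and not part of ADS, and it assumes the expression comes from a trusted source, because it relies on `eval`.

```python
def satisfies(expression, value):
    """Evaluate a constraint such as "$x in [True, False]" for one value.

    Sketch only: eval() must never be given untrusted expressions.
    """
    code = expression.replace("$x", "x")
    # Empty builtins keep the expression limited to literals and operators.
    return bool(eval(code, {"__builtins__": {}}, {"x": value}))


constraint = "$x in [True, False]"
print([satisfies(constraint, v) for v in (True, False, None, "yes")])
```

Note that Python treats 1 and 0 as equal to True and False, so an integer column would also pass this particular membership test.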

static feature_plot(x: Series) → Axes[source]¶

Shows the counts of observations in True/False using bars.

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Plot object for the series based on the Boolean feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_plot()

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='bool')
>>> s.ads.feature_type = ['boolean']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.boolean.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.category module¶

The module that represents a Category feature type.

Classes:

Category: The Category feature type.

class ads.feature_engineering.feature_type.category.Category[source]¶

Bases: FeatureType

Type representing discrete unordered values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the counts of observations in each categorical bin using bar chart.

description = 'Type representing discrete unordered values.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_domain()
constraints:
- expression: $x in ['S', 'C', 'Q', '']
    language: python
stats:
    count: 22
    missing: 3
    unique: 3
values: Category

Returns:: Domain based on the Category feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows the counts of observations in each categorical bin using bar chart.

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Plot object for the series based on the Category feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_plot()
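
The data behind such a bar chart is a count per category. The sketch below computes those counts with the standard library and prints a text rendering. The name `category_counts` is hypothetical, and skipping None, NaN, and empty strings is an assumption; ADS may handle missing values differently when plotting.

```python
import math
from collections import Counter


def category_counts(values):
    """Count observations per category, skipping None, NaN and ''."""
    def is_missing(v):
        return v is None or v == "" or (isinstance(v, float) and math.isnan(v))

    return Counter(v for v in values if not is_missing(v))


cat = ["S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", "S", "S",
       "S", "S", "S", "Q", "S", "S", "", float("nan"), None]

# One text bar per category, most frequent first.
for label, n in category_counts(cat).most_common():
    print(f"{label:>2} | {'#' * n} {n}")
```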

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there are any.

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

Examples

>>> cat = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
            'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='category')
>>> cat.ads.feature_type = ['category']
>>> cat.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.constant module¶

The module that represents a Constant feature type.

Classes:

Constant: The Constant feature type.

class ads.feature_engineering.feature_type.constant.Constant[source]¶

Bases: FeatureType

Type representing constant values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the counts of observations in bars.

description = 'Type representing constant values.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Example

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 5
    unique: 1
values: Constant

Returns:: Domain based on the Constant feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows the counts of observations in bars.

Parameters:: x (pandas.Series) – The feature being shown.

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_plot()

Returns:: Plot object for the series based on the Constant feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Parameters:: x (pandas.Series) – The feature being evaluated.
Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

Examples

>>> s = pd.Series([1, 1, 1, 1, 1], name='constant')
>>> s.ads.feature_type = ['constant']
>>> s.ads.feature_stat()
   Metric  Value
0   count      5
1  unique      1

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.continuous module¶

The module that represents a Continuous feature type.

Classes:

Continuous: The Continuous feature type.

class ads.feature_engineering.feature_type.continuous.Continuous[source]¶

Bases: FeatureType

Type representing continuous values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows distributions of datasets using box plot.

description = 'Type representing continuous values.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_domain()
constraints: []
stats:
    count: 10.0
    lower quartile: 3.058
    mean: 4.959
    median: 3.81
    missing: 2.0
    sample maximum: 13.32
    sample minimum: 2.25
    skew: 2.175
    standard deviation: 3.62
    upper quartile: 4.908
values: Continuous

Returns:: Domain based on the Continuous feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows distributions of datasets using box plot.

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_plot()

Returns:: Plot object for the series based on the Continuous feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum, skew and missing(count).

Examples

>>> cts = pd.Series([13.32, 3.32, 4.3, 2.45, 6.34, 2.25,
                    4.43, 3.26, np.nan, None], name='continuous')
>>> cts.ads.feature_type = ['continuous']
>>> cts.ads.feature_stat()
               Metric   Value
0               count  10.000
1                mean   4.959
2  standard deviation   3.620
3      sample minimum   2.250
4      lower quartile   3.058
5              median   3.810
6      upper quartile   4.908
7      sample maximum  13.320
8                skew   2.175
9             missing   2.000

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame
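
The figures in the example output can be recomputed with the standard library, which makes the definitions explicit. This is a sketch under stated assumptions, not the ADS code: the name `continuous_stats` is hypothetical, quartiles are assumed to use linear interpolation, the standard deviation is assumed to be the sample (n - 1) form, and skew is assumed to be the adjusted Fisher-Pearson coefficient. With these choices the results agree with the table above to three decimals.

```python
import math
import statistics


def continuous_stats(values):
    """Summary statistics for numeric values; None and NaN count as missing."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    n = len(present)
    mean = statistics.fmean(present)
    # "inclusive" interpolates linearly between the sorted data points.
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    # Adjusted Fisher-Pearson skewness built from the central moments.
    m2 = sum((v - mean) ** 2 for v in present) / n
    m3 = sum((v - mean) ** 3 for v in present) / n
    skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    return {
        "count": len(values),
        "mean": mean,
        "standard deviation": statistics.stdev(present),
        "sample minimum": min(present),
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": max(present),
        "skew": skew,
        "missing": len(values) - n,
    }


cts = [13.32, 3.32, 4.3, 2.45, 6.34, 2.25, 4.43, 3.26, float("nan"), None]
for metric, value in continuous_stats(cts).items():
    print(f"{metric:>18}  {value:.3f}")
```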

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.creditcard module¶

The module that represents a CreditCard feature type.

Classes:

CreditCard: The CreditCard feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.
_luhn_checksum(card_number: str) -> float: Implements Luhn algorithm to validate a credit card number.
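
For readers unfamiliar with the Luhn algorithm named above, the sketch below shows the standard check: starting from the rightmost digit, every second digit is doubled, 9 is subtracted from any doubled value above 9, and the number is valid when the total is divisible by 10. The function name `luhn_is_valid` and its boolean return value are assumptions made for this illustration; the private `_luhn_checksum` helper in ADS may be structured differently.

```python
def luhn_is_valid(card_number):
    """Return True when a digit string passes the Luhn check."""
    if not isinstance(card_number, str):
        return False
    if not (card_number.isascii() and card_number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; positions 1, 3, 5, ... get doubled.
    for position, char in enumerate(reversed(card_number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4532640527811543"))
print(luhn_is_valid("4532640527811544"))
```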

class ads.feature_engineering.feature_type.creditcard.CreditCard[source]¶

Bases: String

Type representing credit card numbers.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the counts of observations in each credit card type using bar chart.

Examples

>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> import pandas as pd
>>> s = pd.Series(["4532640527811543", None, "4556929308150929", "4539944650919740", "4485348152450846", "4556593717607190"], name='credit_card')
>>> s.ads.feature_type = ['credit_card']
>>> CreditCard.validator.is_credit_card(s)
0     True
1    False
2     True
3     True
4     True
5     True
Name: credit_card, dtype: bool

description = 'Type representing credit card numbers.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_domain()
constraints: []
stats:
    count: 16
    count_Amex: 5
    count_Diners Club: 2
    count_MasterCard: 3
    count_Visa: 5
    count_missing: 1
    missing: 1
    unique: 15
values: CreditCard

Returns:: Domain based on the CreditCard feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows the counts of observations in each credit card type using bar chart.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_plot()

Returns:: Plot object for the series based on the CreditCard feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series)[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count), missing(count) and the count of each credit card type.

Examples

>>> visa = [
    "4532640527811543",
    None,
    "4556929308150929",
    "4539944650919740",
    "4485348152450846",
    "4556593717607190",
    ]
>>> mastercard = [
    "5334180299390324",
    "5111466404826446",
    "5273114895302717",
    "5430972152222336",
    "5536426859893306",
    ]
>>> amex = [
    "371025944923273",
    "374745112042294",
    "340984902710890",
    "375767928645325",
    "370720852891659",
    ]
>>> creditcard_list = visa + mastercard + amex
>>> creditcard_series = pd.Series(creditcard_list,name='card')
>>> creditcard_series.ads.feature_type = ['credit_card']
>>> creditcard_series.ads.feature_stat()
              Metric  Value
0              count     16
1             unique     15
2            missing      1
3         count_Amex      5
4         count_Visa      5
5   count_MasterCard      3
6  count_Diners Club      2
7      count_missing      1

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.creditcard.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.datetime module¶

The module that represents a DateTime feature type.

Classes:

DateTime: The DateTime feature type.

class ads.feature_engineering.feature_type.datetime.DateTime[source]¶

Bases: FeatureType

Type representing date and/or time.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows distributions of datetime datasets using histograms.

Example

>>> from ads.feature_engineering.feature_type.datetime import DateTime
>>> import pandas as pd
>>> s = pd.Series(["12/12/12", "12/12/13", None, "12/12/14"], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> DateTime.validator.is_datetime(s)
0     True
1     True
2    False
3     True
Name: datetime, dtype: bool

description = 'Type representing date and/or time.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> s.ads.feature_type = ['date_time']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 8
    missing: 3
    sample maximum: April/15/11
    sample minimum: 3/11/2000
values: DateTime

Returns:: Domain based on the DateTime feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows distributions of datetime datasets using histograms.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_plot()

Returns:: Plot object for the series based on the DateTime feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, sample maximum, sample minimum, and missing(count) if there is any.

Examples

>>> x = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '', None, np.nan, 'April/13/2011', 'April/15/11'], name='datetime')
>>> x.ads.feature_type = ['date_time']
>>> x.ads.feature_stat()
           Metric        Value
0           count            8
1  sample maximum  April/15/11
2  sample minimum    3/11/2000
3         missing            3

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.datetime.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series
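
One way to picture this handler is as an attempt to parse each observation, with failures reported as False. The sketch below does that for a plain list. The list of accepted layouts and the names `parse_datetime` and `datetime_handler` are hypothetical; ADS may rely on a more general parser.

```python
from datetime import datetime

# Hypothetical list of accepted layouts, chosen to cover the examples
# on this page. Month names assume the default English (C) locale.
_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%B/%d/%Y", "%B/%d/%y")


def parse_datetime(value):
    """Return a datetime for a recognized string, otherwise None."""
    if not isinstance(value, str):
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next layout
    return None


def datetime_handler(data):
    """Return one boolean per observation, in the original order."""
    return [parse_datetime(v) is not None for v in data]


print(datetime_handler(["12/12/12", "12/12/13", None, "12/12/14"]))
```

Parsing before comparing also explains why 'April/15/11' can be reported as the sample maximum: it is compared as a date, not as a string.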

ads.feature_engineering.feature_type.discrete module¶

The module that represents a Discrete feature type.

Classes:

Discrete: The Discrete feature type.

class ads.feature_engineering.feature_type.discrete.Discrete[source]¶

Bases: FeatureType

Type representing discrete values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows distributions of datasets using box plot.

description = 'Type representing discrete values.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_domain()
constraints: []
stats:
    count: 4
    unique: 4
values: Discrete

Returns:: Domain based on the Discrete feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows distributions of datasets using box plot.

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_plot()

Returns:: Plot object for the series based on the Discrete feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> discrete_numbers = pd.Series([35, 25, 13, 42],
           name='discrete')
>>> discrete_numbers.ads.feature_type = ['discrete']
>>> discrete_numbers.ads.feature_stat()
            discrete
count   4
unique  4

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.document module¶

The module that represents a Document feature type.

Classes:

Document: The Document feature type.

class ads.feature_engineering.feature_type.document.Document[source]¶

Bases: FeatureType

Type representing document values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

description = 'Type representing document values.'¶

classmethod feature_domain()[source]¶

Returns:: Nothing.
Return type:: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.gis module¶

The module that represents a GIS feature type.

Classes:

GIS: The GIS feature type.

class ads.feature_engineering.feature_type.gis.GIS[source]¶

Bases: FeatureType

Type representing geographic information.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.gis import GIS
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='gis')
>>> s.ads.feature_type = ['gis']
>>> GIS.validator.is_gis(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: gis, dtype: bool

description = 'Type representing geographic information.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: GIS

Returns:: Domain based on the GIS feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_plot()

Returns:: Plot object for the series based on the GIS feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> gis = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='gis'
)
>>> gis.ads.feature_type = ['gis']
>>> gis.ads.feature_stat()
        gis
count   13
unique  10
missing 3

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.gis.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.integer module¶

The module that represents an Integer feature type.

Classes:

Integer: The Integer feature type.

class ads.feature_engineering.feature_type.integer.Integer[source]¶

Bases: FeatureType

Type representing integer values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows distributions of datasets using a box plot.

description = 'Type representing integer values.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series([True, False, True, False, np.nan, None], name='integer')
>>> s.ads.feature_type = ['integer']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    freq: 2
    missing: 2
    top: true
    unique: 2
values: Integer

Returns:: Domain based on the Integer feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows distributions of datasets using a box plot.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_plot()

Returns:: Plot object for the series based on the Integer feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include the total count, mean, standard deviation, sample minimum, lower quartile, median, upper quartile, sample maximum and, if any values are missing, the missing count.

Examples

>>> x = pd.Series([1, 0, 1, 2, 3, 4, np.nan], name='integer')
>>> x.ads.feature_type = ['integer']
>>> x.ads.feature_stat()
               Metric  Value
0               count      7
1                mean      1
2  standard deviation      1
3      sample minimum      0
4      lower quartile      1
5              median      1
6      upper quartile      2
7      sample maximum      4
8             missing      1

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame
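
The statistics listed above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: `numeric_summary` is a hypothetical helper, it works on a plain list rather than a pandas Series, and the choice of linear interpolation for the quartiles (`method="inclusive"`) is an assumption. The whole-number values in the example above are consistent with truncating these results, but that is an inference rather than documented behaviour.

```python
import math
import statistics


def numeric_summary(values):
    """Summarise a list of numbers, treating None and NaN as missing."""
    present = sorted(v for v in values if v is not None and not math.isnan(v))
    # Quartiles via linear interpolation between data points (an assumption).
    lower, median, upper = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),  # total length, including missing values
        "mean": statistics.mean(present),
        "standard deviation": statistics.stdev(present),  # sample standard deviation
        "sample minimum": present[0],
        "lower quartile": lower,
        "median": median,
        "upper quartile": upper,
        "sample maximum": present[-1],
        "missing": len(values) - len(present),
    }


summary = numeric_summary([1, 0, 1, 2, 3, 4, float("nan")])
print(summary["lower quartile"], summary["median"], summary["upper quartile"])  # 1.0 1.5 2.75
```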

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.ip_address module¶

The module that represents an IpAddress feature type.

Classes:

IpAddress: The IpAddress feature type.

class ads.feature_engineering.feature_type.ip_address.IpAddress[source]¶

Bases: FeatureType

Type representing IP Address.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address import IpAddress
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> IpAddress.validator.is_ip_address(s)
0     True
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 3
values: IpAddress

Returns:: Domain based on the IpAddress feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['2002:db8::', '192.168.0.1', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      3
2  missing      2

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.ip_address.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series
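
To illustrate the kind of per-value check such a handler performs, here is a minimal sketch built on Python's standard `ipaddress` module. It is not the library's implementation: `ip_version` is a hypothetical helper, and it operates on a plain list rather than a pandas Series.

```python
import ipaddress


def ip_version(value):
    """Return 4 or 6 for a valid IP address string, otherwise None."""
    if not isinstance(value, str):
        return None  # None and NaN are not strings
    try:
        return ipaddress.ip_address(value).version
    except ValueError:
        return None  # empty or malformed strings


samples = ['192.168.0.1', '2001:db8::', '', float('nan'), None]
print([ip_version(v) for v in samples])  # [4, 6, None, None, None]
```

A value counts as a valid IP address when the result is not `None`; comparing the result with 4 or 6 gives the version-specific checks.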

ads.feature_engineering.feature_type.ip_address_v4 module¶

The module that represents an IpAddressV4 feature type.

Classes:

IpAddressV4: The IpAddressV4 feature type.

class ads.feature_engineering.feature_type.ip_address_v4.IpAddressV4[source]¶

Bases: FeatureType

Type representing IP Address V4.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v4 import IpAddressV4
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> IpAddressV4.validator.is_ip_address_v4(s)
0     True
1    False
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address V4.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.nan, None], name='ip_address_v4')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 4
values: IpAddressV4

Returns:: Domain based on the IpAddressV4 feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['192.168.0.1', '192.168.0.2', '192.168.0.3', '192.168.0.4', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v4']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      4
2  missing      2

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.ip_address_v4.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.ip_address_v6 module¶

The module that represents an IpAddressV6 feature type.

Classes:

IpAddressV6: The IpAddressV6 feature type.

class ads.feature_engineering.feature_type.ip_address_v6.IpAddressV6[source]¶

Bases: FeatureType

Type representing IP Address V6.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

Example

>>> from ads.feature_engineering.feature_type.ip_address_v6 import IpAddressV6
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['192.168.0.1', '2001:db8::', '', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> IpAddressV6.validator.is_ip_address_v6(s)
0    False
1     True
2    False
3    False
4    False
Name: ip_address, dtype: bool

description = 'Type representing IP Address V6.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address_v6')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 6
    missing: 2
    unique: 2
values: IpAddressV6

Returns:: Domain based on the IpAddressV6 feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> s = pd.Series(['2002:db8::', '2001:db8::', '2001:db8::', '2002:db8::', np.nan, None], name='ip_address')
>>> s.ads.feature_type = ['ip_address_v6']
>>> s.ads.feature_stat()
    Metric  Value
0    count      6
1   unique      2
2  missing      2

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.ip_address_v6.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.lat_long module¶

The module that represents a LatLong feature type.

Classes:

LatLong: The LatLong feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.lat_long.LatLong[source]¶

Bases: String

Type representing longitude and latitude.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the location of the given address on a map based on longitude and latitude.

Example

>>> from ads.feature_engineering.feature_type.lat_long import LatLong
>>> import pandas as pd
>>> s = pd.Series(["-18.2193965, -93.587285",
                    "-21.0255305, -122.478584",
                    "85.103913, 19.405744",
                    "82.913736, 178.225672",
                    "62.9795085,-66.989705",
                    "54.5604395,95.235090",
                    "24.2811855,-162.380403",
                    "-1.818319,-80.681214",
                    None,
                    "(51.816119, 175.979008)",
                    "(54.3392995,-11.801615)"],
                    name='latlong')
>>> s.ads.feature_type = ['lat_long']
>>> LatLong.validator.is_lat_long(s)
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8     False
9      True
10     True
Name: latlong, dtype: bool

description = 'Type representing longitude and latitute.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> latlong_series = pd.Series([
    "69.196241,-125.017615",
    "5.2272595,-143.465712",
    "-33.9855425,-153.445155",
    "43.340610,86.460554",
    "24.2811855,-162.380403",
    "2.7849025,-7.328156",
    "45.033805,157.490179",
    "-1.818319,-80.681214",
    "-44.510428,-169.269477",
    "-56.3344375,-166.407038",
    "",
    np.nan,
    None
    ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_domain()
constraints: []
stats:
    count: 13
    missing: 3
    unique: 10
values: LatLong

Returns:: Domain based on the LatLong feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows the location of the given address on a map based on longitude and latitude.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.nan,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_plot()

Returns:: Plot object for the series based on the LatLong feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> latlong_series = pd.Series([
            "69.196241,-125.017615",
            "5.2272595,-143.465712",
            "-33.9855425,-153.445155",
            "43.340610,86.460554",
            "24.2811855,-162.380403",
            "2.7849025,-7.328156",
            "45.033805,157.490179",
            "-1.818319,-80.681214",
            "-44.510428,-169.269477",
            "-56.3344375,-166.407038",
            "",
            np.nan,
            None
        ],
    name='latlong'
)
>>> latlong_series.ads.feature_type = ['lat_long']
>>> latlong_series.ads.feature_stat()
    Metric  Value
0    count     13
1   unique     10
2  missing      3

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.lat_long.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series
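
The validator example for LatLong accepts pairs with or without parentheses and with optional spaces after the comma. A minimal sketch of a check with that behaviour follows. The regular expression, the range limits and the name `looks_like_lat_long` are all assumptions made for illustration, not the library's actual pattern.

```python
import re

# Hypothetical pattern: optional parentheses around "lat,long" with optional spaces.
# Note: this sketch does not require the parentheses to be balanced.
_PAIR = re.compile(r"\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?")


def looks_like_lat_long(value):
    """Return True if value is a 'lat,long' string within valid ranges."""
    if not isinstance(value, str):
        return False  # None and NaN are not strings
    match = _PAIR.fullmatch(value)
    if match is None:
        return False
    lat, lon = float(match.group(1)), float(match.group(2))
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


samples = ["-18.2193965, -93.587285", "(51.816119, 175.979008)", None, ""]
print([looks_like_lat_long(v) for v in samples])  # [True, True, False, False]
```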

ads.feature_engineering.feature_type.object module¶

The module that represents an Object feature type.

Classes:

Object: The Object feature type.

class ads.feature_engineering.feature_type.object.Object[source]¶

Bases: FeatureType

Type representing object.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

description = 'Type representing object.'¶

classmethod feature_domain()[source]¶

Returns:: Nothing.
Return type:: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.ordinal module¶

The module that represents an Ordinal feature type.

Classes:

Ordinal: The Ordinal feature type.

class ads.feature_engineering.feature_type.ordinal.Ordinal[source]¶

Bases: FeatureType

Type representing ordered values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the counts of observations in each categorical bin using a bar chart.

description = 'Type representing ordered values.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_domain()
constraints:
- expression: $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    language: python
stats:
    count: 10
    missing: 1
    unique: 9
values: Ordinal

Returns:: Domain based on the Ordinal feature type.
Return type:: ads.feature_engineering.schema.Domain
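
The constraint in the example output lists the distinct non-missing values of the series. The sketch below shows how an expression of that shape could be built and checked in plain Python. Both helpers (`ordinal_constraint`, `satisfies`) are hypothetical and only mirror the format shown above; they are not part of the library.

```python
import ast
import math


def ordinal_constraint(values):
    """Build a '$x in [...]' expression from the sorted distinct non-missing values."""
    present = sorted({float(v) for v in values if v is not None and not math.isnan(v)})
    return f"$x in {present}"


def satisfies(expression, x):
    """Check x against the expression by parsing the list literal after ' in '."""
    allowed = ast.literal_eval(expression.split(" in ", 1)[1])
    return x in allowed


expr = ordinal_constraint([1, 2, 3, 4, 5, 6, 7, 8, 9, float("nan")])
print(expr)  # $x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(satisfies(expr, 3), satisfies(expr, 10))  # True False
```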

static feature_plot(x: Series) → Axes[source]¶

Shows the counts of observations in each categorical bin using a bar chart.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_plot()

Returns:: The bar chart plot object for the series based on the Ordinal feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count), and missing(count) if there is any.

Examples

>>> x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan], name='ordinal')
>>> x.ads.feature_type = ['ordinal']
>>> x.ads.feature_stat()
    Metric  Value
0    count     10
1   unique      9
2  missing      1

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.phone_number module¶

The module that represents a Phone Number feature type.

Classes:

PhoneNumber: The Phone Number feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.phone_number.PhoneNumber[source]¶

Bases: String

Type representing phone numbers.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

Examples

>>> from ads.feature_engineering.feature_type.phone_number import PhoneNumber
>>> import pandas as pd
>>> s = pd.Series([None, "1-640-124-5367", "1-573-916-4412"])
>>> PhoneNumber.validator.is_phone_number(s)
0    False
1     True
2     True
dtype: bool

description = 'Type representing phone numbers.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.nan, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_domain()
constraints: []
stats:
    count: 7
    missing: 4
    unique: 2
values: PhoneNumber

Returns:: Domain based on the PhoneNumber feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> s = pd.Series(['2068866666', '6508866666', '2068866666', '', np.nan, np.nan, None], name='phone')
>>> s.ads.feature_type = ['phone_number']
>>> s.ads.feature_stat()
    Metric  Value
0    count      7
1   unique      2
2  missing      4

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame
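
The numbers in the example suggest a convention: count is the total length of the series, missing covers None, NaN and empty strings, and unique counts the distinct non-missing values. The sketch below reproduces that convention on a plain list. The convention is inferred from the examples in this section, and `is_missing` and `basic_stats` are hypothetical helpers, not library functions.

```python
import math


def is_missing(value):
    """Treat None, NaN and the empty string as missing (inferred convention)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def basic_stats(values):
    """Return total count, distinct non-missing values and missing count."""
    present = [v for v in values if not is_missing(v)]
    return {
        "count": len(values),
        "unique": len(set(present)),
        "missing": len(values) - len(present),
    }


phones = ['2068866666', '6508866666', '2068866666', '', float('nan'), float('nan'), None]
print(basic_stats(phones))  # {'count': 7, 'unique': 2, 'missing': 4}
```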

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.phone_number.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.string module¶

The module that represents a String feature type.

Classes:

String: The feature type that represents string values.

class ads.feature_engineering.feature_type.string.String[source]¶

Bases: FeatureType

Type representing string values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows distributions of datasets using a word cloud.

Example

>>> from ads.feature_engineering.feature_type.string import String
>>> import pandas as pd
>>> s = pd.Series(["Hello", "world", None], name='string')
>>> String.validator.is_string(s)
0     True
1     True
2    False
Name: string, dtype: bool

description = 'Type representing string values.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_domain()
constraints: []
stats:
    count: 22
    missing: 3
    unique: 3
values: String

Returns:: Domain based on the String feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows distributions of datasets using a word cloud.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_plot()

Returns:: Plot object for the series based on the String feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count) if there is any.

Examples

>>> string = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='string')
>>> string.ads.feature_type = ['string']
>>> string.ads.feature_stat()
    Metric  Value
0    count     22
1   unique      3
2  missing      3

Returns:: Summary statistics of the Series or Dataframe provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.string.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

ads.feature_engineering.feature_type.text module¶

The module that represents a Text feature type.

Classes:

Text: The Text feature type.

class ads.feature_engineering.feature_type.text.Text[source]¶

Bases: String

Type representing text values.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows distributions of datasets using a word cloud.

description = 'Type representing text values.'¶

classmethod feature_domain()[source]¶

Returns:: Nothing.
Return type:: None

static feature_plot(x: Series) → Axes[source]¶

Shows distributions of datasets using a word cloud.

Examples

>>> text = pd.Series(['S', 'C', 'S', 'S', 'S', 'Q', 'S', 'S', 'S', 'C', 'S', 'S', 'S',
        'S', 'S', 'S', 'Q', 'S', 'S', '', np.nan, None], name='text')
>>> text.ads.feature_type = ['text']
>>> text.ads.feature_plot()

Returns:: Plot object for the series based on the Text feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.unknown module¶

The module that represents an Unknown feature type.

Classes:

Unknown: The Unknown feature type.

class ads.feature_engineering.feature_type.unknown.Unknown[source]¶

Bases: FeatureType

Type representing third-party dtypes.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

description = 'Type representing unknown type.'¶

classmethod feature_domain()[source]¶

Returns:: Nothing.
Return type:: None

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.zip_code module¶

The module that represents a ZipCode feature type.

Classes:

ZipCode: The ZipCode feature type.

Functions:

default_handler(data: pd.Series) -> pd.Series: Processes given data and indicates if the data matches requirements.

class ads.feature_engineering.feature_type.zip_code.ZipCode[source]¶

Bases: String

Type representing postal code.

description¶

The feature type description.

Type:: str

name¶

The feature type name.

Type:: str

warning¶

Provides functionality to register warnings and invoke them.

Type:: FeatureWarning

validator¶: Provides functionality to register validators and invoke them.

feature_stat(x: pd.Series) → pd.DataFrame[source]¶: Generates feature statistics.

feature_plot(x: pd.Series) → plt.Axes[source]¶: Shows the geographic distribution based on the location of the zip code.

Example

>>> from ads.feature_engineering.feature_type.zip_code import ZipCode
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(["94065", "90210", np.nan, None], name='zipcode')
>>> ZipCode.validator.is_zip_code(s)
0     True
1     True
2    False
3    False
Name: zipcode, dtype: bool
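
For intuition, a check with the same outcome on these sample values can be sketched with a regular expression. The pattern below (five digits, optionally followed by a hyphen and four more digits, as in US ZIP codes) and the name `looks_like_zip` are assumptions for illustration only; the library's own pattern may differ.

```python
import re

# Hypothetical pattern: US-style "12345" or "12345-6789".
_ZIP = re.compile(r"\d{5}(?:-\d{4})?")


def looks_like_zip(value):
    """Return True if value is a string that fully matches the assumed pattern."""
    return isinstance(value, str) and _ZIP.fullmatch(value) is not None


samples = ["94065", "90210", float("nan"), None]
print([looks_like_zip(v) for v in samples])  # [True, True, False, False]
```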

description = 'Type representing postal code.'¶

classmethod feature_domain(x: Series) → Domain[source]¶

Generate the domain of the data of this feature type.

Examples

>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_domain()
constraints: []
stats:
    count: 4
    missing: 2
    unique: 2
values: ZipCode

Returns:: Domain based on the ZipCode feature type.
Return type:: ads.feature_engineering.schema.Domain

static feature_plot(x: Series) → Axes[source]¶

Shows the geographic distribution based on the location of the zip code.

Examples

>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_plot()

Returns:: Plot object for the series based on the ZipCode feature type.
Return type:: matplotlib.axes._subplots.AxesSubplot

static feature_stat(x: Series) → DataFrame[source]¶

Generates feature statistics.

Feature statistics include (total)count, unique(count) and missing(count).

Examples

>>> zipcode = pd.Series([94065, 90210, np.nan, None], name='zipcode')
>>> zipcode.ads.feature_type = ['zip_code']
>>> zipcode.ads.feature_stat()
    Metric  Value
0    count      4
1   unique      2
2  missing      2

Returns:: Summary statistics of the Series provided.
Return type:: pandas.DataFrame

validator = <ads.feature_engineering.feature_type.handler.feature_validator.FeatureValidator object>¶

warning = <ads.feature_engineering.feature_type.handler.feature_warning.FeatureWarning object>¶

ads.feature_engineering.feature_type.zip_code.default_handler(data: Series, *args, **kwargs) → Series[source]¶

Processes given data and indicates if the data matches requirements.

Parameters:: data (pandas.Series) – The data to process.
Returns:: The logical list indicating if the data matches requirements.
Return type:: pandas.Series

Module contents¶

Address: Type representing address.
Boolean: Type representing binary values True/False.
Category: Type representing discrete unordered values.
Constant: Type representing constant values.
Continuous: Type representing continuous values.
CreditCard: Type representing credit card numbers.
DateTime: Type representing date and/or time.
Document: Type representing document values.
Discrete: Type representing discrete values.
FeatureType: Base class for all feature types.
GIS: Type representing geographic information.
Integer: Type representing integer values.
IpAddress: Type representing IP Address.
IpAddressV4: Type representing IP Address V4.
IpAddressV6: Type representing IP Address V6.
LatLong: Type representing longitude and latitude.
Object: Type representing object.
Ordinal: Type representing ordered values.
PhoneNumber: Type representing phone numbers.
String: Type representing string values.
Tag: Free form tag.
Text: Type representing text values.
ZipCode: Type representing postal code.
Unknown: Type representing third-party dtypes.