ads.feature_engineering.accessor package
Subpackages
- ads.feature_engineering.accessor.mixin package
- Submodules
- ads.feature_engineering.accessor.mixin.correlation module
- ads.feature_engineering.accessor.mixin.eda_mixin module
- ads.feature_engineering.accessor.mixin.eda_mixin_series module
- ads.feature_engineering.accessor.mixin.feature_types_mixin module
- ads.feature_engineering.accessor.mixin.utils module
- Module contents
Submodules
ads.feature_engineering.accessor.dataframe_accessor module
The ADS accessor for the Pandas DataFrame. The accessor will be initialized with the pandas object the user is interacting with.
Examples
>>> import pandas as pd
>>> from ads.feature_engineering.accessor.dataframe_accessor import ADSDataFrameAccessor
>>> from ads.feature_engineering.feature_type.continuous import Continuous
>>> from ads.feature_engineering.feature_type.creditcard import CreditCard
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.base import Tag
>>> df = pd.DataFrame({'Name': ['Alex'], 'Credit Card': ["4532640527811543"]})
>>> df.ads.feature_type
{'Name': ['string'], 'Credit Card': ['string']}
>>> df.ads.feature_type_description
Column Feature Type Description
------------------------------------------------------------------
0 Name string Type representing string values.
1 Credit Card string Type representing string values.
>>> df.ads.default_type
{'Name': 'string', 'Credit Card': 'string'}
>>> df.ads.feature_type = {'Name':['string', Tag('abc')]}
>>> df.ads.tags
{'Name': ['abc']}
>>> df.ads.feature_type = {'Credit Card':['credit_card']}
>>> df.ads.feature_select(include=['credit_card'])
Credit Card
-------------------------------
0 4532640527811543
- class ads.feature_engineering.accessor.dataframe_accessor.ADSDataFrameAccessor(pandas_obj)
Bases: ADSFeatureTypesMixin, EDAMixin, DBAccessMixin, DataLabelingAccessMixin, ADSDatasetAccessMixin
ADS accessor for the Pandas DataFrame.
- default_type(self) -> Dict[str, str]
Gets the map of columns and associated default feature type names.
- feature_type_description(self) -> pd.DataFrame
Gets the list of registered feature types in a DataFrame format.
- sync(self, src: pd.DataFrame | pd.Series) -> pd.DataFrame
Syncs feature types of current DataFrame with that from src.
- feature_select(self, include: List[FeatureType | str] = None, exclude: List[FeatureType | str] = None) -> pd.DataFrame
Returns a subset of the DataFrame's columns based on the column feature_types.
Initializes ADS Pandas DataFrame Accessor.
- Parameters:
pandas_obj (pandas.DataFrame) – Pandas dataframe
- Raises:
ValueError – If provided DataFrame has duplicate columns.
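The initialization behavior described above can be illustrated with a minimal sketch. This section does not say how ADS registers its accessor; the sketch uses pandas' public `pd.api.extensions.register_dataframe_accessor` decorator. The accessor name `demo`, the class `DemoAccessor`, and the `column_count` property are hypothetical names chosen for illustration and are not part of ADS.

```python
import pandas as pd


# "demo" is a hypothetical accessor name; the examples in this page use "ads".
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes the DataFrame being accessed into the constructor.
        # Mirror the documented behavior: reject duplicate column names.
        if not pandas_obj.columns.is_unique:
            raise ValueError("The DataFrame must not have duplicate columns.")
        self._obj = pandas_obj

    @property
    def column_count(self) -> int:
        # Illustrative property only; not part of the ADS accessor.
        return len(self._obj.columns)


df = pd.DataFrame({"Name": ["Alex"], "Credit Card": ["4532640527811543"]})
n_columns = df.demo.column_count

# A frame with duplicate column names is rejected when the accessor is built.
dup = pd.DataFrame([[1, 2]], columns=["a", "a"])
try:
    dup.demo
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The accessor object is created when the attribute is accessed on the DataFrame, which is why the duplicate-column check fires on `dup.demo` rather than when `dup` is constructed.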
- property default_type: Dict[str, str]
Gets the map of columns and associated default feature type names.
- feature_select(include: List[FeatureType | str] | None = None, exclude: List[FeatureType | str] | None = None) -> DataFrame
Returns a subset of the DataFrame’s columns based on the column feature_types.
- Parameters:
include (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be included.
exclude (List[Union[FeatureType, str]], optional) – Defaults to None. A list of FeatureType subclass or str to be excluded.
- Raises:
ValueError – If both include and exclude are empty.
ValueError – If include and exclude are used simultaneously.
- Returns:
The subset of the frame including the feature types in include and excluding the feature types in exclude.
- Return type:
pandas.DataFrame
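The selection rules listed above can be sketched in plain Python, following the Raises entries as written. This is an illustration under assumptions, not the ADS implementation: `select_columns` is a hypothetical helper, it works on a plain mapping of column name to feature type names, it returns column names instead of a DataFrame, and it assumes a column matches when any of its feature types is listed.

```python
from typing import Dict, List, Optional


def select_columns(
    feature_types: Dict[str, List[str]],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: pick column names by feature type name."""
    if not include and not exclude:
        raise ValueError("At least one of include or exclude must be given.")
    if include and exclude:
        raise ValueError("Use either include or exclude, not both.")
    if include:
        wanted = set(include)
        # Assumption: a column is kept if any of its types is wanted.
        return [c for c, types in feature_types.items() if wanted & set(types)]
    unwanted = set(exclude)
    # Assumption: a column is dropped if any of its types is unwanted.
    return [c for c, types in feature_types.items() if not unwanted & set(types)]


types_by_column = {
    "Name": ["string"],
    "Credit Card": ["credit_card", "string"],
}
included = select_columns(types_by_column, include=["credit_card"])
excluded = select_columns(types_by_column, exclude=["credit_card"])
```

With this mapping, `included` holds only `'Credit Card'` and `excluded` holds only `'Name'`.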
- property feature_type_description: DataFrame
Gets the list of registered feature types in a DataFrame format.
- Return type:
pandas.DataFrame
Examples
>>> df.ads.feature_type_description
Column Feature Type Description
-------------------------------------------------------------------
0 City string Type representing string values.
1 Phone Number string Type representing string values.
- info() -> Any
Gets information about the dataframe.
- Returns:
The information about the dataframe.
- Return type:
Any
- model_schema(max_col_num: int = 2000)
Generates schema from the dataframe.
- Parameters:
max_col_num (int, optional. Defaults to 2000) – The maximum number of columns the data may have for a schema to be generated automatically.
Examples
>>> df = pd.read_csv('./orcl_attrition.csv', usecols=['Age', 'Attrition'])
>>> schema = df.ads.model_schema()
>>> schema
Schema:
- description: Attrition
  domain:
    constraints: []
    stats:
      count: 1470
      unique: 2
    values: String
  dtype: object
  feature_type: String
  name: Attrition
  required: true
- description: Age
  domain:
    constraints: []
    stats:
      25%: 31.0
      50%: 37.0
      75%: 44.0
      count: 1470.0
      max: 61.0
      mean: 37.923809523809524
      min: 19.0
      std: 9.135373489136732
    values: Integer
  dtype: int64
  feature_type: Integer
  name: Age
  required: true
>>> schema.to_dict()
{'Schema': [{'dtype': 'object', 'feature_type': 'String', 'name': 'Attrition', 'domain': {'values': 'String', 'stats': {'count': 1470, 'unique': 2}, 'constraints': []}, 'required': True, 'description': 'Attrition'}, {'dtype': 'int64', 'feature_type': 'Integer', 'name': 'Age', 'domain': {'values': 'Integer', 'stats': {'count': 1470.0, 'mean': 37.923809523809524, 'std': 9.135373489136732, 'min': 19.0, '25%': 31.0, '50%': 37.0, '75%': 44.0, 'max': 61.0}, 'constraints': []}, 'required': True, 'description': 'Age'}]}
- Returns:
The data schema.
- Return type:
- Raises:
ads.feature_engineering.schema.DataSizeTooWide – If the number of columns of input data exceeds max_col_num.
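The column-count limit can be pictured with a small sketch. The `DataSizeTooWide` class below is a local stand-in defined only for this example (the real exception lives in `ads.feature_engineering.schema` and its constructor may differ), `check_column_count` is a hypothetical helper, and the default of 2000 is taken from the `model_schema` signature.

```python
from typing import Sequence


class DataSizeTooWide(Exception):
    """Local stand-in for ads.feature_engineering.schema.DataSizeTooWide."""


def check_column_count(columns: Sequence[str], max_col_num: int = 2000) -> int:
    """Hypothetical helper: fail when the data has more than max_col_num columns."""
    n_columns = len(columns)
    if n_columns > max_col_num:
        raise DataSizeTooWide(
            f"The data has {n_columns} columns; the limit is {max_col_num}."
        )
    return n_columns


# Two columns are well within the default limit.
checked = check_column_count(["Age", "Attrition"])

# Lowering the limit below the column count triggers the error.
try:
    check_column_count(["Age", "Attrition"], max_col_num=1)
    too_wide = False
except DataSizeTooWide:
    too_wide = True
```

A caller who expects wide data could catch the exception and retry with a larger `max_col_num`, or generate the schema for a subset of columns.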
- sync(src: DataFrame | Series) -> DataFrame
Syncs feature types of the current DataFrame with those from src, where src can be a DataFrame or a Series. In either case, only columns with matching names are synced.
- Parameters:
src (pd.DataFrame | pd.Series) – The source to sync from.
- Returns:
Synced dataframe.
- Return type:
pandas.DataFrame
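The name-matching rule can be sketched with plain dictionaries standing in for the feature types attached to each column. This is an illustration of the rule only: `sync_feature_types` is a hypothetical helper, not the ADS implementation, and the feature type names in the sample data are placeholders.

```python
from typing import Dict, List


def sync_feature_types(
    current: Dict[str, List[str]], src: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Hypothetical helper: copy feature types from src for matching names."""
    # Copy so that neither input mapping is modified.
    synced = {col: list(types) for col, types in current.items()}
    for col in synced:
        if col in src:
            # Only columns present in both mappings are updated.
            synced[col] = list(src[col])
    return synced


current = {"Name": ["string"], "Credit Card": ["string"]}
src = {"Credit Card": ["credit_card", "string"], "Age": ["integer"]}
result = sync_feature_types(current, src)
```

Here `Credit Card` takes its types from `src`, `Name` keeps its own because `src` has no such column, and `Age` is ignored because the current mapping has no such column.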
ads.feature_engineering.accessor.series_accessor module
The ADS accessor for the Pandas Series. The accessor will be initialized with the pandas object the user is interacting with.
Examples
>>> import pandas as pd
>>> from ads.feature_engineering.accessor.series_accessor import ADSSeriesAccessor
>>> from ads.feature_engineering.feature_type.string import String
>>> from ads.feature_engineering.feature_type.ordinal import Ordinal
>>> from ads.feature_engineering.feature_type.base import Tag
>>> series = pd.Series(['name1', 'name2', 'name3'])
>>> series.ads.default_type
'string'
>>> series.ads.feature_type
['string']
>>> series.ads.feature_type_description
Feature Type Description
----------------------------------------------------
0 string Type representing string values.
>>> series.ads.feature_type = ['string', Ordinal, Tag('abc')]
>>> series.ads.feature_type
['string', 'ordinal', 'abc']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['string', 'ordinal', 'abc']
- class ads.feature_engineering.accessor.series_accessor.ADSSeriesAccessor(pandas_obj: Series)
Bases: ADSFeatureTypesMixin, EDAMixinSeries
ADS accessor for Pandas Series.
- sync(self, src: pd.DataFrame | pd.Series) -> None
Syncs feature types of current series with that from src.
- feature_type_description(self) -> pd.DataFrame
Gets the list of registered feature types in a DataFrame format.
Initializes ADS Pandas Series Accessor.
- Parameters:
pandas_obj (pd.Series) – The pandas series
- property default_type: str
Gets the name of default feature type for the series.
- Returns:
The name of default feature type.
- Return type:
str
- property feature_type: List[str]
Gets the list of registered feature types for the series.
- Returns:
Names of feature types.
- Return type:
List[str]
Examples
>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('tag for name')]
>>> series.ads.feature_type
['name', 'string', 'tag for name']
- property feature_type_description: DataFrame
Gets the list of registered feature types in a DataFrame format.
- Returns:
The DataFrame with feature types for this series.
- Return type:
pd.DataFrame
Examples
>>> series = pd.Series(['name1'])
>>> series.ads.feature_type = ['name', 'string', Tag('Name tag')]
>>> series.ads.feature_type_description
Feature Type Description
----------------------------------------------------------
0 name Type representing name values.
1 string Type representing string values.
2 Name tag Tag.
- sync(src: DataFrame | Series) -> None
Syncs feature types of current series with that from src.
The src could be a dataframe or a series. In either case, only columns with matched names are synced.
- Parameters:
src (pd.DataFrame | pd.Series) – The source to sync from.
- Returns:
Nothing.
- Return type:
None
Examples
>>> series = pd.Series(['name1', 'name2', 'name3', None])
>>> series.ads.feature_type = ['name']
>>> series.ads.feature_type
['name', 'string']
>>> series.dropna().ads.feature_type
['string']
>>> series1 = series.dropna()
>>> series1.ads.sync(series)
>>> series1.ads.feature_type
['name', 'string']
- class ads.feature_engineering.accessor.series_accessor.ADSSeriesValidator(feature_type_list: List[FeatureType], series: Series)
Bases: object
Helper class that invokes registered validators at the series level.
Initializes ADS series validator.
- Parameters:
feature_type_list (List[FeatureType]) – The list of feature types.
series (pd.Series) – The pandas series.