ads.data_labeling.mixin package

Submodules

ads.data_labeling.mixin.data_labeling module

class ads.data_labeling.mixin.data_labeling.DataLabelingAccessMixin

Bases: object

Mixin class for labeled text data.

static read_labeled_data(path: str | None = None, dataset_id: str | None = None, compartment_id: str | None = None, auth: Dict | None = None, materialize: bool = False, encoding: str = 'utf-8', include_unlabeled: bool = False, format: str | None = None, chunksize: int | None = None)

Loads a dataset generated by the Data Labeling Service, either from an export file or directly from the Data Labeling Service.

Parameters:

path ((str, optional). Defaults to None) – The export file path. Can be either a local path or an Object Storage path.
dataset_id ((str, optional). Defaults to None) – The dataset OCID.
compartment_id ((str, optional). Defaults to the compartment_id from the environment variable.) – The compartment OCID of the dataset.
auth ((dict, optional). Defaults to None) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
materialize ((bool, optional). Defaults to False) – Whether to load the content of the dataset files or return only the file paths to the content. By default, the content is not loaded.
encoding ((str, optional). Defaults to 'utf-8') – The encoding of the files. Only used for “TEXT” datasets.
include_unlabeled ((bool, optional). Defaults to False) – Whether to load unlabeled records.
format ((str, optional). Defaults to None) –
Output format of annotations. Can be None, “spacy” for the Entity Extraction dataset type, or “yolo” for the Object Detection dataset type.
- When None, it outputs List[NERItem] or List[BoundingBoxItem],
- When “spacy”, it outputs List[Tuple],
- When “yolo”, it outputs List[List[Tuple]].
chunksize ((int, optional). Defaults to None) – The number of records to read in one iteration. When specified, the result is returned as a generator.
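The three format options differ in how each annotation is represented. The following is a hypothetical illustration, not the ADS implementation: it assumes an NER annotation is described by a label, a character offset, and a length (the made-up NerSpan dataclass), and that a bounding box is given in pixel corner coordinates (the made-up Box dataclass). It converts them to a spaCy-style (start, end, label) tuple and to the standard YOLO tuple (class_index, x_center, y_center, width, height) normalized by the image size. The exact tuple layout ADS returns may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NerSpan:
    """Hypothetical NER annotation: a labeled character span."""
    label: str
    offset: int
    length: int


@dataclass
class Box:
    """Hypothetical bounding box in pixel corner coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def ner_to_spacy(spans: List[NerSpan]) -> List[Tuple[int, int, str]]:
    # spaCy-style entity tuples are (start, end, label) with an exclusive end offset.
    return [(s.offset, s.offset + s.length, s.label) for s in spans]


def box_to_yolo(
    box: Box, categories: List[str], width: int, height: int
) -> Tuple[int, float, float, float, float]:
    # YOLO uses (class_index, x_center, y_center, box_width, box_height),
    # with all coordinates normalized to [0, 1] by the image size.
    class_index = categories.index(box.label)
    x_center = (box.x_min + box.x_max) / 2 / width
    y_center = (box.y_min + box.y_max) / 2 / height
    box_w = (box.x_max - box.x_min) / width
    box_h = (box.y_max - box.y_min) / height
    return (class_index, x_center, y_center, box_w, box_h)


text = "Oracle is based in Austin"
entities = ner_to_spacy([NerSpan("ORG", 0, 6), NerSpan("GPE", 19, 6)])
print(entities)                                 # [(0, 6, 'ORG'), (19, 25, 'GPE')]
print([text[s:e] for s, e, _ in entities])      # ['Oracle', 'Austin']
print(box_to_yolo(Box("dog", 100, 50, 300, 250), ["cat", "dog"], 400, 500))
```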

Returns:

pd.DataFrame if chunksize is not specified; Generator[pd.DataFrame] if chunksize is specified.

Return type:

Union[Generator[pd.DataFrame, Any, Any], pd.DataFrame]
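When chunksize is set, records are yielded in batches instead of being returned all at once. The stdlib-only sketch below shows that generator pattern on a JSONL export. The helper name iter_record_chunks is hypothetical, and the sketch only approximates the chunking behavior; it does not reproduce how ADS parses the export or builds DataFrames.

```python
import io
import json
from itertools import islice
from typing import IO, Dict, Iterator, List


def iter_record_chunks(stream: IO[str], chunksize: int) -> Iterator[List[Dict]]:
    """Hypothetical helper: yield lists of at most `chunksize` records from a JSONL stream."""
    # Note: as in any generator function, this check runs on the first iteration.
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    # Skip blank lines so a trailing newline does not produce an empty record.
    records = (json.loads(line) for line in stream if line.strip())
    while True:
        chunk = list(islice(records, chunksize))
        if not chunk:
            return
        yield chunk


jsonl = "\n".join(json.dumps({"id": i}) for i in range(5)) + "\n"
for chunk in iter_record_chunks(io.StringIO(jsonl), chunksize=2):
    print([r["id"] for r in chunk])
# [0, 1]
# [2, 3]
# [4]
```

With the method documented here, the analogous consumption loop would be for df in pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl", chunksize=100): ..., where each df holds one chunk of records.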

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=False)
                            Path       Content               Annotations
    --------------------------------------------------------------------
    0   path/to/the/content/file                                     yes
    1   path/to/the/content/file                                      no

>>> df = pd.DataFrame.ads.read_labeled_data(dataset_id="your_dataset_ocid",
...                                         compartment_id="your_compartment_id",
...                                         auth=authutil.api_keys(),
...                                         materialize=False)
                            Path       Content               Annotations
    --------------------------------------------------------------------
    0   path/to/the/content/file                                     yes
    1   path/to/the/content/file                                      no

render_bounding_box(options: Dict | None = None, content_column: str = 'Content', annotations_column: str = 'Annotations', categories: List[str] | None = None, limit: int = 50, path: str | None = None) → None

Renders a bounding box dataset. Displays at most limit rows (the first 50 by default).

Parameters:

options (Optional[dict]) – The color options used for rendering.
content_column (Optional[str]) – The column name with the content data.
annotations_column (Optional[str]) – The column name for the annotations list.
categories (Optional[List[str]]) – The list of object categories, in the order used for model training. Only used when the bounding box annotations are in YOLO format. Example: ['cat', 'dog', 'horse']
limit (Optional[int]. Defaults to 50) – The maximum number of records to display.
path (Optional[str]) – The local directory path where the image with annotations is saved.
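The categories list matters because a YOLO annotation stores only a class index, not a label name, and its coordinates are normalized to the image size. The hypothetical sketch below (not the ADS rendering code) maps such an annotation back to a labeled pixel box before drawing, assuming the standard (class_index, x_center, y_center, width, height) YOLO layout; yolo_to_pixel_box is a made-up name.

```python
from typing import List, Tuple

# Assumed YOLO layout: (class_index, x_center, y_center, width, height), normalized.
YoloBox = Tuple[int, float, float, float, float]


def yolo_to_pixel_box(
    box: YoloBox, categories: List[str], img_w: int, img_h: int
) -> Tuple[str, int, int, int, int]:
    """Return (label, x_min, y_min, x_max, y_max) in pixel coordinates."""
    class_index, x_c, y_c, w, h = box
    # The class index is only meaningful relative to the ordered categories list.
    label = categories[class_index]
    # Convert center/size back to corners and scale by the image dimensions.
    x_min = round((x_c - w / 2) * img_w)
    y_min = round((y_c - h / 2) * img_h)
    x_max = round((x_c + w / 2) * img_w)
    y_max = round((y_c + h / 2) * img_h)
    return label, x_min, y_min, x_max, y_max


print(yolo_to_pixel_box((1, 0.5, 0.3, 0.5, 0.4), ["cat", "dog", "horse"], 400, 500))
# ('dog', 100, 50, 300, 250)
```

Passing the categories in a different order would silently relabel every box, which is why the parameter asks for the order used in model training.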

Returns:

Nothing

Return type:

None

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=True)
>>> df.ads.render_bounding_box(content_column="Content", annotations_column="Annotations")

render_ner(options: Dict = None, content_column: str = 'Content', annotations_column: str = 'Annotations', limit: int = 50, return_html: bool = False) → None

Renders an NER dataset. Displays at most limit rows (the first 50 by default).

Parameters:

options (Optional[dict]) – The color options used for rendering.
content_column (Optional[str]) – The column name with the content data.
annotations_column (Optional[str]) – The column name for the annotations list.
limit (Optional[int]. Defaults to 50) – The maximum number of records to display.
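Rendering an NER dataset amounts to wrapping each labeled character span in highlighted markup. The sketch below is hypothetical: spans_to_html is a made-up helper, and the options shape it reads (a default_color plus a per-label colors mapping) is an assumption for illustration that may not match the keys render_ner actually accepts.

```python
import html
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def spans_to_html(text: str, spans: List[Span], options: Optional[Dict] = None) -> str:
    """Wrap each span of `text` in a colored <mark> tag followed by its label."""
    options = options or {}
    default_color = options.get("default_color", "#ddeeff")  # assumed key
    colors = options.get("colors", {})  # assumed key: label -> CSS color
    parts = []
    cursor = 0
    # Walk the spans left to right, escaping the plain text between them.
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        parts.append(html.escape(text[cursor:start]))
        color = colors.get(label, default_color)
        parts.append(
            f'<mark style="background:{color}">{html.escape(text[start:end])}'
            f" <b>{html.escape(label)}</b></mark>"
        )
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


print(spans_to_html("Oracle is based in Austin",
                    [(0, 6, "ORG"), (19, 25, "GPE")],
                    {"colors": {"ORG": "#ffd700"}}))
```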

Returns:

Nothing

Return type:

None

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=True)
>>> df.ads.render_ner(content_column="Content", annotations_column="Annotations")
