ads.data_labeling package

Submodules

ads.data_labeling.interface.loader module

class ads.data_labeling.interface.loader.Loader

Bases: ABC

Data Loader Interface.

abstract load(**kwargs) Any

ads.data_labeling.interface.parser module

class ads.data_labeling.interface.parser.Parser

Bases: ABC

Data Parser Interface.

abstract parse() Any

ads.data_labeling.interface.reader module

class ads.data_labeling.interface.reader.Reader

Bases: ABC

Data Reader Interface.

info() Serializable
abstract read() Any
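
The three interfaces above are plain abstract base classes. As a rough sketch of how a concrete class might satisfy the Reader contract, the standalone example below defines a stand-in Reader (it does not import ads) and a hypothetical InMemoryReader subclass. Both class bodies are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Tuple


class Reader(ABC):
    """Stand-in for the documented Reader interface (not the ads class)."""

    def info(self) -> Any:
        # Assumption: the base class leaves metadata to its subclasses.
        raise NotImplementedError

    @abstractmethod
    def read(self) -> Any:
        """Return the records; subclasses must implement this."""


class InMemoryReader(Reader):
    """Hypothetical reader that serves records from a list."""

    def __init__(self, records: List[Tuple[str, str, str]]) -> None:
        self._records = records

    def info(self) -> Dict[str, int]:
        return {"record_count": len(self._records)}

    def read(self) -> Iterator[Tuple[str, str, str]]:
        yield from self._records


reader = InMemoryReader([("a.txt", "hello", "yes"), ("b.txt", "bye", "no")])
rows = list(reader.read())
```

Because read is declared abstract, the stand-in Reader itself cannot be instantiated; only subclasses that implement read can.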

ads.data_labeling.boundingbox module

class ads.data_labeling.boundingbox.BoundingBoxItem(top_left: ~typing.Tuple[float, float], bottom_left: ~typing.Tuple[float, float], bottom_right: ~typing.Tuple[float, float], top_right: ~typing.Tuple[float, float], labels: ~typing.List[str] = <factory>)

Bases: object

BoundingBoxItem class representing bounding box label.

labels

List of labels for this bounding box.

Type

List[str]

top_left

Top left corner of this bounding box.

Type

Tuple[float, float]

bottom_left

Bottom left corner of this bounding box.

Type

Tuple[float, float]

bottom_right

Bottom right corner of this bounding box.

Type

Tuple[float, float]

top_right

Top right corner of this bounding box.

Type

Tuple[float, float]

Examples

>>> item = BoundingBoxItem(
...     labels = ['cat','dog'],
...     bottom_left=(0.2, 0.4),
...     top_left=(0.2, 0.2),
...     top_right=(0.8, 0.2),
...     bottom_right=(0.8, 0.4))
>>> item.to_yolo(categories = ['cat','dog', 'horse'])
bottom_left: Tuple[float, float]
bottom_right: Tuple[float, float]
classmethod from_yolo(bbox: List[Tuple], categories: Optional[List[str]] = None) BoundingBoxItem

Converts the YOLO formatted annotations to a BoundingBoxItem.

Parameters
  • bbox (List[Tuple]) – The list of bounding box annotations in YOLO format. Example: [(0, 0.511560675, 0.50234826, 0.47013485, 0.57803468)]

  • categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

The BoundingBoxItem.

Return type

BoundingBoxItem

Raises

TypeError – When categories list has a wrong format.

labels: List[str]
to_yolo(categories: List[str]) List[Tuple[int, float, float, float, float]]

Converts BoundingBoxItem to the YOLO format.

Parameters

categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

The list of YOLO formatted bounding boxes.

Return type

List[Tuple[int, float, float, float, float]]

Raises
  • ValueError – When the categories list is not provided, or when it does not match the labels.

  • TypeError – When categories list has a wrong format.
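
For orientation, the conversion can be sketched without the library. The example below assumes the conventional YOLO layout (class_index, x_center, y_center, width, height) with coordinates normalized to [0, 1] and one tuple per label. corners_to_yolo and yolo_to_corners are hypothetical helpers, not part of ads, and any rounding applied by BoundingBoxItem.to_yolo is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]
YoloBox = Tuple[int, float, float, float, float]


def corners_to_yolo(
    top_left: Point, bottom_right: Point, labels: List[str], categories: List[str]
) -> List[YoloBox]:
    """Hypothetical helper: build one YOLO tuple per label of the box."""
    if not categories:
        raise ValueError("The categories list must be provided.")
    x_center = (top_left[0] + bottom_right[0]) / 2
    y_center = (top_left[1] + bottom_right[1]) / 2
    width = bottom_right[0] - top_left[0]
    height = bottom_right[1] - top_left[1]
    boxes = []
    for label in labels:
        if label not in categories:
            raise ValueError(f"Label {label!r} is not in the categories list.")
        # The class index is the label's position in the categories list.
        boxes.append((categories.index(label), x_center, y_center, width, height))
    return boxes


def yolo_to_corners(box: YoloBox) -> Tuple[Point, Point]:
    """Hypothetical inverse: recover the top-left and bottom-right corners."""
    _, x_center, y_center, width, height = box
    top_left = (x_center - width / 2, y_center - height / 2)
    bottom_right = (x_center + width / 2, y_center + height / 2)
    return top_left, bottom_right


yolo = corners_to_yolo((0.2, 0.2), (0.8, 0.4), ["cat", "dog"], ["cat", "dog", "horse"])
corners = yolo_to_corners(yolo[0])
```

The order of the categories list matters: it fixes the integer class index that a model trained on the YOLO output will see.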

top_left: Tuple[float, float]
top_right: Tuple[float, float]
class ads.data_labeling.boundingbox.BoundingBoxItems(items: ~typing.List[~ads.data_labeling.boundingbox.BoundingBoxItem] = <factory>)

Bases: object

BoundingBoxItems class which consists of a list of BoundingBoxItem.

items

List of BoundingBoxItem.

Type

List[BoundingBoxItem]

Examples

>>> item = BoundingBoxItem(
...     labels = ['cat','dog'],
...     bottom_left=(0.2, 0.4),
...     top_left=(0.2, 0.2),
...     top_right=(0.8, 0.2),
...     bottom_right=(0.8, 0.4))
>>> items = BoundingBoxItems(items = [item])
>>> items.to_yolo(categories = ['cat','dog', 'horse'])
items: List[BoundingBoxItem]
to_yolo(categories: List[str]) List[Tuple[int, float, float, float, float]]

Converts BoundingBoxItems to the YOLO format.

Parameters

categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

The list of YOLO formatted bounding boxes.

Return type

List[Tuple[int, float, float, float, float]]

Raises
  • ValueError – When the categories list is not provided, or when it does not match the labels.

  • TypeError – When categories list has a wrong format.

ads.data_labeling.constants module

class ads.data_labeling.constants.AnnotationType

Bases: object

AnnotationType class which contains all the annotation types that data labeling service supports.

BOUNDING_BOX = 'BOUNDING_BOX'
ENTITY_EXTRACTION = 'ENTITY_EXTRACTION'
MULTI_LABEL = 'MULTI_LABEL'
SINGLE_LABEL = 'SINGLE_LABEL'
class ads.data_labeling.constants.DatasetType

Bases: object

DatasetType class which contains all the dataset types that data labeling service supports.

DOCUMENT = 'DOCUMENT'
IMAGE = 'IMAGE'
TEXT = 'TEXT'
class ads.data_labeling.constants.Formats

Bases: object

Common formats class which contains all the common formats that are supported to convert to.

SPACY = 'spacy'
YOLO = 'yolo'

ads.data_labeling.data_labeling_service module

class ads.data_labeling.data_labeling_service.DataLabeling(compartment_id: Optional[str] = None, dls_cp_client_auth: Optional[dict] = None, dls_dp_client_auth: Optional[dict] = None)

Bases: OCIWorkRequestMixin

Class for the Data Labeling service. Integrates the Data Labeling service APIs.

Examples

>>> import ads
>>> import pandas as pd
>>> from ads.data_labeling.data_labeling_service import DataLabeling
>>> ads.set_auth("api_key")
>>> dls = DataLabeling()
>>> dls.list_dataset()
>>> metadata_path = dls.export(dataset_id="your dataset id",
...     path="oci://<bucket_name>@<namespace>/folder")
>>> df = pd.DataFrame.ads.read_labeled_data(metadata_path)

Initialize a DataLabeling class.

Parameters
  • compartment_id (str, optional) – OCID of data labeling datasets’ compartment

  • dls_cp_client_auth (dict, optional) – Data Labeling control plane client auth. Default is None. The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • dls_dp_client_auth (dict, optional) – Data Labeling data plane client auth. Default is None. The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

Returns

Nothing.

Return type

None

export(dataset_id: str, path: str, include_unlabeled=False) str

Exports the dataset identified by dataset_id, saves the JSONL files (the metadata JSONL file and the records JSONL file) to the Object Storage path provided by the user, and returns the path of the metadata JSONL file.

Parameters
  • dataset_id (str) – The ID of the dataset for which the snapshot will be generated.

  • path (str) – The Object Storage path in which to store the generated snapshot. Example: “oci://<bucket_name>@<namespace>/prefix”

  • include_unlabeled (bool, Optional. Defaults to False.) – Whether to include unlabeled records or not.

Returns

OCI path of the metadata JSONL file.

Return type

str
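
The path argument follows the oci://<bucket_name>@<namespace>/<prefix> pattern. As a small illustration of that layout, the hypothetical helper below (not part of ads) splits such a URI into its parts using only the standard library. The bucket, namespace, and prefix values are made up.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_oci_uri(uri: str) -> Tuple[str, str, str]:
    """Hypothetical helper: return (bucket, namespace, prefix) from an OCI URI."""
    parts = urlsplit(uri)
    if parts.scheme != "oci" or "@" not in parts.netloc:
        raise ValueError(
            f"Expected oci://<bucket_name>@<namespace>/<prefix>, got {uri!r}"
        )
    bucket, namespace = parts.netloc.split("@", 1)
    # Strip the leading slash so the prefix is relative to the bucket root.
    return bucket, namespace, parts.path.lstrip("/")


bucket, namespace, prefix = split_oci_uri("oci://my_bucket@my_namespace/exports/run1")
```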

list_dataset(**kwargs) DataFrame

List all the datasets created from the data labeling service under a given compartment.

Parameters

kwargs (dict, optional) – Additional keyword arguments that are passed to the oci.data_labeling_service.DataLabelingManagementClient.list_datasets method.

Returns

pandas dataframe which contains the dataset information.

Return type

pandas.DataFrame

Raises

Exception – If pagination.list_call_get_all_results() fails

ads.data_labeling.metadata module

class ads.data_labeling.metadata.Metadata(source_path: str = '', records_path: str = '', labels: ~typing.List[str] = <factory>, dataset_name: str = '', compartment_id: str = '', dataset_id: str = '', annotation_type: str = '', dataset_type: str = '')

Bases: DataClassSerializable

The class representing the labeled dataset metadata.

source_path

Contains information on where all the source data (image/text/document) is stored.

Type

str

records_path

Contains information on where the records JSONL file is stored.

Type

str

labels

List of classes/labels for the dataset.

Type

List

dataset_name

Dataset display name on the Data Labeling Service console.

Type

str

compartment_id

Compartment id of the labeled dataset.

Type

str

dataset_id

Dataset id.

Type

str

annotation_type

Type of the labeling/annotation task. Currently supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, BOUNDING_BOX.

Type

str

dataset_type

Type of the dataset. Currently supports TEXT, IMAGE, DOCUMENT.

Type

str

annotation_type: str = ''
compartment_id: str = ''
dataset_id: str = ''
dataset_name: str = ''
dataset_type: str = ''
classmethod from_dls_dataset(dataset: Dataset) Metadata

Constructs a Metadata instance from an OCI DLS dataset.

Parameters

dataset (OCIDLSDataset) – OCIDLSDataset object.

Returns

The ads labeled dataset metadata instance.

Return type

Metadata

labels: List[str]
records_path: str = ''
source_path: str = ''
to_dataframe() DataFrame

Converts the metadata to dataframe format.

Returns

The metadata in Pandas dataframe format.

Return type

pandas.DataFrame

to_dict() Dict

Converts to dictionary representation.

Returns

The metadata in dictionary type.

Return type

Dict
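
To make the shape of the dictionary representation concrete, here is a minimal sketch built on a stand-in dataclass. MetadataSketch copies the field names and defaults from the signature above but is not the ads class, and the assumption that to_dict returns one key per field is mine.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class MetadataSketch:
    """Stand-in with the documented Metadata fields (not the ads class)."""

    source_path: str = ""
    records_path: str = ""
    labels: List[str] = field(default_factory=list)
    dataset_name: str = ""
    compartment_id: str = ""
    dataset_id: str = ""
    annotation_type: str = ""
    dataset_type: str = ""

    def to_dict(self) -> Dict:
        # asdict copies every field, including the labels list.
        return asdict(self)


meta = MetadataSketch(
    dataset_name="test_dataset_name",
    labels=["yes", "no"],
    annotation_type="SINGLE_LABEL",
    dataset_type="TEXT",
)
meta_dict = meta.to_dict()
```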

ads.data_labeling.ner module

class ads.data_labeling.ner.NERItem(label: str = '', offset: int = 0, length: int = 0)

Bases: object

NERItem class which is a representation of a token span.

label

Entity name.

Type

str

offset

The start index of the entity’s token span in the text.

Type

int

length

Length of the token span.

Type

int

classmethod from_spacy(token) NERItem
label: str = ''
length: int = 0
offset: int = 0
to_spacy() tuple

Converts one NERItem to the spacy format.

Returns

NERItem in the spacy format

Return type

Tuple
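
spaCy training data conventionally describes an entity as (start_char, end_char, label). Assuming to_spacy follows that convention (the documentation above does not spell out the tuple layout), the conversion from offset and length can be sketched as follows. NERSpan is a stand-in dataclass, not the ads class.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NERSpan:
    """Stand-in for a token span described by label, offset and length."""

    label: str = ""
    offset: int = 0
    length: int = 0

    def to_spacy(self) -> Tuple[int, int, str]:
        # Assumed layout: (start index, exclusive end index, entity label).
        return (self.offset, self.offset + self.length, self.label)

    @classmethod
    def from_spacy(cls, token: Tuple[int, int, str]) -> "NERSpan":
        start, end, label = token
        return cls(label=label, offset=start, length=end - start)


text = "Oracle is based in Austin"
span = NERSpan(label="ORG", offset=0, length=6)
spacy_tuple = span.to_spacy()
```

Slicing the text with the first two tuple elements, text[start:end], recovers the labeled span.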

class ads.data_labeling.ner.NERItems(items: ~typing.List[~ads.data_labeling.ner.NERItem] = <factory>)

Bases: object

NERItems class consists of a list of NERItem.

items

List of NERItem.

Type

List[NERItem]

items: List[NERItem]
to_spacy() List[tuple]

Converts NERItems to the spacy format.

Returns

List of NERItems in the Spacy format.

Return type

List[tuple]

exception ads.data_labeling.ner.WrongEntityFormatLabelIsEmpty

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLabelNotString

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLengthIsNegative

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLengthNotInteger

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatOffsetIsNegative

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatOffsetNotInteger

Bases: ValueError
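
All six exceptions derive from ValueError, so a caller can handle any of them with a single except clause. The sketch below illustrates that with local stand-in classes. The checks and their order are assumptions, since the documentation above does not state when each exception is raised.

```python
class LabelIsEmptySketch(ValueError):
    """Local stand-in for an exception that subclasses ValueError."""


class OffsetIsNegativeSketch(ValueError):
    """Local stand-in for an exception that subclasses ValueError."""


class LengthIsNegativeSketch(ValueError):
    """Local stand-in for an exception that subclasses ValueError."""


def validate_entity(label: str, offset: int, length: int) -> None:
    """Hypothetical checks; the library's own validation may differ."""
    if not label:
        raise LabelIsEmptySketch("The label must not be empty.")
    if offset < 0:
        raise OffsetIsNegativeSketch("The offset must not be negative.")
    if length < 0:
        raise LengthIsNegativeSketch("The length must not be negative.")


errors = []
for candidate in [("", 0, 3), ("ORG", -1, 3), ("ORG", 0, -3), ("ORG", 0, 3)]:
    try:
        validate_entity(*candidate)
    except ValueError as exc:  # one clause catches every subclass
        errors.append(type(exc).__name__)
```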

ads.data_labeling.record module

class ads.data_labeling.record.Record(path: str = '', content: Optional[Any] = None, annotation: Optional[Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]] = None)

Bases: object

Class representing Record.

path

File path.

Type

str

content

Content of the record.

Type

Any

annotation

Annotation/label of the record.

Type

Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]

annotation: Union[Tuple, str, List[BoundingBoxItem], List[NERItem]] = None
content: Any = None
path: str = ''
to_dict() Dict

Convert the Record instance to a dictionary.

Returns

Dictionary representation of the Record instance.

Return type

Dict

to_tuple() Tuple[str, Any, Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]]

Convert the Record instance to a tuple.

Returns

Tuple representation of the Record instance.

Return type

Tuple
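
The two conversions differ only in their container. A minimal sketch, using a stand-in dataclass rather than the ads Record class and assuming the tuple keeps the field order of the constructor (path, content, annotation):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class RecordSketch:
    """Stand-in with the documented Record fields (not the ads class)."""

    path: str = ""
    content: Optional[Any] = None
    annotation: Optional[Any] = None

    def to_dict(self) -> Dict[str, Any]:
        return {
            "path": self.path,
            "content": self.content,
            "annotation": self.annotation,
        }

    def to_tuple(self) -> Tuple[str, Any, Any]:
        # Shallow conversion: field values are passed through unchanged.
        return (self.path, self.content, self.annotation)


record = RecordSketch(path="path/to/the/content/file1", content="file content", annotation="yes")
as_dict = record.to_dict()
as_tuple = record.to_tuple()
```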

ads.data_labeling.mixin.data_labeling module

class ads.data_labeling.mixin.data_labeling.DataLabelingAccessMixin

Bases: object

Mixin class for labeled text data.

static read_labeled_data(path: Optional[str] = None, dataset_id: Optional[str] = None, compartment_id: Optional[str] = None, auth: Optional[Dict] = None, materialize: bool = False, encoding: str = 'utf-8', include_unlabeled: bool = False, format: Optional[str] = None, chunksize: Optional[int] = None)

Loads the dataset generated by data labeling service from either the export file or the Data Labeling Service.

Parameters
  • path ((str, optional). Defaults to None) – The export file path, can be either local or object storage path.

  • dataset_id ((str, optional). Defaults to None) – The dataset OCID.

  • compartment_id (str. Defaults to the compartment_id from the env variable.) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None) – The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • materialize ((bool, optional). Defaults to False) – Whether the content of the dataset file should be loaded or it should return the file path to the content. By default the content will not be loaded.

  • encoding ((str, optional). Defaults to 'utf-8') – Encoding of files. Only used for “TEXT” dataset.

  • include_unlabeled ((bool, optional). Defaults to False) – Whether to load the unlabeled records or not.

  • format ((str, optional). Defaults to None) –

    Output format of annotations. Can be None, “spacy” for the Entity Extraction dataset type, or “yolo” for the Object Detection type.

    • When None, it outputs List[NERItem] or List[BoundingBoxItem],

    • When “spacy”, it outputs List[Tuple],

    • When “yolo”, it outputs List[List[Tuple]].

  • chunksize ((int, optional). Defaults to None) – The number of records to read in one iteration. When set, the result is returned as a generator.

Returns

pd.DataFrame if chunksize is not specified. Generator[pd.DataFrame] if chunksize is specified.

Return type

Union[Generator[pd.DataFrame, Any, Any], pd.DataFrame]

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=False)
                            Path       Content               Annotations
    --------------------------------------------------------------------
    0   path/to/the/content/file                                     yes
    1   path/to/the/content/file                                      no
>>> df = pd.DataFrame.ads.read_labeled_data(dataset_id="your_dataset_ocid",
...                                         compartment_id="your_compartment_id",
...                                         auth=authutil.api_keys(),
...                                         materialize=False)
                            Path       Content               Annotations
    --------------------------------------------------------------------
    0   path/to/the/content/file                                     yes
    1   path/to/the/content/file                                      no
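
When chunksize is set, the result is consumed lazily, one chunk per iteration. The generic pattern can be sketched with the standard library alone. iter_chunks is a hypothetical helper, and plain tuples stand in for the DataFrame chunks that read_labeled_data returns.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str, str]


def iter_chunks(rows: Iterable[Row], chunksize: int) -> Iterator[List[Row]]:
    """Yield lists of at most `chunksize` rows until the input is exhausted."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer.")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


records = [(f"file{i}", "file content", "yes" if i % 2 else "no") for i in range(1, 6)]
chunks = list(iter_chunks(records, chunksize=2))
```

Only one chunk is held in memory at a time while iterating, which is the usual reason to prefer this mode for large datasets.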
render_bounding_box(options: Optional[Dict] = None, content_column: str = 'Content', annotations_column: str = 'Annotations', categories: Optional[List[str]] = None, limit: int = 50, path: Optional[str] = None) None

Renders a bounding box dataset. Displays only the first 50 rows by default; use limit to change this.

Parameters
  • options (dict) – The color options specified for rendering.

  • content_column (Optional[str]) – The column name with the content data.

  • annotations_column (Optional[str]) – The column name for the annotations list.

  • categories (Optional[List[str]]) – The list of object categories in proper order for model training. Only used when bounding box annotations are in YOLO format. Example: ['cat', 'dog', 'horse']

  • limit (Optional[int]. Defaults to 50) – The maximum number of records to display.

  • path (Optional[str]) – Path to save the image with annotations to local directory.

Returns

Nothing

Return type

None

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=True)
>>> df.ads.render_bounding_box(content_column="Content", annotations_column="Annotations")
render_ner(options: Dict = None, content_column: str = 'Content', annotations_column: str = 'Annotations', limit: int = 50) None

Renders an NER dataset. Displays only the first 50 rows by default; use limit to change this.

Parameters
  • options (dict) – The color options specified for rendering.

  • content_column (Optional[str]) – The column name with the content data.

  • annotations_column (Optional[str]) – The column name for the annotations list.

  • limit (Optional[int]. Defaults to 50) – The maximum number of records to display.

Returns

Nothing

Return type

None

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=True)
>>> df.ads.render_ner(content_column="Content", annotations_column="Annotations")

ads.data_labeling.parser.export_metadata_parser module

class ads.data_labeling.parser.export_metadata_parser.MetadataParser

Bases: Parser

MetadataParser class which parses the metadata from the record.

EXPECTED_KEYS = ['id', 'compartmentId', 'displayName', 'labelsSet', 'annotationFormat', 'datasetSourceDetails', 'datasetFormatDetails']
static parse(json_data: Dict[Any, Any]) Metadata

Parses the metadata jsonl file.

Parameters

json_data (dict) – dictionary format of the metadata jsonl file content.

Returns

Metadata object which contains the useful fields from the metadata jsonl file

Return type

Metadata
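
EXPECTED_KEYS lists the top-level fields the parser looks for. How the parser reacts to a missing key is not documented here, so the sketch below only illustrates one plausible pre-check. find_missing_keys is a hypothetical helper and the sample dictionary is made-up data.

```python
from typing import Any, Dict, List

# Copied from the EXPECTED_KEYS attribute documented above.
EXPECTED_KEYS = [
    "id",
    "compartmentId",
    "displayName",
    "labelsSet",
    "annotationFormat",
    "datasetSourceDetails",
    "datasetFormatDetails",
]


def find_missing_keys(json_data: Dict[str, Any]) -> List[str]:
    """Return the expected keys that are absent, in declaration order."""
    return [key for key in EXPECTED_KEYS if key not in json_data]


sample = {
    "id": "TEST_DATASET",
    "compartmentId": "TEST_COMPARTMENT",
    "displayName": "test_dataset_name",
}
missing = find_missing_keys(sample)
```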

ads.data_labeling.parser.export_record_parser module

class ads.data_labeling.parser.export_record_parser.BoundingBoxRecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: RecordParser

BoundingBoxRecordParser class which parses the label of BoundingBox label data.

Initiates a RecordParser instance.

Parameters
  • dataset_source_path (str) – Dataset source path.

  • format ((str, optional). Defaults to None.) – Output format of annotations.

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

class ads.data_labeling.parser.export_record_parser.EntityType

Bases: object

Entity type class for supporting multiple types of entities.

GENERIC = 'GENERIC'
IMAGEOBJECTSELECTION = 'IMAGEOBJECTSELECTION'
TEXTSELECTION = 'TEXTSELECTION'
class ads.data_labeling.parser.export_record_parser.MultiLabelRecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: RecordParser

MultiLabelRecordParser class which parses the label of Multiple label data.

Initiates a RecordParser instance.

Parameters
  • dataset_source_path (str) – Dataset source path.

  • format ((str, optional). Defaults to None.) – Output format of annotations.

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

class ads.data_labeling.parser.export_record_parser.NERRecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: RecordParser

NERRecordParser class which parses the label of NER label data.

Initiates a RecordParser instance.

Parameters
  • dataset_source_path (str) – Dataset source path.

  • format ((str, optional). Defaults to None.) – Output format of annotations.

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

class ads.data_labeling.parser.export_record_parser.RecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: Parser

RecordParser class which parses the labels from the record.

Examples

>>> from ads.data_labeling.parser.export_record_parser import SingleLabelRecordParser
>>> from ads.data_labeling.parser.export_record_parser import MultiLabelRecordParser
>>> from ads.data_labeling.parser.export_record_parser import NERRecordParser
>>> from ads.data_labeling.parser.export_record_parser import BoundingBoxRecordParser
>>> import fsspec
>>> import json
>>> from ads.common import auth as authutil
>>> labels = []
>>> with fsspec.open("/path/to/records_file.jsonl", **authutil.api_keys()) as f:
...     for line in f:
...         bounding_box_labels = BoundingBoxRecordParser("source_data_path").parse(json.loads(line))
...         labels.append(bounding_box_labels)

Initiates a RecordParser instance.

Parameters
  • dataset_source_path (str) – Dataset source path.

  • format ((str, optional). Defaults to None.) – Output format of annotations.

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

parse(record: Dict) Record

Extracts the annotations from the record content. Constructs and returns a Record instance containing the file path and the labels.

Parameters

record (Dict) – Content of the record from the record file.

Returns

Record instance which contains the file path as well as the annotations.

Return type

Record

class ads.data_labeling.parser.export_record_parser.RecordParserFactory

Bases: object

RecordParserFactory class which contains a list of registered parsers and allows registering new RecordParsers.

Current parsers include:
  • SingleLabelRecordParser

  • MultiLabelRecordParser

  • NERRecordParser

  • BoundingBoxRecordParser

static parser(annotation_type: str, dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None) RecordParser

Gets the parser based on the annotation_type.

Parameters
  • annotation_type (str) – Annotation type which can be SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION and BOUNDING_BOX.

  • dataset_source_path (str) – Dataset source path.

  • format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser corresponding to the annotation type.

Return type

RecordParser

Raises

ValueError – If annotation_type is not supported.

classmethod register(annotation_type: str, parser) None

Registers a new parser.

Parameters
  • annotation_type (str) – Annotation type which can be SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION and BOUNDING_BOX.

  • parser (RecordParser) – A new Parser class to be registered.

Returns

Nothing.

Return type

None
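
The register/parser pair is a registry-style factory: parser classes are stored under an annotation type and instantiated on lookup. A minimal sketch of that pattern follows. ParserRegistry and SingleLabelParserSketch are hypothetical stand-ins, and the internals of RecordParserFactory may differ.

```python
from typing import Callable, Dict


class ParserRegistry:
    """Hypothetical stand-in illustrating a register/lookup factory."""

    _parsers: Dict[str, Callable[..., object]] = {}

    @classmethod
    def register(cls, annotation_type: str, parser: Callable[..., object]) -> None:
        # Later registrations under the same key replace earlier ones.
        cls._parsers[annotation_type] = parser

    @staticmethod
    def parser(annotation_type: str, dataset_source_path: str) -> object:
        if annotation_type not in ParserRegistry._parsers:
            raise ValueError(f"Annotation type {annotation_type!r} is not supported.")
        return ParserRegistry._parsers[annotation_type](dataset_source_path)


class SingleLabelParserSketch:
    """Made-up parser class used only to exercise the registry."""

    def __init__(self, dataset_source_path: str) -> None:
        self.dataset_source_path = dataset_source_path


ParserRegistry.register("SINGLE_LABEL", SingleLabelParserSketch)
single = ParserRegistry.parser("SINGLE_LABEL", "path/to/dataset")
```

Keeping the mapping in one place means that supporting a new annotation type only requires one register call, with no change to the lookup code.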

class ads.data_labeling.parser.export_record_parser.SingleLabelRecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: RecordParser

SingleLabelRecordParser class which parses the label of Single label data.

Initiates a RecordParser instance.

Parameters
  • dataset_source_path (str) – Dataset source path.

  • format ((str, optional). Defaults to None.) – Output format of annotations.

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

ads.data_labeling.reader.dataset_reader module

The module containing classes to read labeled datasets. Supports reading labeled datasets from exports or from the cloud.

Classes

LabeledDatasetReader

The LabeledDatasetReader class to read labeled dataset.

ExportReader

The ExportReader class to read labeled dataset from the export.

DLSDatasetReader

The DLSDatasetReader class to read labeled dataset from the cloud.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling import LabeledDatasetReader
>>> ds_reader = LabeledDatasetReader.from_export(
...    path="oci://bucket_name@namespace/dataset_metadata.jsonl",
...    auth=authutil.api_keys(),
...    materialize=True
... )
>>> ds_reader.info()
    ------------------------------------------------------------------------
    annotation_type                                             SINGLE_LABEL
    compartment_id                                          TEST_COMPARTMENT
    dataset_id                                                  TEST_DATASET
    dataset_name                                           test_dataset_name
    dataset_type                                                        TEXT
    labels                                                     ['yes', 'no']
    records_path                                             path/to/records
    source_path                                              path/to/dataset
>>> ds_reader.read()
                             Path            Content            Annotations
    -----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no
    2   path/to/the/content/file3       file content                     no
>>> next(ds_reader.read(iterator=True))
    ("path/to/the/content/file1", "file content", "yes")
>>> next(ds_reader.read(iterator=True, chunksize=2))
    [("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no")]
>>> next(ds_reader.read(chunksize=2))
                            Path            Content            Annotations
    ----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no
>>> ds_reader = LabeledDatasetReader.from_DLS(
...    dataset_id="dataset_OCID",
...    compartment_id="compartment_OCID",
...    auth=authutil.api_keys(),
...    materialize=True
... )
class ads.data_labeling.reader.dataset_reader.DLSDatasetReader(dataset_id: str, compartment_id: str, auth: Dict, encoding='utf-8', materialize: bool = False, include_unlabeled: bool = False)

Bases: Reader

The DLSDatasetReader class to read labeled dataset from the cloud.

info(self) Metadata

Gets the labeled dataset metadata.

read(self) Generator[Tuple, Any, Any]

Reads the labeled dataset.

Initializes the DLS dataset reader instance.

Parameters
  • dataset_id (str) – The dataset OCID.

  • compartment_id (str) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files. The encoding is used to extract the metadata information of the labeled dataset and also to extract the content of the text dataset records.

  • materialize ((bool, optional). Defaults to False.) – Whether the content of dataset files should be loaded/materialized or not. By default the content will not be materialized.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Raises
  • ValueError – When dataset_id is empty or not a string.

  • TypeError – When dataset_id is not a string.

info() Metadata

Gets the labeled dataset metadata.

Returns

The labeled dataset metadata.

Return type

Metadata

read(format: Optional[str] = None) Generator[Tuple, Any, Any]

Reads the labeled dataset records.

Parameters

format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

Returns

The labeled dataset records.

Return type

Generator[Tuple, Any, Any]

class ads.data_labeling.reader.dataset_reader.ExportReader(path: str, auth: Optional[Dict] = None, encoding='utf-8', materialize: bool = False, include_unlabeled: bool = False)

Bases: Reader

The ExportReader class to read labeled dataset from the export.

info(self) Metadata

Gets the labeled dataset metadata.

read(self) Generator[Tuple, Any, Any]

Reads the labeled dataset.

Initializes the labeled dataset export reader instance.

Parameters
  • path (str) – The metadata file path, can be either local or object storage path.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files. The encoding is used to extract the metadata information of the labeled dataset and also to extract the content of the text dataset records.

  • materialize ((bool, optional). Defaults to False.) – Whether the content of dataset files should be loaded/materialized or not. By default the content will not be materialized.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Raises
  • ValueError – When path is empty or not a string.

  • TypeError – When path is not a string.

info() Metadata

Gets the labeled dataset metadata.

Returns

The labeled dataset metadata.

Return type

Metadata

read(format: Optional[str] = None) Generator[Tuple, Any, Any]

Reads the labeled dataset records.

Parameters

format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

Returns

The labeled dataset records.

Return type

Generator[Tuple, Any, Any]

class ads.data_labeling.reader.dataset_reader.LabeledDatasetReader(reader: Reader)

Bases: object

The labeled dataset reader class.

info(self) Metadata

Gets labeled dataset metadata.

read(self, iterator: bool = False) Union[Generator[Any, Any, Any], pd.DataFrame]

Reads labeled dataset.

from_export(cls, path: str, auth: Dict = None, encoding='utf-8', materialize: bool = False) 'LabeledDatasetReader'

Constructs a Labeled Dataset Reader instance.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling import LabeledDatasetReader
>>> ds_reader = LabeledDatasetReader.from_export(
...    path="oci://bucket_name@namespace/dataset_metadata.jsonl",
...    auth=authutil.api_keys(),
...    materialize=True
... )
>>> ds_reader = LabeledDatasetReader.from_DLS(
...    dataset_id="dataset_OCID",
...    compartment_id="compartment_OCID",
...    auth=authutil.api_keys(),
...    materialize=True
... )
>>> ds_reader.info()
    ------------------------------------------------------------------------
    annotation_type                                             SINGLE_LABEL
    compartment_id                                          TEST_COMPARTMENT
    dataset_id                                                  TEST_DATASET
    dataset_name                                           test_dataset_name
    dataset_type                                                        TEXT
    labels                                                     ['yes', 'no']
    records_path                                             path/to/records
    source_path                                              path/to/dataset
>>> ds_reader.read()
                             Path            Content            Annotations
    -----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no
    2   path/to/the/content/file3       file content                     no
>>> next(ds_reader.read(iterator=True))
    ("path/to/the/content/file1", "file content", "yes")
>>> next(ds_reader.read(iterator=True, chunksize=2))
    [("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no")]
>>> next(ds_reader.read(chunksize=2))
                            Path            Content            Annotations
    ----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no

Initializes the labeled dataset reader instance.

Parameters

reader (Reader) – The Reader instance which reads and extracts the labeled dataset.

classmethod from_DLS(dataset_id: str, compartment_id: Optional[str] = None, auth: Optional[dict] = None, encoding: str = 'utf-8', materialize: bool = False, include_unlabeled: bool = False) LabeledDatasetReader

Constructs Labeled Dataset Reader instance.

Parameters
  • dataset_id (str) – The dataset OCID.

  • compartment_id (str. Defaults to the compartment_id from the env variable.) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.

  • materialize ((bool, optional). Defaults to False.) – Whether the content of the dataset file should be loaded or it should return the file path to the content. By default the content will not be loaded.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Returns

The LabeledDatasetReader instance.

Return type

LabeledDatasetReader

classmethod from_export(path: str, auth: Optional[dict] = None, encoding: str = 'utf-8', materialize: bool = False, include_unlabeled: bool = False) LabeledDatasetReader

Constructs Labeled Dataset Reader instance.

Parameters
  • path (str) – The metadata file path, can be either local or object storage path.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.

  • materialize ((bool, optional). Defaults to False.) – Whether the content of the dataset file should be loaded or it should return the file path to the content. By default the content will not be loaded.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Returns

The LabeledDatasetReader instance.

Return type

LabeledDatasetReader

info() Serializable

Gets the labeled dataset metadata.

Returns

The labeled dataset metadata.

Return type

Metadata

read(iterator: bool = False, format: Optional[str] = None, chunksize: Optional[int] = None) Union[Generator[Any, Any, Any], DataFrame]

Reads the labeled dataset records.

Parameters
  • iterator ((bool, optional). Defaults to False.) – True if the result should be returned as a Generator, False if it should be returned as a pandas DataFrame.

  • format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” or “yolo”.

  • chunksize ((int, optional). Defaults to None.) – The number of records that should be read in one iteration. The result will be returned in a generator format.

Returns

  • pd.DataFrame if iterator and chunksize are not specified.

  • Generator[pd.DataFrame] if iterator is False and chunksize is specified.

  • Generator[List[Tuple[str, str, Any]]] if iterator is True and chunksize is specified.

  • Generator[Tuple[str, str, Any]] if iterator is True and chunksize is not specified.

Return type

Union[Generator[Tuple[str, str, Any], Any, Any], Generator[List[Tuple[str, str, Any]], Any, Any], Generator[pd.DataFrame, Any, Any], pd.DataFrame]
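
The way iterator and chunksize combine can be sketched with the standard library alone. The read_sketch function below is a hypothetical stand-in, not the ads implementation: a plain list takes the place of the pandas DataFrame, and chunks are yielded as lists regardless of the iterator flag.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str, Any]  # (path, content, annotation)


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def read_sketch(records: List[Record], iterator: bool = False,
                chunksize: Optional[int] = None):
    """Hypothetical stand-in that mirrors the documented return shapes."""
    if chunksize is not None:
        # A generator of chunks. The real reader is documented to yield a
        # DataFrame per chunk when iterator=False; this sketch yields lists.
        return chunked(records, chunksize)
    if iterator:
        # A generator of single records.
        return iter(records)
    # Everything at once. A list stands in for the pandas DataFrame.
    return list(records)


records = [
    ("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no"),
    ("path/to/the/content/file3", "file content", "no"),
]
first = next(read_sketch(records, iterator=True))
first_chunk = next(read_sketch(records, iterator=True, chunksize=2))
everything = read_sketch(records)
```

Returning a generator whenever chunksize is set keeps memory use bounded by the chunk size, which is the usual reason to read a large dataset in chunks.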

ads.data_labeling.reader.jsonl_reader module

class ads.data_labeling.reader.jsonl_reader.JsonlReader(path: str, auth: Optional[Dict] = None, encoding='utf-8')

Bases: Reader

JsonlReader class which reads the file.

Initiates a JsonlReader object.

Parameters
  • path (str) – object storage path or local path for a file.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding of files. Only used for “TEXT” dataset.

Examples

>>> from ads.data_labeling.reader.jsonl_reader import JsonlReader
>>> path = "your/path/to/jsonl/file.jsonl"
>>> from ads.common import auth as authutil
>>> reader = JsonlReader(path=path, auth=authutil.api_keys(), encoding="utf-8")
>>> next(reader.read())
read(skip: Optional[int] = None) Generator[Dict, Any, Any]

Reads and yields the content of the file.

Parameters

skip ((int, optional). Defaults to None.) – The number of records that should be skipped.

Returns

The content of the file.

Return type

Generator[Dict, Any, Any]

Raises
  • ValueError – If skip is not None and is not a positive integer.

  • FileNotFoundError – When file not found.
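
A minimal sketch of reading a JSON Lines file with a skip parameter, using only the standard library. The read_jsonl function is a hypothetical illustration of the behavior described above, not the JsonlReader implementation, and it handles local paths only (object storage paths are out of scope here).

```python
import json
import os
import tempfile
from typing import Any, Dict, Generator, Optional


def read_jsonl(path: str, skip: Optional[int] = None,
               encoding: str = "utf-8") -> Generator[Dict, Any, Any]:
    """Yield one dict per line, optionally skipping the first `skip` lines."""
    if skip is not None and (not isinstance(skip, int) or skip <= 0):
        raise ValueError("`skip` must be a positive integer.")
    # open() raises FileNotFoundError when the file does not exist.
    with open(path, encoding=encoding) as f:
        for line_number, line in enumerate(f):
            if skip and line_number < skip:
                continue
            yield json.loads(line)


# Write a small JSON Lines file to a temporary directory for the demonstration.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "records.jsonl")
with open(demo_path, "w", encoding="utf-8") as f:
    for i in range(3):
        f.write(json.dumps({"id": i}) + "\n")

all_records = list(read_jsonl(demo_path))
after_first = list(read_jsonl(demo_path, skip=1))
```

Because read_jsonl is a generator, the validation and the file open run when the first record is requested, not when the function is called.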

ads.data_labeling.reader.metadata_reader module

class ads.data_labeling.reader.metadata_reader.DLSMetadataReader(dataset_id: str, compartment_id: str, auth: dict)

Bases: Reader

DLSMetadataReader class which reads the metadata jsonl file from the cloud.

Initializes the DLS metadata reader instance.

Parameters
  • dataset_id (str) – The dataset OCID.

  • compartment_id (str) – The compartment OCID of the dataset.

  • auth (dict) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

Raises
  • ValueError – When dataset_id is empty or not a string.

  • TypeError – When dataset_id is not a string.

read() Metadata

Reads the content from the metadata file.

Returns

The metadata of the labeled dataset.

Return type

Metadata

Raises
exception ads.data_labeling.reader.metadata_reader.DatasetNotFoundError(id: str)

Bases: Exception

exception ads.data_labeling.reader.metadata_reader.EmptyMetadata

Bases: Exception

Empty Metadata.

class ads.data_labeling.reader.metadata_reader.ExportMetadataReader(path: str, auth: Optional[Dict] = None, encoding='utf-8')

Bases: JsonlReader

ExportMetadataReader class which reads the metadata jsonl file from local/object storage path.

Initiates a JsonlReader object.

Parameters
  • path (str) – object storage path or local path for a file.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding of files. Only used for “TEXT” dataset.

Examples

>>> from ads.data_labeling.reader.jsonl_reader import JsonlReader
>>> path = "your/path/to/jsonl/file.jsonl"
>>> from ads.common import auth as authutil
>>> reader = JsonlReader(path=path, auth=authutil.api_keys(), encoding="utf-8")
>>> next(reader.read())
read() Metadata

Reads the content from the metadata file.

Returns

The metadata of the labeled dataset.

Return type

Metadata

class ads.data_labeling.reader.metadata_reader.MetadataReader(reader: Reader)

Bases: object

MetadataReader class which reads and extracts the labeled dataset metadata.

Examples

>>> from ads.data_labeling import MetadataReader
>>> from ads.common import auth as authutil
>>> reader = MetadataReader.from_export_file("metadata_export_file_path",
...                                 auth=authutil.api_keys())
>>> reader.read()

Initiate a MetadataReader instance.

Parameters

reader (Reader) – Reader instance which reads and extracts the labeled dataset metadata.

classmethod from_DLS(dataset_id: str, compartment_id: Optional[str] = None, auth: Optional[dict] = None) MetadataReader

Constructs a MetadataReader instance.

Parameters
  • dataset_id (str) – The dataset OCID.

  • compartment_id ((str, optional). Defaults to None.) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

Returns

The MetadataReader instance whose reader is a DLSMetadataReader instance.

Return type

MetadataReader

classmethod from_export_file(path: str, auth: Optional[Dict] = None) MetadataReader

Constructs a MetadataReader instance.

Parameters
  • path (str) – metadata file path, can be either local or object storage path.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

Returns

The MetadataReader instance whose reader is an ExportMetadataReader instance.

Return type

MetadataReader

read() Metadata

Reads the content from the metadata file.

Returns

The metadata of the labeled dataset.

Return type

Metadata

exception ads.data_labeling.reader.metadata_reader.ReadDatasetError(id: str)

Bases: Exception

ads.data_labeling.reader.record_reader module

class ads.data_labeling.reader.record_reader.RecordReader(reader: Reader, parser: Parser, loader: Optional[Loader] = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False)

Bases: object

The RecordReader class consists of a reader, a parser and a loader. The reader reads the content from the record file. The parser parses the label for each record. The loader loads the content from the file path in that record.

Examples

>>> from ads.data_labeling import RecordReader
>>> from ads.common import auth as authutil
>>> file_path = "/path/to/your_record.jsonl"
>>> dataset_type = "IMAGE"
>>> annotation_type = "BOUNDING_BOX"
>>> record_reader = RecordReader.from_export_file(
...     file_path,
...     dataset_type,
...     annotation_type,
...     "image_file_path",
...     authutil.api_keys(),
... )
>>> next(record_reader.read())

Initiates a RecordReader instance.

Parameters
  • reader (Reader) – Reader instance to read content from the record file.

  • parser (Parser) – Parser instance to parse the labels from record file.

  • loader (Loader. Defaults to None.) – Loader instance to load the content from the file path in the record.

  • materialize (bool, optional. Defaults to False.) – Whether to materialize the content using loader.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

  • encoding (str, optional) – Encoding for text files. Used only to extract the content of the text dataset contents.

Raises

ValueError – If the record reader or the record parser is not specified. If the loader is not specified when materialize is True.
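
The composition of reader, parser and loader can be sketched as follows. Every class here is a hypothetical stand-in built on the three interfaces (Reader, Parser, Loader). The record layout, a dict with "path" and "label" keys, and the parse(record) signature are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generator, List, Optional, Tuple


class Reader(ABC):
    @abstractmethod
    def read(self) -> Any: ...


class Parser(ABC):
    @abstractmethod
    def parse(self, record: Dict) -> Any: ...


class Loader(ABC):
    @abstractmethod
    def load(self, **kwargs) -> Any: ...


class ListReader(Reader):
    """Reads records from an in-memory list instead of a record file."""
    def __init__(self, records: List[Dict]):
        self.records = records

    def read(self):
        yield from self.records


class LabelParser(Parser):
    """Extracts the label from a record; None means unlabeled."""
    def parse(self, record: Dict) -> Optional[str]:
        return record.get("label")


class DictLoader(Loader):
    """Loads file content from a dict that stands in for a file system."""
    def __init__(self, files: Dict[str, str]):
        self.files = files

    def load(self, **kwargs) -> str:
        return self.files[kwargs["path"]]


class RecordReaderSketch:
    def __init__(self, reader: Reader, parser: Parser,
                 loader: Optional[Loader] = None,
                 include_unlabeled: bool = False, materialize: bool = False):
        if reader is None or parser is None:
            raise ValueError("The record reader and record parser must be specified.")
        if materialize and loader is None:
            raise ValueError("The loader must be specified when materialize is True.")
        self.reader, self.parser, self.loader = reader, parser, loader
        self.include_unlabeled = include_unlabeled
        self.materialize = materialize

    def read(self) -> Generator[Tuple[str, Any, Any], Any, Any]:
        for record in self.reader.read():
            label = self.parser.parse(record)
            if label is None and not self.include_unlabeled:
                continue  # drop unlabeled records unless explicitly requested
            path = record["path"]
            # Without materialization the path is returned in place of the content.
            content = self.loader.load(path=path) if self.materialize else path
            yield path, content, label


records = [{"path": "a.txt", "label": "yes"}, {"path": "b.txt", "label": None}]
files = {"a.txt": "hello", "b.txt": "world"}
labeled_rows = list(RecordReaderSketch(
    ListReader(records), LabelParser(), DictLoader(files), materialize=True).read())
all_rows = list(RecordReaderSketch(
    ListReader(records), LabelParser(), DictLoader(files),
    include_unlabeled=True, materialize=True).read())
```

Keeping the three roles behind separate interfaces lets the same record pipeline serve local files, object storage and different annotation types by swapping one component at a time.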

classmethod from_DLS(dataset_id: str, dataset_type: str, annotation_type: str, dataset_source_path: str, compartment_id: Optional[str] = None, auth: Optional[Dict] = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False, format: Optional[str] = None, categories: Optional[List[str]] = None) RecordReader

Constructs Record Reader instance.

Parameters
  • dataset_id (str) – The dataset OCID.

  • dataset_type (str) – Dataset type. Currently supports TEXT, IMAGE and DOCUMENT.

  • annotation_type (str) – Annotation Type. Currently TEXT supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION. IMAGE supports SINGLE_LABEL, MULTI_LABEL and BOUNDING_BOX. DOCUMENT supports SINGLE_LABEL and MULTI_LABEL.

  • dataset_source_path (str) – Dataset source path.

  • compartment_id ((str, optional). Defaults to None.) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.

  • materialize ((bool, optional). Defaults to False.) – Whether the content of the dataset file should be loaded or it should return the file path to the content. By default the content will not be loaded.

  • format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: [‘cat’,’dog’,’horse’]

Returns

The RecordReader instance.

Return type

RecordReader

classmethod from_export_file(path: str, dataset_type: str, annotation_type: str, dataset_source_path: str, auth: Optional[Dict] = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False, format: Optional[str] = None, categories: Optional[List[str]] = None, includes_metadata=False) RecordReader

Initiates a RecordReader instance.

Parameters
  • path (str) – Record file path.

  • dataset_type (str) – Dataset type. Currently supports TEXT, IMAGE and DOCUMENT.

  • annotation_type (str) – Annotation Type. Currently TEXT supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION. IMAGE supports SINGLE_LABEL, MULTI_LABEL and BOUNDING_BOX. DOCUMENT supports SINGLE_LABEL and MULTI_LABEL.

  • dataset_source_path (str) – Dataset source path.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

  • encoding ((str, optional). Defaults to "utf-8".) – Encoding for text files. Used only to extract the content of the text dataset contents.

  • materialize ((bool, optional). Defaults to False.) – Whether to materialize the content by loader.

  • format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: [‘cat’,’dog’,’horse’]

  • includes_metadata ((bool, optional). Defaults to False.) – Determines whether the export file includes metadata or not.

Returns

A RecordReader instance.

Return type

RecordReader
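
As a rough illustration of the "yolo" format and the role of categories, the sketch below turns a bounding box given by its corners into YOLO-style tuples of (class index, x center, y center, width, height), one tuple per label. The to_yolo function is a hypothetical helper, not the ads implementation. It assumes axis-aligned boxes with normalized coordinates and takes the class index from the position of the label in categories.

```python
from typing import List, Tuple

Point = Tuple[float, float]
YoloTuple = Tuple[int, float, float, float, float]


def to_yolo(top_left: Point, bottom_right: Point, labels: List[str],
            categories: List[str]) -> List[YoloTuple]:
    """Return one (class_index, x_center, y_center, width, height) per label."""
    x_min, y_min = top_left
    x_max, y_max = bottom_right
    x_center = (x_min + x_max) / 2
    y_center = (y_min + y_max) / 2
    width = x_max - x_min
    height = y_max - y_min
    # list.index raises ValueError for a label that is missing from categories.
    return [(categories.index(label), x_center, y_center, width, height)
            for label in labels]


yolo = to_yolo(top_left=(0.2, 0.2), bottom_right=(0.8, 0.4),
               labels=["cat", "dog"], categories=["cat", "dog", "horse"])
class_indices = [item[0] for item in yolo]
```

The order of categories matters: reordering the list changes every class index, so the same list should be used for reading the dataset and for training the model.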

read() Generator[Tuple[str, Union[List, str]], Any, Any]

Reads the record.

Yields

Generator[Tuple[str, Union[List, str]], Any, Any] – File path, content and labels in a tuple.

ads.data_labeling.visualizer.image_visualizer module

The module that helps to visualize Image Dataset.

ads.data_labeling.visualizer.image_visualizer.render(items: List[LabeledImageItem], options: Dict = None)

Renders Labeled Image dataset.

Examples

>>> from ads.data_labeling.boundingbox import BoundingBoxItem
>>> from ads.data_labeling.visualizer.image_visualizer import LabeledImageItem, render
>>> bbox1 = BoundingBoxItem(bottom_left=(0.3, 0.4),
...                        top_left=(0.3, 0.09),
...                        top_right=(0.86, 0.09),
...                        bottom_right=(0.86, 0.4),
...                        labels=['dolphin', 'fish'])
>>> record1 = LabeledImageItem(img_obj1, [bbox1])
>>> bbox2 = BoundingBoxItem(bottom_left=(0.2, 0.4),
...                        top_left=(0.2, 0.2),
...                        top_right=(0.8, 0.2),
...                        bottom_right=(0.8, 0.4),
...                        labels=['dolphin'])
>>> bbox3 = BoundingBoxItem(bottom_left=(0.5, 1.0),
...                        top_left=(0.5, 0.8),
...                        top_right=(0.8, 0.8),
...                        bottom_right=(0.8, 1.0),
...                        labels=['shark'])
>>> record2 = LabeledImageItem(img_obj2, [bbox2, bbox3])
>>> render(
...     items=[record1, record2],
...     options={"default_color": "blue", "colors": {"dolphin": "blue", "whale": "red"}},
... )
class ads.data_labeling.visualizer.image_visualizer.ImageLabeledDataFormatter

Bases: object

The ImageLabeledDataFormatter class to render image items in a notebook session.

static render_item(item: LabeledImageItem, options: Optional[Dict] = None, path: Optional[str] = None) None

Renders image dataset.

Parameters
  • item (LabeledImageItem) – Item to render.

  • options (Optional[dict]) – Render options.

  • path (str) – Path to save the image with annotations to local directory.

Returns

Nothing.

Return type

None

Raises
  • ValueError – If item is not provided. If path is not valid.

  • TypeError – If item is provided in a wrong format.

class ads.data_labeling.visualizer.image_visualizer.LabeledImageItem(img: ImageFile, boxes: List[BoundingBoxItem])

Bases: object

Data class representing Image Item.

img

The labeled image object.

Type

ImageFile

boxes

A list of BoundingBoxItem.

Type

List[BoundingBoxItem]

boxes: List[BoundingBoxItem]
img: ImageFile
class ads.data_labeling.visualizer.image_visualizer.RenderOptions(default_color: str, colors: Optional[dict])

Bases: object

Data class representing render options.

default_color

The specified default color.

Type

str

colors

The multiple specified colors.

Type

Optional[dict]

colors: Optional[dict]
default_color: str
classmethod from_dict(options: dict) RenderOptions

Constructs an instance of RenderOptions from a dictionary.

Parameters

options (dict) – Render options in dictionary format.

Returns

The instance of RenderOptions.

Return type

RenderOptions

to_dict()

Converts RenderOptions instance to dictionary format.

Returns

The render options in dictionary format.

Return type

dict
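
A minimal sketch of a render options data class with from_dict and to_dict, assuming a plain dataclass is enough to model the two documented fields. RenderOptionsSketch is a hypothetical stand-in. Treating default_color as a required key is an assumption of this sketch.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class RenderOptionsSketch:
    """Hypothetical stand-in with the two documented fields."""
    default_color: str
    colors: Optional[Dict[str, str]] = None

    @classmethod
    def from_dict(cls, options: dict) -> "RenderOptionsSketch":
        # "default_color" is required here; "colors" may be omitted.
        return cls(default_color=options["default_color"],
                   colors=options.get("colors"))

    def to_dict(self) -> dict:
        return asdict(self)


raw_options = {"default_color": "blue", "colors": {"dolphin": "blue", "whale": "red"}}
render_options = RenderOptionsSketch.from_dict(raw_options)
round_trip = render_options.to_dict()
```

Converting the options dictionary into a typed object once, at the entry point, means the rendering code can rely on attribute access instead of repeating key lookups.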

exception ads.data_labeling.visualizer.image_visualizer.WrongEntityFormat

Bases: ValueError

ads.data_labeling.visualizer.image_visualizer.render(items: List[LabeledImageItem], options: Optional[Dict] = None, path: Optional[str] = None) None

Render image dataset.

Parameters
  • items (List[LabeledImageItem]) – The list of LabeledImageItem to render.

  • options (dict, optional) – The options for rendering.

  • path (str) – Path to save the images with annotations to local directory.

Returns

Nothing.

Return type

None

Raises
  • ValueError – If items are not provided. If path is not valid.

  • TypeError – If items are provided in a wrong format.

Examples

>>> bbox1 = BoundingBoxItem(bottom_left=(0.3, 0.4),
...                        top_left=(0.3, 0.09),
...                        top_right=(0.86, 0.09),
...                        bottom_right=(0.86, 0.4),
...                        labels=['dolphin', 'fish'])
>>> record1 = LabeledImageItem(img_obj1, [bbox1])
>>> render(items = [record1])
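
Two steps a renderer of this kind typically needs can be sketched without any imaging library: choosing a color for a label from the options, and scaling box corners to pixels. Both helpers are hypothetical. The sketch assumes that the corner coordinates are fractions of the image width and height, and that labels missing from colors fall back to default_color.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def pick_color(label: str, options: Dict) -> str:
    """Return the color mapped to `label`, or the default color."""
    colors = options.get("colors") or {}
    return colors.get(label, options["default_color"])


def to_pixels(top_left: Point, bottom_right: Point,
              width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale normalized corners to (left, top, right, bottom) in pixels."""
    (x0, y0), (x1, y1) = top_left, bottom_right
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


options = {"default_color": "blue", "colors": {"dolphin": "blue", "whale": "red"}}
shark_color = pick_color("shark", options)  # not listed, so the default applies
whale_color = pick_color("whale", options)
box = to_pixels(top_left=(0.3, 0.09), bottom_right=(0.86, 0.4), width=640, height=480)
```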

ads.data_labeling.visualizer.text_visualizer module

The module that helps to visualize NER Text Dataset.

ads.data_labeling.visualizer.text_visualizer.render(items: List[LabeledTextItem], options: Dict = None) str

Renders the NER dataset to HTML format.

Examples

>>> record1 = LabeledTextItem("London is the capital of the United Kingdom", [NERItem('city', 0, 6), NERItem("country", 29, 14)])
>>> record2 = LabeledTextItem("Houston area contractor seeking a Sheet Metal Superintendent.", [NERItem("city", 0, 7)])
>>> result = render(items = [record1, record2], options={"default_color":"#DDEECC", "colors": {"city":"#DDEECC", "country":"#FFAAAA"}})
>>> display(HTML(result))
class ads.data_labeling.visualizer.text_visualizer.LabeledTextItem(txt: str, ents: List[NERItem])

Bases: object

Data class representing NER Item.

txt

The labeled sentence.

Type

str

ents

The list of entities.

Type

List[NERItem]

ents: List[NERItem]
txt: str
class ads.data_labeling.visualizer.text_visualizer.RenderOptions(default_color: str, colors: Optional[dict])

Bases: object

Data class representing render options.

default_color

The specified default color.

Type

str

colors

The multiple specified colors.

Type

Optional[dict]

colors: Optional[dict]
default_color: str
classmethod from_dict(options: dict) RenderOptions

Constructs an instance of RenderOptions from a dictionary.

Parameters

options (dict) – Render options in dictionary format.

Returns

The instance of RenderOptions.

Return type

RenderOptions

to_dict()

Converts RenderOptions instance to dictionary format.

Returns

The render options in dictionary format.

Return type

dict

class ads.data_labeling.visualizer.text_visualizer.TextLabeledDataFormatter

Bases: object

The TextLabeledDataFormatter class to render NER items into HTML format.

static render(items: List[LabeledTextItem], options: Optional[Dict] = None) str

Renders the NER dataset to HTML format.

Parameters
  • items (List[LabeledTextItem]) – Items to render.

  • options (Optional[dict]) – Render options.

Returns

HTML representation of the rendered NER dataset.

Return type

str

Raises
  • ValueError – If items are not provided.

  • TypeError – If items are provided in a wrong format.

ads.data_labeling.visualizer.text_visualizer.render(items: List[LabeledTextItem], options: Optional[Dict] = None) str

Renders the NER dataset to HTML format.

Parameters
  • items (List[LabeledTextItem]) – The list of NER items to render.

  • options (dict, optional) – The options for rendering.

Returns

HTML string.

Return type

str

Examples

>>> record = LabeledTextItem("London is the capital of the United Kingdom", [NERItem('city', 0, 6), NERItem("country", 29, 14)])
>>> result = render(items = [record], options={"default_color":"#DDEECC", "colors": {"city":"#DDEECC", "country":"#FFAAAA"}})
>>> display(HTML(result))
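
How entity annotations might be turned into highlighted HTML can be sketched with the standard library. The render_sketch function is a hypothetical illustration, not the ads renderer. The mark-based markup, the (label, offset, length) tuple layout and the requirement that entities do not overlap are all assumptions.

```python
from html import escape
from typing import Dict, List, Tuple

Entity = Tuple[str, int, int]  # (label, offset, length)


def render_sketch(text: str, entities: List[Entity], options: Dict) -> str:
    """Wrap each entity in a colored <mark> element; entities must not overlap."""
    colors = options.get("colors") or {}
    parts = []
    cursor = 0
    for label, offset, length in sorted(entities, key=lambda e: e[1]):
        # Plain text between the previous entity and this one.
        parts.append(escape(text[cursor:offset]))
        color = colors.get(label, options["default_color"])
        entity_text = escape(text[offset:offset + length])
        parts.append(
            f'<mark style="background: {color}">{entity_text} '
            f'<b>{escape(label)}</b></mark>'
        )
        cursor = offset + length
    parts.append(escape(text[cursor:]))  # trailing plain text
    return "".join(parts)


markup = render_sketch(
    "London is the capital of the United Kingdom",
    [("city", 0, 6), ("country", 29, 14)],
    {"default_color": "#DDEECC", "colors": {"city": "#DDEECC", "country": "#FFAAAA"}},
)
```

Escaping both the text and the labels before inserting them into the markup keeps characters such as < and & in the dataset from being interpreted as HTML.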

Module contents