ads.data_labeling package

Submodules

ads.data_labeling.interface.loader module

class ads.data_labeling.interface.loader.Loader

Bases: ABC

Data Loader Interface.

abstract load(**kwargs) → Any

ads.data_labeling.interface.parser module

class ads.data_labeling.interface.parser.Parser

Bases: ABC

Data Parser Interface.

abstract parse() → Any

ads.data_labeling.interface.reader module

class ads.data_labeling.interface.reader.Reader

Bases: ABC

Data Reader Interface.

info() → Serializable

abstract read() → Any
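The interfaces above are abstract base classes, so a concrete reader has to implement the abstract method before it can be instantiated. The sketch below illustrates this with a local stand-in for Reader (it does not import ads); InMemoryReader is a hypothetical name used only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator, List, Tuple


class Reader(ABC):
    """Local stand-in for ads.data_labeling.interface.reader.Reader."""

    @abstractmethod
    def read(self) -> Any:
        """Return the data produced by this reader."""


class InMemoryReader(Reader):
    """Hypothetical reader that yields records held in a list."""

    def __init__(self, records: List[Tuple[str, str]]) -> None:
        self._records = records

    def read(self) -> Iterator[Tuple[str, str]]:
        # Yield records one at a time, as a generator-based reader would.
        yield from self._records


reader = InMemoryReader([("a.txt", "yes"), ("b.txt", "no")])
records = list(reader.read())
```

Instantiating the abstract stand-in directly raises TypeError, which is the standard behavior of Python abstract base classes.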

ads.data_labeling.boundingbox module

class ads.data_labeling.boundingbox.BoundingBoxItem(top_left: ~typing.Tuple[float, float], bottom_left: ~typing.Tuple[float, float], bottom_right: ~typing.Tuple[float, float], top_right: ~typing.Tuple[float, float], labels: ~typing.List[str] = <factory>)

Bases: object

BoundingBoxItem class representing bounding box label.

labels

List of labels for this bounding box.

Type: List[str]

top_left

Top left corner of this bounding box.

Type: Tuple[float, float]

bottom_left

Bottom left corner of this bounding box.

Type: Tuple[float, float]

bottom_right

Bottom right corner of this bounding box.

Type: Tuple[float, float]

top_right

Top right corner of this bounding box.

Type: Tuple[float, float]

Examples

>>> item = BoundingBoxItem(
...     labels = ['cat','dog'],
...     bottom_left=(0.2, 0.4),
...     top_left=(0.2, 0.2),
...     top_right=(0.8, 0.2),
...     bottom_right=(0.8, 0.4))
>>> item.to_yolo(categories = ['cat','dog', 'horse'])

bottom_left: Tuple[float, float]

bottom_right: Tuple[float, float]

classmethod from_yolo(bbox: List[Tuple], categories: Optional[List[str]] = None) → BoundingBoxItem

Converts the YOLO formatted annotations to BoundingBoxItem.

Parameters

bbox (List[Tuple]) – The list of bounding box annotations in YOLO format. Example: [(0, 0.511560675, 0.50234826, 0.47013485, 0.57803468)]
categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

The BoundingBoxItem.

Return type

BoundingBoxItem

Raises

TypeError – When categories list has a wrong format.
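As a rough illustration of what reading a YOLO annotation involves, the sketch below turns one (class_index, x_center, y_center, width, height) tuple into four corner points. It assumes the common YOLO convention of normalized coordinates with the origin at the top-left of the image; yolo_to_corners is a hypothetical helper and is not part of ads, and the library's own conversion may differ in detail.

```python
from typing import Tuple

Point = Tuple[float, float]


def yolo_to_corners(
    x_center: float, y_center: float, width: float, height: float
) -> Tuple[Point, Point, Point, Point]:
    """Return (top_left, bottom_left, bottom_right, top_right)."""
    left, right = x_center - width / 2, x_center + width / 2
    top, bottom = y_center - height / 2, y_center + height / 2
    return (left, top), (left, bottom), (right, bottom), (right, top)


categories = ["cat", "dog", "horse"]
# Hypothetical annotation: class index first, then the box geometry.
class_index, *box = (0, 0.5, 0.3, 0.6, 0.2)
label = categories[class_index]
top_left, bottom_left, bottom_right, top_right = yolo_to_corners(*box)
```

The class index is resolved against the categories list, which is why the order of categories has to match the order used for model training.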

labels: List[str]

to_yolo(categories: List[str]) → List[Tuple[int, float, float, float, float]]

Converts BoundingBoxItem to the YOLO format.

Parameters

categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

The list of YOLO formatted bounding boxes.

Return type

List[Tuple[int, float, float, float, float]]

Raises

ValueError – When the categories list is not provided, or when it does not match the labels.
TypeError – When categories list has a wrong format.
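The return type above suggests one (class_index, x_center, y_center, width, height) tuple per label. The sketch below shows that conversion under the usual YOLO convention (normalized coordinates, origin at the top-left); corners_to_yolo is a hypothetical stand-alone helper, not the library's implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]
YoloBox = Tuple[int, float, float, float, float]


def corners_to_yolo(
    top_left: Point, bottom_right: Point, labels: List[str], categories: List[str]
) -> List[YoloBox]:
    """Return one (class_index, x_center, y_center, width, height) per label."""
    missing = [label for label in labels if label not in categories]
    if missing:
        raise ValueError(f"Labels not found in categories: {missing}")
    x_center = (top_left[0] + bottom_right[0]) / 2
    y_center = (top_left[1] + bottom_right[1]) / 2
    width = bottom_right[0] - top_left[0]
    height = bottom_right[1] - top_left[1]
    return [
        (categories.index(label), x_center, y_center, width, height)
        for label in labels
    ]


boxes = corners_to_yolo(
    top_left=(0.2, 0.2),
    bottom_right=(0.8, 0.4),
    labels=["cat", "dog"],
    categories=["cat", "dog", "horse"],
)
```

A box carrying two labels yields two tuples that share the same geometry and differ only in the class index.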

top_left: Tuple[float, float]

top_right: Tuple[float, float]

class ads.data_labeling.boundingbox.BoundingBoxItems(items: ~typing.List[~ads.data_labeling.boundingbox.BoundingBoxItem] = <factory>)

Bases: object

BoundingBoxItems class which consists of a list of BoundingBoxItem.

items

List of BoundingBoxItem.

Type: List[BoundingBoxItem]

Examples

>>> item = BoundingBoxItem(
...     labels = ['cat','dog'],
...     bottom_left=(0.2, 0.4),
...     top_left=(0.2, 0.2),
...     top_right=(0.8, 0.2),
...     bottom_right=(0.8, 0.4))
>>> items = BoundingBoxItems(items = [item])
>>> items.to_yolo(categories = ['cat','dog', 'horse'])

items: List[BoundingBoxItem]

to_yolo(categories: List[str]) → List[Tuple[int, float, float, float, float]]

Converts BoundingBoxItems to the YOLO format.

Parameters

categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

The list of YOLO formatted bounding boxes.

Return type

List[Tuple[int, float, float, float, float]]

Raises

ValueError – When the categories list is not provided, or when it does not match the labels.
TypeError – When categories list has a wrong format.

ads.data_labeling.constants module

class ads.data_labeling.constants.AnnotationType

Bases: object

AnnotationType class which contains all the annotation types that data labeling service supports.

BOUNDING_BOX = 'BOUNDING_BOX'

ENTITY_EXTRACTION = 'ENTITY_EXTRACTION'

MULTI_LABEL = 'MULTI_LABEL'

SINGLE_LABEL = 'SINGLE_LABEL'

class ads.data_labeling.constants.DatasetType

Bases: object

DatasetType class which contains all the dataset types that data labeling service supports.

DOCUMENT = 'DOCUMENT'

IMAGE = 'IMAGE'

TEXT = 'TEXT'

class ads.data_labeling.constants.Formats

Bases: object

Common formats class which contains all the common formats that are supported to convert to.

SPACY = 'spacy'

YOLO = 'yolo'

ads.data_labeling.data_labeling_service module

class ads.data_labeling.data_labeling_service.DataLabeling(compartment_id: Optional[str] = None, dls_cp_client_auth: Optional[dict] = None, dls_dp_client_auth: Optional[dict] = None)

Bases: OCIWorkRequestMixin

Class for data labeling service. Integrate the data labeling service APIs.

Examples

>>> import ads
>>> import pandas as pd
>>> from ads.data_labeling.data_labeling_service import DataLabeling
>>> ads.set_auth("api_key")
>>> dls = DataLabeling()
>>> dls.list_dataset()
>>> metadata_path = dls.export(dataset_id="your dataset id",
...     path="oci://<bucket_name>@<namespace>/folder")
>>> df = pd.DataFrame.ads.read_labeled_data(metadata_path)

Initialize a DataLabeling class.

Parameters

compartment_id (str, optional) – OCID of data labeling datasets’ compartment
dls_cp_client_auth (dict, optional) – Data Labeling control plane client auth. Default is None. The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
dls_dp_client_auth (dict, optional) – Data Labeling data plane client auth. Default is None. The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

Returns

Nothing.

Return type

None

export(dataset_id: str, path: str, include_unlabeled=False) → str

Exports the dataset identified by dataset_id, saves the resulting jsonl files (the metadata jsonl file and the records jsonl file) to the object storage path provided by the user, and returns the path of the metadata jsonl file.

Parameters

dataset_id (str) – The ID of the dataset for which the snapshot will be generated.
path (str) – The object storage path where the generated snapshot is stored. Example: “oci://<bucket_name>@<namespace>/prefix”
include_unlabeled (bool, optional. Defaults to False.) – Whether to include unlabeled records or not.

Returns

oci path of the metadata jsonl file.

Return type

str
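The path argument follows the “oci://<bucket_name>@<namespace>/prefix” layout. The sketch below shows how such a URI breaks down into its parts using only the standard library; split_oci_uri is a hypothetical helper for illustration and is not part of ads.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_oci_uri(uri: str) -> Tuple[str, str, str]:
    """Split 'oci://<bucket_name>@<namespace>/prefix' into its parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "oci":
        raise ValueError(f"Not an oci:// URI: {uri!r}")
    bucket, separator, namespace = parsed.netloc.partition("@")
    if not (bucket and separator and namespace):
        raise ValueError(f"Expected <bucket_name>@<namespace> in {uri!r}")
    # Drop the leading slash so the prefix is relative to the bucket.
    return bucket, namespace, parsed.path.lstrip("/")


bucket, namespace, prefix = split_oci_uri("oci://my_bucket@my_namespace/folder")
```

Checking the URI shape before calling export can surface a malformed path early, before any request is sent.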

list_dataset(**kwargs) → DataFrame

List all the datasets created from the data labeling service under a given compartment.

Parameters: kwargs (dict, optional) – Additional keyword arguments that are passed to the oci.data_labeling_service.DataLabelingManagementClient.list_datasets method.
Returns: pandas dataframe which contains the dataset information.
Return type: pandas.DataFrame
Raises: Exception – If pagination.list_call_get_all_results() fails

ads.data_labeling.metadata module

class ads.data_labeling.metadata.Metadata(source_path: str = '', records_path: str = '', labels: ~typing.List[str] = <factory>, dataset_name: str = '', compartment_id: str = '', dataset_id: str = '', annotation_type: str = '', dataset_type: str = '')

Bases: DataClassSerializable

The class representing the labeled dataset metadata.

source_path

Contains information on where all the source data (image/text/document) is stored.

Type: str

records_path

Contains information on where the records jsonl file is stored.

Type: str

labels

List of classes/labels for the dataset.

Type: List

dataset_name

Dataset display name on the Data Labeling Service console.

Type: str

compartment_id

Compartment id of the labeled dataset.

Type: str

dataset_id

Dataset id.

Type: str

annotation_type

Type of the labeling/annotation task. Currently supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, BOUNDING_BOX.

Type: str

dataset_type

Type of the dataset. Currently supports TEXT, IMAGE, DOCUMENT.

Type: str

annotation_type: str = ''

compartment_id: str = ''

dataset_id: str = ''

dataset_name: str = ''

dataset_type: str = ''

classmethod from_dls_dataset(dataset: Dataset) → Metadata

Constructs a Metadata instance from an OCI DLS dataset.

Parameters: dataset (OCIDLSDataset) – OCIDLSDataset object.
Returns: The ads labeled dataset metadata instance.
Return type: Metadata

labels: List[str]

records_path: str = ''

source_path: str = ''

to_dataframe() → DataFrame

Converts the metadata to dataframe format.

Returns: The metadata in Pandas dataframe format.
Return type: pandas.DataFrame

to_dict() → Dict

Converts to dictionary representation.

Returns: The metadata in dictionary type.
Return type: Dict

ads.data_labeling.ner module

class ads.data_labeling.ner.NERItem(label: str = '', offset: int = 0, length: int = 0)

Bases: object

NERItem class which is a representation of a token span.

label

Entity name.

Type: str

offset

The token span’s entity start index position in the text.

Type: int

length

Length of the token span.

Type: int

classmethod from_spacy(token) → NERItem

label: str = ''

length: int = 0

offset: int = 0

to_spacy() → tuple

Converts one NERItem to the spacy format.

Returns: NERItem in the spacy format
Return type: Tuple
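For orientation, spaCy's character-offset entity format is a (start, end, label) tuple, so an offset/length pair maps to it by adding the length to the offset. The sketch below uses a local stand-in with the same fields as NERItem; that the library produces exactly this tuple layout is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Span:
    """Local stand-in with the same fields as NERItem."""

    label: str = ""
    offset: int = 0
    length: int = 0

    def to_spacy(self) -> Tuple[int, int, str]:
        # spaCy-style entity tuple: (start_char, end_char, label).
        return (self.offset, self.offset + self.length, self.label)


text = "Oracle is based in Austin."
span = Span(label="ORG", offset=0, length=6)
start, end, label = span.to_spacy()
entity_text = text[start:end]
```

Slicing the source text with the resulting start and end indices recovers the annotated entity.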

class ads.data_labeling.ner.NERItems(items: ~typing.List[~ads.data_labeling.ner.NERItem] = <factory>)

Bases: object

NERItems class consists of a list of NERItem.

items

List of NERItem.

Type: List[NERItem]

items: List[NERItem]

to_spacy() → List[tuple]

Converts NERItems to the spacy format.

Returns: List of NERItems in the Spacy format.
Return type: List[tuple]

exception ads.data_labeling.ner.WrongEntityFormatLabelIsEmpty: Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLabelNotString: Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLengthIsNegative: Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLengthNotInteger: Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatOffsetIsNegative: Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatOffsetNotInteger: Bases: ValueError

ads.data_labeling.record module

class ads.data_labeling.record.Record(path: str = '', content: Optional[Any] = None, annotation: Optional[Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]] = None)

Bases: object

Class representing Record.

path

File path.

Type: str

content

Content of the record.

Type: Any

annotation

Annotation/label of the record.

Type: Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]

annotation: Union[Tuple, str, List[BoundingBoxItem], List[NERItem]] = None

content: Any = None

path: str = ''

to_dict() → Dict

Convert the Record instance to a dictionary.

Returns: Dictionary representation of the Record instance.
Return type: Dict

to_tuple() → Tuple[str, Any, Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]]

Convert the Record instance to a tuple.

Returns: Tuple representation of the Record instance.
Return type: Tuple

ads.data_labeling.mixin.data_labeling module

class ads.data_labeling.mixin.data_labeling.DataLabelingAccessMixin

Bases: object

Mixin class for labeled text data.

static read_labeled_data(path: Optional[str] = None, dataset_id: Optional[str] = None, compartment_id: Optional[str] = None, auth: Optional[Dict] = None, materialize: bool = False, encoding: str = 'utf-8', include_unlabeled: bool = False, format: Optional[str] = None, chunksize: Optional[int] = None)

Loads the dataset generated by data labeling service from either the export file or the Data Labeling Service.

Parameters

path ((str, optional). Defaults to None) – The export file path, can be either local or object storage path.
dataset_id ((str, optional). Defaults to None) – The dataset OCID.
compartment_id (str. Defaults to the compartment_id from the env variable.) – The compartment OCID of the dataset.
auth ((dict, optional). Defaults to None) – The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
materialize ((bool, optional). Defaults to False) – Whether the content of the dataset file should be loaded or it should return the file path to the content. By default the content will not be loaded.
encoding ((str, optional). Defaults to 'utf-8') – Encoding of files. Only used for “TEXT” dataset.
include_unlabeled ((bool, optional). Default to False) – Whether to load the unlabeled records or not.
format ((str, optional). Defaults to None) –
Output format of annotations. Can be None, “spacy” for the Entity Extraction dataset type, or “yolo” for the Object Detection type.
- When None, it outputs List[NERItem] or List[BoundingBoxItem],
- When “spacy”, it outputs List[Tuple],
- When “yolo”, it outputs List[List[Tuple]].
chunksize ((int, optional). Defaults to None) – The amount of records that should be read in one iteration. The result will be returned in a generator format.

Returns

pd.DataFrame if chunksize is not specified. Generator[pd.DataFrame] if chunksize is specified.

Return type

Union[Generator[pd.DataFrame, Any, Any], pd.DataFrame]

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=False)
                            Path       Content               Annotations
    --------------------------------------------------------------------
    0   path/to/the/content/file                                     yes
    1   path/to/the/content/file                                      no

>>> df = pd.DataFrame.ads.read_labeled_data(dataset_id="your_dataset_ocid",
...                                         compartment_id="your_compartment_id",
...                                         auth=authutil.api_keys(),
...                                         materialize=False)
                            Path       Content               Annotations
    --------------------------------------------------------------------
    0   path/to/the/content/file                                     yes
    1   path/to/the/content/file                                      no

render_bounding_box(options: Optional[Dict] = None, content_column: str = 'Content', annotations_column: str = 'Annotations', categories: Optional[List[str]] = None, limit: int = 50, path: Optional[str] = None) → None

Renders the bounding box dataset. Displays only the first 50 rows by default; use limit to change this.

Parameters

options (dict) – The color options specified for rendering.
content_column (Optional[str]) – The column name with the content data.
annotations_column (Optional[str]) – The column name for the annotations list.
categories (Optional[List[str]]) – The list of object categories in proper order for model training. Only used when bounding box annotations are in YOLO format. Example: ['cat', 'dog', 'horse']
limit (Optional[int]. Defaults to 50) – The maximum amount of records to display.
path (Optional[str]) – Path to save the image with annotations to local directory.

Returns

Nothing

Return type

None

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=True)
>>> df.ads.render_bounding_box(content_column="Content", annotations_column="Annotations")

render_ner(options: Dict = None, content_column: str = 'Content', annotations_column: str = 'Annotations', limit: int = 50) → None

Renders the NER dataset. Displays only the first 50 rows by default; use limit to change this.

Parameters

options (dict) – The color options specified for rendering.
content_column (Optional[str]) – The column name with the content data.
annotations_column (Optional[str]) – The column name for the annotations list.
limit (Optional[int]. Defaults to 50) – The maximum amount of records to display.

Returns

Nothing

Return type

None

Examples

>>> import pandas as pd
>>> import ads
>>> from ads.common import auth as authutil
>>> df = pd.DataFrame.ads.read_labeled_data(path="path_to_your_metadata.jsonl",
...                                         auth=authutil.api_keys(),
...                                         materialize=True)
>>> df.ads.render_ner(content_column="Content", annotations_column="Annotations")

ads.data_labeling.parser.export_metadata_parser module

class ads.data_labeling.parser.export_metadata_parser.MetadataParser

Bases: Parser

MetadataParser class which parses the metadata from the record.

EXPECTED_KEYS = ['id', 'compartmentId', 'displayName', 'labelsSet', 'annotationFormat', 'datasetSourceDetails', 'datasetFormatDetails']

static parse(json_data: Dict[Any, Any]) → Metadata

Parses the metadata jsonl file.

Parameters: json_data (dict) – dictionary format of the metadata jsonl file content.
Returns: Metadata object which contains the useful fields from the metadata jsonl file
Return type: Metadata
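EXPECTED_KEYS lists the fields the parser looks for in the metadata jsonl content. The sketch below shows one way to check a metadata dictionary against that list before parsing; missing_metadata_keys is a hypothetical helper, and how MetadataParser itself reacts to missing keys is not stated here.

```python
import json
from typing import Any, Dict, List

EXPECTED_KEYS = [
    "id",
    "compartmentId",
    "displayName",
    "labelsSet",
    "annotationFormat",
    "datasetSourceDetails",
    "datasetFormatDetails",
]


def missing_metadata_keys(json_data: Dict[Any, Any]) -> List[str]:
    """Return the expected keys that are absent from the metadata dictionary."""
    return [key for key in EXPECTED_KEYS if key not in json_data]


# Hypothetical, deliberately incomplete metadata line used only for illustration.
line = '{"id": "TEST_DATASET", "displayName": "test_dataset_name"}'
json_data = json.loads(line)
missing = missing_metadata_keys(json_data)
```

An empty result means the dictionary carries every expected field.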

ads.data_labeling.parser.export_record_parser module

class ads.data_labeling.parser.export_record_parser.BoundingBoxRecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: RecordParser

BoundingBoxRecordParser class which parses the label of BoundingBox label data.

Initiates a RecordParser instance.

Parameters

dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

class ads.data_labeling.parser.export_record_parser.EntityType

Bases: object

Entity type class for supporting multiple types of entities.

GENERIC = 'GENERIC'

IMAGEOBJECTSELECTION = 'IMAGEOBJECTSELECTION'

TEXTSELECTION = 'TEXTSELECTION'

class ads.data_labeling.parser.export_record_parser.MultiLabelRecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: RecordParser

MultiLabelRecordParser class which parses the label of Multiple label data.

Initiates a RecordParser instance.

Parameters

dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

class ads.data_labeling.parser.export_record_parser.NERRecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: RecordParser

NERRecordParser class which parses the label of NER label data.

Initiates a RecordParser instance.

Parameters

dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

class ads.data_labeling.parser.export_record_parser.RecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: Parser

RecordParser class which parses the labels from the record.

Examples

>>> from ads.data_labeling.parser.export_record_parser import SingleLabelRecordParser
>>> from ads.data_labeling.parser.export_record_parser import MultiLabelRecordParser
>>> from ads.data_labeling.parser.export_record_parser import NERRecordParser
>>> from ads.data_labeling.parser.export_record_parser import BoundingBoxRecordParser
>>> import fsspec
>>> import json
>>> from ads.common import auth as authutil
>>> labels = []
>>> with fsspec.open("/path/to/records_file.jsonl", **authutil.api_keys()) as f:
...     for line in f:
...         bounding_box_labels = BoundingBoxRecordParser("source_data_path").parse(json.loads(line))
...         labels.append(bounding_box_labels)

Initiates a RecordParser instance.

Parameters

dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

parse(record: Dict) → Record

Extracts the annotations from the record content. Constructs and returns a Record instance containing the file path and the labels.

Parameters: record (Dict) – Content of the record from the record file.
Returns: Record instance which contains the file path as well as the annotations.
Return type: Record

class ads.data_labeling.parser.export_record_parser.RecordParserFactory

Bases: object

RecordParserFactory class which contains a list of registered parsers and allows registering new RecordParsers.

Current parsers include:

SingleLabelRecordParser
MultiLabelRecordParser
NERRecordParser
BoundingBoxRecordParser

static parser(annotation_type: str, dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None) → RecordParser

Gets the parser based on the annotation_type.

Parameters

annotation_type (str) – Annotation type which can be SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION and BOUNDING_BOX.
dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser corresponding to the annotation type.

Return type

RecordParser

Raises

ValueError – If annotation_type is not supported.

classmethod register(annotation_type: str, parser) → None

Registers a new parser.

Parameters

annotation_type (str) – Annotation type which can be SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION and BOUNDING_BOX.
parser (RecordParser) – A new Parser class to be registered.

Returns

Nothing.

Return type

None
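The parser and register methods describe a registry-style factory: parser classes are stored under an annotation type and looked up later, with ValueError for unsupported types. The sketch below shows that pattern in isolation; ParserRegistry and EchoParser are hypothetical stand-ins and do not reproduce the library's internals.

```python
from typing import Callable, Dict


class ParserRegistry:
    """Local stand-in illustrating a register/lookup factory."""

    _parsers: Dict[str, Callable[..., object]] = {}

    @classmethod
    def register(cls, annotation_type: str, parser: Callable[..., object]) -> None:
        # Store the parser class (not an instance) under its annotation type.
        cls._parsers[annotation_type] = parser

    @staticmethod
    def parser(annotation_type: str, dataset_source_path: str) -> object:
        if annotation_type not in ParserRegistry._parsers:
            raise ValueError(f"Unsupported annotation type: {annotation_type}")
        # Instantiate the registered class for this request.
        return ParserRegistry._parsers[annotation_type](dataset_source_path)


class EchoParser:
    """Hypothetical parser that only stores the source path."""

    def __init__(self, dataset_source_path: str) -> None:
        self.dataset_source_path = dataset_source_path


ParserRegistry.register("SINGLE_LABEL", EchoParser)
parser = ParserRegistry.parser("SINGLE_LABEL", "path/to/dataset")
```

Registering classes rather than instances lets each lookup build a parser with its own dataset source path.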

class ads.data_labeling.parser.export_record_parser.SingleLabelRecordParser(dataset_source_path: str, format: Optional[str] = None, categories: Optional[List[str]] = None)

Bases: RecordParser

SingleLabelRecordParser class which parses the label of Single label data.

Initiates a RecordParser instance.

Parameters

dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns

RecordParser instance.

Return type

RecordParser

ads.data_labeling.reader.dataset_reader module

The module containing classes to read labeled datasets. Supports reading labeled datasets from exports or from the cloud.

Classes

LabeledDatasetReader
The LabeledDatasetReader class to read labeled dataset.

ExportReader
The ExportReader class to read labeled dataset from the export.

DLSDatasetReader
The DLSDatasetReader class to read labeled dataset from the cloud.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling import LabeledDatasetReader
>>> ds_reader = LabeledDatasetReader.from_export(
...    path="oci://bucket_name@namespace/dataset_metadata.jsonl",
...    auth=authutil.api_keys(),
...    materialize=True
... )
>>> ds_reader.info()
    ------------------------------------------------------------------------
    annotation_type                                             SINGLE_LABEL
    compartment_id                                          TEST_COMPARTMENT
    dataset_id                                                  TEST_DATASET
    dataset_name                                           test_dataset_name
    dataset_type                                                        TEXT
    labels                                                     ['yes', 'no']
    records_path                                             path/to/records
    source_path                                              path/to/dataset

>>> ds_reader.read()
                             Path            Content            Annotations
    -----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no
    2   path/to/the/content/file3       file content                     no

>>> next(ds_reader.read(iterator=True))
    ("path/to/the/content/file1", "file content", "yes")

>>> next(ds_reader.read(iterator=True, chunksize=2))
    [("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no")]

>>> next(ds_reader.read(chunksize=2))
                            Path            Content            Annotations
    ----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no

>>> ds_reader = LabeledDatasetReader.from_DLS(
...    dataset_id="dataset_OCID",
...    compartment_id="compartment_OCID",
...    auth=authutil.api_keys(),
...    materialize=True
... )
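The read(iterator=True, chunksize=2) call in the examples above returns records in lists of at most two tuples. The sketch below shows the general chunking technique with itertools.islice; in_chunks is a hypothetical helper, and the library's own batching may be implemented differently.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str, str]


def in_chunks(rows: Iterable[Row], chunksize: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `chunksize` rows."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    iterator = iter(rows)
    while True:
        # Pull the next batch lazily; an empty batch means the input is exhausted.
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


rows = [
    ("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no"),
    ("path/to/the/content/file3", "file content", "no"),
]
chunks = list(in_chunks(rows, chunksize=2))
```

Because the helper is a generator, only one chunk is held in memory at a time, which is the usual reason to pass chunksize for large datasets.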

class ads.data_labeling.reader.dataset_reader.DLSDatasetReader(dataset_id: str, compartment_id: str, auth: Dict, encoding='utf-8', materialize: bool = False, include_unlabeled: bool = False)

Bases: Reader

The DLSDatasetReader class to read labeled dataset from the cloud.

info(self) → Metadata: Gets the labeled dataset metadata.

read(self) → Generator[Tuple, Any, Any]: Reads the labeled dataset.

Initializes the DLS dataset reader instance.

Parameters

dataset_id (str) – The dataset OCID.
compartment_id (str) – The compartment OCID of the dataset.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files. The encoding is used to extract the metadata information of the labeled dataset and also to extract the content of the text dataset records.
materialize ((bool, optional). Defaults to False.) – Whether the content of dataset files should be loaded/materialized or not. By default the content will not be materialized.
include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Raises

ValueError – When dataset_id is empty or not a string.
TypeError – When dataset_id is not a string.

info() → Metadata

Gets the labeled dataset metadata.

Returns: The labeled dataset metadata.
Return type: Metadata

read(format: Optional[str] = None) → Generator[Tuple, Any, Any]

Reads the labeled dataset records.

Parameters: format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].
Returns: The labeled dataset records.
Return type: Generator[Tuple, Any, Any]

class ads.data_labeling.reader.dataset_reader.ExportReader(path: str, auth: Optional[Dict] = None, encoding='utf-8', materialize: bool = False, include_unlabeled: bool = False)

Bases: Reader

The ExportReader class to read labeled dataset from the export.

info(self) → Metadata: Gets the labeled dataset metadata.

read(self) → Generator[Tuple, Any, Any]: Reads the labeled dataset.

Initializes the labeled dataset export reader instance.

Parameters

path (str) – The metadata file path, can be either local or object storage path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. If you need to override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files. The encoding is used to extract the metadata information of the labeled dataset and also to extract the content of the text dataset records.
materialize ((bool, optional). Defaults to False.) – Whether the content of dataset files should be loaded/materialized or not. By default the content will not be materialized.
include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Raises

ValueError – When path is empty or not a string.
TypeError – When path is not a string.

info() → Metadata

Gets the labeled dataset metadata.

Returns: The labeled dataset metadata.
Return type: Metadata

read(format: Optional[str] = None) → Generator[Tuple, Any, Any]

Reads the labeled dataset records.

Parameters: format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].
Returns: The labeled dataset records.
Return type: Generator[Tuple, Any, Any]

class ads.data_labeling.reader.dataset_reader.LabeledDatasetReader(reader: Reader)

Bases: object

The labeled dataset reader class.

info(self) → Metadata: Gets labeled dataset metadata.

read(self, iterator: bool = False) → Union[Generator[Any, Any, Any], pd.DataFrame]: Reads labeled dataset.

from_export(cls, path: str, auth: Dict = None, encoding='utf-8', materialize: bool = False) → 'LabeledDatasetReader': Constructs a Labeled Dataset Reader instance.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling import LabeledDatasetReader

>>> ds_reader = LabeledDatasetReader.from_export(
...    path="oci://bucket_name@namespace/dataset_metadata.jsonl",
...    auth=authutil.api_keys(),
...    materialize=True
... )

>>> ds_reader = LabeledDatasetReader.from_DLS(
...    dataset_id="dataset_OCID",
...    compartment_id="compartment_OCID",
...    auth=authutil.api_keys(),
...    materialize=True
... )

>>> ds_reader.info()
    ------------------------------------------------------------------------
    annotation_type                                             SINGLE_LABEL
    compartment_id                                          TEST_COMPARTMENT
    dataset_id                                                  TEST_DATASET
    dataset_name                                           test_dataset_name
    dataset_type                                                        TEXT
    labels                                                     ['yes', 'no']
    records_path                                             path/to/records
    source_path                                              path/to/dataset

>>> ds_reader.read()
                             Path            Content            Annotations
    -----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no
    2   path/to/the/content/file3       file content                     no

>>> next(ds_reader.read(iterator=True))
    ("path/to/the/content/file1", "file content", "yes")

>>> next(ds_reader.read(iterator=True, chunksize=2))
    [("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no")]

>>> next(ds_reader.read(chunksize=2))
                            Path            Content            Annotations
    ----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no

Initializes the labeled dataset reader instance.

Parameters: reader (Reader) – The Reader instance which reads and extracts the labeled dataset.

classmethod from_DLS(dataset_id: str, compartment_id: Optional[str] = None, auth: Optional[dict] = None, encoding: str = 'utf-8', materialize: bool = False, include_unlabeled: bool = False) → LabeledDatasetReader

Constructs Labeled Dataset Reader instance.

Parameters

dataset_id (str) – The dataset OCID.
compartment_id (str. Defaults to the compartment_id from the env variable.) – The compartment OCID of the dataset.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.
materialize ((bool, optional). Defaults to False.) – Whether the content of the dataset file should be loaded or it should return the file path to the content. By default the content will not be loaded.
include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Returns

The LabeledDatasetReader instance.

Return type

LabeledDatasetReader

classmethod from_export(path: str, auth: Optional[dict] = None, encoding: str = 'utf-8', materialize: bool = False, include_unlabeled: bool = False) → LabeledDatasetReader

Constructs Labeled Dataset Reader instance.

Parameters

path (str) – The metadata file path, can be either local or object storage path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.
materialize ((bool, optional). Defaults to False.) – Whether the content of the dataset file should be loaded or it should return the file path to the content. By default the content will not be loaded.
include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Returns

The LabeledDatasetReader instance.

Return type

LabeledDatasetReader

info() → Serializable

Gets the labeled dataset metadata.

Returns: The labeled dataset metadata.
Return type: Metadata

read(iterator: bool = False, format: Optional[str] = None, chunksize: Optional[int] = None) → Union[Generator[Any, Any, Any], DataFrame]

Reads the labeled dataset records.

Parameters

iterator ((bool, optional). Defaults to False.) – True if the result should be represented as a Generator. False if the result should be represented as a Pandas DataFrame.
format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” or “yolo”.
chunksize ((int, optional). Defaults to None.) – The number of records that should be read in one iteration. The result will be returned in a generator format.

Returns

pd.DataFrame if iterator and chunksize are not specified. Generator[pd.DataFrame] if iterator is False and chunksize is specified. Generator[List[Tuple[str, str, Any]]] if iterator is True and chunksize is specified. Generator[Tuple[str, str, Any]] if iterator is True and chunksize is not specified.

Return type

Union[Generator[Tuple[str, str, Any], Any, Any], Generator[List[Tuple[str, str, Any]], Any, Any], Generator[pd.DataFrame, Any, Any], pd.DataFrame]
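
The way iterator and chunksize select the return shape can be sketched with the standard library alone. This is a simplified stand-in, not the ADS implementation: read_records is a hypothetical function, and a plain list plays the role of the pandas DataFrame, so both chunked modes yield lists here.

```python
from itertools import islice
from typing import Any, Iterator, List, Optional, Tuple, Union

Record = Tuple[str, str, Any]  # (path, content, annotation)


def read_records(
    records: List[Record],
    iterator: bool = False,
    chunksize: Optional[int] = None,
) -> Union[List[Record], Iterator[Any]]:
    """Hypothetical dispatch on iterator and chunksize (list stands in for DataFrame)."""

    def chunks() -> Iterator[List[Record]]:
        source = iter(records)
        while True:
            batch = list(islice(source, chunksize))
            if not batch:
                return
            yield batch

    if chunksize is not None:
        # Generator of batches; the documented API yields DataFrames when
        # iterator is False and lists of tuples when iterator is True.
        return chunks()
    if iterator:
        # Generator of single (path, content, annotation) tuples.
        return iter(records)
    # Everything at once (a DataFrame in the documented API).
    return list(records)


data = [
    ("path/to/file1", "file content", "yes"),
    ("path/to/file2", "file content", "no"),
    ("path/to/file3", "file content", "no"),
]
first_record = next(read_records(data, iterator=True))
batches = list(read_records(data, iterator=True, chunksize=2))
everything = read_records(data)
```

With three records and chunksize=2, the chunked generator yields one batch of two records followed by one batch of one.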

ads.data_labeling.reader.jsonl_reader module

class ads.data_labeling.reader.jsonl_reader.JsonlReader(path: str, auth: Optional[Dict] = None, encoding='utf-8')

Bases: Reader

JsonlReader class which reads the file.

Initiates a JsonlReader object.

Parameters

path (str) – object storage path or local path for a file.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
encoding ((str, optional). Defaults to 'utf-8'.) – Encoding of files. Only used for “TEXT” dataset.

Examples

>>> from ads.data_labeling.reader.jsonl_reader import JsonlReader
>>> path = "your/path/to/jsonl/file.jsonl"
>>> from ads.common import auth as authutil
>>> reader = JsonlReader(path=path, auth=authutil.api_keys(), encoding="utf-8")
>>> next(reader.read())

read(skip: Optional[int] = None) → Generator[Dict, Any, Any]

Reads and yields the content of the file.

Parameters

skip ((int, optional). Defaults to None.) – The number of records that should be skipped.

Returns

The content of the file.

Return type

Generator[Dict, Any, Any]

Raises

ValueError – If skip is specified and is not a positive integer.
FileNotFoundError – When file not found.
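
A minimal sketch of the skip behavior for a local JSONL file is shown below. It is not the JsonlReader implementation: read_jsonl is a hypothetical helper, it handles local paths only (no object storage or auth), and the exact error message is an assumption.

```python
import json
import os
import tempfile
from typing import Any, Dict, Generator, Optional


def read_jsonl(
    path: str, skip: Optional[int] = None, encoding: str = "utf-8"
) -> Generator[Dict, Any, Any]:
    """Hypothetical local-file reader that yields one dict per non-empty line."""
    # Because this is a generator function, validation runs when iteration starts.
    if skip is not None and (not isinstance(skip, int) or skip <= 0):
        raise ValueError("skip must be a positive integer")
    # open() raises FileNotFoundError when the file does not exist.
    with open(path, encoding=encoding) as f:
        for index, line in enumerate(f):
            if skip and index < skip:
                continue  # drop the first `skip` lines
            if line.strip():
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "records.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
    all_rows = list(read_jsonl(demo_path))
    tail_rows = list(read_jsonl(demo_path, skip=1))
```

Yielding records lazily keeps memory use flat for large exports, at the cost of deferring errors until the first record is requested.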

ads.data_labeling.reader.metadata_reader module

class ads.data_labeling.reader.metadata_reader.DLSMetadataReader(dataset_id: str, compartment_id: str, auth: dict)

Bases: Reader

DLSMetadataReader class which reads the metadata jsonl file from the cloud.

Initializes the DLS metadata reader instance.

Parameters

dataset_id (str) – The dataset OCID.
compartment_id (str) – The compartment OCID of the dataset.
auth (dict) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

Raises

ValueError – When dataset_id is empty.
TypeError – When dataset_id is not a string.

read() → Metadata

Reads the content from the metadata file.

Returns

The metadata of the labeled dataset.

Return type

Metadata

Raises

DatasetNotFoundError – If dataset not found.
ReadDatasetError – If any error occurred while attempting to read the dataset.

exception ads.data_labeling.reader.metadata_reader.DatasetNotFoundError(id: str): Bases: Exception

exception ads.data_labeling.reader.metadata_reader.EmptyMetadata

Bases: Exception

Empty Metadata.

class ads.data_labeling.reader.metadata_reader.ExportMetadataReader(path: str, auth: Optional[Dict] = None, encoding='utf-8')

Bases: JsonlReader

ExportMetadataReader class which reads the metadata jsonl file from local/object storage path.

Initiates a JsonlReader object.

Parameters

path (str) – object storage path or local path for a file.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
encoding ((str, optional). Defaults to 'utf-8'.) – Encoding of files. Only used for “TEXT” dataset.

Examples

>>> from ads.data_labeling.reader.jsonl_reader import JsonlReader
>>> path = "your/path/to/jsonl/file.jsonl"
>>> from ads.common import auth as authutil
>>> reader = JsonlReader(path=path, auth=authutil.api_keys(), encoding="utf-8")
>>> next(reader.read())

read() → Metadata

Reads the content from the metadata file.

Returns: The metadata of the labeled dataset.
Return type: Metadata

class ads.data_labeling.reader.metadata_reader.MetadataReader(reader: Reader)

Bases: object

MetadataReader class which reads and extracts the labeled dataset metadata.

Examples

>>> from ads.data_labeling import MetadataReader
>>> import oci
>>> import os
>>> from ads.common import auth as authutil
>>> reader = MetadataReader.from_export_file("metadata_export_file_path",
...                                 auth=authutil.api_keys())
>>> reader.read()

Initiate a MetadataReader instance.

Parameters: reader (Reader) – Reader instance which reads and extracts the labeled dataset metadata.

classmethod from_DLS(dataset_id: str, compartment_id: Optional[str] = None, auth: Optional[dict] = None) → MetadataReader

Constructs a MetadataReader instance.

Parameters

dataset_id (str) – The dataset OCID.
compartment_id ((str, optional). Defaults to None.) – The compartment OCID of the dataset.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

Returns

The MetadataReader instance whose reader is a DLSMetadataReader instance.

Return type

MetadataReader

classmethod from_export_file(path: str, auth: Optional[Dict] = None) → MetadataReader

Constructs a MetadataReader instance.

Parameters

path (str) – metadata file path, can be either local or object storage path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.

Returns

The MetadataReader instance whose reader is an ExportMetadataReader instance.

Return type

MetadataReader

read() → Metadata

Reads the content from the metadata file.

Returns: The metadata of the labeled dataset.
Return type: Metadata

exception ads.data_labeling.reader.metadata_reader.ReadDatasetError(id: str): Bases: Exception

ads.data_labeling.reader.record_reader module

class ads.data_labeling.reader.record_reader.RecordReader(reader: Reader, parser: Parser, loader: Optional[Loader] = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False)

Bases: object

The RecordReader class consists of a reader, a parser, and a loader. The reader reads the content from the record file. The parser parses the label for each record. The loader loads the content of the file path in that record.

Examples

>>> import os
>>> import oci
>>> from ads.data_labeling import RecordReader
>>> from ads.common import auth as authutil
>>> file_path = "/path/to/your_record.jsonl"
>>> dataset_type = "IMAGE"
>>> annotation_type = "BOUNDING_BOX"
>>> record_reader = RecordReader.from_export_file(file_path, dataset_type, annotation_type, "image_file_path", authutil.api_keys())
>>> next(record_reader.read())

Initiates a RecordReader instance.

Parameters

reader (Reader) – Reader instance to read content from the record file.
parser (Parser) – Parser instance to parse the labels from record file.
loader (Loader. Defaults to None.) – Loader instance to load the content from the file path in the record.
materialize (bool, optional. Defaults to False.) – Whether to materialize the content using loader.
include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.
encoding (str, optional) – Encoding for text files. Used only to extract the content of the text dataset contents.

Raises

ValueError – If the record reader or record parser is not specified. If the loader is not specified when materialize is True.
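
The reader, parser, and loader composition described above can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the RecordReader implementation: SketchRecordReader is hypothetical, the three collaborators are plain callables rather than Reader, Parser, and Loader subclasses, and the record keys "path" and "label" are invented for the example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional, Tuple


class SketchRecordReader:
    """Hypothetical composition of a reader, a parser, and an optional loader."""

    def __init__(
        self,
        reader: Callable[[], Iterable[Dict]],
        parser: Callable[[Dict], Any],
        loader: Optional[Callable[[str], Any]] = None,
        include_unlabeled: bool = False,
        materialize: bool = False,
    ) -> None:
        if reader is None or parser is None:
            raise ValueError("Both reader and parser must be specified.")
        if materialize and loader is None:
            raise ValueError("A loader must be specified when materialize is True.")
        self.reader = reader
        self.parser = parser
        self.loader = loader
        self.include_unlabeled = include_unlabeled
        self.materialize = materialize

    def read(self) -> Iterator[Tuple[str, Any, Any]]:
        """Yield (path, content, label) for each record."""
        for record in self.reader():
            label = self.parser(record)
            if label is None and not self.include_unlabeled:
                continue  # skip records without annotations
            path = record["path"]
            # Assumption: without materialize, the path is returned in place of the content.
            content = self.loader(path) if self.materialize else path
            yield path, content, label


records = [{"path": "a.txt", "label": "yes"}, {"path": "b.txt"}]
record_reader = SketchRecordReader(
    reader=lambda: iter(records),
    parser=lambda record: record.get("label"),
    loader=lambda path: f"<content of {path}>",
    materialize=True,
)
rows = list(record_reader.read())
```

Keeping the three roles separate means the same parsing logic can be reused whether records come from an export file or from the service, and content loading can be switched off entirely.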

classmethod from_DLS(dataset_id: str, dataset_type: str, annotation_type: str, dataset_source_path: str, compartment_id: Optional[str] = None, auth: Optional[Dict] = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False, format: Optional[str] = None, categories: Optional[List[str]] = None) → RecordReader

Constructs Record Reader instance.

Parameters

dataset_id (str) – The dataset OCID.
dataset_type (str) – Dataset type. Currently supports TEXT, IMAGE and DOCUMENT.
annotation_type (str) – Annotation Type. Currently TEXT supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION. IMAGE supports SINGLE_LABEL, MULTI_LABEL and BOUNDING_BOX. DOCUMENT supports SINGLE_LABEL and MULTI_LABEL.
dataset_source_path (str) – Dataset source path.
compartment_id ((str, optional). Defaults to None.) – The compartment OCID of the dataset.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.
materialize ((bool, optional). Defaults to False.) – Whether the content of the dataset file should be loaded or it should return the file path to the content. By default the content will not be loaded.
format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: [‘cat’,’dog’,’horse’]

Returns

The RecordReader instance.

Return type

RecordReader

classmethod from_export_file(path: str, dataset_type: str, annotation_type: str, dataset_source_path: str, auth: Optional[Dict] = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False, format: Optional[str] = None, categories: Optional[List[str]] = None, includes_metadata=False) → RecordReader

Initiates a RecordReader instance.

Parameters

path (str) – Record file path.
dataset_type (str) – Dataset type. Currently supports TEXT, IMAGE and DOCUMENT.
annotation_type (str) – Annotation Type. Currently TEXT supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION. IMAGE supports SINGLE_LABEL, MULTI_LABEL and BOUNDING_BOX. DOCUMENT supports SINGLE_LABEL and MULTI_LABEL.
dataset_source_path (str) – Dataset source path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.
encoding ((str, optional). Defaults to "utf-8".) – Encoding for text files. Used only to extract the content of the text dataset contents.
materialize ((bool, optional). Defaults to False.) – Whether to materialize the content by loader.
format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: [‘cat’,’dog’,’horse’]
includes_metadata ((bool, optional). Defaults to False.) – Determines whether the export file includes metadata or not.

Returns

A RecordReader instance.

Return type

RecordReader
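
To make the "yolo" format and the role of categories more concrete, the sketch below converts a box given by normalized corners into YOLO-style tuples of (class_index, x_center, y_center, width, height). The corners_to_yolo helper is hypothetical, and emitting one tuple per label is an assumption about how multi-label boxes are expanded; it is not taken from the ADS source.

```python
from typing import List, Tuple

Point = Tuple[float, float]
YoloRow = Tuple[int, float, float, float, float]


def corners_to_yolo(
    top_left: Point,
    bottom_right: Point,
    labels: List[str],
    categories: List[str],
) -> List[YoloRow]:
    """Hypothetical conversion from normalized corners to YOLO-style rows."""
    x_min, y_min = top_left
    x_max, y_max = bottom_right
    x_center = (x_min + x_max) / 2
    y_center = (y_min + y_max) / 2
    width = x_max - x_min
    height = y_max - y_min
    # The class index is the position of the label in `categories`;
    # list.index raises ValueError for a label that is not listed.
    return [
        (categories.index(label), x_center, y_center, width, height)
        for label in labels
    ]


yolo_rows = corners_to_yolo(
    top_left=(0.2, 0.2),
    bottom_right=(0.8, 0.4),
    labels=["cat", "dog"],
    categories=["cat", "dog", "horse"],
)
print(yolo_rows)
```

The order of categories matters because the class index written into each row is just the label's position in that list, so the same list should be used for export and for model training.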

read() → Generator[Tuple[str, Union[List, str]], Any, Any]

Reads the record.

Yields: Generator[Tuple[str, Union[List, str]], Any, Any] – File path, content and labels in a tuple.

ads.data_labeling.visualizer.image_visualizer module

The module that helps to visualize Image Dataset.

ads.data_labeling.visualizer.image_visualizer.render(items: List[LabeledImageItem], options: Dict = None): Renders Labeled Image dataset.

Examples

>>> bbox1 = BoundingBoxItem(bottom_left=(0.3, 0.4),
...                        top_left=(0.3, 0.09),
...                        top_right=(0.86, 0.09),
...                        bottom_right=(0.86, 0.4),
...                        labels=['dolphin', 'fish'])

>>> record1 = LabeledImageItem(img_obj1, [bbox1])

>>> bbox2 = BoundingBoxItem(bottom_left=(0.2, 0.4),
...                        top_left=(0.2, 0.2),
...                        top_right=(0.8, 0.2),
...                        bottom_right=(0.8, 0.4),
...                        labels=['dolphin'])
>>> bbox3 = BoundingBoxItem(bottom_left=(0.5, 1.0),
...                        top_left=(0.5, 0.8),
...                        top_right=(0.8, 0.8),
...                        bottom_right=(0.8, 1.0),
...                        labels=['shark'])

>>> record2 = LabeledImageItem(img_obj2, [bbox2, bbox3])
>>> render(items = [record1, record2], options={"default_color":"blue", "colors": {"dolphin":"blue", "whale":"red"}})

class ads.data_labeling.visualizer.image_visualizer.ImageLabeledDataFormatter

Bases: object

The ImageLabeledDataFormatter class to render image items in a notebook session.

static render_item(item: LabeledImageItem, options: Optional[Dict] = None, path: Optional[str] = None) → None

Renders image dataset.

Parameters

item (LabeledImageItem) – Item to render.
options (Optional[dict]) – Render options.
path (str) – Path to save the image with annotations to local directory.

Returns

Nothing.

Return type

None

Raises

ValueError – If item is not provided. If path is not valid.
TypeError – If item is provided in a wrong format.

class ads.data_labeling.visualizer.image_visualizer.LabeledImageItem(img: ImageFile, boxes: List[BoundingBoxItem])

Bases: object

Data class representing Image Item.

img

the labeled image object.

Type: ImageFile

boxes

a list of BoundingBoxItem

Type: List[BoundingBoxItem]

boxes: List[BoundingBoxItem]

img: ImageFile

class ads.data_labeling.visualizer.image_visualizer.RenderOptions(default_color: str, colors: Optional[dict])

Bases: object

Data class representing render options.

default_color

The specified default color.

Type: str

colors

The multiple specified colors.

Type: Optional[dict]

colors: Optional[dict]

default_color: str

classmethod from_dict(options: dict) → RenderOptions

Constructs an instance of RenderOptions from a dictionary.

Parameters: options (dict) – Render options in dictionary format.
Returns: The instance of RenderOptions.
Return type: RenderOptions

to_dict()

Converts RenderOptions instance to dictionary format.

Returns: The render options in dictionary format.
Return type: dict
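
A round trip between a dictionary and an options object can be sketched with a dataclass. SketchRenderOptions is a hypothetical stand-in, not the ADS RenderOptions class; in particular the color_for helper and its fallback to default_color for labels missing from colors are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class SketchRenderOptions:
    """Hypothetical stand-in for a render options data class."""

    default_color: str
    colors: Optional[Dict[str, str]] = None

    @classmethod
    def from_dict(cls, options: dict) -> "SketchRenderOptions":
        """Build an instance from a dictionary of render options."""
        return cls(
            default_color=options["default_color"],
            colors=options.get("colors"),
        )

    def to_dict(self) -> dict:
        """Convert the instance back to a plain dictionary."""
        return asdict(self)

    def color_for(self, label: str) -> str:
        """Assumed lookup: a per-label color if present, else the default color."""
        return (self.colors or {}).get(label, self.default_color)


raw_options = {"default_color": "blue", "colors": {"dolphin": "blue", "whale": "red"}}
render_options = SketchRenderOptions.from_dict(raw_options)
print(render_options.to_dict())
print(render_options.color_for("whale"), render_options.color_for("shark"))
```

Accepting a plain dictionary at the API boundary and converting it to a typed object internally keeps call sites short while still giving the rendering code named fields to work with.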

exception ads.data_labeling.visualizer.image_visualizer.WrongEntityFormat: Bases: ValueError

ads.data_labeling.visualizer.image_visualizer.render(items: List[LabeledImageItem], options: Optional[Dict] = None, path: Optional[str] = None) → None

Render image dataset.

Parameters

items (List[LabeledImageItem]) – The list of LabeledImageItem to render.
options (dict, optional) – The options for rendering.
path (str) – Path to save the images with annotations to local directory.

Returns

Nothing.

Return type

None

Raises

ValueError – If items are not provided. If path is not valid.
TypeError – If items are provided in a wrong format.

Examples

>>> bbox1 = BoundingBoxItem(bottom_left=(0.3, 0.4),
...                        top_left=(0.3, 0.09),
...                        top_right=(0.86, 0.09),
...                        bottom_right=(0.86, 0.4),
...                        labels=['dolphin', 'fish'])

>>> record1 = LabeledImageItem(img_obj1, [bbox1])
>>> render(items = [record1])

ads.data_labeling.visualizer.text_visualizer module

The module that helps to visualize NER Text Dataset.

ads.data_labeling.visualizer.text_visualizer.render(items: List[LabeledTextItem], options: Dict = None) → str: Renders NER dataset to Html format.

Examples

>>> record1 = LabeledTextItem("London is the capital of the United Kingdom", [NERItem('city', 0, 6), NERItem("country", 29, 14)])
>>> record2 = LabeledTextItem("Houston area contractor seeking a Sheet Metal Superintendent.", [NERItem("city", 0, 7)])
>>> result = render(items = [record1, record2], options={"default_color":"#DDEECC", "colors": {"city":"#DDEECC", "country":"#FFAAAA"}})
>>> display(HTML(result))

class ads.data_labeling.visualizer.text_visualizer.LabeledTextItem(txt: str, ents: List[NERItem])

Bases: object

Data class representing NER Item.

txt

The labeled sentence.

Type: str

ents

The list of entities.

Type: List[NERItem]

ents: List[NERItem]

txt: str

class ads.data_labeling.visualizer.text_visualizer.RenderOptions(default_color: str, colors: Optional[dict])

Bases: object

Data class representing render options.

default_color

The specified default color.

Type: str

colors

The multiple specified colors.

Type: Optional[dict]

colors: Optional[dict]

default_color: str

classmethod from_dict(options: dict) → RenderOptions

Constructs an instance of RenderOptions from a dictionary.

Parameters: options (dict) – Render options in dictionary format.
Returns: The instance of RenderOptions.
Return type: RenderOptions

to_dict()

Converts RenderOptions instance to dictionary format.

Returns: The render options in dictionary format.
Return type: dict

class ads.data_labeling.visualizer.text_visualizer.TextLabeledDataFormatter

Bases: object

The TextLabeledDataFormatter class to render NER items into Html format.

static render(items: List[LabeledTextItem], options: Optional[Dict] = None) → str

Renders NER dataset to Html format.

Parameters

items (List[LabeledTextItem]) – Items to render.
options (Optional[dict]) – Render options.

Returns

Html representation of rendered NER dataset.

Return type

str

Raises

ValueError – If items are not provided.
TypeError – If items are provided in a wrong format.
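
One way to picture how entity annotations become HTML is the sketch below, which wraps each annotated span in a colored mark element. It is not the TextLabeledDataFormatter implementation: render_text is a hypothetical function, the markup it produces is invented for the example, and it assumes the spans do not overlap.

```python
from html import escape
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, int, int]  # (label, offset, length)


def render_text(
    txt: str,
    ents: List[Span],
    default_color: str = "#DDEECC",
    colors: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical renderer that highlights non-overlapping spans in HTML."""
    colors = colors or {}
    parts = []
    cursor = 0
    for label, offset, length in sorted(ents, key=lambda ent: ent[1]):
        end = offset + length
        # Plain text between the previous entity and this one.
        parts.append(escape(txt[cursor:offset]))
        color = colors.get(label, default_color)
        parts.append(
            f'<mark style="background:{color}">'
            f"{escape(txt[offset:end])} <b>{escape(label)}</b></mark>"
        )
        cursor = end
    # Remaining text after the last entity.
    parts.append(escape(txt[cursor:]))
    return "".join(parts)


html_out = render_text(
    "London is the capital of the United Kingdom",
    [("city", 0, 6), ("country", 29, 14)],
    colors={"country": "#FFAAAA"},
)
print(html_out)
```

Escaping both the text and the labels before inserting them into the markup avoids broken output when the source text contains characters such as < or &.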

ads.data_labeling.visualizer.text_visualizer.render(items: List[LabeledTextItem], options: Optional[Dict] = None) → str

Renders NER dataset to Html format.

Parameters

items (List[LabeledTextItem]) – The list of NER items to render.
options (dict, optional) – The options for rendering.

Returns

Html string.

Return type

str

Examples

>>> record = LabeledTextItem("London is the capital of the United Kingdom", [NERItem('city', 0, 6), NERItem("country", 29, 14)])
>>> result = render(items = [record], options={"default_color":"#DDEECC", "colors": {"city":"#DDEECC", "country":"#FFAAAA"}})
>>> display(HTML(result))
