ads.data_labeling package

Subpackages

Submodules

ads.data_labeling.boundingbox module

class ads.data_labeling.boundingbox.BoundingBoxItem(top_left: ~typing.Tuple[float, float], bottom_left: ~typing.Tuple[float, float], bottom_right: ~typing.Tuple[float, float], top_right: ~typing.Tuple[float, float], labels: ~typing.List[str] = <factory>)[source]

Bases: object

BoundingBoxItem class representing bounding box label.

labels

List of labels for this bounding box.

Type:

List[str]

top_left

Top left corner of this bounding box.

Type:

Tuple[float, float]

bottom_left

Bottom left corner of this bounding box.

Type:

Tuple[float, float]

bottom_right

Bottom right corner of this bounding box.

Type:

Tuple[float, float]

top_right

Top right corner of this bounding box.

Type:

Tuple[float, float]

Examples

>>> from ads.data_labeling.boundingbox import BoundingBoxItem
>>> item = BoundingBoxItem(
...     labels = ['cat','dog'],
...     bottom_left=(0.2, 0.4),
...     top_left=(0.2, 0.2),
...     top_right=(0.8, 0.2),
...     bottom_right=(0.8, 0.4))
>>> item.to_yolo(categories = ['cat','dog', 'horse'])
bottom_left: Tuple[float, float]
bottom_right: Tuple[float, float]
classmethod from_yolo(bbox: List[Tuple], categories: List[str] | None = None) BoundingBoxItem[source]

Converts YOLO-formatted annotations to a BoundingBoxItem.

Parameters:
  • bbox (List[Tuple]) – The list of bounding box annotations in YOLO format. Example: [(0, 0.511560675, 0.50234826, 0.47013485, 0.57803468)]

  • categories (List[str], optional) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns:

The BoundingBoxItem.

Return type:

BoundingBoxItem

Raises:

TypeError – When the categories list has the wrong format.
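The following is a minimal sketch of the geometry behind such a conversion, not the ADS implementation. It assumes each YOLO row is (class id, x center, y center, width, height) with normalized coordinates and the origin at the top left of the image. The helper name yolo_row_to_corners, the returned dictionary layout, and the fallback to the numeric id when no categories are given are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def yolo_row_to_corners(
    row: Tuple[int, float, float, float, float],
    categories: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Hypothetical helper: expand one YOLO row into four corner points."""
    if categories is not None and not all(isinstance(c, str) for c in categories):
        raise TypeError("The categories must be a list of strings.")

    class_id, x_center, y_center, width, height = row
    # The center and size give the box edges directly.
    left, right = x_center - width / 2, x_center + width / 2
    top, bottom = y_center - height / 2, y_center + height / 2

    # Sketch's own choice: keep the numeric id as the label when no
    # categories are supplied.
    label = categories[class_id] if categories else str(class_id)
    return {
        "labels": [label],
        "top_left": (left, top),
        "bottom_left": (left, bottom),
        "bottom_right": (right, bottom),
        "top_right": (right, top),
    }


box = yolo_row_to_corners((1, 0.5, 0.3, 0.6, 0.2), categories=["cat", "dog", "horse"])
print(box["labels"], box["top_left"], box["bottom_right"])
```

Floating-point arithmetic means the corner values may differ from the intended decimals in the last digits, so compare them with a tolerance rather than with exact equality.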

labels: List[str]
to_yolo(categories: List[str]) List[Tuple[int, float, float, float, float]][source]

Converts BoundingBoxItem to the YOLO format.

Parameters:

categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns:

The list of YOLO formatted bounding boxes.

Return type:

List[Tuple[int, float, float, float, float]]

Raises:
  • ValueError – When the categories list is not provided, or when the categories list does not match the labels.

  • TypeError – When the categories list has the wrong format.
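As an illustration of what the returned tuples contain, here is a minimal sketch of a corner-to-YOLO conversion. It is not the ADS implementation: it assumes the usual YOLO row layout (class id, x center, y center, width, height) and an axis-aligned box, so only two opposite corners are needed. The helper name corners_to_yolo is hypothetical, and the error checks only mirror the conditions listed under Raises.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def corners_to_yolo(
    labels: List[str],
    top_left: Point,
    bottom_right: Point,
    categories: List[str],
) -> List[Tuple[int, float, float, float, float]]:
    """Hypothetical helper: emit one YOLO row per label on the box."""
    if not categories:
        raise ValueError("The categories list must be provided.")
    if not all(isinstance(c, str) for c in categories):
        raise TypeError("The categories must be a list of strings.")
    missing = [label for label in labels if label not in categories]
    if missing:
        raise ValueError(f"Labels not found in categories: {missing}")

    # The box center is the midpoint of two opposite corners.
    x_center = (top_left[0] + bottom_right[0]) / 2
    y_center = (top_left[1] + bottom_right[1]) / 2
    width = bottom_right[0] - top_left[0]
    height = bottom_right[1] - top_left[1]

    # The class id is the position of the label in the categories list.
    return [
        (categories.index(label), x_center, y_center, width, height)
        for label in labels
    ]


rows = corners_to_yolo(
    labels=["cat", "dog"],
    top_left=(0.2, 0.2),
    bottom_right=(0.8, 0.4),
    categories=["cat", "dog", "horse"],
)
for row in rows:
    print(row)
```

Because the class id depends on the order of categories, the same list in the same order has to be used for every record that feeds one training run.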

top_left: Tuple[float, float]
top_right: Tuple[float, float]
class ads.data_labeling.boundingbox.BoundingBoxItems(items: ~typing.List[~ads.data_labeling.boundingbox.BoundingBoxItem] = <factory>)[source]

Bases: object

BoundingBoxItems class which consists of a list of BoundingBoxItem.

items

List of BoundingBoxItem.

Type:

List[BoundingBoxItem]

Examples

>>> from ads.data_labeling.boundingbox import BoundingBoxItem, BoundingBoxItems
>>> item = BoundingBoxItem(
...     labels = ['cat','dog'],
...     bottom_left=(0.2, 0.4),
...     top_left=(0.2, 0.2),
...     top_right=(0.8, 0.2),
...     bottom_right=(0.8, 0.4))
>>> items = BoundingBoxItems(items = [item])
>>> items.to_yolo(categories = ['cat','dog', 'horse'])
items: List[BoundingBoxItem]
to_yolo(categories: List[str]) List[Tuple[int, float, float, float, float]][source]

Converts BoundingBoxItems to the YOLO format.

Parameters:

categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']

Returns:

The list of YOLO formatted bounding boxes.

Return type:

List[Tuple[int, float, float, float, float]]

Raises:
  • ValueError – When the categories list is not provided, or when the categories list does not match the labels.

  • TypeError – When the categories list has the wrong format.

ads.data_labeling.constants module

class ads.data_labeling.constants.AnnotationType[source]

Bases: object

AnnotationType class which contains all the annotation types that data labeling service supports.

BOUNDING_BOX = 'BOUNDING_BOX'
ENTITY_EXTRACTION = 'ENTITY_EXTRACTION'
MULTI_LABEL = 'MULTI_LABEL'
SINGLE_LABEL = 'SINGLE_LABEL'
class ads.data_labeling.constants.DatasetType[source]

Bases: object

DatasetType class which contains all the dataset types that data labeling service supports.

DOCUMENT = 'DOCUMENT'
IMAGE = 'IMAGE'
TEXT = 'TEXT'
class ads.data_labeling.constants.Formats[source]

Bases: object

Common formats class which contains all the common formats that are supported to convert to.

SPACY = 'spacy'
YOLO = 'yolo'

ads.data_labeling.data_labeling_service module

class ads.data_labeling.data_labeling_service.DataLabeling(compartment_id: str | None = None, dls_cp_client_auth: dict | None = None, dls_dp_client_auth: dict | None = None)[source]

Bases: OCIWorkRequestMixin

Class for the data labeling service. Integrates the data labeling service APIs.

Examples

>>> import ads
>>> import pandas as pd
>>> from ads.data_labeling.data_labeling_service import DataLabeling
>>> ads.set_auth("api_key")
>>> dls = DataLabeling()
>>> dls.list_dataset()
>>> metadata_path = dls.export(dataset_id="your dataset id",
...     path="oci://<bucket_name>@<namespace>/folder")
>>> df = pd.DataFrame.ads.read_labeled_data(metadata_path)

Initialize a DataLabeling class.

Parameters:
  • compartment_id (str, optional) – OCID of data labeling datasets’ compartment

  • dls_cp_client_auth (dict, optional) – Data Labeling control plane client auth. Default is None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • dls_dp_client_auth (dict, optional) – Data Labeling data plane client auth. Default is None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

Returns:

Nothing.

Return type:

None

export(dataset_id: str, path: str, include_unlabeled=False) str[source]

Exports the dataset identified by dataset_id as jsonl files (a metadata jsonl file and a records jsonl file), saves them to the object storage path provided by the user, and returns the path of the metadata jsonl file.

Parameters:
  • dataset_id (str) – The id of the dataset for which the snapshot will be generated.

  • path (str) – The object storage path where the generated snapshot is stored. Example: “oci://<bucket_name>@<namespace>/prefix”

  • include_unlabeled (bool, Optional. Defaults to False.) – Whether to include unlabeled records or not.

Returns:

oci path of the metadata jsonl file.

Return type:

str
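To make the expected shape of path concrete, the sketch below splits an oci://<bucket_name>@<namespace>/prefix string into its parts using only the standard library. The helper split_oci_path is hypothetical and is not part of ADS; the bucket and namespace names in the usage line are placeholders.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_oci_path(path: str) -> Tuple[str, str, str]:
    """Hypothetical helper: return (bucket, namespace, prefix) for an oci:// path."""
    parsed = urlparse(path)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(
            "Expected a path of the form oci://<bucket_name>@<namespace>/prefix"
        )
    # The bucket and namespace sit on either side of the "@".
    bucket, namespace = parsed.netloc.split("@", 1)
    # Everything after the first "/" is the object prefix.
    prefix = parsed.path.lstrip("/")
    return bucket, namespace, prefix


bucket, namespace, prefix = split_oci_path("oci://my-bucket@my-namespace/folder/snapshots")
print(bucket, namespace, prefix)
```

A check like this can catch a malformed path before any request is sent to the service.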

list_dataset(**kwargs) DataFrame[source]

List all the datasets created from the data labeling service under a given compartment.

Parameters:

kwargs (dict, optional) – Additional keyword arguments passed to the oci.data_labeling_service.DataLabelingManagementClient.list_datasets method.

Returns:

pandas dataframe which contains the dataset information.

Return type:

pandas.DataFrame

Raises:

Exception – If pagination.list_call_get_all_results() fails

ads.data_labeling.metadata module

class ads.data_labeling.metadata.Metadata(source_path: str = '', records_path: str = '', labels: ~typing.List[str] = <factory>, dataset_name: str = '', compartment_id: str = '', dataset_id: str = '', annotation_type: str = '', dataset_type: str = '')[source]

Bases: DataClassSerializable

The class representing the labeled dataset metadata.

source_path

Location where all the source data (image/text/document) is stored.

Type:

str

records_path

Location where the records jsonl file is stored.

Type:

str

labels

List of classes/labels for the dataset.

Type:

List

dataset_name

Dataset display name on the Data Labeling Service console.

Type:

str

compartment_id

Compartment id of the labeled dataset.

Type:

str

dataset_id

Dataset id.

Type:

str

annotation_type

Type of the labeling/annotation task. Currently supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, BOUNDING_BOX.

Type:

str

dataset_type

Type of the dataset. Currently supports TEXT, IMAGE, DOCUMENT.

Type:

str

annotation_type: str = ''
compartment_id: str = ''
dataset_id: str = ''
dataset_name: str = ''
dataset_type: str = ''
classmethod from_dls_dataset(dataset: Dataset) Metadata[source]

Constructs a Metadata instance from an OCI DLS dataset.

Parameters:

dataset (OCIDLSDataset) – OCIDLSDataset object.

Returns:

The ads labeled dataset metadata instance.

Return type:

Metadata

labels: List[str]
records_path: str = ''
source_path: str = ''
to_dataframe() DataFrame[source]

Converts the metadata to dataframe format.

Returns:

The metadata in Pandas dataframe format.

Return type:

pandas.DataFrame

to_dict() Dict[source]

Converts to dictionary representation.

Returns:

The metadata in dictionary type.

Return type:

Dict

ads.data_labeling.ner module

class ads.data_labeling.ner.NERItem(label: str = '', offset: int = 0, length: int = 0)[source]

Bases: object

NERItem class which is a representation of a token span.

label

Entity name.

Type:

str

offset

Start index of the entity's token span in the text.

Type:

int

length

Length of the token span.

Type:

int

classmethod from_spacy(token) NERItem[source]
label: str = ''
length: int = 0
offset: int = 0
to_spacy() tuple[source]

Converts one NERItem to the spacy format.

Returns:

NERItem in the spacy format

Return type:

Tuple
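The sketch below shows one plausible shape for that tuple. It assumes the spacy-style entity layout (start, end, label) with character offsets, so that end is offset + length. The stand-in class SpanSketch is hypothetical and only mirrors the three NERItem attributes; it is not the ADS class.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SpanSketch:
    """Hypothetical stand-in mirroring the NERItem attributes."""

    label: str = ""
    offset: int = 0
    length: int = 0

    def to_spacy(self) -> Tuple[int, int, str]:
        # Assumed layout: (start, end, label), where end = offset + length.
        return (self.offset, self.offset + self.length, self.label)


text = "Alice moved to Paris."
spans = [
    SpanSketch(label="PERSON", offset=0, length=5),
    SpanSketch(label="GPE", offset=15, length=5),
]
for span in spans:
    start, end, label = span.to_spacy()
    print(label, repr(text[start:end]))
```

Slicing the text with the returned start and end is a quick way to confirm that offset and length point at the intended entity.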

class ads.data_labeling.ner.NERItems(items: ~typing.List[~ads.data_labeling.ner.NERItem] = <factory>)[source]

Bases: object

NERItems class consists of a list of NERItem.

items

List of NERItem.

Type:

List[NERItem]

items: List[NERItem]
to_spacy() List[tuple][source]

Converts NERItems to the spacy format.

Returns:

List of NERItems in the spacy format.

Return type:

List[tuple]

exception ads.data_labeling.ner.WrongEntityFormatLabelIsEmpty[source]

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLabelNotString[source]

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLengthIsNegative[source]

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatLengthNotInteger[source]

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatOffsetIsNegative[source]

Bases: ValueError

exception ads.data_labeling.ner.WrongEntityFormatOffsetNotInteger[source]

Bases: ValueError
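The exception names above suggest six checks on an entity: the label must be a non-empty string, and the offset and length must be non-negative integers. The sketch below expresses those checks with plain ValueError. The function names, the order of the checks, and the messages are assumptions, and the sketch does not use the ADS exception classes.

```python
def validate_span(label, offset, length) -> None:
    """Hypothetical validator mirroring the six error conditions."""
    if not isinstance(label, str):
        raise ValueError("The label must be a string.")
    if not label:
        raise ValueError("The label must not be empty.")
    if not isinstance(offset, int):
        raise ValueError("The offset must be an integer.")
    if offset < 0:
        raise ValueError("The offset must not be negative.")
    if not isinstance(length, int):
        raise ValueError("The length must be an integer.")
    if length < 0:
        raise ValueError("The length must not be negative.")


def is_valid_span(label, offset, length) -> bool:
    """Return True when validate_span raises nothing."""
    try:
        validate_span(label, offset, length)
    except ValueError:
        return False
    return True


print(is_valid_span("ORG", 0, 6))
print(is_valid_span("", 0, 6))
```

Since every exception listed above derives from ValueError, calling code can catch ValueError to handle all six cases at once.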

ads.data_labeling.record module

class ads.data_labeling.record.Record(path: str = '', content: Any | None = None, annotation: Tuple | str | List[BoundingBoxItem] | List[NERItem] | None = None)[source]

Bases: object

Class representing Record.

path

File path.

Type:

str

content

Content of the record.

Type:

Any

annotation

Annotation/label of the record.

Type:

Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]

annotation: Tuple | str | List[BoundingBoxItem] | List[NERItem] = None
content: Any = None
path: str = ''
to_dict() Dict[source]

Convert the Record instance to a dictionary.

Returns:

Dictionary representation of the Record instance.

Return type:

Dict

to_tuple() Tuple[str, Any, Tuple | str | List[BoundingBoxItem] | List[NERItem]][source]

Convert the Record instance to a tuple.

Returns:

Tuple representation of the Record instance.

Return type:

Tuple
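To show how the two conversions relate, here is a minimal stand-in with the same three fields. RecordSketch is hypothetical and is not the ADS class. The tuple order (path, content, annotation) follows the to_tuple signature above, while the dictionary keys are an assumption based on the attribute names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class RecordSketch:
    """Hypothetical stand-in with the same three fields as Record."""

    path: str = ""
    content: Any = None
    annotation: Any = None

    def to_dict(self) -> Dict[str, Any]:
        # Assumed keys: one per attribute.
        return {
            "path": self.path,
            "content": self.content,
            "annotation": self.annotation,
        }

    def to_tuple(self) -> Tuple[str, Any, Any]:
        # Order follows the documented signature: path, content, annotation.
        return (self.path, self.content, self.annotation)


record = RecordSketch(
    path="docs/review_1.txt", content="Great product", annotation="positive"
)
print(record.to_dict())
print(record.to_tuple())
```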

Module contents