ads.data_labeling package
Subpackages
- ads.data_labeling.interface package
- ads.data_labeling.loader package
- ads.data_labeling.mixin package
- ads.data_labeling.parser package
- ads.data_labeling.reader package
- Submodules
- ads.data_labeling.reader.dataset_reader module
- ads.data_labeling.reader.dls_record_reader module
- ads.data_labeling.reader.export_record_reader module
- ads.data_labeling.reader.jsonl_reader module
- ads.data_labeling.reader.metadata_reader module
- ads.data_labeling.reader.record_reader module
- Module contents
- ads.data_labeling.visualizer package
Submodules
ads.data_labeling.boundingbox module
- class ads.data_labeling.boundingbox.BoundingBoxItem(top_left: ~typing.Tuple[float, float], bottom_left: ~typing.Tuple[float, float], bottom_right: ~typing.Tuple[float, float], top_right: ~typing.Tuple[float, float], labels: ~typing.List[str] = <factory>)
Bases:
object
BoundingBoxItem class representing bounding box label.
- labels
List of labels for this bounding box.
- Type:
List[str]
- top_left
Top left corner of this bounding box.
- Type:
Tuple[float, float]
- bottom_left
Bottom left corner of this bounding box.
- Type:
Tuple[float, float]
- bottom_right
Bottom right corner of this bounding box.
- Type:
Tuple[float, float]
- top_right
Top right corner of this bounding box.
- Type:
Tuple[float, float]
Examples
>>> from ads.data_labeling.boundingbox import BoundingBoxItem
>>> item = BoundingBoxItem(
...     labels=['cat', 'dog'],
...     bottom_left=(0.2, 0.4),
...     top_left=(0.2, 0.2),
...     top_right=(0.8, 0.2),
...     bottom_right=(0.8, 0.4))
>>> item.to_yolo(categories=['cat', 'dog', 'horse'])
- bottom_left: Tuple[float, float]
- bottom_right: Tuple[float, float]
- classmethod from_yolo(bbox: List[Tuple], categories: Optional[List[str]] = None) BoundingBoxItem
Converts YOLO-formatted annotations to a BoundingBoxItem.
- Parameters:
bbox (List[Tuple]) – The list of bounding box annotations in YOLO format. Example: [(0, 0.511560675, 0.50234826, 0.47013485, 0.57803468)]
categories (List[str], optional) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
The BoundingBoxItem.
- Return type:
BoundingBoxItem
- Raises:
TypeError – When the categories list has the wrong format.
- labels: List[str]
- to_yolo(categories: List[str]) List[Tuple[int, float, float, float, float]]
Converts BoundingBoxItem to the YOLO format.
- Parameters:
categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
The list of YOLO formatted bounding boxes.
- Return type:
List[Tuple[int, float, float, float, float]]
- Raises:
ValueError – When the categories list is not provided, or when it does not match the labels.
TypeError – When the categories list has the wrong format.
- top_left: Tuple[float, float]
- top_right: Tuple[float, float]
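The YOLO convention stores each box as (class index, x_center, y_center, width, height), with coordinates normalized to the image size. The following is a minimal sketch of that arithmetic and its inverse, assuming axis-aligned boxes with normalized corners. `corners_to_yolo` and `yolo_to_corners` are hypothetical helpers written for illustration; they are not part of ads and may differ from what `to_yolo` and `from_yolo` do internally.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
YoloRow = Tuple[int, float, float, float, float]


def corners_to_yolo(
    top_left: Point, bottom_right: Point, label: str, categories: List[str]
) -> YoloRow:
    """Return (class_index, x_center, y_center, width, height)."""
    if label not in categories:
        raise ValueError(f"Label {label!r} is not in categories.")
    x_min, y_min = top_left
    x_max, y_max = bottom_right
    return (
        categories.index(label),  # class index follows the order of categories
        (x_min + x_max) / 2,      # x_center
        (y_min + y_max) / 2,      # y_center
        x_max - x_min,            # width
        y_max - y_min,            # height
    )


def yolo_to_corners(row: YoloRow) -> Dict[str, Point]:
    """Recover the four corners from a YOLO row (the class index is ignored)."""
    _, x_c, y_c, w, h = row
    x_min, x_max = x_c - w / 2, x_c + w / 2
    y_min, y_max = y_c - h / 2, y_c + h / 2
    return {
        "top_left": (x_min, y_min),
        "bottom_left": (x_min, y_max),
        "bottom_right": (x_max, y_max),
        "top_right": (x_max, y_min),
    }


row = corners_to_yolo((0.2, 0.2), (0.8, 0.4), "dog", ["cat", "dog", "horse"])
corners = yolo_to_corners(row)
```

Because the class index is a position in `categories`, the same list, in the same order, has to be used in both directions.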
- class ads.data_labeling.boundingbox.BoundingBoxItems(items: ~typing.List[~ads.data_labeling.boundingbox.BoundingBoxItem] = <factory>)
Bases:
object
BoundingBoxItems class which consists of a list of BoundingBoxItem.
- items
List of BoundingBoxItem.
- Type:
List[BoundingBoxItem]
Examples
>>> from ads.data_labeling.boundingbox import BoundingBoxItem, BoundingBoxItems
>>> item = BoundingBoxItem(
...     labels=['cat', 'dog'],
...     bottom_left=(0.2, 0.4),
...     top_left=(0.2, 0.2),
...     top_right=(0.8, 0.2),
...     bottom_right=(0.8, 0.4))
>>> items = BoundingBoxItems(items=[item])
>>> items.to_yolo(categories=['cat', 'dog', 'horse'])
- items: List[BoundingBoxItem]
- to_yolo(categories: List[str]) List[Tuple[int, float, float, float, float]]
Converts BoundingBoxItems to the YOLO format.
- Parameters:
categories (List[str]) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
The list of YOLO formatted bounding boxes.
- Return type:
List[Tuple[int, float, float, float, float]]
- Raises:
ValueError – When the categories list is not provided, or when it does not match the labels.
TypeError – When the categories list has the wrong format.
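One way to picture the conversion of a whole collection is that every box contributes one YOLO row per label. The sketch below assumes that one-row-per-label behavior, which is inferred from the list return type and not confirmed on this page. `Box` and `boxes_to_yolo` are hypothetical stand-ins, not ads classes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

YoloRow = Tuple[int, float, float, float, float]


@dataclass
class Box:
    """Hypothetical stand-in for a bounding box with normalized corners."""

    top_left: Tuple[float, float]
    bottom_right: Tuple[float, float]
    labels: List[str] = field(default_factory=list)


def boxes_to_yolo(boxes: List[Box], categories: List[str]) -> List[YoloRow]:
    """Flatten boxes into YOLO rows, one row per (box, label) pair."""
    if not categories:
        raise ValueError("The categories list must be provided.")
    if not all(isinstance(c, str) for c in categories):
        raise TypeError("The categories must be a list of strings.")
    rows: List[YoloRow] = []
    for box in boxes:
        (x_min, y_min), (x_max, y_max) = box.top_left, box.bottom_right
        geometry = (
            (x_min + x_max) / 2,
            (y_min + y_max) / 2,
            x_max - x_min,
            y_max - y_min,
        )
        for label in box.labels:
            if label not in categories:
                raise ValueError(f"Label {label!r} is not in categories.")
            rows.append((categories.index(label), *geometry))
    return rows


rows = boxes_to_yolo(
    [Box((0.2, 0.2), (0.8, 0.4), ["cat", "dog"])],
    categories=["cat", "dog", "horse"],
)
```

Here `rows` holds two entries that share the same geometry and differ only in the class index.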
ads.data_labeling.constants module
- class ads.data_labeling.constants.AnnotationType
Bases:
object
AnnotationType class that contains all the annotation types the data labeling service supports.
- BOUNDING_BOX = 'BOUNDING_BOX'
- ENTITY_EXTRACTION = 'ENTITY_EXTRACTION'
- MULTI_LABEL = 'MULTI_LABEL'
- SINGLE_LABEL = 'SINGLE_LABEL'
ads.data_labeling.data_labeling_service module
- class ads.data_labeling.data_labeling_service.DataLabeling(compartment_id: Optional[str] = None, dls_cp_client_auth: Optional[dict] = None, dls_dp_client_auth: Optional[dict] = None)
Bases:
OCIWorkRequestMixin
Class for the data labeling service. Integrates the data labeling service APIs.
Examples
>>> import ads
>>> import pandas as pd
>>> from ads.data_labeling.data_labeling_service import DataLabeling
>>> ads.set_auth("api_key")
>>> dls = DataLabeling()
>>> dls.list_dataset()
>>> metadata_path = dls.export(dataset_id="your dataset id",
...     path="oci://<bucket_name>@<namespace>/folder")
>>> df = pd.DataFrame.ads.read_labeled_data(metadata_path)
Initialize a DataLabeling class.
- Parameters:
compartment_id (str, optional) – OCID of the compartment that contains the data labeling datasets.
dls_cp_client_auth (dict, optional) – Data Labeling control plane client auth. Default is None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
dls_dp_client_auth (dict, optional) – Data Labeling data plane client auth. Default is None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate the IdentityClient object.
- Returns:
Nothing.
- Return type:
None
- export(dataset_id: str, path: str, include_unlabeled=False) str
Exports the dataset identified by dataset_id as a snapshot, saves the metadata jsonl file and the records jsonl file to the object storage path provided by the user, and returns the path of the metadata jsonl file.
- Parameters:
dataset_id (str) – The id of the dataset from which the snapshot will be generated.
path (str) – The object storage path in which to store the generated snapshot. Example: "oci://<bucket_name>@<namespace>/prefix"
include_unlabeled (bool, optional) – Whether to include unlabeled records. Defaults to False.
- Returns:
OCI path of the metadata jsonl file.
- Return type:
str
- list_dataset(**kwargs) DataFrame
List all the datasets created from the data labeling service under a given compartment.
- Parameters:
kwargs (dict, optional) – Additional keyword arguments that are passed to the oci.data_labeling_service.DataLabelingManagementClient.list_datasets method.
- Returns:
pandas dataframe which contains the dataset information.
- Return type:
pandas.DataFrame
- Raises:
Exception – If pagination.list_call_get_all_results() fails
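The export path follows the pattern oci://<bucket_name>@<namespace>/prefix. As an illustration of how that pattern breaks down into its parts, here is a minimal sketch using only the standard library. `OciPath` and `parse_oci_path` are hypothetical names introduced for this example and are not part of ads; the bucket and namespace values are placeholders.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class OciPath(NamedTuple):
    bucket: str
    namespace: str
    prefix: str


def parse_oci_path(path: str) -> OciPath:
    """Split oci://<bucket_name>@<namespace>/prefix into its components."""
    parsed = urlparse(path)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(
            f"Expected oci://<bucket_name>@<namespace>/prefix, got {path!r}"
        )
    bucket, namespace = parsed.netloc.split("@", 1)
    # The leading slash separates the namespace from the prefix.
    return OciPath(bucket, namespace, parsed.path.lstrip("/"))


location = parse_oci_path("oci://my-bucket@my-namespace/labeled/snapshot")
```

A check like this can catch a malformed path before a long-running export is started.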
ads.data_labeling.metadata module
- class ads.data_labeling.metadata.Metadata(source_path: str = '', records_path: str = '', labels: ~typing.List[str] = <factory>, dataset_name: str = '', compartment_id: str = '', dataset_id: str = '', annotation_type: str = '', dataset_type: str = '')
Bases:
DataClassSerializable
The class that represents the labeled dataset metadata.
- source_path
Location where all the source data (image/text/document) is stored.
- Type:
str
- records_path
Location where the records jsonl file is stored.
- Type:
str
- labels
List of classes/labels for the dataset.
- Type:
List
- dataset_name
Dataset display name on the Data Labeling Service console.
- Type:
str
- compartment_id
Compartment id of the labeled dataset.
- Type:
str
- dataset_id
Dataset id.
- Type:
str
- annotation_type
Type of the labeling/annotation task. Currently supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, BOUNDING_BOX.
- Type:
str
- dataset_type
Type of the dataset. Currently supports TEXT, IMAGE, DOCUMENT.
- Type:
str
- annotation_type: str = ''
- compartment_id: str = ''
- dataset_id: str = ''
- dataset_name: str = ''
- dataset_type: str = ''
- classmethod from_dls_dataset(dataset: Dataset) Metadata
Constructs a Metadata instance from an OCI DLS dataset.
- Parameters:
dataset (OCIDLSDataset) – OCIDLSDataset object.
- Returns:
The ads labeled dataset metadata instance.
- Return type:
Metadata
- labels: List[str]
- records_path: str = ''
- source_path: str = ''
- to_dataframe() DataFrame
Converts the metadata to dataframe format.
- Returns:
The metadata in Pandas dataframe format.
- Return type:
pandas.DataFrame
- to_dict() Dict
Converts to dictionary representation.
- Returns:
The metadata in dictionary type.
- Return type:
Dict
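The fields and defaults listed for Metadata map naturally onto a dataclass. The sketch below mirrors them in a hypothetical `MetadataSketch` class to show what a dictionary conversion can look like. It is not the ads implementation, and the key names that the real `to_dict` produces are not confirmed on this page.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class MetadataSketch:
    """Hypothetical stand-in mirroring the documented Metadata fields."""

    source_path: str = ""
    records_path: str = ""
    labels: List[str] = field(default_factory=list)
    dataset_name: str = ""
    compartment_id: str = ""
    dataset_id: str = ""
    annotation_type: str = ""
    dataset_type: str = ""

    def to_dict(self) -> Dict:
        # asdict copies every field into a plain dictionary.
        return asdict(self)


meta = MetadataSketch(
    labels=["cat", "dog"],
    dataset_name="pets",
    annotation_type="SINGLE_LABEL",
)
as_dict = meta.to_dict()
```

The `field(default_factory=list)` default corresponds to the `<factory>` marker in the class signature: each instance gets its own empty list.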
ads.data_labeling.ner module
- class ads.data_labeling.ner.NERItem(label: str = '', offset: int = 0, length: int = 0)
Bases:
object
NERItem class which is a representation of a token span.
- label
Entity name.
- Type:
str
- offset
Start index of the entity's token span in the text.
- Type:
int
- length
Length of the token span.
- Type:
int
- label: str = ''
- length: int = 0
- offset: int = 0
- to_spacy() tuple
Converts one NERItem to the spaCy format.
- Returns:
NERItem in the spaCy format.
- Return type:
Tuple
- class ads.data_labeling.ner.NERItems(items: ~typing.List[~ads.data_labeling.ner.NERItem] = <factory>)
Bases:
object
NERItems class consists of a list of NERItem.
- to_spacy() List[tuple]
Converts NERItems to the spaCy format.
- Returns:
List of NERItems in the spaCy format.
- Return type:
List[tuple]
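spaCy describes an entity in training data as a (start_char, end_char, label) tuple, so an offset/length pair has to be turned into a start/end pair. The following is a minimal sketch of that mapping, assuming this is the tuple layout the conversion targets. `Span` is a hypothetical stand-in for NERItem, and the sample text and labels are invented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Span:
    """Hypothetical stand-in with the same fields as NERItem."""

    label: str = ""
    offset: int = 0
    length: int = 0

    def to_spacy(self) -> Tuple[int, int, str]:
        # The end index is exclusive, like a Python slice.
        return (self.offset, self.offset + self.length, self.label)


text = "Alice flew to Paris."
spans = [Span("PERSON", 0, 5), Span("CITY", 14, 5)]
entities = [span.to_spacy() for span in spans]
covered = [text[start:end] for start, end, _ in entities]
```

Slicing the text with each (start, end) pair is a quick way to confirm that the offsets point at the intended words.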
- exception ads.data_labeling.ner.WrongEntityFormatLabelIsEmpty
Bases:
ValueError
- exception ads.data_labeling.ner.WrongEntityFormatLabelNotString
Bases:
ValueError
- exception ads.data_labeling.ner.WrongEntityFormatLengthIsNegative
Bases:
ValueError
- exception ads.data_labeling.ner.WrongEntityFormatLengthNotInteger
Bases:
ValueError
- exception ads.data_labeling.ner.WrongEntityFormatOffsetIsNegative
Bases:
ValueError
- exception ads.data_labeling.ner.WrongEntityFormatOffsetNotInteger
Bases:
ValueError
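The exception names above suggest a set of checks on the label, offset, and length of an entity. The sketch below shows one possible version of those checks. The exact conditions are inferred from the names only, and `EntityFormatError`, `validate_span`, and `is_valid` are hypothetical helpers, not ads APIs.

```python
class EntityFormatError(ValueError):
    """Hypothetical error type standing in for the exceptions listed above."""


def validate_span(label, offset, length) -> None:
    """Raise EntityFormatError if any field has an unexpected type or value."""
    if not isinstance(label, str):
        raise EntityFormatError("label must be a string")
    if not label:
        raise EntityFormatError("label must not be empty")
    for name, value in (("offset", offset), ("length", length)):
        # bool is a subclass of int; rejecting it is a choice made in this sketch.
        if isinstance(value, bool) or not isinstance(value, int):
            raise EntityFormatError(f"{name} must be an integer")
        if value < 0:
            raise EntityFormatError(f"{name} must not be negative")


def is_valid(label, offset, length) -> bool:
    try:
        validate_span(label, offset, length)
    except EntityFormatError:
        return False
    return True
```

Deriving the error from ValueError, as the documented exceptions do, lets callers catch either the specific error or ValueError in general.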
ads.data_labeling.record module
- class ads.data_labeling.record.Record(path: str = '', content: Optional[Any] = None, annotation: Optional[Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]] = None)
Bases:
object
Class representing Record.
- path
File path.
- Type:
str
- content
Content of the record.
- Type:
Any
- annotation
Annotation/label of the record.
- Type:
Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]
- annotation: Union[Tuple, str, List[BoundingBoxItem], List[NERItem]] = None
- content: Any = None
- path: str = ''
- to_dict() Dict
Convert the Record instance to a dictionary.
- Returns:
Dictionary representation of the Record instance.
- Return type:
Dict
- to_tuple() Tuple[str, Any, Union[Tuple, str, List[BoundingBoxItem], List[NERItem]]]
Convert the Record instance to a tuple.
- Returns:
Tuple representation of the Record instance.
- Return type:
Tuple
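The `to_tuple` signature lists its elements as path, content, and annotation, in that order. Here is a minimal sketch of both conversions on a hypothetical `RecordSketch` dataclass. It is not the ads implementation, the dictionary keys are assumed to match the attribute names, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class RecordSketch:
    """Hypothetical stand-in with the same fields as Record."""

    path: str = ""
    content: Optional[Any] = None
    annotation: Optional[Any] = None

    def to_dict(self) -> Dict[str, Any]:
        return {
            "path": self.path,
            "content": self.content,
            "annotation": self.annotation,
        }

    def to_tuple(self) -> Tuple[str, Any, Any]:
        return (self.path, self.content, self.annotation)


record = RecordSketch(
    path="reviews/1.txt", content="Great product", annotation="positive"
)
path, content, annotation = record.to_tuple()
```

The tuple form unpacks directly into three variables, while the dictionary form is convenient for building a DataFrame row.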