ads.data_labeling.parser package

Submodules

ads.data_labeling.parser.dls_record_parser module

exception ads.data_labeling.parser.dls_record_parser.AnnotationNotFoundError(id: str)

Bases: Exception

class ads.data_labeling.parser.dls_record_parser.DLSBoundingBoxRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)

Bases: DLSRecordParser

DLSBoundingBoxRecordParser class, which parses the labels of bounding box label data.

Initializes a DLSRecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • auth (dict, optional) – Defaults to None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

DLSRecordParser instance.

Return type:

DLSRecordParser

class ads.data_labeling.parser.dls_record_parser.DLSMultiLabelRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)

Bases: DLSRecordParser

DLSMultiLabelRecordParser class, which parses the labels of multi-label data.

Initializes a DLSRecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • auth (dict, optional) – Defaults to None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

DLSRecordParser instance.

Return type:

DLSRecordParser

class ads.data_labeling.parser.dls_record_parser.DLSNERRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)

Bases: DLSRecordParser

DLSNERRecordParser class, which parses the labels of named entity recognition (NER) data.

Initializes a DLSRecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • auth (dict, optional) – Defaults to None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

DLSRecordParser instance.

Return type:

DLSRecordParser

class ads.data_labeling.parser.dls_record_parser.DLSRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)

Bases: Parser

DLSRecordParser class, which parses the labels from a record.

Initializes a DLSRecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • auth (dict, optional) – Defaults to None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

DLSRecordParser instance.

Return type:

DLSRecordParser

parse(oci_record_summary: OCIRecordSummary) → Record

Extracts the annotations. Constructs and returns a Record instance which contains the file path and the labels.

Parameters:

oci_record_summary (OCIRecordSummary) – The summary information about the record.

Returns:

Record instance which contains the file path and the annotations.

Return type:

Record

class ads.data_labeling.parser.dls_record_parser.DLSRecordParserFactory

Bases: object

DLSRecordParserFactory class, which contains a list of registered parsers and allows new DLSRecordParsers to be registered.

Current parsers include:
  • DLSSingleLabelRecordParser

  • DLSMultiLabelRecordParser

  • DLSNERRecordParser

  • DLSBoundingBoxRecordParser

static parser(annotation_type: str, dataset_source_path: str, format: str | None = None, categories: List[str] | None = None, auth: dict | None = None) → DLSRecordParser

Gets the parser based on the annotation_type.

Parameters:
  • annotation_type (str) – Annotation type, which can be SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, or BOUNDING_BOX.

  • dataset_source_path (str) – Dataset source path.

  • format (str, optional) – Output format of annotations. Defaults to None. Can be None, "spacy" for Entity Extraction datasets, or "yolo" for Object Detection datasets. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When "spacy", it outputs List[Tuple]. When "yolo", it outputs List[List[Tuple]].

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

  • auth (dict, optional) – Defaults to None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

Returns:

DLSRecordParser corresponding to the annotation type.

Return type:

DLSRecordParser

Raises:

ValueError – If annotation_type is not supported.
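In the common YOLO convention, each box's label is stored as an integer class index, which is one reason the order of categories matters. The sketch below illustrates that convention under stated assumptions: the box representation (normalized corner coordinates) and the helper name to_yolo are hypothetical, and the tuple layout (class_index, x_center, y_center, width, height) is the usual YOLO layout rather than a confirmed description of the ADS output. It returns one flat list per image, whereas the documented "yolo" output type is List[List[Tuple]], so the nesting ADS uses may differ.

```python
from typing import List, Tuple

# Hypothetical box representation: normalized corner coordinates
# (x_min, y_min, x_max, y_max), each in [0, 1]. Not the ADS BoundingBoxItem.
Box = Tuple[float, float, float, float]
YoloRow = Tuple[int, float, float, float, float]


def to_yolo(boxes: List[Tuple[str, Box]], categories: List[str]) -> List[YoloRow]:
    """Convert (label, box) pairs for one image into YOLO-style tuples.

    Each tuple is (class_index, x_center, y_center, width, height), where
    class_index is the label's position in `categories` (assumed convention).
    """
    rows: List[YoloRow] = []
    for label, (x_min, y_min, x_max, y_max) in boxes:
        class_index = categories.index(label)  # raises ValueError for unknown labels
        rows.append(
            (
                class_index,
                (x_min + x_max) / 2,  # box center, x
                (y_min + y_max) / 2,  # box center, y
                x_max - x_min,  # box width
                y_max - y_min,  # box height
            )
        )
    return rows


rows = to_yolo([("dog", (0.25, 0.25, 0.75, 0.75))], ["cat", "dog", "horse"])
# rows == [(1, 0.5, 0.5, 0.5, 0.5)]
```

Reordering categories changes every class index, which is why the list should be supplied in the order the model expects.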

classmethod register(annotation_type: str, parser: DLSRecordParser) → None

Registers a new parser.

Parameters:
  • annotation_type (str) – Annotation type, which can be SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, or BOUNDING_BOX.

  • parser (DLSRecordParser) – A new Parser class to be registered.

Returns:

Nothing.

Return type:

None
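The parser() and register() pair follows a registry-based factory pattern. Below is a minimal, self-contained sketch of that pattern, using hypothetical stand-in classes (BaseParser, SingleLabelParser, ParserFactory, CustomParser) rather than the ADS classes, and assuming the registry is a dictionary keyed by annotation type.

```python
from typing import Dict, Type


class BaseParser:
    """Hypothetical stand-in for a record parser base class (not the ADS class)."""

    def __init__(self, dataset_source_path: str) -> None:
        self.dataset_source_path = dataset_source_path


class SingleLabelParser(BaseParser):
    """Hypothetical parser for single-label records."""


class ParserFactory:
    """Minimal registry mirroring the documented parser()/register() methods."""

    # Registry of annotation type -> parser class (assumed to be a dict).
    _parsers: Dict[str, Type[BaseParser]] = {"SINGLE_LABEL": SingleLabelParser}

    @staticmethod
    def parser(annotation_type: str, dataset_source_path: str) -> BaseParser:
        """Instantiate the parser registered for annotation_type."""
        parser_class = ParserFactory._parsers.get(annotation_type)
        if parser_class is None:
            raise ValueError(f"The {annotation_type} annotation type is not supported.")
        return parser_class(dataset_source_path)

    @classmethod
    def register(cls, annotation_type: str, parser: Type[BaseParser]) -> None:
        """Add (or replace) the parser class for annotation_type."""
        cls._parsers[annotation_type] = parser


# Usage: register a hypothetical custom parser, then look it up by type.
class CustomParser(BaseParser):
    """Hypothetical user-defined parser."""


ParserFactory.register("CUSTOM", CustomParser)
custom = ParserFactory.parser("CUSTOM", "path/to/dataset")
```

Because the registry lives on the class, a parser registered once is available to every later parser() call in the same process.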

class ads.data_labeling.parser.dls_record_parser.DLSSingleLabelRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)

Bases: DLSRecordParser

DLSSingleLabelRecordParser class, which parses the labels of single-label data.

Initializes a DLSRecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • auth (dict, optional) – Defaults to None. The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

DLSRecordParser instance.

Return type:

DLSRecordParser

exception ads.data_labeling.parser.dls_record_parser.ReadAnnotationError(id: str)

Bases: Exception

ads.data_labeling.parser.export_metadata_parser module

class ads.data_labeling.parser.export_metadata_parser.MetadataParser

Bases: Parser

MetadataParser class, which parses the metadata from a record.

EXPECTED_KEYS = ['id', 'compartmentId', 'displayName', 'labelsSet', 'annotationFormat', 'datasetSourceDetails', 'datasetFormatDetails']

static parse(json_data: Dict[Any, Any]) → Metadata

Parses the content of the metadata JSONL file.

Parameters:

json_data (dict) – The content of the metadata JSONL file, as a dictionary.

Returns:

Metadata object containing the relevant fields from the metadata JSONL file.

Return type:

Metadata
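EXPECTED_KEYS lists the top-level fields the metadata parser looks for. The following sketch shows one way to pull exactly those fields out of a parsed metadata line. The function name extract_metadata_fields is hypothetical, and raising ValueError on missing keys is an assumption, since this reference does not say how ADS handles missing keys or how it builds the Metadata object.

```python
import json
from typing import Any, Dict, List

# Copied from MetadataParser.EXPECTED_KEYS above.
EXPECTED_KEYS: List[str] = [
    "id",
    "compartmentId",
    "displayName",
    "labelsSet",
    "annotationFormat",
    "datasetSourceDetails",
    "datasetFormatDetails",
]


def extract_metadata_fields(json_data: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the expected fields (hypothetical helper, not the ADS API).

    Raises ValueError if any expected key is missing; this is an assumption,
    not documented ADS behavior.
    """
    missing = [key for key in EXPECTED_KEYS if key not in json_data]
    if missing:
        raise ValueError(f"Metadata is missing expected keys: {missing}")
    return {key: json_data[key] for key in EXPECTED_KEYS}


# Usage: each line of a JSONL file is one JSON document.
line = json.dumps({key: f"<{key}>" for key in EXPECTED_KEYS + ["extraField"]})
fields = extract_metadata_fields(json.loads(line))
```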

ads.data_labeling.parser.export_record_parser module

class ads.data_labeling.parser.export_record_parser.BoundingBoxRecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)

Bases: RecordParser

BoundingBoxRecordParser class, which parses the labels of bounding box label data.

Initializes a RecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

RecordParser instance.

Return type:

RecordParser

class ads.data_labeling.parser.export_record_parser.EntityType

Bases: object

Entity type class for supporting multiple types of entities.

GENERIC = 'GENERIC'
IMAGEOBJECTSELECTION = 'IMAGEOBJECTSELECTION'
TEXTSELECTION = 'TEXTSELECTION'

class ads.data_labeling.parser.export_record_parser.MultiLabelRecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)

Bases: RecordParser

MultiLabelRecordParser class, which parses the labels of multi-label data.

Initializes a RecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

RecordParser instance.

Return type:

RecordParser

class ads.data_labeling.parser.export_record_parser.NERRecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)

Bases: RecordParser

NERRecordParser class, which parses the labels of named entity recognition (NER) data.

Initializes a RecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

RecordParser instance.

Return type:

RecordParser

class ads.data_labeling.parser.export_record_parser.RecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)

Bases: Parser

RecordParser class, which parses the labels from a record.

Examples

>>> from ads.data_labeling.parser.export_record_parser import SingleLabelRecordParser
>>> from ads.data_labeling.parser.export_record_parser import MultiLabelRecordParser
>>> from ads.data_labeling.parser.export_record_parser import NERRecordParser
>>> from ads.data_labeling.parser.export_record_parser import BoundingBoxRecordParser
>>> import fsspec
>>> import json
>>> from ads.common import auth as authutil
>>> # Create the parser once and reuse it for every line of the records file.
>>> parser = BoundingBoxRecordParser("source_data_path")
>>> labels = []
>>> with fsspec.open("/path/to/records_file.jsonl", **authutil.api_keys()) as f:
...     for line in f:
...         labels.append(parser.parse(json.loads(line)))

Initializes a RecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

RecordParser instance.

Return type:

RecordParser

parse(record: Dict) → Record

Extracts the annotations from the record content. Constructs and returns a Record instance containing the file path and the labels.

Parameters:

record (Dict) – Content of the record from the record file.

Returns:

Record instance which contains the file path as well as the annotations.

Return type:

Record

class ads.data_labeling.parser.export_record_parser.RecordParserFactory

Bases: object

RecordParserFactory class, which contains a list of registered parsers and allows new RecordParsers to be registered.

Current parsers include:
  • SingleLabelRecordParser

  • MultiLabelRecordParser

  • NERRecordParser

  • BoundingBoxRecordParser

static parser(annotation_type: str, dataset_source_path: str, format: str | None = None, categories: List[str] | None = None) → RecordParser

Gets the parser based on the annotation_type.

Parameters:
  • annotation_type (str) – Annotation type, which can be SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, or BOUNDING_BOX.

  • dataset_source_path (str) – Dataset source path.

  • format (str, optional) – Output format of annotations. Defaults to None. Can be None, "spacy" for Entity Extraction datasets, or "yolo" for Object Detection datasets. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When "spacy", it outputs List[Tuple]. When "yolo", it outputs List[List[Tuple]].

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

RecordParser corresponding to the annotation type.

Return type:

RecordParser

Raises:

ValueError – If annotation_type is not supported.
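For Entity Extraction data, the "spacy" output is a list of tuples. spaCy training data describes entities as (start, end, label) character spans with an exclusive end, so a plausible conversion is sketched below. The Entity class, its offset/length fields, and to_spacy are hypothetical stand-ins, not the ADS NERItem API, and the exact tuple layout ADS emits is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    """Hypothetical NER annotation covering text[offset:offset + length]."""

    label: str
    offset: int
    length: int


def to_spacy(entities: List[Entity]) -> List[Tuple[int, int, str]]:
    """Convert entities to spaCy-style (start, end, label) spans; end is exclusive."""
    return [(e.offset, e.offset + e.length, e.label) for e in entities]


text = "Ada Lovelace was born in London."
spans = to_spacy([Entity("PERSON", 0, 12), Entity("LOCATION", 25, 6)])
# spans == [(0, 12, "PERSON"), (25, 31, "LOCATION")]
# A spaCy v2-style training example would then be (text, {"entities": spans}).
```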

classmethod register(annotation_type: str, parser) → None

Registers a new parser.

Parameters:
  • annotation_type (str) – Annotation type, which can be SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, or BOUNDING_BOX.

  • parser (RecordParser) – A new Parser class to be registered.

Returns:

Nothing.

Return type:

None

class ads.data_labeling.parser.export_record_parser.SingleLabelRecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)

Bases: RecordParser

SingleLabelRecordParser class, which parses the labels of single-label data.

Initializes a RecordParser instance.

Parameters:
  • dataset_source_path (str) – Dataset source path.

  • format (str, optional) – Output format of annotations. Defaults to None.

  • categories (List[str], optional) – The list of object categories, in the order expected for model training. Example: ['cat', 'dog', 'horse']. Defaults to None.

Returns:

RecordParser instance.

Return type:

RecordParser

Module contents