ads.data_labeling.parser package
Submodules
ads.data_labeling.parser.dls_record_parser module
- exception ads.data_labeling.parser.dls_record_parser.AnnotationNotFoundError(id: str)
Bases:
Exception
- class ads.data_labeling.parser.dls_record_parser.DLSBoundingBoxRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)
Bases:
DLSRecordParser
DLSBoundingBoxRecordParser class, which parses the labels of bounding box data.
Initializes a DLSRecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
DLSRecordParser instance.
- Return type:
DLSRecordParser
- class ads.data_labeling.parser.dls_record_parser.DLSMultiLabelRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)
Bases:
DLSRecordParser
DLSMultiLabelRecordParser class, which parses the labels of multi-label data.
Initializes a DLSRecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
DLSRecordParser instance.
- Return type:
DLSRecordParser
- class ads.data_labeling.parser.dls_record_parser.DLSNERRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)
Bases:
DLSRecordParser
DLSNERRecordParser class, which parses the labels of NER data.
Initializes a DLSRecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
DLSRecordParser instance.
- Return type:
DLSRecordParser
- class ads.data_labeling.parser.dls_record_parser.DLSRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)
Bases:
Parser
DLSRecordParser class, which parses the labels from the record.
Initializes a DLSRecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
DLSRecordParser instance.
- Return type:
DLSRecordParser
- parse(oci_record_summary: OCIRecordSummary) → Record
Extracts the annotations. Constructs and returns a Record instance which contains the file path and the labels.
- Parameters:
oci_record_summary (OCIRecordSummary) – The summary information about the record.
- Returns:
Record instance which contains the file path and the annotations.
- Return type:
Record
- class ads.data_labeling.parser.dls_record_parser.DLSRecordParserFactory
Bases:
object
DLSRecordParserFactory class, which contains a list of registered parsers and allows new DLSRecordParsers to be registered.
- Current parsers include:
DLSSingleLabelRecordParser
DLSMultiLabelRecordParser
DLSNERRecordParser
DLSBoundingBoxRecordParser
- static parser(annotation_type: str, dataset_source_path: str, format: str | None = None, categories: List[str] | None = None, auth: dict | None = None) → DLSRecordParser
Gets the parser based on the annotation_type.
- Parameters:
annotation_type (str) – Annotation type, one of SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, or BOUNDING_BOX.
dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, "spacy" for Entity Extraction datasets, or "yolo" for Object Detection datasets. When None, the output is List[NERItem] or List[BoundingBoxItem]. When "spacy", the output is List[Tuple]. When "yolo", the output is List[List[Tuple]].
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
- Returns:
DLSRecordParser corresponding to the annotation type.
- Return type:
DLSRecordParser
- Raises:
ValueError – If annotation_type is not supported.
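The role of categories and of the "yolo" format can be illustrated with a minimal sketch. This is not the library's implementation: the to_yolo helper and the corner-based box input are hypothetical, and the tuple layout (class_index, x_center, y_center, width, height) follows the common YOLO convention with normalized coordinates, which is assumed here rather than taken from this reference.

```python
from typing import List, Tuple


def to_yolo(
    label: str,
    box: Tuple[float, float, float, float],
    categories: List[str],
) -> Tuple[int, float, float, float, float]:
    """Convert one labelled box to a YOLO-style tuple (hypothetical helper).

    `box` is (x_min, y_min, x_max, y_max), normalized to [0, 1].
    """
    if label not in categories:
        raise ValueError(f"Unknown label: {label!r}")
    x_min, y_min, x_max, y_max = box
    # The position of the label in `categories` serves as the class index,
    # which is why the order of the list matters for training.
    class_index = categories.index(label)
    return (
        class_index,
        (x_min + x_max) / 2,  # x_center
        (y_min + y_max) / 2,  # y_center
        x_max - x_min,        # width
        y_max - y_min,        # height
    )


categories = ["cat", "dog", "horse"]
yolo_item = to_yolo("dog", (0.25, 0.25, 0.75, 0.5), categories)
```

Reordering categories changes the class index assigned to each label, so the same list should be reused for every record in a dataset.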
- classmethod register(annotation_type: str, parser: DLSRecordParser) → None
Registers a new parser.
- Parameters:
annotation_type (str) – Annotation type, one of SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, or BOUNDING_BOX.
parser (DLSRecordParser) – A new Parser class to be registered.
- Returns:
Nothing.
- Return type:
None
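The parser and register methods describe a registry-based factory. The sketch below shows that pattern in isolation, under stated assumptions: BaseParser, SingleLabelParser, KeypointParser, ParserFactory, and the "KEYPOINT" annotation type are hypothetical stand-ins, not names from the ads package, and the real factory may store and instantiate parsers differently.

```python
from typing import Dict, Type


class BaseParser:
    """Stand-in for a record parser base class (hypothetical)."""

    def __init__(self, dataset_source_path: str):
        self.dataset_source_path = dataset_source_path


class SingleLabelParser(BaseParser):
    """Stand-in for a single-label parser (hypothetical)."""


class ParserFactory:
    """Registry-based factory mapping annotation types to parser classes."""

    _parsers: Dict[str, Type[BaseParser]] = {"SINGLE_LABEL": SingleLabelParser}

    @staticmethod
    def parser(annotation_type: str, dataset_source_path: str) -> BaseParser:
        # Look up the class for this annotation type and instantiate it.
        if annotation_type not in ParserFactory._parsers:
            raise ValueError(f"Unsupported annotation type: {annotation_type}")
        return ParserFactory._parsers[annotation_type](dataset_source_path)

    @classmethod
    def register(cls, annotation_type: str, parser: Type[BaseParser]) -> None:
        # Add or replace the parser class for this annotation type.
        cls._parsers[annotation_type] = parser


class KeypointParser(BaseParser):
    """A custom parser added at run time (hypothetical)."""


ParserFactory.register("KEYPOINT", KeypointParser)
custom = ParserFactory.parser("KEYPOINT", "/path/to/dataset")
```

Keeping the mapping in one class-level dictionary means that callers only need the annotation type string, and new parser classes can be added without editing the factory.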
- class ads.data_labeling.parser.dls_record_parser.DLSSingleLabelRecordParser(dataset_source_path: str, auth: dict | None = None, format: str | None = None, categories: List[str] | None = None)
Bases:
DLSRecordParser
DLSSingleLabelRecordParser class, which parses the labels of single-label data.
Initializes a DLSRecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
DLSRecordParser instance.
- Return type:
DLSRecordParser
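ads.data_labeling.parser.export_metadata_parser module
ads.data_labeling.parser.export_record_parser module
- class ads.data_labeling.parser.export_record_parser.BoundingBoxRecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)
Bases:
RecordParser
BoundingBoxRecordParser class, which parses the labels of bounding box data.
Initializes a RecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
RecordParser instance.
- Return type:
RecordParser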
- class ads.data_labeling.parser.export_record_parser.EntityType
Bases:
object
Entity type class for supporting multiple types of entities.
- GENERIC = 'GENERIC'
- IMAGEOBJECTSELECTION = 'IMAGEOBJECTSELECTION'
- TEXTSELECTION = 'TEXTSELECTION'
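- class ads.data_labeling.parser.export_record_parser.MultiLabelRecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)
Bases:
RecordParser
MultiLabelRecordParser class, which parses the labels of multi-label data.
Initializes a RecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
RecordParser instance.
- Return type:
RecordParser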
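- class ads.data_labeling.parser.export_record_parser.NERRecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)
Bases:
RecordParser
NERRecordParser class, which parses the labels of NER data.
Initializes a RecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
RecordParser instance.
- Return type:
RecordParser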
- class ads.data_labeling.parser.export_record_parser.RecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)
Bases:
Parser
RecordParser class, which parses the labels from the record.
Examples
>>> from ads.data_labeling.parser.export_record_parser import SingleLabelRecordParser
>>> from ads.data_labeling.parser.export_record_parser import MultiLabelRecordParser
>>> from ads.data_labeling.parser.export_record_parser import NERRecordParser
>>> from ads.data_labeling.parser.export_record_parser import BoundingBoxRecordParser
>>> import fsspec
>>> import json
>>> from ads.common import auth as authutil
>>> labels = []
>>> with fsspec.open("/path/to/records_file.jsonl", **authutil.api_keys()) as f:
...     for line in f:
...         bounding_box_labels = BoundingBoxRecordParser("source_data_path").parse(json.loads(line))
...         labels.append(bounding_box_labels)
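Initializes a RecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
RecordParser instance.
- Return type:
RecordParser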
- parse(record: Dict) → Record
Extracts the annotations from the record content. Constructs and returns a Record instance containing the file path and the labels.
- Parameters:
record (Dict) – Content of the record from the record file.
- Returns:
Record instance which contains the file path as well as the annotations.
- Return type:
Record
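The steps that parse describes (read the record content, extract the annotations, return the file path together with the labels) can be sketched as follows. Everything here is an assumption made for illustration: SimpleRecord, AnnotationNotFound, and parse_record are stand-ins, and the record keys ("id", "sourceDetails", "annotations", "entities", "labels", "label_name") are an assumed layout that this reference does not confirm.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


class AnnotationNotFound(Exception):
    """Raised when a record carries no annotations (stand-in exception)."""


@dataclass
class SimpleRecord:
    """Stand-in for a Record: a file path plus its labels."""

    path: str
    annotation: List[str]


def parse_record(record: Dict, dataset_source_path: str) -> SimpleRecord:
    """Extract the file path and label names from one record dict.

    The dictionary keys used below are an assumed layout, for illustration only.
    """
    annotations = record.get("annotations")
    if not annotations:
        raise AnnotationNotFound(record.get("id", "<unknown>"))
    # Collect every label name attached to the first annotation's entities.
    labels = [
        label["label_name"]
        for entity in annotations[0]["entities"]
        for label in entity["labels"]
    ]
    # Join the dataset source path and the record's relative path.
    path = dataset_source_path.rstrip("/") + "/" + record["sourceDetails"]["path"]
    return SimpleRecord(path=path, annotation=labels)


# One line of a hypothetical JSON Lines records file.
line = json.dumps({
    "id": "rec-1",
    "sourceDetails": {"path": "images/001.jpg"},
    "annotations": [
        {"entities": [{"entityType": "GENERIC", "labels": [{"label_name": "cat"}]}]}
    ],
})
parsed = parse_record(json.loads(line), "/data/source/")
```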
- class ads.data_labeling.parser.export_record_parser.RecordParserFactory
Bases:
object
RecordParserFactory class, which contains a list of registered parsers and allows new RecordParsers to be registered.
- Current parsers include:
SingleLabelRecordParser
MultiLabelRecordParser
NERRecordParser
BoundingBoxRecordParser
- static parser(annotation_type: str, dataset_source_path: str, format: str | None = None, categories: List[str] | None = None) → RecordParser
Gets the parser based on the annotation_type.
- Parameters:
annotation_type (str) – Annotation type, one of SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, or BOUNDING_BOX.
dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, "spacy" for Entity Extraction datasets, or "yolo" for Object Detection datasets. When None, the output is List[NERItem] or List[BoundingBoxItem]. When "spacy", the output is List[Tuple]. When "yolo", the output is List[List[Tuple]].
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
RecordParser corresponding to the annotation type.
- Return type:
RecordParser
- Raises:
ValueError – If annotation_type is not supported.
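For the "spacy" format, the List[Tuple] output can be pictured with a short sketch. It assumes that each tuple follows spaCy's entity-offset convention of (start_char, end_char, label); the EntitySpan class and the to_spacy helper are hypothetical stand-ins and are not part of the ads package.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EntitySpan:
    """Stand-in for an entity annotation: label plus character offset and length."""

    label: str
    offset: int
    length: int


def to_spacy(spans: List[EntitySpan]) -> List[Tuple[int, int, str]]:
    """Convert spans to (start_char, end_char, label) tuples, sorted by start."""
    return sorted((s.offset, s.offset + s.length, s.label) for s in spans)


text = "Alice flew to Paris."
spans = [EntitySpan("LOC", 14, 5), EntitySpan("PER", 0, 5)]
spacy_entities = to_spacy(spans)

# Each tuple slices the original text back to the annotated entity.
entity_texts = [text[start:end] for start, end, _ in spacy_entities]
```

The end offset is exclusive, so text[start:end] recovers exactly the annotated characters.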
- classmethod register(annotation_type: str, parser) → None
Registers a new parser.
- Parameters:
annotation_type (str) – Annotation type, one of SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION, or BOUNDING_BOX.
parser (RecordParser) – A new Parser class to be registered.
- Returns:
Nothing.
- Return type:
None
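- class ads.data_labeling.parser.export_record_parser.SingleLabelRecordParser(dataset_source_path: str, format: str | None = None, categories: List[str] | None = None)
Bases:
RecordParser
SingleLabelRecordParser class, which parses the labels of single-label data.
Initializes a RecordParser instance.
- Parameters:
dataset_source_path (str) – Dataset source path.
format ((str, optional). Defaults to None.) – Output format of annotations.
categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: ['cat', 'dog', 'horse']
- Returns:
RecordParser instance.
- Return type:
RecordParser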