ads.data_labeling.reader package#

Submodules#

ads.data_labeling.reader.dataset_reader module#

This module contains classes for reading labeled datasets, either from an export or from the cloud.

Classes#

LabeledDatasetReader

The LabeledDatasetReader class reads a labeled dataset.

ExportReader

The ExportReader class reads a labeled dataset from an export.

DLSDatasetReader

The DLSDatasetReader class reads a labeled dataset from the cloud.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling import LabeledDatasetReader
>>> ds_reader = LabeledDatasetReader.from_export(
...    path="oci://bucket_name@namespace/dataset_metadata.jsonl",
...    auth=authutil.api_keys(),
...    materialize=True
... )
>>> ds_reader.info()
    ------------------------------------------------------------------------
    annotation_type                                             SINGLE_LABEL
    compartment_id                                          TEST_COMPARTMENT
    dataset_id                                                  TEST_DATASET
    dataset_name                                           test_dataset_name
    dataset_type                                                        TEXT
    labels                                                     ['yes', 'no']
    records_path                                             path/to/records
    source_path                                              path/to/dataset
>>> ds_reader.read()
                             Path            Content            Annotations
    -----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no
    2   path/to/the/content/file3       file content                     no
>>> next(ds_reader.read(iterator=True))
    ("path/to/the/content/file1", "file content", "yes")
>>> next(ds_reader.read(iterator=True, chunksize=2))
    [("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no")]
>>> next(ds_reader.read(chunksize=2))
                            Path            Content            Annotations
    ----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no
>>> ds_reader = LabeledDatasetReader.from_DLS(
...    dataset_id="dataset_OCID",
...    compartment_id="compartment_OCID",
...    auth=authutil.api_keys(),
...    materialize=True
... )
class ads.data_labeling.reader.dataset_reader.DLSDatasetReader(dataset_id: str, compartment_id: str, auth: Dict, encoding='utf-8', materialize: bool = False, include_unlabeled: bool = False)[source]#

Bases: Reader

The DLSDatasetReader class to read labeled dataset from the cloud.

info(self) Metadata[source]#

Gets the labeled dataset metadata.

read(self) Generator[Tuple, Any, Any][source]#

Reads the labeled dataset.

Initializes the DLS dataset reader instance.

Parameters:
  • dataset_id (str) – The dataset OCID.

  • compartment_id (str) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files. The encoding is used to extract the metadata information of the labeled dataset and also to extract the content of the text dataset records.

  • materialize ((bool, optional). Defaults to False.) – Whether the content of dataset files should be loaded/materialized or not. By default the content will not be materialized.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

Raises:
  • ValueError – When dataset_id is empty or not a string.

  • TypeError – When dataset_id is not a string.

info() Metadata[source]#

Gets the labeled dataset metadata.

Returns:

The labeled dataset metadata.

Return type:

Metadata

read(format: str | None = None) Generator[Tuple, Any, Any][source]#

Reads the labeled dataset records.

Parameters:

format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

Returns:

The labeled dataset records.

Return type:

Generator[Tuple, Any, Any]

class ads.data_labeling.reader.dataset_reader.ExportReader(path: str, auth: Dict | None = None, encoding='utf-8', materialize: bool = False, include_unlabeled: bool = False)[source]#

Bases: Reader

The ExportReader class to read labeled dataset from the export.

info(self) Metadata[source]#

Gets the labeled dataset metadata.

read(self) Generator[Tuple, Any, Any][source]#

Reads the labeled dataset.

Initializes the labeled dataset export reader instance.

Parameters:
  • path (str) – The metadata file path, can be either local or object storage path.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files. The encoding is used to extract the metadata information of the labeled dataset and also to extract the content of the text dataset records.

  • materialize ((bool, optional). Defaults to False.) – Whether the content of dataset files should be loaded/materialized or not. By default the content will not be materialized.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

info() Metadata[source]#

Gets the labeled dataset metadata.

Returns:

The labeled dataset metadata.

Return type:

Metadata

read(format: str | None = None) Generator[Tuple, Any, Any][source]#

Reads the labeled dataset records.

Parameters:

format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

Returns:

The labeled dataset records.

Return type:

Generator[Tuple, Any, Any]

class ads.data_labeling.reader.dataset_reader.LabeledDatasetReader(reader: Reader)[source]#

Bases: object

The labeled dataset reader class.

info(self) Metadata[source]#

Gets labeled dataset metadata.

read(self, iterator: bool = False) Generator[Any, Any, Any] | pd.DataFrame[source]#

Reads labeled dataset.

from_export(cls, path: str, auth: Dict = None, encoding='utf-8', materialize: bool = False) 'LabeledDatasetReader'[source]#

Constructs a Labeled Dataset Reader instance.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling import LabeledDatasetReader
>>> ds_reader = LabeledDatasetReader.from_export(
...    path="oci://bucket_name@namespace/dataset_metadata.jsonl",
...    auth=authutil.api_keys(),
...    materialize=True
... )
>>> ds_reader = LabeledDatasetReader.from_DLS(
...    dataset_id="dataset_OCID",
...    compartment_id="compartment_OCID",
...    auth=authutil.api_keys(),
...    materialize=True
... )
>>> ds_reader.info()
    ------------------------------------------------------------------------
    annotation_type                                             SINGLE_LABEL
    compartment_id                                          TEST_COMPARTMENT
    dataset_id                                                  TEST_DATASET
    dataset_name                                           test_dataset_name
    dataset_type                                                        TEXT
    labels                                                     ['yes', 'no']
    records_path                                             path/to/records
    source_path                                              path/to/dataset
>>> ds_reader.read()
                             Path            Content            Annotations
    -----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no
    2   path/to/the/content/file3       file content                     no
>>> next(ds_reader.read(iterator=True))
    ("path/to/the/content/file1", "file content", "yes")
>>> next(ds_reader.read(iterator=True, chunksize=2))
    [("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no")]
>>> next(ds_reader.read(chunksize=2))
                            Path            Content            Annotations
    ----------------------------------------------------------------------
    0   path/to/the/content/file1       file content                    yes
    1   path/to/the/content/file2       file content                     no

Initializes the labeled dataset reader instance.

Parameters:

reader (Reader) – The Reader instance which reads and extracts the labeled dataset.

classmethod from_DLS(dataset_id: str, compartment_id: str | None = None, auth: dict | None = None, encoding: str = 'utf-8', materialize: bool = False, include_unlabeled: bool = False) LabeledDatasetReader[source]#

Constructs Labeled Dataset Reader instance.

Parameters:
  • dataset_id (str) – The dataset OCID.

  • compartment_id (str. Defaults to the compartment_id from the env variable.) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.

  • materialize ((bool, optional). Defaults to False.) – Whether to load the content of the dataset files or to return only the file paths to the content. By default the content is not loaded.

Returns:

The LabeledDatasetReader instance.

Return type:

LabeledDatasetReader

classmethod from_export(path: str, auth: dict | None = None, encoding: str = 'utf-8', materialize: bool = False, include_unlabeled: bool = False) LabeledDatasetReader[source]#

Constructs Labeled Dataset Reader instance.

Parameters:
  • path (str) – The metadata file path, can be either local or object storage path.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.

  • materialize ((bool, optional). Defaults to False.) – Whether to load the content of the dataset files or to return only the file paths to the content. By default the content is not loaded.

Returns:

The LabeledDatasetReader instance.

Return type:

LabeledDatasetReader

info() Serializable[source]#

Gets the labeled dataset metadata.

Returns:

The labeled dataset metadata.

Return type:

Metadata

read(iterator: bool = False, format: str | None = None, chunksize: int | None = None) Generator[Any, Any, Any] | DataFrame[source]#

Reads the labeled dataset records.

Parameters:
  • iterator ((bool, optional). Defaults to False.) – True if the result should be represented as a Generator. False if the result should be represented as a Pandas DataFrame.

  • format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” or “yolo”.

  • chunksize ((int, optional). Defaults to None.) – The number of records that should be read in one iteration. The result will be returned in a generator format.

Returns:

The labeled dataset records, as one of Union[Generator[Tuple[str, str, Any], Any, Any], Generator[List[Tuple[str, str, Any]], Any, Any], Generator[pd.DataFrame, Any, Any], pd.DataFrame]:

  • pd.DataFrame if iterator and chunksize are not specified.

  • Generator[pd.DataFrame] if iterator is False and chunksize is specified.

  • Generator[List[Tuple[str, str, Any]]] if iterator is True and chunksize is specified.

  • Generator[Tuple[str, str, Any]] if iterator is True and chunksize is not specified.
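
The way iterator and chunksize select the return shape can be sketched without the library. The block below is a standard-library illustration only, not the ADS implementation: read_sketch, as_table and chunks are hypothetical names, and a list of dicts stands in for a pandas DataFrame.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str, Any]  # (path, content, annotation)
COLUMNS = ("Path", "Content", "Annotations")


def as_table(rows: Iterable[Record]) -> List[Dict[str, Any]]:
    """Stand-in for a pandas DataFrame: one dict per row, keyed by column name."""
    return [dict(zip(COLUMNS, row)) for row in rows]


def chunks(rows: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` records until the input is exhausted."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def read_sketch(
    rows: Iterable[Record],
    iterator: bool = False,
    chunksize: Optional[int] = None,
):
    """Hypothetical mirror of the documented iterator/chunksize dispatch."""
    if iterator and chunksize:
        return chunks(rows, chunksize)  # iterator of record lists
    if iterator:
        return iter(rows)  # iterator of single records
    if chunksize:
        return (as_table(c) for c in chunks(rows, chunksize))  # iterator of tables
    return as_table(rows)  # one table with every record


rows = [
    ("path/to/the/content/file1", "file content", "yes"),
    ("path/to/the/content/file2", "file content", "no"),
    ("path/to/the/content/file3", "file content", "no"),
]
first = next(read_sketch(rows, iterator=True))
first_pair = next(read_sketch(rows, iterator=True, chunksize=2))
```

Passing chunksize always produces something to iterate over, so a caller that wants a single table should leave both arguments at their defaults.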

ads.data_labeling.reader.dls_record_reader module#

class ads.data_labeling.reader.dls_record_reader.DLSRecordReader(dataset_id: str, compartment_id: str, auth: dict | None = None)[source]#

Bases: Reader

DLS Record Reader Class that reads records from the cloud into ADS format.

Initiates a DLSRecordReader instance.

Parameters:
  • dataset_id (str) – The dataset OCID.

  • compartment_id (str) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

read() Generator[OCIRecordSummary, Any, Any][source]#

Reads OCI records.

Yields:

OCIRecordSummary – The OCIRecordSummary instance.

class ads.data_labeling.reader.dls_record_reader.OCIRecordSummary(record: RecordSummary | None = None, annotation: List[AnnotationSummary] | None = None)[source]#

Bases: object

The class representing a labeled record in ADS format.

record#

OCI RecordSummary.

Type:

RecordSummary

annotation#

List of OCI AnnotationSummary.

Type:

List[AnnotationSummary]

annotation: List[AnnotationSummary] = None#
record: RecordSummary = None#
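
A record summary pairs one record with the list of annotations that belong to it. The sketch below illustrates that pairing with plain dicts in place of the OCI RecordSummary and AnnotationSummary models. RecordSummarySketch, pair_records and the "id" and "record_id" keys are assumptions made for this illustration. How DLSRecordReader actually matches annotations to records is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Optional


@dataclass
class RecordSummarySketch:
    """Hypothetical stand-in for OCIRecordSummary, using dicts for the OCI models."""

    record: Optional[Dict[str, Any]] = None
    annotation: Optional[List[Dict[str, Any]]] = None


def pair_records(
    records: Iterable[Dict[str, Any]],
    annotations: Iterable[Dict[str, Any]],
) -> Iterator[RecordSummarySketch]:
    """Group annotations by the record they point to, then yield one summary per record."""
    by_record: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for item in annotations:
        by_record[item["record_id"]].append(item)
    for record in records:
        # Records without annotations get an empty list rather than None.
        yield RecordSummarySketch(record=record, annotation=by_record.get(record["id"], []))


records = [{"id": "r1"}, {"id": "r2"}]
annotations = [
    {"record_id": "r1", "label": "yes"},
    {"record_id": "r1", "label": "maybe"},
]
summaries = list(pair_records(records, annotations))
```
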
exception ads.data_labeling.reader.dls_record_reader.ReadAnnotationsError(dataset_id: str)[source]#

Bases: Exception

exception ads.data_labeling.reader.dls_record_reader.ReadRecordsError(dataset_id: str)[source]#

Bases: Exception

ads.data_labeling.reader.export_record_reader module#

class ads.data_labeling.reader.export_record_reader.ExportRecordReader(path: str, auth: Dict | None = None, encoding='utf-8', includes_metadata: bool = False)[source]#

Bases: JsonlReader

The ExportRecordReader class to read labeled dataset records from the export.

read(self) Generator[Dict, Any, Any][source]#

Reads labeled dataset records.

Initiates an ExportRecordReader instance.

Parameters:
  • path (str) – object storage path or local path for a file.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding of files. Only used for “TEXT” dataset.

  • includes_metadata ((bool, optional). Defaults to False.) – Determines whether the export file includes metadata or not.

Examples

>>> from ads.data_labeling.reader.export_record_reader import ExportRecordReader
>>> path = "your/path/to/jsonl/file.jsonl"
>>> from ads.common import auth as authutil
>>> reader = ExportRecordReader(path=path, auth=authutil.api_keys(), encoding="utf-8")
>>> next(reader.read())
read() Generator[Dict, Any, Any][source]#

Reads labeled dataset records.

Returns:

The labeled dataset records.

Return type:

Generator[Dict, Any, Any]

ads.data_labeling.reader.jsonl_reader module#

class ads.data_labeling.reader.jsonl_reader.JsonlReader(path: str, auth: Dict | None = None, encoding='utf-8')[source]#

Bases: Reader

JsonlReader class which reads the file.

Initiates a JsonlReader object.

Parameters:
  • path (str) – object storage path or local path for a file.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding of files. Only used for “TEXT” dataset.

Examples

>>> from ads.data_labeling.reader.jsonl_reader import JsonlReader
>>> path = "your/path/to/jsonl/file.jsonl"
>>> from ads.common import auth as authutil
>>> reader = JsonlReader(path=path, auth=authutil.api_keys(), encoding="utf-8")
>>> next(reader.read())
read(skip: int | None = None) Generator[Dict, Any, Any][source]#

Reads and yields the content of the file.

Parameters:

skip ((int, optional). Defaults to None.) – The number of records that should be skipped.

Returns:

The content of the file.

Return type:

Generator[Dict, Any, Any]
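
Reading a JSON Lines file while skipping leading records can be sketched with the standard library alone. This is a local-file illustration, not the JsonlReader implementation: read_jsonl and demo are hypothetical names, object storage paths and authentication are left out, and counting only non-empty lines toward skip is an assumption.

```python
import json
import os
import tempfile
from typing import Any, Dict, Generator, List, Optional


def read_jsonl(
    path: str,
    encoding: str = "utf-8",
    skip: Optional[int] = None,
) -> Generator[Dict[str, Any], None, None]:
    """Yield one dict per non-empty line, ignoring the first `skip` records."""
    with open(path, encoding=encoding) as f:
        non_empty = (line for line in f if line.strip())
        for index, line in enumerate(non_empty):
            if skip is not None and index < skip:
                continue  # skipped lines are never parsed
            yield json.loads(line)


def demo() -> List[Dict[str, Any]]:
    """Write three records to a temporary file and read them back, skipping one."""
    items = [{"id": 1}, {"id": 2}, {"id": 3}]
    with tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("\n".join(json.dumps(item) for item in items))
        path = tmp.name
    try:
        return list(read_jsonl(path, skip=1))
    finally:
        os.remove(path)
```

Because the function is a generator, the file is opened only when iteration starts and is closed once the last record has been yielded.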

ads.data_labeling.reader.metadata_reader module#

class ads.data_labeling.reader.metadata_reader.DLSMetadataReader(dataset_id: str, compartment_id: str, auth: dict)[source]#

Bases: Reader

DLSMetadataReader class which reads the metadata jsonl file from the cloud.

Initializes the DLS metadata reader instance.

Parameters:
  • dataset_id (str) – The dataset OCID.

  • compartment_id (str) – The compartment OCID of the dataset.

  • auth (dict) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

Raises:
  • ValueError – When dataset_id is empty or not a string.

  • TypeError – When dataset_id is not a string.

read() Metadata[source]#

Reads the content from the metadata file.

Returns:

The metadata of the labeled dataset.

Return type:

Metadata

exception ads.data_labeling.reader.metadata_reader.DatasetNotFoundError(id: str)[source]#

Bases: Exception

exception ads.data_labeling.reader.metadata_reader.EmptyMetadata[source]#

Bases: Exception

Empty Metadata.

class ads.data_labeling.reader.metadata_reader.ExportMetadataReader(path: str, auth: Dict | None = None, encoding='utf-8')[source]#

Bases: JsonlReader

ExportMetadataReader class which reads the metadata jsonl file from local/object storage path.

Initiates a JsonlReader object.

Parameters:
  • path (str) – object storage path or local path for a file.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding of files. Only used for “TEXT” dataset.

Examples

>>> from ads.data_labeling.reader.jsonl_reader import JsonlReader
>>> path = "your/path/to/jsonl/file.jsonl"
>>> from ads.common import auth as authutil
>>> reader = JsonlReader(path=path, auth=authutil.api_keys(), encoding="utf-8")
>>> next(reader.read())
read() Metadata[source]#

Reads the content from the metadata file.

Returns:

The metadata of the labeled dataset.

Return type:

Metadata

class ads.data_labeling.reader.metadata_reader.MetadataReader(reader: Reader)[source]#

Bases: object

MetadataReader class which reads and extracts the labeled dataset metadata.

Examples

>>> from ads.data_labeling import MetadataReader
>>> import oci
>>> import os
>>> from ads.common import auth as authutil
>>> reader = MetadataReader.from_export_file("metadata_export_file_path",
...                                 auth=authutil.api_keys())
>>> reader.read()

Initiate a MetadataReader instance.

Parameters:

reader (Reader) – Reader instance which reads and extracts the labeled dataset metadata.

classmethod from_DLS(dataset_id: str, compartment_id: str | None = None, auth: dict | None = None) MetadataReader[source]#

Constructs a MetadataReader instance.

Parameters:
  • dataset_id (str) – The dataset OCID.

  • compartment_id ((str, optional). Defaults to None.) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

Returns:

The MetadataReader instance whose reader is a DLSMetadataReader instance.

Return type:

MetadataReader

classmethod from_export_file(path: str, auth: Dict | None = None) MetadataReader[source]#

Constructs a MetadataReader instance.

Parameters:
  • path (str) – metadata file path, can be either local or object storage path.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

Returns:

The MetadataReader instance whose reader is an ExportMetadataReader instance.

Return type:

MetadataReader

read() Metadata[source]#

Reads the content from the metadata file.

Returns:

The metadata of the labeled dataset.

Return type:

Metadata

exception ads.data_labeling.reader.metadata_reader.ReadDatasetError(id: str)[source]#

Bases: Exception

ads.data_labeling.reader.record_reader module#

class ads.data_labeling.reader.record_reader.RecordReader(reader: Reader, parser: Parser, loader: Loader | None = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False)[source]#

Bases: object

The RecordReader class combines a reader, a parser and a loader. The reader reads the content from the record file, the parser parses the label for each record, and the loader loads the content of the file referenced by that record.

Examples

>>> import os
>>> import oci
>>> from ads.data_labeling import RecordReader
>>> from ads.common import auth as authutil
>>> file_path = "/path/to/your_record.jsonl"
>>> dataset_type = "IMAGE"
>>> annotation_type = "BOUNDING_BOX"
>>> record_reader = RecordReader.from_export_file(file_path, dataset_type, annotation_type, "image_file_path", authutil.api_keys())
>>> next(record_reader.read())

Initiates a RecordReader instance.

Parameters:
  • reader (Reader) – Reader instance to read content from the record file.

  • parser (Parser) – Parser instance to parse the labels from record file.

  • loader (Loader. Defaults to None.) – Loader instance to load the content from the file path in the record.

  • materialize (bool, optional. Defaults to False.) – Whether to materialize the content using loader.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

  • encoding (str, optional) – Encoding for text files. Used only to extract the content of text dataset records.

Raises:

ValueError – If the record reader or the record parser is not specified, or if the loader is not specified when materialize is True.
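
The reader, parser and loader roles, together with the two ValueError conditions, can be sketched as a small pipeline. Everything in this block is a hypothetical stand-in rather than the RecordReader implementation: RecordPipeline takes plain callables, the "path" and "label" keys are invented for the example, and yielding None as the content when materialize is False is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional, Tuple

RawRecord = Dict[str, Any]


class RecordPipeline:
    """Hypothetical stand-in that composes a reader, a parser and a loader."""

    def __init__(
        self,
        reader: Optional[Callable[[], Iterable[RawRecord]]],
        parser: Optional[Callable[[RawRecord], Any]],
        loader: Optional[Callable[[str], Any]] = None,
        include_unlabeled: bool = False,
        materialize: bool = False,
    ) -> None:
        if reader is None or parser is None:
            raise ValueError("Both the reader and the parser must be specified.")
        if materialize and loader is None:
            raise ValueError("A loader must be specified when materialize is True.")
        self.reader = reader
        self.parser = parser
        self.loader = loader
        self.include_unlabeled = include_unlabeled
        self.materialize = materialize

    def read(self) -> Iterator[Tuple[str, Any, Any]]:
        """Yield (path, content, label) for each record the reader produces."""
        for record in self.reader():
            label = self.parser(record)
            if label is None and not self.include_unlabeled:
                continue  # drop unlabeled records unless they were requested
            path = record["path"]
            content = self.loader(path) if self.materialize else None
            yield path, content, label


raw = [{"path": "a.txt", "label": "yes"}, {"path": "b.txt", "label": None}]
pipeline = RecordPipeline(
    reader=lambda: raw,
    parser=lambda record: record["label"],
    loader=lambda path: f"content of {path}",
    materialize=True,
)
result = list(pipeline.read())
```

Validating in the constructor means a missing loader is reported when the pipeline is built, not partway through iteration.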

classmethod from_DLS(dataset_id: str, dataset_type: str, annotation_type: str, dataset_source_path: str, compartment_id: str | None = None, auth: Dict | None = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False, format: str | None = None, categories: List[str] | None = None) RecordReader[source]#

Constructs Record Reader instance.

Parameters:
  • dataset_id (str) – The dataset OCID.

  • dataset_type (str) – Dataset type. Currently supports TEXT, IMAGE and DOCUMENT.

  • annotation_type (str) – Annotation Type. Currently TEXT supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION. IMAGE supports SINGLE_LABEL, MULTI_LABEL and BOUNDING_BOX. DOCUMENT supports SINGLE_LABEL and MULTI_LABEL.

  • dataset_source_path (str) – Dataset source path.

  • compartment_id ((str, optional). Defaults to None.) – The compartment OCID of the dataset.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • encoding ((str, optional). Defaults to 'utf-8'.) – Encoding for files.

  • materialize ((bool, optional). Defaults to False.) – Whether to load the content of the dataset files or to return only the file paths to the content. By default the content is not loaded.

  • format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: [‘cat’,’dog’,’horse’]

Returns:

The RecordReader instance.

Return type:

RecordReader

classmethod from_export_file(path: str, dataset_type: str, annotation_type: str, dataset_source_path: str, auth: Dict | None = None, include_unlabeled: bool = False, encoding: str = 'utf-8', materialize: bool = False, format: str | None = None, categories: List[str] | None = None, includes_metadata=False) RecordReader[source]#

Initiates a RecordReader instance.

Parameters:
  • path (str) – Record file path.

  • dataset_type (str) – Dataset type. Currently supports TEXT, IMAGE and DOCUMENT.

  • annotation_type (str) – Annotation Type. Currently TEXT supports SINGLE_LABEL, MULTI_LABEL, ENTITY_EXTRACTION. IMAGE supports SINGLE_LABEL, MULTI_LABEL and BOUNDING_BOX. DOCUMENT supports SINGLE_LABEL and MULTI_LABEL.

  • dataset_source_path (str) – Dataset source path.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

  • include_unlabeled ((bool, optional). Defaults to False.) – Whether to load the unlabeled records or not.

  • encoding ((str, optional). Defaults to "utf-8".) – Encoding for text files. Used only to extract the content of text dataset records.

  • materialize ((bool, optional). Defaults to False.) – Whether to materialize the content by loader.

  • format ((str, optional). Defaults to None.) – Output format of annotations. Can be None, “spacy” for dataset Entity Extraction type or “yolo” for Object Detection type. When None, it outputs List[NERItem] or List[BoundingBoxItem]. When “spacy”, it outputs List[Tuple]. When “yolo”, it outputs List[List[Tuple]].

  • categories ((List[str], optional). Defaults to None.) – The list of object categories in proper order for model training. Example: [‘cat’,’dog’,’horse’]

  • includes_metadata ((bool, optional). Defaults to False.) – Determines whether the export file includes metadata or not.

Returns:

A RecordReader instance.

Return type:

RecordReader

read() Generator[Tuple[str, List | str], Any, Any][source]#

Reads the record.

Yields:

Generator[Tuple[str, Union[List, str]], Any, Any] – File path, content and labels in a tuple.

Module contents#