ads.data_labeling.loader package

Submodules

ads.data_labeling.loader.file_loader module

class ads.data_labeling.loader.file_loader.FileLoader(auth: Dict | None = None)

Bases: object

FileLoader Base Class.

Attributes:

auth: (dict, optional). Defaults to None.

The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling.loader.file_loader import FileLoader
>>> path = "path/to/your_text_file.txt"
>>> file_content = FileLoader(auth=authutil.api_keys()).load(path)

Initializes a FileLoader instance.

Parameters:

auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

bulk_load(paths: List[str], **kwargs) → Dict[str, Any]

Loads the content of the files in the list of paths. A ThreadPoolExecutor is used to load the files in parallel threads.

Parameters:

paths (List[str]) – The list of file paths. Each can be a local path or an Object Storage path.

Returns:

A mapping from each file path to its content.

Return type:

Dict[str, Any]
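The parallel loading described above can be pictured with a minimal sketch. It is not the ADS implementation: load_bytes and bulk_load_sketch are hypothetical stand-ins that read local files only, and the max_workers parameter is an assumption made for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from typing import Any, Dict, List


def load_bytes(path: str) -> BytesIO:
    """Hypothetical stand-in for a single-file load (local files only)."""
    with open(path, "rb") as f:
        return BytesIO(f.read())


def bulk_load_sketch(paths: List[str], max_workers: int = 4) -> Dict[str, Any]:
    """Load every path in a worker thread and map path -> content."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as `paths`.
        contents = list(executor.map(load_bytes, paths))
    return dict(zip(paths, contents))


# Usage: create two temporary files and load them in parallel.
tmp_dir = tempfile.mkdtemp()
sample_paths = []
for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
    file_path = os.path.join(tmp_dir, name)
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(text)
    sample_paths.append(file_path)

loaded = bulk_load_sketch(sample_paths)
```

Threads suit this kind of work because reading files is I/O-bound, so the workers spend most of their time waiting rather than competing for the interpreter.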

load(path: str, **kwargs) → BytesIO

Loads the file content from the path.

Parameters:
  • path (str) – The file path. Can be a local path or an Object Storage path.

  • kwargs – Not used.

Returns:

The file content as a BytesIO object.

Return type:

BytesIO
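As a small illustration of working with a BytesIO return value, the sketch below uses only the standard library. The data object is a hand-built stand-in for whatever load() returns, and it is assumed here that the content is UTF-8 text.

```python
from io import BytesIO

# Stand-in for the object returned by load(); real content would come from a file.
data = BytesIO(b"first line\nsecond line\n")

raw = data.getvalue()        # all bytes, regardless of the current stream position
text = raw.decode("utf-8")   # decode only when the file is known to be text
lines = text.splitlines()

data.seek(0)                 # rewind before handing the stream to another reader
first_chunk = data.read(5)   # read() advances the stream position
```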

class ads.data_labeling.loader.file_loader.FileLoaderFactory

Bases: object

FileLoaderFactory class to create/register FileLoaders.

static loader(dataset_type: str, auth: Dict | None = None) → FileLoader

Gets the loader based on the dataset_type.

Parameters:
  • dataset_type (str) – Dataset type. Currently supports TEXT, IMAGE and DOCUMENT.

  • auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

Returns:

A FileLoader instance corresponding to the dataset_type.

Return type:

FileLoader

classmethod register(dataset_type: str, loader: Loader) → None

Registers a new loader for a given dataset_type.

Parameters:
  • dataset_type (str) – Dataset type. Currently supports TEXT and IMAGE.

  • loader (Loader) – A Loader class which supports loading content of the given dataset_type.

Returns:

Nothing.

Return type:

None
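The register/loader pairing follows a common registry pattern, sketched below with stand-in classes. Every name here (LoaderSketch, TextLoaderSketch, AudioLoaderSketch, LoaderFactorySketch, the "AUDIO" type) is hypothetical, and raising ValueError for an unregistered type is an assumption rather than documented ADS behavior.

```python
from typing import Dict, Optional, Type


class LoaderSketch:
    """Hypothetical stand-in for a loader base class."""

    def __init__(self, auth: Optional[Dict] = None) -> None:
        self.auth = auth


class TextLoaderSketch(LoaderSketch):
    """Hypothetical loader for a "TEXT" dataset type."""


class AudioLoaderSketch(LoaderSketch):
    """Hypothetical loader for a custom "AUDIO" dataset type."""


class LoaderFactorySketch:
    # Registry mapping dataset type -> loader class.
    _loaders: Dict[str, Type[LoaderSketch]] = {"TEXT": TextLoaderSketch}

    @classmethod
    def register(cls, dataset_type: str, loader: Type[LoaderSketch]) -> None:
        """Add or replace the loader class used for dataset_type."""
        cls._loaders[dataset_type] = loader

    @staticmethod
    def loader(dataset_type: str, auth: Optional[Dict] = None) -> LoaderSketch:
        """Instantiate the loader registered for dataset_type."""
        if dataset_type not in LoaderFactorySketch._loaders:
            # Assumption: unknown types are rejected with ValueError.
            raise ValueError(f"Unsupported dataset type: {dataset_type}")
        return LoaderFactorySketch._loaders[dataset_type](auth=auth)


# Usage: register a custom type, then request loaders by name.
LoaderFactorySketch.register("AUDIO", AudioLoaderSketch)
audio_loader = LoaderFactorySketch.loader("AUDIO")
text_loader = LoaderFactorySketch.loader("TEXT", auth={"config": {}})
```

Keeping the registry on the class means callers only need the dataset type string, and new loaders can be plugged in without changing the lookup code.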

class ads.data_labeling.loader.file_loader.ImageFileLoader(auth: Dict | None = None)

Bases: FileLoader

ImageFileLoader class which loads image files.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling import ImageFileLoader
>>> path = "path/to/image.png"
>>> image = ImageFileLoader(auth=authutil.api_keys()).load(path)

Initializes a FileLoader instance.

Parameters:

auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

load(path: str, **kwargs) → ImageFile

Loads the image from the path.

Parameters:
  • path (str) – The image file path. Can be a local path or an Object Storage path.

  • kwargs – Not used.

Returns:

Image opened by Pillow.

Return type:

PIL.ImageFile.ImageFile

class ads.data_labeling.loader.file_loader.TextFileLoader(auth: Dict | None = None)

Bases: FileLoader

TextFileLoader class which loads text files.

Examples

>>> from ads.common import auth as authutil
>>> from ads.data_labeling import TextFileLoader
>>> path = "path/to/your_text_file.txt"
>>> file_content = TextFileLoader(auth=authutil.api_keys()).load(path)

Initializes a FileLoader instance.

Parameters:

auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.

load(path: str, backend: str | Base = 'default', **kwargs) → str

Loads the content from the path.

Parameters:
  • path (str) – The text file path. Can be a local path or an Object Storage path.

  • backend (Union[str, backends.Base]) – Defaults to “default”. Valid options are the strings “default” and “tika”, or an ads.text_dataset.backends.Base or ads.text_dataset.backends.Tika backend.

  • kwargs

    encoding: (str, optional). Defaults to ‘utf-8’.

    Encoding for text files. Used only when extracting the contents of the text dataset.

Returns:

Content of the text file.

Return type:

str
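To show why the encoding keyword matters, here is a minimal sketch that uses only the standard library. load_text_sketch is a hypothetical stand-in for a text loader that reads local files, not the ADS implementation; it mirrors only the documented default of 'utf-8'.

```python
import os
import tempfile


def load_text_sketch(path: str, **kwargs) -> str:
    """Hypothetical stand-in: read a local text file with an optional encoding."""
    encoding = kwargs.get("encoding", "utf-8")  # same default as documented above
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Usage: write a Latin-1 encoded file, then read it back with the matching encoding.
tmp_path = os.path.join(tempfile.mkdtemp(), "latin1.txt")
with open(tmp_path, "w", encoding="latin-1") as f:
    f.write("café")

content = load_text_sketch(tmp_path, encoding="latin-1")
```

Reading the same file with the default encoding would fail, because the Latin-1 byte for "é" is not valid UTF-8 on its own.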

Module contents