ads.data_labeling.loader package
Submodules
ads.data_labeling.loader.file_loader module
- class ads.data_labeling.loader.file_loader.FileLoader(auth: Optional[Dict] = None)
Bases:
object
FileLoader Base Class.
Attributes:
- auth: (dict, optional). Defaults to None.
The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
Examples
>>> from ads.data_labeling.loader.file_loader import FileLoader
>>> import oci
>>> import os
>>> from ads.common import auth as authutil
>>> path = "path/to/your_text_file.txt"
>>> file_content = FileLoader(auth=authutil.api_keys()).load(path)
Initiates a FileLoader instance.
- Parameters:
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
- bulk_load(paths: List[str], **kwargs) Dict[str, Any]
Loads the content of every file in the list of paths. A ThreadPoolExecutor is used to load the files in parallel threads.
- Parameters:
paths (List[str]) – The list of file paths, can be local or object storage paths.
- Returns:
The map between file path and file content.
- Return type:
Dict[str, Any]
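The parallel loading described above can be pictured with a minimal sketch. It does not reproduce the library's implementation: `load_one` and `bulk_load_sketch` are hypothetical names, only local files are handled, and the `max_workers` value is an assumption.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from typing import Dict, List


def load_one(path: str) -> BytesIO:
    """Hypothetical stand-in for a single-file load: reads a local file into memory."""
    with open(path, "rb") as f:
        return BytesIO(f.read())


def bulk_load_sketch(paths: List[str], max_workers: int = 4) -> Dict[str, BytesIO]:
    """Loads each path in a worker thread and maps the path to its content."""
    if not paths:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each path
        # with the content loaded from that path.
        return dict(zip(paths, pool.map(load_one, paths)))


# Small demo with two temporary files.
tmp_dir = tempfile.mkdtemp()
demo_paths = []
for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
    file_path = os.path.join(tmp_dir, name)
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(text)
    demo_paths.append(file_path)

contents = bulk_load_sketch(demo_paths)
```

Threads suit this kind of work because file and object storage reads are I/O bound, so the workers spend most of their time waiting rather than competing for the interpreter.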
- load(path: str, **kwargs) BytesIO
Loads the file content from the path.
- Parameters:
path (str) – The file path, can be local or object storage path.
kwargs – Not used.
- Returns:
The data in BytesIO format.
- Return type:
BytesIO
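As a rough illustration of working with a BytesIO return value, the sketch below builds a buffer in memory as a stand-in for a loaded file (the sample text is made up) and reads it back with standard library calls only.

```python
from io import BytesIO

# Stand-in for the object returned by load(); built in memory for the demo.
buffer = BytesIO("first line\nsecond line\n".encode("utf-8"))

# Read the whole payload as bytes, then decode it to text.
raw = buffer.read()
text = raw.decode("utf-8")

# The read position is now at the end; rewind before reading again.
buffer.seek(0)
first_line = buffer.readline().decode("utf-8").rstrip("\n")
```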
- class ads.data_labeling.loader.file_loader.FileLoaderFactory
Bases:
object
FileLoaderFactory class to create/register FileLoaders.
- static loader(dataset_type: str, auth: Optional[Dict] = None) FileLoader
Gets the loader based on the dataset_type.
- Parameters:
dataset_type (str) – Dataset type. Currently supports TEXT, IMAGE and DOCUMENT.
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
- Returns:
A FileLoader instance corresponding to the dataset_type.
- Return type:
FileLoader
- classmethod register(dataset_type: str, loader: Loader) None
Registers a new loader for a given dataset_type.
- Parameters:
dataset_type (str) – Dataset type. Currently supports TEXT and IMAGE.
loader (Loader) – A Loader class which supports loading content of the given dataset_type.
- Returns:
Nothing.
- Return type:
None
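The create/register behavior described for the factory can be sketched as a simple registry keyed by dataset type. This is an assumption about the general pattern, not the library's code: every class name below is hypothetical, and raising ValueError for an unregistered type is a choice made for the sketch.

```python
from typing import Dict, Optional


class BaseLoaderSketch:
    """Hypothetical stand-in for a base loader class."""

    def __init__(self, auth: Optional[Dict] = None) -> None:
        self.auth = auth or {}


class TextLoaderSketch(BaseLoaderSketch):
    """Hypothetical stand-in for a text loader."""


class LoaderFactorySketch:
    """Keeps a mapping from dataset type to loader class."""

    _registry: Dict[str, type] = {}

    @classmethod
    def register(cls, dataset_type: str, loader: type) -> None:
        # Later registrations for the same type replace earlier ones.
        cls._registry[dataset_type] = loader

    @staticmethod
    def loader(dataset_type: str, auth: Optional[Dict] = None) -> BaseLoaderSketch:
        try:
            loader_cls = LoaderFactorySketch._registry[dataset_type]
        except KeyError:
            raise ValueError(
                f"No loader registered for dataset type {dataset_type!r}."
            ) from None
        return loader_cls(auth=auth)


LoaderFactorySketch.register("TEXT", TextLoaderSketch)
text_loader = LoaderFactorySketch.loader("TEXT")
```

Keeping the lookup in one place lets calling code ask for a loader by dataset type without importing the concrete loader classes.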
- class ads.data_labeling.loader.file_loader.ImageFileLoader(auth: Optional[Dict] = None)
Bases:
FileLoader
ImageFileLoader class which loads image files.
Examples
>>> from ads.data_labeling import ImageFileLoader
>>> import oci
>>> import os
>>> from ads.common import auth as authutil
>>> path = "path/to/image.png"
>>> image = ImageFileLoader(auth=authutil.api_keys()).load(path)
Initiates a FileLoader instance.
- Parameters:
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
- load(path: str, **kwargs) ImageFile
Loads the image from the path.
- Parameters:
path (str) – Image file path, can be local or object storage path.
kwargs – Not used.
- Returns:
Image opened by Pillow.
- Return type:
PIL.ImageFile.ImageFile
- class ads.data_labeling.loader.file_loader.TextFileLoader(auth: Optional[Dict] = None)
Bases:
FileLoader
TextFileLoader class which loads text files.
Examples
>>> from ads.data_labeling import TextFileLoader
>>> import oci
>>> import os
>>> from ads.common import auth as authutil
>>> path = "path/to/your_text_file.txt"
>>> file_content = TextFileLoader(auth=authutil.api_keys()).load(path)
Initiates a FileLoader instance.
- Parameters:
auth ((dict, optional). Defaults to None.) – The default authentication is set using the ads.set_auth API. To override the default, use ads.common.auth.api_keys or ads.common.auth.resource_principal to create the appropriate authentication signer and the kwargs required to instantiate an IdentityClient object.
- load(path: str, backend: Union[str, Base] = 'default', **kwargs) str
Loads the content from the path.
- Parameters:
path (str) – Text file path, can be local or object storage path.
backend (Union[str, backends.Base]) – Defaults to “default”. Valid options are the strings “default” and “tika”, or a backend object of type ads.text_dataset.backends.Base or ads.text_dataset.backends.Tika.
kwargs –
- encoding: (str, optional). Defaults to ‘utf-8’.
Encoding for text files. Used only when extracting the contents of the text dataset.
- Returns:
Content of the text file.
- Return type:
str
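To show why the encoding option matters, here is a minimal sketch that decodes raw bytes into a string. The `read_text_sketch` helper is hypothetical and only mirrors the idea of an `encoding` keyword that defaults to 'utf-8'. It does not call the library.

```python
from io import BytesIO


def read_text_sketch(buffer: BytesIO, encoding: str = "utf-8") -> str:
    """Decodes the full buffer content using the given encoding."""
    return buffer.getvalue().decode(encoding)


# Bytes written with Latin-1 need the matching encoding when read back.
raw = "café".encode("latin-1")
decoded = read_text_sketch(BytesIO(raw), encoding="latin-1")

# Decoding the same bytes with the default encoding fails for this input.
try:
    read_text_sketch(BytesIO(raw))
    default_failed = False
except UnicodeDecodeError:
    default_failed = True
```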