Supported Formats ***************** You can load datasets into ADS, either locally or from network file systems. You can open datasets with ``DatasetBrowser`` or ``pandas``. ``DatasetBrowser`` supports opening the datasets from web sites and libraries, such as scikit-learn directly into ADS. When you load a dataset in ``ADSDataset`` from ``pandas.DataFrame``, you can get the summary statistics, correlations, and visualizations of the dataset. ADS Supports: +-------------------+-----------------------------------------------+ | **Data Sources** | Amazon S3 | | +-----------------------------------------------+ | | Autonomous Databases: ADW and ATP | | +-----------------------------------------------+ | | Blob | | +-----------------------------------------------+ | | Elastic Search instances | | +-----------------------------------------------+ | | Google Cloud Service | | +-----------------------------------------------+ | | HTTP and HTTPs Sources | | +-----------------------------------------------+ | | Hadoop Distributed File System | | +-----------------------------------------------+ | | Local files | | +-----------------------------------------------+ | | Microsoft Azure | | +-----------------------------------------------+ | | MongoDB | | +-----------------------------------------------+ | | NoSQL DB instances | | +-----------------------------------------------+ | | Oracle Cloud Infrastructure Object Storage | | +-----------------------------------------------+ | | Oracle Database with cx_Oracle | +-------------------+-----------------------------------------------+ | **Data Formats** | Apache server log files | | +-----------------------------------------------+ | | Array, Dictionary | | +-----------------------------------------------+ | | Attribute-Relation File Format (ARFF) | | +-----------------------------------------------+ | | Avro | | +-----------------------------------------------+ | | Comma Separated Values (CSV) | | +-----------------------------------------------+ | | HTML | | +-----------------------------------------------+ | | Hierarchical Data Format 5 (HDF5) | | +-----------------------------------------------+ | | Javascript Object Notation (JSON) | | +-----------------------------------------------+ | | LIBSVM | | +-----------------------------------------------+ | | Pandas.DataFrame, Dask.DataFrame | | +-----------------------------------------------+ | | Parquet | | +-----------------------------------------------+ | | PDF | | +-----------------------------------------------+ | | Tab Separated Values (TSV) | | +-----------------------------------------------+ | | xls, xlsx (Excel) | | +-----------------------------------------------+ | | XML | +-------------------+-----------------------------------------------+ | **Data Types** | Boolean Types (`bool`) | | +-----------------------------------------------+ | | Numeric Types (`int`, `float`) | | +-----------------------------------------------+ | | Text Types (`str`) | +-------------------+-----------------------------------------------+ ADS *Does Not* Support: +-------------------+-----------------------------------------------+ | **Data Formats** | DOCX | | +-----------------------------------------------+ | | Raw Images | | +-----------------------------------------------+ | | SAS | | +-----------------------------------------------+ | | Text Files | +-------------------+-----------------------------------------------+ | **Data Types** | Mapping Types (`dict`) | | +-----------------------------------------------+ | | Set Types (`set`) | | +-----------------------------------------------+ | | Sequence Types (`list`, `tuple`, `range`) | +-------------------+-----------------------------------------------+ For reading text files, DOCX and PDF, see "Text Extraction" section.