Data Integration

Supported Data Sources

The operator can read data from the following sources:

  • Oracle RDBMS

  • OCI Object Storage

  • OCI Data Lake

  • HTTPS

  • S3

  • Azure Blob Storage

  • Google Cloud Storage

  • Local file systems

Additionally, the operator supports any data source supported by fsspec.

Examples

Reading from OCI Object Storage

Below is an example of reading data from OCI Object Storage using the operator:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: oci://<bucket_name>@<namespace_name>/example_yosemite_temps.csv
    horizon: 3
    target_column: y
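
Other fsspec-backed sources should follow the same pattern, with only the URL scheme changing. The sketch below assumes an S3 bucket. The bucket name is a placeholder, and the example assumes that the matching fsspec backend (s3fs for s3:// URLs) is installed and that credentials are already configured in the environment. Similar assumptions would apply to gs:// (gcsfs) and abfs:// (adlfs) URLs.

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        # Placeholder S3 location; assumes s3fs is installed and credentials are configured
        url: s3://<bucket_name>/example_yosemite_temps.csv
    horizon: 3
    target_column: y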

Reading from Oracle Database

Below is an example of reading data from an Oracle Database:

kind: operator
type: forecast
version: v1
spec:
    historical_data:
        connect_args:
            user: XXX
            password: YYY
            dsn: "localhost/orclpdb"
        sql: 'SELECT Store_ID, Sales, Date FROM live_data'
    datetime_column:
        name: ds
    horizon: 1
    target_column: y

Data Preprocessing

By default, the forecasting operator applies several preprocessing steps so that the dataset meets the requirements of each forecasting framework. You can disable one or more of these steps, but doing so may cause the model to fail, so proceed with caution.

Default preprocessing steps:

  • Missing value imputation

  • Outlier treatment

To disable outlier treatment, set outlier_treatment to false under preprocessing.steps in the YAML file:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    target_column: y
    preprocessing:
        enabled: true
        steps:
            missing_value_imputation: true
            outlier_treatment: false
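
As a further sketch, the same block could be used to skip preprocessing altogether. This assumes that setting enabled to false turns off every step. The source only shows enabled: true, so check the behavior against your operator version before relying on it.

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    target_column: y
    preprocessing:
        # Assumption: false disables all preprocessing steps, not just one
        enabled: false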

Real-Time Trigger

The operator can be run locally or as an OCI Data Science Job. The resulting model can be saved and deployed for later use. For questions about this integration, contact the OCI Data Science team.