AI Forecast Operator

The AI Forecast Operator uses historical time series data to forecast future trends. It simplifies and accelerates the data science process by automating model selection, hyperparameter tuning, and feature identification for a given prediction task.

📦 Installation

On a Notebook Session (OCI Data Science)

odsc conda install -s forecast_p311_cpu_x86_64_v6
conda activate /home/datascience/conda/forecast_p311_cpu_x86_64_v6
[Image: forecast_conda.png]

Locally

pip install "oracle_ads[forecast]"

🚀 Getting Started

Using the CLI

# forecast.yaml
kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    model: prophet
    target_column: y

ads operator run -f forecast.yaml

Using the API

from ads.opctl.forecast import operate, ForecastOperatorConfig

spec = {
  "spec": {
    "historical_data": {"url": "https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv"},
    "datetime_column": {"name": "ds"},
    "target_column": "y",
    "model": "prophet",
    "horizon": 3
  }
}
config = ForecastOperatorConfig.from_dict(spec)
result = operate(config)

Using the Notebook UI

wget https://raw.githubusercontent.com/oracle-samples/oci-data-science-ai-samples/refs/heads/main/ai-operators/Forecast_UI.ipynb -O Forecast_UI.ipynb
[Image: notebook_form.png]

Simply fill in the fields and click “run”:

[Image: notebook_form_filled.png]

🧠 Tweak the Model

Select a specific model

model:
  name: arima

The model name can be any of the following:
  • Prophet - Recommended for smaller datasets, and datasets with seasonality or holidays

  • ARIMA - Recommended for highly cyclical datasets

  • AutoMLx - Oracle Labs’ proprietary modeling framework

  • NeuralProphet - Recommended for large or wide datasets

  • AutoTS - M6 Benchmark winner. Recommended if the other frameworks aren’t providing enough accuracy

  • Auto-Select - Backtests all of the above frameworks and selects the best performer. Recommended for comparing the above frameworks. Caution: it can be very slow.

Auto-Select the Best Model

Auto-Select backtests every candidate model and selects the best-performing one. Use the model_list parameter to choose which models to include, the num_backtests parameter to set the number of backtests per model (5 by default), and the sample_ratio parameter to set the portion of the data to backtest on. With the default sample_ratio of 0.2, every backtest trains on at least the first 80% of the data, and cross-validation runs over the most recent 20%.

model:
  name: auto-select
model_kwargs:
  model_list: ["prophet", "arima", "neuralprophet"]
  sample_ratio: 0.2
  num_backtests: 5
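
To make the sample_ratio arithmetic concrete, here is a minimal Python sketch of where the backtest cutoffs might fall. The function name backtest_cutoffs and the even spacing of the cutoffs are assumptions made for illustration. The text above only states that every backtest trains on at least the first (1 - sample_ratio) share of the data, so the Operator's actual placement of cutoffs may differ.

```python
def backtest_cutoffs(n_rows, horizon, sample_ratio=0.2, num_backtests=5):
    """Return one training-set size per backtest.

    Hypothetical helper: cutoffs are spread evenly over the most recent
    `sample_ratio` share of the data, which is an assumption of this sketch.
    """
    # Earliest cutoff: all rows before the backtest window are always trained on.
    first = n_rows - round(n_rows * sample_ratio)
    # Latest cutoff: must leave a full horizon of held-out rows to score against.
    last = n_rows - horizon
    if last < first:
        raise ValueError("backtest window is shorter than the horizon")
    if num_backtests == 1:
        return [last]
    step = (last - first) / (num_backtests - 1)
    return [first + round(i * step) for i in range(num_backtests)]


horizon = 3
cutoffs = backtest_cutoffs(n_rows=100, horizon=horizon)
for size in cutoffs:
    print(f"train on rows [0, {size}), score rows [{size}, {size + horizon})")
```

With 100 rows and the defaults, no backtest in this sketch trains on fewer than 80 rows, which matches the 80%/20% split described above.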

Additional Modeling Options

The Operator offers several additional parameters for refining the model. For prophet models, the min and max parameters set the smallest and largest values the forecast can take. This is useful for quantities that are naturally bounded above or below, such as percentages or revenues. Also for prophet models, the monthly_seasonality parameter fits a monthly seasonality. By default, monthly seasonality will only be fit if the trend is very strong.

model:
  name: prophet
model_kwargs:
  min: 0
  max: 100
  monthly_seasonality: True

Full Extensible Control over the Model

Users can take full control of the modeling and pass any parameters they like through to the underlying framework. With prophet, for instance, there are options to dictate seasonality and changepoints. In the example below, anything passed to model_kwargs will be passed through to prophet (excluding a few key parameters the Operator extracts).

model:
  name: prophet
model_kwargs:
  seasonality_mode: multiplicative
  changepoint_prior_scale: 0.05

➕ Add Additional Column(s)

Additional data is essential for multivariate modeling.

Structuring Data

Multivariate forecasting differs from other multivariate machine learning problems. In forecasting, all additional variables must be known over the entire forecast horizon. Consequently, the Forecast Operator requires additional_data to cover the full horizon. For example, if you’re forecasting the peak temperature for tomorrow, you cannot use tomorrow’s humidity because it’s unknown. However, many enterprise scenarios do not face this issue, as retailers often have long-term marketing plans with knowable future expenditures, holidays are predictable, etc. In some cases, users might make assumptions for a “what-if” analysis.

Sometimes, variables are useful but unknowable in advance. For these cases, we recommend lagging the variable. To lag a variable, shift all its values forward in time so that the horizon is filled with data. Typically, users shift by the entire horizon, though advanced users may shift by more or less depending on their needs. With a horizon of five days, for example, the operator would use the humidity from five days ago to predict today’s peak temperature.
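
The lagging described above can be sketched in a few lines of Python. The helper lag_by_horizon and the humidity readings are hypothetical, chosen only to illustrate the shift. Rows at the start of the series that have no lagged value (None here) would still need to be dropped or imputed.

```python
def lag_by_horizon(values, horizon):
    """Shift a series forward in time by `horizon` steps.

    Position t of the result holds the value observed at t - horizon, so the
    result covers the original periods plus `horizon` future periods.
    """
    return [None] * horizon + list(values)


# Hypothetical daily humidity readings for 6 historical days.
humidity = [61, 58, 70, 66, 64, 59]
horizon = 2

humidity_lagged = lag_by_horizon(humidity, horizon)

# The lagged series now has a known value for every day in the horizon.
print(len(humidity_lagged))         # historical rows plus horizon rows
print(humidity_lagged[-horizon:])   # values available for the forecast period
```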

The additional data must always share the same datetime column as the historical data and must extend through the end of the forecast horizon. In other words, the number of rows in additional_data should equal the number of rows in the historical data plus the horizon.

If the historical data includes target_category_columns, those columns must also be present in the additional data.
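
As a rough pre-flight check of these rules, the sketch below compares the datetime values of the two datasets in plain Python. The function check_additional_data and the monthly dates are hypothetical and not part of the Operator. The sketch assumes a single series (no target_category_columns) with one row per period.

```python
from datetime import date


def check_additional_data(historical_dates, additional_dates, horizon):
    """Return a list of problems found; an empty list means the shapes line up."""
    problems = []
    expected_rows = len(historical_dates) + horizon
    if len(additional_dates) != expected_rows:
        problems.append(
            f"expected {expected_rows} rows of additional data, "
            f"found {len(additional_dates)}"
        )
    # Every historical period must also appear in the additional data.
    missing = sorted(set(historical_dates) - set(additional_dates))
    if missing:
        problems.append(f"additional data is missing periods: {missing}")
    return problems


# Hypothetical monthly data: 3 historical periods and a horizon of 1.
historical = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]
additional = historical + [date(2024, 4, 1)]

print(check_additional_data(historical, additional, horizon=1))
print(check_additional_data(historical, historical, horizon=1))
```

The first call reports no problems, while the second reports a row-count mismatch because the additional data stops where the historical data does.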

For example, if the historical data is:

Then the additional data (with a horizon of 1) should be formatted as:

Note that the additional data does not include the target column (Revenue), but it does include the datetime column (Month). You would include this additional data in the YAML file as follows:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: Month
    historical_data:
        url: historical_data.csv
    additional_data:
        url: additional_data.csv
    horizon: 1
    model: prophet
    target_column: Revenue

You can experiment by removing columns and observing how the results change. Below is an example of ingesting only two of the three additional columns:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: Month
    historical_data:
        url: historical_data.csv
    additional_data:
        url: additional_data.csv
        columns:
            - Discount
            - SP500 Futures
    horizon: 1
    model: prophet
    target_column: Revenue

Sourcing Data

The Operator can read data from the following sources:

  • Oracle RDBMS

  • OCI Object Storage

  • OCI Data Lake

  • HTTPS

  • S3

  • Azure Blob Storage

  • Google Cloud Storage

  • Local file systems

Additionally, the operator supports any data source supported by fsspec.

Reading from OCI Object Storage

Below is an example of reading data from OCI Object Storage using the operator:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: oci://<bucket_name>@<namespace_name>/example_yosemite_temps.csv
    horizon: 3
    target_column: y

Reading from Oracle Database

Below is an example of reading data from an Oracle Database:

kind: operator
type: forecast
version: v1
spec:
    historical_data:
        connect_args:
            user: XXX
            password: YYY
            dsn: "localhost/orclpdb"
        sql: 'SELECT Store_ID, Sales, Date FROM live_data'
    datetime_column:
        name: Date
    target_category_columns:
        - Store_ID
    horizon: 1
    target_column: Sales

Data Preprocessing

The Forecast Operator applies several preprocessing steps by default to make the dataset compatible with each framework. Users can disable one or more of these steps if needed, but doing so may cause the model to fail. Proceed with caution.

Default preprocessing steps:

  • Missing value imputation

  • Outlier treatment

To disable outlier_treatment, modify the YAML file as shown below:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    target_column: y
    preprocessing:
        enabled: true
        steps:
            missing_value_imputation: True
            outlier_treatment: False

Real-Time Trigger

The Operator can be run locally or on an OCI Data Science Job. The resultant model can be saved and deployed for future use if needed. For questions regarding this integration, please reach out to the OCI Data Science team.

🧠 Enable Explainability

When additional data is provided, the Operator can optionally generate explanations for these additional features (columns) using SHAP values. Users can enable explanations in the YAML file:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_pedestrians_covid.csv
    additional_data:
        url: additional_data.csv
    horizon: 3
    model: prophet
    target_column: y
    generate_explanations: True

🧾 Disable File Generation

spec:
    generate_forecast_file: False
    generate_explanations_file: False
    generate_metrics_file: False

📏 Change Evaluation Metric

Different use cases optimize for different metrics. The Operator allows users to specify the metric they want to optimize from the following list:

  • MAPE

  • RMSE

  • SMAPE

  • MSE
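
For reference, these four metrics can be written out in a few lines of Python. This is a sketch of the common textbook definitions, not the Operator's own implementation. In particular, SMAPE has several variants, and the one below (bounded between 0 and 200%) is an assumption. The sample values are made up for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if a true value is 0)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE, in percent (undefined if a true and predicted value are both 0)."""
    return 100 * sum(
        2 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical actuals and forecasts.
y_true = [100, 200, 300]
y_pred = [110, 190, 330]

for name, fn in [("MAPE", mape), ("RMSE", rmse), ("SMAPE", smape), ("MSE", mse)]:
    print(f"{name}: {fn(y_true, y_pred):.4f}")
```

MSE and RMSE penalize large errors more heavily, while MAPE and SMAPE are scale-free percentages, which makes them easier to compare across series of different magnitudes.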

The metric can be optionally specified in the YAML file (default: “smape”):

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    model: prophet
    target_column: y
    metric: rmse

🧵 Run as a Job

Using the CLI, Operators can easily be run as a job using the backend parameter:

ads operator run -f forecast.yaml -b job