AI Forecast Operator¶

The AI Forecast Operator uses historical time series data to forecast future values. It simplifies and accelerates the data science process by automating model selection, hyperparameter tuning, and feature identification for a given prediction task.

📦 Installation¶

On a Notebook Session (OCI Data Science)¶

odsc conda install -s forecast_p311_cpu_x86_64_v13
conda activate /home/datascience/conda/forecast_p311_cpu_x86_64_v13

Locally¶

pip install "oracle_ads[forecast]"

🚀 Getting Started¶

Using the CLI¶

# forecast.yaml
kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    model: prophet
    target_column: y

ads operator run -f forecast.yaml

Using the API¶

from ads.opctl.forecast import operate, ForecastOperatorConfig

# Same configuration as forecast.yaml above, expressed as a Python dict
spec = {
    "spec": {
        "historical_data": {"url": "https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv"},
        "datetime_column": {"name": "ds"},
        "target_column": "y",
        "model": "prophet",
        "horizon": 3,
    }
}
# Build the operator config from the dict, then run the forecast
config = ForecastOperatorConfig.from_dict(spec)
result = operate(config)

Using the Notebook UI¶

wget https://raw.githubusercontent.com/oracle-samples/oci-data-science-ai-samples/refs/heads/main/ai-operators/Forecast_UI.ipynb -O Forecast_UI.ipynb

Simply fill in the fields and click “run”:

[Image: notebook_form_filled.png, the Forecast UI notebook with its form fields filled in]

🧠 Tweak the Model¶

Select a specific model¶

model: arima

The model name can be any of the following:

Prophet - Recommended for smaller datasets, and datasets with seasonality or holidays
ARIMA - Recommended for highly cyclical datasets
AutoMLx - Oracle Labs’ proprietary modeling framework
NeuralProphet - Recommended for large or wide datasets
AutoTS - M6 Benchmark winner. Recommended if the other frameworks aren’t providing enough accuracy
Auto-Select - Backtests multiple frameworks and picks the single best performer across the series.
Auto-Select-Series - Chooses a model per series using either the default meta_learning strategy or an optional backtesting strategy. meta_learning is designed for fast, low-latency model selection, while backtesting is designed for more evidence-based, potentially more accurate selection by validating candidates against each series’ history.

Auto-Select the Best Model¶

Auto-Select backtests multiple models and selects the overall best performer across the series. Three model_kwargs parameters control the backtests:

model_list - The models to include in the comparison
num_backtests - The number of backtests per model (default: 5)
sample_ratio - The portion of the data to backtest on (default: 0.2). With the default, every backtest is trained on at least the first 80% of the data, and cross-validation occurs over the most recent 20%.

Auto-Select-Series supports two per-series selection strategies. By default, it derives meta-features for each series and uses a trained meta-model to recommend the forecasting framework that performed best on similar series patterns during meta-model training.

model: auto-select
model_kwargs:
  model_list: ["prophet", "arima", "neuralprophet"]
  sample_ratio: 0.2
  num_backtests: 5
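
The interaction between sample_ratio and num_backtests can be sketched in a few lines of Python. This is only an illustration, not the Operator's implementation: the helper name backtest_cutoffs and the even spacing of cutoffs are assumptions. The only behavior taken from the description above is that every backtest trains on at least the first (1 - sample_ratio) share of the rows.

```python
def backtest_cutoffs(n_rows, horizon, sample_ratio=0.2, num_backtests=5):
    """Return the training-set end index for each backtest (hypothetical helper)."""
    first = n_rows - int(n_rows * sample_ratio)  # earliest allowed cutoff
    last = n_rows - horizon                      # latest cutoff that leaves a full horizon
    if last < first:
        raise ValueError("backtest window is shorter than the horizon")
    if num_backtests == 1:
        return [last]
    span = last - first
    # Evenly spaced cutoffs (an assumption), de-duplicated and sorted.
    return sorted({first + (i * span) // (num_backtests - 1)
                   for i in range(num_backtests)})


cutoffs = backtest_cutoffs(n_rows=100, horizon=3)
for end in cutoffs:
    print(f"train on rows [0, {end}), score on rows [{end}, {end + 3})")
```

With 100 rows and the defaults, every training window contains at least the first 80 rows, and all scoring happens within the final 20.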

Automatically Choosing the Right Model Per Series¶

model: auto-select-series
model_kwargs:
  selection_strategy: meta_learning

Use this option when model selection must be fast, down to millisecond-level latency. The operator extracts meta-features from each series (including frequency, variability, and exogenous behaviour) and then uses a trained meta-learning selector to recommend one of the supported frameworks (arima, ets, lgbforecast, prophet, theta, xgbforecast) per series, without running historical backtests for every candidate.
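
To give a feel for what per-series meta-features can look like, here is a minimal sketch. The specific features (observation count, median time step, coefficient of variation) and the function name meta_features are illustrative assumptions; the Operator's actual feature set and its trained selector are not shown here.

```python
from datetime import date
from statistics import mean, median, pstdev


def meta_features(timestamps, values):
    """Compute a few simple descriptors of one series (hypothetical feature set)."""
    steps = [(b - a).days for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(values)
    return {
        "n_obs": len(values),
        # Typical spacing between observations, a proxy for frequency.
        "median_step_days": median(steps),
        # Spread relative to the mean, a proxy for variability.
        "coef_of_variation": pstdev(values) / abs(avg) if avg else float("inf"),
    }


ts = [date(2024, 1, d) for d in range(1, 8)]
features = meta_features(ts, [10, 12, 11, 13, 12, 14, 13])
print(features)
```

Because features like these are cheap to compute, a selector that consumes them can recommend a framework without fitting any candidate model first.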

Backtesting Candidate Models Per Series¶

model: auto-select-series
model_kwargs:
  selection_strategy: backtesting
  model_list: ["prophet", "arima"]
  num_backtests: 5

Use this option when you want auto-select-series to evaluate a fixed model list with historical backtests for each series instead of using the default meta-learning selector. This is especially useful for noisy series and can improve accuracy when recent historical performance is the best signal for model choice. The operator compares the candidate models independently per series using the configured metric, picks the winner for that series, and then retrains that winning model on the full series history before producing the final forecast. Because every candidate is backtested per series, this strategy trades additional runtime for more evidence-based model selection. If model_list is omitted, the operator evaluates all supported concrete forecasting models by default.
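
The per-series procedure described above (backtest each candidate, pick the winner, refit on the full history) can be sketched as follows. The two candidates are deliberately trivial stand-ins, not the Operator's frameworks, and pick_model, CANDIDATES, and the rolling-origin cutoffs are assumptions made for illustration.

```python
def smape(actual, predicted):
    """Symmetric MAPE in percent; a pair with a zero denominator counts as zero error."""
    terms = [
        2 * abs(p - a) / (abs(a) + abs(p)) if (abs(a) + abs(p)) else 0.0
        for a, p in zip(actual, predicted)
    ]
    return 100 * sum(terms) / len(terms)


# Stand-in candidates: each maps (history, horizon) to a forecast list.
CANDIDATES = {
    "last_value": lambda hist, h: [hist[-1]] * h,
    "mean": lambda hist, h: [sum(hist) / len(hist)] * h,
}


def pick_model(series, horizon, num_backtests=3):
    """Return the candidate with the lowest average backtest SMAPE for one series."""
    scores = {}
    for name, forecast in CANDIDATES.items():
        errors = []
        for k in range(num_backtests):
            end = len(series) - horizon - k  # rolling-origin cutoff
            train, test = series[:end], series[end:end + horizon]
            errors.append(smape(test, forecast(train, horizon)))
        scores[name] = sum(errors) / len(errors)
    return min(scores, key=scores.get)


data = {
    "trending": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "flat_noisy": [5, 7, 5, 7, 5, 7, 5, 7, 5, 7],
}
winners = {sid: pick_model(values, horizon=1) for sid, values in data.items()}
# Refit each winner on the full history to produce the final forecast.
final = {sid: CANDIDATES[winners[sid]](values, 1) for sid, values in data.items()}
print(winners, final)
```

The trending series and the noisy flat series end up with different winners, which is the situation per-series selection is meant to handle.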

Global and Per-Series Models¶

Some forecasting frameworks train one model across all series, while others fit one model per series.

Behavior	Models	Notes
Global	`lgbforecast`, `xgbforecast`	Learns shared patterns across related series. This can help with short, sparse, or noisy series because the model can use signal from the broader dataset.
Per-series	`prophet`, `arima`, `neuralprophet`, `theta`, `ets`	Fits each series independently.

When auto-select-series chooses the model for each series, it groups series by the selected framework and runs each group with that framework. If a global model is selected for a group, the series in that group are trained together.
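
The grouping step can be pictured with a short sketch. The series IDs and selections are made up, and the plan structure is an assumption used only to show that series assigned to a global framework are trained together while per-series frameworks are fit one series at a time.

```python
from collections import defaultdict

GLOBAL_FRAMEWORKS = {"lgbforecast", "xgbforecast"}  # global models from the table above

# Hypothetical per-series selections, as a selector might return them.
selected = {
    "store_1": "prophet",
    "store_2": "lgbforecast",
    "store_3": "lgbforecast",
    "store_4": "arima",
}

# Group series IDs by the framework chosen for them.
groups = defaultdict(list)
for series_id, framework in selected.items():
    groups[framework].append(series_id)

# Build a training plan: one job per global group, one job per series otherwise.
plan = []
for framework, series_ids in groups.items():
    if framework in GLOBAL_FRAMEWORKS:
        plan.append((framework, series_ids))
    else:
        plan.extend((framework, [sid]) for sid in series_ids)

print(plan)
```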

Additional Modeling Options¶

The Operator offers several additional parameters for refining the model. In prophet models, users can specify the min and max parameters, which set the smallest and largest values the forecast can take. This is useful for quantities such as percentages or revenues that are naturally bounded above or below. Users can also request a monthly seasonality in prophet models with the monthly_seasonality parameter. By default, monthly seasonality is fit only if the trend is very strong.

model: prophet
model_kwargs:
  min: 0
  max: 100
  monthly_seasonality: True

Full Extensible Control over the Model¶

Users can take full control of the modeling and pass any parameters they like through to the underlying framework. With prophet, for instance, there are options to control seasonality and changepoints. In the example below, anything passed to model_kwargs is passed through to prophet (excluding a few key parameters that the Operator extracts).

model: prophet
model_kwargs:
  seasonality_mode: multiplicative
  changepoint_prior_scale: 0.05

➕ Add Additional Column(s)¶

Additional data is essential for multivariate modeling.

Structuring Data¶

Multivariate forecasting differs from other multivariate machine learning problems. In forecasting, all additional variables must be known over the entire forecast horizon. Consequently, the Forecast Operator requires additional_data to cover the full horizon. For example, if you’re forecasting the peak temperature for tomorrow, you cannot use tomorrow’s humidity because it’s unknown. However, many enterprise scenarios do not face this issue, as retailers often have long-term marketing plans with knowable future expenditures, holidays are predictable, etc. In some cases, users might make assumptions for a “what-if” analysis.

Sometimes, variables are useful but unknowable in advance. For these cases, we recommend lagging the variable. To lag a variable, shift all its values forward in time so that the horizon is filled with data. Typically, users shift by the entire horizon, though advanced users may shift by more or less depending on their needs. For example, with a horizon of five days, the operator uses the humidity from five days ago to predict today’s peak temperature.
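
Lagging by the horizon can be sketched as below. The helper lag_by_horizon and the humidity readings are hypothetical, and the sketch assumes evenly spaced daily data: each value is re-stamped with the date one horizon later, so the last observed values land on the future dates the forecast needs.

```python
from datetime import date, timedelta


def lag_by_horizon(rows, horizon, step=timedelta(days=1)):
    """Shift each (date, value) pair forward by `horizon` steps (hypothetical helper)."""
    return [(d + horizon * step, v) for d, v in rows]


# Made-up daily humidity readings, observed 2024-01-01 through 2024-01-06.
readings = [60, 62, 65, 63, 61, 64]
humidity = [(date(2024, 1, 1) + timedelta(days=i), h) for i, h in enumerate(readings)]

lagged = lag_by_horizon(humidity, horizon=2)
for d, v in lagged:
    print(d.isoformat(), v)
```

The lagged column now covers the two future dates (2024-01-07 and 2024-01-08), while the first two historical dates have no lagged value and would need to be dropped or imputed.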

The additional data must always share the same datetime column as the historical data and must extend through the end of the forecast horizon. In other words, the number of rows in additional_data should equal the number of rows in the historical data plus the horizon.

If the historical data includes target_category_columns, the same columns must also be present in the additional data.
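
A quick pre-flight check of these requirements might look like the sketch below. The function check_additional_data and the quarter labels are hypothetical, and the Operator performs its own validation; this only mirrors the row-count and shared-timestamp rules stated above for a single series.

```python
def check_additional_data(historical_dates, additional_dates, horizon):
    """Return a list of problems; an empty list means the shapes line up."""
    problems = []
    expected = len(historical_dates) + horizon
    if len(additional_dates) != expected:
        problems.append(f"expected {expected} rows, found {len(additional_dates)}")
    missing = set(historical_dates) - set(additional_dates)
    if missing:
        problems.append(f"{len(missing)} historical timestamps missing from additional data")
    return problems


historical = ["2023-Q1", "2023-Q2", "2023-Q3", "2023-Q4"]
ok = check_additional_data(historical, historical + ["2024-Q1"], horizon=1)
bad = check_additional_data(historical, historical, horizon=1)
print(ok, bad)
```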

For example, if the historical data is:

Then the additional data (with a horizon of 1) should be formatted as:

Note that the additional data does not include the target column (Revenue), but it does include the datetime column (Qtr). You would include this additional data in the YAML file as follows:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: Qtr
    historical_data:
        url: historical_data.csv
    additional_data:
        url: additional_data.csv
    horizon: 1
    model: prophet
    target_column: Revenue

You can experiment by removing columns and observing how the results change. Below is an example of ingesting only two of the three additional columns:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: Qtr
    historical_data:
        url: historical_data.csv
    additional_data:
        url: additional_data.csv
        columns:
            - Discount
            - SP500 Futures
    horizon: 1
    model: prophet
    target_column: Revenue

Sourcing Data¶

The Operator can read data from the following sources:

Oracle RDBMS
OCI Object Storage
OCI Data Lake
HTTPS
S3
Azure Blob Storage
Google Cloud Storage
Local file systems

Additionally, the operator supports any data source supported by fsspec.

Reading from OCI Object Storage¶

Below is an example of reading data from OCI Object Storage using the operator:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: oci://<bucket_name>@<namespace_name>/example_yosemite_temps.csv
    horizon: 3
    target_column: y

Reading from Oracle Database¶

Below is an example of reading data from an Oracle Database:

kind: operator
type: forecast
version: v1
spec:
    historical_data:
        connect_args:
            user: XXX
            password: YYY
            dsn: "localhost/orclpdb"
        sql: 'SELECT Store_ID, Sales, Date FROM live_data'
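    # Column names must match the columns returned by the SQL query
    datetime_column:
        name: Date
    target_category_columns:
        - Store_ID
    horizon: 1
    target_column: Sales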

Data Preprocessing¶

The Forecast Operator handles common data preprocessing automatically. By default, it applies several preprocessing steps so that the dataset meets the requirements of each framework. Users can disable one or more of these steps if needed, but doing so may cause the model to fail, so proceed with caution.

Default preprocessing steps:

Missing value imputation
Outlier treatment

To disable outlier_treatment, modify the YAML file as shown below:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    target_column: y
    preprocessing:
        enabled: true
        steps:
            missing_value_imputation: True
            outlier_treatment: False
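
For intuition about what the two steps do, here is a minimal sketch. The techniques shown (linear interpolation for gaps and z-score clipping for outliers) are stand-ins chosen for illustration; the text above does not say which methods the Operator uses, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def impute_linear(values):
    """Fill None gaps by linear interpolation; edge gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to impute from")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def clip_outliers(values, z=3.0):
    """Clip values lying more than z standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    low, high = mu - z * sigma, mu + z * sigma
    return [min(max(v, low), high) for v in values]


raw = [10, None, 12, 11, None, None, 14, 95, 13, 12]
filled = impute_linear(raw)
clean = clip_outliers(filled, z=2.0)
print(filled)
print(clean)
```

Disabling outlier_treatment, as in the YAML above, would correspond to skipping the second call and passing the spike at 95 to the model unchanged.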

Real-Time Trigger¶

The Operator can be run locally or on an OCI Data Science Job. The resultant model can be saved and deployed for future use if needed. For questions regarding this integration, please reach out to the OCI Data Science team.

🧠 Enable Explainability¶

When additional data is provided, the Operator can optionally generate explanations for these additional features (columns) using SHAP values. Users can enable explanations in the YAML file:

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_pedestrians_covid.csv
    additional_data:
        url: additional_data.csv
    horizon: 3
    model: prophet
    target_column: y
    generate_explanations: True

🧾 Disable File Generation¶

spec:
    generate_forecast_file: False
    generate_explanations_file: False
    generate_metrics_file: False

📏 Change Evaluation Metric¶

Different use cases optimize for different metrics. The Operator allows users to specify the metric they want to optimize from the following list:

MAPE
RMSE
SMAPE
MSE

The metric can be optionally specified in the YAML file (default: “smape”):

kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    model: prophet
    target_column: y
    metric: rmse
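
For reference, the four metrics can be written out in plain Python using their common textbook definitions. The Operator's exact conventions (for example, percentage scaling or handling of zero actuals) may differ, so treat this as a sketch; the sample values are made up.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target."""
    return sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined when an actual is 0."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE in percent, using the 0-200 convention."""
    return 100 * sum(
        2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)
    ) / len(actual)


actual = [100, 200, 300]
predicted = [110, 190, 330]
for fn in (mape, rmse, smape, mse):
    print(fn.__name__, round(fn(actual, predicted), 4))
```

MSE and RMSE penalize large errors more heavily and depend on the scale of the target, while MAPE and SMAPE are relative measures that are easier to compare across series of different magnitudes.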

🧵 Run as a Job¶

Using the CLI, Operators can easily be run as a job using the backend parameter:

ads operator run -f forecast.yaml -b job