AI Forecast Operator¶
The AI Forecast Operator leverages historical time series data to generate accurate forecasts for future trends. This operator simplifies and accelerates the data science process by automating model selection, hyperparameter tuning, and feature identification for a given prediction task.
Power in Simplicity¶
The Operator is designed to be simple to use, easy to extend, and as powerful as a team of data scientists. To get started with the simplest forecast, use the following YAML configuration:
kind: operator
type: forecast
version: v1
spec:
datetime_column:
name: ds
historical_data:
url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
horizon: 3
target_column: y
We will extend this example in various ways throughout this documentation. However, all parameters beyond those shown above are optional.
Modeling Options¶
There is no perfect model. A core feature of the Operator is the ability to select from various model frameworks. For enterprise AI, typically one or two frameworks perform best for your problem space. Each model is optimized for different assumptions, such as dataset size, frequency, complexity, and seasonality. The best way to determine which framework is right for you is through empirical testing. Based on experience with several enterprise forecasting problems, the ADS team has found the following frameworks to be the most effective, ranging from traditional statistical models to complex machine learning and deep neural networks:
Prophet
ARIMA
MLForecast
NeuralProphet
AutoTS
Note: AutoTS is not a single modeling framework but a combination of many. AutoTS algorithms include (v0.6.15): ConstantNaive, LastValueNaive, AverageValueNaive, GLS, GLM, ETS, ARIMA, FBProphet, RollingRegression, GluonTS, SeasonalNaive, UnobservedComponents, VECM, DynamicFactor, MotifSimulation, WindowRegression, VAR, DatepartRegression, UnivariateRegression, UnivariateMotif, MultivariateMotif, NVAR, MultivariateRegression, SectionalMotif, Theta, ARDL, NeuralProphet, DynamicFactorMQ, PytorchForecasting, ARCH, RRVAR, MAR, TMF, LATC, KalmanStateSpace, MetricMotif, Cassandra, SeasonalityMotif, MLEnsemble, PreprocessingRegression, FFT, BallTreeMultivariateMotif, TiDE, NeuralForecast, DMD.
Auto-Select¶
For users new to forecasting, the Operator also has an auto-select
option. This is the most computationally expensive option as it splits the training data into several validation sets, evaluates each framework, and attempts to determine the best one. However, auto-select does not guarantee to find the optimal model and is not recommended as the default configuration for end-users due to its complexity.
Specify Model¶
You can manually select the desired model from the list above and insert it into the model
parameter slot.
kind: operator
type: forecast
version: v1
spec:
datetime_column:
name: ds
historical_data:
url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
horizon: 3
model: <INSERT_MODEL_NAME_HERE>
target_column: y
Evaluation and Explanation¶
As an enterprise AI solution, the Operator ensures that the evaluation and explanation of forecasts are as critical as the forecasts themselves.
Reporting¶
With every operator run, a report is generated to summarize the work done. The report includes:
Summary of the input data
Visualization of the forecast
Breakdown of major trends
Explanation (via SHAP values) of additional features
Table of metrics
A copy of the configuration YAML file
Metrics¶
Different use cases optimize for different metrics. The Operator allows users to specify the metric they want to optimize from the following list:
MAPE
RMSE
SMAPE
MSE
The metric can be optionally specified in the YAML file:
kind: operator
type: forecast
version: v1
spec:
datetime_column:
name: ds
historical_data:
url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
horizon: 3
model: prophet
target_column: y
metric: rmse
Explanations¶
When additional data is provided, the Operator can optionally generate explanations for these additional features (columns) using SHAP values. Users can enable explanations in the YAML file:
kind: operator
type: forecast
version: v1
spec:
datetime_column:
name: ds
historical_data:
url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_pedestrians_covid.csv
additional_data:
url: additional_data.csv
horizon: 3
model: prophet
target_column: y
generate_explanations: True
With large datasets, SHAP values can be expensive to generate. Enterprise applications may vary in their need for decimal accuracy versus computational cost. Therefore, the Operator offers several options:
FAST_APPROXIMATE (default): Generated SHAP values are typically within 1% of the true values and require 1% of the time.
BALANCED: Generated SHAP values are typically within 0.1% of the true values and require 10% of the time.
HIGH_ACCURACY: Generates the true SHAP values at full precision.
kind: operator
type: forecast
version: v1
spec:
datetime_column:
name: ds
historical_data:
url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
horizon: 3
model: prophet
target_column: y
generate_explanations: True
explanations_accuracy_mode: BALANCED
Selecting the best accuracy mode will require empirical testing, but FAST_APPROXIMATE
is usually sufficient for real-world data.
Note: The above example won’t generate explanations because there is no additional data. The SHAP values would be 100% for the feature ``y``.