Productionize¶
Configure¶
After setting up ads opctl on your desired machine using ads opctl configure, you are ready to begin forecasting. At a bare minimum, you will need to provide the following details about your forecasting problem:
Path to the historical data (historical_data)
Name of the Datetime column (datetime_column)
Forecast horizon (horizon)
Name of the Target column (target_column)
These details exactly match the initial forecast.yaml file generated by running ads operator init --type forecast:
kind: operator
type: forecast
version: v1
spec:
  datetime_column:
    name: Date
  historical_data:
    url: data.csv
  horizon: 3
  target_column: target
Optionally, you can specify much more. The most common additions are:
Path to the additional data, which has values for each period of the forecast horizon (additional_data)
Path to test data, in the event you want to evaluate the forecast on a test set (test_data)
List of column names that index different time series within the data, such as a product ID or another series identifier (target_category_columns)
Path to the output directory, where the operator will place the forecast.csv, metrics.csv, and other artifacts produced from the run (output_directory)
An extensive list of parameters can be found in the YAML Schema section.
Run¶
After you have written your forecast.yaml, run the forecast using:
ads operator run -f forecast.yaml
Interpret Results¶
The forecasting operator produces the following output files: forecast.csv, metrics.csv, local_explanation.csv, global_explanation.csv, and report.html. We will go through each of these output files in turn.
forecast.csv
This file contains the entire historical dataset together with the forecast horizon, and has the following columns:
Series: Categorical or numerical index
Date: The datetime value for each observation in the series
Real values: Target values from historical data
Fitted values: Model’s predictions on historical data
Forecasted values: Only available over the forecast horizon, representing the true forecasts
Upper and lower bounds: Confidence intervals for the predictions (based on the specified confidence interval width in the YAML file)
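As a quick way to inspect this file, the following Python sketch loads forecast.csv with pandas and separates the historical (fitted) rows from the forecast-horizon rows. The output path follows the results directory used in the examples below, and the column name used for filtering is a hypothetical placeholder; check the columns produced by your own run before relying on them.

import pandas as pd

# Path assumes the output_directory "results" used in the examples below.
forecast = pd.read_csv("results/forecast.csv")

# Print the actual column names produced by your run before relying on them.
print(forecast.columns.tolist())

# Hypothetical column name for the forecasted values; rows where it is
# populated correspond to the forecast horizon, the rest are historical.
forecast_col = "forecast_value"
if forecast_col in forecast.columns:
    horizon_rows = forecast[forecast[forecast_col].notna()]
    history_rows = forecast[forecast[forecast_col].isna()]
    print(len(history_rows), "historical rows,", len(horizon_rows), "forecast rows")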
report.html
The report.html file is designed differently for each model type. Generally, it contains a summary of the historical and additional data, a plot of the target from historical data overlaid with fitted and forecasted values, analysis of the models used, and details about the model components. It also includes a receipt YAML file, providing a fully detailed version of the original forecast.yaml file.
metrics.csv
The metrics file includes relevant metrics calculated on the training set.
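For example, a small pandas sketch can review these training metrics and, when test_data is supplied, compare them against the test-set metrics written to test_metrics.csv. The paths and filenames here follow the defaults shown in the complex example below and should be treated as assumptions:

import pandas as pd

# Training-set metrics produced by the operator.
train_metrics = pd.read_csv("results/metrics.csv")
print(train_metrics)

# If test_data was configured, test-set metrics are written separately.
try:
    test_metrics = pd.read_csv("results/test_metrics.csv")
    print(test_metrics)
except FileNotFoundError:
    print("No test_metrics.csv found; test_data was not configured.")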
Global and Local Explanations in Forecasting Models
In the realm of forecasting models, understanding not only the predictions themselves but also the factors and features driving those predictions is of paramount importance. Global and local explanations are two distinct approaches to achieving this understanding, providing insights into the inner workings of forecasting models at different levels of granularity.
Global Explanations:
Global explanations aim to provide a high-level overview of how a forecasting model works across the entire dataset or a specific feature space. They offer insights into the model’s general behavior, helping users grasp the overarching patterns and relationships it has learned. Here are key aspects of global explanations:
Feature Importance: Global explanations often involve the identification of feature importance, which ranks variables based on their contribution to the model’s predictions. This helps users understand which features have the most significant influence on the forecasts.
Model Structure: Global explanations can also reveal the architecture and structure of the forecasting model, shedding light on the algorithms, parameters, and hyperparameters used. This information aids in understanding the model’s overall approach to forecasting.
Trends and Patterns: By analyzing global explanations, users can identify broad trends and patterns in the data that the model has captured. This can include seasonality, long-term trends, and cyclical behavior.
Assumptions and Constraints: Global explanations may uncover any underlying assumptions or constraints the model operates under, highlighting potential limitations or biases.
While global explanations provide valuable insights into the model’s behavior at a holistic level, they may not capture the nuances and variations that exist within the dataset.
Local Explanations:
Local explanations, on the other hand, delve deeper into the model’s predictions for specific data points or subsets of the dataset. They offer insights into why the model made a particular prediction for a given instance. Key aspects of local explanations include:
Instance-specific Insights: Local explanations provide information about the individual features and their contribution to a specific prediction. This helps users understand why the model arrived at a particular forecast for a particular data point.
Contextual Understanding: They consider the context of the prediction, taking into account the unique characteristics of the data point in question. This is particularly valuable when dealing with outliers or anomalous data.
Model Variability: Local explanations may reveal the model’s sensitivity to changes in input variables. Users can assess how small modifications to the data impact the predictions.
Decision Boundaries: In classification problems, local explanations can elucidate the decision boundaries and the factors that led to a specific classification outcome.
While local explanations offer granular insights, they may not provide a comprehensive understanding of the model’s behavior across the entire dataset.
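When generate_explanations is set to true, the operator writes these explanations to CSV files. The sketch below reads them with pandas; the filenames follow the defaults from the complex example below, and the exact column layout varies by model, so inspect the files from your own run to confirm their structure:

import pandas as pd

# Global explanations: feature contributions aggregated across the dataset.
global_expl = pd.read_csv("results/global_explanation.csv")
print(global_expl.head())

# Local explanations: per-timestamp feature contributions over the horizon.
local_expl = pd.read_csv("results/local_explanation.csv")
print(local_expl.head())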
Examples¶
Simple Example
The simplest yaml file is generated by ads operator init --type forecast and looks like the following:
kind: operator
type: forecast
version: v1
spec:
  datetime_column:
    name: Date
  historical_data:
    url: data.csv
  horizon: 3
  model: auto
  target_column: target
Typical Example
A typical forecast yaml will have the following fields:
kind: operator
type: forecast
version: v1
spec:
  additional_data:
    url: additional_data.csv
  datetime_column:
    name: time
    format: "%d/%m/%Y"
  generate_explanations: true
  historical_data:
    url: primary_data.csv
  horizon: 5
  metric: smape
  model: "auto"
  output_directory:
    url: results
  target_category_columns:
    - Series
  target_column: Total
  test_data:
    url: test_data.csv
Complex Example
The yaml can also be fully specified, as in the following example:
kind: operator
type: forecast
version: v1
spec:
  historical_data:
    url: primary_data.csv
  additional_data:
    url: additional_data.csv
  output_directory:
    url: results
  test_data:
    url: test_data.csv
  target_category_columns:
    - Store_ID
  target_column: Sales
  horizon: 5
  datetime_column:
    format: "%d/%m/%y"
    name: Date
  model: automlx
  model_kwargs:
    time_budget: 100
  preprocessing: true
  metric: smape
  confidence_interval_width: 0.95
  generate_explanations: true
  generate_metrics: true
  generate_report: true
  local_explanation_filename: local_explanation.csv
  metrics_filename: metrics.csv
  report_filename: report.html
  report_theme: light
  forecast_filename: forecast.csv
  global_explanation_filename: global_explanation.csv
  test_metrics_filename: test_metrics.csv