YAML Schema¶
Let’s explore each line of the forecast.yaml so we can better understand options for extending and customizing the operator to our use case.
Here is an example forecast.yaml with every parameter specified:
kind: operator
type: forecast
version: v1
spec:
datetime_column:
name: Date
historical_data:
url: data.csv
horizon: 3
model: auto
target_column: target
Kind: The yaml file always starts with
kind: operator
. There are many other kinds of yaml files that can be run byads opctl
, so we need to specify this is an operator.Type: The type of operator is
forecast
.Version: The only available version is
v1
.- Spec: Spec contains the bulk of the information for the specific problem.
- historical_data: This dictionary contains the details for how to read the historical data. Historical data must contain the target column, the datetime column, and optionally the target category column.
url: Insert the uri for the dataset if it’s on object storage or Data Lake using the URI pattern
oci://<bucket>@<namespace>/path/to/data.csv
.kwargs: Insert any other args for pandas to load the data (
format
,options
, etc.) See full list inYAML Schema
section.
target_column: This string specifies the name of the column where the target data is within the historical data.
- datetime_column: The dictionary outlining details around the datetime column.
name: the name of the datetime column. Must be the same in both historical and additional data.
format: the format of the datetime string in python notation detailed here.
horizon: the integer number of periods to forecast.
target_category_columns: (optional) The category ID of the target.
- additional_data: (optional) This dictionary contains the details for how to read the addtional data. Additional data must contain the the datetime column, the target category column (if present in historical), and any other columns with values over the horizon.
url: Insert the uri for the dataset if it’s on object storage or Data Lake using the URI pattern
oci://<bucket>@<namespace>/path/to/data.csv
.kwargs: Insert any other args for pandas to load the data (
format
,options
, etc.) See full list inYAML Schema
section.
- output_directory: (optional) This dictionary contains the details for where to put the output artifacts. The directory need not exist, but must be accessible by the Operator during runtime.
url: Insert the uri for the dataset if it’s on object storage or Data Lake using the URI pattern
oci://<bucket>@<namespace>/subfolder/
.kwargs: Insert any other args for pandas to load the data (
format
,options
, etc.) See full list inYAML Schema
section.
model: (optional) The name of the model framework you want to use. Defaults to “auto”. Other options are:
arima
,automlx
,prophet
,neuralprophet
,autots
, andauto
.model_kwargs: (optional) This kwargs dict passes straight through to the model framework. If you want to take direct control of the modeling, this is the best way.
- test_data: (optional) This dictionary contains the details for how to read the test data. Test data must be formatted identically to historical data and contain values for every period in the forecast horizon.
url: Insert the uri for the dataset if it’s on object storage or Data Lake using the URI pattern
oci://<bucket>@<namespace>/path/to/data.csv
.kwargs: Insert any other args for pandas to load the data (
format
,options
, etc.) See full list inYAML Schema
section.
- tuning: (optional) This dictionary specific details around tuning the NeuralProphet and Prophet models.
n_trials: The number of separate tuning jobs to run. Increasing this integer increases the time to completion, but may improve the quality.
preprocessing: (optional) Preprocessing and feature engineering can be disabled using this flag, Defaults to true
metric: (optional) The metric to select across. Users can select among: MAPE, RMSE, MSE, and SMAPE
confidence_interval_width: (optional) The width of the confidence interval to caluclate in the forecast and report.html. Defaults to 0.80 meaning an 80% confidence interval
report_filename: (optional) Placed into output_directory location. Defaults to report.html
report_title: (optional) The title of the report.
report_theme: (optional) Can be “dark” or “light”. Defaults to “light”.
metrics_filename: (optional) Placed into output_directory location. Defaults to metrics.csv
test_metrics_filename: (optional) Placed into output_directory location. Defaults to test_metrics.csv
forecast_filename: (optional) Placed into output_directory location. Defaults to forecast.csv
generate_explanations: (optional) Explainability, both local and global, can be disabled using this flag. Defaults to false.
generate_report: (optional) Report file generation can be enabled using this flag. Defaults to true.
generate_metrics: (optional) Metrics files generation can be enabled using this flag. Defaults to true.