Productionize¶
Configure¶
After setting up ads opctl on your desired machine using ads opctl configure, you are ready to begin your anomaly detection application. At a bare minimum, you will need to provide the following details about the data:
Path to the input data (input_data)
Name of the Datetime column (datetime_column)
Name of the Target column (target_column)
These details exactly match the initial anomaly.yaml file generated by running ads operator init --type anomaly:
kind: operator
type: anomaly
version: v1
spec:
  datetime_column:
    name: Date
  input_data:
    url: data.csv
  target_column: target
Optionally, you can specify much more. The most common additions are:
Path to the validation data, which has all of the columns of the input_data plus an anomaly column (validation_data)
Path to the test data, in the event you want to evaluate the selected model on a test set (test_data)
List of column names that index different time series within the data, such as a product ID or other series identifier (target_category_columns)
Path to the output directory, where the operator will place the outliers.csv, report.html, and other artifacts produced by the run (output_directory)
An extensive list of parameters can be found in the YAML Schema
section.
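As a sketch, a spec combining these optional fields might look like the following (the file paths and the Store_ID index column are placeholders for illustration, not values the operator requires):

kind: operator
type: anomaly
version: v1
spec:
  datetime_column:
    name: Date
  input_data:
    url: data.csv
  validation_data:
    url: validation_data.csv
  test_data:
    url: test_data.csv
  target_category_columns:
    - Store_ID
  output_directory:
    url: results
  target_column: target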
Run¶
Once written, run the anomaly.yaml file:
ads operator run -f anomaly.yaml
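A typical end-to-end flow, using only the commands shown in this document, is:

ads operator init --type anomaly    # generates the initial anomaly.yaml
# edit anomaly.yaml: set input_data, datetime_column, and target_column
ads operator run -f anomaly.yaml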
Interpret Results¶
The anomaly detection operator produces several output files: outliers.csv, report.html, and optionally inliers.csv and metrics.csv.
We will go through each of these output files in turn.
outliers.csv
This file contains the entire historical dataset with the following columns:
Date: the time series index
Series: the categorical or numerical series index
Target Column: the input data
Score: a score from 0 to 1 indicating how anomalous the datapoint is
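For illustration only, the first rows of an outliers.csv might look like the following; all values here are hypothetical, and the target column appears under the name given as target_column in the spec:

Date,Series,Target,Score
2024-01-01,A,102.5,0.02
2024-01-02,A,98.7,0.04
2024-01-03,A,311.9,0.97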
report.html
The report.html file is designed differently for each model type. Generally, it contains a summary of the input and validation data, a plot of the target from input data overlaid with red dots for anomalous values, analysis of the models used, and details about the model components. It also includes a receipt YAML file, providing a fully detailed version of the original anomaly.yaml file.
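Report generation can be tuned in the spec. The following fragment (the same fields appear in the Complex Example below) shows the relevant options:

spec:
  generate_report: true
  report_filename: report.html
  report_theme: light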
metrics.csv
The metrics file includes relevant metrics calculated on the training set.
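Similarly, metric generation and the output filenames are controlled in the spec; a fragment using the fields from the Complex Example below:

spec:
  generate_metrics: true
  metrics_filename: metrics.csv
  test_metrics_filename: test_metrics.csv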
Examples¶
Simple Example
The simplest yaml file is generated by running ads operator init --type anomaly and looks like the following:
kind: operator
type: anomaly
version: v1
spec:
  datetime_column:
    name: Date
  input_data:
    url: data.csv
  model: auto
  target_column: target
Typical Example
A typical anomaly detection application may have the following fields:
kind: operator
type: anomaly
version: v1
spec:
  input_data:
    connect_args:
      user: XXX
      password: YYY
      dsn: "localhost/orclpdb"
    sql: 'SELECT Series, Total, time FROM live_data'
  datetime_column:
    name: time
    format: "%H:%M:%S"
  model: "auto"
  output_directory:
    url: results
  target_category_columns:
    - Series
  target_column: Total
  test_data:
    url: oci://bucket@namespace/test_data.csv
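If the input lives in Object Storage rather than a database, input_data takes a url instead of connect_args and sql, as in the other examples on this page; a sketch of that fragment:

spec:
  input_data:
    url: oci://bucket@namespace/data.csv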
Complex Example
The yaml can also be specified in full, as follows:
kind: operator
type: anomaly
version: v1
spec:
  input_data:
    connect_args:
      user: XXX
      password: YYY
      dsn: "localhost/orclpdb"
    sql: 'SELECT Store_ID, Sales, Date FROM live_data'
  validation_data:
    url: oci://bucket@namespace/additional_data.csv
    columns:
      - Date
      - Store_ID
      - v1
      - v3
      - v4
  output_directory:
    url: results
  test_data:
    url: test_data.csv
  target_category_columns:
    - Store_ID
  target_column: Sales
  datetime_column:
    format: "%d/%m/%y"
    name: Date
  model: automlx
  model_kwargs:
    time_budget: 100
  preprocessing: true
  generate_metrics: true
  generate_report: true
  metrics_filename: metrics.csv
  report_filename: report.html
  report_theme: light
  test_metrics_filename: test_metrics.csv