Productionize#
Configure#
After setting up ads opctl on your desired machine using ads opctl configure, you are ready to begin your anomaly detection application. At a bare minimum, you will need to provide the following details about the data:
Path to the input data (input_data)
Name of the Datetime column (datetime_column)
Name of the Target column (target_column)
These details exactly match the initial anomaly.yaml file generated by running ads operator init --type anomaly:
kind: operator
type: anomaly
version: v1
spec:
  datetime_column:
    name: Date
  input_data:
    url: data.csv
  target_column: target
Optionally, you are able to specify much more. The most common additions are:
Path to the validation data, which has all of the columns of the input_data plus an anomaly column (validation_data)
Path to the test data, in the event you want to evaluate the selected model on a test set (test_data)
List of column names that index different time series within the data, such as a product ID or some other such series (target_category_columns)
Path to the output directory, where the operator will place the outliers.csv, report.html, and other artifacts produced by the run (output_directory)
An extensive list of parameters can be found in the YAML Schema section.
Run#
Once written, run the anomaly.yaml file:
ads operator run -f anomaly.yaml
Interpret Results#
The anomaly detection operator produces several output files: outliers.csv, report.html, metrics.csv, and optionally inliers.csv. We will go through each of these output files in turn.
outliers.csv
This file contains the entire historical dataset with the following columns:
Date: The datetime index of the series
Series: The categorical or numerical index identifying each series
Target Column: The target values from the input data
Score: An anomaly score between 0 and 1 indicating how anomalous each datapoint is
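As a quick sketch of how this output might be consumed downstream (the column names here are illustrative placeholders; check them against your actual outliers.csv), one could flag rows whose score exceeds a chosen cutoff:

```python
import csv
import io

# Illustrative stand-in for outliers.csv; real column names and values
# depend on your data and configuration.
outliers_csv = """Date,Series,target,score
2023-01-01,A,10.0,0.02
2023-01-02,A,95.0,0.91
2023-01-03,A,11.0,0.05
"""

THRESHOLD = 0.5  # hypothetical cutoff; choose one appropriate to your use case

with io.StringIO(outliers_csv) as f:
    rows = list(csv.DictReader(f))

# Keep only rows whose anomaly score exceeds the threshold.
anomalies = [r for r in rows if float(r["score"]) > THRESHOLD]
for r in anomalies:
    print(r["Date"], r["target"], r["score"])
```

In practice you would replace the inline string with open("results/outliers.csv") pointing at your configured output_directory.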
report.html
The report.html file is designed differently for each model type. Generally, it contains a summary of the input and validation data, a plot of the target from input data overlaid with red dots for anomalous values, analysis of the models used, and details about the model components. It also includes a receipt YAML file, providing a fully detailed version of the original anomaly.yaml file.
metrics.csv
The metrics file includes relevant metrics calculated on the training set.
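For intuition, anomaly-detection metrics are typically classification-style scores computed against labeled anomalies. The following is a minimal sketch with made-up labels, not the operator's internal computation:

```python
# Hypothetical ground-truth and predicted anomaly labels (1 = anomaly).
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 1, 1, 0, 0, 0]

# Count true positives, false positives, and false negatives.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```

The exact metrics written to metrics.csv depend on the selected model; consult the generated file for the definitive list.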
Examples#
Simple Example
The simplest yaml file is generated by ads operator init --type anomaly and looks like the following:
kind: operator
type: anomaly
version: v1
spec:
  datetime_column:
    name: Date
  input_data:
    url: data.csv
  model: auto
  target_column: target
Typical Example
A typical anomaly detection application may have the following fields:
kind: operator
type: anomaly
version: v1
spec:
  input_data:
    connect_args:
      user: XXX
      password: YYY
      dsn: "localhost/orclpdb"
    sql: 'SELECT Series, Total, time FROM live_data'
  datetime_column:
    name: time
    format: "%H:%M:%S"
  model: "auto"
  output_directory:
    url: results
  target_category_columns:
    - Series
  target_column: Total
  test_data:
    url: oci://bucket@namespace/test_data.csv
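The format field of datetime_column appears to follow Python's strptime directives; as a quick illustration (not part of the operator itself), "%H:%M:%S" from the example above parses 24-hour timestamps like so:

```python
from datetime import datetime

# "%H:%M:%S" matches 24-hour times such as those produced by the SQL query above.
ts = datetime.strptime("13:45:10", "%H:%M:%S")
print(ts.hour, ts.minute, ts.second)  # → 13 45 10
```

If the format string does not match the data (for example, 12-hour times without %p), parsing fails, so verify it against a sample of your datetime column.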
Complex Example
The yaml can also be fully specified as follows:
kind: operator
type: anomaly
version: v1
spec:
  input_data:
    connect_args:
      user: XXX
      password: YYY
      dsn: "localhost/orclpdb"
    sql: 'SELECT Store_ID, Sales, Date FROM live_data'
  validation_data:
    url: oci://bucket@namespace/additional_data.csv
    columns:
      - Date
      - Store_ID
      - v1
      - v3
      - v4
  output_directory:
    url: results
  test_data:
    url: test_data.csv
  target_category_columns:
    - Store_ID
  target_column: Sales
  datetime_column:
    format: "%d/%m/%y"
    name: Date
  model: automlx
  model_kwargs:
    time_budget: 100
  preprocessing: true
  generate_metrics: true
  generate_report: true
  metrics_filename: metrics.csv
  report_filename: report.html
  report_theme: light
  test_metrics_filename: test_metrics.csv