Productionize#

Configure#

After setting up ads opctl on your desired machine using ads opctl configure, you are ready to begin your anomaly detection application. At a bare minimum, you will need to provide the following details about the data:

  • Path to the input data (input_data)

  • Name of the Datetime column (datetime_column)

  • Name of the Target column (target_column)

These details exactly match the initial anomaly.yaml file generated by running ads operator init --type anomaly:

kind: operator
type: anomaly
version: v1
spec:
    datetime_column:
        name: Date
    input_data:
        url: data.csv
    target_column: target

Optionally, you can specify much more. The most common additions are the following (a sketch combining them appears after this list):

  • Path to the validation data, which has all of the columns of the input_data plus an anomaly column (validation_data)

  • Path to the test data, if you want to evaluate the selected model on a test set (test_data)

  • List of column names that index different time series within the data, such as a product ID (target_category_columns)

  • Path to the output directory, where the operator will place the outliers.csv, report.html, and other artifacts produced from the run (output_directory)

An extensive list of parameters can be found in the YAML Schema section.
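
For instance, the minimal configuration above extended with these optional fields might look like the following sketch; the validation and test file paths are illustrative, and Series stands in for whatever column indexes your data:

kind: operator
type: anomaly
version: v1
spec:
    datetime_column:
        name: Date
    input_data:
        url: data.csv
    validation_data:
        url: validation_data.csv
    test_data:
        url: test_data.csv
    target_category_columns:
        - Series
    output_directory:
        url: results
    target_column: target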

Run#

Once written, run the anomaly.yaml file:

ads operator run -f anomaly.yaml

Interpret Results#

The anomaly detection operator produces several output files: outliers.csv, report.html, metrics.csv, and optionally inliers.csv.

We will go through each of these output files in turn.

outliers.csv

This file contains the entire historical dataset with the following columns:

  • Date: the datetime of each observation (from datetime_column)

  • Series: the categorical or numerical index identifying each series (from target_category_columns)

  • Target Column: the original input value for that observation

  • Score: a score from 0 to 1 indicating how anomalous the data point is

report.html

The report.html file is designed differently for each model type. Generally, it contains a summary of the input and validation data, a plot of the target from input data overlaid with red dots for anomalous values, analysis of the models used, and details about the model components. It also includes a receipt YAML file, providing a fully detailed version of the original anomaly.yaml file.
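
As shown in the Complex Example below, report generation can be tuned from the spec. A minimal sketch of the relevant fields, using the same values as that example:

spec:
    # (other fields as above)
    generate_report: true
    report_filename: report.html
    report_theme: light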

metrics.csv

The metrics file includes relevant metrics calculated on the training set.
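
Similarly, metrics output can be controlled from the spec. A minimal sketch, again using only fields that appear in the Complex Example below:

spec:
    # (other fields as above)
    generate_metrics: true
    metrics_filename: metrics.csv
    # test metrics filename; relevant when test_data is configured
    test_metrics_filename: test_metrics.csv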

Examples#

Simple Example

The simplest YAML file is generated by running ads operator init --type anomaly and looks like the following:

kind: operator
type: anomaly
version: v1
spec:
    datetime_column:
        name: Date
    input_data:
        url: data.csv
    model: auto
    target_column: target

Typical Example

A typical anomaly detection application may have the following fields:

kind: operator
type: anomaly
version: v1
spec:
    input_data:
        connect_args:
            user: XXX
            password: YYY
            dsn: "localhost/orclpdb"
        sql: 'SELECT Series, Total, time FROM live_data'
    datetime_column:
        name: time
        format: "%H:%M:%S"
    model: "auto"
    output_directory:
        url: results
    target_category_columns:
        - Series
    target_column: Total
    test_data:
        url: oci://bucket@namespace/test_data.csv

Complex Example

The YAML file can also be maximally specified as follows:

kind: operator
type: anomaly
version: v1
spec:
    input_data:
        connect_args:
            user: XXX
            password: YYY
            dsn: "localhost/orclpdb"
        sql: 'SELECT Store_ID, Sales, Date FROM live_data'
    validation_data:
        url: oci://bucket@namespace/additional_data.csv
        columns:
            - Date
            - Store_ID
            - v1
            - v3
            - v4
    output_directory:
        url: results
    test_data:
        url: test_data.csv
    target_category_columns:
        - Store_ID
    target_column: Sales
    datetime_column:
        format: "%d/%m/%y"
        name: Date
    model: automlx
    model_kwargs:
        time_budget: 100
    preprocessing: true
    generate_metrics: true
    generate_report: true
    metrics_filename: metrics.csv
    report_filename: report.html
    report_theme: light
    test_metrics_filename: test_metrics.csv