Productionize

Configure

After setting up ads opctl on your desired machine using ads opctl configure, you are ready to begin your anomaly detection application. At a bare minimum, you will need to provide the following details about the data:

  • Path to the input data (input_data)

  • Name of the Datetime column (datetime_column)

  • Name of the Target column (target_column)

These details exactly match the initial anomaly.yaml file generated by running ads operator init --type anomaly:

kind: operator
type: anomaly
version: v1
spec:
    datetime_column:
        name: Date
    input_data:
        url: data.csv
    target_column: target
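As a quick sanity check before running the operator, the three minimum fields can be verified programmatically. The sketch below is only an illustration: missing_required_fields is a hypothetical helper, not part of ADS, and the configuration is written as a Python dict that mirrors the generated anomaly.yaml rather than being loaded from the file.

```python
# Hypothetical helper, not part of ADS: report which of the three minimum
# fields named above are absent or empty in a spec dict.
REQUIRED_FIELDS = ("input_data", "datetime_column", "target_column")


def missing_required_fields(spec):
    """Return the required field names that are absent or empty in spec."""
    return [name for name in REQUIRED_FIELDS if not spec.get(name)]


# Same content as the generated anomaly.yaml, written as a Python dict.
config = {
    "kind": "operator",
    "type": "anomaly",
    "version": "v1",
    "spec": {
        "datetime_column": {"name": "Date"},
        "input_data": {"url": "data.csv"},
        "target_column": "target",
    },
}

print(missing_required_fields(config["spec"]))               # complete spec
print(missing_required_fields({"target_column": "target"}))  # incomplete spec
```

The operator performs its own validation against the YAML schema; a check like this only catches the most basic omissions early.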

You can optionally specify many more parameters. The most common additions are:

  • Path to the validation data, which has all of the columns of the input_data plus an anomaly column (validation_data)

  • Path to test data, in the event you want to evaluate the selected model on a test set (test_data)

  • List of column names that index different time series within the data, such as a product_ID or another series identifier (target_category_columns)

  • Path to the output directory, where the operator will place the outliers.csv, report.html, and other artifacts produced from the run (output_directory)

An extensive list of parameters can be found in the YAML Schema section.
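To make the role of target_category_columns more concrete, the following sketch groups rows into one series per distinct value of the category columns. It is a conceptual illustration only and makes no claim about how the operator does this internally. The sample rows are made up, and split_into_series is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; column names are assumptions.
SAMPLE_CSV = """Date,Store_ID,Sales
2024-01-01,A,10
2024-01-01,B,7
2024-01-02,A,12
2024-01-02,B,8
"""


def split_into_series(rows, category_columns, target_column):
    """Group target values by the tuple of category-column values."""
    series = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in category_columns)
        series[key].append(float(row[target_column]))
    return dict(series)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
series = split_into_series(rows, ["Store_ID"], "Sales")
for key, values in sorted(series.items()):
    print(key, values)
```

Each key in the result corresponds to one independent time series, here one per Store_ID.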

Run

Once written, run the anomaly.yaml file:

ads operator run -f anomaly.yaml
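If the run needs to be triggered from a script rather than a terminal, the same command can be invoked through Python's subprocess module. This is a minimal sketch: build_run_command and run_operator are hypothetical wrapper names, and the only CLI invocation used is the one shown above.

```python
import shutil
import subprocess


def build_run_command(config_path="anomaly.yaml"):
    """Return the argument list for the CLI call shown above."""
    return ["ads", "operator", "run", "-f", config_path]


def run_operator(config_path="anomaly.yaml"):
    """Run the operator and return its exit code, or None if ads is not on PATH."""
    if shutil.which("ads") is None:
        return None
    return subprocess.run(build_run_command(config_path), check=False).returncode


# Print the command that would be executed, without running it.
print(" ".join(build_run_command()))
```

Passing the command as a list avoids shell quoting issues when the configuration path contains spaces.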

Interpret Results

The anomaly detection operator produces several output files: outliers.csv, report.html, and optionally inliers.csv and a metrics file.

The main output files are described below.

outliers.csv

This file contains the entire historical dataset with the following columns:

  • Date: Time series data

  • Series: Categorical or numerical index

  • Target Column: Input data

  • Score: A score from 0 to 1 indicating how anomalous a data point is
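A common next step is to keep only the rows whose score exceeds some cutoff. The sketch below uses Python's csv module on made-up rows that stand in for outliers.csv. The exact header names in a real file may differ from the ones assumed here, and the threshold value is arbitrary rather than an operator default.

```python
import csv
import io

# Made-up rows standing in for outliers.csv; real headers may differ.
SAMPLE_OUTLIERS = """Date,Series,target,Score
2024-01-01,A,10.0,0.05
2024-01-02,A,55.0,0.93
2024-01-03,A,11.0,0.10
"""


def rows_above_threshold(csv_file, score_column="Score", threshold=0.5):
    """Return the rows whose score is at or above the threshold."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if float(row[score_column]) >= threshold]


flagged = rows_above_threshold(io.StringIO(SAMPLE_OUTLIERS), threshold=0.9)
for row in flagged:
    print(row["Date"], row["Score"])
```

To use this on a real run, replace the in-memory sample with an opened file, for example open("results/outliers.csv", newline=""), adjusting the path to your output_directory.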

report.html

The report.html file is designed differently for each model type. Generally, it contains a summary of the input and validation data, a plot of the target from input data overlaid with red dots for anomalous values, analysis of the models used, and details about the model components. It also includes a receipt YAML file, providing a fully detailed version of the original anomaly.yaml file.

metrics.csv

The metrics file includes relevant metrics calculated on the training set.

Examples

Simple Example

The simplest YAML file is generated by running ads operator init --type anomaly and looks like the following:

kind: operator
type: anomaly
version: v1
spec:
    datetime_column:
        name: Date
    input_data:
        url: data.csv
    model: auto
    target_column: target

Typical Example

A typical anomaly detection application may have the following fields:

kind: operator
type: anomaly
version: v1
spec:
    input_data:
        connect_args:
            user: XXX
            password: YYY
            dsn: "localhost/orclpdb"
        sql: 'SELECT Series, Total, time FROM live_data'
    datetime_column:
        name: time
        format: "%H:%M:%S"
    model: "auto"
    output_directory:
        url: results
    target_category_columns:
        - Series
    target_column: Total
    test_data:
        url: oci://bucket@namespace/test_data.csv

Complex Example

A more fully specified YAML file might look like the following:

kind: operator
type: anomaly
version: v1
spec:
    input_data:
        connect_args:
            user: XXX
            password: YYY
            dsn: "localhost/orclpdb"
        sql: 'SELECT Store_ID, Sales, Date FROM live_data'
    validation_data:
        url: oci://bucket@namespace/additional_data.csv
        columns:
            - Date
            - Store_ID
            - v1
            - v3
            - v4
    output_directory:
        url: results
    test_data:
        url: test_data.csv
    target_category_columns:
        - Store_ID
    target_column: Sales
    datetime_column:
        format: "%d/%m/%y"
        name: Date
    model: automlx
    model_kwargs:
        time_budget: 100
    preprocessing: true
    generate_metrics: true
    generate_report: true
    metrics_filename: metrics.csv
    report_filename: report.html
    report_theme: light
    test_metrics_filename: test_metrics.csv
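The format values under datetime_column in the examples above ("%H:%M:%S" and "%d/%m/%y") look like strftime-style directives. Assuming they follow the same conventions as Python's datetime module, the sketch below shows how such strings would be interpreted. The sample values are made up.

```python
from datetime import datetime

# "%H:%M:%S" parses a time of day only; Python fills in 1900-01-01 as the date.
time_only = datetime.strptime("14:30:05", "%H:%M:%S")

# "%d/%m/%y" is day first with a two-digit year.
day_first = datetime.strptime("25/12/23", "%d/%m/%y")

print(time_only.time())
print(day_first.date())
```

Trying a format string against a few sample values this way is a cheap check before a full run, especially for day-first versus month-first dates.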