Getting Started

Configure

Once you have set up ads opctl on your machine with ads opctl configure, you are ready to use the pii operator. At a minimum, you need to provide the following details about your task:

  • Path to the input data (input_data)

  • Path to the output directory, where the operator will place the processed data and report.html produced from the run (output_directory)

  • Name of the column with user data (target_column)

  • The detectors to use in the operator (detectors)

See Configure Detector for details on how to configure the detectors parameter. These details match the initial pii.yaml file generated by running ads operator init --type pii:

kind: operator
type: pii
version: v1
spec:
    input_data:
        url: mydata.csv
    target_column: target
    output_directory:
        url: result/
    detectors:
        - name: default.phone
          action: mask
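
The four required fields listed above can be checked before a run. The following is a minimal sketch, not part of ADS: the helper name missing_required_fields is hypothetical, and the configuration is written directly as a Python dict that mirrors the generated pii.yaml (loading the file itself would need a YAML parser, which is not shown here). Treating an empty value as missing is a choice made for this sketch.

```python
# Hypothetical pre-flight check; this helper is not part of ADS.
REQUIRED_SPEC_KEYS = ("input_data", "output_directory", "target_column", "detectors")


def missing_required_fields(config: dict) -> list:
    """Return the required spec keys that are absent or empty."""
    spec = config.get("spec") or {}
    return [key for key in REQUIRED_SPEC_KEYS if not spec.get(key)]


# Same structure as the generated pii.yaml, written as a Python dict.
config = {
    "kind": "operator",
    "type": "pii",
    "version": "v1",
    "spec": {
        "input_data": {"url": "mydata.csv"},
        "target_column": "target",
        "output_directory": {"url": "result/"},
        "detectors": [{"name": "default.phone", "action": "mask"}],
    },
}

print(missing_required_fields(config))  # prints []
```

An empty list means all four required fields are present; any names in the list still need to be filled in before running the operator.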

Optionally, you can specify much more. The most common additions are:

  • Whether to show sensitive content in the report (show_sensitive_content)

  • How to process each detected entity (action)

An extensive list of parameters can be found in the YAML Schema.

Run

Once your pii.yaml is written, run the operator with:

ads operator run -f pii.yaml
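
If the run needs to be triggered from a script rather than a terminal, the same command can be wrapped in Python. This is only a sketch: build_run_command and run_operator are hypothetical helper names, and the only thing taken from this guide is the command line itself.

```python
import shutil
import subprocess


def build_run_command(config_path="pii.yaml"):
    """Build the argument list for the CLI command shown above."""
    return ["ads", "operator", "run", "-f", config_path]


def run_operator(config_path="pii.yaml"):
    """Run the operator and return its exit code (hypothetical wrapper)."""
    if shutil.which("ads") is None:
        raise RuntimeError("the ads CLI was not found on PATH")
    # check=True raises CalledProcessError if the command exits non-zero.
    return subprocess.run(build_run_command(config_path), check=True).returncode
```

Calling run_operator("pii.yaml") is then equivalent to typing the command by hand, with a clearer error when the CLI is not installed.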

Interpret Results

The pii operator produces the following output files: mydata-out.csv and report.html.

We will go through each of these output files in turn.

mydata-out.csv

The name of this file can be customized through the output_directory parameters in the configuration YAML. This file contains the processed dataset.
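
A quick way to inspect the processed dataset is to print a few values from the target column. The sketch below uses only the Python standard library. It assumes the file was written to result/mydata-out.csv (the output_directory from the example configuration plus the file name above) and that the processed file keeps the target column under its original name; preview_column is a hypothetical helper, not part of ADS.

```python
import csv
from pathlib import Path


def preview_column(csv_path, column, limit=5):
    """Return up to `limit` values from `column` of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        values = []
        for row in reader:
            if len(values) >= limit:
                break
            values.append(row[column])
        return values


# Assumed location: the example output_directory plus the output file name.
output_file = Path("result") / "mydata-out.csv"
if output_file.exists():
    for value in preview_column(output_file, "target"):
        print(value)
```

Comparing these values against the same rows of the input file shows what the configured detectors and action changed.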

report.html

The report.html file is customized through the report parameters in the configuration YAML. It contains summary statistics, a plot of the entity distribution, details of the resolved entities, and details about any model used. By default, sensitive information is not shown in the report, but for debugging purposes you can override this default with show_sensitive_content. The report also includes a copy of the YAML file, providing a fully detailed version of the original specification.