After setting up ads opctl on your machine with ads opctl configure, you are ready to use the PII operator. At a bare minimum, you need to provide the following details about your task:
Path to the input data (input_data)
Path to the output directory, where the operator will place the processed data and report.html produced from the run (output_directory)
Name of the column with user data (target_column)
The detectors to be used by the operator (detectors)
See Configure Detector for more details on how to configure the detectors parameter. The details above match the initial pii.yaml file generated by running ads operator init --type pii, which lists the following detector:
- name: default.phone
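As a quick sanity check before a run, the sketch below models the configuration as a Python dictionary and reports which of the four required parameters are missing or empty. The example paths, the column name, and the nested "url" sub-keys are illustrative assumptions, not values taken from a generated pii.yaml.

```python
from __future__ import annotations

# The four parameters the PII operator needs at a bare minimum.
REQUIRED_KEYS = ("input_data", "output_directory", "target_column", "detectors")


def missing_required(spec: dict) -> list[str]:
    """Return the required parameters that are absent or empty in `spec`."""
    return [key for key in REQUIRED_KEYS if not spec.get(key)]


# Hypothetical spec: the paths, the column name, and the "url" sub-keys
# are illustrative assumptions, not values from a real run.
example_spec = {
    "input_data": {"url": "data/customers.csv"},
    "output_directory": {"url": "results/"},
    "target_column": "notes",
    "detectors": [{"name": "default.phone"}],
}

print(missing_required(example_spec))  # → []
print(missing_required({"target_column": "notes"}))
# → ['input_data', 'output_directory', 'detectors']
```

If you load pii.yaml with a YAML parser such as PyYAML, you would pass the mapping that holds these parameters (assumed here to be the file's spec section) to missing_required.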
Optionally, you can specify many more parameters. The most common additions are:
Whether to show sensitive content in the report (show_sensitive_content)
How to process each detected entity (action)
An extensive list of parameters can be found in the YAML Schema.
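To make the action parameter more concrete, the toy sketch below applies two hypothetical actions, masking and removal, to entity spans that a detector might return. It is not the operator's implementation: the span format, the action names "mask" and "remove", and the masking character are assumptions chosen for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Entity(NamedTuple):
    """A detected entity as a half-open character span [start, end)."""
    start: int
    end: int
    label: str


def apply_action(text: str, entities: list[Entity], action: str) -> str:
    """Return `text` with every entity span masked or removed (toy example).

    Assumes the spans do not overlap.
    """
    # Edit right to left so the offsets of earlier spans stay valid.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        if action == "mask":
            replacement = "*" * (ent.end - ent.start)
        elif action == "remove":
            replacement = ""
        else:
            raise ValueError(f"unsupported action: {action!r}")
        text = text[: ent.start] + replacement + text[ent.end :]
    return text


note = "Call me at 555-0100 tomorrow."
found = [Entity(11, 19, "phone")]  # the span covering "555-0100"
print(apply_action(note, found, "mask"))    # Call me at ******** tomorrow.
print(apply_action(note, found, "remove"))  # Call me at  tomorrow.
```

Processing spans from right to left is a common way to avoid recomputing offsets after each replacement changes the string length.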
Once your pii.yaml is written, run the operator with:
ads operator run -f pii.yaml
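If you prefer to launch runs from a Python script, for example to process several configuration files in sequence, a thin wrapper around the same CLI command is enough. This sketch uses only the standard subprocess module; it assumes the ads executable is on your PATH and does not rely on any ADS Python API.

```python
from __future__ import annotations

import shutil
import subprocess


def build_run_command(config_path: str) -> list[str]:
    """Return the `ads operator run -f <config>` invocation as an argument list."""
    return ["ads", "operator", "run", "-f", config_path]


def run_pii_operator(config_path: str = "pii.yaml") -> None:
    """Run the operator; raise CalledProcessError if the command fails."""
    if shutil.which("ads") is None:
        raise FileNotFoundError("the 'ads' CLI was not found on PATH")
    subprocess.run(build_run_command(config_path), check=True)


# Usage (requires the ads CLI to be installed and configured):
# run_pii_operator("pii.yaml")
```

Passing the command as a list rather than a single string avoids shell quoting issues when the configuration path contains spaces.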
The PII operator writes two output files to output_directory: the processed dataset and report.html.
We will go through each of these output files in turn.
The first file contains the processed dataset. Its name can be customized through the output_directory parameters in the configuration YAML.
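For a quick look at the processed dataset without extra dependencies, the sketch below reads it with Python's csv module and returns the first few values of the target column. The CSV format, the file name, and the local path in the usage comment are assumptions; use whatever name and location your output_directory settings produce.

```python
from __future__ import annotations

import csv
from itertools import islice
from pathlib import Path


def preview_column(path: str | Path, column: str, n: int = 5) -> list[str]:
    """Return up to `n` values of `column` from a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # fieldnames reads the header row; it is None for an empty file.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        return [row[column] for row in islice(reader, n)]


# Hypothetical output location and column name; adjust to your configuration.
# print(preview_column("results/processed_data.csv", "notes"))
```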
The report.html file is customized through the report parameters in the configuration YAML. It contains summary statistics, a plot of the entity distribution, details of the resolved entities, and details about any model used. By default, sensitive information is not shown in the report; for debugging purposes, you can show it by enabling show_sensitive_content. The report also includes a copy of the YAML file, providing a fully detailed version of the original specification.