Regression Operator¶

The Regression Operator is a low-code operator for supervised tabular regression. It trains a model from a training dataset, optionally evaluates on held-out test data, and writes a consistent set of artifacts such as predictions, metrics, an HTML report, and a serialized model bundle.

Overview¶

Required inputs

The current implementation requires:

training_data
target_column

All columns in training_data except target_column are treated as features.

Optional inputs

The operator also supports:

test_data for held-out evaluation
output_directory for artifact location
column_types to override automatic type inference
model_kwargs to control explicit model runs
save_and_deploy_to_md to save the trained model to OCI Model Catalog and create a Model Deployment

Supported models

The supported model values are:

auto
linear_regression
random_forest
knn
xgboost

auto performs cross-validation across the explicit model families and selects the best one for the configured metric. Explicit models use Optuna-based tuning by default.

Preprocessing

By default, the operator:

infers numeric, categorical, and date columns
imputes missing numeric values with the median
imputes missing categorical values with the mode
one-hot encodes categorical columns
expands date columns into year, month, day, dayofweek, and dayofyear

Artifacts

Depending on the configuration and available data, the operator can write:

training_predictions.csv
test_predictions.csv
training_metrics.csv
test_metrics.csv
global_explanations.csv
report.html
model.pkl
model_registration_info.json
deployment_info.json

global_explanations.csv is written only when generate_explanations: true and explainability output is successfully produced.