Model Registration#

Model Artifact#

To save a trained model on OCI Data Science, prepare a Model Artifact.

A Model Artifact is a zip file that contains the following artifacts:

  • Serialized model or models

  • runtime.yaml - This YAML file captures provenance information and the deployment conda environment

  • score.py - The entry module used by the model deployment server to load the model and run predictions

  • input_schema.json - Describes the schema of the features that will be used within the predict function

  • output_schema.json - Describes the schema of the prediction values

  • Any other artifacts that are required at inference time.

ADS can auto-generate all the mandatory files to help you save models that are compliant with the OCI Data Science Model Deployment service.

Auto-generation of score.py with framework-specific code for loading models and fetching predictions is available for the following frameworks:

  • scikit-learn

  • XGBoost

  • LightGBM

  • PyTorch

  • SparkPipelineModel

  • TensorFlow

To accommodate other frameworks that are unknown to ADS, template code for score.py is generated in the provided artifact directory location.
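For such frameworks, one common entry point is the GenericModel class. Below is a minimal, hedged sketch of that flow; my_estimator stands in for any trained object ADS does not recognize, and the conda environment slug is only illustrative.

from ads.model.generic_model import GenericModel

# A sketch only: wrap an arbitrary trained object so ADS writes the template
# score.py into the artifact directory for you to edit.
generic_model = GenericModel(estimator=my_estimator, artifact_dir="./my-artifact-dir")
generic_model.prepare(inference_conda_env="generalml_p38_cpu_v1")
# Fill in the generated score.py template in ./my-artifact-dir as needed.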

Prepare the Model Artifact#

To prepare the model artifact -

  • Train a model using the framework of your choice

  • Create a Model object from one of the framework-specific model classes available under ads.model.framework.*. The Model class takes two parameters: the estimator object and a directory location to store auto-generated artifacts.

  • Call prepare() to generate all the files.

See API documentation for more details about the parameters.

Here is an example of preparing a model artifact for a TensorFlow model.

from ads.model.framework.tensorflow_model import TensorFlowModel
from uuid import uuid4
import tensorflow as tf
from ads.common.model_metadata import UseCaseType

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

tf_estimator = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10),
    ]
)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
tf_estimator.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
tf_estimator.fit(x_train, y_train, epochs=1)

tf_model = TensorFlowModel(tf_estimator, artifact_dir=f"./model-artifact-{str(uuid4())}")

# Autogenerate score.py, pickled model, runtime.yaml, input_schema.json and output_schema.json
tf_model.prepare(
    inference_conda_env="generalml_p38_cpu_v1",
    use_case_type=UseCaseType.MULTINOMIAL_CLASSIFICATION,
    X_sample=x_train,
    y_sample=y_train,
)

# Verify generated artifacts
tf_model.verify(x_test[:1])

# Register TensorFlow model
model_id = tf_model.save()
# Files generated in the artifact directory:
# ['output_schema.json', 'score.py', 'runtime.yaml', 'model.h5', '.model-ignore', 'input_schema.json']

ADS automatically captures:

  • Provenance metadata, such as the commit id and git branch.

  • Taxonomy metadata such as model hyperparameters, framework name.

  • Custom metadata, such as the Data Science conda environment (when available), model artifact inventory, model serialization format, etc.

  • Schema of the input and target variables. This requires an input sample and a target sample to be passed when calling prepare.
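A quick way to inspect what was captured, sketched against the tf_model from the example above (output elided):

# Hedged sketch: print the captured metadata and schemas for review.
print(tf_model.metadata_provenance)   # commit id, git branch, ...
print(tf_model.metadata_taxonomy)     # framework name, hyperparameters, use case type, ...
print(tf_model.metadata_custom)       # conda environment, artifact inventory, serialization format, ...
print(tf_model.schema_input)          # schema inferred from X_sample
print(tf_model.schema_output)         # schema inferred from y_sample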

Note:

  • UseCaseType in metadata_taxonomy cannot be automatically populated. One way to populate the use case is to pass use_case_type to the prepare method.

  • Model introspection is automatically triggered.

Prepare with custom score.py#

New in version 2.8.4.

You can provide the location of your own score.py through the score_py_uri parameter of prepare(). The provided score.py is added to the model artifact.

tf_model.prepare(
    inference_conda_env="generalml_p38_cpu_v1",
    use_case_type=UseCaseType.MULTINOMIAL_CLASSIFICATION,
    X_sample=x_train,
    y_sample=y_train,
    score_py_uri="/path/to/score.py"
)

score.py#

In the prepare step, the service automatically generates a score.py file in the artifact directory.

score.py is used by the Data Science Model Deployment service to generate predictions for the input features. Here is a minimal score.py implementation:

import os
import joblib

model_file_name = "model.joblib"

def load_model():
    # load_model must return a non-None object.
    model_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), model_file_name)
    with open(model_path, "rb") as mfile:
        model = joblib.load(mfile)
    return model

def predict(data, model=load_model()):
    return model.predict(data).tolist()

ADS auto-generates a framework-specific score.py which provides the following functionality:

  • Parse the input data and convert it to a pandas DataFrame, NumPy array, or list

  • Ensure that the data types after conversion to a pandas DataFrame match those used at training time. This is achieved using the schema definition generated during the prepare step.

  • Serialize the predictions generated by the model so that they are JSON serializable, to avoid deployment runtime errors

You can customize score.py to fit your use case. The most common reason to change the score.py file is to add preprocessing and postprocessing steps to the predict() method.
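As an illustration only (not the code ADS generates), here is a hedged sketch of a customized predict() that routes the payload through the pre_inference and post_inference hooks described later in this section; it builds on the minimal load_model shown above.

import numpy as np

def pre_inference(data):
    # Placeholder preprocessing: cast the incoming payload to float32.
    return np.asarray(data, dtype="float32")

def post_inference(yhat):
    # Placeholder postprocessing: make the raw predictions JSON serializable.
    return np.asarray(yhat).tolist()

def predict(data, model=load_model()):
    # Run preprocessing, score the model, then postprocess the result.
    features = pre_inference(data)
    yhat = model.predict(features)
    return {"prediction": post_inference(yhat)}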

Refer to the Customization section for how to change and verify the model artifacts.

The score.py consists of multiple functions, among which load_model and predict are the most important.

GPU Deployments#

When deploying your TensorFlow or PyTorch models onto a GPU shape, the ADS generated score.py manages GPU integration for you. It will automatically transfer your data to the GPU (or multiple GPUs) and perform the inference on that GPU. When using ADS 2.8.4 or later, any TensorFlow or PyTorch model artifact can be deployed efficiently on either CPU or GPU regardless of how it was trained.

The Model Deployment Service handles parallelization for you. Whether you have a single or multi GPU deployment, the Model Deployment Service will determine how many replicas of your model can be supported based on the size of your model artifact and the size of your GPU shape. Finally, the auto-generated score.py will randomly assign those replicas across the GPU(s). The following code example registers a PyTorch Model tuned and deployed on GPUs. Learn more about the Model Deployment Service here.

import numpy as np
import torch
import torchvision
from ads.common.model_metadata import UseCaseType
from ads.model.framework.pytorch_model import PyTorchModel

model = torchvision.models.resnet18(pretrained=True)
model.to("cuda:0")

# Tune your model
fake_input = torch.Tensor(np.zeros((1, 3, 224, 224))).to("cuda:0")
model.forward(fake_input)

# Prepare and Register your model
model.eval()
artifact_dir = "./pytorch-model-artifact"  # local directory for the generated artifacts
pytorch_model = PyTorchModel(model, artifact_dir=artifact_dir)
pytorch_model.prepare(
    inference_conda_env="pytorch110_p38_gpu_v1",
    training_conda_env="pytorch110_p38_gpu_v1",
    use_case_type=UseCaseType.IMAGE_CLASSIFICATION,
    force_overwrite=True,
    use_torch_script=True,
)
pytorch_model.save()
pytorch_model.deploy(deployment_instance_shape="VM.GPU3.2")

pytorch_model.predict(fake_input.cpu().numpy())

load_model#

During deployment, the load_model method loads the serialized model. The load_model method is always fully populated, except when you set serialize=False for GenericModel.

  • For the GenericModel class, if you choose serialize=True in the init function, the model is pickled and score.py is fully auto-populated to support loading the pickled model. Otherwise, you are responsible for filling in load_model yourself.

  • For other frameworks, this part is fully populated.

Note: load_model must return a non-None value for the deployment to succeed.
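If you do need to write load_model yourself (for example, GenericModel with serialize=False), here is a hedged sketch, assuming you saved a Keras model yourself as model.h5 inside the artifact directory; adapt the loading call to however you serialized your model.

import os
import tensorflow as tf

model_file_name = "model.h5"  # assumed file name; use whatever you saved

def load_model():
    # Must return a non-None model object.
    model_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), model_file_name)
    return tf.keras.models.load_model(model_path)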

predict#

The predict method is triggered every time a payload is sent to the model deployment endpoint. The method takes the payload and the loaded model as inputs. Based on the payload, the method returns the predicted results output by the model.

pre_inference#

If the payload passed to the endpoint needs preprocessing, this function does the preprocessing step. The user is fully responsible for the preprocessing step.

post_inference#

If the predicted result from the model needs some postprocessing, the user can put the logic in this function.

deserialize#

When you use the .verify() or .predict() methods from model classes such as GenericModel or SklearnModel, if the data passed in is not in bytes or JsonSerializable, the models try to serialize the data. For example, if a pandas dataframe is passed and not accepted by the deployment endpoint, the pandas dataframe is converted to JSON internally. When the X_sample variable is passed into the .prepare() function, the data type of pandas dataframe is passed to the endpoint, and the schema of the dataframe is recorded in the input_schema.json file. Then, the JSON payload is sent to the endpoint. Because the model expects to take a pandas dataframe, the .deserialize() method converts the JSON back to the pandas dataframe using the schema and the data type. For all frameworks except for the GenericModel class, the .deserialize() method is auto-populated. Note that for each framework, only specific data types are supported.

Starting with version 2.6.3, you can send bytes to the endpoint directly. If a bytes payload is sent to the endpoint, the bytes are passed directly to the model. If the model expects a specific data format, you need to write the conversion logic yourself.

fetch_data_type_from_schema#

This function is used to load the schema from the input_schema.json when needed.
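The generated implementation varies by framework, but the idea is roughly as sketched below, assuming input_schema.json sits next to score.py and records one entry per feature with name and dtype fields (an assumption for illustration, not the exact generated code).

import json
import os

def fetch_data_type_from_schema(
    input_schema_path=os.path.join(os.path.dirname(os.path.realpath(__file__)), "input_schema.json")
):
    # Map each feature name to its recorded dtype; empty dict if no schema file exists.
    data_type = {}
    if os.path.exists(input_schema_path):
        with open(input_schema_path) as schema_file:
            schema = json.load(schema_file)
        for feature in schema.get("schema", []):
            data_type[feature["name"]] = feature["dtype"]
    return data_type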

Model Introspection#

The .introspect() method runs some sanity checks on the runtime.yaml and score.py files. This is to help you identify potential errors that might occur during model deployment. It checks fields such as the environment path, validates the path’s existence on Object Storage, checks whether the .load_model() and .predict() functions are defined in score.py, and so on. The result of model introspection is automatically saved to the taxonomy metadata and model artifacts.

tf_model.introspect()
[Image: model introspection results]

Reloading model artifacts automatically invokes model introspection. However, you can also invoke introspection manually by calling tf_model.introspect(), as shown above.

The ArtifactTestResults field is populated in metadata_taxonomy when introspect is triggered:

tf_model.metadata_taxonomy['ArtifactTestResults']
key: ArtifactTestResults
value:
  runtime_env_path:
    category: conda_env
    description: Check that field MODEL_DEPLOYMENT.INFERENCE_ENV_PATH is set
  ...

Save Model#

The .save() method saves the model artifacts, introspection results, schema, metadata, and so on to the OCI Data Science service and returns a model OCID. See the API documentation for more details.

model_catalog_id = tf_model.save(display_name='TF Model')

You can print a model to see some details about it.

print(tf_model)

You can also print a summary status of the model.

tf_model.summary_status()

Save Large Model#

New in version 2.6.4.

Large models are models with artifacts between 2 and 6 GB. You must first upload large models to an Object Storage bucket, and then transfer them to a model catalog.

The ADS framework-specific wrapper classes save large models using a process almost identical to the one used for artifacts smaller than 2 GB. An Object Storage bucket with Data Science service access granted to it is required.

If you don’t have an Object Storage bucket, create one using the OCI SDK or the Console. Make a note of the namespace, compartment, and bucket name. Configure the following policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure these policies in IAM in the Console.

Allow service datascience to manage object-family in compartment <compartment> where ALL {target.bucket.name='<bucket_name>'}

Allow service objectstorage-<region_identifier> to manage object-family in compartment <compartment> where ALL {target.bucket.name='<bucket_name>'}

Because Object Storage is a regional service, you must authorize the Object Storage service for each region. To determine the region identifier value of an Oracle Cloud Infrastructure region, see Regions and Availability Domains. See API documentation for more details.

The following saves the framework specific wrapper object, model, to the model catalog and returns the OCID from the model catalog:

model_catalog_id = tf_model.save(
    display_name='TF Model',
    bucket_uri="oci://<bucket_name>@<namespace>/<path>/",
    overwrite_existing_artifact=True,
    remove_existing_artifact=True,
)

Export Model Artifact to Object Storage#

The model artifacts can be uploaded to an Object Storage bucket. The .upload_artifact() method archives all files located in the artifact folder and uploads the archive to the Object Storage bucket.

tf_model.upload_artifact(uri="<oci://bucket@namespace/prefix/tf_model.zip>")

Uploaded artifacts can be used to load and recreate framework specific wrapper objects from the existing model artifact archive. To construct the model back from the existing artifact, use the .from_model_artifact() method.

from ads.model.framework.tensorflow_model import TensorFlowModel

tf_model = TensorFlowModel.from_model_artifact(
    "<oci://bucket@namespace/prefix/tf_model.zip>"
)