Upload a Model Group artifact (homogeneous)¶

A Model Group is a logical construct that encapsulates several machine learning models into a single, version-controlled unit. With a Model Group, you can group deployments, share resources, and perform live updates while maintaining immutability and reproducibility. In ADS, a Model Group is represented by ads.model.datascience_model_group.DataScienceModelGroup.

Model Group types¶

Model Groups can be created in different forms depending on your deployment pattern:

Homogeneous: A group of member models that share the same runtime and can be deployed together. For homogeneous model groups, ADS supports attaching a deployment runtime artifact (score.py + runtime.yaml) to the group.
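Stacked: A group with a designated base model (base_model_id) plus additional member models. This form suits deployments that layer additional models, such as the adapter model in the stacked example on this page, on top of a shared base model.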

Common operations¶

The DataScienceModelGroup API supports standard lifecycle operations:

create() / update() / delete()
activate() / deactivate()
from_id(<model_group_ocid>)
list(...)

Member models¶

Member models are provided via with_member_models(...) as a list of dictionaries, each with the following keys:

model_id: The model OCID.
inference_key: A short name used to identify the model within the group.
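
A malformed member list is easier to catch before any request is sent. The following is a minimal sketch of a pre-flight check, not part of ADS: validate_member_models is a hypothetical helper, and it assumes that both keys are required on every entry and that inference keys should be unique within a group.

```python
def validate_member_models(member_models):
    """Return a list of problems found in a member-model list (empty if none)."""
    problems = []
    seen_keys = set()
    for index, member in enumerate(member_models):
        # Assumption: every entry needs the model OCID.
        if not member.get("model_id"):
            problems.append(f"entry {index}: missing 'model_id'")
        key = member.get("inference_key")
        if not key:
            # Assumption: every entry needs an inference key.
            problems.append(f"entry {index}: missing 'inference_key'")
        elif key in seen_keys:
            # Assumption: inference keys should be unique within a group.
            problems.append(f"entry {index}: duplicate inference_key {key!r}")
        else:
            seen_keys.add(key)
    return problems


members = [
    {"inference_key": "model_a", "model_id": "<model_ocid_a>"},
    {"inference_key": "model_a", "model_id": "<model_ocid_b>"},
]
print(validate_member_models(members))
# prints ["entry 1: duplicate inference_key 'model_a'"]
```

An empty result means the list passed these checks and can be handed to with_member_models(...).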

Deployment types and runtime types¶

In OCI Data Science, deployments can be created for:

A single model (ModelDeploymentContainerRuntime.with_model_uri(<model_ocid>)), or
A model group (ModelDeploymentContainerRuntime.with_model_group_id(<model_group_ocid>)).

ADS supports different runtime options for model deployments. Two common patterns are:

Conda-based runtime using a standard model artifact (score.py + runtime.yaml).
Container runtime (BYOC) where you specify the container image and runtime configuration.

The examples below focus on container runtime deployment using a model group.

Artifact requirements¶

The runtime artifact attached to a homogeneous model group must be either:

A directory containing the required files at the top level, or
A .zip file containing the same structure.

At minimum, the following files must exist:

score.py
runtime.yaml
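
These requirements can be checked locally before the artifact is attached. The sketch below uses only the Python standard library; missing_artifact_files is a hypothetical helper name, not an ADS function, and it checks only that the two files are present at the top level, not that their contents are valid.

```python
import os
import zipfile

REQUIRED_FILES = ("score.py", "runtime.yaml")


def missing_artifact_files(artifact_path):
    """Return the required files absent from the top level of a directory or .zip."""
    if os.path.isdir(artifact_path):
        top_level = set(os.listdir(artifact_path))
    elif zipfile.is_zipfile(artifact_path):
        with zipfile.ZipFile(artifact_path) as archive:
            # Names without a "/" sit at the archive's top level.
            top_level = {name for name in archive.namelist() if "/" not in name}
    else:
        raise ValueError(f"{artifact_path!r} is neither a directory nor a .zip file")
    return [name for name in REQUIRED_FILES if name not in top_level]
```

An empty list means both files were found. For example, a directory that contains only score.py yields ["runtime.yaml"].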

Example usage¶

Homogeneous model group (with runtime artifact)¶

import os

from ads.model.datascience_model_group import DataScienceModelGroup
from ads.model.model_metadata import ModelCustomMetadata


# Path to a model deployment runtime artifact directory.
# Example layout:
#   ./group_runtime_artifact/
#     score.py
#     runtime.yaml
artifact_dir = "./group_runtime_artifact"

custom_metadata = ModelCustomMetadata()
custom_metadata.add(
    key="test_key",
    value="test_value",
    description="test_description",
    category="other",
)

model_group = (
    DataScienceModelGroup()
    .with_compartment_id(os.environ.get("NB_SESSION_COMPARTMENT_OCID"))
    .with_project_id(os.environ.get("PROJECT_OCID"))
    .with_display_name("test-model-group")
    .with_description("Homogeneous model group with runtime artifact")
    .with_custom_metadata_list(custom_metadata)
    .with_member_models(
        [
            {"inference_key": "model_a", "model_id": "<model_ocid_a>"},
            {"inference_key": "model_b", "model_id": "<model_ocid_b>"},
        ]
    )
    .with_artifact(artifact_dir)
)

# For homogeneous model groups, `create()` uploads the artifact after the group is created.
model_group.create()
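
The example above reads the compartment and project OCIDs with os.environ.get, which returns None when a variable is unset, so the builder would receive None without any warning. A small guard makes that case fail early with a clear message. This is a sketch only: require_env is a hypothetical helper, not part of ADS.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        # Covers both an unset variable and an empty string.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

It could then replace the direct lookups, for example .with_compartment_id(require_env("NB_SESSION_COMPARTMENT_OCID")).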

Stacked model group (no group artifact)¶

For stacked model groups, you provide a base_model_id. Artifact upload applies only to homogeneous model groups, so this example does not call with_artifact(...).

import os

from ads.model.datascience_model_group import DataScienceModelGroup
from ads.model.model_metadata import ModelCustomMetadata


custom_metadata = ModelCustomMetadata()
custom_metadata.add(
    key="test_key",
    value="test_value",
    description="test_description",
    category="other",
)

base_model_id = "<base_model_ocid>"

stacked_group = (
    DataScienceModelGroup()
    .with_compartment_id(os.environ.get("NB_SESSION_COMPARTMENT_OCID"))
    .with_project_id(os.environ.get("PROJECT_OCID"))
    .with_display_name("test-stacked-model-group")
    .with_description("Stacked model group")
    .with_custom_metadata_list(custom_metadata)
    .with_base_model_id(base_model_id)
    .with_member_models(
        [
            {"inference_key": "base", "model_id": base_model_id},
            {"inference_key": "adapter_1", "model_id": "<adapter_model_ocid_1>"},
        ]
    )
)

stacked_group.create()

Deploy a Model Group using container runtime¶

The following example shows how to create a model group and deploy it using a custom container runtime.

from ads.model.datascience_model_group import DataScienceModelGroup
from ads.model.model_metadata import ModelCustomMetadata
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentInfrastructure,
    ModelDeploymentContainerRuntime,
)


custom_metadata = ModelCustomMetadata()
custom_metadata.add(
    key="test_key",
    value="test_value",
    description="test_description",
    category="other",
)

model_group = (
    DataScienceModelGroup()
    .with_display_name("test_create_model_group")
    .with_description("test create model group description")
    .with_freeform_tags(**{"test_key": "test_value"})
    .with_custom_metadata_list(custom_metadata)
    .with_member_models(
        [
            {
                "inference_key": "meta-llama/Llama-2-7b-hf",
                "model_id": "ocid1.datasciencemodel.oc1.<region>.<unique_id>",
            },
            {
                "inference_key": "gemma-2b-gov-ext",
                "model_id": "ocid1.datasciencemodel.oc1.<region>.<unique_id>",
            },
        ]
    )
)
model_group.create()


# Configure model deployment infrastructure
infrastructure = (
    ModelDeploymentInfrastructure()
    .with_project_id("<PROJECT_OCID>")
    .with_compartment_id("<COMPARTMENT_OCID>")
    .with_shape_name("VM.Standard.E4.Flex")
    .with_shape_config_details(ocpus=1, memory_in_gbs=16)
    .with_replica(1)
    .with_bandwidth_mbps(10)
    .with_web_concurrency(10)
    .with_access_log(
        log_group_id="<ACCESS_LOG_GROUP_OCID>",
        log_id="<ACCESS_LOG_OCID>",
    )
    .with_predict_log(
        log_group_id="<PREDICT_LOG_GROUP_OCID>",
        log_id="<PREDICT_LOG_OCID>",
    )
    .with_subnet_id("<SUBNET_OCID>")
)


# Configure model deployment runtime
container_runtime = (
    ModelDeploymentContainerRuntime()
    .with_image("<region>.ocir.io/<namespace>/<image>:<tag>")
    .with_image_digest("<IMAGE_DIGEST>")
    .with_entrypoint(["python", "/opt/ds/model/deployed_model/api.py"])
    .with_server_port(5000)
    .with_health_check_port(5000)
    .with_env({"key": "value"})
    .with_deployment_mode("HTTPS_ONLY")
    .with_model_group_id(model_group.id)
)


# Configure model deployment
deployment = (
    ModelDeployment()
    .with_display_name("Model Deployment Demo using ADS")
    .with_description("The model deployment description")
    .with_freeform_tags(**{"key1": "value1"})
    .with_infrastructure(infrastructure)
    .with_runtime(container_runtime)
)

# Deploy
deployment.deploy()

# Invoke endpoint (replace "<data>" with your request payload)
deployment.predict(data="<data>")
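
The container runtime above starts the container with an entrypoint script (api.py) and directs both prediction traffic and health checks to port 5000. As a rough illustration of what such a script might contain, the sketch below builds a tiny HTTP server with the Python standard library. It is not the actual serving code for any image: the /health and /predict paths, the JSON request shape, and the echo response are all assumptions, and a real server would load the member models and run inference.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; the /health and /predict paths are assumptions."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/predict":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            request = json.loads(raw or b"{}")
        except json.JSONDecodeError:
            self._send_json(400, {"error": "invalid JSON"})
            return
        # Placeholder: a real server would run model inference here.
        self._send_json(200, {"echo": request})

    def log_message(self, format, *args):
        pass  # Silence per-request logging in this sketch.


def make_server(port=5000):
    """Build a server bound to all interfaces on the given port."""
    return ThreadingHTTPServer(("0.0.0.0", port), InferenceHandler)
```

In a script like api.py, the last line would be make_server().serve_forever(), with the port matching the values passed to with_server_port(...) and with_health_check_port(...).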