EmbeddingONNXModel
******************

See `API Documentation <../../../ads.model.framework.html#ads.model.framework.embedding_onnx_model.EmbeddingONNXModel>`__

Overview
========

The ``ads.model.framework.embedding_onnx_model.EmbeddingONNXModel`` class in ADS is designed to rapidly get an embedding ONNX model into production. The ``.prepare()`` method creates the required model artifacts without any manual configuration or code. ``EmbeddingONNXModel`` supports the OpenAI spec for the embeddings endpoint.

.. include:: ../_template/overview.rst

The following steps take the ``sentence-transformers/all-MiniLM-L6-v2`` model and deploy it into production with a few lines of code.

**Download Embedding Model from HuggingFace**

.. code-block:: python3

    import tempfile
    import os
    import shutil
    from huggingface_hub import snapshot_download

    local_dir = tempfile.mkdtemp()

    allow_patterns = [
        "onnx/model.onnx",
        "config.json",
        "special_tokens_map.json",
        "tokenizer_config.json",
        "tokenizer.json",
        "vocab.txt"
    ]

    # download the files needed for this demonstration to a local folder
    snapshot_download(
        repo_id="sentence-transformers/all-MiniLM-L6-v2",
        local_dir=local_dir,
        allow_patterns=allow_patterns
    )

    artifact_dir = tempfile.mkdtemp()
    # copy all downloaded files to the artifact folder
    for file in allow_patterns:
        shutil.copy(os.path.join(local_dir, file), artifact_dir)

Install Conda Pack
==================

To deploy the embedding ONNX model, start with the ONNX conda pack with slug ``onnxruntime_p311_gpu_x86_64``.

.. code-block:: bash

    odsc conda install -s onnxruntime_p311_gpu_x86_64

Prepare Model Artifact
======================

Instantiate an ``EmbeddingONNXModel()`` object for the embedding ONNX model. All model-related files are saved under ``artifact_dir``. ADS auto-generates the ``score.py`` and ``runtime.yaml`` files that are required for the deployment.
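
Because ``shutil.copy`` is given a directory as its destination in the download step, every file lands at the top level of ``artifact_dir`` under its base name, so ``onnx/model.onnx`` is stored as ``model.onnx``. The following is a minimal sketch of a sanity check that could be run before preparing the artifact. The helper ``missing_artifact_files`` and the constant ``EXPECTED_PATTERNS`` are hypothetical names introduced for illustration and are not part of ADS.

.. code-block:: python3

    import os
    import tempfile

    # File patterns from the download step. Copying into a directory keeps only
    # the base name, so "onnx/model.onnx" is expected as "model.onnx".
    EXPECTED_PATTERNS = [
        "onnx/model.onnx",
        "config.json",
        "special_tokens_map.json",
        "tokenizer_config.json",
        "tokenizer.json",
        "vocab.txt",
    ]


    def missing_artifact_files(artifact_dir, patterns=EXPECTED_PATTERNS):
        """Return the expected base names that are absent from artifact_dir.

        Hypothetical helper for illustration; it is not part of ADS.
        """
        expected = [os.path.basename(pattern) for pattern in patterns]
        return [
            name
            for name in expected
            if not os.path.isfile(os.path.join(artifact_dir, name))
        ]


    if __name__ == "__main__":
        demo_dir = tempfile.mkdtemp()
        # create empty stand-ins for every file except the tokenizer vocabulary
        for pattern in EXPECTED_PATTERNS[:-1]:
            open(os.path.join(demo_dir, os.path.basename(pattern)), "w").close()
        print(missing_artifact_files(demo_dir))  # ['vocab.txt']

An empty list means every expected file is present at the top level of the folder.
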
For details on the parameters that ``EmbeddingONNXModel`` accepts, refer to the `API Documentation <../../../ads.model.framework.html#ads.model.framework.embedding_onnx_model.EmbeddingONNXModel>`__.

.. code-block:: python3

    import ads
    from ads.model import EmbeddingONNXModel

    # other options are `api_key` or `security_token` depending on where the code is executed
    ads.set_auth("resource_principal")

    embedding_onnx_model = EmbeddingONNXModel(artifact_dir=artifact_dir)
    embedding_onnx_model.prepare(
        inference_conda_env="onnxruntime_p311_gpu_x86_64",
        inference_python_version="3.11",
        model_file_name="model.onnx",
        force_overwrite=True
    )

Summary Status
==============

.. include:: ../_template/summary_status.rst

.. figure:: ../figures/summary_status.png
   :align: center

Verify Model
============

Call ``verify()`` to check that the model can be executed locally.

.. code-block:: python3

    embedding_onnx_model.verify(
        {
            "input": ['What are activation functions?', 'What is Deep Learning?'],
            "model": "sentence-transformers/all-MiniLM-L6-v2"
        },
    )

If successful, the output resembles the following.

.. code-block:: python3

    {
        'object': 'list',
        'data': [{
            'object': 'embedding',
            'embedding': [[
                -0.11011122167110443,
                -0.39235609769821167,
                0.38759472966194153,
                -0.34653618931770325,
                ...,
            ]]
        }]
    }

Register Model
==============

Save the model artifacts and create a model entry in the OCI Data Science Model Catalog.

.. code-block:: python3

    embedding_onnx_model.save(display_name="sentence-transformers/all-MiniLM-L6-v2")

Deploy and Generate Endpoint
============================

Create a model deployment from the embedding ONNX model in the Model Catalog. The process takes several minutes, and the deployment configuration is displayed once it completes.

.. code-block:: python3

    embedding_onnx_model.deploy(
        display_name="all-MiniLM-L6-v2 Embedding Model Deployment",
        # placeholders: supply your own log group and log OCIDs
        deployment_log_group_id="",
        deployment_access_log_id="",
        deployment_predict_log_id="",
        deployment_instance_shape="VM.Standard.E4.Flex",
        deployment_ocpus=20,
        deployment_memory_in_gbs=256,
    )

Run Prediction against Endpoint
===============================

Call ``predict()`` to check the model deployment endpoint.

.. code-block:: python3

    embedding_onnx_model.predict(
        {
            "input": ["What are activation functions?", "What is Deep Learning?"],
            "model": "sentence-transformers/all-MiniLM-L6-v2"
        },
    )

If successful, the output resembles the following.

.. code-block:: python3

    {
        'object': 'list',
        'data': [{
            'object': 'embedding',
            'embedding': [[
                -0.11011122167110443,
                -0.39235609769821167,
                0.38759472966194153,
                -0.34653618931770325,
                ...,
            ]]
        }]
    }

Run Prediction with OCI CLI
===========================

Model deployment endpoints can also be invoked with the OCI CLI.

.. code-block:: bash

    oci raw-request --http-method POST --target-uri --request-body '{"input": ["What are activation functions?", "What is Deep Learning?"], "model": "sentence-transformers/all-MiniLM-L6-v2"}' --auth resource_principal

Example
=======

.. code-block:: python3

    import tempfile
    import os
    import shutil
    import ads
    from ads.model import EmbeddingONNXModel
    from huggingface_hub import snapshot_download

    # other options are `api_key` or `security_token` depending on where the code is executed
    ads.set_auth("resource_principal")

    local_dir = tempfile.mkdtemp()

    allow_patterns = [
        "onnx/model.onnx",
        "config.json",
        "special_tokens_map.json",
        "tokenizer_config.json",
        "tokenizer.json",
        "vocab.txt"
    ]

    # download the files needed for this demonstration to a local folder
    snapshot_download(
        repo_id="sentence-transformers/all-MiniLM-L6-v2",
        local_dir=local_dir,
        allow_patterns=allow_patterns
    )

    artifact_dir = tempfile.mkdtemp()
    # copy all downloaded files to the artifact folder
    for file in allow_patterns:
        shutil.copy(os.path.join(local_dir, file), artifact_dir)

    # initialize the EmbeddingONNXModel instance and prepare the score.py, runtime.yaml and openapi.json files
    embedding_onnx_model = EmbeddingONNXModel(artifact_dir=artifact_dir)
    embedding_onnx_model.prepare(
        inference_conda_env="onnxruntime_p311_gpu_x86_64",
        inference_python_version="3.11",
        model_file_name="model.onnx",
        force_overwrite=True
    )

    # validate the model locally
    embedding_onnx_model.verify(
        {
            "input": ['What are activation functions?', 'What is Deep Learning?'],
            "model": "sentence-transformers/all-MiniLM-L6-v2"
        },
    )

    # save the model to the OCI model catalog
    embedding_onnx_model.save(display_name="sentence-transformers/all-MiniLM-L6-v2")

    # deploy the model
    embedding_onnx_model.deploy(
        display_name="all-MiniLM-L6-v2 Embedding Model Deployment",
        # placeholders: supply your own log group and log OCIDs
        deployment_log_group_id="",
        deployment_access_log_id="",
        deployment_predict_log_id="",
        deployment_instance_shape="VM.Standard.E4.Flex",
        deployment_ocpus=20,
        deployment_memory_in_gbs=256,
    )

    # check the model deployment endpoint
    embedding_onnx_model.predict(
        {
            "input": ["What are activation functions?", "What is Deep Learning?"],
            "model": "sentence-transformers/all-MiniLM-L6-v2"
        },
    )
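
The dictionary returned by ``predict()`` can be post-processed with plain Python, for example to compare the two input sentences. The following is a minimal sketch that assumes the response has the layout of the sample output shown earlier, where ``embedding`` may hold a nested list of vectors. The helpers ``extract_vectors`` and ``cosine_similarity`` are hypothetical names introduced for illustration and are not part of ADS, and the demo uses made-up three-dimensional vectors rather than real model output.

.. code-block:: python3

    import math


    def extract_vectors(response):
        """Collect embedding vectors from a response shaped like the sample output.

        Assumption: each item in response["data"] has an "embedding" entry that
        is either one flat vector or a list of vectors.
        """
        vectors = []
        for item in response["data"]:
            embedding = item["embedding"]
            if embedding and isinstance(embedding[0], (list, tuple)):
                # nested layout: one vector per input sentence
                vectors.extend(list(vector) for vector in embedding)
            else:
                # flat layout: a single vector
                vectors.append(list(embedding))
        return vectors


    def cosine_similarity(a, b):
        """Return the cosine of the angle between two equal-length vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        if norm_a == 0.0 or norm_b == 0.0:
            raise ValueError("cosine similarity is undefined for a zero vector")
        return dot / (norm_a * norm_b)


    if __name__ == "__main__":
        # made-up vectors standing in for a real response
        fake_response = {
            "object": "list",
            "data": [{
                "object": "embedding",
                "embedding": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
            }],
        }
        first, second = extract_vectors(fake_response)
        print(cosine_similarity(first, second))  # 0.0

With a real response, ``fake_response`` would be replaced by the value returned from ``embedding_onnx_model.predict(...)``. Values closer to 1.0 indicate sentences that the model places closer together.
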