1. Overview
Ready to elevate the security and privacy of your GPU-accelerated workloads? This codelab will guide you through the capabilities of Trusted Space, an offering that provides strong operator isolation and accelerator support for your sensitive AI/ML workloads.
Protecting valuable data, models, and keys is more critical than ever. Trusted Space offers a solution by ensuring that your workloads operate within a secure, trusted environment where even the workload operator has no access.
Here's what Trusted Space offers:
- Enhanced Privacy and Security: Trusted Space provides a trusted execution environment where your sensitive assets (e.g. models, valuable data and keys) remain protected, backed by cryptographic proof.
- Operator Isolation: Eliminate concerns about operator interference. With Trusted Space, even your workload operators have no access, preventing them from SSHing, accessing data, installing software, or tampering with your code.
- Accelerator Support: Trusted Space is designed to work seamlessly with a wide range of hardware accelerators, including GPUs like the H100, A100, T4, and L4. This ensures your performance-critical AI/ML applications run smoothly.
What you'll learn
- Gain an understanding of Trusted Space's key offerings.
- Learn how to deploy and configure a Trusted Space environment to secure the valuable assets of your AI/ML workload.
What you'll need
- A Google Cloud Platform Project
- Basic knowledge of Google Compute Engine and Accelerators.
- Basic knowledge of Service Accounts, Key Management, Workload Identity Federation and attribute conditions.
- Basic knowledge of Containers and Artifact Registry
Protecting Sensitive Code Generation Prompts with Primus Company
In this codelab, we will step into the shoes of Primus, a company that prioritizes the privacy and security of its employees' data. Primus wants to deploy a code generation model to assist its developers with their coding tasks. However, they are concerned about protecting the confidentiality of the prompts submitted by their employees, as these prompts often contain sensitive code snippets, internal project details, or proprietary algorithms.
Why doesn't the Primus Company trust the operator?
Primus Corp operates in a highly competitive market. Their codebase contains valuable intellectual property, including proprietary algorithms and sensitive code snippets that provide a competitive edge. They are concerned about the possibility of corporate espionage by workload operators. Additionally, employee prompts might include confidential "Need To Know" parts of code that Primus Corp wants to protect.
To address this concern, Primus Corp will leverage Trusted Space to isolate the inference server running the model for code generation. Here's how it works:
- Prompt Encryption: Before sending a prompt to the inference server, each employee will encrypt it using a KMS key managed by Primus Corp in Google Cloud. This ensures that only the Trusted Space environment, where the corresponding decryption key is available, can decrypt it and access the plaintext prompt. In a real-world scenario, client-side encryption can be handled by available libraries (e.g. Tink). As part of this codelab, we will use this sample client application with envelope encryption.
- Operator Isolation: Only the inference server, running within a Trusted Space environment, will have access to the key used for encryption and will be able to decrypt the prompt inside the trusted environment. Access to the encryption key is protected by the workload identity pool. Due to the isolation guarantees of Trusted Space, even the workload operator cannot access the key or the decrypted content.
- Secure Inference using Accelerator(s): The inference server is launched on a Shielded VM (as part of the Trusted Space setup), which ensures that the workload instance has not been compromised by boot- or kernel-level malware or rootkits. The server decrypts the prompt within the Trusted Space environment, performs inference using the code generation model, and returns the generated code to the employee.
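The client-side envelope encryption described above can be sketched in Python. This is a minimal illustration, not the codelab's actual client: it uses the `cryptography` library's Fernet primitive (the same one the workload uses), and `wrap_dek_with_kms` is a hypothetical stand-in for the real Cloud KMS `encrypt` call, so the sketch stays self-contained.

```python
import base64
from cryptography.fernet import Fernet


def wrap_dek_with_kms(dek: bytes) -> bytes:
    """Hypothetical stand-in for wrapping the DEK with Cloud KMS.

    In a real client this would be a KMS encrypt call against Primus
    Corp's key; here we return the DEK unchanged so the sketch runs
    without cloud credentials.
    """
    return dek


def encrypt_prompt(prompt: str) -> dict:
    # Envelope encryption: generate a fresh data-encryption key (DEK),
    # encrypt the prompt locally with it, then wrap the DEK with the KMS key.
    dek = Fernet.generate_key()
    ciphertext = Fernet(dek).encrypt(prompt.encode("utf-8"))
    return {
        "ciphertext": base64.b64encode(ciphertext).decode("utf-8"),
        "wrapped_dek": base64.b64encode(wrap_dek_with_kms(dek)).decode("utf-8"),
    }
```

Only the Trusted Space workload, which can call KMS to unwrap the DEK, can recover the plaintext prompt from such a payload.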
2. Set up Cloud Resources
Before you begin
- Clone this repository using the below command to get required scripts that are used as part of this codelab.
git clone https://github.com/GoogleCloudPlatform/confidential-space.git
- Change the directory for this codelab.
cd confidential-space/codelabs/trusted_space_codelab/scripts
- Ensure you have set the required project environment variable as shown below. For more information about setting up a GCP project, please refer to this codelab. You can refer to this to get details about how to retrieve the project ID and how it differs from the project name and project number.
export PRIMUS_PROJECT_ID=<GCP project id of Primus>
- Enable Billing for your projects.
- Enable the Confidential Computing API and the following APIs for your project.
gcloud services enable \
cloudapis.googleapis.com \
cloudresourcemanager.googleapis.com \
cloudkms.googleapis.com \
cloudshell.googleapis.com \
container.googleapis.com \
containerregistry.googleapis.com \
iam.googleapis.com \
confidentialcomputing.googleapis.com
- Assign values to the variables for the resource names listed below. These variables let you customize the resource names and reuse existing resources if they are already created, e.g.:
export PRIMUS_SERVICE_ACCOUNT='my-service-account'
- You can set the following variables to existing cloud resource names in the Primus project. If a variable is set, the corresponding existing cloud resource from the Primus project will be used. If a variable is not set, a resource name will be generated from the project name and a new cloud resource will be created with that name. The following variables are supported for resource names:
| Variable | Description |
| --- | --- |
|  | Region under which regional resources will be created for the Primus company. |
|  | Location at which resources will be created for the Primus company. |
| $PRIMUS_PROJECT_ZONE | Zone under which zonal resources will be created for the Primus company. |
| $PRIMUS_WORKLOAD_IDENTITY_POOL | Primus company's workload identity pool for protecting the cloud resources. |
| $PRIMUS_WIP_PROVIDER | Primus company's workload identity pool provider, which includes the authorization condition to use for tokens signed by the Attestation Verifier Service. |
| $PRIMUS_SERVICE_ACCOUNT | Primus company's service account that has access to the KMS key protecting the prompts. |
| $PRIMUS_ENC_KEY | The KMS key used to encrypt the prompts provided by employees of the Primus company. |
| $PRIMUS_ENC_KEYRING | The KMS keyring which will be used to create the encryption key. |
|  | The KMS key version of the encryption key. |
| $PRIMUS_ARTIFACT_REPOSITORY | The artifact repository where the workload Docker image will be pushed. |
| $PRIMUS_PROJECT_REPOSITORY_REGION | The region of the artifact repository that holds the published workload Docker image. |
| $WORKLOAD_VM | Name of the workload VM. |
| $WORKLOAD_IMAGE_NAME | Name of the workload Docker image. |
| $WORKLOAD_IMAGE_TAG | Tag of the workload container image. |
| $WORKLOAD_SERVICEACCOUNT | The service account that has permission to access the Confidential VM that runs the workload. |
| $CLIENT_VM | Name of the client VM, which runs the client application of the inference server. |
|  | The service account used by the client VM. |
- You will need the Storage Admin, Artifact Registry Administrator, Cloud KMS Admin, Service Account Admin, and IAM Workload Identity Pool Admin roles for the $PRIMUS_PROJECT_ID project. You can refer to this guide on how to grant IAM roles using the GCP console.
- For $PRIMUS_PROJECT_ID, run the following script to set the remaining variables to values based on your project ID:
source config_env.sh
Set up Primus Company resources
As part of this step, you will set up the required cloud resources for Primus. Run the following script to set them up. The following resources will be created as part of the script execution:
- Encryption key ($PRIMUS_ENC_KEY) and keyring ($PRIMUS_ENC_KEYRING) in KMS to encrypt the prompts provided by employees of the Primus company.
- Workload identity pool ($PRIMUS_WORKLOAD_IDENTITY_POOL) to validate claims based on attribute conditions configured under its provider.
- Service account ($PRIMUS_SERVICE_ACCOUNT) attached to the above-mentioned workload identity pool ($PRIMUS_WORKLOAD_IDENTITY_POOL), with access to decrypt data using the KMS key (via the roles/cloudkms.cryptoKeyDecrypter role), encrypt data using the KMS key (via the roles/cloudkms.cryptoKeyEncrypter role), and read data from the Cloud Storage bucket (via the objectViewer role), and with the service account connected to the workload identity pool (via the roles/iam.workloadIdentityUser role).
./setup_primus_resources.sh
3. Create Workload
Create workload service account
Now you will create a service account for the workload with the required roles and permissions. Run the following script to create a workload service account in the Primus project. This service account will be used by the VM that runs the inference server.
This workload service account ($WORKLOAD_SERVICEACCOUNT) will have the following roles:
- confidentialcomputing.workloadUser to get an attestation token
- logging.logWriter to write logs to Cloud Logging
./create_workload_service_account.sh
Create workload
As part of this step, you will create the workload Docker image. The workload is authored by the Primus company. It is Python code that loads the codegemma model from the publicly available GCS bucket of the Vertex AI Model Garden and launches an inference server that serves code generation requests from Primus developers.
On a code generation request, the workload receives a wrapped DEK along with an encrypted prompt. It then makes a KMS API call to unwrap the DEK and uses the DEK to decrypt the prompt. The encryption key (for the DEK) is protected via the workload identity pool, and access is granted only to workloads that meet the attribute conditions. These attribute conditions are described in more detail in the next section about authorizing the workload. Once the inference server has the decrypted prompt, it generates code using the loaded model and returns the response, encrypted with the same DEK.
Run the following script to create the workload; it performs the following steps:
- Creates the Artifact Registry repository ($PRIMUS_ARTIFACT_REPOSITORY) owned by Primus.
- Updates the workload code with the required resource names.
- Builds the inference server workload and creates a Dockerfile for building a Docker image of the workload code. Here is the Dockerfile used for this codelab.
- Builds and publishes the Docker image to the Artifact Registry repository ($PRIMUS_ARTIFACT_REPOSITORY) owned by Primus.
- Grants $WORKLOAD_SERVICEACCOUNT read permission on $PRIMUS_ARTIFACT_REPOSITORY. This is needed for the workload VM to pull the workload Docker image from Artifact Registry.
./create_workload.sh
For your reference, here is the generate() method of the workload that is created and used in this codelab (you can find the entire workload code here).
# Simplified excerpt: the Flask app, KMS client (kms_client, key_name),
# tokenizer, and model are initialized elsewhere in the full workload source.
import base64

from cryptography.fernet import Fernet
from flask import jsonify, request


def generate():
    try:
        data = request.get_json()
        ciphertext = base64.b64decode(data["ciphertext"])
        wrapped_dek = base64.b64decode(data["wrapped_dek"])
        # Unwrap the DEK with Cloud KMS; this call only succeeds inside the
        # attested Trusted Space environment.
        unwrapped_dek_response = kms_client.decrypt(
            request={"name": key_name, "ciphertext": wrapped_dek}
        )
        unwrapped_dek = unwrapped_dek_response.plaintext
        # Decrypt the prompt with the unwrapped DEK.
        f = Fernet(unwrapped_dek)
        plaintext = f.decrypt(ciphertext)
        prompt = plaintext.decode("utf-8")
        # Run inference with the loaded code generation model.
        tokens = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**tokens, max_new_tokens=128)
        generated_code = tokenizer.decode(outputs[0])
        # Encrypt the generated code with the same DEK before returning it.
        encrypted_code = f.encrypt(generated_code.encode("utf-8"))
        ciphertext_base64 = base64.b64encode(encrypted_code).decode("utf-8")
        return jsonify({"generated_code_ciphertext": ciphertext_base64})
    except (ValueError, TypeError, KeyError) as e:
        return jsonify({"error": str(e)}), 500
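On the client side, the response can be decrypted with the same DEK that protected the prompt, since the server re-uses the unwrapped DEK for the reply. A minimal sketch, assuming only that the response JSON carries the `generated_code_ciphertext` field shown above:

```python
import base64
from cryptography.fernet import Fernet


def decrypt_response(response_json: dict, dek: bytes) -> str:
    # The server Fernet-encrypts the generated code with the request's DEK
    # and base64-encodes it for transport; reverse both steps here.
    ciphertext = base64.b64decode(response_json["generated_code_ciphertext"])
    return Fernet(dek).decrypt(ciphertext).decode("utf-8")
```

Because the DEK never leaves the client unwrapped (except inside the attested workload), neither the operator nor any intermediary can read the generated code in transit.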
4. Authorize and Run Workload
Authorize Workload
Primus wants to authorize workloads to access the KMS key used for prompt encryption based on the following attributes:
- What: Code that is verified
- Where: An environment that is secure
- Who: An operator that is trusted
Primus uses workload identity federation to enforce an access policy based on these requirements. Workload identity federation allows you to specify attribute conditions that restrict which identities can authenticate with the workload identity pool (WIP). You can add the Attestation Verifier Service to the WIP as a workload identity pool provider to present measurements and enforce the policy.
The workload identity pool was already created earlier as part of the cloud resources setup step. Now Primus will create a new OIDC workload identity pool provider. The specified --attribute-condition
authorizes access to the workload container. It requires:
- What: The latest $WORKLOAD_IMAGE_NAME uploaded to the $PRIMUS_ARTIFACT_REPOSITORY repository.
- Where: The Confidential Space trusted execution environment running on a fully supported Confidential Space VM image.
- Who: The Primus $WORKLOAD_SERVICEACCOUNT service account.
export WORKLOAD_IMAGE_DIGEST=$(gcloud artifacts docker images describe ${PRIMUS_PROJECT_REPOSITORY_REGION}-docker.pkg.dev/$PRIMUS_PROJECT_ID/$PRIMUS_ARTIFACT_REPOSITORY/$WORKLOAD_IMAGE_NAME:$WORKLOAD_IMAGE_TAG --format="value(image_summary.digest)" --project ${PRIMUS_PROJECT_ID})
gcloud iam workload-identity-pools providers create-oidc $PRIMUS_WIP_PROVIDER \
--location="global" \
--project="$PRIMUS_PROJECT_ID" \
--workload-identity-pool="$PRIMUS_WORKLOAD_IDENTITY_POOL" \
--issuer-uri="https://confidentialcomputing.googleapis.com/" \
--allowed-audiences="https://sts.googleapis.com" \
--attribute-mapping="google.subject='assertion.sub'" \
--attribute-condition="assertion.swname == 'HARDENED_SHIELDED' && assertion.hwmodel == 'GCP_SHIELDED_VM' &&
assertion.submods.container.image_digest == '${WORKLOAD_IMAGE_DIGEST}' &&
assertion.submods.container.image_reference == '${PRIMUS_PROJECT_REPOSITORY_REGION}-docker.pkg.dev/$PRIMUS_PROJECT_ID/$PRIMUS_ARTIFACT_REPOSITORY/$WORKLOAD_IMAGE_NAME:$WORKLOAD_IMAGE_TAG' &&
'$WORKLOAD_SERVICEACCOUNT@$PRIMUS_PROJECT_ID.iam.gserviceaccount.com' in assertion.google_service_accounts"
The above command verifies that the workload is running in a Trusted Space environment by checking that hwmodel is set to "GCP_SHIELDED_VM" and swname is set to "HARDENED_SHIELDED". Furthermore, it includes workload-specific assertions, such as image_digest and image_reference, to enhance security and ensure the integrity of the running workload.
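The attribute condition is a CEL expression that Google Cloud evaluates against claims in the attestation token; your code never runs it. Purely as an illustration of the "what / where / who" logic, the same checks can be expressed in Python over a hypothetical claims dictionary (field names follow the assertions used in the command above):

```python
def is_workload_authorized(claims: dict, expected_digest: str,
                           expected_reference: str, expected_sa: str) -> bool:
    # "Where": a hardened Shielded VM environment.
    # "What": the exact workload image (digest and reference).
    # "Who": the trusted operator's workload service account.
    container = claims.get("submods", {}).get("container", {})
    return (
        claims.get("swname") == "HARDENED_SHIELDED"
        and claims.get("hwmodel") == "GCP_SHIELDED_VM"
        and container.get("image_digest") == expected_digest
        and container.get("image_reference") == expected_reference
        and expected_sa in claims.get("google_service_accounts", [])
    )
```

If any one claim deviates, such as a different image digest after tampering with the container, the condition fails and the workload cannot obtain the federated credentials needed to unwrap the DEK.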
Run Workload
As part of this step, we will be running the workload in the Trusted Space VM which will have an accelerator attached. Required TEE arguments are passed using the metadata flag. Arguments for the workload container are passed using the "tee-cmd
" portion of the flag. To equip the workload VM with an Nvidia Tesla T4 GPU, we will use the --accelerator=type=nvidia-tesla-t4,count=1
flag. This will attach one GPU to the VM. We'll also need to include tee-install-gpu-driver=true
in the metadata flags to trigger the installation of the appropriate GPU driver.
gcloud compute instances create ${WORKLOAD_VM} \
--accelerator=type=nvidia-tesla-t4,count=1 \
--machine-type=n1-standard-16 \
--shielded-secure-boot \
--image-project=conf-space-images-preview \
--image=confidential-space-0-gpupreview-796705b \
--zone=${PRIMUS_PROJECT_ZONE} \
--maintenance-policy=TERMINATE \
--boot-disk-size=40 \
--scopes=cloud-platform \
--service-account=${WORKLOAD_SERVICEACCOUNT}@${PRIMUS_PROJECT_ID}.iam.gserviceaccount.com \
--metadata="^~^tee-image-reference=${PRIMUS_PROJECT_REPOSITORY_REGION}-docker.pkg.dev/${PRIMUS_PROJECT_ID}/${PRIMUS_ARTIFACT_REPOSITORY}/${WORKLOAD_IMAGE_NAME}:${WORKLOAD_IMAGE_TAG}~tee-install-gpu-driver=true~tee-restart-policy=Never"
Run Inference Query
After the workload inference server launches successfully, employees of the Primus company can send code generation requests to it.
As part of this codelab, we will use the following script to set up the client application that interacts with the inference server. Run this script to set up the client VM.
./setup_client.sh
The following steps demonstrate how to SSH into the client VM and execute a sample client application within a Python virtual environment. This example application utilizes envelope encryption with the Fernet library, but keep in mind that the specific encryption libraries can be adapted to suit different use cases.
gcloud compute ssh ${CLIENT_VM} --zone=${PRIMUS_PROJECT_ZONE}
Run the following commands to activate the Python virtual environment in the client VM and execute the client application.
source venv/bin/activate
python3 inference_client.py
The output of this sample client application shows the plaintext prompt requests, their encrypted form, and the corresponding encrypted and decrypted responses.
5. Clean Up
Here is the script that can be used to clean up the resources created as part of this codelab. The following resources will be deleted:
- Primus service account ($PRIMUS_SERVICE_ACCOUNT).
- Primus encryption key ($PRIMUS_ENC_KEY).
- Artifact repository of Primus ($PRIMUS_ARTIFACT_REPOSITORY).
- Primus workload identity pool ($PRIMUS_WORKLOAD_IDENTITY_POOL) with its provider.
- Workload service account of Primus ($WORKLOAD_SERVICEACCOUNT).
- Workload VM ($WORKLOAD_VM) and client VM ($CLIENT_VM).
./cleanup.sh
If you are done exploring, please consider deleting your project.
- Go to the Cloud Platform Console
- Select the project you want to shut down, then click "Delete" at the top: this schedules the project for deletion.