How to deploy a FastAPI chatbot app to Cloud Run using Gemini

1. Introduction

Overview

In this codelab, you will learn how to deploy a FastAPI app to Cloud Run. The app is a chatbot that prompts a Gemini model.

What you'll learn

How to deploy a FastAPI app to Cloud Run
How to prompt Gemini from Cloud Run in Python using the Google Gen AI client library (google-genai)

2. Setup and Requirements

Set environment variables that will be used throughout this codelab.

PROJECT_ID=<YOUR_PROJECT_ID>
REGION=<YOUR_REGION>
GEMINI_MODEL=gemini-2.0-flash-001

SERVICE_NAME=fastapi-gemini
SERVICE_ACCOUNT=fastapi-gemini-sa
SERVICE_ACCOUNT_ADDRESS=$SERVICE_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com

Create the service account by running this command:

gcloud iam service-accounts create $SERVICE_ACCOUNT \
  --display-name="Service Account for FastAPI Gemini CR service"

Give your service account access to Gemini with the Vertex AI User role.

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SERVICE_ACCOUNT_ADDRESS" \
  --role="roles/aiplatform.user"
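The service account address and the --member value above are plain string composition. As an illustration only, the following Python sketch builds both from the same names; "my-project" is a placeholder project ID, not a value from this codelab.

```python
# Illustrative sketch: how the shell variables above combine into the
# service account email and the IAM member string passed to --member.


def service_account_email(account_name: str, project_id: str) -> str:
    """Return the email address of a user-managed service account."""
    return f"{account_name}@{project_id}.iam.gserviceaccount.com"


def iam_member(email: str) -> str:
    """IAM bindings identify service accounts with the 'serviceAccount:' prefix."""
    return f"serviceAccount:{email}"


if __name__ == "__main__":
    # "my-project" is a placeholder project ID used only for illustration.
    email = service_account_email("fastapi-gemini-sa", "my-project")
    print(email)              # fastapi-gemini-sa@my-project.iam.gserviceaccount.com
    print(iam_member(email))  # serviceAccount:fastapi-gemini-sa@my-project.iam.gserviceaccount.com
```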

3. Create the app

Create a directory for your code.

mkdir codelab-cr-fastapi-gemini
cd codelab-cr-fastapi-gemini

First, create the HTML templates. Make a templates directory and change into it:

mkdir templates
cd templates

Create a new file called ai_message.html with the following content:

<div class="message-container ai-message-container">
    {{ ai_response_text }}
</div>

Create a new file called message.html with the following content:

<div class="message-container user-message">
    {{ message }}
</div>
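Both partials interpolate text that is not under your control: the user's message and Gemini's reply. FastAPI's Jinja2Templates class (provided by Starlette) enables Jinja's autoescaping by default, so markup inside a message is displayed as text instead of being injected into the page. The following standard-library sketch approximates that behavior for message.html; html.escape stands in for Jinja's own escaper, and the exact entities it emits (for example, for quotes) differ slightly.

```python
import html

# Stand-in for templates/message.html; in the app, Jinja2 renders the real file.
MESSAGE_TEMPLATE = (
    '<div class="message-container user-message">\n'
    "    {message}\n"
    "</div>"
)


def render_user_message(message: str) -> str:
    """Approximate the autoescaped rendering of message.html.

    html.escape replaces &, <, > and quotes with HTML entities, which is
    roughly what Jinja's autoescaping does for {{ message }}.
    """
    return MESSAGE_TEMPLATE.format(message=html.escape(message))


if __name__ == "__main__":
    # The <b> tags come out as &lt;b&gt; and are shown literally in the chat.
    print(render_user_message("<b>Hi</b> & welcome"))
```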

Create a new file called index.html with the following content:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>FastAPI HTMX Gemini Chat</title>
    <style>
        body { font-family: sans-serif; max-width: 700px; margin: auto; padding: 20px; background-color: #f4f4f4; }
        #chat-messages { border: 1px solid #ccc; background-color: #fff; padding: 15px; height: 400px; overflow-y: scroll; margin-bottom: 15px; border-radius: 5px; box-shadow: inset 0 1px 3px rgba(0,0,0,0.1); }
        .message-container { margin-bottom: 10px; padding: 8px 12px; border-radius: 15px; max-width: 80%; word-wrap: break-word; }
        .user-message { background-color: #dcf8c6; align-self: flex-end; margin-left: auto; text-align: right; border-bottom-right-radius: 0;}
        .ai-message-container { background-color: #eee; align-self: flex-start; margin-right: auto; border-bottom-left-radius: 0;}
        .ai-message-container p { margin: 0.2em 0; } /* Spacing for streamed paragraphs */
        .ai-message-container p:first-child { margin-top: 0; }
        .ai-message-container p:last-child { margin-bottom: 0; }
        form { display: flex; margin-top: 10px; }
        input[type="text"] { flex-grow: 1; padding: 10px; border: 1px solid #ccc; border-radius: 20px; margin-right: 10px; }
        button { padding: 10px 20px; background-color: #0b93f6; color: white; border: none; border-radius: 20px; cursor: pointer; font-weight: bold; }
        button:hover { background-color: #0a84dd; }
    </style>
    <script src="https://unpkg.com/htmx.org@2.0.4"
    integrity="sha384-HGfztofotfshcF7+8n44JQL2oJmowVChPTg48S+jvZoztPfvwD79OC/LTtG6dMp+" crossorigin="anonymous"></script>
    <script src="https://unpkg.com/htmx-ext-sse@2.2.2" crossorigin="anonymous"></script>
</head>
<body>

    <h1>Chat with Gemini</h1>

    <div id="chat-messages">
        {% for msg in messages %}
             {# Render initial messages if needed #}
        {% endfor %}
    </div>

    <form
        hx-post="/ask"             {# Post to the /ask endpoint #}
        hx-target="#chat-messages" {# Target the main chat area #}
        hx-swap="beforeend"        {# Append the response (user msg + AI placeholder) #}
        hx-on::after-request="this.reset(); document.getElementById('chat-messages').scrollTop = document.getElementById('chat-messages').scrollHeight;" {# Clear form & scroll down #}
        >
        <input type="text" name="message" placeholder="Ask Gemini..." autofocus autocomplete="off">
        <button type="submit">Send</button>
    </form>

    <script>
        // Initial scroll to bottom on page load (if needed)
        window.onload = () => {
            const chatBox = document.getElementById('chat-messages');
            chatBox.scrollTop = chatBox.scrollHeight;
        }
    </script>

</body>
</html>
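When you submit the form, htmx serializes its fields and posts them to /ask as a URL-encoded body, adding an HX-Request: true header; FastAPI then reads the field through Form(), which relies on the python-multipart package listed in pyproject.toml. The sketch below builds a comparable request body with the Python standard library and parses it back the way a server-side form parser would. It does not use htmx or FastAPI itself.

```python
from urllib.parse import parse_qs, urlencode


def encode_chat_form(message: str) -> tuple[dict[str, str], bytes]:
    """Build headers and a body resembling what htmx sends for the chat form."""
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # htmx marks the requests it issues with this header.
        "HX-Request": "true",
    }
    body = urlencode({"message": message}).encode("utf-8")
    return headers, body


def decode_chat_form(body: bytes) -> str:
    """Recover the 'message' field, as a server-side form parser would."""
    # keep_blank_values=True so an empty message decodes to "" instead of vanishing.
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    return fields.get("message", [""])[0]


if __name__ == "__main__":
    headers, body = encode_chat_form("Why is the sky blue?")
    print(body)                    # b'message=Why+is+the+sky+blue%3F'
    print(decode_chat_form(body))  # Why is the sky blue?
```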

Now return to the project root directory to create the Python code and the remaining files:

cd ..

Create a .gcloudignore file with the following content:

__pycache__

Create a file called main.py with the following content:

from fastapi import FastAPI, Request, Form
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from typing import List, Annotated
from google import genai
import os

# in case the env var isn't set, use YOUR_<VARIABLE> as the default
# to help with debugging
project_id = os.getenv("PROJECT_ID", "YOUR_PROJECT_ID")
region = os.getenv("REGION", "YOUR_REGION")
gemini_model = os.getenv("GEMINI_MODEL", "gemini-2.0-flash-001")

app = FastAPI(title="FastAPI HTMX Chat")

templates = Jinja2Templates(directory="templates")

genai_client = genai.Client(
    vertexai=True, project=project_id, location=region
)

system_prompt = """
You're a chatbot that helps pass the time with small talk, that is
polite conversation about unimportant or uncontroversial matters
that allows people to pass the time. Please keep your answers short.
"""

chat_messages: List[str] = []

# --- Routes ---
@app.get("/", response_class=HTMLResponse)
async def get_chat_ui(request: Request):
    """Serves the main chat page."""
    print("Serving index.html")
    return templates.TemplateResponse(
        request,
        "index.html",
        {"messages": chat_messages},  # Pass existing messages
    )

@app.post("/ask", response_class=HTMLResponse)
async def ask_gemini_and_respond(
    request: Request,
    # Use Annotated for dependency injection with Form data
    message: Annotated[str, Form()]
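):
    user_msg_html = templates.get_template('message.html').render({'message': message})

    print("Asking Gemini...")
    # Use the async client (genai_client.aio) so that waiting for Gemini
    # doesn't block the event loop and other requests can be served meanwhile.
    response = await genai_client.aio.models.generate_content(
        model=gemini_model,
        contents=[message],
        config=genai.types.GenerateContentConfig(
            system_instruction=system_prompt,
            temperature=0.7,
        ),
    )

    # response.text is None if Gemini returned no text (e.g., a blocked response)
    ai_response_text = response.text or ""
    print("Gemini responded with: " + ai_response_text)

    ai_response_html = templates.get_template('ai_message.html').render({'ai_response_text': ai_response_text})

    combined_html = user_msg_html + ai_response_html

    return HTMLResponse(content=combined_html)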
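main.py falls back to placeholder values such as YOUR_PROJECT_ID when an environment variable is missing, which makes a misconfiguration easy to spot in the logs. If you would rather fail at startup, a hypothetical helper like the one below could validate the same variables up front. The Settings and load_settings names are illustrative only and are not part of FastAPI or the Gen AI SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values main.py reads from the environment."""
    project_id: str
    region: str
    gemini_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Hypothetical helper: raise if a required variable is missing or still a placeholder."""
    env = os.environ if env is None else env
    missing = [
        name
        for name in ("PROJECT_ID", "REGION")
        if not env.get(name) or env[name].startswith("YOUR_")
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        project_id=env["PROJECT_ID"],
        region=env["REGION"],
        # Same default model as main.py.
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.0-flash-001"),
    )
```

For example, calling settings = load_settings() near the top of main.py would stop a misconfigured revision from starting. The trade-off is that local runs then need PROJECT_ID and REGION exported.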

Create a Dockerfile with the following content:

# Build stage
FROM python:3.12-slim AS builder

WORKDIR /app

# Install poetry
RUN pip install poetry
RUN poetry self add poetry-plugin-export

# Copy poetry files
COPY pyproject.toml poetry.lock* ./

# Copy application code
COPY . .

# Export dependencies to requirements.txt
RUN poetry export -f requirements.txt --output requirements.txt 

# Final stage
FROM python:3.12-slim

RUN apt-get update && apt-get install -y libcairo2 python3-dev libffi-dev

WORKDIR /app

# Copy files from builder
COPY --from=builder /app/ .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Compile bytecode to improve startup latency
# -q: Quiet mode 
# -b: Write legacy bytecode files (.pyc) alongside source
# -f: Force rebuild even if timestamps are up-to-date
RUN python -m compileall -q -b -f .

# Expose port
EXPOSE 8080

# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
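The compileall step in the Dockerfile precompiles the application so the container does not have to compile modules on first import. For reference, the same step can be expressed with the compileall module's API, where -q, -b, and -f correspond to the quiet, legacy, and force parameters. The temporary directory and hello.py file are for illustration only.

```python
import compileall
import tempfile
from pathlib import Path


def precompile(directory: str) -> bool:
    """Mirror `python -m compileall -q -b -f <directory>`.

    quiet=1 prints only errors (-q), legacy=True writes .pyc files next to
    the sources instead of into __pycache__ (-b), and force=True recompiles
    even if timestamps are up to date (-f). Returns True if every file compiled.
    """
    return bool(compileall.compile_dir(directory, quiet=1, legacy=True, force=True))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # hello.py is an illustrative file, not part of the codelab.
        (Path(tmp) / "hello.py").write_text("print('hello')\n")
        ok = precompile(tmp)
        print(ok, (Path(tmp) / "hello.pyc").exists())  # True True
```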

Create a pyproject.toml file with the following content:

[tool.poetry]
name = "codelab"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.12"
fastapi = "^0.115.12"
uvicorn = {extras = ["standard"], version = "^0.34.0"}
jinja2 = "^3.1.6"
python-multipart = "^0.0.20"
google-genai = "^1.8.0"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
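Poetry's caret constraints allow any update that doesn't change the leftmost non-zero component of the version. For the 0.x dependencies above, that range is narrow: ^0.115.12 means >=0.115.12,<0.116.0, whereas ^1.8.0 allows anything below 2.0.0. The sketch below computes these bounds; it only handles the plain numeric X, X.Y, and X.Y.Z forms used in this file, not pre-release versions or the rest of Poetry's constraint syntax.

```python
def caret_range(constraint: str) -> tuple[str, str]:
    """Return (lower, upper) bounds for a Poetry-style caret constraint.

    Sketch only: supports numeric X, X.Y and X.Y.Z versions, not pre-releases.
    The exclusive upper bound bumps the leftmost non-zero component; if every
    component is zero, it bumps the last component given.
    """
    version = constraint.removeprefix("^")
    parts = [int(p) for p in version.split(".")]
    nonzero = [i for i, p in enumerate(parts) if p != 0]
    bump = nonzero[0] if nonzero else len(parts) - 1
    upper = parts[:bump] + [parts[bump] + 1]
    # Pad both bounds to three components for readability.
    lower = parts + [0] * (3 - len(parts))
    upper = upper + [0] * (3 - len(upper))
    return ".".join(map(str, lower)), ".".join(map(str, upper))


if __name__ == "__main__":
    for c in ("^3.12", "^0.115.12", "^0.34.0", "^0.0.20", "^1.8.0"):
        print(c, caret_range(c))
```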

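4. Deploy to Cloud Run

From the codelab-cr-fastapi-gemini directory, deploy the app from source. Because the directory contains a Dockerfile, the container image is built from it.

gcloud run deploy $SERVICE_NAME \
  --source . \
  --region=$REGION \
  --allow-unauthenticated \
  --service-account=$SERVICE_ACCOUNT_ADDRESS \
  --set-env-vars=PROJECT_ID=$PROJECT_ID,REGION=$REGION,GEMINI_MODEL=$GEMINI_MODEL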

5. Test your service

Open the Service URL printed at the end of the deployment in your web browser and ask Gemini a question, for example: Why is the sky blue?
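You can also exercise the /ask endpoint from a script instead of the browser. This sketch sends one form-encoded message with urllib and returns the HTML fragment the service responds with. It reads the service URL from a SERVICE_URL environment variable, which is an assumption of this example; because the service was deployed with --allow-unauthenticated, no credentials are sent.

```python
import os
import urllib.parse
import urllib.request


def build_ask_request(base_url: str, message: str) -> urllib.request.Request:
    """Build a POST to /ask carrying the same 'message' form field as the chat page."""
    body = urllib.parse.urlencode({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ask",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def ask(base_url: str, message: str, timeout: float = 60.0) -> str:
    """Send the message and return the HTML fragment the service responds with."""
    with urllib.request.urlopen(build_ask_request(base_url, message), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # SERVICE_URL is an assumed variable holding the URL printed by gcloud run deploy.
    url = os.environ.get("SERVICE_URL")
    if url:
        print(ask(url, "Why is the sky blue?"))
    else:
        print("Set SERVICE_URL to the Service URL printed by gcloud run deploy.")
```

With SERVICE_URL set, the script should print a user message container followed by an AI message container, the same fragment htmx appends to the chat.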

6. Congratulations!

Congratulations on completing the codelab!

What we've covered

How to deploy a FastAPI app to Cloud Run
How to prompt Gemini from Cloud Run in Python using the Google Gen AI client library (google-genai)

7. Clean up

To delete the Cloud Run service, go to the Cloud Run page of the Cloud Console at https://console.cloud.google.com/run, select the service, and delete it.

If you choose to delete the entire project, go to https://console.cloud.google.com/cloud-resource-manager, select the project you used for this codelab, and choose Delete. If you delete the project, you'll need to switch to a different project in the Cloud SDK. You can list all available projects by running gcloud projects list.