1. Overview
In the world of wellness and fitness apps, providing users with a rich and engaging experience is key. For a Yoga app, this means going beyond simple text descriptions of poses and offering comprehensive information, multimedia content, and intelligent search capabilities. In this blog, we'll explore how to build a robust Yoga pose database using Google Cloud's Firestore, leverage its Vector Search Extension for contextual matching, and integrate the power of Gemini 2.0 Flash (Experimental) for working with multimodal content.
Why Firestore?
Firestore, Google Cloud's serverless NoSQL document database, is an excellent choice for building scalable and dynamic applications. Here's why it's a great fit for our Yoga app:
- Scalability and Performance: Firestore automatically scales to handle millions of users and massive datasets, ensuring your app remains responsive even as it grows.
- Real-time Updates: Built-in real-time synchronization keeps data consistent across all connected clients, making it perfect for features like live classes or collaborative practice.
- Flexible Data Model: Firestore's document-based structure allows you to store diverse data types, including text, images, and even embeddings, making it ideal for representing complex Yoga pose information.
- Powerful Querying: Firestore supports complex queries, including equality, inequality, and now, with the new extension, vector similarity searches.
- Offline Support: Firestore caches data locally, allowing your app to function even when users are offline.
Enhancing Search with Firestore Vector Search Extension
Traditional keyword-based search can be limiting when dealing with complex concepts like Yoga poses. A user might search for a pose that "opens the hips" or "improves balance" without knowing the specific pose name. This is where Vector Search comes in.
Vector Search with Firestore allows you to:
- Generate Embeddings: Transform text descriptions, and in the future potentially images and audio, into numerical vector representations (embeddings) that capture their semantic meaning using models like those available in Vertex AI or custom models.
- Store Embeddings: Store these embeddings directly in Firestore documents.
- Perform Similarity Searches: Query your database to find documents that are semantically similar to a given query vector, enabling contextual matching.
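As a minimal sketch of the "store embeddings" idea, here is what a pose document with a vector field could look like when written with the Firestore Java client. This assumes a recent client version that provides the FieldValue.vector(...) helper, and the field names and values are illustrative only:
import com.google.cloud.firestore.DocumentReference;
import com.google.cloud.firestore.FieldValue;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;
import java.util.HashMap;
import java.util.Map;

public class StoreEmbeddingSketch {
    public static void main(String[] args) throws Exception {
        Firestore firestore = FirestoreOptions.getDefaultInstance().getService();

        // Illustrative embedding values; in practice these come from an embedding model.
        double[] embedding = new double[] {0.012, -0.034, 0.087}; // ... 768 values in total

        Map<String, Object> pose = new HashMap<>();
        pose.put("name", "Tree Pose");
        pose.put("posture", "A standing balance pose that opens the hips and improves focus.");
        // Store the embedding as a Firestore vector value alongside the regular fields.
        pose.put("embedding", FieldValue.vector(embedding));

        DocumentReference docRef = firestore.collection("poses").document("tree-pose");
        docRef.set(pose).get(); // wait for the write to complete
        System.out.println("Stored pose with embedding: " + docRef.getId());
    }
}
In this app, you won't write the embedding field yourself: the Vector Search extension described later computes it and writes it back automatically whenever the input field changes.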
Integrating Gemini 2.0 Flash (Experimental)
Gemini 2.0 Flash is Google's cutting-edge multimodal AI model. While still experimental, it offers exciting possibilities for enriching our Yoga app:
- Text Generation: Use Gemini 2.0 Flash to generate detailed descriptions of Yoga poses, including benefits, modifications, and contraindications.
- Image Generation (Mimicked): Although direct image generation with Gemini is not yet publicly available, I have simulated this using Google's Imagen, generating images that visually represent the poses.
- Audio Generation (Mimicked): Similarly, we can use a Text-to-Speech (TTS) service to create audio instructions for each pose, guiding users through the practice.
Looking ahead, I envision enhancing the app with the following features of the model:
- Multimodal Live API: This new API helps you create real-time vision and audio streaming applications with tool use.
- Speed and performance: Gemini 2.0 Flash has a significantly improved time to first token (TTFT) over Gemini 1.5 Flash.
- Improved agentic experiences: Gemini 2.0 delivers improvements to multimodal understanding, coding, complex instruction following, and function calling. These improvements work together to support better agentic experiences.
For more details, refer to the Gemini 2.0 Flash documentation page.
Grounding with Google Search
To enhance the credibility and provide further resources, we can integrate Google Search to ground the information provided by our app. This means:
- Contextual Search: When an admin user enters the details for a pose, we can use the pose name to perform a Google Search.
- URL Extraction: From the search results, we can extract relevant URLs, such as articles, videos, or reputable Yoga websites, and display them within the app.
What you'll build
As part of this lab, you will:
- Create a Firestore collection and load Yoga documents
- Learn how to create CRUD applications with Firestore
- Generate Yoga pose description with Gemini 2.0 Flash
- Enable the Firebase Vector Search with Firestore Integration
- Generate embeddings from Yoga description
- Perform similarity search for user search text
Requirements
2. Before you begin
Create a project
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
- You'll use Cloud Shell, a command-line environment running in Google Cloud that comes preloaded with the gcloud command-line tool. Click Activate Cloud Shell at the top of the Google Cloud console.
- Once connected to Cloud Shell, check that you're already authenticated and that the project is set to your project ID using the following command:
gcloud auth list
- Run the following command in Cloud Shell to confirm that the gcloud command knows about your project.
gcloud config list project
- If your project is not set, use the following command to set it:
gcloud config set project <YOUR_PROJECT_ID>
- Enable the required APIs.
gcloud services enable firestore.googleapis.com \
compute.googleapis.com \
cloudresourcemanager.googleapis.com \
servicenetworking.googleapis.com \
run.googleapis.com \
cloudbuild.googleapis.com \
cloudfunctions.googleapis.com \
aiplatform.googleapis.com \
storage.googleapis.com \
secretmanager.googleapis.com \
texttospeech.googleapis.com
An alternative to the gcloud command is to enable each API through the console by searching for the product or using this link.
If any API is missed, you can always enable it during the course of the implementation.
Refer to the documentation for gcloud commands and usage.
3. Database setup
The documentation has more complete steps on how to set up a Firestore instance. At a high level, to start out, follow these steps:
- Go to the Firestore Viewer and, from the Select a database service screen, choose Firestore in Native mode
- Select a location for your Firestore
- Click Create Database (if this is the first time, leave it as "(default)" database)
When you create a Firestore project, it also enables the API in the Cloud API Manager
- IMPORTANT: Choose the TEST (not PRODUCTION) version of the Security Rules so that the data is accessible
- Once it's set up, you should see the Firestore Database, Collection and Document view in Native mode as seen in image below:
- Don't do this step yet, but just for the record: you could click "Start Collection" and create a new collection, set the Collection ID to "poses", and click the Save button.
Pro tips for production application:
- Once you finalize your data model and identify who should be able to access different kinds of documents, you can create, edit and monitor Security Rules from the Firebase interface. You can access Security Rules from this link: https://console.firebase.google.com/u/0/project/<<your_project_id>>/firestore/rules
- Be sure to edit, monitor, and test your security rules before rolling the project out of the development phase, because they are often the silent culprit behind an app behaving differently than expected :)
For this demo, we will use it in TEST mode.
4. Firestore REST API
- The REST API can be helpful for the following use cases:
  a. Accessing Firestore from a resource-constrained environment where running a complete client library is not possible
  b. Automating database administration or retrieving detailed database metadata
- The easiest way to use Firestore is through one of the native client libraries, but there are some situations when it is useful to call the REST API directly
- In the scope of this blog, you will see the Firestore REST APIs used and demonstrated rather than the native client libraries
- For authentication, the Firestore REST API accepts either a Firebase Authentication ID token or a Google Identity OAuth 2.0 token. For more information on the Authentication and Authorization topic, refer to the documentation.
- All REST API endpoints exist under the base URL https://firestore.googleapis.com/v1/.
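As a small, hedged sketch of what a raw REST call looks like from Java (using the same RestTemplate style that appears later in this blog), the snippet below reads a single pose document. The project ID, document ID, and access token are placeholders; the token could come from gcloud auth print-access-token or a Google credentials library:
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.ResponseEntity;
import org.springframework.web.client.RestTemplate;

public class FirestoreRestReadSketch {
    public static void main(String[] args) {
        String projectId = "<<YOUR_PROJECT_ID>>";
        String accessToken = "<YOUR_OAUTH2_ACCESS_TOKEN>"; // e.g. from gcloud auth print-access-token

        // Documents live under .../v1/projects/{project}/databases/{database}/documents/{collection}/{docId}
        String url = "https://firestore.googleapis.com/v1/projects/" + projectId
                + "/databases/(default)/documents/poses/tree-pose";

        HttpHeaders headers = new HttpHeaders();
        headers.setBearerAuth(accessToken);

        RestTemplate restTemplate = new RestTemplate();
        ResponseEntity<String> response =
                restTemplate.exchange(url, HttpMethod.GET, new HttpEntity<>(headers), String.class);

        // The response is a Firestore Document resource in JSON, with typed values under "fields".
        System.out.println(response.getBody());
    }
}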
Spring Boot and Firestore API
This solution, built with the Spring Boot framework, demonstrates a client application that uses the Firestore APIs to collect and modify Yoga posture and breathing details through an interactive user experience.
For a detailed step-by-step explanation of the Firestore CRUD solution part of the Yoga poses app, you can go through the blog link.
To focus on the current solution and learn the CRUD part as you go, clone the entire solution for this blog from the repository below in your Cloud Shell Terminal to get a copy of the codebase.
git clone https://github.com/AbiramiSukumaran/firestore-poserecommender
Please note:
- Once you have cloned this repo, you just have to make a few changes around your Project ID, APIs etc. No other change is required to get your application up and running. Each component of the application is explained in the upcoming sections. Here is a list of changes:
- In the src/main/java/com/example/demo/GenerateImageSample.java file, replace "<<YOUR_PROJECT_ID>>" with your project ID
- In the src/main/java/com/example/demo/GenerateEmbeddings.java file, replace "<<YOUR_PROJECT_ID>>" with your project ID
- In src/main/java/com/example/demo/PoseController.java, replace all instances of "<<YOUR_PROJECT_ID>>" and the database name (in this case "(default)") with the appropriate values from your configuration
- In src/main/java/com/example/demo/PoseController.java, replace "[YOUR_API_KEY]" with your API key for Gemini 2.0 Flash. You can get this from AI Studio.
- If you want to test locally, run the following commands from the project folder in the Cloud Shell Terminal:
mvn package
mvn spring-boot:run
At this point, you can view your running application by clicking the "web preview" option in the Cloud Shell Terminal. We are not ready to perform tests and try the application just yet.
- Optional: If you want to deploy the app in Cloud Run, you will have to bootstrap a brand-new Java Cloud Run application from scratch in the Cloud Shell Editor and add the src files and template files from the repo to your new project in the respective folders (the current GitHub repo project is not set up with a Cloud Run deployment configuration by default). In that case, follow these steps instead of cloning the existing repo:
- Go to Cloud Shell Editor (Make sure Editor is open and not the terminal), click the Google Cloud Project name icon in the left side of the status bar (the blocked out portion in the screenshot below)
- Select New application -> Cloud Run Application -> Java: Cloud Run from the list of choices and name it "firestore-poserecommender"
- Now you should see a full stack template for the Java Cloud Run Application, pre-configured and ready to go
- Remove the existing Controller class and copy over the following files into their respective folders in the project structure:
firestore-poserecommender/src/main/java/com/example/demo/
- FirestoreSampleApplication.java
- GenerateEmbeddings.java
- GenerateImageSample.java
- Pose.java
- PoseController.java
- ServletInitializer.java
firestore-poserecommender/src/main/resources/static/
- Index.html
firestore-poserecommender/src/main/resources/templates/
- contextsearch.html
- createpose.html
- errmessage.html
- pose.html
- ryoq.html
- searchpose.html
- showmessage.html
firestore-poserecommender/
- Dockerfile
- You need to make changes in the corresponding files to replace the project ID and API key with your values (see the list of changes earlier in this section).
5. Data Ingestion
Data for the application is available in this file data.json: https://github.com/AbiramiSukumaran/firestore-poserecommender/blob/main/data.json
If you would like to start off with some predefined data, copy the JSON over and replace all occurrences of "<<YOUR_PROJECT_ID>>" with your value.
- Go to Firestore Studio
- Make sure you have created a collection named "poses"
- Add documents from the repo file mentioned above manually one at a time
You can alternatively import data in one shot from the predefined set that we have created for you by running the following steps:
- Go to the Cloud Shell Terminal, make sure your active Google Cloud project is set, and make sure you are authorized. Create a bucket in your project with the gsutil command given below, replacing <PROJECT_ID> with your Google Cloud project ID:
gsutil mb -l us gs://<PROJECT_ID>-yoga-poses-bucket
- Now that the bucket is created, we need to copy the database export that we have prepared into this bucket before we can import it into the Firestore database. Use the command given below:
gsutil cp -r gs://demo-bq-gemini-public/yoga_poses gs://<PROJECT_ID>-yoga-poses-bucket
Now that we have the data to import, we can move to the final step of importing the data into the Firestore (default) database that we've created.
- Go to the Firestore console now and click Import/Export from the navigation menu on the left.
Select Import, choose the Cloud Storage path that you just created, and navigate until you can select the file "yoga_poses.overall_export_metadata":
- Click Import.
The import will take a few seconds. Once it's ready, you can validate your Firestore database and collection by visiting https://console.cloud.google.com/firestore/databases and selecting the default database and the poses collection as shown below:
- Alternatively, once the application is deployed, you can also create the records manually through its "Create a New Pose" action, or script the ingestion over the REST API as shown in the sketch below.
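As referenced above, here is a hedged ingestion sketch: one pose created through the Firestore REST API's createDocument endpoint, in the same RestTemplate style used elsewhere in the app. The document ID, field names, and token are illustrative placeholders; adapt the fields to what data.json actually contains:
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.client.RestTemplate;
import java.util.HashMap;
import java.util.Map;

public class CreatePoseSketch {
    public static void main(String[] args) {
        String projectId = "<<YOUR_PROJECT_ID>>";
        String accessToken = "<YOUR_OAUTH2_ACCESS_TOKEN>";

        // createDocument endpoint: POST .../documents/{collectionId}?documentId={id}
        String url = "https://firestore.googleapis.com/v1/projects/" + projectId
                + "/databases/(default)/documents/poses?documentId=tree-pose";

        // The REST API expects typed field values under "fields".
        Map<String, Object> nameValue = new HashMap<>();
        nameValue.put("stringValue", "Tree Pose");
        Map<String, Object> postureValue = new HashMap<>();
        postureValue.put("stringValue", "A standing balance pose that opens the hips and improves focus.");

        Map<String, Object> fields = new HashMap<>();
        fields.put("name", nameValue);
        fields.put("posture", postureValue);

        Map<String, Object> body = new HashMap<>();
        body.put("fields", fields);

        HttpHeaders headers = new HttpHeaders();
        headers.setBearerAuth(accessToken);
        headers.setContentType(MediaType.APPLICATION_JSON);

        RestTemplate restTemplate = new RestTemplate();
        ResponseEntity<String> response =
                restTemplate.postForEntity(url, new HttpEntity<>(body, headers), String.class);
        System.out.println("Created document: " + response.getBody());
    }
}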
6. Vector Search
Enable Firestore Vector Search Extension
Use this extension to automatically embed and query your Firestore documents with the new vector search feature! The extension is available from the Firebase Extensions Hub.
When you install the Vector Search extension, you specify a collection and a document field name. Adding or updating a document with this field triggers this extension to calculate a vector embedding for the document. This vector embedding is written back to the same document, and the document is indexed in the vector store, ready to be queried against.
Let's go through the steps:
Install the Extension:
Install the "Vector Search with Firestore" extension from the Firebase Extensions Marketplace by clicking the "Install in Firebase Console".
IMPORTANT:
When you first navigate to the extension page, make sure that the project selected in the Firebase console is the same project you are working on in the Google Cloud console.
If your project is not listed, go ahead and add the project in Firebase (choose your existing Google Cloud project from the list).
Configure the Extension:
Specify the collection ("poses"), the field containing the text to embed ("posture"), and other parameters like the embedding dimensions.
If this step lists any APIs that need to be enabled, the configuration page will allow you to enable them; follow the steps accordingly.
If the page doesn't respond for a while after enabling the APIs, just refresh it and you should see the APIs enabled.
One of the following steps lets you choose the LLM used to generate the embeddings. Choose "Vertex AI".
The next few settings are related to your collection and the field you want to embed:
LLM: Vertex AI
Collection path: poses
Default query limit: 3
Distance measure: Cosine
Input field name: posture
Output field name: embedding
Status field name: status
Embed existing documents: Yes
Update existing embeddings: Yes
Cloud Functions location: us-central1
Enable Events: Not checked
Once all of this is set up, click the Install Extension button. This will take 3 - 5 minutes.
Generate Embeddings:
As you add or update documents in the "poses" collection, the extension will automatically generate embeddings using a pre-trained model or a model of your choice via an API endpoint. In this case, we have chosen Vertex AI in the extension configuration.
Index Creation
Firestore requires an index on the embedding field before you can use the embedding in the application.
Firestore automatically creates indexes for basic queries; for vector queries, however, you can let Firestore generate the index definition by running a query that doesn't yet have an index, and the error message on the application side will include a link to create the required index. Here are the steps to create the vector index:
- Go to Cloud Shell Terminal
- Run the following command:
gcloud firestore indexes composite create --collection-group="poses" --query-scope=COLLECTION --database="(default)" --field-config vector-config='{"dimension":"768", "flat": "{}"}',field-path="embedding"
Read more about it here.
Once a vector index is created, you can perform a nearest neighbor search with your vector embeddings.
Important Note:
From this point onward, you do not have to make any changes to the source. Just follow along to understand what the application is doing.
Performing Vector Search
Let's take a look at how your newly built application approaches Vector Search. Once embeddings are stored, you can use the VectorQuery class of the Firestore Java SDK to perform Vector Search and get nearest neighbor results:
CollectionReference coll = firestore.collection("poses");
VectorQuery vectorQuery = coll.findNearest(
"embedding",
userSearchTextEmbedding,
/* limit */ 3,
VectorQuery.DistanceMeasure.EUCLIDEAN,
VectorQueryOptions.newBuilder().setDistanceResultField("vector_distance")
.setDistanceThreshold(2.0)
.build());
ApiFuture<VectorQuerySnapshot> future = vectorQuery.get();
VectorQuerySnapshot vectorQuerySnapshot = future.get();
List<Pose> posesList = new ArrayList<Pose>();
// Get the ID of the closest document (assuming results are sorted by distance)
String closestDocumentId = vectorQuerySnapshot.getDocuments().get(0).getId();
This snippet compares the embedding of the user search text with the embeddings of the documents in Firestore and extracts the contextually closest one.
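For completeness, here is a hedged sketch of how userSearchTextEmbedding could be produced with a Vertex AI text embedding model. The model name ("text-embedding-004") and the response parsing follow the public Vertex AI prediction samples and are assumptions here; the repo's GenerateEmbeddings.java is the source of truth for the exact model and parsing the app uses:
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.util.Collections;
import java.util.List;

public class QueryEmbeddingSketch {

    // Returns the embedding vector for the user's search text.
    public static double[] embed(String projectId, String searchText) throws Exception {
        String location = "us-central1";
        PredictionServiceSettings settings = PredictionServiceSettings.newBuilder()
                .setEndpoint(location + "-aiplatform.googleapis.com:443")
                .build();

        try (PredictionServiceClient client = PredictionServiceClient.create(settings)) {
            // Assumed embedding model; use the same model the extension was configured with.
            EndpointName endpointName = EndpointName.ofProjectLocationPublisherModelName(
                    projectId, location, "google", "text-embedding-004");

            // Naively embeds the text into JSON; escape quotes in real code.
            Value.Builder instance = Value.newBuilder();
            JsonFormat.parser().merge("{\"content\": \"" + searchText + "\"}", instance);

            PredictResponse response = client.predict(
                    endpointName, Collections.singletonList(instance.build()), Value.getDefaultInstance());

            // Embedding values are nested under predictions[0].embeddings.values.
            List<Value> values = response.getPredictions(0)
                    .getStructValue().getFieldsMap().get("embeddings")
                    .getStructValue().getFieldsMap().get("values")
                    .getListValue().getValuesList();

            double[] embedding = new double[values.size()];
            for (int i = 0; i < values.size(); i++) {
                embedding[i] = values.get(i).getNumberValue();
            }
            return embedding; // pass this as userSearchTextEmbedding to findNearest(...)
        }
    }
}
Note that the 768 dimension in the index creation command from the previous section must match the dimension of whichever embedding model you choose.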
7. Gemini 2.0 Flash
Integrating Gemini 2.0 Flash (for description generation)
Let's take a look at how your newly built application handles Gemini 2.0 Flash integration for description generation.
Now let's say an admin user or Yoga instructor wants to enter the details of the poses with the help of Gemini 2.0 Flash and then perform a search to see the nearest matches. This extracts the details of the matching poses along with multimodal objects that support the results.
String apiUrl = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=[YOUR_API_KEY]";
Map<String, Object> requestBody = new HashMap<>();
List<Map<String, Object>> contents = new ArrayList<>();
List<Map<String, Object>> tools = new ArrayList<>();
Map<String, Object> content = new HashMap<>();
List<Map<String, Object>> parts = new ArrayList<>();
Map<String, Object> part = new HashMap<>();
part.put("text", prompt);
parts.add(part);
content.put("parts", parts);
contents.add(content);
requestBody.put("contents", contents);
/**Setting up Grounding*/
Map<String, Object> googleSearchTool = new HashMap<>();
googleSearchTool.put("googleSearch", new HashMap<>());
tools.add(googleSearchTool);
requestBody.put("tools", tools);
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_JSON);
HttpEntity<Map<String, Object>> requestEntity = new HttpEntity<>(requestBody, headers);
ResponseEntity<String> response = restTemplate.exchange(apiUrl, HttpMethod.POST, requestEntity, String.class);
System.out.println("Generated response: " + response);
String responseBody = response.getBody();
JSONObject jsonObject = new JSONObject(responseBody);
JSONArray candidates = jsonObject.getJSONArray("candidates");
JSONObject candidate = candidates.getJSONObject(0);
JSONObject contentResponse = candidate.getJSONObject("content");
JSONArray partsResponse = contentResponse.getJSONArray("parts");
JSONObject partResponse = partsResponse.getJSONObject(0);
String generatedText = partResponse.getString("text");
System.out.println("Generated Text: " + generatedText);
a. Mimicking Image and Audio Generation
Gemini 2.0 Flash Experimental is capable of generating multimodal results; however, I have not signed up for its early access yet, so I have mimicked the image and audio output with the Imagen and Text-to-Speech APIs, respectively. Imagine how great it is to get all of this generated with one API call to Gemini 2.0 Flash!!
try (PredictionServiceClient predictionServiceClient =
PredictionServiceClient.create(predictionServiceSettings)) {
final EndpointName endpointName =
EndpointName.ofProjectLocationPublisherModelName(
projectId, location, "google", "imagen-3.0-generate-001");
Map<String, Object> instancesMap = new HashMap<>();
instancesMap.put("prompt", prompt);
Value instances = mapToValue(instancesMap);
Map<String, Object> paramsMap = new HashMap<>();
paramsMap.put("sampleCount", 1);
paramsMap.put("aspectRatio", "1:1");
paramsMap.put("safetyFilterLevel", "block_few");
paramsMap.put("personGeneration", "allow_adult");
Value parameters = mapToValue(paramsMap);
PredictResponse predictResponse =
predictionServiceClient.predict(
endpointName, Collections.singletonList(instances), parameters);
for (Value prediction : predictResponse.getPredictionsList()) {
Map<String, Value> fieldsMap = prediction.getStructValue().getFieldsMap();
if (fieldsMap.containsKey("bytesBase64Encoded")) {
bytesBase64Encoded = fieldsMap.get("bytesBase64Encoded").getStringValue();
}
}
return bytesBase64Encoded;
}
try {
// Create a Text-to-Speech client
try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
// Set the text input to be synthesized
SynthesisInput input = SynthesisInput.newBuilder().setText(postureString).build();
// Build the voice request, select the language code ("en-US") and the ssml
// voice gender
// ("neutral")
VoiceSelectionParams voice =
VoiceSelectionParams.newBuilder()
.setLanguageCode("en-US")
.setSsmlGender(SsmlVoiceGender.NEUTRAL)
.build();
// Select the type of audio file you want returned
AudioConfig audioConfig =
AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();
// Perform the text-to-speech request on the text input with the selected voice
// parameters and audio file type
SynthesizeSpeechResponse response =
textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);
// Get the audio contents from the response
ByteString audioContents = response.getAudioContent();
// Convert to Base64 string
String base64Audio = Base64.getEncoder().encodeToString(audioContents.toByteArray());
// Add the Base64 encoded audio to the Pose object
return base64Audio;
}
} catch (Exception e) {
e.printStackTrace(); // Handle exceptions appropriately. For a real app, log and provide user feedback.
return "Error in Audio Generation";
}
}
b. Grounding with Google Search:
If you check the Gemini invocation code earlier in this section, you will notice the following code snippet that enables Google Search grounding for the LLM response:
/**Setting up Grounding*/
Map<String, Object> googleSearchTool = new HashMap<>();
googleSearchTool.put("googleSearch", new HashMap<>());
tools.add(googleSearchTool);
requestBody.put("tools", tools);
This is to ensure we:
- Ground our model to actual search results
- Extract relevant URLs referenced in the search
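To actually surface those URLs in the app, the grounded sources can be pulled out of the same response JSON. The sketch below uses the org.json parsing style from the earlier snippet; the groundingMetadata / groundingChunks field names follow the public Gemini API grounding documentation and may change while the API is experimental:
import org.json.JSONArray;
import org.json.JSONObject;
import java.util.ArrayList;
import java.util.List;

public class GroundingUrlSketch {

    // Extracts web source URLs from a generateContent response that used the googleSearch tool.
    static List<String> extractGroundedUrls(String responseBody) {
        List<String> urls = new ArrayList<>();
        JSONObject jsonObject = new JSONObject(responseBody);
        JSONObject candidate = jsonObject.getJSONArray("candidates").getJSONObject(0);

        // groundingMetadata is only present when grounding was actually used for the response.
        if (candidate.has("groundingMetadata")) {
            JSONObject groundingMetadata = candidate.getJSONObject("groundingMetadata");
            if (groundingMetadata.has("groundingChunks")) {
                JSONArray chunks = groundingMetadata.getJSONArray("groundingChunks");
                for (int i = 0; i < chunks.length(); i++) {
                    JSONObject chunk = chunks.getJSONObject(i);
                    if (chunk.has("web")) {
                        urls.add(chunk.getJSONObject("web").getString("uri"));
                    }
                }
            }
        }
        return urls;
    }
}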
8. Run your Application
Let's take a look at all the capabilities of your newly built Java Spring Boot application with a simple Thymeleaf web interface:
- Firestore CRUD operations (Create, Read, Update, Delete)
- Keyword Search
- Generative AI based Context Creation
- Contextual Search (Vector Search)
- Multimodal output to relate to the search
- Run Your Own Query (Queries in structuredQuery format)
Example: {"structuredQuery":{"select":{"fields":[{"fieldPath":"name"}]},"from":[{"collectionId":"fitness_poses"}]}}
All these features discussed so far are part of the application that you just created from the repo: https://github.com/AbiramiSukumaran/firestore-poserecommender
To build, run and deploy it, run the following commands from the Cloud Shell Terminal:
mvn package
mvn spring-boot:run
You should see the result and be able to play around with your application's features. Check out the video below for the demo of the output:
Pose Recommender with Firestore, Vector Search and Gemini 2.0 Flash
Optional Step:
To deploy it on Cloud Run (assuming that you have bootstrapped a brand new application with Dockerfile and copied over the files as needed), run the following command from the Cloud Shell Terminal from within your project directory:
gcloud run deploy --source .
Provide the application name and region (choose us-central1), and answer "Y" to allow unauthenticated invocations when prompted. You should receive your application endpoint in the terminal once the deployment is successful.
9. Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this post, follow these steps:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
10. Congratulations
Congratulations! You have successfully used Firestore to create a robust and intelligent Yoga posture management application. By combining the power of Firestore, the Vector Search Extension, and the capabilities of Gemini 2.0 Flash (with simulated image and audio generation), we have created a truly engaging and informative Yoga app that implements CRUD operations, keyword-based search, contextual vector search, and generated multimedia content.
This approach is not limited to Yoga apps. As AI models like Gemini continue to evolve, the possibilities for creating even more immersive and personalized user experiences will only grow. Remember to stay updated with the latest developments and documentation from Google Cloud and Firebase to leverage the full potential of these technologies.
If I were to extend this app, I would try to do two things with Gemini 2.0 Flash:
- Put the Multimodal Live API to use by creating real-time vision and audio streaming for the use case.
- Engage Thinking Mode to surface the reasoning behind the responses when interacting with real-time data, making the experience more life-like.
Feel free to give it a try and send in a pull request :>D!!!