Overview
FixMyCar is a production-ready Retrieval-Augmented Generation (RAG) application that helps car owners troubleshoot issues by querying vehicle owner’s manuals. The application demonstrates how to integrate Vertex AI Search with Gemini for accurate, grounded responses.
Architecture
System Components
Frontend (Streamlit Python app): chat interface, real-time streaming, deployed on GKE
Backend (Java Spring Boot): REST API, Vertex AI Search client, Gemini integration
Search engine (Vertex AI Search): OCR parser for PDFs, vector embeddings, extractive answers
Infrastructure (GKE Autopilot): autoscaling, Workload Identity, load balancing
RAG Implementation
Two-Step RAG Pipeline
FixMyCar implements the classic RAG pattern with two backend calls, one to Vertex AI Search and one to Gemini, with prompt augmentation happening between them:
1. Retrieval (Vertex AI Search): search the car manual datastore using the user's natural-language query.
2. Augmentation (prompt engineering): construct the Gemini prompt with the search results as grounding context.
3. Generation (Gemini inference): generate an accurate, contextual response based on the manual excerpts.
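The three stages can be sketched in a few lines of Python, with the search and model calls injected as plain functions so the data flow is visible without any cloud access. This is a minimal illustration, not the FixMyCar backend (which is Java); the names `augment` and `rag_answer` are hypothetical, and the prompt wording only loosely follows the "Human prompt" / "Grounding data" layout visible in the backend logs.

```python
from typing import Callable, List


def augment(question: str, passages: List[str]) -> str:
    """Augmentation: combine instructions, the user's question, and retrieved excerpts."""
    grounding = "\n".join(f"- {p}" for p in passages)
    return (
        "You are a helpful car manual chatbot. Answer only from the grounding data; "
        "if it does not contain the answer, say so.\n"
        f"Human prompt: {question}\n"
        f"Grounding data:\n{grounding}"
    )


def rag_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    # 1. Retrieval: fetch manual excerpts (Vertex AI Search in FixMyCar)
    passages = retrieve(question)
    # 2. Augmentation: embed the excerpts in the prompt as grounding context
    prompt = augment(question, passages)
    # 3. Generation: send the grounded prompt to the model (Gemini in FixMyCar)
    return generate(prompt)
```

Passing an identity function as `generate`, for example `rag_answer("What is the max cargo capacity?", lambda q: ["The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet."], lambda p: p)`, returns the assembled prompt itself, which is a quick way to inspect exactly what the model would see.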
Java Backend Implementation
RAG Controller
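package com.cpet.fixmycarbackend;
import com.google.cloud.discoveryengine.v1.*;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.generativeai.ChatSession;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class FixMyCarBackendController {
    // Application configuration (FixMyCarConfiguration, not shown)
    @Autowired
    private FixMyCarConfiguration config;
    // POST /chat accepts {"prompt": "..."} and returns the same object with "response" filled in
    @PostMapping("/chat")
    public ChatMessage message(@RequestBody ChatMessage message) {
        return ragVertexAISearch(message);
    }
    public ChatMessage ragVertexAISearch(ChatMessage message) {
        // Step 1: Retrieve relevant manual excerpts from the Vertex AI Search datastore
        String searchQuery = message.getPrompt();
        String vectorSearchResults = performVertexAISearch(searchQuery);
        // Step 2: Ground the Gemini prompt in those excerpts and generate the answer
        String result = geminiInference(message.getPrompt(), vectorSearchResults);
        message.setResponse(result);
        return message;
    }
    // performVertexAISearch(String) and geminiInference(String, String) are defined
    // elsewhere in this class (not shown in this excerpt); they use the
    // discoveryengine and vertexai imports above.
}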
Streamlit Frontend
import requests
import streamlit as st
# In-cluster DNS name of the backend Service
BACKEND_URL = "http://fixmycar-backend:8080/chat"
st.title("🚗 FixMyCar Assistant")
st.write("Ask questions about your Cymbal Starlight 2024")
# Chat interface: st.chat_input returns None until the user submits a question
if prompt := st.chat_input("What's your question?"):
    st.chat_message("user").write(prompt)
    # Call the backend API; json= also sets the "Content-Type: application/json" header,
    # and the timeout (seconds) keeps a stalled backend from hanging the UI
    response = requests.post(BACKEND_URL, json={"prompt": prompt}, timeout=60)
    if response.status_code == 200:
        data = response.json()
        st.chat_message("assistant").write(data["response"])
    else:
        st.error("Failed to get response from backend")
Vertex AI Search Configuration
OCR Parser for PDFs
Vertex AI Search uses Document AI’s OCR parser to extract text from owner’s manuals:
1. Upload PDFs to Cloud Storage: store the manuals in a GCS bucket (e.g., cymbal-starlight-2024.pdf).
2. Create a datastore, configured with:
   Source: Cloud Storage bucket
   Parser: OCR Parser (not Layout Parser)
   Region: Global
   Enterprise features: Enabled
3. Indexing: Vertex AI Search automatically extracts text from the PDFs, generates vector embeddings, creates extractive answer indexes, and builds search indexes. This takes about 10 minutes for a typical owner's manual.
4. Test search: use the Preview interface to test queries before deployment.
Vertex AI Search returns structured extractive answers:
{
  "results": [
    {
      "document": {
        "derivedStructData": {
          "extractive_answers": [
            {
              "content": "The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet. The cargo area is located in the trunk of the vehicle.",
              "pageNumber": 42
            }
          ]
        }
      }
    }
  ]
}
These answers are verbatim passages extracted from the indexed manual rather than text generated by an LLM, which gives:
Accuracy: direct quotes from the source documents
Low latency: no generative model call is needed during retrieval
Grounding: provenance via page numbers
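To turn this response into grounding context, the backend only needs the `content` strings and their page numbers. A minimal Python sketch of that flattening step, assuming the response has already been converted to a plain dictionary shaped like the JSON above; the helper name `extract_grounding` and the "(p. N)" citation format are illustrative choices, not part of the Vertex AI Search API.

```python
import json
from typing import Any, Dict, List


def extract_grounding(search_response: Dict[str, Any]) -> List[str]:
    """Flatten extractive answers into 'text (p. N)' strings for the Gemini prompt."""
    passages: List[str] = []
    for result in search_response.get("results", []):
        derived = result.get("document", {}).get("derivedStructData", {})
        for answer in derived.get("extractive_answers", []):
            content = answer.get("content", "").strip()
            if not content:
                continue  # skip empty answers
            page = answer.get("pageNumber")
            # Keep the page number so the final response can cite its source
            passages.append(f"{content} (p. {page})" if page is not None else content)
    return passages


raw = """{"results": [{"document": {"derivedStructData": {"extractive_answers": [
  {"content": "The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.",
   "pageNumber": 42}]}}}]}"""
print(extract_grounding(json.loads(raw)))
# ['The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet. (p. 42)']
```

Missing keys fall back to empty containers, so a response with no results simply yields an empty list instead of raising.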
GKE Deployment
Workload Identity Setup
FixMyCar uses GKE Workload Identity to authenticate with Vertex AI:
#!/bin/bash
# workload_identity.sh
PROJECT_ID=$(gcloud config get-value project)
CLUSTER_NAME="fixmycar"
REGION="us-central1"
NAMESPACE="default"
KSA_NAME="fixmycar-backend-sa"   # Kubernetes Service Account
GSA_NAME="fixmycar-gsa"          # Google Cloud Service Account
# Create Google Cloud Service Account
gcloud iam service-accounts create ${GSA_NAME} \
    --display-name="FixMyCar Backend Service Account"
# Grant Vertex AI (Gemini) and Vertex AI Search (Discovery Engine) permissions
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="roles/discoveryengine.editor"
# Create Kubernetes Service Account
kubectl create serviceaccount ${KSA_NAME} -n ${NAMESPACE}
# Bind Kubernetes SA to Google Cloud SA
gcloud iam service-accounts add-iam-policy-binding \
    ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"
# Annotate Kubernetes SA
kubectl annotate serviceaccount ${KSA_NAME} \
    -n ${NAMESPACE} \
    iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
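Most Workload Identity failures come down to a mismatch between the identifier strings used in this script. The two formats it relies on can be captured as small Python helpers, handy for generating or double-checking values; the helper names are hypothetical and `my-project` is a placeholder project ID.

```python
def gsa_email(gsa_name: str, project_id: str) -> str:
    """Email of the Google Cloud service account (also the annotation value)."""
    return f"{gsa_name}@{project_id}.iam.gserviceaccount.com"


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member granted roles/iam.workloadIdentityUser on the Google Cloud SA."""
    # GKE's workload identity pool for a project is PROJECT_ID.svc.id.goog
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


print(gsa_email("fixmycar-gsa", "my-project"))
# fixmycar-gsa@my-project.iam.gserviceaccount.com
print(workload_identity_member("my-project", "default", "fixmycar-backend-sa"))
# serviceAccount:my-project.svc.id.goog[default/fixmycar-backend-sa]
```

The `iam.gke.io/gcp-service-account` annotation must match `gsa_email(...)` exactly, and the namespace and KSA name inside the brackets must match the `serviceAccountName` and namespace the backend pods actually run with.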
Kubernetes Manifests
Backend Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fixmycar-backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fixmycar-backend
  template:
    metadata:
      labels:
        app: fixmycar-backend
    spec:
      serviceAccountName: fixmycar-backend-sa
      containers:
        - name: fixmycar-backend
          image: us-central1-docker.pkg.dev/PROJECT_ID/fixmycar/backend:latest
          ports:
            - containerPort: 8080
          env:
            - name: GCP_PROJECT_ID
              value: "your-project-id"
            - name: VERTEX_AI_DATASTORE_ID
              value: "your-datastore-id"
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
Deployment Steps
Prerequisites
Google Cloud project with billing
gcloud CLI installed
Docker or Colima for container builds
Java 18+, Maven 3.9.6+
Python 3.9+
Create Artifact Registry
gcloud artifacts repositories create fixmycar \
--repository-format=docker \
--location=us-central1
Build & Push Containers
# Authenticate Docker
gcloud auth configure-docker us-central1-docker.pkg.dev
# Update PROJECT_ID in dockerpush.sh
./dockerpush.sh
Create GKE Cluster
gcloud container clusters create-auto fixmycar \
--region=us-central1 \
--project=YOUR_PROJECT_ID
# Get credentials
gcloud container clusters get-credentials fixmycar \
--region=us-central1
Upload Owner's Manual
# Create bucket
gcloud storage buckets create gs://YOUR_PROJECT_ID-fixmycar \
--location=us-central1
# Upload manual
gcloud storage cp cymbal-starlight-2024.pdf \
gs://YOUR_PROJECT_ID-fixmycar/
Configure Vertex AI Search
1. Navigate to Agent Builder in the Google Cloud console.
2. Create a Search app: YOUR_PROJECT_ID-fixmycar
3. Create a datastore with:
   Source: Cloud Storage bucket
   Parser: OCR Parser
   Region: Global
4. Wait about 10 minutes for indexing.
5. Test queries in the Preview interface.
Deploy to GKE
# Update image and env vars in YAML files
kubectl apply -f kubernetes/backend-deployment-vertex-search.yaml
kubectl apply -f kubernetes/backend-service.yaml
kubectl apply -f kubernetes/frontend-deployment.yaml
kubectl apply -f kubernetes/frontend-service.yaml
# Wait for pods to be ready
kubectl get pods -w
Access Application
# Get external IP
kubectl get service fixmycar-frontend
# Open in browser: http://EXTERNAL_IP
Testing & Validation
Example Queries
Cargo Capacity
Cymbal Starlight 2024: What is the max cargo capacity?
# Expected Response:
# The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.
# The cargo area is located in the trunk of the vehicle.
Backend Logs
View RAG pipeline execution:
kubectl logs -l app=fixmycar-backend --tail=100 -f
Example output:
2024-03-23T23:35:07.059Z INFO --- 🔍 Vertex AI Search results:
Chapter 6: Towing, Cargo, and Luggage
The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.
2024-03-23T23:35:07.060Z INFO --- 🔮 Gemini Prompt:
You are a helpful car manual chatbot...
Human prompt: What is the max cargo capacity?
Grounding data: [The Cymbal Starlight 2024 has a cargo capacity...]
2024-03-23T23:35:07.762Z INFO --- 🔮 Gemini Response:
The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.
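Caching Strategy
Repeated questions can skip the Vertex AI Search round trip by caching search results per query string (the two classes below live in separate files):
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;
// @EnableCaching is required; without it @Cacheable is silently ignored
@Configuration
@EnableCaching
public class CacheConfig {
    // In-memory cache held separately by each pod, with no size limit or expiry
    @Bean
    public CacheManager cacheManager() {
        return new ConcurrentMapCacheManager("searchResults");
    }
}
@Service
public class SearchService {
    // Keyed by the raw query string; only calls from other beans go through the cache proxy
    @Cacheable(value = "searchResults", key = "#query")
    public String search(String query) {
        return performVertexAISearch(query); // Vertex AI Search call, not shown
    }
}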
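`ConcurrentMapCacheManager` keeps entries for the life of the pod with no size limit or expiry, so results can go stale after the manuals are re-indexed. A bounded, expiring policy (an LRU cache with a time-to-live) is one common alternative; in Spring Boot it would typically come from a cache provider such as Caffeine. The sketch below is written in Python for brevity: the class and function names, the 256-entry and one-hour defaults, and the case/whitespace query normalization are all illustrative assumptions, not FixMyCar behavior.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class TTLCache:
    """LRU cache with a maximum size whose entries expire after ttl_seconds."""

    def __init__(self, max_entries: int = 256, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._data[key]  # expired: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry


def cached_search(cache: TTLCache, query: str, search: Callable[[str], str]) -> str:
    """Return a cached result for the query, calling `search` only on a miss."""
    # Case and whitespace normalization is an assumption, not FixMyCar behavior
    key = " ".join(query.lower().split())
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = search(query)
    cache.put(key, result)
    return result
```

Because each backend replica still holds its own cache, a shared store would be needed if hit rates across pods matter; the per-pod version above trades that for zero extra infrastructure.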
GKE Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fixmycar-backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fixmycar-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
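With `averageUtilization: 70`, the autoscaler compares average pod CPU usage against 70% of each pod's CPU request (500m in the backend Deployment above) and applies the Kubernetes rule desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skipping changes within a tolerance of 10% (the upstream Kubernetes default). A minimal Python sketch of that arithmetic, with this manifest's bounds as defaults; the function names are illustrative.

```python
import math
from typing import List


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as '500m' or '1' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def desired_replicas(current_replicas: int, pod_cpu_cores: List[float], cpu_request: str,
                     target_utilization: float = 70, min_replicas: int = 2,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Approximate the HPA's CPU-utilization scaling decision for one evaluation."""
    request = parse_cpu(cpu_request)
    # Utilization is relative to the container's CPU request, not its limit
    avg_utilization = 100 * sum(pod_cpu_cores) / (len(pod_cpu_cores) * request)
    ratio = avg_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Two backend pods each using 0.6 cores against a 500m request: 120% vs a 70% target
print(desired_replicas(2, [0.6, 0.6], "500m"))  # 4
```

The real controller also applies a scale-down stabilization window (five minutes by default) and special handling for pods that are not ready or lack metrics, which this sketch omits.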
Troubleshooting
Pods stuck in Pending state
GKE Autopilot is scaling up nodes. Wait 3-5 minutes, then inspect the pod's events:
kubectl describe pod <pod-name>
403 Forbidden from Vertex AI
Check the Workload Identity configuration:
# Verify annotation
kubectl get sa fixmycar-backend-sa -o yaml
# Verify IAM binding
gcloud iam service-accounts get-iam-policy \
    fixmycar-gsa@PROJECT_ID.iam.gserviceaccount.com
Vertex AI Search returns no results
Ensure:
Datastore indexing completed (check Activity tab)
OCR Parser selected (not Layout Parser)
PDFs uploaded to correct bucket path
Test query in Preview interface first
Backend returns 500 error
Check the logs for the detailed error:
kubectl logs -l app=fixmycar-backend --tail=50
Common issues:
Incorrect VERTEX_AI_DATASTORE_ID
Missing GCP_PROJECT_ID
Network policy blocking egress
Cleanup
# Delete GKE cluster
gcloud container clusters delete fixmycar --region=us-central1
# Delete Artifact Registry
gcloud artifacts repositories delete fixmycar --location=us-central1
# Delete Cloud Storage bucket
gcloud storage rm -r gs://YOUR_PROJECT_ID-fixmycar
# Delete Vertex AI Search app
# (Must be done via console)
# Delete service account
gcloud iam service-accounts delete fixmycar-gsa@PROJECT_ID.iam.gserviceaccount.com
Key Takeaways
Vertex AI Search: managed search with OCR parsing removes the complexity of building a custom retrieval pipeline
Extractive answers: verbatim passages from the manual keep retrieval accurate and fast
GKE Workload Identity: secure, keyless authentication to Google Cloud services
Spring Boot + Gemini: the Vertex AI Java SDKs integrate directly into a Spring Boot service
Next Steps