Cloud Run Deployment
Deploy Genkit applications to Google Cloud Run with automatic scaling, containerization, and support for every Genkit language SDK (JavaScript, Go, Python).
Overview
Cloud Run provides:
Fully managed - Serverless container platform
Any language - JavaScript, Go, Python, or anything else that runs in a container
Automatic scaling - Scale to zero when not in use
Pay per use - Only pay for actual request time
Custom domains - Map to your own domain
Prerequisites
# Install Google Cloud CLI
curl https://sdk.cloud.google.com | bash
# Login and set project
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
# Enable required APIs
gcloud services enable run.googleapis.com
gcloud services enable cloudbuild.googleapis.com
Node.js Deployment
1. Create Express Server
import { expressHandler } from '@genkit-ai/express';
import { googleAI } from '@genkit-ai/google-genai';
import express from 'express';
import { genkit, z } from 'genkit';

const ai = genkit({
  plugins: [googleAI()],
});

const jokeFlow = ai.defineFlow(
  {
    name: 'jokeFlow',
    inputSchema: z.string(),
    outputSchema: z.string(),
  },
  async (subject) => {
    const result = await ai.generate({
      model: googleAI.model('gemini-2.5-flash'),
      prompt: `Tell me a joke about ${subject}`,
    });
    return result.text;
  }
);

const app = express();
app.use(express.json());

// Health check for Cloud Run
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});

// Expose flow
app.post('/joke', expressHandler(jokeFlow));

// Cloud Run sets PORT; fall back to 8080 for local runs
const port = process.env.PORT || 8080;
app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});
2. Create Dockerfile
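FROM node:20-slim
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies from the final image
RUN npm prune --omit=dev
ENV PORT=8080
EXPOSE 8080
CMD ["node", "dist/index.js"]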
3. Create .dockerignore
node_modules
npm-debug.log
.git
.gitignore
README.md
.env
*.local
dist
build
4. Deploy to Cloud Run
# Build and deploy in one command
gcloud run deploy genkit-app \
--source . \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars GEMINI_API_KEY=your-api-key
# Or build separately
gcloud builds submit --tag gcr.io/PROJECT_ID/genkit-app
gcloud run deploy genkit-app \
--image gcr.io/PROJECT_ID/genkit-app \
--region us-central1
Go Deployment
1. Create Go Server
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"

	"github.com/firebase/genkit/go/ai"
	"github.com/firebase/genkit/go/genkit"
	"github.com/firebase/genkit/go/plugins/googlegenai"
)

func main() {
	ctx := context.Background()

	// Initialize Genkit
	g := genkit.Init(ctx, genkit.WithPlugins(&googlegenai.GoogleAI{}))

	// Define a flow
	genkit.DefineFlow(g, "jokeFlow",
		func(ctx context.Context, input string) (string, error) {
			if input == "" {
				input = "programming"
			}
			return genkit.GenerateText(ctx, g,
				ai.WithModelName("googleai/gemini-2.5-flash"),
				ai.WithPrompt("Tell me a joke about %s.", input),
			)
		},
	)

	// Create HTTP server
	mux := http.NewServeMux()

	// Health check
	mux.HandleFunc("GET /health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte(`{"status":"healthy"}`))
	})

	// Expose all flows as HTTP endpoints
	for _, flow := range genkit.ListFlows(g) {
		mux.HandleFunc("POST /"+flow.Name(), genkit.Handler(flow))
	}

	// Get port from environment (Cloud Run sets this)
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	addr := fmt.Sprintf(":%s", port)
	log.Printf("Server listening on %s", addr)
	log.Fatal(http.ListenAndServe(addr, mux))
}
2. Create Dockerfile for Go
# Build stage
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /server .
# Runtime stage
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /server .
ENV PORT=8080
EXPOSE 8080
CMD ["./server"]
3. Deploy Go App
gcloud run deploy genkit-go-app \
--source . \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars GEMINI_API_KEY=your-api-key
Python Deployment
1. Create FastAPI Server
import os

from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.0-flash',
)

app = FastAPI(title='Genkit App')


class JokeRequest(BaseModel):
    subject: str


class JokeResponse(BaseModel):
    text: str


@app.get('/health')
async def health():
    return {'status': 'healthy'}


@ai.flow()
async def joke_flow(subject: str) -> str:
    """Generate a joke about a subject."""
    response = await ai.generate(
        prompt=f'Tell me a joke about {subject}'
    )
    return response.text


@app.post('/joke', response_model=JokeResponse)
async def joke_endpoint(request: JokeRequest) -> JokeResponse:
    result = await joke_flow(request.subject)
    return JokeResponse(text=result)


if __name__ == '__main__':
    # Cloud Run sets PORT; fall back to 8080 for local runs
    port = int(os.getenv('PORT', 8080))
    uvicorn.run(app, host='0.0.0.0', port=port)
2. Create requirements.txt
fastapi
uvicorn[standard]
genkit
genkit-plugin-google-genai
3. Create Dockerfile for Python
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PORT=8080
EXPOSE 8080
CMD ["python", "main.py"]
4. Deploy Python App
gcloud run deploy genkit-python-app \
--source . \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars GEMINI_API_KEY=your-api-key
Configuration
Environment Variables
# Set environment variables
gcloud run deploy genkit-app \
--set-env-vars GEMINI_API_KEY=your-key,LOG_LEVEL=info
# Or use Secret Manager
gcloud run deploy genkit-app \
--update-secrets GEMINI_API_KEY=genkit-api-key:latest
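Whether the key is supplied with --set-env-vars or mapped from Secret Manager with --update-secrets as shown above, the container reads it as an ordinary environment variable. A minimal Python sketch of a fail-fast check at startup follows. The helper name `require_env` is hypothetical and is not part of Genkit or Cloud Run.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    `require_env` is a hypothetical helper, not a Genkit or Cloud Run API.
    Pass `env` explicitly in tests; by default the process environment is used.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling `require_env('GEMINI_API_KEY')` before the server starts listening makes a missing key show up as a clear startup error in the logs instead of a failure on the first request.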
Memory and CPU
gcloud run deploy genkit-app \
--memory 2Gi \
--cpu 2 \
--timeout 300s # 5 minutes
Concurrency and Autoscaling
gcloud run deploy genkit-app \
--concurrency 80 \
--min-instances 1 \
--max-instances 100
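--concurrency caps the simultaneous requests each instance handles and --max-instances caps the number of instances, so their product is the service's ceiling on in-flight requests. The arithmetic can be sketched in Python as follows. The function names are illustrative only.

```python
import math


def max_in_flight_requests(concurrency: int, max_instances: int) -> int:
    """Upper bound on requests the service can process at the same time."""
    return concurrency * max_instances


def instances_needed(in_flight_requests: int, concurrency: int) -> int:
    """Smallest number of instances that can hold the given simultaneous requests."""
    return math.ceil(in_flight_requests / concurrency)


# With the flags shown above: 80 requests per instance, up to 100 instances.
print(max_in_flight_requests(80, 100))  # 8000
print(instances_needed(250, 80))        # 4
```

Long model calls hold a request slot for their whole duration, so a lower --concurrency may suit memory-heavy flows better than the value shown here.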
Custom Domain
# Map to your domain
gcloud run domain-mappings create \
--service genkit-app \
--domain api.yourdomain.com \
--region us-central1
Authentication
Require Authentication
# Deploy with authentication required
gcloud run deploy genkit-app \
--no-allow-unauthenticated
# Call with authentication
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
https://genkit-app-xxx.run.app/joke
Service Account
# Create service account
gcloud iam service-accounts create genkit-service
# Grant permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:genkit-service@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
# Deploy with service account
gcloud run deploy genkit-app \
--service-account genkit-service@PROJECT_ID.iam.gserviceaccount.com
Monitoring
View Logs
# Stream logs
gcloud run services logs tail genkit-app \
--region us-central1
# View in Cloud Console
echo "https://console.cloud.google.com/run/detail/us-central1/genkit-app/logs"
Enable Tracing
import { enableGoogleCloudTelemetry } from '@genkit-ai/google-cloud' ;
enableGoogleCloudTelemetry ({
projectId: 'your-project-id' ,
});
Testing
Test Deployed Service
# Get service URL
SERVICE_URL=$(gcloud run services describe genkit-app \
--region us-central1 \
--format 'value(status.url)')
# Test health check
curl "$SERVICE_URL/health"
# Test flow
curl -X POST "$SERVICE_URL/joke" \
-H "Content-Type: application/json" \
-d '{"data": "programming"}'
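The same flow call can be made from Python with only the standard library. The sketch below wraps the input in the {"data": ...} body used by the curl test above and optionally adds the bearer token from the Authentication section. The function names are hypothetical, and the response is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request
from typing import Any, Optional


def build_flow_request(service_url: str, path: str, data: Any,
                       identity_token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request whose JSON body wraps the flow input in {"data": ...}."""
    headers = {"Content-Type": "application/json"}
    if identity_token:
        # Needed only when the service was deployed with --no-allow-unauthenticated.
        headers["Authorization"] = f"Bearer {identity_token}"
    body = json.dumps({"data": data}).encode("utf-8")
    url = service_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call_flow(request: urllib.request.Request, timeout: float = 60.0) -> Any:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call_flow(build_flow_request(service_url, '/joke', 'programming'))` sends the same request as the curl command, where `service_url` is the value printed by `gcloud run services describe`.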
Load Testing
# Install Apache Bench
sudo apt-get install apache2-utils
# Run load test
ab -n 100 -c 10 -p data.json -T application/json \
"$SERVICE_URL/joke"
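The ab command posts the contents of data.json, which this page does not define. Assuming the load test should send the same body as the curl test ({"data": "programming"}), a short Python sketch that creates the file could look like this. The function name is hypothetical.

```python
import json
from pathlib import Path


def write_ab_payload(path: str = "data.json", subject: str = "programming") -> Path:
    """Write the JSON body that `ab -p` will POST to the flow endpoint."""
    target = Path(path)
    target.write_text(json.dumps({"data": subject}), encoding="utf-8")
    return target


# Create data.json in the current directory and show what was written.
payload_file = write_ab_payload()
print(payload_file.read_text(encoding="utf-8"))  # {"data": "programming"}
```

Run it from the directory where ab will be invoked so that the relative path data.json resolves.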
Multi-Region Deployment
Deploy to multiple regions for lower latency:
# Deploy to multiple regions
for region in us-central1 europe-west1 asia-east1; do
  gcloud run deploy genkit-app \
    --region "$region" \
    --source .
done
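# Route globally with an external load balancer. This creates only the backend
# service; each region also needs a serverless network endpoint group (NEG).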
gcloud compute backend-services create genkit-backend \
--global \
--load-balancing-scheme=EXTERNAL
Cost Optimization
Scale to Zero
# Allow scaling to zero (default)
gcloud run deploy genkit-app \
--min-instances 0
CPU Allocation
# Only allocate CPU during request processing
gcloud run deploy genkit-app \
--cpu-throttling # Default
# Keep CPU always allocated (faster response, higher cost)
gcloud run deploy genkit-app \
--no-cpu-throttling
Troubleshooting
Container Fails to Start
Problem: Service deployment fails.
Solution: Check logs:
gcloud run services logs read genkit-app \
--region us-central1 \
--limit 50
Timeout Errors
Problem: Requests time out.
Solution: Increase timeout:
gcloud run deploy genkit-app \
--timeout 540s # Cloud Run accepts up to 3600s (60 minutes)
Out of Memory
Problem: Container crashes with OOM.
Solution: Increase memory:
gcloud run deploy genkit-app \
--memory 4Gi
Best Practices
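Use health checks - By default Cloud Run runs only a TCP startup probe on the container port; add a dedicated endpoint such as /health and configure an HTTP probe against it
Set appropriate timeouts - Model calls can run long; raise --timeout when a flow may exceed the default 300-second request limit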
Enable tracing - Use Cloud Trace for debugging
Use secrets - Store API keys in Secret Manager rather than as plain-text environment variables
Implement graceful shutdown - Handle SIGTERM signals
Monitor costs - Set up billing alerts
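For the graceful-shutdown item above: Cloud Run sends SIGTERM before stopping an instance, leaving a short window to finish in-flight work. A minimal Python sketch of the pattern follows, using only the standard library. The names `shutdown_requested` and `serve_until_shutdown` are hypothetical, and a real server would stop accepting requests and drain connections instead of looping.

```python
import signal
import threading

# Set once the platform asks the process to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the main loop decides how to wind down."""
    shutdown_requested.set()


# Register the handler (must be done from the main thread).
signal.signal(signal.SIGTERM, handle_sigterm)


def serve_until_shutdown(do_work, poll_interval: float = 0.1) -> None:
    """Call do_work repeatedly until SIGTERM has been received."""
    while not shutdown_requested.is_set():
        do_work()
        # Sleep between iterations, waking early if shutdown is requested.
        shutdown_requested.wait(poll_interval)
```

Servers such as uvicorn and Express install or allow their own signal handling, so in practice this pattern is mostly needed for background work the framework does not know about.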
Next Steps
Express Plugin - Learn about Express.js integration
Monitoring - Set up Cloud Trace and monitoring