Publishing Bot APIs - Bedrock Chat

API publishing allows you to expose your custom bots as standalone REST APIs. This enables integration with external applications, workflows, and services without requiring users to access the web interface.

Overview

Published APIs provide:

Standalone Endpoints: Independent API Gateway endpoints
API Key Authentication: Secure access with generated keys
Rate Limiting: Configurable throttling and quotas
IP Restrictions: WAF-based access control
Async Operations: Handle long-running LLM operations

Only bots with shared_scope = "all" (public) can be published as APIs. The bot must be shared with all users before publishing.

Prerequisites

User Permissions

API publishing requires membership in the PublishAllowed Cognito user group:

Navigate to Amazon Cognito in the AWS Console
Select your user pool (CloudFormation output: AuthUserPoolIdxxxx)
Add users to the PublishAllowed group

Bot Requirements

Bot must be public (shared_scope = "all")
Bot sync status must be SUCCEEDED
Bot must be tested and working correctly

Publishing a Bot

Via UI

Navigate to your public bot
Click API Publish Settings
Configure settings:
- Burst Limit: Max requests per second during burst
- Rate Limit: Sustained requests per second
- Quota: Total requests allowed per period
Click Publish

Deployment takes about 3-5 minutes via AWS CodeBuild.

API Settings

Throttling Configuration

Burst Limit:

Maximum requests per second during burst periods
Default: 100 requests/second
Handles traffic spikes

Rate Limit:

Sustained requests per second
Default: 50 requests/second
Long-term throughput limit

Quota:

Total requests allowed per time period
Example: 10,000 requests/day
Prevents excessive usage

For more details, see AWS API Gateway Throttling.
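
To make the difference between burst limit and rate limit concrete, here is a minimal sketch of a token bucket, the algorithm API Gateway documents for its throttling. The `TokenBucket` class and its fields are illustrative names, not part of Bedrock Chat or API Gateway, and the model ignores quotas and account-level limits.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simplified model: `burst` is the bucket capacity, `rate` the refill per second."""
    rate: float
    burst: int
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # the bucket starts full

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a real API would answer 429 here


bucket = TokenBucket(rate=50, burst=100)
# 150 requests at the same instant: only the burst capacity is admitted.
admitted = sum(bucket.allow(now=0.0) for _ in range(150))
# One second later the bucket has refilled by `rate` tokens.
admitted_later = sum(bucket.allow(now=1.0) for _ in range(150))
print(admitted, admitted_later)
```

With the default values listed above, a spike can be absorbed up to the burst limit, after which throughput settles at the rate limit.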

Architecture

Published APIs use a dedicated infrastructure stack:

Client → WAF → API Gateway → Lambda → SQS → LLM Processing
          ↓                      ↓
    IP Filter              API Key Check

Components

AWS WAF: IP address restrictions (shared across all published APIs)
API Gateway: REST API endpoint with API key authentication
Lambda: Request handler and response formatter
SQS: Decouples request from LLM processing (handles >30s operations)
API Keys: Generated keys for authentication

Why SQS?

LLM response generation can exceed API Gateway’s default integration timeout of 29 seconds. SQS enables the following asynchronous flow:

Client sends request → Immediate acknowledgment
Request queued in SQS
Worker processes request asynchronously
Client polls for result
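
The flow above can be sketched in-process to show how the pieces relate. This is only a model under stated assumptions: a `queue.Queue` stands in for SQS, a dictionary stands in for wherever results are stored, and `submit`, `run_worker_once`, and `poll` are hypothetical names rather than functions from the Bedrock Chat codebase.

```python
import queue
import uuid

jobs = queue.Queue()  # stands in for the SQS queue
results = {}          # stands in for the result store


def submit(message: str) -> dict:
    """Accept a request, enqueue it, and acknowledge immediately."""
    request_id = f"req-{uuid.uuid4().hex[:8]}"
    results[request_id] = {"request_id": request_id, "status": "processing"}
    jobs.put((request_id, message))
    return dict(results[request_id])


def run_worker_once(generate) -> None:
    """Take one job off the queue and record its outcome."""
    request_id, message = jobs.get_nowait()
    try:
        results[request_id] = {
            "request_id": request_id,
            "status": "completed",
            "response": generate(message),
        }
    except Exception as exc:
        results[request_id] = {
            "request_id": request_id,
            "status": "failed",
            "error": str(exc),
        }


def poll(request_id: str) -> dict:
    """Return the current state of a request."""
    return results[request_id]


ack = submit("Hello")
before = poll(ack["request_id"])["status"]
run_worker_once(lambda text: text.upper())  # placeholder for the LLM call
after = poll(ack["request_id"])
print(before, after["status"])
```

The client never waits on the slow step: it gets an acknowledgment right away and learns the outcome only by polling.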

API Specification

Published APIs follow a standard format. See the complete API specification for details.

Request Format

POST https://{api-id}.execute-api.{region}.amazonaws.com/prod/predict
Content-Type: application/json
x-api-key: your-api-key-here

{
  "message": "Your question or prompt here",
  "temperature": 0.7,
  "max_tokens": 2000
}
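
As a sketch of how this request could be assembled with only the Python standard library, the helper below builds (but does not send) the POST. `build_predict_request` is a hypothetical name, and the default values simply mirror the sample body above.

```python
import json
import urllib.request


def build_predict_request(base_url: str, api_key: str, message: str,
                          temperature: float = 0.7,
                          max_tokens: int = 2000) -> urllib.request.Request:
    """Build the POST /predict request described above without sending it."""
    body = {"message": message, "temperature": temperature, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


req = build_predict_request(
    "https://abc123.execute-api.us-east-1.amazonaws.com/prod",
    "your-api-key",
    "Your question or prompt here",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
print(req.get_method(), req.full_url)
```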

Response Format

{
  "request_id": "req-123",
  "status": "processing",
  "message": "Request accepted"
}

Polling for Results

GET https://{api-id}.execute-api.{region}.amazonaws.com/prod/result/{request_id}
x-api-key: your-api-key-here

Response:

{
  "request_id": "req-123",
  "status": "completed",
  "response": "The AI-generated response text...",
  "usage": {
    "input_tokens": 150,
    "output_tokens": 300
  }
}
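
A polling loop should normally give up at some point. The sketch below waits for a terminal status with an upper bound on waiting time; `wait_for_result`, its parameters, and the `error` field on failed results are assumptions for illustration. The HTTP call is injected as `fetch` so the logic can be exercised without a live endpoint.

```python
import time
from typing import Callable


def wait_for_result(fetch: Callable[[], dict],
                    timeout: float = 120.0,
                    interval: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `fetch` until the status is completed or failed, or `timeout` seconds of waiting pass."""
    waited = 0.0
    while True:
        data = fetch()
        status = data.get("status")
        if status == "completed":
            return data
        if status == "failed":
            raise RuntimeError(data.get("error", "Unknown error"))
        if waited >= timeout:
            raise TimeoutError(f"no result after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Usage with canned payloads instead of real HTTP calls:
payloads = iter([
    {"request_id": "req-123", "status": "processing"},
    {"request_id": "req-123", "status": "completed",
     "response": "The AI-generated response text...",
     "usage": {"input_tokens": 150, "output_tokens": 300}},
])
result = wait_for_result(lambda: next(payloads), sleep=lambda seconds: None)
print(result["response"])
```

In a real client, `fetch` would wrap the GET request shown above.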

Managing API Keys

Creating Keys

After publishing:

View your published bot
Navigate to API Keys section
Click Generate New Key
Copy the key (shown only once)

Store API keys securely. They cannot be retrieved after initial generation.
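
One common way to keep a key out of source code is to read it from the environment at startup. This is a minimal sketch; the variable name `BEDROCK_CHAT_API_KEY` and both helper functions are hypothetical, and a secrets manager would serve the same purpose.

```python
import os


def load_api_key(env_var: str = "BEDROCK_CHAT_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your published API key")
    return key


def auth_headers(api_key: str) -> dict:
    """Headers expected by the published API."""
    return {"x-api-key": api_key}
```

Because the key lives outside the code, rotating it only requires changing the environment value and restarting the client.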

Rotating Keys

Generate a new API key
Update client applications with new key
Delete old key once migration is complete

Revoking Keys

Navigate to API Keys section
Select the key to revoke
Click Delete

Revoked keys immediately stop working.

Security

IP Address Restrictions

Configure the allowed IP ranges in your CDK deployment:

{
  "publishedApiAllowedIpV4AddressRanges": ["203.0.113.0/24"],
  "publishedApiAllowedIpV6AddressRanges": ["2001:db8::/32"]
}

WAF rules are shared across all published APIs to reduce costs. IP restrictions apply to all published bot APIs.
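
Before deploying, it can help to check which client addresses a set of ranges would admit. The sketch below uses Python's standard `ipaddress` module with the example ranges above; `is_allowed` is an illustrative helper and only approximates the allowlist logic, it does not query WAF.

```python
import ipaddress

ALLOWED_RANGES = ["203.0.113.0/24", "2001:db8::/32"]


def is_allowed(address: str, ranges=ALLOWED_RANGES) -> bool:
    """Return True if `address` falls inside any of the CIDR ranges."""
    ip = ipaddress.ip_address(address)
    for cidr in ranges:
        network = ipaddress.ip_network(cidr)
        # Compare only within the same IP version (IPv4 vs IPv6).
        if network.version == ip.version and ip in network:
            return True
    return False


print(is_allowed("203.0.113.10"), is_allowed("198.51.100.7"))
```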

Authentication

All requests require the x-api-key header:

curl -X POST https://api-endpoint/prod/predict \
  -H "x-api-key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'

Requests with a missing or invalid key receive 403 Forbidden.

Rate Limiting

Exceeding rate limits returns:

{
  "message": "Too Many Requests"
}

HTTP Status: 429. Clients should implement exponential backoff.
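
A minimal sketch of exponential backoff with jitter for 429 responses follows. The function names, the base delay, and the cap are illustrative choices, not values prescribed by Bedrock Chat; `send` is any callable that performs the request and returns a `(status_code, body)` pair.

```python
import random
import time


def backoff_delays(retries=5, base=0.5, cap=30.0, rng=random.random):
    """Yield 'full jitter' delays: a random value in [0, min(cap, base * 2**attempt))."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(send, retries=5, sleep=time.sleep):
    """Call `send()` and retry while it returns HTTP 429, up to `retries` extra attempts."""
    status, body = send()
    for delay in backoff_delays(retries):
        if status != 429:
            break
        sleep(delay)
        status, body = send()
    return status, body


# Usage with canned responses instead of real HTTP calls:
responses = iter([
    (429, {"message": "Too Many Requests"}),
    (429, {"message": "Too Many Requests"}),
    (200, {"request_id": "req-123", "status": "processing"}),
])
status, body = call_with_backoff(lambda: next(responses), sleep=lambda seconds: None)
print(status, body["request_id"])
```

Randomizing the delay keeps many throttled clients from retrying at the same moment.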

Monitoring and Analytics

CloudWatch Metrics

View metrics for published APIs:

Request count
Error rate (4xx, 5xx)
Latency (p50, p99)
Throttled requests

Bot Analytics

Administrators can view:

API usage per bot
Token consumption
Cost allocation
Request patterns

See Administrator documentation for details.

Updating Published APIs

Bot Changes

When you update the bot:

Instructions, knowledge, and tools update automatically
No API redeployment required
Changes apply to new requests immediately

Throttling Changes

To update rate limits:

Edit API publish settings
Adjust burst/rate/quota values
Click Update

Changes apply within minutes.

Unpublishing APIs

To remove a published API:

Navigate to bot API Publish Settings
Click Unpublish
Confirm deletion

Unpublishing is permanent and cannot be undone. It:

Deletes the CloudFormation stack
Removes the API Gateway endpoint
Revokes all API keys

Existing client applications will receive 404 Not Found errors after unpublishing.

Integration Examples

Python Client

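import time

import requests

API_ENDPOINT = "https://abc123.execute-api.us-east-1.amazonaws.com/prod"
API_KEY = "your-api-key"
HEADERS = {"x-api-key": API_KEY}

def ask_bot(message: str, max_wait: float = 300.0) -> str:
    # Submit request; raise on 403/429 instead of hitting a KeyError below
    response = requests.post(
        f"{API_ENDPOINT}/predict",
        headers=HEADERS,
        json={"message": message},
        timeout=30,
    )
    response.raise_for_status()
    request_id = response.json()["request_id"]

    # Poll until the request completes, fails, or max_wait seconds pass
    # (max_wait is an illustrative default; tune it for your bot)
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        result = requests.get(
            f"{API_ENDPOINT}/result/{request_id}",
            headers=HEADERS,
            timeout=30,
        )
        result.raise_for_status()
        data = result.json()

        if data["status"] == "completed":
            return data["response"]
        elif data["status"] == "failed":
            raise RuntimeError(data.get("error", "Unknown error"))

        time.sleep(2)  # Wait before next poll

    raise TimeoutError(f"No result for {request_id} after {max_wait}s")

# Usage
answer = ask_bot("What is the capital of France?")
print(answer)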

JavaScript/Node.js Client

const axios = require('axios');

const API_ENDPOINT = 'https://abc123.execute-api.us-east-1.amazonaws.com/prod';
const API_KEY = 'your-api-key';

async function askBot(message) {
  // Submit request
  const { data: submitResponse } = await axios.post(
    `${API_ENDPOINT}/predict`,
    { message },
    { headers: { 'x-api-key': API_KEY } }
  );
  
  const requestId = submitResponse.request_id;
  
  // Poll for result
  while (true) {
    const { data } = await axios.get(
      `${API_ENDPOINT}/result/${requestId}`,
      { headers: { 'x-api-key': API_KEY } }
    );
    
    if (data.status === 'completed') {
      return data.response;
    } else if (data.status === 'failed') {
      throw new Error(data.error || 'Unknown error');
    }
    
    await new Promise(resolve => setTimeout(resolve, 2000));
  }
}

// Usage
askBot('What is the capital of France?')
  .then(answer => console.log(answer))
  .catch(error => console.error(error));

cURL Example

# Submit request (-s hides curl's progress meter)
REQUEST_ID=$(curl -s -X POST \
  https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict \
  -H "x-api-key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the capital of France?"}' \
  | jq -r '.request_id')

# Poll for result
while true; do
  RESULT=$(curl -s \
    "https://abc123.execute-api.us-east-1.amazonaws.com/prod/result/$REQUEST_ID" \
    -H "x-api-key: your-api-key")

  # Quote "$RESULT" so the JSON reaches jq unmodified
  STATUS=$(echo "$RESULT" | jq -r '.status')

  if [ "$STATUS" = "completed" ]; then
    echo "$RESULT" | jq -r '.response'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "Error: $(echo "$RESULT" | jq -r '.error')"
    exit 1
  fi

  sleep 2
done

Best Practices

Key Management

Rotate API keys regularly. Use different keys for different clients or environments.

Error Handling

Implement retry logic with exponential backoff for transient failures.

Rate Limiting

Design clients to respect rate limits. Cache responses when possible.
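
Client-side caching can be as simple as remembering answers for a short time. The sketch below is one possible approach, not a feature of the published API: `TTLCache` is a hypothetical helper, and it only makes sense for prompts where a slightly stale, previously generated answer is acceptable.

```python
import time


class TTLCache:
    """Remember results per key for `ttl_seconds`, then call the function again."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get_or_call(self, key, fn):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: no API request is made
        value = fn(key)
        self._store[key] = (now, value)
        return value
```

Here `fn` would be a function that submits the message and polls for the response; every cache hit is one request that does not count against the rate limit or quota.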

Monitoring

Monitor API usage and errors. Set up CloudWatch alarms for anomalies.

Troubleshooting

403 Forbidden

Causes:

Invalid or missing API key
IP address not in allowlist
Key has been revoked

Solution: Verify API key and source IP address.

429 Too Many Requests

Causes:

Exceeded burst limit
Exceeded rate limit
Quota exhausted

Solution: Implement backoff logic or request a quota increase.

504 Gateway Timeout

Causes:

LLM processing taking too long
SQS queue full

Solution: Use the async pattern (submit, then poll). Increase the timeout if possible.

Bot Not Available for Publishing

Causes:

Bot is not public (shared_scope != "all")
User not in PublishAllowed group
Bot sync status is not SUCCEEDED

Solution: Make the bot public, confirm your user is in the PublishAllowed group, and ensure the bot is fully synced.

Cost Optimization

Shared WAF: Sharing IP restrictions across all published APIs reduces costs
Async Processing: SQS prevents Lambda timeout charges
API Caching: Implement client-side caching for repeated queries
Quotas: Set reasonable limits to control usage and costs

Next Steps

Create a Bot

Build a bot to publish as an API

Bot Store

Share your bot before publishing

Admin Guide

Manage published APIs as administrator

API Docs

Complete API specification
