Overview
Published APIs provide:
- Standalone Endpoints: Independent API Gateway endpoints
- API Key Authentication: Secure access with generated keys
- Rate Limiting: Configurable throttling and quotas
- IP Restrictions: WAF-based access control
- Async Operations: Handle long-running LLM operations
Only bots with shared_scope = "all" (public) can be published as APIs. The bot must be shared with all users before publishing.
Prerequisites
User Permissions
API publishing requires membership in the PublishAllowed Cognito user group:
- Navigate to Amazon Cognito in the AWS Console
- Select your user pool (CloudFormation output: AuthUserPoolIdxxxx)
- Add users to the PublishAllowed group
Bot Requirements
- Bot must be public (shared_scope = "all")
- Bot sync status must be SUCCEEDED
- Bot must be tested and working correctly
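The requirements above can be expressed as a small pre-flight check. This is a minimal sketch, not the application's own validation code: the `Bot` record and the `publish_blockers` helper are hypothetical names, and only `shared_scope`, `SUCCEEDED`, and `PublishAllowed` come from this guide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bot:
    # Hypothetical record; the field names mirror the terms used in this guide.
    shared_scope: str
    sync_status: str


def publish_blockers(bot: Bot, user_groups: set[str]) -> list[str]:
    """Return the reasons a bot cannot be published (empty list means eligible)."""
    blockers = []
    if "PublishAllowed" not in user_groups:
        blockers.append("user is not in the PublishAllowed group")
    if bot.shared_scope != "all":
        blockers.append("bot is not public (shared_scope must be 'all')")
    if bot.sync_status != "SUCCEEDED":
        blockers.append("bot sync status is not SUCCEEDED")
    return blockers
```

An empty result means none of the documented blockers apply; whether the bot is "tested and working correctly" still has to be judged by hand.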
Publishing a Bot
Via UI
- Navigate to your public bot
- Click API Publish Settings
- Configure settings:
  - Burst Limit: Max requests per second during burst
  - Rate Limit: Sustained requests per second
  - Quota: Total requests allowed per period
- Click Publish
API Settings
Throttling Configuration
Burst Limit:
- Maximum requests per second during burst periods
- Default: 100 requests/second
- Handles traffic spikes
Rate Limit:
- Sustained requests per second
- Default: 50 requests/second
- Long-term throughput limit
Quota:
- Total requests allowed per time period
- Example: 10,000 requests/day
- Prevents excessive usage
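API Gateway throttles requests with a token bucket, where the burst limit is the bucket capacity and the rate limit is the refill rate. The toy model below illustrates how the two default values interact. It is a sketch for building intuition only: the `TokenBucket` class is hypothetical, time is passed in explicitly, and quotas are not modelled.

```python
class TokenBucket:
    """Toy token bucket: `burst` is the capacity, `rate` the refill per second."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never above the bucket capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a throttled request; the real API answers 429


bucket = TokenBucket(rate=50, burst=100)
# 150 requests arrive at once: the full bucket admits the first 100.
admitted_in_spike = sum(bucket.allow(now=0.0) for _ in range(150))
# One second later only the 50 refilled tokens are available.
admitted_after_1s = sum(bucket.allow(now=1.0) for _ in range(150))
```

With the defaults from this section, a client can briefly send up to 100 requests at once but can only sustain 50 per second afterwards.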
Architecture
Published APIs use a dedicated infrastructure stack.
Components
- AWS WAF: IP address restrictions (shared across all published APIs)
- API Gateway: REST API endpoint with API key authentication
- Lambda: Request handler and response formatter
- SQS: Decouples request from LLM processing (handles >30s operations)
- API Keys: Generated keys for authentication
Why SQS?
LLM response generation can exceed API Gateway’s 30-second timeout. SQS enables the following asynchronous flow:
- Client sends request → Immediate acknowledgment
- Request queued in SQS
- Worker processes request asynchronously
- Client polls for result
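The flow above can be mimicked in memory to show why the client never waits on the slow step. This is an illustration under stated assumptions: a `queue.Queue` stands in for SQS, a dictionary stands in for wherever results are stored, and `submit`, `worker_step`, and `poll` are hypothetical names, not functions of the deployed stack.

```python
import queue
import uuid
from typing import Optional

pending = queue.Queue()  # stands in for the SQS queue
results = {}             # stands in for the result store


def submit(message: str) -> str:
    """Enqueue the request and return an id immediately (the acknowledgment)."""
    request_id = str(uuid.uuid4())
    pending.put((request_id, message))
    return request_id


def worker_step() -> None:
    """Process one queued request; the slow LLM call would happen here."""
    request_id, message = pending.get()
    results[request_id] = f"echo: {message}"  # placeholder for the LLM reply


def poll(request_id: str) -> Optional[str]:
    """Return the result if it is ready, otherwise None."""
    return results.get(request_id)
```

`submit` returns before any processing happens, so the request handler stays well within the gateway timeout no matter how long `worker_step` takes.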
API Specification
Published APIs follow a standard format. See the complete API specification for details.
Request Format
Response Format
Polling for Results
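A client-side polling loop might look like the sketch below. It deliberately leaves out the HTTP call: `fetch` is a caller-supplied function (an assumption of this example) that queries the published API and returns `None` while the answer is still pending. The endpoint paths and response fields are defined in the API specification and are not reproduced here.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    fetch: Callable[[], Optional[dict]],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until it returns a result or `timeout` seconds have been waited."""
    waited = 0.0
    while True:
        result = fetch()
        if result is not None:
            return result
        if waited >= timeout:
            raise TimeoutError(f"no result after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays. Each poll counts against the rate limit and quota, so avoid very short intervals.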
Managing API Keys
Creating Keys
After publishing:
- View your published bot
- Navigate to API Keys section
- Click Generate New Key
- Copy the key (shown only once)
Store API keys securely. They cannot be retrieved after initial generation.
Rotating Keys
- Generate a new API key
- Update client applications with the new key
- Delete the old key once migration is complete
Revoking Keys
- Navigate to API Keys section
- Select the key to revoke
- Click Delete
Security
IP Address Restrictions
Configure allowed IP ranges in the CDK deployment.
WAF rules are shared across all published APIs to reduce costs, so IP restrictions apply to all published bot APIs.
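Before deploying, it can help to check locally which client addresses a set of CIDR ranges would admit. The sketch below uses Python's standard `ipaddress` module. The ranges shown are documentation-only example networks, and `is_allowed` is a hypothetical helper, not part of the CDK configuration.

```python
import ipaddress

# Hypothetical allowlist; substitute the ranges from your own deployment.
ALLOWED_RANGES = ["203.0.113.0/24", "2001:db8::/32"]


def is_allowed(client_ip: str, allowed_ranges: list = ALLOWED_RANGES) -> bool:
    """Return True if client_ip falls inside any of the CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network and vice versa.
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)
```

Because the WAF rules are shared, a range added for one client opens every published bot API to that range.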
Authentication
All requests require the x-api-key header. Requests with an invalid or missing key return 403 Forbidden.
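As a sketch of attaching the key with the Python standard library: the URL, key, and request body below are placeholders, since the real endpoint and body schema come from the published API's specification. Only the `x-api-key` header name is taken from this guide.

```python
import json
import urllib.request

# Placeholder values, not a real endpoint or key.
API_URL = "https://example.invalid/your-endpoint"
API_KEY = "your-api-key"


def build_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the key in the x-api-key header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_request(API_URL, API_KEY, {"message": "Hello"})
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

In real clients, read the key from an environment variable or a secrets manager rather than hard-coding it.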
Rate Limiting
Exceeding rate limits returns 429 Too Many Requests.
Clients should implement exponential backoff.
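One common way to implement backoff is to double the wait after each failed attempt, cap it, and randomize it so that many clients do not retry in lockstep. The generator below is a minimal sketch of that idea; the function name and default values are illustrative choices, not values prescribed by the API.

```python
import random


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0, jitter: bool = True):
    """Yield one delay (seconds) per retry: base * 2**attempt, capped, optionally jittered."""
    for attempt in range(retries):
        delay = min(cap, base * 2 ** attempt)
        # "Full jitter": pick a random wait between 0 and the computed delay.
        yield random.uniform(0, delay) if jitter else delay
```

A client would sleep for each yielded delay after a 429 response and then retry. With `jitter=False`, four retries wait 0.5, 1.0, 2.0, and 4.0 seconds.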
Monitoring and Analytics
CloudWatch Metrics
View metrics for published APIs:
- Request count
- Error rate (4xx, 5xx)
- Latency (p50, p99)
- Throttled requests
Bot Analytics
Administrators can view:
- API usage per bot
- Token consumption
- Cost allocation
- Request patterns
Updating Published APIs
Bot Changes
When you update the bot:
- Instructions, knowledge, and tools update automatically
- No API redeployment required
- Changes apply to new requests immediately
Throttling Changes
To update rate limits:
- Edit API publish settings
- Adjust burst/rate/quota values
- Click Update
Unpublishing APIs
To remove a published API:
- Navigate to bot API Publish Settings
- Click Unpublish
- Confirm deletion
Unpublishing:
- Deletes the CloudFormation stack
- Removes API Gateway endpoint
- Revokes all API keys
- Cannot be undone
Existing client applications will receive 404 Not Found errors after unpublishing.
Integration Examples
Python Client
JavaScript/Node.js Client
cURL Example
Best Practices
Key Management
Rotate API keys regularly. Use different keys for different clients or environments.
Error Handling
Implement retry logic with exponential backoff for transient failures.
Rate Limiting
Design clients to respect rate limits. Cache responses when possible.
Monitoring
Monitor API usage and errors. Set up CloudWatch alarms for anomalies.
Troubleshooting
403 Forbidden
Causes:
- Invalid or missing API key
- IP address not in allowlist
- Key has been revoked
429 Too Many Requests
Causes:
- Exceeded burst limit
- Exceeded rate limit
- Quota exhausted
504 Gateway Timeout
Causes:
- LLM processing taking too long
- SQS queue full
Bot Not Available for Publishing
Causes:
- Bot is not public (shared_scope != "all")
- User not in PublishAllowed group
- Bot sync status is not SUCCEEDED
Cost Optimization
- Shared WAF: Sharing IP restrictions across APIs reduces costs
- Async Processing: SQS prevents Lambda timeout charges
- API Caching: Implement client-side caching for repeated queries
- Quotas: Set reasonable limits to control usage and costs
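Client-side caching for repeated queries can be as simple as remembering recent answers for a fixed time. The sketch below assumes identical query strings may reuse an earlier answer, which is not true for every bot. `TTLCache` and its `ask` callback (the function that actually calls the published API) are hypothetical names.

```python
import time
from typing import Callable


class TTLCache:
    """Cache answers per query string for `ttl` seconds."""

    def __init__(
        self,
        ask: Callable[[str], str],
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ask = ask      # function that actually calls the published API
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without waiting
        self.entries = {}   # query -> (time stored, answer)

    def get(self, query: str) -> str:
        now = self.clock()
        hit = self.entries.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: no API call, no quota used
        answer = self.ask(query)
        self.entries[query] = (now, answer)
        return answer
```

Every cache hit is a request that does not count against the quota or incur LLM cost, at the price of possibly serving an answer that predates a bot update.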
Next Steps
Create a Bot
Build a bot to publish as an API
Bot Store
Share your bot before publishing
Admin Guide
Manage published APIs as administrator
API Docs
Complete API specification