The YouTube Automation Agent uses multiple AI providers to generate high-quality content. This guide covers configuration and usage of each service.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/darkzOGx/youtube-automation-agent/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The system supports multiple AI providers with automatic fallback:| Provider | Purpose | Required | Fallback |
|---|---|---|---|
| OpenAI | Content, Images, TTS | Yes | None |
| Google Gemini | Alternative content generation | No | OpenAI |
| Azure Speech | High-quality TTS | No | OpenAI TTS |
| ElevenLabs | Premium voice generation | No | Azure/OpenAI |
| Replicate | Advanced video generation | No | Slideshow |
OpenAI Configuration
OpenAI is the primary AI provider and is required for the system to function.Get API Key
- Sign up at OpenAI Platform
- Navigate to API Keys
- Create a new secret key
- Copy the key (starts with
sk-proj-orsk-)
Available Models
Select the model that best fits your needs:- GPT-4 Turbo (Recommended)
- GPT-4
- GPT-3.5 Turbo
- GPT-3.5 Turbo 16K
- High-quality content generation
- Complex script writing
- SEO optimization
- Nuanced storytelling
OpenAI Features Used
The system utilizes multiple OpenAI services:1. Text Generation (GPT)
Used for:- Script writing
- Title and description generation
- SEO optimization
- Content strategy
2. Image Generation (DALL-E 3)
Used for:- Video thumbnails
- Visual assets for videos
- Background images
DALL-E 3 generates images in 16:9 aspect ratio (1792x1024) optimized for YouTube thumbnails and video content.
3. Text-to-Speech (TTS)
Used for:- Video narration
- Voice-over generation
Available OpenAI TTS Voices
Available OpenAI TTS Voices
- alloy - Neutral, balanced
- echo - Deep, authoritative
- fable - British, warm
- onyx - Deep, serious
- nova - Friendly, clear (default)
- shimmer - Soft, expressive
Google Gemini Configuration
Gemini can be used as an alternative or supplement to OpenAI for content generation.Get Gemini API Key
- Visit Google AI Studio
- Click “Get API Key”
- Create or select a Google Cloud project
- Copy the generated API key
When to Use Gemini
- Benefits
- Limitations
- Free tier - Generous free quota for testing
- Long context - Up to 1M tokens context window
- Multimodal - Native image and video understanding
- Cost-effective - Generally cheaper than GPT-4
Fallback Configuration
The system automatically uses Gemini as fallback if OpenAI fails:Azure Speech Services
Azure provides high-quality neural voices for text-to-speech.Create Azure Account
Sign up at Azure Portal
Create Speech Service
- Click “Create a resource”
- Search for “Speech”
- Select your region (e.g.,
eastus) - Choose pricing tier:
- F0 (Free): 5M characters/month
- S0 (Standard): Pay-as-you-go
Voice Selection
Azure offers premium neural voices:Female Voices
Female Voices
| Voice | Characteristics | Best For |
|---|---|---|
en-US-JennyNeural | Friendly, professional | General content, tutorials |
en-US-AriaNeural | Clear, expressive | News, informative content |
en-US-AmberNeural | Warm, conversational | Stories, personal vlogs |
en-US-AshleyNeural | Young, energetic | Tech, gaming content |
en-US-SaraNeural | Soft, storytelling | Narration, bedtime stories |
Male Voices
Male Voices
| Voice | Characteristics | Best For |
|---|---|---|
en-US-GuyNeural | Professional, authoritative | Business, education |
en-US-DavisNeural | Clear, trustworthy | News, documentaries |
en-US-TonyNeural | News-anchor style | Formal content |
en-US-BrianNeural | Friendly, approachable | Tutorials, how-tos |
en-US-AndrewNeural | Warm, mature | Storytelling |
TTS Priority
The system uses this priority order for TTS:- ElevenLabs (if configured) - Highest quality
- Azure Speech (if configured) - High quality neural voices
- OpenAI TTS (fallback) - Good quality, always available
ElevenLabs Configuration
ElevenLabs offers premium, ultra-realistic voice generation.Sign Up
Create account at ElevenLabs
Choose Plan
- Free: 10,000 characters/month
- Starter: $5/month - 30,000 characters
- Creator: $22/month - 100,000 characters
- Pro: $99/month - 500,000 characters
Get API Key
- Go to Profile Settings
- Copy your API key
Select Voice
- Browse Voice Library
- Choose a voice
- Copy the Voice ID
ElevenLabs provides the most natural-sounding voices and is recommended for premium channels focused on high production quality.
Replicate Configuration
Replicate provides access to advanced AI models including Stable Video Diffusion.Create Account
Sign up at Replicate
Get API Token
- Go to API Tokens
- Create a new token
- Copy it (starts with
r8_)
Models Used
The system uses:- Stable Video Diffusion - Convert images to video clips
- Custom video generation models - For animated content
Testing Your Configuration
After configuring AI providers, test the connections:- Validate all API keys
- Test connections to each service
- Report any configuration issues
Manual Testing
Test individual services:- OpenAI
- Azure Speech
- ElevenLabs
Cost Optimization
Tips for managing AI service costs:Use GPT-3.5 for Simple Tasks
Use GPT-3.5 for Simple Tasks
Configure different models for different agent types:
Cache Generated Content
Cache Generated Content
The system automatically caches:
- Generated scripts
- Visual assets
- Audio files
Set Usage Limits
Set Usage Limits
Configure limits in each provider’s dashboard:
- OpenAI: Set monthly spending limits
- Azure: Use free tier for testing
- ElevenLabs: Choose appropriate plan
- Replicate: Monitor per-prediction costs
Monitor Usage
Monitor Usage
Enable analytics to track AI costs:View cost reports at
/analytics/ai-costsBest Practices
Use Environment-Specific Keys
Use different API keys for development and production to separate usage tracking.
Enable Fallbacks
Configure multiple providers so the system can continue operating if one fails.
Monitor Quotas
Regularly check API quotas and usage to avoid service interruptions.
Rotate Keys Periodically
Change API keys every few months for security best practices.
Next Steps
YouTube Setup
Complete YouTube API configuration and authentication