The YouTube Automation Agent uses multiple AI providers to generate high-quality content. This guide covers configuration and usage of each service.

Overview

The system supports multiple AI providers with automatic fallback:

| Provider | Purpose | Required | Fallback |
|---|---|---|---|
| OpenAI | Content, images, TTS | Yes | None |
| Google Gemini | Alternative content generation | No | OpenAI |
| Azure Speech | High-quality TTS | No | OpenAI TTS |
| ElevenLabs | Premium voice generation | No | Azure/OpenAI |
| Replicate | Advanced video generation | No | Slideshow |

OpenAI Configuration

OpenAI is the primary AI provider and is required for the system to function.

Step 1: Get API Key

  1. Sign up at OpenAI Platform
  2. Navigate to API Keys
  3. Create a new secret key
  4. Copy the key (starts with sk-proj- or sk-)

Step 2: Configure in credentials.json

{
  "openai": {
    "apiKey": "sk-proj-...",
    "model": "gpt-4-turbo-preview"
  }
}

Step 3: Alternatively, use an environment variable

OPENAI_API_KEY=sk-proj-...
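
Both options supply the same key. This page does not say which source wins when both are set, so the following is a minimal sketch that assumes credentials.json takes precedence over OPENAI_API_KEY. The function name `resolveOpenAIConfig` is hypothetical; it also checks the `sk-` prefix mentioned above.

```javascript
// Hypothetical helper: merge credentials.json values with environment variables.
// Assumption: credentials.json wins when both sources are present.
function resolveOpenAIConfig(credentials = {}, env = {}) {
  const fromFile = credentials.openai || {};
  const apiKey = fromFile.apiKey || env.OPENAI_API_KEY;

  if (!apiKey) {
    throw new Error("OpenAI API key missing: set openai.apiKey or OPENAI_API_KEY");
  }
  // OpenAI keys start with "sk-" (project keys with "sk-proj-").
  if (!apiKey.startsWith("sk-")) {
    throw new Error("OpenAI API key should start with 'sk-'");
  }
  // The model is read from credentials.json only; no default model is assumed here.
  return { apiKey, model: fromFile.model };
}

// Usage: const { apiKey, model } = resolveOpenAIConfig(require("./credentials.json"), process.env);
```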

Available Models

Select the model that best fits your needs:

OpenAI Features Used

The system uses three OpenAI services:

1. Text Generation (GPT)

Used for:
  • Script writing
  • Title and description generation
  • SEO optimization
  • Content strategy
// Configured model is used automatically
await openai.chat.completions.create({
  model: credentials.openai.model,
  messages: [...]
});

2. Image Generation (DALL-E 3)

Used for:
  • Video thumbnails
  • Visual assets for videos
  • Background images
await openai.images.generate({
  model: "dall-e-3",
  prompt: "...",
  size: "1792x1024",
  quality: "hd"
});
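The system requests 1792x1024, DALL-E 3's widest landscape size. That is a 7:4 ratio (1.75:1), close to but not exactly YouTube's 16:9 (about 1.78:1), so it suits thumbnails and video backgrounds with minimal cropping.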
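
A 1792x1024 image is slightly taller than 16:9. If you need an exact 16:9 frame, for example before downscaling to YouTube's recommended 1280x720 thumbnail size, one option is a centered crop. The sketch below only computes the crop geometry; `centerCropTo16x9` is a hypothetical helper, not part of the project, and the pixel work itself would need an image library.

```javascript
// Hypothetical helper: compute a centered 16:9 crop box for an image of the given size.
// The crop trims whichever dimension is too large; results are integer pixels.
function centerCropTo16x9(width, height) {
  const targetRatio = 16 / 9;
  if (width / height > targetRatio) {
    // Too wide: keep the full height, trim the left and right edges.
    const cropWidth = Math.round(height * targetRatio);
    return { left: Math.floor((width - cropWidth) / 2), top: 0, width: cropWidth, height };
  }
  // Too tall (as with 1792x1024): keep the full width, trim top and bottom.
  const cropHeight = Math.round(width / targetRatio);
  return { left: 0, top: Math.floor((height - cropHeight) / 2), width, height: cropHeight };
}

console.log(centerCropTo16x9(1792, 1024));
// → { left: 0, top: 8, width: 1792, height: 1008 }
```

For DALL-E 3 output this removes only 8 pixels from the top and bottom, which is why the 1792x1024 size is a practical stand-in for 16:9.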

3. Text-to-Speech (TTS)

Used for:
  • Video narration
  • Voice-over generation
await openai.audio.speech.create({
  model: "tts-1-hd",
  voice: "nova",
  input: scriptText
});

Available voices:
  • alloy - Neutral, balanced
  • echo - Deep, authoritative
  • fable - British, warm
  • onyx - Deep, serious
  • nova - Friendly, clear (default)
  • shimmer - Soft, expressive
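
The OpenAI speech endpoint accepts at most 4,096 characters of `input` per request, so a full video script usually has to be sent in pieces. This page does not describe how the system handles long scripts; the sketch below assumes a simple approach of splitting after sentence-ending punctuation and is not taken from the project. `chunkScript` is a hypothetical name.

```javascript
// Hypothetical helper: split narration into chunks that fit the TTS input limit.
// Splits after sentence-ending punctuation and hard-splits any sentence longer than maxLen.
function chunkScript(text, maxLen = 4096) {
  // Each match is a sentence plus trailing whitespace; the second branch catches stray punctuation.
  const sentences = text.match(/[^.!?]+[.!?]*\s*|[.!?]+\s*/g) || [];
  const chunks = [];
  const push = (s) => { if (s.trim()) chunks.push(s.trim()); };
  let current = "";

  for (const sentence of sentences) {
    if ((current + sentence).length <= maxLen) {
      current += sentence;
      continue;
    }
    push(current);
    // Cut an overlong sentence into fixed-size pieces; keep the remainder for the next chunk.
    let rest = sentence;
    while (rest.length > maxLen) {
      push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
    current = rest;
  }
  push(current);
  return chunks;
}
```

Each chunk would then be sent as a separate `openai.audio.speech.create` call and the resulting audio files joined in order. The hard split can cut mid-word, so it is only a last resort for unusually long sentences.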

Google Gemini Configuration

Gemini can be used as an alternative or supplement to OpenAI for content generation.

Step 1: Get Gemini API Key

  1. Visit Google AI Studio
  2. Click “Get API Key”
  3. Create or select a Google Cloud project
  4. Copy the generated API key

Step 2: Add to credentials.json

{
  "gemini": {
    "apiKey": "AIza..."
  }
}

When to Use Gemini

  • Free tier - Generous free quota for testing
  • Long context - Context window of up to 1M tokens
  • Multimodal - Native image and video understanding
  • Cost-effective - Generally cheaper than GPT-4

Fallback Configuration

The system automatically falls back to Gemini if an OpenAI call fails:
let content;
try {
  // Try OpenAI first
  content = await generateWithOpenAI(prompt);
} catch (error) {
  // Fall back to Gemini only if it is configured; otherwise surface the original error
  if (!credentials.gemini) throw error;
  content = await generateWithGemini(prompt);
}
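
The same pattern extends to any number of providers. Here is a minimal sketch, assuming each provider is represented as an object with an async `generate` function; `withFallback` and the provider objects are hypothetical and not part of the project's code. It records every failure so that, if all providers fail, the error explains why each one did.

```javascript
// Hypothetical helper: try providers in order and return the first successful result.
// Each provider is { name, enabled, generate(prompt) -> Promise<string> }.
async function withFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    if (!provider.enabled) continue; // skip providers without credentials
    try {
      return await provider.generate(prompt);
    } catch (error) {
      errors.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed or are unconfigured:\n${errors.join("\n")}`);
}

// Usage with stand-in providers (no network calls):
const providers = [
  { name: "openai", enabled: true, generate: async () => { throw new Error("rate limited"); } },
  { name: "gemini", enabled: true, generate: async (p) => `gemini: ${p}` },
];
withFallback(providers, "video idea").then(console.log); // prints "gemini: video idea"
```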

Azure Speech Services

Azure provides high-quality neural voices for text-to-speech.

Step 1: Create Azure Account

Sign up at Azure Portal

Step 2: Create Speech Service

  1. Click “Create a resource”
  2. Search for “Speech”
  3. Select your region (e.g., eastus)
  4. Choose pricing tier:
    • F0 (Free): 0.5M neural-voice characters/month
    • S0 (Standard): Pay-as-you-go

Step 3: Get Credentials

Navigate to “Keys and Endpoint” and copy:
  • Key 1 (subscription key)
  • Region

Step 4: Configure

{
  "azureSpeech": {
    "subscriptionKey": "your_key",
    "region": "eastus",
    "voice": "en-US-JennyNeural"
  }
}
Or use environment variables:
AZURE_SPEECH_KEY=your_subscription_key
AZURE_SPEECH_REGION=eastus
TTS_VOICE=en-US-JennyNeural
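
This page does not say which source wins when both credentials.json and the environment variables are set. The sketch below assumes credentials.json first, then the environment variables listed above, and derives Azure's regional text-to-speech REST endpoint (`https://<region>.tts.speech.microsoft.com/cognitiveservices/v1`). The function name and the choice of en-US-JennyNeural as the default voice are assumptions.

```javascript
// Hypothetical helper: resolve Azure Speech settings from credentials.json or env vars.
// Assumptions: credentials.json takes precedence; en-US-JennyNeural is used if no voice is set.
function resolveAzureSpeechConfig(credentials = {}, env = {}) {
  const fromFile = credentials.azureSpeech || {};
  const subscriptionKey = fromFile.subscriptionKey || env.AZURE_SPEECH_KEY;
  const region = fromFile.region || env.AZURE_SPEECH_REGION;
  const voice = fromFile.voice || env.TTS_VOICE || "en-US-JennyNeural";

  // Azure is optional: return null so the caller can fall back to OpenAI TTS.
  if (!subscriptionKey || !region) return null;

  return {
    subscriptionKey,
    region,
    voice,
    // Regional REST endpoint for Azure neural text-to-speech.
    endpoint: `https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`,
  };
}

console.log(resolveAzureSpeechConfig({}, { AZURE_SPEECH_KEY: "k", AZURE_SPEECH_REGION: "eastus" }).endpoint);
// → https://eastus.tts.speech.microsoft.com/cognitiveservices/v1
```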

Voice Selection

Azure offers premium neural voices:

Female voices:

| Voice | Characteristics | Best For |
|---|---|---|
| en-US-JennyNeural | Friendly, professional | General content, tutorials |
| en-US-AriaNeural | Clear, expressive | News, informative content |
| en-US-AmberNeural | Warm, conversational | Stories, personal vlogs |
| en-US-AshleyNeural | Young, energetic | Tech, gaming content |
| en-US-SaraNeural | Soft, storytelling | Narration, bedtime stories |

Male voices:

| Voice | Characteristics | Best For |
|---|---|---|
| en-US-GuyNeural | Professional, authoritative | Business, education |
| en-US-DavisNeural | Clear, trustworthy | News, documentaries |
| en-US-TonyNeural | News-anchor style | Formal content |
| en-US-BrianNeural | Friendly, approachable | Tutorials, how-tos |
| en-US-AndrewNeural | Warm, mature | Storytelling |
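
Azure's text-to-speech REST API takes SSML, with the chosen voice named in a `<voice>` element. Here is a minimal sketch of building that payload; `buildSsml` and `escapeXml` are hypothetical helpers. The script text has to be XML-escaped because characters such as `&` and `<` are common in video titles and would otherwise make the SSML invalid.

```javascript
// Hypothetical helper: escape text for safe inclusion in SSML/XML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wrap narration text in SSML for a single Azure neural voice.
function buildSsml(text, voice = "en-US-JennyNeural", lang = "en-US") {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voice}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

console.log(buildSsml("Tips & tricks for <beginners>", "en-US-GuyNeural"));
```

When POSTing this body to Azure's TTS endpoint, the request also needs `Content-Type: application/ssml+xml`, an `X-Microsoft-OutputFormat` header (for example `audio-24khz-48kbitrate-mono-mp3`), and the subscription key in `Ocp-Apim-Subscription-Key`.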

TTS Priority

The system uses this priority order for TTS:
  1. ElevenLabs (if configured) - Highest quality
  2. Azure Speech (if configured) - High quality neural voices
  3. OpenAI TTS (fallback) - Good quality, always available
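
The priority order above can be sketched as a small selection function. This is a hypothetical illustration, not the project's code: it assumes a provider counts as "configured" only when all of its credentials.json fields are present, and it ignores environment variables for brevity.

```javascript
// Hypothetical helper mirroring the documented priority:
// ElevenLabs, then Azure Speech, then OpenAI TTS (always available because OpenAI is required).
function selectTtsProvider(credentials = {}) {
  const eleven = credentials.elevenLabs;
  if (eleven && eleven.apiKey && eleven.voiceId) return "elevenlabs";

  const azure = credentials.azureSpeech;
  if (azure && azure.subscriptionKey && azure.region) return "azure";

  return "openai";
}

console.log(selectTtsProvider({ azureSpeech: { subscriptionKey: "k", region: "eastus" } })); // prints "azure"
```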

ElevenLabs Configuration

ElevenLabs offers premium, ultra-realistic voice generation.

Step 1: Sign Up

Create an account at ElevenLabs

Step 2: Choose Plan

  • Free: 10,000 characters/month
  • Starter: $5/month - 30,000 characters
  • Creator: $22/month - 100,000 characters
  • Pro: $99/month - 500,000 characters

Step 3: Get API Key

  1. Go to Profile Settings
  2. Copy your API key

Step 4: Select Voice

  1. Browse Voice Library
  2. Choose a voice
  3. Copy the Voice ID

Step 5: Configure

{
  "elevenLabs": {
    "apiKey": "your_api_key",
    "voiceId": "your_voice_id"
  }
}
Or use environment variables:
ELEVENLABS_API_KEY=your_api_key
ELEVENLABS_VOICE_ID=your_voice_id
ElevenLabs provides the most natural-sounding voices and is recommended for premium channels focused on high production quality.
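
For reference, ElevenLabs' REST text-to-speech endpoint takes the voice ID in the URL path and the API key in an `xi-api-key` header. The sketch below only builds the request; it does not send it, and `buildElevenLabsRequest` is a hypothetical helper rather than the project's implementation. Optional body fields such as `model_id` and voice settings are omitted.

```javascript
// Hypothetical helper: build (but do not send) an ElevenLabs text-to-speech request.
function buildElevenLabsRequest({ apiKey, voiceId } = {}, text) {
  if (!apiKey || !voiceId) throw new Error("ElevenLabs apiKey and voiceId are required");
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    options: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg", // the endpoint returns MP3 audio by default
      },
      body: JSON.stringify({ text }),
    },
  };
}

// Usage (Node 18+ built-in fetch):
//   const { url, options } = buildElevenLabsRequest(credentials.elevenLabs, script);
//   const audio = Buffer.from(await (await fetch(url, options)).arrayBuffer());
```

Because plans are metered in characters, the length of each `text` you send is roughly what counts against your monthly quota.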

Replicate Configuration

Replicate provides access to advanced AI models including Stable Video Diffusion.

Step 1: Create Account

Sign up at Replicate

Step 2: Get API Token

  1. Go to API Tokens
  2. Create a new token
  3. Copy it (starts with r8_)

Step 3: Add to Configuration

{
  "replicate": {
    "apiKey": "r8_..."
  }
}
Or use an environment variable:
REPLICATE_API_KEY=r8_...
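
Replicate's HTTP API creates a prediction with a POST to `https://api.replicate.com/v1/predictions`, authenticated with a bearer token. This page doesn't name the exact model versions or inputs the system uses, so `version` and `input` below are placeholders to fill in from the model's page on Replicate; `buildReplicatePrediction` is a hypothetical helper.

```javascript
// Hypothetical helper: build (but do not send) a Replicate prediction request.
// `version` and `input` are placeholders: copy them from the model's API page on Replicate.
function buildReplicatePrediction(apiKey, version, input) {
  if (!apiKey || !apiKey.startsWith("r8_")) {
    throw new Error("Replicate API token should start with 'r8_'");
  }
  return {
    url: "https://api.replicate.com/v1/predictions",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ version, input }),
    },
  };
}
```

The response is a prediction object whose `status` moves through `starting` and `processing` to `succeeded`, `failed`, or `canceled`; poll its `urls.get` link (or register a webhook) to collect the output.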

Models Used

The system uses:
  • Stable Video Diffusion - Convert images to video clips
  • Custom video generation models - For animated content
Replicate charges per prediction. Video generation can be expensive. Monitor your usage carefully.

Testing Your Configuration

After configuring AI providers, test the connections:
npm run credentials:setup
This will:
  1. Validate all API keys
  2. Test connections to each service
  3. Report any configuration issues
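
The exact checks performed by `npm run credentials:setup` aren't spelled out here. As an illustration of the first step only, here is a hypothetical offline check of key formats based on the prefixes this page mentions (`sk-`, `AIza`, `r8_`). It cannot tell whether a key is valid, only whether it is missing or obviously malformed; `checkCredentialShapes` is not part of the project.

```javascript
// Hypothetical offline check: flag missing or malformed keys before any network call.
// Prefixes come from this guide; passing this check does not mean a key works.
function checkCredentialShapes(credentials = {}) {
  const issues = [];
  const openaiKey = credentials.openai && credentials.openai.apiKey;
  if (!openaiKey) issues.push("openai.apiKey is required");
  else if (!openaiKey.startsWith("sk-")) issues.push("openai.apiKey should start with 'sk-'");

  const geminiKey = credentials.gemini && credentials.gemini.apiKey;
  if (geminiKey && !geminiKey.startsWith("AIza")) issues.push("gemini.apiKey should start with 'AIza'");

  const replicateKey = credentials.replicate && credentials.replicate.apiKey;
  if (replicateKey && !replicateKey.startsWith("r8_")) issues.push("replicate.apiKey should start with 'r8_'");

  // Optional providers must be complete if present at all.
  const azure = credentials.azureSpeech;
  if (azure && (!azure.subscriptionKey || !azure.region)) {
    issues.push("azureSpeech needs both subscriptionKey and region");
  }
  const eleven = credentials.elevenLabs;
  if (eleven && (!eleven.apiKey || !eleven.voiceId)) {
    issues.push("elevenLabs needs both apiKey and voiceId");
  }
  return issues;
}

console.log(checkCredentialShapes({ openai: { apiKey: "sk-proj-abc" }, replicate: { apiKey: "abc" } }));
```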

Manual Testing

Test individual services:
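const OpenAI = require('openai');

// openai v4+ client, the same SDK style as openai.chat.completions.create above
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// CommonJS has no top-level await, so wrap the check in an async function
(async () => {
  await openai.models.list(); // cheap call that fails fast on a bad key
  console.log('✅ OpenAI connected');
})().catch((err) => console.error('❌ OpenAI connection failed:', err.message));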

Cost Optimization

Tips for managing AI service costs:
Configure different models for different agent types:
{
  "agents": {
    "strategy": "gpt-4-turbo-preview",
    "script": "gpt-4-turbo-preview",
    "seo": "gpt-3.5-turbo",
    "production": "gpt-3.5-turbo"
  }
}
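
Here is a sketch of how such a mapping might be looked up. It assumes agents without an entry fall back to the global `openai.model` from credentials.json; both `modelForAgent` and that fallback rule are assumptions, not documented behavior.

```javascript
// Hypothetical lookup: per-agent model override, else the global OpenAI model.
function modelForAgent(agentType, config = {}, credentials = {}) {
  const perAgent = (config.agents || {})[agentType];
  const fallback = credentials.openai && credentials.openai.model;
  const model = perAgent || fallback;
  if (!model) throw new Error(`No model configured for agent "${agentType}"`);
  return model;
}

const config = { agents: { strategy: "gpt-4-turbo-preview", seo: "gpt-3.5-turbo" } };
const credentials = { openai: { model: "gpt-4-turbo-preview" } };
console.log(modelForAgent("seo", config, credentials));       // prints "gpt-3.5-turbo"
console.log(modelForAgent("thumbnail", config, credentials)); // prints "gpt-4-turbo-preview"
```

Routing the high-volume, formulaic agents (SEO, production) to a cheaper model while keeping the stronger model for strategy and scripts is where most of the savings come from.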
The system automatically caches:
  • Generated scripts
  • Visual assets
  • Audio files
Reuse cached results where possible to avoid paying to regenerate them.
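
This page doesn't describe how cache entries are keyed. One common approach, shown here as a hypothetical sketch, is to hash everything that affects the output (provider, model, prompt, and options) so identical requests map to the same entry. `cacheKey`, `cached`, and the in-memory `Map` are illustrative; the project's actual cache may be stored differently, for example on disk.

```javascript
const crypto = require("crypto");

// Hypothetical content-addressed cache key: identical inputs give identical keys.
function cacheKey({ provider, model, prompt, options = {} }) {
  // Sort option keys so { a, b } and { b, a } hash the same.
  const sortedOptions = Object.fromEntries(
    Object.entries(options).sort(([a], [b]) => a.localeCompare(b))
  );
  const payload = JSON.stringify([provider, model, prompt, sortedOptions]);
  return crypto.createHash("sha256").update(payload).digest("hex");
}

const cache = new Map();

// Return a cached result when available, otherwise generate it once and store it.
async function cached(request, generate) {
  const key = cacheKey(request);
  if (!cache.has(key)) cache.set(key, await generate(request));
  return cache.get(key);
}
```

Including the model and options in the key matters: switching from `tts-1` to `tts-1-hd`, or changing a voice, should produce a new asset rather than silently reuse the old one.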
Configure limits in each provider’s dashboard:
  • OpenAI: Set monthly spending limits
  • Azure: Use free tier for testing
  • ElevenLabs: Choose appropriate plan
  • Replicate: Monitor per-prediction costs
Enable analytics to track AI costs:
ENABLE_ANALYTICS=true
View cost reports at /analytics/ai-costs

Best Practices

Use Environment-Specific Keys

Use different API keys for development and production to separate usage tracking.

Enable Fallbacks

Configure multiple providers so the system can continue operating if one fails.

Monitor Quotas

Regularly check API quotas and usage to avoid service interruptions.

Rotate Keys Periodically

Rotate API keys every few months as a security best practice.

Next Steps

YouTube Setup

Complete YouTube API configuration and authentication
