AI Providers

The YouTube Automation Agent uses multiple AI providers to generate high-quality content. This guide covers configuration and usage of each service.

Overview

The system supports multiple AI providers with automatic fallback:

Provider	Purpose	Required	Fallback
OpenAI	Content, Images, TTS	Yes	None
Google Gemini	Alternative content generation	No	OpenAI
Azure Speech	High-quality TTS	No	OpenAI TTS
ElevenLabs	Premium voice generation	No	Azure/OpenAI
Replicate	Advanced video generation	No	Slideshow

OpenAI Configuration

OpenAI is the primary AI provider and is required for the system to function.

Get API Key

Sign up at OpenAI Platform
Navigate to API Keys
Create a new secret key
Copy the key (starts with sk-proj- or sk-)

Configure in credentials.json

{
  "openai": {
    "apiKey": "sk-proj-...",
    "model": "gpt-4-turbo-preview"
  }
}

Or use an environment variable:

OPENAI_API_KEY=sk-proj-...
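The two configuration routes above can be combined in one lookup. The following is a minimal sketch, not the project's actual loader: `resolveOpenAIConfig` is a hypothetical helper, and both the precedence (environment variable first, then credentials.json) and the default model are assumptions.

```javascript
// Hypothetical helper: merge OpenAI settings from an env-like object and a
// parsed credentials.json object. Precedence and default model are assumptions.
function resolveOpenAIConfig(credentials = {}, env = process.env) {
  const fromFile = credentials.openai || {};
  const apiKey = env.OPENAI_API_KEY || fromFile.apiKey;
  if (!apiKey) {
    throw new Error('OpenAI API key missing: set OPENAI_API_KEY or openai.apiKey');
  }
  // Fall back to the recommended model when none is configured (assumption).
  return { apiKey, model: fromFile.model || 'gpt-4-turbo-preview' };
}
```

Failing early with a clear message is usually easier to debug than an authentication error from the first API call.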

Available Models

Select the model that best fits your needs:

GPT-4 Turbo (Recommended)

"model": "gpt-4-turbo-preview"

Best for:

High-quality content generation
Complex script writing
SEO optimization
Nuanced storytelling

Cost: ~$0.01 per 1K tokens (input), ~$0.03 per 1K tokens (output)

GPT-4

"model": "gpt-4"

Best for:

Maximum quality
Premium content channels
Complex analysis

Cost: ~$0.03 per 1K tokens (input), ~$0.06 per 1K tokens (output)

GPT-3.5 Turbo

"model": "gpt-3.5-turbo"

Best for:

High-volume content
Budget-conscious operations
Simple content types

Cost: ~$0.0005 per 1K tokens (input), ~$0.0015 per 1K tokens (output)

GPT-3.5 Turbo 16K

"model": "gpt-3.5-turbo-16k"

Best for:

Long-form content
Extended scripts
Large context windows

Cost: ~$0.003 per 1K tokens (input), ~$0.004 per 1K tokens (output)
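To compare models for a given workload, the approximate per-1K-token prices listed above can be turned into a rough estimate. This is a sketch only: `estimateCostUSD` is a hypothetical helper, the prices are the approximate figures from this page rather than live pricing, and the token counts in the example are made up.

```javascript
// Approximate USD prices per 1K tokens, copied from the model list on this page.
const PRICES_PER_1K = {
  'gpt-4-turbo-preview': { input: 0.01, output: 0.03 },
  'gpt-4': { input: 0.03, output: 0.06 },
  'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
  'gpt-3.5-turbo-16k': { input: 0.003, output: 0.004 },
};

// Hypothetical helper: estimate the cost of one request in USD.
function estimateCostUSD(model, inputTokens, outputTokens) {
  const price = PRICES_PER_1K[model];
  if (!price) throw new Error(`No price listed for model: ${model}`);
  return (inputTokens / 1000) * price.input + (outputTokens / 1000) * price.output;
}

// Example: 2,000 prompt tokens and 1,000 completion tokens.
console.log(estimateCostUSD('gpt-4-turbo-preview', 2000, 1000).toFixed(4)); // 0.0500
console.log(estimateCostUSD('gpt-3.5-turbo', 2000, 1000).toFixed(4));       // 0.0025
```

For this example request, GPT-4 Turbo costs about 20 times as much as GPT-3.5 Turbo, which is why routing simple tasks to the cheaper model can matter at volume.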

OpenAI Features Used

The system utilizes multiple OpenAI services:

1. Text Generation (GPT)

Used for:

Script writing
Title and description generation
SEO optimization
Content strategy

// Configured model is used automatically
await openai.chat.completions.create({
  model: credentials.openai.model,
  messages: [...]
});

2. Image Generation (DALL-E 3)

Used for:

Video thumbnails
Visual assets for videos
Background images

await openai.images.generate({
  model: "dall-e-3",
  prompt: "...",
  size: "1792x1024",
  quality: "hd"
});

The 1792x1024 size is DALL-E 3's widescreen option (7:4, close to 16:9), which suits YouTube thumbnails and video content.

3. Text-to-Speech (TTS)

Used for:

Video narration
Voice-over generation

await openai.audio.speech.create({
  model: "tts-1-hd",
  voice: "nova",
  input: scriptText
});

Available OpenAI TTS Voices

alloy - Neutral, balanced
echo - Deep, authoritative
fable - British, warm
onyx - Deep, serious
nova - Friendly, clear (default)
shimmer - Soft, expressive
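Because the voice name is passed as a plain string, a typo only surfaces when the API rejects the request. A small guard can normalize the value first. This is a sketch under assumptions: `resolveVoice` is a hypothetical helper, and silently falling back to the default voice (rather than throwing) is a design choice made for illustration.

```javascript
// Voices listed on this page; "nova" is the documented default.
const OPENAI_TTS_VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];
const DEFAULT_VOICE = 'nova';

// Hypothetical helper: return a valid voice name, or the default if unknown.
function resolveVoice(requested) {
  const voice = String(requested || '').trim().toLowerCase();
  return OPENAI_TTS_VOICES.includes(voice) ? voice : DEFAULT_VOICE;
}
```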

Google Gemini Configuration

Gemini can be used as an alternative or supplement to OpenAI for content generation.

Get Gemini API Key

Visit Google AI Studio
Click “Get API Key”
Create or select a Google Cloud project
Copy the generated API key

Add to credentials.json

{
  "gemini": {
    "apiKey": "AIza..."
  }
}

When to Use Gemini

Benefits

Free tier - Generous free quota for testing
Long context - Up to 1M tokens context window
Multimodal - Native image and video understanding
Cost-effective - Generally cheaper than GPT-4

Fallback Configuration

If an OpenAI request fails and Gemini is configured, the system falls back to Gemini automatically:

let content;
try {
  // Try OpenAI first
  content = await generateWithOpenAI(prompt);
} catch (error) {
  // Without a configured fallback, surface the original failure
  if (!credentials.gemini) {
    throw error;
  }
  // Fallback to Gemini
  content = await generateWithGemini(prompt);
}
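The same idea generalizes to an ordered list of providers. The sketch below is illustrative rather than the system's actual implementation: `generateWithFallback` and the stub providers are hypothetical, and real providers would call the respective SDKs.

```javascript
// Hypothetical helper: try each provider in order and return the first success.
// providers: array of { name, generate }, where generate(prompt) returns a Promise.
async function generateWithFallback(providers, prompt) {
  const failures = [];
  for (const { name, generate } of providers) {
    try {
      return { provider: name, content: await generate(prompt) };
    } catch (error) {
      // Remember why this provider failed, then move on to the next one.
      failures.push(`${name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed (${failures.join('; ')})`);
}

// Stub providers for demonstration only; no network calls are made.
const demoProviders = [
  { name: 'openai', generate: async () => { throw new Error('rate limited'); } },
  { name: 'gemini', generate: async (prompt) => `draft for: ${prompt}` },
];

generateWithFallback(demoProviders, 'intro script')
  .then((result) => console.log(result.provider, '->', result.content));
```

Collecting the individual failure messages keeps the final error useful when every provider is down.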

Azure Speech Services

Azure provides high-quality neural voices for text-to-speech.

Create Azure Account

Create Speech Service

Click “Create a resource”
Search for “Speech”
Select your region (e.g., eastus)
Choose pricing tier:
- F0 (Free): 0.5M characters/month for neural voices
- S0 (Standard): Pay-as-you-go

Get Credentials

Navigate to “Keys and Endpoint” and copy:

Key 1 (subscription key)
Region

Configure

{
  "azureSpeech": {
    "subscriptionKey": "your_key",
    "region": "eastus",
    "voice": "en-US-JennyNeural"
  }
}

Or use environment variables:

AZURE_SPEECH_KEY=your_subscription_key
AZURE_SPEECH_REGION=eastus
TTS_VOICE=en-US-JennyNeural

Voice Selection

Azure offers premium neural voices:

Female Voices

Voice	Characteristics	Best For
`en-US-JennyNeural`	Friendly, professional	General content, tutorials
`en-US-AriaNeural`	Clear, expressive	News, informative content
`en-US-AmberNeural`	Warm, conversational	Stories, personal vlogs
`en-US-AshleyNeural`	Young, energetic	Tech, gaming content
`en-US-SaraNeural`	Soft, storytelling	Narration, bedtime stories

Male Voices

Voice	Characteristics	Best For
`en-US-GuyNeural`	Professional, authoritative	Business, education
`en-US-DavisNeural`	Clear, trustworthy	News, documentaries
`en-US-TonyNeural`	News-anchor style	Formal content
`en-US-BrianNeural`	Friendly, approachable	Tutorials, how-tos
`en-US-AndrewNeural`	Warm, mature	Storytelling
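One way to apply a chosen voice is to wrap the narration text in SSML, which the Azure Speech SDK accepts through `speakSsmlAsync`. The following is a minimal sketch: `buildSsml` and `escapeXml` are hypothetical helpers, and deriving the language tag from the first two parts of the voice name is an assumption that holds for the `en-US-*` voices listed here.

```javascript
// Escape characters that are special in XML so script text cannot break the markup.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: wrap narration text in SSML for the chosen voice.
function buildSsml(text, voice = 'en-US-JennyNeural') {
  // "en-US-JennyNeural" -> "en-US" (assumes the locale-first naming pattern)
  const lang = voice.split('-').slice(0, 2).join('-');
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    '</speak>'
  );
}
```

Escaping matters because scripts often contain ampersands or quotation marks that would otherwise produce invalid SSML.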

TTS Priority

The system uses this priority order for TTS:

1. ElevenLabs (if configured) - Highest quality
2. Azure Speech (if configured) - High quality neural voices
3. OpenAI TTS (fallback) - Good quality, always available
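The priority order can be expressed as a short selection function. This is a sketch, not the system's actual code: `selectTtsProvider` is a hypothetical helper, and treating a provider as "configured" only when all of its documented fields are present is an assumption.

```javascript
// Hypothetical helper: pick the TTS provider according to the priority order above.
function selectTtsProvider(credentials = {}) {
  const { elevenLabs, azureSpeech } = credentials;
  if (elevenLabs && elevenLabs.apiKey && elevenLabs.voiceId) return 'elevenlabs';
  if (azureSpeech && azureSpeech.subscriptionKey && azureSpeech.region) return 'azure';
  return 'openai'; // OpenAI is required, so it is always available as the fallback
}
```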

ElevenLabs Configuration

ElevenLabs offers premium, ultra-realistic voice generation.

Create account at ElevenLabs

Choose Plan

Free: 10,000 characters/month
Starter: $5/month - 30,000 characters
Creator: $22/month - 100,000 characters
Pro: $99/month - 500,000 characters

Get API Key

Go to Profile Settings
Copy your API key

Select Voice

Browse Voice Library
Choose a voice
Copy the Voice ID

Configure

{
  "elevenLabs": {
    "apiKey": "your_api_key",
    "voiceId": "your_voice_id"
  }
}

Or:

ELEVENLABS_API_KEY=your_api_key
ELEVENLABS_VOICE_ID=your_voice_id

ElevenLabs provides the most natural-sounding voices and is recommended for premium channels focused on high production quality.
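With the API key and voice ID configured, a narration request targets the ElevenLabs text-to-speech endpoint for that voice. The sketch below only builds the request so it can be inspected without spending characters: `buildElevenLabsRequest` is a hypothetical helper, and optional body fields such as model or voice settings are left out.

```javascript
// Hypothetical helper: build (but do not send) an ElevenLabs text-to-speech request.
function buildElevenLabsRequest(text, { apiKey, voiceId }) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    options: {
      method: 'POST',
      headers: {
        'xi-api-key': apiKey,
        'Content-Type': 'application/json',
        Accept: 'audio/mpeg',
      },
      body: JSON.stringify({ text }),
    },
  };
}

// Usage with the global fetch in Node 18+ (not executed here):
// const { url, options } = buildElevenLabsRequest('Hello', credentials.elevenLabs);
// const audio = Buffer.from(await (await fetch(url, options)).arrayBuffer());
```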

Replicate Configuration

Replicate provides access to advanced AI models including Stable Video Diffusion.

Create Account

Get API Token

Go to API Tokens
Create a new token
Copy it (starts with r8_)

Add to Configuration

{
  "replicate": {
    "apiKey": "r8_..."
  }
}

Or:

REPLICATE_API_KEY=r8_...

Models Used

The system uses:

Stable Video Diffusion - Convert images to video clips
Custom video generation models - For animated content

Replicate charges per prediction. Video generation can be expensive. Monitor your usage carefully.

Testing Your Configuration

After configuring AI providers, test the connections:

npm run credentials:setup

This will:

Validate all API keys
Test connections to each service
Report any configuration issues
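Some problems can be caught offline before any connection test runs. The sketch below is not what `npm run credentials:setup` does internally: `findCredentialIssues` is a hypothetical helper, and the prefix checks are heuristics based on the key formats shown on this page.

```javascript
// Hypothetical helper: return a list of obvious problems in a credentials object.
function findCredentialIssues(credentials = {}) {
  const issues = [];

  const openaiKey = credentials.openai && credentials.openai.apiKey;
  if (!openaiKey) {
    issues.push('openai.apiKey is required');
  } else if (!openaiKey.startsWith('sk-')) {
    issues.push('openai.apiKey should start with "sk-"');
  }

  // Optional providers are only checked when they are present.
  const geminiKey = credentials.gemini && credentials.gemini.apiKey;
  if (geminiKey && !geminiKey.startsWith('AIza')) {
    issues.push('gemini.apiKey should start with "AIza"');
  }

  const replicateKey = credentials.replicate && credentials.replicate.apiKey;
  if (replicateKey && !replicateKey.startsWith('r8_')) {
    issues.push('replicate.apiKey should start with "r8_"');
  }

  const azure = credentials.azureSpeech;
  if (azure && !(azure.subscriptionKey && azure.region)) {
    issues.push('azureSpeech needs both subscriptionKey and region');
  }

  return issues;
}
```

An empty array means no obvious problems were found; it does not prove that the keys are valid.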

Manual Testing

Test individual services:

OpenAI

const OpenAI = require('openai');
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Test: listing models is a cheap call that fails on an invalid key
openai.models.list()
  .then(() => console.log('✅ OpenAI connected'))
  .catch((err) => console.error('❌ OpenAI connection failed:', err.message));

Azure Speech

const sdk = require('microsoft-cognitiveservices-speech-sdk');
// Note: this only builds the config object; no request is sent yet
const speechConfig = sdk.SpeechConfig.fromSubscription(
  process.env.AZURE_SPEECH_KEY,
  process.env.AZURE_SPEECH_REGION
);
console.log('✅ Azure Speech configured');

ElevenLabs

curl -X GET "https://api.elevenlabs.io/v1/voices" \
  -H "xi-api-key: YOUR_API_KEY"

Cost Optimization

Tips for managing AI service costs:

Use GPT-3.5 for Simple Tasks

Configure different models for different agent types:

{
  "agents": {
    "strategy": "gpt-4-turbo-preview",
    "script": "gpt-4-turbo-preview",
    "seo": "gpt-3.5-turbo",
    "production": "gpt-3.5-turbo"
  }
}
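A per-agent mapping like this needs a lookup with a sensible default for agent types that are not listed. This is a sketch only: `modelForAgent` is a hypothetical helper, and the default model is an assumption.

```javascript
// Hypothetical helper: look up the model for an agent type, with a default.
function modelForAgent(agentType, config = {}, defaultModel = 'gpt-4-turbo-preview') {
  const agents = config.agents || {};
  return agents[agentType] || defaultModel;
}

// Example with part of the mapping shown above.
const exampleConfig = { agents: { script: 'gpt-4-turbo-preview', seo: 'gpt-3.5-turbo' } };
console.log(modelForAgent('seo', exampleConfig));       // gpt-3.5-turbo
console.log(modelForAgent('thumbnail', exampleConfig)); // gpt-4-turbo-preview (default)
```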

Cache Generated Content

The system automatically caches:

Generated scripts
Visual assets
Audio files

Reuse when possible to avoid regeneration costs.
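Reuse depends on recognizing that two requests are the same. One common approach is to derive a cache key from a hash of the request parameters. The sketch below is illustrative and not the system's actual cache: `cacheKey` and `cached` are hypothetical helpers, the cache lives in memory only, and the key derivation assumes flat parameter objects.

```javascript
const nodeCrypto = require('node:crypto');

// Hypothetical helper: derive a stable key from flat request parameters.
function cacheKey(kind, params) {
  // Serialize keys in sorted order so logically equal requests hash identically.
  const canonical = JSON.stringify(params, Object.keys(params).sort());
  const digest = nodeCrypto.createHash('sha256').update(canonical).digest('hex');
  return `${kind}-${digest.slice(0, 16)}`;
}

// In-memory cache for illustration; a real system would likely persist to disk.
const cache = new Map();

// Hypothetical helper: call generate(params) only when no cached result exists.
async function cached(kind, params, generate) {
  const key = cacheKey(kind, params);
  if (!cache.has(key)) {
    cache.set(key, await generate(params));
  }
  return cache.get(key);
}
```

Including every parameter that affects the output (model, voice, prompt text) in `params` avoids serving a stale asset after a configuration change.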

Set Usage Limits

Configure limits in each provider’s dashboard:

OpenAI: Set monthly spending limits
Azure: Use free tier for testing
ElevenLabs: Choose appropriate plan
Replicate: Monitor per-prediction costs

Monitor Usage

Enable analytics to track AI costs:

ENABLE_ANALYTICS=true

View cost reports at /analytics/ai-costs

Best Practices

Use Environment-Specific Keys

Use different API keys for development and production to separate usage tracking.

Enable Fallbacks

Configure multiple providers so the system can continue operating if one fails.

Monitor Quotas

Regularly check API quotas and usage to avoid service interruptions.

Rotate Keys Periodically

Rotate API keys every few months to limit the impact of a leaked or compromised key.

Next Steps

YouTube Setup

Complete YouTube API configuration and authentication
