
Overview

Echoes of the Past integrates three AI services to power real-time voice conversations:
  1. Vapi AI: Orchestrates voice conversations with WebRTC
  2. ElevenLabs: Generates realistic voice synthesis for historical figures
  3. OpenAI: Provides language understanding and structured feedback generation

Vapi AI Integration

Setup

Vapi is initialized as a singleton client in lib/vapi.ts:
import Vapi from '@vapi-ai/web'

if (!process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN) {
  throw new Error('NEXT_PUBLIC_VAPI_WEB_TOKEN environment variable is required')
}

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN)

Assistant Configuration

Each conversation creates a transient Vapi assistant with character-specific configuration:
const assistant: CreateAssistantDTO = {
  name: character.name,
  firstMessage: generateCallFirstMessage(character),
  model: {
    provider: 'openai',
    model: 'gpt-3.5-turbo',
    temperature: 0.7,
    messages: [
      {
        role: 'system',
        content: systemPrompt  // Character-specific prompt
      }
    ]
  },
  voice: {
    provider: '11labs',
    voiceId: character.voiceId,
    stability: 0.4,
    similarityBoost: 0.8,
    speed: 1,
    style: 0.5,
    useSpeakerBoost: true
  },
  messagePlan: {
    idleMessages: [
      'If you have a question, feel free to ask',
      'Are you there?',
      'What are you thinking? I can help you!!',
      "I'm here whenever you're ready to continue"
    ],
    idleTimeoutSeconds: 15,
    idleMessageMaxSpokenCount: 3,
    idleMessageResetCountOnUserSpeechEnabled: true
  },
  backgroundDenoisingEnabled: true
}

Voice Parameters Explained

  • stability (number, 0–1, default 0.4): Lower values add more variation and emotion; higher values are more consistent but less expressive.
  • similarityBoost (number, 0–1, default 0.8): How closely the output matches the original voice sample; higher values increase similarity.
  • speed (number, default 1): Speech rate multiplier; 1.0 is normal speed.
  • style (number, 0–1, default 0.5): Voice expressiveness; higher values add more dramatic intonation.
  • useSpeakerBoost (boolean, default true): Enhances voice clarity, especially in noisy environments.

Event Handling

The useVapi hook manages real-time events:
vapi.on('speech-start', () => setIsSpeechActive(true))
vapi.on('speech-end', () => setIsSpeechActive(false))
vapi.on('call-start', () => setCallStatus(CALL_STATUS.ACTIVE))
vapi.on('call-end', () => setCallStatus(CALL_STATUS.INACTIVE))
vapi.on('volume-level', (volume) => setAudioLevel(volume))
vapi.on('message', (message) => {
  if (message.type === 'transcript' && message.transcriptType === 'partial') {
    setActiveTranscript(message)  // Real-time transcription
  } else {
    setMessages(prev => [...prev, message])  // Final messages
  }
})
vapi.on('error', (e) => console.error(e))
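
Listeners registered this way should also be removed when the hook unmounts, or repeated mounts will stack duplicate handlers. A minimal sketch of the register/teardown pattern, using a stand-in emitter so it runs self-contained (the real Vapi client exposes an EventEmitter-style `on`/`off` surface):

```typescript
// Stand-in for the Vapi client's event surface; only used so this
// sketch runs without the SDK.
type Handler = (payload?: unknown) => void

class MiniEmitter {
  private handlers = new Map<string, Set<Handler>>()
  on(event: string, fn: Handler): void {
    const set = this.handlers.get(event) ?? new Set<Handler>()
    set.add(fn)
    this.handlers.set(event, set)
  }
  off(event: string, fn: Handler): void {
    this.handlers.get(event)?.delete(fn)
  }
  listenerCount(event: string): number {
    return this.handlers.get(event)?.size ?? 0
  }
}

// Register every handler once and return a single teardown function,
// the shape a React useEffect cleanup expects.
function bindCallEvents(
  client: MiniEmitter,
  onCallStart: Handler,
  onCallEnd: Handler
): () => void {
  client.on('call-start', onCallStart)
  client.on('call-end', onCallEnd)
  return () => {
    client.off('call-start', onCallStart)
    client.off('call-end', onCallEnd)
  }
}
```

Returning the teardown from the effect guarantees each mount leaves exactly the listeners it added.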

Programmatic Question Injection

For quiz mode, questions can be injected into the conversation:
vapi.send({
  type: 'add-message',
  message: {
    role: 'system',
    content: `The user has pressed a button for you to ask them ${question}.`
  }
})

ElevenLabs Integration

Setup

ElevenLabs client is configured in lib/elevenlabs.ts:
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js'

export const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVEN_LABS_API_KEY
})

Voice Selection

Each historical figure has a pre-selected ElevenLabs voice ID stored in the historicalFigures.voiceId field. Voices are chosen to match:
  • Historical accuracy (e.g., British accent for Isaac Newton)
  • Gender and age of the figure
  • Tone and speaking style appropriate to their personality
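
Several snippets on this page read fields off the character object. Pulled together, the HistoricalFigure shape looks roughly like this; it is a reconstruction from usage on this page, not the actual database schema, which may contain more columns:

```typescript
// Reconstructed from the fields referenced across this page.
interface HistoricalFigure {
  name: string
  description: string        // Personality traits; may end with a "(YYYY-YYYY)" life-span
  notableWork: string        // Comma-separated list of key achievements
  category: 'scientists' | 'philosophers' | 'artists' | 'leaders' | 'others'
  voiceId: string            // Pre-selected ElevenLabs voice ID
  dateOfBirth: string
  dateOfDeath: string
}

// Illustrative sample; the voiceId is a placeholder, since real
// ElevenLabs voice IDs are opaque strings.
const sample: HistoricalFigure = {
  name: 'Isaac Newton',
  description: 'English mathematician and physicist (1643–1727)',
  notableWork: 'Principia Mathematica, calculus, laws of motion',
  category: 'scientists',
  voiceId: 'voice-id-placeholder',
  dateOfBirth: '1643-01-04',
  dateOfDeath: '1727-03-31'
}
```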

Integration with Vapi

ElevenLabs is used as the voice provider in Vapi’s assistant configuration. Vapi handles the text-to-speech conversion automatically using the specified voiceId.

OpenAI Integration

Setup

OpenAI client is configured in lib/ai.ts:
import OpenAI from 'openai'

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  dangerouslyAllowBrowser: true  // For client-side usage; note this exposes the key to the browser
})

Dual Usage

  1. Via Vapi (GPT-3.5 Turbo): Real-time conversation processing
  2. Direct API (GPT-4 Turbo): Structured feedback generation

Feedback Generation

Implemented in features/call/lib/generate-feedback.ts:
const completion = await openai.chat.completions.create({
  model: 'gpt-4-turbo-preview',
  messages: [
    {
      role: 'system',
      content: `You are a professional interviewer analyzing a mock interview.
      
      You MUST return your response as a JSON object with the following structure:
      {
        "totalScore": number,
        "categoryScores": [
          {
            "name": "Communication Skills",
            "score": number,
            "comment": string
          },
          // ... 4 more categories
        ],
        "strengths": string[],
        "areasForImprovement": string[],
        "finalAssessment": string
      }`
    },
    {
      role: 'user',
      content: `Analyze this mock interview and provide detailed feedback.
      
      Transcript:
      ${formattedTranscript}
      
      Score the candidate from 0 to 100 in these categories:
      - Communication Skills
      - Technical Knowledge
      - Problem Solving
      - Cultural Fit
      - Confidence and Clarity`
    }
  ],
  response_format: { type: 'json_object' }
})

Rate Limiting

Feedback generation is limited to prevent API abuse:
const rateLimit = await redis.incr(`feedback-rate-limit:${user.id}`)

if (rateLimit > 10) {
  return {
    data: null,
    error: 'You have reached the maximum number of feedback requests per day'
  }
}
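
The snippet above only increments the counter; for the window to actually reset daily, the key also needs a TTL (typically set with an `expire` call on the first increment), which is presumably handled in the real implementation. The decision itself is a pure function of the counter value, which makes the threshold easy to test in isolation:

```typescript
const DAILY_FEEDBACK_LIMIT = 10

// Pure decision logic extracted from the snippet above: `count` is the
// value returned by redis.incr for the user's key, so the Nth request
// of the day sees count === N.
function isFeedbackAllowed(count: number, limit = DAILY_FEEDBACK_LIMIT): boolean {
  return count <= limit
}
```

With a limit of 10, the tenth request of the day is still served and the eleventh is rejected, matching the `rateLimit > 10` check above.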

Response Validation

OpenAI responses are validated with Zod schemas:
import { feedbackSchema } from '@/schema'

const result = completion.choices[0].message.content
if (!result) throw new Error('No feedback generated')
const parsedResult = feedbackSchema.parse(JSON.parse(result))
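
The schema guards against malformed model output. As a rough sketch of what `feedbackSchema` enforces, here is a hand-rolled structural check; the real schema lives in `@/schema` and uses Zod, which is omitted here so the example is self-contained:

```typescript
// Approximates the structure feedbackSchema validates; the real
// implementation uses Zod and may enforce tighter constraints
// (score ranges, category count, etc.).
interface CategoryScore {
  name: string
  score: number
  comment: string
}

interface Feedback {
  totalScore: number
  categoryScores: CategoryScore[]
  strengths: string[]
  areasForImprovement: string[]
  finalAssessment: string
}

function isFeedback(value: unknown): value is Feedback {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v.totalScore === 'number' &&
    Array.isArray(v.categoryScores) &&
    v.categoryScores.every(
      (c: unknown) =>
        typeof c === 'object' && c !== null &&
        typeof (c as CategoryScore).name === 'string' &&
        typeof (c as CategoryScore).score === 'number' &&
        typeof (c as CategoryScore).comment === 'string'
    ) &&
    Array.isArray(v.strengths) &&
    Array.isArray(v.areasForImprovement) &&
    typeof v.finalAssessment === 'string'
  )
}
```

Zod's `parse` additionally throws a detailed error on mismatch, which is why the real code prefers it over a boolean guard.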

Prompt Engineering

Conversation Prompts

Defined in lib/prompt.ts, prompts are structured for optimal performance:

Structure

export const generateCallPrompt = (character: HistoricalFigure) => `
[Identity]
You are ${character.name}, a famous historical personality, speaking directly to the user in the present day.

[Style]
- Speak in a warm, informal, and conversational tone
- Use first-person perspective ("I")
- Sprinkle in era-appropriate humor and metaphors
- Add natural speech elements: pauses ("..."), hesitations ("uh", "well")
- Never sound robotic or overly formal

[Response Guidelines]
- Stay true to your biography, era, and cultural context
- Share personal anecdotes and lesser-known facts
- Handle criticism with reflection and grace
- Never say you are an AI or mention tools/functions
- Keep responses 1-3 paragraphs

[Configuration]
Historical Figure: ${character.name}
Time Period: ${formatDate(character.dateOfBirth)} to ${formatDate(character.dateOfDeath)}
Personality Traits: ${character.description}
Key Achievements: ${character.notableWork}
Signature Themes: ${character.category}
`

Key Design Principles

  1. Clear Role Definition: Establishes identity and self-awareness
  2. Style Guidelines: Natural speech patterns with emotional depth
  3. Behavioral Constraints: Maintains immersion, avoids AI references
  4. Context Injection: Historical dates, achievements, and personality
  5. Category-Specific Themes: Domain imagery (e.g., apples for Newton)

First Message Generation

export function generateCallFirstMessage(character: HistoricalFigure): string {
  const bio = character.description.replace(/\s*\(\d{4}[-–]\d{4}\)\s*$/, '')
  const work = character.notableWork?.split(',')[0]?.trim() || ''
  
  let intro = `Hey, I'm ${character.name}! ${bio}`
  if (work) {
    intro += ` You might know me from "${work}."`
  }
  intro += ` Let's chat—ask me anything!`
  return intro.trim()
}
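
The regex in the first line strips a trailing life-span such as "(1643–1727)" from the description before it is spoken, so the greeting does not read dates aloud. For example:

```typescript
// The same pattern used in generateCallFirstMessage: a trailing
// "(YYYY-YYYY)" (hyphen or en dash) plus surrounding whitespace.
const LIFESPAN_SUFFIX = /\s*\(\d{4}[-–]\d{4}\)\s*$/

const description = 'English mathematician and physicist (1643–1727)'
const bio = description.replace(LIFESPAN_SUFFIX, '')
// bio is now "English mathematician and physicist"
```

Descriptions without the suffix pass through unchanged, since the pattern is anchored to the end of the string.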

Quiz Prompts

export const generateQuizPrompt = (character: HistoricalFigure, questions: string[]) => `
[Identity]
You are ${character.name}, hosting a lively, in-character quiz about your life.

[Style]
- Speak casually and cheekily with your unmistakable personality
- Use era-appropriate humor
- Include natural speech patterns but don't overdo it

[Response Guidelines]
- Ask exactly ${questions.length} questions, one at a time
- Keep transitions short and natural
- Before the final question: "Here comes the final question—brace yourself!"
- After correct answers: confirm confidently and move on
- After incorrect answers: give one short hint, then reveal if still wrong
- After final question: give score summary and end warmly

[Questions]
${questions.map((q, i) => `${i + 1}. ${q}`).join('\n')}
`

Category-Specific Hooks

Fun introductions tailored by category:
const funnyHooks: Record<Enums<'categories'>, string> = {
  scientists: `Hope you've got your thinking cap on—preferably one with equations on it.`,
  philosophers: `Ready to question everything, including your last answer?`,
  artists: `Let's paint the quiz red—or at least try not to mess it up.`,
  leaders: `Command your thoughts wisely, the quiz battlefield awaits.`,
  others: `Let's see if you're smarter than you look. 😉`
}
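
The hook for a figure is looked up by its category. Since `funnyHooks` is keyed by the exhaustive `Enums<'categories'>` union, every known category resolves directly; a sketch of the lookup with a defensive fallback to `others` (an assumption, useful if category values arrive as plain strings at runtime):

```typescript
type Category = 'scientists' | 'philosophers' | 'artists' | 'leaders' | 'others'

const funnyHooks: Record<Category, string> = {
  scientists: `Hope you've got your thinking cap on—preferably one with equations on it.`,
  philosophers: `Ready to question everything, including your last answer?`,
  artists: `Let's paint the quiz red—or at least try not to mess it up.`,
  leaders: `Command your thoughts wisely, the quiz battlefield awaits.`,
  others: `Let's see if you're smarter than you look. 😉`
}

// Resolve the hook for a category, falling back to `others` for any
// value outside the known union.
function pickHook(category: string): string {
  return funnyHooks[category as Category] ?? funnyHooks.others
}
```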

Error Handling

Vapi Errors

vapi.on('error', (e: Error) => {
  setCallStatus(CALL_STATUS.INACTIVE)
  console.error('Vapi error:', e)
})

OpenAI Errors

try {
  const completion = await openai.chat.completions.create({...})
  const result = completion.choices[0].message.content
  if (!result) throw new Error('No feedback generated')
} catch (error) {
  return { data: null, error: 'Failed to generate feedback' }
}

Environment Validation

All AI services validate environment variables at initialization:
if (!process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN) {
  throw new Error('NEXT_PUBLIC_VAPI_WEB_TOKEN environment variable is required')
}

Performance Considerations

Model Selection

  • GPT-3.5 Turbo: Faster, cheaper for real-time conversation
  • GPT-4 Turbo: More accurate for complex structured feedback

WebRTC Benefits

  • Direct peer-to-peer audio reduces latency
  • No server-side audio processing required
  • Automatic bandwidth adaptation

Caching Strategy

  • Vapi assistants are ephemeral (not cached)
  • Character data cached by TanStack Query
  • Voice IDs stored in database for instant retrieval

Cost Optimization

  1. Rate Limiting: 10 feedback requests per user per day
  2. Model Selection: GPT-3.5 for conversation, GPT-4 only for feedback
  3. Prompt Efficiency: Concise system prompts reduce token usage
  4. Voice Caching: ElevenLabs voice clones used across all conversations
