
Overview

Echoes of the Past integrates three AI services to power real-time voice conversations:
  1. Vapi AI: Orchestrates voice conversations with WebRTC
  2. ElevenLabs: Generates realistic voice synthesis for historical figures
  3. OpenAI: Provides language understanding and structured feedback generation

Vapi AI Integration

Setup

Vapi is initialized as a singleton client in lib/vapi.ts:
import Vapi from '@vapi-ai/web'

if (!process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN) {
  throw new Error('NEXT_PUBLIC_VAPI_WEB_TOKEN environment variable is required')
}

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN)

Assistant Configuration

Each conversation creates a transient Vapi assistant with character-specific configuration:
const assistant: CreateAssistantDTO = {
  name: character.name,
  firstMessage: generateCallFirstMessage(character),
  model: {
    provider: 'openai',
    model: 'gpt-3.5-turbo',
    temperature: 0.7,
    messages: [
      {
        role: 'system',
        content: systemPrompt  // Character-specific prompt
      }
    ]
  },
  voice: {
    provider: '11labs',
    voiceId: character.voiceId,
    stability: 0.4,
    similarityBoost: 0.8,
    speed: 1,
    style: 0.5,
    useSpeakerBoost: true
  },
  messagePlan: {
    idleMessages: [
      'If you have a question, feel free to ask',
      'Are you there?',
      'What are you thinking? I can help you!!',
      "I'm here whenever you're ready to continue"
    ],
    idleTimeoutSeconds: 15,
    idleMessageMaxSpokenCount: 3,
    idleMessageResetCountOnUserSpeechEnabled: true
  },
  backgroundDenoisingEnabled: true
}

Voice Parameters Explained

  • stability (number, 0–1, default 0.4): Lower values add more variation and emotion; higher values are more consistent but less expressive.
  • similarityBoost (number, 0–1, default 0.8): How closely the output matches the original voice sample; higher values increase similarity.
  • speed (number, default 1): Speech rate multiplier; 1.0 is normal speed.
  • style (number, 0–1, default 0.5): Voice expressiveness; higher values add more dramatic intonation.
  • useSpeakerBoost (boolean, default true): Enhances voice clarity, especially in noisy environments.

Event Handling

The useVapi hook manages real-time events:
vapi.on('speech-start', () => setIsSpeechActive(true))
vapi.on('speech-end', () => setIsSpeechActive(false))
vapi.on('call-start', () => setCallStatus(CALL_STATUS.ACTIVE))
vapi.on('call-end', () => setCallStatus(CALL_STATUS.INACTIVE))
vapi.on('volume-level', (volume) => setAudioLevel(volume))
vapi.on('message', (message) => {
  if (message.type === 'transcript' && message.transcriptType === 'partial') {
    setActiveTranscript(message)  // Real-time transcription
  } else {
    setMessages(prev => [...prev, message])  // Final messages
  }
})
vapi.on('error', (e) => console.error(e))
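
Listeners registered this way should also be removed when the hook unmounts, or repeated mounts will stack duplicate handlers. A minimal sketch of the register/teardown pattern, using a stand-in emitter so it runs self-contained (the real Vapi client exposes an EventEmitter-style `on`/`off` surface):

```typescript
// Stand-in for the Vapi client's event surface; only used so this
// sketch runs without the SDK.
type Handler = (payload?: unknown) => void

class MiniEmitter {
  private handlers = new Map<string, Set<Handler>>()
  on(event: string, fn: Handler): void {
    const set = this.handlers.get(event) ?? new Set<Handler>()
    set.add(fn)
    this.handlers.set(event, set)
  }
  off(event: string, fn: Handler): void {
    this.handlers.get(event)?.delete(fn)
  }
  listenerCount(event: string): number {
    return this.handlers.get(event)?.size ?? 0
  }
}

// Register every handler once and return a single teardown function,
// the shape a React useEffect cleanup expects.
function bindCallEvents(
  client: MiniEmitter,
  onCallStart: Handler,
  onCallEnd: Handler
): () => void {
  client.on('call-start', onCallStart)
  client.on('call-end', onCallEnd)
  return () => {
    client.off('call-start', onCallStart)
    client.off('call-end', onCallEnd)
  }
}
```

Returning the teardown from the effect guarantees each mount leaves exactly the listeners it added.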

Programmatic Question Injection

For quiz mode, questions can be injected into the conversation:
vapi.send({
  type: 'add-message',
  message: {
    role: 'system',
    content: `The user has pressed a button for you to ask them ${question}.`
  }
})

ElevenLabs Integration

Setup

ElevenLabs client is configured in lib/elevenlabs.ts:
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js'

export const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVEN_LABS_API_KEY
})

Voice Selection

Each historical figure has a pre-selected ElevenLabs voice ID stored in the historicalFigures.voiceId field. Voices are chosen to match:
  • Historical accuracy (e.g., British accent for Isaac Newton)
  • Gender and age of the figure
  • Tone and speaking style appropriate to their personality
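
Several snippets on this page read fields off the character object. Pulled together, the HistoricalFigure shape looks roughly like this; it is a reconstruction from usage on this page, not the actual database schema, which may contain more columns:

```typescript
// Reconstructed from the fields referenced across this page.
interface HistoricalFigure {
  name: string
  description: string        // Personality traits; may end with a "(YYYY-YYYY)" life-span
  notableWork: string        // Comma-separated list of key achievements
  category: 'scientists' | 'philosophers' | 'artists' | 'leaders' | 'others'
  voiceId: string            // Pre-selected ElevenLabs voice ID
  dateOfBirth: string
  dateOfDeath: string
}

// Illustrative sample; the voiceId is a placeholder, since real
// ElevenLabs voice IDs are opaque strings.
const sample: HistoricalFigure = {
  name: 'Isaac Newton',
  description: 'English mathematician and physicist (1643–1727)',
  notableWork: 'Principia Mathematica, calculus, laws of motion',
  category: 'scientists',
  voiceId: 'voice-id-placeholder',
  dateOfBirth: '1643-01-04',
  dateOfDeath: '1727-03-31'
}
```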

Integration with Vapi

ElevenLabs is used as the voice provider in Vapi’s assistant configuration. Vapi handles the text-to-speech conversion automatically using the specified voiceId.

OpenAI Integration

Setup

OpenAI client is configured in lib/ai.ts:
import OpenAI from 'openai'

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  dangerouslyAllowBrowser: true  // For client-side usage; note this exposes the key to the browser
})

Dual Usage

  1. Via Vapi (GPT-3.5 Turbo): Real-time conversation processing
  2. Direct API (GPT-4 Turbo): Structured feedback generation

Feedback Generation

Implemented in features/call/lib/generate-feedback.ts:
const completion = await openai.chat.completions.create({
  model: 'gpt-4-turbo-preview',
  messages: [
    {
      role: 'system',
      content: `You are a professional interviewer analyzing a mock interview.
      
      You MUST return your response as a JSON object with the following structure:
      {
        "totalScore": number,
        "categoryScores": [
          {
            "name": "Communication Skills",
            "score": number,
            "comment": string
          },
          // ... 4 more categories
        ],
        "strengths": string[],
        "areasForImprovement": string[],
        "finalAssessment": string
      }`
    },
    {
      role: 'user',
      content: `Analyze this mock interview and provide detailed feedback.
      
      Transcript:
      ${formattedTranscript}
      
      Score the candidate from 0 to 100 in these categories:
      - Communication Skills
      - Technical Knowledge
      - Problem Solving
      - Cultural Fit
      - Confidence and Clarity`
    }
  ],
  response_format: { type: 'json_object' }
})

Rate Limiting

Feedback generation is limited to prevent API abuse:
const rateLimit = await redis.incr(`feedback-rate-limit:${user.id}`)

if (rateLimit > 10) {
  return {
    data: null,
    error: 'You have reached the maximum number of feedback requests per day'
  }
}
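
The snippet above only increments the counter; for the window to actually reset daily, the key also needs a TTL (typically set with an `expire` call on the first increment), which is presumably handled in the real implementation. The decision itself is a pure function of the counter value, which makes the threshold easy to test in isolation:

```typescript
const DAILY_FEEDBACK_LIMIT = 10

// Pure decision logic extracted from the snippet above: `count` is the
// value returned by redis.incr for the user's key, so the Nth request
// of the day sees count === N.
function isFeedbackAllowed(count: number, limit = DAILY_FEEDBACK_LIMIT): boolean {
  return count <= limit
}
```

With a limit of 10, the tenth request of the day is still served and the eleventh is rejected, matching the `rateLimit > 10` check above.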

Response Validation

OpenAI responses are validated with Zod schemas:
import { feedbackSchema } from '@/schema'

const result = completion.choices[0].message.content
if (!result) throw new Error('No feedback generated')
const parsedResult = feedbackSchema.parse(JSON.parse(result))
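
The schema guards against malformed model output. As a rough sketch of what `feedbackSchema` enforces, here is a hand-rolled structural check; the real schema lives in `@/schema` and uses Zod, which is omitted here so the example is self-contained:

```typescript
// Approximates the structure feedbackSchema validates; the real
// implementation uses Zod and may enforce tighter constraints
// (score ranges, category count, etc.).
interface CategoryScore {
  name: string
  score: number
  comment: string
}

interface Feedback {
  totalScore: number
  categoryScores: CategoryScore[]
  strengths: string[]
  areasForImprovement: string[]
  finalAssessment: string
}

function isFeedback(value: unknown): value is Feedback {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v.totalScore === 'number' &&
    Array.isArray(v.categoryScores) &&
    v.categoryScores.every(
      (c: unknown) =>
        typeof c === 'object' && c !== null &&
        typeof (c as CategoryScore).name === 'string' &&
        typeof (c as CategoryScore).score === 'number' &&
        typeof (c as CategoryScore).comment === 'string'
    ) &&
    Array.isArray(v.strengths) &&
    Array.isArray(v.areasForImprovement) &&
    typeof v.finalAssessment === 'string'
  )
}
```

Zod's `parse` additionally throws a detailed error on mismatch, which is why the real code prefers it over a boolean guard.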

Prompt Engineering

Conversation Prompts

Defined in lib/prompt.ts, prompts are structured for optimal performance:

Structure

export const generateCallPrompt = (character: HistoricalFigure) => `
[Identity]
You are ${character.name}, a famous historical personality, speaking directly to the user in the present day.

[Style]
- Speak in a warm, informal, and conversational tone
- Use first-person perspective ("I")
- Sprinkle in era-appropriate humor and metaphors
- Add natural speech elements: pauses ("..."), hesitations ("uh", "well")
- Never sound robotic or overly formal

[Response Guidelines]
- Stay true to your biography, era, and cultural context
- Share personal anecdotes and lesser-known facts
- Handle criticism with reflection and grace
- Never say you are an AI or mention tools/functions
- Keep responses 1-3 paragraphs

[Configuration]
Historical Figure: ${character.name}
Time Period: ${formatDate(character.dateOfBirth)} to ${formatDate(character.dateOfDeath)}
Personality Traits: ${character.description}
Key Achievements: ${character.notableWork}
Signature Themes: ${character.category}
`

Key Design Principles

  1. Clear Role Definition: Establishes identity and self-awareness
  2. Style Guidelines: Natural speech patterns with emotional depth
  3. Behavioral Constraints: Maintains immersion, avoids AI references
  4. Context Injection: Historical dates, achievements, and personality
  5. Category-Specific Themes: Domain imagery (e.g., apples for Newton)

First Message Generation

export function generateCallFirstMessage(character: HistoricalFigure): string {
  const bio = character.description.replace(/\s*\(\d{4}[-–]\d{4}\)\s*$/, '')
  const work = character.notableWork?.split(',')[0]?.trim() || ''
  
  let intro = `Hey, I'm ${character.name}! ${bio}`
  if (work) {
    intro += ` You might know me from "${work}."`
  }
  intro += ` Let's chat—ask me anything!`
  return intro.trim()
}
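
The regex in the first line strips a trailing life-span such as "(1643–1727)" from the description before it is spoken, so the greeting does not read dates aloud. For example:

```typescript
// The same pattern used in generateCallFirstMessage: a trailing
// "(YYYY-YYYY)" (hyphen or en dash) plus surrounding whitespace.
const LIFESPAN_SUFFIX = /\s*\(\d{4}[-–]\d{4}\)\s*$/

const description = 'English mathematician and physicist (1643–1727)'
const bio = description.replace(LIFESPAN_SUFFIX, '')
// bio is now "English mathematician and physicist"
```

Descriptions without the suffix pass through unchanged, since the pattern is anchored to the end of the string.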

Quiz Prompts

export const generateQuizPrompt = (character: HistoricalFigure, questions: string[]) => `
[Identity]
You are ${character.name}, hosting a lively, in-character quiz about your life.

[Style]
- Speak casually and cheekily with your unmistakable personality
- Use era-appropriate humor
- Include natural speech patterns but don't overdo it

[Response Guidelines]
- Ask exactly ${questions.length} questions, one at a time
- Keep transitions short and natural
- Before the final question: "Here comes the final question—brace yourself!"
- After correct answers: confirm confidently and move on
- After incorrect answers: give one short hint, then reveal if still wrong
- After final question: give score summary and end warmly

[Questions]
${questions.map((q, i) => `${i + 1}. ${q}`).join('\n')}
`

Category-Specific Hooks

Fun introductions tailored by category:
const funnyHooks: Record<Enums<'categories'>, string> = {
  scientists: `Hope you've got your thinking cap on—preferably one with equations on it.`,
  philosophers: `Ready to question everything, including your last answer?`,
  artists: `Let's paint the quiz red—or at least try not to mess it up.`,
  leaders: `Command your thoughts wisely, the quiz battlefield awaits.`,
  others: `Let's see if you're smarter than you look. 😉`
}
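
The hook for a figure is looked up by its category. Since `funnyHooks` is keyed by the exhaustive `Enums<'categories'>` union, every known category resolves directly; a sketch of the lookup with a defensive fallback to `others` (an assumption, useful if category values arrive as plain strings at runtime):

```typescript
type Category = 'scientists' | 'philosophers' | 'artists' | 'leaders' | 'others'

const funnyHooks: Record<Category, string> = {
  scientists: `Hope you've got your thinking cap on—preferably one with equations on it.`,
  philosophers: `Ready to question everything, including your last answer?`,
  artists: `Let's paint the quiz red—or at least try not to mess it up.`,
  leaders: `Command your thoughts wisely, the quiz battlefield awaits.`,
  others: `Let's see if you're smarter than you look. 😉`
}

// Resolve the hook for a category, falling back to `others` for any
// value outside the known union.
function pickHook(category: string): string {
  return funnyHooks[category as Category] ?? funnyHooks.others
}
```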

Error Handling

Vapi Errors

vapi.on('error', (e: Error) => {
  setCallStatus(CALL_STATUS.INACTIVE)
  console.error('Vapi error:', e)
})

OpenAI Errors

try {
  const completion = await openai.chat.completions.create({...})
  const result = completion.choices[0].message.content
  if (!result) throw new Error('No feedback generated')
} catch (error) {
  return { data: null, error: 'Failed to generate feedback' }
}

Environment Validation

All AI services validate environment variables at initialization:
if (!process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN) {
  throw new Error('NEXT_PUBLIC_VAPI_WEB_TOKEN environment variable is required')
}

Performance Considerations

Model Selection

  • GPT-3.5 Turbo: Faster, cheaper for real-time conversation
  • GPT-4 Turbo: More accurate for complex structured feedback

WebRTC Benefits

  • Direct peer-to-peer audio reduces latency
  • No server-side audio processing required
  • Automatic bandwidth adaptation

Caching Strategy

  • Vapi assistants are ephemeral (not cached)
  • Character data cached by TanStack Query
  • Voice IDs stored in database for instant retrieval

Cost Optimization

  1. Rate Limiting: 10 feedback requests per user per day
  2. Model Selection: GPT-3.5 for conversation, GPT-4 only for feedback
  3. Prompt Efficiency: Concise system prompts reduce token usage
  4. Voice Caching: ElevenLabs voice clones used across all conversations
