Voice chat

A voice chat application that uses the LFM2-Audio-1.5B model to generate conversational audio responses. The application could in principle run entirely locally, but the liquid-audio library requires CUDA. For that reason the model is wrapped in a Modal function and deployed to a serverless GPU environment with CUDA, so you can run it even without an NVIDIA GPU of your own. You record your voice, send it for processing, and receive an audio response that plays automatically.

What’s in this example?

This example demonstrates how to build an interactive voice chat application using:

Audio Recording: Records audio from your microphone with automatic silence detection
Cloud Processing: Uses Modal to run the LFM2-Audio-1.5B model on GPU in the cloud
Audio Playback: Automatically plays the generated audio response

The application works by:

Recording your voice question from the microphone (with auto-stop on silence)
Uploading the audio to a Modal volume
Processing the audio with LFM2-Audio-1.5B on a GPU instance to generate an interleaved text and audio response
Downloading the generated audio response
Playing the response through your speakers

The model generates responses that can include both text and audio tokens, creating a natural conversational experience.
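The five steps above can be pictured as a simple chain of stages. The following is a minimal sketch, not the example's actual code: the class name VoiceChatPipeline and the stage names are hypothetical, and each stage is injected as a plain callable so that the flow can be read (and tried with stubs) without a microphone, a Modal account, or a GPU.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceChatPipeline:
    """Hypothetical wiring of the five steps; every stage is an injected callable."""

    record: Callable[[], bytes]        # microphone -> recorded audio bytes
    upload: Callable[[bytes], str]     # audio bytes -> path of the question on the remote volume
    process: Callable[[str], str]      # remote question path -> remote answer path
    download: Callable[[str], bytes]   # remote answer path -> answer audio bytes
    play: Callable[[bytes], None]      # answer audio bytes -> speakers

    def run_once(self) -> bytes:
        """Run one question/answer round trip and return the answer audio."""
        question = self.record()
        remote_question = self.upload(question)
        remote_answer = self.process(remote_question)
        answer = self.download(remote_answer)
        self.play(answer)
        return answer
```

In the real application, the process stage would correspond to the call into the Modal function that runs LFM2-Audio-1.5B, and upload and download would read from and write to the Modal volume; here they are left abstract on purpose.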

Tools

This example uses the following tools and libraries:

liquid-audio: Python library for working with LFM2-Audio models
Modal: Serverless cloud platform for running GPU workloads
PyAudio: Cross-platform audio I/O library for recording from microphone
pygame: Audio playback library
torchaudio: Audio processing utilities for PyTorch
rich: Terminal formatting library for audio visualization

Prerequisites

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Set up Modal

Create a Modal account at modal.com
Install the Modal CLI: uv add modal
Authenticate: uv run modal token new

Grant microphone permissions

Ensure microphone permissions are granted to your terminal/IDE

How to run it

Deploy the server (first time only)

make deploy-server

Or directly:

uv run modal deploy -m src.voice_chat.server

Run the client

make run

Or directly:

uv run modal run -m src.voice_chat.client

Follow the prompts

The application will start recording when you run it
Speak your question into the microphone
Recording will automatically stop after 2 seconds of silence
The audio will be uploaded to Modal and processed
The generated audio response will be downloaded and played automatically
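One common way to implement the auto-stop behavior described above is to measure the loudness of each incoming audio chunk and count how many consecutive chunks fall below a threshold. The sketch below assumes 16-bit little-endian mono PCM at 16 kHz in chunks of 1024 frames and an RMS threshold of 500; all of these values, and the names chunk_rms and SilenceDetector, are illustrative assumptions rather than the example's actual settings. It also assumes that recording should not stop before any speech has been heard.

```python
import math
import struct


def chunk_rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit little-endian mono PCM."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class SilenceDetector:
    """Signals a stop once enough consecutive quiet chunks follow some speech."""

    def __init__(self, sample_rate=16000, chunk_frames=1024,
                 threshold=500.0, silence_seconds=2.0):
        # All defaults are assumed values for illustration only.
        self.threshold = threshold
        # Number of consecutive quiet chunks that add up to the silence window.
        self.chunks_needed = math.ceil(silence_seconds * sample_rate / chunk_frames)
        self.silent_chunks = 0
        self.heard_speech = False

    def feed(self, chunk: bytes) -> bool:
        """Process one chunk; return True when recording should stop."""
        if chunk_rms(chunk) >= self.threshold:
            self.heard_speech = True
            self.silent_chunks = 0      # any loud chunk resets the silence counter
        elif self.heard_speech:
            self.silent_chunks += 1
        return self.heard_speech and self.silent_chunks >= self.chunks_needed
```

In a recording loop, each chunk read from the microphone stream would be appended to a buffer and passed to feed, and the loop would exit as soon as feed returns True. The threshold usually needs tuning for the microphone and the room.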

Each session's generated audio is saved locally as answer_YYYYMMDD_HHMMSS.wav.
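A file name in the answer_YYYYMMDD_HHMMSS.wav pattern can be produced with the standard library alone. This is a small sketch under assumptions: the helper name answer_path is hypothetical, and using local time and the current directory as defaults is a guess, not something the example confirms.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def answer_path(directory: Path = Path("."), now: Optional[datetime] = None) -> Path:
    """Build a timestamped path such as answer_20250102_030405.wav (hypothetical helper)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return directory / f"answer_{stamp}.wav"
```

Because the timestamp has one-second resolution, two answers generated within the same second would map to the same file name.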

Source code

View the complete source code on GitHub.
