Meeting summarization CLI demo

This example is a 100% local meeting summarization tool that runs on your machine using LiquidAI/LFM2-2.6B-Transcript, a small language model specialized in summarizing meeting transcripts, powered by llama.cpp for fast inference.

What it does

This tool provides a simple CLI to summarize meeting transcripts locally:
  • Processes meeting transcripts without sending data to any cloud service
  • Uses a specialized 2.6B parameter model optimized for meeting summaries
  • Streams tokens in real time so you can see the summary being generated
  • Can be chained with an audio transcription model for a complete audio-to-summary pipeline

Prerequisites

1. Install uv

If you don’t have uv installed, install it with:
curl -LsSf https://astral.sh/uv/install.sh | sh

Quick start

1. Run with default transcript

Run the tool without cloning the repository using a uv run one-liner:
uv run https://raw.githubusercontent.com/Liquid4All/cookbook/refs/heads/main/examples/meeting-summarization/summarize.py
This uses the default example transcript.
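uv can execute a remote script like this because the script can declare its own dependencies inline (PEP 723 script metadata), which uv installs into an ephemeral environment before running. As a sketch only, a header of this shape would make the one-liner work; the exact dependency list used by summarize.py is an assumption:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "llama-cpp-python",  # llama.cpp bindings (assumed dependency)
#     "rich",              # streaming console output (assumed dependency)
# ]
# ///
```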
2. Use a custom transcript

Pass a different transcript file using the --transcript-file argument (supports local files or HTTP/HTTPS URLs):
uv run https://raw.githubusercontent.com/Liquid4All/cookbook/refs/heads/main/examples/meeting-summarization/summarize.py \
  --transcript-file https://raw.githubusercontent.com/Liquid4All/cookbook/refs/heads/main/examples/meeting-summarization/transcripts/example_2.txt
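Accepting both local paths and HTTP/HTTPS URLs for `--transcript-file` needs only a small helper. This is a hypothetical sketch, not the tool's actual implementation; the function name `load_transcript` is invented for illustration:

```python
from pathlib import Path
from urllib.request import urlopen

def load_transcript(source: str) -> str:
    """Read a transcript from a local file or an HTTP/HTTPS URL."""
    if source.startswith(("http://", "https://")):
        # Fetch the remote transcript and decode it as UTF-8 text
        with urlopen(source) as resp:
            return resp.read().decode("utf-8")
    # Otherwise treat the argument as a local file path
    return Path(source).read_text(encoding="utf-8")
```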
3. Clone and customize (optional)

For deeper experimentation and code modification:
git clone https://github.com/Liquid4All/cookbook.git
cd cookbook/examples/meeting-summarization
Run with custom parameters:
uv run summarize.py \
  --model LiquidAI/LFM2-2.6B-Transcript-GGUF \
  --hf-model-file LFM2-2.6B-Transcript-1-GGUF.gguf \
  --transcript-file transcripts/example_1.txt
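The flags above map onto a small command-line interface. A sketch of how they might be declared with argparse follows; the default values and help strings are assumptions, not the script's actual definitions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Summarize a meeting transcript locally."
    )
    parser.add_argument("--model", default="LiquidAI/LFM2-2.6B-Transcript-GGUF",
                        help="Hugging Face repo id of the GGUF model")
    parser.add_argument("--hf-model-file", default=None,
                        help="Specific .gguf file to load from the repo")
    parser.add_argument("--transcript-file", default="transcripts/example_1.txt",
                        help="Local path or HTTP/HTTPS URL of the transcript")
    return parser
```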

How it works

The CLI uses the llama.cpp Python bindings (llama-cpp-python), which download the GGUF model from Hugging Face and load it for inference:
from llama_cpp import Llama

model = Llama.from_pretrained(
    repo_id="LiquidAI/LFM2-2.6B-Transcript-GGUF",
    filename="LFM2-2.6B-Transcript-1-GGUF.gguf",
    n_ctx=8192,      # context window large enough for long transcripts
    n_threads=4,     # CPU threads for inference
    verbose=False,
)
Tokens are streamed to the console in real time:
stream = model.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": transcript},
    ],
    max_tokens=2048,
    temperature=0.0,  # deterministic output for reproducible summaries
    top_p=0.9,
    stream=True,      # yield chunks as they are generated
)

summary_text = ""
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        token = delta["content"]
        summary_text += token
        # rich Console prints each token without a trailing newline
        console.print(token, end="", highlight=False)
The llama.cpp backend is built and optimized for your platform automatically on the first run, so no manual setup is required.

Next steps

Build a complete 2-step pipeline:
  1. Use an audio transcription model to convert meeting audio to text
  2. Pipe the transcript into this summarization tool
This entire workflow can run on your machine without any cloud services or API keys.
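The two-step pipeline amounts to composing a transcription stage with the summarization stage. A minimal sketch with stubbed stages follows; in practice the first callable would wrap a local ASR model and the second would wrap the llama.cpp summarizer shown above:

```python
from typing import Callable

def run_pipeline(audio_path: str,
                 transcribe: Callable[[str], str],
                 summarize: Callable[[str], str]) -> str:
    """Audio -> transcript -> summary, all on the local machine."""
    transcript = transcribe(audio_path)
    return summarize(transcript)

# Stub stages for illustration only; swap in real models.
fake_transcribe = lambda path: f"Transcript of {path}"
fake_summarize = lambda text: f"Summary: {text[:20]}"
```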

Source code

View the complete source code on GitHub.
