Gambiarra is a local-first LLM sharing system that lets multiple users on a network pool their LLM resources. Think of it as an “LLM Club” where everyone shares their Ollama, LM Studio, LocalAI, or any other OpenAI-compatible endpoint.

Quickstart

Get up and running with Gambiarra in under 5 minutes

Installation

Install the CLI and SDK for your environment

CLI commands

Learn about all available CLI commands

SDK reference

Integrate Gambiarra with the Vercel AI SDK

Why Gambiarra?

If you’re working with local LLMs, you know the challenge: your gaming PC has a powerful GPU running Ollama, but your laptop doesn’t. Your teammate has a different model you’d like to try. Gambiarra solves this by creating a shared pool of LLM resources on your local network.

Key features

Local-first

Your data stays on your network. No cloud services, no external dependencies.

Universal compatibility

Works with any OpenAI-compatible API: Ollama, LM Studio, LocalAI, vLLM, and more.

Vercel AI SDK integration

Drop-in replacement for your existing AI SDK workflows.

Auto-discovery

mDNS/Bonjour support for zero-config networking.

Real-time monitoring

Beautiful Terminal UI for tracking room activity and participant health.

Production ready

Built with TypeScript, Bun, and modern tooling for reliability.

Use cases

Development teams

Share expensive LLM endpoints across your team: everyone can use a high-powered GPU server without needing SSH access to it.
Developer A shares their powerful GPU:

```shell
# Terminal 1: start the hub and join with Ollama
$ gambiarra serve --mdns
$ gambiarra create --name "Team Room"
Room created! Code: ABC123

$ gambiarra join ABC123 --model llama3 --endpoint http://localhost:11434
```

Developer B uses the shared model from their laptop:

```typescript
// Terminal 2: use the SDK
import { createGambiarra } from "gambiarra-sdk";
import { generateText } from "ai";

const gambiarra = createGambiarra({ roomCode: "ABC123" });
const result = await generateText({
  model: gambiarra.any(),
  prompt: "Write a function to sort an array",
});
```

Hackathons

Pool resources for AI projects. Everyone brings their laptop, and collectively you have access to multiple models running on different machines.

Research labs

Coordinate LLM access across multiple workstations. Each researcher can contribute their local models to a shared pool.

Home labs

Share your gaming PC’s LLM with your laptop. Run the heavy model on your desktop, access it from anywhere on your network.

Education

Classroom environments where students share compute resources. The instructor’s machine runs the models, students access them for assignments.

How it works

Gambiarra uses a simple HTTP + SSE architecture for universal compatibility:
┌─────────────────────────────────────────────────────────────┐
│                    GAMBIARRA HUB (HTTP)                     │
│                                                             │
│  Endpoints:                                                 │
│  • POST   /rooms                    (Create room)          │
│  • GET    /rooms                    (List rooms)           │
│  • POST   /rooms/:code/join         (Join room)            │
│  • POST   /rooms/:code/v1/chat/completions (Proxy)        │
│  • GET    /rooms/:code/events       (SSE updates)          │
└─────────────────────────────────────────────────────────────┘
       ▲                    ▲                      ▲
       │ HTTP               │ HTTP                 │ SSE
       │                    │                      │
  ┌────┴────┐    ┌─────────┴────────┐      ┌──────┴─────┐
  │   SDK   │    │  Participants    │      │    TUI     │
  └─────────┘    └──────────────────┘      └────────────┘

Core components

Hub
Central HTTP server that routes requests and manages rooms. Can run on any machine on your network.
Participants
LLM endpoints registered in a room. Each participant exposes an OpenAI-compatible API (Ollama, LM Studio, etc.).
SDK
Vercel AI SDK provider that proxies requests to the hub. Use it in your applications just like any other AI SDK provider.
CLI
Command-line tool for starting hubs, creating rooms, and joining as a participant.
TUI
Real-time monitoring interface using Server-Sent Events. Track participant health, model usage, and room activity.
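To make the pieces above concrete, a participant record as the hub might track it can be pictured as a small data structure. The field names below are illustrative assumptions for this sketch, not Gambiarra's actual schema:

```typescript
// Hypothetical shape of a participant record; field names are assumptions,
// not Gambiarra's wire format.
interface Participant {
  id: string;       // unique participant ID within the room
  model: string;    // model name exposed (e.g. "llama3")
  endpoint: string; // OpenAI-compatible base URL
  online: boolean;  // toggled by the health-check loop
  lastSeen: number; // Unix ms timestamp of the last successful health check
}

const example: Participant = {
  id: "joao",
  model: "llama3",
  endpoint: "http://localhost:11434",
  online: true,
  lastSeen: Date.now(),
};
```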

Model routing

The SDK provides three ways to route requests:
| Pattern | Example | Description |
|---|---|---|
| Participant ID | `gambiarra.participant("joao")` | Route to a specific participant |
| Model name | `gambiarra.model("llama3")` | Route to the first participant with this model |
| Any | `gambiarra.any()` | Route to a random online participant |
```typescript
import { createGambiarra } from "gambiarra-sdk";
import { generateText } from "ai";

const gambiarra = createGambiarra({ roomCode: "ABC123" });

const result = await generateText({
  model: gambiarra.participant("joao"),
  prompt: "Hello from a specific participant!",
});
```
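Under the hood, each routing pattern must resolve to a concrete participant. A minimal sketch of that resolution logic, under assumed names and a simplified participant shape (this is not Gambiarra's internal code):

```typescript
// Illustrative sketch of the three routing strategies.
type P = { id: string; model: string; online: boolean };

// Route to a specific participant by ID, if it is online.
function byParticipant(pool: P[], id: string): P | undefined {
  return pool.find((p) => p.id === id && p.online);
}

// Route to the first online participant serving a given model.
function byModel(pool: P[], model: string): P | undefined {
  return pool.find((p) => p.model === model && p.online);
}

// Route to a random online participant.
function anyParticipant(pool: P[]): P | undefined {
  const online = pool.filter((p) => p.online);
  return online[Math.floor(Math.random() * online.length)];
}
```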

Architecture highlights

Health checking

Participants automatically send health checks every 10 seconds. If a participant doesn’t respond for 30 seconds, it’s marked offline. This ensures your application always routes to available models.
```typescript
// Health checks happen automatically in packages/cli/src/commands/join.ts:182
const healthInterval = setInterval(async () => {
  await fetch(`${hubUrl}/rooms/${code}/health`, {
    method: "POST",
    body: JSON.stringify({ id: participantId }),
  });
}, HEALTH_CHECK_INTERVAL); // 10 seconds
```
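On the hub side, deciding whether a participant is stale is a timestamp comparison against the documented 30-second timeout. A minimal sketch, assuming the hub records a `lastSeen` timestamp per participant (names here are illustrative):

```typescript
// Mark a participant offline once it has been silent past the documented
// 30-second window. The function name and threshold constant are illustrative.
const OFFLINE_AFTER_MS = 30_000;

function isOnline(lastSeenMs: number, nowMs: number): boolean {
  return nowMs - lastSeenMs <= OFFLINE_AFTER_MS;
}
```

With 10-second health checks, a participant can miss two checks and still be routed to; the third miss crosses the 30-second threshold.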

OpenAI compatibility

Gambiarra acts as a transparent proxy for OpenAI-compatible requests. Your existing code works without modification:
```typescript
// packages/core/src/hub.ts:269
const targetUrl = `${participant.endpoint}/v1/chat/completions`;

const response = await fetch(targetUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ ...body, model: participant.model }),
});
```

Streaming support

Full support for streaming responses using Server-Sent Events:
```typescript
import { createGambiarra } from "gambiarra-sdk";
import { streamText } from "ai";

const gambiarra = createGambiarra({ roomCode: "ABC123" });

const stream = await streamText({
  model: gambiarra.model("llama3"),
  prompt: "Write a story about AI",
});

for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}
```
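Over the wire, streamed tokens typically arrive as Server-Sent Events, one `data: ...` line per chunk. A minimal sketch of extracting text deltas from such lines, assuming OpenAI-style JSON payloads and a `[DONE]` sentinel (this is illustrative, not the SDK's actual parser):

```typescript
// Parse one OpenAI-style SSE line into a text delta, or null if the line
// carries no content (comments, keepalives, or the [DONE] terminator).
function parseSseLine(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return null;
  const event = JSON.parse(payload);
  return event.choices?.[0]?.delta?.content ?? null;
}
```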

Security considerations

Gambiarra is designed for trusted local networks. It currently has no built-in authentication and uses plain HTTP.
Best practices for production use:
- Run on an isolated network (VPN, WireGuard, or air-gapped)
- Use a reverse proxy (Caddy, Nginx) for HTTPS and authentication
- Enable password protection for rooms when creating them:

  ```shell
  gambiarra create --name "Secure Room" --password mySecretPass
  ```

- Consider network-level security (firewall rules, VLANs)

Supported providers

Gambiarra works with any OpenAI-compatible API:
| Provider | Default endpoint | Notes |
|---|---|---|
| Ollama | `http://localhost:11434` | Most popular local LLM server |
| LM Studio | `http://localhost:1234` | GUI-based LLM management |
| LocalAI | `http://localhost:8080` | Self-hosted OpenAI alternative |
| vLLM | `http://localhost:8000` | High-performance inference |
| text-generation-webui | `http://localhost:5000` | Gradio-based interface |
| Custom | Any URL | Any OpenAI-compatible endpoint |
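The defaults above can be captured in a small lookup table, for example in tooling that fills in an endpoint when only a provider name is given. This helper is illustrative and not part of Gambiarra's CLI or SDK:

```typescript
// Default endpoints from the table above; the helper itself is hypothetical.
const DEFAULT_ENDPOINTS: Record<string, string> = {
  "ollama": "http://localhost:11434",
  "lm-studio": "http://localhost:1234",
  "localai": "http://localhost:8080",
  "vllm": "http://localhost:8000",
  "text-generation-webui": "http://localhost:5000",
};

function defaultEndpoint(provider: string): string | undefined {
  return DEFAULT_ENDPOINTS[provider.toLowerCase()];
}
```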

Next steps

Installation

Install the CLI and SDK

Quickstart

Get started in 5 minutes

CLI reference

Learn all CLI commands

SDK reference

Integrate with your app
