Gambiarra is a local-first LLM sharing system that lets multiple users on a network pool their LLM resources. Think of it as an “LLM Club” where everyone shares their Ollama, LM Studio, LocalAI, or any other OpenAI-compatible endpoint.

Quickstart

Get up and running with Gambiarra in under 5 minutes

Installation

Install the CLI and SDK for your environment

CLI commands

Learn about all available CLI commands

SDK reference

Integrate Gambiarra with the Vercel AI SDK

Why Gambiarra?

If you’re working with local LLMs, you know the challenge: your gaming PC has a powerful GPU running Ollama, but your laptop doesn’t. Your teammate has a different model you’d like to try. Gambiarra solves this by creating a shared pool of LLM resources on your local network.

Key features

Local-first

Your data stays on your network. No cloud services, no external dependencies.

Universal compatibility

Works with any OpenAI-compatible API: Ollama, LM Studio, LocalAI, vLLM, and more.

Vercel AI SDK integration

Drop-in replacement for your existing AI SDK workflows.

Auto-discovery

mDNS/Bonjour support for zero-config networking.

Real-time monitoring

Beautiful Terminal UI for tracking room activity and participant health.

Production ready

Built with TypeScript, Bun, and modern tooling for reliability.

Use cases

Development teams

Share expensive LLM endpoints across your team: everyone can use a high-powered GPU server without needing SSH access to it.
Developer A shares their powerful GPU:

```shell
# Terminal 1: start the hub and join with Ollama
$ gambiarra serve --mdns
$ gambiarra create --name "Team Room"
Room created! Code: ABC123

$ gambiarra join ABC123 --model llama3 --endpoint http://localhost:11434
```

Developer B uses the shared model from their laptop:

```typescript
// Terminal 2: use the SDK
import { createGambiarra } from "gambiarra-sdk";
import { generateText } from "ai";

const gambiarra = createGambiarra({ roomCode: "ABC123" });
const result = await generateText({
  model: gambiarra.any(),
  prompt: "Write a function to sort an array",
});
```

Hackathons

Pool resources for AI projects. Everyone brings their laptop, and collectively you have access to multiple models running on different machines.

Research labs

Coordinate LLM access across multiple workstations. Each researcher can contribute their local models to a shared pool.

Home labs

Share your gaming PC’s LLM with your laptop. Run the heavy model on your desktop, access it from anywhere on your network.

Education

Classroom environments where students share compute resources. The instructor’s machine runs the models, students access them for assignments.

How it works

Gambiarra uses a simple HTTP + SSE architecture for universal compatibility:
┌─────────────────────────────────────────────────────────────┐
│                    GAMBIARRA HUB (HTTP)                     │
│                                                             │
│  Endpoints:                                                 │
│  • POST   /rooms                    (Create room)          │
│  • GET    /rooms                    (List rooms)           │
│  • POST   /rooms/:code/join         (Join room)            │
│  • POST   /rooms/:code/v1/chat/completions (Proxy)        │
│  • GET    /rooms/:code/events       (SSE updates)          │
└─────────────────────────────────────────────────────────────┘
       ▲                    ▲                      ▲
       │ HTTP               │ HTTP                 │ SSE
       │                    │                      │
  ┌────┴────┐    ┌─────────┴────────┐      ┌──────┴─────┐
  │   SDK   │    │  Participants    │      │    TUI     │
  └─────────┘    └──────────────────┘      └────────────┘

Core components

Hub
Central HTTP server that routes requests and manages rooms. Can run on any machine on your network.
Participants
LLM endpoints registered in a room. Each participant exposes an OpenAI-compatible API (Ollama, LM Studio, etc.).
SDK
Vercel AI SDK provider that proxies requests to the hub. Use it in your applications just like any other AI SDK provider.
CLI
Command-line tool for starting hubs, creating rooms, and joining as a participant.
TUI
Real-time monitoring interface using Server-Sent Events. Track participant health, model usage, and room activity.
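To make the pieces above concrete, a participant record as the hub might track it can be pictured as a small data structure. The field names below are illustrative assumptions for this sketch, not Gambiarra's actual schema:

```typescript
// Hypothetical shape of a participant record; field names are assumptions,
// not Gambiarra's wire format.
interface Participant {
  id: string;       // unique participant ID within the room
  model: string;    // model name exposed (e.g. "llama3")
  endpoint: string; // OpenAI-compatible base URL
  online: boolean;  // toggled by the health-check loop
  lastSeen: number; // Unix ms timestamp of the last successful health check
}

const example: Participant = {
  id: "joao",
  model: "llama3",
  endpoint: "http://localhost:11434",
  online: true,
  lastSeen: Date.now(),
};
```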

Model routing

The SDK provides three ways to route requests:
| Pattern | Example | Description |
|---|---|---|
| Participant ID | `gambiarra.participant("joao")` | Route to a specific participant |
| Model name | `gambiarra.model("llama3")` | Route to the first participant with this model |
| Any | `gambiarra.any()` | Route to a random online participant |
```typescript
import { createGambiarra } from "gambiarra-sdk";
import { generateText } from "ai";

const gambiarra = createGambiarra({ roomCode: "ABC123" });

const result = await generateText({
  model: gambiarra.participant("joao"),
  prompt: "Hello from a specific participant!",
});
```
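Under the hood, each routing pattern must resolve to a concrete participant. A minimal sketch of that resolution logic, under assumed names and a simplified participant shape (this is not Gambiarra's internal code):

```typescript
// Illustrative sketch of the three routing strategies.
type P = { id: string; model: string; online: boolean };

// Route to a specific participant by ID, if it is online.
function byParticipant(pool: P[], id: string): P | undefined {
  return pool.find((p) => p.id === id && p.online);
}

// Route to the first online participant serving a given model.
function byModel(pool: P[], model: string): P | undefined {
  return pool.find((p) => p.model === model && p.online);
}

// Route to a random online participant.
function anyParticipant(pool: P[]): P | undefined {
  const online = pool.filter((p) => p.online);
  return online[Math.floor(Math.random() * online.length)];
}
```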

Architecture highlights

Health checking

Participants automatically send health checks every 10 seconds. If a participant doesn’t respond for 30 seconds, it’s marked offline. This ensures your application always routes to available models.
```typescript
// Health checks happen automatically in packages/cli/src/commands/join.ts:182
const healthInterval = setInterval(async () => {
  await fetch(`${hubUrl}/rooms/${code}/health`, {
    method: "POST",
    body: JSON.stringify({ id: participantId }),
  });
}, HEALTH_CHECK_INTERVAL); // 10 seconds
```
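On the hub side, deciding whether a participant is stale is a timestamp comparison against the documented 30-second timeout. A minimal sketch, assuming the hub records a `lastSeen` timestamp per participant (names here are illustrative):

```typescript
// Mark a participant offline once it has been silent past the documented
// 30-second window. The function name and threshold constant are illustrative.
const OFFLINE_AFTER_MS = 30_000;

function isOnline(lastSeenMs: number, nowMs: number): boolean {
  return nowMs - lastSeenMs <= OFFLINE_AFTER_MS;
}
```

With 10-second health checks, a participant can miss two checks and still be routed to; the third miss crosses the 30-second threshold.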

OpenAI compatibility

Gambiarra acts as a transparent proxy for OpenAI-compatible requests. Your existing code works without modification:
```typescript
// packages/core/src/hub.ts:269
const targetUrl = `${participant.endpoint}/v1/chat/completions`;

const response = await fetch(targetUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ ...body, model: participant.model }),
});
```

Streaming support

Full support for streaming responses using Server-Sent Events:
```typescript
import { createGambiarra } from "gambiarra-sdk";
import { streamText } from "ai";

const gambiarra = createGambiarra({ roomCode: "ABC123" });

const stream = await streamText({
  model: gambiarra.model("llama3"),
  prompt: "Write a story about AI",
});

for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}
```
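Over the wire, streamed tokens typically arrive as Server-Sent Events, one `data: ...` line per chunk. A minimal sketch of extracting text deltas from such lines, assuming OpenAI-style JSON payloads and a `[DONE]` sentinel (this is illustrative, not the SDK's actual parser):

```typescript
// Parse one OpenAI-style SSE line into a text delta, or null if the line
// carries no content (comments, keepalives, or the [DONE] terminator).
function parseSseLine(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return null;
  const event = JSON.parse(payload);
  return event.choices?.[0]?.delta?.content ?? null;
}
```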

Security considerations

Gambiarra is designed for trusted local networks. It currently has no built-in authentication and uses plain HTTP.
Best practices for production use:
- Run on an isolated network (VPN, WireGuard, or air-gapped)
- Use a reverse proxy (Caddy, Nginx) for HTTPS and authentication
- Enable password protection for rooms when creating them:

  ```shell
  gambiarra create --name "Secure Room" --password mySecretPass
  ```

- Consider network-level security (firewall rules, VLANs)

Supported providers

Gambiarra works with any OpenAI-compatible API:
| Provider | Default endpoint | Notes |
|---|---|---|
| Ollama | `http://localhost:11434` | Most popular local LLM server |
| LM Studio | `http://localhost:1234` | GUI-based LLM management |
| LocalAI | `http://localhost:8080` | Self-hosted OpenAI alternative |
| vLLM | `http://localhost:8000` | High-performance inference |
| text-generation-webui | `http://localhost:5000` | Gradio-based interface |
| Custom | Any URL | Any OpenAI-compatible endpoint |
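The defaults above can be captured in a small lookup table, for example in tooling that fills in an endpoint when only a provider name is given. This helper is illustrative and not part of Gambiarra's CLI or SDK:

```typescript
// Default endpoints from the table above; the helper itself is hypothetical.
const DEFAULT_ENDPOINTS: Record<string, string> = {
  "ollama": "http://localhost:11434",
  "lm-studio": "http://localhost:1234",
  "localai": "http://localhost:8080",
  "vllm": "http://localhost:8000",
  "text-generation-webui": "http://localhost:5000",
};

function defaultEndpoint(provider: string): string | undefined {
  return DEFAULT_ENDPOINTS[provider.toLowerCase()];
}
```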

Next steps

Installation

Install the CLI and SDK

Quickstart

Get started in 5 minutes

CLI reference

Learn all CLI commands

SDK reference

Integrate with your app
