
Overview

Gambiarra works with any OpenAI-compatible API endpoint. This includes popular local LLM servers and any custom implementation that follows the OpenAI chat completions API specification.

Supported Providers

The following table lists officially tested and supported LLM providers:
Provider              | Default Endpoint       | Notes
--------------------- | ---------------------- | ------------------------------
Ollama                | http://localhost:11434 | Most popular local LLM server
LM Studio             | http://localhost:1234  | GUI-based LLM management
LocalAI               | http://localhost:8080  | Self-hosted OpenAI alternative
vLLM                  | http://localhost:8000  | High-performance inference
text-generation-webui | http://localhost:5000  | Gradio-based interface
Custom                | Any URL                | Any OpenAI-compatible endpoint

Provider Configuration

Ollama

Ollama is the most commonly used provider with Gambiarra. It exposes models through both its native API (/api/tags) and OpenAI-compatible endpoints. Configuration:
gambiarra join ABC123 \
  --endpoint http://localhost:11434 \
  --model llama3 \
  --nickname "My Ollama Server"
Key Features:
  • Automatic model discovery via /api/tags
  • Native support for model pulling and management
  • GPU acceleration with CUDA/ROCm
  • Supports most popular open-source models
Endpoint Structure:
  • Native API: http://localhost:11434/api/*
  • OpenAI-compatible: http://localhost:11434/v1/*
Gambiarra automatically detects Ollama models via the /api/tags endpoint during the join process.

LM Studio

LM Studio provides a desktop GUI for managing and running LLMs locally. Configuration:
gambiarra join ABC123 \
  --endpoint http://localhost:1234 \
  --model mistral-7b \
  --nickname "LM Studio"
Key Features:
  • User-friendly GUI for model management
  • Built-in model downloader
  • Hardware acceleration support
  • OpenAI-compatible API by default
Considerations:
  • Ensure the LM Studio server is running before joining
  • Model names should match what’s loaded in LM Studio
  • Check the server settings in LM Studio for the correct port

LocalAI

LocalAI is a self-hosted, drop-in replacement for the OpenAI API. Configuration:
gambiarra join ABC123 \
  --endpoint http://localhost:8080 \
  --model gpt-3.5-turbo \
  --nickname "LocalAI"
Key Features:
  • Full OpenAI API compatibility
  • Supports multiple model formats (GGML, GGUF, etc.)
  • Audio transcription and image generation
  • Docker-ready deployment
Endpoint Structure:
  • OpenAI-compatible: http://localhost:8080/v1/*

vLLM

vLLM is a high-performance inference engine optimized for serving LLMs. Configuration:
gambiarra join ABC123 \
  --endpoint http://localhost:8000 \
  --model meta-llama/Llama-2-7b-chat-hf \
  --nickname "vLLM Server"
Key Features:
  • High throughput and low latency
  • PagedAttention for efficient memory management
  • OpenAI-compatible API
  • Continuous batching
Considerations:
  • Model names often use HuggingFace format
  • Requires GPU with sufficient VRAM
  • Best for production deployments

text-generation-webui

A Gradio-based web interface for running LLMs, with an OpenAI-compatible API extension. Configuration:
gambiarra join ABC123 \
  --endpoint http://localhost:5000 \
  --model vicuna-13b \
  --nickname "WebUI"
Key Features:
  • Web-based interface with multiple extensions
  • OpenAI API extension available
  • Supports various model formats
  • Character/chat mode
Considerations:
  • Must enable the OpenAI API extension
  • Check the extensions tab for API settings
  • Default port may vary based on configuration

Custom Providers

Any service implementing the OpenAI chat completions API can be used. Required Endpoint:
POST /v1/chat/completions
Minimum Request Format:
{
  "model": "string",
  "messages": [
    {
      "role": "user",
      "content": "string"
    }
  ]
}
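A body of this shape can be assembled programmatically before sending it to the endpoint. The sketch below is illustrative only; the helper name and model string are not part of Gambiarra:

```typescript
// Illustrative helper: build the minimal chat-completions request body.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildChatRequest(
  model: string,
  prompt: string
): { model: string; messages: ChatMessage[] } {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
  };
}

const body = buildChatRequest("my-model", "Hello!");
console.log(JSON.stringify(body));
```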
Response Format:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "string",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "string"
      },
      "finish_reason": "stop"
    }
  ]
}
For custom providers, ensure your endpoint responds to both /v1/models (for model listing) and /v1/chat/completions (for inference).
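Given a response of the shape shown above, the assistant's reply lives at choices[0].message.content. A small extraction sketch (the sample object is hypothetical; only the field layout matters):

```typescript
// Illustrative: extract the assistant's reply from an OpenAI-style response.
interface ChatCompletionResponse {
  id: string;
  object: string;
  created: number;
  model: string;
  choices: Array<{
    index: number;
    message: { role: string; content: string };
    finish_reason: string;
  }>;
}

const response: ChatCompletionResponse = {
  id: "chatcmpl-123",
  object: "chat.completion",
  created: 1234567890,
  model: "my-model",
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "Hello there!" },
      finish_reason: "stop",
    },
  ],
};

// Fall back to an empty string if the provider returns no choices.
const reply = response.choices[0]?.message.content ?? "";
console.log(reply); // prints: Hello there!
```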

Model Discovery

Gambiarra attempts to discover models from your endpoint during the join process:
  1. Ollama Format: Tries GET /api/tags
  2. OpenAI Format: Tries GET /v1/models
See packages/cli/src/commands/join.ts:24-50 for implementation details.
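The two discovery formats return differently shaped JSON: Ollama's /api/tags responds with { "models": [{ "name": ... }] }, while OpenAI-style /v1/models responds with { "data": [{ "id": ... }] }. A normalizer handling either shape might look like this (a sketch, not the actual join.ts implementation):

```typescript
// Sketch: normalize model lists from either discovery format.
// Ollama /api/tags:  { "models": [{ "name": "llama3", ... }] }
// OpenAI /v1/models: { "data":   [{ "id": "gpt-3.5-turbo", ... }] }
type OllamaTags = { models: Array<{ name: string }> };
type OpenAIModels = { data: Array<{ id: string }> };

function extractModelNames(payload: OllamaTags | OpenAIModels): string[] {
  if ("models" in payload) {
    return payload.models.map((m) => m.name); // Ollama format
  }
  return payload.data.map((m) => m.id); // OpenAI format
}

const fromOllama = extractModelNames({
  models: [{ name: "llama3" }, { name: "mistral" }],
});
const fromOpenAI = extractModelNames({ data: [{ id: "gpt-3.5-turbo" }] });
console.log(fromOllama, fromOpenAI);
```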

Generation Parameters

Gambiarra supports standard OpenAI-compatible generation parameters:
Parameter         | Type     | Range   | Description
----------------- | -------- | ------- | --------------------------
temperature       | number   | 0-2     | Controls randomness
top_p             | number   | 0-1     | Nucleus sampling
max_tokens        | number   | -       | Maximum tokens to generate
stop              | string[] | -       | Stop sequences
frequency_penalty | number   | -2 to 2 | Penalize token frequency
presence_penalty  | number   | -2 to 2 | Penalize token presence
seed              | number   | -       | Deterministic generation
Example with SDK:
import { createGambiarra } from "gambiarra-sdk";
import { generateText } from "ai";

const gambiarra = createGambiarra({ roomCode: "ABC123" });

const result = await generateText({
  model: gambiarra.any(),
  prompt: "Write a haiku",
  temperature: 0.7,
  maxTokens: 100,
  stopSequences: ["\n\n"],
});
See packages/core/src/types.ts:22-31 for the full schema.

Provider-Specific Considerations

Endpoint Availability

Ensure your LLM server is accessible from the machine running the Gambiarra participant. If running in Docker or VMs, configure network bridges appropriately.

Model Names

Different providers use different model naming conventions:
  • Ollama: Simple names like llama3, mistral
  • vLLM: HuggingFace format like meta-llama/Llama-2-7b-chat-hf
  • LM Studio: Display names from the GUI
  • LocalAI: Custom aliases defined in configuration

Performance

Hardware Requirements:
  • CPU-only: 8-16GB RAM minimum, very slow inference
  • GPU (8GB VRAM): Good for 7B models
  • GPU (16GB+ VRAM): Can run 13B+ models
  • GPU (24GB+ VRAM): Can run 30B+ models or quantized 70B
Use the --no-specs flag when joining if you don’t want to share your hardware specifications with other room participants.

Streaming Support

All providers should support streaming responses via Server-Sent Events (SSE):
import { createGambiarra } from "gambiarra-sdk";
import { streamText } from "ai";

const gambiarra = createGambiarra({ roomCode: "ABC123" });

const stream = await streamText({
  model: gambiarra.model("llama3"),
  prompt: "Write a story about robots",
});

for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}
The hub proxies streaming responses transparently (see packages/core/src/hub.ts:284-293).
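On the wire, OpenAI-compatible streaming is a series of SSE lines of the form data: {json}, where each chunk carries an incremental choices[0].delta.content fragment and the stream ends with the sentinel data: [DONE]. A minimal reassembly sketch (the sample chunk lines are fabricated for illustration):

```typescript
// Sketch: reassemble text from OpenAI-style SSE chunk lines.
// Each "data: " line carries a JSON chunk with choices[0].delta.content;
// the stream terminates with the sentinel "data: [DONE]".
function collectStreamedText(sseLines: string[]): string {
  let text = "";
  for (const line of sseLines) {
    if (!line.startsWith("data: ")) continue; // skip comments and blank lines
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break;
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}

const lines = [
  'data: {"choices":[{"delta":{"content":"Hello"}}]}',
  'data: {"choices":[{"delta":{"content":", world"}}]}',
  "data: [DONE]",
];
console.log(collectStreamedText(lines)); // prints: Hello, world
```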

Troubleshooting

Model Not Found

If you get “Model not found” errors:
  1. Verify the model is loaded in your LLM server
  2. Check the exact model name (case-sensitive)
  3. Ensure the server is running on the specified endpoint
  4. Try listing models manually:
# Ollama
curl http://localhost:11434/api/tags

# OpenAI-compatible
curl http://localhost:11434/v1/models

Connection Refused

If connection fails:
  1. Verify the server is running
  2. Check firewall rules
  3. Ensure correct port and hostname
  4. Test connectivity:
curl http://localhost:11434/v1/models

Provider-Specific Issues

Refer to the Troubleshooting guide for common provider-specific issues.
