The @mariozechner/pi-pods CLI simplifies running large language models on remote GPU pods with automatic vLLM configuration for agentic workloads.

Key Features

  • Automatic Setup: Sets up vLLM on fresh Ubuntu pods automatically
  • Tool Calling: Configures tool calling for agentic models
  • Smart GPU Allocation: Manages multiple models with automatic GPU assignment
  • OpenAI Compatible: Provides OpenAI-compatible API endpoints

Installation

npm install -g @mariozechner/pi

Quick Start

1. Set Environment Variables

export HF_TOKEN=your_huggingface_token
export PI_API_KEY=your_api_key

2. Setup Pod

pi pods setup dc1 "ssh root@1.2.3.4" \
  --mount "sudo mount -t nfs nfs.server:/path /mnt/models"

3. Start Model

pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen

4. Test with Agent

# Single message
pi agent qwen "What is the Fibonacci sequence?"

# Interactive mode
pi agent qwen -i

Supported Providers

DataCrunch

  • NFS volumes shareable across pods
  • Models download once, use everywhere
  • Best for teams or multiple experiments

pi pods setup dc1 "ssh root@instance.datacrunch.io" \
  --mount "sudo mount -t nfs -o nconnect=16 nfs.fin-02.datacrunch.io:/pseudo /mnt/models"

RunPod

  • Network volumes with good persistence
  • Cannot share between running pods
  • Good for single-pod workflows

pi pods setup runpod "ssh root@pod.runpod.io" --models-path /runpod-volume

Also Works With

  • Vast.ai
  • Prime Intellect
  • AWS EC2 with EFS
  • Any Ubuntu machine with NVIDIA GPUs

Pod Management

Setup New Pod

pi pods setup <name> "<ssh>" [options]
  --mount "<mount_command>"    # Run mount command during setup
  --models-path <path>          # Override the models path extracted from the mount command
  --vllm release|nightly|gpt-oss  # vLLM version

List and Manage Pods

pi pods                  # List all configured pods
pi pods active <name>    # Switch active pod
pi pods remove <name>    # Remove pod from config
pi shell [<name>]        # SSH into pod
pi ssh [<name>] "<cmd>"  # Run command on pod

Model Management

Start Models

pi start <model> --name <name> [options]
  --memory <percent>   # GPU memory: 30%, 50%, 90%
  --context <size>     # Context: 4k, 8k, 16k, 32k, 64k, 128k
  --gpus <count>       # Number of GPUs
  --pod <name>         # Target specific pod
  --vllm <args...>     # Custom vLLM args
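
When `pi start` is driven from a script, the options above can be assembled into an argument list before being handed to `subprocess.run`. The following is a minimal sketch; the helper name `build_start_command` is hypothetical, and it assumes `--vllm` must come last because it takes the remaining arguments, as the examples in this guide suggest.

```python
from typing import List, Optional, Sequence


def build_start_command(
    model: str,
    name: str,
    memory: Optional[str] = None,
    context: Optional[str] = None,
    gpus: Optional[int] = None,
    pod: Optional[str] = None,
    vllm_args: Sequence[str] = (),
) -> List[str]:
    """Assemble an argv list for `pi start` (hypothetical helper)."""
    cmd = ["pi", "start", model, "--name", name]
    if memory is not None:
        cmd += ["--memory", memory]      # e.g. "50%"
    if context is not None:
        cmd += ["--context", context]    # e.g. "32k"
    if gpus is not None:
        cmd += ["--gpus", str(gpus)]
    if pod is not None:
        cmd += ["--pod", pod]
    if vllm_args:
        # Assumption: --vllm takes all remaining arguments, so it goes last.
        cmd += ["--vllm", *vllm_args]
    return cmd


cmd = build_start_command(
    "Qwen/Qwen2.5-Coder-32B-Instruct", "coder", memory="70%", context="64k"
)
print(" ".join(cmd))
# To actually launch: subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems with model names and custom vLLM arguments.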

Manage Running Models

pi stop [<name>]    # Stop model (or all)
pi list             # List running models
pi logs <name>      # Stream model logs

Predefined Models

Qwen Models

# Qwen2.5-Coder-32B - Excellent coding model
pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen

# Qwen3-Coder-30B - Advanced reasoning
pi start Qwen/Qwen3-Coder-30B-A3B-Instruct --name qwen3

# Qwen3-Coder-480B - 8xH200 required
pi start Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --name qwen-480b

GPT-OSS Models

# Requires special vLLM build
pi pods setup gpt-pod "ssh root@1.2.3.4" --models-path /workspace --vllm gpt-oss

pi start openai/gpt-oss-20b --name gpt20
pi start openai/gpt-oss-120b --name gpt120

GLM Models

pi start zai-org/GLM-4.5 --name glm
pi start zai-org/GLM-4.5-Air --name glm-air

Custom Models

# DeepSeek with custom settings
pi start deepseek-ai/DeepSeek-V3 --name deepseek --vllm \
  --tensor-parallel-size 4 --trust-remote-code

# Any model with specific parser
pi start some/model --name mymodel --vllm \
  --tool-call-parser hermes --enable-auto-tool-choice

Multi-GPU Support

Automatic Assignment

pi start model1 --name m1  # Auto-assigns GPU 0
pi start model2 --name m2  # Auto-assigns GPU 1
pi start model3 --name m3  # Auto-assigns GPU 2
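
To make the behaviour above concrete, here is one way such an assignment could work: pick the GPU that currently hosts the fewest models and break ties by the lowest index. This is an illustrative sketch only. The function `pick_gpus` and the least-loaded policy are assumptions, not the CLI's documented algorithm.

```python
from typing import Dict, List


def pick_gpus(models_per_gpu: Dict[int, int], count: int = 1) -> List[int]:
    """Choose `count` GPUs with the fewest models; lowest index wins ties.

    `models_per_gpu` maps GPU index -> number of models already placed
    there and is updated in place. Hypothetical policy for illustration.
    """
    if count > len(models_per_gpu):
        raise ValueError(f"requested {count} GPUs, only {len(models_per_gpu)} present")
    ranked = sorted(models_per_gpu, key=lambda gpu: (models_per_gpu[gpu], gpu))
    chosen = ranked[:count]
    for gpu in chosen:
        models_per_gpu[gpu] += 1
    return chosen


usage = {0: 0, 1: 0, 2: 0, 3: 0}   # a 4-GPU pod with nothing running
print(pick_gpus(usage))  # first model
print(pick_gpus(usage))  # second model
print(pick_gpus(usage))  # third model
```

On an empty four-GPU pod this policy yields GPUs 0, 1 and 2 for three consecutive single-GPU models, matching the sequence shown above.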

Specify GPU Count

# Run on 1 GPU instead of all
pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen --gpus 1

# Run on 8 GPUs
pi start zai-org/GLM-4.5 --name glm --gpus 8

Tensor Parallelism

pi start meta-llama/Llama-3.1-70B-Instruct --name llama70b --vllm \
  --tensor-parallel-size 4
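
Before choosing a tensor-parallel size, it can help to sanity-check it against the hardware and the model. The sketch below encodes two common constraints: the size cannot exceed the number of GPUs, and vLLM generally expects the model's attention head count to be divisible by it. The helper name is hypothetical, and the head count of 64 is an illustrative input supplied by the caller, not something the CLI reports.

```python
from typing import List


def tensor_parallel_problems(
    num_attention_heads: int, tp_size: int, gpus_available: int
) -> List[str]:
    """Return a list of human-readable problems; empty means the size looks OK."""
    problems = []
    if tp_size < 1:
        problems.append("tensor-parallel size must be at least 1")
        return problems
    if tp_size > gpus_available:
        problems.append(
            f"tensor-parallel size {tp_size} exceeds {gpus_available} available GPUs"
        )
    if num_attention_heads % tp_size != 0:
        problems.append(
            f"{num_attention_heads} attention heads are not divisible by {tp_size}"
        )
    return problems


# Illustrative values: 64 heads, --tensor-parallel-size 4, an 8-GPU pod.
print(tensor_parallel_problems(64, 4, 8))
```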

Agent Interface

Single Messages

pi agent <name> "<message>"
pi agent <name> "<msg1>" "<msg2>"  # Multiple messages

Interactive Mode

pi agent <name> -i       # Interactive chat
pi agent <name> -i -c    # Continue previous session

Standalone Agent

# Works with any OpenAI-compatible API
pi-agent --base-url http://localhost:8000/v1 --model model-name "Hello"
pi-agent --api-key sk-... "What is 2+2?"
pi-agent --json "What is 2+2?"  # JSONL output
pi-agent -i  # Interactive mode
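
The `--json` flag emits JSONL, that is, one JSON document per line, which makes the output easy to consume from another program. The sketch below shows a generic line-by-line reader. The event fields in the sample (`type`, `text`) are made up for illustration, since the actual event schema is not described here.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line of JSONL input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        yield json.loads(line)


# Hypothetical sample; real pi-agent events may use different fields.
sample = [
    '{"type": "message", "text": "2+2 is 4"}',
    "",
    '{"type": "done"}',
]
for event in iter_jsonl(sample):
    print(event["type"])
```

To process live output, pipe `pi-agent --json "..."` into a script that passes `sys.stdin` to `iter_jsonl`.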

API Integration

All models expose OpenAI-compatible endpoints:

from openai import OpenAI

client = OpenAI(
    base_url="http://your-pod-ip:8001/v1",  # pod address and the model's port
    api_key="your-pi-api-key",              # the value of PI_API_KEY
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
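
If installing the `openai` package is not an option, the same request can be made with the Python standard library. The sketch below only builds the request object. It assumes the endpoint accepts the API key as a bearer token, which is the usual convention for OpenAI-compatible servers. The helper name `build_chat_request` is hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str, api_key: str, model: str, messages: List[Dict[str, str]]
) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "http://your-pod-ip:8001/v1",
    "your-pi-api-key",
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    [{"role": "user", "content": "Hello!"}],
)
print(req.full_url)

# Sending requires a running model:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```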

Memory and Context

GPU Memory Allocation

  • --memory 30% - High concurrency, limited context
  • --memory 50% - Balanced (default)
  • --memory 90% - Maximum context, low concurrency
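
As a rough guide to what these percentages mean in practice, the arithmetic below converts a `--memory` value into gigabytes for a given GPU. It is a sketch under assumptions: the 80 GB card is an illustrative figure, and the idea that the percentage is applied directly to total GPU memory (similar to vLLM's `--gpu-memory-utilization` fraction) is assumed rather than confirmed.

```python
def memory_budget_gb(memory: str, gpu_total_gb: float) -> float:
    """Convert a value such as '50%' into GB of GPU memory on one card."""
    if not memory.endswith("%"):
        raise ValueError(f"expected a percentage like '50%', got {memory!r}")
    percent = float(memory[:-1])
    if not 0 < percent <= 100:
        raise ValueError(f"percentage out of range: {percent}")
    return gpu_total_gb * percent / 100


# Illustrative 80 GB GPU.
for setting in ("30%", "50%", "90%"):
    print(setting, memory_budget_gb(setting, 80), "GB")
```

Whatever budget is left after the model weights are loaded is what remains for the KV cache, which is why a higher percentage allows longer contexts.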

Context Window

  • --context 4k - 4,096 tokens
  • --context 32k - 32,768 tokens
  • --context 128k - 131,072 tokens

Both options can be combined:

pi start Qwen/Qwen2.5-Coder-32B-Instruct --name coder \
  --context 64k --memory 70%
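
The context sizes listed above use binary multiples, so `k` means 1,024 tokens rather than 1,000. A small sketch of that conversion follows; the function name `parse_context` is hypothetical.

```python
def parse_context(size: str) -> int:
    """Convert '4k', '32k', '128k' (or a plain integer string) to a token count."""
    text = size.strip().lower()
    if text.endswith("k"):
        return int(text[:-1]) * 1024
    return int(text)


for size in ("4k", "32k", "64k", "128k"):
    print(size, "=", parse_context(size), "tokens")
```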

Tool Calling

Automatic configuration for known models:
  • Qwen: hermes parser
  • GLM: glm4_moe parser with reasoning
  • GPT-OSS: Uses /v1/responses endpoint
  • Custom: Specify with --vllm --tool-call-parser <parser>

Disable tool calling:

pi start model --name mymodel --vllm --disable-tool-call-parser
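
With a parser configured, a client can advertise tools using the OpenAI Chat Completions format and read tool calls back from the response. The sketch below builds such a payload and extracts calls from a response dictionary. The `get_weather` tool is made up for illustration, and the example targets `/v1/chat/completions`, not the `/v1/responses` endpoint used by GPT-OSS.

```python
import json
from typing import Any, Dict, List, Tuple


def build_tool_request(model: str, prompt: str) -> Dict[str, Any]:
    """Chat completion payload that offers one (hypothetical) tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
    }


def extract_tool_calls(response: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (tool name, parsed arguments) pairs from a chat completion response."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # Arguments arrive as a JSON-encoded string in the OpenAI format.
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]


payload = build_tool_request("Qwen/Qwen2.5-Coder-32B-Instruct", "Weather in Paris?")
print(json.dumps(payload, indent=2))
```

If `extract_tool_calls` keeps returning an empty list while the model describes the call in plain text, the parser is a likely suspect.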

Troubleshooting

OOM Errors

  • Reduce --memory percentage
  • Use quantized version (FP8)
  • Reduce --context size

Model Won’t Start

pi ssh "nvidia-smi"  # Check GPU usage
pi list              # Check port conflicts
pi stop              # Force stop all
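
For a scripted check of GPU usage, `nvidia-smi` can print machine-readable CSV via `--query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`, which could be run on the pod through `pi ssh`. The sketch below parses that output. The sample numbers are invented, and memory values are in MiB as reported by `nvidia-smi`.

```python
from typing import List, NamedTuple


class GpuMemory(NamedTuple):
    index: int
    used_mib: int
    total_mib: int


def parse_gpu_memory(csv_text: str) -> List[GpuMemory]:
    """Parse 'index, used, total' rows from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append(GpuMemory(index, used, total))
    return rows


# Invented sample output for a 2-GPU pod.
sample = "0, 1024, 81559\n1, 70000, 81559\n"
for gpu in parse_gpu_memory(sample):
    print(f"GPU {gpu.index}: {gpu.used_mib / gpu.total_mib:.0%} used")
```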

Tool Calling Issues

  • Try different parser: --vllm --tool-call-parser mistral
  • Or disable: --vllm --disable-tool-call-parser

Environment Variables

Variable          Description
HF_TOKEN          Hugging Face token for downloads
PI_API_KEY        API key for vLLM endpoints
PI_CONFIG_DIR     Config directory (default: ~/.pi)
OPENAI_API_KEY    Used by pi-agent
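
A script that wraps the CLI may want to check these variables before doing any work. The sketch below resolves the config directory using the documented default and reports which required variables are unset. Both helper names are hypothetical, and treating `HF_TOKEN` and `PI_API_KEY` as required follows the Quick Start, not a documented rule.

```python
import os
from pathlib import Path
from typing import List, Mapping, Sequence


def config_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Return PI_CONFIG_DIR if set, otherwise the documented default ~/.pi."""
    return Path(env.get("PI_CONFIG_DIR") or "~/.pi").expanduser()


def missing_vars(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = ("HF_TOKEN", "PI_API_KEY"),
) -> List[str]:
    """List required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


print("Config directory:", config_dir())
for name in missing_vars():
    print(f"Warning: {name} is not set")
```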

Next Steps

  • DataCrunch Setup: Detailed DataCrunch configuration
  • RunPod Setup: RunPod configuration guide
  • GitHub Repository: View source code and examples
