Basic flow
Install LLM Checker
npm install -g llm-checker
Detect your hardware
Run hw-detect to inspect your CPU, GPU, available memory, and the best inference backend for your system.
Summary:
Apple M4 Pro (24GB Unified Memory)
Tier: MEDIUM HIGH
Max model size: 15GB
Best backend: metal
CPU:
Apple M4 Pro
Cores: 12 (12 physical)
SIMD: NEON
Metal:
GPU Cores: 16
Unified Memory: 24GB
Memory Bandwidth: 273GB/s
On hybrid or integrated-only systems, hw-detect also surfaces GPU topology explicitly:
Dedicated GPUs: NVIDIA GeForce RTX 4060
Integrated GPUs: Intel Iris Xe Graphics
Assist path: Integrated/shared-memory GPU detected, runtime remains CPU
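If you want to use the detected values in a script, the summary lines can be picked out with standard tools. The following is a minimal sketch, not part of LLM Checker: the helper names best_backend and max_model_size are hypothetical, and it assumes the "Key: value" layout shown in the sample output above, which may differ between versions.

```shell
#!/bin/sh
# Hypothetical helpers (not part of llm-checker).
# Each reads hw-detect style summary text on stdin and prints one value.
# Assumption: the output uses the "Key: value" layout from the sample above.

# Print the value of the first "Best backend:" line, e.g. "metal".
best_backend() {
  awk -F': *' '/^[[:space:]]*Best backend:/ { print $2; exit }'
}

# Print the value of the first "Max model size:" line, e.g. "15GB".
max_model_size() {
  awk -F': *' '/^[[:space:]]*Max model size:/ { print $2; exit }'
}
```

Assuming the command is invoked as llm-checker hw-detect and writes the summary to standard output without color codes, usage would look like llm-checker hw-detect | best_backend.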
Get model recommendations
Run recommend to see the top-ranked models for each category (coding, reasoning, multimodal, and more) based on your hardware profile.
llm-checker recommend --category coding
INTELLIGENT RECOMMENDATIONS BY CATEGORY
Hardware Tier: HIGH | Models Analyzed: 205
Coding:
qwen2.5-coder:14b (14B)
Score: 78/100
Fine-tuning: LoRA+QLoRA
Command: ollama pull qwen2.5-coder:14b
Reasoning:
deepseek-r1:14b (14B)
Score: 86/100
Fine-tuning: QLoRA
Command: ollama pull deepseek-r1:14b
Multimodal:
llama3.2-vision:11b (11B)
Score: 83/100
Fine-tuning: LoRA+QLoRA
Command: ollama pull llama3.2-vision:11b
Use the --category flag to focus on a specific use case: coding, reasoning, multimodal, or general. You can also steer ranking by optimization profile with --optimize speed, --optimize quality, --optimize balanced, or --optimize context.
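Each recommendation carries a ready-made pull command on its "Command:" line. As a rough sketch, a small filter can list those commands so you can review them before running any. The helper name pull_commands is hypothetical, and the filter assumes the "Command: ..." layout from the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of llm-checker).
# Reads recommend-style output on stdin and prints the text that follows
# each "Command:" label, one command per line.
# Assumption: commands appear as "Command: <command>" as in the sample above.
pull_commands() {
  sed -n 's/^[[:space:]]*Command:[[:space:]]*//p'
}
```

For example, llm-checker recommend --category coding | pull_commands would print the pull commands only. It is safer to read the list and run the one you want by hand than to pipe it straight into a shell.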
Pull a model and run it
Copy the ollama pull command from the recommendation output, then use ai-run to prompt it directly:
# Pull the recommended model
ollama pull qwen2.5-coder:14b
# Run a prompt with auto model selection
llm-checker ai-run --category coding --prompt "Write a hello world in Python"
If you have already calibrated routing, pass the --calibrated flag to use your policy file:
llm-checker ai-run --calibrated --category coding --prompt "Refactor this function"
Calibration quick start (10 minutes)
Calibration generates routing policy artifacts from a prompt suite so that recommend and ai-run can make deterministic, measured decisions instead of relying solely on hardware heuristics.
Copy the sample prompt suite
cp ./docs/fixtures/calibration/sample-suite.jsonl ./sample-suite.jsonl
Generate calibration artifacts
Run calibrate in dry-run mode to produce both a calibration contract and a routing policy without executing live model calls:
mkdir -p ./artifacts
llm-checker calibrate \
  --suite ./sample-suite.jsonl \
  --models qwen2.5-coder:7b llama3.2:3b \
  --runtime ollama \
  --objective balanced \
  --dry-run \
  --output ./artifacts/calibration-result.json \
  --policy-out ./artifacts/calibration-policy.yaml
This creates two files:
./artifacts/calibration-result.json — calibration contract
./artifacts/calibration-policy.yaml — routing policy for use with recommend and ai-run
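Before moving on, it can help to confirm that both files were actually written. The check below is a minimal sketch under the assumption that the output paths match the calibrate command above. The function name check_artifacts is hypothetical and is not an LLM Checker command.

```shell
#!/bin/sh
# Hypothetical check (not part of llm-checker).
# Succeeds only if both calibration artifacts exist and are non-empty.
# $1: artifacts directory (defaults to ./artifacts, as used above).
check_artifacts() {
  dir="${1:-./artifacts}"
  status=0
  for f in calibration-result.json calibration-policy.yaml; do
    if [ -s "$dir/$f" ]; then
      echo "ok: $dir/$f"
    else
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling check_artifacts ./artifacts returns a non-zero status if either file is absent, so it can gate the next step in a script.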
Apply calibrated routing
Pass the generated policy file to recommend or ai-run with the --calibrated flag:
llm-checker recommend --calibrated ./artifacts/calibration-policy.yaml --category coding
llm-checker ai-run --calibrated ./artifacts/calibration-policy.yaml --category coding --prompt "Refactor this function"
Flag precedence: --policy <file> takes precedence over --calibrated [file]. If you omit the path from --calibrated, discovery defaults to ~/.llm-checker/calibration-policy.{yaml,yml,json}.
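The precedence rule can be read as a three-step lookup. The sketch below illustrates that reading only. It is not LLM Checker's actual implementation, the function name resolve_policy is hypothetical, and the order in which the yaml, yml, and json extensions are tried is an assumption.

```shell
#!/bin/sh
# Hypothetical illustration of the precedence rule (not llm-checker code).
# $1: value given to --policy (empty if the flag was not passed)
# $2: value given to --calibrated (empty if passed without a path)
# $3: discovery directory (defaults to ~/.llm-checker)
# Prints the chosen policy path, or returns 1 if none is found.
resolve_policy() {
  policy_arg="$1"
  calibrated_arg="$2"
  search_dir="${3:-$HOME/.llm-checker}"

  # 1. --policy <file> takes precedence over --calibrated [file].
  if [ -n "$policy_arg" ]; then
    printf '%s\n' "$policy_arg"
    return 0
  fi

  # 2. An explicit path given to --calibrated comes next.
  if [ -n "$calibrated_arg" ]; then
    printf '%s\n' "$calibrated_arg"
    return 0
  fi

  # 3. Otherwise look for calibration-policy.{yaml,yml,json}.
  #    Assumption: extensions are tried in this order.
  for ext in yaml yml json; do
    if [ -f "$search_dir/calibration-policy.$ext" ]; then
      printf '%s\n' "$search_dir/calibration-policy.$ext"
      return 0
    fi
  done
  return 1
}
```

For instance, resolve_policy "" ./artifacts/calibration-policy.yaml prints the explicit --calibrated path, while resolve_policy "" "" falls back to discovery in the default directory.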