Models API

The models module provides access to the embedded model database and metadata operations.

Core Types

LlmModel

Represents a single LLM with all of its metadata:
pub struct LlmModel {
    pub name: String,
    pub provider: String,
    pub parameter_count: String,
    pub parameters_raw: Option<u64>,
    pub min_ram_gb: f64,
    pub recommended_ram_gb: f64,
    pub min_vram_gb: Option<f64>,
    pub quantization: String,
    pub context_length: u32,
    pub use_case: String,
    pub is_moe: bool,
    pub num_experts: Option<u32>,
    pub active_experts: Option<u32>,
    pub active_parameters: Option<u64>,
    pub release_date: Option<String>,
    pub gguf_sources: Vec<GgufSource>,
}
Key Fields:
  • name - Model identifier (e.g., “llama-3.1-8b-instruct”)
  • provider - Original provider (“Meta”, “Qwen”, etc.)
  • parameter_count - Human-readable size (“7B”, “8x7B”)
  • parameters_raw - Exact parameter count
  • min_ram_gb - Minimum system RAM for CPU inference
  • recommended_ram_gb - Recommended RAM for best performance
  • min_vram_gb - Minimum VRAM for GPU inference
  • quantization - Default quantization level (“Q4_K_M”, “mlx-4bit”)
  • context_length - Maximum context window
  • use_case - Primary use case category
  • is_moe - Whether this is a Mixture-of-Experts model
  • num_experts / active_experts - MoE expert configuration
  • active_parameters - Active parameter count for MoE models
  • release_date - Release date string (ISO 8601)
  • gguf_sources - Known GGUF download sources

GgufSource

A known GGUF download source:
pub struct GgufSource {
    pub repo: String,      // e.g., "bartowski/Llama-3.1-8B-Instruct-GGUF"
    pub provider: String,  // e.g., "bartowski", "unsloth"
}
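The repo field looks like a Hugging Face repository ID of the form "owner/name". A minimal sketch of turning such an ID into a browsable URL, assuming that interpretation holds; the GgufSource below is a local stand-in, and repo_url and repo_owner are hypothetical helpers, not part of llmfit_core:

```rust
// Local stand-in mirroring the documented GgufSource fields (not the crate's type).
struct GgufSource {
    repo: String,
    provider: String,
}

// Assumption: `repo` is a Hugging Face repository ID such as "owner/name".
fn repo_url(source: &GgufSource) -> String {
    format!("https://huggingface.co/{}", source.repo)
}

// Returns the owner part of "owner/name", or None if the ID has no slash.
fn repo_owner(source: &GgufSource) -> Option<&str> {
    source.repo.split_once('/').map(|(owner, _)| owner)
}

fn main() {
    let source = GgufSource {
        repo: "bartowski/Llama-3.1-8B-Instruct-GGUF".to_string(),
        provider: "bartowski".to_string(),
    };
    println!("provider: {}", source.provider);
    println!("url:      {}", repo_url(&source));
    println!("owner:    {:?}", repo_owner(&source));
}
```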

ModelDatabase

Container for the embedded model database:
pub struct ModelDatabase {
    // Private fields
}
Methods:
impl ModelDatabase {
    pub fn new() -> Self;
    pub fn get_all_models(&self) -> &Vec<LlmModel>;
    pub fn find_model(&self, query: &str) -> Vec<&LlmModel>;
    pub fn models_fitting_system(
        &self,
        available_ram_gb: f64,
        has_gpu: bool,
        vram_gb: Option<f64>,
    ) -> Vec<&LlmModel>;
}

UseCase

Model use-case categories:
pub enum UseCase {
    General,
    Coding,
    Reasoning,
    Chat,
    Multimodal,
    Embedding,
}
Methods:
impl UseCase {
    pub fn label(&self) -> &'static str;
    pub fn from_model(model: &LlmModel) -> Self;
}

Functions

ModelDatabase::new()

Loads the embedded model database:
pub fn new() -> Self
Returns: ModelDatabase with all models loaded
Example:
use llmfit_core::ModelDatabase;

let db = ModelDatabase::new();
println!("Loaded {} models", db.get_all_models().len());
The database is embedded at compile time from data/hf_models.json. No runtime file I/O occurs.

ModelDatabase::get_all_models()

Returns all models in the database:
pub fn get_all_models(&self) -> &Vec<LlmModel>
Returns: Reference to all models
Example:
let db = ModelDatabase::new();

for model in db.get_all_models() {
    println!("{}: {} params, {} ctx",
        model.name,
        model.parameter_count,
        model.context_length
    );
}

ModelDatabase::find_model()

Searches models by name, provider, or parameter count:
pub fn find_model(&self, query: &str) -> Vec<&LlmModel>
Parameters:
  • query - Search term (case-insensitive substring match)
Returns: Matching models
Example:
let db = ModelDatabase::new();

// Find all Llama models
let llama_models = db.find_model("llama");

// Find 7B models
let seven_b = db.find_model("7B");

// Find Qwen models
let qwen = db.find_model("qwen");

for model in qwen {
    println!("Found: {}", model.name);
}
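The matching rule described above (a case-insensitive substring test against name, provider, or parameter count) can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: Model is a local stand-in with only three fields, and the free function find_model is hypothetical.

```rust
// Local stand-in with only the fields the search looks at (not the crate's LlmModel).
struct Model {
    name: String,
    provider: String,
    parameter_count: String,
}

// Case-insensitive substring match against name, provider, or parameter count.
fn find_model<'a>(models: &'a [Model], query: &str) -> Vec<&'a Model> {
    let q = query.to_lowercase();
    models
        .iter()
        .filter(|m| {
            m.name.to_lowercase().contains(q.as_str())
                || m.provider.to_lowercase().contains(q.as_str())
                || m.parameter_count.to_lowercase().contains(q.as_str())
        })
        .collect()
}

fn main() {
    let models = vec![
        Model {
            name: "llama-3.1-8b-instruct".to_string(),
            provider: "Meta".to_string(),
            parameter_count: "8B".to_string(),
        },
        Model {
            name: "Qwen2.5-7B".to_string(),
            provider: "Qwen".to_string(),
            parameter_count: "7B".to_string(),
        },
    ];
    for m in find_model(&models, "QWEN") {
        println!("Found: {}", m.name);
    }
}
```

Lowercasing both sides keeps the sketch simple; it allocates a new string per field per call, which is acceptable for a small embedded database.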

ModelDatabase::models_fitting_system()

Filters models that fit on specific hardware:
pub fn models_fitting_system(
    &self,
    available_ram_gb: f64,
    has_gpu: bool,
    vram_gb: Option<f64>,
) -> Vec<&LlmModel>
Parameters:
  • available_ram_gb - Available system RAM
  • has_gpu - Whether GPU is present
  • vram_gb - GPU VRAM if available
Returns: Models that meet hardware requirements
Example:
use llmfit_core::{SystemSpecs, ModelDatabase};

let specs = SystemSpecs::detect();
let db = ModelDatabase::new();

let fitting = db.models_fitting_system(
    specs.available_ram_gb,
    specs.has_gpu,
    specs.gpu_vram_gb,
);

println!("Models that fit: {}", fitting.len());
for model in fitting.iter().take(5) {
    println!("  - {}", model.name);
}
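The exact fitting rule is not spelled out here. One plausible reading, sketched below, is that a model fits if its minimum RAM is covered, or if a GPU is present and its VRAM covers the model's minimum VRAM. Both the rule and every name in this block (Model, fits, models_fitting, sample_models and its entries) are assumptions for illustration, not the crate's code.

```rust
// Local stand-in with only the fields this sketch needs (not the crate's LlmModel).
struct Model {
    name: String,
    min_ram_gb: f64,
    min_vram_gb: Option<f64>,
}

// Assumed rule: fits via system RAM, or via VRAM when a GPU is present and
// both the available VRAM and the model's VRAM requirement are known.
fn fits(model: &Model, available_ram_gb: f64, has_gpu: bool, vram_gb: Option<f64>) -> bool {
    let fits_ram = model.min_ram_gb <= available_ram_gb;
    let fits_vram = match (has_gpu, vram_gb, model.min_vram_gb) {
        (true, Some(have), Some(need)) => need <= have,
        _ => false,
    };
    fits_ram || fits_vram
}

fn models_fitting<'a>(
    models: &'a [Model],
    available_ram_gb: f64,
    has_gpu: bool,
    vram_gb: Option<f64>,
) -> Vec<&'a Model> {
    models
        .iter()
        .filter(|m| fits(m, available_ram_gb, has_gpu, vram_gb))
        .collect()
}

// Hypothetical entries with made-up requirements.
fn sample_models() -> Vec<Model> {
    vec![
        Model { name: "small-7b".to_string(), min_ram_gb: 6.0, min_vram_gb: Some(5.0) },
        Model { name: "large-70b".to_string(), min_ram_gb: 40.0, min_vram_gb: Some(24.0) },
        Model { name: "mid-30b".to_string(), min_ram_gb: 20.0, min_vram_gb: None },
    ]
}

fn main() {
    let models = sample_models();
    for m in models_fitting(&models, 16.0, true, Some(24.0)) {
        println!("fits: {}", m.name);
    }
}
```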

LlmModel Methods

is_mlx_model()

Checks if model is MLX-specific:
pub fn is_mlx_model(&self) -> bool
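Returns: true if the model name contains “-MLX-”
Example:
// The same substring check applied to a raw name;
// with an LlmModel in hand, call model.is_mlx_model() instead.
let model_name = "Qwen3-8B-MLX-4bit";
let is_mlx = model_name.contains("-MLX-");

if is_mlx {
    println!("Apple Silicon only");
}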

params_b()

Parameter count in billions:
pub fn params_b(&self) -> f64
Returns: Parameter count in billions
Example:
let db = ModelDatabase::new();
let model = &db.get_all_models()[0]; // any model from the database
let params = model.params_b();

if params < 10.0 {
    println!("Small model: {:.1}B parameters", params);
} else if params < 100.0 {
    println!("Medium model: {:.1}B parameters", params);
} else {
    println!("Large model: {:.1}B parameters", params);
}
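How params_b() derives its value is not documented here. A minimal sketch of one possible approach, assuming it prefers the exact parameters_raw count and otherwise parses simple labels such as "7B" or "350M"; the free function below is hypothetical and returns None where it cannot decide (for example on "8x7B"):

```rust
// Hypothetical helper: parameter count in billions from the two documented fields.
fn params_b(parameters_raw: Option<u64>, parameter_count: &str) -> Option<f64> {
    // Prefer the exact count when it is available.
    if let Some(raw) = parameters_raw {
        return Some(raw as f64 / 1e9);
    }
    // Otherwise parse labels of the form "<number>B" or "<number>M".
    let label = parameter_count.trim();
    let divisor = match label.chars().last()? {
        'B' | 'b' => 1.0,
        'M' | 'm' => 1000.0,
        _ => return None,
    };
    // The last character is ASCII here, so slicing off one byte is safe.
    let number = &label[..label.len() - 1];
    number.parse::<f64>().ok().map(|n| n / divisor)
}

fn main() {
    println!("{:?}", params_b(Some(7_000_000_000), "7B"));
    println!("{:?}", params_b(None, "350M"));
    println!("{:?}", params_b(None, "8x7B"));
}
```

Labels like "8x7B" are deliberately not multiplied out, since the product of experts and per-expert size generally overstates the real total for MoE models.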

estimate_memory_gb()

Estimates memory required for specific quantization and context:
pub fn estimate_memory_gb(&self, quant: &str, ctx: u32) -> f64
Parameters:
  • quant - Quantization level (“Q4_K_M”, “Q8_0”, etc.)
  • ctx - Context length in tokens
Returns: Estimated memory in GB
Example:
let db = ModelDatabase::new();
let model = &db.get_all_models()[0]; // any model from the database

// 4K context with Q4 quantization
let mem_q4_4k = model.estimate_memory_gb("Q4_K_M", 4096);

// 32K context with Q8 quantization
let mem_q8_32k = model.estimate_memory_gb("Q8_0", 32768);

println!("Q4_K_M @ 4K: {:.2} GB", mem_q4_4k);
println!("Q8_0 @ 32K: {:.2} GB", mem_q8_32k);

best_quant_for_budget()

Selects best quantization that fits in memory:
pub fn best_quant_for_budget(
    &self,
    budget_gb: f64,
    ctx: u32,
) -> Option<(&'static str, f64)>
Parameters:
  • budget_gb - Available memory
  • ctx - Target context length
Returns: (quantization, estimated_memory) or None if nothing fits
Example:
let db = ModelDatabase::new();
let model = &db.get_all_models()[0]; // any model from the database
let budget = 16.0; // 16 GB available

if let Some((quant, mem)) = model.best_quant_for_budget(budget, 4096) {
    println!("Best quantization: {} ({:.2} GB)", quant, mem);
} else {
    println!("Model too large for available memory");
}
The function tries quantization levels in quality order:
  1. Q8_0 (best quality)
  2. Q6_K
  3. Q5_K_M
  4. Q4_K_M
  5. Q3_K_M
  6. Q2_K (smallest)
If nothing fits, it tries halving the context length once.
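The selection order above can be sketched as a loop over the hierarchy at the requested context length, followed by one more pass at half that length. This is an illustration only: the free function takes the memory estimator as a closure, toy_estimate uses made-up sizes, and whether the real method reports the memory at the halved context is an assumption.

```rust
const QUANT_HIERARCHY: &[&str] = &["Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "Q2_K"];

// Walk the hierarchy from best to worst quality; if nothing fits,
// retry once with the context length halved.
fn best_quant_for_budget<F>(
    budget_gb: f64,
    ctx: u32,
    hierarchy: &[&'static str],
    estimate: F,
) -> Option<(&'static str, f64)>
where
    F: Fn(&str, u32) -> f64,
{
    for &attempt_ctx in &[ctx, ctx / 2] {
        for &quant in hierarchy {
            let mem = estimate(quant, attempt_ctx);
            if mem <= budget_gb {
                return Some((quant, mem));
            }
        }
    }
    None
}

// Hypothetical estimator with made-up sizes: fixed weight size per
// quantization plus 0.5 GB per 4096 tokens of context.
fn toy_estimate(quant: &str, ctx: u32) -> f64 {
    let weights_gb = match quant {
        "Q8_0" => 8.5,
        "Q6_K" => 6.6,
        "Q5_K_M" => 5.7,
        "Q4_K_M" => 4.9,
        "Q3_K_M" => 3.9,
        "Q2_K" => 3.0,
        _ => f64::INFINITY,
    };
    weights_gb + 0.5 * (ctx as f64 / 4096.0)
}

fn main() {
    match best_quant_for_budget(6.0, 4096, QUANT_HIERARCHY, toy_estimate) {
        Some((quant, mem)) => println!("Best quantization: {} ({:.2} GB)", quant, mem),
        None => println!("Model too large for available memory"),
    }
}
```

Passing the hierarchy as a parameter means the same loop would serve an MLX-specific list as well.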

MoE-Specific Methods

For Mixture-of-Experts models:
// Active expert VRAM (GPU)
pub fn moe_active_vram_gb(&self) -> Option<f64>;

// Inactive expert RAM (offloaded to system RAM)
pub fn moe_offloaded_ram_gb(&self) -> Option<f64>;
Example:
let model = /* MoE model like Mixtral 8x7B */;

if model.is_moe {
    if let Some(active_vram) = model.moe_active_vram_gb() {
        println!("Active experts: {:.2} GB VRAM", active_vram);
    }
    if let Some(offloaded) = model.moe_offloaded_ram_gb() {
        println!("Inactive experts: {:.2} GB RAM", offloaded);
    }
}
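The formulas behind these two methods are not given here. As a rough sketch, assuming the split is simply proportional to active versus total parameters at a fixed bytes-per-parameter rate (and 1 GB = 10^9 bytes), the arithmetic could look like this. The function moe_split_gb is hypothetical, and the Mixtral 8x7B figures in main are approximate public numbers used only for illustration.

```rust
// Hypothetical helper: returns (active_gb, offloaded_gb) for an MoE model,
// assuming memory scales linearly with parameter count.
fn moe_split_gb(
    total_params: u64,
    active_params: u64,
    bytes_per_param: f64,
) -> Option<(f64, f64)> {
    if total_params == 0 || active_params > total_params {
        return None;
    }
    let total_gb = total_params as f64 * bytes_per_param / 1e9;
    let active_gb = active_params as f64 * bytes_per_param / 1e9;
    Some((active_gb, total_gb - active_gb))
}

fn main() {
    // Approximate figures for Mixtral 8x7B: ~46.7B total, ~12.9B active per token.
    // 0.58 bytes per parameter is the Q4_K_M value listed for quant_bpp.
    if let Some((active, offloaded)) = moe_split_gb(46_700_000_000, 12_900_000_000, 0.58) {
        println!("Active experts: {:.2} GB VRAM", active);
        println!("Inactive experts: {:.2} GB RAM", offloaded);
    }
}
```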

Quantization Functions

quant_bpp()

Bytes per parameter for quantization level:
pub fn quant_bpp(quant: &str) -> f64
Example:
use llmfit_core::models::quant_bpp;

assert_eq!(quant_bpp("F16"), 2.0);
assert_eq!(quant_bpp("Q8_0"), 1.05);
assert_eq!(quant_bpp("Q4_K_M"), 0.58);
assert_eq!(quant_bpp("mlx-4bit"), 0.55);
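To make the bytes-per-parameter values concrete, the sketch below multiplies them by a parameter count to get a weights-only size. It covers only the four values listed above and returns None for anything else; quant_bpp_sketch and weights_gb are hypothetical names, and the result excludes KV cache and runtime overhead (1 GB is taken as 10^9 bytes).

```rust
// Lookup limited to the four values shown in the example above.
fn quant_bpp_sketch(quant: &str) -> Option<f64> {
    match quant {
        "F16" => Some(2.0),
        "Q8_0" => Some(1.05),
        "Q4_K_M" => Some(0.58),
        "mlx-4bit" => Some(0.55),
        _ => None, // other levels are not covered by this sketch
    }
}

// Weights-only size in GB: parameters x bytes per parameter.
fn weights_gb(params: u64, quant: &str) -> Option<f64> {
    quant_bpp_sketch(quant).map(|bpp| params as f64 * bpp / 1e9)
}

fn main() {
    let params: u64 = 8_000_000_000; // an 8B-parameter model
    for quant in ["F16", "Q8_0", "Q4_K_M", "mlx-4bit"].iter() {
        if let Some(gb) = weights_gb(params, quant) {
            println!("{:>8}: {:.2} GB", quant, gb);
        }
    }
}
```

For an 8B model this works out to 16 GB at F16 and roughly 4.64 GB at Q4_K_M.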

quant_speed_multiplier()

Speed impact of quantization:
pub fn quant_speed_multiplier(quant: &str) -> f64
Higher values mean faster inference; lower-precision quantization levels generally run faster.

quant_quality_penalty()

Quality penalty for quantization:
pub fn quant_quality_penalty(quant: &str) -> f64
Negative values indicate quality loss relative to F16.

Quantization Hierarchies

Predefined quantization hierarchies (best to worst quality):
// Standard GGUF hierarchy
pub const QUANT_HIERARCHY: &[&str] = &[
    "Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "Q2_K"
];

// MLX-native hierarchy
pub const MLX_QUANT_HIERARCHY: &[&str] = &[
    "mlx-8bit", "mlx-4bit"
];
Example:
use llmfit_core::models::{QUANT_HIERARCHY, MLX_QUANT_HIERARCHY};

let model = /* ... */;
let budget = 12.0;

// Try GGUF quantizations
if let Some((q, mem)) = model.best_quant_for_budget_with(
    budget, 4096, QUANT_HIERARCHY
) {
    println!("GGUF: {} ({:.2} GB)", q, mem);
}

// Try MLX quantizations
if let Some((q, mem)) = model.best_quant_for_budget_with(
    budget, 4096, MLX_QUANT_HIERARCHY
) {
    println!("MLX: {} ({:.2} GB)", q, mem);
}

Use Case Inference

use llmfit_core::{UseCase, ModelDatabase};

let db = ModelDatabase::new();

for model in db.get_all_models() {
    let use_case = UseCase::from_model(model);
    println!("{}: {}", model.name, use_case.label());
}
Use cases are inferred from model name and metadata:
  • Embedding: “embed”, “bge” in name
  • Coding: “code” in name or use_case
  • Multimodal: “vision” in use_case
  • Reasoning: “reason” or “deepseek-r1” in name
  • Chat: “chat” or “instruction” in use_case
  • General: Default fallback
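The rules above can be sketched as a chain of substring checks. Two assumptions are made here that the list does not state: matching is case-insensitive, and the rules are evaluated in the order listed. The enum is a local stand-in and infer_use_case is a hypothetical free function, not UseCase::from_model itself.

```rust
// Local stand-in mirroring the documented UseCase variants.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum UseCase {
    General,
    Coding,
    Reasoning,
    Chat,
    Multimodal,
    Embedding,
}

// Applies the listed rules top to bottom; the first match wins (assumed order).
fn infer_use_case(name: &str, use_case: &str) -> UseCase {
    let name = name.to_lowercase();
    let uc = use_case.to_lowercase();
    if name.contains("embed") || name.contains("bge") {
        UseCase::Embedding
    } else if name.contains("code") || uc.contains("code") {
        UseCase::Coding
    } else if uc.contains("vision") {
        UseCase::Multimodal
    } else if name.contains("reason") || name.contains("deepseek-r1") {
        UseCase::Reasoning
    } else if uc.contains("chat") || uc.contains("instruction") {
        UseCase::Chat
    } else {
        UseCase::General
    }
}

fn main() {
    let samples = [
        ("bge-small-en", ""),
        ("CodeLlama-7b", "chat"),
        ("DeepSeek-R1-Distill", "general"),
        ("mistral-7b", "general purpose"),
    ];
    for (name, use_case) in samples.iter() {
        println!("{}: {:?}", name, infer_use_case(name, use_case));
    }
}
```

Because the first matching rule wins, a model named "CodeLlama-7b" with a "chat" use_case is classified as Coding under this ordering.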
