Documentation Index
Fetch the complete documentation index at: https://mintlify.com/AlexsJones/llmfit/llms.txt
Use this file to discover all available pages before exploring further.
Models API
The models module provides access to the embedded model database and metadata operations.
Core Types
LlmModel
Represents a single LLM model with all metadata:
pub struct LlmModel {
pub name: String,
pub provider: String,
pub parameter_count: String,
pub parameters_raw: Option<u64>,
pub min_ram_gb: f64,
pub recommended_ram_gb: f64,
pub min_vram_gb: Option<f64>,
pub quantization: String,
pub context_length: u32,
pub use_case: String,
pub is_moe: bool,
pub num_experts: Option<u32>,
pub active_experts: Option<u32>,
pub active_parameters: Option<u64>,
pub release_date: Option<String>,
pub gguf_sources: Vec<GgufSource>,
}
Key Fields:
name - Model identifier (e.g., “llama-3.1-8b-instruct”)
provider - Original provider (“Meta”, “Qwen”, etc.)
parameter_count - Human-readable size (“7B”, “8x7B”)
parameters_raw - Exact parameter count
min_ram_gb - Minimum system RAM for CPU inference
recommended_ram_gb - Recommended RAM for best performance
min_vram_gb - Minimum VRAM for GPU inference
quantization - Default quantization level (“Q4_K_M”, “mlx-4bit”)
context_length - Maximum context window
use_case - Primary use case category
is_moe - Whether this is a Mixture-of-Experts model
num_experts / active_experts - MoE expert configuration
active_parameters - Active parameter count for MoE models
release_date - Release date string (ISO 8601)
gguf_sources - Known GGUF download sources
GgufSource
A known GGUF download source:
pub struct GgufSource {
pub repo: String, // e.g., "bartowski/Llama-3.1-8B-Instruct-GGUF"
pub provider: String, // e.g., "bartowski", "unsloth"
}
ModelDatabase
Container for the embedded model database:
pub struct ModelDatabase {
// Private fields
}
Methods:
impl ModelDatabase {
pub fn new() -> Self;
pub fn get_all_models(&self) -> &Vec<LlmModel>;
pub fn find_model(&self, query: &str) -> Vec<&LlmModel>;
pub fn models_fitting_system(
&self,
available_ram_gb: f64,
has_gpu: bool,
vram_gb: Option<f64>,
) -> Vec<&LlmModel>;
}
UseCase
Model use-case categories:
pub enum UseCase {
General,
Coding,
Reasoning,
Chat,
Multimodal,
Embedding,
}
Methods:
impl UseCase {
pub fn label(&self) -> &'static str;
pub fn from_model(model: &LlmModel) -> Self;
}
Functions
ModelDatabase::new()
Loads the embedded model database:
Returns: ModelDatabase with all models loaded
Example:
use llmfit_core::ModelDatabase;
let db = ModelDatabase::new();
println!("Loaded {} models", db.get_all_models().len());
The database is embedded at compile time from data/hf_models.json. No runtime file I/O occurs.
ModelDatabase::get_all_models()
Returns all models in the database:
pub fn get_all_models(&self) -> &Vec<LlmModel>
Returns: Reference to all models
Example:
let db = ModelDatabase::new();
for model in db.get_all_models() {
println!("{}: {} params, {} ctx",
model.name,
model.parameter_count,
model.context_length
);
}
ModelDatabase::find_model()
Searches models by name, provider, or parameter count:
pub fn find_model(&self, query: &str) -> Vec<&LlmModel>
Parameters:
query - Search term (case-insensitive substring match)
Returns: Matching models
Example:
let db = ModelDatabase::new();
// Find all Llama models
let llama_models = db.find_model("llama");
// Find 7B models
let seven_b = db.find_model("7B");
// Find Qwen models
let qwen = db.find_model("qwen");
for model in qwen {
println!("Found: {}", model.name);
}
ModelDatabase::models_fitting_system()
Filters models that fit on specific hardware:
pub fn models_fitting_system(
&self,
available_ram_gb: f64,
has_gpu: bool,
vram_gb: Option<f64>,
) -> Vec<&LlmModel>
Parameters:
available_ram_gb - Available system RAM
has_gpu - Whether GPU is present
vram_gb - GPU VRAM if available
Returns: Models that meet hardware requirements
Example:
use llmfit_core::{SystemSpecs, ModelDatabase};
let specs = SystemSpecs::detect();
let db = ModelDatabase::new();
let fitting = db.models_fitting_system(
specs.available_ram_gb,
specs.has_gpu,
specs.gpu_vram_gb,
);
println!("Models that fit: {}", fitting.len());
for model in fitting.iter().take(5) {
println!(" - {}", model.name);
}
LlmModel Methods
is_mlx_model()
Checks if model is MLX-specific:
pub fn is_mlx_model(&self) -> bool
Returns: true if model name contains “-MLX-” suffix
Example:
let model_name = "Qwen3-8B-MLX-4bit";
let is_mlx = model_name.contains("-MLX-");
if is_mlx {
println!("Apple Silicon only");
}
params_b()
Parameter count in billions:
pub fn params_b(&self) -> f64
Returns: Parameter count in billions
Example:
let model = /* ... */;
let params = model.params_b();
if params < 10.0 {
println!("Small model: {:.1}B parameters", params);
} else if params < 100.0 {
println!("Medium model: {:.1}B parameters", params);
} else {
println!("Large model: {:.1}B parameters", params);
}
estimate_memory_gb()
Estimates memory required for specific quantization and context:
pub fn estimate_memory_gb(&self, quant: &str, ctx: u32) -> f64
Parameters:
quant - Quantization level (“Q4_K_M”, “Q8_0”, etc.)
ctx - Context length in tokens
Returns: Estimated memory in GB
Example:
let model = /* ... */;
// 4K context with Q4 quantization
let mem_q4_4k = model.estimate_memory_gb("Q4_K_M", 4096);
// 32K context with Q8 quantization
let mem_q8_32k = model.estimate_memory_gb("Q8_0", 32768);
println!("Q4_K_M @ 4K: {:.2} GB", mem_q4_4k);
println!("Q8_0 @ 32K: {:.2} GB", mem_q8_32k);
best_quant_for_budget()
Selects best quantization that fits in memory:
pub fn best_quant_for_budget(
&self,
budget_gb: f64,
ctx: u32,
) -> Option<(&'static str, f64)>
Parameters:
budget_gb - Available memory
ctx - Target context length
Returns: (quantization, estimated_memory) or None if nothing fits
Example:
let model = /* ... */;
let budget = 16.0; // 16 GB available
if let Some((quant, mem)) = model.best_quant_for_budget(budget, 4096) {
println!("Best quantization: {} ({:.2} GB)", quant, mem);
} else {
println!("Model too large for available memory");
}
The function tries quantization levels in quality order:
- Q8_0 (best quality)
- Q6_K
- Q5_K_M
- Q4_K_M
- Q3_K_M
- Q2_K (smallest)
If nothing fits, it tries halving the context length once.
MoE-Specific Methods
For Mixture-of-Experts models:
// Active expert VRAM (GPU)
pub fn moe_active_vram_gb(&self) -> Option<f64>;
// Inactive expert RAM (offloaded to system RAM)
pub fn moe_offloaded_ram_gb(&self) -> Option<f64>;
Example:
let model = /* MoE model like Mixtral 8x7B */;
if model.is_moe {
if let Some(active_vram) = model.moe_active_vram_gb() {
println!("Active experts: {:.2} GB VRAM", active_vram);
}
if let Some(offloaded) = model.moe_offloaded_ram_gb() {
println!("Inactive experts: {:.2} GB RAM", offloaded);
}
}
Quantization Functions
quant_bpp()
Bytes per parameter for quantization level:
pub fn quant_bpp(quant: &str) -> f64
Example:
use llmfit_core::models::quant_bpp;
assert_eq!(quant_bpp("F16"), 2.0);
assert_eq!(quant_bpp("Q8_0"), 1.05);
assert_eq!(quant_bpp("Q4_K_M"), 0.58);
assert_eq!(quant_bpp("mlx-4bit"), 0.55);
quant_speed_multiplier()
Speed impact of quantization:
pub fn quant_speed_multiplier(quant: &str) -> f64
Higher values = faster inference (lower precision = faster math).
quant_quality_penalty()
Quality penalty for quantization:
pub fn quant_quality_penalty(quant: &str) -> f64
Negative values indicate quality loss relative to F16.
Quantization Hierarchies
Predefined quantization hierarchies (best to worst quality):
// Standard GGUF hierarchy
pub const QUANT_HIERARCHY: &[&str] = &[
"Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "Q2_K"
];
// MLX-native hierarchy
pub const MLX_QUANT_HIERARCHY: &[&str] = &[
"mlx-8bit", "mlx-4bit"
];
Example:
use llmfit_core::models::{QUANT_HIERARCHY, MLX_QUANT_HIERARCHY};
let model = /* ... */;
let budget = 12.0;
// Try GGUF quantizations
if let Some((q, mem)) = model.best_quant_for_budget_with(
budget, 4096, QUANT_HIERARCHY
) {
println!("GGUF: {} ({:.2} GB)", q, mem);
}
// Try MLX quantizations
if let Some((q, mem)) = model.best_quant_for_budget_with(
budget, 4096, MLX_QUANT_HIERARCHY
) {
println!("MLX: {} ({:.2} GB)", q, mem);
}
Use Case Inference
use llmfit_core::{UseCase, ModelDatabase};
let db = ModelDatabase::new();
for model in db.get_all_models() {
let use_case = UseCase::from_model(model);
println!("{}: {}", model.name, use_case.label());
}
Use cases are inferred from model name and metadata:
- Embedding: “embed”, “bge” in name
- Coding: “code” in name or use_case
- Multimodal: “vision” in use_case
- Reasoning: “reason” or “deepseek-r1” in name
- Chat: “chat” or “instruction” in use_case
- General: Default fallback