GGUF format
ik_llama.cpp uses the GGUF (GPT-Generated Unified Format) binary format. Every model file stores tensor data alongside metadata such as the architecture, tokenizer, and quantization type. Key metadata fields logged at startup include tensor types (f32, q6_K, etc.) and KV cache sizes. Check these when diagnosing memory or quality issues.
Converting from HuggingFace
Convert a HuggingFace model to GGUF with the bundled convert_hf_to_gguf.py script. Run it with --help for the full option list.
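A typical invocation might look like the following; the model directory, output filename, and --outtype value are illustrative, so adjust them for your model:

```shell
# Convert a local HuggingFace checkout to a single f16 GGUF file.
# The directory and output name below are examples, not fixed values.
python convert_hf_to_gguf.py ./Meta-Llama-3-8B \
    --outfile llama-3-8b-f16.gguf \
    --outtype f16
```

The f16 output is usually an intermediate; quantize it afterwards with the project's quantize tool to get a smaller file.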
Inspecting a GGUF file
Use gguf_dump.py to view all tensor names, shapes, and metadata.
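For example (the model filename is a placeholder, and the script's location under gguf-py as well as flags such as --no-tensors can vary between versions, so check your checkout):

```shell
# Print all metadata key/value pairs and tensor info for a GGUF file.
python gguf-py/scripts/gguf_dump.py model.gguf

# Metadata only, skipping the (long) tensor listing.
python gguf-py/scripts/gguf_dump.py --no-tensors model.gguf
```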
Splitting large models
Split an oversized GGUF into parts for easier storage or upload. At load time, pass only the first part to --model; ik_llama.cpp discovers the remaining parts automatically.
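A sketch of splitting and then loading, assuming the gguf-split tool from the llama.cpp family is built in this fork (binary names, flags, and the 20G size limit are illustrative; check your build's --help):

```shell
# Split into parts of at most 20 GB each; produces shards named
# model-00001-of-0000N.gguf, model-00002-of-0000N.gguf, ...
./llama-gguf-split --split --split-max-size 20G model.gguf model

# Load by pointing --model at the first part only; the remaining
# shards are found automatically from the naming convention.
./llama-server --model model-00001-of-00003.gguf
```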
Checking imatrix metadata
An importance matrix (imatrix) weights quantization error using activation statistics collected from calibration data, reducing quality loss at low bit widths. To verify whether a GGUF was quantized with an imatrix, inspect its metadata for quantize.imatrix.* fields. Their presence indicates the file was built with imatrix data. For quantization types below Q6_0, imatrix use is strongly recommended.
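One way to check, assuming the gguf_dump.py script ships under gguf-py in your checkout (the path, model filename, and the exact key names listed in the comment are typical values, not guarantees):

```shell
# List only the imatrix-related metadata keys; when present they
# typically include entries such as quantize.imatrix.file and
# quantize.imatrix.dataset. No output means no imatrix metadata.
python gguf-py/scripts/gguf_dump.py --no-tensors model.gguf \
    | grep 'quantize.imatrix'
```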
To convert a GGUF imatrix file to the older .dat format expected by some tools, use convert_imatrix_gguf_to_dat.py.
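A hedged sketch of an invocation; the filenames and the output flag are assumptions, so consult the script's --help for its actual interface:

```shell
# Convert a GGUF-formatted imatrix to the legacy .dat layout.
# Both filenames and the -o flag are illustrative.
python convert_imatrix_gguf_to_dat.py imatrix.gguf -o imatrix.dat
```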