For quants below Q6_0, using an imatrix is strongly recommended.
Generating an imatrix
Prepare a calibration dataset
You need a plain-text file containing representative text. A common choice is a deduplicated sample from the same domain the model will be used on.
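With a model and a calibration file in hand, generation is a single llama-imatrix run. A minimal sketch, with placeholder paths; -m, -f, and -o follow the usual llama-imatrix conventions (model, calibration text, output file):

```shell
# Placeholders: adjust the model, calibration file, and output paths to your setup.
./llama-imatrix \
    -m ./models/model-f16.gguf \
    -f calibration.txt \
    -o imatrix.dat
```

Run this against the full-precision (or lightly quantized) model, not an already heavily quantized one, since the statistics are meant to guide the later quantization step.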
Options
--layer-similarity
Collect cosine-similarity statistics that measure how much each layer changes its activations.
--hide-imatrix
Obscure the imatrix provenance in the output file. When this flag is set, llama-imatrix stores top_secret in the data file name and calibration dataset fields, and writes zeros for the batch size and number of chunks.
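Both options can be appended to a normal generation run. A sketch, with placeholder paths (the binary location depends on your build):

```shell
# --layer-similarity and --hide-imatrix are the options described above;
# the model, calibration, and output paths are placeholders.
./llama-imatrix \
    -m ./models/model-f16.gguf \
    -f calibration.txt \
    -o imatrix.dat \
    --layer-similarity \
    --hide-imatrix
```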
Using an imatrix when quantizing
Pass --imatrix to llama-quantize:
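A sketch of a typical invocation, assuming placeholder file names and the common positional argument order (input model, output model, quantization type):

```shell
# Placeholders: substitute your own model paths and target quantization type.
./llama-quantize \
    --imatrix imatrix.dat \
    ./models/model-f16.gguf \
    ./models/model-q4_k_m.gguf \
    Q4_K_M
```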
Always pass --imatrix for best-quality results. Omitting it on quants below Q6_0 will noticeably degrade output quality.
Converting GGUF imatrix files
Some imatrix files are distributed in the newer GGUF format rather than the .dat format used by ik_llama.cpp. Convert them with the included script:
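The converter's name and location vary by checkout, so the script path below is a hypothetical placeholder; only the direction of the conversion (GGUF in, .dat out) comes from the text above:

```shell
# Hypothetical script path; check your ik_llama.cpp tree for the actual converter.
python ./convert_imatrix_gguf_to_dat.py imatrix.gguf imatrix.dat
```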
Verifying imatrix usage
To confirm that a downloaded GGUF was quantized with an imatrix, inspect its metadata. Look for fields with the quantize.imatrix.* prefix — their presence confirms an imatrix was applied during quantization.
You can view metadata with gguf_dump.py:
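A sketch, assuming gguf_dump.py lives in the usual gguf-py scripts directory of the checkout:

```shell
# Dump the metadata and filter for the imatrix provenance keys.
python ./gguf-py/scripts/gguf_dump.py model.gguf | grep quantize.imatrix
```

If the grep prints nothing, the file was most likely quantized without an imatrix.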