In addition to the six curated desktop presets, the USB-Uncensored-LLM installer supports downloading any .gguf model file directly from HuggingFace and registering it in the Ollama engine automatically. This is useful when you want to test a specific quantization level not offered in the catalog, explore a different model family entirely, or use a fine-tune that has not yet made it into the preset list. The custom model flow is fully integrated into the same interactive installer used for preset models — no separate tooling or command-line knowledge required.

How to Add a Custom Model

1. Run the installer for your platform

Navigate to the folder for your current operating system and launch the installer:
  • Windows: Double-click Windows/install.bat
  • macOS: Open Terminal, drag in Mac/install.command, and press Enter
  • Linux: Run bash Linux/install.sh from the project root
The installer will display the full model selection menu with the curated presets listed by number.
2. Enter C at the model selection prompt

At the Your choice: prompt, type c (or C) to trigger the custom model flow. You can also combine it with preset numbers in a comma-separated list — for example, entering 1,c downloads Gemma 2 2B alongside your custom model in a single run.
Your choice: c
or mixed with a preset:
Your choice: 1,3,c
3. Paste a direct .gguf URL from HuggingFace

The installer will prompt you for a direct download URL. The URL must resolve to a raw .gguf binary, so use the resolve/main/ path format rather than the HuggingFace model page URL.
URL format:
https://huggingface.co/<user>/<repo>/resolve/main/<filename>-Q4_K_M.gguf
Example:
https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
Paste the full URL and press Enter. The installer extracts the filename automatically from the last path segment.
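The filename extraction described above can be sketched in a few lines of shell. This is an illustrative sketch, not the installer's actual code: the function name filename_from_url is hypothetical, and stripping a trailing query string or fragment is an added assumption that the installer may not make.

```shell
# Hypothetical helper: print the last path segment of a download URL.
# Assumption: a trailing "?query" or "#fragment" should not be part of the filename.
filename_from_url() {
    ffu_url=$1
    ffu_url=${ffu_url%%\?*}   # drop everything from the first "?" onward
    ffu_url=${ffu_url%%\#*}   # drop everything from the first "#" onward
    printf '%s\n' "${ffu_url##*/}"   # keep only the text after the last "/"
}

# Usage:
#   filename_from_url "https://huggingface.co/<user>/<repo>/resolve/main/<filename>-Q4_K_M.gguf"
```

A HuggingFace model page URL has no .gguf file as its last segment, which is one reason the resolve/main/ form is required.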
4. Give the model a short local name

You will be asked for a short identifier used as the Ollama model name and Modelfile filename:
Give it a short name (e.g. mymodel-local):
Enter something descriptive and lowercase, such as llama3-8b-local. The installer automatically:
  • Converts to lowercase
  • Replaces spaces with dashes
  • Appends -local if the name does not already end with it
If you press Enter without typing anything, the name defaults to custom-local.
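The naming rules above can be expressed as a small shell function. This is a sketch of the documented behavior only; the function name normalize_model_name is hypothetical, and the order in which the installer applies the rules is an assumption.

```shell
# Hypothetical helper that mirrors the documented naming rules:
# lowercase, spaces -> dashes, default to "custom-local", ensure a "-local" suffix.
normalize_model_name() {
    nmn_name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
    if [ -z "$nmn_name" ]; then
        nmn_name="custom-local"          # empty input falls back to the default
    fi
    case $nmn_name in
        *-local) ;;                      # already ends with -local, leave as-is
        *) nmn_name="$nmn_name-local" ;; # otherwise append the suffix
    esac
    printf '%s\n' "$nmn_name"
}

# Usage:
#   normalize_model_name "Llama3 8B"   # prints llama3-8b-local
#   normalize_model_name ""            # prints custom-local
```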
5. Enter a system prompt (or press Enter for the default)

Customize the model’s persona and behavior with a system prompt:
System prompt (press Enter for default):
If you press Enter without typing anything, the installer uses:
You are a helpful AI assistant.
You can paste any system prompt here, including multi-sentence instructions. The full text is written verbatim into the generated Modelfile.
6. The model downloads and is imported into Ollama

The installer downloads the .gguf file to Shared/models/<filename>.gguf, generates a Modelfile at Shared/models/Modelfile-<local-name>, and registers the model with the Ollama engine using:
ollama create <local-name> -f Modelfile-<local-name>
Once complete, the model is immediately available the next time you launch the chat UI with start-fast-chat.bat (Windows) or start.sh (macOS/Linux).

Generated Modelfile

After the installer completes Step 5 of its setup process, it creates a Modelfile at:
Shared/models/Modelfile-<local-name>
For a custom model named llama3-8b-local with the default system prompt, the file contents look like this:
FROM ./Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM You are a helpful AI assistant.
The FROM directive uses a relative path so the Modelfile works regardless of where the USB drive is mounted or which operating system is running. temperature 0.7 and top_p 0.9 are applied to all models — preset and custom alike — as sensible defaults for interactive chat.
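To make the file layout concrete, the following sketch writes a Modelfile with the same four lines. It is not the installer's code: the function name write_modelfile and its argument order are hypothetical, and it assumes a single-line system prompt (a prompt containing line breaks would need Ollama's triple-quoted SYSTEM syntax instead).

```shell
# Hypothetical helper: write Modelfile-<local-name> next to the .gguf file.
# Arguments: <gguf filename> <local name> [system prompt] [models directory]
write_modelfile() {
    wm_gguf=$1
    wm_name=$2
    wm_prompt=${3:-You are a helpful AI assistant.}   # documented default prompt
    wm_dir=${4:-Shared/models}                         # documented models directory
    wm_path="$wm_dir/Modelfile-$wm_name"
    {
        printf 'FROM ./%s\n' "$wm_gguf"                # relative path keeps the drive portable
        printf 'PARAMETER temperature 0.7\n'
        printf 'PARAMETER top_p 0.9\n'
        printf 'SYSTEM %s\n' "$wm_prompt"
    } > "$wm_path"
    printf '%s\n' "$wm_path"                           # report where the file was written
}

# Usage (run from the project root):
#   write_modelfile "Meta-Llama-3-8B-Instruct-Q4_K_M.gguf" "llama3-8b-local"
```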

Manual Installation

If you are on a machine without a working installer, or you want to add a model that you have already downloaded separately, you can do the whole process by hand.
1. Place the GGUF file in the models directory:
cp /path/to/your-model-Q4_K_M.gguf Shared/models/
2. Create a Modelfile manually at Shared/models/Modelfile-mymodel-local:
FROM ./your-model-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM You are a helpful AI assistant.
3. Register the model with Ollama:
OLLAMA_MODELS=Shared/models/ollama_data ./Shared/bin/ollama-linux create mymodel-local -f Shared/models/Modelfile-mymodel-local
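On macOS, replace ollama-linux with ollama-darwin. On Windows, use ollama-windows.exe and set the variable before running the command, because the inline VAR=value prefix is a Unix shell feature: in PowerShell, assign $env:OLLAMA_MODELS; in Command Prompt, use set OLLAMA_MODELS=<path> instead, since the $env: syntax is PowerShell-only.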
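For macOS and Linux, the binary choice can be scripted rather than edited by hand. The sketch below is an assumption-laden convenience wrapper, not part of the project: the function names ollama_binary_for and register_model are hypothetical, and it assumes the binaries live under Shared/bin/ as in the command above.

```shell
# Hypothetical helper: map a `uname -s` value to the bundled Ollama binary path.
ollama_binary_for() {
    case $1 in
        Linux)  printf '%s\n' "Shared/bin/ollama-linux" ;;
        Darwin) printf '%s\n' "Shared/bin/ollama-darwin" ;;
        *)      return 1 ;;   # Windows and other systems are not handled here
    esac
}

# Hypothetical wrapper around the documented `create` command.
# Run from the project root: register_model mymodel-local
register_model() {
    rm_name=$1
    rm_bin=$(ollama_binary_for "$(uname -s)") || {
        echo "Unsupported operating system for this sketch" >&2
        return 1
    }
    OLLAMA_MODELS=Shared/models/ollama_data "./$rm_bin" create "$rm_name" \
        -f "Shared/models/Modelfile-$rm_name"
}
```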

Finding Models on HuggingFace

HuggingFace hosts thousands of GGUF-quantized models. To browse effectively:
  1. Go to huggingface.co/models
  2. In the Library filter on the left sidebar, select GGUF
  3. Browse by task, language, or search by model family name
Recommended community quantizers for USB usage:
  • bartowski — High-quality Q4_K_M and Q5_K_M quantizations of popular open-source models, including most of the models in the USB-Uncensored-LLM preset catalog
  • TheBloke — One of the original large-scale GGUF quantization repositories; slightly older releases but extremely broad model coverage
Recommended quantization for USB use: Q4_K_M. This 4-bit "K-quant medium" format offers the best tradeoff between model quality and file size for portable storage. Q5_K_M is higher quality at the cost of a larger file, typically around 15 to 20% bigger than the Q4_K_M build of the same model. Q2_K is the smallest option but comes with a noticeable quality drop.

URL Validation

The installer performs two checks on the URL you provide:
  1. .gguf extension check: If the URL does not contain .gguf, the installer displays a warning and asks whether to proceed:
    WARNING: URL does not contain .gguf - this may not be a valid model file.
    Try anyway? (yes/no):
    
    Entering yes or y allows the download to proceed. Entering no skips the custom model entirely.
  2. Minimum file size check: After download, the installer verifies the file is larger than 100 MB. Files smaller than this threshold are considered incomplete or invalid (for example, an HTML error page returned by a bad URL), and the model is flagged as a download error. You can re-run the installer to retry.
Custom models must be in GGUF format. Other model file formats, including safetensors, PyTorch .bin files, and ONNX exports, are not compatible with the Ollama engine that powers USB-Uncensored-LLM. If you only have a non-GGUF checkpoint, it must be converted and quantized to GGUF using tools such as llama.cpp’s convert_hf_to_gguf.py (named convert-hf-to-gguf.py in older llama.cpp releases) before it can be used here.
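The two checks can be approximated as follows. This is a sketch under stated assumptions, not the installer's code: the function names are hypothetical, the .gguf match is assumed to be case-sensitive, and 100 MB is assumed to mean 100 × 1024 × 1024 bytes (the installer's exact threshold is not documented here).

```shell
# Assumption: "100 MB" is interpreted as 100 MiB.
MIN_MODEL_BYTES=$((100 * 1024 * 1024))

# Check 1: succeeds when the URL contains ".gguf" anywhere.
url_mentions_gguf() {
    case $1 in
        *.gguf*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Check 2: succeeds when the file exists and is larger than the threshold.
file_is_large_enough() {
    [ -f "$1" ] || return 1
    fle_size=$(wc -c < "$1" | tr -d ' ')   # byte count; strip padding some wc builds add
    [ "$fle_size" -gt "$MIN_MODEL_BYTES" ]
}

# Usage:
#   url_mentions_gguf "$url" || echo "WARNING: URL does not contain .gguf"
#   file_is_large_enough "Shared/models/$filename" || echo "Download looks incomplete"
```

A size check of this kind catches the common failure where a bad URL returns a small HTML error page that is saved under a .gguf name.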

Combining Preset and Custom Models

You can mix preset numbers and the c flag in a single installer run. The installer processes all selections in order and downloads them sequentially.
Example: download Gemma 2 2B, Qwen3.5 9B, and a custom model together:
Your choice: 1,3,c
The installer will:
  1. Download gemma-2-2b-it-abliterated-Q4_K_M.gguf (preset #1)
  2. Download Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf (preset #3)
  3. Prompt you for a custom GGUF URL and local name
  4. Import all three models into Ollama
Space warning behavior: Whenever three or more models are selected (preset + custom combined), the installer automatically triggers a disk space warning before downloading anything:
WARNING: You selected 3 models!
Estimated download: ~11.8 GB
Need at least ~16 GB free on the drive!
Continue? (yes/no):
The warning also compares the estimated download size against the drive’s current free space and flags a potential shortfall if available storage looks insufficient. Entering no cancels the entire run so you can re-launch the installer with a smaller selection.
Already-downloaded models are skipped automatically on subsequent runs. If a model file passes the minimum byte size check, the installer prints Already downloaded! Skipping... and moves on. This makes it safe to re-run the installer to add new models without re-downloading anything that is already present.
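The selection handling described in this section can be illustrated with a short sketch. The function names are hypothetical and the parsing details are assumptions: entries are split on commas, surrounding spaces are ignored, only numbers and c/C are counted, and duplicates are not merged.

```shell
# Hypothetical helper: count valid entries in a choice string such as "1,3,c".
count_selected_models() {
    printf '%s\n' "$1" | tr ',' '\n' | tr -d ' ' | grep -c -E '^([0-9]+|[cC])$'
}

# Succeeds when the selection reaches the documented three-model warning threshold.
needs_space_warning() {
    [ "$(count_selected_models "$1")" -ge 3 ]
}

# Usage:
#   if needs_space_warning "$choice"; then
#       echo "WARNING: You selected $(count_selected_models "$choice") models!"
#   fi
```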
