Custom GGUF Model Installation – USB-Uncensored-LLM

In addition to the six curated desktop presets, the USB-Uncensored-LLM installer supports downloading any .gguf model file directly from HuggingFace and registering it in the Ollama engine automatically. This is useful when you want to test a specific quantization level not offered in the catalog, explore a different model family entirely, or use a fine-tune that has not yet made it into the preset list. The custom model flow is fully integrated into the same interactive installer used for preset models — no separate tooling or command-line knowledge required.

How to Add a Custom Model

Run the installer for your platform

Navigate to the folder for your current operating system and launch the installer:

Windows: Double-click Windows/install.bat
macOS: Open Terminal, drag in Mac/install.command, and press Enter
Linux: Run bash Linux/install.sh from the project root

The installer will display the full model selection menu with the curated presets listed by number.

Enter C at the model selection prompt

At the Your choice: prompt, type c (or C) to trigger the custom model flow. You can also combine it with preset numbers in a comma-separated list — for example, entering 1,c downloads Gemma 2 2B alongside your custom model in a single run.

Your choice: c

or mixed with a preset:

Your choice: 1,3,c
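
The way a mixed answer such as 1,3,c could be split into preset numbers and the custom option can be sketched as follows. This is an illustration only, not the installer's actual code: the function name parse_choice and its error handling are hypothetical, and the preset count of six is taken from the catalog size mentioned above.

```python
def parse_choice(raw: str, preset_count: int = 6) -> tuple[list[int], bool]:
    """Split a comma-separated answer into preset numbers and a custom flag.

    Hypothetical helper: the real installer may parse its prompt differently.
    """
    presets: list[int] = []
    wants_custom = False
    for token in raw.split(","):
        token = token.strip().lower()
        if not token:
            continue  # tolerate stray commas such as "1,,c"
        if token == "c":
            wants_custom = True
        elif token.isascii() and token.isdigit() and 1 <= int(token) <= preset_count:
            number = int(token)
            if number not in presets:  # ignore duplicates, keep entry order
                presets.append(number)
        else:
            raise ValueError(f"unrecognized selection: {token!r}")
    return presets, wants_custom
```

Calling parse_choice("1,3,c") yields the preset list [1, 3] together with a flag saying the custom flow was requested.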

Paste a direct .gguf URL from HuggingFace

The installer will prompt you for a direct download URL. The URL must resolve to a raw .gguf binary, so use the resolve/main/ path format rather than the HuggingFace model page URL.

URL format:

https://huggingface.co/<user>/<repo>/resolve/main/<filename>-Q4_K_M.gguf

Example:

https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf

Paste the full URL and press Enter. The installer extracts the filename automatically from the last path segment.
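
Taking the filename from the last path segment can be sketched with Python's standard library. The helper names below are hypothetical, and the /resolve/ check is only a heuristic assumption for spotting a model page URL pasted by mistake. It is not a documented installer check.

```python
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string."""
    path = urlparse(url.strip()).path
    name = unquote(path.rstrip("/").rsplit("/", 1)[-1])
    if not name:
        raise ValueError("URL has no path segment to use as a filename")
    return name


def looks_like_direct_link(url: str) -> bool:
    """Heuristic (assumption): direct HuggingFace downloads contain /resolve/."""
    return "/resolve/" in urlparse(url.strip()).path
```

With the example URL above, filename_from_url returns Meta-Llama-3-8B-Instruct-Q4_K_M.gguf.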

Give the model a short local name

You will be asked for a short identifier used as the Ollama model name and Modelfile filename:

Give it a short name (e.g. mymodel-local):

Enter something descriptive and lowercase, such as llama3-8b-local. The installer automatically:

Converts to lowercase
Replaces spaces with dashes
Appends -local if the name does not already end with it

If you press Enter without typing anything, the name defaults to custom-local.
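
The naming rules above can be expressed in a few lines. This is a minimal sketch, not the installer's source: the function name normalize_local_name is hypothetical, and trimming surrounding whitespace before applying the rules is an assumption.

```python
def normalize_local_name(raw: str) -> str:
    """Apply the documented rules: lowercase, dashes for spaces, -local suffix."""
    name = raw.strip().lower()  # trimming is an assumption, lowercasing is documented
    if not name:
        return "custom-local"  # documented default for an empty answer
    name = name.replace(" ", "-")
    if not name.endswith("-local"):
        name += "-local"
    return name
```

For example, an answer of "Llama3 8B" would become llama3-8b-local under these rules.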

Enter a system prompt (or press Enter for the default)

Customize the model’s persona and behavior with a system prompt:

System prompt (press Enter for default):

If you press Enter without typing anything, the installer uses:

You are a helpful AI assistant.

You can paste any system prompt here, including multi-sentence instructions. The full text is written verbatim into the generated Modelfile.

The model downloads and is imported into Ollama

The installer downloads the .gguf file to Shared/models/<filename>.gguf, generates a Modelfile at Shared/models/Modelfile-<local-name>, and registers the model with the Ollama engine using:

ollama create <local-name> -f Modelfile-<local-name>

Once complete, the model is immediately available the next time you launch the chat UI with start-fast-chat.bat (Windows) or start.sh (macOS/Linux).

Generated Modelfile

After the installer completes Step 5 of its setup process, it creates a Modelfile at:

Shared/models/Modelfile-<local-name>

For a custom model named llama3-8b-local with the default system prompt, the file contents look like this:

FROM ./Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM You are a helpful AI assistant.

The FROM directive uses a relative path so the Modelfile works regardless of where the USB drive is mounted or which operating system is running. temperature 0.7 and top_p 0.9 are applied to all models — preset and custom alike — as sensible defaults for interactive chat.
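
As an illustration of how such a file could be produced, the sketch below renders the same four lines from a filename and a system prompt. The function name render_modelfile is hypothetical and the installer's real templating may differ. The sketch assumes a single-line prompt, as typed at an interactive prompt.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful AI assistant."


def render_modelfile(gguf_filename: str, system_prompt: str = "",
                     temperature: float = 0.7, top_p: float = 0.9) -> str:
    """Build Modelfile text with a relative FROM path and the default parameters."""
    # Fall back to the documented default when nothing was entered.
    prompt = system_prompt.strip() or DEFAULT_SYSTEM_PROMPT
    lines = [
        f"FROM ./{gguf_filename}",  # relative path keeps the file portable
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        f"SYSTEM {prompt}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to Shared/models/Modelfile-<local-name> would give a file equivalent to the one shown above.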

Manual Installation

If you are on a machine without a working installer, or you want to add a model that you have already downloaded separately, you can do the whole process by hand.

1. Place the GGUF file in the models directory:

cp /path/to/your-model-Q4_K_M.gguf Shared/models/

2. Create a Modelfile manually at Shared/models/Modelfile-mymodel-local:

FROM ./your-model-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM You are a helpful AI assistant.

3. Register the model with Ollama:

OLLAMA_MODELS=Shared/models/ollama_data ./Shared/bin/ollama-linux create mymodel-local -f Shared/models/Modelfile-mymodel-local

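On macOS, replace ollama-linux with ollama-darwin. On Windows, use ollama-windows.exe and set OLLAMA_MODELS to the same directory before running the command: in PowerShell, assign it with $env:OLLAMA_MODELS; in Command Prompt, use set OLLAMA_MODELS=<path> instead, because the $env: syntax is specific to PowerShell.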
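
If you prefer to script the registration step, the per-platform command can be assembled as in the sketch below. It is a hedged example rather than part of the project: build_create_command is a hypothetical helper, and it assumes all three binaries sit in Shared/bin as the Linux command above suggests.

```python
import platform
from pathlib import Path

# Binary names as given in this guide, keyed by platform.system() values.
BINARIES = {
    "Linux": "ollama-linux",
    "Darwin": "ollama-darwin",
    "Windows": "ollama-windows.exe",
}


def build_create_command(project_root, local_name, system=None):
    """Return (argv, extra_env) for registering a model by hand."""
    system = system or platform.system()
    root = Path(project_root)
    binary = root / "Shared" / "bin" / BINARIES[system]  # assumed location
    modelfile = root / "Shared" / "models" / f"Modelfile-{local_name}"
    extra_env = {"OLLAMA_MODELS": str(root / "Shared" / "models" / "ollama_data")}
    argv = [str(binary), "create", local_name, "-f", str(modelfile)]
    return argv, extra_env
```

The result can be passed to subprocess.run(argv, env={**os.environ, **extra_env}, check=True), which sets the variable the same way on every platform.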

Finding Models on HuggingFace

HuggingFace hosts thousands of GGUF-quantized models. To browse effectively:

Go to huggingface.co/models
In the Library filter on the left sidebar, select GGUF
Browse by task, language, or search by model family name

Recommended community quantizers for USB usage:

bartowski — High-quality Q4_K_M and Q5_K_M quantizations of popular open-source models, including most of the models in the USB-Uncensored-LLM preset catalog
TheBloke — One of the original large-scale GGUF quantization repositories; slightly older releases but extremely broad model coverage

Recommended quantization for USB use: Q4_K_M. This 4-bit K-quant (medium) offers the best tradeoff between model quality and file size for portable storage. Q5_K_M is higher quality at the cost of roughly 15 to 20% more storage. Q2_K is the smallest option but comes with a noticeable quality drop.

URL Validation

The installer performs two checks on the URL you provide:

.gguf extension check: If the URL does not contain .gguf, the installer displays a warning and asks whether to proceed:
```
WARNING: URL does not contain .gguf - this may not be a valid model file.
Try anyway? (yes/no):
```
Entering yes or y allows the download to proceed. Entering no skips the custom model entirely.
Minimum file size check: After download, the installer verifies the file is larger than 100 MB. Files smaller than this threshold are considered incomplete or invalid (for example, an HTML error page returned by a bad URL), and the model is flagged as a download error. You can re-run the installer to retry.
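
Both checks are simple enough to sketch. The helper names below are hypothetical, and two details are assumptions rather than documented behavior: the extension match is case-insensitive, and 100 MB is interpreted as 100 × 1024 × 1024 bytes.

```python
from pathlib import Path

MIN_MODEL_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" read as MiB


def url_mentions_gguf(url: str) -> bool:
    """Extension check: does the URL contain .gguf anywhere?"""
    return ".gguf" in url.lower()  # case-insensitive match is an assumption


def confirm(answer: str) -> bool:
    """Accept yes or y, as described for the Try anyway? prompt."""
    return answer.strip().lower() in {"yes", "y"}


def is_complete_download(path, min_bytes: int = MIN_MODEL_BYTES) -> bool:
    """Size check: the file must exist and be larger than the threshold."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > min_bytes
```

A small HTML error page saved under a .gguf name would fail is_complete_download, which is the case the size threshold is meant to catch.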

Custom models must be in GGUF format. Other model file formats (including safetensors, PyTorch .bin files, and ONNX exports) are not compatible with the Ollama engine that powers USB-Uncensored-LLM. If you only have a non-GGUF checkpoint, it must be converted and quantized to GGUF using tools such as llama.cpp’s convert_hf_to_gguf.py (named convert-hf-to-gguf.py in older llama.cpp releases) before it can be used here.

Combining Preset and Custom Models

You can mix preset numbers and the c option in a single installer run. The installer processes all selections in order and downloads them sequentially.

Example: download Gemma 2 2B, Qwen3.5 9B, and a custom model together:

Your choice: 1,3,c

The installer will:

Download gemma-2-2b-it-abliterated-Q4_K_M.gguf (preset #1)
Download Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf (preset #3)
Prompt you for a custom GGUF URL and local name
Import all three models into Ollama

Space warning behavior: Whenever three or more models are selected (preset + custom combined), the installer automatically triggers a disk space warning before downloading anything:

WARNING: You selected 3 models!
Estimated download: ~11.8 GB
Need at least ~16 GB free on the drive!
Continue? (yes/no):

The warning also compares the estimated download size against the drive’s current free space and flags a potential shortfall if available storage looks insufficient. Entering no cancels the entire run so you can re-launch the installer with a smaller selection.
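
The logic behind this warning can be approximated as shown below. Treat it as a sketch under stated assumptions: space_warning is a hypothetical function, the 1.35 headroom factor is a guess chosen only to resemble the example figures above, and the wording of the shortfall line is invented for illustration.

```python
import shutil


def space_warning(sizes_gb, free_gb=None, drive_path=".", headroom=1.35):
    """Return warning text for three or more models, otherwise None."""
    if len(sizes_gb) < 3:
        return None  # the guide says the warning starts at three models
    if free_gb is None:
        # Query the drive that holds drive_path; 1e9 bytes per GB is assumed.
        free_gb = shutil.disk_usage(drive_path).free / 1e9
    download = sum(sizes_gb)
    needed = download * headroom  # hypothetical safety margin
    lines = [
        f"WARNING: You selected {len(sizes_gb)} models!",
        f"Estimated download: ~{download:.1f} GB",
        f"Need at least ~{needed:.0f} GB free on the drive!",
    ]
    if free_gb < needed:
        # Illustrative wording only; the installer's message may differ.
        lines.append(f"Only ~{free_gb:.1f} GB free: this selection may not fit.")
    return "\n".join(lines)
```

Passing free_gb explicitly makes the function easy to try out without touching a real drive. Leaving it unset measures the drive that contains drive_path.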

Already-downloaded models are skipped automatically on subsequent runs. If a model file passes the minimum byte size check, the installer prints Already downloaded! Skipping... and moves on. This makes it safe to re-run the installer to add new models without re-downloading anything that is already present.
