In addition to the six curated desktop presets, the USB-Uncensored-LLM installer supports downloading any .gguf model file directly from HuggingFace and registering it in the Ollama engine automatically. This is useful when you want to test a specific quantization level not offered in the catalog, explore a different model family entirely, or use a fine-tune that has not yet made it into the preset list. The custom model flow is fully integrated into the same interactive installer used for preset models, so no separate tooling or command-line knowledge is required.
How to Add a Custom Model
Run the installer for your platform
Navigate to the folder for your current operating system and launch the installer:
- Windows: Double-click Windows/install.bat
- macOS: Open Terminal, drag in Mac/install.command, and press Enter
- Linux: Run bash Linux/install.sh from the project root
Enter C at the model selection prompt
At the Your choice: prompt, type c (or C) to trigger the custom model flow. You can also combine it with preset numbers in a comma-separated list. For example, entering 1,c downloads Gemma 2 2B alongside your custom model in a single run.
Paste a direct .gguf URL from HuggingFace
The installer will prompt you for a direct download URL. The URL must resolve to a raw .gguf binary: use the resolve/main/ path format, not the HuggingFace model page URL. Paste the full URL and press Enter. The installer extracts the filename automatically from the last path segment.
Give the model a short local name
You will be asked for a short identifier used as the Ollama model name and Modelfile filename. Enter something descriptive and lowercase, such as llama3-8b-local. The installer automatically:
- Converts to lowercase
- Replaces spaces with dashes
- Appends -local if the name does not already end with it
If you leave the name blank, the installer falls back to custom-local.
Enter a system prompt (or press Enter for the default)
Customize the model’s persona and behavior with a system prompt. If you press Enter without typing anything, the installer uses its built-in default prompt. You can paste any system prompt here, including multi-sentence instructions. The full text is written verbatim into the generated Modelfile.
The model downloads and is imported into Ollama
The installer downloads the .gguf file to Shared/models/<filename>.gguf, generates a Modelfile at Shared/models/Modelfile-<local-name>, and registers the model with the Ollama engine. Once complete, the model is immediately available the next time you launch the chat UI with start-fast-chat.bat (Windows) or start.sh (macOS/Linux).
Generated Modelfile
After the installer completes Step 5 of its setup process, it creates a Modelfile at Shared/models/Modelfile-<local-name>. For a model named llama3-8b-local, that is Shared/models/Modelfile-llama3-8b-local.
The FROM directive uses a relative path so the Modelfile works regardless of where the USB drive is mounted or which operating system is running. temperature 0.7 and top_p 0.9 are applied to all models, preset and custom alike, as sensible defaults for interactive chat.
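The naming rules and Modelfile layout described above can be sketched in a few lines of Python. This is an illustrative sketch, not the installer's own code: the function names are hypothetical, the custom-local fallback for an empty name is an assumption, and the exact text the installer writes (including its default system prompt) is not shown here. FROM, PARAMETER, and SYSTEM are standard Ollama Modelfile instructions.

```python
def normalize_local_name(raw: str, default: str = "custom-local") -> str:
    """Apply the documented rules: lowercase, spaces to dashes, '-local' suffix.

    The fallback to `default` for an empty name is an assumption.
    """
    name = raw.strip().lower().replace(" ", "-")
    if not name:
        return default
    if not name.endswith("-local"):
        name += "-local"
    return name


def build_modelfile(gguf_filename: str, system_prompt: str) -> str:
    """Return Modelfile text with a relative FROM path and the shared defaults.

    Triple quotes let the SYSTEM value span several sentences or lines.
    """
    return (
        f"FROM ./{gguf_filename}\n"
        "PARAMETER temperature 0.7\n"
        "PARAMETER top_p 0.9\n"
        f'SYSTEM """{system_prompt}"""\n'
    )


if __name__ == "__main__":
    name = normalize_local_name("Llama3 8B")
    print(f"Modelfile-{name}")
    # The system prompt below is a placeholder, not the installer's default.
    print(build_modelfile("example-Q4_K_M.gguf", "You are a helpful assistant."))
```

Because the FROM path is relative, the generated text stays valid whether the drive is mounted at E:\ on Windows or under /Volumes or /media elsewhere.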
Manual Installation
If you are on a machine without a working installer, or you want to add a model that you have already downloaded separately, you can do the whole process by hand.
1. Place the GGUF file in the models directory, Shared/models/.
2. Create a Modelfile at Shared/models/Modelfile-mymodel-local.
3. Register the model with the Ollama binary for your platform. On macOS, replace ollama-linux with ollama-darwin. On Windows, use ollama-windows.exe inside a PowerShell or Command Prompt session with the OLLAMA_MODELS environment variable set accordingly (in PowerShell, via $env:OLLAMA_MODELS).
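As a rough sketch of the registration step, the snippet below picks the platform binary named above and assembles an `ollama create <name> -f <Modelfile>` argument list. It only builds the command and does not run it. The helper names are hypothetical, and the location of the binaries on the drive and the value of OLLAMA_MODELS are not specified here, so both are left to the caller.

```python
import platform
from typing import List, Optional

# Binary names as given in this guide, keyed by platform.system() values.
BINARIES = {
    "Linux": "ollama-linux",
    "Darwin": "ollama-darwin",
    "Windows": "ollama-windows.exe",
}


def ollama_binary(system: Optional[str] = None) -> str:
    """Return the Ollama binary name for the given (or current) platform."""
    system = system or platform.system()
    if system not in BINARIES:
        raise ValueError(f"unsupported platform: {system}")
    return BINARIES[system]


def create_command(local_name: str, system: Optional[str] = None) -> List[str]:
    """Build the argument list for registering a model from its Modelfile."""
    return [
        ollama_binary(system),
        "create",
        local_name,
        "-f",
        f"Modelfile-{local_name}",
    ]


if __name__ == "__main__":
    print(" ".join(create_command("mymodel-local")))
```

The resulting list is meant to be run from Shared/models/ so that the relative FROM path in the Modelfile resolves, with OLLAMA_MODELS already set in the environment.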
Finding Models on HuggingFace
HuggingFace hosts thousands of GGUF-quantized models. To browse effectively:
- Go to huggingface.co/models
- In the Library filter on the left sidebar, select GGUF
- Browse by task, language, or search by model family name
Uploaders with broad GGUF coverage:
- bartowski: High-quality Q4_K_M and Q5_K_M quantizations of popular open-source models, including most of the models in the USB-Uncensored-LLM preset catalog
- TheBloke: One of the original large-scale GGUF quantization repositories; slightly older releases but extremely broad model coverage
Q4_K_M (4-bit K-quant, medium) offers the best tradeoff between model quality and file size for portable storage. Q5_K_M is higher quality at the cost of roughly 25% more storage. Q2_K is the smallest option but comes with a noticeable quality drop.
URL Validation
The installer performs two checks on the URL you provide:
- .gguf extension check: If the URL does not contain .gguf, the installer displays a warning and asks whether to proceed. Entering yes or y allows the download to proceed. Entering no skips the custom model entirely.
- Minimum file size check: After download, the installer verifies the file is larger than 100 MB. Files smaller than this threshold are considered incomplete or invalid (for example, an HTML error page returned by a bad URL), and the model is flagged as a download error. You can re-run the installer to retry.
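The two checks, together with the filename extraction from the last path segment, might look like the following in Python. This is a sketch under stated assumptions rather than the installer's implementation: the function names are hypothetical, 100 MB is taken as 100 × 1024 × 1024 bytes, and the example URL uses placeholder user and repository names in the standard HuggingFace resolve/main/ layout.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse

# Assumption: the 100 MB threshold is interpreted as 100 MiB.
MIN_BYTES = 100 * 1024 * 1024


def looks_like_gguf_url(url: str) -> bool:
    """Extension check: the URL should contain '.gguf' somewhere."""
    return ".gguf" in url


def uses_resolve_path(url: str) -> bool:
    """True for direct-download URLs of the form .../resolve/main/<file>."""
    return "/resolve/main/" in urlparse(url).path


def filename_from_url(url: str) -> str:
    """Take the last path segment, ignoring any query string."""
    return unquote(urlparse(url).path.rsplit("/", 1)[-1])


def is_complete_download(path: str, min_bytes: int = MIN_BYTES) -> bool:
    """Size check: the file must exist and exceed the minimum size."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > min_bytes


if __name__ == "__main__":
    # Placeholder URL; substitute a real repository and file.
    url = "https://huggingface.co/example-user/example-repo/resolve/main/example-Q4_K_M.gguf"
    print(looks_like_gguf_url(url), uses_resolve_path(url), filename_from_url(url))
```

The size check is what catches a model page URL pasted by mistake: the download succeeds, but the saved file is a few kilobytes of HTML rather than a multi-gigabyte binary.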
Combining Preset and Custom Models
You can mix preset numbers and the c flag in a single installer run. The installer processes all selections in order and downloads them sequentially.
Example: entering 1,3,c downloads Gemma 2 2B, Qwen 9B, and a custom model together. The installer will:
- Download gemma-2-2b-it-abliterated-Q4_K_M.gguf (preset #1)
- Download Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf (preset #3)
- Prompt you for a custom GGUF URL and local name
- Import all three models into Ollama
Entering no at the installer's confirmation prompt cancels the entire run so you can re-launch the installer with a smaller selection.
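To make the comma-separated selection format concrete, here is a minimal Python sketch of how such input could be parsed. It is not taken from the installer: the function name is hypothetical, the preset count of six comes from the six curated desktop presets, and rejecting unknown tokens with an error is an assumption about behavior.

```python
from typing import List, Tuple


def parse_selection(text: str, preset_count: int = 6) -> Tuple[List[int], bool]:
    """Split input such as '1,3,c' into preset numbers and a custom-model flag.

    Presets keep the order they were typed in; duplicates are dropped.
    """
    presets: List[int] = []
    custom = False
    for token in text.split(","):
        token = token.strip().lower()
        if not token:
            continue  # tolerate stray commas and surrounding spaces
        if token == "c":
            custom = True
        elif token.isdigit() and 1 <= int(token) <= preset_count:
            number = int(token)
            if number not in presets:
                presets.append(number)
        else:
            raise ValueError(f"unrecognized selection: {token!r}")
    return presets, custom


if __name__ == "__main__":
    print(parse_selection("1,3,c"))
```

Keeping the presets in input order matches the description that selections are processed in order and downloaded sequentially.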
Already-downloaded models are skipped automatically on subsequent runs. If a model file passes the minimum byte size check, the installer prints Already downloaded! Skipping... and moves on. This makes it safe to re-run the installer to add new models without re-downloading anything that is already present.