## Requirements

- Python >=3.10, <3.13
- CUDA-capable GPU (NVIDIA)
- The following Python packages (installed automatically):
| Package | Minimum Version |
|---|---|
| torch | >=2.4.0 |
| triton | >=3.0.0 |
| transformers | >=4.51.0 |
| flash-attn | latest |
| xxhash | latest |
## Install Nano-vLLM

### Install from GitHub
Install the latest version of Nano-vLLM directly from the source repository. This will also install all required dependencies listed above.
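The original install command was not preserved here; a typical `pip` install from a Git source looks like the sketch below. The repository URL is an assumption — substitute the project's actual GitHub path if it differs.

```shell
# Install Nano-vLLM straight from its Git repository
# (URL is assumed; replace with the project's actual repository if different):
pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
```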
## Download a model
Nano-vLLM loads models from a local directory. Use `huggingface-cli` to download model weights. Replace `Qwen/Qwen3-0.6B` and the `--local-dir` path with the model and location of your choice. Any model supported by transformers and flash-attn should work.
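The exact download command was dropped from this section; a typical invocation of the Hugging Face CLI looks like the following. The local directory path is illustrative — point it wherever you want the weights stored.

```shell
# Download the Qwen/Qwen3-0.6B weights into a local directory
# (the --local-dir path here is only an example):
huggingface-cli download Qwen/Qwen3-0.6B --local-dir ./models/Qwen3-0.6B
```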