Requirements

  • Python >=3.10, <3.13
  • CUDA-capable GPU (NVIDIA)
  • The following Python packages (installed automatically):
Package       Minimum Version
torch         >=2.4.0
triton        >=3.0.0
transformers  >=4.51.0
flash-attn    latest
xxhash        latest
flash-attn is compiled against your local CUDA toolkit during installation and can take several minutes to build. Ensure your CUDA version is compatible with your PyTorch installation before proceeding.
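One way to sanity-check this before installing is to print the CUDA version your PyTorch build targets (a sketch; it assumes PyTorch may or may not be installed yet and skips the check if it is missing):

```python
import importlib.util

# Pre-flight check: report the CUDA version PyTorch was built against.
# flash-attn compiles against the local CUDA toolkit, so this should match
# (at least in major version) the toolkit reported by `nvcc --version`.
if importlib.util.find_spec("torch") is None:
    print("torch is not installed yet; install it before building flash-attn")
else:
    import torch
    print("torch version:", torch.__version__)
    print("built for CUDA:", torch.version.cuda)
    print("GPU visible:", torch.cuda.is_available())
```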

Install Nano-vLLM

1. Install from GitHub

Install the latest version of Nano-vLLM directly from the source repository:
pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
This will also install all required dependencies listed above.
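If you want to keep the install isolated from your system packages, a fresh virtual environment works as well (a sketch; the environment name is arbitrary):

```shell
# Optional: install into a dedicated virtual environment
# (remember: Python 3.10-3.12 is required).
python3 -m venv nano-vllm-env
source nano-vllm-env/bin/activate
pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
```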
2. Download a model

Nano-vLLM loads models from a local directory. Use huggingface-cli to download model weights:
huggingface-cli download --resume-download Qwen/Qwen3-0.6B \
  --local-dir ~/huggingface/Qwen3-0.6B/ \
  --local-dir-use-symlinks False
Replace Qwen/Qwen3-0.6B and the --local-dir path with the model and location of your choice. Any model supported by transformers and flash-attn should work.
If huggingface-cli is not installed, run pip install huggingface_hub first.
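If you prefer to script the download, the same weights can be fetched from Python with huggingface_hub's snapshot_download (a sketch; the repo id and target directory mirror the CLI example above):

```python
import os
from huggingface_hub import snapshot_download

# Fetch the model repository into a local directory.
# Interrupted downloads resume automatically.
snapshot_download(
    repo_id="Qwen/Qwen3-0.6B",
    local_dir=os.path.expanduser("~/huggingface/Qwen3-0.6B"),
)
```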
3. Verify the installation

Confirm that Nano-vLLM imports successfully:
from nanovllm import LLM, SamplingParams
print("Nano-vLLM installed successfully")

Next Steps

Once installation is complete, follow the Quickstart guide to run your first inference.