SamplingParams controls how tokens are sampled from the model’s output distribution. It is a Python dataclass and is imported alongside LLM from the top-level package.
Fields
temperature
Scales the logits before sampling. Higher values produce more random output; lower values concentrate probability on the top tokens. Constraint: must be strictly greater than 1e-10. Passing 0.0 or any value <= 1e-10 raises an AssertionError with the message "greedy sampling is not permitted".

max_tokens
Maximum number of tokens to generate per request. Generation stops when this limit is reached or when the EOS token is emitted, whichever comes first.

ignore_eos
When True, the EOS token does not stop generation. The sequence continues until max_tokens is exhausted. Useful for throughput benchmarking where controlled-length outputs are needed.

Validation
The dataclass runs __post_init__ on construction, which asserts that temperature is strictly greater than 1e-10 and raises an AssertionError otherwise.
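As a minimal sketch of the shape described above (the field names, defaults, and ordering here are assumptions for illustration, not copied from the library source), the dataclass and its validation might look like:

```python
from dataclasses import dataclass

@dataclass
class SamplingParams:
    # Field names and defaults below are illustrative assumptions.
    temperature: float = 1.0
    max_tokens: int = 64
    ignore_eos: bool = False

    def __post_init__(self):
        # Greedy sampling is rejected: temperature must exceed 1e-10.
        assert self.temperature > 1e-10, "greedy sampling is not permitted"
```

Constructing SamplingParams(temperature=0.0) under this sketch fails immediately with the AssertionError quoted above, rather than silently falling back to greedy decoding.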
Examples
Per-request sampling params
You can pass a list of SamplingParams to LLM.generate() to apply different settings to each prompt:
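A sketch of the per-request pattern follows. SamplingParams is stubbed here so the snippet runs standalone; with the real library you would import LLM and SamplingParams from the top-level package, and the commented generate() call is the assumed entry point.

```python
from dataclasses import dataclass

# Stub standing in for the real SamplingParams so this sketch is self-contained.
@dataclass
class SamplingParams:
    temperature: float = 1.0
    max_tokens: int = 64
    ignore_eos: bool = False

prompts = ["Explain KV caching.", "Write a haiku about GPUs."]

# One SamplingParams per prompt: a low-temperature factual answer
# and a higher-temperature creative one.
per_request = [
    SamplingParams(temperature=0.2, max_tokens=256),
    SamplingParams(temperature=1.0, max_tokens=64),
]

# The params list is assumed to pair with prompts one-to-one.
assert len(per_request) == len(prompts)

# With the real library:
# outputs = llm.generate(prompts, per_request)
```

Because the settings pair positionally with the prompts, mixing deterministic-leaning and exploratory requests in one batch requires no extra plumbing.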