SamplingParams controls how tokens are sampled from the model’s output distribution. It is a Python dataclass and is imported alongside LLM from the top-level package.
from nanovllm import SamplingParams

Fields

temperature
float
default: 1.0
Scales the logits before sampling. Higher values produce more random output; lower values concentrate probability on the top tokens.

Constraint: must be strictly greater than 1e-10. Passing 0.0 or any value <= 1e-10 raises an AssertionError with the message "greedy sampling is not permitted".
max_tokens
int
default: 64
Maximum number of tokens to generate per request. Generation stops when this limit is reached or when the EOS token is emitted, whichever comes first.
ignore_eos
bool
default: False
When True, the EOS token does not stop generation. The sequence continues until max_tokens is exhausted. Useful for throughput benchmarking where controlled-length outputs are needed.
Greedy decoding (temperature=0) is not supported. Use a small positive temperature such as 0.01 if near-deterministic output is required.
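Temperature works by dividing the logits by the temperature before the softmax, so values below 1.0 sharpen the distribution and values above 1.0 flatten it. A minimal standalone sketch of the effect (plain Python, not nano-vllm's sampling code):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by 1/temperature, then normalize (max-subtraction for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cool = softmax(logits, temperature=0.5)  # sharper: top token dominates
base = softmax(logits, temperature=1.0)
hot = softmax(logits, temperature=2.0)   # flatter: probability mass spreads out
```

Here the top token's probability is highest under `temperature=0.5` and lowest under `temperature=2.0`, which is why a small positive temperature approximates greedy decoding.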

Validation

The dataclass runs __post_init__ on construction:
def __post_init__(self):
    assert self.temperature > 1e-10, "greedy sampling is not permitted"
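To see the check in action, here is a minimal standalone replica of the dataclass (defined locally so it runs without nanovllm installed) showing that construction with temperature=0.0 fails:

```python
from dataclasses import dataclass

@dataclass
class SamplingParams:
    temperature: float = 1.0
    max_tokens: int = 64
    ignore_eos: bool = False

    def __post_init__(self):
        # Runs automatically after dataclass __init__.
        assert self.temperature > 1e-10, "greedy sampling is not permitted"

# Valid: a small positive temperature constructs fine.
sp = SamplingParams(temperature=0.01)

# Invalid: temperature=0.0 trips the assertion at construction time.
try:
    SamplingParams(temperature=0.0)
except AssertionError as e:
    print(e)  # greedy sampling is not permitted
```

Because the assertion fires in `__post_init__`, an invalid SamplingParams object is never created; the error surfaces immediately rather than at generation time.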

Examples

from nanovllm import SamplingParams

# Near-deterministic output
sp = SamplingParams(temperature=0.01, max_tokens=256)

# Creative / diverse output
sp = SamplingParams(temperature=1.2, max_tokens=512)

# Fixed-length output for benchmarking (ignore EOS)
sp = SamplingParams(temperature=1.0, max_tokens=128, ignore_eos=True)

Per-request sampling params

You can pass a list of SamplingParams to LLM.generate() to apply different settings to each prompt:
from nanovllm import LLM, SamplingParams

llm = LLM("/models/Qwen3-0.6B")

prompts = ["Write a poem.", "Write a bug report."]
sampling_params = [
    SamplingParams(temperature=1.1, max_tokens=200),   # creative
    SamplingParams(temperature=0.3, max_tokens=100),   # precise
]

outputs = llm.generate(prompts, sampling_params)
