The PoolingParams class controls how vLLM performs pooling operations for embeddings, classification, and scoring tasks.

Constructor

from vllm import PoolingParams

pooling_params = PoolingParams(
    use_activation=True,
    dimensions=768,
)

Parameters

use_activation
bool | None
default:"None"
Whether to apply the activation function to the pooler outputs. None uses the model's default (typically True).
dimensions
int | None
default:"None"
Number of dimensions to reduce the output embedding to, if the model supports matryoshka representations. Only valid for embedding tasks.
task
str | None
default:"None"
The pooling task to perform. Should be one of:
  • "embed" - Generate embeddings
  • "classify" - Classification task
  • "score" - Scoring/ranking task
  • "token_embed" - Token-level embeddings
  • "token_classify" - Token-level classification

Task-specific parameters

Different pooling tasks support different parameters:

Embedding tasks (embed, token_embed)

  • use_activation: Whether to apply activation
  • dimensions: Output dimensionality (if model supports matryoshka)

Classification tasks (classify, token_classify)

  • use_activation: Whether to apply activation

Scoring task (score)

  • use_activation: Whether to apply activation
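
What "applying activation" means depends on the model. As a rough illustration only, and an assumption rather than a description of vLLM's internals, classification heads commonly apply a softmax over class logits and single-logit scoring heads commonly apply a sigmoid. The helper names and logit values below are made up for this sketch.

```python
import math


def softmax(logits):
    """Turn a list of raw logits into probabilities that sum to 1."""
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map a single raw logit to the (0, 1) range without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical raw pooler outputs, i.e. what use_activation=False might return.
raw_class_logits = [2.0, 0.5, -1.0]
raw_score_logit = 0.0

# What an activation step would produce under the assumptions above.
class_probs = softmax(raw_class_logits)
score = sigmoid(raw_score_logit)
```

With use_activation=False you would expect unnormalized values like the raw logits here, which is useful when you want to apply your own calibration or thresholding.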

Example: Generate embeddings

from vllm import LLM, PoolingParams

# Initialize embedding model
llm = LLM(
    model="sentence-transformers/all-MiniLM-L6-v2",
    runner="pooling",
)

# Configure pooling
pooling_params = PoolingParams(
    use_activation=True,
)

# Generate embeddings
prompts = [
    "Hello world",
    "How are you?",
]

outputs = llm.embed(prompts, pooling_params=pooling_params)

for output in outputs:
    embedding = output.outputs.embedding
    print(f"Embedding dimension: {len(embedding)}")
    print(f"Embedding: {embedding[:5]}...")  # First 5 values
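
A common next step is to compare two embeddings. The sketch below is plain Python and not part of vLLM; cosine_similarity is a hypothetical helper and the vectors are short placeholders standing in for real output.outputs.embedding lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors; with real results you would pass
# outputs[0].outputs.embedding and outputs[1].outputs.embedding.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 1.0, 0.0]
similarity = cosine_similarity(vec_a, vec_b)
```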

Example: Matryoshka embeddings

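# Reduce embedding dimensions for a matryoshka model.
# Assumes `llm` was created with a model that supports matryoshka
# representations, not necessarily the model from the previous example.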
pooling_params = PoolingParams(
    use_activation=True,
    dimensions=256,  # Reduce from default (e.g., 768) to 256
)

outputs = llm.embed(["Sample text"], pooling_params=pooling_params)
embedding = outputs[0].outputs.embedding
assert len(embedding) == 256
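
Conceptually, a matryoshka embedding can be shortened by keeping its leading components and rescaling the result to unit length. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not vLLM's internal code, and truncate_embedding is a hypothetical helper.

```python
import math


def truncate_embedding(embedding, dimensions):
    """Keep the first `dimensions` values and rescale to unit L2 norm."""
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    head = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


# Placeholder 4-dimensional vector standing in for a full-size embedding.
full = [0.6, 0.0, 0.8, 0.0]
reduced = truncate_embedding(full, 2)
reduced_norm = math.sqrt(sum(x * x for x in reduced))
```

Truncation discards information, so a shorter vector trades some retrieval quality for lower storage and faster similarity search.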

Example: Classification

from vllm import LLM, PoolingParams

# Initialize classification model
llm = LLM(
    model="your-classifier-model",
    runner="pooling",
)

pooling_params = PoolingParams(
    use_activation=True,
)

# Classify text
outputs = llm.classify(
    ["This movie is amazing!"],
    pooling_params=pooling_params,
)

for output in outputs:
    probs = output.outputs.probs
    print(f"Classification probabilities: {probs}")
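
To turn a probability list into a predicted label, pick the index with the highest probability. The sketch below assumes a two-class sentiment model. The label names and probability values are placeholders, since the real label order comes from the model's configuration, and top_label is a hypothetical helper.

```python
def top_label(probs, labels):
    """Return the (label, probability) pair with the highest probability."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical label order and placeholder probabilities; with real
# results you would pass output.outputs.probs instead.
labels = ["negative", "positive"]
probs = [0.08, 0.92]
label, confidence = top_label(probs, labels)
```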

Example: Scoring/Reranking

from vllm import LLM, PoolingParams

# Initialize a reranker model for scoring query-document pairs
llm = LLM(
    model="your-reranker-model",
    runner="pooling",
)

pooling_params = PoolingParams(
    use_activation=True,
)

query = "What is machine learning?"
documents = [
    "Machine learning is a subset of AI",
    "Python is a programming language",
    "Deep learning uses neural networks",
]

# Score the query against every document. llm.score() takes the two sides
# of each pair as separate arguments; a single query is paired with each
# document in the list.
outputs = llm.score(query, documents, pooling_params=pooling_params)

for i, output in enumerate(outputs):
    score = output.outputs.score
    print(f"Document {i} score: {score}")
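
For reranking, the scores are usually used to sort the documents. The sketch below shows that step in plain Python. The score values are placeholders rather than real model output, and rank_by_score is a hypothetical helper.

```python
def rank_by_score(documents, scores):
    """Pair each document with its score and sort from highest to lowest."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


documents = [
    "Machine learning is a subset of AI",
    "Python is a programming language",
    "Deep learning uses neural networks",
]
# Placeholder scores; with real results you would collect
# [output.outputs.score for output in outputs].
scores = [0.91, 0.07, 0.64]

ranked = rank_by_score(documents, scores)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```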

Valid parameter combinations

The PoolingParams class validates that only task-appropriate parameters are specified:
Task            Valid parameters
embed           use_activation, dimensions
classify        use_activation
score           use_activation
token_embed     use_activation, dimensions
token_classify  use_activation
Attempting to use invalid parameters for a task will raise a validation error.
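
The table can also be expressed as a lookup for checking a request before it is sent. This is a standalone sketch of the rule described above, not vLLM's own validation logic. VALID_PARAMETERS and invalid_parameters are hypothetical names.

```python
# Mirrors the table above: which parameters each task accepts.
VALID_PARAMETERS = {
    "embed": {"use_activation", "dimensions"},
    "classify": {"use_activation"},
    "score": {"use_activation"},
    "token_embed": {"use_activation", "dimensions"},
    "token_classify": {"use_activation"},
}


def invalid_parameters(task, **params):
    """Return the sorted names of supplied parameters the task does not accept.

    Parameters left as None count as "not supplied".
    """
    if task not in VALID_PARAMETERS:
        raise ValueError(f"unknown pooling task: {task!r}")
    supplied = {name for name, value in params.items() if value is not None}
    return sorted(supplied - VALID_PARAMETERS[task])


bad = invalid_parameters("classify", use_activation=True, dimensions=256)
ok = invalid_parameters("embed", use_activation=True, dimensions=256)
```

Here bad contains "dimensions" because classification does not accept it, while ok is empty.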

Related

  • LLM - Use PoolingParams with llm.embed(), llm.classify(), or llm.score()
  • SamplingParams - Parameters for text generation
  • Output classes - Output formats for pooling tasks
