Overview

The Rubric class is the foundation for evaluating LLM responses in Verifiers environments. It manages reward functions and their weights, supports both individual and group-level scoring, and integrates with parsers to extract answers from completions.

Constructor

Rubric(
    funcs: list[RewardFunc | GroupRewardFunc] | None = None,
    weights: list[float] | None = None,
    parser: vf.Parser | None = None,
)
funcs
list[RewardFunc | GroupRewardFunc] | None
default:"None"
List of reward functions to evaluate. Can be individual-level (RewardFunc) or group-level (GroupRewardFunc) functions.
weights
list[float] | None
default:"None"
Weights for each reward function. Must match the length of funcs. Defaults to 1.0 for each function if not provided.
parser
vf.Parser | None
default:"None"
Parser instance for extracting answers from completions. Defaults to vf.Parser() if not provided.
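
The default described for weights can be illustrated with a small standalone sketch. The helper name resolve_weights is hypothetical and not part of Verifiers, and raising on a length mismatch is an assumption about how such a check might behave, not a statement of what the library does.

```python
from typing import Callable, Optional


def resolve_weights(
    funcs: list[Callable], weights: Optional[list[float]] = None
) -> list[float]:
    """Hypothetical helper: fill in default weights and check lengths."""
    if weights is None:
        # One weight of 1.0 per function, matching the documented default.
        return [1.0] * len(funcs)
    if len(weights) != len(funcs):
        # Assumption: a mismatch is treated as an error in this sketch.
        raise ValueError(
            f"expected {len(funcs)} weights, got {len(weights)}"
        )
    return list(weights)


default_weights = resolve_weights([len, str])
explicit_weights = resolve_weights([len, str], [1.0, 0.1])
```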

Reward Function Signatures

Individual-level RewardFunc

Reward functions that score single rollouts can accept any combination of:
  • prompt: list[dict[str, str]] | str - The input prompt
  • completion: list[dict[str, str]] | str - The model’s completion
  • answer: Any - Ground truth or metadata for scoring
  • task: str - Task type identifier
  • state: State - Full state dictionary
  • info: dict - Additional metadata
  • **kwargs - Additional keyword arguments
Returns: float
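
Because a reward function may declare any subset of these parameters, a caller has to pass only what each function asks for. One way this could be done is by inspecting the signature, sketched below. The helper call_with_requested_args and the sample functions are hypothetical, and this is not a description of how Verifiers dispatches arguments internally.

```python
import inspect
from typing import Any, Callable


def call_with_requested_args(
    func: Callable[..., float], available: dict[str, Any]
) -> float:
    """Hypothetical dispatcher: pass only the arguments func declares."""
    params = inspect.signature(func).parameters
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_var_kwargs:
        # A **kwargs parameter can absorb everything that is available.
        kwargs = dict(available)
    else:
        kwargs = {k: v for k, v in available.items() if k in params}
    return float(func(**kwargs))


def exact_match(completion, answer):
    return 1.0 if completion == answer else 0.0


def has_task(task, **kwargs):
    return 1.0 if task else 0.0


available = {
    "prompt": "What is 2+2?",
    "completion": "4",
    "answer": "4",
    "task": "math",
    "info": {},
}
exact_score = call_with_requested_args(exact_match, available)
task_score = call_with_requested_args(has_task, available)
```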

Group-level GroupRewardFunc

Reward functions that score multiple rollouts together accept the plural forms of the same parameters, each holding one entry per rollout:
  • prompts: list[...] - List of prompts
  • completions: list[...] - List of completions
  • answers: list[...] - List of answers
  • tasks: list[str] - List of task types
  • states: list[State] - List of states
  • infos: list[dict] - List of metadata
Returns: list[float]
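
As an illustration of the group-level shape (plural inputs, one float per rollout), here is a minimal sketch of a function that rewards completions for being distinct within their group. The names completion_text and uniqueness_reward are illustrative and not part of Verifiers.

```python
from collections import Counter


def completion_text(completion) -> str:
    """Return the text of a completion given as a string or a message list."""
    if isinstance(completion, str):
        return completion
    return completion[-1]["content"]


def uniqueness_reward(completions, **kwargs) -> list[float]:
    """Group function: each rollout scores 1 / (number of identical answers)."""
    texts = [completion_text(c).strip() for c in completions]
    counts = Counter(texts)
    return [1.0 / counts[t] for t in texts]


scores = uniqueness_reward(["a", "b", "a"])
```

The returned list has the same length and order as completions, so each score can be assigned back to the rollout that produced it.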

Methods

add_reward_func

def add_reward_func(self, func: RewardFunc, weight: float = 1.0)
Add a reward function that contributes to the total reward.
func
RewardFunc
The reward function to add.
weight
float
default:"1.0"
Weight for this function in the total reward calculation.

add_metric

def add_metric(self, func: RewardFunc, weight: float = 0.0)
Add a metric function whose score is recorded in the metrics but, with the default weight of 0.0, does not contribute to the reward.
func
RewardFunc
The metric function to add.
weight
float
default:"0.0"
Weight for this function (typically 0 for metrics).
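
The difference between a reward function and a metric comes down to the weight applied when scores are combined. The sketch below assumes the total reward is a weighted sum of the individual scores and that metrics are keyed by function name. The combine helper and the sample names are hypothetical.

```python
def combine(
    scores: dict[str, float], weights: dict[str, float]
) -> tuple[float, dict[str, float]]:
    """Hypothetical combiner: weighted sum for reward, raw scores as metrics."""
    reward = sum(weights[name] * value for name, value in scores.items())
    return reward, dict(scores)


scores = {"correctness_reward": 1.0, "length_reward": 0.001, "num_messages": 1.0}
weights = {"correctness_reward": 1.0, "length_reward": 0.1, "num_messages": 0.0}

# num_messages has weight 0.0, so it is reported but adds nothing to reward.
reward, metrics = combine(scores, weights)
```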

add_class_object

def add_class_object(self, name: str, obj: Any)
Register a class object that will be passed to reward functions as a keyword argument.
name
str
The parameter name that reward functions can use to access this object.
obj
Any
The object to make available to reward functions.
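
From the reward function's point of view, a registered object is just another keyword argument whose name matches the registered name. The sketch below imitates that with plain dictionaries. WordCounter and brevity_reward are hypothetical, and the dictionary merge stands in for whatever the rubric does internally.

```python
class WordCounter:
    """Hypothetical helper object shared across reward functions."""

    def count(self, text: str) -> int:
        return len(text.split())


def brevity_reward(completion, word_counter, **kwargs):
    """Reward completions of at most five words."""
    return 1.0 if word_counter.count(completion) <= 5 else 0.0


# Stand-in for rubric.add_class_object("word_counter", WordCounter()).
class_objects = {"word_counter": WordCounter()}
rollout_fields = {"completion": "The answer is 4", "answer": "4"}

# The function receives the object under the name it was registered with.
score = brevity_reward(**rollout_fields, **class_objects)
```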

score_rollout

async def score_rollout(self, state: State)
Evaluate all individual-level reward functions for a single rollout. Updates state["reward"] and state["metrics"] in place.
state
State
The state dictionary to score. Must contain prompt, completion, and other required fields.
This method requires at least one individual-level reward function and no group-level functions.
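
The in-place update can be pictured with a simplified stand-in. The sketch assumes a weighted sum for the reward and function names as metric keys, and it passes only completion and answer to each function. score_rollout_sketch is hypothetical and far simpler than the real method.

```python
import asyncio


async def score_rollout_sketch(state: dict, funcs: list, weights: list[float]) -> None:
    """Hypothetical stand-in: score one rollout and update state in place."""
    metrics = {}
    reward = 0.0
    for func, weight in zip(funcs, weights):
        value = float(func(completion=state["completion"], answer=state["answer"]))
        metrics[func.__name__] = value
        reward += weight * value
    state["reward"] = reward
    state["metrics"] = metrics


def exact_match(completion, answer, **kwargs):
    return 1.0 if completion == answer else 0.0


def non_empty(completion, **kwargs):
    return 1.0 if completion else 0.0


state = {"completion": "4", "answer": "4"}
# From synchronous code, an async scoring call can be driven with asyncio.run.
asyncio.run(score_rollout_sketch(state, [exact_match, non_empty], [1.0, 0.0]))
```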

score_group

async def score_group(self, states: list[State])
Score multiple rollouts together. Executes all reward functions (both individual and group-level) and updates each state’s reward, advantage, and metrics fields.
states
list[State]
List of state dictionaries to score together.
Group-level functions see all states at once and can implement comparative scoring strategies.
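
This page does not specify how the advantage field is computed. A common convention in group-based training is to subtract the group's mean reward from each rollout's reward. The sketch below shows that convention purely as an assumption. mean_centered_advantages is a hypothetical name, and the library's actual formula may differ.

```python
from statistics import mean


def mean_centered_advantages(rewards: list[float]) -> list[float]:
    """Assumed convention: advantage = reward minus the group mean."""
    if not rewards:
        return []
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


advantages = mean_centered_advantages([1.0, 0.0, 1.0, 0.0])
```

Under this convention the advantages in a group sum to zero, so a rollout is credited only for doing better than its peers.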

Attributes

funcs
list[RewardFunc | GroupRewardFunc]
List of registered reward functions.
weights
list[float]
Weights corresponding to each function.
parser
vf.Parser
Parser instance for extracting answers.
class_objects
dict[str, Any]
Dictionary of objects available to reward functions, including the parser.

Example Usage

import verifiers as vf

# Define custom reward functions
def length_reward(completion, **kwargs):
    """Reward longer responses."""
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return min(len(text) / 1000, 1.0)

def correctness_reward(completion, answer, parser, **kwargs):
    """Check if parsed answer matches expected."""
    parsed = parser.parse_answer(completion)
    return 1.0 if parsed == answer else 0.0

# Create rubric with weighted functions
rubric = vf.Rubric(
    funcs=[correctness_reward, length_reward],
    weights=[1.0, 0.1],  # Correctness weighted 10x more than length
    parser=vf.Parser()
)

# Add a metric that doesn't affect reward.
# len() counts messages for chat-format completions and characters for strings.
rubric.add_metric(lambda completion, **kw: float(len(completion)), weight=0.0)

# Score a state
state = {
    "prompt": "What is 2+2?",
    "completion": [{"role": "assistant", "content": "4"}],
    "answer": "4",
    "task": "math",
    "timing": {"scoring_ms": 0, "total_ms": 0}
}

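# score_rollout is a coroutine: await it inside an async function or notebook cell,
# or call asyncio.run(rubric.score_rollout(state)) from synchronous code.
await rubric.score_rollout(state)
print(f"Reward: {state['reward']}")  # weighted sum: 1.0 * 1.0 + 0.001 * 0.1 = 1.0001
print(f"Metrics: {state['metrics']}")  # Individual, unweighted scores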

Group Scoring Example

def relative_quality(completions, **kwargs):
    """Group function: reward responses at or above the median length."""
    # Assumes chat-format completions (lists of message dicts).
    lengths = [len(c[-1]["content"]) for c in completions]
    median = sorted(lengths)[len(lengths) // 2]  # upper median for even-sized groups
    return [1.0 if n >= median else 0.0 for n in lengths]

rubric = vf.Rubric(
    funcs=[relative_quality],
    weights=[1.0]
)

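# Score multiple states together.
# create_state is a placeholder for your own helper that builds a state with
# chat-format completion messages; it is not part of verifiers.
states = [create_state(i) for i in range(10)]
await rubric.score_group(states)  # await inside an async function or notebook cell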

# Each state now has reward, advantage, and metrics
for state in states:
    print(f"Reward: {state['reward']}, Advantage: {state['advantage']}")
