This example demonstrates a multi-turn game environment where models play Wordle by making guesses and receiving feedback. It showcases parsing structured output, custom reward functions, and integration with TextArena.
Overview
The Wordle environment provides:
- Game: 5-letter word guessing with color-coded feedback
- Format: Multi-turn interaction (up to 6 guesses)
- Parsing: XML tags for structured guess extraction
- Rewards: Correctness, efficiency bonus, and partial credit
- Integration: TextArena game library
Complete Implementation
Here’s the full working implementation from environments/wordle/wordle.py:
import re

import verifiers as vf
from verifiers.envs.integrations.textarena_env import TextArenaEnv

DEFAULT_SYSTEM_PROMPT = """You are a competitive game player. \
Make sure you read the game instructions carefully, and always follow the required format.
In each turn, think step-by-step, then give your guess inside <guess>...</guess> tags."""


### feedback functions
def wordle_feedback_fn(observation: str) -> str:
    latest_observation = observation.split("[GAME]")[-1].strip()
    if "Feedback:" in latest_observation:
        return latest_observation.split("Feedback:")[-1]
    else:
        return latest_observation


### reward functions
def correct_answer(parser, completion, answer, **kwargs) -> float:
    """Whether the guess is *exactly* correct."""
    guess = parser.parse_answer(completion)
    return 1.0 if guess == "[" + answer + "]" else 0.0


def length_bonus(parser, completion, answer, **kwargs) -> float:
    """Bonus for shorter correct solutions."""
    assistant_messages = parser.get_assistant_messages(completion)
    guesses = [
        x for x in assistant_messages if re.search(r"<guess>.*</guess>", x["content"])
    ]
    is_correct = correct_answer(parser, completion, answer, **kwargs)
    return is_correct / (len(guesses) or 1)


def partial_answer(parser, completion, answer, **kwargs) -> float:
    """Partial credit for the latest guess."""
    if correct_answer(parser, completion, answer, **kwargs):
        return 0.0
    user_messages = parser.get_user_messages(completion)
    for user_message in user_messages[::-1]:
        feedback = user_message["content"].strip()
        feedback_parts = feedback.split("\n")
        if len(feedback_parts) == 3:
            _, scoring, _ = feedback_parts
            scoring = scoring.strip()
            num_greens = scoring.count("G")
            num_yellows = scoring.count("Y")
            return 0.2 * num_greens + 0.1 * num_yellows
    return 0.0


### environment loader
def load_environment(
    num_train_examples: int = 2000,
    num_eval_examples: int = 20,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
    seed: int = 0,
    **kwargs,
):
    parser = vf.XMLParser(fields=["guess"], answer_field="guess")

    rubric = vf.Rubric(parser=parser)
    rubric.add_reward_func(correct_answer)
    rubric.add_reward_func(partial_answer)
    rubric.add_reward_func(length_bonus)
    format_reward = parser.get_format_reward_func()
    format_reward.__name__ = "format_reward"
    rubric.add_reward_func(format_reward, weight=0.2)

    return TextArenaEnv(
        game="Wordle-v0",
        num_train_examples=num_train_examples,
        num_eval_examples=num_eval_examples,
        feedback_fn=wordle_feedback_fn,
        seed=seed,
        system_prompt=system_prompt,
        parser=parser,
        rubric=rubric,
        **kwargs,
    )
How It Works
1. TextArena Integration
TextArenaEnv wraps TextArena games for RL training:
return TextArenaEnv(
    game="Wordle-v0",
    num_train_examples=2000,
    num_eval_examples=20,
    feedback_fn=wordle_feedback_fn,
)
TextArena provides text-based game environments. The Wordle game:
- Generates random 5-letter target words
- Accepts guesses and returns color-coded feedback
- Tracks game state (remaining guesses, history)
2. Structured Output Parsing
XMLParser extracts guesses from model responses:
parser = vf.XMLParser(fields=["guess"], answer_field="guess")
Example parsing:
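model_output = "I'll try the word CRANE. <guess>[CRANE]</guess>"
parser.parse_answer(model_output)  # Returns: "[CRANE]"
The parser returns the tag content as written, so the square brackets that correct_answer compares against must appear in the model's guess.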
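To make the extraction step concrete without installing the library, here is a minimal standard-library sketch of pulling the last guess out of a response. The `extract_guess` helper and `GUESS_PATTERN` are hypothetical names for illustration only; this is not how XMLParser is implemented.

```python
import re
from typing import Optional

# Hypothetical stand-in for tag extraction; NOT the XMLParser implementation.
GUESS_PATTERN = re.compile(r"<guess>(.*?)</guess>", re.DOTALL)


def extract_guess(text: str) -> Optional[str]:
    """Return the stripped content of the last <guess>...</guess> pair, or None."""
    matches = GUESS_PATTERN.findall(text)
    return matches[-1].strip() if matches else None


print(extract_guess("I'll try the word CRANE. <guess>[CRANE]</guess>"))
print(extract_guess("I'll try HOUSE"))
```

Taking the last match reflects the common case where a model mentions earlier guesses while reasoning and commits to a final one at the end of its turn.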
3. Feedback Processing
The wordle_feedback_fn extracts game feedback:
def wordle_feedback_fn(observation: str) -> str:
    latest_observation = observation.split("[GAME]")[-1].strip()
    if "Feedback:" in latest_observation:
        return latest_observation.split("Feedback:")[-1]
    else:
        return latest_observation
Example feedback:
Guess: CRANE
Feedback: [Y][G][_][_][Y]
Remaining guesses: 5
Legend:
G = Green (correct letter, correct position)
Y = Yellow (correct letter, wrong position)
_ = Gray (letter not in word)
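The legend above corresponds to the usual two-pass Wordle scoring rule. The sketch below is one way to compute it, assuming standard duplicate-letter handling (greens are assigned first, then yellows are limited by the remaining letter counts). `score_guess` is a hypothetical helper, not TextArena's implementation, and the bracketed output simply mirrors the example feedback shown here.

```python
from collections import Counter


def score_guess(guess: str, target: str) -> str:
    """Score a guess against the target using G / Y / _ marks."""
    guess, target = guess.upper(), target.upper()
    if len(guess) != len(target):
        raise ValueError("guess and target must have the same length")

    marks = ["_"] * len(guess)

    # Pass 1: mark greens and count target letters that were not matched exactly.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            marks[i] = "G"

    # Pass 2: mark yellows, consuming one unmatched target letter per yellow.
    for i, g in enumerate(guess):
        if marks[i] == "_" and remaining[g] > 0:
            marks[i] = "Y"
            remaining[g] -= 1

    return "".join(f"[{m}]" for m in marks)


print(score_guess("CRANE", "STEAK"))
print(score_guess("STEAL", "STEAK"))
```

Consuming letters from `remaining` is what prevents a repeated letter in the guess from being marked yellow more times than it occurs in the target.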
4. Multi-Part Reward Function
Four reward components encourage different behaviors: correctness, an efficiency bonus, partial credit, and format compliance.
Correctness
def correct_answer(parser, completion, answer, **kwargs) -> float:
    """Whether the guess is *exactly* correct."""
    guess = parser.parse_answer(completion)
    return 1.0 if guess == "[" + answer + "]" else 0.0
Returns: 1.0 for the correct word, 0.0 otherwise.
Efficiency Bonus
def length_bonus(parser, completion, answer, **kwargs) -> float:
    """Bonus for shorter correct solutions."""
    assistant_messages = parser.get_assistant_messages(completion)
    guesses = [
        x for x in assistant_messages
        if re.search(r"<guess>.*</guess>", x["content"])
    ]
    is_correct = correct_answer(parser, completion, answer, **kwargs)
    return is_correct / (len(guesses) or 1)
Returns (for a solved game, 1 divided by the number of guesses made):
- Solved on the 1st guess: 1.0
- Solved on the 2nd guess: 0.5
- Solved on the 3rd guess: 0.33
- Not solved: 0.0
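The same arithmetic can be tabulated for all six possible guess counts. This is a standalone sketch of the formula only; `length_bonus_value` is a hypothetical helper that takes the guess count and outcome directly instead of a parser and completion.

```python
def length_bonus_value(num_guesses: int, solved: bool) -> float:
    """Mirror of the length_bonus arithmetic: 1 / guesses if solved, else 0."""
    if not solved:
        return 0.0
    # max(..., 1) guards against division by zero, like `len(guesses) or 1`.
    return 1.0 / max(num_guesses, 1)


for n in range(1, 7):
    print(n, round(length_bonus_value(n, solved=True), 2))
```

The bonus decays quickly: a game solved on the sixth guess earns roughly a sixth of the bonus for a first-guess solve, so most of the efficiency signal is concentrated in the first few turns.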
Partial Credit
def partial_answer(parser, completion, answer, **kwargs) -> float:
    """Partial credit for the latest guess."""
    if correct_answer(parser, completion, answer, **kwargs):
        return 0.0  # Already got full credit
    # Find latest feedback
    user_messages = parser.get_user_messages(completion)
    for user_message in user_messages[::-1]:
        feedback = user_message["content"].strip()
        feedback_parts = feedback.split("\n")
        if len(feedback_parts) == 3:
            _, scoring, _ = feedback_parts
            scoring = scoring.strip()
            num_greens = scoring.count("G")
            num_yellows = scoring.count("Y")
            return 0.2 * num_greens + 0.1 * num_yellows
    return 0.0
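Returns:
- 0.2 per green letter
- 0.1 per yellow letter
- Max: 0.8 (four greens). An all-green guess is the correct word, which correct_answer already rewards, so this function returns 0.0 for it.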
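As a worked example of the per-letter arithmetic, the sketch below scores a single feedback line. `partial_credit` is a hypothetical helper that skips the message parsing and operates directly on a scoring string in the bracketed format used in this page's examples.

```python
def partial_credit(scoring: str, solved: bool = False) -> float:
    """0.2 per green and 0.1 per yellow; nothing once the word is solved."""
    if solved:
        return 0.0  # full credit comes from the correctness reward instead
    return 0.2 * scoring.count("G") + 0.1 * scoring.count("Y")


print(round(partial_credit("[_][Y][_][G][_]"), 2))
print(round(partial_credit("[G][G][G][G][_]"), 2))
print(partial_credit("[G][G][G][G][G]", solved=True))
```

Because a winning guess is zeroed out here, this component never stacks with the correctness reward; it only supplies a signal on games the model has not yet won.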
Format Compliance
format_reward = parser.get_format_reward_func()
rubric.add_reward_func(format_reward, weight=0.2)
Returns: 1.0 if output contains <guess>...</guess>, 0.0 otherwise.
Weighted at 0.2 to encourage proper formatting without dominating the reward.
Combined Reward
rubric = vf.Rubric(parser=parser)
rubric.add_reward_func(correct_answer, weight=1.0) # Main reward
rubric.add_reward_func(partial_answer, weight=1.0) # Progress signal
rubric.add_reward_func(length_bonus, weight=1.0) # Efficiency
rubric.add_reward_func(format_reward, weight=0.2) # Format compliance
Total reward = weighted sum of all components
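The weighted sum can be checked by hand with a few lines of plain Python. This sketch only reproduces the arithmetic using the weights listed above; `WEIGHTS` and `total_reward` are hypothetical names, and the code does not call the Rubric class.

```python
from typing import Mapping

# Weights as registered on the rubric above.
WEIGHTS = {
    "correct_answer": 1.0,
    "partial_answer": 1.0,
    "length_bonus": 1.0,
    "format_reward": 0.2,
}


def total_reward(scores: Mapping[str, float], weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted sum of component scores; missing components count as 0."""
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())


# A game solved on the third guess with well-formatted output.
solved_in_three = {
    "correct_answer": 1.0,
    "partial_answer": 0.0,
    "length_bonus": 1.0 / 3,
    "format_reward": 1.0,
}
print(round(total_reward(solved_in_three), 2))  # 1.53
```

Note that the format component contributes at most 0.2 to the total, so a well-formatted but unsolved game still scores far below a solved one.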
Example Game Trajectory
Successful Game
Target word: STEAK
Turn 1
Assistant: I’ll start with a common word to test vowels and common consonants.
<guess>[CRANE]</guess>
Environment:
Guess: CRANE
Feedback: [_][_][Y][_][Y]
Remaining guesses: 5
Turn 2
Assistant: A is in the word but not in position 3, and E is in the word but not in position 5. Let me try:
<guess>[STEAL]</guess>
Environment:
Guess: STEAL
Feedback: [G][G][G][G][_]
Remaining guesses: 4
Turn 3
Assistant: Almost there! The first four letters are S-T-E-A. Last letter isn’t L. Let me try:
<guess>[STEAK]</guess>
Environment:
Guess: STEAK
Feedback: [G][G][G][G][G]
You won! The word was STEAK.
Rewards:
- correct_answer: 1.0
- length_bonus: 1.0 / 3 = 0.33
- partial_answer: 0.0 (already correct)
- format_reward: 1.0 (weighted at 0.2)
- Total: 1.0 + 0.33 + 0.0 + 0.2 * 1.0 = 1.53
Failed Game
Target word: QUIRK
Turns 1-6: The model makes guesses but never finds QUIRK.
Turn 7
Environment:
Game over! You've used all 6 guesses.
The word was QUIRK.
Rewards (from last guess with feedback [_][Y][_][G][_]):
- correct_answer: 0.0
- length_bonus: 0.0 (not correct)
- partial_answer: 0.2 * 1 + 0.1 * 1 = 0.3
- format_reward: 1.0 (weighted at 0.2)
- Total: 0.3 + 0.2 * 1.0 = 0.5
Format Error
Turn 1
Assistant: I’ll try HOUSE
Environment: (No valid guess extracted)
Rewards:
- correct_answer: 0.0
- length_bonus: 0.0
- partial_answer: 0.0
- format_reward: 0.0 (missing guess tags)
- Total: 0.0
The model failed to use the required format and received no reward.
Running the Environment
Installation
# Install with TextArena integration
prime env install wordle
Quick Evaluation
# Evaluate on 20 games
prime eval run wordle \
-m openai/gpt-4.1-mini \
-b https://api.openai.com/v1 \
-k OPENAI_API_KEY \
-n 20 \
-r 5
Training Dataset
# Generate training data with 1000 games
prime eval run wordle \
-m openai/gpt-4.1-mini \
-a '{"num_train_examples": 1000, "num_eval_examples": 50}' \
--split train \
-n 1000 \
-r 8
Configuration Options
| Parameter | Default | Description |
|---|---|---|
| num_train_examples | 2000 | Number of training games |
| num_eval_examples | 20 | Number of evaluation games |
| system_prompt | DEFAULT_SYSTEM_PROMPT | Instructions for the model |
| seed | 0 | Random seed for word generation |
Key Features
Structured Output with XMLParser
XMLParser provides:
- Extraction: Pulls content from XML tags
- Validation: Checks format compliance
- Format rewards: Built-in reward function for proper formatting
parser = vf.XMLParser(
fields=["guess"], # Fields to extract
answer_field="guess" # Which field is the answer
)
Multi-Component Rewards
Combining multiple reward signals:
- Sparse signal (correct_answer): Only 1.0 when winning
- Dense signal (partial_answer): Credit for progress
- Efficiency (length_bonus): Reward faster solutions
- Compliance (format_reward): Enforce output format
This creates a rich learning signal for RL training.
Game State Tracking
TextArenaEnv automatically tracks:
- Number of guesses made
- Guess history
- Remaining attempts
- Win/loss status
Metrics Tracked
correct_answer: 1.0 if word guessed correctly
length_bonus: Efficiency bonus (0.0 to 1.0)
partial_answer: Progress score (0.0 to 0.8; always 0.0 once the word is solved)
format_reward: Format compliance (0.0 or 1.0)
reward: Combined weighted sum
num_turns: Number of guesses made
Advanced Usage
Custom Reward Weights
Adjust the importance of different reward components:
rubric = vf.Rubric(parser=parser)
# Emphasize correctness over efficiency
rubric.add_reward_func(correct_answer, weight=5.0) # 5x weight
rubric.add_reward_func(partial_answer, weight=2.0) # 2x weight
rubric.add_reward_func(length_bonus, weight=0.5) # 0.5x weight
rubric.add_reward_func(format_reward, weight=0.2) # 0.2x weight
Different Wordle Variants
TextArena supports multiple Wordle variants:
from verifiers.envs.integrations.textarena_env import TextArenaEnv
# 6-letter Wordle
env_6 = TextArenaEnv(game="Wordle-6-v0", ...)
# 7-letter Wordle
env_7 = TextArenaEnv(game="Wordle-7-v0", ...)
Other TextArena Games
The same pattern works for other TextArena games:
# Hangman
env = TextArenaEnv(
    game="Hangman-v0",
    parser=vf.XMLParser(fields=["guess"]),
    ...
)
# 20 Questions
env = TextArenaEnv(
    game="TwentyQuestions-v0",
    parser=vf.XMLParser(fields=["question"]),
    ...
)