Running a Benchmark
To run benchmarks with simpE, execute the simpe command. The benchmarks include:
- String Reversal
- Integer Addition
- String Rehearsal
Understanding the Console Output
The simpE CLI provides real-time feedback during benchmark execution through an interactive terminal UI.
Progress Indicators
While benchmarks are running, you’ll see a live progress display:
- Benchmark name: Current test being executed
- Progress: Number of completed tries out of total tries (e.g., 45/100)
- Success rate: Percentage of successful attempts (e.g., 87.50%)
- Status: Current activity and elapsed time
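As a rough illustration of how the progress and success-rate values relate to raw counts, the sketch below formats them in the style of the examples above. The function name and its parameters are hypothetical and not part of simpE, and it assumes the success rate is computed over completed tries only.

```python
def format_progress(completed: int, total: int, successes: int) -> str:
    """Hypothetical helper (not part of simpE): build a progress string."""
    # Assumption: the success rate is measured over completed tries only.
    rate = (successes / completed * 100) if completed else 0.0
    return f"{completed}/{total} | {rate:.2f}%"


print(format_progress(40, 100, 35))
```

Guarding against zero completed tries avoids a division error before the first attempt finishes.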
Real-Time Feedback
The console class provides several types of feedback:
Thinking Time
Shown while the model is processing a prompt.
Completion Status
Shown when a single test completes.
Benchmark Summary
Shown after each benchmark completes.
Logs Directory
All benchmark runs are logged to the logs/ directory.
Log Files
- log_YYYY-MM-DD_HH-MM-SS.txt: Timestamped log file for each run
- log_recent.txt: Always contains the most recent run (cleared on each new run)
Log files contain:
- Timestamp of the event
- Test start/completion messages
- Success/failure status with expected vs. actual results
- Reasoning traces (when using reasoning models)
- API errors and warnings
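The timestamped file name pattern described under Log Files can be reproduced with a standard strftime format string. This is a minimal sketch, not simpE's actual code: the helper name and the logs_dir default are assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path


def timestamped_log_path(now: datetime, logs_dir: str = "logs") -> Path:
    """Hypothetical helper: build a path such as logs/log_2026-03-03_14-30-45.txt."""
    # Matches the documented pattern log_YYYY-MM-DD_HH-MM-SS.txt
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return Path(logs_dir) / f"log_{stamp}.txt"


print(timestamped_log_path(datetime(2026, 3, 3, 14, 30, 45)).name)
```

Because the date and time fields are zero-padded and ordered from largest to smallest unit, sorting these file names alphabetically also sorts them chronologically.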
Log Entry Format
Results Directory
Benchmark results are saved as JSON files in the results/ directory.
Result Files
Files are named: result_<model-name>_YYYY-MM-DD_HH-MM-SS.json
Example: result_qwen2.5-32b-instruct_2026-03-03_14-30-45.json
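When post-processing many result files, it can help to recover the model name and run time from the file name. The sketch below parses the naming pattern shown above with a regular expression. The function and pattern names are hypothetical, and it assumes the timestamp is always the final underscore-separated part before .json.

```python
import re
from datetime import datetime

# Pattern for result_<model-name>_YYYY-MM-DD_HH-MM-SS.json
RESULT_NAME = re.compile(
    r"^result_(?P<model>.+)_(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.json$"
)


def parse_result_name(filename: str) -> tuple[str, datetime]:
    """Hypothetical helper: split a result file name into model and timestamp."""
    match = RESULT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a result file name: {filename}")
    stamp = datetime.strptime(match["stamp"], "%Y-%m-%d_%H-%M-%S")
    return match["model"], stamp


model, stamp = parse_result_name(
    "result_qwen2.5-32b-instruct_2026-03-03_14-30-45.json"
)
print(model, stamp.isoformat())
```

Anchoring the timestamp at the end of the name lets the model part contain dots, hyphens, or underscores of its own.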
Result File Structure
Each result file contains a record for each test attempt.
Result Fields
For each test attempt, the following information is recorded:
- Test inputs: The generated test data (string, integers, etc.)
- duration_seconds: Time taken to complete the test
- reasoning: Reasoning trace (only present when using reasoning models)
- response: The model’s output
- model: Model identifier returned by the API
- status: Either "success" or "fail"
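The status and duration_seconds fields are enough to compute a simple summary of a run. The sketch below assumes the attempts have already been loaded into a list of dictionaries; the exact nesting of a real result file is not shown here, so the sample data and the summarize function are purely illustrative.

```python
def summarize(attempts: list[dict]) -> dict:
    """Hypothetical helper: aggregate status and duration over attempts."""
    total = len(attempts)
    successes = sum(1 for a in attempts if a.get("status") == "success")
    total_time = sum(a.get("duration_seconds", 0.0) for a in attempts)
    return {
        "total": total,
        "successes": successes,
        "success_rate": (successes / total * 100) if total else 0.0,
        "mean_duration_seconds": (total_time / total) if total else 0.0,
    }


# Illustrative records using the documented field names (values are made up).
sample = [
    {"status": "success", "duration_seconds": 1.5, "response": "cba"},
    {"status": "fail", "duration_seconds": 2.5, "response": "abc"},
]
print(summarize(sample))
```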
If the benchmark fails to write the results file due to an error, the complete JSON output will be printed to the console as a fallback.
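The fallback behavior described above follows a common pattern: serialize first, try to write, and print the serialized text if writing fails. The following is a minimal sketch of that pattern under stated assumptions. It is not simpE's implementation, and the save_results name and its signature are hypothetical.

```python
import json
import sys
from pathlib import Path


def save_results(results: dict, path: Path) -> bool:
    """Hypothetical helper: write JSON to path, or print it if writing fails."""
    # Serialize up front so the same text is available for the fallback.
    payload = json.dumps(results, indent=2)
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(payload, encoding="utf-8")
        return True
    except OSError as exc:
        # Report the problem on stderr and keep stdout for the JSON itself.
        print(f"Could not write {path}: {exc}", file=sys.stderr)
        print(payload)
        return False
```

Sending the error message to stderr keeps stdout clean, so the printed JSON can still be redirected to a file.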