The Circuit Breaker Labs CLI supports two types of AI safety evaluations, each designed to test different aspects of model behavior and safety alignment.

Single-Turn vs Multi-Turn

Single-turn evaluations test how your model responds to individual, isolated prompts. Each test case is independent, and the model has no conversation history to consider.

Best for:
  • Testing direct prompt injection vulnerabilities
  • Evaluating immediate safety responses
  • Quick safety assessments
  • Testing specific harmful content detection

Test Case Groups

Both evaluation types support test case groups that categorize the types of unsafe content being tested:
# Tests suicidal ideation prompts (default)
cbl single-turn \
    --test-case-groups suicidal_ideation \
    openai --model gpt-4o

Available Test Case Groups

The suicidal_ideation group tests the model’s ability to handle sensitive mental health topics safely. This group includes prompts that attempt to elicit harmful advice or normalize self-harm.

Example prompts tested:
  • Direct requests for harmful information
  • Indirect manipulation attempts
  • Context-shifting scenarios

You can specify custom test case groups that you’ve created in your Circuit Breaker Labs account. Custom groups allow you to test domain-specific safety requirements.

Use cases:
  • Industry-specific compliance testing
  • Brand safety alignment
  • Custom policy enforcement
  • Domain-specific harmful content

Safety Thresholds

Both evaluation types use a threshold parameter to decide whether a response passes or fails. The threshold is a float between 0.0 and 1.0 that sets the minimum safety score a response needs in order to pass. Responses scoring below the threshold are flagged as unsafe.
# Strict threshold (0.8) - fewer responses will pass
cbl single-turn --threshold 0.8 openai --model gpt-4o

# Moderate threshold (0.5) - balanced approach
cbl single-turn --threshold 0.5 openai --model gpt-4o

# Lenient threshold (0.3) - most responses will pass
cbl single-turn --threshold 0.3 openai --model gpt-4o
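
The pass/fail rule described above can be sketched in a few lines of shell. This is only an illustration of the comparison, assuming a response passes when its score is greater than or equal to the threshold; `passes_threshold` is a hypothetical helper and is not part of the `cbl` CLI, and the scores are made-up sample values.

```shell
# Hypothetical helper, not part of the cbl CLI.
# Exit status 0 if score >= threshold, 1 otherwise.
# awk is used because POSIX shell arithmetic has no floating point.
passes_threshold() {
  awk -v score="$1" -v threshold="$2" \
    'BEGIN { exit !((score + 0) >= (threshold + 0)) }'
}

# Made-up sample scores checked against a moderate threshold of 0.5.
for score in 0.25 0.5 0.85; do
  if passes_threshold "$score" 0.5; then
    echo "$score: pass"
  else
    echo "$score: flagged as unsafe"
  fi
done
```

Raising the second argument to 0.8 would flag the 0.5 sample as well, which is the sense in which a stricter threshold lets fewer responses pass.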

Choosing the Right Threshold

1. Understand Your Use Case

High-risk applications (healthcare, mental health support, child-facing products) should use stricter thresholds (0.7-0.9).

2. Baseline Your Model

Run evaluations with moderate thresholds (0.5) first to understand your model’s current safety performance.

3. Iterate and Refine

Adjust thresholds based on your risk tolerance and the false positive/negative trade-offs you observe in results.
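
The baseline-then-tighten workflow above could be scripted as a small sweep over threshold values. The sketch below reuses the `cbl single-turn --threshold ... openai --model gpt-4o` invocation shown earlier on this page; the `run_threshold_sweep` function and the `CBL_BIN` override are assumptions introduced here so the loop can be dry-run without calling the real CLI.

```shell
# Hypothetical wrapper, not part of the cbl CLI.
# CBL_BIN is an assumed override so the sweep can be dry-run with `echo`.
run_threshold_sweep() {
  for threshold in "$@"; do
    "${CBL_BIN:-cbl}" single-turn --threshold "$threshold" openai --model gpt-4o
  done
}

# Baseline at 0.5, then tighten toward the high-risk range.
# Dry run: prints the arguments each cbl call would receive instead of running cbl.
CBL_BIN=echo run_threshold_sweep 0.5 0.7 0.9
```

Dropping the `CBL_BIN=echo` prefix would run the actual evaluations, one per threshold, so the results can be compared side by side.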

Comparison Table

| Feature | Single-Turn | Multi-Turn |
| --- | --- | --- |
| Test Duration | Fast (seconds to minutes) | Slower (minutes to hours) |
| Conversation History | None | Full context maintained |
| Attack Complexity | Simple, direct prompts | Sophisticated, multi-step manipulation |
| Parameters | threshold, variations, maximum_iteration_layers | threshold, max_turns, test_types |
| Best For | Quick safety checks, direct vulnerabilities | Realistic attack simulation, jailbreak testing |
| Resource Usage | Low | Higher (more API calls) |

Quick Start Examples

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai --model gpt-4o

Always set CBL_API_KEY and the API key for your provider (e.g., OPENAI_API_KEY) before running evaluations:
export CBL_API_KEY="your_cbl_api_key"
export OPENAI_API_KEY="your_openai_api_key"
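
A preflight check can catch a missing key before an evaluation starts. The following is a minimal sketch, assuming only the two variable names mentioned above; `require_env` is a hypothetical helper and is not part of the `cbl` CLI.

```shell
# Hypothetical preflight helper, not part of the cbl CLI.
# Returns 0 when every named environment variable is set and non-empty.
# The names passed in must be trusted literals, since they go through eval.
require_env() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      status=1
    fi
  done
  return "$status"
}

if require_env CBL_API_KEY OPENAI_API_KEY; then
  echo "API keys are set; ready to run cbl"
fi
```

For a different provider, the second argument would be replaced with that provider's key variable.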

Next Steps

Single-Turn Evaluations

Deep dive into single-turn evaluation parameters and usage

Multi-Turn Evaluations

Learn about conversational safety testing

Providers

Configure OpenAI, Ollama, or custom model providers

Custom Providers

Create custom providers with Rhai scripting
