Testing BAML Functions
BAML provides powerful testing capabilities built into the language and VSCode extension. Test your functions iteratively to refine prompts and validate outputs.
Test Syntax
Tests are defined directly in BAML files using the test block:
enum Category {
  Refund
  CancelOrder
  TechnicalSupport
  AccountIssue
  Question
}

function ClassifyMessage(input: string) -> Category {
  client GPT4o
  prompt #"
    Classify the following message:
    {{ ctx.output_format }}
    Message: {{ input }}
  "#
}

test Test1 {
  functions [ClassifyMessage]
  args {
    input "Can't access my account using my login credentials. Haven't received the password reset email."
  }
}
Test Anatomy
Name: test Test1, a unique identifier for the test
Functions: the function(s) to test
Args: input arguments matching the function signature
Running Tests
VSCode Playground
The BAML VSCode extension provides an interactive playground:
Open any .baml file with a test
Click “Run Test” above the test block
View results in the playground panel
See the rendered prompt and API request
Command Line
Run tests from your terminal:
# Run all tests
baml-cli test
# Run tests for a specific function
baml-cli test -i "ClassifyMessage::"
# Run in parallel with custom concurrency
baml-cli test --parallel 5
# List available tests without running
baml-cli test --list
See the CLI Test Reference for all options.
Test Arguments
Simple Arguments
For primitive types:
function Greet(name: string, age: int) -> string {
  client GPT4o
  prompt #"Hello {{ name }}, you are {{ age }} years old."#
}

test GreetTest {
  functions [Greet]
  args {
    name "Alice"
    age 30
  }
}
Complex Objects
For classes, use dictionary syntax:
class Message {
  user string
  content string
}

function Process(msg: Message) -> string {
  client GPT4o
  prompt #"{{ msg.user }}: {{ msg.content }}"#
}

test ProcessTest {
  functions [Process]
  args {
    msg {
      user "Alice"
      content "Hello there!"
    }
  }
}
Arrays
Test with arrays of values:
function Summarize(messages: Message[]) -> string {
  client GPT4o
  prompt #"
    {% for msg in messages %}
    {{ msg.user }}: {{ msg.content }}
    {% endfor %}
  "#
}

test SummarizeTest {
  functions [Summarize]
  args {
    messages [
      {
        user "Alice"
        content "Hi there!"
      }
      {
        user "Bob"
        content "Hello Alice!"
      }
    ]
  }
}
Multi-line Strings
Use #"..."# for multi-line string arguments:
test ExtractTest {
  functions [ExtractResume]
  args {
    resume_text #"
      John Doe
      Education:
      - University of California, Berkeley
        B.S. Computer Science, 2020
      Skills:
      - Python
      - Java
      - C++
    "#
  }
}
BAML supports testing with images, audio, PDFs, and video:
function DescribeImage(img: image) -> string {
  client GPT4o
  prompt #"
    Describe this image: {{ img }}
  "#
}

test ImageTest {
  functions [DescribeImage]
  args {
    img {
      file "../images/test-photo.png"
    }
  }
}
Image files must be somewhere in baml_src/. Relative paths are from the current BAML file.
test ImageTest {
  functions [DescribeImage]
  args {
    img {
      url "https://example.com/photo.jpg"
    }
  }
}

test ImageTest {
  functions [DescribeImage]
  args {
    img {
      base64 "iVBORw0KGgoAAAANS..."
      media_type "image/png"
    }
  }
}
function TranscribeAudio(audio: audio) -> string {
  client GPT4o
  prompt #"Transcribe: {{ audio }}"#
}

test AudioTest {
  functions [TranscribeAudio]
  args {
    audio {
      file "../audio/sample.mp3"
    }
  }
}

function SummarizePDF(doc: pdf) -> string {
  client GPT4o
  prompt #"Summarize: {{ doc }}"#
}

test PDFTest {
  functions [SummarizePDF]
  args {
    doc {
      file "../documents/report.pdf"
    }
  }
}

function DescribeVideo(video: video) -> string {
  client GPT4o
  prompt #"Describe: {{ video }}"#
}

test VideoTest {
  functions [DescribeVideo]
  args {
    video {
      url "https://example.com/clip.mp4"
    }
  }
}
Assertions and Checks
Validate test outputs using @@assert and @@check:
Assertions
Hard requirements that must pass:
test ClassifyTest {
  functions [ClassifyMessage]
  args {
    input "I want a refund for my purchase"
  }
  // Assert the result equals a specific value
  @@assert({{ this == "Refund" }})
  // Assert latency is under 1 second
  @@assert({{ _.latency_ms < 1000 }})
}
Variables available in assertions:
this - The function result
_.result - Same as this
_.latency_ms - Time taken in milliseconds
_.checks.$NAME - Results of earlier checks
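The variables listed above can be combined in a single test. The following is a minimal sketch that reuses the ClassifyMessage function defined earlier. The test name and the check name (ClassifyTest_Variables, is_refund) are illustrative, and the sketch assumes that checks can read the same variables as assertions:

```baml
test ClassifyTest_Variables {
  functions [ClassifyMessage]
  args {
    input "I want a refund for my purchase"
  }
  // _.result refers to the same value as this
  @@check(is_refund, {{ _.result == "Refund" }})
  // _.checks.is_refund holds the outcome of the check above
  @@assert({{ _.checks.is_refund }})
  // _.latency_ms is the time taken in milliseconds
  @@assert({{ _.latency_ms < 1000 }})
}
```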
Checks
Soft validations that can fail without stopping the test:
test ExtractTest {
  functions [ExtractResume]
  args {
    resume_text "..."
  }
  // Named checks for later reference
  @@check(has_name, {{ this.name | length > 0 }})
  @@check(has_skills, {{ this.skills | length > 0 }})
  // Assert all checks passed
  @@assert({{ _.checks.has_name and _.checks.has_skills }})
}
Complex Validations
test EmailTest {
  functions [ExtractEmails]
  args {
    text "Contact us at support@example.com or sales@example.com"
  }
  // Check result is an array with 2 elements
  @@check(correct_count, {{ this | length == 2 }})
  // Check all emails match regex pattern
  @@check(valid_format, {{
    this | map(attribute='match', args=['^[\\w.-]+@[\\w.-]+\\.[a-z]{2,}$']) | all
  }})
  // Assert both checks passed
  @@assert({{ _.checks.correct_count and _.checks.valid_format }})
}
See the Testing guide for complete documentation on assertions and checks.
Dynamic Types in Tests
Modify dynamic types for specific tests:
enum Category {
  Technology
  Business
  @@dynamic
}

function Classify(text: string) -> Category {
  client GPT4o
  prompt #"
    Classify: {{ text }}
    {{ ctx.output_format }}
  "#
}

test CustomCategoryTest {
  functions [Classify]
  // Add test-specific enum values
  type_builder {
    dynamic Category {
      Science
      Health
      Entertainment
    }
  }
  args {
    text "Latest breakthrough in quantum computing"
  }
  @@assert({{ this == "Science" }})
}
See Dynamic Types for details.
Testing Multiple Clients
Test the same function with different models:
function Extract(text: string) -> Data {
  client GPT4o
  prompt #"..."#
}

test TestWithGPT {
  functions [Extract]
  args { text "Sample" }
}

test TestWithClaude {
  functions [Extract]
  override {
    client "anthropic/claude-sonnet-4"
  }
  args { text "Sample" }
}

test TestWithGemini {
  functions [Extract]
  override {
    client "google-ai/gemini-2.0-flash"
  }
  args { text "Sample" }
}
Compare results across models to find the best fit.
Test Organization
Organize tests for maintainability:
// classification_tests.baml
test RefundCase {
  functions [ClassifyMessage]
  args { input "I want my money back" }
  @@assert({{ this == "Refund" }})
}

test TechSupportCase {
  functions [ClassifyMessage]
  args { input "My app keeps crashing" }
  @@assert({{ this == "TechnicalSupport" }})
}

test AccountIssueCase {
  functions [ClassifyMessage]
  args { input "Can't log into my account" }
  @@assert({{ this == "AccountIssue" }})
}
Use Descriptive Names
test ExtractResume_WithEducation_ReturnsStructuredData
test ExtractResume_MissingEmail_ReturnsNull
test ExtractResume_MultipleJobs_ParsesAll
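As a sketch of how such a name pairs with a test body, the following fills in the first name from the list. The ExtractResume function is not defined on this page, so the resume_text argument and the name field (both taken from the earlier ExtractTest examples) are assumptions about its signature and return type:

```baml
test ExtractResume_WithEducation_ReturnsStructuredData {
  functions [ExtractResume]
  args {
    resume_text #"
      John Doe
      Education:
      - University of California, Berkeley
    "#
  }
  // The name states the scenario (WithEducation) and the expectation
  // (ReturnsStructuredData); the check below verifies that expectation.
  @@check(has_name, {{ this.name | length > 0 }})
}
```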
Production Builds
Exclude tests from production builds to reduce bundle size:
baml-cli generate --no-tests
This strips test blocks from the generated baml_client while keeping all functions intact.
Best Practices
Test edge cases: Empty inputs, missing fields, unusual formatting
Use assertions: Validate outputs programmatically
Test with real data: Use actual examples from your domain
Compare models: Test the same input with different LLMs
Keep tests updated: Update tests when you change prompts
Use descriptive names: Make it clear what each test validates
Test multimodal inputs: Verify image/audio/video handling
Run tests frequently: Catch regressions early
Use checks for soft requirements: Not all validations need to fail the test
Version control your tests: Commit .baml files with tests to Git
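Several of these practices can be combined in one test. The sketch below reuses ClassifyMessage from earlier, treats an unusually formatted message as an edge case, keeps the hard requirement as an assertion, and records latency as a soft check. The test name, check name, and input are illustrative, the expected category is an assumption about how the model should classify this input, and using _.latency_ms inside a check assumes checks see the same variables as assertions:

```baml
test ClassifyMessage_ShoutedRefund_ReturnsRefund {
  functions [ClassifyMessage]
  args {
    input "REFUND!!!   pls"
  }
  // Soft requirement: a slow response is recorded but does not fail the test
  @@check(fast_enough, {{ _.latency_ms < 1000 }})
  // Hard requirement: the category must be correct
  @@assert({{ this == "Refund" }})
}
```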
Debugging Failed Tests
When a test fails:
Check the Prompt Preview: Verify the rendered prompt looks correct
View the Raw Response: See what the LLM actually returned
Check the cURL Request: Ensure API parameters are correct
Add logging: Use checks to inspect intermediate values
Simplify the input: Start with a minimal test case
Try a different model: Some models handle certain tasks better
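One way to apply the "simplify the input" and "use checks" steps is to reduce a failing test to a single short input and split the expectation into separate named checks, so the results show which property failed. The sketch below does this for the ExtractEmails test shown earlier. It assumes ExtractEmails returns an array of strings, as that example implies, and the test and check names are illustrative:

```baml
test ExtractEmails_MinimalInput {
  functions [ExtractEmails]
  args {
    // One address only, with no surrounding noise
    text "support@example.com"
  }
  // Each check isolates one property of the output
  @@check(one_email, {{ this | length == 1 }})
  @@check(first_matches, {{ this[0] == "support@example.com" }})
}
```

If the minimal case passes, add the original input back piece by piece until a check fails.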
Example: Comprehensive Test Suite
enum Priority {
  High
  Medium
  Low
}

class Task {
  title string
  description string?
  priority Priority
}

function ExtractTasks(text: string) -> Task[] {
  client GPT4o
  prompt #"
    Extract tasks from this text:
    {{ text }}
    {{ ctx.output_format }}
  "#
}

// Basic functionality test
test ExtractTasks_BasicInput {
  functions [ExtractTasks]
  args {
    text #"
      - Fix login bug (urgent)
      - Update documentation (low priority)
      - Review pull request
    "#
  }
  @@check(has_tasks, {{ this | length > 0 }})
  @@check(has_three_tasks, {{ this | length == 3 }})
  @@assert({{ _.checks.has_tasks }})
}

// Edge case: empty input
test ExtractTasks_EmptyInput {
  functions [ExtractTasks]
  args { text "" }
  @@check(empty_result, {{ this | length == 0 }})
}

// Validation: priorities are parsed correctly
test ExtractTasks_PrioritiesCorrect {
  functions [ExtractTasks]
  args {
    text "Fix critical bug (high priority), update readme (low)"
  }
  @@check(first_high, {{ this[0].priority == "High" }})
  @@check(second_low, {{ this[1].priority == "Low" }})
  @@assert({{ _.checks.first_high and _.checks.second_low }})
}

// Performance test
test ExtractTasks_PerformanceCheck {
  functions [ExtractTasks]
  args {
    text "Task 1, Task 2, Task 3"
  }
  // Assert completes in under 2 seconds
  @@assert({{ _.latency_ms < 2000 }})
}

// Model comparison
test ExtractTasks_WithClaude {
  functions [ExtractTasks]
  override {
    client "anthropic/claude-sonnet-4"
  }
  args {
    text "Urgent: Fix bug. Low priority: Update docs."
  }
  @@check(extracted_two, {{ this | length == 2 }})
}
Next Steps
Functions: Learn about BAML functions
CLI Test Reference: Complete CLI testing documentation
Testing: Advanced validation techniques
Dynamic Types: Test with dynamic types