
Overview

Cactus supports tool calling (function calling) for models trained with tool support. The engine handles:
  • Tool schema validation
  • Constrained decoding to ensure valid JSON
  • Function call parsing
  • Multi-turn tool conversations

Supported Models

Model                  | Tool Support
FunctionGemma-270M-IT  | ✅ Native
LiquidAI/LFM2-*        | ✅ Native
Qwen3-*                | ✅ Native
Gemma-3-*              |

Basic Tool Calling

Define Tools

import json

tools = json.dumps([{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. San Francisco"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
}])
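Repeating this boilerplate for every tool gets verbose. A small local helper can build each entry; `make_tool` below is purely a convenience sketch, not part of the Cactus API:

```python
import json

def make_tool(name, description, properties, required):
    """Build one OpenAI-style tool entry (local helper, not a Cactus API)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

tools = json.dumps([
    make_tool(
        "get_weather",
        "Get the current weather for a location",
        {
            "location": {"type": "string",
                         "description": "City name, e.g. San Francisco"},
            "unit": {"type": "string",
                     "enum": ["celsius", "fahrenheit"],
                     "description": "Temperature unit"},
        },
        ["location"],
    )
])
```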

Call Model with Tools

from cactus import cactus_init, cactus_complete, cactus_destroy
import json

model = cactus_init("weights/lfm2-1.2b", None, False)

messages = json.dumps([{
    "role": "user",
    "content": "What's the weather in Paris?"
}])

result = json.loads(cactus_complete(
    model,
    messages,
    None,    # options
    tools,   # tools JSON
    None     # callback
))

# Check for function calls
if result["function_calls"]:
    for call in result["function_calls"]:
        print(f"Function: {call['name']}")
        print(f"Arguments: {json.dumps(call['arguments'])}")
else:
    print(f"Response: {result['response']}")

cactus_destroy(model)

Response Format

{
    "success": true,
    "response": "",
    "function_calls": [
        {
            "name": "get_weather",
            "arguments": {
                "location": "Paris",
                "unit": "celsius"
            }
        }
    ],
    "confidence": 0.95,
    "total_time_ms": 156.2
}

Multi-Turn Tool Execution

Implement the full agent loop:
def execute_function(name, arguments):
    """Execute the actual function"""
    if name == "get_weather":
        # Call weather API
        return {
            "location": arguments["location"],
            "temperature": 22,
            "unit": arguments.get("unit", "celsius"),
            "conditions": "sunny"
        }
    return {"error": "Unknown function"}

conversation = [{
    "role": "user",
    "content": "What's the weather in Tokyo and Paris?"
}]

while True:
    messages = json.dumps(conversation)
    result = json.loads(cactus_complete(model, messages, None, tools, None))
    
    # No function calls - we're done
    if not result["function_calls"]:
        print(f"Assistant: {result['response']}")
        break
    
    # Execute each function call
    for call in result["function_calls"]:
        print(f"Calling {call['name']}({call['arguments']})...")
        
        # Execute function
        function_result = execute_function(call["name"], call["arguments"])
        
        # Add function result to conversation
        conversation.append({
            "role": "function",
            "name": call["name"],
            "content": json.dumps(function_result)
        })
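As more tools are added, the if/elif chain inside execute_function grows linearly. One way to keep routing declarative is a dispatch table mapping tool names to handlers; the handler below is an illustrative stub, not a real weather integration:

```python
def get_weather(arguments):
    # Stub handler; a real implementation would call a weather API.
    return {
        "location": arguments["location"],
        "temperature": 22,
        "unit": arguments.get("unit", "celsius"),
        "conditions": "sunny",
    }

# Map tool names to handlers; supporting a new tool is one new entry.
FUNCTION_TABLE = {
    "get_weather": get_weather,
}

def execute_function(name, arguments):
    handler = FUNCTION_TABLE.get(name)
    if handler is None:
        return {"error": f"Unknown function: {name}"}
    return handler(arguments)
```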

Multiple Tools

Define multiple tools for the model to choose from:
tools = json.dumps([
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform a mathematical calculation",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression to evaluate"}
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current time",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": []
            }
        }
    }
])
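The calculate tool above still needs a backing implementation, and passing model-generated text to eval() is risky. One option is an AST walk restricted to a whitelist of arithmetic operators; this is a sketch, and only the listed operators are allowed:

```python
import ast
import operator

# Whitelisted arithmetic operators; anything outside this set raises.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression):
    """Evaluate a plain arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression")
    return walk(ast.parse(expression, mode="eval"))
```

Function calls, attribute access, and names all fall through to the ValueError, so expressions like `__import__('os')` are rejected rather than executed.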

Constrained Decoding

Cactus uses constrained decoding to ensure tool calls produce valid JSON:
# Model output is constrained to match tool schema
# Invalid tokens are masked during generation
# Guarantees parseable function calls
The tool call constrainer uses the tokenizer to bias token probabilities, ensuring only valid JSON structures are generated.
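The idea can be illustrated outside the engine: before sampling each token, the logits of tokens that would break the JSON structure are pushed to negative infinity, so they can never be selected. This toy sketch is not the Cactus internals, just the masking principle:

```python
import math

def mask_logits(logits, allowed_ids):
    """Set logits of disallowed token positions to -inf."""
    return [x if i in allowed_ids else -math.inf
            for i, x in enumerate(logits)]

def greedy_pick(logits):
    """Pick the highest-logit token (greedy decoding for simplicity)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Suppose only tokens 1 and 3 keep the JSON valid at this step.
logits = [2.0, 0.5, 3.0, 1.0]
token = greedy_pick(mask_logits(logits, {1, 3}))
# Token 2 has the highest raw logit but is masked; token 3 wins among allowed.
```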

Error Handling

try:
    result = json.loads(cactus_complete(model, messages, None, tools, None))
    
    if not result["success"]:
        print(f"Generation failed: {result['error']}")
    
    for call in result.get("function_calls", []):
        try:
            function_result = execute_function(call["name"], call["arguments"])
        except Exception as e:
            # Add error to conversation
            conversation.append({
                "role": "function",
                "name": call["name"],
                "content": json.dumps({"error": str(e)})
            })
except RuntimeError as e:
    print(f"API error: {e}")

Parallel Function Calls

Some models can return multiple function calls simultaneously:
result = json.loads(cactus_complete(model, messages, None, tools, None))

if len(result["function_calls"]) > 1:
    print("Executing functions in parallel...")
    
    import concurrent.futures
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(execute_function, call["name"], call["arguments"])
            for call in result["function_calls"]
        ]
        
        results = [future.result() for future in futures]
        
        # Name the per-call value func_result so it does not shadow
        # the completion dict `result` from the outer scope
        for call, func_result in zip(result["function_calls"], results):
            conversation.append({
                "role": "function",
                "name": call["name"],
                "content": json.dumps(func_result)
            })
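future.result() also accepts a timeout, so one slow or hung tool does not stall the whole agent loop. A sketch (the 5-second budget is arbitrary, and the worker thread may keep running after the timeout):

```python
import concurrent.futures

pool = concurrent.futures.ThreadPoolExecutor()

def run_with_timeout(fn, args, timeout_s=5.0):
    """Bound a tool call's wall time and report a timeout as a tool error."""
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return {"error": f"timed out after {timeout_s}s"}
```

Returning the timeout as an error dict keeps the agent loop uniform: the model sees it as just another function result.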

Best Practices

  • Keep tool descriptions clear and concise
  • Use descriptive parameter names
  • Validate function arguments before execution
  • Handle function errors gracefully
  • Set appropriate timeouts for long-running tools
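The validation bullet can be sketched as a manual check of the model's arguments against the tool's parameters block (required keys, basic types, enums) before dispatching; a fuller implementation might use the jsonschema package instead:

```python
def validate_arguments(schema, arguments):
    """Check arguments against a tool's JSON-schema parameters block.

    Returns a list of problems; an empty list means the call is safe to run.
    """
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems
```

When problems are found, feeding them back as a function-role error message gives the model a chance to retry with corrected arguments.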

Model-Specific Formats

Different models use different tool call formats internally:
{
    "name": "get_weather",
    "arguments": {"location": "Paris"}
}
Cactus automatically handles format conversion: you always get the same structured JSON response regardless of model.

Advanced: Custom Tool Constraints

For advanced use cases, manually set tool constraints:
// In C++
model->set_tool_constraints({"function1", "function2"});

// During generation, only these tools can be called
uint32_t token = model->decode(tokens, temperature, top_p, top_k);

// Clear constraints
model->clear_tool_constraints();

Performance

Tool calling adds minimal overhead:
  • Constraint evaluation: ~0.1ms per token
  • Function call parsing: ~0.5ms
  • No impact on non-tool tokens

Next Steps

Chat Completion

Build conversational agents

RAG Guide

Combine tools with retrieval

Supported Models

Browse tool-capable models

API Reference

Complete completion API docs
