
Overview

Cactus supports tool calling (function calling) for models trained with tool support. The engine handles:
  • Tool schema validation
  • Constrained decoding to ensure valid JSON
  • Function call parsing
  • Multi-turn tool conversations

Supported Models

Model                  | Tool Support
FunctionGemma-270M-IT  | ✅ Native
LiquidAI/LFM2-*        | ✅ Native
Qwen3-*                | ✅ Native
Gemma-3-*              |

Basic Tool Calling

Define Tools

import json

tools = json.dumps([{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. San Francisco"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
}])
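Repeating this boilerplate for every tool gets verbose. A small local helper can build each entry; `make_tool` below is purely a convenience sketch, not part of the Cactus API:

```python
import json

def make_tool(name, description, properties, required):
    """Build one OpenAI-style tool entry (local helper, not a Cactus API)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

tools = json.dumps([
    make_tool(
        "get_weather",
        "Get the current weather for a location",
        {
            "location": {"type": "string",
                         "description": "City name, e.g. San Francisco"},
            "unit": {"type": "string",
                     "enum": ["celsius", "fahrenheit"],
                     "description": "Temperature unit"},
        },
        ["location"],
    )
])
```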

Call Model with Tools

from cactus import cactus_init, cactus_complete, cactus_destroy
import json

model = cactus_init("weights/lfm2-1.2b", None, False)

messages = json.dumps([{
    "role": "user",
    "content": "What's the weather in Paris?"
}])

result = json.loads(cactus_complete(
    model,
    messages,
    None,    # options
    tools,   # tools JSON
    None     # callback
))

# Check for function calls
if result["function_calls"]:
    for call in result["function_calls"]:
        print(f"Function: {call['name']}")
        print(f"Arguments: {json.dumps(call['arguments'])}")
else:
    print(f"Response: {result['response']}")

cactus_destroy(model)

Response Format

{
    "success": true,
    "response": "",
    "function_calls": [
        {
            "name": "get_weather",
            "arguments": {
                "location": "Paris",
                "unit": "celsius"
            }
        }
    ],
    "confidence": 0.95,
    "total_time_ms": 156.2
}

Multi-Turn Tool Execution

Implement the full agent loop:
def execute_function(name, arguments):
    """Execute the actual function"""
    if name == "get_weather":
        # Call weather API
        return {
            "location": arguments["location"],
            "temperature": 22,
            "unit": arguments.get("unit", "celsius"),
            "conditions": "sunny"
        }
    return {"error": "Unknown function"}

conversation = [{
    "role": "user",
    "content": "What's the weather in Tokyo and Paris?"
}]

while True:
    messages = json.dumps(conversation)
    result = json.loads(cactus_complete(model, messages, None, tools, None))
    
    # No function calls - we're done
    if not result["function_calls"]:
        print(f"Assistant: {result['response']}")
        break
    
    # Execute each function call
    for call in result["function_calls"]:
        print(f"Calling {call['name']}({call['arguments']})...")
        
        # Execute function
        function_result = execute_function(call["name"], call["arguments"])
        
        # Add function result to conversation
        conversation.append({
            "role": "function",
            "name": call["name"],
            "content": json.dumps(function_result)
        })
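As more tools are added, the if/elif chain inside execute_function grows linearly. One way to keep routing declarative is a dispatch table mapping tool names to handlers; the handler below is an illustrative stub, not a real weather integration:

```python
def get_weather(arguments):
    # Stub handler; a real implementation would call a weather API.
    return {
        "location": arguments["location"],
        "temperature": 22,
        "unit": arguments.get("unit", "celsius"),
        "conditions": "sunny",
    }

# Map tool names to handlers; supporting a new tool is one new entry.
FUNCTION_TABLE = {
    "get_weather": get_weather,
}

def execute_function(name, arguments):
    handler = FUNCTION_TABLE.get(name)
    if handler is None:
        return {"error": f"Unknown function: {name}"}
    return handler(arguments)
```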

Multiple Tools

Define multiple tools for the model to choose from:
tools = json.dumps([
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform a mathematical calculation",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression to evaluate"}
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current time",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": []
            }
        }
    }
])
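The calculate tool above still needs a backing implementation, and passing model-generated text to eval() is risky. One option is an AST walk restricted to a whitelist of arithmetic operators; this is a sketch, and only the listed operators are allowed:

```python
import ast
import operator

# Whitelisted arithmetic operators; anything outside this set raises.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression):
    """Evaluate a plain arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression")
    return walk(ast.parse(expression, mode="eval"))
```

Function calls, attribute access, and names all fall through to the ValueError, so expressions like `__import__('os')` are rejected rather than executed.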

Constrained Decoding

Cactus uses constrained decoding to ensure tool calls produce valid JSON:
# Model output is constrained to match tool schema
# Invalid tokens are masked during generation
# Guarantees parseable function calls
The tool call constrainer uses the tokenizer to bias token probabilities, ensuring only valid JSON structures are generated.
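The idea can be illustrated outside the engine: before sampling each token, the logits of tokens that would break the JSON structure are pushed to negative infinity, so they can never be selected. This toy sketch is not the Cactus internals, just the masking principle:

```python
import math

def mask_logits(logits, allowed_ids):
    """Set logits of disallowed token positions to -inf."""
    return [x if i in allowed_ids else -math.inf
            for i, x in enumerate(logits)]

def greedy_pick(logits):
    """Pick the highest-logit token (greedy decoding for simplicity)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Suppose only tokens 1 and 3 keep the JSON valid at this step.
logits = [2.0, 0.5, 3.0, 1.0]
token = greedy_pick(mask_logits(logits, {1, 3}))
# Token 2 has the highest raw logit but is masked; token 3 wins among allowed.
```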

Error Handling

try:
    result = json.loads(cactus_complete(model, messages, None, tools, None))
    
    if not result["success"]:
        print(f"Generation failed: {result['error']}")
    
    for call in result.get("function_calls", []):
        try:
            function_result = execute_function(call["name"], call["arguments"])
        except Exception as e:
            # Add error to conversation
            conversation.append({
                "role": "function",
                "name": call["name"],
                "content": json.dumps({"error": str(e)})
            })
except RuntimeError as e:
    print(f"API error: {e}")

Parallel Function Calls

Some models can return multiple function calls simultaneously:
result = json.loads(cactus_complete(model, messages, None, tools, None))

if len(result["function_calls"]) > 1:
    print("Executing functions in parallel...")
    
    import concurrent.futures
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(execute_function, call["name"], call["arguments"])
            for call in result["function_calls"]
        ]
        
        results = [future.result() for future in futures]
        
        # Name the per-call value func_result so it does not shadow
        # the completion dict `result` from the outer scope
        for call, func_result in zip(result["function_calls"], results):
            conversation.append({
                "role": "function",
                "name": call["name"],
                "content": json.dumps(func_result)
            })
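future.result() also accepts a timeout, so one slow or hung tool does not stall the whole agent loop. A sketch (the 5-second budget is arbitrary, and the worker thread may keep running after the timeout):

```python
import concurrent.futures

pool = concurrent.futures.ThreadPoolExecutor()

def run_with_timeout(fn, args, timeout_s=5.0):
    """Bound a tool call's wall time and report a timeout as a tool error."""
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return {"error": f"timed out after {timeout_s}s"}
```

Returning the timeout as an error dict keeps the agent loop uniform: the model sees it as just another function result.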

Best Practices

  • Keep tool descriptions clear and concise
  • Use descriptive parameter names
  • Validate function arguments before execution
  • Handle function errors gracefully
  • Set appropriate timeouts for long-running tools
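The validation bullet can be sketched as a manual check of the model's arguments against the tool's parameters block (required keys, basic types, enums) before dispatching; a fuller implementation might use the jsonschema package instead:

```python
def validate_arguments(schema, arguments):
    """Check arguments against a tool's JSON-schema parameters block.

    Returns a list of problems; an empty list means the call is safe to run.
    """
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems
```

When problems are found, feeding them back as a function-role error message gives the model a chance to retry with corrected arguments.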

Model-Specific Formats

Different models use different tool call formats internally:
{
    "name": "get_weather",
    "arguments": {"location": "Paris"}
}
Cactus automatically handles format conversion: you always get the same structured JSON response regardless of model.

Advanced: Custom Tool Constraints

For advanced use cases, manually set tool constraints:
// In C++
model->set_tool_constraints({"function1", "function2"});

// During generation, only these tools can be called
uint32_t token = model->decode(tokens, temperature, top_p, top_k);

// Clear constraints
model->clear_tool_constraints();

Performance

Tool calling adds minimal overhead:
  • Constraint evaluation: ~0.1ms per token
  • Function call parsing: ~0.5ms
  • No impact on non-tool tokens

Next Steps

Chat Completion

Build conversational agents

RAG Guide

Combine tools with retrieval

Supported Models

Browse tool-capable models

API Reference

Complete completion API docs
