Overview
Streaming delivers output token by token while the model is still generating, so interactive applications can show text as soon as it is produced instead of waiting for the full response.
Basic Streaming
Python
```python
from cactus import cactus_init, cactus_complete, cactus_destroy
import json

# Load the model from its weights directory
model = cactus_init("weights/lfm2-1.2b", None, False)

# Called once per generated token; print it immediately, without a newline
def on_token(token, token_id):
    print(token, end="", flush=True)

messages = json.dumps([{"role": "user", "content": "Tell me a story"}])

# Passing on_token enables streaming; the call returns a JSON result once generation ends
result = json.loads(cactus_complete(model, messages, None, None, on_token))
print(f"\n\nGenerated {result['decode_tokens']} tokens in {result['total_time_ms']:.2f} ms")

cactus_destroy(model)
```
Swift
```swift
let result = try cactusComplete(model, messagesJson, nil, nil) { token, tokenId in
    print(token, terminator: "")
}
```
Kotlin
```kotlin
val result = cactusComplete(model, messagesJson, null, null) { token, _ ->
    print(token)
}
```
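If you need the complete text after streaming as well as the live output, the callback can accumulate tokens while it prints them. The sketch below is a hypothetical helper (`StreamCollector` is not part of Cactus), and it assumes `cactus_complete` accepts any Python callable with the `(token, token_id)` signature used in the Python example, not only a plain function.

```python
import sys
from typing import List


class StreamCollector:
    """Hypothetical helper: echoes streamed tokens and keeps them for later use."""

    def __init__(self, echo: bool = True) -> None:
        self.echo = echo
        self.tokens: List[str] = []
        self.token_ids: List[int] = []

    def __call__(self, token: str, token_id: int) -> None:
        # Same (token, token_id) shape as on_token in the Python example
        self.tokens.append(token)
        self.token_ids.append(token_id)
        if self.echo:
            sys.stdout.write(token)
            sys.stdout.flush()

    @property
    def text(self) -> str:
        # Full output assembled from every token received so far
        return "".join(self.tokens)
```

Pass an instance where `on_token` went, for example `collector = StreamCollector()` followed by `cactus_complete(model, messages, None, None, collector)`; when the call returns, `collector.text` holds the streamed output.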
Streaming Transcription
Stream transcription results in real time as audio arrives:
```python
from cactus import (
    cactus_stream_transcribe_start,
    cactus_stream_transcribe_process,
    cactus_stream_transcribe_stop,
)
import json

# Assumes `model` is a speech-to-text model loaded with cactus_init;
# microphone_stream() is a placeholder for your own audio source
stream = cactus_stream_transcribe_start(model, None)

for audio_chunk in microphone_stream():
    partial = json.loads(cactus_stream_transcribe_process(stream, audio_chunk))
    # "\r" returns to the start of the line so each partial result overwrites the last
    print(f"\r{partial['text']}", end="", flush=True)

final = json.loads(cactus_stream_transcribe_stop(stream))
print(f"\nFinal: {final['text']}")
```
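`microphone_stream()` is not defined in the example. To try streaming transcription without a microphone, a generator that replays a WAV file in fixed-duration chunks can stand in for it. This is a sketch using Python's standard `wave` module; the name `wav_chunks` is hypothetical, and it assumes `cactus_stream_transcribe_process` accepts raw PCM bytes. Check the Transcription page for the sample rate, sample width, and channel count Cactus actually expects.

```python
import wave
from typing import BinaryIO, Iterator, Union


def wav_chunks(source: Union[str, BinaryIO], chunk_ms: int = 100) -> Iterator[bytes]:
    """Hypothetical stand-in for microphone_stream(): replay a WAV file in chunks.

    Yields raw PCM frame bytes, chunk_ms milliseconds of audio per chunk
    (the last chunk may be shorter).
    """
    with wave.open(source, "rb") as wav:
        # Number of frames that make up chunk_ms of audio at this sample rate
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:  # readframes returns b"" at end of file
                break
            yield data
```

To use it, replace `microphone_stream()` with `wav_chunks("speech.wav")`, where `speech.wav` is a placeholder path.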
Buffering Strategies
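One common strategy is to batch tokens before updating the display, which reduces the number of terminal or UI writes when tokens arrive faster than the screen needs to refresh. The class below is a hypothetical sketch, not part of Cactus: it collects tokens and hands them to a `sink` function every `max_tokens` tokens or at the end of a line, whichever comes first.

```python
from typing import Callable, List


class TokenBuffer:
    """Hypothetical helper: accumulate streamed tokens and emit them in batches."""

    def __init__(self, sink: Callable[[str], None], max_tokens: int = 8) -> None:
        self.sink = sink
        self.max_tokens = max_tokens
        self._pending: List[str] = []

    def on_token(self, token: str, token_id: int) -> None:
        # Matches the (token, token_id) callback shape used by cactus_complete
        self._pending.append(token)
        # Flush when the batch is full or a line ends, whichever comes first
        if len(self._pending) >= self.max_tokens or token.endswith("\n"):
            self.flush()

    def flush(self) -> None:
        # Emit whatever is pending as one string; no-op when the buffer is empty
        if self._pending:
            self.sink("".join(self._pending))
            self._pending.clear()
```

Pass `buffer.on_token` as the callback to `cactus_complete`, then call `buffer.flush()` after the call returns so the final partial batch is not lost. A larger `max_tokens` means fewer writes but a less immediate display.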
Error Handling
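```python
from cactus import cactus_complete
import json

try:
    result = json.loads(cactus_complete(model, messages, None, None, on_token))
    # The result reports generation failures through its "success" and "error" fields
    if not result["success"]:
        print(f"\nGeneration failed: {result['error']}")
except RuntimeError as e:
    # Exceptions raised by the call itself surface as RuntimeError
    print(f"\nStream error: {e}")
```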
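To avoid repeating the `success` check at every call site, the check can be folded into a small parsing helper that raises instead. This is a hypothetical sketch (`CompletionError` and `parse_completion` are not Cactus APIs) built only on the `success` and `error` fields used in the example. Subclassing `RuntimeError` means a single `except RuntimeError` clause catches both exceptions from the call and failures the result reports.

```python
import json


class CompletionError(RuntimeError):
    """Hypothetical: raised when a completion result reports success == false."""


def parse_completion(raw: str) -> dict:
    """Parse the JSON returned by cactus_complete and raise on a reported failure."""
    result = json.loads(raw)
    # A missing "success" field is treated as a failure
    if not result.get("success"):
        raise CompletionError(result.get("error") or "unknown error")
    return result
```

With this helper the call becomes `result = parse_completion(cactus_complete(model, messages, None, None, on_token))`, and the `if not result["success"]` branch is no longer needed.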
Stop Generation
```python
from cactus import cactus_stop

# Call from a different thread than the one running cactus_complete
cactus_stop(model)  # aborts the ongoing generation
```
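A typical use is enforcing a time limit. The sketch below assumes, as the example indicates, that `cactus_stop` may be called from a thread other than the one running `cactus_complete`. It uses `threading.Timer` to call the stop function after `timeout_s` seconds and cancels the timer if generation finishes first. `complete_with_timeout` is a hypothetical helper; it takes the completion and stop functions as arguments so it can be tried without loading a model.

```python
import threading
from typing import Callable, TypeVar

M = TypeVar("M")
R = TypeVar("R")


def complete_with_timeout(
    complete_fn: Callable[[M], R],
    stop_fn: Callable[[M], None],
    model: M,
    timeout_s: float,
) -> R:
    """Run complete_fn(model), calling stop_fn(model) from a timer thread if it runs too long."""
    timer = threading.Timer(timeout_s, stop_fn, args=(model,))
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    try:
        # Blocks until generation finishes or is aborted by stop_fn
        return complete_fn(model)
    finally:
        # Prevents a late stop if generation finished first; harmless if the timer already fired
        timer.cancel()
```

For example: `complete_with_timeout(lambda m: cactus_complete(m, messages, None, None, on_token), cactus_stop, model, 30.0)`. What an aborted call returns is not specified here, so inspect the result before relying on it.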
Next Steps
Chat Completion: Build conversational AI
Transcription: Real-time speech-to-text