Overview
CVAT’s serverless architecture allows you to integrate custom AI models by packaging them as Nuclio functions. This guide covers creating functions for detectors, interactors, trackers, and ReID models.
Function Structure
Each serverless function consists of:
my-custom-function/
├── nuclio/
│   ├── function.yaml      # Nuclio function configuration
│   ├── main.py            # Handler and request processing
│   ├── model_handler.py   # Model loading and inference
│   └── requirements.txt   # Python dependencies (optional)
└── README.md              # Documentation (optional)
Creating a Detector Function
Step 1: Define the Function Configuration
Create function.yaml with the function configuration:
metadata:
  name: my-custom-detector
  namespace: cvat
  annotations:
    name: My Custom Detector
    type: detector
    version: 1
    spec: |
      [
        { "id": 0, "name": "class1", "type": "rectangle" },
        { "id": 1, "name": "class2", "type": "polygon" },
        { "id": 2, "name": "class3", "type": "mask" }
      ]
spec:
  description: Custom object detector
  runtime: 'python:3.10'
  handler: main:handler
  eventTimeout: 30s
  build:
    image: cvat.custom.detector
    baseImage: ubuntu:22.04
    directives:
      preCopy:
        - kind: RUN
          value: apt-get update && apt-get install -y python3-pip
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
          value: pip install torch torchvision opencv-python-headless pillow numpy
  triggers:
    myHttpTrigger:
      numWorkers: 2
      kind: 'http'
      workerAvailabilityTimeoutMilliseconds: 10000
      attributes:
        maxRequestBodySize: 33554432 # 32MB
  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3
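The spec annotation above is a JSON string, and the label names it declares need to match the names the model handler returns. A minimal sketch of that consistency check follows; `SPEC` and `check_labels` are illustrative names, not part of CVAT or Nuclio.

```python
import json

# The same JSON string that is embedded in metadata.annotations.spec above.
SPEC = """
[
  { "id": 0, "name": "class1", "type": "rectangle" },
  { "id": 1, "name": "class2", "type": "polygon" },
  { "id": 2, "name": "class3", "type": "mask" }
]
"""


def check_labels(spec_json, model_labels):
    """Return model labels that the spec does not declare (hypothetical helper)."""
    declared = {entry["name"] for entry in json.loads(spec_json)}
    return sorted(set(model_labels) - declared)


print(check_labels(SPEC, ["class1", "class2", "class3"]))  # []
print(check_labels(SPEC, ["class1", "vehicle"]))           # ['vehicle']
```

Running a check like this before deployment can catch a renamed or misspelled label early.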
Step 2: Implement Model Handler
Create model_handler.py to load and run your model:
import torch
import cv2
import numpy as np
from typing import List, Dict


class ModelHandler:
    def __init__(self):
        """Initialize and load the model."""
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = self._load_model()
        self.model.eval()

    def _load_model(self):
        """Load your custom model."""
        # YourCustomModel is a placeholder: import or define your own nn.Module here
        model = YourCustomModel()
        # Load pre-trained weights
        model.load_state_dict(torch.load('model_weights.pth', map_location=self.device))
        model.to(self.device)
        return model

    def preprocess(self, image: np.ndarray) -> torch.Tensor:
        """Preprocess image for model input."""
        # Resize, normalize, convert to tensor.
        # cv2 decodes images as BGR; convert to RGB here if your model expects it.
        image = cv2.resize(image, (640, 640))
        image = image.astype(np.float32) / 255.0
        image = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0)
        return image.to(self.device)

    def infer(self, image: np.ndarray, threshold: float = 0.5) -> List[Dict]:
        """Run inference and return detections."""
        preprocessed = self.preprocess(image)
        with torch.no_grad():
            predictions = self.model(preprocessed)
        return self._postprocess(predictions, image.shape, threshold)

    def _postprocess(self, predictions, original_shape, threshold):
        """Convert model output to CVAT format."""
        # Assumes predictions is an iterable of dicts with
        # 'score', 'class_id' and 'bbox' keys; adapt to your model's output.
        detections = []
        for pred in predictions:
            if pred['score'] < threshold:
                continue
            detection = {
                'label': self._get_label_name(pred['class_id']),
                'confidence': float(pred['score']),
                'type': 'rectangle',  # or 'polygon', 'mask'
                'points': self._convert_bbox(pred['bbox'], original_shape)
            }
            detections.append(detection)
        return detections

    def _get_label_name(self, class_id: int) -> str:
        """Map class ID to label name."""
        labels = ['class1', 'class2', 'class3']
        return labels[class_id]

    def _convert_bbox(self, bbox, original_shape):
        """Convert bbox to CVAT format [xtl, ytl, xbr, ybr]."""
        # Assumes bbox coordinates are normalized to [0, 1];
        # scale them to the original image size.
        h, w = original_shape[:2]
        x1, y1, x2, y2 = bbox
        return [x1 * w, y1 * h, x2 * w, y2 * h]
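The bbox conversion above multiplies by the image width and height, which is only correct when the model emits coordinates normalized to [0, 1]. If a model instead reports pixel coordinates in the resized 640x640 input space, the scale factors differ. The sketch below shows that variant; `scale_bbox` and its parameters are hypothetical names chosen for illustration.

```python
def scale_bbox(bbox, model_size, original_shape):
    """Map [x1, y1, x2, y2] from resized model-input pixels to original image pixels.

    model_size is (width, height) of the resized input, e.g. (640, 640).
    original_shape is the NumPy-style (height, width, channels) of the source image.
    """
    model_w, model_h = model_size
    orig_h, orig_w = original_shape[:2]
    sx = orig_w / model_w
    sy = orig_h / model_h
    x1, y1, x2, y2 = bbox
    scaled = [x1 * sx, y1 * sy, x2 * sx, y2 * sy]
    # Clamp to the image bounds so boxes never extend past the frame.
    limits = [orig_w, orig_h, orig_w, orig_h]
    return [min(max(value, 0.0), float(limit)) for value, limit in zip(scaled, limits)]


print(scale_bbox([64, 128, 320, 640], (640, 640), (1080, 1920, 3)))
# [192.0, 216.0, 960.0, 1080.0]
```

Which of the two conversions applies depends on the model's output convention, so it is worth checking one known box by hand.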
Step 3: Create Request Handler
Create main.py to handle HTTP requests:
import json
import base64
import cv2
import numpy as np
from model_handler import ModelHandler


def init_context(context):
    """Initialize function context and load model."""
    context.logger.info("Initializing model...")
    context.user_data.model = ModelHandler()
    context.logger.info("Model initialized successfully")


def handler(context, event):
    """Handle inference requests."""
    try:
        # Parse request
        data = event.body
        image_data = base64.b64decode(data['image'])
        threshold = data.get('threshold', 0.5)

        # Decode image (cv2.imdecode returns None for data it cannot decode)
        nparr = np.frombuffer(image_data, np.uint8)
        image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
        if image is None:
            raise ValueError("Could not decode image")

        # Run inference
        results = context.user_data.model.infer(image, threshold)

        # Return response
        return context.Response(
            body=json.dumps(results),
            headers={},
            content_type='application/json',
            status_code=200
        )
    except Exception as e:
        context.logger.error(f"Error processing request: {str(e)}")
        return context.Response(
            body=json.dumps({'error': str(e)}),
            headers={},
            content_type='application/json',
            status_code=500
        )
Creating an Interactor Function
Interactors receive user input (points, boxes) for guided segmentation:
metadata:
  name: my-custom-interactor
  namespace: cvat
  annotations:
    name: My Custom Interactor
    type: interactor
    version: 1
    spec: |
      [
        { "name": "object", "type": "polygon" }
      ]
    min_pos_points: 1
    min_neg_points: 0
    startswith_box: false
    startswith_box_optional: true
    help_message: Click points inside the object to segment it
Handler Implementation
# Uses the same imports as main.py above (json, base64, cv2, numpy as np).
def handler(context, event):
    """Handle interactive segmentation requests."""
    data = event.body

    # Extract inputs
    image_data = base64.b64decode(data['image'])
    pos_points = np.array(data['pos_points'])  # [[x1, y1], [x2, y2], ...]
    neg_points = np.array(data['neg_points'])  # [[x1, y1], ...]
    obj_bbox = data.get('obj_bbox')  # Optional [xtl, ytl, xbr, ybr]

    # Decode image
    nparr = np.frombuffer(image_data, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)

    # Run interactive segmentation (segment() is a method you implement on your model)
    mask = context.user_data.model.segment(
        image,
        pos_points,
        neg_points,
        obj_bbox
    )

    # Convert mask to polygon. mask_to_polygon is a placeholder for your own
    # helper, for example one built on cv2.findContours.
    polygon = mask_to_polygon(mask)

    return context.Response(
        body=json.dumps([{
            'label': 'object',
            'type': 'polygon',
            'points': polygon.flatten().tolist()
        }]),
        headers={},
        content_type='application/json',
        status_code=200
    )
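The interactor annotations declare min_pos_points and min_neg_points, but the handler above does not check them. A small sketch of validating the request body before running segmentation is shown here; `validate_interactor_input` is a hypothetical helper, and the defaults simply mirror the values in the configuration above.

```python
def validate_interactor_input(data, min_pos_points=1, min_neg_points=0):
    """Return a list of problems found in an interactor request body (empty if valid)."""
    problems = []
    pos = data.get("pos_points", [])
    neg = data.get("neg_points", [])
    if len(pos) < min_pos_points:
        problems.append(
            f"need at least {min_pos_points} positive point(s), got {len(pos)}"
        )
    if len(neg) < min_neg_points:
        problems.append(
            f"need at least {min_neg_points} negative point(s), got {len(neg)}"
        )
    # Every point is expected to be an [x, y] pair.
    for name, points in (("pos_points", pos), ("neg_points", neg)):
        for point in points:
            if len(point) != 2:
                problems.append(f"{name}: expected [x, y], got {point!r}")
    return problems


print(validate_interactor_input({"pos_points": [[10, 20]], "neg_points": []}))  # []
print(validate_interactor_input({"pos_points": [], "neg_points": [[1, 2, 3]]}))
```

A handler could return a 400 response with these messages instead of passing malformed input to the model.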
Creating a Tracker Function
Trackers maintain object state across frames:
metadata:
  name: my-custom-tracker
  namespace: cvat
  annotations:
    name: My Custom Tracker
    type: tracker
    version: 1
    spec:
    supported_shape_types: rectangle,polygon
Handler Implementation
def handler(context, event):
    """Handle tracking requests."""
    data = event.body

    # Extract inputs
    image_data = base64.b64decode(data['image'])
    shapes = data['shapes']  # Initial shapes or None for continuation
    states = data['states']  # Tracking state or [] for initialization

    # Decode image
    nparr = np.frombuffer(image_data, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)

    if not states:
        # Initialize tracking
        new_states = []
        for shape in shapes:
            state = context.user_data.model.init_tracker(
                image,
                shape['points'],
                shape['type']
            )
            new_states.append(state)
    else:
        # Continue tracking
        new_states = []
        for state in states:
            updated_state = context.user_data.model.track(
                image,
                state
            )
            new_states.append(updated_state)

    # Extract updated shapes
    updated_shapes = [
        context.user_data.model.get_shape(state)
        for state in new_states
    ]

    return context.Response(
        body=json.dumps({
            'shapes': updated_shapes,
            'states': new_states  # Opaque state to pass to next frame
        }),
        headers={},
        content_type='application/json',
        status_code=200
    )
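Because the handler keeps no state between calls, the caller has to send the returned states back with the next frame. The sketch below simulates that round trip with a stand-in tracker that shifts each box 5 pixels to the right per frame; `fake_tracker_handler` and `run_tracking` are hypothetical and only illustrate the request and response shapes described above.

```python
import json


def fake_tracker_handler(body):
    """Stand-in for a deployed tracker function; returns the JSON response text."""
    shapes, states = body["shapes"], body["states"]
    if not states:
        # First frame: build one state per initial shape.
        new_states = [{"type": s["type"], "points": list(s["points"])} for s in shapes]
    else:
        # Later frames: update each state (here, shift the box 5 px right).
        new_states = []
        for state in states:
            x1, y1, x2, y2 = state["points"]
            new_states.append({"type": state["type"], "points": [x1 + 5, y1, x2 + 5, y2]})
    updated_shapes = [{"type": s["type"], "points": s["points"]} for s in new_states]
    return json.dumps({"shapes": updated_shapes, "states": new_states})


def run_tracking(frames, initial_shapes):
    """Feed frames one by one, passing the opaque states back each time."""
    states, shapes, history = [], initial_shapes, []
    for frame in frames:
        reply = json.loads(
            fake_tracker_handler({"image": frame, "shapes": shapes, "states": states})
        )
        states = reply["states"]  # carried over to the next call
        shapes = None             # only the first call sends shapes
        history.append(reply["shapes"])
    return history


history = run_tracking(["f0", "f1", "f2"], [{"type": "rectangle", "points": [10, 20, 50, 80]}])
print(history[-1][0]["points"])  # [20, 20, 60, 80]
```

Since the states travel through JSON on every call, they need to stay JSON-serializable and reasonably small.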
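Annotation Formats
Each object returned by a function is a dictionary in one of the following formats.
Rectangle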
{
    'label': 'car',
    'type': 'rectangle',
    'points': [x1, y1, x2, y2],  # [xtl, ytl, xbr, ybr]
    'confidence': 0.95,
    'rotation': 0.0  # Optional
}
Polygon
{
    'label': 'person',
    'type': 'polygon',
    'points': [x1, y1, x2, y2, x3, y3, ...],  # Flat list of coordinates
    'confidence': 0.88
}
Mask
{
    'label': 'dog',
    'type': 'mask',
    'points': [x1, y1, x2, y2, ..., xtl, ytl, xbr, ybr],  # RLE + bbox
    'confidence': 0.92
}
Skeleton (with Elements)
{
    'label': 'person',
    'type': 'skeleton',
    'points': [],
    'elements': [
        {'label': 'nose', 'type': 'points', 'points': [x, y]},
        {'label': 'left_eye', 'type': 'points', 'points': [x, y]},
        # ... more keypoints
    ]
}
Advanced Features
Attributes
Add attributes to detections:
{
    'label': 'car',
    'type': 'rectangle',
    'points': [100, 200, 300, 400],
    'attributes': [
        {'name': 'color', 'value': 'red'},
        {'name': 'occluded', 'value': 'false'}
    ]
}
Define attributes in function.yaml:
spec: |
  [
    {
      "name": "car",
      "type": "rectangle",
      "attributes": [
        {
          "name": "color",
          "input_type": "select",
          "values": ["red", "blue", "green", "white", "black"]
        },
        {
          "name": "occluded",
          "input_type": "checkbox",
          "values": ["true", "false"]
        }
      ]
    }
  ]
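Attribute values returned by the function should be among the values declared in the spec. A minimal sketch of that check, using the spec shown above, follows; `invalid_attributes` is a hypothetical helper name.

```python
import json

# Same label definition as in the spec above, as a JSON string.
SPEC = """
[
  {
    "name": "car",
    "type": "rectangle",
    "attributes": [
      {"name": "color", "input_type": "select",
       "values": ["red", "blue", "green", "white", "black"]},
      {"name": "occluded", "input_type": "checkbox",
       "values": ["true", "false"]}
    ]
  }
]
"""


def invalid_attributes(detection, spec_json):
    """Return (name, value) pairs that the label's declared attributes do not allow."""
    labels = {label["name"]: label for label in json.loads(spec_json)}
    declared = labels.get(detection["label"], {}).get("attributes", [])
    allowed = {attr["name"]: set(attr["values"]) for attr in declared}
    return [
        (attr["name"], attr["value"])
        for attr in detection.get("attributes", [])
        if attr["value"] not in allowed.get(attr["name"], set())
    ]


detection = {
    "label": "car",
    "type": "rectangle",
    "points": [100, 200, 300, 400],
    "attributes": [
        {"name": "color", "value": "purple"},
        {"name": "occluded", "value": "false"},
    ],
}
print(invalid_attributes(detection, SPEC))  # [('color', 'purple')]
```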
Group Annotations
Group related objects:
[
    {'label': 'person', 'points': [...], 'group_id': 0},
    {'label': 'bicycle', 'points': [...], 'group_id': 0},  # Grouped with the first person
    {'label': 'person', 'points': [...], 'group_id': 1},
    {'label': 'car', 'points': [...], 'group_id': 1}  # Grouped with the second person
]
Deployment
Build and Deploy
# Deploy function
nuctl deploy --project-name cvat \
--path ./my-custom-function/nuclio \
--file ./my-custom-function/nuclio/function.yaml \
--platform local
# Verify deployment
nuctl get function my-custom-detector --platform local
Test Function
import requests
import base64
import json

# Read and encode image
with open('test_image.jpg', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Call function
response = requests.post(
    'http://localhost:8080',  # Function port
    json={'image': image_b64, 'threshold': 0.5}
)
print(json.dumps(response.json(), indent=2))
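Printing the response shows what came back but does not confirm it matches the formats described in this guide. A small checker along the following lines can be applied to each returned object; `detection_problems` is a hypothetical helper, and it covers only the rectangle and polygon rules stated above.

```python
REQUIRED_KEYS = ("label", "type", "points")


def detection_problems(detection):
    """Return a list of format problems for one detection (empty if none found)."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in detection]
    points = detection.get("points", [])
    shape_type = detection.get("type")
    if shape_type == "rectangle" and len(points) != 4:
        problems.append("rectangle needs 4 coordinates [xtl, ytl, xbr, ybr]")
    if shape_type == "polygon" and (len(points) < 6 or len(points) % 2 != 0):
        problems.append("polygon needs an even number of coordinates, at least 6")
    confidence = detection.get("confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append("confidence outside [0, 1]")
    return problems


good = {"label": "car", "type": "rectangle", "points": [1, 2, 3, 4], "confidence": 0.9}
bad = {"type": "polygon", "points": [1, 2, 3]}
print(detection_problems(good))  # []
print(detection_problems(bad))
```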
Best Practices
Load Once: Load models in init_context, not in handler, so the model is not reloaded on every request
Batch Processing: Process multiple requests efficiently
GPU Utilization: Use GPU when available for faster inference
Model Optimization: Use ONNX, TensorRT, or quantization
Caching: Cache preprocessed data when possible
Error Handling
def handler(context, event):
    try:
        # Validate input
        if 'image' not in event.body:
            raise ValueError("Missing 'image' field")

        # Process request (process_image is a placeholder for your own logic)
        results = process_image(event.body)

        return context.Response(
            body=json.dumps(results),
            status_code=200
        )
    except ValueError as e:
        context.logger.warning(f"Invalid input: {e}")
        return context.Response(
            body=json.dumps({'error': str(e)}),
            status_code=400
        )
    except Exception as e:
        context.logger.error(f"Unexpected error: {e}", exc_info=True)
        return context.Response(
            body=json.dumps({'error': 'Internal server error'}),
            status_code=500
        )
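The input check above can be extended to cover malformed base64 and out-of-range thresholds, so that client mistakes produce a 400 rather than a 500. The following is a minimal sketch using only the standard library; `parse_request` is a hypothetical helper, and it raises ValueError so it slots into the except ValueError branch shown above.

```python
import base64
import binascii


def parse_request(body):
    """Validate a request body and return (image_bytes, threshold)."""
    if not isinstance(body, dict):
        raise ValueError("Request body must be a JSON object")
    if "image" not in body:
        raise ValueError("Missing 'image' field")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        image_bytes = base64.b64decode(body["image"], validate=True)
    except (binascii.Error, ValueError, TypeError) as exc:
        raise ValueError("'image' is not valid base64") from exc
    if not image_bytes:
        raise ValueError("'image' is empty")
    threshold = body.get("threshold", 0.5)
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise ValueError("'threshold' must be a number")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("'threshold' must be between 0 and 1")
    return image_bytes, float(threshold)


encoded = base64.b64encode(b"abc").decode("ascii")
print(parse_request({"image": encoded, "threshold": 0.3}))  # (b'abc', 0.3)
```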
Logging
def handler(context, event):
    context.logger.info("Processing request")
    context.logger.debug(f"Threshold: {event.body.get('threshold')}")
    try:
        result = process(event.body)
        context.logger.info(f"Found {len(result)} objects")
        return context.Response(body=json.dumps(result), status_code=200)
    except Exception as e:
        context.logger.error(f"Error: {e}", exc_info=True)
        raise
Resource Management
# Limit function resources (Nuclio reads requests and limits from spec.resources)
spec:
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"
      nvidia.com/gpu: "1" # Request GPU
Testing
Unit Tests
import numpy as np
import pytest
from model_handler import ModelHandler


def random_image():
    # randint yields real pixel values; rand().astype(np.uint8) would be all zeros
    return np.random.randint(0, 256, size=(640, 640, 3), dtype=np.uint8)


def test_model_loading():
    handler = ModelHandler()
    assert handler.model is not None


def test_inference():
    handler = ModelHandler()
    results = handler.infer(random_image(), threshold=0.5)
    assert isinstance(results, list)


def test_output_format():
    handler = ModelHandler()
    results = handler.infer(random_image())
    for detection in results:
        assert 'label' in detection
        assert 'type' in detection
        assert 'points' in detection
        assert detection['type'] in ['rectangle', 'polygon', 'mask']
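The tests above exercise ModelHandler directly, which requires real weights. The handler function itself can also be exercised without Nuclio or a model by passing stub objects. The sketch below assumes only the attributes this guide uses (context.logger, context.user_data, context.Response, event.body); `StubModel` and `make_context` are hypothetical, and the handler shown is simplified so the example has no image dependencies.

```python
import json
from types import SimpleNamespace


class StubModel:
    """Stands in for ModelHandler so the test needs no weights (hypothetical)."""

    def infer(self, image, threshold=0.5):
        return [{"label": "class1", "type": "rectangle",
                 "points": [0, 0, 10, 10], "confidence": 0.9}]


def make_context(model):
    """Build an object exposing the attributes the handlers in this guide use."""
    logger = SimpleNamespace(info=print, debug=print, warning=print, error=print)

    def response(body=None, headers=None, content_type=None, status_code=200):
        return SimpleNamespace(body=body, headers=headers,
                               content_type=content_type, status_code=status_code)

    return SimpleNamespace(logger=logger,
                           user_data=SimpleNamespace(model=model),
                           Response=response)


def handler(context, event):
    """Simplified handler: image decoding is omitted to keep the sketch small."""
    threshold = event.body.get("threshold", 0.5)
    results = context.user_data.model.infer(event.body["image"], threshold)
    return context.Response(body=json.dumps(results), headers={},
                            content_type="application/json", status_code=200)


context = make_context(StubModel())
event = SimpleNamespace(body={"image": "<base64 data>", "threshold": 0.5})
reply = handler(context, event)
print(reply.status_code, json.loads(reply.body)[0]["label"])  # 200 class1
```

In a real test, the handler imported from main.py would replace the simplified one defined here.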
Integration Tests
# Deploy function locally
nuctl deploy --project-name cvat --path ./nuclio --file ./nuclio/function.yaml --platform local
# Get the HTTP port assigned to the function
FUNC_PORT=$(nuctl get function my-custom-detector --platform local -o json | jq -r '.status.httpPort')
# Test with sample image
python test_function.py --url "http://localhost:$FUNC_PORT" --image test.jpg
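The test_function.py script invoked above is not defined in this guide. One possible shape for it is sketched here, using only the standard library; the --url and --image flags match the command above, while --threshold and the function names are assumptions made for illustration.

```python
import argparse
import base64
import json
import sys
import urllib.request


def build_payload(image_bytes, threshold=0.5):
    """Encode an image and threshold as the JSON body the detector expects."""
    body = {"image": base64.b64encode(image_bytes).decode("ascii"),
            "threshold": threshold}
    return json.dumps(body).encode("utf-8")


def call_function(url, payload, timeout=30):
    """POST the payload to the function and return the parsed JSON response."""
    request = urllib.request.Request(
        url, data=payload,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def main(argv=None):
    parser = argparse.ArgumentParser(description="Smoke-test a deployed detector")
    parser.add_argument("--url", required=True)
    parser.add_argument("--image", required=True)
    parser.add_argument("--threshold", type=float, default=0.5)
    args = parser.parse_args(argv)
    with open(args.image, "rb") as f:
        payload = build_payload(f.read(), args.threshold)
    detections = call_function(args.url, payload)
    if not isinstance(detections, list):
        sys.exit("Expected a JSON list of detections")
    print(json.dumps(detections, indent=2))


# Run only when command-line arguments are supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

The script exits with an error message when the response is not a list, which makes it usable as a pass/fail step in a CI job.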
Troubleshooting
Common Issues
Model Not Loading:
Check model file path in container
Verify dependencies in build directives
Increase memory limits
Slow Inference:
Use GPU-optimized function variant
Optimize model (ONNX, quantization)
Adjust numWorkers in function.yaml
Invalid Output Format:
Validate against CVAT expected format
Check coordinate scaling
Test with small dataset first
Memory Errors:
Increase container memory limits
Reduce batch size
Optimize image preprocessing
Examples
Explore existing functions in the CVAT repository:
SAM Interactor: serverless/pytorch/facebookresearch/sam/
YOLO Detector: serverless/onnx/WongKinYiu/yolov7/
TransT Tracker: serverless/pytorch/dschoerk/transt/
Mask R-CNN: serverless/openvino/omz/public/mask_rcnn_inception_resnet_v2_atrous_coco/
Next Steps
Deployment Guide: Learn how to deploy serverless functions
Overview: Understand serverless function types