CVAT provides built-in format conversion through the Datumaro framework. This page covers converting between formats, handling format limitations, and best practices for dataset transformation.
Overview
Format conversion is needed when:
- Your ML framework requires a specific format
- Source annotations are in a different format
- You need to convert between annotation types (e.g., masks to polygons)
- You are merging datasets from different sources
- You need to adapt to format-specific limitations
Using Datumaro for Conversion
Datumaro is CVAT’s built-in dataset framework that handles all format conversions.
Installation
Basic Conversion
Convert between any supported formats:
# Convert COCO to YOLO
datum convert -if coco -i /path/to/coco -f yolo -o /path/to/yolo
# Convert Pascal VOC to COCO
datum convert -if voc -i /path/to/voc -f coco -o /path/to/coco
# Convert YOLO to Pascal VOC
datum convert -if yolo -i /path/to/yolo -f voc -o /path/to/voc
Command structure:
-if: Input format
-i: Input directory
-f: Output format
-o: Output directory
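For scripted pipelines, the same command can be assembled in Python. The sketch below only builds the argument list from the four flags listed above. The helper name `build_convert_command` is hypothetical, and running the result assumes the `datum` executable is on your PATH.

```python
import shlex


def build_convert_command(input_format, input_dir, output_format, output_dir):
    """Return the argv list for `datum convert` using the flags described above."""
    return [
        "datum", "convert",
        "-if", input_format,   # input format
        "-i", input_dir,       # input directory
        "-f", output_format,   # output format
        "-o", output_dir,      # output directory
    ]


cmd = build_convert_command("coco", "/path/to/coco", "yolo", "/path/to/yolo")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` so that a failed conversion raises an exception instead of passing silently.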
Python API Conversion
Programmatic format conversion:
import datumaro as dm
# Load dataset in COCO format
dataset = dm.Dataset.import_from('/path/to/coco', 'coco')
# Export to YOLO format
dataset.export('/path/to/yolo', 'yolo', save_media=True)
# Export to Pascal VOC
dataset.export('/path/to/voc', 'voc')
Batch Conversion
Convert to multiple formats:
import datumaro as dm
# Load source dataset
dataset = dm.Dataset.import_from('/path/to/source', 'coco')
# Export to multiple formats
formats = ['yolo', 'voc', 'labelme', 'imagenet']
for fmt in formats:
    output_path = f'/path/to/output/{fmt}'
    dataset.export(output_path, fmt, save_media=True)
    print(f'Exported to {fmt} at {output_path}')
Common Conversion Scenarios
COCO to YOLO
Convert COCO object detection to YOLO format:
import datumaro as dm
# Load COCO dataset
dataset = dm.Dataset.import_from('coco_dataset/', 'coco')
# Export to YOLO
dataset.export('yolo_dataset/', 'yolo', save_media=True)
Considerations:
- COCO polygons are converted to bounding boxes
- Category IDs are remapped to YOLO class indices
- Coordinate normalization is handled automatically
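To make the three considerations above concrete, here is a minimal standalone sketch of the arithmetic involved. It is not Datumaro's implementation: the function names are hypothetical, and assigning class indices in sorted ID order is an assumption made for illustration.

```python
def polygon_to_bbox(points):
    """Return the [x, y, w, h] box enclosing a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = points[0::2], points[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def coco_bbox_to_yolo(bbox, image_width, image_height):
    """Convert a COCO [x, y, w, h] pixel box to YOLO (cx, cy, w, h) in [0, 1]."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / image_width,   # normalized box centre, x
        (y + h / 2) / image_height,  # normalized box centre, y
        w / image_width,             # normalized width
        h / image_height,            # normalized height
    )


def remap_category_ids(category_ids):
    """Map arbitrary COCO category IDs to contiguous zero-based class indices."""
    # Assumption: indices follow sorted ID order.
    return {cat_id: index for index, cat_id in enumerate(sorted(category_ids))}


box = polygon_to_bbox([100, 50, 300, 50, 300, 150, 100, 150])
yolo_box = coco_bbox_to_yolo(box, image_width=640, image_height=480)
class_index = remap_category_ids([1, 3, 18])
print(box, yolo_box, class_index)
```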
YOLO to COCO
Convert YOLO annotations to COCO JSON:
import datumaro as dm
# Load YOLO dataset (image sizes are read from the image files unless image_info is given)
dataset = dm.Dataset.import_from('yolo_dataset/', 'yolo')
# Export to COCO format
dataset.export('coco_dataset/', 'coco_instances', save_media=True)
Considerations:
- YOLO bounding boxes become COCO bbox annotations
- Class names from obj.names are preserved
- Image dimensions are required to turn normalized coordinates into pixel values
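The reason image dimensions matter is visible in the inverse calculation. The following sketch (a hypothetical helper, not part of Datumaro) turns a normalized YOLO box back into a COCO-style pixel box; without the width and height there is nothing to multiply by.

```python
def yolo_bbox_to_coco(bbox, image_width, image_height):
    """Convert a YOLO (cx, cy, w, h) normalized box to COCO [x, y, w, h] pixels."""
    cx, cy, w, h = bbox
    box_w = w * image_width
    box_h = h * image_height
    # COCO stores the top-left corner, YOLO stores the centre.
    return [cx * image_width - box_w / 2, cy * image_height - box_h / 2, box_w, box_h]


coco_box = yolo_bbox_to_coco((0.5, 0.5, 0.25, 0.5), image_width=640, image_height=480)
print(coco_box)
```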
Pascal VOC to COCO
Convert VOC XML annotations to COCO JSON:
import datumaro as dm
# Load Pascal VOC dataset
dataset = dm.Dataset.import_from('voc_dataset/', 'voc')
# Export to COCO
dataset.export('coco_dataset/', 'coco_instances', save_media=True)
Masks to Polygons
Convert segmentation masks to polygon annotations:
import datumaro as dm
# Load dataset with masks
dataset = dm.Dataset.import_from('dataset_with_masks/', 'coco')
# Convert masks to polygons
dataset = dataset.transform('masks_to_polygons')
# Export with polygons
dataset.export('dataset_with_polygons/', 'coco_instances')
Polygons to Masks
Convert polygons to pixel-level masks:
import datumaro as dm
# Load dataset with polygons
dataset = dm.Dataset.import_from('dataset_with_polygons/', 'coco')
# Convert polygons to masks
dataset = dataset.transform('polygons_to_masks')
# Export with masks
dataset.export('dataset_with_masks/', 'coco_instances')
Bounding Boxes to Polygons
Convert boxes to polygon representations:
import datumaro as dm
# Load dataset with bounding boxes
dataset = dm.Dataset.import_from('yolo_dataset/', 'yolo')
# Convert boxes to polygons
dataset = dataset.transform('boxes_to_polygons')
# Export
dataset.export('polygon_dataset/', 'coco_instances')
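Geometrically, a box becomes a four-point polygon that lists its corners in order. The sketch below shows that idea in plain Python. The helper name and the clockwise-from-top-left ordering are assumptions for illustration, not a description of Datumaro's internals.

```python
def box_to_polygon(x, y, w, h):
    """Return the box corners as a flat [x1, y1, ..., x4, y4] polygon."""
    return [
        x, y,          # top-left
        x + w, y,      # top-right
        x + w, y + h,  # bottom-right
        x, y + h,      # bottom-left
    ]


polygon = box_to_polygon(10, 20, 30, 40)
print(polygon)
```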
CVAT-Specific Conversions
When exporting from CVAT, certain conversions happen automatically:
Ellipses to Masks
CVAT ellipses are automatically converted to masks for formats that don’t support ellipses:
# In CVAT export code (automatic)
from cvat.apps.dataset_manager.formats.transformations import EllipsesToMasks
with GetCVATDataExtractor(instance_data, include_images=save_images) as extractor:
    dataset = StreamDataset.from_extractors(extractor, env=dm_env)
    dataset.transform(EllipsesToMasks) # Automatic conversion
    dataset.export(temp_dir, format_name, save_media=save_images)
This happens automatically when exporting to:
- COCO formats
- YOLO Segmentation
- Most segmentation formats
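As a rough illustration of what an ellipse-to-mask conversion involves, the sketch below rasterizes an axis-aligned ellipse by testing each pixel centre against the ellipse equation. It is a simplified standalone example with a hypothetical function name. CVAT's `EllipsesToMasks` may differ in details such as rotation handling and pixel rounding.

```python
def ellipse_to_mask(cx, cy, rx, ry, width, height):
    """Rasterize an axis-aligned ellipse into a height x width binary mask."""
    mask = []
    for row in range(height):
        mask_row = []
        for col in range(width):
            # Test the pixel centre against (dx/rx)^2 + (dy/ry)^2 <= 1.
            dx = (col + 0.5 - cx) / rx
            dy = (row + 0.5 - cy) / ry
            mask_row.append(1 if dx * dx + dy * dy <= 1.0 else 0)
        mask.append(mask_row)
    return mask


mask = ellipse_to_mask(cx=4, cy=3, rx=3, ry=2, width=8, height=6)
for mask_row in mask:
    print("".join(str(value) for value in mask_row))
```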
Track Keyframes
For video annotations, CVAT ensures track keyframes are set:
# Automatic track keyframe setting
from cvat.apps.dataset_manager.formats.transformations import SetKeyframeForEveryTrackShape
dataset = dataset.transform(SetKeyframeForEveryTrackShape)
This ensures tracking annotations work correctly in imported formats.
Datumaro provides powerful transformations beyond format conversion:
Filtering
Filter dataset by various criteria:
import datumaro as dm
dataset = dm.Dataset.import_from('source/', 'coco')
# Keep only "person" annotations (annotation-level filter)
filtered = dataset.filter('/item/annotation[label="person"]',
                          filter_annotations=True)
# Keep only items that contain at least one "car" annotation (item-level filter)
filtered = dataset.filter('/item[annotation/label="car"]')
# Export filtered dataset
filtered.export('filtered_dataset/', 'coco')
Sampling
Create dataset subsets:
import datumaro as dm
dataset = dm.Dataset.import_from('source/', 'coco')
# Random sampling
sampled = dataset.transform('random_sampler', count=100)
# Subset by percentage
test_split = dataset.transform('random_split', splits=[(0.8, 'train'), (0.2, 'test')])
# Export subsets
for subset_name in ['train', 'test']:
    # Wrap the subset as a standalone dataset before exporting it
    subset = test_split.get_subset(subset_name).as_dataset()
    subset.export(f'{subset_name}_dataset/', 'coco')
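The idea behind a ratio-based split can be sketched without Datumaro: shuffle the item IDs with a fixed seed, then cut the list at the requested proportions. The helper `split_by_ratio` below is hypothetical and only illustrates the technique. The built-in transform may order and round differently.

```python
import random


def split_by_ratio(item_ids, splits, seed=42):
    """Shuffle item IDs and cut them into named subsets by ratio."""
    ids = list(item_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    result, start = {}, 0
    for index, (ratio, name) in enumerate(splits):
        # The last subset takes whatever remains so rounding never drops an item.
        is_last = index == len(splits) - 1
        end = len(ids) if is_last else start + round(ratio * len(ids))
        result[name] = ids[start:end]
        start = end
    return result


subsets = split_by_ratio(range(10), [(0.8, "train"), (0.2, "test")])
print({name: len(ids) for name, ids in subsets.items()})
```

Fixing the seed matters when the split is repeated after re-exporting from CVAT, since otherwise items can move between train and test.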
Label Mapping
Rename or merge labels:
import datumaro as dm
dataset = dm.Dataset.import_from('source/', 'coco')
# Remap labels
mapping = {
    'car': 'vehicle',
    'truck': 'vehicle',
    'bicycle': 'vehicle'
}
remapped = dataset.transform('remap_labels', mapping=mapping)
remapped.export('remapped_dataset/', 'coco')
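Merging labels changes label indices as well as names, which is worth keeping in mind for formats that store numeric class IDs. The sketch below shows one way the new label list and the old-to-new index table could be derived. The function name is hypothetical, and keeping unmapped labels unchanged is an assumption of this example.

```python
def remap_label_indices(labels, mapping):
    """Return the merged label list and an old-index -> new-index table."""
    new_labels = []
    index_map = {}
    for old_index, name in enumerate(labels):
        new_name = mapping.get(name, name)  # unmapped labels keep their name
        if new_name not in new_labels:
            new_labels.append(new_name)
        index_map[old_index] = new_labels.index(new_name)
    return new_labels, index_map


labels = ["car", "truck", "bicycle", "person"]
mapping = {"car": "vehicle", "truck": "vehicle", "bicycle": "vehicle"}
new_labels, index_map = remap_label_indices(labels, mapping)
print(new_labels, index_map)
```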
Image Resizing
Resize images and adjust annotations:
import datumaro as dm
dataset = dm.Dataset.import_from('source/', 'coco')
# Resize to fixed dimensions
resized = dataset.transform('resize', width=640, height=640)
# Resize with aspect ratio
resized = dataset.transform('resize', width=640, height=640, keep_aspect_ratio=True)
resized.export('resized_dataset/', 'coco')
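Adjusting annotations during a resize comes down to scaling coordinates by the ratio of new to old image size on each axis. A minimal sketch with a hypothetical helper, assuming sizes are given as (width, height):

```python
def scale_bbox(bbox, old_size, new_size):
    """Scale an [x, y, w, h] box from old (width, height) to new (width, height)."""
    x, y, w, h = bbox
    sx = new_size[0] / old_size[0]  # horizontal scale factor
    sy = new_size[1] / old_size[1]  # vertical scale factor
    return [x * sx, y * sy, w * sx, h * sy]


scaled = scale_bbox([100, 50, 200, 100], old_size=(1280, 320), new_size=(640, 640))
print(scaled)
```

When the two scale factors differ, as here, boxes are stretched along with the image, which is why aspect-ratio-preserving resizes are often preferred for detection data.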
Annotation Normalization
Normalize annotations for consistency:
import datumaro as dm
dataset = dm.Dataset.import_from('source/', 'coco')
# Remove duplicate annotations
normalized = dataset.transform('remove_duplicates')
# Merge overlapping annotations
normalized = normalized.transform('merge_instance_segments')
normalized.export('normalized_dataset/', 'coco')
COCO
Limitations:
- No native support for ellipses (converted to masks/polygons)
- Rotated boxes require polygon representation
- Attributes stored as custom fields
Best practices:
# Export COCO with proper settings
dataset.export('coco_output/', 'coco_instances',
               save_media=True,
               merge_images=False,  # Keep images separate
               crop_covered=False)  # Don't crop overlapping regions
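Because COCO has no rotated-box type, a rotated box has to be expressed as a four-point polygon. The sketch below rotates the box corners around the box centre. The function name is hypothetical, and the angle convention (degrees, clockwise in image coordinates with y pointing down) is an assumption that should be checked against your source data.

```python
import math


def rotated_box_to_polygon(x, y, w, h, angle_degrees):
    """Rotate the corners of an [x, y, w, h] box around its centre."""
    cx, cy = x + w / 2, y + h / 2
    cos_a = math.cos(math.radians(angle_degrees))
    sin_a = math.sin(math.radians(angle_degrees))
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    polygon = []
    for px, py in corners:
        dx, dy = px - cx, py - cy
        # Standard 2D rotation applied to the offset from the centre.
        polygon.append(cx + dx * cos_a - dy * sin_a)
        polygon.append(cy + dx * sin_a + dy * cos_a)
    return polygon


polygon = rotated_box_to_polygon(0, 0, 4, 2, 90)
print([round(value, 6) for value in polygon])
```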
YOLO
Limitations:
- Only bounding boxes (classic YOLO) or polygons (Ultralytics)
- No attribute support
- Requires image dimensions for import
Best practices:
# Provide image dimensions for YOLO import
image_info = {
    'image1': (480, 640),  # (height, width)
    'image2': (1080, 1920)
}
dataset = dm.Dataset.import_from('yolo_dataset/', 'yolo',
                                 image_info=image_info)
Pascal VOC
Limitations:
- XML-based, less efficient for large datasets
- Limited attribute support
- Bounding boxes only (segmentation in separate format)
Best practices:
# Export with the standard VOC label map and segmentation colormap
dataset.export('voc_output/', 'voc',
               label_map='voc',      # Use VOC standard labels
               save_media=True,
               apply_colormap=True)  # For segmentation
ImageNet
Limitations:
- Classification only (no bounding boxes)
- Directory-based organization
- No spatial annotations
Best practices:
# Convert detection dataset to classification
from datumaro.components.annotation import AnnotationType
# Filter to keep only image-level labels
def filter_to_classification(item):
    labels = [ann for ann in item.annotations
              if ann.type == AnnotationType.label]
    return item.wrap(annotations=labels)
dataset = dataset.transform(filter_to_classification)
dataset.export('imagenet_output/', 'imagenet')
Handling Annotation Type Mismatches
When converting between formats with different annotation types:
Detection to Segmentation
Convert bounding boxes to masks:
import datumaro as dm
import numpy as np
from datumaro.components.annotation import Mask
def boxes_to_masks(item):
    # item.media.size is (height, width) for image media
    height, width = item.media.size
    new_annotations = []
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.bbox:
            # Create mask from bbox
            x, y, w, h = map(int, [ann.x, ann.y, ann.w, ann.h])
            mask = np.zeros((height, width), dtype=np.uint8)
            mask[y:y+h, x:x+w] = 1
            new_annotations.append(Mask(
                image=mask,
                label=ann.label,
                attributes=ann.attributes
            ))
        else:
            new_annotations.append(ann)
    return item.wrap(annotations=new_annotations)
dataset = dataset.transform(boxes_to_masks)
Segmentation to Detection
Convert masks to bounding boxes:
import datumaro as dm
# Built-in transform: converts masks, polygons, polylines, and points to boxes
dataset = dataset.transform('shapes_to_boxes')
# Or manually
from datumaro.components.annotation import Bbox
import numpy as np
def masks_to_boxes(item):
    new_annotations = []
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.mask:
            # Compute bounding box from mask
            indices = np.where(ann.image != 0)
            if len(indices[0]) > 0:
                y_min, y_max = indices[0].min(), indices[0].max()
                x_min, x_max = indices[1].min(), indices[1].max()
                new_annotations.append(Bbox(
                    x=x_min,
                    y=y_min,
                    w=x_max - x_min,
                    h=y_max - y_min,
                    label=ann.label,
                    attributes=ann.attributes
                ))
        else:
            new_annotations.append(ann)
    return item.wrap(annotations=new_annotations)
dataset = dataset.transform(masks_to_boxes)
Keypoints to Detection
Extract bounding boxes from keypoint annotations:
import datumaro as dm
import numpy as np
from datumaro.components.annotation import Bbox
def keypoints_to_boxes(item):
    new_annotations = []
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.points:
            # Compute bounding box from keypoints
            points = np.array(ann.points).reshape(-1, 2)
            x_min, y_min = points.min(axis=0)
            x_max, y_max = points.max(axis=0)
            # Add padding
            padding = 10
            new_annotations.append(Bbox(
                x=max(0, x_min - padding),
                y=max(0, y_min - padding),
                w=x_max - x_min + 2 * padding,
                h=y_max - y_min + 2 * padding,
                label=ann.label
            ))
        else:
            new_annotations.append(ann)
    return item.wrap(annotations=new_annotations)
dataset = dataset.transform(keypoints_to_boxes)
Validation and Quality Checks
Validate converted datasets:
import datumaro as dm
# Load converted dataset
dataset = dm.Dataset.import_from('converted_dataset/', 'coco')
# Validate format
from datumaro.components.validator import TaskType
from datumaro.plugins.validators import validate_annotations
# Validate for detection task
reports = validate_annotations(dataset, task_type=TaskType.detection)
# Print validation issues
for report in reports:
    print(f"Issue: {report['anomaly_type']}")
    print(f"Severity: {report['severity']}")
    print(f"Description: {report['description']}")
# Check dataset statistics
stats = dataset.get_patch().stats()
print(f"Total items: {stats['annotations']['count']}")
print(f"Labels: {stats['annotations']['labels']}")
Troubleshooting
Missing Annotations After Conversion
Problem: Some annotations disappeared after conversion.
Solutions:
- Check if target format supports the annotation type
- Verify labels exist in target format
- Check for invalid coordinates or empty annotations
# Debug missing annotations
import datumaro as dm
source = dm.Dataset.import_from('source/', 'coco')
converted = dm.Dataset.import_from('converted/', 'yolo')
# Count annotations by iterating over the items of each dataset
print(f"Source annotations: {sum(len(item.annotations) for item in source)}")
print(f"Converted annotations: {sum(len(item.annotations) for item in converted)}")
# Check annotation types
for item in source:
    for ann in item.annotations:
        print(f"Type: {ann.type}, Label: {ann.label}")
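Total counts show that something was lost but not what. A per-label comparison narrows it down. The sketch below works on plain lists of label names, which could be collected from each dataset's items. The helper name and the sample labels are hypothetical.

```python
from collections import Counter


def find_dropped_labels(source_labels, converted_labels):
    """Return {label: number of annotations lost} for labels whose count dropped."""
    source_counts = Counter(source_labels)
    converted_counts = Counter(converted_labels)
    return {
        label: count - converted_counts[label]
        for label, count in source_counts.items()
        if converted_counts[label] < count
    }


dropped = find_dropped_labels(
    ["person", "person", "car", "ellipse_sign"],
    ["person", "person", "car"],
)
print(dropped)
```

A label that disappears entirely often points to an unsupported annotation type, while a partial drop is more typical of invalid or empty shapes.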
Coordinate Mismatches
Problem: Bounding boxes or polygons are misplaced after conversion.
Solutions:
- Verify image dimensions are correct
- Check coordinate normalization (YOLO uses normalized coords)
- Ensure coordinate systems match (some formats use different origins)
# Verify coordinates
for item in dataset:
    print(f"Image: {item.id}, Size: {item.media.size}")
    for ann in item.annotations:
        if ann.type == dm.AnnotationType.bbox:
            print(f"  Bbox: x={ann.x}, y={ann.y}, w={ann.w}, h={ann.h}")
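Printing coordinates works for a handful of items. For larger datasets, an automated sanity check is easier to scan. The following sketch flags boxes that are empty, fall outside the image, or look like normalized values that were never scaled to pixels. The function and its heuristics are illustrative assumptions, not a Datumaro feature.

```python
def bbox_problems(x, y, w, h, image_width, image_height):
    """Return a list of human-readable issues found for a pixel-space box."""
    problems = []
    if w <= 0 or h <= 0:
        problems.append("empty box")
    if x < 0 or y < 0:
        problems.append("negative origin")
    if x + w > image_width or y + h > image_height:
        problems.append("extends past image")
    # Heuristic: a box no larger than one pixel in a larger image is
    # usually a normalized YOLO box that was not converted to pixels.
    if 0 < w <= 1 and 0 < h <= 1 and image_width > 1 and image_height > 1:
        problems.append("looks normalized")
    return problems


print(bbox_problems(10, 10, 50, 50, 640, 480))
print(bbox_problems(0.5, 0.5, 0.2, 0.3, 640, 480))
print(bbox_problems(600, 400, 100, 100, 640, 480))
```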
Label Mapping Errors
Problem: Labels are incorrectly mapped or missing.
Solutions:
- Provide explicit label mapping
- Check for case sensitivity in label names
- Verify label IDs match between formats
# Explicit label mapping
label_map = {
    0: 'background',
    1: 'person',
    2: 'car',
    3: 'bicycle'
}
dataset.export('output/', 'yolo', label_map=label_map)
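Case differences are easy to miss by eye. A small check like the one below compares two label lists case-insensitively and reports names that differ only in capitalization. The helper name is hypothetical.

```python
def find_case_mismatches(source_labels, target_labels):
    """Return {source_name: target_name} for labels differing only by case."""
    target_by_lower = {name.lower(): name for name in target_labels}
    mismatches = {}
    for name in source_labels:
        match = target_by_lower.get(name.lower())
        if match is not None and match != name:
            mismatches[name] = match
    return mismatches


mismatches = find_case_mismatches(
    ["Person", "car", "Bicycle"],
    ["person", "car", "bicycle"],
)
print(mismatches)
```

The resulting dictionary has the same shape as the `mapping` argument used for label remapping, so it can serve as a starting point for an explicit mapping.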
Best Practices
- Always validate after conversion - Check statistics and sample images
- Preserve original datasets - Keep source data before conversion
- Use Datumaro format for intermediate storage - It preserves all information
- Test with small samples first - Verify conversion works before processing large datasets
- Document label mappings - Keep track of label changes between formats
- Handle edge cases - Empty annotations, overlapping regions, etc.
- Check format documentation - Understand target format limitations
- Use version control - Track dataset versions and conversions
Next Steps