WorldStereo code and model weights are not yet released. This guide describes the expected workflow once the code becomes available.
Prerequisites
Before starting, ensure you have:
Completed the installation steps
A CUDA-capable GPU with sufficient VRAM (24GB+ recommended)
Input images (perspective or panoramic)
Camera trajectory information (for camera-controlled generation)
Basic Workflow
WorldStereo’s workflow consists of three main stages:
1. Prepare Input Data: prepare your input images and camera trajectories.
2. Generate Multi-View Video: use WorldStereo to generate geometrically consistent videos with camera control.
3. 3D Reconstruction: reconstruct 3D scenes from the generated multi-view consistent videos.
Camera-Guided Video Generation
Basic Usage
Generate a video with camera control from a single image:
import worldstereo
from worldstereo import WorldStereoModel
from worldstereo.utils import load_image, define_camera_trajectory
# Initialize the model
model = WorldStereoModel.from_pretrained("worldstereo-base")
# Load input image
input_image = load_image("path/to/input/image.jpg")
# Define camera trajectory
camera_trajectory = define_camera_trajectory(
    trajectory_type="orbit",
    radius=5.0,
    num_frames=60,
    height=1.5
)
# Generate video with camera control
output_video = model.generate(
    image=input_image,
    camera_trajectory=camera_trajectory,
    num_inference_steps=50,
    guidance_scale=7.5
)
# Save the generated video
output_video.save("output/generated_video.mp4")
The actual API may differ. This is a conceptual example based on the framework’s architecture.
WorldStereo supports both perspective and panoramic images. The example below uses a standard perspective image; panoramic input is shown in Use Case 3.
# Standard perspective image input
input_image = load_image("scene.jpg", image_type="perspective")
output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    input_type="perspective"
)
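Panoramic inputs are usually stored as equirectangular images, where each column corresponds to a longitude and each row to a latitude. The sketch below shows that pixel-to-direction mapping so the difference from a perspective image is concrete. The axis convention (y up, longitude 0 along +z) is an assumption, and WorldStereo's internal panorama handling is not documented.

```python
import numpy as np


def equirect_ray_directions(height: int, width: int) -> np.ndarray:
    """Return an (H, W, 3) array of unit ray directions for an equirectangular image.

    Assumed convention: y is up, longitude 0 points along +z and increases
    toward +x. Directions are sampled at pixel centers.
    """
    u = (np.arange(width) + 0.5) / width    # horizontal fraction in (0, 1)
    v = (np.arange(height) + 0.5) / height  # vertical fraction in (0, 1)
    lon = (u - 0.5) * 2.0 * np.pi           # longitude in (-pi, pi)
    lat = (0.5 - v) * np.pi                 # latitude in (-pi/2, pi/2); top row looks up
    lon, lat = np.meshgrid(lon, lat)        # both arrays have shape (H, W)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)


dirs = equirect_ray_directions(512, 1024)
print(dirs.shape)  # (512, 1024, 3)
```

Unlike a perspective image, which covers a limited field of view, every viewing direction around the camera appears exactly once in a panorama.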
Using Geometric Memory Modules
Global-Geometric Memory
The global-geometric memory maintains a point-cloud representation of the scene that acts as a structural prior, keeping the overall geometry consistent across generated views:
# Configure global-geometric memory
model.configure_global_memory(
    enable=True,
    point_cloud_resolution=1024,
    update_frequency="incremental",
    structural_prior_strength=0.8
)
# Generate with structural guidance
output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    use_global_memory=True
)
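The guide does not say how the global-geometric memory builds its point cloud. A common technique, shown here as an assumption rather than WorldStereo's documented method, is to unproject a per-frame depth map through pinhole intrinsics `K` and a camera-to-world pose, then keep a fixed budget of points. The `max_points` argument loosely mirrors `point_cloud_resolution` but is hypothetical.

```python
import numpy as np


def unproject_depth(depth, K, cam_to_world, max_points=None, seed=0):
    """Lift an (H, W) depth map to world-space points of shape (N, 3).

    depth: z-depth along the optical axis (OpenCV convention: x right, y down,
    z forward). K: 3x3 pinhole intrinsics. cam_to_world: 4x4 pose matrix.
    max_points: optional budget; points are randomly subsampled to this size.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)  # pixel centers
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3)
    z = depth.reshape(-1)
    valid = z > 0                                # skip pixels without depth
    rays = pix[valid] @ np.linalg.inv(K).T       # camera-space rays with z = 1
    pts_cam = rays * z[valid, None]              # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    pts_world = (pts_h @ cam_to_world.T)[:, :3]  # apply the camera-to-world pose
    if max_points is not None and len(pts_world) > max_points:
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(pts_world), size=max_points, replace=False)
        pts_world = pts_world[idx]
    return pts_world


K = np.array([[500.0, 0.0, 256.0], [0.0, 500.0, 256.0], [0.0, 0.0, 1.0]])
points = unproject_depth(np.full((512, 512), 3.0), K, np.eye(4), max_points=1024)
print(points.shape)  # (1024, 3)
```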
Spatial-Stereo Memory
The spatial-stereo memory keeps fine-grained details consistent across views:
# Configure spatial-stereo memory
model.configure_spatial_memory(
    enable=True,
    correspondence_threshold=0.9,
    attention_constraint="strong",
    memory_bank_size=2048
)
# Generate with fine-grained control
output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    use_spatial_memory=True
)
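One plausible reading of `correspondence_threshold` is a minimum similarity that a cross-view feature match must reach before it is trusted. The sketch below is a generic mutual-nearest-neighbor matcher with a cosine-similarity threshold; it illustrates the idea and is not WorldStereo's actual matching procedure.

```python
import numpy as np


def mutual_nn_matches(feats_a, feats_b, threshold=0.9):
    """Return (i, j) pairs where feats_a[i] and feats_b[j] are mutual nearest
    neighbors and their cosine similarity is at least `threshold`."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T                       # (Na, Nb) cosine similarities
    best_b = sim.argmax(axis=1)         # best match in b for each row of a
    best_a = sim.argmax(axis=0)         # best match in a for each row of b
    idx_a = np.arange(len(a))
    mutual = best_a[best_b] == idx_a    # keep pairs that agree in both directions
    strong = sim[idx_a, best_b] >= threshold
    keep = mutual & strong
    return np.stack([idx_a[keep], best_b[keep]], axis=1)


rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 16))
print(mutual_nn_matches(feats, feats).tolist())  # every row matches itself
```

Raising the threshold keeps fewer but more reliable correspondences, which is why the troubleshooting section suggests increasing it when reconstruction quality is poor.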
3D Scene Reconstruction
Once you’ve generated multi-view consistent videos, reconstruct the 3D scene:
from worldstereo.reconstruction import reconstruct_3d
# Generate video first
video = model.generate(
    image=input_image,
    camera_trajectory=trajectory
)
# Reconstruct 3D scene
point_cloud, mesh = reconstruct_3d(
    video=video,
    camera_trajectory=trajectory,
    method="neural_reconstruction",
    resolution="high"
)
# Export results
point_cloud.save("output/scene.ply")
mesh.save("output/scene.obj")
The reconstruction module leverages the geometric consistency from both memory modules to produce high-quality 3D outputs.
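`point_cloud.save("output/scene.ply")` belongs to the conceptual API. If you end up with raw NumPy arrays instead, a minimal ASCII PLY writer is enough to inspect the result in point-cloud viewers such as MeshLab or CloudCompare. The `save_ply` function below is a hypothetical helper, not part of WorldStereo.

```python
import numpy as np


def save_ply(path, points, colors=None):
    """Write (N, 3) points, with optional (N, 3) uint8 RGB colors, as ASCII PLY."""
    points = np.asarray(points, dtype=np.float32)
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
    ]
    if colors is not None:
        colors = np.asarray(colors, dtype=np.uint8)
        header += ["property uchar red", "property uchar green", "property uchar blue"]
    header.append("end_header")
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        for i, p in enumerate(points):
            line = f"{p[0]} {p[1]} {p[2]}"
            if colors is not None:
                c = colors[i]
                line += f" {c[0]} {c[1]} {c[2]}"
            f.write(line + "\n")


pts = np.random.default_rng(0).uniform(-1.0, 1.0, size=(1000, 3))
save_ply("scene_points.ply", pts)
```

ASCII PLY is easy to debug but large; binary PLY is preferable for multi-million-point scenes.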
Advanced Configuration
Custom Camera Trajectories
Define complex camera movements:
import numpy as np
from worldstereo.camera import CameraTrajectory
# Define custom camera poses: a circular orbit of radius 5 at height 1.5
num_frames = 60
camera_poses = []
for i in range(num_frames):
    # Angle for this frame; frames are evenly spaced around a full circle
    angle = i * 2 * np.pi / num_frames
    position = np.array([np.cos(angle) * 5, 1.5, np.sin(angle) * 5])
    # Look at the scene center
    look_at = np.array([0, 0, 0])
    camera_poses.append({
        'position': position,
        'look_at': look_at,
        'up': np.array([0, 1, 0])
    })
trajectory = CameraTrajectory(poses=camera_poses)
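Each pose above is described by `position`, `look_at`, and `up`, while most renderers and reconstruction tools expect a 4x4 camera-to-world matrix. The helper below builds one using the OpenGL-style convention (the camera looks along its local -z axis). Whether `CameraTrajectory` uses this convention or the OpenCV one (+z forward, y down) is not documented, so treat the convention as an assumption.

```python
import numpy as np


def look_at_to_c2w(position, look_at, up=(0.0, 1.0, 0.0)):
    """Build a 4x4 camera-to-world matrix (OpenGL convention: camera looks down -z)."""
    position = np.asarray(position, dtype=float)
    forward = np.asarray(look_at, dtype=float) - position
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right /= np.linalg.norm(right)    # degenerate (NaN) if forward is parallel to up
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0] = right                # camera +x
    c2w[:3, 1] = true_up              # camera +y
    c2w[:3, 2] = -forward             # camera +z points away from the target
    c2w[:3, 3] = position             # camera center in world coordinates
    return c2w


pose = look_at_to_c2w(position=(5.0, 1.5, 0.0), look_at=(0.0, 0.0, 0.0))
print(pose[:3, 3])  # translation column holds the camera position
```

Stacking one such matrix per entry of `camera_poses` gives an `(N, 4, 4)` array that most multi-view tools can consume directly.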
Batch Processing
Process multiple scenes efficiently:
from worldstereo.utils import batch_process
# Prepare batch of images
images = [
    load_image("scene1.jpg"),
    load_image("scene2.jpg"),
    load_image("scene3.jpg")
]
# Define trajectories for each
trajectories = [
    define_camera_trajectory(trajectory_type="orbit", radius=5.0),
    define_camera_trajectory(trajectory_type="forward", distance=10.0),
    define_camera_trajectory(trajectory_type="spiral", height_range=(0, 3))
]
# Batch generate
results = batch_process(
    model=model,
    images=images,
    trajectories=trajectories,
    batch_size=1  # Adjust based on GPU memory
)
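`batch_process` is part of the conceptual API. Until it is available, a plain loop over fixed-size groups handles the same bookkeeping. The helpers below pair each image with its trajectory and process them in groups of `batch_size`; `iter_batches`, `run_batches`, and the `generate_fn` callback (a stand-in for `model.generate`) are all hypothetical.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def iter_batches(images: Sequence[Any], trajectories: Sequence[Any],
                 batch_size: int = 1) -> Iterator[List[Tuple[Any, Any]]]:
    """Yield lists of (image, trajectory) pairs, at most `batch_size` per list."""
    if len(images) != len(trajectories):
        raise ValueError("each image needs exactly one trajectory")
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    pairs = list(zip(images, trajectories))
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


def run_batches(generate_fn: Callable[[Any, Any], Any], images, trajectories, batch_size=1):
    """Call generate_fn(image, trajectory) for every pair, one group at a time."""
    results = []
    for batch in iter_batches(images, trajectories, batch_size):
        results.extend(generate_fn(img, traj) for img, traj in batch)
    return results


scenes = ["scene1.jpg", "scene2.jpg", "scene3.jpg"]
paths = ["orbit", "forward", "spiral"]
outputs = run_batches(lambda img, traj: f"{img} via {traj}", scenes, paths, batch_size=2)
```

This only groups calls; it does not stack inputs into a single GPU batch. With `batch_size=1`, as in the example above, it reduces to a simple loop.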
Generation Parameters
Key parameters to tune for optimal results:
| Parameter | Description | Default | Range |
|---|---|---|---|
| num_inference_steps | Diffusion denoising steps | 50 | 20-100 |
| guidance_scale | Classifier-free guidance strength | 7.5 | 1.0-15.0 |
| structural_prior_strength | Global memory influence | 0.8 | 0.0-1.0 |
| correspondence_threshold | Spatial memory matching threshold | 0.9 | 0.5-1.0 |
| memory_bank_size | Detail memory capacity | 2048 | 512-4096 |
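More inference steps generally improve quality, at the cost of generation time that grows with the step count. A higher guidance scale strengthens adherence to the conditioning but can introduce artifacts when set too high. Balance both against your quality and time budget.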
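To catch typos and out-of-range values before a long generation run, the documented defaults and ranges can be encoded directly. The `PARAM_SPECS` table and `validate_params` helper are hypothetical conveniences, not part of WorldStereo; the numbers are copied from the parameter table above.

```python
PARAM_SPECS = {
    # name: (default, min, max), taken from the parameter table
    "num_inference_steps": (50, 20, 100),
    "guidance_scale": (7.5, 1.0, 15.0),
    "structural_prior_strength": (0.8, 0.0, 1.0),
    "correspondence_threshold": (0.9, 0.5, 1.0),
    "memory_bank_size": (2048, 512, 4096),
}


def validate_params(**overrides):
    """Merge overrides into the documented defaults and check each documented range."""
    params = {name: spec[0] for name, spec in PARAM_SPECS.items()}
    for name, value in overrides.items():
        if name not in PARAM_SPECS:
            raise KeyError(f"unknown parameter: {name}")
        _, lo, hi = PARAM_SPECS[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside the documented range [{lo}, {hi}]")
        params[name] = value
    return params


print(validate_params(num_inference_steps=30)["num_inference_steps"])  # 30
```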
Example Use Cases
Use Case 1: Virtual Tour Generation
Generate a virtual tour from a single room photograph:
# Load room image
room_image = load_image("room.jpg")
# Create walkthrough trajectory
walkthrough = define_camera_trajectory(
    trajectory_type="walkthrough",
    waypoints=[
        {"position": [0, 1.5, 0], "look_at": [5, 1.5, 0]},
        {"position": [5, 1.5, 0], "look_at": [5, 1.5, 5]},
        {"position": [5, 1.5, 5], "look_at": [0, 1.5, 5]}
    ],
    num_frames=180
)
# Generate consistent tour
tour_video = model.generate(
    image=room_image,
    camera_trajectory=walkthrough,
    use_global_memory=True,
    use_spatial_memory=True
)
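The walkthrough is defined by three waypoints but renders 180 frames, so camera positions must be interpolated between waypoints. How `define_camera_trajectory` does this is not documented. The sketch below assumes piecewise-linear interpolation spaced evenly by path length, which keeps the camera speed constant across segments of different lengths.

```python
import numpy as np


def interpolate_waypoints(waypoints, num_frames):
    """Resample a (K, 3) polyline to `num_frames` points evenly spaced by arc length.

    Assumes consecutive waypoints are distinct, so arc length strictly increases.
    """
    pts = np.asarray(waypoints, dtype=float)
    seg_len = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # length of each segment
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])       # arc length at each waypoint
    targets = np.linspace(0.0, cum[-1], num_frames)         # evenly spaced arc lengths
    # Interpolate each coordinate independently as a function of arc length.
    return np.stack([np.interp(targets, cum, pts[:, d]) for d in range(3)], axis=1)


positions = interpolate_waypoints([[0, 1.5, 0], [5, 1.5, 0], [5, 1.5, 5]], num_frames=180)
print(positions.shape)  # (180, 3)
```

The same function can resample the `look_at` targets. Linear interpolation produces sharp turns at waypoints; a spline is a common alternative when smoother motion is needed.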
Use Case 2: Product Visualization
Create 360° product views for e-commerce:
# Load product image
product_image = load_image("product.jpg")
# Orbit around product
orbit_trajectory = define_camera_trajectory(
    trajectory_type="orbit",
    radius=2.0,
    num_frames=120,
    height=1.0,
    complete_rotation=True
)
# Generate product video
product_video = model.generate(
    image=product_image,
    camera_trajectory=orbit_trajectory
)
# Reconstruct 3D model (reconstruct_3d returns a point cloud and a mesh)
_, product_mesh = reconstruct_3d(
    video=product_video,
    camera_trajectory=orbit_trajectory
)
product_mesh.save("product_3d_model.obj")
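For a full 360° orbit such as the one requested with `complete_rotation=True`, one sampling detail matters: if the first and last frames sit at 0° and 360°, a looped video shows the same view twice. The sketch below samples angles with `endpoint=False`, the same spacing as the custom trajectory loop earlier (`i * 2 * np.pi / num_frames`). It assumes the orbit is centered on the origin in the x-z plane with y up; how WorldStereo interprets `complete_rotation` is not documented.

```python
import numpy as np


def full_orbit_positions(radius, num_frames, height):
    """Camera positions on a horizontal circle around the origin (y is up).

    endpoint=False omits the 2*pi angle, which would duplicate the first frame.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False)
    return np.stack(
        [radius * np.cos(theta), np.full(num_frames, height), radius * np.sin(theta)],
        axis=1,
    )


pos = full_orbit_positions(radius=2.0, num_frames=120, height=1.0)
step_deg = 360.0 / 120  # 3 degrees between consecutive frames
```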
Use Case 3: Scene Expansion from Panorama
Expand navigable space from a 360° panorama:
# Load panoramic image
panorama = load_image("360_panorama.jpg", image_type="panoramic")
# Generate exploration trajectory
exploration = define_camera_trajectory(
    trajectory_type="free_exploration",
    bounds={"x": [-10, 10], "z": [-10, 10]},
    height=1.7,
    num_frames=300
)
# Generate explorable space
space_video = model.generate(
    image=panorama,
    camera_trajectory=exploration,
    input_type="panoramic"
)
Memory Management
Optimize GPU memory usage:
# Enable mixed precision
model.enable_mixed_precision(dtype="float16")
# Adjust memory bank sizes
model.configure_global_memory(point_cloud_resolution=512)  # Reduce if needed
model.configure_spatial_memory(memory_bank_size=1024)  # Reduce if needed
# Process in chunks for long videos
output = model.generate(
    image=input_image,
    camera_trajectory=long_trajectory,
    chunk_size=30  # Process 30 frames at a time
)
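`chunk_size=30` processes a long trajectory 30 frames at a time, so peak memory depends on the chunk rather than the full video length. The sketch below lists the frame windows a given chunk size implies. The optional `overlap`, which shares frames between neighboring chunks to help continuity, is an assumption and not a documented WorldStereo parameter.

```python
def frame_chunks(num_frames, chunk_size, overlap=0):
    """Return end-exclusive (start, end) frame windows that cover every frame."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    if num_frames <= 0:
        return []
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break
        start += step
    return chunks


print(frame_chunks(100, 30))  # [(0, 30), (30, 60), (60, 90), (90, 100)]
```

For the 300-frame exploration trajectory in Use Case 3, `chunk_size=30` without overlap yields 10 chunks.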
Speed vs Quality Tradeoffs
# Fast generation (lower quality)
fast_output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    num_inference_steps=20,
    guidance_scale=5.0,
    resolution=(512, 512)
)
# High quality (slower)
quality_output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    num_inference_steps=100,
    guidance_scale=10.0,
    resolution=(1024, 1024)
)
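A rough way to compare the two presets is to multiply the step ratio by the pixel-count ratio. This back-of-the-envelope estimate assumes per-step cost grows with pixel count raised to `pixel_exponent`: 1.0 for linear scaling, 2.0 if attention that is quadratic in token count dominates. Real timings depend on the model and hardware. The guidance scale value itself does not change the number of steps.

```python
def relative_cost(steps, resolution, base_steps=20, base_resolution=(512, 512),
                  pixel_exponent=1.0):
    """Estimate generation cost relative to a baseline preset.

    Assumes cost ~ steps * pixels**pixel_exponent (a simplification).
    """
    pixels = resolution[0] * resolution[1]
    base_pixels = base_resolution[0] * base_resolution[1]
    return (steps / base_steps) * (pixels / base_pixels) ** pixel_exponent


print(relative_cost(100, (1024, 1024)))                      # 20.0
print(relative_cost(100, (1024, 1024), pixel_exponent=2.0))  # 80.0
```

Under these assumptions the high-quality preset costs somewhere between roughly 20x and 80x the fast preset, so it is worth tuning on the fast settings first.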
Troubleshooting
Common Issues
Inconsistent geometry across views
Increase structural_prior_strength, raise correspondence_threshold, and make sure both memory modules are enabled:
model.configure_global_memory(enable=True, structural_prior_strength=0.9)
model.configure_spatial_memory(enable=True, correspondence_threshold=0.95)
GPU out-of-memory errors
Reduce memory requirements:
Lower resolution
Reduce chunk_size
Decrease memory_bank_size and point_cloud_resolution
Enable mixed precision (float16)
Poor 3D reconstruction quality
Ensure sufficient camera coverage and geometric consistency:
Use more diverse camera poses
Enable both memory modules
Increase correspondence_threshold
Use higher resolution generation
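The advice to ensure sufficient camera coverage can be checked numerically before running reconstruction. One simple proxy, sketched here as an assumption rather than a WorldStereo metric, is the fraction of horizontal angle bins around the scene center that contain at least one camera position.

```python
import numpy as np


def azimuth_coverage(positions, center=(0.0, 0.0, 0.0), num_bins=36):
    """Fraction of azimuth bins (x-z plane, y up) occupied by camera positions."""
    offsets = np.asarray(positions, dtype=float) - np.asarray(center, dtype=float)
    azimuth = np.arctan2(offsets[:, 2], offsets[:, 0])  # angle in [-pi, pi]
    bins = ((azimuth + np.pi) / (2 * np.pi) * num_bins).astype(int) % num_bins
    return len(np.unique(bins)) / num_bins


theta = np.arange(60) * 2 * np.pi / 60
orbit = np.stack([5 * np.cos(theta), np.full(60, 1.5), 5 * np.sin(theta)], axis=1)
print(azimuth_coverage(orbit))  # 1.0
```

A full orbit fills every bin, while a camera moving along a straight line on one side of the center occupies only a few, which matches the suggestion to use more diverse camera poses.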
Next Steps
Camera-Guided Generation: Deep dive into camera-guided video generation
3D Reconstruction: Learn advanced reconstruction techniques
API Reference: Explore the complete API documentation
Scene Generation: Browse scene generation workflows