WorldStereo code and model weights are not yet released. This guide describes the expected workflow once the code becomes available.

Prerequisites

Before starting, ensure you have:
  • Completed the installation steps
  • A CUDA-capable GPU with sufficient VRAM (24GB+ recommended)
  • Input images (perspective or panoramic)
  • Camera trajectory information (if doing controlled generation)
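If you want to verify the GPU prerequisite up front, a generic helper like the one below (not part of WorldStereo) can query `nvidia-smi` for total VRAM:

```python
import shutil
import subprocess

def cuda_gpu_available(min_vram_gb: float = 24.0) -> bool:
    """Return True if nvidia-smi reports a GPU with at least min_vram_gb of VRAM."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except subprocess.CalledProcessError:
        return False
    # memory.total is reported in MiB, one line per GPU
    return any(int(line) / 1024.0 >= min_vram_gb for line in out.split() if line.isdigit())
```

The function degrades gracefully to `False` on machines without an NVIDIA driver.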

Basic Workflow

WorldStereo’s workflow consists of three main stages:

1. Prepare Input Data: prepare your input images and camera trajectories
2. Generate Multi-View Video: use WorldStereo to generate geometrically consistent videos with camera control
3. 3D Reconstruction: reconstruct 3D scenes from the generated multi-view consistent videos

Camera-Guided Video Generation

Basic Usage

Generate a video with camera control from a single image:
from worldstereo import WorldStereoModel
from worldstereo.utils import load_image, define_camera_trajectory

# Initialize the model
model = WorldStereoModel.from_pretrained("worldstereo-base")

# Load input image
input_image = load_image("path/to/input/image.jpg")

# Define camera trajectory
camera_trajectory = define_camera_trajectory(
    trajectory_type="orbit",
    radius=5.0,
    num_frames=60,
    height=1.5
)

# Generate video with camera control
output_video = model.generate(
    image=input_image,
    camera_trajectory=camera_trajectory,
    num_inference_steps=50,
    guidance_scale=7.5
)

# Save the generated video
output_video.save("output/generated_video.mp4")
The actual API may differ. This is a conceptual example based on the framework’s architecture.

Perspective vs Panoramic Input

WorldStereo supports both perspective and panoramic images:
# Standard perspective image input
input_image = load_image("scene.jpg", image_type="perspective")

output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    input_type="perspective"
)
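Panoramic inputs are typically stored as equirectangular images, and perspective views can be resampled from them. The snippet below is an illustrative NumPy sketch of that resampling, independent of the WorldStereo API (function name and defaults are hypothetical):

```python
import numpy as np

def equirect_to_perspective(pano, fov_deg=90.0, yaw_deg=0.0, pitch_deg=0.0, out_hw=(256, 256)):
    """Sample a pinhole view from an equirectangular panorama (nearest neighbour)."""
    H, W = pano.shape[:2]
    h, w = out_hw
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    # Rays through each output pixel; camera initially looks down +z
    xs, ys = np.meshgrid(np.arange(w) - w / 2 + 0.5, np.arange(h) - h / 2 + 0.5)
    dirs = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate rays by yaw (around y) then pitch (around x)
    cy, sy = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    cp, sp = np.cos(np.radians(pitch_deg)), np.sin(np.radians(pitch_deg))
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    dirs = dirs @ (Ry @ Rx).T
    # Direction -> spherical coordinates -> panorama pixel
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])   # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))  # [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return pano[v, u]
```

A production implementation would use bilinear sampling, but the geometry is the same.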

Using Geometric Memory Modules

Global-Geometric Memory

The global-geometric memory maintains point clouds for structural consistency:
# Configure global-geometric memory
model.configure_global_memory(
    enable=True,
    point_cloud_resolution=1024,
    update_frequency="incremental",
    structural_prior_strength=0.8
)

# Generate with structural guidance
output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    use_global_memory=True
)
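To make the idea of an incremental point-cloud memory concrete, here is a toy NumPy sketch (not the actual module) that merges newly observed points into a persistent cloud, deduplicates them on a voxel grid, and caps the total size, mirroring the `point_cloud_resolution` and incremental-update options above:

```python
import numpy as np

class GlobalGeometricMemory:
    """Toy incremental point-cloud memory: merge new points, voxel-deduplicate, cap size."""

    def __init__(self, voxel_size=0.05, max_points=1024):
        self.voxel_size = voxel_size
        self.max_points = max_points
        self.points = np.empty((0, 3))

    def update(self, new_points):
        pts = np.vstack([self.points, np.asarray(new_points, dtype=float)])
        # Keep one point per occupied voxel (structural deduplication)
        keys = np.floor(pts / self.voxel_size).astype(np.int64)
        _, idx = np.unique(keys, axis=0, return_index=True)
        pts = pts[np.sort(idx)]
        if len(pts) > self.max_points:  # enforce a fixed memory budget
            keep = np.random.choice(len(pts), self.max_points, replace=False)
            pts = pts[keep]
        self.points = pts
        return self.points
```

Each `update` call plays the role of folding one generated frame's geometry into the global structural prior.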

Spatial-Stereo Memory

The spatial-stereo memory provides fine-grained detail consistency:
# Configure spatial-stereo memory
model.configure_spatial_memory(
    enable=True,
    correspondence_threshold=0.9,
    attention_constraint="strong",
    memory_bank_size=2048
)

# Generate with fine-grained control
output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    use_spatial_memory=True
)
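The `correspondence_threshold` parameter suggests feature matching under a similarity cutoff. A minimal illustrative sketch of mutual nearest-neighbour matching with a cosine-similarity threshold (hypothetical helper, pure NumPy):

```python
import numpy as np

def match_features(feats_a, feats_b, threshold=0.9):
    """Mutual nearest-neighbour matches whose cosine similarity exceeds threshold."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T
    nn_ab = sim.argmax(axis=1)  # best b for each a
    nn_ba = sim.argmax(axis=0)  # best a for each b
    # Keep only pairs that pick each other and clear the threshold
    return [(i, int(j)) for i, j in enumerate(nn_ab)
            if nn_ba[j] == i and sim[i, j] >= threshold]
```

Raising the threshold keeps only the most reliable correspondences, which is the trade-off the troubleshooting section exploits.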

3D Scene Reconstruction

Once you’ve generated multi-view consistent videos, reconstruct the 3D scene:
from worldstereo.reconstruction import reconstruct_3d

# Generate video first
video = model.generate(
    image=input_image,
    camera_trajectory=trajectory
)

# Reconstruct 3D scene
point_cloud, mesh = reconstruct_3d(
    video=video,
    camera_trajectory=trajectory,
    method="neural_reconstruction",
    resolution="high"
)

# Export results
point_cloud.save("output/scene.ply")
mesh.save("output/scene.obj")
The reconstruction module leverages the geometric consistency from both memory modules to produce high-quality 3D outputs.
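A core step in any such pipeline is lifting per-frame depth into a shared world frame. The sketch below shows standard pinhole back-projection (illustrative, not the `reconstruct_3d` internals; the intrinsics are assumed):

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy, cam_to_world):
    """Lift a depth map to a world-space point cloud with a pinhole camera model."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    # Per-pixel camera-space coordinates
    x = (us - cx) / fx * depth
    y = (vs - cy) / fy * depth
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Transform to world space with the 4x4 camera-to-world matrix
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    return (pts_h @ cam_to_world.T)[:, :3]
```

Fusing these per-frame clouds across the trajectory yields the merged point cloud that meshing then consumes.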

Advanced Configuration

Custom Camera Trajectories

Define complex camera movements:
import numpy as np
from worldstereo.camera import CameraTrajectory

# Define custom camera poses
num_frames = 60
camera_poses = []

for i in range(num_frames):
    # Custom pose calculation
    position = np.array([np.cos(i * 2 * np.pi / num_frames) * 5,
                         1.5,
                         np.sin(i * 2 * np.pi / num_frames) * 5])
    
    # Look at center
    look_at = np.array([0, 0, 0])
    
    camera_poses.append({
        'position': position,
        'look_at': look_at,
        'up': np.array([0, 1, 0])
    })

trajectory = CameraTrajectory(poses=camera_poses)
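Generators like this usually consume full camera-to-world matrices rather than position/look-at dictionaries. A standard look-at construction, shown here as a generic NumPy sketch (OpenGL convention assumed):

```python
import numpy as np

def look_at_pose(position, look_at, up=(0.0, 1.0, 0.0)):
    """Build a 4x4 camera-to-world matrix from position, target, and up vector."""
    position, look_at, up = map(np.asarray, (position, look_at, up))
    forward = look_at - position
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = true_up
    pose[:3, 2] = -forward  # camera looks down -z in the OpenGL convention
    pose[:3, 3] = position
    return pose
```

Applying this to each dictionary in `camera_poses` would turn the trajectory above into matrices directly usable by most rendering and reconstruction tools.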

Batch Processing

Process multiple scenes efficiently:
from worldstereo.utils import batch_process

# Prepare batch of images
images = [
    load_image("scene1.jpg"),
    load_image("scene2.jpg"),
    load_image("scene3.jpg")
]

# Define trajectories for each
trajectories = [
    define_camera_trajectory(trajectory_type="orbit", radius=5.0),
    define_camera_trajectory(trajectory_type="forward", distance=10.0),
    define_camera_trajectory(trajectory_type="spiral", height_range=(0, 3))
]

# Batch generate
results = batch_process(
    model=model,
    images=images,
    trajectories=trajectories,
    batch_size=1  # Adjust based on GPU memory
)
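Since `batch_size` is bounded by GPU memory, batch processing reduces to slicing the inputs into fixed-size groups. A generic sketch of that index bookkeeping (hypothetical helper, not the `worldstereo.utils` implementation):

```python
def make_batches(n_items, batch_size):
    """Split n_items indices into contiguous batches of at most batch_size each."""
    return [list(range(i, min(i + batch_size, n_items)))
            for i in range(0, n_items, batch_size)]
```

With `batch_size=1`, each scene and its trajectory are processed alone, trading throughput for a minimal memory footprint.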

Generation Parameters

Key parameters to tune for optimal results:
| Parameter | Description | Default | Range |
| --- | --- | --- | --- |
| num_inference_steps | Diffusion denoising steps | 50 | 20-100 |
| guidance_scale | Classifier-free guidance strength | 7.5 | 1.0-15.0 |
| structural_prior_strength | Global memory influence | 0.8 | 0.0-1.0 |
| correspondence_threshold | Spatial memory matching | 0.9 | 0.5-1.0 |
| memory_bank_size | Detail memory capacity | 2048 | 512-4096 |
Higher inference steps and guidance scales improve quality but increase computation time. Balance based on your needs.
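For intuition, guidance_scale is the classifier-free guidance weight applied at each denoising step: the model's conditional and unconditional noise predictions are blended as uncond + s * (cond - uncond). A one-line NumPy sketch of that blend:

```python
import numpy as np

def apply_cfg(noise_uncond, noise_cond, guidance_scale=7.5):
    """Classifier-free guidance: push the prediction toward the conditional branch."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```

At `guidance_scale=1.0` the unconditional term cancels out entirely; larger values amplify the conditioning (here, the input image and camera trajectory) at the cost of diversity and, eventually, artifacts.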

Example Use Cases

Use Case 1: Virtual Tour Generation

Generate a virtual tour from a single room photograph:
# Load room image
room_image = load_image("room.jpg")

# Create walkthrough trajectory
walkthrough = define_camera_trajectory(
    trajectory_type="walkthrough",
    waypoints=[
        {"position": [0, 1.5, 0], "look_at": [5, 1.5, 0]},
        {"position": [5, 1.5, 0], "look_at": [5, 1.5, 5]},
        {"position": [5, 1.5, 5], "look_at": [0, 1.5, 5]}
    ],
    num_frames=180
)

# Generate consistent tour
tour_video = model.generate(
    image=room_image,
    camera_trajectory=walkthrough,
    use_global_memory=True,
    use_spatial_memory=True
)
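A walkthrough trajectory like the one above has to be expanded from a few waypoints to per-frame camera positions. A simple linear-interpolation sketch of that expansion (illustrative; the real `trajectory_type="walkthrough"` may use smoother splines):

```python
import numpy as np

def interpolate_waypoints(waypoints, num_frames):
    """Linearly interpolate camera positions along a list of waypoint dicts."""
    positions = np.asarray([w["position"] for w in waypoints], dtype=float)
    # Parameterize waypoints uniformly on [0, 1] and resample at frame times
    t_way = np.linspace(0.0, 1.0, len(positions))
    t_frames = np.linspace(0.0, 1.0, num_frames)
    return np.stack([np.interp(t_frames, t_way, positions[:, d]) for d in range(3)], axis=1)
```

The `look_at` targets can be interpolated the same way to get a complete per-frame pose sequence.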

Use Case 2: Product Visualization

Create 360° product views for e-commerce:
# Load product image
product_image = load_image("product.jpg")

# Orbit around product
orbit_trajectory = define_camera_trajectory(
    trajectory_type="orbit",
    radius=2.0,
    num_frames=120,
    height=1.0,
    complete_rotation=True
)

# Generate product video
product_video = model.generate(
    image=product_image,
    camera_trajectory=orbit_trajectory
)

# Reconstruct 3D model
_, product_mesh = reconstruct_3d(product_video, orbit_trajectory)  # returns (point_cloud, mesh)
product_mesh.save("product_3d_model.obj")

Use Case 3: Scene Expansion from Panorama

Expand navigable space from a 360° panorama:
# Load panoramic image
panorama = load_image("360_panorama.jpg", image_type="panoramic")

# Generate exploration trajectory
exploration = define_camera_trajectory(
    trajectory_type="free_exploration",
    bounds={"x": [-10, 10], "z": [-10, 10]},
    height=1.7,
    num_frames=300
)

# Generate explorable space
space_video = model.generate(
    image=panorama,
    camera_trajectory=exploration,
    input_type="panoramic"
)

Performance Optimization

Memory Management

Optimize GPU memory usage:
# Enable mixed precision
model.enable_mixed_precision(dtype="float16")

# Adjust memory bank sizes
model.configure_global_memory(point_cloud_resolution=512)  # Reduce if needed
model.configure_spatial_memory(memory_bank_size=1024)  # Reduce if needed

# Process in chunks for long videos
output = model.generate(
    image=input_image,
    camera_trajectory=long_trajectory,
    chunk_size=30  # Process 30 frames at a time
)
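Chunked generation usually overlaps consecutive chunks so each chunk conditions on frames it shares with the previous one. A sketch of that boundary arithmetic (the `chunk_size` parameter comes from the example above; the overlap is a hypothetical detail):

```python
def chunk_frames(num_frames, chunk_size, overlap=4):
    """Split a frame range into (start, end) chunks that overlap for context."""
    assert chunk_size > overlap, "chunk must be larger than its overlap"
    chunks, start = [], 0
    while start < num_frames:
        end = min(start + chunk_size, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break
        start = end - overlap  # re-feed the last `overlap` frames as context
    return chunks
```

The overlapping frames act as a bridge, so consistency across chunk boundaries does not rely on the memory modules alone.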

Speed vs Quality Tradeoffs

# Fast generation (lower quality)
fast_output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    num_inference_steps=20,
    guidance_scale=5.0,
    resolution=(512, 512)
)

# High quality (slower)
quality_output = model.generate(
    image=input_image,
    camera_trajectory=trajectory,
    num_inference_steps=100,
    guidance_scale=10.0,
    resolution=(1024, 1024)
)

Troubleshooting

Common Issues

Inconsistent geometry across frames

Increase structural_prior_strength and ensure both memory modules are enabled:
model.configure_global_memory(structural_prior_strength=0.9)
model.configure_spatial_memory(correspondence_threshold=0.95)

Out-of-memory errors

Reduce memory requirements:
  • Lower the output resolution
  • Reduce chunk_size
  • Decrease memory_bank_size
  • Enable mixed precision (float16)

Poor 3D reconstruction quality

Ensure sufficient camera coverage and geometric consistency:
  • Use more diverse camera poses
  • Enable both memory modules
  • Increase correspondence_threshold
  • Generate at a higher resolution

Next Steps

Camera-Guided Generation

Deep dive into camera-guided video generation

3D Reconstruction

Learn advanced reconstruction techniques

API Reference

Explore the complete API documentation

Scene Generation

Browse scene generation workflows
