Pre-Release Status
The WorldStereo team is currently preparing the codebase and model weights for public release. Follow the GitHub repository for updates on the release timeline.
Stay Updated
GitHub Repository
Watch the repository for release announcements
arXiv Paper
Read the research paper for technical details
Expected System Requirements
Because WorldStereo is a Video Diffusion Model-based framework with geometric memory modules, the following requirements are anticipated:
Hardware Requirements
These are estimated requirements. Actual requirements will be confirmed upon code release.
- GPU: NVIDIA GPU with at least 24GB VRAM (e.g., RTX 3090, RTX 4090, or A100)
  - Video diffusion models are computationally intensive
  - Geometric memory modules require additional VRAM for point cloud storage
- RAM: At least 32GB system memory recommended
- Storage: 50GB+ free space for model weights and dependencies
- CUDA: CUDA 11.8 or higher
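The estimated thresholds above can be turned into a quick pre-flight check. The following is a minimal sketch, not an official tool: the function names are illustrative, the thresholds are the provisional figures from this page, and the GPU probe assumes PyTorch is installed (it falls back gracefully if it is not). System RAM is passed in by hand because the standard library has no portable way to read it.

```python
import shutil

# Thresholds copied from the estimated requirements above; not confirmed values.
MIN_VRAM_GB = 24
MIN_RAM_GB = 32
MIN_FREE_DISK_GB = 50
MIN_CUDA = (11, 8)
# Reported capacity is often slightly below the nominal figure for a card.
VRAM_TOLERANCE_GB = 1.0


def parse_version(text):
    """Turn a version string such as '12.1' into a comparable tuple (12, 1)."""
    return tuple(int(part) for part in text.split(".")[:2])


def check_hardware(vram_gb, ram_gb, free_disk_gb, cuda_version):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if vram_gb < MIN_VRAM_GB - VRAM_TOLERANCE_GB:
        problems.append(f"GPU VRAM {vram_gb:.1f} GB is below {MIN_VRAM_GB} GB")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"System RAM {ram_gb:.1f} GB is below {MIN_RAM_GB} GB")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"Free disk {free_disk_gb:.1f} GB is below {MIN_FREE_DISK_GB} GB")
    if cuda_version is None or parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA version {cuda_version} is below 11.8 or unavailable")
    return problems


def probe_gpu():
    """Best-effort probe via PyTorch; returns (vram_gb, cuda_version)."""
    try:
        import torch
    except ImportError:
        return 0.0, None
    if not torch.cuda.is_available():
        return 0.0, torch.version.cuda
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return total_bytes / 1024**3, torch.version.cuda


if __name__ == "__main__":
    vram_gb, cuda_version = probe_gpu()
    free_disk_gb = shutil.disk_usage(".").free / 1024**3
    # RAM is not probed here; replace 32 with your machine's actual value.
    for problem in check_hardware(vram_gb, 32, free_disk_gb, cuda_version):
        print("WARNING:", problem)
```

Note that `torch.version.cuda` reports the CUDA version the PyTorch build targets, which is the value that matters for the PyTorch 2.0+ requirement, not the toolkit version installed system-wide.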
Software Requirements
- Python: 3.9 or higher
- PyTorch: 2.0+ with CUDA support
- Additional dependencies (expected):
  - Diffusion model libraries (likely diffusers or custom implementation)
  - 3D processing libraries (e.g., PyTorch3D, Open3D for point cloud operations)
  - Video processing utilities (e.g., OpenCV, imageio)
  - Camera pose handling libraries
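Until an official requirements file exists, a small script can report which of the libraries named above are already present in an environment. This is a sketch under assumptions: the module list simply mirrors the examples on this page (using their Python import names, such as `cv2` for OpenCV), and which of them WorldStereo will actually require is unknown.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)

# Import names for the libraries mentioned above. Whether WorldStereo needs
# each of them is an assumption until the code is released.
CANDIDATE_MODULES = ["torch", "diffusers", "pytorch3d", "open3d", "cv2", "imageio"]


def python_ok(version_info=sys.version_info):
    """True when the interpreter meets the anticipated minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_modules(names):
    """Return the subset of names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Not installed:", missing_modules(CANDIDATE_MODULES))
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast and does not initialize CUDA.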
Planned Installation Steps
The following installation procedure is provisional and will be updated when the code is released.
Install Dependencies
Install the required Python packages. These will likely include PyTorch, diffusion libraries, and 3D processing tools.
Download Model Weights
Download the pre-trained model weights.
Expected model components:
- Base VDM (Video Diffusion Model) backbone
- Global-geometric memory module weights
- Spatial-stereo memory module weights
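After downloading, it can help to confirm that all three components arrived intact before launching anything. The sketch below checks a directory for one file per component. The file names are placeholders invented for illustration; the real checkpoint names and layout have not been published.

```python
from pathlib import Path

# Placeholder file names: the real checkpoint names are not yet published.
EXPECTED_COMPONENTS = {
    "VDM backbone": "vdm_backbone.safetensors",
    "Global-geometric memory": "global_geometric_memory.safetensors",
    "Spatial-stereo memory": "spatial_stereo_memory.safetensors",
}


def missing_components(weights_dir, expected=EXPECTED_COMPONENTS):
    """Return the names of components whose file is absent or empty."""
    root = Path(weights_dir)
    missing = []
    for component, filename in expected.items():
        path = root / filename
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(component)
    return missing


if __name__ == "__main__":
    for component in missing_components("./weights"):
        print("Missing or empty:", component)
```

Once the release documents the actual file names, only the `EXPECTED_COMPONENTS` mapping would need to change.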
Docker Installation (Expected)
A Docker container may be provided for easier setup. Docker support would simplify dependency management and ensure consistent environments across different systems.
Model Architecture Components
When released, WorldStereo will include:
VDM Backbone
The foundation Video Diffusion Model, trained with distribution matching distillation for efficient generation.
Control Branch
The flexible control branch architecture that integrates geometric memory modules without requiring joint training of the entire system.
Geometric Memory Modules
- Global-geometric memory: Point cloud-based structural priors
- Spatial-stereo memory: 3D correspondence-based attention constraints
Troubleshooting (Anticipated)
GPU Memory Issues
If you encounter out-of-memory errors:
- Reduce batch size or video resolution
- Use gradient checkpointing if available (this helps only during training or fine-tuning, not inference)
- Consider reduced precision (e.g., fp16/bf16) or model quantization
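To see why precision matters for memory, a back-of-the-envelope estimate is often enough. The sketch below computes the footprint of model weights and of a point buffer from their element counts. The 2-billion-parameter model and the 6-floats-per-point layout (xyz plus rgb) are hypothetical illustrations, not WorldStereo figures, and the estimate ignores activations and caches, which usually add substantially to peak usage.

```python
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_bytes(num_params, dtype):
    """Memory needed to hold the weights alone (no activations or caches)."""
    return num_params * BYTES_PER_ELEMENT[dtype]


def point_cloud_bytes(num_points, floats_per_point=6, dtype="fp32"):
    """Memory for a dense point buffer, e.g. xyz + rgb = 6 floats per point."""
    return num_points * floats_per_point * BYTES_PER_ELEMENT[dtype]


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    params = 2_000_000_000  # hypothetical model size, not a WorldStereo figure
    for dtype in ("fp32", "fp16"):
        print(f"{dtype} weights: {to_gib(weight_bytes(params, dtype)):.2f} GiB")
    print(f"10M points: {to_gib(point_cloud_bytes(10_000_000)):.2f} GiB")
```

Moving from fp32 to fp16 or bf16 halves the bytes per parameter, so the hypothetical model's weights drop from 8 GB to 4 GB.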
CUDA Compatibility
Ensure your PyTorch installation matches your CUDA version.
Getting Help
Once the code is released:
- Issues: Report bugs on GitHub Issues
- Discussions: Join discussions on the GitHub repository
- Documentation: Check the official documentation for detailed guides
Next Steps
While waiting for the release:
Read the Paper
Understand the technical details and methodology
Quick Start Guide
Preview the planned usage workflow