Project Overview
VibeVoice is developed and maintained by Microsoft Research. The project aims to push the boundaries of expressive, long-form, multi-speaker conversational audio generation.
VibeVoice is licensed under the MIT License, allowing free use, modification, and distribution with proper attribution.
Getting Started
Repository Access
- GitHub: https://github.com/microsoft/VibeVoice
- Hugging Face: microsoft/vibevoice collection
- Project Page: https://microsoft.github.io/VibeVoice
- Technical Report: arXiv:2508.19205
Installation
Before contributing, set up your development environment using one of the following methods:
- Using Docker (Recommended)
- From Source
Ways to Contribute
Code Contributions
We welcome pull requests for:
- Bug fixes and stability improvements
- Performance optimizations
- New features aligned with the project roadmap
- Documentation improvements
- Test coverage expansion
Before starting significant work, please open an issue to discuss your proposed changes with the maintainers.
Research Collaboration
Contribute to the research direction:
- Share experimental results and findings
- Propose new architectures or training strategies
- Contribute benchmark evaluations
- Test multilingual capabilities and share observations
Testing and Feedback
Help improve VibeVoice by:
- Testing the models in your use cases
- Reporting bugs and unexpected behavior
- Sharing performance metrics on different hardware
- Providing feedback on documentation clarity
- Suggesting new features or improvements
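When sharing performance metrics, one commonly reported number for speech generation is the real-time factor (RTF): generation time divided by the duration of the audio produced, where a value below 1 means faster than real time. The sketch below shows one way to measure it. The helper names are hypothetical and are not part of VibeVoice; `fn` stands in for whatever inference call you are timing.

```python
import time
from typing import Any, Callable, Tuple


def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def time_call(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder workload; replace the lambda with your own inference call.
    _, elapsed = time_call(lambda: sum(range(1_000_000)))
    # Assumed audio length of 10 seconds, for illustration only.
    print(f"RTF: {real_time_factor(elapsed, 10.0):.4f}")
```

Reporting the RTF alongside the hardware and model variant makes results from different machines easier to compare.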
Current Roadmap
Active development areas include:
VibeVoice-Realtime Roadmap
- Add more voices (expand available speakers/voice timbres)
- Implement a streaming text input function that feeds new tokens while audio is still being generated
- Merge models into the official Hugging Face transformers repository
Multilingual Exploration
Experimental support for nine additional languages (DE, FR, IT, JP, KR, NL, PL, PT, ES) has been added. We welcome:
- Testing and quality evaluations
- Bug reports for specific languages
- Comparative analysis with English performance
- Suggestions for improvement
Submission Guidelines
Opening Issues
When reporting bugs or requesting features:
- Check existing issues to avoid duplicates
- Use descriptive titles
- Include:
  - Model version and variant
  - Hardware configuration
  - Steps to reproduce (for bugs)
  - Expected vs. actual behavior
  - Relevant code snippets or logs
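The environment details requested above can be gathered with a short script and pasted into the issue. This is a minimal sketch, not an official VibeVoice tool: the function names are hypothetical, the model variant is a value you supply yourself, and the PyTorch fields are filled in only if `torch` is importable in your environment.

```python
import platform
import sys


def collect_environment_report(model_variant: str) -> dict:
    """Gather basic environment details to include in a bug report."""
    report = {
        "model_variant": model_variant,  # supplied by the reporter
        "python": sys.version.split()[0],
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
    }
    try:
        import torch  # optional: reported only when PyTorch is installed
    except ImportError:
        report["torch"] = "not installed"
        report["gpu"] = "unknown"
    else:
        report["torch"] = torch.__version__
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
        else:
            report["gpu"] = "none (CPU)"
    return report


def format_report(report: dict) -> str:
    """Render the report as one 'key: value' pair per line."""
    return "\n".join(f"{key}: {value}" for key, value in report.items())


if __name__ == "__main__":
    # Replace the placeholder with the model version and variant you used.
    print(format_report(collect_environment_report("<model version and variant>")))
```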
Pull Requests
When submitting code:
- Fork the repository
- Create a feature branch
- Make your changes with clear commit messages
- Test thoroughly on your hardware
- Update documentation as needed
- Submit a PR with a detailed description
PR Best Practices
- Keep changes focused and atomic
- Follow existing code style and conventions
- Add tests for new functionality
- Update README or docs if behavior changes
- Reference related issues in your PR description
Responsible AI Principles
Contribution Standards
Ensure your contributions:
- Do not facilitate deepfakes or disinformation
- Include appropriate safety guardrails
- Maintain or improve content verification capabilities
- Support transparency and AI disclosure
- Respect privacy and consent principles
Voice Customization
To mitigate deepfake risks, voice prompts are provided in an embedded format. Users requiring voice customization should reach out to the team directly.
Contributions that involve voice customization should:
- Implement authentication and authorization
- Include audit logging capabilities
- Provide clear usage documentation
- Consider consent and verification mechanisms
Community and Support
Getting Help
- GitHub Issues: For bug reports and feature requests
- GitHub Discussions: For questions and general discussion (if enabled)
- Project Page: microsoft.github.io/VibeVoice for demos and examples
- Colab Demo: Try VibeVoice-Realtime
Sharing Your Work
If you build something with VibeVoice:
- Share your project on GitHub with the vibevoice topic
- Disclose the use of AI-generated content
- Consider contributing improvements back to the project
- Link to the VibeVoice project page for attribution
It is best practice to disclose the use of AI when sharing AI-generated content, in accordance with responsible AI principles.
Security Reporting
Microsoft takes security seriously. For security issues:
- Do not use public GitHub issues
- Review guidance at https://aka.ms/SECURITY.md
- Follow Microsoft’s official security reporting procedures
License
VibeVoice is released under the MIT License. By contributing to VibeVoice, you agree that your contributions will be licensed under the MIT License.
Recognition
Contributors are recognized through:
- GitHub contributor graphs
- Acknowledgment in release notes (for significant contributions)
- Community recognition in project documentation