VibeVoice is an open-source research framework designed to advance collaboration in the speech synthesis community. We welcome contributions from researchers and developers.

Project Overview

VibeVoice is developed and maintained by Microsoft Research. The project aims to push the boundaries of expressive, long-form, multi-speaker conversational audio generation.
VibeVoice is licensed under the MIT License, allowing free use, modification, and distribution with proper attribution.

Getting Started

Repository Access

Installation

Before contributing, set up your development environment:
git clone https://github.com/microsoft/VibeVoice.git
cd VibeVoice/
pip install -e .
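After the editable install, a quick check can confirm that the package is visible to your interpreter. The following is a minimal sketch, not an official tool: it assumes the distribution and import name are both `vibevoice`, which this page does not state, and the helper name `check_install` is hypothetical.

```python
from importlib import metadata, util


def check_install(package: str = "vibevoice") -> dict:
    """Report whether `package` is importable and which version is installed.

    The default name "vibevoice" is an assumption; adjust it if the
    repository uses a different package name.
    """
    # find_spec returns None when a top-level module cannot be located.
    importable = util.find_spec(package) is not None
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # No installed distribution metadata under this name.
        version = None
    return {"package": package, "importable": importable, "version": version}


if __name__ == "__main__":
    print(check_install())
```

If `importable` is false, re-run `pip install -e .` from the repository root in the same environment you use to run Python.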

Ways to Contribute

Code Contributions

We welcome pull requests for:
  • Bug fixes and stability improvements
  • Performance optimizations
  • New features aligned with the project roadmap
  • Documentation improvements
  • Test coverage expansion
Before starting significant work, please open an issue to discuss your proposed changes with the maintainers.

Research Collaboration

Contribute to the research direction:
  • Share experimental results and findings
  • Propose new architectures or training strategies
  • Contribute benchmark evaluations
  • Test multilingual capabilities and share observations

Testing and Feedback

Help improve VibeVoice by:
  • Testing the models in your use cases
  • Reporting bugs and unexpected behavior
  • Sharing performance metrics on different hardware
  • Providing feedback on documentation clarity
  • Suggesting new features or improvements
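When sharing performance metrics, one commonly used figure for speech synthesis is the real-time factor (generation time divided by the duration of the audio produced). The sketch below shows one way to measure it. The project does not prescribe this metric here, and `benchmark`, `BenchmarkResult`, and the stand-in `fake_generate` function are hypothetical names used only for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class BenchmarkResult:
    generation_seconds: float  # wall-clock time spent generating
    audio_seconds: float       # duration of the audio that was produced

    @property
    def real_time_factor(self) -> float:
        """Values below 1.0 mean generation ran faster than real time."""
        if self.audio_seconds <= 0:
            raise ValueError("audio_seconds must be positive")
        return self.generation_seconds / self.audio_seconds


def benchmark(generate: Callable[[], int], sample_rate: int) -> BenchmarkResult:
    """Time one call to `generate`, which must return the number of samples."""
    start = time.perf_counter()
    num_samples = generate()
    elapsed = time.perf_counter() - start
    return BenchmarkResult(elapsed, num_samples / sample_rate)


if __name__ == "__main__":
    def fake_generate() -> int:
        # Stand-in for a real model call; replace with your own inference code.
        time.sleep(0.01)
        return 24000  # illustrative sample count

    result = benchmark(fake_generate, sample_rate=24000)  # illustrative rate
    print(f"real-time factor: {result.real_time_factor:.3f}")
```

Report the figure together with your hardware and model variant so that results from different setups can be compared.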

Current Roadmap

Active development areas include:
  • Add more voices (expand available speakers/voice timbres)
  • Implement streaming text input, so that new tokens can be fed in while audio is still being generated
  • Merge models into official HuggingFace transformers repository
Experimental support has been added for nine additional languages: German, French, Italian, Japanese, Korean, Dutch, Polish, Portuguese, and Spanish. We welcome:
  • Testing and quality evaluations
  • Bug reports for specific languages
  • Comparative analysis with English performance
  • Suggestions for improvement

Submission Guidelines

Opening Issues

When reporting bugs or requesting features:
  1. Check existing issues to avoid duplicates
  2. Use descriptive titles
  3. Include:
    • Model version and variant
    • Hardware configuration
    • Steps to reproduce (for bugs)
    • Expected vs. actual behavior
    • Relevant code snippets or logs
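To make the details listed above easier to gather, a short script can collect basic environment information for pasting into an issue. This is a sketch under assumptions: the package names `torch`, `transformers`, and `vibevoice` are guesses at what is relevant, and the helpers `collect_environment` and `format_report` are hypothetical. GPU model, driver version, and the model variant still need to be added by hand.

```python
import platform
from importlib import metadata


def collect_environment(packages=("torch", "transformers", "vibevoice")) -> dict:
    """Gather interpreter, OS, and package versions for a bug report.

    The default package names are assumptions, not a list taken from the project.
    """
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


def format_report(info: dict) -> str:
    """Render the collected values as one 'key: value' line per entry."""
    return "\n".join(f"{key}: {value}" for key, value in info.items())


if __name__ == "__main__":
    print(format_report(collect_environment()))
```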
Do not report security vulnerabilities through public GitHub issues. Follow Microsoft’s security reporting guidelines instead.

Pull Requests

When submitting code:
  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with clear commit messages
  4. Test thoroughly on your hardware
  5. Update documentation as needed
  6. Submit a PR with a detailed description
Guidelines for pull requests:
  • Keep changes focused and atomic
  • Follow existing code style and conventions
  • Add tests for new functionality
  • Update README or docs if behavior changes
  • Reference related issues in your PR description

Responsible AI Principles

All contributions must align with Microsoft’s Responsible AI principles. Do not contribute features that enable harmful use cases.

Contribution Standards

Ensure your contributions:
  • Do not facilitate deepfakes or disinformation
  • Include appropriate safety guardrails
  • Maintain or improve content verification capabilities
  • Support transparency and AI disclosure
  • Respect privacy and consent principles

Voice Customization

To mitigate deepfake risks, voice prompts are provided in an embedded format. Users requiring voice customization should reach out to the team directly.
Contributions involving voice customization must:
  • Implement authentication and authorization
  • Include audit logging capabilities
  • Provide clear usage documentation
  • Consider consent and verification mechanisms
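As an illustration of the audit logging requirement, the sketch below wraps a function so that every call is recorded with a timestamp, the acting user, and the outcome. Everything in it is hypothetical: `audited`, `register_voice_prompt`, and the logger name are invented for this example and are not part of VibeVoice. Authentication and authorization are out of scope for the sketch and would need to be handled separately.

```python
import functools
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; route it to durable storage in a real deployment.
audit_logger = logging.getLogger("voice_customization.audit")


def audited(action: str):
    """Decorator that writes one JSON audit record per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user_id, *args, **kwargs):
            record = {
                "time": datetime.now(timezone.utc).isoformat(),
                "action": action,
                "user_id": user_id,
            }
            try:
                result = func(user_id, *args, **kwargs)
            except Exception as exc:
                # Failed attempts are logged too, then re-raised unchanged.
                record["outcome"] = f"failed: {type(exc).__name__}"
                audit_logger.warning(json.dumps(record))
                raise
            record["outcome"] = "ok"
            audit_logger.info(json.dumps(record))
            return result
        return wrapper
    return decorator


@audited("register_voice_prompt")
def register_voice_prompt(user_id: str, consent_confirmed: bool) -> str:
    """Hypothetical operation that refuses to proceed without speaker consent."""
    if not consent_confirmed:
        raise PermissionError("speaker consent is required")
    return f"voice prompt registered for {user_id}"
```

Logging failed attempts as well as successful ones keeps a trace of refused requests, which is often what a later review needs to see.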

Community and Support

Getting Help

Sharing Your Work

If you build something with VibeVoice:
  • Share your project on GitHub with the vibevoice topic
  • Disclose the use of AI-generated content
  • Consider contributing improvements back to the project
  • Link to the VibeVoice project page for attribution
It is best practice to disclose the use of AI when sharing AI-generated content, in accordance with responsible AI principles.

Security Reporting

Microsoft takes security seriously. For security issues:
  • Do not use public GitHub issues
  • Review guidance at https://aka.ms/SECURITY.md
  • Follow Microsoft’s official security reporting procedures

License

VibeVoice is released under the MIT License:
MIT License

Copyright (c) 2025 Microsoft

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

By contributing to VibeVoice, you agree that your contributions will be licensed under the MIT License.

Recognition

Contributors are recognized through:
  • GitHub contributor graphs
  • Acknowledgment in release notes (for significant contributions)
  • Community recognition in project documentation
Thank you for helping advance open-source speech synthesis research!
