Contributing

VibeVoice is an open-source research framework designed to advance collaboration in the speech synthesis community. We welcome contributions from researchers and developers.

Project Overview

VibeVoice is developed and maintained by Microsoft Research. The project aims to push the boundaries of expressive, long-form, multi-speaker conversational audio generation.

VibeVoice is licensed under the MIT License, which permits free use, modification, and distribution provided the copyright and permission notices are retained.

Getting Started

Repository Access

GitHub: https://github.com/microsoft/VibeVoice
Hugging Face: microsoft/vibevoice collection
Project Page: https://microsoft.github.io/VibeVoice
Technical Report: arXiv:2508.19205

Installation

Before contributing, set up your development environment:

Using Docker (Recommended)

# NVIDIA PyTorch Container 24.07 / 24.10 / 24.12 verified
sudo docker run --privileged --net=host --ipc=host \
  --ulimit memlock=-1:-1 --ulimit stack=-1:-1 \
  --gpus all --rm -it nvcr.io/nvidia/pytorch:24.07-py3

# Inside the container: install flash attention if it is not already present
pip install flash-attn --no-build-isolation

From Source

git clone https://github.com/microsoft/VibeVoice.git
cd VibeVoice/
pip install -e .
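
After installing, it can help to confirm that the expected packages are visible to Python before running anything heavy. The following is a minimal sketch, not part of the project: it assumes the editable install exposes the import name `vibevoice` (an assumption; check the repository's packaging metadata), and it only probes for modules without importing them.

```python
import importlib.util

# Module names to probe. `flash_attn` is the import name of the flash-attn
# package; `vibevoice` is ASSUMED to be the import name created by
# `pip install -e .` and may differ in the actual repository.
MODULES = ("torch", "flash_attn", "vibevoice")


def check_modules(names=MODULES):
    """Return {module_name: bool}, True when the module can be located.

    Uses find_spec so nothing is imported and no GPU is touched.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

A `MISSING` entry for `flash_attn` is not necessarily an error, since the install step above is only needed when the container does not already ship it.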

Ways to Contribute

Code Contributions

We welcome pull requests for:

Bug fixes and stability improvements
Performance optimizations
New features aligned with the project roadmap
Documentation improvements
Test coverage expansion

Before starting significant work, please open an issue to discuss your proposed changes with the maintainers.

Research Collaboration

Contribute to the research direction:

Share experimental results and findings
Propose new architectures or training strategies
Contribute benchmark evaluations
Test multilingual capabilities and share observations

Testing and Feedback

Help improve VibeVoice by:

Testing the models in your use cases
Reporting bugs and unexpected behavior
Sharing performance metrics on different hardware
Providing feedback on documentation clarity
Suggesting new features or improvements
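
For performance reports, one commonly used speech-synthesis metric is the real-time factor (RTF): wall-clock generation time divided by the duration of the generated audio. The sketch below shows one way to measure it. The `synthesize` callable is hypothetical and stands in for whatever inference entry point you are testing, and the 24000 Hz sample rate in the demo is a placeholder, not a statement about any VibeVoice model.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Time one synthesis call and return (rtf, audio_seconds, wall_seconds).

    `synthesize` is a HYPOTHETICAL callable: text -> sequence of audio samples.
    An RTF below 1.0 means audio is produced faster than it plays back.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    wall_seconds = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no samples")
    return wall_seconds / audio_seconds, audio_seconds, wall_seconds


if __name__ == "__main__":
    # Stand-in generator so the sketch runs without a model: 1 s of silence.
    def fake_synthesize(text):
        return [0.0] * 24000

    rtf, audio_s, wall_s = real_time_factor(fake_synthesize, "hello", 24000)
    print(f"RTF={rtf:.4f} audio={audio_s:.2f}s wall={wall_s:.4f}s")
```

When timing GPU inference with PyTorch, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels launch asynchronously and the measured time can otherwise be too low. Reporting the hardware and model variant alongside the number makes results comparable.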

Current Roadmap

Active development areas include:

VibeVoice-Realtime Roadmap

Add more voices (expand available speakers/voice timbres)
Implement streaming text input, so new tokens can be fed in while audio is still being generated
Merge the models into the official Hugging Face Transformers repository

Multilingual Exploration

Experimental support for nine additional languages (DE, FR, IT, JP, KR, NL, PL, PT, ES) has been added. We welcome:

Testing and quality evaluations
Bug reports for specific languages
Comparative analysis with English performance
Suggestions for improvement

Submission Guidelines

Opening Issues

When reporting bugs or requesting features:

Check existing issues to avoid duplicates
Use descriptive titles
Include:
- Model version and variant
- Hardware configuration
- Steps to reproduce (for bugs)
- Expected vs. actual behavior
- Relevant code snippets or logs
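
The hardware and version details listed above can be gathered with a short script and pasted into the issue. This is a sketch rather than an official project tool. It uses only the standard library plus PyTorch when available, and the output format is an arbitrary choice.

```python
import platform
import sys


def environment_report():
    """Return a multi-line string describing OS, Python, PyTorch, and GPUs."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    try:
        import torch
    except ImportError:
        lines.append("PyTorch: not installed")
        return "\n".join(lines)

    lines.append(f"PyTorch: {torch.__version__}")
    if torch.cuda.is_available():
        # CUDA version PyTorch was built against, then one line per device.
        lines.append(f"CUDA (build): {torch.version.cuda}")
        for index in range(torch.cuda.device_count()):
            lines.append(f"GPU {index}: {torch.cuda.get_device_name(index)}")
    else:
        lines.append("GPU: CUDA not available")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

The model version and variant still need to be added by hand, since the script has no way to know which checkpoint was loaded.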

Do not report security vulnerabilities through public GitHub issues. Follow Microsoft’s security reporting guidelines instead.

Pull Requests

When submitting code:

Fork the repository
Create a feature branch
Make your changes with clear commit messages
Test thoroughly on your hardware
Update documentation as needed
Submit a PR with a detailed description

PR Best Practices

Keep changes focused and atomic
Follow existing code style and conventions
Add tests for new functionality
Update README or docs if behavior changes
Reference related issues in your PR description

Responsible AI Principles

All contributions must align with Microsoft’s Responsible AI principles. Do not contribute features that enable harmful use cases.

Contribution Standards

Ensure your contributions:

Do not facilitate deepfakes or disinformation
Include appropriate safety guardrails
Maintain or improve content verification capabilities
Support transparency and AI disclosure
Respect privacy and consent principles

Voice Customization

To mitigate deepfake risks, voice prompts are provided in an embedded format. Users requiring voice customization should reach out to the team directly.

Contributions involving voice customization must:

Implement authentication and authorization
Include audit logging capabilities
Provide clear usage documentation
Consider consent and verification mechanisms
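
As an illustration of the audit logging requirement, the sketch below writes one JSON line per voice-customization action. Everything in it is hypothetical: the logger name, the field names, and the `consent_ref` identifier are assumptions chosen for the example, not an interface defined by VibeVoice.

```python
import json
import logging
from datetime import datetime, timezone

# HYPOTHETICAL logger name; route it to durable storage in a real deployment.
audit_log = logging.getLogger("voice_customization.audit")


def audit_event(user_id, action, consent_ref):
    """Log one audit record as a JSON line and return the record.

    All field names are illustrative assumptions.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,          # authenticated caller
        "action": action,            # e.g. "create_voice_prompt"
        "consent_ref": consent_ref,  # pointer to the stored consent evidence
    }
    audit_log.info(json.dumps(record, sort_keys=True))
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    audit_event("user-123", "create_voice_prompt", "consent-42")
```

Recording a reference to the consent evidence rather than the evidence itself keeps personal data out of the log while still letting reviewers trace each action.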

Community and Support

Getting Help

GitHub Issues: For bug reports and feature requests
GitHub Discussions: For questions and general discussion (if enabled)
Project Page: microsoft.github.io/VibeVoice for demos and examples
Colab Demo: Try VibeVoice-Realtime

Sharing Your Work

If you build something with VibeVoice:

Share your project on GitHub with the vibevoice topic
Disclose the use of AI-generated content
Consider contributing improvements back to the project
Link to the VibeVoice project page for attribution

It is best practice to disclose the use of AI when sharing AI-generated content, in accordance with responsible AI principles.

Security Reporting

Microsoft takes security seriously. For security issues:

Do not use public GitHub issues
Review guidance at https://aka.ms/SECURITY.md
Follow Microsoft’s official security reporting procedures

License

VibeVoice is released under the MIT License:

MIT License

Copyright (c) 2025 Microsoft

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

By contributing to VibeVoice, you agree that your contributions will be licensed under the MIT License.

Recognition

Contributors are recognized through:

GitHub contributor graphs
Acknowledgment in release notes (for significant contributions)
Community recognition in project documentation

Thank you for helping advance open-source speech synthesis research!
