Every Gradio app comes with a built-in queuing system that can scale to thousands of concurrent users. Because many of your event listeners may involve heavy processing, Gradio automatically creates a queue in the backend for every event listener to process its incoming events.
Configuring the queue
By default, each event listener has its own queue, which handles one request at a time. You can configure this behavior with two arguments: concurrency_limit and concurrency_id.
Concurrency limit
The concurrency_limit parameter sets the maximum number of concurrent executions for an event listener. The limit is 1 unless a different default is configured in Blocks.queue(). You can also set it to None to allow an unlimited number of concurrent executions.
Shared queues with concurrency ID
If you want to manage multiple event listeners with a shared queue, you can use the concurrency_id argument. Event listeners that are assigned the same ID share a single queue.
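For example, if your setup has only 2 GPUs but multiple functions require GPU access, you can create a shared queue for all those functions by giving each of their event listeners the same concurrency_id, such as "gpu_queue", and setting concurrency_limit to 2. The shared queue can then handle up to 2 concurrent requests at a time across all of those functions, as defined by the concurrency_limit.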
Default concurrency settings
The default concurrency limit for all queues can be set globally using the default_concurrency_limit parameter in Blocks.queue().
The queuing system makes it easy to control how your Gradio app processes requests and to match concurrency to the resources, such as GPUs, that your functions actually need.