Stalled Jobs
What Are Stalled Jobs?
A job becomes “stalled” when Bull detects that a job is locked but not making progress. This happens when:
- The Node process running your job processor unexpectedly terminates
- Your job processor is CPU-intensive and blocks the event loop
- The lock expires before the job completes
Understanding Job Locks
Bull uses Redis locks to ensure jobs are processed only once:
- lockDuration: Time in milliseconds to hold the lock (default: 30000ms)
- lockRenewTime: Interval to renew the lock (default: lockDuration / 2)
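As a minimal sketch of how the two settings above relate, the helper below builds a settings object for a chosen lock duration. The helper name buildLockSettings is hypothetical and not part of Bull; lockDuration and lockRenewTime are the options described above, which Bull reads from the settings key of the queue options.

```javascript
// Hypothetical helper (not part of Bull): build lock settings for a
// given lock duration in milliseconds.
function buildLockSettings(lockDuration = 30000) {
  return {
    settings: {
      lockDuration,
      // Mirror the default relationship: renew at half the lock duration.
      lockRenewTime: Math.floor(lockDuration / 2),
    },
  };
}

// Intended use with Bull (assumes a reachable Redis instance):
//   const Queue = require('bull');
//   const queue = new Queue('video', 'redis://127.0.0.1:6379', buildLockSettings(120000));
console.log(buildLockSettings(120000));
```

Keeping lockRenewTime well below lockDuration leaves room for a renewal to be late without the lock expiring.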
Common Causes
1. CPU-Intensive Processing
Problem: Job processor blocks the event loop, preventing lock renewal.
2. Lock Duration Too Short
Problem: Jobs take longer than the lock duration and the lock is not renewed in time. Solution: Increase lockDuration in the queue settings.
3. Process Crashes
Problem: Worker process crashes while processing jobs. Solution: Implement proper error handling and process monitoring.
Monitoring Stalled Jobs
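One way to monitor stalled jobs is to listen for the queue's stalled event, which Bull emits locally with the job object. The sketch below keeps a per-job count so repeat offenders stand out. The function name monitorStalled is hypothetical, and a plain EventEmitter stands in for a real queue so the snippet runs without Redis.

```javascript
const { EventEmitter } = require('events');

// Attach a 'stalled' listener to a Bull queue (or any EventEmitter-like
// object) and count how often each job id is reported as stalled.
function monitorStalled(queue, log = console.warn) {
  const stallCounts = new Map();
  queue.on('stalled', (job) => {
    const count = (stallCounts.get(job.id) || 0) + 1;
    stallCounts.set(job.id, count);
    log(`Job ${job.id} stalled (${count} time(s) seen by this process)`);
  });
  return stallCounts;
}

// Demonstration with a stand-in queue; a real Bull queue emits the
// same event name with a Job instance.
const fakeQueue = new EventEmitter();
const counts = monitorStalled(fakeQueue);
fakeQueue.emit('stalled', { id: '42' });
```

The counts live only in the current process; they reset when the worker restarts.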
Configuration
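The sketch below shows the queue settings that most directly affect stalled-job handling. The values are Bull's documented defaults as far as I know them; check them against the version you run. The variable name stalledJobOptions is only illustrative.

```javascript
// Queue options relevant to stalled jobs. Values are believed to be
// Bull's defaults; verify against your installed version.
const stalledJobOptions = {
  settings: {
    lockDuration: 30000, // ms a lock is held before it expires
    lockRenewTime: 15000, // ms between lock renewals
    stalledInterval: 30000, // ms between checks for stalled jobs (0 disables the check)
    maxStalledCount: 1, // times a stalled job is re-processed before it is failed
  },
};

// Intended use (assumes Bull is installed and Redis is reachable):
//   const Queue = require('bull');
//   const queue = new Queue('email', stalledJobOptions);
console.log(stalledJobOptions.settings);
```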
Redis Connection Issues
Connection Failures
Problem: Cannot Connect to Redis
Symptoms:
- Error: connect ECONNREFUSED
- Queue methods hanging indefinitely
- Jobs not processing
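To turn the "hanging indefinitely" symptom into an explicit failure, a connection check can be wrapped in a timeout. This is a minimal sketch: withTimeout and checkRedis are hypothetical helpers, and the code assumes that queue is a Bull queue whose isReady() promise resolves once its Redis connections are up.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Assumes `queue` is a Bull queue. Resolves to true when the queue
// becomes ready in time, false otherwise.
async function checkRedis(queue, ms = 5000) {
  // Surface connection errors instead of letting them pass silently.
  queue.on('error', (err) => console.error('Queue error:', err.message));
  try {
    await withTimeout(queue.isReady(), ms, 'Redis connection');
    return true;
  } catch (err) {
    console.error(err.message);
    return false;
  }
}
```

A false result usually means the host, port, or credentials are wrong, or that Redis is not running.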
Connection Drops
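Problem: Redis connections drop intermittently. Solution: Configure connection retry logic, for example through the retryStrategy option of the underlying ioredis client.
Redis Cluster Issues
Problem: Bull doesn’t work with Redis Cluster out of the box. Solution: Use a hash tag in the queue prefix (for example {myprefix}) so that all of a queue’s keys map to the same hash slot.
Connection Pooling
Problem: Too many Redis connections. Solution: Reuse connections across queues with the createClient queue option.
Memory Leaks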
Job Data Accumulation
Problem: Completed/failed jobs accumulating in Redis. Solution: Auto-remove finished jobs with the removeOnComplete and removeOnFail job options.
Manual Cleanup
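For jobs that have already accumulated, a manual cleanup can be sketched with queue.clean, which removes jobs of a given status older than a grace period. The function name cleanOldJobs and the 24-hour default are assumptions for illustration. The code assumes queue is a Bull queue and that clean resolves with an array of removed job ids.

```javascript
// Assumes `queue` is a Bull queue. Removes completed and failed jobs
// older than `graceMs` milliseconds and reports how many were removed.
async function cleanOldJobs(queue, graceMs = 24 * 60 * 60 * 1000) {
  const completed = await queue.clean(graceMs, 'completed');
  const failed = await queue.clean(graceMs, 'failed');
  return { completed: completed.length, failed: failed.length };
}

// Intended use (assumes Bull is installed and Redis is reachable):
//   const Queue = require('bull');
//   cleanOldJobs(new Queue('email')).then(console.log);
```

Running this on a schedule keeps Redis memory bounded even when jobs were added without auto-removal options.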
Event Listener Leaks
Problem: Too many event listeners registered. Solution: Remove listeners when done, for example with queue.removeListener.
Worker Process Memory
Problem: Worker process memory grows over time. Solution: Use sandboxed processors or restart workers periodically.
Lock Extension Failures
Problem: “Unable to renew lock” Errors
Possible causes:
- Job taking longer than lock duration
- Redis connection issues
- High CPU usage blocking renewals
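When high CPU usage is what blocks renewals, one common mitigation is to split the work into chunks and yield to the event loop between them, so that timers such as the lock renewal can fire. The sketch below is a generic Node.js pattern, not a Bull API; processInChunks and its chunk size are hypothetical.

```javascript
// Process items in small chunks, yielding to the event loop between
// chunks so pending timers and I/O callbacks get a chance to run.
async function processInChunks(items, handleItem, chunkSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(handleItem(item));
    }
    // Resume on a later event loop iteration instead of continuing
    // synchronously.
    await new Promise((resolve) => setImmediate(resolve));
  }
  return results;
}

// Intended use inside a processor (assumes Bull is installed):
//   queue.process(async (job) => processInChunks(job.data.rows, transformRow));
processInChunks([1, 2, 3], (x) => x * 2, 2).then((out) => console.log(out));
```

For work that cannot be chunked, a sandboxed processor running in a separate process is the more robust option.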
Job Not Processing
Jobs Stuck in Waiting State
Checklist:
- Verify that a processor is registered (queue.process)
- Check that the queue is not paused (queue.isPaused)
- Verify that workers are running (queue.getWorkers)
- Check for rate limiting (the limiter queue option)
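Parts of the checklist above can be automated. The sketch below assumes queue is a Bull queue exposing isPaused(), getWorkers() and getWaitingCount(). The function name diagnoseWaiting and the wording of the findings are hypothetical.

```javascript
// Assumes `queue` is a Bull queue. Collects a few facts that commonly
// explain why jobs remain in the waiting state.
async function diagnoseWaiting(queue) {
  const [paused, workers, waiting] = await Promise.all([
    queue.isPaused(),
    queue.getWorkers(),
    queue.getWaitingCount(),
  ]);
  const findings = [];
  if (paused) {
    findings.push('queue is paused; call queue.resume()');
  }
  if (workers.length === 0) {
    findings.push('no workers connected; is queue.process() called anywhere?');
  }
  if (waiting > 0 && findings.length === 0) {
    findings.push('workers are connected; check concurrency and limiter settings');
  }
  return { paused, workers: workers.length, waiting, findings };
}
```

Rate limiting itself cannot be read back through these calls, so the limiter option still has to be checked in the code that creates the queue.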
Named Jobs Not Processing
Problem: Named jobs stay in waiting state. Solution: Register a processor for that job name with queue.process(name, handler).
Rate Limiting Issues
Jobs Not Respecting Rate Limits
Problem: More jobs processing than rate limit allows. Cause: Multiple workers/instances not sharing rate limit. Solution: Rate limits are global across all workers of a queue, so give every worker the same limiter settings.
Rate Limit Delays
Problem: Jobs delayed longer than expected. Solution: Configure bounceBack in the limiter options.
Debugging Tips
Enable Debug Logging
Inspect Queue State
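A quick way to inspect queue state is to print the per-state job counts. The sketch below assumes queue is a Bull queue whose getJobCounts() resolves to an object keyed by state. The function name summarizeQueue is hypothetical.

```javascript
// Assumes `queue` is a Bull queue. Returns a one-line summary such as
// "waiting=3 active=1 ..." built from whatever states Bull reports.
async function summarizeQueue(queue) {
  const counts = await queue.getJobCounts();
  return Object.entries(counts)
    .map(([state, n]) => `${state}=${n}`)
    .join(' ');
}

// Intended use (assumes Bull is installed and Redis is reachable):
//   const Queue = require('bull');
//   summarizeQueue(new Queue('email')).then(console.log);
```

A waiting count that keeps growing while active stays at zero points to the checks under "Jobs Stuck in Waiting State".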
Check Job Details
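To look at a single job, it can be fetched by id and its main fields collected. This sketch assumes queue is a Bull queue, that getJob resolves to null for an unknown id, and that the job exposes the properties shown. The function name describeJob is hypothetical.

```javascript
// Assumes `queue` is a Bull queue. Returns a plain object describing
// the job, or null when no job with that id exists.
async function describeJob(queue, jobId) {
  const job = await queue.getJob(jobId);
  if (!job) {
    return null;
  }
  return {
    id: job.id,
    name: job.name,
    state: await job.getState(),
    attemptsMade: job.attemptsMade,
    failedReason: job.failedReason,
    stacktrace: job.stacktrace,
    data: job.data,
    processedOn: job.processedOn,
    finishedOn: job.finishedOn,
  };
}
```

For failed jobs, failedReason and stacktrace are usually the fastest route to the root cause.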
Monitor Events
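Queue events can be logged with a small helper that also returns a function to detach the listeners, which avoids piling up handlers during debugging sessions. The helper name attachEventLogging is hypothetical. The event names are Bull's local queue events, and a plain EventEmitter stands in for a real queue so the snippet runs without Redis.

```javascript
const { EventEmitter } = require('events');

// Log the most useful local queue events. Returns a function that
// removes every listener this helper added.
function attachEventLogging(queue, log = console.log) {
  const handlers = {
    error: (err) => log('error', err.message),
    active: (job) => log('active', job.id),
    stalled: (job) => log('stalled', job.id),
    completed: (job) => log('completed', job.id),
    failed: (job, err) => log('failed', job.id, err.message),
  };
  for (const [event, handler] of Object.entries(handlers)) {
    queue.on(event, handler);
  }
  return function detach() {
    for (const [event, handler] of Object.entries(handlers)) {
      queue.removeListener(event, handler);
    }
  };
}

// Demonstration with a stand-in queue.
const demoQueue = new EventEmitter();
const detach = attachEventLogging(demoQueue);
demoQueue.emit('completed', { id: '1' });
detach();
```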
Getting Help
Before Asking for Help
- Check this troubleshooting guide
- Review the Queue API Reference
- Search GitHub issues
- Enable debug logging and collect relevant logs
- Create a minimal reproduction case
Where to Get Help
- GitHub Issues: OptimalBits/bull
- Gitter Chat: Bull Gitter
- Stack Overflow: Tag questions with bull and node.js
- Slack: BullMQ Slack
When Reporting Issues
Include:
- Bull version (npm list bull)
- Node.js version (node --version)
- Redis version (redis-server --version)
- Minimal code to reproduce
- Error messages and stack traces
- Debug logs if applicable
- What you’ve already tried