Overview
Jobs are individual execution instances of workflows. When a workflow’s scheduled time arrives, Chronoverse creates a job to execute the workflow’s defined task. Each job has its own lifecycle, status, and log history.

Job Properties
Every job has the following properties:
- Unique identifier (UUID) for the job
- ID of the parent workflow this job belongs to
- ID of the user who owns the workflow
- Current job status (see Job Statuses below)
- How the job was triggered:
  - AUTOMATIC: Scheduled automatically by Chronoverse
  - MANUAL: Triggered manually by the user via the API
- Docker container ID (for CONTAINER workflows only)
- When the job was scheduled to run (RFC3339 format)
- When job execution actually started
- When job execution finished
- When the job record was created
- Last update timestamp
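The property list above can be sketched as a record type. This is an illustrative shape only: the field names are assumptions inferred from the descriptions, not Chronoverse's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative job record; field names are guesses based on the
# property descriptions above, not the real Chronoverse schema.
@dataclass
class Job:
    id: str                      # unique identifier (UUID)
    workflow_id: str             # parent workflow
    user_id: str                 # workflow owner
    status: str                  # PENDING / QUEUED / RUNNING / ...
    trigger: str                 # AUTOMATIC or MANUAL
    container_id: Optional[str]  # set for CONTAINER workflows only
    scheduled_at: str            # RFC3339 timestamp
    started_at: Optional[str]    # set when execution starts
    completed_at: Optional[str]  # set when execution finishes
    created_at: str
    updated_at: str
```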
Job Statuses
Jobs progress through several states during their lifecycle:

PENDING
Initial State: Job has been created but not yet queued for execution.
- Job record exists in database
- Waiting to be picked up by workers
- Typically very short duration (milliseconds)
QUEUED
Queued for Execution: Job has been sent to the execution queue.
- Event published to Kafka
- Waiting for worker to process
- Duration depends on worker availability
RUNNING
Active Execution: Job is currently executing.
- Container/process is running
- Logs are being captured in real-time
- started_at timestamp is set
- Duration varies by workflow complexity
COMPLETED
Successful Completion: Job finished successfully.
- Exit code 0 (or equivalent success indicator)
- All logs captured and stored
- completed_at timestamp is set
- Consecutive failure counter is reset
FAILED
Failed Execution: Job encountered an error.
- Non-zero exit code or exception
- Error logs captured
- completed_at timestamp is set
- Consecutive failure counter is incremented
CANCELED
Canceled Execution: Job was canceled before or during execution.
- Manually canceled by user
- Or workflow terminated mid-execution
- Resources cleaned up
- completed_at timestamp is set
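The lifecycle described above can be encoded as a small state machine. This is a hypothetical encoding based only on the transitions documented here; the service's actual state machine may allow additional edges.

```python
# Hypothetical encoding of the documented job lifecycle.
# COMPLETED, FAILED, and CANCELED are terminal states.
TRANSITIONS = {
    "PENDING":   {"QUEUED", "CANCELED"},
    "QUEUED":    {"RUNNING", "CANCELED"},
    "RUNNING":   {"COMPLETED", "FAILED", "CANCELED"},
    "COMPLETED": set(),
    "FAILED":    set(),
    "CANCELED":  set(),
}

def can_transition(current: str, new: str) -> bool:
    """Return True if `current` -> `new` is an allowed status change."""
    return new in TRANSITIONS.get(current, set())
```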
Job Lifecycle
1. Scheduling Phase
Trigger: Scheduling Worker

Every poll interval (configurable), the Scheduling Worker:
- Queries PostgreSQL for workflows due for execution
- Checks workflow status and build readiness
- Creates job records with PENDING status
- Sets scheduled_at to the current time
- Publishes events to Kafka
Scheduling Query Logic
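The poll cycle above can be sketched as follows. The `db`/`kafka` handles, the SQL query, and the column names are placeholders for illustration, not Chronoverse's real internals; only the status, trigger, and topic routing come from this page.

```python
import datetime

def topic_for(kind: str) -> str:
    # Per the Queue Phase below: HEARTBEAT jobs go straight to the
    # jobs topic; CONTAINER jobs go through the workflows topic first.
    return "jobs" if kind == "HEARTBEAT" else "workflows"

def poll_once(db, kafka):
    """One pass of a hypothetical scheduling loop."""
    now = datetime.datetime.now(datetime.timezone.utc)
    # Placeholder query: find active, build-ready workflows that are due.
    due = db.query(
        "SELECT * FROM workflows "
        "WHERE status = 'ACTIVE' AND build_ready AND next_run_at <= %s",
        (now,),
    )
    for wf in due:
        job = db.insert_job(
            workflow_id=wf["id"],
            status="PENDING",          # initial state
            trigger="AUTOMATIC",       # scheduler-created jobs
            scheduled_at=now.isoformat(),
        )
        kafka.publish(topic_for(wf["kind"]), job)
```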
Automatic Trigger: Jobs created by the Scheduling Worker have trigger = AUTOMATIC.

2. Queue Phase
Status: PENDING → QUEUED

Once events are published to Kafka:
- Job status updates to QUEUED
- Events wait in Kafka topics based on workflow type:
  - HEARTBEAT workflows → jobs topic
  - CONTAINER workflows → workflows topic (build first)
- Consumer groups ensure at-least-once delivery
3. Execution Phase
Status: QUEUED → RUNNING

For HEARTBEAT Workflows

Worker: Execution Worker
- Consumes job event from jobs topic
- Updates status to RUNNING
- Sets started_at timestamp
- Executes health check logic
- Captures result (success/failure)
For CONTAINER Workflows
Worker: Workflow Worker → Execution Worker

Two-phase execution:

Phase 1: Workflow Worker
- Consumes workflow event from workflows topic
- Retrieves workflow configuration from Redis
- Validates Docker image availability
- Prepares execution environment
- Publishes job event to jobs topic

Phase 2: Execution Worker
- Consumes job event from jobs topic
- Updates status to RUNNING
- Sets started_at timestamp
- Creates Docker container with configuration
- Starts container execution
- Streams stdout/stderr to Kafka’s job_logs topic
- Monitors container lifecycle
Container logs are streamed in real-time via Redis for live viewing while also being published to Kafka for persistence.
4. Completion Phase
Status: RUNNING → COMPLETED / FAILED / CANCELED

When execution finishes:
- Container exits (or heartbeat completes)
- Exit code determines outcome:
  - Exit code 0 → COMPLETED
  - Exit code ≠ 0 → FAILED
  - Canceled by user → CANCELED
- Sets completed_at timestamp
- Updates job status in database
- Publishes analytics events to Kafka
- Cleans up resources (container removal, cache cleanup)
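The outcome rules above are a straightforward mapping; a minimal sketch (not Chronoverse's actual code):

```python
def outcome(exit_code: int, canceled: bool = False) -> str:
    """Map the completion conditions above to a terminal job status."""
    if canceled:
        return "CANCELED"   # user cancellation wins regardless of exit code
    if exit_code == 0:
        return "COMPLETED"  # exit code 0 = success
    return "FAILED"         # any non-zero exit code
```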
5. Post-Processing Phase
Workers: JobLogs Processor & Analytics Processor

JobLogs Processor
- Consumes log events from Kafka
- Performs batch insertion to ClickHouse
- Indexes logs in MeiliSearch
- Enables efficient log querying and search
Analytics Processor
- Consumes analytics events from Kafka
- Aggregates job metrics (duration, success rate, etc.)
- Stores in PostgreSQL for reporting
- Enables trend analysis and dashboards
Job Triggers
AUTOMATIC Trigger
Jobs created by the Scheduling Worker based on the workflow interval.

Characteristics:
- Regular, predictable scheduling
- Respects workflow interval
- Skipped if previous job still running
- Default trigger type
MANUAL Trigger
Jobs created by explicit user request via the API.

Characteristics:
- On-demand execution
- Ignores workflow interval
- Can run even if automatic job is running
- Useful for testing and debugging
Use cases:
- Test workflow configuration
- Run job outside regular schedule
- Retry failed execution
- Debug workflow issues
Job Logs
Log Capture
During execution, Chronoverse captures:
- stdout: Standard output stream
- stderr: Standard error stream
- Timestamp: Nanosecond precision
- Sequence Number: Order preservation
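The captured attributes above can be modeled as a log event. The field names here are assumptions based on this list (stream, nanosecond timestamp, sequence number), not the service's actual wire format.

```python
import time
from dataclasses import dataclass

# Illustrative log-event shape; field names are inferred from the
# capture attributes above, not Chronoverse's actual schema.
@dataclass
class LogEvent:
    job_id: str
    stream: str        # "stdout" or "stderr"
    message: str
    timestamp_ns: int  # nanosecond precision
    sequence: int      # preserves ordering across identical timestamps

def capture(job_id: str, stream: str, lines, start_seq: int = 0):
    """Wrap raw output lines as ordered log events."""
    return [
        LogEvent(job_id, stream, line, time.time_ns(), start_seq + i)
        for i, line in enumerate(lines)
    ]
```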
Log Storage
Real-time (Redis):
- Temporary storage during execution
- Enables live log streaming
- TTL-based expiration
Permanent (ClickHouse):
- Permanent storage for all logs
- Optimized for time-series queries
- Compressed for efficient storage
Search (MeiliSearch):
- Full-text search capability
- Fast log filtering
- Message content indexing
Log Retrieval
Get Job Logs (Paginated)
Retrieve logs for a completed job with pagination. Supports filtering by stream (stdout, stderr, all).
Stream Job Logs (Real-time)
Stream logs for a running job using Server-Sent Events. Automatically falls back to static logs when the job completes.
Search Job Logs
Full-text search across job logs. Powered by MeiliSearch for fast results.
Job Monitoring
Status Tracking
Monitor job progress through:
- API Polling: GET /api/v1/workflows/{workflow_id}/jobs/{job_id}
- Real-time Notifications: Subscribe to the SSE endpoint for status updates
- Dashboard: Visual workflow and job status
Performance Metrics
Available metrics for each job:
- Execution Duration: completed_at - started_at
- Queue Time: started_at - scheduled_at
- Total Time: completed_at - scheduled_at
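The three metrics above are simple differences of the job's RFC3339 timestamps. A minimal sketch:

```python
from datetime import datetime

def _parse(ts: str) -> datetime:
    # fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalize it to an explicit UTC offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def metrics(scheduled_at: str, started_at: str, completed_at: str) -> dict:
    """Compute queue time, execution duration, and total time in seconds."""
    s, r, c = _parse(scheduled_at), _parse(started_at), _parse(completed_at)
    return {
        "queue_time": (r - s).total_seconds(),          # started - scheduled
        "execution_duration": (c - r).total_seconds(),  # completed - started
        "total_time": (c - s).total_seconds(),          # completed - scheduled
    }
```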
Access aggregated metrics via the Analytics Service for trend analysis.
Failure Analysis
When a job fails:
- Check job status for current state
- Review stderr logs for error messages
- Examine exit code or error details
- Verify workflow configuration
- Check resource availability (memory, disk, network)
Job Management
List Jobs
Retrieve all jobs for a workflow. Supported query parameters:
- status: Filter by job status
- trigger: Filter by trigger type (AUTOMATIC/MANUAL)
- cursor: Pagination cursor
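A client might build the list-jobs request like this. The endpoint path is an assumption modeled on the job-details path shown under Job Monitoring; the parameter names come from the list above.

```python
from urllib.parse import urlencode

def list_jobs_url(base: str, workflow_id: str, status=None,
                  trigger=None, cursor=None) -> str:
    """Build the (assumed) list-jobs URL with optional filters."""
    params = {k: v for k, v in
              [("status", status), ("trigger", trigger), ("cursor", cursor)]
              if v is not None}
    url = f"{base}/api/v1/workflows/{workflow_id}/jobs"
    return f"{url}?{urlencode(params)}" if params else url
```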
Schedule Manual Job
Trigger a workflow manually: this creates a job with trigger = MANUAL that executes immediately.
View Job Details
Get complete job information.

Best Practices
Interval Management
Start Conservative
Begin with longer intervals and decrease as needed. Prevents resource exhaustion during initial testing.
Consider Execution Time
Ensure interval > average execution time. If jobs take 5 minutes, use at least 10-minute intervals.
Log Management
- Keep Logs Concise: Excessive logging impacts performance
- Use Structured Output: JSON logs are easier to parse and search
- Log Errors to stderr: Helps with filtering and analysis
- Include Context: Timestamps, request IDs, relevant data
Error Handling
Exit Codes
Use appropriate exit codes in your containerized applications:
- 0: Success
- 1: General error
- 2: Misuse/invalid arguments
- 130: Terminated by Ctrl+C
- Custom codes: Document in your workflow
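A quick illustration of the convention, generic rather than Chronoverse-specific. A small script exits 0 on success, 1 on a general error, and 2 for invalid arguments:

```python
import subprocess
import sys
import textwrap

# A toy "containerized application" that follows the exit-code
# convention above: 2 = missing/invalid arguments, 1 = general
# error, 0 = success.
SCRIPT = textwrap.dedent("""
    import sys
    if len(sys.argv) < 2:
        sys.exit(2)   # misuse / invalid arguments
    sys.exit(0 if sys.argv[1] == "ok" else 1)
""")

def run(*args) -> int:
    """Run the toy script and return its exit code."""
    return subprocess.run([sys.executable, "-c", SCRIPT, *args]).returncode
```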
Retry Strategy
- Implement retries within your application code for transient failures
- Use exponential backoff for external API calls
- Set reasonable timeouts
- Log retry attempts for debugging
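The retry guidelines above can be combined into a small helper. This is a generic sketch for application code, not part of any Chronoverse SDK.

```python
import time

def retry(fn, attempts=3, base_delay=0.1, backoff=2.0, sleep=time.sleep):
    """Call `fn`, retrying transient failures with exponential backoff.

    Re-raises the last exception once `attempts` is exhausted. In real
    code, log each retry attempt here for debugging.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)       # back off before the next attempt
            delay *= backoff   # exponential growth: 0.1s, 0.2s, 0.4s, ...
```

Injecting `sleep` keeps the helper testable without real delays.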
Resource Limits
- Set memory limits to prevent OOM errors
- Monitor execution time and optimize long-running jobs
- Clean up temporary files and connections
- Handle signals for graceful shutdown
Troubleshooting
Job Stuck in QUEUED
Possible Causes:
- Workers are down or overloaded
- Kafka consumer lag
- Network connectivity issues

Solutions:
- Check worker health and logs
- Monitor Kafka consumer lag metrics
- Scale workers horizontally if needed
Job Failing Consistently
Possible Causes:
- Invalid workflow configuration
- Missing dependencies in the container
- External service unavailability
- Resource constraints

Solutions:
- Review job logs for error messages
- Test the container locally with the same configuration
- Verify external dependencies
- Check resource limits and utilization
Logs Not Appearing
Possible Causes:
- JobLogs Processor is down
- Kafka connection issues
- ClickHouse write errors

Solutions:
- Check JobLogs Processor status
- Verify Kafka connectivity
- Review ClickHouse logs for errors
Next Steps
Schedule a Job
Learn how to schedule manual jobs
Workers
Understand how workers execute jobs
Workflows
Learn about workflow configuration
API Reference
Complete jobs API documentation