## Technology stack
| Layer | Technology |
|---|---|
| Backend API | Python 3.11, FastAPI, SQLAlchemy, Alembic |
| Frontend | Next.js 15+, React 18, TypeScript, Tailwind CSS |
| Relational database | PostgreSQL 15 |
| Vector database | Vespa 8.609 |
| Full-text search | OpenSearch 3.4 |
| Task queue / cache | Redis 7.4 |
| Background workers | Celery (thread-pool based) |
| Object storage | MinIO (S3-compatible) |
| LLM integration | LiteLLM, LangChain |
| Auth | OAuth2, OIDC, SAML, basic auth |
## Service topology

The following diagram shows how the services communicate at runtime:

## Core services

### API server (api_server)
The api_server is the central FastAPI application. It handles:
- All HTTP requests routed through nginx (internal port 8080)
- User authentication and session management
- Chat completions, streaming responses, and LLM interactions
- Connector management and credential storage
- Document retrieval from Vespa and OpenSearch
- Running database migrations via Alembic on startup
### Web server (web_server)

The `web_server` is the Next.js 15 frontend. It proxies API calls to the `api_server` via the internal Docker network (`http://api_server:8080`). The nginx service routes browser traffic to either the frontend or the API based on path.
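As an illustration only, the path-based routing described above might look like the following nginx fragment (the paths and upstream ports here are assumptions based on this section, not the shipped config):

```nginx
# Hypothetical sketch of nginx path routing between frontend and API.
location /api/ {
    proxy_pass http://api_server:8080;   # FastAPI backend
}
location / {
    proxy_pass http://web_server:3000;   # Next.js frontend
}
```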
### PostgreSQL (relational_db)
PostgreSQL 15 stores all relational data: users, connectors, credentials, chat sessions, document metadata, and Celery task state. The schema is managed with Alembic migrations.
The api_server runs `alembic upgrade head` automatically on every startup before accepting traffic.
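If you ever need to apply the migrations by hand, the equivalent command is a sketch like the following, assuming the `api_server` compose service name used throughout this document:

```shell
# Run pending Alembic migrations inside the running api_server container.
docker compose exec api_server alembic upgrade head
```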
### Vespa vector database (index)
Vespa stores document chunks and their embeddings. It powers Onyx’s hybrid retrieval — combining dense vector similarity search with BM25 keyword matching. Vespa is the backbone of the RAG pipeline.
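Onyx's actual fusion logic lives in Vespa's rank profiles; purely as an illustration of the idea, hybrid retrieval can be sketched as a weighted blend of normalized dense and BM25 scores (the function names and the `alpha` weight below are invented for this example):

```python
def normalize(scores):
    """Min-max normalize a list of raw scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(dense, bm25, alpha=0.5):
    """Blend normalized dense-vector and BM25 scores per document."""
    d, b = normalize(dense), normalize(bm25)
    return [alpha * x + (1 - alpha) * y for x, y in zip(d, b)]

# Example: doc 0 wins on vector similarity, doc 1 wins on keywords.
print(hybrid_scores([0.9, 0.2], [1.0, 8.0], alpha=0.5))  # -> [0.5, 0.5]
```

Raising `alpha` favors semantic matches; lowering it favors exact keyword hits.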
The Docker Compose service for Vespa is named `index` (not `vespa`) because Vespa imposes restrictions on the container hostname format used in its internal URL scheme.

### OpenSearch (opensearch)
OpenSearch provides an additional full-text search layer alongside Vespa. It is enabled by default (`OPENSEARCH_FOR_ONYX_ENABLED=true`) and can be disabled by setting `OPENSEARCH_FOR_ONYX_ENABLED=false` in your `.env`.
### Redis (cache)
Redis 7.4 serves as both the Celery message broker and a general-purpose cache for ephemeral state, inter-process coordination, and locking. The cache service uses ephemeral storage (no persistence) to keep it lightweight.
### Model servers

Two dedicated model servers run the same `onyxdotapp/onyx-model-server` image:
- `inference_model_server`: handles embedding and reranking at query time
- `indexing_model_server`: runs with `INDEXING_ONLY=True`, dedicated to embedding documents during indexing to avoid resource contention

Both servers cache downloaded models in named volumes (`model_cache_huggingface`, `indexing_huggingface_model_cache`) to avoid re-downloading on restart.
GPU support is available by uncommenting the `deploy.resources` section in `docker-compose.yml` (requires `nvidia-container-toolkit`).
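For orientation, a typical GPU reservation block in Compose looks like the fragment below; the exact fields may differ from what ships in Onyx's `docker-compose.yml`:

```yaml
# Sketch of the deploy.resources block to uncomment under a model server
# service to reserve one NVIDIA GPU.
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```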
### MinIO (minio)

MinIO provides S3-compatible object storage for user-uploaded files. Activated with the `s3-filestore` Docker Compose profile (set via `COMPOSE_PROFILES=s3-filestore`). Alternatively, set `FILE_STORE_BACKEND=postgres` to store files in PostgreSQL without MinIO.
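The two file-store options described above map to `.env` settings like the following (values taken from this section; pick one or the other):

```shell
# .env — option A: S3-compatible file store via MinIO
COMPOSE_PROFILES=s3-filestore

# option B: store uploaded files in PostgreSQL instead (no MinIO)
# FILE_STORE_BACKEND=postgres
```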
## Background workers

The `background` container runs all Celery workers under supervisord. Onyx uses multiple specialized worker types to prevent resource contention between different classes of work.
All workers use thread pools (not process forks) for stability. Task time limits must be implemented within each task — Celery’s built-in time limit features are silently disabled in thread mode.
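Because Celery's built-in time limits are inert under the thread pool, each task has to police its own runtime. A minimal checkpoint pattern looks like the sketch below (the helper and exception names are invented for illustration, not Onyx's actual code):

```python
import time

class TaskTimeoutError(Exception):
    """Raised by a task when it exceeds its self-imposed deadline."""

def make_deadline_check(limit_s: float):
    """Return a callable that raises once limit_s seconds have elapsed."""
    start = time.monotonic()
    def check() -> None:
        if time.monotonic() - start > limit_s:
            raise TaskTimeoutError(f"task exceeded {limit_s}s")
    return check

# Inside a long-running task, call the check at safe points:
check = make_deadline_check(limit_s=0.01)
expired = False
try:
    for _ in range(1000):
        time.sleep(0.001)  # stand-in for one unit of work
        check()            # bail out between units, not mid-write
except TaskTimeoutError:
    expired = True
print(expired)  # -> True (the loop runs well past the 10 ms limit)
```

Calling the check only at unit boundaries keeps cancellation cooperative, so a task never aborts halfway through a write.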
### Worker types
#### Primary worker

Coordinates core background operations and system-wide tasks. Runs with a concurrency of 4 threads.
Handles: connector deletion, Vespa sync, document pruning checks, LLM model updates, user file sync.
#### Docfetching worker

Fetches documents from external data sources (connectors). Spawns docprocessing tasks for each document batch. Implements watchdog monitoring for stuck connectors.
Concurrency: configurable via `CELERY_WORKER_DOCFETCHING_CONCURRENCY`.
#### Docprocessing worker
Processes fetched documents through the full indexing pipeline:
- Upsert document records to PostgreSQL
- Chunk documents and add contextual metadata
- Embed chunks via the `indexing_model_server`
- Write chunks and embeddings to Vespa
- Update document metadata and status
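The steps above can be sketched as a toy pipeline; every function, argument, and data shape here is a hypothetical stand-in for Onyx's real code:

```python
def chunk_text(text: str, max_chars: int) -> list[str]:
    """Naive fixed-size chunker (real chunking is token- and metadata-aware)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def index_batch(documents, embed, vector_store):
    """Toy version of the docprocessing steps described above."""
    for doc in documents:
        record = {"id": doc["id"], "status": "indexing"}      # 1. upsert to PostgreSQL
        chunks = chunk_text(doc["text"], max_chars=20)        # 2. chunk the document
        vectors = [embed(c) for c in chunks]                  # 3. embed via model server
        vector_store[doc["id"]] = list(zip(chunks, vectors))  # 4. write to Vespa
        record["status"] = "indexed"                          # 5. update metadata/status
        yield record

store = {}
docs = [{"id": "d1", "text": "x" * 45}]
print(list(index_batch(docs, embed=len, vector_store=store)))
# -> [{'id': 'd1', 'status': 'indexed'}]
```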
Concurrency: configurable via `CELERY_WORKER_DOCPROCESSING_CONCURRENCY`.
#### Light worker

Handles lightweight, fast-completing operations. Runs at higher concurrency.
Tasks: Vespa sync operations, document permissions sync, external group sync.
Concurrency: configurable via `CELERY_WORKER_LIGHT_CONCURRENCY`.
#### Heavy worker

Handles resource-intensive operations. Runs with a concurrency of 4 threads.
Primary task: document pruning (removing stale documents from the index).
#### KG processing worker

Handles Knowledge Graph processing and clustering. Builds relationships between documents and runs graph clustering algorithms.
Scheduled every 60 seconds by the Beat worker.
#### Monitoring worker

Collects metrics and monitors system health: Celery queues, process memory, and overall system status. Runs single-threaded.
Scheduled every 5 minutes by the Beat worker.
#### User file processing worker
Processes user-uploaded files. Handles indexing and project synchronization for personal documents.
#### Beat worker (scheduler)

Celery Beat is the periodic task scheduler. It uses a `DynamicTenantScheduler` for multi-tenant support.
Default schedule:

| Task | Frequency |
|---|---|
| Indexing checks | Every 15 seconds |
| Connector deletion checks | Every 20 seconds |
| Vespa sync checks | Every 20 seconds |
| Pruning checks | Every 20 seconds |
| KG processing | Every 60 seconds |
| Monitoring tasks | Every 5 minutes |
| Cleanup tasks | Hourly |
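In Celery terms, a schedule like the table above is declared as a `beat_schedule` mapping. The entry names and task paths below are invented for illustration; only the frequencies come from this document:

```python
from datetime import timedelta

# Hypothetical beat_schedule mirroring the frequencies in the table above.
beat_schedule = {
    "check-for-indexing":   {"task": "tasks.check_indexing",   "schedule": timedelta(seconds=15)},
    "check-for-deletion":   {"task": "tasks.check_deletion",   "schedule": timedelta(seconds=20)},
    "check-for-vespa-sync": {"task": "tasks.check_vespa_sync", "schedule": timedelta(seconds=20)},
    "check-for-pruning":    {"task": "tasks.check_pruning",    "schedule": timedelta(seconds=20)},
    "kg-processing":        {"task": "tasks.kg_processing",    "schedule": timedelta(seconds=60)},
    "monitoring":           {"task": "tasks.monitoring",       "schedule": timedelta(minutes=5)},
    "cleanup":              {"task": "tasks.cleanup",          "schedule": timedelta(hours=1)},
}
```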
### Task priorities
Workers consume from three priority queues: high, medium, and low. Redis coordinates queue state and inter-process locking. Task state and metadata are persisted in PostgreSQL.

## Data flow
### Connector indexing pipeline
When a connector syncs, the following sequence runs entirely in the background workers:

### RAG query pipeline

When a user sends a chat message, the `api_server` handles the full retrieval-augmented generation flow:
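The original diagram for this flow is not reproduced here. As a rough sketch under invented names, the flow is: retrieve chunks via hybrid search, assemble a prompt, and send it to the LLM:

```python
def answer(query, retrieve, llm):
    """Toy RAG flow: retrieve chunks, build a prompt, call the LLM."""
    chunks = retrieve(query)                     # hybrid search (Vespa/OpenSearch)
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)                           # streamed back to the client in practice

# Stub dependencies to show the shape of the call:
fake_retrieve = lambda q: ["Onyx indexes documents into Vespa."]
fake_llm = lambda p: "It uses Vespa."
print(answer("Where are documents indexed?", fake_retrieve, fake_llm))
# -> It uses Vespa.
```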
## Community vs Enterprise Edition architecture
The Community Edition (CE) and Enterprise Edition (EE) share the same core service topology. The EE adds features layered on top — it does not require additional infrastructure services.
EE features are enabled by setting `ENABLE_PAID_ENTERPRISE_EDITION_FEATURES=true` in your `.env` and providing a valid license key.
EE-specific code lives in the `backend/ee/` directory and is gated at the API router level. This includes:
- Advanced SSO: SAML, OIDC with PKCE, multi-provider support
- Multi-tenancy: Tenant-isolated database schemas managed via `alembic -n schema_private`
- Audit logging: Full query history with user attribution
- Advanced RBAC: Fine-grained role and permission management
- Document-level permissioning: Mirrors ACLs from connected external apps
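Router-level gating can be pictured as conditionally mounting the EE routes. The flag name comes from this section; the helper and route names below are a toy model, not Onyx's actual code:

```python
def mount_routers(ce_routes, ee_routes, env):
    """Mount EE routes only when the EE flag is set (toy model of router gating)."""
    routes = list(ce_routes)
    if env.get("ENABLE_PAID_ENTERPRISE_EDITION_FEATURES") == "true":
        routes += ee_routes  # routers under backend/ee/ are included at startup
    return routes

ce = mount_routers(["/chat"], ["/ee/audit"], {})
ee = mount_routers(["/chat"], ["/ee/audit"],
                   {"ENABLE_PAID_ENTERPRISE_EDITION_FEATURES": "true"})
print(ce, ee)  # -> ['/chat'] ['/chat', '/ee/audit']
```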
The frontend (`web/`) uses the same codebase for both editions, with Enterprise-only UI components gated behind feature flags.
## Directory structure reference
## Next steps

- **Quick start**: Deploy the full stack with a single command.
- **Docker deployment**: Production configuration, SSL, and security hardening.
- **RAG & search**: How Onyx's hybrid retrieval and knowledge graph work.
- **Connectors**: Configure data source connectors and indexing schedules.
