Catalog sync pipeline: import jobs and scheduling

The catalog sync pipeline bridges supplier APIs and the API-HUB database. When you trigger an import — manually via the API or automatically through the scheduler or an n8n workflow — a SyncJob record is created immediately so you can poll its status, and all the actual work runs in the background. Products that fail individually do not abort the job; they are recorded in the errors array and the job continues.

Discovery modes

The DiscoveryMode enum controls which products are fetched from the supplier during a sync. Each mode maps to a different query strategy inside the adapter. The available modes are:

full_sellable
delta
first_n
explicit_list
closeouts

full_sellable

Fetches the supplier’s full active catalog: every product currently available for sale. Use this for the initial load and for periodic full refreshes.

Recommended schedule: weekly

{ "mode": "full_sellable" }

delta

Fetches only products that have changed since the last successful sync. The adapter uses supplier.last_delta_sync as the cutoff timestamp, falling back to last_full_sync and then to 2000-01-01.

Recommended schedule: daily or hourly

{ "mode": "delta" }
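The cutoff fallback chain can be sketched as a small pure function. This is a minimal illustration, not the adapter's code: SupplierSyncState and delta_cutoff are hypothetical names, and the sketch assumes the timestamps are timezone-aware UTC datetimes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Cutoff used when the supplier has never completed a sync.
DEFAULT_CUTOFF = datetime(2000, 1, 1, tzinfo=timezone.utc)


@dataclass
class SupplierSyncState:
    """Hypothetical stand-in for the two timestamp columns on a supplier row."""
    last_delta_sync: Optional[datetime] = None
    last_full_sync: Optional[datetime] = None


def delta_cutoff(supplier: SupplierSyncState) -> datetime:
    """Return the 'changed since' timestamp for a delta discovery."""
    if supplier.last_delta_sync is not None:
        return supplier.last_delta_sync
    if supplier.last_full_sync is not None:
        return supplier.last_full_sync
    return DEFAULT_CUTOFF
```

A supplier that has only ever had a weekly full_sellable run therefore gets its first delta cutoff from last_full_sync rather than re-fetching everything since 2000.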

first_n

Fetches the first N products from the supplier. Use this to test an adapter's configuration without pulling the full catalog.

{ "mode": "first_n", "limit": 20 }

Default limit is 20 if omitted. Maximum is 10,000.

explicit_list

Fetches a specific list of products by supplier_sku. The explicit_list field is required when this mode is selected.

{
  "mode": "explicit_list",
  "explicit_list": ["PC61", "PC54", "DT6000"]
}
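The request rules for first_n and explicit_list can be expressed as a validation step. This sketch assumes the enum member names (only the string values come from this page), treats a limit below 1 as invalid (an assumption; only the maximum is documented), and uses a hypothetical validate_import_request helper.

```python
from enum import Enum
from typing import List, Optional


class DiscoveryMode(str, Enum):
    FULL_SELLABLE = "full_sellable"
    DELTA = "delta"
    FIRST_N = "first_n"
    EXPLICIT_LIST = "explicit_list"
    CLOSEOUTS = "closeouts"


DEFAULT_LIMIT = 20
MAX_LIMIT = 10_000


def validate_import_request(body: dict) -> dict:
    """Normalize an import request body, raising ValueError on invalid input."""
    mode = DiscoveryMode(body["mode"])  # raises ValueError for unknown modes
    limit: Optional[int] = body.get("limit")
    explicit_list: Optional[List[str]] = body.get("explicit_list")

    if mode is DiscoveryMode.FIRST_N:
        limit = DEFAULT_LIMIT if limit is None else limit
        if not 1 <= limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if mode is DiscoveryMode.EXPLICIT_LIST and not explicit_list:
        raise ValueError("explicit_list is required when mode is explicit_list")

    return {"mode": mode, "limit": limit, "explicit_list": explicit_list}
```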

closeouts

Fetches only products the supplier flags as closeouts. Useful for building closeout-specific storefronts or running periodic clearance syncs.

Recommended schedule: monthly

{ "mode": "closeouts" }

Not all adapters implement every mode. If an adapter does not support the closeouts mode, requesting it raises NotImplementedError and the job is marked failed. Check the adapter's source or the supplier's documentation before scheduling a closeout sync.
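One way an adapter might surface an unsupported mode is to dispatch each mode to its own method and raise NotImplementedError when none exists. This is a hypothetical sketch: BaseAdapter, ExampleRestAdapter, the _discover_<mode> naming convention, and run_discovery are illustrative only, and discovered products are simplified to supplier_sku strings instead of ProductRef objects.

```python
from typing import List, Optional


class BaseAdapter:
    """Hypothetical base class: each mode dispatches to a _discover_<mode> method."""

    def discover(self, mode: str, limit: Optional[int] = None,
                 explicit_list: Optional[List[str]] = None) -> List[str]:
        handler = getattr(self, f"_discover_{mode}", None)
        if handler is None:
            raise NotImplementedError(
                f"{type(self).__name__} does not support mode {mode!r}")
        return handler(limit=limit, explicit_list=explicit_list)


class ExampleRestAdapter(BaseAdapter):
    # Implements explicit_list only; closeouts is deliberately missing.
    def _discover_explicit_list(self, limit, explicit_list):
        return list(explicit_list)


def run_discovery(adapter: BaseAdapter, mode: str, **kwargs) -> dict:
    """Mark the job failed (with a discover-phase error) if the mode is unsupported."""
    try:
        refs = adapter.discover(mode, **kwargs)
    except NotImplementedError as exc:
        return {"status": "failed", "errors": [{"phase": "discover", "msg": str(exc)}]}
    return {"status": "running", "total_products": len(refs)}
```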

Triggering an import

POST /api/suppliers/{supplier_id}/import

Starts a new import job for the specified supplier. Returns 202 Accepted immediately; the work runs in a BackgroundTask.

Request

POST /api/suppliers/3fa85f64-5717-4562-b3fc-2c963f66afa6/import
Content-Type: application/json

{
  "mode": "delta"
}

Response

{
  "sync_job_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  "supplier_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "mode": "delta",
  "accepted_at": "2026-05-07T10:15:30.123456Z"
}
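For scripting, the trigger call can be built with Python's standard library alone. A minimal sketch: BASE_URL is an assumption (point it at wherever API-HUB is served), and build_import_request and parse_accepted are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your API-HUB host


def build_import_request(supplier_id: str, mode: str, **extra) -> urllib.request.Request:
    """Build (but do not send) the POST that starts an import job."""
    body = json.dumps({"mode": mode, **extra}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/suppliers/{supplier_id}/import",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_accepted(status: int, payload: bytes) -> str:
    """Return sync_job_id from a 202 Accepted response body."""
    if status != 202:
        raise RuntimeError(f"import not accepted (HTTP {status})")
    return json.loads(payload)["sync_job_id"]
```

Send it with urllib.request.urlopen(req) and pass resp.status and resp.read() to parse_accepted. Note that urlopen raises urllib.error.HTTPError for 4xx responses, so a 409 Conflict arrives as an exception whose code attribute is 409.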

Use sync_job_id to poll the job's status via GET /api/suppliers/{supplier_id}/sync-jobs.

Conflict protection: If an import of the same mode is already pending or running for the supplier, the endpoint returns 409 Conflict rather than starting a duplicate job.

Adapter precondition: If the supplier has no adapter_class configured, the endpoint returns 409 before creating a job.
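Both 409 preconditions can be checked against the supplier's existing jobs, matching on job_type ("import:{mode}") and an active status. A hypothetical sketch (import_precondition_status is not a documented function), assuming existing_jobs holds only this supplier's jobs.

```python
from typing import Iterable, Mapping, Optional

ACTIVE_STATUSES = {"pending", "running"}


def import_precondition_status(adapter_class: Optional[str],
                               existing_jobs: Iterable[Mapping[str, str]],
                               mode: str) -> Optional[int]:
    """Return 409 if the import must be refused, or None if a job may be created."""
    if not adapter_class:
        return 409  # no adapter configured: refuse before creating a job
    job_type = f"import:{mode}"
    for job in existing_jobs:
        if job["job_type"] == job_type and job["status"] in ACTIVE_STATUSES:
            return 409  # an import of the same mode is already pending or running
    return None
```

Because the match is on mode, a running delta import does not block a full_sellable import for the same supplier.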

Sync job lifecycle

Job created (pending)

create_pending_import_job inserts a SyncJob row with status = "pending", total_products = 0, and started_at set to now. The job ID is returned to the caller immediately.

Adapter resolved

run_existing_import_job updates the job to status = "running" and calls get_adapter(supplier, db) to instantiate the correct protocol adapter (SOAP, REST, etc.) from the adapter registry. An AdapterNotConfiguredError or AdapterNotRegisteredError at this point marks the job failed.

Discovery

The adapter’s discover(mode, limit, explicit_list) method returns a list of ProductRef objects: lightweight references containing supplier_sku and, optionally, part_id. The count is written to job.total_products.

An AuthError during discovery aborts immediately and marks the job failed. Other adapter errors (SupplierError, TransientError) also abort the job and mark it failed.

Hydrate and persist (per-product loop)

For each ProductRef, the adapter's hydrate_product(ref) method fetches the full product details and normalizes them to the ProductIngest schema. The result is then upserted into the database via persist_product.

TransientError (network, timeout, 5xx): retried up to 2 times with exponential backoff (1s, 2s). After all retries are exhausted, the product is counted as failed and the loop continues.
AuthError mid-loop: aborts the entire job immediately.
SupplierError, PersistError, AdapterError: counted as failed, loop continues.
Unexpected exceptions: logged and counted as failed, loop continues.
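The per-product rules above can be sketched as a loop with a bounded retry. This is a minimal sketch, assuming hypothetical exception classes that reuse the names above (their real hierarchy may differ) and refs simplified to supplier_sku strings. hydrate, persist, and sleep are injected so the policy can be exercised without a supplier or database.

```python
import logging
import time

logger = logging.getLogger("catalog_sync")


# Hypothetical exception hierarchy mirroring the names used above.
class AdapterError(Exception): ...
class TransientError(AdapterError): ...
class AuthError(AdapterError): ...
class SupplierError(AdapterError): ...
class PersistError(Exception): ...


MAX_RETRIES = 2
BACKOFF_SECONDS = (1, 2)  # delay before retry 1 and retry 2


def hydrate_with_retry(hydrate, ref, sleep=time.sleep):
    """Call hydrate(ref), retrying TransientError up to MAX_RETRIES times."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return hydrate(ref)
        except TransientError:
            if attempt == MAX_RETRIES:
                raise  # retries exhausted; the caller counts the product as failed
            sleep(BACKOFF_SECONDS[attempt])


def run_product_loop(refs, hydrate, persist, sleep=time.sleep):
    """Hydrate and persist each ref; return (success_count, failed_count, errors)."""
    success, failed, errors = 0, 0, []
    for ref in refs:
        try:
            persist(hydrate_with_retry(hydrate, ref, sleep))
            success += 1
        except AuthError:
            raise  # bad credentials affect every product: abort the whole job
        except (AdapterError, PersistError) as exc:
            # TransientError after retries, SupplierError, PersistError, other AdapterError
            failed += 1
            errors.append({"phase": "hydrate", "ref": ref, "msg": str(exc)})
        except Exception as exc:
            logger.exception("unexpected error while processing %s", ref)
            failed += 1
            errors.append({"phase": "hydrate", "ref": ref, "msg": repr(exc)})
    return success, failed, errors
```

AuthError is caught before the broader AdapterError clause so that it still aborts the job even though, in this sketch, it subclasses AdapterError.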

Job finalized

After the loop, _finalize_job computes the terminal status and writes completed_at:

"success" — zero errors
"partial_success" — some succeeded, some failed
"failed" — zero products succeeded

On success or partial_success, the supplier’s last_full_sync or last_delta_sync timestamp is updated, and stale detection runs (see below).
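The terminal-status rules reduce to two comparisons. A hypothetical sketch (terminal_status and should_update_sync_state are illustrative names); note that it assumes an empty run with no errors counts as success, a case the rules above do not spell out.

```python
def terminal_status(success_count: int, failed_count: int) -> str:
    """Map per-product counts to the job's final status."""
    if failed_count == 0:
        return "success"  # assumption: also covers a run that discovered nothing
    if success_count == 0:
        return "failed"
    return "partial_success"


def should_update_sync_state(status: str) -> bool:
    """Sync timestamps and stale detection only follow success or partial_success."""
    return status in {"success", "partial_success"}
```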

SyncJob schema

Jobs are stored in the sync_jobs table and exposed through SyncJobRead.

Field	Type	Description
`id`	UUID	Job identifier
`supplier_id`	UUID	FK → `suppliers.id`
`supplier_name`	VARCHAR	Denormalized for display
`job_type`	VARCHAR	`"import:{mode}"` e.g. `"import:delta"`
`status`	VARCHAR	`pending` → `running` → `success` / `partial_success` / `failed`
`discovery_mode`	VARCHAR	Raw enum value of the mode used
`total_products`	INTEGER	Count of `ProductRef` objects returned by `discover`
`success_count`	INTEGER	Products successfully hydrated and stored
`failed_count`	INTEGER	Products that errored during hydration or persistence
`records_processed`	INTEGER	Alias for `success_count` (used by OPS-side reporting)
`errors`	JSONB	Array of error objects: `{ phase, ref?, code?, msg }`
`started_at`	timestamptz	When the job entered `running` state
`completed_at`	timestamptz	When `_finalize_job` ran

Example error entry

{
  "phase": "hydrate",
  "ref": "PC61",
  "code": "SOAP_TIMEOUT",
  "msg": "ReadTimeout after 30s"
}

phase is one of "registry", "discover", or "hydrate". For hydrate errors, ref contains the supplier_sku of the failing product.
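Because hydrate errors carry the failing supplier_sku in ref, an operator could turn a job's errors array into an explicit_list request that retries only those products. This is a suggested workflow rather than a built-in feature, and build_retry_request is a hypothetical helper.

```python
from typing import List, Optional


def build_retry_request(errors: List[dict]) -> Optional[dict]:
    """Collect unique hydrate-phase refs into an explicit_list import body."""
    skus: List[str] = []
    for entry in errors:
        ref = entry.get("ref")
        # registry/discover errors have no ref and cannot be retried per product
        if entry.get("phase") == "hydrate" and ref and ref not in skus:
            skus.append(ref)
    if not skus:
        return None
    return {"mode": "explicit_list", "explicit_list": skus}
```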

Stale detection

When a product is successfully re-synced, any customer that previously received that product (via OPS push) needs to know that the catalog data has changed. After a successful or partially successful sync, API-HUB identifies all customer_product_selections rows where:

The product belongs to the synced supplier
product.last_synced >= job.started_at (the product was actually updated in this run)
status = "pushed" (the product has been delivered to a storefront)
pushed_at < now (the push happened before this sync completed)

Those rows have their status flipped to "stale". Operators can query for stale selections to identify products that need to be re-pushed to reflect updated pricing, images, or attributes.
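The four conditions can be written as one predicate. This in-memory sketch is only illustrative: Selection is a hypothetical flattened view of a customer_product_selections row joined to its product, and a real implementation would more likely issue a single SQL UPDATE.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Selection:
    """Hypothetical selection row joined to its product."""
    supplier_id: str
    product_last_synced: datetime
    status: str
    pushed_at: Optional[datetime] = None


def mark_stale(selections: List[Selection], supplier_id: str,
               job_started_at: datetime, now: datetime) -> int:
    """Flip qualifying rows to 'stale' in place; return how many changed."""
    flipped = 0
    for sel in selections:
        if (sel.supplier_id == supplier_id
                and sel.product_last_synced >= job_started_at  # updated in this run
                and sel.status == "pushed"
                and sel.pushed_at is not None
                and sel.pushed_at < now):
            sel.status = "stale"
            flipped += 1
    return flipped
```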

selected → pushed → stale

Filter customer_product_selections by status = "stale" to build a re-push queue. A product returns to pushed after a successful re-push.

Background scheduler

API-HUB includes a built-in Python scheduler (start_scheduler) that triggers a delta sync for all active, adapter-configured suppliers on a fixed interval.

await start_scheduler(interval_hours=24)

The scheduler sleeps before its first run instead of syncing at startup. This prevents a burst of sync requests every time the application restarts.
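A sleep-first loop might look like the sketch below. It assumes the initial sleep lasts one full interval, and Supplier, is_active, scheduler_loop, load_suppliers, trigger_delta, and max_ticks are hypothetical names; this is not the code behind start_scheduler.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional

logger = logging.getLogger("scheduler")


@dataclass
class Supplier:
    """Hypothetical subset of the supplier fields the scheduler needs."""
    id: str
    is_active: bool
    adapter_class: Optional[str]


async def scheduler_loop(
    interval_hours: float,
    load_suppliers: Callable[[], Awaitable[List[Supplier]]],
    trigger_delta: Callable[[str], Awaitable[None]],
    max_ticks: Optional[int] = None,  # None = run forever; set it in tests
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> None:
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        # Sleep first, so an application restart does not fire a burst of syncs.
        await sleep(interval_hours * 3600)
        for supplier in await load_suppliers():
            if not (supplier.is_active and supplier.adapter_class):
                continue  # only active, adapter-configured suppliers
            try:
                await trigger_delta(supplier.id)
            except Exception:
                # One failing supplier should not stop the others.
                logger.exception("delta trigger failed for supplier %s", supplier.id)
        ticks += 1
```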

API-HUB ships n8n cron workflows (inventory-sync-hourly.json, pricing-sync-daily.json, catalog-sync-weekly.json) that cover the same responsibility. If those n8n workflows are active, set DISABLE_SCHEDULER=true in your environment to prevent duplicate jobs.

Disabling the scheduler

# .env or environment
DISABLE_SCHEDULER=true

When set, start_scheduler logs a message and returns immediately without entering the loop.
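The early-return check can be sketched as a small environment lookup. Only the value true is documented; accepting 1 and yes and ignoring case and surrounding whitespace are assumptions, and scheduler_disabled and should_start_scheduler are hypothetical names.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger("scheduler")

TRUTHY = {"true", "1", "yes"}  # assumption: only "true" is documented


def scheduler_disabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when DISABLE_SCHEDULER is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("DISABLE_SCHEDULER", "").strip().lower() in TRUTHY


def should_start_scheduler(environ: Optional[Mapping[str, str]] = None) -> bool:
    if scheduler_disabled(environ):
        logger.info("DISABLE_SCHEDULER is set; background scheduler not started")
        return False
    return True
```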

Listing sync jobs

GET /api/suppliers/{supplier_id}/sync-jobs

Returns the 50 most recent sync jobs for a supplier, ordered by started_at descending.

GET /api/suppliers/3fa85f64-5717-4562-b3fc-2c963f66afa6/sync-jobs

[
  {
    "id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "supplier_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "supplier_name": "SanMar",
    "job_type": "import:delta",
    "status": "partial_success",
    "discovery_mode": "delta",
    "total_products": 312,
    "success_count": 308,
    "failed_count": 4,
    "records_processed": 308,
    "started_at": "2026-05-07T10:15:30Z",
    "completed_at": "2026-05-07T10:22:48Z",
    "errors": [
      { "phase": "hydrate", "ref": "LPC60", "msg": "SOAP fault: product discontinued" }
    ]
  }
]
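Since this endpoint is how you track a job started by the import endpoint, a client can poll it until the job reaches a terminal status. A minimal sketch: wait_for_job is a hypothetical helper, and fetch_jobs is any callable that returns the parsed JSON list (for example, json.load over urllib.request.urlopen on the sync-jobs URL). sleep and clock are injectable for testing.

```python
import time
from typing import Callable, List

TERMINAL = {"success", "partial_success", "failed"}


def wait_for_job(fetch_jobs: Callable[[], List[dict]], job_id: str,
                 poll_seconds: float = 5.0, timeout_seconds: float = 3600.0,
                 sleep=time.sleep, clock=time.monotonic) -> dict:
    """Poll the sync-jobs list until job_id reaches a terminal status."""
    deadline = clock() + timeout_seconds
    while True:
        job = next((j for j in fetch_jobs() if j["id"] == job_id), None)
        if job is not None and job["status"] in TERMINAL:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"sync job {job_id} did not finish in time")
        sleep(poll_seconds)
```

Because the endpoint returns only the 50 most recent jobs, a job that is not found keeps being polled until the timeout rather than failing immediately.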
