Retail invoice fetching in PayPulse Cloud is driven entirely by configuration stored in the VendorConfig DynamoDB table. Adding a new vendor — or disabling an existing one — requires no Lambda code changes.

The VendorConfig table

The VendorConfig table is defined in aws-infra-terraform/dynamodb.tf with vendor_id as its partition key and PAY_PER_REQUEST billing.

| Attribute | DynamoDB type | Description |
| --- | --- | --- |
| vendor_id | String (PK) | Unique identifier for the vendor (e.g. dominos, zalando). Used as the S3 path segment and DynamoDB key. |
| vendor_name | String | Human-readable display name (e.g. Dominos, Jack & Jones). |
| invoice_category | String | Always retail for vendors in this table. |
| invoice_sub_type | String | One of the 8 retail categories (see below). Determines the S3 sub-path. |
| default_email_patterns | String Set | One or more sender email addresses to match (e.g. domino@dominos.se). |
| default_subject_keywords | String Set | One or more subject-line keywords to filter emails by (e.g. Tack for din beställning). |
| parser_type | String | Format of the invoice content; currently html for all vendors. |
| active | Boolean | When true, this vendor is included in every fetch run. Set to false to disable without deleting the record. |
| supports_pdf | Boolean | Whether this vendor sends PDF attachments. |
| supports_html | Boolean | Whether this vendor sends HTML email bodies. |
| logo_url | String | URL to the vendor logo asset in S3. May be empty. |
| created_at | String | ISO 8601 creation timestamp. |
| updated_at | String | ISO 8601 last-update timestamp. |

Example record

The Zalando vendor config from vendor_configs/clothing/zalando.json:
{
  "vendor_id": {"S": "zalando"},
  "vendor_name": {"S": "Zalando"},
  "invoice_category": {"S": "retail"},
  "invoice_sub_type": {"S": "clothing"},
  "default_email_patterns": {"SS": [
    "info@service-mail.zalando.se"
  ]},
  "default_subject_keywords": {"SS": [
    "Thanks for your order"
  ]},
  "parser_type": {"S": "html"},
  "active": {"BOOL": true},
  "supports_pdf": {"BOOL": false},
  "supports_html": {"BOOL": true},
  "logo_url": {"S": "https://paypulse-vendor-logos.s3.eu-west-1.amazonaws.com/svgs/zalando.svg"},
  "created_at": {"S": "2025-10-06T10:30:00Z"},
  "updated_at": {"S": "2025-10-06T10:30:00Z"}
}
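
The record above uses DynamoDB's typed JSON, where each attribute is wrapped in a type tag such as S, SS, or BOOL. As a minimal sketch of how that shape maps to ordinary Python values, the hypothetical helper from_dynamodb_json below unwraps only the three tags that appear in this record. It is an illustration, not code from the PayPulse repository.

```python
from typing import Any


def from_dynamodb_json(item: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Convert a DynamoDB-typed item into a plain dict.

    Hypothetical helper: handles only the type tags used in the
    vendor record (S, SS, BOOL) and rejects anything else.
    """
    plain: dict[str, Any] = {}
    for name, typed_value in item.items():
        # Each attribute is a single-entry mapping such as {"S": "zalando"}.
        (type_tag, value), = typed_value.items()
        if type_tag == "S":
            plain[name] = value
        elif type_tag == "SS":
            plain[name] = set(value)  # String Set becomes a Python set
        elif type_tag == "BOOL":
            plain[name] = bool(value)
        else:
            raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")
    return plain


# A trimmed copy of the Zalando record, for illustration only.
record = {
    "vendor_id": {"S": "zalando"},
    "default_email_patterns": {"SS": ["info@service-mail.zalando.se"]},
    "active": {"BOOL": True},
}
vendor = from_dynamodb_json(record)
```

When boto3 is available, its own deserializer is the more complete option; the hand-written version is only meant to make the type tags visible.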

The 8 retail categories

Every vendor maps to one invoice_sub_type. This value determines the S3 path segment under invoices/{user_id}/retail/ where the HTML email is stored, and which DynamoDB detail table the parsed invoice is written to.

food-delivery

Restaurant and food delivery orders (e.g. Dominos, Foodora).

clothing

Fashion and apparel purchases (e.g. Zalando, Jack & Jones, Fotproffsen).

technology

Electronics and tech products.

subscriptions

Streaming services, SaaS receipts, memberships (e.g. Anthropic, Mevlana Moské).

grocery

Supermarket and grocery store purchases.

utility

Electricity, water, internet, and other utility bills.

travel

Transportation invoices — flights, trains, buses.

miscellaneous

Other retail purchases that don’t fit a named category.

Each sub-type has a dedicated DynamoDB detail table (FoodDeliveryInvoices, ClothingInvoices, TechnologyInvoices, etc.) in addition to the shared RetailInvoices base table.
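
The mapping from sub-type to S3 prefix and detail table can be sketched as a small lookup. The names RETAIL_SUB_TYPES, KNOWN_DETAIL_TABLES, and retail_prefix are hypothetical, and the table mapping lists only the three detail tables this page names explicitly; the others are deliberately left out rather than guessed.

```python
RETAIL_SUB_TYPES = frozenset({
    "food-delivery", "clothing", "technology", "subscriptions",
    "grocery", "utility", "travel", "miscellaneous",
})

# Only the detail tables named on this page; the remaining five are not listed here.
KNOWN_DETAIL_TABLES = {
    "food-delivery": "FoodDeliveryInvoices",
    "clothing": "ClothingInvoices",
    "technology": "TechnologyInvoices",
}


def retail_prefix(user_id: str, sub_type: str) -> str:
    """Return the S3 prefix for a user's retail invoices of one sub-type."""
    if sub_type not in RETAIL_SUB_TYPES:
        raise ValueError(f"unknown retail sub-type: {sub_type!r}")
    return f"invoices/{user_id}/retail/{sub_type}/"
```

Validating the sub-type before building a path keeps a typo in a vendor config from silently creating a ninth S3 folder.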

How fetch_retail_invoices reads vendor configs

The fetch_retail_invoices Lambda (lambdas/invoices/fetch_retail_invoices/lambda_function.py) runs the following flow on every POST /v1/invoices/retail/ingest call:
1. Load active vendors

The Lambda calls get_active_vendors(vendor_config_table) to scan the VendorConfig table and return only records where active = true. An optional vendor_category filter in the request body narrows this to a single sub-type.
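
A scan with an active filter might look roughly like the sketch below. It assumes a boto3 Table resource (or anything exposing the same scan method), assumes vendor_category is matched against invoice_sub_type, and follows LastEvaluatedKey so that a multi-page scan is not truncated. The function name get_active_vendors_sketch is hypothetical; the real get_active_vendors may differ.

```python
from typing import Any, Optional


def get_active_vendors_sketch(table: Any, vendor_category: Optional[str] = None) -> list[dict]:
    """Scan a boto3-style DynamoDB Table for active vendors, following pagination."""
    # "#a" aliases the attribute name so the expression cannot collide
    # with a DynamoDB reserved word.
    scan_kwargs: dict[str, Any] = {
        "FilterExpression": "#a = :is_active",
        "ExpressionAttributeNames": {"#a": "active"},
        "ExpressionAttributeValues": {":is_active": True},
    }
    if vendor_category:
        # Assumption: the request's vendor_category maps to invoice_sub_type.
        scan_kwargs["FilterExpression"] += " AND invoice_sub_type = :sub_type"
        scan_kwargs["ExpressionAttributeValues"][":sub_type"] = vendor_category

    vendors: list[dict] = []
    while True:
        page = table.scan(**scan_kwargs)
        vendors.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return vendors
        # Continue the scan where the previous page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key
```

A filter expression is applied after items are read, so the scan still touches every record; that is usually acceptable for a small configuration table like this one.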
2. Determine the date range

determine_search_date_range() checks the last_retail_invoice_fetch field in the Users table for the authenticated user.
  • First fetch: searches the last 30 days.
  • Subsequent fetches: searches from last_retail_invoice_fetch to now.
  • Custom range: if start_date and end_date are provided in the request body (format YYYY-MM-DD), those take precedence.
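
The three cases above can be sketched as one pure function. This is an illustration under stated assumptions, not the Lambda's actual determine_search_date_range: the name search_date_range and its parameters are hypothetical, timestamps without a zone are assumed to be UTC, and an unparseable stored timestamp falls back to the 30-day window.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_LOOKBACK = timedelta(days=30)


def search_date_range(
    last_fetch: Optional[str],
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    now: Optional[datetime] = None,
) -> tuple[datetime, datetime]:
    """Return (start, end) for the Gmail search window."""
    now = now or datetime.now(timezone.utc)

    # Custom range: explicit YYYY-MM-DD dates take precedence.
    if start_date and end_date:
        start = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        end = datetime.strptime(end_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        return start, end

    # Subsequent fetch: resume from the stored timestamp.
    if last_fetch:
        try:
            # fromisoformat() rejects a trailing "Z" before Python 3.11.
            parsed = datetime.fromisoformat(last_fetch.replace("Z", "+00:00"))
            if parsed.tzinfo is None:
                parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC
            return parsed, now
        except ValueError:
            pass  # a real handler would log a warning here

    # First fetch, or unparseable timestamp: look back 30 days.
    return now - DEFAULT_LOOKBACK, now
```

Passing now as a parameter keeps the function deterministic, which makes the three branches easy to check in isolation.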
3. Build the Gmail search query

For each vendor, build_gmail_query() combines:
  • Sender filter: from:(domino@dominos.se) — one OR-joined entry per address in default_email_patterns.
  • Subject filter: subject:(Tack for din beställning) — one OR-joined entry per keyword in default_subject_keywords.
  • Date filter: after:YYYY/MM/DD before:YYYY/MM/DD.
If default_email_patterns is empty for a vendor, that vendor is skipped with a warning.
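
A query builder along these lines could be sketched as follows. The function name build_gmail_query_sketch and its parameters are hypothetical. One deliberate difference from the example above: this sketch wraps multi-word keywords in double quotes so that OR-joined phrases stay unambiguous, which is an assumption and not something the page confirms the Lambda does. Terms are sorted because DynamoDB string sets have no guaranteed order.

```python
from datetime import date
from typing import Iterable, Optional


def _or_group(terms: Iterable[str]) -> str:
    """Join terms as (a OR b), quoting any term that contains a space."""
    quoted = [f'"{term}"' if " " in term else term for term in sorted(terms)]
    return "(" + " OR ".join(quoted) + ")"


def build_gmail_query_sketch(
    email_patterns: Iterable[str],
    subject_keywords: Iterable[str],
    start: date,
    end: date,
) -> Optional[str]:
    """Build a Gmail search string, or return None when the vendor has no senders."""
    email_patterns = list(email_patterns)
    subject_keywords = list(subject_keywords)
    if not email_patterns:
        return None  # caller skips this vendor and logs a warning

    parts = [f"from:{_or_group(email_patterns)}"]
    if subject_keywords:
        parts.append(f"subject:{_or_group(subject_keywords)}")
    parts.append(f"after:{start:%Y/%m/%d}")
    parts.append(f"before:{end:%Y/%m/%d}")
    return " ".join(parts)
```

Returning None for a vendor without sender addresses mirrors the skip behaviour described above and avoids issuing a query that would match the whole mailbox.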
4. Search Gmail and collect emails

The Lambda calls the Gmail API messages.list endpoint with the generated query, capped at 100 results per vendor to stay within the Lambda timeout. For each matching message, it fetches the full email content.
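
The capped listing can be sketched as below, assuming a Gmail service object built with the Google API Python client, whose users().messages().list(...) call accepts userId, q, maxResults, and pageToken. The function name list_message_ids and the constant are hypothetical, and whether the real Lambda paginates or issues a single request is not stated on this page.

```python
from typing import Any

MAX_RESULTS_PER_VENDOR = 100  # cap described above


def list_message_ids(service: Any, query: str, cap: int = MAX_RESULTS_PER_VENDOR) -> list[str]:
    """Collect up to `cap` Gmail message IDs matching `query`."""
    ids: list[str] = []
    page_token = None
    while len(ids) < cap:
        params = {"userId": "me", "q": query, "maxResults": cap - len(ids)}
        if page_token:
            params["pageToken"] = page_token
        response = service.users().messages().list(**params).execute()
        # The "messages" key is absent when nothing matches.
        ids.extend(message["id"] for message in response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return ids[:cap]
```

Each returned ID would then be passed to a messages.get call to fetch the full email content.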
5. Duplicate detection

Before uploading, the Lambda generates a deterministic S3 key (generate_retail_invoice_s3_key) from user_id, vendor_id, sub_type, email date, and Gmail message ID. It then calls s3_file_exists() — if the object already exists, the email is skipped. This prevents re-processing the same invoice on repeated fetch calls.
6. Upload to S3

New emails are extracted as HTML (extract_html_from_email) and uploaded to:
{bucket}/invoices/{user_id}/retail/{sub_type}/{vendor_id}_{date}_{message_id_hash}.html
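
A deterministic key in that shape could be produced as in the sketch below. The function name retail_invoice_s3_key_sketch is hypothetical, and the hash algorithm (SHA-256), the 12-character truncation, and the YYYY-MM-DD date format are all assumptions; the page only states that the key contains a date and a message_id_hash.

```python
import hashlib
from datetime import date


def retail_invoice_s3_key_sketch(
    user_id: str,
    vendor_id: str,
    sub_type: str,
    email_date: date,
    message_id: str,
) -> str:
    """Build an S3 object key that is stable for a given Gmail message."""
    # Assumed hashing scheme: truncated SHA-256 of the Gmail message ID.
    message_id_hash = hashlib.sha256(message_id.encode("utf-8")).hexdigest()[:12]
    return (
        f"invoices/{user_id}/retail/{sub_type}/"
        f"{vendor_id}_{email_date:%Y-%m-%d}_{message_id_hash}.html"
    )
```

Because every input is derived from the email itself, the same message always maps to the same key, which is what makes the existence check before upload a reliable duplicate test.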
7. Update fetch timestamp

After processing all vendors, update_last_retail_invoice_fetch() writes the current UTC timestamp back to the Users table so the next call fetches only new emails.
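
Writing the timestamp back might look like the following sketch, assuming a boto3 Table resource for the Users table. The partition key name user_id is an assumption (this page does not document the Users table schema), and update_last_fetch_sketch is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def update_last_fetch_sketch(users_table: Any, user_id: str, now: Optional[datetime] = None) -> str:
    """Store the current UTC time as last_retail_invoice_fetch and return it."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    users_table.update_item(
        Key={"user_id": user_id},  # assumed key name
        UpdateExpression="SET last_retail_invoice_fetch = :ts",
        ExpressionAttributeValues={":ts": timestamp},
    )
    return timestamp
```

Using update_item rather than put_item changes only this one attribute and leaves the rest of the user record untouched.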

A 100 ms sleep between individual emails and a 200 ms sleep between vendors help avoid Gmail API rate-limit errors.

Incremental fetching

The last_retail_invoice_fetch field in the Users table is an ISO 8601 timestamp (e.g. 2025-10-06T10:30:00Z). On each successful run it is updated to the current UTC time, so subsequent calls only search the window between the last fetch and now. If the stored timestamp cannot be parsed, the Lambda falls back to searching the last 30 days and logs a warning.

Duplicate detection

Duplicate detection happens at the S3 layer before any upload. The S3 key encodes the Gmail message ID, so the same email always produces the same key. If s3:HeadObject confirms the object exists, the email is skipped without triggering the downstream parser. This means calling POST /v1/invoices/retail/ingest multiple times is safe — only genuinely new emails are processed.
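
An existence check of this kind is commonly written with head_object, as in the sketch below. It assumes a boto3 S3 client, which exposes ClientError under client.exceptions; the name s3_object_exists is hypothetical and the real s3_file_exists may handle errors differently.

```python
from typing import Any


def s3_object_exists(s3_client: Any, bucket: str, key: str) -> bool:
    """Return True if the object exists, False if S3 reports it as missing."""
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except s3_client.exceptions.ClientError as error:
        # head_object reports a missing key with a 404 error code.
        if error.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
            return False
        # Anything else (for example access denied) is a real failure.
        raise
```

Re-raising other errors matters here: treating a permissions failure as "not found" would cause the same email to be uploaded and parsed again.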

Currently configured vendors

| Vendor | Sub-type | Sender |
| --- | --- | --- |
| Dominos | food-delivery | domino@dominos.se |
| Foodora | food-delivery | info@mail.foodora.se |
| Zalando | clothing | info@service-mail.zalando.se |
| Fotproffsen | clothing | support@fotproffsen.se |
| Jack & Jones | clothing | noreply@jackjones.com |
| Anthropic | subscriptions | invoice+statements@mail.anthropic.com |
| Mevlana Moské | subscriptions | invoice+statements@mevlanagoteborg.se |
