AI Processing Pipeline
Dayflow’s AI pipeline transforms screenshots into meaningful timeline cards through a multi-stage process. The exact implementation varies by AI provider to optimize for quality, cost, and latency.
Pipeline Overview
The processing pipeline consists of five stages:
1. Capture Stage
Responsibility: Record screen activity as lightweight screenshots
Implementation: ScreenRecorder.swift:304
Process:
- Timer fires every 10 seconds (configurable via ScreenshotConfig.interval)
- Captures active display using ScreenCaptureKit
- Scales to ~1080p with aspect ratio preservation
- Compresses as JPEG (quality: 0.85)
- Saves to file system (~50-200KB per screenshot)
- Persists metadata to the screenshots table
Key Features:
- No recording indicator (uses screenshot API, not video stream)
- Automatic pause on sleep/lock/screensaver
- Multi-display support with active display tracking
- Width and height kept even for video encoding compatibility
Storage Location: ~/Library/Application Support/Dayflow/recordings/*.jpg
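The scaling rules above (fit to roughly 1080p, preserve aspect ratio, keep dimensions even) can be sketched as a small pure function. This is an illustration only: the name `scaledEvenSize`, the choice to never upscale, and rounding down to the nearest even value are assumptions, not taken from ScreenRecorder.swift.

```swift
import Foundation

/// Hypothetical helper: fits a source size to a target height while keeping
/// the aspect ratio, then forces both dimensions to even values.
func scaledEvenSize(width: Int, height: Int, targetHeight: Int = 1080) -> (width: Int, height: Int) {
    precondition(width > 0 && height > 0 && targetHeight > 0, "dimensions must be positive")
    // Assumption: displays smaller than the target are never upscaled.
    let scale = min(1.0, Double(targetHeight) / Double(height))
    let scaledWidth = Int((Double(width) * scale).rounded())
    let scaledHeight = Int((Double(height) * scale).rounded())
    // Clearing the lowest bit rounds down to an even number (minimum 2).
    return (max(2, scaledWidth & ~1), max(2, scaledHeight & ~1))
}
```

A 3840×2160 display maps to 1920×1080 under these assumptions, while a 1280×720 display is left at its native size.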
2. Analyze Stage
Responsibility: Group screenshots into batches and send to AI for transcription
Implementation: AnalysisManager.swift:326
Batching Logic:
// Screenshot batch creation (AnalysisManager.swift:476)
func createScreenshotBatches(from screenshots: [Screenshot]) {
let maxGap: TimeInterval = config.maxGap // 5 minutes
let maxBatchDuration: TimeInterval = config.targetDuration // 15-30 min
    // Break the batch if either condition holds:
    // 1. Gap between consecutive screenshots > maxGap (5 minutes)
    // 2. Batch duration exceeds the target duration
    // Batches shorter than 5 minutes are then skipped
}
Batching Strategy:
- Check Interval: Every 60 seconds
- Max Gap: 5 minutes (breaks batch if gap exceeds this)
- Target Duration: 15-30 minutes per batch
- Minimum Duration: 5 minutes (skips shorter batches)
- Lookback Window: 24 hours (only processes recent data)
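The batching rules listed above can be sketched as follows. `Shot` is a hypothetical stand-in for the app's `Screenshot` type, `makeBatches` is an illustrative name, and the 15-minute default target is just one value from the documented 15-30 minute range; the real logic in AnalysisManager.swift may differ.

```swift
import Foundation

/// Hypothetical, simplified stand-in for the app's Screenshot model.
struct Shot {
    let capturedAt: TimeInterval  // seconds since an arbitrary epoch
}

/// Groups screenshots into batches, splitting on large gaps or when the
/// target duration is exceeded, then drops batches that are too short.
func makeBatches(
    from shots: [Shot],
    maxGap: TimeInterval = 5 * 60,
    targetDuration: TimeInterval = 15 * 60,
    minDuration: TimeInterval = 5 * 60
) -> [[Shot]] {
    var batches: [[Shot]] = []
    var current: [Shot] = []
    for shot in shots.sorted(by: { $0.capturedAt < $1.capturedAt }) {
        if let first = current.first, let last = current.last {
            let gapTooLarge = shot.capturedAt - last.capturedAt > maxGap
            let batchFull = shot.capturedAt - first.capturedAt > targetDuration
            if gapTooLarge || batchFull {
                batches.append(current)
                current = []
            }
        }
        current.append(shot)
    }
    if !current.isEmpty { batches.append(current) }
    // Skip batches shorter than the minimum duration.
    return batches.filter { batch in
        guard let first = batch.first, let last = batch.last else { return false }
        return last.capturedAt - first.capturedAt >= minDuration
    }
}
```

With screenshots every 10 seconds for 20 minutes, this sketch yields one 15-minute batch and discards the trailing remainder of just under 5 minutes as too short.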
Database Updates:
- Create analysis_batches record
- Link screenshots via batch_screenshots junction table
- Mark batch status as pending
- Queue for LLM processing
3. Generate Stage
Responsibility: Transform screenshots → observations → timeline cards
Implementation: LLMService.swift:507
Two-Stage Processing:
Stage 3a: Transcription (Screenshots → Observations)
Purpose: Convert visual screenshots into text descriptions
Code: LLMService.swift:582
// Transcribe screenshots using provider
let transcribeResult = try await executeWithProviderBackup(
operation: "transcribe",
batchId: batchId,
primaryContext: primaryContext,
activeContext: activeContext,
backupContext: backupContext
) { context in
try await context.actions.transcribeScreenshots(screenshots, batchStartDate, batchId)
}
observations = transcribeResult.value.observations
Output: Array of Observation objects:
struct Observation {
let id: Int64?
let batchId: Int64
let startTs: Int // Unix timestamp
let endTs: Int
let observation: String // Text description of screenshot
let metadata: String?
let llmModel: String?
}
Example Observation:
“User is editing code in Xcode, working on ScreenRecorder.swift file, implementing the captureScreenshot function.”
Stage 3b: Card Generation (Observations → Timeline Cards)
Purpose: Synthesize observations into activity cards
Code: LLMService.swift:680
Sliding Window Context:
// Calculate card-generation lookback window (LLMService.swift:624)
let currentTime = Date(timeIntervalSince1970: TimeInterval(batchEndTs))
let windowStartTime = currentTime.addingTimeInterval(-batchingConfig.cardLookbackDuration)
// Fetch observations from the recent batching window
let recentObservations = StorageManager.shared.fetchObservationsByTimeRange(
from: windowStartTime,
to: currentTime
)
// Fetch existing timeline cards for context
let existingTimelineCards = StorageManager.shared.fetchTimelineCardsByTimeRange(
from: windowStartTime,
to: currentTime
)
Why Sliding Window?
- Maintains context continuity across batch boundaries
- Allows merging/splitting activities that span multiple batches
- Provides AI with broader temporal context
Context Provided to AI:
struct ActivityGenerationContext {
let batchObservations: [Observation] // Current batch
let existingCards: [ActivityCardData] // Previous cards in window
let currentTime: Date // Batch end time
let categories: [LLMCategoryDescriptor] // User-defined categories
}
Output: Array of ActivityCardData:
struct ActivityCardData {
let startTime: String // "2:30 PM"
let endTime: String // "3:45 PM"
let category: String // "Development"
let subcategory: String // "Coding"
let title: String // "Implementing screen recorder"
let summary: String // Brief description
let detailedSummary: String // Comprehensive summary
let distractions: [Distraction]?
let appSites: AppSites? // Apps and websites used
}
Card Replacement Strategy:
// Replace old cards with new ones in the time range (LLMService.swift:698)
let (insertedCardIds, deletedVideoPaths) = StorageManager.shared.replaceTimelineCardsInRange(
from: windowStartTime,
to: currentTime,
with: cards,
batchId: batchId
)
Why Replace Instead of Insert?
- Handles activity merging/splitting across batches
- Ensures timeline accuracy when context changes
- Atomic update prevents duplicate cards
- Old timelapse videos are cleaned up automatically
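To illustrate why replacing avoids duplicates, here is a minimal in-memory sketch of range replacement. `Card` and `replaceCards` are hypothetical simplifications; in particular, treating any card that overlaps the window as replaced is an assumption about how `replaceTimelineCardsInRange` behaves, and the real method works against the database.

```swift
/// Hypothetical, simplified card with Unix-second bounds.
struct Card: Equatable {
    let start: Int
    let end: Int
    let title: String
}

/// Drops every existing card that overlaps [from, to) and inserts the new
/// cards, returning the timeline sorted by start time.
func replaceCards(in existing: [Card], from: Int, to: Int, with newCards: [Card]) -> [Card] {
    // Keep only cards that lie entirely outside the window.
    let kept = existing.filter { $0.end <= from || $0.start >= to }
    return (kept + newCards).sorted { $0.start < $1.start }
}
```

If two earlier cards covering 100-200 and 200-300 are regenerated as a single merged card covering 100-300, the old pair disappears in the same step, so the timeline never shows both versions.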
4. Display Stage
Responsibility: Render timeline cards in the UI
Implementation: SwiftUI views observe database changes
Process:
- Timeline view fetches cards for selected day
- Groups cards by category and time
- Displays with smooth animations
- Generates timelapses on demand when the user clicks a card
Timelapse Generation: VideoProcessingService.swift
// On-demand timelapse creation
func generateTimelapse(for card: TimelineCard) {
    // 1. Parse time range from card.startTimestamp/endTimestamp
    // 2. Fetch screenshots in range from database
    // 3. Create video from screenshots using AVFoundation
    // 4. Save to file system with card ID
    // 5. Update card.videoSummaryURL in database
}
5. Cleanup Stage
Responsibility: Manage storage within configured limits
Implementation: StorageManager.swift
Automatic Cleanup:
- Runs hourly via scheduled timer
- Configurable storage limits (1GB - 20GB, or unlimited)
- Deletes oldest screenshots first
- Preserves timeline cards and observations
- Also deletes associated timelapse videos
Manual Cleanup: Settings → Storage → Delete recordings older than X days
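The oldest-first policy can be sketched as a pure selection function. `StoredFile` and `filesToDelete` are hypothetical names used for illustration; the actual cleanup in StorageManager.swift also updates the database and removes timelapse videos, which this sketch leaves out.

```swift
/// Hypothetical description of a file on disk.
struct StoredFile {
    let path: String
    let createdAt: Double  // Unix timestamp
    let bytes: Int
}

/// Returns the files to delete, oldest first, until total usage is at or
/// below the limit. Returns an empty array when already within the limit.
func filesToDelete(_ files: [StoredFile], limitBytes: Int) -> [StoredFile] {
    var total = files.reduce(0) { $0 + $1.bytes }
    var selected: [StoredFile] = []
    for file in files.sorted(by: { $0.createdAt < $1.createdAt }) {
        if total <= limitBytes { break }
        selected.append(file)
        total -= file.bytes
    }
    return selected
}
```

An "unlimited" setting could be modeled by passing `Int.max` as the limit, in which case nothing is ever selected.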
Provider-Specific Pipelines
The efficiency and quality of the pipeline vary significantly by AI provider:
Gemini Pipeline: 2 LLM Calls
Provider: GeminiDirectProvider.swift
Implementation Details:
Call 1: Upload + Transcribe
// GeminiDirectProvider.swift
func transcribeScreenshots(
_ screenshots: [Screenshot],
batchStartTime: Date,
batchId: Int64?
) async throws -> (observations: [Observation], log: LLMCall) {
// 1. Create video from screenshots
let videoURL = try await createVideoFromScreenshots(screenshots)
// 2. Upload video to Gemini Files API
let fileURI = try await uploadVideo(videoURL)
// 3. Wait for video processing
try await pollFileStatus(fileURI)
// 4. Generate transcription with native video understanding
let prompt = buildTranscriptionPrompt()
let response = try await generateContent(
model: current.model,
fileURI: fileURI,
prompt: prompt
)
// 5. Parse observations from response
return parseObservations(response)
}
Call 2: Generate Cards
func generateActivityCards(
observations: [Observation],
context: ActivityGenerationContext,
batchId: Int64?
) async throws -> (cards: [ActivityCardData], log: LLMCall) {
let prompt = buildCardGenerationPrompt(
observations: context.batchObservations,
existingCards: context.existingCards,
categories: context.categories
)
let response = try await generateContent(
model: current.model,
prompt: prompt
)
return parseActivityCards(response)
}
Advantages:
- ✅ Most efficient: Only 2 LLM calls per batch
- ✅ Native video understanding: Leverages Gemini’s vision capabilities
- ✅ Fast: a typical batch processes in 30-60 seconds
- ✅ Cost-effective: Minimal API calls
Fallback Strategy:
// Automatic model fallback on capacity errors
models: [
.gemini_2_0_flash_exp, // Try first
.gemini_1_5_flash // Fallback on 429/503
]
// Automatic Gemma 2 fallback on persistent errors
if error.isCapacityError && hasGemmaBackup {
fallbackState.preferGemma = true
return try await gemmaProvider.transcribeScreenshots(...)
}
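The model fallback idea can be sketched generically: try each model in order and move to the next one only on a capacity error (HTTP 429 or 503, as noted above). `ProviderError`, `isCapacityError`, and `firstSuccessful` are hypothetical names for this illustration and do not mirror the actual types in GeminiDirectProvider.swift; the sketch is synchronous for brevity.

```swift
/// Hypothetical error type for this sketch.
enum ProviderError: Error {
    case http(status: Int)
    case noModelsConfigured
}

/// Treats HTTP 429 (rate limited) and 503 (unavailable) as capacity errors.
func isCapacityError(_ error: Error) -> Bool {
    if let providerError = error as? ProviderError, case .http(let status) = providerError {
        return status == 429 || status == 503
    }
    return false
}

/// Tries each model in order. Capacity errors trigger a fallback to the next
/// model; any other error is rethrown immediately.
func firstSuccessful<T>(models: [String], attempt: (String) throws -> T) throws -> T {
    var lastError: Error = ProviderError.noModelsConfigured
    for model in models {
        do {
            return try attempt(model)
        } catch let error where isCapacityError(error) {
            lastError = error  // remember it and try the next model
        }
    }
    throw lastError
}
```

Restricting fallback to capacity errors means a bad API key or malformed request fails fast instead of being retried against every model.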
Local Pipeline: 33+ LLM Calls
Provider: OllamaProvider.swift
Implementation Strategy:
Step 1: Extract Frames
// Extract 30 frames evenly distributed across video
let frames = extractFrames(from: videoURL, count: 30)
Step 2: Describe Each Frame (30 Calls)
for (index, frame) in frames.enumerated() {
let description = try await llm.describeImage(
image: frame,
prompt: "Describe what the user is doing in this screenshot."
)
observations.append(description)
}
Step 3: Merge Observations (1 Call)
let prompt = """
Merge these 30 observations into coherent activity segments:
\(observations.joined(separator: "\n"))
"""
let mergedSegments = try await llm.generate(prompt: prompt)
Step 4: Generate Titles (1 Call per Segment)
for segment in segments {
let title = try await llm.generate(
prompt: "Create a concise title for: \(segment)"
)
}
Step 5: Merge Check (1 Call)
let shouldMerge = try await llm.generate(
prompt: "Should these adjacent segments be merged? \(segment1) \(segment2)"
)
Step 6: Final Merge (1 Call if needed)
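The "33+" figure follows directly from the steps above. The arithmetic can be written out as a small function; `localPipelineCallCount` is a hypothetical helper for illustration, not part of OllamaProvider.swift.

```swift
/// Counts LLM calls for one batch in the local pipeline, following the six
/// steps above. Frame extraction itself makes no LLM call.
func localPipelineCallCount(frames: Int = 30, segments: Int, needsFinalMerge: Bool) -> Int {
    let describeCalls = frames          // Step 2: one call per frame
    let mergeObservationCalls = 1       // Step 3
    let titleCalls = segments           // Step 4: one call per segment
    let mergeCheckCalls = 1             // Step 5
    let finalMergeCalls = needsFinalMerge ? 1 : 0  // Step 6
    return describeCalls + mergeObservationCalls + titleCalls + mergeCheckCalls + finalMergeCalls
}
```

With a single segment and no final merge the total is 33, which is the minimum; three segments plus a final merge gives 36.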
Trade-offs:
- ❌ Inefficient: 33+ LLM calls per batch
- ❌ Slower: a typical batch takes 5-10 minutes to process
- ❌ GPU-heavy: Drains battery on unplugged MacBooks
- ✅ Private: All processing stays on-device
- ✅ No API costs: Free after model download
- ⚠️ Quality varies: Depends heavily on local model capabilities
Recommended Models:
- LLaVA 7B/13B: Good balance of speed and quality
- BakLLaVA: Optimized for screenshots
- GPT4-Vision-like models: Best quality but slower
ChatGPT/Claude Pipeline: 4-6 LLM Calls
Provider: ChatCLIProvider.swift
Implementation Strategy:
Step 1: Extract Frames (Every 60s)
// Extract frames at 60-second intervals
let frames = extractFrames(from: videoURL, interval: 60)
// For a 30-minute batch: ~30 frames
Step 2: Batch Describe (3 Calls for 30 frames)
// Process 10 frames per call
let batches = frames.chunked(into: 10)
for batch in batches {
let descriptions = try await chatCLI.analyzeImages(
images: batch,
prompt: "Describe each screenshot and what the user is doing."
)
}
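`chunked(into:)` is not part of the Swift standard library, so the snippet above presumably relies on a project-level extension. A minimal sketch of such an extension, together with a hypothetical `frameOffsets` helper that reproduces the "30 frames for a 30-minute batch" figure, might look like this:

```swift
extension Array {
    /// Splits the array into consecutive chunks of at most `size` elements.
    func chunked(into size: Int) -> [[Element]] {
        precondition(size > 0, "chunk size must be positive")
        return stride(from: 0, to: count, by: size).map { start in
            Array(self[start ..< Swift.min(start + size, count)])
        }
    }
}

/// Hypothetical helper: offsets (in seconds) at which frames are sampled
/// from a batch of the given duration.
func frameOffsets(batchDuration: Int, interval: Int = 60) -> [Int] {
    Array(stride(from: 0, to: batchDuration, by: interval))
}
```

For a 1,800-second batch this gives 30 offsets, which chunk into 3 groups of 10, matching the 3 describe calls in Step 2.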
Step 3: Merge Segments (1 Call)
let prompt = """
Merge these observations into coherent activities:
\(allDescriptions.joined(separator: "\n\n"))
"""
let segments = try await chatCLI.generate(prompt: prompt)
Step 4: Generate Cards (1 Call)
let cards = try await chatCLI.generateCards(
observations: segments,
existingCards: context.existingCards,
categories: context.categories
)
CLI Integration:
// ChatCLIRunner.swift
func runChatCLI(
tool: ChatCLITool, // .codex or .claude
prompt: String,
images: [Data]
) async throws -> String {
// 1. Save images to temp files
let imagePaths = saveTemporaryImages(images)
// 2. Build CLI command
let command: String
switch tool {
case .codex:
command = "codex \(imagePaths.joined(separator: " ")) --prompt \(prompt)"
case .claude:
command = "claude \(imagePaths.joined(separator: " ")) --prompt \(prompt)"
}
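    // 3. Execute CLI and capture its standard output
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    process.arguments = ["bash", "-c", command]
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    // Read to EOF before waiting so a full pipe buffer cannot stall the child
    let output = String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(), as: UTF8.self)
    process.waitUntilExit()
    // 4. Cleanup temp files
    cleanupTemporaryImages(imagePaths)
    return output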
}
Requirements:
- Codex CLI installed and signed in (ChatGPT Plus/Pro)
- Claude Code installed and signed in (Claude Pro)
- Active internet connection
- Valid paid subscription
Trade-offs:
- ✅ Best quality: Frontier reasoning models (GPT-4V, Claude 3.5 Sonnet)
- ✅ Efficient: 4-6 calls vs. 33+ for local
- ✅ Faster than local: 1-3 minutes per batch
- ✅ Streaming support: Real-time updates in chat interface
- ❌ Requires subscription: $20/month minimum
- ❌ Privacy: Data processed by OpenAI/Anthropic
Dayflow Backend Pipeline
Provider: DayflowBackendProvider.swift
Purpose: Cloud-based processing alternative for users without API keys
Implementation:
func transcribeScreenshots(...) async throws -> (observations: [Observation], log: LLMCall) {
// Upload screenshots to Dayflow backend
let uploadURL = try await uploadScreenshots(screenshots, to: endpoint)
// Poll for processing completion
let observations = try await pollForObservations(uploadURL)
return observations
}
Note: Backend provider is currently in limited beta.
Error Handling
Provider Fallback
Primary + Backup Provider System: LLMService.swift:308
func executeWithProviderBackup<T>(
operation: String,
batchId: Int64?,
primaryContext: TimelineProviderContext,
activeContext: TimelineProviderContext,
backupContext: TimelineProviderContext?,
work: (TimelineProviderContext) async throws -> T
) async throws -> (value: T, activeContext: TimelineProviderContext, usedProviderBackup: Bool) {
do {
// Try primary provider
let value = try await work(activeContext)
return (value, activeContext, false)
} catch {
guard let backupContext else { throw error }
// Log fallback attempt
AnalyticsService.shared.capture("llm_timeline_fallback_attempted", ...)
do {
// Try backup provider
let value = try await work(backupContext)
AnalyticsService.shared.capture("llm_timeline_fallback_succeeded", ...)
return (value, backupContext, true)
} catch {
AnalyticsService.shared.capture("llm_timeline_fallback_failed", ...)
throw error
}
}
}
User Configuration: Settings → Providers → Backup Provider
Error Cards
When Processing Fails: LLMService.swift:822
func createErrorCard(
batchId: Int64,
batchStartTime: Date,
batchEndTime: Date,
error: Error
) -> TimelineCardShell {
return TimelineCardShell(
startTimestamp: startTimeStr,
endTimestamp: endTimeStr,
category: "System",
subcategory: "Error",
title: "Processing failed",
summary: "Failed to process \(duration) minutes... \(humanError) Your recording is safe and can be reprocessed.",
detailedSummary: "Error details: \(error.localizedDescription)\n\nThis recording batch can be reprocessed by retrying from Settings."
)
}
User-Friendly Error Messages: LLMService.swift:851
func getHumanReadableError(_ error: Error) -> String {
// Maps technical errors to user-friendly messages
// Examples:
// - "Rate limited. Too many requests to Gemini. Please wait a few minutes."
// - "Invalid API key. Please check your Gemini API key in Settings."
// - "Google's AI services may be temporarily down. Check the status page."
}
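One way to structure such a mapping is a `switch` over a typed error. The `PipelineError` cases below are hypothetical and exist only for this sketch; the three known messages are the examples quoted above, and the fallback wording for other errors is an assumption.

```swift
/// Hypothetical error categories for this sketch.
enum PipelineError: Error {
    case rateLimited
    case invalidAPIKey
    case serviceUnavailable
    case other(String)
}

/// Maps a technical error to a message suitable for an error card.
func humanReadableMessage(for error: PipelineError) -> String {
    switch error {
    case .rateLimited:
        return "Rate limited. Too many requests to Gemini. Please wait a few minutes."
    case .invalidAPIKey:
        return "Invalid API key. Please check your Gemini API key in Settings."
    case .serviceUnavailable:
        return "Google's AI services may be temporarily down. Check the status page."
    case .other(let detail):
        // Assumed fallback wording for unrecognized errors.
        return "Processing failed: \(detail)"
    }
}
```

Using an exhaustive `switch` means the compiler flags any newly added error case that lacks a user-facing message.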
Typical Processing Times (30-minute batch)
| Provider | LLM Calls | Processing Time | Cost (approx) |
|---|---|---|---|
| Gemini 2.0 Flash | 2 | 30-60 seconds | $0.05-0.10 |
| Gemini 1.5 Flash | 2 | 45-90 seconds | $0.03-0.08 |
| ChatGPT (GPT-4V) | 4-6 | 1-3 minutes | $0.50-1.50 |
| Claude (3.5 Sonnet) | 4-6 | 1-3 minutes | $0.40-1.20 |
| Local (LLaVA 7B) | 33+ | 5-10 minutes | Free |
| Local (LLaVA 13B) | 33+ | 8-15 minutes | Free |
Storage Requirements (per day of recording)
Screenshots (10s interval, 16 hours/day):
- Count: ~5,760 screenshots
- Size: ~50-200KB each
- Total: ~300MB - 1.2GB/day
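The figures above can be checked with a few lines of arithmetic. Both function names are hypothetical, and decimal units (1 MB = 1,000 KB) are assumed.

```swift
/// Number of screenshots captured per day at a fixed interval.
func screenshotsPerDay(intervalSeconds: Int = 10, hoursPerDay: Int = 16) -> Int {
    hoursPerDay * 3600 / intervalSeconds
}

/// Daily screenshot storage in megabytes (decimal: 1 MB = 1,000 KB).
func dailyStorageMB(count: Int, kbPerShot: Int) -> Double {
    Double(count * kbPerShot) / 1000.0
}
```

At 5,760 screenshots per day, 50KB each comes to 288MB and 200KB each comes to 1,152MB, which rounds to the ~300MB - 1.2GB range quoted above.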
Database:
- Observations: ~1-2KB each
- Timeline Cards: ~2-5KB each
- Growth: ~5-10MB/day
Timelapses (generated on-demand):
- Size: ~5-20MB per card
- Cached until storage limit reached
LLM Call Logging
View Recent Calls: Settings → Debug → LLM Calls
Database Table: llm_calls
SELECT
created_at,
batch_id,
provider,
operation,
status,
latency_ms,
error_message
FROM llm_calls
ORDER BY created_at DESC
LIMIT 100;
Observation Inspector
View Transcriptions: Settings → Debug → Observations
Database Table: observations
SELECT
batch_id,
start_ts,
end_ts,
observation
FROM observations
WHERE batch_id = ?
ORDER BY start_ts;
Batch Status
View All Batches: Settings → Debug → Analysis Batches
Database Table: analysis_batches
SELECT
id,
status,
batch_start_ts,
batch_end_ts,
reason
FROM analysis_batches
ORDER BY batch_start_ts DESC;
Status Values:
- pending: Waiting for processing
- processing: Currently being analyzed
- analyzed: Successfully completed
- failed: Processing error (see reason)
- skipped_short: Batch too short (less than 5 min)
- failed_empty: No screenshots in batch
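When reading these values in code, a string-backed enum keeps the mapping in one place. This is a sketch only: `BatchStatus` and `isTerminal` are hypothetical names, and which states count as terminal is an assumption based on the descriptions above.

```swift
/// Hypothetical enum mirroring the status strings stored in analysis_batches.
enum BatchStatus: String, CaseIterable {
    case pending
    case processing
    case analyzed
    case failed
    case skippedShort = "skipped_short"
    case failedEmpty = "failed_empty"

    /// Assumption: every state except pending and processing is final.
    var isTerminal: Bool {
        switch self {
        case .pending, .processing:
            return false
        case .analyzed, .failed, .skippedShort, .failedEmpty:
            return true
        }
    }
}
```

Unknown strings fail to parse (`BatchStatus(rawValue:)` returns `nil`), so an unexpected status value surfaces early instead of being silently treated as valid.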