This guide explains what happens when you ask Perplexica a question, from the moment you send a message to when you receive a cited, intelligent answer. For component-level architecture details, see Architecture.

Overview

When you send a message in the UI, the app calls POST /api/chat. At a high level, three things happen:
  1. Classify the question: decide what to do next based on the query
  2. Run research and widgets in parallel: gather information and structured data simultaneously
  3. Write the final answer: generate a response with citations
Let’s walk through each step in detail.

Step 1: Classification

Before searching or answering, Perplexica runs a classification step to understand the question and plan the response.

What the classifier decides

The classifier (src/lib/agents/search/classifier.ts) analyzes the query and determines:
  • Should we do research? Some questions don’t need web search (e.g., “What did we discuss earlier?”)
  • Which widgets are relevant? Weather, stocks, or calculations
  • What sources to use? Web, academic papers, or discussions
  • How to rewrite the query into a clearer, standalone form that works without conversation context
The classifier uses a structured schema with boolean flags for each decision. This ensures consistent, predictable behavior.

Classification output example

{
  "classification": {
    "skipSearch": false,
    "personalSearch": false,
    "academicSearch": true,
    "discussionSearch": false,
    "showWeatherWidget": false,
    "showStockWidget": false,
    "showCalculationWidget": false
  },
  "standaloneFollowUp": "How does quantum entanglement work?"
}
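This shape can be modeled with plain TypeScript types. The sketch below mirrors the example output above; the interface and helper names are illustrative, not the project's actual definitions:

```typescript
// Sketch of the classifier's output shape; field names mirror the
// example above, but these types are hypothetical, not the project's.
interface Classification {
  skipSearch: boolean;
  personalSearch: boolean;
  academicSearch: boolean;
  discussionSearch: boolean;
  showWeatherWidget: boolean;
  showStockWidget: boolean;
  showCalculationWidget: boolean;
}

interface ClassifierOutput {
  classification: Classification;
  standaloneFollowUp: string;
}

// Because every decision is a boolean flag, downstream code can branch
// without parsing free-form model text.
function needsResearch(out: ClassifierOutput): boolean {
  return !out.classification.skipSearch;
}

const example: ClassifierOutput = {
  classification: {
    skipSearch: false,
    personalSearch: false,
    academicSearch: true,
    discussionSearch: false,
    showWeatherWidget: false,
    showStockWidget: false,
    showCalculationWidget: false,
  },
  standaloneFollowUp: 'How does quantum entanglement work?',
};
```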

Step 2: Parallel execution

After classification, Perplexica runs two processes in parallel for optimal performance:

Widgets

Widgets are small, structured helpers that provide real-time data:

  • Weather: current conditions and forecasts based on location
  • Stocks: real-time market data and stock prices
  • Calculations: evaluation of mathematical expressions
Key characteristics:
  • Run independently of research
  • Show structured UI cards while the answer is being generated
  • Provide helpful context but are not cited as sources
  • Executed by src/lib/agents/search/widgets/executor.ts
Widgets complete quickly and appear in the UI before the final answer, giving users immediate value.
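A minimal sketch of this fan-out, assuming the boolean flags from the classification step (the real executor in src/lib/agents/search/widgets/executor.ts will differ in detail):

```typescript
// Hypothetical sketch of widget execution; not the project's actual code.
interface WidgetFlags {
  showWeatherWidget: boolean;
  showStockWidget: boolean;
  showCalculationWidget: boolean;
}

// Pure helper: which widgets did the classifier enable?
function enabledWidgets(flags: WidgetFlags): string[] {
  const names: string[] = [];
  if (flags.showWeatherWidget) names.push('weather');
  if (flags.showStockWidget) names.push('stocks');
  if (flags.showCalculationWidget) names.push('calculation');
  return names;
}

// All enabled widgets run concurrently, independently of research.
async function runWidgets(
  flags: WidgetFlags,
  run: (name: string) => Promise<unknown>,
): Promise<unknown[]> {
  return Promise.all(enabledWidgets(flags).map(run));
}
```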

Research

If the classification calls for research, the researcher gathers information in the background. Its capabilities (src/lib/agents/search/researcher/actions/) include:
  • Web search: General information via SearXNG meta-search
  • Academic search: Scholarly papers and research articles
  • Social search: Discussion forums and community insights
  • Upload search: Semantic search over user-uploaded files (PDFs, documents)
  • URL scraping: Direct content extraction from specific URLs
How research works:
  1. The researcher receives the standalone query and enabled sources
  2. It selects appropriate tools based on the classification
  3. Tools run and gather relevant content
  4. Results are deduplicated and ranked
  5. A set of “search findings” is returned with metadata (title, URL, content)
Location: src/lib/agents/search/researcher/index.ts
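Step 4 (deduplication) can be sketched as a simple URL-keyed filter. The Finding shape and function below are illustrative, not the researcher's actual code:

```typescript
// Hypothetical shape of one search finding, per the metadata listed above.
interface Finding {
  title: string;
  url: string;
  content: string;
}

// Deduplicate results gathered by multiple tools, keeping the first
// (highest-ranked) occurrence of each URL.
function dedupeByUrl(findings: Finding[]): Finding[] {
  const seen = new Set<string>();
  return findings.filter((f) => {
    if (seen.has(f.url)) return false;
    seen.add(f.url);
    return true;
  });
}
```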

Step 3: Answer generation

Once Perplexica has enough context from research and widgets, it generates the final response.

Context assembly

The system combines two types of context:
<search_results note="These are the search results; the assistant can cite these">
  <result index="1" title="...">
    Content from web search...
  </result>
  <result index="2" title="...">
    Content from academic paper...
  </result>
</search_results>
These can and should be cited in the answer.
<widgets_result noteForAssistant="Its output is already shown to the user; the assistant can use this information to answer the query but must not cite it as a source">
  <result>
    Weather data...
  </result>
</widgets_result>
These provide context but should not be cited.
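Assembling that tagged context is straightforward string building. The sketch below mirrors the structure shown above; it is hypothetical, not the project's actual template code:

```typescript
// Hypothetical finding shape, per the researcher's metadata.
interface SearchFinding {
  title: string;
  url: string;
  content: string;
}

// Sketch of context assembly: each finding is wrapped in an indexed
// <result> tag so the writer can later cite it by number.
function buildSearchContext(findings: SearchFinding[]): string {
  const results = findings
    .map(
      (f, i) =>
        `  <result index="${i + 1}" title="${f.title}">\n    ${f.content}\n  </result>`,
    )
    .join('\n');
  return `<search_results>\n${results}\n</search_results>`;
}
```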

Optimization modes

You can control the tradeoff between speed and quality using optimizationMode:
  • Speed: Fast responses with lighter processing
  • Balanced: Default mode balancing speed and thoroughness
  • Quality: Deep analysis with more comprehensive answers
The mode affects:
  • How much context is gathered
  • The complexity of the writer prompt
  • Model parameters (temperature, top-p, etc.)
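As an illustration only (the actual per-mode values live in the codebase and are not documented on this page), the mapping might look like:

```typescript
type OptimizationMode = 'speed' | 'balanced' | 'quality';

// Illustrative numbers only; the real per-mode parameters are assumptions
// here, chosen to show the shape of the tradeoff.
function modelParams(mode: OptimizationMode): {
  maxSources: number;
  temperature: number;
} {
  switch (mode) {
    case 'speed':
      return { maxSources: 5, temperature: 0.7 };
    case 'balanced':
      return { maxSources: 10, temperature: 0.7 };
    case 'quality':
      return { maxSources: 20, temperature: 0.5 };
  }
}
```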

Streaming response

The answer is streamed to the user in real-time:
  1. Writer prompt is constructed with search context and system instructions
  2. LLM begins generating the response
  3. Each chunk is emitted to the user via Server-Sent Events (SSE)
  4. The UI updates progressively as text arrives
  5. When complete, the full response is saved to the database
Location: src/lib/agents/search/index.ts:122-166
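Each chunk travels as a Server-Sent Event. A minimal sketch of the wire format (the event name and payload shape are assumptions, not the actual protocol):

```typescript
// Format one SSE event: an event line, a data line, and a blank-line
// terminator, per the standard text/event-stream framing.
function sseChunk(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```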

How citations work

Perplexica prompts the model to cite the references it uses. The citation system works as follows:
  1. Source numbering: Each search result has an index
  2. Inline citations: The model references sources by index (e.g., [1], [2])
  3. UI rendering: Citations are rendered as clickable links alongside the answer
  4. Supporting links: Each citation links to the original source
The writer prompt explicitly instructs the model to cite sources. This is handled by the prompt template in src/lib/prompts/search/writer.ts.
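On the UI side, turning inline [n] markers into links can be sketched like this; the function and source shape are hypothetical, not the app's actual renderer:

```typescript
// Replace each [n] marker with a link to the n-th source (1-indexed).
// Markers with no matching source are left untouched.
function renderCitations(answer: string, sources: { url: string }[]): string {
  return answer.replace(/\[(\d+)\]/g, (match: string, n: string) => {
    const src = sources[Number(n) - 1];
    return src ? `<a href="${src.url}">[${n}]</a>` : match;
  });
}
```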

Media search (images and videos)

Image and video search use separate, specialized endpoints:

How it differs

  • Endpoints: POST /api/images and POST /api/videos
  • Process:
    1. Generate a focused query using the chat model
    2. Fetch matching results from the search backend
    3. Return structured media results
  • No research phase: These are pure search operations
  • No citations: Results are displayed as a gallery
Location: src/lib/agents/media/image.ts and src/lib/agents/media/video.ts
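A minimal client-side call might look like the sketch below; the request body field (query) is an assumption, so check the API Reference for the actual schema:

```typescript
// Hypothetical client helper for the image endpoint; the body field
// name "query" is an assumption, not the documented schema.
async function searchImages(query: string): Promise<unknown> {
  const res = await fetch('/api/images', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  // Results come back as a structured list for gallery rendering.
  return res.json();
}
```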

Search API for integrations

If you’re integrating Perplexica into another product, use POST /api/search.

Response format

{
  "message": "The generated answer with citations...",
  "sources": [
    {
      "title": "Source Title",
      "url": "https://example.com",
      "snippet": "Relevant excerpt..."
    }
  ]
}
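For TypeScript integrations, the non-streaming response can be typed directly from the example above (the helper function is hypothetical):

```typescript
// Types derived from the response format shown above.
interface SearchSource {
  title: string;
  url: string;
  snippet: string;
}

interface SearchResponse {
  message: string;
  sources: SearchSource[];
}

// Hypothetical convenience helper: list the cited URLs.
function sourceUrls(res: SearchResponse): string[] {
  return res.sources.map((s) => s.url);
}
```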

Streaming mode

Enable streaming by setting stream: true in your request:
fetch('/api/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'How does quantum computing work?',
    stream: true
  })
});
The response will be streamed as Server-Sent Events.
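Parsing that stream on the client can be sketched as a small buffer-and-split parser. This is a sketch assuming standard SSE framing (events separated by a blank line); the payload shapes are not documented here:

```typescript
// Extract the data payload from one SSE event block (pure helper).
function parseSseData(event: string): string | null {
  const line = event.split('\n').find((l) => l.startsWith('data: '));
  return line ? line.slice('data: '.length) : null;
}

// Feed decoded text chunks in as they arrive; complete events are
// separated by a blank line, and partial events stay buffered.
function createSseParser(onData: (data: string) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? '';
    for (const evt of events) {
      const data = parseSseData(evt);
      if (data !== null) onData(data);
    }
  };
}
```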

Complete flow diagram

Here’s the complete flow from question to answer:
  User question → POST /api/chat → classification → widget execution and research (in parallel) → answer generation → streamed, cited response

Performance optimizations

Perplexica is designed for speed:
  • Parallel execution: Widgets and research run simultaneously
  • Streaming: Users see responses as they’re generated
  • Efficient prompts: Prompts are optimized per mode
  • Caching: Provider connections are reused
  • Database indexes: Fast chat and message lookups

Next steps

  • Architecture: deep dive into components and code structure
  • API Reference: integrate Perplexica into your applications
