This guide explains what happens when you ask Perplexica a question, from the moment you send a message to when you receive a cited, intelligent answer. For component-level architecture details, see Architecture.

Overview

When you send a message in the UI, the app calls POST /api/chat. At a high level, three things happen:
  1. Classify the question: decide what to do next based on the query
  2. Run research and widgets in parallel: gather information and structured data simultaneously
  3. Write the final answer: generate a response with citations
Let’s walk through each step in detail.

Step 1: Classification

Before searching or answering, Perplexica runs a classification step to understand the question and plan the response.

What the classifier decides

The classifier (src/lib/agents/search/classifier.ts) analyzes the query and determines:
  • Should we do research? Some questions don’t need web search (e.g., “What did we discuss earlier?”)
  • Which widgets are relevant? Weather, stocks, or calculations
  • What sources to use? Web, academic papers, or discussions
  • How to rewrite the query into a clearer, standalone form that works without conversation context
The classifier uses a structured schema with boolean flags for each decision. This ensures consistent, predictable behavior.

Classification output example

{
  "classification": {
    "skipSearch": false,
    "personalSearch": false,
    "academicSearch": true,
    "discussionSearch": false,
    "showWeatherWidget": false,
    "showStockWidget": false,
    "showCalculationWidget": false
  },
  "standaloneFollowUp": "How does quantum entanglement work?"
}
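This shape can be modeled with plain TypeScript types. The sketch below mirrors the example output above; the interface and helper names are illustrative, not the project's actual definitions:

```typescript
// Sketch of the classifier's output shape; field names mirror the
// example above, but these types are hypothetical, not the project's.
interface Classification {
  skipSearch: boolean;
  personalSearch: boolean;
  academicSearch: boolean;
  discussionSearch: boolean;
  showWeatherWidget: boolean;
  showStockWidget: boolean;
  showCalculationWidget: boolean;
}

interface ClassifierOutput {
  classification: Classification;
  standaloneFollowUp: string;
}

// Because every decision is a boolean flag, downstream code can branch
// without parsing free-form model text.
function needsResearch(out: ClassifierOutput): boolean {
  return !out.classification.skipSearch;
}

const example: ClassifierOutput = {
  classification: {
    skipSearch: false,
    personalSearch: false,
    academicSearch: true,
    discussionSearch: false,
    showWeatherWidget: false,
    showStockWidget: false,
    showCalculationWidget: false,
  },
  standaloneFollowUp: 'How does quantum entanglement work?',
};
```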

Step 2: Parallel execution

After classification, Perplexica runs two processes in parallel for optimal performance:

Widgets

Widgets are small, structured helpers that provide real-time data:

  • Weather: current conditions and forecasts based on location
  • Stocks: real-time market data and stock prices
  • Calculations: evaluation of mathematical expressions
Key characteristics:
  • Run independently of research
  • Show structured UI cards while the answer is being generated
  • Provide helpful context but are not cited as sources
  • Executed by src/lib/agents/search/widgets/executor.ts
Widgets complete quickly and appear in the UI before the final answer, giving users immediate value.
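A minimal sketch of this fan-out, assuming the boolean flags from the classification step (the real executor in src/lib/agents/search/widgets/executor.ts will differ in detail):

```typescript
// Hypothetical sketch of widget execution; not the project's actual code.
interface WidgetFlags {
  showWeatherWidget: boolean;
  showStockWidget: boolean;
  showCalculationWidget: boolean;
}

// Pure helper: which widgets did the classifier enable?
function enabledWidgets(flags: WidgetFlags): string[] {
  const names: string[] = [];
  if (flags.showWeatherWidget) names.push('weather');
  if (flags.showStockWidget) names.push('stocks');
  if (flags.showCalculationWidget) names.push('calculation');
  return names;
}

// All enabled widgets run concurrently, independently of research.
async function runWidgets(
  flags: WidgetFlags,
  run: (name: string) => Promise<unknown>,
): Promise<unknown[]> {
  return Promise.all(enabledWidgets(flags).map(run));
}
```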

Research

If the classification calls for research, the researcher gathers information in the background. Its capabilities (src/lib/agents/search/researcher/actions/) include:
  • Web search: General information via SearXNG meta-search
  • Academic search: Scholarly papers and research articles
  • Social search: Discussion forums and community insights
  • Upload search: Semantic search over user-uploaded files (PDFs, documents)
  • URL scraping: Direct content extraction from specific URLs
How research works:
  1. The researcher receives the standalone query and enabled sources
  2. It selects appropriate tools based on the classification
  3. Tools run and gather relevant content
  4. Results are deduplicated and ranked
  5. A set of “search findings” is returned with metadata (title, URL, content)
Location: src/lib/agents/search/researcher/index.ts
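Step 4 (deduplication) can be sketched as a simple URL-keyed filter. The Finding shape and function below are illustrative, not the researcher's actual code:

```typescript
// Hypothetical shape of one search finding, per the metadata listed above.
interface Finding {
  title: string;
  url: string;
  content: string;
}

// Deduplicate results gathered by multiple tools, keeping the first
// (highest-ranked) occurrence of each URL.
function dedupeByUrl(findings: Finding[]): Finding[] {
  const seen = new Set<string>();
  return findings.filter((f) => {
    if (seen.has(f.url)) return false;
    seen.add(f.url);
    return true;
  });
}
```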

Step 3: Answer generation

Once Perplexica has enough context from research and widgets, it generates the final response.

Context assembly

The system combines two types of context:
<search_results note="These are the search results; the assistant can cite these">
  <result index="1" title="...">
    Content from web search...
  </result>
  <result index="2" title="...">
    Content from academic paper...
  </result>
</search_results>
These can and should be cited in the answer.
<widgets_result noteForAssistant="Its output is already shown to the user; the assistant can use this information to answer the query but must not cite it as a source">
  <result>
    Weather data...
  </result>
</widgets_result>
These provide context but should not be cited.
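Assembling that tagged context is straightforward string building. The sketch below mirrors the structure shown above; it is hypothetical, not the project's actual template code:

```typescript
// Hypothetical finding shape, per the researcher's metadata.
interface SearchFinding {
  title: string;
  url: string;
  content: string;
}

// Sketch of context assembly: each finding is wrapped in an indexed
// <result> tag so the writer can later cite it by number.
function buildSearchContext(findings: SearchFinding[]): string {
  const results = findings
    .map(
      (f, i) =>
        `  <result index="${i + 1}" title="${f.title}">\n    ${f.content}\n  </result>`,
    )
    .join('\n');
  return `<search_results>\n${results}\n</search_results>`;
}
```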

Optimization modes

You can control the tradeoff between speed and quality using optimizationMode:
  • Speed: Fast responses with lighter processing
  • Balanced: Default mode balancing speed and thoroughness
  • Quality: Deep analysis with more comprehensive answers
The mode affects:
  • How much context is gathered
  • The complexity of the writer prompt
  • Model parameters (temperature, top-p, etc.)
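As an illustration only (the actual per-mode values live in the codebase and are not documented on this page), the mapping might look like:

```typescript
type OptimizationMode = 'speed' | 'balanced' | 'quality';

// Illustrative numbers only; the real per-mode parameters are assumptions
// here, chosen to show the shape of the tradeoff.
function modelParams(mode: OptimizationMode): {
  maxSources: number;
  temperature: number;
} {
  switch (mode) {
    case 'speed':
      return { maxSources: 5, temperature: 0.7 };
    case 'balanced':
      return { maxSources: 10, temperature: 0.7 };
    case 'quality':
      return { maxSources: 20, temperature: 0.5 };
  }
}
```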

Streaming response

The answer is streamed to the user in real-time:
  1. Writer prompt is constructed with search context and system instructions
  2. LLM begins generating the response
  3. Each chunk is emitted to the user via Server-Sent Events (SSE)
  4. The UI updates progressively as text arrives
  5. When complete, the full response is saved to the database
Location: src/lib/agents/search/index.ts:122-166
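Each chunk travels as a Server-Sent Event. A minimal sketch of the wire format (the event name and payload shape are assumptions, not the actual protocol):

```typescript
// Format one SSE event: an event line, a data line, and a blank-line
// terminator, per the standard text/event-stream framing.
function sseChunk(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```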

How citations work

Perplexica prompts the model to cite the references it uses. The citation system works as follows:
  1. Source numbering: Each search result has an index
  2. Inline citations: The model references sources by index (e.g., [1], [2])
  3. UI rendering: Citations are rendered as clickable links alongside the answer
  4. Supporting links: Each citation links to the original source
The writer prompt explicitly instructs the model to cite sources. This is handled by the prompt template in src/lib/prompts/search/writer.ts.
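On the UI side, turning inline [n] markers into links can be sketched like this; the function and source shape are hypothetical, not the app's actual renderer:

```typescript
// Replace each [n] marker with a link to the n-th source (1-indexed).
// Markers with no matching source are left untouched.
function renderCitations(answer: string, sources: { url: string }[]): string {
  return answer.replace(/\[(\d+)\]/g, (match: string, n: string) => {
    const src = sources[Number(n) - 1];
    return src ? `<a href="${src.url}">[${n}]</a>` : match;
  });
}
```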

Media search (images and videos)

Image and video search use separate, specialized endpoints:

How it differs

  • Endpoints: POST /api/images and POST /api/videos
  • Process:
    1. Generate a focused query using the chat model
    2. Fetch matching results from the search backend
    3. Return structured media results
  • No research phase: These are pure search operations
  • No citations: Results are displayed as a gallery
Location: src/lib/agents/media/image.ts and src/lib/agents/media/video.ts
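A minimal client-side call might look like the sketch below; the request body field (query) is an assumption, so check the API Reference for the actual schema:

```typescript
// Hypothetical client helper for the image endpoint; the body field
// name "query" is an assumption, not the documented schema.
async function searchImages(query: string): Promise<unknown> {
  const res = await fetch('/api/images', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  // Results come back as a structured list for gallery rendering.
  return res.json();
}
```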

Search API for integrations

If you’re integrating Perplexica into another product, use POST /api/search.

Response format

{
  "message": "The generated answer with citations...",
  "sources": [
    {
      "title": "Source Title",
      "url": "https://example.com",
      "snippet": "Relevant excerpt..."
    }
  ]
}
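For TypeScript integrations, the non-streaming response can be typed directly from the example above (the helper function is hypothetical):

```typescript
// Types derived from the response format shown above.
interface SearchSource {
  title: string;
  url: string;
  snippet: string;
}

interface SearchResponse {
  message: string;
  sources: SearchSource[];
}

// Hypothetical convenience helper: list the cited URLs.
function sourceUrls(res: SearchResponse): string[] {
  return res.sources.map((s) => s.url);
}
```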

Streaming mode

Enable streaming by setting stream: true in your request:
fetch('/api/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'How does quantum computing work?',
    stream: true
  })
});
The response will be streamed as Server-Sent Events.
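Parsing that stream on the client can be sketched as a small buffer-and-split parser. This is a sketch assuming standard SSE framing (events separated by a blank line); the payload shapes are not documented here:

```typescript
// Extract the data payload from one SSE event block (pure helper).
function parseSseData(event: string): string | null {
  const line = event.split('\n').find((l) => l.startsWith('data: '));
  return line ? line.slice('data: '.length) : null;
}

// Feed decoded text chunks in as they arrive; complete events are
// separated by a blank line, and partial events stay buffered.
function createSseParser(onData: (data: string) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? '';
    for (const evt of events) {
      const data = parseSseData(evt);
      if (data !== null) onData(data);
    }
  };
}
```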

Complete flow diagram

Here’s the complete flow from question to answer:
  User question → POST /api/chat → classification → widget execution and research (in parallel) → answer generation → streamed, cited response

Performance optimizations

Perplexica is designed for speed:
  • Parallel execution: Widgets and research run simultaneously
  • Streaming: Users see responses as they’re generated
  • Efficient prompts: Prompts are optimized per mode
  • Caching: Provider connections are reused
  • Database indexes: Fast chat and message lookups

Next steps

  • Architecture: deep dive into components and code structure
  • API Reference: integrate Perplexica into your applications
