Prism is built to handle more than text documents. You can upload images, source code in dozens of languages, PDFs, and Word documents. All content types are processed into semantic chunks, embedded, and indexed — so everything is searchable by meaning and available as RAG context.

Supported content types

Category   Formats
Documents  PDF, DOCX, MD, TXT
Code       js, jsx, ts, tsx, py, java, cpp, c, h, hpp, cs, rb, go, rs, php, swift, kt, scala, r, css, scss, sass, html, xml, json, yaml, yml, sql, sh, bash, ps1, bat, cmake, dockerfile
Images     jpg, jpeg, png, gif, webp, bmp, svg
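
As an illustration, routing an upload by extension could look like the sketch below. The set contents mirror the table above; the function and constant names are hypothetical, not Prism's actual code.

```typescript
// Hypothetical extension → category lookup (names are assumptions).
const DOC_EXTS = new Set(["pdf", "docx", "md", "txt"]);
const CODE_EXTS = new Set([
  "js", "jsx", "ts", "tsx", "py", "java", "cpp", "c", "h", "hpp", "cs", "rb",
  "go", "rs", "php", "swift", "kt", "scala", "r", "css", "scss", "sass",
  "html", "xml", "json", "yaml", "yml", "sql", "sh", "bash", "ps1", "bat",
  "cmake", "dockerfile",
]);
const IMAGE_EXTS = new Set(["jpg", "jpeg", "png", "gif", "webp", "bmp", "svg"]);

function categorize(filename: string): "document" | "code" | "image" | "unsupported" {
  // Lowercase the final extension so "Main.RS" and "main.rs" match alike.
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  if (DOC_EXTS.has(ext)) return "document";
  if (CODE_EXTS.has(ext)) return "code";
  if (IMAGE_EXTS.has(ext)) return "image";
  return "unsupported";
}
```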

How each type is processed

PDF files are parsed page by page using pdf2json. Each page’s text content is extracted and concatenated with double newlines between pages. The resulting text is then split into overlapping chunks of up to 1,000 characters with a 200-character overlap, so context is not lost at chunk boundaries.
Scanned PDFs (image-only, no embedded text layer) cannot be extracted this way. Use an OCR tool to add a text layer before uploading.
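
In recent versions of pdf2json, the parsed output exposes pages as `Pages[].Texts[].R[].T`, with each text run URI-encoded. A minimal sketch of the page-joining step described above (the helper name, the simplified types, and the single space joined between text fragments are assumptions, not Prism's implementation):

```typescript
// Simplified shape of pdf2json's parsed output: pages contain text items,
// each holding runs whose text is URI-encoded in `T`.
interface PdfRun { T: string }
interface PdfText { R: PdfRun[] }
interface PdfPage { Texts: PdfText[] }

// Decode every run, join text items within a page, and separate pages
// with double newlines, as the extraction step does.
function pagesToText(pages: PdfPage[]): string {
  return pages
    .map((page) =>
      page.Texts
        .map((t) => t.R.map((r) => decodeURIComponent(r.T)).join(""))
        .join(" ")
    )
    .join("\n\n");
}
```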

The processing pipeline

Regardless of file type, all content goes through the same downstream pipeline once text is extracted:
1. Extract text: text is extracted from the file using the appropriate method for its format. For images, Gemini Vision generates the description.
2. Chunk: the text is split into overlapping chunks. The default chunk size is 1,000 characters with a 200-character overlap. Sentence and paragraph boundaries are respected where possible to avoid splitting mid-thought.
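
A boundary-aware chunker along these lines might look like the following. This is a sketch under the stated defaults, not Prism's implementation; the rule for preferring paragraph and sentence breaks is simplified.

```typescript
// Split text into overlapping chunks of at most `size` characters,
// stepping back `overlap` characters between chunks and preferring to
// break at a paragraph ("\n\n") or sentence (". ") boundary when one
// falls in the second half of the window.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    if (end < text.length) {
      const window = text.slice(start, end);
      const breakAt = Math.max(
        window.lastIndexOf("\n\n"),
        window.lastIndexOf(". ")
      );
      // Only honor a boundary past the window's midpoint, so chunks
      // never shrink below half the target size.
      if (breakAt > size / 2) end = start + breakAt + 1;
    }
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlap; // step back to create the overlap
  }
  return chunks;
}
```

Because each chunk begins 200 characters before the previous one ended, a sentence straddling a boundary appears whole in at least one chunk.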
3. Embed: each chunk is converted to a 768-dimensional vector by the Gemini text-embedding-004 model. Chunks are processed in batches of up to 100 at a time.
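
The batching itself reduces to slicing the chunk list into groups of at most 100 before each embedding call. A generic sketch (the actual Gemini request is omitted):

```typescript
// Split an array into consecutive batches of at most `batchSize` items.
// Each batch would be sent to text-embedding-004 in a single request.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```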
4. Index: the vectors are upserted into Qdrant with metadata including document name, type, category, chunk index, and upload date. All content is then searchable.
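
Qdrant's upsert API accepts points of the form `{ id, vector, payload }`. A sketch of assembling them from the embedding output (the payload field names are assumptions about Prism's schema, not its actual metadata keys):

```typescript
// Hypothetical per-chunk metadata; field names are assumptions.
interface ChunkPayload {
  documentName: string;
  type: string;                              // file extension, e.g. "pdf"
  category: "document" | "code" | "image";
  chunkIndex: number;
  uploadDate: string;                        // ISO 8601
}

// Pair each 768-dimensional vector with its metadata in the
// { id, vector, payload } shape Qdrant's upsert endpoint expects.
function toPoints(vectors: number[][], payloads: ChunkPayload[]) {
  return vectors.map((vector, i) => ({
    id: i,          // a stable UUID would be used in practice
    vector,
    payload: payloads[i],
  }));
}
```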

Image search in practice

Because image descriptions cover visual attributes — subjects, colors, setting, mood, text, and logos — you can search for images using descriptive phrases:
  • “a dark background with white text showing a terminal output”
  • “a flowchart showing a login sequence”
  • “photo of a whiteboard with handwritten architecture diagram”
The same description is available as context when you chat, so you can ask questions like “what does the diagram in architecture.png show?” and get a substantive answer.
If Gemini Vision is temporarily unavailable during upload, Prism retries up to three times with exponential backoff before failing the image ingestion.
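
A retry wrapper of that shape could be sketched as follows; the base delay and the doubling backoff factor are assumptions, not Prism's actual values.

```typescript
// Run `fn`, retrying on failure up to `attempts` times with exponential
// backoff (baseMs, 2*baseMs, 4*baseMs, ...). Rethrows the last error
// once all attempts are exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseMs = 500
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (e) {
      lastErr = e;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```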
