
LlamaIndex Adapter

LlamaIndex is a framework for building LLM-powered applications that helps you ingest, structure, and access private or domain-specific data. LlamaIndex.TS brings the core features of LlamaIndex for Python to popular runtimes such as Node.js (official support), Vercel Edge Functions (experimental), and Deno (experimental).

Installation

llamaindex is a required peer dependency. Install it alongside the adapter: npm install @ai-sdk/llamaindex llamaindex

Features

  • Transform LlamaIndex ChatEngine and QueryEngine streams to AI SDK UIMessageStream
  • Seamless integration with AI SDK UI components like useCompletion
  • Support for RAG (Retrieval Augmented Generation) workflows
  • Compatible with LlamaIndex’s document processing and indexing capabilities

Example: Completion

Here is a basic example that uses the AI SDK and LlamaIndex together with the Next.js App Router. The @ai-sdk/llamaindex package takes the stream returned by calling the chat method on a LlamaIndex ChatEngine, or the query method on a QueryEngine, and pipes the text to the client.
import { OpenAI, SimpleChatEngine } from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const llm = new OpenAI({ model: 'gpt-4o' });
  const chatEngine = new SimpleChatEngine({ llm });

  const stream = await chatEngine.chat({
    message: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}
Then, use the AI SDK's useCompletion hook in the page component to handle the completion:
'use client';

import { useCompletion } from '@ai-sdk/react';

export default function Chat() {
  const { completion, input, handleInputChange, handleSubmit } =
    useCompletion();

  return (
    <div>
      {completion}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}

Example: RAG with QueryEngine

LlamaIndex excels at building RAG applications. Here’s an example using a QueryEngine with document indexing:
import {
  OpenAI,
  VectorStoreIndex,
  SimpleDirectoryReader,
} from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

// Create once and reuse across requests (module-level cache)
let queryEngine: ReturnType<VectorStoreIndex['asQueryEngine']> | null = null;

async function getQueryEngine() {
  if (!queryEngine) {
    // Load documents from a directory
    const reader = new SimpleDirectoryReader();
    const documents = await reader.loadData('./data');

    // Create index from documents
    const index = await VectorStoreIndex.fromDocuments(documents);

    // Create query engine
    queryEngine = index.asQueryEngine({
      llm: new OpenAI({ model: 'gpt-4o' }),
    });
  }
  return queryEngine;
}

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const engine = await getQueryEngine();

  const stream = await engine.query({
    query: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}

Example: Chat with Context

Build a conversational interface with document context:
import {
  OpenAI,
  VectorStoreIndex,
  ContextChatEngine,
  SimpleDirectoryReader,
} from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

let chatEngine: ContextChatEngine | null = null;

async function getChatEngine() {
  if (!chatEngine) {
    const reader = new SimpleDirectoryReader();
    const documents = await reader.loadData('./data');
    const index = await VectorStoreIndex.fromDocuments(documents);

    // Create a chat engine with context
    chatEngine = new ContextChatEngine({
      retriever: index.asRetriever(),
      llm: new OpenAI({ model: 'gpt-4o' }),
    });
  }
  return chatEngine;
}

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const engine = await getChatEngine();

  const stream = await engine.chat({
    message: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}
Use with the useCompletion hook on the client:
'use client';

import { useCompletion } from '@ai-sdk/react';

export default function ChatWithContext() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } =
    useCompletion({
      api: '/api/chat',
    });

  return (
    <div>
      <div className="response">
        {completion || 'Ask a question about your documents...'}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          disabled={isLoading}
          placeholder="What would you like to know?"
        />
        <button type="submit" disabled={isLoading}>
          {isLoading ? 'Thinking...' : 'Ask'}
        </button>
      </form>
    </div>
  );
}

API Reference

toUIMessageStream(stream)

Converts a LlamaIndex ChatEngine or QueryEngine stream to an AI SDK UIMessageStream.
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

const stream = await chatEngine.chat({
  message: prompt,
  stream: true,
});

return createUIMessageStreamResponse({
  stream: toUIMessageStream(stream),
});
Parameters:
  • stream: AsyncIterable - Stream from LlamaIndex ChatEngine or QueryEngine
Returns: ReadableStream<UIMessageChunk>
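Conceptually, the adapter consumes the engine's async-iterable stream, pulls the incremental delta text out of each chunk, and re-emits it as UI message parts. The sketch below is a simplified mental model only, not the adapter's implementation: the field names are illustrative, and it returns an async generator for brevity where the real function returns a ReadableStream.

```typescript
// Simplified mental model of toUIMessageStream (illustrative, not the real code):
// read each chunk's delta and forward it as a text part.
type EngineChunk = { delta: string };
type TextPart = { type: 'text-delta'; text: string };

async function* toTextParts(
  stream: AsyncIterable<EngineChunk>,
): AsyncGenerator<TextPart> {
  for await (const chunk of stream) {
    yield { type: 'text-delta', text: chunk.delta };
  }
}

// A mock LlamaIndex-style stream for demonstration:
async function* mockStream() {
  yield { delta: 'Hello, ' };
  yield { delta: 'world!' };
}
```

Because the transform is a straight pass-through of delta text, anything the engine streams (a plain chat reply or a RAG-augmented answer) reaches the client the same way.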

Integration with LlamaIndex Features

The adapter works seamlessly with LlamaIndex’s powerful features:

Document Loaders

  • Load documents from various sources (files, URLs, databases)
  • Support for multiple file formats (PDF, Markdown, JSON, etc.)
  • Custom document readers

Vector Stores

  • In-memory vector storage
  • Integration with external vector databases
  • Efficient similarity search
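As a rough illustration of what in-memory vector storage with similarity search involves, here is a toy sketch. It is not LlamaIndex's implementation, which also handles embedding generation, persistence, and document metadata.

```typescript
// Toy in-memory vector store with cosine-similarity search.
type Entry = { id: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class InMemoryVectorStore {
  private entries: Entry[] = [];

  add(entry: Entry): void {
    this.entries.push(entry);
  }

  // Return the k entries most similar to the query vector.
  search(query: number[], k: number): Entry[] {
    return [...this.entries]
      .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
      .slice(0, k);
  }
}
```

External vector databases replace the linear scan above with approximate nearest-neighbor indexes, which is what makes similarity search efficient at scale.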

Retrievers

  • Vector similarity retrieval
  • Keyword-based retrieval
  • Hybrid retrieval strategies
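A hybrid strategy combines the rankings of multiple retrievers. The toy sketch below merges vector-similarity and keyword scores with a weighting factor; it illustrates the general idea and is independent of LlamaIndex's actual API.

```typescript
// Toy hybrid ranking: weighted combination of two retrievers' scores.
type Scored = { id: string; score: number };

function hybridRank(
  vector: Scored[],
  keyword: Scored[],
  alpha = 0.5, // weight given to the vector score; 1 - alpha goes to keywords
): Scored[] {
  const merged = new Map<string, number>();
  for (const { id, score } of vector) {
    merged.set(id, alpha * score);
  }
  for (const { id, score } of keyword) {
    merged.set(id, (merged.get(id) ?? 0) + (1 - alpha) * score);
  }
  return [...merged.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Tuning alpha trades semantic recall (vector) against exact-term precision (keyword), which is why hybrid retrieval often outperforms either strategy alone.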

Query Engines

  • Simple query engine for basic RAG
  • Sub-question query engine for complex queries
  • Custom query engines

Chat Engines

  • Simple chat engine
  • Context chat engine with retrieval
  • Condense question chat engine

More Examples

create-llama is the easiest way to get started with LlamaIndex: run npx create-llama@latest to scaffold a project. All of its generated code uses the AI SDK to connect to LlamaIndex.
