In this guide, we go through several examples of how to use gr.ChatInterface with popular LLM libraries and API providers.
We will cover the following libraries and API providers:
- Llama Index
- LangChain
- OpenAI
- Hugging Face transformers
- SambaNova
- Hyperbolic
- Anthropic’s Claude
For many LLM libraries and providers, there exist community-maintained integration libraries that make it even easier to spin up Gradio apps. We reference these libraries in the appropriate sections below.
Llama Index
Let’s start by using llama-index on top of openai to build a RAG chatbot over any text or PDF files, which you can demo and share in fewer than 30 lines of code. You’ll need an OpenAI key for this example (keep reading for the free, open-source equivalent!):
import gradio as gr
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
# Configure the LLM and embedding model used by LlamaIndex
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Load every file in the "data" folder and build a vector index over it
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
def chat(message, history):
    chat_engine = index.as_chat_engine()
    response = chat_engine.stream_chat(message)
    # Yield the accumulated text so the UI updates as tokens arrive
    partial_response = ""
    for token in response.response_gen:
        partial_response += token
        yield partial_response
demo = gr.ChatInterface(
    fn=chat,
    title="Chat with your documents using LlamaIndex",
    description="Upload your documents in the 'data' folder and start chatting!",
)
if __name__ == "__main__":
    demo.launch()
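The chat function above is a generator, and each value it yields replaces the message shown in the UI, which is why it yields the accumulated text rather than individual tokens. Here is a minimal sketch of that pattern in isolation; fake_token_stream and accumulate are hypothetical names, with fake_token_stream standing in for a real token source such as response.response_gen:

```python
def fake_token_stream():
    # Hypothetical stand-in for a real token generator such as response.response_gen
    yield from ["Grad", "io ", "streams ", "text"]


def accumulate(tokens):
    """Yield the text received so far after each new token."""
    partial_response = ""
    for token in tokens:
        partial_response += token
        yield partial_response


# Each snapshot is what the chat window would display at that point
snapshots = list(accumulate(fake_token_stream()))
```

The same accumulate-and-yield loop appears in every example in this guide; only the source of the tokens changes.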
LangChain
Here’s an example using langchain on top of openai to build a general-purpose chatbot. As before, you’ll need to have an OpenAI key for this example:
import gradio as gr
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage
llm = ChatOpenAI(model="gpt-4o-mini")
def chat(message, history):
    # Convert history to LangChain message format
    lc_history = []
    for msg in history:
        if msg["role"] == "user":
            lc_history.append(HumanMessage(content=msg["content"][0]["text"]))
        else:
            lc_history.append(AIMessage(content=msg["content"][0]["text"]))
    # Add current message
    lc_history.append(HumanMessage(content=message))
    # Get response
    response = llm.stream(lc_history)
    partial_response = ""
    for chunk in response:
        partial_response += chunk.content
        yield partial_response
demo = gr.ChatInterface(
    fn=chat,
    title="LangChain Chatbot",
)
if __name__ == "__main__":
    demo.launch()
For quick prototyping, the community-maintained langchain-gradio repo makes it even easier to build chatbots on top of LangChain.
OpenAI
Of course, we could also use the openai library directly. Here’s the equivalent of the LangChain example, again streaming the response:
import gradio as gr
from openai import OpenAI
client = OpenAI()
def chat(message, history):
    # Convert history to OpenAI message format
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"][0]["text"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"][0]["text"]})
    # Add current message
    messages.append({"role": "user", "content": message})
    # Get streaming response
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    partial_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            partial_response += chunk.choices[0].delta.content
            yield partial_response
demo = gr.ChatInterface(
    fn=chat,
    title="OpenAI Chatbot",
)
if __name__ == "__main__":
    demo.launch()
For quick prototyping, the openai-gradio library makes it even easier to build chatbots on top of OpenAI models.
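The history-conversion loop recurs in most examples in this guide, so it can be factored out. Here is a sketch, assuming history entries have the shape used above (a role plus a list of content parts whose first part carries the text); history_to_messages is a hypothetical helper name, not part of Gradio or the openai library:

```python
def history_to_messages(message, history, system_prompt=None):
    """Build an OpenAI-style message list from Gradio chat history."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    for msg in history:
        # Anything that is not a user turn is treated as an assistant turn
        role = "user" if msg["role"] == "user" else "assistant"
        messages.append({"role": role, "content": msg["content"][0]["text"]})
    # The current message is passed separately from history, so append it last
    messages.append({"role": "user", "content": message})
    return messages


example_history = [
    {"role": "user", "content": [{"type": "text", "text": "What is a generator?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "A function that yields values."}]},
]
messages = history_to_messages(
    "Can you show an example?", example_history, "You are a helpful assistant."
)
```

Leaving system_prompt as None omits the system message, which suits APIs that take the system prompt as a separate argument.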
Hugging Face transformers
Of course, in many cases you want to run a chatbot locally. Here’s the equivalent example, running the SmolLM2-135M-Instruct model with the Hugging Face transformers library:
import gradio as gr
from transformers import pipeline, TextIteratorStreamer
from threading import Thread
pipe = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-135M-Instruct",
    device_map="auto"
)
def chat(message, history):
    # Convert history to transformers format
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"][0]["text"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"][0]["text"]})
    # Add current message
    messages.append({"role": "user", "content": message})
    # Generate response with streaming; skip_prompt keeps the input out of the output
    streamer = TextIteratorStreamer(pipe.tokenizer, skip_prompt=True, skip_special_tokens=True)
    generation_kwargs = dict(
        max_new_tokens=256,
        streamer=streamer,
    )
    # The pipeline takes the chat messages as its positional input
    thread = Thread(target=pipe, args=(messages,), kwargs=generation_kwargs)
    thread.start()
    partial_response = ""
    for token in streamer:
        partial_response += token
        yield partial_response
demo = gr.ChatInterface(
    fn=chat,
    title="SmolLM2 Chatbot",
)
if __name__ == "__main__":
    demo.launch()
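Generation runs in a background thread because iterating over the streamer blocks until new text is available: the worker thread produces text while the chat function consumes it. Here is a standard-library sketch of the same producer/consumer arrangement; QueueStreamer and fake_generate are hypothetical stand-ins for TextIteratorStreamer and the pipeline call, not the transformers implementation:

```python
from queue import Queue
from threading import Thread


class QueueStreamer:
    """Hypothetical stand-in for TextIteratorStreamer: an iterator fed through a queue."""

    _END = object()  # sentinel marking the end of generation

    def __init__(self):
        self._queue = Queue()

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(self._END)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the producer supplies something
        if item is self._END:
            raise StopIteration
        return item


def fake_generate(streamer):
    # Hypothetical stand-in for the model: pushes pieces of text, then signals the end
    for piece in ["Hello", ", ", "world"]:
        streamer.put(piece)
    streamer.end()


streamer = QueueStreamer()
thread = Thread(target=fake_generate, args=(streamer,))
thread.start()

# Consume text in the main thread while the worker produces it
partial_response = ""
snapshots = []
for token in streamer:
    partial_response += token
    snapshots.append(partial_response)
thread.join()
```

If generation ran in the same thread as the loop, nothing would be consumed until the whole response had been produced, and the streaming effect would be lost.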
SambaNova
The SambaNova Cloud API provides access to full-precision open-source models, such as the Llama family. Here’s an example of how to build a Gradio app around the SambaNova API:
import gradio as gr
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key=os.environ.get("SAMBANOVA_API_KEY"),
)
def chat(message, history):
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"][0]["text"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"][0]["text"]})
    messages.append({"role": "user", "content": message})
    stream = client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct",
        messages=messages,
        stream=True,
    )
    partial_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            partial_response += chunk.choices[0].delta.content
            yield partial_response
demo = gr.ChatInterface(
    fn=chat,
    title="SambaNova Chatbot",
)
if __name__ == "__main__":
    demo.launch()
For quick prototyping, the sambanova-gradio library makes it even easier to build chatbots on top of SambaNova models.
Hyperbolic
The Hyperbolic AI API provides access to many open-source models, such as the Llama family. Here’s an example of how to build a Gradio app around the Hyperbolic API:
import gradio as gr
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.hyperbolic.xyz/v1",
    api_key=os.environ.get("HYPERBOLIC_API_KEY"),
)
def chat(message, history):
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"][0]["text"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"][0]["text"]})
    messages.append({"role": "user", "content": message})
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=messages,
        stream=True,
    )
    partial_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            partial_response += chunk.choices[0].delta.content
            yield partial_response
demo = gr.ChatInterface(
    fn=chat,
    title="Hyperbolic Chatbot",
)
if __name__ == "__main__":
    demo.launch()
For quick prototyping, the hyperbolic-gradio library makes it even easier to build chatbots on top of Hyperbolic models.
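Because the SambaNova and Hyperbolic examples both go through the openai client, they differ only in base URL, API-key environment variable, and model name. One way to capture that is a small lookup table. This is a sketch: PROVIDERS and client_kwargs are hypothetical names, and the values are copied from the two examples above:

```python
import os

# Per-provider settings, taken from the SambaNova and Hyperbolic examples
PROVIDERS = {
    "sambanova": {
        "base_url": "https://api.sambanova.ai/v1",
        "api_key_env": "SAMBANOVA_API_KEY",
        "model": "Meta-Llama-3.1-8B-Instruct",
    },
    "hyperbolic": {
        "base_url": "https://api.hyperbolic.xyz/v1",
        "api_key_env": "HYPERBOLIC_API_KEY",
        "model": "meta-llama/Llama-3.3-70B-Instruct",
    },
}


def client_kwargs(provider):
    """Return the keyword arguments to pass to OpenAI(...) for a provider."""
    config = PROVIDERS[provider]
    return {
        "base_url": config["base_url"],
        # None if the environment variable is not set
        "api_key": os.environ.get(config["api_key_env"]),
    }


kwargs = client_kwargs("hyperbolic")
# With the openai package installed: client = OpenAI(**kwargs)
```

The chat function can then read the model name from PROVIDERS[provider]["model"] and otherwise stay identical across providers.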
Anthropic’s Claude
Anthropic’s Claude model can also be used via API. Here’s a simple 20 questions-style game built on top of the Anthropic API:
import gradio as gr
import anthropic
import os
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
def chat(message, history):
    # Convert history to Anthropic format
    messages = []
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"][0]["text"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"][0]["text"]})
    # Add current message
    messages.append({"role": "user", "content": message})
    # Get streaming response
    with client.messages.stream(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system="You are playing 20 questions. Think of an object and only answer yes/no questions about it. Don't reveal the object until the user guesses correctly or uses all 20 questions.",
        messages=messages,
    ) as stream:
        partial_response = ""
        for text in stream.text_stream:
            partial_response += text
            yield partial_response
demo = gr.ChatInterface(
    fn=chat,
    title="20 Questions with Claude",
    description="I'm thinking of an object. You have 20 questions to guess what it is!",
)
if __name__ == "__main__":
    demo.launch()
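The system prompt above leaves it to the model to keep track of the 20-question limit. If you would rather enforce the limit in the chat function, one option is to count the user turns in history. Here is a sketch, assuming the history format used above; questions_remaining and MAX_QUESTIONS are hypothetical names:

```python
MAX_QUESTIONS = 20


def questions_remaining(history, max_questions=MAX_QUESTIONS):
    """Return how many questions are left after the message being handled now."""
    asked = sum(1 for msg in history if msg["role"] == "user")
    # The current message is not in history yet, so count it as one more question
    return max(max_questions - (asked + 1), 0)


example_history = [
    {"role": "user", "content": [{"type": "text", "text": "Is it alive?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "No."}]},
    {"role": "user", "content": [{"type": "text", "text": "Is it bigger than a car?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "Yes."}]},
]
remaining = questions_remaining(example_history)
```

The chat function could check this value before calling the API and yield a fixed "out of questions" reply once it reaches zero.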
These examples demonstrate how easy it is to integrate various LLM providers with Gradio’s ChatInterface. You can mix and match different providers and customize the chat function to suit your specific needs.