Exploring semantic search, vector databases, and retrieval-augmented generation for LLM applications
Chapter 8 explores semantic search and retrieval-augmented generation (RAG), vital components for building LLM applications that can access and utilize external knowledge.
Semantic search goes beyond keyword matching to understand the meaning and context of queries. When combined with large language models through RAG, it enables systems to provide accurate, grounded responses based on retrieved information.
```python
import cohere

# Initialize Cohere client
api_key = 'YOUR_API_KEY'
co = cohere.Client(api_key)

# Sample text about Interstellar
text = """Interstellar is a 2014 epic science fiction film co-written, directed, and produced by Christopher Nolan.
It stars Matthew McConaughey, Anne Hathaway, Jessica Chastain, Bill Irwin, Ellen Burstyn, Matt Damon, and Michael Caine.
Set in a dystopian future where humanity is struggling to survive, the film follows a group of astronauts who travel through a wormhole near Saturn in search of a new home for mankind."""

# Split into sentences
texts = text.split('.')
texts = [t.strip(' \n') for t in texts]
```
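Splitting on periods is a deliberately simple chunking strategy, and it has a quirk worth knowing: a text that ends with a period produces an empty trailing chunk. A quick sketch on a two-sentence toy string (not the Interstellar text above) shows the behavior and a one-line filter for it:

```python
# Naive sentence splitting with str.split('.') leaves an empty
# trailing element when the text ends with a period.
text = "First sentence. Second sentence."
texts = [t.strip(' \n') for t in text.split('.')]
print(texts)  # ['First sentence', 'Second sentence', '']

# Filtering out empty chunks avoids embedding blank strings:
texts = [t for t in texts if t]
print(texts)  # ['First sentence', 'Second sentence']
```

For production use, a dedicated sentence splitter (which handles abbreviations and decimals) is usually a better choice than `str.split`.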
```python
import numpy as np

# Get embeddings using Cohere
response = co.embed(
    texts=texts,
    input_type="search_document",
).embeddings

embeds = np.array(response)
print(embeds.shape)
```
Output:
(15, 4096)
Cohere’s embedding model produces 4096-dimensional vectors; other embedding models produce vectors of different dimensionality.
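Once every sentence is a vector, semantic search reduces to ranking documents by their similarity to an embedded query. The sketch below uses toy 3-dimensional vectors standing in for the 4096-dimensional Cohere embeddings (the values are invented for illustration); the cosine-similarity ranking logic is identical at any dimension:

```python
import numpy as np

# Toy 3-dimensional "embeddings" standing in for real model outputs.
doc_embeds = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.1, 0.9, 0.1],   # doc 1
    [0.0, 0.2, 0.9],   # doc 2
])
query_embed = np.array([0.8, 0.2, 0.1])

# Cosine similarity between the query and every document vector
norms = np.linalg.norm(doc_embeds, axis=1) * np.linalg.norm(query_embed)
scores = doc_embeds @ query_embed / norms

# Highest-scoring documents come first
ranking = np.argsort(scores)[::-1]
print(ranking)  # doc 0 is most similar to the query
```

Because similarity is computed between meaning-bearing vectors rather than overlapping keywords, a query can match a document even when they share no words.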
Build a complete RAG pipeline using local models for full control and privacy.
1. Load the Generation Model
```python
from langchain import LlamaCpp

# Load a local LLM
llm = LlamaCpp(
    model_path="Phi-3-mini-4k-instruct-q4.gguf",
    n_gpu_layers=-1,
    max_tokens=500,
    n_ctx=2048,
    seed=42,
    verbose=False
)
```
2. Load the Embedding Model
```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

# Embedding model for converting text to numerical representations
embedding_model = HuggingFaceEmbeddings(
    model_name='BAAI/bge-small-en-v1.5'
)
```
3. Create the Vector Database
```python
from langchain.vectorstores import FAISS

# Create a local vector database
db = FAISS.from_texts(texts, embedding_model)
```
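Under the hood, `FAISS.from_texts` embeds every chunk and stores the vectors in an index; retrieval then finds the stored vectors closest to the query vector. What a flat (exhaustive) index does can be sketched in plain NumPy with invented 2-dimensional vectors, here using squared L2 distance:

```python
import numpy as np

# What a flat vector index does under the hood: store every embedding,
# then return the indices of the rows closest to the query.
index = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
query = np.array([0.9, 0.2])

# Squared L2 distance from the query to every stored vector
dists = ((index - query) ** 2).sum(axis=1)

k = 2
nearest = np.argsort(dists)[:k]
print(nearest)  # indices of the two closest stored vectors
```

A flat index compares the query against every vector, which is exact but scales linearly with collection size; FAISS also offers approximate index types that trade a little recall for much faster search over large collections.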
4. Build the RAG Pipeline
```python
from langchain import PromptTemplate
from langchain.chains import RetrievalQA

# Create prompt template
template = """<|user|>
Relevant information:
{context}

Provide a concise answer to the following question using the relevant information:
{question}<|end|>
<|assistant|>"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

# RAG pipeline
rag = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=db.as_retriever(),
    chain_type_kwargs={"prompt": prompt},
    verbose=True
)
```
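The `chain_type='stuff'` setting means the retrieved documents are simply concatenated ("stuffed") into the `{context}` slot of the prompt before the LLM sees it. A minimal sketch of that substitution, using invented example documents rather than real retrieval results:

```python
# A minimal sketch of what the 'stuff' chain type does: join the
# retrieved documents and substitute them into the prompt template.
template = """<|user|>
Relevant information:
{context}

Provide a concise answer to the following question using the relevant information:
{question}<|end|>
<|assistant|>"""

retrieved_docs = [
    "Interstellar is a 2014 epic science fiction film.",
    "The film follows astronauts who travel through a wormhole near Saturn.",
]

final_prompt = template.format(
    context="\n".join(retrieved_docs),
    question="What is Interstellar about?",
)
print(final_prompt)
```

Stuffing is the simplest chain type and works well when the retrieved documents fit comfortably within the model's context window (here, 2048 tokens).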
5. Query the System
```python
# Ask a question
result = rag.invoke('Income generated')
print(result['result'])
```
Output:
"Interstellar grossed over $677 million worldwide in 2014 and had additional earnings from subsequent re-releases, totaling approximately $773 million."