Chapter 7: Advanced Text Generation Techniques and Tools
Going beyond prompt engineering with chains, memory, and agents
This chapter moves beyond prompt engineering to explore advanced tools and techniques for building sophisticated LLM applications. You’ll learn how to chain operations, maintain conversation memory, and create autonomous agents using LangChain.
To run the examples in this chapter, you’ll need a GPU. In Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4.
```python
from langchain import LlamaCpp

# Load Phi-3 model
llm = LlamaCpp(
    model_path="Phi-3-mini-4k-instruct-fp16.gguf",
    n_gpu_layers=-1,
    max_tokens=500,
    n_ctx=2048,
    seed=42,
    verbose=False
)

# Test the model
llm.invoke("Hi! My name is Maarten. What is 1 + 1?")
```
Output:

```
''
```
Without the proper chat template, the model returns an empty string. This is why chaining the model with a prompt template is essential.
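A prompt template is just string interpolation around the model's special tokens. A minimal plain-Python sketch of the Phi-3 chat format (the helper name is illustrative):

```python
def to_phi3_prompt(user_message: str) -> str:
    """Wrap a raw message in Phi-3's chat markers so the model can
    tell where the user turn ends and the assistant turn begins."""
    return f"<s><|user|>\n{user_message}<|end|>\n<|assistant|>"

print(to_phi3_prompt("What is 1 + 1?"))
```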
```python
from langchain import PromptTemplate

# Create a prompt template with the Phi-3 chat format
template = """<s><|user|>
{input_prompt}<|end|>
<|assistant|>"""

prompt = PromptTemplate(
    template=template,
    input_variables=["input_prompt"]
)
```
2. Chain with LLM
```python
# Use the pipe operator to chain prompt and LLM
basic_chain = prompt | llm
```
3. Invoke Chain
```python
basic_chain.invoke({
    "input_prompt": "Hi! My name is Maarten. What is 1 + 1?",
})
# Output: ' Hello Maarten, the answer to 1 + 1 is 2.'
```
Chain multiple LLM calls together to create complex workflows:
```python
from langchain import LLMChain

# Chain 1: Generate title
template = """<s><|user|>
Create a title for a story about {summary}. Only return the title.<|end|>
<|assistant|>"""
title_prompt = PromptTemplate(template=template, input_variables=["summary"])
title = LLMChain(llm=llm, prompt=title_prompt, output_key="title")

# Chain 2: Describe main character
template = """<s><|user|>
Describe the main character of a story about {summary} with the title {title}. Use only two sentences.<|end|>
<|assistant|>"""
character_prompt = PromptTemplate(
    template=template,
    input_variables=["summary", "title"]
)
character = LLMChain(llm=llm, prompt=character_prompt, output_key="character")

# Chain 3: Write the story
template = """<s><|user|>
Create a story about {summary} with the title {title}. The main character is: {character}. Only return the story and it cannot be longer than one paragraph.<|end|>
<|assistant|>"""
story_prompt = PromptTemplate(
    template=template,
    input_variables=["summary", "title", "character"]
)
story = LLMChain(llm=llm, prompt=story_prompt, output_key="story")

# Combine all three chains
llm_chain = title | character | story
```
Run the complete chain:
```python
result = llm_chain.invoke("a girl that lost her mother")
print(result)
```
Output:

```
{
    "summary": "a girl that lost her mother",
    "title": " \"In Loving Memory: A Journey Through Grief\"",
    "character": " The protagonist, Emily, is a resilient young girl who struggles to cope with her overwhelming grief after losing her beloved and caring mother at an early age. As she embarks on a journey of self-discovery and healing, she learns valuable life lessons from the memories and wisdom shared by those around her.",
    "story": " In Loving Memory: A Journey Through Grief revolves around Emily, a resilient young girl who loses her beloved mother at an early age. Struggling to cope with overwhelming grief, she embarks on a journey of self-discovery and healing, drawing strength from the cherished memories and wisdom shared by those around her. Through this transformative process, Emily learns valuable life lessons about resilience, love, and the power of human connection, ultimately finding solace in honoring her mother's legacy while embracing a newfound sense of inner peace amidst the painful loss."
}
```
Chains with multiple steps allow you to break complex generation tasks into manageable pieces, with each step building on previous outputs.
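The mechanism can be sketched without LangChain: each step is a function that reads earlier keys from a shared dictionary and writes its result under its own output key, mirroring how each `LLMChain`'s `output_key` feeds later prompts. The step functions below are hypothetical stand-ins for the LLM calls:

```python
def make_step(fn, output_key):
    """Build a chain step that adds fn's result to the shared state."""
    def step(state):
        state[output_key] = fn(state)
        return state
    return step

# Hypothetical stand-ins for the title and character LLM calls
title = make_step(lambda s: f"A Tale of {s['summary'].title()}", "title")
character = make_step(lambda s: f"The hero of '{s['title']}'", "character")

state = {"summary": "a lost dog"}
for step in (title, character):
    state = step(state)

print(sorted(state))  # ['character', 'summary', 'title']
```

As in the LangChain result above, the final state contains the original input plus every intermediate output.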
LLMs have no memory of earlier calls; by default, each invocation stands alone:

```python
# First interaction
basic_chain.invoke({"input_prompt": "Hi! My name is Maarten. What is 1 + 1?"})
# Output: ' Hello Maarten! The answer to 1 + 1 is 2.'

# Second interaction - the model doesn't remember
basic_chain.invoke({"input_prompt": "What is my name?"})
# Output: " I'm unable to determine your name as I don't have the capability to access personal data..."
```
```python
# Add chat_history to the template
template = """<s><|user|>Current conversation:{chat_history}

{input_prompt}<|end|>
<|assistant|>"""

prompt = PromptTemplate(
    template=template,
    input_variables=["input_prompt", "chat_history"]
)
```
2. Add Memory
```python
from langchain.memory import ConversationBufferMemory

# Define buffer memory
memory = ConversationBufferMemory(memory_key="chat_history")

# Chain with memory
llm_chain = LLMChain(
    prompt=prompt,
    llm=llm,
    memory=memory
)
```
3. Test Memory
```python
# First interaction
llm_chain.invoke({"input_prompt": "Hi! My name is Maarten. What is 1 + 1?"})
# Output: " Hello Maarten! The answer to 1 + 1 is 2. Hope you're having a great day!"

# Second interaction - now it remembers!
llm_chain.invoke({"input_prompt": "What is my name?"})
# Output: ' Your name is Maarten.'
```
Retain only recent conversations to limit context size:
```python
from langchain.memory import ConversationBufferWindowMemory

# Retain only the last 2 conversations
memory = ConversationBufferWindowMemory(k=2, memory_key="chat_history")
llm_chain = LLMChain(
    prompt=prompt,
    llm=llm,
    memory=memory
)

# Ask multiple questions
llm_chain.invoke({"input_prompt": "Hi! My name is Maarten and I am 33 years old. What is 1 + 1?"})
llm_chain.invoke({"input_prompt": "What is 3 + 3?"})
llm_chain.invoke({"input_prompt": "What is my name?"})
# Output: ' Your name is Maarten.'

# But older information is forgotten
llm_chain.invoke({"input_prompt": "What is my age?"})
# Output: " I'm unable to determine your age as I don't have access to personal information..."
```
Window memory is useful for managing context length while maintaining recent conversation history. Adjust k based on your needs.
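Conceptually, window memory is just a fixed-size queue of exchanges. A minimal sketch (not LangChain's actual implementation):

```python
from collections import deque

class WindowMemory:
    """Keep only the last k (user, assistant) exchanges,
    like ConversationBufferWindowMemory."""
    def __init__(self, k):
        self.turns = deque(maxlen=k)  # old exchanges are evicted automatically

    def save(self, user, assistant):
        self.turns.append((user, assistant))

    def history(self):
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

mem = WindowMemory(k=2)
mem.save("Hi! My name is Maarten and I am 33. What is 1 + 1?", "2")
mem.save("What is 3 + 3?", "6")
mem.save("What is my name?", "Maarten")

# The first exchange (with the age) has been evicted
print("33" in mem.history())  # False
```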
```python
# Template for summarizing conversations
summary_prompt_template = """<s><|user|>Summarize the conversations and update with the new lines.

Current summary:
{summary}

new lines of conversation:
{new_lines}

New summary:<|end|>
<|assistant|>"""

summary_prompt = PromptTemplate(
    input_variables=["new_lines", "summary"],
    template=summary_prompt_template
)
```
```python
from langchain.memory import ConversationSummaryMemory

# Wire the summary prompt into summary memory (this step is implied
# in the excerpt); the LLM itself writes the running summaries
memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    prompt=summary_prompt
)
llm_chain = LLMChain(prompt=prompt, llm=llm, memory=memory)

llm_chain.invoke({"input_prompt": "Hi! My name is Maarten. What is 1 + 1?"})
llm_chain.invoke({"input_prompt": "What is my name?"})
llm_chain.invoke({"input_prompt": "What was the first question I asked?"})
# Output: ' The first question you asked was "what's 1 + 1?"'

# Check the summary
memory.load_memory_variables({})
```
Output:

```
{'chat_history': ' Maarten, identified in this conversation, initially asked about the sum of 1+1...'}
```
Summary memory is ideal for long conversations where you need to maintain context without overwhelming the model’s context window.
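The mechanism can be sketched without a model: after every exchange, the running summary and the new lines go to a summarizer whose output replaces the summary, so the context handed to the model stays bounded. The truncating summarizer below is a hypothetical stand-in for the LLM call driven by `summary_prompt`:

```python
def update_summary(summary, new_lines, budget=200):
    """Fold new conversation lines into the running summary, keeping
    it within a size budget (a real model compresses, not truncates)."""
    combined = (summary + " " + new_lines).strip()
    return combined[-budget:]

summary = ""
for turn in [
    "Human: Hi! My name is Maarten. AI: Hello Maarten!",
    "Human: What is 1 + 1? AI: 2.",
    "Human: What was my first question? AI: 1 + 1.",
]:
    summary = update_summary(summary, turn)

# However long the chat gets, the stored context stays bounded
print(len(summary) <= 200)  # True
```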
Agents can autonomously decide which tools to use and in what order, enabling them to solve complex tasks that require multiple steps and external information.
```python
# ReAct (Reasoning + Acting) template
react_template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

prompt = PromptTemplate(
    template=react_template,
    input_variables=["tools", "tool_names", "input", "agent_scratchpad"]
)
```
3. Prepare Tools
```python
from langchain.agents import load_tools, Tool
from langchain.tools import DuckDuckGoSearchResults

# Web search tool
search = DuckDuckGoSearchResults()
search_tool = Tool(
    name="duckduck",
    description="A web search engine. Use this as a search engine for general queries.",
    func=search.run,
)

# Math calculator tool
# `openai_llm` is assumed to be a capable chat model (e.g., an OpenAI
# chat model) defined earlier; its definition is not part of this excerpt
tools = load_tools(["llm-math"], llm=openai_llm)
tools.append(search_tool)
```
4. Create Agent Executor
```python
from langchain.agents import AgentExecutor, create_react_agent

# Construct the ReAct agent
agent = create_react_agent(openai_llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True
)
```
Ask the agent a complex question requiring multiple tools:
```python
agent_executor.invoke({
    "input": "What is the current price of a MacBook Pro in USD? How much would it cost in EUR if the exchange rate is 0.85 EUR for 1 USD?"
})
```
Agent execution trace:
```
> Entering new AgentExecutor chain...
I need to find the current price of a MacBook Pro in USD first before converting it to EUR.
Action: duckduck
Action Input: "current price of MacBook Pro in USD"
[Search results showing MacBook Pro prices around $2,249.00]
I found the current price of a MacBook Pro in USD, now I need to convert it to EUR using the exchange rate.
Action: Calculator
Action Input: $2,249.00 * 0.85
Answer: 1911.6499999999999
I now know the final answer
Final Answer: The current price of a MacBook Pro in USD is $2,249.00. It would cost approximately 1911.65 EUR with an exchange rate of 0.85 EUR for 1 USD.
> Finished chain.
```
The agent autonomously decided to:

1. Search the web for MacBook Pro prices
2. Use the calculator tool to perform the currency conversion
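The loop the `AgentExecutor` runs can be sketched in plain Python: parse the model's Action and Action Input, run the named tool, append the Observation to the scratchpad, and repeat until a Final Answer appears. The scripted model below is a hypothetical stand-in for the real LLM:

```python
import re

def scripted_model(scratchpad):
    """Hypothetical stand-in for the LLM's next ReAct step."""
    if "Observation:" not in scratchpad:
        return ("Thought: convert the USD price to EUR\n"
                "Action: Calculator\n"
                "Action Input: 2249 * 0.85")
    return "Thought: I now know the final answer\nFinal Answer: 1911.65 EUR"

# Toy tool registry; eval is acceptable only in this sketch
tools = {"Calculator": lambda expr: str(eval(expr))}

scratchpad = ""
while True:
    step = scripted_model(scratchpad)
    scratchpad += step + "\n"
    if "Final Answer:" in step:
        answer = step.split("Final Answer:")[1].strip()
        break
    # Parse the tool call, run it, and feed the result back
    action = re.search(r"Action: (.+)", step).group(1).strip()
    action_input = re.search(r"Action Input: (.+)", step).group(1).strip()
    scratchpad += f"Observation: {tools[action](action_input)}\n"

print(answer)  # 1911.65 EUR
```

`handle_parsing_errors=True` in the real executor guards exactly this parsing step: when the model's output doesn't match the expected Action format, the error is fed back as an observation instead of crashing the loop.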