Guide
LangGraph Tutorial: Building Agents with LangChain's Agent Framework
The idea behind an agent in LangChain is to combine an LLM with a sequence of actions; the agent uses the LLM as a reasoning engine to decide which action to take next. LangChain worked well for simple agents with straightforward chains and retrieval flows, but building more complex agentic systems was overly complicated: memory management, persistence, and human-in-the-loop components had to be implemented manually, which made chains and agents less flexible.
This is where LangGraph comes into play. LangGraph is an orchestration framework built by the LangChain team that lets you develop agentic LLM applications using a graph structure, and it can be used with or without LangChain.
This article focuses on building agents with LangGraph rather than LangChain. It provides a tutorial for building LangGraph agents, beginning with a discussion of LangGraph and its components. These concepts are reinforced by building a LangGraph agent from scratch and managing conversation memory with LangGraph agents. Finally, we use Zep's long-term memory for agents to create an agent that remembers previous conversations and user facts.
Summary of key LangGraph tutorial concepts
The following are the main concepts covered in this article.
What is LangGraph?
LangGraph is an AI agent framework built by the creators of LangChain that allows developers to create more sophisticated and flexible agent workflows. Unlike traditional LangChain chains and agents, LangGraph implements agent interactions as cyclic graphs with multi-step processing involving branching and loops. This eliminates the need to implement custom logic to control the flow of information between multiple agents in the workflow.
How LangGraph works
As the name suggests, LangGraph is a graph workflow consisting of nodes and edges. The nodes implement functionality within the workflow while the edges control its direction.
The following diagram best explains how LangGraph works at a high level.
A LangGraph agent receives input, which can be user input or input from another LangGraph agent. Typically, an LLM agent processes the input and decides whether it needs to call one or more tools, or it can directly generate a response and proceed to the next stage in the graph.
If the agent decides to call one or more tools, the tool processes the agent's output and returns a response to the agent. The agent then generates its response based on the tool output. Once the agent finalizes its response, you can add an optional “human-in-the-loop” step to refine the response before returning the final output.
This is just one example of how LangGraph agents work at a high level. You can create different combinations of nodes and edges to achieve your desired functionality.
Persistence
One key LangGraph feature that distinguishes it from traditional LangChain agents is its built-in persistence mechanism. LangGraph introduces the concept of an agent state shared among all the nodes and edges in a workflow. This allows automatic error recovery, enabling the workflow to resume where it left off.
In addition to the agent state memory, LangGraph supports persisting conversation histories using short-term and long-term memories, which are covered in detail later in the article.
Cycles
LangGraph introduces cyclic graphs, allowing agents to communicate with tools in a cyclic manner. For example, an agent may call a tool, retrieve information from the tool, and then call the same or another tool to retrieve follow-up information. Similarly, tools may call each other multiple times to share and refine information before passing it back to an agent. This differentiates LangGraph from DAG-based solutions.
Human-in-the-loop capability
LangGraph supports human intervention in agent workflows, which interrupts graph execution at specific points, allowing humans to review, approve, or edit the agent’s proposed response. The workflow resumes after receiving human input.
This feature fosters greater control and oversight in critical decision-making processes in an agent’s workflow.
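As a rough illustration of the idea, the minimal sketch below pauses a graph before a hypothetical draft_reply node so a human can inspect the state, then resumes it. The node, its contents, and the thread ID are placeholders invented for this example; the interrupt_before argument and the pattern of resuming with a None input are standard LangGraph mechanisms.
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END, add_messages
from langgraph.checkpoint.memory import MemorySaver

class State(TypedDict):
    messages: Annotated[list, add_messages]

def draft_reply(state: State):
    # A real node would call an LLM here; this sketch returns a canned draft.
    return {"messages": [("assistant", "Draft reply for human review")]}

builder = StateGraph(State)
builder.add_node("draft_reply", draft_reply)
builder.add_edge(START, "draft_reply")
builder.add_edge("draft_reply", END)

# interrupt_before pauses execution before the named node so a human can review
# or edit the state; a checkpointer is required so the paused state can be saved.
graph = builder.compile(checkpointer=MemorySaver(), interrupt_before=["draft_reply"])

config = {"configurable": {"thread_id": "review-1"}}
graph.invoke({"messages": [("user", "Please cancel my order")]}, config)  # pauses before draft_reply

# After the human has reviewed (and optionally updated) the state,
# resume the paused run by invoking the graph with a None input.
graph.invoke(None, config)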
LangGraph agents vs. LangChain agents
Before LangGraph, LangChain chains and agents were the go-to techniques for creating agentic LLM applications. The following table briefly compares LangGraph agents with traditional LangChain chains and agents.
To summarize, LangGraph supports implementing more complex agentic workflows while allowing higher flexibility than traditional LangChain chains and agents.
Understanding nodes, edges, and state
If you are new to LangGraph, you must understand a few terms before creating an agent: nodes, edges, and state.
Nodes
Nodes are the building blocks of your agents and represent a discrete computation unit within your agent’s workflow. A node can be as simple as a small Python function or as complex as an independent agent that calls external tools.
Edges
Edges connect nodes and define how your agent progresses from one step to the next. Edges can be of two types: direct and conditional. A direct edge simply connects two nodes without any condition, whereas a conditional edge is similar to an if-else statement and routes to one of several nodes based on a condition.
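To make the distinction concrete, here is a minimal, self-contained sketch (the counter state and node names are invented for illustration): the edge from START is direct, while the edge leaving the increment node is conditional and either loops back or ends.
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    count: int

def increment(state: State):
    return {"count": state["count"] + 1}

def is_done(state: State):
    # Conditional edges route based on the current state, like an if-else statement
    return "finish" if state["count"] >= 3 else "again"

builder = StateGraph(State)
builder.add_node("increment", increment)
builder.add_edge(START, "increment")  # direct edge: always taken
builder.add_conditional_edges(
    "increment",
    is_done,
    {"again": "increment", "finish": END},  # conditional edge: routed by is_done
)

graph = builder.compile()
print(graph.invoke({"count": 0}))  # {'count': 3}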
State
A state is LangGraph’s most underrated yet most essential component. It contains all the data and context available to different entities, such as nodes and edges. Simply put, the state shares data and context among all nodes and edges in a graph.
Building a LangGraph agent
Enough with the theory—in this section, you will see all the building blocks of LangGraph agents in action. You will learn how to:
- Create a LangGraph agent from scratch
- Incorporate tools into LangGraph agents
- Stream agent responses
- Use built-in agents
Installing and importing required libraries
This article uses the Python version of LangGraph for examples. To run scripts in this section and the upcoming sections, you need to install the following Python libraries, which allow you to access the various LangGraph functions and tools you will incorporate into your agents.
%pip install langchain-core
%pip install langchain-openai
%pip install -U langgraph
%pip install langchain-community
%pip install --upgrade --quiet wikipedia
%pip install arxiv
%pip install zep-cloud
Let’s import the relevant functionality from the libraries we just installed.
from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage, ToolMessage, AIMessage, trim_messages
from langchain_core.tools import tool, ToolException, InjectedToolArg
from langchain_core.runnables import RunnableConfig
from langchain_community.utilities import ArxivAPIWrapper
from langchain_community.tools import ArxivQueryRun, HumanInputRun
from langgraph.graph import StateGraph,START,END, add_messages, MessagesState
from langgraph.prebuilt import create_react_agent, ToolNode
from langgraph.checkpoint.memory import MemorySaver
from langgraph.store.base import BaseStore
from langgraph.store.memory import InMemoryStore
from typing import Annotated, Optional
from typing_extensions import TypedDict
from pydantic import BaseModel, Field
import wikipedia
import uuid
import operator
from IPython.display import Image, display
import os
from google.colab import userdata
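The scripts in this article assume that a chat model has already been initialized in the model variable and that your OpenAI API key is available. One minimal setup, assuming the key is stored as OPENAI_API_KEY in Colab's user secrets, is shown below.
# Read the OpenAI API key from Colab's user secrets and initialize the chat model
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
model = ChatOpenAI(model="gpt-4o")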
Creating a LangGraph agent from scratch
Let’s start with the state definition, which specifies what type of information will flow between different nodes and edges in a graph.
class State(TypedDict):
    messages: Annotated[list[AnyMessage], operator.add]
This defines a simple state that stores a list of any type of LangChain message, such as ToolMessage, AIMessage, HumanMessage, etc. The operator.add reducer appends new messages to the list instead of overwriting existing ones.
Next, we will define a simple Python function to add a node in our LangGraph agent.
def run_llm(state: State):
    messages = state['messages']
    message = model.invoke(messages)
    return {'messages': [message]}
The run_llm() function accepts an object of the State class that we defined before. When we add the run_llm() function to a LangGraph node, LangGraph will automatically pass the agent’s state to the run_llm() function.
Let’s now create our graph.
graph_builder = StateGraph(State)
graph_builder.add_node("llm", run_llm)
graph_builder.add_edge(START, "llm")
graph_builder.add_edge("llm", END)
graph = graph_builder.compile()
To create a graph, we will create a StateGraph object and define the state type in the StateGraph constructor. Subsequently, we will add a node titled llm and add the run_llm() function to the node.
We add two edges that define the start and end of the agent execution. Our agent has a single node, so we start with the llm node and end the agent execution once we receive the response from the llm node.
Finally, we must compile the graph using the compile() method.
We can visualize the graph using the following script:
display(Image(graph.get_graph().draw_mermaid_png()))
Let’s test the agent we just created. To do so, call the invoke() method on the graph object created.
messages = [HumanMessage(content="Tell me a joke about mathematics")]
result = graph.invoke({"messages": messages})
print(result['messages'][-1].content)
In most cases, you will need LangGraph agents to use tools to respond appropriately. The following section explains how to incorporate tools into LangGraph agents.
{{banner-large-1="/banners"}}
Incorporating tools into LangGraph agents
An AI tool is a component that enhances the default functionalities of an AI agent, allowing it to perform a specific task or access external information. For example, you can use tools to access the web, connect to an external database, book a flight, etc.
You can incorporate custom and built-in LangChain tools into your LangGraph agents; the approaches remain very similar. In this section, we will see both tool types.
Incorporating a tool into an agent is a highly flexible process. You can add a tool directly to an agent’s node, or add a function to a node that calls one or more tools. The latter approach is recommended because it allows for more customization.
Let’s first see how to use a built-in LangChain tool in LangGraph. We will use the LangChain ArXiv tool wrapper to create a tool that returns research papers based on user queries.
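The helper below assumes that an instance of the built-in ArXiv tool has already been created, for example as follows:
# Instantiate the built-in LangChain ArXiv tool using its API wrapper
arxiv_tool = ArxivQueryRun(api_wrapper=ArxivAPIWrapper())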
def get_arxiv_data(query):
    data = arxiv_tool.invoke(query)
    return data

class ArticleTopic(BaseModel):
    topic: str = Field(description="The topic of the article to search on arxiv.")

@tool(args_schema=ArticleTopic)
def arxiv_search(topic: str) -> str:
    """Returns the information about research papers from arxiv"""
    return get_arxiv_data(topic)
In the script above, we define the function get_arxiv_data(), which accepts a user query and calls the LangChain ArXiv tool to return research paper information related to a user query.
Next, we subclass BaseModel to define the schema of the data our tool accepts as a parameter, which ensures that the input to the tool always has a valid data type.
Finally, we use the @tool decorator and create an arxiv_search tool that calls the get_arxiv_data function. The tool description is critical in this case since the LLM agent selects a tool based on its description.
In the same way, we create a custom tool, as the following script shows:
def get_wiki_data(topic):
    data = wikipedia.summary(topic)
    return data

class WikipediaTopic(BaseModel):
    topic: str = Field(description="The wikipedia article topic to search")

@tool(args_schema=WikipediaTopic)
def wikipedia_search(topic: str) -> str:
    """Returns the summary of wikipedia page of the passed topic"""
    return get_wiki_data(topic)
The tool above uses the Python Wikipedia library to return Wikipedia article summaries based on user queries.
Once you create your tools, the next step is to bind them to the LLM you will use in your agent.
tools = [arxiv_search, wikipedia_search]
tools_names = {t.name: t for t in tools}
model = model.bind_tools(tools)
In the next step, we define a function that executes whenever an agent decides to call one or more tools.
def execute_tools(state: State):
    tool_calls = state['messages'][-1].tool_calls
    results = []
    for t in tool_calls:
        if t['name'] not in tools_names:
            result = "Error: There's no such tool, please try again"
        else:
            result = tools_names[t['name']].invoke(t['args'])
        results.append(
            ToolMessage(
                tool_call_id=t['id'],
                name=t['name'],
                content=str(result)
            )
        )
    return {'messages': results}
The execute_tools function above will be added to a LangGraph agent’s node, automatically receiving the agent’s current state. We will only call the execute_tools() function if the agent decides to use one or more tools.
Inside the execute_tools function, we iterate over the tool calls from the LLM’s last response and invoke each tool with the arguments the LLM provided. We append each tool response to the results list as a ToolMessage and return it, so it is added to the agent state’s messages list.
The final step before creating the graph is to define a function that checks whether the agent's latest message contains tool calls.
def tool_exists(state: State):
    result = state['messages'][-1]
    return len(result.tool_calls) > 0
We will use this function to create a conditional edge, which decides whether to go to the execute_tools() function or the END node and returns the agent’s final response.
Now let’s create a LangGraph agent that uses the tool we created. The following script defines the agent’s state and the run_llm() function as before.
class State(TypedDict):
    messages: Annotated[list[AnyMessage], operator.add]

def run_llm(state: State):
    messages = state['messages']
    message = model.invoke(messages)
    return {'messages': [message]}
The script below defines and displays the complete agent graph.
graph_builder = StateGraph(State)
graph_builder.add_node("llm", run_llm)
graph_builder.add_node("tools", execute_tools)
graph_builder.add_conditional_edges(
    "llm",
    tool_exists,
    {True: "tools", False: END}
)
graph_builder.add_edge("tools", "llm")
graph_builder.set_entry_point("llm")
graph = graph_builder.compile()
display(Image(graph.get_graph().draw_mermaid_png()))
Here is how the graph looks:
We have two nodes in the graph: the llm node, which runs the run_llm() function, and the tools node, which runs the execute_tools() function. The conditional edge connects the llm node to either the tools node or the END node, depending on the output of the llm node. We also add an edge back from tools to llm because we want the llm node to generate the final response, with or without the help of a tool.
Now let’s test the agent we created. We will first ask the agent to return a research paper.
messages = [HumanMessage(content="Give me the latest research paper on attention is all you need")]
result = graph.invoke({"messages": messages})
result
The output above shows that the model called the arxiv_search tool to generate the response. The model is intelligent enough to infer that any query about research papers must be routed to the arxiv_search tool.
Let’s search for something on Wikipedia.
messages = [HumanMessage(content="Wikipedia article on artificial intelligence")]
result = graph.invoke({"messages": messages})
result
You can see that the model used the wikipedia_search tool to generate the final response.
Streaming agent responses
You can also stream the individual responses from all nodes and edges in your LangGraph agent. Streaming messages allows users to receive responses in real-time. To do so, you can call the stream() function instead of the invoke() method.
Let’s define a function that receives streaming agent response and displays it on the console.
def print_stream(stream):
    for s in stream:
        message = s["messages"][-1]
        if isinstance(message, tuple):
            print(message)
        else:
            message.pretty_print()
Next, call graph.stream() and pass it the input messages. Also set the stream_mode attribute to "values", which streams the full state values produced by the agent after each step.
messages = [HumanMessage(content="Who is Christiano Ronaldo")]
print_stream(graph.stream({"messages": messages}, stream_mode= "values"))
You will see real-time responses from each graph node printed on the console. For example, in the output above, you can see the human message followed by the AI response, which contains tool calls to the wikipedia_search tool. The tool returns the response to the user query; this is again passed to the AI node, which generates the final response.
Using built-in agents
In previous sections, we created an agent that checks whether it needs a tool's help to generate a final response. If it does, it calls the tool, fetches the tool response, and returns the final response; if it doesn't, it simply returns the default LLM response. We can use LangGraph’s built-in ReAct agent to achieve the same functionality.
You can use the create_react_agent() function from the langgraph.prebuilt module to create a ReAct agent. To define the ReAct agent's behavior, pass a system prompt to the state_modifier attribute.
The following script creates a ReAct agent that uses the tool we created in previous sections:
model = ChatOpenAI(model="gpt-4o")
prompt = '''You are an expert researcher. Your goal is to search wikipedia for general queries related to famous things, places, persons, etc.
You can also search for arxiv if the user asks to search for research papers.
Use your default knowledge for general queries'''
react_search_agent = create_react_agent(model, tools, state_modifier= prompt)
display(Image(react_search_agent.get_graph().draw_mermaid_png()))
You can see that the ReAct agent above is very similar to what we created earlier from scratch.
Let’s test the agent by asking a simple question that doesn't require any tool’s help.
messages = [HumanMessage(content="What is 2 + 2")]
print_stream(react_search_agent.stream({"messages": messages}, stream_mode= "values"))
You can see that the ReAct agent generated a response without any tool’s assistance.
Let’s send another request.
messages = [HumanMessage(content="What is the Eiffel tower?")]
print_stream(react_search_agent.stream({"messages": messages}, stream_mode= "values"))
This time, the agent called the wikipedia_search tool before generating the final response.
Memory management in LangGraph
By default, interaction with LangGraph agents is stateless, which means that the agent does not remember the previous conversation and cannot generate responses to follow-up queries. In this section, you will see why you need agents with memory and how to create LangGraph agents that remember previous conversations.
Why do you need agents with memory?
The answer is simple: Humans have memory and can answer follow-up questions. You want your agents to remember what was previously discussed so that they can have a meaningful conversation.
Let’s see an example where a user interacts with an agent without conversational memory. We ask the agent: “Who is Cristiano Ronaldo?”
messages = [HumanMessage(content="Who is Cristiano Ronaldo?")]
result = react_search_agent.invoke({"messages": messages})
print(result['messages'][-1].content)
Here, the agent probably called the wikipedia_search tool to generate the response. Let’s ask a follow-up question about Cristiano Ronaldo.
messages = [HumanMessage(content="To which country does he belong?")]
result = react_search_agent.invoke({"messages": messages})
print(result['messages'][-1].content)
You can see that the model doesn’t remember what we asked it previously. Though we could append previous conversations before the current message to provide context to the LLM, an LLM's context window is limited and will eventually be filled, leading to slower agent responses and, in some cases, truncation of conversation context.
Even models with very large context windows that can store an entire chat history run into recall issues, where the model may overlook older parts of the conversation. Additionally, a large context window might introduce contradictory information if there are conflicting details from earlier parts of the conversation, potentially confusing the model. Lastly, larger prompts significantly increase processing costs.
The ability of an AI agent to remember previous conversations is crucial in almost all agent types, ranging from medical agents, where an agent must remember a patient’s previous information, to e-commerce agents, where it is important for an agent to remember user preferences to provide a customized response.
The diagram below shows the components of an LLM-powered agent. The examples above used tools to retrieve additional information; the examples below explain the role of memory.
Creating LangGraph agents with memory
LangGraph agents can be created with short-term or long-term memory.
Agents with short-term memory
The easiest way to add persistence to your interactions with LangGraph agents is via checkpointers. To do so, pass a memory object (in-memory or third-party) to the checkpointer attribute when compiling a LangGraph agent. For example:
graph.compile(checkpointer=memory)
For a ReAct agent, you can pass the memory object to the checkpointer attribute of the create_react_agent() function.
Next, while invoking the graph, you must pass the configurable dictionary containing the value for the thread_id key. The memory is associated with this thread_id.
Here is an example.
memory = MemorySaver()
react_search_agent = create_react_agent(model, tools, state_modifier= prompt, checkpointer=memory)
config = {"configurable": {"thread_id": "1"}}
messages = [HumanMessage(content="Who is Christiano Ronaldo?")]
result = react_search_agent.invoke({"messages": messages}, config = config)
print(result['messages'][-1].content)
messages = [HumanMessage(content="To which country does he belong?")]
result = react_search_agent.invoke({"messages": messages}, config = config)
print(result['messages'][-1].content)
You can see that the agent remembers that we are asking a question about Cristiano Ronaldo. However, one drawback of short-term memory is that it is not shared between multiple sessions or threads. For example, if you change the thread_id and ask the same follow-up question, the agent will not understand it.
config = {"configurable": {"thread_id": "2"}}
messages = [HumanMessage(content="To which country does he belong?")]
result = react_search_agent.invoke({"messages": messages}, config = config)
print(result['messages'][-1].content)
The other drawback of short-term memory is that the entire chat history might not fit the model context window. Longer chat histories can be complex and often introduce hallucinations in agent responses.
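One common mitigation is to trim the running message list before each model call. The sketch below uses LangChain's trim_messages helper with a message-count-based counter (the same settings this article applies later in the Zep example); the sample history is invented for illustration.
from langchain_core.messages import HumanMessage, AIMessage, trim_messages

history = [
    HumanMessage(content="Who is Cristiano Ronaldo?"),
    AIMessage(content="A Portuguese professional footballer."),
    HumanMessage(content="To which country does he belong?"),
    AIMessage(content="Portugal."),
    HumanMessage(content="How many goals has he scored?"),
]

# Keep only the last three messages, starting from a human turn, so the prompt
# stays small while the most recent context is preserved.
trimmed = trim_messages(
    history,
    strategy="last",
    token_counter=len,  # count messages instead of tokens
    max_tokens=3,
    start_on="human",
)
print([m.content for m in trimmed])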
Agents with long-term memory
Recently, LangGraph introduced long-term memory, which you can share across multiple threads. You can also extract facts from user conversations and add them to long-term memory, leading to a shorter and more robust chat context.
You can use LangGraph’s InMemoryStore class to manage and store long-term memories. The class organizes memories into namespaces, each of which may contain multiple memories. Each memory has a memory ID and stores its content and context as key-value pairs.
The following script shows an example of storing a long-term memory in an InMemoryStore object using the put() method.
from langgraph.store.memory import InMemoryStore
memory_store = InMemoryStore()
user_id = "123"
namespace = (user_id, "memories")
memory_id = "001"
memory = {"food preferences" : "I like apples"}
memory_store.put(namespace,
memory_id,
memory)
You can see memories in a namespace using the following script:
memories = memory_store.search(namespace)
memories[-1].dict()
Now we will create another memory for the same user:
memory_id = "002"
memory = {"sports preferences" : "I like to play football"}
memory_store.put(namespace, memory_id, memory)
memories = memory_store.search(namespace)
for memory in memories:
    print(f"Memory ID: {memory.key}, Memory Value: {memory.value}")
You can see two memories in the memory store now. Let’s see how you can create a LangGraph agent that uses LangGraph’s long-term memory.
We will create a tool that accepts the memory ID, content, and context and inserts them in a memory store. The tool also accepts the configuration dictionary containing the user ID and the memory store object.
If the memory ID is not passed, it creates a new memory ID; otherwise, it updates the content of the passed memory ID.
@tool
def upsert_memory(
    content: str,
    context: str,
    memory_id: Optional[str] = None,
    *,
    config: Annotated[RunnableConfig, InjectedToolArg],
    store: Annotated[BaseStore, InjectedToolArg],
):
    """
    Insert or update a memory entry in the database.
    If a memory entry with the provided ID is found, the function modifies it with new details.
    If no such entry exists, it creates a fresh record, ensuring no duplicate memories are stored.
    When users revise a memory, it is replaced with the updated content.
    Args:
        content: The actual details of the memory. For example:
            "User likes to eat pizza and fries."
        context: Additional background for the memory. For example:
            "This was mentioned when the user introduced himself."
        memory_id: Only include this if an existing memory is being modified.
            It specifies which memory should be updated.
    """
    mem_id = memory_id or uuid.uuid4()
    user_id = config["configurable"]["user_id"]
    namespace = ("memories", user_id)
    store.put(
        namespace,
        key=str(mem_id),
        value={"content": content, "context": context},
    )
    return f"{content}"
We will define the update_memory function to add to our LangGraph agent node. It will receive the graph’s state, the configuration dictionary, and the InMemoryStore object. The function extracts the memory content and context from the graph’s state and the user ID from the configuration dictionary.
def update_memory(state: MessagesState, config: RunnableConfig, store: BaseStore):
    # Retrieve the tool call history from the most recent message
    recent_tool_calls = state["messages"][-1].tool_calls
    memory_entries = []
    # Process each tool call to save memory data
    for call in recent_tool_calls:
        memory_content = call['args']['content']
        memory_context = call['args']['context']
        memory_entries.append([
            upsert_memory.invoke({'content': memory_content, 'context': memory_context, 'config': config, 'store': store})
        ])
    print("Stored memories: ", memory_entries)
    # Generate a results list with each memory entry's details
    response_data = [
        {
            "role": "tool",
            "content": memory_entry[0],
            "tool_call_id": call["id"],
        }
        for call, memory_entry in zip(recent_tool_calls, memory_entries)
    ]
    # Return the first message result in the response
    return {"messages": response_data[0]}
The function passes these values to the upsert_memory tool. The update_memory function adds the tool’s response to the state. Next, we define the run_llm() function, which extracts memories from the InMemoryStore object using the user ID and invokes the LLM model using the memories and the user’s new query.
def run_llm(state: MessagesState, config: RunnableConfig, *, store: BaseStore):
    user_id = config["configurable"]["user_id"]
    namespace = ("memories", user_id)
    memories = store.search(namespace)
    user_info = "\n".join(f"[{mem.key}]: {mem.value}" for mem in memories)
    if user_info:
        user_info = f"""
<user_memories>
{user_info}
</user_memories>"""
    system_msg = f'''You are a helpful AI assistant answering user questions.
You must decide whether to store information in the memory from the list of messages and then answer the user query or directly answer the user query.
Here is the information about the user: {user_info}'''
    response = model.bind_tools([upsert_memory]).invoke(
        [{"type": "system", "content": system_msg}] + state["messages"]
    )
    return {"messages": response}
The last step is to define the tool_exists function, which decides whether we need to store user facts in memory.
def tool_exists(state: MessagesState):
    """Check if an agent selects a tool and decide whether to store memory."""
    msg = state["messages"][-1]
    if msg.tool_calls:
        # If an agent selects a tool, we need to update the memory
        return "update_memory"
    # else, directly respond to the user
    return END
Finally, we will create our LangGraph agent that uses long-term memory to respond to user queries:
model = ChatOpenAI(model="gpt-4o")
memory_store = InMemoryStore()
graph_builder = StateGraph(MessagesState)
graph_builder.add_node("agent", run_llm)
graph_builder.add_node(update_memory)
graph_builder.add_conditional_edges("agent", tool_exists, ["update_memory", END])
graph_builder.add_edge("update_memory", "agent")
graph_builder.set_entry_point("agent")
graph = graph_builder.compile(store=memory_store)
display(Image(graph.get_graph().draw_mermaid_png()))
The agent is similar to the ReAct agent we created earlier but maintains a long-term user memory. Let’s test the agent.
config = {"configurable": {"user_id": "2"}}
messages = [HumanMessage(content="Hello, my name is James, and I like AI")]
for chunk in graph.stream({"messages": messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()
You can see that the agent called the upsert_memory tool and inserted some user information into long-term memory.
config = {"configurable": {"user_id": "2"}}
messages = [HumanMessage(content="What do you know about me?")]
for chunk in graph.stream({"messages": messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()
This shows that the agent remembers the user information. Since there was nothing to add to memory this time, the agent did not call any tool and directly responded to the user.
Problems with LangGraph’s default memory options
Though LangGraph provides several default options to store memories, it has certain drawbacks:
- Short-term memories are not shared between multiple sessions and threads.
- The memory context can exceed the LLM model context; in such cases, you must trim or summarize memories to fit the model context.
- Extremely long memory contexts may induce hallucinations in LLM models.
- LangGraph’s default long-term memory addresses most of the problems associated with short-term memory. However, generating and updating facts from the conversation history, and invalidating outdated facts so that you always have the most current user information, remains challenging.
This is where Zep's long-term memory comes into play.
Zep Long-Term Memory for Agents
Zep is a memory layer designed for AI agents that addresses several of the limitations of the default LangGraph short-term and long-term memory described above while offering additional functionality.
Zep’s memory layer updates as facts change by continually updating a knowledge graph based on user interactions and business data. During conversations with users, new information is collected, and superseded facts are marked as invalid. Developers can retrieve up-to-date facts from the knowledge graph via a single API call, improving response quality by grounding the LLM in relevant historical data. This eliminates the need to store the entire user conversation and extract facts via prompt engineering techniques.
You can install the Zep cloud library via the following pip command:
%pip install zep-cloud
To use Zep cloud, import the Zep class from the zep_cloud.client module and instantiate it by passing the Zep API key. You can create or retrieve an existing API key from the Projects section of your Zep cloud.
from zep_cloud.client import Zep
from zep_cloud import Message
import rich
ZEP_API_KEY = userdata.get('ZEP_API_KEY')
client = Zep(
api_key=ZEP_API_KEY,
)
To add memories for a user’s session, you first need to add the user and then the session. Users have a one-to-many relationship with sessions. The Zep client’s user.add() method adds a user to the Zep cloud, and the memory.add_session() method adds a new session. The script below defines a dummy user and a session and adds them to the Zep cloud.
bot_name = "SupportBot"
user_name = "James"
user_id = user_name + str(uuid.uuid4())[:4]
session_id = str(uuid.uuid4())
client.user.add(
user_id=user_id,
email=f"{user_name}@abcd.com",
first_name=user_name,
last_name="J",
)
client.memory.add_session(
user_id=user_id,
session_id=session_id,
)
Let’s define a dummy chat history between the user and an agent.
chat_history = [
{
"role": "assistant",
"name": bot_name,
"content": f"Hello {user_name}, welcome to QuickEats support. How can I assist you today?",
"timestamp": "2024-11-01T12:00:00Z",
},
{
"role": "user",
"name": user_name,
"content": "This is unbelievable! My food was supposed to arrive an hour ago!",
"timestamp": "2024-11-01T12:01:00Z",
},
{
"role": "assistant",
"name": bot_name,
"content": f"I'm really sorry to hear about the delay, {user_name}. I understand how frustrating it is to wait longer than expected for your meal. Could you share your order details with me so I can look into this right away?",
"timestamp": "2024-11-01T12:02:00Z",
},
{
"role": "user",
"name": user_name,
"content": "I ordered at 11:00 AM, and it said it would be here by 11:30! Now it's 12:01, and still nothing!",
"timestamp": "2024-11-01T12:03:00Z",
},
{
"role": "assistant",
"name": bot_name,
"content": f"I apologize for the inconvenience, {user_name}. This delay is certainly not up to our standards. Let me check with the restaurant and delivery partner to get an update on your order status. I'll update you as soon as I have more information.",
"timestamp": "2024-11-01T12:04:00Z",
},
{
"role": "user",
"name": user_name,
"content": "This is really unacceptable. I'm starving here, and there's no communication from your side!",
"timestamp": "2024-11-01T12:05:00Z",
},
{
"role": "assistant",
"name": bot_name,
"content": f"I completely understand, {user_name}, and I apologize for the lack of updates. We're committed to making this right. I'll escalate your order as a priority, and in the meantime, I'll also apply a discount to your account as a gesture of apology.",
"timestamp": "2024-11-01T12:06:00Z",
},
]
To populate a Zep session, you must pass it a list of zep_cloud.Message objects. The following script converts the chat history above into such a list, setting the role_type, role, and content attributes for each Message object. Finally, the memory.add() method adds the messages to the session.
def convert_to_zep_messages(chat_history: list[dict[str, str | None]]) -> list[Message]:
    return [
        Message(
            role_type=msg["role"],
            role=msg.get("name", None),
            content=msg["content"],
        )
        for msg in chat_history
    ]

formatted_chat_messages = convert_to_zep_messages(chat_history)

client.memory.add(
    session_id=session_id, messages=formatted_chat_messages
)
Once you have messages in a session, you can retrieve all facts about a user from all of their sessions using the user.get_facts() method, as shown below.
fact_response = client.user.get_facts(user_id=user_id)
for fact in fact_response.facts:
    rich.print(fact)
If you are only interested in retrieving facts from a relevant session, you can call the memory.get() method, providing it the session ID. Subsequently, you can retrieve session facts using the relevant_facts attribute.
session_facts = client.memory.get(session_id=session_id)
rich.print([r.fact for r in session_facts.relevant_facts])
The output above shows all relevant facts for a specific user session.
{{banner-small-1="/banners"}}
Putting it all together: a LangGraph agent with Zep
Now that you know how Zep's long-term memory works, let’s develop a LangGraph agent that uses Zep's long-term memory to store user facts. The agent's responses will be based on the user facts retrieved from Zep's memory.
We will define a graph state that stores the messages originating from different nodes, the user name, and the session ID. Next, we will create the search_facts tool, which uses the Zep client’s memory.search_sessions() method to find user facts relevant to a query.
The search_facts tool is bound to the LLM. We also create an object of the ToolNode class, which serves as the node that executes tool calls.
class State(TypedDict):
    messages: Annotated[list, add_messages]
    user_name: str
    session_id: str

@tool
def search_facts(state: State, query: str, limit: int = 5):
    """Search for facts in all conversations had with a user.
    Args:
        state (State): The Agent's state.
        query (str): The search query.
        limit (int): The number of results to return. Defaults to 5.
    Returns:
        list: A list of facts that match the search query.
    """
    return client.memory.search_sessions(user_id=state['user_name'], text=query, limit=limit, search_scope="facts")

tools = [search_facts]
tool_node = ToolNode(tools)

model = ChatOpenAI(model='gpt-4o', temperature=0).bind_tools(tools)
Subsequently, we define the chatbot() method, which serves as the starting node of the graph. This method fetches relevant user facts for the current session and passes them in the system prompt to the LLM. Note that the system prompt tells the LLM to act as a financial advisor and use the user facts to provide a customized response.
The LLM response is added to the Zep memory for the current session using the memory.add() method. Zep automatically extracts facts from these messages. Notice that, unlike LangGraph's default long-term memory, you don't have to do any prompt engineering to extract and save facts when using Zep—everything is done behind the scenes for you.
Finally, we trim the messages in the message state to the last three since we don't need the complete message history. We use Zep user facts to maintain context in the conversation.
def chatbot(state: State):
    memory = client.memory.get(state["session_id"])
    facts_string = ""
    if memory.relevant_facts:
        facts_string = "\n".join([f.fact for f in memory.relevant_facts])

    system_message = SystemMessage(
        content=f"""You are a knowledgeable and empathetic financial advisor bot,
here to guide users through their financial questions and concerns.
Review the information about the user and their prior conversation history below to provide accurate, thoughtful, and personalized advice.
Ensure that responses are supportive, clear, and practical, focusing on helping users make informed decisions.
Always prioritize the user's financial well-being, respecting their current situation and goals.
Facts about the user and their conversation:
{facts_string or 'No facts about the user and their conversation'}"""
    )

    messages = [system_message] + state["messages"]
    response = model.invoke(messages)

    # Add the new chat turn to the Zep graph
    messages_to_save = [
        Message(
            role_type="user",
            role=state["user_name"],
            content=state["messages"][-1].content,
        ),
        Message(role_type="assistant", content=response.content),
    ]
    client.memory.add(
        session_id=state["session_id"],
        messages=messages_to_save,
    )

    # Truncate the chat history to keep the state from growing unbounded.
    # In this example, we are going to keep the state small for demonstration purposes.
    # We'll use Zep's Facts to maintain conversation context.
    state["messages"] = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=len,
        max_tokens=3,
        start_on="human",
        end_on=("human", "tool"),
        include_system=True,
    )

    return {"messages": [response]}
We will define a method called should_continue that we will add to the conditional edge to decide whether the LLM should call the search_facts tool or directly send a response to the user.
Finally, we define our LangGraph and print the graph’s figure.
graph_builder = StateGraph(State)
memory = MemorySaver()
def should_continue(state, config):
    messages = state['messages']
    last_message = messages[-1]
    # If there is no function call, then we finish.
    if not last_message.tool_calls:
        return 'end'
    # Otherwise, if there is, we continue.
    else:
        return 'continue'
graph_builder.add_node('agent', chatbot)
graph_builder.add_node('tools', tool_node)
graph_builder.add_edge(START, 'agent')
graph_builder.add_conditional_edges('agent', should_continue, {'continue': 'tools', 'end': END})
graph_builder.add_edge('tools', 'agent')
graph = graph_builder.compile(checkpointer=memory)
display(Image(graph.get_graph().draw_mermaid_png()))
The graph above is similar to the ReAct agent, where the tools node now calls the search_facts tool. Next, we will define the extract_messages() function that extracts messages from the response returned by the graph.invoke() method.
def extract_messages(result):
    output = ""
    for message in result['messages']:
        if isinstance(message, AIMessage):
            role = "assistant"
        else:
            role = result['user_name']
        output += f"{role}: {message.content}\n"
    return output.strip()
Finally, we define the graph_invoke() function, which accepts user query, user name, and session name (thread_id in the following script) and returns the LangGraph agent’s response.
def graph_invoke(message: str, user_name: str, thread_id: str, ai_response_only: bool = True):
    r = graph.invoke(
        {
            'messages': [
                {
                    'role': 'user',
                    'content': message,
                }
            ],
            'user_name': user_name,
            'session_id': thread_id,
        },
        config={'configurable': {'thread_id': thread_id}},
    )
    if ai_response_only:
        return r['messages'][-1].content
    else:
        return extract_messages(r)
To test the agent, we will create a new session for a dummy user and add the user and the session to the Zep cloud memory.
user_name = 'James_' + uuid.uuid4().hex[:4]
session_id = uuid.uuid4().hex
client.user.add(user_id=user_name)
client.memory.add_session(session_id=session_id,
user_id=user_name)
Next, we will execute a while loop that accepts user inputs; calls the graph_invoke() method using the user name, session ID, and user input; and prints the agent's response on the console.
while True:
    user_input = input("Enter your message (type 'quit' to exit): ")
    if user_input.lower() == "quit":
        print("Exiting...")
        break

    # Call the graph_invoke function with user input
    r = graph_invoke(
        user_input,   # The message from the user
        user_name,    # Provide the user name variable
        session_id,   # Provide the session ID variable
    )

    # Print the response from graph_invoke
    print("Response:", r)
Let’s test the agent by providing it with some information.
You can check the session to see the facts the agent has stored about the user.
session_facts = client.memory.get(session_id=session_id)
rich.print([r.fact for r in session_facts.relevant_facts])
Let’s ask a question to verify that the agent can access the user facts.
You can see that the agent has all the information about the user. Zep memory stores important facts about the user, which you can use to avoid hallucinations and improve the personalized customer experience.
Guidelines for building LangGraph agents
Here are some guidelines you should follow when working with LangGraph agents:
- Remember that LangGraph was built by the creators of LangChain but can be used without LangChain. It is a more powerful framework for building AI agents because it allows you to define flows that involve cycles, which most agentic architectures require.
- Tools are integral to LangGraph agents, but they should not be overused. Only implement tools to fetch information that an LLM agent does not possess by default.
- The tool description should include as much detail as possible. This will help the agent select the correct tool for the task.
- An agent is only as good as its context. Depending on your requirements, store all the relevant information from past conversations in short- or long-term memory.
- Third-party SDKs (like Zep) can make your life easier by automatically managing memory and storing conversation facts, permitting a personalized user experience.
Last thoughts
LangGraph agents provide a flexible way to develop complex LLM applications. This article explains LangGraph agents and how to implement them with detailed examples. Adding external tools enables the agents to retrieve external information, and persisting memory across conversations enables the LangGraph agent to provide contextualized responses.
Zep’s long-term memory stores conversation context and user facts. It is fast and highly effective at extracting relevant facts, and incorporating it allowed our agent to remember user details across sessions and provide better, more personalized responses.