AI Agent Guides: Memory, Context & Evaluation

Key takeaways

Agent memory is the category — the persistent knowledge an agent carries across sessions and sources. Chat buffers, vector stores, and markdown files are all partial attempts to implement it.
A temporal context graph is the durable structure for it: facts with provenance and a validity window, so the agent reasons over what's true now versus what was true then.
RAG and agent memory are complementary, not the same. RAG retrieves static documents by similarity; agent memory tracks evolving facts about users and the business over time. Most production agents use both.
A Context Lake is the infrastructure that implements agent memory at enterprise scale — a governed system of context graphs that manages, governs, and serves what agents need to know, with sub-200ms retrieval.
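
To make the validity-window idea concrete, here is a minimal Python sketch of a fact stored as a graph edge that carries provenance and a half-open `[valid_at, invalid_at)` window. The `Fact` class, the `facts_as_of` helper, and the example data are hypothetical illustrations, not Zep's or Graphiti's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One edge in a hypothetical temporal context graph."""

    subject: str
    predicate: str
    obj: str
    source: str                            # provenance: where the fact came from
    valid_at: datetime                     # when the fact became true
    invalid_at: Optional[datetime] = None  # when it stopped being true; None = still true

    def holds_at(self, t: datetime) -> bool:
        # Half-open window: true from valid_at up to, but not including, invalid_at.
        return self.valid_at <= t and (self.invalid_at is None or t < self.invalid_at)


def facts_as_of(facts: list[Fact], t: datetime) -> list[Fact]:
    """Return the facts that were true at time t."""
    return [f for f in facts if f.holds_at(t)]


facts = [
    Fact("alice", "works_at", "Acme", "crm_sync", datetime(2022, 1, 1), datetime(2024, 3, 1)),
    Fact("alice", "works_at", "Globex", "chat_2024_03_02", datetime(2024, 3, 1)),
]

print([f.obj for f in facts_as_of(facts, datetime(2023, 6, 1))])  # ['Acme']
print([f.obj for f in facts_as_of(facts, datetime(2025, 1, 1))])  # ['Globex']
```

Because the old fact keeps a closed window instead of being deleted, the same store can answer both "where does Alice work?" and "where did Alice work in 2023?".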

Start here

If you're new to the topic, read What is agent memory? first — it defines the category and explains why chat history and RAG don't scale to it. From there, Agent memory vs RAG draws the distinction most teams get wrong, and What is a temporal knowledge graph? explains the structure underneath. When you're ready to build, How to give an AI agent long-term memory walks through the approaches, and the persistent-memory tutorial is the hands-on version.

The infrastructure these guides point to is the Context Lake. Zep, the Context Lake for AI agents, manages agent memory (ingest, construct, invalidate, evolve), governs it (attribute-based access control, or ABAC, plus retention and audit), and serves it as assembled context with sub-200ms p95 retrieval. The graph itself is built with Graphiti, Zep's open-source temporal context graph library, which runs on top of the Context Graph Engine at scale. For how Zep's accuracy is measured, see the LoCoMo and LongMemEval results.
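
The "invalidate" step is what separates this kind of memory from an append-only log. The Python sketch below assumes a deliberately simple, hypothetical rule that is not Graphiti's actual logic: a new fact supersedes any still-open fact with the same subject and predicate but a different object (in other words, every predicate is treated as single-valued). The superseded fact is closed, not deleted, so its history survives.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str                            # provenance
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None = still true


def ingest(store: list[Fact], new: Fact) -> list[Fact]:
    """Return a new store with `new` added and any fact it supersedes closed.

    Hypothetical rule: an open fact is superseded when it shares subject and
    predicate with `new` but has a different object.
    """
    updated: list[Fact] = []
    for f in store:
        superseded = (
            f.invalid_at is None
            and f.subject == new.subject
            and f.predicate == new.predicate
            and f.obj != new.obj
        )
        # End the old window where the new fact begins instead of deleting the old fact.
        updated.append(replace(f, invalid_at=new.valid_at) if superseded else f)
    updated.append(new)
    return updated


store: list[Fact] = []
store = ingest(store, Fact("alice", "works_at", "Acme", "crm_sync", datetime(2022, 1, 1)))
store = ingest(store, Fact("alice", "works_at", "Globex", "chat_2024_03_02", datetime(2024, 3, 1)))
for f in store:
    print(f.obj, f.valid_at.date(), f.invalid_at.date() if f.invalid_at else "open")
# Acme 2022-01-01 2024-03-01
# Globex 2024-03-01 open
```

A production system would also have to handle multi-valued predicates, facts that arrive out of order, and deduplication; this sketch handles none of them.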

Frequently asked questions

What is agent memory?

Agent memory is everything an AI agent knows across time about the users, the business, and the world it operates in — so it can reason, personalize, and act without starting from scratch every turn. It's the category; chat buffers, vector stores, and Context Lakes are different ways to implement it. See What is agent memory?

How is agent memory different from RAG?

RAG retrieves static documents by similarity at query time. Agent memory tracks evolving, provenance-stamped facts about users and the business over time, with a sense of what's true now versus what was true then. They're complementary — most production agents use RAG for documents and agent memory for state. See Agent memory vs RAG.

How do you give an AI agent long-term memory?

Add a memory layer that builds a temporal context graph from the agent's inputs and serves the relevant context back on each turn, rather than stuffing chat history into the context window. With Zep, this takes a few lines of code and works with any agent framework. See How to give an AI agent long-term memory.
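
The per-turn loop can be sketched without any particular SDK. The Python below is a hypothetical illustration, not Zep's API: `build_context` keeps only facts that are still valid, ranks them by word overlap with the user's message (a toy stand-in for real retrieval such as embeddings, BM25, or graph search), and returns a short context block to prepend to the prompt in place of the full chat history.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    is_current: bool  # False once the fact has been invalidated


def _words(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def build_context(facts: list[Fact], user_message: str, max_facts: int = 5) -> str:
    """Return a compact block of current, relevant facts for this turn."""
    query = _words(user_message)
    scored = [(len(query & _words(f.text)), f.text) for f in facts if f.is_current]
    # Highest overlap first; sorted() stays stable with reverse=True, so ties keep stored order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    relevant = [text for score, text in ranked if score > 0][:max_facts]
    if not relevant:
        return ""
    return "Relevant facts:\n" + "\n".join(f"- {t}" for t in relevant)


facts = [
    Fact("Alice works at Globex", True),
    Fact("Alice works at Acme", False),  # superseded, so never served
    Fact("Alice prefers email", True),
]
message = "Which company does Alice work at?"
prompt = f"{build_context(facts, message, max_facts=1)}\n\nUser: {message}"
print(prompt)
```

The prompt stays roughly the same size however long the conversation history grows, because only the selected facts are sent.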

How do you test agent memory?

Measure context completeness first (did the system retrieve the facts needed to answer?), then answer correctness, retrieval latency, and token use, across multiple sessions and over time. Industry benchmarks include LoCoMo and LongMemEval. See How to test agent memory.
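
A minimal harness for these four metrics might look like the Python sketch below. `EvalCase`, the substring check for completeness, and exact-match grading are hypothetical simplifications; benchmarks such as LoCoMo and LongMemEval define their own datasets and grading. In practice each case would be a question asked after several earlier sessions, so the system has to recall facts across time.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalCase:
    required_facts: list[str]  # facts the answer depends on (hand-labeled)
    retrieved_context: str     # what the memory system returned for the question
    answer: str                # what the agent answered
    expected_answer: str
    latency_ms: float          # retrieval latency
    context_tokens: int        # size of the context sent to the model


def completeness(case: EvalCase) -> float:
    """Fraction of required facts found (case-insensitive substring) in the context."""
    if not case.required_facts:
        return 1.0
    ctx = case.retrieved_context.lower()
    return sum(f.lower() in ctx for f in case.required_facts) / len(case.required_facts)


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def summarize(cases: list[EvalCase]) -> dict[str, float]:
    return {
        "context_completeness": mean(completeness(c) for c in cases),
        # Exact match is a crude stand-in for a stricter answer grader.
        "answer_accuracy": mean(
            float(c.answer.strip().lower() == c.expected_answer.strip().lower()) for c in cases
        ),
        "p95_latency_ms": p95([c.latency_ms for c in cases]),
        "mean_context_tokens": mean(c.context_tokens for c in cases),
    }


# Toy cases for illustration only.
cases = [
    EvalCase(["works at Globex"], "Alice works at Globex", "Globex", "Globex", 120.0, 40),
    EvalCase(["prefers email", "lives in Berlin"], "Alice prefers email", "Paris", "Berlin", 300.0, 60),
]
print(summarize(cases))
```

Scoring completeness separately from correctness shows whether a wrong answer came from retrieval (the second case is missing "lives in Berlin") or from the model.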

What's the best way to do agent memory at enterprise scale?

A Context Lake — a governed system of context graphs that manages, governs, and serves agent memory across millions of users with sub-200ms retrieval, attribute-based access control, retention, and audit. Zep is the Context Lake for AI agents.
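
Governance has to be applied before context reaches the model. The Python sketch below shows the shape of an attribute-based check: access depends on attributes of the requester (tenant, clearance) and of each fact (tenant, sensitivity, age), and every read is logged. The attribute names, the three sensitivity levels, the `Gate` class, and the toy data are hypothetical; retention is shown as a read-time filter, whereas a real retention policy would also delete expired data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical sensitivity scale: higher rank = more sensitive.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class StoredFact:
    text: str
    tenant: str
    sensitivity: str  # key of SENSITIVITY_RANK
    created_at: datetime


@dataclass(frozen=True)
class Requester:
    agent_id: str
    tenant: str
    clearance: str  # key of SENSITIVITY_RANK


@dataclass
class Gate:
    retention: timedelta
    audit_log: list[tuple[datetime, str, int]] = field(default_factory=list)

    def allowed(self, who: Requester, fact: StoredFact, now: datetime) -> bool:
        """ABAC decision computed from requester and fact attributes."""
        same_tenant = who.tenant == fact.tenant
        cleared = SENSITIVITY_RANK[who.clearance] >= SENSITIVITY_RANK[fact.sensitivity]
        within_retention = now - fact.created_at <= self.retention
        return same_tenant and cleared and within_retention

    def serve(self, who: Requester, facts: list[StoredFact], now: datetime) -> list[StoredFact]:
        """Return only permitted facts and record who read how many."""
        visible = [f for f in facts if self.allowed(who, f, now)]
        self.audit_log.append((now, who.agent_id, len(visible)))
        return visible


now = datetime(2025, 1, 1)
facts = [
    StoredFact("Alice prefers email", "acme", "internal", datetime(2024, 12, 1)),
    StoredFact("Acme Q3 pricing plan", "acme", "restricted", datetime(2024, 12, 1)),
    StoredFact("Alice asked about invoices", "acme", "public", datetime(2023, 1, 1)),
    StoredFact("Bob prefers phone", "globex", "public", datetime(2024, 12, 1)),
]
gate = Gate(retention=timedelta(days=365))
bot = Requester("support-bot", "acme", "internal")
print([f.text for f in gate.serve(bot, facts, now)])  # ['Alice prefers email']
print(gate.audit_log[-1][1:])  # ('support-bot', 1)
```

Each excluded fact fails a different attribute: sensitivity, retention age, and tenant, respectively.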

Part of the Zep AI agent memory guides. Built on Graphiti and the Context Graph Engine. See the research and benchmarks.

Guides for building agents that remember