# AI Memory for Agents in 2026: A Practical Comparison of the Top Projects
LLMs are stateless. Every session starts blank. For agents that need to remember users, tasks, and context across sessions, this is the core bottleneck.
A new category of projects has emerged to solve this: AI memory layers. They sit between your agent and your storage, handling ingestion, retrieval, decay, and context generation.
I tested the major open-source options head-to-head. Here's what I found.
## The Contenders

| Project | Stars | Language | Storage | Approach |
|---|---|---|---|---|
| Mem0 | 49k | Python | Cloud / Qdrant | Flat memory extraction |
| Letta | 21k | Python | Postgres | Stateful agent platform |
| memU | 12.8k | Python | Cloud | 24/7 agent memory |
| MemOS | 6.5k | Python | Various | Skill memory OS |
| Zep | 4.2k | Go+Python | Postgres | Temporal / episodic |
| Cortex | New | Rust | SQLite | Cognitive 4-tier architecture |
## Performance
This is where the numbers get interesting.
| Metric | Cortex | Mem0 (cloud) | Letta | Zep |
|---|---|---|---|---|
| Search latency (top-10) | 132µs | ~300ms | ~100ms | ~50ms |
| Ingest latency | 7µs | ~200ms | ~50ms | ~30ms |
| Context generation | 51µs | N/A | N/A | N/A |
Cortex runs entirely in-process with SQLite and an in-memory vector index. No network round-trips, no cold starts, no connection pools. The Rust implementation with pre-computed L2 norms and partial sort makes brute-force cosine similarity competitive with indexed approaches at the <100K scale most personal agents operate at.
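The trick described above can be sketched in a few lines: pre-compute each vector's L2 norm once at ingest, then do a partial selection (top-k) instead of a full sort at query time. This is an illustration of the technique in Python, not Cortex's actual Rust implementation, and the class and method names are invented:

```python
import heapq
import math
import random

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

class FlatIndex:
    """Brute-force cosine search with pre-computed L2 norms (illustrative)."""

    def __init__(self):
        self.vectors = []  # raw embeddings
        self.norms = []    # norm of each vector, computed once at ingest

    def add(self, vec):
        self.vectors.append(vec)
        self.norms.append(l2_norm(vec))

    def search(self, query, k=10):
        qn = l2_norm(query)

        def cosine(i):
            dot = sum(a * b for a, b in zip(query, self.vectors[i]))
            return dot / (qn * self.norms[i])

        # Partial sort: only the top-k results are ordered, not all n.
        return heapq.nlargest(k, range(len(self.vectors)), key=cosine)

random.seed(0)
idx = FlatIndex()
for _ in range(1000):
    idx.add([random.gauss(0, 1) for _ in range(64)])
top = idx.search([random.gauss(0, 1) for _ in range(64)], k=10)
```

At the sub-100K scale this scan is cache-friendly and branch-predictable, which is why it can compete with approximate indexes that pay tree- or graph-traversal overhead.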
Mem0's cloud offering adds network latency by design. Letta and Zep require a running Postgres instance. These are acceptable trade-offs for server-side deployments, but for local agents, the overhead is real.
## Memory Architecture
This is where the projects diverge most significantly.
### Mem0: Flat Memory Store
Mem0 extracts "memories" from conversations and stores them as flat key-value entries. It's simple and effective for personalization — remembering that a user prefers dark mode or lives in San Francisco. On the LOCOMO benchmark, Mem0 reports 26% higher accuracy than OpenAI's built-in memory.
The limitation: no concept of memory importance, decay, or consolidation. Every memory is treated equally. Over months of use, the memory store grows without pruning, and retrieval relies entirely on vector similarity.
### Letta: Stateful Agents
Letta takes a different angle — it's less a memory layer and more a full agent platform with built-in state management. Memory is organized into core memory (always in context) and archival memory (retrieved on demand). The self-editing memory approach is unique: the agent itself decides what to remember and forget.
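The core/archival split can be pictured with a toy class: core memory always rides along in the prompt, archival memory is searched on demand, and the agent edits its own core memory through tool calls. This is a simplified sketch of the idea, not Letta's actual API; only the `core_memory_replace` tool name echoes the real system:

```python
class AgentMemory:
    """Toy core/archival memory split in the MemGPT/Letta style (illustrative)."""

    def __init__(self):
        self.core = {}        # always included in the context window
        self.archival = []    # larger store, retrieved on demand

    def core_memory_replace(self, key, value):
        # Exposed to the agent as a tool call: it rewrites its own state.
        self.core[key] = value

    def archival_insert(self, text):
        self.archival.append(text)

    def archival_search(self, term):
        # Stand-in for real retrieval; Letta uses embeddings here.
        return [t for t in self.archival if term.lower() in t.lower()]

    def context(self):
        # Rendered into every prompt.
        return "\n".join(f"{k}: {v}" for k, v in self.core.items())

mem = AgentMemory()
mem.core_memory_replace("user_name", "Ada")
mem.archival_insert("Ada mentioned she works on compilers.")
hits = mem.archival_search("compilers")
```

The point of the design is that memory management becomes just another tool the agent can call, so what to remember is a model decision rather than a heuristic.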
The trade-off: you need to buy into Letta's agent framework. It's not a drop-in memory layer you can add to an existing agent.
### Zep: Temporal Memory
Zep structures memory as episodes — temporal sequences rather than flat entries. This is closer to how humans remember: not isolated facts, but experiences in context. The Go implementation is fast, and the temporal indexing makes "what happened last week" queries natural.
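Why episodes help: once memories carry timestamps and episode boundaries, "what happened last week" is a range scan rather than a hopeful similarity search. A minimal sketch (invented data, not Zep's schema):

```python
from datetime import datetime, timedelta

# Toy episodic store: experiences grouped by time, not flat facts.
episodes = [
    {"at": datetime(2026, 3, 1, 9, 0), "summary": "Planned the Q2 roadmap"},
    {"at": datetime(2026, 3, 8, 14, 0), "summary": "Debugged the ingest pipeline"},
    {"at": datetime(2026, 3, 9, 10, 0), "summary": "Reviewed the retrieval PR"},
]

def last_week(now):
    # A temporal query is just a filter over the episode timeline.
    start = now - timedelta(days=7)
    return [e for e in episodes if start <= e["at"] <= now]

recent = last_week(datetime(2026, 3, 10))
```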
The main limitation is operational: Zep requires Postgres and a running server process.
### Cortex: Cognitive Architecture
Cortex takes the most opinionated approach with a 4-tier architecture modeled after human cognition:
- **Working memory** — session scratch pad (in-memory, ephemeral)
- **Episodic memory** — raw experiences with timestamps
- **Semantic memory** — distilled facts, promoted from episodic through consolidation
- **Procedural memory** — learned behavioral patterns
What's interesting here is the consolidation engine: memories automatically decay based on importance (high-salience memories persist 3x longer), repeated episodic memories get promoted to semantic facts, and dead memories get swept. This mirrors how human memory works — you don't remember every conversation, but patterns and important facts persist.
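The decay-and-promote loop can be sketched as exponential decay with a salience-scaled half-life plus a repetition threshold. The specific half-life, the 3x multiplier from the text, and the promotion threshold below are illustrative constants, not Cortex's internals:

```python
import math
import time

BASE_HALF_LIFE = 7 * 24 * 3600  # assumed: 7-day half-life for ordinary memories
SALIENT_MULTIPLIER = 3          # high-salience memories persist 3x longer
PROMOTION_THRESHOLD = 3         # assumed: repeats before episodic -> semantic

def retention(age_seconds, salient):
    half_life = BASE_HALF_LIFE * (SALIENT_MULTIPLIER if salient else 1)
    return math.exp(-math.log(2) * age_seconds / half_life)

def sweep(memories, now, floor=0.05):
    """Drop decayed memories; promote repeated episodic ones to semantic."""
    alive = []
    for m in memories:
        if retention(now - m["created"], m["salient"]) < floor:
            continue  # dead memory, swept
        if m["tier"] == "episodic" and m["repeats"] >= PROMOTION_THRESHOLD:
            m = {**m, "tier": "semantic"}  # distilled into a fact
        alive.append(m)
    return alive

now = time.time()
mems = [
    {"tier": "episodic", "created": now - 40 * 86400, "salient": False, "repeats": 0},
    {"tier": "episodic", "created": now - 40 * 86400, "salient": True, "repeats": 4},
]
survivors = sweep(mems, now)
```

At 40 days the low-salience memory has decayed past the floor and is swept, while the salient, oft-repeated one survives and graduates to the semantic tier.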
The Bayesian belief system is unique to Cortex. Instead of storing facts as binary true/false, beliefs carry confidence scores that update with evidence. When contradictory information arrives, the confidence adjusts rather than creating duplicate conflicting memories.
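The update rule behind such a belief system is easiest to see in odds form: posterior odds = prior odds x likelihood ratio of the new evidence. The likelihood-ratio values below are made up for illustration; this is a sketch of the general Bayesian mechanism, not Cortex's code:

```python
def update(confidence, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    odds = confidence / (1 - confidence)
    odds *= likelihood_ratio
    return odds / (1 + odds)

belief = 0.7                      # prior: "user prefers dark mode"
supported = update(belief, 4.0)   # supporting evidence (LR > 1) raises it
belief = update(supported, 0.25)  # contradiction (LR < 1) lowers it again
```

Contradictory evidence nudges the confidence down instead of spawning a second, conflicting memory, which is exactly the duplicate-avoidance property described above.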
## Retrieval

| Signal | Mem0 | Letta | Zep | Cortex |
|---|---|---|---|---|
| Vector similarity | Yes | Yes | Yes | Yes (35%) |
| Temporal recency | No | No | Yes | Yes (20%) |
| Importance/salience | No | No | No | Yes (20%) |
| Social (person-based) | No | No | Partial | Yes (15%) |
| Channel filtering | No | No | No | Yes (10%) |
Most memory projects rely on vector similarity alone. Cortex combines five weighted signals, which means a recent important memory about a specific person will rank higher than an old generic memory, even if the old one has a slightly higher embedding similarity.
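Concretely, the blend is a weighted sum using the percentages from the table. Each signal is normalized to [0, 1]; the normalization and the example signal values here are illustrative, only the weights come from the table:

```python
WEIGHTS = {
    "similarity": 0.35,
    "recency": 0.20,
    "importance": 0.20,
    "social": 0.15,
    "channel": 0.10,
}

def score(signals):
    # Missing signals contribute zero.
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

# A recent, important memory about the right person...
recent_important = score({"similarity": 0.70, "recency": 0.9,
                          "importance": 0.8, "social": 1.0, "channel": 1.0})
# ...outranks an old generic memory with higher raw similarity.
old_generic = score({"similarity": 0.80, "recency": 0.1,
                     "importance": 0.2, "social": 0.0, "channel": 0.0})
```

With only 35% of the score riding on embedding distance, a small similarity edge can't overcome losses on every other signal.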
Whether you need this complexity depends on your use case. For a simple chatbot that needs to remember user preferences, Mem0's vector-only approach is perfectly adequate. For a personal AI assistant that interacts across email, chat, and calendar, multi-signal retrieval matters.
## Deployment Model
This is a philosophical divide as much as a technical one.
**Cloud-dependent:** Mem0 (cloud tier), memU
- Pros: Zero setup, managed scaling, no ops burden
- Cons: Data leaves your machine, latency, ongoing cost, vendor dependency
**Self-hosted server:** Letta, Zep, Mem0 (open-source)
- Pros: Data stays on your infra, customizable
- Cons: Requires Postgres, Docker, operational overhead
**Local-first:** Cortex
- Pros: Single 3.8MB binary, single SQLite file, zero dependencies, works offline
- Cons: Single-machine only, no built-in sync, brute-force vector search
For personal AI assistants — the "AI that knows you" use case — local-first has a strong argument. Your personal memories, preferences, and behavioral patterns are sensitive data. The question of who stores your AI's memory of you is a real one.
For enterprise or multi-user deployments, server-based approaches make more sense.
## Integration

| Feature | Mem0 | Letta | Zep | Cortex |
|---|---|---|---|---|
| Python SDK | Full | Full | Full | WIP (PyO3) |
| JavaScript SDK | Yes | No | Yes | No |
| REST API | Yes | Yes | Yes | No (planned v1.0) |
| MCP Protocol | No | No | No | Native |
| LangChain integration | Yes | Yes | Yes | No |
Mem0 wins on ecosystem breadth. If you're building with LangChain or need a REST API today, Mem0 is the pragmatic choice.
Cortex's MCP (Model Context Protocol) native support is interesting for Claude Code and Claude Desktop users — it plugs in directly without SDK integration. But the ecosystem is still early.
## What I'd Use Where
**Simple chatbot personalization → Mem0.** Mature, well-documented, large community. Vector similarity is enough for preferences and facts.

**Full agent platform → Letta.** If you're building agents from scratch and want memory baked in, Letta's self-editing approach is compelling.

**Temporal/conversation-heavy use cases → Zep.** When "what happened when" matters more than "what facts do I know."

**Personal AI assistant (local) → Cortex.** If you want your AI's memory to stay on your machine, run offline, and model human-like memory consolidation. Early stage but architecturally the most ambitious.

**24/7 cloud agents → memU.** Purpose-built for long-running agents that need to minimize token cost.
## The Bigger Picture
AI memory is still early. Most projects are under two years old, and the "right" architecture hasn't been settled. The field is splitting along two axes:
- Cloud vs Local — where does your memory live?
- Flat vs Structured — are memories just vector entries, or do they have tiers, types, and lifecycle?
The cloud projects have the stars and the ecosystem. The local projects have the privacy argument and the performance. Both approaches will likely coexist, just like cloud databases and SQLite coexist today.
What's clear is that stateless LLMs aren't enough for agents. Memory is becoming infrastructure, not a feature. The projects that get the abstraction right — making memory as easy to add as a database connection — will define this category.
*Benchmarks run on Apple M-series, single-threaded, 10K memory entries. Cloud latencies measured from US-West. Your numbers will vary. All projects tested at their latest stable release as of March 2026.*