# AI Memory for Agents in 2026: A Practical Comparison of the Top Projects
LLMs are stateless. Every session starts blank. For agents that need to remember users, tasks, and context across sessions, this is the core bottleneck.
A new category of projects has emerged to solve this: AI memory layers. They sit between your agent and your storage, handling ingestion, retrieval, decay, and context generation.
I tested the major open-source options head-to-head. Here's what I found.
## The Contenders

| Project | Stars | Language | Storage | Approach |
|---|---|---|---|---|
| Mem0 | 49k | Python | Cloud / Qdrant | Flat memory extraction |
| Letta | 21k | Python | Postgres | Stateful agent platform |
| memU | 12.8k | Python | Cloud | 24/7 agent memory |
| MemOS | 6.5k | Python | Various | Skill memory OS |
| Zep | 4.2k | Go+Python | Postgres | Temporal / episodic |
| Cortex | New | Rust | SQLite | Cognitive 4-tier architecture |
## Performance
This is where the numbers get interesting.
| Metric | Cortex | Mem0 (cloud) | Letta | Zep |
|---|---|---|---|---|
| Search latency (top-10) | 132µs | ~300ms | ~100ms | ~50ms |
| Ingest latency | 7µs | ~200ms | ~50ms | ~30ms |
| Context generation | 51µs | N/A | N/A | N/A |
Cortex runs entirely in-process with SQLite and an in-memory vector index. No network round-trips, no cold starts, no connection pools. The Rust implementation with pre-computed L2 norms and partial sort makes brute-force cosine similarity competitive with indexed approaches at the <100K scale most personal agents operate at.
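The trick described above can be sketched in a few lines: pre-compute each vector's L2 norm once at ingest, then do a partial selection (top-k) instead of a full sort at query time. This is an illustration of the technique in Python, not Cortex's actual Rust implementation, and the class and method names are invented:

```python
import heapq
import math
import random

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

class FlatIndex:
    """Brute-force cosine search with pre-computed L2 norms (illustrative)."""

    def __init__(self):
        self.vectors = []  # raw embeddings
        self.norms = []    # norm of each vector, computed once at ingest

    def add(self, vec):
        self.vectors.append(vec)
        self.norms.append(l2_norm(vec))

    def search(self, query, k=10):
        qn = l2_norm(query)

        def cosine(i):
            dot = sum(a * b for a, b in zip(query, self.vectors[i]))
            return dot / (qn * self.norms[i])

        # Partial sort: only the top-k results are ordered, not all n.
        return heapq.nlargest(k, range(len(self.vectors)), key=cosine)

random.seed(0)
idx = FlatIndex()
for _ in range(1000):
    idx.add([random.gauss(0, 1) for _ in range(64)])
top = idx.search([random.gauss(0, 1) for _ in range(64)], k=10)
```

At the sub-100K scale this scan is cache-friendly and branch-predictable, which is why it can compete with approximate indexes that pay tree- or graph-traversal overhead.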
Mem0's cloud offering adds network latency by design. Letta and Zep require a running Postgres instance. These are acceptable trade-offs for server-side deployments, but for local agents, the overhead is real.
## Memory Architecture
This is where the projects diverge most significantly.
### Mem0: Flat Memory Store
Mem0 extracts "memories" from conversations and stores them as flat key-value entries. It's simple and effective for personalization — remembering that a user prefers dark mode or lives in San Francisco. On the LOCOMO benchmark, Mem0 reports 26% higher accuracy than OpenAI's built-in memory.
The limitation: no concept of memory importance, decay, or consolidation. Every memory is treated equally. Over months of use, the memory store grows without pruning, and retrieval relies entirely on vector similarity.
### Letta: Stateful Agents
Letta takes a different angle — it's less a memory layer and more a full agent platform with built-in state management. Memory is organized into core memory (always in context) and archival memory (retrieved on demand). The self-editing memory approach is unique: the agent itself decides what to remember and forget.
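The core/archival split can be pictured with a toy class: core memory always rides along in the prompt, archival memory is searched on demand, and the agent edits its own core memory through tool calls. This is a simplified sketch of the idea, not Letta's actual API; only the `core_memory_replace` tool name echoes the real system:

```python
class AgentMemory:
    """Toy core/archival memory split in the MemGPT/Letta style (illustrative)."""

    def __init__(self):
        self.core = {}        # always included in the context window
        self.archival = []    # larger store, retrieved on demand

    def core_memory_replace(self, key, value):
        # Exposed to the agent as a tool call: it rewrites its own state.
        self.core[key] = value

    def archival_insert(self, text):
        self.archival.append(text)

    def archival_search(self, term):
        # Stand-in for real retrieval; Letta uses embeddings here.
        return [t for t in self.archival if term.lower() in t.lower()]

    def context(self):
        # Rendered into every prompt.
        return "\n".join(f"{k}: {v}" for k, v in self.core.items())

mem = AgentMemory()
mem.core_memory_replace("user_name", "Ada")
mem.archival_insert("Ada mentioned she works on compilers.")
hits = mem.archival_search("compilers")
```

The point of the design is that memory management becomes just another tool the agent can call, so what to remember is a model decision rather than a heuristic.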
The trade-off: you need to buy into Letta's agent framework. It's not a drop-in memory layer you can add to an existing agent.
### Zep: Temporal Memory
Zep structures memory as episodes — temporal sequences rather than flat entries. This is closer to how humans remember: not isolated facts, but experiences in context. The Go implementation is fast, and the temporal indexing makes "what happened last week" queries natural.
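Why episodes help: once memories carry timestamps and episode boundaries, "what happened last week" is a range scan rather than a hopeful similarity search. A minimal sketch (invented data, not Zep's schema):

```python
from datetime import datetime, timedelta

# Toy episodic store: experiences grouped by time, not flat facts.
episodes = [
    {"at": datetime(2026, 3, 1, 9, 0), "summary": "Planned the Q2 roadmap"},
    {"at": datetime(2026, 3, 8, 14, 0), "summary": "Debugged the ingest pipeline"},
    {"at": datetime(2026, 3, 9, 10, 0), "summary": "Reviewed the retrieval PR"},
]

def last_week(now):
    # A temporal query is just a filter over the episode timeline.
    start = now - timedelta(days=7)
    return [e for e in episodes if start <= e["at"] <= now]

recent = last_week(datetime(2026, 3, 10))
```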
The main limitation is operational: Zep requires Postgres and a running server process.
### Cortex: Cognitive Architecture
Cortex takes the most opinionated approach with a 4-tier architecture modeled after human cognition:
- **Working memory** — session scratch pad (in-memory, ephemeral)
- **Episodic memory** — raw experiences with timestamps
- **Semantic memory** — distilled facts, promoted from episodic through consolidation
- **Procedural memory** — learned behavioral patterns
What's interesting here is the consolidation engine: memories automatically decay based on importance (high-salience memories persist 3x longer), repeated episodic memories get promoted to semantic facts, and dead memories get swept. This mirrors how human memory works — you don't remember every conversation, but patterns and important facts persist.
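The decay-and-promote loop can be sketched as exponential decay with a salience-scaled half-life plus a repetition threshold. The specific half-life, the 3x multiplier from the text, and the promotion threshold below are illustrative constants, not Cortex's internals:

```python
import math
import time

BASE_HALF_LIFE = 7 * 24 * 3600  # assumed: 7-day half-life for ordinary memories
SALIENT_MULTIPLIER = 3          # high-salience memories persist 3x longer
PROMOTION_THRESHOLD = 3         # assumed: repeats before episodic -> semantic

def retention(age_seconds, salient):
    half_life = BASE_HALF_LIFE * (SALIENT_MULTIPLIER if salient else 1)
    return math.exp(-math.log(2) * age_seconds / half_life)

def sweep(memories, now, floor=0.05):
    """Drop decayed memories; promote repeated episodic ones to semantic."""
    alive = []
    for m in memories:
        if retention(now - m["created"], m["salient"]) < floor:
            continue  # dead memory, swept
        if m["tier"] == "episodic" and m["repeats"] >= PROMOTION_THRESHOLD:
            m = {**m, "tier": "semantic"}  # distilled into a fact
        alive.append(m)
    return alive

now = time.time()
mems = [
    {"tier": "episodic", "created": now - 40 * 86400, "salient": False, "repeats": 0},
    {"tier": "episodic", "created": now - 40 * 86400, "salient": True, "repeats": 4},
]
survivors = sweep(mems, now)
```

At 40 days the low-salience memory has decayed past the floor and is swept, while the salient, oft-repeated one survives and graduates to the semantic tier.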
The Bayesian belief system is unique to Cortex. Instead of storing facts as binary true/false, beliefs carry confidence scores that update with evidence. When contradictory information arrives, the confidence adjusts rather than creating duplicate conflicting memories.
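The update rule behind such a belief system is easiest to see in odds form: posterior odds = prior odds x likelihood ratio of the new evidence. The likelihood-ratio values below are made up for illustration; this is a sketch of the general Bayesian mechanism, not Cortex's code:

```python
def update(confidence, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    odds = confidence / (1 - confidence)
    odds *= likelihood_ratio
    return odds / (1 + odds)

belief = 0.7                      # prior: "user prefers dark mode"
supported = update(belief, 4.0)   # supporting evidence (LR > 1) raises it
belief = update(supported, 0.25)  # contradiction (LR < 1) lowers it again
```

Contradictory evidence nudges the confidence down instead of spawning a second, conflicting memory, which is exactly the duplicate-avoidance property described above.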
## Retrieval

| Signal | Mem0 | Letta | Zep | Cortex |
|---|---|---|---|---|
| Vector similarity | Yes | Yes | Yes | Yes (35%) |
| Temporal recency | No | No | Yes | Yes (20%) |
| Importance/salience | No | No | No | Yes (20%) |
| Social (person-based) | No | No | Partial | Yes (15%) |
| Channel filtering | No | No | No | Yes (10%) |
Most memory projects rely on vector similarity alone. Cortex combines five weighted signals, which means a recent important memory about a specific person will rank higher than an old generic memory, even if the old one has a slightly higher embedding similarity.
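Concretely, the blend is a weighted sum using the percentages from the table. Each signal is normalized to [0, 1]; the normalization and the example signal values here are illustrative, only the weights come from the table:

```python
WEIGHTS = {
    "similarity": 0.35,
    "recency": 0.20,
    "importance": 0.20,
    "social": 0.15,
    "channel": 0.10,
}

def score(signals):
    # Missing signals contribute zero.
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

# A recent, important memory about the right person...
recent_important = score({"similarity": 0.70, "recency": 0.9,
                          "importance": 0.8, "social": 1.0, "channel": 1.0})
# ...outranks an old generic memory with higher raw similarity.
old_generic = score({"similarity": 0.80, "recency": 0.1,
                     "importance": 0.2, "social": 0.0, "channel": 0.0})
```

With only 35% of the score riding on embedding distance, a small similarity edge can't overcome losses on every other signal.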
Whether you need this complexity depends on your use case. For a simple chatbot that needs to remember user preferences, Mem0's vector-only approach is perfectly adequate. For a personal AI assistant that interacts across email, chat, and calendar, multi-signal retrieval matters.
## Deployment Model
This is a philosophical divide as much as a technical one.
**Cloud-dependent:** Mem0 (cloud tier), memU
- Pros: Zero setup, managed scaling, no ops burden
- Cons: Data leaves your machine, latency, ongoing cost, vendor dependency
**Self-hosted server:** Letta, Zep, Mem0 (open-source)
- Pros: Data stays on your infra, customizable
- Cons: Requires Postgres, Docker, operational overhead
**Local-first:** Cortex
- Pros: Single 3.8MB binary, single SQLite file, zero dependencies, works offline
- Cons: Single-machine only, no built-in sync, brute-force vector search
For personal AI assistants — the "AI that knows you" use case — local-first has a strong argument. Your personal memories, preferences, and behavioral patterns are sensitive data. The question of who stores your AI's memory of you is a real one.
For enterprise or multi-user deployments, server-based approaches make more sense.
## Integration

| Feature | Mem0 | Letta | Zep | Cortex |
|---|---|---|---|---|
| Python SDK | Full | Full | Full | WIP (PyO3) |
| JavaScript SDK | Yes | No | Yes | No |
| REST API | Yes | Yes | Yes | No (planned v1.0) |
| MCP Protocol | No | No | No | Native |
| LangChain integration | Yes | Yes | Yes | No |
Mem0 wins on ecosystem breadth. If you're building with LangChain or need a REST API today, Mem0 is the pragmatic choice.
Cortex's MCP (Model Context Protocol) native support is interesting for Claude Code and Claude Desktop users — it plugs in directly without SDK integration. But the ecosystem is still early.
## What I'd Use Where
**Simple chatbot personalization → Mem0.** Mature, well-documented, large community. Vector similarity is enough for preferences and facts.

**Full agent platform → Letta.** If you're building agents from scratch and want memory baked in, Letta's self-editing approach is compelling.

**Temporal/conversation-heavy use cases → Zep.** When "what happened when" matters more than "what facts do I know."

**Personal AI assistant (local) → Cortex.** If you want your AI's memory to stay on your machine, run offline, and model human-like memory consolidation. Early stage but architecturally the most ambitious.

**24/7 cloud agents → memU.** Purpose-built for long-running agents that need to minimize token cost.
## The Bigger Picture
AI memory is still early. Most projects are under two years old, and the "right" architecture hasn't been settled. The field is splitting along two axes:
- Cloud vs Local — where does your memory live?
- Flat vs Structured — are memories just vector entries, or do they have tiers, types, and lifecycle?
The cloud projects have the stars and the ecosystem. The local projects have the privacy argument and the performance. Both approaches will likely coexist, just like cloud databases and SQLite coexist today.
What's clear is that stateless LLMs aren't enough for agents. Memory is becoming infrastructure, not a feature. The projects that get the abstraction right — making memory as easy to add as a database connection — will define this category.
*Benchmarks run on Apple M-series, single-threaded, 10K memory entries. Cloud latencies measured from US-West. Your numbers will vary. All projects tested at their latest stable release as of March 2026.*