AI Memory
AI memory refers to the mechanisms that allow artificial intelligence agents to retain, retrieve, and use information across interactions, extending capability beyond a single context window.
AI memory is the umbrella term for the mechanisms by which artificial intelligence systems, especially LLM-based agents, retain information beyond the boundaries of a single prompt and use that retained information to inform later behaviour. Without explicit memory, a model is stateless: each request is evaluated only against its current context window and built-in parameters, so personalisation, long-running tasks, and continuity across sessions are possible only if the caller resupplies the relevant history with every request. Memory mechanisms add the missing state by writing, retrieving, summarising, and sometimes forgetting information across interactions.
Why memory matters for agents
Static LLMs are constrained by their fixed context window, which even at hundreds of thousands of tokens cannot hold a user's complete history, a large codebase, or a multi-week project record. Agents that operate over long horizons must therefore externalise state. Memory enables personalisation (remembering a user's preferences), tool reliability (recalling which tools have failed), planning (referring to earlier plans and reflections), and cross-session continuity (resuming a conversation).
Taxonomy of memory types
Researchers and practitioners increasingly distinguish four functional categories, loosely analogous to human cognitive systems:
| Type | Stored content | Typical implementation |
|---|---|---|
| Short-term (working) memory | Current conversation, scratchpad | The model's active context window |
| Episodic memory | Specific past events and interactions | Vector store of interaction logs |
| Semantic memory | General facts, preferences, world knowledge | Knowledge graph or summarised notes |
| Procedural memory | Learned skills, tool-use patterns | Fine-tuned weights, cached plans |
Several research papers published in 2025 argue that the older "short-term vs long-term" split is too coarse, and that production agents need explicit episodic and semantic stores, each with its own write, retrieval, and decay policies.
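One way to make per-type policies concrete is a small configuration table keyed by memory type. The sketch below is illustrative only: the `MemoryType` enum, the `StorePolicy` fields, and every trigger description and half-life value are assumptions chosen for the example, not values taken from the cited literature.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    WORKING = "working"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"


@dataclass(frozen=True)
class StorePolicy:
    """Hypothetical per-store policy; field names are illustrative."""
    write_trigger: str                # when an entry is written
    retrieval: str                    # how entries are looked up
    half_life_days: Optional[float]   # None means no time-based decay


# Assumed example values, one policy per memory type from the table above.
POLICIES = {
    MemoryType.WORKING: StorePolicy("every turn", "always in context", None),
    MemoryType.EPISODIC: StorePolicy("end of interaction", "vector similarity", 30.0),
    MemoryType.SEMANTIC: StorePolicy("on consolidation", "graph or key lookup", 365.0),
    MemoryType.PROCEDURAL: StorePolicy("after successful task", "by task type", None),
}


def decays(memory_type: MemoryType) -> bool:
    """Return True if the store for this type applies time-based decay."""
    return POLICIES[memory_type].half_life_days is not None
```

Keeping the policy separate from the store makes it possible to tune decay for episodic entries without touching how semantic facts are retained.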
How memory is implemented
Most production agent stacks implement memory as a layered system around an LLM call. On every turn the agent (1) writes salient new information to one or more stores, (2) retrieves relevant prior information by similarity or keyword search, (3) composes the retrieved memory into the prompt alongside the user message, and (4) optionally reflects by summarising or consolidating older entries to keep storage bounded.
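The four steps can be sketched with a toy in-memory store. This is a minimal illustration under stated assumptions, not any framework's API: `MemoryStore`, `build_prompt`, and `max_entries` are hypothetical names, keyword overlap stands in for similarity search, and string concatenation stands in for LLM summarisation. The sketch retrieves before writing so that the current message does not match itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryStore:
    """Toy in-memory store; a real system would use a database."""
    entries: List[str] = field(default_factory=list)
    max_entries: int = 4

    def write(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Keyword overlap is a stand-in for vector similarity search.
        q = set(query.lower().split())
        scored = [(len(q & set(e.lower().split())), e) for e in self.entries]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [e for score, e in ranked[:k] if score > 0]

    def consolidate(self) -> None:
        # Placeholder "reflection": merge the oldest entries into one note
        # once the store exceeds its bound. A real agent would ask an LLM
        # to summarise instead of concatenating.
        if len(self.entries) > self.max_entries:
            overflow = len(self.entries) - self.max_entries + 1
            merged = " | ".join(self.entries[:overflow])
            self.entries = ["summary: " + merged] + self.entries[overflow:]


def build_prompt(store: MemoryStore, user_message: str) -> str:
    """Run one turn: retrieve, compose, write, then consolidate."""
    recalled = store.retrieve(user_message)                  # retrieve
    memory_block = "\n".join("- " + m for m in recalled)
    prompt = f"Relevant memory:\n{memory_block}\n\nUser: {user_message}"  # compose
    store.write(user_message)                                # write
    store.consolidate()                                      # reflect
    return prompt
```

A call such as `build_prompt(store, "which mode does the user prefer")` returns the text that would be sent to the model, with any overlapping stored entries listed ahead of the user message.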
Common storage substrates include vector databases (Pinecone, Weaviate, Qdrant, Chroma, pgvector), key-value or document stores (Redis, MongoDB, DynamoDB), and graph databases (Neo4j, ArangoDB, TigerGraph) for relational facts. Frameworks such as LangChain, LangGraph, LlamaIndex, MemGPT, Letta, Mem0, and Zep provide higher-level memory primitives.
Retrieval and forgetting
Memory systems must solve two opposing problems: ensuring that relevant information is recalled when needed and ensuring that the store does not grow unbounded or dilute retrieval quality with stale entries. Retrieval typically combines dense vector similarity with metadata filters, recency boosts, and importance scores; forgetting is handled by time-based decay, summarisation into higher-level notes, or explicit user-controlled deletion. Designing the forgetting policy is often as important as designing the writing policy, since retaining everything degrades retrieval relevance and raises privacy risk.
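A combined retrieval score and a decay-based forgetting rule might look like the following sketch. The weights, the 30-day half-life, the pruning threshold, and the names `MemoryItem`, `score`, `retrieve`, and `forget` are all assumptions made for illustration; production systems tune these values and usually delegate the similarity computation to a vector database.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MemoryItem:
    text: str
    embedding: Sequence[float]
    importance: float   # assumed to lie in [0, 1], assigned at write time
    age_days: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def recency(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the value halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def score(item: MemoryItem, query: Sequence[float],
          w_sim: float = 1.0, w_rec: float = 0.5, w_imp: float = 0.5) -> float:
    """Weighted sum of similarity, recency boost, and importance."""
    return (w_sim * cosine(item.embedding, query)
            + w_rec * recency(item.age_days)
            + w_imp * item.importance)


def retrieve(items: List[MemoryItem], query: Sequence[float], k: int = 3) -> List[MemoryItem]:
    """Return the k highest-scoring items for the query embedding."""
    return sorted(items, key=lambda it: score(it, query), reverse=True)[:k]


def forget(items: List[MemoryItem], threshold: float = 0.05) -> List[MemoryItem]:
    """Keep only items whose decayed importance is still above the threshold."""
    return [it for it in items if recency(it.age_days) * it.importance >= threshold]
```

Because `forget` multiplies importance by the decay factor, an important entry survives longer than a trivial one of the same age, which is one simple way to couple the writing and forgetting policies.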
Privacy and governance
Persistent memory raises distinctive governance questions: which categories of personal data may be retained, for how long, with what user controls, and under what jurisdiction. Memory stores that record user behaviour are subject to the same privacy laws as any other personal data system, including the EU GDPR, Singapore's PDPA, and Malaysia's PDPA, and may attract specific obligations under emerging AI regulations regarding data subject rights of access and erasure.
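At the storage layer, retention limits and user controls translate into a few basic operations: export a user's records, erase them, and purge anything past the retention window. The sketch below is a toy illustration with hypothetical names (`Record`, `GovernedStore`, `export_user`, `erase_user`, `purge_expired`); it does not model any specific regulation, and a real deployment would also have to cover backups, derived summaries, and embeddings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Record:
    user_id: str
    text: str
    created_at: datetime


class GovernedStore:
    """Toy store with a retention window and per-user access and erasure."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self._records: List[Record] = []

    def add(self, record: Record) -> None:
        self._records.append(record)

    def export_user(self, user_id: str) -> List[Record]:
        """Access request: return every record held for this user."""
        return [r for r in self._records if r.user_id == user_id]

    def erase_user(self, user_id: str) -> int:
        """Erasure request: delete the user's records, return the count removed."""
        before = len(self._records)
        self._records = [r for r in self._records if r.user_id != user_id]
        return before - len(self._records)

    def purge_expired(self, now: datetime) -> int:
        """Retention limit: drop records older than the retention window."""
        cutoff = now - self.retention
        before = len(self._records)
        self._records = [r for r in self._records if r.created_at >= cutoff]
        return before - len(self._records)
```

Passing `now` explicitly, rather than reading the clock inside `purge_expired`, keeps the retention behaviour deterministic and easy to audit.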
References
- Park, J. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. UIST.
- Packer, C. et al. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv.
- Liu, S. et al. (2025). Memory in the Age of AI Agents: A Survey. arXiv.
- Kim, J. et al. (2022). A Machine with Short-Term, Episodic, and Semantic Memory Systems. arXiv:2212.02098.
- Bank Negara Malaysia. (2024). Discussion Paper on the Use of AI by Financial Institutions.