GraphRAG
A retrieval-augmented generation technique that builds a knowledge graph of entities and relationships from a document collection, enabling AI systems to answer multi-hop and whole-corpus questions that plain vector search handles poorly.
GraphRAG is a retrieval-augmented generation technique that combines knowledge graph construction with large language model summarisation to answer questions over large document collections. It was introduced by Microsoft Research in early 2024 and released as open source in July 2024, after which it quickly attracted a large following. Unlike conventional retrieval-augmented generation, which fetches passages that are semantically similar to a query, GraphRAG builds an explicit graph of the entities and relationships in a corpus and uses that structure to reason about how information connects.
Why plain RAG falls short
Standard retrieval-augmented generation, often called baseline RAG, splits documents into chunks, converts them into vector embeddings, and retrieves the chunks most similar to a user query. This works well for questions whose answer lies in one or a few passages. It struggles, however, with two categories of question. The first is multi-hop questions that require chaining facts spread across several documents. The second is global or whole-corpus questions, such as "What are the main themes across this entire dataset?", where the answer is not contained in any single chunk but must be synthesised from the collection as a whole. Because vector search retrieves only locally similar text, it has no mechanism for aggregating information across an entire body of documents.
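The retrieval step described above can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector in place of a real embedding model, and the three chunks, the function names and the queries are invented for illustration only. The second query needs facts from all three chunks, yet top-1 similarity search returns only the chunk that shares the most words with the question.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (an assumed stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical corpus, already split into chunks.
chunks = [
    "Alice founded Acme in 2010.",
    "Acme acquired Beta Labs in 2015.",
    "Beta Labs is based in Oslo.",
]

# Single-passage question: the answer sits in one chunk, so retrieval succeeds.
single_hop = retrieve("Where is Beta Labs based?", chunks, k=1)

# Multi-hop question: answering needs all three chunks, but the top hit
# is the chunk about Alice, which does not mention Oslo.
multi_hop = retrieve(
    "Which city hosts the lab acquired by the company Alice founded?", chunks, k=1
)
```

In this toy setting `single_hop` contains the Oslo chunk, while `multi_hop` contains only the chunk about Alice founding Acme. Raising `k` helps on a corpus this small, but it does not give the retriever any notion of how the facts chain together.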
How GraphRAG works
GraphRAG proceeds in an indexing phase and a query phase. During indexing, a language model reads the source documents and extracts entities, such as people, organisations and concepts, along with the relationships between them. These are assembled into a knowledge graph. The graph is then partitioned into nested communities of closely related entities using a graph clustering method; Microsoft's implementation uses the Leiden algorithm to build a hierarchy of communities. For each community, the language model writes a summary describing the entities it contains and how they relate.
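The indexing steps above can be sketched in a few lines. This is an illustration under stated assumptions, not Microsoft's implementation: the triples are hand-written stand-ins for what the language model would extract, connected components stand in for the Leiden algorithm (so there is no hierarchy), and `summarise` simply joins facts where a real system would call a language model. All names are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of the LLM extraction step: (source, relation, target) triples.
triples = [
    ("Alice", "founded", "Acme"),
    ("Acme", "acquired", "Beta Labs"),
    ("Beta Labs", "based_in", "Oslo"),
    ("Carol", "leads", "Delta Institute"),
]

def build_graph(triples):
    """Undirected adjacency map: entity -> set of neighbouring entities."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph

def find_communities(graph):
    """Group entities by connected component (a simple stand-in for Leiden)."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities

def summarise(members, triples):
    """Placeholder for the LLM summary call: lists the relationships inside one community."""
    facts = [f"{s} {r} {t}" for s, r, t in triples if s in members and t in members]
    return "; ".join(facts)

graph = build_graph(triples)
communities = find_communities(graph)
summaries = [summarise(members, triples) for members in communities]
```

Here the four triples yield two communities, one around Acme and one around the Delta Institute, each with its own summary. A clustering method such as Leiden would additionally split large, densely connected components into nested sub-communities.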
At query time, GraphRAG can operate in different modes. For global questions, it draws on the community summaries at an appropriate level of the hierarchy, combines partial answers from each, and produces a synthesised response covering the whole corpus. For more specific questions, it can traverse the graph from relevant entities to gather connected facts. This structure lets the system answer questions that require understanding how concepts link across many documents.
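The two query modes can be sketched as follows, using a small hand-written index. The graph, the facts, the summaries and the function names are assumptions for illustration. In particular, `answer_from_summary` scores a summary by keyword overlap where a real system would ask a language model for a partial answer and a helpfulness rating.

```python
from collections import deque

# Hypothetical index: adjacency map, one fact per edge, and community summaries.
graph = {
    "Alice": {"Acme"},
    "Acme": {"Alice", "Beta Labs"},
    "Beta Labs": {"Acme", "Oslo"},
    "Oslo": {"Beta Labs"},
}
facts = {
    frozenset({"Alice", "Acme"}): "Alice founded Acme",
    frozenset({"Acme", "Beta Labs"}): "Acme acquired Beta Labs",
    frozenset({"Beta Labs", "Oslo"}): "Beta Labs is based in Oslo",
}
community_summaries = [
    "Acme group: Alice founded Acme, which acquired Oslo-based Beta Labs.",
    "Delta Institute: Carol leads a research institute.",
]

def local_search(start, max_hops=2):
    """Breadth-first walk from a seed entity, collecting the fact on each edge it follows."""
    visited, queue, collected = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(graph[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                collected.append(facts[frozenset({node, neighbour})])
                queue.append((neighbour, depth + 1))
    return collected

def answer_from_summary(question, summary):
    """Placeholder for the per-community LLM call: returns (score, partial answer)."""
    q_terms = set(question.lower().split())
    score = sum(1 for word in summary.lower().split() if word.strip(".,:") in q_terms)
    return score, summary

def global_search(question):
    """Map over community summaries, drop zero-score partial answers, combine the rest."""
    partials = [answer_from_summary(question, s) for s in community_summaries]
    useful = sorted((p for p in partials if p[0] > 0), key=lambda p: p[0], reverse=True)
    return " ".join(text for _score, text in useful)

local_facts = local_search("Alice", max_hops=3)
global_answer = global_search("Which organisations did Alice and Carol work with?")
```

Starting from `Alice` with three hops, `local_search` gathers the chain of facts that leads to Oslo, which similarity search over isolated chunks would have to find by chance. `global_search` follows a map-then-combine pattern: every community contributes a partial answer, and only the helpful ones reach the final response.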
Performance and variants
Microsoft reported that GraphRAG produced substantially more comprehensive and diverse answers than baseline RAG on whole-dataset reasoning tasks. The open-source release attracted tens of thousands of stars on GitHub and prompted a wave of research variants, including robustly optimised implementations, agentic graph-search workflows, and hierarchical tag-guided retrieval.
The main trade-off is cost. Building the graph requires many language model calls to extract entities and generate community summaries, which makes indexing more expensive and slower than simply embedding text chunks. For collections that change frequently, keeping the graph up to date adds further overhead. As a result, GraphRAG is most attractive for high-value corpora where accurate global reasoning justifies the additional indexing expense.
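A back-of-envelope count makes the trade-off concrete. The function below is a rough sketch: it assumes one extraction call per chunk per pass and one summary call per community, and the corpus sizes are illustrative numbers rather than measurements.

```python
def indexing_llm_calls(num_chunks: int, num_communities: int, extraction_passes: int = 1) -> int:
    """Rough LLM call count: extraction per chunk per pass, plus one summary per community."""
    return num_chunks * extraction_passes + num_communities

# Illustrative corpus: 10,000 chunks that cluster into 1,500 communities.
graph_calls = indexing_llm_calls(num_chunks=10_000, num_communities=1_500)

# Baseline RAG indexing embeds each chunk but makes no generative LLM calls.
baseline_calls = 0
```

Under these assumptions graph indexing needs 11,500 language model calls before the first query is answered, and any re-extraction after the corpus changes adds to that total.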
| Aspect | Baseline RAG | GraphRAG |
| --- | --- | --- |
| Retrieval unit | Text chunks | Entities, relationships, community summaries |
| Multi-hop questions | Weak | Strong |
| Whole-corpus questions | Poor | Designed for them |
| Indexing cost | Low | Higher |
References
- Edge, D. et al. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Microsoft Research, arXiv:2404.16130.
- Microsoft Research. (2024). Project GraphRAG. microsoft.com/research.
- IBM. (2025). What is GraphRAG? IBM Think Topics.