GraphRAG Explained: Neo4j Cypher AI + Knowledge Graph RAG Guide

Author: QuarkAndCode
Published:
Source: https://medium.com/@QuarkAndCode/graphrag-explained-neo4j-cypher-ai-knowledge-graph-rag-guide-941777de986c
Fetched: 2026-06-07T01:56:57.992406

Large language models can sound confident even when they’re wrong. That’s not a personality flaw — it’s how they work: they generate likely text, not guaranteed facts. So when you ask a question that depends on your data (your documents, your systems, your domain), you need a way to “ground” the model in real information.

That’s the idea behind Retrieval-Augmented Generation (RAG): retrieve relevant source material first, then let the model answer using that material as context. Neo4j’s developer blog describes standard RAG as using vector search to find relevant documents and passing them to a chatbot as context to reduce hallucinations and enable grounded answers with references.

GraphRAG is what happens when you go one step further: instead of retrieving a flat list of similar text chunks, you use a graph’s relationships to pull in additional, connected context — often exactly the “missing bridge” that makes an answer correct and explainable. Neo4j’s “(Almost) Pure Cypher” post summarizes it simply: GraphRAG still uses vector search to find a starting document, but then it expands through the knowledge graph to gather more relevant material to feed the model.

But GraphRAG is also used in another, equally important way: as a natural-language interface to a knowledge graph, where an LLM turns a question into a graph query (like Cypher), executes it, and then translates results back into plain language. A recent paper calls GraphRAG “a paradigm that extends RAG by incorporating graph-structured data” for more accurate and multi-hop reasoning — and notes that systems often include stages like graph indexing, graph-guided retrieval, and graph-enhanced generation.

Let’s unpack what this means in practice, what’s newly possible in Neo4j, what research says about making GraphRAG more reliable, and (crucially) when you may not need it at all.

RAG in plain English

A modern RAG system usually has three moving parts:

A store of information (documents, tickets, wiki pages, product manuals, etc.)
A retrieval step that picks the most relevant pieces (often via vectors/embeddings)
A generation step that answers using only the retrieved pieces as evidence

Neo4j’s GraphRAG post emphasizes why this matters: in RAG, retrieval results become context for a chatbot, which helps reduce hallucinations and can provide source references.
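
To make the three parts concrete, here is a minimal, dependency-free Python sketch. It is an illustration under stated assumptions, not any product's API: the "embedding" is a toy bag-of-words count vector standing in for a real embedding model, the store is an in-memory dict, and the document ids are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, store: dict, k: int = 2) -> list:
    """Retrieval step: ids of the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda doc_id: cosine(q, embed(store[doc_id])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, store: dict, ids: list) -> str:
    """Generation-step input: the question plus only the retrieved evidence."""
    context = "\n".join(f"[{doc_id}] {store[doc_id]}" for doc_id in ids)
    return (
        "Answer using only the context below and cite the [ids] you rely on.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

A production system would swap the toy embedding for a real model, the sorted scan for a vector index, and usually add a reranker; the shape of the pipeline stays the same.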

So what’s different about GraphRAG?

Vector search is good at answering: “Which text looks semantically similar to my question?”
Graphs are good at answering: “What is connected to what — and how?”

GraphRAG combines both:

Vector search gets you an entry point (a relevant node/document).
Graph traversal expands the context along meaningful relationships.
The LLM answers using that richer, connected context.

Neo4j’s 2025 post (about new Cypher AI procedures) gives a concise “GraphRAG flow” recap:

1) embed the question, 2) vector search for relevant nodes, 3) traverse the graph for neighboring context, 4) ask the LLM to answer only from that context.
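
The four-step flow can be sketched in a few lines of Python. This is a simplified in-memory illustration, not Neo4j's implementation: step 1 (embedding the question) is assumed to have already produced `question_vec`, the graph is an edge list treated as undirected, and the names `graphrag_context`, `node_vecs`, and `node_text` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def graphrag_context(question_vec, node_vecs, edges, node_text, k=1, hops=1):
    """Return (seed ids, context texts) for a vector-seeded graph expansion."""
    # Step 2: vector search over the nodes that carry embeddings.
    seeds = sorted(node_vecs, key=lambda n: cosine(question_vec, node_vecs[n]), reverse=True)[:k]

    # Step 3: breadth-first expansion up to `hops` relationships from the seeds.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    selected, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {m for n in frontier for m in adjacency.get(n, ())} - selected
        selected |= frontier

    # Step 4 would send these texts to the LLM with an "answer only from this context" prompt.
    return seeds, [node_text[n] for n in sorted(selected)]
```

Note that only the seed step needs embeddings; nodes reached by traversal contribute text even if they were never embedded, which is where the "missing bridge" context comes from.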

Meanwhile, the Lilys.ai note describes the “querying with LLMs” flavor: the model generates a Cypher query from a natural-language question, executes it against the graph database, and then translates results back into a natural-language response — often using a two-part prompting strategy (one prompt to generate Cypher, another to produce the final answer).

These are two complementary patterns:

Graph expansion GraphRAG (vector seed → traversal → context → answer)
Text-to-Cypher GraphRAG (LLM generates query → database returns facts → answer)

Many real systems blend them.
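
The text-to-Cypher pattern with its two prompts can be sketched as below. The prompt wording and the `llm` and `run_query` callables are hypothetical placeholders: in practice `llm` would wrap a model client and `run_query` would execute the Cypher through a database driver, neither of which is shown here.

```python
from typing import Callable

CYPHER_PROMPT = (
    "Translate the question into a single Cypher query for this schema.\n"
    "Schema:\n{schema}\n\n"
    "Question: {question}\n"
    "Return only the Cypher query."
)

ANSWER_PROMPT = (
    "Question: {question}\n"
    "Database results: {rows}\n"
    "Answer in plain language using only these results. "
    "If they are empty, say you don't know."
)


def text_to_cypher_answer(question: str, schema: str,
                          llm: Callable[[str], str],
                          run_query: Callable[[str], list]) -> str:
    # Prompt 1: natural-language question -> Cypher query.
    cypher = llm(CYPHER_PROMPT.format(schema=schema, question=question)).strip()
    # Execute the generated query against the graph database.
    rows = run_query(cypher)
    # Prompt 2: query results -> natural-language answer.
    return llm(ANSWER_PROMPT.format(question=question, rows=rows))
```

Keeping the prompts separate means the second call sees the rows the database actually returned, so the final answer is tied to query results rather than to the model's memory.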

A concrete example: GraphRAG with a massive movie graph (Neo4j, 2024)

In July 2024, Neo4j’s Christoffer Bergman demonstrated GraphRAG using a huge entertainment dataset: about 10 million titles, 13 million people, and 106 million relationships, with ~1.3 million titles containing synopses. The graph was built from public, non-commercial IMDB and TMDB data fetched on February 22, 2024.

The workflow he showed is a “classic” GraphRAG pattern:

Create a vector index on the node property that will store embeddings (in this case, a synopsis embedding).
Generate embeddings in batches using Neo4j’s GenAI procedure genai.vector.encodeBatch and set them on nodes — because embedding 1.3 million synopses requires batching. The post notes a practical constraint: OpenAI supported 2,048 properties in a batch, which shaped how batching was done.
Ask a question — in the post, the example question was: “With what family does Robb Stark start a war?”
Use vector search to locate the most relevant synopsis nodes, then use graph structure to expand from that entry point to gather additional relevant context.
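
The batching constraint is simple arithmetic: at 2,048 items per batch, roughly 1.3 million synopses need about 635 batches (634 full ones plus one partial batch). A minimal Python helper for slicing any sequence into such batches is sketched below; in the post each batch is passed to `genai.vector.encodeBatch`, a call this sketch does not reproduce.

```python
from itertools import islice


def batched(items, size=2048):
    """Yield consecutive lists of at most `size` items (2,048 = the batch limit cited in the post)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

On Python 3.12 and later, `itertools.batched` does the same job (yielding tuples instead of lists).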

Even without showing every line of query code, the core idea is intuitive:

Vector search gets you “Game of Thrones” quickly.
The graph helps you pull in related episodes/series links or other relevant nodes so the model gets a fuller picture than one synopsis chunk.

Why this matters: If you only retrieve the “closest chunk,” you can easily miss key details that sit one or two hops away — like relationships, roles, or linked entities.

From “almost pure Cypher” to “pure Cypher”: Neo4j’s 2025 leap

The 2024 post was called “GraphRAG in (Almost) Pure Cypher” for a reason: you could do embeddings and vector search in Cypher, but you still needed external code (or special plugins) to call the LLM for the final answer.

In December 2025, Neo4j published an update: it’s time to remove the “Almost.” The post explains that previously you had to use external functions/client-side code to integrate the LLM, but “not anymore,” because Neo4j introduced a new package of AI functions/procedures “in Cypher and Aura.”

What changed, specifically?

The blog describes new Cypher AI capabilities, available as of the 2025.11 release, including:

Embedding generation: ai.text.embed, ai.text.embedBatch, ai.text.embed.providers
Text generation: ai.text.completion, ai.text.completion.providers

It also notes that Neo4j introduced vectors as a native data type at the end of 2025 (not just a list of floats), and these new procedures align with that capability.

A memorable demo: using GraphRAG to “investigate” a famous cold case

To show what “pure Cypher GraphRAG” looks like end-to-end, the 2025 post uses a real-world narrative: the murder of Swedish Prime Minister Olof Palme, framing it as a GraphRAG case study.

The author points out a practical barrier: accessing the full investigation files via official requests is costly and slow (the post mentions a per-page fee and an estimated total cost). Instead, he uses a public wiki resource, wpu.nu, and notes that “a wiki is… a graph,” making it straightforward to import pages and links into a graph database. He simplifies the schema to Pages and Categories (with room to expand later by extracting people/suspects/witnesses as entities).

The GraphRAG pipeline, in Cypher

Once the pages are in the graph:

Create a vector index for (:Page) embeddings.
Embed page text in bulk.

The post highlights a real operational limit: embedding batches have a 300,000 token limit, so it proposes splitting into batches (the example uses 400 pages per batch) rather than embedding everything in one call.
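
A batching helper that respects both limits might look like the sketch below. The post uses a fixed 400 pages per batch; the greedy token check is an added safeguard (an assumption, not the post's method), and `rough_token_count` (about four characters per token) is a hypothetical stand-in for the embedding provider's real tokenizer.

```python
from typing import Callable


def pack_batches(pages: list, count_tokens: Callable[[str], int],
                 max_tokens: int = 300_000, max_pages: int = 400) -> list:
    """Greedily group page indices so no batch exceeds max_tokens or max_pages."""
    batches, current, used = [], [], 0
    for i, text in enumerate(pages):
        n = count_tokens(text)
        if n > max_tokens:
            raise ValueError(f"page {i} alone exceeds the token limit ({n} tokens)")
        # Close the current batch if this page would break either limit.
        if current and (used + n > max_tokens or len(current) == max_pages):
            batches.append(current)
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        batches.append(current)
    return batches


def rough_token_count(text: str) -> int:
    """Hypothetical heuristic (~4 characters per token); use the real tokenizer in practice."""
    return max(1, len(text) // 4)
```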

For retrieval, the post shows an approach that accounts for approximate nearest-neighbor search: query a larger set (top 20) and then pick the top 2 for more reliable seeds.
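
One way to read the over-fetch step: take the 20 approximate candidates, score them exactly, and keep the best 2. The exact re-scoring shown here is an assumption for illustration (the post may simply keep the index's first two hits), and the vectors are plain Python lists with hypothetical page ids.

```python
import math


def cosine(a, b):
    """Exact cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_seeds(query_vec, candidate_ids, vectors, keep=2):
    """Re-score over-fetched ANN candidates exactly and keep the best `keep` as seeds."""
    rescored = sorted(candidate_ids, key=lambda c: cosine(query_vec, vectors[c]), reverse=True)
    return rescored[:keep]
```

In a real pipeline, `candidate_ids` would be the ids returned by the vector index for a top-20 query and `vectors` their stored embeddings; re-scoring 20 items exactly is cheap compared with the index lookup.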

Then comes the “graph part”:

Start from the two seed pages.
Expand to neighboring pages connected via LINKS_TO.
Also gather pages along shortest paths between the two seed nodes up to a bounded length (0–4 hops in the example), to capture bridging context.
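
The expansion step can be sketched with two breadth-first searches, using the fact that in an unweighted graph a node v lies on some shortest path between seeds a and b exactly when dist(a, v) + dist(v, b) = dist(a, b). This is an in-memory illustration rather than the post's Cypher; treating LINKS_TO as undirected and adding only the seeds' direct neighbors are assumptions.

```python
from collections import deque


def bfs_distances(adj, start, max_hops):
    """Hop distance from `start` to every node reachable within `max_hops`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop bound
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def expand_context(links, seed_a, seed_b, max_hops=4):
    """Seeds, their direct neighbors, and every node on a shortest seed_a/seed_b path."""
    adj = {}
    for src, dst in links:  # LINKS_TO pairs, treated as undirected in this sketch
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)

    context = {seed_a, seed_b} | adj.get(seed_a, set()) | adj.get(seed_b, set())

    dist_a = bfs_distances(adj, seed_a, max_hops)
    dist_b = bfs_distances(adj, seed_b, max_hops)
    if seed_b in dist_a:  # only if the seeds are at most max_hops apart
        d = dist_a[seed_b]
        # v lies on some shortest path iff dist(a, v) + dist(v, b) == dist(a, b)
        context |= {v for v in dist_a if v in dist_b and dist_a[v] + dist_b[v] == d}
    return context
```

The hop bound matters: with seeds four links apart, the middle page is included at `max_hops=4` but dropped at `max_hops=3`, because the seeds are then treated as unconnected.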

Finally, the post explains the key new capability:

The part that you couldn’t do in Cypher before … is now possible with ai.text.completion.

The prompt instructs the model to answer only from context and to say “I don’t know” when the answer isn’t supported.
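
A prompt builder in that spirit is sketched below. The wording is hypothetical, not the post's actual prompt, and `is_abstention` is an illustrative helper that detects the "I don't know" reply so the application can handle it separately from a real answer.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
Cite the titles of the pages you relied on.
If the context does not support an answer, reply: I don't know.

Context:
{context}

Question: {question}"""


def grounded_prompt(question: str, pages: dict) -> str:
    """Format retrieved pages (title -> text) into a context-only prompt."""
    context = "\n\n".join(f"## {title}\n{text}" for title, text in pages.items())
    return PROMPT_TEMPLATE.format(context=context, question=question)


def is_abstention(answer: str) -> bool:
    """True if the model declined to answer (accepts straight or curly apostrophes)."""
    return answer.strip().replace("\u2019", "'").lower().startswith("i don't know")
```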

What the output looks like when the system is behaving well

When asked about whether a specific person likely committed the murder, the example answer is careful and cites which wiki pages/sections support it.
When asked “who murdered Olof Palme,” the system replies “I don’t know,” and explains that the provided context does not establish a definitive perpetrator — again with references to relevant pages.

For general readers, this is a big deal: it shows GraphRAG can produce answers that are not just fluent, but epistemically honest — willing to say “unknown” when evidence is missing.

What research adds: making Text-to-Cypher GraphRAG more reliable with multiple agents

The arXiv paper “Multi-Agent GraphRAG: A Text-to-Cypher Framework for Labeled Property Graphs” argues that much GraphRAG work focuses on RDF graphs and SPARQL, while Cypher + labeled property graphs (LPGs) are underexplored as reasoning engines for GraphRAG pipelines.

The core idea

Instead of trusting a single LLM prompt to generate a correct Cypher query, the paper proposes an agentic workflow:

Generate a Cypher query
Execute it against the graph database
Critique the result and fix the query
Verify named entities and schema elements against the actual graph
Repeat until the query is accepted or attempts are exhausted
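
The loop can be sketched as a plain Python function with the agents passed in as callables. This is a simplified illustration, not the paper's LangGraph code: `verify` stands in for the entity extractor, verification module, and instructions generator combined; feedback aggregation is a simple list concatenation; and the verdict strings are assumed names for the paper's Accept, Incorrect, and Error/Empty grades.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    query: str
    verdict: str  # "accept", "incorrect", or "error_empty" (assumed labels)
    feedback: list = field(default_factory=list)


def refine_query(question, generate, execute, evaluate, verify, max_attempts=3):
    """Generate, execute, critique, verify, and retry until accepted or out of attempts."""
    feedback = []
    history = []
    for _ in range(max_attempts):
        query = generate(question, feedback)              # Query Generator
        rows, error = execute(query)                      # Graph Database Executor
        verdict = evaluate(question, query, rows, error)  # Query Evaluator
        if verdict == "accept":
            history.append(Attempt(query, verdict))
            return query, rows, history
        # Schema/entity checks plus the evaluator's verdict become the next instructions.
        feedback = verify(query) + [f"previous query was judged '{verdict}'"]
        history.append(Attempt(query, verdict, feedback))
    return None, None, history
```

Returning the full `history` keeps every rejected query and its feedback, which is useful for debugging why a question needed several rounds.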

The agents

The system defines seven agent “roles” plus a database executor module:

Query Generator: writes Cypher, grounded on a provided schema context.
Graph Database Executor: runs the query and returns results or errors.
Query Evaluator: grades the query output as Accept, Incorrect, or Error/Empty based on semantic alignment and result quality.
Named Entity Extractor: pulls out labels, property values, and relationship types likely to be hallucinated.
Verification Module: checks extracted entities against the real graph; when something doesn’t exist, it uses string similarity (Levenshtein via rapidfuzz) to suggest candidates and then uses an LLM to rank replacements semantically.
Instructions Generator: turns verification results into concise, actionable correction instructions for the query generator.
Feedback Aggregator: merges evaluator feedback and verification feedback into a prioritized correction plan.
Interpreter: turns an accepted query’s results into a concise domain answer.

It’s a practical acknowledgement of reality: in graph querying, a lot can go wrong — mis-typed labels, wrong relationship directions, nonexistent properties — and you need automated ways to catch and correct these.
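
The string-similarity part of the verification step can be sketched without dependencies. The paper uses rapidfuzz; this sketch swaps in a hand-written Levenshtein distance normalized to a 0 to 1 similarity, and the `threshold` and `top_n` values are illustrative assumptions. The paper's follow-up step, where an LLM ranks the candidates semantically, is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance normalized to a 0..1 similarity score."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def verify_entity(name, known, top_n=3, threshold=0.5):
    """Return (exists, candidates): candidates are close matches when `name` is absent."""
    if name in known:
        return True, []
    scored = sorted(((similarity(name.lower(), k.lower()), k) for k in known), reverse=True)
    return False, [k for score, k in scored[:top_n] if score >= threshold]
```

For example, with known relationship types such as `ACTED_IN` and `DIRECTED`, a hallucinated `ACTS_IN` is reported as missing and `ACTED_IN` comes back as the closest candidate for the correction instructions.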

What they tested

They evaluate on CypherBench, selecting five graphs (art, flight accident, company, geography, fictional character) and sampling 150 question–answer pairs per graph.
They also test on an Industry Foundation Classes (IFC) building “digital twin” graph (Sample House), using 10 curated questions from a prior dataset.

Across multiple models, their agentic workflow outperforms a “single-pass” baseline. The paper reports average improvements of about +10.23% for Gemini 2.5 Pro, +6.79% for GPT-4o, +7.67% for Qwen3 Coder, and +10.01% for GigaChat 2 MAX in their setup.

They implement the system in Python, using LangGraph for orchestration and Memgraph as the database backend, with models accessed through an OpenAI-compatible API.

The hype check: do you actually need GraphRAG?

GraphRAG can be powerful — but it’s not automatically the right answer.

A practitioner-oriented piece promoted by Towards Data Science highlights real production challenges:

Semantic retrieval can return false positives
LLM “attention drop” can cause missed entities during extraction
Robust GraphRAG depends on data normalization, deduplication, and chunking-based entity extraction

In another summary post, the same source emphasizes that building a graph doesn’t have to be overwhelming and mentions pragmatic design principles such as using a star graph to balance effort vs. value.

Note on sourcing: the full Towards Data Science article itself was not directly accessible to me in this environment (site restrictions). The points above come from the publicly visible summary posts linking to it.

A practical decision guide for general readers

Here’s a grounded way to decide.

You probably do want GraphRAG when…

Your questions are inherently relational.
Examples: “Who worked with whom?”, “What caused what?”, “Which components depend on this one?”, “What changed between these two versions?”

Graph traversal is built for this kind of multi-hop reasoning and connecting evidence across entities.

Your data has structure that matters.
If it’s not just text, but entities and links (people ↔ projects ↔ tickets ↔ services), a graph can represent that structure directly.

You need more than “top-K chunks.”
Vector RAG is often limited to a handful of “most similar” chunks. The Lilys.ai note argues that GraphRAG can overcome that limitation by using graph-based mechanisms and potentially graph indexes/summaries to retrieve broader context.

You care about explainability.
GraphRAG makes it easier to justify “why this context was chosen” by pointing to traversed relationships and paths — like the Neo4j example that references specific wiki pages/sections.

You might not need GraphRAG when…

Your questions are mostly “find the paragraph that says X.”
If users ask simple factual questions that live in one place in one document, a well-built vector RAG system (with good chunking + reranking) is often enough.

Your “graph” is expensive to build and maintain.
If you don’t already have relationships, you may need extraction pipelines (and all the normalization/deduplication work that goes with them). That’s not free.

Your biggest problem is retrieval quality, not reasoning.
GraphRAG can still fail if your initial retrieval is noisy (false positives) or your entity extraction misses key items.

A common outcome in production is a hybrid: vector search for recall, graph logic for precision and reasoning. Lilys.ai explicitly notes that hybrid RAG systems can combine both vector and graph databases in practice.

A “starter blueprint” for building GraphRAG without drowning in complexity

If you’re designing for real users, start small and iterate.

1) Pick your first graph shape

Neo4j’s cold-case example starts with Pages and Categories only, even though it could extract richer entities later. This is a good pattern: start with a minimum useful graph and expand once it’s delivering value.

A star graph (one central entity connected to many attributes/documents) is often a high-value first step, as practitioner summaries suggest.
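
A star graph is easy to express as plain node and edge lists, which is part of its appeal: every item sits one hop from the hub, so the graph-expansion step reduces to a single-hop lookup. The sketch below is illustrative only; the entity names and relationship types (`HAS_TICKET`, `HAS_CONTRACT`) are hypothetical.

```python
def star_graph(center, spokes):
    """Node set and (source, type, target) edges for one hub entity.

    `spokes` maps a relationship type to the items attached to the hub through it.
    """
    nodes = {center}
    edges = []
    for rel_type, items in spokes.items():
        for item in items:
            nodes.add(item)
            edges.append((center, rel_type, item))
    return nodes, edges
```

For example, `star_graph("Customer:Acme", {"HAS_TICKET": ["Ticket:1", "Ticket:2"], "HAS_CONTRACT": ["Contract:7"]})` yields four nodes and three edges, all starting at the customer node.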

2) Decide what gets embeddings

In both Neo4j examples, embeddings are attached to nodes with substantial text (movie synopses; wiki page text).

3) Batch embedding generation deliberately

The 2025 Neo4j post shows why: token limits and efficiency matter, so you batch to stay under constraints (it mentions a 300,000 token limit per batch and uses 400-page batches).

4) Retrieval: seed with vectors, expand with graph paths

A robust pattern is:

Embed the question
Vector search for seed nodes (sometimes over-fetch and then refine)
Traverse neighbors and (optionally) shortest paths to capture bridging context

5) Generation: force grounding and allow “I don’t know”

Neo4j’s example prompt explicitly instructs: answer only from context, cite references, and say “I don’t know” if unsupported.

6) For Text-to-Cypher systems: add verification and feedback loops

If you go the “LLM writes Cypher” route, the multi-agent paper shows why iterative refinement and entity verification can improve accuracy — and how to do it (verification queries, string similarity candidates, LLM ranking, structured feedback).

Where GraphRAG is heading

Tooling is moving into the database (Neo4j’s ai.text.* procedures), reducing glue code and enabling reproducible pipelines in Cypher.
Research is moving toward agentic, verifiable graph querying (multi-agent workflows that detect and correct hallucinated schema elements).
Practitioners are pushing back on hype and emphasizing the unglamorous essentials: retrieval precision, normalization, deduplication, and extraction quality.

GraphRAG isn’t a magic upgrade button. It’s a design choice: pay the cost of structure so your AI can reason over relationships — then get the benefits of better multi-hop answers, better traceability, and more honest uncertainty.

References

https://neo4j.com/blog/developer/graphrag-pure-cypher/

https://neo4j.com/blog/developer/new-cypher-ai-procedures/

https://arxiv.org/abs/2511.08274

https://towardsdatascience.com/do-you-really-need-graphrag-a-practitioners-guide-beyond-the-hype/

https://lilys.ai/en/notes/get-your-first-users-20260207/graphrag-ai-retrieval-knowledge-graph-cypher