Developers Guide GraphRAG
GraphRAG
Alison Cossette
Zach Blumenfeld
Damaso Sanoja
The Developer’s Guide to GraphRAG
Table of Contents
What Is GraphRAG?
1. Context-Aware Responses
Neo4j Connection
Mapping Relationships
Import Libraries
Text2CypherRetriever
PART I: The Problem With Current RAG

Why chunk-based RAG hits a ceiling, and why developers need more context to answer well

You’ve built a retrieval-augmented generation (RAG) system. You embedded the docs, connected the vector store, wrapped a prompt around the output, and deployed it. For a minute, it felt like you cracked the code. The model was grounded in your own data, giving answers that sounded smarter than base GPT.

Then reality hit.

The system works, but only under the most forgiving conditions. The moment you ask a question that spans documents, relies on implicit context, or touches anything complex or structured, the cracks start to show. Answers get vague. Sometimes they’re just plain wrong. Or worse, the system confidently quotes the right chunk but misses the point entirely.

Your RAG system isn’t broken. It’s just blind.

You give it a query, it vectorizes that query, and fetches the top-k similar chunks. That’s fine if the answer you need lives entirely within isolated chunks. But most real-world questions don’t work that way.

Let’s say a user asks about a contract clause, but the meaning depends on a sales addendum from three weeks earlier. Or maybe they ask a support question that only makes sense in the context of their infrastructure and license tier. The information is there, but it’s scattered across multiple documents, formats, and timelines. Chunk-based retrieval can’t bridge that gap.

Traditional RAG doesn’t have shared context across documents. That’s because it doesn’t track relationships. It doesn’t know which concepts are upstream, downstream, dependent, or mutually exclusive. It doesn’t distinguish between definitions, instructions, timelines, policies, or decision logic.

The bottom line: Traditional RAG treats all chunks as equal, flat, unstructured blobs of text.

Even more problematic is that the system has no mental model for your business. It cannot understand what a “customer” is in your world. Or how a support ticket relates to a contract. Or what a system diagram implies about downstream integrations. The mental model that represents the structure behind your content is absent in RAG.

• Answer a support question and understand the user’s tech stack, contract level, and product version.
• Explain a contract term, and know what the sales path looked like, who signed off, and which systems were impacted.
• Interpret a customer review and place it in context with purchase history, usage data, and net promoter score (NPS).

These shouldn’t feel like advanced use cases; they’re basic context. They’re what you, as a human developer, bring into every decision without even realizing it. And that’s the problem: Your RAG system has none of that. Sure, it has some document metadata available, but no user metadata, no business logic, no connected data, just isolated chunks in a vector store. But RAG can’t use what it can’t see. So until you give it structure, until you teach it relationships, timelines, ownership, and dependencies, it will keep retrieving the right words for the wrong reasons.

This isn’t a whitepaper. It’s a build-it-yourself playbook.

We’re going to walk you through:
• Ingesting documents and turning them into a knowledge graph
• Structuring real-world context from messy PDFs, CSVs, and APIs
• Building retrievers that combine vector search and graph traversal
• Using text-to-query generation to run dynamic Cypher queries (a query language for graphs) and pull precise information and calculations from your data

And we’re going to do it with code. No fluff. Just the stack, the logic, and the patterns that actually work.

If you’ve built RAG, and you know it’s not enough, then this is the guide to take you further.

[...] such as ChatGPT, Gemini, and Claude. When a user’s prompt goes directly to the LLM, it generates a response based on its training data. Due to the probabilistic nature of response generation, LLMs often produce responses that lack accuracy and nuance and don’t draw on knowledge specific to your business. In addition, the LLM in question may have limited explainability, which limits its adoption in enterprise settings.

RAG addresses these challenges by intercepting a user’s prompt, querying external data, usually a vector store, and passing relevant documents back to the LLM. Adding retrieval to the LLM enables the application to answer questions with knowledge from a specific dataset. This simple technique suddenly makes it possible to build applications for a variety of use cases. As examples:
• Knowledge assistants can tap into company-specific information for accurate, contextual responses.
• Recommendation systems can incorporate real-time data for more personalized suggestions.
• Search APIs can deliver more nuanced and context-aware results.

RAG consists of three key components:
• An LLM that serves as the generator
• A knowledge base or database that stores the information to be retrieved
• A retrieval mechanism to find relevant information from the knowledge base, based on the input query
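To make the three components concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a production design: the knowledge base is a hypothetical in-memory list, the retriever ranks by word overlap instead of vector similarity, and `echo_llm` is a placeholder where a real LLM call would go.

```python
import re
from typing import Callable

# Hypothetical in-memory knowledge base; a real system would use a vector store.
KNOWLEDGE_BASE = [
    "Refunds are issued within 14 days of a return request.",
    "The Pro plan includes priority support and single sign-on.",
    "Coffee makers ship with a two-year limited warranty.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Retrieval mechanism: rank documents by word overlap with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context and the user question into one prompt."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"


def answer(query: str, generate: Callable[[str], str]) -> str:
    """Retrieve context, build the prompt, and hand it to the generator."""
    context = retrieve(query, KNOWLEDGE_BASE)
    return generate(build_prompt(query, context))


def echo_llm(prompt: str) -> str:
    """Placeholder generator that returns the prompt; swap in a real LLM call."""
    return prompt
```

Calling `answer("What warranty do coffee makers have?", echo_llm)` returns the assembled prompt, which makes it easy to inspect exactly what context the generator would receive.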
The quality of a RAG response depends heavily on the database type the information is retrieved from. If you use a vector store (as in traditional RAG), the process goes like this: The user query is turned into a vector, which is then used to retrieve semantically similar text chunks from a vector database. While retrieval based on semantic similarity can work across multiple documents, it often falls short when questions require understanding implicit context or relationships that span those documents. Traditional RAG treats each chunk in isolation, as it lacks a holistic view of the domain.

Retrieval based on semantic similarity can only get you so far. And this is where GraphRAG comes in. GraphRAG gives the LLM a mental model of your domain so that it can answer questions by drawing on the correct context.

What Is GraphRAG?

In GraphRAG, the knowledge base used for retrieval is a knowledge graph. A knowledge graph organizes facts as connected entities and relationships, which helps the system understand how pieces of information relate to each other.

The knowledge graph becomes a mental map of your domain, providing the LLM with information about dependencies, sequences, hierarchies, and meaning. This makes GraphRAG especially effective at answering complex, multi-step questions that require reasoning across multiple sources.

Imagine that a customer calls to request support regarding a recent purchase. Customer Service uses an internal chatbot to troubleshoot the request. A traditional system built on vector-only RAG would retrieve a product name from the customer support ticket:

Service Ticket | Service Ticket Text | Embedding
234381 | My new JavaCo coffee maker isn’t working. | [.234, .789, .123……]

But that’s all the RAG system would surface.

A GraphRAG system, on the other hand, would show not only this service ticket text but also the customer’s purchase history, known issues with that product version, related documentation, and prior support conversations.

Figure 2. Order issue flow

A knowledge graph holds all related information together across both structured and unstructured data. A RAG system built on a knowledge graph, or GraphRAG, excels at generating context-aware responses.

The main reasons to implement a GraphRAG solution include:

1. Context-Aware Responses
Unlike traditional RAG, which retrieves isolated chunks of text based on similarity, GraphRAG retrieves facts in context. Since the knowledge graph explicitly encodes relationships, GraphRAG returns relevant information, as well as related information. This structured retrieval ensures that application outputs are comprehensive, reducing hallucinations and leading to more accurate, reliable outputs and improving real-world applicability.

2. Traceability and Explainability
LLMs and even standard RAG approaches operate as black boxes, making it difficult to know why and how a certain answer was generated. GraphRAG increases transparency by structuring retrieval paths through the knowledge graph. The knowledge graph will show the sources and relationships that contributed to a response. This makes it easier to audit results, build trust, and meet compliance needs.
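The idea of a traceable retrieval path can be sketched with a tiny in-memory graph. Everything here is hypothetical: the customer ID, product version, known issue, and relationship names are invented to extend the service ticket example, and a real GraphRAG system would run a graph query against a database instead of scanning a Python list.

```python
from collections import deque

# Hypothetical graph: (source, relationship, target) triples around ticket 234381.
TRIPLES = [
    ("Ticket 234381", "FILED_BY", "Customer C-17"),
    ("Ticket 234381", "ABOUT", "JavaCo coffee maker v2"),
    ("Customer C-17", "PURCHASED", "JavaCo coffee maker v2"),
    ("JavaCo coffee maker v2", "HAS_KNOWN_ISSUE", "Heating element fault"),
    ("Heating element fault", "DOCUMENTED_IN", "Troubleshooting guide"),
]


def neighbors(node):
    """Return (relationship, target) pairs for edges leaving the node."""
    return [(rel, dst) for src, rel, dst in TRIPLES if src == node]


def related_facts(start, max_hops=2):
    """Breadth-first traversal that records, for each reachable node,
    the path of triples used to reach it (the audit trail)."""
    seen = {start}
    queue = deque([(start, [])])
    facts = {}
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, dst in neighbors(node):
            if dst in seen:
                continue
            seen.add(dst)
            new_path = path + [(node, rel, dst)]
            facts[dst] = new_path
            queue.append((dst, new_path))
    return facts


facts = related_facts("Ticket 234381", max_hops=2)
```

Each entry in `facts` carries the chain of relationships that led to it, so a response built from "Heating element fault" can cite the ticket-to-product-to-issue path that justified including it.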
Ground With Unstructured and Structured Data

If you’ve worked with RAG systems, you’re already familiar with vector databases and unstructured content: PDFs, contracts, reports. But the most important context for your data rarely lives in a single format. In fact, most of the time you’ll want to use more than just unstructured data. Structured data like CRM exports, product catalogs, and relational databases often contains crucial grounding information for the answers your users need.

To build systems that retrieve the right answer at the right time, you need to connect two worlds: unstructured and structured. That’s where knowledge graphs come in. By linking unstructured chunks to structured business entities and relationships, you create a semantic network that makes retrieval smarter, safer, and more transparent.

So, where do you start? With your documents or your structured schema?

Technically, you can begin from either side. But in practice, most teams start with unstructured data because that’s where the buried context usually lives. Think financial disclosures, legal contracts, emails, and support tickets. These contain implicit business logic, risk factors, and decision-making signals that don’t show up in structured rows and columns.

But here’s the catch: Structure isn’t binary. It’s a continuum.

At one end, you’ve got relational databases and clean CSV files, where entities and relationships are explicitly defined. At the other end, you’ve got raw text: meaning buried in natural language. In between? A complex middle: XML files, JSON logs, form submissions, and mixed-format documents with both tables and prose.

As you think about your own dataset, ask yourself these questions: Where does the context for your application actually live? And where on the structure continuum does it fall? These questions matter because they will help you determine the tools you should use to build the knowledge graph. For this guide, you’ll use:

• Neo4j Data Importer (Neo4j Aura Platform) for structured data
• Knowledge Graph Builder Pipeline (Neo4j GraphRAG Python Package) for extracting implicit relationships from natural language

If you find that your dataset has more complex data structures, you can consider adding tools to your workflow. This is an ever-evolving field, and many are working on building tools for these scenarios. A few to consider:

Tool | Description | Resource
[Link] | Extracts structured data (tables, lists, key-value pairs) from unstructured documents like PDFs, HTML, and email | Neo4j Integration Guide
Boundary’s Annotation Modeling Language (BAML) | Declarative language for extracting structured data from unstructured sources, demonstrated with Neo4j | BAML to Neo4j Tutorial by Jason Koo
If you’re just getting started, you’ll do well with AuraDB Free or AuraDB Professional trial.

Tip: Download your AuraDB credentials (URI, username, password) immediately after creating the instance. They will not be available for download later. Store them securely, as you’ll need them to connect your application to Neo4j.
Be sure to download the credentials when you set up the database because they won’t be available later on.

As you begin to build your knowledge graph, you can use the Neo4j GraphRAG Python library. This package offers specialized functionalities that streamline and enhance the process of building a knowledge graph from unstructured data, such as PDFs. Capabilities include document chunking, embedding generation, and knowledge graph construction.

pip install neo4j-graphrag

• Document Chunking and Storage: The package uses the SimpleKGPipeline class to automate chunking and storage. This class handles the parsing of documents, the chunking of text, and storage of chunks as nodes in Neo4j.

from neo4j_graphrag.[Link].kg_builder import SimpleKGPipeline

• neo4j: Official Python driver for interacting with a Neo4j database.
• GraphDatabase: Connects to Neo4j to interact with the graph database.
• SimpleKGPipeline: Automates chunking, entity recognition, and storage in Neo4j.
• OpenAILLM: Integrates GPT-4 for text-based processing and knowledge extraction.
• OpenAIEmbeddings: Handles vector embeddings to enable semantic search in Neo4j.
• ERExtractionTemplate: Supplies prompt templates for entity-relation extraction.
Note that the required credentials can be found in the .txt file you downloaded when you created the instance.

Figure 9. Credentials from .txt file

• NEO4J_URI: The database URL (e.g., “neo4j+s://[Link].neo4j.io”)
• auth=(NEO4J_USER, NEO4J_PASSWORD): Credentials to authenticate

Initialize the LLM and Embeddings

Defining your nodes and relationships in two lists is a key moment in the knowledge graph construction process. This is when you determine the data model. These lists control what the SimpleKGPipeline will look for in the text and how it will organize that information in your graph. To understand how you might want to construct these lists, let’s take a look at some general ideas.

Entities = Nouns
What are the real-world concepts you’re trying to capture?
Company, Executive, RiskFactor, Product: whatever matters to your domain.

Relationships = Verbs or Connectors
How do those concepts relate?

[...] of the chunk text. This is how your retriever finds the relevant chunk in your application: by comparing the embedding of the query and the embeddings in your data store.

[...] model used to vectorize text, which supports similarity-based retrieval alongside structured querying.
The entities and relations define the schema:
what kinds of objects (like Customers, Contracts,
Products) and relationships (like HAS_CONTRACT,
CONTAINS, REFERENCES) the pipeline should look
for. Finally, enforce_schema=True ensures that
only the entity and relationship types that have been
explicitly defined in those lists are allowed into the
graph. This prevents schema drift and keeps the
resulting knowledge graph clean and reliable.
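The filtering behavior described for enforce_schema=True can be pictured with a small stand-alone sketch. This is not the library's implementation; it only illustrates the idea of dropping any extracted triple whose node labels or relationship type fall outside the two lists. The `enforce_schema` function, the dictionary keys, and the sample extraction output are all hypothetical.

```python
# Hypothetical schema lists, echoing the node and relationship types named in the text.
ALLOWED_ENTITIES = {"Customer", "Contract", "Product"}
ALLOWED_RELATIONS = {"HAS_CONTRACT", "CONTAINS", "REFERENCES"}


def enforce_schema(extracted):
    """Keep only triples whose node labels and relationship type are all in the schema."""
    kept = []
    for triple in extracted:
        if (
            triple["source_label"] in ALLOWED_ENTITIES
            and triple["target_label"] in ALLOWED_ENTITIES
            and triple["type"] in ALLOWED_RELATIONS
        ):
            kept.append(triple)
    return kept


# Made-up extraction output, shaped like what an LLM-based extractor might return.
extracted = [
    {"source_label": "Customer", "type": "HAS_CONTRACT", "target_label": "Contract"},
    {"source_label": "Customer", "type": "LIKES", "target_label": "Product"},       # unknown relationship
    {"source_label": "Contract", "type": "REFERENCES", "target_label": "Weather"},  # unknown entity
]

clean = enforce_schema(extracted)
```

Only the first triple survives, which is the effect you want when an LLM invents a relationship type or entity label that your data model never defined.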
[Link](run_pipeline_on_file(pdf_file, pipeline))
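Once chunks are stored with embeddings, retrieval comes down to comparing the query embedding with each stored chunk embedding. The sketch below shows that comparison with cosine similarity in plain Python. The three-dimensional vectors and chunk IDs are toy values chosen for readability; real embedding models produce far larger vectors, and a database vector index avoids the brute-force scan used here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, chunk_embeddings, k=2):
    """Return the IDs of the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunk_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Toy embeddings (hypothetical); each key is a chunk ID.
CHUNKS = {
    "chunk-risk": [0.9, 0.1, 0.0],
    "chunk-revenue": [0.1, 0.9, 0.0],
    "chunk-staff": [0.0, 0.2, 0.9],
}

best = top_k_chunks([1.0, 0.2, 0.0], CHUNKS, k=2)
```

Here `best` lists the chunk closest to the query first, which mirrors what a top-k vector search returns.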
By using a vector index, Neo4j enables scalable, real-time retrieval of relevant knowledge from large and complex graphs.

from neo4j_graphrag.indexes import create_vector_index

create_vector_index(driver, name="chunkEmbeddings", label="Chunk",
                    embedding_property="embedding",
                    dimensions=1536, similarity_fn="cosine")

[...] for bringing structured data into your graph database. Here’s how to use this powerful tool. The Neo4j Aura console includes a dedicated Data Importer feature that allows you to transform tabular data into graph structures without writing code. This tool works well for quickly populating your knowledge graph with data from existing datasets.

2. Create a new graph model.
3. A graph data model has been provided for your convenience. Note: Due to pathway differences between operating systems, please choose either Mac or Windows data models.
5. Once the files are connected, you’ll see that the data model has check marks for each entity and relationship. Click Run Import in the upper right-hand corner.

Once you’ve uploaded these CSV files, you’ll be given a choice as to how to proceed. Click Define Manually to begin building your data model.

First, you’ll see a blank node, and on the right-hand side, you’ll see the parameters for that node, including Label, Table, Properties.
Next, let’s create connections between and among these entities. In our domain, the Asset Managers own stock in various companies. Here’s a sample from the Asset_Manager_Holdings.csv:

ALLIANCEBERNSTEIN L.P. | MCDONALDS CORP | MCD | 1201960

In a knowledge graph, we want to map the domain knowledge of structured data, which in this case is the Asset Managers’ ownership of stock in a given company. If entities are nouns, then relationships are verbs. So let’s create the relationship OWNS that goes from Asset Manager to Company.

1. Click on the AssetManager node. You’ll see a blue outline of the node:

3. Drag the outline of the AssetManager node to cover the Company node. When you release, you’ll see a new relationship arrow between them:

Figure 23. Drag and release for new relationship

Clicking on this arrow allows you to edit the parameters of the relationship.

OWNS Relationship
Relationship Type: OWNS
Table: Asset_Manager_Holdings.csv
Node ID Mapping
From:
  Node: AssetManager
  ID: managerName
  ID column: managerName
To:
  Node: Company
  ID: name
  ID column: companyName
Properties: shares
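The same mapping can be expressed in a few lines of Python, which may help when you want to sanity-check a CSV before importing it. This is only a sketch of what the Data Importer mapping describes, not a replacement for it. The header row is an assumption: `managerName`, `companyName`, and `shares` come from the mapping above, while the `ticker` column name is a guess, since the sample row shows the value MCD without a header.

```python
import csv
import io

# One sample row from the text plus an assumed header row ("ticker" is a guess).
HOLDINGS_CSV = """managerName,companyName,ticker,shares
ALLIANCEBERNSTEIN L.P.,MCDONALDS CORP,MCD,1201960
"""


def load_owns_relationships(csv_text):
    """Turn holdings rows into AssetManager nodes, Company nodes, and OWNS relationships."""
    managers, companies, owns = set(), set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        managers.add(row["managerName"])   # AssetManager node, keyed by managerName
        companies.add(row["companyName"])  # Company node, keyed by its name
        owns.append(
            {
                "from": row["managerName"],
                "type": "OWNS",
                "to": row["companyName"],
                "shares": int(row["shares"]),  # relationship property
            }
        )
    return managers, companies, owns


managers, companies, owns = load_owns_relationships(HOLDINGS_CSV)
```

Each dictionary in `owns` corresponds to one relationship arrow from an AssetManager node to a Company node, with `shares` stored on the relationship.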
The property shares represents the number of shares of the Company owned by the Asset Manager and for this book is an optional inclusion. Additional columns such as value or sharevalue are optional, as well. When working with your own data, it’s best to consider if that property will have value to your use case. Will you be asking to rank based on shares owned? Does the total value of the holding have relevance to your application? Additional information on data modeling can be found at GraphAcademy.

FILED Relationship
Note that the relationship between Company and Document is the linchpin that connects the structured and the unstructured data in this GraphRAG application.

[...] import. Click the blue Run import button in the upper right corner of the screen.

Figure 26. Run import button

Now that your unstructured and structured data is loaded, you can use the Explore and Query functions to refine your graph structure and data to accurately represent your business domain. Use Explore to visualize and navigate your graph with Neo4j Bloom and Query to investigate the graph.
Import Libraries
This notebook imports the core libraries required for building and querying RAG pipelines with Neo4j and GraphRAG:

• [Link]: The official Python driver for connecting to and querying a Neo4j database.
• neo4j_graphrag.[Link]: Integrates OpenAI language models for generating and processing natural language queries.
• neo4j_graphrag.embeddings.OpenAIEmbeddings: Provides access to OpenAI’s embedding models for generating vector representations of text.
• neo4j_graphrag.retrievers: Different retriever classes for semantic and hybrid search over graph data using vector similarity and Cypher queries:
  • VectorRetriever
  • VectorCypherRetriever
  • Text2CypherRetriever
• neo4j_graphrag.[Link]: The main class for orchestrating RAG workflows over a Neo4j knowledge graph.
• neo4j_graphrag.schema.get_schema: Utility to introspect and retrieve the schema of your Neo4j database.
• dotenv.load_dotenv: Loads environment variables (such as credentials and API keys) from an .env file for secure configuration.

These imports enable advanced semantic search, retrieval, and GenAI capabilities directly on your Neo4j knowledge graph.

Load Environment Variables and Initialize Neo4j Driver

load_dotenv()

Here, you load sensitive configuration values (such as database credentials and API keys) from environment variables, ensuring that secrets aren’t hardcoded in your notebook. The steps include:

• load_dotenv(): Loads environment variables from an .env file into your Python environment.
• [Link](): Fetches the Neo4j connection URI, username, and password, as well as your OpenAI API key.
• [Link](): Initializes the Neo4j database driver with the provided credentials, allowing your notebook to connect and interact with your Neo4j instance securely.

TIP: Make sure your .env file contains the correct values for NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, and OPENAI_API_KEY before running this code. This approach keeps your credentials secure and makes your codebase easier to share and maintain.

Initialize the LLM and Embedder
Just as you selected a specific LLM and embedding model when processing your PDFs, you should do the same when generating embeddings for your text data. It’s important to keep track of the language model and embedding tools that you use during this process.

For the retrievers to work correctly, the embedding model used during retrieval must match the one used to generate the dataset’s embeddings. This ensures accurate and meaningful search results.

llm = OpenAILLM(model_name="gpt-4o", api_key=OPENAI_API_KEY)
embedder = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
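One inexpensive guard for the matching requirement is to compare the length of a query embedding with the dimensions the vector index was created with (1536 for the chunkEmbeddings index in this guide). The sketch below is a hypothetical helper, not part of any library. Note the limitation: a dimension match is necessary but not sufficient, since two different models can share a dimension, so it catches only the most obvious mismatches.

```python
# Matches the dimensions value used when the vector index was created.
INDEX_DIMENSIONS = 1536


class EmbeddingMismatchError(ValueError):
    """Raised when a query embedding cannot belong to the indexed embedding space."""


def check_query_embedding(vector, expected_dimensions=INDEX_DIMENSIONS):
    """Return the vector unchanged if its length matches the index; raise otherwise."""
    if len(vector) != expected_dimensions:
        raise EmbeddingMismatchError(
            f"Query embedding has {len(vector)} dimensions, "
            f"but the index expects {expected_dimensions}."
        )
    return vector
```

Calling `check_query_embedding(vector)` right before a vector search turns a silent quality problem into an explicit error at query time.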
The basic retriever will cause the LLM to generate a result like this:

The main risks around cryptocurrency, as highlighted in the context, include:

1. Regulatory Risks: The regulatory status of certain cryptocurrencies is unclear, which could subject businesses to additional licensing and regulatory obligations. If cryptocurrencies are deemed securities, this might necessitate securities broker-dealer registration under federal securities laws. Non-compliance could lead to regulatory actions, fines, and other consequences.
2. Custodial Risks: Cryptocurrency assets held through a third-party custodian are susceptible to various risks, such as inappropriate access, theft, or destruction. Inadequate insurance coverage by custodians and their potential inability to maintain effective controls can expose customers to losses. In the event of a custodian’s bankruptcy, the treatment of custodial holdings in proceedings remains uncertain, which could delay or prevent the return of assets.
3. Third-Party Partner Risks: Dependence on third-party custodians and financial institutions means exposure to operational disruptions, inability to safeguard holdings, and financial defaults by these partners, which could harm business operations and customer trust.

These risks underscore the need for robust regulatory compliance, secure custodial arrangements, and the management of third-party relationships to mitigate potential negative impacts on businesses offering cryptocurrency products.

While the vector search provided useful information about cryptocurrency risks, it did not answer deeper, more actionable questions, such as:
• Which specific companies are exposed to these risks?
• What other risks may be occurring concurrently?
• Which asset managers are associated with the affected companies? (e.g., multi-hop relationships from risk to company to asset manager)

In other words, the approach demonstrated here retrieves relevant text fragments. However, it doesn’t use the graph’s structure to connect the risks to companies or asset managers, nor does it show related or concurrent risks. There’s no traversal or multi-hop reasoning, so you miss out on the rich, contextual insights that a knowledge graph can provide.

To answer these more complex, relationship-driven questions, you need to combine vector search with graph-powered Cypher queries that can traverse and analyze connections between entities. This is where graph-enhanced retrieval patterns come in.

The Graph-Enhanced Vector Search Pattern
The basic retriever pattern typically relies on text-based embeddings, capturing only the semantic meaning of content. While this method is effective in identifying similar chunks, it leaves the LLM in the dark as to how those items interact in the real world.

The Graph-Enhanced Vector Search Pattern, also known as augmented vector search, overcomes this limitation by drawing on the graph structure (i.e., using not just what items are but also how they connect). By embedding node positions and relationships within a graph, this approach generates contextually relevant nodes, integrating both:

• Unstructured data: Product descriptions, customer reviews, and other text content via semantic similarity
• Structured data: Purchase patterns, category relationships, and transaction records via explicit instructions
The VectorCypherRetriever uses the full graph capabilities of Neo4j by combining vector-based similarity searches with graph traversal techniques. The retriever completes the following actions:

The result: You can ask complex, context-aware questions about entities in your own industry. The GraphRAG retriever will surface relevant information that connects context across structured and unstructured data to drive real-world understanding.

With this in mind, let’s look at another VectorCypherRetriever example.

Next, let’s add this new retrieval query to the VectorCypherRetriever parameters:

vector_cypher_retriever = VectorCypherRetriever(
    driver=driver,
    index_name="chunkEmbeddings",
    embedder=embedder,
    retrieval_query=chunk_to_asset_manager_query
)

VectorCypherRetriever parameters:
• driver: The Neo4j database connection
• index_name: The name of the vector index (here, chunkEmbeddings) used for semantic search
• embedder: The embedding model used to generate/query vector representations
• retrieval_query: The Cypher query (defined above) that tells Neo4j how to traverse the graph from the semantically matched nodes

result = vector_cypher_retriever.search(query_text=query, top_k=10)
for item in [Link]:
    print([Link][:100])

Let’s look at the results:

<Record company='APPLE INC' AssetManager='BlackRock Inc.' shares=1031407553>
<Record company='APPLE INC' AssetManager='Berkshire Hathaway Inc' shares=915560382>

Since these results look as expected, we proceed to the natural language output:

result = GraphRAG(llm=llm, retriever=vector_cypher_retriever)
print([Link](query_text=query_text).answer)

The Asset Managers most affected by cryptocurrency concerns are:
1. BlackRock Inc.
2. FMR LLC
3. STATE STREET CORP
4. GEODE CAPITAL MANAGEMENT, LLC
5. MORGAN STANLEY
6. NORTHERN TRUST CORP
7. BANK OF AMERICA CORP /DE/
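The two-stage pattern behind this retriever (vector search first, then a traversal from the matched chunks) can be sketched without a database. This is a simplified in-memory illustration, not how the retriever is implemented. The chunk IDs, two-dimensional embeddings, chunk-to-company links, and EXAMPLE CORP are made up; the two holdings reuse the values shown in the sample records.

```python
import math

# Toy data: embeddings and chunk-to-company links are hypothetical.
CHUNK_EMBEDDINGS = {"chunk-1": [0.9, 0.1], "chunk-2": [0.1, 0.9]}
CHUNK_TO_COMPANY = {"chunk-1": "APPLE INC", "chunk-2": "EXAMPLE CORP"}
# (asset manager, company, shares) triples standing in for OWNS relationships.
OWNS = [
    ("BlackRock Inc.", "APPLE INC", 1031407553),
    ("Berkshire Hathaway Inc", "APPLE INC", 915560382),
]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_then_traverse(query_embedding, top_k=1):
    """Stage 1: rank chunks by similarity. Stage 2: follow links from each
    matched chunk to its company, then to the asset managers that own it."""
    ranked = sorted(
        CHUNK_EMBEDDINGS,
        key=lambda chunk_id: cosine(query_embedding, CHUNK_EMBEDDINGS[chunk_id]),
        reverse=True,
    )
    records = []
    for chunk_id in ranked[:top_k]:
        company = CHUNK_TO_COMPANY[chunk_id]
        for manager, owned, shares in OWNS:
            if owned == company:
                records.append({"company": company, "AssetManager": manager, "shares": shares})
    return records


records = vector_then_traverse([1.0, 0.0])
```

The similarity step decides where to enter the graph, and the traversal step decides what context comes back, which is why the retrieval query matters as much as the embedding model.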
The process begins with a natural language question, such as:

“What are the names of companies owned by BlackRock Inc.?”

The retriever then uses the schema, described as a string outlining the main node types and relationships in your graph (for example, companies, risk factors, and asset managers), to guide the LLM in generating an appropriate Cypher query. While you could pass a hard-coded schema to the retriever, it’s best practice to access the schema as it currently exists in your instance. Here’s a sample of the full schema:

Now that you’ve defined the schema, you have everything you need to set up the Text2CypherRetriever.

query = "What are the names of the companies owned by BlackRock Inc.?"

text2cypher_retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,
    neo4j_schema=schema
)

cypher_query = text2cypher_retriever.get_search_results(query)
cypher_query.metadata["cypher"]
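To see why the schema string matters, it helps to look at how a schema and a question might be combined into a single prompt for the LLM. The sketch below is an assumption-laden illustration: the template wording and the `build_text2cypher_prompt` helper are hypothetical and are not the prompt the library uses internally. The schema line reuses the node labels, ID properties, and `shares` property from the Data Importer mapping earlier in this guide.

```python
# Hypothetical prompt template; the library's own prompt may differ.
PROMPT_TEMPLATE = """Task: Generate a Cypher statement to answer the question.
Use only the node labels, relationship types, and properties in the schema.

Schema:
{schema}

Question:
{question}

Cypher:"""


def build_text2cypher_prompt(schema, question):
    """Fill the template with the current graph schema and the user question."""
    return PROMPT_TEMPLATE.format(schema=schema.strip(), question=question.strip())


# Schema fragment written by hand for illustration.
SCHEMA = """
(:AssetManager {managerName: STRING})-[:OWNS {shares: INTEGER}]->(:Company {name: STRING})
"""

prompt = build_text2cypher_prompt(
    SCHEMA, "What are the names of companies owned by BlackRock Inc.?"
)
```

Because the model sees only what is in the prompt, a stale or hand-written schema can lead it to reference labels or properties that no longer exist, which is one reason to read the schema from the live instance.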
While the Text2Cypher functionality in the Neo4j GraphRAG library offers a powerful way to translate natural language queries into Cypher, there are important considerations to keep in mind when using it.

First, because Text2Cypher relies on an LLM to generate queries dynamically, the same input may not always yield identical results. The model’s responses can vary depending on context, training data, and even minor changes in phrasing. While the flexibility of Text2Cypher allows for more natural interactions, it can also introduce inconsistencies when precise, repeatable queries are required.

Additionally, query optimization remains an important factor. While LLMs are capable of generating complex Cypher queries, they may not always produce the most efficient ones. Without human intervention or performance tuning, these queries might not be optimized for speed or resource consumption, which could potentially slow application performance.

[...] trade-offs will help you integrate Text2Cypher effectively while ensuring that it is used in scenarios where its strengths outweigh its potential drawbacks.

Check out the Text2Cypher Crowdsourcing App to explore Text2Cypher applications and contribute to development projects.

Community Summary Pattern
You may have heard the term GraphRAG and thought of the pattern popularized by Microsoft, where the text is used to summarize community or other knowledge (i.e., forum posts). This type of retriever is often called the Community Summary Pattern.

While a Microsoft-style GraphRAG emphasizes summarization and community Q&A, Neo4j’s approach focuses on domain-specific schema control and composable query generation. This focus expands GraphRAG from summarization into structured reasoning, decision tracing, and dynamic compliance use cases.
Like other AI technologies, GraphRAG is rapidly evolving. A few trends to watch:

• More advanced, dynamic Cypher queries and sophisticated retrieval patterns that use graph algorithms and machine learning techniques are pushing the boundaries of what’s possible in information retrieval and generation.
• Deeper integration with other AI technologies, such as knowledge graph embeddings and graph neural networks, promises to enhance the semantic understanding and reasoning capabilities of GraphRAG systems.
• Integrating GraphRAG with agentic systems and other multi-tool, multi-step RAG chains can result in more autonomous and intelligent systems capable of handling complex, multifaceted tasks with greater efficiency and accuracy.
• Incorporating semantic layers in GraphRAG systems can provide even more nuanced understanding and context awareness in information retrieval and generation tasks.

Build on what you learned in this guide:

• The Neo4j for GenAI use case page offers guides, tutorials, and best practices about GraphRAG implementation.
• The GraphRAG site contains explanations of GraphRAG principles and step-by-step guides for various implementation scenarios.
• Neo4j GraphAcademy offers free, hands-on online courses.

Explore GenAI With Neo4j
Neo4j uncovers hidden relationships and patterns across billions of data connections deeply, easily, and quickly, making graph databases an ideal choice for building your first GraphRAG application.
Appendix
Technical Resources in Workflow Order
Workflow step | Resource | Description
1. Data Modeling | Designing a Graph Data Model for GenAI (Neo4j Blog) | Helps you define entity-relationship schemas (ontology) that power GraphRAG context.
2. Data Modeling | Neo4j Data Modeling Guide | Foundation for understanding how to structure both unstructured and structured data into a graph.
3. Environment Setup | Neo4j Aura Free Tier | Spin up a secure cloud instance instantly, perfect for prototyping.
4. Data Ingestion (Structured) | Neo4j Data Importer Tool | Visual UI for mapping CSVs and relational data to graph nodes and relationships.
5. Data Ingestion (Unstructured) | Neo4j GraphRAG Python Library | Convert PDFs and text to a knowledge graph using LLM-powered entity + relationship extraction.
6. Data Ingestion (Unstructured) | KGBuilder Tutorial – SEC Filings Example | Walkthrough for turning dense financial disclosures into structured graph nodes and edges.
7. Embeddings + Vector Indexing | Neo4j Vector Indexing Docs | Build and manage vector embeddings inside Neo4j for hybrid retrieval.
8. Retrieval: Basic + Vector | Neo4j GraphRAG Basic Retriever Pattern | First step: combine chunked content and embedding for basic semantic retrieval.
9. Retrieval: Graph-Enhanced | Graph-Enhanced Vector Search with Neo4j | Augment vector search with traversal logic to improve contextual accuracy.
10. Text2Cypher Automation | Text2Cypher Documentation & Examples | Translate user queries into Cypher automatically using LLMs, ideal for dynamic GraphRAG.
11. Agentic & Multi-Step Use | GraphRAG + NeoConverse + Agents | Build multi-tool agents that query graphs autonomously across task chains.
12. Semantic Enhancement | Topic Extraction for Semantic RAG | Use LLMs to extract topics and themes into your graph to add interpretability.
13. Deployment + Ops | Neo4j Deployment Best Practices | Tips for scaling and monitoring GraphRAG in production environments.