Meeting notes capture decisions, action items, participant information, and the relationships between people and tasks. Yet most organizations treat them as static documents, searchable only through basic text search.
With a knowledge graph, you can run queries like: "Who attended meetings where the topic was 'budget planning'?" or "What tasks did Sarah get assigned across all meetings?"
This example shows how to build a meeting knowledge graph from Google Drive Markdown notes using LLM extraction and Neo4j, with automatic continuous updates.
If you like this example, please give CocoIndex a star on GitHub to support us and stay tuned for more updates. Thank you so much 🥥🤗.
The pipeline defines:
- Meeting nodes: one per meeting section, keyed by source note file and meeting time
- Person nodes: people who organized or attended meetings
- Task nodes: tasks decided in meetings
- Relationships:
  - ATTENDED: Person → Meeting (organizer included, marked in flow when collected)
  - DECIDED: Meeting → Task
  - ASSIGNED_TO: Person → Task
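As a rough illustration of the schema above, the sketch below models the three node types as plain Python dataclasses and derives the relationship triples for one meeting. The field names (`note_file`, `description`, `assigned_to`), the `relationships` helper, and the `file@date` key format are assumptions made for this illustration; they are not taken from the actual flow definition.

```python
from dataclasses import dataclass, field
import datetime


@dataclass(frozen=True)
class Person:
    name: str


@dataclass
class Task:
    description: str
    assigned_to: list[Person] = field(default_factory=list)


@dataclass
class Meeting:
    note_file: str       # source note file (part of the meeting key)
    time: datetime.date  # meeting time (part of the meeting key)
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]


def relationships(meeting: Meeting) -> list[tuple[str, str, str]]:
    """Return (source, relationship type, target) triples for one meeting."""
    # Hypothetical key format: a meeting is identified by file and time.
    meeting_key = f"{meeting.note_file}@{meeting.time.isoformat()}"
    triples = []
    # The organizer counts as an attendee; dict.fromkeys removes duplicates
    # while preserving order.
    attendees = dict.fromkeys([meeting.organizer, *meeting.participants])
    for person in attendees:
        triples.append((person.name, "ATTENDED", meeting_key))
    for task in meeting.tasks:
        triples.append((meeting_key, "DECIDED", task.description))
        for person in task.assigned_to:
            triples.append((person.name, "ASSIGNED_TO", task.description))
    return triples
```

Keying a meeting by file and time means that re-processing the same note should yield the same meeting key instead of a new one.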
The source is Google Drive folders shared with a service account. The flow watches for recent changes and keeps the graph up to date.
- Ingest files from Google Drive (service account + root folder IDs)
- Split each note by Markdown headings into meeting sections
- Use an LLM to extract a structured Meeting object: time, note, organizer, participants, and tasks (with assignees)
- Collect nodes and relationships in-memory
- Export to Neo4j:
  - Nodes: Meeting (explicit export), Person and Task (declared with primary keys)
  - Relationships: ATTENDED, DECIDED, ASSIGNED_TO
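To make the splitting step concrete, here is a minimal standalone sketch of dividing a note into meeting sections by Markdown heading. It is not the splitting function the flow itself uses; `split_by_headings` is a hypothetical helper, and it makes simplifying assumptions: only ATX headings (`#` to `######`) are recognized, and heading-like lines inside code fences are not treated specially.

```python
import re

# Matches a Markdown ATX heading line such as "# Title" or "## 2025-01-15 Sync".
HEADING = re.compile(r"^#{1,6}[ \t]+(.*)$", re.MULTILINE)


def split_by_headings(markdown: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs, one per heading in the note.

    Any text before the first heading is ignored.
    """
    matches = list(HEADING.finditer(markdown))
    sections = []
    for i, match in enumerate(matches):
        # A section runs until the next heading or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[match.end():end].strip()
        sections.append((match.group(1).strip(), body))
    return sections
```

Each resulting section would then be passed to the LLM extraction step as one meeting.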
- Install Neo4j and start it locally
  - Default local browser: http://localhost:7474
  - Default credentials used in this example: username neo4j, password cocoindex
- Configure your OpenAI API key
- Prepare Google Drive:
  - Create a Google Cloud service account and download its JSON credential
  - Share the source folders with the service account email
  - Collect the root folder IDs you want to ingest
  - See Setup for Google Drive for details
Set the following environment variables:
export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
Notes:
- GOOGLE_DRIVE_ROOT_FOLDER_IDS accepts a comma-separated list of folder IDs
- The flow polls recent changes and refreshes periodically
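For illustration, the following sketch shows one way a flow could read these variables and turn the comma-separated folder list into a Python list. The `load_drive_config` helper and the shape of the dictionary it returns are assumptions for this example, not part of CocoIndex.

```python
import os


def load_drive_config(env=os.environ) -> dict:
    """Read the Google Drive settings from environment variables."""
    credential_path = env["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    # Comma-separated folder IDs; ignore stray whitespace and empty entries.
    folder_ids = [
        folder_id.strip()
        for folder_id in env["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")
        if folder_id.strip()
    ]
    if not folder_ids:
        raise ValueError(
            "GOOGLE_DRIVE_ROOT_FOLDER_IDS must list at least one folder ID"
        )
    return {"credential_path": credential_path, "root_folder_ids": folder_ids}
```

Passing the environment as a parameter keeps the helper easy to test with a plain dictionary; a missing variable raises `KeyError` early, before any pipeline work starts.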
Install dependencies:
pip install -e .
Update the index (run the flow once to build/update the graph):
cocoindex update main
Open Neo4j Browser at http://localhost:7474.
Sample Cypher queries:
// All relationships
MATCH p=()-->() RETURN p
// Who attended which meetings (including organizer)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p, m
// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m, t
// Task assignments
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p, t
I used CocoInsight (free beta for now) to troubleshoot the index generation and understand the data lineage of the pipeline. It connects to your local CocoIndex server with zero pipeline data retention.
Start CocoInsight:
cocoindex server -ci main
Then open the UI at https://cocoindex.io/cocoinsight.
