Threat Intelligence Graph RAG

A fully containerized Graph RAG application for cybersecurity threat intelligence, powered by local LLMs via Ollama.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Docker Network                                 │
│                                                                             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌───────────┐  │
│  │   Frontend  │     │   Backend   │     │    Neo4j    │     │  Ollama   │  │
│  │   (Nginx)   │────▶│  (FastAPI)  │────▶│  (Graph DB) │     │  (LLM)    │  │
│  │   Port 8501 │     │  Port 8000  │     │  Port 7474  │     │ Port 11434│  │
│  └─────────────┘     └──────┬──────┘     └─────────────┘     └───────────┘  │
│                             │                                       ▲       │
│                             └───────────────────────────────────────┘       │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                        Data Ingestion (Job)                         │    │
│  │              Loads MITRE ATT&CK data into Neo4j on startup          │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘

Components

1. Neo4j (Graph Database)

Image: neo4j:5.15-community
Purpose: Stores threat intelligence as a knowledge graph
Ports:
- 7474 - Browser UI
- 7687 - Bolt protocol
Data: Persisted via Docker volume

2. Ollama (Local LLM)

Image: ollama/ollama:latest
Model: mistral:7b
Purpose:
- LLM for natural language understanding and generation
- Embedding generation via nomic-embed-text
Port: 11434
Note: Runs on CPU (no GPU available), 8GB memory limit
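
As a rough illustration of how a client could address this service, the sketch below builds (but does not send) request objects for Ollama's HTTP API. The helper names are hypothetical and are not taken from the project's `ollama_service.py`. The paths `/api/generate` and `/api/embeddings` are Ollama REST endpoints; the host assumes the compose service name `ollama`.

```python
import json
import urllib.request

# Assumption: the service is reachable under its compose name on the port listed above.
OLLAMA_HOST = "http://ollama:11434"


def _post(path: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        f"{OLLAMA_HOST}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_request(prompt: str, model: str = "mistral:7b") -> urllib.request.Request:
    """Request a single, non-streamed completion from the chat model."""
    return _post("/api/generate", {"model": model, "prompt": prompt, "stream": False})


def embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Request an embedding vector for one piece of text."""
    return _post("/api/embeddings", {"model": model, "prompt": text})
```

Passing either request to `urllib.request.urlopen` would perform the call. On CPU, a generous timeout is probably needed for generation.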

3. Backend API (FastAPI)

Image: Custom Python image
Purpose:
- REST API for frontend
- RAG pipeline orchestration
- Cypher query generation from natural language
- Graph traversal and context retrieval
Port: 8000

Endpoints:

POST /query                              - Natural language query
GET  /graph/stats                        - Graph statistics
GET  /graph/actors                       - List threat actors
GET  /graph/techniques                   - List techniques
GET  /graph/actors/{name}/techniques     - Get actor's techniques
GET  /graph/actors/{name}/attack-path    - Get actor's kill chain
GET  /graph/techniques/{id}/mitigations  - Get technique mitigations
GET  /graph/search?q=                    - Search across all entities
GET  /graph/visualize                    - Get graph data for visualization
GET  /health                             - Health check
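
Actor names and search terms can contain spaces, so they need escaping before they go into a path or query string. The sketch below shows one way to build URLs for two of the endpoints above. The function names are illustrative, and `BASE_URL` assumes the backend is published on localhost port 8000.

```python
from urllib.parse import quote, urlencode

# Assumption: backend published on the host as in the Quick Start section.
BASE_URL = "http://localhost:8000"


def actor_techniques_url(name: str) -> str:
    """URL for GET /graph/actors/{name}/techniques with the name percent-encoded."""
    return f"{BASE_URL}/graph/actors/{quote(name, safe='')}/techniques"


def search_url(term: str) -> str:
    """URL for GET /graph/search?q= with the term form-encoded."""
    return f"{BASE_URL}/graph/search?{urlencode({'q': term})}"


if __name__ == "__main__":
    print(actor_techniques_url("Lazarus Group"))
    print(search_url("spear phishing"))
```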

4. Frontend (Nginx + Static Web App)

Image: Nginx Alpine
Purpose: Modern web UI for querying threat intelligence
Port: 8501 (mapped from internal port 80)
Tech Stack:
- HTML5/CSS3/JavaScript
- jQuery for AJAX requests
- Chart.js for statistics visualization
- vis-network for interactive graph visualization
- marked.js for markdown rendering
Features:
- Query Page: Natural language queries with example suggestions
- Explore Page: Browse threat actors, techniques, and search
- Graph Map: Interactive network visualization with filtering
- Statistics: Charts showing node/relationship distribution

5. Data Ingestion (Init Job)

Image: Custom Python image
Purpose: One-time job to load MITRE ATT&CK data
Data Sources:
- MITRE ATT&CK Enterprise (STIX format)
- Relationships: Actors → Techniques → Tactics, with Mitigations linked to Techniques
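
To make the STIX-to-graph mapping concrete, here is a minimal sketch that turns STIX relationship objects into edges named after the graph schema in this README. It is not the project's `mitre_attack.py`. The sample bundle is hand-written placeholder data, and the function and constant names are assumptions. The STIX type names (`intrusion-set`, `attack-pattern`, `course-of-action`, and so on) are the ones used in the ATT&CK STIX data.

```python
from typing import Dict, List, Tuple

# Hand-written placeholder bundle in the STIX 2.x shape; not real ATT&CK content.
SAMPLE_BUNDLE = {
    "type": "bundle",
    "objects": [
        {"type": "intrusion-set", "id": "intrusion-set--a", "name": "Example Group"},
        {"type": "attack-pattern", "id": "attack-pattern--b", "name": "Phishing"},
        {"type": "course-of-action", "id": "course-of-action--c", "name": "User Training"},
        {"type": "relationship", "id": "relationship--1", "relationship_type": "uses",
         "source_ref": "intrusion-set--a", "target_ref": "attack-pattern--b"},
        {"type": "relationship", "id": "relationship--2", "relationship_type": "mitigates",
         "source_ref": "course-of-action--c", "target_ref": "attack-pattern--b"},
    ],
}

# STIX object type -> node label from the Graph Schema section.
STIX_TO_LABEL: Dict[str, str] = {
    "intrusion-set": "ThreatActor",
    "attack-pattern": "Technique",
    "malware": "Malware",
    "tool": "Tool",
    "course-of-action": "Mitigation",
    "x-mitre-tactic": "Tactic",
}


def extract_edges(bundle: dict) -> List[Tuple[str, str, str]]:
    """Return (source name, relationship type, target name) triples."""
    by_id = {o["id"]: o for o in bundle["objects"] if o["type"] in STIX_TO_LABEL}
    edges = []
    for obj in bundle["objects"]:
        if obj["type"] != "relationship":
            continue
        src, dst = by_id.get(obj["source_ref"]), by_id.get(obj["target_ref"])
        if src is None or dst is None:
            continue  # endpoint is a type this sketch does not model
        if obj["relationship_type"] == "uses":
            # Actors USE things; malware and tools EMPLOY techniques.
            rel = "USES" if STIX_TO_LABEL[src["type"]] == "ThreatActor" else "EMPLOYS"
            edges.append((src["name"], rel, dst["name"]))
        elif obj["relationship_type"] == "mitigates":
            # STIX points mitigation -> technique; the schema stores the reverse.
            edges.append((dst["name"], "MITIGATED_BY", src["name"]))
    return edges
```

Tactic membership is not covered here: in ATT&CK STIX it comes from each technique's `kill_chain_phases` field, not from relationship objects.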

Graph Schema

Nodes

| Label | Properties | Description |
|-------|------------|-------------|
| `ThreatActor` | id, name, description, aliases, country | APT groups, criminal orgs |
| `Technique` | id, name, description, platforms, detection | ATT&CK techniques |
| `Tactic` | id, name, description, shortname | ATT&CK tactics (kill chain phases) |
| `Malware` | id, name, description, platforms | Malware families |
| `Tool` | id, name, description | Legitimate tools used maliciously |
| `Mitigation` | id, name, description | Defensive measures |

Relationships

(:ThreatActor)-[:USES]->(:Technique)
(:ThreatActor)-[:USES]->(:Malware)
(:ThreatActor)-[:USES]->(:Tool)
(:Technique)-[:BELONGS_TO]->(:Tactic)
(:Technique)-[:MITIGATED_BY]->(:Mitigation)
(:Malware)-[:EMPLOYS]->(:Technique)
(:Tool)-[:EMPLOYS]->(:Technique)
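
Because every relationship is directed, not every pair of labels is connected. The sketch below encodes the relationship list above as data and uses a breadth-first search to find the shortest chain of hops between two labels, then renders it as a Cypher pattern. All names in the code are illustrative and not part of the project.

```python
from collections import deque
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The relationship list above, as (source label, type, target label).
SCHEMA: List[Triple] = [
    ("ThreatActor", "USES", "Technique"),
    ("ThreatActor", "USES", "Malware"),
    ("ThreatActor", "USES", "Tool"),
    ("Technique", "BELONGS_TO", "Tactic"),
    ("Technique", "MITIGATED_BY", "Mitigation"),
    ("Malware", "EMPLOYS", "Technique"),
    ("Tool", "EMPLOYS", "Technique"),
]


def label_path(start: str, goal: str) -> Optional[List[Triple]]:
    """Shortest directed chain of schema hops from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        label, path = queue.popleft()
        if label == goal:
            return path
        for src, rel, dst in SCHEMA:
            if src == label and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(src, rel, dst)]))
    return None


def to_cypher_pattern(path: List[Triple]) -> str:
    """Render a hop list as a Cypher path pattern."""
    if not path:
        return ""
    return f"(:{path[0][0]})" + "".join(f"-[:{rel}]->(:{dst})" for _, rel, dst in path)


if __name__ == "__main__":
    print(to_cypher_pattern(label_path("ThreatActor", "Mitigation")))
```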

RAG Pipeline

User Query
    │
    ▼
┌─────────────────────┐
│ 1. Query Analysis   │  ← Ollama extracts intent & entities
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 2. Graph Retrieval  │  ← Cypher query against Neo4j
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 3. Context Building │  ← Combine graph results + embeddings
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 4. Response Gen     │  ← Ollama generates final answer
└─────────────────────┘
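
The four stages can be sketched as a single function with the LLM and the graph passed in as plain callables. This is an assumption-laden outline, not the project's `rag_pipeline.py`: the prompts, the function name, and the Cypher query are invented for illustration, and the embedding part of step 3 is left out.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]
CypherRunner = Callable[[str, Dict[str, str]], List[Dict[str, str]]]

# Hypothetical retrieval query for the "techniques used by an actor" intent.
ACTOR_TECHNIQUES = (
    "MATCH (a:ThreatActor {name: $name})-[:USES]->(t:Technique) "
    "RETURN t.id AS id, t.name AS name"
)


def answer(question: str, llm: LLM, run_cypher: CypherRunner) -> str:
    # 1. Query analysis: ask the model for the entity the question is about.
    entity = llm(f"Extract the threat actor name from: {question}").strip()

    # 2. Graph retrieval: parameterized Cypher, never string-built from user input.
    rows = run_cypher(ACTOR_TECHNIQUES, {"name": entity})

    # 3. Context building: flatten graph rows into text the model can read.
    context = "\n".join(f"- {r['id']}: {r['name']}" for r in rows) or "(no graph results)"

    # 4. Response generation: answer grounded in the retrieved context.
    return llm(
        f"Context:\n{context}\n\nQuestion: {question}\n"
        "Answer using only the context above."
    )
```

Injecting `llm` and `run_cypher` keeps the flow testable with stubs, without a running Ollama or Neo4j instance.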

Query Examples

| Natural Language Query | Graph Retrieval |
|------------------------|-----------------|
| "What techniques does APT29 use?" | Match path from actor to techniques |
| "How do I defend against phishing?" | Find mitigations for T1566 |
| "Which actors target healthcare?" | Filter actors by target industry |
| "Show the kill chain for Lazarus" | Traverse actor → techniques → tactics |

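Three of these retrievals map directly onto the graph schema and could be written as parameterized Cypher templates, as sketched below. The template names and the `build_query` helper are hypothetical. The sketch also assumes that a technique's `id` property holds the ATT&CK ID (such as T1566), which this README does not confirm. The healthcare example is omitted because the node table lists no target-industry property.

```python
from typing import Dict, Tuple

# Hypothetical templates; labels and relationship types come from the Graph Schema section.
CYPHER_TEMPLATES: Dict[str, str] = {
    "actor_techniques": (
        "MATCH (a:ThreatActor {name: $name})-[:USES]->(t:Technique) "
        "RETURN t.id AS id, t.name AS name"
    ),
    "technique_mitigations": (
        "MATCH (t:Technique {id: $id})-[:MITIGATED_BY]->(m:Mitigation) "
        "RETURN m.name AS name, m.description AS description"
    ),
    "actor_kill_chain": (
        "MATCH (a:ThreatActor {name: $name})-[:USES]->(t:Technique)"
        "-[:BELONGS_TO]->(k:Tactic) "
        "RETURN k.name AS tactic, collect(t.name) AS techniques"
    ),
}


def build_query(kind: str, **params: str) -> Tuple[str, Dict[str, str]]:
    """Return the Cypher text and its parameters for a known query kind."""
    if kind not in CYPHER_TEMPLATES:
        raise ValueError(f"unknown query kind: {kind}")
    return CYPHER_TEMPLATES[kind], dict(params)
```

Keeping user-supplied values in `$name` and `$id` parameters, and out of the query text, avoids Cypher injection when the values originate from an LLM or a web form.
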
Project Structure

graph-rag/
├── docker-compose.yml
├── .env.example
├── Makefile                     # Useful commands
├── README.md
│
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py              # FastAPI app
│       ├── config.py            # Settings
│       ├── routers/
│       │   ├── query.py         # Query endpoints
│       │   └── graph.py         # Graph endpoints
│       ├── services/
│       │   ├── neo4j_service.py # Graph operations
│       │   ├── ollama_service.py # LLM operations
│       │   └── rag_pipeline.py  # RAG orchestration
│       └── models/
│           └── schemas.py       # Pydantic models
│
├── frontend/
│   ├── Dockerfile
│   ├── nginx.conf               # Nginx configuration
│   ├── index.html               # Main HTML page
│   ├── css/
│   │   └── style.css            # Styles
│   └── js/
│       └── app.js               # JavaScript application
│
├── ingestion/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── ingest.py                # Main ingestion script
│   └── parsers/
│       └── mitre_attack.py      # MITRE ATT&CK parser
│
└── data/
    └── .gitkeep                 # Downloaded data stored here

Configuration

Environment Variables

# Neo4j
NEO4J_URI=bolt://neo4j:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=threatintel123

# Ollama
OLLAMA_HOST=http://ollama:11434
OLLAMA_MODEL=mistral:7b
OLLAMA_EMBED_MODEL=nomic-embed-text

# Backend
LOG_LEVEL=INFO
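
A service could read these variables at startup along the following lines. This is a standard-library sketch, not the project's `config.py`, which may well use a settings library instead. The `Settings` class and `load_settings` function are assumed names, and the defaults mirror the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    ollama_host: str
    ollama_model: str
    ollama_embed_model: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from an environment-like mapping."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://neo4j:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        # No default on purpose: raise KeyError rather than bake a secret into code.
        neo4j_password=env["NEO4J_PASSWORD"],
        ollama_host=env.get("OLLAMA_HOST", "http://ollama:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "mistral:7b"),
        ollama_embed_model=env.get("OLLAMA_EMBED_MODEL", "nomic-embed-text"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Accepting a mapping instead of reading `os.environ` directly makes the loader easy to exercise with a plain dictionary.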

Deployment

Prerequisites

- Docker and Docker Compose installed on the target machine
- At least 16GB RAM (for Ollama + Neo4j)
- ~10GB disk space

Quick Start

# Clone repository
git clone https://github.com/encryptedtouhid/graph-rag.git
cd graph-rag

# Copy environment file
cp .env.example .env

# Start all services
docker-compose up -d

# Watch logs
docker-compose logs -f

# Access services
# - Frontend: http://localhost:8501
# - Backend API: http://localhost:8000/docs
# - Neo4j Browser: http://localhost:7474

First Run

1. The Ollama init container automatically pulls the mistral:7b and nomic-embed-text models.
2. The ingestion job loads MITRE ATT&CK data into Neo4j.
3. The system is ready when all health checks pass.
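
Model downloads and ingestion can take a while on first start, so a script that depends on the stack may want to wait for readiness. The sketch below polls the backend's `/health` endpoint until it answers with HTTP 200. The function names, retry counts, and delays are assumptions, and it presumes the backend is published on localhost port 8000.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """True if the URL answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    url: str = "http://localhost:8000/health",
    attempts: int = 60,
    delay: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint; return True as soon as one probe succeeds."""
    for i in range(attempts):
        if probe(url):
            return True
        if i < attempts - 1:
            sleep(delay)  # no need to wait after the final attempt
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "timed out")
```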

Makefile Commands

make help          # Show all available commands
make build         # Build all Docker images
make up            # Start all services in background
make up-logs       # Start all services with logs
make down          # Stop all services
make logs          # View logs from all services
make logs-backend  # View backend logs only
make status        # Show status of all services
make restart       # Restart all services
make clean         # Stop and remove containers, volumes, images
make rebuild       # Clean rebuild and start
make shell-backend # Open shell in backend container
make shell-neo4j   # Open cypher-shell in Neo4j
make reset-db      # Clear database and re-run ingestion
