encryptedtouhid/threat-intel-graph-rag

Threat Intelligence Graph RAG

A fully containerized Graph RAG application for cybersecurity threat intelligence, powered by local LLMs via Ollama.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Docker Network                                 │
│                                                                             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌───────────┐  │
│  │   Frontend  │     │   Backend   │     │    Neo4j    │     │  Ollama   │  │
│  │   (Nginx)   │────▶│  (FastAPI)  │────▶│  (Graph DB) │     │  (LLM)    │  │
│  │   Port 8501 │     │  Port 8000  │     │  Port 7474  │     │ Port 11434│  │
│  └─────────────┘     └──────┬──────┘     └─────────────┘     └───────────┘  │
│                             │                                       ▲       │
│                             └───────────────────────────────────────┘       │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                        Data Ingestion (Job)                         │    │
│  │              Loads MITRE ATT&CK data into Neo4j on startup          │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘

Components

1. Neo4j (Graph Database)

  • Image: neo4j:5.15-community
  • Purpose: Stores threat intelligence as a knowledge graph
  • Ports:
    • 7474 - Browser UI
    • 7687 - Bolt protocol
  • Data: Persisted via Docker volume

2. Ollama (Local LLM)

  • Image: ollama/ollama:latest
  • Model: mistral:7b
  • Purpose:
    • LLM for natural language understanding and generation
    • Embedding generation via nomic-embed-text
  • Port: 11434
  • Note: Runs on CPU (no GPU available), 8GB memory limit
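
As a rough illustration of how a client might talk to this service, the sketch below builds requests for Ollama's `/api/generate` and `/api/embeddings` REST endpoints using only the standard library. The helper names are hypothetical and the project's own `ollama_service.py` may be organised quite differently.

```python
import json
import urllib.request

# Same value as OLLAMA_HOST in the Environment Variables section.
OLLAMA_HOST = "http://ollama:11434"


def _post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given Ollama API path."""
    return urllib.request.Request(
        f"{OLLAMA_HOST}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_request(prompt: str, model: str = "mistral:7b") -> urllib.request.Request:
    """Non-streaming text generation request."""
    return _post("/api/generate", {"model": model, "prompt": prompt, "stream": False})


def embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Embedding request for a single piece of text."""
    return _post("/api/embeddings", {"model": model, "prompt": text})
```

Either request can be sent with `urllib.request.urlopen(req, timeout=...)`. Since the model runs on CPU here, a generous timeout is probably sensible.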

3. Backend API (FastAPI)

  • Image: Custom Python image
  • Purpose:
    • REST API for frontend
    • RAG pipeline orchestration
    • Cypher query generation from natural language
    • Graph traversal and context retrieval
  • Port: 8000
  • Endpoints:
    POST /query                              - Natural language query
    GET  /graph/stats                        - Graph statistics
    GET  /graph/actors                       - List threat actors
    GET  /graph/techniques                   - List techniques
    GET  /graph/actors/{name}/techniques     - Get actor's techniques
    GET  /graph/actors/{name}/attack-path    - Get actor's kill chain
    GET  /graph/techniques/{id}/mitigations  - Get technique mitigations
    GET  /graph/search?q=                    - Search across all entities
    GET  /graph/visualize                    - Get graph data for visualization
    GET  /health                             - Health check
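
Path parameters such as `{name}` may contain spaces or other reserved characters, so a client should percent-encode them. The following is a minimal sketch of URL construction for a few of the endpoints listed above. The function names are made up for illustration, and the base URL assumes the default port mapping.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumed default backend address


def actor_techniques_url(name: str) -> str:
    """URL for GET /graph/actors/{name}/techniques with the name percent-encoded."""
    return f"{BASE_URL}/graph/actors/{quote(name, safe='')}/techniques"


def technique_mitigations_url(technique_id: str) -> str:
    """URL for GET /graph/techniques/{id}/mitigations."""
    return f"{BASE_URL}/graph/techniques/{quote(technique_id, safe='')}/mitigations"


def search_url(term: str) -> str:
    """URL for GET /graph/search?q=... with the query string encoded."""
    return f"{BASE_URL}/graph/search?{urlencode({'q': term})}"
```

The response bodies are not documented in this README, so inspect `http://localhost:8000/docs` before relying on any particular JSON shape.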
    

4. Frontend (Nginx + Static Web App)

  • Image: Nginx Alpine
  • Purpose: Modern web UI for querying threat intelligence
  • Port: 8501 (mapped from internal port 80)
  • Tech Stack:
    • HTML5/CSS3/JavaScript
    • jQuery for AJAX requests
    • Chart.js for statistics visualization
    • vis-network for interactive graph visualization
    • marked.js for markdown rendering
  • Features:
    • Query Page: Natural language queries with example suggestions
    • Explore Page: Browse threat actors, techniques, and search
    • Graph Map: Interactive network visualization with filtering
    • Statistics: Charts showing node/relationship distribution

5. Data Ingestion (Init Job)

  • Image: Custom Python image
  • Purpose: One-time job to load MITRE ATT&CK data
  • Data Sources:
    • MITRE ATT&CK Enterprise (STIX format)
    • Relationships: Actors → Techniques → Tactics → Mitigations
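
To make the ingestion step more concrete, here is a small sketch of how a STIX bundle could be split into graph nodes and edges. The type names are the ones ATT&CK uses in its STIX data; the function and the output shape are assumptions and not necessarily what `mitre_attack.py` does.

```python
from collections import defaultdict

# STIX object types used by MITRE ATT&CK, mapped to the graph labels in this README.
STIX_TO_LABEL = {
    "intrusion-set": "ThreatActor",
    "attack-pattern": "Technique",
    "x-mitre-tactic": "Tactic",
    "malware": "Malware",
    "tool": "Tool",
    "course-of-action": "Mitigation",
}


def parse_bundle(bundle: dict):
    """Split a STIX bundle into labelled nodes and (source, type, target) edges."""
    nodes = defaultdict(list)
    edges = []
    for obj in bundle.get("objects", []):
        # Skip objects that ATT&CK marks as revoked or deprecated.
        if obj.get("revoked") or obj.get("x_mitre_deprecated"):
            continue
        label = STIX_TO_LABEL.get(obj["type"])
        if label:
            nodes[label].append({"id": obj["id"], "name": obj.get("name", "")})
        elif obj["type"] == "relationship":
            edges.append((obj["source_ref"], obj["relationship_type"], obj["target_ref"]))
    return dict(nodes), edges
```

Two details a real loader would likely need to handle: ATT&CK expresses mitigations as `course-of-action` `mitigates` `attack-pattern`, so the direction has to be reversed to obtain `MITIGATED_BY`, and the technique-to-tactic link comes from each technique's `kill_chain_phases` rather than from a relationship object.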

Graph Schema

Nodes

| Label | Properties | Description |
|-------|------------|-------------|
| ThreatActor | id, name, description, aliases, country | APT groups, criminal orgs |
| Technique | id, name, description, platforms, detection | ATT&CK techniques |
| Tactic | id, name, description, shortname | ATT&CK tactics (kill chain phases) |
| Malware | id, name, description, platforms | Malware families |
| Tool | id, name, description | Legitimate tools used maliciously |
| Mitigation | id, name, description | Defensive measures |

Relationships

(:ThreatActor)-[:USES]->(:Technique)
(:ThreatActor)-[:USES]->(:Malware)
(:ThreatActor)-[:USES]->(:Tool)
(:Technique)-[:BELONGS_TO]->(:Tactic)
(:Technique)-[:MITIGATED_BY]->(:Mitigation)
(:Malware)-[:EMPLOYS]->(:Technique)
(:Tool)-[:EMPLOYS]->(:Technique)
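
Because labels and relationship types generally cannot be passed as Cypher query parameters, a loader that interpolates them into a query string would do well to check them against the schema first. The sketch below encodes the relationship list above as an allow-list; the helper names are hypothetical.

```python
# Allowed (source label, relationship type, target label) triples from the schema above.
ALLOWED_EDGES = {
    ("ThreatActor", "USES", "Technique"),
    ("ThreatActor", "USES", "Malware"),
    ("ThreatActor", "USES", "Tool"),
    ("Technique", "BELONGS_TO", "Tactic"),
    ("Technique", "MITIGATED_BY", "Mitigation"),
    ("Malware", "EMPLOYS", "Technique"),
    ("Tool", "EMPLOYS", "Technique"),
}


def is_valid_edge(source: str, rel: str, target: str) -> bool:
    """True if the triple is part of the documented graph schema."""
    return (source, rel, target) in ALLOWED_EDGES


def merge_edge_cypher(source: str, rel: str, target: str) -> str:
    """Return a parameterised MERGE statement for one allowed edge type."""
    if not is_valid_edge(source, rel, target):
        raise ValueError(f"edge not in schema: ({source})-[{rel}]->({target})")
    # Labels are interpolated only after the allow-list check; ids stay parameters.
    return (
        f"MATCH (a:{source} {{id: $source_id}}), (b:{target} {{id: $target_id}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
```

The node ids are still supplied as `$source_id` and `$target_id` parameters, so only schema-approved identifiers ever reach the query text.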

RAG Pipeline

User Query
    │
    ▼
┌─────────────────────┐
│ 1. Query Analysis   │  ← Ollama extracts intent & entities
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 2. Graph Retrieval  │  ← Cypher query against Neo4j
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 3. Context Building │  ← Combine graph results + embeddings
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 4. Response Gen     │  ← Ollama generates final answer
└─────────────────────┘
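
The four stages can be sketched as a small orchestrator with the LLM and the graph injected as plain callables. This is a simplified illustration, not the project's `rag_pipeline.py`: it folds intent extraction and Cypher generation into one LLM call, leaves out the embedding step, and uses made-up prompt wording.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RagPipeline:
    llm: Callable[[str], str]                # prompt -> completion (e.g. an Ollama call)
    run_cypher: Callable[[str], list[dict]]  # Cypher -> result rows (e.g. a Neo4j session)

    def answer(self, question: str) -> str:
        # 1. Query analysis: ask the LLM to translate the question into Cypher.
        cypher = self.llm(f"Write a Cypher query for: {question}").strip()
        # 2. Graph retrieval: execute the generated query.
        rows = self.run_cypher(cypher)
        # 3. Context building: flatten each row into a "key=value, ..." line.
        context = "\n".join(
            ", ".join(f"{key}={value}" for key, value in row.items()) for row in rows
        )
        # 4. Response generation: answer the question using only the retrieved context.
        return self.llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

Injecting the two callables keeps the pipeline testable without running Ollama or Neo4j. If the LLM is allowed to write Cypher, executing it in a read-only transaction is a reasonable precaution.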

Query Examples

| Natural Language Query | Graph Retrieval |
|------------------------|-----------------|
| "What techniques does APT29 use?" | Match path from actor to techniques |
| "How do I defend against phishing?" | Find mitigations for T1566 |
| "Which actors target healthcare?" | Filter actors by target industry |
| "Show the kill chain for Lazarus" | Traverse actor → techniques → tactics |
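
For illustration, three of these retrievals could be written as parameterised Cypher templates against the schema above. The intent names and the `build_query` helper are hypothetical, and the sketch assumes `Technique.id` holds the ATT&CK ID (such as T1566), which this README does not state explicitly. The healthcare example is left out because the node table lists no target-industry property.

```python
QUERY_TEMPLATES = {
    "actor_techniques": (
        "MATCH (a:ThreatActor {name: $name})-[:USES]->(t:Technique) "
        "RETURN t.id AS id, t.name AS name"
    ),
    "technique_mitigations": (
        "MATCH (t:Technique {id: $id})-[:MITIGATED_BY]->(m:Mitigation) "
        "RETURN m.name AS mitigation"
    ),
    "actor_kill_chain": (
        "MATCH (a:ThreatActor {name: $name})-[:USES]->(t:Technique)"
        "-[:BELONGS_TO]->(k:Tactic) "
        "RETURN k.name AS tactic, collect(t.name) AS techniques"
    ),
}


def build_query(intent: str, **params) -> tuple[str, dict]:
    """Look up the Cypher template for an intent and pair it with its parameters."""
    if intent not in QUERY_TEMPLATES:
        raise KeyError(f"no template for intent {intent!r}")
    return QUERY_TEMPLATES[intent], params
```

With the official `neo4j` Python driver, the returned pair can be passed to `session.run(query, params)`.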

Project Structure

graph-rag/
├── docker-compose.yml
├── .env.example
├── Makefile                     # Useful commands
├── README.md
│
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py              # FastAPI app
│       ├── config.py            # Settings
│       ├── routers/
│       │   ├── query.py         # Query endpoints
│       │   └── graph.py         # Graph endpoints
│       ├── services/
│       │   ├── neo4j_service.py # Graph operations
│       │   ├── ollama_service.py # LLM operations
│       │   └── rag_pipeline.py  # RAG orchestration
│       └── models/
│           └── schemas.py       # Pydantic models
│
├── frontend/
│   ├── Dockerfile
│   ├── nginx.conf               # Nginx configuration
│   ├── index.html               # Main HTML page
│   ├── css/
│   │   └── style.css            # Styles
│   └── js/
│       └── app.js               # JavaScript application
│
├── ingestion/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── ingest.py                # Main ingestion script
│   └── parsers/
│       └── mitre_attack.py      # MITRE ATT&CK parser
│
└── data/
    └── .gitkeep                 # Downloaded data stored here

Configuration

Environment Variables

# Neo4j
NEO4J_URI=bolt://neo4j:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=threatintel123

# Ollama
OLLAMA_HOST=http://ollama:11434
OLLAMA_MODEL=mistral:7b
OLLAMA_EMBED_MODEL=nomic-embed-text

# Backend
LOG_LEVEL=INFO
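
A minimal sketch of how these variables might be read on the Python side, using only the standard library. The `Settings` class and `load_settings` function are illustrative names; the project's `config.py` may use a different mechanism.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    ollama_host: str
    ollama_model: str
    ollama_embed_model: str
    log_level: str


def load_settings(env=os.environ) -> Settings:
    """Read settings from the environment, using the values above as defaults."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://neo4j:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        # No default for the secret: raise KeyError early if it is missing.
        neo4j_password=env["NEO4J_PASSWORD"],
        ollama_host=env.get("OLLAMA_HOST", "http://ollama:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "mistral:7b"),
        ollama_embed_model=env.get("OLLAMA_EMBED_MODEL", "nomic-embed-text"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Requiring `NEO4J_PASSWORD` explicitly is a design choice in this sketch, so that a deployment does not silently fall back to the example password.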

Deployment

Prerequisites

  • Docker & Docker Compose installed on target machine
  • At least 16GB RAM (for Ollama + Neo4j)
  • ~10GB disk space

Quick Start

# Clone repository
git clone https://github.com/encryptedtouhid/graph-rag.git
cd graph-rag

# Copy environment file
cp .env.example .env

# Start all services
docker-compose up -d

# Watch logs
docker-compose logs -f

# Access services
# - Frontend: http://localhost:8501
# - Backend API: http://localhost:8000/docs
# - Neo4j Browser: http://localhost:7474

First Run

  1. The Ollama init container automatically pulls the mistral:7b and nomic-embed-text models.
  2. The ingestion job loads MITRE ATT&CK data into Neo4j.
  3. The system is ready once all health checks pass.
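
Since the first run can take a while (model downloads plus ingestion), a script that depends on the stack may want to poll until it is up. The sketch below is one way to do that against the backend's `/health` endpoint. The function names, timeout, and interval are assumptions, as is treating HTTP 200 as "healthy".

```python
import time
import urllib.request
from typing import Callable


def backend_healthy(url: str = "http://localhost:8000/health") -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status == 200


def wait_until_ready(probe: Callable[[], bool],
                     timeout: float = 300.0,
                     interval: float = 5.0) -> bool:
    """Call probe() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused, HTTP error, etc.: service not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_ready(backend_healthy)` blocks for up to five minutes and returns `False` if the backend never becomes healthy.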

Makefile Commands

make help          # Show all available commands
make build         # Build all Docker images
make up            # Start all services in background
make up-logs       # Start all services with logs
make down          # Stop all services
make logs          # View logs from all services
make logs-backend  # View backend logs only
make status        # Show status of all services
make restart       # Restart all services
make clean         # Stop and remove containers, volumes, images
make rebuild       # Clean rebuild and start
make shell-backend # Open shell in backend container
make shell-neo4j   # Open cypher-shell in Neo4j
make reset-db      # Clear database and re-run ingestion

Future Enhancements

  • Add IOC ingestion (AlienVault OTX)
  • Add CVE/NVD data
  • Implement semantic search with vector index
  • Add query caching
  • Add authentication
  • Kubernetes deployment manifests
  • GPU support for Ollama
