AgensGraph: A Multi-Model Graph Database
Based on PostgreSQL
Bitnine R&D Center
2017-1-14
Who am I
Kisung Kim, Ph.D., Chief Technology Officer of Bitnine Global Inc.
Researched query optimization for graph-structured data during
doctoral studies
Developed a distributed relational database engine at TmaxSoft
Leads the development of a new graph database, AgensGraph, at
Bitnine Global
What is a Graph Database?
Images from http://www.slideshare.net/debanjanmahata/an-introduction-to-nosql-graph-databases-and-neo4j
What is a Graph Database?
Relationships are first-class citizens in a graph database
Make your data connected in a graph database
               Relational Database   Graph Database
Entity         Row                   Node (Vertex)
Relationship   Row                   Relationship (Edge)
What is a Graph Database?
Handles data from a different viewpoint
Data model is similar to the entity-relationship model
Gartner says it represents a radical change in how data is
organized and processed
Cypher Query Language
Declarative query language for the property graph model
Inspired by SQL and SPARQL
Designed to be a human-readable query language
Developed by Neo Technology, Inc. since 2011
Current version is 3.0
OpenCypher.org (http://opencypher.org)
Participate in developing the query language
Cypher Query Example
Make two nodes
CREATE (:person {id: 1, name: 'Kisung Kim', birthday: '1980-01-05'});
CREATE (:company {id: 1, name: 'Bitnine Global'});
Make a relationship between the two nodes
MATCH (p:person {id: 1}), (c:company {id:1})
CREATE (p)-[:workFor {title: 'CTO', since: 2014}]->(c);
(Kisung Kim)-[workFor]->(Bitnine Global)
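The property graph built by the statements above can be mirrored in a few lines of Python. This is only a minimal sketch of the data model; the Node and Edge classes are hypothetical names for illustration and are not part of AgensGraph or Cypher.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    props: dict


@dataclass
class Edge:
    label: str
    start: Node
    end: Node
    props: dict = field(default_factory=dict)


# Mirrors the two CREATE statements for the nodes.
person = Node("person", {"id": 1, "name": "Kisung Kim", "birthday": "1980-01-05"})
company = Node("company", {"id": 1, "name": "Bitnine Global"})

# Mirrors MATCH ... CREATE (p)-[:workFor {...}]->(c).
work_for = Edge("workFor", person, company, {"title": "CTO", "since": 2014})

print(f"({work_for.start.props['name']})-[:{work_for.label}]->({work_for.end.props['name']})")
```

Both nodes and the relationship carry a label and a property map, and the relationship holds direct references to its two endpoints.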
Cypher Query Example
Querying
MATCH (p:person {name: 'Kisung Kim'})-[:workFor]->(c:company)
RETURN (p), (c)
(Kisung Kim)-[workFor]->(?)
Query with variable length relationships
MATCH (p:person {name: 'Kisung Kim'})-[:knows*..3]->(f:person)
RETURN (f)
(Kisung Kim)-[knows]->(?)-[knows]->(?)-[knows]->(?)
No Table Definitions and No Joins
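To make the variable-length pattern concrete, the sketch below shows roughly what [:knows*..3] asks for: everyone reachable within one to three knows hops. The sample data and the function name knows_within are hypothetical. Unlike Cypher, which returns one row per matching path, this simplified version returns each reachable person once.

```python
from collections import deque

# Hypothetical sample data: person -> people they know.
KNOWS = {
    "Kisung Kim": ["A"],
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
}


def knows_within(start, max_hops):
    """Return people reachable from start via 1..max_hops 'knows' edges."""
    seen = {start}
    result = []
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in KNOWS.get(person, []):
            if friend not in seen:
                seen.add(friend)
                result.append(friend)
                queue.append((friend, depth + 1))
    return result


print(knows_within("Kisung Kim", 3))
```

With this data, "D" is four hops away and is therefore excluded.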
GraphDB to PostgreSQL Case
From Hipolabs
http://engineering.hipolabs.com/graphdb-to-postgresql/
Graph Database and Hybrid Database
Magic Quadrant for Operational Database Management Systems, Gartner, 2016
So, What We Want to Make
Hybrid database engine with graph and relational models
Cypher query processing on PostgreSQL
Online transactional graph database
Disk-based persistent graph storage
( ) -[:processes]->(Cypher)
Why Did We Choose PostgreSQL?
Fully-featured enterprise-ready open source database
Graph processing actually uses relational algebra
A graph is serialized as tables on disk
Every graph traversal step is in principle a join
(from LDBC documentation)
It is important to optimize joins and speed up join processing
PostgreSQL has an excellent query optimizer
And the rich ecosystem of PostgreSQL
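The claim that every traversal step is in principle a join can be illustrated with a tiny relational encoding. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs standalone; the vertex and edge table layouts are assumptions for illustration and do not reflect AgensGraph's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edge (start_id INTEGER, end_id INTEGER, label TEXT);
    INSERT INTO vertex VALUES (1, 'Kisung Kim'), (2, 'Bitnine Global');
    INSERT INTO edge VALUES (1, 2, 'workFor');
""")

# One traversal step (p)-[:workFor]->(c) expressed as two joins:
# vertex -> edge on the start ID, then edge -> vertex on the end ID.
rows = conn.execute("""
    SELECT p.name, c.name
    FROM vertex AS p
    JOIN edge AS e ON e.start_id = p.id AND e.label = 'workFor'
    JOIN vertex AS c ON c.id = e.end_id
    WHERE p.name = 'Kisung Kim'
""").fetchall()

print(rows)
```

Each additional hop in a pattern adds another pair of joins, which is why join ordering and join speed matter so much for graph queries.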
Challenges
How to store graph data
Efficient structure for graph pattern matching
At the same time, efficient for transaction processing
How to process graph queries
Processing complex graph pattern matching: variable length path,
shortest path
Mismatches between graph data model & relational data model
Graph query optimization
Graph Storage
Graph data is stored on disk, decomposed into vertexes
and edges
When processing graph pattern matching, it is essential to
find adjacent vertexes or edges efficiently
Given a start vertex, find end vertexes
Given an end vertex, find start vertexes
Two Graph Databases
Solution  Company         Latest Version  Features
Neo4j     Neo Technology  3.1             Most famous graph database, Cypher;
                                          O(1) access using fixed-size array
Titan     Datastax        -               Distributed graph system based on
                                          Cassandra
Graph Storage - Neo4j
Fixed-size array for nodes and relationships
Relationships for a node are organized as a doubly-linked list
Index-free adjacency
O(1) access for adjacent edges: follow the pointer
From Graph Databases, 2nd ed., O'Reilly, 2015
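The idea of fixed-size records plus a linked relationship chain can be sketched as follows. The record layouts here are hypothetical and much simpler than Neo4j's real store format: a node record holds only the ID of its first relationship, and each relationship is chained only from its start node. The point is that a record's address is its ID times the record size, so following a pointer needs no index lookup.

```python
import struct

# Hypothetical layouts for illustration; not Neo4j's real record formats.
NODE = struct.Struct("<i")     # first_rel ID (-1 means none)
REL = struct.Struct("<iiii")   # start, end, prev_rel, next_rel
NIL = -1


class Store:
    def __init__(self):
        self.nodes = bytearray()
        self.rels = bytearray()

    def add_node(self):
        self.nodes += NODE.pack(NIL)
        return len(self.nodes) // NODE.size - 1

    def add_rel(self, start, end):
        rel_id = len(self.rels) // REL.size
        (head,) = NODE.unpack_from(self.nodes, start * NODE.size)
        # The new relationship becomes the head of the start node's chain.
        self.rels += REL.pack(start, end, NIL, head)
        if head != NIL:
            s, e, _, nxt = REL.unpack_from(self.rels, head * REL.size)
            REL.pack_into(self.rels, head * REL.size, s, e, rel_id, nxt)
        NODE.pack_into(self.nodes, start * NODE.size, rel_id)
        return rel_id

    def out_neighbors(self, node_id):
        # Record address = ID * record size, so each hop is O(1).
        (rel_id,) = NODE.unpack_from(self.nodes, node_id * NODE.size)
        result = []
        while rel_id != NIL:
            _, end, _, nxt = REL.unpack_from(self.rels, rel_id * REL.size)
            result.append(end)
            rel_id = nxt
        return result


store = Store()
a, b, c = store.add_node(), store.add_node(), store.add_node()
store.add_rel(a, b)
store.add_rel(a, c)
print(store.out_neighbors(a))
```

Newer relationships are pushed to the front of the chain, so neighbors come back in reverse insertion order.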
Graph Storage - Titan (DSE Graph)
Titan stores graphs in adjacency list format
Each edge is stored twice
Vertex and edge lists are stored in a backend store such as HBase,
Cassandra, or BerkeleyDB
From http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html
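A minimal sketch of the adjacency list idea, assuming an in-memory dictionary in place of a real backend: each edge is written twice, once under its start vertex and once under its end vertex, so both directions can be read from a single vertex's row. The names adjacency, add_edge, and neighbors are hypothetical and are not Titan APIs.

```python
from collections import defaultdict

# vertex ID -> list of (direction, label, other vertex ID)
adjacency = defaultdict(list)


def add_edge(start, label, end):
    # The same edge is stored twice: once per endpoint.
    adjacency[start].append(("out", label, end))
    adjacency[end].append(("in", label, start))


def neighbors(vertex, direction):
    return [other for d, _, other in adjacency.get(vertex, []) if d == direction]


add_edge("v1", "knows", "v2")
add_edge("v3", "knows", "v2")

print(neighbors("v1", "out"))
print(neighbors("v2", "in"))
```

The cost of the double write is extra storage and two updates per edge; the benefit is that neither direction needs a scan over all edges.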
Graph Storage - AgensGraph
A fixed-size array is hard to implement in PostgreSQL
Tuples are moved when updated
Titan's big-row approach is also inadequate
We chose B-tree indexes for graph traversal
Graph
  Vertex table: Vertex ID | Properties
    B-tree on (Vertex ID)
  Edge table: Edge ID | Start Vertex ID | End Vertex ID | Properties
    B-tree on (Start, End)
    B-tree on (End, Start)
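The two composite indexes on the edge table can be imitated with sorted lists. This is a sketch under the assumption that a sorted list searched with bisect stands in for the leaf level of a B-tree; the class name CompositeIndex and the sample IDs are hypothetical.

```python
import bisect


class CompositeIndex:
    def __init__(self):
        self.entries = []  # kept sorted, like the leaf level of a B-tree

    def insert(self, first, second, edge_id):
        bisect.insort(self.entries, (first, second, edge_id))

    def scan(self, first):
        """Range scan over all entries whose leading key equals first."""
        pos = bisect.bisect_left(self.entries, (first,))
        matches = []
        while pos < len(self.entries) and self.entries[pos][0] == first:
            matches.append(self.entries[pos][1:])
            pos += 1
        return matches


start_end = CompositeIndex()  # (Start, End) index
end_start = CompositeIndex()  # (End, Start) index


def add_edge(edge_id, start, end):
    start_end.insert(start, end, edge_id)
    end_start.insert(end, start, edge_id)


add_edge(100, 1, 2)
add_edge(101, 1, 3)
add_edge(102, 3, 2)

print(start_end.scan(1))  # outgoing edges of vertex 1
print(end_start.scan(2))  # incoming edges of vertex 2
```

Because the other endpoint is part of the index key, a traversal step can be answered from the index entries alone.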
Index Problems
The current B-tree has several disadvantages for our workload
A composite index is preferable, but its size increases
There are many duplicate keys (vertex IDs) on start_ID or end_ID
Property updates incur insertions into the B-trees
We are developing a new index with a bucket structure (like the
GIN index), indirect indexing, and support for index-only scans
for graph traversals
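The duplicate-key problem and the bucket idea can be shown with a toy comparison. This is only an illustration of the general principle, with hypothetical data; it is not the design of the index under development.

```python
from collections import defaultdict

# Hypothetical (start_id, end_id) pairs.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]

# Plain B-tree style: one index entry per edge, so start_id 1 is stored three times.
flat_entries = sorted(edges)

# Bucket style: each distinct start_id is stored once with a list of end IDs.
buckets = defaultdict(list)
for start_id, end_id in edges:
    buckets[start_id].append(end_id)

flat_key_count = len(flat_entries)
bucket_key_count = len(buckets)

print(flat_key_count, bucket_key_count)
```

The more edges a vertex has, the larger the saving from storing its ID once per bucket instead of once per edge.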
Graph Storage - AgensGraph
Vertexes and edges are grouped into labels
Labels are organized as a label hierarchy
We use PostgreSQL's table inheritance feature
ag_vertex (Vertex ID, Properties)
  Person (Vertex ID, Properties)
  Message (Vertex ID, Properties)
    Comment (Vertex ID, Properties)
    Post (Vertex ID, Properties)
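The effect of a label hierarchy on scans can be sketched as follows: a scan of a parent label also returns the rows of its child labels, much as a query on a parent table in PostgreSQL includes rows from tables that inherit from it. The sketch assumes that Comment and Post are children of Message, as the diagram suggests; the dictionaries and sample rows are hypothetical.

```python
# Assumed hierarchy from the diagram: child label -> parent label.
PARENT = {
    "person": "ag_vertex",
    "message": "ag_vertex",
    "comment": "message",
    "post": "message",
}

# Hypothetical rows stored per label: (vertex ID, properties).
ROWS = {
    "person": [(1, {"name": "Kisung Kim"})],
    "comment": [(2, {"text": "Nice talk"})],
    "post": [(3, {"text": "AgensGraph v0.9"})],
}


def descendants(label):
    """Return label followed by all labels below it in the hierarchy."""
    result = [label]
    for child, parent in PARENT.items():
        if parent == label:
            result.extend(descendants(child))
    return result


def scan(label):
    """Return rows of label and of every descendant label."""
    return [row for name in descendants(label) for row in ROWS.get(name, [])]


print([vid for vid, _ in scan("message")])
print([vid for vid, _ in scan("ag_vertex")])
```

Matching on the root label ag_vertex therefore covers every vertex, while matching on a leaf label touches only that label's rows.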
Current Status
AgensGraph v0.9
(https://github.com/bitnine-oss/agens-graph or http://bitnine.net/downloads/)
Graph data model and DDL on PostgreSQL 9.6
Cypher query processing (70% of OpenCypher spec.)
Integrated query processing (Cypher + SQL)
Client library (JDBC, ODBC, Python)
Monitoring and development using Tadpole DB-hub
Tadpole for AgensGraph
Tadpole DB Hub is an open-source project for managing unified
infrastructure (https://github.com/hangum/TadpoleForDBTools)
Supports various databases, including PostgreSQL and AgensGraph
Features of Tadpole for AgensGraph
Monitoring the AgensGraph server
Cypher query browser and graph visualization
Tadpole for AgensGraph
Future Roadmap
Distributed graph database
Plan to exploit Postgres-XL
Specialized storage and index for graph traversals
Dictionary compression for JSONB (ZSON)
Graph query optimization using graph statistics
Integration with big data systems
HDFS Storage
Graph analysis using GraphX
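The roadmap item on dictionary compression can be illustrated in principle: property keys repeat across many vertexes and edges, so replacing them with small integer codes from a shared dictionary shrinks each stored document. The sketch below is a simplified illustration that handles top-level keys only; it is not how ZSON is implemented, and all function names are hypothetical.

```python
def build_dictionary(docs):
    """Assign a small integer code to every key seen in the sample documents."""
    keys = sorted({key for doc in docs for key in doc})
    return {key: code for code, key in enumerate(keys)}


def compress(doc, dictionary):
    return {dictionary[key]: value for key, value in doc.items()}


def decompress(packed, dictionary):
    reverse = {code: key for key, code in dictionary.items()}
    return {reverse[code]: value for code, value in packed.items()}


docs = [
    {"name": "Kisung Kim", "title": "CTO"},
    {"name": "Bitnine Global"},
]
dictionary = build_dictionary(docs)
packed = compress(docs[0], dictionary)

print(packed)
print(decompress(packed, dictionary) == docs[0])
```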
Join Us
AgensGraph is an open-source project:
https://github.com/bitnine-oss/agens-graph
We also wish to contribute to the PostgreSQL community
Graph database meetup in Silicon Valley
http://www.meetup.com/Graph-Database-in-Silicon-Valley/
Thank You
[email protected] :likes