Advanced Database – Graph-Based
Query Processing Algorithms
• Group Members:
• Natnael Michele (QQ5897)
• Samket Ateklit (OJ3892)
• Sudies Fuad (QB9297)
• Michael Belaynhe (Rw2617)
• Course: SE334
• Submitted to: Samuel Getachew
• Submission Date: May 9, 2025
• School: HiLCoE, School of Computer Science
Abstract
• Graph-based query processing is crucial in
handling data from social, biological, and
knowledge networks.
• This report covers graph data models, query
languages, and algorithms like subgraph
isomorphism and traversal.
• Challenges and advancements in distributed
query processing are also discussed.
Introduction
• Graph-structured data is growing rapidly.
• Graph databases handle interconnected data
better than relational ones.
• This report reviews query processing
algorithms and performance factors.
Graph Data Models
• RDF: Schema-less triples for semantic web
data.
• Property Graph: Nodes and edges with
properties, used in Neo4j.
• Labeled Graph: Labels on nodes/edges for
pattern matching.
Query Languages
• SPARQL: Standard for RDF, supports patterns
and filters.
• Cypher: SQL-like language for property graphs.
• GSQL, Gremlin, PGQL: Analytical and traversal
languages.
Types of Graph Queries
• Pattern Matching: Find subgraph structures.
• Reachability: Check if a path exists.
• Path Queries: Retrieve specific paths.
• Analytical Queries: Centrality, clustering, etc.
Query Processing Algorithms
• Subgraph Isomorphism: Ullmann’s, VF2,
TurboISO.
• Graph Traversal: DFS, BFS, Dijkstra’s, A*.
• Indexing: Neighborhood, path, and label
indexes.
• Cost-Based Optimization: Selectivity, strategy,
parallelism.
Distributed Graph Query
Processing
• Challenges: Partitioning, load balancing,
locality.
• Frameworks: Pregel, Giraph, GraphX, DGL,
PyG.
• Execution Models: Push vs. pull, async vs.
sync.
Conclusion
• Graph query processing is essential for
modern data applications.
• Algorithms and frameworks improve efficiency
and scalability.
• Continued research needed for optimization.
Recommendations
• Develop hybrid algorithms with multiple
techniques.
• Enhance and standardize query languages.
• Focus on adaptive distributed processing.
• Create benchmarking standards for
comparison.
Bibliography
• Angles & Gutierrez (2008) – Graph database
models.
• Robinson et al. (2015) – Graph Databases
(O'Reilly).
• Han et al. (2013) – TurboISO in SIGMOD.
• Malewicz et al. (2010) – Pregel in ACM.
• Zaharia et al. (2012) – RDDs in NSDI.
Appendix
• Appendix A: SPARQL and Cypher Queries
• Appendix B: Pseudocode for VF2 and Dijkstra
• Appendix C: Graph Partitioning Diagram
• Appendix D: Algorithm Performance Table