Knowledge Graphs Querying

Arijit Khan
Aalborg University
[email protected]
arXiv:2305.14485v1 [cs.DB] 23 May 2023

ABSTRACT

Knowledge graphs (KGs) such as DBpedia, Freebase, YAGO, Wikidata, and NELL were constructed to store large-scale, real-world facts as ⟨subject, predicate, object⟩ triples, which can also be modeled as a graph, where a node (a subject or an object) represents an entity with attributes, and a directed edge (a predicate) is a relationship between two entities. Querying KGs is critical in web search, question answering (QA), semantic search, personal assistants, fact checking, and recommendation. While significant progress has been made on KG construction and curation, we have recently seen a surge of research on KG querying and QA, thanks to deep learning. The objectives of our survey are two-fold. First, research on KG querying has been conducted by several communities, such as databases, data mining, semantic web, machine learning, information retrieval, and natural language processing (NLP), with different focus and terminologies, and on diverse topics ranging from graph databases, query languages, join algorithms, and graph pattern matching to more sophisticated KG embedding and natural language questions (NLQs). We aim at uniting different interdisciplinary topics and concepts that have been developed for KG querying. Second, many recent advances on KG and query embedding, multimodal KG, and KG-QA come from deep learning, IR, NLP, and computer vision domains. We identify important challenges of KG querying that received less attention by graph databases, and by the DB community in general, e.g., incomplete KG, semantic matching, multimodal data, and NLQs. We conclude by discussing interesting opportunities for the data management community, for instance, KG as a unified data model and vector-based query processing.

1 Introduction

Knowledge graph (KG) [161] is an intelligible data model to support easy integration of data from multiple heterogeneous sources, providing a formal semantic representation for inference and machine processing. One does not require exhaustively modeling their schema; new entities and relationships can be added in a human-driven, semi-automated, or fully automated manner to their existing structure without endangering the current functionality. This follows the semi-structured data paradigm, enabling more frequent and timely updates in knowledge graphs. However, schema flexibility also introduces challenges in managing and querying KGs.

1.1 Challenges in KG Querying

• Scalable and efficient querying. The problems are three-fold. First, due to storing cross-domain information and being un-normalized, KGs are massive in volume. Freebase [18] (Google KG is powered in part by Freebase) alone has over 22 million entities and 350 million relationships in about 100 domains. Graph-of-Things (GoT) [109], a live knowledge graph system for the Internet of Things, added roughly 10 billion or more RDF triples per month. While some works partition the data across multiple tables, e.g., property tables in Jena2 [162] and Oracle [31], vertically partitioned databases in SW-Store [1], etc., many databases store them as one giant table (e.g., RDF-3X [94]), or as a big graph with labels associated with nodes and edges [37, 92]. Second, KG queries (e.g., “find the 10 most commonly followed entities by people within a given user’s second-degree network in the LinkedIn economic graph”) are different from classical relational queries [121, 21, 85, 20]. They are join-heavy queries over many-to-many relations (e.g., ‘knows’, ‘follows’, ‘friends’ relations), involving recursive joins or graph traversals, and resulting in complex query shapes, e.g., chain, tree, cycle, star, and flower. Such queries generate large intermediate results [10], and query optimization is challenging with traditional binary join plans. Third, exact subgraph pattern matching via subgraph isomorphism is also NP-complete [48]. For a fixed query pattern, subgraph isomorphism can be verified by enumerating all potential candidate matches. For node-labeled graphs, if the query pattern has q nodes $v_1, v_2, \ldots, v_q$, and if the number of candidate node matches (from the data graph) for each
query node $v_i$ is $|C(v_i)|$ based on node-label matching, then the search space has size $\prod_{i=1}^{q} |C(v_i)|$. This can be large due to massive data graphs, larger query graphs, and less selective query node labels. Therefore, even with node-labeled graphs, e.g., KGs, enumerating all potential candidate matches within the search space is expensive.

Scalability and efficiency of graph query processing (including RDF and KG querying) were studied by the data management, theory, and systems communities, e.g., graph and join query optimization [85, 95, 93, 123], join vs. graph queries [122, 139, 141], indexing [53, 6, 65], materialized views [57, 44], efficient (exact) subgraph pattern matching [138, 72], multi-query optimization [107, 117], distributed processing [2, 25, 64], data partitioning [103], I/O efficiency [181], caching [69, 105], modern hardware [62, 177], etc.

• Flexible schema and semantic matching. In a KG, similar relationships can be stored in diverse ways. For example, for the query “find all software that have been developed by organizations founded in California” on the DBpedia knowledge graph [73], a recently proposed KG querying system, AGQ [157], reports that correct answers conform to one of at least six different schemas. It is expected to retrieve all semantically correct (i.e., structurally different, yet ‘relevant’) answers for such queries. If users have full knowledge about DBpedia, they can construct various query patterns or write different SPARQL queries that cover all possible schemas, to obtain all software of interest. However, it is challenging for ordinary users to have full knowledge of the vocabulary used in a KG and the underlying schemas defined in the KG, since the schema can be large and complex due to heterogeneity; thus KG querying is difficult.

Additionally, the notion of ‘relevant’ or ‘correct’ answers could very well depend on the user’s query intent, or can even be vague; thus a predefined, ‘one-size-fits-all’ similarity metric might not work in all scenarios. Data management, semantic web, and ML communities investigated this problem in the context of schema mapping [112], ontology and logic based approaches [110, 83], query reformulation [173, 189], schema-free query interfaces and search [144], approximate subgraph pattern matching [68, 70, 171, 185], graph simulation, homomorphism, and regular expression based pattern matching [43, 84, 42], and KG embedding based query processing [75, 156, 157, 155, 55, 51, 30].

• Incomplete KGs. Knowledge graphs are incomplete and follow the open-world assumption: information available in a KG captures only a subset of reality. To retrieve the complete set of correct answers for a given query, one must infer missing edges and relations. Incompleteness for RDF and property graph data models received less attention [33, 100]. Dealing with missing graph structure is more difficult than dealing with missing attributes on nodes and edges. Researchers studied uncertain graph data management [71], probabilistic knowledge bases [24], and commonsense KGs [58]. Recent ML approaches embed a KG and logical queries into the same vector space, to deal with missing edges in the KG [55, 51, 30, 116, 115, 182, 9].

• User-friendly querying. Non-professional users find it difficult to formulate an appropriate graph query, e.g., via SPARQL or a subgraph pattern [17]; thus more user-friendly approaches were developed: (1) (declarative) graph query languages [8], (2) keyword search [170], (3) query-by-example [89, 60], (4) faceted search [148], (5) visual query [16, 52], (6) natural language questions [119, 28], (7) incorporating users’ feedback [19, 134], (8) query auto-completion and recommendation [77], (9) answers explanation [131, 150, 59], (10) conversational QA [176], etc. A one-time answer might not be satisfactory. Exploration-based, interactive methods such as faceted search, users’ feedback, query suggestion and completion, answers explanation, and conversational QA were designed, enabling users to refine their queries and obtain personalized results.

• Multimodal data querying. Data are multimodal, consisting of texts, images, and other multimedia data. Entities as well as features of both entities and relations in a KG can have a variety of data types. However, the bulk of KG querying methods focus only on the structured information in triple facts, since multimodal information is either omitted completely or treated as regular nodes and edges. Thus, KG queries and answers lose richer and potentially useful information, reducing their effectiveness in downstream tasks. Recently, multimodal KGs and their querying techniques have become an emerging area of research [80, 49].

1.2 Related Work and Benefits of Our Survey

The closest to our work are surveys on heterogeneous information networks [128] and querying attributed graphs [158]. While having similarities, knowledge graphs are complex, modeling real-world facts as ⟨subject, predicate, object⟩ triples. Different from those surveys, we discuss diverse querying methods on KGs, neural approaches, and graph databases support to process them. There are surveys on RDF data management and querying [2, 6, 64, 120], as well as on knowledge graphs [161, 101, 3] and their various operations separately, such as KG embedding [5], KG reasoning with logics and embedding [179], KG-QA [111, 28], conversational KG-QA [176], etc. Surveys on graph databases [76, 145, 15, 8], queries [20] and optimization [85], and exact subgraph pattern matching [72] exist. We mention these important surveys in our article.
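
To make the candidate search-space bound from §1.1 concrete, the following minimal Python sketch computes the product of |C(v_i)| over the query nodes of a node-labeled pattern. The toy graph, its labels, and the function name are hypothetical and chosen only for illustration; the bound counts label-compatible assignments and ignores edge constraints and injectivity, so it is an upper bound on the matches that subgraph isomorphism would have to verify.

```python
from collections import Counter
from math import prod


def candidate_search_space(data_labels, query_labels):
    """Return the product over query nodes v_i of |C(v_i)|.

    data_labels: dict mapping each data-graph node id to its label.
    query_labels: list with the label of each query node v_1..v_q.
    C(v_i) is taken to be all data nodes sharing v_i's label.
    """
    label_counts = Counter(data_labels.values())
    return prod(label_counts.get(label, 0) for label in query_labels)


# Hypothetical toy data graph: node id -> label.
data_labels = {
    "n1": "Person", "n2": "Person", "n3": "Person",
    "n4": "Company", "n5": "Company",
    "n6": "City",
}

# Query pattern with q = 3 nodes labeled Person, Company, City.
query_labels = ["Person", "Company", "City"]

# |C(v_1)| * |C(v_2)| * |C(v_3)| = 3 * 2 * 1
print(candidate_search_space(data_labels, query_labels))
```

Adding a second Person node to the query multiplies the bound by 3 again, which illustrates why larger query graphs and less selective labels inflate the search space quickly.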
Figure 1: Design space of KG query and KG-QA problems

Additionally, our contributions are as follows.
– We unite interdisciplinary topics about KG querying with a taxonomy on KG data models, query classification, databases, querying techniques, algorithms, and benchmarks.
– We discuss recent neural methods for KG query processing such as KG embedding-based query answering, multi-modal KG embedding, KG-QA, and conversational KG-QA.
– We analyze the top-10 commercial graph databases support for KG querying, particularly focusing on query languages, user-friendly and interactive interfaces, KG embedding, and multi-modal KG storage.
– We emphasize the current challenges and highlight some future research directions.

1.3 Roadmap

We stated challenges of KG querying and related surveys in §1.1 and §1.2, respectively. A taxonomy of KG querying with an emphasis on data models, query classification, languages, technologies, and benchmarks is introduced in §2. We highlight deep learning approaches for KG query processing and QA in §3. We analyze current graph databases support for KG query in §4 and discuss future directions in §5.

2 Taxonomy of KG Querying

While almost all big data companies, e.g., Google, Microsoft, Facebook, Amazon, IBM, and eBay, have their proprietary knowledge graphs [101], many public knowledge graphs are also available, e.g., cross-domain KGs (DBpedia [73], Wikidata [151], YAGO [135], Freebase [18], NELL [87]), KGs for synonyms and translations in several languages (BabelNet [91], ConceptNet [132]), and domain-specific KGs [3] (COVID19 KG [40], ClaimsKG [143]), among others.

Graph workloads are broadly classified into two categories [67]: (1) online graph queries consisting of ad-hoc graph traversal and pattern matching, exploring a small fraction of the entire graph and requiring fast response time; (2) offline graph analytics with iterative, batch processing over the entire graph, e.g., PageRank, clustering, community detection, and machine learning algorithms. Online graph queries and offline graph analytics are also called graph OLTP and graph OLAP (or, graph algorithms), respectively [145]. The focus of this article is read-only online queries without updates in the KG. KG querying is essential for web search [129], QA [119], semantic search [155], personal assistants [12], fact checking [143], and recommendation [167].

In this article, we unify various concepts under the broad umbrella of KG query and QA with the taxonomy in Figure 1, which shows six key design options for KG query and KG-QA problems: KG data models, KG query and QA classification, graph databases for KG, KG query and QA techniques, their processing algorithms, and benchmarks.

2.1 KG Data Models

Two prominent data models are (1) the RDF model consisting of ⟨subject, predicate, object⟩ triples, and (2) the property graph model having nodes and edges with an arbitrary number of properties, where a node (a subject or an object) represents an entity and a directed edge (a predicate) is a relationship between two entities. RDF schema (RDFS, also known as an ontology), which is the schema language for RDF proposed by the World Wide Web Consortium (W3C), is itself an RDF graph (equivalently, a directed graph), describing classes, properties, and semantic relationships (e.g., “is-a”, “part-of”, “synonym-to”) among them. Ontology languages such as OWL have a richer vocabulary and define more expressive schemas.

2.2 KG Query and Question Classification

KG queries and questions can be classified based on several aspects.

Querying vs. QA. There are differences between KG query and question answering (QA). A query has a structure, e.g., a graph pattern, a logic query, an SQL or a SPARQL query. On the other hand, KG-QA [119] deals with answering unstructured natural language questions (NLQs) over KGs; it is a natural language understanding task, that is, semantically parsing an NLQ to translate it into a query language such as SPARQL.

Simple vs. complex questions. A simple question involves a single triple and a single formal query pattern, e.g., “where was Albert Einstein born?” can be answered based on the relation ‘born’: ⟨Albert Einstein, born, ?place⟩. On the other hand, a complex question involves multiple KG relations and/or additional operations, e.g., “what was the first movie of James Cameron that won an Oscar?”

Logic vs. path queries. First-order logic queries with conjunction, disjunction, negation, and existential quantification over KGs were widely studied [30]. Relational algebra select-project-join (SPJ) queries and subgraph
pattern matching are conjunctive queries (CQ). A regular path query (RPQ) finds all pairs of nodes connected by at least one path where the sequence of edge labels on the path follows a given regular expression [163]. A shortest-path query returns the path that has the minimum length between two given nodes [130]. A conjunctive regular path query (CRPQ) combines CQ (e.g., subgraph pattern matching) with RPQ (e.g., reachability) [20].

Factoid vs. aggregate queries. The answer set to a factoid query is an enumeration of noun phrases, e.g., “find all movies by James Cameron”. An aggregate query retrieves the statistical result of a collection of entities in the answer set, e.g., “what is the average length of movies by James Cameron?” Aggregate queries can be combined with GROUP-BY [156].

2.3 KG Query Languages & Technologies

A number of technologies exist for KG querying, e.g., SPARQL, SQL extensions, Datalog, graph query languages, keyword query, exemplar query, faceted search, visual query, query templates, natural language questions, conversational QA, multimodal QA, and interactive methods (e.g., feedback, explanation, suggestion, autocompletion, etc.).

SPARQL is the W3C recommended query language for RDF. Microsoft SQL Graph supports SQL extensions that enable creating and querying graph objects [86]. Graph query languages tend to be declarative like SQL. Cypher [46], PGQL [102], and GSQL [36] are declarative graph query languages native to Neo4j, Oracle, and TigerGraph, respectively. Standardization efforts from both academia and industry led to SQL/PGQ [35], G-CORE [7], and GQL (https://www.gqlstandards.org/). Gremlin [118], adopted by many graph vendors, is a graph-based programming language that supports both imperative graph traversal and declarative pattern matching. Datalog-based KG querying was explored in [13]. Cypher, SPARQL, SQL, and Datalog are not Turing complete. In contrast, Gremlin and GSQL are Turing complete and hence more expressive (https://info.tigergraph.com/gsql).

In relational stores, SPARQL queries are reformulated into SQL queries, then optimized and processed by the relational database management system. On the other hand, graph-based RDF querying techniques convert the SPARQL query into a query graph, and perform graph operations (e.g., exact or approximate subgraph pattern matching, graph traversal) to evaluate the query [164]. Recently, [120] investigated which RDF data representations are suitable for which workloads.

More user-friendly means of KG querying involve the following techniques.

(1) Keyword search over graphs [170] allows users to provide a list of keywords, and it returns subtrees/subgraphs containing those keywords as answers, based on various ranking criteria, e.g., sum of all edge weights in the resulting tree/graph, sum of all path weights from the root to each keyword in the tree, maximum pairwise distance among nodes, etc. While native keyword search algorithms directly evaluate a keyword query, there are also query reformulation techniques that convert the keyword query into a more structured format, e.g., SPARQL [187] or a query graph [173]. Given the set of keywords, the structured queries are identified by considering term similarity, co-occurrences, and relationships in the KG.

(2) Graph query-by-example [89, 60] enables users to input answer tuple(s) as a query, and it returns other similar tuples that are present in the knowledge graph. This follows the well-studied query-by-example paradigm in relational databases, HTML tables, and entity sets: a user might already know a few answers to their query. The graph query-by-example systems adopt a two-step approach. Given the input example tuple(s), they first identify the query graph that captures the user’s query intent. Next, they evaluate the query graph to find other relevant answer tuples.

(3) Faceted search [148] is an explorative, interactive, and progressive refinement-based search through simple clicks, offering an overview of the result set at each iteration, thereby assisting in query formulation according to the dataset. Main techniques include faceted taxonomy generation, facet ranking, faceted interface, visualization, and navigation.

(4) Graph visual query interfaces [16, 52] allow a user to draw a graph query (e.g., a query graph pattern) interactively. Graph operations such as subgraph matching and enumeration are employed to evaluate these queries. [52] reformulates graph queries into SPARQL queries.

(5) Natural language interfaces [119, 189, 28] permit users to input questions in natural languages, without requiring them to learn the underlying schema, vocabulary, or query languages. The semantic parsing of a natural language question involves question analysis, phrase mapping and disambiguation, and query construction. Some systems additionally use templates to generate the SPARQL query [4, 184]. Neural approaches are becoming increasingly popular for these tasks.

(6) Interactive methods include (a) graph query suggestion, expansion, refinement, and autocompletion aiming to retrieve more detailed and relevant answers [77]; (b) a user’s time-bounded search to provide ‘early’ answers within the user’s response time bound and incrementally improve the quality of answers with time [155]; (c) incorporating a user’s feedback for personalized graph querying [19, 134]; (d) answer explanation to support ‘why’, ‘why-not’, ‘why empty’, and ‘why so
many’ questions on query results [131, 150, 59].

(7) Conversational QA agents [32, 176] engage users in multi-turn QA to satisfy their information needs. Once a question is answered, the user can ask another question related to the previous QA pair. Such follow-up questions are usually incomplete, with the context not being clearly specified. A conversational agent might also ask follow-up questions to understand the user’s query intent. Examples include task-oriented systems (scheduling an event), chat-oriented systems (conducting natural conversations), QA dialog systems (providing answers about a topic), virtual assistants (e.g., Microsoft Cortana is powered by Microsoft Satori KG), and knowledge grounded neural conversation [186, 147, 97].

(8) Multimodal QA [66] consists of multiple user input and output modes (such as text, image, video, voice, touch, gestures, gaze, and movements) over multimodal data (including multimodal KGs), having applications in visual QA, virtual assistants, autonomous vehicles, etc.

A number of keyword search, visual graph query, and natural language query-based interfaces (till 2015) for RDF and KG querying were compared in [133, 41] based on their effectiveness and usability.

• Application scenarios of KG querying technologies. Writing queries in SPARQL or in other graph query languages requires familiarity with that language, as well as knowledge of the vocabulary and predicates used in the KG. Such querying modes are generally suitable for expert programmers and data scientists. Non-expert users and domain scientists (e.g., biologists, chemists, data journalists, etc., who also use KGs) might prefer more user-friendly means of asking queries, such as using keywords, graph query-by-example, faceted search, visual interfaces, and natural language questions. Interactive methods including faceted search, users’ feedback, query suggestion and completion, answers explanation, and conversational QA are helpful in refining users’ queries and obtaining personalized results. Conversational and multimodal QA are critical in virtual assistants.

2.4 Benchmarks for KG Query & QA

Several benchmarks for KG querying and QA exist, such as for simple questions (WebQuestions [14], SimpleQuestions [22]), complex questions (ComplexQuestions [142], LC-QuAD [146]), multi-hop questions (HotpotQA [172]), conversational QA (ConvQuestions [32]), SPARQL query logs [21], and benchmarks for RDF and SPARQL queries (SP2Bench [127], LUBM [50]), among others. QALD is not one benchmark but a series of evaluation campaigns for QA systems over KGs, the recent one being QALD10 (https://www.nliwod.org/challenge).

3 KG Query Processing & QA: Recent Neural Methods

We highlight deep learning advances for KG embedding-based query processing, multi-modal KG embedding, KG-QA, and conversational QA over KGs.

3.1 Embedding-based KG Query Processing

KG embedding represents each predicate and entity of a KG as a low-dimensional vector, such that the original structures and relations in the KG are approximately preserved in these learned vectors [5]. KG embeddings can be broadly classified into four categories. (1) Geometric or translational distance models compute the plausibility of triples based on a geometric operation such as a distance function in the embedding space, e.g., TransE [23], TransH [160], TransD [61], RotatE [140], etc. (2) Semantic matching or tensor decomposition models compute similarity of latent features by an inner product formulation, e.g., RESCAL [98], DistMult [168], TuckER [11]. (3) Neural network-based models generally use convolutional neural networks (CNNs) to predict the plausibility score of a triple, e.g., ConvE [34], ConvKB [96]; or employ graph neural networks (GNNs), which can capture multi-hop relations in the neighborhood of a node, e.g., RGCN [126], CompGCN [149], KBAT [90], etc. (4) Rule-based models consider logic rules during embedding learning, e.g., ComplEx-NNE-AER [38] and IterE [180].

For a simple question, if the embeddings of the head entity (i.e., head vector h) and the predicate (i.e., predicate vector r) are identified based on the KG embedding, link prediction can be employed to infer the tail entity, e.g., tail vector t ≈ h + r via TransE. EAQ [75] applies KG embeddings and uses spatial indexes to efficiently answer top-k and aggregate queries.

We categorize recent deep learning techniques for KG query processing into two classes; both categories evaluate input graph query patterns and can deal with incomplete KGs and schema mismatch between the query and a KG.

(1) Query answering methods trained on single-hop queries, e.g., [75, 155, 156, 9, 47], can nevertheless process multi-hop and complex input queries by first decomposing complex queries into smaller subqueries and then combining the answers of the subqueries in a systematic way. For instance, [155, 156] process queries having complex shapes (chain, cycle, star, and flower), aggregate functions (COUNT, SUM, AVG), and FILTER and GROUP-BY operators over KG embedding. Since KG embedding techniques deal with ⟨subject, predicate, object⟩ triples and are similar to training with single-hop queries, these query answering meth-
ods can directly work with KG embedding, without separate single-hop training queries and their answers.

(2) Query answering methods trained on multi-hop queries [136, 115, 182, 78, 169, 82, 56, 30, 188] embed multi-hop queries and their answers (i.e., entities from a KG) close to each other in the same embedding space. These methods deal with logical queries, often implement logical operators in neural ways, and significantly reduce query processing time via inference. Instead of generating large intermediate results by decomposing complex queries into smaller subqueries, these approaches reduce query answering to dense similarity matching of query and entity vectors. They can further be classified as geometry, distribution, or fuzzy logic-based methods according to the generated embeddings. Geometry-based methods embed entities and queries with geometric shapes. Examples include Query2box [115], NewLook [78], and ConE [182]. Distribution-based approaches encode entities and queries into probabilistic densities, e.g., BetaE [116], GammaE [169], NMP-QEM [82]. Fuzzy logic-based methods (e.g., FuzzQE [30], ENeSy [188]) define logical operators in a learning-free manner following fuzzy logic, whereas only entity and relation embeddings require learning. Geometry and distribution-based approaches are trained with complex queries and their answers, which can be generated by crowdsourcing [14], or by automatic generation from a KG as in [114]. Fuzzy logic-based methods can be trained on single-hop or complex queries. Different from the above approaches, kgTransformer [81] uses a Transformer-based GNN architecture, models logical queries as masked prediction, and proposes a masked pre-training strategy.

3.2 Multi-modal KG Embedding

Multi-modal data (e.g., text, image, multi-media) is associated as attributes of entities and relations, or treated as new entities in a KG. Multi-modal KG embeddings are critical for querying multi-modal KGs, and can be classified as follows.

KG+text. Notable methods are Extended RESCAL [99], DKRL [165], and KDCoE [29], which embed KGs having textual descriptions of entities. These methods vary in how the entity embedding from text is obtained (e.g., via CNN, LSTM, bag-of-words, etc.) and then how it is combined with the structure-based representation. Recently, efforts were made to combine pre-trained language models with KG+text embedding, e.g., (1) when KGs have textual descriptions of entities: SimKGC [152], KEPLER [154], KnowlyBERT [63], K-BERT [80]; (2) when KGs and text data are stored separately: DRAGON [174], JAKET [175], OREO-LM [54], DRLK [178].

KG+image. IKRL [166], RSME [153], and MuKEA [39] learn KG embedding by jointly training a structure-based representation (e.g., TransE) with an image-based representation obtained via an image encoder.

KG+text+image. To embed KGs having texts and images, several models were proposed, e.g., Knowledge-CLIP [104], CMGNN [45], MKBE [108], MKGAT [137], TransAE [159]. They employ various neural encoders for multi-modal data and combine them with existing relational models.

3.3 Neural Methods for KG-QA

Answering natural language questions (NLQs) over knowledge graphs involves several subtasks including entity linking, relationship identification, identifying logical and numerical operators, query forms, intent, and finally the formal query construction [111]. Rule-based methods use ontologies and the KG for phrase mapping and disambiguation to link entities and relations to the KG, and then employ grammars to generate formal queries. Recently, neural network-based semantic parsing algorithms have become popular for KG-QA; they are categorized as classification, ranking, and translation-based [28]. Classification-based parsing algorithms rely on classification models to predict the relation and entities in a simple NLQ. For more complex questions, ranking-based methods employ a search procedure to find the top few probable query candidates, followed by a neural network-based ranking model to find the best candidate. Translation-based KG-QA methods employ a sequence-to-sequence model, consisting of an encoder and a decoder, to translate a natural language question into a formal query. Based on the types of training data, their training methods can be fully supervised (consisting of NLQs and their formal queries during training) or weakly supervised (provided with NLQs and their execution results, but without their formal queries during training).

More recently, [55, 26, 113, 125, 79, 124] propose methods to answer NLQs over KGs in an end-to-end manner. They can deal with incomplete KGs, semantic meaning of NLQs, and ambiguity of entity names and relations. KEQA [55] jointly learns head entity, predicate, and tail entity representations of a simple NLQ in a given KG embedding space. Attention-based BiLSTM models are used for the head entity and predicate representation learning. EmbedKGQA [125] learns the representation of a multi-hop NLQ in the KG embedding space first by using RoBERTa (robustly optimized BERT pretraining), followed by fully connected linear layers with ReLU activation, and finally projecting onto the KG embedding space. DCRN [26] identifies informative evidence from candidate entities in a multi-hop question by using their semantic information, then finds answers by performing RNN encoder-decoder-based sequential reasoning following the graph structure on the retrieved evidence. LEGO [113] alternates between growing the query tree and the reasoning action in the KG embed-
ding space. BiNet [79] uses an encoder-decoder-based model that transforms multi-hop NLQs into relation paths, and jointly addresses knowledge graph completion and KGQA tasks. KGT5 [124] employs an encoder-decoder Transformer model, which is first pretrained on the KG using the link prediction task and then fine-tuned for complex question answering.

3.4 Conversational QA on KG

Conversational QAs are extensions to one-shot NLQs, involving a sequence of questions and answers that appear as a dialogue between the system and the user [119, 111]. Conversational QA systems involve a dialog manager and a response generator to keep track of the dialog history and to generate natural language responses, respectively. Sequence-to-sequence and pre-trained language models are used for these tasks.

Knowledge grounded neural conversation models generate more informative responses. To understand the context of follow-up questions, commonsense KG-based context expansion is useful [186]. DyKgChat [147] zero-shot adapts to dynamically updated knowledge graphs during conversation. HiTKG [97] proposes a hierarchical Transformer-based graph walker model, which learns both short-term and long-term conversation goals.

(1) Neo4j [92] provides a native graph database with the property graph data model and the Cypher query language. It also supports Apache TinkerPop (http://tinkerpop.apache.org/), acting as a connectivity layer to use Gremlin. Neo4j supports several graph analytic tools (e.g., Popoto.js, Neo4j Bloom) that assist in interactive, visual query building and suggestion. Neo4j’s graph data science library implements three graph embedding methods (FastRP, GraphSAGE, and Node2Vec), node classification and regression, and link prediction.

(2) Microsoft Cosmos DB for Gremlin (graph) (https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/introduction) is a hybrid graph DB service on top of Microsoft’s NoSQL Azure Cosmos DB. It follows the Apache TinkerPop specification using Gremlin as the query language. The graph data can be visualized and explored via third-party tools, e.g., Graphlytic, Graphistry, Linkurious.

(3) Virtuoso (https://virtuoso.openlinksw.com/) is a hybrid database which stores KGs as RDF triples and provides a SPARQL endpoint. Besides Virtuoso faceted browsing, third-party tools (e.g., LodLive [27]) exist to visualize and explore RDF data from SPARQL endpoints.

(4) ArangoDB (https://www.arangodb.com/docs/stable/), which is a document-based hybrid graph DB, provides a declara-
• Interaction between neural and classic approaches.
tive query language AQL (ArangoDB Query Language).
We identify scenarios where neural and classic KG query
It supports Apache TinkerPop Gremlin. ArangoDB has
processing and KG-QA can be complementary to each
an in-built graph viewer, additionally it supports third-
other. First, neural semantic parsing translates NLQs
party tools (e.g., Cytoscape) for visualization and analy-
into structured queries, e.g., SPARQL queries or sub-
sis. ArangoDB’s graph ML tools provide several graph
graph patterns, and classic approaches can be applied to
embedding methods (e.g., GraphSage, Metapath2Vec,
evaluate them. Classic approaches identify intermediate
GAT, DMGI) over both homogeneous and heterogeneous
results that help interpreting each step in query process-
networks (https://github.com/arangoml/fastgraphml).
ing. Second, neural approaches can also assist in in-
teractive, exploration-based query processing by auto- (5) OrientDB (https://orientdb.com), a document-based na-
mated query suggestion and completion, incorporating tive graph DB, offers SQL extension for graph queries,
user’s feedback, and providing personalized results. and supports Gremlin. OrientDB studio visualizes graphs
and schema.
4 Graph Databases Support for KG Query (6) JanusGraph (http://janusgraph.org) uses a number of
We analyze the top-10 commercial graph DBMS ac- wide-column stores as backends, e.g., Apache Cassan-
cording to https://db-engines.com/en/ranking/graph+dbms (accessed dra, HBase, Google Cloud Bigtable, Oracle BerkeleyDB,
on December 30, 2022), which ranks commercial database man-
ScyllaDB, etc. It supports Apache TinkerPop Grem-
agement systems based on their popularity. In the past, lin. To visualize graphs stored in JanusGraph, one can
graph databases were benchmarked in regards to their use third-party tools, e.g., Cytoscape, Gephi plugin for
performance, database systems offerings, data organi- Apache TinkerPop, Graphexp KeyLines by Cambridge
zation techniques, queries, etc. [76, 145, 15, 8]. Dif- Intelligence, Linkurious, etc.
ferent from them and following our taxonomic discus- (7) Amazon Neptune (https://aws.amazon.com/neptune) is part
sion, we categorize which graph DBMS supports what of Amazon Web Services (AWS), supporting both RDF
data models, query languages, user-friendly and inter- and property graph models, as well as Gremlin, open-
active interfaces. Given the popularity of deep learning Cypher, and SPARQL query languages. The query re-
and KG embedding that are critical for incomplete or sults can be interactively visualized using Neptune Work-
multimodal KG querying, we also investigate if these bench. Neptune uses GNN methods and the Deep Graph
graph databases support graph embedding and multi- Library (DGL) to support a number of graph ML tasks,
modal KG-QA. Our findings are summarized in Table 1. including node and edge classification, regression, link
prediction, graph embedding (R-GCN), and KG embedding (TransE, DistMult, RotatE).

(8) GraphDB (https://www.ontotext.com/products/graphdb) is an RDF database using SPARQL query language. It supports faceted search and third-party tools, such as metaphactory, for interactive visualization.

(9) TigerGraph [37] is a native graph database with property graph data model and GSQL language. TigerGraph GraphStudio provides a graphical interface for interactive visualization and exploration. TigerGraph's ML Workbench is a Jupyter-based Python development framework that is inter-operable with popular deep learning frameworks such as PyTorch Geometric and DGL, and supports graph embedding (Node2Vec, Fast Random Projection, and Weisfeiler-Lehman).

(10) FaunaDB (https://fauna.com) is a document-relational database with property graph model and GraphQL API.

Table 1: Categorization of top-10 commercial graph DBMS based on KG data models, query languages, user-friendly and interactive interfaces, support for graph embedding and multimodal KG-QA. PG: property graph, RDF: RDF triples.

graph DBMS | KG data models | query languages | user-friendly & interactive interfaces | graph embedding | multimodal KG-QA
Neo4j | PG | Cypher, Gremlin | Popoto.js: create interactive visual query; Neo4j Bloom: write patterns similar to NLQs | ✓ | ✗
Microsoft Cosmos DB | PG | Gremlin | 3rd-party data visualization tools (e.g., Graphlytic, Graphistry, Linkurious) | ✗ | ✗
Virtuoso | RDF | SPARQL | faceted browsing, 3rd-party tools (e.g., LodLive) | ✗ | ✗
ArangoDB | PG | AQL, Gremlin | graph viewer, 3rd-party tools (e.g., Cytoscape) | ✓ | ✗
OrientDB | PG | SQL-like, Gremlin | OrientDB studio: visualize graphs and schema | ✗ | ✗
JanusGraph | PG | Gremlin | 3rd-party tools (e.g., Cytoscape) | ✗ | ✗
Amazon Neptune | PG, RDF | Gremlin, SPARQL | Neptune Workbench | ✓ | ✗
GraphDB | RDF | SPARQL | faceted search, 3rd-party tools (e.g., metaphactory) | ✗ | ✗
TigerGraph | PG | GSQL | TigerGraph GraphStudio | ✓ | ✗
FaunaDB | PG | GraphQL | ✗ | ✗ | ✗

Summary. The top-10 commercial graph databases support various languages for querying KGs, stored either as RDF triples or as property graphs. Besides, many of them also provide interactive interfaces for visualization, querying, and exploration of KGs. Their support for in-built ML-based KG querying is limited. Only Amazon Neptune provides a few popular KG embedding methods such as TransE, DistMult, and RotatE. While many of these graph databases (e.g., AllegroGraph, ArangoDB, OrientDB) are multi-model, supporting multiple data models against a single backend, none of them has an in-house system for storage and querying of multi-modal data, such as KGs with text, images, and multimedia.

5 Future Directions

Knowledge graphs can support a holistic integration solution for multi-modal data arriving from heterogeneous sources. For instance, nodes and edges in a KG can have an arbitrary number of properties of different types, e.g., tabular, key-value pairs, text, images, and multimedia. Therefore, KGs can be a unified data model for complex data lake problems, to model cross-domain and diverse data. We conclude with a discussion about future work on KG querying.

• Vector data management and querying. With the prevalence of KG embedding based query processing, managing and querying of vector data is critical. The data management community can contribute in this domain with high-dimensional data indexing, join, querying, and geometric data processing.

• Scalable embedding learning. Scaling knowledge graph embedding is challenging [88, 74, 183]. The problem gets exacerbated when combined with more complex data, such as KG+query embedding and multi-modal KG embedding. Advanced techniques are required for scalable embedding learning of multi-modal KGs, e.g., with language models, and conversational KG-QA with sequence-to-sequence models.

• Graph databases support for KG embedding. Current graph DBMS support for ML-based KG querying is limited. In the future, they can incorporate more KG embedding models, vector data management and query processing techniques, as well as enable multi-modal KG storage and querying and more interactive means of KG querying, such as NLQs and dialogues.

• Usability of KG querying methods. Besides SPARQL, a number of KG querying approaches exist, e.g., query languages, keyword search, query-by-example, faceted search, visual query, natural language questions, and conversational QA. It would be interesting to holistically compare them, understand their user-friendliness, and categorize what is applicable in which domains.

• Suitability of KG embedding models. A number of KG embedding models exist, such as translation-based models (TransE, TransD, TransH) and semantic matching models (RESCAL, DistMult, ConvE). Different models preserve various types of relation properties, e.g., symmetry, antisymmetry, inversion, composition, complex mapping properties, etc. [140]. One can analyze which properties are important for what queries, leading to a realization of which KG embedding models are suitable for different KGs and queries.
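As a small illustration of why relation properties matter when choosing an embedding model, the following sketch contrasts the standard TransE and DistMult scoring functions on a toy triple. The vectors and the helper names (`transe_score`, `distmult_score`) are made up for this example and are not taken from any system discussed in this survey; it is a sketch under these assumptions rather than a reference implementation.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: negative Euclidean distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    """DistMult plausibility: trilinear product sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

# Toy embeddings (hypothetical values, chosen only for illustration).
h = [0.2, -0.5, 1.0]
r = [0.3, 0.1, -0.4]
t = [hi + ri for hi, ri in zip(h, r)]  # place t exactly at h + r

# TransE: the triple (h, r, t) fits perfectly, the reversed triple does not.
forward = transe_score(h, r, t)
backward = transe_score(t, r, h)
r_norm = math.sqrt(sum(ri ** 2 for ri in r))

# DistMult: swapping head and tail leaves the score unchanged.
sym_gap = distmult_score(h, r, t) - distmult_score(t, r, h)

print("TransE forward score:", forward)
print("TransE backward score:", backward, "vs. -2 * ||r|| =", -2 * r_norm)
print("DistMult score gap under head/tail swap:", sym_gap)
```

With these toy values, TransE penalizes the reversed triple by twice the norm of the relation vector, so a symmetric relation can only be fitted by pushing that vector toward zero. DistMult scores both directions identically and therefore cannot distinguish the two directions of an antisymmetric relation. This is the kind of property-versus-query analysis that the bullet on the suitability of KG embedding models suggests.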
• Explainability, interoperability, and multi-lingual KG querying. There is an increasing focus on interpretability of deep learning methods over graph-structured data. In this context, explainability in knowledge graph embeddings is also important, for instance, what is being learned in knowledge graph embedding and KG-QA with explanatory evidence. Interoperability between KGs and supporting multi-lingual KGs [106] and queries are other interesting future directions.

6 Acknowledgement

Arijit Khan acknowledges support from the Novo Nordisk Foundation grant NNF22OC0072415.

7 References

[1] D. J. Abadi, A. Marcus, S. Madden, and K. Hollenbach. Sw-store: a vertically partitioned DBMS for semantic web data management. VLDB J., 18(2):385–406, 2009.
[2] I. Abdelaziz, R. Harbi, Z. Khayyat, and P. Kalnis. A survey and experimental comparison of distributed SPARQL engines for very large RDF data. PVLDB, 10(13):2049–2060, 2017.
[3] B. Abu-Salih. Domain-specific knowledge graphs: a survey. J. Netw. Comput. Appl., 185:103076, 2021.
[4] A. Abujabal, R. S. Roy, M. Yahya, and G. Weikum. Never-ending learning for open-domain question answering over knowledge bases. In WWW, 2018.
[5] M. Ali, M. Berrendorf, C. T. Hoyt, L. Vermue, M. Galkin, S. Sharifzadeh, A. Fischer, V. Tresp, and J. Lehmann. Bringing light into the dark: a large-scale evaluation of knowledge graph embedding models under a unified framework. IEEE Trans. Pattern Anal. Mach. Intell., 44(12):8825–8845, 2022.
[6] W. Ali, M. Saleem, B. Yao, A. Hogan, and A. N. Ngomo. A survey of RDF stores & SPARQL engines for querying knowledge graphs. VLDB J., 31(3):1–26, 2022.
[7] R. Angles, M. Arenas, P. Barceló, P. A. Boncz, G. H. L. Fletcher, C. Gutierrez, T. Lindaaker, M. Paradies, S. Plantikow, J. F. Sequeda, O. van Rest, and H. Voigt. G-CORE: a core for future graph query languages. In SIGMOD, 2018.
[8] R. Angles, M. Arenas, P. Barceló, A. Hogan, J. L. Reutter, and D. Vrgoc. Foundations of modern query languages for graph databases. ACM Comput. Surv., 50(5):68:1–68:40, 2017.
[9] E. Arakelyan, D. Daza, P. Minervini, and M. Cochez. Complex query answering with neural link predictors. In ICLR, 2021.
[10] A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. CoRR, abs/1711.03860, 2017.
[11] I. Balazevic, C. Allen, and T. M. Hospedales. Tucker: tensor factorization for knowledge graph completion. In EMNLP-IJCNLP, 2019.
[12] K. Balog and T. Kenter. Personal knowledge graphs: a research agenda. In SIGIR, 2019.
[13] L. Bellomarini, E. Sallinger, and G. Gottlob. The vadalog system: datalog-based reasoning for knowledge graphs. PVLDB, 11(9):975–987, 2018.
[14] J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic parsing on freebase from question-answer pairs. In EMNLP, 2013.
[15] M. Besta, E. Peter, R. Gerstenberger, M. Fischer, M. Podstawski, C. Barthels, G. Alonso, and T. Hoefler. Demystifying graph databases: analysis and taxonomy of data organization, system designs, and graph queries. CoRR, 2019.
[16] S. S. Bhowmick and B. Choi. Data-driven visual query interfaces for graphs: past, present, and (near) future. In SIGMOD, 2022.
[17] S. S. Bhowmick, B. Choi, and C. Li. Graph querying meets HCI: state of the art and future directions. In SIGMOD, 2017.
[18] K. D. Bollacker, C. Evans, P. K. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008.
[19] A. Bonifati, R. Ciucanu, and A. Lemay. Learning path queries on graph databases. In EDBT, 2015.
[20] A. Bonifati, G. H. L. Fletcher, H. Voigt, and N. Yakovets. Querying graphs. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2018.
[21] A. Bonifati, W. Martens, and T. Timm. An analytical study of large SPARQL query logs. VLDB J., 29(2-3):655–679, 2020.
[22] A. Bordes, N. Usunier, S. Chopra, and J. Weston. Large-scale simple question answering with memory networks. CoRR, abs/1506.02075, 2015.
[23] A. Bordes, N. Usunier, A. García-Durán, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In NeurIPS, 2013.
[24] S. Borgwardt, İ. İ. Ceylan, and T. Lukasiewicz. Recent advances in querying probabilistic knowledge bases. In IJCAI, 2018.
[25] S. Bouhenni, S. Yahiaoui, N. Nouali-Taboudjemat, and H. Kheddouci. A survey on distributed graph pattern matching in massive graphs. ACM Comput. Surv., 54(2):36:1–36:35, 2022.
[26] J. Cai, Z. Zhang, F. Wu, and J. Wang. Deep cognitive reasoning network for multi-hop question answering over knowledge graphs. In ACL/IJCNLP, 2021.
[27] D. V. Camarda, S. Mazzini, and A. Antonuccio. Lodlive, exploring the web of data. In Semantic Systems, 2012.
[28] N. Chakraborty, D. Lukovnikov, G. Maheshwari, P. Trivedi, J. Lehmann, and A. Fischer. Introduction to neural network-based question answering over knowledge graphs. WIREs Data Mining Knowl. Discov., 11(3), 2021.
[29] M. Chen, Y. Tian, K. Chang, S. Skiena, and C. Zaniolo. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In IJCAI, 2018.
[30] X. Chen, Z. Hu, and Y. Sun. Fuzzy logic based logical query answering on knowledge graphs. In AAAI, 2022.
[31] E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An efficient sql-based RDF querying scheme. In VLDB, 2005.
[32] P. Christmann, R. S. Roy, A. Abujabal, J. Singh, and G. Weikum. Look before you hop: conversational question answering over knowledge graphs using judicious context expansion. In CIKM, 2019.
[33] F. Darari, R. E. Prasojo, and W. Nutt. CORNER: a completeness reasoner for SPARQL queries over RDF data sources. In The Semantic Web: ESWC Satellite Events, 2014.
[34] T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel. Convolutional 2d knowledge graph embeddings. In AAAI, 2018.
[35] A. Deutsch, N. Francis, A. Green, K. Hare, B. Li, L. Libkin, T. Lindaaker, V. Marsault, W. Martens, J. Michels, F. Murlak, S. Plantikow, P. Selmer, O. van Rest, H. Voigt, D. Vrgoc, M. Wu, and F. Zemke. Graph pattern matching in GQL and SQL/PGQ. In SIGMOD, 2022.
[36] A. Deutsch, Y. Xu, and M. Wu. Seamless syntactic and semantic integration of query primitives over relational and graph data in gsql, 2018.
[37] A. Deutsch, Y. Xu, M. Wu, and V. E. Lee. Tigergraph: a native MPP graph database. CoRR, abs/1901.08248, 2019.
[38] B. Ding, Q. Wang, B. Wang, and L. Guo. Improving knowledge graph embedding using simple constraints. In ACL, 2018.
[39] Y. Ding, J. Yu, B. Liu, Y. Hu, M. Cui, and Q. Wu. Mukea: multimodal knowledge extraction and accumulation for knowledge-based visual question answering. In CVPR, 2022.
[40] H. Du, Z. Le, H. Wang, Y. Chen, and J. Yu. COKG-QA: multi-hop question answering over COVID-19 knowledge graphs. Data Intell., 4(3):471–492, 2022.
[41] K. Elbedweihy, S. N. Wrigley, and F. Ciravegna. Evaluating semantic search query approaches with expert and casual users. In ISWC, 2012.
[42] W. Fan, J. Li, S. Ma, N. Tang, and Y. Wu. Adding regular expressions to graph reachability and pattern queries. In ICDE, 2011.
[43] W. Fan, J. Li, S. Ma, H. Wang, and Y. Wu. Graph homomorphism revisited for graph matching. PVLDB, 3(1):1161–1172, 2010.
[44] W. Fan, X. Wang, and Y. Wu. Answering graph pattern queries using views. In ICDE, 2014.
[45] Q. Fang, X. Zhang, J. Hu, X. Wu, and C. Xu. Contrastive multi-modal knowledge graph representation learning. IEEE Trans. Knowl. Data Eng., 2022.
[46] N. Francis, A. Green, P. Guagliardo, L. Libkin, T. Lindaaker, V. Marsault, S. Plantikow, M. Rydberg, P. Selmer, and A. Taylor. Cypher: an evolving query language for property graphs. In SIGMOD, 2018.
[47] M. Galkin, Z. Zhu, H. Ren, and J. Tang. Inductive logical query answering in knowledge graphs. In NeurIPS, 2022.
[48] M. R. Garey and D. S. Johnson. Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman, 1979.
[49] G. A. Gesese, R. Biswas, M. Alam, and H. Sack. A survey on knowledge graph embeddings with literals: Which model links better literal-ly? Semantic Web, 12(4):617–647, 2021.
[50] Y. Guo, Z. Pan, and J. Heflin. Lubm: a benchmark for owl knowledge base systems. J. Web Semant., 3(2-3):158–182, 2005.
[51] W. L. Hamilton, P. Bajaj, M. Zitnik, D. Jurafsky, and J. Leskovec. Embedding logical queries on knowledge graphs. In NeurIPS, 2018.
[52] L. Han, T. Finin, and A. Joshi. Gorelations: an intuitive query system for dbpedia. In JIST, 2011.
[53] W.-S. Han, J. Lee, M.-D. Pham, and J. X. Yu. iGraph: a framework for comparisons of disk-based graph indexing techniques. PVLDB, 3(1–2):449–459, 2010.
[54] Z. Hu, Y. Xu, W. Yu, S. Wang, Z. Yang, C. Zhu, K. Chang, and Y. Sun. Empowering language models with knowledge graph reasoning for open-domain question answering. In EMNLP, 2022.
[55] X. Huang, J. Zhang, D. Li, and P. Li. Knowledge graph embedding based question answering. In WSDM, 2019.
[56] Z. Huang, M. Chiang, and W. Lee. Line: logical query reasoning over hierarchical knowledge graphs. In KDD, 2022.
[57] D. Ibragimov, K. Hose, T. B. Pedersen, and E. Zimányi. Optimizing aggregate SPARQL queries using materialized RDF views. In ISWC, 2016.
[58] F. Ilievski, P. A. Szekely, and B. Zhang. CSKG: the commonsense knowledge graph. In ESWC, 2021.
[59] M. S. Islam, C. Liu, and J. Li. Efficient answering of why-not questions in similar graph matching. IEEE Trans. Knowl. Data Eng., 27(10):2672–2686, 2015.
[60] N. Jayaram, A. Khan, C. Li, X. Yan, and R. Elmasri. Querying knowledge graphs by example entity tuples. IEEE Trans. Knowl. Data Eng., 27(10):2797–2811, 2015.
[61] G. Ji, S. He, L. Xu, K. Liu, and J. Zhao. Knowledge graph embedding via dynamic mapping matrix. In ACL, 2015.
[62] X. Jin, Z. Yang, X. Lin, S. Yang, L. Qin, and Y. Peng. FAST: fpga-based subgraph matching on massive graphs. In ICDE, 2021.
[63] J. Kalo, L. Fichtel, P. Ehler, and W. Balke. Knowlybert - hybrid query answering over language models and knowledge graphs. In ISWC, 2020.
[64] Z. Kaoudi and I. Manolescu. RDF in the clouds: a survey. VLDB J., 24(1):67–91, 2015.
[65] F. Katsarou, N. Ntarmos, and P. Triantafillou. Performance and scalability of indexed subgraph query processing methods. Proc. VLDB Endow., 8(12):1566–1577, 2015.
[66] V. Kepuska and G. Bohouta. Next-generation of virtual personal assistants (microsoft cortana, apple siri, amazon alexa and google home). In CCWC, 2018.
[67] A. Khan and S. Elnikety. Systems for big-graphs. Proc. VLDB Endow., 7(13):1709–1710, 2014.
[68] A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao. Neighborhood based fast graph search in large networks. In SIGMOD, 2011.
[69] A. Khan, G. Segovia, and D. Kossmann. On smart query routing: for distributed graph querying with decoupled storage. In USENIX ATC, 2018.
[70] A. Khan, Y. Wu, C. C. Aggarwal, and X. Yan. Nema: fast graph search with label similarity. PVLDB, 6(3):181–192, 2013.
[71] A. Khan, Y. Ye, and L. Chen. On uncertain graphs. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2018.
[72] J. Lee, W. Han, R. Kasperovics, and J. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. Proc. VLDB Endow., 6(2):133–144, 2012.
[73] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. v. Kleef, S. Auer, and C. Bizer. Dbpedia - a large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web, 6(2):167–195, 2015.
[74] A. Lerer, L. Wu, J. Shen, T. Lacroix, L. Wehrstedt, A. Bose, and A. Peysakhovich. Pytorch-biggraph: a large scale graph embedding system. In MLSys, 2019.
[75] Y. Li, T. Ge, and C. X. Chen. Online indices for predictive top-k entity and aggregate queries on knowledge graphs. In ICDE, 2020.
[76] M. Lissandrini, M. Brugnara, and Y. Velegrakis. Beyond macrobenchmarks: microbenchmark-based graph database evaluation. PVLDB, 12(4):390–403, 2018.
[77] M. Lissandrini, D. Mottin, T. Palpanas, and Y. Velegrakis. Graph-query suggestions for knowledge graph exploration. In WWW, 2020.
[78] L. Liu, B. Du, H. Ji, C. Zhai, and H. Tong. Neural-answering logical queries on knowledge graphs. In KDD, 2021.
[79] L. Liu, B. Du, J. Xu, Y. Xia, and H. Tong. Joint knowledge graph completion and question answering. In KDD, 2022.
[80] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, and P. Wang. K-BERT: enabling language representation with knowledge graph. In AAAI, 2020.
[81] X. Liu, S. Zhao, K. Su, Y. Cen, J. Qiu, M. Zhang, W. Wu, Y. Dong, and J. Tang. Mask and reason: pre-training knowledge graph transformers for complex logical queries. In KDD, 2022.
[82] X. Long, L. Zhuang, L. Aodi, S. Wang, and H. Li. Neural-based mixture probabilistic query embedding for answering FOL queries on knowledge graphs. In EMNLP, 2022.
[83] H. Ma, M. A. Langouri, Y. Wu, F. Chiang, and J. Pi. Ontology-based entity matching in attributed graphs. PVLDB, 12(10):1195–1207, 2019.
[84] S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo. Strong simulation: capturing topology in graph pattern matching. ACM Trans. Database Syst., 39(1):4:1–4:46, 2014.
[85] A. Mhedhbi and S. Salihoglu. Modern techniques for querying graph-structured relations: foundations, system implementations, and open challenges. PVLDB, 15(12):3762–3765, 2022.
[86] Microsoft. Sql graph architecture. https://learn.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-architecture?view=sql-server-ver16, 2022.
[87] T. M. Mitchell, W. W. Cohen, E. R. H. Jr., P. P. Talukdar, B. Yang, J. Betteridge, A. Carlson, B. D. Mishra, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. A. Platanios, A. Ritter, M. Samadi, B. Settles, R. C. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. Never-ending learning. Commun. ACM, 61(5):103–115, 2018.
[88] J. Mohoney, R. Waleffe, H. Xu, T. Rekatsinas, and
S. Venkataraman. Marius: learning massive graph embeddings on a single machine. In OSDI, 2021.
[89] D. Mottin, M. Lissandrini, Y. Velegrakis, and T. Palpanas. Exemplar queries: give me an example of what you need. PVLDB, 7(5):365–376, 2014.
[90] D. Nathani, J. Chauhan, C. Sharma, and M. Kaul. Learning attention-based embeddings for relation prediction in knowledge graphs. In ACL, 2019.
[91] R. Navigli and S. P. Ponzetto. Babelnet: building a very large multilingual semantic network. In ACL, 2010.
[92] Neo4J. Why graph databases? https://neo4j.com/why-graph-databases/, 2016.
[93] T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In SIGMOD, 2009.
[94] T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. VLDB J., 19(1):91–113, 2010.
[95] H. Q. Ngo. Worst-case optimal join algorithms: techniques, results, and open problems. In PODS, 2018.
[96] D. Q. Nguyen, T. D. Nguyen, D. Q. Nguyen, and D. Q. Phung. A novel embedding model for knowledge base completion based on convolutional neural network. In NAACL-HLT, 2018.
[97] J. Ni, V. Pandelea, T. Young, H. Zhou, and E. Cambria. Hitkg: towards goal-oriented conversations via multi-hierarchy learning. In AAAI, 2022.
[98] M. Nickel, V. Tresp, and H. Kriegel. A three-way model for collective learning on multi-relational data. In ICML, 2011.
[99] M. Nickel, V. Tresp, and H. Kriegel. Factorizing YAGO: scalable machine learning for linked data. In WWW, 2012.
[100] C. Nikolaou and M. Koubarakis. Querying incomplete information in RDF with SPARQL. Artif. Intell., 237:138–171, 2016.
[101] N. F. Noy, Y. Gao, A. Jain, A. Narayanan, A. Patterson, and J. Taylor. Industry-scale knowledge graphs: lessons and challenges. Commun. ACM, 62(8):36–43, 2019.
[102] Oracle. Pgql 1.5 specification. https://pgql-lang.org/spec/1.5/, 2022.
[103] A. Pacaci and M. T. Özsu. Experimental analysis of streaming algorithms for graph partitioning. In SIGMOD, 2019.
[104] X. Pan, T. Ye, D. Han, S. Song, and G. Huang. Contrastive language-image pre-training with knowledge graphs. In NeurIPS, 2022.
[105] N. Papailiou, D. Tsoumakos, P. Karras, and N. Koziris. Graph-aware, workload-adaptive SPARQL query caching. In SIGMOD, 2015.
[106] S. Pei, L. Yu, G. Yu, and X. Zhang. Rea: robust cross-lingual entity alignment between knowledge graphs. In KDD, 2020.
[107] P. Peng, Q. Ge, L. Zou, M. T. Özsu, Z. Xu, and D. Zhao. Optimizing multi-query evaluation in federated RDF systems. IEEE Trans. Knowl. Data Eng., 33(4):1692–1707, 2021.
[108] P. Pezeshkpour, L. Chen, and S. Singh. Embedding multimodal relational data for knowledge base completion. In EMNLP, 2018.
[109] D. L. Phuoc, H. N. M. Quoc, Q. H. Ngo, T. T. Nhat, and M. Hauswirth. The graph of things: a step towards the live knowledge graph of connected things. J. Web Semant., 37-38:25–35, 2016.
[110] A. Poggi, D. Lembo, D. Calvanese, G. D. Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. Data Semant., 10:133–173, 2008.
[111] A. Quamar, V. Efthymiou, C. Lei, and F. Özcan. Natural language interfaces to data. Found. Trends Databases, 11(4):319–414, 2022.
[112] E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. VLDB J., 10(4):334–350, 2001.
[113] H. Ren, H. Dai, B. Dai, X. Chen, M. Yasunaga, H. Sun, D. Schuurmans, J. Leskovec, and D. Zhou. LEGO: latent execution-guided reasoning for multi-hop question answering on knowledge graphs. In ICML, 2021.
[114] H. Ren, H. Dai, B. Dai, X. Chen, D. Zhou, J. Leskovec, and D. Schuurmans. SMORE: knowledge graph completion and multi-hop reasoning in massive knowledge graphs. In KDD, 2022.
[115] H. Ren, W. Hu, and J. Leskovec. Query2box: reasoning over knowledge graphs in vector space using box embeddings. In ICLR, 2020.
[116] H. Ren and J. Leskovec. Beta embeddings for multi-hop logical reasoning in knowledge graphs. In NeurIPS, 2020.
[117] X. Ren and J. Wang. Multi-query optimization for subgraph isomorphism search. PVLDB, 10(3):121–132, 2016.
[118] M. A. Rodriguez. The gremlin graph traversal machine and language (invited talk). In DBPL, 2015.
[119] R. S. Roy and A. Anand. Question answering over curated and open web sources. In SIGIR, 2020.
[120] T. Sagi, M. Lissandrini, T. B. Pedersen, and K. Hose. A design space for RDF data representations. VLDB J., 31(2):347–373, 2022.
[121] S. Sahu, A. Mhedhbi, S. Salihoglu, J. Lin, and M. T. Özsu. The ubiquity of large graphs and surprising challenges of graph processing: extended survey. VLDB J., 29(2-3):595–618, 2020.
[122] S. Sakr, S. Elnikety, and Y. He. G-SPARQL: a hybrid engine for querying large attributed graphs. In CIKM, 2012.
[123] M. Sarwat, S. Elnikety, Y. He, and M. F. Mokbel. Horton+: a distributed system for processing declarative reachability queries over partitioned graphs. PVLDB, 6(14):1918–1929, 2013.
[124] A. Saxena, A. Kochsiek, and R. Gemulla. Sequence-to-sequence knowledge graph completion and question answering. In ACL, 2022.
[125] A. Saxena, A. Tripathi, and P. P. Talukdar. Improving multi-hop question answering over knowledge graphs using knowledge base embeddings. In ACL, 2020.
[126] M. S. Schlichtkrull, T. N. Kipf, P. Bloem, R. v. d. Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In ESWC, 2018.
[127] M. Schmidt, T. Hornung, G. Lausen, and C. Pinkel. Sp2bench: a sparql performance benchmark. In ICDE, 2009.
[128] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. S. Yu. A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng., 29(1):17–37, 2017.
[129] A. Singhal. Introducing the knowledge graph: things, not strings. https://blog.google/products/search/introducing-knowledge-graph-things-not/, 2012.
[130] C. Sommer. Shortest-path queries in static networks. ACM Comput. Surv., 46(4):45:1–45:31, 2014.
[131] Q. Song, M. H. Namaki, P. Lin, and Y. Wu. Answering why-questions for subgraph queries. IEEE Trans. Knowl. Data Eng., 34(10):4636–4649, 2022.
[132] R. Speer and C. Havasi. Representing general relational knowledge in ConceptNet 5. In LREC, 2012.
[133] A. Styperek, M. Ciesielczyk, A. Szwabe, and P. Misiorek. Evaluation of sparql-compliant semantic search user interfaces. Vietnam. J. Comput. Sci., 2(3):191–199, 2015.
[134] Y. Su, S. Yang, H. Sun, M. Srivatsa, S. Kase, M. Vanni, and X. Yan. Exploiting relevance feedback in knowledge graph search. In KDD, 2015.
[135] F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In WWW, 2007.
[136] H. Sun, A. O. Arnold, T. Bedrax-Weiss, F. Pereira, and W. W. Cohen. Faithful embeddings for knowledge base queries. In NeurIPS, 2020.
[137] R. Sun, X. Cao, Y. Zhao, J. Wan, K. Zhou, F. Zhang, Z. Wang, and K. Zheng. Multi-modal knowledge graphs for recommender systems. In CIKM, 2020.
[138] S. Sun and Q. Luo. In-memory subgraph matching: an in-depth study. In SIGMOD, pages 1083–1098, 2020.
[139] W. Sun, A. Fokoue, K. Srinivas, A. Kementsietsidis, G. Hu, and G. T. Xie. Sqlgraph: an efficient relational-based property graph store. In SIGMOD, 2015.
[140] Z. Sun, Z. Deng, J. Nie, and J. Tang. Rotate: knowledge graph embedding by relational rotation in complex space. In ICLR, 2019.
[141] G. Szárnyas, A. Prat-Pérez, A. Averbuch, J. Marton, AAAI, 2016.
M. Paradies, M. Kaufmann, O. Erling, P. A. Boncz, [166] R. Xie, Z. Liu, H. Luan, and M. Sun. Image-embodied
V. Haprian, and J. B. Antal. An early look at the LDBC social knowledge representation learning. In IJCAI, 2017.
network benchmark’s business intelligence workload. In [167] D. Xu, C. Ruan, E. Körpeoglu, S. Kumar, and K. Achan.
GRADES and NDA, 2018. Product knowledge graph embedding for e-commerce. In
[142] A. Talmor and J. Berant. The web as a knowledge-base for WSDM, 2020.
answering complex questions. In NAACL-HLT, 2018. [168] B. Yang, W. Yih, X. He, J. Gao, and L. Deng. Embedding
[143] A. Tchechmedjiev, P. Fafalios, K. Boland, M. Gasquet, entities and relations for learning and inference in knowledge
M. Zloch, B. Zapilko, S. Dietze, and K. Todorov. Claimskg: a bases. In ICLR, 2015.
knowledge graph of fact-checked claims. In ISWC, 2019. [169] D. Yang, P. Qing, Y. Li, H. Lu, and X. Lin. Gammae: gamma
[144] A. Termehchy, M. Winslett, Y. Chodpathumwan, and embeddings for logical queries on knowledge graphs. In
A. Gibbons. Design independent query interfaces. IEEE Trans. EMNLP, 2022.
Knowl. Data Eng., 24(10):1819–1832, 2012. [170] J. Yang, W. Yao, and W. Zhang. Keyword search on large
[145] Y. Tian. The world of graph databases from an industry graphs: a survey. Data Sci. Eng., 6(2):142–162, 2021.
perspective. SIGMOD Rec., 51(4):60–67, 2022. [171] S. Yang, Y. Wu, H. Sun, and X. Yan. Schemaless and
[146] P. Trivedi, G. Maheshwari, M. Dubey, and J. Lehmann. structureless graph querying. PVLDB, 7(7):565–576, 2014.
Lc-quad: a corpus for complex question answering over [172] Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. W. Cohen,
knowledge graphs. In ISWC, 2017. R. Salakhutdinov, and C. D. Manning. Hotpotqa: a dataset for
[147] Y. Tuan, Y. Chen, and H. Lee. Dykgchat: benchmarking diverse, explainable multi-hop question answering. In
dialogue generation grounding on dynamic knowledge graphs. EMNLP, 2018.
In EMNLP-IJCNLP, 2019. [173] J. Yao, B. Cui, L. Hua, and Y. Huang. Keyword query
[148] Y. Tzitzikas, N. Manolis, and P. Papadakos. Faceted reformulation on structured data. In ICDE, 2012.
exploration of RDF/S datasets: a survey. J. Intell. Inf. Syst., [174] M. Yasunaga, A. Bosselut, H. Ren, X. Zhang, C. D. Manning,
48(2):329–364, 2017. P. Liang, and J. Leskovec. Deep bidirectional
[149] S. Vashishth, S. Sanyal, V. Nitin, and P. P. Talukdar. language-knowledge graph pretraining. In NeurIPS, 2022.
Composition-based multi-relational graph convolutional [175] D. Yu, C. Zhu, Y. Yang, and M. Zeng. Jaket: joint pre-training
networks. In ICLR, 2020. of knowledge graph and language understanding. In AAAI,
[150] E. Vasilyeva, M. Thiele, C. Bornhövd, and W. Lehner. 2022.
Answering "why empty?" and "why so many?" queries in [176] M. Zaib, W. E. Zhang, Q. Z. Sheng, A. Mahmood, and
graph databases. J. Comput. Syst. Sci., 82(1):3–22, 2016. Y. Zhang. Conversational question answering: a survey.
[151] D. Vrandečić and M. Krötzsch. Wikidata: a free collaborative Knowl. Inf. Syst., 64(12):3151–3195, 2022.
knowledgebase. Commun. ACM, 57(10):78–85, 2014. [177] L. Zeng, L. Zou, M. T. Özsu, L. Hu, and F. Zhang. GSI:
[152] L. Wang, W. Zhao, Z. Wei, and J. Liu. Simkgc: simple gpu-friendly subgraph isomorphism. In ICDE, 2020.
contrastive knowledge graph completion with pre-trained [178] M. Zhang, R. Dai, M. Dong, and T. He. Drlk: dynamic
language models. In ACL, 2022. hierarchical reasoning with language model and knowledge
[153] M. Wang, S. Wang, H. Yang, Z. Zhang, X. Chen, and G. Qi. Is graph for question answering. In EMNLP, 2022.
visual context really helpful for knowledge graph? a [179] W. Zhang, J. Chen, J. Li, Z. Xu, J. Z. Pan, and H. Chen.
representation learning perspective. In MM, 2021. Knowledge graph reasoning with logics and embeddings:
[154] X. Wang, T. Gao, Z. Zhu, Z. Zhang, Z. Liu, J. Li, and J. Tang. survey and perspective. CoRR, abs/2202.07412, 2022.
Kepler: a unified model for knowledge embedding and [180] W. Zhang, B. Paudel, L. Wang, J. Chen, H. Zhu, W. Zhang,
pre-trained language representation. Trans. Assoc. Comput. A. Bernstein, and H. Chen. Iteratively learning embeddings
Linguistics, 9:176–194, 2021. and rules for knowledge graph reasoning. In WWW, 2019.
[155] Y. Wang, A. Khan, T. Wu, J. Jin, and H. Yan. Semantic guided [181] X. Zhang, L. Chen, Y. Tong, and M. Wang. EAGRE: towards
and response times bounded top-k similarity search over scalable I/O efficient SPARQL query evaluation on the cloud.
knowledge graphs. In ICDE, 2020. In ICDE, 2013.
[156] Y. Wang, A. Khan, X. Xu, J. Jin, Q. Hong, and T. Fu. [182] Z. Zhang, J. Wang, J. Chen, S. Ji, and F. Wu. Cone: cone
Aggregate queries on knowledge graphs: fast approximation embeddings for multi-hop reasoning over knowledge graphs.
with semantic-aware sampling. In ICDE, 2022. In NeurIPS, 2021.
[157] Y. Wang, A. Khan, X. Xu, S. Ye, S. Pan, and Y. Zhou. [183] D. Zheng, X. Song, C. Ma, Z. Tan, Z. Ye, J. Dong, H. Xiong,
Approximate and interactive processing of aggregate queries Z. Zhang, and G. Karypis. Dgl-ke: training knowledge graph
on knowledge graphs: a demonstration. In CIKM, 2022. embeddings at scale. In SIGIR, 2020.
[158] Y. Wang, Y. Li, J. Fan, C. Ye, and M. Chai. A survey of typical [184] W. Zheng, J. X. Yu, L. Zou, and H. Cheng. Question
attributed graph queries. World Wide Web, 24(1):297–346, answering over knowledge graphs: question understanding via
2021. template decomposition. Proc. VLDB Endow.,
[159] Z. Wang, L. Li, Q. Li, and D. Zeng. Multimodal data enhanced 11(11):1373–1386, 2018.
representation learning for knowledge graphs. In IJCNN, 2019. [185] W. Zheng, L. Zou, W. Peng, X. Yan, S. Song, and D. Zhao.
[160] Z. Wang, J. Zhang, J. Feng, and Z. Chen. Knowledge graph Semantic SPARQL similarity search over RDF knowledge
embedding by translating on hyperplanes. In AAAI, 2014. graphs. PVLDB, 9(11):840–851, 2016.
[161] G. Weikum, X. L. Dong, S. Razniewski, and F. M. Suchanek. [186] H. Zhou, T. Young, M. Huang, H. Zhao, J. Xu, and X. Zhu.
Machine knowledge: creation and curation of comprehensive Commonsense knowledge aware conversation generation with
knowledge bases. Found. Trends Databases, 10(2-4):108–490, graph attention. In IJCAI, 2018.
2021. [187] Q. Zhou, C. Wang, M. Xiong, H. Wang, and Y. Yu. SPARK:
[162] K. Wilkinson, C. Sayers, H. A. Kuno, and D. Reynolds. adapting keyword query to semantic search. In ISWC.
Efficient RDF storage and retrieval in jena2. In SWDB, 2003. Springer, 2007.
[163] P. T. Wood. Query languages for graph databases. SIGMOD [188] Z. Zhu, M. Galkin, Z. Zhang, and J. Tang. Neural-symbolic
Rec., 41(1):50–60, 2012. models for logical queries on knowledge graphs. In ICML,
[164] M. Wylot, M. Hauswirth, P. Cudré-Mauroux, and S. Sakr. Rdf 2022.
data storage and query processing schemes: a survey. ACM [189] L. Zou, R. Huang, H. Wang, J. X. Yu, W. He, and D. Zhao.
Comput. Surv., 51(4), 2018. Natural language question answering over RDF: a graph data
[165] R. Xie, Z. Liu, J. Jia, H. Luan, and M. Sun. Representation driven approach. In SIGMOD, 2014.
learning of knowledge graphs with entity descriptions. In
