Modeling and Management of Fuzzy Semantic RDF Data
Zongmin Ma
Guanfeng Li
Ruizhe Ma
Studies in Computational Intelligence
Volume 1057
Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes
new developments and advances in the various areas of computational
intelligence—quickly and with a high quality. The intent is to cover the theory,
applications, and design methods of computational intelligence, as embedded in
the fields of engineering, computer science, physics and life sciences, as well as
the methodologies behind them. The series contains monographs, lecture notes and
edited volumes in computational intelligence spanning the areas of neural networks,
connectionist systems, genetic algorithms, evolutionary computation, artificial
intelligence, cellular automata, self-organizing systems, soft computing, fuzzy
systems, and hybrid intelligent systems. Of particular value to both the contributors
and the readership are the short publication timeframe and the world-wide
distribution, which enable both wide and rapid dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Zongmin Ma · Guanfeng Li · Ruizhe Ma
Ruizhe Ma
Department of Computer Science
University of Massachusetts Lowell
Lowell, MA, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
In the era of big data, we have witnessed a tremendous increase in the amount of data
available. In this context, it has become crucial to develop a common framework
for massive data sharing across applications, enterprises, and communities. For this
purpose, data should be provided with semantic meaning (through metadata), which
enables machines to consume, understand, and reason about the structure and purpose
of data. The Resource Description Framework (RDF) recommended by W3C (World
Wide Web Consortium) has quickly gained popularity since its emergence and has
been the de-facto standard for semantic information representation and exchange.
Nowadays, the RDF metadata model is finding increasing usage in a wide range of
massive data management scenarios (e.g., knowledge graph). With the widespread
acceptance of RDF in diverse applications, a considerable amount of RDF data is being produced and becoming available.
RDF and related standards allow intelligent understanding and processing of big
data. This creates a new set of data processing requirements involving RDF, such as
the need to construct and manage RDF data. For the purpose of RDF construction,
various data resources, including the traditional databases, XML (Extensible Markup
Language) and JSON (JavaScript Object Notation) documents, texts, tabular data
such as CSV (comma-separated values) and TSV (tab-separated values), NoSQL
(not only SQL) databases and so on, have been used for automatically constructing
RDF models. RDF data management typically involves two primary technical issues:
scalable storage and efficient queries. For more effective queries, it is necessary to
index RDF data. All of these issues are closely related: indexing of RDF data builds on the RDF storage scheme, and efficient querying of RDF data is supported by the indexing structure. Efficient and scalable management of massive RDF data is
of increasing importance.
With the wide and in-depth utilization of RDF in diverse application domains,
particularities with information management in concrete applications emerge, which
can challenge the traditional RDF technologies. In data and knowledge intensive
applications, one of the challenges can be generalized as the need to deal with uncer-
tain information in RDF data management. In the real world, human knowledge and
natural language have a great deal of imprecision and vagueness. With the increasing
amount of RDF data that is becoming available, efficient and scalable management
of massive RDF data with uncertainty is of crucial importance.
Fuzzy set theory, which has been one of the key means of implementing machine
intelligence, has been used in a large number and a wide variety of applications.
In order to bridge the gap between human-understandable soft logic and machine-
readable hard logic, fuzzy logic cannot be ignored. Fuzzy logic has been intro-
duced into diverse data models for fuzzy data processing. The emergence of the big
data era has put essential requirements on dealing with both semantic and fuzzy
phenomena. Currently, research on fuzzy logic in RDF knowledge graphs is attracting increasing attention, but the results are still few and scattered.
This book goes into great depth concerning the fast-growing topic of technologies
and approaches to fuzzy RDF data modeling and management. This book covers the
representation of fuzzy RDF, the persistence of fuzzy RDF, and the query of fuzzy
RDF. Concerning the representation of fuzzy RDF, the multi-granularity fuzziness in the RDF graph and RDF schema is identified, and a set of algebraic opera-
tions is defined for the fuzzy RDF model. Concerning the persistence of fuzzy
RDF, several storage frameworks based on diverse database models are proposed; the traditional relational and object-oriented database models, as well as emerging NoSQL databases such as HBase and Neo4j, are covered.
Concerning the query of fuzzy RDF, the fuzzy graph pattern matching and the fuzzy
extension mechanism of the SPARQL (SPARQL Protocol and RDF Query Language) query language are investigated. Methods for exact pattern matching queries, approximate fuzzy RDF subgraph matching queries, and fuzzy quantified queries over fuzzy RDF graphs are proposed. In addition, an extension of the SPARQL language to query fuzzy RDF graphs is developed.
This book aims to provide a single record of current studies in the field of fuzzy
semantic data management with RDF. The objective of this book is to systematically
present the state-of-the-art information to researchers, practitioners, and graduate
students who need to intelligently deal with Big Data with uncertainty and, at the
same time, serve the data and knowledge engineering professionals faced with non-
traditional applications that make the application of conventional approaches difficult
or impossible. Researchers, graduate students, and information technology profes-
sionals interested in RDF and fuzzy data processing will find this book a starting
point and a reference for their study, research, and development.
We would like to acknowledge all of the researchers in the area of fuzzy data and knowledge engineering. Their publications, and many discussions with some of them, have profoundly influenced this book. The materials in this
book are the outgrowth of research conducted by the authors in recent years. The
initial research work was supported by the National Natural Science Foundation
of China (62176121, 62066038, 61772269, and 61370075). We are grateful for the
financial support from the National Natural Science Foundation of China through
several research grant funds. Additionally, the assistance and facilities of authors’
universities are deemed important and highly appreciated. Special thanks go to Janusz Kacprzyk, the series editor of Studies in Computational Intelligence, and Thomas
Chapter 1
RDF Data and Management
1.1 Introduction
Recent years have witnessed a tremendous increase in the amount of data available
on the Web (Hassanzadeh et al., 2012). At the same time, Web 2.0 applications
have introduced new forms of data and have radically changed the nature of the
modern Web. In these applications, the Web has been transformed from a publish-
only environment into a vibrant forum for information exchange (Hassanzadeh et al.,
2012). The main purpose of the Semantic Web, proposed by W3C founder Tim
Berners-Lee in his description of the future of the Web (Berners-Lee et al., 2001), is
to provide a common framework for data sharing across applications, enterprises, and
communities. By giving data semantic meaning (through metadata), this framework
enables machines to consume, understand, and reason about the structure and purpose
of data.
The core of the Semantic Web is built on the Resource Description Framework
(RDF) data model (Manola & Miller, 2004). RDF provides a flexible and concise
model for representing metadata of resources on the Web. RDF can represent struc-
tured as well as unstructured data and is quickly becoming the de facto standard
for representation and exchange of information1 (Duan et al., 2011). Nowadays, the
RDF data model is finding increasing use in a wide range of Web data-management
scenarios and its use is now wider than the Semantic Web. Governments (e.g. from
the United States2 and United Kingdom3 ) and large companies and organizations
(e.g. New York Times,4 BBC,5 and Best Buy6 ) have started using RDF as a business
1 http://www.w3.org/RDF.
2 http://www.data.gov/.
3 http://www.data.gov.uk/.
4 http://data.nytimes.com/
5 http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup2010_dynamic_sem.html.
6 http://www.chiefmartec.com/2009/12/best-buy-jump-starts-data-webmarketing.html.
data model and representation format, either for semantic data integration, search-
engine optimization, and better product search, or to represent data from information
extraction. Yago (Suchanek et al., 2008) and DBpedia (Bizer et al., 2009) extract facts
from Wikipedia automatically and store them in RDF format to support structural
queries over Wikipedia; biologists encode their experiments and results using RDF to
communicate among themselves leading to RDF data collections, such as Bio2RDF
(bio2rdf.org) and Uniprot RDF (dev.isb-sib.ch/projects/uniprot-rdf). Furthermore, in
the Linked Open Data (LOD) cloud (Bizer et al., 2009), Web data from a diverse set
of domains like Wikipedia, films, geographic locations, and scientific data are linked
to provide one large RDF data cloud. With the increasing amount of RDF data which
is becoming available, efficient and scalable management of RDF data is of crucial
importance.
As a new data model, the RDF data-representation format largely determines how
to store and index RDF data and furthermore influences how to query RDF data.
Management of RDF data typically involves two primary technical challenges: scal-
able storage and efficient queries. Among these two issues, RDF data storage provides
the infrastructure for RDF data management. Many proposals of RDF queries have
been developed based on diverse query policies (Ali et al., 2021), such as fuzzy queries (Ma et al., 2016a, 2016b, 2016c), approximate queries (Yan et al., 2017), keyword queries (Ma et al., 2018), natural language queries (Hu et al., 2017), and so on.
With the RDF format gaining widespread acceptance, much work is being done
in RDF data management, and a number of research efforts have been undertaken to
address these issues. Some RDF data-management systems have started to emerge
such as Sesame (Broekstra et al., 2002), Jena-TDB (Wilkinson et al., 2003), Virtuoso
(Erling & Mikhailov, 2007, 2009), 4store (Harris et al., 2009), BigOWLIM (Bishop
et al., 2011), SPARQLcity/SPARQLverse,7 MarkLogic,8 Clark & Parsia/Stardog,9
and Oracle Spatial and Graph with Oracle Database 12c.10 BigOWLIM was renamed
to OWLIM-SE and later on to GraphDB. In addition, some research prototypes have
been developed [e.g. RDF-3X (Neumann & Weikum, 2008, 2010); SW-Store (Abadi
et al., 2007, 2009), and RDFox11 ].
1.2 RDF Data Model

The purpose of the Semantic Web is to add semantic support to the existing Web, so that machines can understand the meaning of information and thus process Web information intelligently. This requires that machines be
7 http://sparqlcity.com/.
8 http://www.marklogic.com/.
9 http://clarkparsia.com/.
10 http://www.oracle.com/us/products/database/options/spatial/overview/index.html.
11 http://www.cs.ox.ac.uk/isg/tools/RDFox/.
provided with data describing Web data, that is, metadata. RDF came into being as a universal metadata model. RDF is a framework for metadata and the corner-
stone of the Semantic Web. It provides interoperability between applications using
machine-understandable Web data.
RDF is a W3C Recommendation that has rapidly gained popularity. RDF
provides a means of expressing and exchanging semantic metadata (i.e. data that
specify semantic information about data). By representing and processing metadata
about information sources, RDF defines a model for describing relationships among
resources in terms of uniquely identified attributes and values.
In the RDF data model, the universe is modelled as a set of resources, where
a resource is anything that has a universal resource identifier (URI), including all
information on the Web, virtual concepts, or things in the real world, such as movies,
screenwriters, directors, countries, etc. A resource can be described using a set of RDF statements in the form of (subject, predicate, object) triples. Here the subject is the resource being described, the predicate is the property being described with respect to the resource, and the object is the value of the property. An RDF data set consists of
these statements. For example, the natural language expression “The director of the
movie Dinner is Barry Levinson” can be expressed by RDF statements:
• A subject: http://www.example.org/director/BarryLevinson
• A predicate: http://www.example.org/dc/elements/direct
• and an object: http://www.example.org/film/Dinner
Here, URIs are used to identify the subject, predicate, and object of the statement.
Note that both subject and object can be anonymous objects, known as blank
nodes. RDF uses these triples to describe resources and attach additional semantic
information to the resources.
It is possible to annotate RDF data with semantic metadata using RDFS (RDF
Schema) or OWL, both of which are W3C standards. This annotation primarily
enables reasoning over the RDF data (called entailment), which we do not consider
in this book. However, as we will see below, it also impacts data organization in
some cases, and the metadata can be used for semantic query optimization. We
illustrate the fundamental concepts by simple examples using RDFS, which allows
the definition of classes and class hierarchies. RDFS has built-in class definitions—the more important ones being rdfs:Class and rdfs:subClassOf, which are used to define a class and a subclass, respectively. To specify that an individual resource is an element of a class, a special property, rdf:type, is used. For example, if we wanted to define a class called Movies and two subclasses ActionMovies and Dramas, this would be accomplished in the following way:
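The original listing is not preserved in this excerpt; a minimal sketch of such a definition, written here in Turtle syntax with an illustrative example.org namespace (the book may use RDF/XML instead), is:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://www.example.org/schema/> .

ex:Movies       rdf:type rdfs:Class .
ex:ActionMovies rdf:type rdfs:Class ;
                rdfs:subClassOf ex:Movies .
ex:Dramas       rdf:type rdfs:Class ;
                rdfs:subClassOf ex:Movies .

# An individual film is declared an instance of one of the classes via rdf:type.
<http://www.example.org/film/Dinner> rdf:type ex:Dramas .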
In this section, we introduce an abstract version of the RDF data model, which is both a fragment that follows the original specification faithfully and an abstract version suitable for formal analysis. The abstract syntax of the RDF model is a set of triples. Formally, an RDF triple is defined as (s, p, o) ∈ (U ∪ B) × U × (U ∪ L ∪ B), where U, B, and L are infinite sets of URIs, blank nodes, and RDF literals, respectively. In a
triple (s, p, o), s is called the subject, p the predicate (or property), and o the object.
The interpretation of a triple statement is that subject s has property p with value o.
Thus, an RDF triple can be seen as representing an atomic “fact” or a “claim”. Note
that any object in one triple, say oi in (si , pi , oi ), can play the role of a subject in
another triple, say (oi , pj , oj ). Therefore, RDF data is a directed, labelled graph data
format for representing Web resources.
There are many syntaxes available for writing RDF data and serializing RDF
data, such as N-Triples,12 RDF/XML,13 RDFa,14 JSON LD,15 Notation 3 (N3),16
Turtle17 and so on. This section mainly introduces three common RDF representation
methods: N-Triples, RDF/XML and graph-based representation grammar. Suppose
there are the following three statements in the RDF data:
Statement 1: Barry Levinson is the director of Dinner.
Statement 2: Barry Levinson’s age is 77.
Statement 3: Barry Levinson’s nationality is USA.
(a) N-Triples triples representation grammar
N-Triples aims to express RDF in a concise and intuitive syntax and to provide shortcuts to commonly used RDF constructs. This grammar is based on the definition of a statement. A statement consists of three parts: a subject, an attribute (predicate), and an object.
12 http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/#ntriples.
13 http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/.
14 https://www.w3.org/TR/rdfa-primer/.
15 https://json-ld.org/.
16 http://www.w3.org/TeamSubmission/n3/.
17 http://www.w3.org/TeamSubmission/turtle/.
(Figure: representation of the example RDF data, showing a director resource linked to the film Dinner by the direct property and to age and nationality values.)
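The listing for these statements is not preserved in this excerpt; a minimal N-Triples sketch, reusing the example.org URIs introduced above (the age and nationality property URIs are illustrative), is:

<http://www.example.org/director/BarryLevinson> <http://www.example.org/dc/elements/direct> <http://www.example.org/film/Dinner> .
<http://www.example.org/director/BarryLevinson> <http://www.example.org/dc/elements/age> "77" .
<http://www.example.org/director/BarryLevinson> <http://www.example.org/dc/elements/nationality> "USA" .

Each line is one triple of the form subject predicate object, terminated by a period.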
The RDF specification includes a set of reserved words, the RDFS vocabulary [RDF
Schema (Brickley & Guha, 2004)], which is designed to describe relationships
between resources and properties like attributes of resources (traditional attribute-
value pairs). Roughly speaking, this vocabulary can be conceptually divided into the
following groups:
(a) A set of properties, which are binary relations between subject resources and object resources: rdfs:subPropertyOf (denoted by sp in this book), rdfs:subClassOf (sc), rdfs:domain (dom), rdfs:range (range), and rdf:type (type).
(b) A set of classes, which denote sets of resources. Elements of a class are known as instances of that class. To state that a resource is an instance of a class, the reserved word type may be used.
(c) Other functionalities, like a system of classes and properties to describe lists,
and a system for doing reification.
(d) Utility vocabulary used to document, comment, etc. [the complete vocabulary
can be found in Brickley and Guha (2004)].
The groups in (b), (c) and (d) have a light semantics, essentially describing their
internal relationships in the ontological design of the system of classes of RDFS.
Their semantics is defined by a set of “axiomatic triples” (Hayes, 2004) which express
the relationships among these reserved words. All axiomatic triples are “structural”, in the sense that they do not refer to external data. Much of this semantics corresponds to what in standard languages is captured via typing.
By contrast, group (a) is formed by predicates whose intended meaning is non-trivial and which are designed to relate individual pieces of data external to the vocabulary of the language. Their semantics is defined by rules which involve variables (to be instantiated by actual data). For example, rdfs:subClassOf (sc) is a reflexive and transitive binary property; when combined with rdf:type (type), it specifies that the type of an individual (a class) can be lifted to that of a superclass.
The group (a) forms the core of the RDF language and, from a theoretical point of
view, it has been shown to be a very stable core to work with [the detailed arguments
supporting this claim are given in Munoz et al. (2007)]. Thus, throughout this chapter we focus on the fragment of RDFS given by the set of keywords {sp, sc, type, dom, range}.
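To make this fragment concrete, here is a small illustrative sketch in Turtle (our own; the ex: names echo the writer example used below, and ex:Work is an assumed class) that uses exactly these five keywords:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://www.example.org/schema/> .

ex:Novelist rdfs:subClassOf    ex:Writer .    # sc
ex:creates  rdfs:subPropertyOf ex:writes .    # sp
ex:writes   rdfs:domain        ex:Writer ;    # dom
            rdfs:range         ex:Work .      # range
ex:Henry    rdf:type           ex:Novelist .  # type

Under the rules sketched below, the sc triple together with the type triple allows the type of ex:Henry to be lifted from ex:Novelist to ex:Writer.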
In this section, we present the formalization of the semantics of RDF. The normative semantics for RDF graphs given in Hayes (2004), and its mathematical formalization in Marin (2004), follow the standard classical treatment in logic, with the notions of model, interpretation, entailment, and so on.
Model theory assumes that the language refers to a ‘world’, and describes the
minimal conditions that a world must satisfy to assign an appropriate meaning for
every expression in the language. A particular world is called an interpretation, so
that model theory might be better called ‘interpretation theory’. The idea is to provide
an abstract, mathematical account of the properties that any such interpretation must
have, making as few assumptions as possible about its actual nature or intrinsic
structure, thereby retaining as much generality as possible.
All interpretations will be relative to a set of names, called the vocabulary of the
interpretation, so that one should speak, strictly, of an interpretation of an RDF vocab-
ulary, rather than of RDF itself. Some interpretations may assign special meanings
to the symbols in a particular vocabulary. Interpretations which share the special
meaning of a particular vocabulary will be named for that vocabulary, e.g. ‘rdf-
interpretations’, ‘rdfs-interpretations’, etc. An interpretation with no particular extra conditions on its vocabulary is simply called an interpretation.
(Figure: fragment of an RDF graph about writers, showing Writer and Novelist nodes and edges labelled creates, type, and sp.)
Sub-class:
• PExt(Int(sc)) is transitive and reflexive over Class,
• if (x, y) ∈ PExt (Int(sc)), then x, y ∈ Class and CExt(x) ⊆ CExt(y).
Typing:
• (x, y) ∈ PExt(Int(type)) if and only if y ∈ Class and x ∈ CExt(y),
• if (x, y) ∈ PExt(Int(dom)) and (u, v) ∈ PExt(x), then u ∈ CExt(y),
• if (x, y) ∈ PExt(Int(range)) and (u, v) ∈ PExt(x), then v ∈ CExt(y).
Example 1.1 Figure 1.2 shows an RDF graph storing information about writers.
All the triples in the graph are composed by elements in U, except for the triples
containing the blank node B. Consider now the interpretation I = (Res, Prop, Class,
PExt, CExt, Int) defined as follows:
• Res = {Writer, Henry, Novelist, creates, writes, boule de suif, 1880}.
• Prop = {creates, writes, issuing time, type, sp, sc, dom, range}.
• Class = {Writer, Novelist}.
• PExt is such that:
1.3 RDF Query Language SPARQL

In 2004, the RDF Data Access Working Group, part of the W3C Semantic Web Activity, released a first public working draft of a query language for RDF, called SPARQL (Prud’hommeaux & Seaborne, 2008). The name SPARQL is a recursive acronym that stands for SPARQL Protocol and RDF Query Language. Since then, SPARQL has been rapidly adopted as the standard for querying Semantic Web data. In January 2008, SPARQL became a W3C Recommendation. In this section, we give a detailed description of the syntax and semantics of SPARQL. RDF is a directed labeled graph data format and, thus, SPARQL is essentially a graph-matching query language. We start with the syntax of SPARQL as given in the W3C specification, then introduce an algebraic syntax for the language and compare it with the official syntax. Finally, we formalize the semantics of SPARQL.
The syntax and semantics of SPARQL are specified by the RDF Data Access Working
Group (Prud’hommeaux & Seaborne, 2008). SPARQL is a language designed to
query data in the form of sets of triples, namely RDF graphs. The basic engine of
the language is a pattern matching facility, which uses some graph pattern matching
functionalities (sets of triples can be viewed also as graphs). From a syntactic point
of view, SPARQL language is similar to the SQL language, and the overall structure
consists of three main blocks.
The pattern matching part, which includes several interesting features of pattern
matching of graphs, like optional parts, union of patterns, nesting, filtering values of
possible matchings, and the possibility of choosing the data source to be matched
by a pattern. The solution modifiers, which once the output of the pattern has been
computed (in the form of a table of values of variables), allow to modify these values
applying classical operators like projection, distinct, order and limit. Finally, the
output of a SPARQL query can be of different types: yes/no queries, selections of
values of the variables which match the patterns, construction of new RDF data from
these values, and descriptions of resources.
In order to present the language, we follow the grammar given in Fig. 1.3 that
specifies the basic structure of the SPARQL Query Grammar (Prud’hommeaux &
Seaborne, 2008). There are several basic concepts used in the definition of the syntax
of SPARQL, many of which are taken from the RDF specification with some minor
modifications. For denoting resources, SPARQL uses IRIs instead of the URIs of
RDF. Anything represented by a literal could also be represented by an IRI, but it is
often more convenient or intuitive to use literals.
In what follows, we explain in more detail each component of the language. Of
course, for ultimate details the reader should consult the W3C Recommendation
(Prud’hommeaux & Seaborne, 2008).
Fig. 1.3 A fragment of the SPARQL query grammar (Prud’hommeaux & Seaborne, 2008)
Example 1.2 (Pérez et al., 2006a, 2006b) Consider the following query: “Give the
name and the mailbox of each person who has a mailbox with domain.cl”. This query
can be expressed in SPARQL as follows:
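The query listing itself is not preserved in this excerpt; a sketch of how such a query could be written, assuming a FOAF-style vocabulary for names and mailboxes (the vocabulary and the exact filter used in the original example may differ), is:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
  ?person foaf:name ?name .
  ?person foaf:mbox ?mbox .
  FILTER regex(str(?mbox), "\\.cl$")
}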
RDF is a directed labeled graph data format and, thus, SPARQL is essentially a
graph-matching query language. In this section, we present the algebraic syntax of
the core fragment of SPARQL graph patterns proposed in (Arenas et al., 2009; Pérez
et al., 2006a, 2006b, 2009), and show that it is equivalent in expressive power to the
core fragment of SPARQL. Thus, this formalization is used in this chapter to give a
formal semantics to SPARQL.
The official syntax of SPARQL (Prud’hommeaux & Seaborne, 2008) considers
operators OPTIONAL, UNION, FILTER, and concatenation via a point symbol
(.), to construct graph pattern expressions. The syntax also considers {} to group
patterns, and some implicit rules of precedence and association. For example, the
point symbol (.) has precedence over OPTIONAL, and OPTIONAL is left associative.
In order to avoid ambiguities in the parsing of expressions, Pérez et al. (2006a, 2006b) and Arenas et al. (2009) present a syntax of SPARQL graph patterns in a more traditional algebraic formalism, using the binary operators AND (.), UNION (UNION), OPT (OPTIONAL), and FILTER (FILTER). They fully parenthesize expressions, making the precedence and association of operators explicit.
Assume the existence of a set of variables V disjoint from U. A SPARQL graph
pattern expression is defined recursively as follows:
(a) A tuple from (U ∪ V ) × (U ∪ V ) × (U ∪ V ) is a graph pattern (a triple pattern).
(b) If P1 and P2 are graph patterns, then expressions (P1 AND P2 ), (P1 OPT P2 ),
and (P1 UNION P2 ) are graph patterns (conjunction graph pattern, optional
graph pattern, and union graph pattern, respectively).
(c) If P is a graph pattern and R is a SPARQL built-in condition, then the expression
(P FILTER R) is a graph pattern (a filter graph pattern).
A SPARQL built-in condition is constructed using elements of the set U ∪ V and
constants, logical connectives (¬, ∧, ∨), inequality symbols (<, ≤, ≥, >), the equality
symbol (=), unary predicates like bound, isBlank, and isIRI, plus other features (see the SPARQL specification for a complete list). In this chapter, we restrict ourselves to the fragment where the built-in condition is a Boolean combination of terms constructed by using = and bound, that is:
(a) If ?X, ?Y ∈ V and c ∈ U, then bound(?X), ?X = c and ?X = ?Y are built-in
conditions.
(b) If R1 and R2 are built-in conditions, then (¬R1 ), (R1 ∨ R2 ) and (R1 ∧ R2 ) are
built-in conditions.
In the rest of the book, we use var(*) to denote the set of variables occurring in *, where * is a SPARQL graph pattern P or a built-in condition R.
We conclude the definition of the algebraic framework by describing the formal syntax of the SELECT query result form. A SELECT SPARQL query is simply a tuple (W, P), where P is a SPARQL graph pattern expression and W is a set of variables such that W ⊆ var(P).
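As an illustration (our own, not from the original text), the informal request “give the names of all persons, and additionally their email addresses when available” can be written in this algebraic syntax as the SELECT query (W, P) with

P = ((?X, name, ?N) OPT (?X, email, ?E))    and    W = {?N, ?E},

where name and email stand for the corresponding property URIs; the full parenthesization makes the scope of OPT explicit.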
In the early drafts of the specification, the semantics of SPARQL was defined by use cases, mostly by specifying the expected output for particular example queries. In fact, the interpretations of examples and the exact outcomes of cases not covered in the initial drafts of the SPARQL specification were a matter of long discussions on the W3C mailing lists.
formalizations of a semantics for a fragment of the language. Currently, the official
specification of SPARQL (Prud’hommeaux & Seaborne, 2008), endorsed by the
W3C, formalizes a semantics based on Pérez et al. (2006a, 2006b).
The semantics of SPARQL is formalized by using partial mappings between
variables in the patterns and actual values in the RDF graph being queried. To define
the semantics of SPARQL graph pattern expressions, we need to introduce some
terminology. A mapping μ from V to U is a partial function μ: V → U. The domain of μ, denoted by dom(μ), is the subset of V where μ is defined. The empty mapping μ∅ is the mapping with empty domain, i.e. dom(μ∅) = ∅. Given a triple pattern t and a mapping μ such that var(t) ⊆ dom(μ), we abuse notation and write μ(t) for the triple obtained by replacing the variables in t according to μ. Similarly, given a basic graph pattern P and a mapping μ such that var(P) ⊆ dom(μ), we have that μ(P) = ∪t∈P {μ(t)}, i.e. μ(P) is the set of triples obtained by replacing the variables in the triples of P according to μ.
To define the semantics of more complex patterns, we need to introduce some
more notions. Two mappings μ1 and μ2 are compatible when for all ?X ∈ dom(μ1) ∩ dom(μ2), it is the case that μ1(?X) = μ2(?X), i.e. when μ1 ∪ μ2 is also a mapping. Intuitively, μ1 and μ2 are compatible if μ1 can be extended with μ2 to obtain a new mapping, and vice versa. Note that two mappings with disjoint domains are always compatible, and that the empty mapping μ∅ (i.e. the mapping with empty domain) is compatible with any other mapping.
Let Ω1 and Ω2 be sets of mappings. The join, the union, and the difference of Ω1 and Ω2 are defined as:

Ω1 ⨝ Ω2 = {μ1 ∪ μ2 | μ1 ∈ Ω1, μ2 ∈ Ω2, and μ1, μ2 are compatible mappings},
Ω1 ∪ Ω2 = {μ | μ ∈ Ω1 or μ ∈ Ω2},
Ω1 \ Ω2 = {μ ∈ Ω1 | for all μ′ ∈ Ω2, μ and μ′ are not compatible}.

Based on the previous operators, the left outer-join is defined as:

Ω1 ⟕ Ω2 = (Ω1 ⨝ Ω2) ∪ (Ω1 \ Ω2).
Intuitively, Ω1 ⨝ Ω2 is the set of mappings that result from extending mappings in Ω1 with their compatible mappings in Ω2, and Ω1 \ Ω2 is the set of mappings in Ω1 that cannot be extended with any mapping in Ω2. The operation Ω1 ∪ Ω2 is the usual set-theoretical union. A mapping μ is in Ω1 ⟕ Ω2 if it is the extension of a mapping of Ω1 with a compatible mapping of Ω2, or if it belongs to Ω1 and cannot be extended with any mapping of Ω2. These operations resemble relational algebra operations over sets of mappings (partial functions).
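As a concrete, non-normative illustration of these operators, the short Python sketch below represents a mapping as a dictionary from variable names to values and a set of mappings as a list of such dictionaries; the function names are ours and are not part of any SPARQL engine.

def compatible(m1, m2):
    # Two mappings are compatible if they agree on every shared variable.
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(omega1, omega2):
    # Extend each mapping in omega1 with every compatible mapping in omega2.
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]

def diff(omega1, omega2):
    # Mappings in omega1 that are compatible with no mapping in omega2.
    return [m1 for m1 in omega1 if not any(compatible(m1, m2) for m2 in omega2)]

def left_outer_join(omega1, omega2):
    # (omega1 join omega2) union (omega1 \ omega2).
    return join(omega1, omega2) + diff(omega1, omega2)

# A small worked example with two sets of mappings.
omega1 = [{"?X": "R1", "?N": "john"}, {"?X": "R2", "?N": "mary"}]
omega2 = [{"?X": "R2", "?E": "[email protected]"}]

print(join(omega1, omega2))
# [{'?X': 'R2', '?N': 'mary', '?E': '[email protected]'}]
print(left_outer_join(omega1, omega2))
# [{'?X': 'R2', '?N': 'mary', '?E': '[email protected]'}, {'?X': 'R1', '?N': 'john'}]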
We are ready to define the semantics of graph pattern expressions as a function
[[·]]G which takes a pattern expression and returns a set of mappings. We follow the
approach in Gutierrez et al. (2011), defining the semantics as the set of mappings that match the graph G. For the sake of readability, the semantics of filter expressions
is presented in a separate definition.
Let G be an RDF graph and P be a graph pattern. The evaluation of P over G,
denoted by [[P]]G , is defined recursively as follows (Arenas et al., 2009):
(a) if P is a triple pattern t, then [[P]]G = {μ | dom(μ) = var(t) and μ(t) ∈ G}.
(b) if P is (P1 AND P2), then [[P]]G = [[P1]]G ⨝ [[P2]]G.
(c) if P is (P1 OPT P2), then [[P]]G = [[P1]]G ⟕ [[P2]]G.
(d) if P is (P1 UNION P2), then [[P]]G = [[P1]]G ∪ [[P2]]G.
The idea behind the OPT operator is to allow for optional matching of patterns. Consider the pattern expression (P1 OPT P2) and let μ1 be a mapping in [[P1]]G. If there exists a mapping μ2 ∈ [[P2]]G such that μ1 and μ2 are compatible, then (μ1 ∪ μ2) ∈ [[(P1 OPT P2)]]G. But if no such mapping μ2 exists, then μ1 ∈ [[(P1 OPT P2)]]G. Thus, the operator OPT allows information to be added to a mapping μ if the information is available, instead of just rejecting μ whenever some part of the pattern does not match.
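For instance (an illustrative example of ours, not taken from the original text), consider the graph and pattern

G = { (R1, name, "john"), (R2, name, "mary"), (R2, email, "[email protected]") }
P = ((?X, name, ?N) OPT (?X, email, ?E))

Then [[P]]G = { {?X → R1, ?N → "john"}, {?X → R2, ?N → "mary", ?E → "[email protected]"} }: the mapping for R1 cannot be extended with an email binding, but it is not rejected.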
The semantics of filter expressions goes as follows. Given a mapping μ and a
built-in condition R, we say that μ satisfies R, denoted by μ |= R, if:
(a) R is bound(?X) and ?X ∈ dom(μ);
(b) R is ?X = c, ?X ∈ dom(μ) and μ(?X) = c;
(c) R is ?X = ?Y, ?X ∈ dom(μ), ?Y ∈ dom(μ) and μ(?X) = μ(?Y );
(d) R is (¬R1 ), R1 is a built-in condition, and it is not the case that μ |= R1 ;
(e) R is (R1 ∨ R2 ), R1 and R2 are built-in conditions, and μ |= R1 or μ |= R2 ;
(f) R is (R1 ∧ R2 ), R1 and R2 are built-in conditions, μ |= R1 and μ |= R2 .
Let G be an RDF graph and (P FILTER R) a filter expression. The evaluation of the filter expression over G is defined as [[(P FILTER R)]]G = {μ ∈ [[P]]G | μ |= R}.
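A direct transcription of this satisfaction relation into Python (ours; built-in conditions are encoded as nested tuples, which is purely an implementation choice) could look as follows:

def satisfies(mu, R):
    # mu |= R for the restricted fragment of built-in conditions.
    op = R[0]
    if op == "bound":       # ("bound", "?X")
        return R[1] in mu
    if op == "eq_const":    # ("eq_const", "?X", c)
        return R[1] in mu and mu[R[1]] == R[2]
    if op == "eq_var":      # ("eq_var", "?X", "?Y")
        return R[1] in mu and R[2] in mu and mu[R[1]] == mu[R[2]]
    if op == "not":         # ("not", R1)
        return not satisfies(mu, R[1])
    if op == "or":          # ("or", R1, R2)
        return satisfies(mu, R[1]) or satisfies(mu, R[2])
    if op == "and":         # ("and", R1, R2)
        return satisfies(mu, R[1]) and satisfies(mu, R[2])
    raise ValueError("unknown built-in condition")

def evaluate_filter(omega, R):
    # [[(P FILTER R)]]G = { mu in [[P]]G | mu |= R }
    return [mu for mu in omega if satisfies(mu, R)]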
Several algebraic properties of graph patterns are proved by Pérez et al. (2006a,
2006b). A simple property is that AND and UNION are associative and commutative.
This permits us to avoid parentheses when writing sequences of AND operators or
UNION operators.
The official W3C Recommendation (Prud’hommeaux & Seaborne, 2008) defines
four query forms, namely SELECT, ASK, CONSTRUCT, and DESCRIBE queries.
These query forms use the mappings obtained after the evaluation of a graph pattern
to construct result sets or RDF graphs. The query forms are: (1) SELECT, that
performs a projection over a set of variables in the evaluation of a graph pattern,
(2) CONSTRUCT, that returns an RDF graph constructed by substituting variables
in a template, (3) ASK, that returns a truth value indicating whether the evaluation
of a graph pattern produces at least one mapping, and (4) DESCRIBE, that returns
an RDF graph that describes the resources found. In this book, we only consider
the SELECT query form. We refer the reader to Pérez et al. (2006a, 2006b) for a
formalization of the remaining query forms.
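Purely as orientation (these skeletons are ours, using a hypothetical ex: namespace; each form is a separate query, and the PREFIX line would be repeated for each), the four query forms have the following shapes:

PREFIX ex: <http://www.example.org/>

# SELECT: project variable bindings
SELECT ?d WHERE { ?d ex:direct ex:Dinner }

# ASK: does at least one mapping exist?
ASK { ex:BarryLevinson ex:direct ex:Dinner }

# CONSTRUCT: build a new RDF graph from a template
CONSTRUCT { ex:Dinner ex:directedBy ?d } WHERE { ?d ex:direct ex:Dinner }

# DESCRIBE: return an RDF graph describing a resource
DESCRIBE ex:BarryLevinson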
To formally define the semantics of SELECT SPARQL queries, we need the following notion. Given a mapping μ: V → U and a set of variables W ⊆ V, the restriction of μ to W, denoted by μ|W, is the mapping whose domain is dom(μ) ∩ W and which agrees with μ on that domain.

Definition 1.1 A SPARQL SELECT query is a tuple (W, P), where P is a graph pattern and W is a set of variables such that W ⊆ var(P). The answer of (W, P) over an RDF graph G, denoted by [[(W, P)]]G, is the set of mappings [[(W, P)]]G = {μ|W | μ ∈ [[P]]G}.
1.4 RDF Data Store

RDF plays an important role in representing Web resources in a natural and flexible way. As the amount of RDF data keeps growing, efficient and scalable management of RDF data is of increasing importance. RDF data manage-
ment has attracted attention in the database and Semantic Web communities. Much
work has been devoted to proposing different solutions to store RDF data efficiently.
In this section, we focus on RDF data storage and present a full up-to-date overview
of the current state of the art in RDF data storage based on the work by Ma et al.
(2016a, 2016b, 2016c). The various approaches are classified according to their
storage strategy, including RDF data stores in traditional databases and RDF data
stores in NoSQL databases. Figure 1.4 illustrates this classification for RDF data
stores. Note that two different levels of RDF data storage can be distinguished: logical storage and physical storage. This chapter mainly focuses on the logical storage of RDF data.
Traditionally, databases are classified into relational databases and object-oriented
databases. In addition, NoSQL databases have only recently emerged as a commonly
used infrastructure for handling big data. So, two top categories of RDF data stores
in Fig. 1.4 are traditional database stores and NoSQL database stores, respectively.
For the traditional database stores, corresponding to the two kinds of traditional database models, the two categories of RDF data stores are relational stores and object-oriented stores, which use relational databases and object-oriented databases, respectively.

(Fig. 1.4: classification of RDF data stores. The leaf categories shown include vertical stores, horizontal stores, and type stores under traditional databases, and key-value stores, column-family stores, and document stores under NoSQL databases.)
A number of attempts have been made to use traditional databases to store RDF
data, and various storage schemes for RDF data have been proposed. Some ideas
and techniques developed earlier for object-oriented databases, for example, have
already been adapted to the RDF setting. RDF data were stored in an object-oriented
database by mapping both triples and resources to objects in Bönström et al. (2003).
An object-oriented database model was proposed for storage of RDF documents
(Chao, 2007a, 2007b), but the RDF documents were encoded in XML (eXtensible
Markup Language).
Relational database management systems (RDBMSs) are currently the most
widely used databases. It has been shown that RDBMSs are very efficient, scal-
able, and successful in hosting various types of data, including some new types of
data such as XML data, temporal/spatial data, media data, and complex objects.
Currently, more mature RDF systems use RDBMSs to store RDF data, map RDF
triples with relational table structures, and use RDBMS for storage and retrieval.
According to the table structures designed, the storage of RDF data in relational databases can be divided into three methods (Luo et al., 2012; Sakr & Al-Naymat, 2009), namely vertical stores, horizontal stores, and property stores.
1. Vertical stores
Vertical stores (also called triple stores, e.g. Broekstra et al. 2002, Harris and Gibbins
2003, Harris and Shadbolt 2005, Neumann and Weikum 2008, 2010, Weiss et al.
2008) use a single relational table to store a set of RDF statements, in which the
relational schema contains three columns for subject, property, and object. Formally,
each triple, say (s, p, o), occurs in the relational table as a row, that is, tuple <s, p,
o>. Here subject s is placed in column subject of this row, predicate p is placed in
column property of this row, and object o is placed in column object of this row. When performing an RDF query, a query rewriting mechanism converts the given SPARQL query into a corresponding SQL statement, which the relational database then answers. Although this method is very general, query performance is poor because many self-join operations must be performed when a query is executed. Moreover, because vertical stores quickly encounter scalability limitations, several approaches have been proposed to deal with
<sj , oj > occur in two different tables. It is clear that the number of relational tables
is the same as the number of predicates in the RDF data sets.
SW-Store was proposed by Abadi et al. (2007, 2009) as an RDF data store that
vertically partitions RDF data (by predicates) into a set of property tables, maps them
onto a column-oriented database, and builds a subject–object index on each property
table. Note that the implementation of SW-Store relies on the C-Store column-store
database (Stonebraker et al., 2005) to store tables as collections of columns rather
than as collections of rows. Current relational database systems, for example, Oracle,
DB2, SQL Server, and Postgres, are standard row-oriented databases in which entire
tuples are stored consecutively. In addition, the results of an independent evaluation
of SW-Store are reported by Sidirourgos et al. (2008).
Extending the SW-Store approach, an approach called SPOVC is proposed by
Mulay and Kumar (2012). The main techniques used in this approach are horizontal
partitioning of logical indices and special indices for values and classes. The SPOVC
approach uses five indices, namely, subject, predicate, object, value, and class, on
top of column-oriented databases.
3. Property stores
The third approach for storing RDF data is called property stores (e.g. Levandoski and Mokbel 2009, Matono et al. 2005, Sintek and Kiesel 2006), in which one relational table is created for each RDF data type, and each table contains the related properties of the same subject as the columns of an n-ary relation.
Actually, property stores are type-oriented stores (Bornea et al., 2013). The basic
idea of this approach is to divide one wide table into multiple smaller tables so that
each table contains related predicates as its columns. Formally, for two triples, say
(si , pi , oi ) and (sj , pj , oj ), suppose that pi and pj are related. Then these two triples
occur in the same table, with one row for each subject.
Furthermore, when si = sj and pi ≠ pj, oi and oj are placed in different columns pi and pj of the same row; when si = sj and pi = pj, oi and oj are placed in the same column of the same row, and a set of values {oi, oj} results; when si ≠ sj and pi = pj, oi and oj are placed in the same column of different rows si and sj. It is not
difficult to see that designing a schema for property tables depends on identifying
related predicates.
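To make the difference between the layouts concrete, the following is a small illustrative SQL sketch (our own; the table and column names are not taken from any particular system). It shows the vertical triple-table layout described earlier, the kind of self-join produced by SPARQL-to-SQL rewriting over it, and a property table that groups related predicates of the same subject:

-- Vertical (triple) store: a single three-column table holds every statement.
CREATE TABLE triples (
  subject  VARCHAR(255),
  property VARCHAR(255),
  object   VARCHAR(255)
);

-- A SPARQL pattern such as { ?d direct ?m . ?d nationality "USA" } is
-- rewritten into a self-join over the triple table:
SELECT t1.subject AS director, t1.object AS movie
FROM   triples t1, triples t2
WHERE  t1.property = 'direct'
  AND  t2.property = 'nationality'
  AND  t2.object   = 'USA'
  AND  t1.subject  = t2.subject;

-- Property (type) store: one table per RDF type, with related predicates
-- of the same subject stored as columns of a single row.
CREATE TABLE director (
  subject     VARCHAR(255),
  age         INT,
  nationality VARCHAR(255)
);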
Jena is an open-source toolkit for Semantic Web programmers (McBride, 2002).
It implements persistence for RDF graphs using an SQL database through a JDBC
connection. Jena has evolved from its first version, Jena 1, to a second version, Jena
2. In the Jena RDF, the grouping of predicates is defined by applications (Wilkinson
et al., 2003; Wilkinson, 2006). Applications typically have access patterns in which
certain subjects or properties are accessed together. In particular, the application
programmer must specify which predicates are multivalued. For each such multi-
valued predicate p, a new relational table is created, with a schema consisting of
subject and p. Jena also supports so-called property-class tables, in which for each
value of the rdf: type predicate, a new table is created. The remaining predicates that
are not in any defined group are stored independently.
Because its storage scheme retains schema information and path expressions for each resource, the path-based relational RDF database (Matono et al., 2005) can process path-based queries efficiently and store RDF instance data without schema information.
Among the three approaches for storing RDF data in relational databases, vertical
stores use a fixed relational schema, and new triples can be inserted without consid-
ering RDF data types. Therefore, vertical stores can handle dynamic schema of RDF
data. However, vertical stores generally involve a number of self-join operations for
querying, and therefore efficient querying requires specialized techniques. To over-
come the problem of self-joins in vertical stores, horizontal stores using a single
relational table are proposed. However, it commonly occurs that in the single rela-
tional table containing all predicates as columns, a subject occurs only with certain
predicates, which leads to a sparse relational table with many null values. In addi-
tion, a subject may have multiple objects for the same predicate. Such a predicate is
called a multi-valued predicate. As a result, the relational table in a horizontal store
contains multi-valued attributes. Finally, when new triples are inserted, new predi-
cates result in changes to the relational schema, and dynamic schema of RDF data
cannot be handled. To solve the problem of null values as well as that of multi-valued
attributes, horizontal stores using a set of relational tables are proposed, where each
predicate corresponds to a relational table. However, horizontal stores using a set of
relational tables generally involve many join operations for querying. In addition,
when new triples are inserted, new predicates result in new relational tables, and
dynamic schema of RDF data cannot be handled. A vertical store in (p, s, o) shape
would equal the sequential concatenation of all tables in a horizontal store which
uses a set of relational tables.
The type-store approach is actually a trade-off between the two kinds of horizontal
stores. Compared with horizontal stores using a single relational table, type stores
contain fewer null values (no null values in horizontal stores using multiple relational
tables), and involve fewer join operations than horizontal stores using multiple rela-
tional tables (no join operations in horizontal stores using a single relational table).
It should be noted that, like horizontal stores using a single relational table, type stores may contain multi-valued attributes, and new predicates result in changes to the relational schema when new triples are inserted.
Some major features of relational RDF data stores are summarized in Table 1.1.

Table 1.1 Major features of relational resource description framework data stores

                                    Join operations    Multi-valued attributes    Null values      Relational schema    Number of relation(s)
  Vertical stores                   More self-joins    No                         No               Fixed                Fixed
  Horizontal stores:
    one table for all predicates    No                 Yes                        Yes and many     Dynamic              Fixed
    one table for each predicate    More joins         No                         No               Dynamic              Dynamic
  Type stores                       Fewer joins        Yes                        Yes and fewer    Dynamic              Dynamic

1. Graph model

RDF data has the characteristics of a graph structure, so some work studies the storage of RDF data from the perspective of the graph model.

Bönström et al. (2003) first proposed treating RDF data as a graph rather than as XML documents or a plain collection of triples, since the graph model preserves more of the semantic information in RDF data. They argue that the advantages of using graphs to store RDF data are: (i) the RDF model and the graph model structure can be mapped directly, so no conversion of the RDF data is needed when it is stored; and (ii) when RDF data is queried, storing it as a graph avoids restructuring. Angles and Gutierrez (2005)
discussed the problem of using graph databases to store RDF data and compared the relational, object-oriented, semantic, and RDF data models. They also studied how well graph database query languages fit RDF data and how applicable RDF query languages are to graph data. The results show that most RDF query languages have low support for some basic graph queries; even SPARQL does not support path queries or node-distance queries over the graph structure, although such queries are very important in practical applications.
Udrea et al. proposed to use the GRIN algorithm to answer SPARQL queries. The
core of GRIN is to construct a GRIN index similar to the M-tree structure (Ciaccia
et al., 1997). Using distance constraints, GRIN can quickly determine and prune the
parts of the RDF graph that do not meet the query conditions, which improves query
performance as a whole. Wu et al. (2008) proposed using a hypergraph data model
to store RDF data, and designed a persistent storage strategy based on the graph
structure. Yan et al. (2009) proposed dividing the RDF graph into several subgraphs and adding indexes such as Bloom filters so that, during query processing, it can be quickly determined whether the data being looked up lies in a certain subgraph. Graph partitioning is used to reduce self-joins over triples, but updating the graph still requires re-partitioning. Zou et al. (2011) proposed the gStore system to store RDF data and answer SPARQL queries. Using an encoding scheme, each entity node, together with its neighboring attributes and attribute values in the RDF graph, is encoded into a bitstring-labeled node, yielding a label graph G*. At query time, the query graph Q is likewise encoded into a query label graph Q*, and subgraph matching is then used to find the subgraphs of G* that satisfy Q*. Both property graphs (PG) and RDF can be used to represent graph-structured data, but the two models are not directly compatible. To this end, Hartig (2014) proposed a formal definition of the property graph model and introduced well-defined conversions between the PG and RDF models. On the one hand, by implementing the RDF-to-PG conversion, PG-based systems can let users load RDF data and query it with the graph traversal language Gremlin or the declarative graph query language Cypher in a compatible and system-independent manner. On the other hand, the PG-to-RDF conversion enables RDF data-management systems to process property graph content using SPARQL. Recently, De Virgilio (2017) proposed an automatic conversion method from RDF to a graph storage system. The conversion uses the integrity constraints defined on the source to properly construct a target database and attempts to reduce the number of accesses required to answer queries in that database. This is achieved by storing together node data that are likely to appear together in query results; a system implementing the conversion has also been developed.
At present, the representative graph database products used for RDF data mainly include Neo4j18 and Dydra.19 Neo4j is currently a relatively mature, high-performance, open-source graph database. Neo4j can traverse nodes and edges at the same speed, and its traversal speed is independent of the amount of data that constitutes the graph. However, it does not support distributed storage, and the existing research on using Neo4j to store RDF data and support SPARQL queries is scarce and limited to some engineering applications. The TinkerPop team developed LinkedDataSail,20 which provides an interface for processing RDF data in a graph database. Using this interface, Neo4j can support SPARQL queries and can be used as a triple store. Martella21 stored the DBpedia data set in Neo4j and then built SPARQL queries and other graph algorithms on top of it. Dydra is a cloud-based graph database. With Dydra, RDF data is stored directly as a property graph that directly represents the relationships in the underlying RDF data, and the data can be accessed and updated through an industry-standard query language designed specifically for graph processing. These works apply graph databases to store RDF data and use graph algorithms to solve query problems.
2. NoSQL data-management systems
NoSQL data-management systems have emerged as a commonly used infrastruc-
ture for handling big data outside the RDF space. The various NoSQL data stores
were divided into four major categories by Grolinger et al. (2013): key-value stores,
column-family stores, document stores, and graph databases. Key-value stores have
a simple data model based on key-value pairs. Most column-family stores are derived
from Google BigTable (Chang et al., 2008), in which the data are stored in a
column-oriented way. In BigTable, the data set consists of several rows. Each row is
addressed by a primary key and is composed of a set of column families. Note that
different rows can have different column families. Representative column-family
stores include Apache HBase,22 which directly implements the Google BigTable
concepts. According to Grolinger et al. (2013), there is one type of column-family store, exemplified by Amazon SimpleDB (Stein & Zachrias, 2010) and DynamoDB (DeCandia
18 http://neo4j.org/.
19 http://www.dydra.com.
20 https://github.com/thinkerpop/gremlin/wiki/linkeddatasail.
21 https://github.com/claudiomartella/dbpedia4neo.
22 https://hbase.apache.org/.
et al., 2007), in which each row contains only a set of column name-value pairs, without column families. In addition, Cassandra (Lakshman & Malik,
2010) provides the additional functionality of super columns, which are formed by
grouping various columns together. Document stores provide another derivative of
the key-value store data model that uses keys to locate documents inside the data
store. Most document stores represent documents using JSON (JavaScript Object
Notation) or some format derived from it. Typically, CouchDB23 and the Couchbase
server24 use the JSON format for data storage, whereas MongoDB25 stores data in
BSON (Binary JSON). Graph databases use graphs as their data model, and a graph
is used to represent a set of objects, known as vertices or nodes, and the links (or
edges) that interconnect these vertices.
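As a concrete illustration of these four categories, the following minimal sketch uses plain Python structures to show the shape of data in a key-value store, a column-family store, a document store, and a graph database; all keys and values are invented for illustration and are not tied to any particular product.

# Illustrative only: the four NoSQL data models sketched with plain Python structures.

# 1. Key-value store: opaque values addressed by a key.
key_value = {"key_5": "value_2", "key_6": "value_1"}

# 2. Column-family store: row key -> column family -> column -> value.
column_family = {
    "row_key_2": {
        "cf_1": {"name": "Alice", "city": "Lowell"},
    }
}

# 3. Document store: a key locates a (JSON-like) document with nested structure.
document = {
    "doc_id_2": {"title": "RDF Primer", "authors": ["Manola", "Miller"]},
}

# 4. Graph database: objects as vertices, links as labeled edges.
graph = {
    "vertices": {"node1", "node2"},
    "edges": [("node1", "knows", "node2")],
}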
Illustrative representations of these NoSQL models were presented by Grolinger
et al. (2013) and are shown in Fig. 1.5.
Actually, massive RDF data-management merits the use of big-data infrastruc-
ture because of the scalability and high performance of cloud data management.
A number of efforts have been made to develop RDF data-management systems
based on NoSQL systems. SimpleDB by Amazon was used as a back end to store
RDF data quickly and reliably for massive parallel access (Stein & Zachrias, 2010).
Cloud-based key-value stores (e.g. BigTable) were used by Gueret et al. (2011), and
a robust query engine was developed over these key-value stores. In addition, there is
a new RDF store on the block, called SPARQLcity,26 which is a Hadoop-based graph
analytical engine for performing rich business analytics on RDF data with SPARQL.
SPARQLcity is the first just-in-time compiled engine for SPARQL query execution.
However, because NoSQL systems offer either no support or only high-latency
support (MapReduce) for effective join processing, SPARQL queries with many
joins, which are the mainstay of the language, run into serious problems on such
systems. The normal NoSQL APIs (Application Programming Interfaces), which are
centered on individual key lookup (whether one looks up a value, column-family, or
document), simply incur too high a latency when one has to join tens of thousands
(or even billions) of RDF triples.
Several NoSQL systems for RDF data were investigated by Cudre-Mauroux et al.
(2013), including document stores (e.g. CouchDB27 ), key-value/column stores (e.g.
Cassandra28 and HBase29 ), and query compilation for Hadoop (e.g. Hive30 ). Major
characteristics of these four NoSQL systems are described (Cudre-Mauroux et al.,
2013). First, Apache HBase is an open-source, horizontally scalable, row-consistent,
low-latency, and random-access data store. HBase uses HDFS as a storage back end
23 https://couchdb.apache.org/.
24 https://www.couchbase.com/products/server.
25 https://www.mongodb.com/.
26 https://www.hugedomains.com/domain_profile.cfm?d=sparqlcity.com.
27 https://couchdb.apache.org/.
28 https://cassandra.apache.org/_/index.html.
29 https://hbase.apache.org/.
30 https://hive.apache.org/query.
Fig. 1.5 Different types of NoSQL data model (Grolinger et al., 2013)
and Apache ZooKeeper31 to provide support for coordination tasks and fault toler-
ance. HBase is a column-oriented distributed NoSQL database system. Its data model
is a sparse, multi-dimensional sorted map. Here, columns are grouped into column
families, and timestamps add an additional dimension to each cell. HBase is well inte-
grated with Hadoop, which is a large-scale MapReduce computational framework.
The second HBase-based implementation uses Apache Hive, a SQL-like data-warehousing
tool that enables querying using MapReduce. Third, Couchbase32 is a document-oriented,
schema-less distributed NoSQL database system with native support for JSON documents.
Couchbase is intended to run mostly in memory and on as many nodes as needed to hold
the whole data set in RAM (random-access memory). It has a built-in object-managed
cache to speed up random reads and writes. Updates to documents are first made in the
in-memory cache and are only later persisted to disk using an eventual consistency
paradigm. Finally, Apache Cassandra is a NoSQL database
31 https://zookeeper.apache.org/.
32 https://www.couchbase.com/.
RDF triples) are represented as graph edges. Each RDF entity is represented as a
graph node with a unique id and stored as a key-value pair in the Trinity memory
cloud. Formally, a key-value pair (node-id, <in-adjacency-list, out-adjacency-list>)
consists of the node-id as the key and the node's adjacency list as the value. The adjacency list is divided into
two lists: one for neighbors with incoming edges and the other for neighbors with
outgoing edges. Each element in the adjacency list is a (predicate, node-id) pair,
which records the id of the neighbor and the predicate on the edge.
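A minimal sketch of this key-value representation follows, assuming an in-memory Python dictionary in place of the Trinity memory cloud; the helper name add_triple and the sample IRIs are hypothetical.

# Each RDF entity is a key; the value holds its incoming and outgoing adjacency
# lists of (predicate, node-id) pairs, as described above.

store = {}  # node-id -> {"in": [(predicate, node-id)], "out": [(predicate, node-id)]}

def add_triple(s, p, o):
    store.setdefault(s, {"in": [], "out": []})["out"].append((p, o))
    store.setdefault(o, {"in": [], "out": []})["in"].append((p, s))

add_triple("ex:Tom", "ex:studiesAt", "ex:WayneState")
add_triple("ex:WayneState", "ex:locatedIn", "ex:Detroit")

# One key lookup retrieves all neighbors of an entity in either direction.
print(store["ex:WayneState"])
# {'in': [('ex:studiesAt', 'ex:Tom')], 'out': [('ex:locatedIn', 'ex:Detroit')]}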
1.5 Summary
The RDF is increasingly being adopted for modeling data in various application
domains and has become a cornerstone for publishing, exchanging, sharing, and
interrelating data on the Web. The goal of this chapter is to give an overview of the
basics of the theory of RDF data and management. We start by providing a formal
definition of RDF that includes the features that distinguish this model from other
graph data models. We then move into the fundamental issue of querying RDF data.
We study the RDF query language SPARQL, which has been a W3C Recommendation since
January 2008. We provide an algebraic syntax and a compositional semantics for this
language. We furthermore focus on RDF data storage and present an up-to-date overview
of the current state of the art in RDF data storage strategies, including RDF data
stores in traditional databases and RDF data stores in NoSQL databases.
However, traditional database models and RDF have limitations, mainly regarding
what can be said about the fuzzy information that is commonly found in many
application domains. To provide the necessary means to handle and manage such
information, a large number of fuzzy extensions to database models and to RDF have
been proposed. In particular, Zadeh's fuzzy set theory (Zadeh, 1965) has proved to
be a successful technique for modeling fuzzy information in many application areas,
especially in databases and RDF. In the next chapter, we briefly introduce fuzzy set
theory and fuzzy database models.
References
Abadi, D. J., Marcus, A., Madden, S., & Hollenbach, K. (2007). Scalable semantic web data manage-
ment using vertical partitioning. In Proceedings of the 33rd International Conference on Very
Large Data Bases (pp. 411–422).
Abadi, D. J., Marcus, A., Madden, S., & Hollenbach, K. (2009). SW-store: A vertically partitioned
DBMS for semantic web data management. VLDB Journal, 18(2), 385–406.
Ali, W., Saleem, M., Yao, B., Hogan, A., & Ngomo, A. C. N. (2021). A survey of RDF stores &
SPARQL engines for querying knowledge graphs. The VLDB Journal, 1–26.
Angles, R., & Gutierrez, C. (2005). Querying RDF data from a graph database perspective. In
Proceedings of the Second European Semantic Web Conference (pp. 346–360).
Angles, R., & Gutierrez, C. (2008). Survey of graph database models. ACM Computing Surveys,
40, 1:1–1:39.
Arenas, M., Gutierrez, C., & Pérez, J. (2009, August). Foundations of RDF databases. In Reasoning
Web International Summer School (pp. 158–204). Springer.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284(5),
34–43.
Bishop, B., Kiryakov, A., Ognyanoff, D., Peikov, I., Tashev, Z., & Velkov, R. (2011). OWLIM: A
family of scalable semantic repositories. Semantic Web, 2(1), 1–10.
Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data—The story so far. International Journal
of Semantic Web and Information Systems, 5(3), 1–22.
Bönström, V., Hinze, A., & Schweppe, H. (2003). Storing RDF as a graph. In Proceedings of the
First Conference on Latin American Web Congress, 27–36.
Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., & Bhat-
tacharjee, B. (2013). Building an efficient RDF store over a relational database. In Proceedings
of the 2013 ACM International Conference on Management of Data (pp. 121–132).
Brickley, D., & Guha, R. V. (2004). RDF Vocabulary Description Language 1.0: RDF Schema,
W3C Recommendation.
Broekstra, J., Kampman, A., & van Harmelen, F. (2002). Sesame: a generic architecture for storing
and querying RDF and RDF schema. In Proceedings of the 2002 International Semantic Web
Conference (pp. 54–68).
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes,
A., & Gruber, R. E. (2008). BigTable: A distributed storage system for structured data. ACM
Transactions on Computer Systems 26(2), 4:1–4:26.
Chao, C.-M. (2007a). An object-oriented approach for storing and retrieving RDF/RDFS documents.
Tamkang Journal of Science and Engineering, 10(3), 275–286.
Chao, C.-M. (2007b). An object-oriented approach to storage and retrieval of RDF/XML documents.
In Proceedings of the 19th International Conference on Software Engineering & Knowledge
Engineering (pp. 586–591).
Chebotko, A., Abraham, J., Brazier, P., Piazza, A., Kashlev, A., & Lu, S. (2013). Storing, indexing
and querying large provenance data sets as RDF graphs in Apache HBase. In Proceedings of
IEEE Ninth World Congress on Services (pp. 1–8).
Choi, P., Jung, J., & Lee, K.-H. (2013). RDFChain: Chain centric storage for scalable join processing
of RDF graphs using MapReduce and HBase. In Proceeding of the 2013 International Semantic
Web Conference (pp. 249–252).
Ciaccia, P., Patella, M., & Zezula, P. (1997, August). M-tree: An efficient access method for similarity
search in metric spaces. In Proceedings of the International Conference on Very Large Data Bases (VLDB) (pp. 426–435).
Cudre-Mauroux, P., Enchev, I., Fundatureanu, S., Groth, P., Haque, A., Harth, A., Keppmann, F. L.,
Miranker, D. P., Sequeda, J. F., & Wylot, M. (2013). NoSQL databases for RDF: An empirical
evaluation. In Proceedings of the 12th International Semantic Web Conference (pp. 310–325).
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubrama-
nian, S., Vosshall, P., & Vogels, W. (2007). Dynamo: Amazon’s highly available key-value store.
In Proceedings of the 21st ACM Symposium on Operating Systems Principles (pp. 205–220).
De Virgilio, R. (2017). Smart RDF data storage in graph databases. In 2017 17th IEEE/ACM
International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (pp. 872–881).
IEEE.
Duan, S., Kementsietsidis, A., Srinivas, K., & Udrea, O. (2011). Apples and oranges: A compar-
ison of RDF benchmarks and real RDF datasets. In Proceedings of the 2011 ACM SIGMOD
International Conference on Management of Data (pp. 145–156).
Erling, O., & Mikhailov, I. (2007). RDF support in the Virtuoso DBMS. In Proceedings of the 1st
Conference on Social Semantic Web (pp. 59–68).
Erling, O., & Mikhailov, I. (2009). Virtuoso: RDF support in a native RDBMS. In R. De Virgilio, F.
Giunchiglia, & L. Tanca (Eds.), Semantic Web Information Management (pp. 501–519). Springer.
Franke, C., Morin, S., Chebotko, A., Abraham, J., & Brazier, P. (2011). Distributed semantic web
data management in HBase and MySQL Cluster. In Proceedings of the 2011 IEEE International
Conference on Cloud Computing (pp. 105–112).
Grolinger, K., Higashino, W. A., Tiwari, A., & Capretz, M. A. M. (2013). Data management in
cloud environments: NoSQL and NewSQL data stores. Journal of Cloud Computing: Advances,
Systems and Applications, 2, 22.
Gueret, C., Kotoulas, S., & Groth, P. (2011). TripleCloud: an infrastructure for exploratory querying
over web-scale RDF data. In Proceedings of the 2011 IEEE/WIC/ACM International Joint
Conference on Web Intelligence and Intelligent Agent Technology—Workshops (pp. 245–248).
Gutierrez, C., Hurtado, C. A., Mendelzon, A. O., & Pérez, J. (2011). Foundations of semantic web
databases. Journal of Computer and System Sciences, 77(3), 520–541.
Harris, S., & Gibbins, N. (2003). 3store: Efficient bulk RDF storage. In Proceedings of the First
International Workshop on Practical and Scalable Semantic Systems.
Harris, S., Lamb, N., & Shadbolt, N. (2009). 4store: The design and implementation of a clus-
tered RDF store. In Proceedings of the 5th International Workshop on Scalable Semantic Web
Knowledge Base Systems (pp. 94–109).
Harris, S., & Shadbolt, N. (2005). SPARQL query processing with conventional relational database
systems. In Proceedings of the International Workshop on Scalable Semantic Web Knowledge
Base Systems (pp. 235–244).
Hartig, O. (2014). Reconciliation of RDF and Property Graphs. arXiv preprint arXiv:1409.3288
Hassanzadeh, O., Kementsietsidis, A., & Velegrakis, Y. (2012). Data management issues on the
semantic web. In Proceedings of the 2012 IEEE International Conference on Data Engineering
(pp. 1204–1206).
Hayes, J., & Gutierrez, C. (2004). Bipartite graphs as intermediate model for RDF. In Proceedings
of the 2004 International Semantic Web Conference (pp. 47–61).
Hayes, P. (2004). RDF Semantics, W3C Recommendation. http://www.w3.org/TR/rdf-mt/
Hu, X., Dang, D., Yao, Y., & Ye, L. (2017). Natural language aggregate query over RDF data.
Information Sciences, 454–455, 363–381.
Khadilkar, V., Kantarcioglu, M., Thuraisingham, B. M., & Castagna, P. (2012). Jena-HBase: A
distributed, scalable and efficient RDF triple store. In Proceedings of the 2012 International
Semantic Web Conference.
Lakshman, A., & Malik, P. (2010). Cassandra: A decentralized structured storage system. ACM
SIGOPS Operating Systems Review, 44(2), 35–40.
Levandoski, J. J., & Mokbel, M. F. (2009). RDF data-centric storage. In Proceedings of the 2009
IEEE International Conference on Web Services (pp. 911–918).
Libkin, L., Reutter, J. L., & Vrgoc, D. (2013). TriAL for RDF: Adapting graph query languages for
RDF data. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems (pp. 201–212).
Luo, Y., Picalausa, F., Fletcher, G. H. L., Hidders, J., & Vansummeren, S. (2012). Storing and
indexing massive RDF datasets. In R. De Virgilio, F. Guerra, & Y. Velegrakis (Eds.), Semantic
Search Over the Web (pp. 31–60). Springer.
Ma, R., Jia, X., Cheng, J., & Angryk, R. A. (2016a). SPARQL queries on RDF with fuzzy constraints
and preferences. Journal of Intelligent & Fuzzy Systems, 30(1), 183–195.
Ma, Z., Capretz, M. A., & Yan, L. (2016b). Storing massive resource description framework (RDF)
data: A survey. The Knowledge Engineering Review, 31(4), 391–413.
Ma, Z., Lin, X., Yan, L., & Zhao, Z. (2018). RDF keyword search by query computation. Journal
of Database Management (JDM), 29(4), 1–27.
Ma, Z. M., Capretz, M. A. M., & Yan, L. (2016c). Storing massive resource description framework
(RDF) data: A survey. Knowledge Engineering Review, 31(4), 391–413.
Manola, F., & Miller, E. (2004). RDF Primer, W3C Recommendation. http://www.w3.org/TR/2004/
REC-rdf-primer-20040210/.
Sun, J. L., & Jin, Q. (2010). Scalable RDF store based on HBase and MapReduce. In Proceedings
of the 3rd International Conference on Advanced Computer Theory and Engineering (pp. V1-633–
V1-636).
Wang, Y., Du, X. Y., Lu, J. H., & Wang, X. F. (2010). FlexTable: using a dynamic relation model
to store RDF data. In Proceedings of the 15th International Conference on Database Systems for
Advanced Applications (pp. 580–594).
Weiss, C., Karras, P., & Bernstein, A. (2008). Hexastore: Sextuple indexing for semantic web data
management. Proceedings of the VLDB Endowment, 1(1), 1008–1019.
Wilkinson, K. (2006). Jena property table implementation. Technical Report HPL-2006-140, HP
Labs.
Wilkinson, K., Sayers, C., Kuno, H. A., & Reynolds, D. (2003). Efficient RDF storage and retrieval
in Jena2. In Semantic Web and Databases Workshop (pp. 131–150).
Wu, G., Li, J., & Wang, K. (2008, April). System Π: A hypergraph based native RDF repository. In
Proceedings of the 17th international Conference on World Wide Web (pp. 1035–1036).
Wolff, B. G. J., Fletcher, G. H. L., & Lu, J. J. (2015). An extensible framework for query optimization
on TripleT-based RDF stores. In Proceedings of the Workshops of the EDBT/ICDT 2015 Joint
Conference (pp. 190–196).
Yan, L., Ma, R., Li, D., & Cheng, J. (2017). RDF approximate queries based on semantic similarity.
Computing, 99(5), 481–491.
Yan, Y., Wang, C., Zhou, A., Qian, W., Ma, L., & Pan, Y. (2009). Efficient indices using graph parti-
tioning in RDF triple stores. In 2009 IEEE 25th International Conference on Data Engineering
(pp. 1263–1266).
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Zeng, K., Yang, J. C., Wang, H. X., Shao, B., & Wang, Z. Y. (2013). A distributed graph engine for
web scale RDF data. Proceedings of the VLDB Endowment, 6(4), 265–276.
Zou, L., Mo, J., Chen, L., Özsu, M. T., & Zhao, D. (2011). gStore: Answering SPARQL queries
via subgraph matching. Proceedings of the VLDB Endowment, 4(8), 482–493.
Chapter 2
Fuzzy Sets and Fuzzy Database Modeling
2.1 Introduction
to invent some new fuzzy data models like semi-structured and graph data models.
Being one kind of special graph data model, RDF recommended by the W3C is
finding more and more uses in a wide range of semantic data management scenarios.
To represent and deal with fuzziness in RDF data, a few efforts have proposed
fuzzy RDF models. The elementary construct of the RDF model is a triple of the form
(subject, predicate, object), which encodes the binary relation predicate between
subject and object, representing a single knowledge fact. The most common fuzzy
RDF model associates triples with membership degrees (Manolis &
Tzitzikas, 2011; Straccia, 2009). Here the fuzzy RDF triples represent fuzziness
at a triple-level granularity, and it is hard to know exactly the fuzziness of a triple's
components. To tackle this, a kind of fuzzy RDF model is proposed in Ma et al.
(2018), in which fuzziness can appear in a triple's components. Based
on such a fuzzy RDF model with a fine granularity of fuzziness, a few recent efforts
investigate fuzzy RDF graph matching (Li et al., 2019a, 2019b, 2019c) and fuzzy
RDF graph storage (Fan et al., 2019, 2020; Ma et al., 2018).
In this chapter, we mainly introduce several fuzzy database models, including
the fuzzy XML model and the fuzzy relational and fuzzy object-oriented database models.
These models can be mapped to and from fuzzy RDF models in order to realize fuzzy
data management in many areas, such as database and Web-based application domains.
Before that, we briefly introduce some notions of fuzzy set
theory.
There are different categories of data quality (or the lack thereof) to be handled. Some
efforts try to identify and distinguish different types and sources of imperfect infor-
mation. According to Parsons (1996), imperfect information can be imprecise, vague,
inconsistent, incomplete, and/or uncertain. Bosc and Prade (1993) identify five basic
kinds of imperfection: inconsistency, imprecision, vagueness, uncertainty, and ambiguity.
In the following, we explain the meanings of these kinds of imperfect information.
(a) Inconsistency stands for a kind of semantic conflict, which means that the same
aspect of a real-world entity is irreconcilably represented more than once in data
resource(s). For example, the height value of one person is recorded as several
values with different scales (say, 1.78 m, 178.40 cm and 5.85 ft).
(b) Imprecision means that we must make a choice from a given range of values
without knowing which one should be chosen. This range is basically repre-
sented by an interval or a set of values. For example, we do not know the exact
height value of one person but know that it must be one of several values (say,
1.77 m, 1.78 m and 1.79 m).
(c) Vagueness has a similar semantics with imprecision but is generally represented
with linguistic terms. For example, “between 20 and 30 years old” and “young”
for the attribute Age are imprecise and vague values, respectively.
(d) Incompleteness means the information for which some data are missing. We
completely have no idea, for example, how tall one person is. Generally, incom-
plete information can be described by null values (Cross, 1996; Cross & Firat,
2000).
(e) Uncertainty means that we apportion some (maybe not all) of our belief to a value
or a group of values, which is related to the degree of truth. For example, a possi-
bility degree of 95% is assigned to the height value (say 1.78 m) of one person.
Note that this book concentrates on subjective uncertainty described with
possibility theory rather than stochastic uncertainty described with probability
theory.
(f) Ambiguity means that some elements of a model lack complete semantics,
which can lead to several possible interpretations. For example, a length value 3
without the necessary semantics may be interpreted as a time length, a distance length,
and so on. If it is a time length, it may be interpreted as 3 days, 3 h, 3 min, or 3 s.
In general, several different kinds of imperfect information can co-exist with
respect to the same piece of information. In addition, imprecise values generally
denote a set of values in the form of {ai1, ai2, …, aim} or an interval [ai1, ai2] for the discrete
and continuous universe of discourse, respectively, meaning that exactly one of the
values is the true value for the single-valued attribute, or at least one of the values is
the true value for the multivalued attribute. So, imprecise information here has two
interpretations: disjunctive information and conjunctive information.
Null values, which are originally called incomplete information, have several
possible interpretations: (a) “existing but unknown”, (b) “nonexisting” or
“inapplicable”, (c) “no information” and (d) “open null value” (Gottlob & Zicari,
1988), which means that the value may not exist, may be exactly one unknown value, or
may be several unknown values. An imprecise value can be considered as a particular case
of the null value with the semantics of “existent but unknown” (i.e., an applicable
null value), where the range of values that an imprecise value takes is restricted to a
given set or interval of values while the range of values that an applicable null value
takes corresponds to the corresponding universe of discourse.
The notion of a partial value is illustrated as follows (Grant, 1979). A partial value
on a universe of discourse U corresponds to a finite set of possible values in which
exactly one of the values in the set is the true value, denoted by {a1, a2, …, am} for
discrete U or [a1, an] for continuous U, in which {a1, a2, …, am} ⊆ U or [a1, an] ⊆ U.
Let η be a partial value; then sub(η) and sup(η) are used to represent the minimum
and maximum elements of the set, respectively.
Note that crisp data can also be viewed as special cases of partial values. A crisp
datum on a discrete universe of discourse can be represented in the form of {p}, and a
crisp datum on a continuous universe of discourse can be represented in the form of [p, p].
Moreover, a partial value that contains no element is called an empty partial
value, denoted by ⊥. In fact, the symbol ⊥ denotes inapplicable missing data (Codd,
1986, 1987). Null values, partial values, and crisp values are thus represented with a
uniform format.
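The following small sketch, assuming a discrete universe of discourse, illustrates this uniform treatment of crisp values, partial values, and the empty partial value ⊥; the helper names sub and sup follow the notation above but the data are invented for illustration.

# Partial values on a discrete universe: a finite set of candidates, exactly one of
# which is true; a crisp value p is the singleton {p}; the empty set plays the role of ⊥.

def sub(eta):
    """Minimum element of a partial value (undefined for the empty partial value)."""
    return min(eta) if eta else None

def sup(eta):
    """Maximum element of a partial value."""
    return max(eta) if eta else None

height = {177, 178, 179}   # imprecise: one of these values is the true height (cm)
age = {25}                 # crisp value represented as a partial value
missing = set()            # inapplicable missing data (⊥)

print(sub(height), sup(height))   # 177 179
print(sub(age), sup(age))         # 25 25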
The fuzzy set was originally presented by Zadeh (1965). Since then fuzzy set has
been infiltrating into almost all branches of pure and applied mathematics that are
set-theory-based. This has resulted in a vast number of real applications crossing
over a broad realm of domains and disciplines. Over the years, many of the existing
approaches dealing with imprecision and uncertainty are based on the theory of fuzzy
sets.
Let U be a universe of discourse. A fuzzy value on U is characterized by a fuzzy
set F in U. A membership function
μ F : U → [0, 1]
is defined for the fuzzy set F, where μF (u), for each u ∈ U, denotes the degree
of membership of u in the fuzzy set F. For example, μF (u) = 0.8 means that u is
“likely” to be an element of F by a degree of 0.8. For ease of representation, a fuzzy
set F over universe U is organized into a set of ordered pairs: F = {(u, μF(u)) | u ∈ U}.
Kernel: The set of the elements that completely belong to F is called the kernel of
F, denoted by ker(F) = {u ∈ U | μF(u) = 1}.
Cut: The set of the elements whose degrees of membership in F are greater than
(greater than or equal to) α, where 0 ≤ α < 1 (0 < α ≤ 1), is called the strong (weak)
α-cut of F, respectively denoted by Fα+ = {u ∈ U | μF(u) > α} and Fα = {u ∈ U | μF(u) ≥ α}.
The complement of a fuzzy set A, denoted Ā, is defined by
∀u ∈ U, μĀ(u) = 1 − μA(u).
Based on these definitions, the difference of the fuzzy sets B and A can be defined
as:
B − A = B ∩ Ā.
Also, most of the properties that hold for classical set operations, such as
De Morgan's laws, have been shown to hold for fuzzy sets. The only law of ordinary
set theory that is no longer true is the law of the excluded middle, i.e.,
A ∩ Ā ≠ ∅ and A ∪ Ā ≠ U.
The Cartesian product of fuzzy sets A1, …, An on universes U1, …, Un is defined by
μA1×…×An(u1, …, un) = min(μA1(u1), …, μAn(un)), where ui ∈ Ui, i = 1, …, n.
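The following minimal sketch represents a fuzzy set over a finite universe as a Python dictionary from elements to membership degrees and implements the kernel, the strong and weak α-cuts, the complement, and the min-based Cartesian product defined above; the sample degrees reuse those of σ in Example 2.1 below.

# Fuzzy sets over a finite universe, represented as dicts u -> membership degree.
from itertools import product

U = {"a", "b", "c"}
F = {"a": 0.5, "b": 1.0, "c": 0.8}

def kernel(F):
    """Elements that completely belong to F (membership degree 1)."""
    return {u for u, m in F.items() if m == 1.0}

def strong_cut(F, alpha):
    """Strong alpha-cut: membership strictly greater than alpha."""
    return {u for u, m in F.items() if m > alpha}

def weak_cut(F, alpha):
    """Weak alpha-cut: membership greater than or equal to alpha."""
    return {u for u, m in F.items() if m >= alpha}

def complement(F, U):
    """mu_notF(u) = 1 - mu_F(u) for every u in the universe U."""
    return {u: 1.0 - F.get(u, 0.0) for u in U}

def cartesian(F1, F2):
    """mu_{F1 x F2}(u1, u2) = min(mu_F1(u1), mu_F2(u2))."""
    return {(u1, u2): min(m1, m2)
            for (u1, m1), (u2, m2) in product(F1.items(), F2.items())}

print(kernel(F))          # only 'b' belongs completely
print(weak_cut(F, 0.8))   # elements with membership >= 0.8: 'b' and 'c'
print(complement(F, U))   # membership degrees 0.5, 0.0 and 0.2 for a, b, c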
Example 2.1 Let V = {a, b, c}. Define the fuzzy set σ on V as σ (a) = 0.5, σ (b) =
1 and σ (c) = 0.8. Define a fuzzy set μ of E such that μ(ab) = 0.5, μ(bc) = 0.7 and
μ(ac) = 0.1. Then μ(x, y) ≤ σ (x) ∧ σ (y) for all x, y ∈ V. Thus, G = (σ, μ) is a fuzzy
graph. If we redefine μ(ab) = 0.6, then it is no longer a fuzzy graph.
Let G = (σ, μ) be a fuzzy graph. Then a fuzzy graph G' = (σ ', μ' ) is called a
partial fuzzy subgraph of G if σ ' ⊆ σ and μ' ⊆ μ. Similarly, the fuzzy graph G' =
(σ ', μ' ) is called a fuzzy subgraph of G induced by P if P ⊆ V, σ ' (u) = σ (u) for
every u ∈ P and μ' (e) = μ(e) for every e ∈ E. We write <P> to denote the fuzzy
subgraph induced by P.
Example 2.2 Let G1 = (σ, μ), where σ * = {a, b, c} and μ* = {ab, bc} with σ (a)
= 0.4, σ (b) = 0.8, σ (c) = 0.5, μ(ab) = 0.3 and μ(bc) = 0.2. Then clearly G1 is a
partial fuzzy subgraph of the fuzzy graph in Example 2.1. Also, if P = {a, b} and H
= (σ, μ), where σ (a) = 0.5, σ (b) = 1 and μ(ab) = 0.5, then H is the induced fuzzy
subgraph of G in Example 2.1, induced by P.
Let G = (σ, μ) be a fuzzy graph. Then a partial fuzzy subgraph G' = (σ ', μ' ) of
G is said to span G if σ ' = σ and μ' ⊆ μ; that is if σ ' (u) = σ (u) for every u ∈ V and
μ' (e) ≤ μ(e) for every e ∈ E. In this case, we call G' = (σ ', μ' ) a spanning fuzzy
subgraph of G.
In fact a fuzzy subgraph G' = (σ ', μ' ) of a fuzzy graph G = (σ, μ) induced by a
subset P of V is a particular partial fuzzy subgraph of G. Take σ'(u) = σ(u) for all u
∈ P and 0 for all u ∉ P. Similarly, take μ'(u1, u2) = μ(u1, u2) if (u1, u2) is in the set of
edges involving elements from P, and 0 otherwise.
Let G: (σ, μ) and G': (σ', μ') be fuzzy graphs with underlying vertex sets V and V',
respectively. A homomorphism of fuzzy graphs (Holub & Melichar, 1998) h: G → G' is a
map h: V → V' which satisfies σ(x) ≤ σ'(h(x)) for all x ∈ V and μ(x, y) ≤ μ'(h(x), h(y)) for all x, y ∈ V.
We mainly discuss the concepts of fuzzy path and fuzzy bridges in this subsection.
Most of the results are due to the works (Sunitha & Vijayakumar, 2005; Mathew et al.,
2018).
Let G: (σ, μ) be a fuzzy graph. If μ (x, y) > 0, then x and y are called neighbors.
Then x and y lie on the edge e = (x, y). A path p in a fuzzy graph G: (σ, μ) is a
sequence of distinct vertices v0 , v1 , v2 , …, vn such that μ(vi-1 , vi ) > 0, 1 ≤ i ≤ n. Here
‘n’ is called the length of the path. The consecutive pairs (vi-1 , vi ) are called arcs of
the path. The diameter of x, y ∈ V, written diam(x, y), is the length of the longest
path joining x to y. The strength of P is defined to be min{μ(vi−1, vi) | 1 ≤ i ≤ n}. In words, the
strength of a path is defined to be the weight of the weakest edge. We denote the
strength of a path P by d(P). The strength of connectedness between two vertices x
and y is defined as the maximum of the strengths of all paths between x and y and
is denoted by μ∞ (x, y). A strongest path joining any two vertices x, y has strength
μ∞ (x, y). Two vertices that are joined by a path are called connected. It follows that
this notion of connectedness is an equivalence relation. The equivalence classes of
vertices under this equivalence relation are called connected components of the given
fuzzy graph. They are just its maximal connected partial fuzzy subgraphs.
In order to manage fuzzy data in the databases, fuzzy set theory has been exten-
sively applied to extend various database models and resulted in numerous contri-
butions, mainly with respect to the popular relational model or to some related form
of it. In general, several basic approaches can be distinguished: (i) one approach to
fuzzy relational databases is based on possibility distributions (Chaudhry et al., 1999;
Prade & Testemale, 1984; Umano & Fukami, 1994); (ii) another is based on the use
of similarity relations (Buckles & Petry, 1982), proximity relations (De et al., 2001;
Shenoi & Melton, 1999), resemblance relation (Rundensteiner & Bic, 1992), or fuzzy
relation (Raju & Majumdar, 1988); (iii) another possible extension is to combine
possibility distribution and similarity (proximity or resemblance) relation (Chen
et al., 1992; Ma & Mili, 2002; Ma et al., 2000). Currently, some major questions
have been discussed and answered in the literature on fuzzy relational databases,
including representations and models, semantic measures and data redundancies,
query and data processing, data dependencies and normalizations, implementation,
and so on. For a comprehensive review of what has been done in the development
of fuzzy relational databases, please refer to Chen (1999), Ma and Yan (2008), Ma
(2005b), Petry (1996), Yazici and George (1999). In this section, we briefly introduce
some basic notions of fuzzy relational databases based on possibility distributions.
A relation is a two-dimensional table whose rows and columns are called tuples
and attributes, respectively. So, a relation is a set of tuples and a tuple consists
of attribute values. A relation has its relational schema, which is a set of attributes.
Each attribute corresponds to a range of values that this attribute can take and this
range is called the domain of the attribute.
Basically, a fuzzy relational database (FRDB) is based on the notions of fuzzy
relational schema, fuzzy relational instance, tuple, key, and constraints, which are
introduced briefly as follows:
• A fuzzy relational database consists of a set of fuzzy relational schemas and a set
of fuzzy relational instances (i.e., simply fuzzy relations).
• The set of fuzzy relational schemas specifies the structure of the data held in
a database. A fuzzy relational schema consists of a fixed set of attributes with
associated domains. The information of a domain is implied in forms of schemas,
attributes, keys, and referential integrity constraints.
• The set of fuzzy relations, which is considered to be an instance of the set of fuzzy
relation schemas, reflects the real state of a database. Formally, a fuzzy relation
is a two-dimensional array of rows and columns, where each column represents
an attribute and each row represents a tuple.
• Each tuple in a table denotes an individual in the real world identified uniquely
by primary key, and a foreign key is used to ensure the data integrity of a table. A
column (or columns) in a table that makes a row in the table distinguishable from
other rows in the same table is called the primary key. A column (or columns) in a
table that draws its values from a primary key column in another table is called the
foreign key. As is generally assumed in the literature, we assume that the primary
key attribute is always crisp and all fuzzy relations are in the third normal form.
• An integrity constraint in a schema is a predicate over relations expressing a
constraint; by far the most used integrity constraint is the referential integrity
constraint. A referential integrity constraint involves two sets of attributes S 1 and
S 2 in two relations R1 and R2 , such that one of the sets (say S 1 ) is a key for one of
the relations (called primary key). The other set is called a foreign key if R2 [S 2 ]
is a subset of R1 [S 1 ]. Referential integrity constraints are the glue that holds the
relations in a database together.
In summary, in a fuzzy relational database, the structure of the data is represented
by a set of fuzzy relational schemas, and data are stored in fuzzy relations (i.e.,
tables). Each table contains rows (i.e., tuples) and columns (i.e., attributes). Each
tuple is identified uniquely by the primary key. The relationships among relations
are represented by the referential integrity constraints, i.e., foreign keys. Moreover,
here, two types of fuzziness are considered in fuzzy relational databases, one is the
fuzziness of attribute values (i.e., attributes may be fuzzy), which may be represented
by possibility distributions; another is the fuzziness of a tuple being a member of
the corresponding relation, which is represented by a membership degree associated
with the tuple.
Formally, a fuzzy relational database FRDB = <FS, FR> consists of a set of fuzzy
relational schemas FS and a set of fuzzy relations FR, where:
• Each fuzzy relational schema FS can be represented formally as FR (A1/D1, A2/D2,
…, An/Dn, μFR/DFR), which denotes that a fuzzy relation FR has attributes A1, A2,
…, An and μFR with associated data types D1 , D2 , …, Dn and DFR . Here, μFR is
an additional attribute for representing the membership degree of a tuple to the
fuzzy relation.
• Each fuzzy relation FR on a fuzzy relational schema FR (A1/D1, A2/D2, …, An/Dn,
μFR/DFR) is a subset of the Cartesian product Dom (A1) × Dom (A2) × … ×
Dom (An) × Dom (μFR), where Dom (Ai) may be a fuzzy subset or even a set
of fuzzy subsets and Dom (μFR) = (0, 1]. Here, Dom (Ai) denotes the domain of
attribute Ai , and each element of the domain satisfies the constraint of the datatype
Di . Formally, each tuple in FR has the form t = <πA1 , πA2 , …, πAi , …, πAn , μFR >,
where the value of an attribute Ai may be represented by a possibility distribution
πAi , and μFR ∈ (0, 1].
Moreover, a resemblance relation Res on Dom (Ai ) is a mapping: Dom (Ai ) ×
Dom (Ai ) → [0, 1] such that (i) for all x in Dom (Ai ), Res (x, x) = 1 (reflexivity) (ii)
for all x, y in Dom (Ai ), Res (x, y) = Res (y, x) (symmetry).
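A minimal sketch of a fuzzy relation in this sense follows, with possibility-distribution attribute values and a tuple membership degree μFR; the Employee schema, its data, and the select helper are invented for illustration and are not the example of Tables 2.1 and 2.2.

# A fuzzy relation sketched as a list of tuples. Fuzzy attribute values are
# possibility distributions (value -> possibility degree); the extra attribute
# mu_FR is the membership degree of the whole tuple in the relation.

employee = [
    {"EID": "e1", "Name": "Tom",
     "Age": {27: 1.0, 28: 0.9, 29: 0.7},   # fuzzy value as a possibility distribution
     "mu_FR": 1.0},
    {"EID": "e2", "Name": "Mary",
     "Age": {35: 1.0},                      # crisp value as a degenerate distribution
     "mu_FR": 0.8},                         # tuple only partially belongs to the relation
]

def select(relation, predicate, threshold=0.0):
    """Return tuples satisfying predicate whose membership degree exceeds threshold."""
    return [t for t in relation if t["mu_FR"] > threshold and predicate(t)]

young = select(employee, lambda t: max(t["Age"]) < 30)
print([t["Name"] for t in young])   # ['Tom']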
To provide intuition on the fuzzy relational database, we show an example. The
following gives a fuzzy relational database modeling parts of the reality at a company,
including fuzzy relational schemas in Table 2.1 and fuzzy relations in Table 2.2. The
detailed introduction is as follows:
• The attribute underlined stands for primary key PK. The foreign key (FK) is
followed by the parenthesized relation called referenced relation. A relation can
have several candidate keys from which one primary key, denoted PK, is chosen.
• An ‘f ’ next to an attribute means that the attribute is fuzzy.
• In Table 2.1, there are the inheritance relationships Chief-Leader “is-a” Leader
and Young-Employee “is-a” Employee. There is a 1-many relationship between
Department and Young-Employee. The relation Supervise is a relationship rela-
tion, and there is a many-many relationship between Chief-Leader and Young-
Employee.
• Note that a relation is different from a relationship. A relation is essentially a
table, and a relationship is a way to correlate, join, or associate the two tables.
2.4 Fuzzy Object-Oriented Database Models
Some real-world applications (e.g., CAD/CAM, multimedia, and GIS) characteristically
require the modeling and manipulation of complex objects and semantic relationships.
It has been shown that the object-oriented paradigm lends itself extremely well to
these requirements. Since the classical relational database model and its fuzzy
extensions do not satisfy the need for modeling complex objects with imprecision
and uncertainty, much research has concentrated on
fuzzy object-oriented database models in order to deal with complex objects and
uncertain data together. Zicari and Milano (1990) first introduced incomplete
information, namely null values, into object-oriented databases, where incomplete
schemas and incomplete objects can be distinguished. From then on, the incorporation
of imprecise and uncertain information in object-oriented databases has increasingly
received attention. A fuzzy
object-oriented database model was defined in Bordogna and Pasi (2001) based on
the extension of a graphs-based object model. Based on similarity relationship, uncer-
tainty management issues in the object-oriented database model were discussed in
George et al. (1996). Based on possibility theory, vagueness and uncertainty were
represented in class hierarchies in Dubois et al. (1991). In more detail, also based
on possibility distribution theory, Ma et al. (2004) introduced fuzzy object-oriented
database models, in which major notions such as objects, classes, object-class
relationships, and subclass/superclass relationships were extended under a fuzzy
information environment. Moreover, other fuzzy extensions of object-oriented databases
were developed. In Marín et al. (2000, 2001), fuzzy types were added into fuzzy
object-oriented databases to manage vague structures. The fuzzy relationships and
fuzzy behavior in fuzzy object-oriented database models were discussed in Cross
(2001), Gyseghem and Caluwe (1995). Several intelligent fuzzy object-oriented
database architectures were proposed in Koyuncu and Yazici (2003), Ndouse (1997),
Ozgur et al. (2009). Other efforts to model fuzziness and uncertainty in
object-oriented database models were made in Lee et al. (1999), Majumdar et al.
(2002), Umano et al. (1998). The fuzzy and probabilistic object bases (Cao &
Rossiter, 2003; Nam et al., 2007), fuzzy deductive object-oriented databases (Yazici
and Koyuncu, 1997), and fuzzy object-relational databases (Cubero et al., 2004)
were also developed. In addition, an object-oriented database modeling technique
was proposed based on the level-2 fuzzy sets in de Tré and de Caluwe (2003), where
the authors also discussed how the Object Data Management Group (ODMG) data
model can be generalized to handle fuzzy data in a more advantageous way. Also,
other efforts have been devoted to establishing a consistent framework for a
fuzzy object-oriented database model based on the standard for the ODMG object
data model (Cross et al., 1997). More recently, how to manage fuzziness on conven-
tional object-oriented platforms was introduced in Berzal et al. (2007). Yan and Ma
(2012) proposed an approach for comparing entities with fuzzy data types in
fuzzy object-oriented databases. Yan et al. (2012) investigated the algebraic
operations in fuzzy object-oriented databases, discussed fuzzy querying strategies,
and gave an SQL-like form of fuzzy querying for fuzzy object-oriented databases.
In this section, the basic notions of fuzzy object-oriented database (FOODB) models,
including fuzzy objects, fuzzy classes, fuzzy inheritance, and algebraic operations,
are introduced.
Objects model real-world entities or abstract concepts. Objects have properties that
may be attributes of the object itself or relationships also known as associations
between the object and one or more other objects. An object is fuzzy because of a
lack of information. For example, an object representing a part in a preliminary design
may be made of stainless steel, moulded steel, or alloy steel (each of these may be
associated with a possibility, say, 0.7, 0.5 and 0.9, respectively).
Formally, objects that have at least one attribute whose value is a fuzzy set are fuzzy
objects.
The fuzzy classes in fuzzy object-oriented databases are similar to the notion of the
fuzzy classes in fuzzy UML data models as introduced in Sect. 2.3.
The objects having the same properties are gathered into classes that are organized
into hierarchies. Theoretically, a class can be considered from two different view-
points (Dubois et al., 1991): (a) an extensional class, where the class is defined by
the list of its object instances, and (b) an intensional class, where the class is defined
by a set of attributes and their admissible values. In addition, a subclass defined from
its superclass by means of inheritance mechanism in the object-oriented database
(OODB) can be seen as the special case of (b) above.
Therefore, a class can be fuzzy for several reasons. First, some objects are fuzzy
ones with similar properties. A class defined by these objects may be fuzzy, and
these objects belong to the class with a membership degree in [0, 1].
Second, when a class is intensionally defined, the domain of an attribute may be
fuzzy and a fuzzy class is formed. For example, a class Old equipment is a fuzzy
one because the domain of its attribute Using period is a set of fuzzy values such as
long, very long, and about 20 years. Third, the subclass produced by a fuzzy class
by means of specialization and the superclass produced by some classes (in which
there is at least one class who is fuzzy) by means of generalization are also fuzzy.
The main difference between fuzzy classes and crisp classes is that the boundaries
of fuzzy classes are imprecise. The imprecision in the class boundaries is caused by
the imprecision of the values in the attribute domain. In the FOODB, classes are fuzzy
because their attribute domains are fuzzy. The issue that an object belongs to a class
only fuzzily arises when the class or the object is fuzzy. Similarly, because of class
fuzziness, a class may be a subclass of another class with a membership degree in [0, 1]. In the
OODB, the above-mentioned relationships are certain. Therefore, the evaluations of
fuzzy object-class relationships and fuzzy inheritance hierarchies are the cores of
information modeling in the FOODB.
In the FOODB, the following four situations can be distinguished for object-class
relationships.
(a) Crisp class and crisp object. This situation is the same as in the OODB, where the
object either belongs to the class or does not, with certainty. For example, the
object Car belongs to the class Vehicle, whereas the object Computer does not.
(b) Crisp class and fuzzy object. Although the class is precisely defined and has the
precise boundary, an object is fuzzy since its attribute value(s) may be fuzzy. In
this situation, the object may be related to the class with a degree in [0, 1]. For
example, an object whose position attribute may be graduate, research assistant,
or research assistant professor may belong to the class Faculty with some degree.
(c) Fuzzy class and crisp object. As in case (b), the object may belong to the class
with a membership degree in [0, 1]. For example, a Ph.D. student may belong to
the fuzzy class Young student with some degree.
(d) Fuzzy class and fuzzy object. In this situation, the object also belongs to the
class with a membership degree in [0, 1].
The object-class relationships in (b), (c) and (d) above are called fuzzy object-
class relationships. In fact, the situation in (a) can be seen as a special case of fuzzy
object-class relationships, where the membership degree of the object to the class is
one. It is clear that estimating the membership of an object to the class is crucial for
fuzzy object-class relationship when classes are instantiated.
In the OODB, determining if an object belongs to a class depends on whether its attribute
values are respectively included in the corresponding attribute domains of the class.
Similarly, in order to calculate the membership degree of an object to a class
in a fuzzy object-class relationship, it is necessary to evaluate the degrees to which the
attribute domains of the class include the attribute values of the object. However, it
should be noted that in a fuzzy object-class relationship, the inclusion degree of
object values with respect to the class domains alone is not sufficient for evaluating
the membership degree of an object to the class. The attributes play different roles
in the definition and identification of a class. Some may be dominant and some
not. Therefore, a weight w is assigned to each attribute of the class according to its
where SID (x, y) is used to calculate the degree to which fuzzy value x includes
fuzzy value y.
Case 2: o (Ai ) is a crisp value. Then
Consider a fuzzy class Young students with attributes Age and Height, and two
objects o1 and o2 . Assume cdom (Age) = {5 − 20}, fdom (Age) = {{1.0/20, 1.0/21,
0.7/22, 0.5/23}, {0.4/22, 0.6/23, 0.8/24, 1.0/25, 0.9/26, 0.8/27, 0.6/28}, {0.6/27,
0.8/28, 0.9/29, 1.0/30, 0.9/31, 0.6/32, 0.4/33, 0.2/34}}, and dom (Height) = cdom
(Height) = [60, 210]. Let o1 (Age) = 15, o2 (Age) = {0.6/25, 0.8/26, 1.0/27, 0.9/28,
0.7/29, 0.5/30, 0.3/31}, and o2 (Height) = 182. According to the definition above,
we have
Therefore,
Now, we define the formula to calculate the membership degree of the object o to
the class C as follows, where w (Ai (C)) denotes the weight of attribute Ai to class
C.
$$\mu_C(o) = \frac{\sum_{i=1}^{n} ID(dom(A_i), o(A_i)) \times w(A_i(C))}{\sum_{i=1}^{n} w(A_i(C))}$$
Consider the fuzzy class Young students and object o2 above. Assume w (Age
(Young students)) = 0.9 and w (Height (Young students)) = 0.2. Then
$$\mu_C(o) = \frac{\sum_{i=1}^{k} ID(dom(A_i), o(A_i)) \times w(A_i(C)) + \sum_{j=k+1}^{m} ID(dom(A_j), o(A'_j)) \times w(A_j(C))}{\sum_{i=1}^{m} w(A_i(C))}$$
Based on the direct object-class relationship and the indirect object-class rela-
tionship, now we focus on arbitrary object-class relationship. Let C be a class with
attributes {A1 , A2 , …, Ak , Ak+1 , …, Am , Am+1 , …, An } and o be an object on attributes
{A1 , A2 , …, Ak , A' k+1 , …, A' m , Bm+1 , …, Bp }. Here attributes A' k+1 , …, and A' m are
overridden from Ak+1 , …, and Am , or Ak+1 , …, and Am are overridden from A' k+1 ,
…, and A' m . Attributes Am+1 , …, and An and Bm+1 , …, Bp are special in {A1 , A2 ,
…, Ak , Ak+1 , …, Am , Am+1 , …, An } and {A1 , A2 , …, Ak , A' k+1 , …, A' m , Bm+1 , …,
Bp }, respectively. Then we have
$$\mu_C(o) = \frac{\sum_{i=1}^{k} ID(dom(A_i), o(A_i)) \times w(A_i(C)) + \sum_{j=k+1}^{m} ID(dom(A_j), o(A'_j)) \times w(A_j(C))}{\sum_{i=1}^{n} w(A_i(C))}$$
Since an object may belong to a class with membership degree in [0, 1] in fuzzy
object-class relationship, it is possible that an object that is in a direct object-class
relationship and an indirect object-class relationship simultaneously belongs to the
subclass and superclass with different membership degrees. This situation occurs in
fuzzy inheritance hierarchies, which will be investigated in next section. Also for two
classes that do not have subclass/superclass relationship, it is possible that an object
may belong to these two classes with different membership degrees simultaneously.
This situation only arises in fuzzy object-oriented databases. In the OODB, an object
either definitely belongs to a given class or definitely does not. If it belongs to a given class, it can
only belong to it uniquely (except for the case of subclass/superclass).
The situation where an object belongs to different classes with different member-
ship degrees simultaneously in fuzzy object-class relationships is called multiple
membership of object in this book. Now let us focus on how to handle the multiple
membership of object in fuzzy object-class relationships. Let C 1 and C 2 be (fuzzy)
classes and α be a given threshold. Assume there exists an object o. If μC1 (o) ≥ α
and μC2 (o) ≥ α, the conflict of the multiple membership of object occurs, namely,
o belongs to multiple classes simultaneously. At this moment, which one in C 1 and
C 2 is the class of object o depends on the following cases.
Case 1: There exists a direct object-class relationship between object o and one
class in C 1 and C 2 .
Then the class in the direct object-class relationship is the class of
object o.
Case 2: There is no direct object-class relationship but only an indirect object-class
relationship between object o and one class in C 1 and C 2 , say C 1 . And
there exists such subclass C 1 ' of C 1 that object o and C 1 ' are in a direct
object-class relationship.
Then class C 1 ' is the class of object o.
Assume
w (A (C 1 )) = w (A (C 2 )) = w (A (C 3 )),
w (B (C 1 )) = w (B (C 2 )), and
w (B (C 2 )) + w (D (C 2 )) = w (F (C 3 )).
Also assume μC1 (o) ≥ α, μC2 (o) ≥ α, and μC3 (o) ≥ α, where α is a given
threshold. Then object o belongs to classes C 1 , C 2 and C 3 simultaneously. The
conflict of the multiple membership of object occurs. It can be seen that the relation-
ship between o and C 1 is an indirect object-class relationship. But the relationship
between o and C 2 , which is the subclass of class C 1 , is not a direct object-class
relationship. So class C 2 is not the class of object o. It can also be seen that μC1 (o)
≥ μC2 (o) ≥ μC3 (o). So C 1 is considered as the class of object o. But in fact, there
should be a new class C with {A, B', E}, which is the class in the direct object-class
relationship of o and C. That μC1 (o) ≥ μC2 (o) ≥ μC3 (o) only means that C 1 with
{A, B} is more similar to C with {A, B', E} than C 2 with {A, B, E} and C 3 with {A,
F}. When class C is not available right now, class C 1 is considered as the class of
object o.
In the OODB, a new class, called subclass, is produced from another class, called
superclass by means of inheriting some attributes and methods of the superclass,
overriding some attributes and methods of the superclass, and defining some new
attributes and methods. Since a subclass is the specialization of the superclass, any
one object belonging to the subclass must belong to the superclass. This characteristic
can be used to determine if two classes have subclass/superclass relationship.
In the FOODB, however, classes may be fuzzy. A class produced from a fuzzy
class must be fuzzy. If the former is still called the subclass and the latter the
superclass, the subclass/superclass relationship is fuzzy. In other words, a class is
then a subclass of another class with a membership degree in [0, 1]. Correspondingly, the
method used in the OODB for determination of subclass/superclass relationship is
modified as
(a) for any (fuzzy) object, the membership degree with which it belongs to the subclass
is less than or equal to the membership degree with which it belongs to the superclass, and
(b) the membership degree with which it belongs to the subclass is greater than or equal
to the given threshold.
The subclass is then a subclass of the superclass with a membership degree that is the
minimum of the membership degrees with which these objects belong to the subclass.
Let C 1 and C 2 be (fuzzy) classes and β be a given threshold. We say C 2 is a
subclass of C 1 if, for any object o with μC2 (o) ≥ β, it holds that μC2 (o) ≤ μC1 (o).
Based on the inclusion degree of attribute domains of the subclass with respect
to the attribute domains of its superclass as well as the weight of attributes, we can
define the formula to calculate the degree to which a fuzzy class is a subclass of
another fuzzy class. Let C 1 and C 2 be (fuzzy) classes with attributes {A1 , A2 , …,
Ak , Ak+1 , …, Am } and {A1 , A2 , …, Ak , A' k+1 , …, A' m , Am+1 , …, An }, respectively,
and w (A) denote the weight of attribute A. Then the degree that C 2 is the subclass
of C 1 , written μ (C 1 , C 2 ), is defined as follows.
$$\mu(C_1, C_2) = \frac{\sum_{i=1}^{m} ID(dom(A_i(C_1)), dom(A_i(C_2))) \times w(A_i)}{\sum_{i=1}^{m} w(A_i)}$$
and dom (Ai (C2)) are not identical, however, a conflict occurs. At this moment, which
of Ai (C1) and Ai (C2) is inherited by C depends on the following rule:
if ID(dom(Ai (C1)), dom(Ai (C2))) × w(Ai (C1)) > ID(dom(Ai (C2)), dom(Ai (C1))) × w(Ai (C2)),
then Ai (C1) is inherited by C; otherwise, Ai (C2) is inherited by C.
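The following sketch combines the subclass-degree formula μ(C1, C2) with the attribute-inheritance rule above; all inclusion degrees, weights, and helper names are assumed values for illustration.

# (i) Degree to which C2 is a subclass of C1; (ii) resolving an attribute conflict
# under multiple inheritance. All ID values and weights below are assumed.

def subclass_degree(shared_attrs, ID, w):
    """mu(C1, C2) = sum_i ID(dom(Ai(C1)), dom(Ai(C2))) * w(Ai) / sum_i w(Ai)."""
    return sum(ID[a] * w[a] for a in shared_attrs) / sum(w[a] for a in shared_attrs)

def inherit_from(ID_c1_includes_c2, w_c1, ID_c2_includes_c1, w_c2):
    """Return which superclass's version of a conflicting attribute is inherited."""
    return "C1" if ID_c1_includes_c2 * w_c1 > ID_c2_includes_c1 * w_c2 else "C2"

ID = {"Age": 0.9, "Height": 1.0}   # assumed inclusion degrees between attribute domains
w = {"Age": 0.9, "Height": 0.2}    # assumed attribute weights
print(round(subclass_degree(["Age", "Height"], ID, w), 3))   # 0.918

print(inherit_from(0.7, 0.5, 0.6, 0.4))   # 'C1' (0.35 > 0.24)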
Note that in a fuzzy multiple inheritance hierarchy, the subclass may have different
degrees with respect to different superclasses, unlike the situation in classical
object-oriented database systems.
With the wide utilization of the Web and the availability of huge amounts of elec-
tronic data, information representation and exchange over the Web becomes impor-
tant, and eXtensible Markup Language (XML) has been the de facto standard (Bray
et al., 2000). XML and related standards are technologies that allow the easy develop-
ment of applications that exchange data over the Web such as e-commerce (EC) and
supply chain management (SCM). Unfortunately, although it is the current standard
for data representation and exchange over the Web, XML is not able to represent
and process imprecise and uncertain data. In fact, the fuzziness in EC and SCM has
received considerable attention, and fuzzy set theory has been used to implement
web-based business intelligence. Therefore, topics related to the modeling of fuzzy
data can be considered very interesting in the XML data context. Regarding modeling
fuzzy information in XML, Turowski and Weng (2002) extended XML DTDs with
fuzzy information to satisfy the need of information exchange. Lee and Fanjiang
(2003) studied how to model imprecise requirements with XML DTDs and devel-
oped a fuzzy object-oriented modeling technique schema based on XML. Ma and Yan
(2007) and Ma (2005a, 2005b) proposed a fuzzy XML model for representing fuzzy
information in XML documents. Tseng et al. (2005) presented an XML method to
represent fuzzy systems for facilitating collaborations in fuzzy applications. More-
over, aimed at modeling fuzzy information in XML Schemas, Gaurav and Alhajj
(2006) incorporated fuzziness in an XML document by extending the XML Schema
associated with the document and mapped fuzzy relational data into fuzzy XML. In
detail, Oliboni and Pozzani (2008) proposed an XML Schema definition for repre-
senting different aspects of fuzzy information. Kianmehr et al. (2010) described a
fuzzy XML schema model for representing a fuzzy relational database. In addition,
XML with incomplete information (Abiteboul et al., 2006) and probabilistic data in
XML (Nierman & Jagadish, 2002; Senellart & Abiteboul, 2007) were presented in
research papers.
Here, Poss (Z|Y ), Poss (Y|X) and Poss (X) can be obtained from the source XML
document.
2. the fuzziness in attribute values of elements: this kind of fuzziness uses possibility
distributions to represent the values of the attributes. Furthermore, attributes are
classified into two types:
(a) single value attributes: some data items are known to have a single unique
value, e.g., the age of a person in years is a unique integer, and if such a value
is unknown so far, we can use the following possibility distribution: {21/0.4,
23/0.5, 25/0.8, 26/0.9, 27/0.6, 28/0.5, 29/0.3}. This is called a disjunctive
possibility distribution.
(b) multiple value attributes: XML restricts attributes to a single value, but it
is often the case that some data item is known to have multiple values; these
values may be completely unknown and can be specified with a possi-
bility distribution. For example, the e-mail address of a person may be
multiple character strings because he or she has several e-mail addresses
available simultaneously. In case we do not have complete knowledge of
the e-mail address for Tom Smith, we may say that the e-mail address may be
“[email protected]” with possibility 0.60, and “[email protected]”
with possibility 0.85. This is called a conjunctive possibility distribution.
For ease of understanding, we interpret the above two kinds of fuzziness with a
simple fuzzy XML document d 1 in Fig. 2.1. In Fig. 2.1, we talk about the universities
in an area of a given city, say, Detroit, Michigan, in the USA.
(a) Wayne State University is located in downtown Detroit, and thus the possibility
that it is included in the universities in Detroit is 1.0. The pair <Val Poss = 1.0>
… </Val> is therefore omitted (see Lines 50–51).
(b) Oakland University, however, is located in a nearby county of Michigan, named
Oakland. Whether Oakland University is included in the universities in Detroit
depends on how to define the area of Detroit, the Greater Detroit Area or only
the City of Detroit. Assume that it is unknown and the possibility that Oakland
University is included in the universities in Detroit is assigned 0.8 (see Line 3).
Cases (a) and (b) illustrate fuzziness in elements. The degree associated with such an
element represents the possibility that the university is included in the universities in
Detroit.
(c) For the student Tom Smith, his age may be unknown so far, i.e., he has a fuzzy
value in the attribute age. Since age is known to have a single unique value, we can
use a disjunctive possibility distribution to represent such a value (see Lines
23–35).
(d) The e-mail address of Tom Smith may be multiple character strings because he
has several e-mail addresses simultaneously. If we do not know his exact e-mail
addresses, we can use a conjunctive possibility distribution to represent such
information and may say that the e-mail address may be “[email protected]”
with possibility 0.6 and “[email protected]” with possibility 0.45 (see Lines 37–
45). Note that cases (c) and (d) illustrate fuzziness in attribute values of elements.
In an XML document, it is often the case that some values of attributes may be
completely unknown and can be specified with possibility distributions.
To represent fuzzy data in XML documents, several fuzzy constructs (Poss, Val, and
Dist) are introduced. It is not difficult to see from the example given above that a
possibility attribute, denoted Poss, should be introduced first, which takes a value in
[0, 1]. This possibility attribute is applied together with a fuzzy construct called Val
to specify the possibility of a given element existing in the fuzzy XML document
(see Line 3 in Fig. 2.1). Based on the pair <Val Poss> … </Val>, the possibility of
an element can be expressed. A possibility distribution can also be used to express
fuzzy element values. For this purpose, we introduce another fuzzy construct called
Dist to specify a possibility distribution. Typically, a Dist element has multiple Val
elements as children, each with an associated possibility. Since we have two types of
possibility distribution, the Dist construct should indicate whether a distribution is
disjunctive or conjunctive (see Lines 24–34 and Lines 38–44 in Fig. 2.1).
Again consider Fig. 2.1. Lines 24–34 give the disjunctive Dist construct for the age
of student “Tom Smith”, and Lines 38–44 give the conjunctive Dist construct for the
email of student “Tom Smith”. It should be noted, however, that the possibility
distributions in Lines 24–34 and Lines 38–44 are all for leaf nodes in the ancestor–
descendant chain. In fact, we can also have possibility distributions and values over
non-leaf nodes. Observe the disjunctive Dist construct in Lines 6–19, which expresses
the two possible statuses of the employee with ID 85431095. Of these two employee
values, Lines 7–12 carry possibility 0.8, and Lines 13–18 carry possibility 0.6.
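As a rough illustration of how the Val, Poss, and Dist constructs can be consumed programmatically, the following Python sketch parses a small fuzzy XML fragment with the standard xml.etree module. The markup below is an assumption for illustration (it is not a verbatim copy of Fig. 2.1), and the attribute name type on Dist is likewise assumed.

# Minimal sketch: a disjunctive possibility distribution over the age of a
# student, expressed with the Dist/Val/Poss constructs introduced above.
import xml.etree.ElementTree as ET

FRAGMENT = """
<student SID="s1">
  <sname>Tom Smith</sname>
  <age>
    <Dist type="disjunctive">
      <Val Poss="0.4">21</Val>
      <Val Poss="0.5">23</Val>
      <Val Poss="0.8">25</Val>
    </Dist>
  </age>
</student>
"""

def read_distribution(dist):
    """Return (type, {value: possibility}) for a Dist element."""
    kind = dist.get("type", "disjunctive")      # assumed default
    values = {val.text.strip(): float(val.get("Poss", "1.0"))
              for val in dist.findall("Val")}   # Poss defaults to 1.0
    return kind, values

root = ET.fromstring(FRAGMENT)
for dist in root.iter("Dist"):
    print(read_distribution(dist))
# -> ('disjunctive', {'21': 0.4, '23': 0.5, '25': 0.8})

A conjunctive distribution, such as the one for the e-mail addresses, would be read in exactly the same way, differing only in the value of the distribution type.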
The structure of an XML document can be described by Document Type Definition
(DTD) or XML Schema (Antoniou and van Harmelen 2004). A DTD, which defines
the valid elements and their attributes and the nesting structures of these elements in
the instance documents, is used to assert the set of “rules” that each instance document
of a given document type must conform to. XML Schemas provide a much more
powerful means than DTDs for defining the structure and constraints of XML
documents. It has been shown above that the XML document must be extended for
fuzzy data modeling. As a result, several fuzzy constructs have been introduced.
In order to accommodate these fuzzy constructs, it is clear that the DTD of the source
XML document should be correspondingly modified. In this section, we focus on
DTD modification (i.e., fuzzy DTD) for representing the structure of the fuzziness
in XML document as introduced in Sect. 2.5.1.
Firstly we define the basic elements in a fuzzy DTD as follows:
That is, a leaf element may be fuzzy and take a value represented by a possibility
distribution.
• for a non-leaf element that contains other elements, say nonleafElement,
its definition is modified from <!ELEMENT nonleafElement (basic_definition)>
to
That is, a non-leaf element may be crisp, e.g., student in Fig. 2.1, and thus the non-leaf
element student can be defined as
<!ELEMENT student (sname?, age?, sex?, email?)>.
Also, a non-leaf element may be fuzzy and take a value represented by a possibility
distribution. We differentiate two cases: the first is that the element takes a value
associated with a possibility degree, e.g., university in Fig. 2.1, which can be defined
as
and the second is that the element takes a set of values, each of which is associated
with a possibility degree, e.g., the age of student in Fig. 2.1, which can be defined as
In the following, we define the XML Schema modification (i.e., fuzzy XML Schema)
for representing the structure of the fuzziness in XML document as introduced in
Sect. 2.5.1.
First, we define Val element as follows:
<xs:element name="Val" type="valtype"/>
<xs:complexType name="valtype">
  <xs:sequence>
    <xs:element name="original-definition" minOccurs="0" maxOccurs="unbounded"/>
    <xs:attribute name="Poss" type="xs:fuzzy" minOccurs="0" maxOccurs="unbounded" default="1.0"/>
  </xs:sequence>
</xs:complexType>
Then we define Dist element as follows:
<xs:element name="Dist" type="disttype"/>
<xs:complexType name="disttype">
  <xs:element name="Val" type="valtype" minOccurs="1" maxOccurs="unbounded"/>
  <xs:attribute values="disjunctive conjunctive" default="disjunctive"/>
</xs:complexType>
Fig. 2.2 The fuzzy DTD D1 w.r.t. the fuzzy XML document d1 in Fig. 2.1
Now we modify the element definition in the classical Schema so that all of the
elements can use possibility distributions (Dist). For a sub-element that only contains
leaf elements, its definition in the Schema is as follows.
<xs:element name="leafElement" type="leafelementtype"/>
<xs:complexType name="leafelementtype">
  <xs:sequence>
    <xs:element name="original-definition" type="xs:type" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="Dist" type="disttype" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
For an element that contains leaf elements without any fuzziness, its definition in
the Schema is as follows.
<xs:element name="original-definition" type="xs:type" minOccurs="0" maxOccurs="unbounded"/>
For an element that contains leaf elements with fuzziness, its definition in the
Schema is as follows.
<xs:element name="leafElement" type="leafelementtype"/>
<xs:complexType name="leafelementtype">
  <xs:element name="Dist" type="disttype" minOccurs="0" maxOccurs="unbounded"/>
</xs:complexType>
For a sub-element that does not contain any leaf elements, its definition in the
Schema is as follows.
<xs:element name="nonleafElement" type="nonleafelementtype"/>
<xs:complexType name="nonleafelementtype">
  <xs:sequence>
    <xs:element name="original-definition" type="xs:type" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="Dist" type="disttype" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="Val" type="valtype" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
For an element that does not contain leaf elements without any fuzziness, its
definition in the Schema is as follows.
<xs:element name="nonleafElement" type="nonleafelementtype"/>
<xs:complexType name="nonleafelementtype">
  <xs:element name="original-definition" type="xs:type" minOccurs="0" maxOccurs="unbounded"/>
</xs:complexType>
For a sub-element that does not contain leaf elements but a fuzzy value, its
definition in the Schema is as follows.
<xs:element name="nonleafElement" type="nonleafelementtype"/>
<xs:complexType name="nonleafelementtype">
  <xs:element name="Val" type="valtype" minOccurs="0" maxOccurs="unbounded"/>
</xs:complexType>
For a sub-element that does not contain leaf elements but a set of fuzzy values,
its definition in the Schema is as follows.
<xs:complexType name="disttype">
  <xs:element name="Val" type="valtype" minOccurs="1" maxOccurs="unbounded"/>
  <xs:attribute values="disjunctive conjunctive" default="disjunctive"/>
</xs:complexType>
<xs:complexType name="studenttype">
  <xs:sequence>
    <xs:element name="sname" type="xs:string" minOccurs="0" maxOccurs="1"/>
    <xs:element name="age" type="agetype" minOccurs="0" maxOccurs="1"/>
    <xs:element name="sex" type="xs:string" minOccurs="0" maxOccurs="1"/>
    <xs:element name="email" type="emailtype" minOccurs="0" maxOccurs="1"/>
  </xs:sequence>
  <xs:attribute name="SID" type="xs:IDREF" use="required"/>
</xs:complexType>
<xs:complexType name="agetype">
  <xs:element name="Dist" type="disttype"/>
</xs:complexType>
<xs:complexType name="emailtype">
  <xs:element name="Dist" type="disttype"/>
  <xs:attribute values="conjunctive"/>
</xs:complexType>
</xs:schema>
Like a classical XML document, a fuzzy XML document can be intuitively seen as
a syntax tree. Figure 2.3 shows a fragment of the fuzzy XML document d1 in Fig. 2.1
and its tree representation.
Based on the tree representation of the fuzzy XML document, in the following
we define the formalization of fuzzy XML models in Ma et al. (2010) and Zhang
et al. (2013). It can be seen from Fig. 2.2 that a fuzzy DTD is made up of element
type definitions, and each element may have associated attributes. Each element
type definition has the form E → (α, A), where E is the defined element type (e.g.,
university and student), α is the content model (such as (UName, Val+) for
university), and A is the set of attributes of E.
For the sake of simplicity, we assume that the symbol T denotes the atomic types of
elements and attributes such as #PCDATA and CDATA, E denotes the set of elements
including the basic elements (e.g., university and student) and the special elements
(e.g., Val and Dist), A denotes the set of attributes, and S = T ∪ E.
A fuzzy DTD D is a pair (P, r), where P is a set of element type definitions, and r ∈
E is the root element type, which uniquely identifies a fuzzy DTD. Each element type
definition has the form E → (α, A), constructed according to the following syntax:
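One plausible form of this syntax, written using only the constructs described in items 1–2 below (the ::= notation and the grouping are assumptions of this reconstruction), is:

\[
\begin{aligned}
\alpha &::= S \mid \mathit{empty} \mid \mathit{any} \mid \alpha\,\text{“,”}\,\alpha \mid \alpha\,\text{“|”}\,\alpha \mid \alpha? \mid \alpha* \mid \alpha+ \\
A &::= \{\, AN : AT = VT \,\}
\end{aligned}
\]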
Fig. 2.3 A fragment of the fuzzy XML document and its tree representation
Here:
1. S = T ∪ E; empty denotes the empty string; “|” denotes union, and “,” denotes
concatenation; α can be extended with cardinality operators “?”, “*”, and “+”,
where “?” denotes 0 or 1 time, “*” denotes 0 or n times, and “+” denotes 1 or n
times; the construct any stands for any sequence of element types defined in the
fuzzy DTD;
2. AN ∈ A denotes the attribute names of the element E; AT denotes the attribute
types; and VT denotes the value types of attributes, which can be #REQUIRED,
#IMPLIED, #FIXED “value”, “value”, or a disjunctive/conjunctive possibility
distribution.
The formal definition of fuzzy XML Schemas can be analogously given following
the procedure above. Next, we give a formal definition of the fuzzy XML documents.
A fuzzy XML document d over a fuzzy DTD D is a tuple d = (N, <, λ, η, r),
where:
• N is a set of nodes in a fuzzy XML document tree.
• < denotes the child relation between nodes, i.e., for two nodes vi , vj ∈ N, if vi <
vj , then vi is the parent node of vj .
• λ: N → E ∪ A is a labeling function for distinguishing elements and attributes
(where E and A are the sets of elements and attributes in the fuzzy DTD, and
attributes are preceded by a “@” to distinguish them from elements) such that if
In the following, we further formalize fuzzy XML data models based on the
characteristics of their tree structure, as mentioned above.
In short, the basic structure of a fuzzy XML data model is a tree. Let N be a
finite set (of vertices), E ⊆ N × N be a set (of edges) and λ: E → L be a mapping
from edges to a set L of strings called labels. The triple G = (N, E, λ) is an edge-
labeled directed graph. It should be noted that this tree structure only briefly describes
the characteristics of fuzzy XML data models and ignores a number of fuzzy XML
features. Here, we further provide a more detailed formal definition of a fuzzy XML
tree.
A fuzzy XML tree t can be a tuple t = (N, σ, λ, η, ρ, γ, ∝), whose components are
summarized in the sketch after this list:
• N = {N1, …, Nn} is a set of vertices.
• σ ⊂ {(Ni, Nj) | Ni, Nj ∈ N}, and (N, σ) is a directed tree.
• λ: N → (L ∪ {NULL}), where L is a set of strings called labels. For n ∈ N and l ∈
L, λ(n, l) specifies the set of objects that may be children of n with label l.
• η: N → T, where T is a set of fuzzy XML types (Oliboni & Pozzani, 2008).
• ρ is a mapping from the set of objects v ∈ V to local possibility functions. It defines
the possibility that a set of children of an object exists given that the parent object
exists.
• γ associates with each n ∈ N and each label l ∈ L an integer-valued interval,
i.e., γ(n, l) = [min, max]. γ is used to define the cardinality constraints of children
with a given label.
• ∝ is a possibly empty partial order on N. Here, a relation “∝” is a partial order
on a set N if the following three properties hold: (1) reflexivity: θ ∝ θ for
all θ ∈ N; (2) antisymmetry: θ ∝ ω and ω ∝ θ implies ω = θ; (3) transitivity:
θ ∝ ω and ω ∝ ε implies θ ∝ ε.
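A compact way to see how the components of t fit together is the following Python sketch. The field names, the children helper, and the representation of γ and ∝ as dictionaries and pairs are illustrative assumptions, not part of the formal definition.

# Sketch of the fuzzy XML tree t = (N, sigma, lambda, eta, rho, gamma, order).
from dataclasses import dataclass, field

@dataclass
class FuzzyXMLTree:
    nodes: set                      # N: vertices
    edges: set                      # sigma: parent/child pairs (Ni, Nj)
    labels: dict                    # lambda: node -> label (or None)
    types: dict                     # eta: node -> fuzzy XML type
    local_poss: dict                # rho: node -> possibility of its set of children
    cardinality: dict = field(default_factory=dict)  # gamma: (node, label) -> (min, max)
    order: set = field(default_factory=set)          # partial order on N, as pairs

    def children(self, n):
        """Children of node n according to sigma."""
        return {j for (i, j) in self.edges if i == n}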
2.6 Summary
References
Abiteboul, S., Segoufin, L., & Vianu, V. (2006). Representing and querying XML with incomplete
information. ACM Transactions on Database Systems (TODS), 31(1), 208–254.
Antoniou, G., & van Harmelen, F. (2004). A Semantic Web Primer. MIT Press.
Berzal, F., Marín, N., Pons, O., & Vila, M. A. (2007). Managing fuzziness on conventional object-
oriented platforms. International Journal of Intelligent Systems, 22(7), 781–803.
Bordogna, G., & Pasi, G. (2001). Graph-based interaction in a fuzzy object-oriented database.
International Journal of Intelligent Systems, 16, 821–841.
Bosc, P., & Prade, H. (1993). An introduction to fuzzy set and possibility theory based approaches
to the treatment of uncertainty and imprecision in database management systems. In Proceedings
of the Second Workshop on Uncertainty Management in Information Systems: From Needs to
Solutions.
Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F., & Cowan, J. (2000). Extensible
markup language (XML) 1.0.
Buckles, B., & Petry, F. (1982). A fuzzy representation for relational databases. Fuzzy Sets and
Systems, 7, 213–226.
Cao, T. H., & Rossiter, J. M. (2003). A deductive probabilistic and fuzzy object-oriented database
language. Fuzzy Sets and Systems, 140, 129–150.
Chaudhry, N., Moyne, J., & Rundensteiner, E. A. (1999). An extended database design methodology
for uncertain data management. Information Sciences, 121(1–2), 83–112.
Chen, G. Q. (1999). Fuzzy Logic in Data Modeling; Semantics, Constraints, and Database Design.
Kluwer Academic Publisher.
Chen, G. Q., Vandenbulcke, J., & Kerre, E. E. (1992). A general treatment of data redundancy
in a fuzzy relational data model. Journal of the American Society of Information Science, 43,
304–311.
Codd, E. F. (1986). Missing information (applicable and inapplicable) in relational databases.
SIGMOD Record, 15, 53–78.
Lee, J., & Fanjiang, Y. (2003). Modeling imprecise requirements with XML. Information and
Software Technology, 45(7), 445–460.
Lee, J., Xue, N. L., Hsu, K. H., & Yang, S. J. H. (1999). Modeling imprecise requirements with
fuzzy objects. Information Sciences, 118, 101–119.
Li, G., Yan, L., & Ma, Z. (2019a). An approach for approximate subgraph matching in fuzzy RDF
graph. Fuzzy Sets and Systems, 376, 106–126.
Li, G., Yan, L., & Ma, Z. (2019b). A method for fuzzy quantified querying over fuzzy resource
description framework graph. International Journal of Intelligent Systems, 34(6), 1086–1107.
Li, G., Yan, L., & Ma, Z. (2019c). Pattern match query over fuzzy RDF graph. Knowledge-Based
Systems, 165, 460–473.
Ma, Z. M. (2005a). Advances in Fuzzy Object-Oriented Databases: Modeling and Applications.
Idea Group Publishing.
Ma, Z. M. (2005b). Fuzzy Database Modeling with XML (The Kluwer International Series on
Advances in Database Systems). Springer-Verlag.
Ma, Z., Li, G., & Yan, L. (2018). Fuzzy data modeling and algebraic operations in RDF. Fuzzy Sets
and Systems, 351, 41–63.
Ma, Z. M., Liu, J., & Yan, L. (2010). Fuzzy data modeling and algebraic operations in XML.
International Journal of Intelligent Systems, 25(9), 925–947.
Ma, Z. M., & Mili, F. (2002). Handling fuzzy information in extended possibility-based fuzzy
relational databases. International Journal of Intelligent Systems, 17(10), 925–942.
Ma, Z., & Yan, L. (2016). Modeling fuzzy data with XML: A survey. Fuzzy Sets and Systems, 301,
146–159.
Ma, Z. M., & Yan, L. (2007). Fuzzy XML data modeling with the UML and relational data models.
Data & Knowledge Engineering, 63(3), 970–994.
Ma, Z. M., & Yan, L. (2008). A literature overview of fuzzy database models. Journal of Information
Science and Engineering, 24(1), 189–202.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2000). Semantic measure of fuzzy data in extended possibility-
based fuzzy relational databases. International Journal of Intelligent Systems, 15, 705–716.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2004). Extending object-oriented databases for fuzzy
information modeling. Information Systems, 29(5), 421–435.
Majumdar, A. K., Bhattacharya, I., & Saha, A. K. (2002). An object-oriented fuzzy data model for
similarity detection in image databases. IEEE Transactions on Knowledge and Data Engineering,
14, 1186–1189.
Manolis, N., & Tzitzikas, Y. (2011). Interactive exploration of fuzzy RDF knowledge bases. In
Extended Semantic Web Conference (pp. 1–16). Springer.
Marín, N., Pons, O., & Vila, M. A. (2001). A strategy for adding fuzzy types to an object oriented
database system. International Journal of Intelligent Systems, 16, 863–880.
Marín, N., Vila, M. A., & Pons, O. (2000). Fuzzy types: A new concept of type for managing vague
structures. International Journal of Intelligent Systems, 15, 1061–1085.
Mathew, S., Mordeson, J. N., & Malik, D. S. (2018). Fuzzy Graph Theory (Vol. 363). Springer
International Publishing.
Nam, M., Ngoc, N. T. B., Nguyen, H., & Cao, T. H. (2007). FPDB40: A fuzzy and probabilistic
object base management system. Proceedings of the FUZZ-IEEE, 2007, 1–6.
Ndouse, T. D. (1997). Intelligent systems modeling with reusable fuzzy objects. International
Journal of Intelligent Systems, 12, 137–152.
Nierman, A., & Jagadish, H. V. (2002). ProTDB: Probabilistic data in XML. In VLDB’02: Proceed-
ings of the 28th International Conference on Very Large Databases (pp. 646–657). Morgan
Kaufmann.
Oliboni, B., & Pozzani, G. (2008). Representing fuzzy information by using XML schema. In 2008
19th International Workshop on Database and Expert Systems Applications (pp. 683–687). IEEE.
Ozgur, N. B., Koyuncu, M., & Yazici, A. (2009). An intelligent fuzzy object-oriented database
framework for video database applications. Fuzzy Sets and Systems, 160, 2253–2274.
Parsons, S. (1996). Current approaches to handling imperfect information in data and knowledge
bases. IEEE Transactions on Knowledge Data Engineering, 8, 353–372.
Petry, F. E. (1996). Fuzzy Databases: Principles and Applications. Kluwer Academic Publisher.
Prade, H., & Testemale, C. (1984). Generalizing database relational algebra for the treatment of
incomplete or uncertain information and vague queries. Information Sciences, 34, 115–143.
Raju, K., & Majumdar, A. (1988). Fuzzy functional dependencies and lossless join decomposition
of fuzzy relational database systems. ACM TODS, 13(2), 129–166.
Rosenfeld, A. (1975). Fuzzy graphs. In L. A. Zadeh, K. S. Fu, & M. Shimura (Eds.), Fuzzy Sets
and Their Applications (pp. 77–95). Academic Press.
Rundensteiner, E., & Bic, L. (1992). Evaluating aggregates in possibilistic relational databases.
Data and Knowledge Engineering, 7, 239–267.
Senellart, P., & Abiteboul, S. (2007). On the complexity of managing probabilistic XML data. In
Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of
Database Systems (pp. 283–292).
Shenoi, S., & Melton, A. (1999). Proximity relations in the fuzzy relational database model. Fuzzy
Sets and Systems, 100(Suppl.), 51–62.
Straccia, U. (2009). A minimal deductive system for general fuzzy RDF. International Conference
on Web Reasoning and Rule Systems (pp. 166–181). Springer.
Sunitha, M. S. (2001). Studies on fuzzy graphs (PhD thesis). Cochin University of Science and
Technology, India.
Sunitha, M. S., & Vijayakumar, A. (2005). Blocks in fuzzy graphs. The Journal of Fuzzy
Mathematics, 13(1), 13–23.
Tseng, C., Khamisy, W., & Vu, T. (2005). Universal fuzzy system representation with XML.
Computer Standards & Interfaces, 28(2), 218–230.
Turowski, K., & Weng, U. (2002). Representing and processing fuzzy information—an XML-based
approach. Knowledge-Based Systems, 15(1–2), 67–75.
Umano, M., & Fukami, S. (1994). Fuzzy relational algebra for possibility-distribution-fuzzy-
relational model of fuzzy data. Journal of Intelligent Information Systems, 3, 7–27.
Umano, M., Imada, T., Hatono, I., & Tamura, H. (1998). Fuzzy object-oriented databases and
implementation of its SQL-type data manipulation language. In Proceedings of the 7th IEEE
International Conference on Fuzzy Systems (pp. 1344–1349).
Yan, L., & Ma, Z. M. (2012). Comparison of entity with fuzzy data types in fuzzy object-oriented
databases. Integrated Computer-Aided Engineering, 19(2), 199–212.
Yan, L., Ma, Z. M., & Zhang, F. (2012). Algebraic operations in fuzzy object-oriented databases.
Information Systems Frontiers, 1–14.
Yan, L., Ma, Z., Zhang, F., & Ma, Z. (2014). Fuzzy XML data management. Springer.
Yazici, A., & Koyuncu, M. (1997). Fuzzy object-oriented database modeling coupled with fuzzy
logic. Fuzzy Sets and Systems, 89, 1–26.
Yazici, A., & George, R. (1999). Fuzzy Database Modeling. Physica-Verlag.
Yeh, R. T., & Bang, S. Y. (1975a). Fuzzy graphs, fuzzy relations, and their applications to cluster
analysis. In L. A. Zadeh, K. S. Fu, & M. Shimura (Eds.), Fuzzy Sets and Their Applications
(pp. 125–149). Academic Press.
Yeh, R. T., & Bang, S. Y. (1975b). Fuzzy relations, fuzzy graphs, and their applications to clustering
analysis. Fuzzy Sets and their Applications to Cognitive and Decision Processes, 159(159), 125–
149.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Systems, 1(1), 3–28.
Zhang, F., Ma, Z. M., & Yan, L. (2013). Construction of fuzzy ontologies from fuzzy XML models.
Knowledge-Based Systems, 42, 20–39.
Zicari, R., & Milano, P. (1990). Incomplete information in object-oriented databases. ACM SIGMOD
Record, 19(3), 5–16.
Chapter 3
Fuzzy RDF Modeling
3.1 Introduction
RDF is a World Wide Web Consortium (W3C) recommendation, which can represent
structured data as well as unstructured data and is quickly becoming the de-facto stan-
dard for the representation and exchange of information. Nowadays the RDF data
model is finding more and more use in a wide range of web data management
scenarios. However, information suffers from imperfections in real-world applications.
In an open environment like the Web, the underlying RDF data can be unreliable and
imprecise.
Additionally, in the context of the Semantic Web, the need for fuzzy data arises in
information classifications, which are fuzzy by nature.
Recognizing that it is essential to explicitly represent and process imperfect infor-
mation, the management of imprecise and uncertain information (mainly in the form
of fuzzy sets or probability distributions) has been extensively investigated in the
context of the relational model, conceptual data models, object-oriented databases,
XML, and so on. Unfortunately, although RDF is the current standard for data
representation and exchange over the Semantic Web, it is not able to represent and
process imprecise and uncertain data. Extending the RDF data model is therefore
particularly important.
At present, a number of research efforts have proposed mechanisms to model
uncertain RDF. One such approach is to incorporate additional semantics into
the RDF data model to represent uncertainty. Several extensions of RDF have been
proposed in order to deal with the truth of imprecise information (Mazzieri et al., 2008;
Straccia, 2009), time (Tappolet et al., 2009), trust (Hartig, 2009), and provenance
(Dividino et al., 2009). In particular, imprecise and uncertain data have become an
emerging topic for various applications of the Semantic Web, yet RDF itself lacks
sufficient power to represent and process such data.
In the following, we first discuss the fuzziness in RDF. Then we formally present
the fuzzy RDF data model and some basic notions, which are used with our algebra
discussed in the later sections of this chapter.
In real-world applications, some works have used Zadeh’s fuzzy set theory to accom-
modate different types of imprecise and uncertain information in database systems
(Piattini et al., 2006). In order to represent and manipulate imprecise RDF data, RDF
should be extended using fuzzy set theory. From the perspective of semantics, there
are two structural levels in the RDF data model, namely the triple level (statement
level) and the element level (i.e., the subject, predicate, and object of a triple). Accord-
ingly, there may be two kinds of fuzziness in a fuzzy RDF dataset: one is fuzziness in
triples, i.e., a fuzzy membership degree associated with an RDF triple represents the
amount of disagreement on the corresponding statement; the other is fuzziness in the
elements of a triple, i.e., we do not know the crisp value of an element, and the value
of the element may be represented by a fuzzy set.
To represent fuzzy information at the triple level, the fuzzy extension of the RDF
model (Lopes et al., 2010; Mazzieri et al., 2008) simply associates with each of the
underlying components of the RDF model, i.e., triples, a membership value repre-
senting a measure of uncertainty. Formally, this kind of fuzzy RDF model is a 4-tuple
(s, p, o) [n], where s, p and o are subject, predicate and object respectively, and n is
a numeric value between 0 and 1. The numeric value has a syntactic nature different
from the others: it is not an element of the domain of discourse, but a property
related to the formalism used by the data model to represent the fuzzy truth-value of
the assertion (statement) made by the RDF triple. From the perspective of semantics,
a fuzzy membership degree associated with an RDF triple represents the amount
of disagreement on the corresponding statement. Although this type of fuzzy RDF
model is simple, the added numeric value can be given various interpretations in
the real world, such as uncertainty (Udrea et al., 2006, 2010), trust (Hartig, 2009),
provenance (Dividino et al., 2009), and so on. This allows it to describe all kinds of
things in the objective world and gives it great potential for applications. For
instance, <Diner, writer, Barry Levinson> [0.75] is a fuzzy triple, indicating that the
possibility that Barry Levinson is the writer of Diner is 0.75. From a graphical point
of view, a fuzzy RDF graph is a set of fuzzy RDF triples. The edges represent the
predicates of triples. They are annotated with the predicate identifier as usual and
with an additional label for the consumer-specific value of the corresponding triple.
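As a minimal sketch of this 4-tuple representation (s, p, o)[n] (an illustration only, not an implementation from the works cited above), the following Python snippet stores fuzzy triples and filters them by a membership threshold.

# Sketch of the triple-level fuzzy RDF model: (s, p, o)[n] with n in [0, 1].
from typing import NamedTuple

class FuzzyTriple(NamedTuple):
    s: str      # subject
    p: str      # predicate
    o: str      # object
    n: float    # fuzzy membership degree of the whole statement

triples = [
    FuzzyTriple("Diner", "writer", "Barry Levinson", 0.75),   # example from the text
]

def alpha_cut(triples, alpha):
    """Keep only statements whose membership degree is at least alpha."""
    return [t for t in triples if t.n >= alpha]

print(alpha_cut(triples, 0.5))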
However, this data model only considers the membership degree of a triple, which
represents the possibility of the triple being a member of the corresponding RDF
graph, while the element values in the triple are still crisp. Although its expressive
power is better than that of the classical RDF data model, it remains too limited for
full fuzzy RDF modeling and reasoning. The semantics of an RDF graph relies on
the connectivity of the resources described, which is not properly captured by mere
triple-level vagueness. Therefore, it appears natural to allow the elements of a triple,
and not only the statement as a whole, to be fuzzy concepts. In general, the subject,
object, and predicate of a triple do not always need to be crisp resources; they can
also be fuzzy concepts.
In order to represent the fuzziness of the three components of triples, Ma et al.
propose a fuzzy extension of RDF, which annotates the three components of triples
(i.e., subject, predicate, and object) instead of whole triples with degrees in [0, 1].
This is a general fuzzy RDF graph model that considers element-level fuzziness
based on fuzzy graph theory. Formally, the element-level fuzzy RDF model is the fuzzy
triple (μs/s, μp/p, μo/o), where s is a fuzzy subject and μs ∈ [0, 1] denotes the member-
ship degree of the subject to the universe of an RDF dataset, p is a fuzzy predicate and
μp ∈ [0, 1] expresses the fuzzy degree of the property or relationship being described,
and o is a fuzzy object and μo ∈ [0, 1] represents the fuzzy degree of the property value.
Such RDF triples can also be conceived of as a directed fuzzy graph, which allows
users to describe arbitrary resources in terms of their attributes and their relationships
to other resources. Each vertex and edge of the fuzzy RDF graph is associated with a
membership value, i.e., each element of a triple carries an existence fuzzy value, which
is quite different from model assumptions that only consider the existence membership
degree of whole triples.
The fuzzy degree associated with each vertex indicates the possibility that the vertex
takes its label. In particular, if a vertex label is a URI representing a resource that
corresponds to a real-world entity, the fuzzy membership specifies the possibility
that the resource is identified as such across datasets. If, instead, a vertex label is a
literal representing a property value of an entity, the fuzzy membership specifies the
possibility that the property takes this value across the dataset. Furthermore, an edge
between two vertices represents a semantic relationship, and the fuzzy degree
associated with each edge represents the possibility that the particular relationship
exists.
It has been shown that RDF should be extended for fuzzy data modeling. In order
to accommodate fuzzy information, the RDF syntax must be modified corre-
spondingly. In this subsection, we focus on syntax modification for fuzzy RDF data
modeling.
In the following, we give the formalization of fuzzy RDF data models, which is
defined based on the characteristics of fuzzy graphs (Sunitha, 2001) in fuzzy set
theory.
In short, the basic structure of a fuzzy RDF data model is a graph. We start by
introducing some simple concepts. Let V be a finite set of vertices, E ⊆ V × V be
a set of edges and L: V ∪ E → Σ be a mapping from vertices and edges to a set Σ
of strings called labels. The quadruple GM = (V, E, Σ, L) is a directed labeled graph.
Assume M is a set of RDF triples, each represented by (s, p, o) ∈ (U ∪ B) × U ×
(U ∪ L ∪ B). A conversion function from M to GM includes the following two steps
for each (s, p, o) ∈ M: (i) add vertices vs, vo to V and assign L(vs) = s, L(vo) = o;
(ii) add a directed edge (vs, vo) to E and assign L(vs, vo) = p. It should be noted that
this graph structure only briefly describes the structural characteristics of the RDF
data model and ignores the fuzzy content of its vertices and edges. Here, we further
provide a more detailed formal definition of the fuzzy RDF graph data model.
Definition 3.1 (Fuzzy RDF data graph). A fuzzy RDF data graph G is represented by
a 6-tuple (V, E, Σ, L, μ, ρ). Here:
1. V is a finite set of vertices;
2. E ⊆ V × V is a set of directed edges;
3. Σ is a set of labels;
4. L = {LV, LE}, where LV: V → Σ is a function assigning labels to vertices, and
LE: E → Σ is a function assigning labels to edges;
5. μ: V → [0, 1] is a fuzzy subset of V;
6. ρ: E → [0, 1] is a fuzzy relation on the fuzzy subset μ. Note that ∀vi, vj ∈ V,
ρ(vi, vj) ≤ μ(vi) ∧ μ(vj), where ∧ stands for minimum.
In Definition 3.1, each vertex vi ∈ V of graph G has one label, LV(vi), corre-
sponding to either a subject or an object in the RDF triple dataset. Moreover, (vi, vj) ∈ E
is a directed edge from vertex vi to vertex vj, with an edge label LE(vi, vj) that corre-
sponds to the predicate of an RDF triple. The label values of vertices are associated
with fuzzy degrees indicating the possibility that the vertices take the labels, and the
fuzzy value associated with each edge represents the amount of disagreement on the
corresponding relationship between vertices. A fuzzy RDF data graph may contain
both fuzzy vertices (resp. edges) and crisp vertices (resp. edges), as a fuzzy vertex
(resp. edge) with a degree of 0 or 1 can be considered crisp. Along the same line, a
crisp RDF graph is simply a special case of a fuzzy RDF data graph (where μ: V →
{0, 1} for all vi ∈ V and ρ: V × V → {0, 1} for all (vi, vj) ∈ E), and the fuzzy RDF
graph is a generalization of the crisp RDF graph.
Moreover, in our model, each vertex and edge of a fuzzy RDF graph is associated
with a membership value, i.e., each element of a triple carries an existence fuzzy value.
This is quite different from the model assumption in Udrea et al. (2010) and
Zimmermann et al. (2011), which only considers the existence membership degree of
whole triples.
An example of a fuzzy RDF data graph with some fuzzy vertices and edges is given
in Fig. 3.1, which describes some information about movies and actors. Here the
genre of the film with ID film1 is tragedy, the audience rating is “9.5”, the box office
is “$35 million”, the star is the person with ID pid1, and the director is the person
with ID pid1, who was born in “area1”. From the graph, the genre of film1 has label
“tragedy” with possibility 0.95, and it exactly corresponds to the object of the triple
(film1, <genre>, 0.95/“tragedy”). Similarly, the vertex labeled “city1” is connected
to another vertex labeled “region1” through the directed edge labeled “locateIn” with
possibility 0.85, and it corresponds to the triple (city1, 0.85/<locateIn>, “region1”).
Therefore, this graphic representation is generic enough to capture the correlations or
constraints among labels of vertices and edges. In this example, the degree is based
on a simple statistical notion, which can be acquired from statistics of historical data
or reliability of data sources.
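The following Python sketch mirrors Definition 3.1; the class and method names are illustrative assumptions, and the check in add_edge enforces the constraint ρ(vi, vj) ≤ μ(vi) ∧ μ(vj). The two fragments added at the end follow the degrees quoted from Fig. 3.1 in the text.

# Sketch of a fuzzy RDF data graph G = (V, E, Sigma, L, mu, rho) per Definition 3.1.
class FuzzyRDFGraph:
    def __init__(self):
        self.vertex_label = {}   # L_V: vertex -> label
        self.vertex_mu = {}      # mu: vertex -> degree in [0, 1]
        self.edge_label = {}     # L_E: (vi, vj) -> label
        self.edge_rho = {}       # rho: (vi, vj) -> degree in [0, 1]

    def add_vertex(self, v, label, mu=1.0):
        self.vertex_label[v] = label
        self.vertex_mu[v] = mu

    def add_edge(self, vi, vj, label, rho=1.0):
        # Enforce rho(vi, vj) <= mu(vi) ^ mu(vj), where ^ is minimum.
        bound = min(self.vertex_mu[vi], self.vertex_mu[vj])
        if rho > bound:
            raise ValueError(f"rho={rho} exceeds the vertex-degree bound {bound}")
        self.edge_label[(vi, vj)] = label
        self.edge_rho[(vi, vj)] = rho

# Two fragments echoing the example of Fig. 3.1:
g = FuzzyRDFGraph()
g.add_vertex("film1", "film1")
g.add_vertex("genre1", "tragedy", mu=0.95)
g.add_edge("film1", "genre1", "genre", rho=0.95)
g.add_vertex("city1", "city1")
g.add_vertex("region1", "region1")
g.add_edge("city1", "region1", "locateIn", rho=0.85)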
Definition 3.4 (Fuzzy RDF graph isomorphism). Given two fuzzy RDF graphs G1 =
(V1, E1, Σ1, L1, μ1, ρ1) and G2 = (V2, E2, Σ2, L2, μ2, ρ2), an isomorphism from
G1 to G2 is a bijective function h: V1 → V2 such that:
Proof (fuzzy RDF graph isomorphism is an equivalence relation):
1. Reflexivity: Consider the identity map h: V → V such that ∀v ∈ V, h(v) = v.
This h is a bijective map satisfying ∀v ∈ V, μ(v) = μ(h(v)) and ∀(vi, vj) ∈ E,
ρ(vi, vj) = ρ(h(vi), h(vj)). Hence h is an isomorphism of the fuzzy graph to itself,
and reflexivity holds.
2. Symmetry: Given two fuzzy RDF graphs G1 and G2, let h: V1 → V2 be an
isomorphism from G1 to G2. Then h is a bijective map with h(v1) = v2, v1 ∈ V1,
satisfying μ1(v1) = μ2(h(v1)), ∀v1 ∈ V1, and ρ1(v1i, v1j) = ρ2(h(v1i), h(v1j)),
∀(v1i, v1j) ∈ E1.
As h is bijective, h−1(v2) = v1 for all v2 ∈ V2. Using the above equalities, we obtain
μ1(h−1(v2)) = μ2(v2), ∀v2 ∈ V2, and ρ1(h−1(v2i), h−1(v2j)) = ρ2(v2i, v2j),
∀(v2i, v2j) ∈ E2. Thus, we get a one-to-one, onto map h−1: V2 → V1, which is an
isomorphism from G2 to G1, i.e., G1 ≅ G2 ⇒ G2 ≅ G1.
3. Transitivity: Given three fuzzy RDF graphs G1, G2 and G3, suppose h1: V1 → V2
and h2: V2 → V3 are isomorphisms of the fuzzy RDF graph G1 onto G2 and G2
onto G3, respectively. Then, for all (v1i, v1j) ∈ E1 with v2i = h1(v1i) and v2j =
h1(v1j),

ρ1(v1i, v1j) = ρ2(v2i, v2j)
            = ρ3(h2(v2i), h2(v2j))
            = ρ3(h2(h1(v1i)), h2(h1(v1j))),

and similarly μ1(v1) = μ3(h2(h1(v1))) for all v1 ∈ V1. Hence the composition
h2 ◦ h1: V1 → V3 is an isomorphism from G1 to G3, i.e., G1 ≅ G2 and G2 ≅ G3
imply G1 ≅ G3.
Definition 3.5 (Fuzzy RDF graph pattern). A fuzzy RDF graph pattern is a 5-tuple
P = (VP, EP, FV, FE, RE), where
For example, a pattern graph P for the RDF graph shown in Fig. 3.1 is given in
Fig. 3.2. This pattern is applied to model information concerning an actor (?p) who is
born in country1. The box office of the film (?film) in which the actor starred is more
than $30 million (?b > $30 million) and its genre is tragedy.
Depending on the meaning we want to give to a certain RDF graph, we will consider
different kinds of fuzzy interpretations, e.g., simple, RDF, RDFS, and D. For each
one of them there will be some special semantic conditions.
Intuitively, a fuzzy interpretation will represent a possible configuration of the
world, such that we can verify whether or not what is said on a graph G is true within
the framework of fuzzy logic. This leads us to think of an RDF graph as something
which satisfies the possible world, thus providing some information.
As described in Hayes (2004), any interpretation is relative to a certain vocabulary,
so we will in general speak of a fuzzy interpretation of the vocabulary V. A
triple (μs/s, μp/p, μo/o) can be thought of as stating that a certain binary predicate
associated with μp/p holds for the couple (μs/s, μo/o). A fuzzy interpretation will give
us this association, and given a fuzzy RDF graph, it will be true if none of its triples
state something false within the framework of fuzzy logic.
Given a fuzzy triple (μs/s, μp/p, μo/o), if min(μs, μp, μo, μIs(s), μIs(p), μIs(o)) ≥ α
(α is a given threshold), then If(μs/s, μp/p, μo/o) = true; otherwise, If(μs/s, μp/p,
μo/o) = false. Given a set of triples S, If(S) = false if If(μs/s, μp/p, μo/o) = false
for some triple (μs/s, μp/p, μo/o) in S; otherwise If(S) = true. If satisfies S, written
as If |≈ S, if If(S) = true; in this case, we say If is a fuzzy simple interpretation of S.
A fuzzy simple interpretation, instead of associating a value in {0, 1} to each element
of the corresponding set, accepts any value in the closed unit interval [0, 1].
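A small sketch of this min-based truth evaluation against a threshold α is given below. The interpretation degrees μIs are supplied as a dictionary, and the concrete degree values used in the example are assumptions for illustration.

# Sketch: If(mu_s/s, mu_p/p, mu_o/o) is true iff
#   min(mu_s, mu_p, mu_o, mu_Is(s), mu_Is(p), mu_Is(o)) >= alpha.
def holds(triple, interp, alpha):
    (s, mu_s), (p, mu_p), (o, mu_o) = triple
    degrees = [mu_s, mu_p, mu_o, interp[s], interp[p], interp[o]]
    return min(degrees) >= alpha

def satisfies(triples, interp, alpha):
    """If(S) is true iff every triple in S is true under the interpretation."""
    return all(holds(t, interp, alpha) for t in triples)

# Hypothetical interpretation degrees, for illustration only:
interp = {"film1": 1.0, "genre": 0.9, "tragedy": 0.95}
triple = (("film1", 1.0), ("genre", 1.0), ("tragedy", 0.95))
print(satisfies([triple], interp, alpha=0.8))   # True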
Definition 3.7 (Fuzzy Simple Entailment). Let S be a set of fuzzy RDF graphs and
G a fuzzy RDF graph. Then S fuzzy simply entails G if and only if every fuzzy
simple interpretation If that satisfies every H ∈ S also satisfies G, i.e., (∀H ∈ S,
If |≈ H) ⇒ If |≈ G. In that case we write S |≈f G.
3.3 Fuzzy RDF Schema
On the basis of RDF, RDF Schema (RDFS) is used to describe the RDF vocabulary.
RDF and RDF Schema together implement data exchange at the semantic level
of any vocabulary between different machines. Here, RDF is the part of the data
model and RDF Schema is the semantic interpretation part with additional ability
to describe resources. While in RDF the main construct is the extension, the RDF
Schema semantics is stated in terms of classes (Hayes, 2004). As a class is a resource
with a class extension, which represents a set of domain element, the definition of
class relies on the definition of extension. If an extension is a set of couples, and a
fuzzy extension is a fuzzy set of couples, fuzzy class extensions in RDF Schema are
fuzzy sets of domain’s elements.
RDF Schema has a larger vocabulary then RDF, composed of URIs in the rdfs:
namespace. The semantics is conveniently expressed in terms of classes: a class is a
resource with a class extension, which is a subset of resource. As a consequence of
this definition, a class can have itself as a member. The relation between a class and
a class member is given using the RDF vocabulary property rdf: type, and the set of
all classes is IC.
With the RDFS, classes, properties, and relationships between classes and prop-
erties can be declared. The modeling primitives rdfs:Class and rdf:Property, for
example, are applied to define classes and properties, respectively, which are
generalizations of rdfs:Resource. In addition, the modeling primitive rdf:type is
applied to state that a resource is an instance of a class. In particular, class inheritance
and property inheritance can be described by rdfs:subClassOf and rdfs:subProp-
ertyOf, respectively. Furthermore, RDFS provides rdfs:domain and rdfs:range to
constrain the domain and range of properties, respectively.
In the following, we define fuzzy RDF Schema (Fan et al., 2019) for modeling
primitives, which can organize fuzzy RDF vocabularies into hierarchies. The formal
fuzzy RDFS is given as follows:
Definition 3.8 (Fuzzy RDF Schema graph). A fuzzy RDFS data graph GF is
represented by a 7-tuple (V, E, Σ, L, μ, ρ, A). Here:
In Definition 3.8, the fuzzy RDFS data graph GF is a directed labeled graph, in
which each vertex and each directed edge is assigned a label. The set of axioms A
denotes the semantics of the fuzzy RDFS data. In this case, the labels contain the
semantic information that can be used in the set of axioms. Let vi ∈ V and vj ∈ V
be a subject vertex and an object vertex of the graph GF, and let their labels be L(vi)
∈ Σ.C and L(vj) ∈ Σ.C, respectively. If the edge label LE(vi, vj) is rdfs:subClassOf
and the label value is ρ(vi, vj), the class axiom can be represented as ρ(vi, vj)/rdfs:
subClassOf(L(vi), L(vj)). In a similar way, the extended fuzzy RDFS graph model
can describe not only instance information but also structure information, and the
inferred semantic data can be derived from the graph. Table 3.1 shows the fuzzy
RDFS triples and their corresponding axioms.
In addition, Definition 3.8 explicitly classifies the set of labels Σ into four
categories: class resource labels, object property resource labels, datatype property
resource labels, and datatype labels. Along the same line, a crisp RDFS graph is
simply a special case of a fuzzy RDFS data graph with fuzzy values of 0 or 1 on all
vertices (resp. edges).
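As a small illustration of how fuzzy class axioms can be read off such a graph, the sketch below scans edges labeled rdfs:subClassOf and emits axioms in the ρ/rdfs:subClassOf(C1 C2) form described above. The dictionaries and the Actor/Person labels are hypothetical, used only for the example.

# Sketch: derive fuzzy class axioms from edges labeled rdfs:subClassOf.
def class_axioms(edge_label, edge_rho, vertex_label):
    """Yield strings of the form 'rho/rdfs:subClassOf(C1 C2)'."""
    for (vi, vj), lab in edge_label.items():
        if lab == "rdfs:subClassOf":
            rho = edge_rho[(vi, vj)]
            yield f"{rho}/rdfs:subClassOf({vertex_label[vi]} {vertex_label[vj]})"

# Hypothetical fragment for illustration:
edge_label = {("c1", "c2"): "rdfs:subClassOf"}
edge_rho = {("c1", "c2"): 0.9}
vertex_label = {"c1": "Actor", "c2": "Person"}
print(list(class_axioms(edge_label, edge_rho, vertex_label)))
# ['0.9/rdfs:subClassOf(Actor Person)']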
Data matching is the process of bringing data from different data sources together
and comparing them to find out whether they represent the same real-world object
in a given domain (Dorneles et al., 2011). Fuzzy RDF data matching is a funda-
mental problem in the integration of fuzzy RDF data. Based on the fuzzy RDF data
model, we propose an approach for fuzzy RDF graph matching in this section. The
method computes multiple measures of similarity among graph elements: syntactic,
semantic and structural. These measures are composed in a principled manner for
graph matching. In particular, an iterative similarity function is introduced that takes
into account the structural information of the fuzzy RDF graph.
RDF data have a natural representation in the form of labeled directed graphs, in
which vertices represent resources and values (also called literals), and edges repre-
sent semantic relationships between resources. Therefore, the RDF data matching
problem has often been addressed in terms of graph matching.
Definition 3.9 (Fuzzy RDF graph matching). Given two fuzzy RDF graphs GS and
GT from a given domain, the matching problem is to identify all correspondences
between graphs GS and GT representing the same real-world object. The match result
is typically represented by a set of correspondences, sometimes called a mapping.
A correspondence c = (id, Es, Et, m) interrelates two elements Es and Et from
graphs GS and GT. An optional similarity degree m ∈ [0, 1] indicates the similarity
or strength of the correspondence between the two elements.
Definition 3.10 (Similarity function). Let GS and GT be two datasets. A similarity
function is defined as Fs(s, t) → [0, 1], where (s, t) ∈ GS × GT, i.e., the function
computes a normalized value for every pair (s, t). The higher the score value, the
more similar s and t are. The advantage of using similarity functions is to deal with a
finite interval for the score values.
For example, Fig. 3.3a, b illustrates two fragments of fuzzy RDF data graphs with
some fuzzy elements and crisp ones. The edge “pid2-has_address-addid2” associ-
ated with membership degree 0.5 in Fig. 3.3a represents the fact that the person
labeled pid2 has the address labeled addid2, and the possibility of this fact is 0.5.
Note that opaque labels exist, as shown in Fig. 3.3b. The resource “_:” is distinct
from the others, and it makes the resource name opaque. According to the RDF
specification (Manola et al., 2004), a blank vertex can be assigned an identifier
prefixed with “_:”.
With the presence of dislocated matching (Zhu et al., 2014), some vertices in an
RDF graph can be starting/ending vertices. We add the following restrictions on the
fuzzy RDF graph:
Fig. 3.3 The fuzzy RDF graphs. a Source graph; b target graph
1. There is one and only one vertex in the RDF graph that is called the home vertex,
denoted by v̂, which indicates the virtual beginning/end of all paths in an RDF
graph. We specify that the label of the home vertex is “_: H”, i.e., L(v̂) = “_:H”.
2. There are paths from the home vertex to any other vertex in the fuzzy RDF graph.
That is, for each vertex v ∈ V except v̂, we add two edges (v, v̂) and (v̂, v). Thus, a
path can begin (or end) with vertex v at all the locations where v occurs. Moreover,
we associate ρ(v, v̂) = ρ(v̂, v) = μ(v), i.e., we regard the fuzzy degree associated
with each vertex, which represents the possibility that the vertex exists in the graph,
as the fuzzy degree of the edges between the home vertex and that vertex.
The matching procedure takes as input two fuzzy RDF graphs and outputs a set of
correspondences between the two graphs. Figure 3.4 illustrates an overview of the
framework, which has three main stages. First, the procedure computes vertex-to-
vertex similarity scores using different similarity functions. Label similarity functions adopt
different computation strategies to compute multiple types of vertex label similarity
scores. Structural similarity function iteratively computes similarity scores for every
vertex pair by aggregating the similarity scores of edge and immediate neighbors’
vertices. Then, we obtain the overall similarity by combining label similarity scores
and structural similarity scores. Finally, we select the potential correspondences
based on the similarity scores and include them in the alignment.
1. Syntactic Similarity
Intuitively, the label of an element typically captures the most distinctive charac-
teristic of the element in the RDF graph model. The syntactic similarity assigns a
normalized similarity value to every pair (s, t) by applying the Levenshtein distance
(Levenshtein, 1966) to the name labels of s and t. Formally, the syntactic similarity
sim_sy(s, t) between two name labels s and t is defined as follows.
\[
\mathrm{sim}_{sy}(s, t) = 1 - \frac{LD(s.label,\ t.label)}{\max(|s.label|,\ |t.label|)} \tag{3.1}
\]
Here s.label and t.label denote the name labels of s and t, respectively, max(|s.label|,
|t.label|) is the maximum length of the two name strings, and LD(w1, w2) is the
Levenshtein distance between two words w1 and w2.
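A direct Python sketch of Eq. (3.1) is given below, using a standard dynamic-programming Levenshtein distance (no external library is assumed; the labels are assumed non-empty).

# Sketch of the syntactic similarity of Eq. (3.1).
def levenshtein(w1, w2):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(w2) + 1))
    for i, c1 in enumerate(w1, 1):
        cur = [i]
        for j, c2 in enumerate(w2, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (c1 != c2)))    # substitution
        prev = cur
    return prev[-1]

def sim_sy(s_label, t_label):
    # Assumes non-empty name labels.
    return 1 - levenshtein(s_label, t_label) / max(len(s_label), len(t_label))

print(round(sim_sy("has_address", "address"), 2))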
2. Semantic Similarity
Step 3: Compute the semantic similarity. We use the Jaccard similarity to calculate
the semantic similarity on the synsets of each pair of tokens.
Step 4: Return the average-max semantic similarity as the result. The formula is
as follows:
\[
\mathrm{sim}_{se}(s, t) = \frac{1}{|s.tok|}\sum_{i} \max_{j}\Big(\mathrm{Jaccard}\big(syn(s.tok_i),\ syn(t.tok_j)\big)\Big) \tag{3.2}
\]
Here |s.tok| is the number of tokens in the name of s, Jaccard denotes the Jaccard
similarity between two sets, and syn(w) denotes the WordNet synset of a token w.
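The following Python sketch mirrors Eq. (3.2). The synset lookup syn() is a toy dictionary that stands in for an actual lexical resource such as WordNet, and the entries in it are assumptions used only for the example.

# Sketch of the average-max semantic similarity of Eq. (3.2).
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy synset lookup (a stand-in for WordNet synsets), for illustration only.
SYNSETS = {
    "address": {"address", "location"},
    "location": {"location", "place", "address"},
    "person": {"person", "individual"},
}

def syn(token):
    return SYNSETS.get(token, {token})

def sim_se(s_tokens, t_tokens):
    # Assumes s_tokens is non-empty.
    total = 0.0
    for si in s_tokens:
        total += max(jaccard(syn(si), syn(tj)) for tj in t_tokens)
    return total / len(s_tokens)

print(round(sim_se(["address"], ["location"]), 2))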
The similarity between two fuzzy degrees ρs and ρt (e.g., the fuzzy values attached
to a pair of edges) is measured by
\[
\mathrm{sim}_{\rho}(\rho_s, \rho_t) = 1 - \frac{(\rho_s \vee \rho_t) - (\rho_s \wedge \rho_t)}{\rho_s + \rho_t} \tag{3.3}
\]
The structural similarity of a vertex pair (vs, vt) at iteration i depends on: (i) the
similarity degree between vs and vt after step i − 1, i.e., Sim_{i−1}(vs, vt); (ii) the
similarity degree between the forward neighbors of vs and those of vt after step i − 1,
i.e., Sim_{i−1}(vs′, vt′); (iii) the similarity degree between the edge labels relating vs
and vt to their forward neighbors, i.e., sim_e(es, et); and (iv) the similarity degree
between the fuzzy values of the edges es and et.
To find the best match for vs among the forward neighbors of vt, we need to
maximize the value sim_e(es, et) × sim_ρ(ρs, ρt) × Sim_{i−1}(vs′, vt′). The similarity
degrees between the forward neighbors of vs and their best matches among the
forward neighbors of vt after the ith iteration are computed by
\[
\mathrm{sim}_i(v_s, v_t) = \frac{1}{|pre(v_s)|}\sum_{v_s' \in pre(v_s)} \max_{v_t' \in pre(v_t)}\Big(\mathrm{sim}_e(e_s, e_t) \times \mathrm{sim}_{\rho}(\rho_s, \rho_t) \times \mathrm{Sim}_{i-1}(v_s', v_t')\Big) \tag{3.4}
\]
And the similarity degrees between the forward neighbors of vt and their best
matches among the forward neighbors of vs after iteration i are computed by
\[
\mathrm{sim}_i(v_t, v_s) = \frac{1}{|pre(v_t)|}\sum_{v_t' \in pre(v_t)} \max_{v_s' \in pre(v_s)}\Big(\mathrm{sim}_e(e_t, e_s) \times \mathrm{sim}_{\rho}(\rho_t, \rho_s) \times \mathrm{Sim}_{i-1}(v_t', v_s')\Big) \tag{3.5}
\]
Note that this sim measure is asymmetric, i.e., sim_i(vs, vt) ≠ sim_i(vt, vs).
In conclusion, we define the forward similarity degree of a vertex pair (vs, vt) after
the ith iteration as follows:
\[
\mathrm{Sim}_i(v_s, v_t) = \Big(\big(\mathrm{sim}_i(v_s, v_t) + \mathrm{sim}_i(v_t, v_s)\big)/2 + \mathrm{Sim}_{i-1}(v_s, v_t)\Big)/2 \tag{3.6}
\]
To calculate backward similarity degrees, we apply the above formulas to the
vertices vs and vt, but consider their backward neighbors instead of their forward
neighbors.
2. Iterative Computation
\[
\begin{aligned}
\mathrm{sim}_k(v_s, v_t) &= \frac{1}{|pre(v_s)|}\sum_{v_s' \in pre(v_s)} \max_{v_t' \in pre(v_t)} \mathrm{sim}_e(e_s, e_t) \times \mathrm{sim}_{\rho}(\rho_s, \rho_t) \times \mathrm{Sim}_{k-1}(v_s', v_t') \\
&\le \frac{1}{|pre(v_s)|}\sum_{v_s' \in pre(v_s)} \max_{v_t' \in pre(v_t)} \mathrm{sim}_e(e_s, e_t) \times \mathrm{sim}_{\rho}(\rho_s, \rho_t) \times \mathrm{Sim}_{k}(v_s', v_t') = \mathrm{sim}_{k+1}(v_s, v_t).
\end{aligned}
\]
Thus, we have Sim_k(vs, vt) ≤ Sim_{k+1}(vs, vt), that is, the monotone non-
decreasing property holds for i = k + 1.
Since both the basis and the inductive step have been established, by mathematical
induction, the monotone non-decreasing property holds for all i ≥ 1.
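A compact Python sketch of one forward iteration (Eqs. (3.4)–(3.6)) is given below. The accessor pre(v), which is assumed to return the forward neighbors of v as (neighbor, edge) pairs, the edge-label similarity sim_e, the edge degree function rho, and the previous-iteration scores sim_prev are all assumptions supplied by the surrounding matching framework; sim_rho also assumes ρs + ρt > 0.

# Sketch of one iteration of the forward structural similarity, Eqs. (3.4)-(3.6).
def sim_rho(rho_s, rho_t):
    """Fuzzy-degree similarity of Eq. (3.3); assumes rho_s + rho_t > 0."""
    return 1 - (max(rho_s, rho_t) - min(rho_s, rho_t)) / (rho_s + rho_t)

def sim_dir(vs, vt, pre, sim_e, rho, sim_prev):
    """Asymmetric directional similarity sim_i(vs, vt) of Eq. (3.4)."""
    vs_nbrs, vt_nbrs = pre(vs), pre(vt)
    if not vs_nbrs or not vt_nbrs:
        return 0.0
    total = 0.0
    for vs_p, es in vs_nbrs:
        total += max(sim_e(es, et) * sim_rho(rho(es), rho(et)) * sim_prev(vs_p, vt_p)
                     for vt_p, et in vt_nbrs)
    return total / len(vs_nbrs)

def sim_step(vs, vt, pre, sim_e, rho, sim_prev):
    """Symmetric update Sim_i(vs, vt) of Eq. (3.6)."""
    forward = (sim_dir(vs, vt, pre, sim_e, rho, sim_prev) +
               sim_dir(vt, vs, pre, sim_e, rho, sim_prev)) / 2
    return (forward + sim_prev(vs, vt)) / 2

In a full implementation these updates would be evaluated repeatedly until the scores converge, which is justified by the monotonicity argument above.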
In order to obtain the overall similarity degrees between vertices, we need to aggregate
the results of the different similarity functions. There are several approaches to this,
including linear averages, nonlinear averages, and machine learning techniques. In
our work, we use a simple approach based on linear averages. Firstly, we obtain the
label similarity (SimL) by taking the average of the syntactic (simsy) and semantic
(simse) similarities. Then, the total similarity (Sim) is calculated by combining
the label similarity and the structural similarity (SimS). To more accurately distinguish
between similarity scores that are close to the median, we use a non-linear function,
the sigmoid function (Ehrig and Sure, 2004), to compute each similarity score. The
idea behind using a sigmoid function is quite simple: it reinforces similarity scores
higher than 0.5 and weakens those lower than 0.5. That is to say, the sigmoid
function provides high values for the best matches and lower ones for the worse
matches. This treatment is meant to clearly separate two zones: the positive and
negative correspondences. In this way, the general formula for this combination
can be given as follows:
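One plausible realization of this combination is sketched below in Python. The equal weighting of the two averages and the steepness of the logistic sigmoid centred at 0.5 are assumptions of this sketch, not the exact parameters of the cited work.

# Sketch of the similarity aggregation: label similarity is the average of the
# syntactic and semantic scores, and the total similarity combines label and
# structural scores through a sigmoid centred at 0.5.
import math

def sigmoid(x, steepness=10.0):
    """Logistic function reinforcing scores above 0.5 and weakening those below."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))

def total_similarity(sim_sy, sim_se, sim_struct):
    sim_label = (sim_sy + sim_se) / 2               # SimL
    return sigmoid((sim_label + sim_struct) / 2)    # Sim

print(round(total_similarity(0.8, 0.7, 0.6), 3))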
1. Set Operations
Set operations take a set of graphs as input and then perform set-theoretical oper-
ations. Here we identify four standard fuzzy graph set operations: fuzzy union (∪),
fuzzy intersection (∩), Cartesian product (×), and fuzzy difference (−).
Fuzzy union: Let G1 = (V1, E1, Σ1, L1, μ1, ρ1) and G2 = (V2, E2, Σ2, L2, μ2, ρ2)
be two fuzzy RDF sub-graphs of G. The fuzzy union of G1 and G2 is defined as
follows.
G1 ∪ G2 = (Vr, Er, Σr, Lr, μr, ρr)
Fuzzy intersection: Similarly, the fuzzy intersection of G1 and G2 is defined as
follows.
G1 ∩ G2 = (Vr, Er, Σr, Lr, μr, ρr)
Here Vr = V1 ∩ V2, Er = E1 ∩ E2, Σr = Σ1 ∩ Σ2, and Lr = L1 ∩ L2 with the
classic set-theoretical intersection; μr(v) = μ1(v) ∧ μ2(v), ∀v ∈ V1 ∩ V2, and
ρr(vi, vj) = ρ1(vi, vj) ∧ ρ2(vi, vj), ∀(vi, vj) ∈ E1 ∩ E2, are the membership degrees
of the fuzzy intersection result (Sunitha, 2001), where a ∧ b denotes the minimum
of a and b, i.e., a ∧ b = min(a, b).
For example, we apply a fuzzy intersection operation to the fuzzy RDF graphs
in Figs. 3.5a and 3.1. Then we get the result of the intersection operation shown in
Fig. 3.6.
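A minimal Python sketch of the fuzzy intersection, on the dictionary-based degree maps used in the earlier graph sketch, is shown below (label and Σ handling is omitted for brevity, and the example degrees are assumptions).

# Sketch of the fuzzy intersection G1 ∩ G2: shared vertices and edges keep the
# minimum of the two membership degrees.
def fuzzy_intersection(mu1, rho1, mu2, rho2):
    mu = {v: min(mu1[v], mu2[v]) for v in mu1.keys() & mu2.keys()}
    rho = {e: min(rho1[e], rho2[e]) for e in rho1.keys() & rho2.keys()
           if e[0] in mu and e[1] in mu}
    return mu, rho

mu1 = {"film1": 1.0, "tragedy": 0.95}
rho1 = {("film1", "tragedy"): 0.95}
mu2 = {"film1": 1.0, "tragedy": 0.9}
rho2 = {("film1", "tragedy"): 0.9}
print(fuzzy_intersection(mu1, rho1, mu2, rho2))

The fuzzy union is obtained dually, taking the maximum of the degrees over the united vertex and edge sets.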
Fuzzy Cartesian product: G1 × G2 = (Vr, Er, Σr, Lr, μr, ρr)
Fuzzy difference: G1 − G2 = (Vr, Er, Σr, Lr, μr, ρr)
For example, Fig. 3.8 shows the result of fuzzy difference operation G1 − G2 ,
where the graph G1 is the fuzzy RDF graph in Fig. 3.1, and graph G2 is the fuzzy
RDF graph of Fig. 3.5a.
The fuzzy selection operation can filter fuzzy graphs using a graph pattern. It
accepts a set of fuzzy graphs and a fuzzy graph pattern as input. The output is a fuzzy
collection composed of all subgraphs that match the given graph pattern, capturing
not only the content of the result but also the structure of the matched graphs.
Fuzzy selection: Let G = (V, E, Σ, L, μ, ρ) be a fuzzy RDF data graph. For a
given RDF graph pattern P = (VP, EP, FV, FE, RE), we have the definition of
fuzzy selection as follows.
Here g is a subgraph of G, the function ∈(P, G) is used for matching the fuzzy RDF
graph pattern P against G, and δP(g) is the satisfaction degree. In case of duplicates
(the same graph appearing with several satisfaction degrees), the highest satisfaction
degree is kept.
For example, Fig. 3.9 shows the answer of σP(G), where P is the RDF graph pattern
of Fig. 3.2 and G is the fuzzy data graph of Fig. 3.1. From the graph, the box office of
the film labeled Film1 is over $30 million and its genre is tragedy. Two people labeled
pid1 and pid2, respectively, are the stars of the film, and they are born in country1.
Furthermore, the path going from pid1/pid2 to country1 satisfies the regular expression
RE = “* · locateIn+”. Thus, there are two answers (Fig. 3.9a, b) matching the graph
pattern P in the fuzzy data graph G. As the satisfaction degree is the minimum of the
satisfaction degrees induced by Definition 3.4, we have δP(g1) = 0.7 in Fig. 3.9a and
δP(g2) = 0.3 in Fig. 3.9b, respectively.
Selection and projection are orthogonal operations in relational algebra. With RDF
graphs, selection and projection are not so obviously orthogonal. However, they
have different semantics that correspond to two return semantics for matching a
pattern P against a fuzzy RDF graph G, and they are generalizations of their respective
relational counterparts. The fuzzy projection in our data model takes a collection of
fuzzy graphs as input, and an RDF graph pattern P and a projection list PL as
parameters. A projection list is a list of labels of objects (vertices and edges)
appearing in the pattern P, possibly adorned with *. The output of projection
includes all objects appearing in P, while the (partial) hierarchical relationship
among the retained objects in the original input graph structure is preserved. Note
that if the projection list is empty, just the matching graphs are returned. This implies
that the fuzzy projection may be regarded as eliminating
objects other than those specified from the fuzzy RDF data graph. The projection
operation is defined as follows.
Fuzzy projection: Let G = (V, E, Σ, L, μ, ρ) be a fuzzy RDF data graph, let a fuzzy
projection function be given, and let P be an RDF graph pattern. Then the fuzzy
projection can be defined as follows.
The result of the projection operation is a fuzzy set of graphs, and δT(g) is the
satisfaction degree. The fuzzy projection operation returns a fuzzy set composed of
all subgraphs of G that match the fuzzy graph pattern P.
For example, we apply the same pattern graph of Fig. 3.2 and a projection to the
fuzzy RDF graph of Fig. 3.1. Then we obtain the result of the projection operation
shown in Fig. 3.10. The satisfaction degree δT (g) is 0.3. The difference in the output
structures of selection and projection operations is obvious.
The fuzzy join operation joins data graphs on a pattern. As in relational algebra, join
can be expressed as a Cartesian product followed by a fuzzy selection. The condition
of the selection compares a property of the first graph with a property of the other graph. In a valued
join, the join condition is a predicate on vertex labels of the constituent graphs. In a
structural join, the constituent graphs can be concatenated by edges or unification.
Fuzzy join: Let G1 and G2 be two fuzzy RDF graphs and P be an RDF graph pattern.
Then the fuzzy join operation is defined as follows:

G1 ⨝P G2 = {g | g ∈ σP(G1 × G2)}
For the fuzzy left outer join, let P1 and P2 be the sub-patterns of P corresponding to G1
and G2 respectively; if no matching graph G'2 obtained from σP2(G2) satisfies the join
condition L(v1) = L(v2), then just σP1(G1) is output; otherwise, σP(G1 × G2) is output.
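A rough illustration of the join definition is sketched below in Python; it is not the model's implementation. It assumes the two inputs are already fuzzy sets of matching subgraphs, treats the join condition as an arbitrary predicate on the constituent graphs (a valued join), and combines satisfaction degrees with the minimum, mirroring the minimum semantics used for selection.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Tuple

Graph = FrozenSet[Tuple[str, str, str]]   # a subgraph represented as a set of triples

def fuzzy_join(g1_matches: Dict[Graph, float],
               g2_matches: Dict[Graph, float],
               condition: Callable[[Graph, Graph], bool]) -> Dict[Graph, float]:
    """Cartesian product of two fuzzy match sets followed by a selection.

    `condition` plays the role of the join predicate on vertex labels.
    Degrees of the two constituent graphs are combined with min -- an
    assumption of this sketch, mirroring the minimum semantics of fuzzy
    selection.
    """
    joined: Dict[Graph, float] = {}
    for (g1, d1), (g2, d2) in product(g1_matches.items(), g2_matches.items()):
        if condition(g1, g2):
            g = g1 | g2                      # concatenate the two subgraphs
            joined[g] = max(joined.get(g, 0.0), min(d1, d2))
    return joined
```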
5. Construction Operations
Querying a fuzzy RDF graph implies not only extracting interesting content from the
input model but also constructing an output model by inserting new vertices/edges
or by deleting vertices/edges from the extracted graph. Construction operations are
designed to facilitate the result graph construction for RDF queries.
The vertex deletion operation removes identified vertices from a graph. A delete
specification is used to identify the vertices; it indicates by vertex label which vertices
to delete.
Vertex deletion: Formally, the delete operation takes a fuzzy data graph G =
(V, E, Σ, L, μ, ρ) as input and a delete specification DS as parameter. A delete
specification is a set of vertex labels appearing in G. It generates a fuzzy graph
defined as follows:
K(G, DS) = {g | g = (V′, E′, Σ, L, μ, ρ)}

Here V′ = {v | v ∈ V and L(v) ∉ DS} and E′ is the restriction of E over V′ × V′.
Edge deletion follows the same idea as vertex deletion; it removes relationships
from an RDF graph.
Edge deletion: The edge deletion operation takes as input a fuzzy graph G and a set of edge
labels ES; it returns a fuzzy graph defined as follows:

λ(G, ES) = {g | g = (V, E′, Σ, L, μ, ρ)}

Here E′ = {e | e ∈ E and L(e) ∉ ES}.
Edge insertion: Let G be a fuzzy RDF graph, ES be a set of edge labels to be inserted, and δ be
the fuzzy degree of the inserted edges. The edge insertion operation returns a fuzzy graph
including the inserted edges:

φ(G, ES) = {g | g = (V, E′, Σ′, L, μ, ρ)}

Here E′ = E ∪ {e′ | L(e′) ∈ ES and ρ(e′) = δ} and Σ′ = Σ ∪ ES.
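The three construction operations can be illustrated with a small Python sketch. The FuzzyGraph type below is a toy stand-in for (V, E, Σ, L, μ, ρ) in which vertices are identified directly by their labels and the label alphabet Σ is not modeled explicitly; these are assumptions of the sketch, not part of the formal model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class FuzzyGraph:
    """A toy stand-in for a fuzzy RDF graph (V, E, Sigma, L, mu, rho)."""
    vertices: Set[str] = field(default_factory=set)                    # V, identified by label
    edges: Dict[Tuple[str, str], str] = field(default_factory=dict)    # (v1, v2) -> edge label
    rho: Dict[Tuple[str, str], float] = field(default_factory=dict)    # edge membership degrees

def delete_vertices(g: FuzzyGraph, ds: Set[str]) -> FuzzyGraph:
    """K(G, DS): drop vertices whose labels are in DS and restrict E to V' x V'."""
    v2 = {v for v in g.vertices if v not in ds}
    e2 = {e: lbl for e, lbl in g.edges.items() if e[0] in v2 and e[1] in v2}
    return FuzzyGraph(v2, e2, {e: g.rho[e] for e in e2})

def delete_edges(g: FuzzyGraph, es: Set[str]) -> FuzzyGraph:
    """lambda(G, ES): drop edges whose labels are in ES."""
    e2 = {e: lbl for e, lbl in g.edges.items() if lbl not in es}
    return FuzzyGraph(set(g.vertices), e2, {e: g.rho[e] for e in e2})

def insert_edge(g: FuzzyGraph, edge: Tuple[str, str], label: str, delta: float) -> FuzzyGraph:
    """phi(G, ES): add an edge with the given label and membership degree delta.

    The extension of the label alphabet (Sigma' = Sigma + ES) is omitted
    because Sigma is not represented in this toy structure.
    """
    e2 = dict(g.edges)
    e2[edge] = label
    r2 = dict(g.rho)
    r2[edge] = delta
    return FuzzyGraph(set(g.vertices), e2, r2)
```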
3.5.2 Equivalences
Equivalence laws can be applied to rewrite algebra expressions in a form that satis-
fies certain needs. In this section, we present some algebraic equivalences based on
data graph isomorphism. Algebraic laws are important for query optimization. Our
RDF graph algebra shares some operations with relational algebra, and therefore the
related properties and laws defined in relational algebra carry over. We focus
here on graph pattern properties that are unique to our algebra. First, we define an
equivalence relationship between graph patterns.
Definition 3.12 (Equivalence of graph patterns). Let P1 and P2 be two graph pattern
expressions. P1 and P2 are equivalent, denoted by P1 ≡ P2, if for any valuation ξ of
P1 and P2 over G it holds that ξ(P1) = ξ(P2).
There are some properties for the fuzzy RDF algebra; for example, a selection can be
pushed into the left operand of a left outer join:

σP(G1 ⟕ G2) = σP(G1) ⟕ G2
The list above is not comprehensive by any means. Further study of other algebraic
properties of RDF graph patterns is part of our current research focus. We believe
that studying these algebraic properties can yield fruitful results that can further be
applied in tasks such as caching RDF query results, view management and query
result reuse.
To meet the needs of practical applications, modeling fuzzy RDF alone is not enough;
querying fuzzy RDF is also necessary. This section investigates fuzzy RDF query
processing according to the definitions of the fuzzy RDF algebraic operations
presented above. We begin with a description of the characteristics of the SPARQL
query language in the fuzzy RDF setting and then explain the translation of SPARQL
queries into equivalent RDF algebraic expressions.
1. SPARQL Query in the Fuzzy RDF
SPARQL (Prud’hommeaux & Seaborne, 2008) is a proposal of a protocol and query
language designed for easy access to RDF format datasets. It defines a query language
with a SQL-like syntax, including joins and the capability to retrieve and combine
data from several graphs, where a simple query is based on graph patterns, and query
processing consists of the binding of variables to generate pattern solutions. SPARQL
comes with a powerful graph matching facility, whose basic constructs are so-called
triple patterns. On top of that, SPARQL provides a number of advanced functions
for constructing more expressive queries, for stating additional filtering conditions,
and for formatting the final output.
The overall structure of the query language resembles SQL with its three major
parts, denoted by the upper-case key words SELECT, FROM, and WHERE.
1. The key word SELECT determines the result specification including solution
modifiers. The statements after SELECT refer to the remainder of the query: the
listed names are identifiers of variables for which return values are to be retrieved.
In contrast to SQL, SPARQL allows several forms of returning the data: a table
using SELECT, a graph using DESCRIBE or CONSTRUCT, or a TRUE/FALSE
answer using ASK.
2. The key word FROM specifies a dataset of one default graph and zero or more
named graphs to be queried.
3. The key word WHERE initiates the actual query, which is composed of a graph
pattern. Informally speaking, this clause is given by a pattern that corresponds
to an RDF graph where some resources have been replaced by variables. But not
only that, more complex patterns are also allowed, which are formed by using
some algebraic operators. This pattern is used as a filter of the values of the
dataset to be returned.
Classical SPARQL queries suffer from a lack of query flexibility. The given query
condition and the contents of the RDF repositories are all crisp. In this context, a
query answer will either definitely satisfy or definitely not satisfy the condition. In fuzzy
RDF repositories, however, an answer may satisfy the query condition with a certain
possibility and a certain membership degree even if the condition is crisp, due to the
fact that the datasets contain vagueness (or imprecision). Therefore, just like the definition of
the fuzzy selection operation given above, one needs to compute an appropriate trustworthiness
for the query results when fuzzy data are transformed through SPARQL queries.
Thus, we introduce one additional expression "WITH <threshold>". The optional
parameter [WITH <threshold>] indicates the minimum membership degree threshold
in [0, 1] that must be satisfied. Users choose an appropriate value of <threshold>
to express their requirements. Therefore, a canonical SPARQL
statement is of the form: SELECT—FROM—WHERE—[WITH <threshold>].
Utilizing such SPARQL, one can get the answers that satisfy the given query
condition and the given threshold. Therefore, depending on the different thresholds,
which are values in [0, 1], the same query over the same fuzzy RDF data may have different
query answers. Queries over fuzzy RDF databases are thus concerned with the numerous
choices of threshold. Note that the item WITH <threshold> can be omitted; in that case
the default value of <threshold> is 1.
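The effect of the WITH <threshold> clause can be sketched as a simple post-filter over fuzzy answers. The code below is illustrative only; the binding representation (tuples of variable/value pairs) is an assumption of the sketch.

```python
from typing import Dict, Tuple

Binding = Tuple[Tuple[str, str], ...]   # e.g. (("?x", "film1"), ("?z", "pid1"))

def apply_threshold(answers: Dict[Binding, float], threshold: float = 1.0) -> Dict[Binding, float]:
    """Keep only answers whose membership degree reaches the WITH <threshold> value.

    When WITH is omitted the threshold defaults to 1, so only fully
    satisfying (crisp) answers survive.
    """
    return {b: d for b, d in answers.items() if d >= threshold}

answers = {(("?x", "film1"), ("?z", "pid1")): 0.7,
           (("?x", "film1"), ("?z", "pid2")): 0.3}
print(apply_threshold(answers, 0.2))   # both answers pass the threshold 0.2
print(apply_threshold(answers))        # no answer passes the default threshold 1
```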
2. Translating SPARQL Pattern into Fuzzy RDF Algebraic Formalism
A principal motivation in designing the fuzzy RDF graph model is to use it as a basis for
the efficient implementation of high-level RDF query languages. As the standard query
language for RDF, SPARQL allows us to build complex group graph patterns. Group
patterns can be used to restrict the scope of query conditions to certain parts of the
pattern. Moreover, it is possible to define sub-patterns as being optional, or to provide
multiple alternative patterns. In this section, we begin with the expressive power of
fuzzy RDF algebra w.r.t the core fragment of SPARQL query languages. Then,
we show that every SPARQL query pattern can be translated into our fuzzy RDF
algebraic terminology introduced above, and provide the procedure that performs
this translation.
Our fuzzy RDF algebra is designed taking SPARQL’s power of expression into
consideration. SPARQL pattern expressions from the WHERE clause can easily be
translated into fuzzy RDF algebraic expressions. The reverse translation is not
always possible, as there are fuzzy RDF algebra expressions (e.g. expressions with
construction operations) that are not expressible in SPARQL. Before providing the
procedure that performs this translation, we discuss the translation rules of SPARQL
pattern into RDF algebra expression. We do not recall the complete surface syntax
of SPARQL here but simply introduce the underlying algebraic operations using our
notation. Let G be an RDF graph over an RDF dataset D, let t denote a triple pattern,
let P, P1, P2 be basic SPARQL graph patterns, let R be a filter condition, and let S be a set of
variables. Table 3.3 shows the translation rules between SPARQL query patterns and RDF
algebraic expressions.
A SPARQL query pattern is either a basic graph pattern or a group graph pattern,
consisting of triple blocks, FILTER, OPTIONAL, and UNION graph patterns.
Some of them contain other graph patterns. The above translation is applied to a single
SPARQL group graph pattern. Nested group graph pattern blocks in the WHERE
clause can be handled quite easily, leading to the following result:
Theorem 3.4 Fuzzy RDF algebra expressions can express SPARQL query patterns.
Proof: SPARQL individual triple patterns can be expressed by "triple pattern matching"
expressions. Basic graph patterns in SPARQL imply a join on common variables
among individual triple patterns. The UNION, FILTER, OPTIONAL and SELECT
expressions can be directly mapped to the "union", "selection", "leftjoin" and "projection"
operators of the fuzzy RDF algebra, respectively. These pattern expressions are identified
in the nesting sequence, inside out, and can then be combined by a cascade of "join"
operators, in the same way that natural join is defined in relational algebra.
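The correspondence used in this proof can be summarized as a small lookup sketch; the operator names on the right-hand side are labels chosen here for illustration.

```python
# Correspondence between SPARQL constructs and fuzzy RDF algebra operators,
# as stated in the proof of Theorem 3.4 (the right-hand labels are this sketch's names).
SPARQL_TO_ALGEBRA = {
    "triple pattern":      "triple pattern matching",
    "basic graph pattern": "join on common variables",
    "UNION":               "union",
    "FILTER":              "selection",
    "OPTIONAL":            "leftjoin",
    "SELECT":              "projection",
}
```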
Besides the conversion rules as such, it is of course also necessary to define how to
transform SPARQL queries into expressions of this algebra in the first place. Based on
the above translation rules and Theorem 3.4, we can transform any SPARQL pattern
into an algebra expression. For the sake of readability, we assume that the translation
is applied to a single group graph pattern; the procedure is given as Algorithm 3.2.
Algorithm 3.2 Transformation of SPARQL pattern syntax into fuzzy RDF algebraic
expression
Translate (group graph pattern G)
Input: a SPARQL pattern G
Output: an algebraic expression A
1: A= φ; F = φ
2: for each syntactic form g in G do
3: if g is triple pattern t then
4: A = (A ⨝ (t))
5: if g is OPTIONAL {P} then
6: A = (A ⟕ Translate (P))
7: if g is {P1 } UNION…UNION {Pn } then
8: if n>1 then
9: A' = (Translate (P1 ) ∪…∪ Translate (Pn ))
10: else
11: A' = Translate (P1 )
12: A = (A ⨝ A' )
13: if g is FILTER {R} then
14: F = F ∧ {R}
15: end for
16: if F ≠ φ then
17: A = σF(A)
Algorithm 3.2 consists of three phases. In the first phase (Line 1), the sets A and F,
which store the pattern and the filtering conditions respectively, are initialized to be empty. In the
second phase (Lines 2–15), translation is performed to obtain the algebraic expression
of every syntactic form g in the group graph pattern G. In each iteration, if the sub-pattern g is a
triple pattern or triple block, a join operation is performed to collect triples and
blocks (Lines 3–4). Then, for each sub-pattern g with OPTIONAL, a left join operation
is performed to provide optional matching (Lines 5–6). Next, all occurrences of
UNION are expressed using the binary operator union for specifying alternatives
(Lines 7–12). In case of a longer chain of alternatives, the patterns are processed
two at a time in accordance with the associativity of UNION. Finally, if g is a FILTER
operator and R is a SPARQL built-in condition, a conjunction operator is
applied to combine the filter condition R with F as basic constraints (Lines 13–14).
This procedure is repeated until all sub-patterns in G have been translated. If F is not
empty, it is combined with A using the selection operator of the fuzzy RDF algebra
(Lines 16–17).
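A minimal Python sketch of this procedure is given below. It assumes a simplified parse of the group graph pattern as a list of (kind, payload) items and builds a nested-tuple algebra expression; it is an illustration of Algorithm 3.2, not a full SPARQL parser.

```python
from typing import List, Tuple, Union

# A tiny expression tree for the fuzzy RDF algebra; operator names follow Algorithm 3.2.
Expr = Union[str, Tuple]          # a triple pattern string or (operator, operand, ...)

def translate(group: List[Tuple[str, object]]) -> Expr:
    """Sketch of Algorithm 3.2: turn a parsed group graph pattern into an algebra expression.

    `group` is assumed to be a list of (kind, payload) items, where kind is one of
    "triple", "optional", "union", "filter" -- a simplified stand-in for the SPARQL parse.
    """
    a: Expr = "{}"                # empty pattern, like A = {} in the algorithm
    filters: List[str] = []       # F: conjunction of filter conditions
    for kind, payload in group:
        if kind == "triple":                       # Lines 3-4: join in the triple pattern
            a = ("join", a, payload)
        elif kind == "optional":                   # Lines 5-6: left outer join
            a = ("leftjoin", a, translate(payload))
        elif kind == "union":                      # Lines 7-12: union of alternatives
            alts = [translate(p) for p in payload]
            a_alt = alts[0] if len(alts) == 1 else ("union", *alts)
            a = ("join", a, a_alt)
        elif kind == "filter":                     # Lines 13-14: collect filter conditions
            filters.append(payload)
    if filters:                                    # Lines 16-17: final selection
        a = ("select", " AND ".join(filters), a)
    return a

# The WHERE clause of the movie example below, in the simplified parse form assumed here
pattern = [("triple", "(?x ex:boxOffice ?y)"),
           ("filter", "?y > $30 million"),
           ("triple", "(?x ex:starring ?z)"),
           ("optional", [("triple", "(?z ex:marriedTo ?p)")])]
print(translate(pattern))
```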
In Algorithm 3.2, we focus on the core fragment of SPARQL query patterns and,
thus, impose the following restrictions on graph patterns and the translation
process. First, we mainly focus on the procedure that performs the translation
of SPARQL patterns, that is, we do not take into account the solution modifiers
and the output of a SPARQL query. Second, we do not consider blank vertices.
We make this simplification here to concentrate on the pattern matching part of the
language. Third, we concentrate on the set semantics of graph patterns.
Proposition 3.7 Algorithm 3.2 is correct and complete for translating the SPARQL
pattern into RDF algebraic expressions.
This proposition can be proved inductively. First, the set of algebraic expressions
is complete for the empty set. At each step, the SPARQL graph pattern G is
completely extended for the current syntactic form f, and the number of syntactic
forms in a SPARQL graph pattern is finite. The algorithm proceeds recursively
until all syntactic forms have been translated into algebraic expressions completely.
The procedure ends with an algebraic expression for each syntactic form in G.
In essence, a SPARQL query is a result constructor wrapped in a set of vari-
able bindings generated by the graph pattern. Therefore, the final step of translation
work generates the operators for the result type. The official W3C Recommenda-
tion (Prud’hommeaux & Seaborne, 2008) defines four different types of queries on
top of expressions, namely SELECT, ASK, CONSTRUCT, and DESCRIBE queries.
Depending on the result type of the query, the translation creates the appropriate oper-
ator and connects it to the algebraic expression generated so far, i.e., it constructs a
dataflow to the algebraic expression representing the graph pattern. We will restrict
our discussion to SELECT queries in Example 9. The various expressive features of
a SELECT query can be successively replaced by an expression using fuzzy RDF
algebra operators.
In the following, we show how a fuzzy RDF algebraic expression is used to
represent an SPARQL query. For convenience, we firstly use the natural language
to express the fuzzy RDF queries. Then, we provide the SPARQL query statement
written according to the official SPARQL syntax along with their equivalent RDF
algebraic expression.
For example, suppose that we want to query the name of a movie and the names of its stars.
The movie's box office is more than 30 million. The birthplace of the stars is located in
"region1" and, optionally (i.e., if available), their partners are returned. At the same
time, the trustworthiness of the query result is more than 0.2. Consider the SPARQL
query written according to the official SPARQL syntax.
PREFIX ex: <http://example.org/>
SELECT ?x ?z ?p
FROM G
WHERE { ?x ex:boxOffice ?y.
FILTER (?y > "$30 million")
?x ex:starring ?z.
?z ex:bornIn ?c.
?c ex:locateIn ?r.
FILTER (?r = "region1")
OPTIONAL { ?z ex:marriedTo ?p } }
WITH <0.2>
Following the grammar of SPARQL, the above pattern (WHERE clause) is parsed
as a single group graph pattern that contains the syntactic forms triple block, filter,
triple blocks, filter, and optional graph pattern in that order. This final optional graph
pattern syntactic form contains a group graph pattern with a single triple block
syntactic form. The translation procedure in Algorithm 3.2 starts with A = {} and F
= {}. Then we consider all the syntactic forms in the pattern to obtain:

P = σF(A)

Here A = (({} ⨝ {(?x ex:boxOffice ?y)} ⨝ {(?x ex:starring ?z), (?z ex:bornIn ?c), (?c ex:locateIn ?r)}) ⟕ ({} ⨝ {(?z ex:marriedTo ?p)})) and F = ((?y > "$30 million") ∧ (?r = "region1")). Assume that the input RDF graph G is given in Fig. 3.1.
Then the above SPARQL query evaluated on the fuzzy RDF graph G is equivalent
to the RDF algebraic expression:
πP,LS(G)

Here P = σF(A) is the pattern graph, LS = {?x, ?z, ?p} is the projection list and
G is the input RDF graph. It is easily verified that the answers are as follows:
πP,LS(G) = {<{?x → film1, ?z → pid1}, 0.7>, <{?x → film1, ?z → pid2, ?p → pid3}, 0.3>}.
Similar translations are also feasible for other SPARQL query types. The main
challenge of SPARQL query translation to the algebraic expression lies in the core
fragment of the query pattern, which is common to all query types. We will briefly
introduce the translation process corresponding to different SPARQL query types.
A CONSTRUCT query can copy existing triples from a dataset, or can create
new triples. For the former case, the triple graph (result graph) can be directly
retrieved from the data source by the selection operation. For the latter case, the
intermediate graph is first extracted by the selection operation, and then the
required triples are extracted by the projection operation. Finally, construction
operations are designed to facilitate the result graph construction for RDF queries by
providing a means for creating and inserting new vertices/edges and manipulating the
extracted structures. Of course, this process may need to be repeated using multiple
construction operations, and the specific number of construction operations and the
overall complexity are determined by the size of the query problem.
ASK asks a query processor whether a given graph pattern describes a set of triples
in a given dataset or not, and the processor returns a boolean true or false depending
on whether there is a result graph. We can use a selection operation to extract a result
graph from a specific data source, based on a given graph pattern.
A DESCRIBE query takes each of the resources identified in a solution,
together with any resources directly named by IRI, and assembles a single RDF graph
by taking a "description" which can come from any available information, including
the target RDF dataset. It is worth noting that the description is determined by the
SPARQL query processor, according to the SPARQL 1.1 specification. This has led
to inconsistent implementations of DESCRIBE queries. In our solution, similar to
the CONSTRUCT query, the query pattern is utilized to create a result set. And
selection and projection operations are designed to return an RDF graph describing
a set of IRIs and the resources that are bound to given variable names, i.e., it returns
all the triples in the dataset involving these resources. Finally, the result RDF graph
is obtained through the construction operation.
3.6 Summary
This chapter has presented the modeling of fuzzy RDF data and the algebraic operations
and query processing for fuzzy RDF data management. How to store RDF with imprecise or uncertain
information has raised certain concerns, as will be introduced in the following chapter.
References
Chen, L., Gupta, A., & Kurul, M. E. (2005). A semantic-aware RDF query algebra. In Proceedings
of the International Conference on Management of Data, Hyderabad, India.
Dividino, R., Sizov, S., Staab, S., & Schueler, B. (2009). Querying for provenance, trust, uncertainty
and other Meta knowledge in RDF. Journal of Web Semantics: Science, Services and Agents on
the World Wide Web, 7(3), 204–219.
Dorneles, C. F., Gonçalves, R., & dos Santos Mello, R. (2011). Approximate data instance matching:
A survey. Knowledge and Information Systems, 27(1), 1–21.
Ehrig, M., & Sure, Y. (2004). Ontology mapping—An integrated approach. In European Semantic
Web Symposium (pp. 76–91). Springer.
Fan, T., Yan, L., & Ma, Z. (2019). Mapping fuzzy RDF(S) into fuzzy object-oriented databases.
International Journal of Intelligent Systems, 34(10), 2607–2632.
Fan, W., Li, J., Ma, S., Tang, N., & Wu, Y. (2011). Adding regular expressions to graph reacha-
bility and pattern queries. In Proceedings of the 27th IEEE International Conference on Data
Engineering, Hannover, Germany (pp. 39–50).
Frasincar, F., Houben, G. J., Vdovjak, R., & Barna, P. (2002). RAL: an algebra for querying RDF.
In Proceedings of the Third International Conference on Web Information Systems Engineering
(pp 173–181).
Fukushige, Y. (2005). Representing probabilistic relations in RDF. In Proceedings of the Interna-
tional Semantic Web Conference, Galway, Ireland (pp. 106–107).
Grant, J., & Beckett, D. (2002). RDF test cases. http://www.w3.org/TR/2002/WD-rdf-testcases-
20021112/
Hartig, O. (2009). Querying trust in RDF data with tSPARQL. In Proceedings of the 6th European
Semantic Web Conference on the Semantic Web: Research and Applications, Heraklion, Crete,
Greece (pp. 5–20).
Hayes, P. (2004). RDF Semantics, W3C Recommendation. http://www.w3.org/TR/rdf-mt/
Huang, H., & Liu, C. (2009). Query evaluation on probabilistic RDF databases. In Proceedings
of the 10th International Conference on Web Information Systems Engineering, Poznań, Poland
(pp. 307–320).
Jaro, M. A. (1989). Advances in record-linkage methodology as applied to matching the 1985
census of Tampa, Florida. Journal of the American Statistical Association, 84(406), 414–420.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals.
In Soviet physics doklady (Vol. 10, No. 8, pp. 707–710).
Lopes, N., Polleres, A., Straccia, U., & Zimmermann, A. (2010). AnQL: SPARQLing up annotated RDFS.
In Proceedings of the 9th International Semantic Web Conference, Shanghai, China (pp. 518–533).
Ma, Z., Li, G., & Yan, L. (2018). Fuzzy data modeling and algebraic operations in RDF. Fuzzy Sets
and Systems, 351, 41–63.
Ma, Z. M., Liu, J., & Yan, L. (2010). Fuzzy data modeling and algebraic operations in XML.
International Journal of Intelligent Systems, 25(9), 925–947.
Manola, F., Miller, E., & McBride, B. (2004). RDF primer. W3C Recommendation, 10(1–107), 6.
Mazzieri, M., & Dragoni, A. F. (2008). A Fuzzy Semantics for the Resource Description Framework,
Uncertainty Reasoning for the Semantic Web I: ISWC International Workshops, URSW 2005–
2007 (pp. 244–261). Springer.
Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S., & Zave, P. (2011). Matching and merging
of variant feature specifications. IEEE Transactions on Software Engineering, 38(6), 1355–1375.
Pappis, C. P., & Karacapilidis, N. I. (1993). A comparative assessment of measures of similarity of
fuzzy values. Fuzzy Sets and Systems, 56(2), 171–174.
Piattini, M., Galindo, J., & Urrutia, A. (2006). Fuzzy Databases: Modeling, Design and Implementation.
Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet:: Similarity-Measuring the
Relatedness of Concepts. In AAAI (Vol. 4, pp. 25–29).
Prud’hommeaux, E., & Seaborne, A. (2008). SPARQL Query Language for RDF. W3C Recommen-
dation. http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
Robertson, E. L. (2004). Triadic relations: An algebra for the semantic web. In Proceedings of the
Second International Workshop on Semantic Web and Databases, Toronto, Canada (pp. 91–108).
Straccia, U. (2009). A minimal deductive system for general fuzzy RDF. In Proceedings of the Third
International Conference Web Reasoning and Rule Systems, Chantilly, VA, USA (pp. 166–181).
Sunitha, M. S. (2001). Studies on fuzzy graphs. PhD thesis, Cochin University of Science and
Technology, India.
Tappolet, J., & Bernstein, A. (2009). Applied temporal RDF: Efficient temporal querying of RDF
data with SPARQL. In Proceedings of the 6th European Semantic Web Conference on the Semantic
Web: Research and Applications, Heraklion, Crete, Greece (pp. 308–322).
Udrea, O., Recupero, D. R., & Subrahmanian, V. S. (2010). Annotated RDF. ACM Transactions on
Computational Logic, 11(2), 1–41.
Udrea, O., Subrahmanian, V. S., & Majkic, Z. (2006). Probabilistic RDF. In 2006 IEEE International
Conference on Information Reuse and Integration, Waikoloa Village, HI (pp. 172–177).
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Systems, 1(1), 3–28.
Zimmermann, A., Lopes, N., Polleres, A., & Straccia, U. (2011). A general framework for repre-
senting, reasoning and querying with annotated semantic web data. Journal of Web Semantics,
11(3), 72–95.
Zhu, X., Song, S., Lian, X., Wang, J., & Zou, L. (2014). Matching heterogeneous event data.
In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
(pp. 1211–1222).
Chapter 4
Persistence of Fuzzy RDF and Fuzzy
RDF Schema
4.1 Introduction
RDF represents an emerging data model that provides the means to describe resources
in a semi-structured manner for real-world applications. In practice, RDF is gaining
widespread momentum and usage in different domains, such as the Semantic Web,
Linked Data, Open Data, social networks, digital libraries, bioinformatics, or business
intelligence. With the wide application of RDF, the scale of available RDF data is
increasing dramatically. At this point, the scalable storage and efficient queries of
RDF data are becoming increasingly crucial. The former is the infrastructure for
RDF data management (Ma et al., 2016).
We can identify three major types of RDF data store: memory-based storage,
traditional databases-based storage, and NoSQL databases-based storage (Harris &
Gibbins, 2003). While the memory-based storage [e.g., BitMat (Atre et al., 2009),
BRAHMS (Janik & Kochut, 2005), and RDFox (Nenov et al., 2015)] has the fastest
speed of processing RDF, this method can only store the most necessary RDF struc-
tural data due to the memory usage restriction. A more common RDF storage
method is based on traditional databases, such as relational databases and object-
oriented databases. In the context of relational databases, we can further identify
three approaches.
With the first one called vertical stores or triple stores [e.g., RDFPeers (Cai &
Frank, 2004), 3store (Harris & Gibbins, 2003), RDF-3X (Neumann & Weikum,
2008; Neumann & Weikum, 2010a, 2010b), and Hexastore (Weiss et al., 2008)],
each RDF triple is stored as a tuple in a relational table with the relational schema
(subject, predicate, object), in which each column corresponds to an element of RDF
triple. The disadvantage of this approach is that too many self-join operations must
be applied while querying RDF data stored in the relational table.
The second approach called horizontal stores [e.g., SW-Store (Abadi et al., 2009)
and C-Store (Weiss et al., 2018)] divides RDF triples vertically based on their pred-
icates. Then the triples with the same predicate are stored in a relational table. Such
a predicate-oriented relational table does not contain null values and multivalued
attributes. But this approach involves complicated join operations among different
relational tables to get RDF data stored in multiple relational tables.
With the third approach called property stores (Chong et al., 2005; Sintek & Kiesel,
2006; Wilkinson et al., 2003), for the same or similar subject, multiple attributes are
designed in the form of n-ary table columns. Then each row stores the same or
similar subject and its corresponding attribute value. This approach can reduce join
operations but faces the problems of null values and multivalued attributes.
Although RDF storage based on relational databases provides a convenient way
to manage RDF data, relational databases cannot well support the storage of massive
RDF data. Therefore, to store large-scale RDF data, some distributed storage archi-
tectures are developed specifically for massive RDF data in the distributed RDF data
management systems. More recently, NoSQL databases such as CouchDB, HBase
(Sun & Jin, 2010), and graph databases (Peng et al., 2016) are used to store and
manage large-scale RDF data (Cudré-Mauroux et al., 2013).
Note that all the aforementioned works assume that the underlying RDF data
are reliable and precise. However, information is often imprecise and uncertain in
many real-world applications, and many sources can contribute to the imprecision
and uncertainty of data or information. Therefore, the study of reengineering fuzzy
RDF in fuzzy database models has received attention. Fuzzy databases such as fuzzy
relational databases and fuzzy object-oriented databases (Quasthoff & Meinel, 2011)
can store a large set of semantic information. And reengineering fuzzy RDF into fuzzy
database models may satisfy the needs of storing fuzzy RDF data in fuzzy databases.
Currently, there have been many efforts in storing crisp RDF data based on various
databases while few in storing fuzzy RDF data (Bornea et al., 2013; Chen et al.,
2006).
To the best of our knowledge, there
are only two efforts in the storage of fuzzy RDF. Ma and Yan (2018) investigated
the formal mapping from the fuzzy RDF model to the fuzzy relational databases,
which is based on the fuzzy relational database model and supports the storage of
fuzzy RDF triples. Considering the storage of fuzzy RDFS in addition to fuzzy RDF
triples, Fan et al. (2019) presented an approach for reengineering fuzzy RDF(S) into
fuzzy object-oriented database models. Like the situation of crisp RDF storage in
databases, the fuzzy relational databases and fuzzy object-oriented databases cannot
effectively support large-scale fuzzy RDF data management. In this chapter, we introduce
the issue of reengineering fuzzy RDF into fuzzy database models, including the
fuzzy relational database model, the fuzzy object-oriented database model, and HBase
databases.
4.2 Fuzzy RDF Mapping to Relational Databases
Because of its success in data storage and management, and because the triple form
of RDF data (subject, predicate, object) can be easily mapped to the relational table
model, the relational database is used by many researchers to store RDF data. Depending on the
table structure of the RDF triples mapped to the relational database, the corresponding
storage methods are also different. To reengineer fuzzy RDF into fuzzy relational
database model, Ma and Yan (2018) investigated the formal mapping from the fuzzy
RDF model to the fuzzy relational database. In this section, we investigate the strate-
gies and approaches to mapping fuzzy RDF data to fuzzy relational databases based
on the research work of Ma and Yan (2018). It is important to note that the fuzzy
RDF model in this section differs from the model defined in the previous section for
the sake of simplicity. That is, the fuzzy RDF model in this section only considers
the fuzziness of triples, and does not consider element-level fuzziness.
Fig. 4.1 RDF triples and fuzzy graph view. a Fuzzy RDF data, b fuzzy RDF graph
In order to overcome the problem of self-joins in fuzzy triple stores, a single rela-
tional table containing all different predicates as columns may be applicable. In the
relational table, for each unique predicate of RDF triples, a subject–object relation
is directly represented, in which the predicate is as a column name and the object is
a value of this column. Triples with the same subject become a tuple of relational
databases. Note that several triples with the same subject may have the same predicate
and different objects. In Fig. 4.2 for example, we have three triples (IBM, industry,
Software, 1.0), (IBM, industry, Hardware, 1.0), and (IBM, industry, Services, 0.9).
They are mapped into a tuple whose value on the attribute "industry" is a fuzzy set
represented by {(Software, 1.0), (Hardware, 1.0), (Services, 0.9)}. The approach of
storing fuzzy RDF data in a single relational table is called fuzzy horizontal
stores in this chapter.
Formally, for a given set T of fuzzy triples, suppose that n different predicates, say
p1 , p2 , …, pn , are included. Then we have a fuzzy relational schema with the form of
(subject, p1, p2, …, pn). Any triple (si, pi, (oi, λi)) ∈ T corresponds to a
tuple ti in the fuzzy relation. If there is no tuple with value si on attribute
subject in the fuzzy relation, ti is a new tuple inserted into the fuzzy relation. At
this point ti[subject] = si, ti[pi] = {(oi, λi)}, and the values of ti on the other attributes
are null values. If there already exists a tuple ti with value si on attribute subject in the
fuzzy relation (i.e., ti[subject] = si), we need to further determine whether ti[pi] is a null
value or not. If ti[pi] is a null value, then ti[pi] = {(oi, λi)}; otherwise ti[pi] = ti[pi]
∪ {(oi, λi)}. For the fuzzy RDF triples and fuzzy RDF graph presented in Fig. 4.2,
their relational representation of fuzzy horizontal stores is shown in Table 4.2. There
are five different subjects and 13 unique predicates, and so the single relational table
contains five tuples and 13 columns (attributes).
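The mapping into fuzzy horizontal stores can be sketched in a few lines of Python; the sketch below is an illustration only (dictionaries stand in for the relational table and its fuzzy-set-valued cells), not the storage implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FuzzyTriple = Tuple[str, str, str, float]        # (subject, predicate, object, degree)

def horizontal_store(triples: List[FuzzyTriple]) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Map fuzzy triples into a single wide relation keyed by subject.

    Each subject becomes one tuple; each predicate becomes a column whose
    value is a fuzzy set {object: degree}.  Predicates missing for a subject
    correspond to null values in the relational table.
    """
    table: Dict[str, Dict[str, Dict[str, float]]] = defaultdict(dict)
    for s, p, o, degree in triples:
        cell = table[s].setdefault(p, {})        # the fuzzy set stored in column p
        cell[o] = max(degree, cell.get(o, 0.0))
    return dict(table)

rows = horizontal_store([("IBM", "industry", "Software", 1.0),
                         ("IBM", "industry", "Hardware", 1.0),
                         ("IBM", "industry", "Services", 0.9)])
print(rows["IBM"]["industry"])    # {'Software': 1.0, 'Hardware': 1.0, 'Services': 0.9}
```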
It can be seen from the example that fuzzy horizontal stores use a single rela-
tional table which contains all different predicates as columns. When new triples
are inserted, new predicates result in changes of the relational schema and dynamic
schemas of RDF data cannot be handled. In addition, it is a common case that in
the single relational table containing all predicates as columns, a subject occurs only
with some predicates, which leads to a sparse relational table with many null values.
To solve the problem of too many null values in fuzzy horizontal stores, we
propose two variations of fuzzy horizontal stores in the following, which are called
fuzzy column stores and fuzzy type stores in this chapter. The basic idea of these two
kinds of stores is to vertically partition the single table of fuzzy horizontal stores into
a set of tables by the predicates. Each table contains one predicate (in fuzzy column
stores) or several predicates (in fuzzy type stores).
1. Fuzzy column stores
Fuzzy column stores vertically partition the single table of fuzzy horizontal stores
into a set of relational tables, one table per predicate. As a result, the descriptions of
a subject in its properties and objects are partitioned across multiple relational tables,
and this generally involves too many join operations
for querying. In addition, when new triples are inserted, new predicates result in
new relational tables and dynamic schemas of RDF data cannot be handled also.
To solve the problem of too many join operations for querying, we introduce
fuzzy type stores in the following.
2. Fuzzy type stores
Fuzzy type stores also use a set of relational tables. But, instead of vertically
partitioning the single table of fuzzy horizontal stores into a set of tables by each
predicate, in fuzzy type stores, each relational table contains some predicates
as its columns. The predicates included in a relational table generally have the
same data types. The fuzzy RDF triples whose properties have the same data
types appear in the same relational table. Here, fuzzy triples with the same
subject become a tuple of the corresponding relational table. Triples that have the
same subject, the same predicate and different objects are mapped into one tuple,
whose value on the predicate attribute is represented as a fuzzy set. The different
objects contained in this fuzzy set act as its supports.
Formally, suppose that we have a set T of fuzzy triples having m unique predicates
with the same data type, say p1 , p2 , …, pm . For these fuzzy triples, we have a fuzzy
relational schema with the form of (subject, p1 , p2 , …, pm ). For any two triples (si ,
pi , (oi , λi )) ∈ T and (sj , pj , (oj , λj )) ∈ T, they arise in the same relational table with
one row for each subject. Furthermore, when si = sj and pi ≠ pj, oi and oj are placed
in different columns pi and pj of the same row in the forms of {(oi, λi)} and {(oj, λj)},
respectively; when si = sj and pi = pj, oi and oj are placed in the same column
of the same row in the form of {(oi, λi), (oj, λj)}; when si ≠ sj and pi = pj, oi and
oj are placed in the same column of different rows si and sj in the forms of {(oi, λi)}
and {(oj, λj)}, respectively.
For the fuzzy RDF triples and fuzzy RDF graph in Fig. 4.1, the predicates are
identified as three data types: people, companies and operating systems. Then their
relational representation of fuzzy type stores is shown in Fig. 4.3.
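Building on the previous sketch, fuzzy type stores can be illustrated by partitioning the triples into one table per predicate data type; the predicate-to-type assignment is supplied as a parameter and is an assumption of this sketch, not part of the formal definition.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FuzzyTriple = Tuple[str, str, str, float]   # (subject, predicate, object, degree)

def type_store(triples: List[FuzzyTriple],
               predicate_type: Dict[str, str]) -> Dict[str, Dict[str, Dict[str, Dict[str, float]]]]:
    """Partition fuzzy triples into one table per data type of their predicates.

    `predicate_type` maps each predicate to a data-type group (e.g. "people",
    "companies", "operating systems"); triples whose predicates share a group
    land in the same table, with one row per subject and a fuzzy set per column.
    """
    tables: Dict[str, Dict[str, Dict[str, Dict[str, float]]]] = defaultdict(lambda: defaultdict(dict))
    for s, p, o, degree in triples:
        table = tables[predicate_type[p]]        # choose the table for this predicate's type
        cell = table[s].setdefault(p, {})        # the fuzzy set stored in column p of row s
        cell[o] = max(degree, cell.get(o, 0.0))
    return {name: dict(rows) for name, rows in tables.items()}
```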
The approach of fuzzy type stores is actually a trade-off between fuzzy horizontal
stores and fuzzy column stores. Fuzzy horizontal stores use a single relational
table, which generally contains many null values but does not involve join operations.
Fuzzy column stores use a set of relational tables, which do not contain null values
but involve too many join operations. Fuzzy type stores contain fewer null values
compared with fuzzy horizontal stores and involve fewer join operations compared
with fuzzy column stores.
In summary, the strategies and approaches to storing fuzzy RDF data in fuzzy
relational databases have been presented above, including fuzzy triple stores, fuzzy
horizontal stores, fuzzy column stores and fuzzy type stores. Since fuzzy RDF data
are stored in fuzzy relational databases and SPARQL (Simple Protocol and RDF
Query Language, the RDF query language recommended by W3C) cannot be applied
directly, a consequent issue emerges, that is, how to query fuzzy RDF data stored in
fuzzy relational databases. A possible way is to translate SPARQL queries for RDF
data to SQL (Structured Query Language, the standard query language for relational
databases) queries for relational databases. Let us look at fuzzy RDF data in Fig. 4.1.
Suppose that we have a SPARQL query:
SELECT DISTINCT ?p ?company
WHERE {{?p founder ?company.
UNION
?p board ?company.}
?company industry “Software”.
}
This SPARQL query returns (Charles Flint, IBM) and (Larry Page, Google).
Now we store fuzzy RDF data in Fig. 4.1 in fuzzy relational databases, say the
fuzzy relational databases in Fig. 4.2. Suppose that these tables are named with
their predicates and we have t_born, t_died, t_founder, t_board, t_home, t_version,
t_developer, t_kernel, t_preceded, t_graphics, t_industry, t_employees and t_HQ.
Then the SPARQL query above is translated to a SQL query correspondingly:
SELECT DISTINCT *
FROM
(((SELECT subject, founder as company FROM t_founder) AS t_1
LEFT OUTER JOIN
(SELECT subject as company, board FROM t_board) AS t_2
On (false))
UNION
((SELECT subject as company, board FROM t_board) AS t_3
LEFT OUTER JOIN
(SELECT subject, founder as company FROM t_founder) AS t_4
On (false))
) AS t_5
INNER JOIN
4.3 Fuzzy RDF Mapping to Object-Oriented Databases
The classical relational database model and its fuzzy extension do not satisfy the need
of modeling complex objects with imprecision and uncertainty. In order to model
uncertain data and complex-valued attributes as well as complex relationships among
objects, current efforts have concentrated on the fuzzy object-oriented databases as
introduced in Chap. 2. Therefore, reengineering fuzzy RDF into fuzzy object-oriented
database model may satisfy the needs of storing fuzzy RDF data in fuzzy databases
and help with the interoperability between the fuzzy object-oriented database model and
fuzzy RDF. Based on a similar idea to Sect. 4.2, in the following, we introduce
how to reengineer fuzzy RDF into fuzzy object-oriented database model, provide a
set of rules for mapping fuzzy RDF into fuzzy object-oriented database model and
implement a prototype to demonstrate our approach.
Note that, we apply the fuzzy object-oriented databases instead of the fuzzy rela-
tional databases because the fuzzy object-oriented database model can represent
complex objects and relationships with fuzziness more effectively. More impor-
tantly, the fuzzy object-oriented databases are very suitable for storing some impor-
tant concepts in the fuzzy RDFS such as fuzzy classes, instances, properties, and
fuzzy class/property hierarchies.
To deal with uncertainties in the RDF Schema layer, Fan et al. (2019) extended the
definition of the fuzzy RDF graph model, which explicitly classifies the element Σ,
i.e. the set of labels, into five categories. That is, Σ = {C, OP, LP, D, T} is a set
of labels, where C is a set of class resource labels, OP is a set of object property
resource labels, LP is a set of datatype property resource labels, D is a set of datatype
labels, and T is a set of instance resource labels. In particular, we investigate how
to formally map the fuzzy RDF model to the fuzzy object-oriented database model
in this subsection. We develop mapping rules and implement a prototype system to
demonstrate the feasibility of our approach.
In the fuzzy RDF graph, its label elements include class resource labels, property
resource labels, datatype labels, and instance resource labels. The elements on the
fuzzy RDF semantic layer can identify the types of resources that the vertices and
edges in the fuzzy RDF graph model correspond to. It can be seen that they are very
similar to the elements of the fuzzy object-oriented database. The interpretation of
the semantic layer of the fuzzy RDF graph model mainly includes four aspects:
In the fuzzy RDF model, classes are the elements of the RDFS layer. The fuzzy class
differs from the classic class because the behavior and state of the object contained
in the fuzzy class are uncertain. That is, the properties of the fuzzy classes are
fuzzy ones. In addition, the inheritance relationships between fuzzy classes are also
uncertain. It means that two (fuzzy) classes have a subclass-superclass relationship
with a membership degree.
For the fuzzy classes in the fuzzy RDF, we need to map not only these classes
themselves but also their relationships. For this purpose, we propose two mapping
rules in the following. Here we use a function Γ that maps the elements in the fuzzy
RDF model to the corresponding elements in the fuzzy object-oriented databases.
Rule 1: Lv(vi) ∈ Σ.C ⇒ Γ(vi) = fci ∈ FCFS
When a vertex label of the fuzzy RDF graph model is a class label, it is mapped to
a class in the fuzzy object-oriented database model and then named after the label.
Rule 2: Lv(vi) ∈ Σ.C ∧ Lv(vj) ∈ Σ.C ∧ LE(vi × vj) = subClassOf ⇒ Class fci is-a fcj/μ type-is ftk ∈ FT.
When an edge label of the fuzzy RDF graph model is subClassOf , the fuzzy class
which corresponds to the start point is a subclass of the fuzzy class which corresponds
to the end point. And the label value is the membership degree of the subclass to the
superclass.
Let us look at a fuzzy RDF subgraph model shown in Fig. 4.4. It is shown in
Fig. 4.4 that there are three vertex labels, which are the class labels Person, Student
and Staff, and two edge labels, which are both named subClassOf. The label values are
0.8 and 0.9, respectively. They are mapped to the class Person, the class Student, and
the class Staff in the fuzzy object-oriented database model, respectively; the keyword
is-a in the type expression denotes that the class Student and the class Staff are
subclasses of the class Person.
The corresponding mapping structure is as follows:
Class Person type-is
Union Student/0.8, Staff /0.9
End
Class Student is-a Person/0.8 type-is
Record
…
End
Class Staff is-a Person/0.9 type-is
Record
…
End
Note that the above Rules 1 and 2 do not take the order of mapping into account.
If a fuzzy subclass is mapped while its superclass has not yet been mapped accordingly,
an error will occur. In the following, we organize the fuzzy classes into a
hierarchical structure and present an algorithm of mapping fuzzy class hierarchies
as shown in Algorithm 4.1.
The root node “rdfs:Class” is the parent node of all fuzzy class nodes after the
fuzzy classes are organized in a hierarchical structure. Algorithm 4.1 uses a breadth-
first traversal. With the algorithm, the root node “rdfs:Class” first enters the queue
Q. Since the root node is just an abstract class, instead of mapping it, it is judged
whether it has a son node and if so, all its son nodes are enqueued. If the queue Q
is not empty, a fuzzy class node is sequentially dequeued from the queue Q, and it
is mapped according to Rule 1 and 2. This fuzzy class node is further determined
whether it has a son node, and if so, all its son nodes are enqueued. In a similar
way, each node in the queue is dealt with until the queue is empty. Finally, all fuzzy
classes in the fuzzy RDF graph are mapped in order according to their hierarchical
relationship.
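The breadth-first traversal described above can be sketched in Python as follows; the children mapping and the returned list of class names are simplifications assumed by the sketch, and applying Rules 1 and 2 is represented only by recording the mapping order.

```python
from collections import deque
from typing import Dict, List

def map_class_hierarchy(children: Dict[str, List[str]]) -> List[str]:
    """Breadth-first traversal of the fuzzy class hierarchy rooted at rdfs:Class.

    `children` maps a class node to its son nodes.  The abstract root
    "rdfs:Class" is not mapped itself; every other dequeued class would be
    mapped with Rules 1 and 2, which is represented here by appending its
    name to the returned mapping order.
    """
    order: List[str] = []
    queue = deque(["rdfs:Class"])
    while queue:
        node = queue.popleft()
        if node != "rdfs:Class":          # the root is abstract and is skipped
            order.append(node)            # placeholder for applying Rules 1 and 2
        queue.extend(children.get(node, []))
    return order

hierarchy = {"rdfs:Class": ["Person"], "Person": ["Student", "Staff"]}
print(map_class_hierarchy(hierarchy))     # ['Person', 'Student', 'Staff']
```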
When the edge label of the fuzzy RDF graph is an object property label, the start
point's label and the end point's label are both class labels. In this case, the edge is
mapped into an attribute of the fuzzy class, which corresponds to the start point and is
named after the object property label; the endpoint is mapped into the corresponding
fuzzy class according to Rule 1.
Note that in the above mapping of fuzzy properties, the relationship between
fuzzy property and fuzzy subproperty is not considered. Such a mapping process
does not need to consider the mapping order if time efficiency is not a concern.
In the following, we organize the fuzzy properties into a tree of height 2, whose
root node is the node "rdf:Property". We present an algorithm for mapping fuzzy
properties, shown as Algorithm 4.2.
The root node “rdf:Property” is the parent node of all fuzzy property nodes after
the fuzzy properties are organized in a hierarchical structure. Algorithm 4.2 uses a
breadth-first traversal. With this algorithm, the root node “rdf:Property” first enters
the queue Q. The root node is just an abstract property, so it is dequeued rather than
mapped. Then it is judged if it has a son node, and if so, all its son nodes are enqueued.
If the queue Q is not empty, a fuzzy property node is sequentially dequeued from the
queue Q, and it is further determined if the node is a fuzzy datatype property node or
a fuzzy object property node. Then the fuzzy property is mapped according to Rule
3 or Rule 4. In a similar way, each node in the queue is handled until the queue is
empty. Finally, all fuzzy properties in the fuzzy RDF graph are completely mapped.
In the classic RDF, only one datatype rdf:XMLLiteral is predefined and users are
recommended to use the basic datatypes defined in XML Schema. In the fuzzy RDF,
the basic datatypes are not fuzzy and we can still use the basic datatypes defined
by XML Schema, such as integer, float, string, date, time and so on. The major
basic datatypes in XML Schema and their corresponding datatypes in the fuzzy
object-oriented database model are shown in Table 4.3.

Table 4.3 Mapping of XSD datatypes into fuzzy object-oriented database datatypes
Datatype         XSD datatype    FOODB datatype
Numerical        xsd:decimal     Decimal
                 xsd:integer     Integer
                 xsd:short       Short
                 xsd:long        Long
                 xsd:float       Float
                 xsd:double      Double
Enumeration      xsd:enum        Enum
String           xsd:string      String
Boolean          xsd:boolean     Boolean
Date and time    xsd:date        Date
                 xsd:time        Time
The datatypes used in the fuzzy RDF have the corresponding datatypes in the fuzzy
object-oriented databases. As shown in Table 4.3, for example, the XSD datatype
xsd:string is mapped to the datatype string in the fuzzy object-oriented databases.
Note that XML Schema supports custom complexType. At this point, the fuzzy
object-oriented databases need to provide a type generator to support the definition
of structured literal so that complexType can be mapped accordingly. Suppose that
XML Schema defines the following complexType element Degree:
<xsd:element name="Degree">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="school_name" type="xsd:string"/>
      <xsd:element name="degree_type" type="xsd:string"/>
      <xsd:element name="degree_year" type="xsd:short"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Then the complexType element Degree is mapped to the structured literal Degree
in the fuzzy object-oriented databases as follows:
struct Degree{
string school_name;
string degree_type;
short degree_year;
};
In the fuzzy RDF graph model, the description of a fuzzy instance is realized by
describing the fuzzy property value of the fuzzy class. When the edge label is “type”,
the labels of the start point and the end point are the instance label and the class label,
respectively. In this case, the edge indicates that the start point is an instance of the
corresponding class of the end point. Rule 5 gives a rule of mapping the fuzzy RDF
instances to the FOODB instances.
Rule 5: Lv(vi) ∈ Σ.T ∧ Lv(vj) ∈ Σ.C ∧ LE(vi × vj) = type ⇒ (Γ(vi) = foi ∈ FOFS) ∧ (Object foi belong-to fcj/μj has-value [fa1: fb1, …, fak: fbk]).
When the edge label is “type”, the starting point is mapped into an instance of the
fuzzy object-oriented database model, which is the instance of the fuzzy class that
corresponds to the end point. All vertices and edges associated with the start point are
mapped into the corresponding attributes of the instance in the fuzzy object-oriented
database model. In the fuzzy object-oriented database model, an object with identifier
OID uniquely identifies an object and is named after the label of the start point.
In the fuzzy RDF data subgraph shown in Fig. 4.6, for example, there are two
edges labeled as “type”. The label of the start point is the instance label student1
and the label of the end point is the class label Student. The membership degree of
the edge is 0.9, indicating that the object student1 belongs to the class Student with
a membership degree of 0.9. The other label of the start point is the instance label
book1 and the label of the end point is the class label Book. The membership degree
of the edge is 0.85, indicating that the object book1 belongs to the class Book with
a membership degree of 0.85. For the fuzzy instances of the fuzzy RDF shown in
Fig. 4.6, the corresponding mapping structure is as follows:
Object student1 belong-to Student/0.9
has-value FUZZY Name: 1.0/Bob, FUZZY Sex: 0.85/male, FUZZY Age: 0.9/20,
FUZZY Read: 0.75/book1
End
Object book1 belong-to Book/0.85
has-value FUZZY Title: 0.8/A Semantic Web Primer, FUZZY Author:
0.85/Antoniou, FUZZY Category: 0.9/Science Information
End
4.3.5 Implementation
Based on the mapping rules proposed in Sect. 4.3, we implement a prototype called
FRDF2FOODB, which can map the fuzzy RDF model to the fuzzy object-oriented
database model. In the following, we briefly explain the implementation of the proto-
type, which consists of three main modules: parsing module, mapping module, and
output module. Figure 4.8 shows the overall architecture of the FRDF2FOODB. The
functions of the three main modules of the FRDF2FOODB are described below:
1. Parsing module: The Parsing module parses the input fuzzy RDF model, which
is described in the form of triples, into classes, properties, instances, etc., and
stores the parsed results, which are the input of the mapping module.
2. Mapping module: The mapping module maps the fuzzy RDF classes, properties,
instances and other elements, which are obtained by the parsing module, into the
corresponding fuzzy object-oriented database classes and instances according to
the mapping rules proposed in Sect. 4.3.
3. Output module: The output module is actually an interface module, displaying the
input fuzzy RDF model, and the resulting fuzzy object-oriented database model
after mapping the fuzzy RDF model. Also, this module displays the specific
storage of the RDF in the fuzzy object-oriented databases after the mapping is
completed.
4.4 Fuzzy RDF Mapping to HBase Databases
With the explosive growth of RDF data, some efforts have been made to store massive RDF
data. Several proposals have been introduced to store RDF data in Hadoop (Farhan
Husain et al., 2009; Myung et al., 2010; Rohloff & Schantz, 2010). The drawback
of Hadoop-based RDF stores is that RDF data are directly stored in HDFS, resulting
in a lack of an efficient index structure. HBase, a column-oriented NoSQL database,
implements a global, distributed index with sorting the row key of HBase table by
dictionary. There have been some works proposed to store RDF data in HBase.
Sun and Jin (2010) presented an approach for storing RDF data in six HBase tables,
which are S_PO, P_SO, O_SP, PS_O, SO_P, and PO_S. The row key of table S_PO
is the subject of RDF triple, and the column is the tuple (predicate, object). Similarly,
RDF data are repeatedly stored in different HBase tables according to the different
organizational forms of RDF triple elements. Papailiou et al. (2012) presented a fully
distributed RDF store method, H2RDF, which can reduce the number of HBase tables
in Sun and Jin (2010) from six to three (i.e., SP_O, PO_S, and OS_P). The row key
of table SP_O is the tuple (subject, predicate), and the column is the object of RDF
triple. At the same time, Abraham et al. (2010) also used three HBase tables to store
RDF data in which are Ts, Tp, and To. These three HBase tables, respectively, take
the subject, predicate, and object as the row key, and take the other terms as column
values.
Like the situation of crisp RDF storage in databases, the fuzzy relational databases
and fuzzy object-oriented databases cannot effectively support large-scale fuzzy
RDF data management. To manage large-scale fuzzy RDF data efficiently and effec-
tively, some work has already investigated the storage of fuzzy RDF data in NoSQL
databases. Since HBase databases support high-reliability underlying storage and
have high-performance computing power, Fan et al. (2020) proposed a fuzzy RDF
storage schema with fuzzy HBase databases. Following the distributed fuzzy RDF(S)
storage approach proposed by Fan et al. (2020), in this section we present a distributed
fuzzy RDF storage approach based on HBase databases. This approach makes use
of the index function of HBase databases. In addition, according to the different
organizational forms of the fuzzy triple patterns, we propose a set of FHBase-based
query algorithms to deal with the querying of fuzzy triples from different fuzzy HBase
tables. On this basis, we implement a prototype system to demonstrate the feasibility of our
approach.
The fuzzy RDF graph model covers both the fuzzy RDF schema layer and the fuzzy
RDF instance layer. The former mainly describes two kinds of information about
fuzzy classes and fuzzy properties in fuzzy RDF ontology data, and the latter mainly
describes the specific information of fuzzy RDF instance data. To improve the query
efficiency of the storage of fuzzy RDF, we store the fuzzy RDFS data separately to
ensure retrieval efficiency. As a result, we design two FHBase tables to store the
fuzzy RDFS data and another two FHBase tables to store the fuzzy RDF instance data.
The fuzzy RDFS data describes the information about fuzzy classes and fuzzy proper-
ties in fuzzy RDF ontology data. The information related to fuzzy classes refers to the
corresponding fuzzy classes information of each fuzzy instance, the corresponding
fuzzy properties information of each fuzzy class and the subclass-superclass rela-
tionships between fuzzy classes, and so forth. And the information related to fuzzy
properties refers to the relationships between fuzzy properties, such as the inheri-
tance relationships, equivalence relationships, the domains and ranges of each fuzzy
property, and so forth.
To store the fuzzy classes and fuzzy properties of fuzzy RDFS data, we design two
FHBase tables named FClassRelation and FPropertyRelation in the following.
The specific table structures and storage examples of FClassRelation and FProper-
tyRelation are shown in Tables 4.4 and 4.5, respectively. Note that for the sake of
simplicity of discussion, timestamp is omitted.
The FHBase table FClassRelation shown in Table 4.4 takes the fuzzy class name
as the row key and the class relationship as the column family name. Since the relationships
between classes may include fuzziness, a method for calculating the membership degree
of a fuzzy subclass/superclass relationship was developed in Ma et al. (2004). The
fuzzy RDF data are modeled by describing the fuzzy property values of the fuzzy classes
in fuzzy RDFS. For the purpose of storing fuzzy RDF instance data correctly and
supporting efficient queries for different triple pattern forms, we design two different
FHBase tables, named FHTS_PO and FHTO_PS, respectively. These two
tables both take "Object Property," "Datatype Property," and "Type" as the column
family names, while the former takes the subject of the fuzzy RDF triple as the row key
and the latter takes the object as the row key. The specific table structures and storage
examples of FHTS_PO and FHTO_PS are shown in Tables 4.6 and 4.7, respectively.
The FHBase table FHTS_PO shown in Table 4.6 takes the subject of the fuzzy RDF
triple as the row key and stores the fuzzy RDF triples corresponding to different properties
in different column families. When the predicate of a fuzzy RDF triple is
an object property, for example, it is stored in the cell corresponding to the column
family named "Object Property." The category of the predicate of a fuzzy RDF triple
can be obtained from the axioms of the fuzzy RDF graph data model. The column
name and cell value differ from case to case. First, when the column family
name is "Object Property," the column name is formed as follows: the property name
of the fuzzy RDF triple followed by a number in [0, 1] and the notation "/", in which
the number represents the membership degree of the instance corresponding to the
row key belonging to a class; the cell value is the corresponding class. Second, when
the column family name is "Datatype Property," the column is named after the datatype
property name, and the cell value is formed as follows: the object name of the fuzzy
RDF triple followed by a number in [0, 1] and the notation "/", where the number
represents the membership degree of the object of the fuzzy RDF triple.
In particular, when the column family name is "Type," the column name is formed
as follows: "type" followed by a number in [0, 1] and the notation "/", where the
number represents the membership degree of the predicate of the fuzzy RDF triple.
The cell value is the corresponding object.
Likewise, the FHBase table FHTO_PS shown in Table 4.7 takes the object of the
fuzzy RDF triple as the row key, and it can be obtained from the table FHTS_PO.
Specifically, the row key of table FHTO_PS is the cell value of table FHTS_PO and,
conversely, its cell value is the row key of table FHTS_PO. At the same time, both
tables have the same column families and columns. In particular, when the column
family name is "Type," the cell values are the instances of the class corresponding
to the row key, with uncertainties.
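The following Python sketch illustrates this derivation, assuming both tables are modeled as nested dictionaries {row key: {column family: {column: cell value}}} and that columns are encoded as "name/degree"; these representation choices are our own simplification, not the FHBase implementation:

fhts_po = {
    "Movie1": {
        "Object Property": {"Director/0.9": "Person1"},
        "Type": {"type/0.8": "ActionFilm"},
    }
}

fhto_ps = {}
for subject, families in fhts_po.items():
    for family, columns in families.items():
        for column, cell_value in columns.items():
            # swap: the cell value of FHTS_PO becomes the row key of FHTO_PS,
            # and the former row key (the subject) becomes the new cell value
            fhto_ps.setdefault(cell_value, {}).setdefault(family, {})[column] = subject

print(fhto_ps)
# {'Person1': {'Object Property': {'Director/0.9': 'Movie1'}},
#  'ActionFilm': {'Type': {'type/0.8': 'Movie1'}}}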
On the basis of the storage model of fuzzy HBase for fuzzy RDFS and fuzzy RDF
instance data proposed in Sect. 4.4.1, in this section we investigate query processing
that supports fuzzy HBase-based retrieval.
The aim of SPARQL queries is to get the triples that satisfy all the conditions in
the WHERE clause of the given SPARQL query. In RDF data querying based
on a classical HBase database, the SPARQL query is first parsed into a set of triple
patterns, and then the triple matching algorithm proposed by Abraham et al. (2010)
is used to determine whether a given triple pattern matches. The input of the matching
algorithm is a given triple pattern and a triple to be judged, and it returns true
if the triple matches the triple pattern and false otherwise. Note that this algorithm is
mainly designed for classical RDF triples and does not consider fuzzy RDF triples. Here,
we present a more general triple matching algorithm, MatchFTP-T, to support both
triple matching and fuzzy triple matching.
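The following Python sketch conveys the spirit of such a matching test (the concrete MatchFTP-T algorithm may differ); a fuzzy triple carries the five elements (s, ρ, p, μ, o), unbound pattern elements are represented by None, and degrees given in the pattern are treated here as lower bounds, which is our own assumption:

def match_fuzzy_triple(pattern, triple):
    ps, p_rho, pp, p_mu, po = pattern   # subject, rho, predicate, mu, object
    ts, t_rho, tp, t_mu, to = triple
    if ps is not None and ps != ts:
        return False
    if pp is not None and pp != tp:
        return False
    if po is not None and po != to:
        return False
    if p_rho is not None and t_rho < p_rho:   # membership degree of the predicate
        return False
    if p_mu is not None and t_mu < p_mu:      # membership degree of the object
        return False
    return True

triple = ("Movie1", 0.9, "Genre", 0.85, "action")
print(match_fuzzy_triple(("Movie1", None, "Genre", 0.8, None), triple))  # True
print(match_fuzzy_triple((None, None, "Genre", 0.9, None), triple))      # False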
Given that the fuzzy RDF data are stored in the FHDB, to get all the fuzzy triples
satisfying the parsed fuzzy triple patterns, we need to query different fuzzy HBase
tables to get the fuzzy triples, judging whether these fuzzy triples match the given fuzzy
triple pattern. Each fuzzy triple (s, ρ/p, μ/o) has five elements: subject,
predicate, object, membership degree of the predicate, and membership degree of the object.
Note that when the predicate is an object property, the membership degree of the object
is 1, which means it is determined; similarly, when the predicate is a datatype
property, the membership degree of the predicate is 1. As a result, unlike the eight
organizational forms of the classic triple pattern shown in Table 4.8, there are 32
organizational forms for the fuzzy triple pattern, as shown in Table 4.9.
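The count of 32 simply reflects that each of the five elements is either bound or left as a variable, which can be checked directly:

from itertools import product

elements = ("s", "p", "o", "rho", "mu")
forms = list(product((True, False), repeat=len(elements)))  # True = bound
print(len(forms))                                            # 32
# e.g., the form in which only the subject and predicate are bound:
print(dict(zip(elements, (True, True, False, False, False))))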
Regardless of which organizational form of the fuzzy triple pattern is queried, the
query is closely related to the storage schema of fuzzy RDFS and fuzzy RDF data proposed
in Sect. 4.4.1. When dealing with different fuzzy triple pattern matches, we need
to select different FHBase tables and algorithms according to the known
elements in the fuzzy triple pattern.
On the basis of the storage schema of fuzzy RDF and the organizational forms of the
fuzzy triple pattern mentioned above, we propose several specific query algorithms
as follows. Note that the function SPLIT(Expression) in the following algorithms
returns the element after the notation "/" of the expression (i.e., the predicate or object
of the fuzzy triple). All the following algorithms deal with the query according to
whether the predicate of the fuzzy triple pattern is an object property or a datatype
property.
1. Query algorithm Query_FS_PO
When the given fuzzy triple pattern is one of those contained in Type 1, that is,
when the subject and predicate in the fuzzy triple pattern are known, the fuzzy
HBase table that needs to be queried is FHTS_PO. For this case, we propose the
query algorithm Query_FS_PO.
Algorithm 4.4 starts with some initialization work, such as initializing the result
set to be returned and determining that the row key to look for is the given subject. When
the predicate is an object property, the column family and column to look
for are determined to be "Object Property" and the given predicate. Next, the algorithm
queries the table FHTS_PO and uses the index function of the FHBase table to get all
cell values according to the determined row key name S, column family name "Object
Property", and column name P. This step yields the candidate fuzzy triples; the MatchFTP_T
algorithm is then called to filter the fuzzy triples that match the condition and add them
to the result set. Finally, Algorithm 4.4 returns the result set that matches the given fuzzy
triple pattern.
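A minimal Python sketch of the Query_FS_PO idea is given below; the in-memory stand-in for FHTS_PO, the "name/degree" column encoding, and the simple degree filter standing in for MatchFTP_T are all illustrative assumptions, not the book's implementation:

def query_fs_po(fhts_po, subject, predicate, min_degree=0.0):
    result = []
    row = fhts_po.get(subject, {})                       # row key = given subject
    for column, obj in row.get("Object Property", {}).items():
        name, _, degree = column.partition("/")          # column = "predicate/degree"
        if name == predicate and float(degree or 1.0) >= min_degree:
            result.append((subject, predicate, obj, float(degree or 1.0)))
    return result

fhts_po = {"Movie1": {"Object Property": {"Director/0.9": "Person1"}}}
print(query_fs_po(fhts_po, "Movie1", "Director", 0.5))
# [('Movie1', 'Director', 'Person1', 0.9)]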
2. Query algorithm Query_FSO_P
When the given fuzzy triple pattern is one of those contained in Type 2, that is,
when the subject and object in the fuzzy triple pattern are known, the fuzzy HBase
table that needs to be queried is FHTS_PO. For this case, we propose the query
algorithm Query_FSO_P.
Algorithm 4.5 first initializes the result set to be returned and determines that the row
key to look for is the given subject. Because the object of the given fuzzy triple
pattern is known, when the predicate is an object property, the column
family and cell value to look for are determined to be "Object Property" and the given object. Next, the algorithm
queries the table FHTS_PO and uses the index function of the FHBase table to get all
column values according to the determined row key name S, column family name
"Object Property", and cell value O. This step yields the candidate fuzzy triples; the
MatchFTP_T algorithm is then called to filter the fuzzy triples that match the condition
and add them to the result set. Finally, Algorithm 4.5 returns the result set that
matches the given fuzzy triple pattern. Algorithm 4.5 can perform similar
operations when the predicate is a datatype property.
3. Query algorithm Query_FS_OP
When the given fuzzy triple pattern is one of those contained in Type 4, that is,
when only the subject in the fuzzy triple pattern is known, the fuzzy HBase table
that needs to be queried is FHTS_PO. For this case, we propose the query
algorithm Query_FS_OP.
Algorithm 4.6 initializes the result set and determines the row key in the same
way as Algorithms 4.4 and 4.5. Because the predicate and object of the fuzzy triple
patterns processed by Algorithm 4.6 are both unknown, when the predicate is an
object property only the column family to look for, "Object Property," is determined.
Next, the algorithm queries the table FHTS_PO and uses the index function of the FHBase table to
get all column values and corresponding cell values according to the determined row
key name S and column family name "Object Property." The MatchFTP_T
algorithm is then called to filter the candidate fuzzy triples that match the condition and add them
to the result set. Finally, Algorithm 4.6 returns the matched result set.
4. Query algorithm Query_FOP_S
When the given fuzzy triple pattern is one of those contained in Type 3, that is,
when the predicate and object in the fuzzy triple pattern are known, the fuzzy
HBase table that needs to be queried is FHTO_PS. For this case, we propose the
query algorithm Query_FOP_S.
Algorithm 4.7 first initializes the result set and determines that the row key to look
for is the given object. Differently from the above query algorithms, Algorithm
4.7 queries the table FHTO_PS rather than FHTS_PO. When the predicate is an
object property, the column family and column to look for are determined to be "Object
Property" and the given predicate. Next, the algorithm queries the table FHTO_PS and uses the index
function of the FHBase table to get all cell values according to the determined row key
(the given object), column family name "Object Property", and column name P. The
MatchFTP_T algorithm is then called to filter the candidate fuzzy triples that match the condition
and add them to the result set. Finally, Algorithm 4.7 returns the matched result
set. Algorithm 4.7 can perform similar operations when the predicate is a
datatype property.
5. Query algorithm Query_FP_SO
When the given fuzzy triple pattern is one of those contained in Type 5, that is,
when only the predicate in the fuzzy triple pattern is known, we propose the query
algorithm Query_FP_SO.
In the fuzzy triple patterns processed by Algorithm 4.8, only the predicate is
known. Algorithm 4.8 first queries the table FPropertyRelation according to the given
predicate P to get the domains of P and adds them to the set S. Second, it gets the determined
equivalent classes and subclasses of all fuzzy classes in the set S by querying
the table FClassRelation and adds them to the set S. It then gets the instances corresponding
to each fuzzy class in the set S by querying the table FHTO_PS and adds
them to the set Instances. Next, for each instance in the set Instances, which plays the role
of the subject in the fuzzy triple pattern, it calls Algorithm 4.4 (Query_FS_PO)
to get the fuzzy triples that match the condition and adds them to the result set. Finally,
Algorithm 4.8 returns the matched result set.
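The flow of Query_FP_SO can be sketched in Python as follows, with in-memory stand-ins for the FPropertyRelation, FClassRelation, FHTO_PS, and FHTS_PO tables; the table layouts, field names, and degree handling are illustrative assumptions:

fproperty_relation = {"Director": {"domain": ["Film"]}}
fclass_relation = {"Film": {"equivalent": [], "subclass": ["ActionFilm"]}}
fhto_ps = {"ActionFilm": {"Type": {"type/0.8": "Movie1"}}}
fhts_po = {"Movie1": {"Object Property": {"Director/0.9": "Person1"}}}

def query_fp_so(predicate):
    # step 1: the domains of the given predicate
    classes = set(fproperty_relation.get(predicate, {}).get("domain", []))
    # step 2: add equivalent classes and subclasses of the classes found so far
    for cls in list(classes):
        rel = fclass_relation.get(cls, {})
        classes.update(rel.get("equivalent", []))
        classes.update(rel.get("subclass", []))
    # step 3: collect the instances of those classes from FHTO_PS
    instances = {inst for cls in classes
                 for inst in fhto_ps.get(cls, {}).get("Type", {}).values()}
    # step 4: for each instance, a Query_FS_PO-style lookup on FHTS_PO
    result = []
    for inst in instances:
        for column, obj in fhts_po.get(inst, {}).get("Object Property", {}).items():
            name, _, degree = column.partition("/")
            if name == predicate:
                result.append((inst, predicate, obj, float(degree or 1.0)))
    return result

print(query_fp_so("Director"))  # [('Movie1', 'Director', 'Person1', 0.9)]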
6. Query algorithm Query_FO_PS
When the given fuzzy triple pattern is one of those contained in Type 6, that is,
when only the object in the fuzzy triple pattern is known, the fuzzy HBase table
that needs to be queried is FHTO_PS. For this case, we propose the query algorithm
Query_FO_PS.
Algorithm 4.9 starts with some initialization work, such as initializing the result
set to be returned and determining that the row key to look for is the given object. Because
the subject and predicate of the fuzzy triple patterns processed by Algorithm 4.9 are
both unknown, when the predicate is an object property only the column
family to look for, "Object Property," is determined. Next, the algorithm queries the table FHTO_PS and uses the
index function of the FHBase table to get all column values and corresponding cell
values according to the determined row key (the given object) and column family name "Object
Property." The MatchFTP_T algorithm is then called to filter the candidate fuzzy triples
that match the condition and add them to the result set. Finally, Algorithm 4.9 returns
the matched result set. Note that Algorithm 4.9 can perform similar operations when
the predicate is a datatype property.
7. Query algorithm Query_FSPO
When the given fuzzy triple pattern is the one contained in Type 7, that is, when the
subject, predicate, and object in the fuzzy triple pattern are all unknown, we need
to take all the fuzzy triples in the FHDB and add them to the candidate result set.
This means that either the table FHTS_PO or the table FHTO_PS can be queried.
For this case, we propose the query algorithm Query_FSPO.
Differently from all the query algorithms proposed above, the subject, predicate,
and object of the fuzzy triple patterns processed by Algorithm 4.10 are all unknown.
Thus, Algorithm 4.10 gets all the fuzzy triples in the FHDB, which are added to the
candidate result set, by querying table FHTS_PO or FHTO_PS. Specifically,
Algorithm 4.10 first initializes the result set to be returned. When the predicate is an
object property, only the column family to look for, "Object Property," is determined.
Next, the table FHTS_PO or FHTO_PS is queried to get all fuzzy triples, which
are added to the candidate result set. Then, the MatchFTP_T algorithm is called to
filter the eligible fuzzy triples, which are added to the result set. Finally, Algorithm
4.10 returns the matched result set. Of course, Algorithm 4.10 can perform similar
operations when the predicate is a datatype property.
On the basis of the storage and query methods proposed in Sects. 4.4.1 and 4.4.2,
we design and implement a prototype called FRDF2FHBase, which can store the
fuzzy RDF data in the FHDB and support basic fuzzy triple pattern queries. In
the following, we briefly discuss the implementation of FRDF2FHBase.
1. Data loading module: The data loading module loads fuzzy RDF data described
in the form of triples. The data is divided into fuzzy RDFS data and fuzzy RDF
data, respectively.
2. Data storage module: The storage module stores the fuzzy RDF data in a target
FHDB according to the storage model proposed in Sect. 4.4.1.
3. FHBase-based query module: The FHBase-based query module processes the
input f-SPARQL queries. This module parses an f-SPARQL query into a set of
fuzzy triple patterns and returns the candidate result set satisfying the parsed
fuzzy triple patterns according to the FHBase-based RDF(S) query algorithms
proposed in Sect. 4.4.2.
4. Parsing module: The parsing module processes the candidate result set obtained
by the FHBase-based query module. It uses a greedy multiple-connection join
strategy for f-SPARQL BGP processing and returns the final result set.
4.5 Fuzzy RDF Graph Mapping to Property Graph

The advantages of storing RDF data in a graph structure are: (i) graph structures map
directly to RDF models, avoiding the need to convert RDF data to accommodate the
storage structure; (ii) querying the semantic information of RDF data does not require
reconstructing RDF graphs. The graph model conforms to the semantic level of the
RDF model and can preserve the semantic information of the RDF data to the utmost
extent. In addition, many graph-theory-based algorithms can be applied to optimize
the inferential querying of RDF data.
There has been some related work on RDF data graph storage. Zou et al. (2014)
proposed a method for storing and processing RDF data using a graph model, called
gStore, which converts RDF graphs into data signature graphs and uses vertex signature
(VS*)-tree indexes to reduce maintenance overhead. Hartig (2014) proposed
a formal definition of the Property Graph model and introduced transformations
between Property Graphs and RDF*. Libkin et al. (2018) introduced a triple-based
model called TriAL, which combines the concept of triple storage in RDF with the
concept of graph data, and illustrated the difference between the triple-based RDF graph
model and the standard graph database model. De Virgilio (2017) proposed
a method of using an ontology and related constraint rules to convert RDF data storage
into a graph database. In order to realize the distributed management of Web-scale
RDF data, Zeng et al. (2013) proposed a distributed graph engine called Trinity RDF,
which stores RDF data in native graph form instead of as triples or bitmap matrices.
However, none of the above works considers the issues of fuzzy RDF graph data
storage and query.
In order to solve the problem of fuzzy RDF data storage and query, an effective
method is to establish a mapping relationship between fuzzy RDF and property graphs.
In this section, we discuss a methodology for the lossless transformation of a
fuzzy RDF graph into a property graph. The main idea is to represent each ordinary
RDF triple as a property graph edge, while the fuzzy degree of the corresponding triple
is expressed as an attribute of the edge. Specifically, our research goal is to
convert a fuzzy RDF graph G into a property graph GP, and further realize the mapping of
a SPARQL query on G to a Cypher query over GP.
4.5.1 Preliminaries
If the default value of ⟨threshold⟩ is 1, the item WITH ⟨threshold⟩ can be
omitted.
Suppose that we want to find an action movie whose director is American and, at
the same time, whose trustworthiness is more than 0.6. According to the extended SPARQL
syntax, the SPARQL SELECT statement that meets the above query conditions is
expressed as follows.
PREFIX le: <http://fuzzyRDFexample.org/>
SELECT ?x
WHERE {
?x le:Genre "action".
?x le:Director ?z.
?z le:birthPlace "American".
}
WITH ⟨0.6⟩
Here, “WITH ⟨0.6⟩” is the threshold expression, which specifies the lowest possi-
bility of the matching subgraph. The symbol “?x” represents the film that we want
to retrieve.
2. Property Graph Model
Assume that the set D of data types contains the string type S, that is, S ∈ D, and
D may also include the data type of the collection type. For each data type D,
dom(D) represents the value space of type D, that is, all possible value sets of
data type D, and dom(S) represents all string sets. The formal definition of the
Property Graph is as follows:
A property graph GP is a 6-tuple ⟨VP, EP, src, tgt, lbl, P⟩, where ⟨VP, EP, src, tgt,
lbl⟩ represents a directed labeled multigraph: VP and EP represent the sets of
vertices and edges, respectively; the function src: EP → VP indicates that each edge has
a start (head) vertex; the function tgt: EP → VP indicates that each edge has a termination
(tail) vertex; and lbl: EP → dom(S) means that each edge has a label. The function
P: VP ∪ EP → 2^P indicates that every vertex v ∈ VP and edge e ∈ EP is associated with
a set of pairs ⟨key, value⟩ called properties.
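A minimal Python rendering of this structure (class and field names are ours) keeps src, tgt, lbl, and the property assignment as explicit fields:

from dataclasses import dataclass, field

@dataclass
class Vertex:
    vid: int
    properties: dict = field(default_factory=dict)   # the <key, value> pairs P(v)

@dataclass
class Edge:
    eid: int
    src: int    # id of the start (head) vertex
    tgt: int    # id of the termination (tail) vertex
    lbl: str    # edge label, an element of dom(S)
    properties: dict = field(default_factory=dict)   # the <key, value> pairs P(e)

v1 = Vertex(1, {"kind": "IRI", "IRI": "http://example.org/Pratt"})
v2 = Vertex(2, {"kind": "IRI", "IRI": "http://example.org/Statham"})
e1 = Edge(1, src=v1.vid, tgt=v2.vid, lbl="partner", properties={"fdegree": 0.8})
print(e1)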
Neo4j is a management system for crisp property graph databases, whose prim-
itives are vertices, relationships, and attributes. Different types of vertices are iden-
tified by labels, which can be IRI, Literal, or Blank. Vertices can have zero or more
attributes, which exist as key-value pairs. The vertex of IRI has two attributes, namely
kind and IRI. The vertex of Blank has one attribute. The vertex of Literal has four
attributes: kind, value, datatype, and language. The attributes of the same vertex are
stored in a linked list. A relationship consists of a start vertex and an end vertex. As
with vertices, relationships can also have multiple attributes and labels.
Figure 4.10 is an example of a simple Property Graph that contains two vertices
and a relationship between the two vertices. Among them, the relationship marked
as “partner” starts at the vertex “Pratt” and ends at the vertex “Statham”. In addi-
tion, some boxes associated with graph elements (vertices and edges) represent the
attributes of these elements. For example, the vertex of Chris Pratt has two attributes,
which represent the name and year of birth of the famous actor. The partnership
has only one attribute, which is used to indicate the certainty of whether Statham is
Pratt’s partner.
Cypher is the standard query language of the Neo4j graph database; like SQL in
relational databases, it queries property graphs in a crisp way. A query is composed
of a START clause followed by MATCH, WHERE, and RETURN clauses, where
START indicates a starting vertex of the matching subgraph, MATCH describes all
edges of the matching subgraph, and WHERE describes attribute expressions on the
vertices and edges of the subgraph as filter conditions. An example of a Cypher query
that uses the START, MATCH, and RETURN clauses to find the mutual partners of the
actor named James Guan is:
START a = node:actor(name = "James Guan")
MATCH (a)-[:partner]->(b)-[:partner]->(c), (a)-[:partner]->(c)
RETURN b, c
In order to adapt the fuzzy RDF data model to the property graph data model, we use
the following rules to convert each triple in the RDF dataset into a property graph:
(i) any subject or object vertex in RDF becomes a vertex with a unique integer ID
in property graph, (ii) object property in RDF is designated as the adjacent edge in
the property graph, where the source and the target of the edge are vertex IDs, and
the edge is identified by integer ID, (iii) the datatype property in RDF is specified
as vertex attributes in the property graph, (iv) fuzzy degree information is converted
into vertex and edge attributes.
As is well known, a basic requirement for the conversion is that any possible IRI must be
explicitly mapped to a distinct string. Since an IRI is itself a string, this requirement
can be met. Therefore, we define an injective IRI-to-string function im: I →
dom(S).
Given this preliminary knowledge, the conversion rules are defined as
follows. Let G = (V, E, Σ, L, μ, ρ) be a PG-convertible fuzzy RDF graph, where V = {x ∈ (I
∪ B ∪ L) | ⟨s, p, o⟩ ∈ G and x ∈ {s, o}} is the set of vertex elements. The property
graph corresponding to graph G can be expressed as GP = ⟨VP, EP, src, tgt, lbl, P⟩:
• VP contains |V| vertices, and each vertex represents a different RDF item in V. In
other words, there is a function v: V → VP such that each x ∈ V can be mapped
to a different vertex v(x) ∈ VP.
(i) If the RDF item is an IRI, then P(v(u)) = {⟨"kind", "IRI"⟩, ⟨"IRI", im(u)⟩}, where
u ∈ I, v(u) ∈ VP, and im is the IRI-to-string mapping mentioned above.
(ii) If the RDF item is a blank vertex, then P(v(b)) = {⟨"kind", "blank vertex"⟩}, where
b ∈ B and v(b) ∈ VP.
(iii) If the RDF item is a literal, then P(v(l)) = {⟨"kind", "literal"⟩, ⟨"literal", vm⁻¹(l)⟩,
⟨"datatype", im(dtype(l))⟩} ∪ lang, where l ∈ L, v(l) ∈ VP, vm⁻¹ is the
inverse of the value-to-literal bijective mapping, and lang = {⟨"language", lang(l)⟩}
if l ∈ dom(lang), and lang = ∅ otherwise.
(iv) For every RDF item x ∈ (I ∪ B ∪ L), the property set P(v(x)) additionally contains
the pair ⟨"fdegree", vm(μ(x))⟩, where v(x) ∈ VP.
• EP contains |E| edges, and each edge corresponds to an RDF triple t ∈ G. Therefore,
a bijective function e: E → EP is defined such that each triple t = ⟨s, p, o⟩ ∈ G
can be mapped to an edge e(t) ∈ EP.
(i) The edge label of e(t) is im(p), and the two adjacent vertices of the edge e(t)
are v(s) and v(o), respectively, which is formally defined as: src(e(t)) = v(s),
lbl(e(t)) = im(p), and tgt(e(t)) = v(o).
(ii) Moreover, the property set P(e(t)) is defined as P(e(t)) = {⟨"fdegree",
vm(ρ(t))⟩}.
This conversion can represent any fuzzy RDF triple as an edge in the Property
Graph, whose attributes include the relationship and the fuzzy degree of the RDF triple.
The two adjacent vertices of this edge correspond to the subject and object of the
RDF triple. Each vertex introduces two attributes: (i) kind indicates whether the
corresponding data type is IRI, Literal, or Blank, and (ii) value indicates the corresponding
value. It should be noted that if the data type is Literal, another attribute, namely
the datatype, should be introduced to describe the type of the value.
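The following Python sketch applies the gist of these rules to fuzzy triples given as tuples (s, p, o, μs, μo, ρ), where μs and μo are the vertex degrees and ρ is the triple degree; the tuple shape and the simplified handling (no blank nodes, datatypes, or language tags, and a trivial stand-in for the im and vm mappings) are our own assumptions for illustration:

def to_property_graph(fuzzy_triples):
    vertices, edges, ids = {}, [], {}

    def vertex(term, degree):
        if term not in ids:
            ids[term] = len(ids) + 1
            kind = "IRI" if term.startswith("http") else "literal"
            props = {"kind": kind, "fdegree": degree}
            props["IRI" if kind == "IRI" else "value"] = term
            vertices[ids[term]] = props
        return ids[term]

    for s, p, o, mu_s, mu_o, rho in fuzzy_triples:
        src = vertex(s, mu_s)   # subject vertex with its fuzzy degree
        tgt = vertex(o, mu_o)   # object vertex (IRI or literal) with its fuzzy degree
        # one edge per triple, labeled with the predicate, degree as edge property
        edges.append({"src": src, "tgt": tgt, "lbl": p, "fdegree": rho})
    return vertices, edges

triples = [("http://example.org/IronMan2", "Genre", "action", 1.0, 0.85, 0.85)]
print(to_property_graph(triples))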
For the sake of clarity, we shall give an example to illustrate the global steps of our
proposed approach. Figure 4.11 shows a fuzzy RDF graph, in which vertices are used
to represent entity resources such as actors, movies, etc., while edges represent the
relationship between them. For readability reasons, each vertex in the graph uses the
name of the entity resource or the literals instead of the URI itself. The label on the
vertex is associated with the ambiguity to indicate the likelihood of the vertex being
labeled. For instance, the genre of the movie Guardian of the Galaxy 2 is labeled as
“action” and the possibility is 0.91. The fuzzy RDF graph G is PG-convertible in this
example, and the given conversion rules can be used to translate the fuzzy RDF graph into
a Property Graph. The generated Property Graph GP is shown in Fig. 4.12, which
contains the following elements:
VP = {v1, v2, …, v7}, EP = {e1, e2, …, e7}, src(e1) = v1, lbl(e1) = "Rating",
tgt(e1) = v3, src(e2) = v1, tgt(e2) = v2, lbl(e2) = "Genre", src(e3) = v1, tgt(e3) = …
Since the fuzzy RDF data are stored in the Neo4j database, SPARQL cannot be applied
directly. The problem that follows is how to query the fuzzy RDF data stored in the
Neo4j database. There are two possible ways to implement such queries: one way is
to convert a SPARQL query into a Cypher query; another way is to use Cypher directly.
The former way keeps the SPARQL language, extracting information from Neo4j
through a supporting plug-in. The plug-in was developed as a wrapper for the Neo4j
graph database; it is tailor-made to reuse the advanced features of Neo4j to
efficiently store, index, and query graph structures using the core API of Neo4j. The
latter way focuses on the direct use of Cypher queries. Similar to SPARQL, this approach
considers that all entities and relations stored in the database are formed by the triple
storage of the [entity]-(relationship/predicate)-[entity] pattern, where the first element of
the triple is also called the "subject". In a graph database, a directed edge connecting
two vertices (that is, the relationship is directional) is used to indicate the "subject"
of a particular triple. In addition, the Cypher query language also supports grouping
(GROUP BY), filtering (WHERE), and sorting (ORDER BY) operations, which are
similar to those of the SQL language.
RDF graphs are usually queried by specifying a graph pattern using the standard
SPARQL query language, which returns matching subgraphs. There are several ways
to express pattern matching queries in Cypher. The most straightforward method is
to start with a vertex in the matching pattern graph and then match all edges in
a MATCH statement of the Cypher query. In this research, we focus on
Cypher's basic query approach and its advantages in handling fuzzy RDF data.
Cypher queries also enable users to implement some query functions that cannot be
implemented in SPARQL. For instance, in attribute path queries, Cypher allows
users to use more powerful path expressions than those provided by SPARQL.
Let us consider a Cypher query with the same functionality as the SPARQL query
in the previous example. The query also specifies a threshold δt (δt = 0.6), which
is used to return matching items with possibility greater than δt. The Cypher query
statement in this example is presented as follows.
START v1 = node:nodes(IRI = "Guardian of the Galaxy 2")
MATCH (v1)-[:Genre]->(v2 {value: "action"})
WHERE v2.fdegree > 0.6
MATCH (v1)-[:Director]->(v5)-[e:birthPlace]->(v6 {value: "American"})
WHERE e.fdegree > 0.6
RETURN v1
When translating the threshold expression into the corresponding Cypher, we
define the format of the conditional expression as fdegree > δt, which means that the
overall possibility of the matching answer must satisfy the fuzzy degree threshold δt ∈ [0, 1].
In this example, the Cypher equivalent of the threshold expression "WITH
⟨0.6⟩" is fdegree > 0.6. When the query contains multiple triple patterns, we must
aggregate the results of each pattern.
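A small Python sketch of this translation and aggregation step is given below; it only assembles the Cypher condition as a string and aggregates with the minimum, as is done for satisfaction degrees elsewhere in this book, and the helper and variable names are ours:

def threshold_to_where(variables, threshold):
    # build the Cypher filter corresponding to a "WITH <threshold>" expression
    return " AND ".join(f"{v}.fdegree > {threshold}" for v in variables)

print(threshold_to_where(["v2", "e"], 0.6))
# v2.fdegree > 0.6 AND e.fdegree > 0.6

def overall_degree(pattern_degrees):
    # the possibility of the whole answer is the minimum over the matched patterns
    return min(pattern_degrees)

print(overall_degree([0.85, 0.7, 0.65]))  # 0.65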
4.6 Summary
With the rapid development of the Internet, the requirement of managing information
based on the Web has attracted much attention from both academia and industry.
RDF is widely regarded as the next step in the evolution of the World Wide Web and
has become a de facto standard, creating a new set of data management requirements
involving RDF. On the other hand, fuzzy sets and possibility theory have
been extensively applied to deal with information imprecision and uncertainty in
practical applications, and reengineering fuzzy RDF into fuzzy database models is
receiving more attention for managing fuzzy RDF data. In this chapter, we proposed
several approaches for reengineering fuzzy RDF into fuzzy database models, including
fuzzy relational database models, fuzzy object-oriented database models, and HBase
database models, respectively. Moreover, we investigated the storage and querying of
fuzzy RDF graphs, represented as labeled directed graphs, in the Property
Graph database storage model. We manage these data with the Neo4j graph
DBMS in order to support expressive querying services over the stored data.
The two-way mappings between the fuzzy database models and the fuzzy RDF
models play an important role in establishing an overall management system for fuzzy
RDF data. Moreover, for processing fuzzy RDF data intelligently, fuzzy RDF querying
is also very necessary. How to query RDF with imprecise or uncertain information
has raised certain concerns, as will be introduced in the following chapter.
References
Abadi, D. J., Marcus, A., Madden, S. R., & Hollenbach, K. (2009). SW-Store: A vertically partitioned
DBMS for semantic web data management. The VLDB Journal, 18(2), 385–406.
Abraham, J., Brazier, P., Chebotko, A., Navarro, J., & Piazza, A. (2010). Distributed storage
and querying techniques for a semantic web of scientific workflow provenance. In 2010 IEEE
International Conference on Services Computing (pp. 178–185).
Atre, M., Srinivasan, J., & Hendler, J. A. (2009). BitMat: A main memory RDF triple store. Technical
report, Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY.
Bönström, V., Hinze, A., & Schweppe, H. (2003). Storing RDF as a graph (detailed view). In
Proceedings of the First Latin American Web Congress (pp. 27–36).
Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., & Bhat-
tacharjee, B. (2013). Building an efficient RDF store over a relational database. In Proceedings
of the 2013 ACM SIGMOD International Conference on Management of Data (pp. 121–132).
Cai, M., & Frank, M. (2004). RDFPeers: A scalable distributed RDF repository based on a struc-
tured peer-to-peer network. In Proceedings of the 13th International Conference on World Wide
Web (pp. 650–657).
Chen, H., Wu, Z., Wang, H., & Mao, Y. (2006). RDF/RDFS-based relational database integration.
In 22nd International Conference on Data Engineering (ICDE’06) (pp. 94–94).
Chong, E. I., Das, S., Eadon, G., & Srinivasan, J. (2005). An efficient SQL-based RDF querying
scheme. In Proceedings of the 31st International Conference on Very Large Data Bases (pp. 1216–
1227).
Cudré-Mauroux, P., Enchev, I., Fundatureanu, S., Groth, P., Haque, A., Harth, A., … & Wylot,
M. (2013). NoSQL databases for RDF: An empirical evaluation. In International Semantic Web
Conference (pp. 310-325). Springer.
De Virgilio, R. (2017). Smart RDF data storage in graph databases. In Proceedings of the 17th
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (pp. 872–881).
Fan, T., Yan, L., & Ma, Z. (2019). Mapping fuzzy RDF (S) into fuzzy object-oriented databases.
International Journal of Intelligent Systems, 34(10), 2607–2632.
Fan, T., Yan, L., & Ma, Z. (2020). Storing and querying fuzzy RDF (S) in HBase databases.
International Journal of Intelligent Systems, 35(4), 751–780.
Farhan Husain, M., Doshi, P., Khan, L., & Thuraisingham, B. (2009). Storage and retrieval of
large RDF graph using Hadoop and MapReduce. In IEEE International Conference on Cloud
Computing (pp. 680–686). Springer.
Harris, S., & Gibbins, N. (2003). 3store: Efficient bulk RDF storage. In: R. Volz, S. Decker, & I. F.
Cruz (Eds.), Proceedings of the First International Workshop on Practical and Scalable Semantic
Systems (pp. 1–15). CEUR-WS.org.
Hartig, O. (2014). Reconciliation of RDF* and property graphs. Technical report, University of
Waterloo. http://arxiv.org/abs/1409.3288
Janik, M., & Kochut, K. (2005). Brahms: A workbench RDF store and high-performance memory
system for semantic association discovery. In International Semantic Web Conference (pp. 431–
445). Springer.
Libkin, L., Reutter, J. L., Soto, A., & Vrgoč, D. (2018). TriAL: A navigational algebra for RDF
triplestores. ACM Transactions on Database Systems (TODS), 43(1), 1–46.
Ma, Z., & Yan, L. (2018). Modeling fuzzy data with RDF and fuzzy relational database models.
International Journal of Intelligent Systems, 33(7), 1534–1554.
Ma, Z., Capretz, M. A., & Yan, L. (2016). Storing massive resource description framework (RDF)
data: A survey. The Knowledge Engineering Review, 31(4), 391–413.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2004). Extending object-oriented databases for fuzzy
information modeling. Information Systems, 29(5), 421–435.
Myung, J., Yeon, J., & Lee, S. G. (2010). SPARQL basic graph pattern processing with iterative
MapReduce. In Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud (pp. 1–
6).
Nenov, Y., Piro, R., Motik, B., Horrocks, I., Wu, Z., & Banerjee, J. (2015). RDFox: A highly-scalable
RDF store. In International Semantic Web Conference (pp. 3–20). Springer.
Neumann, T., & Weikum, G. (2008). RDF-3X: A RISC-style engine for RDF. Proceedings of the
VLDB Endowment, 1(1), 647–659.
Neumann, T., & Weikum, G. (2010a). The RDF-3X engine for scalable management of RDF data.
The VLDB Journal, 19(1), 91–113.
Neumann, T., & Weikum, G. (2010b). x-RDF-3X: Fast querying, high update rates, and consistency
for RDF databases. Proceedings of the VLDB Endowment, 3(1–2), 256–263.
Papailiou, N., Konstantinou, I., Tsoumakos, D., & Koziris, N. (2012). H2RDF: Adaptive query
processing on RDF data in the cloud. In Proceedings of the 21st International Conference on
World Wide Web (pp. 397–400).
Peng, P., Zou, L., Özsu, M. T., Chen, L., & Zhao, D. (2016). Processing SPARQL queries over
distributed RDF graphs. The VLDB Journal, 25(2), 243–268.
Quasthoff, M., & Meinel, C. (2011). Supporting object-oriented programming of semantic-web soft-
ware. IEEE Transactions on Systems, Man, and Cybernetics Part C (Applications and Reviews),
42(1), 15–24.
Rohloff, K., & Schantz, R. E. (2010). High-performance, massively scalable distributed systems
using the MapReduce software framework: The SHARD triple-store. In Programming Support
Innovations for Emerging Distributed Applications (pp. 1–5).
Sintek, M., & Kiesel, M. (2006). RDFBroker: A signature-based high-performance RDF store.
In European Semantic Web Conference (pp. 363–377). Springer.
Sun, J., & Jin, Q. (2010). Scalable RDF store based on HBase and MapReduce. In 2010 3rd Inter-
national Conference on Advanced Computer Theory and Engineering (ICACTE) (Vol. 1, pp.
V1–633).
Weiss, C., Karras, P., & Bernstein, A. (2008). Hexastore: Sextuple indexing for semantic web data
management. Proceedings of the VLDB Endowment, 1(1), 1008–1019.
Wilkinson, K., Sayers, C., Kuno, H. A., & Reynolds, D. (2003). Efficient RDF storage and retrieval
in Jena2. In SWDB (Vol. 3, pp. 131–150).
Zeng, K., Yang, J., Wang, H., Shao, B., & Wang, Z. (2013). A distributed graph engine for web
scale RDF data. Proceedings of the VLDB Endowment, 6(4), 265–276.
Zou, L., Özsu, M. T., Chen, L., Shen, X., Huang, R., & Zhao, D. (2014). gStore: A graph-based
SPARQL query engine. The VLDB Journal, 23(4), 565–590.
Chapter 5
Fuzzy RDF Queries
5.1 Introduction
The Resource Description Framework (RDF) has been widely applied to represent
and exchange domain information because of its machine-readable characteristics.
With a huge amount of RDF data available, retrieving RDF data is essential, and
many RDF query approaches have been developed. The RDF data retrieval
task can usually be solved in two ways: the first way is to solve the problem
with the query language of the RDF database system; another way is to use
graph pattern matching algorithms to implement queries, since RDF data can be
represented as graphs. However, in many real applications, RDF data are often
noisy, incomplete, and inaccurate. Traditional approaches generally cannot handle
imprecise and uncertain information, and this seriously prevents a large number
of common users from obtaining information from RDF datasets. Therefore, in this
chapter, we focus on fuzzy RDF queries. We present methods for pattern match
queries, approximate fuzzy RDF subgraph matching queries, and fuzzy quantified queries
over fuzzy RDF graphs, and we investigate the problem of fuzzy RDF querying based on
extended SPARQL.
In classical RDF graph pattern matching, the task is to find inside a given graph
G some specific smaller graph Q, called the pattern. A naive approach is
to compare all possible subgraphs in G and their label bindings with the pattern graph
Q, i.e., to obtain all the candidate subgraphs with the existing techniques, then
check the dominating relationship and return the true answers. Although there have been
many studies (Neumann & Weikum, 2008; Zou & Özsu, 2017) on RDF subgraph
matching, none of these works considers the problem that the RDF graph could
contain fuzzy information in some applications. Moreover, these methods are not efficient
in response time because of the need to perform subgraph isomorphism checks
on Q and G, producing a large number of unnecessary intermediate results; subgraph
isomorphism has been shown to be NP-complete (Ullmann, 1976). Therefore, a threshold-
based RDF subgraph pattern matching query method is introduced in Sect. 5.2. Based
on the traditional subgraph isomorphism matching method, the fuzzy RDF subgraph
matching problem is solved efficiently. Specifically, we want to retrieve all qualified
matches of a query pattern in the fuzzy RDF graph.
In order to alleviate the time-consuming exhaustive search, the other method
resorts to an approximate matching strategy, which relaxes the rigid structure and label
matching constraints of subgraph isomorphism and other traditional graph similarity
measures. The various approaches (Costabello, 2014; Virgilio et al., 2015)
to approximate matching on RDF graph data rely on heuristics based on similarity
or distance metrics and on the use of specific indexing structures to improve the performance
of the algorithm. However, the existing inexact graph matching algorithms
ignore many features of RDF graphs. For example, these algorithms only take the
similarity of vertices and edges of the RDF graph into account but do not consider the
structure among the vertices and edges. More importantly, these algorithms disregard
the semantic relationships between resources and cannot process and manage
fuzzy information about the RDF graph in the matching process. Inspired by the
method of joining path query graphs introduced in (Virgilio et al., 2015; Moustafa
et al., 2014; Zhao & Han, 2010), we choose the path instead of the vertex as the basic
matching unit and propose a new path-based solution to efficiently answer subgraph
pattern queries over fuzzy RDF graphs. We introduce this path-based approximate
RDF subgraph pattern matching method in Sect. 5.3.
It has been widely recognized that classical querying suffers from a lack of flexibility
due to crisp querying conditions and querying objects. Flexible queries play
important roles in intelligent information retrieval and have become a main means
of realizing data querying. Bosc and Pivert (1992) point out that a query is flexible if a
qualitative distinction between the selected entities is allowed. This case arises when
the query conditions are crisp but the databases being queried contain imperfect
information. As a special kind of flexible query, fuzzy quantified queries have long
been recognized for their ability to express different types of imprecise and flexible
information needs in a relational database context. However, in the specific RDF/SPARQL
setting, the current approaches from the literature that deal with quantified queries
consider only crisp quantifiers (Bry et al., 2010; Fan et al., 2016) over crisp RDF
data. In Sect. 5.4, we intend to integrate linguistic quantifiers into subgraph patterns
addressed to a fuzzy RDF graph database and use a graph pattern matching approach to
evaluate fuzzy quantified queries. This extension makes it possible to express fuzzy
preferences on values present in the graph as well as on the structure of the data graph,
which has not been proposed in any previous work on fuzzy RDF graph pattern matching.
SPARQL (Prudhommeaux, 2008), the official W3C recommendation for an RDF
query language, plays the same role for the RDF data model as SQL does for the relational
data model. In a SPARQL query, the WHERE clause consists of triple patterns that
contain either variables or literals. Actually, each SPARQL query can be represented
by a graph pattern. As a result, any SPARQL query can be equivalently transformed
into a subgraph query problem, which locates the subgraphs of the RDF data graph
matching the query graph. Nevertheless, SPARQL requires accurate knowledge about the
graph structure and contents. As users are not very clear about the contents and the
data distribution of the database, such a strict query often leads to the Few Answers
Problem: the user query is too selective and the number of answers is not enough.
More importantly, classical SPARQL lacks some expressiveness and usability
capabilities, as it follows a crisp (Boolean) querying of RDF data for which the
response is either false or true. As a result, it lacks the ability to deal with flexibility
aspects (including queries with user preferences or vagueness), which are significant
in real-world applications. Therefore, we extend the SPARQL language in Sect. 5.5
for querying fuzzy RDF data.
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph

Traditional specialized pattern graph matching models are usually defined in terms
of subgraph isomorphism and its extensions (e.g., edit distance), which identify
subgraphs that are exactly or approximately isomorphic to pattern graphs. A comparison
of various specialized algorithms for graph pattern matching has been done
recently (Lee et al., 2012). The exact RDF graph matching algorithms (Carroll, 2002;
Wang et al., 2005) are not efficient in terms of response time, and it has been proved that
their complexity is NP-complete (Ullmann, 1976). Existing RDF matching algorithms
based on inexact graph matching (Costabello, 2014; Virgilio et al., 2015; Zhang et al.,
2012) ignore many features of RDF graphs. For example, most of these algorithms
(Costabello, 2014; Zhang et al., 2012) disregard the fuzzy data and the semantic
relationships between vertices, which in turn results in the loss of some potential
answers. Worse still, the traditional approaches are incapable of recognizing and evaluating
the fuzzy information in the matching process, which further results in the incapacity
of obtaining all the satisfactory answers. Therefore, traditional graph querying techniques
are not able to capture good quality matches in this context. Moreover, the
existing techniques (Ma et al., 2011) for processing twig patterns over fuzzy XML
trees cannot be effectively applied to handle graph pattern matching over an RDF
graph. This is because a graph does not have the nice property that every two vertices
are connected along a unique path.
In this section, we study pattern matching in the context of large fuzzy RDF
graphs. Specifically, we want to retrieve all qualified matches of a query pattern in
the fuzzy RDF graph. We carefully define the syntax and semantics of an extension
of the query pattern graph that makes it possible to express and interpret such queries.
We define fuzzy graph patterns that allow: (i) querying a fuzzy RDF data model,
and (ii) expressing preferences on data through fuzzy conditions and on the structure
of the data graph with regular expressions as edge constraints. In addition, in order to
answer subgraph pattern queries efficiently over a fuzzy RDF data graph, we propose
an approach for evaluating RDF graph patterns.
The basic graph pattern matching problem is to find matches in a graph for a spec-
ified pattern. We first introduce graph pattern matching on precise graphs based
on subgraph isomorphism. Then we will proceed to discuss fuzzy graph pattern
matching.
Subgraph isomorphism is a graph matching technique that finds all subgraphs of G
that are isomorphic to Q (see Gallagher (2006) for a survey). Given
a query pattern graph Q = (Vq, Eq) with n vertices {u1, …, un} and a precise data
graph G = (V, E), a pattern match query based on subgraph isomorphism retrieves
all matches of Q in G. For a given Q and an n-vertex set m = {v1, …, vn} in G, m is
a match of Q in G if (1) the n vertices {v1, …, vn} in G have the same vertex labels
as the corresponding vertices {u1, …, un} in Q; and (2) for any edge (ui, uj) in Q,
there exists a corresponding edge (vi, vj) in G such that edge (vi, vj) has the same
edge label as edge (ui, uj). This makes graph pattern matching NP-complete and,
hence, hinders its scalability in finding exact matches. Moreover, a bijective function
is often too restrictive to identify patterns in emerging applications.
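For small vertex- and edge-labeled graphs kept in plain dictionaries, the definition above can be checked with a naive enumeration, as in the following Python sketch (written purely to illustrate the definition, not as a practical algorithm; the data are ours):

from itertools import permutations

def subgraph_matches(q_labels, q_edges, g_labels, g_edges):
    q_nodes, g_nodes = list(q_labels), list(g_labels)
    for image in permutations(g_nodes, len(q_nodes)):     # injective assignments
        m = dict(zip(q_nodes, image))
        labels_ok = all(q_labels[u] == g_labels[m[u]] for u in q_nodes)
        edges_ok = all((m[u], m[v]) in g_edges and g_edges[(m[u], m[v])] == lbl
                       for (u, v), lbl in q_edges.items())
        if labels_ok and edges_ok:
            yield m

q_labels = {"u1": "Film", "u2": "Person"}
q_edges = {("u1", "u2"): "Director"}
g_labels = {"v1": "Film", "v2": "Person", "v3": "Place"}
g_edges = {("v1", "v2"): "Director", ("v2", "v3"): "birthPlace"}
print(list(subgraph_matches(q_labels, q_edges, g_labels, g_edges)))
# [{'u1': 'v1', 'u2': 'v2'}]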
Graph matching in our scenario is essentially finding a homomorphism (Hahn &
Tardif, 1997) from the pattern graph Q to elements of the data graph G. The traditional
notion is, however, often too restrictive for graph matching in emerging applications.
So, we introduce PRDF homomorphism (Alkhateeb et al., 2009) for checking whether
an RDF graph pattern is a consequence of an RDF graph. The notion extends graph
homomorphism to deal with vertices connected by regular expression patterns, which
can be mapped to vertices connected by paths, rather than edge-to-edge mappings.
Here, PRDF homomorphism is used for answering fuzzy RDF graph patterns.
The notion of graph pattern provides a simple yet intuitive specification of the structural
and semantic requirements of interest in the input graph. The graph pattern, as the
basic operational unit, is central to the semantics of many operations on fuzzy RDF.
Essentially, a fuzzy graph pattern is a directed crisp graph with predicates on query
vertices and, as edge labels, regular expressions that denote paths over relationships.
For the following, we assume the existence of an infinite set VAR of variables
such that VAR ∩ (U ∪ L) = ∅. By convention, we prefix the elements of VAR with a
question mark symbol.
Definition 5.2 (Fuzzy graph pattern) A fuzzy graph pattern is a labeled directed
graph defined as Q = (V q , E q , F V , RE ), where
Example 5.1 We want to find a high-rating action movie with a box office of more than
20 million. Specifically, the film is starred in by American actors. The query graph Q
in Fig. 5.1 is a possible way to express this information need. Here ?b, ?film, and ?p are
three variables, the expression ?b > "20 million" is a crisp comparison expression, the
expression "?r is high" is a fuzzy condition expression, and the expression RE = birthPlace ·
locateIn+ is a regular expression. This pattern "models" information concerning high-
rating (?r is high) action films (?film). The box office of the film is over 20 million
(?b > "20 million"). Moreover, the actors (?p) starring in the film are American.
The notion of graph pattern Q specifies the topological and content-based constraints
chosen by the user. Next, we introduce the notion of fuzzy RDF graph pattern
matching, which generalizes subgraph homomorphism with the evaluation of the
RDF graph pattern. Intuitively, given a fuzzy RDF data graph G, the semantics of a
graph pattern Q defines a set of matchings, where each matching (from the variables of Q
to URIs and literals of G) maps the pattern to a homomorphic subgraph of G.
Intuitively, when the graph pattern Q is evaluated on a data graph G, the result is
a binary relation M ⊆ V q × V such that:
(a) for each u ∈ V q , there exists v ∈ V such that (u, v) ∈ M;
(b) for each edge (ui , uj ) in E q , there exists a nonempty path p from vi to vj in G
such that (i) the vertex label L(vi ) of vi satisfies the predicate condition specified
by F V (ui ); (ii) the path p is constrained by the regular expression re(ui , uj ); and
(iii) (uj , vj ) is also in M.
From this, one can see that pattern queries are defined in terms of an extension of
graph simulation (Henzinger et al., 1995), by (i) imposing query conditions on the
labels of vertices; (ii) mapping an edge in a pattern to a nonempty path in a data
graph; and (iii) constraining the edges on the path with a regular expression. This
also differs from the traditional notion of graph pattern matching defined in terms of
subgraph isomorphism (Gallagher, 2006).
Let us now come to the definition of the matching result. Since our primary focus is on
fuzzy RDF graph matching, the above definition does not delve into the satisfaction
degree. We need to extend the query evaluation from returning a set of mappings
to returning a set of pairs. Given a fuzzy RDF graph G, a query pattern graph Q,
and a satisfaction degree threshold δt (0 ≤ δt ≤ 1), a graph pattern matching query
returns a set of vertex mapping pairs M = {(m, δm) | m: Vq × V ∧ δm ≥ δt}, where m is a
mapping from the variables of Q to URIs and literals of G and δm denotes the satisfaction
degree associated with the mapping.
Note that, a match M is a relation rather than a function. Hence, for each u in V q
there may exist multiple vertices v in V such that (u, v) is in M, i.e., each vertex in
Q is mapped to a nonempty set of vertices in G. Hence, we refer to the relation M
grouped by vertices in V q as a match in G for Q. There may be multiple matches in
a graph G for a pattern Q. Nevertheless, below we show that there exists a unique
maximum match in G for Q. That is, there exists a unique match QM (G) in G for Q
such that for any match M in G for Q, M ⊆ QM (G).
Proposition 5.1 For any data graph G and any graph pattern query Q, there is a
unique maximum match QM (G) in G for Q.
Proof
1. By Definition 5.3, we show that there exists a match that covers all the vertices
in Vq and is maximum; it is the union of all matches in G for Q.
2. We then show the uniqueness by contradiction: if there existed two distinct
maximum matches M1 and M2, then M3 = M1 ∪ M2 would be a match larger
than both M1 and M2, contradicting their maximality.
By (1) and (2), Proposition 5.1 follows.
The task of the graph pattern matching problem is to find the set M of subgraphs of
G that "match" the pattern Q. Problem formulations often require that Q represent
a single connected graph and, therefore, that M be connected as well. A graph is
connected if there exists some path between every pair of its vertices.
We introduce the notion of a result graph to better illustrate the meaning of a maximum
match. A result graph Gr = (Vr, Er) is a graph representation of the maximum match QM(G)
in G for Q, where (i) Vr is the set of vertices of G in M, and (ii) there is an edge er
= (vi, vj) ∈ Er if and only if there is an edge (ui, uj) ∈ Eq such that (ui, vi) ∈ M and
(uj, vj) ∈ M. We use the following example to illustrate result graphs.
Example 5.2 Let us consider the fuzzy graph pattern Q of Example 5.1. We evaluate
this matching against the fuzzy RDF data graph G of Fig. 5.2. The query also
specifies a threshold δt (δt = 0.25 in the example) to indicate that only matches
with possibility larger than δt should be returned. The matching process is depicted as
follows.
Intuitively, this pattern retrieves the list of films in G, and the matching value of
?film is potentially Diner, Iron Man 2, and Chef. The actors in the three films are the
American actors Mickey Rourke, Steve Gullenberg, and Robert Downey Jr., respectively.
The three paths p1 = Jon Favreau—birthPlace—New York—locateIn—
America, p2 = Steve Gullenberg—birthPlace—Florida—locateIn—America, and
p3 = Robert Downey Jr.—birthPlace—New York—locateIn—America match the
regular expression RE, and the satisfaction degrees are δre(p1) = 0.3, δre(p2) = 0.4,
and δre(p3) = 0.75, respectively. However, the genre of the film Diner is comedy, so it
is not an action movie, and the box office of the film Chef is 11 million, which does
not satisfy the condition ?b > 20 million. So, Iron Man 2 is the only movie that
is an action movie, with satisfaction degree δu("action") = 0.85, and its box office is
over 20 million, with satisfaction degree δco("29 million") = 0.7. Suppose that
μhigh(7.1) = 0.65; then the satisfaction degree of the condition "?r is high" is 0.65, which
is the minimum of the satisfaction degrees induced by μhigh(7.1) and δu(7.1). Moreover,
the vertex labeled Iron Man 2 and the vertex labeled Jon Favreau in G match the vertex
?film and the vertex ?p in the pattern graph with satisfaction degree 1, respectively. Thus,
the matching result graph is depicted in Fig. 5.3. As the satisfaction degree is the
minimum of the satisfaction degrees induced by the results described above, we have
δQ(G) = 0.3, which satisfies the minimum satisfaction degree threshold constraint.
Algorithm 5.1 illustrates a general framework for a pattern match query Q over a
fuzzy RDF graph G, which is a recursive version of the basic backtracking algorithm
(Golomb & Baumert, 1965). The input of this algorithm is: an RDF graph pattern Q,
an RDF graph G, and a partial map μp, which includes a set of pairs {(<u, v>, δ)} such
that u is a term of Q, v is the image of u in G, and δ is the satisfaction degree associated
with the mapping. If we call this algorithm with (Q, G, μø), where μø is the map with
the empty domain, then at the end of the algorithm we have all homomorphisms from
the pattern graph Q into the fuzzy RDF graph G. The algorithm performs as follows.
The procedure first checks whether all homomorphisms from the pattern graph Q
into the fuzzy RDF graph G have been obtained in line 1. If all the homomorphisms have been
obtained, we can stop the recursion and return the complete solution in line 2.
Otherwise, the procedure chooses a term u ∈ Vq for which to obtain a possible homomorphism
in line 3. After that, Pattern-Match takes each candidate v of the current term u ∈ Vq
and the possible map μ, puts v in the mapping pairs, and tries to generate the possible
candidates of v in lines 4–5. This is done recursively in a depth-first manner through
the call of Pattern-Match (note that μp, {(<u, v>, δ)}, and μ are compatible, since the
set <v, μ> is calculated with respect to μp). At the end of the algorithm, we have a
tree that contains one level with a term from Q, i.e., a vertex from Q, and one level
with the possible images of that term in G. The input to each vertex of each level is
the current map. Each possible path in the tree from the root to a leaf labeled by a
term of G represents a possible homomorphism.
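The recursive backtracking idea can be sketched in Python as below; compared with Algorithm 5.1, the edge constraints are simplified to direct labeled edges (no regular expression paths) and the satisfaction degree of a complete map is aggregated as the minimum over the matched edges, so the data layout and all names are illustrative assumptions only:

def pattern_match(q_vertices, q_edges, g_vertices, g_edges, partial=None, threshold=0.0):
    # q_edges: list of (u, label, v) over pattern vertices
    # g_edges: dict {(s, label, o): degree} over data vertices
    partial = partial if partial is not None else {}
    if len(partial) == len(q_vertices):
        # complete map: its degree is the minimum over the edges it uses
        degree = min(g_edges[(partial[a], l, partial[b])] for (a, l, b) in q_edges)
        if degree >= threshold:
            yield dict(partial), degree
        return
    u = next(v for v in q_vertices if v not in partial)     # pick an unmapped term
    for cand in g_vertices:                                  # try each candidate image
        partial[u] = cand
        # keep the partial map only if its already-bound pattern edges exist in G
        if all((partial[a], l, partial[b]) in g_edges
               for (a, l, b) in q_edges if a in partial and b in partial):
            yield from pattern_match(q_vertices, q_edges, g_vertices, g_edges,
                                     partial, threshold)
        del partial[u]

g_vertices = ["IronMan2", "JonFavreau", "action"]
g_edges = {("IronMan2", "Genre", "action"): 0.85,
           ("IronMan2", "Director", "JonFavreau"): 0.9}
print(list(pattern_match(["?film", "?p"], [("?film", "Director", "?p")],
                         g_vertices, g_edges, threshold=0.6)))
# [({'?film': 'IronMan2', '?p': 'JonFavreau'}, 0.9)]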
Algorithm 5.2 calculates all possible candidate maps in G for the current term u satisfying the partial map μp. It returns all pairs <v, μ> such that v is a possible map of u, and μ is the possible map from the terms of each regular expression pattern Ri appearing in a triple with u and one of the terms in Vq already mapped in μp. That is, if there is no term in Vq involved in a triple with u, then the possible candidate images of u are all v in G such that u can be mapped to v. Otherwise, there exists a set of terms x1, …, xk ∈ Vq involved in a triple with u which are already mapped in μp. In this case, the maps of xi and v satisfy μ(Ri), where Ri is the regular expression pattern appearing in the predicate position of the triple between xi and u. The order in which the two mapped vertices of xi and v satisfy μ(Ri) depends on the order in which u and xi appear in the triple: if the triple is <xi, Ri, u>, then <μ(xi), v> satisfies μ(Ri) in G; otherwise <v, μ(xi)> satisfies μ(Ri) in G. μ maps the terms appearing in the regular expression patterns of Q into the terms appearing along the paths in G with respect to μp, that is, μ is a possible map such that μ and μp are compatible.
At the beginning, we use the collection Ts to store triples <u, Ri, xi> in line 1, in which one of the predecessor vertices of u is already mapped in μ. We use To to store triples <xi, Ri, u> in the same way in line 2. If there is no term in Ts and To, we calculate the candidate matching information according to the type of u in lines 4–10. If u is a simple variable and u is not mapped in μp, the candidates are all v in G such that u can be mapped to v (line 5). Otherwise, the candidate is μp(u) (line 6). If u is a constant or a conditional expression, a candidate matching result is obtained according to the matching operation (line 9). After that, the algorithm checks whether the edges between u and the already matched query vertices of Q have corresponding edges between v and the already matched data vertices of G in lines 12–15. It calls Eva to check whether the maps of xi and v satisfy μ(Ri), and obtains a temporary candidate set. At the same time, the algorithm updates Ts and To in lines 13 and 15. Next, the algorithm proceeds to refine the candidates from Ts and To; it updates the status information in lines 17 and 19, and all changes made are restored. Finally, we return the candidates in line 20. The results of Algorithm 5.2 are used to calculate the RDF homomorphisms of a graph pattern Q into an RDF graph G by successive joins in Algorithm 5.1.
Algorithm 5.3 calculates the set of maps μ such that <μ(ui), μ(uj)> satisfies R in G with the map μ (we say that μ satisfies <ui, R, uj> in G). The results of Algorithm 5.3 are used to calculate the candidate homomorphisms in Algorithm 5.2.
The algorithm first checks ui. If ui is a constant, i.e., a URI or a literal, the result set is obtained by calling the function Reach in line 2, with ui itself as the argument. Otherwise, ui is a variable, the result set is computed using the Reach algorithm in line 4, and the map pair <ui, s> is constructed as the argument used to call the function in line 6. Algorithm 5.3 then checks uj along the same lines as ui. If uj is a constant, the result set of the algorithm is (s, uj, μ) in lines 5–6, where s ∈ V. Otherwise, the result set is (s, o, μ') in line 8, where μ' ← μ ▷◁ (uj ← o). Finally, the map result is returned in line 9.
Regular path queries have been studied and used for querying databases and semi-
structured data. Liu et al. (2004) presented the algorithm Reach, which included
complete algorithms and data structures for directly and efficiently solving existential
and universal parametric regular path queries. Given a graph G, a regular expression
R, and a start vertex v0 in G, the authors consider a graph to be a set G of labeled
edges of the form <v1 , el, v2 >, with source and target vertices v1 and v2 respectively
and edge label el. They calculate Reach(G, R, v0, μi), called the reach set, which is the set of triples <v, s, μ> such that some path from v0 to v in G matches some
path from s0 to s in R under map μ. The principle of the algorithm is based on the
following two rules:
Rule 1: if <v0 , el, v> ∈ G, <s0 , tl, s> ∈ R and μ ∈ match(tl, el), then <v, s, μ> ∈
Reach(G, R, v0 , μi );
Rule 2: if <v, s, μ> ∈ Reach(G, R, v0 , μi ), <v, el, v1 > ∈ G, <s, tl, s1 > ∈ R, μ1 ∈
match(tl, el) and μ2 = merge(μ, μ1 ), then <v1 , s1 , μ2 > ∈ Reach(G, R, v0 , μi ).
Here, match(tl, el) is the set of minimal substitutions μ such that el matches tl
under μ.
To support reachability queries over fuzzy RDF regular paths, we propose a path reachability algorithm based on this method.
Algorithm 5.4 describes the detailed process, which computes all pairs <v, μ>
such that there is some path from v0 to vertex v that matches some path from s0 to
some vertex in A under map μ with satisfaction degree δ. In Algorithm 5.4, H is
the set of triples already considered for the reach set, W is the worklist of triples
yet to consider, E is the matching result, and we can compute Reach (G, R, v0 ,
μi) by repeatedly adding triples according to the aforementioned two rules. We use adjacency lists to store the adjacency information of each vertex of the fuzzy RDF graph, i.e., a list of triples (vertex ID, edge label, edge membership degree) ordered by the
vertex ID. We use nested arrays, hash tables, or combinations of them for R and W,
as well as for S.
This algorithm calculates the set of triples <v0, vk, μ>, where vk is a vertex of G and μ is a map from terms of R into terms of G such that there exists a sequence T = (v0, …, vk) of vertices of G and a path label ω ∈ L(R) such that T is a path labeled by ω in G according to μ. We convert the regular expression pattern straightforwardly into a nondeterministic finite automaton, denoted NDFA (Holub & Melichar, 1998), in line 1. An automaton is a set A of labeled transitions of the form <s1, tl, s2>, with source and target states s1 and s2, respectively, and transition label tl, a finite state set S, a start state s0, and a final state set F ⊆ S. To construct an NDFA that generates a language equivalent to a given regular expression, we use the approach described in (Aho & Hopcroft, 1974). Then we initialize the reach set H, the worklist W and the query result E in line 2. We compute possible maps by adding triples yet to be considered into the worklist W according to Rule 1 in lines 4–6. Given an edge label el and a transition label tl, let match(tl, el) in line 5, which takes a set of symbols as an implicit argument, be the set of minimal substitutions μ such that el matches tl under μ. For each triple <v, s, μ> taken from the worklist, we add it to the set of triples already considered for the reach set and update the worklist in lines 7–8. We map a pair <v, s> to the set of triples <v1, s1, μ1> such that there is <v, el, v1> in G and <s, tl, s1> in A and match(tl, el) = μ1, according to Rule 2, in lines 9–12. When a mapping is dynamically constructed, we add it to the array of mappings if it is not already present. To efficiently check whether it is present, we can maintain a nested array structure representing all previously constructed mappings. We simply check whether el matches tl under each of the extensions in line 11. In case of a match, we merge the mapping with the previously constructed mappings, and we calculate the degree of satisfaction after the connection in line 12. If the extension mapping is
not in the set R, we add the result to the worklist W in lines 13–14. If s is a final state (s ∈ F), the algorithm terminates and we add the matching result to E in lines 15–16. We return E in line 17.
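A minimal Python sketch of this worklist procedure is given below. It assumes the fuzzy graph G is a set of edges (v1, el, v2, ρ) with membership degree ρ, the automaton A is a set of transitions (s1, tl, s2) with start state s0 and final states F, substitutions are dictionaries, and match and merge are the helpers named in Rules 1 and 2; degree handling is simplified to min-aggregation along the explored path.

from collections import deque

def reach(G, A, s0, F, v0, match, merge):
    H, W, E = set(), deque(), []
    for (v1, el, v2, rho) in G:                  # Rule 1: edges leaving v0
        if v1 != v0:
            continue
        for (s1, tl, s2) in A:
            if s1 != s0:
                continue
            for mu in match(tl, el):
                W.append((v2, s2, mu, rho))
    while W:                                     # Rule 2: extend reach triples
        v, s, mu, delta = W.popleft()
        key = (v, s, frozenset(mu.items()))
        if key in H:
            continue
        H.add(key)
        if s in F:                               # a final state is reached
            E.append((v, mu, delta))
        for (v1, el, v2, rho) in G:
            if v1 != v:
                continue
            for (s1, tl, s2) in A:
                if s1 != s:
                    continue
                for mu1 in match(tl, el):
                    mu2 = merge(mu, mu1)
                    if mu2 is not None:
                        W.append((v2, s2, mu2, min(delta, rho)))
    return E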
Proposition 5.2 Algorithm 5.1 is correct and complete for enumerating all RDF homomorphisms from a given pattern graph into a fuzzy RDF graph.
Proof We can prove this by induction. The set of all homomorphisms is trivially complete for the empty map at the beginning of the algorithm. Because Algorithm 5.4 is complete (Alkhateeb et al., 2009) and the number of vertices is finite, the partial homomorphisms, i.e., μp, are completely extended for the current vertex at each step. Finally, the procedure ends with a homomorphism mapping for each vertex in Q.
The Reach algorithm considers each triple <v, s, μ> in W and R, iterates over all outgoing edges of v and outgoing transitions of s, and computes a match and possibly a merge, taking time O(predicatesize) and O(vars(Ri)), respectively, in each iteration. The factor maps accounts for the fact that only substitutions that are the third component of a triple in W and R, i.e., that match some path from v0 in G with some path from s0 in R, are considered. So, the Reach algorithm has worst-case running time O(|G| × |R| × maps × (predicatesize + vars(Ri))). For each triple <u1, Ri, u2> in Q, the Reach algorithm is called by the Evaluate algorithm once if u1 is a constant; otherwise it is called for each vertex in G multiplied by the number of variables in Q in the subject position. So, the Eva algorithm has overall time complexity O((vars(Q) × subj(G) + const(Q)) × |G| × |R| × maps × (predicatesize + vars(Ri))), where vars(Q) and const(Q) are the numbers of variables and constants appearing in the subject position of a triple of Q. This result shows an exponential complexity O(pred(G)^vars(R)). However, vars(R) can be treated as a constant since it is usually very small with regard to the data graph. Hence, the complexity of query evaluation is O(|G|^2).
At the core of many advanced RDF graph operations lies a common and critical graph matching primitive. In particular, as one of the most important topics in this area, efficiently finding all occurrences of a subgraph pattern has received considerable attention (Lian & Chen, 2011; Moustafa et al., 2014). Subgraph pattern matching is meaningful and useful in many applications. For example, answering SPARQL queries over an RDF database is actually equivalent to conducting subgraph isomorphism matching over graphs, in which users need to pose a query with strict conditions over the database. Nevertheless, as users are often not very clear about the contents and the data distribution of the database, such a strict query often leads to the Few Answers
Problem (Yan et al., 2017): the user query is too selective and the number of answers is not sufficient. In the worst case, users cannot even get matching results for some queries. More importantly, classical SPARQL querying assumes that RDF data are certain and accurate and does not consider fuzzy information in the querying process. This motivates us to investigate fuzzy subgraph matching techniques suitable for query answering, which can relax the rigid structural and label matching constraints of subgraph isomorphism and other traditional graph similarity measures. In order to efficiently answer subgraph pattern queries over the fuzzy RDF data graph, inspired by the path-join query methods introduced in (Virgilio et al., 2015; Moustafa et al., 2014; Zhao & Han, 2010), we choose the path instead of the vertex as the basic matching unit and propose a new path-based solution to efficiently answer subgraph pattern queries over such fuzzy RDF graphs. The process of fuzzy RDF subgraph pattern matching is as follows: the pattern graph is first decomposed into a set of paths that start from a root vertex and end at a destination vertex, then these paths are matched against the data graph, and the candidate paths that best match the query paths are finally reconstructed to generate the answer. At the same time, we calculate the path match membership (referring to an absolute possibility of a match) and then aggregate it into an overall match membership, which must be above a given threshold, during the query evaluation process.
In the context of an RDF graph, different paths denote different semantic relationships between vertices. For an RDF graph, a root vertex is a vertex with indegree (number of incoming edges) zero, while a destination vertex is a vertex with outdegree (number of outgoing edges) zero. A path whose starting vertex is a root is called an absolute path. In addition, if there is no root vertex in the RDF graph, the starting vertex of a path is the vertex with the largest difference between outdegree and indegree. We call such vertices hubs.
In our work, path expressions can be extracted from RDF graph G by a breadth-first traversal of every vertex starting from the roots. For each step, the absolute path expressions from all roots to the current vertex and the vertex itself are output and stored in the relational tables path and resource, respectively (Matono et al., 2005).
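Under the assumption that the data graph is stored as adjacency lists adj[v] = [(edge_label, target, ρ), ...] and is acyclic (otherwise a per-path visited check is needed), the extraction of absolute path expressions can be sketched in Python as follows; vertex memberships are omitted for brevity.

from collections import deque

def extract_absolute_paths(adj, roots):
    paths = []
    for root in roots:
        queue = deque([(root, [root], 1.0)])     # (vertex, path so far, degree)
        while queue:
            v, path, degree = queue.popleft()
            for label, target, rho in adj.get(v, []):
                new_path = path + [label, target]
                new_degree = min(degree, rho)     # min-aggregated membership
                paths.append((new_path, new_degree))   # row of the path table
                queue.append((target, new_path, new_degree))
    return paths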
Definition 5.5 (Path subsumption). Given two paths p and p' in an RDF graph, p = v1, e1, v2, e2, v3, …, em−1, vm and p' = v'1, e'1, v'2, e'2, v'3, …, e'n−1, v'n with m ≥ n. If, for each 1 < k < n, ∀ v'k, e'k ∈ p' there exist vi, ei ∈ p such that vi = v'k and ei = e'k, we say that p' is subsumed by p, denoted p' ⊆ p.
Example 5.3 Let us consider for instance the fuzzy RDF graph G in Fig. 5.4a. This
graph has two root vertices (mid1 and mid2) and three destination vertices (country1,
country2 and country3). Three examples of paths of G are: p1 = mid1—Title—
Movies1, of length 1, p2 = mid2—Director—pid3—bornIn—City3, of length 2,
and p3 = mid2—Director—pid3—bornIn—City3—locateIn—Country3, of length
3. Among them, p2 is subsumed by p3 , namely p2 ⊆ p3 .
Example 5.4 Let us now consider the query graph Q in Fig. 5.4b, which has two root vertices (mid1 and mid2) and two destination vertices (City2 and tragedy). We decompose Q into three paths q1, q2, and q3 that start from a root vertex and end at a destination vertex. The paths of Q are:
The intersection points between the paths q1 and q2 are pid2 and City2 and the join
predicates are JoinPredicate(q1 , q2 ) = {(q1 .pid2 = q2 .pid2), (q1 .City2 = q2 .City2)}.
In the same way, the intersection point between the paths q2 and q3 is mid1, and the join predicate is JoinPredicate(q2, q3) = (q2.mid1 = q3.mid1).
A subgraph query identifies the occurrences of a query graph in the fuzzy RDF database graph. A query graph Q = (VQ, EQ, LQ) is an RDF graph, where each vertex v ∈ VQ is labeled with a label LQ(v) ∈ Σ. The query graph specifies the structural and semantic requirements that a subgraph of G must satisfy. Abstractly, a subgraph query takes a query graph Q as input, retrieves the data graph G that contains (or is similar to) the query graph, and returns the retrieved graphs or new graphs composed from the retrieved graphs. We formally define subgraph matching over the fuzzy RDF database graph below.
Given a fuzzy RDF data graph G, a query graph Q with |VQ| ≤ |V|, and a user-specified satisfaction threshold δth ∈ [0, 1], a subgraph matching query is composed of several parts, including element (vertex and edge) matching, structure matching, and match membership (referring to an absolute possibility of a match). Its answer is a set of subgraphs M such that (1) each subgraph m ∈ M is similar to the query graph Q, and (2) the matching membership δm > δth holds.
Naively, this problem can be solved by directly performing traditional subgraph pattern matching over the RDF graph. However, there are two key issues that need to be solved:
How to effectively search for possible subgraphs in the RDF graph?
How to effectively calculate the match satisfaction degree?
To deal with these two issues, we carefully design the corresponding solutions. Regarding the first question, Zhao and Han (2010) showed that paths have advantages over trees and graphs as indexing patterns in large graphs. Although more structural information can be preserved by trees and graphs, their potentially massive size and expensive pruning cost can outweigh their advantage for search-space pruning. Thus, we choose the path as the indexing unit during graph query processing. For the second issue, the membership of a match M on G is an aggregation of the memberships of a set of matching paths; the paths in this set are exactly those containing all vertices in VM with correct labels, as well as all edges in EM.
In the remainder of this section, we show how to measure path similarity by
calculating path edit distance and calculate the satisfaction degree of a given match
directly. This forms the basis for the algorithms discussed in Sect. 5.3.2, which further
speed up fuzzy subgraph pattern matching.
In order to compare the data paths to an input query path and decide which of the data paths is most similar to the query path, it is necessary to define a distance measure for paths. Similar to the string matching problem, where edit operations are used to define the string edit distance (Wagner & Fischer, 1974), we define a path edit distance based on the idea of altering a path by means of edit operations until a path equal to the query path is obtained.
Definition 5.7 (Edit Operation). Given an RDF path p, a basic path edit operation ω(p) on p is any of the following:
Definition 5.8 (Edited Path). Given an RDF path p and a sequence T = (ω1 , ω2 , …,
ωn ) of edit operations, the edited path, T (p), is a path T (p) = ωn (…ω2 (ω1 (p))…).
In order to model the fact that certain edit operations are more likely than others, each basic path edit operation ωi is assigned a certain cost c(ωi). The cost c(ωi) of an edit operation varies according to the type of edit operation and the nature of the involved RDF element (Gao et al., 2010). For example, modifying a vertex label matters less than vertex insertion, because the latter increases the semantic distance between paths. Clearly, how to determine the similarity of components in paths and how to define the costs of edit operations are the key issues. To keep the problem simple, in our work we fix the costs of the basic edit operations of insertion, deletion, and label modification to 1, 0.5 and 0, respectively.
The total cost of the transformation of p into T(p) is given by c(T) = Σ_{i=1}^{n} c(ωi). In other words, the cost of an edited path is the sum of the costs of all edit operations in the sequence T. It is not difficult to see that there is usually more than one sequence of edit operations that transforms one path p into another path T(p). For our path edit distance measure, we are particularly interested in the sequence with the least cost.
Definition 5.9 (Path edit distance). Given two paths p and p', the path edit distance between p and p' is defined as: dist(p, p') = min_{Ti∈T} {c(Ti) | Ti is a sequence of path edit operations that transforms p into p'}.
According to the above definition, we can conclude that the smaller the path edit distance between a data path and an input query path, the more similar they are. Intuitively, we calculate the graph similarity distance by computing alignments on the paths. It follows that a matching answer of Q over a data graph G is a set of matchings of all the paths of Q that forms a connected component of G (Virgilio et al., 2015).
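With the fixed costs above, the path edit distance can be computed with a Levenshtein-style dynamic program over the element sequences of the two paths. The sketch below is only an illustration: element comparison is simplified to label equality, and the orientation of insertion versus deletion is our own convention.

INSERT_COST, DELETE_COST, MODIFY_COST = 1.0, 0.5, 0.0

def path_edit_distance(q, p):
    # q and p are the sequences of vertex/edge labels along the query path
    # and the data path, respectively.
    m, n = len(q), len(p)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + DELETE_COST
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + INSERT_COST
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            change = 0.0 if q[i - 1] == p[j - 1] else MODIFY_COST
            dist[i][j] = min(dist[i - 1][j] + DELETE_COST,     # delete
                             dist[i][j - 1] + INSERT_COST,     # insert
                             dist[i - 1][j - 1] + change)      # modify label
    return dist[m][n]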
In a classical RDF database, the answer to a query Q is definitely either true or false. However, in a fuzzy RDF database, the system computes the answers and, for each answer, a membership score representing its possibility. In terms of fuzzy RDF graphs, the existential possibility associated with an element (vertex or edge) should be the possibility of the state of the world among these elements. On the surface, each possibility in the fuzzy RDF graph is a relative one, based upon the assumption that the elements exist independently. Therefore, we consider this possibility a relative possibility. However, each element in the RDF graph depends on the graph structure. Correspondingly, the existential possibility of a substructure (such as a path or a subgraph) composed of some basic elements of a graph must depend on the relative possibilities of the elements. For example, the existential possibility of a path is related to the relative possibility of each element (vertex and edge) in the path. Therefore, we consider this possibility an absolute possibility. In order to calculate the absolute possibility (whole membership) of a match, we must consider all the relative possibilities in the match. In general, the absolute possibility of a match can be computed by aggregating the relative possibilities in the match.
In a fuzzy RDF graph, we define three kinds of fuzzy structures, namely the triple
structure, the path structure and the graph structure. The fuzziness membership of
these three structures can be defined as follows.
(i) The fuzziness membership in the single triple.
In an RDF graph, every triple describes a directed edge labeled with p from the vertex labeled with s to the vertex labeled with o. The interpretation of each triple is that subject s has property p with value o. Thus, an RDF triple can be seen as a relationship from the subject vertex to the object vertex. Hence, the absolute possibility of a triple can be computed by aggregating the possibilities of s, p and o. We introduce a membership aggregation function to calculate the fuzziness memberships of RDF triples.
It should be pointed out that applications have the freedom to choose a function that fits their use cases. The minimum, for instance, is a cautious choice: it assumes that the possibility of a triple is simply the possibility of its least possible item. The median is another reasonable membership aggregation function. In our work, we choose Zadeh's logical product (minimum) t-norm (Zou et al., 2014) for aggregating the relative possibilities.
(ii) The fuzziness membership in the single path.
The concept of a fuzzy relationship plays a fundamental role in modeling a fuzzy graph. Let V be a set of vertices; a fuzzy relationship on V is a mapping function ρ: V × V → [0, 1], where ρ(x, y) indicates the degree of relationship between x and y. The fuzzy relation ρ may be viewed as a fuzzy subset of V × V, which can be used to represent the relationship between vertices. An important operation on fuzzy relations is composition. In general, fuzzy relationship composition is applied to derive a new relationship from two relationships by reusing already existing relationships.
Definition 5.11 (Zimmermann, 1996). Let V be a set of vertices. For i ∈ {1, 2, 3}, μi is a function from Vi into [0, 1], and for i ∈ {1, 2}, ρi is a function from Vi × Vi+1 into [0, 1], i.e., ρ1 and ρ2 are two fuzzy relations on μ1 × μ2 and μ2 × μ3, respectively. The composition of ρ1 and ρ2, denoted by ρ1 ◦ ρ2, is defined as: ∀ (u1, u3) ∈ V1 × V3, (ρ1 ◦ ρ2)(u1, u3) = sup_{u2∈V2} {ρ1(u1, u2) ∧ ρ2(u2, u3)}, where ∧ is the minimum.
Like the triple membership aggregation function in Definition 5.9, we also choose the minimum t-norm for aggregating the relative possibilities. It is clear that δp = ρ(v1, v2) ∧ ρ(v2, v3) ∧ … ∧ ρ(vn−1, vn) ∧ μ(v1) ∧ μ(v2) ∧ … ∧ μ(vn), i.e., it is the minimum fuzzy value over the edges and vertices of the fuzzy path.
(iii) The fuzziness membership in the graph.
The fuzziness memberships of an RDF subgraph can be computed by aggre-
gating the possibilities of the set of paths comprising the subgraph. Hence, we
introduce a membership aggregation function to calculate the memberships for
RDF subgraphs.
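The three aggregations can be summarized by the following minimal Python sketch, which simply applies the minimum t-norm at each level; the function names are illustrative only.

def triple_membership(mu_s, rho_p, mu_o):
    # Absolute possibility of a triple from its subject, predicate and object degrees.
    return min(mu_s, rho_p, mu_o)

def path_membership(vertex_degrees, edge_degrees):
    # delta_p: the minimum over all vertex and edge degrees along the path.
    return min(list(vertex_degrees) + list(edge_degrees))

def subgraph_membership(path_degrees):
    # Aggregate the memberships of the paths comprising the subgraph.
    return min(path_degrees)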
The query processing phase first decomposes the query graph Q into a set of paths that start from a root and end at a destination. In our example, we decompose Q into three paths, as described in Example 5.4.
Then the query method extracts all the paths of the data graph G in Fig. 5.4a that align with these query paths, taking advantage of a special index structure that is built off-line. In our example, the following data paths of G would be extracted:
Based on the above analysis, we propose a path-based solution to fuzzy RDF subgraph matching. The approach is composed of two main phases:
1. Data Preprocessing: The graph traversal algorithm is very time-consuming if it is executed on every user interaction. Thus, we need to build an indexing structure that contains information about the vertices and edges of the fuzzy RDF data graph. The graph indexing is executed only once, independently of the user interaction. Based on the fact that paths have advantages over trees and graphs as indexing patterns in large graphs (Zhao & Han, 2010), we propose a novel graph indexing method, context-aware path indexing, to capture information about the graph paths and their membership degrees, enabling efficient retrieval of candidate matches.
An optimization strategy of starting only from the root vertices is then considered. In order to extract the set of all paths that reach a given vertex v, we start the exploration of G from the roots by using a breadth-first search. For each step, the corresponding path expressions from all roots to the current vertex and the vertex itself are output and stored in the path table and resource table, respectively. The resource table can be used to locate the destination vertex of each candidate path, such that, given a vertex v in the query graph, we can easily figure out its candidate paths. The path table enables us to skip the expensive graph traversal at runtime. In order to increase the efficiency of path-based query processing, we introduce reverse-path expressions and build a B+ tree index on the path table. In addition, we pre-compute and store the underlying membership degree of each path by applying the corresponding aggregation functions as specified in Sect. 5.3.1.4. Thus, the path index contains all reverse absolute arc-path expressions from the current vertex to all roots in the fuzzy RDF graph, with an aggregated membership degree δ.
2. Query Processing: This is the subgraph matching phase, which consists of three sub-phases, namely path decomposition, finding candidate paths, and joining candidate paths. Figure 5.5 illustrates a general framework for a pattern match query Q over a fuzzy RDF graph G. We briefly present each step in the following; a high-level sketch of how the three phases fit together follows the list.
• Path Decomposition. In this step, we partition the query graph into a set of paths Q = {q1, q2, …, qk} using a decomposition algorithm. To facilitate the reconstruction of answer subgraphs, we employ a k-partite intersection graph to preserve the structural information of the graph query. In the k-partite intersection graph, a vertex corresponds to a query path q, while an edge (qi, qj)
means that the paths qi and qj share at least one common vertex, i.e., paths qi and qj are joinable and there is at least one intersection point between them. Moreover, the intersection points between the paths are expressed as join predicates, which have to be satisfied when combining (reconstructing) path matches into a full query match.
• Finding path candidates. For each query path q ∈ Q, we first apply a fuzziness membership filter (the membership degree must be greater than or equal to the user-specified threshold δth) to obtain a set of qualified candidate matches from the indexed paths of the data graph G. Then we use the path edit distance dist(q, p) between query path q and data path p to further filter the remaining match set. Using the latter, the system generates from G all paths that are good candidates for the query paths.
• Combination. In this step, we obtain the full graph matches by reconstructing candidate paths using a graph exploration algorithm, which performs message passing in the k-partite intersection graph, where each partition corresponds to a path in the query decomposition. The result is a set of approximate subgraphs included in G, generated by joining all candidate paths matching the paths in the decomposition. In the end, the matching answers are ranked according to path edit distance, and the user is able to explore these subgraphs to get more information about the vertices.
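As referenced above, the way the three phases fit together can be sketched as follows; decompose, find_candidates and combine are placeholders for Algorithms 5.5–5.7, and their signatures are assumptions made only for this illustration.

def answer_query(Q, path_index, delta_th, k, decompose, find_candidates, combine):
    paths, intersection_graph = decompose(Q)                   # path decomposition
    candidates = {q: find_candidates(q, path_index, delta_th)  # per-path candidates
                  for q in paths}
    return combine(candidates, intersection_graph, k)          # top-k answers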
In this section, we discuss how graph matches are processed. Given a query graph Q, we first study how Q can be split into a set of paths, among which the paths with good selectivity are then selected as candidates. Q is then reconstructed by joining the selected candidate paths until every edge in Q has been examined at least once. We discuss each step of the query processing in the following subsections.
1. Query Path Decomposition
Given a query graph Q, the main task of query decomposition is to split Q into
a set of possibly overlapping paths, denoted as P, that cover the entire query, by
traversing the entire query graph Q. As finding a least-cost path decomposition
based on the number of operations involved in producing the final result is too
costly, we use a simple path decomposition method in order to reduce query
search space and improve efficiency. The idea is simple: the set of all paths from a vertex s to another vertex t is the intersection of the set of all paths starting at s and the set of all paths ending at t. The task of path decomposition is to split the query
into a set of possibly overlapping paths, each of length L or less, that cover the
entire query, and whose matches can be obtained from the path index.
The principle of decomposing query graph Q into a set of paths P is that we start the exploration of graph Q from a root by using a breadth-first search and extract all paths starting from that root and ending at a destination, whose matches can be obtained from the path index of the data graph G. In order to preserve the structural information of the query, the elements of P are organized as a k-partite intersection graph.
Now, we are ready to implement the function that lists all paths between a pair of vertices. The implementation is simple: consider the reverse graph of Q, find the paths beginning at the given vertex, and return the reverse of each path. The code below finds all paths between every pair of root and destination.
In Algorithm 5.5, we begin by initializing the set of paths in line 1, and then extract the root vertices of Q in line 2. We further call the function findpath for each root vertex to obtain the paths, and we add them into P in lines 3–4. Finally, we establish a k-partite intersection graph to keep the structural information of query graph Q in lines 5–9, in which we obtain the intersection points and join predicates between the paths q and q'.
The function findpath shows the main algorithm of query path decomposition, which operates in three stages. In the first stage, we initialize all variables: PathSet is used to store the decomposed path set and is initialized as empty in line 1. We use π[v] to store the parent vertex of v and set the parent of the root vertex s to NIL in line 2. We use the queue Queue to store visited vertices in line 3. In the second stage, the breadth-first search algorithm develops a spanning tree (a breadth-first search tree) with the source vertex s as its root. The parent or predecessor of any other vertex in the tree is the vertex from which it was first discovered. For each vertex v, the parent of v is placed in the variable π[v]. After initialization, the source vertex is discovered. Line 4 initializes Queue to contain just the root vertex s. Lines 6–9 remove the vertex u from the queue while inserting the new vertices v adjacent to u into the queue, thus establishing the search tree. At the same time, we check whether each vertex adjacent to u has been visited in the process of creating the search tree; if it has not been visited, we insert it into the queue in line 10. The breadth-first search traversal terminates when the queue is empty, i.e., when every vertex has been fully explored. In the last stage, we obtain the path from the source vertex s to a destination vertex t in lines 11–17. The breadth-first search algorithm builds a search tree containing all vertices reachable from s. The set of edges in the tree contains (π[v], v) for all v where π[v] ≠ NIL. If s is reachable from a vertex v at the bottom of the tree, then there is a unique reverse path of tree edges from v to s. We return the path set PathSet in line 18.
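The parent-pointer construction and the reverse read-back of findpath can be sketched in Python as follows; adj[u] is assumed to list the successors of u, and edge labels are omitted for brevity.

from collections import deque

def findpath(adj, s, destinations):
    parent = {s: None}
    queue = deque([s])
    while queue:                                  # breadth-first search tree
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in parent:                   # not visited yet
                parent[v] = u
                queue.append(v)
    path_set = []
    for t in destinations:
        if t not in parent:                       # t is not reachable from s
            continue
        path, v = [], t
        while v is not None:                      # walk tree edges back to s
            path.append(v)
            v = parent[v]
        path_set.append(list(reversed(path)))
    return path_set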
How to extract from G the paths that are similar to the query paths is important. Every query path q ∈ PathSet has two specific labeled vertices: the root vertex, denoted by s(q) ∈ V(Q), and the destination vertex, denoted by t(q) ∈ V(Q). From this, if the destination vertex t is specified, we can find its correspondents in the data graph G by accessing the extended labels L(v) of every v ∈ V(G), using a label similarity measure that is able to discover the common meaning of the labels of two
vertices. Thus, every destination vertex t(q) has a set of similar vertices from G, denoted by Mt(q), so Mt(q) = {v | v ∈ V(G), L(v) = L(t)}. The goal here is clear: for every query path q, using the vertices in Mt(q), search the indexed paths of RDF graph G and discover a set of candidate paths, denoted by CandidateSet(q), which represent approximations of the query path q. In order to reconstruct the query Q in an efficient way, we build a set for every element q ∈ PathSet. Then, we group into the same set all the paths p of data graph G having a destination vertex that matches the destination vertex of q. Thus, each path in a set maps to its counterpart path q of PathSet.
To build the answer subgraphs, the approximate candidate paths that participate in this building must be computed. Since there are many false positives when examining candidate path matches, we need to prune the false positives first. For every path q ∈ PathSet, we access the path index to get its candidate match set CandidateSet(q) by keeping only those paths that satisfy the following context criteria:
(i) We compute the path edit distance dist(q, p) between query path q and data path p. The main goal of the path edit distance is to decide whether a given path is a good approximation of q. It can be concluded that the smaller the path edit distance between a query path and a data path is, the more similar they are. For a path p ∈ G, if dist(q, p) is the smallest, p becomes a candidate for the corresponding path q.
(ii) We obtain the fuzzy satisfaction degree δp of path p. The fuzziness membership δp of the single path p must be greater than or equal to δth, the user-specified fuzziness membership threshold.
Given a path q ∈ PathSet, we perform the above criteria tests to efficiently obtain the final list of candidates CandidateSet(q) from G, and extraneous paths in the data graph are automatically ignored. Thus, we are able to compute a set of ranked tuples containing the candidate paths of q and their path edit distances. The tuples in a set are ordered according to their path edit distance, with the lower distances coming first.
Given the query path set PathSet (and thus also the k-partite intersection graph) and a data graph G, we retrieve and select the paths from G ending at the destinations of the paths of PathSet, as shown in Algorithm 5.6.
In Algorithm 5.6, for each q ∈ PathSet, we first extract the destination vertex t of q in line 2. Then we select all possible paths p from the index of G matching t via the function getpaths in line 4. This prevents a sequential scan of all paths in a large graph. After obtaining the possible path set C, we prune the false positives in line 6. At the same time, we compute the path edit distance of each p transformed from q, and we insert p into the set cn in ascending order in lines 8–9. At the end, we insert cn into CandidateSet in line 9. The set CandidateSet is implemented as a map where the key is a path q from P and the value is a set of all the paths p ending at the destination of q. Each set is implemented as a priority queue of paths, where the priority is determined by the path edit distance associated with each path.
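A minimal sketch of this candidate selection for one query path q is given below; index_lookup(t) is an assumed helper returning the indexed data paths ending at a vertex matching the destination t, each as a pair (path, membership).

import heapq

def find_candidates(q, t, index_lookup, path_edit_distance, delta_th):
    heap = []
    for i, (p, membership) in enumerate(index_lookup(t)):
        if membership < delta_th:                 # fuzziness membership filter
            continue
        d = path_edit_distance(q, p)
        heapq.heappush(heap, (d, i, p, membership))
    ordered = []
    while heap:                                   # ascending path edit distance
        d, _, p, membership = heapq.heappop(heap)
        ordered.append((p, d, membership))
    return ordered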
3. Full Query Matches
The last step of the algorithm selects the most relevant paths and generates the full query matches by joining the paths with the lowest path edit distance from each set. The join order is determined by exploring the k-partite intersection graph, where vertices represent the retrieved paths and edges between paths mean that they have vertices in common. The join condition is that the number of join predicates between paths p and p' equals the number of join predicates between paths q and q', where q and q' are the paths corresponding to the sets in which p and p' were included, respectively.
At the same time, a join operator that operates on fuzzy solution matchings has to consider the fuzzy membership values while combining solutions. The fuzzy membership value of a combined solution is an aggregation of the membership values associated with the individual paths that have been used for combining. In other words, the absolute possibility of a match can be computed by aggregating the relative possibilities in the match. To determine the fuzzy membership values for solution matching, we choose the minimum as our application-specific fuzzy membership aggregation function. Algorithm 5.7 outlines the combining procedure.
Algorithm 5.7 starts from the matches of one path and progressively adds matches of joining paths, based on the k-partite intersection graph K-partiteIntersectionGraph. Once we have obtained the set CandidateSet, our graph search algorithm is performed by joining the most promising paths from CandidateSet. We initialize our result set to an empty set in line 1. If there are no results after a joining process ends, we output the empty set. If we have not yet generated k answers and the set CandidateSet is not empty (line 2), we obtain the top-k answers by selecting and combining the paths ordered in increasing order of path edit distance from each set of CandidateSet. First, we initialize the answer set and the fuzzy membership value in line 3. Then we choose the vertex q of K-partiteIntersectionGraph with the largest number of overlapping vertices (join predicates) with the existing paths in line 4. We select the set cn corresponding to q and dequeue the top path p from cn in lines 5–6. The path q is added into the set V of visited matching paths in line 7. In lines 8–9, we add p into the answer ans and
compute the fuzzy membership value δm of the answer. We obtain the full answer in line 10 by a breadth-first search traversal, as shown in detail in the function BFS-visit. Finally, we include the full answer ans in the set ApproximateAnswersSet in line 11. By using this strategy, if we are not able to find k approximate answers for the query graph Q, the process stops.
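The essential join step of this combination, checking the join predicates and min-aggregating the membership values, can be sketched as follows; shared_vars and bindings are assumed helpers returning, respectively, the join variables two paths must agree on and the variable bindings of a matched path.

def extend_answer(answer_paths, answer_delta, p, p_delta, shared_vars, bindings):
    for other in answer_paths:
        for var in shared_vars(other, p):
            if bindings(other).get(var) != bindings(p).get(var):
                return None                       # a join predicate is violated
    return answer_paths + [p], min(answer_delta, p_delta)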
We now analyze the complexity of each step of our algorithm. In the data preprocessing, we need to construct a path indexing structure by traversing the fuzzy RDF graph G. For each vertex v, we exploit an optimized implementation of the breadth-first search traversal from a root vertex s to collect path information. Suppose the average degree of s in G is d; it is straightforward to show that the time complexity of the data preprocessing phase is O(|E| + |V| × d), where |E| is the number of relations, |V| is the number of vertices in the RDF graph G, and d is the largest vertex degree.
The core procedure of the path decomposition step is in essence a breadth-first search. The while-loop in the breadth-first search is executed at most |VQ| times, because every vertex is enqueued at most once, so this part has complexity O(|VQ|). The for-loop inside the while-loop is executed at most |EQ| times since
Q is a directed graph: every vertex is dequeued at most once and we examine (u, v) only when u is dequeued, so each directed edge is examined at most once and this part has complexity O(|EQ|). Therefore, the total running time of this sub-step is O(|VQ| + |EQ|), where |VQ| and |EQ| are the numbers of vertices and edges of query graph Q, respectively.
The complexity of the finding-path-candidates step is |P| × O(D), where |P| is the number of query paths in the set P and D is the number of paths retrieved by the index, which in the worst case is proportional to the size of the data. That is, we have to execute D insertions into CandidateSet at most |P| times.
In the full query matches step, the join sub-step is the most time-consuming. It iterates at most k times, where k is the number of returned answers. In each iteration, there is a call of the function BFS-visit, which explores the k-partite intersection graph. In the worst case, it has a cost of O(h × D), since it checks each data path in G h times, where h is the depth of K-partiteIntersectionGraph. Therefore, the complexity of this sub-step is O(k × h × D), since we call the function BFS-visit, costing O(h × D), k times to explore K-partiteIntersectionGraph.
Fuzzy queries to databases have been successfully used in several domains, such as decision-making support and linguistic summarization. In particular, fuzzy quantified queries have proved useful in a relational database context for expressing different types of imprecise information needs (Bosc et al., 1995). This work examines the advantages of fuzzy queries, which provide a better representation of user requirements by expressing imprecise conditions through linguistic terms. In this section, we introduce fuzzy quantifiers (Zadeh, 1983) into fuzzy RDF database queries. Such quantifiers can be used to express an intermediary attitude between conjunction ("all of the criteria must be satisfied") and disjunction ("at least one criterion must be satisfied"). They model linguistic expressions such as "most of" and "about a third", and are notably used to construct fuzzy predicates (with quantifications).
Fuzzy quantified queries have received significant attention in the database community for several decades. Bouchon-Meunier and Moyse (2012) gave an overview of linguistic summarization, presenting the main streams of symbolic representation and management of numerical data, which can be crisp or fuzzy. They pointed out that fuzzy approaches bring solutions to the imprecision of quantification and the use of subjective qualification of data. Delgado et al. (2014) presented an overview of the existing approaches for evaluating and managing statements involving quantification. In a graph database context, there have been some recent proposals for incorporating quantified statements into user queries (see Bry et al., 2010; Blau et al., 2002; Yager, 2014; Pivert et al., 2016c). SPARQLog (Bry et al., 2010) extended SPARQL with first-order logic (FO) rules, including existential and universal quantification over vertex variables. QGRAPH (Blau et al., 2002) annotated vertices and edges with a counting range (count 0 as negated edge) to specify the
number of matches that must exist in a database. Yager (2014) briefly mentioned the possibility of using fuzzy quantified structure queries in a social network database context and suggested interpreting them using an OWA operator; however, the author did not propose any formal language for expressing such queries. Pivert et al. (2016c) considered a particular type of fuzzy quantified structural query in the general context of fuzzy graph databases and showed how the fuzzy quantified structural query could be expressed in FUDGE, an extension of the CYPHER query language. Castelltort and Laurent (2016) proposed an approach aimed at summarizing a (crisp) graph database by means of fuzzy quantified statements. They considered a crisp interpretation of this concept and recalled how the corresponding query can be expressed in CYPHER. A limitation of this approach was that only the quantifier was fuzzy. More recently, Fan et al. (2016) introduced quantified graph patterns (QGPs), an extension of classical graph patterns using simple counting quantifiers on edges. The authors also showed that quantified matching in the absence of negation does not significantly increase the cost of query processing. However, quantified graph patterns can only express numeric and ratio aggregates, and negation besides existential and universal quantification; they do not consider fuzzy quantified pattern matching in a fuzzy RDF graph database.
In the following, we integrate linguistic quantifiers into subgraph patterns addressed to a fuzzy RDF graph database and use a graph pattern matching approach to evaluate fuzzy quantified queries. In a fuzzy RDF graph database context, fuzzy quantified queries have an even higher potential, since they can exploit the structure of the RDF graph beside the label values attached to the vertices or edges. In the present section, we define the syntax and semantics of an extension of the query pattern graph that makes it possible to express and interpret such queries. In addition, in order to answer subgraph pattern queries efficiently over a fuzzy RDF data graph, we present a novel approach for evaluating fuzzy quantified graph patterns.
Linguistic summaries have been studied for many years and allow large volumes of data to be summed up in a very intuitive manner. They have been studied over several types of data; however, few works have addressed graph databases. In this section, we recall important notions about linguistic quantifiers and fuzzy quantified statements. Linguistic quantifiers modeled by means of fuzzy sets are then used for modeling the so-called fuzzy quantified statements.
1. Linguistic quantifier
The notion of a fuzzy or linguistic quantifier (Zadeh, 1983) describes an intermediate attitude between the universal quantifier ∀ and the existential quantifier ∃. Depending on whether they represent imprecise quantities or imprecise proportions, quantifiers are classified into absolute or relative quantifiers, respectively.
(i) Absolute quantifiers express quantity over the total number of elements of
a particular set, stating whether this number is, for example, “much more
than 10”, “around 5”, “a great number of”, and so forth.
(ii) Relative quantifiers express measurements over the number of elements that fulfill a certain condition, relative to the total number of possible elements (i.e., the proportion of elements). This type of quantifier is used in expressions such as "most", "little of", "at least half of", and so forth.
Consequently, the truth of the relative quantifier depends on two quantities. In
this case, in order to evaluate the truth of the quantifier, we need to find the total
number of elements fulfilling the condition and to consider this value with respect to
the total number of elements that could fulfill it (including those that do fulfill it and
those that do not). Essentially, linguistic quantifiers are fuzzy proportions or fuzzy
probabilities.
Qabs: R → [0, 1]
Qrel: [0, 1] → [0, 1]
where the domain of Qrel is [0, 1] because the division a/b ∈ [0, 1], where a is
the number of elements fulfilling a certain condition and b is the total number of
existing elements. The value μQ (x) expresses the extent to which proportion x (resp.
the cardinality x) agrees with the quantifier. Therefore, linguistic quantifiers can be
considered as fuzzy conditions which are defined on cardinalities or proportions.
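For instance, a relative quantifier such as "most" can be modeled as a fuzzy condition on a proportion in [0, 1]; the piecewise-linear breakpoints in the Python sketch below are purely illustrative and are not taken from the book.

def mu_most(x):
    # Membership degree of the proportion x in the relative quantifier "most".
    if x <= 0.3:
        return 0.0
    if x >= 0.8:
        return 1.0
    return (x - 0.3) / (0.8 - 0.3)                # linear in between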
A quantified statement of Type I means that, among the elements of a set X, a quantity Q satisfies the fuzzy predicate f1. Such a statement can be more or less true, and many approaches can be used to interpret it. Note that Type II generalizes Type I by considering that the set to which the quantifier applies is itself fuzzy. An example of a Type I statement is "Most of the students are young" and of a Type II statement is "Most of the good students are young", where X is a finite set of students, the quantifier is "most", f1 is the property "young" and f2 represents the property "good".
Also associated with a linguistically quantified statement is a truth value in [0, 1], called the satisfaction degree of the statement. The process of calculating the satisfaction degree of a quantified statement is usually known as an evaluation method. The problem is to find the truth value μ(Q of X are f1) or μ(Q of f2 X are f1), respectively, knowing the truth value of (x is f1) for every x ∈ X, which is done using Zadeh's calculus of linguistically quantified propositions (Zadeh, 1983). For other quality criteria, see the literature (Delgado et al., 2014).
The problem to be addressed in this section is to find the answers to a fuzzy quantified statement over a fuzzy RDF graph G. The key challenge is how to represent the query intention of the fuzzy quantified statement in a structural way. The underlying RDF repository is graph-structured data, but the fuzzy quantified statement is unstructured. To enable query processing, we need a graph representation of the fuzzy quantified statement.
Definition 5.15 (Fuzzy quantified graph pattern). A fuzzy quantified graph pattern
is a labeled directed graph defined as Q(x 0 ) = (V q , E q , L q , F q ), where
(i) V q and E q are the set of pattern vertices and the set of directed pattern edges,
respectively, as defined for data graphs.
(ii) x 0 is a vertex in V q , referred to as the query focus of Q(x 0 ), for search intent
(Bendersky et al., 2010).
(iii) L q is a function that assigns a vertex label L q (v) (resp. edge label L q (e)) to each
pattern vertex v ∈ V q (resp. edge e ∈ E q ). The label can be variable, constant,
or condition. The predicates in the condition C can be defined as a combination
of atomic formulas of the form “?x op c”, “?x op ?y” and “?x is Fterm”, where
?x, ?y ∈ variable, c ∈ (U ∪ L), op is a fuzzy or crisp comparator, and Fterm
is a predefined or user-defined fuzzy term like young (see Fig. 5.7b). One can
extend fuzzy condition to support fuzzy conjunction ∧ (resp. disjunction ∨),
usually interpreted by the triangular norm minimum (resp. maximum).
(iv) Fq is a function such that, for a given triple pattern tp = <x0, p, o> ∈ Q(x0), Fq(tp) is defined in the form Quant(p), where Quant is a linguistic quantifier and p is the predicate of the triple pattern. We refer to Quant(p) as the quantifier of the triple pattern. This mechanism makes it possible to attach a linguistic quantifier to a triple.
Example 5.6 An example of a fuzzy quantified statement is: "Most of the recent films that actor x starred in are directed by young directors". The query, denoted by Q(?actor), that aims to retrieve every actor (?actor) such that most of the recent films (?film) that he/she starred in are directed by young directors (?director) may be expressed as the fuzzy quantified graph pattern shown in Fig. 5.8, where ?actor is its query
focus, indicating potential actors, i.e., the variable ?actor should be returned in the result set; ?film and ?d are two variables, and "?y is recent" and "?a is young" are fuzzy condition expressions. Here the edge Starring(?actor, ?film) carries a linguistic quantifier "most", for condition (d) above. In this query, ?actor corresponds to x0, ?film corresponds to X, the sub-pattern f1 (<?film, Director, ?d>, <?d, Age, ?a>, FILTER(?a is young)) corresponds to f1, and f2 (<?actor, Starring, ?film>, <?film, Date, ?y>, FILTER(?y is recent)) corresponds to f2, respectively.
Definition 5.16 (Fuzzy quantified graph pattern matching). A fuzzy quantified graph pattern Q(x0) = (Vq, Eq, Lq, Fq) matches a fuzzy RDF graph G = (V, E, Σ, L, μ, ρ) with a satisfaction degree δth(G) if there exists a bijective function φ from U ∪ L ∪ variable to U ∪ L such that
(i) For each vertex u ∈ Vq, there exists a vertex φ(u) ∈ V, associated with a satisfaction degree δu = μ(φ(u)), such that Lq(u) = L(φ(u)). More precisely, if
where "∧" denotes the minimum operator and μf2 is the satisfaction degree to which xi satisfies the condition f2. μf2 is obtained similarly to μf1, which aggregates all the satisfaction degrees associated with the elements corresponding to f2, and its result is a set of elements {(μf21/x1), …, (μf2n/xn)}. Then, the final satisfaction degree associated with each answer A can be calculated as follows:
μ(A) = μQ( Σ_{xi∈X} (μf1i(xi) ∧ μf2i(xi)) / Σ_{xi∈X} μf2i(xi) )    (5.2)
Note that the basic validity criterion, i.e., the truth of (5.1) and (5.2), is certainly
the most important, but it does not grasp all aspects of a linguistic statement.
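Formula (5.2) can be evaluated directly once the per-element satisfaction degrees are available, as in the following minimal sketch; mu_f1 and mu_f2 are the lists of degrees of the elements xi for f1 and f2, and quantifier is a relative quantifier membership function such as mu_most above.

def quantified_degree(mu_f1, mu_f2, quantifier):
    numerator = sum(min(a, b) for a, b in zip(mu_f1, mu_f2))
    denominator = sum(mu_f2)
    if denominator == 0:
        return 0.0                                # no element satisfies f2 at all
    return quantifier(numerator / denominator)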
Example 5.7 Let us consider the fuzzy quantified graph pattern Q(?actor) of Example 5.6. We evaluate this matching query against the graph G of Fig. 5.9. To interpret Q(?actor), we first retrieve "the actors (?actor) who star in at least one recent film (?film) (corresponding to f2), possibly directed by a young director (corresponding to f1)". This query returns a list of mappings of the actor variable (?actor) with their starring films (?film), along with their respective satisfaction degrees.
μf2 = min(ρStarring(?actor, ?film), ρDate(?film, ?y), μrecent(?y)) and
μf1 = min(ρDirector(?film, ?d), ρAge(?d, ?a), μyoung(?a))
where μrecent and μyoung are the membership functions associated with the fuzzy terms recent and young of Fig. 5.6a and b.
In this example, the query concerns two actors, Vin. Diesel and Chris Partt. More specifically, for pattern edge e = Starring(x0, ?film), when x0 is mapped to Vin. Diesel, he starred in three films: Fast & Furious, Guardian of the Galaxy 2 and Riddick 3. Similarly, when x0 is mapped to Chris Partt, he starred in the film Guardian of the Galaxy 2. In contrast, Jason Statham does not belong to the result set because he did not star in any somewhat recent films. The result set is as follows:
Lastly, assuming for the sake of simplicity that most(x) = x, the final matching
result, given by Formula (5.2), is Q(?actor, G) = {0.28/Vin. Diesel, 0.26/Chris Partt}.
The graph traversal algorithm is very time-consuming if it is executed on every user interaction. Thus, we need to build an indexing structure that contains information about the vertices and edges of the fuzzy RDF data graph. The graph indexing is executed only once, independently of the user interaction. For the subgraph isomorphism algorithm, we tune the disk representation of a data graph to support fast retrieval and construction of its main memory data structures.
Our method represents a fuzzy RDF graph using three structures: (i) a vertex label list that gives access to the label of a vertex and its corresponding membership degree by a given ID; we implement ad-hoc data structures which offer appropriate access to the membership degrees in the vertex label list (see Fig. 5.10a), and in order to increase the efficiency of query processing, we build a B+ tree storing all distinct vertex labels along with their frequencies; (ii) an inverse vertex label list that gives access to the vertex ID list by a given vertex label (see Fig. 5.10b); note that we implement the inverse vertex label list in the RDF graph database for speed, although it can be constructed from the vertex label list; and (iii) adjacency lists (see Fig. 5.10c) of each vertex, which store the adjacency information, i.e., a list of triples (vertex ID, edge label, edge membership degree) ordered by the vertex ID.
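As a rough illustration only, the three structures can be represented with plain dictionaries as below; the IDs, labels and degrees are invented for the example and do not come from Fig. 5.10.

vertex_label_list = {                 # (i) vertex ID -> (label, membership degree)
    1: ("Fast & Furious", 0.9),
    2: ("Vin. Diesel", 1.0),
}
inverse_vertex_label_list = {         # (ii) label -> list of vertex IDs
    "Fast & Furious": [1],
    "Vin. Diesel": [2],
}
adjacency_lists = {                   # (iii) vertex ID -> list of
    2: [(1, "Starring", 0.8)],        #       (vertex ID, edge label, degree)
}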
Our quantified pattern matching algorithm extends the existing algorithms in (Bendersky et al., 2010) for conventional subgraph isomorphism to incorporate quantifier checking and to calculate satisfaction degrees. Before we describe the detailed workings of the algorithm, a few notational definitions are in order: (i) The induced pattern of Q(x0), denoted by Qπ(x0), is a conventional pattern graph, obtained by stripping the quantifiers Fq(tp) off a quantified graph pattern Q(x0). (ii) Me(vx, v, Q) = {v' | φ ∈ Qπ(G), φ(x0) = vx, φ(e) = (v, v')} is the set of children of v via e and Q, i.e., the set of children of v that match u' when u is mapped to v, subject to the constraints of Qπ, where e = (u, u') ∈ Q(x0) and vx, v ∈ G. (iii) Me(v) = {v' | (v, v') ∈ G, L(v, v') = Lq(e)} is the set of children of v connected by an e-labeled edge. We denote by Q(G) the set of all matches (isomorphic mappings) φ of Q in G.
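The simplest of these sets, Me(v), can be read directly off the adjacency lists sketched earlier; the following hypothetical helper illustrates the lookup.

# A minimal sketch of M_e(v): the children of a data vertex v reachable via an
# edge whose label equals the label of the pattern edge e, with their degrees.
def children_via_edge(adjacency, v, edge_label):
    return {target: degree
            for target, label, degree in adjacency.get(v, [])
            if label == edge_label}

# e.g. children_via_edge(adjacency, 1, "Starring") -> {2: 0.8}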
1. Pattern-Match Algorithm
Given a quantified graph pattern Q and a fuzzy RDF graph G, Algorithm QM retrieves all entities that possibly correspond to x0, denoted as Q(x0, G). Each item in Q(x0, G) is associated with a satisfaction degree. We briefly present each step as follows:
(i) QM first initializes Q(x0, G), as well as a partial match M as a set of vertex pairs (line 1). Each pair (u, v) in M denotes that a vertex v from G matches a pattern vertex u in Q.
(ii) Each vertex u in Q(x0) has a list C(u) of candidate vertices in the RDF graph G. QM next initializes the candidate set C(u) and auxiliary structures with FilterCandidate (lines 2–4). QM maintains the following auxiliary structures for each vertex v in C(u): a Boolean variable B(u, v) indicating whether v is a match of u via an isomorphism from Q(x0) to G, a variable δ(u, v) recording the satisfaction degree of the match, and the counter c(v, e) and upper bound U(v, e) used for quantifier checking (line 4).
Algorithm 5.8: QM
Input: pattern Q(x0), graph G
Output: the answer set Q(x0, G)
1: Q(x0, G) ← { }; Q(G) ← { }; M ← { };
2: for each u of Q do
3:   C(u) ← FilterCandidate(Q, G, u);
4:   B(u, v) ← ⊥, δ(u, v) ← 0, c(v, e) ← 0, U(v, e) ← |Me(v)|;
5:   if C(u) = ∅ then return ∅;
6: SubMatch(Q, G, M, Q(G));
7: for each isomorphic mapping φ ∈ Q(G) do
8:   Q(x0, G) ← Q(x0, G) ∪ {φ(x0)};
9: return Q(x0, G);
(iii) After that, QM calls RefineCandidates to obtain a refined candidate vertex set CR from C(u) by using algorithm-specific pruning rules (Fan et al., 2016).
(iv) Next, for each candidate data vertex v that is not matched yet, the IsExtend subroutine checks whether the edges between u and the already matched query vertices of Q have corresponding edges between v and the already matched data vertices of G (line 5). IsExtend is the final verification to determine whether the candidate vertex can be added to the partial solution. Given a selected pattern vertex u', a candidate v ∈ C(u), and an edge e = (u, u') with a quantifier, IsExtend dynamically finds the best vertices (recorded in a heap SP(u')) from C(u') that are children of v (lines 4–5). If v is qualified, it is matched to u; SubMatch then updates the status information by adding the newly matched pair (u, v) to M (line 6) and recursively conducts the next level of the search by calling SubMatch to match the remaining query vertices of Q (line 7). It keeps a record of M and a cursor to memorize the candidates in SP for backtracking, using a stack. When backtracking to a candidate v ∈ SP(u) from a child v' of v, SubMatch restores M and the cursor by calling RestoreState, which restores the partial match state by removing (u, v) from M (line 8). It then dynamically updates SP(u): (a) if B(u', v') = false, it reduces U(v, e) by 1; (b) it applies the selection and pruning rules to C(u) using the potentials updated in (a). If the upper bound U(v, e) fails the quantifier of e, v is removed from C(u) and SP(u) without further verifying its other children. Otherwise, a new set SP(u) of candidates with top potentials is picked. The recursion terminates when all possible matches are found (i.e., when |M| = |VQ|).
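The backtracking skeleton below is a much-simplified Python sketch of this search; it keeps only the extend/verify/backtrack structure and leaves out the heap SP, RefineCandidates, and the degree and counter bookkeeping, so it should be read as an illustration rather than the algorithm itself.

def sub_match(pattern_vertices, candidates, compatible, partial, results):
    # pattern_vertices: ordered list of pattern vertices still to match.
    # candidates: u -> set of data vertices that may match u.
    # compatible: (u, v, partial) -> bool, checks edges to already matched pairs.
    if not pattern_vertices:                 # |M| = |V_Q|: a full match found
        results.append(dict(partial))
        return
    u, rest = pattern_vertices[0], pattern_vertices[1:]
    for v in candidates[u]:
        if v in partial.values():            # v already used by another pattern vertex
            continue
        if not compatible(u, v, partial):    # IsExtend-style verification
            continue
        partial[u] = v                       # extend the partial match
        sub_match(rest, candidates, compatible, partial, results)
        del partial[u]                       # RestoreState: backtrack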
Proposition 5.3 Algorithm QM is correct and complete for enumerating all isomorphic mappings from a given pattern graph into a fuzzy RDF graph.
Proof To show the correctness of QM, first observe that QM always terminates. Indeed, QM follows the verification process of a conventional subgraph isomorphism algorithm. This process, in the worst case, enumerates all possible isomorphic mappings from the induced pattern Qπ to G, which are finitely many. Hence QM terminates.
We next show that QM correctly verifies whether a candidate vx is a match of x0 in Qπ via an isomorphism φ ∈ Qπ(G).
(i) When QM terminates, for each u ∈ Q and every candidate v in C(u) with B(u, v) = true, v = φ'(u) for some φ' ∈ Qπ(G), guaranteed by the correctness of Match.
(ii) For each edge (u, u') in Q and each vertex v with B(u, v) = true, QM correctly verifies the quantifiers by checking the updated local counter of v, which keeps track of the current |Me(φ(x0), φ(u), Q)|. In addition, QM waits until either v is determined not to be a valid match because the upper bound fails the quantifier (by the local pruning rule), or the lower bound satisfies the quantifier (in the verification).
Hence, vx is a match if and only if vx ∈ Q(x0, G) when QM terminates.
Algorithm QM thus correctly computes Q(x0, G) following the definition of quantified matching. We further analyze its time and space complexity.
For its time complexity, QM is a fuzzy quantified subgraph pattern matching process, which consists of a subgraph search process and the evaluation of fuzzy quantifiers. The subgraph search process is an extension of traditional subgraph isomorphism and has the same time complexity as conventional subgraph pattern matching algorithms; the fuzzy conditions and linguistic quantifier checking are incorporated into the search process.
Let us consider the evaluation of a fuzzy quantified subgraph pattern matching query, which includes z occurrences of fuzzy terms, over a graph database G. We denote by A the set of answers of Q over G. Computing A is a conventional subgraph isomorphism problem, which has been intensively studied in the literature; we assume that the time complexity of the graph matching is t(T). Computing the satisfaction degree of each fuzzy condition is then done in O(|A| × 2 × |z|) time. Put together, QM takes O(t(T) + |A| × 2|z|) time in total, where t(T) is the time complexity of a matching algorithm T for conventional subgraph isomorphism. Clearly, the mechanisms that introduce flexibility into the graph pattern are strongly dominated in complexity by the subgraph evaluation. Since |A| and |z| are small, QM and T have comparable performance, i.e., QM has the same time complexity as conventional subgraph pattern matching algorithms.
For the space complexity, QM needs O(|V|) space to store the auxiliary structures of the vertices in G. During the search process, QM maintains at most pm best matches to be verified at each level of the search, where pm is the largest constant in the quantifiers. Since there are |Q| search levels in total, QM requires O(pm|Q| + |V|) space.
5.5 Extended SPARQL for Fuzzy RDF Query
With the increasing amount of fuzzy RDF data becoming available, the way we query fuzzy RDF data is a crucial subject for supporting knowledge graph applications in various domains (Pivert et al., 2016a). SPARQL (Prudhommeaux, 2008), the official W3C recommendation for an RDF query language, provides basic functionalities for querying RDF data through graph patterns. However, classical SPARQL lacks the expressiveness and usability needed to deal with vagueness and imprecision, as it follows a Boolean querying model over crisp RDF data. As a result, the need to query both the structure and the vague information in fuzzy RDF knowledge graph applications has motivated research into extending SPARQL to be more expressive.
Some works (Alkhateeb et al., 2009; Anyanwu et al., 2007; Kochut & Janik, 2007; Pérez et al., 2010) extend SPARQL by allowing crisp RDF to be queried through graph patterns with regular expressions, but they do not address fuzziness. To allow the expression of flexible queries, a variety of proposals, such as f-SPARQL (Cheng et al., 2010) and SPARQLf (Ma et al., 2015), introduce fuzzy terms and fuzzy operators into the FILTER expression of SPARQL queries. However, these works only consider crisp RDF graphs. As far as fuzzy RDF graphs are concerned, some extended SPARQL query languages already exist. Pivert et al. (2016b) propose FURQL (Fuzzy RDF Query Language), a SPARQL extension with navigational capabilities for querying fuzzy RDF data through fuzzy graph path patterns using regular expressions. Fuzzy conditions can also be used to express fuzzy preferences on data. Almendros-Jiménez et al. (2017) propose a fuzzy extension of SPARQL based on fuzzy sets and aggregators.
In this section, we extend SPARQL for querying fuzzy RDF knowledge graphs. We first introduce a fuzzy graph pattern that enriches the standard SPARQL graph pattern with regular expressions and fuzzy conditions. We then define the evaluation of the query pattern over the proposed fuzzy RDF graph.
1. Fuzzy SPARQL Graph Pattern
Before giving the formal definition of the fuzzy graph pattern, we first introduce the concepts of fuzzy regular expressions and fuzzy conditions.
A regular expression is a property path, as specified in SPARQL 1.1 (Harris & Seaborne, 2013). A path regular expression Rex can be constructed inductively as Rex = u | R1 · R2 | R1|R2 | R+. Here u denotes either an edge label or a wildcard symbol * matching any label in U, R1 · R2 denotes a concatenation of expressions, R1|R2 denotes disjunction, i.e., an alternative of expressions, and R+ denotes one or more occurrences of R.
A fuzzy condition is a logical combination of fuzzy terms, which can be a constant c, a variable ?x, or a fuzzy condition C of the form "bound(?x)", "truth(?x)", "?x op c", "?x op ?y" or "?x = Ft". Here ?x, ?y ∈ VAR, c ∈ (U ∪ L), truth(?x) is the truth degree of the variable ?x, op is a fuzzy or crisp comparator (e.g., <, ≤, =, ≥, >, ≠), and Ft is a predefined or user-defined fuzzy term such as high, long or young.
Formally, a fuzzy RDF triple pattern has the form <t>: α, where t is a triple <s', p', o'>. Here, α represents the degree to which the subject s' has the property p' with value o', or to which the subject s' and the object o' have the relationship p'. The variables τ1 and τ2 represent the truth degrees of the subject s' and the object o', respectively. Although they do not provide any additional information, we allow users to query and use these truth degree variables. Furthermore, the optional parameter [WITH β] indicates the condition that must be satisfied as the minimum membership degree in [0, 1]. As in (Alkhateeb et al., 2009), users need to choose an appropriate value of β to express their requirements. If not specified, 1 is used as the default.
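As a rough illustration, a fuzzy triple pattern with its degree variable, truth-degree variables and optional WITH threshold could be represented as follows; the class and field names are our own and purely illustrative, not part of the proposed language.

from dataclasses import dataclass
from typing import Optional

@dataclass
class FuzzyTriplePattern:
    subject: str                  # s', possibly a variable such as "?Movie"
    predicate: str                # p', an edge label or a regular expression
    obj: str                      # o'
    alpha: Optional[str] = None   # variable bound to the triple's degree
    tau1: Optional[str] = None    # variable bound to the truth degree of s'
    tau2: Optional[str] = None    # variable bound to the truth degree of o'
    beta: float = 1.0             # minimum membership degree (WITH clause)

# e.g. (?Movie starring · nationality "America"): ?l  WITH 0.6
tp = FuzzyTriplePattern("?Movie", "starring · nationality", '"America"',
                        alpha="?l", beta=0.6)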
2. Fuzzy Extension of SPARQL Query Language
In order to query fuzzy RDF knowledge graphs, we extend the declarative query language SPARQL. Syntactically, the extension naturally extends SPARQL by allowing the occurrence of fuzzy graph patterns in the WHERE clause and the occurrence of fuzzy conditions in the FILTER clause. Its basic syntax is given as follows:
SELECT …                  # Result variables
FROM …                    # Fuzzy RDF dataset
WHERE …                   # Fuzzy RDF graph patterns
FILTER … [WITH value] …   # Value constraints
[THRESHOLD value]
Example 5.8 The following query looks for the recent (importance 0.6) thriller movies in which an American actor plays the leading role.
SELECT ?Movie ?Actor ?l
WHERE {
  ?Movie Release Date ?Date.
  (?Movie starring · nationality "America"): ?l.
  ?Movie genre "thriller".
  FILTER (?Date = recent) WITH 0.6}
THRESHOLD 0.6
Here, ?Movie and ?Date are variables, "starring · nationality" is a regular expression, and ?l represents the degree to which the American actor (?Actor) plays the leading role in the movie (?Movie). Furthermore, "?Date = recent" is a fuzzy condition expression.
3. Fuzzy SPARQL Graph Pattern Evaluation
A fuzzy SPARQL query defines a fuzzy graph pattern to be matched against a given fuzzy RDF graph. Intuitively, given a fuzzy RDF data graph G, the semantics of a graph pattern P defines a set of mappings, where each mapping (from the variables of P to the URIs and literals of G) matches the pattern to a homomorphic subgraph of G (Pivert et al., 2016b). To introduce this concept, the notion of matching a regular expression must first be defined.
The degree δRex(pa) to which a path pa matches a regular expression Rex is defined as follows, according to the form of Rex (in the following, R, R1 and R2 are regular expressions), and is illustrated by the sketch below:
• Rex is of the form u with u ∈ U (resp. "*"). If pa is a single edge whose label pi is u (resp. any u ∈ U), then δRex(pa) = ρ(pi); otherwise δRex(pa) = 0.
• Rex is of the form R1 · R2. We denote by P the set of all pairs of paths (p1, p2) such that pa is of the form p1 p2. One has δRex(pa) = maxP(min(δR1(p1), δR2(p2))).
• Rex is of the form R1|R2. One has δRex(pa) = max(δR1(pa), δR2(pa)).
• Rex is of the form R+. Let PA be the set of all tuples of paths (p1, …, pn) (n > 0) such that pa is of the form p1 … pn. One has δRex(pa) = maxPA(min(δR(p1), …, δR(pn))).
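The following Python sketch mirrors this recursive definition for a path represented as a list of (edge label, degree) pairs; the tagged-tuple encoding of expressions is our own, and only the max/min semantics comes from the definition above.

def delta(rex, path):
    kind = rex[0]
    if kind == "label":                      # a single label u, or the wildcard "*"
        if len(path) != 1:
            return 0.0
        label, degree = path[0]
        return degree if rex[1] in ("*", label) else 0.0
    if kind == "concat":                     # R1 . R2: best split of the path
        return max((min(delta(rex[1], path[:i]), delta(rex[2], path[i:]))
                    for i in range(1, len(path))), default=0.0)
    if kind == "alt":                        # R1 | R2
        return max(delta(rex[1], path), delta(rex[2], path))
    if kind == "plus":                       # R+: one or more repetitions of R
        best = delta(rex[1], path)
        for i in range(1, len(path)):
            best = max(best, min(delta(rex[1], path[:i]),
                                 delta(("plus", rex[1]), path[i:])))
        return best
    raise ValueError(kind)

# e.g. delta(("concat", ("label", "starring"), ("label", "nationality")),
#            [("starring", 0.8), ("nationality", 1.0)])  ->  0.8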
As we can see, when a regular expression matches a path of the fuzzy RDF knowledge graph, we take into account the degrees of truth associated with the edges of the fuzzy RDF graph. Next, we discuss the interpretation of fuzzy conditions, which additionally consider the degrees of truth associated with the vertices of the fuzzy RDF graph. In fact, we define conditions on these degrees of truth through value constraints in the FILTER statement.
Definition 5.20 (Evaluation of a fuzzy graph pattern) The evaluation of a fuzzy SPARQL graph pattern P over a fuzzy RDF graph G, denoted by [[P]]G, is recursively defined by:
• if P is of the form of a fuzzy triple graph pattern <t>: α, denoted by <s, p, o>: α, then [[P]]G = {π | dom(π) = var(t) ∧ π(t) ∈ G} and α = ρ(p).
• if P is of the form of a fuzzy triple graph pattern <t>: α, denoted by <?s, Rex, ?o>: α, then [[P]]G = {π | dom(π) = var(t) ∧ π(t) ∈ G} and α = δRex(pa) for the matching path pa from π(?s) to π(?o), τ1 = truth(π(?s)), and τ2 = truth(π(?o)).
• if P is of the form P1 AND P2, then [[P]]G = [[P1]]G ⋈ [[P2]]G.
• if P is of the form P1 OPTIONAL P2, then [[P]]G = [[P1]]G ⟕ [[P2]]G.
• if P is of the form P1 UNION P2, then [[P]]G = [[P1]]G ∪ [[P2]]G.
• if P is of the form P1 FILTER C, then [[P]]G = {π ∈ [[P1]]G | π ⊨ C}, which denotes the set of mappings in [[P1]]G that satisfy C with a degree ≥ β.
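The set-of-mappings operators in this definition can be sketched in Python as follows; the mapping representation (a dict carrying its degree under a reserved key) and the min-based combination of degrees in the join are our own simplifying choices, not the book's implementation.

DEG = "__deg__"

def compatible(m1, m2):
    # two mappings agree on every shared variable
    return all(m1[k] == m2[k] for k in m1 if k != DEG and k in m2)

def join(ms1, ms2):                      # [[P1 AND P2]]
    out = []
    for m1 in ms1:
        for m2 in ms2:
            if compatible(m1, m2):
                merged = {**m1, **m2}
                merged[DEG] = min(m1[DEG], m2[DEG])
                out.append(merged)
    return out

def left_outer_join(ms1, ms2):           # [[P1 OPTIONAL P2]]
    out = []
    for m1 in ms1:
        extended = join([m1], ms2)
        out.extend(extended if extended else [m1])
    return out

def union(ms1, ms2):                     # [[P1 UNION P2]]
    return ms1 + ms2

def filter_(ms, cond, beta):             # [[P1 FILTER C]] with threshold beta
    return [m for m in ms if cond(m) >= beta]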
Example 5.9 Let us consider the fuzzy SPARQL query of Example 5.8. We evaluate this query according to the fuzzy RDF graph G of Fig. 5.1. The query also specifies a threshold δt (δt = 0.6 in the example) to indicate that only matches with a possibility larger than δt should be returned. The matching process is as follows.
Intuitively, this pattern retrieves the list of movies in G, and the matching value of ?Movie is potentially Up in the Air, The Quest and Money Monster. The actors in the three movies are Vera Farmiga, George Clooney and Julia Roberts, respectively. Four paths match the regular expression "starring · nationality": p1 = Up in the Air—starring—George Clooney—nationality—America, p2 = The Quest—starring—George Clooney—nationality—America, p3 = Money Monster—starring—George Clooney—nationality—America, and p4 = Money Monster—starring—Julia Roberts—nationality—America. Their degrees of truth are δre(p1) = 0.8, δre(p2) = 0.9, δre(p3) = 0.8, and δre(p4) = 0.5. However, the genres of the movies Up in the Air and The Quest are "Romance" and "Fantasy", respectively. Moreover, if we suppose that μrecent(2016) = 0.65, the degree of truth of "?Date = recent" is 0.65. So, Money Monster is the only thriller movie, with degree of truth δu("Thriller") = 0.9. Thus, we obtain the following answers.
The degree of truth of each final query result is the minimum of the degrees of truth induced by the results described above; in this example, δ1P(G) = 0.65 and δ2P(G) = 0.5. So, π1 satisfies the minimum degree-of-truth threshold constraint and is the only answer.
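A quick numerical check of this aggregation, using the degrees given above, can be written as follows.

# Each candidate answer keeps the minimum of the degrees contributed by its
# path, its genre and the fuzzy condition on the release date.
delta_1 = min(0.8, 0.9, 0.65)   # Money Monster via George Clooney
delta_2 = min(0.5, 0.9, 0.65)   # Money Monster via Julia Roberts
answers = [d for d in (delta_1, delta_2) if d >= 0.6]   # THRESHOLD 0.6
print(delta_1, delta_2, answers)   # 0.65 0.5 [0.65]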
The most important and difficult part of the fuzzy SPARQL query language is the evaluation of the query pattern P over a fuzzy RDF graph G. Actually, each SPARQL query can be represented by a graph pattern. RDF graph pattern matching in SPARQL essentially enumerates all PRDF homomorphisms from the pattern graph into the data graph G. PRDF homomorphisms extend graph homomorphisms to deal with vertices connected by regular expression patterns, which can be mapped to vertices connected by paths rather than by edge-to-edge mappings. As a result, any SPARQL query can be equivalently transformed into a subgraph query problem, which locates the subgraph of the RDF data graph matching the query graph. We propose in Algorithm 5.9 a backtracking technique for processing a fuzzy query pattern P over a fuzzy RDF graph G. The method generates each possible map from the current one by traversing the search tree in a depth-first manner. In particular, we need to produce answers together with the truth degrees of the query patterns.
Algorithm 5.9 describes the framework for a pattern match query Q over a fuzzy RDF graph G, which is a recursive version of the basic backtracking algorithm. The input of this algorithm is an RDF graph pattern Q, a fuzzy RDF graph G, and a partial map μp, which is a set of pairs {(<u, v>, δ)} such that u is a term of Q, v is the image of u in G, and δ is the truth degree associated with the mapping. If we call this algorithm with (Q, G, μ∅), where μ∅ is the map with the empty domain, it outputs all homomorphisms from the pattern graph Q into the fuzzy RDF graph G. Specifically, we define in the following the operations used in the algorithm:
Complete(μp) checks if each term u ∈ VP is mapped to a term in G. It returns TRUE if all u ∈ VP are mapped, and FALSE otherwise.
ChooseTerm(VP) chooses a term u ∈ VP to obtain a possible homomorphism.
Candidates(μp, u, G, P) calculates all possible candidate maps in G for the current term u satisfying the partial map μp. It returns a set of pairs <v, π>, where v is a possible image of u, and π is the possible map from the terms of a regular expression pattern Ri appearing in a triple with u to terms in VP already mapped in μp.
After that, the procedure takes each candidate v of the current term u ∈ VP and the possible map π, puts v into the mapping pairs, and tries to generate the possible candidates of v. This is done recursively, in a depth-first manner, by calling the function PatternMatch (note that μp, {(<u, v>, δ)}, and π are compatible since the set of pairs <v, π> is calculated with respect to μp). Finally, we obtain a tree that contains, for each term of P (i.e., each vertex of P), one level with that term and one level with the possible images of that term in G. The input to each vertex of each level is the current map. Each possible path in the tree from the root to a leaf labeled by a term of G represents a possible homomorphism.
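A simplified Python sketch of this recursive framework, keeping the three operations abstract, could look as follows; it illustrates the backtracking structure under our own naming rather than reproducing the actual Algorithm 5.9.

def pattern_match(pattern, graph, mu_p, answers,
                  complete, choose_term, candidates):
    if complete(mu_p):                       # every term of P is mapped
        answers.append(dict(mu_p))
        return
    u = choose_term(pattern, mu_p)           # next pattern term to map
    for v, delta_deg in candidates(mu_p, u, graph, pattern):
        mu_p[u] = (v, delta_deg)             # extend the partial map with its degree
        pattern_match(pattern, graph, mu_p, answers,
                      complete, choose_term, candidates)
        del mu_p[u]                          # backtrack

# Calling pattern_match(P, G, {}, answers, ...) with the empty map enumerates
# all homomorphisms from the pattern graph P into the fuzzy RDF graph G.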
Proposition 5.4 Algorithm 5.9 is correct and complete for enumerating all RDF homomorphisms from a given SPARQL graph pattern into a fuzzy RDF graph.
Proof We can prove this by induction. The set of all homomorphisms is trivially complete for the empty map at the beginning of the algorithm. Because the Candidates operation is complete (Li et al., 2019) and the number of vertices is finite, the partial homomorphisms, i.e., μp, are completely extended for the current vertex at each step. Finally, the procedure terminates with a homomorphic mapping for each vertex in P.
5.6 Summary
The flexibility of representation offered by fuzzy RDF raises challenging issues for querying fuzzy RDF graphs. In this chapter, we first propose three query evaluation algorithms for processing subgraph queries over fuzzy RDF data. In Sect. 5.2 we propose a class of RDF graph patterns, in which a vertex is specified with a flexible condition to express preferences on the vertex contents of the graph and an edge is specified with a regular expression to express fuzzy preferences on the structure of the data graph, and we study pattern matching in a fuzzy RDF graph. Specifically, we want to retrieve all qualified matches of a query pattern in the fuzzy RDF graph. We further define a graph pattern matching algorithm based on a revised notion of graph homomorphism, in contrast to the NP-complete graph pattern matching via subgraph isomorphism. In Sect. 5.3 we propose a novel path-based solution to retrieve subgraphs from fuzzy RDF graph databases. In addition, the absolute possibility of a match can be computed by aggregating the relative possibilities of each candidate path during match processing. In Sect. 5.4, we integrate fuzzy quantified statements into fuzzy RDF queries addressed to a fuzzy RDF database. We present an approach to summarizing a fuzzy RDF graph database in the form of linguistic summaries, and we show how these statements can be defined and implemented. In Sect. 5.5, we present an extension of SPARQL to query fuzzy RDF graphs. The extension is able to express fuzzy queries making use of regular expressions and fuzzy conditions. We provide the syntax and semantics of the extended SPARQL patterns. On this basis, we present a query evaluation algorithm based on subgraph queries for processing fuzzy RDF queries.
With the advent of the era of Big Data and artificial intelligence, dealing with diverse fuzzy information in various fuzzy models will be essential. We believe that fuzzy techniques will be applied in more and more concrete application domains and will play an increasingly important role in implementing Big Data intelligence.
References
Aho, A. V., & Hopcroft, J. E. (1974). The design and analysis of computer algorithms. Addison-
Wesley Pub. Co.
Alkhateeb, F., Baget, J. F., & Euzenat, J. (2009). Extending SPARQL with regular expression
patterns (for querying RDF). Journal of Web Semantics, 7(2), 57–73.
Almendros-Jiménez, J. M., Becerra-Terón, A., & Moreno, G. (2017). A fuzzy extension of SPARQL
based on fuzzy sets and aggregators. In IEEE International Conference on Fuzzy Systems (FUZZ-
IEEE) (pp. 1–6).
Angles, R., & Gutierrez, C. (2016). The multiset semantics of SPARQL patterns. In International
Semantic Web Conference (pp. 20–36). Springer.
Anyanwu, K., Maduko, A., & Sheth, A. (2007). Sparq2l: Towards support for subgraph extraction
queries in RDF databases. In Proceedings of the 16th International Conference on World Wide
Web (pp. 797–806).
Bendersky, M., Metzler, D., & Croft, W. B. (2010). Learning concept importance using a weighted
dependence model. In Proceedings of the Third ACM International Conference on Web Search
and Data Mining (pp. 31–40).
Blau, H., Immerman, N., & Jensen, D. (2002). A visual language for querying and updating graphs.
University of Massachusetts Amherst Computer Science Technical Report, 37, 2002.
Bosc, P., & Pivert, O. (1992). Some approaches for relational databases flexible querying. Journal
of Intelligent Information Systems, 1(3), 323–354.
Bosc, P., Lietard, L., & Pivert, O. (1995). Quantified statements and database fuzzy querying. In
Fuzziness in Database Management Systems (pp. 275–308). Physica.
Bouchon-Meunier, B., & Moyse, G. (2012). Fuzzy linguistic summaries: Where are we, where
can we go? In IEEE Conference on Computational Intelligence for Financial Engineering &
Economics (CIFEr) (pp. 1–8).
Bry, F., Furche, T., Marnette, B., Ley, C., Linse, B., & Poppe, O. (2010). SPARQLog: SPARQL with
rules and quantification. In Semantic Web Information Management (pp. 341–370). Springer.
Carroll, J. J. (2002). Matching RDF graphs. Lecture Notes in Computer Science (pp. 5–15).
Castelltort, A., & Laurent, A. (2016). Extracting fuzzy summaries from NoSQL graph databases.
In Flexible Query Answering Systems 2015 (pp. 189–200). Springer.
Cheng, J., Ma, Z. M., & Yan, L. (2010). f-SPARQL: A flexible extension of SPARQL. In
International Conference on Database and Expert Systems Applications (pp. 487–494). Springer.
Costabello, L. (2014). Error-tolerant RDF subgraph matching for adaptive presentation of linked
data on mobile. In The Semantic Web: Trends and Challenges. Springer International Publishing.
Delgado, M., Ruiz, M. D., Sánchez, D., & Vila, M. A. (2014). Fuzzy quantification: A state of the
art. Fuzzy Sets and Systems, 242, 1–30.
Fan, W., Wu, Y., & Xu, J. (2016). Adding counting quantifiers to graph patterns. In Proceedings of
the 2016 International Conference on Management of Data (pp. 1215–1230), ACM.
Gallagher, B. (2006). Matching structure and semantics: A survey on graph-based pattern matching.
In AAAI Fall Symposium: Capturing and Using Patterns for Evidence Detection (Vol. 45).
Gao, X., Xiao, B., Tao, D., & Li, X. (2010). A survey of graph edit distance. Pattern Analysis &
Applications, 13(1), 113–129.
Golomb, S. W., & Baumert, L. D. (1965). Backtrack programming. Journal of the ACM, 12(4),
516–524.
Hahn, G., & Tardif, C. (1997). Graph homomorphisms: Structure and symmetry. Graph Symmetry.
Springer Netherlands.
Harris, S., & Seaborne, A. (2013). SPARQL 1.1 query language. W3C Recommendation. http://www.w3.org/TR/sparql11-query
Hayes, P. (2004). RDF semantics. http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
Henzinger, M. R., Henzinger, T. A., & Kopke, P. W. (1995). Computing simulations on finite
and infinite graphs. In Proceedings of IEEE 36th Annual Foundations of Computer Science
(pp. 453–462). IEEE.
Holub, J., & Melichar, B. (1998). Implementation of nondeterministic finite automata for
approximate pattern matching. In Automata Implementation, Third International Workshop on
Implementing Automata, WIA'98, Rouen, France, September 17–19 (pp. 92–99).
Kochut, K. J., & Janik, M. (2007). SPARQLeR: Extended SPARQL for semantic association
discovery. In European Semantic Web Conference (pp. 145–159). Springer.
Lee, J., Han, W. S., Kasperovics, R., & Lee, J. H. (2012). An in-depth comparison of subgraph
isomorphism algorithms in graph databases. PVLDB, 6(2), 133–144.
Li, G., Yan, L., & Ma, Z. (2019). Pattern match query over fuzzy RDF graph. Knowledge-Based
Systems, 165, 460–473.
Lian, X., & Chen, L. (2011). Efficient query answering in probabilistic RDF graphs. In ACM SIGMOD
International Conference on Management of Data (pp. 157–168).
Liu, Y. A., Rothamel, T., Yu, F., Stoller, S. D., & Hu, N. (2004). Parametric regular path queries.
ACM Sigplan Notices, 39(6), 219–230.
Ma, Z. M., Liu, J., & Yan, L. (2011). Matching twigs in fuzzy XML. Information Sciences, 181(1),
184–200.
Ma, R., Jia, X., Cheng, J., & Angryk, R. A. (2015). SPARQL queries on RDF with fuzzy constraints
and preferences. Journal of Intelligent & Fuzzy Systems, 30(1), 183–195.
Matono, A., Amagasa, T., Yoshikawa, M., & Uemura, S. (2005). A path-based relational RDF
database. In Australasian Database Conference-Volume (Vol. 39, pp. 95–103).
Moustafa, W. E., Kimmig, A., Deshpande, A., & Getoor, L. (2014). Subgraph pattern matching
over uncertain graphs with identity linkage uncertainty. In IEEE International Conference on Data
Engineering (pp. 904–915). IEEE.
Neumann, T., & Weikum, G. (2008). RDF-3x: A risc-style engine for RDF. Proceedings of the
VLDB Endowment, 1(1), 647–659.
Pivert, O., Slama, O., & Thion, V. (2016a). SPARQL extensions with preferences: A survey. In
Proceedings of the 31st Annual ACM Symposium on Applied Computing (pp. 1015–1020).
Pivert, O., Slama, O., & Thion, V. (2016b). An extension of SPARQL with fuzzy navigational
capabilities for querying fuzzy RDF data. In IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE) (pp. 2409–2416).
Pivert, O., Slama, O., & Thion, V. (2016c). Fuzzy quantified structural queries to fuzzy graph
databases. In International Conference on Scalable Uncertainty Management (pp. 260–273).
Springer.
Pérez, J., Arenas, M., & Gutierrez, C. (2010). nSPARQL: A navigational language for RDF. Journal
of Web Semantics, 8(4), 255–270.
Prudhommeaux, E. (2008). SPARQL query language for RDF. http://www.w3.org/TR/rdf-sparql-query/
Ullmann, J. R. (1976). An algorithm for subgraph isomorphism. Journal of the ACM, 23(1), 31–42.
Virgilio, R. D., Maccioni, A., & Torlone, R. (2015). Approximate querying of RDF graphs via path
alignment. Distributed and Parallel Databases, 33(4), 555–581.
Wagner, R. A., & Fischer, M. J. (1974). The string-to-string correction problem. Journal of the
ACM, 21(1), 168–173.
Wang, J., Jin, B., & Li, J. (2005). An efficient matching algorithm for RDF graph patterns. Journal
of Computer Research & Development, 42(10), 1763–1770.
Yager, R. R. (1993). On ordered weighted averaging aggregation operators in multicriteria decision
making. In Readings in Fuzzy Sets for Intelligent Systems (pp. 80–87).
Yager, R. R. (2014). Social network database querying based on computing with words. In Flexible
Approaches in Data, Information and Knowledge Management (pp. 241–257). Springer, Cham.
Zadeh, L. A. (1983). A computational approach to fuzzy quantifiers in natural languages.
Computers & Mathematics with Applications, 9(1), 149–184.
Zhang, D., Song, T., He, J., Shi, X., & Dong, Y. (2012). A similarity-oriented RDF graph matching
algorithm for ranking linked data. In IEEE 12th International Conference on Computer and
Information Technology (CIT) (pp. 427–434). IEEE.
Zhao, P., & Han, J. (2010). On graph query optimization in large networks. Proceedings of the
VLDB Endowment, 3(3), 340–351.
Zimmermann, H. J. (1996). Fuzzy set theory and its applications (3rd ed.). Kluwer Academic
Publishers Norwell.
Zou, L., & Özsu, M. T. (2017). Graph-based RDF data management. Data Science and Engineering,
2, 56–70.
Zou, L., Özsu, M. T., Chen, L., Shen, X., Huang, R., & Zhao, D. (2014). gstore: A graph-based
SPARQL query engine. The VLDB Journal, 23(4), 565–590.