0% found this document useful (0 votes)

22 views217 pages

Modeling and Management of Fuzzy Semantic RDF Data

Uploaded by

dyw323

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views217 pages

Modeling and Management of Fuzzy Semantic RDF Data

Uploaded by

dyw323

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 217

Studies in Computational Intelligence 1057

Zongmin Ma
Guanfeng Li
Ruizhe Ma

Modeling and
Management
of Fuzzy Semantic
RDF Data
Studies in Computational Intelligence

Volume 1057

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes
new developments and advances in the various areas of computational
intelligence—quickly and with a high quality. The intent is to cover the theory,
applications, and design methods of computational intelligence, as embedded in
the fields of engineering, computer science, physics and life sciences, as well as
the methodologies behind them. The series contains monographs, lecture notes and
edited volumes in computational intelligence spanning the areas of neural networks,
connectionist systems, genetic algorithms, evolutionary computation, artificial
intelligence, cellular automata, self-organizing systems, soft computing, fuzzy
systems, and hybrid intelligent systems. Of particular value to both the contributors
and the readership are the short publication timeframe and the world-wide
distribution, which enable both wide and rapid dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Zongmin Ma · Guanfeng Li · Ruizhe Ma

Modeling and Management

of Fuzzy Semantic RDF Data
Zongmin Ma Guanfeng Li
College of Computer Science College of Information Engineering
and Technology Ningxia University
Nanjing University of Aeronautics Yinchuan, China
and Astronautics
Nanjing, China

Ruizhe Ma
Department of Computer Science
University of Massachusetts Lowell
Lowell, MA, USA

ISSN 1860-949X ISSN 1860-9503 (electronic)

Studies in Computational Intelligence
ISBN 978-3-031-11668-1 ISBN 978-3-031-11669-8 (eBook)
https://doi.org/10.1007/978-3-031-11669-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

In the era of big data, we have witnessed a tremendous increase in the amount of data
available. In this context, it has become very crucial to develop a common framework
for massive data sharing across applications, enterprises, and communities. For this
purpose, data should be provided with semantic meaning (through metadata), which
enables machines to consume, understand, and reason about the structure and purpose
of data. The Resource Description Framework (RDF) recommended by W3C (World
Wide Web Consortium) has quickly gained popularity since its emergence and has
been the de-facto standard for semantic information representation and exchange.
Nowadays, the RDF metadata model is finding increasing usage in a wide range of
massive data management scenarios (e.g., knowledge graph). With the widespread
acceptance of RDF in diverse applications, a considerable amount of RDF data is
being proliferated and becoming available.
RDF and related standards allow intelligent understanding and processing of big
data. This creates a new set of data processing requirements involving RDF, such as
the need to construct and manage RDF data. For the purpose of RDF construction,
various data resources, including the traditional databases, XML (Extensible Markup
Language) and JSON (JavaScript Object Notation) documents, texts, tabular data
such as CSV (comma-separated values) and TSV (tab-separated values), NoSQL
(not only SQL) databases and so on, have been used for automatically constructing
RDF models. RDF data management typically involves two primary technical issues:
scalable storage and efficient queries. For more effective queries, it is necessary to
index RDF data. All the listed issues are closely related. Indexing of RDF data is
enabled based on RDF storage, and efficient querying of RDF data is supported by
the indexing structure. Efficient and scalable management of massive RDF data is
of increasing importance.
With the wide and in-depth utilization of RDF in diverse application domains,
particularities with information management in concrete applications emerge, which
can challenge the traditional RDF technologies. In data and knowledge intensive
applications, one of the challenges can be generalized as the need to deal with uncer-
tain information in RDF data management. In the real world, human knowledge and

v
vi Preface

natural language have a great deal of imprecision and vagueness. With the increasing
amount of RDF data that is becoming available, efficient and scalable management
of massive RDF data with uncertainty is of crucial importance.
Fuzzy set theory, which has been one of the key means of implementing machine
intelligence, has been used in a large number and a wide variety of applications.
In order to bridge the gap between human-understandable soft logic and machine-
readable hard logic, fuzzy logic cannot be ignored. Fuzzy logic has been intro-
duced into diverse data models for fuzzy data processing. The emergence of the big
data era has put essential requirements on dealing with both semantic and fuzzy
phenomena. Currently, the research of fuzzy logic in RDF knowledge graphs is
attracting increasing attention, but the achievements are still few and scattered.
This book goes into great depth concerning the fast-growing topic of technologies
and approaches to fuzzy RDF data modeling and management. This book covers the
representation of fuzzy RDF, the persistence of fuzzy RDF, and the query of fuzzy
RDF. Concerning the representation of fuzzy RDF, the multi-granularity fuzziness
in the RDF graph and RDF schema are identified, and a set of algebraic opera-
tions is defined for the fuzzy RDF model. Concerning the persistence of fuzzy
RDF, several storage frameworks are proposed with diverse database models, the
traditional relational and object-oriented database models, as well as the emerging
NoSQL databases such as the HBase database and Neo4j database, are introduced.
Concerning the query of fuzzy RDF, the fuzzy graph pattern matching and the fuzzy
extension mechanism of SPARQL (Simple Protocol and RDF Query Language)
query language are investigated. The methods for exact pattern match query, approx-
imate fuzzy RDF subgraph match query, and fuzzy quantified query over fuzzy RDF
graph are proposed. In addition, an extension of SPARQL language to query fuzzy
RDF graphs is developed.
This book aims to provide a single record of current studies in the field of fuzzy
semantic data management with RDF. The objective of this book is to systematically
present the state-of-the-art information to researchers, practitioners, and graduate
students who need to intelligently deal with Big Data with uncertainty and, at the
same time, serve the data and knowledge engineering professionals faced with non-
traditional applications that make the application of conventional approaches difficult
or impossible. Researchers, graduate students, and information technology profes-
sionals interested in RDF and fuzzy data processing will find this book a starting
point and a reference for their study, research, and development.
We would like to acknowledge all of the researchers in the area of fuzzy data
and knowledge engineering. Based on both their publications and many discussions
with some of them, their influence on this book is profound. The materials in this
book are the outgrowth of research conducted by the authors in recent years. The
initial research work was supported by the National Natural Science Foundation
of China (62176121, 62066038, 61772269, and 61370075). We are grateful for the
financial support from the National Natural Science Foundation of China through
several research grant funds. Additionally, the assistance and facilities of authors’
universities are deemed important and highly appreciated. Special thanks go to Janusz
Kacprzyk, the series editor of Studies in Fuzziness and Soft Computing, and Thomas
Preface vii

Ditzinger, the senior editor of Applied Sciences and Engineering of Springer-Verlag,

for their advice and help in proposing, preparing, and publishing of this book. This
book will not have been completed without the support from them.

Nanjing, China Zongmin Ma

Yinchuan, China Guanfeng Li
Lowell, USA Ruizhe Ma
June 2022
Contents

1 RDF Data and Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 RDF Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 RDF Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 RDF Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 RDF Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 RDF Query Language SPARQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 The W3C Syntax of SPARQL . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.2 The Algebraic Syntax of SPARQL Graph Patterns . . . . . . . . 12
1.3.3 Semantics of SPARQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 RDF Data Store . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4.1 RDF Stores in Traditional Databases . . . . . . . . . . . . . . . . . . . . 17
1.4.2 RDF Stores in Not Only SQL Databases . . . . . . . . . . . . . . . . . 21
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 Fuzzy Sets and Fuzzy Database Modeling . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2 Imperfect Information and Fuzzy Sets . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.2.1 Imperfect Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.2 Fuzzy Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.3 Fuzzy Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3 Fuzzy Relational Database Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4 Fuzzy Object-Oriented Database Models . . . . . . . . . . . . . . . . . . . . . . . 43
2.4.1 Fuzzy Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.2 Fuzzy Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.3 Fuzzy Object-Class Relationships . . . . . . . . . . . . . . . . . . . . . . 46
2.4.4 Fuzzy Inheritance Hierarchies . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.5 Fuzzy XML Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.5.1 Fuzziness in XML Documents . . . . . . . . . . . . . . . . . . . . . . . . . 54

ix
x Contents

2.5.2 Fuzzy XML Representation Models

and Formalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3 Fuzzy RDF Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.2 Fuzzy RDF Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.2.1 Fuzzy Information in RDF Graph . . . . . . . . . . . . . . . . . . . . . . 73
3.2.2 Fuzzy RDF Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.3 Fuzzy RDF Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.4 Similarity Matching of Fuzzy RDF Graphs . . . . . . . . . . . . . . . . . . . . . 82
3.4.1 Matching Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.4.2 Matching Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.5 Algebraic Operations in Fuzzy RDF Graphs . . . . . . . . . . . . . . . . . . . . 90
3.5.1 Algebraic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.5.2 Equivalences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.5.3 Relationship of SPARQL and the Algebraic Operations . . . . 99
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4 Persistence of Fuzzy RDF and Fuzzy RDF Schema . . . . . . . . . . . . . . . . 109
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.2 Fuzzy RDF Mapping to Relational Databases . . . . . . . . . . . . . . . . . . . 111
4.2.1 Fuzzy Triple Stores Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.2.2 Fuzzy Horizontal Stores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.3 Fuzzy RDF Mapping to Object-Oriented Databases . . . . . . . . . . . . . 118
4.3.1 Mapping of Fuzzy Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.3.2 Mapping of Fuzzy Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.3.3 Mapping of Datatypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3.4 Mapping of Fuzzy Instances . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.3.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Fuzzy RDF Mapping to HBase Databases . . . . . . . . . . . . . . . . . . . . . . 127
4.4.1 Fuzzy RDF Storage in Fuzzy HBase . . . . . . . . . . . . . . . . . . . . 128
4.4.2 FHBase-Based RDF Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.4.3 Design and Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.5 Fuzzy RDF Graph Mapping to Property Graph . . . . . . . . . . . . . . . . . 141
4.5.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.5.2 Transform Fuzzy RDF Graph to Property Graph . . . . . . . . . . 144
4.5.3 Query Fuzzy RDF Graph in Neo4j . . . . . . . . . . . . . . . . . . . . . . 146
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Contents xi

5 Fuzzy RDF Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph . . . . . . . . . . . . . 153
5.2.1 Graph Pattern Matching Problem . . . . . . . . . . . . . . . . . . . . . . . 154
5.2.2 RDF Graph Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.2.3 Fuzzy Graph Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.2.4 Query Evaluation Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.3 Approximate Fuzzy RDF Subgraph Match Query . . . . . . . . . . . . . . . 166
5.3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.3.2 The Matching Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph . . . . . . . . . . . . . . . . 183
5.4.1 Linguistic Quantifier and Fuzzy Quantified Statement . . . . . 184
5.4.2 Fuzzy Quantified Graph Patterns Matching . . . . . . . . . . . . . . 186
5.4.3 Fuzzy Quantified Graph Patterns Matching . . . . . . . . . . . . . . 192
5.5 Extended SPARQL for Fuzzy RDF Query . . . . . . . . . . . . . . . . . . . . . . 197
5.5.1 The Fuzzy Query Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
5.5.2 Implementation Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Chapter 1
RDF Data and Management

1.1 Introduction

Recent years have witnessed a tremendous increase in the amount of data available
on the Web (Hassanzadeh et al., 2012). At the same time, Web 2.0 applications
have introduced new forms of data and have radically changed the nature of the
modern Web. In these applications, the Web has been transformed from a publish-
only environment into a vibrant forum for information exchange (Hassanzadeh et al.,
2012). The main purpose of the Semantic Web, proposed by W3C founder Tim
Berners-Lee in his description of the future of the Web (Berners-Lee et al., 2001), is
to provide a common framework for data sharing across applications, enterprises, and
communities. By giving data semantic meaning (through metadata), this framework
enables machines to consume, understand, and reason about the structure and purpose
of data.
The core of the Semantic Web is built on the Resource Description Framework
(RDF) data model (Manola & Miller, 2004). RDF provides a flexible and concise
model for representing metadata of resources on the Web. RDF can represent struc-
tured as well as unstructured data and is quickly becoming the de facto standard
for representation and exchange of information1 (Duan et al., 2011). Nowadays, the
RDF data model is finding increasing use in a wide range of Web data-management
scenarios and its use is now wider than the semantic web. Governments (e.g. from
the United States2 and United Kingdom3 ) and large companies and organizations
(e.g. New York Times,4 BBC,5 and Best Buy6 ) have started using RDF as a business

1 http://www.w3.org/RDF.
2 http://www.data.gov/.
3 http://www.data.gov.uk/.
4 http://data.nytimes.com/
5 http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup2010_dynamic_sem.html.
6 http://www.chiefmartec.com/2009/12/best-buy-jump-starts-data-webmarketing.html.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 1

Z. Ma et al., Modeling and Management of Fuzzy Semantic RDF Data,
Studies in Computational Intelligence 1057,
https://doi.org/10.1007/978-3-031-11669-8_1
2 1 RDF Data and Management

data model and representation format, either for semantic data integration, search-
engine optimization, and better product search, or to represent data from information
extraction. Yago (Suchanek et al., 2008) and DBPedia (Bizer et al., 2009) extract facts
from Wikipedia automatically and store them in RDF format to support structural
queries over Wikipedia; biologists encode their experiments and results using RDF to
communicate among themselves leading to RDF data collections, such as Bio2RDF
(bio2rdf.org) and Uniprot RDF (dev.isb-sib.ch/projects/uniprot-rdf). Furthermore, in
the Linked Open Data (LOD) cloud (Bizer et al., 2009), Web data from a diverse set
of domains like Wikipedia, films, geographic locations, and scientific data are linked
to provide one large RDF data cloud. With the increasing amount of RDF data which
is becoming available, efficient and scalable management of RDF data is of crucial
importance.
As a new data model, the RDF data-representation format largely determines how
to store and index RDF data and furthermore influences how to query RDF data.
Management of RDF data typically involves two primary technical challenges: scal-
able storage and efficient queries. Among these two issues, RDF data storage provides
the infrastructure for RDF data management. Many proposals of RDF queries have
been developed based on diverse query policies, (Ali et al., 2021) such as fuzzy
queries (Ma et al., 2016a, 2016b, 2016c), approximate queries (Yan et al., 2017),
keyword queries (Ma et al., 2018), natural language query (Hu et al., 2017), and so
on.
With the RDF format gaining widespread acceptance, much work is being done
in RDF data management, and a number of research efforts have been undertaken to
address these issues. Some RDF data-management systems have started to emerge
such as Sesame (Broekstra et al., 2002), Jena-TDB (Wilkinson et al., 2003), Virtuoso
(Erling & Mikhailov, 2007, 2009), 4store (Harris et al., 2009), BigOWLIM (Bishop
et al., 2011), SPARQLcity/SPARQLverse,7 MarkLogic,8 Clark & Parsia/Stardog,9
and Oracle Spatial and Graph with Oracle Database 12c.10 BigOWLIM was renamed
to OWLIM-SE and later on to GraphDB. In addition, some research prototypes have
been developed [e.g. RDF-3X (Neumann & Weikum, 2008, 2010); SW-Store (Abadi
et al., 2007, 2009), and RDFox11 ].

1.2 RDF Data Model

The purpose of the Semantic Web is to add semantic support to the existing Web,
so that machines can understand the meaning of information, so as to realize
the intelligent processing of Web information. This requires that machines be

7 http://sparqlcity.com/.
8 http://www.marklogic.com/.
9 http://clarkparsia.com/.
10 http://www.oracle.com/us/products/database/options/spatial/overview/index.html.
11 http://www.cs.ox.ac.uk/isg/tools/RDFox/.
1.2 RDF Data Model 3

provided with data describing Web data, that is, metadata. A universal metadata
model, RDF came into being. RDF is a framework for metadata and the corner-
stone of the Semantic Web. It provides interoperability between applications using
machine-understandable Web data.

1.2.1 RDF Basic Definitions

The RDF is a W3C Recommendation that has rapidly gained popularity. RDF
provides a means of expressing and exchanging semantic metadata (i.e. data that
specify semantic information about data). By representing and processing metadata
about information sources, RDF defines a model for describing relationships among
resources in terms of uniquely identified attributes and values.
In the RDF data model, the universe is modelled as a set of resources, where
a resource is anything that has a universal resource identifier (URI), including all
information on the Web, virtual concepts, or things in the real world, such as movies,
screenwriters, directors, countries, etc. And a resource can be described using a set
of RDF statements in the form of (subject, predicate, object) triples. Here subject is
the resource being described, predicate is the property being described with respect
to the resource, and object is the value for the property. RDF data set consists of
these statements. For example, the natural language expression “The director of the
movie Dinner is Barry Levinson” can be expressed by RDF statements:
• A subject: http://www.example.org/director/BarryLevinson
• A predicate: http://www.example.org/dc/elements/direct
• and an object: http://www.example.org/film/Dinner
Here, URIs are used to identify the subject, predicate, and object of the statement.
Note that both subject and object can be anonymous objects, known as blank
nodes. RDF uses these triples to describe resources and attach additional semantic
information to the resources.
It is possible to annotate RDF data with semantic metadata using RDFS (RDF
Schema) or OWL, both of which are W3C standards. This annotation primarily
enables reasoning over the RDF data (called entailment), that we do not consider
in this book. However, as we will see below, it also impacts data organization in
some cases, and the metadata can be used for semantic query optimization. We
illustrate the fundamental concepts by simple examples using RDFS, which allows
the definition of classes and class hierarchies. RDFS has built-in class definitions—
the more important ones being rdfs: Class and rdfs: subClassOf that are used to
define a class and a subclass, respectively. To specify that an individual resource is
an element of the class, a special property, rdf: type is used. For example, if we wanted
to define a class called Movies and two subclasses ActionMovies and Dramas, this
would be accomplished in the following way:
4 1 RDF Data and Management

Movies rdf: type rdfs: Class.

ActionMovies rdfs: subClassOf Movies.
Dramas rdfs: subClassOf Movies.
In addition, the RDF specification includes a built-in vocabulary with a normative
semantics (RDFS). This vocabulary deals with inheritance of classes and properties,
as well as typing, among other features (Brickley & Guha, 2004).

1.2.2 RDF Data Model

1.2.2.1 RDF Graphs

In this section, we introduce an abstract version of the RDF data model, which is
both a fragment following faithfully the original specification, and an abstract version
suitable to do formal analysis. The abstract syntax of RDF model is a set of triples.
Formally, an RDF triple is defined as (s, p, o) ∈ (U ∪ B) × U × (U ∪ L ∪ B), where
U, B, and L are infinite sets of URI, blank nodes, and RDF literals, respectively. In a
triple (s, p, o), s is called the subject, p the predicate (or property), and o the object.
The interpretation of a triple statement is that subject s has property p with value o.
Thus, an RDF triple can be seen as representing an atomic “fact” or a “claim”. Note
that any object in one triple, say oi in (si , pi , oi ), can play the role of a subject in
another triple, say (oi , pj , oj ). Therefore, RDF data is a directed, labelled graph data
format for representing Web resources.
There are many syntaxes available for writing RDF data and serializing RDF
data, such as N-Triples,12 RDF/XML,13 RDFa,14 JSON LD,15 Notation 3 (N3),16
Turtle17 and so on. This section mainly introduces three common RDF representation
methods: N-Triples, RDF/XML and graph-based representation grammar. Suppose
there are the following three statements in the RDF data:
Statement 1: Barry Levinson is the director of Dinner.
Statement 2: Barry Levinson’s age is 77.
Statement 3: Barry Levinson’s nationality is USA.
(a) N-Triples triples representation grammar
N-Triples aims to express RDF in a concise and intuitive syntax and provide
shortcuts to commonly used RDF functions. This grammar is based on the
definition of a statement. A statement consists of three parts: subject, attribute,

12 http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/#ntriples.
13 http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/.
14 https://www.w3.org/TR/rdfa-primer/.
15 https://json-ld.org/.
16 http://www.w3.org/TeamSubmission/n3/.
17 http://www.w3.org/TeamSubmission/turtle/.
1.2 RDF Data Model 5

and object. Therefore, each RDF statement is expressed as a <subject, attribute,

and object> triplet. The above three statements are expressed in triple grammar
as follows:
<http://www.example.org/film/Dinner, http://www.example.org/dc/ele
ments/direct, http://www.example.org/director/Barry Levinson>.
<http://www.example.org/director/Barry Levinson, http://www.example.
org/dc/elements/age,“53”>
<http://www.example.org/director/Barry Levinson, http://www.example.
org/dc/elements/nationality,“USA”>
As shown above, http://www.example.org/film/Dinner and http://www.exa
mple.org/director/BarryLevinsonn are two subjects, http://www.example.org/
dc/element/direct, http://www.example.org/dc/elements/age and http://www.
example.org/dc/elements/nationality are three predicates. “77” and “USA” are
property values, written in quotation marks. If RDF data is expressed in this
syntax, an RDF data set consists of many such triples.
(b) XML-based representation grammar
Since the RDF triplet model is abstract in nature, W3C introduced a standard
called RDF/XML. As the name suggests, RDF/XML involves encoding RDF
in XML format. The core principle behind RDF/XML is to use XML tools to
create and parse RDF serialization. RDF/XML is the recommended syntax for
applications to exchange RDF information.
To express RDF in XML syntax, there are several tags that need to be defined:
rdf: RDF as the root tag represents the beginning of an RDF document, and the
content of the tag is a series of descriptions. The rdf: Description tag represents
a description of a resource. As an attribute of the tag, rdf: about indicates the ID
of the resource to be described. The subtags of the rdf: Description tag are the
attribute names of the resources to be described, and the content of the subtags
is the corresponding attribute value. The rdf: resource attribute indicates that
the value of the attribute is a resource, and this attribute indicates the ID of the
referenced resource. The three statements in the above example are expressed
in XML-based syntax as follows:
<? xml version = “1.0” encoding = “UTF-16” ?>
<rdf: RDF xmlns:rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#xml
ns: dc = “http://www.example.org/dc/elements/”>
<rdf: Description rdf: about = “http://www.example.org/director/BarryLevi
nson”>
<dc: age>53</dc: age>
<dc: nationality>USA</dc: nationality>
</rdf: Description>
<rdf: Description rdf: about = “http://www.example.org/film/Dinner”>
<dc: direct rdf: resource = “http://www.example.org/director/Elvin”/>
</rdf: Description>
</rdf: RDF>
6 1 RDF Data and Management

http://www.example.org/film/Dinner

http://www.example.org/dc/elements/direct

http://www.example.org/director/ Elvin

http://www.example.org/dc/elements/age http://www.example.org/dc/elements/nationality

53 USA

Fig. 1.1 Graphic-based RDF syntax representation

(c) Graph-based representation grammar

As the name of RDF graph already hints at, the RDF data model is essentially a
graph-based data model, albeit with special features such as vertexes that can act as
edge labels and no fundamental distinction between schemas and instances, which
can be represented in one and the same graph (Ma et al., 2016a, 2016b, 2016c). In a
finite set of RDF triples, any object from one triple can play the role of a subject in
another triple which amounts to chain two labeled edges in a graph-based structure.
As such, RDF triples datasets can be naturally represented as a directed labeled RDF
graph, each vertex corresponding to a subject (or object), and the edge representing
predicate. Figure 1.1 shows the corresponding RDF graph of the above example.
In Fig. 1.1, resources are represented by nodes, and three attributes are used as
directed edges. The literal values “77” and “United States” are also represented by
nodes.
Note that in RDF, the qualified name QName can be used to simplify the URIref.
If there is no special description, in the later chapters of this book, all the URIrefs
involved are in simplified form.

1.2.2.2 RDF Schema

The RDF specification includes a set of reserved words, the RDFS vocabulary [RDF
Schema (Brickley & Guha, 2004)], which is designed to describe relationships
between resources and properties like attributes of resources (traditional attribute-
value pairs). Roughly speaking, this vocabulary can be conceptually divided into the
following groups:
(a) A set of properties, which are binary relations between subject resources
and object resources: rdfs: subPropertyOf (denoted by sp in this book), rdfs:
subClassOf (sc), rdfs: domain (dom), rdfs: range (range) and rdf: type (type).
(b) A set of classes, that denote set of resources. Elements of a class are known
as instances of that class. To state that a resource is an instance of a class, the
reserved word type may be used.
1.2 RDF Data Model 7

(c) Other functionalities, like a system of classes and properties to describe lists,
and a system for doing reification.
(d) Utility vocabulary used to document, comment, etc. [the complete vocabulary
can be found in Brickley and Guha (2004)].
The groups in (b), (c) and (d) have a light semantics, essentially describing their
internal relationships in the ontological design of the system of classes of RDFS.
Their semantics is defined by a set of “axiomatic triples” (Hayes, 2004) which express
the relationships among these reserved words. All axiomatic triples are “structural”,
in the sense that do not refer to external data. Much of this semantics corresponds to
what in standard languages is captured via typing.
On the contrary, the group (a) is formed by predicates whose intended meaning is
non-trivial, and is designed to relate individual pieces of data external to the vocabu-
lary of the language. Their semantics is defined by rules which involve variables (to
be instantiated by actual data). For example, rdfs: subClassOf (sc) is a reflexive and
transitive binary property; and when combined with rdf: type (type) specifies that
the type of an individual (a class) can be lifted to that of a superclass.
The group (a) forms the core of the RDF language and, from a theoretical point of
view, it has been shown to be a very stable core to work with [the detailed arguments
supporting this claim are given in Munoz et al. (2007)]. Thus, throughout the charpter
we focused on the fragment of RDFS given by the set of keywords {sp, sc, type, dom,
range}.

1.2.3 RDF Semantics

In this section, we present the formalization of the semantics of RDF. The normative
semantics for RDF graphs given in Hayes (2004), and the mathematical formalization
in Marin (2004) follows standard classical treatment in logic with the notions of
model, interpretation, entailment, and so on.
Model theory assumes that the language refers to a ‘world’, and describes the
minimal conditions that a world must satisfy to assign an appropriate meaning for
every expression in the language. A particular world is called an interpretation, so
that model theory might be better called ‘interpretation theory’. The idea is to provide
an abstract, mathematical account of the properties that any such interpretation must
have, making as few assumptions as possible about its actual nature or intrinsic
structure, thereby retaining as much generality as possible.
All interpretations will be relative to a set of names, called the vocabulary of the
interpretation, so that one should speak, strictly, of an interpretation of an RDF vocab-
ulary, rather than of RDF itself. Some interpretations may assign special meanings
to the symbols in a particular vocabulary. Interpretations which share the special
meaning of a particular vocabulary will be named for that vocabulary, e.g. ‘rdf-
interpretations’, ‘rdfs-interpretations’, etc. An interpretation with no particular extra
8 1 RDF Data and Management

conditions on a vocabulary (including the RDF vocabulary itself) will be called a

simple interpretation, or simply an interpretation.
Next, we present the simplification of the normative semantics proposed in Munoz
et al. (2007). An RDF interpretation is a tuple I = (Res, Prop, Class, PExt, CExt, Int),
where (1) Res is a nonempty set of resources, called the domain or universe of I; (2)
Prop is a set of property names (not necessarily disjoint from Res); (3) Class ⊆ Res
is a distinguished subset of Res identifying if a resource denotes a class of resources;
(4) PExt: Prop → 2Res × Res , a mapping that assigns an extension to each property
name; (5) CExt: Class → 2Res a mapping that assigns a set of resources to every
resource denoting a class; (6) Int: U → Res ∪ Prop, the interpretation mapping, is a
mapping that assigns a resource or a property name to each element of U.
Intuitively, a ground triple (s, p, o) in a graph G is true under the interpretation I,
if p is interpreted as a property name, s and o are interpreted as resources, and the
interpretation of the pair (s, o) belongs to the extension of the property assigned to
p. Formally, we say that I satisfies the ground triple (s, p, o) if Int(p) ∈ Prop and
(Int(s), Int(o)) ∈ PExt (Int(p)). An interpretation must also satisfy additional condi-
tions induced by the usage of the RDFS vocabulary. For example, an interpretation
satisfying the triple (c1 , sc, c2 ) must interpret c1 and c2 as classes of resources, and
must assign to c1 a subset of the set assigned to c2 . More formally, we say that I
satisfy (c1 , sc, c2 ) if Int(c1 ), Int(c2 ) ∈ Class and CExt(c1 ) ⊆ CExt(c2 ).
Blank nodes work as existential variables. Intuitively, a triple (x, p, o) would be
true under I, where x is a blank node, if there exists a resource s such that (s, p, o)
is true under I. An arbitrary element can be chosen when interpreting a blank node,
with the restriction that all the occurrences of the same blank node in an RDF graph
must be replaced by the same value. To formally deal with blank nodes, an extension
of the interpretation mapping Int is used. Let A: B → Res be a function between blank
nodes and resources. Then Int A : UB → Res is defined as the extension of function
Int: Int A (x) = A(x) for x ∈ B, and Int A (x) = Int(x) for x ∈ U.
We next formalize the notion of model for an RDF graph (Hayes, 2004; Munoz
et al., 2007). We say that the RDF interpretation I = (Res, Prop, Class, PExt, CExt,
Int) is a model of (is an interpretation for) an RDF graph G, denoted by I |= G, if
the following conditions hold.
Simple Interpretation:
• there exists a function A: B → Res such that for each (s, p, o) ∈ G, it holds that
Int(p) ∈ Prop and (Int A (s), Int A (o)) ∈ PExt(Int(p)).
Properties and Classes:
• Int(sp), Int(sc), Int(type), Int(dom), Int(range) ∈ Prop,
• if (x, y) ∈ PExt(Int(dom)) ∪ PExt(Int(range)), then x ∈ Prop and y ∈ Class.
Sub-property:
• PExt(Int(sp)) is transitive and reflexive over Prop,
• if (x, y) ∈ PExt(Int(sp)), then x, y ∈ Prop and PExt(x) ⊆ PExt(y).
1.2 RDF Data Model 9

creates
type
Writer Novelist
sp

type type type issuing

writes
time
boule de
Henry X 1880
suif

Fig. 1.2 Example of an RDF graph

Sub-class:
• PExt(Int(sc)) is transitive and reflexive over Class,
• if (x, y) ∈ PExt (Int(sc)), then x, y ∈ Class and CExt(x) ⊆ CExt(y).
Typing:
• (x, y) ∈ PExt(Int(type)) if and only if y ∈ Class and x ∈ CExt(y),
• if (x, y) ∈ PExt(Int(dom)) and (u, v) ∈ PExt(x), then u ∈ CExt(y),
• if (x, y) ∈ PExt(Int(range)) and (u, v) ∈ PExt(x), then v ∈ CExt(y).

Example 1.1 Figure 1.2 shows an RDF graph storing information about writers.
All the triples in the graph are composed by elements in U, except for the triples
containing the blank node B. Consider now the interpretation I = (Res, Prop, Class,
PExt, CExt, Int) defined as follows:
• Res = {Writer, Henry, Novelist, creates, writes, boule de suif, 1880}.
• Prop = {creates, writes, issuing time, type, sp, sc, dom, range}.
• Class = {Writer, Novelist}.
• PExt is such that:

PExt(writes) = PExt(creates) = {(Henry, boule de suif)},

– PExt(issuing time) = {(boule de suif, 1880)},

– PExt(type) = {(Henry, Writer), (Henry, Writer)},
– PExt(sp) = {(writes, creates)} ∪ {(x, x)|x ∈ Prop},
– PExt(sc) = {(Novelist, Writer), (Novelist, Novelist), (Writer, Writer)},
– PExt(dom) = PExt(range) = ∅.

• CExt is such that CExt(Novelist) = CExt(Writer) = {Henry}.

• Int is the identity mapping over Res ∪ Prop.
10 1 RDF Data and Management

1.3 RDF Query Language SPARQL

In 2004, the RDF Data Access Working Group, part of the W3C Semantic Web
Activity, released a first public working draft of a query language for RDF, called
SPARQL (Prud’hommeaux & Seaborne, 2008). The name SPARQL is a recursive
acronym that stands for SPARQL Protocol and RDF Query Language. Since then,
SPARQL has been rapidly adopted as the standard for querying Semantic Web data.
In January 2008, SPARQL became a W3C Recommendation. In this section, we give
a detailed description of the syntax and Semantics of SPARQL. RDF is a directed
labeled graph data format and, thus, SPARQL is essentially a graph-matching query
language. We start by focusing on the syntax of SPARQL in the specification of
SPARQL by the W3C, and then introduce an algebraic syntax for the language and
compare it with the official syntax. Finally, we formalize the semantics of SPARQL.

1.3.1 The W3C Syntax of SPARQL

The syntax and semantics of SPARQL are specified by the RDF Data Access Working
Group (Prud’hommeaux & Seaborne, 2008). SPARQL is a language designed to
query data in the form of sets of triples, namely RDF graphs. The basic engine of
the language is a pattern matching facility, which uses some graph pattern matching
functionalities (sets of triples can be viewed also as graphs). From a syntactic point
of view, SPARQL language is similar to the SQL language, and the overall structure
consists of three main blocks.
The pattern matching part, which includes several interesting features of pattern
matching of graphs, like optional parts, union of patterns, nesting, filtering values of
possible matchings, and the possibility of choosing the data source to be matched
by a pattern. The solution modifiers, which once the output of the pattern has been
computed (in the form of a table of values of variables), allow to modify these values
applying classical operators like projection, distinct, order and limit. Finally, the
output of a SPARQL query can be of different types: yes/no queries, selections of
values of the variables which match the patterns, construction of new RDF data from
these values, and descriptions of resources.
In order to present the language, we follow the grammar given in Fig. 1.3 that
specifies the basic structure of the SPARQL Query Grammar (Prud’hommeaux &
Seaborne, 2008). There are several basic concepts used in the definition of the syntax
of SPARQL, many of which are taken from the RDF specification with some minor
modifications. For denoting resources, SPARQL uses IRIs instead of the URIs of
RDF. Anything represented by a literal could also be represented by an IRI, but it is
often more convenient or intuitive to use literals.
In what follows, we explain in more detail each component of the language. Of
course, for ultimate details the reader should consult the W3C Recommendation
(Prud’hommeaux & Seaborne, 2008).
1.3 RDF Query Language SPARQL 11

Fig. 1.3 A fragment of the SPARQL query grammar (Prud’hommeaux & Seaborne, 2008)

As shown in Fig. 1.3, a SPARQL Query is given by a Prologue followed by any

of the four types of SPARQL queries: SelectQuery, ConstructQuery, DescribeQuery
or AskQuery. The Prologue contains the declaration of variables, namespaces, and
abbreviations to be used in the query. The SELECT clause in a SelectQuery selects
a group of variables, or all of them using—as in SQL—the wildcard*. In this type of
queries, one can eliminate duplicate solutions using DISTINCT. In a ConstructQuery,
the CONSTRUCT form, and more specifically the ConstructTemplate form, is used
to constructs an RDF graph using the obtained solutions. In a DescribeQuery, the
DESCRIBE form is not normative (only informative). It is intended to describe the
specified variables or IRIs, i.e., it returns all the triples in the dataset involving these
resources. In an AskQuery, the ASK form has no parameters but the dataset to be
queried and a WHERE clause. It returns TRUE if the solution set is not empty, and
FALSE otherwise.
In a SPARQL query, the DatasetClause allows to specify one graph (the Default-
GraphClause) or a set of named graphs, i.e., a set of pairs of identifiers and graphs,
which are the data sources to be used when computing the answer to the query.
Moreover, the WHERE clause is used to indicate how the information from the data
sources is to be filtered, and it can be considered the central component of the query
language. It specifies the pattern to be matched against the data sources. In partic-
ular, it includes sets of triples with some of the IRIs or blank elements replaced
by variables, called “triple blocks” (TB in the grammar), an operator for collecting
triples and blocks (denoted by {A. B}, and with no fixed arity), an operator UNION
for specifying alternatives, an operator OPTIONAL to provide optional matchings,
and an operator FILTER that allows filtering results of patterns under certain basic
constraints.
12 1 RDF Data and Management

Example 1.2 (Pérez et al., 2006a, 2006b) Consider the following query: “Give the
name and the mailbox of each person who has a mailbox with domain.cl”. This query
can be expressed in SPARQL as follows:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

PREFIX ex: <http://example.com/ns#>
SELECT ?name ?mbox
FROM <myDataSource.rdf>
WHERE {
?x foaf:name ?name.
?x foaf:mbox ?mbox.
?mbox ex:domain “.cl”
}
The first two lines in this example form the Prologue of the query, which specifies
the namespaces to be used. In this case, one is the well-known FOAF ontology, and
the other one is an example namespace. The keywords foaf and ex are abbreviations
for the namespaces, which are used in the body of the query.
The SELECT keyword indicates that the query returns a table with two columns,
corresponding to the values obtained from the matching of the variables ?name and
?mbox against the graph pointed to in the FROM clause (myDataSource.rdf), and
according to the pattern described in the WHERE clause. It should be noticed that a
string starting with the symbol ? denotes a variables in SPARQL.
In the above query, the WHERE clause is composed by a pattern with three triples:
?x foaf:name ?name, ?x foaf:mbox ?mbox and ?mbox ex:domain “.cl”, where.cl is
a literal. This pattern indicates that one is looking for the elements ?x, ?name and
?mbox in the RDF graph myDataSource.rdf such that the foaf:name of ?x is ?name,
the foaf:mbox of ?x is ?mbox and the ex:domain of ?mbox is.cl. Thus, an expression
of the form {A. B} in SPARQL denotes the conjunction of A and B, as this expression
holds if both A and B holds.

1.3.2 The Algebraic Syntax of SPARQL Graph Patterns

RDF is a directed labeled graph data format and, thus, SPARQL is essentially a
graph-matching query language. In this section, we present the algebraic syntax of
the core fragment of SPARQL graph patterns proposed in (Arenas et al., 2009; Pérez
et al., 2006a, 2006b, 2009), and show that it is equivalent in expressive power to the
core fragment of SPARQL. Thus, this formalization is used in this chapter to give a
formal semantics to SPARQL.
The official syntax of SPARQL (Prud’hommeaux & Seaborne, 2008) considers
operators OPTIONAL, UNION, FILTER, and concatenation via a point symbol
(.), to construct graph pattern expressions. The syntax also considers {} to group
patterns, and some implicit rules of precedence and association. For example, the
1.3 RDF Query Language SPARQL 13

point symbol (.) has precedence over OPTIONAL, and OPTIONAL is left associative.
In order to avoid ambiguities in the parsing of expressions, Pérez and Arenas et al.
present a syntax of SPARQL graph patterns in a more traditional algebraic formalism,
using binary operators AND (.), UNION (UNION), OPT (OPTIONAL), and FILTER
(FILTER). They fully parenthesize expressions making explicit the precedence and
association of operators.
Assume the existence of a set of variables V disjoint from U. A SPARQL graph
pattern expression is defined recursively as follows:
(a) A tuple from (U ∪ V ) × (U ∪ V ) × (U ∪ V ) is a graph pattern (a triple pattern).
(b) If P1 and P2 are graph patterns, then expressions (P1 AND P2 ), (P1 OPT P2 ),
and (P1 UNION P2 ) are graph patterns (conjunction graph pattern, optional
graph pattern, and union graph pattern, respectively).
(c) If P is a graph pattern and R is a SPARQL built-in condition, then the expression
(P FILTER R) is a graph pattern (a filter graph pattern).
A SPARQL built-in condition is constructed using elements of the set U ∪ V and
constants, logical connectives (¬, ∧, ∨), inequality symbols (<, ≤, ≥, >), the equality
symbol (=), unary predicates like bound, isBlank, and isIRI, plus other features (see
[15] for a complete list). In this chapter, we restrict to the fragment where the built-in
condition is a Boolean combination of terms constructed by using = and bound, that
is:
(a) If ?X, ?Y ∈ V and c ∈ U, then bound(?X), ?X = c and ?X = ?Y are built-in
conditions.
(b) If R1 and R2 are built-in conditions, then (¬R1 ), (R1 ∨ R2 ) and (R1 ∧ R2 ) are
built-in conditions.
In the rest of the book, we use var(*) to denote the set of variables occurring in *,
where * be a SPARQL graph pattern P or a built-in condition R.
We conclude the definition of the algebraic framework by describing the formal
syntax of the SELECT query result form. A SELECT SPARQL query is simply a
tuple (W, P), where P is a SPARQL graph pattern expressions and W is a set of
variables such that W ⊆ var(P).

1.3.3 Semantics of SPARQL

In this section, we present a streamlined version of the core fragment of SPARQL

with precise algebraic syntax and a formal compositional semantics based on Pérez
et al. (2006a, 2006b). The definition of a formal semantics for SPARQL has played
a key role in the standardization process of this query language. Although taken one
by one the features of SPARQL are intuitive and simple to describe and understand,
it turns out that the combination of them makes SPARQL into a complex language.
Reaching a consensus in the W3C standardization process about a formal semantics
for SPARQL was not an easy task. The initial efforts to define SPARQL were driven
14 1 RDF Data and Management

by use cases, mostly by specifying the expected output for particular example queries.
In fact, the interpretations of examples and the exact outcomes of cases not covered
in the initial drafts of the SPARQL specification, were a matter of long discussions
in the W3C mailing lists. Pérez et al. (2006a, 2006b) presented one of the first
formalizations of a semantics for a fragment of the language. Currently, the official
specification of SPARQL (Prud’hommeaux & Seaborne, 2008), endorsed by the
W3C, formalizes a semantics based on Pérez et al. (2006a, 2006b).
The semantics of SPARQL is formalized by using partial mappings between
variables in the patterns and actual values in the RDF graph being queried. To define
the semantics of SPARQL graph pattern expressions, we need to introduce some
terminology. A mapping μ from V to U is a partial function μ: V → U. Abusing
notation, for a triple pattern t we denote by μ(t) the triple obtained by replacing the
variables in t according to μ. The domain of μ, denoted by dom(μ), is the subset of
V where μ is defined. The empty mapping μΦ is a mapping such that dom(μΦ ) = Φ
(i.e. μΦ = Φ). Given a triple pattern t and a mapping μ such that var(t) ⊆ dom(μ),
μ(t) is the triple obtained by replacing the variables in t according to μ. Similarly,
given a basic graph pattern P and a mapping μ such that var(P) ⊆ dom(μ), we have
that μ(P) = ∪ t ∈P {μ(t)}, i.e. μ(P) is the set of triples obtained by replacing the
variables in the triples of P according to μ.
To define the semantics of more complex patterns, we need to introduce some
more notions. Two mappings μ1 and μ2 are compatible when for all ?X ∈ dom(μ1 )
∩ dom(μ2 ), it is the case that μ1 (?X) = μ2 (?X), i.e. when μ1 ∪ μ2 is also a mapping.
Intuitively, μ1 and μ2 are compatibles if μ1 can be extended with μ2 to obtain a new
mapping, and vice versa. Note that two mappings with disjoint domains are always
compatible, and that the empty mapping μΦ (i.e. the mapping with empty domain)
is compatible with any other mapping.
Let Ω 1 and Ω 2 be sets of mappings, the join of, the union of and the difference
between Ω 1 and Ω 2 are defined as:
Ω 1 ⨝ Ω 2 = {μ1 ∪ μ2 |μ1 ∈ Ω 1 , μ2 ∈ Ω 2 and μ1 , μ2 are compatible mappings},
Ω 1 ∪ Ω 2 = {μ|μ ∈ Ω 1 or μ ∈ Ω 2 },
Ω 1 \Ω 2 = {μ ∈ Ω 1 |for all μ' ∈ Ω 2 , μ and μ' are not compatible}.
Based on the previous operators, the left outer-join are defined as:
Ω 1 ⟕ Ω 2 = (Ω 1 ⨝ Ω 2 ) ∪ (Ω 1 \Ω 2 ).
Intuitively, Ω 1 ⨝ Ω 2 is the set of mappings that result from extending mappings
in Ω 1 with their compatible mappings in Ω 2 , and Ω 1 \Ω 2 is the set of mappings in
Ω 1 that cannot be extended with any mapping in Ω 2 . The operation Ω 1 ∪ Ω 2 is the
usual set theoretical union. A mapping μ is in Ω 1 ⟕ Ω 2 if it is the extension of a
mapping of Ω 1 with a compatible mapping of Ω 2 , or if it belongs to Ω 1 and cannot
be extended with any mapping of Ω 2 . These operations resemble relational algebra
operations over sets of mappings (partial functions).
We are ready to define the semantics of graph pattern expressions as a function
[[·]]G which takes a pattern expression and returns a set of mappings. We follow the
approach in Gutierrez et al. (2011) defining the semantics as the set of mappings that
1.3 RDF Query Language SPARQL 15

matches the graph G. For the sake of readability, the semantics of filter expressions
is presented in a separate definition.
Let G be an RDF graph and P be a graph pattern. The evaluation of P over G,
denoted by [[P]]G , is defined recursively as follows (Arenas et al., 2009):
(a) if P is a triple pattern t, then [[P]]G = { μ|dom(μ) = var(t) and μ(t) ∈ G} .
(b) if P is (P1 AND P2 ), then [[P]]G =[[P1 ]]G ⨝ [[P2 ]]G .
(c) if P is (P1 OPT P2 ), then [[P]]G =[[P1 ]]G ⟕ [[P2 ]]G .
(d) if P is (P1 UNION P2 ), then [[P]]G =[[P1 ]]G ∪ [[P2 ]]G .
The idea behind the OPT operator is to allow for optional matching of patterns.
Consider pattern expression ((P1 OPT P2 ) and let μ1 be a mapping in [[P1 ]]G . If there
exists a mapping μ2 ∈ [[P2 ]]G such that μ1 and μ2 are compatible, then (μ1 ∪ μ2 ) ∈
[[(P1 OPT P2 )]]G . But if no such a mapping μ2 exists, then μ1 ∈ [[(P1 OPT P2 )]]G .
Thus, operator OPT allows information to be added to a mapping μ if the information
is available, instead of just rejecting μ whenever some part of the pattern does not
match.
The semantics of filter expressions goes as follows. Given a mapping μ and a
built-in condition R, we say that μ satisfies R, denoted by μ |= R, if:
(a) R is bound(?X) and ?X ∈ dom(μ);
(b) R is ?X = c, ?X ∈ dom(μ) and μ(?X) = c;
(c) R is ?X = ?Y, ?X ∈ dom(μ), ?Y ∈ dom(μ) and μ(?X) = μ(?Y );
(d) R is (¬R1 ), R1 is a built-in condition, and it is not the case that μ |= R1 ;
(e) R is (R1 ∨ R2 ), R1 and R2 are built-in conditions, and μ |= R1 or μ |= R2 ;
(f) R is (R1 ∧ R2 ), R1 and R2 are built-in conditions, μ |= R1 and μ |= R2 .
Let G be an RDF graph and (P FILTER R) be a filter expression. The evaluation
of filter expression over G is defined as [[(P F I L T E R R)]]G = { μ ∈ [[P]]G |μ| =
R}.
Several algebraic properties of graph patterns are proved by Pérez et al. (2006a,
2006b). A simple property is that AND and UNION are associative and commutative.
This permits us to avoid parenthesis when writing sequences of AND operators or
UNION operators.
The official W3C Recommendation (Prud’hommeaux & Seaborne, 2008) defines
four query forms, namely SELECT, ASK, CONSTRUCT, and DESCRIBE queries.
These query forms use the mappings obtained after the evaluation of a graph pattern
to construct result sets or RDF graphs. The query forms are: (1) SELECT, that
performs a projection over a set of variables in the evaluation of a graph pattern,
(2) CONSTRUCT, that returns an RDF graph constructed by substituting variables
in a template, (3) ASK, that returns a truth value indicating whether the evaluation
of a graph pattern produces at least one mapping, and (4) DESCRIBE, that returns
an RDF graph that describes the resources found. In this paper, we only consider
the SELECT query form. We refer the reader to Pérez et al. (2006a, 2006b) for a
formalization of the remaining query forms.
To formally define the semantics of SELECT SPARQL queries, we need the
following notion. Given a mapping μ: V → U and a set of variables W ⊆ V, the
16 1 RDF Data and Management

restriction of μ to W, denoted by μ|W , is a mapping such that dom(μ|W ) = dom(μ)

∩ W and μ|W (?X) = μ(?X) for every ?X ∈ dom(μ) ∩ W.

Definition 1.1 A SPARQL SELECT query is a tuple (W, P), where P is a graph
pattern and W is a set of variables such that W ⊆ var(P). The answer of (W, P) over
an RDF graph G, denoted by [[(W, P)]]G , is the set of mappings:

[[(W, P)]]G = { μ|W |μ ∈ [[P]]G } .

1.4 RDF Data Store

RDF plays an important role in representing Web resources in a natural and flexible
way. As the amount of RDF datasets increasingly growing, efficient and scalable
management of RDF data is therefore of increasing importance. RDF data manage-
ment has attracted attention in the database and Semantic Web communities. Much
work has been devoted to proposing different solutions to store RDF data efficiently.
In this section, we focus on RDF data storage and present a full up-to-date overview
of the current state of the art in RDF data storage based on the work by Ma et al.
(2016a, 2016b, 2016c). The various approaches are classified according to their
storage strategy, including RDF data stores in traditional databases and RDF data
stores in NoSQL databases. Figure 1.4 illustrates this classification for RDF data
stores. Note that, two different levels of RDF data storage can be distinguished:
logical storage and physical storage. This chapter mainly focusses on logical storage
of RDF data.
Traditionally, databases are classified into relational databases and object-oriented
databases. In addition, NoSQL databases have only recently emerged as a commonly
used infrastructure for handling big data. So, two top categories of RDF data stores
in Fig. 1.4 are traditional database stores and NoSQL database stores, respectively.
For the traditional database stores, corresponding to two kinds of traditional database
models, two categories of RDF data stores in the traditional database stores are
relational stores and object-oriented stores, which apply relational database and

RDF Data Stores

Traditional database stores NoSQL database stores

Relational stores Object-oriented stores Graph databases stores

Vertical stores Horizontal stores Type stores Key-value stores Column-family stores Document stores

Fig. 1.4 Classification of resource description framework (RDF) data stores

1.4 RDF Data Store 17

object-oriented database models, respectively. Depending on concrete data models

adopted, the NoSQL database stores are categorized into key-value stores, column-
family stores, document stores, and graph databases. Note that in the relational stores
of RDF data, several different relational schemas can be designed, depending on how
to distribute RDF triples to an appropriate relational schema. This results in three
major categories of RDF relational stores, which are vertical stores, horizontal stores,
and type stores. They are formally illustrated in the following section.

1.4.1 RDF Stores in Traditional Databases

A number of attempts have been made to use traditional databases to store RDF
data, and various storage schemes for RDF data have been proposed. Some ideas
and techniques developed earlier for object-oriented databases, for example, have
already been adapted to the RDF setting. RDF data were stored in an object-oriented
database by mapping both triples and resources to objects in Bönström et al. (2003).
An object-oriented database model was proposed for storage of RDF documents
(Chao, 2007a, 2007b), but the RDF documents were encoded in XML (eXtensible
Markup Language).
Relational database management systems (RDBMSs) are currently the most
widely used databases. It has been shown that RDBMSs are very efficient, scal-
able, and successful in hosting various types of data, including some new types of
data such as XML data, temporal/spatial data, media data, and complex objects.
Currently, more mature RDF systems use RDBMSs to store RDF data, map RDF
triples with relational table structures, and use RDBMS for storage and retrieval.
According to the different table structure designed, the storage of RDF data can be
divided into three methods (Luo et al., 2012; Sakr & Al-Naymat, 2009), namely the
vertical stores, the horizontal stores, and the property stores.

1. Vertical stores

Vertical stores (also called triple stores, e.g. Broekstra et al. 2002, Harris and Gibbins
2003, Harris and Shadbolt 2005, Neumann and Weikum 2008, 2010, Weiss et al.
2008) use a single relational table to store a set of RDF statements, in which the
relational schema contains three columns for subject, property, and object. Formally,
each triple, say (s, p, o), occurs in the relational table as a row, that is, tuple <s, p,
o>. Here subject s is placed in column subject of this row, predicate p is placed in
column property of this row, and object o is placed in column object of this row. When
performing an RDF query, given a SPARQL query, a query rewriting mechanism
is designed to convert the SPARQL into a corresponding SQL statement, and the
relational database will answer the SQL statement. Although this method has good
versatility, the query performance is poor, and a lot of self-join operations need to
be performed when the query is executed. Moreover, because vertical stores quickly
encounter scalability limitations, several approaches have been proposed to deal with
18 1 RDF Data and Management

these limitations by using extensive sets of indices or by using selectivity estimation

information to optimize the join ordering.
Sesame, a generic architecture for storing and querying RDF and RDF schema, is
introduced by Broekstra et al. (2002). An important feature of the Sesame architecture
is its abstraction from the details of any particular repository used for the actual
storage. Therefore, Sesame can be ported to a large variety of different repositories.
The implementation of Sesame (Broekstra et al., 2002) uses both PostgreSQL and
MySQL as database platforms.
An RDF storage scheme called Hexastore RDF is proposed by Weiss et al. (2008).
This scheme enhances the vertical partitioning idea and takes it to its logical conclu-
sion. RDF data are indexed in six possible ways, one for each possible ordering of
the three RDF elements. Each instance of an RDF element is associated with two
vectors; each such vector gathers elements of one of the other types, along with
lists of the third resource type attached to each vector element. Hence, a sextuple
indexing scheme emerges. This format enables quick and scalable general-purpose
query processing; it confers significant advantages (up to five orders of magnitude)
over previous approaches for RDF data management, at the price of a worst-case five-
fold increase in index space. Note that Hexastore focusses on exhaustive indexing of
pairs of positions in triples such as SP, SO, …, OP. Being different from Hexastore,
RDF-3X focusses on exhaustive indexing off all permutations of triples of positions
such as SPO, SOP, …, OPS and TripleT (Wolff et al., 2015) focusses on exhaustive
indexing of all single positions, S, P, and O.
2. Horizontal stores
The second approach for storing RDF data is called horizontal stores (e.g. Abadi
et al., 2007, 2009). Under the horizontal representation, RDF data can be stored
directly in a single table. This table has one column for each predicate occurring in
the RDF graph and one row for each subject. Formally, for a triple (s, p, o), object o
is placed in column p of row s. Note that for two triples, say (si , pi , oi ) and (sj , pj , oj ),
one may have si = sj and either pi = pj or pi /= pj . At this point, if si = sj and pi /= pj ,
oi and oj are placed in different columns pi and pj of the same row. However, if si =
sj and pi = pj , oi and oj are placed in the same column of the same row, and a set of
values {oi , oj} results. Of course, it is possible that si /= sj and pi = pj . Then oi and oj
are placed in the same column of different rows si and sj . It is very common that for
any two triples (si , pi , oi ) and (sj , pj , oj ) in the context of massive RDF triples, they
have different subjects and different predicates, and oi and oj are placed in different
columns of different rows. As a result, row si has a null value in the pj column and
row sj has a null value in the pi column. This will lead to a sparse table with many
null values.
To solve these problems of null values and sets of values, efforts have been made
to partition a single table vertically into a set of property tables using predicates. Each
predicate has a table over the schema (subject, object), in which a binary relation
between a subject and an object with respect to the given predicate is represented.
Formally, for two triples, say (si , pi , oi ) and (sj , pj , oj ), if pi = pj , there are two tuples
<si , oi > and <sj , oj > in the same table. However, if pi /= pj , the tuples <si , oi > and
1.4 RDF Data Store 19

<sj , oj > occur in two different tables. It is clear that the number of relational tables
is the same as the number of predicates in the RDF data sets.
SW-Store was proposed by Abadi et al. (2007, 2009) as an RDF data store that
vertically partitions RDF data (by predicates) into a set of property tables, maps them
onto a column-oriented database, and builds a subject–object index on each property
table. Note that the implementation of SW-Store relies on the C-Store column-store
database (Stonebraker et al., 2005) to store tables as collections of columns rather
than as collections of rows. Current relational database systems, for example, Oracle,
DB2, SQL Server, and Postgres, are standard row-oriented databases in which entire
tuples are stored consecutively. In addition, the results of an independent evaluation
of SW-Store are reported by Sidirourgos et al. (2008).
Extending the SW-Store approach, an approach called SPOVC is proposed by
Mulay and Kumar (2012). The main techniques used in this approach are horizontal
partitioning of logical indices and special indices for values and classes. The SPOVC
approach uses five indices, namely, subject, predicate, object, value, and class, on
top of column-oriented databases.

3. Property stores

The third approach for storing RDF data is called property stores (e.g. Levandoski and
Mokbel 2009, Matono et al. 2005, Sintek and Kiesel 2006), in which one relational
table is created for each RDF data type and a relational table contains the properties
as n-ary table columns for the same subject.
Actually, property stores are type-oriented stores (Bornea et al., 2013). The basic
idea of this approach is to divide one wide table into multiple smaller tables so that
each table contains related predicates as its columns. Formally, for two triples, say
(si , pi , oi ) and (sj , pj , oj ), suppose that pi and pj are related. Then these two triples
occur in the same table, with one row for each subject.
Furthermore, when si = sj and pi /= pj , oi and oj are placed in different columns
pi and pj of the same row; when si = sj and pi = pj , oi and oj are placed in the same
column of the same row, and a set of values {oi , oj } results; when si /= sj and pi
= pj , oi and oj are placed in the same column of different rows si and sj . It is not
difficult to see that designing a schema for property tables depends on identifying
related predicates.
Jena is an open-source toolkit for Semantic Web programmers (McBride, 2002).
It implements persistence for RDF graphs using an SQL database through a JDBC
connection. Jena has evolved from its first version, Jena 1, to a second version, Jena
2. In the Jena RDF, the grouping of predicates is defined by applications (Wilkinson
et al., 2003; Wilkinson, 2006). Applications typically have access patterns in which
certain subjects or properties are accessed together. In particular, the application
programmer must specify which predicates are multivalued. For each such multi-
valued predicate p, a new relational table is created, with a schema consisting of
subject and p. Jena also supports so-called property-class tables, in which for each
value of the rdf: type predicate, a new table is created. The remaining predicates that
are not in any defined group are stored independently.
20 1 RDF Data and Management

Using a dynamic relation model, a system called FlexTable is proposed to store

RDF data (Wang et al., 2010). In FlexTable, all triples of an instance are coalesced
into one tuple, and all tuples are stored in relational schema. To partition all the triples
into several tables, first, a schema evolution method is proposed, based on a lattice
structure, to evolve schema automatically when new triples are inserted; second, a
page layout with an interpreted storage format is proposed to reduce the physical
adjustment cost during schema evolution.
For each subject s in the RDF graph G, a set of predicates satisfying {p|(∃ o) ∧ ((s,
p, o) ∈ G)} is obtained, which is called the signature of s (Sintek & Kiesel, 2006). The
predicates in the set are considered as related predicates. Then, for each signature,
a corresponding predicate relational table called the signature table is created. The
relational schema contains the subject and the set of predicates. Based mainly on
the concepts of signatures and signature tables which are organized in a lattice-like
structure, RDFBroker, an RDF store, was introduced by Sintek and Kiesel (2006).
Note that the approach by Sintek and Kiesel (2006) actually creates many small
tables. To improve query evaluation performance, various criteria are proposed for
merging small tables into larger ones, but this introduces null values when a predicate
is absent.
Based on RDF document structure, a storage scheme is proposed by Matono and
Kojima (2012), which stores RDF graphs without decomposition. Considering that
adjacent triples have a strong relationship and can be described for the same resource,
a set of adjacent triples that refer to the same resource is defined as an RDF paragraph.
Then the table layout is constructed based on RDF paragraphs (Luo et al., 2012).
Levandoski and Mokbel (2009) proposed a data-centric schema-creation approach
for storing RDF data in relational databases. With the aim of maximizing the size
of each group of predicates and meanwhile minimizing the number of null values
that occur in the tables, association rule mining is used to determine automatically
the predicates that often occur together. According to the support threshold, which
measures the strength of correlation between properties in the RDF data, properties
which are grouped together in the same cluster may constitute a single n-ary table,
and properties which are not grouped in any cluster may be stored in binary tables.
Finally, in the partitioning phase, the formed clusters are checked, and the trade-off is
made between storing as many RDF properties as possible in clusters while keeping
null storage to a minimum based on the null threshold. Actually, the approach by
Levandoski and Mokbel (2009) provides a tailored schema for each RDF data set,
using a balance between n-ary and binary tables.
Each triple in the form (subject, predicate, object) is classified into categories
by Matono et al. (2005) according to the type of predicate, and then subgraphs
are constructed for each category. The graph is decomposed into five subgraphs
according to the type of predicate: class inheritance graphs, property inheritance
graphs, type graphs, domain-range graphs, and generic graphs. Each subgraph is
stored by applicable techniques into distinct relational tables. More precisely, all
classes and properties are extracted from RDF schema data, and all resources are
also extracted from RDF data. Each extracted item is assigned an identifier and a path
expression and stored in the corresponding relational table. Because the proposed
1.4 RDF Data Store 21

scheme retains schema information and path expressions for each resource, the path-
based relational RDF database (Matono et al., 2005) can process path-based queries
efficiently and store RDF instance data without schema information.
Among the three approaches for storing RDF data in relational databases, vertical
stores use a fixed relational schema, and new triples can be inserted without consid-
ering RDF data types. Therefore, vertical stores can handle dynamic schema of RDF
data. However, vertical stores generally involve a number of self-join operations for
querying, and therefore efficient querying requires specialized techniques. To over-
come the problem of self-joins in vertical stores, horizontal stores using a single
relational table are proposed. However, it commonly occurs that in the single rela-
tional table containing all predicates as columns, a subject occurs only with certain
predicates, which leads to a sparse relational table with many null values. In addi-
tion, a subject may have multiple objects for the same predicate. Such a predicate is
called a multi-valued predicate. As a result, the relational table in a horizontal store
contains multi-valued attributes. Finally, when new triples are inserted, new predi-
cates result in changes to the relational schema, and dynamic schema of RDF data
cannot be handled. To solve the problem of null values as well as that of multi-valued
attributes, horizontal stores using a set of relational tables are proposed, where each
predicate corresponds to a relational table. However, horizontal stores using a set of
relational tables generally involve many join operations for querying. In addition,
when new triples are inserted, new predicates result in new relational tables, and
dynamic schema of RDF data cannot be handled. A vertical store in (p, s, o) shape
would equal the sequential concatenation of all tables in a horizontal store which
uses a set of relational tables.
The type-store approach is actually a trade-off between the two kinds of horizontal
stores. Compared with horizontal stores using a single relational table, type stores
contain fewer null values (no null values in horizontal stores using multiple relational
tables), and involve fewer join operations than horizontal stores using multiple rela-
tional tables (no join operations in horizontal stores using a single relational table).
It should be noted that, like horizontal stores using a single relational table, type
stores may contain multivalued attributes and new predicates, resulting in changes
to relational schema when new triples are inserted.
Some major features of relational RDF data stores are summarized in Table 1.1.

1.4.2 RDF Stores in Not Only SQL Databases

1. Graph model
RDF data has the characteristics of graph structure, so some work studies the storage
of RDF data from the perspective of graph model.
Bönström et al. (2003) first proposed to treat RDF data from the perspective of
XML format data or a collection of triples. The graph model contains more semantic
information in RDF data. They believe that the advantages of using graphs to store
22 1 RDF Data and Management

Table 1.1 Major features of relational resource description framework data stores
Join Multi-valued Null values Relational Number of
operations attributes schema relation(s)
Vertical stores More No No Fixed Fixed
self-joins
Horizontal stores
One table for all No Yes Yes and many Dynamic Fixed
predicates
One table for More joins No No Dynamic Dynamic
each predicate
Type stores Fewer joins Yes Yes and fewer Dynamic Dynamic

RDF data are: (i) RDF model and graph model structure can be directly mapped, when
RDF data is stored, the conversion of RDF data is avoided; (ii) RDF data is queried,
graph model Storing RDF data avoids refactoring. Angles and Gutierrez (2005)
discussed the problem of using graph databases to store RDF data, and compared
the relationship between relational models, object-oriented models, semantic models
and RDF models. In addition, in his work, he also studied the adaptability of graph
database query language to RDF data and the applicability of RDF query language
to graph data. The results show that most RDF query languages have low support
for some basic graph queries, even SPARQL does not Support path query and node
distance query in graph structure, but these queries are very important in practical
applications.
Udrea et al. proposed to use the GRIN algorithm to answer SPARQL queries. The
core of GRIN is to construct a GRIN index similar to the M-tree structure (Ciaccia
et al., 1997). Using distance constraints, GRIN can quickly determine and prune the
parts of the RDF graph that do not meet the query conditions, which improves query
performance as a whole. Wu et al. (2008) proposed using a hypergraph data model
to store RDF data, and designed a persistent storage strategy based on the graph
structure. Yan et al. (2009) proposed to divide the RDF graph into several subgraphs
and add indexes such as Bloom filters to quickly determine whether the data being
checked is in a certain subgraph during query. The graph segmentation technology is
used to reduce the self-connection of triples, but the update of the graph still needs to
be re-segmented. Zou et al. (2011) proposed using the gStore system to store RDF and
answer SPARQL queries. Through the coding method, each entity node, neighbor
attribute and attribute value in the RDF graph is coded into a node with Bitstring
to obtain a label graph G*. When querying, the query graph Q is also encoded into
a query label graph Q*, and then the subgraph matching method is used to find
the subgraph satisfying Q* in the label graph G*. Both Property Graphs (PG) and
RDF can be used to represent graph model data, but the conversion between the two
models is incompatible. To this end, Hartig (2014) proposed a formal definition of the
attribute graph model, and introduced a clear definition for the conversion between the
PG and RDF models. On the one hand, by implementing the RDF-to-PG conversion
definition, PG-based systems can enable users to load RDF data and enable them to
1.4 RDF Data Store 23

use the graph traversal language Gremlin or declarations in a compatible and system-
independent manner Figure query language Cypher. On the other hand, PG-to-RDF
conversion enables RDF data management systems to achieve compatible processing
of Property Graphs content by using SPARQL. Recently, De Virgilio (2017) proposed
an automatic conversion method from RDF to graph storage system. The conversion
uses the integrity constraints defined on the source to properly construct a target
database and attempts to reduce the number of visits required to answer the query in
the database. This is done by storing the same node data that may appear together in
the query results, and they have also developed a system to implement the conversion.
At present, the representative graph database products of RDF data mainly include
Neo4j18 and Dydra.19 Neo4j is currently a relatively mature and high-performance
open source graph database. Neo4j can traverse nodes and edges at the same speed,
and its traversal speed has nothing to do with the amount of data that constitutes the
graph. But it does not support distributed storage, and the existing research on using
Neo4j to store RDF and support SPARQL query is very few, which is limited to
some engineering applications. The Thinkerpop team developed LinkedDataSail,20
which provides an interface for processing RDF data in a graph database. Using this
interface, Neo4j can support SPARQL queries and can be used as a triple database.
Martella21 stored the DBpedia data set in Neo4j, and then expanded SPARQL
queries and other graph algorithms on this basis. Dydra is a cloud-based graph
database. Using Dydra, RDF data is directly stored as an attribute graph, directly
representing the relationship in the underlying RDF data, and can be accessed and
updated through an industry standard query language designed specifically for graph
processing. These works have applied graph databases to store RDF data, and used
graph algorithms to solve query problems.
2. NoSQL data-management systems
NoSQL data-management systems have emerged as a commonly used infrastruc-
ture for handling big data outside the RDF space. The various NoSQL data stores
were divided into four major categories by Grolinger et al. (2013): key-value stores,
column-family stores, document stores, and graph databases. Key-value stores have
a simple data model based on key-value pairs. Most column-family stores are derived
from Google BigTable (Chang et al., 2008), in which the data are stored in a
column-oriented way. In BigTable, the data set consists of several rows. Each row is
addressed by a primary key and is composed of a set of column families. Note that
different rows can have different column families. Representative column-family
stores include Apache HBase,22 which directly implements the Google BigTable
concepts. According to Grolinger et al. (2013), there is one type of column-family
store, say Amazon SimpleDB (Stein & Zachrias, 2010) and DynamoDB (DeCandia

18 http://neo4j.org/.
19 http://www.dydra.com.
20 https://github.com/thinkerpop/gremlin/wiki/linkeddatasail.
21 https://github.com/claudiomartella/dbpedia4neo.
22 https://hbase.apache.org/.
24 1 RDF Data and Management

et al., 2007), in which only a set of columns name value pairs is contained in each
row, without having column families. In addition, Cassandra (Lakshman & Malik,
2010) provides the additional functionality of super columns, which are formed by
grouping various columns together. Document stores provide another derivative of
the key-value store data model that uses keys to locate documents inside the data
store. Most document stores represent documents using JSON (JavaScript Object
Notation) or some format derived from it. Typically, CouchDB23 and the Couchbase
server24 use the JSON format for data storage, whereas MongoDB25 stores data in
BSON (Binary JSON). Graph databases use graphs as their data model, and a graph
is used to represent a set of objects, known as vertices or nodes, and the links (or
edges) that interconnect these vertices.
Illustrative representations of these NoSQL models were presented by Grolinger
et al. (2013) and are shown in Fig. 1.5.
Actually, massive RDF data-management merits the use of big-data infrastruc-
ture because of the scalability and high performance of cloud data management.
A number of efforts have been made to develop RDF data-management systems
based on NoSQL systems. SimpleDB by Amazon was used as a back end to store
RDF data quickly and reliably for massive parallel access (Stein & Zachrias, 2010).
Cloud-based key-value stores (e.g. BigTable) were used by Gueret et al. (2011), and
a robust query engine was developed over these key-value stores. In addition, there is
a new RDF store on the block, called SPARQLcity,26 which is a Hadoop-based graph
analytical engine for performing rich business analytics on RDF data with SPARQL.
SPARQLcity is the first just in time compiled engine in SPARQL query execution.
Given the fact that NoSQL systems offer either no support or only high latency
support (MapReduce) for effective join processing, however, SPARQL queries with
many joins, which is the mainstay, run into big problems on such systems. The normal
NoSQL APIs (Application Programming Interfaces) that are centered on individual
key lookup (whether one looks up a value, column-family, or document) simply have
too high latency if one has to join tens of thousands (or billions) of RDF triples.
Several NoSQL systems for RDF data were investigated by Cudre-Mauroux et al.
(2013), including document stores (e.g. CouchDB27 ), key-value/column stores (e.g.
Cassandra28 and HBase29 ), and query compilation for Hadoop (e.g. Hive30 ). Major
characteristics of these four NoSQL systems are described (Cudre-Mauroux et al.,
2013). First, Apache HBase is an open-source, horizontally scalable, row-consistent,
low-latency, and random-access data store. HBase uses HDFS as a storage back end

23 https://couchdb.apache.org/.
24 https://www.couchbase.com/products/server.
25 https://www.mongodb.com/.
26 https://www.hugedomains.com/domain_profile.cfm?d=sparqlcity.com.
27 https://couchdb.apache.org/.
28 https://cassandra.apache.org/_/index.html.
29 https://hbase.apache.org/.
30 https://hive.apache.org/query.
1.4 RDF Data Store 25

Key_1 Value_1 Dataset

Key_2 Value_2 Column-Family-1 Column-Family-2
Column Column Column
Key_3 Value_1 ROW Name-1 Name-2 Name-3
KEY-1
Column Column Column
Key_4 Value_3 Value-1 Value-2 Value-3

Key_5 Value_2
Column-Family-1
Key_6 Value_1 Column Column Column
ROW Name-3 Name-3 Name-3
Key_7 Value_4 KEY-2 Column Column Column
Name-3 Name-3 Name-3
Key_8 Value_3

(a) Key-value store. (b) Column-family store.

Dataset
Document-1 Key-Value Key-Value
Document_ld-1 Key-Value1
Node1 Node3

Document-2
Document_ld-2

Document-3
Document_ld-3
Key-Value
Document-4 Node2
Document_ld-4

(c) Document store. (d) Graph database

Fig. 1.5 Different types of NoSQL data model (Grolinger et al., 2013)

and Apache ZooKeeper31 to provide support for coordination tasks and fault toler-
ance. HBase is a column-oriented distributed NoSQL database system. Its data model
is a sparse, multi-dimensional sorted map. Here, columns are grouped into column
families, and timestamps add an additional dimension to each cell. HBase is well inte-
grated with Hadoop, which is a large-scale MapReduce computational framework.
The second HBase implementation uses Apache Hive, a SQL-like data-warehousing
tool that enables querying using MapReduce. Third, Couchbase32 is a document-
oriented, schema-less distributed NoSQL database system with native support for
JSON documents. Couchbase is intended to run mostly in-memory and on as many
nodes as needed to hold the whole data set in RAM (random-access memory). It has a
built-in object-managed cache to speedup random reads and writes. Updates to docu-
ments are first made in the in-memory cache and are only later processed to disk using
an eventual consistency paradigm. Finally, Apache Cassandra is a NoSQL database

31 https://zookeeper.apache.org/.
32 https://www.couchbase.com/.
26 1 RDF Data and Management

management system originally developed by Facebook (Lakshman & Malik, 2010),

which provides decentralized data storage and failure tolerance based on replication
and failover.
Among the NoSQL systems available, HBase has been the most widely used. To
manage distributed RDF data, HBase and MySQL Cluster were used by Franke et al.
(2011) to store RDF data. An empirical comparison of these two approaches was then
conducted on a cluster of commodity machines. The Hexastore (Weiss et al., 2008)
schema was applied for HBase to store verbose RDF data (Sun & Jin, 2010). Also
based on HBase, two distributed triple stores called H2RDF and H2RDF+ were devel-
oped to optimize distributed joins using MapReduce (Papailiou et al., 2012, 2013).
The main differences between H2RDF and H2RDF+ are found in the join algorithms
and the number of maintained indices (three vs. six). Combining the Jena frame-
work with the storage provided by HBase, Khadilkar et al. (2012) developed several
versions of a triple store. A scalable technique was created (Przyjaciel-Zablocki et al.,
2012) for performing indexed nested loop joins, which combines the power of the
MapReduce paradigm with the random-access pattern provided by HBase. Using
a combination of MapReduce and HBase, a storage schema called RDFChain was
proposed by Choi et al. (2013) to support scalable storage and efficient retrieval of a
large set of RDF data. Focussing on large data sets from various areas of provenance
which record the history of an in-silico experiment, large collections of provenance
graphs were serialized as RDF graphs in an Apache HBase database (Chebotko et al.,
2013). On this basis, storage, indexing, and query techniques for RDF data in HBase
were proposed, which are better suited for provenance data sets than generic RDF
graphs.
Another important category of NoSQL RDF storage is the concept of graph
databases (Angles & Gutierrez, 2008). Focussing on the structure of RDF data, these
data are viewed as a classical graph in which subjects and objects form the nodes and
triples specify directed and labelled edges. Note that here RDF graphs may contain
cycles and have labelled edges. Angles and Gutierrez (2005) surveyed graph database
models and query languages and they further propose that RQL should incorporate
graph database query language primitives.
Identifying that the standard graph database model (essentially labelled graphs)
is different from the triples-based RDF model, a triple-based model called Trial was
introduced by Libkin et al. (2013), which combines the usual idea of triple stores
used in many RDF implementations with that of graphs with data. Actually, one of
the major problems encountered in modelling RDF data as classical graphs is that
an edge or a labelled edge cannot represent the ternary relation given by an RDF
triple. It is natural to use hypergraphs for this purpose with three-node connecting
edges instead of classical two-node edges. Hypergraphs are represented naturally by
bipartite graphs (Hayes & Gutierrez, 2004), in which the concept of an RDF bipartite
graph is introduced as an intermediate model for RDF data.
With a focus on distributed and Web-scale RDF data management, a memory-
based graph engine called Trinity.RDF is introduced (Zeng et al., 2013). Trinity.RDF
models RDF data in its native graph form, in which entities (i.e. subjects and objects
of RDF triples) are represented as graph nodes and relationships (i.e. predicates of
References 27

RDF triples) are represented as graph edges. Each RDF entity is represented as a
graph node with a unique id and stored as a key-value pair in the Trinity memory
cloud. Formally, a key-value pair (node-id, <in-adjacency-list, out-adjacency-list>)
consists of the node-id as the key and the node research directions to explore in
managing voluminous adjacency list as the value. The adjacency list is divided into
two lists: one for neighbors with incoming edges and the other for neighbors with
outgoing edges. Each element in the adjacency list is a (predicate, node-id) pair,
which records the id of the neighbor and the predicate on the edge.

1.5 Summary

The RDF is increasingly being adopted for modeling data in various application
domains and has become a cornerstone for publishing, exchanging, sharing, and
interrelating data on the Web. The goal of this chapter is to give an overview of the
basics of the theory of RDF data and management. We start by providing a formal
definition of RDF that includes the features that distinguish this model from other
graph data models. We then move into the fundamental issue of querying RDF data.
We study the RDF query language SPARQL, which is a W3C Recommendation since
January 2008. We provide an algebraic syntax and a compositional semantics for this
language. We furthermore focus on RDF data storage and present a full up-to-date
overview of the current state of the art in RDF data storage strategy, including RDF
data stores in traditional databases and RDF data stores in NoSQL databases.
However, the traditional database models and RDF feature limitations, mainly
with what can be said about fuzzy information that is commonly found in many
application domains. In order to provide the necessary means to handle and manage
such information there are today a huge number of proposals for fuzzy extensions
to database models and RDF. In particular, Zadeh’s fuzzy set theory (Zadeh, 1965)
has been identified as a successful technique for modelling the fuzzy information in
many application areas, especially in the databases and RDF. In the next chapter, we
will briefly introduce the fuzzy set theory and fuzzy database models.

References

Abadi, D. J., Marcus, A., Madden, S., & Hollenbach, K. (2007). Scalable semantic web data manage-
ment using vertical partitioning. In Proceedings of the 33th International Conference on Very
Large Data Bases (pp. 411–422).
Abadi, D. J., Marcus, A., Madden, S., & Hollenbach, K. (2009). SW-store: A vertically partitioned
DBMS for semantic web data management. VLDB Journal, 18(2), 385–406.
Ali, W., Saleem, M., Yao, B., Hogan, A., & Ngomo, A. C. N. (2021). A survey of RDF stores &
SPARQL engines for querying knowledge graphs. The VLDB Journal, 1–26.
Angles, R., & Gutierrez, C. (2005). Querying RDF data from a graph database perspective. In
Proceedings of the Second European Semantic Web Conference (pp. 346–360).
28 1 RDF Data and Management

Angles, R., & Gutierrez, C. (2008). Survey of graph database models. ACM Computing Surveys,
40, 1:1–1:39.
Arenas, M., Gutierrez, C., & Pérez, J. (2009, August). Foundations of RDF databases. In Reasoning
Web International Summer School (pp. 158–204). Springer.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284(5),
34–43.
Bishop, B., Kiryakov, A., Ognyanoff, D., Peikov, I., Tashev, Z., & Velkov, R. (2011). OWLIM: A
family of scalable semantic repositories. Semantic Web, 2(1), 1–10.
Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data—The story so far. International Journal
of Semantic Web and Information Systems, 5(3), 1–22.
Bönström, V., Hinze, A., & Schweppe, H. (2003). Storing RDF as a graph. In Proceedings of the
First Conference on Latin American Web Congress, 27–36.
Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., & Bhat-
tacharjee, B. (2013). Building an efficient RDF store over a relational database. In Proceedings
of the 2013 ACM International Conference on Management of Data (pp. 121–132).
Brickley, D., & Guha, R. V. (2004). RDF Vocabulary Description Language 1.0: RDF Schema,
W3C Recommendation.
Broekstra, J., Kampman, A., & van Harmelen, F. (2002). Sesame: a generic architecture for storing
and querying RDF and RDF schema. In Proceedings of the 2002 International Semantic Web
Conference (pp. 54–68).
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes,
A., & Gruber, R. E. (2008). BigTable: A distributed storage system for structured data. ACM
Transactions on Computer Systems 26(2), 4:1–4:26.
Chao, C.-M. (2007a). An object-oriented approach for storing and retrieving RDF/RDFS documents.
Tamkang Journal of Science and Engineering, 10(3), 275–286.
Chao, C.-M. (2007b). An object-oriented approach to storage and retrieval of RDF/XML documents.
In Proceedings of the 19th International Conference on Software Engineering & Knowledge
Engineering (pp. 586–591).
Chebotko, A., Abraham, J., Brazier, P., Piazza, A., Kashlev, A., & Lu, S. (2013). Storing, indexing
and querying large provenance data sets as RDF graphs in Apache HBase. In Proceedings of
IEEE Ninth World Congress on Services (pp. 1–8).
Choi, P., Jung, J., & Lee, K.-H. (2013). RDFChain: Chain centric storage for scalable join processing
of RDF graphs using MapReduce and HBase. In Proceeding of the 2013 International Semantic
Web Conference (pp. 249–252).
Ciaccia, P., Patella, M., & Zezula, P. (1997, August). M-tree: An efficient access method for similarity
search in metric spaces. In Vldb (Vol. 97, pp. 426–435).
Cudre-Mauroux, P., Enchev, I., Fundatureanu, S., Groth, P., Haque, A., Harth, A., Keppmann, F. L.,
Miranker, D. P., Sequeda, J. F., & Wylot, M. (2013). NoSQL databases for RDF: An empirical
evaluation. In Proceedings of the 12th International Semantic Web Conference (pp. 310–325).
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubrama-
nian, S., Vosshall, P., & Vogels, W. (2007). Dynamo: Amazon’s highly available key-value store.
In Proceedings of the 21st ACM Symposium on Operating Systems Principles (pp. 205–220).
De Virgilio, R. (2017). Smart RDF data storage in graph databases. In 2017 17th IEEE/ACM
International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (pp. 872–881).
IEEE.
Duan, S., Kementsietsidis, A., Srinivas, K., & Udrea, O. (2011). Apples and oranges: A compar-
ison of RDF benchmarks and real RDF datasets. In Proceedings of the 2011 ACM SIGMOD
International Conference on Management of Data (pp. 145–156).
Erling, O., & Mikhailov, I. (2007). RDF support in the Virtuoso DBMS. In Proceedings of the 1st
Conference on Social Semantic Web (pp. 59–68).
Erling, O., & Mikhailov, I. (2009). Virtuoso: RDF support in a native RDBMS. In R. De Virgilio, F.
Giunchiglia, & L. Tanca (Eds.), Semantic Web Information Management (pp. 501–519). Springer.
References 29

Franke, C., Morin, S., Chebotko, A., Abraham, J., & Brazier, P. (2011). Distributed semantic web
data management in HBase and MySQL Cluster. In Proceedings of the 2011 IEEE International
Conference on Cloud Computing (pp. 105–112).
Grolinger, K., Higashino, W. A., Tiwari, A., & Capretz, M. A. M. (2013). Data management in
cloud environments: NoSQL and NewSQL data stores. Journal of Cloud Computing: Advances
Systems and Applications, 2, 22.
Gueret, C., Kotoulas, S., & Groth, P. (2011). TripleCloud: an infrastructure for exploratory querying
over web-scale RDF data. In Proceedings of the 2011 IEEE/WIC/ACM International Joint
Conference on Web Intelligence and Intelligent Agent Technology—Workshops (pp. 245–248).
Gutierrez, C., Hurtado, C. A., Mendelzon, A. O., & Pérez, J. (2011). Foundations of semantic web
databases. Journal of Computer and System Sciences, 77(3), 520–541.
Harris, S., & Gibbins, N. (2003). 3store: efficient bulk RDF storage. In Proceedings of the First
International Workshop on Practical and Scalable Semantic Systems
Harris, S., Lamb, N., & Shadbolt, N. (2009). 4store: The design and implementation of a clus-
tered RDF store. In Proceedings of the 5th International Workshop on Scalable Semantic Web
Knowledge Base Systems (pp. 94–109).
Harris, S., & Shadbolt, N. (2005). SPARQL query processing with conventional relational database
systems. In Proceedings of the International Workshop on Scalable Semantic Web Knowledge
Base Systems (pp. 235–244).
Hartig, O. (2014). Reconciliation of RDF and Property Graphs. arXiv preprint arXiv:1409.3288
Hassanzadeh, O., Kementsietsidis, A., & Velegrakis, Y. (2012). Data management issues on the
semantic web. In Proceedings of the 2012 IEEE International Conference on Data Engineering
(pp. 1204–1206).
Hayes, J., & Gutierrez, C. (2004). Bipartite graphs as intermediate model for RDF. In Proceedings
of the 2004 International Semantic Web Conference (pp. 47–61).
Hayes, P. (2004). RDF Semantics, W3C Recommendation. http://www.w3.org/TR/rdf-mt/
Hu, X., Dang, D., Yao, Y., & Ye, L. (2017). Natural language aggregate query over Rdf data.
Information Sciences, 454–455, 363–381.
Khadilkar, V., Kantarcioglu, M., Thuraisingham, B. M., & Castagna, P. (2012). Jena-HBase: A
distributed, scalable and efficient RDF triple store. In Proceedings of the 2012 International
Semantic Web Conference.
Lakshman, A., & Malik, P. (2010). Cassandra: A decentralized structured storage system. ACM
SIGOPS Operating System Review, 44(2), 35–40.
Levandoski, J. J., & Mokbel, M. F. (2009). RDF data-centric storage. In Proceedings of the 2009
IEEE International Conference on Web Services (pp. 911–918).
Libkin, L., Reutter, J. L., & Vrgoc, D. (2013). Trial for RDF: Adapting graph query languages for
RDF data. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems (pp. 201–212).
Luo, Y., Picalausa, F., Fletcher, G. H. L., Hidders, J., & Vansummeren, S. (2012). Storing and
indexing massive RDF datasets. In R. De Virgilio, F. Guerra, & Y. Velegrakis (Eds.), Semantic
Search Over the Web (pp. 31–60). Springer.
Ma, R., Jia, X., Cheng, J., & Angryk, R. A. (2016a). SPARQL queries on RDF with fuzzy constraints
and preferences. Journal of Intelligent & Fuzzy Systems, 30(1), 183–195.
Ma, Z., Capretz, M. A., & Yan, L. (2016b). Storing massive resource description framework (RDF)
data: A survey. The Knowledge Engineering Review, 31(4), 391–413.
Ma, Z., Lin, X., Yan, L., & Zhao, Z. (2018). RDF keyword search by query computation. Journal
of Database Management (JDM), 29(4), 1–27.
Ma, Z. M., Capretz, M. A. M., & Yan, L. (2016c). Storing massive resource description framework
(RDF) data: A survey. Knowledge Engineering Review, 31(4), 391–413.
Manola, F., & Miller, E. (2004). RDF Primer, W3C Recommendation. http://www.w3.org/TR/2004/
REC-rdf-primer-20040210/.
30 1 RDF Data and Management

Marin, D. (2004). A formalization of RDF (applications de la logique á la sémantique du Web),

Tech. rep., École Polytechnique–Universidad de Chile, dept. Computer Science, Universidad de
Chile, TR/DCC-2006-8. http://www.dcc.uchile.cl/cgutierr/ftp/draltan.pdf
Matono, A., Amagasa, T., Yoshikawa, M., & Uemura, S. (2005). A path-based relational RDF
database. In Proceedings of the 16th Australasian Database Conference (pp. 95–103).
Matono, A., & Kojima, I. (2012). Paragraph tables: A storage scheme based on RDF document
structure. In Proceedings of the 23rd International Conference on Database and Expert Systems
Applications (pp. 231–247).
McBride, B. (2002). Jena: A semantic web toolkit. IEEE Internet Computing, 6(6), 55–59.
Morsey, M., Lehmann, J., Auer, S., & Ngomo, A. C. N. (2011). DBpedia SPARQL benchmark-
performance assessment with real queries on real data. In Proceedings of the 10th International
Semantic Web Conference (pp. 454–469).
Morsey, M., Lehmann, J., Auer, S., & Ngomo, A. C. N. (2012). Usage-centric benchmarking of
RDF triple stores. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence
(pp. 2134–2140).
Mulay, K., & Kumar, P. S. (2012). SPOVC: A scalable RDF store using horizontal partitioning
and column-oriented DBMS. In Proceedings of the 4th International Workshop on Semantic Web
Information Management.
Munoz, S., Pérez, J., & Gutiérrez, C. (2007). Minimal deductive systems for RDF. In European
Semantic Web Conference. Springer.
Neumann, T., & Weikum, G. (2008). RDF-3X: A RISC-style engine for RDF. Proceedings of the
VLDB Endowment, 1(1), 647–659.
Neumann, T., & Weikum, G. (2010). The RDF-3X engine for scalable management of RDF data.
The VLDB Journal, 19(1), 91–113.
Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., & Koziris, N. (2013). H2RDF+: High-
performance distributed joins over large-scale RDF graphs. In Proceedings of the 2013 IEEE
International Conference on Big Data (pp. 255–263).
Papailiou, N., Konstantinou, I., Tsoumakos, D., & Koziris, N. (2012). H2RDF: Adaptive query
processing on RDF data in the cloud. In Proceedings of the 21st World Wide Web Conference
(pp. 397–400).
Pérez, J., Arenas, M., & Gutierrez, C. (2006a). Semantics and complexity of SPARQL. In
International Semantic Web Conference. Springer.
Pérez, J., Arenas, M., & Gutierrez, C. (2006b). Semantics of SPARQL. Technical Report,
Universidad de Chile TR/DCC-2006b-17.
Pérez, J., Arenas, M., & Gutierrez, C. (2009). Semantics and complexity of SPARQL. ACM
Transactions on Database Systems (TODS), 34(3), 1–45.
Prud’hommeaux, E., & Seaborne, A. (2008). SPARQL Query Language for RDF, W3C Recommen-
dation. http://www.w3.org/TR/rdf-sparql-query/
Przyjaciel-Zablocki, M., Schatzle, A., Hornung, T., Dorner, C., & Lausen, G. (2012). Cascading
map-side joins over HBase for scalable join processing. In CoRR 2012.
Sakr, S., & Al-Naymat, G. (2009). Relational processing of RDF queries: A survey. SIGMOD
Record, 38(4), 23–28.
Sidirourgos, L., Goncalves, R., Kersten, M. L., Nes, N., & Manegold, S. (2008). Column-store
support for RDF data management: Not all swans are white. Proceedings of the VLDB Endowment,
1(2), 1553–1563.
Sintek, M., & Kiesel, M. (2006). RDFBroker: A signature-based high-performance RDF store. In
Proceedings of the 3rd European Semantic Web Conference (pp. 363–377).
Stein, R., & Zachrias, V. (2010). RDF on cloud number nine. In Proceedings of the 4th Workshop
on New Forms of Reasoning for the Semantic Web: Scalable & Dynamic (pp. 11–23).
Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A.,
Madden, S., O’Neil, E., Rasin, A., Tran, N., & Zdonik, S. (2005). C-Store: a column-oriented
DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases (pp. 553–
564).
References 31

Sun, J. L., & Jin, Q. (2010). Scalable RDF store based on HBase and MapReduce. In Proceedings
of the 3rd International Conference Advanced Computer Theory and Engineering (pp. V1-633–
V1-636).
Wang, Y., Du, X. Y., Lu, J. H., & Wang, X. F. (2010). FlexTable: using a dynamic relation model
to store RDF data. In Proceedings of the 15th International Conference on Database Systems for
Advanced Applications (pp. 580–594).
Weiss, C., Karras, P., & Bernstein, A. (2008). Hexastore: Sextuple indexing for semantic web data
management. Proceedings of the VLDB Endowment, 1(1), 1008–1019.
Wilkinson, K. (2006). Jena property table implementation. Technical Report HPL-2006-140, HP
Labs.
Wilkinson, K., Sayers, C., Kuno, H. A., & Reynolds, D. (2003). Efficient RDF storage and retrieval
in Jena2. In Semantic Web and Databases Workshop (pp. 131–150).
Wu, G., Li, J., & Wang, K. (2008, April). System II: A hypergraph based native RDF repository. In
Proceedings of the 17th international Conference on World Wide Web (pp. 1035–1036).
Wolff, B. G. J., Fletcher, G. H. L., & Lu, J. J. (2015). An extensible framework for query optimization
on TripleT-based RDF stores. In Proceedings of the Workshops of the EDBT/ICDT 2015 Joint
Conference (pp. 190–196).
Yan, L., Ma, R., Li, D., & Cheng, J. (2017). RDF approximate queries based on semantic similarity.
Computing, 99(5), 481–491.
Yan, Y., Wang, C., Zhou, A., Qian, W., Ma, L., & Pan, Y. (2009). Efficient indices using graph parti-
tioning in RDF triple stores. In 2009 IEEE 25th International Conference on Data Engineering
(pp. 1263–1266).
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Zeng, K., Yang, J. C., Wang, H. X., Shao, B., & Wang, Z. Y. (2013). A distributed graph engine for
web scale RDF data. Proceedings of the VLDB Endowment, 6(4), 265–276.
Zou, L., Mo, J., Chen, L., Özsu, M. T., & Zhao, D. (2011). gStore: Answering SPARQL queries
via subgraph matching. Proceedings of the VLDB Endowment, 4(8), 482–493.
Chapter 2
Fuzzy Sets and Fuzzy Database Modeling

2.1 Introduction

Information is often imprecise and uncertain in many real-world applications, and

many sources can contribute to the imprecision and uncertainty of data or information.
Therefore, it has been pointed out that we need to learn how to manage data that is
imprecise or uncertain (Dalvi & Suciu, 2007).
Unfortunately, the classical data models such as relational database, object-
oriented databases, and the RDF data model introduced in Chap. 1 often suffer
from their inability to represent and manipulate imprecise and uncertain data infor-
mation. Since the early 1980s, Zadeh’s fuzzy logic (Zadeh, 1965) has been intro-
duced into various database models to enhance the classical models such that uncer-
tain and imprecise information can be represented and manipulated. Over the past
40 years, a significant body of research in fuzzy database modeling has been devel-
oped and tremendous gain is hereby accomplished in this area. Various fuzzy database
models have been proposed, and some major issues related to these models have been
investigated. Among various fuzzy database models, many fruitful results have been
achieved in fuzzy relational database modeling (Chen, 1999; Galindo, 2008; Petry,
1996). Furthermore, to model complex objects with uncertainty, much work has
devoted to fuzzy object-oriented database models (de Caluwe, 1998; Ma, 2005a).
Fuzzy object-oriented database model is a fuzzy extension to the classical object-
oriented database model by introducing the related notions of fuzzy classes, gener-
alization/specialization, and fuzzy inheritance relationships (Yan et al., 2012). In
addition, with the wide use of XML as the de-facto standard of data representation
and exchange on the Web, fuzzy XML (Yan et al., 2014; Ma & Yan, 2016) have been
attracting more attention.
Recent years have witnessed many new application perspectives such as Big Data
and artificial intelligence (AI). As a result, some new data models are emerging
beyond the traditional data models. Clearly it is not enough for the existing fuzzy
data models and their extensions to represent necessary data semantics. It is essential

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 33

to invent some new fuzzy data models like semi-structured and graph data models.
Being one kind of special graph data model, RDF recommended by the W3C is
finding more and more uses in a wide range of semantic data management scenarios.
To represent and deal with fuzziness in RDF data, few efforts have proposed several
fuzzy RDF models. The elementary construct of RDF model is a triple with format
(subject, predicate, object), which encodes the binary relation predicate between
subject and object, representing a single knowledge fact. The most common fuzzy
RDF model contains the triples associated with membership degrees (Manolis &
Tzitzikas, 2011; Straccia, 2009). Here the fuzzy RDF triples represent the fuzziness
of a triple-level granularity and it is hard to exactly know the fuzziness of one triple’s
components. To tackle this, a kind of fuzzy RDF models are proposed in Ma et al.
(2018), in which the fuzziness of triples can appear in triple’s components. Based
on such a fuzzy RDF model with fine granularity of fuzziness, few recent efforts
investigate fuzzy RDF graph matching (Li et al., 2019a, 2019b, 2019c) and fuzzy
RDF graph storage (Fan et al., 2019, 2020; Ma et al., 2018).
In this chapter, we mainly introduce several fuzzy database models, including
fuzzy XML model, fuzzy relational and fuzzy object-oriented database models. These
models can be used for mapping to and from the fuzzy RDF models in order to
realize the fuzzy data management in many areas, such as database and Web-based
application domains. Before that, we briefly introduce some notions of fuzzy set
theory.

2.2 Imperfect Information and Fuzzy Sets

In the real-world applications, information is usually vague or ambiguous. Some

data are inherently fuzzy since their values are subjective in real applications (Ma &
Yan, 2008). Considering values representing the satisfaction degree for a film, clearly
different individuals may have different satisfaction degrees. Uncertainty extensively
exists in data and knowledge intensive applications, in which fuzzy information
processing plays a crucial role. Fuzzy set theory introduced by Zadeh (1965) is
applied to explain the concept of uncertainty in real life. A fuzzy set is defined
mathematically by assigning to each possible individual in the universe of discourse
a value, representing its grade of membership, which corresponds to the degree,
to which that individual is similar or compatible with the concept represented by
the fuzzy set. Currently, fuzzy sets have been extensively used to enhance various
database models for managing fuzzy data. Therefore, in this section, we briefly
introduce some notions of fuzzy sets and fuzzy graph theory.
2.2 Imperfect Information and Fuzzy Sets 35

2.2.1 Imperfect Information

There are different categories of data quality (or the lack thereof) to be handled. Some
efforts try to identify and distinguish different types and sources of imperfect infor-
mation. In Parsons (1996): imperfect information can be imprecise, vague, inconsis-
tent, incomplete and/or uncertain. Bosc and Prade (1993) identify five basic kinds of
imperfection: inconsistency, imprecision, vagueness, uncertainty, and ambiguity. In
the following, we give explanations to the meanings of imperfect information.
(a) Inconsistency stands for a kind of semantic conflict, which means that the same
aspect of a real-world entity is irreconcilably represented more than once in data
resource(s). For example, the height value of one person is recorded as several
values with different scales (say, 1.78 m, 178.40 cm and 5.85 ft).
(b) Imprecision means that we must make a choice from a given range of values
without knowing which one should be chosen. This range is basically repre-
sented by an interval or a set of values. For example, we do not know the exact
height value of one person but know that it must be one of several values (say,
1.77 m, 178 m and 179 m).
(c) Vagueness has a similar semantics with imprecision but is generally represented
with linguistic terms. For example, “between 20 and 30 years old” and “young”
for the attribute Age are imprecise and vague values, respectively.
(d) Incompleteness means the information for which some data are missing. We
completely have no idea, for example, how tall one person is. Generally, incom-
plete information can be described by null values (Cross, 1996; Cross & Firat,
2000).
(e) Uncertainty means that we apportion some (maybe not all) of our belief to a value
or a group of values, which is related to the degree of truth. For example, a possi-
bility degree of 95% is assigned to the height value (say 1.78 m) of one person.
Note that this paper concentrates on subjective uncertainty described with
possibility theory rather than stochastic uncertainty described with probability
theory.
(f) Ambiguity means that some elements of a model lack complete semantics,
which can lead to several possible interpretations. For example, a length value 3
without necessary semantics may be interpreted a time length, a distance length
and so on. If it is a time length, it may be interpreted 3 days, 3 h, 3 min, or 3 s.
In general, several different kinds of imperfect information can co-exist with
respect to the same piece of information. In addition, imprecise values generally
denote a set of values in the form of [ai1, ai2, …, aim] or [ai1, ai2] for the discrete
and continuous universe of discourse, respectively, meaning that exactly one of the
values is the true value for the single-valued attribute, or at least one of the values is
the true value for the multivalued attribute. So, imprecise information here has two
interpretations: disjunctive information and conjunctive information.
Null values, which are originally called incomplete information, have several
possible interpretations: (a) “existing but unknown”, (b) “nonexisting” or
36 2 Fuzzy Sets and Fuzzy Database Modeling

“inapplicable”, (c) “no information” and (d) “open null value” (Gottlob & Zicari,
1988), which means that the value may do not exist, exactly one unknown value, or
several unknown values. An imprecise value can be considered as a particular case
of the null value with the semantics of “existent but unknown” (i.e., an applicable
null value), where the range of values that an imprecise value takes is restricted to a
given set or interval of values while the range of values that an applicable null value
takes corresponds to the corresponding universe of discourse.
The notion of a partial value is illustrated as follows (Grant, 1979). A partial value
on a universe of discourse U corresponds to a finite set of possible values in which
exactly one of the values in the set is the true value, denoted by {a1 , a2 , …, am } for
discrete U or [a1 , an ] for continua U, in which {a1 , a2 , …, am } ⊆ U or [a1 , an ] ⊆ U.
Let η be a partial value, then sub (η) and sup (η) are used to represent the minimum
and maximum in the set.
Note that crisp data can also be viewed as special cases of partial values. A crisp
data on discrete universe of discourse can be represented in the form of {p}, and a
crisp data on continua universe of discourse can be represented in the form of [p, p].
Moreover, a partial value without containing any element is called an empty partial
value, denoted by ⊥. In fact, the symbol ⊥ means an inapplicable missing data (Codd,
1986, 1987). Null values, partial values, and crisp values are thus represented with a
uniform format.

2.2.2 Fuzzy Sets

The fuzzy set was originally presented by Zadeh (1965). Since then fuzzy set has
been infiltrating into almost all branches of pure and applied mathematics that are
set-theory-based. This has resulted in a vast number of real applications crossing
over a broad realm of domains and disciplines. Over the years, many of the existing
approaches dealing with imprecision and uncertainty are based on the theory of fuzzy
sets.
Let U be a universe of discourse. A fuzzy value on U is characterized by a fuzzy
set F in U. A membership function

μ F : U → [0, 1]

is defined for the fuzzy set F, where μF (u), for each u ∈ U, denotes the degree
of membership of u in the fuzzy set F. For example, μF (u) = 0.8 means that u is
“likely” to be an element of F by a degree of 0.8. For ease of representation, a fuzzy
set F over universe U is organized into a set of ordered pairs:

F = {μ F (u 1 )/u 1 , μ F (u 2 )/u 2 , . . . , μ F (u n )/u n }.

When the membership function μF (u) above is explained to be a measure of the

possibility that a variable X has the value u in this approach, where X takes values
2.2 Imperfect Information and Fuzzy Sets 37

in U, a fuzzy value is described by a possibility distribution π X (Zadeh, 1978).

π X = {π X (u 1 )/u 1 , π X (u 2 )/u 2 , . . . , π X (u n )/u n }

Here, π X (ui ), ui ∈ U, denotes the possibility that ui is true.

In addition, a fuzzy data is represented by similarity relations in domain elements
(Buckles & Petry, 1982), in which the fuzziness comes from the similarity relations
between two values in a universe of discourse, not from the status of an object itself.
Similarity relations are thus used to describe the degree similarity of two values from
the same universe of discourse.
A similarity relation Sim on the universe of discourse U is a mapping: U × U →
[0, 1] such that:
(i) for ∀ x ∈ U, Sim (x, x) = 1, (reflexivity);
(ii) for ∀ x, y ∈ U, Sim (x, y) = Simi (y, x), (symmetry); and
(iii) for ∀x, y, z ∈ U, Sim (x, z) ≥ maxy (min (Sim (x, y), Sim (y, z))), (transitivity).
Moreover, the following notions related to fuzzy sets can be defined.
Support: The set of the elements that have non-zero degrees of membership in F is
called the support of F, denoted by

supp(F) = {u|u ∈ U and μ F (u) > 0}.

Kernel: The set of the elements that completely belong to F is called the kernel of
F, denoted by

ker(F) = {u|u ∈ U and μ F (u) = 1}.

Cut: The set of the elements which degrees of membership in F are greater than
(greater than or equal to) α, where 0 ≤ α < 1 (0 < α ≤ 1), is called the strong (weak)
α-cut of F, respectively denoted by

Fα+ = {u|u ∈ U and μ F (u) > α} and

Fα = {u|u ∈ U and μ F (u) ≥ α}.

In addition, to manipulate fuzzy sets and possibility distributions, several common

set operations are defined. The usual set operations (such as union, intersection and
complementation) have been extended to deal with fuzzy sets (Zadeh, 1965).
Let A and B be fuzzy sets on the same universe of discourse U with the membership
functions μA and μB , respectively. Then we have
Union. The union of fuzzy sets A and B, denoted A ∪ B, is a fuzzy set on U with
the membership function μA ∪ B : U → [0, 1], where

∀u ∈ U, μ A∪B (u) = max(μ A (u), μ B (u)).

38 2 Fuzzy Sets and Fuzzy Database Modeling

Intersection. The intersection of fuzzy sets A and B, denoted A ∩ B, is a fuzzy set

on U with the membership function μA ∩ B : U → [0, 1], where

∀u ∈ U, μ A∩B (u) = min(μ A (u), μ B (u)).

Complementation. The complementation of fuzzy set Ā, denoted by Ā, is a fuzzy

set on U with the membership function μĀ : U → [0, 1], where

∀u ∈ U, μ A (u) = 1 − μ A (u).

Based on these definitions, the difference of the fuzzy sets B and A can be defined
as:

B − A = B ∩ A.

Also, most of the properties that hold for classical set operations, such as
DeMorgan’s Laws, have been shown to hold for fuzzy sets. The only law of ordinary
set theory that is no longer true is the law of the excluded middle, i.e.,

A ∩ A /= ∅ and A ∪ A /= U.

Let A, B and C be fuzzy sets in a universe of discourse U. Then the operations on

fuzzy sets satisfy the following conditions:
• Commutativity laws: A ∪ B = B ∪ A, A ∩ B = B ∩ A
• Associativity laws: (A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C)
• Distribution laws: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C), (A ∩ B) ∪ C = (A ∪ C) ∩
(B ∪ C)
• Absorption laws: (A ∪ B) ∩ A = A, (A ∩ B) ∪ A = A
• Idempotency laws: A ∪ A = A, A ∩ A = A
• de Morgan laws: A ∪ B = A ∩ B, A ∩ B = A ∪ B

Given two fuzzy sets A and B in U, B is a fuzzy subset of A, denoted by B ⊆ A, if

μ B (u) ≤ μ A (u) for all u ∈ U.

Two fuzzy sets A and B are said to be equal if A ⊆ B and B ⊆ A.

Let U = U 1 × U 2 × … × U n be the Cartesian product of n universes and A1 ,
A2 , …, An be fuzzy sets in U 1 , U 2 , …, U n , respectively. The Cartesian product A1 ×
A2 × … × An is defined to be a fuzzy subset of U 1 × U 2 × … × U n , where

μ A1×...×An (u 1 . . . u n ) = min(μ A1 (u 1 ), . . . , μ An (u n ))

and ui ∈ U i , i = 1, …, n.
2.2 Imperfect Information and Fuzzy Sets 39

2.2.3 Fuzzy Graph

A graph represents a particular relationship between elements of a set V. It gives

an idea about the extent of the relationship between any two elements of V. We
can solve this problem by using a weighted graph if proper weights are known. But
in most of the situations, the weights may not be known, and the relationships are
‘fuzzy’ in a natural sense. Hence, a fuzzy relation can deal with the situation in a
better way. As an example, if V represents certain locations and a network of roads is
to be constructed between elements of V, then the costs of construction of the links
are fuzzy. But the costs can be compared, to some extent using the terrain and local
factors and can be modeled as fuzzy relations. Thus, fuzzy graph models are more
helpful and realistic in natural situations.
Kaufman (1973) gave the first definition of a fuzzy graph. But it was Rosen-
feld (1975) and Yeh and Bang (1975b) who laid the foundations for fuzzy graph
theory. Rosenfeld introduced fuzzy analogs of several basic graph-theoretic concepts,
including subgraphs, paths, connectedness, cliques, bridges, cut vertices, forests, and
trees. Yeh and Bang (1975a) independently introduced many connectivity concepts
including vertex and edge connectivity in fuzzy graphs and applied fuzzy graphs for
the first time in clustering of data. In this section, we discuss fundamentals of fuzzy
graph theory (Sunitha, 2001) and we provide formal definitions, basic concepts, and
properties of fuzzy graphs.
Let V be a non-empty set. A fuzzy graph G = (σ, μ) is a pair of functions σ: S
→ [0, 1], μ: V 1 × V 2 → [0, 1] which satisfies ∀(u1 , u2 ) ∈ V 1 × V 2 , μ(u1 , u2 ) ≤
σ 1 (u1 ) ∧ σ 2 (u2 ), where ∧ denotes the minimum.
The fuzzy set σ is called the fuzzy vertex set of G and μ the fuzzy edge set of G.
Clearly μ is a fuzzy relation on σ. We denote the underlying (crisp) graph of G by
G* = (σ * , μ* ) where μ* is referred to as the (nonempty) set S of nodes and μ* = E
⊆ S 1 × S 2 . Note that the crisp graph (V, E) is a special case of a fuzzy graph with
each vertex and edge of (V, E) having degree of membership 1. We need not consider
loops and we assume that μ is reflexive. Also, the underlying set V is assumed to be
finite and μ can be chosen in any manner so as to satisfy the definition of a fuzzy
graph.

Example 2.1 Let V = {a, b, c}. Define the fuzzy set σ on V as σ (a) = 0.5, σ (b) =
1 and σ (c) = 0.8. Define a fuzzy set μ of E such that μ(ab) = 0.5, μ(bc) = 0.7 and
μ(ac) = 0.1. Then μ(x, y) ≤ σ (x) ∧ σ (y) for all x, y ∈ V. Thus, G = (σ, μ) is a fuzzy
graph. If we redefine μ(ab) = 0.6, then it is no longer a fuzzy graph.
Let G = (σ, μ) be a fuzzy graph. Then a fuzzy graph G' = (σ ', μ' ) is called a
partial fuzzy subgraph of G if σ ' ⊆ σ and μ' ⊆ μ. Similarly, the fuzzy graph G' =
(σ ', μ' ) is called a fuzzy subgraph of G induced by P if P ⊆ V, σ ' (u) = σ (u) for
every u ∈ P and μ' (e) = μ(e) for every e ∈ E. We write <P> to denote the fuzzy
subgraph induced by P.

Example 2.2 Let G1 = (σ, μ), where σ * = {a, b, c} and μ* = {ab, bc} with σ (a)
= 0.4, σ (b) = 0.8, σ (c) = 0.5, μ(ab) = 0.3 and μ(bc) = 0.2. Then clearly G1 is a
40 2 Fuzzy Sets and Fuzzy Database Modeling

partial fuzzy subgraph of the fuzzy graph in Example 2.1. Also, if P = {a, b} and H
= (σ, μ), where σ (a) = 0.5, σ (b) = 1 and μ(ab) = 0.5, then H is the induced fuzzy
subgraph of G in Example 2.1, induced by P.
Let G = (σ, μ) be a fuzzy graph. Then a partial fuzzy subgraph G' = (σ ', μ' ) of
G is said to span G if σ ' = σ and μ' ⊆ μ; that is if σ ' (u) = σ (u) for every u ∈ V and
μ' (e) ≤ μ(e) for every e ∈ E. In this case, we call G' = (σ ', μ' ) a spanning fuzzy
subgraph of G.

In fact a fuzzy subgraph G' = (σ ', μ' ) of a fuzzy graph G = (σ, μ) induced by a
subset P of V is a particular partial fuzzy subgraph of G. Take σ ' (u) = σ (x) for all u
∈ P and 0 for all u ∈/ P. Similarly, take μ' (u1 , u2 ) = μ(u1 , u2 ) if (u1 , u2 ) is in a set of
edges involving elements from P, and 0 otherwise.
Let G: (σ, μ) and G': (σ ', μ' ) be the fuzzy graphs with underlying sets V, homo-
morphism of fuzzy graphs (Holub & Melichar, 1998) h: G → G' is a map h: V →
V' which satisfies σ (x) ≤ σ ' (h(x)) ∀x ∈ V and μ(x, y) ≤ μ' (h(x), h(y)) ∀x, y ∈ V.
We mainly discuss the concepts of fuzzy path and fuzzy bridges in this subsection.
Most of the results are due to the works (Sunitha & Vijayakumar, 2005; Mathew et al.,
2018).
Let G: (σ, μ) be a fuzzy graph. If μ (x, y) > 0, then x and y are called neighbors.
Then x and y lie on the edge e = (x, y). A path p in a fuzzy graph G: (σ, μ) is a
sequence of distinct vertices v0 , v1 , v2 , …, vn such that μ(vi-1 , vi ) > 0, 1 ≤ i ≤ n. Here
‘n’ is called the length of the path. The consecutive pairs (vi-1 , vi ) are called arcs of
the path. The diameter of x, y ∈ V, written diam(x, y), is the length of the longest
path joining x to y. The strength of P is defined to be ∧i=1 n
μ(xi−1 xi ). In words, the
strength of a path is defined to be the weight of the weakest edge. We denote the
strength of a path P by d(P). The strength of connectedness between two vertices x
and y is defined as the maximum of the strengths of all paths between x and y and
is denoted by μ∞ (x, y). A strongest path joining any two vertices x, y has strength
μ∞ (x, y). Two vertices that are joined by a path are called connected. It follows that
this notion of connectedness is an equivalence relation. The equivalence classes of
vertices under this equivalence relation are called connected components of the given
fuzzy graph. They are just its maximal connected partial fuzzy subgraphs.

2.3 Fuzzy Relational Database Models

In order to manage fuzzy data in the databases, fuzzy set theory has been exten-
sively applied to extend various database models and resulted in numerous contri-
butions, mainly with respect to the popular relational model or to some related form
of it. In general, several basic approaches can be classified: (i) one of fuzzy rela-
tional databases is based on possibility distribution (Chaudhry et al., 1999; Prade &
Testemale, 1984; Umano & Fukami, 1994); (ii) the other one is based on the use
of similarity relation (Buckles & Petry, 1982), proximity relation (De et al., 2001;
Shenoi & Melton, 1999), resemblance relation (Rundensteiner & Bic, 1992), or fuzzy
2.3 Fuzzy Relational Database Models 41

relation (Raju & Majumdar, 1988); (iii) another possible extension is to combine
possibility distribution and similarity (proximity or resemblance) relation (Chen
et al., 1992; Ma & Mili, 2002; Ma et al., 2000). Currently, some major questions
have been discussed and answered in the literature of the fuzzy relational databases,
including representations and models, semantic measures and data redundancies,
query and data processing, data dependencies and normalizations, implementation,
and etc. For a comprehensive review of what has been done in the development
of fuzzy relational databases, please refer to Chen (1999), Ma and Yan (2008), Ma
(2005b), Petry (1996), Yazici and George (1999). In this section, we briefly introduce
some basic notions of fuzzy relational databases based on possibility distributions.
A relation is a two-dimensional table and its rows and columns are called tuples
and attribute values, respectively. So, a relation is a set of tuples and a tuple consists
of attribute values. A relation has its relational schema, which is a set of attributes.
Each attribute corresponds to a range of values that this attribute can take and this
range is called the domain of the attribute.
Basically, a fuzzy relational database (FRDB) is based on the notions of fuzzy
relational schema, fuzzy relational instance, tuple, key, and constraints, which are
introduced briefly as follows:
• A fuzzy relational database consists of a set of fuzzy relational schemas and a set
of fuzzy relational instances (i.e., simply fuzzy relations).
• The set of fuzzy relational schemas specifies the structure of the data held in
a database. A fuzzy relational schema consists of a fixed set of attributes with
associated domains. The information of a domain is implied in forms of schemas,
attributes, keys, and referential integrity constraints.
• The set of fuzzy relations, which is considered to be an instance of the set of fuzzy
relation schemas, reflects the real state of a database. Formally, a fuzzy relation
is a two-dimensional array of rows and columns, where each column represents
an attribute and each row represents a tuple.
• Each tuple in a table denotes an individual in the real world identified uniquely
by primary key, and a foreign key is used to ensure the data integrity of a table. A
column (or columns) in a table that makes a row in the table distinguishable from
other rows in the same table is called the primary key. A column (or columns) in a
table that draws its values from a primary key column in another table is called the
foreign key. As is generally assumed in the literature, we assume that the primary
key attribute is always crisp and all fuzzy relations are in the third normal form.
• An integrity constraint in a schema is a predicate over relations expressing a
constraint; by far the most used integrity constraint is the referential integrity
constraint. A referential integrity constraint involves two sets of attributes S 1 and
S 2 in two relations R1 and R2 , such that one of the sets (say S 1 ) is a key for one of
the relations (called primary key). The other set is called a foreign key if R2 [S 2 ]
is a subset of R1 [S 1 ]. Referential integrity constraints are the glue that holds the
relations in a database together.
In summary, in a fuzzy relational database, the structure of the data is represented
by a set of fuzzy relational schemas, and data are stored in fuzzy relations (i.e.,
42 2 Fuzzy Sets and Fuzzy Database Modeling

tables). Each table contains rows (i.e., tuples) and columns (i.e., attributes). Each
tuple is identified uniquely by the primary key. The relationships among relations
are represented by the referential integrity constraints, i.e., foreign keys. Moreover,
here, two types of fuzziness are considered in fuzzy relational databases, one is the
fuzziness of attribute values (i.e., attributes may be fuzzy), which may be represented
by possibility distributions; another is the fuzziness of a tuple being a member of
the corresponding relation, which is represented by a membership degree associated
with the tuple.
Formally, a fuzzy relational database FRDB = <FS, FR> consists of a set of fuzzy
relational schemas FS and a set of fuzzy relations FR, where:
• Each fuzzy relational schema FS can be represented formally as FR (Al /D1 , A2 /D2 ,
…, An /Dn , μFR /DFR ), which denotes that a fuzzy relation FR has attributes Al , A2 ,
…, An and μFR with associated data types D1 , D2 , …, Dn and DFR . Here, μFR is
an additional attribute for representing the membership degree of a tuple to the
fuzzy relation.
• Each fuzzy relation FR on a fuzzy relational schema FR (Al /D1 , A2 /D2 , …, An /Dn ,
μFR /DFR ) is a subset of the Cartesian product of Dom (A1 ) × Dom (A2 ) ×…×
Dom (An ) × Dom (μFR ), where Dom (Ai ) may be a fuzzy subset or even a set
of fuzzy subset and Dom (μFR ) ∈ (0, 1]. Here, Dom (Ai ) denotes the domain of
attribute Ai , and each element of the domain satisfies the constraint of the datatype
Di . Formally, each tuple in FR has the form t = <πA1 , πA2 , …, πAi , …, πAn , μFR >,
where the value of an attribute Ai may be represented by a possibility distribution
πAi , and μFR ∈ (0, 1].
Moreover, a resemblance relation Res on Dom (Ai ) is a mapping: Dom (Ai ) ×
Dom (Ai ) → [0, 1] such that (i) for all x in Dom (Ai ), Res (x, x) = 1 (reflexivity) (ii)
for all x, y in Dom (Ai ), Res (x, y) = Res (y, x) (symmetry).
To provide the intuition on the fuzzy relational database we show an example. The
following gives a fuzzy relational database modeling parts of the reality at a company,
including fuzzy relational schemas in Table 2.1 and fuzzy relations in Table 2.2. The
detailed introduction is as follows:
• The attribute underlined stands for primary key PK. The foreign key (FK) is
followed by the parenthesized relation called referenced relation. A relation can
have several candidate keys from which one primary key, denoted PK, is chosen.
• An ‘f ’ next to an attribute means that the attribute is fuzzy.
• In Table 2.1, there are the inheritance relationships Chief-Leader “is-a” Leader
and Young-Employee “is-a” Employee. There is a 1-many relationship between
Department and Young-Employee. The relation Supervise is a relationship rela-
tion, and there is many-many relationship between Chief-Leader and Young-
Employee.
• Note that, a relation is different from a relationship. A relation is essentially a
table, and a relationship is a way to correlate, join, or associate the two tables.
2.4 Fuzzy Object-Oriented Database Models 43

Table 2.1 The fuzzy relational schemas of a fuzzy relational database

Relation name Attribute and datatype Foreign key and referenced relation
Leader leaID (String), lNumber (String), no
μFR (Real)
Employee empID (String), eNumber (String), no
μFR (Real)
Chief-Leader leaID (String), clName (String), leaID (Leader (leaID))
f_clAge (Integer), μFR (Real)
Young-Employee empID (String), yeName (String), empID (Employee (empID))
f_yeAge (Integer), f_yeSalary dept_ID (Department (depID))
(Integer), dep_ID (String), μFR
(Real)
Supervise supID (String), lea_ID (String), lea_ID (Chief-Leader (leaID))
emp_ID (String), μFR (Real) emp_ID (Young-Employee (empID))
Department depID (String), dName (String), no
μFR (Real)

2.4 Fuzzy Object-Oriented Database Models

In some real-world applications (e.g., CAD/CAM, multimedia and GIS), they charac-
teristically require the modeling and manipulation of complex objects and semantic
relationships. It has been proved that the object-oriented paradigm lends itself
extremely well to the requirements. Since classical relational database model and
its extension of fuzziness do not satisfy the need of modeling complex objects with
imprecision and uncertainty, currently many researches have been concentrated on
fuzzy object-oriented database models in order to deal with complex objects and
uncertain data together. Zicari and Milano (1990) first introduced incomplete infor-
mation, namely, null values, where incomplete schema and incomplete objects can
be distinguished. From then on, the incorporation of imprecise and uncertain infor-
mation in object-oriented databases has increasingly received attention. A fuzzy
object-oriented database model was defined in Bordogna and Pasi (2001) based on
the extension of a graphs-based object model. Based on similarity relationship, uncer-
tainty management issues in the object-oriented database model were discussed in
George et al. (1996). Based on possibility theory, vagueness and uncertainty were
represented in class hierarchies in Dubois et al. (1991). In more detail, also based
on possibility distribution theory, Ma et al. (2004) introduced fuzzy object-oriented
database models, some major notions such as objects, classes, objects-classes rela-
tionships and subclass/superclass relationships were extended under fuzzy infor-
mation environment. Moreover, other fuzzy extensions of object-oriented databases
were developed. In Marín et al. (2000, 2001), fuzzy types were added into fuzzy
object-oriented databases to manage vague structures. The fuzzy relationships and
fuzzy behavior in fuzzy object-oriented database models were discussed in Cross
(2001), Gyseghem and Caluwe (1995). Several intelligent fuzzy object-oriented
database architectures were proposed in Koyuncu and Yazici (2003), Ndouse (1997),
44 2 Fuzzy Sets and Fuzzy Database Modeling

Table 2.2 The fuzzy relations of a fuzzy relational database

Leader
leaID lNumber μFR
L001 001 0.7
L002 002 0.9
L003 003 0.8
Employee
empID eNumber μFR
E001 001 0.8
E002 002 0.9
Chief-Leader
leaID clName f_clAge μFR
L001 Chris {35/0.8, 39/0.9} 0.65
L003 Billy 37 0.7
Young-Employee
empID yeName f_yeAge f_yeSalary dep_ID μFR
E001 John {24/0.7, 25/0.9} {2000/0.3, 3000/0.4} D001 0.75
E002 Mary 23 {4000/0.5, 4500/0.7, 5000/1.0} D003 0.85
Department
depID dName μFR
D001 HR 0.8
D002 Finance 0.9
D003 Sales 0.7
Supervise
supID lea_ID emp_ID μFR
S001 L001 E001 0.78
S002 L001 E002 0.8
S002 L003 E002 0.9

Ozgur et al. (2009). The other efforts on how to model fuzziness and uncertainty in
object-oriented database models were done in Lee et al. (1999), Majumdar et al.
(2002), Umano et al. (1998). The fuzzy and probabilistic object bases (Cao &
Rossiter, 2003; Nam et al., 2007), fuzzy deductive object-oriented databases (Yazici
and Koyuncu, 1997), and fuzzy object-relational databases (Cubero et al., 2004)
were also developed. In addition, an object-oriented database modeling technique
was proposed based on the level-2 fuzzy sets in de Tré and de Caluwe (2003), where
the authors also discussed how the object Data Management Group (ODMG) data
model can be generalized to handle fuzzy data in a more advantageous way. Also,
the other efforts have been paid on the establishment of consistent framework for a
fuzzy object-oriented database model based on the standard for the ODMG object
2.4 Fuzzy Object-Oriented Database Models 45

data model (Cross et al., 1997). More recently, how to manage fuzziness on conven-
tional object-oriented platforms was introduced in Berzal et al. (2007). Yan and Ma
(2012) proposed the approach for the comparison of entity with fuzzy data types in
fuzzy object-oriented databases. Yan et al. (2012) investigated the algebraic opera-
tions in fuzzy object-oriented databases, and discussed fuzzy querying strategies and
gave the form of SQL-like fuzzy querying for the fuzzy object-oriented databases.
In the section, the basic notions of fuzzy object-oriented database (FOODB) models,
including fuzzy object, fuzzy class, fuzzy inheritance, and algebraic operations are
introduced.

2.4.1 Fuzzy Objects

Objects model real-world entities or abstract concepts. Objects have properties that
may be attributes of the object itself or relationships also known as associations
between the object and one or more other objects. An object is fuzzy because of a
lack of information. For example, an object representing a part in preliminary design
for certain will also be made of stainless steel, moulded steel, or alloy steel (each
of them may be connected with a possibility, say, 0.7, 0.5 and 0.9, respectively).
Formally, objects that have at least one attribute whose value is a fuzzy set are fuzzy
objects.

2.4.2 Fuzzy Classes

The fuzzy classes in fuzzy object-oriented databases are similar to the notion of the
fuzzy classes in fuzzy UML data models as introduced in Sect. 2.3.
The objects having the same properties are gathered into classes that are organized
into hierarchies. Theoretically, a class can be considered from two different view-
points (Dubois et al., 1991): (a) an extensional class, where the class is defined by
the list of its object instances, and (b) an intensional class, where the class is defined
by a set of attributes and their admissible values. In addition, a subclass defined from
its superclass by means of inheritance mechanism in the object-oriented database
(OODB) can be seen as the special case of (b) above.
Therefore, a class is fuzzy because of the following several reasons. First, some
objects are fuzzy ones, which have similar properties. A class defined by these objects
may be fuzzy. These objects belong to the class with membership degree of [0, 1].
Second, when a class is intensionally defined, the domain of an attribute may be
fuzzy and a fuzzy class is formed. For example, a class Old equipment is a fuzzy
one because the domain of its attribute Using period is a set of fuzzy values such as
long, very long, and about 20 years. Third, the subclass produced by a fuzzy class
by means of specialization and the superclass produced by some classes (in which
there is at least one class who is fuzzy) by means of generalization are also fuzzy.
46 2 Fuzzy Sets and Fuzzy Database Modeling

The main difference between fuzzy classes and crisp classes is that the boundaries
of fuzzy classes are imprecise. The imprecision in the class boundaries is caused by
the imprecision of the values in the attribute domain. In the FOODB, classes are fuzzy
because their attribute domains are fuzzy. The issue that an object fuzzily belongs
to a class occurs since a class or an object is fuzzy. Similarly, a class is a subclass of
another class with membership degree of [0, 1] because of the class fuzziness. In the
OODB, the above-mentioned relationships are certain. Therefore, the evaluations of
fuzzy object-class relationships and fuzzy inheritance hierarchies are the cores of
information modeling in the FOODB.

2.4.3 Fuzzy Object-Class Relationships

In the FOODB, the following four situations can be distinguished for object-class
relationships.
(a) Crisp class and crisp object. This situation is the same as the OODB, where the
object belongs or not to the class certainly. For example, the objects Car and
Computer are for a class Vehicle, respectively.
(b) Crisp class and fuzzy object. Although the class is precisely defined and has the
precise boundary, an object is fuzzy since its attribute value(s) may be fuzzy. In
this situation, the object may be related to the class with the special degree in [0,
1]. For example, the object which position attribute may be graduate, research
assistant, or research assistant professor, is for the class Faculty.
(c) Fuzzy class and crisp object. Being the same as the case in (b), the object may
belong to the class with the membership degree in [0, 1]. For example, a Ph.D.
student is for Young student class.
(d) Fuzzy class and fuzzy object. In this situation, the object also belongs to the
class with the membership degree in [0, 1].
The object-class relationships in (b), (c) and (d) above are called fuzzy object-
class relationships. In fact, the situation in (a) can be seen the special case of fuzzy
object-class relationships, where the membership degree of the object to the class is
one. It is clear that estimating the membership of an object to the class is crucial for
fuzzy object-class relationship when classes are instantiated.
In the OODB, determining if an object belongs to a class depends on if its attribute
values are respectively included in the corresponding attribute domains of the class.
Similarly, in order to calculate the membership degree of an object to the class
in a fuzzy object-class relationship, it is necessary to evaluate the degrees that the
attribute domains of the class include the attribute values of the object. However, it
should be noted that in a fuzzy object-class relationship, only the inclusion degree
of object values with respect to the class domains is not accurate for the evaluation
of membership degree of an object to the class. The attributes play different role
in the definition and identification of a class. Some may be dominant and some
not. Therefore, a weight w is assigned to each attribute of the class according to its
2.4 Fuzzy Object-Oriented Database Models 47

importance by designer. Then the membership degree of an object to the class in

a fuzzy object-class relationship should be calculated using the inclusion degree of
object values with respect to the class domains, and the weight of attributes.
Let C be a class with attributes {A1 , A2 , …, An }, o be an object on attribute set
{A1 , A2 , …, An }, and o (Ai ) denote the attribute value of o on Ai (1 ≤ i ≤ n). In
C, each attribute Ai is connected with a domain denoted dom (Ai ). The inclusion
degree of o (Ai ) with respect to dom (Ai ) is denoted ID (dom (Ai ), o (Ai )). In the
following, we investigate the evaluation of ID (dom (Ai ), o (Ai )). As we know, dom
(Ai ) is a set of crisp values in the OODB and may be a set of fuzzy subsets in fuzzy
databases. Therefore, in a uniform OODB for crisp and fuzzy information modeling,
dom (Ai ) should be the union of these two components, dom (Ai ) = cdom (Ai ) ∪
fdom (Ai ), where cdom (Ai ) and fdom (Ai ) respectively denote the sets of crisp values
and fuzzy subsets. On the other hand, o (Ai ) may be a crisp value or a fuzzy value.
The following cases can be identified for evaluating ID (dom (Ai ), o (Ai )).
Case 1: o (Ai ) is a fuzzy value. Let fdom (Ai ) = {f 1 , f 2 , …, f m }, where f j (1 ≤ j ≤
m) is a fuzzy value, and cdom (Ai ) = {c1 , c2 , …, ck }, where cl (1 ≤ l ≤ k)
is a crisp value. Then

ID(dom(Ai ), o(Ai )) = max(ID(cdom(Ai ), o(Ai )), ID( f dom(Ai ), o

(
(Ai ))) = max SID({1.0/c1 , 1.0/c2 , . . . , 1.0/ck }, o(Ai )), max j (SID
( f i , o(Ai )))),

where SID (x, y) is used to calculate the degree that fuzzy value x includes
fuzzy value y.
Case 2: o (Ai ) is a crisp value. Then

ID(dom(Ai ), o(Ai )) = 1 if o(Ai ) ∈ cdom(Ai ) else

ID(dom(Ai ), o(Ai )) = ID( f dom(Ai ), {1.0/o(Ai )}).

Consider a fuzzy class Young students with attributes Age and Height, and two
objects o1 and o2 . Assume cdom (Age) = {5 − 20}, fdom (Age) = {{1.0/20, 1.0/21,
0.7/22, 0.5/23}, {0.4/22, 0.6/23, 0.8/24, 1.0/25, 0.9/26, 0.8/27, 0.6/28}, {0.6/27,
0.8/28, 0.9/29, 1.0/30, 0.9/31, 0.6/32, 0.4/33, 0.2/34}}, and dom (Height) = cdom
(Height) = [60, 210]. Let o1 (Age) = 15, o2 (Age) = {0.6/25, 0.8/26, 1.0/27, 0.9/28,
0.7/29, 0.5/30, 0.3/31}, and o2 (Height) = 182. According to the definition above,
we have

ID(dom( Age), o1 ( Age)) = 1,

ID(dom(H eight), o2 (H eight)) = 1,
ID(cdom( Age), o2 ( Age)) = SID({1.0/5, 1.0/6, . . . , 1.0/19, 1.0/20},
o2 ( Age)) = 0, and
48 2 Fuzzy Sets and Fuzzy Database Modeling

ID( f dom(Age), o2 (Age)) = max(SID({1.0/20, 1.0/21, 0.7/22,

0.5/23}, o2 ( Age)), SID{0.4/22, 0.6/23, 0.8/24, 1.0/25, 0.9/26,
0.8/27, 0.6/28}, O2 ( Age)), SID({0.6/27, 0.8/28, 0.9/29, 1.0/30,
0.9/31, 0.6/32, 0.4/33, 0.2/34}, o2 ( Age))) = max(0, 0.58, 0.60) =
0.60.

Therefore,

ID(dom(Age), o2 (Age)) = max(ID(cdom( Age), o2 (Age)), ID

( f dom(Age), o2 (Age))) = 0.60.

Now, we define the formula to calculate the membership degree of the object o to
the class C as follows, where w (Ai (C)) denotes the weight of attribute Ai to class
C.
Σn
I D(dom(Ai ), o(Ai )) × w(Ai (C))
μC (o) = i=1 Σn
i=1 w(Ai (C))

Consider the fuzzy class Young students and object o2 above. Assume w (Age
(Young students)) = 0.9 and w (Height (Young students)) = 0.2. Then

μYoung students (o2 ) = (0.9 × 0.6 + 0.2 × 1.0)/(0.9 + 0.2) = 0.67.

In the above-given determination that an object belongs to a class fuzzily, it is

assumed that the object and the class have the same attributes, namely, class C is
with attributes {A1 , A2 , …, An } and object o is on {A1 , A2 , …, An } also. Such an
object-class relationship is called direct object-class relationship. As we know, there
exist subclass/superclass relationships in the OODB, where subclass inherits some
attributes and methods of the superclass, overrides some attributes and methods of
the superclass, and define some new attributes and methods. Any object belonging to
the subclass must belong to the superclass since a subclass is the specialization of the
superclass. So we have one kind of special object-class relationship: the relationship
between superclass and the object of subclass. Such an object-class relationship is
called indirect object-class relationship. Since the object and the class in indirect
object-class relationship have different attributes, in the following, we present how
to calculate the membership degree of an object to the class in an indirect object-class
relationship.
Let C be a class with attributes {A1 , A2 , …, Ak , Ak+1 , …, Am } and o be an object
on attributes {A1 , A2 , …, Ak , A' k+1 , …, A' m , Am+1 , …, An }. Here attributes A' k+1 ,
…, and A' m are overridden from Ak+1 , …, and Am and attributes Am+1 , …, and An
are special. Then we have
2.4 Fuzzy Object-Oriented Database Models 49

μC (o)
Σk Σm '
i=1 I D(dom(Ai ),o(Ai )) × w(Ai (C)) + j=k+1 I D(dom(Aj ),o(Aj )) × w(Aj (C))
= Σm .
i=1 w(Ai (C))

Based on the direct object-class relationship and the indirect object-class rela-
tionship, now we focus on arbitrary object-class relationship. Let C be a class with
attributes {A1 , A2 , …, Ak , Ak+1 , …, Am , Am+1 , …, An } and o be an object on attributes
{A1 , A2 , …, Ak , A' k+1 , …, A' m , Bm+1 , …, Bp }. Here attributes A' k+1 , …, and A' m are
overridden from Ak+1 , …, and Am , or Ak+1 , …, and Am are overridden from A' k+1 ,
…, and A' m . Attributes Am+1 , …, and An and Bm+1 , …, Bp are special in {A1 , A2 ,
…, Ak , Ak+1 , …, Am , Am+1 , …, An } and {A1 , A2 , …, Ak , A' k+1 , …, A' m , Bm+1 , …,
Bp }, respectively. Then we have

μC (o)
Σk Σm '
i=1 I D(dom(Ai ),o(Ai )) × w(Ai (C)) + j=k+1 I D(dom(Aj ),o(Aj )) × w(Aj (C))
= Σn
i=1 w(Ai (C))

Since an object may belong to a class with membership degree in [0, 1] in fuzzy
object-class relationship, it is possible that an object that is in a direct object-class
relationship and an indirect object-class relationship simultaneously belongs to the
subclass and superclass with different membership degrees. This situation occurs in
fuzzy inheritance hierarchies, which will be investigated in next section. Also for two
classes that do not have subclass/superclass relationship, it is possible that an object
may belong to these two classes with different membership degrees simultaneously.
This situation only arises in fuzzy object-oriented databases. In the OODB, an object
may or may not belong to a given class definitely. If it belongs to a given class, it can
only belong to it uniquely (except for the case of subclass/superclass).
The situation where an object belongs to different classes with different member-
ship degrees simultaneously in fuzzy object-class relationships is called multiple
membership of object in this paper. Now let us focus on how to handle the multiple
membership of object in fuzzy object-class relationships. Let C 1 and C 2 be (fuzzy)
classes and α be a given threshold. Assume there exists an object o. If μC1 (o) ≥ α
and μC2 (o) ≥ α, the conflict of the multiple membership of object occurs, namely,
o belongs to multiple classes simultaneously. At this moment, which one in C 1 and
C 2 is the class of object o dependents on the following cases.
Case 1: There exists a direct object-class relationship between object o and one
class in C 1 and C 2 .
Then the class in the direct object-class relationship is the class of
object o.
Case 2: There is no direct object-class relationship but only an indirect object-class
relationship between object o and one class in C 1 and C 2 , say C 1 . And
there exists such subclass C 1 ' of C 1 that object o and C 1 ' are in a direct
object-class relationship.
Then class C 1 ' is the class of object o.
50 2 Fuzzy Sets and Fuzzy Database Modeling

Case 3: There is neither direct object-class relationship nor indirect object-class

relationship between object o and classes C 1 and C 2 . Or there exists only
an indirect object-class relationship between object o and one class in C 1
and C 2 , say C 1 , but there is not such subclass C 1 ' of C 1 that object o and
C 1 ' are in a direct object-class relationship.
Then class C 1 is considered as the class of object o if μC1 (o) > μC2 (o),
else class C 2 is considered as the class of object o.
It can be seen that in Case 1 and Case 2, the class in direct object-class relationship
is always the class of object o and the object and the class have the same attributes.
In Case 3, however, object o and the class that is considered as the class of object o,
say C 1 , have different attributes. It should be pointed out that class C 1 and object o
are definitely defined, respectively, viewed from their structures. For the situation in
Case 3, the attributes of C 1 do not affect the attributes of o and the attributes of o do
not affect the attributes of C 1 also. There should be a class C and C and o are in direct
object-class relationship. But class C is not available so far. That C 1 is considered
as the class of object o, compared with C 2 , only means that C 1 is more similar to C
than C 2 . Class C is the class of object o once C is available.
Consider three fuzzy classes C 1 with {A, B}, C 2 with {A, B, D}, and C 3 with {A,
F}. There exists a fuzzy object o on {A, B', E}. Here, B' is overridden from B and D
/= E /= F. According to the definitions above, we have

I D(dom(A), o( A)) × w(A(C1 )) + I D(dom(B), o(B ' )) × w(B(C1 ))

μC1 (o) = ,
w(A(C1 )) + w(B(C1 ))
I D(dom(A), o( A)) × w(A(C2 )) + I D(dom(B), o(B ' )) × w(B(C2 ))
μC2 (o) = ,
w(A(C2 )) + w(B(C2 )) + w(D(C2 ))
I D(dom(A), o( A)) × w(A(C3 ))
μC3 (o) = .
w(A(C3 )) + w(F(C3 ))

Assume
w (A (C 1 )) = w (A (C 2 )) = w (A (C 3 )),
w (B (C 1 )) = w (B (C 2 )), and
w (B (C 2 )) + w (D (C 2 )) = w (F (C 3 )).
Also assume μC1 (o) ≥ α, μC2 (o) ≥ α, and μC3 (o) ≥ α, where α is a given
threshold. Then object o belongs to classes C 1 , C 2 and C 3 simultaneously. The
conflict of the multiple membership of object occurs. It can be seen that the relation-
ship between o and C 1 is an indirect object-class relationship. But the relationship
between o and C 2 , which is the subclass of class C 1 , is not a direct object-class
relationship. So class C 2 is not the class of object o. It can also be seen that μC1 (o)
≥ μC2 (o) ≥ μC3 (o). So C 1 is considered as the class of object o. But in fact, there
should be a new class C with {A, B', E}, which is the class in the direct object-class
relationship of o and C. That μC1 (o) ≥ μC2 (o) ≥ μC3 (o) only means that C 1 with
{A, B} is more similar to C with {A, B', E} than C 2 with {A, B, E} and C 3 with {A,
2.4 Fuzzy Object-Oriented Database Models 51

F}. When class C is not available right now, class C 1 is considered as the class of
object o.

2.4.4 Fuzzy Inheritance Hierarchies

In the OODB, a new class, called subclass, is produced from another class, called
superclass by means of inheriting some attributes and methods of the superclass,
overriding some attributes and methods of the superclass, and defining some new
attributes and methods. Since a subclass is the specialization of the superclass, any
one object belonging to the subclass must belong to the superclass. This characteristic
can be used to determine if two classes have subclass/superclass relationship.
In the FOODB, however, classes may be fuzzy. A class produced from a fuzzy
class must be fuzzy. If the former is still called subclass and the later superclass,
the subclass/superclass relationship is fuzzy. In other words, a class is a subclass of
another class with membership degree of [0, 1] at this moment. Correspondingly, the
method used in the OODB for determination of subclass/superclass relationship is
modified as
(a) for any (fuzzy) object, if the member degree that it belongs to the subclass is
less than or equal to the member degree that it belongs to the superclass, and
(b) the member degree that it belongs to the subclass is great than or equal to the
given threshold.
The subclass is then a subclass of the superclass with the membership degree,
which is the minimum in the membership degrees to which these objects belong to
the subclass.
Let C 1 and C 2 be (fuzzy) classes and β be a given threshold. We say C 2 is a
subclass of C 1 if

(∀o)(β ≤ μC2 (o) ≤ μC1 (o)).

The membership degree that C 2 is a subclass of C 1 should be minμC2(o)≥β (μC2

(o)).
It can be seen that by utilizing the inclusion degree of objects to the class, we can
assess fuzzy subclass/superclass relationships in the FOODB. It is clear that such
assessment is indirect. If there is no any object available, this method is not used. In
fact, the idea used in evaluating the membership degree of an object to a class can
be used to determine the relationships between fuzzy subclass and superclass. We
can calculate the inclusion degree of a (fuzzy) subclass with respect to the (fuzzy)
superclass according to the inclusion degree of the attribute domains of the subclass
with respect to the attribute domains of the superclass as well as the weight of
attributes. In the following, we give the method for evaluating the inclusion degree
of fuzzy attribute domains.
52 2 Fuzzy Sets and Fuzzy Database Modeling

Let C 1 and C 2 be (fuzzy) classes with attributes {A1 , A2 , …, Ak , Ak+1 , …, Am }

and {A1 , A2 , …, Ak , A' k+1 , …, A' m , Am+1 , …, An }, respectively. It can be seen that
in C 2 , attributes A1 , A2 , …, and Ak are directly inherited from A1 , A2 , …, and Ak
in C 1 , attributes A' k+1 , …, and A' m are overridden from Ak+1 , …, and Am in C 1 , and
attributes Am+1 , …, and An are special. For each attribute in C 1 or C 2 , say Ai , there is
a domain, denoted dom (Ai ). As shown above, dom (Ai ) should be dom (Ai ) = cdom
(Ai ) ∪ fdom (Ai ), where cdom (Ai ) and fdom (Ai ) denote the sets of crisp values and
fuzzy subsets, respectively. Let Ai and Aj be attributes of C 1 and C 2 , respectively.
The inclusion degree of dom (Aj ) with respect to dom (Ai ) is denoted by ID (dom
(Ai ), dom (Aj )). Then we identify the following cases and investigate the evaluation
of ID (dom (Ai ), dom (Aj )):
(a) when i /= j and 1 ≤ i, j ≤ k, ID (dom (Ai ), dom (Aj )) = 0,
(b) when i = j and 1 ≤ i, j ≤ k, ID (dom (Ai ), dom (Aj )) = 1, and
(c) when i = j and k + 1 ≤ i, j ≤ m, ID (dom (Ai ), dom (Aj )) = ID (dom (Ai ), dom
(A' i )) = max (ID (dom (Ai ), cdom (A' i )), ID (dom (Ai ), fdom (A' i ))).
Now we respectively define ID (dom (Ai ), cdom (A' i )) and ID (dom (Ai ), fdom
(A i )). Let fdom (A' i ) = {f 1 , f 2 , …, f m }, where f j (1 ≤ j ≤ m) is a fuzzy value, and
'

cdom (A' i ) = {c1 , c2 , …, ck }, where cl (1 ≤ l ≤ k) is a crisp value. We can consider

{c1 , c2 , …, ck } as a special fuzzy value {1.0/c1 , 1.0/c2 , …, 1.0/ck }. Then we have
the following:
( ( ))
ID dom(Ai ), cdom A'i = ID(dom(Ai ), {1.0/c1 , 1.0/c2 , . . . , 1.0/ck }).
( ( )) ( ( ))
ID dom(Ai ), f dom A'i = max j ID dom(Ai ), f j .

Based on the inclusion degree of attribute domains of the subclass with respect
to the attribute domains of its superclass as well as the weight of attributes, we can
define the formula to calculate the degree to which a fuzzy class is a subclass of
another fuzzy class. Let C 1 and C 2 be (fuzzy) classes with attributes {A1 , A2 , …,
Ak , Ak+1 , …, Am } and {A1 , A2 , …, Ak , A' k+1 , …, A' m , Am+1 , …, An }, respectively,
and w (A) denote the weight of attribute A. Then the degree that C 2 is the subclass
of C 1 , written μ (C 1 , C 2 ), is defined as follows.
Σm
I D(dom(Ai (C1 )), dom(Ai (C2 ))) × w(Ai )
μ(C1 ,C2 ) = i=1
Σm
i=1 w(Ai )

In subclass-superclass hierarchies, a critical issue is multiple inheritance of class.

Ambiguity arises when more than one of the superclasses have common attributes
and the subclass does not declare explicitly the class from which the attribute was
inherited.
Let class C be a subclass of classes C 1 and C 2 . Assume that the attribute Ai in
C 1 , denoted by Ai (C 1 ), is common to the attribute Ai in C 2 , denoted by Ai (C 2 ). If
dom (Ai (C 1 )) and dom (Ai (C 2 )) are identical, there does not exist a conflict in the
multiple inheritance hierarchy and C inherits attribute Ai directly. If dom (Ai (C 1 ))
2.5 Fuzzy XML Model 53

and dom (Ai (C 2 )) are not identical, however, the conflict occurs. At this moment,
which one in Ai (C 1 )) and Ai (C 2 ) is inherited by C dependents on the following
rule:

I f I D(dom(Ai (C1 )), dom(Ai (C2 ))) × w(Ai (C1 )) > I D(dom(Ai (C2 )), dom(Ai
(C1 ))) × w(Ai (C2 )), then Ai (C1 ) is inherited by C, else Ai (C2 ) is inherited by C.

Note that in fuzzy multiple inheritance hierarchy, the subclass has different degrees
with respect to different superclasses, not being the same as the situation in classical
object-oriented database systems.

2.5 Fuzzy XML Model

With the wide utilization of the Web and the availability of huge amounts of elec-
tronic data, information representation and exchange over the Web becomes impor-
tant, and eXtensible Markup Language (XML) has been the de facto standard (Bray
et al., 2000). XML and related standards are technologies that allow the easy develop-
ment of applications that exchange data over the Web such as e-commerce (EC) and
supply chain management (SCM). Unfortunately, although it is the current standard
for data representation and exchange over the Web, XML is not able to represent
and process imprecise and uncertain data. In fact, the fuzziness in EC and SCM has
received considerable attentions and fuzzy set theory has been used to implement
web-based business intelligence. Therefore, topics related to the modeling of fuzzy
data can be considered very interesting in the XML data context. Regarding modeling
fuzzy information in XML, Turowski and Weng (2002) extended XML DTDs with
fuzzy information to satisfy the need of information exchange. Lee and Fanjiang
(2003) studied how to model imprecise requirements with XML DTDs and devel-
oped a fuzzy object-oriented modeling technique schema based on XML. Ma and Yan
(2007) and Ma (2005a, 2005b) proposed a fuzzy XML model for representing fuzzy
information in XML documents. Tseng et al. (2005) presented an XML method to
represent fuzzy systems for facilitating collaborations in fuzzy applications. More-
over, aimed at modeling fuzzy information in XML Schemas, Gaurav and Alhajj
(2006) incorporated fuzziness in an XML document extending the XML Schema
associated to the document and mapped fuzzy relational data into fuzzy XML. In
detail, Oliboni and Pozzani (2008) proposed an XML Schema definition for repre-
senting different aspects of fuzzy information. Kianmehr et al. (2010) described a
fuzzy XML schema model for representing a fuzzy relational database. In addition,
XML with incomplete information (Abiteboul et al., 2006) and probabilistic data in
XML (Nierman & Jagadish, 2002; Senellart & Abiteboul, 2007) were presented in
research papers.
54 2 Fuzzy Sets and Fuzzy Database Modeling

2.5.1 Fuzziness in XML Documents

The fuzziness in an XML document is similar with the fuzziness in a relational

database. There may be two kinds of fuzziness occur in a fuzzy XML document: one
is the fuzziness in elements, the other is the fuzziness in attribute values of elements.
1. the fuzziness in elements: this kind of fuzziness use membership degrees associ-
ated with elements. The membership degree associated with an element repre-
sents the possibility of this element (including itself and the sub-elements rooted
at it) belonging to its parent element.
Now let us interpret what a membership degree associated with an element means,
given that the element can nest under other elements, and more than one of these
elements may have an associated membership degree. The existential membership
degree associated with an element should be the possibility that the state of the
world includes this element and the sub-tree rooted at it. For an element with the
sub-tree rooted at it, each node in the sub-tree is not treated as independent but
dependent upon its root-to-node chain. Each possibility in the source XML document
is assigned conditioned on the fact that the parent element exists certainly. In other
words, this possibility is a relative one based upon the assumption that the possibility
the parent element exists is exactly 1.0. In order to calculate the absolute possibility,
we must consider the relative possibility in the parent element. In general, the absolute
possibility of an element e can be obtained by multiplying the relative possibilities
found in the source XML along the path from e to the root. Of course, each of these
relative possibilities will be available in the source XML document. By default,
relative possibilities are therefore regarded as 1.0.
Consider a chain X → Y → Z from the root node X. Assume that the source
XML document contains the relative possibilities Poss (Z|Y ), Poss (Y|X), and Poss
(X), associated with the nodes Z, Y, and X, respectively.
Then we have

Poss(Y ) = Poss(Y |X ) × Poss(X )

Poss(Z ) = Poss(Z |Y ) × Poss(Y |X ) × Poss(X )

Here, Poss (Z|Y ), Poss (Y|X) and Poss (X) can be obtained from the source XML
document.
2. the fuzziness in attribute values of elements: this kind of fuzziness use possibility
distributions to represent the values of the attributes. Furthermore, attributes are
classified into two types:
(a) single value attributes: some data items are known to have a single unique
value, e.g., the age of a person in years is a unique integer, and if such a value
is unknown so far, we can use the following possibility distribution: {21/0.4,
23/0.5, 25/0.8, 26/0.9, 27/0.6, 28/0.5, 29/0.3}. This is called disjunctive
possibility distribution.
2.5 Fuzzy XML Model 55

(b) multiple value attributes: XML restricts attributes to a single value, but it
is often the case that some data item is known to have multiple values-
these values may be unknown completely and can be specified with a possi-
bility distribution. For example, the e-mail address of a person may be
multiple character strings because he or she has several e-mail addresses
available simultaneously. In case we do not have complete knowledge of
the e-mail address for Tom Smith, we may say that the e-mail address may be
“[email protected]” with possibility 0.60, and “[email protected]”
with possibility 0.85. This is called conjunctive possibility distribution.
For ease of understanding, we interpret the above two kinds of fuzziness with a
simple fuzzy XML document d 1 in Fig. 2.1. In Fig. 2.1, we talk about the universities
in an area of a given city, say, Detroit, Michigan, in the USA.

(a) Wayne State University is located in downtown Detroit, and thus the possibility
that it is included in the universities in Detroit is 1.0. For pair < Val Poss = 1.0
> … < /Va > is omitted (see Lines 50–51).
(b) Oakland University, however, is located in a nearby county of Michigan, named
Oakland. Whether Oakland University is included in the universities in Detroit
depends on how to define the area of Detroit, the Greater Detroit Area or only
the City of Detroit. Assume that it is unknown and the possibility that Oakland
University is included in the universities in Detroit is assigned 0.8 (see Line 3).
The cases 1–2 are the fuzziness in elements. The degree associated with such an
element represents the possibility that a university is included in universities in
Detroit.
(c) For the student Tom Smith, if his age is unknown so far, i.e., he has fuzzy value
in the attribute age. Since age is known to have a single unique value, we can
use the disjunctive possibility distribution to represent such value (see Lines
23–35).
(d) The e-mail address of Tom Smith may be multiple character strings because he
has several e-mail addresses simultaneously. If we do not know his exact e-mail
addresses, and we use the conjunctive possibility distribution to represent such
information and may say that the e-mail address may be “[email protected]”
with possibility 0.6 and “[email protected]” with possibility 0.45 (see Lines 37–
45). Note that, the cases 3–4 are the fuzziness in attribute values of elements.
In an XML document, it is often the case that some values of attributes may be
unknown completely and can be specified with possibility distributions.

2.5.2 Fuzzy XML Representation Models and Formalizations

In the following, we introduce fuzzy XML representation models, including the

representation of fuzzy data in the XML document, and two fuzzy XML document
structures fuzzy DTD and fuzzy XML Schema.
56 2 Fuzzy Sets and Fuzzy Database Modeling

Fig. 2.1 A fragment of a fuzzy XML document d1

2.5 Fuzzy XML Model 57

2.5.2.1 Representation of Fuzzy Data in XML Document

In order to represent the fuzzy data in XML documents, it is shown in the previous
part that several fuzzy constructs (such as Poss, Val and Dist) are introduced.
It is not difficult to see from the example given above that a possibility attribute,
denoted Poss, should be introduced first, which takes a value in [0, 1]. This possibility
attribute is applied together with a fuzzy construct called Val to specify the possibility
of a given element existing in the fuzzy XML document (see Line 3 in Fig. 2.1).
Another fuzzy construct called Dist, which specifies a possibility distribution,
is introduced. Based on pair <Val Poss> and </Val>, possibility distribution for an
element can be expressed. Also, possibility distribution can be used to express fuzzy
element values. For this purpose, we introduce another fuzzy construct called Dist to
specify a possibility distribution. Typically, a Dist element has multiple Val elements
as children, each with an associated possibility. Since we have two types of possibility
distribution, the Dist construct should indicate the type of a possibility distribution
being disjunctive or conjunctive (see Lines 24–34 and Lines 38–44 in Fig. 2.1).
Again consider Fig. 2.1. Lines 24–34 are the disjunctive Dist construct for the age
of student “Tom Smith”. Lines 38–44 are the conjunctive Dist construct for the email
of student “Tom Smith”. It should be noted, however, that the possibility distributions
in Lines 24–34 and Lines 38–44 are all for leaf nodes in the ancestor– descendant
chain. In fact, we can also have possibility distributions and values over non-leaf
nodes. Observe the disjunctive Dist construct in Lines 6–19, which express the two
possible statuses for the employee with ID 85,431,095. In these two employee values,
Lines 7–12 are with possibility 0.8, and Lines 13–18 are with possibility 0.6.
The structure of an XML document can be described by Document Type Definition
(DTD) or XML Schema (Antoniou and van Harmelen 2004). A DTD, which defines
the valid elements and their attributes and the nesting structures of these elements in
the instance documents, is used to assert the set of “rules” that each instance document
of a given document type must conform to. XML Schemas provide a much more
powerful means than DTDs by which to define your XML document structure and
limitations. It has been shown that the XML document must be extended for fuzzy
data modeling. As a result, several fuzzy constructs have been introduced.

2.5.2.2 DTD Modification

In order to accommodate these fuzzy constructs, it is clear that the DTD of the source
XML document should be correspondingly modified. In this section, we focus on
DTD modification (i.e., fuzzy DTD) for representing the structure of the fuzziness
in XML document as introduced in Sect. 2.5.1.
Firstly we define the basic elements in a fuzzy DTD as follows:
58 2 Fuzzy Sets and Fuzzy Database Modeling

<!ELEMENT element 1 (element 2 ?, *, +)>

//element 1 contains element 2 , and the appearance times of element 2 are
restricted by the cardinalities: ? denotes 0 or 1 time; * denotes 0 or n times; +
denotes 1 or n times; No cardinality operator means exactly once
<!ELEMENT element 1 (element 2 , element 3 , …)>
//element1 contains element 2 , element 3 , … in order
<!ELEMENT element 1 (element 2 |element 3 ,|…)>
//element 1 contains either element 2 or element 3 , …
<!ELEMENT element 1 (#PCDATA)>
//#PCDATA, which is the only atomic type for elements, denotes element 1
may have any content
<!ELEMENT element 1 (empty)>
//element 1 is an empty element
Moreover, the attributes of an element element i can be represented as follows:
<!ELEMENT element i …>
<!ATTLIST element i AttName AttType ValType>
Here AttName is the name of the attribute, AttType is the type of the attribute, and
ValType is the value type which can be #REQUIRED, #IMPLIED, #FIXED “value”,
and “value” (Antoniou & van Harmelen, 2004).
Then, we define Val and Dist elements as follows:
<!ELEMENT Val (#PCDATA|basic_definition)>
<!ATTLIST Val Poss CDATA “1.0”>
//basic_definition represents any case of the basic element definitions above
<!ELEMENT Dist (Val+)>
<!ATTLIST Dist type (disjunctive|conjunctive) “disjunctive”>
Finally, based on the Val and Dist elements, we modify the basic element defi-
nitions above so that all of the elements can use possibility distributions (Dist). In
summary, the basic elements can be classified into two types, i.e., the leaf element
and the non-leaf element:
• for the leaf element which only contains #PCDATA, say leafElement, its definition
is modified from <!ELEMENT leafElement (#PCDATA)> to

<!ELEMENT leafElement (#PCDATA|Dist)>

That is, a leaf element may be fuzzy and takes a value represented by a possibility
distribution.
• for the non-leaf element which contains the other elements, say nonleafElement,
its definition is modified from <!ELEMENT nonleafElement (basic_definition)>
to

<!ELEMENT nonleafElement (basic_definition|Val+|Dist)>.

2.5 Fuzzy XML Model 59

That is, a non-leaf element may be crisp, e.g., student in Fig. 2.1, and thus the non-leaf
element student can be defined as
<!ELEMENT student (sname?, age?, sex?, email?)>.

Also, a non-leaf element may be fuzzy and takes a value represented by a possibility
distribution. We differentiate two cases: the first one is the element takes a value
connected with a possibility degree, e.g., university in Fig. 2.1, which can be defined
as

<!ELEMENT university (Val+)>

and the second one is the element takes a set of values and each value is connected
with a possibility degree, e.g., age of student in Fig. 2.1, which can be defined as

<!ELEMENT age (Dist)>

Based on the above modified fuzzy DTD definitions, Fig. 2.2 gives the fuzzy DTD
D1 w.r.t. the fuzzy XML document d 1 in Fig. 2.1.

2.5.2.3 Fuzzy XML Schema

In the following, we define the XML Schema modification (i.e., fuzzy XML Schema)
for representing the structure of the fuzziness in XML document as introduced in
Sect. 2.5.1.
First, we define Val element as follows:
<xs: element name=“Val” type= “valtype”/>
<xs:complexType name=“valtype”>
<xs:sequence>
<xs:element name=“original-definition” minOccurs=“0” maxOccurs=
“unbounded”/>
<xs:attribute name=“Poss” type=“xs:fuzzy” minOccurs=“0” maxOccurs=
“unbounded” default=“1.0”/>
</xs:sequence>
</xs:complexType>
Then we define Dist element as follows:
<xs:element name= “Dist” type= “disttype”/>
<xs:complexType name= “disttype”>
<xs:element name=“Val” type=“valtype” minOccurs=”1” maxOccurs=
“unbounded”/>
<xs:attribute values=“disjunctive conjunctive” default=“disjunctive”/>
</xs:complexType>
60 2 Fuzzy Sets and Fuzzy Database Modeling

Fig. 2.2 The fuzzy DTD D1 w.r.t. the fuzzy XML document d1 in Fig. 2.1

Now we modify the element definition in the classical Schema so that all of the
elements can use possibility distributions (Dist). For a sub-element that only contains
leaf elements, its definition in the Schema is as follows.
<xs:element name=”leafElement” type=”leafelementtype”/>
<xs:complexType name=”leafelementtype”>
<xs:sequence>
<xs:element name=”original-definition” type=“xs:type” minOccurs=“0”
maxOccurs=“unbounded”/>
<xs:element name=“Dist” type=“disttype” minOccurs=“0” maxOccurs=
“unbounded”/>
</xs:sequence>
</xs:complexType>
2.5 Fuzzy XML Model 61

For an element that contains leaf elements without any fuzziness, its definition in
the Schema is as follows.
<xs:element name=“original-definition” type=“xs:type” minOccurs=“0”
maxOccurs=“unbounded”/>
For an element that contains leaf elements with fuzziness, its definition in the
Schema is as follows.
<xs:element name=“leafElement” type=“leafelementtype”/>
<xs:complexType name=“leafelementtype”>
<xs: element name= “Dist” type= “disttype” minOccurs= “0” maxOccurs=
“unbounded”/>
</xs:complexType>
For a sub-element that does not contain any leaf elements, its definition in the
Schema is as follows.
<xs: element name= “nonleafElement” type= “nonleafelementtype”/>
<xs: complexType name= “nonleafelementtype”>
<xs: sequence>
<xs:element name=“original-definition” type=“xs:type” minOccurs=“0”
maxOccurs=“unbounded”/>
<xs:element name=“Dist” type=“disttype” minOccurs=“0” maxOccurs=
“unbounded”/>
<xs:element name=“Val” type=“valtype” minOccurs=“0” maxOccurs=
“unbounded”/>
</xs: sequence>
</xs: complexType>
For an element that does not contain leaf elements without any fuzziness, its
definition in the Schema is as follows.
<xs: element name= “nonleafElement” type= “nonleafelementtype”/>
<xs: complexType name= “nonleafelementtype”>
<xs: element name= “original-definition” type= “xs:type” minOccurs=“0”
maxOccurs=“unbounded”/>
</xs: complexType>
For a sub-element that does not contain leaf elements but a fuzzy value, its
definition in the Schema is as follows.
<xs:element name=“nonleafElement” type=“nonleafelementtype”/>
<xs:complexType name=“nonleafelementtype”>
<xs:element name=“Val” type=“valtype” minOccurs=“0” maxOccurs=
“unbounded”/>
</xs:complexType>
For a sub-element that does not contain leaf elements but a set of fuzzy values,
its definition in the Schema is as follows.
62 2 Fuzzy Sets and Fuzzy Database Modeling

<xs:element name=“nonleafElement” type=“nonleafelementtype”/>

<xs:complexType name=“nonleafelementtype”>
<xs:element name=“Dist” type=“disttype” minOccurs=“0” maxOccurs=
“unbounded”/>
</xs:complexType>
The fuzzy XML Schema w.r.t. the fuzzy XML document in Fig. 2.1 is shown as
follows:
<? xml version= “1.0”?>
<xs:schema xmlns:xs=“http://www.w3.org/2001/XMLSchema”>
<xs:element name=“universities”>
<xs:complexType>
<xs:element name=“university” type=“universityype” minOccurs=“0” maxOc-
curs=“unbounded”/>
</xs:complexType> </xs:element> <xs:complexType name=“universityype”>
<xs:element name=“Val” type=“valtype” minOccurs=“1” maxOc-
curs=“unbounded”/>
<xs:attribute name=“UName” type=“xs:IDREF” use=“REQUIRED”/>
</xs:complexType> <xs:complexType name=“valtype”>
<xs:sequence>
<xs:element name=“department” type=“worktype” minOccurs=“0” maxOc-
curs=“unbounded”/>
<xs:element name=“fname” type=“xs:string” minOccurs=“0” maxOc-
curs=“1”/>
<xs:element name=“position” type=“xs:string” minOccurs=“0” maxOc-
curs=“1”/>
<xs:element name=“office” type=“xs:string” minOccurs=“0” maxOc-
curs=“1”/>
<xs:element name=“course” type=“xs:string” minOccurs=“0” maxOc-
curs=“1”/> </xs:sequence> <xs:attribute name=“Poss” type=“xs:fuzzy” minOc-
curs=“0” maxOccurs=“unbounded” default=“1.0”/>
</xs:complexType>
<xs:complexType name=“worktype”>
<xs:sequence>
<xs:element name=“employee” type=“employeetype” minOccurs=“0” maxOc-
curs=“unbounded”/>
<xs:element name=“student” type=“studenttype” minOccurs=“0” maxOc-
curs=“unbounded”/>
</xs:sequence>
<xs:attribute name=“DName” type=“xs:IDREF” use=“REQUIRED”/>
</xs:complexType>
<xs:complexType name=“employeetype”> <xs:element name=“Dist”
type=“disttype”>
<xs:attribute name=“FID” type=“xs:IDREF” use=“REQUIRED”/>
</xs:complexType>
2.5 Fuzzy XML Model 63

<xs:complexType name=“disttype”>
<xs:element name=“Val” type=“valtype” minOccurs=“1” maxOc-
curs=“unbounded”/>
<xs:attribute values=“disjunctive conjunctive” default=“disjunctive”/>
</xs:complexType> <xs:complexType name=“studenttype”> <xs:sequence>
<xs:element name=“sname” type=“xs:string” minOccurs=“0” maxOc-
curs=“1”/>
<xs:element name=“age” type=“agetype” minOccurs=“0” maxOccurs=“1”/>
<xs:element name=“sex” type=“xs:string” minOccurs=“0” maxOccurs=“1”/>
<xs:element name=“email” type=“emailtype” minOccurs=“0” maxOc-
curs=“1”/>
</xs:sequence>
<xs:attribute name=“SID” type=“xs:IDREF” use=“REQUIRED”/>
</xs:complexType>
<xs:complexType name=“agetype”>
<xs:element name=“Dist” type=“disttype”/>
</xs:complexType>
<xs:complexType name=“emailtype”>
<xs:element name=“Dist” type=“disttype”/>
<xs:attribute values=“conjunctive”/>
</xs:complexType>
</xs:schema>

2.5.2.4 Formalization of Fuzzy XML Models

Being similar to the classical XML document, a fuzzy XML document can be intu-
itionally seen as a syntax tree. Figure 2.3 shows a fragment of the fuzzy XML
document d 1 in Fig. 2.1 and its tree representation.
Based on the tree representation of the fuzzy XML document, in the following
we define the formalization of fuzzy XML models in Ma et al. (2010), Zhang et al.
(2013). It can be found from Fig. 2.2 that a fuzzy DTD is made up of element
type definitions, and each element may have associated attributes. Each element
type definition has the form E → (α, A), where E is the defined element type (e.g.,
university and student), α, called the content model such as university (UName, Val
+ ), and A are attributes of E.
For the sake of simplicity, we assume that the symbol T denotes the atomic types of
elements and attributes such as #PCDATA and CDATA, E denotes the set of elements
including the basic elements (e.g., university and student) and the special elements
(e.g., Val and Dist), A denotes the set of attributes, and S = T ∪ E.
A fuzzy DTD D is a pair (P, r), where P is a set of element type definitions, and r [
E is the root element type, which uniquely identifies a fuzzy DTD. Each element type
definition has the form E → (α, A), constructed according to the following syntax:
64 2 Fuzzy Sets and Fuzzy Database Modeling

Fig. 2.3 A fragment of the fuzzy XML document and its tree representation

α ::= S|empty|(α1 |α2 )|(α1 , α2 )|α?|α ∗ |α + |any

A ::= empty|(AN, AT, VT)

Here:
1. S = T ∪ E; empty denotes the empty string; “|” denotes union, and “,” denotes
concatenation; α can be extended with cardinality operators “?”, “*”, and “+”,
where “?” denotes 0 or 1 time, “*” denotes 0 or n times, and “+” denotes 1 or n
times; the construct any stands for any sequence of element types defined in the
fuzzy DTD;
2. AN ∈ A denotes the attribute names of the element E; AT denotes the attribute
types; and VT is the value types of attributes which can be #REQUIRED,
#IMPLIED, #FIXED “value”, “value”, and disjunctive/conjunctive possibility
distribution.
The formal definition of fuzzy XML Schemas can be analogously given following
the procedure above. Next, we give a formal definition of the fuzzy XML documents.
A fuzzy XML document d over a fuzzy DTD D is a tuple d = <N, <, λ, η, r),
where:
• N is a set of nodes in a fuzzy XML document tree.
• < denotes the child relation between nodes, i.e., for two nodes vi , vj ∈ N, if vi <
vj , then vi is the parent node of vj .
• λ: N → E ∪ A is a labeling function for distinguishing elements and attributes
(where E and A are the sets of elements and attributes in the fuzzy DTD, and
attributes are preceded by a “@” to distinguish them from elements) such that if
2.6 Summary 65

λ (v) = e ∈ E, we say that v is an element type; if λ (v) = @a ∈ A, then v is an

attribute @a.
• η: N × N → dom is a function for mapping attributes to values (where dom is a
set of values satisfying the constraints of attribute value types in the fuzzy DTD)
such that for each pair nodes vi , vj ∈ N with vi < vj , if λ(vj ) = @aj ∈ A, then η(vi ,
vj ) = d j ∈ dom. In particular, if λ(vj ) = e ∈ N is a leaf element node E, then η(vi ,
vj ) = d j ∈ dom.
• r is the root node of a fuzzy XML document tree.

In the following, we further give the formalization of fuzzy XML data models,
which is defined based on the characteristic of the tree structure of fuzzy XML data
models as mentioned above.
In short, the basic structure of a fuzzy XML data model is a tree. Let N be a
finite set (of vertices), E ∈ N × N be a set (of edges) and λ: E → L be a mapping
from edges to a set L of strings called labels. The triple G = (N, E, λ) is an edge
labeled directed graph. It should be noted that the tree structure only briefly describes
the characteristic of fuzzy XML data models, and ignores a number of fuzzy XML
features. Here, we further provide a more detailed formal definition of fuzzy XML
tree.
A fuzzy XML tree t can be a tuple t = (N, σ, λ, η, ρ, γ , ∝):
• N = {N 1 , …, N n } is a set of vertices.
• σ ⊂ {(N i , N j ) | N i , N j ∈ N}, (N, σ ) is a directed tree.
• λ: N → (L [ {NULL}), where L a set of strings called labels. For n ∈ N and l ∈
L, λ(n, l) specifies the set of objects that may be children of n with label l.
• η → T, where T is a set of fuzzy XML types (Oliboni & Pozzani, 2008).
• ρ is a mapping from the set of objects v ∈ V to local possibility functions. It defines
the possibility of a set of children of an object existing given that the parent object
exists.
• γ associates with n ∈ N, each label l ∈ L, and an integer-valued interval function,
i.e., γ (n, l) = [min, max]. γ is used to define the cardinality constrains of children
with a label.
• ∝ is a possibly empty partial order on N. Here, a relation “∝” is a partial order
on a set N if the following three characteristics hold: (1) reflexivity: θ ∝ θ for
all θ ∈ N; (2) antisymmetry: θ ∝ ω and ω ∝ θ implies ω = θ; (3) transitivity:
θ ∝ ω and ω ∝ ε implies θ ∝ ε.

2.6 Summary

In real-world applications, information is often imprecise or uncertain. For modelling

fuzzy information in the area of databases, Zadeh’s fuzzy logic (Zadeh, 1965) is
introduced into databases to enhance the classical databases such that uncertain
66 2 Fuzzy Sets and Fuzzy Database Modeling

and imprecise information can be represented and manipulated. This resulted in

numerous contributions, mainly with respect to the popular fuzzy database models.
In this chapter, we first introduce the fundamentals of fuzzy sets and fuzzy graphs,
which are required for this book. Then, we introduce several popular fuzzy database
models, including fuzzy XML model, fuzzy relational database model and fuzzy
object-oriented database model. Based on the widespread studies and the relatively
mature techniques of fuzzy databases, it is not surprising that fuzzy databases have
been the key means for providing some technique supports for managing fuzzy data.
Recent years have witnessed many new application perspectives such as Big Data
and artificial intelligence (AI). As a result, some new data models are emerging
beyond the traditional data models. Clearly it is not enough for the existing fuzzy
data models and their extensions to represent necessary data semantics. It is essential
to invent some new fuzzy data models. Being one kind of special graph data model,
RDF recommended by the W3C is finding more and more uses in a wide range of
semantic data management scenarios. On this basis, topics related to the modelling
of fuzzy data can be considered very interesting in RDF as will be introduced in the
following chapter.

References

Abiteboul, S., Segoufin, L., & Vianu, V. (2006). Representing and querying XML with incomplete
information. ACM Transactions on Database Systems (TODS), 31(1), 208–254.
Antoniou, G., & van Harmelen, F. (2004). A Semantic Web Primer. MIT Press.
Berzal, F., Marín, N., Pons, O., & Vila, M. A. (2007). Managing fuzziness on conventional object-
oriented platforms. International Journal of Intelligent Systems, 22(7), 781–803.
Bordogna, G., & Pasi, G. (2001). Graph-based interaction in a fuzzy object-oriented database.
International Journal of Intelligent Systems, 16, 821–841.
Bosc, P., & Prade, H. (1993). An introduction to fuzzy set and possibility theory based approaches
to the treatment of uncertainty and imprecision in database management systems. In Proceedings
of the Second Workshop on Uncertainty Management in Information Systems: From Needs to
Solutions.
Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F., & Cowan, J. (2000). Extensible
markup language (XML) 1.0.
Buckles, B., & Petry, F. (1982). A fuzzy representation for relational databases. Fuzzy Sets and
Systems, 7, 213–226.
Cao, T. H., & Rossiter, J. M. (2003). A deductive probabilistic and fuzzy object-oriented database
language. Fuzzy Sets and Systems, 140, 129–150.
Chaudhry, N., Moyne, J., & Rundensteiner, E. A. (1999). An extended database design methodology
for uncertain data management. Information Sciences, 121(1–2), 83–112.
Chen, G. Q. (1999). Fuzzy Logic in Data Modeling; Semantics, Constraints, and Database Design.
Kluwer Academic Publisher.
Chen, G. Q., Vandenbulcke, J., & Kerre, E. E. (1992). A general treatment of data redundancy
in a fuzzy relational data model. Journal of the American Society of Information Science, 43,
304–311.
Codd, E. F. (1986). Missing information (applicable and inapplicable) in relational databases.
SIGMOD Record, 15, 53–78.
References 67

Codd, E. F. (1987). More commentary on missing information in relational databases (applicable

and inapplicable information). SIGMOD Record, 16(1), 42–50.
Cross, V. (1996, September). Towards a unifying framework for a fuzzy object model. In Proceedings
of IEEE 5th International Fuzzy Systems (Vol. 1, pp. 85–92). IEEE.
Cross, V. (2001). Fuzzy extensions for relationships in a generalized object model. International
Journal of Intelligent Systems, 16, 843–861.
Cross, V., Caluwe, R., & Vangyseghem, N. (1997). A perspective from the fuzzy object data
management group (FODMG). In Proceedings of Fuzzy Systems (pp. 721–728).
Cross, V., & Firat, A. (2000). Fuzzy objects for geographical information systems. Fuzzy Sets and
Systems, 113(1), 19–36.
Cubero, J. C., Marín, N., Medina, J. M., Pons, O., & Vila, M. A. (2004). Fuzzy object management
in an object-relational framework. In Proceedings of the 10th International Conference on Infor-
mation Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU’2004
(pp. 1767–1774).
Dalvi, N., & Suciu, D. (2007). Management of probabilistic data: foundations and challenges.
In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database
Systems (pp. 1–12).
De, S. K., Biswas, R., & Roy, A. R. (2001). On extended fuzzy relational database model with
proximity relations. Fuzzy Sets and Systems, 117, 195–201.
de Caluwe, R. (1998). Fuzzy and Uncertain Object-Oriented Databases: Concepts and Models.
World Scientific Pub Co.
de Tré, G., & de Caluwe, R. (2003). Level-2 fuzzy sets and their usefulness in object-oriented
database modeling. Fuzzy Sets and Systems, 140, 29–49.
Dubois, D., Prade, H., & Rossazza, J. P. (1991). Vagueness, typicality, and uncertainty in class
hierarchies. International Journal of Intelligent Systems, 6, 167–183.
Fan, T., Yan, L., & Ma, Z. (2019). Mapping fuzzy RDF (S) into fuzzy object-oriented databases.
International Journal of Intelligent Systems, 34(10), 2607–2632.
Fan, T., Yan, L., & Ma, Z. (2020). Storing and querying fuzzy RDF (S) in HBase databases.
International Journal of Intelligent Systems, 35(4), 751–780.
Galindo, J. (2008). Handbook of Research on Fuzzy Information Processing in Databases. IGI
Global.
Gaurav, A., & Alhajj, R. (2006). Incorporating fuzziness in XML and mapping fuzzy relational data
into fuzzy XML. In Proceedings of the 2006 ACM Symposium on Applied Computing (pp. 456–
460).
George, R., Srikanth, R., Petry, F. E., & Buckles, B. P. (1996). Uncertainty management issues in
the object-oriented data model. IEEE Transaction on Fuzzy Systems, 4(2), 179–192.
Gottlob, G., & Zicari, R. (1988). Closed world databases opened through null values. In Proceedings
of the 1988 International Conference on Very Large Data Bases (pp. 50–61).
Grant, J. (1979). Partial values in a tabular database model. Information Processing Letters, 9(2),
97–99.
Gyseghem, N. V., & Caluwe, R. D. (1995). Fuzzy behavior and relationships in a fuzzy OODB-
model. In Proceedings of the Tenth Annual ACM Symposium on Applied Computing, Nashville,
TN (pp. 503–507).
Holub, J., & Melichar, B. (1998). Implementation of Nondeterministic Finite Automata for
Approximate Pattern Matching. Third International Workshop on Implementing Automata
(pp. 92–99).
Kauffmann, A. (1973). Introduction to the Theory of Fuzzy Sets. Academic Press Inc.
Kianmehr, K., Özyer, T., Lo, A., Jida, J., Jiwani, A., Alimohamed, Y., Spence, K., & Alhajj, R.
(2010). Human centric data representation: from fuzzy relational databases into fuzzy XML. In
Soft Computing in XML Data Management (pp. 55–77). Springer.
Koyuncu, M., & Yazici, A. (2003). IFOOD: an intelligent fuzzy object-oriented database
architecture. IEEE Transaction on Knowledge and Data Engineering, 15(5), 1137–1154.
68 2 Fuzzy Sets and Fuzzy Database Modeling

Lee, J., & Fanjiang, Y. (2003). Modeling imprecise requirements with XML. Information and
Software Technology, 45(7), 445–460.
Lee, J., Xue, N. L., Hsu, K. H., & Yang, S. J. H. (1999). Modeling imprecise requirements with
fuzzy objects. Information Sciences, 118, 101–119.
Li, G., Yan, L., & Ma, Z. (2019a). An approach for approximate subgraph matching in fuzzy RDF
graph. Fuzzy Sets and Systems, 376, 106–126.
Li, G., Yan, L., & Ma, Z. (2019b). A method for fuzzy quantified querying over fuzzy resource
description framework graph. International Journal of Intelligent Systems, 34(6), 1086–1107.
Li, G., Yan, L., & Ma, Z. (2019c). Pattern match query over fuzzy RDF graph. Knowledge-Based
Systems, 165, 460–473.
Ma, Z. M. (2005a). Advances in Fuzzy Object-Oriented Databases: Modeling and Applications.
Idea Group Publishing.
Ma, Z. M. (2005b). Fuzzy Database Modeling with XML (The Kluwer International Series on
Advances in Database Systems). Springer-Verlag.
Ma, Z., Li, G., & Yan, L. (2018). Fuzzy data modeling and algebraic operations in RDF. Fuzzy Sets
and Systems, 351, 41–63.
Ma, Z. M., Liu, J., & Yan, L. (2010). Fuzzy data modeling and algebraic operations in XML.
International Journal of Intelligent Systems, 25(9), 925–947.
Ma, Z. M., & Mili, F. (2002). Handling fuzzy information in extended possibility-based fuzzy
relational databases. International Journal of Intelligent Systems, 17(10), 925–942.
Ma, Z., & Yan, L. (2016). Modeling fuzzy data with XML: A survey. Fuzzy Sets and Systems, 301,
146–159.
Ma, Z. M., & Yan, L. (2007). Fuzzy XML data modeling with the UML and relational data models.
Data & Knowledge Engineering, 63(3), 970–994.
Ma, Z. M., & Yan, L. (2008). A literature overview of fuzzy database models. Journal of Information
Science and Engineering, 24(1), 189–202.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2000). Semantic measure of fuzzy data in extended possibility-
based fuzzy relational databases. International Journal of Intelligent Systems, 15, 705–716.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2004). Extending object-oriented databases for fuzzy
information modeling. Information Systems, 29(5), 421–435.
Majumdar, A. K., Bhattacharya, I., & Saha, A. K. (2002). An object-oriented fuzzy data model for
similarity detection in image databases. IEEE Transactions on Knowledge and Data Engineering,
14, 1186–1189.
Manolis, N., & Tzitzikas, Y. (2011). Interactive exploration of fuzzy RDF knowledge bases. In
Extended Semantic Web Conference (pp. 1–16). Springer.
Marín, N., Pons, O., & Vila, M. A. (2001). A strategy for adding fuzzy types to an object oriented
database system. International Journal of Intelligent Systems, 16, 863–880.
Marín, N., Vila, M. A., & Pons, O. (2000). Fuzzy types: A new concept of type for managing vague
structures. International Journal of Intelligent Systems, 15, 1061–1085.
Mathew, S., Mordeson, J. N., & Malik, D. S. (2018). Fuzzy Graph Theory (Vol. 363). Springer
International Publishing.
Nam, M., Ngoc, N. T. B., Nguyen, H., & Cao, T. H. (2007). FPDB40: A fuzzy and probabilistic
object base management system. Proceedings of the FUZZ-IEEE, 2007, 1–6.
Ndouse, T. D. (1997). Intelligent systems modeling with reusable fuzzy objects. International
Journal of Intelligent Systems, 12, 137–152.
Nierman, A., & Jagadish, H. V. (2002). ProTDB: Probabilistic data in XML. In VLDB’02: Proceed-
ings of the 28th International Conference on Very Large Databases (pp. 646–657). Morgan
Kaufmann.
Oliboni, B., & Pozzani, G. (2008). Representing fuzzy information by using XML schema. In 2008
19th International Workshop on Database and Expert Systems Applications (pp. 683–687). IEEE.
Ozgur, N. B., Koyuncu, M., & Yazici, A. (2009). An intelligent fuzzy object-oriented database
framework for video database applications. Fuzzy Sets and Systems, 160, 2253–2274.
References 69

Parsons, S. (1996). Current approaches to handling imperfect information in data and knowledge
bases. IEEE Transactions on Knowledge Data Engineering, 8, 353–372.
Petry, F. E. (1996). Fuzzy Databases: Principles and Applications. Kluwer Academic Publisher.
Prade, H., & Testemale, C. (1984). Generalizing database relational algebra for the treatment of
incomplete or uncertain information and vague queries. Information Sciences, 34, 115–143.
Raju, K., & Majumdar, A. (1988). Fuzzy functional dependencies and lossless join decomposition
of fuzzy relational database systems. ACM TODS, 13(2), 129–166.
Rosenfeld, A. (1975). Fuzzy graphs. In L. A. Zadeh, K. S. Fu, & M. Shimura (Eds.), Fuzzy Sets
and Their Applications (pp. 77–95). Academic Press.
Rundensteiner, E., & Bic, L. (1992). Evaluating aggregates in possibilistic relational databases.
Data and Knowledge Engineering, 7, 239–267.
Senellart, P., & Abiteboul, S. (2007). On the complexity of managing probabilistic XML data. In
Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of
Database Systems (pp. 283–292).
Shenoi, S., & Melton, A. (1999). Proximity relations in the fuzzy relational database model. Fuzzy
Sets Syst (Suppl) 100, 51–62.
Straccia, U. (2009). A minimal deductive system for general fuzzy RDF. International Conference
on Web Reasoning and Rule Systems (pp. 166–181). Springer.
Sunitha, M. S. (2001). Studies on fuzzy graphs (PhD thesis). Cochin University of Science and
Technology, India.
Sunitha, M. S., & Vijayakumar, A. (2005). Blocks in fuzzy graphs. The Journal of Fuzzy
Mathematics, 13(1), 13–23.
Tseng, C., Khamisy, W., & Vu, T. (2005). Universal fuzzy system representation with XML.
Computer Standards & Interfaces, 28(2), 218–230.
Turowski, K., & Weng, U. (2002). Representing and processing fuzzy information—an XML-based
approach. Knowledge-Based Systems, 15(1–2), 67–75.
Umano, M., & Fukami, S. (1994). Fuzzy relational algebra for possibility-distribution-fuzzy-
relational model of fuzzy data. Journal of Intelligent Information Systems, 3, 7–27.
Umano, M., Imada, T., Hatono, I., & Tamura, H. (1998). Fuzzy object-oriented databases and
implementation of its SQL-type data manipulation language. In Proceedings of the 7th IEEE
International Conference on Fuzzy Systems (pp. 1344–1349).
Yan, L., & Ma, Z. M. (2012). Comparison of entity with fuzzy data types in fuzzy object-oriented
databases. Integrated Computer-Aided Engineering, 19(2), 199–212.
Yan, L., Ma, Z. M., & Zhang, F. (2012). Algebraic operations in fuzzy object-oriented databases.
Information Systems Frontiers, 1–14.
Yan, L., Ma, Z., Zhang, F., & Ma, Z. (2014). Fuzzy XML data management. Springer.
Yazici, A., & Koyuncu, M. (1997). Fuzzy object-oriented database modeling coupled with fuzzy
logic. Fuzzy Sets System 89, 1–26.
Yazici, A., & George, R. (1999). Fuzzy Database Modeling. Physica-Verlag.
Yeh, R. T., & Bang, S. Y. (1975a). Fuzzy graphs, fuzzy relations, and their applications to cluster
analysis. In L. A. Zadeh, K. S. Fu, & M. Shimura (Eds.), Fuzzy Sets and Their Applications
(pp. 125–149). Academic Press.
Yeh, R. T., & Bang, S. Y. (1975b). Fuzzy relations, fuzzy graphs, and their applications to clustering
analysis. Fuzzy Sets and their Applications to Cognitive and Decision Processes, 159(159), 125–
149.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Systems, 1(1), 3–28.
Zhang, F., Ma, Z. M., & Yan, L. (2013). Construction of fuzzy ontologies from fuzzy XML models.
Knowledge-Based Systems, 42, 20–39.
Zicari, R., & Milano, P. (1990). Incomplete information in object-oriented databases. ACM SIGMOD
Record, 19(3), 5–16.
Chapter 3
Fuzzy RDF Modeling

3.1 Introduction

RDF is a World Wide Web Consortium (W3C) recommendation, which can represent
structured data as well as unstructured data and is quickly becoming the de-facto stan-
dard for the representation and exchange of information. Nowadays RDF data model
is finding more and more use in a wide range of web data management scenarios.
However, information suffers from imperfections in real-world applications. In in an
open environment like the Web, the underlying RDF data is unreliable and imprecise.
Additionally, in the context of the Semantic Web, the need for fuzzy data arises in
information classifications, which are fuzzy by nature.
Identifying that it is essential to explicitly represent and process imperfect infor-
mation, managing imprecise and uncertain information (mainly with respective forms
of fuzzy sets or probability distributions) has been extensively investigated in the
context of the relational model, conceptual data models, object-oriented databases,
XML, and so on. Unfortunately, although it is the current standard for data represen-
tation and exchange over the Semantic Web, RDF is not able to represent and process
imprecise and uncertain data, although the databases with imprecise and uncertain
information have been extensively discussed. The extension of the RDF data model
is particularly important.
At present, a number of research efforts have proposed multiple mechanisms to
model uncertain RDF. One such approach is to incorporate additional semantics into
the RDF data model to represent uncertainty. Several extensions of RDF have been
proposed in order to deal with truth of imprecise information (Mazzieri et al., 2008;
Straccia 2009), time (Tappolet et al., 2009), trust (Hartig, 2009) and provenance
(Dividino et al., 2009). Particularly imprecise and uncertain data has become an
emerging topic for various applications of the Semantic Web. However, RDF lacks
sufficient power to represent and process imprecise and uncertain data although the
databases with imprecise and uncertain information have been extensively discussed

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 71

(Ma et al., 2010). Thus, it is necessary to use uncertainty methods to accommo-

date different types of imprecise and uncertainty semantics information in the RDF
database.
Based on probability theory, there are some existing works for combining proba-
bilities with RDF data. For example, Fukushige (2005) extends the RDF vocabulary
to allow probabilistic confidence existing in RDF data. Udrea et al. (2006) extend
acyclic RDF statements with a model-theoretic semantics and fix-point semantics
and they consider probabilistic RDF data that are represented by a graph with proba-
bilistic edges. Similarly, Huang and Liu (2009) model probabilistic RDF data by the
probabilistic database, which also assumes that edges in the RDF graph have exis-
tence probabilities. Unfortunately, despite fuzzy values have been employed to model
and handle imprecise information in database since Zadeh introduced the theory of
fuzzy sets (Zadeh, 1978), relatively little work has been carried out in extending
RDF towards the representation of imprecise and uncertain concepts. Straccia (2009)
presents Fuzzy RDF in a general setting where triples are annotated with a degree
of truth in [0, 1]. Other similar approaches for Fuzzy RDF (Mazzieri et al., 2008;
Zimmermann et al., 2011), provide the syntax and semantics, along with RDF and
RDFS interpretations of the annotated triples. However, all these proposals share a
common approach of extending the RDF data model by attaching meta-information
about the RDF triples. These proposals lack enough expressive power to properly
describe multiple granularities of data fuzziness in RDF.
With the recent publications of large quantities of RDF data, the implementa-
tion of data retrieval system is an important issue for RDF data management. As
evidenced by the relational database management system, a formal algebra is essen-
tial for applying standard database style optimization to query processing. Therefore,
several different RDF algebras have been proposed by research groups coming from
both industry and academia, such as Frasincar et al. (2002) present algebraic opera-
tions on the lines of the RQL query language, Robertson (2004) studies an algebra
of triadic relations for RDF, and Chen et al. (2005) introduce a set of operations for
manipulating RDF graphs. Although they evidence the power of having an algebraic
approach to query RDF, the frameworks presented in each of these works do not
support the fuzzy RDF query. Therefore, a fuzzy RDF algebra with a proper under-
standing and a sound theoretical foundation is required. This paper identifies this
need and proposes a fuzzy RDF algebra suitable for defining the semantics of fuzzy
RDF data model and for performing algebraic optimizations.
In this chapter, we introduce some basic notions of fuzzy RDF models, including
fuzziness in RDF documents, fuzzy RDF representation model, and fuzzy RDF
algebraic operations (Ma et al., 2018).
3.2 Fuzzy RDF Graph 73

3.2 Fuzzy RDF Graph

In the following, we first discuss the fuzziness in RDF. Then we formally present
the fuzzy RDF data model and some basic notions, which are used with our algebra
discussed in the later sections of this chapter.

3.2.1 Fuzzy Information in RDF Graph

In real-world applications, some works have used Zadeh’s fuzzy set theory to accom-
modate different types of imprecise and uncertain information in database systems
(Piattini et al., 2006). In order to represent and manipulate imprecise RDF data, RDF
should be extended using fuzzy set theory. From the perspective of semantics, there
are two level structures in RDF data model, namely the triple-level (statement-level)
and the element-level (i.e., subject, predicate, and object of a triple). Accordingly,
there may be two kinds of fuzziness in a fuzzy RDF dataset: one is the fuzziness in
triples, i.e., a fuzzy membership degree associated with an RDF triple represents the
amount of disagreement on the corresponding statement, the other is the fuzziness
in the element of triple, i.e., we do not know the crisp value of an element, and the
value of the element may be represented by a fuzzy set.
To represent fuzzy information at the triple-level, the fuzzy extension of RDF
model (Lopes et al., 2010; Mazzieri et al., 2008) simply associates to each of the
underlying components of the RDF model i.e., triples, a membership value repre-
senting a measure of uncertainty. Formally, this kind of fuzzy RDF model is a 4-tuple
(s, p, o) [n], where s, p and o are subject, predicate and object respectively, and n is
a numeric value between 0 and 1. The numeric value has a syntactic nature different
from the others: it is not an element of the domain of the discourse, but a property
related to the formalism used by the data model to represent the fuzzy truth-value of
the assertion (statement) made by the RDF triple. From the perspective of semantics,
a fuzzy membership degree associated with an RDF triple represents the amount
of disagreement on the corresponding statement. Although this typed of fuzzy RDF
model is simple, the added numeric value can be interpreted as various different mean-
ings in the real world such as uncertainty (Udrea et al., 2006, 2010), trust (Hartig,
2009), provenance (Dividino et al., 2009) and so on. This allows it to describe all
kinds of things in the objective world, and have great potential for application. For
instance, <Diner, writer, Barry Levinson> [0.75] is a fuzzy triple, intending that the
possibility that Barry Levinson is the writer of Diner is 0.75. From a graphical point
of view, a fuzzy RDF graph is a set of fuzzy RDF triples. The edges represent the
predicates of triples. They are annotated with the predicate identifier as usual and
with an additional label for the consumer-specific value of the corresponding triple.
74 3 Fuzzy RDF Modeling

However, this data model only considers the membership degree of triple which
represent the possibility of the triple being a member of the corresponding RDF
graph. And the element value in the triple is still crisp. Although the expression
ability is better than classic RDF data model, there is still a great limitation for
full fuzzy RDF modeling and reasoning. The semantics of an RDF graph relies on
the connectivity of the resources described. This is not properly considered in mere
triple-level vagueness. Therefore, it appears natural to allow element of triple as a
fuzzy concept and not only statement. In generally, the subject, object, and predicate
of a triple do not always need to be a crisp resource, but they can also be fuzzy
concepts.
In order to represent the fuzziness of three components of triples, Ma et al. propose
a fuzzy extension of RDF, which annotates three components of triples (i.e., subject,
predicate, and object) instead of whole triples with degrees in [0, 1]. This is a
general fuzzy RDF graph model that considers the element-level fuzziness based
(on fuzzy graph theory. ) The element-level fuzzy RDF model formally is fuzzy triple
μ S /s, μ R / p, μ Q /o , where s is fuzzy subject and μs ∈ [0, 1] denote the member-
ship degree of subject to the universe of an RDF dataset, p is fuzzy predicate and
μ p ∈ [0, 1] express the fuzzy degree to the property or relationship being described,
o is fuzzy object and μo ∈ [0, 1] represent the fuzzy degree of the property value.
Such RDF triples also can be conceived of as a directed fuzzy graph which allows
users to describe arbitrary resources in terms of their attributes and their relationships
to other resources. Each vertex and edge of fuzzy RDF graph are associated with a
membership value, e.g., each element of triple existence fuzzy value, which is quite
different from the model assumption in that only considers the existence membership
degree of triples to appear in reality.
The fuzzy degree associated with each vertex indicates the possibility that vertex
takes the label. In particular, if a vertex label is URI which represents a resource that
corresponds to a real-world entity, the fuzzy membership specifies the possibility
that resource identify across datasets. Another case, if a vertex label is literal which
represents property value of an entity, the fuzzy membership specifies the possibility
that the property value distributes across dataset. Furthermore, the edge between two
vertices represents semantic relationship, and the fuzzy degree associated with each
edge represents the possibility whether particular relationships exist.

3.2.2 Fuzzy RDF Data Model

It is shown that the RDF should be extended for fuzzy data modeling. In order
to accommodate the fuzzy information, the RDF syntax must be modified corre-
spondingly. In this subsection, we focus on syntax modification for fuzzy RDF data
modeling.
3.2 Fuzzy RDF Graph 75

Table 3.1 Fuzzy EBNF of N-Triples grammar

fntripleDoc ::= line*
line ::= ws* (comment|triple)? eoln
comment ::= ‘#’ (character-(cr|lf))*
triple ::= degree ‘/’subject ws+ degree ‘/’predicate ws+ degree ‘/’object ws* ‘.’ ws*
degree ::= 0.[0-9] +|1

3.2.2.1 Fuzzy RDF Syntax

To represent fuzzy information, we will define a syntactic extension of RDF. We

will try to be as plain as possible and make as few RDF syntax changes as possible.
Although there are a number of syntaxes available for encoding RDF data, the simple
concrete syntax we define is as a modification of the Extended Backus-Naur Form
(EBNF) of N-Triples as given in Grant et al. (2022), a line-based, plain text format
for encoding an RDF graph, used to show RDF test cases. We just add the fuzzy
information to the values of the triple’s element, and other definitions such as white
space, end-of-line characters, comment, subject, predicate and object are unchanged.
Modifications to N-Triples for representing fuzzy information are given in Table 3.1.
The added element has a syntactic nature different from the others: it is not an element
of the domain of the discourse, but a property related to the formalism used by the
data model to represent fuzziness.

3.2.2.2 Fuzzy RDF Graph Data Model

In the following, we further give the formalization of fuzzy RDF data models, which
is defined based on the characteristic of fuzzy graph (Sunitha, 2001) in the fuzzy set
theory.
In short, the basic structure of a fuzzy RDF data model is a graph. We start by
introducing some simple concepts. Let V be a finite set of vertices, E ⊂ Vi j × Vi be
a set of edges and L : V ∪ E → Σ be a mapping from vertices and edges to a set Σ
of string called labels. Quadruple G M = (V , E, Σ, L) is a directed labeled graph.
Assume M is a set of RDF triples each represented by (s, p, o) ∈ U ∪ B × U × U
∪ L ∪ B. A conversion function from M to GM include the following two steps for
each (s, p, o) ∈ M: (i) add vertices vs , vo to V and assign L(vs ) = s, L(v0 ) = 0, (ii)
add a directed edge (vs , vo ) into E and assign L(vs , v0 ) = p. It should be noted that
the graph structure only briefly describes the structure characteristic of RDF data
model, and ignores fuzzy contents of vertices and edges of RDF data model. Here,
we further provide a more detailed formal definition of fuzzy RDF graph data model.
76 3 Fuzzy RDF Modeling

Definition 3.1 (Fuzzy RDF data graph). Fuzzy RDF data graph G is represented by
a 6-tuple (V , E, Σ, L , μ, ρ). Here
1. V is a finitude set of vertices
2. E ⊂ Vi × V j is a set of directed edges, where Vi , V j ⊂ V
3. Σ is a set of labels
4. L = {L V , L E }, L V : V → Σ is a function assigning labels to vertices, and
L E : E → Σ is a function assigning labels to edges
5. μ : V → [0, 1] is a fuzzy subset of V
6. ρ : (E → )[0, 1] is a fuzzy ( )relation on fuzzy subset μ. Note that ∀vi , v j ∈
V , ρ vi , v j ≤ μ(vi ) ∧ μ v j , where ∧ stands for minimum.

In Definition 3.1, each vertex vi ∈ V of graph G has one label, L V( (vi ), corre- )
sponding to either subjects or objects in RDF triples datasets. Moreover,
( )vi , v j ∈ E
is a directed edge from vertex vi to vertex vj , with an edge label L E vi , v j that corre-
sponds to the predicate in RDF triples. The labels values of vertices are associated
with fuzzy degree indicating the possibility that vertices take the labels, and the fuzzy
value associated with each edge represents the amount of disagreement on the corre-
sponding relationship between vertices. A fuzzy RDF data graph may contain both
fuzzy vertices (resp. edges) and crisp vertices (resp. edges) as a fuzzy vertex (resp.
edge) with a degree of 0 or 1 can be considered as crisp. Along the same line, a crisp
RDF graph is simply a special case of fuzzy RDF (data graph ) (where
) μ : V → {0, 1}
for all vi ∈ V and ρ : V × V → {0, 1} for all vi , v j ∈ E , and the fuzzy RDF
graph is a generalization of the crisp RDF graph.
Moreover, in our model, each vertex and edge of fuzzy RDF graphs is associated
with a membership value, e.g., each elements of triple existence fuzzy value, which
is quite different from the model assumption in Udrea et al. (2010}, Zimmermann
et al. (2011), that only considers the existence membership degree of triples to appear
in reality.
An example of fuzzy RDF data graph with some fuzzy vertices and edges is given
in Fig. 3.1, which describes some information about movies and actors. Here the
genre of the film with ID film1 is tragedy, the audience rating is “9.5”, the box office
is “$35 million”, the star is the person with ID pid1 and the director is the person
with ID pid1 who born in “area1”. From the graph, the genre of the film1 has label
“tragedy” with possibility 0.95, and it exactly corresponds to the object of triple
(film1, <genre>, 0.95/“tragedy”). Similarly, the vertex labeled “city1” is connected
to another vertex labeled “region1” through the directed edge labeled “locateIn” with
possibility 0.85, and it corresponds to the triple (city1, 0.85/<locateIn>, “region1”).
Therefore, this graphic representation is generic enough to capture the correlations or
constraints among labels of vertices and edges. In this example, the degree is based
on a simple statistical notion, which can be acquired from statistics of historical data
or reliability of data sources.
3.2 Fuzzy RDF Graph 77

Fig. 3.1 Fuzzy RDF data graph

Definition 3.2 (Fuzzy RDF path). Assume G = (V , E, Σ, L , μ, ρ) is a fuzzy

RDF graph. A directed path p between vertices v1 and vn in G is defined as a
finite sequence p = v1 , e1 , v2 , e2 , ..., vn−1 , en−1 , vn such that vi ∈ V and (vi ,
ei , vi+1 ) ∈ E for i ∈ [1, n − 1]. A path expression of the RDF graph can
be described as a vertex-edge alternating sequence. The label of such a path p,
denoted by L(p), is the set of all vertex/edge labels in the path p. i.e., L( p) =
L(v1 ), L(e1 ), L(v2 ), L(e2 ), . . . , L(vn−1 ), L(en−1 ), L(vn ).
We define fuzzy membership δ for path p with δ(p) = ρ(v1 , v2 ) ∧ ρ(v2 , v3 ) ∧
… ∧ ρ(vn−1 , vn ) ∧ μ(v1 ) ∧ μ(v2 ) ∧ … ∧ μ(vn ), i.e., it is the minimum fuzzy value
of the edge or vertex in the fuzzy path. If there lie several paths from v1 to vn , then
possible set of paths is P = { pk | pk = vk1 , ek1 , vk2 , ek2 , · · · , vkn−1 , ekn−1 , vkn and k ≥
1}, among the possible paths, value of maximum intensity path δ p is gathered as,
δ p (P) = ∨δ( pk ).
'
Definition
( ' ' ' 3.3 (Fuzzy) RDF sub-graph). A fuzzy RDF graph G =
' ' '
V , E , Σ , L , μ , ρ is called a partial fuzzy sub-graph of G = (V , E, Σ, L , μ, ρ)
if
1. μ' ⊆ μ, ρ ' ⊆ ρ, V ' ⊆ V , E ' ⊆ E and Σ ' ⊆ Σ;
2. ∀u ∈ V ' , μ' (u) ≤ μ(u);
3. ∀(u, v) ∈ E ' , ρ ' (u, v) ≤ ρ(u, v).

In particular, a partial fuzzy sub-graph G ' is called a fuzzy sub-graph of G, written

as G ' G, if
{ }
1. ∀u ∈ x ∈{V ' : μ' (x) > 0 , μ' (u) = μ(u) and }
2. ∀(u, v) ∈ (x, y) ∈ V ' × V ' : ρ ' (x, y) > 0 , ρ ' (u, v) = ρ(u, v).

Definition 3.4 (Fuzzy RDF graph isomorphism). Given two fuzzy RDF graphs G 1 =
(V1 , E 1 , Σ1 , L 1 , μ1 , ρ1 ) and G 2 = (V2 , E 2 , Σ2 , L 2 , μ2 , ρ2 ), an isomorphism from
G1 to G2 is a bijective function h: V 1 → V 2 such that:
78 3 Fuzzy RDF Modeling

1. ∀u ∈ V1 , h(u) ∈ V2 , L 1 (u) = L 2 (h(u)) and μ1 (u) = μ2 (h(u));

2. ∀ (u, v) ∈ E 1 , (h(u), h(v)) ∈ E 2 , L 1 (u, v) = L 2 (h(u), h(v)) and ρ 1 (u, v) = ρ 2
(h(u), h(v)).

If such a function h exists, then G1 is isomorphic to G2 , denoted it as G 1 ∼

= G2.
Given two fuzzy RDF graphs Q and G, Q is sub-graph isomorphic to G, denoted
as QG, if Q is isomorphic to at least on sub-graph G' of G, and G' is a matching
of Q in G.

Theorem 3.1 Isomorphism between fuzzy RDF graphs is an equivalence relation.

Proof:
1. Reflexivity: Consider the identity map h: V → V such that ∀ v ∈ V, h (v) = v.
This (h is a )bijective
( map satisfying
( )) ∀v ∈ V, μ(v) = μ(h(v)) and ∀ (vi , vj ) ∈
E, ρ vi , v j = ρ h(vi ), h v j . Hence h is an isomorphism of the fuzzy graph
to itself. Therefore, it satisfies reflexivity.
2. Symmetry: Given two fuzzy RDF graphs G1 and G2 . Let h : V1 → V2 be
an isomorphism from G1 to G2 then h is a bijective (map h(v)1 ) = v2 , v1 ∈
)) ( μ1 (v)1 ) = μ2 (h(v1 )), ∀v1 ∈ V1 and ρ1 v1i , v1 j = ρ2 (h(v1i ) ,
v1( satisfying
h v1 j , ∀ v1i , v1 j ∈ E 1 .
−1
As h is ( bijective,
) by (h(v1 ) = ( v2 ,))
v1 ∈
( V1 then ) h (v2 ) = v(1 ∀v 2 ∈ ) V2 ;
−1
Using ρ1 v1i , v1 j = ρ2 (h(v1i ), h v1 j (, ∀ ))
v1i , v1 j ( ∈ E 1 then
) ( μ1 h )(v2 ) =
μ2 (v2 ) ∀ v2 ∈ V2 and ρ1 h −1 (v2i ), h −1 v2 j = ρ2 v2i , v2 j , ∀ v2i , v2 j ∈ E 2 .
Thus, we get a 1–1, onto map h −1 : V2 → V1 , which is an isomorphism from
G2 to G1, i.e., G 1 ∼
= G2 ⇒ G2 ∼ = G1.
3. Transitivity: Given three fuzzy RDF graphs G1 , G2 and G3 . Supposed h : V1 →
V2 and h 2 : V2 → V3 are isomorphism of the fuzzy RDF graph G1 onto G2 and
G2 onto G3 respectively.

As h1 is a bijective map ( h 1 (v)1 ) = ( v2 , v1 ∈( V)) 1 satisfying

( )μ1 (v1 ) =
μ2 (h(v1 )), ∀v1 ∈ V1 and ρ1 v1i , v1 j( = ρ2 )h(v1i ), (h v1 j ,)∀ v(1i , v1 j )∈ E 1 . i.e.,
μ1 (v1 ) = μ2 (v2 ), ∀v1 ∈ V1 and ρ1 v1i , v1 j = ρ2 v2i , v2 j , ∀ v1i , v1 j ∈ E 1 . In
the same way, as h1 is a bijective
( map) h 2 (v(2 ) = v3 , v2(∈ V))2 satisfying
( )μ2 (v2 ) =
μ3 (h 2 (v2 )), ∀v2 ∈ V2 and ρ2 v2i , v2 j = ρ3 h 2 (v2i ), h 2 v2 j , ∀ v2i , v2 j ∈ E 2 .
From what has been discussed above we have

μ1 (v1 ) = μ2 (v2 ) = μ3 (h 2 (v2 )) = μ3 (h 2 (h 1 (v1 )))(∀v1 ∈ V1 );

( ) ( )
ρ1 v1i , v1 j = ρ2 v2i , v2 j
( ( ))
= ρ3 h 2 (v2i ), h 2 v2 j
( ( ( )))
= ρ3 h 2 (h 1 (v1i )), h 2 h 1 v1 j (∀(v1i , ∈ E 1 )).

Hence h2 ° h1 is an isomorphism between G1 and G3 , i.e., it satisfies transitivity.

3.2 Fuzzy RDF Graph 79

In conclusion, isomorphism between fuzzy RDF graphs is an equivalence relation.

The notion of graph pattern provides a simple yet intuitive specification of the
structural and semantic requirements of interest in the input graph. Graph pattern as
the basic operational unit is central to the semantics of many operations in fuzzy RDF
algebra. Essentially, a fuzzy graph pattern is a directed crisp graph with predicate on
vertexes and edges, and regular expressions that denote path over edges.

Definition 3.5 (Fuzzy RDF graph pattern). A fuzzy RDF graph pattern is a 5-tuple
P = (V P , E P , FV , FE , R E ) where

1. V P is a finite set of vertexes.

2. E P is a finite set of directed edges.
3. F V is a function defined on V P such that for a given vertex u ∈ V P , FV (u) is the
predicate applied on the value of a label of vertex u. This predicate is a Boolean
combination of atomic predicates such that each predicate compares a constant
c specified in the pattern with the value V i using a given operator θ (e.g., <, ≤,
=, ≥, >, /=). Let cj be a constant and θ j be a comparison operator, F V (u) is the
combination of atomic predicates of the form: V i θ j cj by the logical connectives
(∧, ∨, ¬).
4. F E is the counterpart of F V for edges, which is a condition (or predicate) on the
labels of the edges. This predicate is a conjunction of atomic formulas that each
of them compares a constant c specified on the pattern with the actual value of
a label of the edge e using a given comparison operator θ. The comparison is
performed using any of the following operators: <, ≤, =, ≥, >, /=.
5. RE : E P → r e(E) is a function defined on E P s. t. for each (u, v) in E P , re(E)
is a path regular expression (Fan et al., 2011) in which E is a set of composed
of U, B of the data graph G, variables and wildcard *, and it can be constructed
inductively as R ::= e|R1 |R2 |R1 |R2 |R + . Here e denotes either an edge labeled by
e or a wildcard symbol matching any label in Σ, R1 · R2 denotes a concatenation
of expressions, R1 |R2 denotes disjunction and is an alternative of expressions, R+
denotes one or more occurrences of R.

For example, a pattern graph P for the RDF graph shown in Fig. 3.1 is given in
Fig. 3.2. This pattern is applied to model information concerning actor (?p) who is
born in country1. The box office of the film (?film) that the actor starred is more than
30 million (?b > $ 30 million) and its genre is a tragedy.

Fig. 3.2 Fuzzy pattern

graph
80 3 Fuzzy RDF Modeling

3.2.2.3 Fuzzy RDF Semantics

Depending on the meaning we want to give to a certain RDF graph we will consider
different kinds of fuzzy interpretations, e.g., simple, RDF, RDFS, and D, etc. For
each one of them there will be some special semantic conditions.
Intuitively, a fuzzy interpretation will represent a possible configuration of the
world, such that we can verify whether or not what is said on a graph G is true within
the framework of fuzzy logic. This leads us to think of an RDF graph as something
which satisfies the possible world, thus providing some information.
As described in Hayes (2004) any interpretation is relative to a certain vocabulary,
so we( will in general speak) of a fuzzy interpretation of the vocabulary V. For a
triple μs /s, μ p / p, μo /o can be thought of as stating that a certain binary predicate
associated to μp /p holds for the couple (μs /s, μo /o). A fuzzy interpretation will give
us this association, and given a fuzzy RDF graph, it will be true if none of its triple
state something false within the framework of fuzzy logic.

Definition 3.6 (Fuzzy Simple Interpretation). Given a set of V, a fuzzy simple

interpretation If of V is a 7-tuple I f = (V , I r, I p, Lv, I s, I l, I ext). Here,

1. V is a fuzzy set of vocabulary and each element x ∈ V has a degree μx ∈ [0, 1]

to give an estimation of the belonging of x to V,
2. Ir is a non-empty fuzzy set of resources, called the domain or universe of I f ,
3. Ip is a finite fuzzy set of all property objects of I f ,
4. Lv is a fuzzy set of literal values,
5. Is: URIref → Ir ∪ Ip is a function, and μ I s(x) ∈ [0, 1] indicates the degree to an
object which is mapped from an x ∈ URIref via Is belonging to Ir ∪ Ip,
6. Il: L T → Ir is a function from typed literals to Ir, and μIl(x) ∈ [0, 1] indicates the
degree to a type literal x mapped via Il belonging to Ir,
7. Iext is a function from Ip to the powerset of Ir × Ir, and μ(xz)y ∈ [0, 1] indicates
the membership degree which a pair (Is(x), Is(z)) belongs to the set Iext(Is(y)),
where y and z are elements of fuzzy set V.

Given a fuzzy triple (μs /s, μp /p, μo /o), if min (μs , μp , μo , μIs(s) , μIs(p) , μIs(o) ) ≥ α
(α is a given threshold), then If (μs /s, μp /p, μo /o) = true, otherwise, If (μs /s, μp /p,
μo /o) = false. Given a set of triples S, If (S) = false if If (μs /s, μp /p, μo /o) = false
for some triple (μs /s, μp /p, μo /o) in S, otherwise If (S) = true. If satisfies S, written
as If |≈ S, if If (S) = true, in this case, we say If is a fuzzy simple interpretation of S.
Fuzzy simple interpretation, instead of associating a value in {0, 1} to each element
of the corresponding set, accepts any value in the closed unit interval [0, 1].

Definition 3.7 (Fuzzy Simple Entailment). Let S be a set of fuzzy RDF graphs, and
G a fuzzy RDF graph, then S fuzzy{ simply entails G if and only
} if for every fuzzy
simple interpretation If we have: I f | ≈ G|∀H ∈ S, I f | ≈ H . In that case we note
S|≈f G.
3.3 Fuzzy RDF Schema 81

3.3 Fuzzy RDF Schema

On the basis of RDF, RDF Schema (RDFS) is used to describe the RDF vocabulary.
RDF and RDF Schema together implement data exchange at the semantic level
of any vocabulary between different machines. Here, RDF is the part of the data
model and RDF Schema is the semantic interpretation part with additional ability
to describe resources. While in RDF the main construct is the extension, the RDF
Schema semantics is stated in terms of classes (Hayes, 2004). As a class is a resource
with a class extension, which represents a set of domain element, the definition of
class relies on the definition of extension. If an extension is a set of couples, and a
fuzzy extension is a fuzzy set of couples, fuzzy class extensions in RDF Schema are
fuzzy sets of domain’s elements.
RDF Schema has a larger vocabulary then RDF, composed of URIs in the rdfs:
namespace. The semantics is conveniently expressed in terms of classes: a class is a
resource with a class extension, which is a subset of resource. As a consequence of
this definition, a class can have itself as a member. The relation between a class and
a class member is given using the RDF vocabulary property rdf: type, and the set of
all classes is IC.
With the RDFS, classes, properties, and relationships between classes and prop-
erties can be declared. The modeling primitive rdfs: Class and rdf: Property, for
example, are applied to define classes and properties, respectively, which are the
generalization of rdfs: Resource. In addition, the modeling primitive rdf: type is
applied to state that a resource is an instance of a class. In particular, class inheritance
and property inheritance can be described by rdfs: subClassOf and rdfs: subProp-
ertyOf , respectively. Furthermore, RDFS provides rdfs: domain and rdfs: range to
constrain the domain and range of properties, respectively.
In the following, we define fuzzy RDF Schema (Fan et al., 2019) for modeling
primitives, which can organize fuzzy RDF vocabularies into hierarchies. The formal
fuzzy RDFS is given as follows:

Definition 3.8 (Fuzzy RDF Shema graph). Fuzzy RDF(S) data graph GF is
represented by a 7-tuple = {V, E, Σ, L, μ, ρ, A}. Here.

1. V is a finitude set of vertices.

2. E ⊂ Vi × V j is a set of directed edges, where V i , V j ⊂ V.
3. Σ = {IC, OP, LP, D} is a set of labels, where IC is a set of class resource labels,
OP is a set of object property resource labels, LP is a set of datatype property
resource labels, and D is a set of datatype labels.
4. L = {L V , L E } is a function assigning labels to vertices and edges, respectively.
L V : V → Σ is a function assigning labels to vertices, and L E : E → Σ is a function
assigning labels to edges.
5. μ: V → [0, 1] is a fuzzy subset of V.
6. ρ: E → [0, 1] is a fuzzy relation on fuzzy subset μ. Note that ∀vi , vj ∈ V, ρ (vi ,
vj ) ≤ μ (vi ) ∧ μ (vj ), where ∧ stands for minimum.
7. A is a set of axioms as shown in Table 3.2.
82 3 Fuzzy RDF Modeling

Table 3.2 The fuzzy relational schemas of a fuzzy relational database

Fuzzy RDFS triples Fuzzy RDFS axiom
(L(vi ) ∈ Σ. C ρ(vi ,vj )/rdfs subClassOf L(vj ) ∈ Σ. C) Fuzzy class axioms
ρ(vi ,vj )/rdfs: subClassOf (L(vi )
L(vj ))
(L(vi ) ∈ Σ. C L E (vi , vj ) ∈ Σ. LP μj /L(vj ) ∈ Σ. C) Fuzzy property axioms
(L(vi ) ∈ Σ. C ρ(vi ,vj )/L E (vi , vj ) ∈ Σ. OP L(vj ) ∈ Σ. C) DatatypeProperty (L E (vi , vj )
domain(L(vi )) range(μj /L(vj )))
ObjectPropertyρ(vi ,vj )/L E (vi , vj )
domain(L(vi )) range(L(vj )))
(L(vi ) ∈ Σ. T ρ(vi ,vj )/type L(vj ) ∈ Σ. C) Individual axioms
Individual (L(vi ) ρ(vi , vj )/type
(L(vj ))…
value (L ∈ Σ. LP, μ1 L(n1 ) ∈ Σ.
D) …
value (ρ 1 /L E' ∈ Σ. OP, L(n1 ' ) ∈
Σ.C)…

In Definition 3.8, the fuzzy RDFS data graph GF is a directed labeled graph, in
which each vertex and the directed edge is assigned a label. The set of axioms A
denotes the semantic of the fuzzy RDFS data. In this case, the labels contain the
semantic information that can be used in the set of axioms. Let vi ∈ V and vj ∈ V
be a subject vertex and an object vertex of the graph GF , and their labels be L(vi )
∈ Σ. C and L(vj ) ∈ Σ. C, respectively. If edge label L E (vi ,vj ) is rdfs: subClassOf
and the label value is ρ(vi ,vj ), the class axiom can be represented as ρ(vi ,vj )/rdfs:
subClassOf (L(vi ) L(vj )). In a similar way, the extended fuzzy RDFS graph model
can describe not only its instance information but also its structure information and
the inferred semantic data can be derived from the graph. Table 3.1 shows the fuzzy
RDFS triples and their corresponding axioms.
In addition, Definition 3.8 explicitly classifies the element Σ that is the set of
labels into four categories: class resource labels, object property resource labels,
datatype property resource labels, and datatype labels. Along the same time, a crisp
RDFS graph is simply a special case of fuzzy RDFS data graph with fuzzy values of
0 or 1 on all vertices (resp. edges).

3.4 Similarity Matching of Fuzzy RDF Graphs

Data matching is the process of bringing data from different data sources together
and comparing them to find out whether they represent the same real-world object
in a given domain (Dorneles et al., 2011). Fuzzy RDF data matching is a funda-
mental problem in the integration of fuzzy RDF data. Based on the fuzzy RDF data
model, we propose an approach for fuzzy RDF graph matching in this section. The
method computes multiple measures of similarity among graph elements: syntactic,
3.4 Similarity Matching of Fuzzy RDF Graphs 83

semantic and structural. These measures are composed in a principled manner for
graph matching. In particular, an iterative similarity function is introduced with the
consideration of structural information of fuzzy RDF graph.

3.4.1 Matching Semantics

RDF data have a natural representation in the form of labeled directed graphs, in
which vertices represent resources and values (also called literals), and edges repre-
sent semantic relationships between resources. So, RDF data matching problem has
been often addressed in terms of graph matching approach.

Definition 3.9 (Fuzzy RDF graph matching). Given two fuzzy RDF graphs GS and
GT from a given domain, the matching problem is to identify all correspondences
between graphs GS and GT representing the same real-world object. The match result
is typically represented by a set of correspondences, sometimes called a mapping.
A correspondence c = (id, E s , E t , m) interrelates two element E s and E t from
graphs GS and GT . An optional similarity degree m ∈ [0, 1] indicates the similarity
or strength of the correspondence between the two elements.

Definition 3.10 (Similarity function). Let GS and GT are two datasets, a similarity
function is defined as: Fs (s, t) → [0, 1], where (s, t) ∈ GS × GT , i.e., the function
computers a normalized value for every pair (s, t). The higher the score value, the
more similar sand tare. The advantage of using similarity functions is to deal with a
finite interval for the score values.

Definition 3.11 (Predecessor set and successor

{ (set). Let ) G }be a fuzzy RDF graph,
for any vertex v ∈ V of graph G, pr e(v) = v{' | v,( v ' ∈) E is
} the predecessor set
(i.e., forward neighbors) of v and succ(v) = v ' | v ' , v ∈ E is the successor set
(i.e., backward neighbors) of v.

For example, Fig. 3.3a, b illustrate two fragments of fuzzy RDF data graph with
some fuzzy elements, and crisp ones. The edge “pid2-has_address-addid2” associ-
ated with membership degree 0.5 in target graph Fig. 3.3a represents the fact that
the person labeled pid2, whose address label is addid2. And the possibility of the
fact is 0.5. Note that opaque labels exist as shown in Fig. 3.3b. The resource “_:”
is distinct from others, and it makes the resource name garbled. According to the
RDF specification (Manola et al., 2004), a blank vertex can be assigned an identifier
prefixed with “_:”.
With the presence of dislocated matching (Zhu et al., 2014), some vertices in an
RDF graph can be starting/ending vertices. We add the following restriction on fuzzy
RDF graph:
84 3 Fuzzy RDF Modeling

Fig. 3.3 The fuzzy RDF graphs. a Source graph; b target graph

1. There is one and only one vertex in the RDF graph that is called the home vertex,
denoted by v̂, which indicates the virtual beginning/end of all paths in an RDF
graph. We specify that the label of the home vertex is “_: H”, i.e., L(v̂) = “_:H”.
2. There are paths from the home vertex to any other vertices in fuzzy RDF graph.
That is, for each vertex v ∈ V except v̂, we add two edges (v, v̂) and (v̂, v). Thus, a
path can begin (or end
) with
( vertex
) v at all the locations where v occurs. Moreover,
we associate ρ v, v̂ ρ v̂, v = μ(v), i.e., we regard the fuzzy degree associated
with each vertex represents the possibility that the vertex exists in the graph as
the fuzzy degree of edge from home vertex to the vertex.
Matching procedure takes as input two fuzzy RDF graphs and outputs a set of
correspondences of the two graphs. Figure 3.4 illustrates an overview of the frame-
work and it has three main stages: First, the procedure is to compute a vertex-to-vertex
similarity score using different similarity functions. Label similarity functions adopt
different computation strategies to compute multiple types of vertex label similarity
scores. Structural similarity function iteratively computes similarity scores for every
vertex pair by aggregating the similarity scores of edge and immediate neighbors’
vertices. Then, we obtain the overall similarity by combining label similarity scores
and structural similarity scores. Finally, we select the potential correspondences
based on the similarity scores and include them in the alignment.
3.4 Similarity Matching of Fuzzy RDF Graphs 85

Fig. 3.4 The framework of fuzzy RDF graph matching

3.4.2 Matching Approach

3.4.2.1 Label Similarity Function

1. Syntactic Similarity

Intuitively, the element label denoting an element typically captures the most distinc-
tive characteristic of the element in the RDF graph model. The syntactic similarity
assigns a normalized similarity value to every pair (s, t) by applying the Levenshtein
distance (Levenshtein, 1966) to the name labels of s and t. Formally the syntactic
similarity sim sy (s, t) between two name labels s and t is defined as following.

L D(s.label, t.label)
sim sy (s, t) = 1 − (3.1)
max(|s.label|, |t.label|)

Here s.label and t.label denote the name label of s and t, respectively, max(|s.label|,
|t.label|) is the max length of the name string in s and t, and LD(w1 , w2 ) is the
Levenshtein distance between two words w1 and w2 .

2. Semantic Similarity

For semantic similarity, we use WordNet::Similarity package (Pedersen et al., 2004)

to get the semantic relatedness between element labels based on their linguistic
correlations. In our work, we use Jaccard similarity (Jaro, 1989) measures. In many
cases, the element labels whose relatedness is being measure are phrase or short
sentence, e.g., “house-number” and “room number” in Fig. 3.3. In these cases, we
need a new measure that computes degree for element labels expressed as a sentence
or phrase. To this end, we use a simple measure from the natural language processing.
The method takes two phrases or short sentence as input and computes a semantic
similarity as output in four steps:
Step 1: Tokenize the names of both labels. We token each label as a set of words.
Denote by s.tok as the j-th token in the name labels.
Step 2: Search the synset of each token applying WordNet. Denote by syn(w) the
WordNet synset of a word w.
86 3 Fuzzy RDF Modeling

Step 3: Compute the Semantic similarity. We use the Jaccard similarity to calculate
the Semantic similarity on the synsets of each pair of tokens.
Step 4: Return the average-max Semantic similarity as the result. The formula is
shown as follows:
1 Σ
sim se (s, t) = max( J accar d(syn(s.toki ), syn(t.tok j )))
|s.tok| i j

(3.2)

Here |s.tok| is the number of tokens in the name of s, Jaccard denotes the Jaccard
similarity between two sets, and syn(w) denotes the WordNet synset of a token w.

3.4.2.2 Structural Similarity Function

Inspired by the work in Nejati et al. (2011), we propose a structural similarity

function for matching RDF graph, in which we further take edge information into
consideration.
1. Structural Similarity Function
'
Let vs is a vertex of fuzzy RDF graph Gs , i.e. vs ∈Gs , vs ∈ pre(vs ), and es ∈ E s
is a directed edge from vertex vs to vertex vs '. Our matching method iteratively
computers a similarity degree for every vertex pair (vs , vt ) of two fuzzy RDF graphs
by aggregating the similarity degrees between the immediate neighbors of vs and
vt . By neighbors, we mean either successor or predecessor, depending on which bi-
similarity notion is being used. The method iterates until either the similarity degree
between all vertex pairs stabilize or a maximum number of iterations is reached.
If the similarity of edge es is different from the similarity of et or the fuzzy
membership ρs deviates far from the fuzzy membership ρt , the similarity degree of vs'
and vt' will have less effect on the similarity degree of vs and vt . For proper aggregation
of similarity degrees between vertices of two fuzzy RDF graphs, we further take
edge labels similarity and edge fuzzy membership into consideration. We compare
edge labels using label similarity functions defined in Sect. 3.4.2.1. These similarity
functions assign a similarity degree sim e (es , et ) to every edge labels pair (es , et ). We
use a measure of similarity of fuzzy membership degrees (Pappis & Karacapilidis,
1993), which is based on the difference as well as the sum of corresponding grades
of membership. Let ρs and ρt are the fuzzy membership degrees of edges es and et ,
respectively. The similarity of fuzzy values ρ s and ρ t is defined by

(ρs ∨ ρt ) − (ρs ∧ ρt )
sim ρ (ρs , ρt ) = 1 − (3.3)
ρs + ρt

Here (ρs ∨ ρt ) = max(ρs , ρt ) and (ρs ∧ ρt ) = min(ρs , ρt ).

We now describe the computation of forward similarity. For every vertex pair
(vs , vt ), the similarity degree Sim i (vs , vt ) is computed from (i) similarity degree
3.4 Similarity Matching of Fuzzy RDF Graphs 87

between vs and vt after step i − 1, i.e., Sim i−1 (vs , vt ); (ii) similarity degree between
( )
the forward neighbors of vs and those of vt after step i − 1, i.e., Sim i−1 vs' , vt' ;
(iii) similarity degree between the edge labels relating vs and vt to their forward
neighbors, i.e., sime (es , et ); (iv) similarity degree between the edges fuzzy values of
es and et .
To find the best match for vs among the forward neighbors ( of) vt , we need to
maximize the value sim e (es , et ) × sim Q (ρs , ρt ) × Sim i−1 vs' , vt' . The similarity
degrees between the forward neighbors of vs and their best matches among the
forward neighbors of vt after the ith iteration are computed by

1 Σ
sim i (vs , vt ) = max sim e (es , et )
| pr e(vs )| ' vt' ∈ pr e(vt )
vs ∈ pr e(vs )
( )
× sim ρ (ρs , ρt ) × Sim i−1 vs' , vt' (3.4)

And the similarity degrees between the forward neighbors of vt and their best
matches among the forward neighbors of vs after iteration i are computed by

1 Σ
sim i (vt , vs ) = max sim e (et , es )
| pr e(vt )| vs' ∈ pr e(vs )
vt' ∈ pr e(vt )
( )
× sim e (ρt , ρs ) × Sim i−1 vt' , vs' (3.5)

Note that this sim measure is asymmetric, i.e., sim i (vs , vt ) / = |sim i (vt , vs ).
In conclusion, we define the forward similarity degrees of vertex pair (vs , vt ) after
the ith iteration as follows:
(( ) )
Sim i (vs , vt ) = sim i (vs , vt ) + sim i (vt , vs ) /2 + Sim i−1 (vs , vt ) /2 (3.6)

For backward similarity degrees calculating, we perform the above formula for
vertex vs and vt , but consider their backward neighbors instead of their forward
neighbors.
2. Iterative Computation

To calculate sim(vt , vs ) from forward neighbors, we present an iteration method by

iteratively applying Formula (3.6). The computation has two phases: the initialization
phase which assigns sim 0 (vt , vs ) for every vertex pair (vt , vs ), and the iteration
phase which update the degree of sim i (vt , vs ) by using simi−1 (vt , vs ) according to
Formula (3.6), when i ≥ 1. The principle of the method is that the similarities between
two vertices must depend on the similarities between their adjacent vertices. We
summarize the procedure of iterative computation of forward similarity in Algorithm
3.1.
88 3 Fuzzy RDF Modeling

Algorithm 3.1 Structural similarity algorithm

Input: two fuzzy RDF graphs GS and GT , and a constant ε
Output: matching similarity Sim
1: for each vs ∈ V s , vt ∈ V t do
2: ρ(v̂s , vs ) ← μ(vs ) and ρ(v̂t , vt ) ← μ(vt )
3: for each es ∈ V s × {v̂s }, et ∈ V t × {v̂t } do
4: sim(es , et ) ← 1
5: for each vs ∈ V s and vt ∈ V t do
6: if (vs = v̂s ) and (vt =v̂t ) then
7: Sim0 (vs , vt ) ← 1
8: else
9: Sim0 (vs , vt ) ← 0
10: repeat
11: i ← i + 1
12: for each vs ∈ V s and vt ∈ V t do
13: Simi (vs , vt ) ← ((simi (vs , vt ) + simi (vt , vs ))/2 + Simi−1 (vs , vt ))/2
14: until|Simi (vs , vt ) − Simi-1 (vs , vt )| < ε
15: return Sim

Algorithm 3.1 is an iteration method by iteratively applying Formula (3.6). We

begin by assigning the fuzzy degree μ(v) associated with each vertex v ∈ V except ( v̂)
to edges (v, v̂) and
( ( v̂, v)
) in lines 1–2. Then we initialize the similarity sim0
ês , êt
for the edge pair ês , êt formed by the home vertices v̂s or v̂t connected with other
real vertices in lines 3–4. The similarity sim 0 (ês , êt ) is set to 1.0 since their edge
labels are inexistence. At the same time,( we initialize
) the similarity
( ) for vertex pair
in lines 5–9. For every home vertex pair v̂s , v̂t , we set sim0 v̂s , v̂t = 1.0 in line 7.
For every other vertex pair (vs , vt ), we assign the similarity degrees sim 0 (vs , yt ) =
0 as initial similarities in line 9. We further use iteration method to calculate the
matching similarity in lines 10–14. In each iteration, we update the similarity degree
for each vertex pair (vs , vt ) using the similarities of their neighbors in the previous
iteration. The value sim i (vs , yt ) is non-decrease as i increase, and the method iterates
until either the similarity degree between all vertex pairs stabilize or a maximum
number of iterations is reached. Note that, the similarities between home vertices and
real vertices are not refreshed during the iteration. Finally, we return the matching
similarity Sim as the result in line 15.
3. Convergence
Lemma 3.1: The iterative Formula (3.6) is bounded in the interval [0, 1].
Proof : This follows the analysis and the definitions of the iterative formula. It is
quite straightforward. For all vs ∈ V s , vt ∈ V t , i ≥ 1, Simi (vs , vt ) ∈ [0, 1], i.e., iterative
formula is bounded.
Lemma 3.2: The iterative Formula (3.6) is monotone non-decreasing.
3.4 Similarity Matching of Fuzzy RDF Graphs 89

Proof : This can be simply proved by mathematical induction.

Basis: Let show that the monotone holds for i = 1. If v̂s ∈ pre(vs ) and v̂t ∈ pre(vt ),
we have sim 1 (vs , vt ) ∈ [0, 1], otherwise, sim1 (vs , vt ) = 0.
Similarly, we have sim 1 (vt , vs ) ∈ [0, 1]. According to the iterative Formula (3.6),
Sim 1 (vs , vt ) ∈ [0, 1].
Since the initial similarities Sim 0 (vs , vt ) = 0, we have Sim 0 (vs , vt ) = 0. Thus
Sim 0 (vs , vt ) ≤ Sim 1 (vs , vt ). Therefore, the monotone non-decreasing holds for i =
1.
Inductive step: Assume Sim k−1 (vs , vt ) ≤ Sim k (vs , vt ) holds for i = k. According
to the above iterative formula definitions, we have

1 Σ
sim k (vs , vt ) = max sim e (es , et ) × sim ρ (ρs , ρt )
| pr e(vs )| ' vt' ∈ pr e(vt )
v ∈ pr e (v )
s s

( ) 1 Σ
× Sim k−1
vs' , vt ≤
'
max sim e (es , et )
| pr e(vs )| ' vt' ∈ pr e (vt )
vs ∈ pr e (vs )
( )
× sim ρ (ρs , ρt ) × Sim k vs' , vt' = sim k+1 (vs , vt ).

Thus, we have Sim k (vs , vt ) ≤ Sim k+1 (vs , vt ), that is, the monotone non-
decreasing holds for i = k + 1.
Since both the basis and the inductive step have been performed, by mathematical
induction, the monotone non-decreasing holds for all i ≥ 1.

Theorem 3.2 Iterative Formula (3.6) is convergence.

Proof : Based on the Lemma 3.1 and Lemma 3.2, it is obviously proved.

3.4.2.3 Combining Similarities and Alignment Extraction

In order to obtain the overall similarity degrees between vertices, we need to aggregate
the results of different similarity functions. There are several approaches to this,
including linear averages, nonlinear averages and machine learning techniques. In
our works, we use a simple approach based on linear averages. Firstly, we obtain the
label similarity (SimL ) by taking an average of the syntactic similarities (simsy ) and
semantic (simse ) similarities. Then, total similarity (Sim) is calculated by combining
the label similarity and structural similarity (SimS ). To more accurately distinguish
between similarity scores that are close to the median, we use a non-linear function,
sigmoid function s (Ehrig and Sure, 2004), to compute each similarity score. The
idea behind using a sigmoid function is quite simple: it allows reinforcing similarity
scores higher than 0.5 and to weaken those lower than 0.5. That is to say, the sigmoid
function provides high values for the best matches and lower ones for the worse
matches. This treatment is meant to clearly separate two zones: the positive and
negative correspondences. In this way, the general formula for this combination task
can be given as following:
90 3 Fuzzy RDF Modeling

Sim(vs , yt ) = ω sig(Sim L (vs , vt )) + (1 − ω)sig(Sim S (vs , vt )) (3.7)

Here ω is a pre-defined weight, sig(x) = 1+e−α(x−0.5)

1
, α being a parameter for the
slope.
To obtain a correspondence relation between input fuzzy RDF graphs, we set a
threshold δ for translating the overall similarity degrees into a binary relation. Pairs
of data with similarity degree above the threshold are included in and the rest are left
out. However, the choice of the thresholds δ is difficult: an increment of δ results in
increased matching quality (i.e., a low number of false positives), but simultaneously
reduces the matching coverage (i.e., a low number of false negatives). Similarly, a
smaller δ decreases the matching quality along with a higher matching coverage. In
practice, we expect to produce a small decrease the matching quality if it can bring
about a comparable increase the matching coverage. Because it is easier for us to
remove incorrect matches rather than find the missing ones in the process of data
matching.

3.5 Algebraic Operations in Fuzzy RDF Graphs

As evidenced by the database management systems, a formal algebra is essential

for applying database-style optimization to query processing. Similarly, along the
fuzzy RDF model as introduced in the previous chapters, the fuzzy RDF algebraic
operations should be defined for supporting fuzzy RDF queries. In this section, we
introduce a general algebraic framework for supporting imprecise and uncertain RDF
queries (Ma et al., 2018). The algebra serves as a target language for translation from
declarative user-oriented query language for fuzzy RDF. It is user-friendly and can
provide a concise representation of query execution.
In the following, we introduce several common fuzzy RDF algebraic operators
for SPARQL graph pattern: for example, union, selection, left join and projection,
because these operations can be directly applied in the UNION, FILTER, OPTIONAL
and SELECT expressions of SPARQL, respectively. In addition, we need to inves-
tigate some additional operations to deal with RDF graph model because SPARQL
queries run on RDF graph data model. To satisfy both general and RDF specific
properties, we design our algebra which can be classified into three main categories
of operations: graph-set operations, pattern-matching operations and construction
operations. Graph-set operations take a collection of graphs (a collection of vertices
or edges are the extreme cases) and perform set-theoretical operations, although some
may not have exactly identical semantics as their relational counterparts. Pattern-
matching operations are oriented to structural selection and extraction by employing
pattern graphs. Construction operations are designed to facilitate the result graph
construction for RDF queries by providing a means for creating and inserting new
vertices/edges and manipulating the extracted structures.
3.5 Algebraic Operations in Fuzzy RDF Graphs 91

3.5.1 Algebraic Operations

1. Set Operations

Set operations take a set of graphs as input and then perform set-theoretical oper-
ations. Here we identify four standard fuzzy set-graph operations, which are fuzzy
union (∪), fuzzy intersection (∩), Cartesian product (×) and fuzzy difference (−).
Fuzzy union: Let G 1 = (V1 , E 1 , Σ1 , L 1 , μ1 , ρ1 ) and G 2 =
(V2 , E 2 , Σ2 , L 2 , μ2 , ρ2 ) be two fuzzy RDF sub-graphs of G, respectively. The
fuzzy union of G1 and G2 is defined as follows.

G 1 ∪ G 2 = (Vr , Er , Σr , L r , μr , ρr )

Here Vr = V1 ∪ V2 , Er = E 1 ∪ E 2 , Σr = Σ1 ∪ Σ2 , and L r = L 1 ∪ L 2 are the

classic set theoretical union, μr and ρ r are the membership degree of fuzzy union
(Sunitha, 2001) result,
⎧ respectively. Here
⎨ μ1 (v), ∀v ∈ V1 − V2
μr (v) = μ2 (v), ∀v ∈ V2 − V1 ,
⎩
μ1 (v) ∨ μ2 (v), ∀v ∈ V1 ∩ V2
⎧
( ) ⎨ ρ 1 (vi , v j ), ∀(vi , v j ) ∈ E 1 − E 2
ρr vi , v j = ρ 2 (vi , v j ), ∀(vi , v j ) ∈ E 2 − E 1 , and
⎩
ρ 1 (vi , v j ) ∨ ρ 2 (vi , v j ), ∀(vi , v j ) ∈ E 1 ∩ E 2
a ∨ b denoted the maximum of a and b (i.e., a ∨ b = max (a, b)).
For example, we apply fuzzy union operation to the fuzzy RDF graphs shown in
Figs. 3.5a and 3.1. Then we have the result of the union operation shown in Fig. 3.5b.
Fuzzy intersection: Let G 1 = (V1 , E 1 , Σ1 , L 1 , μ1 , ρ1 ) and G 2 =
(V2 , E 2 , Σ2 , L 2 , μ2 , ρ2 ) be two fuzzy RDF sub-graphs of G, respectively. The
fuzzy intersection of G1 and G2 is defined as follows.

G 1 ∩ G 2 = (Vr , Er , Σr , L r , μr , ρr )

Here Vr = V1 ∩ V2 , Er = E 1 ∩ E 2 , Σr = Σ1 ∩ Σ2 , L r = L 1 ∩ L 2 with
the( classic
) set theoretical
( ) ( μr (v)) = μ1 (v) ∧ μ2 (v), ∀v ∈ V1 ∩ V2 and
) intersection,
(
ρx vi , v j = ρ1 vi , v j ∧ ρ2 vi , v j , ∀ vi , v j ∈ E 1 ∩ E 2 are the membership degree
of fuzzy intersection (Sunitha, 2001) result respectively, and a ∧ b denoted the
minimum of a and b, i.e., a ∧ b = min(a, b).
For example, we apply a fuzzy intersection operation to the fuzzy RDF graphs
in Figs. 3.5a and 3.1. Then we get the result of the intersection operation shown in
Fig. 3.6.
92 3 Fuzzy RDF Modeling

Fig. 3.5 Fuzzy union operation

Fig. 3.6 Fuzzy intersection

operation

Fuzzy Cartesian product: Let G 1 = (V1 , E 1 , Σ1 , L 1 , μ1 , ρ1 ) and G 2 =

(V2 , E 2 , Σ2 , L 2 , μ2 , ρ2 ) be two fuzzy RDF sub-graphs of G, respectively. Then the
fuzzy Cartesian product (Sunitha, 2001) of G1 and G2 is defined as follows.

G 1 × G 2 = (Vr , Er , Σr , L r , μr , ρr )
3.5 Algebraic Operations in Fuzzy RDF Graphs 93

Fig. 3.7 Fuzzy Cartesian product operation

Here

Vr = V1 × V2 , Er = {(u, u 2 )(u, v2 )|u ∈ V1 , u 2 v2 ∈ E 2 }∪

{(u 1 , w)(v1 , w) |w ∈ V2 , u1v1 ∈ E 1 }, μr (u 1 , u 2 )
= (μ1 × μ2 )(u 1 , u 2 ) = μ1 (u 1 ) ∧ μ2 (u 2 ), ∀(u 1 , u 2 ) ∈ V

, and

ρ1 ρ2 ((u, u 2 )(u, v2 ) = μ1 (u) ∧ ρ2 (u 2 v2 ), ∀u ∈ V1 , ∀u 2 v2 ∈ E 2

ρr =
ρ1 ρ2 ((u 1 , w)(v1 , w) = μ2 (w) ∧ ρ1 (u 1 v1 ), ∀w ∈ V2 , ∀u 1 v1 ∈ E 1

In the above definitions, an edge between two vertices u and v is denoted by uv

rather than (u, v), because in the Cartesian product of two graphs, a vertex of the
graph itself is an ordered pair.
For example, for two simple fuzzy RDF graphs G and G' in Fig. 3.7a–c shows
the result of fuzzy Cartesian product of G and G'.
Fuzzy difference: Let G 1 = (V1 , E 1 , Σ1 , L 1 , μ1 , ρ1 ) and G 2 =
(V2 , E 2 , Σ2 , L 2 , μ2 , ρ2 ) be two fuzzy RDF sub-graphs of G, respectively, Then the
fuzzy difference (Sunitha, 2001) of G1 and G2 is defined as follows.

G 1 − G 2 = (Vr , Er , Σr , L r , μr , ρr )

Here E K = E 1 − E 2 with the classic set theoretical difference, V r consists

precisely of those vertices
( which
) are
( induced
) ( by the ) set of edges in E r , μr (v) =
μ1 (v), ∀v ∈ V r and ρr vi , v j = ρ1 vi , v j , ∀ vi , v j ∈ E 1 − E 2 .
Actually, the fuzzy difference of G1 and G2 defines a new fuzzy RDF graph which
is formed by removing the edges of G2 from the edges of G1 . Note that G1 − G2 is
different from G2 − G1 .
94 3 Fuzzy RDF Modeling

Fig. 3.8 Fuzzy difference operation

For example, Fig. 3.8 shows the result of fuzzy difference operation G1 − G2 ,
where the graph G1 is the fuzzy RDF graph in Fig. 3.1, and graph G2 is the fuzzy
RDF graph of Fig. 3.5a.

Theorem 3.3 Let G 1 = (V1 , E 1 , Σ1 , L 1 , μ1 , ρ1 ) and G 2 =

, , Σ ,
(V2 2 2 2 2 2 )
E L , μ , ρ be two fuzzy RDF graphs, respectively. Then we have
the followings.

1. G1 ∪ G2 is a fuzzy RDF graph.

2. G1 ∩ G2 is a fuzzy RDF graph.
3. G1 × G2 is a fuzzy RDF graph.
4. G1 − G2 is a fuzzy RDF graph.

Let G 1 = (V1 , E 1 , Σ1 , L 1 , μ1 , ρ1 ), G 2 = (V2 , E 2 , Σ2 , L 2 , μ2 , ρ2 ), …, G i , =

(V i , E i , Σi , , L i , μi , ρi , ) be two fuzzy RDF graphs, respectively. Then we say G =
(V, E, Σ, L, μ, ρ) is the reconstruction of G1 , G2 , …, Gi if G comes from G1 , G2 ,
…, Gi based on Theorem 3.3.

2. Fuzzy Selection Operation

The fuzzy selection operation can filter the fuzzy graphs using a graph pattern. It
accepts a set of fuzzy graphs and a fuzzy graph pattern as input. The output is a fuzzy
collection composed of all subgraphs that match the given graph pattern, which is
not only the content of the right result but also the structure of objective graphs.
Fuzzy selection: Let G = (V , E, Σ, L , μ, ρ) be a fuzzy RDF data graph. For a
given RDF graph pattern P = (V P , E P , FV , FE , R E ), we have the definition of
fuzzy selection as follows.

σ P (G) = {<g, δ P (g)>|g = ∈(P, G), δ P (g) > 0}

3.5 Algebraic Operations in Fuzzy RDF Graphs 95

Fig. 3.9 The result graph of fuzzy selection operation

Here g is a subgraph of G, function ∈(P, G) is used for fuzzy RDF graph pattern P
matching with G, and δ P (g) is the satisfaction degrees. In case of duplicates (a same
graph appearing with several satisfaction degrees), the highest satisfaction degree is
kept.
For example, Fig. 3.9 is the answer of σ P (G) where P is the RDF graph pattern of
Fig. 3.2 and G is the fuzzy data graph of Fig. 3.1. From the graph, the box office of
the film labeled Film1 is over 3.5 billion and its genre is tragedy. Two people labeled
pid1 and pid2 respectively are the stars of the film, and they are born in counrty1.
Furthermore, the path going from pid1/pid2 to country1 satisfies the regular express
RE = “* · locateIn+ ”. Thus, there are two answers (Fig. 3.9a, b) matching the graph
pattern P in the fuzzy data graph G. As the satisfaction degree is the minimum of
satisfaction degrees induced by Definition 3.4, we have δ P (g1 ) = 0.7 in Fig. 3.9a and
δ P (g2 ) = 0.3 in Fig. 3.9b, respectively.

3. Fuzzy Projection Operation

Selection and projection are orthogonal operations in relational algebra. With RDF
graphs, selection and projection are not so obviously orthogonal. However, they
have different semantics that respectively correspond to two returned semantics for
matching pattern P against a fuzzy RDF graph G operation, and they are general-
izations of their respective relational counterparts. The fuzzy projection in our data
model takes a collection fuzzy graph as input, an RDF graph pattern P and a projec-
tion list PL as parameters. A projection list is a list of objects (vertices and edges)
labels appearing in the pattern P, possibly adorned with *. The output of projection
includes the all objects appearing in the P, however emphasizing that the (partial)
hierarchical relationship among the retained objects in the original input graph struc-
ture is preserved. Note that, if this projection list is empty, just the matching graphs
are returned. This implies that the fuzzy projection may be regarded as eliminating
96 3 Fuzzy RDF Modeling

Fig. 3.10 The result graph of fuzzy projection operation

objects other than specified in the fuzzy RDF data graph. The projection operation
is defined as follows.
Fuzzy projection: Let G = (V , E, Σ, L , μ, ρ) be a fuzzy RDF data graph, is a
fuzzy projection function and P is an RDF graph pattern. Then the fuzzy projection
can be defined as follows.

π P,P L (G) = {<g, δT (g)>|g = (P, P L , G), δT (g) > 0}

The result of the projection operation is a fuzzy set of graphs, and δT (g) is the
satisfaction degrees. The fuzzy projection operation returns a fuzzy set composed of
all subgraphs of G that match the fuzzy graph pattern P.
For example, we apply the same pattern graph of Fig. 3.2 and a projection to the
fuzzy RDF graph of Fig. 3.1. Then we obtain the result of the projection operation
shown in Fig. 3.10. The satisfaction degree δT (g) is 0.3. The difference in the output
structures of selection and projection operations is obvious.

4. Fuzzy Join Operation

The fuzzy join operation joins data graphs on a pattern. As in relational algebra, join
can be expressed as a Cartesian product followed by a fuzzy selection. The condition
of selection is to compare a property of the first graph with the other graph. In a valued
join, the join condition is a predicate on vertex labels of the constituent graphs. In a
structural join, the constituent graphs can be concatenated by edges or unification.
Fuzzy join: Let G1 and G2 be two fuzzy RDF graphs and P be an RDF graph pattern.
Then the fuzzy join operation is defined as follows.

G 1 ▷ P G 2 = {g|g = σ P (G 1 × G 2 )}

Here P is to be matched against (G 1 × G 2 ), at least one predicate f in the F V of

P is L(v1 ) = L(v2 ), here v1 matches vertices in G1 , and v2 matches vertices in G2 .
That is, L(v1 ) refers to a vertex label in G1 and L(v2 ) to one in G2 .
The left join of the above expressions is defined as G1 ⟕P G2 , which has the
following semantics: P1 and P2 are the two parts in P that are matched against G1
3.5 Algebraic Operations in Fuzzy RDF Graphs 97

and G2 respectively, if no matching graph G '2 obtained from σ P2 (G 2 ) satisfies the join
condition L(v1 ) = L(v2 ), then output just σ P1 (G 1 ); otherwise, output σ P (G 1 × G 2 ).
5. Construction Operations

Querying a fuzzy RDF graph implies not only extracting interesting content from the
input model but also constructing an output model by inserting new vertices/edges
or by deleting vertices/edges from the extracted graph. Construction operations are
designed to facilitate the result graph construction for RDF queries.
The vertex deletion operation removes identify vertices from a graph. A delete
specification is used to identify vertices, and it indicates by vertex label which vertices
to delete.
Vertex deletion: Formally, the delete operation takes a fuzzy data graph G =
(V , E, Σ, L , μ, ρ) as input and a delete specification DS as parameter. A delete
specification is a set of vertices labels appearing in G. It generates a fuzzy graph
defined as follows:
{ ( )}
K (G, DS) = g|g = V ' , E ' , Σ, L , μ, ρ

Here V ' = {v|v ∈ V and L(v) ∈/ DS} and E ' is the restriction of E over V ' × V ' .
Edge deletion has the same idea with vertex deletion. It removes the relationship
from an RDF graph.
Edge deletion: Edge deletion operation takes as input a fuzzy graph G, a set of edge
labels ES, it returns a fuzzy graph defined as follows:
{ ( )}
λ(G, E S) = g|g = V, E ' , Σ, L , μ, ρ

Here E ' = {e|e ∈ E and L(e) ∈ / DS}.

Note that, vertex deletion is very similar to projection. In fact, it can be viewed as
projection with a complemented projection list, specifying vertices to be eliminated
rather than vertices to be retained.
The vertex insertion operation possibly adds a new vertex to the fuzzy RDF data
graph. The type of the new vertex is a resource, blank or literal, and the label of
the new vertex is a URIs if the vertex represents a resource or a string if the vertex
represents a literal.
Vertex insertion: Let G be a fuzzy RDF graph, IS be an insert specification which is
a set of vertices labels and δ be fuzzy degree of the insert vertex. The vertex insertion
operation returns a fuzzy graph including the inserting vertices.
( )
Φ(G, I S) = (g|g = V ' , E, Σ ' , L , μ, ρ }
{ ( ) ( ) }
Here V ' = V ∪ v ' |L v ' ∈ I S and μ v ' = δ and Σ ' = Σ ∪ I S.
The edge insertion operation adds a new property edge to connect subject and
object in the RDF data graph.
98 3 Fuzzy RDF Modeling

Edge insertion: Let G be a fuzzy RDF graph, ES be the edges labels and δ be
fuzzy degree of the insert edges. The edge insertion operation returns a fuzzy graph
including the inserting edges.
( )
φ(G, E S) = (g|g = V , E ' , Σ ' , L , μ, ρ }
{ ( ) ( ) }
Here E ' = E ∪ e' |L e' ∈ E S and ρ e' = δ and Σ ' = Σ ∪ E S.

3.5.2 Equivalences

Equivalence laws can be applied to rewrite algebra expressions in a form that satis-
fies certain needs. In this section, we present some algebraic equivalences based on
data graph isomorphism. Algebraic laws are important for query optimization. Since
our RDF graph algebra shares some operations with relational algebra and there-
fore related properties and laws defined in relational algebra carry along. We focus
here on graph patterns properties that are unique to our algebra. First, we define an
equivalence relationship between graph patterns.

Definition 3.12 (Equivalence of graph patterns). Let P1 and P2 be two graph pattern
expressions. For any valuation ξ of P1 and P2 over G, it holds that ξ (P1 ) = ξ (P2 ).
Then the two graph pattern expressions P1 and P2 are equivalent, denoted by P1 ≡ P2 .
There are some properties for the fuzzy RDF algebra. For

Proposition 3.1 (Commutativity of ∪, ∩, ×, ⨝). If the operator is one of the

operators ∪, ∩, ×, and ⨝ then
1. G1 ∪ G2 = G2 ∪ G1
2. G1 ∩ G2 = G2 ∩ G1
3. G1 × G2 = G2 × G1
4. G 1 ▷G 2 = G 2 ▷G 1

Proposition 3.2 (Associativity of ∪, ∩, ×, ⨝). If operator is one of the operators

∪, ∩, ×, and ⨝, then
1. (G 1 ∪ G 2 ) ∪ G 3 = G 1 ∪ (G 2 ∪ G 3 )
2. (G 1 ∩ G 2 ) ∩ G 3 = G 1 ∩ (G 2 ∩ G 3 )
3. (G 1 × G 2 ) × G 3 = G 1 × (G 2 × G 3 )
4. (G 1 ▷G 2 )▷G 3 = G 1 ▷(G 2 ▷G 3 )

Proposition 3.3 (Commutativity of σ with ∪, −, ⨝, ⟕). Let G1 , G2 be fuzzy data

graphs, P be the RDF graph pattern. Then we have
1. σ P (G 1 ∪ G 2 ) = σ P (G 1 ) ∪ σ P (G 2 )
2. σ P (G 1 − G 2 ) = σ P (G 1 ) − σ P (G 2 )
3. σ P (G 1 ▷G 2 ) = σ P (G 1 )▷G 2
3.5 Algebraic Operations in Fuzzy RDF Graphs 99

4. σP (G1 ⟕ G2 ) = σP (G1 ) ⟕ G2

Proposition 3.4 (Commutativity of π with ∪). Let G1 , G2 be fuzzy data graphs, P1

and P2 be the RDF graph pattern. Then we have
1. π P (G 1 ∪ G 2 ) = π P (G 1 ) ∪ π P (G 2 )
2. π P1 (π P2 (G)) = π P1 ∩P2 (G)

Proposition 3.5 (Distributivity of ∪ with ⨝, ⟕). Let G1 , G2 and G3 be fuzzy data

graph. Then we have
1. (G1 ⨝ (G2 ∪ G3 )) ≡ ((G1 ⨝ G2 ) ∪ (G1 ⨝ G3 ))
2. (G1 ⟕ (G2 ∪ G3 )) ≡ ((G1 ⟕ G2 ) ∪ (G1 ⟕ G3 ))
3. ((G1 ∪ G2 ) ⟕ G3 ) ≡ ((G1 ⟕ G3 ) ∪ (G2 ⟕ G3 ))

Proposition 3.6 (Decomposition and elimination). Let G be fuzzy data graphs, P1

and P2 be the RDF graph pattern. Then we have
1. σ P1 ∧P2 (G) = σ P1 (σ P2 (G))
2. σ P1 ∨P2 (G) = σ P1 (G) ∪ σ P2 (G)
3. σ P1 (σ P2 (G)) = σ P2 (σ P1 (G))

The list above is not comprehensive by any means. Further study of other algebraic
properties of RDF graph patterns is part of our current research focuses. We believe
that studying these algebraic properties can yield fruitful results that can further be
implemented in tasks like caching RDF query results, views management and query
results reuse.

3.5.3 Relationship of SPARQL and the Algebraic Operations

In order to meet the needs of practical application, it is not enough for modeling fuzzy
RDF and querying fuzzy RDF is very necessary. This section investigates the fuzzy
RDF query processing according to the definitions of fuzzy RDF algebraic operations
presented above. We begin with the description of the characters of the SPARQL
query language in the fuzzy RDF and then explain the translation of SPARQL queries
into equivalent RDF algebraic expressions.
1. SPARQL Query in the Fuzzy RDF
SPARQL (Prud’hommeaux & Seaborne, 2008) is a proposal of a protocol and query
language designed for easy access to RDF format datasets. It defines a query language
with a SQL-like syntax, including joins and the capability to retrieve and combine
data from several graphs, where a simple query is based on graph patterns, and query
processing consists of the binding of variables to generate pattern solutions. SPARQL
comes with a powerful graph matching facility, whose basic construct are so-called
triple patterns. On top of that, SPARQL provides a number of advanced functions
100 3 Fuzzy RDF Modeling

for constructing more expressive queries, for stating additional filtering conditions,
and for formatting the final output.
The overall structure of the query language resembles SQL with its three major
parts, denoted by the upper-case key words SELECT, FROM, and WHERE.
1. The key word SELECT determines the result specification including solution
modifiers. The statements after SELECT refer to the remainder of the query: the
listed names are identifiers of variables for which return values are to be retrieved.
In contrast to SQL, SPARQL allows several forms of returning the data: a table
using SELECT, a graph using DESCRIBE or CONSTRUCT, or a TRUE/FALSE
answer using ASK.
2. The key word FROM specifies a dataset of one default graph and zero or more
named graphs to be queried.
3. The key word WHERE initiates the actual query, in which is composed of a graph
pattern. Informally speaking, this clause is given by a pattern that corresponds
to an RDF graph where some resources have been replaced by variables. But not
only that, more complex patterns are also allowed, which are formed by using
some algebraic operators. This pattern is used as a filter of the values of the
dataset to be returned.
Classical SPARQL query suffers from a lack of query flexibility. The given query
condition and the contents of the RDF repositories are all crisp. In this context, a
query answer will either definitely or definitely not satisfy the condition. In the fuzzy
RDF repositories, however, an answer may satisfy the query condition with a certain
possibility and a certain membership degree even if the condition is crisp due to the
fact that datasets are vagueness (or imprecision). Therefore, just like the definition of
fuzzy selection operation given above, one needs to compute appropriate trustworthi-
ness for the query results when fuzzy data are transformed through SPARQL queries.
Thus, we introduce one additional expression “WITH <threshold>”. The optional
parameter [WITH <threshold>] indicates the condition that must be satisfied as
the minimum membership degree threshold in [0, 1]. Users choose an appropriate
value of <threshold> to express his/her requirement. Therefore, a canonical SPARQL
statement is of the form: SELECT—FROM—WHERE—[WITH <threshold>].
Utilizing such SPARQL, one can get such answers that satisfy the given query
condition and the given threshold. Therefore, depending on the different thresholds
that are values in [0, 1], the same query for the same fuzzy RDF may have different
query answers. Queries for fuzzy RDF databases are concerned with the numerous
choices of threshold. Note that the item WITH <threshold> can be omitted. The
default of <threshold> is exactly 1 at this moment.
2. Translating SPARQL Pattern into Fuzzy RDF Algebraic Formalism
A principal motivation in designing fuzzy RDF graph model is to use it as a basis for
efficient implementation of high level RDF query language. As the standard query
language for RDF, SPARQL allows us to build complex group graph pattern. Group
patterns can be used to restrict the scope of query conditions to certain parts of the
pattern. Moreover, it is possible to define sub-patterns as being optional, or to provide
3.5 Algebraic Operations in Fuzzy RDF Graphs 101

Table 3.3 The translation

Original SPARQL syntax Algebraic Syntax
rules of a SPARQL query
pattern into RDF graph {t} (t)
algebra expressions {P1 } OPTIONAL {P2 } P1 ⟕ P2
{P1 } UNION {P2 } P1 ∪ P2
{P1 . P2 } P1 ⨝ P2
{P FILTER R} σR (P)

multiple alternative patterns. In this section, we begin with the expressive power of
fuzzy RDF algebra w.r.t the core fragment of SPARQL query languages. Then,
we show that every SPARQL query pattern can be translated into our fuzzy RDF
algebraic terminology introduced above, and provide the procedure that performs
this translation.
Our fuzzy RDF algebra is designed taking SPARQL’s power of expression into
consideration. SPARQL pattern expressions from the WHERE clause can easily be
translated into fuzzy RDF algebraic expressions. The vice versa translation is not
always possible as there are fuzzy RDF algebra expressions (e.g. expressions with
construction operations) that are not expressible in SPARQL. Before providing the
procedure that performs this translation, we discuss the translation rules of SPARQL
pattern into RDF algebra expression. We do not recall the complete surface syntax
of SPARQL here but simply introduce the underlying algebraic operations using our
notation. Let G be a RDF graph over a RDF dataset D, t denotes a triple pattern,
P, P1 , P2 be basic SPARQL graph patterns, and R a filter condition, and S a set of
variables. Table 3.3 shows the translation rules of SPARQL query mode and RDF
algebraic expression.
An SPARQL query pattern is either a basic graph pattern or group graph pattern,
consisting of the triple blocks, FILTER, OPTIONAL, and UNION graph pattern.
Some of them contain other graph patterns. The above translation is applied to a single
SPARQL group graph pattern. Nested group graph pattern blocks in the WHERE
clauses can be handled quite easily, leading to the following result:

Theorem 3.4 Fuzzy RDF algebra expressions can express SPARQL query patterns.

Proof: SPARQL individual triple patterns can be expressed by “triple pattern match-
ing” expressions. Basic graph patterns in SPARQL imply a join on common variables
among individual triple patterns. The UNION, FILTER, OPTIONAL and SELECT
expressions can be directly mapped to “union”, “selection” “leftjoin” and “projec-
tion” operators in fuzzy RDF algebra operations. These identified pattern expressions
in the nesting sequence, inside out, and then can be expressed by a cascade of “join”
operator in the same way that natural join is defined in relational algebra.

Besides the conversion rules as such, it is of course also necessary to define how to
transform SPARQL queries into expressions of this algebra in the first place. Based on
the above translation rules and Theorem 3.4, we can transform any SPARQL patterns
into algebra expression. For the sake of readability, we assume that the translation
102 3 Fuzzy RDF Modeling

of triple blocks (basic graph pattern) is given (this translation is straightforward). In

Algorithm 3.2, we show a transformation function Translate (G) of patterns in the
SPARQL syntax into the algebraic formalism presented in Sect. 3.5.

Algorithm 3.2 Transformation of SPARQL pattern syntax into fuzzy RDF algebraic
expression
Translate (group graph pattern G)
Input: a SPARQL pattern G
Output: an algebraic expression A
1: A= φ; F = φ
2: for each syntactic form g in G do
3: if g is triple pattern t then
4: A = (A ⨝ (t))
5: if g is OPTIONAL {P} then
6: A = (A ⟕ Translate (P))
7: if g is {P1 } UNION…UNION {Pn } then
8: if n>1 then
9: A' = (Translate (P1 ) ∪…∪ Translate (Pn ))
10: else
11: A' = Translate (P1 )
12: A = (A ⨝ A' )
13: if g is FILTER {R} then
14: F = F ∧ {R}
15: end for
16: if F /= φ then
17: A = σ F(A)

Algorithm 3.2 consists of three phases. In the first phase (Lines 1), initially the set
A and F are empty, in which store pattern and filtering conditions respectively. In the
second phase (Lines 2–15), translating is performed to obtain all algebraic expression
of g in group graph pattern G. For each loop of translating, if sub pattern g is a
triple pattern or triple block, a join operation is performed for collecting triples and
blocks (Line 3–4). Then, for each sub pattern g with Optional, a left join operation
is performed to provide optional matching (Lines 5–6). Next, all occurrences of
UNION are expressed using the binary operator union for specifying alternatives
(Lines 7–12). In case of a longer chain of alternatives, the patterns are processed
two at a time in accordance with the association rules for UNION. Finally, if g is an
operator FILTER, and R is a SPARQL built-in condition, a conjunction operator is
performed for combining filter condition R and F as basic constraints (Lines 13–14).
This procedure is repeated until all sub patterns in G have been translated. If F is not
empty, combine it with A with the selection operator in fuzzy RDF algebra operations
(Lines 16–17).
In Algorithm 3.2, we focus on the core fragment of SPARQL query pattern and,
thus, we impose the following restrictions on graph patterns and the translation
process. First, we will be mainly focused on the procedure that performs this transla-
tion of SPARQL patterns, that is, we do not take into account the solution modifiers
and the output of a SPARQL query. Second, we are not considering blank vertices.
3.5 Algebraic Operations in Fuzzy RDF Graphs 103

We make this simplification here to concentrate on the pattern matching part of the
language. And third, we concentrate on the set semantics of graph patterns.

Proposition 3.7 Algorithm 3.2 is correct and complete for translating the SPARQL
pattern into RDF algebraic expressions.

This proposition can be proved inductively. First, the set of algebraic expressions
is complete for the empty set, and at each step the SPARQL graph pattern G is
completely extended for the current syntactic form f , and the number of syntactic
forms being finite in SPARQL graph pattern. The algorithm proceeds recursively
until all syntax forms have been translated into algebraic expressions completely.
The procedure ends having an algebraic expression for each syntactic form in G.
In essence, a SPARQL query is a result constructor wrapped in a set of vari-
able bindings generated by the graph pattern. Therefore, the final step of translation
work generates the operators for the result type. The official W3C Recommenda-
tion (Prud’hommeaux & Seaborne, 2008) defines four different types of queries on
top of expressions, namely SELECT, ASK, CONSTRUCT, and DESCRIBE queries.
Depending to the result type of the query, the translation creates the appropriate oper-
ator and connects it to the algebraic expression generated so far, i.e., it constructs a
dataflow to the algebraic expression representing the graph pattern. We will restrict
our discussion to SELECT queries in Example 9. The various expressive features of
a SELECT query can be successively replaced by an expression using fuzzy RDF
algebra operators. We do not recall the complete surface syntax of SPARQL here but
simply introduce the underlying algebraic operations using our notation.
In the following, we show how a fuzzy RDF algebraic expression is used to
represent an SPARQL query. For convenience, we firstly use the natural language
to express the fuzzy RDF queries. Then, we provide the SPARQL query statement
written according to the official SPARQL syntax along with their equivalent RDF
algebraic expression.
For example, suppose that we would query the name of a movie and its starring
name. The movie’ box office is more than more than 30 million. The birthplace of
the starring is “region1” and, optionally (i.e., if available), their partner. At the same
time, trustworthiness for the query result is more than 0.2. Consider the SPARQL
query written according to the official SPARQL syntax.
PREFIX ex: <http://example.org/>
SELECT ?x, ?z, ?p
FROM G
WHERE { ?x ex: box office ?y
FILTER (?y > $ 30 million)
?x ex: starring ?z.
?z ex: bornIn ?c.
?c ex: locateIn ?r
FILTER (?r = “region1”)
OPTIONAL {?z ex: marriedTo ?p } }
WITH <0.2>
104 3 Fuzzy RDF Modeling

Following the grammar of SPARQL, the above pattern (WHERE clause) is parsed
as a single group graph pattern that contains the syntactic forms triple block, filter,
triple blocks, filter, and optional graph pattern in that order. This final optional graph
pattern syntactic form contains a group graph pattern with a single triple block
syntactic form. The translation procedure in Algorithm 3.1 starts with A = {} and F
= {}. Then we consider all the syntactic forms in the pattern to obtain:

A = (({}▷ T ranslate(t1 )▷T ranslate (t2 ))▷T ranslate(gp1 )

F = ((?y > “$ 30 million”) ∧ (?r = “region1”))

Here t 1 is ?x ex: box office ?y, t 2 is ?x ex: starring ?z. ?z ex: born in ?c. ?c
ex: locateIn ?r, and gp1 is { ?z:marriedTo ?p}. The translations Translate (t 1 ) and
Translate (t 2 ) are simply {(?x ex: box office ?y)} and {(?x ex: starring ?z), (?z ex: born
in ?c), (?c ex: locateIn ?r)}, respectively. To compute Translate (gp1 ) the algorithm
proceeds recursively and gives as output the pattern:
A' = ({} ⨝ (?z:marriedTo ?p))
Finally, the graph pattern of the query in the algebraic syntax is:

P = σ F (A)

Here A = (({} ⨝ {(?x ex: box office ?y)} ⨝ {(?x ex: starring ?z), (?z ex: born
in ?c), (?c ex: locateIn ?r)}) ⟕ ({} ⨝ {(?z:marriedTo ?p)}) and F = ((?y > “$ 30
million”) ∧ (?r = “region1”)). Assume that the input RDF graph G is given in Fig. 1.
Then the above SPARQL query evaluated on the fuzzy RDF graph G is equivalent
to the RDF algebraic expression:

π P,L S (G)

Here P = σ F (A) is the pattern graph, LS = {?x, ?z, ?p} is the projection list and
G is the input RDF graph. It is easily verified that answers are as follows.
πP, LS (G) = {<{?x → film1, ?z → pid1}, 0.7>, <{?x → film1, ?z → pid2, ?p →
pid3}, 0.3>}.
Similar translations are also feasible for other SPARQL query types. The main
challenge of SPARQL query translation to the algebraic expression lies in the core
fragment of the query pattern, which is common to all query types. We will briefly
introduce the translation process corresponding to different SPARQL query types.
A CONSTRUCT query can copy existing triples from a dataset, or can create
new triples. For the former case, the triple graph (result graph) can be directly
retrieved from the data source by the selection operation. For the latter case, the
intermediate graph can be extracted by the selection operation firstly. And then the
required triples can be extracted by the projection operation. Finally, construction
operations are designed to facilitate the result graph construction for RDF queries by
providing a means for creating and inserting new vertices/edges and manipulating the
3.6 Summary 105

extracted structures. Of course, this process may be repeated using multiple construc-
tion operations to complete. And the specific number of construction and complexity
determined by the size of the query problem.
ASK asks a query processor whether a given graph pattern describes a set of triples
in a given dataset or not, and the processor returns a boolean true or false depended
on whether there is a result graph. We can use a selection operation to extract a result
graph from a specific data source, based on a given graph pattern.
In a DESCRIBE query, it takes each of the resources identified in a solution,
together with any resources directly named by IRI, and assembles a single RDF graph
by taking a “description” which can come from any information available including
the target RDF dataset. It is worth noting that the description is determined by the
SPARQL query processor, according to the SPARQL 1.1 specification. This has led
to inconsistent implementations of DESCRIBE queries. In our solution, similar to
the CONSTRUCT query, the query pattern is utilized to create a result set. And
selection and projection operations are designed to return an RDF graph describing
a set of IRIs and the resources that are bound to given variable names, i.e., it returns
all the triples in the dataset involving these resources. Finally, the result RDF graph
is obtained through the construction operation.

3.6 Summary

Incorporation of fuzzy information in data model has been an important topic of

database community because such information extensively exists in real-world appli-
cations, in which fuzzy data play an import role in nature. Classical RDF model cannot
satisfy the need for modeling and processing fuzzy information. Therefore, topics
related to the modeling of fuzzy data are considered very interesting in the RDF
data context. In this chapter, we address the need for considering fuzzy data as part
of the RDF data model. We propose a fuzzy RDF graph data model to manipulate
the fuzzy information in RDF. We extend the ability of RDF that represents fuzzy
information without changing the current RDF standard. We also introduce a fuzzy
algebra based on a fuzzy RDF data model, which incorporates fuzzy information into
query answering. This algebra consists of a family of operations that make it possible
to express the data content and the structure of the fuzzy RDF graph. Moreover, we
discuss how to use our algebra to capture queries expressed in popular SPARQL
query languages. We investigate translation theorems and give the form of fuzzy
querying with SPARQL.
In order to meet the needs of practical application, just providing the modelling
technology of fuzzy RDF is not enough, Fuzzy RDF data management is also very
necessary. RDF data management, especially fuzzy RDF data management, typically
faces two primary technical challenges, which are scalable storage and efficient
queries of RDF data. Among these two issues, RDF data storage is the infrastructure
of RDF data management. It is also true that fuzzy RDF data storage is very important
106 3 Fuzzy RDF Modeling

for fuzzy RDF data management. How to store RDF with imprecise or uncertain
information has raised certain concerns as will be introduced in the following chapter.

References

Chen, L., Gupta, A., & Kurul, M. E. (2005). A semantic-aware RDF query algebra. In Proceedings
of the International Conference on Management of Data, Hyderabad, India.
Dividino, R., Sizov, S., Staab, S., & Schueler, B. (2009). Querying for provenance, trust, uncertainty
and other Meta knowledge in RDF. Journal of Web Semantics: Science, Services and Agents on
the World Wide Web, 7(3), 204–219.
Dorneles, C. F., Gonçalves, R., & dos Santos Mello, R. (2011). Approximate data instance matching:
A survey. Knowledge and Information Systems, 27(1), 1–21.
Ehrig, M., & Sure, Y. (2004). Ontology mapping—An integrated approach. In European Semantic
Web Symposium (pp. 76–91). Springer.
Fan, T., Yan, L., & Ma, Z. (2019). Mapping fuzzy RDF(S) into fuzzy object-oriented databases.
International Journal of Intelligent Systems, 34(10), 2607–2632.
Fan, W., Li, J., Ma, S., Tang, N., & Wu, Y. (2011). Adding regular expressions to graph reacha-
bility and pattern queries. In Proceedings of the 27th IEEE International Conference on Data
Engineering, Hannover, Germany (pp. 39–50).
Frasincar, F., Houben, G. J., Vdovjak, R., & Barna, P. (2002). RAL: an algebra for querying RDF.
In Proceedings of the Third International Conference on Web Information Systems Engineering
(pp 173–181).
Fukushige, Y. (2005). Representing probabilistic relations in RDF. In Proceedings of the Interna-
tional Semantic Web Conference, Galway, Ireland (pp. 106–107).
Grant, J., & Beckett, D. (2002). RDF test cases. http://www.w3.org/TR/2002/WD-rdf-testcases-
20021112/
Hartig, O. (2009). Querying trust in RDF data with tSPARQL. In Proceedings of the 6th European
Semantic Web Conference on the Semantic Web: Research and Applications, Heraklion, Crete,
Greece (pp. 5–20).
Hayes, P. (2004). RDF Semantics, W3C Recommendation. http://www.w3.org/TR/rdf-mt/
Huang, H., & Liu, C. (2009). Query evaluation on probabilistic RDF databases. In Proceedings
of the 10th International Conference on Web Information Systems Engineering, Poznań, Poland
(pp. 307–320).
Jaro, M. A. (1989). Advances in record-linkage methodology as applied to matching the 1985
census of Tampa, Florida. Journal of the American Statistical Association, 84(406), 414–420.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals.
In Soviet physics doklady (Vol. 10, No. 8, pp. 707–710).
Lopes, N., Polleres, A., Straccia, U., & Zimmermann, A. (2010). Anql: Sparqling up annotated rdfs.
In Proceedings of the 9th International Semantic Web Conference, Shanghai, China (pp. 518–533).
Ma, Z., Li, G., & Yan, L. (2018). Fuzzy data modeling and algebraic operations in RDF. Fuzzy Sets
and Systems, 351, 41–63.
Ma, Z. M., Liu, J., & Yan, L. (2010). Fuzzy data modeling and algebraic operations in XML.
International Journal of Intelligent Systems, 25(9), 925–947.
Manola, F., Miller, E., & McBride, B. (2004). RDF primer. W3C Recommendation, 10(1–107), 6.
Mazzieri, M., & Dragoni, A. F. (2008). A Fuzzy Semantics for the Resource Description Framework,
Uncertainty Reasoning for the Semantic Web I: ISWC International Workshops, URSW 2005–
2007 (pp. 244–261). Springer.
Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S., & Zave, P. (2011). Matching and merging
of variant feature specifications. IEEE Transactions on Software Engineering, 38(6), 1355–1375.
References 107

Piattini, M., Galindo, J., & Urrutia, A. (2006). Fuzzy Databases: Modeling, Design and Implemen-
tation.
Pappis, C. P., & Karacapilidis, N. I. (1993). A comparative assessment of measures of similarity of
fuzzy values. Fuzzy Sets and Systems, 56(2), 171–174.
Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet:: Similarity-Measuring the
Relatedness of Concepts. In AAAI (Vol. 4, pp. 25–29).
Prud’hommeaux, E., & Seaborne, A. (2008). SPARQL Query Language for RDF. W3C Recommen-
dation. http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
Robertson, E. L. (2004). Triadic relations: An algebra for the semantic web. In Proceedings of the
Second International Workshop on Semantic Web and Databases, Toronto, Canada (pp. 91–108).
Straccia, U. (2009). A minimal deductive system for general fuzzy RDF. In Proceedings of the Third
International Conference Web Reasoning and Rule Systems, Chantilly, VA, USA (pp. 166–181).
Sunitha, M. S. (2001). Studies on fuzzy graphs. PhD thesis, Cochin University of Science and
Technology, India.
Tappolet, J., & Bernstein, A. (2009). Applied temporal RDF: Efficient temporal querying of RDF
data with SPARQL. In Proceedings of the 6th European Semantic Web Conference on the Semantic
Web: Research and Applications, Heraklion, Crete, Greece (pp. 308–322).
Udrea, O., Recupero, D. R., & Subrahmanian, V. S. (2010). Annotated RDF. ACM Transactions on
Computational Logic, 11(2), 1–41.
Udrea, O., Subrahmanian, V. S., & Majkic, Z. (2006). Probabilistic RDF. In 2006 IEEE International
Conference on Information Reuse and Integration, Waikoloa Village, HI (pp. 172–177).
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Systems, 1(1), 3–28.
Zimmermann, A., Lopes, N., Polleres, A., & Straccia, U. (2011). A general framework for repre-
senting, reasoning and querying with annotated semantic web data. Journal of Web Semantics,
11(3), 72–95.
Zhu, X., Song, S., Lian, X., Wang, J., & Zou, L. (2014). Matching heterogeneous event data.
In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
(pp. 1211–1222).
Chapter 4
Persistence of Fuzzy RDF and Fuzzy
RDF Schema

4.1 Introduction

RDF represents an emerging data model that provides the means to describe resources
in a semi-structured manner for real-world applications. In practice, RDF is gaining
widespread momentum and usage in different domains, such as the Semantic Web,
Linked Data, Open Data, social networks, digital libraries, bioinformatics, or business
intelligence. With the wide application of RDF, the scale of available RDF data is
increasing dramatically. At this point, the scalable storage and efficient queries of
RDF data are becoming increasingly crucial. The former is the infrastructure for
RDF data management (Ma et al., 2016).
We can identify three major types of RDF data store: memory-based storage,
traditional databases-based storage, and NoSQL databases-based storage (Harris &
Gibbins, 2003). While the memory-based storage [e.g., BitMat (Atre et al., 2009),
BRAHMS (Janik & Kochut, 2005), and RDFox (Nenov et al., 2015)] has the fastest
speed of processing RDF, this method can only store the most necessary RDF struc-
tural data due to the memory usage restriction. A more common RDF storage
method is based on traditional databases, such as relational databases and object-
oriented databases. In the context of relational databases, we can further identify
three approaches.
With the first one called vertical stores or triple stores [e.g., RDFPeers (Cai &
Frank, 2004), 3store (Harris & Gibbins, 2003), RDF-3X (Neumann & Weikum,
2008; Neumann & Weikum, 2010a, 2010b), and Hexastore (Weiss et al., 2008)],
each RDF triple is stored as a tuple in a relational table with the relational schema
(subject, predicate, object), in which each column corresponds to an element of RDF
triple. The disadvantage of this approach is that too many self-join operations must
be applied while querying RDF data stored in the relational table.
The second approach called horizontal stores [e.g., SW-Store (Abadi et al., 2009)
and C-Store (Weiss et al., 2018)] divides RDF triples vertically based on their pred-
icates. Then the triples with the same predicate are stored in a relational table. Such

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 109
Z. Ma et al., Modeling and Management of Fuzzy Semantic RDF Data,
Studies in Computational Intelligence 1057,
https://doi.org/10.1007/978-3-031-11669-8_4
110 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

a predicate-oriented relational table does not contain null values and multivalued
attributes. But this approach involves complicated join operations among different
relational tables to get RDF data stored in multiple relational tables.
With the third approach called property stores (Chong et al., 2005; Sintek & Kiesel,
2006; Wilkinson et al., 2003), for the same or similar subject, multiple attributes are
designed in the form of n-ary table columns. Then each row stores the same or
similar subject and its corresponding attribute value. This approach can reduce join
operations but faces the problems of null values and multivalued attributes.
Although RDF storage based on relational database provides a convenient way
to manage RDF data, relational database cannot well support the storage of massive
RDF data. Therefore, to store large-scale RDF data, some distributed storage archi-
tectures are developed specifically for massive RDF data in the distributed RDF data
management systems. More recently, NoSQL databases such as CouchDB, HBase
(Sun & Jin, 2010), and graph databases (Peng et al., 2016) are used to store and
manage large-scale RDF data (Cudré-Mauroux et al., 2013).
Note that all the aforementioned works assume that the underlying RDF data
are reliable and precise. However, information is often imprecise and uncertain in
many real-world applications, and many sources can contribute to the imprecision
and uncertainty of data or information. Therefore, the study of reengineering fuzzy
RDF in fuzzy database models has received attention. Fuzzy databases such as fuzzy
relational databases and fuzzy object-oriented databases (Quasthoff & Meinel, 2011)
can store a large set of semantic information. And reengineering fuzzy RDF into fuzzy
database models may satisfy the needs of storing fuzzy RDF data in fuzzy databases.
Currently, there have been many efforts in storing crisp RDF data based on various
databases while few in storing fuzzy RDF data (Bornea et al., 2013; Chen et al.,
2006).
Currently, there have been many efforts in storing crisp RDF data based on various
databases while few in storing fuzzy RDF data. To the best of our knowledge, there
are only two efforts in the storage of fuzzy RDF. Ma and Yan (2018) investigated
the formal mapping from the fuzzy RDF model to the fuzzy relational databases,
which is based on the fuzzy relational database model and supports the storage of
fuzzy RDF triples. Considering the storage of fuzzy RDFS in addition to fuzzy RDF
triples, Fan et al. (2019) presented an approach for reengineering fuzzy RDF(S) into
fuzzy object-oriented database models. Like the situation of crisp RDF storage in
databases, the fuzzy relational databases and fuzzy object-oriented databases cannot
effectively support large-scale fuzzy RDF data management. In this chapter, we intro-
duce the issue on reengineering fuzzy RDF into fuzzy database models, including
fuzzy relational database model, fuzzy object-oriented database model, and HBase
databases.
4.2 Fuzzy RDF Mapping to Relational Databases 111

4.2 Fuzzy RDF Mapping to Relational Databases

Because of its success in data storage and management, and because the triple form
of RDF data subject-predicate-object can be easily mapped to the relational data table
model, the relational database is used by many researchers as RDF. Depending on the
table structure of the RDF triples mapped to the relational database, the corresponding
storage methods are also different. To reengineer fuzzy RDF into fuzzy relational
database model, Ma and Yan (2018) investigated the formal mapping from the fuzzy
RDF model to the fuzzy relational database. In this section, we investigate the strate-
gies and approaches to mapping fuzzy RDF data to fuzzy relational databases based
on the research work of Ma and Yan (2018). It is important to note that the fuzzy
RDF model in this section differs from the model defined in the previous section for
the sake of simplicity. That is, the fuzzy RDF model in this section only considers
the fuzziness of triples, and does not consider the fuzziness of element-level.

4.2.1 Fuzzy Triple Stores Model

In traditional RDF management, one straightforward way to maintain RDF triples is

to store triple statements in a table like structure. In particular, in this approach, the
input RDF data is maintained as a linearized list of triples, storing them as ternary
tuples. Hence, an initial idea for fuzzy RDF data storage is to use a single relational
table and all fuzzy RDF triples are directly stored in this relational table. Then each
fuzzy RDF triple becomes a tuple of relational databases. Corresponding to the four
components of each fuzzy RDF triple, the schema of the relational table includes
three common columns for subject, property, and object as well as one additional
column for membership degree. This kind of approach for storing fuzzy RDF data
in relational databases is called fuzzy triple stores in this book.
Formally let a fuzzy relational schema be the form of (subject, predicate, object,
μ). Then a fuzzy RDF triple, say (s, p, (o, λ)), is directly mapped into a tuple t = ⟨s,
p, o, λ⟩ of the fuzzy relation. Here, t[subject] = s, t[predicate] = p, t[object] = o and
t[μ] = λ. Here, we use t[A] to represent t’s value on attribute (column) A.
For the fuzzy RDF triples and fuzzy RDF graph presented in Fig. 4.1, their
relational representation of fuzzy triple stores is shown in Table 4.1.
It can be seen from the example that fuzzy triple stores use a fixed relational
schema. As a result, new triples can be inserted without considering RDF data types.
So, fuzzy triple stores can handle dynamic schemas of fuzzy RDF data. It can be
seen also from the example that for one subject, its objects with respect to different
properties appear in different tuples. Therefore, fuzzy triple stores generally involve
a number of self-join operations for querying.
112 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Fig. 4.1 RDF triples and fuzzy graph view. a Fuzzy RDF data, b fuzzy RDF graph

4.2.2 Fuzzy Horizontal Stores

In order to overcome the problem of self-joins in fuzzy triple stores, a single rela-
tional table containing all different predicates as columns may be applicable. In the
relational table, for each unique predicate of RDF triples, a subject–object relation
is directly represented, in which the predicate is as a column name and the object is
a value of this column. Triples with the same subject become a tuple of relational
databases. Note that several triples with the same subject may have the same predicate
and different objects. In Fig. 4.2 for example, we have three triples (IBM, industry,
Software, 1.0), (IBM, industry, Hardware, 1.0), and (IBM, industry, Services, 0.9).
They are mapped into a tuple, which value on attribute “industry” is a fuzzy set
represented by {(Software, 1.0), (Hardware, 1.0), (Services, 0.9)}. The approach for
storing fuzzy RDF data in a single relational table database is called fuzzy horizontal
stores in this paper.
Formally, for a given set T of fuzzy triples, suppose that n different predicates, say
p1 , p2 , …, pn , are included. Then we have a fuzzy relational schema with the form of
(subject, p1 , p2 , …, pn ). For any triple (si , pi , (oi , λi )) ∈ T, it should correspond to a
tuple t i in the fuzzy relation. If there are not any tuples which have value si on attribute
subject in the fuzzy relation, t i is a new tuple and inserted into the fuzzy relation. At
this point t i [subject] = si , t i [pi ] = {(oi , λi )}, and the values of t i on other attributes
are null values. If there exists a tuple ti which has value si on attribute subject in the
fuzzy relation (i.e., t i [subject] = si ), we need to further determine if t i [pi ] is a null
4.2 Fuzzy RDF Mapping to Relational Databases 113

Table 4.1 Triple stores of fuzzy RDF data

Subject Predicate Object μ
Charles Flint Born 1850 0.7
Charles Flint Died 1934 0.8
Charies Flint Founder IBM 1.0
Larry Page Born 1973 0.9
Larry Page Founder Google 1.0
Larry Page Board Google 0.8
Larry Page Home Palo Alto 0.6
Android Developer Google 0.9
Android Version 4.1 0.7
Android Kernel Linux 0.8
Android Preceded 4.0 0.8
Android Graphics OpenGL 0.8
Google Industry Software 0.9
Google Industry Internet 1.0
Google Employees 54,604 0.8
Google HQ Mountain view 0.7
IBM Industry Software 1.0
IBM Industry Hardware 1.0
IBM Industry Services 0.9
IBM Employees 433,362 0.8
IBM HQ Armonk 0.8

value or not. If t i [pi ] is a null value, then t i [pi ] = {(oi , λi )}, otherwise t i [pi ] = t i [pi ]
∪ {(oi , λi )}. For the fuzzy RDF triples and fuzzy RDF graph presented in Fig. 4.2,
their relational representation of fuzzy horizontal stores is shown in Table 4.2. There
are five different subjects and 13 unique predicates, and so the single relational table
contains five tuples and 13 columns (attributes).
It can be seen from the example that fuzzy horizontal stores use a single rela-
tional table which contains all different predicates as columns. When new triples
are inserted, new predicates result in changes of the relational schema and dynamic
schemas of RDF data cannot be handled. In addition, it is a common case that in
the single relational table containing all predicates as columns, a subject occurs only
with some predicates, which leads to a sparse relational table with many null values.
To solve the problem of too many null values in fuzzy horizontal stores, we
propose two variations of fuzzy horizontal stores in the following, which are called
fuzzy column stores and fuzzy type stores in the paper. The basic idea of these two
kinds of stores is to vertically partition the single table of fuzzy horizontal stores into
a set of tables by the predicates. Each table contains one predicate (in fuzzy column
stores) or several predicates (in fuzzy type stores).
114 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Fig. 4.2 Column stores of fuzzy RDF data

1. Fuzzy column stores

Fuzzy column stores use a set of relational tables and each unique predicate
corresponds to a relational table. Then each fuzzy RDF triple becomes a tuple of
the corresponding relational table, in which a binary relation between a subject
and an object with respect to the given predicate is represented. Each relational
table has a similar schema and includes two common columns for subject and
property as well as one additional column for membership degree. It can be
seen that the two fuzzy RDF triples which have the same property but different
subjects become two different tuples of the same relational table.
Generally speaking, for a triple (s, p, (o, λ)), we have a fuzzy relational table over
the schema (subject, p, μ), in which o is placed in the p column of row s and λ is
placed in the μ column of row s. For any two triples, say (si , pi , (oi , λi )) and (sj ,
pj , (oj , λj )), if pi = pj , we have two tuples ⟨si , oi , λi ⟩ and ⟨sj , oj , λj ⟩ in the same
fuzzy relational table. But if pi /= pj , tuples ⟨si , oi , λi ⟩ and ⟨sj , oj , λj ⟩ occur in two
different fuzzy relational tables. It is clear that the number of fuzzy relational
tables is the same as the number of unique predicates in the RDF datasets.
For the fuzzy RDF triples and fuzzy RDF graph presented in Fig. 4.1, the rela-
tional representation of fuzzy column stores is shown in Fig. 4.2. There are 13
unique predicates, and so a set of relational tables contains 13 relational tables.
In fuzzy column stores, each fuzzy relational table contains only one predicate
as a column, and so fuzzy column stores solve the problem of null values. But
it should be noted that fuzzy column stores use a set of relational tables. As a
Table 4.2 Horizontal stores of fuzzy RDF data
Subject Born Died Founder Board Home Developer
Charles Flint {(1850,0.7)} {(1934,0.8)} {(IBM,1.0)}
Larry Page {(1973,0.9)} {(Google,1.0)} {(Google,0.8)} {(Palo Alto,0.6)}
Android {(Google,0.9)}
Google
IBM
4.2 Fuzzy RDF Mapping to Relational Databases

Subject Version Kernel Preceded Graphics Industry Employees

Charles Flint
Larry Page
Android {(4.1,0.7)} {(Linux,0.8)} {(4.0,0.8)} {(OpenGL,0.8)}
Google {(Software,0.9), (Internet,1.0)} {(54,604,0.8)}
IBM {(Software,1.0), (Hardware,1.0),(Services,0.9)} {(433,362,0.8)}
115
116 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

result, the descriptions of a subject in its properties and objects are partitioned in
multiple relational tables, and this generally involves too many join operations
for querying. In addition, when new triples are inserted, new predicates result in
new relational tables and dynamic schemas of RDF data cannot be handled also.
To solve the problem of too many join operations for querying, we introduce
fuzzy type stores in the following.
2. Fuzzy type stores
Fuzzy type stores also use a set of relational tables. But, instead of vertically
partitioning the single table of fuzzy horizontal stores into a set of tables by each
predicate, in fuzzy type stores, each relational table contains some predicates
as its columns. The predicates included in a relational table generally have the
same data types. The fuzzy RDF triples which properties have the same data
types will arise in the same relational table. Here, fuzzy triples with the same
subject become a tuple of the corresponding relational table. For the triples which
have the same subject, the same predicate and different objects, they are mapped
into a tuple, which value on the predicate as an attribute is represented as a fuzzy
set. The different objects contained in this fuzzy set act as its supports.
Formally, suppose that we have a set T of fuzzy triples having m unique predicates
with the same data type, say p1 , p2 , …, pm . For these fuzzy triples, we have a fuzzy
relational schema with the form of (subject, p1 , p2 , …, pm ). For any two triples (si ,
pi , (oi , λi )) ∈ T and (sj , pj , (oj , λj )) ∈ T, they arise in the same relational table with
one row for each subject. Furthermore, when si = sj and pi /= pj , oi and oj are placed
in different columns pi and pj of the same row in the forms of {(oi , λi )} and {(oj ,
λj )}, respectively; when si = sj and pi = pj , oi and oj are placed in the same column
of the same row in the forms of {(oi , λi ), (oj , λj )}; when si /= sj and pi = pj , oi and
oj are placed in the same column of different rows si and sj in the forms of {(oi , λi )}
and {(oj , λj )}, respectively.
For the fuzzy RDF triples and fuzzy RDF graph in Fig. 4.1, the predicates are
identified as three data types: people, companies and operation systems. Then their
relational representation of fuzzy type stores is shown in Fig. 4.3.
The approach of fuzzy type stores is actually a trade-off between fuzzy hori-
zontal stores and fuzzy column stores. Fuzzy horizontal stores use a single relational
table, which generally contain many null values but do not involve join operations.
Fuzzy column stores use a set of relational tables, which do not contain null values
but involve too many join operations. Fuzzy type stores contain fewer null values
compared with fuzzy horizontal stores and involve less join operations compared
with fuzzy column stores.
Summarily the strategies and approaches to storing fuzzy RDF data in fuzzy
relational databases are presented in the above, including fuzzy triple stores, fuzzy
horizontal stores, fuzzy column stores and fuzzy type stores. Since fuzzy RDF data
are stored in fuzzy relational databases and SPARQL (Simple Protocol and RDF
Query Language, the RDF query language recommended by W3C) cannot be applied
directly, a consequent issue emerges, that is how to query fuzzy RDF data stored in
fuzzy relational databases. A possible way is to translate SPARQL queries for RDF
4.2 Fuzzy RDF Mapping to Relational Databases 117

Fig. 4.3 Type stores of fuzzy RDF data

data to SQL (Structured Query Language, the standard query language for relational
databases) queries for relational databases. Let us look at fuzzy RDF data in Fig. 4.1.
Suppose that we have a SPARQL query:
SELECT DISTINCT ?p ?company
WHERE {{?p founder ?company.
UNION
?p board ?company.}
?company industry “Software”.
}
This SPARQL query returns (Charles Flint, IBM) and (Larry Page, Google).
Now we store fuzzy RDF data in Fig. 4.1 in fuzzy relational databases, say the
fuzzy relational databases in Fig. 4.2. Suppose that these tables are named with
their predicates and we have t_born, t_died, t_founder, t_board, t_home, t_version,
t_developer, t_kernel, t_preceded, t_graphics, t_industry, t_employees and t_HQ.
Then the SPARQL query above is translated to a SQL query correspondingly:
SELECT DISTINCT *
FROM
(((SELECT subject, founder as company FROM t_founder) AS t_1
LEFT OUTER JOIN
(SELECT subject as company, board FROM t_board) AS t_2
On (false))
UNION
((SELECT subject as company, board FROM t_board) AS t_3
LEFT OUTER JOIN
(SELECT subject, founder as company FROM t_founder) AS t_4
On (false))
) AS t _5
INNER JOIN
118 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

(SELECT subject as company

FROM t_industry AS t_6
WHERE industry = “Software”
) AS t_7
This SQL query returns <Charles Flint, IBM> and <Larry Page, Google>. It is
clear that two queries get the same result. So, it is feasible that fuzzy RDF data are
stored in fuzzy relational databases.

4.3 Fuzzy RDF Mapping to Object-Oriented Databases

The classical relational database model and its fuzzy extension do not satisfy the need
of modeling complex objects with imprecision and uncertainty. In order to model
uncertain data and complex-valued attributes as well as complex relationships among
objects, current efforts have concentrated on the fuzzy object-oriented databases as
introduced in Chap. 2. Therefore, reengineering fuzzy RDF into fuzzy object-oriented
database model may satisfy the needs of storing fuzzy RDF data in fuzzy databases
and help to the interoperability between fuzzy object-oriented database model and
fuzzy RDF. Based on the similar idea in Sect. 4.2, in the following, we introduce
how to reengineer fuzzy RDF into fuzzy object-oriented database model, provide a
set of rules for mapping fuzzy RDF into fuzzy object-oriented database model and
implement a prototype to demonstrate our approach.
Note that, we apply the fuzzy object-oriented databases instead of the fuzzy rela-
tional databases because the fuzzy object-oriented database model can represent
complex objects and relationships with fuzziness more effectively. More impor-
tantly, the fuzzy object-oriented databases are very suitable for storing some impor-
tant concepts in the fuzzy RDFS such as fuzzy classes, instances, properties, and
fuzzy class/property hierarchies.
To deal with uncertainties in RDF Schema layer, Fan et al. (2019) extended the
definition of fuzzy RDF graph model, which explicitly classifies the element Σ that
is the set of labels into five categories. That is Σ = {C, OP, LP, D, T } is a set
of labels, where C is a set of class resource labels, OP is a set of object property
resource labels, LP is a set of datatype property resource labels, D is a set of datatype
labels, and T is a set of instance resource labels. In particular, we investigate how
to formally map the fuzzy RDF model to the fuzzy object-oriented database model
in the subsection. We develop mapping rules and implement a prototype system to
demonstrate the feasibility of our approach.
In the fuzzy RDF graph, its label elements include class resource labels, property
resource labels, datatype labels, and instance resource labels. The elements on the
fuzzy RDF semantic layer can identify the types of resources that the vertices and
edges in the fuzzy RDF graph model correspond to. It can be seen that they are very
similar to the elements of the fuzzy object-oriented database. The interpretation of
the semantic layer of the fuzzy RDF graph model mainly includes four aspects:
4.3 Fuzzy RDF Mapping to Object-Oriented Databases 119

(a) fuzzy classes and their relationships,

(b) fuzzy properties and their relationships,
(c) datatypes, and
(d) fuzzy instances.
Among them, the first three aspects are the elements of the fuzzy RDF Schema
layer, and the fuzzy instances are the elements of the fuzzy RDF layer. These four
aspects can all find the corresponding elements in the fuzzy object-oriented database
model. So, it is feasible to map the fuzzy RDF model to the fuzzy object-oriented
database model. In the two layers of fuzzy RDF model, which are fuzzy RDF Schema
layer and fuzzy RDF layer, based on the four aspects of fuzzy RDF mentioned above,
we propose the mapping rules and algorithms to map the fuzzy RDF model to the
fuzzy object-oriented database model. To concisely describe the mapping algorithm,
we introduce several primary functions as follows:
(a) nQueue (Q, Element): Inserting an element Element at the rear end of the queue
Q.
(b) sEmpty (Q): Determining if the queue Q is empty or not, which will return true
if empty, otherwise, return false.
(c) eQueue (Q): Removing and returning an element from the front end of the queue
Q.
(d) asChildNodes (Node): Determining if a node Node has a son node, which will
return true if yes, otherwise, return false.
(e) ubClassOnNextLevel (C): Returning all son nodes of class node C.
(f) ubPropertyOnNextLevel (P): Returning all son nodes of property node P.

4.3.1 Mapping of Fuzzy Classes

In the fuzzy RDF model, classes are the elements of the RDFS layer. The fuzzy class
differs from the classic class because the behavior and state of the object contained
in the fuzzy class are uncertain. That is, the properties of the fuzzy classes are
fuzzy ones. In addition, the inheritance relationships between fuzzy classes are also
uncertain. It means that two (fuzzy) classes have a subclass-superclass relationship
with a membership degree.
For the fuzzy classes in the fuzzy RDF, we need to map not only these classes
themselves but also their relationships. For this purpose, we propose two mapping
rules in the following. Here we use a function Γ that maps the elements in the fuzzy
RDF model to the corresponding elements in the fuzzy object-oriented databases.
Rule 1: L v (vi ) ∈ Σ.C ⇒ Γ (vi ) = fci ∈ FC FS
When a vertex label of the fuzzy RDF graph model is a class label, it is mapped to
a class in the fuzzy object-oriented database model and then named after the label.
Rule 2: L v (vi ) ∈ Σ.C ∧ L v (vj ) ∈ Σ.C ∧ L E (vi × vj ) = subClassOf ⇒ Class fci is-a
fci /μ type-is ft k ∈ FT.
120 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Fig. 4.4 Fuzzy classes

When an edge label of the fuzzy RDF graph model is subClassOf , the fuzzy class
which corresponds to the start point is a subclass of the fuzzy class which corresponds
to the end point. And the label value is the membership degree of the subclass to the
superclass.
Let us look at a fuzzy RDF subgraph model shown in Fig. 4.4. It is shown in
Fig. 4.4 that there are three vertex labels, which are the class labels Person, Student
and Staff , and two edge labels, which are named subClassOf . The labels values are
0.8 and 0.9, respectively. They are mapped to the class Person, the class Student, and
the class Staff in the fuzzy object-oriented database model respectively, the keyword
is-a in the type expression denotes that the class Student and the class Staff are
subclasses of the class Person.
The corresponding mapping structure is as follows:
Class Person type-is
Union Student/0.8, Staff /0.9
End
Class Student is-a Person/0.8 type-is
Record
…
End
Class Staff is-a Person/0.9 type-is
Record
…
End
Note that the above Rule 1 and 2 do not take the order of mapping into account.
When a fuzzy subclass is mapped and meanwhile its superclass is not mapped accord-
ingly, an error will occur. In the following, we organize the fuzzy classes into a
hierarchical structure and present an algorithm of mapping fuzzy class hierarchies
as shown in Algorithm 4.1.
4.3 Fuzzy RDF Mapping to Object-Oriented Databases 121

Algorithm 4.1. Mapping algorithm of fuzzy classes

Input: FRDFS Model
Output: FOODB Model
1. EnQueue (Q, “rdfs:Class”)
2. while (!isEmpty(Q))
3. C = DeQueue(Q)
4. if (C! = “rdfs:Class”)
5. Mapping the fuzzy class C according to Rule 1 and Rule 2
6. if (hasChildNodes(C))
7. for each C i ∈ subClassOnNextLevel() (i = 1, …, n)
8. EnQueue (Q, C i )

The root node “rdfs:Class” is the parent node of all fuzzy class nodes after the
fuzzy classes are organized in a hierarchical structure. Algorithm 4.1 uses a breadth-
first traversal. With the algorithm, the root node “rdfs:Class” first enters the queue
Q. Since the root node is just an abstract class, instead of mapping it, it is judged
whether it has a son node and if so, all its son nodes are enqueued. If the queue Q
is not empty, a fuzzy class node is sequentially dequeued from the queue Q, and it
is mapped according to Rule 1 and 2. This fuzzy class node is further determined
whether it has a son node, and if so, all its son nodes are enqueued. In a similar
way, each node in the queue is dealt with until the queue is empty. Finally, all fuzzy
classes in the fuzzy RDF graph are mapped in order according to their hierarchical
relationship.

4.3.2 Mapping of Fuzzy Properties

In the classic RDF, the property modeling primitive is “rdf:Property”, which is an

element of the RDF Schema. The range of a property can be an RDF literal or
a resource described by a URI. Therefore, we can identify two types of proper-
ties in the fuzzy RDF, which are the properties whose values are RDF literals and
RDF resources, respectively. The former is called fuzzy datatype properties, which
describe uncertain property values, and the latter is called fuzzy object properties,
which describe uncertain properties themselves. In the following, we present two
mapping rules: Rule 3 maps fuzzy datatype properties and Rule 4 map fuzzy object
properties.
1. Mapping of fuzzy datatype properties
Rule 3: L v (vi ) ∈ Σ.C ∧ L v (vj ) ∈ Σ.D ∧ L E (vi × vj ) ∈ Σ.LP ⇒ Γ (vi ) = fci ∈
FC FS ∧ Γ (vj ) = fbj ∈ FB ∧ Γ (L E (vi × vj )) = fai (fci ) ∈ FAFS
When an edge label of the fuzzy RDF graph model is a datatype property label,
the start point label is a class label and the end point label is a datatype label. At this
point, the edge is mapped into an attribute of the fuzzy class, which corresponds to
its start point and is named after the datatype property label; the endpoint is mapped
into a simple/complex type or a fuzzy-type-based simple/complex type in the fuzzy
122 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Fig. 4.5 Fuzzy datatype

property sex

object-oriented database according to the mapping rules of the datatypes given in

Sect. 4.3.3.
Let us look at a fuzzy RDF subgraph model shown in Fig. 4.5. It is shown in
Fig. 4.5 that there are two vertex labels, which are the class label Student and the
datatype label &xsd; string, and one edge label, which is the datatype label and named
sex. The vertices are mapped to the class Student and the simple type string in the
fuzzy object-oriented database model, respectively. The edge is mapped to the fuzzy
attribute Sex of the class Student. Here a keyword FUZZY is applied in the type
expression to denote that the attribute Sex of the class Student is a fuzzy attribute.
The corresponding mapping structure is as follows:
Class Student is-a Person type-is
Record
FUZZY Sex: string
…
End

2. Mapping of fuzzy object properties

In the RDF graph model, for the edge label that is an object property label,
its start point label and end point label are both class labels. At this point, the
edge description is actually a nonhierarchical binary relationship between the
instances of these two classes that correspond to the start point and the end point.
Note that a fuzzy object property is different from a crisp object property because
the relationship described by the fuzzy object property is uncertain rather than
determined. As a result, a membership degree is used to describe the uncertain
binary relationship. In the fuzzy object-oriented database, the binary relationship
is defined by the attribute elements with traversing the path. Therefore, the fuzzy
object properties of the fuzzy RDF graph model can be mapped into attributes in
the fuzzy object-oriented database model. In the fuzzy object-oriented database
model, cardinality constraints are a kind of common binary relationships, which
are generally divided into one-to-one cardinality constraints (1:1), one-to-many
cardinality constraints (1: n or n: 1) and many-to-many cardinality constraints
(m: n). Here, for a relationship in the class declaration, its cardinality constraint
can explicitly appear in the cardinality description of the type of expression. In
the fuzzy RDF graph model, however, there are not explicitly defined primitives
for the cardinality constraints. Rule 4 gives a mapping rule for the fuzzy object
properties on the premise that the cardinality constraints are already known.
Rule 4: L v (vi ) ∈ Σ.C ∧ L v (vj ) ∈ Σ.C ∧ L E (vi × vj ) ∈ Σ.OP ⇒ Γ (vi ) = fci ∈ FC FS
∧ Γ (vj ) = fcj ∈ FC FS ∧ Γ (L E (vi × vj )) = fai (fci ) ∈ FAFS .
4.3 Fuzzy RDF Mapping to Object-Oriented Databases 123

When the edge label of the fuzzy RDF graph is an object property label, the start
points label and the end point label are both class labels. In this case, the edge is
mapped into an attribute of the fuzzy class, which corresponds to the start point and is
named after the object property label; the endpoint is mapped into the corresponding
fuzzy class according to Rule 1.
Note that in the above mapping of fuzzy properties, the relationship between
fuzzy property and fuzzy subproperty is not considered. Such a mapping process
does not need to consider the mapping order under the premise of not considering
time efficiency. In the following, we organize the fuzzy properties into a tree with
height 2, which root node is the node “rdf:Property”. We present an algorithm of
mapping fuzzy properties as shown in Algorithm 4.2

Algorithm 4.2. Mapping algorithm of fuzzy properties

Input: FRDFS Model
Output: FOODB Model
1. begin:
2. EnQueue (Q, “rdf:Property”)
3. P = DeQueue (Q)
4. if (hasChildNodes(P))
5. for each Pi = SubPropertyOnNextLevel (P) (i = 1, …, n)
6. EnQueue (Q, Pi )
7. end for
8. while (!isEmpty(Q))
9. if (L v (P = DeQueue (Q)) ∈ Σ. LP)
10. Mapping fuzzy datatype property P according to Rule 3
11. else
12. Mapping the fuzzy object property P according to Rule 4
13. end

The root node “rdf:Property” is the parent node of all fuzzy property nodes after
the fuzzy properties are organized in a hierarchical structure. Algorithm 4.2 uses a
breadth-first traversal. With this algorithm, the root node “rdf:Property” first enters
the queue Q. The root node is just an abstract property, so it is dequeued rather than
mapped. Then it is judged if it has a son node, and if so, all its son nodes are enqueued.
If the queue Q is not empty, a fuzzy property node is sequentially dequeued from the
queue Q, and it is further determined if the node is a fuzzy datatype property node or
a fuzzy object property node. Then the fuzzy property is mapped according to Rule
3 or Rule 4. In a similar way, each node in the queue is handled until the queue is
empty. Finally, all fuzzy properties in the fuzzy RDF graph are completely mapped.

4.3.3 Mapping of Datatypes

In the classic RDF, only one datatype rdf:XMLLiteral is predefined and users are
recommended to use the basis datatypes defined in XML Schema. In the fuzzy RDF,
the basis datatypes are not fuzzy and we can still use the basis datatypes defined
124 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Table 4.3 Mapping of XSD datatypes into fuzzy object-oriented database datatypes
Datatype XSD datatype FOODB datatype
Numerical xsd:decimal Decimal
xsd:integer Integer
xsd:short Short
xsd:long Long
xsd:float Float
xsd:double Double
Enumeration xsd:enum Enum
String xsd:string String
Boolean xsd:boolean Boolean
Date and time xsd:date Date
xsd:time Time

by XML Schema, such as integer, float, string, date, time and so on. The major
basis datatypes in XML Schema and their corresponding datatypes in the fuzzy
object-oriented database model are shown in Table 4.3.
The datatypes used in the fuzzy RDF have the corresponding datatypes in the fuzzy
object-oriented databases. As shown in Table 4.3, for example, the XSD datatype
xsd:string is mapped to the datatype string in the fuzzy object-oriented databases.
Note that XML Schema supports custom complexType. At this point, the fuzzy
object-oriented databases need to provide a type generator to support the definition
of structured literal so that complexType can be mapped accordingly. Suppose that
XML Schema defines the following complexType element Degree:
<xsd:element name = “Degree”>
<xsd:complexType>
<xsd:sequence>
<xsd:element name = “school_name” type = “xsd:string”/>
<xsd:element name = “degree_type” type = “xsd:string”/>
<xsd:element name = “degree_year” type = “xsd:short”/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Then the complexType element Degree is mapped to the structured literal Degree
in the fuzzy object-oriented databases as follows:
struct Degree{
string school_name;
4.3 Fuzzy RDF Mapping to Object-Oriented Databases 125

string degree_type;
short degree_year;
};

4.3.4 Mapping of Fuzzy Instances

In the fuzzy RDF graph model, the description of a fuzzy instance is realized by
describing the fuzzy property value of the fuzzy class. When the edge label is “type”,
the labels of the start point and the end point are the instance label and the class label,
respectively. In this case, the edge indicates that the start point is an instance of the
corresponding class of the end point. Rule 5 gives a rule of mapping the fuzzy RDF
instances to the FOODB instances.
Rule 5: L v (vi ) ∈ Σ.T ∧ L v (vj ) ∈ Σ.C ∧ L E (vi × vj ) = type ⇒ (Γ (vi ) = foi ∈
FOFS ) ∧ (Object foi belong-to fcj /μj has-value [fa1 : fb1 , …, fak : fbk ]).
When the edge label is “type”, the starting point is mapped into an instance of the
fuzzy object-oriented database model, which is the instance of the fuzzy class that
corresponds to the end point. All vertices and edges associated with the start point are
mapped into the corresponding attributes of the instance in the fuzzy object-oriented
database model. In the fuzzy object-oriented database model, an object with identifier
OID uniquely identifies an object and is named after the label of the start point.
In the fuzzy RDF data subgraph shown in Fig. 4.6, for example, there are two
edges labeled as “type”. The label of the start point is the instance label student1
and the label of the end point is the class label Student. The membership degree of
the edge is 0.9, indicating that the object student1 belongs to the class Student with
a membership degree of 0.9. The other label of the start point is the instance label
book1 and the label of the end point is the class label Book. The membership degree
of the edge is 0.85, indicating that the object book1 belongs to the class Book with
a membership degree of 0.85. For the fuzzy instances of the fuzzy RDF shown in
Fig. 4.6, the corresponding mapping structure is as follows:
Object student1 belong-to Student/0.9
has-value FUZZY Name: 1.0/Bob, FUZZY Sex: 0.85/male, FUZZY Age: 0.9/20,
FUZZY Read: 0.75/book1
End
Object book1 belong-to Book/0.85
has-value FUZZY Title: 0.8/A Semantic Web Primer, FUZZY Author:
0.85/Antoniou, FUZZY Category: 0.9/Science Information
End

And their corresponding database instances in the fuzzy object-oriented databases

are finally shown in Fig. 4.7.
126 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Fig. 4.6 Fuzzy instances

Fig. 4.7 Fuzzy database instance

4.3.5 Implementation

Based on the mapping rules proposed in Sect. 4.3, we implement a prototype called
FRDF2FOODB, which can map the fuzzy RDF model to the fuzzy object-oriented
4.4 Fuzzy RDF Mapping to HBase Databases 127

Fig. 4.8 The overall

architecture of
FRDF2FOODB

database model. In the following, we briefly explain the implementation of the proto-
type, which consists of three main modules: parsing module, mapping module, and
output module. Figure 4.8 shows the overall architecture of the FRDF2FOODB. The
functions of the three main modules of the FRDF2FOODB are described below:
1. Parsing module: The Parsing module parses the input fuzzy RDF model, which
is described in the form of triples, into classes, properties, instances, etc., and
stores the parsed results, which are the input of the mapping module.
2. Mapping module: The mapping module maps the fuzzy RDF classes, properties,
instances and other elements, which are obtained by the parsing module, into the
corresponding fuzzy object-oriented database classes and instances according to
the mapping rules proposed in Sect. 4.3.
3. Output module: The output module is actually an interface module, displaying the
input fuzzy RDF model, and the resulting fuzzy object-oriented database model
after mapping the fuzzy RDF model. Also, this module displays the specific
storage of the RDF in the fuzzy object-oriented databases after the mapping is
completed.

4.4 Fuzzy RDF Mapping to HBase Databases

With the explosive growth of RDF data, some efforts have carried out massive RDF
data store. Several proposals are introduced to store RDF data in Hadoop (Farhan
Husain et al., 2009; Myung et al., 2010; Rohloff & Schantz, 2010). The drawback
of Hadoop-based RDF store is that RDF data are directly stored in HDFS, resulting
in a lack of efficient index structure. HBase, a column-oriented NoSQL database,
128 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

implements a global, distributed index with sorting the row key of HBase table by
dictionary. There have been some works proposed to store RDF data in HBase. In
Sun and Jin (2010) presented an approach for storing RDF data into six HBase tables,
which are S_PO, P_SO, O_SP, PS_O, SO_P, and PO_S. The row key of table S_PO
is the subject of RDF triple, and the column is the tuple (predicate, object). Similarly,
RDF data are repeatedly stored in different HBase tables according to the different
organizational forms of RDF triple elements. Papailiou et al. (2012) presented a fully
distributed RDF store method, H2RDF, which can reduce the number of HBase tables
in Sun and Jin (2010) from six to three (i.e., SP_O, PO_S, and OS_P). The row key
of table SP_O is the tuple (subject, predicate), and the column is the object of RDF
triple. At the same time, Abraham et al. (2010) also used three HBase tables to store
RDF data in which are Ts, Tp, and To. These three HBase tables, respectively, take
the subject, predicate, and object as the row key, and take the other terms as column
values.
Like the situation of crisp RDF storage in databases, the fuzzy relational databases
and fuzzy object-oriented databases cannot effectively support large-scale fuzzy
RDF data management. To manage large-scale fuzzy RDF data efficiently and effec-
tively, some work has already investigated the storage of fuzzy RDF data in NoSQL
databases. Since HBase databases support high-reliability underlying storage and
have high-performance computing power. Fan et al. (2020) proposed a fuzzy RDF
storage schema with fuzzy HBase databases. With the distributed fuzzy RDF(S)
storage approach proposed by Fan et al. (2020) in this section, we present a distributed
fuzzy RDF storage approach based on HBase databases. This approach makes use
of the index function of HBase databases. In addition, according to different organi-
zational forms of the fuzzy triple patterns, we propose a set of FHBase-based query
algorithms to deal with the query of fuzzy triples from different fuzzy HBase tables.
On the basis, we implement a prototype system to demonstrate the feasibility of our
approach.

4.4.1 Fuzzy RDF Storage in Fuzzy HBase

In the fuzzy RDF graph model, it covers both fuzzy RDF schema layer and fuzzy
RDF instance layer. The former mainly describes two kinds of information about
fuzzy classes and fuzzy properties in fuzzy RDF ontology data, and the latter mainly
describes the specific information of fuzzy RDF instance data. To improve the query
efficiency of the storage of fuzzy RDF, we separately store the fuzzy RDFS data to
ensure the retrieval efficiency. As a result, we design two FHBase tables to store the
fuzzy RDFS data and other two FHBase tables to store the fuzzy RDF instance data.
4.4 Fuzzy RDF Mapping to HBase Databases 129

4.4.1.1 Storage of Fuzzy RDFS

The fuzzy RDFS data describes the information about fuzzy classes and fuzzy proper-
ties in fuzzy RDF ontology data. The information related to fuzzy classes refers to the
corresponding fuzzy classes information of each fuzzy instance, the corresponding
fuzzy properties information of each fuzzy class and the subclass-superclass rela-
tionships between fuzzy classes, and so forth. And the information related to fuzzy
properties refers to the relationships between fuzzy properties, such as the inheri-
tance relationships, equivalence relationships, the domains and ranges of each fuzzy
property, and so forth.
To store the fuzzy classes and fuzzy properties of fuzzy RDFS data, we design two
FHBase tables which named FClassRelation and FPropertyRelation in the following.
The specific table structures and storage examples of FClassRelation and FProper-
tyRelation are shown in Tables 4.4 and 4.5, respectively. Note that for the sake of
simplicity of discussion, timestamp is omitted.
The FHBase table FClassRelation shown in Table 4.4 takes the fuzzy class name
as the row key and the class relationship as the column family name. For the relation-
ship between classes include fuzziness and a method is developed to calculate the
membership degree of fuzzy subclass/superclass relationship in (Ma et al., 2004) the

Table 4.4 The specific table structure of FHBase table FClassRelation

Row key Column family: EquivalentClass Column family: SubClass
FC1 EquivalentClass: ρ 1 /equivalentclass = FC2 SubClass: ρ 3 /subclass = FC3
SubClass: ρ 4 /subclass = FC5
FC2 EquivalentClass: ρ 2 /equivalentclass = FC1
FC3 SubClass: ρ 5 /subclass = FC4
FC4 EquivalentClass: ρ 6 /equivalentclass = FC3
FC5 SubClass: ρ 7 /subclass = FC4

Table 4.5 The specific table structure of FHBase table FPropertyRelation

Row key Column family: Column family: Column family: Domain
EquivalentProperty SubProperty
FP1 EquivalentProperty: SubProperty: Domain: domain = FC2
ρ 1 /equivalentproperty = ρ 3 /subproperty = FP4
FP3 SubProperty:
ρ 4 /subproperty = FP5
FP2 Domain: domain = FC3
FP3 EquivalentProperty: Domain: domain = FC1
ρ 2 /equivalentproperty =
FP1
FP4 Domain: domain = FC5
FP5 Domain: domain = FC4
130 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

column name is named according to the following expression: a relationship name

which is the same as column family name but lowercase followed by a number in [0,
1] and a notation “/”, in which the number represents the membership degree. Note
that this expression can be shortened and only a relationship name is left when its
number is 1. The cell value is the other corresponding fuzzy class name.
Likewise, the FHBase table FPropertyRelation shown in Table 4.5 takes the fuzzy
property name as the row key and the property relationship as the column family
name. The column name is named according to the following expression: a relation-
ship name which is the same as column family name but lowercase followed by a
number in [0, 1] and a notation “/”, in which the number represents the membership
degree between fuzzy properties. The cell value is usually the other corresponding
fuzzy property name. In particular, when the column family name is “Domain” and
the column name is “domain”, the cell value is the corresponding domain of the
fuzzy property represented by the row key, and in this case, the cell value is a fuzzy
class name.

4.4.1.2 Storage of Fuzzy RDF Instance Data

Fuzzy RDF data are modeled by describing the fuzzy property value of the fuzzy class
in fuzzy RDFS. For the purpose of storing fuzzy RDF instance data correctly and
supporting efficient queries for different triple pattern forms, we design two different
FHBase tables which named FHTS_PO and FHTO_PS, respectively. These two
tables both take “Object Property,” “Datatype Property,” and “Type” as the column
family name, while the former takes the subject in fuzzy RDF triple as the row key
and the latter takes the object as the row key. The specific table structure and storage
example of FHTS_PO and FHTO_PS are shown in Tables 4.6 and 4.7, respectively.
The FHBase table FHTS_PO shown in Table 4.6 takes the subject in fuzzy RDF
triple as the row key and stores the fuzzy RDF triples corresponding different prop-
erties in different column families. When the predicate of a fuzzy RDF triple is
an object property, for example, it is stored in the cell corresponding to the column
family which named “Object Property.” The category of predicate in fuzzy RDF triple
can be obtained from the axioms of the fuzzy RDF graph data model. The column
name and cell value are different in different cases. First, when the column family
name is “Object Property,” the column name is named according to the following
expression: the property name in fuzzy RDF triple followed by a number in [0, 1]
and a notation “/”, in which the number represents the membership degree of the
instance corresponding to the row key belonging to a class. And the cell value is the
corresponding class. Second, when the column family name is “Datatype Property,”
the column is named after the datatype property name. And the cell value is named
according to the following expression: the object name in fuzzy RDF triple followed
by a number in [0, 1] and a notation “/” where the number represents the membership
degree of object in fuzzy RDF triple.
In particular, when the column family name is “Type,” the column name is named
according to the following expression: “type” followed by a number in [0, 1] and
4.4 Fuzzy RDF Mapping to HBase Databases 131

Table 4.6 The specific table structure of FHBase table FHTS_PO

Row key Column family: object Column family: datatype Column family: type
Property Property
S1 Object property: ρ 11 /OP1 = Datatype property: LP1 = Type: ρ 3 /type = FC1
O111 μ11 /L111
Object property: ρ 12 /OP1 = Datatype property: LP1 =
O111 μ12 /L111
Object property: ρ 12 /OP1 = Datatype property: LP1 =
O112 μ12 /L112
Object property: ρ 11 /OP3 = Datatype property: LP2 =
O111 μ2 /L121
S2 Object property: ρ 2 /OP2 = Type: ρ 4 /type = FC1
O221
S3 Type: ρ 5 /type = FC4
S4 Type: ρ 6 /type = FC2
S5 Datatype property: LP3 = Type: ρ 7 /type = FC3
μ3 /L531

Table 4.7 The specific table structure of FHBase table FHTO_PS

Row key Column family: object Column family: datatype Column family: type
Property Property
O111 Object property: ρ 11 /OP1
= S1
Object property: ρ 11 /OP3
= S1
Object property: ρ 12 /OP1
= S1
O112 Object property: ρ 12 /OP1
= S1
O221 Object property: ρ 2 /OP2
= S2
μ11 /L111 Datatype property: LP1 = S1
μ12 /L111 Datatype property: LP1 = S1
μ12 /L112 Datatype property: LP1 = S1
μ2 /L121 Datatype property: LP2 = S1
μ3 /L531 Datatype property: LP3 = S5
FC1 Type: ρ 3 /type = S1
Type: ρ 4 /type = S2
FC2 Type: ρ 6 /type = S4
FC3 Type: ρ 7 /type = S5
FC4 Type: ρ 5 /type = S6
132 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

a notation “/” where the number represents the membership degree of predicate in
fuzzy RDF triple. And the cell value is the corresponding object.
Likewise, the FHBase table FHTO_PS shown in Table 4.7 takes the object in
fuzzy RDF triple as the row key, and it can be obtained from the table FHTS_PO.
Specifically, the row key of table FHTO_PS is the cell value of table FHTS_PO, on
the contrary, its cell value is the row key of table FHTS_PO. At the same time, both
the tables have the same column families and columns. In particular, when the column
family name is “Type,” the cell values are the instances of the class corresponding
to the row key with uncertainties.

4.4.2 FHBase-Based RDF Queries

On the basis of the storage model of fuzzy HBase for fuzzy RDFS and fuzzy RDF
instance data proposed in Sect. 4.4.2, in this section, we investigate the issue of
queries which support the fuzzy HBase-based retrieval.

4.4.2.1 Triple Matching Algorithm

The aim of SPARQL queries is to get the triples that satisfy all the conditions in
the WHERE clause of the given SPARQL query. In the RDF data query based
on classical HBase database, the SPARQL query is first parsed into a set of triple
patterns, and then the triple matching algorithm proposed by Abraham et al. (2010)
is used to determine whether the given triple pattern matches. The input of matching
algorithm is a given triple pattern and a triple to be judged, and it will return true
if the triple matches the triple pattern or false otherwise. Note that this algorithm is
mainly for classical RDF triples while not considering the fuzzy RDF triples. Here,
we present a more general triple matching algorithm, MatchFTP-T, to support both
triple matching and fuzzy triple matching.

Algorithm 4.3 MatchFTP-T

Input: Fuzzy triple pattern tp = (ps, ρ/pp, μ/po) and fuzzy triple t = (s, ρ i /p, μi /o)
Output: true or false
1. if (tp.ps is var || tp.ps == t.s) && (tp.pp is var || tp.pp == t.p) && (tp. ρ is var || tp.ρ == p.
ρ i ) && (tp.po is var || tp.po == t.o) && (tp.μ is var || tp.μ == t.μi ) then
2. if (tp.ps == tp.pp && t.s != t.p) || (tp.ps == tp.po && t.s != t.o) || (tp.pp == tp.po && t.p
!= t.o) then
3. return false
4. endif
5. return ture
6. endif
7. return false
4.4 Fuzzy RDF Mapping to HBase Databases 133

Similar to the classical triple matching algorithm proposed in Abraham et al.

(2010), to check that tp matches t, three conditions must be satisfied:
(a) a variable in tp can match any URI or literal in t,
(b) a URI or literal in tp must match itself in t exactly, and
(c) if a variable in tp occurs more than once, then the fuzzy triples that match tp
must have the same term for all occurrences.

4.4.2.2 Triple Pattern Query Algorithm

Given that the fuzzy RDF data are stored in FHDB, to get all the fuzzy triples
satisfying the parsed fuzzy triple patterns, we need to query different fuzzy HBase
table to get the fuzzy triples by judging if these fuzzy triples match the given fuzzy
triple pattern. For each fuzzy triple (s, ρ/p, μ/o), there are five elements: subject,
predicate, object, membership degree of predicate, and membership degree of object.
Note that when the predicate is an object property, the membership degree of object
is 1 which means it is determined, and similarly, when the predicate is a datatype
property, the membership degree of predicate is 1. As a result, unlike the eight
organizational forms of the classic triple pattern as shown in Table 4.8, there are 32
for fuzzy triple pattern as shown in Table 4.9.
Regardless of which organizational form of the fuzzy triple pattern to query, it is
closely related to the storage schema of fuzzy RDFS and fuzzy RDF data proposed

Table 4.8 Organizational forms of the classic triple pattern

(S, P, O) (S, P, ?O) (S, ?P, O) (S, ?P, ?O)
(?S, P, O) (?S, P, ?O) (?S, ?P, O) (?S, ?P, ?O)

Table 4.9 Organizational forms of the fuzzy triple pattern

Type Fuzzy triple pattern
Type 1: subject and predicate are known (S, ρ/P, O) (S, ρ/P, ?O) (S, ?ρ/P, O) (S, ?ρ/P, ?O)
(?S, P, μ/O) (?S, P, ? μ/O) (?S, ?P, μ/O) (?S, ?P,
?μ/?O)
Type 2: subject and object are known (S, ρ/?P, O) (S, ?ρ/?P, ?O) (S, ?P, μ/O) (S, ?P, ?μ/O)
Type 3: predicate and object are known (?S, ρ/P, O) (?S, ?ρ/P, O) (?S, P, μ/O) (?S, P, ?μ/O)
Type 4: subject is known (S, ρ/?P, ?O) (S, ?ρ/?P, ?O) (S, ?P, μ/?O) (S, ?P,
?μ/?O)
Type 5: predicate is known (?S, ρ/P, ?O) (?S, ?ρ/P, ?O) (?S, P, μ/?O) (?S, P,
?μ/?O)
Type 6: object is known (?S, ρ/?P, O) (?S, ?ρ/?P, O) (?S, ?P, μ/O) (?S, ?P,
?μ/O)
Type 7: all are unknown (?S, ρ/?P, ?O) (?S, ?ρ/?P, ?O) (?S, ?P, μ/?O) (?S, ?P,
?μ/?O)
134 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

in Sect. 4.4.1. When dealing with different fuzzy triple pattern matches, we need
to select different FHBase tables and algorithms to query according to any known
elements in the fuzzy triple pattern.
On the basis of the storage schema of fuzzy RDF and organizational forms of the
fuzzy triple pattern mentioned above, we propose several specific query algorithms
as follows. Note that the function SPLIT (Expression) in the following algorithms
returns the element after the notation “/” of the expression (i.e., predicate or object
of the fuzzy triple). The following algorithms all deal with the query according to
whether the predicate of the fuzzy triple pattern is an object property or a datatype
property.
1. Query algorithm Query_FS_PO
When the given fuzzy triple pattern is one of the Type 1 contains, that is, the
subject and predicate in the fuzzy triple pattern are known. In this case, the fuzzy
HBase table that needs to be queried is FHTS_PO. Then we propose the query
algorithm Query_FS_PO.

Algorithm 4.4: Query_FS_PO

Input: Fuzzy triple pattern tp = (S, ρ/P, O) or tp = (S, ρ/P, ?O) or tp = (S, ?ρ/P, O) or tp = (S,
?ρ/P, ?O) or tp = (S, P, μ/O) or tp = (S, P, μ/?O) or tp = (S, P, ?μ/O) or tp = (S, P, ?μ/?O)
Output: Result which matches tp
1. Initialize Result
2. selectRowkey ← S
3. if tp = (S, ρ/P, O) or tp = (S, ρ/P, ?O) or tp = (S, ?ρ/P, O) or tp = (S, ?ρ/P, ?O)
4. selectColumnfamily ← Object Property
5. selectColumn ← P
6. select O1 , O2 , …, On , from FHTS_PO where rowkey = selectRowkey and columnfamily
= selectColumnfamily and SPLIT(column) = selectColumn
7. foreach t = (S, ρ i /P, Oi ) do
8. if MatchFTP_T(tp, t) then
9. Result.add(t)
10. end if
11. end for
12. end if
13. else if tp = (S, P, μ/O) or tp = (S, P, μ/?O) or tp = (S, P, ?μ/O) or tp = (S, P, ?μ/?O)
14. selectColumnfamily ← Datatype Property
15. selectColumn ← P
16. select μ1 /O1 , μ2 /O1 , μ1 /O2 , …, μs /On , μt /On , from FHTS_PO where rowkey =
selectRowkey and columnfamily = selectColumnfamily and column = selectColumn
17. foreach t = (S, P, μi /Oj ) do
18. if MatchFTP_T(tp, t) then
19. Result.add(t)
20. end if
21. end for
22. end if
23. return Result

Algorithm 4.4 starts with some initialization work, such as initializing the result
set to be returned and determining the row key to look for is the given subject. When
4.4 Fuzzy RDF Mapping to HBase Databases 135

the predicate is an object property, determine the column family and column to look
for are “Object Property” and the given predicate. Next, query the table FHTS_PO
and use the index function of the FHBase table to get all cell values according to
the determined row key name S, column family name “Object Property” and column
name P. This step can get the candidate fuzzy triples, then call the MatchFTP_T
algorithm to filter these fuzzy triples that match the condition and add them to the
result set. Finally, Algorithm 4.4 returns the result set which matches the given fuzzy
triple pattern.
2. Query algorithm Query_FSO_P
Query_FSO_P When the given fuzzy triple pattern is one of the Type 2 contains,
that is, the subject and object in the fuzzy triple pattern are known. In this case,
the fuzzy HBase table that needs to be queried is FHTS_PO. Then we propose
the query algorithm Query_FSO_P.

Algorithm 4.5: Query_FSO_P

Input: Fuzzy triple pattern tp = (S, ρ/?P, O) or tp = (S, ?ρ/?P, O) or tp = (S, ?P, μ/O) or tp =
(S, ?P, ?μ/O)
Output: Result which matches tp
1. Initialize Result
2. selectRowkey ← S
3. if tp = (S, ρ/?P, O) or tp = (S, ?ρ/?P, O)
4. selectColumnfamily ← Object Property
5. selectCell ← O
6. select ρ 1 /P1 , ρ 2 /P1 , ρ 1 /P2 , …, ρ s /Pn , ρ t /Pn , from FHTS_PO where rowkey =
selectRowkey and columnfamily = selectColumnfamily and cell = selectCell
7. foreach t = (S, ρ i /Pj , O) do
8. if MatchFTP_T(tp, t) then
9. Result.add(t)
10. end if
11. end for
12. end if
13. else if tp = (S, ?P, μ/O) or tp = (S, ?P, ?μ/O)
14. selectColumnfamily ← Datatype Property
15. selectCell ← O
16. select P1 , P2 , …, Pn , from FHTS_PO where rowkey = selectRowkey and columnfamily
= selectColumnfamily and SPLIT(cell) = selectCell
17. foreach t = (S, Pi , μj /O) do
18. if MatchFTP_T(tp, t) then
19. Result.add(t)
20. end if
21. end for
22. end if
23. return Result

Algorithm 4.5 first initializes the result set to be returned and determines the row
key to look for is the given subject. Because the object of the given fuzzy triple
pattern is known, when the predicate is an object property, determine the column
family and cell value to look for are “Object Property” and the given object. Next,
136 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

query the table FHTS_PO and use the index function of the FHBase table to get all
column values according to the determined row key name S, column family name
“Object Property” and cell value O. This step can get the candidate fuzzy triples, then
call the MatchFTP_T algorithm to filter these fuzzy triples that match the condition
and add them to the result set. Finally, Algorithm 4.5 returns the result set which
matches the given fuzzy triple pattern. Similarly, Algorithm 4.5 can perform similar
operations when the predicate is a datatype property.
3. Query algorithm Query_FS_OP
Query_FS_OP When the given fuzzy triple pattern is one of the Type 4 contains,
that is, just the subject in the fuzzy triple pattern is known. In this case, the fuzzy
HBase table that needs to be queried is FHTS_PO. Then we propose the query
algorithm Query_FS_OP.

Algorithm 4.6: Query_FS_OP

Input: Fuzzy triple pattern tp = (S, ρ/?P, ?O) or tp = (S, ?ρ/?P, ?O) or tp = (S, ?P, μ/?O) or tp
= (S, ?P, ?μ/?O)
Output: Result which matches tp
1. Initialize Result
2. selectRowkey ← S
3. if tp = (S, ρ/?P, ?O) or tp = (S, ?ρ/?P, ?O)
4. selectColumnfamily ← Object Property
5. select ρ s /Pi , Oj , from FHTS_PO where rowkey = selectRowkey and columnfamily =
selectColumnfamily
6. foreach t = (S, ρ s /Pi , Oj ) do
7. if MatchFTP_T(tp, t) then
8. Result.add(t)
9. end if
10. end for
11. end if
12. else if tp = (S, ?P, μ/?O) or tp = (S, ?P, ?μ/?O)
13. selectColumnfamily ← Datatype Property
14. select Pi , μs /Oj , from FHTS_PO where rowkey = selectRowkey and columnfamily =
selectColumnfamily
15. foreach t = (S, Pi , μs /Oj ) do
16. if MatchFTP_T(tp, t) then
17. Result.add(t)
18. end if
19. end for
20. end if
21. return Result

Algorithm 4.6 initializes the result set and determines the row key in the same
way as Algorithms 4.4 and 4.5. Because the predicate and object of the fuzzy triple
patterns processed by Algorithm 4.6 are both unknown. When the predicate is an
object property, just determine the column family to look for is “Object Property.”
Next, query the table FHTS_PO and use the index function of the FHBase table to
get all column values and corresponding cell values according to the determined row
4.4 Fuzzy RDF Mapping to HBase Databases 137

key name S, column family name “Object Property.” Then call the MatchFTP_T
algorithm to filter the candidate fuzzy triples that match the condition and add them
to the result set. Finally, Algorithm 4.6 returns the matched result set.
4. Query algorithm Query_FOP_S
When the given fuzzy triple pattern is one of the Type 3 contains, that is, the
predicate and object in the fuzzy triple pattern are known. In this case, the fuzzy
HBase table that needs to be queried is FHTO_PS. Then we propose the query
algorithm Query_FOP_S.

Algorithm 4.7: Query_FO P_S

Input: Fuzzy triple pattern tp = (?S, ρ/P, O) or tp = (?S, ?ρ/P, O) or tp = (?S, P, μ/O) or tp =
(?S, P, ?μ/O)
Output: Result which matches tp
1. Initialize Result
2. selectRowkey ← O
3. if tp = (?S, ρ/P, O) or tp = (?S, ?ρ/P, O)
4. selectColumnfamily ← Object Property
5. selectCell ← P
6. select S 1 , S 2 , …, S n , from FHTO_PS where rowkey = selectRowkey and columnfamily =
selectColumnfamily and SPLIT(column) = selectColumn
7. foreach t = (S i , ρ j /P, O) do
8. if MatchFTP_T(tp, t) then
9. Result.add(t)
10. end if
11. end for
12. end if
13. else if tp = (?S, P, μ/O) or tp = (?S, P, ?μ/O)
14. selectColumnfamily ← Datatype Property
15. select S 1 , S 2 , …, S n , from FHTO_PS where rowkey = selectRowkey and columnfamily =
selectColumnfamily and column = selectColumn
16. foreach t = (S i , P, μj /O) do
17. if MatchFTP_T(tp, t) then
18. Result.add(t)
19. end if
20. end for
21. end if
22. return Result

Algorithm 4.7 first initializes the result set and determines the row key to look
for is the given object. Being different from the above query algorithms, Algorithm
4.7 queries the table FHTO_PS rather than FHTS_PO. When the predicate is an
object property, determine the column family and column to look for are “Object
Property” and the given predicate. Next, query the table FHTO_PS and use the index
function of the FHBase table to get all cell values according to the determined row key
name S, column family name “Object Property” and column name P. Then call the
MatchFTP_T algorithm to filter the candidate fuzzy triples that match the condition
and add them to the result set. Finally, Algorithm 4.7 returns the matched result
138 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

set. Similarly, Algorithm 4.7 can perform similar operations when the predicate is a
datatype property.
5. Query algorithm Query_FP_SO
When the given fuzzy triple pattern is one of the Type 5 contains, that is, just the
predicate in the fuzzy triple pattern is known. In this case, we propose the query
algorithm Query_FP_SO.

Algorithm 4.8: Query_FP_SO

Input: Fuzzy triple pattern tp = (?S, ρ/P, ?O) or tp = (?S, ?ρ/P, ?O) or tp = (?S, P, μ/?O) or tp
= (?S, P, ?μ/?O)
Output: Result which matches tp
1. Initialize Instances
2. Initialize S
3. let S add all ri from PropertyRelation where rowkey = P and columnfamily = Domain and
column = domain
4. foreach ri in S do
5. select rii from ClassRelation where rowkey = ri and (columnfamily = EquivalentClass and
column = 1.0/equivalentclass) or (columnfamily—SubClass and column—1.0/subclass)
6. if rii not in S then
7. S.add(t)
8. end if
9. end for
10. foreach ri in S do
11. let Instances.add all Ii from FHTO_PS where rowkey = ri and columnfamily = Type and
column = 1.0/type
12. end for
13. Initialize Result
14. foreach I in Instances do
15. selectRowkey ← I
16. call the Query algorithm Query_FS_PO
17. end for
18. return Result

The fuzzy triple patterns processed by Algorithm 4.8 are only the predicate is
known, Algorithm 4.8 first queries the table FPropertyRelation according to the given
predicate P to get the domains of P and add them to the set S. Second, get the deter-
mined equivalentclasses and subclasses of all fuzzy classes in the set S by querying
the table FClassRelation and add them to the set S. And then get the instances corre-
sponding to each fuzzy class in the set S by querying the table FHTO_PS and add
them to the set Instances. Next, for each instance in the set Instances, which is equiv-
alent to the subject in the fuzzy triple pattern, call the Algorithm 4.4 Query_FS_PO
to get the fuzzy triples that match the condition and add them to the result set. Finally,
Algorithm 4.8 returns the matched result set.
6. Query algorithm Query_FO_PS
When the given fuzzy triple pattern is one of the Type 6 contains, that is, just the
object in the fuzzy triple pattern is known. In this case, the fuzzy HBase table
4.4 Fuzzy RDF Mapping to HBase Databases 139

that needs to be queried is FHTO_PS. Then we propose the query algorithm

Query_FO_PS.

Algorithm 4.9: Query_FO_PS

Input: Fuzzy triple pattern tp = (?S, ρ/?P, O) or tp = (?S, ?ρ/?P, O) or tp = (?S, ?P, μ/O) or tp
= (?S, ?P, ?μ/O)
Output: Result which matches tp
1. Initialize Result
2. selectRowkey ← O
3. if tp = (?S, ρ/?P, O) or tp = (?S, ?ρ/?P, O)
4. selectColumnfamily ← Object Property
5. select ρ s /Pi , S j from FHTO_PS where rowkey = selectRowkey and columnfamily =
selectColumnfamily
6. foreach t = (S j , ρ s /Pi , O) do
7. if MatchFTP_T(tp, t) then
8. Result.add(t)
9. end if
10. end for
11. end if
12. else if tp = (?S, ?P, μ/O) or tp = (?S, ?P, ?μ/O)
13. selectColumnfamily ← Datatype Property
14. select Pi , S j from FHTO_PS where rowkey = selectRowkey and columnfamily =
selectColumnfamily
15. foreach t = (S j , Pi , μs /O) do
16. if MatchFTP_T(tp, t) then
17. Result.add(t)
18. end if
19. end for
20. end if
21. return Result

Algorithm 4.9 starts with some initialization work, such as initializing the result
set to be returned and determining the row key to look for is the given object. Because
the subject and predicate of the fuzzy triple patterns processed by Algorithm 4.9 are
both unknown. When the predicate is an object property, just determine the column
family to look for is “Object Property.” Next, query the table FHTO_POS and use the
index function of the FHBase table to get all column values and corresponding cell
values according to the determined row key name S, column family name “Object
Property.” Then call the MatchFTP_T algorithm to filter the candidate fuzzy triples
that match the condition and add them to the result set. Finally, Algorithm 4.9 returns
the matched result set. Note that Algorithm 4.9 can perform similar operations when
the predicate is a datatype property.
7. Query algorithm Query_FSPO
When the given fuzzy triple pattern is one of the Type 7 contains, that is, the
subject, predicate, and object in the fuzzy triple pattern are all unknown. In this
case, we need to take all the fuzzy triples in the FHDB and add them to the
candidate result set. It means whether we need to query the table FHTS_PO or
FHTO_PS. Then we propose the query algorithm Query_FSPO.
140 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Algorithm 4.10: Query_FSPO

Input: Fuzzy triple pattern tp = (?S, ρ/?P, ?O) or tp = (?S, ?ρ/?P, ?O) or tp = (?S, ?P, μ/?O) or
tp = (?S, ?P, ?μ/?O)
Output: Result which matches tp
1. Initialize Result
2. if tp = (?S, ρ/?P, ?O) or tp = (?S, ?ρ/?P, ?O)
3. selectColumnfamily ← Object Property
4. select S i ρ s /Pj , Ok from FHTS_PO or FHTO_PS where columnfamily =
selectColumnfamily
5. foreach t = (S i , ρ s /Pj , Ok ) do
6. if MatchFTP_T(tp, t) then
7. Result.add(t)
8. end if
9. end for
10. end if
11. else if tp = (?S, ?P, μ/?O) or tp = (?S, ?P, ?μ/?O)
12. selectColumnfamily ← Datatype Property
13. select S i , Pj , Ok from FHTS_PO or FHTO_PS where columnfamily = selectColumnfamily
14. foreach t = (S i , Pj , μs /Ok ) do
15. if MatchFTP_T(tp, t) then
16. Result.add(t)
17. end if
18. end for
19. end if
20. return Result

Being different from all query algorithms proposed above, the subject, predicate,
and object of the fuzzy triple patterns processed by Algorithm 4.10 are all unknown.
So, Algorithm 4.10 can get all fuzzy triples in the FHDBs, which are added to the
candidate result set by querying table FHTS_PO or FHTO_PS. Specifically speaking,
Algorithm 4.10 first initializes the result set to be returned. When the predicate is an
object property, the column family is just determined to look for “Object Property.”
Next, the table FHTS_PO or FHTO_PS is queried to get all fuzzy triples, which
are added to the candidate result set. Then, the MatchFTP_T algorithm is called to
filter the eligible fuzzy triples, which are added to the result set. Finally, Algorithm
4.10 returns the matched result set. Of course, Algorithm 4.10 can perform similar
operations when the predicate is a datatype property.

4.4.3 Design and Implementation

On the basis of the storage and query methods proposed in Sects. 4.4.1 and 4.4.2,
we design and implement a prototype called FRDF2FHBase, which can store the
fuzzy RDF data into the FHDB and support basic fuzzy triple patterns queries. In
the following, we briefly discuss the implementation of the FRDF2FHBase.
4.5 Fuzzy RDF Graph Mapping to Property Graph 141

Fig. 4.9 The overall

architecture of
FRDF2FHBASE

The prototype FRDF2FHBase consists of four main modules: data loading

module, data storage module, FHBase-based query module, and parsing module.
The overall architecture of FRDF2FHBASE is shown in Fig. 4.9.

1. Data loading module: The data loading module loads fuzzy RDF data which is
described in the form of triples. The data is divided into fuzzy RDFS data and
fuzzy RDF data, respectively.
2. Data storage module: The storage module stores the fuzzy RDF data in a target
FHDB according to the storage model proposed in Sect. 4.4.1.
3. FHBase-based query module: The FHBase-based query module processes the
input f-SPARQL queries, this module parses the f-SPARQL query into a set of
fuzzy triple patterns and returns the candidate result set satisfying the parsed
fuzzy triple patterns according to the FHBase-based RDF(S) query algorithms
proposed in Sect. 4.4.2.
4. Parsing module: The parsing module processes the candidate result set obtained
by the FHBase-based query module, it uses a greedy multiple connection join
strategy for f-SPARQL BGP processing and returns the final result set.

4.5 Fuzzy RDF Graph Mapping to Property Graph

Although RDF triples can be stored in a relational database, fundamentally speaking,

RDF models can be viewed as a special case of graph models, so using a graph-
shaped database would be more appropriate. Bonstrom et al. (2003) show that the
142 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

advantages of storing RDF data in a graph structure are: (i) Graph structures can
directly map to RDF models, avoiding the need to convert RDF data to accommodate
storage structures (ii) Query semantic information of RDF data does not require
reconstruction RDF graphs. The graph model conforms to the semantic level of the
RDF model, and can maintain the semantic information of the RDF data to the utmost
extent. In addition, many graphs theory-based algorithms can be applied to optimize
the inferential query of RDF data.
There has been some related work on RDF data graph storage. Zou et al. (2014)
proposed a method for storing and processing RDF data using a graph model called
gStore, which converted RDF graphs into data signature graphs and used vertex signa-
ture (VS)*-tree indexes to reduce maintenance overhead. Hartig (2014) proposed
a formal definition of the Property Graph model and introduced transformations
between Property Graphs and RDF*. Libkin et al. (2018) introduced a triple-based
model called Trial, which combined the concept of triple storage in RDF with the
concept of graph data, and illustrated the difference between the RDF graph model
based on triples and the standard graph database model. De Virgilio (2017) proposed
a method of using ontology and related constraint rules to convert RDF data storage
into a graph database. In order to realize the distributed management of Web-scale
RDF data, Zeng et al. (2013) proposed a distributed graph engine that stores RDF
data in the form of primitive graphs instead of triples or bitmap matrices, called
Trinity RDF. However, all the above works did not consider the issues of fuzzy RDF
graph data storage and query.
In order to solve the problem of fuzzy RDF data storage and query, it is an effective
method to establish the mapping relationship between fuzzy RDF and attribute graph.
In this section, we discuss the methodology to make a lossless transformation of a
fuzzy RDF graph into a property graph. The main idea is to represent any ordinary
RDF triple as property graph edge, and the fuzzy degree of the corresponding triple
can be expressed as an attribute of the edge. Specifically, our research goal is to
convert a fuzzy RDF graph into a property graph, and further realize the mapping of
a SPARQL query on G to a Cypher query over GP .

4.5.1 Preliminaries

1. SPARQL Query in the Fuzzy RDF

In the section, we use the SPARQL extension method in Sect. 3.5.3 and we add
an optional query statement “WITH ⟨threshold⟩” based on the standard SPARQL
statement to indicate the minimum membership threshold that the query result
should satisfy. The user chooses an appropriate value of threshold to express
his/her needs. In this way, the classic SPARQL query statement has the following
form: SELECT—FROM—WHERE—[WITH ⟨threshold⟩]. Utilizing this kind
of SPARQL query, users can obtain query results that meet the query conditions
and preset thresholds at the same time. Therefore, the query process of fuzzy
RDF database involves many choices of threshold. It should be emphasized that
4.5 Fuzzy RDF Graph Mapping to Property Graph 143

if the default value of ⟨threshold⟩ is 1, then the item WITH ⟨threshold⟩ can be
omitted.
Suppose that we want to seek an action movie, whose director is an American. At
the same time, trustworthiness is more than 0.6. According to the extended SPARQL
syntax, the SPARQL SELECT statement that meets the above query conditions is
expressed as follows.
PREFIX le: < http://fuzzy RDF example.org/>
SELECT ?x
WHERE {
?x le: Genre “action”
?x le: Director ?z.
?z le: birthPlace “Ameircan”
}
WITH ⟨0.6⟩
Here, “WITH ⟨0.6⟩” is the threshold expression, which specifies the lowest possi-
bility of the matching subgraph. The symbol “?x” represents the film that we want
to retrieve.
2. Property Graph Model
Assume that the set D of data types contains the string type S, that is, S ∈ D, and
D may also include the data type of the collection type. For each data type D,
dom(D) represents the value space of type D, that is, all possible value sets of
data type D, and dom(S) represents all string sets. The formal definition of the
Property Graph is as follows:
A property graph GP , is a 6-tuple ⟨V P , E P , src, tgt, lbl, P⟩ and ⟨V P , E P , src, tgt,
lb⟩ represents a directed label multigraph, where V P and E P represent the set of
vertices and edges respectively; function src: E P → V P indicates that each edge has
a start(head) vertex; function tgt: E P → V P indicates that each edge has a termina-
tion(tail) vertex; lbl: E P → dom(S) means that each edge has a label. The function
P: V P ∪ E P → 2P indicates every vertex v ∈ V P and edge e ∈ E P are associated with
a set of pairs ⟨key, value⟩ called properties.
Neo4j is a management system for crisp property graph databases, whose prim-
itives are vertices, relationships, and attributes. Different types of vertices are iden-
tified by labels, which can be IRI, Literal, or Blank. Vertices can have zero or more
attributes, which exist as key-value pairs. The vertex of IRI has two attributes, namely
kind and IRI. The vertex of Blank has one attribute. The vertex of Literal has four
attributes: kind, value, datatype, and language. The attributes of the same vertex are
stored in a linked list. A relationship consists of a start vertex and an end vertex. As
with vertices, relationships can also have multiple attributes and labels.
Figure 4.10 is an example of simple Property Graph that contains two vertices
and a relationship between the two vertices. Among them, the relationship marked
as “partner” starts at the vertex “Pratt” and ends at the vertex “Statham”. In addi-
tion, some boxes associated with graph elements (vertices and edges) represent the
144 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Fig. 4.10 A property graph

with two vertices

attributes of these elements. For example, the vertex of Chris Pratt has two attributes,
which represent the name and year of birth of the famous actor. The partnership
has only one attribute, which is used to indicate the certainty of whether Statham is
Pratt’s partner.
Cypher is the standard query language of Neo4j graph database, which is like SQL
in relational database, querying property graph in a crisp way. It is composed of is
composed of a START clause followed by a MATCH and a RETURN clause, where
START indicates a starting vertex of the matching subgraph, MATCH describes all
edges of the matching subgraph, and WHERE describes the attributes expression on
vertex and edges of the subgraph as a filter condition. An example of a Cypher query
that uses these three clauses to find the mutual partners of actor named James Guan
is:
START a = node: actor (name = “James Guan “)
MATCH (a) − [: partner] - > (b) − [: partner] - > (c), (a) − [: partner] - > (c)
RETURN b, c

4.5.2 Transform Fuzzy RDF Graph to Property Graph

In order to adapt the fuzzy RDF data model to the property graph data model, we use
the following rules to convert each triple in the RDF dataset into a property graph:
(i) any subject or object vertex in RDF becomes a vertex with a unique integer ID
in property graph, (ii) object property in RDF is designated as the adjacent edge in
the property graph, where the source and the target of the edge are vertex IDs, and
the edge is identified by integer ID, (iii) the datatype property in RDF is specified
as vertex attributes in the property graph, (iv) fuzzy degree information is converted
into vertex and edge attributes.
As we all know, a basic requirement for conversion is that any possible IRI must be
explicitly mapped to a different string. The IRI string indicates that this requirement
can be met. Therefore, we define an injective function called IRI-to-string im: I →
dom(S).
In view of these preliminary knowledge, the conversion rules are defined as
follows. Let G = (V, E, Σ, L, μ, ρ) is PG-convertible RDF graph. V = {x ∈ (I
∪ B ∪ L) | ⟨s, p, o⟩ ∈ G and x ∈ {s, o}} is the set of vertex elements. The property
graph corresponding to graph G can be expressed as GP = ⟨V P , E P , src, tgt, lbl, P⟩:
4.5 Fuzzy RDF Graph Mapping to Property Graph 145

• V P contains |V| vertices, and each vertex represents a different RDF item in V. In
other words, there is a function v: V → V P , such that each x ∈ V can be mapped
to a different vertex v(x) ∈ V P .
(i) If RDF item is IRI, then P(v(u)) = {⟨“kind”, “IRI”⟩, ⟨“IRI”, im(u)⟩}, here
u ∈ I, v(u) ∈ V P and im is the IRI-to-string mapping mentioned above.
(ii) If RDF item is blank vertex, then P(v(b)) = {⟨“kind”, “blank vertex”⟩}, here
b ∈ B and v(b) ∈ V P .
(iii) If RDF item is literal, then P(v(l)) = {⟨“kind”, “literal”⟩, ⟨“literal”, vm−1 (l)⟩,
⟨“datatype”, im(dtype(l))⟩} ∪ lang, here l ∈ L, v(l) ∈ V P , vm−1 is the
inverse operation of the value-to-literal bijective mapping, and lang =

{⟨”languge”, lang(l)⟩} i f l ∈ dom(lang)
φ else
(iv) If RDF item is x ∈ (I ∪ B ∪ L), the property set is defined as P(v(x)) =
{⟨“fdegree”, vm (μ(x))⟩}, here v(x) ∈ V P .
• E P contains |E| edges, and each edge corresponds to an RDF triple t ∈ G. Therefore,
a bijective function e: E → E P is defined, such that each triple t = ⟨s, p, o⟩ ∈ G
can be mapped to an edge e(t) ∈ E P .
(i) The edge label of e(t) is im(p), and the labels of the two adjacent vertices
corresponding to the edge e(t) are v(s) and v(o), respectively, which are
formally defined as: src(e(t)) = v(s), lbl(e(t)) = im(p), and tgt(e(t)) = v(o).
(ii) Moreover, the property P(e(t)) can be defined as P(e(t)) = {⟨“fdegree”, vm
(ρ(t))⟩}.
This conversion can represent any fuzzy RDF triple as an edge in the Property
Graph, and its attributes include the relationship and ambiguity of the RDF triples.
The two adjacent vertices of this edge correspond to the subject and object of the
RDF triples. Each vertex introduces two attributes: (i) kind indicates whether the
corresponding data type is IRI, Literal or Blank, (ii) and value indicates the corre-
sponding value. It should be noted that if the data type is Literal, another attribute
should be introduced, namely the type, to describe the type of value.
For the sake of clarity, we shall give an example to illustrate the global steps of our
proposed approach. Figure 4.11 shows a fuzzy RDF graph, in which vertices are used
to represent entity resources such as actors, movies, etc., while edges represent the
relationship between them. For readability reasons, each vertex in the graph uses the
name of the entity resource or the literals instead of the URI itself. The label on the
vertex is associated with the ambiguity to indicate the likelihood of the vertex being
labeled. For instance, the genre of the movie Guardian of the Galaxy 2 is labeled as
“action” and the possibility is 0.91. The fuzzy RDF graph G is PG-convertible in this
example, and the given conversion rule can be used to translate fuzzy RDF graph into
a Property Graph. The generated Property Graph GP is shown in Fig. 4.12, which
contains the following elements:
V P = {v1 , v2 , …, v7 }, E P = {e1 , e2 , …, e7 }, src(e1 ) = v1 , lbl(e1 ) = “Rating”,
tgt(e1 ) = v3 , src(e2 ) = v1 , tgt(e2 ) = v2 , lbl(e2 ) = “Genre”, src(e3 ) = v1 , tgt(e3 )
146 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Fig. 4.11 A fuzzy RDF

graph inspired by IMDB

Fig. 4.12 A property graph converted from the fuzzy RDF

= v4 , lbl(e3 ) = “Starring”, src(e5 ) = v5 , tgt(e5 ) = v4 , lbl(e5 ) = “partner”, src(e6 )

= v5 , tgt(e6 ) = v6 , lbl(e6 ) = “birthPlace”, src(e7 ) = v5 , tgt(e7 ) = v7 , lbl(e7 ) =
“Age”, P(v1 ) = {⟨“kind”, “IRI”⟩, ⟨“IRI”, “Guardian of the Galaxy 2”⟩}, P(v2 ) =
{⟨“kind”, “Literal”⟩, ⟨“value”, action⟩, ⟨“datatype”, string⟩, ⟨“fdegree”, 0.91⟩}, P(v3 )
= {⟨“kind”, “Literal”⟩, ⟨“value”, 8.4⟩, ⟨“datatype”, float⟩, ⟨“fdegree”, 0.8⟩}, P(v4 ) =
{⟨“kind”, “IRI”⟩, ⟨“IRI”, “Vin. Diesel”⟩}, P(v5 ) = {⟨“kind”, “IRI”⟩, ⟨“IRI”, “James
Guan”⟩}, P(v6 ) = {⟨“kind”, “Literal”⟩, ⟨“value”, American⟩, ⟨“datatype”, string⟩},
P(v7 ) = {⟨“kind”, “Literal”⟩, ⟨“value”, 43⟩, ⟨“datatype”, int⟩}, P(e1 ) = φ, P(e2 )
= φ, P(e3 ) = {⟨“fdegree”, 0.7⟩}, P(e4 ) = φ, P(e5 ) = {⟨“fdegree”, 0.4⟩}, P(e6 ) =
{⟨“fdegree”, 0.8⟩}, P(e7 ) = φ.

4.5.3 Query Fuzzy RDF Graph in Neo4j

Since the fuzzy RDF data is stored in the Neo4j database, SPARQL cannot be directly
applied. The problem that follows is how to query the fuzzy RDF data stored in the
Neo4j database. There are two possible ways to implement the query of fuzzy RDF
data stored in Neo4j database: One way is to convert a SPARQL query into a Cypher
language to implement the query. Another way is to use Cypher to directly implement
the query. The former way keeps the SPARQL language extracting information
4.6 Summary 147

from Neo4j through supported plug-in. The plug-in was developed as a wrapper for
Neo4j graph database. It is tailor-made for reusing the advanced features of Neo4j to
efficiently store, index, and query graph structures using the core API of Neo4j. The
latter way focuses on the uses of Cypher query. Similar to SPARQL, this approach
considers that all entities and relations stored in the database are formed by the triple
storage of the [entity]-(relationship /predicate)-[entity] pattern—the first element of
the triple is also called as “subject”. In a graph database, a directed edge connecting
two vertices (that is, the relationship is directional) is used to indicate the “subject”
of a particular triple. In addition, the Cypher query language also supports grouping
(GROUP BY), filtering (WHERE), and sorting (ORDER BY) operations, which are
like the SQL language.
RDF graphs are usually queried by specifying a graph pattern using the standard
SPARQL query language, which returns matching subgraphs. There are several ways
to express pattern matching queries in Cypher. The most straightforward method is
to start with a vertex in the matching pattern graph, and then match all edges in
a MATCH statement in the Cypher query. In this research, we just focus on the
Cypher’s basic query approach and their advantages in handling fuzzy RDF data.
Cypher queries also enable users to implement some query functions that cannot be
implemented in SPARQL. For instance, in the attribute path query, Cypher allows
users to use of more powerful path expressions than those provided by SPARQL.
Let us consider a Cypher query with the same functionality as the SPARQL query
in the previous example. The query also specifies a threshold δ t (δ t = 0.25), which
is used to return matching items with possible greater than δ t . The Cypher query
statement in this example is presented as follows.
START v1 = node: nodes (IRI = “Guardian of the Galaxy 2”)
MATCH (v1 ) − [: Director] - > (v5 ) − [e: birthPlace] - > (v6 {vlaue: “Ameircan”})
WHERE v2 .fdegree > 0.6
MATCH (v1 ) − [: Genre] - > (v2 {vlaue: “action”})
WHERE e.fdegree > 0.6
RETURN v1
When translating the threshold expression into the corresponding Cypher, we
define the format of the conditional expression as: fdegree > δ t , which means that the
overall possibility of the matching answer must satisfy the fuzzy degree δ t ∈ [0, 1].
In the example, the Cypher language equivalent of the threshold expression “WITH
< 0.6>” is fdegree > 0.6. When the query contains multiple triple patterns, we must
aggregate the results of each pattern.

4.6 Summary

With the prompt development of the Internet, the requirement of managing informa-
tion based on the Web has attracted much attention both from academia and industry.
RDF is widely regarded as the next step in the evolution of the World Wide Web, and
148 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

has been the de-facto standard. This creates a new set of data management require-
ments involving RDF. On the other hand, fuzzy sets and possibility theory have
been extensively applied to deal with information imprecision and uncertainty in the
practical applications, and reengineering fuzzy RDF into fuzzy database models is
receiving more attention for managing fuzzy RDF data. In this chapter, we proposed
some approaches for reengineering fuzzy RDF into fuzzy database models, including
fuzzy relational database models, fuzzy object-oriented database models, and HBase
database models, respectively. Moreover, we investigate the storage and query of
fuzzy RDF graph represented by the labeled directed graph structure to Property
Graphs database storage model. We manage these data by a chosen Neo4j Graph
DBMS in order to support expressive querying services over the stored data.
The two-way mappings between the fuzzy database models to the fuzzy RDF
models pay an important role for establishing the overall management system of fuzzy
RDF data. Moreover, for processing fuzzy RDF data intelligently, fuzzy RDF query
is also very necessary. How to query RDF with imprecise or uncertain information
has raised certain concerns as will be introduced in the following chapter.

References

Abadi, D. J., Marcus, A., Madden, S. R., & Hollenbach, K. (2009). SW-Store: A vertically partitioned
DBMS for semantic web data management. The VLDB Journal, 18(2), 385–406.
Abraham, J., Brazier, P., Chebotko, A., Navarro, J., & Piazza, A. (2010). Distributed storage
and querying techniques for a semantic web of scientific workflow provenance. In 2010 IEEE
International Conference on Services Computing (pp. 178–185).
Atre, M., Srinivasan, J., & Hendler, J. A. (2009). BitMat: A main memory RDF triple store. In
Tetherless World Constellation, Rensselar Plytehcnic Institute, Troy NY.
Bönström, V., Hinze, A., & Schweppe, H. (2003). Storing RDF as a graph (detailed view). In
Proceedings of the First Latin American Web Congress (pp. 27–36).
Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., & Bhat-
tacharjee, B. (2013). Building an efficient RDF store over a relational database. In Proceedings
of the 2013 ACM SIGMOD International Conference on Management of Data (pp. 121–132).
Cai, M., & Frank, M. (2004). RDFPeers: A scalable distributed RDF repository based on a struc-
tured peer-to-peer network. In Proceedings of the 13th International Conference on World Wide
Web (pp. 650–657).
Chen, H., Wu, Z., Wang, H., & Mao, Y. (2006). RDF/RDFS-based relational database integration.
In 22nd International Conference on Data Engineering (ICDE’06) (pp. 94–94).
Chong, E. I., Das, S., Eadon, G., & Srinivasan, J. (2005). An efficient SQL-based RDF querying
scheme. In Proceedings of the 31st International Conference on Very Large Data Bases (pp. 1216–
1227).
Cudré-Mauroux, P., Enchev, I., Fundatureanu, S., Groth, P., Haque, A., Harth, A., … & Wylot,
M. (2013). NoSQL databases for RDF: An empirical evaluation. In International Semantic Web
Conference (pp. 310-325). Springer.
De Virgilio, R. (2017). Smart RDF data storage in graph databases. In Proceedings of the 17th
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (pp. 872–881).
Fan, T., Yan, L., & Ma, Z. (2019). Mapping fuzzy RDF (S) into fuzzy object-oriented databases.
International Journal of Intelligent Systems, 34(10), 2607–2632.
References 149

Fan, T., Yan, L., & Ma, Z. (2020). Storing and querying fuzzy RDF (S) in HBase databases.
International Journal of Intelligent Systems, 35(4), 751–780.
Farhan Husain, M., Doshi, P., Khan, L., & Thuraisingham, B. (2009), Storage and retrieval of
large rdf graph using hadoop and mapreduce. In IEEE International Conference on Cloud
Computing (pp. 680–686). Springer.
Harris, S., & Gibbins, N. (2003). 3store: Efficient bulk RDF storage. In: R. Volz, S. Decker, & I. F.
Cruz (Eds.), Proceedings of the First International Workshop on Practical and Scalable Semantic
Systems (pp. 1–15). CEUR-WS.org.
Hartig, O. (2014). Reconciliation of RDF* and property graphs. Technical report, University of
Waterloo. http://arxiv.org/abs/1409.3288
Janik, M., & Kochut, K. (2005). Brahms: A workbench RDF store and high-performance memory
system for semantic association discovery. In International Semantic Web Conference (pp. 431-
445), Springer.
Libkin, L., Reutter, J. L., Soto, A., & Vrgoč, D. (2018). TriAL: A navigational algebra for RDF
triplestores. ACM Transactions on Database Systems (TODS), 43(1), 1–46.
Ma, Z., & Yan, L. (2018). Modeling fuzzy data with RDF and fuzzy relational database models.
International Journal of Intelligent Systems, 33(7), 1534–1554.
Ma, Z., Capretz, M. A., & Yan, L. (2016). Storing massive resource description framework (RDF)
data: A survey. The Knowledge Engineering Review, 31(4), 391–413.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2004). Extending object-oriented databases for fuzzy
information modeling. Information Systems, 29(5), 421–435.
Myung, J., Yeon, J., & Lee, S. G. (2010). SPARQL basic graph pattern processing with iterative
MapReduce. In Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud (pp. 1–
6).
Nenov, Y., Piro, R., Motik, B., Horrocks, I., Wu, Z., & Banerjee, J. (2015). RDFox: A highly-scalable
RDF store, In International Semantic Web Conference (pp. 3-20). Springer.
Neumann, T., & Weikum, G. (2008). RDF-3X: A RISC-style engine for RDF. Proceedings of the
VLDB Endowment, 1(1), 647–659.
Neumann, T., & Weikum, G. (2010a). The RDF-3X engine for scalable management of RDF data.
The VLDB Journal, 19(1), 91–113.
Neumann, T., & Weikum, G. (2010b). x-RDF-3X: Fast querying, high update rates, and consistency
for RDF databases. Proceedings of the VLDB Endowment, 3(1–2), 256–263.
Papailiou, N., Konstantinou, I., Tsoumakos, D., & Koziris, N. (2012). H2RDF: Adaptive query
processing on RDF data in the cloud. In Proceedings of the 21st International Conference on
World Wide Web (pp. 397–400).
Peng, P., Zou, L., Özsu, M. T., Chen, L., & Zhao, D. (2016). Processing SPARQL queries over
distributed RDF graphs. The VLDB Journal, 25(2), 243–268.
Quasthoff, M., & Meinel, C. (2011). Supporting object-oriented programming of semantic-web soft-
ware. IEEE Transactions on Systems, Man, and Cybernetics Part C (Applications and Reviews),
42(1), 15–24.
Rohloff, K., & Schantz, R. E. (2010). High-performance, massively scalable distributed systems
using the MapReduce software framework: The SHARD triple-store. In Programming Support
Innovations for Emerging Distributed Applications (pp. 1–5).
Sintek, M., & Kiesel, M. (2006). RDFBroker: A signature-based high-performance RDF store.
In European Semantic Web Conference (pp. 363–377). Springer.
Sun, J., & Jin, Q. (2010). Scalable RDF store based on Hbase and mapreduce. In 2010 3rd Inter-
national Conference on Advanced Computer Theory and Engineering (ICACTE) (Vol. 1, pp.
V1–633).
Weiss, C., Karras, P., & Bernstein, A. (2008). Hexastore: Sextuple indexing for semantic web data
management. Proceedings of the VLDB Endowment, 1(1), 1008–1019.
Wilkinson, K., Sayers, C., Kuno, H. A., & Reynolds, D. (2003). Efficient RDF storage and retrieval
in Jena2. In SWDB (Vol. 3, pp. 131–150).
150 4 Persistence of Fuzzy RDF and Fuzzy RDF Schema

Zeng, K., Yang, J., Wang, H., Shao, B., & Wang, Z. (2013). A distributed graph engine for web
scale RDF data. Proceedings of the VLDB Endowment, 6(4), 265–276.
Zou, L., Özsu, M. T., Chen, L., Shen, X., Huang, R., & Zhao, D. (2014). gStore: A graph-based
SPARQL query engine. The VLDB Journal, 23(4), 565–590.
Chapter 5
Fuzzy RDF Queries

5.1 Introduction

The Resource Description Framework (RDF) has been widely applied to represent
and exchange domain information because of its machine-readable characteristic.
With a huge amount of RDF data available, retrieving RDF data is essential, so that
many RDF query approaches have been developed. Solving the RDF data retrieval
task can usually be achieved in two ways: The first way is to solve the problem
with the query language of the RDF database system. Another approach is to use
graph pattern matching algorithms to implement queries, since RDF data can be
represented as graphs. However, in many real applications, the RDF data are often
noisy, incomplete, and inaccurate. Traditional approaches generally cannot handle
imprecise and uncertain information, and this seriously prevents a large number
of common users from obtaining information in RDF datasets. Therefore, in this
chapter, we focus on fuzzy RDF queries. We present methods of pattern match
query, approximated fuzzy RDF subgraph matching query, fuzzy quantified query
over fuzzy RDF graph and investigate the problem of fuzzy RDF query based on
extended SPARQL.
In classical RDF graph pattern matching the task is to find inside a given graph
G some specific smaller graph Q, called pattern. A naive idea of this approach is
to compare all possible subgraph in G and its label bindings with the pattern graph
Q, i.e., obtaining all the candidate subgraphs with the existing techniques. Then,
check the dominating relationship and return true answers. Although there have been
many studies (Neumann & Weikum, 2008; Zou & Özsu, 2017) on RDF subgraph
matching, none of these works considers the problem that the RDF graph could
contain fuzzy information in some applications. Moreover, these methods are not effi-
cient in response time because of the need to perform subgraph isomorphism checks
on Q and G, producing a large number of unnecessary intermediate results, which
have now been shown to be NP-complete (Ullman, 1976). Therefore, a threshold-
based RDF subgraph pattern matching query method is introduced in Sect. 5.2. Based

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 151
Z. Ma et al., Modeling and Management of Fuzzy Semantic RDF Data,
Studies in Computational Intelligence 1057,
https://doi.org/10.1007/978-3-031-11669-8_5
152 5 Fuzzy RDF Queries

on the traditional subgraph isomorphism matching method, the fuzzy RDF subgraph
matching problem is solved efficiently, Specifically, we want to retrieve all qualified
matches of a query pattern in the fuzzy RDF graph.
In order to alleviate the time-consuming exhaustive search, the other method
resorts to the approximate matching strategy. This can relax rigid structure and label
matching constraints of subgraph isomorphism and other traditional graph simi-
larity measures. These various approaches (Costabello, 2014; Virgilio et al., 2015)
to approximate matching on RDF graph data rely on heuristics, based on similarity
or distance metrics, on the use of specific indexing structures to improve the perfor-
mance of the algorithm. However, the existing inexact graph matching algorithm
ignore many features of RDF graph. For example, these algorithms only take the
similarity of vertices and edges into account in RDF graph but did not concern the
structure among the vertices and edges. More importantly, these algorithms disre-
gard the semantic relationships between resources, and cannot process and manage
fuzzy information about RDF graph in the matching process as well. Inspired by the
method of joining the path query graph introduced in (Virgilio et al., 2015; Moustafa
et al., 2014; Zhao & Han, 2010), we choose the path instead of the vertex as the basic
matching unit and propose a new path-based solution to efficiently answer subgraph
pattern queries over fuzzy RDF graph. We introduce this path-based approximate
RDF subgraph pattern matching method in Sect. 5.3.
It has been widely recognized that classical querying suffers from a lack of flex-
ibility due to crisp querying conditions and querying objects. Flexible queries play
important roles in intelligent information retrieval and have become the main means
to realize data querying. Bosc and Pivert (1992) point out that a query is flexible if a
qualitative distinction between the selected entities is allowed. The case arises when
the query conditions are crisp but databases being queried contain imperfect infor-
mation. As a special kind of flexible query, fuzzy quantified queries have been long
recognized for their ability to express different types of imprecise and flexible infor-
mation needs in a relational database context. However, in the specific RDF/SPARQL
setting, the current approaches from the literature that deal with quantified queries
consider crisp quantifiers only (Bry et al., 2010; Fan et al., 2016) over crisp RDF
data. In Sect. 5.4, we intend to integrate linguistic quantifier in a subgraph patterns
addressed to a fuzzy RDF graph database and use graph pattern matching approach to
evaluate fuzzy quantified query. This extension allows to express fuzzy preferences
on values present in the graph as well as on the structure of the data graph, which
has not been proposed in any previous fuzzy RDF graph pattern matching.
SPARQL (Prudhommeaux, 2008), the official W3C recommendation as an RDF
query language, plays the same role for the RDF data model as SQL does for relational
data model. In SPARQL query, the “where” clause consists of triple patterns that
contain either variables or literals. Actually, each SPARQL query can be represented
by a graph pattern. As a result, any SPARQL query can be equivalently transformed
a subgraph query problem, which locates the subgraph in RDF data graph matching
with the query graph. Nevertheless, SPARQL requires accurate knowledge about the
graph structure and contents. As users are not very clear about the contents and the
data distribution of the database, such a strict query often leads to the Few Answers
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph 153

Problem: the user query is too selective and the number of answers is not enough.
More importantly, classical SPARQL lacks of some expressiveness and usability
capabilities as it follows a crisp (Boolean) querying of RDF data for which the
response is either false or true. As a result, it lacks the ability to deal with flexibility
aspects (including queries with user preferences or vagueness), which is significant
in real-world applications. Therefore, we extend the SPARQL language in Sect. 5.5,
for querying fuzzy RDF data.

5.2 Exact Pattern Match Query Over Fuzzy RDF Graph

Traditional specialized pattern graph matching models are usually defined in terms
of subgraph isomorphism and its extensions (e.g., edit distance), which identify
subgraphs that are exactly or approximately isomorphic to pattern graphs. A compar-
ison of various specialized algorithms for graph pattern matching has been done
recently (Lee et al., 2012). The exact RDF graph matching algorithm (Carroll, 2002;
Wang et al., 2005) is not efficient in terms of response time, and it has been proved that
its complexity is NP-complete (Ullmann, 1976). Existing RDF matching algorithms
based on inexact graph matching (Costabello, 2014; Virgilio et al., 2015; Zhang et al.,
2012) ignored many features of RDF graph. For example, most of the algorithms
(Costabello, 2014; Zhang et al., 2012) disregarded the fuzzy data and the semantic
relationships between vertices, which in turn results in the loss of some potential
answers. Worse still, the traditional approach is incapable to recognize and evaluate
the fuzzy information in the matching process, which further results in the incapacity
of obtaining all the satisfactory answer. Therefore, traditional graph querying tech-
niques are not able to capture good quality matches in this context. Moreover, the
existing techniques (Ma et al., 2011) for processing twig-patterns over fuzzy XML
tree cannot be effectively applied to handle graph pattern matching over an RDF
graph. It is because a graph does not have nice property such that every two vertices
are connected along a unique path.
In this section, we study pattern matching in the context of large fuzzy RDF
graphs. Specifically, we want to retrieve all qualified matches of a query pattern in
the fuzzy RDF graph. We carefully defined the syntax and semantic of an extension
of the query pattern graph that makes it possible to express and interpret such queries.
We defined fuzzy graph patterns that allows: (i) to query a fuzzy RDF data model,
and (ii) to express preferences on data through fuzzy conditions and on the structure
of the data graph with regular expressions as edge constraints. In addition, in order to
answer subgraph pattern queries efficiently over fuzzy RDF data graph, we propose
an approach for evaluating RDF graph patterns.
154 5 Fuzzy RDF Queries

5.2.1 Graph Pattern Matching Problem

The basic graph pattern matching problem is to find matches in a graph for a spec-
ified pattern. We first introduce graph pattern matching on precise graphs based
on subgraph isomorphism. Then we will proceed to discuss fuzzy graph pattern
matching.
Subgraph isomorphism is a graph matching technique which is to find all
subgraphs of G that are isomorphic to Q (see (Gallagher, 2006) for a survey). Given
a query pattern graph Q = (V q , E q ) with n vertices {u1 , …, un } and a precise data
graph G = (V, E), a pattern match query based on subgraph isomorphism retrieves
all matches of Q in G. For a given Q and an n vertex set m = {v1 , …, vn } in G, m is
a match of Q in G, if (1) the n vertices {v1 , …, vn } in G have the same vertex labels
as the corresponding vertices {u1 , …, un } in Q; and (2) for any an edge (ui , uj ) in Q,
there exists a corresponding edge (vi , vj ) in G such that edge (vi , vj ) have the same
edge labels with edge (ui , uj ). This makes graph pattern matching NP-complete, and
hence, hinders its scalability in finding exact matches. Moreover, a bijective function
is often too restrictive to identify patterns in emerging applications.
Graph matching in our scenario is essentially finding a homomorphism (Hahn &
Tardif, 1997) from the pattern graph Q to elements of the data graph G. The traditional
notion is, however, often too restrictive for graph matching in emerging applications.
So, we introduce PRDF homomorphism (Alkhateeb et al., 2009) for checking if
an RDF graph pattern is a consequence of an RDF graph. The notion extends graph
homomorphism to deal with vertices connected with regular expression patterns, that
can be mapped to vertices connected by paths, rather than edge-to-edge mappings.
Here PRDF homomorphism is used for answering fuzzy RDF graph pattern.

Definition 5.1 [PRDF homomorphism (Alkhateeb et al., 2009)] Let G be an RDF

graph, and Q be an RDF graph pattern. A PRDF homomorphism from Q into G is a
map φ from V q into V such that: ∀<s, R, o> ∈ Q, either

(i) the empty word ∈ ∈ L(R) and φ(s) = φ(o); or

(ii) ∃<n0 , p1 , n1 >, …, <nk–1 , pk , nk > in G such that n0 = φ(s), nk = φ(o), and p1 , …,
pk ∈ L(φ(R)).

Here ∈ is an empty label, R is a regular expression pattern, L(R) is the label of a

regular expression path R and L(φ(R)) is the set of all edge labels in the path φ(R).
For the fuzzy graph pattern matching, in this paper, we focus on threshold-based
RDF pattern matching (T-RPM) over a large fuzzy RDF graph where vertices and
edges are fuzzy. Specifically, given a large fuzzy RDF data graph G, a query pattern
graph Q, and a user-specified satisfaction threshold δ q ∈ [0, 1], a T-RPM query
retrieves all vertex sets M = {v1 , …, vn } in G (i.e. n vertices in G), such that the
satisfaction threshold of M in G is at least δ q . That is, we want to retrieve fuzzy
subgraph, which contain the pattern graph and have high existence possibilities.
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph 155

Naively, this problem could be solved by directly performing traditional subgraph

pattern matching over RDF graph. However, there are three key questions to answer
subgraph queries efficiently over fuzzy RDF data graph:
• How to effectively build a pattern graph that satisfies the user’s query require-
ments?
• How to efficiently find a possible answer that matching pattern graph in fuzzy
RDF graphs?
• How to decide the satisfaction degree of matches?
In order to deal with these challenging problems mentioned above, we carefully
design the corresponding solutions. As far as the first question is concerned, query
graph pattern specifies the structural and semantic requirements that a subgraph of
G must satisfy. In order to satisfy the user’s query requirements, we assign predicate
conditions on vertices to express user preferences, with regular expressions as edge
constraints to express graph structure. For the second question, we will define a graph
pattern matching algorithm based on a revised notion of graph homomorphism. This
forms the basis for the algorithms discussed in Sect. 5.5, which further speed up
fuzzy subgraph pattern matching. Lastly, the satisfaction degree of a match M on G
is the aggregation of the satisfaction degree of a set of matching vertices.

5.2.2 RDF Graph Pattern

The notion of graph pattern provides a simple yet intuitive specification of the struc-
tural and semantic requirements of interest in the input graph. Graph pattern as the
basic operational unit is central to the semantics of many operations in fuzzy RDF.
Essentially, a fuzzy graph pattern is a directed crisp graph with predicate on query
vertices, and regular expressions that denotes the path over relationships as edges’
labels. For the following, we assume the existence of an infinite set VAR of variables
such that VAR ∩ (U ∪ L) = ∅. By convention, we prefix the elements of VAR by a
question mark symbol.

Definition 5.2 (Fuzzy graph pattern) A fuzzy graph pattern is a labeled directed
graph defined as Q = (V q , E q , F V , RE ), where

(i) V q is a finite set of vertices.

(ii) E q ⊆ V q × V q is a finite set of directed edges, where (ui , uj ) denotes an edge
from vertex ui to uj .
(iii) F V is a function on V q such that for each vertex u ∈ V q , F V (u) can be a constant
value c, a variable ?x, or a fuzzy condition C defined as the form “?x op c”, “?x
op ?y” and “?x is Fterm”. Here ?x, ?y ∈ VAR, c ∈ (U ∪ L), op is a fuzzy or crisp
comparator (e.g., <, ≤ , =, ≥, >, /=), and Fterm is a predefined or user-defined
fuzzy term like high, long, young and so on. One can extend fuzzy condition to
support fuzzy conjunction ∧ (resp. disjunction ∨), usually interpreted by the
156 5 Fuzzy RDF Queries

Fig. 5.1 Fuzzy pattern graph

triangular norm minimum (resp. maximum). To simplify the discussion, we

focus on fuzzy conditions in the simple form given above.
(iv) RE : E q → re(ui , uj ) is a function defined on E q s. t. for each edge (ui , uj ) in E q ,
re(ui , uj ) is a path regular expression, and it can be constructed inductively as R
= ϶ |e|R1 · R2 |R1 |R2 |R+ . Here ϶ is a fuzzy regular expression denoting the empty
pattern, e denotes either an edge label or a wildcard symbol * matching any label
in U, R1 · R2 denotes a concatenation of expressions, R1 |R2 denotes disjunction
and is an alternative of expressions, R+ denotes one or more occurrences of R.

Essentially, the predicate F V (u) of a vertex u specifies a search condition. As will

be seen shortly, an edge (ui , uj ) in a pattern Q is mapped to a path p in a data graph
G, and a regular expression is used to constrain the edges on the path. This differs
from the traditional notion of graph pattern matching defined in terms of subgraph
isomorphism. Note that traditional graph patterns (Lee et al., 2012) are a special case
of the patterns defined above, when (i) a vertex carries its label as its only attribute,
and (ii) all edges have no regular expression constraints.

Example 5.1 We want to find a rating-high action movie with more than 20 million
at the box office. Specifically, the film is starred by American actors. Query graph Q
in Fig. 5.1 is a possible way to express this information need. Here ?b, ?film and ?p are
three variables, expression ?b > “20 million” is a crisp compare expressions, expres-
sion “?r is high” is a fuzzy condition expression, and expression RE = birthPlace ·
locateIn+ is a regular expression. This pattern “models” information concerning high
rating (?r is high) action films (?film). The box office of the film is over 20 million
(?b > “0 million”). Moreover, the actors (?p) starred in the film are American.

5.2.3 Fuzzy Graph Pattern Matching

The notion of graph pattern Q specifies the topological and content-based constraints
chosen by the user. Next, we introduce the notion of fuzzy RDF graph pattern
matching which generalizes result subgraph homomorphism with evaluation of the
RDF graph pattern. Intuitively, given a fuzzy RDF data graph G, the semantics of a
graph pattern Q defines a set of matching, where each matching (from variable of Q
to URIs and literals of G) matches the pattern to a homomorphism subgraph of G.
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph 157

Definition 5.3 (Fuzzy graph pattern matching) A fuzzy graph pattern Q = (V q , E q ,

F V , RE ) is matching with a fuzzy RDF data graph G = (V, E, Σ, L, μ, ρ) with a
satisfaction degree threshold δ t , if there exists an injective mapping φ: Q → G which
is a total mapping from vertices and regular expression edges of Q to vertices and
paths of G such that:
(i) (matching vertices) Every vertex u ∈ V q has an image vertex φ(u) ∈ V by
the injective function. More precisely, if u is a constant vertex (F V (u) ∈ U ∪
L), then φ(u) is a matching vertex associated with a satisfaction degree δ u =
μ(φ(u)), and their labels are matched (i.e., L(u) = L(φ(u))); if u is a variable
vertex (F V (u) ∈ VAR), then φ(u) ∈ (U ∪ L) is the set of matches of the variable
vertex u, i.e., vertices φ(u) induced by all matches φ of Q in G, in which each
item is associated with a satisfaction degree δ u = μ(φ(u)).
(ii) (checking condition on vertices) For the fuzzy condition C of vertex u ∈ V q ,
φ(u) satisfies the fuzzy condition with a satisfaction degree δ co defined as
follows, according to the form of C:
• if C is of the form “?x op c”, then φ(?x) satisfies the condition C with
a degree of δ co = μop (φ(?x), c). Here μop is membership function of the
fuzzy or crisp comparator. In particular, crisp comparison operators have a
Boolean semantics, if the condition is evaluated to true, then the satisfaction
degree is 1, otherwise 0.
• if C is of the form “?x op ?y”, then φ(?x) and φ(?y) satisfy the condition C
with a degree of δ co = μop (φ(?x), φ(?y)).
• if C is of the form “?x is Fterm”, then φ(?x) satisfies the condition C to the
degree δ co = μFterm (φ(?x)). Here μFterm is fuzzy membership function of
the fuzzy term Fterm.
• if C is of the form C 1 ∧ C 2 or C 1 ∨ C 2 , we use the usual interpretation of
the fuzzy operator involved (minimum for the conjunction, maximum for
the disjunction).
(iii) (matching edges) For each edge (ui , uj ) ∈ E q , there exist two vertices φ(ui ) and
φ(uj ) of V such that φ(ui ) and φ(uj ) match vertices u1 and u2 respectively, and
there is a path p in G from vertex φ(ui ) to φ(uj ) s. t. p matches regular expression
re(ui , uj ) with a satisfaction degree, δ re (p), defined as follows, according to the
form of re (in the following, R, R1 and R2 are regular expressions):
• re is of the form ϶ . If p is empty then δ re (p) = 1 else δ re (p) = 0.
• re is of the form e with e ∈ U (resp. “*”). If p is an edge e' from vertex φ(ui )
to φ(uj ), where e' = e (resp. where e' ∈ U) then δ re (p) = ρ(φ(ui ), φ(uj )) else
δ re (p) = 0.
• re is of the form R1 · R2 . We denote by P the set of all pairs of
paths (p1 , p2 ) such that p is of the form p1 p2 . One has δ re (p) =
max P (min(δ R1 ( p1 ), δ R2 ( p2 ))).
• re is of the form R1 |R2 . One has δ re (p) = max(δ R1 ( p1 ), δ R2 ( p2 )).
158 5 Fuzzy RDF Queries

• re is of the form R+ . Let P be the set of all tuples of paths (p1 , …,

pn ) (n > 0) such that p is of the form p1 … pn . One has δ re (p) =
max P (min(δ R ( p1 ), . . . , δ R ( pn ))).
(iv) (aggregating satisfaction degree) The satisfaction degree to the overall query,
denoted by δ Q (G), is the aggregation of the satisfaction degrees to each elemen-
tary matching and conditions from (i), (ii) and (iii). As far as the satisfaction
degree aggregation function, different types of aggregation may be considered
for different application. The minimum, for instance, is a cautious choice; it
assumes the satisfaction of a set of triples is only as satisfactory as the least
satisfied triple. The median, a more optimistic choice, is another reasonable
satisfaction degree aggregation function. In this paper, we use minimum as
the satisfaction degree aggregation function. Note that, the satisfaction degree
δ Q (G) must be greater than δ t . If there is no matching, then δ Q (G) = 0, i.e., G
does not match Q.

Intuitively, when the graph pattern Q is evaluated on a data graph G, the result is
a binary relation M ⊆ V q × V such that:
(a) for each u ∈ V q , there exists v ∈ V such that (u, v) ∈ M;
(b) for each edge (ui , uj ) in E q , there exists a nonempty path p from vi to vj in G
such that (i) the vertex label L(vi ) of vi satisfies the predicate condition specified
by F V (ui ); (ii) the path p is constrained by the regular expression re(ui , uj ); and
(iii) (uj , vj ) is also in M.
From this one can see that pattern query are defined in terms of an extension of
graph simulation (Henzinger et al., 1995), by (i) imposing query conditions on the
labels of vertices; (ii) mapping an edge in a pattern to a nonempty path in a data
graph; and (iii) constraining the edges on the path with a regular expression. This
also differs from the traditional notion of graph pattern matching defined in terms of
subgraph isomorphism (Gallagher, 2006).
Let us now come to the definition of matching result. Since our primary focus is on
fuzzy RDF graph matching, the above definition does not delve into the satisfaction
degree. We need to extend the query evaluation from returning a set of mappings
to returning a set of pairs. Given a fuzzy RDF graph G, a query pattern graph Q,
and a satisfaction degree threshold δ t (0 ≤ δ t ≤ 1), a graph pattern matching query
returns vertices mapping pair sets M = {(m, δ m )|m: V q × V ∧ δ m ≥ δ t }, where m is a
mapping from variable of Q to URIs and literals of G and δ m denotes the satisfaction
degree associated with the mapping.
Note that, a match M is a relation rather than a function. Hence, for each u in V q
there may exist multiple vertices v in V such that (u, v) is in M, i.e., each vertex in
Q is mapped to a nonempty set of vertices in G. Hence, we refer to the relation M
grouped by vertices in V q as a match in G for Q. There may be multiple matches in
a graph G for a pattern Q. Nevertheless, below we show that there exists a unique
maximum match in G for Q. That is, there exists a unique match QM (G) in G for Q
such that for any match M in G for Q, M ⊆ QM (G).
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph 159

Proposition 5.1 For any data graph G and any graph pattern query Q, there is a
unique maximum match QM (G) in G for Q.
Proof
1. By Definition 5.3, we show that there exists a match, which covers all the vertices
in V q and is maximum. And it is the union of all matches in G for Q.
2. We then show the uniqueness by contradiction. That is, if there exist two distinct
maximum matches M 1 and M 2 , then M 3 = M 1 ∪ M 2 is a match that is larger
than both M 1 and M 2 .
By (1) and (2) Proposition 5.1 follows.

The task of graph pattern matching problem is finding the set M of subgraph of
G that “match” the pattern Q. Problem formulations often require that Q represent
a single connected graph and, therefore, that M is connected as well. A graph is
connected if there exists some path between every pair of its vertices.
We introduce result graph, to better illustrate the meaning of maximum match. A
result graph Gr = (V r , E r ) is a graph representation of the maximum match QM (G)
in G for Q, where (i) V r is the set of vertices of G in M, and (ii) there is an edge er
= (vi , vj ) ∈ E r if and only if there is an edge (ui , uj ) ∈ E q , such that (ui , vi ) ∈ M and
(uj , vj ) ∈ M. We use the following Example 5.1 to illustrate result graphs.

Example 5.2 Let us consider the fuzzy graph pattern Q of Example 5.1. We evaluate
this matching according to the fuzzy RDF data graph G of Fig. 5.2. The query also
specifies a threshold δ t (δ t = 0.25 in the example), to indicate that only matches
with possible larger than δ t should be returned. The matching process is depicted as
following.

Intuitively, this pattern retrieve the list of films in G, and the matching value of
?film is potentially Diner, Iron Man 2 and Chef . The actors in the three films are
American actor Mickey Rourke, Steve Gullenberg and Robert Downey Jr. respec-
tively. Because three paths p1 = Jon Favreau—birthPlace—New York—locateIn—
America, p2 = Steve Gullenberg—birthPlace—Florida—locateIn—America and
p3 = Robert Downey Jr.—birthPlace—New York—locateIn—America match the
regular expression RE . And the satisfaction degrees are δ re (p1 ) = 0.3, δ re (p2 ) = 0.4
and δ re (p3 ) = 0.75, respectively. However, the genre of film Diner is comedy, which
is not an action movie. And the box office of film Chef is 11 million, which does
not satisfy the condition ?b > 20 million. So, Iron Man 2 is the only movie which
is an action movie with satisfaction degree δ u (“action”) = 0.85 and it’s box office is
over 20 million with satisfaction degree δ co (“29 million”) = 0.7. If we suppose that
μhigh (7.1) = 0.65 and the satisfaction degree of the condition ?r is high is 0.65, which
is the minimum of satisfaction degrees induced by μhigh (7.1) and δ u (7.1). Moreover,
vertex labeled Iron Man 2 and vertex labeled Jon Favreau in G match with vertex
?film and vertex ?p in pattern graph with satisfaction degree 1, respectively. Thus,
the matching result graph is depicted in Fig. 5.3. As the satisfaction degree is the
minimum of satisfaction degrees induced by the results described above, we have
δ Q (G) = 0.3, which satisfy the minimum satisfaction degree threshold constraint.
160 5 Fuzzy RDF Queries

Fig. 5.2 A fuzzy RDF data graph G inspired by IMDB

Fig. 5.3 Subgraphs of G matching Q

5.2.4 Query Evaluation Algorithms

In this section, we discuss implementation issues related to our proposal. We describe

how to incrementally build a possible map set of queries by making use of a backtrack
algorithm, following a similar approach to (Alkhateeb et al., 2009). In particular, we
need to produce answers with satisfaction degree.
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph 161

RDF graph pattern matching in our scenario is essentially finding a homomor-

phism from the query graph Q to the data graph G. A feasible method for evaluating
RDF graph patterns, i.e., enumerating all RDF homomorphism from the patterns
graph into the data graph, is based on a backtracking technique that generates each
possible map from the current one by traversing the parse tree in a depth-first manner
and using the intermediate results to avoid unnecessary computations.
Specifically, the algorithm consists of four parts. Algorithm 5.4 (Reach) is used
to compute reachable paths that match regular expressions. Then, Algorithm 5.4 is
used by Algorithm 5.3 (Eva), which, given a fuzzy RDF graph and a triple pattern,
calculates the set of maps that satisfy the triple pattern. The results of Algorithm 5.3
are used to calculate the RDF homomorphism of a pattern graph into a data graph
in Algorithm 5.2 (Candidates), which returns all possible candidate images in data
graph for the current vertex satisfying the partial map. Algorithm 5.1 describes the
framework of the candidate retrieves.

5.2.4.1 Pattern-Match Algorithm

Algorithm 5.1 illustrates a general frame work for a pattern match query Q over a
fuzzy RDF graph G, which is a recursive version of the basic backtrack algorithm
(Golomb & Baumert, 1965). The input of this algorithm is: an RDF graph pattern Q,
an RDF graph G, and a partial map μp , which includes a set of pairs {(<u, v>, δ)} such
that u is a term of Q, v is the image of u in G and δ is a satisfaction degree associated
with the mapping. If we call this algorithm with (Q, G, μø ), where μø is the map with
the empty domain, then at the end of the algorithm we have all homomorphism from
the pattern graph Q into the fuzzy RDF graph G. The algorithm perform as follows:

Algorithm 5.1: Pattern-Match (Q, G, µp ).

Input: an RDF graph pattern Q, a fuzzy RDF graph G, and a partial map μp .
Output: extends the partial map to a set of RDF homomorphism.
1. if |μp | == |V q | then
2. return μp ;
3. pick a vertex u of V q ;
4. for each <v, μ> ∈ Candidates (μp , u, G, Q) do
5. Pattern-Match (Q, G, μp ▷◁ {(<u, v>, δ)} ▷◁ μ);

The procedure first checks whether all homomorphism from the pattern graph Q
into the fuzzy RDF graph G are obtained in line 1. If all the homomorphisms are
obtained, we can stop the recursion process and return the complete solution in line 2.
Otherwise, the procedure chooses a term u ∈ V q to obtain a possible homomorphism
in line 3. After that, Pattern-Match takes each candidate v of the current term u ∈ V q
and the possible map μ, puts v in the mapping pairs, and tries to generate the possible
candidates of v in lines 4–5. This is done recursively in a depth-first manner through
the call of Pattern-Match (note that μp , {(<u, v>, δ)}, and μ are compatible, since the
set <v, μ> is calculated with respect to μp ). At the end of the algorithm, we have a
162 5 Fuzzy RDF Queries

tree that contains one level with a term from Q, i.e., a vertex from Q, and one level
with the possible images of that term in G. The input to each vertex of each level is
the current map. Each possible path in the tree from the root to a leaf labeled by a
term of G represents a possible homomorphism.

5.2.4.2 Candidates Algorithm

Algorithm 5.2 calculates all possible candidate maps in G for the current term u
satisfying the partial map μp . It returns all sets of pairs <v, μ> such that v is a possible
map of u, and μ is the possible map from the terms of each regular expression pattern
Ri appearing in a triple with u and one of the terms in V q already mapped in μp . That
is, if there is no term in V q involved in a triple with u, then the possible candidate
images of u are all v in G such that u can be mapped to v. Otherwise, there exists a set
of terms x 1 , …, x k ∈ V q involved in a triple with u, which are already mapped in μp .
In this case, the maps of x i and v satisfies μ(Ri ), where Ri is the regular expression
pattern appearing in the predicate position of the triple between x i and u. The order
in which the two mapping vertices of x i and v satisfy μ(Ri ) depends on the order
in which u and x i appear in the triple, that is, if the triple is <x i , Ri , u> then <μ(x i ),
v> satisfies μ(Ri ) in G, otherwise <v, μ(x i )> satisfies μ(Ri ) in G. μ maps the terms
appearing in the regular expression patterns of Q into the terms appearing along the
paths in G with respect to μp , that is, μ is a possible map such that μ and μp are
compatible.
At the beginning, we use collection T s to store triples <u, Ri , x i > in the line 1, in
which one of the predecessor vertices of u already mapped in μ. We use T s to store
triples <x i , Ri , u> in the same way in the line 2. If there is no term in T s and T o ,
we calculate the candidate matching information according to the type of u in the
lines 4–10. If u is simple variable and u is not mapped in μp , the candidates are all
v in G such that u can be mapped to v (Line 5). Otherwise, the candidate is μp (u)
(Line 6). If u is a constant or conditional expression, a candidate matching result
is obtained according to the matching operation (Line 9). After that, the algorithm
checks whether the edges between u and already matched query vertices of Q have
corresponding edges between v and already matched data vertices of G in lines 12–
15. It calls Eva to check whether the maps of x i and v satisfies μ(Ri ), and it obtains
temporary candidate set. At the same time, the algorithm updates T s and T o in line
13 and 15. Next, the algorithm proceeds to refine the candidate from T s and T o . it
updates status information in the line 17 and 19 and all changes done are restored.
Finally, we return candidates in line 20. The results of Algorithm 5.2 are used to
calculate the RDF homomorphism of a graph pattern Q into an RDF graph G by
successive joins in Algorithm 5.1.
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph 163

Algorithm 5.2: Candidates (µp , u, G, Q).

Input: A fuzzy RDF graph G, a vertex u from graph pattern Q and a map μp .
Output: The set <v, μ> such that v is a possible image of u in G, and μ extends μp to the vertex u.
1. T s ← {<u, Ri , x i >}|<u, Ri , x i > ∈ Q and x i ∈ dom(μp )};
2. T o ← {<x i , Ri , u>}|<x i , Ri , u> ∈ Q and x i ∈ dom(μp )};
3. if T s = = φ and T o = = φ then
4. if u is variable then
5. if u ∈
/ dom(μp ) then c ← {<v, μp >|v ∈V };
6. else if μp (u) ∈ V then c ← <μp (u), μp >;
7. else c ← φ;
8. else
9. if u ∈ V then c ← <μp (u), μp >;
10. else c ← φ;
11. else
12. if T s /= φ then
13. t ← {<s, μ>|<u, Ri , x i > ∈ T s and (s, o, μ) ∈ Eva(μp , u, Ri , μp (x i ), G); T s ← T s \{<u, Ri , x i >};
14. else
15. t ← {<o, μ>|<x i , Ri , u> ∈ T o and (s, o, μ) ∈ Eva(μp , μp (x i ), Ri , u, G); T o ← T o \{<x i , Ri ,
u>};
16. for each <u, Ri , x i > ∈ T s do
17. c ← {<s, μ' > | <s, μ1 > ∈ t, (s, o, μ2 ) ∈ Eva(μp , u, Ri , μp (x i ), G), μ1 , μ2 are compatible,
and μ’ ← merge(μ1 , μ2 )}; t ← c;
18. for each <x i , Ri , u> ∈ T o do
19. c ← {<o, μ' >|<o, μ1 > ∈ t, (s, o, μ2 ) ∈ Eva(μp , μp (x i ), Ri , u, G), μ1 , μ2 are compatible, and
μ’ ← merge(μ1 , μ2 )}; t ← c;
20. return c;

5.2.4.3 Evaluate Algorithm

Algorithm 5.3 calculates the set of maps μ such that <μ(ui ), μ(uj )> satisfies R in G
with the map μ (it is said that μ satisfies <ui , R, uj > in G). The results of Algorithm
5.3 are used to calculate the candidate homomorphism in Algorithm 5.2.
The algorithm first checks ui . If ui is a constant, i.e., a URIs or a literal, the
result set are obtained via calling the function Reach in line 2. The argument used to
the function is ui itself. Otherwise, the result set is then computed, by using Reach
algorithm in line 4. In this case, ui is a variable, and it constructs the map pair <ui , s>
as the argument used to call function in line 6. Algorithm 5.3 then checks uj , along
the same lines as ui . If uj is a constant, the result set of the algorithm is (s, uj , μ) in
lines 5–6, where s ∈ V. Otherwise, the result set is (s, o, μ' ) in line 8, where μ' ←
μ ▷◁ (uj ← o). Finally, the map result is returned in line 9.
164 5 Fuzzy RDF Queries

Algorithm 5.3: Eva(µp , ui , R, uj , G).

Input: A fuzzy RDF graph G, a graph pattern triple <ui , R, uj >, and a partial map μp .
Output: The set of maps μ satisfying triple <ui , μp (R), uj > in G.
1. if ui is a constant then
2. S ← Reach(G, R, ui , φ);
3. else
4. S ← ∪s∈V Reach(G, R, s, {<ui , s>});
5. if uj is a constant then
6. S ← {(s, uj , μ) ∈ S};
7. else
8. S ← {(s, o, μ' )|(s, o, μ) ∈ S, (μ, (uj ← o)) are compatible, and μ' ← μ ▷◁ (uj ← o)}};
9. return {μ|(s, o, μ) ∈ S};

5.2.4.4 Reach Algorithm

Regular path queries have been studied and used for querying databases and semi-
structured data. Liu et al. (2004) presented the algorithm Reach, which included
complete algorithms and data structures for directly and efficiently solving existential
and universal parametric regular path queries. Given a graph G, a regular expression
R, and a start vertex v0 in G, the authors consider a graph to be a set G of labeled
edges of the form <v1 , el, v2 >, with source and target vertices v1 and v2 respectively
and edge label el. They calculate Reach (G, R, v0 , μi ) called the reach set, which
are the set of triples <v, s, μ> such that some path from v0 to v in G matches some
path from s0 to s in R under map μ. The principle of the algorithm is based on the
following two rules:
Rule 1: if <v0 , el, v> ∈ G, <s0 , tl, s> ∈ R and μ ∈ match(tl, el), then <v, s, μ> ∈
Reach(G, R, v0 , μi );
Rule 2: if <v, s, μ> ∈ Reach(G, R, v0 , μi ), <v, el, v1 > ∈ G, <s, tl, s1 > ∈ R, μ1 ∈
match(tl, el) and μ2 = merge(μ, μ1 ), then <v1 , s1 , μ2 > ∈ Reach(G, R, v0 , μi ).
Here, match(tl, el) is the set of minimal substitutions μ such that el matches tl
under μ.
In order to realize the reachable query of fuzzy RDF regular path, we propose a
path reachable algorithm based on this method.
Algorithm 5.4 describes the detailed process, which computes all pairs <v, μ>
such that there is some path from v0 to vertex v that matches some path from s0 to
some vertex in A under map μ with satisfaction degree δ. In Algorithm 5.4, H is
the set of triples already considered for the reach set, W is the worklist of triples
yet to consider, E is the matching result, and we can compute Reach (G, R, v0 ,
μi ) by repeatedly adding triples according to the aforementioned two rules. We use
adjacency list to store adjacency information of each vertex of fuzzy RDF graph,
i.e., a list of triples (vertex ID, edge label, edge membership degree) ordered by the
vertex ID. We use nested arrays, hash tables, or combinations of them for R and W,
as well as for S.
5.2 Exact Pattern Match Query Over Fuzzy RDF Graph 165

Algorithm 5.4: Reach(G, R, v0 , µi )

Input: A fuzzy RDF graph G, a regular expression R, and a start vertex v0 in G.
Output: {(v0 , vk , μ)}
1. Construct the NDFA of R: A ← <S, s0 , β, F>;
2. Initialize reach set H, worklist W, query result E and the satisfaction degree δ;
3. for <v0 , el, v> ∈ G
4. for <s0 , tl, s> ∈ A
5. for μ in match (tl, el)
6. W ← W ∪ {<v, s, μ ▷◁ μi >};
7. while exists <v, s, μ> in W
8. H ← H ∪ {<v, s, μ>}; W ← W – {<v, s, μ>};
9. for <v, el, v1 > ∈G
10. for <s, tl, s1 > ∈ A
11. if μ in match(tl, el) then
12. μ1 ← {(tl, el)}; μ2 ← (μ ▷◁ μ1 ); δ ← min(δ(v), δ(v1 ), δ(el));
13. if (<v1 , s1 , μ2 >) ∈
/ H then
14. W ← W ∪ {<v1 , s1 , μ2 >};
15. if s ∈ F then
16. E ←E ∪ {<v0 , v, μ>};
17. return E;

This algorithm calculates the set of triples <v0 , vk , μ>, where vk is a vertex of G
and μ is a map from terms of R into terms of G such that there exists a sequence T
= (v0 , …, vk ) of vertices of G and a path label ω ∈ L(R) with T is a path label of
ω in G according to μ. We convert regular expression pattern straightforwardly to
a nondeterministic finite automaton, denoted by NDFA (Holub & Melichar, 1998)
in line 1. An automaton is a set A of labeled transitions of the form <s1 , tl, s2 >,
with source and target states s1 and s2 respectively and transition label tl, a finite
state set S, a start state s0 , and a final state set F ∈ S. To construct a NDFA that
generates an equivalent language to a given regular expression, we use the same
way described in (Aho & Hopcroft, 1974). Then we initialize reach set H, worklist
W and query result E in line 2. We compute possible map by adding triples yet to
consider into worklist W according to Rule 1 in lines 4–6. Given an edge label el
and a transition label tl, let match (tl, el) in line 5, which takes a set of symbols as
an implicit argument, be the set of minimal substitutions μ such that el matches tl
under μ. For each triple <v, s, μ> took from worklist, we add it to the set of triples
already considered for the reach set and update worklist in lines 7–8. We map a
pair <v, s> to the set of triples <v1 , s1 , μ1 > such that there is <v, el, v1 > in G and
<s, tl, s1 > in A and match (tl, el) = μ1 according to Rule 2 in lines 9–12. When a
mapping is dynamically constructed, we add it to the array of mappings if it is not
already present. To efficiently check whether it is present, we can maintain a nested
array structure representing all previously constructed mappings. We simply check
whether el matches tl under each of the extensions in line 11. In case of matching,
we merge the mapping with previously constructed mappings, and we calculate the
degree of satisfaction after the connection in line 12. If the extensions mapping is
166 5 Fuzzy RDF Queries

not in the set R, we add the result to the worklist W in lines 13–14. If s is the final
states (s ∈ F), the algorithm terminates execution and we add the matching result to
E in lines 15–16. We return E in line 17.

5.2.4.5 Correctness and Complexity

Proposition 5.2 Algorithm 5.1 is correct and complete for enumerating all RDF
homomorphism from a given pattern graph into a fuzzy RDF graph.

Proof We can prove this by means of induction. The set of all homomorphism
is complete for the empty set at the beginning of the algorithm. Because Algorithm
5.4 is complete (Alkhateeb et al., 2009) and the number of vertices being finite, the
partial homomorphism, i.e., μp , are completely extended for the current vertex at
each step. Finally, the procedure ends having a homomorphism mapping for each
vertex in Q.
Reach algorithm considers each triple <v, s, μ> in W and R, iterates over all
outgoing edges of v and outgoing transitions of s, and computes a match and possibly
a merger taking time O(predicatesize) and O(vars(Ri )), respectively, in each iteration.
The factor map is used because only substitutions that are the third component of a
triple in W and R, i.e., that match some path from v0 in G with some path from s0 in R,
are considered. So, Reach algorithm has worst-case running time O(|G| × |R| × maps
× (predicatesize + vars(Ri ))). For each triple <u1 , Ri , u2 > in Q, the Reach algorithm
is called by the Evaluate algorithm once if u1 is a constant; otherwise it is called for
each vertex in G multiplied by the number of variables in Q in the subject position.
So, Eva algorithm has overall time complexity O((vars(Q) × subj(G) + const(Q))
× |G| × |R| × maps × (predicatesize + vars(Ri ))), where vars(Q) and const(Q) are
the number of variables and constants appearing in the subject position in a triple of
Q. This result shows an exponential complexity O(pred(G)vars(R) ). However, vars(R)
can be a constant since it is usually considered very small with regards to the data
graph. Hence, the complexity of query evaluation is O(|G|2 ).

5.3 Approximate Fuzzy RDF Subgraph Match Query

At the core of many advanced RDF graph operations, lies a common and critical graph
matching primitive. Particularly, as one of the most important topics in this area,
efficiently finding all occurrences of a subgraph pattern have received considerable
attention (Lian and Chen, 2011; Moustafa et al., 2014). Subgraph pattern matching
is meaningful and useful in many applications. For example, answering SPARQL
queries in RDF database is actually equivalent to conducting subgraph isomorphism
match over graphs, in which users need to pose a query with strict conditions over
the database. Nevertheless, as users are not very clear about the contents and the
data distribution of the database, such a strict query often leads to the Few Answers
5.3 Approximate Fuzzy RDF Subgraph Match Query 167

Problem (Yan et al., 2017): the user query is too selective and the number of answers is
not enough. In the worst case, they even cannot get matching results of some queries.
More importantly, classical SPARQL querying assumes that RDF data are certain
and accurate and it does not consider fuzzy information in the querying process.
This motivates us to investigate fuzzy subgraph matching techniques suitable for
query answering, which can relax the rigid structural and label matching constraints
of subgraph isomorphism and other traditional graph similarity measures. In order
to efficiently answer the subgraph pattern query over the fuzzy RDF data graph,
inspired by the method of joining the path query graph introduced in (Virgilio et al.,
2015; Moustafa et al., 2014; Zhao & Han, 2010), we choose the path instead of the
vertex as the basic matching unit and propose a new path-based solution to efficiently
answer subgraph pattern queries over such fuzzy RDF graphs. The process of fuzzy
RDF subgraph pattern matching is as follows: the pattern graph is firstly decomposed
into a set of paths that start from a root vertex and end into a destination vertex, then
these paths are matched against the data graph, and the candidate paths that best
match the query paths are finally reconstructed to generate the answer. At the same
time, we calculate the path match membership (referring to an absolute possibility
of a match), and then aggregate into an overall match membership above a given
threshold during the query evaluation process.

5.3.1 Problem Definition

5.3.1.1 Path in Fuzzy RDF Graph

In the context of RDF graph, different paths denote different semantic relation-
ships between vertices. For an RDF graph, its root vertex is a vertex with indegree
(number of incoming edges) zero. While a destination vertex is a vertex with outde-
gree (number of outgoing edges) zero. A path whose starting vertex is a root is called
an absolute path. In addition, if there is no root vertex in the RDF graph, the starting
vertex of the path is the vertex with the largest difference between outdegree and
indegree. We call such vertices hubs.

Definition 5.4 (path). Assume G = (V, E, Σ, L, μ, ρ) is a fuzzy RDF graph. A

directed path p in G is defined as a finite sequence of distinct vertices p = v1 , v2 , …,
vn such that vi ∈ V and (vi , vi+1 ) ∈ E for i ∈ [1, n − 1]. Because RDF graphs have
a structure in which not only vertices but also edges have labels, a path expression
of the RDF graph can be described as a vertex-edge alternating sequence. We use
path-label PL(p) to denote the set of all vertex/edge labels in the path p. i.e., PL(p)
= L(v1 ), L(e1 ), L(v2 ), L(e2 ), …, L(vn−1 ), L(en−1 ), L(vn ), where L is a function that
assigns labels to vertices and edges.
168 5 Fuzzy RDF Queries

In our work, path expressions can be extracted from RDF graph G by breadth-
first traversal on every vertex starting from the roots. For each step, the absolute path
expressions from all roots to the current vertex and the vertex itself are output and
stored in relational tables path and resource, respectively (Matono et al., 2005).

Definition 5.5 (path subsumption). Given two paths p and p' in RDF graph, p = v1 ,
e1 , v2 , e2 , v3 , …, em−1 , vm , p ' = v1' , e1' , v2' , e2' , v3' , . . . , en−1
' '
, vn and m ≥ n. For each
1 < k < n, if ∀vk , ek ∈ p , such that ∃ vi , ei ∈ p, and vi = vk , ei = ek' , we say that p'
' ' ' '

is subsumed by p, denoted by p' ⊆ p. The subsumption is important to decrease the

number of considered paths.

Example 5.3 Let us consider for instance the fuzzy RDF graph G in Fig. 5.4a. This
graph has two root vertices (mid1 and mid2) and three destination vertices (country1,
country2 and country3). Three examples of paths of G are: p1 = mid1—Title—
Movies1, of length 1, p2 = mid2—Director—pid3—bornIn—City3, of length 2,
and p3 = mid2—Director—pid3—bornIn—City3—locateIn—Country3, of length
3. Among them, p2 is subsumed by p3 , namely p2 ⊆ p3 .

Definition 5.6 (path join). Assume G = (V, E, Σ, L, μ, ρ) is a fuzzy RDF graph, p

= v1 , v2 , …, vm and q = v1' , v2' , · · · , vn' are two fuzzy directed paths in G. The join
of p and q, denoted by p ▷◁ q, is defined as an induced graph on the vertex set {v1 ,
v2 , …, vm } ∪ {v1' , v2' , · · · , vn' }, where {v1 , v2 , …, vm } ∩ {v1' , v2' , · · · , vn' } /= φ are
expressed as intersection points between the paths p and q. To preserve the structural
information, intersection points between paths are represented as join predicates
that must be satisfied when the paths are joined into a full graph. For each pair of
overlapping paths, the join predicates of p and q are defined as JoinPredicate(p, q)
= {L(vi ) = L(vj )|vi ∈ p, vj ∈ q}, i.e., p and q are joinable if they share at least one
common vertex.

Example 5.4 Let us now consider the query graph Q in Fig. 5.4b, which has two
root vertices (mid1 and mid2) and two destination vertices (City2 and tragedy). We
decompose Q into three paths q1 , q2 , and q3 that start from a root vertex and end into
a destination vertex. It follows that the paths of Q are:

q1 = mid2 − Dir ector − pid3 − marriedT o − pid2 − bor n I n − City2

q2 = mid1 − Starring − pid2 − bor n I n − City2
q3 = mid1 − Genr e − tragedy

The intersection points between the paths q1 and q2 are pid2 and City2 and the join
predicates are JoinPredicate(q1 , q2 ) = {(q1 .pid2 = q2 .pid2), (q1 .City2 = q2 .City2)}.
In the same way, the intersection points between the paths q2 and q3 are mid1, and
the join predicates are JoinPredicate(q2 , q3 ) = (q2 .mid1 = q3 .mid1).
5.3 Approximate Fuzzy RDF Subgraph Match Query 169

Fig. 5.4 An example of data and query graph

5.3.1.2 Subgraph Pattern Matching Over RDF Graph

A subgraph query is to identify the occurrences of the query subgraph in the fuzzy
RDF database graph. A query graph Q = (V Q , E Q , L Q ) is an RDF graph, where
each vertex v ∈ V Q is labeled with a label L Q (v) ∈ Σ. The query graph specifies the
structural and semantic requirements that a subgraph of G must satisfy. Abstractly,
a subgraph query takes a query graph Q as input, retrieves the data graph G that
contains (or is similar to) the query graph, and returns the retrieved graphs or new
graphs composed from the retrieved graphs. In the fuzzy RDF database graph, we
formally define subgraph matching below.
170 5 Fuzzy RDF Queries

Fig. 5.5 Query processing phase schematic diagram

Given a fuzzy RDF data graph G, a query graph Q, and a user-specified satisfaction
threshold δ th ∈ [0, 1], where |V Q | ≤ |V|, a subgraph matching query is composed of
several parts including elements (vertices and edges) matching, structure matching
and match membership (referring to an absolute possibility of a match). Its answer
is a set of subgraphs M, such that (1) subgraph m ∈ M is similar to query graph Q,
and (2) matching membership δ m > δ th holds.
Naively, this problem can be solved by directly performing traditional subgraph
pattern matching over RDF graph. However, there are two key issues that need to be
solved:
How to effectively search for possible subgraphs in RDF graph?
How to effectively calculate the match satisfaction degree?
In order to deal with these two issues, we carefully design the corresponding
solutions to these two problems. As far as the first question is concerned, Zhao
and Han (2010) analyzed that paths have more advantage than trees and graphs as
appropriate indexing patterns in large graphs. Although more structural information
can be preserved by trees and graphs, their potentially massive size and expensive
pruning cost even outweigh the advantage for search space pruning. Thus, we choose
the path as the graph indexing during graph query processing. For the second issue,
the membership of a match M on G is an aggregation of the membership of a set of
matching paths. As the paths in this set are exactly those paths containing all vertices
in V M with correct labels as well as all edges in E M .
In the remainder of this section, we show how to measure path similarity by
calculating path edit distance and calculate the satisfaction degree of a given match
directly. This forms the basis for the algorithms discussed in Sect. 5.3.2, which further
speed up fuzzy subgraph pattern matching.
5.3 Approximate Fuzzy RDF Subgraph Match Query 171

5.3.1.3 Measuring Path Similarity

In order to compare the data paths to an input query path and decide which of the data
paths is most similar to the query path, it is necessary to define a distance measure
for paths. Similar to the string matching problem where edit operations are used
to define the string edit distance (Wagner and Fischer, 1974), we define a path edit
distance that is based on the idea of altering the paths by means of edit operations
until there exist a path equaling to the query path.

Definition 5.7 (Edit Operation). Given an RDF path p, a basic path edits operation
ω(p) on p is any of the following:

L(v) → σ, v ∈ V, σ ∈ Σ V : substituting an RDF entity or literal (i.e., modification

the label L(v) of vertex v by σ ).
L(e) → σ ' , e ∈ E, σ ' ∈ Σ E : substituting an RDF property (i.e., modification the
label L(e) of edge e by σ ' ).
v → ∈, v ∈ V: deleting an RDF instance or literal (i.e., deleting the vertex v from
p).
e → ∈, e ∈ E: deleting an RDF property (i.e., deleting the edge e from p).
∈ → v, v ∈ V: inserting an RDF instance or literal (i.e., adding a vertex v into p).
∈ → e = (v1 , v2 ), v1 , v2 ∈ V: inserting an RDF property between two existing
vertices v1 , v2 of p.

Here ∈ is an empty RDF entity, literal, or property.

These six operations in Definition 5.7 are sufficient to transform a path p into
another path p' . Therefore, it is always possible to find a sequence of basic edit
operations that transform a path p into another path p' .

Definition 5.8 (Edited Path). Given an RDF path p and a sequence T = (ω1 , ω2 , …,
ωn ) of edit operations, the edited path, T (p), is a path T (p) = ωn (…ω2 (ω1 (p))…).
In order to model the fact that certain edit operations are more likely than others,
each basic path edit operations ωi is assigned a certain cost c(ωi ). The cost c(ωi )
of an edit operation varies according to the type of edit operation and the nature of
the involved RDF element (Gao et al., 2010). For example, modifying vertex label
is less relevant to vertex insertion because the latter increases the semantic distance
between paths. It is obvious that how to determine the similarity of components in
paths and define costs of edit operations are the key issues. In order to make the
problem simple, in our work, we fix the cost of basic edit operations of insertion,
deletion, and labeling modification to 1, 0.5 and 0, respectively.
Σn
The total cost of the transformation of p into T (p) is given by c(T ) = i=1 c(ωi ).
In other words, the cost of edited path is the sum of the costs of all edit operations
in the sequence T. It is not difficult to see that there is usually no less one sequence
of edit operations that transforms one path p to another path T (p). For our path edit
distance measure, we are particularly interested in the sequence with the least cost.
172 5 Fuzzy RDF Queries

Definition 5.9 (Path edit distance). Given two paths p and p' , the path edit distance
between p and p' is defined as: dist ( p, p ' ) = MinTi ∈T {c(Ti )| T i is a sequence of
path edit operations that transformation p to p' }.

According to the above definition, we can conclude that the smaller the path edit
distance between a data model and an input query path, the more similar they are.
Intuitively, we will calculate the graph similarity distance by computing alignments
on the paths. It follows that a matching answer of Q over a data graph G is a set of
matching of all the paths of Q that forms a connected component of G (Virgilio et al.,
2015).

5.3.1.4 Calculating Matching Membership

In a classical RDF database, the answer to a query Q is either true or false definitely.
However, in a fuzzy RDF database, the system computes the answers and for each
answer computes a membership score representing the possibility. In terms of fuzzy
RDF graphs, the existential possibility associated with an element (vertex and edge)
should be the possibility of the state of the world among these elements. On the
surface, each possibility in the fuzzy RDF graph is a relative one based upon the
assumption that the element exists is independent. Therefore, we consider this possi-
bility as a relative possibility. However, each element in the RDF graph is dependent
upon the graph structure. Correspondingly, the existential possibility of a substruc-
ture (such as a path or a subgraph) composed of some basic elements in a graph
must depend on the relative possibility of the elements. For example, the existential
possibility of a path is related to each the relative possibility of each element (vertex
and edge) in the path. Therefore, we consider this possibility as an absolute possi-
bility. In order to calculate the absolute possibility (whole membership) of a match,
we must consider all the relative possibilities in the match. In general, the absolute
possibility of a match can be computed by aggregating the relative possibilities in
the match.
In a fuzzy RDF graph, we define three kinds of fuzzy structures, namely the triple
structure, the path structure and the graph structure. The fuzziness membership of
these three structures can be defined as follows.
(i) The fuzziness membership in the single triple.
In RDF graph, every triple describes a directed edge labeled with p from the
vertex labeled with s to the vertex labeled with o. The interpretation of each
triple is that subject s has property p with value o. Thus, an RDF triple can be
seen as a relationship from the subject vertex to the object vertex. Hence, the
absolute possibility of a triple can be computed by aggregating the possibilities
of s, p and o. We introduce a membership aggregation function to calculate the
fuzziness memberships for RDF triples.
5.3 Approximate Fuzzy RDF Subgraph Match Query 173

Definition 5.10 Assume G = (V, E, Σ, L, μ, ρ) is a fuzzy RDF graph and t = (vi ,

ρ, vi+1 ) is a fuzzy RDF triple in G. The fuzzy membership δ t for triple t is defined as

δt = tm(μ(vi ), ρ(vi , vi+1 ), μ(vi+1 ))

where tm is an application-specific membership aggregation function.

It should be pointed out that applications have the freedom to choose a function
that fits their use cases. The minimum, for instance, is a cautious choice. It assumes
that the possibility of a triple is simply the possibility of the least possibility item
of the triple. The median is another reasonable membership aggregation function.
In our work, we choose the Zadeh’s logical product (minimum) t-norm (Zou et al.,
2014) for aggregating the relative possibilities.
(ii) The fuzziness membership in the single path.
The concept of a fuzzy relationship plays a fundamental role in modeling a fuzzy
graph. Let V be a set of vertices, a fuzzy relationship on V is a mapping function
ρ: V × V → [0, 1] where ρ(x, y) indicates the degree of relationship between x
and y. The fuzzy relation ρ may be viewed as a fuzzy subset on V × V, which
be used to represent the relationship between vertices. An important operation
on fuzzy relations is composition. In general, fuzzy relationship composition
is applied to derive new relationships between two relationships by reusing
already existing relationships.

Definition 5.11 (Zimmermann, 1996). Let V be a set of vertices. For i ∈ {1, 2, 3},
μi is a function from V i into [0, 1], and for i ∈ {1, 2}, ρ i is a function from V i ×
V i+1 into [0, 1], i.e. ρ 1 and ρ 2 be two fuzzy relations on μ1 × μ2 and μ2 × μ3 ,
respectively. The composition of ρ 1 and ρ 2 , denoted by ρ 1 ◦ ρ 2 , is defined as
∀ (u1 , u3 ) ∈ V 1 × V 3 , we have (ρ 1 ◦ ρ 2 ) (u1 , u3 ) = supu 2 ∈V2 {ρ 1 (u1 , u2 ) ∧ ρ 2 (u2 ,
u3 )}, here ∧ is the minimum.

To define the composition of more than two relationships, we can n − 1 times

apply the binary compose operator (◦). Starting with the first relationship ρ 1 , we
compose succeeding relationships along the ordered chain of relationship with the
binary operator. The result of one binary compose step is used as input for the next
step until we processed the last relationship ρ n of the ordered chain. Let P = (ρ 1 , ρ 2 ,
…, ρ n ) be ordered chain of relationship. Then the n-ary composition of P is defined
as compose(P) = (… (ρ 1 ◦ ρ 2 ) ◦…) ◦ ρ n . Thus, new relationships are generated
indirectly via one or more intermediate relationships.
As discussed above, different paths of the RDF graph denote different relation-
ships between vertices in fuzzy RDF graph. A path of the RDF graph can be composed
by more RDF triples as a vertex-edge alternating sequence. Hence, the absolute possi-
bility of a matching path can be computed by aggregating the possibilities of the set
of triples comprising the matching path.
174 5 Fuzzy RDF Queries

Definition 5.12 Assume G = (V, E, Σ, L, μ, ρ) is a fuzzy RDF graph, and p = v1 ,

v2 , …, vn is a fuzzy directed path in G. The fuzzy membership δ p for path p is defined
as

δ p = tm(δt1 , δt2 , . . . , δtn )

where tm is an application-specific fuzziness membership aggregation function, δ ti

is the set of triples.

Like the triple membership aggregation function in Definition 5.9, we also choose
the minimum t-norm for aggregating the relative possibilities. It is clear that δ p =
ρ(v1 , v2 ) ∧ ρ(v2 , v3 ) ∧ … ∧ ρ(vn−1 , vn ) ∧ μ(v1 ) ∧ μ(v2 ) ∧ … ∧ μ(vn ), i.e., it is the
minimum fuzzy value of the edge or vertex in the fuzzy path.
(iii) The fuzziness membership in the graph.
The fuzziness memberships of an RDF subgraph can be computed by aggre-
gating the possibilities of the set of paths comprising the subgraph. Hence, we
introduce a membership aggregation function to calculate the memberships for
RDF subgraphs.

Definition 5.13 Assume G = (V, E, Σ, L, μ, ρ) is a fuzzy RDF graph, and G' ∈ G

is a fuzzy subgraph which is joined by the set of paths P = (p1 , p2 , …, pn ), i.e., G' =
(p1 ▷◁ p2 ▷◁ … ▷◁ pn ). A membership aggregation function for fuzzy RDF subgraphs
is a function tm which assigns each fuzzy RDF graph G' an aggregated fuzziness
value δ G' that represents the fuzziness of G' , which is defined as
( )
δG ' = tm δ p1 , δ p2 , · · · , δ pn

where tm is an application-specific membership aggregation function. In our work,

we choose the minimum as our function, i.e., the overall membership value δ G' is
the minimum of the membership degrees produced by paths.

5.3.1.5 Motivating Example

In this subsection, we will illustrate the path-based query processing by a simple

motivating example. Assume that a user wants to seek the Country1 excellent
actors/actresses who are married to the director of Movie2. Specifically, the genre of
the movie that the actor/actress starred in is tragedy. Query graph Q in Fig. 5.4b is
a possible way to express this information need. A query also specifies a minimum
threshold δ th (δ th = 0.25 in the example), to indicate that only matches with possibility
larger than δ th should be returned.
5.3 Approximate Fuzzy RDF Subgraph Match Query 175

The query processing phase first decomposes the query graph Q into a set of paths
that start from a root and end into a destination. In our example, we decompose Q
into three paths as described in Example 5.4.
Then the query method extracts all the paths of data graph G in Fig. 5.4a that
align with these query paths taking advantage of a special index structure that is built
off-line. In our example, the following data paths of G would be extracted:

p1 = mid2 − Dir ector − pid3 − marriedT o − pid2 − bor n I n − City2 [0.3]

p2 = mid1 − Starring − pid2 − bor n I n − City2 [0.3]
p3 = mid1 − Genr e − tragedy [0.95]

At the same time, absolute matching membership of a match can be computed by

aggregating the relative memberships in the match. For instance, matching member-
ship of path p2 is computed by minimizing the three vertex label memberships (1,
1, 1) and two edge memberships (0.85, 0.3), resulting in a match possibility of 0.3,
which is above our cutoff of 0.25. Similarly, the other two memberships of path
match, p1 and p3 , are 0.3 and 0.95, respectively, and they also satisfy the minimum
threshold constraint.
Finally, the candidate paths are suitably joined to generate the answer to the query.
The process starts from the matches of one path and progressively adds matches of
joining paths, based on the join predicates of joining paths. In our example, the
join predicates between the paths q1 and q2 are JoinPredicate(q1 , q2 ) = {(p1 .pid2
= p2 .pid2), (p1 .City2 = p2 .City2)}, which have been described in Example 5.4.
Therefore, paths p1 and p2 are joined by merging pid2 and City2. In the same way,
paths p2 and p3 are joined by merging vertex p2 .mid1 and vertex p3 .mid1. Thus, we
obtain an induced subgraph A of G in Fig. 5.4a enclosed by dashed lines, which
is a possible match for the query Q. The matching membership of that potential
answer is computed by minimizing the three path memberships (0.3, 0.3, 0.95). The
induced graph contains a match for the query Q with possibility 0.3, which satisfies
the minimum threshold constraint, and so it is the only answer to our query.
We tackle the problem of querying the RDF graph by finding the best combinations
of the paths of the data graph that best align with the paths of the query graph.

5.3.2 The Matching Algorithm

In this section, we introduce the approximate subgraph matching algorithm. We first

start with an overview of this algorithm in Sect. 5.3.2.1. Then we describe the graph
matching processing algorithm in detail in Sect. 5.3.2.2. We analyze the complexity
of our algorithm in Sect. 5.3.2.3.
176 5 Fuzzy RDF Queries

5.3.2.1 Overview of Our Approach

Based on the above analysis, we propose a path-based solution to the fuzzy RDF
subgraph matching. The approach is composed of two main phases:
1. Data Preprocessing: The graph traverse algorithm is very time-consuming, since
it is made in every user interaction. Thus, we need to build an indexing structure
that contains information about vertices and edges in fuzzy RDF data graph.
The graph indexing is executed only one time, independently of the user interac-
tion. Based on the fact that paths have more advantage than trees and graphs as
appropriate indexing patterns in large graphs (Zhao & Han, 2010), we propose
a novel graph indexing method, context-aware path indexing, to capture infor-
mation about the graph paths and their membership degrees, enabling efficient
retrieval of candidate matches.
An optimization strategy by starting only from the root vertices is then considered.
In order to extract the set of all paths that reach a given vertex v, we started the
exploration of G from the roots by using a breadth first search. For each step, the
corresponding path expressions from all roots to the current vertex and the vertex
itself are output and stored in the path table and resource table, respectively. The
resource table can be used to locate the destination vertex of each candidate path,
such that given a vertex v in the query graph, we can easily figure out its candidate
paths. The path table enables us to skip the expensive graph traversal at runtime. In
order to increase efficiency of path-based query processing, we introduce reverse-
path expressions and build a B+ tree index in the path table. In addition, we pre-
computed and store the underlying membership degree of each path by applying
the corresponding aggregation functions as specified in Sect. 5.3.1.4. Thus, the
path index contains all reverse absolute arc-path expressions from the current
vertex to all roots in the fuzzy RDF graph, with an aggregation membership
degree δ.
2. Query Processing: This is the subgraph matching phase, which consists of three
sub phases, namely path decomposition, finding candidate path and jointing
candidate path. Figure 5.5 illustrates a general framework for a pattern match
query Q over a fuzzy RDF graph G. We briefly present each step in the following.
• Path Decomposition. In this step, we partition the query graph into a set of
paths Q = {q1 , q2 , …, qk } by decomposition algorithm. To facilitate recon-
struction answers subgraph, we employ a k-partition intersection graph to
preserve the structure information of the graph query. In the k-partite inter-
section graph, a vertex corresponds to a query path q while an edge (qi , qj )
5.3 Approximate Fuzzy RDF Subgraph Match Query 177

means that the paths qi and qj share at least one common vertex, i.e., paths
qi and qj are jointly and there are at least one intersection points between
them. Moreover, intersection points between the paths are expressed as join
predicates, which have to be satisfied when combining (reconstructing) path
matches into a full query match.
• Finding path candidates. For each query path q ∈ Q, we first conduct fuzzi-
ness membership (the membership degree must be greater than or equal to the
user-specified threshold δ th ) to obtain a set of qualified candidates matches in
the indexed paths of data graph G. Then we use path edit distance dist (q, p)
between query path q and data path p to further filter the remaining match set.
By using these later, the system generates from G all paths that have a good
candidate of the query paths.
• Combination. In this step, we obtain the full graph matches by reconstruction
candidate paths using a graph explore algorithm, which performs message
passing in the k-partition intersection graph where each partition corresponds
to a path in the query decomposition. The results are a set of approximate
subgraphs included in G, and it generated from joining all candidate paths
matching with the paths in the decomposition. In the end, the actual matching
answers are ranked according path edit distance, and the user is able to explore
these subgraphs to get more information about vertices.

5.3.2.2 Graph Matching Processing

In this section, we discuss how graph matches are processed. Given a query graph Q,
we first study how Q can be split into a set of paths, among which parts of paths with
good selectivity are then selected as candidates. Q is then reconstructed by joining
the selected candidate paths until every edge in Q has been examined at least once.
We discuss each step of the query processing in the following subsections.
1. Query Path Decomposition
Given a query graph Q, the main task of query decomposition is to split Q into
a set of possibly overlapping paths, denoted as P, that cover the entire query, by
traversing the entire query graph Q. As finding a least-cost path decomposition
based on the number of operations involved in producing the final result is too
costly, we use a simple path decomposition method in order to reduce query
search space and improve efficiency. The idea is simple: the set of all paths from
a vertex s to another vertex t is the intersection of all paths starting at s and the
set of all paths ending at t. The task of path decomposition is to split the query
into a set of possibly overlapping paths, each of length L or less, that cover the
entire query, and whose matches can be obtained from the path index.
178 5 Fuzzy RDF Queries

The principle to decompose query graph Q into a set of paths P is that we start the
exploration of graph Q from a root by using a bread-first search and extract all paths
starting from the root and end into a destination, whose matches can be obtained
from the path index of the data graph G. In order to preserve the graph structural
information of the query, the elements of P are organized as a k-partite intersection
graph.
Now, we are ready to implement the function which will list all paths between a
pair of vertices. The implementation is easy: look at the opposite graph of Q then
find the paths beginning at the given vertex and return the reverses of each path. The
code below will find all paths between every pair of root and destination.
In Algorithm 5.5, we begin by initializing the set of paths in line 1, and then extract
root vertex of Q in line 2. We further call function findpath for each root vertex to
obtain the paths and we add them into P in lines 3–4. Finally, we establish a k-partite
intersection to keep structural information of query graph Q in lines 5–9, in which
we obtain the intersection points and join predicates between the path q and q' .
Function findpath shows the main algorithm of query path decomposition, which
operates in three stages. In the first stage, we initialize all varies, in which PathSet
be used to store decomposed path set. We initialize it as null in line1. We use π[v]
to store parent vertex of v and set the parent of root vertex s to be NIL in line 2. We
use queue Queue to store visited vertices in line 3. In the second stage, the breadth-
first search algorithm develops a spanning tree (a breadth-first search tree) with the
source vertex, s, as its root. The parent or predecessor of any other vertex in the tree
is the vertex from which it was first discovered. For each vertex v, the parent of v is
placed in the variable π[v]. After initialization, the source vertex is discovered. Line
4 initialize Queue to contain just the root vertex s. Lines 6–9 guarantees to remove
the vertex u from the queue when insert the new vertex v adjacent to the vertex u in
the queue and establish the search tree. At the same time, whether the vertex adjacent
to u is visited in the process of creating the search tree, if it is not visited, we insert it
in the queue in line 10. The breadth-first search traversal terminates until the queue is
empty, i.e., every vertex has been fully explored. In the last stage, we obtain the path
from the source vertex s to destination vertex t in lines 11–17. Breadth-first search
algorithm builds a search tree containing all vertices reachable from s. The set of
edges in the tree contains (π[v], v) for all v where π[v] /= NIL. If s is reachable from
the bottom of the tree v then there is a unique reverse path of tree edges from v to s.
We return path set PathSet in line 18.
5.3 Approximate Fuzzy RDF Subgraph Match Query 179

Algorithm 5.5: Decomposition

Input: The query graph Q.
Output: The query path set P and k-partite intersection graph.
1: P ← { };
2: S ← FS(Q);
3: foreach s ∈ S do
4: P ←P ∪ findpath(Q, s);
5: foreach q ∈ P do
6: foreach q' ∈ P −{q} do
7: if ((JoinPredicate(q, q' ) ← L(q ) ∩ L(q' )) != null) then
8: J(q) ← J(q) ∪ {q' };
9: Pathset.put ((q, J(q), JoinPredicate(q, q' )));
Function findpath (Q, s)
1: PathSet ← {};
2: π[s] ← NIL;
3: Queue ← {};
4: ENQUEUE (Queue, s);
5: while (!Queue.isEmpty())
6: u ← DEQUEUE(Queue);
7: foreach ((v ← getUnvisitedAdjacentVertex (u)) != null) do
8: v.visited ← true;
9: π[v] ← u;
10: ENQUEUE (Queue, v);
11: T ← FD(Q);
12: foreach t ∈ T do
13: p ← { };
14: while (t != NIL)
15: p.add(t);
16: t ← π[t];
17: PathSet ← PathSet ∪ p;
18: return PathSet;

2. Finding Path Candidates

After the query graph Q has been decomposed, the next step is to find candidate
matches for every query path. This step needs to solve two problems: one problem
is how to get all possible paths from G for query path q, another problem is that
how to decide that a given path is a good approximate path for q from the found
paths.

How to extract from G the paths that are similar to the query paths is important.
Every query path q ∈ PathSet has two specific labeled vertices: the root vertex,
denoted by s(q) ∈ V (Q) and the destination vertex, denoted by t(q) ∈ V (Q). From
this, if the destination vertex t is specified, we can find its correspondents in data
graph G by acceding to the extended labels L(v) of every v ∈ V (G) using the labels
similarity which is able to discover the common meaning of given labels of two
180 5 Fuzzy RDF Queries

vertices. Thus, every destination vertex t(q) has a set of similar vertices from G,
denoted by M t(q) so M t(q) = {v|v ∈ V (G), L(v) = L(t)}. The goal here is clear: for
every query path q, using vertices in M t(q) , searching in the indexed paths of RDF
graph G and discovering a set of candidate paths, denoted by CandidateSet(q), which
represent an approximation of the query path q. In order to reconstruct the query Q
in an efficient way, we build a set for every element q ∈ PathSet. Then, we group in
the same set all the paths p of data graph G having a destination vertex that matches
the destination vertex of q. Thus, each path in the same set maps counterpart path q
of PathSet.
To build the answer subgraphs, the approximate candidate paths that participating
in this building must be computed. Since there is many false positive during the
matching candidate path examining, we need to prune the false positive in the first
place. For every path q ∈ PathSet, we access the path index to get its candidate
matches set CandidateSet(q) by only keeping those paths that satisfy certain context
criteria as following:
(i) We compute the path edit distance dist(q, p) between query path q and data path
p. The main goal of the path edit distance is deciding if a given path has a good
approximation of q. It can be concluded that the smaller the path edit distance
between a query and a data path is, the more similar they are. For a path p ∈ G,
if dist(q, p) is the smallest, p be a candidate for the corresponding path q.
(ii) We should obtain the fuzzy satisfaction degrees δ p for path p. The fuzziness
membership δ p in the single path p must be greater than or equal to δ th , which
is a user-specified fuzziness membership threshold.

Algorithm 5.6: Find Paths Candidates

Input: The query path set PathSet, the data graph G.
Output: The candidates set CandidateSet.
1: foreach q ∈ PathSet do
2: t ← L(q);
3: cn ← φ;
4: C ← getpaths(G, t);
5: foreach p ∈ C do
6: if δ(p) ≥ δ th then
7: m ← dist (p, q);
8: cn.enqueue(p, m);
9: CandidateSet ← CandidateSet ∪ {(q, cn)};
10: return CandidateSet;

Given a path q ∈ PathSet, we perform the above criteria tests to efficiently obtain
the final list of candidates CandidateSet(q) from G. And extraneous paths in the data
graph are automatically ignored. Thus, we are able to compute a set of ranked tuples
containing the candidate’s paths of q and its path edit distance. And the tuples in a
set are ordered according to their path edit distance, with the lower coming first.
5.3 Approximate Fuzzy RDF Subgraph Match Query 181

Given the query path set PathSet (i.e. also the k-partite intersection graph) and a
data graph G, we retrieve and select the paths from G ending into the destinations of
the paths of PathSet, as showed in Algorithm 5.6.
In Algorithm 5.6, for each q ∈ PathSet, we firstly extract destination vertex t of q
in line 2. Then we select all possible paths p from G index matching t by the function
getpaths in line 4. This prevents a sequential scan of all paths in a large graph. After
has obtained the possible path set C, we prune the false positive in line 6. At the same
time, we compute the path edit distance of each p transformed from q and we insert p
in the set cn in ascending order in lines 8–9. At the end, we insert cn in CandidateSet
in line 9. The set CandidateSet is implemented as a map where the key is a path q
from P and the value is a set with all the paths p ending in the destination of q. Each
set is implemented as a priority queue of paths, where the priority is determined by
the path edit distance associated with each path.
3. Full Query Matches
The last step of algorithm is selecting the most relevant paths and generating the
full query matches by joining the paths with the lowest path edit distance from
each set. The join order is determined by exploring the k-partite intersection
graph where vertices represent the retrieved paths, while edges between paths
mean that they have vertices in common. The join condition is the number of
join predicates between path p and p' equaling number of join predicates between
path q and q' , where q and q' are the paths corresponding to the sets where p and
p' were included, respectively.
At the same time, a join operator that operates on fuzzy solution matching has
to consider the fuzzy membership values while combining solutions. The fuzzy
membership value of a combined solution is an aggregation of the membership
values associated with the individual paths that has been used for combining. In
other words, the absolute possibility of a match can be computed by aggregating
the relative possibilities in the match. To determine the fuzzy membership values
for solution matching, we choose the minimum as our application-specific fuzzy
membership aggregation function. Algorithm 3 outlines the combining procedure.
Algorithm 5.7 starts from the matches of one path and progressively adds
matches of joining paths, based on the k-partition intersection graph K-
partiteIntersectionGraph. Once we have obtained the set CandidateSet, our graph
search algorithm is then performed by joining the most promising paths from Candi-
dateSet. We initialize our result set to an empty set in line 1. If there are no results
after a joining process ends, we output the empty set. If we are not able to generate
k answers and the set CandidateSet is not empty in line 2, we obtain the top-k
answer by selecting and combining the paths ordered in increasing order of the
path edit distance from each set CandidateSet. Firstly, we initialize the answers
set and the fuzzy membership value in line 3. Then we choose the vertex q of
K-partiteIntersectionGraph with the largest number of vertices overlapping (join
predicates) with the existing paths in line 4. We select the set cn corresponding to q
and we dequeue the top paths p from cn in lines 5–6. The path q is added into the set
V of visited matching paths in line 7. In lines 8–9, we add p into the answer ans and
182 5 Fuzzy RDF Queries

computer the fuzzy membership value δ m of the answer. We obtain the full answer
in line 10 by a breadth-first search traversal as shown in detail in function BFS-visit.
Finally, we include the full answer ans in the set ApproximateAnswersSet in line 11.
By using this strategy, if we are not able to find k approximate answers for the query
graph Q, the process is stopped.

Algorithm 5.7: Full query matches

Input: The candidates set, k-partite intersection graph, number k (the number of answers
required).
Output: The top-k approximate answers set of query Q.
1: ApproximateAnswersSet ← { };
2: while (|ApproximateAnswersSet| < k) and (not empty CandidateSet)
3: ans ← { }; δ m ← 1;
4: q = maxCardinality(K-partiteIntersectionGraph);
5: cn ← CandidateSet.get(q);
6: p ← cn.dequeueTop( );
7: V ← {q};
8: ans ← ans ∪ {p};
9: δ m ← min(δ m , δ p );
10: BFS-visit (p, ans, CandidateSet, K-partiteIntersectionGraph, q, V )
11: ApproximateAnswersSet.put(ans, δ m );
12: return ApproximateAnswersSet;
Function BFS-visit (p, ans, CandidateSet, K-partiteIntersectionGraph, q, V )
1: foreach (q, q' ) ∈ K-partiteIntersectionGraph do
2: if q' ∈ / V then
3: cn ←CandidateSet.get(q' );
4: p' ← cn.dequeueTop( );
5: if (|L(p) ∩ L(p' )| == |L(q) ∩ L(q' )|)
6: ans ← ans ∪ {p' }; δ m = min (δ m , δ p' );
7: π[q' ] = q;
8: BFS-visit (p' , ans, CandidateSet, K-partiteIntersectionGraph, q' , V );
9: V ←V ∪ {q' };

5.3.2.3 Algorithm Complexity

We now analyze the complexity of each step of our algorithm. In the data prepro-
cessing, we need to construct a path indexing structure by traversing fuzzy RDF graph
G. For each vertex v, we exploit an optimized implementation of the breadth-first
search traversal from root vertex s to collect path information. Suppose the average
degree of s in G is d, it is straightforward to demonstrate that the time complexity of
the data preprocessing phase is O(|E| + |V| × d), where |E| is the number of relations,
|V| is the number of vertices in RDF graph G and d is the largest vertex degree.
The core procedure of the path decomposition step is a breadth-first search
processing in essence. The while-loop in the breadth-first search is executed at most
|V Q | times. The reason is that every vertex enqueued at most once. So, the complexity
is O(|V Q |). The for-loop inside the while-loop is executed at most |E Q | times since
5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph 183

Q is a directed graph. The reason is that every vertex dequeued at most once and we
examine (u, v) only when u is dequeued. Therefore, each edge is examined at most
once as directed. So, the complexity is O(|E Q |). Therefore, the total running time for
the sub-step is O(|V Q | + |E Q |), where |V Q | and |E Q | are the numbers of vertex and
edge of query graph Q, respectively.
The complexity of the procedure of finding path candidates step is |P| × O(D),
where |P| is the number of the query paths in set P and D is the number of paths
retrieved by the index that, in the worst case, is proportional to the size of data. That
is, we have to execute D insertions into CandidateSet for |P| times at most.
In full query matches step, the joint sub-step is most time-consuming. And it
iterates at most k times, where k is the number of the returned answers. In each
iteration, there is a call of the function BFS-visit, which explores the k-partition
intersection graph. In the worst case, it has a cost in O(h × D) since it checks h times
each data path in G, where h is the depth of K-partiteIntersectionGraph. Therefore
the complexity of this sub-step is O(k × h × D), since k times we call the function
BFS-visit to explore K-partiteIntersectionGraph, that is O(h × D).

5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph

Fuzzy queries to databases have been suitably used in several domains such as in
decision making support or linguistic summarization. In particular, fuzzy quantified
queries have proved useful in a relational database context for expressing different
types of imprecise information needs (Bosc et al., 1995). This work examines advan-
tages of fuzzy queries, which provide a better representation of the user requirements
by expressing imprecise conditions through linguistic terms. In this section, we intro-
duce fuzzy quantifiers (Zadeh, 1983) into fuzzy RDF database queries. Such quan-
tifiers can be used to express an intermediary attitude between conjunction (“all of
the criteria must be satisfied”) and disjunction (“at least one criterion must be satis-
fied”). They model linguistic expressions such as, “most of”, “about a third”, and are
notably used to construct fuzzy predicates (with quantifications).
Fuzzy quantified queries have received significant attention in the database
community for several decades. Bouchon-Meunier and Moyse (2012) proposed an
overview of linguistic summarization, presenting the main streams of a symbolic
representation and management of numerical data, which can be crisp or fuzzy. They
pointed out that fuzzy approaches bring solutions to the imprecision of quantifica-
tion and the use of subjective qualification of data. Delgado et al. (2014) presented
an overview of the existing approaches for evaluating and managing statements
involving quantification. In a graph database context, there have been some recent
proposals for incorporating quantified statements into user queries (see, (Bry et al.,
2010; Blau et al., 2002; Yager 2014; Pivert et al., 2016c). SPARQLog (Bry et al.,
2010) extended SPARQL with first-order logic (FO) rules, including existential and
universal quantification over vertex variables. QGRAPH (Blau et al., 2002) annotated
vertices and edges with a counting range (count 0 as negated edge) to specify the
184 5 Fuzzy RDF Queries

number of matches that must exist in a database. Yager (2014) briefly mentioned the
possibility of using fuzzy quantified structure queries in a social network database
context. He also suggested interpreting it using an OWA operator. However, the
author did not propose any formal language for expressing such queries. Pivert et al.
(2016c) considered a particular type of fuzzy quantified structural query in the general
context of fuzzy graph databases and showed how the fuzzy quantified structural
query could be expressed in FUDGE which is an extension of the CYPHER query
language. Castelltort and Laurent (2016) proposed an approach aimed to summarize
a (crisp) graph database by means of fuzzy quantified statements. They considered
a crisp interpretation of this concept and recall how the corresponding query can be
expressed in CYPHER. A limitation of this approach was that only the quantifier was
fuzzy. More recently, Fan et al. (2016) introduced quantified graph patterns (QGPs),
an extension of the classical graph patterns using simple counting quantifiers on
edges. The authors also showed that quantified matching in the absence of negation
did not significantly increase the cost of query processing. However, quantified graph
patterns could only express numeric and ratio aggregates, and negation besides exis-
tential and universal quantification. They did not consider fuzzy quantified patterns
matching in the fuzzy RDF graph database.
In the following, we intend to integrate linguistic quantifier in a subgraph patterns
addressed to a fuzzy RDF graph database and use graph pattern matching approach
to evaluate fuzzy quantified query. In a fuzzy RDF graph database context, fuzzy
quantified queries have an even higher potential since they can exploit the structure
of the RDF graph, beside the label values attached to the vertices or edges. In the
present section, we define the syntax and semantics of an extension of the query
pattern graph that makes it possible to express and interpret. In addition, in order to
answer subgraph pattern queries efficiently over fuzzy RDF data graph, we present
a novel approach for evaluating fuzzy quantified graph pattern.

5.4.1 Linguistic Quantifier and Fuzzy Quantified Statement

Linguistic summaries have been studied for many years and allow to sum up large
volumes of data in a very intuitive manner. They have been studied over several types
of data. However, few works have been led on graph databases. In this section, we
recall important notions about linguistic quantifiers and fuzzy quantified statement.
Linguistic quantifiers modelled by means of fuzzy sets is then proposed for modelling
the so-called fuzzy quantified statements.
1. Linguistic quantifier
The notion of fuzzy or linguistic quantifier (Zadeh, 1983) describe an inter-
mediate attitude between the universal quantifier ∀ and the existential quanti-
fier ∃. Depending if quantifiers represent imprecise quantities or proportions of
quantifiers are classified into absolute or relative quantifiers respectively.
5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph 185

(i) Absolute quantifiers express quantity over the total number of elements of
a particular set, stating whether this number is, for example, “much more
than 10”, “around 5”, “a great number of”, and so forth.
(ii) Relative quantifiers express measurements over the total number of
elements, which fulfill a certain condition being dependent on the total
number of possible elements (the proportion of elements). This type of
quantifier is used in expressions such as “most”, “little of”, “at least half
of”, and so forth.
Consequently, the truth of the relative quantifier depends on two quantities. In
this case, in order to evaluate the truth of the quantifier, we need to find the total
number of elements fulfilling the condition and to consider this value with respect to
the total number of elements that could fulfill it (including those that do fulfill it and
those that do not). Essentially, linguistic quantifiers are fuzzy proportions or fuzzy
probabilities.

Definition 5.14 (Linguistic quantifier). A linguistic quantifier named Q is defined

by a fuzzy set with a membership function μQ whose domain depends on whether it
is absolute or relative:

Q abs : R → [0, 1]
Q r el : [0, 1] → [0, 1]

where the domain of Qrel is [0, 1] because the division a/b ∈ [0, 1], where a is
the number of elements fulfilling a certain condition and b is the total number of
existing elements. The value μQ (x) expresses the extent to which proportion x (resp.
the cardinality x) agrees with the quantifier. Therefore, linguistic quantifiers can be
considered as fuzzy conditions which are defined on cardinalities or proportions.

Example 5.5 “Around 7” is an absolute fuzzy quantifier, defined as a triangular and

symmetrical function (Fig. 5.6a) with m = 7 and margin = 6, for example. “Most” is
a relative fuzzy quantifier, defined as shown in Fig. 5.6b. According to this linguistic
quantifier, a proportion less than 25% cannot be considered in agreement with “most”
(since μmost (p) is 0 for p ≤ 0.25). For a proportion between 25 and 100%, the closer
to 100% the proportion, the more it agrees with “most”.

2. Fuzzy Quantified Statements and Interpretation

Quantified statements are a meta description of the information in the database
and can be used to express relational knowledge about the data. Areas of appli-
cability of linguistic quantified statements are very wide. For example, in infor-
mation retrieval quantified statements are used to model natural language state-
ments for querying a database in a more flexible manner. There are two types of
quantified statements (Delgado et al., 2014):
Type I: “Q of X are f 1 ”
Type II: “Q of f 2 X are f 1 ”
186 5 Fuzzy RDF Queries

Fig. 5.6 Graphical representation of fuzzy quantifier

where Q is a linguistic quantifier, X is a finite crisp set, f 1 and f 2 are fuzzy

conditions of linguistic concepts, defined over the domain of X including fuzzy
predicates, fuzzy operators and conjunctions ∧, ∨ and so on.

A quantified statement of the Type I means that, among the elements of set X,
a quantity Q satisfies the fuzzy predicate f 1 . Such a statement can be more or less
true and many approaches can be used to interpret the quantified statement. Note
that Type II generalizes the case of Type I by considering that the set to which the
quantifier applies is itself fuzzy. An example of Type I statement is “Most of the
students are young” and of Type II is “Most of the good students are young”, where
X is a finite set of students, the quantifier is “most”, f 1 is the property “young” and
f 2 represents the property “good”.
Also associated with a linguistic quantified statement is a truth value [0,1], called
satisfactory degree of the statement. The process of calculating the satisfactory degree
of a quantified statement is usually known as an evaluation method. The problem is
to find truth value μ(Q of X are f 1 ) or μ(Q of f 2 X are f 1 ), respectively, knowing
that truth (x is f 1 ) ∀x ∈ X, which is done using Zadeh’s calculus of linguistically
quantified propositions (Zadeh, 1983). Some other quality criteria see the literature
(Delgado et al., 2014).

5.4.2 Fuzzy Quantified Graph Patterns Matching

The problem to be addressed in this section is to find the answers to a fuzzy quan-
tified statement over a fuzzy RDF graph G. The key challenge in this problem is
how to represent the query intention of the fuzzy quantified statement in a structural
way. The underlying RDF repository is a graph structured data, but, the fuzzy quan-
tified statements is unstructured data. To enable query processing, we need a graph
representation of fuzzy quantified statement.
5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph 187

1. Fuzzy Quantified Graph Patterns

Because fuzzy quantified statement is unstructured data and G is graph structure
data, we should fill the gap between two kinds of data. The graph pattern provides
a simple yet intuitive specification of the structural and semantic requirements
of interest in the input graph. Therefore, we propose a fuzzy Quantified Graph
Patterns (f -QGPs), by extending conventional graph patterns to express fuzzy
quantified matching conditions. In a general setting, the fuzzy quantified state-
ments considered are of the form “Q of f 2 X are f 1 ” over fuzzy RDF graph
databases, where Q is a quantifier (relative or absolute) represented by a fuzzy
set, f 2 is the fuzzy condition “to be connected (according to the pattern P(x)) to a
vertex x 0 ”, X is the set of vertices in the graph, and f 1 denotes a fuzzy (possibly
compound) condition. Thus, graph pattern bridges the gap between user’s un-
structured query intention of a fuzzy quantified statements and the structured
fuzzy RDF data G.

Definition 5.15 (Fuzzy quantified graph pattern). A fuzzy quantified graph pattern
is a labeled directed graph defined as Q(x 0 ) = (V q , E q , L q , F q ), where
(i) V q and E q are the set of pattern vertices and the set of directed pattern edges,
respectively, as defined for data graphs.
(ii) x 0 is a vertex in V q , referred to as the query focus of Q(x 0 ), for search intent
(Bendersky et al., 2010).
(iii) L q is a function that assigns a vertex label L q (v) (resp. edge label L q (e)) to each
pattern vertex v ∈ V q (resp. edge e ∈ E q ). The label can be variable, constant,
or condition. The predicates in the condition C can be defined as a combination
of atomic formulas of the form “?x op c”, “?x op ?y” and “?x is Fterm”, where
?x, ?y ∈ variable, c ∈ (U ∪ L), op is a fuzzy or crisp comparator, and Fterm
is a predefined or user-defined fuzzy term like young (see Fig. 5.7b). One can
extend fuzzy condition to support fuzzy conjunction ∧ (resp. disjunction ∨),
usually interpreted by the triangular norm minimum (resp. maximum).
(iv) F q is a function such that for a given triple pattern tp = <x 0 , p, o> ∈ Q(x 0 ), F q (tp)
is defined as the form Quant(p), here Quant is a linguistic quantifier and p is the
predicate of the triple pattern. We refer to Quant(p) as the quantifier of triple
pattern. We used this mechanism that makes it possible to attach linguistic
quantifier to triple.

Example 5.6 An example of fuzzy quantified statements is: “Most of the recent
films that actor x starred in, are directed by young directors”. The query, denoted
by Q(?actor), that aims to retrieve every actor (?actor) such that most of the recent
films (?film) that he/she starring in are directed by young directors (?director) may be
expressed in fuzzy quantified graph pattern shown in Fig. 5.8, where ?actor is its query
188 5 Fuzzy RDF Queries

Fig. 5.7 Graphical representation of fuzzy linguistic term

Fig. 5.8 Fuzzy quantified graph pattern

focus, indicating potential actors, i.e., variables ?actor should be returned in the result
set; ?film and ?d are two variables, “?y is recent” and “?a is young” are fuzzy condition
expressions. Here edge Starring(?actor, ?film) carries a linguistic quantifier “most”,
for condition (d) above. In this query, ?actor corresponds to x 0 , ?film corresponds to
X, sub-pattern f 1 (<?film, Director, ?d>, <?d, Age, ?a>, FILTER(?a is young)) corre-
spond to f 1 and f 2 (<?actor, Starring, ?film>, <?film, Date, ?y>, FILTER(?y is recent))
correspond to f 2 , respectively.

2. Semantics of Fuzzy Quantified Graph Patterns

As mentioned earlier, a query graph pattern Q specifies the structural and content-
based constraints chosen by the user. In order to answer Q, we need to find a
subgraph (in RDF graph G) that matches Q. More formally, we can introduce
the notion of fuzzy quantified graph pattern matching over fuzzy RDF graph,
which generalizes result subgraph isomorphism with evaluation of the RDF graph
pattern (formally defined in Definition 5.15).

Definition 5.16 (Fuzzy quantified graph pattern matching). A fuzzy quantified graph
pattern Q(x 0 ) = (V q , E q , L q , F q ) is matching with a fuzzy RDF graph G = (V, E, Σ,
L, μ, ρ) with a satisfaction degree δ th (G), if there exists a bijective function φ from
U ∪ L ∪ variable to U ∪ L, such that

(i) For each vertex u ∈ V q , there exists a vertex φ(u) ∈ V, associated with a
satisfaction degree δ u = μ(φ(u)), such that L q (u) = L(φ(u)). More precisely, if
5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph 189

u is a constant vertex (L q (u) ∈ U ∪ L), then φ(u) is a matching vertex associated

with a satisfaction degree δ u = μ(φ(u)), and L q (u) = L(φ(u) (i.e., the mapping
maps constants to themselves)); if u is a variable vertex (L q (u) ∈ variable),
then φ(u) ∈ (U ∪ L) is the set of matches that satisfy the fuzzy condition of
the variable vertex u (if it exists), i.e., vertices φ(u) induced by all matches φ
of Q in G, in which each item is associated with a satisfaction degree δ u =
min (μ(φ(u)), δ co ). Here δ co is the satisfaction degree of the fuzzy condition C
on vertex φ(u). δ co can be defined as follows, according to the form of fuzzy
condition C:
• if C is of the form “?x op c”, then φ(?x) satisfies the condition C with
a degree of δ co = μop (φ(?x), c). Here μop is membership function of the
fuzzy or crisp comparator. In particular, crisp comparison operators have a
Boolean semantics, If the condition is evaluated to true, then the satisfaction
degree is 1, otherwise 0.
• if C is of the form “?x op ?y”, then φ(?x) and φ(?y) satisfy the condition C
with a degree of δ co = μop (φ(?x), φ(?y)).
• if C is of the form “?x is Fterm”, then φ(?x) satisfies the condition C to the
degree δ co = μFterm (φ(?x)). Here μFterm is fuzzy membership function of
the fuzzy term Fterm.
• if C is of the form C 1 ∧ C 2 or C 1 ∨ C 2 , we use the usual interpretation of
the fuzzy operator involved (minimum for the conjunction, maximum for
the disjunction).
(ii) For each edge (u, u' ) ∈ E q , there is an edge (φ(u), φ(u' )) in G' , associated with
a satisfaction degree δ e = ρ(φ(u), φ(u' )), such that L q (u, u' ) = L((φ(u), φ(u' )).
This condition imposes that (a) each edge of Q is mapped to an edge of G, and
(b) the structure of Q is preserved in its image under φ in G (that is, when φ is
applied to all the terms in Q, the result is a subgraph of G).
(iii) The final satisfaction degree δ q (G) associated with each answer is the aggrega-
tion of the satisfaction degrees produced by the matching and conditions from
(i) and (ii). We will describe in more detail in the next section.

3. Evaluation of Fuzzy Quantified Query

Given Q(x 0 ) and G, quantified graph pattern matching is to compute the set of
matches, denoted by Q(x 0 , G). Query answer is the set of all matches of x 0
in Q(x 0 , G). All matches should meet the constraints of elements (vertices and
edges), structure as well as the matching satisfaction degree. Among them, the
core operation of quantified graph pattern matching is computing the satisfaction
degree of the result to a query matching. In general, the satisfaction degree of a
match can be computed by aggregating the relative possibilities in the match.
Formally, a linguistic quantified statement (Type I) is a statement of the form “Q of
X are f 1 ”, where Q belongs to a class of concepts which Zadeh (Yager, 1993; Zadeh,
1983) called linguistic quantifiers, X is a data collection, and f 1 is a fuzzy expression
of a linguistic concept. From the perspective of a pattern graph, Q is a quantifier
190 5 Fuzzy RDF Queries

(relative or absolute) on edge (x, x 0 ) represented by a fuzzy set; X designates the

fuzzy set of vertices connected, according to the pattern edge (x, x 0 ), to the vertex vx
in the graph; and f 1 is represented also by a fuzzy set and denotes fuzzy condition,
defined over the domain of some attribute associated with the graph. The semantics of
matching is “Q vertices represented by a fuzzy set X, that are connected according to
a pattern edge e to a vertex x 0 , satisfy fuzzy condition f 1 ”. Calculating the satisfaction
degree raises the problem of determining the cardinality of the set of elements from
X which satisfy f 1 . Since f 1 is fuzzy predicate, this cardinality cannot be established
precisely, and computing the quantification corresponds to establishing the value of
function μQ for an imprecise argument.
For each element that matching the query focus of Q(x 0 ), the procedure for
obtaining the satisfaction degree of a linguistic quantified graph pattern in the face
of the graph G is as follows:
(i) For each x i ∈ X, calculate μ f1i (xi ), the satisfaction degree to which x i satisfies
the condition f 1 .μ f1i (xi ) is obtained by aggregating (e.g., the minimum) all the
fuzzy degrees associated with vertices, edges and fuzzy conditions of subgraph
corresponding to condition f 1 . This step aims to determine the cardinality of
the fuzzy set X and calculate the satisfaction degrees μ f1 according to fuzzy
conditions f 1 . Its result is a set of elements {(μ f11 /x1 ), . . . , (μ f1n /xn )}, where
μ f1i is the Σ
satisfaction degree associated with the element x i .
(ii) Let r = n1 xi ∈X μ f1i (xi ), the proportion of objects in the X that satisfy f 1 .
(iii) Finally, the final satisfaction degree of r is calculated as follows:
⎛ ⎞
1 Σ
μ(A) = μ Q ⎝ μ f (xi )⎠ (5.1)
n x ∈X 1i
i

Another type of linguistic quantified statement is of the form “Q of f 2 X are f 1 ”.

In this case we are only saying something about a subpopulation of the data (satisfy
f 2 ). The procedure for calculating satisfaction degree is the same as above except in
step two. In this case, calculate μ f1i (xi ), we calculate r as the proposition of the f 2
objects in X that satisfy f 1
Σ
(μ f1i (xi ) ∧ μ f2i (xi ))
xi ∈X
r= Σ
xi ∈X μ f 2i (x i )

where “∧” denotes the minimum operator and μ f2 is the satisfaction degree to which
x i satisfies the condition f 2 . μ f2 is obtained similar to μ f1 , which aggregates all the
satisfaction degrees associated with elements corresponded to f 2 and its result is a set
of elements {(μ f21 /x1 ), . . . , (μ f2n /xn )}. Then, the final satisfaction degree associated
with each answer A can be calculated as follows:
(Σ )
xi ∈X (μ f 1i (x i ) ∧ μ f 2i (x i ))
μ( A) = μ Q Σ (5.2)
xi ∈X μ f 2i (x i )
5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph 191

Fig. 5.9 An excerpt of IMDB’s RDF knowledge graph

Note that the basic validity criterion, i.e., the truth of (5.1) and (5.2), is certainly
the most important, but it does not grasp all aspects of a linguistic statement.

Example 5.7 Let us consider the fuzzy quantified graph pattern Q(?actor) of
Example 5.6. We evaluate this matching query according to the graph G of Fig. 5.9.
To interpret Q(?actor), we first retrieve “the actors (?actor) who starring at least one
recent film (?film) (corresponds to f 2 ), possibly directed by young director (corre-
sponds to f 1 )”. This query returns a list of mappings of actor variables (?actor) with
their starring films (?film), along with their respective satisfaction degrees.
( )
μ f2 = min ρ Starring ( ?actor, ?film), ρ Date (?film, ?y), μr ecent (?y) and
( )
μ f1 = min ρ Driector (?film, ?d), ρ Age (?d, ?a), μ young (?a)

where μrecent and μyoung are membership function associated with the fuzzy terms
young and recent of Fig. 5.6a and b.

In this example, query concerns two actors Vin. Diesel and Chris Partt. More
specifically, for pattern edge e = Starring(x 0 , ?film), when x 0 is mapped to Vin.
Diesel, who starred in 3 films including Fast & Furious, Guardian of the Galaxy 2
and Riddick 3. Similarly, when x 0 is mapped to Chris Partt, he starred in the film
Guardian of the Galaxy 2. In contrast, Jason Statham do not belong to the result set
because he did not starring any somewhat recent films. The result set is as following:

{(?actor → Vin. Diesel, ?film → Fast & Furious, μ f 1 → 0.25, μ f 2 → 0.5),

(?actor → Vin. Diesel, ?film → Guardian of the Galaxy 2, μ f 1 → 0.13, μ f 2 → 0.5),
(?actor → Vin. Diesel, ?film → Riddick 3, μ f 1 → 0, μ f 2 → 0.34),
(?actor → Chris Partt, ?film → Guardian of the Galaxy 2, μ f 1 → 0.13, μ f 2 → 0.5)}.
192 5 Fuzzy RDF Queries

Lastly, assuming for the sake of simplicity that most(x) = x, the final matching
result, given by Formula (5.2), is Q(?actor, G) = {0.28/Vin. Diesel, 0.26/Chris Partt}.

5.4.3 Fuzzy Quantified Graph Patterns Matching

In this section, we discuss implementation issues related to our proposal. We firstly

build an indexing structure that containing information of fuzzy RDF data graph.
Then, we introduce a general algorithm for quantified graph patterns matching over
a fuzzy RDF graph. In particular, we need to produce answers with satisfaction
degree.

5.4.3.1 Data Preprocessing

The graph traverse algorithm is very time-consuming, since it is made in every user
interaction. Thus, we need to build an indexing structure that containing information
about vertices and edges in fuzzy RDF data graph. The graph indexing is executed
only one time, independently of the user interaction. For subgraph isomorphism
algorithm, we tune the disk representation of a data graph to support fast retrieval
and construction of its main memory data structures.
Our method represents a fuzzy RDF graph using three structures: (i) vertex label
list that allows access to the destination vertex label of a vertex and its corresponding
membership degree by a given ID. And we implemented ad-hoc data structures which
offer appropriate access to membership degree in vertex label list (see Fig. 5.10a).
In order to increase efficiency of query processing, we build a B+ tree for storing all
distinct vertex labels along with their frequencies; (ii) inverse vertex label list that
allows access to the vertex ID list by a given vertex label (see Fig. 5.10b). Note that
we implement the list of inverse vertex label in the RDF graph database to speed
up, although it can be constructed from the list of vertex labels; and (iii) adjacency
lists (see Fig. 5.10c) of each vertex which store adjacency information, i.e., a list of
triples (vertex ID, edge label, edge membership degree) ordered by the vertex ID.

5.4.3.2 Implementation Algorithms

Quantified graph pattern matching is essentially finding an isomorphism from a

query graph Q to a fuzzy RDF data graph G. A feasible method for evaluating
quantified graph patterns, i.e., enumerating all RDF isomorphism from the given
quantified graph patterns into the data graph, is based on a backtracking strategy
which incrementally finds partial solutions by adding joinable candidate vertices or
abandoning them when it determines they cannot be completed. In the following,
we provide an algorithm, denoted by QM, for quantified graph pattern matching. It
5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph 193

Fig. 5.10 Data structures of the fuzzy RDF graph

extends existing the algorithms in (Bendersky et al., 2010) for conventional subgraph
isomorphism, to incorporate quantifier checking and calculate satisfaction degree.
Before we describe the detailed workings of the algorithm, a few notational defini-
tions are in order: (i) Induced pattern of Q(x 0 ), denoted by Qπ (x 0 ), is a conventional
pattern graph, which is obtained by striping quantifier F q (tp) off from a quantified
graph pattern Q(x 0 ). (ii) M e (vx , v, Q) = {v' |φ ∈ Qπ (G), φ(x 0 ) = vx , φ(e) = (v, v' )},
the set of children of v via e and Q, i.e., the set of children of v that match u' when u
is mapped to v, subject to the constraints of Qπ , here e = (u, u' ) ∈ Q(x 0 ), vx , v ∈ G.
(iii) M e (v) = {v' |(v, v' ) ∈ G, L(v, v' ) = L q (e)}, the set of the children of v connected
by an e edge. We denote by Q(G) the set of all matches (isomorphic mappings) φ of
Q in G.
1. Pattern-Match Algorithm
Given a quantified graph pattern Q and a fuzzy RDF graph G, Algorithm QM is
to retrieve all entities that possibly correspond to x 0 , denoted as Q(x 0 , G). Each
item in Q(x 0 , G) is associated with a satisfaction degree. We will briefly present
each step as following:
(i) QM first initializes Q(x 0 , G), as well as a partial match M as a set of vertex
pairs (line 1). Each pair (u, v) in M denotes that a vertex from G matches
a pattern vertex u in Q.
(ii) Each vertex u in Q(x 0 , G) has a list C(u) of candidate vertices in RDF
graph G. QM next initializes the candidate set C(u) and auxiliary structures
with Filtercandidate (lines 2–4). QM maintains auxiliary structures for each
vertex v in C(u) as follows: (i) a Boolean variable B(u, v) indicating whether
v is a match of u via isomorphism from Q(x 0 ) to G, (ii) a variable δ(u, v) that
194 5 Fuzzy RDF Queries

represents the satisfaction degree of pattern vertex v matching u, and (iii)

a vector R, where entry R(v, e) for an edge e = (u, u' ) in Q is a pair <c(v, e),
U(v, e)>, in which c (resp. U, initialized as M e (v)) records the current size
(resp. an estimate upper bound for) |M e (vx , v, Q)|. For each pattern vertex u
in Q(x 0 ), it initializes (i) C(u) with vertices v satisfying the condition (i.e.,
C(u) ⊆ V such that L(u) ⊆ L(v)) by the rule in Definition 5, and (ii) B(u,
v) = ⊥ (let ⊥ be a symbol to indicate that the variable has no value at this
time), c(v, e) = 0 and U(v, e) = |M e (v)| for each e = (u, u' ) in Q. It removes
v from C(u) if U(v, e) does not satisfy the quantifier of e. If C(u) is empty,
we can safely exit, making early termination possible (Line 5).
(iii) After that, QM invokes a recursive procedure, SubMatch, to find mapping
pairs of a query vertex and matching data vertices at a time (line 6). QM
computes and returns query answer Q(x 0 , G) from mappings φ ∈ Q(G) by
definition (lines 7–8).

Algorithm 5.8: QM
Input: pattern Q(x 0 ), graph G
Output: the answer set Q(x 0 , G).
1: Q(x 0 , G) ← { }; Q(G) ← { }; M ← { };
2: for each u of Q do
3: C(u) ← Filtercandidate(Q, G, u);
4: B(u, v) ← ⊥, δ(u, v) ← 0, c(v, e) ← 0, U(v, e) ← |M e (v)|;
5: if C(u) = φ; then return φ;
6: SubMatch(Q, G, M, Q(G));
7: for each isomorphic mapping φ ∈ Q(G) do
8: Q(x 0 , G) ← Q(x 0 , G) ∪ {φ(x 0 )};
9: return Q(x 0 , G);

We next present function SubMatch, which takes as parameters a query graph Q,

a data graph G, and a partial match M and returns all matches of Q in G by adopting
dynamic search.
(i) In each recursive call of SubMatch, once the size of M equals to the number of
query vertices, i.e., each vertex u in Q has a match in M, it checks whether M
is a valid isomorphism (line 1). If so, a solution is found and it updates B(u, v)
= true for each pair (u, v) ∈ M, and increases the counter c(v, e). It then checks
whether the counters satisfy the quantifiers of Q. If so, it adds vx to Q(x 0 , G) (line
2). Otherwise, the algorithm calls NextQueryVertex to select a query vertex u ∈
Vq which is not yet matched (Line 3). NextQueryVertex returns the next query
vertex to match according to the query vertex matching order.
5.4 Fuzzy Quantified Query Over Fuzzy RDF Graph 195

Function SubMatch(Q, G, M, Q(G))

1. if Verify(M) then
2. Q(x 0 , G) ← Q(x 0 , G) ∪ {φ};
3. else u ← NextQueryVertex (Q);
4. for each v ∈ C(u) not matched in M do
5. if IsExtend(Q, G, M, u, v) then
6. M ← M ∪ {(u, v)};
7. SubMatch(Q, G, M, Q(G));
8. RestoreState (M, u, v);
9. return;

(ii) After that, it calls RefineCandidates to obtain a refined candidate vertex set CR
from C(u) by using algorithm-specific pruning rules (Fan et al., 2016).
(iii) Next, for each candidate data vertex v such that v is not matched yet, the
IsExtend subroutine checks whether the edges between u and already matched
query vertices of Q have corresponding edges between v and already matched
data vertices of G (Line 5). IsExtend is the final verification to determine
whether the candidate vertex can be added to the partial solution. Given a
selected pattern vertex u', a candidate v ∈ C(u), and an edge e = (u, u' ) with
quantifier, IsExtend dynamically finds best vertices (recorded in a heap S P (u' ))
from C(u' ) that are children of v (lines 4–5). If v is qualified, it is matched
to u, SubMatch then updates status information by adding the newly matched
pair (u, v) into M (Line 6), and recursively conducts the next level of search by
calling SubMatch to match the remaining query vertices of q (Line 7). It keeps
a record of M and a cursor to memorize the candidates in S P for backtracking,
using a stack. When backtracking to a candidate v ∈ S P (u) from a child v' of
v, SubMatch restores M and the cursor by calling RestoreState, which restores
the partial match state by removing (u, v) from M (line 8). It next dynamically
updates S P (u). (a) If B(u' , v' ) = false, it reduces U(v, e) by 1. (b) It applies
the selection and pruning rules to C(u) using the updated potentials w.r.t. the
changes in (a). If the upper bound U(v, e) fails the quantifier of e, v is removed
from C(u) and S P (u) without further verifying its other children. Otherwise, it
picks a new set S P (u) of candidates with top potentials. The recursion terminates
when all possible matches are found (i.e., when |M| = |V q |).

2. Selection and Pruning Rules

The performance of DM can be further improved by selection and pruning rules.
Fan et al. (2016) show that the number of verifications in algorithm QM can be
further reduced, by applying the following selection and pruning rules.
(i) The pruning rules are applied each time when SubMatch backtracks from
a candidate v' ∈ S P (u' ) to a match v ∈ S P (u) for an edge e = (u, u' ).
• local pruning rule: If the dynamically maintained upper bound U(vx , e) of
each vx ∈ C(x 0 ) fails the quantifier of e (because of a verified non-match
v' as its child), then removes v from C(u) and S P (u).
196 5 Fuzzy RDF Queries

• global pruning rule: monitors the size of current candidates C(u' ),

excluding those vertices verified to be nonmatches (i.e., B(u' , u) = false).
(ii) The selection rule is applied each time when SubMatch extends the search
to a next level, at a match v ∈ S P (u) for edge e = (u, u' ). Denote as P(u' ) the
parent set of u' in G, the potential of a match v' ∈ C(u' ) is
( ) Σ U (v ' , e)
|P(v ' ) ∩ C(u)|
1+ ∗ .
|C(u)| ∀e=(u ' ,u '' )
pe

The rule picks candidates with highest potential scores.

In this section, we focus on quantified matching algorithm by extending graph
isomorphism for answering such queries over fuzzy RDF graphs. The approach is
based on a backtracking technique that unifies conventional pattern matching and
quantifier verification in a generic search process. In addition, we further optimize
the algorithm by applying the selection and pruning techniques.

5.4.3.3 Correctness and Complexity

Proposition 5.3 Algorithm QM is correct and complete for enumerating all RDF
isomorphism from a given pattern graph into a fuzzy RDF graph.
Proof To show the correctness of DM, first observe that DM always terminates.
Indeed, DM follows the verification process of conventional subgraph isomorphism
algorithm. The process, in the worst case, enumerates all possible isomorphism
mappings from the stratified pattern Qπ to G, which are finitely many. Hence DM
terminates.
We next show that DM correctly verifies whether a candidate vx is a match of x 0
in Qπ via an isomorphism φ ∈ Qπ (G).
(i) When DM terminates, for each u ∈ Q and every candidate v in C(u) with B(u, v)
= true, v = φ(u) for some φ ' ∈ Qπ (G), guaranteed by the correctness of Match.
(ii) For each edge (u, u' ) in Q and a vertex v with B(u, v) = true, DM correctly verifies
the quantifiers by checking the updated local counter of v that keeps track of the
current |M e (φ(x 0 ), φ(u),Q)|. In addition, DM waits until either v is determined
not a valid match due to that the upper bound fails the quantifier (by the local
pruning rule), or the lower bound satisfies the quantifier (in the verification).
Hence, vx is a match if and only if vx ∈ Q(x 0 , G) when DM terminates.
Algorithm QM correctly computes Q(x 0 , G) following the definition of quantified
matching. We further analytics its time and space complexity.
For its time complexity, QM is a fuzzy quantified subgraph patterns matching
process, which includes subgraph search process and evaluation of fuzzy quantifier.
Subgraph search process is an extension of the traditional subgraph isomorphism. It
has the same time complexity as conventional subgraph pattern matching algorithms,
5.5 Extended SPARQL for Fuzzy RDF Query 197

and fuzzy conditions and linguistic quantifier checking are incorporated into the
search process.
Let us consider the evaluation of a fuzzy quantified subgraph patterns matching,
which includes z occurrences of fuzzy terms, over a graph database G. We denote
by A the set of answers of Q over G. Computing A is a conventional subgraph
pattern isomorphism problem, which is intensively studied in literature. In our case,
we assume that the time complexity of the graph matching is t(T ). Computing the
satisfaction degree of each fuzzy conditions. This is done in O(|A| × 2 × |z|). Put
together, QM takes O(t(T ) + |A| × 2|z|) time in total, where t(T ) is the time complexity
of a matching algorithm T for conventional subgraph isomorphism. It seems obvious
that this way that permit to introduce flexibility in the graph pattern are strongly
dominated in complexity by the subgraph evaluation. Due to small |A| and |z|, QM
and T have comparable performance, i.e., QM has the same time complexity as
conventional subgraph pattern matching algorithms.
For the space complexity, QM needs O(|V|) space to store the auxiliary structures
of the vertices in G. During the search process, QM maintains at most pm best matches
to be verified at each level of the search, where pm is the largest constant in quantifiers.
Due to a total |Q| search levels, QM requires a total of O(pm |Q| + |V|) space.

5.5 Extended SPARQL for Fuzzy RDF Query

With the increasing amount of fuzzy RDF data which is becoming available, the way
we query fuzzy RDF data is a crucial subject for supporting knowledge graph appli-
cations in various domains (Pivert et al., 2016a). SPARQL (Prudhommeaux, 2008),
the official W3C recommendation as an RDF query language, provides basic func-
tionalities in order to query RDF data through graph patterns. But classical SPARQL
lacks of some expressiveness and usability capabilities to deal with vagueness and
imprecise aspects as it follows a Boolean querying of crisp RDF data. As a result, the
need to query about the structure and vagueness information in fuzzy RDF knowl-
edge graph applications, has motivated research into extending SPARQL languages
to be more expressive than before.
Some works (Alkhateeb et al., 2009; Anyanwu et al., 2007; Kochut & Janik,
2007; Pérez et al., 2010) extend SPARQL by allowing to query crisp RDF through
graph patterns using regular expressions but they do not address the fuzziness in
their approaches. In order to make the expression of flexible queries, a variety of
proposals, such as f-SPARQL (Cheng et al., 2010) and SPARQLf (Ma et al., 2015),
introduce fuzzy terms and fuzzy operators into FILTER expression of SPARQL
queries. However, these works only considers crisp RDF graph. As far as fuzzy RDF
graphs are concerned, some extended SPARQL query languages already exist. Angles
and Gutierrez (2016) propose FURQL (Fuzzy RDF Query Language), a SPARQL
extension with navigational capabilities for querying fuzzy RDF data through fuzzy
graph path patterns by using regular expressions. Fuzzy conditions can be also used
to express fuzzy preferences on data. Almendros-Jiménez et al. (2017) propose
198 5 Fuzzy RDF Queries

FSA-SPARQL (Fuzzy Sets and Aggregators based SPARQL), which is focused on

extending SPARQL with fuzzy aggregators, in such a way that the user can play
with them to express preferences. However, the proposed fuzzy version of RDF in
these works is able to express vague and imprecise knowledge about membership
relationships. In other words, these works can only address the query problems of
triple-level fuzzy RDF graph. In the element-level fuzzy RDF graph, there is a need
to get the degree of a vertex and use it in a query. This motivates us to investigate
fuzzy query techniques suitable for element-level fuzzy RDF knowledge graph data.
In this section, we present an extended querying language (called ef -SPARQL:
element-level fuzzy SPARQL) for element-level fuzzy RDF graph with support of the
full SPARQL fragment. It possible to express regular expression patterns instead of
predicates in the RDF triple, which allows to express preferences on data through
fuzzy conditions and on the structure of the data graph through fuzzy regular expres-
sion. The input pattern follows the form of our proposed SPARQL extension is
different with the work (Li et al., 2019). The main differences are as follows: Firstly,
a truth degree can be associated to triple patterns. Secondly, a truth degree can be
associated to a given variable in the pattern of my query. In other word, we can get
the degree of a vertex and use it in a query. We carefully defined the syntax and
semantic of an extension of the query pattern graph that makes it possible to express
and interpret such queries. Finally, we present a case study to explain our proposed
methods.

5.5.1 The Fuzzy Query Language

In this section we are ready to extend SPARQL for querying fuzzy RDF knowledge
graph. We first introduce the graph pattern to enrich standard SPARQL graph pattern
with regular expressions and fuzzy conditions. Then we define the evaluation of the
query pattern in the proposed fuzzy RDF graph.
1. Fuzzy SPARQL Graph Pattern
Before giving the formal definition of fuzzy graph pattern, we first introduce the
concepts of fuzzy regular expressions and fuzzy conditions.
A regular expression is a property path, as specified in SPARQL 1.1 (Harris &
Seaborne, 2013). Let Rex be a path regular expression, and it can be constructed
inductively as Rex = u|R1 · R2 |R1 |R2 |R+ . Here u denotes either an edge label or
a wildcard symbol * matching any label in U, R1 · R2 denotes a concatenation
of expressions, R1 |R2 denotes disjunction and is an alternative of expressions, R+
denotes one or more occurrences of R.
A fuzzy condition is a logical combination of fuzzy terms which can be a constant
c, a variable ?x, or a fuzzy condition C defined as the form “bound(?x)”, “truth(?x)”,
“?x op c”, “?x op ?y” and “?x = Ft”. Here ?x, ?y ∈ VAR, c ∈ (U ∪ L), truth(?x) is
the truth degree of the variable ?x, op is a fuzzy or crisp comparator (e.g., <, ≤, =
, ≥, >, /=), and Ft is a predefined or user-defined fuzzy term like high, long, young
5.5 Extended SPARQL for Fuzzy RDF Query 199

and so on. If C 1 and C 2 are fuzzy condition, then ¬C 1 , C 1 ∧ C 2 , and C 1 ∨ C 2 are

fuzzy conditions. When C 1 and/or C 2 are fuzzy atomic value constraints, the ¬, ∧
and ∨ need equally to be extended to fuzzy logic. To simplify the discussion, we
focus on fuzzy conditions in the simple form given above. Many other fuzzy terms,
fuzzy operators and fuzzy relationship that may be used for expressing different
kinds of fuzzy conditions and user preferences, are described in Pivert et al. (2016b),
Almendros-Jiménez et al. (2017), Ma et al. (2015).
We introduce the notion of fuzzy SPARQL graph pattern, which is a fuzzy exten-
sion of the graph pattern notion introduced in Hayes (2004), Pivert et al. (2016b). A
fuzzy graph pattern allows to express fuzzy preferences on the vertices and edges of
a fuzzy RDF graph (through fuzzy conditions) and on the edge (or path) structure of
the graph (through fuzzy regular expressions).

Definition 5.17 (Fuzzy SPARQL Graph Pattern). Let α, τ 1 , τ 2 be fuzzy degree

variables, β a fuzzy degree value and Rex a path regular expression, then a fuzzy
SPARQL graph pattern expression is defined recursively as follows:
• A triple pattern t = <s' : τ 1 , p' , o' : τ 2 > associated to a truth degree variable α ∈ [0,
1] is a fuzzy SPARQL graph pattern, written as <t>: α, where s' ∈ {s, ?s}, p' ∈
{Rex, ?p} and o' ∈ {o, ?o}.
• If P1 and P2 are fuzzy SPARQL graph patterns, then expressions (P1 AND P2 ),
(P1 OPTIONAL P2 ), and (P1 UNION P2 ) are fuzzy SPARQL graph patterns.
• If P is a fuzzy SPARQL graph pattern and C is a fuzzy condition, then the
expression (P FILTER C [WITH β]) is a fuzzy SPARQL graph pattern.

Formally, a fuzzy RDF triple pattern has the form <t>: α. Here, α represents the
degree in which subject s' has property p' with value o' or subject s' and object o'
have a relationship p' . τ 1 and τ 2 represent the truth degree of subject s' and object o' .
Although they do not provide any additional information, we allow users to query
and use these truth degree variables. Furthermore, the optional parameter [WITH β]
indicates the condition that must be satisfied as the minimum membership degree in
[0, 1]. It is required in (Alkhateeb et al., 2009) that users need to choose an appropriate
value of β to express their requirements. If not specified, 1 is used as default.
2. Fuzzy Extension of SPARQL Query Language
In order to query fuzzy RDF knowledge graphs, we extend the declarative query
language SPARQL. Syntactically the extension naturally extends the SPARQL
one, by allowing the occurrence of fuzzy graph patterns in the WHERE clause
and the occurrence of fuzzy conditions in the FILTER clause. Its simple syntax
is given as follow:
SELECT… # Result variables
FROM … #Fuzzy RDF Dataset
WHERE … #Fuzzy RDF Graph patterns
FILTER … [WITH value] … #Value-constraints
[THRESHOLD value]
200 5 Fuzzy RDF Queries

The language is an extension of standard SPARQL in which triple patterns subject-

property-object: <s, p, o>: α are allowed. The query asks about a set of subgraphs that
match the fuzzy RDF query patterns in the WHERE statement. The argument of the
FROM statement is a fuzzy RDF repository. The argument of the SELECT statement
is a list of variables (e.g. ?x, ?y) and truth degree variables (e.g. α). Here, the truth
degree variable can be the truth degree of the triple pattern or the truth degree of the
variable (vertex). The argument of the FILTER statement is a list of value constraints
that should be verified by the query result with a degree equal to a value defined as
an argument of the WITH statement. In other words, the WITH statement gives the
truth degree of the corresponding fuzzy value constraint. When we omit the WITH
part, we obtain boolean value constraints. Moreover, a threshold of truth degree can
be associated to a given variable in the FILTER statement, which allows to perform
a cut on a particular variables of the answers. The argument of the THRESHOLD
statement performs an alpha-cut on the answers (only those having a degree of truth
greater or equal to threshold are kept). The THRESHOLD clause also is of course
optional. Standard SPARQL is a sublanguage of this extension language and thus
triple patterns can still be <s, p, o>. This is because not all the predicates have to be
fuzzy.

Example 5.8 The following query looks for the recent (importance 0.6) thriller
movies in which American actors be the leading role.
SELECT ?Movie ?Actor ?l
WHERE {
?Movie Release Date ?Date.
(?Movie starring · nationality “America”): ?l.
?Movie genre “thriller”.
FILTER (?Date = recent) WITH 0.6}
THRESHOLD 0.6

Here, ?Movie and ?Date are variables, “staring · nationality” is a regular expres-
sion, and ?l represents the degree in which the American actor (?Actor) is the leading
role of the movie (?Movie). Furthermore, “?Date = recent” is a fuzzy condition
expression.
3. Fuzzy SPARQL Graph Pattern Evaluation
A fuzzy SPARQL query defines a fuzzy graph pattern to be matched against a
given fuzzy RDF graph. Intuitively, given a fuzzy RDF data graph G, the seman-
tics of a graph pattern P defines a set of matching, where each matching (from
variable of P to URIs and literals of G) matches the pattern to a homomorphism
subgraph of G (Pivert et al., 2016b). For introducing such a concept, the notion
of matching of a regular expression must first be defined.

Definition 5.18 (Regular expression matching of a path): Let pa = (<s1 , p1 , o1 >, …,

<on , pn , on >) ⊆ G be a path of a fuzzy RDF knowledge graph G and Rex be a regular
expression. pa matches a regular expression Rex with a degree of truth, δ Rex (pa),
5.5 Extended SPARQL for Fuzzy RDF Query 201

defined as follows, according to the form of Rex (in the following, R, R1 and R2 are
regular expressions):
• Rex is of the form u with u ∈ U (resp. “*”). If pi is u (resp. any u ∈ U) then
δ Rex (pa) = ρ(pi ) else δ Rex (pa) = 0.
• Rex is of the form R1 · R2 . We denote by P the set of all pairs of paths (p1 , p2 ) such
that p is of the form p1 p2 . One has δ Rex (pa) = max P (min(δ R1 ( p1 ), δ R2 ( p2 ))).
• Rex is of the form R1 |R2 . One has δ Rex (pa) = max(δ R1 ( p1 ), δ R2 ( p2 )).
• Rex is of the form R+ . Let PA be the set of all tuples of paths (p1 , …,
pn ) (n > 0) such that p is of the form p1 … pn . One has δ Rex (pa) =
max P (min(δ R ( p1 ), . . . , δ R ( pn ))).

As we can see when a regular expression matches with a path of fuzzy RDF
knowledge graph, we consider the degree of truth associated to edge of fuzzy RDF
graph. Next, we will discuss the issue of interpretation of fuzzy conditions. And we
will consider the degree of truth associated to the vertex in the fuzzy RDF graph.
In fact, we define conditions on these degrees of truth with value constraints in the
FILTER statement.

Definition 5.19 (Interpretation of fuzzy condition): Let π be a mapping and C be

a fuzzy condition. Then π satisfies the fuzzy condition with a degree of truth δ co
defined as follows, according to the form of C:
• if C is of the form “truth(?x) ≥ γ ”, then π satisfies the condition C with a degree
of δ co = min(μ(π (?x)), γ ). Here γ ∈ [0, 1], is a threshold of truth degree. Users
need to choose an appropriate value of γ to express their requirements.
• if C is of the form “?x op c”, then π satisfies the condition C with a degree of δ co
= min(μ(π (?x)), μop (π (?x), c)). Here μop is membership function of the fuzzy
or crisp comparator. In particular, crisp comparison operators have a Boolean
semantics, if the condition is evaluated to true, then the degree of truth is 1,
otherwise 0.
• if C is of the form “?x op ?y”, then π (?x) and π (?y) satisfy the condition C with
a degree of δ co = min(μ(π (?x)), μop (π (?x), π (?y)), μ(π (?y))).
• if C is of the form “?x = Ft”, then π (?x) satisfies the condition C to the degree
δ co = min(μ(π (?x)), μFt (π (?x))). Here μFt is fuzzy membership function of the
fuzzy term Ft.
• if C is of the form ¬ C 1 , C 1 ∧ C 2 or C 1 ∨ C 2 , then the degrees of truth can be
defined as: δ co (¬C 1 ) = 1 − δ co (¬C 1 ), δ co (C 1 ∧ C 2 ) = min(δ co (C 1 ), δ co (C 2 )) and
δ co (C 1 ∨ C 2 ) = max(δ co (C 1 ), δ co (C 2 )), respectively.

Definition 5.20 (Evaluation of a fuzzy graph pattern): The fuzzy RDF graph eval-
uation of a fuzzy SPARQL graph pattern over G, denoted by [[P]]G is recursively
defined by:
• if P is of the form of a fuzzy triple graph pattern <t>: α, denoted by <s, p, o>: α,
then [[P]]G = {π | dom(π ) = var(t) ∧ π (t) ∈ G} and α = ρ(p).
202 5 Fuzzy RDF Queries

• if P is of the form of a fuzzy triple graph pattern <t>: α, denoted by <?s, Rex, ?o>:
α, then [[P]]G = {π | dom(π ) = var(t) ∧ π (t) ∈ G} and α = δ Rex (Rex), τ 1 =
truth(π (?s)), τ 2 truth(π (?o)).
• if P is of the form P1 AND P2 , then [[P]]G =[[P1 ]]G ▷◁ [[P2 ]]G .
• if P is of the form P1 OPTIONAL P2 then [[P]]G =[[P1 ]]G ⟕ [[P2 ]]G .
• if P is of the form P1 UNION P2 then [[P]]G =[[P1 ]]G ∪ [[P2 ]]G .
• if P is of the form P1 FILTER C then [[P]]G = {π ∈ [[P]]G |π n C} which denotes
the set of mappings in [[P]]G that satisfy C with a degree ≥ β.

Example 5.9 Let us consider the fuzzy SPARQL query of Example 5.8. We evaluate
this query according to the fuzzy RDF graph G of Fig. 5.1. The query also specifies a
threshold δ t (δ t = 0.6 in the example), to indicate that only matches with possibility
larger than δ t should be returned. The matching process is depicted as follows.

Intuitively, this pattern retrieves the list of movies in G, and the matching value of
?Movies is potentially Up in the Air, The Quest and Money Monster. The actors in
the three movies are Vera Farmiga, George Clooney and Julia Roberts respectively.
Because four paths p1 = Up in the Air—starring—George Clooney—nationality—
America, p2 = The Quest—starring—George Clooney—nationality—America, p3
= Money Monster—starring—George Clooney—nationality—America, and p4 =
Money Monster—starring—Julia Roberts—nationality—America match the regular
expression “staring · nationality”. The degrees of truth are δ re (p1 ) = 0.8, δ re (p2 ) =
0.9, δ re (p3 ) = 0.8, and δ re (p4 ) = 0.5. However, the genre of movies Up in the Air
and The Quest are “Romance” and “Fantasy” respectively. Moreover, if we suppose
that μrecent (2016) = 0.65, the degree of truth of “?Date = recent” is 0.65. So, Money
Monster is the only movie which is a thriller movie with degree of truth δ u (“Thriller”)
= 0.9. Thus, we obtain the following answers:

π1 = {?Movie → Money Monster, ?Actor → George Clooney, ?1 → 0.8}

π2 = {?Movie → Money Monster, ?Actor → Julia Roberts, ?1 → 0.5}

As the degree of truth of the final query result is the minimum of degrees of truth
induced by the results described above. There are δ 1P (G) = 0.65 and δ 2P (G) = 0.5 in
this example. So, π1 satisfies the minimum degree of truth threshold constraint and
is the only answer.

5.5.2 Implementation Issues

The most important and difficult part in the fuzzy SPARQL query language is the
fuzzy RDF evaluation of the query pattern P over a fuzzy RDF graph G. Actually, each
SPARQL query can be represented by a graph pattern. RDF graph pattern matching
in SPARQL is essentially enumerating all PRDF homomorphism from the patterns
graph into the data graph G. PRDF homomorphisms extend graph homomorphisms
5.5 Extended SPARQL for Fuzzy RDF Query 203

to deal with vertices connected with regular expression patterns, that can be mapped
to vertices connected by paths, rather than edge-to-edge mappings. As a result, any
SPARQL query can be equivalently transformed into a subgraph query problem,
which locates the subgraph in RDF data graph matching with the query graph. We
propose in Algorithm 5.8 a backtracking technique Q for the processing of a fuzzy
query pattern P over a fuzzy RDF graph G. The method generates each possible map
from the current one by traversing the parse tree in a depth-first manner. In particular,
we need to produce answers with truth degree of the query patterns.

Algorithm 5.9: Pattern-Match (P, G, µp )

Data: an RDF graph pattern P, a fuzzy RDF graph G, and a partial map π p .
Output: extends the partial map to a set of RDF homomorphism.
1. if Complete (π p ) //|μp | == |V p |
2. return solution-Found (π p );
3. u ← ChooseTerm(V p ); // pick a vertex u of V p ;
4. for each <v, π > ∈ Candidates (π p , u, G, P) do
5. Pattern-Match (P, G, π p ▷◁ {(<u, v>, δ)}▷◁ π );

Algorithm 5.9 describes the framework for a pattern match query Q over a fuzzy
RDF graph G, which is a recursive version of the basic backtrack algorithm. The
input of this algorithm is: an RDF graph pattern Q, an RDF graph G, and a partial
map μp , which includes a set of pairs {(<u, v>, δ)} such that u is a term of Q, v is the
image of u in G and δ is a truth degree associated with the mapping. If we call this
algorithm with (Q, G, μø ), where μø is the map with the empty domain, then it can
output all homomorphisms from the pattern graph Q into the fuzzy RDF graph G.
Specifically, we define in the following the operations used in the algorithm:
Complete (π p ) checks if each term u ∈ V p is mapped to a term in G. It returns
TRUE if all u ∈ V P are mapped, and FALSE otherwise.
ChooseTerm(V p ) chooses a term u ∈ V p to obtain a possible homomorphism.
Candidates (π p , u, G, P) calculates all possible candidate maps in G for the current
term u satisfying the partial map π p . It returns all sets of pairs <v, π >, where v is a
possible map of u, and π is the possible map from a term of regular expression pattern
Ri appearing in a triple with u to a term in V p already mapped in π p .
After that, the procedure takes each candidate v of the current term u ∈ V p and
the possible map π, puts v in the mapping pairs and tries to generate the possible
candidates of v. This is done recursively in depth-first by calling function Pattern-
Match (note that π p , {(<u, v>, δ)}, and π are compatible since the set <v, π > is
calculated with respect to π p ). Finally, we have a tree that contains one level with
a term from P, i.e., a vertex from P, and one level with the possible images of that
term in G. The input to each vertex of each level is the current map. Each possible
path in the tree from the root to a leaf labeled by a term of G represents a possible
homomorphism.
204 5 Fuzzy RDF Queries

Proposition 5.4 Algorithm 5.9 is correct and complete for enumerating all RDF
homomorphism from a given SPARQL graph pattern into a fuzzy RDF graph.
Proof We can prove this by means of induction. The set of all homomorphism
is complete for the empty set at the beginning of the algorithm. Because Candidates
(Li et al., 2019) is complete, and the number of vertices being finite, the partial
homomorphism, i.e., μp , are completely extended for the current vertex at each step.
Finally, the procedure ends having a homomorphism mapping for each vertex in P.

5.6 Summary

The flexibility of representation offered by fuzzy RDF raises challenging issues for
querying fuzzy RDF graph. In this chapter, we firstly propose three query evaluation
algorithms for processing subgraph queries of fuzzy RDF queries. In Sect. 5.2 we
propose a class of RDF graph pattern, in which a vertex is specified with a flexible
condition to express preferences on the vertex contents of the graph and an edge is
specified with a regular expression to express fuzzy preferences on the structure of the
data graph, and we study the pattern matching in a fuzzy RDF graph. Specifically, we
want to retrieve all qualified matches of a query pattern in the fuzzy RDF graph. We
further define a graph pattern matching algorithm based on a revised notion of graph
homomorphism, as opposed to the NP-completeness of graph pattern matching via
subgraph isomorphism. In Sect. 5.3 we propose a novel path-based solution to retrieve
subgraph from fuzzy RDF graph databases. In additions, the absolute possibility of
a match can be computed by aggregating the relative possibilities of each candidate
path in the match processing. In Sect. 5.4, we integrate fuzzy quantified statements
in fuzzy RDF queries addressed to a fuzzy RDF database. We present an approach to
summarize fuzzy RDF graph database in the form of linguistic summaries, and we
have showed how these statements can be defined and implemented. In Sect. 5.5, we
present an extension of SPARQL to query fuzzy RDF graph. The extension is able
to express fuzzy queries making use of regular expressions and fuzzy conditions.
We have provided syntax and semantics to the extension of SPARQL pattern. On
this basis, we have presented a query evaluation algorithm to subgraph query for
processing fuzzy RDF queries.
With the advent of the era of Big Data and artificial intelligence, dealing with
diverse fuzzy information in various fuzzy models will be very essential. We can
believe that fuzzy technique will be applied in increasing concrete application
domains and can play an increasingly important role in implementing Big Data
intelligence.
References 205

References

Aho, A. V., & Hopcroft, J. E. (1974). The design and analysis of computer algorithms. Addison-
Wesley Pub. Co.
Alkhateeb, F., Baget, J. F., & Euzenat, J. (2009). Extending SPARQL with regular expression
patterns (for querying RDF). Journal of Web Semantics, 7(2), 57–73.
Almendros-Jiménez, J. M., Becerra-Terón, A., & Moreno, G. (2017). A fuzzy extension of SPARQL
based on fuzzy sets and aggregators. In IEEE International Conference on Fuzzy Systems (FUZZ-
IEEE) (pp. 1–6).
Angles, R., & Gutierrez, C. (2016). The multiset semantics of SPARQL patterns. In International
Semantic Web Conference (pp. 20–36). Springer.
Anyanwu, K., Maduko, A., & Sheth, A. (2007). Sparq2l: Towards support for subgraph extraction
queries in RDF databases. In Proceedings of the 16th International Conference on World Wide
Web (pp. 797–806).
Bendersky, M., Metzler, D., & Croft, W. B. (2010). Learning concept importance using a weighted
dependence model. In Proceedings of the Third ACM International Conference on Web Search
and Data Mining (pp. 31–40).
Blau, H., Immerman, N., & Jensen, D. (2002). A visual language for querying and updating graphs.
University of Massachusetts Amherst Computer Science Technical Report, 37, 2002.
Bosc, P., & Pivert, O. (1992). Some approaches for relational databases flexible querying. Journal
of Intelligent Information Systems, 1(3), 323–354.
Bosc, P., Lietard, L., & Pivert, O. (1995). Quantified statements and database fuzzy querying. In
Fuzziness in Database Management Systems (pp. 275–308). Physica.
Bouchon-Meunier, B., & Moyse, G. (2012), Fuzzy linguistic summaries: Where are we, where
can we go? In IEEE Conference on Computational Intelligence for Financial Engineering &
Economics (CIFEr) (pp. 1–8).
Bry, F., Furche, T., Marnette, B., Ley, C., Linse, B., & Poppe, O. (2010). SPARQLog: SPARQL with
rules and quantification. In Semantic Web Information Management (pp. 341–370). Springer.
Carroll, J. J. (2002). Matching RDF graphs. Lecture Notes in Computer Science (pp. 5–15).
Castelltort, A., & Laurent, A. (2016). Extracting fuzzy summaries from NoSQL graph databases.
In Flexible Query Answering Systems 2015 (pp. 189–200). Springer.
Cheng, J., Ma, Z. M., & Yan, L. (2010). f-SPARQL: A flexible extension of SPARQL. In
International Conference on Database and Expert Systems Applications (pp. 487–494). Springer.
Costabello, L. (2014). Error-tolerant RDF subgraph matching for adaptive presentation of linked
data on mobile. In The Semantic Web: Trends and Challenges. Springer International Publishing.
Delgado, M., Ruiz, M. D., Sánchez, D., & Vila, M. A. (2014). Fuzzy quantification: A state of the
art. Fuzzy Sets and Systems, 242, 1–30.
Fan, W., Wu, Y., & Xu, J. (2016). Adding counting quantifiers to graph patterns. In Proceedings of
the 2016 International Conference on Management of Data (pp. 1215–1230), ACM.
Gallagher, B. (2006). Matching structure and semantics: A survey on graph-based pattern matching.
In AAAI Fall Symposium: Capturing and Using Patterns for Evidence Detection (Vol. 45).
Gao, X., Xiao, B., Tao, D., & Li, X. (2010). A survey of graph edit distance. Pattern Analysis &
Applications, 13(1), 113–129.
Golomb, S. W., & Baumert, L. D. (1965). Backtrack programming. Journal of the ACM, 12(4),
516–524.
Hahn, G., & Tardif, C. (1997). Graph homomorphisms: Structure and symmetry. Graph Symmetry.
Springer Netherlands.
Harris, S., & Seaborne, A. (2013). SPARQL 1.1 query language. In W3C Recommendation. http://
www.w3.org/TR/sparql11-query
Hayes, P. (2004). RDF semantics. http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
Henzinger, M. R., Henzinger, T. A., & Kopke, P. W. (1995). Computing simulations on finite
and infinite graphs. In Proceedings of IEEE 36th Annual Foundations of Computer Science
(pp. 453–462). IEEE.
206 5 Fuzzy RDF Queries

Holub, J., & Melichar, B. (1998). Implementation of nondeterministic finite automata for
approximate pattern matching. In Automata Implementation, Third International Workshop on
Implementing Automata, WIA’98, Rouen, France, September 17–19 (pp.92–99). DBLP
Kochut, K. J., & Janik, M. (2007). SPARQLeR: Extended SPARQL for semantic association
discovery. In European Semantic Web Conference (pp. 145–159). Springer.
Lee, J., Han, W. S., Kasperovics, R., & Lee, J. H. (2012). An in-depth comparison of subgraph
isomorphism algorithms in graph databases. PVLDB., 6(2), 133–144.
Li, G., Yan, L., & Ma, Z. (2019). Pattern match query over fuzzy RDF graph. Knowledge-Based
Systems, 165, 460–473.
Lian, X., & Chen, L. (2011). Efficient query answering in probabilistic RDF graphs. ACM SIGMOD
International Conference on Management of Data (pp. 157–168).
Liu, Y. A., Rothamel, T., Yu, F., Stoller, S. D., & Hu, N. (2004). Parametric regular path queries.
ACM Sigplan Notices, 39(6), 219–230.
Ma, Z. M., Liu, J., & Yan, L. (2011). Matching twigs in fuzzy XML. Information Sciences, 181(1),
184–200.
Ma, R., Jia, X., Cheng, J., & Angryk, R. A. (2015). SPARQL queries on RDF with fuzzy constraints
and preferences. Journal of Intelligent & Fuzzy Systems, 30(1), 183–195.
Matono, A., Amagasa, T., Yoshikawa, M., & Uemura, S. (2005). A path-based relational RDF
database. In Australasian Database Conference-Volume (Vol. 39, pp. 95–103).
Moustafa, W. E., Kimmig, A., Deshpande, A., & Getoor, L. (2014). Subgraph pattern matching
over uncertain graphs with identity linkage uncertainty. IEEE, International Conference on Data
Engineering (pp. 904–915), IEEE.
Neumann, T., & Weikum, G. (2008). RDF-3x: A risc-style engine for RDF. Proceedings of the
VLDB Endowment, 1(1), 647–659.
Pivert, O., Slama, O., & Thion, V. (2016a). SPARQL extensions with preferences: A survey. In
Proceedings of the 31st Annual ACM Symposium on Applied Computing (pp. 1015–1020).
Pivert, O., Slama, O., & Thion, V. (2016b). An extension of SPARQL with fuzzy navigational
capabilities for querying fuzzy RDF data. In IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE) (pp. 2409–2416).
Pivert, O., Slama, O., & Thion, V. (2016c). Fuzzy quantified structural queries to fuzzy graph
databases. In International Conference on Scalable Uncertainty Management (pp. 260–273).
Springer.
Pérez, J., Arenas, M., & Gutierrez, C. (2010). nSPARQL: A navigational language for RDF. Journal
of Web Semantics, 8(4), 255–270.
Prudhommeaux, E. (2008). SPARQL query language for RDF. http://www.w3.org/TR/rdf-sparql-
query/
Ullmann, J. R. (1976). An algorithm for subgraph isomorphism. Journal of the ACM, 23(23), 31–42.
Virgilio, R. D., Maccioni, A., & Torlone, R. (2015). Approximate querying of RDF graphs via path
alignment. Distributed and Parallel Databases, 33(4), 555–581.
Wagner, R. A., & Fischer, M. J. (1974). The string-to-string correction problem. Journal of the
ACM, 21(1), 168–173.
Wang, J., Jin, B., & Li, J. (2005). An efficient matching algorithm for RDF graph patterns. Journal
of Computer Research & Development, 42(10), 1763–1770.
Yager, R. R. (1993). On ordered weighted averaging aggregation operators in multicriteria decision
making. In Readings in Fuzzy Sets for Intelligent Systems (pp. 80–87).
Yager, R. R. (2014). Social network database querying based on computing with words. In Flexible
Approaches in Data, Information and Knowledge Management (pp. 241–257). Springer, Cham.
Zadeh, L. A. (1983). A computational approach to fuzzy quantifiers in natural languages.
Computers & Mathematics with Applications, 9(1), 149–184.
Zhang, D., Song, T., He, J., Shi, X., & Dong, Y. (2012), A similarity-oriented RDF graph matching
algorithm for ranking linked data. In IEEE 12th International Conference on Computer and
Information Technology (CIT) (pp.427–434). IEEE.
References 207

Zhao, P., & Han, J. (2010). On graph query optimization in large networks. Proceedings of the
VLDB Endowment, 3(3), 340–351.
Zimmermann, H. J. (1996). Fuzzy set theory and its applications (3rd ed.). Kluwer Academic
Publishers Norwell.
Zou, L., & Özsu, M. T. (2017). Graph-based RDF data management. Data Science and Engineering,
2, 56–70.
Zou, L., Özsu, M. T., Chen, L., Shen, X., Huang, R., & Zhao, D. (2014). gstore: A graph-based
SPARQL query engine. The VLDB Journal, 23(4), 565–590.
Index

A Fuzzy graph(s), 34, 39, 66, 74, 75, 78, 79,

Algebraic operation, 45, 72, 90, 91, 99, 94–98, 153–157, 159, 173, 184,
101, 103 197–201
Algebraic syntax, 10, 12, 13, 27, 101 Fuzzy inheritance hierarchies, 46, 49
subclass, 3, 43, 45, 49–52, 129, 138
superclass, 43, 45, 49, 51, 52, 119, 120
Fuzzy object-oriented database, 33, 43, 49,
C
66, 110, 118
Category, 20, 26, 130
Fuzzy quantified query, 183, 189
Class
Fuzzy RDF, 34, 72–83, 90–106, 110–114,
extensional class, 45
116–130, 132–134, 140–142,
intentional class, 45
144–148, 151–155, 157–161, 163,
Constraints 164, 166–170, 172–174, 176,
domain integrity constraints, 41 182–184, 186–188, 192, 193,
entity integrity constraints, 45 196–204
referential integrity constraints, 41 Fuzzy RDF algebra, 72, 79, 90, 98–103
Fuzzy RDF graph(s), 34, 73–80, 82–86, 88,
90, 91, 93–98, 100, 104, 105,
D 111–114, 116, 118, 119, 121–123,
Data management, 2, 16, 18, 23, 24, 26, 34, 125, 128, 130, 141, 142, 144–146,
66, 71, 106, 109, 110, 128, 148 148, 151–156, 158, 161, 163–168,
Data model/modeling, 1–4, 6, 17, 22–25, 172–174, 176, 182–184, 186–188,
27, 33, 57, 65, 66, 71–75, 81, 82, 90, 192, 193, 196–204
95, 105, 109, 130, 144, 152, 153, Fuzzy RDFS, 81, 82, 110, 118, 128–130,
172 132
Fuzzy RDF Schema, 81, 119
Fuzzy RDF subgraph, 120, 151, 166, 174,
176
E
Fuzzy relational database, 33, 40, 41, 44,
Entity, 22, 35, 45, 74, 145, 147, 171
53, 66, 82, 110, 111, 116–118, 128,
Extraction, 2, 89
148
Fuzzy set(s), 27, 34, 36–38, 44, 66, 71, 73,
81, 91, 112, 148, 184, 187
F Fuzzy XML, 33, 53–57, 59, 60, 63, 64
Fuzzy DTD, 55, 57, 63 Fuzzy XML Schema, 53, 55, 59, 64
© The Editor(s) (if applicable) and The Author(s), under exclusive license 209
to Springer Nature Switzerland AG 2022
Z. Ma et al., Modeling and Management of Fuzzy Semantic RDF Data,
Studies in Computational Intelligence 1057,
https://doi.org/10.1007/978-3-031-11669-8
210 Index

Fuzzy XML tree, 65, 153 membership function, 37, 157, 185, 201

G N
Graph homomorphism, 154, 155, 202, 204 Neo4j database, 23, 143, 144, 146
Graph isomorphism, 77, 98, 151–154, 156,
158, 166, 167, 188, 192, 193, 196,
197, 204
Q
Graph patterns, 12, 15, 98–103
Query, 2, 3, 10–13, 15–18, 20, 22–24, 26,
27, 45, 72, 90, 98–105, 116–118,
H 128, 132–144, 146–148, 151–156,
HBase database, 26, 110, 127, 128, 148 158–162, 164–172, 174–184,
186–192, 194, 195, 197–200,
202–204
I
Imperfect information
ambiguity, 145 R
imprecision, 33, 35, 36, 43, 46, 100, RDF data storage, 2, 16, 27, 105, 111, 142
110, 118, 148, 183 Reasoning, 3, 74
inconsistency, 35 equivalence, 40, 78, 98, 129
uncertainty, 33–35, 43, 44, 72, 73, 110, inclusion, 46, 51
118, 148 Reengineer(ing), 110, 118, 148
vagueness, 35, 43, 74, 100, 153, 197 Regular path, 164
Index, 2, 18, 19, 22, 127, 135, 139, 147, Relational algebra, 14, 95, 96, 98, 101
152, 170, 175–178, 180–183, 192 Resemblance relation, 40, 42

K S
Key SPARQL, 2, 10–17, 22–24, 27, 90, 99–105,
foreign key, 41, 42 116, 117, 132, 141–143, 146, 147,
key-value, 17, 143 151–153, 166, 167, 183, 197–204
keyword(s), 120 Subgraph pattern, 151–153, 155, 166, 167,
primary key, 41, 42 169, 170, 184, 196, 197
Subgraph query, 152, 169, 203, 204
L
Logical database
object-oriented database, 16, 33, 43–45, T
49, 53, 66, 71, 110, 118, 128, 148 Triple(s), 3–7, 9–12, 14, 17–22, 24, 26, 27,
relational database, 16, 17, 19, 33, 34, 65, 72–76, 80, 82, 102, 104, 105,
41–44, 53, 66, 72, 109–112, 116–118, 109–114, 116, 127, 128, 130, 132,
128, 141, 144, 148 133, 135–142, 145, 158, 162, 164,
165, 172–174, 192

M
Membership X
membership degree, 34, 42, 45–49, 51, XML
73, 74, 76, 80, 83, 86, 91, 100, 111, XML DTD, 53
114, 119, 120, 122, 125, 129, 130, XML Schema, 53, 55, 57, 59, 62, 64,
132, 133, 164, 174, 176, 177, 192, 199 123

RDF Query Path Optimization Using Hybrid Genetic Algorithms
No ratings yet
RDF Query Path Optimization Using Hybrid Genetic Algorithms
16 pages
DiploCloud: Scalable RDF Data Management
No ratings yet
DiploCloud: Scalable RDF Data Management
5 pages
Empowering The SDM-RDFizer Tool For Scaling Up To
No ratings yet
Empowering The SDM-RDFizer Tool For Scaling Up To
28 pages
Intelligent Systems in Big Data, Semantic Web and Machine Learning
No ratings yet
Intelligent Systems in Big Data, Semantic Web and Machine Learning
6 pages
RDF Querying with Apache Spark
No ratings yet
RDF Querying with Apache Spark
6 pages
Data Management and Analysis: Reda Alhajj Mohammad Moshirpour Behrouz Far Editors
100% (1)
Data Management and Analysis: Reda Alhajj Mohammad Moshirpour Behrouz Far Editors
261 pages
Big Data Algorithms
100% (1)
Big Data Algorithms
476 pages
Effective Garbage Data Filtering Algorithm For SNS Big Data Processing by Machine Learning
No ratings yet
Effective Garbage Data Filtering Algorithm For SNS Big Data Processing by Machine Learning
8 pages
Advanced Methods For Knowledge Discovery From Complex Data
100% (1)
Advanced Methods For Knowledge Discovery From Complex Data
375 pages
Big Data Processing With Harnessing Hadoop - Mapreduce For Optimizing Analytical Workloads
No ratings yet
Big Data Processing With Harnessing Hadoop - Mapreduce For Optimizing Analytical Workloads
6 pages
Big Graph Analyses: From Queries To Dependencies and Association Rules
No ratings yet
Big Graph Analyses: From Queries To Dependencies and Association Rules
19 pages
A Research On Machine Learning Methods For Big Data Processing, and Youming Sun
No ratings yet
A Research On Machine Learning Methods For Big Data Processing, and Youming Sun
9 pages
Provenance in Data Science: Leslie F. Sikos Oshani W. Seneviratne Deborah L. Mcguinness Editors
100% (1)
Provenance in Data Science: Leslie F. Sikos Oshani W. Seneviratne Deborah L. Mcguinness Editors
119 pages
Big Data Management vs Semantic Web
No ratings yet
Big Data Management vs Semantic Web
11 pages
Knowledge Discovery in Heterogeneous and Unstructured Data of Industry 4 Systems
No ratings yet
Knowledge Discovery in Heterogeneous and Unstructured Data of Industry 4 Systems
6 pages
Big Data and Data Science Insights
No ratings yet
Big Data and Data Science Insights
386 pages
Bridging Machine Learning and Computer Network Res
No ratings yet
Bridging Machine Learning and Computer Network Res
16 pages
RDFcache Sigmod15
No ratings yet
RDFcache Sigmod15
16 pages
Sherif Sakr (Editor), Albert Zomaya (Editor) - Encyclopedia of Big Data Technologies-Springer (2019) PDF
No ratings yet
Sherif Sakr (Editor), Albert Zomaya (Editor) - Encyclopedia of Big Data Technologies-Springer (2019) PDF
1,853 pages
Dokumen - Pub - Designing and Building Enterprise Knowledge Graphs 1nbsped 1636391745 9781636391748 9781636391755 9781636391762
No ratings yet
Dokumen - Pub - Designing and Building Enterprise Knowledge Graphs 1nbsped 1636391745 9781636391748 9781636391755 9781636391762
168 pages
Big Data Handling for Researchers
No ratings yet
Big Data Handling for Researchers
8 pages
Ai and Big Data
No ratings yet
Ai and Big Data
6 pages
Data Mining and Knowledge Discovery For Big Data - Methodologies, Challenge and Opportunities (Chu 2013-10-09)
No ratings yet
Data Mining and Knowledge Discovery For Big Data - Methodologies, Challenge and Opportunities (Chu 2013-10-09)
310 pages
Big Data Conceptual Analysis and Applications by Michael Z. Zgurovsky, Yuriy P. Zaychenko
No ratings yet
Big Data Conceptual Analysis and Applications by Michael Z. Zgurovsky, Yuriy P. Zaychenko
298 pages
Challenges in Computational Statistics and Data Mining (Matwin & Mielniczuk 2015-07-08)
No ratings yet
Challenges in Computational Statistics and Data Mining (Matwin & Mielniczuk 2015-07-08)
404 pages
Big Data With Machine Learning and Fuzzy Logic
No ratings yet
Big Data With Machine Learning and Fuzzy Logic
5 pages
E6929 IranArze
No ratings yet
E6929 IranArze
15 pages
2020, Sathyaraj - Chicken Swarm Foraging Algorithm For Big Data Classification Using The Deep Belief Network Classifier
No ratings yet
2020, Sathyaraj - Chicken Swarm Foraging Algorithm For Big Data Classification Using The Deep Belief Network Classifier
21 pages
Big Data Analysis Using Apache HADOOP (November 2013) : Abstract-Big Data Problems Are Often Complex To
No ratings yet
Big Data Analysis Using Apache HADOOP (November 2013) : Abstract-Big Data Problems Are Often Complex To
11 pages
Complex Matrices For The Approximate Evaluation of Probabilistic Queries
No ratings yet
Complex Matrices For The Approximate Evaluation of Probabilistic Queries
8 pages
Clustering Methods For Big Data Analytics Techniques, Toolboxes and Applications
No ratings yet
Clustering Methods For Big Data Analytics Techniques, Toolboxes and Applications
192 pages
Challenges For Mapreduce in Big Data: Scholarship@Western
No ratings yet
Challenges For Mapreduce in Big Data: Scholarship@Western
10 pages
Analysis of Big Data Technologies and Methods - MS Thesis
100% (1)
Analysis of Big Data Technologies and Methods - MS Thesis
90 pages
2020 Book ComputationalIntelligenceForSe
No ratings yet
2020 Book ComputationalIntelligenceForSe
150 pages
Soft and Declarative Fishing of Information in Big Data Lake
No ratings yet
Soft and Declarative Fishing of Information in Big Data Lake
15 pages
Fuzzy Techniques in Big Data Analytics
No ratings yet
Fuzzy Techniques in Big Data Analytics
15 pages
Fuzzy Information & Engineering and Operations Research & Management
No ratings yet
Fuzzy Information & Engineering and Operations Research & Management
558 pages
Moth-Flame Optimization-Bat Optimization: Map-Reduce Framework For Big Data Clustering Using The Moth-Flame Bat Optimization and Sparse Fuzzy C-Means
No ratings yet
Moth-Flame Optimization-Bat Optimization: Map-Reduce Framework For Big Data Clustering Using The Moth-Flame Bat Optimization and Sparse Fuzzy C-Means
15 pages
Knowledge Discovery and Data Mining Challenges and Realities Xingquan Zhu Download
No ratings yet
Knowledge Discovery and Data Mining Challenges and Realities Xingquan Zhu Download
52 pages
10.1007@978 981 15 6695 0
No ratings yet
10.1007@978 981 15 6695 0
228 pages
Ontology
No ratings yet
Ontology
14 pages
An Approach To Analysis and Classification of Data From Big Data by Using Apriori Algorithm
No ratings yet
An Approach To Analysis and Classification of Data From Big Data by Using Apriori Algorithm
4 pages
Big Data Challenge A Data Management Perspective
No ratings yet
Big Data Challenge A Data Management Perspective
8 pages
Data Mining and Big Data Ying Tan Digital Version 2025
No ratings yet
Data Mining and Big Data Ying Tan Digital Version 2025
76 pages
Big Data Analytics
No ratings yet
Big Data Analytics
13 pages
JS - Introduction To DOM Manipulation
No ratings yet
JS - Introduction To DOM Manipulation
27 pages
Ict Lesson Planner
No ratings yet
Ict Lesson Planner
114 pages
Brute-Forcing Stay-Logged-In Cookies
No ratings yet
Brute-Forcing Stay-Logged-In Cookies
11 pages
Lecture11 Slides PDF
No ratings yet
Lecture11 Slides PDF
27 pages
What Is AI Project Cycle
No ratings yet
What Is AI Project Cycle
6 pages
Introduction To Stata: Ucla Idre Statistical Consulting Group
No ratings yet
Introduction To Stata: Ucla Idre Statistical Consulting Group
119 pages
Catalogue
0% (1)
Catalogue
5 pages
Syserr
No ratings yet
Syserr
266 pages
Assignment - 2
No ratings yet
Assignment - 2
2 pages
Application - Engineer, Project (75297) - Vishnuvarthan S
No ratings yet
Application - Engineer, Project (75297) - Vishnuvarthan S
3 pages
AX 2012 R3 Development II Lab Guide
No ratings yet
AX 2012 R3 Development II Lab Guide
25 pages
Rotary Community Corps Organization Form
No ratings yet
Rotary Community Corps Organization Form
2 pages
Analogue Addressable Fire Alarm Control and Indicating Equipment Specification 1. General
No ratings yet
Analogue Addressable Fire Alarm Control and Indicating Equipment Specification 1. General
10 pages
Fanuc Screen Display Manual
33% (3)
Fanuc Screen Display Manual
86 pages
MAITRI AI Assistant For Astronauts Psychological and Physical Well Being
No ratings yet
MAITRI AI Assistant For Astronauts Psychological and Physical Well Being
7 pages
Company IT Contacts and Storage Needs
No ratings yet
Company IT Contacts and Storage Needs
5 pages
Priority Expiration Cache Design
No ratings yet
Priority Expiration Cache Design
3 pages
Cisco Switches Cheat Sheets: 1. Switch Power On
No ratings yet
Cisco Switches Cheat Sheets: 1. Switch Power On
13 pages
Lenovo Ideapad 510 User Guide: Ideapad 510-15ISK Ideapad 510-15IKB
No ratings yet
Lenovo Ideapad 510 User Guide: Ideapad 510-15ISK Ideapad 510-15IKB
32 pages
Upgrade Guide AOS v50
No ratings yet
Upgrade Guide AOS v50
17 pages
Standard Installation Service Scope of Work Service: Wireless LAN
No ratings yet
Standard Installation Service Scope of Work Service: Wireless LAN
3 pages
Yocto Slides
No ratings yet
Yocto Slides
300 pages
Odoo Marketplace Integration Simplified
No ratings yet
Odoo Marketplace Integration Simplified
33 pages
Google Pixel Fold Full Specifications - RecycleDevice Blog
No ratings yet
Google Pixel Fold Full Specifications - RecycleDevice Blog
4 pages
E-Commerce: Aman Kumar Singh
No ratings yet
E-Commerce: Aman Kumar Singh
12 pages
24 - 16 - 8 FE+POTS MDU CLI User Manual V1.2
No ratings yet
24 - 16 - 8 FE+POTS MDU CLI User Manual V1.2
69 pages
Synchronous vs Statistical TDM Explained
No ratings yet
Synchronous vs Statistical TDM Explained
49 pages
Problem Statement
No ratings yet
Problem Statement
4 pages
Coursersaaa - Quiz
No ratings yet
Coursersaaa - Quiz
14 pages
Topic 01 - Introduction To Business Intelligence Systems
No ratings yet
Topic 01 - Introduction To Business Intelligence Systems
25 pages