ILLUSTRATIVE EXAMPLE: INTEROPERABILITY AND
DATA INTEGRATION USING THE SEMANTIC WEB
BY AMNA BASHARAT
7TH FEBRUARY, 2009
Adapted from W3C Tutorial on the Semantic Web by
Ivan Herman, W3C
(Last update: 04 February 2009)
THE ROUGH STRUCTURE OF DATA INTEGRATION
Map the various data onto an abstract data representation
make the data independent of its internal representation…
Merge the resulting representations
Start making queries on the whole!
queries not possible on the individual data sets
Copyright © 2009, W3C
A SIMPLIFIED BOOK STORE DATA (DATASET “A”)
ID                 Author  Title             Publisher  Year
ISBN0-00-651409-X  id_xyz  The Glass Palace  id_qpr     2000

ID      Name           Home Page
id_xyz  Ghosh, Amitav  http://www.amitavghosh.com

ID      Publ. Name       City
id_qpr  Harpers Collins  London
1ST: EXPORT YOUR DATA AS A SET OF RELATIONS
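A minimal sketch of this step in plain Python, assuming each relation is written as a (subject, property, value) tuple. The urn:isbn: URI and the a: property names other than a:author are assumptions made for illustration, and no RDF library is used.

```python
# Plain-Python stand-in for a set of relations: (subject, property, value).
# The URI below is an assumed identifier for the book ISBN0-00-651409-X.
BOOK = "urn:isbn:000651409X"

# The three tables of dataset "A", one dict per row.
books = [{"ID": BOOK, "Author": "id_xyz", "Title": "The Glass Palace",
          "Publisher": "id_qpr", "Year": "2000"}]
authors = [{"ID": "id_xyz", "Name": "Ghosh, Amitav",
            "Home Page": "http://www.amitavghosh.com"}]
publishers = [{"ID": "id_qpr", "Publ. Name": "Harpers Collins",
               "City": "London"}]


def export_rows(rows, column_to_property):
    """Turn every mapped cell into one (row ID, property, cell value) relation."""
    relations = set()
    for row in rows:
        for column, prop in column_to_property.items():
            relations.add((row["ID"], prop, row[column]))
    return relations


# Property names are hypothetical, apart from a:author.
dataset_a = (
    export_rows(books, {"Author": "a:author", "Title": "a:title",
                        "Publisher": "a:publisher", "Year": "a:year"})
    | export_rows(authors, {"Name": "a:name", "Home Page": "a:homepage"})
    | export_rows(publishers, {"Publ. Name": "a:p_name", "City": "a:city"})
)
```

The row IDs (id_xyz, id_qpr) become nodes that link the book to its author and publisher, so the three tables turn into one connected set of relations.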
SOME NOTES ON EXPORTING THE DATA
Data export does not necessarily mean physical conversion of the data
relations can be generated on‐the‐fly at query time
via SQL “bridges”
scraping HTML pages
extracting data from Excel sheets
etc.
One can export part of the data
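One possible reading of an SQL “bridge”, sketched with Python's standard sqlite3 module. The table name, its columns, and the a:name / a:homepage property names are assumptions; the point is only that relations are produced when asked for, and that only part of the data needs to be exposed.

```python
import sqlite3

# In-memory stand-in for an existing relational database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id TEXT, name TEXT, homepage TEXT)")
conn.execute(
    "INSERT INTO authors VALUES (?, ?, ?)",
    ("id_xyz", "Ghosh, Amitav", "http://www.amitavghosh.com"),
)


def author_relations(connection):
    """Yield relations at query time; the stored data is never converted."""
    cursor = connection.execute("SELECT id, name, homepage FROM authors")
    for row_id, name, homepage in cursor:
        yield (row_id, "a:name", name)
        yield (row_id, "a:homepage", homepage)


# Only the authors table is exported here: a partial export.
relations = list(author_relations(conn))
```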
ANOTHER BOOK STORE DATA (DATASET “F”)
   A                 B                      C       D           E
1  ID                Titre                  Auteur  Traducteur  Original
2  ISBN0 2020386682  Le Palais des miroirs  A7      A8          ISBN-0-00-651409-X
3
4
5
6  Nom
7  Ghosh, Amitav
8  Besse, Christianne
2ND: EXPORT YOUR SECOND SET OF DATA
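A sketch of the same export for the spreadsheet, again in plain Python. The cell contents are copied from the sheet; the isbn_uri normalisation, the f:cell/ node names, and the f: property names other than f:auteur are assumptions for illustration.

```python
# Spreadsheet cells keyed by their reference, as in dataset "F".
sheet = {
    "A1": "ID", "B1": "Titre", "C1": "Auteur", "D1": "Traducteur",
    "E1": "Original",
    "A2": "ISBN0 2020386682", "B2": "Le Palais des miroirs",
    "C2": "A7", "D2": "A8", "E2": "ISBN-0-00-651409-X",
    "A6": "Nom", "A7": "Ghosh, Amitav", "A8": "Besse, Christianne",
}


def isbn_uri(cell):
    """Assumed normalisation: drop 'ISBN', keep digits and X, prefix a URN."""
    text = cell.upper().replace("ISBN", "", 1)
    kept = "".join(ch for ch in text if ch.isdigit() or ch == "X")
    return "urn:isbn:" + kept


book = isbn_uri(sheet["A2"])
dataset_f = {
    (book, "f:titre", sheet["B2"]),
    (book, "f:original", isbn_uri(sheet["E2"])),
}

# Columns C and D hold references to the cells that carry the person names.
for column, prop in (("C", "f:auteur"), ("D", "f:traducteur")):
    ref = sheet[column + "2"]
    person = "f:cell/" + ref          # hypothetical local node name
    dataset_f.add((book, prop, person))
    dataset_f.add((person, "f:nom", sheet[ref]))
```

Under this normalisation the value in the Original column and the ID used in dataset “A” yield the same URI, which is what later makes the merge possible.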
3RD: START MERGING YOUR DATA
3RD: START MERGING YOUR DATA (CONT.)
3RD: MERGE IDENTICAL RESOURCES
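With relations held as plain tuples, merging can be sketched as a set union: a resource that carries the same URI in both datasets becomes a single node. The URIs, the f:cell/ node name, and most property names below are assumptions; only a small part of each dataset is shown.

```python
ORIGINAL = "urn:isbn:000651409X"       # assumed URI, shared by both datasets
TRANSLATION = "urn:isbn:02020386682"   # assumed URI for the French edition

dataset_a = {
    (ORIGINAL, "a:title", "The Glass Palace"),
    (ORIGINAL, "a:author", "id_xyz"),
    ("id_xyz", "a:name", "Ghosh, Amitav"),
}
dataset_f = {
    (TRANSLATION, "f:titre", "Le Palais des miroirs"),
    (TRANSLATION, "f:original", ORIGINAL),
    (TRANSLATION, "f:auteur", "f:cell/A7"),
    ("f:cell/A7", "f:nom", "Ghosh, Amitav"),
}

# Merging is a union; identical URIs collapse into one resource.
merged = dataset_a | dataset_f


def relations_touching(relations, node):
    """Properties of every relation that starts or ends at the given node."""
    return {p for s, p, o in relations if s == node or o == node}


shared = relations_touching(merged, ORIGINAL)
```

Here shared contains properties from both vocabularies, because the original book is one node in the merged data. The two nodes for the author (id_xyz and f:cell/A7) are still separate at this stage.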
START MAKING QUERIES…
User of data “F” can now ask queries like:
“give me the title of the original”
This information is not in the dataset “F”…
…but can be retrieved by merging with dataset “A”!
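The query can be sketched as a two-step lookup over tuples. The URIs and the property names f:original, f:titre and a:title are assumptions; the sketch only shows that the answer is empty on dataset “F” alone and non-empty after the merge.

```python
ORIGINAL = "urn:isbn:000651409X"       # assumed shared URI
TRANSLATION = "urn:isbn:02020386682"   # assumed URI for the French edition

dataset_a = {(ORIGINAL, "a:title", "The Glass Palace")}
dataset_f = {
    (TRANSLATION, "f:titre", "Le Palais des miroirs"),
    (TRANSLATION, "f:original", ORIGINAL),
}
merged = dataset_a | dataset_f


def values(relations, subject, prop):
    """All values v such that (subject, prop, v) is in the relations."""
    return {o for s, p, o in relations if s == subject and p == prop}


def title_of_original(relations, book):
    """Follow f:original from the book, then read a:title of what is found."""
    return {
        title
        for original in values(relations, book, "f:original")
        for title in values(relations, original, "a:title")
    }


alone = title_of_original(dataset_f, TRANSLATION)    # empty set
combined = title_of_original(merged, TRANSLATION)    # {"The Glass Palace"}
```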
HOWEVER, MORE CAN BE ACHIEVED…
We “feel” that a:author and f:auteur should be the same
But an automatic merge does not know that!
Let us add some extra information to the merged data:
a:author same as f:auteur
both identify a “Person”
a term that a community may have already defined:
a “Person” is uniquely identified by his/her name and, say, homepage
it can be used as a “category” for certain type of resources
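A rough sketch of how such extra statements might be applied to merged tuples. The same_as / identifies / type spellings, the URIs, and the node names are all hypothetical. One further assumption is made loudly: a:name and f:nom are both treated as the person's name, and persons are matched on name alone, because dataset “F” carries no homepage.

```python
ORIGINAL = "urn:isbn:000651409X"       # assumed URIs
TRANSLATION = "urn:isbn:02020386682"

merged = {
    (ORIGINAL, "a:author", "id_xyz"),
    ("id_xyz", "a:name", "Ghosh, Amitav"),
    ("id_xyz", "a:homepage", "http://www.amitavghosh.com"),
    (TRANSLATION, "f:original", ORIGINAL),
    (TRANSLATION, "f:auteur", "f:cell/A7"),
    ("f:cell/A7", "f:nom", "Ghosh, Amitav"),
}

# The extra "glue", in a hypothetical notation.
GLUE = {
    ("a:author", "same_as", "f:auteur"),
    ("a:author", "identifies", "Person"),
    ("f:auteur", "identifies", "Person"),
}
NAME_PROPERTIES = {"a:name", "f:nom"}  # assumption: both give a Person's name


def apply_glue(relations, glue, name_properties):
    same = {(a, b) for a, rel, b in glue if rel == "same_as"}
    same |= {(b, a) for a, b in same}
    person_props = {p for p, rel, c in glue
                    if rel == "identifies" and c == "Person"}

    out = set(relations)
    # 1. Equivalent properties: copy each relation under the other name.
    for s, p, o in relations:
        for a, b in same:
            if p == a:
                out.add((s, b, o))
    # 2. Whatever these properties point at is a Person.
    for s, p, o in list(out):
        if p in person_props:
            out.add((o, "type", "Person"))
    # 3. Persons with the same name are one resource (name-only matching).
    persons = {s for s, p, o in out if p == "type" and o == "Person"}
    by_name = {}
    for s, p, o in out:
        if s in persons and p in name_properties:
            by_name.setdefault(o, set()).add(s)
    rename = {n: min(nodes) for nodes in by_name.values() for n in nodes}
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in out}


glued = apply_glue(merged, GLUE, NAME_PROPERTIES)
```

After this step the two author nodes are a single Person that carries the name from both datasets and the homepage from dataset “A”.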
3RD REVISITED: USE THE EXTRA KNOWLEDGE
START MAKING RICHER QUERIES!
User of dataset “F” can now query:
“give me the home page of the original’s author”
The information is not in datasets “F” or “A”…
…but was made available by:
merging datasets “A” and “F”
adding three simple extra statements as an extra “glue”
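A sketch of this richer query as a path over tuples, assuming the user of dataset “F” asks in F's own vocabulary (f:original, then f:auteur) and that the home page is stored under a hypothetical a:homepage property. Only the “a:author same as f:auteur” statement is applied here; the URIs are assumptions.

```python
ORIGINAL = "urn:isbn:000651409X"       # assumed URIs
TRANSLATION = "urn:isbn:02020386682"

merged = {
    (ORIGINAL, "a:author", "id_xyz"),
    ("id_xyz", "a:homepage", "http://www.amitavghosh.com"),
    (TRANSLATION, "f:original", ORIGINAL),
    (TRANSLATION, "f:auteur", "f:cell/A7"),
}


def add_equivalent(relations, prop_a, prop_b):
    """Copy every relation using one property under the other property."""
    extra = set()
    for s, p, o in relations:
        if p == prop_a:
            extra.add((s, prop_b, o))
        elif p == prop_b:
            extra.add((s, prop_a, o))
    return relations | extra


def follow(relations, start, path):
    """Follow a chain of properties from a start node; return the end nodes."""
    nodes = {start}
    for prop in path:
        nodes = {o for s, p, o in relations if s in nodes and p == prop}
    return nodes


QUERY = ["f:original", "f:auteur", "a:homepage"]

before = follow(merged, TRANSLATION, QUERY)
glued = add_equivalent(merged, "a:author", "f:auteur")
after = follow(glued, TRANSLATION, QUERY)
```

Without the glue the path stops at the original book, which has no f:auteur relation; with it, the same path reaches the home page.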
COMBINE WITH DIFFERENT DATASETS
Via, e.g., the “Person”, the dataset can be combined with other sources
For example, data in Wikipedia can be extracted using dedicated tools
MERGE WITH WIKIPEDIA DATA
IS THAT SURPRISING?
It may look like it but, in fact, it should not be…
What happened via automatic means is done every day by Web users!
The difference: a bit of extra rigour so that machines could do this, too
WHAT DID WE DO?
We combined different datasets that
are somewhere on the web
are of different formats (mysql, excel sheet, XHTML, etc.)
have different names for relations
We could combine the data because some URI‐s were identical (the ISBN‐s in this case)
We could add some simple additional information (the “glue”), also using common terminologies that a community has produced
As a result, new relations could be found and retrieved
IT COULD BECOME EVEN MORE POWERFUL
We could add extra knowledge to the merged datasets
e.g., a full classification of various types of library data
geographical information
etc.
This is where ontologies, extra rules, etc., come in
ontologies/rule sets can be relatively simple and small, or huge, or anything in between…
Even more powerful queries can be asked as a result
WHAT DID WE DO? (CONT)
THE ABSTRACTION PAYS OFF BECAUSE…
… the graph representation is independent of the exact structures
… a change in local database schemas, XHTML structures, etc., does not affect the whole
“schema independence”
… new data, new connections can be added seamlessly
THE NETWORK EFFECT
Through URI‐s we can link any data to any data
The “network effect” is extended to the (Web) data
“Mashups on steroids” become possible
Very important to understand the Network Effect.
Read the paper:
Metcalfe's Law, Web 2.0, and the Semantic Web
Will be included in the Mids I
So where is the Semantic Web?
The Semantic Web provides technologies to make such integration possible!