Semantic Publishing

Libraries and linked data #6: Why publish library catalogues as open linked data?

Posted on March 1, 2013 by David Shotton

This is the sixth and final paper introducing the concepts of RDF and linked data, and explaining how these Semantic Web technologies can be used to publish library catalogue data.

The previous papers in this series, which serve as technical introductions to this paper, are:

Libraries and linked data #1: What are linked data?
Libraries and linked data #2: A rough guide to Turtle.
Libraries and linked data #3: Encoding bibliographic records in RDF.
Libraries and linked data #4: A comparison of RDF and XML.
Libraries and linked data #5: Using the SPAR ontologies to publish bibliographic records.

This paper is based on a presentation given at the Ticer Librarians Summer School at the University of Tilburg, Netherlands, on 23rd August 2012.

Where are we today?

Scholarly publishing and librarianship are in the throes of a revolution, as the full potential of on-line publications and library catalogues is explored. Card catalogues have been replaced with OPAC systems, and libraries have embraced RFID technology for tracking physical holdings. But in many ways libraries remain traditional, in part because of the inertia created by legacy cataloguing systems. In particular, libraries have not adopted Web standards for their metadata management, but rather continue to employ a variety of open or proprietary informational models based on XML or earlier standards such as MARC.

In contrast, modern web information management techniques employ standards such as RDF and OWL 2 to encode information in ways that permit computers to query metadata and integrate web-based information from multiple resources in an automated manner.

Thus, at this mid-point in the digital revolution, library practice is in an ill-defined transitional state—a ‘horseless carriage’ state—that lies somewhere between the world of print and paper and the world of the web and computers, with the former still exercising significantly more influence than the latter. We are now online – that is a significant start – but we should also be wholeheartedly adopting Web standards and contributing to the world of open linked data. We need to advance from the horseless carriage to the Ferrari!

Potential benefits of publishing library catalogues as open linked data

It is obvious that publishing the catalogues of major libraries as open linked data will permit their use in ways that will never be possible as long as they are kept in-house as MARC records.

If such data were available as open linked data in a triple store with a SPARQL query endpoint, they would be available to anyone, in a machine-processable format that could immediately be integrated automatically with similar data from other sources, rather than available only to human eyeballs via the library’s online catalogue (excellent as that might be).

For example, the dates and geographical coordinates relating to ancient documents held by the Bodleian Libraries, and to the sites described in its archaeological publications, could be recorded and mapped onto Google Maps or some other useful mapping system, together with similar data held at Cambridge University. A scholar from Sweden could then see at a glance that Cambridge holds early descriptions of archaeological sites at Nimrud and Nineveh in ancient Mesopotamia, together with a large number of Mesopotamian written documents dating back 3000 years, while the Sackler Library in Oxford is particularly rich in papyri and information about Egyptian archaeological sites, and also has good holdings on cuneiform languages and Assyrian reliefs.
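To make the idea of a SPARQL query endpoint more concrete, the following minimal Python sketch builds the kind of request URL that such an endpoint accepts. The endpoint address (http://example.org/sparql) is hypothetical, and the use of dcterms:title and dcterms:spatial to describe catalogue items is an assumption made purely for illustration; a real library would document its own endpoint and vocabulary.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address, used purely for illustration.
ENDPOINT = "http://example.org/sparql"

# Assumed data model: each item has a title and a place it relates to.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item ?title ?place
WHERE {
  ?item dcterms:title ?title ;
        dcterms:spatial ?place .
}
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Return a GET URL that carries the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Because the request is an ordinary URL, any program (not just a person using the library's own search page) can send it and combine the results with those from other endpoints.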

However, in reality, it is impossible to predict how such open data might actually be used. As Rufus Pollock famously said:

“The best thing to do with your data will be thought of by someone else”.

In 2001, Tim Berners-Lee [1] predicted that the semantic web

“. . . will likely profoundly change the very nature of how knowledge is produced and shared, in ways that we can now barely imagine.”

Additionally, in his technical paper on linked data, co-authored by Chris Bizer and Tom Heath, and entitled Linked Data – The Story So Far [2], Tim Berners-Lee concludes:

“Linked Data principles and practices have been adopted by an increasing number of data providers, resulting in the creation of a global data space on the Web containing billions of RDF triples. Just as the Web has brought about a revolution in the publication and consumption of documents, Linked Data has the potential to enable a revolution in how data is accessed and utilised. . . . Linked Data realizes the vision of evolving the Web into a global data commons, allowing applications to operate on top of an unbounded set of data sources, via standardised access mechanisms. . . . We expect that Linked Data will enable a significant evolutionary step in leading the Web to its full potential.”

Examples of major users of open linked data

The BBC

The BBC is one of the largest corporations now totally committed to using RDF to store information. The BBC’s World Cup 2010 website used a high-performance dynamic semantic publishing framework underpinned by RDF and appropriate ontologies, providing far deeper and richer use of content than could have been achieved through traditional publishing solutions. Similarly, the BBC Music website is built on Linked Data and RDF, and provides a RESTful API for querying its data, and the entire BBC Natural History web site is powered by RDF, with its own Wildlife Ontology. The BBC got to this place by hiring bright people who have relevant semantic web skills.

The following diagram shows part of the linked data world, taken from the linked data cloud diagram prepared in 2010 by Richard Cyganiak. More recent versions of the diagram are too cluttered to see the details, because of the ongoing growth of the web of linked data. However, this one shows clearly how BBC Music, by making its descriptions of music, artists, orchestras and bands freely available online in RDF, has become a global resource to which others are linking.

Nature Publishing Group

On 4th April 2012, the Nature Publishing Group published as open linked data the bibliographic records of all their journal articles dating back to 1869. The following is taken from their press release describing this:

“Nature Publishing Group (NPG) today is pleased to join the linked data community by opening up access to its publication data via NPG’s Linked Data Platform, available at http://data.nature.com. The platform includes more than 20 million Resource Description Framework (RDF) statements, including primary metadata for more than 450,000 articles published by NPG since 1869. These datasets are being released under an open metadata license, Creative Commons Zero (CC0), which permits maximal use/re-use of this data.

“NPG’s platform allows for easy querying, exploration and extraction of data and relationships about articles, contributors, publications, and subjects. Users can run web-standard SPARQL Protocol and RDF Query Language (SPARQL) queries to obtain and manipulate data stored as RDF. The platform uses standard vocabularies such as Dublin Core, FOAF, PRISM, BIBO and OWL, and the data is integrated with existing public datasets including CrossRef and PubMed.

“NPG is delighted to be able to surface data on published articles from Nature and many other journals, going back to 1869,” said Jason Wilde, Business Development Director, NPG. “Linked data is an important next step in the evolution of scientific publishing and, over the coming months, we hope to be able to expose more meta-data on our content to enrich the semantic web.”

“Linked data refers to the publishing of structured data that is linked to other related data. It allows users to query, explore and link data from datasets across the web. NPG joins governments from around the world and other organizations including the British Library, the New York Times and the Open University, in providing a linked data platform.”

RDF library catalogues – what are others doing?

WorldCat

WorldCat is the largest online public access catalogue (OPAC) in the world, providing access to a global network of library content and services. It is run by OCLC (Online Computer Library Center, Inc.), a non-profit membership computer library service and research organization dedicated to the public purposes of furthering access to the world’s information and reducing information costs, founded in 1967 as the Ohio College Library Center. WorldCat presently contains ~20 million records, and its Search API provides access to a FRBR-ized set of WorldCat bibliographic records and holdings.

In June 2012, OCLC dramatically increased its exposure of linked data resources by making available a downloadable linked data set for 1.2 million WorldCat resources. The WorldCat.org bibliographic metadata has been created using simple Schema.org mark-up and library vocabulary extensions.

On 6 November 2012, OCLC announced that the National Library of Poland (Biblioteka Narodowa) will add 1.3 million Polish library records to WorldCat, enriching the world’s largest resource for discovery of library materials and increasing the visibility of these collections for researchers around the world. These entries will become available as linked data.

The British Library

The British Library has recently published an open version of the British National Bibliography which it is making available in RDF as Linked Open Data. The initial offering includes published books and serial publications published or distributed in the UK since 1950.

Examples of its encoding, formatted in RDF/XML and including some terms from the preliminary version of RDA, are available for books and serials. A subset of the entries in the BNB serials RDF/XML example, converted into Turtle format, is also available.

The Open Library

The Internet Archive’s Open Library is a Wikipedia-like open, editable library catalogue that aims to create a web page for every book ever published. To date, it has over 20 million records. Open Library has an open RESTful API that permits Open Library data to be programmatically obtained in a variety of formats including RDF.

How to get from existing catalogues to linked data?

Perhaps the first thing to say is that the metadata fields required to permit resource discovery using linked data are not extensive, and should as far as possible use terms from widely-used vocabularies such as Dublin Core and FOAF, supplemented as appropriate by terms from more specialized bibliographic ontologies such as BIBO or the SPAR Ontologies. For example, the RDF metadata provided for the British National Bibliography is reasonably straightforward, involving:

provision of authors’ names, title and abstract as marked up text strings;
use of concepts such as agent, label, language, location, name, publication start date, and subject to describe bibliographic entities;
specification of identifiers such as ISSN and internal British National Bibliography ID numbers;
specification of the nature of the bibliographic resource, e.g. periodical;
description of subject categories defined by the Dewey Decimal Classification schema and Library of Congress Subject Headings (already present in the British Library cataloguing data).

Thus, if one’s internal catalogue data are in good shape – and academic library cataloguers are renowned for the attention to detail with which their catalogues are curated – and are also available programmatically through an API, it should be fairly straightforward (a) to specify the metadata items one wishes to expose as open linked data, and (b) to create a script that will automatically convert from the existing format to RDF.

An example is the conversion of the records of the National Library of Sweden, which have been online since 1997. That library opened its catalogue as open linked data in 2008, as described in [3], [4] and [5].

Other examples of mapping from non-RDF bibliographic metadata to RDF are given by our recent mappings of DataCite XML metadata to RDF [6] and, separately, of JATS (the Journal Article Tag Suite) XML markup to RDF [7].
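The conversion step described above, a script that turns selected catalogue fields into RDF, might look roughly like the following Python sketch. It is an illustration only and not the method used by any of the libraries mentioned: the record layout, the field names, the example URI and ISSN, and the choice of bibo:Periodical, bibo:issn and dcterms properties are all assumptions.

```python
PREFIXES = (
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix bibo: <http://purl.org/ontology/bibo/> .\n"
)

# Hypothetical mapping from internal catalogue field names to RDF properties.
FIELD_TO_PROPERTY = {
    "title": "dcterms:title",
    "issn": "bibo:issn",
    "language": "dcterms:language",
}


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def record_to_turtle(record):
    """Serialize one catalogue record (a dict) as Turtle statements."""
    statements = ["a bibo:Periodical"]
    for field, prop in FIELD_TO_PROPERTY.items():
        if field in record:  # only expose the fields that are present
            statements.append(f'{prop} "{escape_literal(record[field])}"')
    body = " ;\n    ".join(statements)
    return f"{PREFIXES}\n<{record['uri']}>\n    {body} .\n"


# Invented example record; the URI and ISSN are placeholders.
example = {
    "uri": "http://example.org/serial/1",
    "title": "An Example Journal",
    "issn": "1234-5678",
}
turtle = record_to_turtle(example)
print(turtle)
```

Keeping the field-to-property mapping in a single table reflects step (a) above: deciding which metadata items to expose is separate from the mechanical conversion in step (b).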

The future of the MARC 21 cataloguing standard, and the potential for the FRBR-based RDA (Resource Description and Access) to become the new standard for library cataloguing, are quite separate topics, beyond the expertise of this writer and the scope of the papers in this series. However, the following sources are of relevance [8-11].

References

[1] Berners-Lee, Tim and Hendler, James (2001). Scientific publishing on the ‘semantic web’. Nature 410: 1023-1024. http://www.nature.com/nature/debates/e-access/Articles/bernerslee.htm.

[2] Bizer C, Heath T and Berners-Lee T (2009). Linked data – the story so far. International Journal on Semantic Web and Information Systems 5 (3): 1–22. http://eprints.soton.ac.uk/271285/1/bizer-heath-berners-lee-ijswis-linked-data.pdf.

[3] Anders Söderbäck (2009). Why libraries should embrace Linked Data. National Library of Sweden presentation, available from http://code4lib.org/files/LIBRIS_code4lib.pdf.

[4] Martin Malmsten (2008). Making a Library Catalog Part of the Semantic Web. In: 8th International Conference on Dublin Core and Metadata Applications (Berlin): Metadata for semantic and social applications. pp 146-152. Available from http://libris.kb.se/resource/bib/11306748.

[5] Martin Malmsten (2009). Exposing Library Data as Linked Data. Available from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.181.860&rep=rep1&type=pdf.

[6] Shotton D (2012). Revising the DataCite2RDF Mapping Document. http://opencitations.wordpress.com/2012/12/04/revising-datacite2rdf-mapping/

[7] Peroni S, Lapeyre DA and Shotton D (2012) From Markup to Linked Data: Mapping NISO JATS v1.0 to RDF using the SPAR (Semantic Publishing and Referencing) Ontologies. Proc. 2012 JATS Conference, National Library of Medicine, Bethesda, Maryland, USA, 16-17 October 2012. http://www.ncbi.nlm.nih.gov/books/NBK100491/.

[8] Coyle, Karen (2012). Linked Data Tools: Connecting on the Web. Library technology report for ALA TechSource. Available from http://www.alatechsource.org/taxonomy/term/106/linked-data-tools-connecting-on-the-web.

[9] Cronin, C. (2011). Will RDA mean the death of MARC? University of Chicago paper. Available from http://chicago.academia.edu/ChristopherCronin/Talks/33602/Will_RDA_Mean_the_Death_of_MARC_The_Need_for_Transformational_Change_to_our_Metadata_Infrastructures.

[10] JISC mailing list https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=DC-RDA has an online discussion on issues surrounding RDA, including encoding Marc21 in RDF.

[11] Kiorgaard, D. (2006). RDA and MARC21. Library of Congress paper, available from http://www.loc.gov/marc/marbi/2007/5chair12.pdf.

Posted in Linked data, Metadata, Ontologies, Semantic Publishing | Tagged bibliographic records, citation data, Journal articles, library catalogue, linked data, machine-readable metadata, ontologies, RDF, semantic publishing, standards

Libraries and linked data #5: Using the SPAR ontologies to publish bibliographic records

Posted on March 1, 2013 by David Shotton

The SPAR (Semantic Publishing and Referencing) Ontologies are a suite of complementary and orthogonal ontologies written in the latest version of the Web Ontology Language OWL 2 DL, that have been specially created to permit information relating to bibliographic entities to be encoded in RDF. These SPAR ontologies are thus of specific relevance to the academic publishing and library communities, and are described at http://purl.org/spar/ as well as in earlier posts in this Semantic Publishing blog. In addition, FaBiO and CiTO are described in detail in a recent paper in the Journal of Web Semantics [1].

The original eight ontologies within this growing suite are as follows:

CiTO, the Citation Typing Ontology
http://purl.org/spar/cito

CiTO, the Citation Typing Ontology, is an ontology written to enable the existence of bibliographic reference citations to be asserted, and their nature or type to be characterized, both factually and rhetorically, and to permit these descriptions to be published on the Web.

The citations characterized may be direct and explicit (as in the reference list of a journal article), indirect (e.g. a citation to a more recent paper by the same research group on the same topic), or implicit (e.g. as in artistic quotations or parodies, or in cases of plagiarism).

Examples:

:paperA cito:cites :paperB ;
        cito:reviews :paperB ;
        cito:critiques :paperB .

(Note: For an explanation of the Turtle syntax used to encode the examples given in this paper, see the earlier post Libraries and linked data #2: A rough guide to Turtle. Background information on RDF, and on its use to encode bibliographic records, is given in two other posts: Libraries and linked data #1: What are linked data? and Libraries and linked data #3: Encoding bibliographic records in RDF.)

The CiTO properties are summarized in the following diagram:

In this diagram, all the CiTO properties shown, other than cito:sharesAuthorsWith, are sub-properties of cito:cites. The inverse property of cito:cites, namely cito:isCitedBy, and its inverse sub-properties, and the recently added properties cito:compiles, cito:isCompiledBy and cito:likes, are not shown in this diagram.

None of the CiTO properties, which are all object properties, have domain or range restrictions, permitting their use in a variety of other contexts, in addition to conventional bibliographic citations.
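One practical consequence of the sub-property structure is that a statement made with a specific CiTO property also implies the general statement made with cito:cites. The following Python sketch illustrates this with a tiny hand-written extract of the hierarchy. It assumes that cito:reviews and cito:critiques are among the sub-properties of cito:cites, and it only follows a single level of the hierarchy; a real application would load the ontology and use an RDFS or OWL reasoner instead.

```python
# Hypothetical, hand-written extract of the CiTO property hierarchy.
SUB_PROPERTY_OF = {
    "cito:reviews": "cito:cites",
    "cito:critiques": "cito:cites",
}


def infer_super_properties(triples):
    """Add, for each asserted triple, the triple implied by its parent property."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        parent = SUB_PROPERTY_OF.get(prop)
        if parent is not None:
            inferred.add((subject, parent, obj))
    return inferred


# Only the specific statement is asserted ...
asserted = {(":paperA", "cito:reviews", ":paperB")}
# ... but the general citation statement follows from it.
closure = infer_super_properties(asserted)
for triple in sorted(closure):
    print(" ".join(triple), ".")
```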

BiRO, the Bibliographic Reference Ontology
http://purl.org/spar/biro

BiRO is an ontology structured according to the FRBR model (see below) that provides a logical system for describing an individual bibliographic reference, such as appears in the reference list of a published article (which, depending on the house style of the journal in which the citing article appears, may lack the title of the cited article, the full names of the listed authors, or indeed the full list of authors), and the relationship of that reference to the complete bibliographic record for that cited article, which in addition to having the reference fields missing from the reference, may also include the name of the publisher, and the ISSN or ISBN of the publication.

BiRO also permits a description of the compilation of bibliographic references into bibliographic lists such as reference lists, and of the compilation of bibliographic records into bibliographic collections such as library catalogues.

Example:

:this-reference a biro:BibliographicReference ;
    frbr:partOf [ a biro:ReferenceList ] ;
    biro:references :that-paper .

The following diagram, expressed using Graffoo, the Graphical Framework For OWL Ontologies created by Silvio Peroni, shows the relationships of the classes in the complete BiRO ontology:

Note the symmetry of the diagram, in which bibliographic references and reference lists in the lower half of the diagram are classified as FRBR Expressions, while bibliographic records and library catalogues in the upper half of the diagram are classified as FRBR Works, and note also that the Collections Ontology (prefix co:) is used to describe ordered lists of references and sets of bibliographic records.

Not shown in the diagram is the fact that both Works and Expressions can have different Manifestations. Thus, for example, a library catalogue can be manifested either as index cards or as an online catalogue.

FaBiO, the FRBR-aligned Bibliographic Ontology
http://purl.org/spar/fabio

FaBiO is an ontology for recording and publishing on the Semantic Web descriptions of entities that are published or potentially publishable, and that contain or are referred to by bibliographic references, or entities used to define such bibliographic references. FaBiO entities are primarily textual publications such as books, magazines, newspapers and journals, and items of their content such as poems, conference papers and editorials. However, they also include blogs, web pages, datasets, computer algorithms, experimental protocols, formal specifications and vocabularies, legal records, governmental papers, technical and commercial reports and similar publications, and also anthologies, catalogues and similar collections. FaBiO uses terms from the RDF versions of FRBR and PRISM.

Example:

:that-article a fabio:JournalArticle ;
    fabio:hasPublicationYear "2009"^^xsd:gYear ;
    prism:doi "10.1002/asi.21134" ;
    frbr:partOf [ a fabio:JournalIssue ;
        prism:issueIdentifier "9" ;
        frbr:partOf [ a fabio:JournalVolume ;
            prism:volume "60" ;
            frbr:partOf [ a fabio:Journal ;
                dcterms:title "Journal of the American Society for Information Science and Technology" ] ] ] .

The use of FRBR in BiRO and FaBiO

The classes in the BiRO and FaBiO ontologies (but not those in other SPAR ontologies) are structured according to the FRBR (Functional Requirements for Bibliographic Records) model that contains FRBR Works, Expressions, Manifestations and Items, collectively referred to as FRBR Endeavours:

Posted in Linked data, Metadata, Ontologies, Semantic Publishing | Tagged bibliographic records, biro, c4o, citation data, cito, DoCO, fabio, Journal articles, library catalogue, linked data, machine-readable metadata, ontologies, PRO, PSO, PWO, RDF, semantic publishing, spar, standards

Libraries and linked data #4: A Comparison of RDF and XML

Posted on March 1, 2013 by David Shotton

(Note: Understanding of this paper will be enhanced by prior reading of the earlier papers in this series:

Libraries and linked data #1: What are linked data?
Libraries and linked data #2: A rough guide to Turtle.
Libraries and linked data #3: Encoding bibliographic records in RDF.)

Precise semantics for markup terms

A cornerstone of the Semantic Web is the use of public ontologies to give precise and universally available definitions to terms, so that RDF statements are unambiguous in their meaning. This is not the case in the world of XML, for all its flexibility and widespread use: there, markup terms can take on different meanings depending upon who is using them, reminiscent of Humpty Dumpty’s statement in Lewis Carroll’s Through the Looking-Glass [1]:

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean — neither more nor less.”

We came across this difference in world views recently, when we were using the SPAR Ontologies to map to RDF the new Journal Article Tag Suite (JATS), published on 22 August 2012 as ANSI/NISO Z39.96-2012, JATS: Journal Article Tag Suite (version 1.0). JATS v1.0 is the successor to the National Library of Medicine (NLM) DTD v3.0, a de facto standard for the XML markup of scholarly journal articles, that is widely used both by many academic publishers within their routine publication workflows and also as the ingest and export format for PubMed Central.

For JATS, this ambiguity is by design. JATS is a descriptive, not a prescriptive model, that endeavours to capture and document the actual practice of current publishing. It does not tell publishers what they should call their content. Rather, if a term is widely used in practice, it is likely to appear in the JATS, which aims to provide a vocabulary that will be used more or less consistently across publishers. Furthermore, suggested values for JATS elements and attributes lists are just that – suggested, since JATS provides structures for recording different types of information, but does not attempt to regularize their usage.

For example, the JATS documentation describes the central element <article> as follows:

<article> ... </article>

"Usage: This element can be used to describe not only typical 
journal articles (research articles) but also much of the 
non-article content within a journal, such as book and product 
reviews, editorials, commentaries, and news summaries."

Thus the JATS element <article> may be used to describe an XML representation of a research article, but may also be used to describe an XML representation of many other kinds of journal content, such as an editorial, an obituary, a list of events, a book review, a puzzle, a game, a quiz, an interview or a photo-essay, depending upon the meaning an individual publisher chooses for this tag element. This clearly goes beyond what the average person means by “journal article”.

In other words, the JATS standard is deliberately vague and non-committal about the meaning of many terms, because there is no intention to tell any publisher what or how much metadata to publish.

As a consequence of this ‘loose’ design, the first barrier we came up against when mapping JATS to RDF was that a JATS element might mean what its name implies, but equally it might be used by some publishers to mean something entirely different!

What people mean most frequently when they use the JATS element <article> is what is defined in FaBiO, the FRBR-aligned Bibliographic Ontology, as a fabio:JournalArticle:

fabio:JournalArticle rdfs:comment

"An article, typically the realization of a research paper reporting 
original research findings, published in a journal issue." .

However, as we have seen, it can also mean fabio:JournalEditorial, fabio:JournalNewsItem, fabio:BookReview, fabio:ProductReview, etc., all of which are journal content items. These various meanings could all be mapped to RDF generically, as follows:

:periodical-entity a fabio:PeriodicalItem ;
     frbr:partOf [ a fabio:JournalIssue ] .

However, that does not solve the problem entirely, since it is also permissible to use JATS <article> to describe textual entities before they appear in a journal issue, for example to describe a preprint or a revised manuscript being re-submitted to a publisher. Clearly, this brings problems for unambiguous XML mapping to specific RDF terms.

In our JATS2RDF mapping work [2, 3], our solution to this dilemma has been, where necessary, to map the entity described by <article> to :textual-entity, a resource name that is so broad that it includes all relevant possibilities, since they are all textual entities, thereby achieving semantic accuracy, if not detailed specificity.
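The fallback strategy can be illustrated with a short Python sketch. This is not the JATS2RDF mapping itself: it assumes, purely for illustration, that a publisher records one of the suggested values of the JATS article-type attribute on <article>, and that a handful of those values can be paired with FaBiO classes. Whenever the attribute is missing or unrecognised, the function returns None, signalling that the entity should simply be left as the broad :textual-entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical pairing of some suggested article-type values with FaBiO classes.
ARTICLE_TYPE_TO_CLASS = {
    "research-article": "fabio:JournalArticle",
    "editorial": "fabio:JournalEditorial",
    "book-review": "fabio:BookReview",
    "product-review": "fabio:ProductReview",
}


def choose_class(jats_xml):
    """Return a specific FaBiO class for <article>, or None if unsure."""
    root = ET.fromstring(jats_xml)
    if root.tag != "article":
        return None
    # root.get() yields None when the attribute is absent.
    return ARTICLE_TYPE_TO_CLASS.get(root.get("article-type"))


for sample in (
    '<article article-type="editorial"/>',
    '<article article-type="puzzle"/>',
    "<article/>",
):
    specific = choose_class(sample)
    print(sample, "->", specific or ":textual-entity (no specific class)")
```

The design choice mirrors the trade-off described above: a specific class is asserted only when there is evidence for it, and otherwise accuracy is preferred over specificity.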

Hierarchies versus triples

A further clear difference between XML and RDF concerns the structural organisation of items (i.e. the elements in XML and the resources in RDF). XML is able to structure elements according to a particular containment order, thus creating hierarchies of nested elements. Such a containment relation between two XML elements always carries a particular semantics, although it is not formalised and implicitly lives outside the XML schema of the language.

Let us consider the following two excerpts of JATS markup:

<article-meta>
   <title-group>
      <article-title>
         Dealing With Markup Semantics
      </article-title>
   </title-group>
</article-meta>

<element-citation>
   <article-title>
      Dealing With Markup Semantics
   </article-title>
</element-citation>

Above, the element <article-title> is used in two different contexts, thus having two alternative interpretations.

In the former excerpt (i.e. when it is a descendant of the element <article-meta>) <article-title> is the title of the article under consideration, which can be simply represented in RDF:

:textual-entity fabio:hasTitle "XXX" .

In the latter excerpt (i.e. when it is a descendant of the element <element-citation>), it represents the title of another bibliographic work that is cited by the article under consideration in one of the references in its reference list. This could be represented in RDF as follows:

:textual-entity-A frbr:part :reference .

:reference a biro:BibliographicReference ;
     biro:references :textual-entity-B .

:textual-entity-B fabio:hasTitle "XXX" .

This says that the citing paper “A” contains a reference that refers to the cited paper B, and that the cited paper B has the title “XXX”. Here, the title “XXX” belongs to the referenced work.

However, this is a mis-interpretation. What the original XML actually means is subtly different – namely that the title “XXX” is part of the text of the reference itself, within the reference list that makes up part of the citing paper A. To express this in RDF, the encoding has to be different:

:textual-entity-A frbr:part :reference .

:reference a biro:BibliographicReference ;
     frbr:part [ a doco:Title ;
          literal:hasLiteralValue "XXX" ] .

Here, what we are saying is that part of the reference itself is a title with the string value “XXX”.
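The dependence of meaning on context can be made explicit in code. The following Python sketch, using only the standard library, walks up from each <article-title> element to find out which container it sits in, and reports the corresponding interpretation. The wrapper elements (<article>, <front>, <back>, <ref-list>, <ref>) have been added around the two excerpts as an assumption, simply to make a single well-formed document, and the interpretation labels are informal descriptions rather than terms from any ontology.

```python
import xml.etree.ElementTree as ET

# The two excerpts, wrapped in assumed container elements.
JATS = """
<article>
  <front><article-meta><title-group>
    <article-title>Dealing With Markup Semantics</article-title>
  </title-group></article-meta></front>
  <back><ref-list><ref><element-citation>
    <article-title>Dealing With Markup Semantics</article-title>
  </element-citation></ref></ref-list></back>
</article>
"""


def interpret_titles(xml_text):
    """Pair each <article-title> with the meaning implied by its ancestors."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    results = []
    for title in root.iter("article-title"):
        ancestors = set()
        node = title
        while node in parent_of:
            node = parent_of[node]
            ancestors.add(node.tag)
        if "element-citation" in ancestors:
            role = "title text inside a bibliographic reference"
        elif "article-meta" in ancestors:
            role = "title of the article itself"
        else:
            role = "unknown context"
        results.append((title.text.strip(), role))
    return results


for text, role in interpret_titles(JATS):
    print(f"{text!r}: {role}")
```

The same element name and the same string yield two different interpretations, which is why a mapping to RDF has to be driven by the containment path rather than by the element name alone.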

Sorting out the semantics hidden behind XML containment relationships is one of the main issues one has to address when trying to map XML schemas to RDF vocabularies correctly, because:

RDF is not able to represent the hierarchical relation of XML elements using native constructs, since everything is described as a ‘flat’ graph of resources; and
the semantics of the containment of XML elements, such as the aforementioned <article-meta>/<article-title> and <element-citation>/<article-title>, is neither explicitly nor formally defined – it can live either in a natural language definition of the element, or in the mind of the developer of the schema, or, sometimes, in the mind of the author of the XML document.

Of course, RDF can express hierarchical relationships, which are clearly defined by the description logic of the ontologies from which terms are used. Thus fabio:hasShortTitle and fabio:hasTranslatedTitle are both defined as sub-properties of dcterms:title. However, such hierarchical definitions represent taxonomies, and do not address the contextual semantics of XML determined by containment relationships.

For more on this topic, readers are referred to an interesting yet simple comparison of XML and RDF made by Tim Berners-Lee, in one of his informal and highly influential Design Issues papers entitled Why the RDF model is different from the XML model [4], in which he attempts to answer the question “Why should I use RDF – why not just XML?”

This post is jointly authored by David Shotton, University of Oxford, and Silvio Peroni, University of Bologna, and is taken in part from reference [2].

References

[1] Lewis Carroll (1871). Through the Looking-Glass. In: Alice’s Adventures in Wonderland and Through the Looking-Glass, 2009 edition: Oxford University Press. ISBN 978-0-19-955829-2.

[2] Peroni S, Lapeyre DA and Shotton D (2012). From Markup to Linked Data: Mapping NISO JATS v1.0 to RDF using the SPAR (Semantic Publishing and Referencing) Ontologies. Proc. 2012 JATS Conference, National Library of Medicine, Bethesda, Maryland, USA, 16-17 October 2012. Available from http://www.ncbi.nlm.nih.gov/books/NBK100491/.

[3] Peroni S, Lapeyre DA and Shotton D (2012). JATS2RDF (v1.2): a mapping of JATS metadata to RDF. Available from http://purl.org/spar/jats2rdf.

[4] Berners-Lee, Tim (1998). Why the RDF model is different from the XML model: An attempt to explain the difference between the XML and RDF models. http://www.w3.org/DesignIssues/RDF-XML.html.

Posted in Linked data, Metadata, Ontologies, Semantic Publishing | Tagged Document markup, linked data, machine-readable metadata, ontologies, RDF, semantic publishing, standards, XML

Libraries and linked data #3: Encoding bibliographic records in RDF

Posted on March 1, 2013 by David Shotton

Bibliographic index card records

Although the majority of library catalogues are now digitized, under the hood most continue to use an index card paradigm similar to the one shown below for my CiTO paper [1], which uses PubMed tag-value pairs to encode key bibliographic information.

Note that in this bibliographic record, there are no hierarchical structures and no explicit relationships between the statements, although there is an implicit presumption that all the tag-value pairs recorded on this card relate to the same article.

Bibliographic records as RDF graphs

In contrast, a generic RDF graph encoding such bibliographic information looks like this:

Here, the primary information graph about the paper itself links to other graphs describing the author and the publisher, creating a small web of linked data.

[Note: An introduction to RDF and linked data is given in the first paper in this series, entitled What are linked data?]

Clearly, such an RDF graph is not easily machine-readable. However, it can be written out (‘serialized’) as a series of simple machine-readable RDF statements which, in Turtle notation (Terse RDF Triple Language), are as follows:

<http://dx.doi.org/10.1186/2041-1480-1-S1-S6>
          # URI of the CiTO paper in Journal of Biomedical Semantics
     rdf:type fabio:JournalArticle ;

     dc:title "CiTO, the Citation Typing Ontology" ;
     fabio:hasPublicationYear "2010"^^xsd:gYear ;
     prism:publicationDate "2010-06-22"^^xsd:date ;
     dcterms:bibliographicCitation "Shotton D (2010). CiTO, the Citation Typing Ontology. J. Biomed. Semant. 1,S1: S6." ;
     prism:doi "10.1186/2041-1480-1-S1-S6" ;
     fabio:hasPubMedId "20626926" ;

     dcterms:publisher [ rdf:type foaf:Organization ;
          foaf:name "BioMed Central" ;
          foaf:homepage <http://www.biomedcentral.com/> ] ;

     dcterms:creator [ rdf:type foaf:Person ;
          foaf:name "David Shotton" ;
          foaf:mbox <mailto:[email protected]> ;
          foaf:workplaceHomepage
               <http://www.zoo.ox.ac.uk/staff/academics/shotton_dm.htm> ] .

[Note: A guide to understanding Turtle for the uninitiated is given in the previous post in this series, entitled Libraries and linked data #2: A rough guide to Turtle.]

Notice the compact and easily comprehensible nature of this encoding. Note also how terms (class and property names) from different ontologies and structured vocabularies have been combined to create these RDF statements. Such use of terms from pre-existing well-used ontologies, such as the Dublin Core Metadata Initiative metadata terms and the Friend of a Friend Vocabulary, is good practice when creating RDF descriptions, because it builds on previous effort where possible, and reduces the number of new ontological descriptions that are required. A list of open linked data vocabularies, useful for finding required terms in existing vocabularies, is given by the Open Knowledge Foundation’s Linked Open Vocabularies site. Those specific for libraries – one of the biggest clusters – are given at http://lov.okfn.org/dataset/lov/details/vocabularySpace_Library.html.

Other RDF statements could be added to the RDF graph given above, for example detailing the author’s institutional affiliation, thereby enriching the information content of this graph of linked data.

If other RDF graphs are published by third parties in which BioMed Central is similarly defined as a publisher, then the CiTO graph given above can be combined automatically with the others to form an interconnected information network – a larger RDF graph of ‘linked data’ about bibliographic entities and their publishers – in which the truth content of each original statement is maintained, thereby enlarging the web of knowledge, the Semantic Web.
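As a rough illustration of this kind of merging, here is a minimal Python sketch that treats each RDF graph as a set of (subject, predicate, object) tuples. It is not how a triple store works internally, and the example.org URIs, including the one standing for BioMed Central, are hypothetical. The sketch assumes that both graphs identify the publisher by the same URI, since it is the shared identifier that allows the graphs to join up.

```python
# Each graph is a set of (subject, predicate, object) triples.
# All example.org URIs below are hypothetical, for illustration only.
BMC = "http://example.org/org/biomed-central"

cito_graph = {
    ("http://dx.doi.org/10.1186/2041-1480-1-S1-S6", "dcterms:publisher", BMC),
    (BMC, "foaf:name", "BioMed Central"),
}

third_party_graph = {
    ("http://example.org/article/another-paper", "dcterms:publisher", BMC),
    (BMC, "foaf:homepage", "http://www.biomedcentral.com/"),
}

# Merging is a set union: every original statement is kept unchanged.
merged = cito_graph | third_party_graph


def published_by(graph, publisher):
    """Return, in sorted order, the subjects that have the given publisher."""
    return sorted(s for (s, p, o) in graph
                  if p == "dcterms:publisher" and o == publisher)


if __name__ == "__main__":
    for article in published_by(merged, BMC):
        print(article)
```

After the union, both articles point at the same publisher node, so a single question about the publisher now draws on statements from both sources.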

Of course, RDF is only one of several ways of storing bibliographic data. In the Open Citations Corpus, for example, we store the data internally in BibJSON format, a compact JSON format adapted for bibliographic information, and then convert it to RDF using an XSLT transformation for external exposure, as detailed in a previous post.

Reference

[1] Shotton, David (2010). CiTO, the Citation Typing Ontology. J. Biomedical Semantics 1 (Suppl. 1): S6. http://dx.doi.org/10.1186/2041-1480-1-S1-S6.

Libraries and linked data #2: A rough guide to Turtle

Posted on March 1, 2013 by David Shotton

The purpose of this post is to provide a simple guide to help the uninitiated understand RDF documents written in Turtle.

[Note 1: An introduction to the purpose of RDF in creating linked data has already been given in the previous post, entitled Libraries and linked data #1: What are linked data?, which should be read first.]

Turtle (Terse RDF Triple Language) is a syntax and grammar for RDF serialization written by David Beckett and Tim Berners-Lee in 2008, created as an alternative syntax to RDF/XML. It allows RDF graphs to be written out (‘serialized’) in a compact and natural text form that is easier for humans to read than RDF/XML.

The official W3C Turtle documentation gives authoritative details of Turtle. However, that documentation is highly condensed, and requires some prior knowledge of the field for its understanding. Furthermore, while it provides statements and examples, it provides little by way of explanations.

This present document, in contrast, is incomplete and is not intended to be definitive. Rather, it is intended to provide the naïve Turtle reader with just sufficient information to make sense of a typical Turtle document without having to resort to the official Turtle documentation. It is presented as a series of explanatory statements, with examples where necessary, making no assumptions about the reader’s prior knowledge.

1. Each RDF triple or set of triples expressed in Turtle starts on a new line, and ends with a full stop.
2. Blank spaces (‘white space’) are used to separate two items that might otherwise be mistaken as being one, and are typically added before and after punctuation marks to aid clarity. Changing the number of blank spaces or inserting line breaks between RDF entities does not change the meaning of the RDF (except for comments and literals – see points 3 and 4).
3. Comments are preceded by a # symbol and a space, and continue to the end of the line. If a comment extends over more than one line, each line of the comment must start with a # symbol and a space. Comments are solely to guide human readers, do not form part of the RDF information, and are ignored when machine-processing the RDF.
4. Literals, for example personal names, are written either using double-quotes when they do not contain line breaks, e.g. “David Shotton”, or (rarely) using three sets of double-quotes when they may contain line breaks.

5. URIs are written enclosed in angle brackets, thus:

 <http://dx.doi.org/10.1186/2041-1480-1-S1-S6>

6. In defining a subject, a predicate or an object in a Turtle serialization of an RDF graph, the letters preceding a colon in a Turtle statement are an abbreviation for an ontology namespace URI, in which ontology the meaning of the term following the colon is defined. These namespaces are declared at the beginning of a formal Turtle document, but are typically omitted from exemplar excerpts. For example, the namespace declaration:

@prefix dc: <http://purl.org/dc/elements/1.1/> .

defines the Dublin Core Elements prefix “dc:” as representing the URI <http://purl.org/dc/elements/1.1/>, where the Dublin Core metadata elements are defined, so that

<http://dx.doi.org/10.1186/2041-1480-1-S1-S6> 
    dc:creator "David Shotton" .

has the same meaning as

<http://dx.doi.org/10.1186/2041-1480-1-S1-S6>
    <http://purl.org/dc/elements/1.1/creator> "David Shotton" .

Similarly,

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

defines the prefix “foaf:” as representing the URI <http://xmlns.com/foaf/0.1/>, where the Friend-of-a-Friend vocabulary terms are defined, and

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

defines the prefix “xsd:”, representing <http://www.w3.org/2001/XMLSchema#>, the XML Schema namespace, where data types such as date (^^xsd:date) are defined.
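The substitution performed by a namespace declaration can be mimicked in a few lines of Python. The sketch below is only an illustration of the idea, not a Turtle parser: the function name expand_prefixed_name is my own, and the prefix table simply copies the three declarations given above.

```python
# Namespace declarations copied from the @prefix lines in the text.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_prefixed_name(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dc:creator' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # The full URI is the namespace URI followed by the local term name.
    return prefixes[prefix] + local


if __name__ == "__main__":
    for name in ("dc:creator", "foaf:Person", "xsd:date"):
        print(name, "->", expand_prefixed_name(name))
```

A name whose prefix has not been declared raises an error here, which mirrors the fact that a Turtle document using an undeclared prefix is not valid.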

7. If the term following the colon starts with a capital letter, e.g. foaf:Person, by convention it represents an ontology class, in this case defined at http://xmlns.com/foaf/0.1/Person, where the definition reads as follows:

“A person. The Person class represents people. We don’t nitpic about whether they’re alive, dead, real, or imaginary.”

8. Similarly, the names of object properties and data properties, i.e. the predicates in RDF subject-predicate-object triples, by convention start with lower case letters, as in dc:creator. Thus the data property dc:creator links a resource (the subject) to the name of its creator (the object, a text string), defined by Dublin Core Elements at
```
<http://purl.org/dc/elements/1.1/creator>
```
as follows:

“An entity primarily responsible for making the resource”.

(Note 2: The Dublin Core metadata elements and the Dublin Core Metadata Initiative (DCMI) metadata terms were among the earliest to be defined, at a time when the distinction between object property and data property was not made in the formal specifications.)

For usage of DCMI metadata terms, defined by the abbreviation dcterms: and the namespace declaration
```
@prefix dcterms: <http://purl.org/dc/terms/> .
```
we therefore follow the guidance given in the Dublin Core User Guide / Publishing Metadata, as to whether a particular DCMI metadata term is to be used as an object property, whose object is specified by a unique URI defining either a Web resource or a member of an ontology class, or as a data property, taking a literal as its object. The 15 metadata elements of the original ‘legacy’ Dublin Core Elements set are all treated as data properties, taking literals as their objects.

Sometimes a choice exists between using a DCMI object property or a DC Elements data property, as in the following examples:

dc:creator (data property) – exemplar usage:
```
:my-dataset dc:creator "David Shotton" .
```
dcterms:creator (object property) – exemplar usage:
```
:my-dataset dcterms:creator 
    [ rdf:type foaf:Person ; foaf:name "David Shotton" ] .
```
[In this example, the object of the object property dcterms:creator is defined by a ‘blank node’ as something that is of type “Person” and has a name “David Shotton”, the property foaf:name being a data property taking the literal text string “David Shotton” as its object.]

In this situation, best practice is to use DCMI metadata terms (dcterms:) as object properties in preference to Dublin Core metadata elements (dc:) as data properties, unless one specifically needs to use a literal as the object of an RDF triple.
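Whether a given triple uses a property as a data property or as an object property can be read off from the way its object is written. The following Python sketch makes that check explicit. It is a simplification under stated assumptions: the function name object_kind is my own, and it looks only at the first characters of the object as written in Turtle, so it is no substitute for a real parser.

```python
def object_kind(term):
    """Classify the object of a triple by the way it is written in Turtle."""
    term = term.strip()
    if term.startswith('"'):
        return "literal"                   # e.g. "David Shotton"
    if term.startswith("<") and term.endswith(">"):
        return "resource (full URI)"       # e.g. <http://example.org/x>
    if term.startswith("_:") or term.startswith("["):
        return "blank node"                # e.g. [ rdf:type foaf:Person ]
    if ":" in term:
        return "resource (prefixed name)"  # e.g. foaf:Person
    raise ValueError(f"not a recognisable Turtle term: {term!r}")


if __name__ == "__main__":
    # dc:creator used as a data property: the object is a literal.
    print(object_kind('"David Shotton"'))
    # dcterms:creator used as an object property: the object is a blank node.
    print(object_kind('[ rdf:type foaf:Person ; foaf:name "David Shotton" ]'))
```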

(Note 3: The use of square brackets in the Turtle example above to define a blank node is explained in point 11, below.)
9. A colon without a prefix can similarly be used as a simple abbreviation for a URI, useful when providing illustrative examples. Thus, by defining the colon used without preceding letters as referring to the fictitious namespace http://example.org/, as in the following namespace declaration:
```
@prefix : <http://example.org/> .
```
one can then use :my-dataset (as used in point 8, above) to mean a fictitious example dataset
```
<http://example.org/my-dataset> .
```
10. A colon preceded by an underscore, i.e. “_:”, defines a ‘blank node’, i.e. an anonymous resource that has no URI of its own, for which no namespace declaration is required. Note that if a blank node is given a node ID or name, that label is limited in its relevance and scope to the particular RDF graph in question. Thus the node “_:p1” in the following example does not represent the same node as a node labelled “_:p1” in any other RDF graph:
```
:Patrick foaf:knows _:p1 .

_:p1 foaf:birthdate "1985-03-14"^^xsd:date .
```
In this example, the statements claim that Patrick knows someone (who, defined by the blank node, remains anonymous), who was born on 14th March 1985.
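The graph-local scope of blank node labels matters when two graphs are combined. The Python sketch below shows one way this might be handled, by giving every blank node a fresh label during the merge. It is an illustration only: graphs are represented as sets of (subject, predicate, object) strings, any term starting with "_:" is assumed to be a blank node, and the second graph about :Alice is hypothetical.

```python
from itertools import count


def merge_graphs(*graphs):
    """Merge graphs of (s, p, o) triples while keeping blank nodes apart.

    Each graph's blank node labels are renamed, so that '_:p1' in one
    graph never collides with '_:p1' in another.
    """
    fresh = count(1)
    merged = set()
    for graph in graphs:
        renamed = {}  # old label -> new label, local to this one graph

        def rename(term):
            if term.startswith("_:"):
                if term not in renamed:
                    renamed[term] = f"_:b{next(fresh)}"
                return renamed[term]
            return term

        # Sorting makes the generated labels reproducible between runs.
        for s, p, o in sorted(graph):
            merged.add((rename(s), p, rename(o)))
    return merged


graph_a = {
    (":Patrick", "foaf:knows", "_:p1"),
    ("_:p1", "foaf:birthdate", "1985-03-14"),
}
graph_b = {  # hypothetical second graph that happens to reuse the label
    (":Alice", "foaf:knows", "_:p1"),
    ("_:p1", "foaf:name", "Bob"),
}
merged = merge_graphs(graph_a, graph_b)

if __name__ == "__main__":
    for triple in sorted(merged):
        print(triple)
```

In the merged result the person known to Patrick and the person known to Alice remain two distinct anonymous nodes, even though both source graphs called their node “_:p1”.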

11. A blank node, i.e. an anonymous resource, can also be defined and enclosed by square brackets. This construction is particularly useful if one wishes to say several things simultaneously about the object of a triple. Thus:

<http://dx.doi.org/10.1186/2041-1480-1-S1-S6>
    dcterms:publisher [ rdf:type foaf:Organization ;
        foaf:name "BioMed Central" ;
        foaf:homepage <http://www.biomedcentral.com/> ] .

This means that the article defined by the DOI 10.1186/2041-1480-1-S1-S6 has a publisher which, represented by the blank node, simultaneously has the following three properties:

– is of a type defined by the class foaf:Organization,
– has a name “BioMed Central”, and
– has a home page URI http://www.biomedcentral.com/.

One could say the same thing, more verbosely, by defining the publisher as “:publisher” and by using the following four separate triples:

<http://dx.doi.org/10.1186/2041-1480-1-S1-S6>
     dcterms:publisher :publisher .

:publisher rdf:type foaf:Organization .

:publisher foaf:name "BioMed Central" .

:publisher foaf:homepage <http://www.biomedcentral.com/> .

Square-bracketed blank nodes can be nested, one totally inside another.
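The way in which square brackets stand for separate triples can be sketched in Python. Here a nested dictionary plays the role of the square-bracketed description, and each inner dictionary is given a generated blank node label. This is an illustration under assumptions of my own: the function name flatten and the labels “_:b1”, “_:b2” are invented for the example, and a dictionary can hold each predicate only once, which real Turtle does not require.

```python
from itertools import count


def flatten(subject, description, triples=None, ids=None):
    """Turn a nested description into a flat list of (s, p, o) triples.

    `description` maps each predicate to an object. An object that is
    itself a dict stands for a square-bracketed blank node and receives
    a generated label such as '_:b1'.
    """
    triples = [] if triples is None else triples
    ids = count(1) if ids is None else ids
    for predicate, obj in description.items():
        if isinstance(obj, dict):
            node = f"_:b{next(ids)}"
            triples.append((subject, predicate, node))
            flatten(node, obj, triples, ids)  # nested brackets recurse
        else:
            triples.append((subject, predicate, obj))
    return triples


article = "<http://dx.doi.org/10.1186/2041-1480-1-S1-S6>"
description = {
    "dcterms:publisher": {
        "rdf:type": "foaf:Organization",
        "foaf:name": '"BioMed Central"',
        "foaf:homepage": "<http://www.biomedcentral.com/>",
    }
}

if __name__ == "__main__":
    for s, p, o in flatten(article, description):
        print(s, p, o, ".")
```

The output is four separate triples, the first linking the article to the anonymous publisher node and the remaining three describing that node.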

12. The letter “a”, when used as a predicate, is an abbreviation for “rdf:type”, which is defined in the RDF namespace, declared using the namespace declaration
```
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
```
and means that the subject “is a type of” (i.e. is a member of) the object class. Thus the two following triples have the same meaning:
```
:publisher rdf:type foaf:Organization .

:publisher a foaf:Organization .
```
13. If one wishes to make a series of RDF triple statements about the same subject, these can be abbreviated by stating the subject once, and then separating each predicate-object pair using a semi-colon, as in the examples given in points 8 and 11, above. Thus:
```
:my-article rdf:type fabio:JournalArticle ;
    dc:creator "Shotton, David" ;
    dc:title "CiTO, the Citation Typing Ontology" .
```
and
```
:my-article rdf:type fabio:JournalArticle .

:my-article dc:creator "Shotton, David" .

:my-article dc:title "CiTO, the Citation Typing Ontology" .
```
have the same meaning.

14. Similarly, if one wishes to make a series of RDF triple statements in which the subject and predicate are constant, and only the objects differ, the objects may be separated by commas, as in the following example:
```
:that-article frbr:embodiment :printed , :html , :pdf .
```
which has the same meaning as the following three triples:
```
:that-article frbr:embodiment :printed .

:that-article frbr:embodiment :html .

:that-article frbr:embodiment :pdf .
```
meaning that that article is published in three different ‘manifestations’, in printed form, in a Web page in HTML format, and as a down-loadable PDF file.
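Both abbreviations can be undone mechanically. The short Python sketch below writes out in full every triple that is implied by the semi-colon and comma shorthands. The function name write_out_triples and the input layout (a list of predicates, each with its list of objects) are my own choices for illustration.

```python
def write_out_triples(subject, predicate_objects):
    """Spell out every triple hidden behind ';' and ',' abbreviations.

    `predicate_objects` is a list of (predicate, [object, ...]) pairs.
    Each new pair corresponds to a ';' in Turtle, and each additional
    object within a pair corresponds to a ','.
    """
    return [f"{subject} {predicate} {obj} ."
            for predicate, objects in predicate_objects
            for obj in objects]


embodiments = write_out_triples(
    ":that-article",
    [("frbr:embodiment", [":printed", ":html", ":pdf"])],
)

if __name__ == "__main__":
    for line in embodiments:
        print(line)
```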
15. Subsequent statements relating to the same subject are typically indented from the left solely for ease of reading, as in the first example in point 13, above. The meaning is the same without the indentation. When writing Turtle, such indents should be created using blank spaces rather than word processor tabs.

These explanations should enable a newcomer to Turtle to make sense of RDF expressed using this syntax.

Posted in Linked data, Ontologies, Semantic Publishing | Tagged linked data, machine-readable metadata, ontologies, RDF, semantic publishing, standards | 6 Comments

Libraries and linked data #1: What are linked data?

Posted on March 1, 2013 by David Shotton

[Note: An introduction to this and the following five blog posts, all under the general title Libraries and linked data, is given in the previous post.]

Linked data and RDF

‘Linked data’ are data encoded and published on the Web using three simple rules:

1. People, places, and other things under discussion are identified using HTTP names (i.e. Uniform Resource Identifiers; URIs).
2. Information is expressed in the form of simple relationships between pairs of things.
3. Information is encoded in a standard format.

This is achieved by using Semantic Web standards, particularly the Resource Description Framework (RDF), a general-purpose data model, language and encoding format developed by the World Wide Web Consortium. Information published on the Web in RDF is ‘linked data’.

Richard Cyganiak has created a map of linked data resources, which grows in complexity year by year as new sources are added. Because of its generality, the central node in the map is dbpedia, an RDF representation of the structured information that exists on Wikipedia pages, initially created in 2007 by Chris Bizer of the Free University of Berlin and his colleagues.

The principles of RDF are very simple:

1. Each RDF statement expresses a single simple relationship between two entities, forming a subject–predicate–object ‘triple’, for example “Chris Bizer created dbpedia”.

2. Each subject entity is identified by a unique URI defining either a Web resource or a member of an ontology class, for example

<http://www.w3.org/wiki/ChrisBizer>

3. Each object entity may be identified by a unique URI defining either a Web resource or a member of an ontology class, or alternatively may be identified as a literal value, signified as a character string presented in double quotation marks, for example
```
<http://dbpedia.org> or "Berlin"
```
4. Each predicate is specified by a unique URI defining a property within an ontology, either an object property or a data property. Object properties are those that take as their object a Web resource or a member of an ontology class defined by a URI, while data properties are those that take as their object a literal value. An example of an object property is
```
<http://purl.org/dc/terms/creator>
```
, defining the Dublin Core metadata term meaning “has creator”, which points from a resource to the entity that created it.

Thus, with dbpedia as the subject and Chris Bizer as the object, an RDF statement expressing the relationship “Chris Bizer created dbpedia” is:
```
<http://dbpedia.org>
       <http://purl.org/dc/terms/creator> <http://www.w3.org/wiki/ChrisBizer> .
```
[Note: A brief introduction to the Turtle syntax used here to represent RDF is given in the second paper in this series, entitled Libraries and linked data #2: A Rough Guide to Turtle.]

This simplicity of course limits the sophistication of the things that can be said, but enables information to be encoded in a standardized, machine-readable manner.

In a second example, which uses ontology prefixes such as “rdf:” as abbreviations for full URIs, in this case for <http://www.w3.org/1999/02/22-rdf-syntax-ns#>, the following three triples form a small RDF graph (expressed in Turtle notation – see point 7 below) describing three facts about my journal article (doi:10.1186/2041-1480-1-S1-S6) entitled CiTO, the Citation Typing Ontology [1]. These three triples in turn define its nature, its author and its title:
```
 <http://dx.doi.org/10.1186/2041-1480-1-S1-S6>  
     rdf:type   fabio:JournalArticle  .
 <http://dx.doi.org/10.1186/2041-1480-1-S1-S6> 
     dc:creator "David Shotton"  .
 <http://dx.doi.org/10.1186/2041-1480-1-S1-S6> 
     dc:title "CiTO, the Citation Typing Ontology" .
```
Here, rdf:type is an object property, while dc:creator and dc:title are data properties, taking literal objects.

5. Ontology URIs reference publicly available and commonly accepted structured vocabularies (ontologies), in which the meanings of terms (such as ‘Journal Article’) are uniquely and unambiguously defined on the Web using unique URIs.
6. A group of RDF triple statements having subjects or objects in common constitutes a labelled directed graph, in which the subjects and objects form the nodes (vertices) in the graph, and the properties form the links (edges). Such a graph may contain cycles.
7. An RDF graph can be written out (‘serialized’) in a series of simple machine-readable RDF statements. Various syntaxes exist for this purpose, including an XML syntax for RDF called RDF/XML, which is hard for humans to read, and Turtle, which is much easier for humans to read.
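To make the idea of triples forming a graph more concrete, here is a minimal Python sketch that takes the three triples about the CiTO paper and separates them into nodes and labelled edges. The function name nodes_and_edges is my own, and prefixed names are kept as plain strings rather than being expanded to full URIs.

```python
DOI = "http://dx.doi.org/10.1186/2041-1480-1-S1-S6"

# The three triples from the example above, as (subject, predicate, object).
triples = [
    (DOI, "rdf:type", "fabio:JournalArticle"),
    (DOI, "dc:creator", '"David Shotton"'),
    (DOI, "dc:title", '"CiTO, the Citation Typing Ontology"'),
]


def nodes_and_edges(triples):
    """Split triples into graph nodes and labelled, directed edges."""
    nodes = set()
    edges = []
    for s, p, o in triples:
        nodes.update((s, o))     # subjects and objects become nodes
        edges.append((s, o, p))  # the predicate labels the edge s -> o
    return nodes, edges


nodes, edges = nodes_and_edges(triples)

if __name__ == "__main__":
    print(len(nodes), "nodes and", len(edges), "edges")
    for source, target, label in edges:
        print(f"{source} --{label}--> {target}")
```

Because all three triples share the same subject, the graph has four nodes joined by three edges, each edge running from the article to one of its described features.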

An explanation and example of encoding richer bibliographic data in RDF is given in the third paper in this series, Libraries and linked data #3: Encoding bibliographic records in RDF.

How to create and publish linked data?

Authoritative guidance on creating linked data is available in a series of blog posts by Jeni Tennison [2-6]. An excellent book describing the creation, publication and consumption of linked data is freely available on the Web [7].

One of the best ways to publish RDF triple statements is to put them in a Web-accessible ‘triple store’ – a database dedicated to storing RDF triples – that has a SPARQL endpoint, i.e. a human-readable interface that can be queried using SPARQL, the Semantic Web query language, and that also has an API permitting such querying to be done automatically from another computer.

An example of SPARQL

SPARQL is a query language for RDF triples, and stands in relation to them as SQL does to relational database data. For bibliographic information encoded in RDF, such as that exemplified in the third paper in this series, entitled Libraries and linked data #3: Encoding bibliographic records in RDF, the following SPARQL:

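PREFIX fabio: <http://purl.org/spar/fabio/>
PREFIX prism: <http://prismstandard.org/namespaces/basic/2.0/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?paper ?doi ?pubmedid ?citation
WHERE {
    ?paper fabio:hasPubMedId ?pubmedid ;
        prism:doi ?doi ;
        dcterms:bibliographicCitation ?citation ;
        dcterms:publisher [ foaf:name ?publisher ] ;
        prism:publicationDate ?date .
    # case-insensitive match on the publisher's name
    FILTER regex(?publisher, "PubMed Central", "i")
    # keep only papers published during 2010
    FILTER (?date >= "2010-01-01"^^xsd:date && ?date <= "2010-12-31"^^xsd:date)
}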

expresses a typical query, which, in normal English, reads:

“Give me the PubMed ID, the DOI and the bibliographic citation for any paper that has PubMed Central as its publisher and was published in 2010.”
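For readers more at home with a general programming language, the logic of such a query can be sketched in Python over a small in-memory list of records. This is an analogy only, not a SPARQL engine. The first record is taken from the bibliographic example in this series, the second record is entirely hypothetical, and the function name select_papers is my own. The publisher pattern is passed in as a parameter, and the usage line searches for “biomed central” simply because that is the publisher recorded in the first record.

```python
import re
from datetime import date

papers = [
    {
        "paper": "http://dx.doi.org/10.1186/2041-1480-1-S1-S6",
        "doi": "10.1186/2041-1480-1-S1-S6",
        "pubmedid": "20626926",
        "citation": "Shotton D (2010). CiTO, the Citation Typing Ontology. "
                    "J. Biomed. Semant. 1,S1: S6.",
        "publisher": "BioMed Central",
        "date": date(2010, 6, 22),
    },
    {   # hypothetical record, present only to show the filters at work
        "paper": "http://example.org/paper/2",
        "doi": "10.9999/example.2",
        "pubmedid": "00000000",
        "citation": "A hypothetical 2011 paper.",
        "publisher": "Example Press",
        "date": date(2011, 3, 1),
    },
]


def select_papers(records, publisher_pattern, start, end):
    """Mimic the query: a regex filter on publisher, a range filter on date."""
    pattern = re.compile(publisher_pattern, re.IGNORECASE)
    return [
        (r["paper"], r["doi"], r["pubmedid"], r["citation"])
        for r in records
        if pattern.search(r["publisher"]) and start <= r["date"] <= end
    ]


results = select_papers(papers, "biomed central", date(2010, 1, 1), date(2010, 12, 31))

if __name__ == "__main__":
    for row in results:
        print(row)
```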

What are open linked data?

Open linked data are simply linked data that are published under a Creative Commons Zero (CC0) open data waiver or a similar license. CC0 essentially places the data in the public domain, free of legal and copyright restrictions, so that potential users are assured that they are free to re-use the data for any purpose.

The importance of explicitly stating the license under which linked data are published cannot be over-stated. This should be both in human-readable terms, with an Open Data label

that can be inserted into an HTML page using the following statement:

<!-- Open Data Link -->
<a href="http://opendefinition.org/">
     <img alt="This material is Open Data" border="0"
     src="http://assets.okfn.org/images/ok_buttons/od_80x15_blue.png" />
</a>
<!-- /Open Data Link -->

and also in a machine-readable RDF license statement, thus:

 :this-dataset dcterms:license
     <http://creativecommons.org/publicdomain/zero/1.0/> .
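The value of a machine-readable license statement is that software can check it before re-using the data. A minimal Python sketch of such a check follows. It assumes the dataset description is available as a list of (subject, predicate, object) tuples with full URIs, and the dataset URI http://example.org/this-dataset and the function names are hypothetical.

```python
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"
DCTERMS_LICENSE = "http://purl.org/dc/terms/license"


def license_of(dataset, triples):
    """Return the license URI stated for `dataset`, or None if absent."""
    for s, p, o in triples:
        if s == dataset and p == DCTERMS_LICENSE:
            return o
    return None


def is_cc0(dataset, triples):
    """True only if the dataset explicitly states the CC0 waiver."""
    return license_of(dataset, triples) == CC0


# Hypothetical dataset description carrying the license statement above.
this_dataset = "http://example.org/this-dataset"
description = [(this_dataset, DCTERMS_LICENSE, CC0)]

if __name__ == "__main__":
    print(is_cc0(this_dataset, description))
```

A dataset with no license statement at all yields None from license_of, which a cautious consumer would treat as “not known to be open”.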

It is equally important that a genuinely open data license such as CC0 is used. The Creative Commons Attribution License (CC-By), widely and appropriately used for licensing copyright documents and photographs, is inappropriate for published data and metadata, because its legal requirement to attribute the source of the data potentially leads to ‘attribution stacking’ when linked data from many different sources are integrated automatically.

Where data are reused in a manner that permits scholarly acknowledgement of the source(s), this of course should be done, following community norms, just as bibliographic citation references are made at the end of scholarly articles to previously published papers upon which new articles are based, without any legal license requirement that the author does this.

References

[2] Creating Linked Data – Part I: Analysing and Modelling. http://www.jenitennison.com/blog/node/135.

[3] Creating Linked Data – Part II: Defining URIs. http://www.jenitennison.com/blog/node/136.

[4] Creating Linked Data – Part III: Defining Concept Schemes. http://www.jenitennison.com/blog/node/137.

[5] Creating Linked Data – Part IV: Developing RDF Schemas. http://www.jenitennison.com/blog/node/138.

[6] Creating Linked Data – Part V: Finishing Touches. http://www.jenitennison.com/blog/node/139.

[7] Tom Heath and Chris Bizer (2011). Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool. doi:10.2200/S00334ED1V01Y201102WBE001; ISBN: 9781608454303 (paperback); ISBN: 9781608454310 (ebook). Freely available on the Web in HTML at http://linkeddatabook.com/editions/1.0/.

Posted in Linked data, Metadata, Ontologies, Semantic Publishing | Tagged linked data, machine-readable metadata, ontologies, RDF, semantic publishing, standards | 7 Comments

Linked Data 101

Posted on March 1, 2013 by David Shotton

During a discussion with librarians towards the end of last year, I was asked why they should bother to publish their catalogues as open linked data, and how that might be done.

For those of us already part of the Semantic Web world, the answer may seem self-evident – by making data available in machine-readable form under open licenses using web standards, we permit them to be integrated seamlessly with similar data from elsewhere, and allow others to re-use these data in creative ways that we have probably never imagined.

As Tim Berners-Lee has said many times, for example in his 2009 TED address, “You do your bit, everybody else does their bit, and the data all connect, creating a power that is simply not available from hyperlinking documents.”

He gave a telling example dating from 2007. In the search for new drugs to treat Alzheimer’s disease, a researcher may wish to know the answer to the following biological question:

“What proteins are involved in cellular signal transduction, and are related to pyramidal neurons?”

A Google search on that question gave ~223,000 hits in 2007, none of which provided a specific answer. However, a search over the linked healthcare data, made in collaboration with the W3C Health Care and Life Science Interest Group of which I am a member, gave 32 responses, each one of which was the name of a specific protein involved both in signal transduction and related to pyramidal neurons.

[Today, in 2013, the same Google search gave me ~413,000 hits, the top one being to a 2007 presentation by Eric Prud’hommeaux of W3C, presenting details of that healthcare data search in greater detail!]

Such results would not be possible had not many different individuals and organisations published relevant linked data on sites such as DBpedia, the Allen Brain Atlas, DrugBank, Diseasome, National Drug Code, DailyMed, RxNorm, ChemSpider and ChEBI. Those sites do not necessarily have a lot in common – one is about brain anatomy, another about disease-gene disorder relationships, another about standardized drug names, and so on. However, the fact that they all adopt common web standards to represent their data means that it is possible to search across all the data and find information that is relevant.

When I was first asked those questions by my librarian colleague, over a glass of wine, I felt that I gave rather inadequate answers. These were genuine enquiries from someone outside the semantic web world that deserved a fuller response.

Since I too, as a cell biologist, was outside the semantic web world not so long ago, and have learned what little I now understand the hard way, I thought that perhaps I was in a good position to respond to the questions sympathetically, with a positive attempt to explain technical terms, rather than assume knowledge.

I thus started to write what I thought might be a helpful explanation for this librarian. Soon the document grew to unwieldy length, as I added more and more background information to support my central explanation! So I ended up splitting it into six shorter papers, under the overall title Libraries and linked data, each of which attempts to deal with just one facet of the problem.

These papers are intended to be read in sequence, the first five really just providing background for the sixth, which addresses the central question.

They are simple, perhaps even simplistic, and are not the kind of explanations one would get from a professional computer scientist. Nevertheless, in the hope that others unfamiliar with the semantic web may find them useful, I present them in the following six blog posts, under the titles:

Libraries and linked data #1: What are linked data?
Libraries and linked data #2: A rough guide to Turtle
Libraries and linked data #3: Encoding bibliographic records in RDF
Libraries and linked data #4: A Comparison of RDF and XML
Libraries and linked data #5: Using the SPAR ontologies to publish bibliographic records
Libraries and linked data #6: Why publish library catalogues as open linked data?

Please let me know if you find them useful, by leaving a comment, clicking “Like” below the post, or e-mailing me at <[email protected]>. Thank you.

Ten next steps for semantic authors and publishers

Posted on February 26, 2013 by David Shotton

Note added 30 May 2013

Additional information concerning the use of OAI-ORE to specify aggregations of research outputs into data packages and research objects has been added at the end of this blog post.

– – –

Journal publishing has come some way in the last few years towards adopting some of the semantic enhancements for scholarly journal articles described in [1] and illustrated in [2]. Publishers such as Pensoft Publishers are leading the way. However, there is much that could and should be done more widely.

So what are the most important things that publishers and their authors should be doing to enhance their journal articles? I have drafted the following suggestions, and now present them here to stimulate discussion within the community.

1 Develop semantic authorship of articles

Mark up important concepts, such as chemical names and gene names, with ontology terms, and link them to external information sources, using tools like the Ontology Add-on for Word and the Chemistry Add-In for Word, or employing direct authoring platforms such as PLoS Currents and the Pensoft Writing Tool. Alternatively, semantic mark-up can be added by skilled editors after article submission, as routinely done by the Royal Society of Chemistry.

2 Use citation typing

Enable your authors to state why they are citing each reference in the reference list, using terms from CiTO, the Citation Typing Ontology. This can be done by employing the CiTO Reference Annotation Tools described in the previous blog post.

3 Publish machine-readable bibliographic metadata describing the article

Create machine-readable bibliographic metadata describing each article, specifying the authors, title and other bibliographic record details, for example by using an XML document mark-up language such as JATS, the Journal Article Tag Suite.

Also create RDF versions of these metadata using the SPAR (Semantic Publishing and Referencing) Ontologies together with other appropriate vocabularies, employing our JATS2RDF mapping to assist in this, described in [3].

Publish these RDF bibliographic metadata in three ways:

embedded in the online paper in RDFa,
in a supplementary RDF file, and/or
together with metadata describing other articles in a searchable database on the publisher’s web site.

The NPG Linked Data Platform provides a good example of the third way.

4 Encode the reference list of each article in machine-readable form

Encode the reference list of each article as machine-readable bibliographic metadata, and publish these in RDF, as for the bibliographic metadata (Point 3 above).

5 Open the reference lists of your articles

Open the reference lists of your articles, free from the copyright protection that covers the authored body text, so that they are freely available, even if your full-text articles are only available by subscription. Do this:

by changing the copyright and licensing arrangements for your articles, acknowledging that the citation data embedded in the reference lists are indeed non-copyrightable data that should be made freely available for the general benefit of the scholarly community,
by placing the human-readable version of the article’s reference list outside the subscription firewall protecting the copyright text, and
by opening your article reference lists via CrossRef for harvesting and inclusion in the Open Citations Corpus, as detailed in my recent Open Letter to Publishers.

6 Use ORCIDs to identify authors and contributors, and encode their affiliations, funders, grants and geo-locations

Ensure the article metadata include ORCID personal IDs for all authors and contributors, if available, the details of their institutional affiliations, and also the names of funding bodies and the grant numbers of any funding that facilitated the research described in the article. Ensure that the geo-locations (longitude and latitude) of places mentioned in the text, such as field sites, are also recorded.

Publish this information both in human-readable text and in machine-readable form. This will facilitate author disambiguation, and will enable assignment of funder credit, evaluation of research grant outputs, and mapping of study sites, species distributions, etc.
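Because an ORCID iD ends in a check character calculated with the ISO 7064 MOD 11-2 algorithm, a publisher can catch mistyped iDs before they reach the published metadata. The Python sketch below shows one way of doing so; the function names are hypothetical, and the check confirms only that an iD is well formed, not that it is registered to the named author.

```python
import re


def orcid_check_character(base_digits):
    """Compute the final character of an ORCID iD from its first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the iD has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_well_formed_orcid("0000-0002-1825-0097"))
print(is_well_formed_orcid("0000-0002-1825-0098"))
```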

7 Detail the contributions and roles of authors and other contributors

Adopt the recommendations of the Final Report of the Wellcome Trust / Harvard International Workshop on Contributorship and Scholarly Attribution, by providing human-readable text and machine-readable metadata detailing the authors’ roles and contributions to the research articles, and others’ roles and contributions to the preceding research investigations, to enable better attribution of credit.

Such metadata can be encoded in RDF easily, unambiguously and in a standardized machine-readable form, using SCoRO, the Scholarly Contributions and Roles Ontology. The Scholarly Contributions Report Form, SCoRF, is a simple Excel spreadsheet that makes such metadata creation easy, by encoding SCoRO ontology terms in drop-down lists. It is available here: http://purl.org/spar/scoro/scorf/, and an exemplar completed form is available here: http://purl.org/spar/scoro/scorf-example/.

8 Publish the research data underlying the results described in your articles

The research data underlying the results described in research articles should be published in appropriate open public databases or repositories under a CCZero data waiver, so that the data can be re-used without restriction, as recommended by the recent Royal Society report Science as an Open Enterprise. Adopt best practice for the citation of these datasets, by insisting that your authors include formal data references in their articles’ reference lists [4 – 6].

9 Publish a Structured Summary of each article

Publish a separate machine-readable Structured Summary of each article, to complement the Abstract. This should be a set of simple statements of the primary facts about the article, for example that it was a species abundance study, of a named species, undertaken at a particular place and over a specified time period, having stated numerical results. Such data, if published in machine-readable form, will enormously enhance automated attempts to cluster papers having certain criteria in common, as is necessary, for example, before attempting a systematic review.

The on-line report form for MIIRO, Minimal Information for an Investigation and Research Outputs, is a structured web form that facilitates the task of creating such Structured Summaries.

10 Score your articles against the Five Stars of Online Journal Articles

The Five Stars of Online Journal Articles is a constellation of five independent criteria concerning

peer review
open access
enriched content
available datasets
machine-readable metadata

described in a previous blog post, by which the quality of an online journal article may be evaluated to see how well it matches up with current aspirations for enhanced research communications, as detailed in [7].

Publish the Five Star Rating of each article alongside the article itself. Authors and publishers whose articles follow steps 1-9 above will score well in the Five Star evaluation, and will be in the forefront of advances in scholarly publishing.

– – –

Note added 30 May 2013

The discussion above concerns various research outputs, including journal articles, datasets and structured summaries. Other research outputs that may be required to provide full understanding and reproducibility of a particular research investigation include descriptions of the methods, protocols and workflows involved in producing and analysing the data used or produced, provenance information about the experiments and datasets, details concerning the people involved in the investigation, and additional annotations about these resources that assist in interpretation of the scientific outcomes.

The Open Archives Initiative’s Object Reuse and Exchange metadata model (OAI-ORE; http://www.openarchives.org/ore/) defines a data model and a number of serializations (RDF, Atom and RDFa) for the description and exchange of aggregations of Web resources, and can be used to specify aggregations of such research outputs.

For example, the following RDF statements specify that a simple data package in the Dryad data repository is an aggregation of a single Excel data file and the Dryad web page that provides metadata for that data file:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ore: <http://www.openarchives.org/ore/terms/>.

<http://datadryad.org/handle/10255/dryad.8684> # Data package in the Dryad repository
     a ore:Aggregation ;
     ore:aggregates
          <http://datadryad.org/handle/10255/dryad.8685> ,
          <http://datadryad.org/bitstream/handle/10255/dryad.8685/body%20size%20data%20%28dry%20weight%2c%20wing%20area%2c%20Cell%20size%20and%20cell%20number%29.xls> .

Research Objects (http://www.researchobject.org/) are specific OAI-ORE aggregations of such research outputs, packaged for transmission in a particular manner, that are designed to facilitate the sharing and reuse of these research outputs, and to permit the better understanding and reproducibility of the scientific experiments to which they relate, as detailed in [8].

References

[1] Shotton D, Portwin K, Klyne G and Miles A (2009). Adventures in semantic publishing: exemplar semantic enhancement of a research article. PLoS Computational Biology 5: e1000361. http://dx.doi.org/10.1371/journal.pcbi.1000361.

[2] Our enhanced version of the Reis et al. (2008) paper:

Reis RB, Ribeiro GS, Felzemburgh RDM, Santana FS, Mohr S et al. (2008). Impact of environment and social gradient on Leptospira infection in urban slums. PLoS Neglected Tropical Diseases 2: e228.

is available at http://dx.doi.org/10.1371/journal.pntd.0000228.x001.

[3] Peroni S, Lapeyre DA and Shotton D (2012). From Markup to Linked Data: Mapping NISO JATS v1.0 to RDF using the SPAR (Semantic Publishing and Referencing) Ontologies. Proc. 2012 JATS Conference, National Library of Medicine, Bethesda, Maryland, USA, 16-17 October 2012. http://www.ncbi.nlm.nih.gov/books/NBK100491/.

[4] Goodman L, Lawrence R and Ashley K (2012). Data-set visibility: Cite links to data in reference lists. Nature 492: 356. http://dx.doi.org/10.1038/492356d.

[5] Borgman CL (2012). Why Are the Attribution and Citation of Scientific Data Important? In For Attribution – Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop (pp. 1–10). The National Academies Press, Washington, D.C. Retrieved from http://www.nap.edu/catalog.php?record_id=13564.

[6] National Academies of Science. US CODATA and the Board on Research Data and Information, in collaboration with CODATA-ICSTI Task Group on Data Citation Standards and Practices. (2012). Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop. Washington, DC. Retrieved from http://sites.nationalacademies.org/PGA/brdi/PGA_064019.

[7] Shotton D (2012). The Five Stars of Online Journal Articles — a framework for article evaluation. D-Lib Magazine 18 (1/2). http://dx.doi.org/10.1045/january2012-shotton.

[8] Bechhofer S, Buchan I, De Roure D, Missier P, Ainsworth J, Bhagat J, Couch P, Cruickshank D, Delderfield M, Dunlop I, Gamble M, Michaelides D, Owen S, Newman D, Sufi S and Goble C (2011). Why linked data is not enough for scientists. Future Generation Computer Systems 29 (Issue 2, February 2013): 599–611. http://dx.doi.org/10.1016/j.future.2011.08.004.

CiTO Reference Annotation Tools

Posted on February 26, 2013 by David Shotton

The characterization or ‘typing’ of bibliographic citations is the provision of annotations that detail the reason or reasons why the author of the citing paper cites a particular cited paper listed in the citing paper’s reference list.

Such annotation is facilitated by the use of CiTO, the Citation Typing Ontology, which provides a vocabulary of 39 properties, including the basic property cito:cites, to characterise the nature of such citations. The ontology also contains the inverse properties, which can be used reciprocally to explain why a paper is cited by another. Further details about CiTO are given in [1].

However, with the exception of Martin Fenner’s CiTO plugin for WordPress, described in this blog post, and Egon Willighagen’s use of CiTO in CiteULike, described in this other blog post, no good tools had until recently been developed to ease the task of creating such annotations.

Thanks to the creativity of Tanya Gray, that situation has now changed. Recently, with funding from the JISC Open Citations Extension Project, Tanya has been working with me to build two tools that permit the creation of CiTO-defined annotations of references in article reference lists. These tools enable the author, or a reader of the article, to provide answers to the question “Why does this article cite that reference?”

The CiTO JavaScript Reference Annotation Tool

The first tool, which works for any modern Web browser, is the CiTO JavaScript Reference Annotation Tool. This inserts a CiTO Properties choice box after every reference in an article’s reference list, thereby permitting users to choose the CiTO properties that best explain why the Citing Article cites the Cited Article.

Five exemplar journal articles have been enhanced by having the functionality of the CiTO JavaScript Reference Annotation Tool embedded in them, from the following journals/sources:

PLOS Currents
eLife
PubMed Central
ZooKeys
Our semantically enhanced version of an article by Reis et al. in PLoS Neglected Tropical Diseases, doi:10.1371/journal.pntd.0000228.x001.

Details of how to view these exemplar articles are given in the documentation file.

Functionality

After each reference in the reference list of each exemplar article, the user will see the following CiTO Annotation Box, presenting the eleven most common CiTO citation annotation properties:

Hovering with the mouse over one of these properties will cause its button to change to a light blue, and will cause a pop-up to appear, displaying the definition of this property drawn from the Citation Typing Ontology, as shown:

Clicking on one of the CiTO property buttons will cause its appearance to change from blue to green, to indicate that it has been selected. The property definition pop-up will remain visible for as long as the mouse continues to hover over the property, and the green colour will persist after the mouse has been moved away, as shown in the following figure:

The user is free to choose as many CiTO properties for any one reference as apply.

Re-clicking on a green button that has been selected will de-select that property, reverting the button appearance from green to grey (or light blue while still hovering over it).

If none of the eleven displayed CiTO property choices is appropriate, clicking the SHOW OTHER REASONS button will display the other 28 CiTO properties, as shown in the following figure, which can be selected in the same manner.

Clicking HIDE OTHER REASONS will hide these additional options, but will not negate any selections that have been made from among them.

The user may continue making choices for this and other references in the reference list, and may stop annotating at any time.

How the citation annotations are saved

If the annotated article is saved with a different filename as an .html file in the same directory as the CiTO-Tools-enabled .html file of the original article, alongside the javascript directory containing the cito.js and cito.css files, these annotations will be saved with the article and will be visible when the annotated article is re-opened in a browser.

Additionally, every time a CiTO property is selected or deselected, that choice is recorded both locally and centrally in our CiTO Tools Annotations Database.

When a user clicks on a CiTO property that was previously unselected, a key-value pair is stored in the browser’s web storage facility, the key being set to a value created by concatenating the browser window’s URI and the unique identifier for the HTML that forms the CiTO property ‘button’ in the web page, and the selection value being set to ‘1’. Additionally, an AJAX request is sent that inserts a record into our CiTO Tools Annotations Database hosted at http://www.miidi.org with the following fields:

unique id for database record (auto-increment)
userid – unique opaque identifier for user
timestamp – when the action was taken
action = ‘add’
subject – URI for the citing journal article
predicate – URI for the CiTO property
object – URI for the cited journal article (or citation text parsed from the reference, if URI not available)

The last field identifies both the reference in the reference list that is the object of this annotation and the cited paper to which that reference refers.

Format

|RecordID|UniqueOpaqueUserID|DateTime|operation|CitingPaper|CiTOProperty|CitedPaper

Record example

|294|KDYXFJ4IM2RIAUBYRYUWPWO37BLNSD|Fri, 04 Jan 2013 17:25:34 GMT|add|<http://dx.doi.org/10.1093/nar/gkp850>|<http://purl.org/spar/cito/obtainsBackgroundFrom>|<http://dx.doi.org/10.1128/IAI.00105-07>

The last three items in each record are easily transformed into an RDF triple (in Turtle format):

<http://dx.doi.org/10.1093/nar/gkp850>
     <http://purl.org/spar/cito/obtainsBackgroundFrom>
     <http://dx.doi.org/10.1128/IAI.00105-07> .
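This transformation is simple enough to script. The following Python sketch, whose function name is hypothetical, splits a pipe-delimited record and writes its last three fields as a single triple on one line. It assumes that no field contains a pipe character, and it writes the object as a quoted string when the record holds parsed citation text rather than a URI.

```python
def record_to_triple(record):
    """Turn one annotation record into a single-line RDF triple."""
    fields = record.strip().strip("|").split("|")
    if len(fields) < 6:
        raise ValueError("expected at least six pipe-delimited fields")
    subject, predicate, obj = fields[-3:]
    # Assumption: anything not wrapped in angle brackets is citation text,
    # which is written here as a plain string literal.
    if not (obj.startswith("<") and obj.endswith(">")):
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"{subject} {predicate} {obj} ."


record = ("|294|KDYXFJ4IM2RIAUBYRYUWPWO37BLNSD|Fri, 04 Jan 2013 17:25:34 GMT|add|"
          "<http://dx.doi.org/10.1093/nar/gkp850>|"
          "<http://purl.org/spar/cito/obtainsBackgroundFrom>|"
          "<http://dx.doi.org/10.1128/IAI.00105-07>")
print(record_to_triple(record))
```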

When a user clicks on a CiTO property that was previously selected, in order to de-select it, exactly the same things happen, except that the local selection value is set to ‘0’ and the database action is set to ‘remove’.
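The select and de-select behaviour described above can be modelled in a few lines. The sketch below is written in Python rather than the tool's own JavaScript. It uses a plain dictionary in place of the browser's web storage, and it returns the record that would be sent to the database instead of making an AJAX request. All function and argument names are hypothetical.

```python
def toggle_cito_property(storage, page_uri, button_id, user_id, timestamp,
                         citing_uri, cito_property_uri, cited):
    """Flip one CiTO property button and return the record to be sent.

    `storage` is a dict standing in for the browser's web storage. This
    models the behaviour described in the text; it is not the code in cito.js.
    """
    # Key: the window URI concatenated with the button's HTML identifier.
    key = page_uri + button_id
    currently_selected = storage.get(key) == "1"

    storage[key] = "0" if currently_selected else "1"
    action = "remove" if currently_selected else "add"

    # The database assigns the auto-increment record id, so it is omitted.
    return {
        "userid": user_id,
        "timestamp": timestamp,
        "action": action,
        "subject": citing_uri,
        "predicate": cito_property_uri,
        "object": cited,
    }


storage = {}
click = ("http://example.org/article.html", "ref1-obtainsBackgroundFrom",
         "EXAMPLEUSER", "Fri, 04 Jan 2013 17:25:34 GMT",
         "http://dx.doi.org/10.1093/nar/gkp850",
         "http://purl.org/spar/cito/obtainsBackgroundFrom",
         "http://dx.doi.org/10.1128/IAI.00105-07")
print(toggle_cito_property(storage, *click)["action"])
print(toggle_cito_property(storage, *click)["action"])
```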

Obtaining the CiTO annotation data

The CiTO Tools Annotations Database accumulates and stores all the CiTO annotation choices made by all users of this JavaScript Reference Annotation Tool and also of the CiTO Chrome Extension described below, for all annotated papers. These data are openly available, and can be obtained by visiting http://www.miidi.org/metaquery/cito and downloading the text file called cito containing the accumulated annotation records in the format given above.
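Since the downloaded file is a log of ‘add’ and ‘remove’ actions, anyone re-using it needs to replay those actions to find out which annotations currently stand. The Python sketch below shows one way to do this. It assumes that the records appear in chronological order, that no field contains a pipe character, and that a ‘remove’ cancels only an earlier ‘add’ by the same user. The function name and the user identifiers in the sample data are hypothetical.

```python
def current_annotations(records):
    """Replay add/remove records; return the (user, s, p, o) tuples still asserted."""
    asserted = set()
    for line in records:
        fields = line.strip().strip("|").split("|")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # Count from the end so that a leading record id, if present, is ignored.
        user, action = fields[-6], fields[-4]
        annotation = (user, *fields[-3:])
        if action == "add":
            asserted.add(annotation)
        elif action == "remove":
            asserted.discard(annotation)
    return asserted


citing = "<http://dx.doi.org/10.1093/nar/gkp850>"
cited = "<http://dx.doi.org/10.1128/IAI.00105-07>"
log = [
    f"|1|USER_A|Fri, 04 Jan 2013 17:25:34 GMT|add|{citing}|<http://purl.org/spar/cito/obtainsBackgroundFrom>|{cited}",
    f"|2|USER_A|Fri, 04 Jan 2013 17:25:40 GMT|add|{citing}|<http://purl.org/spar/cito/extends>|{cited}",
    f"|3|USER_A|Fri, 04 Jan 2013 17:25:51 GMT|remove|{citing}|<http://purl.org/spar/cito/extends>|{cited}",
]
for annotation in sorted(current_annotations(log)):
    print(annotation)
```

Keeping the user identifier in each tuple preserves a minimal form of provenance, so that one person's de-selection does not erase another person's annotation.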

Using the CiTO annotation data

It is anticipated that aggregation of citation annotations in this way may open the way to crowd-sourcing of CiTO citation typing, although it is recognized that the only person who can authoritatively say why a reference has been cited is the author who created it. For this reason, provenance of these CiTO annotations will be crucial. Ideally the ability for authors to annotate reference lists at the time of creating the article will become a feature of on-line authoring tools such as PLoS Currents and the Pensoft Writing Tool.

Implementing this functionality as part of normal article publication

To achieve the CiTO JavaScript Reference Annotation Tool functionality for journal articles as part of the normal publishing pipeline, three lines of JavaScript code need to be inserted into each article before it is published, and the appropriate cito.js and cito.css files need to be present in a javascript directory alongside the directory containing the .html file for the article, as explained in the documentation.

Adaptation of this CiTO JavaScript Annotation Tool for articles in other journals that use a different DTD, or that use a different method of mapping the NLM-DTD v3.0 to HTML than that used by PubMed Central, requires the addition of two new functions to the cito.js file, one to identify the HTML for the reference list, and another to extract a DOI, a URI or a textual bibliographic citation for the reference, that would then be used as the object of the bibliographic citation. The code is not complicated, but its modification requires someone with an understanding of JavaScript.
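The logic of the second of these functions, which chooses a DOI, a URI or the citation text as the object of the citation, can be sketched independently of any particular journal's HTML. The example below is written in Python for illustration only; the real function would be written in JavaScript inside cito.js. The regular expressions and the function name are assumptions of this sketch, not the tool's actual code.

```python
import re

# Assumed patterns: a DOI begins with "10.", a registrant code and a slash.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
URL_PATTERN = re.compile(r"https?://[^\s\"<>]+")
TRAILING_PUNCTUATION = ".,;)"


def citation_object(reference_text):
    """Return ('uri', value) or ('text', value) for one reference-list entry."""
    doi = DOI_PATTERN.search(reference_text)
    if doi:
        # Prefer a DOI, expressed as a resolvable URI.
        return "uri", "http://dx.doi.org/" + doi.group(0).rstrip(TRAILING_PUNCTUATION)
    url = URL_PATTERN.search(reference_text)
    if url:
        return "uri", url.group(0).rstrip(TRAILING_PUNCTUATION)
    # Fall back to the citation text itself, with whitespace normalised.
    return "text", " ".join(reference_text.split())


print(citation_object("Peroni S and Shotton D (2012). FaBiO and CiTO: ontologies for "
                      "describing bibliographic resources and citations. "
                      "doi:10.1016/j.websem.2012.08.001."))
```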

Since all our code is available under an open license from our GitHub Repository, this puts the implementation of such citation typing functionality within the grasp of every publisher that wishes to implement it.

The CiTO Chrome Extension

The second CiTO Tool, closely related in functionality to the first, is the CiTO Chrome Extension, which inserts additional code after each reference in HTML-format articles from PubMed Central, eLife and PLoS Currents without the need to modify each article individually.

This extension works only for the Chrome Browser, and is available free from the Chrome WebStore here. The software for this Chrome extension is also stored in the chrome-extension folder of the CiTO GitHub Repository.

The following figure shows a screen shot of the tool as it appears in the Chrome WebStore. The included image shows the user functionality, resembling that of the JavaScript examples shown above.

We welcome the involvement of developers in the community who would be interested in working with our code to create and maintain similar extensions / add-ons / plug-ins for other browsers.

Reference

[1] Peroni S and Shotton D (2012). FaBiO and CiTO: ontologies for describing bibliographic resources and citations. Web Semantics: Science, Services and Agents on the World Wide Web. 17: 33-43. doi:10.1016/j.websem.2012.08.001.