Uncategorized – OpenCitations blog

OpenCitations’ renewed compliance with the Principles of Open Scholarly Infrastructure (April 2026)

OpenCitations formally adopted the Principles of Open Scholarly Infrastructure (POSI) with its first self-assessment in August 2021. At that time, we had only recently been included in the SCOSS funding round and, although we had a clear vision of what we wanted to build and the role we wanted to play within the Open Science ecosystem, both our financial and human resources were still very limited. For this reason, the POSI self-assessment proved to be an important exercise, since it allowed us to critically reflect on both the strengths we could build upon and the weaknesses we needed to address as we developed OpenCitations into a sustainable, community-governed Open Science infrastructure.

That first assessment highlighted some key areas of attention. In particular, it pointed to the limited executive power of the community bodies involved in governance, as well as challenges related to long-term financial sustainability, especially the ability to generate and manage financial surpluses.

Between 2021 and 2026, OpenCitations has evolved significantly, becoming a more complex infrastructure by expanding both the services it offers and the number of data sources it integrates. During this period, we have also expanded our team, worked on the development of a container-based infrastructure, and relaunched our governance framework.

These developments made it necessary to update our POSI self-assessment. This update is also timely, given the recent release of POSI Version 2 in October 2025. The new version of the principles is the result of a collective effort by POSI adopters and includes several clarifications to existing principles (for example, around the concept of lobbying and the distinction between transparent governance and transparent operations). In addition, new principles have been introduced to better reflect how scholarly infrastructures operate within the broader research ecosystem.

We are therefore happy to share our updated POSI self-assessment, which provides a comprehensive snapshot of OpenCitations as of April 2026. At the same time, it represents a renewed declaration of our commitment to the POSI principles.

Disclaimer. As with many early POSI adopters, our previous assessment used a traffic-light system to indicate the level of alignment with each principle. In this system:

Green indicates that the principle is fully met and evidenced in practice.

Yellow indicates that the principle is partially met, with active steps underway to achieve full alignment.

Red indicates that the principle is not currently met, or that compliance is not feasible.

During the POSI Version 2 working group discussions, it emerged that the use of this traffic-light system is not mandatory for POSI self-assessments. Nevertheless, we have decided to maintain this approach, as we believe it remains particularly effective as a communication tool and, at the same time, ensures continuity with our previous assessment. We have, however, taken the liberty of adding one more element: a symbol to indicate where we were already “green” in 2021, but have further improved our performance.

The symbol is therefore a “plus” placed next to the green traffic light.

Coverage across the scholarly enterprise

OpenCitations demonstrates broad coverage by collecting citation data from global scholarship, ensuring representation across disciplines, geographic areas, and research communities. The scope of OpenCitations’ coverage is universal, not limited to a particular scholarly domain, nor to the English language, nor restricted by imposed acceptance criteria.

Stakeholder governed

Governance is structured to reflect the stakeholder community, with the International Advisory Board elected by the Council of members and responsible for approving the Trustee Network. For more information: https://opencitations.net/governance/

Non-discriminatory participation or membership

Membership is open to all individuals and organisations that support Open Science principles, enabling inclusive and non-discriminatory participation in line with OpenCitations’ founding values.

Transparent governance

OpenCitations maintains a high level of transparency by publicly documenting its organisational structure, governance processes, and decision-making procedures. Reports and relevant documentation are shared with the community, including annual reports and the Rules of Membership and Organizational Bodies.

Cannot lobby

OpenCitations does not engage in political or financial lobbying activities. Its role remains focused on supporting the scholarly community without pursuing regulatory changes that would advantage its own position.

Living will

A clear long-term stewardship strategy is in place, ensuring that data, services, and infrastructure can be responsibly transferred if needed. This is supported by a recently implemented fully replicable technical infrastructure and a new governance model designed to facilitate handover thanks to the presence of the Trustee Network, which is capable of appointing new hosting members to physically and administratively host the infrastructure.

Regular review of purpose and community value

OpenCitations has recently monitored its relevance and community value through mechanisms such as community surveys. It has also established governance and technical strategies to support a responsible wind-down process, including, if necessary, the transfer of assets and infrastructure through its Trustee Network. The Trustee Network is additionally responsible for regularly monitoring OpenCitations’ adherence to its mission and values, as well as its annual activities and finances.

Transparent operations

OpenCitations ensures a high level of operational transparency by openly providing key documentation, including financial reports, a public roadmap, a mission statement and value proposition, a sustainability model and fee structure, and privacy policies.

Time-limited funds used only for time-limited activities

Grant income is restricted to funding specific, time-limited projects, including the appointment of personnel working on them, while core operations are not dependent on such funding.

Goal to generate surplus

Thanks to memberships and donations, as well as in-kind support from the University of Bologna, OpenCitations has recently achieved the financial capacity to ensure stability until at least 2029 in terms of funds allocated for staff salaries and technical operational expenses.

Establish and maintain financial reserves guided by policy

Although OpenCitations has reached a level of financial stability that ensures a budget surplus, there is currently no formal financial policy defining the amount of reserves to be allocated for a transition or wind-down plan, or to address exceptional or unforeseen events. However, OpenCitations has already initiated the necessary consultations to develop a Financial Reserve Policy, which will not only define the level and management of reserves but also provide a clear framework for handling revenues. In addition, it will establish and formalise procedures for approving both budget forecasts and actual expenditures.

Mission-consistent revenue generation

Revenue generation is aligned with the organisation’s mission (in particular, the value proposition according to which “external financial support is required from the stakeholder community to support OpenCitations and enable it to expand its delivery of high-quality comprehensive open bibliographic and citation metadata”), primarily through community funding via membership fees and donations. OpenCitations’ members are listed on the website: https://opencitations.net/members-and-donors/

Revenue generated from services, not data

OpenCitations charges no fees for any of its services, access to its data, or reuse of its software. OpenCitations members, donors and third parties all have equal free access.

Volunteer labour

OpenCitations’ core operations are carried out by paid staff, ensuring that the continuity and reliability of its services do not depend on volunteer labour. Nevertheless, OpenCitations recognises the value of voluntary labour. Indeed, the document Rules of Membership and Organizational Structure specifies that membership of the International Advisory Board is honorary and without remuneration, and that the Hosting Entity will reimburse all reasonable expenses related to travel, accommodation, and meals incurred in attending meetings, within the limits of the budget allocated for such purposes. While this is already stated in the governance framework, it is important to reiterate and further formalise it within the Financial Policy document currently under development. More broadly, the OpenCitations Mission Statement emphasises the importance of community engagement (voluntary and non-remunerated) through, for example, the involvement of community actors in the direct provision and curation of OpenCitations data, as well as the broader role of the community in supporting funding and participating in governance.

Transition planning

Transition planning is only partially developed. While the governance structure is in place, ensuring a management handover through the possibility of changing the hosting member upon approval by the Trustee Network, there is a lack of detailed operational documentation for individual roles within the management and technical teams, which may limit the organisation’s ability to ensure immediate continuity in the event of key personnel changes.

Open source

All OpenCitations software is released under open source licences, ensuring full transparency, reusability, and the possibility for the community to inspect, modify, and replicate the infrastructure.

Open data

To ensure the greatest possible reusability, all OpenCitations data is published under a Creative Commons CC0 Public Domain Waiver that permits downloading and re-use of any nature, including added-value re-purposing and commercial exploitation.

Available data

Data are provided through multiple access points, including REST APIs, SPARQL endpoints, query interfaces, and downloadable data dumps. This ensures broad accessibility and supports diverse use cases across the community.

Patent non-assertion

OpenCitations commits not to pursue patents, ensuring its infrastructure remains fully open and replicable.

Prioritise interoperability and open standards to ensure continuity and resilience

The infrastructure has been redesigned as a container-based solution, thereby facilitating replication, deployment across different environments, and long-term service continuity.

Save the dates: OpenCitations October events

With the numerous September events in which OpenCitations’ directors have recently been involved now behind us, it is time to announce the participation of our director Silvio Peroni in two October events.

On Wednesday, October 6, Silvio will take part in the Beilstein OpenScience Symposium 2021 (October 5-7), giving a short presentation, “Open Citations, an Open Infrastructure to Provide Access to Global Scholarly Bibliographic and Citation Data”, during the Poster Flash Talk Presentation (17:10-18:00 CEST). The Beilstein OpenScience Symposium is an annual event that gathers leaders in the FAIR and Open Data movement, covering a wide range of research fields, including biomedical research, physics and social science, and exploring how open data practices are transforming sectors outside academia. The 2021 online edition will present a series of talks addressing the many ways in which data transparency contributes to research progress. The poster session consists of short oral presentations on Wednesday, October 6, accompanying the posters that will be displayed throughout the entire symposium. Poster abstracts are available in the Abstract Book, which can be downloaded from the Beilstein Symposium’s website: https://www.beilstein-institut.de/en/symposia/open-science/program/ .

You can register for the event here: https://www.beilstein-institut.de/en/symposia/open-science/registration/

The poster and slides from the presentation are available on Zenodo:

Peroni, S. (2021, October 6). Open Citations, an Open Infrastructure to Provide Access to Global Scholarly Bibliographic and Citation Data—Poster Flash Talk Session slides. Beilstein Open Science Symposium 2021, Virtual Event. Zenodo. https://doi.org/10.5281/zenodo.5553025

Peroni, S., Shotton, D. W., & Di Giambattista, C. (2021). Open Citations, an Open Infrastructure to Provide Access to Global Scholarly Bibliographic and Citation Data. https://doi.org/10.5281/zenodo.5553040

The second event is the annual European Computer Science Summit (ECSS), organized by Informatics Europe and involving academics, industry leaders, decision makers and others interested in Informatics/Computer Science research and education in Europe. ECSS 2021, “Informatics for a Sustainable Future” (Oct. 25-27), will be held as a hybrid event, with both online and on-site sessions. The on-site sessions will be held in Madrid, at the Facultad de Ciencias de la Actividad Física y del Deporte (INEF), Universidad Politécnica de Madrid, located at Calle de Martín Fierro, 7.

During the last day of the meeting (Oct. 27), Silvio Peroni will be speaking at the “National Informatics Associations Workshop”, an annual workshop organised by Informatics Europe in collaboration with the National Informatics Associations in Europe. This year the workshop will address the themes Informatics in Interdisciplinary Curricula and Research Evaluation in Informatics, thus focusing on an important question: how to recognise, assess and credit research contributions specific to Informatics, such as conference publications and software artefacts. Elaborating on this, Silvio’s talk (to be delivered in person, rather than online!) is entitled “Open citations in Informatics” (9:00 CEST).

For further information and registrations: https://www.informatics-europe.org/ecss/registration/how-to-register.html

We thank the Beilstein Institute and Informatics Europe for involving OpenCitations in these international events, which provide opportunities to promote the OpenCitations infrastructure and services in stimulating environments.

We hope to see you there!

About engagement and evanescence: OpenCitations at the Open Science Fair 2021

Community, governance, and shared goals: these are the key concepts that you would have heard discussed, had you listened in on September 21 at the Open Science Fair 2021 to the sessions entitled “ScholeXplorer and OpenCitations as the new frontier of open citation indexing” and “The perils of being invisible. Collective funding models for Open Science infrastructure”, in which OpenCitations’ director Silvio Peroni discussed with other expert speakers the various goals and challenges of Open Science Infrastructures.

Open Science Fair is an initiative of OpenAIRE, co-organized by the following key international initiatives in the area of Open Science: COAR, EIFL, Force11, LA Referencia, LIBER, OPERAS, Sparc and Sparc Europe. A biennial event, it brings together “perspectives from different actors” and “suggests ways in which communities can work together to produce roadmaps for the implementation of Open Science”. The 2021 online event saw the participation of distinguished speakers in keynote talks, roundtable discussions, workshops and training sessions. At OpenCitations, we are grateful and proud to have been invited to participate in this collaborative and inspiring virtual fair.

The Lightning Talk “ScholeXplorer and OpenCitations as the new frontier of open citation indexing” was a presentation by Silvio Peroni, co-authored with Paolo Manghi (OpenAIRE), Alessia Bardi (CNR-ISTI), and Sandro La Bruzzo (CNR-ISTI). It described two of the 14 open data services included in the MONITOR portfolio of the OpenAIRE-Nexus project, which are used to monitor Open Science, research impact and open citations, the domains of interest of the two services presented. ScholeXplorer provides access under a CC-BY license to a huge set of links between datasets and scholarly publications, while OpenCitations, as an open infrastructure for open bibliographic metadata and citations, grants access to more than 1.18 billion DOI-to-DOI citation links, released with a CC0 public domain waiver. Both services also provide tools that allow users to query their data, with the common final goal of providing open links between scholarly resources and making them available within the OpenAIRE Research Graph.

As Silvio Peroni recalled, last July this article by Ian Hutchins:

B. Ian Hutchins; A tipping point for open citation data. Quantitative Science Studies 2021; 2 (2): 433–437. doi: https://doi.org/10.1162/qss_c_00138

celebrated passing the threshold of one billion open citations within public-domain databases in February 2021. However, the evanescence of “missing” or untraceable scholarly data, discussed in this post, can affect Open Science Infrastructures (OSI) as a whole, as highlighted during the Workshop “The perils of being invisible. Collective funding models for Open Science infrastructure”.

This workshop involved Agata Morka and Vanessa Proudman from SCOSS, and representatives of the three Open Science Infrastructures involved in the Second SCOSS Funding Cycle, namely Silvio Peroni (OpenCitations), Niels Stern (DOAB/OAPEN), and James MacGregor (PKP). The starting topic of the Workshop, in which the funder perspective was provided by Jean-Francois Lutz (University of Lorraine), was a paradox: although the scholarly community increasingly relies on Open Science Infrastructures, it seems very difficult to get its members “to realize that there are some operational and development costs related to their existence”. In this discouraging environment, SCOSS plays a crucial role in promoting such services to potential supporters. The most recent SCOSS annual survey, presented by Jon Treadway (Great North Wood Consulting) during the workshop and involving 217 institutions from across the world, helps us understand the impact and external perception of SCOSS on a global scale. It revealed a correspondence between the geographical areas in which SCOSS is best known and the locations of the major supporters, which perhaps should not surprise us. More importantly, it found that, in addition to interoperability and global significance, community governance was one of the most important aspects of the infrastructures that institutions chose to invest in. In particular, the chance to be directly involved in governance was found to act as a great incentive for university libraries to invest in external infrastructures. Community governance is a core aspect of the Principles for Open Scholarly Infrastructures and, as underlined in this blog post, OpenCitations is working hard to achieve full compliance.

Obstacles to offering support were found to include administrative barriers, although these could be reduced if individual supporters joined consortia. Overall, however, such administrative hurdles were minor issues. More significantly, some institutions were unable to pledge funds in the short term, and there was seen to be a need to raise awareness of the necessity of generating cultural change around the issue of community support. As Vanessa Proudman underlined, it is crucial to create dialogue with institutions and libraries, in order to foster the understanding that supporting Open Science Infrastructures could be beneficial for both students and researchers. It is equally important to convey that the minimum annual contribution requested by SCOSS from any one library to support such an infrastructure is less than a single article processing charge (APC), trivial in comparison with the huge annual institutional subscriptions for access to journals and citation indexes.

In a separate SCOSS interview, Marin Dacos (Open Science Advisor to the Director-General for Research and Innovation at the French Ministry of Higher Education, Research and Innovation) defined Open Science Infrastructures as “a common good of the scholarly community”, distinctly different from the services of commercial publishers. An essential value of an Open Scholarly Infrastructure is its community engagement and, from this perspective, the community funding approach promoted by SCOSS is the most efficient. As Silvio Peroni said, “What after SCOSS? SCOSS helps in creating a community around the service, and this community remains even after the three-year period of SCOSS promotion is over”.

Maybe, by fostering this growing global network of institutions and services committed to promoting more open research, Open Scholarly Infrastructures can find an answer to the question “How can we not be invisible?” and achieve not just ‘sustainable’ but sustained funding from the academic community.

Wikipedia Citations in Wikidata

Xosema, CC BY-SA 4.0, via Wikimedia Commons

The interconnection between Wikipedia and Wikidata is now larger than ever.

The Wikipedia Citations dataset currently includes around 30M citations from Wikipedia pages to a variety of sources, of which 4M are to scientific publications. Increasing the connections with external data services and providing structured data for one of the key elements of Wikipedia articles has two significant benefits. First, relevant encyclopedic articles related to scholarly studies become easier to discover. Second, Wikipedia can act as a social authority and policy hub, enabling policymakers to assess the importance of an article, person, research group or institution by looking at how many Wikipedia articles cite them.

These are the motivations behind the “Wikipedia Citations in Wikidata” project, supported by a grant from the WikiCite Initiative. From January 2021 until the end of April, the team of Silvio Peroni (director of OpenCitations), Giovanni Colavizza, Marilena Daquino, Gabriele Pisciotta and Simone Persiani from the University of Bologna (Department of Classical Philology and Italian Studies) worked on developing a codebase to enrich Wikidata with citations to scholarly publications that are currently referenced in English Wikipedia. This codebase consists of four software modules in Python and integrates new components (a classifier to distinguish citations by cited source and a look-up module to equip citations with identifiers from Crossref or other APIs). In so doing, Wikipedia Citations extends prior work, which focused only on citations already equipped with identifiers.

The workflow has four steps. The first two (extractor and converter) implement the mapping between the various ways citations are represented in Wikipedia articles and the OpenCitations Data Model (OCDM). A third component (enricher) then finds new identifiers for the entities in an OCDM-compliant dataset. Finally, the pusher step implements the mapping between the OCDM and Wikidata. The code has been released on GitHub.
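The four steps can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the project's actual code: the function names, the `Citation` fields, the simplified `{{cite journal}}` parsing, the title-based `lookup` callable (standing in for a Crossref-style look-up) and the shape of the output statement are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Citation:
    """Hypothetical, heavily simplified stand-in for an OCDM record."""
    citing_page: str
    title: str
    doi: Optional[str] = None


def extract(page: str, wikitext: str) -> list[dict]:
    """Extractor: collect the fields of each {{cite journal ...}} template."""
    raw = []
    for body in re.findall(r"\{\{cite journal\s*\|(.*?)\}\}", wikitext, flags=re.S):
        pairs = (part.split("=", 1) for part in body.split("|") if "=" in part)
        fields = {key.strip(): value.strip() for key, value in pairs}
        raw.append({"page": page, **fields})
    return raw


def convert(raw: dict) -> Citation:
    """Converter: map the raw template fields onto the simplified record."""
    return Citation(citing_page=raw["page"], title=raw.get("title", ""), doi=raw.get("doi"))


def enrich(citation: Citation, lookup: Callable[[str], Optional[str]]) -> Citation:
    """Enricher: ask the (assumed) look-up for a DOI when none is present."""
    if citation.doi is None:
        citation.doi = lookup(citation.title)
    return citation


def push(citation: Citation) -> Optional[dict]:
    """Pusher: build an illustrative statement; nothing is written to Wikidata."""
    if citation.doi is None:
        return None  # no identifier, so nothing to link
    return {"subject": citation.citing_page, "predicate": "cites", "object_doi": citation.doi}


def run_pipeline(page: str, wikitext: str, lookup: Callable[[str], Optional[str]]) -> list[dict]:
    """Run extractor, converter, enricher and pusher in sequence."""
    statements = []
    for raw in extract(page, wikitext):
        statement = push(enrich(convert(raw), lookup))
        if statement is not None:
            statements.append(statement)
    return statements
```

Keeping each step as a separate function mirrors the modular design described above: the enricher can be given any look-up source, and the pusher can be replaced without touching extraction.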

The extensive documentation that accompanies the release of the codebase is crucial for one of the principal aims of the project, i.e., the adoption and reuse of the codebase by the community in other relevant Wikimedia projects. The engagement of various communities (Wikidata, libraries, scholars…) is favored on the one hand by the increased number of citations included in Wikidata, and on the other by blogging and sharing updates on Twitter and public mailing lists.

This project, whose ambitious purpose is to make Wikipedia content more discoverable and to enrich Wikidata with a ready-to-use corpus for further analysis or for developing new services, is open to future developments. The intention is to use the software to create a dataset of English Wikipedia citations in order to understand, in particular, how many new entities (i.e., citing Wikipedia pages, cited articles and venues, authors) would need to be added to Wikidata to upload the whole set of extracted citations, which would add a massive number of new bibliographic entities to Wikidata.

The first steps have been taken. Now we aim to extend the engagement of the community involved, especially those scholars who leverage Wikidata in existing services, and to interact with the scholars, libraries and institutions interested in a new approach to research, focused on people (from individuals to research groups) and their intellectual relevance.

This post was first published on Diff, a Wikimedia community blog.

OpenCitations at LIBER Annual Conference 2021: ‘How Can Open Infrastructures Support the Role of Research Libraries?’

For the second year, OpenCitations has taken part in the LIBER annual conference. LIBER (Ligue des Bibliothèques Européennes de Recherche – Association of European Research Libraries) is a network that gathers 440 research libraries, based in more than 40 countries all over the world, with the mission of supporting Europe’s research libraries by highlighting their value to policymakers, providing resources and training, and forming valuable partnerships.

Since 1951, the LIBER Annual Conference has been a key event for the entire network, a keenly anticipated meeting for research library professionals whose mission is “to identify the most pressing needs for research libraries, and to share information and ideas for addressing those needs”. Due to the ongoing pandemic restrictions, the 50th LIBER meeting (23-25 June 2021) was held online, as was the 2020 meeting, with digital co-hosting by the University of Belgrade Library in Serbia. The online-showcase format, however, didn’t constrain the creation of a vital virtual square, fostered by the voices of 70 speakers. The main theme of the conference, “Libraries and Open Knowledge: from vision to implementation”, was explored in depth in 12 parallel sessions.

Professor Silvio Peroni, Director of OpenCitations, participated in Session #5 ‘How Can Open Infrastructures Support the Role of Research Libraries?’ with a presentation dedicated to the benefits of Open Infrastructures for libraries, dialoguing with James MacGregor (interim Managing Director of the Public Knowledge Project), Joanna Ball (Head of Roskilde University Library), and Niels Stern (director of OAPEN and co-Director of DOAB).

The session, chaired by Maaike Napolitano (National Library of the Netherlands), opened with a presentation by Fidan Limani (Research assistant at ZBW – Leibniz Information Centre for Economics) about the integration of scholarly artifacts from the domain of economics using Knowledge Graphs (KG), and the creation of a network of entities describing objects of interest and connections, while keeping a library perspective. The use of citation links connecting datasets and citations, and the adoption of ontologies and data export in RDF, would facilitate a possible beneficial collaboration between ZBW and Open Infrastructures such as OpenCitations (whose data is itself in the form of a Knowledge Graph).

OpenCitations also shares some common features with the other Open Infrastructures described in the second presentation: financial support from the SCOSS project; a community-based approach; and their promising value for libraries and the entire scholarly community.

OpenCitations is an independent not-for-profit infrastructure organization dedicated to open scholarship and the publication of open bibliographic and citation data by the use of Semantic Web (Linked Open Data) technologies. It is engaged in advocacy for open citations and open bibliographic metadata, as a founding member of both the Initiative for Open Citations (I4OC) and the Initiative for Open Abstracts (I4OA). It provides data containing more than 700 million citations that the community can use for any purpose. Such data can be crucial in national and international research evaluation exercises, making such activities more transparent and reproducible than they are with proprietary services. Librarians can use OpenCitations citation data (e.g., via our REST API) to enhance or develop tools to support their authors, researchers, students and institutional administrators in different kinds of contexts, for instance by providing metrics to monitor research at their institution and by improving the discoverability of research products such as publications and data.
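As an illustration of the kind of simple metric a library tool could derive from such data, the following Python sketch counts incoming citations per year for a given DOI. It is a sketch under assumptions rather than official client code: the base URL and the `creation` field of each returned record are assumed here and should be checked against the current OpenCitations API documentation, and the helper names are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL; verify it against the current OpenCitations API documentation.
API_BASE = "https://opencitations.net/index/coci/api/v1"


def citations_url(doi: str) -> str:
    """Build the (assumed) URL listing the citations received by a DOI."""
    return f"{API_BASE}/citations/{quote(doi, safe='/')}"


def fetch_citations(doi: str, timeout: float = 30.0) -> list:
    """Download the citation records for a DOI as a list of dictionaries."""
    with urlopen(citations_url(doi), timeout=timeout) as response:
        return json.load(response)


def citations_per_year(records: list) -> dict:
    """Count records by the year at the start of their 'creation' field."""
    years = Counter()
    for record in records:
        creation = record.get("creation") or ""
        if len(creation) >= 4 and creation[:4].isdigit():
            years[creation[:4]] += 1  # records without a usable date are skipped
    return dict(sorted(years.items()))


if __name__ == "__main__":
    # DOI of the Hutchins article cited elsewhere on this blog.
    print(citations_per_year(fetch_citations("10.1162/qss_c_00138")))
```

Separating the download from the counting keeps the metric testable offline and makes it easy to swap in a data dump instead of the live API.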

OAPEN is a not-for-profit foundation dedicated to increasing the discoverability of open access books and the trust placed in them. It runs three open-source platforms enabling open access to books: the Directory of Open Access Books (DOAB), a freely available basic indexing service that is easy to integrate into library catalogues; the OAPEN Library, a publication platform dedicated to hosting, preserving and distributing books; and the OAPEN OA Books Toolkit, a public information resource that helps authors build trust in open-access books.

PKP (Public Knowledge Project) is a software and library project, consisting of three applications (Open Journal Systems, Open Preprint Systems and Open Monograph Press).

The dialogue during this LIBER session wasn’t a mere presentation of these projects and their technical properties: the speakers emphasized the importance of ensuring the participation and the engagement of the stakeholder community, pointed out the crucial value of the support received – not only financial – from Research Libraries, and discussed how such Open Infrastructures can be beneficial for libraries.

How can libraries support Open Infrastructures? And what role do they play in a long-term solution? According to Joanna Ball, from a librarian perspective, it’s not only a who-benefits-whom problem, but it’s more about finding a “third way, about developing mutually beneficial partnerships, and going beyond the traditional way of approaching things so that we can really play to each other’s strengths.”

This approach is fully aligned with OpenCitations’ intentions. As Silvio Peroni underlined, in most cases the active collaboration between Open Infrastructures and libraries is not only about financial support, but about cooperatively reaching a common goal. In particular, “if infrastructures like OpenCitations provide appropriate and easy-to-use interfaces and tools that allow librarians to contribute appropriate bibliographic metadata, and if librarians are willing to enter such metadata from their own records, libraries may become a significant reliable source of this kind of information”. The result of such a ‘crowd-sourced’ entry of bibliographic metadata by libraries would be an enrichment of the overall global open knowledge graph made available through citational links.

In the last presentation, dedicated to two services provided by OPERAS, Emilie Blotière (CNRS) and Tiziana Lombardo (Net7) reiterated the value of scholarly communication. COESO and GO TRIPLE, funded by the European Commission, aim to create a persistent dialogue in the Social Sciences and Humanities community by tackling fragmentation and becoming a meeting point among different communities.

What emerged from the session is the importance of communication, cooperation and networking between Open Infrastructures and Libraries, a message that perfectly matches the core values of LIBER: collaboration and inclusivity. The next LIBER annual conference is scheduled for June 2022 in Odense, hopefully recreating the enthusiastic in-person gatherings of previous meetings.

You can find the recording of the full session here: LIBER 2021 Session #5: How Can Open Infrastructures Support the Role of Research Libraries?

You can find the slides of the session on Zenodo.

The Initiative for Open Abstracts is launched

OpenCitations is proud to be part of the launch of the Initiative for Open Abstracts, a new cross-publisher initiative calling for the unrestricted availability of abstracts to boost the discovery of research.

We’re thrilled to announce the launch of the Initiative for Open Abstracts, a new initiative that calls for unrestricted availability of abstracts to boost the discovery of research https://t.co/P0pnHVx7V4 #I4OA pic.twitter.com/nxdyZ23fJV
— Initiative for Open Abstracts (@open_abstracts) September 24, 2020

The Initiative for Open Abstracts (I4OA), launched on September 24th, calls on all scholarly publishers to open their abstracts, and specifically to deposit them with Crossref, in order to facilitate large-scale access and promote discovery of critical research.

Making abstracts openly available helps scholarly publishers to maximize the visibility and reach of their journals and books. Open abstracts make it easier for scholars to discover, read and then cite these publications; promote their inclusion in systematic reviews; expand and simplify the use of text mining, natural language processing and artificial intelligence techniques in bibliometric analyses; and facilitate scholarship across all disciplines by those without subscription access to commercial bibliographic services.

Many abstracts are already available in various bibliographic databases, but these sources have limitations, for example because they require a subscription, are not machine-accessible, or are restricted to a specific discipline. I4OA thus calls on all scholarly publishers using Crossref DOIs to make their abstracts openly available by depositing them with Crossref. This can be done as part of established workflows that publishers already have in place for submitting publication metadata to Crossref.

As detailed on the I4OA web site at https://i4oa.org, 40 publishers have already agreed to support I4OA and to make their abstracts openly available. I4OA is also supported by 56 other stakeholders including research funders, libraries and library associations, infrastructure providers, and open science organizations, demonstrating the importance and relevance of this Initiative to the scholarly community. The launch press release is available at https://i4oa.org/press.html#pressrelease.

I4OA was inspired by the success of the Initiative for Open Citations (I4OC, https://i4oc.org/), which encourages the submission of references to Crossref. Since the launch of I4OC in 2017, over two thousand scholarly publishers have chosen to make the reference lists of their journal articles and book chapters openly available through Crossref. I4OA aims to replicate the success of I4OC by achieving a rapid jump in the open availability of scholarly abstracts via Crossref.

Further information may be obtained from the I4OA web site at https://i4oa.org, from the I4OA poster at https://doi.org/10.5281/zenodo.4047454, by attending the free I4OA launch webinar on October 5th 2020 at 4 pm CEST (register at https://tinyurl.com/i4oa-webinar), by emailing Professor Ludo Waltman (CWTS, Leiden University; coordinator of I4OA) at [email protected], or by following @open_abstracts on Twitter.

OpenCitations described

OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data. We at OpenCitations are proud to announce the publication, in the first issue of Quantitative Science Studies, of a canonical paper in which we introduce and describe OpenCitations and outline its achievements and goals [1].

Here, I outline the contents of our paper, and provide definitive links on the topics described. Many of these topics have been the subjects of earlier blog posts.

This paper appears in the first Special Issue of QSS, dedicated to the description of the bibliometric data sources that lie at the heart of scientometric research, which aims to characterize the most important data sources currently available and to show how they differ in various dimensions, for instance in the data they provide, their level of openness, and their support for making research reproducible. The first three papers in this special issue cover the most important commercial bibliographic data sources: Web of Science (Clarivate Analytics), Scopus (Elsevier), and Dimensions (Digital Science), while the remaining three articles describe open data sources: Microsoft Academic, Crossref and OpenCitations.

In the introduction to our own paper, we describe the origins of OpenCitations, discuss the growth and benefits of open science, and introduce the Semantic Web techniques used at OpenCitations for recording and publishing our data. We then go on to describe OpenCitations’ services and data, namely Open Citation Identifiers, the OpenCitations Data Model, the SPAR (Semantic Publishing and Referencing) Ontologies, the OpenCitations Corpus, and the OpenCitations Indexes of citation data, of which the first and largest is COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations, that currently holds information on over 624 million citations. We conclude our survey of OpenCitations’ services and data by outlining the generic open source software developed at OpenCitations, including OSCAR, the OpenCitations RDF Search Application for searching over RDF datasets, LUCINDA, OSCAR’s associated OpenCitations RDF Resource Browser, and RAMOSE, OpenCitations’ application for creating REST APIs over SPARQL endpoints, thus opening Semantic Web datasets to those not familiar with SPARQL, the RDF query language.

In the second half of the paper, we describe OpenCitations as an organization in terms of its compliance with the principles for the sustainability of open infrastructures proposed by Bilder, Lin and Neylon (2015) [2], and report the selection of OpenCitations by the Global Sustainability Coalition for Open Science Services (SCOSS) as an open infrastructure organization worthy of crowd-funding support by the stakeholder community. We then provide usage statistics for our datasets and web site, and describe the adoption of OpenCitations data and services by the community, before concluding with a forward look at our proposed developments of OpenCitations activities.

References

[1] Silvio Peroni and David Shotton (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies 1 (1): 428-444. https://doi.org/10.1162/qss_a_00023

[2] Geoffrey Bilder, Jennifer Lin and Cameron Neylon (2015). Principles for open scholarly infrastructures. Figshare. https://doi.org/10.6084/m9.figshare.1314859

The Wellcome Trust funds OpenCitations

The Open Biomedical Citations in Context Corpus funded by the Wellcome Trust

The Wellcome Trust, which funds research in big health challenges and campaigns for better science, has agreed to fund The Open Biomedical Citations in Context Corpus, a new project to enhance the OpenCitations Corpus, as part of the Open Research Fund programme.

As readers of this blog will know, the OpenCitations Corpus is an open scholarly citation database that freely and legally makes available accurate citation data (academic references) to assist scholars with their academic studies, and to serve knowledge to the wider public.

Objectives

The Open Biomedical Citations in Context Corpus, funded by the Wellcome Trust for 12 months from March 2019, will make the OpenCitations Corpus (OCC) more useful to the academic community by significantly expanding the kinds of citation data held within the Corpus, so as to provide data for each individual in-text reference and its semantic context, making it possible to distinguish references that are cited only once from those that are cited multiple times, to see which references are cited together (e.g. in the same sentence), to determine in which section of the article references are cited (e.g. Introduction, Methods), and, potentially, to retrieve the function of the citation.

At OpenCitations, we will achieve these objectives in the following ways:

by extending the OpenCitations Data Model so as to describe how the in-text reference data should be modeled in RDF for inclusion in the OpenCitations Corpus;
by developing scripts for extracting in-text references from articles within the Open Access Subset of biomedical literature hosted by Europe PubMed Central;
by extending the existing ingestion workflow so as to add the new in-text reference data into the Corpus;
by developing appropriate user interfaces for querying and browsing these new data.
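
To make the kinds of in-text reference data described in the objectives above more concrete, here is a minimal Python sketch. It is not the OpenCitations Data Model or any project code: the `InTextReference` class, its field names, and the sample values are all assumptions invented for illustration. It only shows how per-occurrence records would let one separate references cited once from those cited several times, find references cited together in the same sentence, and list the sections in which each reference is cited.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InTextReference:
    """One occurrence of a reference in the text (hypothetical structure)."""
    cited_ref: str   # hypothetical key of the bibliographic reference
    section: str     # section of the article where the occurrence appears
    sentence: int    # hypothetical running sentence number within the article


# Invented sample data: ref-1 is cited twice, ref-2 once.
pointers = [
    InTextReference("ref-1", "Introduction", 1),
    InTextReference("ref-2", "Introduction", 1),
    InTextReference("ref-1", "Methods", 7),
]

# References cited once versus references cited multiple times.
counts = Counter(p.cited_ref for p in pointers)
cited_once = sorted(ref for ref, n in counts.items() if n == 1)
cited_multiple = sorted(ref for ref, n in counts.items() if n > 1)

# References cited together in the same sentence.
by_sentence = defaultdict(set)
for p in pointers:
    by_sentence[(p.section, p.sentence)].add(p.cited_ref)
co_cited = [sorted(refs) for refs in by_sentence.values() if len(refs) > 1]

# Sections in which each reference is cited.
sections = defaultdict(set)
for p in pointers:
    sections[p.cited_ref].add(p.section)

print(cited_once, cited_multiple, co_cited, dict(sections))
```

A plain list of reference-to-article links could not answer any of these three questions, because it records only that a work is cited, not where or how often.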

Personnel

We are looking for a post-doctoral computer scientist / research engineer specifically to achieve the aforementioned objectives. This post-doctoral appointment will start on 1st March 2019. We seek a highly intelligent, skilled and motivated individual who is expert in Python, Semantic Web technologies, Linked Data and Web technologies. Additional expertise in Web Interface Design and Information Visualization would be highly beneficial, as would a strong and demonstrable commitment to open science and team-working abilities.

The minimal formal requirement for this position is a Master’s degree in computer science, computer science and engineering, telecommunications engineering, or equivalent title, but it is expected that the successful applicant will have had research experience leading to a doctoral degree. The position has a net salary (exempt from income tax, after deduction of social security contributions) in excess of 23K euros per year.

The formal advertisement for this post – which will be held at the Digital Humanities Advanced Research Centre (DHARC), Department of Computer Classical Philology and Italian Studies, University of Bologna, Italy, under the supervision of Dr Silvio Peroni – is published online, and it is accompanied by the activity plan (in Italian and English). The application must be submitted exclusively online by logging in to the website https://concorsi.unibo.it (the default language is Italian, but there is a link to switch to English). People who do not have a @unibo.it email account must register on the platform. The deadline for application is 25th January 2019 at 15:00 Central European Time. Please feel free to contact Silvio Peroni (silvio dot peroni at unibo dot it) for further information.

People involved

The people formally involved in the project are:

Vincent Larivière – École de Bibliothéconomie et des Sciences de l’Information, Université de Montréal, Canada;
Silvio Peroni (Principal Investigator) – Digital Humanities Advanced Research Centre (DHARC), Department of Computer Classical Philology and Italian Studies, University of Bologna, Italy, and Director of OpenCitations;
David Shotton – Oxford e-Research Centre, University of Oxford, Oxford, UK, and Director of OpenCitations;
Ludo Waltman – Centre for Science and Technology Studies (CWTS), Leiden University, Netherlands.

In addition, the project is supported by Europe PubMed Central (EMBL-EBI, Hinxton, UK).

Citations as First-Class Data Entities: Introduction

Citations are now centre stage

As a result of the Initiative for Open Citations (I4OC), launched on April 6 last year, almost all the major scholarly publishers now open the reference lists they submit to Crossref, resulting in more than half a billion references being openly available via the Crossref API.

It is therefore time to think carefully about how citations are treated, and how they might be better handled as part of the Linked Open Data Web.

Citations are normally treated simply as the links between published entities.

Conventional citation

However, an alternative richer view is to regard a citation as a data entity in its own right.

First class citation

This permits us to endow a citation with descriptive properties, such as

has citation creation date:   3rd March 2015
has citation time span:       6 years, 5 months and 23 days
has type:                     Self-citation
has identifier:               oci:7295288-3962641

[Note: a later blog post entitled “Open Citation Identifiers” will include an explanation of the identifier shown here.]
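
As a rough illustration of what endowing a citation with its own properties could look like in code, here is a minimal Python sketch. The `Citation` class, its field names, and the `citing` and `cited` placeholder labels are assumptions made for this example and are not taken from OpenCitations software; rendering the time span as an ISO 8601 duration string is likewise just one possible formatting choice. The property values are the ones listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Citation:
    """A citation treated as a data entity in its own right (illustrative only)."""
    identifier: str                  # e.g. the identifier shown in the listing above
    citing: str                      # hypothetical label for the citing work
    cited: str                       # hypothetical label for the cited work
    creation_date: date              # "has citation creation date"
    timespan: Tuple[int, int, int]   # "has citation time span" as (years, months, days)
    citation_type: str               # "has type"

    def timespan_iso(self) -> str:
        """Format the time span as an ISO 8601 duration string."""
        years, months, days = self.timespan
        return f"P{years}Y{months}M{days}D"


example = Citation(
    identifier="oci:7295288-3962641",
    citing="citing-work",            # placeholder, not a real identifier
    cited="cited-work",              # placeholder, not a real identifier
    creation_date=date(2015, 3, 3),
    timespan=(6, 5, 23),
    citation_type="Self-citation",
)

print(example.identifier, example.creation_date.isoformat(), example.timespan_iso())
```

Because the citation is an object rather than a bare link, all of its properties travel together and can be compared, counted, or serialized as a unit.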

Advantages of treating citations as First-Class Data Entities

All the information regarding each citation is available in one place.
Citations become easier to describe, distinguish, count and process.
If available in aggregate, citations described in this manner are easier to analyze using bibliometric methods, for example to determine how citation time spans vary by discipline.

Requirements for citations to be treated as First-Class Data Entities

They must be definable in a machine-readable manner as a member of the class “Citation”, and describable using appropriate ontology terms.
They must have metadata structured using a generic yet appropriately detailed data model.
They must be storable, searchable and retrievable in an open database designed for bibliographic citations.
They must be identifiable using a global persistent identifier scheme.
There must be a Web-based identifier resolution service that takes the citation identifier as input and returns a description of the citation.
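
The requirements above can be pictured with a toy sketch in Python. This is an in-memory stand-in, not the OpenCitations database or its resolution service: the `store`, `search` and `resolve` functions, the `_STORE` dictionary, and the record keys are all hypothetical names chosen for illustration. It shows a record typed as a member of the class "Citation", stored under a persistent identifier, searchable by its metadata, and retrievable by passing the identifier to a resolver.

```python
from typing import Dict, List, Optional

# Toy in-memory stand-in for an open citation database (assumption for illustration).
_STORE: Dict[str, Dict[str, str]] = {}


def store(identifier: str, description: Dict[str, str]) -> None:
    """Save a citation record, typed as a Citation and keyed by its identifier."""
    _STORE[identifier] = {**description, "class": "Citation", "identifier": identifier}


def search(**criteria: str) -> List[Dict[str, str]]:
    """Return every stored record whose metadata matches all the given criteria."""
    return [
        record for record in _STORE.values()
        if all(record.get(key) == value for key, value in criteria.items())
    ]


def resolve(identifier: str) -> Optional[Dict[str, str]]:
    """Take a citation identifier and return its description, or None if unknown."""
    return _STORE.get(identifier)


store(
    "oci:7295288-3962641",
    {"creation_date": "2015-03-03", "citation_type": "Self-citation"},
)

print(resolve("oci:7295288-3962641"))
print(search(citation_type="Self-citation"))
```

A real service would expose `resolve` over the Web and back it with a persistent store; the sketch only shows how the five requirements relate to one another.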

Blog post detailing how these requirements are met

Subsequent blog posts will describe how we at OpenCitations have satisfied these requirements, permitting citations to indeed be treated as First-Class Data Entities.

Libraries and linked data #1: What are linked data?

The first of six blog posts about libraries and linked data, bearing this title, is to be found at

http://semanticpublishing.wordpress.com/2013/03/01/lld1-what-are-linked-data/.

A draft of that post, which erroneously appeared here in this blog, has been removed.