inproceedings by Timo Homburg
This publication proposes a best-practice digital processing pipeline for cuneiform languages. The pipeline includes the following steps: 1. annotation of cuneiform tablet 3D scans; 2. creation of transliterations in ATF, using PaleoCodage to capture cuneiform character variants; 3. conversion and subsequent annotation of the transliterations in TEI (structurally, semantically, and linguistically); 4. creation of semantic dictionaries; 5. export of the results in various formats to support the needs of many research communities. This poster shows how such a pipeline can be realized using a traditional Git versioning system and a variety of web-based tools assisting in the annotation and export.
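As a rough illustration of step 3, the sketch below wraps one ATF transliteration line in a minimal TEI-style XML element using only Python's standard library; the element and attribute names are illustrative assumptions, not the project's actual schema.

```python
# Minimal sketch of ATF-to-TEI structural conversion (step 3 above).
# Element/attribute names are illustrative, not the project's schema.
import xml.etree.ElementTree as ET

def atf_line_to_tei(atf_line: str, line_no: int) -> ET.Element:
    """Split an ATF line into word tokens and emit a TEI-like <l> element."""
    l = ET.Element("l", n=str(line_no))
    for token in atf_line.split():
        w = ET.SubElement(l, "w")
        w.text = token
    return l

line = atf_line_to_tei("um-ma {d}EN.ZU-i-din-nam-ma", 1)  # sample ATF line
print(ET.tostring(line, encoding="unicode"))
# -> <l n="1"><w>um-ma</w><w>{d}EN.ZU-i-din-nam-ma</w></l>
```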
In this publication we introduce a linked-data-powered application which assists users in finding so-called Stolpersteine, stones commemorating Jewish victims of the Second World War. We show the feasibility of a progressive web app built on linked data resources and evaluate this app against local data sources to find out whether the current linked data environment can equally and sufficiently support an application in this knowledge domain.
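A minimal sketch of the kind of linked data lookup such an app could perform, assuming the public Wikidata SPARQL endpoint and matching the Stolperstein class by its German label rather than a hard-coded QID:

```python
# Hedged sketch: query Wikidata for Stolpersteine with coordinates.
# The class is matched via its German label to avoid hard-coding a QID.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
endpoint.setQuery("""
SELECT ?stone ?stoneLabel ?coords WHERE {
  ?stone wdt:P31 ?class .                 # P31: instance of
  ?class rdfs:label "Stolperstein"@de .
  ?stone wdt:P625 ?coords .               # P625: coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en". }
}
LIMIT 10
""")
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["stoneLabel"]["value"], row["coords"]["value"])
```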

A major problem in research on new artificial intelligence methods for workflows is evaluation: large evaluation corpora are lacking. Existing methods either model workflows manually or use workflow extraction to derive workflows automatically from text. Both approaches have limitations. Manual modeling of workflows requires a lot of human effort, so creating a large test corpus would be expensive. Workflow extraction is limited by the number of existing textual process descriptions, and it is not guaranteed that the extracted workflows are semantically correct. In this paper we suggest setting up a planning domain and applying a planner to create a large number of valid plans, from which workflows can be derived. The planner uses a semantic eligibility function to determine whether an operator can be applied to a resource. We present a first concept and a prototype implementation in the cooking workflow domain.
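A minimal sketch of what a semantic eligibility function might look like in the cooking domain; the type hierarchy and operator signature are invented for illustration:

```python
# Hedged sketch of a semantic eligibility check in a cooking planning domain.
# The type hierarchy and operator signatures are invented for illustration.
SUBTYPES = {
    "tomato": "vegetable",
    "vegetable": "ingredient",
    "knife": "tool",
}

def is_a(resource: str, required_type: str) -> bool:
    """Walk the type hierarchy to test semantic compatibility."""
    while resource is not None:
        if resource == required_type:
            return True
        resource = SUBTYPES.get(resource)
    return False

def eligible(operator_input_type: str, resource: str) -> bool:
    """An operator is applicable iff the resource satisfies its input type."""
    return is_a(resource, operator_input_type)

print(eligible("vegetable", "tomato"))  # True: a planner may apply e.g. "chop"
print(eligible("vegetable", "knife"))   # False: operator not applicable
```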
We present experiments on word segmentation for Akkadian cuneiform, an ancient writing system and language used for about three millennia in the ancient Near East. To the best of our knowledge, this is the first study of its kind applied to either the Akkadian language or the cuneiform writing system. As a logosyllabic writing system, cuneiform structurally resembles East Asian writing systems, so we employ word segmentation algorithms originally developed for Chinese and Japanese. We describe results of rule-based algorithms, dictionary-based algorithms, and statistical and machine learning approaches. Our results indicate promising directions for cuneiform word segmentation that can enable and improve natural language processing in this area.
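As an illustration of the dictionary-based family, the sketch below implements forward maximum matching, a classic greedy segmenter from Chinese NLP; the toy lexicon and sign sequence are invented:

```python
# Hedged sketch: forward maximum matching over a toy sign sequence.
# The lexicon and input below are invented for illustration.
def max_match(signs: list[str], lexicon: set[tuple[str, ...]], max_len: int = 4):
    """Greedily take the longest dictionary word starting at each position."""
    words, i = [], 0
    while i < len(signs):
        for length in range(min(max_len, len(signs) - i), 0, -1):
            candidate = tuple(signs[i:i + length])
            if length == 1 or candidate in lexicon:
                words.append(candidate)  # single signs are the fallback
                i += length
                break
    return words

lexicon = {("šum", "ma"), ("a", "wi", "lum")}
print(max_match(["šum", "ma", "a", "wi", "lum"], lexicon))
# -> [('šum', 'ma'), ('a', 'wi', 'lum')]
```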
inbooks by Timo Homburg
In this paper we present a new concept of geospatial quality assurance that is currently planned to be implemented at the German Federal Agency for Cartography and Geodesy. Linked open data is enriched with Semantic Web data in order to create thematic maps relevant to the population. We evaluate the quality of such enriched maps using a standardized process and look at the possible impacts of enriching Semantic Web data with open data sets of the Federal Agency for Cartography and Geodesy.

In this paper we present a new way to evaluate geospatial data quality using semantic technologies. In contrast to non-semantic approaches to evaluating data quality, semantic technologies allow us to model situations in which geospatial data may be used and to apply customized geospatial data quality models on a broad scale using reasoning algorithms. We explain how to model data quality in various contexts using common vocabularies and ontologies, apply data quality results using reasoning in a real-world application case with OpenStreetMap as our data source, and highlight our findings using the example of disaster management planning for rescue forces. We contribute to the Semantic Web and OpenStreetMap communities by proposing a semantic framework for combining use-case-dependent data quality assignments, which can be used as reasoning rules and as data quality assurance tools for both communities.
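A toy, non-semantic sketch of the underlying idea of use-case-dependent quality models; the actual framework expresses such rules over RDF with reasoning, and the tag requirements below are invented:

```python
# Hedged sketch: a use-case-dependent completeness check over OSM tags.
# The required-tag lists are invented; a real rescue-forces model would
# be expressed as reasoning rules over RDF rather than Python dicts.
REQUIRED_TAGS = {
    "rescue_planning": {
        "amenity=hospital": ["name", "emergency", "addr:street"],
    },
}

def completeness(use_case: str, feature_class: str, tags: dict) -> float:
    """Fraction of use-case-required tags present on an OSM feature."""
    required = REQUIRED_TAGS[use_case][feature_class]
    present = sum(1 for key in required if key in tags)
    return present / len(required)

tags = {"amenity": "hospital", "name": "St. Vincenz", "emergency": "yes"}
print(completeness("rescue_planning", "amenity=hospital", tags))  # ~0.67
```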
phdtheses by Timo Homburg
miscs by Timo Homburg
This presentation covers recent efforts to transliterate, 3D scan, and annotate cuneiform texts excavated at Haft Tappeh in Iran. The history of the texts and artifacts is described, and new digital methods for annotating texts and registering character variants are introduced to support the creation of linked data results for cuneiform annotation projects.

Presentation Topic and State of the Art: On our poster we present ongoing work to create an automatic natural language processing tool for Hittite cuneiform. Hittite cuneiform texts are to this day manually transcribed by the respective experts and then published in a transliteration format (commonly ATF). Pictures of the original cuneiform tablet may be provided, and more rarely cuneiform representations in Unicode are present. Due to recent advancements in the field (such as Cuneify), an automatic conversion of many Hittite cuneiform transliterations to their respective cuneiform representation is possible.

Research Contributions: We build upon this work by creating tools that aim to automatically translate Hittite cuneiform texts to English from either a Unicode cuneiform representation or their transliteration representation.

POS Tagger: We have created a morphological analyzer to detect nouns, verbs, several kinds of pronouns, their respective declensions and suffixes, as well as structural particles. On a sample set of annotated Hittite texts from different epochs, in cuneiform and transliteration representation, we have evaluated the morphological analyzer, its advantages, problems, and possible solutions, and we intend to present the results as well as some POS tagging examples in section one of our poster.

Dictionary Creation: Dictionaries for Hittite cuneiform often exist in non-machine-readable formats and without a connection to Semantic Web concepts. We intend to change this situation by parsing digitally available non-semantic dictionaries and using matching algorithms to find concepts for the English translations of such dictionaries in the Semantic Web, e.g. DBpedia or Wikidata. Dictionaries of this kind are stored using the Lexicon Model for Ontologies (lemon); a minimal lemon-style entry is sketched after this abstract. In addition to freely available dictionaries, we intend to use expert resources developed by the Academy of Sciences in Mainz, Germany to verify and extend our generated dictionaries. We intend to present the dictionary creation process, statistics about the content of the generated dictionaries, and their impact in section two of our poster.

Machine Translation: Using the newly created dictionaries as well as the POS tagging information, we intend to test several automated machine translation approaches, of which we outline the process and possible approaches in poster section three.

Contributions for the Communities: With our approaches we intend to contribute to the archaeological community in Germany by analysing Hittite cuneiform tablets. Together with work from the University of Heidelberg on image recognition of cuneiform tablets, we want to focus on creating a natural language processing pipeline from scanning cuneiform tablets to an available translation in English.
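A minimal sketch of what a lemon-style dictionary entry could look like in rdflib; the URIs, lemma, and sense link are illustrative placeholders, not the project's actual dictionary content:

```python
# Hedged sketch: a minimal lemon/OntoLex-style entry for a Hittite noun.
# URIs, the lemma, and the sense link are invented placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
EX = Namespace("https://example.org/hittite/")  # placeholder namespace

g = Graph()
g.bind("ontolex", ONTOLEX)

entry = EX["watar"]                      # Hittite 'watar' = water
g.add((entry, RDF.type, ONTOLEX.LexicalEntry))

form = EX["watar#form"]
g.add((entry, ONTOLEX.canonicalForm, form))
g.add((form, ONTOLEX.writtenRep, Literal("wa-a-tar", lang="hit")))

# Link the sense to a Semantic Web concept (here: DBpedia's 'Water').
sense = EX["watar#sense"]
g.add((entry, ONTOLEX.sense, sense))
g.add((sense, ONTOLEX.reference, URIRef("http://dbpedia.org/resource/Water")))

print(g.serialize(format="turtle"))
```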

Introduction and Motivation: Semantic extraction mechanisms (e.g. topic modelling) have been used for many years in the Semantic Web and natural language processing fields, as well as in the digital humanities, as methods for visualizing and automatically categorizing documents. Their use often reveals new aspects of the interpretation of document collections that were not previously apparent. Machine learning algorithms, which can perform a rough classification of texts, are frequently employed in such methods. Paired with text metadata, thematic overviews of documents with a geographic reference can then be displayed automatically on maps in GIS systems, or temporal relationships can be shown using historical gazetteers. In this publication we want to use the possibilities of semantic extraction and apply them to a collection of texts in cuneiform languages.

Cuneiform Languages: Cuneiform languages have received increasing interest in the digital humanities and linguistics communities in recent years (Inglese 2015, Homburg et al. 2016, Homburg 2017, Sukhareva et al. 2017). Alongside the ongoing standardization in Unicode, part-of-speech taggers and automated translation mechanisms, among others, are being tested in order to better capture and interpret cuneiform texts with the computer. Furthermore, the learnability of cuneiform languages has been improved by digital tools such as input methods and flashcard learning programs (Homburg 2015). Despite all the progress achieved, numerous problems remain in the machine processing of cuneiform languages, related among other things to the low availability of annotated resources and the lack of machine-readable, semantically and linguistically annotated dictionaries. These limitations prevent many natural language processing and semantic extraction algorithms from achieving better results. With this publication we want to contribute to improving this situation and present the "Semantic Dictionary for Ancient Languages", an attempt to create a semantic resource in RDF for the optimization of such algorithms for the languages Hittite, Sumerian, and Akkadian, by annotating dictionary resources recognized in the research community with Unicode characters, Semantic Web concepts, etymological data, shared vocabularies, and POS tags. The dictionary is based on the lemon standard, a W3C model that also allows multilingual resources to be represented. In this way, developments of the language and shared vocabularies, such as Akkadograms and Sumerograms in Hittite, can also be captured.

Semantic Dictionary and Semantic Extraction: We test the performance of the dictionary on one of the largest collections of digital cuneiform texts, the CDLI, from which we extract representative texts in Hittite, Sumerian, and Akkadian cuneiform from different epochs and classify and tag them using machine learning.

The result of the semantic extraction is a collection of topics per cuneiform tablet, which in turn can be grouped into super-categories and placed in a temporal, linguistic, dialectal, and spatial context. Based on the various metadata of the CDLI, we were able to create a thematic map of the find spots of the cuneiform tablets and their contents per epoch, from which the relevant scholarly audience can infer which topics were relevant to the scribes of a given epoch, at which time, and at which find spot. As a further development, we want to complement this information with additional metadata, such as jurisdiction, the dates of the respective rulers, and reconstructed ancient places, in order to draw conclusions about interesting historical events.

Poster Structure: On our poster we present the construction process and structure of the semantic dictionary as well as the map resulting from our semantic extraction, in order to invite the respective domain scientists to a discussion about the development of a Semantic Web of cuneiform languages and cuneiform artifacts. Furthermore, our poster demonstrates a number of applications that can be developed on top of our semantic resource in the future, in order to contribute to a hopefully forthcoming linked data dataset of cuneiform artifacts for the documentation of cuneiform.
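A minimal sketch of the kind of topic extraction described above, using gensim's LDA over a toy tokenized corpus standing in for CDLI transliterations:

```python
# Hedged sketch: topic extraction over tokenized transliterations with
# gensim's LDA; the toy corpus below stands in for real CDLI texts.
from gensim import corpora, models

texts = [
    ["še", "gur", "lugal", "mu"],        # toy "administrative" text
    ["dingir", "lugal", "an", "ki"],     # toy "religious" text
    ["še", "gur", "dam", "gar"],         # toy "economic" text
]
dictionary = corpora.Dictionary(texts)
bow_corpus = [dictionary.doc2bow(text) for text in texts]

lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      random_state=42, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)   # one weighted word list per extracted topic
```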
Open geospatial data sources like OpenStreetMap are created by a community of mappers with different levels of experience and different equipment available. It is therefore important to assess the quality of OpenStreetMap-like maps in order to give users recommendations about the situations in which a map is suitable for their needs. In this work we use already-defined measures for assessing the quality of geospatial data and apply them as features in various machine learning algorithms to classify which areas are likely to change in future revisions of the map. In a next step we intend to characterize the changes detected by the algorithm and try to find causes for the tracked changes.
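A minimal sketch of the proposed setup, with invented quality measures as features for a scikit-learn classifier:

```python
# Hedged sketch: predicting whether a map area will change, using invented
# quality measures as features for a scikit-learn classifier.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Each row: [tag completeness, feature density, days since last edit]
X = [[0.9, 120, 10], [0.2, 15, 900], [0.7, 80, 30],
     [0.1, 5, 1200], [0.8, 100, 20], [0.3, 20, 700]]
y = [1, 0, 1, 0, 1, 0]   # 1 = area changed in a later map revision

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out areas
```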
articles by Timo Homburg

GeoSPARQL is an important standard for the geospatial linked data community, given that it defines a vocabulary for representing geospatial data in RDF, defines an extension to SPARQL for processing geospatial data, and provides support for both qualitative and quantitative spatial reasoning. However, the community has been missing a comprehensive and objective way to measure the extent of GeoSPARQL support in GeoSPARQL-enabled RDF triplestores. To fill this gap, we developed the GeoSPARQL compliance benchmark. We propose a series of tests that check the compliance of RDF triplestores with the GeoSPARQL standard, in order to determine how many of the requirements outlined in the standard a tested system supports. This topic is of concern because the support of GeoSPARQL varies greatly between different triplestore implementations, and the extent of support is of great importance to different users. In order to showcase the benchmark and its applicability, we present a comparison of the benchmark results of several triplestores, providing insight into their current GeoSPARQL support and the overall state of GeoSPARQL support in the geospatial linked data domain.
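A sketch of the flavor of check such a compliance test might issue, here probing a GeoSPARQL topology function against a placeholder endpoint for the system under test:

```python
# Hedged sketch: does the triplestore under test evaluate a GeoSPARQL
# topology function correctly? The endpoint URL is a placeholder.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://localhost:3030/ds/sparql")  # placeholder endpoint
endpoint.setQuery("""
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?inside WHERE {
  BIND(geof:sfContains(
    "POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))"^^geo:wktLiteral,
    "POINT(1 1)"^^geo:wktLiteral) AS ?inside)
}
""")
endpoint.setReturnFormat(JSON)
result = endpoint.query().convert()["results"]["bindings"][0]
print(result["inside"]["value"])  # a compliant store returns "true"
```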
Checking the compliance of geospatial triplestores with the GeoSPARQL standard represents a crucial step for many users when selecting an appropriate storage solution. This publication presents the software comprising the GeoSPARQL compliance benchmark, a benchmark which checks RDF triplestores for compliance with the requirements of the GeoSPARQL standard. Users can execute this benchmark within the HOBBIT benchmarking platform to quantify the extent to which the GeoSPARQL standard is implemented in a triplestore of interest. This enables users to make an informed decision when choosing an RDF storage solution and helps assess the general state of adoption of geospatial technologies on the Semantic Web.
In this publication we present results of a comparative study of Wikidata and OpenStreetMap (OSM) in the area of Germany, Austria, and Switzerland. We include metadata of OSM and Wikidata, and compare the two datasets on an object-by-object basis and on equivalent properties as defined by the respective communities. Our results give an indication of the tag coverage in the respective countries, which objects are typically associated with a wikidata tag, which mistakes are commonly made when annotating OSM objects with wikidata tags, and the equality and equivalence of the respective Wikidata and OSM objects.
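A minimal sketch of how OSM objects carrying a wikidata tag can be fetched for such a comparison, using the public Overpass API; the bounding box (around Mainz) is an arbitrary example:

```python
# Hedged sketch: fetch a few OSM nodes carrying a `wikidata` tag via the
# Overpass API, the starting point for an object-by-object comparison.
import requests

query = """
[out:json][timeout:25];
node["wikidata"](49.9,8.2,50.1,8.4);
out tags 5;
"""
resp = requests.post("https://overpass-api.de/api/interpreter",
                     data={"data": query})
for element in resp.json()["elements"]:
    tags = element["tags"]
    print(element["id"], tags.get("name"), tags["wikidata"])
```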
Classification of stones (family hierarchy, name description, etc.)
Visualization of relationships (kinship relations, tribal boundaries) on maps generated from Linked Data
Formal capture and machine-readable encoding of Ogham characters following the model of PaleoCodage (Homburg 2019)
As the data basis for our analyses we rely on a Wikidata retro-digitization of the CIIC corpus by Macalister (1945, 1949), EpiDoc data from the Ogham in 3D project, and the Celtic Inscribed Stones Project (CISP2) database, which was kindly made available to us by Dr. Kris Lockyear. Furthermore, we actively add missing and suitable elements to Wikidata in order to later provide the data to the research community in the spirit of the SPARQL Unicorn (Thiery and Trognitz 2019a, 2019b). The source code of our app is openly available on GitHub (Homburg & Thiery 2019).
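A minimal sketch, in the spirit of the SPARQL Unicorn, of retrieving Ogham stones from Wikidata; the class is matched by an assumed English label instead of a hard-coded QID:

```python
# Hedged sketch: query Wikidata for Ogham stones. Matching the class by
# label is an assumption to avoid hard-coding a QID here.
from SPARQLWrapper import SPARQLWrapper, JSON

wdqs = SPARQLWrapper("https://query.wikidata.org/sparql")
wdqs.setQuery("""
SELECT ?stone ?stoneLabel WHERE {
  ?stone wdt:P31 ?class .               # P31: instance of
  ?class rdfs:label "ogham stone"@en .  # assumed class label
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
""")
wdqs.setReturnFormat(JSON)
for row in wdqs.query().convert()["results"]["bindings"]:
    print(row["stoneLabel"]["value"])
```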
The conference will be held online on October 15–17, 2020. More information is available at https://2020.archeofoss.org