2018, Lecture Notes in Computer Science
This paper describes a task of semantic labeling of document segments. The approach exploits an ontology to provide fine-grained conceptual document annotation. We describe a way of dividing a document into its constituent semantically coherent blocks. These blocks are then used to perform conceptual tagging for efficient passage information retrieval. The proposed task interfaces with other application areas such as intra-mapping of ontologies, text summarization, and information extraction. The system has been evaluated on a task of conceptual tagging of documents and achieved promising results.
2012
This paper presents an approach for automatically annotating document segments within information-rich texts using a domain ontology. The work exploits the logical structure of input documents in order to achieve its task. The underlying assumption behind this work is that segments in such documents embody self-contained informative units. Another assumption is that segment headings, coupled with a document's hierarchical structure, offer informal representations of segment content, and that matching segment headings to concepts in an ontology/thesaurus can result in the creation of formal labels/metadata for these segments. A series of experiments was carried out using the presented approach on a set of Arabic agricultural extension documents. The results of these experiments demonstrate that the proposed approach is capable of automatically annotating segments with concepts that describe a segment's content with a high degree of accuracy.
… of the 2007 ACM symposium on …, 2007
This work exploits the logical structure of information-rich texts to automatically annotate text segments contained within them using a domain ontology. The underlying assumption behind this work is that segments in such documents embody self-contained informative units. Another assumption is that segment headings, coupled with a document's hierarchical structure, offer informal representations of segment content, and that matching segment headings to concepts in an ontology/thesaurus can result in the creation of formal labels/metadata for these segments. When an encountered heading cannot be matched with any concept in the ontology, the hierarchical structure of the document is used to infer where a new concept represented by this heading should be added in the ontology; thus, the bootstrap ontology is also enriched with new concepts encountered in input documents. This paper also presents issues/problems related to matching textual entities to concepts in an incomplete ontology. The approach was applied to a set of agricultural extension documents. The results of this experiment demonstrate that the proposed approach is capable of automatically annotating segments with concepts that describe a segment's content with a high degree of accuracy.
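To make the matching-and-enrichment loop concrete, the following is a minimal sketch assuming a toy in-memory concept table; the class names, the string normalization, and the example concepts are illustrative inventions, not the paper's actual procedure (which targets Arabic text and a domain thesaurus):

```python
# Hypothetical sketch of heading-to-concept matching with ontology enrichment.

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so headings and labels compare loosely."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

class Ontology:
    def __init__(self):
        # concept label -> parent concept label (None for the root)
        self.parents = {"agriculture": None, "irrigation": "agriculture"}

    def match(self, heading: str):
        key = normalize(heading)
        return key if key in self.parents else None

    def add_concept(self, heading: str, parent: str):
        """Enrich the ontology with a concept inferred from the document tree."""
        self.parents[normalize(heading)] = parent

def annotate(segments, ontology):
    """segments: list of (heading, parent_heading); returns heading -> concept."""
    labels = {}
    for heading, parent_heading in segments:
        concept = ontology.match(heading)
        if concept is None and parent_heading is not None:
            # No match: hang a new concept off the parent heading's concept,
            # mirroring the paper's use of the document's hierarchical structure.
            parent = labels.get(parent_heading) or ontology.match(parent_heading)
            if parent is not None:
                ontology.add_concept(heading, parent)
                concept = normalize(heading)
        labels[heading] = concept
    return labels

print(annotate([("Irrigation", None), ("Drip systems", "Irrigation")], Ontology()))
```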
In the legal field, a large number of documents are processed every day by management companies with the purpose of extracting the data they consider most relevant and storing it in their own databases. Despite technological advances, in many organizations the task of examining these usually extensive documents to extract just a few essential data items is still performed manually, which is expensive, time-consuming, and subject to human error. Moreover, legal documents usually follow several conventions in both structure and use of language which, while not completely formal, can be exploited to boost information extraction. In this work, we present an approach for obtaining relevant information from these legal documents based on the use of ontologies to capture and take advantage of such structure and language conventions. We have implemented our approach in a framework that allows different types of documents to be addressed with minimal effort. Within this framework, we have also addressed a frequent problem found in this kind of documentation: the presence of overlapping elements, such as stamps or signatures, which greatly hinders extraction over scanned documents. Experimental results are promising and show the feasibility of our approach.
2019
Information Extraction (IE) is a pervasive task in industry that makes it possible to automatically obtain structured data from natural-language documents. Current software systems focused on this activity are able to extract a large percentage of the required information, but they do not usually focus on the quality of the extracted data. In this paper we present an approach focused on validating and improving the quality of the results of an IE system. Our proposal is based on the use of ontologies that store domain knowledge, which we leverage to detect and resolve consistency errors in the extracted data. We have implemented our approach to run against the output of the AIS system, an IE system specialized in analyzing legal documents, and we have tested it using a real dataset. Preliminary results confirm the interest of our approach.
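As a rough illustration of the validation idea, one might check each extracted record against constraints derived from domain knowledge; the field names, rules, and record format below are assumptions, since the AIS system's actual data model is not described here:

```python
# Hypothetical sketch: check IE output against ontology-backed domain
# constraints. Field names, rules, and the record format are invented
# stand-ins, not the actual AIS data model.

from datetime import date

RULES = [
    ("notarization must not precede signing",
     lambda r: r["notarization_date"] >= r["signing_date"]),
    ("party role must be a known role in the domain ontology",
     lambda r: r["party_role"] in {"buyer", "seller", "notary"}),
]

def validate(record: dict) -> list[str]:
    """Return the descriptions of all constraints the record violates."""
    return [desc for desc, ok in RULES if not ok(record)]

record = {
    "signing_date": date(2019, 5, 2),
    "notarization_date": date(2019, 4, 30),  # inconsistent: precedes signing
    "party_role": "buyer",
}
print(validate(record))  # -> ['notarization must not precede signing']
```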
Proceedings of the V …, 2007
In this paper we address the problem of automatically enriching legal texts with semantic annotation, an essential prerequisite to effective indexing and retrieval of legal documents. This is done through the illustration of a computational system developed for the automated semantic annotation of (Italian) law texts. This tool is an incremental system that uses Natural Language Processing techniques to perform two tasks: i) classify law paragraphs according to their regulatory content, and ii) extract relevant text fragments corresponding to specific semantic roles that are relevant for the different types of regulatory content. The paper sketches the overall architecture of the tool and reports the results of a preliminary case study on a sample of Italian law texts.
Proceedings of the 10th …, 2005
Normative texts can be viewed as composed of formal partitions (articles, paragraphs, etc.) or of semantic units containing fragments of a regulation (provisions). Provisions can be described according to a metadata scheme consisting of provision types and their arguments. This semantic annotation of a normative text can make the retrieval of norms easier. The detection and description of provisions according to the established metadata scheme is an analytic intellectual activity aimed at classifying portions of a normative text into provision types and at extracting their arguments. Automatic facilities supporting this intellectual activity are desirable. In particular, this paper presents two modules able to qualify fragments of a normative text in terms of provision types and to extract their arguments.
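A purely illustrative sketch of the two modules' division of labour might pair cue patterns for provision types with type-specific argument patterns; the provision types and cues below are invented, not the paper's metadata scheme:

```python
# Toy sketch of provision classification plus argument extraction using
# regular-expression cues. Cues and provision types are invented for
# illustration; the actual modules use a richer metadata scheme.

import re

PROVISION_CUES = {
    "obligation": re.compile(r"\b(shall|must)\b", re.I),
    "repeal": re.compile(r"\bis (hereby )?repealed\b", re.I),
}

# Per-type argument patterns: who bears the obligation, what is repealed, etc.
ARGUMENT_PATTERNS = {
    "obligation": re.compile(r"^(?P<addressee>.+?)\s+(?:shall|must)\s+(?P<action>.+)$", re.I),
    "repeal": re.compile(r"^(?P<target>.+?)\s+is (?:hereby )?repealed", re.I),
}

def qualify(fragment: str):
    """Return (provision_type, arguments) or (None, {}) when no cue fires."""
    for ptype, cue in PROVISION_CUES.items():
        if cue.search(fragment):
            m = ARGUMENT_PATTERNS[ptype].search(fragment)
            return ptype, (m.groupdict() if m else {})
    return None, {}

print(qualify("The data controller shall notify the authority within 72 hours."))
# -> ('obligation', {'addressee': 'The data controller',
#                    'action': 'notify the authority within 72 hours.'})
```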
Workshop …
We would like to thank all the authors for submitting their research and the members of the Program Committee for their careful reviews and useful suggestions to the authors. We would also like to thank the LREC 2010 Organising Committee, which made this workshop possible.
Legal knowledge and …, 2009
In this work we present STIA, a tool for text annotation in the jurisprudence domain. The tool offers an easy interface for experts (lawyers, administrators, researchers, …) to annotate qualified relationships between parts of different laws. Subsequently, the resulting conceptual annotations feed a complex process to retrieve specific relationships between texts (groups of sentences inside a section) of two laws for new application tasks…
2004
Abstract. We describe the recent enhancement of the CAFETIERE formalism (Conceptual Annotation of Facts, Events, Terms, Individual Entities and RElations) with the ability to link natural language words and phrases in textual documents with instances and classes from a language-enabled ontology. The language-enabled ontology is one with an index from one or more natural language expressions to each concept (as in WordNet). In an information extraction application.
The paper reports on the methodology and preliminary results of a case study in automatically extracting ontological knowledge from Italian legislative texts in the environmental domain. We use a fully implemented ontology learning system (T2K) that includes a battery of tools for Natural Language Processing (NLP), statistical text analysis, and machine learning. The tools are dynamically integrated to provide an incremental representation of the content of vast repositories of unstructured documents. The evaluated results, however preliminary, are very encouraging, showing the great potential of NLP-powered incremental systems like T2K for accurate, large-scale, semi-automatic extraction of legal ontologies.
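As a heavily simplified, assumed stand-in for the statistical text-analysis step such a pipeline typically includes (T2K's actual battery of tools is much richer), candidate multi-word terms can be ranked by corpus frequency:

```python
# Toy domain-term extractor: count candidate bigrams in a small corpus and
# rank by frequency, a crude stand-in for statistical term extraction.

from collections import Counter
import re

corpus = [
    "waste disposal is regulated by the regional waste disposal plan",
    "the waste disposal plan defines emission limits",
    "emission limits apply to waste treatment facilities",
]

def bigrams(sentence: str):
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return zip(tokens, tokens[1:])

counts = Counter(bg for sent in corpus for bg in bigrams(sent))
print(counts.most_common(3))
# frequent bigrams such as ('waste', 'disposal') become candidate ontology terms
```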
Empirical Software Engineering, 2021
Semantic legal metadata provides information that helps with understanding and interpreting legal provisions. Such metadata is therefore important for the systematic analysis of legal requirements. However, manually enhancing a large legal corpus with semantic metadata is prohibitively expensive. Our work is motivated by two observations: (1) the existing requirements engineering (RE) literature does not provide a harmonized view on the semantic metadata types that are useful for legal requirements analysis; (2) automated support for the extraction of semantic legal metadata is scarce, and it does not exploit the full potential of artificial intelligence technologies, notably natural language processing (NLP) and machine learning (ML). Our objective is to take steps toward overcoming these limitations. To do so, we review and reconcile the semantic legal metadata types proposed in the RE literature. Subsequently, we devise an automated extraction approach for the identified metadata types using NLP and ML. We evaluate our approach through two case studies over the Luxembourgish legislation. Our results indicate a high accuracy in the generation of metadata annotations. In particular, in the two case studies, we were able to obtain precision scores of 97.2% and 82.4%, and recall scores of 94.9% and 92.4%.
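The general NLP-plus-ML recipe the abstract describes could be sketched, in minimal form, as a phrase-level classifier; the features, labels, and training examples below are placeholders, not the authors' actual pipeline:

```python
# Illustrative phrase-level metadata tagger with scikit-learn. The tiny
# training set and label inventory are invented; the cited work uses a far
# richer pipeline over the Luxembourgish legislation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

phrases = [
    "within thirty days of notification",   # time
    "the supervisory authority",            # agent
    "no later than 1 January 2020",         # time
    "the data protection officer",          # agent
]
labels = ["time", "agent", "time", "agent"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(phrases, labels)

print(model.predict(["within ten days", "the national commission"]))
```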
2012
We focus on the classification of descriptions of legal obligations in the Legal Taxonomy Syllabus. We compare the results of classification using increasing levels of semantic information. Firstly, we use the text of the concept description, analysed via the TULE syntactic parser, to disambiguate syntactically and select informative nouns. Secondly, we add as additional features for the classifier the concepts (via their ontological IDs) that have been semi-automatically linked to the text by knowledge engineers in order to disambiguate the meaning of relevant phrases associated with concepts in the ontology. Thirdly, we consider concepts related to the prescriptions by relations such as deontological clause and sanction.
Lecture Notes in Computer Science, 2010
2012
In this work we illustrate a novel approach to solving an information extraction problem on legal texts. It is based on Natural Language Processing techniques and on the adoption of a formalization that allows domain knowledge and syntactic information to be coupled. The proposed approach is applied to extend an existing system that assists human annotators in handling normative modificatory provisions, that is, the changes a law makes to other normative texts. This law 'versioning' problem is a hard and relevant one. We provide a linguistic and legal analysis of a particular case of modificatory provision (the efficacy suspension), show how such knowledge can be formalized in a linguistic resource such as FrameNet, and show how it is used by the semantic interpreter.
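To make the frame idea concrete, a hypothetical mini-frame for the efficacy-suspension case might look as follows; the frame pattern and role names are invented in the spirit of FrameNet, not taken from the paper's resource:

```python
# Invented mini-frame for efficacy suspension: a frame with named roles,
# filled by a pattern over a modifying provision.

import re

SUSPENSION_FRAME = re.compile(
    r"the efficacy of (?P<norm>.+?) is suspended until (?P<term>.+?)\.",
    re.I,
)

def parse_suspension(sentence: str):
    """Instantiate the frame's roles from a sentence, or return None."""
    m = SUSPENSION_FRAME.search(sentence)
    return m.groupdict() if m else None

print(parse_suspension(
    "The efficacy of Article 12 of Law 241/1990 is suspended until 31 December 2013."
))
# -> {'norm': 'Article 12 of Law 241/1990', 'term': '31 December 2013'}
```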
Artificial Intelligence and Law, 2009
Access to legal information and, in particular, to legal literature is examined with a view to creating a search and retrieval system for Italian legal literature. The design and implementation of services such as integrated access to a wide range of resources are described, with a particular focus on the importance of exploiting metadata assigned to disparate legal material. The integration of structured repositories and Web documents is the main purpose of the system: it is constructed on the basis of a federation system with service-provider functions, aiming at creating a centralized index of legal resources. The index is based on a uniform metadata view, created for structured data by means of the OAI approach and for Web documents by a machine learning approach which, in this paper, is assessed with regard to document classification. Semantic searching is a major requirement for legal literature users; a solution based on the exploitation of Dublin Core metadata, together with legal ontologies and related terms prepared for accessing indexed articles, has been implemented.
Information Processing & Management, 1997
The SALOMON system automatically summarizes Belgian criminal cases in order to improve access to the large number of existing and future court decisions. SALOMON extracts relevant text units from the case text to form a case summary. Such a case profile facilitates the rapid determination of the relevance of the case and may be employed in text search. In a first important abstracting step, SALOMON performs an initial categorization of legal criminal cases and structures the case text into separate legally relevant and irrelevant components. A text grammar represented as a semantic network is used to automatically determine the category of the case and its components. In this way, we are able to extract general data from the case and to identify text portions relevant for further abstracting. It is argued that prior knowledge of the text structure and its indicative cues can support automatic abstracting, and that a text grammar is a promising form for representing the knowledge involved.
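A very small sketch of the cue-driven structuring step might look as follows; the cue phrases and component names are invented, and SALOMON's text grammar is a full semantic network rather than a flat cue list:

```python
# Toy segmentation of a case text into components using indicative cue
# phrases, a drastically simplified stand-in for a text-grammar approach.

CUES = [
    ("facts", "the facts of the case"),
    ("grounds", "on the following grounds"),
    ("verdict", "for these reasons"),
]

def structure(case_text: str) -> dict[str, str]:
    """Map each component name to the text from its cue to the next cue."""
    lowered = case_text.lower()
    hits = sorted(
        (lowered.find(cue), name) for name, cue in CUES if cue in lowered
    )
    parts = {}
    for (start, name), nxt in zip(hits, hits[1:] + [(len(case_text), None)]):
        parts[name] = case_text[start:nxt[0]].strip()
    return parts

text = ("The facts of the case are summarised below. ... "
        "On the following grounds the court finds the defendant liable. ... "
        "For these reasons the court rules as follows.")
print(structure(text))
```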