2002
Technical documentation is riddled with domain-specific terminology which needs to be detected and properly organized in order to be meaningfully used. In this paper we describe how we coped with the problem of terminology detection for a specific type of document and how the extracted terminology was used within the context of our Answer Extraction System.
2002
It is well known that one of the greatest hurdles in automatically processing technical documentation is the large amount of specific terminology that characterizes these domains. Terminology poses two major challenges to the developers of NLP applications: how to identify domain-specific terms in the documents and how to process them efficiently. In this paper we present the methodologies we have used to extract and bootstrap a terminological database, and describe its use in an answer extraction system.
2003
For most companies and organizations, technical documents are highly valued knowledge sources because they combine the know-how and experience of specialists in a particular domain. To guarantee the optimal use of these documents in specific problem situations, people must be able to quickly find precise and highly reliable information. Answer extraction is a new technology that helps users find precise answers to their questions in technical documents. In this article, the authors present ExtrAns, a real-world answer extraction system designed for technical domains. ExtrAns uses robust natural language processing technology and a semantic representation of the information's propositional content.
Lecture Notes in Computer Science, 2002
In this paper we argue that question answering (QA) over technical domains is distinctly different from TREC-based QA or Web-based QA and cannot benefit from data-intensive approaches. Technical questions arise in situations where concrete problems require specific answers and explanations. Finding a justification of the answer in the context of the document is essential if we have to solve a real-world problem. We show that NLP techniques can be used successfully in technical domains for high-precision access to information stored in documents. We present ExtrAns, an answer extraction system over technical domains, its architecture, its use of logical forms for answer extraction, and how terminology extraction becomes an important part of the system.
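The logical-form idea above lends itself to a compact illustration. The sketch below assumes that both document sentences and the question are flattened into small sets of predicates, and that a sentence is an answer candidate when some variable substitution maps every query predicate onto a sentence predicate; the example sentences, predicate names and matching procedure are illustrative assumptions, not ExtrAns's actual representation.

```python
# A minimal sketch (not the actual ExtrAns implementation) of answer
# extraction via logical forms: document sentences and the question are
# reduced to flat sets of predicates, and a sentence answers the question
# if some variable substitution maps every query predicate onto a
# sentence predicate. All names below are illustrative assumptions.

Sentence = frozenset  # a logical form is a set of predicate tuples

DOC_LOGICAL_FORMS = {
    "Open the paper tray and remove the jammed sheet.":
        Sentence({("evt", "e1", "open"), ("obj", "e1", "tray"),
                  ("evt", "e2", "remove"), ("obj", "e2", "sheet")}),
    "The toner cartridge is located behind the front cover.":
        Sentence({("pred", "e3", "located"), ("obj", "e3", "cartridge")}),
}

def unify(query_pred, sent_pred, bindings):
    """Try to extend `bindings` so query_pred matches sent_pred."""
    if len(query_pred) != len(sent_pred):
        return None
    new = dict(bindings)
    for q, s in zip(query_pred, sent_pred):
        if q.startswith("?"):                 # query variable
            if new.get(q, s) != s:
                return None
            new[q] = s
        elif q != s:                          # constants must be equal
            return None
    return new

def entails(sent_lf, query_lf, bindings=None):
    """True if every query predicate can be matched in the sentence."""
    bindings = bindings or {}
    if not query_lf:
        return True
    first, *rest = list(query_lf)
    for sp in sent_lf:
        b = unify(first, sp, bindings)
        if b is not None and entails(sent_lf, rest, b):
            return True
    return False

# "How do I remove the jammed sheet?" -> remove(?e), obj(?e, sheet)
query = [("evt", "?e", "remove"), ("obj", "?e", "sheet")]
for text, lf in DOC_LOGICAL_FORMS.items():
    if entails(lf, query):
        print("Answer candidate:", text)
```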
2019
Tracking developments in the highly dynamic data-technology landscape is vital to keeping up with novel technologies and tools in the various areas of Artificial Intelligence (AI). However, it is difficult to keep track of all the relevant technology keywords. In this paper, we propose a novel system that addresses this problem. The tool automatically detects the existence of new technologies and tools in text, and extracts the terms used to describe them. The extracted terms can be logged as new AI technologies as they are found on-the-fly on the web, and subsequently classified into the relevant semantic labels and AI domains. Our proposed tool is based on a two-stage cascading model: the first stage classifies whether a sentence contains a technology term; the second stage identifies the technology keyword in the sentence. We obtain competitive accuracy on both tasks, sentence classification and keyword identification.
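As an illustration of such a two-stage cascade, the sketch below assumes stage 1 is a binary sentence classifier and stage 2 a per-token scorer whose highest-scoring token is returned as the technology keyword; the toy training data, features and scikit-learn models are placeholders rather than the paper's actual architecture.

```python
# A minimal sketch of a two-stage cascade: stage 1 flags sentences that
# mention a technology, stage 2 scores tokens in flagged sentences.
# Training data, features and models below are illustrative placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# --- Stage 1: does the sentence mention a technology? ------------------
sents = ["We fine-tuned BERT on the new corpus.",
         "The meeting was moved to Friday.",
         "PyTorch makes prototyping much faster.",
         "Lunch will be served at noon."]
has_tech = [1, 0, 1, 0]
stage1 = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                       LogisticRegression())
stage1.fit(sents, has_tech)

# --- Stage 2: which token is the technology keyword? -------------------
def token_features(tokens, i):
    tok = tokens[i]
    return {"lower": tok.lower(), "is_capitalised": tok[:1].isupper(),
            "has_digit": any(c.isdigit() for c in tok),
            "prev": tokens[i - 1].lower() if i else "<s>"}

train_tokens = [s.split() for s in sents if "BERT" in s or "PyTorch" in s]
train_labels = [[int(t.strip(".") in {"BERT", "PyTorch"}) for t in toks]
                for toks in train_tokens]
X = [token_features(toks, i) for toks in train_tokens for i in range(len(toks))]
y = [label for labels in train_labels for label in labels]
stage2 = make_pipeline(DictVectorizer(), LogisticRegression())
stage2.fit(X, y)

# --- Cascade: run stage 2 only on sentences flagged by stage 1 ---------
def extract_technology_term(sentence):
    """Return the most likely technology keyword, or None."""
    if stage1.predict([sentence])[0] == 0:
        return None
    tokens = sentence.split()
    feats = [token_features(tokens, i) for i in range(len(tokens))]
    scores = stage2.predict_proba(feats)[:, 1]   # P(token is a tech term)
    best = max(range(len(tokens)), key=lambda i: scores[i])
    return tokens[best].strip(".,")

print(extract_technology_term("The team evaluated BERT against older baselines."))
```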
2002
The shortcomings of traditional Information Retrieval are most evident when users require exact information rather than relevant documents. This practical need is pushing the research community towards systems that can exactly pinpoint those parts of documents that contain the information requested. Answer Extraction (AE) systems aim to satisfy this need.
2004
The current tendency in Question Answering is towards the processing of large volumes of open-domain text. This inclination is spurred by the creation of the Question Answering track in TREC, and the recent increase in systems that use the Web to extract the answers to questions. This undoubtedly has the advantage that narrow, application-specific concerns can be overlooked in favour of more general approaches. However, the unconstrained nature of the domain and questions does not necessarily lead to systems that are better at the specific tasks that may be required in a deployed application. By contrast, the non-redundant nature of most technical documentation and the use of domain-specific sublanguage and terminology make them unsuitable for (some of) the approaches seen in the TREC QA competition. We discuss the specific nature of technical documentation, with examples from real domains (e.g. the Maintenance Manual of a commercial aircraft), and illustrate solutions that have been adopted in a QA system.
Natural Language Engineering, 1995
This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain…
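The phrase-structure and repetition properties described above translate naturally into a candidate-extraction procedure. The sketch below assumes pre-tagged input and a repetition threshold of two occurrences; it illustrates the style of algorithm rather than reproducing the paper's exact formulation.

```python
# A minimal sketch of this style of term identification: candidate terms
# are adjective/noun phrases (optionally joined by a preposition) ending
# in a noun, and multi-word candidates are kept only if they repeat.
# The hand-tagged input and the threshold of two are illustrative.

import re
from collections import Counter

# (word, tag) pairs; A = adjective, N = noun, P = preposition,
# x = anything else. A real system would obtain tags from a POS tagger.
TAGGED = [
    ("the", "x"), ("landing", "N"), ("gear", "N"), ("lever", "N"),
    ("controls", "x"), ("the", "x"), ("hydraulic", "A"), ("system", "N"),
    (".", "x"), ("move", "x"), ("the", "x"), ("landing", "N"),
    ("gear", "N"), ("lever", "N"), ("only", "x"), ("when", "x"),
    ("the", "x"), ("hydraulic", "A"), ("system", "N"), ("is", "x"),
    ("pressurised", "A"), (".", "x"),
]

# Candidate pattern: adjectives/nouns, optionally one "noun preposition"
# bridge, always ending in a noun.
CANDIDATE = re.compile(r"((?:[AN]+|[AN]*NP[AN]*)N)")

def candidate_terms(tagged):
    tags = "".join(t for _, t in tagged)
    words = [w for w, _ in tagged]
    for m in CANDIDATE.finditer(tags):
        start, end = m.span()
        if end - start >= 2:                      # multi-word only
            yield " ".join(words[start:end]).lower()

counts = Counter(candidate_terms(TAGGED))
terms = [t for t, c in counts.items() if c >= 2]  # repetition filter
print(terms)   # e.g. ['landing gear lever', 'hydraulic system']
```

Restricting candidates to adjective/noun sequences that end in a noun keeps precision high, while the repetition filter discards incidental noun phrases that are unlikely to be terms.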
Proceedings of the 21st annual international conference on Documentation - SIGDOC '03, 2003
Information retrieval systems for voluminous textual documents raise specific problems, such as the choice of the retrieval unit and the relevance of each response. For the selection of the retrieval unit, several solutions have been proposed, such as the exploitation of the document's logical structure. In most cases, a measure of the retrieval unit's relevance is assessed using criteria such as the number of occurrences of query terms in the document and their position in the document. Few systems are designed around the user and adapted to the task they are supposed to assist: usually, these systems are based on paper documentation recorded electronically, with a standard information retrieval module. Sysrit (technical information retrieval system), a system under development, is aimed at users who are expert in searching technical documents. The conception of Sysrit is based on observations made on these users. In this system, a technical document is automatically segmented into paragraphs (called information units). In order to improve the relevance of the responses given to the users, Sysrit tags the information units. Indeed, we make the assumption that a response is all the more relevant if it belongs to the same category as the query. We show that queries and information units can first be categorized into two types: the OBJECT type (which corresponds to object descriptions) and the PRO type (which concerns procedural descriptions). A detailed study of the OBJECT type shows that it is heterogeneous and covers different sub-types: object descriptions (DO), definitions (DFI) and specification descriptions (DF). Following experimental validation with expert users, we first propose to categorize the type of each information unit as either OBJECT or PRO, and second to subcategorize the OBJECT units as DO, DFI or DF. Here we focus on queries rather than on information units. A corpus analysis and a validation by expert users confirm that this categorization can also be used to characterize queries. Moreover, the results of this analysis enable us to propose rules in order to automatically recognize and tag each type of query.
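A minimal sketch of how such rule-based query tagging might look is given below: surface patterns assign a PRO or OBJECT tag and, for OBJECT queries, a DO/DFI/DF sub-type. The patterns and the default category are illustrative guesses, not Sysrit's actual rules.

```python
# A minimal sketch of rule-based query tagging into PRO (procedure) vs
# OBJECT, with DO / DFI / DF sub-types for OBJECT queries. The surface
# patterns below are illustrative guesses, not Sysrit's actual rules.

import re

RULES = [
    (r"^(how do i|how to|how can i)\b", ("PRO", None)),        # procedures
    (r"^(what is|what does .* mean)\b", ("OBJECT", "DFI")),    # definitions
    (r"\b(specification|dimensions|tolerance|rating)s?\b", ("OBJECT", "DF")),
    (r"^(what|which|where)\b", ("OBJECT", "DO")),               # descriptions
]

def tag_query(query):
    q = query.lower().strip()
    for pattern, tag in RULES:
        if re.search(pattern, q):
            return tag
    return ("OBJECT", "DO")   # illustrative default category

print(tag_query("How do I replace the fuel filter?"))          # ('PRO', None)
print(tag_query("What is a torque limiter?"))                  # ('OBJECT', 'DFI')
print(tag_query("Which pressure rating does the valve have?")) # ('OBJECT', 'DF')
```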