20 pages
Context: Requirement traceability (RT) is defined as the ability to describe and follow the life of a requirement. RT helps developers ensure that relevant requirements are implemented and that the source code is consistent with its requirements, with respect to a set of traceability links called trace links. Previous work leverages Part-of-Speech (POS) tagging of software artifacts to recover trace links among them. These studies work on the premise that discarding one or more POS tags improves the accuracy of Information Retrieval (IR) techniques.
Objective: First, we show empirically that excluding one or more POS tags could negatively impact the accuracy of existing IR-based traceability approaches, namely the Vector Space Model (VSM) and the Jensen-Shannon Model (JSM). Second, we propose a method that improves the accuracy of IR-based traceability approaches.
Method: We developed an approach, called ConPOS, to recover trace links using constraint-based pruning. ConPOS uses major POS categories and applies constraints to the recovered trace links, pruning them in a filtering step that significantly improves the effectiveness of IR-based techniques. We conducted an experiment to provide evidence that removing POS tags does not improve the accuracy of IR techniques. Furthermore, we conducted two empirical studies to evaluate the effectiveness of ConPOS in recovering trace links compared to existing peer RT approaches.
Results: The results of the first empirical study show that removing one or more POS tags negatively impacts the accuracy of VSM and JSM. Furthermore, the results from the other empirical studies show that ConPOS provides 11%-107%, 8%-64%, and 15%-170% higher precision, recall, and mean average precision (MAP), respectively, than VSM and JSM.
Conclusion: We showed that ConPOS outperforms existing IR-based RT approaches that discard some POS tags from the input documents.
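Precision, recall, and MAP are standard IR measures. The sketch below shows one common way to compute them over ranked candidate trace links. It assumes that each requirement acts as a query with its own ranked list of candidate artifacts, and that an oracle supplies the set of correct links for each requirement. The function names and file names are hypothetical, and this is not necessarily the exact evaluation procedure used for ConPOS.

```python
from typing import Dict, List, Sequence, Set, Tuple


def precision_recall(retrieved: Sequence[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a set of retrieved trace links against an oracle."""
    hits = sum(1 for link in retrieved if link in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a correct link,
    normalized by the total number of correct links in the oracle."""
    hits = 0
    total = 0.0
    for k, link in enumerate(ranked, start=1):
        if link in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Dict[str, List[str]], oracle: Dict[str, Set[str]]) -> float:
    """MAP: mean of the per-query average precision (one ranked list per requirement)."""
    scores = [average_precision(runs[q], oracle.get(q, set())) for q in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    ranked = ["A.java", "B.java", "C.java", "D.java"]
    correct = {"A.java", "C.java"}
    print(precision_recall(ranked[:2], correct))  # cut-off after the top two links
    print(average_precision(ranked, correct))
```

For the example ranking, the correct links appear at ranks 1 and 3. The average precision is therefore (1/1 + 2/3) / 2.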
2012
One of the most successful applications of textual analysis in software engineering is the use of Information Retrieval (IR) methods to reconstruct traceability links between software artifacts. Unfortunately, due to the limitations of both the humans who develop artifacts and the IR techniques themselves, any IR-based traceability recovery method fails to retrieve some of the correct links, while also retrieving links that are not correct. This limitation has led researchers to propose several methods that improve the accuracy of IR-based traceability recovery by removing the "noise" in the textual content of software artifacts (e.g., by removing common words or increasing the importance of critical terms). In this paper we propose a heuristic to remove the "noise" by taking into account the linguistic nature of words in the software artifacts. In particular, the language used in software documents can be classified as a technical language, in which the words that give the strongest indication of a document's semantics are the nouns. The results of a case study conducted on five software artifact repositories indicate that characterizing the context of software artifacts by considering only nouns significantly improves the accuracy of IR-based traceability recovery methods. … recovery process a tedious task, as the software engineer has to spend much more time discarding false positives than tracing correct links.
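The noun-only heuristic can be sketched as a filter over POS-tagged tokens. This minimal sketch makes two assumptions. First, the artifact has already been tagged with Penn Treebank tags by some external tagger, which is not shown. Second, the tagged tokens arrive as (word, tag) pairs. The function name `noun_index_terms` is hypothetical. Only tokens tagged as nouns survive into the index.

```python
from typing import Iterable, List, Tuple

# Penn Treebank noun tags: singular/mass, plural, proper singular, proper plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_index_terms(tagged_tokens: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only the nouns of a POS-tagged artifact, lower-cased, in original order."""
    return [word.lower() for word, tag in tagged_tokens if tag in NOUN_TAGS]


if __name__ == "__main__":
    # Hypothetical, hand-tagged requirement sentence.
    tagged = [("The", "DT"), ("system", "NN"), ("shall", "MD"),
              ("log", "VB"), ("user", "NN"), ("logins", "NNS")]
    print(noun_index_terms(tagged))
```

Determiners, modals, and verbs such as "log" are dropped. This is exactly the kind of POS removal that the ConPOS abstract argues can also discard useful signal.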
2009 ICSE Workshop on Traceability in Emerging Forms of Software Engineering, 2009
Existing methods for recovering traceability links among software documentation artifacts analyze textual similarities among these artifacts. However, related documentation elements may share little terminology or phrasing. This paper presents a technique for indirectly recovering these traceability links in requirements documentation by combining textual with structural information, based on the conjecture that related requirements share related source code elements. A preliminary case study indicates that our combined approach improves the precision and recall of recovering relevant links among documents, compared to stand-alone methods based solely on textual similarity.
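One way to picture the combination of textual and structural information is as an affine mix of two quantities. The first is a textual similarity score between two requirements. The second is the overlap of the source code elements each requirement traces to. The sketch below assumes Jaccard overlap and a hypothetical `weight` parameter, and the file names are illustrative. The paper's actual combination may differ.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets of code elements; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(textual: float, code_a: Set[str], code_b: Set[str],
                        weight: float = 0.5) -> float:
    """Weighted mix of a textual score and the structural (shared-code) score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * textual + (1.0 - weight) * jaccard(code_a, code_b)


if __name__ == "__main__":
    # Two requirements with little shared wording but a shared code element.
    score = combined_similarity(0.1, {"Login.java", "Session.java"},
                                {"Session.java", "Audit.java"})
    print(round(score, 4))
```

In this example the weak textual score of 0.1 is lifted by the shared `Session.java`. This mirrors the intuition that structurally related requirements can be linked even when their wording differs.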
2011 27th IEEE International Conference on Software Maintenance (ICSM), 2011
Different Information Retrieval (IR) methods have been proposed to recover traceability links among software artifacts. So far, no single method noticeably outperforms the others; however, it has been empirically shown that some methods recover different, yet complementary, traceability links. In this paper, we exploit this empirical finding and propose an integrated approach that combines orthogonal IR techniques, which have been statistically shown to produce dissimilar results. Our approach combines the following IR-based methods: Vector Space Model (VSM), probabilistic Jensen and Shannon (JS) model, and Relational Topic Modeling (RTM), which has not been used in the context of traceability link recovery before. The empirical case study conducted on six software systems indicates that the integrated method outperforms stand-alone IR methods, as well as any other combination of non-orthogonal methods, by a statistically significant margin.
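Two of the ingredients named above can be illustrated briefly. The sketch below computes a Jensen-Shannon similarity, defined here as one minus the base-2 Jensen-Shannon divergence of two term distributions. It then combines two similarity scores with a weighted sum. The weighted-sum combination and its `lam` parameter are assumptions for illustration, not the integration scheme of the cited paper, and RTM is not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def term_distribution(tokens: List[str]) -> Dict[str, float]:
    """Relative term frequencies of one artifact."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def js_similarity(p: Dict[str, float], q: Dict[str, float]) -> float:
    """1 - JSD(p, q) with log base 2, so the result lies in [0, 1]."""
    divergence = 0.0
    for term in set(p) | set(q):
        pt, qt = p.get(term, 0.0), q.get(term, 0.0)
        m = (pt + qt) / 2  # mixture distribution M = (P + Q) / 2
        if pt > 0:
            divergence += 0.5 * pt * math.log2(pt / m)
        if qt > 0:
            divergence += 0.5 * qt * math.log2(qt / m)
    return 1.0 - divergence


def combine(score_a: float, score_b: float, lam: float = 0.5) -> float:
    """Hypothetical affine combination of two scores (e.g., VSM and JS)."""
    return lam * score_a + (1 - lam) * score_b


if __name__ == "__main__":
    req = term_distribution(["user", "login", "password"])
    code = term_distribution(["user", "login", "session"])
    js = js_similarity(req, code)
    print(round(js, 3), round(combine(0.4, js), 3))
```

Identical distributions give a similarity of 1 and disjoint vocabularies give 0. Base 2 is what keeps the divergence bounded by 1.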
Empirical Software Engineering, 2010
Proceedings of the Nineteenth Conference on Computational Natural Language Learning, 2015
Software system development is guided by the evolution of requirements. In this paper, we address the task of requirements traceability, which is concerned with providing bi-directional traceability between various requirements, enabling users to find the origin of each requirement and track every change made to it. We propose a knowledge-rich approach to the task, where we extend a supervised baseline system with (1) additional training instances derived from human-provided annotator rationales; and (2) additional features derived from a hand-built ontology. Experiments demonstrate that our approach yields a relative error reduction of 11.1-19.7%.
2009
The intensive human effort needed to manually manage traceability information has increased interest in semi-automated traceability recovery techniques. This paper presents a simple way to improve the accuracy of traceability recovery methods based on information retrieval techniques. The proposed method acts on artefact indexing, considering only the nouns contained in an artefact's content to define its semantics. The rationale behind this choice is that the language used in software documents can be classified as a sectorial language, in which the terms that give the strongest indication of a document's semantics are the nouns. The results of a reported case study demonstrate that the proposed artefact indexing significantly improves the accuracy of traceability recovery methods based on probabilistic or vector space IR models.
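A vector space model can be built over such index terms using TF-IDF weights and cosine similarity. The sketch below assumes that artifacts are already tokenized, for example reduced to their nouns as the abstract proposes. It also assumes one shared corpus holds both requirements and code artifacts. It uses raw term frequency times ln(N/df), which is only one of several TF-IDF variants. The artifact IDs are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tfidf_vectors(docs: Dict[str, List[str]]) -> Dict[str, Vector]:
    """TF-IDF vector per artifact: raw term frequency times ln(N / df)."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {doc_id: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for doc_id, tokens in docs.items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_links(source: str, targets: List[str], vectors: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Candidate trace links from `source`, ordered by decreasing similarity."""
    scored = [(t, cosine(vectors[source], vectors[t])) for t in targets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "REQ1": ["user", "password", "login"],
        "Login.java": ["login", "user", "session"],
        "Report.java": ["report", "pdf", "export"],
    }
    vectors = tfidf_vectors(docs)
    print(rank_links("REQ1", ["Report.java", "Login.java"], vectors))
```

With this IDF variant, a term that occurs in every artifact gets weight 0. Smoothed variants avoid that but change the scores.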
Software documentation, which is usually expressed in natural language, contains much useful information. Therefore, establishing traceability links between documentation and source code can be very helpful for software engineering tasks such as requirement traceability, impact analysis, and software reuse. Currently, the recovery of traceability links is mostly based on information retrieval techniques, for instance, the probabilistic model, the vector space model, and latent semantic indexing. Previous work treats both documentation and source code as plain text files. The quality of retrieved links can be improved by imposing additional structure that exploits the fact that these are software engineering documents. In this paper, we present four enhanced strategies that improve the traditional LSI method based on the special characteristics of documentation. Experimental results show that the first three enhanced strategies can increase the precision of retrieved links by 5% to 16%, while the fourth strategy increases it by about 13%.
Requirements traceability is an essential step in ensuring software quality during the early stages of the development life cycle. Requirements tracing usually consists of document parsing, candidate link generation and evaluation, and traceability analysis. This paper demonstrates the applicability of Statistical Term Extraction metrics to generate candidate links. The method is applied and validated using two datasets and four filters, two for each dataset: 0.2 and 0.25 for MODIS, and 0 and 0.05 for CM1. It generates requirements traceability matrices between textual requirements artifacts (such as high-level requirements traced to low-level requirements). The proposed method includes ten word frequency metrics, divided into three main groups, for calculating term frequency. The results show that the proposed method gives better results than the traditional TF-IDF method. Keywords - Requirements Traceability; Traceability Analysis; Candidate Link Generation; Parsing; Term Extraction; Word Frequency Metrics.
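The filtering step can be sketched as thresholding a table of candidate-link scores to produce a traceability matrix. The sketch assumes that the scores, from whichever word frequency metric is used, have already been computed. The requirement IDs and score values are hypothetical. The thresholds in the usage example mirror the 0.2 and 0.25 filters mentioned for MODIS.

```python
from typing import Dict, List, Tuple


def candidate_links(scores: Dict[Tuple[str, str], float],
                    threshold: float) -> Dict[str, List[str]]:
    """Keep (high-level, low-level) pairs scoring at or above `threshold`,
    grouped by high-level requirement in sorted order."""
    matrix: Dict[str, List[str]] = {}
    for (high, low), score in sorted(scores.items()):
        if score >= threshold:
            matrix.setdefault(high, []).append(low)
    return matrix


if __name__ == "__main__":
    scores = {
        ("HLR1", "LLR1"): 0.31,
        ("HLR1", "LLR2"): 0.22,
        ("HLR2", "LLR2"): 0.05,
        ("HLR2", "LLR3"): 0.27,
    }
    for t in (0.2, 0.25):
        print(t, candidate_links(scores, t))
```

Raising the threshold trades recall for precision: at 0.25 the weaker HLR1 to LLR2 link is filtered out.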
2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE), 2013
Latent Semantic Indexing (LSI) is an advanced method widely and successfully employed in Information Retrieval (IR). It is an extension of the Vector Space Model (VSM) and is able to outperform VSM in canonical IR scenarios, where it is used on very large document repositories. LSI has also been used to semi-automatically generate traceability links between software artefacts. However, in that scenario LSI is not able to outperform VSM. This contradictory result is probably due to the different characteristics of software artefact repositories compared to document repositories. In this paper we present a preliminary empirical study to analyze how the size and the vocabulary of the repository, in terms of number of documents and terms (i.e., the vocabulary), affect retrieval accuracy. Although replications are needed to generalize our findings, the study presented in this paper provides some insights that might be used as guidelines for selecting the most adequate methods for traceability recovery, depending on the particular application context.
ACM Transactions on Software Engineering and Methodology, 2007
2015 IEEE/ACM 8th International Symposium on Software and Systems Traceability, 2015
2013 IEEE International Conference on Software Maintenance, 2013
2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021
Empirical Software Engineering, 2014
2013 17th European Conference on Software Maintenance and Reengineering, 2013
Proceedings - Working Conference on Reverse Engineering, WCRE, 2010