Introduction to Data Repositories
Shweta Meena
Department of Software Engineering
Delhi Technological University

Which repositories are available for extracting software engineering data?
+ Software repositories can be mined to collect data that
provides empirical results for validating various
techniques or methods.
+ This evidence allows software researchers to
establish well-formed and generalized theories.
+ By applying the information mined from these
repositories, software engineering researchers and
practitioners can rely less on intuition and experience
and more on field and historical data.
Empirical Research in Software Engineering: Concepts, Analysis, and Applications by Ruchika Malhotra
CRC Press
Taylor & Francis Group

What types of questions can be answered from data mined from software repositories?
+ Is design A better than design B?
+ Is process/method A better than process/method B?
+ What is the probability of occurrence of a defect or
change in a module?
+ Is the effort estimation process accurate?
+ What is the time taken to correct a bug?
+ Is testing technique A better than testing technique B?

What are the various data collection sources?
+ Data can be collected from proprietary software, open source
software (OSS), or university software.
+ Data collection from proprietary software is extremely difficult due
to privacy concerns.
+ Data collected from programs developed by students is usually not
considered, because its accuracy and applicability cannot be determined.
+ Data collection from university software is not recommended due
to the involvement of inexperienced programmers and its limited
applicability in real-life settings.

Configuration Management Systems
+ Configuration management systems are central to almost all software
projects developed by the organizations.
+ The aim of a configuration management system is to control and manage
changes that occur in all the artifacts produced during the software
development life cycle.
+ The artifacts (also known as deliverables) produced during the software
development life cycle include software requirement specification, software
design document, source code listings, user manuals, and so on.

Configuration management system: Types of activities
[Figure: types of configuration management activities. Only the label "Configuration Accounting" survives extraction; the slides below cover configuration identification, configuration control, and configuration accounting.]

Configuration Identification
+ Each and every software project artifact produced
during the software development life cycle is uniquely
named.
+ Release:
  + The first issue of a software artifact is called a release.
  + It usually provides most of the functionality of a product but may
    contain a large number of bugs, and is thus prone to fixes and
    enhancements.

Configuration Identification
+ Versions:
  + Significant changes incurred in the software project's artifacts are
    called versions.
  + Each version tends to enhance the functionality of a product or fix
    some critical bugs reported in the previous version.
  + New functionality may or may not be added.

Configuration Identification
+ Editions:
  + Minor changes or revisions incurred in the software artifacts are
    termed editions.
  + Unlike a version, an edition does not necessarily introduce significant
    enhancements or fix critical issues reported in the previous version.
  + Small fixes and patches are introduced.

Configuration Control
+ Configuration control is a critical process within versioning or
configuration management.
+ Configuration control incorporates the approval, control, and
implementation of changes to the software project artifact(s), or to the
software project itself.
[Figure: Change cycle]

Configuration Control
+ Its primary purpose is to ensure that each and every change incurred to any
software artifact is carried out with the knowledge and approval of the
software project management team.
+ Each change is raised through a change request, which contains important
fields such as severity (impact of the failure on software operation) and
priority (speed with which the defect must be addressed).
+ The change control board (CCB) is responsible for the approval and tracking
of changes.
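
The severity and priority fields of a change request can be modeled as a small data structure. The following is a minimal sketch, not a prescribed format: the class names, the four-level scales, and the review ordering (highest priority first, then highest severity) are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Impact of the failure on software operation (illustrative levels)."""
    TRIVIAL = 1
    MODERATE = 2
    SERIOUS = 3
    CRITICAL = 4


class Priority(IntEnum):
    """Speed with which the change must be addressed (illustrative levels)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    MANDATORY = 4


@dataclass
class ChangeRequest:
    request_id: str
    description: str
    severity: Severity
    priority: Priority
    approved: bool = False  # set once the CCB has approved the change


def review_order(requests):
    """Hypothetical CCB policy: review pending requests by priority, then severity."""
    pending = [r for r in requests if not r.approved]
    return sorted(pending, key=lambda r: (r.priority, r.severity), reverse=True)


requests = [
    ChangeRequest("CR-1", "Typo in user manual", Severity.TRIVIAL, Priority.LOW),
    ChangeRequest("CR-2", "Crash on startup", Severity.CRITICAL, Priority.MANDATORY),
    ChangeRequest("CR-3", "Slow report export", Severity.MODERATE, Priority.HIGH),
]
print([r.request_id for r in review_order(requests)])
```

Using ordered enumerations keeps severity and priority comparable, which is what allows the requests to be sorted for review.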

Change Request Form
Change Request ID:
Type of Change Request: [ ] Enhancement  [ ] Defect Fixing  [ ] Other (Specify)
Project:
Requested By: Project team member name
Brief Description of the Change Request: Description of the change being requested
Date Submitted:
Date Required:
Priority: [ ] Low  [ ] Medium  [ ] High  [ ] Mandatory
Severity: [ ] Trivial  [ ] Moderate  [ ] Serious  [ ] Critical
Reason for Change: Description of why the change is being requested
Estimated Cost of Change: Estimates for the cost of incurring the change
Other Artifacts Impacted: List other artifacts affected by this change
Signature:
[Figure: Change request form]

Change Notice Form
Change Request ID:
Type of Change Request: [ ] Defect Fixing
Project:
Module in which change is made:
Change Implemented By: Project team member name
Date and time of change implementation:
Change Approved By: CCB member who approved the change
Brief Description of the Change Request: Description of the change incurred
Decision: [ ] Approved  [ ] Approved with Conditions
Decision Date:
Conditions: Conditions imposed by the CCB
Approval Signature:
[Figure: Change notice form]

Configuration Control
+ The CCB carefully reviews each and every change before approval.
+ After the changes are successfully implemented and documented, they
must be notified so that they are tracked and recorded in the software
repository hosted in a version control system (VCS).
+ This repository is sometimes also known as the software library or archive;
all official artifacts (documents and source code) are
maintained in it during the software development life cycle.
+ The changes are notified through a software change notice.

Configuration Accounting
+ Configuration accounting is the process that is responsible for keeping track
of each and every activity, including changes, and any action that affects the
configuration of a software product artifact, or the software product itself.
+ Generally, the entire data corresponding to each and every change is
maintained in the VCS.
+ Configuration accounting also incorporates recording and reporting of all
the information required for versioning or configuration management of a
software project.

Configuration Accounting
+ This information includes the status of software artifacts under
versioning control, metadata and other related information for the
proposed changes, and the implementation status of the changes that were
approved in the configuration control process.

Configuration Accounting
+ A typical configuration status report includes:
  + A list of software artifacts under versioning. These comprise a baseline.
  + Version-wise dates on which the baseline of each version was established.
  + Specifications that describe each artifact under versioning.
  + History of changes incurred in the baseline.
  + Open change requests for a given artifact.
  + Deficiencies discovered by reviews and audits.
  + The status of approved changes.

Importance of Mining Software Repositories
+ Software repositories usually provide a vast array of varied and valuable
information regarding software projects.
+ By applying the information mined from these repositories, software engineering
researchers and practitioners can rely less on intuition and experience and
more on field and historical data.
+ A major reason why the value of the information in software engineering
repositories goes unrecognized is perhaps the lack of effective mining
techniques that can extract the right kind of information from these repositories in
the right form.

Importance of Mining Software Repositories
+ Recognizing the need for effective mining techniques, the mining software
repositories (MSR) field has been developed by software engineering
practitioners.
+ The MSR field analyzes and cross-links the rich and valuable data stored in
the software repositories to discover interesting and applicable information
about various software systems as well as projects.
+ MSR researchers aim to transform these repositories from static
record-keeping systems into active ones that guide
the decision-making process of modern software projects.

Data analysis procedure after mining software repositories
[Figure: data analysis procedure, reconstructed here as an outline of its stages]
+ Mining repositories
  + Source code
  + Change records: timestamp, author, logs
  + Bug fixes: defect identifier, fixed by, date and time, found in
    (component/module), description, severity, priority
  + Web archives: mails, chats, messages
+ Preprocessed data (defects and changes)
+ Metrics
+ Learning techniques: statistical models, machine learning
+ Results: obtain, validate, analyze, interpret

Importance of Mining Software Repositories
+ After mining the relevant information from software repositories, data mining
techniques can be applied and useful results can be obtained, analyzed, and
interpreted.
+ These results will guide the practitioners in decision making. Hence, mining data from
software repositories will exhibit the following potential benefits:
  + Enhanced maintenance of the software system
  + Empirical validation of techniques and methods
  + Support for software reuse
  + Proper allocation of testing and maintenance resources

Commonly used Software Repositories
[Figure: commonly used software repositories; content not recoverable from the extraction]

Types of Software Repositories
[Figure: types of software repositories; content not recoverable from the extraction]

Historical repositories
+ Historical repositories record varied information regarding the
evolution and progress of a software project.
+ They capture significant historical dependencies between various
artifacts of a project, such as functions (in the source code),
documentation files, or configuration files.

Historical repositories
+ Historical repositories include:
  + Source control repositories
  + Bug repositories
  + Archived communications

Historical Repositories: Source Control Repositories
+ They record and maintain the development trail of a
project and track each and every change incurred in any
of the artifacts of a software system, such as the source
code, documentation manuals, and so on.
+ They maintain metadata regarding each change, for
instance, the developer or project member who carried
out the change, the timestamp when the change was
performed, and a short description of the change.
+ Examples: Git, CVS, Subversion (SVN), Perforce, and
ClearCase.
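
The per-change metadata described above (who, when, and a short description) can be pulled out of a Git history in machine-readable form. The sketch below assumes the log was produced with `git log --pretty=format:'%H%x1f%an%x1f%aI%x1f%s%x1e'`, which separates fields with the ASCII unit separator and ends each record with the record separator. The `Change` class, the `parse_log` helper, and the sample text are hypothetical and do not come from a real project.

```python
from dataclasses import dataclass

# Separators assumed by this sketch; they correspond to %x1f and %x1e
# in the git log format string given in the lead-in.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"


@dataclass
class Change:
    commit: str       # commit hash
    author: str       # developer who carried out the change
    timestamp: str    # ISO 8601 timestamp of the change
    description: str  # short description (commit subject)


def parse_log(raw):
    """Split raw log text into Change records."""
    changes = []
    for record in raw.split(RECORD_SEP):
        record = record.strip()
        if not record:
            continue
        commit, author, timestamp, description = record.split(FIELD_SEP, 3)
        changes.append(Change(commit, author, timestamp, description))
    return changes


# Made-up sample in the assumed format (not real project history).
sample = (
    "a1b2c3d\x1fAlice\x1f2010-03-28T04:40:14+00:00\x1fFix pattern layout test\x1e\n"
    "d4e5f6a\x1fBob\x1f2010-03-29T10:02:00+00:00\x1fUpdate build script\x1e\n"
)
changes = parse_log(sample)
for change in changes:
    print(change.author, change.timestamp, change.description)
```

Control characters are used as separators because commit messages can contain commas, tabs, and newlines.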

Example Log File from Apache Log4j at Git
[Figure: Git log output with per-file change statistics for one Apache Log4j commit. Only the structure survives extraction: a commit hash, author and date header lines, the commit message, a list of changed files with per-file change counts, and a closing summary of 27 files changed, 72 insertions, and 85 deletions.]

Historical Repositories: Bug Repositories
+ These repositories track and maintain the resolution
history of defect/bug reports, which provide valuable
information regarding the bugs that were reported by the
users or developers of that project.
+ Examples: Bugzilla and JIRA.

Historical Repositories: Archived Communication Repositories
+ Discussions regarding the various aspects of a software
project during its life cycle are recorded in the archived
communications.
  + Mailing lists
  + Emails
  + Instant messages
  + Internet Relay Chat (IRC) chats

Run Time Repositories
+ Run-time repositories, also known as deployment
logs, record information regarding the execution of a
single deployment, or different deployments of a
software system.
+ For example, run-time repositories may record the
error messages reported by a software application at
varied deployment sites.

Run Time Repositories
+ They can be used to detect execution anomalies by discovering the
dominant execution or usage patterns across deployments and recording
the deviations observed from such patterns.
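
One simple way to read the idea above: treat an event as "dominant" if it appears at most deployment sites, then flag whatever a site logs outside that set. This is a minimal sketch under that assumption. The function name, the 50% threshold, and the sample logs are all hypothetical.

```python
from collections import Counter


def find_anomalies(events_by_site, min_share=0.5):
    """Return (dominant events, per-site deviations).

    An event is dominant when it occurs at a share of sites >= min_share.
    A deviation is any event a site logged that is not dominant.
    """
    site_count = len(events_by_site)
    seen_at = Counter()
    for events in events_by_site.values():
        seen_at.update(set(events))  # count each event once per site
    dominant = {e for e, n in seen_at.items() if n / site_count >= min_share}
    deviations = {}
    for site, events in events_by_site.items():
        unusual = sorted(set(events) - dominant)
        if unusual:
            deviations[site] = unusual
    return dominant, deviations


# Made-up deployment logs for three sites.
logs = {
    "site-a": ["start", "load-config", "serve", "shutdown"],
    "site-b": ["start", "load-config", "serve", "shutdown"],
    "site-c": ["start", "load-config", "disk-full-error", "shutdown"],
}
dominant, deviations = find_anomalies(logs)
print(deviations)
```

Real deployment logs would need normalization first (stripping timestamps and identifiers) before events can be compared across sites.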

Source Code Repositories
+ Source code repositories maintain the source code for a
large number of OSS projects.
+ Examples: [Link] and Google Code. They host the
source code for a large number of OSS systems, such as
Android OS, Apache Foundation projects, and many
more.

Understanding Systems
+ Understanding large software systems remains a challenging process for
most software organizations.
+ Most importantly, documentation manuals and files pertaining to large
systems rarely exist, and even when they do, they are often out of date.
+ In addition, system experts are usually too preoccupied to guide novice
developers, or may no longer be part of the organization.
+ Evaluating the system characteristics and tracing its evolution history
have thus become important techniques for gaining an understanding of the
system.

System Characteristics
+ A software system may be analyzed by the following
general characteristics, which help decide whether data should be
collected from it and used in research-centric applications.
  + Programming language(s): The computer language(s)
    in which a software system has been written and
    developed. Java remains the most popular
    programming language for many OSS systems, such
    as Apache projects, Android OS, and many more.

System Characteristics
+ Number of source files:
  + This attribute gives the total number of source code files
    contained in a software system.
  + In some cases, this measure may be used to depict the
    complexity of a software system.
  + A system with more source files tends to be more complex
    than one with fewer source files.

System Characteristics
+ Number of lines of code (LOC):
  + It is an important size metric of any software system that
    indicates the total number of LOC of the system.
  + Many software systems are classified on the basis of their
    LOC as small-, medium-, and large-scale systems.
  + This attribute also gives an indication of the complexity of a
    software system.
  + Generally, systems with larger size, that is, more LOC, tend to be
    more complex than those with smaller size.
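
As a rough illustration of the LOC metric, the sketch below counts non-blank lines that are not whole-line comments and then sorts a system into a size class. The counting rule is one of several possible conventions (block comments are not handled), and the thresholds of 10,000 and 100,000 LOC are hypothetical values chosen only for the example.

```python
def count_loc(source, comment_prefix="//"):
    """Count non-blank lines that are not whole-line comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def size_class(loc, small_max=10_000, medium_max=100_000):
    """Classify a system by LOC; the thresholds are illustrative assumptions."""
    if loc <= small_max:
        return "small"
    if loc <= medium_max:
        return "medium"
    return "large"


sample = """\
// Greets the user.
public class Hello {

    public static void main(String[] args) {
        System.out.println("hi");
    }
}
"""
loc = count_loc(sample)
print(loc, size_class(loc))
```

Metrics tools differ in whether they count comments, blank lines, or logical statements, so LOC figures are only comparable when the same counting rule is applied throughout.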

System Characteristics
+ Platform:
  + This attribute indicates the hardware and software environment
    (predominantly the software environment) that is required for a
    particular software system to function.
  + For example, some software systems are meant to work only on
    Windows OS.
+ Company:
  + This attribute provides information about the organization that has
    developed, or contributed to the development of, a software system.

System Characteristics
+ Versions and editions:
  + A software system is typically released in versions,
    with each version being rolled out to incorporate
    some significant changes to the previous version of
    that software system.
  + Even for a given version, several editions may be
    released to incorporate some minor changes in the
    software system.

System Characteristics
+ Application/domain:
  + A software system usually serves a fundamental
    purpose or application, along with some optional
    or secondary features.
  + Open source systems typically belong to one of
    these domains: graphics/media/3D, IDE, SDK,
    database, diagram/visualization, games,
    middleware, parsers/generators, programming
    language, testing, and general-purpose tools that
    combine multiple such domains.

Version Control Systems
+ Version control systems (VCS), also known as
source control systems or simply versioning
systems, track and record changes incurred to a
single artifact or a set of artifacts of a software system.
+ Each and every change, no matter how big or
small, is recorded over time so that specific
revisions or versions of the system artifacts can
be recalled later.
[Figure: successive versions of an artifact (Version 1, Version 2, Version 3)]

Basic Terminology used for VCS
+ Revision numbers:
  + A VCS distinguishes between the different versions of
    software artifacts by number.
  + These version numbers are usually called revision numbers
    and indicate the various versions of an artifact.

Basic Terminology used for VCS
+ Release numbers:
  + With respect to software products, revision numbers are
    termed release numbers, and these indicate the different
    releases of the software product.

Basic Terminology used for VCS
+ Baseline or trunk:
  + A baseline is the approved version or revision of a software
    artifact from which changes can subsequently be made.
  + It is also called the trunk or master.
[Figure: Branch 1, Branch 2, and Branch 3 diverging from the baseline (original line of development)]

Basic Terminology used for VCS
+ Tag:
  + Whenever a new version of a software product
    is released, a symbolic name, called the tag, is
    assigned to the revision numbers of the current
    software artifacts.
  + The tag indicates the release number.
  + In the header section of every tagged artifact,
    the mapping from tag (symbolic name) to revision
    number is stored.
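
The relation between a tag and revision numbers can be pictured as a snapshot: the symbolic name is bound to whatever revision each artifact has at release time. The following sketch keeps that mapping in one central table for simplicity, whereas the text above describes it being stored in each artifact's header. The class name, method names, and file names are hypothetical.

```python
class TagRegistry:
    """Maps a symbolic tag name to the revision number of each artifact."""

    def __init__(self):
        self._tags = {}

    def tag(self, name, revisions):
        """Bind `name` to a snapshot of the current artifact revisions."""
        if name in self._tags:
            raise ValueError(f"tag {name!r} already exists")
        self._tags[name] = dict(revisions)  # copy, so later commits do not alter it

    def revision_of(self, name, artifact):
        """Look up which revision of `artifact` belongs to the tagged release."""
        return self._tags[name][artifact]


# Current revision numbers of two made-up artifacts.
current = {"Main.java": "1.4", "README": "1.2"}

registry = TagRegistry()
registry.tag("release-1.0", current)

current["Main.java"] = "1.5"  # a later commit changes one artifact
registry.tag("release-1.1", current)

print(registry.revision_of("release-1.0", "Main.java"))
print(registry.revision_of("release-1.1", "Main.java"))
```

Because each tag stores a copy of the mapping, an earlier release can still be reconstructed after the artifacts have moved on.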

Basic Terminology used for VCS
+ Branch:
  + Branches are very common in a VCS; a single branch indicates a
    self-maintained line of development.
  + A developer may create a copy of some project artifacts for their own
    use and give an appropriate identification to the new line of development.
  + This new line of development, created from the originally stored software
    artifacts, is referred to as a branch.
  + Multiple copies of a file may be created independently of each other.
  + Each branch is characterized by its branch number or identification.

Basic Terminology used for VCS
+ Head:
  + The head (sometimes also called the "tip") refers to the
    commit that has been made most recently, either
    to a branch or to the trunk.
  + The trunk and every branch have their individual
    heads.
  + The term head is also sometimes used to refer to the trunk.

Functionalities provided by VCS
+ Revert project artifacts back to a previously recorded
and maintained state.
+ Revert the entire software project back to a previously
recorded state.
+ Review any change made over time to any of the project
artifacts.
+ Retrieve metadata about any change, such as the
developer or project member who last modified any
artifact that might be causing a problem, and more.

Classification of VCS

Local VCS
+ Local VCS employ a simple database that records and maintains all the
changes to artifacts of the software project under revision control.
+ A system named Revision Control System (RCS) was a very popular local
versioning system, which is still being used by many organizations as well as
end users.
+ RCS operates by simply recording the patch sets (i.e., the differences
between two artifacts) while moving from one revision to the other in a
specific format on the user's system.

Local VCS
+ RCS can then easily recreate the image of a project artifact at any point
in time by summing up all the maintained patches.
+ However, the user cannot collaborate with other users on other systems, as
the database is local and not maintained centrally.
+ Each user has his/her own copy of the different revisions of project artifacts,
and thus there are consistency and data sharing problems.
+ If one user loses the versioning data, recovering it is impossible
unless a backup is maintained from time to time.
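
The patch-based idea can be sketched in a few lines: keep the first revision in full, record only the differences for each later revision, and rebuild any revision by applying the patches in order. This is a simplified illustration using forward patches; it does not reproduce RCS's actual file format or storage strategy, and the names `make_patch`, `apply_patch`, and `LocalStore` are hypothetical.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Return the edits (start, end, replacement lines) that turn `old` into `new`."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(old, patch):
    """Apply the edits from the end backwards so earlier indices stay valid."""
    result = list(old)
    for i1, i2, replacement in reversed(patch):
        result[i1:i2] = replacement
    return result


class LocalStore:
    """Stores revision 1 in full and every later revision as a patch."""

    def __init__(self, initial_lines):
        self.base = list(initial_lines)
        self.patches = []

    def commit(self, new_lines):
        latest = self.checkout(len(self.patches) + 1)
        self.patches.append(make_patch(latest, list(new_lines)))
        return len(self.patches) + 1  # number of the new revision

    def checkout(self, revision):
        if not 1 <= revision <= len(self.patches) + 1:
            raise ValueError(f"no such revision: {revision}")
        lines = list(self.base)
        for patch in self.patches[: revision - 1]:
            lines = apply_patch(lines, patch)
        return lines


store = LocalStore(["a", "b", "c"])
r2 = store.commit(["a", "B", "c", "d"])
r3 = store.commit(["B", "c", "d"])
print(store.checkout(1), store.checkout(2), store.checkout(3))
```

Because the whole store lives in one local object, losing it loses every revision, which mirrors the recovery problem noted in the last bullet.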

Centralized VCS (CVCS)
+ The main aim of a CVCS is to allow the user to easily collaborate with
different users on other systems.
+ These systems, such as CVS, Perforce, and Subversion (SVN), employ a
single centralized server that records and maintains all the versioned
artifacts of a software project under revision control, and there are a number
of clients or users that check out (obtain) the project artifacts from that
central server.
+ For several years, this has been the standard methodology followed in
various organizations for version control.
+ However, if the central server fails or the data stored at the central
server is corrupted or lost, there is no chance of recovery unless periodic
backups are maintained.
[Figure: Centralized version control system. Client systems check out artifacts from a central server that holds the versioning database.]

Distributed VCS
+ To overcome the limitations of CVCS, distributed VCS (DVCS) were
introduced.
+ As opposed to a CVCS, a DVCS (such as Bazaar, Darcs, Git, and Mercurial)
ensures that the clients or users do not just obtain or check out the latest
revision or snapshot of the project artifacts, but clone, mirror, or download
the entire software project repository to obtain the artifacts.
+ If any server of the DVCS fails or its data is corrupted or lost, any of the
software project repositories stored on a client machine can be uploaded
to the server as a backup to restore it.

Distributed VCS
+ Therefore, every checkout carried out by a client is essentially a complete
backup of the entire software project data.
+ Nowadays, DVCS have earned the attention of various organizations across
the globe, and these organizations are relying on them for maintaining their
software project repositories.
+ Git is the most popular DVCS employed in practice and hosts a large number
of software project repositories.

Distributed VCS
+ Google and the Apache Software Foundation also employ Git to maintain the
source code and change control data for their various projects, including the
following:
  + Android OS ([Link])
  + Chromium OS
  + Chrome browser ([Link])
  + OpenOffice
  + Log4j
  + PDFBox
  + Apache Ant ([Link])
[Figure: Distributed version control system. The server and each client hold a full copy of the versioning database along with the project artifacts (for example, Revision 2).]

Bug Tracking System
+ A bug tracking system (also known as a defect tracking system) is a software
system/application built to keep track of the various
defects, bugs, or issues in the software development life cycle.
+ It is a type of issue tracking system.
+ Bug tracking systems are commonly employed by a large number of OSS systems,
and most of them allow users to generate various types of
defect reports directly.
+ Typical bug tracking systems are integrated with other software project
management tools and methodologies. Some systems are also used internally by
organizations.

Bug Tracking System
+ A database is a crucial component of a bug tracking system, which stores
and maintains information regarding the bugs reported by the users and/or
developers.
+ These bugs are generally referred to as known bugs.
+ The information about a bug typically includes the following:
  + The time when the bug was reported in the software system
  + Severity of the reported bug
  + Behavior of the source program/module in which the bug was encountered
  + Details on how to reproduce that bug
  + Information about the person who reported that bug
  + Developers who are possibly working to fix that bug, or will be assigned the job

Bug Tracking System
+ Many bug tracking systems also track a bug through its status, which
gives rise to the concept of the bug life cycle.
+ Ideally, the administrators of a bug tracking system are allowed to
manipulate the bug information: determining the possible values of
bug status (and hence the bug life cycle states), configuring permissions
based on bug status, changing the status of a bug, or even removing the
bug information from the database.
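
A bug life cycle can be viewed as a small state machine whose states and allowed transitions are configured by an administrator. The sketch below is only an illustration of that idea: the status names and transitions are assumptions, not the configuration of any particular tracker, and the classes are hypothetical.

```python
class BugLifeCycle:
    """Allowed status transitions, as an administrator might configure them."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = {s: set(t) for s, t in transitions.items()}

    def allows(self, current, target):
        return target in self.transitions.get(current, set())


class Bug:
    def __init__(self, bug_id, life_cycle):
        self.bug_id = bug_id
        self.life_cycle = life_cycle
        self.status = life_cycle.initial
        self.history = [self.status]  # every status the bug has had

    def set_status(self, target):
        if not self.life_cycle.allows(self.status, target):
            raise ValueError(f"{self.status} -> {target} is not allowed")
        self.status = target
        self.history.append(target)


# Illustrative configuration; real trackers define their own states.
LIFE_CYCLE = BugLifeCycle(
    initial="NEW",
    transitions={
        "NEW": ["ASSIGNED"],
        "ASSIGNED": ["RESOLVED"],
        "RESOLVED": ["VERIFIED", "REOPENED"],
        "REOPENED": ["ASSIGNED"],
        "VERIFIED": ["CLOSED"],
        "CLOSED": ["REOPENED"],
    },
)

bug = Bug("BUG-42", LIFE_CYCLE)
for status in ["ASSIGNED", "RESOLVED", "VERIFIED", "CLOSED"]:
    bug.set_status(status)
print(bug.history)
```

Keeping the transitions in data rather than in code is what lets an administrator change the life cycle without touching the tracker itself.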

Bug Tracking System
+ Many systems also update the administrators and developers associated
with a bug through emails or other means, whenever new information is
added in the database corresponding to the bug, or when the status of the
bug changes.
+ The primary advantage of a bug tracking system is that it provides a clear,
concise, and centralized overview of the bugs reported in any phase of the
software development life cycle, and their state.
+ The information provided is valuable for defining the product road map and
plan of action, or even planning the next release of a software system.

Bug Tracking System
+ Bugzilla is one of the most widely used
bug tracking systems.
+ Several open source projects, such as
Mozilla, employ the Bugzilla repository.

Extracting Data from Software Repositories
1. The first step in the data-collection procedure is to extract metrics using
metrics-collection tools such as Understand and Chidamber and Kemerer
Java Metrics (CKJM).
2. The second step involves collecting bug information at the desired level
of detail (file, method, or class) from the defect reports and source control
repositories.
3. Finally, the report containing the software metrics and the defects
extracted from the repositories is generated and can be used by
researchers for further analysis.
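
The three steps can be sketched as a small join: per-class metrics from a metrics tool are combined with per-class defect counts from the bug data to produce one report. The input rows below are made up, the column names (`wmc`, `cbo`, `loc`) are illustrative, and the function names are hypothetical; real tools export their own formats.

```python
import csv
import io
from collections import Counter


def build_dataset(metrics_rows, defect_rows):
    """Attach a defect count and a defective flag to each class's metrics."""
    defects_per_class = Counter(row["class"] for row in defect_rows)
    dataset = []
    for row in metrics_rows:
        record = dict(row)
        record["defects"] = defects_per_class[row["class"]]  # 0 if none reported
        record["defective"] = record["defects"] > 0
        dataset.append(record)
    return dataset


def to_csv(dataset):
    """Render the combined report as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(dataset[0].keys()))
    writer.writeheader()
    writer.writerows(dataset)
    return buffer.getvalue()


# Step 1: metrics per class (made-up values).
metrics = [
    {"class": "Parser", "wmc": 24, "cbo": 9, "loc": 610},
    {"class": "Logger", "wmc": 7, "cbo": 2, "loc": 140},
]
# Step 2: defects mapped to the class they were fixed in (made-up values).
defects = [
    {"bug_id": "BUG-1", "class": "Parser"},
    {"bug_id": "BUG-2", "class": "Parser"},
]
# Step 3: combined report for further analysis.
dataset = build_dataset(metrics, defects)
report = to_csv(dataset)
print(report)
```

The resulting table, one row per class with its metrics and defect information, is the usual starting point for statistical or machine learning analysis.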
Procedure to extract data from software repositories
+ The data is kept in software repositories of various types, such as
CVS, Git, SVN, ClearCase, Perforce, Mercurial, Veracity, and Fossil.
[Figure: source control repositories; collect software metrics using metrics
calculator tools such as Understand, CKJM; collect defect/change data using
bug/change collection tool]
+ These repositories are used for management of software content and
changes, including documents, programs, user documentation, and other
related information.
Concurrent Versions System (CVS)
+ CVS is a popular CVCS that hosts a large number of OSS systems.
+ CVS has been developed with the primary goal to handle different revisions
of various software project artifacts by storing the changes between two
subsequent revisions of these artifacts in the repository.
+ Thus, CVS predominantly stores the change logs rather than the actual
artifacts such as binary files.
+ CVS can store binary files also, but they are not handled efficiently.
What are the various features provided by CVS?
+ Revision numbers:
  + Each new revision or version of a project artifact stored in the CVS
    repository is assigned a unique revision number by the VCS itself.
  + For example, the first version of a checked-in artifact is assigned the
    revision number 1.1. After the artifacts are modified (updated) and the
    changes are committed (permanently recorded) to the CVS repository, the
    revision number of each modified artifact is incremented by one (for
    example, from 1.1 to 1.2).
  + Because only the modified artifacts are incremented, the artifacts of a
    project no longer share the same revision number after such updates.
  + The final release of a software project comprises all the artifacts under
    version control, where each artifact can have its own revision number.
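As a rough illustration of this numbering scheme, the sketch below increments the last component of a revision number on each commit. It assumes the usual CVS convention that a newly added artifact starts at revision 1.1; the file names and the `next_revision` and `commit` helpers are hypothetical.

```python
def next_revision(revision):
    """Return the revision number that follows the given one on the same line."""
    parts = revision.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def commit(revisions, modified):
    """Increment the revision of each modified artifact; leave the rest alone."""
    return {
        name: next_revision(rev) if name in modified else rev
        for name, rev in revisions.items()
    }


# Two hypothetical artifacts, both freshly checked in (assumed to start at 1.1).
revisions = {"parser.c": "1.1", "lexer.c": "1.1"}

# Only parser.c is modified and committed twice.
revisions = commit(revisions, {"parser.c"})
revisions = commit(revisions, {"parser.c"})
print(revisions)
```

After the two commits the artifacts carry different revision numbers, which is why a release is described by the set of per-artifact revisions rather than by one project-wide number.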
What are the various features provided by CVS?
+ Branching and merging:
  + The user can create his/her own branch for development, and view, modify,
    or delete a branch created by the user as well as other users, provided
    the user is authorized to access those branches in the repository.
  + To create a new branch, CVS chooses the first unused even integer,
    starting with 2, and appends it to the revision number of the artifact
    from which the branch is forked off; the user who created the branch
    wishes to work on those particular artifacts only. For example, the first
    branch created at revision number 1.2 of an artifact receives the branch
    number 1.2.2, but CVS internally stores it as [Link].
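The branch-numbering rule described above (first unused even integer, starting with 2, appended to the revision number) can be sketched as follows. The `new_branch_number` helper and the set of existing branches are hypothetical, and the sketch covers only the user-visible branch number, not the internal representation.

```python
def new_branch_number(revision, existing_branches):
    """Return the branch number for a new branch forked off the given revision."""
    suffix = 2
    # Try 2, 4, 6, ... until a branch number that is not yet in use is found.
    while f"{revision}.{suffix}" in existing_branches:
        suffix += 2
    return f"{revision}.{suffix}"


existing = set()
first = new_branch_number("1.2", existing)
existing.add(first)
second = new_branch_number("1.2", existing)
print(first, second)
```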
What are the various features provided by CVS?
+ Branching and merging:
  + However, the main issue with branches is that the detection of branch
    merges is not supported by CVS.
  + Consequently, CVS lacks sufficient mechanisms to support tracking the
    evolution of typically large-sized software systems as well as their
    particular products.
+ Drawback of CVS: Lack of functionality to provide appropriate mechanisms for
linking detailed modification reports and classifying changes.
What are the various features provided by CVS?
+ Version control data:
  + For each artifact under the repository’s version control, CVS generates
    detailed version control data and saves it in a change log, or simply
    log files.
  + The recorded log information can be easily retrieved by using the
    cvs log command.
  + Moreover, we can specify some additional parameters so as to allow the
    retrieval of information regarding a particular artifact or even the
    complete project directory.
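A minimal sketch of retrieving this log from a script is shown below. It assumes the `cvs` client is installed and that the script runs inside a checked-out working copy; the helper names and the example path are hypothetical, and only the plain `cvs log` form with an optional file or directory argument is used.

```python
import subprocess


def build_cvs_log_command(target=None):
    """Build the argument list for cvs log, optionally limited to one path."""
    command = ["cvs", "log"]
    if target is not None:
        command.append(target)
    return command


def run_cvs_log(target=None, workdir="."):
    """Run cvs log in a working copy and return its text output.

    Assumes the cvs client is available on the PATH.
    """
    result = subprocess.run(
        build_cvs_log_command(target),
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Only build the command here; running it requires a real working copy.
print(build_cvs_log_command("src/Main.java"))
```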
Example log file from Mozilla
project at CVS
+ The log file shows the versioning data for the source file “[Link],”
which is taken from the Mozilla project.
+ The CVS change log file typically comprises several sections, and each
section presents the version history of an artifact (a source file in the
given example).
+ Different sections are always separated by a single line of “=”
characters.
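Because each section ends with a line of “=” characters, a change log can be split into per-artifact sections with a few lines of code. The sketch below is illustrative only: the sample log text, paths, and the `split_sections` helper are invented, and real log output contains many more fields per section.

```python
def split_sections(log_text):
    """Split change log text into one string per artifact section."""
    sections = []
    current = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"="}:
            # A line made up only of "=" characters closes the current section.
            if any(item.strip() for item in current):
                sections.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    # Keep trailing content that is not followed by a separator line.
    if any(item.strip() for item in current):
        sections.append("\n".join(current).strip())
    return sections


# Invented sample with two artifact sections (paths are hypothetical).
sample_log = "\n".join([
    "RCS file: /cvsroot/demo/src/a.c,v",
    "head: 1.2",
    "=" * 77,
    "RCS file: /cvsroot/demo/src/b.c,v",
    "head: 1.1",
    "=" * 77,
])

sections = split_sections(sample_log)
for section in sections:
    print(section.splitlines()[0])
```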
Example log file from Mozilla
project at CVS
RCS file: This field contains
the path information to
identify an artifact in the
repository.
Example log file from Mozilla
project at CVS
Locks and Access List:
+ These are file content access and security options set by the developer
at the time of committing the file to CVS.
+ These may be used to prevent unauthorized modification of the file: users
may download certain files, but are not allowed to commit protected or
locked files to the CVS repository.
Example log file from Mozilla
project at CVS
Symbolic names:
+ This field contains the revision numbers assigned to tag names.
+ The assignment of revision numbers to the tag names is carried out
individually for each artifact because the revision numbers might be
different.
Example log file from Mozilla
project at CVS
Description:
+ This field contains the modification reports that describe the change
history of the artifact, beginning from the first commit until the current
version.
+ Apart from the changes incurred in the head or main trunk, changes in all
the branches are also recorded there. The revisions are separated by a line
of “-” characters.
Example log file from Mozilla
project at CVS
Revision number: This field is
used to identify the revision of
source code artifact (main trunk,
branch) that has been subject to
change(s).
Example log file from Mozilla
project at CVS
Date: This field records the date and time of the check-in.
Author: This field provides the information of the person who committed
the change.
Example log file from Mozilla
project at CVS
State: This field provides information about the state of the committed
artifact and generally assumes one of these values: “Exp” (experimental) or
“dead” (file has been removed).
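In cvs log output, the date, author, and state fields of a revision appear together on one line, separated by semicolons. The sketch below parses such a line into a dictionary; the sample line (date, author name, and line counts) is invented for illustration, and `parse_revision_fields` is a hypothetical helper.

```python
def parse_revision_fields(line):
    """Parse a 'key: value;  key: value' metadata line into a dictionary."""
    fields = {}
    for chunk in line.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the first colon only, since the date value contains colons.
        key, _, value = chunk.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Invented sample line in the usual cvs log layout.
sample = "date: 2002/03/14 10:20:30;  author: jdoe;  state: Exp;  lines: +16 -47"

info = parse_revision_fields(sample)
print(info["date"], info["author"], info["state"])

# A "dead" state marks a revision in which the file was removed.
is_removed = info["state"] == "dead"
print(is_removed)
```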
Date: Thu Jul 11 09:32:28 2013 +0900
Bug: 9767739
UICC: fix read EF Image Instance
The EFs(4Fxx) path under DF Graphics are not distinguish with the EFs(4Fxx) path
under DF Phonebook. So, getEFPath(EF_IIDF) is not able to return correct path.
Because getEFPath(EF_IMG) is correct path, DF graphics, getEFPath(EF_IMG) is used
instead of getEFPath(EF_IIDF). EF_IMG is a linear fixed EF. The result of loading
EF_IMG should be processed as a LoadLinearFixedContext. So, it is needed to calculate
the number of EF_IMG records. If those changes are added, the changes are duplicated
with the codes of EVENT_GET_RECORD_SIZE_DONE. The codes of
EVENT_GET_RECORD_SIZE_IMG_DONE are removed and the event is treated by the logic of
the EVENT_GET_RECORD_SIZE_DONE. And then remove incorrect handler events
(EVENT_READ_IMG_DONE and EVENT_READ_ICON_DONE) are moved to the handler events which
have the procedure for loading same type EFs (EVENT_READ_RECORD_DONE and the
EVENT_READ_BINARY_DONE)
/internal/telephony/uicc/IccFileHandler.java | 140 +++
/internal/telephony/uicc/RuimFileHandler.java | 8
2 files changed, 38 insertions(+), 120 deletions(-)
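The last line of a Git commit log entry like this one summarizes the number of files changed, lines inserted, and lines deleted, which are common change metrics in mining studies. The sketch below extracts those three numbers with a regular expression; `parse_summary` is a hypothetical helper and assumes the English summary wording that Git prints with its stat output.

```python
import re

# Insertions or deletions may be absent when a commit only adds or only removes.
SUMMARY = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def parse_summary(line):
    """Return files/insertions/deletions counts, or None if no summary found."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    files, insertions, deletions = match.groups()
    return {
        "files": int(files),
        "insertions": int(insertions or 0),
        "deletions": int(deletions or 0),
    }


summary = parse_summary("2 files changed, 38 insertions(+), 120 deletions(-)")
print(summary)
```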
+ Git also maintains integrity (no change can be made without the knowledge
of Git) and, generally, Git only adds data.
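One way to see why no change can go unnoticed: Git identifies stored content by a checksum of that content, so any modification yields a different identifier. The sketch below reproduces the identifier Git assigns to a file's content (a blob), assuming a repository that uses the default SHA-1 object format; `git_blob_id` is a hypothetical helper name and the sample contents are invented.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to file content (a blob)."""
    # Git hashes a short header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"int main(void) { return 0; }\n")
modified = git_blob_id(b"int main(void) { return 1; }\n")

# A one-character change produces a different identifier.
print(original != modified)
```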