Test case optimization a nature inspired approach using bacteriologic
algorithm
Submitted To
Amity University Uttar Pradesh
Bachelor of Technology
in
Computer Science & Engineering
by
AmbikaSingh(A2305215366)
AlankritaSingh(A2305215353)
Under the guidance of
Dr. Prakash Srivastava
Assistant Professor, Department of CSE,ASET
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY
AMITY UNIVERSITY UTTAR PRADESH, NOIDA (U.P.)
CHAPTER-1 Introduction
1.1Software Testing
Software Testing is defined as an analysis carried out to supply stakeholders with knowledge
related to the quality of the software under testing. Software Testing also supplies an objective as
well as an individual prospect of the product for the businesses to acknowledge the risk of
software development. Testing includes a lot of technique some of these are the process of
executing a program or application with the intention of finding software bugs. It includes
logical errors, runtime errors etc.
Generally, the following properties demonstrate the degree to which the individual component or
entire software under testing:
Satisfy the needs which regulated the design and development
Give the proper outputs for all inputs
Timely completes its function i.e. the time needed for such a purpose
Is adequately usable
Installed and executed within a desired environments
Accomplishes the results for the stakeholders’ desire
Even a single software can have a very large number of test cases. This is practically impossible
to test all the test cases in a short span of time. Therefor all software testing techniques attempts
to select and prioritize the test cases.
Thus, there is a need to minimize the number of test cases being executed. Generally test cases
are grouped in test suites. As a result, Software Testers attempt to minimize the test suites. This
results in a large amount of time being saved with the intention of seeking defects.
In general, Software Testing is an iterative/repetitive task. Since when one bug is found , it can
lead to other bugs which may create new bugs.
Testing enables users and all sponsors to know the risk of software implementation as well as it
provide objectives and data about the quality of the product.
Testing can be applied to even those products which are mostly incomplete but yet executable
programs. Testing is so important that the whole development and design process of the
software is dependent on testing.
In Earlier Days, Waterfall Model Testing is done after the coding phase of the product is over
while modern days, Agile Methodology Testing is often done concurrentlywith the development
of product.
1.2SIR
SIR stands for ‘Software Artifact Infrastructure Repository’. It is specially designed for
controlled experiments to be done under testing. It allows us to rigorously test the objects
available which may be written in C or Java. It is also important for software related testing
education.
The objects are mainly C, Java, C++ and C Sharp programs. All objects are in multiple versions,
both original and seeded versions. These seeded versions have seeded faults ,intended to check
or test in a highly controlled environment. These also contain fault data and test scripts.
Regression Testing requires many versions of the same software for conducting experiments.
Hence, this sort of testing becomes quite easy if fault data as well as numerous versions are
available. Obtaining such artifacts and organizing them in a manner that supports controlled
experimentation is a difficult task.
1.3 Greedy Approach
The optimisation here involves minimising and prioritizing of test cases. This is the reason and
motivation for finding this work. From past few years, various researchers have been engaged in
finding methods for test case minimisation and prioritisation, so that the whole system can be
effectively tested in given time. The first approach used for test suite minimisation is the
classical Greedy approach (Cormen et al., 2001). This uses a function that finds out the test cases
in test suite that satisfy most of the requirements and removes those requirements from the
requirement set. This process is continued recursively until all the requirements are satisfied.
Harrold et al. (1993) (HGS) proposed a technique which selects a set of test cases from a test
suite that could represent the entire test suite in terms of coverage, i.e., it will have the same
coverage as that of test suite. The main drawback in above two approaches is that the fault
detection capability of test suites has been reduced. Wong et al. (1995) suggested that keeping
all-uses coverage constant, if test suite’s size is reduced, then little or no reduction in its fault
detection effectiveness is examined. Von Ronne (1999) generalised the HGS algorithm for
support each conditions in multiple times. In contrast to previous studies, Rothermal et al.’s
(1998, 2001) empirical study showed that minimisation of test suites severely affects their fault
detection capabilities. Tallam and Gupta (2005) developed another heuristic called the delayed-
greedy strategy to minimise the test suites while considering the coverage requirements
CHAPTER 2 – Literature Review
2.1 Basic Genetic Algorithm
GAs automatically search in the input area for data with the maximum usage. GAs requires a
fitness function, capable of calculating the quality of chromosomes (a chromosome is a
collection of genes).
GAs use selection, crossover and mutation genetic operators.
1 Selection: in this process two chromosomes are selected for crossover and mutation.
Chromosomes with high fitness value have high probability of selection.
2 Crossover: this gene operator selects a random crossover point and makes new descendants by
crossover of parents. The simplest way is one-point crossover, which selects a gene as crossover
point. The parents gene before or beyond the crossover point are exchanged arbitrarily, resulting
two new descendants.
3 Mutation: crossover process is followed by mutation process, which randomly changes the
new descendants.
2.2 Bacteriologic algorithm
The BA is basically a variant of GA. It is inspired by the nature of bacteria. As a bacterium never
crosses-over, so crossover operator is removed here. The genetic operators used in this algorithm
are selection and mutation. A memorisation function is introduced here, which can memorise the
fittest bacteria of each population. There can be a threshold for memorisation function which
indicates the number of bacteria of each population to be memorised.
It takes ‘source code’ and a ‘set of test cases’ as input, and gives minimised and prioritised set of
test suites as output. Using Requirement mapping technique, the test cases are mapped with the
requirements extracted from the code, and two matrices are formed: a branch coverage matrix
(branch coverage table) b fault coverage matrix (definition-use table). The minimisation
algorithm which uses these matrices decides the initial test suite size and then test suites are
generated by randomly selecting the test cases. These form the initial population for
bacteriological algorithm. Then, selection and mutation operations are applied and new
descendants are formed. This process is repeated until stopping criteria is met. The fitness
function, fit used here for calculating fitness of test suites is the metric inferred as sum of product
of respective weights with bc and fda which can be directly obtained from the two tables formed
by requirement mapping.
Fit = w1*bc + w2*fda
wherebc is branch coverage (obtained from branch coverage table) and fda is fault detection
ability(obtained from fault coverage table). w1 and w2 are weights ranging from 0 to 1. They
depict the significance of each metric which is going to change for individual program. Different
values of these weights cover different metrics like
• w1 > w2 incline towards ‘all p-use and some c-use’
• w1 < w2 incline towards ‘all c-use and some p-use’
• (w1 = w2)! = 0 incline towards ‘some c-use and some p-use’.
If only branch coverage is taken into consideration for test suite size’s and fitness calculation,
then regardless of the values of w1 and w2, it tends towards ‘all p-use’ metric. For this case w2
will be equal to zero.
If only fault coverage (def-use table) is taken into consideration, then regardless of the values of
w1 and w2, it tends towards ‘all c-use’ metric. For this case w1 will be equal to zero.
Both the weights cannot be zero at the same time. These weights are required to be chosen by
the tester by thoroughly analysing the source code.
Chapter 3 - Research Methodology
The below algorithm shows the steps involved in proposed approach for test case minimisation
and prioritisation. It takes two inputs as
1 source code
2 initial set of test cases.
The output is minimised and prioritised set of test suites. Rb is the set of extracted branches
from the source code and Rc is the set of extracted variable def-use pair. Tablebc is the branch
coverage table and Tablefda is the variable def-use table. TS is a test suite which gives the
optimal length L here. The weights w1 and w2, both cannot be zero at the same time. If instead
of test cases, test suites are present then the steps 5, 6 and 7 becomes insignificant.
Input: source code,
S Set of initial test cases
ti Output: minimised and prioritised test suites.
Algorithm:
1 Extract branches as requirements from source code S Rb = {B1, B2, B3....Bn}
2 Map initial test cases, ti, with RbTablebc = map_fun(ti, Rb)
3 Find define-use of different variables, from source code S Rc = {a1(line x, line y)....an(line x,
line y)}
4 Map initial test cases, ti, with RcTablefda = map_fun(ti, Rc)
5 Find a test suite TS with maximum branch coverage and fault detection ability TS =
find(Tablefda, Tablebc, ti)
6 Find optimal length of test suites as L = find_length(TS)
7 Generate random test suites TSi = {TS1, TS2,....TSn}of length L from ti
8 Apply bacteriologic algorithm over TSi
8.1 Calculate fitness for each bacterium in TSifiti = w1 * bci + w2 * fdai where bc = branch
coverage of bacterium (obtained from Tablebc) fda = fault detection ability (obtained from
Tablefda) w1 and w2 are weights ranging between 0 and 1.
8.2 Arrange all bacterium in TSi in descending order of fiti
8.3 Apply mutation
8.4 Recalculate fiti
8.5 Memorise fittest bacteria
8.6 Add to next generation
8.7 Repeat from 8.1 until stopping criteria is not met. (stopping criteria here can be number of
generations)
9 Rearrange all memorised bacteria in descending order of fiti, which prioritises them.
3.1 Case study
For thorough explanation of the proposed approach, a source code of binary search (written in C)
is taken from ‘Data Structure using C and C++’, which is given as input to the algorithm and
returns the position of the element to be searched in the array.
1. int * binary_search ( intval)
2. {
3. unsignedint L = 0, R = array_length(arr), M;
4. while (L <R)
5. {
6. M = (L+R− 1)/2;
7. if (val = = arr [M])
8. returnarr+M;
9. else if (val<arr[M])
10. R = M;
11. else
12. L = M+1;
13. }
14. return NULL;
15. }
In the above source code L, R and val are variables for which the set of the initial test cases (ti)
are given as second input to the algorithm:
Test Cases (ti): Variables: L, R, val
The sample array, arr[] = –14, –7, 3, 8, 12} is taken for designing the test cases(ti) which are
shown in Table1 :
Now further process involves the following steps:
Step 1: Extract branches from source code S as:
1. int * binary_search ( intval)
2. {
3. unsignedint L = 0, R = array_length(arr), M;
4. B1 while (L <R)
5. {
6. M = (L+R− 1)/2;
7. B2 if (val = = arr [M])
8. returnarr+M;
9. B3 else if (val<arr[M])
10. R = M;
11. B4 else
12. L = M+1;
13. }
14. return NULL;
15. }
So, the set of branches Rb = {B1, B2, B3, B4}
Step 2: Map initial test cases ti (from Table 1) with Rb and form a branch coverage Tablebc.
This is shown in Table 2.
Step 3: Extract variable def-use pair from source code S as V (line x, line y), where V is a
variable which is defined at line x and used at line y. Here the variable-def means that the
variable is either initialised by some value which can be a constant or result of any expression,
and the variable-use means that where that variable is used in some computation. In source code
S the following def-use pairs are extracted
Rc={ L(3,6), R(3,6), M(6,8), M(6,10), M(6,12)}
E.g., L(3,6) depicts here that variable L is defined at line number 3 and then used at line number
6.
Step 4: Map initial test cases ti (from Table 1) with Rc to form Tablefda (fault coverage table).
This is shown in Table 3. The ‘X’ in Table 3 depicts that a variable’s definition and use both are
covered by corresponding test case ti. For example ‘X’ in row t1 and corresponding column L (3,
6) is showing that test case t1 is covering both definition (at line 3) and use (at line 6) of L.
Step 5: By using minimisation algorithm, find a test suite which has maximum branch coverage
and fault detection ability. For this both the tables, i.e., Tables 2 and 3 can be combined and
classical Greedy approach (Cormen et al., 2001) can be applied. It finds out the test cases that
satisfy most of the requirements and removes those requirements from the requirement set. This
process is continued recursively until all the requirements are satisfied, e.g., here test case t1
covers branches B1, B2, B3 (from Table 2) and def-use L(3, 6), R(3, 6), M(6, 8), M(6, 10) (from
Table 3). Now these requirements are removed and search for test case which can cover rest of
the requirements. Then test case t2 covers branches B1, B2, B4(from Table 2) and def-use L(3,
6), R(3, 6), M(6, 8), M(6, 10) (from Table 3). So both t1 and t2 are covering all branches and
def-use. They both can form a test suite with hundred percent bc and fda. We get a test suite TS
as TS = {t1, t2}.
Step 6: Now find length of TS (number of test cases in TS), which will give an optimal test suite
size, L = length (TS), here L = 2.
Step 7: Form initial test suites of size L by randomly taking the test cases from ti. Here the
following 10 test suites of size L = 2 are formed for the initial test case (ti):
Here the following 10 test suites of size L = 2 are formed for the initial test case (ti): TS1{ t1, t6}
,TS2{ t2, t7} ,TS3{ t3, t8} , TS4{ t4, t9} , TS5{ t5, t10} , TS6 {t11, t12} , TS7 {t13, t14} , TS8{
t15, t16} ,TS9 {t17, t18} ,TS{10 t19,}
If the tester is only having test cases, not test suites then he requires to calculate the optimal size
of a test suite, otherwise the calculations involved in step 5, step 6 and step 7 are not required.
Step 8: Apply BA on test suites generated after step 7, for number of generations until stopping
criteria is not met. The stopping criteria can be number of generations here. For every generation,
test suites having highest fitness are selected by the selection operator and mutation is applied
over them. The mutation operator randomly exchanges test cases between test suites. This
process is repeated several times for every generation. For example during this process, at any
time let the selection operator chooses two test suites (bacteria) as TS4 = {t4, t9} and TS7 =
{t13, t14}. The mutation operator chooses to exchange the first test cases of these test suites. So
after mutation the new test suites comes as TS4 = {t13, t9} and TS7 = {t4, t14}. This process of
selection and mutation is carried out several times and new test suites are created. Now after
every generation the fitness calculation of the generated test suites is required. For example:
Fitness calculation of TS4={ t13, t9}
Test case t13 covers branches B1 and B3; test case t9 covers branches B1 and B4 (from Table
2). So the test suite TS4 covered 3 branches out of total 4 branches. So branch coverage of TS4
will be:
bc = (3 / 4)*100 =75%
Similarly, t13 covers L(3, 6), R(3, 6), M(6, 10) and t9 covers L(3, 6), R(3, 6), M(6, 12) (from
Table 3). So the test suite TS4 covered 4 def-use pairs out of total 5, so fault detection ability
(fda) of TS4 will be:
fda = (4 / 5)*100 =80%
Let the weights taken here be w1 = 1 and w2 = 0, i.e., it tends to cover ‘all p-use and some c-
use’ here. Therefore fitness value of test suite TS4 will be
fit = (1*75)+ (0*80)= 75%
The most fit bacteria (test suites) of each generation will be memorised according to a
memorisation threshold (maximum number of bacteria of each generation to be memorised)
which is determined by the tester and finally these bacteria are arranged in descending order of
their fitness value, which gives minimised and prioritised test suites according to branch
coverage and fault detection capability.
As a result of an execution we have got ten test suites as shown in Table 4.
3.2 Implementation of bacteriologic algorithm
For implementing bacteriologic algorithm, coding is done in netbeans java.
3.2.1 GUI netbeans
NetBeans is an integrated development environment (IDE) for Java. NetBeans allows
applications to be developed from a set of modular software components called modules.
NetBeans runs on Windows, macOS, Linux and Solaris. In addition to Java development, it has
extensions for other languages like PHP, C, C++, HTML5, and JavaScript. Applications based
on NetBeans, including the NetBeans IDE, can be extended by third party developers.
This is the main file [or Program] containing the main functionality of the bacteriologic
algorithm in which test suites are prioritized and minimised.
Chapter 4: Result and Conclusion
This paper suggests an approach for the test case minimisation and prioritisation using a BA and
requirement mapping technique for improving the fault detection capability of the test cases. The
analysis of this new approach on a number of examples and graphs shows that this technique
improves the performance of the genetic as well as BA to a significant level in the terms of fault
detection and code coverage capability of the test suits
This is the output that we receive at the end stating the test suites which have high branch
coverage and fault detection ability and high value for fitness function.