Using Static Information of Programs to Partition the Input Domain in Search-based Test Data Generation

The quality of test data has an important effect on the fault-revealing ability of software testing. Search-based test data generation reformulates testing goals as fitness functions so that test data generation can be automated by meta-heuristic algorithms. Meta-heuristic algorithms search the domain of input variables to find input data that cover the targets. The domain of input variables is very large, even for simple programs, and its size has a major influence on the efficiency and effectiveness of all search-based methods. Despite the large volume of work on search-based test data generation, the literature contains few approaches that address the impact of search space reduction. To partition the input domain, this study defines a relationship between the structure of the program and the input domain. Based on this relationship, we propose a method for partitioning the input domain. Then, to search the partitioned space, we select ant colony optimization, one of the important and successful meta-heuristic algorithms. To evaluate the performance of the proposed approach against previous work, we selected a number of different benchmark programs. The experimental results show that our approach achieves 14.40% better average coverage than the competing approach.


1-Introduction
Software testing is a vital part of the software development life cycle that aims to reveal failures in a program under test. Besides improving the quality of the testing activity, automation also reduces cost and time [3] [4]. Test data generation, a main part of the software testing process, is the activity of finding test data that test programs effectively. Symbolic execution and dynamic methods are known as the two main approaches to automatic test data generation [5]. In symbolic execution [6], symbolic values are assigned to the input variables in order to formulate program paths in terms of logical constraints. These constraints must be solved in order to discover input values that trace specific paths in the program. The main issue of this method is its dependency on the capability of constraint solvers, which are unable to solve complex constraints; pointer references, loop-dependent or array-dependent variables, and calls to external libraries and to functions whose implementations are unknown are further issues of this approach. In dynamic methods, the program is instrumented and executed with some input data, and the state of the program is observed. Since functions are executed with real arguments, pointer values and array subscripts are known at run-time; thus, many of the problems of symbolic execution are resolved. The application of optimization algorithms as dynamic methods in test data generation is called Search-Based Software Testing (SBST). In this approach, the input domain of the program is the search space, and a fitness function is defined to evaluate and score different inputs of the program (as solutions) with respect to the given test criterion. The fitness function aims to guide the search into promising, unevaluated areas of the search space. The authors of [1] investigated the relationship between the search space size and search effectiveness and efficiency.
They proposed a method to reduce the search space by removing irrelevant variables, which are recognized by a slicing approach. This approach has been applied to three categories of meta-heuristic algorithms: Hill Climbing as a local search, Genetic Algorithm as a global technique, and Memetic Algorithm as a hybrid optimization technique that combines global and local searches. The method has shown a positive effect on the three meta-heuristic algorithms but has not outperformed random testing. So as not to be limited to irrelevant variables, we introduce an approach to partition the input domain of relevant variables. Our approach focuses on the analysis of program predicates, i.e., places where logical expressions over variables are evaluated to select the next branch. In effect, we establish a relationship between the structure of the program and the input domain. We obtain some values for each input variable and then partition the search space with respect to these values. Furthermore, we customize the basic Ant Colony Optimization (ACO) algorithm, a well-known optimization algorithm, according to the partitioned space. In this way, we propose a new test data generation approach that is based on static analysis of the code. To evaluate our proposed approach, we consider average coverage as the evaluation metric. According to the experimental results, this approach performs better than the previous work. We will also show that the suggested static input space partitioning approach implicitly includes the irrelevant variable removal capability [1]. The rest of the paper is structured as follows. In the next section, a brief overview of related work is given. In Section 3, our approach to partitioning the search space and the modified ACO algorithm are presented.
The experimental results and analysis are presented in Section 4, followed by the conclusion and an outline of future work in Section 5.

2-Related Work
In this section, we first review some of the important works on test data generation based on different optimization algorithms, such as Genetic Algorithm (GA), Simulated Annealing (SA), ACO, and Particle Swarm Optimization (PSO), with more emphasis on ACO because we customize ACO based on our input space partitioning. Then, the approaches related to input domain reduction in search-based test data generation [1] are presented. In the 1990s, GA was tuned to generate test data. Jones [7] and the authors of [8] examined the use of GA to automate test data generation with respect to branch coverage. Their experiments on some small programs demonstrate that GA works notably better than random testing. An empirical study on GA-based test data generation for large-scale programs was performed by Harman and McMinn [9] [10]. Their experiments showed the superiority of GA over other optimization algorithms such as Hill Climbing. A tool named EvoSuite was implemented by Fraser et al. [11] to generate test suites that satisfy a given coverage criterion. In EvoSuite, a list of coverage criteria can be set, such as branch coverage, data flow, and mutation testing. Tracey et al. introduced a framework for generating test data by using SA, a well-known optimization algorithm based on the idea of neighborhood search [12]. Their method applies SA to structural test data generation with the hope of overcoming some of the problems raised by the application of local search. In this work, test data can be generated for coverage criteria such as branch and statement coverage. Moreover, Cohen et al. used SA to generate test data in combinatorial testing [13].
Since ACO has shown notable results in solving optimization problems [14] [15] [16], some scholars have utilized it to resolve software engineering problems in a wide range of sub-fields such as software project scheduling [17], release planning optimization [18], software quality prediction [19], and software testing [20]. Lam et al. [21] and Srivastava et al. [22] utilized ACO to generate test sequences for state-based software testing. In conformance testing of object-oriented software, the problem of state explosion was solved by Bouchachia et al. [23] by presenting Class Finite State Machines (CFSM). Li et al. [24] also utilized the ACO algorithm for generating test data with respect to the branch coverage criterion. However, this study lacked detailed experimental and comparative analysis. Mao et al. [2] also applied ACO to test data generation and compared their approach against GA, PSO, and SA for the same purpose. Their findings showed that ACO performs better than GA and SA and is comparable to PSO. The approach in [25] outperformed the work of Mao et al. [2] by incorporating (1+1)-evolution strategies to enhance search exploitation through improving the movement of ants in the local search. In [25], pheromone values were defined on each branch and were also considered as a part of the fitness function to discourage every ant from traversing branches already covered by other ants. Since this viewpoint is only appropriate for branch coverage, the authors of [26] [27] introduced a method for using ACO to cover prime paths. They applied the idea of adaptive random testing in local search and used the information of program predicates in partitioning the search space [26]. The experimental results confirm the positive effects of the proposed approach, especially for programs with complex predicates. Regarding the approaches related to input domain reduction, the authors of [28] reduced the search space by the interval arithmetic method.
Their approach is appropriate only for simple predicates in which each side of the clauses contains a single interval variable. Also, some important issues, such as converting local variables to input variables, have not been addressed in their approach. Therefore, we do not consider this approach in the evaluation section. An approach to reducing the dimension of the input domain, called "irrelevant input variable removal", was introduced by Harman [1], [29]. Irrelevant input variables are input variables that do not affect the execution of the target structure. Therefore, they can be removed from the input domain without affecting the feasibility of the target. Their approach was empirically evaluated for search-based structural test data generation. The results showed that irrelevant input variable removal has no impact on random search but enhances the performance of optimization techniques. The authors of [1] encouraged concentrating on relevant variables to further reduce the search space by utilizing a static analysis stage, an idea which is followed in this study.

3-Proposed Approach
In this section, after describing the overall process of our approach, we explain our static analysis approach to input space partitioning. Then, we explain the customization of the ACO algorithm based on the partitioned space in Section 3.3. The proposed algorithm produces test data to satisfy the desired coverage criterion.

3-1-Overall Process
Partitioning the input domain: Since the search space is constructed by the domain of input variables, we start with symbolic execution [30] to transform each clause of the program into a new clause that includes only input variables. Each resulting clause divides the search space into two partitions; the input vectors in one partition cause the clause to evaluate to True, while the input vectors in the other partition lead to a False value for the clause. Replacing the relational operator of each clause (e.g., <, ≤, >, ≥, ≠) by the equality operator determines the borders of these partitions. Each input vector on these borders can be considered a partition value.
Customizing the ACO algorithm: In this phase, the ACO algorithm is customized based on the partitioned search space obtained in the previous phase.

3-2-Partitioning the Input Domain
To perform static partitioning, the program clauses must first be analyzed. A clause is a predicate that does not have any logical operator. For example, the predicate contains three clauses. The output of the analysis is a set of values for each input variable. The partitioning of the input space is done based on these values. In the rest of the paper, these values are called partition values. To obtain partition values, the following three steps are performed.
Step 1 is performed for predetermined test paths of the program (we assume that these test paths cover all the branches of the program); steps 2 and 3 are performed for each clause.
1. By performing symbolic execution for the predetermined test paths of the program, the clauses of the program are converted such that they involve only input variables. This is carried out because the space we are searching through is constructed by input variables.
2. Each clause C is modified by using the equality operator (=) in place of its relational operator. The resulting clause is called C'.
3. A combination of input values that satisfies C' is found. For example, values that satisfy a + b = c (e.g., a = 100; b = 100; c = 200) can be considered partition values for the clause a + b > c.
To clarify the elimination of non-input variables in step 1, static partitioning is performed on the sample program shown in Fig. 2, in which n is a non-input variable; the assignments that could be used to calculate the relationship between n and the input variables are marked in red in Fig. 2(b). Finally, two functions are obtained that express the relationship between n and the input variables. The same is done for the predicate in line 9 (Fig. 2(c)). The predicates obtained by eliminating the non-input variables, along with the obtained partition values, are shown in Table 1. Although we illustrated the partitioned area in the search space for only one clause in the above example, in the next section we use the partitioned space produced by all the clauses of the program.
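Steps 2 and 3 can be illustrated with a minimal sketch for the special case of a linear clause. The function name `partition_value`, the dictionary encoding of the clause, and the restriction to linear clauses are assumptions made for this illustration only; they are not part of the proposed method, which does not prescribe how C' is solved.

```python
from fractions import Fraction

def partition_value(coeffs, constant, fixed, solve_for):
    """Return the value of `solve_for` that places the input vector on the border C'.

    The clause C is assumed to be linear: sum(coeffs[v] * v) + constant  op  0.
    Replacing `op` with '=' gives C'; fixing all other variables leaves one unknown.
    """
    coeff = coeffs[solve_for]
    if coeff == 0:
        raise ValueError("variable does not appear in the clause")
    # Contribution of the constant and of the variables whose values are fixed.
    rest = constant + sum(coeffs[v] * fixed[v] for v in coeffs if v != solve_for)
    return Fraction(-rest, coeff)

# Clause a + b > c, rewritten as a + b - c > 0; C' is a + b - c = 0.
coeffs = {"a": 1, "b": 1, "c": -1}
c_value = partition_value(coeffs, 0, {"a": 100, "b": 100}, "c")
print(c_value)  # 200
```

With a = 100 and b = 100 fixed, the sketch yields c = 200, which matches the partition values given for the clause a + b > c in step 3.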

3-3-Customizing the ACO Algorithm
ACO is an optimization algorithm inspired by ants that release pheromone in the environment. ACO algorithms were originally utilized to find the shortest route in the traveling salesman problem [31]. To generate test data, we change the basic ACO with respect to the partitioned search space. We first formally describe test data generation for the program under test P. Suppose P has d input variables, represented as a vector. For generating test data with the ACO algorithm, an important consideration is the form of pheromone. In this paper, pheromones are defined on the partitions established by static partitioning, explained in the previous section. In this way, each partition has a pheromone value that is initialized to one. The pseudo-code of the customized ACO algorithm is presented in Algorithm 1. Table 2 contains the notations and parameters used in this algorithm. The output of Algorithm 1 is a set of test data (or a test suite) that covers the given test targets. The test data generation process in Algorithm 1 is repeated until all the test targets are traversed by the test suite or the predefined number of iterations is exceeded. The data generation process consists of two stages. In the first stage, all pheromone values are initialized to one (Lines 4-9) and an input vector in the input domain is randomly assigned to each ant as its position vector (Lines 10-12). In the second stage (Lines 13-34), the local search and global search are performed for each ant. Then, the pheromone values are updated according to Eq. 1 (Section 3.3.3). The fitness values of the ants are calculated at the end of each iteration. The position of any ant k that covers an uncovered test target is added to the test suite, i.e., TS. The local search, global search, and pheromone update methods are explained in the following subsections.

3-3-1-Local Search
In the local transfer of ants, for an ant in partition b, the aim is to investigate whether there is a partition in the neighborhood of b that has a better fitness value. If there is such a partition, the ant transfers to the neighboring partition with the best fitness value. This transfer increases the value of pheromone in the destination partition. The neighboring partitions of an ant in partition b are the partitions that have at least one partition value in common with b. In our implementation, a random location is selected in each neighboring partition, and these locations serve as the representatives of their partitions. In the local search, the ant transfers from b to a neighboring partition if that partition has the best fitness value among the neighbors of b and its fitness is better than that of b; otherwise, the ant's location does not change in the local search. It must be noted that a lower fitness value is considered better, and the best fitness value is 0. This process is carried out for all ants in the partitioned space.
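The local transfer described above can be sketched for a single input variable as follows. This is only an illustration under stated assumptions: the borders, the domain limits `LOW` and `HIGH`, the `deposit` amount added to the destination partition, and all function names are hypothetical, and neighborhood is simplified to intervals that share a border.

```python
import bisect
import random

BORDERS = [0, 10]        # assumed partition values of one input variable x
LOW, HIGH = -20, 30      # assumed input domain of x

def partition_of(x):
    """Index of the interval that contains x."""
    return bisect.bisect_right(BORDERS, x)

def bounds(index):
    """Lower and upper bound of the interval with the given index."""
    edges = [LOW] + BORDERS + [HIGH]
    return edges[index], edges[index + 1]

def neighbors(index):
    """Intervals that share a border (a partition value) with `index`."""
    return [i for i in (index - 1, index + 1) if 0 <= i <= len(BORDERS)]

def local_search(x, fitness, pheromone, deposit=1.0, rng=random):
    """Move to the best neighboring representative if it beats the current fitness."""
    # One random representative per neighboring partition.
    candidates = [(i, rng.uniform(*bounds(i))) for i in neighbors(partition_of(x))]
    dest, best = min(candidates, key=lambda c: fitness(c[1]))
    if fitness(best) < fitness(x):       # lower fitness is better, 0 is optimal
        pheromone[dest] += deposit       # the transfer reinforces the destination
        return best
    return x

pheromone = [1.0, 1.0, 1.0]              # one value per partition, initialized to one
new_x = local_search(20, lambda x: abs(x - 5), pheromone, rng=random.Random(0))
```

In this run the ant starts at x = 20 in the rightmost interval, whose only neighbor is [0, 10]; any representative of that interval has a fitness of at most 5, which is better than 15, so the ant moves and the pheromone of the middle partition increases.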

3-3-2-Global Search
The global search is used to solve two problems related to local search. First, there may be some partitions with acceptable fitness that are not visited by any ant in a reasonable time (or iteration limit). Second, there may be ants trapped in local optima [32], i.e., ants that could not find a neighboring place with a superior fitness value. To resolve these issues, whenever an ant's fitness value is worse than the average fitness value of all ants, a random value is generated. If this value is less than a predefined probability, the ant is moved to a randomly selected partition; otherwise, the partition with the highest pheromone value becomes the destination of the ant.
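A minimal sketch of this global search rule is given below. The names `global_search` and `p_random`, and the representation of ants by the index of their current partition, are assumptions made for illustration.

```python
import random

def global_search(ant_partitions, fitnesses, pheromone, p_random, rng=random):
    """Relocate every ant whose fitness is worse (higher) than the colony average."""
    average = sum(fitnesses) / len(fitnesses)
    # Partition that currently holds the most pheromone.
    best_partition = max(range(len(pheromone)), key=pheromone.__getitem__)
    moved = list(ant_partitions)
    for k, fit in enumerate(fitnesses):
        if fit > average:                                   # lower fitness is better
            if rng.random() < p_random:
                moved[k] = rng.randrange(len(pheromone))    # random partition
            else:
                moved[k] = best_partition                   # highest pheromone
    return moved

# With p_random = 0.0 the random branch is never taken, so the result is deterministic.
result = global_search([0, 1, 2], [0.0, 1.0, 8.0], [1.0, 3.0, 1.0], p_random=0.0)
print(result)  # [0, 1, 1]
```

Only the third ant is worse than the average fitness of 3.0, so it alone is relocated to the partition with the highest pheromone value.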

3-3-3-Pheromone Update
Eq. 1 is used to update the pheromone value in each partition of the input domain.
(1) where ρ ∈ (0, 1) represents the pheromone evaporation rate, τ_j is the amount of pheromone in the j-th partition, and j stands for the partition index.

4-Experiment
In this section, we compare our approach with the approach presented in [1]. Although the competing approach has been applied to the genetic algorithm, hill climbing, and the memetic algorithm, we select its implementation with the genetic algorithm because the experimental results in [1] showed that removing irrelevant input variables has the greatest effect on the genetic algorithm.

Evaluation Metrics
Average Coverage (AC), Average Time (AT), and Mutation Score (MS) are used as the evaluation metrics. Average coverage is the average percentage of covered branches and is calculated while the two competing approaches run for the same number of iterations.
Average time is the average elapsed time taken to run the algorithms and is calculated to compare the efficiency of the two competing approaches.
Mutation score is a testing metric provided by mutation analysis, a fault-based testing technique. To perform mutation analysis, PIT [33] is used as a state-of-the-art tool for this purpose.

Benchmark Programs
To conduct experiments, several benchmark programs have been selected (see Table 3): the first eleven programs from the Numerical Case Study (NCS) of EvoSuite, Tcas and Totinfo from the Software-artifact Infrastructure Repository (SIR), and the others from various related work. Table 3 displays the number of lines of code (LoC) and the description of each benchmark program.

The Parameters of the Algorithms
The parameters of the algorithms were set to the values presented in Table 4 before performing the experiments. Parameter selection for our algorithm was based on the sensitivity analysis performed in [2]. Although any coverage criterion can be used, in this paper we consider branch coverage with the fitness function proposed in [1].

Table 3. Programs selected for the empirical studies

Table 4. Parameter setup

4-1-Experiment Results
Experiments were repeated 50 times with various initial populations to account for the random nature of optimization algorithms. The average coverage obtained by each algorithm for all benchmarks is displayed in Fig. 4. The results demonstrate that the proposed approach has better average coverage for most benchmarks, except three, i.e., BubbleSort, Median, and Variance. The two approaches reached 100% average coverage for these three benchmark programs because satisfying the conditions of these programs is very easy. The Wilcoxon test in R [34] was conducted to statistically evaluate our experimental results. Table 5 presents the average coverage and average time along with the resulting p-values and effect sizes. The effect sizes of the comparisons are quantified with the Vargha-Delaney Â statistic. In the case of average coverage, Â_xy is an estimate of the probability that running approach x yields better coverage than running approach y. When the two approaches are equivalent, Â_xy = 0.5. The highest value, Â_xy = 1, means that in all of the runs of x we obtained higher coverage than in all of the runs of y. The results reveal significant improvements in the average coverage for 15 out of 23 benchmarks and significant improvement in the average time for 7 out of 23 benchmarks in comparison to the approach presented in [1]. The main cause of this improvement is partitioning the input domain based on the information that exists in the conditional statements. In other words, we created a relationship between the input domain and the structure of the program. This makes the search more intelligent; therefore, individuals converge to the test goal faster. Utilizing the logic of the program to trace pheromone values results in better exploitation. Furthermore, it leads to better exploration in the partition that has the highest pheromone value.
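For reference, the Vargha-Delaney Â statistic can be computed from two samples of coverage values by counting pairwise wins and ties. The sketch below uses this standard definition; the function name `a_statistic` and the sample values are made up for illustration and are not experimental data from this study.

```python
def a_statistic(xs, ys):
    """Vargha-Delaney effect size: P(X > Y) + 0.5 * P(X == Y)."""
    wins = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (wins + 0.5 * ties) / (len(xs) * len(ys))

# Illustrative values only: every run of x beats every run of y.
print(a_statistic([100, 100, 90], [80, 85, 70]))  # 1.0
# Identical samples: the two approaches are equivalent.
print(a_statistic([90, 80], [90, 80]))            # 0.5
```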
To explain more precisely, consider the following clauses and calculated partition values that are selected from program Synthesis-1 (this example is the same as the one presented in [26]). The obtained partition values for each input variable are: x={4, 20, 50, 150, 200}, y={4, 20, 50, 150}, and z={1, 100}. As a result, the input domains of x, y, and z are divided into 6, 5, and 3 parts, respectively. Their composition creates 3 × 5 × 6 = 90 partitions in the whole input domain. If we assume each predicate involves only one clause, choosing one input vector from each of these partitions will lead to one of the branches being traversed. Partitioning based on the program structure causes individuals to converge to the targets sooner than when we consider only irrelevant input variables. Even though the search space is partitioned by considering only one clause at a time (i.e., we do not consider a whole predicate), the partitions that lead to True or False for a predicate are occasionally created spontaneously. This leads to further improvements in the efficiency of the proposed approach.
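The partition count in this example can be reproduced with a short sketch that also locates the partition of a given input vector. The helper names are hypothetical, and assigning a border value to the upper interval (the behavior of `bisect_right`) is an assumption of this illustration.

```python
import bisect
from math import prod

# Partition values of the Synthesis-1 example.
partition_values = {
    "x": [4, 20, 50, 150, 200],
    "y": [4, 20, 50, 150],
    "z": [1, 100],
}

def parts_per_variable(values):
    """n partition values split the domain of a variable into n + 1 parts."""
    return {name: len(v) + 1 for name, v in values.items()}

def partition_index(values, vector):
    """Tuple of interval indices identifying the partition that contains `vector`."""
    return tuple(bisect.bisect_right(values[name], vector[name]) for name in values)

parts = parts_per_variable(partition_values)
print(parts)                  # {'x': 6, 'y': 5, 'z': 3}
print(prod(parts.values()))   # 90
print(partition_index(partition_values, {"x": 30, "y": 3, "z": 500}))  # (2, 0, 2)
```

A variable with an empty list of partition values contributes exactly one part, so it does not increase the number of partitions.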
To explain further, consider the predicate (a > b && a > 50). As shown in Fig. 5, the partition that leads to True for this predicate is created spontaneously just by dividing the input domain by "a > 50" and "a > b" separately. Most importantly, our approach implicitly benefits from the strength of the previous work [1]. An irrelevant input variable, by definition, is not used in any predicate of the target test paths. Since only the involved input variables are used for obtaining partition values in our approach, no partition value is found for irrelevant input variables. Consequently, the domain of an irrelevant input variable consists of only one part; hence, it does not matter which value is selected for irrelevant input variables. We performed mutation analysis to experimentally compare the failure detection capability of test suites generated by the proposed approach with that of test suites produced by the previous approach [1]. For this analysis, we used 19 of the 23 benchmarks presented in Table 3. Four benchmarks, PrintCalender, Number, Totinfo, and Mcknap, had been implemented in C and therefore could not be used in PIT, which is a Java-based tool. The statistical analysis of the results is shown in Table 6. In this table, the significance level is p-value ≤ 0.05. The results show that, with high statistical confidence, in 7 out of 19 programs the test suites generated by our approach have a higher mutation score and thus a better ability to detect failures.
In some benchmarks, such as Remainder, there is no significant difference between the mutation scores achieved by the two approaches, while the improvement of the mutation score on a program like Tcas is noticeable. Test data generation for more complicated programs, such as Tcas with 12 input variables, is likely to be more time-consuming. In these programs, fewer data from the input domain are desired, and therefore using static information to generate test data enhances the mutation score of the generated test suites.

5-Threats to Validity
Threats to internal validity might come from the way the empirical study was carried out. To reduce the probability of having faults in our implementation, it has been carefully tested. But it is well known that testing alone cannot prove the absence of defects. Furthermore, optimization algorithms have random behavior, and thus, are affected by chance. To cope with this problem, we repeated experiments 50 times. Then, we followed statistical procedures to analyze the results. As a threat to the external validity of our results, it should be noted that a different selection of the benchmark programs might result in different conclusions.

6-Conclusions and Future Work
In this paper, we have presented an approach to input space partitioning based on the program's conditional statements. We also customized the ACO algorithm with respect to the partitioned space. In the evaluation section, we have compared our approach with the irrelevant input variable removal method. The results revealed that our approach leads to better results with respect to average coverage. The following research areas will be considered as future work:
- Customizing other meta-heuristic algorithms based on predicate information
- Considering combinations of clauses to select better partition values
- Presenting a more comprehensive way to reduce the input domain so that it can be applied to all optimization algorithms