A New Game Theory-Based Algorithm for Target Coverage in Directional Sensor Networks

One of the challenging problems in directional sensor networks is maximizing target coverage while minimizing the amount of energy consumption. Considering the high redundancy in dense directional sensor networks, it is possible to preserve energy and enhance coverage quality by turning off redundant sensors and adjusting the direction of the active sensor nodes. In this paper, we address the problem of maximizing network lifetime with adjustable ranges (MNLAR) and propose a new game theory-based algorithm in which sensor nodes try to adjust their working direction and sensing range in a distributed manner to achieve the desired coverage. For this purpose, we formulate this problem as a multiplayer repeated game in which each sensor as a player tries to maximize its utility function which is designed to capture the tradeoff between target coverage and energy consumption. To achieve an efficient action profile, we present a distributed payoff-based learning algorithm. The performance of the proposed algorithm is evaluated via simulations and compared to some existing methods. The simulation results demonstrate the performance of the proposed algorithm and its superiority over previous approaches in terms of network lifetime.


1-Introduction
Directional sensor networks (DSNs) contain several directional sensors deployed densely and randomly to cover a set of targets. Applications of such networks have grown rapidly, and they are now widely used in industry and in daily life. In comparison with omni-directional sensor nodes, directional sensors have unique characteristics, such as angle of view, working direction, and line of sight; therefore, DSN applications require specific solutions for enhancing target coverage. The motility of a directional sensor node, that is, its ability to adjust its working direction, has a noticeable impact on coverage enhancement in randomly deployed DSNs, because it can be used to minimize overlapped regions. Because of the limited energy resources in these networks, providing the desired target coverage is a challenging problem [1,2,3].
In many applications, a large number of directional sensor nodes are randomly deployed in an area of interest. The availability of redundant sensors enhances the fault tolerance capability of the network. However, keeping all the sensor nodes active is not efficient because it leads to higher energy consumption. Therefore, one goal of this paper is to enhance the target coverage in a network of self-orienting sensor nodes. The other goal is to decrease power consumption and thereby increase the network lifetime.
Since we are interested in the automated self-orienting of the nodes, we cast this problem as a non-cooperative game, a well-established tool for modeling coordination problems. There are several reasons to choose game theory as a method for solving the coverage problem in sensor networks. First, game theory allows sensor nodes to operate independently and calculate their proper orientation in a distributed manner. Second, a well-designed gain function that takes into account all the limitations of the problem, including energy consumption, makes it possible to establish acceptable coverage in the sensor network, so game theory can provide scalable network coverage. Finally, the game-theoretic method is resistant to node failures and environmental disturbances [4].
In this paper, we model the coverage problem as a finite strategic game and propose a game theory-based algorithm (GT-based algorithm), in which the utility function is designed to capture the tradeoff between the worth of the covered area and the energy consumption due to sensing. An important issue is to devise distributed learning algorithms, using local information and processing abilities, to reach a Nash equilibrium (NE) of the game. To this end, we use a distributed payoff-based learning algorithm [5]. It has been proved that if all sensors (players) adhere to this algorithm, then each sensor selects the action profile that maximizes the total payoff of the sensors.
In most game theory-based algorithms, a challenging problem is achieving the Nash equilibrium using a distributed learning algorithm [4,6,7,8,9,10]. In [8,10,11], distributed learning algorithms are proposed to achieve an NE in coverage problems. The authors in [12,13] have studied distributed systems and proposed distributed learning algorithms to achieve an NE in potential games. However, there are two main drawbacks in this context. First, most of the results in this area have focused on converging to the NE, while in many cases it is not the optimum solution. Detecting this inefficiency is an extremely active research area in algorithmic game theory [14]. Second, it is often impossible to frame the interaction of a given system as a potential game [5].
We measure the performance of an action profile using the sum of the sensors' utility functions. Therefore, by designing the appropriate utility function for each player (sensor node) and applying a distributed payoff-based learning algorithm, coverage in the sensor network converges to a Pareto optimal action profile. The utility function is defined based on the tradeoff between coverage and energy consumption. Each sensor then learns how to adjust its working direction to maximize its utility function, which corresponds to finding its best orientation based on local information. The main contributions of this study are as follows:
• We formulate the maximum network lifetime with adjustable ranges (MNLAR) problem as a multiplayer repeated game in which each sensor, as a player, tries to maximize its utility function. The utility function is designed to capture the tradeoff between the worth of covered targets and the energy consumption due to sensing.
• We propose a distributed payoff-based learning algorithm that converges to an efficient action profile.
• The performance of the proposed algorithm is evaluated via simulations and compared to previous approaches.
The simulation results show that the proposed algorithm results in activating the minimum number of directional sensors. In addition, active sensors learn how to adjust their sensing ranges to maximize the coverage. These bring about less energy consumption along with network lifetime extension.
The paper is organized as follows: In section 2, we briefly review recent studies in the context of sensor coverage problems. In section 3, we introduce the MNLAR problem in DSNs. Section 4 presents the proposed method and its formulation based on game theory. In section 5, simulation results are presented through several experiments. Finally, we conclude the paper in section 6.

2-Related Work
In this section, we briefly review the research work on coverage in wireless sensor networks. The coverage problem is usually divided into three categories: area coverage, point coverage, and barrier coverage [2]. The purpose of area coverage is to cover the whole area. Point coverage is the problem of covering Points of Interest (PoI) in the area. The barrier coverage guarantees that every movement that crosses a barrier of sensors will be detected.
Habibi et al. [15] have proposed a distributed Voronoi-based strategy to maximize the sensing coverage in a mobile sensor network. In this algorithm, each sensor moves through a gradient-based nonlinear optimization approach and is placed inside its Voronoi cell.
Ai et al. [16] have studied the problem of covering targets with directional sensors. They have formulated the problem as an optimization problem to maximize the coverage with a minimum number of sensors and proved that it is NP-complete. They have proposed several greedy heuristic methods to solve the problem. Mohamadi et al. [17] have proposed two greedy-based algorithms for target coverage in directional sensor networks with adjustable sensing ranges. To maximize the network lifetime, they have used both scheduling and sensing range adjustment to form cover sets that cover all targets in the network.
In [18], the authors have provided a GA-based algorithm to find cover sets of directional sensors with appropriate sensing ranges in order to solve the MNLAR (Maximum Network Lifetime with Adjustable Ranges) problem. Yu et al. [19] have addressed the problem of providing K-coverage along with prolonging the network lifetime in wireless sensor networks with both centralized and distributed protocols. They have introduced a new concept of Coverage Contribution Area (CCA). Based on this concept, a lower sensor spatial density is provided.
The authors in [20] have designed a probabilistic coverage preservation protocol (CPP) to achieve energy efficiency and ensure a certain coverage rate. The purpose of the proposed protocol is to select the minimum number of probabilistic sensors to reduce energy consumption. A graph model named Cover Adjacent Net (CA-Net) is proposed by Weng et al. [21] to simplify the problem of k-barrier coverage while reducing the complexity of computation. Based on the developed CA-Net, two distributed algorithms, called BCA and TOBA, are presented for energy balance and maximum network lifetime.
Mostafaei et al. [11] have proposed a distributed boundary surveillance (DBS) algorithm to cover the boundary and reduce the energy consumption of sensors. DBS selects the minimum number of sensors to increase the network lifetime using learning automata.
Li et al. [22] have proposed the Voronoi-based distribution approximation (VDA) algorithm. To maximize the coverage of the desired area, this algorithm covers as many Voronoi edges as possible. In [23], the authors have proposed the distributed Voronoi-based self-redeployment algorithm (DVSA), which aims to improve the overall field coverage of mobile directional sensor networks. It utilizes the geometrical features of the Voronoi diagram and the advantages of a distributed algorithm.
Recently, game-theoretic approaches have been taken into consideration to solve coverage problems in WSNs [4,6,24,25]. In [26], the authors have proposed an algorithm based on game theory for the problem of maximizing coverage and reducing energy consumption. They have shown that the desired solution in this model is an NE strategy profile.
In [27], the authors have proposed a distributed learning method to maximize the area coverage in mobile directional sensor networks. Each sensor in collaboration with its neighbors tries to determine its best position and orientation.
The authors in [28] have considered the problem of area coverage without location information in mobile sensor networks. They have modeled this problem as a potential game and proposed a distributed learning algorithm to achieve an NE.
In [29] coverage of an unknown environment is investigated by robots. The state-based potential game was designed to control the robots' actions. The reward of sensing the areas and the penalty of energy consumption due to the sensors' movement are considered in the utility function. The sensors update their action profile using the Binary Log-Linear Learning (BLLL) in which the sensors must know an estimation of the outcome of their future actions. Hence, an estimation algorithm is proposed to assist the sensors in predicting the probability of targets in unknown areas. An improved EM algorithm is introduced to estimate the number of targets and other probability distribution parameters. In this study, we propose a game theory-based algorithm to optimally cover targets and reduce energy consumption.

3-Problem Definition
We consider a two-dimensional mission space in which n directional sensors with motility capability and a set T of m target points are initially located within a given area. We define several power levels p and a set of working directions D for the sensor nodes, so each sensor $s_i$ has two parameters: its working direction and its power level. Sensor nodes can rotate around their axis and adjust their power level to cover a sector area; a directional sensor $s_i$ therefore monitors all targets within its sensing range and field of view. Each sensor has a limited energy resource. The amount of energy consumption is a function of the sensing range: the greater the sensing range, the more energy is consumed [30]. All the sensors in the network have the same characteristics in terms of initial battery power and energy consumption function. Since increasing the power level is equivalent to increasing the sensing range, which results in covering more targets, for each sensor direction $d_{i,j}$ and each power level $p > 1$ we have $T(d_{i,j}, p-1) \subseteq T(d_{i,j}, p)$, in which $(d_{i,j}, p)$ is the i-th sensor with its j-th direction activated at power level p and $T(d_{i,j}, p)$ is the set of targets it covers.
$T(s_i)$ is the set of targets covered by sensor $s_i$. The power level p that is sufficient to cover target $t_j$ by sensor $(d_{i,k}, p)$ is minimal if $t_j \in T(d_{i,k}, p)$ and $t_j \notin T(d_{i,k}, p-1)$. It means that target $t_j$ cannot be covered at a power level less than p. The notations used in this paper are listed in Table 1.
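The coverage model described above, in which a sensor monitors the targets inside a sector defined by its working direction and sensing range, can be illustrated with a minimal sketch. The field-of-view angle `fov`, the convention that the sector is centred on the working direction, and all function names are assumptions introduced for illustration; they are not specified in the text.

```python
import math

def covers(sensor_xy, direction, fov, sensing_range, target_xy):
    """Return True if the target lies inside the sensor's sector.

    direction and fov are in radians. The sector is assumed to be centred
    on `direction` and to span fov / 2 on either side of it.
    """
    dx = target_xy[0] - sensor_xy[0]
    dy = target_xy[1] - sensor_xy[1]
    dist = math.hypot(dx, dy)
    if dist > sensing_range:
        return False
    if dist == 0.0:
        return True  # target located exactly at the sensor position
    bearing = math.atan2(dy, dx)
    # Signed angular difference wrapped into [-pi, pi).
    diff = (bearing - direction + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= fov / 2

def covered_targets(sensor_xy, direction, fov, sensing_range, targets):
    """Indices of the targets covered for one (direction, range) choice."""
    return {k for k, t in enumerate(targets)
            if covers(sensor_xy, direction, fov, sensing_range, t)}
```

Under this model, enlarging the sensing range while keeping the direction fixed can only add targets to the covered set, which is the monotonicity property stated above.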
We assume the sensors are non-rechargeable. According to [17], the energy consumed in switching between the directions of a sensor is negligible, so it is ignored. A positive parameter $\Delta_p$ is defined for each power level p [30]. The parameter $\Delta_p$ indicates the ratio between the battery consumption at level p and at level 1. Level 1 is the lowest and cheapest level ($\Delta_1 = 1$). For example, if $\Delta_p = 2$, then the battery consumption at level p is twice that of level 1.
Problem. How can the available sensors be divided into cover sets so that each cover set covers all the targets in the area of interest and the network lifetime is prolonged as much as possible? In other words, the main challenge is to assign the appropriate working direction and sensing range to each sensor in such a way that full target coverage and maximum network lifetime are achieved.
For a better understanding, consider Fig. 1.A, which shows a scenario with three targets and four directional sensors that monitor the targets in the network. Fig. 1.B shows that each sensor has three directions $d_{i,j}$ ($1 \le j \le 3$) and two sensing ranges. For each target $t_j$, consider the set of sensor nodes, with their best parameters, that cover it. The purpose is to form the best cover set. Among the possible cover sets in this example, the cover set $C_4$ is the most desirable because it consumes the least energy.

4-Proposed GT-Based Algorithm
In this section, we propose a game theory-based algorithm (GT-based algorithm) for target coverage. The new method is a solution to the MNLAR problem in DSNs. It converges to an efficient action profile using a distributed payoff-based learning algorithm. The output of the proposed algorithm is a cover set containing sensors with appropriate sensing ranges and working directions that can monitor all targets within the network. The activation time of the constructed cover set is determined by the sensor in the set that can remain active for the shortest time. Then the residual energy of the sensors in the cover set is calculated, and the sensors that have no residual energy are removed from the list of available sensors. The GT-based algorithm repeats this process as long as the available sensors can fully cover all targets. Finally, the sum of the activation times is returned as the network lifetime.
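The lifetime accounting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a sensor working at power level p drains its battery at rate Δ_p per unit time, so that a cover set can stay active for the minimum of residual energy divided by Δ_p over its sensors. The function names and the `build_cover_set` callback, which stands in for the game-theoretic construction of a cover set, are hypothetical.

```python
def activate_cover_set(residual, cover_set, delta):
    """Activate one cover set until its weakest sensor is exhausted.

    residual:  dict sensor id -> remaining energy (updated in place)
    cover_set: dict sensor id -> power level used in this cover set
    delta:     dict power level -> consumption rate relative to level 1
    Returns the activation time of the cover set.
    """
    t = min(residual[s] / delta[p] for s, p in cover_set.items())
    for s, p in cover_set.items():
        residual[s] -= t * delta[p]
    return t

def network_lifetime(residual, build_cover_set, delta, eps=1e-12):
    """Sum activation times until no full cover set can be built.

    build_cover_set(available) is assumed to return a cover set over the
    available sensors, or None when the targets can no longer be covered.
    """
    lifetime = 0.0
    while True:
        available = {s for s, e in residual.items() if e > eps}
        cover_set = build_cover_set(available)
        if cover_set is None:
            return lifetime
        lifetime += activate_cover_set(residual, cover_set, delta)
```

For instance, with one unit of initial energy per sensor and Δ_2 = 2, a cover set that uses one sensor at level 2 and another at level 1 stays active for 0.5 time units and leaves the second sensor with 0.5 units of energy.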

4-1-Background in Game Theory
In this section, we briefly review the game theory concepts used in this paper. More information about game theory and learning in games can be found in [31, 32]. The strategic game $G = \langle N, A, U \rangle$ has three components:
• The finite set of players (sensors) $N = \{1, \dots, n\}$.
• An action set $A = A_1 \times \dots \times A_n$, where $A_i$ is the finite action set of player i.
• The set of utility (payoff) functions $U = \{U_1, \dots, U_n\}$, where the utility function $U_i : A \to \mathbb{R}$ models the benefit of player i over action profiles.
For an action profile $a = (a_1, \dots, a_n) \in A$, $a_{-i}$ denotes the action profile of all players other than player i. Therefore, the action profile a can be represented as $(a_i, a_{-i})$. The welfare of an action profile is defined as follows:
$W(a) = \sum_{i \in N} U_i(a)$ (1)
If an action profile maximizes the welfare, then it is efficient. In other words:
$a^* \in \arg\max_{a \in A} W(a)$ (2)

4-2-Game Formulation
Suppose m targets are located at known positions in the area. A set of directional sensors with adjustable sensing ranges is deployed adjacent to the targets to completely monitor them. Sensors are static, with variable sensing ranges between $r_{min}$ and $r_{max}$. We assume that the communication range of each sensor i, $R_i$, is at least twice $r_{max}$ ($R_i \ge 2 r_{max}$). Thus, each sensor can transmit state information to its neighbors and interact with them. The worth of each target $t_k$ is denoted by $w_k$. Each sensor $s_i$ selects its mode from a predefined set of modes. The sensor's direction is denoted by $d_i$ and is chosen from the set D of defined working directions of the sensor nodes. Each sensor $s_i$ chooses its sensing range $r_i$ from a discrete set of ranges between $r_{min}$ and $r_{max}$. The action of each sensor (player) $s_i$ is represented by a vector $a_i$ that collects these choices. As mentioned before, $T(d_{i,j}, p)$ is the set of targets covered by sensor direction $d_{i,j}$ while its power level is p; we write $T(a_i)$ for the set of targets covered by sensor $s_i$ under action $a_i$. For each target $t_k$, $n_k(a)$ represents the number of sensors that can observe the target point k and is defined as follows:
$n_k(a) = |\{ s_i : t_k \in T(a_i) \}|$ (3)
The profit of observing the target point $t_k$, which is $w_k$, is evenly divided among the sensors that observe $t_k$. So, the utility that sensor $s_i$ obtains due to sensing is equal to
$\sum_{t_k \in T(a_i)} \frac{w_k}{n_k(a)}$ (4)
Due to energy constraints, we include an energy consumption term in the utility function. We assume that the energy consumption of sensor nodes is caused by their sensing activity. So, the energy consumption of each sensor node depends on its sensing range; it is defined in (5) and written $E_i(r_i)$ below. The definition in (5) includes a weighting factor related to energy consumption. Therefore, the utility function of the sensor $s_i$ represents its contribution to the coverage task and its energy consumption due to sensing. We consider the utility function for sensor $s_i$ as follows:
$U_i(a) = \sum_{t_k \in T(a_i)} \frac{w_k}{n_k(a)} - E_i(r_i)$ (6)
In the following, we present a new distributed learning algorithm that leads to Pareto optimal outcomes.
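The utility described above, in which the worth of each covered target is split evenly among the sensors that observe it and the sensing energy is subtracted, can be sketched as follows, together with the welfare as the sum of utilities. This is an illustrative sketch only: actions are encoded as integer indices, the energy cost is passed in as a lookup table because its exact form is not reproduced here, and all names are hypothetical. The brute-force search is meant for tiny examples and is not the distributed learning algorithm.

```python
from itertools import product

def utility(i, profile, covered, worth, energy):
    """Utility of sensor i under an action profile.

    profile:       tuple with one action index per sensor
    covered[j][a]: set of targets sensor j covers when playing action a
    worth[k]:      worth of target k
    energy[j][a]:  assumed energy cost of action a for sensor j
    """
    gain = 0.0
    for k in covered[i][profile[i]]:
        # Number of sensors observing target k under this profile.
        n_k = sum(1 for j, a in enumerate(profile) if k in covered[j][a])
        gain += worth[k] / n_k
    return gain - energy[i][profile[i]]

def welfare(profile, covered, worth, energy):
    """Sum of all sensors' utilities."""
    return sum(utility(i, profile, covered, worth, energy)
               for i in range(len(profile)))

def efficient_profile(covered, worth, energy):
    """Exhaustively search for a welfare-maximizing profile."""
    action_sets = [range(len(c)) for c in covered]
    return max(product(*action_sets),
               key=lambda p: welfare(p, covered, worth, energy))
```

Because each covered target's worth is shared in full among its observers, the welfare under this utility equals the total worth of the covered targets minus the total sensing energy, so redundant coverage lowers the welfare without adding any worth.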

4-3-Payoff-Based Learning Algorithm
The game G is repeated at each time $t \in \{0, 1, 2, \dots\}$. At time t, the sensors simultaneously select their actions, so the action profile is $a(t) = (a_1(t), \dots, a_n(t))$ and each sensor $s_i$ receives the utility $U_i(a(t))$. The sensor selects the action $a_i(t)$ according to the probability distribution $p_i(t) \in \Delta(A_i)$.
$p_i(t)$ represents the strategy of sensor $s_i$ at time t.
$p_i^{a_i}(t)$ indicates the probability that the sensor selects action $a_i$ at time t according to the strategy $p_i(t)$. The sensor's strategy at time t depends on observations at previous times $\{0, 1, \dots, t-1\}$. The strategies of the sensors are updated using the information they have gathered. The sensors here have limited observations, and they must nevertheless learn to play an action profile that maximizes the welfare. In this case, the sensors only have access to the actions they played and the utilities they received. Therefore, the strategy adjustment mechanism of sensor $s_i$ is as follows:
$p_i(t) = F_i(\{a_i(\tau), U_i(a(\tau))\}_{\tau = 0, \dots, t-1})$
Such an algorithm is called payoff-based or completely uncoupled [33]. It is proved in [5,34,35,36,37] that for finite strategic games, there are completely uncoupled learning rules that lead to Pareto optimal Nash equilibria. We use the learning rule presented in [5]. This distributed learning rule leads to the convergence of the game to a Pareto optimal action profile, which maximizes the welfare.
Each sensor has a baseline action and a baseline utility, which it expects to play and receive, respectively. Each sensor also has an internal state variable called mood, which defines the sensor's behavior. There are two distinct moods: content and discontent. When a sensor is content, the baseline action is selected with high probability. When a sensor is discontent, an action that differs from the baseline action is selected with high probability. Each sensor updates its mood after choosing an action and receiving a payoff by comparing the action played and the payoff received with its baseline action and baseline payoff. At each time step, the sensor's state is represented by a triple $[\bar{a}_i, \bar{u}_i, m_i]$, where
• $\bar{a}_i$ is the baseline action.
• $\bar{u}_i$ is the baseline utility.
• $m_i$ is the mood of sensor i, which can be content (C) or discontent (D).
The main steps of the distributed learning algorithm for Pareto optimality are described as follows:
Step 1-Initialization:
• At stage $t = 0$, each player randomly selects and plays any action, $a_i(0) \in A_i$.
• This action is initially set as the player's baseline action at stage 1, $\bar{a}_i(1) = a_i(0)$.
• Likewise, the player's baseline utility at stage 1 is initialized as $\bar{u}_i(1) = U_i(a(0))$.
• The player's mood at stage 1 is set as $m_i(1) = C$.
Step 2-Action Selection: At each subsequent stage $t \ge 1$, each player selects its action according to the following rules.
• If the mood of sensor i is content, i.e., $m_i(t) = C$, the sensor chooses an action according to the probability distribution $p_i^{a_i}(t) = \epsilon^c / (|A_i| - 1)$ for $a_i \ne \bar{a}_i(t)$ and $p_i^{a_i}(t) = 1 - \epsilon^c$ for $a_i = \bar{a}_i(t)$, where $\epsilon > 0$ is the exploration rate, $|A_i|$ represents the cardinality of the set $A_i$, and c is a constant that satisfies $c \ge n$.
• If the mood of sensor i is discontent, i.e., $m_i(t) = D$, the sensor chooses an action according to the probability distribution $p_i^{a_i}(t) = 1 / |A_i|$ for every $a_i \in A_i$. Note that the baseline action and utility play no role in the sensor dynamics when the sensor is discontent.
Step 3-Baseline Action, Baseline Utility, and Mood Update: Each sensor i sends its status information, including its current action, to its neighboring nodes, that is, the nodes within twice the sensing range of node i. Then each sensor i computes the payoff $U_i(a(t))$ based on the data collected from its neighbors. The state is updated according to the following rules.
• First, the baseline action and baseline utility at stage t+1 are set as $\bar{a}_i(t+1) = a_i(t)$ and $\bar{u}_i(t+1) = U_i(a(t))$.
• The mood of sensor i is then updated according to the rule of [5], by comparing the action played and the payoff received with the previous baseline action and baseline utility.
Step 4-Return to Step 2 and repeat.
The learning algorithm produces a sequence of action profiles $a(0), a(1), \dots$, in which the behavior of a sensor i at each time t depends on the baseline action $\bar{a}_i(t)$, the baseline utility $\bar{u}_i(t)$, and the mood $m_i(t)$. For the game to converge to an efficient action profile, the game's structure must be interdependent [35]. Interdependence is defined as follows.
Definition 1 (Interdependence): A finite game G is interdependent if, for any action profile $a \in A$ and any proper subset of the sensors $J \subset N$, there is a sensor $i \notin J$ and a selection of actions $a'_J \in \prod_{j \in J} A_j$ such that $U_i(a'_J, a_{-J}) \ne U_i(a_J, a_{-J})$.
In general, the interdependence condition states that the sensors cannot be divided into two distinct subsets that do not interact with one another. For this reason, we assume that the sensors are deployed in the area so that they cannot be divided into two such subsets, and the condition of interdependence holds in the game.
Theorem 1: Consider a finite interdependent game with n players. Under the distributed learning algorithm for Pareto optimality defined above, a state $z = [\bar{a}, \bar{u}, m]$ is stable if and only if the following conditions are met.
• The action profile $\bar{a}$ optimizes the welfare $W(\bar{a}) = \sum_{i \in N} U_i(\bar{a})$.
• The baseline actions and payoffs are aligned, i.e., for each sensor i, $\bar{u}_i = U_i(\bar{a})$.
• All sensors are in the content mood, i.e., for each sensor i, $m_i = C$.
The proof of the theorem relies on the theory of resistance trees for Markov decision processes and is given in [38].
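Steps 1 to 4 above can be sketched as a short simulation loop. This is a minimal sketch under stated assumptions, not the authors' implementation. The mood update used here (an unchanged content sensor stays content; otherwise the sensor becomes content with probability ε^(1−u) and discontent otherwise) and the requirement that payoffs lie in [0, 1) follow the rule of [5] as it is usually stated, and should be checked against that reference. The initial mood is assumed to be content. The `payoffs` callback, which stands in for the neighbor exchange and utility computation, and all other names are hypothetical.

```python
import random

def select_action(state, n_actions, eps, c, rng):
    """Step 2: choose an action from the state (baseline, utility, mood)."""
    base_action, _, mood = state
    if mood == "D":
        return rng.randrange(n_actions)  # discontent: uniform over actions
    if n_actions > 1 and rng.random() < eps ** c:
        # Content: explore a non-baseline action with total probability eps^c.
        others = [a for a in range(n_actions) if a != base_action]
        return rng.choice(others)
    return base_action

def update_state(state, action, payoff, eps, rng):
    """Step 3: update baseline action, baseline utility, and mood."""
    base_action, base_payoff, mood = state
    if mood == "C" and action == base_action and payoff == base_payoff:
        return state  # nothing changed: stay content
    # Assumed rule from [5]: higher payoffs make contentment more likely.
    new_mood = "C" if rng.random() < eps ** (1.0 - payoff) else "D"
    return (action, payoff, new_mood)

def learn(payoffs, n_actions, steps, eps, c, seed=0):
    """Run the loop; payoffs(profile) returns one payoff in [0, 1) per sensor."""
    rng = random.Random(seed)
    profile = [rng.randrange(k) for k in n_actions]      # Step 1
    u = payoffs(profile)
    states = [(a, ui, "C") for a, ui in zip(profile, u)]
    for _ in range(steps):                               # Step 4: repeat
        profile = [select_action(s, k, eps, c, rng)
                   for s, k in zip(states, n_actions)]
        u = payoffs(profile)
        states = [update_state(s, a, ui, eps, rng)
                  for s, a, ui in zip(states, profile, u)]
    return states
```

In this sketch a content sensor almost always repeats its baseline action, so a profile in which every sensor is content and receives its baseline payoff is left only through rare exploration, which is the intuition behind the stability conditions of Theorem 1.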

5-Simulation Results
In this section, we evaluate the performance of the proposed algorithm through several experiments. The algorithms are simulated in MATLAB R2017b on a system with a 3.4 GHz Intel Core i7 processor and 4 GB of RAM. The most important criterion for evaluating the performance of the algorithm is the network lifetime, defined as the time during which the sensor nodes can cover all of the targets. Each experiment examines the impact of a different parameter on the network lifetime. To model a DSN, m targets and n directional sensors are deployed randomly and uniformly in the area. Each sensor has several working directions and sensing ranges. For each sensor node, only one working direction and one sensing range can be active at each unit of time. By default, the numbers of sensors and targets are 100 and 10, respectively, and each sensor has 3 working directions. The sensing range of each sensor can be adjusted from 80 m to 120 m in steps of 10 m. We assume that each sensor initially has one unit of energy. To establish the interdependence condition, targets that are not monitored by any sensor direction are ignored, and all sensors that cannot cover any target are removed from the list of sensors. The simulation parameter of the learning algorithm is set following [38], according to which employing an annealing schedule guarantees convergence to an efficient action profile.
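The preprocessing step described above, which drops unreachable targets and useless sensors before the game is played, can be sketched as follows. The data layout, in which each sensor is mapped to the set of targets it can cover under any of its directions and ranges, and the names `prune` and `SENSING_RANGES` are assumptions made for illustration.

```python
# Default sensing ranges from the setup: 80 m to 120 m in steps of 10 m.
SENSING_RANGES = list(range(80, 121, 10))

def prune(cover_map, n_targets):
    """Drop unreachable targets and sensors that cover nothing.

    cover_map: dict sensor id -> set of target indices the sensor can cover
               under at least one direction and sensing range.
    Returns (kept target indices, kept sensors with their reachable targets).
    """
    reachable = set().union(*cover_map.values())
    kept_targets = [k for k in range(n_targets) if k in reachable]
    kept_sensors = {i: ts for i, ts in cover_map.items() if ts}
    return kept_targets, kept_sensors
```

After pruning, every remaining target can be observed by at least one sensor and every remaining sensor can observe at least one target.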

Experiment 1.
This experiment is performed to provide an intuitive example of the operation of the proposed algorithm. Five targets (m=5) and 20 directional sensors (n=20) are randomly deployed in the area. Fig. 2 shows the initial configuration of the network, and Fig. 3 shows the final configuration of the network after the learning iterations. According to Fig. 3, in addition to full coverage of the targets, energy consumption in the area has decreased due to the adjustment of the working direction and sensing range of all sensor nodes. In this figure, all targets are covered by the minimum number of sensors with the least overlapping area (each target lies in exactly one sector). Consequently, the result is a Pareto optimal action profile. Fig. 4 presents the evolution of the welfare under the annealing schedule. The result shows the convergence of the welfare function to its maximum value. The reason is that energy consumption due to sensing is considered in the utility function.
Experiment 2.
This experiment is designed to evaluate the relationship between the number of directional sensors and the lifetime of the network. To this end, we have considered sensor networks with 60-100 sensor nodes. Fig. 5 shows that increasing the number of sensor nodes extends the network lifetime. The reason is that each target is covered by more sensors, so more cover sets are constructed and the network lifetime increases. Simulation results demonstrate that as the number of sensors increases, the proposed GT-based algorithm performs better than the Genetic-based algorithm [18] and the Greedy-based algorithm [17]. This is due to the iterative property of learning algorithms and the more efficient management of energy consumption in the proposed game.
Experiment 3.
This experiment is performed to determine how the number of targets affects the network lifetime. For this purpose, we have considered an area of interest with 6-14 target points. According to Fig. 6, the network lifetime decreases when the number of targets increases. The reason is that as the number of targets increases, more sensors are needed to monitor them. This causes the sensors to run out of energy earlier, which reduces the network lifetime. Fig. 7 shows the energy consumption of the cover sets in the Greedy-based, Genetic-based, and GT-based algorithms as a function of the number of targets. As expected, an increase in the number of targets increases energy consumption, because more sensors are needed to cover more targets. The results confirm that the GT-based algorithm consumes less energy than the other two algorithms: since the proposed algorithm activates fewer sensor nodes, its energy consumption due to sensing is lower than that of the Genetic-based and Greedy-based algorithms.
Experiment 4.
This experiment is performed to investigate the effect of the sensing range on the network lifetime. The sensing range is varied between 80 and 120 in steps of 10. According to the results presented in Fig. 8, an increase in the sensing range leads to an improvement in the network lifetime. This is because increasing the sensing range results in covering more targets, so fewer sensors are needed to cover all targets. In comparison with the other two algorithms, the experimental results confirm that the proposed algorithm is more successful in maximizing the network lifetime.
Fig. 2. The initial configuration of the network, where "•" and "+" are sensor nodes and targets, respectively.

6-Conclusion
In this paper, we presented a new game theory-based algorithm for target coverage in networks containing sensors with multiple directions and sensing ranges, with the aim of extending the network lifetime. Due to the energy limitations in sensor networks, we formulated the target coverage problem as a finite strategic game in which the utility function captures the tradeoff between energy consumption and coverage quality. To achieve an efficient action profile, we presented a distributed payoff-based learning algorithm. The performance of the proposed algorithm was evaluated via simulations and compared to the greedy-based and genetic-based algorithms. The simulation results demonstrated the performance of the proposed algorithm and its superiority over previous approaches in terms of network lifetime.