Data Collection in Multi-Application Sharing Wireless Sensor Networks

Data Collection in Multi-Application SharingWireless Sensor NetworksHong Gao, Xiaolin Fang, Jianzhong Li, and Yingshu LiAbstract—Data sharing for data collection among multiple applications is an efficient way to reduce communication cost forWirelessSensor Networks (WSNs). This paper is the first work to introduce the interval data sharing problem which is to investigate howto transmit as less data as possible over the network, and meanwhile the transmitted data satisfies the requirements of all theapplications. Different from current studies where each application requires a single data sampling during each task, we studythe problem where each application requires a continuous interval of data sampling in each task. The proposed problem is anonlinear nonconvex optimization problem. In order to lower the high complexity for solving a nonlinear nonconvex optimizationproblem in resource restricted WSNs, a 2-factor approximation algorithm whose time complexity is Oðn2Þ and memory complexityis OðnÞ is provided. A special instance of this problem is also analyzed. This special instance can be solved with a dynamicprogramming algorithm in polynomial time, which gives an optimal result in Oðn2Þ time complexity and OðnÞ memory complexity.Three online algorithms are provided to process the continually coming tasks. Both the theoretical analysis and simulation resultsdemonstrate the effectiveness of the proposed algorithms.Index Terms—Data collection, data sharing, multi-application, wireless sensor networkÇ1 INTRODUCTIONWSN deployment is a difficult and time-consumingwork which requires much manpower or mechanicalpower. Once a network is deployed, it is expected to run fora long time without any human interruption. Therefore, itis inefficient to carry out only one application in a network.Sharing a network for multiple applications can significantlyimprove network utilization efficiency [1], [2], [3],[4], [20], [21]. Currently, it is popular for multiple applicationsto share a WSN. Each node in a network samples ata particular frequency and the sampled data is transmittedto the base station through multi-hops. All the applicationsprefer to receive all the sampled data. However, if all thesampled data is transmitted to the base station, thecommunication cost is high and network lifetime will bereduced. Fortunately, there may be some applicationsmonitoring the same physical attributes. In this case, acertain amount of data may not need to be repeatedlytransmitted back to the base station.Under the abovementioned scenario, carefully designeddata sharing algorithms are desired. Tavakoli et al. [5]proposed a data sampling algorithm for each node, suchthat the sampled data can be shared by as many applicationsas possible. Meanwhile, the amount of sampled data ateach node is reduced to a maximum level, reducing theoverall communication cost. In [5], each application consistsof a set of tasks. In each task, each node samples data once.As shown in Fig. 1, there are two applications running onthis node. Task T1 is for the first application, and Task T2 isfor the second one. T1 and T2 may overlap on the time axis,and both of themneed to sample data once.Anaivemethodis to sample data independently, e.g., s1 is sampled by T1and s2 is sampled by T2 as shown in Fig. 1a, resulting in twopieces of data s1 and s2. In [5], the authors designed a greedyalgorithm such that only one data sampling can serve bothapplications as shown in Fig. 1b.In many applications, data needs to be sampled for acontinuous interval as shown in Fig. 2, instead of samplingat a particular time point. For example, railway monitoringsystems collecting acoustic information [6], [7] need tosample data for a continuous interval. Volcanic andearthquake monitoring systems [8], [9], [10] also havesuch a requirement to measure vibrations. Habitat monitoringsystems for microclimate, plant physiology andanimal behavior [11], [12], [13] need to record wind speedand take video of animal behaviors, which again require tosample data for a continuous interval.This paper studies the interval data sharing problem of howto reduce the overall length of data sampling intervalswhich could be shared by multiple applications.We assumethere are multiple applications running on a same node,and each application consists of tasks. Each task requires tosample data for a continuous interval. In Fig. 2, T1 is for thefirst application, and T2 is for the second one. Both tasksneed to continuously sample data for an interval s. If twotasks sample data independently, two intervals of datawithlength s need to be sampled as shown in Fig. 2a. However,one interval of data with length s is enough if the startingpoints of data sampling of these two applications can beintelligently arranged. The data sampling interval lengthsfor different applications may be different, and for the same. H. Gao, X. Fang, and J. Li are with the Department of Computer Scienceand Technology, Harbin Institute of Technology, Harbin 150001, China.E-mail: {honggao, xlforu, lijzh}@hit.edu.cn.. Y. Li is with the Department of Computer Science and Technology,Harbin Institute of Technology, Harbin 150001, China, and also with theDepartment of Computer Science, Georgia State University, Atlanta, GA30303 USA. E-mail: yili@gsu.edu.Manuscript received 12 Aug. 2013; revised 3 Nov. 2013; accepted 4 Nov.2013. Date of publication 19 Nov. 2013; date of current version 9 Jan. 2015.Recommended for acceptance by A. Nayak.For information on obtaining reprints of this article, please send e-mail to:reprints@ieee.org, and reference the Digital Object Identifier below.Digital Object Identifier no. 10.1109/TPDS.2013.2891045-9219 _ 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2015 403application, tasks may have different data samplinginterval lengths. The investigated problem in this paper isto minimize the overall data sampling interval length ateach node while satisfying all the applications’ needs.We formulate the aforementioned problem as a nonlinearnonconvex optimization problem. Since sensor nodesare resource constrained, the cost to solve such a problemat each node is very high. Therefore, we propose a 2-factorgreedy algorithm with time complexity Oðn2Þ and memorycomplexity OðnÞ.We also consider a special instance wherethe data sampling interval lengths of all the tasks are thesame. The special instance could be solved with a dynamicprogramming algorithm in polynomial time, whose timecomplexity is Oðn2Þ and memory complexity is OðnÞ. Thecontributions of this paper are as follows.. This is the first work to study the interval datasharing problem, where each node samples data fora continuous interval instead of for a discrete datapoint. This problem is formulated as a nonlinearnonconvex optimization problem.. A greedy approximation algorithm is proposed tosolve the problem so as to reduce the cost of solvingthe nonlinear nonconvex optimization problem atresource restricted sensor nodes. The proposedalgorithm is proved to be a 2-factor approximationalgorithm. The time complexity of this algorithm isOðn2Þ, and the memory complexity is OðnÞ.. We analyze a special instance of the interval datasharing problem. We give a dynamic programmingalgorithm which gives an optimal result in polynomialtime. The time complexity is Oðn2Þ and thememory complexity is OðnÞ.. Three online algorithms are proposed to process thetasks one by one.. Extensive simulations were conducted to validatethe correctness and effectiveness of our algorithms.The rest of this paper is organized as follows. Section 2reviews the related works. Section 3 formally defines theinterval data sharing problem. Section 4 gives an algorithmto solve the problem and the approximation ratio isanalyzed. A special instance is investigated in Section 5. Adynamic programming algorithm is also presented in thissection to address the special instance. Section 6 proposesthree online algorithms. The performance evaluations areshown in Section 7 and Section 8 concludes this paper.2 RELATED WORKSOur problem is inspired by the work in [5], which studiesthe problem of data sharing among multiple applications.It assumes each application only needs discrete data pointsamplings. While in our problem, the applications mayrequire a continuous interval of data. The proposedsolution in [5] cannot be applied to our problem. However,our solution can solve their problem.Our problem is a novel one inWSNs. It tries to collect aslittle data as possible. Query optimization inWSNs [2], [14]tries to find in-network schemes or distributed algorithmsto reduce communication cost for aggregation queries. Ourwork focuses on reducing the amount of transmitted datafor each node.Multi-query optimization in database systems studieshow to efficiently process queries with common subexpressions[15], [16]. It aims at exploiting the commonsub-expression of SQLs to reduce query cost, while ourproblem aims at reducing data volume.Krishnamurthy et al. [17] considered the problem of datasharing in data streaming systems for aggregate queries.They studied the min, max, sum; and count-like aggregationqueries. A stream is scanned at least once and ischopped into slices. Only the slices that overlap amongmultiple queries could be shared. Their studied problemsare different fromours. We expect to reduce the number ofsensor samplings at each individual node resulting in lesscommunication cost. Our problem differs in that we wantto provide each application enough sampled data whileminimizing the total number of sampling times.3 PROBLEM DEFINITIONIn order to make our problem clear, we first introduce anexample as shown in Fig. 3. We have two applications, andeach application consists of many tasks. Application A1requires an interval of data of length l1 during each taskduration, and A2 requires an interval of data of length l2during each task duration. The task duration lengths of A1and A2 are different as shown in Fig. 3. Application A1consists of tasks T11; T12; . . . ; T1i, and so on. Application A2includes tasks T21; T22; . . . ; T2j, and so on. Take tasks T11,T12, T13, T21, and T22 as examples. The optimal solution isshown in the bottom part of Fig. 3. Tasks T11, T12 and T13pick the intervals I11, I12, and I13 respectively. The intervalsI11, I12, and I13 are all of length l1. Tasks T21 and T22 pick theintervals I21 and I22 respectively. The intervals I21 and I22are both of length l2. The optimal solution gives a result oflength s1 þ s2 in this example, as shown in the bottom partof Fig. 3, where the tasks are sorted according an ascendingorder of the ending time of the tasks.Fig. 2. Data sampling for a continuous interval. (a) Independentsampling. (b) Greedy sampling.Fig. 1. Data sampling at a time point. (a) Independent sampling.(b) Greedy sampling.404 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2015Data collected during the overlapped sampling intervalsof multiple tasks could be shared by these tasks. We aim atminimizing the overall length of the data sampling intervals.We now give some preliminary definitions.Definition 1. Define I ] I0 as the union of two intervals orinterval sets I and I0. For example, ½1; 5_ ] ½3; 7_ ¼ ½1; 7_,and f½1; 3_; ½5; 7_g] ½2; 6_ ¼ ½1; 7_.Definition 2. Define I \þ I0 as the overlap of two intervals I andI0. For example, ½1; 5_ \þ ½3; 7_ ¼ ½3; 5_.Definition 3. Define jIj as the length of interval I or the lengthof the union of the intervals in set I. For example, j½1; 5_j ¼ 4,j½1; 5_ ] ½3; 7_j ¼ 6, and j½1; 3_ ] ½5; 7_j ¼ 2.Definition 4. I_þ I0 means interval I is a sub-interval of I0. Forexample, ½2; 3_ ] _þ ½1; 5_.Given a set of n tasks T ¼ fTig, i ¼ 1; 2; . . . ; n. Each taskTi is a three-tuple Ti ¼ hbi; ei; lii, where bi denotes thebeginning time, ei represents the end time, and li meansthat Ti needs an interval of data with length li. It is assumedthat li _ ei _ bi. The problem is to find a continuous subintervalIi in interval ½bi; ei_, i.e., Ii_þ ½bi; ei_, for every tasksatisfying jIij ¼ li, so that the length of the union of all thesub-intervals on the time axis is minimized, i.e., j ]ni¼1 Iij isminimum. Note that sub-interval Ii is continuous.The bottom part of Fig. 3 illustrates an example. Sincesensor nodes have limited communication and computationalcapabilities, we want to find a set of sub-intervals I11,I21, I12, I22, and I13 for tasks T11, T21, T12, T22, and T13respectively, such that jI11 ] I21 ] I12 ] I22 ] I13j is minimum.In the example shown in Fig. 3, the optimal solutionis s1 þ s2, and all the tasks can obtain the expected data inintervals s1 and s2.We now formally define the interval data sharingproblem.Definition 5. Given a set of n tasks T, each task Ti is a threetupleTi ¼ hbi; ei; lii, that is, each task Ti has a beginning timebi,an end time ei, and an data sampling interval length li. Theproblem is to find a continuous sub-interval Ii for each taskso as tomin]ni¼1Ii__________(1)s.t.Ii_þ ½bi; ei_; i ¼1; 2; . . . ; n (2)jIij ¼ li; i ¼1; 2; . . . ; n: (3)The objective function of this problem is nonlinear. So ifbi, ei, and li are real numbers, the problem is a nonlinearproblem which has no efficient universal solution [18]. It iseasy to find that the objective function is nonconvex.Several methods are available for solving nonconvexoptimization problems. For example, one approach is touse special formulations of linear programming problems.Another method employs the branch and bound techniques,where the problem is divided into subclasses to besolved with convex or linear approximations that form alower bound on the overall cost within the subdivision.However, all these methods require high computationcomplexity which are impractical to be implemented onsensor nodes. Since digital signals are discrete, dataintervals can be regarded as integer sequences. Therefore,bi, ei, and li can be regarded as integers. The integervariables make the problem a nonlinear integer programmingproblem [19] which is hard to be solved.4 A 2-FACTOR APPROXIMATION ALGORITHMA naive method is to initiate a continuous data samplinginterval at the beginning time of each task independently.However, this method results in a large amount of data. Inthis section, we present a greedy algorithm which is a2-factor approximation algorithm for our interval datasharing problem. Before we present the approximationalgorithm, we propose a solution for the special case whereevery task overlaps with each other.4.1 Tasks Overlapped with Each OtherFor ease of understanding, we first define satisfy asfollows.Definition 6. We say that an interval I satisfies a task Ti ifjI \þ ½bi; ei_j _ li. An interval set S satisfies a task Ti if thereexists an interval I in S such that jI \þ ½bi; ei_j _ li.If all the tasks overlap with each other, then the intervaldata sharing problemcan be solved in polynomial time.Analgorithm is presented as follows.Step 1) Sort the tasks in an ascending order by theirend times.Step 2) Pick the sub-interval of length l1 at the end ofthe first task T1, i.e., pick sub-interval½e1 _ l1; e1_.Fig. 3. Interval data sampling for multi-applications.GAO ET AL.: DATA COLLECTION IN MULTI-APPLICATION SHARING WIRELESS SENSOR NETWORKS 405Step 3) Pick a sub-interval for each task from thesecond to the last. Take Ti as an example, if theunion of the picked sub-intervals satisfy Ti, donothing and continue to pick a sub-interval forthe next task Tiþ1. If it does not satisfy Ti,extend forward from the tail of the picked subintervals.If it still does not satisfy Ti, extendbackward from the head of the picked subintervals.The pseudo code for tasks overlapped with each other isdescribed in Algorithm 1 in Appendix which is available inthe Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.289. TakeFig. 4 as an example. Task T1, T2, and T3 overlap witheach other. T1 needs a data interval of length l1 ¼ 4, T2needs an interval of length l2 ¼ 3, and T3 needs an intervalof length l3 ¼ 9. First, the tasks are sorted in an ascendingorder by their end times. Second, pick the sub-interval oflength 4 at the end of T1. The picked interval for T1 isI ¼ ½7; 11_. Third, I satisfies task T2, so nothing is done forT2. Forth, I does not satisfy T3, thus, I is extended forwarduntil the end time of T3, at this time I ¼ ½7; 14_. But I stilldoes not satisfy T3, I is then extended backward from thehead of the picked interval to get I ¼ ½5; 14_ which satisfiesall these three tasks. The time complexity is Oðn log nÞ dueto the sorting step. If the tasks are pre-sorted, the timecomplexity is OðnÞ.One can find that the optimal interval I ¼ ½s; e_ for tasksoverlappedwith each other can be also obtained by anothermethod. An optimal interval I ¼ ½s; e_ can be derived fromthe following equations:s ¼ minni¼1fei _ lig (4)e ¼ max maxni¼1fbi þ lig; maxni¼1fs þ lig; minni¼1feig_ _: (5)The second method is described in Algorithm 2 inAppendix, which obtains the same result as Algorithm 1.This algorithm consists of two phases. Take Fig. 4 as anexample again. In the first phase, it needs to find thebeginning time s. In this example, s is the minimum ei _ li,and it is easy to find that s ¼ 5. In the second phase, we findthat e ¼ 14 which is the maximum s þ li in this example.Thus, an optimal interval is obtained which is [5, 14]. As wecan see, the case where tasks overlap with each other can besolved in time complexity Oð2nÞ ¼ OðnÞ with Algorithm 2.This algorithm does not require a sorting step. However, ifthe tasks are pre-sorted, Algorithm 1 is no worse thanAlgorithm 2. As shown in the later section, our approximationalgorithm pre-sorts the tasks, so either algorithmcan be used as a sub-process in our following approximationalgorithm.Lemma 1. Let Tm be the task with the minimum end time, i.e.em ¼ minni¼1ei. Then picking sub-interval ½em _ lm; em_ doesnot result in a worse result.Based on Lemma 1, we can find that Algorithm 1 andAlgorithm 2 are optimal. This is because, in the case wheretasks overlap with each other, pick the end sub-interval ofthe task which has the minimum end time will not resultin a worse result. Therefore, the overall result can bederived by extending this picked sub-interval forward andbackward.4.2 2-Factor Approximation AlgorithmWe now present our greedy approximation algorithm.First, sort all the tasks by the end time in an ascendingorder. Second, identify a subset of tasks that overlap withT1. It is easy to find that these tasks overlap with each other.Find the minimum interval that satisfies the tasks in theidentified subset by using Algorithm 1. Third, remove thepreviously identified tasks. Repeat the second and the thirdsteps for the remaining tasks until all the tasks areremoved. One can refer to Algorithm 3 in Appendix forthe detailed process.Fig. 5 illustrates the process of the greedy approximationalgorithm. The five tasks are sorted in an ascending orderby end time. In the first step, tasks T1, T2, and T5 are identifiedas a subset of tasks that overlap with each other. Onecan find that, if the tasks are sorted by end time, all thetasks which overlap with T1 also overlap with each other.Now, Algorithm 1 can be used to compute the interval thatsatisfies these three tasks. After that, the three tasks T1, T2,and T5 are removed. In the second step, T3 and T4 areidentified as a subset of tasks that overlap with each other.Now, Algorithm 1 is employed again to compute the intervalthat satisfies these two tasks. The union of the two foundintervals is the final result of this example returned byAlgorithm 3 in Appendix available online.Theorem 1. Algorithm 3 is a 2-factor approximation algorithm.A tight example is shown in Fig. 6. Algorithm 3 derivesan interval of length l for tasks T1 and T3 which overlapwith each other in the first iteration. Then it derives aninterval of length l for task T2 in the second iteration.Algorithm 3 returns a final result of length 2l as shown inFig. 6a. However, there exists an optimal solution whichderives an interval of length ” for T1 and an interval oflength l for T2 and T3 as shown in Fig. 6b. This optimalsolution returns a result of length ” þ l. Therefore, it deriveslim”!02lð”þlÞ ¼ 2.Fig. 4. Tasks overlapped with each other.Fig. 5. Illustration of the approximation algorithm.406 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2015The time complexity of Algorithm 3 is Oðn2Þ due to thestep of identifying tasks which overlap with the firstremaining task in each iteration.5 MULTIPLE TASKS WITH SAME DATA SAMPLINGINTERVAL LENGTHIn this section, we study a special instance of the intervaldata sharing problem where the length of the datasampling interval of all the tasks is the same. Differentfrom the general problem, this special instance can besolved with a dynamic programming algorithm.Given a set of tasks T ¼ fT1; T2; . . . ; Tng and a positiveinteger l, each task Ti is denoted as Ti ¼ hbi; ei; li, where bi isthe beginning time and ei is the end time. The problem is tofind a continuous sub-interval of length l for each task Ti in½bi; ei_, so that the length of the union of all the picked subintervalson the time axis is minimized.Definition 7. In the same data sampling interval length case, atask Ti covers Tj if ½bj; ej_ is a sub-interval of ½bi; ei_, that is½bj; ej__þ ½bi; ei_.One can find that in the same data sampling intervallength case, tasks which cover some other tasks can beremoved. This is because any interval that satisfies thecovered shorter task must satisfy the longer task. In Fig. 7a,task T2 covers T1. If they have the same data samplinginterval length, then any interval I that satisfies T1 satisfiesT2. Therefore, we do not have to consider T2, and T2 can beremoved. As shown in Fig. 7b, we can get the same resultafter removing T2.Lemma 2. Let the data sampling interval length of all the tasksbe the same. If Ti covers Tj, i.e., ½bj; ej__þ ½bi; ei_ for any i; j ¼1; 2; . . . ; n, any interval that satisfies Tj satisfies Ti.After removing the tasks which cover other tasks, theproblem could be solved with a dynamic programmingalgorithm. Let T0 ¼ fT01; T02; . . . ; T0mg be the set of tasks anyof which does not cover some other task. Assume thatT01; T02; . . . ; T0mare sorted in an ascending order by end time.We have the following lemma.Lemma 3. In T0 ¼ fT0i; T0iþ1; . . . ; T0jg, b0p G b0q and e0p G e0q fori _ p G q _ j.Let Iði; jÞ be the interval that satisfies both T0i and T0j ,i _ j. We define Iði; jÞ as follows:Iði; jÞ ¼e0i _ l; e0i_ _if b0j _ e0i _ le0i _ l; b0j þ lh iif e0i _ l G b0j G e0iþ1 if b0j _ e0i.8><>:(6)There are only two cases when T0i overlaps with T0j , i.e.,T0i \þ T0j 6¼ ;, i _ j. In the first case, T0j covers interval½e0i _ l; e0i_ as shown in Fig. 8a, then we let Iði; jÞ ¼½e0i _ l; e0i_. In the second case, T0j overlaps with interval½e0i _ l; e0i_ as shown in Fig. 8b, then we let Iði; jÞ ¼½e0i _ l; b0j þ l_. When T0i overlaps with T0j, we define Iði; jÞas presented in the first two equations in Equation (6) basedon Lemma 1.When T0i does not overlap with T0j, i.e. T0i \þ T0j ¼ ;, i _ j,we define Iði; jÞ ¼ þ1as presented in the last equation inEquation (6).Lemma 4. Iði; jÞ in Equation (6) satisfies all the tasks T0i; T0iþ1;. . . ; T0j .Let f ðiÞ be the result with minimum length of the unionof the results from tasks T0i; T0iþ1; . . . ; T0m, where ½e0i _ l; e0i_is picked. Let gðiÞ be the index x which results in theminimum length of the union of the results fromT0i; T0iþ1; . . . ; T0m. Then fðiÞ and gðiÞ could be represented asfollows:fðiÞ ¼Iði; gðiÞÞ ] fðgðiÞ þ 1Þ 1 _ i G me0m _ l; e0m_ _i ¼ m; i 9 m8<: (7)gðiÞ ¼argmini_x_mfjIði; xÞ ] fðx þ 1Þjg 1 _ i G mm i¼ m:((8)An example is shown in Fig. 9, and the process of thisexample is presented in Table 1.First, we compute Iði; jÞ. By Equation (6), we deriveIði; jÞ in Table 1a. T01overlaps with T02and T03, so we getIð1; 1Þ ¼ ½3; 7_, Ið1; 2Þ ¼ ½3; 8_ and Ið1; 3Þ ¼ ½3; 10_. T02overlapswith T03, sowe derive Ið2; 2Þ ¼ ½5; 9_ and Ið2; 3Þ ¼ ½5; 10_.T03overlaps with T04, so we derive Ið3; 3Þ ¼ ½12; 16_ andIð3; 4Þ ¼ ½12; 16_. We have Ið4; 4Þ ¼ ½14; 18_.Then, we compute fðiÞ. By Equations (8) and (7), fðiÞ isobtained in Table 1b. As represented in Equation (7), we getfð5Þ ¼ ; first. By recalling the definition of fðiÞ, fð4Þ ¼Ið4; 4Þ ¼ ½e04 _ l; e04_. Then fð3Þ is the one with less length ofthe union of the intervals between Ið3; 3Þ ] fð4Þ andIð3; 4Þ ] fð5Þ, thus we get fð3Þ ¼ Ið3; 4Þ. After that, fð2Þ isthe one with smaller length of the union of the intervalsFig. 6. Tight example. (a) Greedy result. (b) Optimal result.Fig. 7. Example of covering. (a) Before removing. (b) After removing.Fig. 8. Illustration of computing Iði; jÞ. (a) Case 1. (b) Case 2.GAO ET AL.: DATA COLLECTION IN MULTI-APPLICATION SHARING WIRELESS SENSOR NETWORKS 407between Ið2; 2Þ ] fð3Þ and Ið2; 3Þ ] fð4Þ, and we obtainfð2Þ ¼ Ið2; 2Þ ] fð3Þ. Finally, fð1Þ is the one with thesmallest length of the union of the intervals amongIð1; 1Þ ] fð2Þ, Ið1; 2Þ ] fð3Þ, and Ið1; 3Þ ] fð4Þ, and we getfð1Þ ¼ Ið1; 2Þ ] fð3Þ. The dynamic programming algorithmis described in Algorithm 4 in Appendix available online.In Algorithm 4, the tasks are sorted in ascending orderby end time in line 1. Lines 2-7 remove the tasks whichcover other task. fðiÞ is computed in lines 10-20. Line 15checks whether T0xoverlaps with T0i. If T0xdoes not overlapwith T0i , nothing is done. Break the loop because all the latertasks will not overlap with T0i. If T0xoverlaps with T0i, thealgorithm needs to record the best index gðiÞ and theminimum result. The final result is fð1Þ.Lemma 5. The special instance where tasks have the same datasampling interval length could be solved in time complexityOðn2Þ and memory complexity OðnÞ.6 ONLINE ALGORITHMSThree online algorithms are presented in this section for thesituationwhere tasks come one by one. Although the onlinealgorithms may not obtain optimal solutions, they generatereasonable results in our experiments.A task Ti is denoted as Ti ¼ hbi; ei; lii, where bi is thebeginning time and ei is the end time. In real applications,tasks arrive in sequence by beginning time. For an arrivingtask Ti, an online algorithm picks a sub-interval of length liin ½bi; ei_, so as to minimize the union length of all thepicked sub-intervals.The general online algorithm is described as follows. Letthe set of picked sub-intervals from task T1 to task Ti_1 befði _ 1Þ. When task Ti arrives, the online algorithmpicks asub-interval for Ti based on fði _ 1Þ. The sub-interval canbe picked by differentmethods.We compare theminimumincrement,the latest-overlap and the maximum-overlapmethods in this paper.Before presenting the three methods, an extensionprocess is introduced first.6.1 Extension ProcessGiven an interval I, this section introduces how to extend Ito satisfy the arriving task Ti. As shown in Fig. 10, therelationship between I and Ti could be of five cases.Figs. 10a and 10e are the cases where I does not overlapwith Ti. Figs. 10b and 10d are the cases where I partlyoverlaps with Ti. Fig. 10c is the case where I is within Ti.If I ¼ ½b; e_ is empty, or it does not overlap with Ti likeFigs. 10a and 10e, then pick end sub-interval ½ei _ li; ei_. If Idoes not satisfy Ti, and it overlapswith Ti like Fig. 10b, thenpick the sub-interval ½bi; bi þ li_. In the cases shown inFigs. 10c and 10d, pick sub-interval ½ei _ li; ei_ if ½b; b þ li_exceeds Ti, otherwise, pick ½b; b þ li_ if ½b; b þ li_ exceeds I.The main idea of the extension process is to extend forwardfirst. The extension process is described in Algorithm 6 inAppendix.6.2 Minimum-Increment MethodBased on the extension process, when task Ti arrives, theminimum-increment method selects any two intervals infði _ 1Þ, and finds the pair with the minimum incrementallength. The minimum-increment method is described inAlgorithm 7 in Appendix available online.Take Fig. 11 as an example, where fði _ 1Þ ¼ fs1; s2; s3;s4; s5g and js1j ¼ 2, js2j ¼ 5, js3j ¼ 2, js4j ¼ 5, js5j ¼ 7. Let thedata sampling interval length for Ti be 7. It is easy to findthat in the minimum-increment method, [3, 11] is theminimum incremental solution which includes s2, s3 andan additional incremental interval [8, 9]. The incrementallength is 1. The minimum-increment method finds a localoptimal solution for Ti.6.3 Latest-Overlap MethodWhen task Ti arrives, the latest-overlap method is to findthe interval s in fði _ 1Þ that overlaps with ½bi; ei_ the latest.The latest-overlap method is described in Algorithm 8 inAppendix.In Fig. 11, s5 is the latest interval that overlaps with Ti.The latest-overlap method tries to find a solution for thelater tasks. The solution may satisfy the later tasks, thus,the overall result may be better.6.4 Maximum-Overlap MethodWhen task Ti arrives, the maximum-overlap method is tofind the interval s in fði _ 1Þ that overlaps with ½bi; ei_ to amaximumamount, i.e.,maxs2fði_1Þ js \þ ½bi; ei_j. Themaximumoverlapmethod is described in Algorithm 9 in Appendixavailable online.Fig. 9. Example for the dynamic programming algorithm.TABLE 1Computing Iði; jÞ and fðiÞ for the Example in Fig. 9.(a) Computing Iði; jÞ. (b) Computing fðiÞFig. 10. Relationship between I and Ti.408 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2015Take Fig. 11 as an example again. In the maximumoverlapmethod, s4 is selected in Algorithm 9, and the datasampling interval for Ti is [14, 21]. The maximum-overlapmethod considers both Ti and the later tasks. The solutionmay not be optimal for Ti, but the overall result for Ti andthe later tasks may be better.6.5 Performance AnalysisAn online algorithm may not derive an optimal solution forall the tasks. All the above methods pick a sub-interval forthe arriving task based on the previous information. Theycannot be global optimal solutions. In the worst situation,every task picks a sub-interval independently, and theapproximation ratio is n. A worst instance is presented inFig. 12. In this example, one interval is enough for all thetasks in an optimal solution, while all the three onlinemethods derive a result with n intervals.However, the worst instance is almost impossible tooccur in the multi-application data sharing problem. In theworst instance, the tasks overlap with each other as shownin Fig. 12. We will analyze the probability that the worstinstance occurs. Assume the tasks arrive randomly. Let theoverall time interval be ½0; t_, and all the tasks are withinthis interval. Let task Ti in Fig. 12 be Ti ¼ hbi; ei; lii, and let_i ¼ ei _ bi. If the worst instance occurs, the probability thatthe longest task is fixed is 1t__1. Because t is usually a largenumber, _i _ t, and the probability could be representd as1t__1_ 1t. The probability that the second longest task iscovered by T1 is _1__2t__2_ 1t. Similarly, the probability of Ti is_i_1__it__i_ 1t. Therefore, the probability of the worst instanceis ðt _ _1Þ 1tn _ 1tn_1, which indicates that the worst instance isnearly impossible to occur. Our simulation results confirmthis analysis in the next section, where the performances ofthe online algorithms are much better than the theoreticalresults.7 PERFORMANCE EVALUATION7.1 Simulations with TOSSIMWe evaluate the effectiveness of the proposed algorithmsthrough simulations. The simulations are implementedwith TOSSIM which is a widely used simulation tool forWSNs. Four cases are tested. In each case, four applications,each with different task durations and different datasampling interval lengths are tested. In the first case, thetask durations of the four applications are 11, 13, 17, and19 unit time respectively. The task durations are 13, 17, 19,and 23 unit time respectively in the second case. The taskdurations of the third case are 17, 19, 23, and 29 unit time,and 19, 23, 29, and 31 for the forth case. We assume thatsensor nodes can sample once and obtain one unit data ineach unit time. The sensor nodes run Algorithm 4 everymaxTime unit time, where maxTime is the window sizeaccording to the computation ability of the sensor nodes.Higher computation ability allows larger maxTime. Thegreedy algorithm and the online algorithms are comparedwith the naive method which is introduced in Section 4.The naive method initiates a continuous data sampling atthe beginning of each task independently.7.1.1 Impact of Interval LengthIn the first set of simulations, we evaluate the performanceof the proposed algorithms in terms of the amount ofsampled data. The data sampling interval lengths for everycase are 2, 3, 5, and 7 unit time. It can be seen from Fig. 13that the naive method samples much more data than theoptimal solution, and it cannot be bounded. In thesimulations, maxTime is set to 150 unit time. Our greedyalgorithm samples more data than the optimal solution, butit is always no more than twice of the optimal result.Compared with the naive method, our algorithm samplesalmost 200 percent less data when the data samplinginterval length is short. One can also find that when thetask duration increases, the amount of data sampled byboth the naive method and the greedy algorithm decreases.Although the online algorithms give bad results in theworst situation in theoretical analysis, they have acceptableresults in the simulations as shown in Fig. 13, where minInc,maxOv, latestOv represent the minimum-increment method,the maximum-overlap method and the latest-overlap methodrespectively. It can be found in Fig. 13 that the data amountsampled by all these three online algorithms is larger thanthe optimal solution and the greedy algorithm, but lowerthan the naive method. These three online methods almosthave the same performance in all the simulations exceptcase 1, where minInc incurs more data. It indicates thatmaxOv and latestOv may be better in an online process.Fig. 11. Illustration of the online algorithm.Fig. 12. Worst instance.Fig. 13. Data amount for shorter interval lengths.GAO ET AL.: DATA COLLECTION IN MULTI-APPLICATION SHARING WIRELESS SENSOR NETWORKS 409Both maxOv and latestOv try to sample data for thearriving task as late as possible. Such a method couldsample data that may satisfy the later coming tasks. minIncis likely to pick data that is in the beginning part of thearriving task, which may not satisfy the later coming tasks.7.1.2 Impact of Window Size maxTimeThe next group of simulations is to evaluate howmaxTimeaffects the amount of sampled data. In the simulations, thetask durations of the four applications are 11, 13, 17, and19 unit time respectively, and the data sampling intervallengths for every case are 2, 3, 5, and 7 unit timerespectively. The result is shown in Fig. 14. The amountof sampled data changes slightly for different maxTimesettings. As maxTime increases, the amount of sampleddata increases. However, the average amount of data doesnot vary a lot. This observation means that it is notnecessary for the sensor nodes to take care of a largemaxTime. A small maxTime is already enough to derive agood result.As shown in Fig. 14, the online methods sample moredata than the optimal and the greedy algorithm, but lessthan the naive method in different cases with differentmaxTime settings. The three online methods have similarperformance. minInc derives slightly more data thanmaxOv and latestOv for some maxTime settings. Thereason is that minInr has a higher probability to pick datainterval that will not satisfy the later coming tasks.7.1.3 Impact of Node DensityNext we evaluate the impact of node density on the amountof sampled data. In the simulations, the area width of thenetwork is set to 100 m, the communication range is set to40 m, and the number of nodes increases from 10 to 160. Asthe node density increases, the amount of sent dataincreases, but the amount of data received by the basestation does not increase proportionally. Fig. 15 illustratesthe data loss rate of the proposed algorithms for networkswith different node densities.When the number of nodes is160, the naive method loses more than 30 percent of thesent sampled data. Data loss rate increases sharply in densenetworks when the traffic is heavy. This is becauseunreliable wireless links and retransmissions result inserious communication congestion. The greedy and theonline methods sample less data, thus the traffic carried onthe network is not quite heavy, and the data loss rate islower.7.1.4 Impact of Network ScaleFig. 16 shows how the data loss rate is affected by networkscale. In the simulations, the density is 10 nodes per50     50 m2, the communication range is 40 m, and the areawidth of the network increases from 50 m to 250 m. Thenaive method and the greedy algorithm have a similar lossrate in small scale networks. When the network scale isvery large, the data loss rate of the naive method is almost70 percent. This is because the naive method samples alarge amount of data which result in numerous collisions inlarge scale networks. The optimal solution and the greedyalgorithm which sample less amount of data show a betterresult. The online methods sample almost the same amountof data, thus their data loss rates are not quite different.Fig. 14. Data amount for different maxTime settings.Fig. 15. Data loss rate.Fig. 16. Data loss rate.410 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 2, FEBRUARY 20157.2 Simulations with More TasksIn this section, we investigate the performance of theproposed algorithms with large number of tasks. Weimplement the algorithms in C/C++. 10000 tasks are randomlygenerated in interval [1, 3000000]. The task durationlengths are at most 100.We first illustrate how the short data sampling intervallengths affect the amount of sampled data. The datasampling interval length for each task is at most 1/3 of thetask duration length. Seven cases are tested as shown inFig. 17. As the number of tasks is large, it is very difficult tofind the optimal solution. Therefore, the optimal result isnot presented in this and the next group of simulations. InFig. 17, the greedy algorithm derives the least data amount,and the naive method samples more data than other algorithms.Similar to the results in the simulations withTOSSIM, minInc samples the most data among the threeonline algorithms, while the other two methods havesimilar performance.In the next group of simulations, we show how thelonger data sampling interval lengths affect the amount ofsampled data. The data sampling interval length for eachtask is at most the task duration length. Seven cases aretested as shown in Fig. 18. The result is similar to thesimulations with short data sampling interval lengthsexcept that all the proposed algorithms derives more data.8 CONCLUSIONData sharing for multiple applications is an efficient way toreduce communication cost in WSNs. Many applicationsneed a continuous interval of data sampling periodically.This paper is the first work to introduce the interval datasharing problem among multiple applications, which is anonlinear nonconvex optimization problem. Since noefficient universal solution has been found for this problem,we provide a greedy approximation algorithm to lower thehigh computational complexity of the available solutions.We prove that the provided greedy algorithm is a 2-factorapproximation algorithm. The time complexity of thisalgorithm is Oðn2Þ and the memory complexity is OðnÞ. Ina special instance where all the tasks have the same datasampling interval length, the problem can be addressed inpolynomial time, and a dynamic programming algorithm isprovided for this special instance. The time complexity ofthe dynamic programming algorithm is Oðn2Þ and thememory complexity is OðnÞ. Because the tasks are comingone by one, three online algorithms are also provided.Although the online algorithmsmay sample a large amountof data in theoretical analysis, they show acceptableperformance in the simulations.ACKNOWLEDGMENTThis work was supported in part by the Major Program ofNational Natural Science Foundation of China under grantNo. 61190115, the National Basic Research Program ofChina (973 Program) under Grant 2012CB316200, theNational Natural Science Foundation of China (NSFC)under Grants 61033015, 60933001, and 61100030.REFERENCES[1] W.I. Grosky, A. Kansal, S. Nath, J. Liu, and F. Zhao, ‘‘Senseweb:An Infrastructure for Shared Sensing,’’ IEEE Multimedia, vol. 14,no. 4, pp. 8-13, Oct.-Dec. 2007.[2] N. Trigoni, Y. Yao, A. Demers, and J. Gehrke, ‘‘Multi-Query Optimizationfor Sensor Networks,’’ in Proc. DCOSS, 2005, pp. 307-321.[3] M. Li, T. Yan, D. Ganesan, E. Lyons, P. Shenoy, A. Venkataramani,and M. Zink, ‘‘Multi-User Data Sharing in Radar Sensor Networks,’’in Proc. 5th Int’l Conf. Embedded Netw. SenSys, 2007, pp. 247-260.[4] Y. Xu, A. Saifullah, Y. Chen, C. Lu, and S. Bhattacharya, ‘‘NearOptimal Multi-Application Allocation in Shared Sensor Networks,’’in Proc. 11th ACM Int’l Symp. MobiHoc, 2010, pp. 181-190.[5] A. Tavakoli, A. Kansal, and S. Nath, ‘‘On-Line Sensing TaskOptimization for Shared Sensors,’’ in Proc. 9th ACM/IEEE Int’lConf. IPSN, 2010, pp. 47-57.[6] S. Ganesan and R.D. Finch, ‘‘Monitoring of Rail Forces by UsingAcoustic Signature Inspection,’’ J. Sound Vibration, vol. 114, no. 2,pp. 165-171, Apr. 1987.[7] M. Cerullo, G. Fazio, M. Fabbri, F. Muzi, and G. Sacerdoti,‘‘Acoustic Signal Processing to Diagnose Transiting ElectricTrains,’’ IEEE Trans. Intell. Transp. Syst., vol. 6, no. 2, pp. 238-243,June 2005.[8] L. Cheng and S.N. Pakzad, ‘‘Agility of Wireless Sensor Networksfor Earthquake Monitoring of Bridges,’’ in Proc. 6th INSS,June 2009, pp. 1-4.Fig. 17. Data amount for short interval lengths. Fig. 18. Data amount for longer interval lengths.GAO ET AL.: DATA COLLECTION IN MULTI-APPLICATION SHARING WIRELESS SENSOR NETWORKS 411[9] M. Suzuki, S. Saruwatari, N. Kurata, and H.Morikawa, ‘‘A High-Density Earthquake Monitoring System Using Wireless SensorNetworks Proc. SenSys, 2007, pp. 373-374.[10] R. Tan, G. Xing, J. Chen, W. Song, and R. Huang, ‘‘Quality-Driven Volcanic Earthquake Detection Using Wireless SensorNetworks,’’ in Proc. IEEE 31st RTSS, Dec. 2010, pp. 271-280.[11] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson,‘‘Wireless Sensor Networks for Habitat Monitoring,’’ in Proc. 1stACM Int’l Workshop WSNA, 2002, pp. 88-97.[12] R. Szewczyk, A. Mainwaring, J. Polastre, J. Anderson, and D. Culler,‘‘An Analysis of a Large Scale Habitat Monitoring Application,’’ inProc. 2nd Int’l Conf. Embedded Netw. SenSys, 2004, pp. 214-226.[13] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring,andD. Estrin, ‘‘HabitatMonitoringwith SensorNetworks,’’ Commun.ACM, vol. 47, no. 6, pp. 34-40, June 2004.[14] S. Xiang, H.B. Lim, K.-L. Tan, and Y. Zhou, ‘‘Two-Tier MultipleQuery Optimization for Sensor Networks,’’ in Proc. 27th ICDCS,2007, p. 39.[15] T.K. Sellis, ‘‘Multiple-Query Optimization,’’ ACM Trans. DatabaseSyst., vol. 13, no. 1, pp. 23-52, Mar. 1988.[16] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe, ‘‘Efficient andExtensible Algorithms for Multi Query Optimization,’’ in Proc. ACMSIGMOD Int’l Conf., 2000, pp. 249-260.[17] S. Krishnamurthy, C. Wu, and M. Franklin, ‘‘On-the-Fly Sharingfor Streamed Aggregation,’’ in Proc. ACM SIGMOD Int’l Conf.,2006, pp. 623-634.[18] D.P. Bertsekas, Nonlinear Programming. Belmont, MA, USA:Athena Scientific, 1999.[19] D. Li and X. Sun, Nonlinear Integer Programming. NewYork,NY,USA: Springer-Verlag, 2006.[20] S. Cheng, J. Li, and Z. Cai, ‘‘O(“)-Approximation to PhysicalWorld by Sensor Networks,’’ in INFOCOM, 2013, pp. 3084-3092.[21] J. Li, S. Cheng, H.Gao, andZ.Cai, ‘‘Approximate PhysicalWorldReconstruction Algorithms in Sensor Networks,’’ in TPDS, 2014.Hong Gao received the BS degree in computerscience from Heilongjiang University, China, theMS degree in computer science from HarbinEngineering University, China, and the PhD degreein computer science from Harbin Institute ofTechnology, China. She is currently a Professorin the School of Computer Science and Technologyat Harbin Institute of Technology. Her researchinterests include graph data management,sensor network, and massive data management.Xiaolin Fang received the BS degree from theDepartment of Computer Science and Technologyat Harbin Engineering University, China, andthe MS degree from the Department of ComputerScience and Technology at Harbin Institute ofTechnology, China. He is currently pursuing thePhD degree in Department of Computer Scienceand Technology at Harbin Institute of Technology,China. His research interests includemassive data processing and sensor network.Jianzhong Li is a Professor in the School ofComputer Science and Technology at HarbinInstitute of Technology, China. In the past, heworked as a visiting scholar at the University ofCalifornia at Berkeley, as a Staff Scientist in theInformation Research Group at the LawrenceBerkeley National Laboratory, and as a VisitingProfessor at the University of Minnesota. Hisresearch interests include data managementsystems, sensor networks, and data intensivecomputing.Yingshu Li received the BS degree from theDepartment of Computer Science and Engineeringat Beijing Institute of Technology, China, andthe MS and PhD degrees from the Department ofComputer Science and Engineering at Universityof Minnesota-Twin Cities. She is currently anAssociate Professor in the Department of ComputerScience at Georgia State University. Herresearch interests include Wireless Network,Sensory Data Management, and Optimization.. For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.412 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2015