412 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 23, NO. 2, APRIL 2015A Distortion-Resistant Routing Framework for VideoTraffic in Wireless Multihop NetworksGeorge Papageorgiou, Member, IEEE, ACM, Shailendra Singh, Srikanth V. Krishnamurthy, Fellow, IEEE,Ramesh Govindan, Fellow, IEEE, and Tom La Porta, Fellow, IEEEAbstract—Traditional routing metrics designed for wireless networksare application-agnostic. In this paper, we consider a wirelessnetwork where the application flows consist of video traffic.From a user perspective, reducing the level of video distortion iscritical. We ask the question “Should the routing policies changeif the end-to-end video distortion is to be minimized?” Popularlink-quality-based routing metrics (such as ETX) do not accountfor dependence (in terms of congestion) across the links of a path;as a result, they can cause video flows to converge onto a few pathsand, thus, cause high video distortion. To account for the evolutionof the video frame loss process, we construct an analyticalframework to, first, understand and, second, assess the impact ofthe wireless network on video distortion. The framework allows usto formulate a routing policy for minimizing distortion, based onwhich we design a protocol for routing video traffic. We find viasimulations and testbed experiments that our protocol is efficientin reducing video distortion and minimizing the user experiencedegradation.Index Terms—Protocol design, routing, video communications,video distortion minimization, wireless networks.I. INTRODUCTION WITH the advent of smartphones, video traffic has becomevery popular in wireless networks. In tactical networksor disaster recovery, one can envision the transfer ofvideo clips to facilitate mission management. From a user perspective,maintaining a good quality of the transferred video iscritical. The video quality is affected by: 1) the distortion dueto compression at the source, and 2) the distortion due to bothwireless channel induced errors and interference.Video encoding standards, like MPEG-4 [1] orH.264/AVC [2], define groups of I-, P-, and B-type framesthat provide different levels of encoding and, thus, protectionagainst transmission losses. In particular, the differentlevels of encoding refer to: 1) either information encodedManuscript received May 24, 2013; revised November 15, 2013; acceptedDecember 24, 2013; approved by IEEE/ACM TRANSACTIONS ON NETWORKINGEditor M. Reisslein. Date of publication February 11, 2014; date of current versionApril 14, 2015. This work was supported by the Army Research Laboratoryand was accomplished under Cooperative Agreement No. W911NF-09-2-0053.G. Papageorgiou, S. Singh, and S. V. Krishnamurthy are with the Departmentof Computer Science and Engineering, University of California, Riverside,Riverside, CA 92521 USA (e-mail: gpapag@cs.ucr.edu; singhs@cs.ucr.edu;krish@cs.ucr.edu).R. Govindan is with the Department of Computer Science, University ofSouthern California, Los Angeles, CA 90089 USA (e-mail: ramesh@usc.edu).T. La Porta is with the Department of Computer Science and Engineering,The Pennsylvania State University, University Park, PA 16802 USA (e-mail:tlp@cse.psu.edu).Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.Digital Object Identifier 10.1109/TNET.2014.2302815Fig. 1. Multilayer approach.independently, in the case of I-frames, or 2) encoding relativeto the information encoded within other frames, as is the casefor P- and B-frames. This Group of Pictures (GOP) allows forthe mapping of frame losses into a distortion metric that canbe used to assess the application-level performance of videotransmissions.One of the critical functionalities that is often neglected, butaffects the end-to-end quality of a video flow, is routing. Typicalrouting protocols, designed for wireless multihop settings,are application-agnostic and do not account for correlationof losses on the links that compose a route from a source toa destination node. Furthermore, since flows are consideredindependently, they can converge onto certain links that thenbecome heavily loaded (thereby increasing video distortion),while others are significantly underutilized. The decisions madeby such routing protocols are based on only network (and notapplication) parameters.In this paper, our thesis is that the user-perceived videoquality can be significantly improved by accounting for applicationrequirements, and specifically the video distortionexperienced by a flow, end-to-end. Typically, the schemes usedto encode a video clip can accommodate a certain number ofpacket losses per frame. However, if the number of lost packetsin a frame exceeds a certain threshold, the frame cannot bedecoded correctly. A frame loss will result in some amountof distortion. The value of distortion at a hop along the pathfrom the source to the destination depends on the positions ofthe unrecoverable video frames (simply referred to as frames)in the GOP, at that hop. As one of our main contributions,we construct an analytical model to characterize the dynamicbehavior of the process that describes the evolution of framelosses in the GOP (instead of just focusing on a network qualitymetric such as the packet-loss probability) as video is deliveredon an end-to-end path. Specifically, with our model, we capturehow the choice of path for an end-to-end flow affects theperformance of a flow in terms of video distortion. Our modelis built based on a multilayer approach as shown in Fig. 1. Thepacket-loss probability on a link is mapped to the probabilityof a frame loss in the GOP. The frame-loss probability isthen directly associated with the video distortion metric. Byusing the above mapping from the network-specific property(i.e., packet-loss probability) to the application-specific quality1063-6692 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.PAPAGEORGIOU et al.: DISTORTION-RESISTANT ROUTING FRAMEWORK FOR VIDEO TRAFFIC 413metric (i.e., video distortion), we pose the problem of routingas an optimization problem where the objective is to find thepath from the source to the destination that minimizes theend-to-end distortion.In our formulation, we explicitly take into account the historyof losses in the GOP along the path. This is in stark contrastwith traditional routing metrics (such as the total expectedtransmission count (ETX) [3]) wherein the links are treated independently.Our solution to the problem is based on a dynamicprogramming approach that effectively captures the evolutionof the frame-loss process. We then design a practical routingprotocol, based on the above solution, to minimize routingdistortion. In a nutshell, since the loss of the longer I-framesthat carry fine-grained information affects the distortion metricmore, our approach ensures that these frames are carried on thepaths that experience the least congestion; the latter frames ina GOP are sent out on relatively more congested paths. Ourrouting scheme is optimized for transferring video clips onwireless networks with minimum video distortion. Since optimizingfor video streaming is not an objective of our scheme,constraints relating to time (such as jitter) are not directly takeninto account in the design.Specifically, our contributions in this paper are as follows.• Developing an analytical framework to capture the impactof routing on video distortion: As our primary contribution,we develop an analytical framework that captures the impactof routing on the end-to-end video quality in terms ofdistortion. Specifically, the framework facilitates the computationof routes that are optimal in terms of achievingthe minimum distortion. The model takes into account thejoint impact of the PHY and MAC layers and the applicationsemantics on the video quality.• Design of a practical routing protocol for distortion-resilientvideo delivery: Based on our analysis, we design apractical routing protocol for a network that primarily carrieswireless video. The practical protocol allows a sourceto collect distortion information on the links in the networkand distribute traffic across the different paths in accordanceto: 1) the distortion, and 2) the position of a framein the GOP.• Evaluations via extensive experiments: We demonstrate viaextensive simulations and real testbed experiments on amultihop 802.11a testbed that our protocol is extremelyeffective in reducing the end-to-end video distortion andkeeping the user experience degradation to a minimum. Inparticular, the use of the protocol increases the peak signalto-noise ratio (PSNR) of video flows by as much as 20%,producing flows with a mean opinion score (MOS) thatis on the average 2–3 times higher compared to the casewhen traditional routing schemes are used. These PSNRand MOS gains project significant improvements in theperceived video quality at the destination of a flow [4]. Wealso evaluate our protocol with respect to various systemparameters.Organization: The paper is organized as follows. Relatedwork is presented in Section II. Our analytical models are inSection III, followed by the problem formulation in Section IV.In Section V, we discuss how our framework can be usedto route video flows in practice. Section VI contains resultsfrom our simulations and testbed experiments. We conclude inSection VII.II. RELATED WORKThe plethora of recommendations from the standardizationbodies regarding the encoding and transmission of video indicatesthe significance of video communications. Different approachesexist in handling such an encoding and transmission.The Multiple Description Coding (MDC) technique fragmentsthe initial video clip into a number of substreams called descriptions.The descriptions are transmitted on the network over disjointpaths. These descriptions are equivalent in the sense thatany one of them is sufficient for the decoding process to be successful,however the quality improves with the number of decodedsubstreams. Layered Coding (LC) produces a base layerand multiple enhancement layers. The enhancement layers serveonly to refine the base-layer quality and are not useful on theirown. Therefore, the base layer represents the most critical partof the encoded signal [5], [6]. In this paper, we focus on the layeredcoding due to its popularity in applications and adoption instandards.Standards like the MPEG-4 [1] and the H.264/AVC [2] provideguidelines on how a video clip should be encoded for atransmission over a communication system based on layeredcoding. Typically, the initial video clip is separated into a sequenceof frames of different importance with respect to qualityand, hence, different levels of encoding. The frames are calledI-, P-, and B-frames, and groups of such frames constitute astructure named the GOP. In each such GOP, the first frame is anI-frame that can be decoded independently of any other informationcarried within the same GOP. After the I-frame, a sequenceof P- and possibly B-frames follows. The P- and B-frames usethe I-frame as a reference to encode information. However, notethat the P-frames can also be used as references for other frames.There has been a body of work on packet-loss-resilientvideo coding in the signal processing research community [7].In [4], the video stream is split into high- and low-prioritypartitions, and FEC is used to protect the high-priority data.To account for temporal and spatial error propagation due toquantization and packet losses, an algorithm is proposed in [8]to produce estimates of the overall video distortion that canbe used for switching between inter- and intracoding modesper macroblock, achieving higher PSNR. In [9], an enhancementto the transmission robustness of the coded bitstreamis achieved through the introduction of inter/intracoding withredundant macroblocks. The coding parameters are determinedby a rate-distortion optimization scheme. These schemes areevaluated using simulation where the effect of the networktransmission is represented by a constant packet-loss rate,and therefore fails to capture the idiosyncrasies of real-worldsystems.In [10], an analytical framework is developed to model theeffects of wireless channel fading on video distortion. Themodel is, however, only valid for single-hop communication.In [11], the authors examine the effects of packet-loss patternsand specifically the length of error bursts on the distortionof compressed video. The work, although on a single link,showcases the importance of accounting for the correlation of414 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 23, NO. 2, APRIL 2015errors across frames. The performance of video streaming overa multihop IEEE 802.11 wireless network is studied in [12],and a two-dimensional Markov chain model is proposed. Themodel is used not only for performance evaluation, but also asa guide for deploying video streaming services with end-to-endquality-of-service (QoS) provisioning. Finally, a recursionmodel is derived in [13] to relate the average transmissiondistortion across successive P-frames. None of these effortsconsiders the impact of routing on video distortion.There have also been studies on the performance of videotransmissions over 4G wireless networks that have been designedto support high QoS for multimedia applications. In [14],an assessment of the recently defined video coding scheme(H.264/SVC) is performed over mobile WiMAX. Metrics suchas the PSNR and the MOS are used to represent the quality ofexperience perceived by the end-user. The results show that theperformance is sensitive to the different encoding options inthe protocols and responds differently to the loss of data in thenetwork. Again, these are single-link wireless networks, androuting is not a factor.Cross-layer optimization and QoS routing is not new. Anextensive body of research exists on routing algorithms forwireless ad hoc and mesh networks [15]. Furthermore, thesurvey in [16] provides various ways of classifying QoS routingschemes based on protocol evaluation metrics (transport/application,network- and MAC-layer metrics). However, noneof the routing schemes presented in these surveys takes intoaccount performance metrics defined for an application andspecifically for video transfers. Even when a QoS routing isdefined as application-aware, the applications need to specifythroughput and delay constraints. This is in contrast to ourapproach, where an application-related performance metric,namely the video distortion, is directly incorporated into theroute selection mechanism.Prior work on routing for video communications focuses onMultiple Description Coding (MDC). In [17] and [18], multipathrouting schemes are considered to improve the quality ofvideo transfer. In [17], an extension to the Dynamic SourceRouting is proposed to support multipath video communications.The basic idea is to use the information collected at thedestination node to compute nearly disjoint paths. In contrastwith our approach, no analysis is provided in [17], and the evaluationof the scheme is based solely on simulations. In [18], thecomputation of disjoint paths is achieved by proper schedulinggiven a set of path lengths. As is the case in [17], the work in [18]does not take into account any performance metric directly associatedwith video quality; in contrast, the optimization is basedon delay constraints. In [19] and [20], MDC is considered forvideo multicast in wireless ad hoc networks. A rate-distortionmodel is defined and used in an optimization problem wherethe objective is to minimize the overall video distortion by properlyselecting routing paths. Due to the complexity of the optimizationproblem, a genetic-algorithm-based heuristic approachis used to compute the routes. Although the approach in [19]and [20] takes into account the distortion of the video, it doesso using MDC. Our approach differs not only on the way wemodel video distortion, but also on the fact that we focus on LC,which is more popular in applications today. In [21], a multipathrouting scheme for video delivery over IEEE 802.11-basedwireless mesh networks is proposed. To achieve good traffic engineering,the scheme relies on maximally disjoint paths. However,this work does not consider distortion as a user-perceivedmetric. It simply aims to reduce the latency of video transmissions,and thus, its objective is different from what we considerhere.The work in [22] proposes a scheme for energy-efficientvideo communications with minimum QoS degradation for LC.However, the routing scheme is based on a hierarchical model.To support such a hierarchy, the nodes need to be groupedin clusters, and a process of electing a cluster head has tobe executed periodically, increasing the processing and datacommunication load of the network. In contrast, our proposedscheme assumes a flat model where all nodes in the networkare equivalent and perform the same set of tasks.III. MODEL FORMULATIONOur analytical model couples the functionality of the physicaland MAC layers of the network with the application layerfor a video clip that is sent from a source to a destination node.The model for the lower layers computes the packet-loss probabilitythrough a set of equations that characterize multiuser interference,physical path conditions, and traffic rates betweensource–destination pairs in the network. This packet-loss probabilityis then input to a second model to compute the frame-lossprobability and, from that, the corresponding distortion. Thevalue of the distortion at a hop along the path from the sourceto the destination node depends on the position of the first unrecoverableframe in the GOP.A. PHY- and MAC-Layer ModelingWe consider an IEEE 802.11 network that consists of a setof nodes denoted by N . For each nodeN , denote by P the set of paths that pass via node . Forsimplicity, we assume a constant packet length of bits for allsource–destination paths. There are various models [23]–[26]that attempt to capture the operations of the IEEE 802.11 protocol.These models are application-agnostic and provide an estimateof the packet-loss probability due to interference frombackground traffic in the network. We use the model in [26] torepresent the operations of the PHY and MAC layers; specificscan be found in [26].The approach followed in [26] is based on network-lossmodels. Three sets of equations are derived. The first correspondsto a scheduling model that computes the serving rateof a path at each node , as a function of the schedulercoefficient and the service time(1)The second captures the IEEE 802.11 MAC and PHY modelsand associates the probability of a transmission failure withthe channel access probability(2)PAPAGEORGIOU et al.: DISTORTION-RESISTANT ROUTING FRAMEWORK FOR VIDEO TRAFFIC 415where is the number of backoff stages and is the minimumwindow size. Finally, the third set of equations describes therouting model and computes the incoming traffic rate to thenext-hop node based on scheduling and transmission failuresfor all N P (3)A fixed-point method is used to couple the equations in aniteration, until convergence to a consistent solution is achievedand satisfied. The solution is an approximation to the packet-lossprobability per link and the throughput of the network. Notehere that any other method can be used to find , which can thenbe used in our video distortion estimation framework describedin Section III-B.B. Video Distortion ModelOur analysis is based on the model for video transmissiondistortion in [10]. The distortion is broken down into sourcedistortion and wireless transmission distortion over a single hop.Instead of focusing on a single hop, we significantly extend theanalysis by developing a model that captures the evolution ofthe transmission distortion along the links of a route from thesource node to the destination node.We consider a GOP structure that consists of an I-frame followedby P-frames. We index each frame in the GOPstructure starting from 0, i.e., the I-frame corresponds to index0, and the P-frames correspond to indices from 1 up to .We focus on predictive source coding where, if the th frameis the first lost frame in a GOP, then the th frame and all itssuccessors in the GOP are replaced by the st frame atthe destination node. Assuming that the sequence of frames isstationary, the average distortion introduced by such a frame replacementdepends on the temporal proximity of the replacedframe to the st frame and not on the actual position ofthe frame (in the GOP) to be replaced. In [10], a linear model,which corresponds to empirical data, is used to provide the averagemean squared error (MSE) as a function of the temporaldistance between frames. Using this model, the average distortionis computed in [10] to be(4)for . The minimum distortionis achieved when the last frame in the GOP is lost andthe maximum, , is attained if the first frame islost. The values and depend on the actual videosequence and have to be determined by measurement. To automatethe computation of the distortion, however, we computethe minimum and maximum values, and , respectively,of the distortion over different GOPs for each clip anduse their average values.If is the source coding rate, and are the percentageof bits in the GOP that belong to an I-frame and a P-frame,respectively, then:• the number of packets per an I-frame ;• the number of packets per a P-frame ;where and is the duration of a GOP.We define the sensitivity of a frame to lost packets to be theminimum number of packets that belong to a frame that, if lost,can prevent the correct decoding of the frame. We denote bythe sensitivity of an I-frame, and by , that of a P-frame. Forthe sensitivity of the I-frame, it holds that ,and for the P-frame it is . Note that anyfurther packet losses beyond for the I-frame and for theP-frame do not cause any additional distortion for that particularGOP because, in that case, the corresponding frame is alreadyconsidered lost and cannot be correctly decoded.We extend the wireless transmission distortion introducedin [10] and defined in (4) for the multihop case. We define thesequence to represent the wirelesstransmission distortion along the path from the source to thedestination, where is the wireless transmission video distortionat the th hop. In general, at the th hop, the distortioncan take one of the following discrete values given by (4):(5)The sequence of values the process takes depends on thenumber of lost packets per frame in the GOP at each link.Clearly, w.p.1, for all .We track the packet losses per frame by defining a multidimensionalcounting process(6)where the index is again the hop count along the path fromthe source to the destination. The first component of thecounting process tracks the number of lost packets fromthe I-frame at the th hop along the path, and the componentscount the lost packets that belong to the subsequentP-frames in the GOP at the th hop. The state space foreach of these components is given by(7)(8)for .Assuming that the packet losses in different frames in theGOP are independent events (likely if the fading patterns changein between), the transition probabilities for the process , canbe computed. Suppose that is the packet-loss probability providedby the analytical model that describes the MAC layer (inSection III-A). Furthermore, let the value of at hop beand at hop be. Since each of the components of isa counting process, the corresponding sample paths are nondecreasingw.p.1,, and therefore. Regarding the transitions of that correspond to theI-frame, we have that(9)The corresponding transition probability is equal to theprobability of losing packets out of theremaining packets in the I-frame. Therefore, the transition416 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 23, NO. 2, APRIL 2015probabilities for the first component are given by thefollowing binomial distribution:otherwise(10)Similar to the transitions of , the transitions ofthat correspond to the P-frames in the GOPare specified by the transition probabilitiesotherwise(11)From the transition probabilities (10) and (11), one can computethe distribution of lost packetsin each frame at hop assuming that there are no lost packets atthe source. In particular, for the I-frame, we have(12)for . We define the row vector(13)Then, (12), in vector form, becomes(14)for , where is the transition matrix forthe process . To make the dependence of the matrixto the packet-loss probability explicit, we use the notation.It follows then from (14) that(15)for a sequence of packet-loss probabilities ,where . Following the same process, we cancompute the corresponding distribution for the th P-framein the GOP(16)where and is the transitionmatrix for the process . As one canimmediately see, the packet-loss probabilities, computed afteraccounting for the PHY and MAC, in Section III-A, can be usedhere to compute the probabilities .C. Video Distortion DynamicsThe value of the distortion at hop along the path from thesource to the destination node depends on the position of thefirst unrecoverable frame in the GOP. We define the processsuch that is the index of the first unrecoverableframe in the GOP structure at hop . At each hop ,the process takes values in the setC (17)The value 0 indicates that the first (I-frame) is lost, and thereforethe whole GOP is unrecoverable. A value between 1 anddenotes that the corresponding P-frame is the first frame in theGOP that cannot be decoded correctly, and the value indicatesthat no frame has been lost thus far, yielding a distortion .The definition of the process suggests that the sample pathsof the process are nonincreasing w.p.1., which means that, for all .The dynamics of the process and therefore of the videodistortion depend on the process . The value of the processat each hop indicates the number of lost packets up to thatpoint along the path from the source to the destination node.These losses specify the first unrecoverable frame in the GOPand, hence, the value of the distortion at that point on the path.The transition probabilities at hop of the processfor C (18)specifying the likelihood that the first unrecoverable frame athop is given that the first unrecoverable frame at hop is ,can be computed using the distributionsgiven by (15) and (16). In particular, we consider the followingcases.1) For : In this case, the first unrecoverable frame at hopis the first frame (I-frame) in the GOP. This means thatthe GOP is unrecoverable, and the value of the processfor the rest of the path cannot be anything else other than 0.Therefore, the transition probabilities in this case are givenby(19)2) For : When the first unrecoverableframe in the GOP at hop is a P-frame, it is possible duringthe transition to the next hop to have packet lossesthat make an earlier frame in the GOP unrecoverable. Thiswill happen if the number of lost packets in an earlier frameis such that the total number of lost packets for the particularframe reaches the sensitivity of that frame type. This isused to compute the transition probabilities in (20), shownat the bottom of the next page.3) For : This corresponds to the case where no frameshave been lost in the GOP up to hop . The transition to thenext hop may cause packet losses such that either a framein the GOP becomes unrecoverable, or none is lost and notransition of happens. The transition probabilities in thiscase are given by (21), shown at the bottom of the nextpage.PAPAGEORGIOU et al.: DISTORTION-RESISTANT ROUTING FRAMEWORK FOR VIDEO TRAFFIC 417The value of the video transmission distortion depends on thevalue of the process at hop . More specificallyifif and (22)where is given by (4). Therefore, the dynamics of the videotransmission distortion are defined by the transition probabilitiesgiven by (19)–(21).IV. OPTIMAL ROUTING POLICYNext, our objective is to find the path that yields the minimumvideo transmission distortion between any source and destination.By using the analysis presented in Section III, we pose theproblem as a stochastic optimal control problem where the controlis the selection of the next node to be visited at each intermediatenode from the source to the destination.If N is the set of nodes in the network andC is the set of possible values for theprocess described in Section III-C, we define the state spaceof our problem asX N C (23)Each state X is a tuple such that . The firstcomponent N represents the current node on the pathfrom the source to the destination. The second component Cpoints to the first unrecoverable frame in the GOP and, therefore,specifies the video distortion at the current node.Suppose that at the th hop of the path between the sourceand the destination the node is . Suppose furthermore that thefirst unrecoverable frame in the GOP structure is . Then, thecurrent state of the system is . At this point, thesystem needs to select the next node to be visited. Denote thisselection by . Clearly, the node to be selected next shouldbelong to the set U of the one-hop neighbors of . Thismeans that if at stage the state is and a decisionis made such that U , the new state at the nextstage will be . The selection of as thenext node specifies the packet-loss probability from the analysisin Section III-B and accounts for both channel-induced andinterference-related failures. Moreover, it specifies the transitionprobabilities for the second component of thestate. To make the dependence of these transition probabilitiesto the selection explicit, we use the notation .We seek to find the optimal sequence of statesthat minimizes the total video transmission distortion from thesource to the destination node. The first component of each statethat belongs to such an optimal sequence of states indicates thenode that has to be visited next in the optimal path.For the initial state and the sequence of decisions, the cost to be minimized is defined asX(24)where is the length of the path, the function is the runningcost, and the function is the final cost. We call thisoptimization problem the Minimum Distortion Routing (MDR)problem.The running cost is the video transmission distortion at stateifif (25)(20)(21)418 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 23, NO. 2, APRIL 2015The final cost is defined to beif is the destination node andif is the destination node andotherwise(26)If is the source and is the destination of the connection,then the initial state for the optimization problem is definedas . Any state in the boundary setB X (27)is a terminating state for the optimization problem.If is an optimal decision sequence, we define the valuefunction as(28)for an initial state X . If at some stage the state is, we define the minimum cost-to-go as(29)and for the final stage(30)The MDR problem has the following properties.Lemma 1: MDR satisfies the overlapping property, i.e., theproblem can be broken down into smaller problems that retainthe same structure.Proof: From (29), it is clear that computing the cost-to-gorequires the calculation of the cost-to-go . Thismeans that the initial problem of finding the optimal route betweena source and a destination node can be solved if the subproblemof finding an optimal path between an intermediatenode and the destination can be solved.Lemma 2: MDR satisfies the optimal substructure property,i.e., the subpath of an optimal path is optimal for the correspondingsubproblem.Proof: This is immediate from the definition of the costto-go function defined in (29).Theorem 1.: The MDR problem is solvable by dynamicprogramming.Proof: An optimization problem can be solved by dynamicprogramming if the problem satisfies both the overlapping andthe optimal substructure properties [27]. The proof is immediatefrom Lemmas 1 and 2.Since the state space X is of finite dimension, the optimizationproblem can be solved via dynamic programming byback-propagating the computation of the value of the cost-to-gofunction [28], [29] starting from the terminating states of theboundary set B and moving backwards toward the initial state. If at some stage the state is , we consider allpossible neighbors of node that are one hop away. Foreach link , a packet-loss probability characterizesthe quality of this specific link. Using this , we can computethe transition probability from the current stateto a new state through the probabilitythat is defined in Section III-C for allpossible values of the second component of the state. Amongthe neighboring nodes of node , we choose as the next hoptoward the destination node, the node that corresponds to theminimum cost-to-go at stage defined in (29).Discussion: In essence, the MDR routing policy distributesthe video frames (and the packets contained therein) across multiplepaths and in particular minimizes the interference experiencedby the frames that are at the beginning of a GOP (to minimizedistortion). The I-frames are longer than other frames.Their loss impacts distortion more, and thus these are transmittedon relatively interference-free paths. The higher protectionrendered to I-frames is the key contributing factor in decreasingthe distortion with MDR (we also observe this in bothour simulations and testbed experiments).V. PROTOCOL DESIGNTo compute the solution to the MDR problem described inSection IV, knowledge of the complete network (the nodes thatare present in the network and the quality of the links betweenthese nodes) is necessary. However, because of the dynamicnature and distributed operations of a network, such completeknowledge of the global state is not always available to thenodes. In practice, the solution to the MDR problem can be computedby the source node based on partial information regardingthe global state that it gathers. The source node has to samplethe network during a path discovery process in order to collectinformation regarding the state of the network.The sampling process includes the estimation of the ETXmetric [3] for each wireless link in the network. These estimatesprovide a measure of the quality of the links. The estimationprocess can be implemented by tracking the successfulbroadcasting of probe messages in periodic time intervals. TheETX estimates computed locally in the neighborhood of a nodeare then appended in the Route Request messages duringthe Route Discovery phase. Upon reception of this message bythe destination, a Route Reply message is sent back to thesource that contains the computed ETX estimates, which are usableto compute .The source node then can solve the optimization problem(Section IV) by using the information gathered via the samplingprocess described above. Specifically, upon receiving theRoute Reply messages, the source node follows the stepspresented in Algorithm 1. It defines the initial state of the optimizationproblem as , where is the GOP size. Itdefines the boundary setB that serves as the terminating set forthe optimization process. Next, a call to Algorithm 2 producesthe next node in the path. Because of the stochastic nature ofthe second component of the state, its next value has to be estimated.The estimation is based on the transition probabilitiesgiven by (19)–(21). In particular, the estimated valueis the expected value of the second component given its currentvalue(31)PAPAGEORGIOU et al.: DISTORTION-RESISTANT ROUTING FRAMEWORK FOR VIDEO TRAFFIC 419Algorithm 1: Path discovery (Uses Algorithm 2)Input: source node , destination nodeInput: frame sizeOutput: route from to1: /* DSR Route Discovery Phase */2: send3: receive messages4: N node-ids from messages5:6: /*Path Discovery Initialization Phase*/7:8:9: B10:11:12: append to13:14: /* Path Computation /*15: repeat16: Next_node_in_optimal path( B N )17:18:19:20:21: append to22: N N23: until BTo avoid loops in the produced route, node is removedfrom the set N of available nodes. The process is repeatedwith a new initial state until the boundary set Bis reached. In each iteration, Algorithm 2 is called to determinethe next node on the path from the source to the destination .Algorithm 2 takes as an input an initial state , a boundaryset B, the GOP size and the set N . It solves the dynamicprogramming problem described in Section IV by first creatingthe state space of the system and then using the value iterationmethod, starting from the boundary set and moving backwards.At each stage of the process, it also computes the optimal policy.At the end of the computation, the ID of the best node to be selectedis returned by using the optimal policy for the first stage.In the source routing scheme, the routing decisions are madeat the source node ahead of time and before the packet entersthe network. Therefore, source routing is an open-loop controlproblem where all decisions have to be made in the beginning.The decisions are taken sequentially; a decision at a stage correspondsto the choice of the next-hop node at the node correspondingto the stage. The source node cannot know exactly thestate at the th stage of the selection process becauseof the randomness of the second component of the state.It has to estimate at each stage the value of and use this estimateto make a decision for that stage.The sequence of steps followed by each node in the networkis shown in Fig. 2.Algorithm 2: Next node in optimal pathInput: initial state , boundary set BInput: set of available nodes NInput: frame sizeOutput: next node in the optimal path1: /* Initialization Phase */2: C3: X N C4: X5:6: /* Optimal Control Computation */7: for TO 1 do8: if then9: for all X do10:11: end for12: else13: for all X do14:15:16:17:18: end for19: end if20: end for21:22: returnThe flowchart that corresponds to the operation of the sourcenode is depicted in Fig. 2(a), while the flowcharts for an intermediatenode and the destination node are shown in Fig. 2(b).VI. RESULTSWe show the performance gains of the proposed routingscheme via extensive simulations and testbed experiments.For the simulation experiments, we use the network simulatorns-2 [30]. The simulator provides a full protocol stack for awireless multihop network based on IEEE 802.11. We extendthe functionality of ns-2 by implementing our proposed routingscheme on top of the current protocol stack. For the testbed experiments,we implement our scheme using the Click modularrouter [31], [32]. We implement two different mechanisms andexperiment with each, one after another. The first mechanismestimates the ETX value for each link between a node and itsneighbors for all the nodes in the network. The mechanismbroadcasts periodically (every 1 s) small probe messages ofsize 32 B and checks for acknowledgments from the neighborsof the node. The routing policy computes the minimum ETXpath from the source to a destination and uses that path totransfer the video packets. The second mechanism implementsthe protocol defined in Section V in order to compute the routeson the wireless network that achieve minimum video distortion.Furthermore, we use EvalVid [33], which consists of a set oftools for the evaluation of the quality of video that is transmittedover a real or simulated network. The toolset supports different420 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 23, NO. 2, APRIL 2015Fig. 2. Flowchart for application-aware routing. (a) Source node. (b) Intermediateand destination node.performance metrics such as the PSNR and the MOS [34]. Toadapt the EvalVid to the ns-2 simulator, we follow the proceduredescribed in [35]. Specifically, for each simulated video flow betweentwo nodes in the network, we need to produce a sequenceof files. We start with the initial uncompressed video file thatconsists of a sequence of YUV frames [36]. Using the EvalVidtoolset, we transform the YUV format first to the MP4 and thento the MPEG4 format, which contains hints of how the video fileshould be transmitted over a network. When we do this, we donot constrain the GOP size to be the same from GOP to GOP,but rather, we let the tool decide the appropriate size for eachGOP based on the video clip content. We then need to capturea log from an attempted transmission over a real network. Thislog indicates which frame and at what time instance was transmittedover the network. The log is fed as an input to the ns-2simulation, which plays back the video transmission, producingat the end two sets of statistics regarding the transmission, onefor the sender and one for the receiver. By applying the EvalVidtoolset on this sequence of files, we can reconstruct the videofile as it is received by the destination and compare it to the initialvideo file. The comparison provides a measure of the videoquality degradation due to the transmissions over the network.A. Simulation ResultsTo evaluate the performance of the MDR protocol, we compareit against the minimum ETX routing scheme. We considera wireless multihop network that covers an area of 10001000 m . The nodes are distributed over this area accordingto a Poisson random field. The pair of nodes that constitute theFig. 3. Average PSNR for 5 and 10 video connections (Set-I).TABLE IVIDEO ENCODING PARAMETERSsource and destination in each case are selected at random. Ifthey happen to be neighbors, we discard that pair and repeat theprocess until we select a source and destination that are morethan one hop apart. Each node uses the IEEE 802.11b protocolwhere the propagation model is the Two Ray Ground, yielding acommunication range of about 250 m. Each set of experimentsis repeated 10 times, and the average value is reported in eachcase.In Table I, three sets of values are defined for the video encodingparameters. We vary the GOP size and the frame rateand thus, effectively, the video encoding rate. We keep the framesize constant as per the QCIF standard (176 144 pixels) andset the maximum packet size to 1024 B. Our simulation experimentsfocus on three metrics: 1) the PSNR, which is an objectivequality measure; 2) the MOS, which is a subjective qualitymetric; and 3) the delay experienced by each video connection.The effect of the node density on the PSNR is shown in Fig. 3.We plot the average PSNR for 5 and 10 concurrent video connectionsfor different node densities and for Set-I of the videoencoding parameters of Table I. We also plot the performanceof our proposed scheme (MDR) when instead of estimating theper-link packet-loss probabilities through the ETX metric, weuse the model in Section III-A to do so. In this case, we assumefull knowledge of the network topology, and so the state spacewhere we solve the optimal control problem of Section IV is asuperset of the state space when we collect the local estimatesof ETX through the network.We then fix the number of nodes to 20 (distributed as describedearlier) and compute the PSNR of each video connectionwhen: 1) the network serves four concurrent connections,and 2) when the number of concurrent connections is 8. In eachcase, the source–destination pairs are chosen uniformly fromamong the nodes in the network. We define the tail distributionof PSNR as the probability and plot itin Fig. 4 for the different traffic loads. The tail distribution ofPSNR that corresponds to Set-II of the video encoding parametersis shown in Fig. 4(a). For both the light and heavy trafficPAPAGEORGIOU et al.: DISTORTION-RESISTANT ROUTING FRAMEWORK FOR VIDEO TRAFFIC 421Fig. 4. Tail distribution of PSNR. (a) Set-II. (b) Set-III.loads (four and eight concurrent connections, respectively), theMDR protocol performs better, providing a higher percentageof paths that have a given PSNR value. As expected, a performancedegradation is observed for both schemes when the trafficload increases. This is due to the fact that under heavier trafficconditions in the network, the interference becomes more prevalent;furthermore, interference across adjacent links can be correlatedin some cases. Under such network conditions, the benefitsfrom the distortion-based optimization have a greater impacton the path selection process for the different types of frames ina video GOP as discussed earlier. The I-frames are sent on relativelyuncongested paths. With fur concurrent connections, themedian of PSNR is 17 for the minimum ETX policy and 18 forthe MDR protocol. The median decreases when the traffic loadincreases, and it is 9.5 and 10 for the minimum ETX and the application-aware schemes, respectively. The tail distribution ofPSNR that corresponds to the parameters of Set-III is shown inFig. 4(b). As is the case for Set-II, a large GOP size results ina denser state space, and therefore a better performance for theMDR protocol. In the case of the light traffic loads (four concurrentconnections), the median for the PSNR is 15 for the minimumETX scheme and 17 for MDR. Under heavier traffic loads(eight concurrent connections), Pthe median for the PSNR is 9for the minimum ETX scheme and 10.5 for the MDR protocol.The effects of the sensitivities, and , on the MDR-protocolare shown in Fig. 5. As before, the number of randomlyplaced nodes is set to 20. We compare the performance of theMDR protocol when and . Inthe first case, the sensitivity to the packet losses per frame is setto the maximum; in this case, a single packet loss in a framecauses the frame to be unrecoverable. Fig. 5(a) and (b) presentsthe same comparison for Set-II and Set-III of the encoding parameters,respectively. In both cases, relaxing the sensitivity ofan I- or P-frame to packet losses (i.e., increasing the value ofand ) deteriorates the performance of the scheme. A lowersensitivity (larger values of and ) diminishes the impact ofpacket losses on the video distortion, thus limiting the performancegains from using the scheme. For Set-II, the median ofthe PSNR is 17 for the minimum ETX scheme and 18 and 16 forMDR for and , respectively.When the video encoding parameters of Set-III are used, themedian values of the PSNR are 15 for the minimum ETX caseand 17 and 15 for the MDR protocol when and, respectively.Although the PSNR is the most widespread objective metricto measure the digital video quality, it does not always captureuser experience. A subjective quality measure that tries to capturehuman impression regarding the video quality is the MOS.Fig. 5. PSNR dependence on packet-loss sensitivity. (a) Set-II.. (b) Set-III.Fig. 6. Average mean opinion score.The metric uses a scale from 1 (worst) to 5 (best) to representuser satisfaction when watching a video clip [34].To evaluate the MOS with the MDR and ETX-based routing,we consider the wireless multihop network with the averagenumber of nodes equal to 20 (distributed as discussed earlier).The initial raw video is processed using the H.264 encoder witha maximum GOP size of 30 frames and a sampling frequencyof 30 frames per second. Fig. 6 shows the average MOS asthe number of concurrent video flows in the network increases.When the number of connections is three, the traffic load is low,and so both the ETX-based routing and MDR provide similaruser experience regarding video quality. As the traffic load increases,the distortion-based routing distributes the load acrossthe network, causing the I-frames to avoid highly congestedareas. When a moderate number of video flows are concurrentlyactive in the network, there is a significant gap in video qualityin favor of MDR. However, no significant gains are possiblewith MDR when congestion is high (more than nine concurrentvideo flows are active). In such cases, there are no congestion-free routes available to be used by MDR. This results inhigher MOS values, which translates to a better user experience.The delay characteristics of the two routing schemes areshown in Fig. 7 for Set-II of the video encoding parameters.The nodes are again randomly distributed according to aPoisson random field with varying density with values 14, 16,and 18. The traffic load corresponds to five concurrent videoconnections. We compute and plot the mean and varianceof the end-to-end delay for the five connections along withthe 95% confidence intervals. As seen in Fig. 7, for all threedifferent node densities, the MDR protocol produces routes thatexhibit less variability compared to the routes computed by theminimum ETX scheme. Smaller variability implies less jitter,which in turn suggests a better video quality as perceived bythe end-user. Moreover, because of the smaller variability, therequired sizes of buffers at the intermediate nodes is smaller.422 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 23, NO. 2, APRIL 2015Fig. 7. Delay characteristics for five concurrent connections (Set-II). (a) Meandelay. (b) Variance of delay.Note that this benefit is in addition in the reduction in distortionas discussed above. The primary reason for this reduction inthe delay is that the distortion-aware approach tries to avoidpaths that are congested; ETX, on the other hand, results inconvergence of flows onto a few good paths. For both routingschemes, the mean and variance of the delay increase withthe average number of nodes in the network. As the networkbecomes denser, the effect of interference becomes moreprofound, increasing the number of retransmissions and, thus,the delay. In contrast, a sparser network topology provides asmaller number of “good” routes, and thus it is more difficult toseparate flows and cope with congestion. It is in the moderatedensity regions, where the MDR protocol provides the mostbenefits in terms of delay and jitter.In order to understand how the characteristics of the videotraffic and, in particular, the motion level affect the distortion,we experiment with two classes of video clips: slow- and fastmotionvideo. The motion level of a video clip can be computedthrough appropriate detection algorithms; typically, these algorithmsclassify a video clip as a slow-motion or a fast-motionvideo. Tools such as PhysMo [37] and AForge [38] can be usedto perform this classification.We evaluate the MOS of slow- and fast-motion video flowswhen the MDR routing scheme is used. We consider a wirelessmultihop network with an average number of nodes equal to 20(distributed as discussed above). Fig. 8(a) shows the averageMOS for Set-II, and Fig. 8(b) shows the MOS in the case ofSet-III. In both cases, the slow-motion flows experience slightlylower distortion compared to the fast-motion videos and, thus,higher MOS. This is the result of the fact that in the slow-motionvideo clips, the I-frames carry most of the information. Due torapid changes in the content of a fast-motion clip, the P-framesare larger and contain more information than the P-frames forslow-motion video flows. The MDR routing scheme protectsthe I-frames by routing the corresponding packets through lesscongested paths. The P-frames are packed together on congestedpaths and could be lost. As evident from Fig. 8, such losses affectfast-motion video to a greater extent. However, as we increasethe traffic to extremely high levels (11 flows), the performanceof slow- and fast-motion videos is similar due to highframe losses.Next, we compare the behavior of MDR against a routingprotocol that chooses routes so as to minimize the overallexpected transmission time (ETT) [39]. The ETT is a functionof the loss rate and the bandwidth of the link. Therefore, it cancapture delays due to transmissions in multirate settings, unlikeETX, which only estimates the packet-loss ratio at the base rate.Fig. 8. Average value of the MOS for slow- and fast-motion video flows.(a) Set-II. (b) Set-III.Fig. 9. Comparison between the MDR and the ETT-based routing scheme.(a) Mean opinion score. (b) Mean delay.In Fig. 9, the comparison between MDR and the ETT-basedscheme is shown. The mean opinion score is shown in Fig. 9(a),where we observe a behavior similar to the one shown in Fig. 6.The average end-to-end delay is shown in Fig. 9(b). In contrastto what happens when the ETX is used, the routing mechanismthat minimizes the total ETT on the path from the source tothe destination yields smaller delays. However, the delays withMDR are comparable to those with ETT-based routing; in otherwords, the video quality is improved with minimum impact ondelay with MDR.B. Testbed ExperimentsNext, we evaluate the MDR protocol on a wireless indoortestbed composed of 41 nodes [40]. The nodes are based onthe Soekris net5501 hardware configuration and run a DebianLinux distribution. Each node is equipped with 500 MHz CPU,512 MB of RAM, and a WN-CM9 wireless mini-PCI card,which carries the AR5213 Atheros main chip. Each node usesIEEE 802.11a to avoid interference from co-located campusnetworks. To further minimize interference from these othernetworks, all experiments were performed at night. The networktopology of the testbed is shown in Fig. 10.The experiment setup consists of an initial raw video processedusing the H.264 encoder with a maximum GOP sizeof 30 frames. The traffic load ranges from 2 to 12 concurrentvideo flows, where the sender and receiver pairs are randomlyselected. Each scenario is repeated five times.To capture the effect of the ETX-based and MDR routingschemes on the user experience, we measure the average MOSas the number of concurrent video flows in the network increases.Fig. 11 shows that as the number of video connectionsin the network increases, the average MOS for both schemesdecreases. However, when the traffic load increases, the MDRprotocol computes multiple paths between the source and thePAPAGEORGIOU et al.: DISTORTION-RESISTANT ROUTING FRAMEWORK FOR VIDEO TRAFFIC 423Fig. 10. Network topology of the wireless network testbed.Fig. 11. Average value of MOS for a different number of concurrent videoflows.destination nodes and is better in distributing the load acrossthe network such that the frames at the beginning of a GOPavoid congestion. On the other hand, the shorter paths computedthrough the ETX-based scheme become quickly congested, resultingin heavy packet losses. As discussed, we observe thatthis primarily has a negative impact on correctly decoding therelatively longer (but more important) I-frames, resulting in aworse user experience.A visual comparison between Figs. 6 and 11 immediatelyshows the similarity in behaviors between our simulations andreal experiments, thereby validating the realism of our simulations.Fig. 12 shows snapshots from video clips transmittedover the testbed under different traffic conditions for both theETX-based and the MDR protocols. As shown in Fig. 11, whenthere are two connections in the network, the MOS for bothrouting schemes is the same. This is reflected in Fig. 12(a) and(b), where both snapshots are of very similar quality; in thiscase, the traffic load is fairly low, and congestion is not a bigissue (the flows do not cause high levels of interference to eachother). When there are eight concurrent video connections (andinterference across connections is more prevalent), the MDRprotocol achieves a higher MOS compared to the ETX-basedscheme. This is visually depicted in Fig. 12(c) and (d), wherethe snapshot in the case of MDR is much clearer than the noisysnapshot form the ETX-based protocol. Specifically, our protocoldistributes the I-frames across diverse paths with low interference;P-frames that are toward the end of GOPs are relativelypacked together onto more congested paths. The ETXFig. 12. User experience under different traffic loads. (a) Video snapshot—MDR (two connections). (b) Video snapshot—ETX (two connections).(c) Video snapshot—MDR (eight connections). (d) Video snapshot—ETX(eight connections).Fig. 13. Routes for I- and P-frames.metric, which is agnostic to video semantics, does not distinguishbetween frames and packs them together, causing highdistortion. It is difficult to explicitly prove that I- and P-framesfollow somewhat disjoint paths due to the stochastic nature ofthe process. The intuition, however, is based on the fact that thesensitivities and of the I- and P-frames, respectively, are, ingeneral, different. This has as a consequence that the frame-lossprobability for an I-frame is different from that of a P-frame,resulting in their choosing different routes.To illustrate this, we consider a simple network as shown inFig. 13 and perform an experiment with eight concurrent flows.For each video flow , we show in Fig. 13 thecorresponding sender and receiver denoted by and , respectively.We focus on flow 3 and show the different routesfor the I- and P-frames for both the MDR and the ETX-basedrouting protocols. We notice that when MDR is used, the routesfor the I- and P-frames are different. Specifically, the routes forthe packets that belong to I-frames, and the corresponding fractionsof the I-frame traffic they serve are as follows: 58.4% ofthe I-frame packets are routed through424 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 23, NO. 2, APRIL 2015; 14.6% of the I-frame traffic is routedthrough ; 16% isrouted through ; and 11% through. Meanwhile,the routes and traffic split for P-frame packets are as follows:56.7% of the P-frame packets use the path; 6.3% is using ;24% is routed through ; and 13%through . Noticethat the majority of I-frames are routed via a path that is disjointfrom the path followed by the majority of the P-frames.In contrast to the MDR case, when the ETX-based scheme isused, both the I- and P-frames are routed through the same path:.VII. CONCLUSIONIn this paper, we argue that a routing policy that is application-aware is likely to provide benefits in terms of user-perceivedperformance. Specifically, we consider a network thatprimarily carries video flows. We seek to understand the impactof routing on the end-to-end distortion of video flows. Towardthis, we construct an analytical model that ties video distortionto the underlying packet-loss probabilities. Using this model,we find the optimal route (in terms of distortion) between asource and a destination node using a dynamic programmingapproach. Unlike traditional metrics such as ETX, our approachtakes into account correlation across packet losses that influencevideo distortion. Based on our approach, we design a practicalrouting scheme that we then evaluate via extensive simulationsand testbed experiments. Our simulation study shows that thedistortion (in terms of PSNR) is decreased by 20% compared toETX-based routing.