A HYBRID CLOUD APPROACH FOR SECURE AUTHORIZED DEDUPLICATION

ABSTRACT:

Data deduplication is one of the most important data compression techniques for eliminating duplicate copies of repeating data, and has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. To protect the confidentiality of sensitive data while supporting deduplication, the convergent encryption technique has been proposed to encrypt the data before outsourcing. To better protect data security, this paper makes the first attempt to formally address the problem of authorized data deduplication. Different from traditional deduplication systems, the differential privileges of users are further considered in the duplicate check besides the data itself. We also present several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture. Security analysis demonstrates that our scheme is secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement a prototype of our proposed authorized duplicate check scheme and conduct test-bed experiments using our prototype. We show that our proposed authorized duplicate check scheme incurs minimal overhead compared to normal operations.

INTRODUCTION

Cloud computing provides seemingly unlimited “virtualized” resources to users as services across the whole Internet, while hiding platform and implementation details. Today’s cloud service providers offer both highly available storage and massively parallel computing resources at relatively low costs. As cloud computing becomes prevalent, an increasing amount of data is being stored in the cloud and shared by users with specified privileges, which define the access rights of the stored data. One critical challenge of cloud storage services is the management of the ever-increasing volume of data. To make data management scalable in cloud computing, deduplication has been a well-known technique and has attracted more and more attention recently. Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data in storage.

The technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. Instead of keeping multiple data copies with the same content, deduplication eliminates redundant data by keeping only one physical copy and referring other redundant data to that copy. Deduplication can take place at either the file level or the block level. File-level deduplication eliminates duplicate copies of the same file, while block-level deduplication eliminates duplicate blocks of data that occur in non-identical files. Although data deduplication brings a lot of benefits, security and privacy concerns arise as users’ sensitive data are susceptible to both insider and outsider attacks. Traditional encryption, while providing data confidentiality, is incompatible with data deduplication. Specifically, traditional encryption requires different users to encrypt their data with their own keys.

Thus, identical data copies of different users will lead to different ciphertexts, making deduplication impossible. Convergent encryption has been proposed to enforce data confidentiality while making deduplication feasible. It encrypts/decrypts a data copy with a convergent key, which is obtained by computing the cryptographic hash value of the content of the data copy. After key generation and data encryption, users retain the keys and send the ciphertext to the cloud. Since the encryption operation is deterministic and is derived from the data content, identical data copies will generate the same convergent key and hence the same ciphertext. To prevent unauthorized access, a secure proof of ownership protocol is also needed to provide the proof that the user indeed owns the same file when a duplicate is found. After the proof, subsequent users with the same file will be provided a pointer from the server without needing to upload the same file. A user can download the encrypted file with the pointer from the server, which can only be decrypted by the corresponding data owners with their convergent keys.
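A minimal Java sketch of the convergent-encryption idea is given below. Deriving the AES key and the CBC initialization vector from the SHA-256 content hash is our own simplification to keep the encryption deterministic; it illustrates the principle (K = H(M), C = E_K(M)) rather than the exact construction used in the paper.

import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Convergent encryption sketch: the key (and here also the IV) are derived from
// the content hash, so identical plaintexts produce identical ciphertexts and
// can be deduplicated by the storage server.
public class ConvergentEncryption {
    public static byte[] encrypt(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        SecretKeySpec key = new SecretKeySpec(digest, 0, 16, "AES");   // convergent key K = H(M)
        IvParameterSpec iv = new IvParameterSpec(digest, 16, 16);      // content-derived IV (simplification)
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, iv);
        return cipher.doFinal(data);                                   // ciphertext C = E_K(M); the user keeps K
    }
}

Because the ciphertext depends only on the content, the server can detect duplicates by comparing ciphertexts (or their hashes) without ever seeing the plaintext.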

Thus, convergent encryption allows the cloud to perform deduplication on the ciphertexts, and the proof of ownership prevents the unauthorized user from accessing the file. However, previous deduplication systems cannot support differential authorization duplicate check, which is important in many applications. In such an authorized deduplication system, each user is issued a set of privileges during system initialization (in Section 3, we elaborate the definition of a privilege with examples). Each file uploaded to the cloud is also bound to a set of privileges that specify which kind of users is allowed to perform the duplicate check and access the files. Before submitting his duplicate check request for some file, the user needs to take this file and his own privileges as inputs.

The user is able to find a duplicate for this file if and only if there is a copy of this file and a matched privilege stored in the cloud. For example, in a company, many different privileges will be assigned to employees. In order to save cost and manage data efficiently, the data will be moved to the storage cloud service provider (S-CSP) in the public cloud with specified privileges, and the deduplication technique will be applied to store only one copy of the same file. Because of privacy considerations, some files will be encrypted and will only allow the duplicate check by employees with specified privileges, so as to realize access control. Traditional deduplication systems based on convergent encryption, although providing confidentiality to some extent, do not support the duplicate check with differential privileges. In other words, no differential privileges have been considered in deduplication based on the convergent encryption technique. It seems contradictory to realize both deduplication and differential authorization duplicate check at the same time.
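As a rough illustration of how a duplicate-check query can be bound to a privilege, the sketch below derives the token from both the file tag and a per-privilege key, so a duplicate is only reported when both the content and the privilege match. This is a simplified stand-in, not the exact token construction of the proposed scheme, and the key management (e.g., by the private cloud) is omitted.

import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Privilege-bound duplicate-check token sketch: the token depends on the file
// content (via its tag) and on a per-privilege key, so users without a matching
// privilege cannot produce the token that finds the duplicate.
public class DuplicateCheckToken {
    public static byte[] token(byte[] fileContent, byte[] privilegeKey) throws Exception {
        byte[] fileTag = MessageDigest.getInstance("SHA-256").digest(fileContent); // tag = H(F)
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(privilegeKey, "HmacSHA256"));
        return hmac.doFinal(fileTag);                                              // token = HMAC(k_p, H(F))
    }
}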

A DISTRIBUTED THREE-HOP ROUTING PROTOCOL TO INCREASE THE CAPACITY OF HYBRID WIRELESS NETWORKS

ABSTRACT:

Hybrid wireless networks combining the advantages of both mobile ad-hoc networks and infrastructure wireless networks have been receiving increased attention due to their ultra-high performance. An efficient data routing protocol is important in such networks for high network capacity and scalability. However, most routing protocols for these networks simply combine the ad-hoc transmission mode with the cellular transmission mode, which inherits the drawbacks of ad-hoc transmission.

This paper presents a Distributed Three-hop Routing protocol (DTR) for hybrid wireless networks. To take full advantage of the widespread base stations, DTR divides a message data stream into segments and transmits the segments in a distributed manner. It makes full spatial reuse of a system via its high speed ad-hoc interface and alleviates mobile gateway congestion via its cellular interface. Furthermore, sending segments to a number of base stations simultaneously increases throughput and makes full use of widespread base stations.

DTR significantly reduces overhead due to short path lengths and the elimination of route discovery and maintenance. DTR also has a congestion control algorithm to avoid overloading base stations. Theoretical analysis and simulation results show the superiority of DTR in comparison with other routing protocols in terms of throughput capacity, scalability, and mobility resilience. The results also show the effectiveness of the congestion control algorithm in balancing the load between base stations.

INTRODUCTION:

Wireless networks, including infrastructure wireless networks and mobile ad-hoc networks (MANETs), have attracted significant research interest. The growing desire to increase wireless network capacity for high performance applications has stimulated the development of hybrid wireless networks. A hybrid wireless network consists of both an infrastructure wireless network and a mobile ad-hoc network. Wireless devices, such as smart-phones, tablets and laptops, have both an infrastructure interface and an ad-hoc interface. As the number of such devices has been increasing sharply in recent years, a hybrid transmission structure will be widely used in the near future. Such a structure synergistically combines the inherent advantages and overcomes the disadvantages of infrastructure wireless networks and mobile ad-hoc networks. In a mobile ad-hoc network, with the absence of a central control infrastructure, data is routed to its destination through intermediate nodes in a multi-hop manner. Multi-hop routing needs on-demand route discovery or route maintenance.

Since the messages are transmitted in wireless channels and through dynamic routing paths, mobile ad-hoc networks are not as reliable as infrastructure wireless networks. Furthermore, because of the multi-hop transmission feature, mobile ad-hoc networks are only suitable for local area data transmission. The infrastructure wireless network (e.g., cellular network) is the major means of wireless communication in our daily lives. It excels at inter-cell communication (i.e., communication between nodes in different cells) and Internet access. It makes possible the support of universal network connectivity and ubiquitous computing by integrating all kinds of wireless devices into the network. In an infrastructure network, nodes communicate with each other through base stations (BSes).

A hybrid wireless network synergistically combines an infrastructure wireless network and a mobile ad-hoc network to leverage their advantages and overcome their shortcomings, and finally increases the throughput capacity of a wide-area wireless network. A routing protocol is a critical component that affects the throughput capacity of a wireless network in data transmission. Most current routing protocols in hybrid wireless networks simply combine the cellular transmission mode (i.e., BS transmission mode) in infrastructure wireless networks and the ad-hoc transmission mode in mobile ad-hoc networks. That is, as shown in Fig. 1a, the protocols use the multi-hop routing to forward a message to the mobile gateway nodes that are closest to the BSes or have the highest bandwidth to the BSes. The bandwidth of a channel is the maximum throughput (i.e., transmission rate in bits/s) that can be achieved. The mobile gateway nodes then forward the messages to the BSes, functioning as bridges to connect the ad-hoc network and the infrastructure network.

Since BSes are connected with a wired backbone, we assume that there are no bandwidth and power constraints on transmissions between BSes. We use intermediate nodes to denote relay nodes that function as gateways connecting an infrastructure wireless network and a mobile ad-hoc network. We assume every mobile node is dual-mode; that is, it has an ad-hoc network interface (such as a WLAN radio interface) and an infrastructure network interface (such as a 3G cellular interface). DTR aims to shift the routing burden from the ad-hoc network to the infrastructure network by taking advantage of widespread base stations in a hybrid wireless network. Rather than using one multi-hop path to forward a message to one BS, DTR uses at most two hops to relay the segments of a message to different BSes in a distributed manner, and relies on BSes to combine the segments.

We simplify the routings in the infrastructure network for clarity. As shown in the figure, when a source node wants to transmit a message stream to a destination node, it divides the message stream into a number of partial streams called segments and transmits each segment to a neighbor node. Upon receiving a segment from the source node, a neighbor node locally decides between direct transmission and relay transmission based on the QoS requirement of the application. The neighbor nodes forward these segments in a distributed manner to nearby BSes. Relying on the infrastructure network routing, the BSes further transmit the segments to the BS where the destination node resides. The final BS rearranges the segments into the original order and forwards the segments to the destination. It uses the cellular IP transmission method [30] to send segments to the destination if the destination moves to another BS during segment transmission.
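The following Java sketch illustrates the DTR operation described above. The node model, capacity values, and segment size are illustrative assumptions rather than the protocol's actual data structures: the source spreads segments over its neighbors (first hop), and each neighbor relays a segment to a better-connected neighbor only when one exists (optional second hop), otherwise sending it directly to a BS.

import java.util.Arrays;
import java.util.List;

// Simplified sketch of DTR's distributed segment forwarding.
public class DtrForwarding {

    static class Node {
        final String id;
        final double capacityToBs;       // this node's uplink capacity to its nearest BS (assumed known)
        final List<Node> neighbors;
        Node(String id, double capacityToBs, List<Node> neighbors) {
            this.id = id; this.capacityToBs = capacityToBs; this.neighbors = neighbors;
        }
        // Optional second ad-hoc hop: relay only if some neighbor has a higher capacity to a BS.
        String forward(byte[] segment) {
            Node best = this;
            for (Node n : neighbors) if (n.capacityToBs > best.capacityToBs) best = n;
            return (best == this) ? "direct to BS from " + id : "relayed via " + best.id + " to BS";
        }
    }

    // First hop: the source splits the message stream into segments and hands each
    // segment to a different neighbor, spreading the load in several directions.
    static void send(byte[] message, List<Node> neighbors, int segmentSize) {
        int seq = 0;
        for (int off = 0; off < message.length; off += segmentSize) {
            byte[] segment = Arrays.copyOfRange(message, off, Math.min(off + segmentSize, message.length));
            Node next = neighbors.get(seq % neighbors.size());
            System.out.println("segment " + seq++ + " -> " + next.id + " -> " + next.forward(segment));
        }
    }
}

The path length therefore never exceeds three hops: at most two ad-hoc hops plus one cellular hop, and the destination-side BS reassembles the segments in their original order.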

LITERATURE SURVEY:

OPTIMAL MULTI-HOP CELLULAR ARCHITECTURE FOR WIRELESS COMMUNICATIONS

AUTHORS: Y. H. Tam, H. S. Hassanein, S. G. Akl, and R. Benkoczi

PUBLISH: Proc. Local Comput. Netw., 2006, pp. 738–745.

EXPLANATION:

Multi-hop relaying is an important concept in future generation wireless networks. It can address the inherent problems of limited capacity and coverage in cellular networks. However, most multi-hop relaying architectures are designed based on a small fixed cell size and a dense network. In a sparse network, the throughput and call acceptance ratio degrade because distant mobile nodes cannot reach the base station to use the available capacity. In addition, a fixed cell size cannot adapt to the dynamic changes of traffic pattern and network topology. In this paper, we propose a novel multi-hop relaying architecture called the adaptive multi-hop cellular architecture (AMC). AMC adapts the cell size to an optimal value that maximizes throughput by taking into account the dynamic changes of network density, traffic patterns, and network topology. To the best of our knowledge, this is the first time that adaptive (or optimal) cell size is accounted for in a multi-hop cellular environment. AMC also achieves the design goals of a good multi-hop relaying architecture. Simulation results show that AMC outperforms a fixed-cell-size multi-hop cellular architecture and a single-hop case in terms of data throughput and call acceptance ratio.

COOPERATIVE PACKET DELIVERY IN HYBRID WIRELESS MOBILE NETWORKS: A COALITIONAL GAME APPROACH

AUTHORS: K. Akkarajitsakul, E. Hossain, and D. Niyato

PUBLISH: IEEE Trans. Mobile Comput., vol. 12, no. 5, pp. 840–854, May 2013

EXPLANATION:

We consider the problem of cooperative packet delivery to mobile nodes in a hybrid wireless mobile network, where both infrastructure-based and infrastructure-less (i.e., ad hoc mode or peer-to-peer mode) communications are used. We propose a solution based on a coalition formation among mobile nodes to cooperatively deliver packets among these mobile nodes in the same coalition. A coalitional game is developed to analyze the behavior of the rational mobile nodes for cooperative packet delivery. A group of mobile nodes makes a decision to join or to leave a coalition based on their individual payoffs. The individual payoff of each mobile node is a function of the average delivery delay for packets transmitted to the mobile node from a base station and the cost incurred by this mobile node for relaying packets to other mobile nodes. To find the payoff of each mobile node, a Markov chain model is formulated and the expected cost and packet delivery delay are obtained when the mobile node is in a coalition. Since both the expected cost and packet delivery delay depend on the probability that each mobile node will help other mobile nodes in the same coalition to forward packets to the destination mobile node in the same coalition, a bargaining game is used to find the optimal helping probabilities. After the payoff of each mobile node is obtained, we find the solutions of the coalitional game which are the stable coalitions. A distributed algorithm is presented to obtain the stable coalitions and a Markov-chain-based analysis is used to evaluate the stable coalitional structures obtained from the distributed algorithm. Performance evaluation results show that when the stable coalitions are formed, the mobile nodes achieve a nonzero payoff (i.e., utility is higher than the cost). With a coalition formation, the mobile nodes achieve higher payoff than that when each mobile node acts alone.

EFFICIENT RESOURCE ALLOCATION IN HYBRID WIRELESS NETWORKS

AUTHORS: B. Bengfort, W. Zhang, and X. Du

PUBLISH: Proc. Wireless Commun. Netw. Conf., 2011, pp. 820–825.

EXPLANATION:

In this paper, we study an emerging type of wireless network – Hybrid Wireless Networks (HWNs). A HWN consists of an infrastructure wireless network (e.g., a cellular network) and several ad hoc nodes (such as a mobile ad hoc network). Forming a HWN is a very cost-effective way to improve wireless coverage and the available bandwidth to users. Specifically, in this work we investigate the issue of bandwidth allocation in multi-hop HWNs. We propose three efficient bandwidth allocation schemes for HWNs: top-down, bottom-up, and auction-based allocation schemes. In order to evaluate the bandwidth allocation schemes, we develop a simulated HWN environment. Our simulation results show that the proposed schemes achieve good performance: the schemes can achieve maximum revenue/utility in many cases, while also providing fairness. We also show that each of the schemes has merit in different application scenarios.

SYSTEM ANALYSIS

EXISTING SYSTEM:

Existing methods follow the two-hop transmission protocol, which eliminates route maintenance and limits the number of hops in routing. In Two-hop, when a node’s bandwidth to a BS is larger than that of each of its neighbors, it directly sends a message to the BS. Otherwise, it chooses a neighbor with a higher channel bandwidth and sends the message to that neighbor, which further forwards the message to the BS. In contrast to DTR, which uses distributed transmission involving multiple cells, making full use of system resources and dynamically balancing the traffic load between neighboring cells, Two-hop employs single-path transmission.

Direct combination of the two transmission modes inherits the following problems that are rooted in the ad-hoc transmission mode. 

High overhead: Route discovery and maintenance incur high overhead. The wireless random access medium access control (MAC) required in mobile ad-hoc networks, which utilizes control handshaking and a back-off mechanism, further increases overhead. 

Hot spots: The mobile gateway nodes can easily become hot spots. The RTS-CTS random access, in which most traffic goes through the same gateway, and the flooding employed in mobile ad-hoc routing to discover routes may exacerbate the hot spot problem. In addition, mobile nodes only use the channel resources in their route direction, which may generate hot spots while leaving resources in other directions under-utilized. Hot spots lead to low transmission rates, severe network congestion, and high data dropping rates.

Low reliability: Dynamic and long routing paths lead to unreliable routing. Noise interference and neighbor interference during the multi-hop transmission process cause a high data drop rate. Long routing paths increase the probability of the occurrence of path breakdown due to the highly dynamic nature of wireless ad-hoc networks.

DISADVANTAGES:

  • Route discovery and maintenance incur high overhead.
  • The mobile gateway nodes can easily become hot spots.
  • Dynamic and long routing paths lead to unreliable routing.
  • Noise interference and neighbor interference during the multi-hop transmission process cause a high data drop rate.
  • Long routing paths increase the probability of the occurrence of path breakdown due to the highly dynamic nature of wireless ad-hoc networks.

PROPOSED SYSTEM:

We propose a Distributed Three-hop Data Routing protocol (DTR). In DTR, as shown in Fig. 1b, a source node divides a message stream into a number of segments. Each segment is sent to a neighbor mobile node. Based on the QoS requirement, these mobile relay nodes choose between direct transmission and relay transmission to the BS. In relay transmission, a segment is forwarded to another mobile node with higher capacity to a BS than the current node. In direct transmission, a segment is directly forwarded to a BS. In the infrastructure, the segments are rearranged in their original order and sent to the destination. The number of routing hops in DTR is confined to three, including at most two hops in the ad-hoc transmission mode and one hop in the cellular transmission mode. To overcome the aforementioned shortcomings, DTR tries to limit the number of hops. The first-hop forwarding distributes the segments of a message in different directions to fully utilize the resources, and the possible second-hop forwarding ensures the high capacity of the forwarder.

DTR also has a congestion control algorithm to balance the traffic load between the nearby BSes in order to avoid traffic congestion at BSes. Using self-adaptive and distributed routing with high speed and short-path ad-hoc transmission, DTR significantly increases the throughput capacity and scalability of hybrid wireless networks by overcoming the three shortcomings of the previous routing algorithms.
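A toy sketch of this congestion-control idea is given below; the load indicator and the selection rule are illustrative assumptions intended only to convey how a sender can steer segments toward the least-loaded reachable BS, not the actual DTR algorithm.

import java.util.List;

// Congestion-control sketch: among the base stations a node can reach, pick the
// one with the lowest current load so traffic is balanced between neighboring cells.
public class BsLoadBalancer {

    static class BaseStation {
        final String id;
        int queuedSegments;              // assumed load indicator advertised by the BS
        BaseStation(String id) { this.id = id; }
    }

    static BaseStation chooseBs(List<BaseStation> reachable) {
        BaseStation best = reachable.get(0);
        for (BaseStation bs : reachable) {
            if (bs.queuedSegments < best.queuedSegments) best = bs;   // prefer the least congested BS
        }
        best.queuedSegments++;                                        // account for the segment about to be sent
        return best;
    }
}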

DTR has the following features:

  • Low overhead: It eliminates overhead caused by route discovery and maintenance in the ad-hoc transmission mode, especially in a dynamic environment.
  • Hot spot reduction: It alleviates traffic congestion at mobile gateway nodes while making full use of channel resources through a distributed multi-path relay.
  • High reliability: Because of its small hop path length with a short physical distance in each step, it alleviates noise and neighbor interference and avoids the adverse effect of route breakdown during data transmission. Thus, it reduces the packet drop rate and makes full use of spatial reuse, in which several source and destination nodes can communicate simultaneously without interference.

ADVANTAGES:

  • DTR eliminates overhead caused by route discovery and maintenance in the ad-hoc transmission mode, especially in a dynamic environment.
  • DTR alleviates traffic congestion at mobile gateway nodes while making full use of channel resources through a distributed multi-path relay.
  • Because of its small hop path length with a short physical distance in each step, it alleviates noise and neighbor interference and avoids the adverse effect of route breakdown during data transmission.
  • DTR reduces the packet drop rate and makes full use of spatial reuse, in which several source and destination nodes can communicate simultaneously without interference.
  • Network with High Throughput Performance.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor                                 –    Pentium IV

  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk      –   20 GB
  • Floppy Drive       –    1.44 MB
  • Key Board      –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor      –    SVGA

SOFTWARE REQUIREMENTS:

JAVA

  • Operating System        :           Windows XP or Win7
  • Front End       :           JAVA JDK 1.7
  • Back End :           MYSQL Server
  • Server :           Apache Tomcat Server
  • Script :           JSP Script
  • Document :           MS-Office 2007

A DISTORTION-RESISTANT ROUTING FRAMEWORK FOR VIDEO TRAFFIC IN WIRELESS MULTIHOP NETWORKS

ABSTRACT:

Traditional routing metrics designed for wireless networks are application agnostic. In this paper, we consider a wireless network where the application flows consist of video traffic. From a user perspective, reducing the level of video distortion is critical. We ask the question “Should the routing policies change if the end-to-end video distortion is to be minimized?” Popular link-quality-based routing metrics (such as ETX) do not account for dependence (in terms of congestion) across the links of a path; as a result, they can cause video flows to converge onto a few paths and, thus, cause high video distortion. To account for the evolution of the video frame loss process, we construct an analytical framework to, first, understand and, second, assess the impact of the wireless network on video distortion. The framework allows us to formulate a routing policy for minimizing distortion, based on which we design a protocol for routing video traffic. We find via simulations and testbed experiments that our protocol is efficient in reducing video distortion and minimizing the user experience degradation.

INTRODUCTION

With the advent of smart phones, video traffic has become very popular in wireless networks. In tactical networks or disaster recovery, one can envision the transfer of video clips to facilitate mission management. From a user perspective, maintaining a good quality of the transferred video is critical. The video quality is affected by: 1) the distortion due to compression at the source, and 2) the distortion due to both wireless channel induced errors and interference. Video encoding standards, like MPEG-4 [1] or H.264/AVC, define groups of I-, P-, and B-type frames that provide different levels of encoding and, thus, protection against transmission losses. In particular, the different levels of encoding refer to: 1) either information encoded independently, in the case of I-frames, or 2) encoding relative to the information encoded within other frames, as is the case for P- and B-frames.

This Group of Pictures (GOP) allows for the mapping of frame losses into a distortion metric that can be used to assess the application-level performance of video transmissions. One of the critical functionalities that is often neglected, but affects the end-to-end quality of a video flow, is routing. Typical routing protocols, designed for wireless multihop settings, are application-agnostic and do not account for correlation of losses on the links that compose a route from a source to a destination node. Furthermore, since flows are considered independently, they can converge onto certain links that then become heavily loaded (thereby increasing video distortion), while others are significantly underutilized. The decisions made by such routing protocols are based on only network (and not application) parameters.

Our thesis is that the user-perceived video quality can be significantly improved by accounting for application requirements, and specifically the video distortion experienced by a flow, end-to-end. Typically, the schemes used to encode a video clip can accommodate a certain number of packet losses per frame. However, if the number of lost packets in a frame exceeds a certain threshold, the frame cannot be decoded correctly. A frame loss will result in some amount of distortion. The value of distortion at a hop along the path from the source to the destination depends on the positions of the unrecoverable video frames (simply referred to as frames) in the GOP, at that hop. As one of our main contributions, we construct an analytical model to characterize the dynamic behavior of the process that describes the evolution of frame losses in the GOP (instead of just focusing on a network quality metric such as the packet-loss probability) as video is delivered on an end-to-end path. Specifically, with our model, we capture how the choice of path for an end-to-end flow affects the performance of a flow in terms of video distortion.

Our model is built based on a multilayer approach in which the packet-loss probability on a link is mapped to the probability of a frame loss in the GOP. The frame-loss probability is then directly associated with the video distortion metric. By using the above mapping from the network-specific property (i.e., packet-loss probability) to the application-specific quality metric (i.e., video distortion), we pose the problem of routing as an optimization problem where the objective is to find the path from the source to the destination that minimizes the end-to-end distortion. In our formulation, we explicitly take into account the history of losses in the GOP along the path. This is in stark contrast with traditional routing metrics (such as the expected transmission count, ETX), wherein the links are treated independently.
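To make the first layer of this mapping concrete, the sketch below computes a frame-loss probability from a link's packet-loss probability under two simplifying assumptions that are ours, not the paper's full model: packet losses are independent, and a frame of n packets is decodable whenever at most t of its packets are lost.

// Toy mapping from a link's packet-loss probability to a frame-loss probability.
public class FrameLossModel {

    // P(frame lost) = P(more than t of the n packets are lost), X ~ Binomial(n, p).
    static double frameLossProbability(int n, int t, double p) {
        double decodable = 0.0;
        for (int k = 0; k <= t; k++) {
            decodable += binomial(n, k) * Math.pow(p, k) * Math.pow(1 - p, n - k);
        }
        return 1.0 - decodable;
    }

    // Binomial coefficient C(n, k) computed iteratively in double precision.
    static double binomial(int n, int k) {
        double c = 1.0;
        for (int i = 1; i <= k; i++) c = c * (n - k + i) / i;
        return c;
    }
}

For example, a frame of n = 10 packets that tolerates t = 2 losses on a link with packet-loss probability p = 0.1 is lost with probability of roughly 7 percent under these assumptions.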

Our solution to the problem is based on a dynamic programming approach that effectively captures the evolution of the frame-loss process. We then design a practical routing protocol, based on the above solution, to minimize routing distortion. In a nutshell, since the loss of the longer I-frames that carry fine-grained information affects the distortion metric more, our approach ensures that these frames are carried on the paths that experience the least congestion; the latter frames in a GOP are sent out on relatively more congested paths. Our routing scheme is optimized for transferring video clips on wireless networks with minimum video distortion. Since optimizing for video streaming is not an objective of our scheme, constraints relating to time (such as jitter) are not directly taken into account in the design.

LITERATURE SURVEY

TITLE: AN EVALUATION FRAMEWORK FOR MORE REALISTIC SIMULATIONS OF MPEG VIDEO TRANSMISSION

PUBLICATION: J. Inf. Sci. Eng., vol. 24, no. 2, pp. 425–440, Mar. 2008.

AUTHORS: C.-H. Ke, C.-K. Shieh, W.-S. Hwang, and A. Ziviani

EXPLANATION:

We present a novel and complete tool-set for evaluating the delivery quality of MPEG video transmissions in simulations of a network environment. This tool-set is based on the EvalVid framework. We extend the connecting interfaces of EvalVid to replace its simple error simulation model by a more general network simulator like NS2. With this combination, researchers and practitioners in general can analyze through simulation the performance of real video streams, i.e. taking into account the video semantics, under a large range of network scenarios. To demonstrate the usefulness of our new tool-set, we point out that it enables the investigation of the relationship between two popular objective metrics for Quality of Service (QoS) assessment of video quality delivery: the PSNR (Peak Signal to Noise Ratio) and the fraction of decodable frames. The results show that the fraction of decodable frames reflects well the behavior of the PSNR metric, while being less time-consuming. Therefore, the fraction of decodable frames can be an alternative metric to objectively assess through simulations the delivery quality of transmission in a network of publicly available video trace files.

TITLE: MULTIPATH ROUTING OVER WIRELESS MESH NETWORKS FOR MULTIPLE DESCRIPTION VIDEO TRANSMISSION

PUBLICATION: IEEE J. Sel. Areas Commun., vol. 28, no. 3, pp. 321–331, Apr. 2010.

AUTHORS: B. Rong, Y. Qian, K. Lu, R. Qingyang, and M. Kadoch

EXPLANATION:

In the past few years, wireless mesh networks (WMNs) have drawn significant attention from academia and industry as a fast, easy, and inexpensive solution for broadband wireless access. In WMNs, it is important to support video communications in an efficient way. To address this issue, this paper studies the multipath routing for multiple description (MD) video delivery over IEEE 802.11 based WMN. Specifically, we first design a framework to transmit MD video over WMNs through multiple paths; we then investigate the technical challenges encountered. In our proposed framework, multipath routing relies on the maximally disjoint paths to achieve good traffic engineering performance. However, video applications usually have strict delay requirements, which make it difficult to find multiple qualified paths with the least joints. To overcome this problem, we develop an enhanced version of Guaranteed-Rate (GR) packet scheduling algorithm, namely virtual reserved rate GR (VRR-GR), to shorten the packet delay of video communications in multiservice network environment. Simulation study shows that our proposed approach can reduce the latency of video delivery and achieve desirable traffic engineering performance in multipath routing environment.

TITLE: PERFORMANCE EVALUATION OF H.264/SVC VIDEO STREAMING OVER MOBILE WIMAX

PUBLICATION: Comput. Netw., vol. 55, no. 15, pp. 3578–3591, Oct. 2011.

AUTHORS: D. Migliorini, E. Mingozzi, and C. Vallati

EXPLANATION:

Mobile broadband wireless networks, such as mobile WiMAX, have been designed to support several features like, e.g., Quality of Service (QoS) or enhanced data protection mechanisms, in order to provide true access to real-time multimedia applications like Voice over IP or Video on Demand. On the other hand, recently defined video coding schemes, like H.264 scalable video coding (H.264/SVC), are evolving in order to better adapt to such mobile environments with heterogeneous clients and time-varying available capacity. In this work we assess the performance of H.264/SVC video streaming over mobile WiMAX under realistic network conditions. To this aim, we make use of specific metrics, like PSNR (Peak Signal to Noise Ratio) or MOS (Mean Opinion Score), which are related to the quality of experience as perceived by the end user. Simulation results show that the performance is sensitive to the different available H.264/SVC encoding options, which respond differently to the loss of data in the network. On the other hand, if aggressive error recovery based on WiMAX data protection mechanisms is used, this might lead to unacceptable latencies in the video play out, especially for those mobiles with poor wireless channel characteristics.

SYSTEM ANALYSIS

EXISTING SYSTEM:

Existing methods in WMNs must support video communications in an efficient way. To address this issue, prior work studies single-path routing for multiple description (MD) video delivery over IEEE 802.11-based WMNs. Specifically, a framework is first designed to transmit MD video over WMNs; the technical challenges encountered are then investigated. In this framework, routing relies on maximally disjoint paths to achieve good traffic engineering performance.

However, video applications usually have strict delay requirements, which make it difficult to find multiple qualified paths with the least joints. An enhanced version of the Guaranteed-Rate (GR) packet scheduling algorithm, namely virtual reserved rate GR (VRR-GR), is used to shorten the packet delay of video communications in a multiservice network environment. Simulation studies show that the existing approach can reduce the latency of video delivery and achieve desirable traffic engineering performance in a single-path routing environment.

DISADVANTAGES:

  • Different approaches exist for handling such encoding and transmission; the Multiple Description Coding technique fragments the initial video clip into a number of substreams, called descriptions, to cope with packet losses.
  • The descriptions are transmitted on the network over disjoint paths. These descriptions are equivalent in the sense that any one of them is sufficient for the decoding process, even with a very small buffer.
  • Layered Coding produces a base layer and multiple enhancement layers. The enhancement layers serve only to refine the base-layer quality and are not useful on their own; moreover, the routing is single-path.

PROPOSED SYSTEM:

In this paper, our thesis is that the user-perceived video quality can be significantly improved by accounting for application requirements, and specifically the video distortion experienced by a flow, end-to-end. Typically, the schemes used to encode a video clip can accommodate a certain number of packet losses per frame. However, if the number of lost packets in a frame exceeds a certain threshold, the frame cannot be decoded correctly. A frame loss will result in some amount of distortion. The value of distortion at a hop along the path from the source to the destination depends on the positions of the unrecoverable video frames (simply referred to as frames) in the GOP, at that hop. As one of our main contributions, we construct an analytical model to characterize the dynamic behavior of the process that describes the evolution of frame losses in the GOP (instead of just focusing on a network quality metric such as the packet-loss probability) as video is delivered on an end-to-end path.

Specifically, with our model, we capture how the choice of path for an end-to-end flow affects the performance of a flow in terms of video distortion. Our model is built based on a multilayer approach as shown in Fig. 1. The packet-loss probability on a link is mapped to the probability of a frame loss in the GOP. The frame-loss probability is then directly associated with the video distortion metric. By using the above mapping from the network-specific property (i.e., packet-loss probability) to the application-specific quality metric (i.e., video distortion), we pose the problem of routing as an optimization problem where the objective is to find the path from the source to the destination that minimizes the end-to-end distortion.
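The sketch below conveys the resulting optimization in a deliberately simplified form: each candidate path is scored by the expected distortion derived from its per-link frame-loss probabilities and per-frame distortion weights, and the minimum-distortion path is chosen. The paper's dynamic-programming formulation additionally tracks the loss history inside the GOP, which this illustration omits, and all inputs here are assumed to be available to the source.

import java.util.List;

// Simplified minimum-distortion path selection.
public class MinDistortionRouting {

    // Expected distortion of one path: a frame is lost end-to-end if it is lost on
    // any link (independence assumed), weighted by each frame's distortion impact.
    static double pathDistortion(double[] linkFrameLoss, double[] frameDistortionWeight) {
        double delivered = 1.0;
        for (double p : linkFrameLoss) delivered *= (1.0 - p);
        double endToEndLoss = 1.0 - delivered;
        double distortion = 0.0;
        for (double w : frameDistortionWeight) distortion += w * endToEndLoss;
        return distortion;
    }

    // Return the index of the candidate path with the smallest expected distortion.
    static int chooseBestPath(List<double[]> candidatePaths, double[] frameDistortionWeight) {
        int best = 0;
        double bestCost = Double.MAX_VALUE;
        for (int i = 0; i < candidatePaths.size(); i++) {
            double cost = pathDistortion(candidatePaths.get(i), frameDistortionWeight);
            if (cost < bestCost) { bestCost = cost; best = i; }
        }
        return best;
    }
}

Because I-frames carry the largest distortion weights, a scheme following this objective naturally pushes them onto the least congested (lowest-loss) paths, which matches the behavior described earlier for the protocol.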

ADVANTAGES:

Developing an analytical framework to capture the impact of routing on video distortion: as our primary contribution, we develop an analytical framework that captures the impact of routing on the end-to-end video quality in terms of distortion.

 Specifically, the framework facilitates the computation of routes that are optimal in terms of achieving the minimum distortion. The model takes into account the joint impact of the PHY and MAC layers and the application semantics on the video quality.

Design of a practical routing protocol for distortion-resilient video delivery: Based on our analysis, we design a practical routing protocol for a network that primarily carries wireless video. The practical protocol allows a source to collect distortion information on the links in the network and distribute traffic across the different paths in accordance with: 1) the distortion, and 2) the position of a frame in the GOP.

Evaluations via extensive experiments: We demonstrate via extensive simulations and real testbed experiments on a multihop 802.11a testbed that our protocol is extremely effective in reducing the end-to-end video distortion and keeping the user experience degradation to a minimum.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor                                 –    Pentium IV

  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk       –   20 GB
  • Floppy Drive        –    1.44 MB
  • Key Board       –    Standard Windows Keyboard
  • Mouse        –    Two or Three Button Mouse
  • Monitor       –    SVGA

SOFTWARE REQUIREMENTS:

  • Operating System          :           Windows XP or Win7
  • Front End        :           JAVA JDK 1.7
  • Tools                                     :           Netbeans 7
  • Document :           MS-Office 2007

Single Image Super-Resolution Based on Gradient Profile Sharpness

ABSTRACT

In this paper, a novel image super-resolution algorithm is proposed based on GPS (Gradient Profile Sharpness). GPS is an edge sharpness metric, which is extracted from two gradient description models, i.e. a triangle model and a Gaussian mixture model, for the description of different kinds of gradient profiles. Then the transformation relationship of GPSs in different image resolutions is studied statistically, and the parameter of the relationship is estimated automatically. Based on the estimated GPS transformation relationship, two gradient profile transformation models are proposed for the two profile description models, which can keep profile shape and profile gradient magnitude sum consistent during profile transformation. Finally, the target gradient field of the HR (high resolution) image is generated from the transformed gradient profiles, which is added as the image prior in the HR image reconstruction model. Extensive experiments are conducted to evaluate the proposed algorithm in subjective visual effect, objective quality, and computation time. The experimental results demonstrate that the proposed approach can generate superior HR images with better visual quality, lower reconstruction error and acceptable computation efficiency as compared to state-of-the-art works.

Algorithm:

Super-resolution algorithm:

This algorithm is used to increase the resolution of an image, i.e. to generate a high-resolution output from a low-resolution input.

HR: High Resolution

Existing System                       

Single image super-resolution is a classic and active image processing problem, which aims to generate a high resolution image from a low resolution input image. Due to the severely under-determined nature of this problem, an effective image prior is necessary to make the problem solvable and to improve the quality of generated images.

Proposed System

  • More sophisticated interpolation models have also been proposed.
  • To reduce the dependence on the training HR image, self-example based approaches were proposed, which utilized the observation that patches tended to redundantly recur inside an image, within the same image scale as well as across different scales, or that there existed a transformation relationship across image scales.
  • These approaches are more robust; however, there are always some artifacts in their super-resolution results. Generally, the computational complexity of learning-based super-resolution approaches is quite high.
  • Various regularization terms have been proposed based on local gradient enhancement and global gradient sparsity. Recently, metrics of edge sharpness have attracted researchers’ attention as regularization terms, since edges are of primary importance in visual image quality.
  • Based on the transformed GPS, two gradient profile transformation models are proposed, which can well keep profile shape and profile gradient magnitude sum consistent during the profile transformation.
  • Finally, the target gradient field of the HR (high resolution) image is generated from the transformed gradient profiles, which is added as the image prior in the HR image reconstruction model.

MODULES

  • single image super-resolution
  • Gradient Profile Sharpness
  • Color Transfer
  • Multiple-reference color transfer
Single image super-resolution:

Single-image super-resolution refers to the task of constructing a high-resolution enlargement of a given low-resolution image. Usual interpolation-based magnification introduces blurring. The problem is then cast into estimating missing high-frequency details. Based on the framework of Freeman et al.:

  1. interpolation of the input low-resolution image into the desired scale
  2. generation of a set of candidate images based on patch-wise regression: kernel ridge regression is utilized; To reduce the time complexity a sparse basis is found by combining kernel matching pursuit and gradient descent
  3. combining candidates to produce an image: patch-wise regression of output results in a set of candidates for each pixel location; An image output is obtained by combining the candidates based on estimated confidences for each pixel.
  4. post-processing based on the discontinuity prior of images: as a regularization method, kernel ridge regression tends to smooth major edges; The natural image prior proposed by Tappen et al. [2] is utilized to post-process the regression result such that the discontinuity at major edges are preserved.

Gradient Profile Sharpness:

A Novel edge sharpness metric GPS (gradient profile sharpness) is extracted as the eccentricity of gradient profile description models, which considers both the gradient magnitude and the spatial scattering of a gradient profile.

To precisely describe different kinds of gradient profile shapes, a triangle model and a mixed Gaussian model are proposed for short gradient profiles and heavy-tailed gradient profiles respectively. Then the pairs of GPS values under different image resolutions are studied statistically, and a linear GPS transformation relationship is formulated, whose parameter can be estimated automatically in each super-resolution application. Based on the transformed GPS, two gradient profile transformation models are proposed, which can well keep profile shape and profile gradient magnitude sum consistent during the profile transformation.
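To give a flavor of such an edge-sharpness measure, the Java fragment below scores a sampled 1-D gradient profile by its peak gradient magnitude relative to its spatial spread. This is only an illustrative stand-in under our own assumptions, and is not the paper's eccentricity-based GPS definition for the triangle and mixed-Gaussian models.

// Illustrative sharpness score for a 1-D gradient profile sampled across an edge:
// sharp edges have a high gradient peak concentrated in a small spatial spread.
public class ProfileSharpness {

    static double sharpness(double[] gradientMagnitude) {
        double sum = 0, peak = 0, centre = 0;
        for (int i = 0; i < gradientMagnitude.length; i++) {
            sum += gradientMagnitude[i];
            peak = Math.max(peak, gradientMagnitude[i]);
            centre += i * gradientMagnitude[i];
        }
        if (sum == 0) return 0;                            // flat region: no edge, zero sharpness
        centre /= sum;                                     // magnitude-weighted centre of the profile
        double spread = 0;
        for (int i = 0; i < gradientMagnitude.length; i++) {
            spread += gradientMagnitude[i] * (i - centre) * (i - centre);
        }
        spread = Math.sqrt(spread / sum);                  // spatial scattering of the profile
        return peak / (spread + 1e-9);
    }
}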

Two gradient profile transformation models are proposed, and the solution of the HR image reconstruction model is introduced. Moreover, detailed experimental comparisons are made between the proposed approach and other state-of-the-art super-resolution methods, which are demonstrated in the experimental section.

Color Transfer:

Color transfer was first proposed as a way to match the means and variances between the target and the reference in a low-correlated color space. This approach was efficient enough, but the simple mean and variance matching was likely to produce a slight grain effect and serious color distortion. To prevent the grain effect, Chang et al. proposed a color-category-based approach that categorized each pixel as one of the basic categories. Then a convex hull was generated in color space for each category of the pixel set, and the color transformation was applied to each pair of convex hulls of the same category.
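The statistics-matching step described above can be sketched per channel as follows; representing a channel as a flat array and the choice of color space are simplifications for illustration.

// Per-channel mean/variance matching: shift and scale the target channel so its
// statistics match those of the reference channel.
public class ColorTransfer {

    static double[] matchChannel(double[] target, double[] reference) {
        double tMean = mean(target), rMean = mean(reference);
        double tStd = std(target, tMean), rStd = std(reference, rMean);
        double[] out = new double[target.length];
        for (int i = 0; i < target.length; i++) {
            out[i] = (target[i] - tMean) * (rStd / (tStd + 1e-9)) + rMean;
        }
        return out;
    }

    static double mean(double[] x) {
        double s = 0; for (double v : x) s += v; return s / x.length;
    }

    static double std(double[] x, double m) {
        double s = 0; for (double v : x) s += (v - m) * (v - m); return Math.sqrt(s / x.length);
    }
}

Applying this simple matching to each channel reproduces the reference's overall color statistics, but, as noted above, it can introduce a grain effect and color distortion, which motivates the category-based refinements.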

Multiple-reference color transfer:

Multiple-reference color transfer requires naturally blending the colors from multiple references. However, as illustrated, the main differences exist among the references themselves. Although both references share the sunshine theme, they differ considerably in color appearance. This difference can easily lead to a grain effect in the result; as illustrated, such a result shows a serious grain effect. An approach that adopts gradient correction can suppress the grain, but it does not prevent the color distortion. Our approach deals with the grain effect and distortion in each step; therefore, we can achieve a visually satisfactory result.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor                                 –    Pentium IV

  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk      –   20 GB
  • Floppy Drive       –    1.44 MB
  • Key Board      –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor              –    SVGA

SOFTWARE REQUIREMENTS:

JAVA

  • Operating System        :           Windows XP or Win7
  • Front End       :           JAVA JDK 1.7
  • Back End :           MYSQL Server
  • Server :           Apache Tomcat Server
  • Script :           JSP Script
  • Document :           MS-Office 2007

.NET

  • Operating System        :           Windows XP or Win7
  • Front End       :           Microsoft Visual Studio .NET 2008
  • Script :           C# Script
  • Back End :           MS-SQL Server 2005
  • Document :           MS-Office 2007

REAL-TIME BIG DATA ANALYTICAL ARCHITECTURE FOR REMOTE SENSING APPLICATION

ABSTRACT:

In today’s era, there is a great deal more to real-time remote sensing Big Data than it seems at first, and extracting the useful information in an efficient manner leads a system toward major computational challenges, such as analyzing, aggregating, and storing data that are remotely collected. Keeping in view the above mentioned factors, there is a need for designing a system architecture that supports both real-time and offline data processing. In this paper, we propose a real-time Big Data analytical architecture for remote sensing satellite application.

The proposed architecture comprises three main units:

1) Remote sensing Big Data acquisition unit (RSDU);

2) Data processing unit (DPU); and

3) Data analysis decision unit (DADU).

First, RSDU acquires data from the satellite and sends this data to the Base Station, where initial processing takes place. Second, DPU plays a vital role in architecture for efficient processing of real-time Big Data by providing filtration, load balancing, and parallel processing. Third, DADU is the upper layer unit of the proposed architecture, which is responsible for compilation, storage of the results, and generation of decision based on the results received from DPU.

INTRODUCTION:

Recently, a great deal of interest in the field of Big Data and its analysis has arisen, mainly driven by an extensive number of research challenges strongly related to real (bona fide) applications, such as modeling, processing, querying, mining, and distributing large-scale repositories. The term “Big Data” classifies specific kinds of data sets comprising formless data, which dwell in the data layer of technical computing applications and the Web. The data stored in the underlying layer of all these technical computing application scenarios have some precise individualities in common, such as 1) large-scale data, which refers to the size and the data warehouse; 2) scalability issues, which refer to the application’s likelihood of running on a large scale (e.g., Big Data); 3) support of the extraction, transformation, and loading (ETL) method from low-level, raw data to well thought-out data, up to a certain extent; and 4) development of uncomplicated, interpretable analytics over Big Data warehouses, with a view to delivering intelligent and meaningful knowledge from them.

Big Data are usually generated by online transactions, video/audio, email, numbers of clicks, logs, posts, social network data, scientific data, remote access sensory data, mobile phones, and their applications. These data are accumulated in databases that grow extraordinarily and become complicated to confine, form, store, manage, share, process, analyze, and visualize via typical database software tools. Advancement in Big Data sensing and computer technology revolutionizes the way remote data are collected, processed, analyzed, and managed. In particular, the most recently designed sensors used in the earth and planetary observatory system are generating a continuous stream of data. Moreover, a majority of work has been done in various fields of remote sensing satellite image data, such as change detection, gradient-based edge detection, region-similarity-based edge detection, and intensity gradient techniques for efficient intra-prediction.

In this paper, we refer to the high-speed continuous stream of data, or high-volume offline data, as “Big Data,” which is leading us to a new world of challenges. The transformation of remotely sensed data into scientific understanding is therefore a critical task. Given the rate at which the volume of remote access data is increasing, a number of individual users as well as organizations are now demanding an efficient mechanism to collect, process, analyze, and store these data and their resources. Big Data analysis is a more challenging task than merely locating, identifying, understanding, and citing data. With large-scale data, all of this has to happen in a mechanized manner, since it requires diverse data structures as well as semantics to be articulated in a computer-readable format.

However, even when analyzing simple data consisting of one data set, a mechanism is required for deciding how to design the database. There might be alternative ways to store all of the same information. In such conditions, a given design might have an advantage over others for certain processes and possible drawbacks for other purposes. In order to address these needs, various analytical platforms have been provided by relational database vendors. These platforms come in various shapes, from software-only products to analytical services that run in third-party hosted environments. In remote access networks, data sources such as sensors can produce an overwhelming amount of raw data.

This corresponds to the first step, i.e., data acquisition, in which much of the data are of no interest and can be filtered or compressed by orders of magnitude. Such filters must be designed so that they do not discard useful information. For instance, when considering news reports, is it adequate to keep only the information that mentions the company name? Alternatively, do we need the entire report, or simply a small piece around the mentioned name? The second challenge is the automatic generation of accurate metadata that describe the composition of the data and the way they were collected and analyzed. Such metadata are hard to analyze, since we may need to know the source of each data item in remote access.

LITERATURE SURVEY:

BIG DATA AND CLOUD COMPUTING: CURRENT STATE AND FUTURE OPPORTUNITIES

AUTHOR: D. Agrawal, S. Das, and A. E. Abbadi

PUBLISH: Proc. Int. Conf. Extending Database Technol. (EDBT), 2011, pp. 530–533.

EXPLANATION:

Scalable database management systems (DBMS)—both for update intensive application workloads as well as decision support systems for descriptive and deep analytics—are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from the traditional enterprise infrastructures to next generation cloud infrastructures. Though scalable data management has been a vision for more than three decades and much research has focused on large scale data management in traditional enterprise settings, cloud computing brings its own set of novel challenges that must be addressed to ensure the success of data management solutions in the cloud environment. This tutorial presents an organized picture of the challenges faced by application developers and DBMS designers in developing and deploying internet scale applications. Our background study encompasses both classes of systems: (i) for supporting update heavy applications, and (ii) for ad-hoc analytics and decision support. We then focus on providing an in-depth analysis of systems for supporting update intensive web-applications and provide a survey of the state-of-the-art in this domain. We crystallize the design choices made by some successful large scale database management systems, analyze the application demands and access patterns, and enumerate the desiderata for a cloud-bound DBMS.

CHANGE DETECTION IN SYNTHETIC APERTURE RADAR IMAGE BASED ON FUZZY ACTIVE CONTOUR MODELS AND GENETIC ALGORITHMS

AUTHOR: J. Shi, J. Wu, A. Paul, L. Jiao, and M. Gong

PUBLISH: Math. Prob. Eng., vol. 2014, 15 pp., Apr. 2014.

EXPLANATION:

This paper presents an unsupervised change detection approach for synthetic aperture radar images based on a fuzzy active contour model and a genetic algorithm. The aim is to partition the difference image which is generated from multitemporal satellite images into changed and unchanged regions. Fuzzy technique is an appropriate approach to analyze the difference image where regions are not always statistically homogeneous. Since interval type-2 fuzzy sets are well-suited for modeling various uncertainties in comparison to traditional fuzzy sets, they are combined with active contour methodology for properly modeling uncertainties in the difference image. The interval type-2 fuzzy active contour model is designed to provide preliminary analysis of the difference image by generating intermediate change detection masks. Each intermediate change detection mask has a cost value. A genetic algorithm is employed to find the final change detection mask with the minimum cost value by evolving the realization of intermediate change detection masks. Experimental results on real synthetic aperture radar images demonstrate that change detection results obtained by the improved fuzzy active contour model exhibits less error than previous approaches.

A BIG DATA ARCHITECTURE FOR LARGE SCALE SECURITY MONITORING

AUTHOR: S. Marchal, X. Jiang, R. State, and T. Engel

PUBLISH: Proc. IEEE Int. Congr. Big Data, 2014, pp. 56–63.

EXPLANATION:

Network traffic is a rich source of information for security monitoring. However, the increasing volume of data to treat raises issues, rendering holistic analysis of network traffic difficult. In this paper we propose a solution to cope with the tremendous amount of data to analyse for security monitoring perspectives. We introduce an architecture dedicated to security monitoring of local enterprise networks. The application domain of such a system is mainly network intrusion detection and prevention, but it can be used as well for forensic analysis. This architecture integrates two systems, one dedicated to scalable distributed data storage and management and the other dedicated to data exploitation. DNS data, NetFlow records, HTTP traffic and honeypot data are mined and correlated in a distributed system that leverages state-of-the-art big data solutions. Data correlation schemes are proposed and their performance is evaluated against several well-known big data frameworks, including Hadoop and Spark.

SYSTEM ANALYSIS

EXISTING SYSTEM:

Existing methods are inapplicable on standard computers because it is not desirable, or even possible, to load the entire image into memory before doing any processing. In this situation, it is necessary to load only part of the image and process it before saving the result to the disk and proceeding to the next part. This corresponds to the concept of on-the-flow processing. Remote sensing processing can be seen as a chain of events or steps; each step is generally independent from the following ones and generally focuses on a particular domain. For example, the image can be radiometrically corrected to compensate for the atmospheric effects and indices computed, before an object extraction based on these indices takes place.

The typical processing chain will process the whole image for each step, returning the final result after everything is done. For some processing chains, iterations between the different steps are required to find the correct set of parameters. Due to the variability of satellite images and the variety of the tasks that need to be performed, fully automated tasks are rare. Humans are still an important part of the loop. These concepts are linked in the sense that both rely on the ability to process only one part of the data.

In the case of simple algorithms, this is quite easy: the input is just split into different non-overlapping pieces that are processed one by one. But most algorithms do consider the neighborhood of each pixel. As a consequence, in most cases, the data will have to be split into partially overlapping pieces. The objective is to obtain the same result as if the original algorithm had processed the whole image in one go. Depending on the algorithm, this is unfortunately not always possible.

DISADVANTAGES:

  • A reader that loads the image, or part of the image, into memory from the file on disk;
  • A filter which carries out a local processing that does not require access to neighboring pixels (a simple threshold, for example); the processing can happen on the CPU or GPU;
  • A filter that requires the values of neighboring pixels to compute the value of a given pixel (a convolution filter is a typical example); the processing can happen on the CPU or GPU;
  • A writer to output the resulting image from memory into a file on disk; note that the file could be written in several steps. We will illustrate in this example how it is possible to compute part of the image in the whole pipeline, incurring only minimal computation overhead.
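A minimal sketch of such a tiled reader, filter, and writer pipeline is given below, assuming NumPy arrays stand in for the image file and a 3x3 mean filter stands in for the convolution stage; the 1-pixel halo is what lets per-tile processing reproduce the whole-image result.

```python
# Minimal sketch of the reader -> filter -> writer pipeline operating on image
# tiles. The threshold filter needs no neighbourhood; the 3x3 mean filter needs
# a 1-pixel halo, so each tile is read with overlap and cropped after filtering.
import numpy as np

def read_tile(image, row0, row1, halo):
    """Reader: load only the requested rows, padded with a halo when possible."""
    lo, hi = max(0, row0 - halo), min(image.shape[0], row1 + halo)
    return image[lo:hi], row0 - lo                     # tile + offset of the "real" rows

def threshold(tile, t=0.5):
    """Pixel-wise filter: no neighbourhood needed."""
    return (tile > t).astype(np.float32)

def mean3x3(tile):
    """Neighbourhood filter: a 3x3 box filter built from shifted copies."""
    padded = np.pad(tile, 1, mode="edge")
    acc = np.zeros_like(tile)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += padded[1 + dy: 1 + dy + tile.shape[0], 1 + dx: 1 + dx + tile.shape[1]]
    return acc / 9.0

def process(image, tile_rows=64, halo=1):
    out = np.empty_like(image)
    for row0 in range(0, image.shape[0], tile_rows):
        row1 = min(row0 + tile_rows, image.shape[0])
        tile, off = read_tile(image, row0, row1, halo)
        filtered = mean3x3(threshold(tile))
        out[row0:row1] = filtered[off: off + (row1 - row0)]   # writer: keep only valid rows
    return out

img = np.random.rand(300, 256).astype(np.float32)
# per-tile processing with a halo matches processing the whole image in one go
assert np.allclose(process(img), mean3x3(threshold(img)), atol=1e-6)
```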

PROPOSED SYSTEM:

We present a remote sensing Big Data analytical architecture, which is used to analyze real-time as well as offline data. At first, the data are remotely preprocessed so that they become readable by the machines. Afterward, this useful information is transmitted to the Earth Base Station for further data processing. The Earth Base Station performs two types of processing: processing of real-time data and processing of offline data. In the case of offline data, the data are transmitted to an offline data-storage device. The incorporation of the offline data-storage device helps in later usage of the data, whereas the real-time data are directly transmitted to the filtration and load balancer server, where a filtration algorithm is employed to extract the useful information from the Big Data.

On the other hand, the load balancer balances the processing power by equally distributing the real-time data to the servers. The filtration and load-balancing server not only filters and balances the load, but is also used to enhance the system efficiency. Furthermore, the filtered data are then processed by the parallel servers and are sent to the data aggregation unit (if required, they can store the processed data in the result storage device) for comparison purposes by the decision and analyzing server. The proposed architecture accepts remote-access sensory data as well as direct-access network data (e.g., GPRS, 3G, xDSL, or WAN). The proposed architecture and the algorithms are implemented and applied to remote sensing Earth observatory data.

The proposed architecture has the capability of dividing, load balancing, and parallel processing of only the useful data. Thus, it results in efficiently analyzing real-time remote sensing Big Data from the Earth observatory system. Furthermore, the proposed architecture has the capability of storing incoming raw data to perform offline analysis on large stored dumps, when required. Finally, a detailed analysis of remotely sensed Earth observatory Big Data for land and sea areas is provided using .NET. In addition, various algorithms are proposed for each level of RSDU, DPU, and DADU to detect land as well as sea areas and to elaborate the working of the architecture.
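The sketch below illustrates the filtration and load-balancing stage in the simplest possible terms: incoming blocks of pixel values are filtered down to useful samples and distributed round-robin to worker servers, which then apply a toy land/sea decision rule. The thresholds, server names, and classification rule are assumptions for illustration, not the paper's algorithms.

```python
# Minimal sketch of the filtration and load-balancing stage: filter out
# uninformative pixels, hand blocks to parallel servers round-robin, and let
# each server apply a toy decision rule. All numbers are illustrative.
from itertools import cycle

WORKERS = ["dpu-server-1", "dpu-server-2", "dpu-server-3"]

def filtration(block, lo=0.05, hi=0.95):
    """Drop saturated or empty pixels; keep only values that carry information."""
    return [v for v in block if lo <= v <= hi]

def load_balance(blocks, workers=WORKERS):
    """Round-robin distribution of filtered blocks to the parallel servers."""
    assignment = {w: [] for w in workers}
    for block, worker in zip(blocks, cycle(workers)):
        useful = filtration(block)
        if useful:
            assignment[worker].append(useful)
    return assignment

def classify(block, water_threshold=0.3):
    """Toy decision step: low mean reflectance -> 'sea', otherwise 'land'."""
    return "sea" if sum(block) / len(block) < water_threshold else "land"

incoming = [[0.1, 0.2, 0.15], [0.99, 1.0, 0.97], [0.6, 0.7, 0.8]]
work = load_balance(incoming)
for server, blocks in work.items():
    for b in blocks:
        print(server, classify(b))
```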

ADVANTAGES:

The proposed architecture processes high-speed, large volumes of real-time remote sensing image data. It works on both the DPU and DADU by taking data from the application.

Using our architecture for offline as well as online traffic, we perform a simple analysis of remote sensing Earth observatory data. We assume that the data are big in nature and difficult to handle for a single server.

The data are continuously coming from a satellite at high speed. Hence, special algorithms are needed to process and analyze that Big Data and to make decisions from it. In this section, we analyze remote sensing data for finding land, sea, or ice areas.

We have used the proposed architecture to perform this analysis and have proposed an algorithm for handling, processing, analyzing, and decision-making on remote sensing Big Data images.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk       –    20 GB
  • Floppy Drive        –    1.44 MB
  • Key Board       –    Standard Windows Keyboard
  • Mouse        –    Two or Three Button Mouse
  • Monitor       –    SVGA

SOFTWARE REQUIREMENTS:

  • Operating System          :           Windows XP or Win7
  • Front End        :           Microsoft Visual Studio .NET 2008
  • Script :           C# Script
  • Back End :           MS-SQL Server 2005
  • Document :           MS-Office 2007

RANK-BASED SIMILARITY SEARCH: REDUCING THE DIMENSIONAL DEPENDENCE

ABSTRACT:

This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values.

INTRODUCTION

Of the fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection, perhaps the most widely encountered is that of similarity search. Similarity search is the foundation of k-nearest-neighbor (k-NN) classification, which often produces competitively low error rates in practice, particularly when the number of classes is large. The error rate of nearest-neighbor classification has been shown to be ‘asymptotically optimal’ as the training set size increases. For clustering, many of the most effective and popular strategies require the determination of neighbor sets based at a substantial proportion of the data set objects; examples include hierarchical (agglomerative) methods. Content-based filtering methods for recommender systems and anomaly detection methods also commonly make use of k-NN techniques, either through the direct use of k-NN search or by means of k-NN cluster analysis.

A very popular density-based measure, the Local Outlier Factor (LOF), relies heavily on k-NN set computation to determine the relative density of the data in the vicinity of the test point [8]. For data mining applications based on similarity search, data objects are typically modeled as feature vectors of attributes for which some measure of similarity is defined. Motivated at least in part by the impact of similarity search on problems in data mining, machine learning, pattern recognition, and statistics, the design and analysis of scalable and effective similarity search structures has been the subject of intensive research for many decades. Until relatively recently, most data structures for similarity search targeted low-dimensional real vector space representations and the Euclidean or other Lp distance metrics.

However, many public and commercial data sets available today are more naturally represented as vectors spanning many hundreds or thousands of feature attributes that can be real or integer-valued, ordinal or categorical, or even a mixture of these types. This has spurred the development of search structures for more general metric spaces, such as the Multi-Vantage-Point Tree, the Geometric Near-neighbor Access Tree (GNAT), the Spatial Approximation Tree (SAT), the M-tree, and (more recently) the Cover Tree (CT). Despite their various advantages, spatial and metric search structures are both limited by an effect often referred to as the curse of dimensionality.

One way in which the curse may manifest itself is in a tendency of distances to concentrate strongly around their mean values as the dimension increases. Consequently, most pairwise distances become difficult to distinguish, and the triangle inequality can no longer be effectively used to eliminate candidates from consideration along search paths. Evidence suggests that when the representational dimension of feature vectors is high (roughly 20 or more), traditional similarity search accesses an unacceptably high proportion of the data elements, unless the underlying data distribution has special properties. Even though the local neighborhood information employed by data mining applications is useful and meaningful, high data dimensionality tends to make this local information very expensive to obtain.

The performance of similarity search indices depends crucially on the way in which they use similarity information for the identification and selection of objects relevant to the query. Virtually all existing indices make use of numerical constraints for pruning and selection. Such constraints include the triangle inequality (a linear constraint on three distance values), other bounding surfaces defined in terms of distance (such as hypercubes or hyperspheres), and range queries involving approximation factors, as in Locality-Sensitive Hashing (LSH), or absolute quantities as additive distance terms. One serious drawback of operations based on numerical constraints such as the triangle inequality or distance ranges is that the number of objects actually examined can be highly variable, so much so that the overall execution time cannot be easily predicted.

To speed up similarity search, researchers and practitioners have investigated practical methods for computing neighborhood information at the expense of accuracy. For data mining applications, the approaches considered have included feature sampling for local outlier detection, data sampling for clustering, and approximate similarity search for k-NN classification. Examples of fast approximate similarity search indices include the BD-Tree, a widely recognized benchmark for approximate k-NN search, which makes use of splitting rules and early termination to improve upon the performance of the basic KD-Tree. One of the most popular methods for indexing, Locality-Sensitive Hashing, can also achieve good practical search performance for range queries by managing parameters that influence a tradeoff between accuracy and time.
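The sketch below is not the Rank Cover Tree itself but a much simplified two-level, rank-based search: a random sample acts as a coarse level, the query keeps only its best-ranked sample points, and only the data assigned to those points is examined. Every pruning decision is an ordinal comparison of distance values rather than a bound derived from the triangle inequality; the sample size and coverage parameter are assumptions.

```python
# Simplified two-level, rank-based search sketch (not the actual Rank Cover
# Tree): pruning uses only comparisons of distance values (ranks), never
# triangle-inequality bounds.
import heapq, random

def build(data, dist, sample_size=32, seed=0):
    random.seed(seed)
    sample = random.sample(range(len(data)), min(sample_size, len(data)))
    buckets = {s: [] for s in sample}
    for i, x in enumerate(data):
        nearest = min(sample, key=lambda s: dist(x, data[s]))  # assign to best-ranked sample point
        buckets[nearest].append(i)
    return sample, buckets

def query(q, data, dist, index, k=3, coverage=4):
    sample, buckets = index
    top = heapq.nsmallest(coverage, sample, key=lambda s: dist(q, data[s]))  # rank-based selection
    candidates = [i for s in top for i in buckets[s]]
    return heapq.nsmallest(k, candidates, key=lambda i: dist(q, data[i]))

dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
data = [[random.random() for _ in range(10)] for _ in range(1000)]
index = build(data, dist)
print(query([0.5] * 10, data, dist, index))   # indices of approximate nearest neighbours
```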

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk       –    20 GB
  • Floppy Drive        –    1.44 MB
  • Key Board       –    Standard Windows Keyboard
  • Mouse        –    Two or Three Button Mouse
  • Monitor       –    SVGA

SOFTWARE REQUIREMENTS:

JAVA

  • Operating System        :           Windows XP or Win7
  • Front End       :           JAVA JDK 1.7
  • Back End :           MYSQL Server
  • Server :           Apache Tomcat Server
  • Script :           JSP Script
  • Document :           MS-Office 2007

.NET

  • Operating System        :           Windows XP or Win7
  • Front End       :           Microsoft Visual Studio .NET 2008
  • Script :           C# Script
  • Back End :           MS-SQL Server 2005
  • Document :           MS-Office 2007

PSMPA: PATIENT SELF-CONTROLLABLE AND MULTI-LEVEL PRIVACY-PRESERVING COOPERATIVE AUTHENTICATION IN DISTRIBUTED M-HEALTHCARE CLOUD COMPUTING SYSTEM

ABSTRACT:

The distributed m-healthcare cloud computing system considerably facilitates secure and efficient patient treatment for medical consultation by sharing personal health information among the healthcare providers. However, this system brings about the challenge of keeping both the data confidentiality and the patients’ identity privacy simultaneously. Many existing access control and anonymous authentication schemes cannot be straightforwardly exploited. To solve the problem, a novel authorized accessible privacy model (AAPM) is established. Patients can authorize physicians by setting an access tree supporting flexible threshold predicates.

Based on our new technique of attribute-based designated verifier signatures, a patient self-controllable multi-level privacy-preserving cooperative authentication scheme (PSMPA), realizing three levels of security and privacy requirements in the distributed m-healthcare cloud computing system, is proposed. The directly authorized physicians, the indirectly authorized physicians and the unauthorized persons in medical consultation can, respectively, decipher the personal health information and/or verify patients’ identities by satisfying the access tree with their own attribute sets.
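As a concrete (and heavily simplified) illustration of the access-tree idea, the sketch below evaluates a threshold access tree against a physician's attribute set. The tree, attribute names and the threshold gates are made-up examples, and the paper's construction additionally binds such a tree into the attribute-based designated verifier signature rather than evaluating it in the clear.

```python
# Minimal sketch of evaluating a threshold access tree: a leaf is satisfied
# when the attribute is present, and an internal (t-of-n) node is satisfied
# when at least t of its children are satisfied.

def satisfied(node, attributes):
    if "attr" in node:                          # leaf node
        return node["attr"] in attributes
    hits = sum(satisfied(child, attributes) for child in node["children"])
    return hits >= node["threshold"]            # t-of-n threshold gate

# "cardiologist AND (hospital-A OR research-license)"
access_tree = {
    "threshold": 2,
    "children": [
        {"attr": "cardiologist"},
        {"threshold": 1, "children": [{"attr": "hospital-A"}, {"attr": "research-license"}]},
    ],
}

print(satisfied(access_tree, {"cardiologist", "hospital-A"}))   # True  -> authorized
print(satisfied(access_tree, {"nurse", "hospital-A"}))          # False -> unauthorized
```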

INTRODUCTION:

Distributed m-healthcare cloud computing systems have been increasingly adopted worldwide, including in the European Commission activities, the US Health Insurance Portability and Accountability Act (HIPAA) and many other government initiatives, for efficient and high-quality medical treatment. In m-healthcare social networks, personal health information is always shared among the patients located in respective social communities suffering from the same disease for mutual support, and across distributed healthcare providers (HPs) equipped with their own cloud servers for medical consultation. However, this also brings about a series of challenges, especially how to ensure the security and privacy of the patients’ personal health information against various attacks in the wireless communication channel, such as eavesdropping and tampering. As to the security facet, one of the main issues is access control of patients’ personal health information, namely that only the authorized physicians or institutions can recover the patients’ personal health information during data sharing in the distributed m-healthcare cloud computing system. In practice, most patients are concerned about the confidentiality of their personal health information, since any kind of unauthorized collection and disclosure is likely to get them into trouble.

Therefore, in distributed m-healthcare cloud computing systems, which part of the patients’ personal health information should be shared and which physicians it should be shared with have become two intractable problems demanding urgent solutions, and various research results focusing on them have emerged. A fine-grained distributed data access control scheme has been proposed using the technique of attribute-based encryption (ABE). A rendezvous-based access control method provides the access privilege if and only if the patient and the physician meet in the physical world. Recently, a patient-centric and fine-grained data access control scheme in multi-owner settings was constructed for securing personal health records in cloud computing. However, it mainly focuses on the central cloud computing system, which is not sufficient for efficiently processing the increasing volume of personal health information in m-healthcare cloud computing systems.

Moreover, it is not enough to only guarantee the data confidentiality of the patient’s personal health information in the honest-but-curious cloud server model, since the frequent communication between a patient and a professional physician can lead the adversary to conclude that the patient is suffering from a specific disease with high probability. Unfortunately, the problem of how to protect both the patients’ data confidentiality and identity privacy in the distributed m-healthcare cloud computing scenario under the malicious model was left untouched.

In this paper, we consider simultaneously achieving data confidentiality and identity privacy with high efficiency. As described in Fig. 1, in distributed m-healthcare cloud computing systems, all the members can be classified into three categories: the directly authorized physicians (green labels) in the local healthcare provider, who are authorized by the patients and can both access the patient’s personal health information and verify the patient’s identity; the indirectly authorized physicians (yellow labels) in the remote healthcare providers, who are authorized by the directly authorized physicians for medical consultation or research purposes (since they are not authorized by the patients, we use the term ‘indirectly authorized’ instead) and can only access the personal health information but not the patient’s identity; and the unauthorized persons (red labels), who can obtain nothing. These three levels are realized by extending the techniques of attribute-based access control and designated verifier signatures (DVS) on de-identified health information.

LITERATURE SURVEY

SECURING PERSONAL HEALTH RECORDS IN CLOUD COMPUTING: PATIENT-CENTRIC AND FINE-GRAINED DATA ACCESS CONTROL IN MULTI-OWNER SETTINGS

AUTHOR: M. Li, S. Yu, K. Ren, and W. Lou

PUBLISH: Proc. 6th Int. ICST Conf. Security Privacy Comm. Netw., 2010, pp. 89–106.

EXPLANATION:

Online personal health records (PHRs) enable patients to manage their own medical records in a centralized way, which greatly facilitates the storage, access and sharing of personal health data. With the emergence of cloud computing, it is attractive for PHR service providers to shift their PHR applications and storage into the cloud, in order to enjoy the elastic resources and reduce the operational cost. However, by storing PHRs in the cloud, the patients lose physical control over their personal health data, which makes it necessary for each patient to encrypt her PHR data before uploading it to the cloud servers. Under encryption, it is challenging to achieve fine-grained access control to PHR data in a scalable and efficient way. For each patient, the PHR data should be encrypted so that it is scalable with the number of users having access. Also, since there are multiple owners (patients) in a PHR system and every owner would encrypt her PHR files using a different set of cryptographic keys, it is important to reduce the key distribution complexity in such multi-owner settings. Existing cryptographically enforced access control schemes are mostly designed for single-owner scenarios. In this paper, we propose a novel framework for access control to PHRs within a cloud computing environment. To enable fine-grained and scalable access control for PHRs, we leverage attribute-based encryption (ABE) techniques to encrypt each patient’s PHR data. To reduce the key distribution complexity, we divide the system into multiple security domains, where each domain manages only a subset of the users. In this way, each patient has full control over her own privacy, and the key management complexity is reduced dramatically.

PRIVACY AND EMERGENCY RESPONSE IN E-HEALTHCARE LEVERAGING WIRELESS BODY SENSOR NETWORKS

AUTHOR: J. Sun, Y. Fang, and X. Zhu

PUBLISH: IEEE Wireless Commun., vol. 17, no. 1, pp. 66–73, Feb. 2010.

EXPLANATION:

Electronic healthcare is becoming a vital part of our living environment and exhibits advantages over paper-based legacy systems. Privacy is the foremost concern of patients and the biggest impediment to e-healthcare deployment. In addressing privacy issues, conflicts from the functional requirements must be taken into account. One such requirement is efficient and effective response to medical emergencies. In this article, we provide detailed discussions on the privacy and security issues in e-healthcare systems and viable techniques for these issues. Furthermore, we demonstrate the design challenge in the fulfillment of conflicting goals through an exemplary scenario, where the wireless body sensor network is leveraged, and a sound solution is proposed to overcome the conflict.

HCPP: CRYPTOGRAPHY BASED SECURE EHR SYSTEM FOR PATIENT PRIVACY AND EMERGENCY HEALTHCARE

AUTHOR: J. Sun, X. Zhu, C. Zhang, and Y. Fang

PUBLISH: Proc. 31st Int. Conf. Distrib. Comput. Syst., 2011, pp. 373–382.

EXPLANATION:

Privacy concern is arguably the major barrier that hinders the deployment of electronic health record (EHR) systems, which are considered more efficient, less error-prone, and of higher availability compared to traditional paper record systems. Patients are unwilling to accept the EHR system unless their protected health information (PHI), containing highly confidential data, is guaranteed proper use and disclosure, which cannot be easily achieved without patients’ control over their own PHI. However, caution must be taken to handle emergencies in which the patient may be physically incompetent to retrieve the controlled PHI for emergency treatment. In this paper, we propose a secure EHR system, HCPP (Healthcare system for Patient Privacy), based on cryptographic constructions and existing wireless network infrastructures, to provide privacy protection to patients under any circumstances while enabling timely PHI retrieval for life-saving treatment in emergency situations. Furthermore, our HCPP system restricts PHI access to authorized (not arbitrary) physicians, who can be traced and held accountable if the accessed PHI is found improperly disclosed. Last but not least, HCPP leverages wireless network access to support efficient and private storage/retrieval of PHI, which underlies a secure and feasible EHR system.

PRIVACY-PRESERVING DETECTION OF SENSITIVE DATA EXPOSURE

ABSTRACT:

Statistics from security firms, research institutions and government organizations show that the number of data-leak instances has grown rapidly in recent years. Among various data-leak cases, human mistakes are one of the main causes of data loss. There exist solutions that detect inadvertent sensitive data leaks caused by human mistakes and provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced.

In this paper, we present a privacy-preserving data-leak detection (DLD) solution to solve this issue, where a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semi-honest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method can support accurate detection with a very small number of false alarms under various data-leak scenarios.

INTRODUCTION

According to a report from Risk Based Security (RBS), the number of leaked sensitive data records has increased dramatically during the last few years, i.e., from 412 million in 2012 to 822 million in 2013. Deliberately planned attacks, inadvertent leaks (e.g., forwarding confidential emails to unclassified email accounts), and human mistakes (e.g., assigning the wrong privilege) lead to most of the data-leak incidents. Detecting and preventing data leaks requires a set of complementary solutions, which may include data-leak detection, data confinement, stealthy malware detection and policy enforcement.

Network data-leak detection (DLD) typically performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique to analyze payloads of IP/TCP packets for inspecting application layer data, e.g., HTTP header/content. Alerts are triggered when the amount of sensitive data found in traffic passes a threshold. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS). Straightforward realizations of data-leak detection require the plaintext sensitive data.

However, this requirement is undesirable, as it may threaten the confidentiality of the sensitive information. If a detection system is compromised, then it may expose the plaintext sensitive data (in memory). In addition, the data owner may need to outsource the data-leak detection to providers, but may be unwilling to reveal the plaintext sensitive data to them. Therefore, one needs new data-leak detection solutions that allow the providers to scan content for leaks without learning the sensitive information.

In this paper, we propose a data-leak detection solution which can be outsourced and be deployed in a semihonest detection environment. We design, implement, and evaluate our fuzzy fingerprint technique that enhances data privacy during data-leak detection operations. Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data. Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests, or the content being inspected. Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them.

In our detection procedure, the data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them. To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noises. It is the data owner, who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.
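A much simplified sketch of this delegation workflow is given below. The paper builds its fuzzy fingerprints on Rabin fingerprints; here SHA-256 digests of fixed-size shingles stand in, and the "fuzziness" is modeled by masking the low bits of each digest before release so that one released value covers many possible fingerprints. Shingle size, mask width and the sample strings are all assumptions.

```python
# Simplified sketch of the delegated detection workflow: the owner releases
# only masked ("fuzzy") digests, the provider reports candidate matches from
# traffic, and the owner post-processes them to separate real leaks from noise.
import hashlib

SHINGLE, FUZZY_BITS = 8, 12
MASK = ~((1 << FUZZY_BITS) - 1) & 0xFFFFFFFF

def fingerprints(text):
    """32-bit fingerprints of all SHINGLE-byte substrings of the text."""
    data = text.encode()
    return {int.from_bytes(hashlib.sha256(data[i:i + SHINGLE]).digest()[:4], "big")
            for i in range(len(data) - SHINGLE + 1)}

# --- data owner ---------------------------------------------------------
sensitive = "SSN 078-05-1120 belongs to the test record"
exact = fingerprints(sensitive)
released = {f & MASK for f in exact}            # fuzzy digests given to the DLD provider

# --- DLD provider (semi-honest) ------------------------------------------
traffic = "POST /upload SSN 078-05-1120 plus unrelated filler text"
candidates = {f for f in fingerprints(traffic) if f & MASK in released}

# --- data owner post-processing ------------------------------------------
real_leaks = candidates & exact                 # noise (collisions) is discarded here
print("candidate matches:", len(candidates), "confirmed leaked shingles:", len(real_leaks))
```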

Our contributions are summarized as follows.

1) We describe a privacy-preserving data-leak detection model for preventing inadvertent data leak in network traffic. Our model supports detection operation delegation and ISPs can provide data-leak detection as an add-on service to their customers using our model. We design, implement, and evaluate an efficient technique, fuzzy fingerprint, for privacy-preserving data-leak detection. Fuzzy fingerprints are special sensitive data digests prepared by the data owner for release to the DLD provider.

2) We implement our detection system and perform extensive experimental evaluation on the 2.6 GB Enron dataset, Internet surfing traffic of 20 users, and also 5 simulated real-world data-leak scenarios to measure its privacy guarantee, detection rate and efficiency. Our results indicate the high accuracy achieved by our underlying scheme with a very low false positive rate. Our results also show that the detection accuracy does not degrade much when only partial (sampled) sensitive-data digests are used. In addition, we give an empirical analysis of our fuzzification as well as of the fairness of fingerprint partial disclosure.

SYSTEM ANALYSIS

EXISTING SYSTEM:

  • Existing approaches to detecting and preventing data leaks require a set of complementary solutions, which may include data-leak detection, data confinement, stealthy malware detection, and policy enforcement.
  • Network data-leak detection (DLD) typically performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique to analyze payloads of IP/TCP packets for inspecting application layer data, e.g., HTTP header/content.
  • Alerts are triggered when the amount of sensitive data found in traffic passes a threshold. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS).
  • Straightforward realizations of data-leak detection require the plaintext sensitive data. However, this requirement is undesirable, as it may threaten the confidentiality of the sensitive information. If a detection system is compromised, then it may expose the plaintext sensitive data (in memory).
  • In addition, the data owner may need to outsource the data-leak detection to providers, but may be unwilling to reveal the plaintext sensitive data to them. Therefore, one needs new data-leak detection solutions that allow the providers to scan content for leaks without learning the sensitive information.

DISADVANTAGES:

  • As the Internet grows and network bandwidth continues to increase, administrators are faced with the task of keeping confidential information from leaving their networks. In response, researchers have created data loss prevention systems that check outgoing traffic for known confidential information.
  • These systems stop naive adversaries from leaking data, but are fundamentally unable to identify encrypted or obfuscated information leaks. What remains is a high-capacity pipe for tunneling data to the Internet.
  • The existing approach quantifies information leak capacity in network traffic: instead of trying to detect the presence of sensitive data (an impossible task in the general case), the goal is to measure and constrain its maximum volume.
  • It takes advantage of the insight that most network traffic is repeated or determined by external information, such as protocol specifications or messages sent by a server. By filtering this data, it can isolate and quantify the true information flowing from a computer.

PROPOSED SYSTEM:

  • We propose a data-leak detection solution which can be outsourced and be deployed in a semihonest detection environment. We design, implement, and evaluate our fuzzy fingerprint technique that enhances data privacy during data-leak detection operations.
  • Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data.
  • Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests or the content being inspected. Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them.
  • In our detection procedure, the data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them.
  • To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noises. It is the data owner, who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.

ADVANTAGES:

  • We describe a privacy-preserving data-leak detection model for preventing inadvertent data leaks in network traffic. Our model supports detection operation delegation, and ISPs can provide data-leak detection as an add-on service to their customers using our model.
  • We design, implement, and evaluate an efficient technique, fuzzy fingerprint, for privacy-preserving data-leak detection. Fuzzy fingerprints are special sensitive data digests prepared by the data owner for release to the DLD provider.
  • We implement our detection system and perform extensive experimental evaluation on internet surfing traffic of 20 users, and also 5 simulated real-worlds data-leak scenarios to measure its privacy guarantee, detection rate and efficiency.
  • Our results indicate the high accuracy achieved by our underlying scheme with a very low false positive rate. Our results also show that the detection accuracy does not degrade much when only partial (sampled) sensitive-data digests are used. We also give an empirical analysis of our fuzzification as well as of the fairness of fingerprint partial disclosure.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk       –    20 GB
  • Floppy Drive        –    1.44 MB
  • Key Board       –    Standard Windows Keyboard
  • Mouse        –    Two or Three Button Mouse
  • Monitor       –    SVGA

SOFTWARE REQUIREMENTS:

  • Operating System          :           Windows XP or Win7
  • Front End        :           Microsoft Visual Studio .NET  
  • Back End :           MS-SQL Server
  • Server :           ASP .NET Web Server
  • Script :           C# Script
  • Document :           MS-Office 2007

PASSIVE IP TRACEBACK: DISCLOSING THE LOCATIONS OF IP SPOOFERS FROM PATH BACKSCATTER

ABSTRACT:

It has long been known that attackers may use forged source IP addresses to conceal their real locations. To capture the spoofers, a number of IP traceback mechanisms have been proposed. However, due to the challenges of deployment, there has not been a widely adopted IP traceback solution, at least at the Internet level. As a result, the mist on the locations of spoofers has never been dissipated until now.

This paper proposes passive IP traceback (PIT), which bypasses the deployment difficulties of IP traceback techniques. PIT investigates Internet Control Message Protocol error messages (named path backscatter) triggered by spoofing traffic, and tracks the spoofers based on publicly available information (e.g., topology). In this way, PIT can find the spoofers without any deployment requirement.

This paper illustrates the causes, collection, and the statistical results on path backscatter, demonstrates the processes and effectiveness of PIT, and shows the captured locations of spoofers through applying PIT on the path backscatter data set.

These results can help further reveal IP spoofing, which has been studied for long but never well understood. Though PIT cannot work in all spoofing attacks, it may be the most useful mechanism to trace spoofers before an Internet-level traceback system has been deployed in practice.

INTRODUCTION

IP spoofing, which means attackers launching attacks with forged source IP addresses, has long been recognized as a serious security problem on the Internet. By using addresses that are assigned to others or not assigned at all, attackers can avoid exposing their real locations, enhance the effect of attacking, or launch reflection-based attacks. A number of notorious attacks rely on IP spoofing, including SYN flooding, SMURF, and DNS amplification. A DNS amplification attack which severely degraded the service of a Top Level Domain (TLD) name server has been reported. Though there has been a popular conventional wisdom that DoS attacks are launched from botnets and spoofing is no longer critical, the report of ARBOR at the NANOG 50th meeting shows spoofing is still significant in observed DoS attacks. Indeed, based on the captured backscatter messages from the UCSD Network Telescope, spoofing activities are still frequently observed.

Capturing the origins of IP spoofing traffic is of great importance. As long as the real locations of spoofers are not disclosed, they cannot be deterred from launching further attacks. Even just approaching the spoofers, for example by determining the ASes or networks they reside in, allows attackers to be located in a smaller area and filters to be placed closer to the attacker before attacking traffic gets aggregated. Last but not least, identifying the origins of spoofing traffic can help build a reputation system for ASes, which would be helpful to push the corresponding ISPs to verify IP source addresses.

Instead of proposing another IP traceback mechanism with improved tracking capability, we propose a novel solution, named Passive IP Traceback (PIT), to bypass the challenges in deployment. Routers may fail to forward an IP spoofing packet due to various reasons, e.g., TTL exceeding. In such cases, the routers may generate an ICMP error message (named path backscatter) and send the message to the spoofed source address. Because the routers can be close to the spoofers, the path backscatter messages may potentially disclose the locations of the spoofers. PIT exploits these path backscatter messages to find the location of the spoofers. With the locations of the spoofers known, the victim can seek help from the corresponding ISP to filter out the attacking packets, or take other countermeasures. PIT is especially useful for the victims in reflection-based spoofing attacks, e.g., DNS amplification attacks. The victims can find the locations of the spoofers directly from the attacking traffic.
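A toy sketch of this localization step is shown below: each path backscatter record carries the IP of the router that emitted the ICMP error, and mapping that router to its AS through a prefix-origin table narrows down where the spoofer sits. The prefix table, router IPs and spoofed address are invented for illustration; the paper additionally uses topology and routing information to refine the location.

```python
# Minimal sketch of the PIT idea: map the router that generated a path
# backscatter (ICMP error) message to its AS via a prefix-to-AS table.
import ipaddress

PREFIX_TO_AS = {                     # hypothetical public prefix-origin data
    "198.51.100.0/24": "AS64500",
    "203.0.113.0/24": "AS64501",
}

def origin_as(router_ip):
    ip = ipaddress.ip_address(router_ip)
    for prefix, asn in PREFIX_TO_AS.items():
        if ip in ipaddress.ip_network(prefix):
            return asn
    return None

# (router that emitted the ICMP error, spoofed source the error was sent to)
path_backscatter = [
    ("198.51.100.7", "192.0.2.10"),
    ("203.0.113.99", "192.0.2.10"),
]

suspect_ases = {origin_as(router) for router, _victim in path_backscatter}
print("spoofer likely located near:", sorted(a for a in suspect_ases if a))
```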

In this article, we first illustrate the generation, types, collection, and security issues of path backscatter messages in Section III. Then, in Section IV, we present PIT, which tracks the location of the spoofers based on path backscatter messages together with the topology and routing information. We discuss how to apply PIT when both topology and routing are known, when only the topology is known, and when neither is known. We also present two effective algorithms to apply PIT in large-scale networks. In the following section, we first show the statistical results on path backscatter messages. Then we evaluate the two key mechanisms of PIT which work without routing information. Finally, we give the tracking results obtained by applying PIT on the path backscatter message dataset: a number of ASes in which spoofers are found.

Our work has the following contributions:

1) This is the first article known to deeply investigate path backscatter messages. These messages are valuable for understanding spoofing activities. Though Moore et al. [8] have exploited backscatter messages, which are generated by the targets of spoofing messages, to study Denial of Service (DoS) attacks, path backscatter messages, which are sent by intermediate devices rather than the targets, have not been used in traceback.

2) A practical and effective IP traceback solution based on path backscatter messages, i.e., PIT, is proposed. PIT bypasses the deployment difficulties of existing IP traceback mechanisms and is actually already in force. Given the limitation that path backscatter messages are not generated with stable probability, PIT cannot work in all attacks, but it does work in a number of spoofing activities. At the least, it may be the most useful traceback mechanism before an AS-level traceback system has been deployed in practice.

3) Through applying PIT on the path backscatter dataset, a number of locations of spoofers are captured and presented. Though this is not a complete list, it is the first known list disclosing the locations of spoofers.

PANDA: PUBLIC AUDITING FOR SHARED DATA WITH EFFICIENT USER REVOCATION IN THE CLOUD

ABSTRACT:

With data storage and sharing services in the cloud, users can easily modify and share data as a group. To ensure that shared data integrity can be verified publicly, users in the group need to compute signatures on all the blocks in shared data. Different blocks in shared data are generally signed by different users due to data modifications performed by different users. For security reasons, once a user is revoked from the group, the blocks which were previously signed by this revoked user must be re-signed by an existing user. The straightforward method, which allows an existing user to download the corresponding part of shared data and re-sign it during user revocation, is inefficient due to the large size of shared data in the cloud. In this paper, we propose a novel public auditing mechanism for the integrity of shared data with efficient user revocation in mind.

By utilizing the idea of proxy re-signatures, we allow the cloud to re-sign blocks on behalf of existing users during user revocation, so that existing users do not need to download and re-sign blocks by themselves. In addition, a public verifier is always able to audit the integrity of shared data without retrieving the entire data from the cloud, even if some part of shared data has been re-signed by the cloud. Moreover, our mechanism is able to support batch auditing by verifying multiple auditing tasks simultaneously. Experimental results show that our mechanism can significantly improve the efficiency of user revocation.

INTRODUCTION

With data storage and sharing services (such as Dropbox and Google Drive) provided by the cloud, people can easily work together as a group by sharing data with each other. More specifically, once a user creates shared data in the cloud, every user in the group is able to not only access and modify shared data, but also share the latest version of the shared data with the rest of the group. Although cloud providers promise a more secure and reliable environment to the users, the integrity of data in the cloud may still be compromised, due to the existence of hardware/software failures and human errors.

To protect the integrity of data in the cloud, a number of mechanisms have been proposed. In these mechanisms, a signature is attached to each block in the data, and the integrity of the data relies on the correctness of all the signatures. One of the most significant and common features of these mechanisms is to allow a public verifier to efficiently check data integrity in the cloud without downloading the entire data, referred to as public auditing (or denoted as Provable Data Possession). This public verifier could be a client who would like to utilize cloud data for particular purposes (e.g., search, computation, data mining, etc.) or a third-party auditor (TPA) who is able to provide verification services on data integrity to users. Most of the previous works focus on auditing the integrity of personal data. Different from these works, several recent works focus on how to preserve identity privacy from public verifiers when auditing the integrity of shared data. Unfortunately, none of the above mechanisms considers the efficiency of user revocation when auditing the correctness of shared data in the cloud.

With shared data, once a user modifies a block, she also needs to compute a new signature for the modified block. Due to the modifications from different users, different blocks are signed by different users. For security reasons, when a user leaves the group or misbehaves, this user must be revoked from the group. As a result, this revoked user should no longer be able to access and modify shared data, and the signatures generated by this revoked user are no longer valid to the group. Therefore, although the content of shared data is not changed during user revocation, the blocks, which were previously signed by the revoked user, still need to be re-signed by an existing user in the group. As a result, the integrity of the entire data can still be verified with the public keys of existing users only.

Since shared data is outsourced to the cloud and users no longer store it on local devices, a straightforward method to re-compute these signatures during user revocation is to ask an existing user to first download the blocks previously signed by the revoked user, verify the correctness of these blocks, then re-sign these blocks, and finally upload the new signatures to the cloud. However, this straightforward method may cost the existing user a huge amount of communication and computation resources by downloading and verifying blocks, and by re-computing and uploading signatures, especially when the number of re-signed blocks is quite large or the membership of the group is frequently changing. To make this matter even worse, existing users may access their data sharing services provided by the cloud with resource-limited devices, such as mobile phones, which further prevents existing users from maintaining the correctness of shared data efficiently during user revocation.

Clearly, if the cloud could possess each user’s private key, it can easily finish the re-signing task for existing users without asking them to download and re-sign blocks. However, since the cloud is not in the same trusted domain with each user in the group, outsourcing every user’s private key to the cloud would introduce significant security issues. Another important problem we need to consider is that the re-computation of any signature during user revocation should not affect the most attractive property of public auditing — auditing data integrity publicly without retrieving the entire data. Therefore, how to efficiently reduce the significant burden to existing users introduced by user revocation, and still allow a public verifier to check the integrity of shared data without downloading the entire data from the cloud, is a challenging task.

In this paper, we propose Panda, a novel public auditing mechanism for the integrity of shared data with efficient user revocation in the cloud. In our mechanism, by utilizing the idea of proxy re-signatures, once a user in the group is revoked, the cloud is able to re-sign the blocks, which were signed by the revoked user, with a re-signing key. As a result, the efficiency of user revocation can be significantly improved, and computation and communication resources of existing users can be easily saved. Meanwhile, the cloud, which is not in the same trusted domain as each user, is only able to convert a signature of the revoked user into a signature of an existing user on the same block, but it cannot sign arbitrary blocks on behalf of either the revoked user or an existing user. By designing a new proxy re-signature scheme with nice properties, which traditional proxy re-signatures do not have, our mechanism is always able to check the integrity of shared data without retrieving the entire data from the cloud.
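The toy sketch below only illustrates the conversion step behind proxy re-signatures, not Panda's actual pairing-based construction: with a re-signing key rk = sk_new * sk_old^-1 (mod q), the cloud turns a signature made under the revoked user's key into one valid under the existing user's key without holding either key outright. The group parameters are deliberately tiny and the verification routine uses the secret key, so this is insecure demo code under stated assumptions.

```python
# Toy discrete-log sketch of the proxy re-signature step used for revocation.
# Panda's real scheme is pairing-based; this only shows the conversion idea.
import hashlib

P = 1907                  # toy safe prime, p = 2q + 1 (far too small for real use)
Q = (P - 1) // 2

def H(block: bytes) -> int:
    """Hash the block into the order-q subgroup (square of the digest mod p)."""
    return pow(int.from_bytes(hashlib.sha256(block).digest(), "big") % P, 2, P)

def sign(block, sk):
    return pow(H(block), sk, P)

def verify(block, sig, sk):        # toy verification with the secret key;
    return sig == sign(block, sk)  # the real scheme verifies with public keys and pairings

sk_revoked, sk_existing = 123, 457                      # users' private keys (toy values)
rk = sk_existing * pow(sk_revoked, -1, Q) % Q           # re-signing key given to the cloud

block = b"shared data block #17"
sig_old = sign(block, sk_revoked)                       # signature left by the revoked user
sig_new = pow(sig_old, rk, P)                           # cloud re-signs without any secret key

assert verify(block, sig_new, sk_existing)
print("re-signed signature verifies under the existing user's key")
```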

NEIGHBOR SIMILARITY TRUST AGAINST SYBIL ATTACK IN P2P E-COMMERCE

In this paper, we present a distributed structured approach to Sybil attack. This is derived from the fact that our approach is based on the neighbor similarity trust relationship among the neighbor peers. Given a P2P e-commerce trust relationship based on interest, the transactions among peers are flexible as each peer can decide to trade with another peer any time. A peer doesn’t have to consult others in a group unless a recommendation is needed. This approach shows the advantage in exploiting the similarity trust relationship among peers in which the peers are able to monitor each other.

Our contribution in this paper is threefold:

1) We propose SybilTrust that can identify and protect honest peers from Sybil attack. The Sybil peers can have their trust canceled and dismissed from a group.

2) Based on the group infrastructure in P2P e-commerce, each neighbor is connected to the peers by the success of the transactions it makes or by its trust evaluation level. A peer can only be recognized as a neighbor depending on whether or not its trust level is sustained above a threshold value.

3) SybilTrust enables neighbor peers to carry recommendation identifiers among the peers in a group. This ensures that the group detection algorithms used to identify Sybil attack peers are efficient and scalable in large P2P e-commerce networks. A small sketch of the neighbor-trust bookkeeping follows below.
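Minimal sketch, assuming a simple exponentially weighted trust update and a fixed acceptance threshold (both are illustrative choices, not SybilTrust's actual formulas): each peer scores its neighbours per transaction and only keeps those whose trust stays above the threshold.

```python
# Minimal sketch of neighbour-trust bookkeeping: update a per-neighbour score
# after each transaction and recognize as neighbours only peers above a threshold.
TRUST_THRESHOLD = 0.6
ALPHA = 0.2                                    # weight of the most recent transaction

class Peer:
    def __init__(self, name):
        self.name = name
        self.trust = {}                        # neighbour -> score in [0, 1]

    def record_transaction(self, neighbour, success):
        old = self.trust.get(neighbour, 0.5)   # new peers start at a neutral score
        self.trust[neighbour] = (1 - ALPHA) * old + ALPHA * (1.0 if success else 0.0)

    def neighbours(self):
        return {n for n, t in self.trust.items() if t >= TRUST_THRESHOLD}

p = Peer("buyer-1")
for outcome in (True, True, True, False, True):
    p.record_transaction("seller-9", outcome)
for _ in range(4):
    p.record_transaction("sybil-42", False)    # repeated failed/fake transactions

print(p.neighbours())                          # only the well-behaved peer remains
```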

MALWARE PROPAGATION IN LARGE-SCALE NETWORKS

Malware is pervasive in networks and poses a critical threat to network security. However, we have very limited understanding of malware behavior in networks to date. In this paper, we investigate how malware propagates in networks from a global perspective. We formulate the problem and establish a rigorous two-layer epidemic model for malware propagation from network to network. Based on the proposed model, our analysis indicates that the distribution of a given malware follows an exponential distribution, a power-law distribution with a short exponential tail, and a power-law distribution at its early, late and final stages, respectively. Extensive experiments have been performed on two real-world, global-scale malware data sets, and the results confirm our theoretical findings.
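For intuition about a two-layer propagation process, the sketch below simulates a toy susceptible-infected (SI) model: malware spreads inside each (fully mixed) network and occasionally jumps to neighbouring networks arranged in a ring. The topology, infection probabilities and step count are assumptions for illustration, not the fitted model from the paper.

```python
# Toy two-layer SI simulation: intra-network spread plus occasional jumps to
# neighbouring networks on a ring. All parameters are illustrative.
import random

random.seed(1)
N_NETWORKS, HOSTS, BETA_IN, BETA_OUT, STEPS = 50, 100, 0.05, 0.002, 60

infected = [set() for _ in range(N_NETWORKS)]
infected[0].add(0)                             # patient zero in network 0

history = []
for _ in range(STEPS):
    new = [set(s) for s in infected]
    for net, inf in enumerate(infected):
        if not inf:
            continue
        # intra-network spread: each susceptible host escapes all infected contacts
        p_infect = 1 - (1 - BETA_IN) ** len(inf)
        for host in range(HOSTS):
            if host not in inf and random.random() < p_infect:
                new[net].add(host)
        # inter-network spread to the two ring neighbours
        for nb in ((net - 1) % N_NETWORKS, (net + 1) % N_NETWORKS):
            if random.random() < 1 - (1 - BETA_OUT) ** len(inf):
                new[nb].add(random.randrange(HOSTS))
    infected = new
    history.append(sum(len(s) for s in infected))

print("infected hosts over time:", history[::10])
```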

LOSSLESS AND REVERSIBLE DATA HIDING IN ENCRYPTED IMAGES WITH PUBLIC KEY CRYPTOGRAPHY

This paper proposes lossless, reversible, and combined data hiding schemes for ciphertext images encrypted by public key cryptosystems with probabilistic and homomorphic properties. In the lossless scheme, the ciphertext pixels are replaced with new values to embed the additional data into several LSB-planes of ciphertext pixels by multi-layer wet paper coding. Then, the embedded data can be directly extracted from the encrypted domain, and the data embedding operation does not affect the decryption of the original plaintext image. In the reversible scheme, a preprocessing step is employed to shrink the image histogram before image encryption, so that the modification on encrypted images for data embedding will not cause any pixel oversaturation in the plaintext domain. Although a slight distortion is introduced, the embedded data can be extracted and the original image can be recovered from the directly decrypted image. Due to the compatibility between the lossless and reversible schemes, the data embedding operations of the two manners can be performed simultaneously in an encrypted image. With the combined technique, a receiver may extract a part of the embedded data before decryption, and extract another part of the embedded data and recover the original plaintext image after decryption.
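The sketch below shows only the basic "replace LSB planes, extract directly" idea on plaintext pixels; the paper's schemes do this in the encrypted domain with wet paper coding and public-key homomorphism, which is not reproduced here. Image size and payload are arbitrary examples.

```python
# Plain LSB embedding sketch: write payload bits into the least significant
# bits of the first pixels and read them back directly.
import numpy as np

def embed(pixels, bits):
    out = pixels.copy()
    flat = out.ravel()
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.asarray(bits, dtype=np.uint8)
    return out

def extract(pixels, n_bits):
    return list(pixels.ravel()[:n_bits] & 1)

image = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
payload = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(image, payload)
assert extract(stego, len(payload)) == payload
print("embedded and extracted", len(payload), "bits; max pixel change:",
      int(np.abs(stego.astype(int) - image.astype(int)).max()))
```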

JOINT BEAMFORMING, POWER AND CHANNEL ALLOCATION IN MULTI-USER AND MULTI-CHANNEL UNDERLAY MISO COGNITIVE RADIO NETWORKS

In this paper, we consider joint beamforming, power, and channel allocation in a multi-user and multi-channel underlay multiple-input single-output (MISO) cognitive radio network (CRN). In this system, the primary users’ (PUs’) spectrum can be reused by the secondary user transmitters (SU-TXs) to maximize spectrum utilization, while the intra-user interference is minimized by implementing beamforming at each SU-TX. After formulating the joint optimization problem as a non-convex, mixed integer nonlinear programming (MINLP) problem, we propose a solution which consists of two stages.

In the first stage, a feasible solution for the power allocation and beamforming vectors is derived under a given channel allocation by converting the original problem into a convex form with an introduced optimal auxiliary variable and a semidefinite relaxation (SDR) approach. After that, in the second stage, two explicit searching algorithms, i.e., a genetic algorithm (GA) and a simulated annealing (SA) based algorithm, are proposed to determine suboptimal channel allocations. Simulation results show that the beamforming, power and channel allocation with SA (BPCA-SA) algorithm can achieve a close-to-optimal sum-rate while having a lower computational complexity compared with the beamforming, power and channel allocation with GA (BPCA-GA) algorithm.

Furthermore, our proposed allocation scheme achieves a significant improvement in achievable sum-rate compared to existing zero-forcing beamforming (ZFBF).
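The sketch below illustrates the second-stage idea with simulated annealing over discrete channel assignments. The inner beamforming/power stage is abstracted into a made-up per-(user, channel) rate table and a toy equal-sharing sum-rate model, so the cost function, move rule and cooling schedule are all assumptions rather than the paper's BPCA-SA algorithm.

```python
# Simulated-annealing sketch for discrete channel allocation: reassign one
# user's channel per move and accept worse moves with the Metropolis rule.
import math, random

random.seed(0)
N_USERS, N_CHANNELS = 6, 3
RATE = [[random.uniform(0.5, 3.0) for _ in range(N_CHANNELS)] for _ in range(N_USERS)]

def sum_rate(assign):
    # toy model: a channel's rate is shared equally among the users packed onto it
    load = [assign.count(c) for c in range(N_CHANNELS)]
    return sum(RATE[u][c] / load[c] for u, c in enumerate(assign))

def anneal(steps=5000, t0=1.0, cooling=0.999):
    assign = [random.randrange(N_CHANNELS) for _ in range(N_USERS)]
    best, best_rate, t = assign[:], sum_rate(assign), t0
    for _ in range(steps):
        cand = assign[:]
        cand[random.randrange(N_USERS)] = random.randrange(N_CHANNELS)  # local move
        delta = sum_rate(cand) - sum_rate(assign)
        if delta > 0 or random.random() < math.exp(delta / t):          # Metropolis rule
            assign = cand
            if sum_rate(assign) > best_rate:
                best, best_rate = assign[:], sum_rate(assign)
        t *= cooling
    return best, best_rate

allocation, rate = anneal()
print("channel per user:", allocation, "sum-rate:", round(rate, 3))
```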

GDCLUSTER: A GENERAL DECENTRALIZED CLUSTERING ALGORITHM

In many popular applications like peer-to-peer systems, large amounts of data are distributed among multiple sources. Analyzing this data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized clustering method, which is capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster for execution of partition-based and density-based clustering methods on the summarized views, and also offer enhancements to the basic algorithm. Coping with dynamic data is made possible by gradually adapting the clustering model. Our experimental evaluations show that GDCluster can discover the clusters efficiently with scalable transmission cost, and also demonstrate its superiority in comparison to the popular LSP2P method.
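A heavily simplified sketch of the gossip idea follows: each node holds a bounded summary of data points, repeatedly merges it with a random peer's summary, and can run a local clustering step on its own view at any time. The 1-D data, summary cap, number of rounds and the plain two-centroid step are illustrative assumptions, not GDCluster's weighted-summary construction.

```python
# Gossip sketch: nodes exchange bounded summaries of the data set and cluster
# locally on their own summarized view.
import random

random.seed(3)

def merge(view_a, view_b, cap=20):
    """Union of two summaries, truncated to keep the view bounded."""
    combined = view_a + view_b
    random.shuffle(combined)
    return combined[:cap]

def local_centroids(view, iters=10):
    centroids = [min(view), max(view)]          # simple spread-out initialization (2 clusters)
    for _ in range(iters):
        groups = [[], []]
        for p in view:
            groups[0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1].append(p)
        centroids = [sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)]
    return sorted(centroids)

# each node holds a slice of a data set drawn from two clusters around 0 and 10
nodes = [[random.gauss(0, 1) for _ in range(5)] + [random.gauss(10, 1) for _ in range(5)]
         for _ in range(8)]
views = [list(n) for n in nodes]

for _round in range(20):                        # gossip rounds: random pairs swap summaries
    a, b = random.sample(range(len(views)), 2)
    views[a] = views[b] = merge(views[a], views[b])

print("node 0 sees clusters near:", [round(c, 1) for c in local_centroids(views[0])])
```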