Authenticated Key Exchange Protocols for Parallel Network File Systems

We study the problem of key establishment for secure many-to-many communications. The problem is inspired by the proliferation of large-scale distributed file systems supporting parallel access to multiple storage devices. Our work focuses on the current Internet standard for such file systems, i.e., parallel Network File System (pNFS), which makes use of Kerberos to establish parallel session keys between clients and storage devices.
Our review of the existing Kerberos-based protocol shows that it has a number of limitations:

(i) a metadata server facilitating key exchange between the clients and the storage devices bears a heavy workload, which restricts the scalability of the protocol;

(ii) the protocol does not provide forward secrecy;

(iii) the metadata server itself generates all the session keys used between the clients and storage devices, which inherently leads to key escrow.

In this paper, we propose a variety of authenticated key exchange protocols that are designed to address the above issues. We show that our protocols are capable of reducing the workload of the metadata server by up to approximately 54% while concurrently supporting forward secrecy and escrow-freeness. All this requires only a small fraction of increased computation overhead at the client.
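The abstract does not spell out the proposed protocols, so the following is only a minimal sketch of the general idea behind forward secrecy and escrow-freeness: a client and a storage device derive a session key from ephemeral Diffie-Hellman values, so no third party (such as a metadata server) ever generates or sees the key. The toy group parameters `P` and `G`, the function names, and the use of SHA-256 are assumptions made for illustration, and authentication of the exchanged public values is deliberately omitted.

```python
import hashlib
import secrets

# Toy parameters for illustration only (assumption): a 127-bit Mersenne prime
# and a small base. A real deployment would use a standardized group or curve.
P = 2**127 - 1
G = 3

def ephemeral_keypair():
    """Return a fresh (private, public) pair meant to be discarded after one session."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public

def session_key(own_private, peer_public):
    """Derive a 32-byte session key by hashing the shared Diffie-Hellman value."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

# The client and the storage device each contribute an ephemeral value.
client_priv, client_pub = ephemeral_keypair()
device_priv, device_pub = ephemeral_keypair()

# Only the public values cross the network; both ends compute the same key.
client_key = session_key(client_priv, device_pub)
device_key = session_key(device_priv, client_pub)
print(client_key == device_key)
```

Because the private values are generated per session and then thrown away, a later compromise of long-term credentials does not reveal past session keys in this sketch; binding the public values to authenticated identities is the part a real protocol would have to add.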

AGGREGATED-PROOF BASED HIERARCHICAL AUTHENTICATION SCHEME FOR THE INTERNET OF THINGS

ABSTRACT:

The Internet of Things (IoT) is becoming an attractive system paradigm to realize interconnections through the physical, cyber, and social spaces. During the interactions among the ubiquitous things, security issues become noteworthy, and it is significant to establish enhanced solutions for security protection. In this work, we focus on an existing U2IoT architecture (i.e., unit IoT and ubiquitous IoT), to design an aggregated-proof based hierarchical authentication scheme (APHA) for the layered networks. Concretely, 1) the aggregated-proofs are established for multiple targets to achieve backward and forward anonymous data transmission; 2) the directed path descriptors, homomorphism functions, and Chebyshev chaotic maps are jointly applied for mutual authentication; 3) different access authorities are assigned to achieve hierarchical access control. Meanwhile, the BAN logic formal analysis is performed to prove that the proposed APHA has no obvious security defects, and it is potentially available for the U2IoT architecture and other IoT applications.

INTRODUCTION:

The Internet of Things (IoT) is emerging as an attractive system paradigm to integrate physical perceptions, cyber interactions, and social correlations, in which the physical objects, cyber entities, and social attributes are required to achieve interconnections with the embedded intelligence. During the interconnections, the IoT is suffering from severe security challenges, and there are potential vulnerabilities due to the complicated networks referring to heterogeneous targets, sensors, and backend management systems. It becomes noteworthy to address the security issues for the ubiquitous things in the IoT.

Recent studies have addressed the general IoT, including system models, service platforms, infrastructure architectures, and standardization. In particular, a human-society inspired U2IoT architecture (i.e., unit IoT and ubiquitous IoT) has been proposed to achieve physical-cyber-social convergence. In the U2IoT architecture, the human neural system and the social organization framework are introduced to establish the single-application and multi-application IoT frameworks.

Multiple unit IoTs compose a local IoT within a region, or an industrial IoT for an industry. The local IoTs and industrial IoTs are covered within a national IoT, and jointly form the ubiquitous IoT. Regarding IoT security, related works mainly address security architectures and recommended countermeasures, secure communication and networking mechanisms, cryptography algorithms, and application security solutions.

Current research mainly addresses three aspects: system security, network security, and application security.

  • System security mainly considers a whole IoT system to identify the unique security and privacy challenges, to design systemic security frameworks, and to provide security measures and guidelines.

  • Network security mainly focuses on wireless communication networks (e.g., wireless sensor networks (WSN), radio frequency identification (RFID), and the Internet) to design key distribution algorithms, authentication protocols, advanced signature algorithms, access control mechanisms, and secure routing protocols. Particularly, authentication protocols are popular to address security and privacy issues in the IoT, and should be designed considering the things’ heterogeneity and hierarchy.

  • Application security serves IoT applications (e.g., multimedia, smart home, and smart grid), and resolves practical problems with particular scenario requirements.

Towards the U2IoT architecture, a reasonable authentication scheme should satisfy the following requirements.

1) Data CIA (i.e., confidentiality, integrity, and availability): The exchanged messages between any two legal entities should be protected against illegal access and modification. The communication channels should be reliable for the legal entities.

2) Hierarchical access control: Diverse access authorities are assigned to different entities to provide hierarchical interactions. An unauthorised entity cannot access data exceeding its permission.

3) Forward security: Attackers cannot correlate any two communication sessions, and also cannot derive the previous interrogations according to the ongoing session.

4) Mutual authentication: The untrusted entities should pass each other’s verification so that only the legal entity can access the networks for data acquisition.

5) Privacy preservation: The sensors cannot correlate or disclose an individual target’s private information (e.g., location).

Considering the above security requirements, we design an aggregated-proof based hierarchical authentication scheme (APHA) for the unit IoT.

EXISTING SYSTEM:

If an existing WSN is to be completely integrated into the Internet as part of the Internet of Things (IoT), it is necessary to consider various security challenges, such as the creation of a secure channel between an Internet host and a sensor node. In order to create such a channel, it is necessary to provide key management mechanisms that allow two remote devices to negotiate certain security credentials (e.g., secret keys) that will be used to protect the information flow.

Existing work analyzes not only the applicability of existing mechanisms such as public key cryptography and pre-shared keys for sensor nodes in the IoT context, but also the applicability of those link-layer oriented key management systems (KMS) whose original purpose is to provide shared keys for sensor nodes belonging to the same WSN. Such mechanisms allow two remote devices to negotiate certain security credentials (e.g., shared keys, Blom key pairs, and polynomial shares). The authors analyzed the applicability of existing mechanisms, including public key infrastructure (PKI) and pre-shared keys, for sensor nodes in IoT contexts.

DISADVANTAGES:

A smart community model for IoT applications, and a cyber-physical system with networked smart homes, was introduced with security considerations. Filtering false network traffic and avoiding unreliable home gateways are suggested as safeguards. Meanwhile, the security challenges are discussed, including cooperative authentication, unreliable node detection, target tracking, and intrusion detection. For example, a group of individuals hacked into federal sites and released confidential information to the public; the government is supposed to have the highest level of security, yet its system was easily breached. Therefore, if all of our information is stored on the Internet, people could hack into it and find out everything about individuals’ lives. Also, companies could misuse the information that they are given access to, which is a recurring problem within companies.

PROPOSED SYSTEM:

The proposed scheme realizes data confidentiality and data integrity by the directed path descriptor and homomorphism based Chebyshev chaotic maps, establishes trust relationships via lightweight mechanisms, and applies dynamically hashed values to achieve session freshness. This indicates that the APHA is suitable for the U2IoT architecture.

In this work, the main purpose is to provide bottom-up safeguard for the U2IoT architecture to realize secure interactions. Towards the U2IoT architecture, a reasonable authentication scheme should satisfy the following requirements.

1) Data CIA (i.e., confidentiality, integrity, and availability): The exchanged messages between any two legal entities should be protected against illegal access and modification. The communication channels should be reliable for the legal entities.

2) Hierarchical access control: Diverse access authorities are assigned to different entities to provide hierarchical interactions. An unauthorised entity cannot access data exceeding its permission.

3) Forward security: Attackers cannot correlate any two communication sessions, and also cannot derive the previous interrogations according to the ongoing session.

4) Mutual authentication: The untrusted entities should pass each other’s verification so that only the legal entity can access the networks for data acquisition.

5) Privacy preservation: The sensors cannot correlate or disclose an individual target’s private information (e.g., location).

Considering the above security requirements, we design an aggregated-proof based hierarchical authentication scheme (APHA) for the ubiquitous IoT.

ADVANTAGES:

Aggregated-proofs are established by wrapping multiple targets’ messages for anonymous data transmission, so that individual information cannot be revealed on either the backward or the forward communication channel.

Directed path descriptors are defined based on homomorphism functions to establish correlation during the cross-layer interactions. Chebyshev chaotic maps are applied to describe the mapping relationships between the shared secrets and the path descriptors for mutual authentication.

Diverse access authorities on the group identifiers and pseudonyms are assigned to different entities for achieving the hierarchical access control through the layered networks.
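The Chebyshev chaotic maps mentioned above are typically useful for authentication because of their semigroup property, T_r(T_s(x)) = T_{rs}(x) = T_s(T_r(x)), which lets two parties reach the same value from different secrets. The sketch below only demonstrates that property over a prime field; it is not the APHA construction itself, and the modulus `P`, the seed `x`, and the secrets `r` and `s` are hypothetical values chosen for illustration.

```python
def chebyshev(n, x, p):
    """Evaluate the Chebyshev polynomial T_n(x) modulo p using the recurrence
    T_0 = 1, T_1 = x, T_k = 2 * x * T_{k-1} - T_{k-2}."""
    if n == 0:
        return 1 % p
    prev, curr = 1, x % p
    for _ in range(n - 1):
        prev, curr = curr, (2 * x * curr - prev) % p
    return curr

P = 2_147_483_647   # hypothetical prime modulus (2**31 - 1)
x = 123_456         # hypothetical public seed
r, s = 101, 203     # hypothetical secrets held by the two parties

# Each party publishes the map applied to the seed with its own secret.
a_pub = chebyshev(r, x, P)
b_pub = chebyshev(s, x, P)

# Each party applies its secret to the other's public value.
shared_a = chebyshev(r, b_pub, P)
shared_b = chebyshev(s, a_pub, P)
print(shared_a == shared_b)
```

The linear-time recurrence keeps the sketch short; with secrets of realistic size, an evaluation method that takes a logarithmic number of steps would be needed.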

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor                                 –    Pentium IV
  • Speed                                      –    1.1 GHz
  • RAM                                       –    256 MB (min)
  • Hard Disk                               –    20 GB
  • Floppy Drive                           –    1.44 MB
  • Keyboard                               –    Standard Windows Keyboard
  • Mouse                                     –    Two or Three Button Mouse
  • Monitor                                   –    SVGA

 

SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP or Win7
  • Front End                                :           Java JDK 1.7
  • Back End                                :           MySQL Server
  • Server                                      :           Apache Tomcat Server
  • Script                                       :           JSP Script
  • Document                               :           MS-Office 2007

A TIME EFFICIENT APPROACH FOR DETECTING ERRORS IN BIG SENSOR DATA ON CLOUD

ABSTRACT:

Big sensor data is prevalent in both industry and scientific research applications, where the data is generated with such high volume and velocity that it is difficult to process using on-hand database management tools or traditional data processing applications. Cloud computing provides a promising platform for addressing this challenge, as it provides a flexible stack of massive computing, storage, and software services in a scalable manner at low cost. Some techniques have been developed in recent years for processing sensor data on cloud, such as sensor-cloud. However, these techniques do not provide efficient support for fast detection and location of errors in big sensor data sets.

We develop a novel data error detection approach which exploits the full computation potential of the cloud platform and the network features of WSN. Firstly, a set of sensor data error types is classified and defined. Based on that classification, the network features of a clustered WSN are introduced and analyzed to support fast error detection and location. Specifically, in our proposed approach, the error detection is based on the scale-free network topology, and most detection operations can be conducted in limited temporal or spatial data blocks instead of a whole big data set. Hence the detection and location process can be dramatically accelerated.

Furthermore, the detection and location tasks can be distributed to cloud platform to fully exploit the computation power and massive storage. Through the experiment on our cloud computing platform of U-Cloud, it is demonstrated that our proposed approach can significantly reduce the time for error detection and location in big data sets generated by large scale sensor network systems with acceptable error detecting accuracy.
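The idea of running detection on limited temporal blocks rather than on the whole data set can be illustrated with a small sketch. The abstract does not define the error types or the detection rule, so the rule used here (flagging readings that deviate strongly from the block median, measured in units of the median absolute deviation) is a stand-in assumption, as are the names `detect_in_block`, `detect_errors`, `block_size`, and `threshold`.

```python
from statistics import median

def detect_in_block(block, threshold=5.0):
    """Return indices within one block whose value deviates strongly from the
    block median, measured in units of the median absolute deviation (MAD)."""
    med = median(block)
    mad = median([abs(v - med) for v in block])
    scale = mad if mad > 0 else 1e-9  # avoid division by zero on flat blocks
    return [i for i, v in enumerate(block) if abs(v - med) / scale > threshold]

def detect_errors(readings, block_size=6, threshold=5.0):
    """Scan a time-ordered series block by block and return the global indices
    of suspect readings. Each block is examined independently of the others."""
    suspects = []
    for start in range(0, len(readings), block_size):
        block = readings[start:start + block_size]
        suspects.extend(start + i for i in detect_in_block(block, threshold))
    return suspects

# Hypothetical temperature readings from one sensor node, with two spikes.
readings = [20.1, 20.3, 20.2, 20.4, 85.0, 20.3,
            20.5, 20.6, 20.4, 20.5, 20.7, -40.0]
suspects = detect_errors(readings)
print(suspects)
```

Because every block is processed on its own, the per-block calls could be handed to separate cloud workers, which is the kind of distribution the abstract describes; the sketch runs them sequentially for simplicity.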

INTRODUCTION:

We have recently entered a new era of data explosion, which brings about new challenges for big data processing. In general, big data is a collection of data sets so large and complex that it becomes difficult to process with on-hand database management systems or traditional data processing applications. It represents the progress of human cognitive processes, and usually includes data sets with sizes beyond the ability of current technology, methods, and theory to capture, manage, and process the data within a tolerable elapsed time. Big data has the typical characteristics of five ‘V’s: volume, variety, velocity, veracity, and value. Big data sets come from many areas, including meteorology, connectomics, complex physics simulations, genomics, biological study, gene analysis, and environmental research. According to the literature, since the 1980s the volume of generated data has doubled every 40 months worldwide. In 2012, 2.5 quintillion (2.5 × 10^18) bytes of data were generated every day.

Hence, how to process big data has become a fundamental and critical challenge for modern society. Cloud computing provides a promising platform for big data processing with powerful computation capability, storage, scalability, resource reuse, and low cost, and has attracted significant attention in alignment with big data. One important source of scientific big data is the data sets collected by wireless sensor networks (WSN). Wireless sensor networks have the potential to significantly enhance people’s ability to monitor and interact with their physical environment. Big data sets from sensors are often subject to corruption and losses due to the wireless medium of communication and the presence of hardware inaccuracies in the nodes. For a WSN application to deduce an appropriate result, it is necessary that the data received is clean, accurate, and lossless. However, effective detection and cleaning of sensor big data errors is a challenging issue demanding innovative solutions. WSN with cloud can be categorized as a kind of complex network system. In complex network systems such as WSN and social networks, data abnormality and error become an annoying issue for real network applications.

Therefore, the question of how to find data errors in complex network systems for improving and debugging the network has attracted the interest of researchers. Some work has been done on big data analysis and error detection in complex networks, including intelligent sensor networks. There are also some works related to data error detection and debugging in complex network systems with online data processing techniques. Since these techniques were not designed and developed to deal with big data on cloud, they are unable to cope with the current dramatic increase in data size. For example, when big data sets are encountered, previous offline methods for error detection and debugging on a single computer may take a long time and lose real-time feedback. Because those offline methods are normally based on learning or mining, they often introduce high time cost during the process of data set training and pattern matching. WSN big data error detection commonly requires powerful real-time processing and storing of the massive sensor data, as well as analysis in the context of using inherently complex error models to identify and locate events of abnormalities.

In this paper, we aim to develop a novel error detection approach by exploiting the massive storage, scalability, and computation power of cloud to detect errors in big data sets from sensor networks. Some work has been done on processing sensor data on cloud. However, fast detection of data errors in big data with cloud remains challenging. In particular, how to use the computation power of cloud to quickly find and locate errors of nodes in WSN needs to be explored. Cloud computing, a disruptive trend at present, has a significant impact on the current IT industry and research communities. Cloud computing infrastructure is becoming popular because it provides an open, flexible, scalable, and reconfigurable platform. The proposed error detection approach in this paper is based on the classification of error types. Specifically, nine types of numerical data abnormalities/errors are listed and introduced in our cloud error detection approach. The defined error model triggers the error detection process. Compared to previous error detection for sensor network systems, our approach on cloud is designed and developed by utilizing the massive data processing capability of cloud to enhance error detection speed and real-time reaction. In addition, the architecture features of complex networks are also analyzed so that they can be combined with cloud computing in a more efficient way. Based on a review of the current research literature, we divide complex network systems into scale-free and non-scale-free types. A sensor network is a kind of scale-free complex network system, which matches the scalability feature of cloud.

A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE SYSTEMS

ABSTRACT:

Characterized by the increasing arrival rate of live content, the emergency applications pose a great challenge: how to disseminate large-scale live content to interested users in a scalable and reliable manner. The publish/subscribe (pub/sub) model is widely used for data dissemination because of its capacity of seamlessly expanding the system to massive size. However, most event matching services of existing pub/sub systems either lead to low matching throughput when matching a large number of skewed subscriptions, or interrupt dissemination when a large number of servers fail. Cloud computing provides great opportunities for meeting the requirements of complex computing and reliable communication.

In this paper, we propose SREM, a scalable and reliable event matching service for content-based pub/sub systems in the cloud computing environment. To achieve low routing latency and reliable links among servers, we propose a distributed overlay, SkipCloud, to organize the servers of SREM. Through a hybrid space partitioning technique, HPartition, large-scale skewed subscriptions are mapped into multiple subspaces, which ensures high matching throughput and provides multiple candidate servers for each event.

Moreover, a series of dynamics maintenance mechanisms are extensively studied. To evaluate the performance of SREM, 64 servers are deployed and millions of live content items are tested in a CloudStack testbed. Under various parameter settings, the experimental results demonstrate that the traffic overhead of routing events in SkipCloud is at least 60 percent smaller than in the Chord overlay, and that the matching rate in SREM is at least 3.7 times and at most 40.4 times larger than that of the single-dimensional partitioning technique of BlueDove. Besides, SREM enables the event loss rate to drop back to 0 in tens of seconds even if a large number of servers fail simultaneously.

INTRODUCTION

Because of its importance in helping users to make real-time decisions, data dissemination has become dramatically significant in many large-scale emergency applications, such as earthquake monitoring, disaster weather warning, and status update in social networks. Recently, data dissemination in these emergency applications presents a number of fresh trends. One is the rapid growth of live content. For instance, Facebook users publish over 600,000 pieces of content and Twitter users send over 100,000 tweets on average per minute. The other is the highly dynamic network environment. For instance, measurement studies indicate that most users’ sessions in social networks only last several minutes. In emergency scenarios, sudden disasters like earthquakes or bad weather may lead to the failure of a large number of users instantaneously.

These characteristics require the data dissemination system to be scalable and reliable. Firstly, the system must be scalable to support the large amount of live content. The key is to offer a scalable event matching service to filter out irrelevant users. Otherwise, the content may have to traverse a large number of uninterested users before it reaches interested users. Secondly, with the dynamic network environment, it is necessary to provide reliable schemes to keep continuous data dissemination capacity. Otherwise, a system interruption may cause the live content to become obsolete. Driven by these requirements, the publish/subscribe (pub/sub) pattern is widely used to disseminate data due to its flexibility, scalability, and efficient support of complex event processing. In pub/sub systems (pub/subs), a receiver (subscriber) registers its interest in the form of a subscription. Events are published by senders to the pub/sub system. The system matches events against subscriptions and disseminates them to interested subscribers.

In traditional data dissemination applications, the live content is generated by publishers at a low speed, which makes many pub/subs adopt multi-hop routing techniques to disseminate events. A large body of broker-based pub/subs forward events and subscriptions by organizing nodes into diverse distributed overlays, such as tree-based design, cluster-based design, and DHT-based design. However, the multi-hop routing techniques in these broker-based systems lead to a low matching throughput, which is inadequate for the current high arrival rate of live content.

Recently, cloud computing provides great opportunities for applications requiring complex computing and high speed communication, where the servers are connected by high speed networks and have powerful computing and storage capacities. A number of pub/sub services based on the cloud computing environment have been proposed, such as Move, BlueDove, and SEMAS. However, most of them cannot completely meet the requirements of both scalability and reliability when matching large-scale live content under highly dynamic environments.

This mainly stems from the following facts:

1) Most of them are inappropriate to the matching of live content with high data dimensionality due to the limitation of their subscription space partitioning techniques, which bring either low matching throughput or high memory overhead.

2) These systems adopt the one-hop lookup technique among servers to reduce routing latency. In spite of its high efficiency, it requires each dispatching server to have the same view of matching servers. Otherwise, the subscriptions or events may be assigned to the wrong matching server, which brings an availability problem in the face of concurrent joining or crashing of matching servers. A number of schemes can be used to keep the view consistent, like periodically sending heartbeat messages to dispatching servers or exchanging messages among matching servers. However, these extra schemes may bring a large traffic overhead or an interruption of the event matching service.
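The consistency requirement of one-hop lookup described above can be made concrete with a short sketch. It assumes a plain hash-modulo placement rule, which is a simplification and not necessarily the rule used by any of the cited systems; the server names and the function `pick_server` are hypothetical.

```python
import hashlib

def pick_server(key, view):
    """One-hop lookup: hash the key directly onto a server in the dispatcher's view."""
    digest = hashlib.sha256(key.encode()).digest()
    return view[int.from_bytes(digest[:8], "big") % len(view)]

full_view = ["m0", "m1", "m2", "m3"]   # dispatcher that knows all matching servers
stale_view = ["m0", "m1", "m2"]        # dispatcher that has not yet learned about m3

keys = [f"subscription-{i}" for i in range(1000)]

# Count how many keys the two dispatchers would send to different servers.
mismatches = sum(pick_server(k, full_view) != pick_server(k, stale_view) for k in keys)
print(mismatches, "of", len(keys), "keys are routed inconsistently")
```

Two dispatchers with identical views always agree, while a single missing server in one view is enough to misroute a large share of keys, which is why such systems need some mechanism to keep the views synchronized.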

LITERATURE SURVEY

RELIABLE AND HIGHLY AVAILABLE DISTRIBUTED PUBLISH/SUBSCRIBE SERVICE

PUBLICATION: Proc. 28th IEEE Int. Symp. Reliable Distrib. Syst., 2009, pp. 41–50.

AUTHORS: R. S. Kazemzadeh and H.-A. Jacobsen

EXPLANATION:

This paper develops reliable distributed publish/subscribe algorithms with service availability in the face of concurrent crash failure of up to delta brokers. The reliability of service in our context refers to per-source in-order and exactly-once delivery of publications to matching subscribers. To handle failures, brokers maintain data structures that enable them to reconnect the topology and compute new forwarding paths on the fly. This enables fast reaction to failures and improves the system’s availability. Moreover, we present a recovery procedure that recovering brokers execute in order to re-enter the system, and synchronize their routing information.
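The delivery guarantee named above (per-source in-order and exactly-once delivery) can be sketched from the subscriber's side. This is not the paper's broker algorithm, which concerns topology reconnection and recovery; it only illustrates the guarantee itself, under the assumption that each source numbers its publications consecutively from 0. The class `InOrderReceiver` and its fields are hypothetical.

```python
class InOrderReceiver:
    """Delivers each source's publications once, in sequence order, even when
    the network duplicates or reorders them."""

    def __init__(self):
        self.next_seq = {}    # source -> next sequence number expected
        self.pending = {}     # source -> {seq: payload} held until the gap closes
        self.delivered = []   # (source, seq, payload) in delivery order

    def receive(self, source, seq, payload):
        expected = self.next_seq.get(source, 0)
        if seq < expected:
            return  # already delivered: drop the duplicate
        buffer = self.pending.setdefault(source, {})
        buffer[seq] = payload  # a repeated buffered message just overwrites itself
        # Deliver every consecutive publication that is now available.
        while expected in buffer:
            self.delivered.append((source, expected, buffer.pop(expected)))
            expected += 1
        self.next_seq[source] = expected

rx = InOrderReceiver()
rx.receive("A", 1, "b")   # arrives early, held back
rx.receive("A", 0, "a")   # closes the gap: 0 and 1 are delivered
rx.receive("A", 0, "a")   # duplicate, ignored
rx.receive("B", 0, "x")   # sources are tracked independently
rx.receive("A", 2, "c")
print(rx.delivered)
```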

BUILDING A RELIABLE AND HIGH-PERFORMANCE CONTENT-BASED PUBLISH/SUBSCRIBE SYSTEM

PUBLICATION: J. Parallel Distrib. Comput., vol. 73, no. 4, pp. 371–382, 2013.

AUTHORS: Y. Zhao and J. Wu

EXPLANATION:

Provisioning reliability in a high-performance content-based publish/subscribe system is a challenging problem. The inherent complexity of content-based routing makes message loss detection and recovery, and network state recovery extremely complicated. Existing proposals either try to reduce the complexity of handling failures in a traditional network architecture, which only partially address the problem, or rely on robust network architectures that can gracefully tolerate failures, but perform less efficiently than the traditional architectures. In this paper, we present a hybrid network architecture for reliable and high-performance content-based publish/subscribe. Two overlay networks, a high-performance one with moderate fault tolerance and a highly-robust one with sufficient performance, work together to guarantee the performance of normal operations and reliability in the presence of failures. Our design exploits the fact that, in a high-performance content-based publish/subscribe system, subscriptions are broadcast to all brokers, to facilitate efficient backup routing when failures occur, which incurs a minimal overhead. Per-hop reliability is used to gracefully detect and recover lost messages that are caused by transit errors. Two backup routing methods based on DHT routing are proposed. Extensive simulation experiments are conducted. The results demonstrate the superior performance of our system compared to other state-of-the-art proposals.

SCALABLE AND ELASTIC EVENT MATCHING FOR ATTRIBUTE-BASED PUBLISH/SUBSCRIBE SYSTEMS

PUBLICATION: Future Gener. Comput. Syst., vol. 36, pp. 102–119, 2013.

AUTHORS: X. Ma, Y. Wang, Q. Qiu, W. Sun, and X. Pei

EXPLANATION:

Due to the sudden change of the arrival rate of live content and the skewness of the large-scale subscriptions, the rapid growth of emergency applications presents a new challenge to the current publish/subscribe systems: providing a scalable and elastic event matching service. However, most existing event matching services cannot adapt to the sudden change of the arrival rate of live content, and generate a non-uniform distribution of load on the servers because of the skewness of the large-scale subscriptions. To this end, we propose SEMAS, a scalable and elastic event matching service for attribute-based pub/sub systems in the cloud computing environment. SEMAS uses a one-hop lookup overlay to reduce the routing latency. Through a hierarchical multi-attribute space partition technique, SEMAS adaptively partitions the skewed subscriptions and maps them into balanced clusters to achieve high matching throughput. The performance-aware detection scheme in SEMAS adaptively adjusts the scale of servers according to the churn of workloads, leading to a high performance–price ratio. A prototype system on an OpenStack-based platform demonstrates that SEMAS has a linearly increasing matching capacity as the number of servers and the partitioning granularity increase. It is able to elastically adjust the scale of servers and tolerate a large number of server failures with low latency and traffic overhead. Compared with existing cloud based pub/sub systems, SEMAS achieves higher throughput in various workloads.

SYSTEM ANALYSIS

EXISTING SYSTEM:

Characterized by the increasing arrival rate of live content, the emergency applications pose a great challenge: how to disseminate large-scale live content to interested users in a scalable and reliable manner. The publish/subscribe (pub/sub) model is widely used for data dissemination because of its capacity of seamlessly expanding the system to massive size. However, most event matching services of existing pub/sub systems either lead to low matching throughput when matching a large number of skewed subscriptions, or interrupt dissemination when a large number of servers fail.

Moreover, most existing event matching services cannot adapt to the sudden change of the arrival rate of live content, and generate a non-uniform distribution of load on the servers because of the skewness of the large-scale subscriptions. To this end, SEMAS, a scalable and elastic event matching service for attribute-based pub/sub systems in the cloud computing environment, has been proposed. SEMAS uses a one-hop lookup overlay to reduce the routing latency. Through a hierarchical multi-attribute space partition technique, SEMAS adaptively partitions the skewed subscriptions and maps them into balanced clusters to achieve high matching throughput.

The performance-aware detection scheme in SEMAS adaptively adjusts the scale of servers according to the churn of workloads, leading to high performance–price ratio. A prototype system on an OpenStack-based platform demonstrates that SEMAS has a linear increasing matching capacity as the number of servers and the partitioning granularity increase. It is able to elastically adjust the scale of servers and tolerate a large number of server failures with low latency and traffic overhead.

DISADVANTAGES:

Publish/Subscribe (pub/sub) is a commonly used asynchronous communication pattern among application components. Senders and receivers of messages are decoupled from each other and interact with an intermediary— a pub/sub system.

A receiver registers its interest in certain kinds of messages with the pub/sub system in the form of a subscription. Messages are published by senders to the pub/sub system. The system matches messages (i.e., publications) to subscriptions and delivers messages to interested subscribers using a notification mechanism.

There are several ways for subscriptions to specify messages of interest. In its simplest form messages are associated with topic strings and subscriptions are defined as patterns of the topic string. A more expressive form is attribute-based pub/sub where messages are further annotated with various attributes.

Subscriptions are expressed as predicates on the message topic and attributes. An even more general form is content-based pub/sub, where subscriptions can be arbitrary Boolean functions on the entire content of messages (e.g., XML documents), not limited to attributes.

Attribute based pub/sub strikes a balance between the simplicity and performance of topic-based pub/sub and the expressiveness of content-based pub/sub. Many large-scale and loosely coupled applications including stock quote distribution, network management, and environmental monitoring can be structured around a pub/sub messaging paradigm.
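The difference between topic-based and attribute-based subscriptions described above can be shown in a few lines. The sketch assumes shell-style wildcards for topic patterns and inclusive numeric ranges for attribute predicates; both choices, along with the sample message and the function names, are illustrative assumptions rather than the format of any particular pub/sub system.

```python
from fnmatch import fnmatchcase

def matches_topic(pattern, topic):
    """Topic-based matching: the subscription is a pattern over the topic string."""
    return fnmatchcase(topic, pattern)

def matches_attributes(predicates, attributes):
    """Attribute-based matching: every predicate (an inclusive range per
    attribute name) must hold for the message's attributes."""
    return all(
        name in attributes and low <= attributes[name] <= high
        for name, (low, high) in predicates.items()
    )

# Hypothetical stock quote message.
message = {
    "topic": "stocks/nyse/ibm",
    "attributes": {"price": 142.5, "volume": 3000},
}

topic_sub = "stocks/nyse/*"
attr_sub = {"price": (100.0, 150.0), "volume": (1000, 5000)}

print(matches_topic(topic_sub, message["topic"]))
print(matches_attributes(attr_sub, message["attributes"]))
```

The topic subscription can only select a whole category of messages, whereas the attribute subscription filters on values inside each message, which is the extra expressiveness the text refers to.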

PROPOSED SYSTEM:

We propose a scalable and reliable matching service for content-based pub/sub service in cloud computing environments, called SREM. Specifically, we mainly focus on two problems: one is how to organize servers in the cloud computing environment to achieve scalable and reliable routing. The other is how to manage subscriptions and events to achieve parallel matching among these servers. Generally speaking, we provide the following contributions:

  • We propose a distributed overlay protocol, called SkipCloud, to organize servers in the cloud computing environment. SkipCloud enables subscriptions and events to be forwarded among brokers in a scalable and reliable manner. It is also easy to implement and maintain.

  • To achieve scalable and reliable event matching among multiple servers, we propose a hybrid multidimensional space partitioning technique, called HPartition. It allows similar subscriptions to be assigned to the same server and provides multiple candidate matching servers for each event. Moreover, it adaptively alleviates hot spots and keeps the workload balanced among all servers.
  • We conduct extensive experiments on a CloudStack testbed to verify the performance of SREM under various parameter settings.
  • In order to take advantage of multiple distributed brokers, SREM divides the entire content space among the top clusters of SkipCloud, so that each top cluster only handles a subset of the entire space and searches a small number of candidate subscriptions. SREM employs a hybrid multidimensional space partitioning technique, called HPartition, to achieve scalable and reliable event matching.

ADVANTAGES:

To achieve reliable connectivity and low routing latency, these brokers are connected through a distributed overlay, called SkipCloud. The entire content space is partitioned into disjoint subspaces, each of which is managed by a number of brokers. Subscriptions and events are dispatched to the subspaces that are overlapping with them through SkipCloud.
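The dispatch of subscriptions and events to overlapping subspaces can be pictured with a uniform grid. This sketch is only an assumption-laden illustration and is not HPartition itself: the equal-width grid, the parameter `cells`, and all function names are hypothetical.

```python
from itertools import product

def cell_index(value, low, high, cells):
    """Map a value in [low, high] to one of `cells` equal-width intervals."""
    width = (high - low) / cells
    return min(int((value - low) / width), cells - 1)

def subspace_for_event(event, bounds, cells):
    """An event is a point, so it falls into exactly one subspace."""
    return tuple(cell_index(v, lo, hi, cells)
                 for v, (lo, hi) in zip(event, bounds))

def subspaces_for_subscription(ranges, bounds, cells):
    """A subscription is a box of ranges; return every subspace it overlaps."""
    per_dim = []
    for (s_lo, s_hi), (lo, hi) in zip(ranges, bounds):
        first = cell_index(s_lo, lo, hi, cells)
        last = cell_index(s_hi, lo, hi, cells)
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))

# Two attributes, each in [0, 100], split into 4 intervals per dimension.
bounds = [(0, 100), (0, 100)]
sub_cells = subspaces_for_subscription([(10, 30), (60, 70)], bounds, cells=4)
event_cell = subspace_for_event((20, 65), bounds, cells=4)
```

An event only needs to be matched against subscriptions stored in its own subspace, which is why the event's cell appearing in the subscription's cell list is sufficient for the two to meet at the same broker group.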

Since the pub/sub system needs to find all the matched subscribers, it requires each event to be matched in all datacenters, which leads to large traffic overhead with the increasing number of datacenters and the increasing arrival rate of live content.

Besides, it is hard to achieve workload balance among the servers of all datacenters due to the various skewed distributions of users’ interests. Another question is why we need a distributed overlay like SkipCloud to ensure reliable logical connectivity in a datacenter environment, where servers are more stable than the peers in P2P networks.

This is because, as the number of servers in datacenters increases, node failure becomes the norm rather than a rare exception. Node failure may lead to unreliable and inefficient routing among servers. To this end, we organize servers into SkipCloud to reduce the routing latency in a scalable and reliable manner.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk      –   20 GB
  • Floppy Drive       –    1.44 MB
  • Key Board      –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor      –    SVGA

SOFTWARE REQUIREMENTS:

  • Operating System :           Windows XP or Win7
  • Front End :           JAVA JDK 1.7
  • Back End :           MySQL Server
  • Server :           Apache Tomcat Server
  • Script :           JSP Script
  • Document :           MS-Office 2007

A PROFIT MAXIMIZATION SCHEME WITH GUARANTEED QUALITY OF SERVICE IN CLOUD COMPUTING

ABSTRACT:

As an effective and efficient way to provide computing resources and services to customers on demand, cloud computing has become more and more popular. From cloud service providers’ perspective, profit is one of the most important considerations, and it is mainly determined by the configuration of a cloud service platform under given market demand. However, a single long-term renting scheme is usually adopted to configure a cloud platform, which cannot guarantee the service quality and leads to serious resource waste.

In this paper, a double resource renting scheme is designed first, in which short-term renting and long-term renting are combined. This double renting scheme can effectively guarantee the quality of service of all requests and reduce the resource waste greatly.

Secondly, a service system is considered as an M/M/m+D queuing model and the performance indicators that affect the profit of our double renting scheme are analyzed, e.g., the average charge, the ratio of requests that need temporary servers, and so forth.

Thirdly, a profit maximization problem is formulated for the double renting scheme and the optimized configuration of a cloud platform is obtained by solving the profit maximization problem.

Finally, a series of calculations are conducted to compare the profit of our proposed scheme with that of the single renting scheme. The results show that our scheme can not only guarantee the service quality of all requests, but also obtain more profit than the latter.

INTRODUCTION

We aim at researching the multiserver configuration of a service provider such that its profit is maximized. As in any business, the profit of a service provider in cloud computing is related to two parts, which are the cost and the revenue. For a service provider, the cost is the renting cost paid to the infrastructure providers plus the electricity cost caused by energy consumption, and the revenue is the service charge to customers. In general, a service provider rents a certain number of servers from the infrastructure providers and builds different multiserver systems for different application domains. Each multiserver system executes a specific type of service requests and applications. Hence, the renting cost is proportional to the number of servers in a multiserver system. The power consumption of a multiserver system is linearly proportional to the number of servers and the server utilization, and to the square of execution speed. The revenue of a service provider is related to the amount of service and the quality of service. To summarize, the profit of a service provider is mainly determined by the configuration of its service platform. To configure a cloud service platform, a service provider usually adopts a single renting scheme.

However, the waiting time of the service requests cannot be too long. In order to satisfy quality-of-service requirements, the waiting time of each incoming service request should be limited within a certain range, which is determined by a service-level agreement (SLA). If the quality of service is guaranteed, the service is fully charged; otherwise, the service provider serves the request for free as a penalty for low quality. To obtain higher revenue, a service provider should rent more servers from the infrastructure providers or scale up the server execution speed to ensure that more service requests are processed with high service quality. However, doing this would lead to a sharp increase of the renting cost or the electricity cost. Such increased cost may offset the gain from penalty reduction. In conclusion, the single renting scheme is not a good scheme for service providers. In this paper, we propose a novel renting scheme for service providers, which not only can satisfy quality-of-service requirements, but also can obtain more profit.
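The relationship between revenue, cost, and profit described in this introduction can be sketched as follows. The sketch only encodes the proportionalities stated in the text; the coefficient names (`price_per_unit`, `rent_per_server`, `energy_coeff`) and the sample numbers are assumptions for illustration, not values from the paper.

```python
def revenue(requests, deadline, price_per_unit):
    """Charge a request in full if it waited within the SLA deadline, else nothing.

    requests: list of (service_amount, waiting_time) pairs.
    """
    total = 0.0
    for service_amount, waiting_time in requests:
        if waiting_time <= deadline:
            total += price_per_unit * service_amount
    return total

def cost(servers, utilization, speed, rent_per_server, energy_coeff):
    """Renting cost plus electricity cost.

    Renting is proportional to the number of servers; power is proportional
    to servers, utilization, and the square of the execution speed.
    """
    renting = rent_per_server * servers
    electricity = energy_coeff * servers * utilization * speed ** 2
    return renting + electricity

# Hypothetical workload: the second request waited too long and is served for free.
requests = [(2.0, 0.5), (1.0, 3.0), (4.0, 1.0)]
income = revenue(requests, deadline=2.0, price_per_unit=10.0)
spend = cost(servers=4, utilization=0.5, speed=1.0,
             rent_per_server=5.0, energy_coeff=2.0)
profit = income - spend
```

Adding servers or raising the speed lowers waiting times and so raises `income`, but both changes also raise `spend`, which is the trade-off the paragraph above describes.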

LITERATURE SURVEY

OPTIMAL MULTISERVER CONFIGURATION FOR PROFIT MAXIMIZATION IN CLOUD COMPUTING

AUTHOR: J. Cao, K. Hwang, K. Li, and A. Y. Zomaya,

PUBLICATION: IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 6, pp. 1087–1096, 2013.

EXPLANATION:

As cloud computing becomes more and more popular, understanding the economics of cloud computing becomes critically important. To maximize the profit, a service provider should understand both service charges and business costs, and how they are determined by the characteristics of the applications and the configuration of a multiserver system. The problem of optimal multiserver configuration for profit maximization in a cloud computing environment is studied. Our pricing model takes into consideration such factors as the amount of a service, the workload of an application environment, the configuration of a multiserver system, the service-level agreement, the satisfaction of a consumer, the quality of a service, the penalty of a low-quality service, the cost of renting, the cost of energy consumption, and a service provider’s margin and profit. Our approach is to treat a multiserver system as an M/M/m queuing model, such that our optimization problem can be formulated and solved analytically. Two server speed and power consumption models are considered, namely, the idle-speed model and the constant-speed model. The probability density function of the waiting time of a newly arrived service request is derived. The expected service charge to a service request is calculated. The expected net business gain in one unit of time is obtained. Numerical calculations of the optimal server size and the optimal server speed are demonstrated.
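For readers unfamiliar with the M/M/m model mentioned above, the textbook Erlang C formula gives the probability that an arriving request has to wait, and from it the mean waiting time. This is the standard queueing result, not the cited paper's full derivation of the waiting-time density; the function names are chosen here for illustration.

```python
from math import factorial

def erlang_c(arrival_rate, service_rate, servers):
    """Probability that an arriving request must wait in an M/M/m queue."""
    offered_load = arrival_rate / service_rate   # a = lambda / mu
    utilization = offered_load / servers         # rho = a / m
    if utilization >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    tail = offered_load ** servers / factorial(servers) / (1 - utilization)
    return tail / (head + tail)

def mean_waiting_time(arrival_rate, service_rate, servers):
    """Expected time a request spends in the queue before service starts."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate)

# Two servers, each serving one request per time unit, one arrival per time unit.
p_wait = erlang_c(arrival_rate=1.0, service_rate=1.0, servers=2)
w_q = mean_waiting_time(arrival_rate=1.0, service_rate=1.0, servers=2)
```

With a single server the waiting probability reduces to the utilization, which is a quick sanity check on the formula.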

PROFIT-DRIVEN SCHEDULING FOR CLOUD SERVICES WITH DATA ACCESS AWARENESS

AUTHOR: Y. C. Lee, C. Wang, A. Y. Zomaya, and B. B. Zhou

PUBLICATION: J. Parallel Distr. Com., vol. 72, no. 4, pp. 591–602, 2012

EXPLANATION:

Resource sharing between multiple tenants is a key rationale behind the cost effectiveness in the cloud. While this resource sharing greatly helps service providers improve resource utilization and increase profit, it impacts on the service quality (e.g., the performance of consumer applications). In this paper, we address the reconciliation of these conflicting objectives by scheduling service requests with the dynamic creation of service instances. Specifically, our scheduling algorithms attempt to maximize profit within the satisfactory level of service quality specified by the service consumer. Our contributions include (1) the development of a pricing model using processor-sharing for clouds (i.e., queuing delay is embedded in processing time), (2) the application of this pricing model to composite services with dependency consideration, (3) the development of two sets of service request scheduling algorithms, and (4) the development of a prioritization policy for data service aiming to maximize the profit of data service.

ENERGY AND PERFORMANCE MANAGEMENT OF GREEN DATA CENTERS: A PROFIT MAXIMIZATION APPROACH

AUTHOR: M. Ghamkhari and H. Mohsenian-Rad

PUBLICATION: IEEE Trans. Smart Grid, vol. 4, no. 2, pp. 1017–1025, 2013.

EXPLANATION:

While a large body of work has recently focused on reducing data center’s energy expenses, there exists no prior work on investigating the trade-off between minimizing data center’s energy expenditure and maximizing their revenue for various Internet and cloud computing services that they may offer. In this paper, we seek to tackle this shortcoming by proposing a systematic approach to maximize green data center’s profit, i.e., revenue minus cost. In this regard, we explicitly take into account practical service-level agreements (SLAs) that currently exist between data centers and their customers. Our model also incorporates various other factors such as availability of local renewable power generation at data centers and the stochastic nature of data centers’ workload. Furthermore, we propose a novel optimization-based profit maximization strategy for data centers for two different cases, without and with behind-the-meter renewable generators. We show that the formulated optimization problems in both cases are convex programs; therefore, they are tractable and appropriate for practical implementation. Using various experimental data and via computer simulations, we assess the performance of the proposed optimization-based profit maximization strategy and show that it significantly outperforms two comparable energy and performance management algorithms that are recently proposed in the literature.

SYSTEM ANALYSIS

EXISTING SYSTEM:

In existing work, the profit of service providers is related to many factors such as the price, the market demand, the system configuration, the customer satisfaction, and so forth. Service providers naturally wish to set a higher price to get a higher profit margin, but doing so would decrease the customer satisfaction, which leads to a risk of discouraging demand in the future. Hence, selecting a reasonable pricing strategy is important for service providers. The pricing strategies are divided into two categories, i.e., static pricing and dynamic pricing. Static pricing means that the price of a service request is fixed and known in advance, and it does not change with the conditions.

With dynamic pricing, a service provider delays the pricing decision until after the customer demand is revealed, so that the service provider can adjust prices accordingly. Static pricing is the dominant strategy, which is widely used in the real world and in research. Ghamkhari et al. adopted a flat-rate pricing strategy and set a fixed price for all requests, but Odlyzko argued that the predominant flat-rate pricing encourages waste and is incompatible with service differentiation. Another kind of static pricing strategy is usage-based pricing. For example, the price of a service request is proportional to the service time and task execution requirement.

DISADVANTAGES:

  • Many existing studies consider only the power consumption cost. As a major difference between their models and ours, the resource rental cost is considered in this paper as well, since it is a major part which affects the profit of service providers.
  • The traditional single resource renting scheme cannot guarantee the quality of all requests and wastes a great amount of resources due to the uncertainty of system workload. To overcome this weakness, we propose a double renting scheme, which not only can guarantee the quality of service completely but also can reduce the resource waste greatly.

PROPOSED SYSTEM:

In this paper, we propose a novel renting scheme for service providers, which not only can satisfy quality-of-service requirements, but also can obtain more profit. Our contributions in this paper can be summarized as follows.

A novel double renting scheme is proposed for service providers. It combines long-term renting with short-term renting, which can not only satisfy quality-of-service requirements under the varying system workload, but also reduce the resource waste greatly.

A multiserver system adopted in our paper is modeled as an M/M/m+D queuing model and the performance indicators are analyzed, such as the average service charge, the ratio of requests that need short-term servers, and so forth.

The optimal configuration problem of service providers for profit maximization is formulated and two kinds of optimal solutions, i.e., the ideal solutions and the actual solutions, are obtained respectively.

A series of comparisons are given to verify the performance of our scheme. The results show that the proposed Double-Quality-Guaranteed (DQG) renting scheme can achieve more profit than the compared Single-Quality-Unguaranteed (SQU) renting scheme on the premise of guaranteeing the service quality completely.

In this paper, to overcome the shortcomings mentioned above, a double renting scheme is designed to configure a cloud service platform, which can guarantee the service quality of all requests and reduce the resource waste greatly. Moreover, a profit maximization problem is formulated and solved to get the optimal multiserver configuration, which can produce more profit than the optimal configuration of the single renting scheme.

ADVANTAGES:

  • We first propose the Double-Quality-Guaranteed (DQG) resource renting scheme, which combines long-term renting with short-term renting. The main computing capacity is provided by the long-term rented servers due to their low price. The short-term rented servers provide the extra capacity in peak periods.
  • In the proposed system, the Double-Quality-Guaranteed (DQG) renting scheme achieves more profit than the compared Single-Quality-Unguaranteed (SQU) renting scheme on the premise of guaranteeing the service quality completely.
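The idea of combining long-term and short-term renting can be sketched as a small first-come, first-served simulation. This is a simplified illustration under stated assumptions, not the paper's M/M/m+D analysis: service times are assumed known on arrival, a request whose wait would exceed the deadline is handed to a temporary server exactly at the deadline, and the function name is hypothetical.

```python
import heapq

def simulate_double_renting(requests, long_term_servers, deadline):
    """Serve requests first-come, first-served on long-term servers.

    requests: list of (arrival_time, service_time), sorted by arrival time.
    Returns the waiting time of each request and the number of requests
    that needed a temporary (short-term) server.
    """
    free_at = [0.0] * long_term_servers   # times at which servers become free
    heapq.heapify(free_at)
    waits, temporary = [], 0
    for arrival, service in requests:
        start = max(arrival, free_at[0])
        if start - arrival > deadline:
            # Waiting would exceed the deadline: rent a short-term server.
            temporary += 1
            waits.append(deadline)
        else:
            heapq.heapreplace(free_at, start + service)
            waits.append(start - arrival)
    return waits, temporary

# One long-term server and a deadline of 1.0 time unit.
requests = [(0.0, 5.0), (1.0, 2.0), (4.5, 1.0)]
waits, temporary = simulate_double_renting(requests, long_term_servers=1,
                                           deadline=1.0)
```

No request waits longer than the deadline, so every request can be charged in full, while temporary servers are paid for only when the long-term capacity is exhausted.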

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk      –   20 GB
  • Floppy Drive         –    1.44 MB
  • Key Board      –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor      –    SVGA

SOFTWARE REQUIREMENTS:

JAVA

  • Operating System :           Windows XP, Win7 or Win8
  • Front End :           JAVA JDK 1.7
  • Back End :           MySQL Server
  • Server :           Apache Tomcat Server
  • Script :           JSP Script
  • Document :           MS-Office 2007

A HYBRID CLOUD APPROACH FOR SECURE AUTHORIZED DEDUPLICATION

ABSTRACT:

Data deduplication is one of the important data compression techniques for eliminating duplicate copies of repeating data, and has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. To protect the confidentiality of sensitive data while supporting deduplication, the convergent encryption technique has been proposed to encrypt the data before outsourcing. To better protect data security, this paper makes the first attempt to formally address the problem of authorized data deduplication. Different from traditional deduplication systems, the differential privileges of users are further considered in duplicate check besides the data itself. We also present several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture. Security analysis demonstrates that our scheme is secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement a prototype of our proposed authorized duplicate check scheme and conduct testbed experiments using our prototype. We show that our proposed authorized duplicate check scheme incurs minimal overhead compared to normal operations.

INTRODUCTION

Cloud computing provides seemingly unlimited “virtualized” resources to users as services across the whole Internet, while hiding platform and implementation details. Today’s cloud service providers offer both highly available storage and massively parallel computing resources at relatively low costs. As cloud computing becomes prevalent, an increasing amount of data is being stored in the cloud and shared by users with specified privileges, which define the access rights of the stored data. One critical challenge of cloud storage services is the management of the ever-increasing volume of data. To make data management scalable in cloud computing, deduplication has been a well-known technique and has attracted more and more attention recently. Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data in storage.

The technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. Instead of keeping multiple data copies with the same content, deduplication eliminates redundant data by keeping only one physical copy and referring other redundant data to that copy. Deduplication can take place at either the file level or the block level. For file level deduplication, it eliminates duplicate copies of the same file. Deduplication can also take place at the block level, which eliminates duplicate blocks of data that occur in non-identical files. Although data deduplication brings a lot of benefits, security and privacy concerns arise as users’ sensitive data are susceptible to both insider and outsider attacks. Traditional encryption, while providing data confidentiality, is incompatible with data deduplication. Specifically, traditional encryption requires different users to encrypt their data with their own keys.

Thus, identical data copies of different users will lead to different ciphertexts, making deduplication impossible. Convergent encryption has been proposed to enforce data confidentiality while making deduplication feasible. It encrypts and decrypts a data copy with a convergent key, which is obtained by computing the cryptographic hash value of the content of the data copy. After key generation and data encryption, users retain the keys and send the ciphertext to the cloud. Since the encryption operation is deterministic and is derived from the data content, identical data copies will generate the same convergent key and hence the same ciphertext. To prevent unauthorized access, a secure proof of ownership protocol is also needed to provide the proof that the user indeed owns the same file when a duplicate is found. After the proof, subsequent users with the same file will be provided a pointer from the server without needing to upload the same file. A user can download the encrypted file with the pointer from the server, which can only be decrypted by the corresponding data owners with their convergent keys.
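The convergent encryption idea above can be sketched with the standard library alone. The key derivation (hash of the content) follows the description in the text; everything else is an assumption for illustration. In particular, the XOR keystream built from SHA-256 is a toy cipher that is NOT secure and merely stands in for a real deterministic cipher, the class `DedupStore` is hypothetical, and the proof of ownership step is omitted.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    """Derive the key from the content itself, as convergent encryption does."""
    return hashlib.sha256(data).digest()

def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (illustration only, not a secure cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, data: bytes) -> bytes:
    """Deterministic XOR encryption; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

decrypt = encrypt  # XOR with the same keystream inverts the operation

class DedupStore:
    """Hypothetical cloud store that keeps one copy per distinct ciphertext."""
    def __init__(self):
        self.blobs = {}

    def upload(self, ciphertext: bytes):
        tag = hashlib.sha256(ciphertext).hexdigest()
        duplicate = tag in self.blobs
        if not duplicate:
            self.blobs[tag] = ciphertext
        return tag, duplicate

# Two users hold the same file and encrypt it independently.
store = DedupStore()
file_data = b"quarterly report"
tag_a, dup_a = store.upload(encrypt(convergent_key(file_data), file_data))
tag_b, dup_b = store.upload(encrypt(convergent_key(file_data), file_data))
recovered = decrypt(convergent_key(file_data), store.blobs[tag_b])
```

Because both users derive the same key from the same content, their ciphertexts are identical and the second upload is recognised as a duplicate without the store ever seeing the plaintext.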

Thus, convergent encryption allows the cloud to perform deduplication on the ciphertexts, and the proof of ownership prevents the unauthorized user from accessing the file. However, previous deduplication systems cannot support differential authorization duplicate check, which is important in many applications. In such an authorized deduplication system, each user is issued a set of privileges during system initialization (in Section 3, we elaborate the definition of a privilege with examples). Each file uploaded to the cloud is also bound by a set of privileges to specify which kinds of users are allowed to perform the duplicate check and access the files. Before submitting his duplicate check request for some file, the user needs to take this file and his own privileges as inputs.

The user is able to find a duplicate for this file if and only if there is a copy of this file and a matched privilege stored in the cloud. For example, in a company, many different privileges will be assigned to employees. In order to save cost and manage data efficiently, the data will be moved to the storage server provider (SCSP) in the public cloud with specified privileges, and the deduplication technique will be applied to store only one copy of the same file. Because of privacy considerations, some files will be encrypted, and the duplicate check will be allowed only for employees with specified privileges in order to realize access control. Traditional deduplication systems based on convergent encryption, although providing confidentiality to some extent, do not support the duplicate check with differential privileges. In other words, no differential privileges have been considered in the deduplication based on convergent encryption technique. It seems contradictory to realize both deduplication and differential authorization duplicate check at the same time.
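One way to picture a duplicate check that depends on both the file and the privilege is to bind the two into a token. The sketch below uses an HMAC of the file hash under a per-privilege key; this construction, the class `AuthorizedDedupIndex`, and the sample keys are assumptions made for illustration and should not be read as the paper's exact scheme, which also involves a private cloud that manages the privilege keys.

```python
import hashlib
import hmac

def file_token(data: bytes, privilege_key: bytes) -> str:
    """Token that depends on both the file content and one privilege key."""
    digest = hashlib.sha256(data).digest()
    return hmac.new(privilege_key, digest, hashlib.sha256).hexdigest()

class AuthorizedDedupIndex:
    """Hypothetical index of tokens for files already stored in the cloud."""
    def __init__(self):
        self.tokens = set()

    def register(self, data: bytes, privilege_keys):
        """Record that the file is stored under the given privileges."""
        for key in privilege_keys:
            self.tokens.add(file_token(data, key))

    def has_duplicate(self, data: bytes, user_privilege_keys) -> bool:
        """True only if the file exists AND one of the user's privileges matches."""
        return any(file_token(data, key) in self.tokens
                   for key in user_privilege_keys)

# Hypothetical privilege keys for two roles in a company.
manager_key, engineer_key = b"manager-secret", b"engineer-secret"
index = AuthorizedDedupIndex()
index.register(b"salary table", [manager_key])

manager_sees = index.has_duplicate(b"salary table", [manager_key])
engineer_sees = index.has_duplicate(b"salary table", [engineer_key])
other_file = index.has_duplicate(b"meeting notes", [manager_key])
```

A user without the matching privilege cannot even learn that the file is already stored, which is the "if and only if" condition stated above.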

A DISTRIBUTED THREE-HOP ROUTING PROTOCOL TO INCREASE THE CAPACITY OF HYBRID WIRELESS NETWORKS

ABSTRACT:

Hybrid wireless networks combining the advantages of both mobile ad-hoc networks and infrastructure wireless networks have been receiving increased attention due to their ultra-high performance. An efficient data routing protocol is important in such networks for high network capacity and scalability. However, most routing protocols for these networks simply combine the ad-hoc transmission mode with the cellular transmission mode, which inherits the drawbacks of ad-hoc transmission.

This paper presents a Distributed Three-hop Routing protocol (DTR) for hybrid wireless networks. To take full advantage of the widespread base stations, DTR divides a message data stream into segments and transmits the segments in a distributed manner. It makes full spatial reuse of a system via its high speed ad-hoc interface and alleviates mobile gateway congestion via its cellular interface. Furthermore, sending segments to a number of base stations simultaneously increases throughput and makes full use of widespread base stations.

DTR significantly reduces overhead due to short path lengths and the elimination of route discovery and maintenance. DTR also has a congestion control algorithm to avoid overloading base stations. Theoretical analysis and simulation results show the superiority of DTR in comparison with other routing protocols in terms of throughput capacity, scalability, and mobility resilience. The results also show the effectiveness of the congestion control algorithm in balancing the load between base stations.

INTRODUCTION:

Wireless networks including infrastructure wireless networks and mobile ad-hoc networks (MANETs) have attracted significant research interest. The growing desire to increase wireless network capacity for high performance applications has stimulated the development of hybrid wireless networks. A hybrid wireless network consists of both an infrastructure wireless network and a mobile ad-hoc network. Wireless devices such as smart-phones, tablets and laptops have both an infrastructure interface and an ad-hoc interface. As the number of such devices has been increasing sharply in recent years, a hybrid transmission structure will be widely used in the near future. Such a structure synergistically combines the inherent advantages and overcomes the disadvantages of the infrastructure wireless networks and mobile ad-hoc networks. In a mobile ad-hoc network, with the absence of a central control infrastructure, data is routed to its destination through the intermediate nodes in a multi-hop manner. The multi-hop routing needs on-demand route discovery or route maintenance.

Since the messages are transmitted in wireless channels and through dynamic routing paths, mobile ad-hoc networks are not as reliable as infrastructure wireless networks. Furthermore, because of the multi-hop transmission feature, mobile ad-hoc networks are only suitable for local area data transmission. The infrastructure wireless network (e.g., cellular network) is the major means of wireless communication in our daily lives. It excels at inter-cell communication (i.e., communication between nodes in different cells) and Internet access. It makes possible the support of universal network connectivity and ubiquitous computing by integrating all kinds of wireless devices into the network. In an infrastructure network, nodes communicate with each other through base stations (BSes).

A hybrid wireless network synergistically combines an infrastructure wireless network and a mobile ad-hoc network to leverage their advantages and overcome their shortcomings, and finally increases the throughput capacity of a wide-area wireless network. A routing protocol is a critical component that affects the throughput capacity of a wireless network in data transmission. Most current routing protocols in hybrid wireless networks simply combine the cellular transmission mode (i.e., BS transmission mode) in infrastructure wireless networks and the ad-hoc transmission mode in mobile ad-hoc networks. That is, as shown in Fig. 1a, the protocols use the multi-hop routing to forward a message to the mobile gateway nodes that are closest to the BSes or have the highest bandwidth to the BSes. The bandwidth of a channel is the maximum throughput (i.e., transmission rate in bits/s) that can be achieved. The mobile gateway nodes then forward the messages to the BSes, functioning as bridges to connect the ad-hoc network and the infrastructure network.

Since BSes are connected with a wired backbone, we assume that there are no bandwidth and power constraints on transmissions between BSes. We use intermediate nodes to denote relay nodes that function as gateways connecting an infrastructure wireless network and a mobile ad-hoc network. We assume every mobile node is dual-mode; that is, it has an ad-hoc network interface such as a WLAN radio interface and an infrastructure network interface such as a 3G cellular interface. DTR aims to shift the routing burden from the ad-hoc network to the infrastructure network by taking advantage of widespread base stations in a hybrid wireless network. Rather than using one multi-hop path to forward a message to one BS, DTR uses at most two hops to relay the segments of a message to different BSes in a distributed manner, and relies on BSes to combine the segments.

We simplify the routings in the infrastructure network for clarity. As shown in the figure, when a source node wants to transmit a message stream to a destination node, it divides the message stream into a number of partial streams called segments and transmits each segment to a neighbor node. Upon receiving a segment from the source node, a neighbor node locally decides between direct transmission and relay transmission based on the QoS requirement of the application. The neighbor nodes forward these segments in a distributed manner to nearby BSes. Relying on the infrastructure network routing, the BSes further transmit the segments to the BS where the destination node resides. The final BS rearranges the segments into the original order and forwards the segments to the destination. It uses the cellular IP transmission method [30] to send segments to the destination if the destination moves to another BS during segment transmission.
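The segment handling described above (split at the source, spread over neighbors, reorder at the final BS) can be sketched as follows. The fixed segment size, the round-robin assignment, and the function names are assumptions for illustration; the source text does not specify how segments are sized or distributed among neighbors.

```python
def split_into_segments(message: bytes, segment_size: int):
    """Cut the message into numbered segments so they can be reordered later."""
    return [(seq, message[start:start + segment_size])
            for seq, start in enumerate(range(0, len(message), segment_size))]

def assign_to_neighbors(segments, neighbors):
    """Spread segments over neighbor nodes (round-robin is an assumption here)."""
    plan = {name: [] for name in neighbors}
    for index, segment in enumerate(segments):
        plan[neighbors[index % len(neighbors)]].append(segment)
    return plan

def reassemble(received):
    """At the final BS: restore the original order and join the payloads."""
    return b"".join(payload for _, payload in sorted(received))

message = b"hybrid wireless networks"
segments = split_into_segments(message, segment_size=5)
plan = assign_to_neighbors(segments, ["n1", "n2", "n3"])

# Segments may arrive at the destination BS in any order; reverse them here.
arrived = list(reversed(segments))
restored = reassemble(arrived)
```

The sequence number attached to each segment is what lets the final BS rebuild the stream even though the segments travelled through different neighbors and base stations.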

LITERATURE SURVEY:

OPTIMAL MULTI-HOP CELLULAR ARCHITECTURE FOR WIRELESS COMMUNICATIONS

AUTHORS: Y. H. Tam, H. S. Hassanein, S. G. Akl, and R. Benkoczi

PUBLISH: Proc. Local Comput. Netw., 2006, pp. 738–745.

EXPLANATION:

Multi-hop relaying is an important concept in future generation wireless networks. It can address the inherent problems of limited capacity and coverage in cellular networks. However, most multi-hop relaying architectures are designed based on a small fixed-cell-size and a dense network. In a sparse network, the throughput and call acceptance ratio degrades because distant mobile nodes cannot reach the base station to use the available capacity. In addition, a fixed-cell-size cannot adapt to the dynamic changes of traffic pattern and network topology. In this paper, we propose a novel multi-hop relaying architecture called the adaptive multi-hop cellular architecture (AMC). AMC adapts the cell size to an optimal value that maximizes throughput by taking into account the dynamic changes of network density, traffic patterns, and network topology. To the best of our knowledge, this is the first time that adaptive (or optimal) cell size is accounted for in a multi-hop cellular environment. AMC also achieves the design goals of a good multi-hop relaying architecture. Simulation results show that AMC outperforms a fixed-cell-size multi-hop cellular architecture and a single-hop case in terms of data throughput, and call acceptance ratio.

COOPERATIVE PACKET DELIVERY IN HYBRID WIRELESS MOBILE NETWORKS: A COALITIONAL GAME APPROACH

AUTHORS: K. Akkarajitsakul, E. Hossain, and D. Niyato

PUBLISH: IEEE Trans. Mobile Comput., vol. 12, no. 5, pp. 840–854, May 2013

EXPLANATION:

We consider the problem of cooperative packet delivery to mobile nodes in a hybrid wireless mobile network, where both infrastructure-based and infrastructure-less (i.e., ad hoc mode or peer-to-peer mode) communications are used. We propose a solution based on a coalition formation among mobile nodes to cooperatively deliver packets among these mobile nodes in the same coalition. A coalitional game is developed to analyze the behavior of the rational mobile nodes for cooperative packet delivery. A group of mobile nodes makes a decision to join or to leave a coalition based on their individual payoffs. The individual payoff of each mobile node is a function of the average delivery delay for packets transmitted to the mobile node from a base station and the cost incurred by this mobile node for relaying packets to other mobile nodes. To find the payoff of each mobile node, a Markov chain model is formulated and the expected cost and packet delivery delay are obtained when the mobile node is in a coalition. Since both the expected cost and packet delivery delay depend on the probability that each mobile node will help other mobile nodes in the same coalition to forward packets to the destination mobile node in the same coalition, a bargaining game is used to find the optimal helping probabilities. After the payoff of each mobile node is obtained, we find the solutions of the coalitional game which are the stable coalitions. A distributed algorithm is presented to obtain the stable coalitions and a Markov-chain-based analysis is used to evaluate the stable coalitional structures obtained from the distributed algorithm. Performance evaluation results show that when the stable coalitions are formed, the mobile nodes achieve a nonzero payoff (i.e., utility is higher than the cost). With a coalition formation, the mobile nodes achieve higher payoff than that when each mobile node acts alone.

EFFICIENT RESOURCE ALLOCATION IN HYBRID WIRELESS NETWORKS

AUTHORS: B. Bengfort, W. Zhang, and X. Du

PUBLISH: Proc. Wireless Commun. Netw. Conf., 2011, pp. 820–825.

EXPLANATION:

In this paper, we study an emerging type of wireless network – Hybrid Wireless Networks (HWNs). An HWN consists of an infrastructure wireless network (e.g., a cellular network) and several ad hoc nodes (such as a mobile ad hoc network). Forming an HWN is a very cost-effective way to improve wireless coverage and the available bandwidth to users. Specifically, in this work we investigate the issue of bandwidth allocation in multi-hop HWNs. We propose three efficient bandwidth allocation schemes for HWNs: top-down, bottom-up, and auction-based allocation schemes. In order to evaluate the bandwidth allocation schemes, we develop a simulated HWN environment. Our simulation results show that the proposed schemes achieve good performance: the schemes can achieve maximum revenue/utility in many cases, while also providing fairness. We also show that each of the schemes has merit in different application scenarios.

SYSTEM ANALYSIS

EXISTING SYSTEM:

The existing Two-hop transmission protocol eliminates route maintenance and limits the number of hops in routing. In Two-hop, when a node’s bandwidth to a BS is larger than that of each neighbor, it directly sends a message to the BS. Otherwise, it chooses a neighbor with a higher channel bandwidth and sends a message to it, which further forwards the message to the BS. A distributed transmission involving multiple cells makes full use of system resources and dynamically balances the traffic load between neighboring cells. In contrast, Two-hop employs single-path transmission.
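The Two-hop forwarding rule described above can be sketched as follows. This is a minimal illustration, not the protocol's actual implementation: the function name `two_hop_route`, the bandwidth table, and the tie-breaking choice (a tie is treated as direct transmission) are assumptions made for the example.

```python
from typing import Dict, List


def two_hop_route(source: str,
                  bandwidth_to_bs: Dict[str, float],
                  neighbors: List[str]) -> List[str]:
    """Return the hop sequence chosen by the Two-hop rule (hypothetical helper).

    bandwidth_to_bs maps each node name to its bandwidth towards the BS.
    """
    own = bandwidth_to_bs[source]
    # Neighbor with the highest bandwidth to the BS (None if there are no neighbors).
    best = max(neighbors, key=lambda n: bandwidth_to_bs[n], default=None)
    # Assumption: on a tie the source sends directly, since no neighbor is strictly better.
    if best is None or own >= bandwidth_to_bs[best]:
        return [source, "BS"]
    # Otherwise relay through the best neighbor, which forwards to the BS.
    return [source, best, "BS"]


if __name__ == "__main__":
    table = {"s": 2.0, "a": 3.0, "b": 4.0}
    print(two_hop_route("s", table, ["a", "b"]))
```

Whatever the bandwidth values, the rule returns exactly one path per message, which is the single-path behavior noted above.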

Direct combination of the two transmission modes inherits the following problems that are rooted in the ad-hoc transmission mode. 

High overhead: Route discovery and maintenance incur high overhead. The wireless random access medium access control (MAC) required in mobile ad-hoc networks, which utilizes control handshaking and a back-off mechanism, further increases overhead. 

Hot spots: The mobile gateway nodes can easily become hot spots. The RTS-CTS random access, in which most traffic goes through the same gateway, and the flooding employed in mobile ad-hoc routing to discover routes may exacerbate the hot spot problem. In addition, mobile nodes only use the channel resources in their route direction, which may generate hot spots while leaving resources in other directions under-utilized. Hot spots lead to low transmission rates, severe network congestion, and high data dropping rates.

Low reliability: Dynamic and long routing paths lead to unreliable routing. Noise interference and neighbor interference during the multi-hop transmission process cause a high data drop rate. Long routing paths increase the probability of the occurrence of path breakdown due to the highly dynamic nature of wireless ad-hoc networks.

DISADVANTAGES:

  • Route discovery and maintenance incur high overhead.
  • The mobile gateway nodes can easily become hot spots.
  • Dynamic and long routing paths lead to unreliable routing.
  • Noise interference and neighbor interference during the multi-hop transmission process cause a high data drop rate.
  • Long routing paths increase the probability of the occurrence of path breakdown due to the highly dynamic nature of wireless ad-hoc networks.

PROPOSED SYSTEM:

We propose a Distributed Three-hop Data Routing protocol (DTR). In DTR, as shown in Fig. 1b, a source node divides a message stream into a number of segments. Each segment is sent to a neighbor mobile node. Based on the QoS requirement, these mobile relay nodes choose between direct transmission and relay transmission to the BS. In relay transmission, a segment is forwarded to another mobile node with higher capacity to a BS than the current node. In direct transmission, a segment is directly forwarded to a BS. In the infrastructure, the segments are rearranged in their original order and sent to the destination. The number of routing hops in DTR is confined to three, including at most two hops in the ad-hoc transmission mode and one hop in the cellular transmission mode. To overcome the aforementioned shortcomings, DTR tries to limit the number of hops. The first hop forwarding distributes the segments of a message in different directions to fully utilize the resources, and the possible second hop forwarding ensures the high capacity of the forwarder.
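The segment distribution and relay decision described above can be illustrated with a small sketch. It rests on stated assumptions: segments are spread over the source's neighbors round-robin, the QoS-based choice is reduced to a plain capacity comparison, and all names (`split_message`, `relay_path`, `plan_transmission`, `reassemble`) are hypothetical, not taken from the DTR design.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, bytes]  # (sequence number, payload)


def split_message(message: bytes, segment_size: int) -> List[Segment]:
    """Divide a message into numbered segments so they can be reordered later."""
    return [(seq, message[offset:offset + segment_size])
            for seq, offset in enumerate(range(0, len(message), segment_size))]


def relay_path(first_hop: str,
               capacity_to_bs: Dict[str, float],
               neighbors_of: Dict[str, List[str]]) -> List[str]:
    """Choose direct or relay transmission for the node holding a segment."""
    best = max(neighbors_of.get(first_hop, []),
               key=lambda n: capacity_to_bs[n], default=None)
    # Relay only when a neighbor has strictly higher capacity to a BS.
    if best is not None and capacity_to_bs[best] > capacity_to_bs[first_hop]:
        return [first_hop, best, "BS"]
    return [first_hop, "BS"]


def plan_transmission(message: bytes, segment_size: int,
                      source_neighbors: List[str],
                      capacity_to_bs: Dict[str, float],
                      neighbors_of: Dict[str, List[str]]) -> Dict[int, List[str]]:
    """Map each segment number to its path (at most three hops)."""
    plan = {}
    for seq, _payload in split_message(message, segment_size):
        # Assumption: round-robin distribution of segments over the neighbors.
        first_hop = source_neighbors[seq % len(source_neighbors)]
        plan[seq] = ["source"] + relay_path(first_hop, capacity_to_bs, neighbors_of)
    return plan


def reassemble(segments: List[Segment]) -> bytes:
    """Restore the original order, as the infrastructure does before delivery."""
    return b"".join(payload for _seq, payload in sorted(segments))
```

Every path produced this way has at most two ad-hoc hops plus one cellular hop, matching the three-hop bound stated above.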

DTR also has a congestion control algorithm to balance the traffic load between the nearby BSes in order to avoid traffic congestion at BSes. Using self-adaptive and distributed routing with high speed and short-path ad-hoc transmission, DTR significantly increases the throughput capacity and scalability of hybrid wireless networks by overcoming the three shortcomings of the previous routing algorithms.

It has the following features:  

  • Low overhead: It eliminates overhead caused by route discovery and maintenance in the ad-hoc transmission mode, especially in a dynamic environment.
  • Hot spot reduction: It alleviates traffic congestion at mobile gateway nodes while making full use of channel resources through a distributed multi-path relay.
  • High reliability: Because of its small hop path length with a short physical distance in each step, it alleviates noise and neighbor interference and avoids the adverse effect of route breakdown during data transmission. Thus, it reduces the packet drop rate and makes full use of spatial reuse, in which several source and destination nodes can communicate simultaneously without interference.

ADVANTAGES:

  • DTR eliminates overhead caused by route discovery and maintenance in the ad-hoc transmission mode, especially in a dynamic environment.
  • DTR alleviates traffic congestion at mobile gateway nodes while making full use of channel resources through a distributed multi-path relay.
  • Because of its small hop path length with a short physical distance in each step, it alleviates noise and neighbor interference and avoids the adverse effect of route breakdown during data transmission.
  • DTR reduces the packet drop rate and makes full use of spatial reuse, in which several source and destination nodes can communicate simultaneously without interference.
  • The network achieves high throughput performance.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk      –   20 GB
  • Floppy Drive       –    1.44 MB
  • Key Board      –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor      –    SVGA

SOFTWARE REQUIREMENTS:

JAVA

  • Operating System        :           Windows XP or Win7
  • Front End       :           JAVA JDK 1.7
  • Back End :           MYSQL Server
  • Server :           Apache Tomcat Server
  • Script :           JSP Script
  • Document :           MS-Office 2007

A DISTORTION-RESISTANT ROUTING FRAMEWORK FOR VIDEO TRAFFIC IN WIRELESS MULTIHOP NETWORKS

ABSTRACT:

Traditional routing metrics designed for wireless networks are application agnostic. In this paper, we consider a wireless network where the application flows consist of video traffic. From a user perspective, reducing the level of video distortion is critical. We ask the question “Should the routing policies change if the end-to-end video distortion is to be minimized?” Popular link-quality-based routing metrics (such as ETX) do not account for dependence (in terms of congestion) across the links of a path; as a result, they can cause video flows to converge onto a few paths and, thus, cause high video distortion. To account for the evolution of the video frame loss process, we construct an analytical framework to, first, understand and, second, assess the impact of the wireless network on video distortion. The framework allows us to formulate a routing policy for minimizing distortion, based on which we design a protocol for routing video traffic. We find via simulations and testbed experiments that our protocol is efficient in reducing video distortion and minimizing the user experience degradation.

INTRODUCTION

With the advent of smart phones, video traffic has become very popular in wireless networks. In tactical networks or disaster recovery, one can envision the transfer of video clips to facilitate mission management. From a user perspective, maintaining a good quality of the transferred video is critical. The video quality is affected by: 1) the distortion due to compression at the source, and 2) the distortion due to both wireless channel induced errors and interference. Video encoding standards, like MPEG-4 [1] or H.264/AVC, define groups of I-, P-, and B-type frames that provide different levels of encoding and, thus, protection against transmission losses. In particular, the different levels of encoding refer to: 1) either information encoded independently, in the case of I-frames, or 2) encoding relative to the information encoded within other frames, as is the case for P- and B-frames.

This Group of Pictures (GOP) allows for the mapping of frame losses into a distortion metric that can be used to assess the application-level performance of video transmissions. One of the critical functionalities that is often neglected, but affects the end-to-end quality of a video flow, is routing. Typical routing protocols, designed for wireless multihop settings, are application-agnostic and do not account for correlation of losses on the links that compose a route from a source to a destination node. Furthermore, since flows are considered independently, they can converge onto certain links that then become heavily loaded (thereby increasing video distortion), while others are significantly underutilized. The decisions made by such routing protocols are based on only network (and not application) parameters.

Our thesis is that the user-perceived video quality can be significantly improved by accounting for application requirements, and specifically the video distortion experienced by a flow, end-to-end. Typically, the schemes used to encode a video clip can accommodate a certain number of packet losses per frame. However, if the number of lost packets in a frame exceeds a certain threshold, the frame cannot be decoded correctly. A frame loss will result in some amount of distortion. The value of distortion at a hop along the path from the source to the destination depends on the positions of the unrecoverable video frames (simply referred to as frames) in the GOP, at that hop. As one of our main contributions, we construct an analytical model to characterize the dynamic behavior of the process that describes the evolution of frame losses in the GOP (instead of just focusing on a network quality metric such as the packet-loss probability) as video is delivered on an end-to-end path. Specifically, with our model, we capture how the choice of path for an end-to-end flow affects the performance of a flow in terms of video distortion.

Our model is built based on a multilayer approach in which the packet-loss probability on a link is mapped to the probability of a frame loss in the GOP. The frame-loss probability is then directly associated with the video distortion metric. By using the above mapping from the network-specific property (i.e., packet-loss probability) to the application-specific quality metric (i.e., video distortion), we pose the problem of routing as an optimization problem where the objective is to find the path from the source to the destination that minimizes the end-to-end distortion. In our formulation, we explicitly take into account the history of losses in the GOP along the path. This is in stark contrast with traditional routing metrics (such as the total expected transmission count (ETX)), wherein the links are treated independently.
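To make the mapping from packet-loss probability to frame-loss probability concrete, here is a minimal sketch. It assumes that packet losses within a frame are independent and that a frame is lost once more than a fixed number of its packets are lost; both the binomial model and the function name `frame_loss_probability` are assumptions of this example, not the analytical model of the paper.

```python
from math import comb


def frame_loss_probability(packets_per_frame: int,
                           max_tolerable_losses: int,
                           packet_loss_prob: float) -> float:
    """Probability that a frame cannot be decoded (hypothetical helper).

    Assumes independent packet losses: the frame is decodable when at most
    max_tolerable_losses of its packets_per_frame packets are lost.
    """
    n, k, p = packets_per_frame, max_tolerable_losses, packet_loss_prob
    decodable = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
    return 1.0 - decodable


if __name__ == "__main__":
    # A 10-packet frame that tolerates 2 lost packets, on a link losing 5% of packets.
    print(frame_loss_probability(10, 2, 0.05))
```

Under this simplified model, a frame that tolerates no losses is lost with probability 1 - (1 - p)^n, so longer frames are more fragile on the same link.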

Our solution to the problem is based on a dynamic programming approach that effectively captures the evolution of the frame-loss process. We then design a practical routing protocol, based on the above solution, to minimize routing distortion. In a nutshell, since the loss of the longer I-frames that carry fine-grained information affects the distortion metric more, our approach ensures that these frames are carried on the paths that experience the least congestion; the latter frames in a GOP are sent out on relatively more congested paths. Our routing scheme is optimized for transferring video clips on wireless networks with minimum video distortion. Since optimizing for video streaming is not an objective of our scheme, constraints relating to time (such as jitter) are not directly taken into account in the design.
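The intuition in the paragraph above (earlier frames of a GOP travel on less congested paths) can be sketched with a simple heuristic. This is not the dynamic programming solution of the paper; the proportional mapping, the congestion scores, and the name `assign_frames_to_paths` are assumptions made purely for illustration.

```python
from typing import Dict, List


def assign_frames_to_paths(gop: List[str],
                           path_congestion: Dict[str, float]) -> Dict[str, str]:
    """Map each frame of a GOP to a path (hypothetical heuristic).

    gop lists frame labels in GOP order (the I-frame first); path_congestion
    maps a path name to a congestion score, where lower means less congested.
    """
    if not path_congestion:
        raise ValueError("at least one path is required")
    # Paths ordered from least to most congested.
    ranked = sorted(path_congestion, key=path_congestion.get)
    assignment = {}
    for position, frame in enumerate(gop):
        # Earlier frames map to better-ranked paths, later frames to worse ones.
        rank = position * len(ranked) // len(gop)
        assignment[frame] = ranked[rank]
    return assignment


if __name__ == "__main__":
    gop = ["I0", "P1", "P2", "B3", "B4", "B5"]
    print(assign_frames_to_paths(gop, {"a": 0.5, "b": 0.1, "c": 0.3}))
```

In this sketch the I-frame always lands on the least congested path, while the last frames of the GOP share the most congested one.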

LITERATURE SURVEY

TITLE: AN EVALUATION FRAMEWORK FOR MORE REALISTIC SIMULATIONS OF MPEG VIDEO TRANSMISSION

PUBLICATION: J. Inf. Sci. Eng., vol. 24, no. 2, pp. 425–440, Mar. 2008.

AUTHORS: C.-H. Ke, C.-K. Shieh, W.-S. Hwang, and A. Ziviani

EXPLANATION:

We present a novel and complete tool-set for evaluating the delivery quality of MPEG video transmissions in simulations of a network environment. This tool-set is based on the EvalVid framework. We extend the connecting interfaces of EvalVid to replace its simple error simulation model by a more general network simulator like NS2. With this combination, researchers and practitioners in general can analyze through simulation the performance of real video streams, i.e. taking into account the video semantics, under a large range of network scenarios. To demonstrate the usefulness of our new tool-set, we point out that it enables the investigation of the relationship between two popular objective metrics for Quality of Service (QoS) assessment of video quality delivery: the PSNR (Peak Signal to Noise Ratio) and the fraction of decodable frames. The results show that the fraction of decodable frames reflects well the behavior of the PSNR metric, while being less time-consuming. Therefore, the fraction of decodable frames can be an alternative metric to objectively assess through simulations the delivery quality of transmission in a network of publicly available video trace files.
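The two objective metrics compared above can be computed as in the following sketch. It uses the standard PSNR definition, 10·log10(MAX² / MSE), on flat lists of pixel values; the function names and the boolean-flag representation of decodable frames are assumptions of this example and are not part of the EvalVid tool-set.

```python
import math
from typing import List, Sequence


def psnr(reference: Sequence[float], received: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak Signal to Noise Ratio in dB between two equally sized frames."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def decodable_fraction(decodable_flags: List[bool]) -> float:
    """Fraction of frames that could be decoded at the receiver."""
    return sum(decodable_flags) / len(decodable_flags)


if __name__ == "__main__":
    print(psnr([100, 100], [110, 90]))
    print(decodable_fraction([True, True, False, True]))
```

The decodable-frame fraction needs only one flag per frame, whereas PSNR touches every pixel, which is consistent with the remark above that it is less time-consuming.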

TITLE: MULTIPATH ROUTING OVER WIRELESS MESH NETWORKS FOR MULTIPLE DESCRIPTION VIDEO TRANSMISSION

PUBLICATION: IEEE J. Sel. Areas Commun., vol. 28, no. 3, pp. 321–331, Apr. 2010.

AUTHORS: B. Rong, Y. Qian, K. Lu, R. Qingyang, and M. Kadoch

EXPLANATION:

In the past few years, wireless mesh networks (WMNs) have drawn significant attention from academia and industry as a fast, easy, and inexpensive solution for broadband wireless access. In WMNs, it is important to support video communications in an efficient way. To address this issue, this paper studies the multipath routing for multiple description (MD) video delivery over IEEE 802.11 based WMN. Specifically, we first design a framework to transmit MD video over WMNs through multiple paths; we then investigate the technical challenges encountered. In our proposed framework, multipath routing relies on the maximally disjoint paths to achieve good traffic engineering performance. However, video applications usually have strict delay requirements, which make it difficult to find multiple qualified paths with the least joints. To overcome this problem, we develop an enhanced version of Guaranteed-Rate (GR) packet scheduling algorithm, namely virtual reserved rate GR (VRR-GR), to shorten the packet delay of video communications in multiservice network environment. Simulation study shows that our proposed approach can reduce the latency of video delivery and achieve desirable traffic engineering performance in multipath routing environment.

TITLE: PERFORMANCE EVALUATION OF H.264/SVC VIDEO STREAMING OVER MOBILE WIMAX

PUBLICATION: Comput. Netw., vol. 55, no. 15, pp. 3578–3591, Oct. 2011.

AUTHORS: D. Migliorini, E. Mingozzi, and C. Vallati

EXPLANATION:

Mobile broadband wireless networks, such as mobile WiMAX, have been designed to support several features like, e.g., Quality of Service (QoS) or enhanced data protection mechanisms, in order to provide true access to real-time multimedia applications like Voice over IP or Video on Demand. On the other hand, recently defined video coding schemes, like H.264 scalable video coding (H.264/SVC), are evolving in order to better adapt to such mobile environments with heterogeneous clients and time-varying available capacity. In this work we assess the performance of H.264/SVC video streaming over mobile WiMAX under realistic network conditions. To this aim, we make use of specific metrics, like PSNR (Peak Signal to Noise Ratio) or MOS (Mean Opinion Score), which are related to the quality of experience as perceived by the end user. Simulation results show that the performance is sensitive to the different available H.264/SVC encoding options, which respond differently to the loss of data in the network. On the other hand, if aggressive error recovery based on WiMAX data protection mechanisms is used, this might lead to unacceptable latencies in the video play out, especially for those mobiles with poor wireless channel characteristics.

SYSTEM ANALYSIS

EXISTING SYSTEM:

In existing methods for WMNs, it is important to support video communications in an efficient way. To address this issue, existing work studies single-path routing for multiple description (MD) video delivery over IEEE 802.11 based WMNs. Specifically, a framework is first designed to transmit MD video over WMNs through single paths, and the technical challenges encountered are then investigated. By contrast, multipath routing relies on the maximally disjoint paths to achieve good traffic engineering performance.

However, video applications usually have strict delay requirements, which make it difficult to find multiple qualified paths with the least joints. An enhanced version of the Guaranteed-Rate (GR) packet scheduling algorithm, namely virtual reserved rate GR (VRR-GR), is used to shorten the packet delay of video communications in a multiservice network environment. Simulation study shows that the existing approach can reduce the latency of video delivery and achieve desirable traffic engineering performance in a single-path routing environment.

DISADVANTAGES:

  • Different approaches exist for handling such encoding and transmission. The Multiple Description Coding technique fragments the initial video clip into a number of substreams called descriptions. Drawback: packet losses.
  • The descriptions are transmitted on the network over disjoint paths. These descriptions are equivalent in the sense that any one of them is sufficient for the decoding process. Drawback: very low buffer.
  • Layered Coding produces a base layer and multiple enhancement layers. The enhancement layers serve only to refine the base-layer quality and are not useful on their own. Drawback: routing is single path.

PROPOSED SYSTEM:

In this paper, our thesis is that the user-perceived video quality can be significantly improved by accounting for application requirements, and specifically the video distortion experienced by a flow, end-to-end. Typically, the schemes used to encode a video clip can accommodate a certain number of packet losses per frame. However, if the number of lost packets in a frame exceeds a certain threshold, the frame cannot be decoded correctly. A frame loss will result in some amount of distortion. The value of distortion at a hop along the path from the source to the destination depends on the positions of the unrecoverable video frames (simply referred to as frames) in the GOP, at that hop. As one of our main contributions, we construct an analytical model to characterize the dynamic behavior of the process that describes the evolution of frame losses in the GOP (instead of just focusing on a network quality metric such as the packet-loss probability) as video is delivered on an end-to-end path.

Specifically, with our model, we capture how the choice of path for an end-to-end flow affects the performance of a flow in terms of video distortion. Our model is built based on a multilayer approach as shown in Fig. 1. The packet-loss probability on a link is mapped to the probability of a frame loss in the GOP. The frame-loss probability is then directly associated with the video distortion metric. By using the above mapping from the network-specific property (i.e., packet-loss probability) to the application-specific quality metric (i.e., video distortion), we pose the problem of routing as an optimization problem where the objective is to find the path from the source to the destination that minimizes the end-to-end distortion.

ADVANTAGES:

Developing an analytical framework to capture the impact of routing on video distortion: As our primary contribution, we develop an analytical framework that captures the impact of routing on the end-to-end video quality in terms of distortion. Specifically, the framework facilitates the computation of routes that are optimal in terms of achieving the minimum distortion. The model takes into account the joint impact of the PHY and MAC layers and the application semantics on the video quality.

Design of a practical routing protocol for distortion-resilient video delivery: Based on our analysis, we design a practical routing protocol for a network that primarily carries wireless video. The practical protocol allows a source to collect distortion information on the links in the network and distribute traffic across the different paths in accordance with: 1) the distortion, and 2) the position of a frame in the GOP.

Evaluations via extensive experiments: We demonstrate via extensive simulations and real testbed experiments on a multihop 802.11a testbed that our protocol is extremely effective in reducing the end-to-end video distortion and keeping the user experience degradation to a minimum.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk       –   20 GB
  • Floppy Drive        –    1.44 MB
  • Key Board       –    Standard Windows Keyboard
  • Mouse        –    Two or Three Button Mouse
  • Monitor       –    SVGA

SOFTWARE REQUIREMENTS:

  • Operating System          :           Windows XP or Win7
  • Front End        :           JAVA JDK 1.7
  • Tools                                     :           Netbeans 7
  • Document :           MS-Office 2007

Single Image Super-Resolution Based on Gradient Profile Sharpness

ABSTRACT

In this paper, a novel image super-resolution algorithm is proposed based on GPS (Gradient Profile Sharpness). GPS is an edge sharpness metric, which is extracted from two gradient description models, i.e., a triangle model and a Gaussian mixture model for the description of different kinds of gradient profiles. Then the transformation relationship of GPSs in different image resolutions is studied statistically, and the parameter of the relationship is estimated automatically. Based on the estimated GPS transformation relationship, two gradient profile transformation models are proposed for the two profile description models, which can keep profile shape and profile gradient magnitude sum consistent during profile transformation. Finally, the target gradient field of the HR (high resolution) image is generated from the transformed gradient profiles, which is added as the image prior in the HR image reconstruction model. Extensive experiments are conducted to evaluate the proposed algorithm in subjective visual effect, objective quality, and computation time. The experimental results demonstrate that the proposed approach can generate superior HR images with better visual quality, lower reconstruction error, and acceptable computation efficiency as compared to state-of-the-art works.

Algorithm:

Super resolution  algorithm:

This algorithm is used to change (increase or decrease) the resolution of an image.

HR: Higher Resolution Algorithm

Existing System                       

Single image super-resolution is a classic and active image processing problem, which aims to generate a high resolution image from a low resolution input image. Due to the severely under-determined nature of this problem, an effective image prior is necessary to make the problem solvable and to improve the quality of generated images.

Proposed System

  • More sophisticated interpolation models have also been proposed.
  • To reduce the dependence on the training HR image, self-example based approaches were proposed, which utilize the observation that patches tend to recur redundantly inside an image within the same image scale as well as across different scales, or that there exists a transformation relationship across image space. These approaches are more robust; however, there are always some artifacts in their super-resolution results. Generally, the computational complexity of learning-based super-resolution approaches is quite high.
  • Various regularization terms have been proposed based on local gradient enhancement and global gradient sparsity. Recently, metrics of edge sharpness have attracted researchers' attention as the regularization term, since edges are of primary importance in visual image quality.
  • Based on the transformed GPS, two gradient profile transformation models are proposed, which can well keep profile shape and profile gradient magnitude sum consistent during the profile transformation.
  • Finally, the target gradient field of the HR (high resolution) image is generated from the transformed gradient profiles, which is added as the image prior in the HR image reconstruction model.

MODULES

  • single image super-resolution
  • Gradient Profile Sharpness
  • Color Transfer
  • Multiple-reference color transfer
Single image super-resolution:

Single-image super-resolution refers to the task of constructing a high-resolution enlargement of a given low-resolution image. Usual interpolation-based magnification introduces blurring. The problem is then cast as estimating the missing high-frequency details. The approach is based on the framework of Freeman et al. and proceeds in the following steps:

  1. interpolation of the input low-resolution image into the desired scale
  2. generation of a set of candidate images based on patch-wise regression: kernel ridge regression is utilized; to reduce the time complexity, a sparse basis is found by combining kernel matching pursuit and gradient descent
  3. combining candidates to produce an image: patch-wise regression of the output results in a set of candidates for each pixel location; an image output is obtained by combining the candidates based on estimated confidences for each pixel.
  4. post-processing based on the discontinuity prior of images: as a regularization method, kernel ridge regression tends to smooth major edges; the natural image prior proposed by Tappen et al. [2] is utilized to post-process the regression result such that the discontinuities at major edges are preserved.
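Step 1 of the list above, and the blurring it introduces, can be illustrated with a small bilinear interpolation sketch in pure Python. The function name `bilinear_upscale`, the integer scale factor, and the sampling convention (output pixel x maps to source coordinate x / scale, clamped at the border) are assumptions of this example; the regression and post-processing steps are not covered.

```python
from typing import List


def bilinear_upscale(image: List[List[float]], scale: int) -> List[List[float]]:
    """Enlarge a grayscale image by an integer factor using bilinear interpolation."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height * scale):
        # Source row coordinate, clamped to the last row.
        sy = min(y / scale, height - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, height - 1)
        fy = sy - y0
        row = []
        for x in range(width * scale):
            # Source column coordinate, clamped to the last column.
            sx = min(x / scale, width - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, width - 1)
            fx = sx - x0
            # Blend horizontally on both rows, then vertically.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


if __name__ == "__main__":
    # A sharp edge between a dark and a bright pixel.
    for row in bilinear_upscale([[0, 100]], 2):
        print(row)
```

The sharp 0-to-100 edge in the input acquires an intermediate value of 50 in the output, which is the blurring that the later steps aim to correct by restoring high-frequency detail.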

Gradient Profile Sharpness:

A novel edge sharpness metric, GPS (gradient profile sharpness), is extracted as the eccentricity of gradient profile description models, which considers both the gradient magnitude and the spatial scattering of a gradient profile.

To precisely describe different kinds of gradient profile shapes, a triangle model and a mixed Gaussian model are proposed for short gradient profiles and heavy-tailed gradient profiles respectively. Then the pairs of GPS values under different image resolutions are studied statistically, and a linear GPS transformation relationship is formulated, whose parameter can be estimated automatically in each super-resolution application. Based on the transformed GPS, two gradient profile transformation models are proposed, which can well keep profile shape and profile gradient magnitude sum consistent during the profile transformation.

Two gradient profile transformation models are proposed and the solution of the HR image reconstruction model is introduced. Moreover, detailed experimental comparisons are made between the proposed approach and other state-of-the-art super-resolution methods, which are demonstrated in Section

Color Transfer:

The first proposed approach matched the means and variances between the target and the reference in a low-correlated color space. This approach was efficient, but the simple matching of means and variances was likely to produce a slight grain effect and serious color distortion. To prevent the grain effect, Chang et al. proposed a color category based approach that categorized each pixel as one of the basic categories. Then a convex hull was generated in color space for each category of the pixel set, and the color transformation was applied to each pair of convex hulls of the same category.
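The mean and variance matching described above can be sketched for a single color channel as follows. This is a minimal illustration under stated assumptions: the conversion to a low-correlated color space is omitted, the channel is a flat list of values, and the function name `transfer_channel` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def transfer_channel(target: Sequence[float],
                     reference: Sequence[float]) -> List[float]:
    """Shift and scale one channel of the target to match the reference statistics."""
    t_mean, t_std = mean(target), pstdev(target)
    r_mean, r_std = mean(reference), pstdev(reference)
    # Assumption: a constant target channel has no spread to rescale, so only shift it.
    scale = r_std / t_std if t_std > 0 else 1.0
    return [(value - t_mean) * scale + r_mean for value in target]


if __name__ == "__main__":
    print(transfer_channel([0, 10, 20], [100, 120, 140]))
```

Because every pixel receives the same global shift and scale regardless of its content, this kind of matching can produce the grain effect and color distortion mentioned above.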

Multiple-reference color transfer:

Multiple-reference color transfer requires the transfer to naturally blend the colors from multiple references. However, major differences exist among the references. Although both of the references share the sunshine theme, they differ greatly in color appearance. This difference can easily lead to a grain effect in the result. An approach that adopts gradient correction suppresses the grain, but it does not prevent the color distortion. Our approach deals with the grain effect and the distortion in each step; therefore, we can achieve a visually satisfactory result.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk      –   20 GB
  • Floppy Drive       –    1.44 MB
  • Key Board      –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor              –    SVGA

SOFTWARE REQUIREMENTS:

JAVA

  • Operating System        :           Windows XP or Win7
  • Front End       :           JAVA JDK 1.7
  • Back End :           MYSQL Server
  • Server :           Apache Tomcat Server
  • Script :           JSP Script
  • Document :           MS-Office 2007

.NET

  • Operating System        :           Windows XP or Win7
  • Front End       :           Microsoft Visual Studio .NET 2008
  • Script :           C# Script
  • Back End :           MS-SQL Server 2005
  • Document :           MS-Office 2007

REAL-TIME BIG DATA ANALYTICAL ARCHITECTURE FOR REMOTE SENSING APPLICATION

ABSTRACT:

In today’s era, there is a great deal more to real-time remote sensing Big Data than it seems at first, and extracting the useful information in an efficient manner leads a system toward major computational challenges, such as analyzing, aggregating, and storing data that are collected remotely. In view of these factors, there is a need to design a system architecture that supports both real-time and offline data processing. In this paper, we propose a real-time Big Data analytical architecture for remote sensing satellite applications.

The proposed architecture comprises three main units:

1) Remote sensing Big Data acquisition unit (RSDU);

2) Data processing unit (DPU); and

3) Data analysis decision unit (DADU).

First, RSDU acquires data from the satellite and sends this data to the Base Station, where initial processing takes place. Second, DPU plays a vital role in architecture for efficient processing of real-time Big Data by providing filtration, load balancing, and parallel processing. Third, DADU is the upper layer unit of the proposed architecture, which is responsible for compilation, storage of the results, and generation of decision based on the results received from DPU.
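The three-unit flow described above can be sketched as a small pipeline. This is an illustrative toy under stated assumptions, not the proposed architecture itself: the function names (`rsdu_acquire`, `dpu_process`, `dadu_decide`), the use of a thread pool for parallel processing, the threshold filter, and the mean-based decision rule are all hypothetical choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


def rsdu_acquire(raw_stream: List[Optional[float]]) -> List[float]:
    """RSDU: initial processing at the base station (here: drop missing samples)."""
    return [float(sample) for sample in raw_stream if sample is not None]


def dpu_process(samples: List[float], workers: int = 2,
                threshold: float = 0.0) -> List[Tuple[float, int]]:
    """DPU: filtration, load balancing, and parallel processing."""
    filtered = [s for s in samples if s >= threshold]            # filtration
    blocks = [filtered[i::workers] for i in range(workers)]      # round-robin load balancing
    with ThreadPoolExecutor(max_workers=workers) as pool:        # parallel processing
        return list(pool.map(lambda block: (sum(block), len(block)), blocks))


def dadu_decide(partials: List[Tuple[float, int]],
                alert_level: float) -> Dict[str, object]:
    """DADU: compile the partial results and generate a decision."""
    total = sum(block_sum for block_sum, _ in partials)
    count = sum(block_len for _, block_len in partials)
    average = total / count if count else 0.0
    return {"mean": average, "count": count, "alert": average > alert_level}


if __name__ == "__main__":
    samples = rsdu_acquire([1, None, 3, -2, 6])
    print(dadu_decide(dpu_process(samples), alert_level=3.0))
```

Each unit only consumes the output of the previous one, so in this sketch the DPU stage could be swapped for a distributed processing framework without touching acquisition or decision logic.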

INTRODUCTION:

Recently, a great deal of interest in the field of Big Data and its analysis has arisen, mainly driven by an extensive number of research challenges closely related to real-world applications, such as modeling, processing, querying, mining, and distributing large-scale repositories. The term “Big Data” classifies specific kinds of data sets comprising unstructured data, which reside in the data layer of technical computing applications and the Web. The data stored in the underlying layer of all these technical computing application scenarios have some precise characteristics in common, such as 1) large-scale data, which refers to the size and the data warehouse; 2) scalability issues, which refer to the application’s likelihood of running at large scale (e.g., Big Data); 3) support for the extraction-transformation-loading (ETL) method, from low-level raw data to well-structured data, up to a certain extent; and 4) development of uncomplicated, interpretable analytics over Big Data warehouses with a view to delivering intelligent and meaningful knowledge from them.

Big Data are usually generated by online transactions, video/audio, email, clicks, logs, posts, social network data, scientific data, remote access sensory data, mobile phones, and their applications. These data accumulate in databases that grow extraordinarily and become complicated to capture, form, store, manage, share, process, analyze, and visualize via typical database software tools. Advancement in Big Data sensing and computer technology is revolutionizing the way remote data are collected, processed, analyzed, and managed. In particular, most recently designed sensors used in earth and planetary observatory systems generate a continuous stream of data. Moreover, much work has been done in various fields of remote sensory satellite image data, such as change detection, gradient-based edge detection, region-similarity-based edge detection, and intensity gradient techniques for efficient intraprediction.

In this paper, we refer to the high-speed continuous stream of data or high-volume offline data as “Big Data,” which is leading us to a new world of challenges. Transforming such remotely sensed data into scientific understanding is a critical task. Given the rate at which the volume of remote access data is increasing, a number of individual users as well as organizations are now demanding an efficient mechanism to collect, process, analyze, and store these data and their resources. Big Data analysis is a more challenging task than locating, identifying, understanding, and citing data. With large-scale data, all of this has to happen in an automated manner, since it requires diverse data structures as well as semantics to be articulated in a computer-readable format.

However, even when analyzing simple data comprising a single data set, a mechanism is required for deciding how to design the database. There might be alternative ways to store all of the same information. In such conditions, a given design might have an advantage over others for certain processes and possible drawbacks for some other purposes. In order to address these needs, various analytical platforms have been provided by relational database vendors. These platforms come in various shapes, from software-only products to analytical services that run in third-party hosted environments. In remote access networks, data sources such as sensors can produce an overwhelming amount of raw data.

We refer to this as the first step, i.e., data acquisition, in which much of the data are of no interest and can be filtered or compressed by orders of magnitude. One challenge in using such filters is to ensure that they do not discard useful information. For instance, when considering news reports, is it adequate to keep only the information that mentions the company name? Alternatively, do we need the entire report, or simply a small piece around the mentioned name? The second challenge is the automatic generation of accurate metadata that describe the composition of the data and the way it was collected and analyzed. Such metadata is hard to analyze, since we may need to know the source of each piece of data in remote access.

LITERATURE SURVEY:

BIG DATA AND CLOUD COMPUTING: CURRENT STATE AND FUTURE OPPORTUNITIES

AUTHOR: D. Agrawal, S. Das, and A. E. Abbadi

PUBLISH: Proc. Int. Conf. Extending Database Technol. (EDBT), 2011, pp. 530–533.

EXPLANATION:

Scalable database management systems (DBMS), both for update-intensive application workloads and for decision support systems for descriptive and deep analytics, are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from traditional enterprise infrastructures to next-generation cloud infrastructures. Though scalable data management has been a vision for more than three decades and much research has focussed on large-scale data management in the traditional enterprise setting, cloud computing brings its own set of novel challenges that must be addressed to ensure the success of data management solutions in the cloud environment. This tutorial presents an organized picture of the challenges faced by application developers and DBMS designers in developing and deploying internet-scale applications. Our background study encompasses both classes of systems: (i) for supporting update-heavy applications, and (ii) for ad-hoc analytics and decision support. We then focus on providing an in-depth analysis of systems for supporting update-intensive web applications and provide a survey of the state-of-the-art in this domain. We crystallize the design choices made by some successful large-scale database management systems, analyze the application demands and access patterns, and enumerate the desiderata for a cloud-bound DBMS.

CHANGE DETECTION IN SYNTHETIC APERTURE RADAR IMAGE BASED ON FUZZY ACTIVE CONTOUR MODELS AND GENETIC ALGORITHMS

AUTHOR: J. Shi, J. Wu, A. Paul, L. Jiao, and M. Gong

PUBLISH: Math. Prob. Eng., vol. 2014, 15 pp., Apr. 2014.

EXPLANATION:

This paper presents an unsupervised change detection approach for synthetic aperture radar images based on a fuzzy active contour model and a genetic algorithm. The aim is to partition the difference image, which is generated from multitemporal satellite images, into changed and unchanged regions. The fuzzy technique is an appropriate approach to analyze the difference image, where regions are not always statistically homogeneous. Since interval type-2 fuzzy sets are well-suited for modeling various uncertainties in comparison to traditional fuzzy sets, they are combined with active contour methodology for properly modeling uncertainties in the difference image. The interval type-2 fuzzy active contour model is designed to provide preliminary analysis of the difference image by generating intermediate change detection masks. Each intermediate change detection mask has a cost value. A genetic algorithm is employed to find the final change detection mask with the minimum cost value by evolving the realization of intermediate change detection masks. Experimental results on real synthetic aperture radar images demonstrate that change detection results obtained by the improved fuzzy active contour model exhibit less error than previous approaches.

A BIG DATA ARCHITECTURE FOR LARGE SCALE SECURITY MONITORING

AUTHOR: S. Marchal, X. Jiang, R. State, and T. Engel

PUBLISH: Proc. IEEE Int. Congr. Big Data, 2014, pp. 56–63.

EXPLANATION:

Network traffic is a rich source of information for security monitoring. However, the increasing volume of data to treat raises issues, rendering holistic analysis of network traffic difficult. In this paper we propose a solution to cope with the tremendous amount of data to analyse for security monitoring perspectives. We introduce an architecture dedicated to security monitoring of local enterprise networks. The application domain of such a system is mainly network intrusion detection and prevention, but it can be used as well for forensic analysis. This architecture integrates two systems, one dedicated to scalable distributed data storage and management and the other dedicated to data exploitation. DNS data, NetFlow records, HTTP traffic and honeypot data are mined and correlated in a distributed system that leverages state-of-the-art big data solutions. Data correlation schemes are proposed and their performance is evaluated against several well-known big data frameworks including Hadoop and Spark.

SYSTEM ANALYSIS

EXISTING SYSTEM:

Existing methods are inapplicable on standard computers, where it is not desirable or possible to load the entire image into memory before doing any processing. In this situation, it is necessary to load only part of the image and process it before saving the result to the disk and proceeding to the next part. This corresponds to the concept of on-the-flow processing. Remote sensing processing can be seen as a chain of events or steps, where each step is generally independent of the following ones and generally focuses on a particular domain. For example, the image can be radiometrically corrected to compensate for atmospheric effects and indices computed before an object extraction based on these indices takes place.

The typical processing chain will process the whole image for each step, returning the final result after everything is done. For some processing chains, iterations between the different steps are required to find the correct set of parameters. Due to the variability of satellite images and the variety of the tasks that need to be performed, fully automated tasks are rare. Humans are still an important part of the loop. These concepts are linked in the sense that both rely on the ability to process only one part of the data.

In the case of simple algorithms, this is quite easy: the input is just split into different non-overlapping pieces that are processed one by one. But most algorithms do consider the neighborhood of each pixel. As a consequence, in most cases, the data will have to be split into partially overlapping pieces. The objective is to obtain the same result as the original algorithm as if the processing was done in one go. Depending on the algorithm, this is unfortunately not always possible.
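The splitting strategy described above can be sketched as follows. This is a minimal illustration in Python, assuming a 3x3 mean filter as the neighborhood algorithm and horizontal strips as the pieces; the function names (mean_filter, process_in_strips) and the strip height are hypothetical choices made for this sketch and do not come from any existing system. Each strip is loaded with one extra row of overlap on each side, so that the strip-wise result matches processing the whole image in one go.

```python
def mean_filter(image):
    """3x3 mean filter; border pixels average over the neighbors that exist."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total, count = 0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            out_row.append(total / count)
        out.append(out_row)
    return out


def process_in_strips(image, strip_height, overlap=1):
    """Filter the image strip by strip, loading `overlap` extra rows around each strip."""
    rows = len(image)
    result = []
    for start in range(0, rows, strip_height):
        stop = min(start + strip_height, rows)
        lo = max(start - overlap, 0)      # extend upward into the previous strip
        hi = min(stop + overlap, rows)    # extend downward into the next strip
        filtered = mean_filter(image[lo:hi])
        # Keep only the rows that belong to this strip; drop the overlap rows.
        result.extend(filtered[start - lo:stop - lo])
    return result


# Small synthetic image (assumption: any 2-D grid of numbers would do).
image = [[(r * 7 + c * 3) % 11 for c in range(6)] for r in range(10)]
whole = mean_filter(image)
strips = process_in_strips(image, strip_height=4)
no_overlap = process_in_strips(image, strip_height=4, overlap=0)
```

With one row of overlap the strip-wise output equals the whole-image output, while the run without overlap differs along the strip boundaries. The overlap has to be at least the radius of the filter's neighborhood, which is why algorithms with unbounded neighborhoods cannot always be split this way.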

DISADVANTAGES:

  • A reader that loads the image, or part of the image, in memory from the file on disk;
  • A filter that carries out local processing that does not require access to neighboring pixels (a simple threshold, for example); the processing can happen on CPU or GPU;
  • A filter that requires the value of neighboring pixels to compute the value of a given pixel (a convolution filter is a typical example); the processing can happen on CPU or GPU;
  • A writer to output the resulting image in memory into a file on disk; note that the file could be written in several steps.

We will illustrate in this example how it is possible to compute part of the image in the whole pipeline, incurring only minimal computation overhead.

PROPOSED SYSTEM:

We present a remote sensing Big Data analytical architecture, which is used to analyze real-time as well as offline data. First, the data are remotely preprocessed so that they are readable by machines. Afterward, this useful information is transmitted to the Earth Base Station for further data processing. The Earth Base Station performs two types of processing: processing of real-time data and processing of offline data. In the case of offline data, the data are transmitted to an offline data-storage device. The incorporation of the offline data-storage device helps in later usage of the data, whereas the real-time data are directly transmitted to the filtration and load balancer server, where a filtration algorithm is employed to extract the useful information from the Big Data.

On the other hand, the load balancer balances the processing power by distributing the real-time data equally among the servers. The filtration and load-balancing server not only filters and balances the load, but is also used to enhance the system efficiency. Furthermore, the filtered data are then processed by the parallel servers and sent to the data aggregation unit (if required, the processed data can be stored in the result storage device) for comparison purposes by the decision and analyzing server. The proposed architecture welcomes remote access sensory data as well as direct access network data (e.g., GPRS, 3G, xDSL, or WAN). The proposed architecture and the algorithms are implemented and applied to remote sensing earth observatory data.
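The flow of filtration, load balancing, parallel processing, and aggregation can be sketched roughly as below. This is only an illustration under stated assumptions: the filtration rule (drop empty blocks), the round-robin balancing policy, the per-block computation (a mean), and all function names are hypothetical stand-ins, since the text does not specify the actual algorithms. Threads from Python's standard library stand in for the parallel servers.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_blocks(blocks, is_useful):
    """Filtration: keep only the blocks the predicate marks as useful."""
    return [b for b in blocks if is_useful(b)]


def balance_round_robin(blocks, n_servers):
    """Load balancing: deal blocks out in turn, so queue sizes differ by at most one."""
    queues = [[] for _ in range(n_servers)]
    for i, block in enumerate(blocks):
        queues[i % n_servers].append(block)
    return queues


def process_queue(queue):
    """Stand-in for one processing server: here, the mean value of each block."""
    return [sum(block) / len(block) for block in queue]


def run_pipeline(blocks, n_servers, is_useful):
    useful = filter_blocks(blocks, is_useful)
    queues = balance_round_robin(useful, n_servers)
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        per_server = list(pool.map(process_queue, queues))
    # Aggregation: collect the per-server results into one list for the decision stage.
    return queues, [value for results in per_server for value in results]


# Hypothetical input: empty blocks stand for data of no interest.
blocks = [[1, 2, 3], [], [4, 4], [10], [], [0, 2], [5, 7, 9]]
queues, aggregated = run_pipeline(blocks, n_servers=3, is_useful=lambda b: len(b) > 0)
```

In this sketch the aggregated results are grouped by server rather than by arrival order; a real aggregation unit would need block identifiers if the original order matters.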

The proposed architecture has the capability of dividing, load balancing, and parallel processing of only the useful data. Thus, it results in efficient analysis of real-time remote sensing Big Data using an earth observatory system. Furthermore, the proposed architecture has the capability of storing incoming raw data to perform offline analysis on largely stored dumps, when required. Finally, a detailed analysis of remotely sensed earth observatory Big Data for land and sea area is provided using .NET. In addition, various algorithms are proposed for each level of RSDU, DPU, and DADU to detect land as well as sea area to elaborate the working of the architecture.

ADVANTAGES:

Our proposed architecture processes high-speed, large-volume real-time remote sensory image data. It works on both DPU and DADU by taking data from medical application.

Using our architecture for offline as well as online traffic, we perform a simple analysis on remote sensing earth observatory data. We assume that the data are big in nature and difficult to handle for a single server.

The data are continuously coming from a satellite with high speed. Hence, special algorithms are needed to process, analyze, and make a decision from that Big Data. Here, in this section, we analyze remote sensing data for finding land, sea, or ice area.

We have used the proposed architecture to perform the analysis, and we proposed an algorithm for handling, processing, analyzing, and decision-making for remote sensing Big Data images.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk       –   20 GB
  • Floppy Drive        –    1.44 MB
  • Key Board       –    Standard Windows Keyboard
  • Mouse        –    Two or Three Button Mouse
  • Monitor       –    SVGA

SOFTWARE REQUIREMENTS:

  • Operating System          :           Windows XP or Win7
  • Front End        :           Microsoft Visual Studio .NET 2008
  • Script :           C# Script
  • Back End :           MS-SQL Server 2005
  • Document :           MS-Office 2007

RANK-BASED SIMILARITY SEARCH REDUCING THE DIMENSIONAL DEPENDENCE

ABSTRACT:

This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values.

INTRODUCTION

Of the fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection, perhaps the most widely encountered is that of similarity search. Similarity search is the foundation of k-nearest-neighbor (k-NN) classification, which often produces competitively low error rates in practice, particularly when the number of classes is large. The error rate of nearest-neighbor classification has been shown to be ‘asymptotically optimal’ as the training set size increases. For clustering, many of the most effective and popular strategies require the determination of neighbor sets based at a substantial proportion of the data set objects; examples include hierarchical (agglomerative) methods. Content-based filtering methods for recommender systems and anomaly detection methods also commonly make use of k-NN techniques, either through the direct use of k-NN search, or by means of k-NN cluster analysis.

A very popular density-based measure, the Local Outlier Factor (LOF), relies heavily on k-NN set computation to determine the relative density of the data in the vicinity of the test point [8]. For data mining applications based on similarity search, data objects are typically modeled as feature vectors of attributes for which some measure of similarity is defined. Motivated at least in part by the impact of similarity search on problems in data mining, machine learning, pattern recognition, and statistics, the design and analysis of scalable and effective similarity search structures has been the subject of intensive research for many decades. Until relatively recently, most data structures for similarity search targeted low-dimensional real vector space representations and the Euclidean or other Lp distance metrics.

However, many public and commercial data sets available today are more naturally represented as vectors spanning many hundreds or thousands of feature attributes that can be real or integer-valued, ordinal or categorical, or even a mixture of these types. This has spurred the development of search structures for more general metric spaces, such as the MultiVantage-Point Tree, the Geometric Near-neighbor Access Tree (GNAT), Spatial Approximation Tree (SAT), the M-tree, and (more recently) the Cover Tree (CT). Despite their various advantages, spatial and metric search structures are both limited by an effect often referred to as the curse of dimensionality.

One way in which the curse may manifest itself is in a tendency of distances to concentrate strongly around their mean values as the dimension increases. Consequently, most pairwise distances become difficult to distinguish, and the triangle inequality can no longer be effectively used to eliminate candidates from consideration along search paths. Evidence suggests that when the representational dimension of feature vectors is high (roughly 20 or more), traditional similarity search accesses an unacceptably high proportion of the data elements, unless the underlying data distribution has special properties. Even though the local neighborhood information employed by data mining applications is useful and meaningful, high data dimensionality tends to make this local information very expensive to obtain.
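The concentration effect can be observed with a small experiment. The sketch below is an illustration only, assuming points drawn uniformly at random from the unit hypercube and Euclidean distance; the function name relative_contrast, the sample size, and the seed are arbitrary choices for this example. It measures how far apart the nearest and farthest points are relative to the nearest one.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(euclidean(query, point))
    return (max(distances) - min(distances)) / min(distances)


contrasts = {dim: relative_contrast(dim) for dim in (2, 20, 200, 2000)}
for dim, value in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

As the dimension grows, the relative contrast shrinks toward zero: the nearest and farthest neighbors end up at almost the same distance, which is what makes distance-based pruning ineffective.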

The performance of similarity search indices depends crucially on the way in which they use similarity information for the identification and selection of objects relevant to the query. Virtually all existing indices make use of numerical constraints for pruning and selection. Such constraints include the triangle inequality (a linear constraint on three distance values), other bounding surfaces defined in terms of distance (such as hypercubes or hyperspheres), range queries involving approximation factors as in Locality-Sensitive Hashing (LSH) or absolute quantities as additive distance terms. One serious drawback of such operations based on numerical constraints such as the triangle inequality or distance ranges is that the number of objects actually examined can be highly variable, so much so that the overall execution time cannot be easily predicted.
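The difference between a numerical constraint and a rank-based one can be illustrated with a toy example. The sketch below is not the Rank Cover Tree; it is a brute-force illustration, with hypothetical function names and one-dimensional data, of why a distance threshold yields a data-dependent number of candidates while a rank threshold yields a fixed number.

```python
import heapq


def range_select(query, points, radius, dist):
    """Numerical constraint: keep every point within `radius`; the result size depends on the data."""
    return [p for p in points if dist(query, p) <= radius]


def rank_select(query, points, k, dist):
    """Rank-based constraint: keep the k points ranked closest to the query; the result size is always k."""
    return heapq.nsmallest(k, points, key=lambda p: dist(query, p))


def abs_diff(a, b):
    return abs(a - b)


# Two hypothetical data sets: one dense around the query, one sparse.
dense = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 30.0]
sparse = [1.0, 10.2, 20.0, 30.0, 40.0, 50.0, 60.0]
query = 10.0

range_sizes = [len(range_select(query, data, 0.45, abs_diff)) for data in (dense, sparse)]
rank_sizes = [len(rank_select(query, data, 3, abs_diff)) for data in (dense, sparse)]
```

Here the same radius selects five candidates from the dense set but only one from the sparse set, whereas the rank-based test selects three from each. This brute-force version still computes every distance; it only shows how selecting by rank makes the number of retained candidates, and hence the downstream cost, predictable.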

For similarity search, researchers and practitioners have investigated practical methods for speeding up the computation of neighborhood information at the expense of accuracy. For data mining applications, the approaches considered have included feature sampling for local outlier detection, data sampling for clustering, and approximate similarity search for k-NN classification. Examples of fast approximate similarity search indices include the BD-Tree, a widely recognized benchmark for approximate k-NN search; it makes use of splitting rules and early termination to improve upon the performance of the basic KD-Tree. One of the most popular methods for indexing, Locality-Sensitive Hashing, can also achieve good practical search performance for range queries by managing parameters that influence a tradeoff between accuracy and time.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk      –   20 GB
  • Floppy Drive       –    1.44 MB
  • Key Board      –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor      –    SVGA

SOFTWARE REQUIREMENTS:

JAVA

  • Operating System        :           Windows XP or Win7
  • Front End       :           JAVA JDK 1.7
  • Back End :           MYSQL Server
  • Server :           Apache Tomcat Server
  • Script :           JSP Script
  • Document :           MS-Office 2007

.NET

  • Operating System        :           Windows XP or Win7
  • Front End       :           Microsoft Visual Studio .NET 2008
  • Script :           C# Script
  • Back End :           MS-SQL Server 2005
  • Document :           MS-Office 2007

PSMPA: PATIENT SELF-CONTROLLABLE AND MULTI-LEVEL PRIVACY-PRESERVING COOPERATIVE AUTHENTICATION IN DISTRIBUTED M-HEALTHCARE CLOUD COMPUTING SYSTEM

ABSTRACT:

The distributed m-healthcare cloud computing system considerably facilitates secure and efficient patient treatment for medical consultation by sharing personal health information among the healthcare providers. However, this system brings about the challenge of keeping both data confidentiality and patients’ identity privacy simultaneously. Many existing access control and anonymous authentication schemes cannot be straightforwardly exploited. To solve the problem, a novel authorized accessible privacy model (AAPM) is established. Patients can authorize physicians by setting an access tree supporting flexible threshold predicates.

Based on our new technique of attribute-based designated verifier signature, a patient self-controllable multi-level privacy-preserving cooperative authentication scheme (PSMPA) realizing three levels of security and privacy requirement in the distributed m-healthcare cloud computing system is proposed. The directly authorized physicians, the indirectly authorized physicians, and the unauthorized persons in medical consultation can respectively decipher the personal health information and/or verify patients’ identities by satisfying the access tree with their own attribute sets.
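An access tree with threshold predicates can be pictured with a short sketch. The code below only evaluates the predicate in the clear; it does not implement the cryptographic enforcement (the attribute-based designated verifier signature) that the scheme relies on. The tree encoding, the function name satisfies, and the attribute names are assumptions made for illustration.

```python
def satisfies(node, attributes):
    """Return True if the attribute set satisfies the access (sub)tree rooted at `node`.

    A leaf is an attribute name (str). An interior node is a tuple
    (threshold, children) that is satisfied when at least `threshold`
    of its children are satisfied.
    """
    if isinstance(node, str):
        return node in attributes
    threshold, children = node
    satisfied = sum(1 for child in children if satisfies(child, attributes))
    return satisfied >= threshold


# Hypothetical policy: "cardiologist" AND (at least 2 of 3 further attributes).
access_tree = (2, [
    "cardiologist",
    (2, ["hospital_A", "senior_staff", "on_duty"]),
])

physician_a = {"cardiologist", "hospital_A", "on_duty"}
physician_b = {"cardiologist", "hospital_B"}

a_authorized = satisfies(access_tree, physician_a)
b_authorized = satisfies(access_tree, physician_b)
```

A threshold of n over n children behaves as AND, and a threshold of 1 behaves as OR, which is what makes threshold predicates flexible: the first physician satisfies the tree, the second does not.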

INTRODUCTION:

Distributed m-healthcare cloud computing systems have been increasingly adopted worldwide, including through the European Commission activities, the US Health Insurance Portability and Accountability Act (HIPAA), and many other governments, for efficient and high-quality medical treatment. In m-healthcare social networks, personal health information is always shared among patients in their respective social communities who suffer from the same disease, for mutual support, and across distributed healthcare providers (HPs) equipped with their own cloud servers, for medical consultation. However, this also brings about a series of challenges, especially how to ensure the security and privacy of the patients’ personal health information against various attacks in the wireless communication channel, such as eavesdropping and tampering. As to the security facet, one of the main issues is access control of patients’ personal health information, namely that only the authorized physicians or institutions can recover the patients’ personal health information during data sharing in the distributed m-healthcare cloud computing system. In practice, most patients are concerned about the confidentiality of their personal health information, since any kind of unauthorized collection and disclosure is likely to get them into trouble.

Therefore, in distributed m-healthcare cloud computing systems, which part of the patients’ personal health information should be shared and which physicians it should be shared with have become two intractable problems demanding urgent solutions. Various research results focusing on them have emerged. A fine-grained distributed data access control scheme is proposed using the technique of attribute based encryption (ABE). A rendezvous-based access control method provides access privilege if and only if the patient and the physician meet in the physical world. Recently, a patient-centric and fine-grained data access control in multi-owner settings is constructed for securing personal health records in cloud computing. However, it mainly focuses on the central cloud computing system, which is not sufficient for efficiently processing the increasing volume of personal health information in the m-healthcare cloud computing system.

Moreover, it is not enough to only guarantee the data confidentiality of the patient’s personal health information in the honest-but-curious cloud server model, since the frequent communication between a patient and a professional physician can lead the adversary to conclude that the patient is suffering from a specific disease with a high probability. Unfortunately, the problem of how to protect both the patients’ data confidentiality and identity privacy in the distributed m-healthcare cloud computing scenario under the malicious model was left untouched.

In this paper, we consider simultaneously achieving data confidentiality and identity privacy with high efficiency. As described in Fig. 1, in distributed m-healthcare cloud computing systems, all the members can be classified into three categories. The directly authorized physicians, with green labels in the local healthcare provider, are authorized by the patients and can both access the patient’s personal health information and verify the patient’s identity. The indirectly authorized physicians, with yellow labels in the remote healthcare providers, are authorized by the directly authorized physicians for medical consultation or some research purposes (since they are not authorized by the patients, we use the term ‘indirectly authorized’); they can only access the personal health information, but not the patient’s identity. The unauthorized persons, with red labels, can obtain nothing. By extending the techniques of attribute based access control and designated verifier signatures (DVS) on de-identified health information

LITERATURE SURVEY

SECURING PERSONAL HEALTH RECORDS IN CLOUD COMPUTING: PATIENT-CENTRIC AND FINE-GRAINED DATA ACCESS CONTROL IN MULTI-OWNER SETTINGS

AUTHOR: M. Li, S. Yu, K. Ren, and W. Lou

PUBLISH: Proc. 6th Int. ICST Conf. Security Privacy Comm. Netw., 2010, pp. 89–106.

EXPLANATION:

Online personal health record (PHR) enables patients to manage their own medical records in a centralized way, which greatly facilitates the storage, access and sharing of personal health data. With the emergence of cloud computing, it is attractive for the PHR service providers to shift their PHR applications and storage into the cloud, in order to enjoy the elastic resources and reduce the operational cost. However, by storing PHRs in the cloud, the patients lose physical control of their personal health data, which makes it necessary for each patient to encrypt her PHR data before uploading to the cloud servers. Under encryption, it is challenging to achieve fine-grained access control to PHR data in a scalable and efficient way. For each patient, the PHR data should be encrypted so that it is scalable with the number of users having access. Also, since there are multiple owners (patients) in a PHR system and every owner would encrypt her PHR files using a different set of cryptographic keys, it is important to reduce the key distribution complexity in such multi-owner settings. Existing cryptographically enforced access control schemes are mostly designed for single-owner scenarios. In this paper, we propose a novel framework for access control to PHRs within the cloud computing environment. To enable fine-grained and scalable access control for PHRs, we leverage attribute based encryption (ABE) techniques to encrypt each patient’s PHR data. To reduce the key distribution complexity, we divide the system into multiple security domains, where each domain manages only a subset of the users. In this way, each patient has full control over her own privacy, and the key management complexity is reduced dramatically.

PRIVACY AND EMERGENCY RESPONSE IN E-HEALTHCARE LEVERAGING WIRELESS BODY SENSOR NETWORKS

AUTHOR: J. Sun, Y. Fang, and X. Zhu

PUBLISH: IEEE Wireless Commun., vol. 17, no. 1, pp. 66–73, Feb. 2010.

EXPLANATION:

Electronic healthcare is becoming a vital part of our living environment and exhibits advantages over paper-based legacy systems. Privacy is the foremost concern of patients and the biggest impediment to e-healthcare deployment. In addressing privacy issues, conflicts from the functional requirements must be taken into account. One such requirement is efficient and effective response to medical emergencies. In this article, we provide detailed discussions on the privacy and security issues in e-healthcare systems and viable techniques for these issues. Furthermore, we demonstrate the design challenge in the fulfillment of conflicting goals through an exemplary scenario, where the wireless body sensor network is leveraged, and a sound solution is proposed to overcome the conflict.

HCPP: CRYPTOGRAPHY BASED SECURE EHR SYSTEM FOR PATIENT PRIVACY AND EMERGENCY HEALTHCARE

AUTHOR: J. Sun, X. Zhu, C. Zhang, and Y. Fang

PUBLISH: Proc. 31st Int. Conf. Distrib. Comput. Syst., 2011, pp. 373–382.

EXPLANATION:

Privacy concern is arguably the major barrier that hinders the deployment of electronic health record (EHR) systems which are considered more efficient, less error-prone, and of higher availability compared to traditional paper record systems. Patients are unwilling to accept the EHR system unless their protected health information (PHI) containing highly confidential data is guaranteed proper use and disclosure, which cannot be easily achieved without patients’ control over their own PHI. However, cautions must be taken to handle emergencies in which the patient may be physically incompetent to retrieve the controlled PHI for emergency treatment. In this paper, we propose a secure EHR system, HCPP (Healthcare system for Patient Privacy), based on cryptographic constructions and existing wireless network infrastructures, to provide privacy protection to patients under any circumstances while enabling timely PHI retrieval for life-saving treatment in emergency situations. Furthermore, our HCPP system restricts PHI access to authorized (not arbitrary) physicians, who can be traced and held accountable if the accessed PHI is found improperly disclosed. Last but not least, HCPP leverages wireless network access to support efficient and private storage/retrieval of PHI, which underlies a secure and feasible EHR system.

PRIVACY-PRESERVING DETECTION OF SENSITIVE DATA EXPOSURE

ABSTRACT:

Statistics from security firms, research institutions and government organizations show that the numbers of data-leak instances have grown rapidly in recent years. Among various data-leak cases, human mistakes are one of the main causes of data loss. There exist solutions for detecting inadvertent sensitive data leaks caused by human mistakes and for providing alerts to organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced.

In this paper, we present a privacy-preserving data-leak detection (DLD) solution to solve the issue, where a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semihonest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method can support accurate detection with a very small number of false alarms under various data-leak scenarios.

INTRODUCTION

According to a report from Risk Based Security (RBS), the number of leaked sensitive data records has increased dramatically during the last few years, i.e., from 412 million in 2012 to 822 million in 2013. Deliberately planned attacks, inadvertent leaks (e.g., forwarding confidential emails to unclassified email accounts), and human mistakes (e.g., assigning the wrong privilege) lead to most of the data-leak incidents. Detecting and preventing data leaks requires a set of complementary solutions, which may include data-leak detection, data confinement, stealthy malware detection and policy enforcement.

Network data-leak detection (DLD) typically performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique to analyze payloads of IP/TCP packets for inspecting application layer data, e.g., HTTP header/content. Alerts are triggered when the amount of sensitive data found in traffic passes a threshold. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS). Straightforward realizations of data-leak detection require the plaintext sensitive data.
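The threshold-based inspection described above can be sketched as follows. This is a minimal illustration, not the detection engine of any particular NIDS: the function names, the byte-string patterns, and the strict "greater than" comparison against the threshold are assumptions made for the example.

```python
from typing import Iterable, Sequence, Tuple


def count_sensitive_hits(payload: bytes, patterns: Sequence[bytes]) -> int:
    """Count non-overlapping occurrences of each plaintext pattern in one payload."""
    return sum(payload.count(p) for p in patterns)


def should_alert(payloads: Iterable[bytes],
                 patterns: Sequence[bytes],
                 threshold: int) -> Tuple[bool, int]:
    """Return (alert, total): alert is True when the total number of
    sensitive-pattern hits across all payloads exceeds the threshold."""
    total = sum(count_sensitive_hits(p, patterns) for p in payloads)
    return total > threshold, total
```

Note that the detector in this sketch has to hold `patterns` in plaintext, which is exactly the property of straightforward realizations mentioned above.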

However, this requirement is undesirable, as it may threaten the confidentiality of the sensitive information. If a detection system is compromised, then it may expose the plaintext sensitive data (in memory). In addition, the data owner may need to outsource the data-leak detection to providers, but may be unwilling to reveal the plaintext sensitive data to them. Therefore, one needs new data-leak detection solutions that allow the providers to scan content for leaks without learning the sensitive information.

In this paper, we propose a data-leak detection solution which can be outsourced and be deployed in a semihonest detection environment. We design, implement, and evaluate our fuzzy fingerprint technique that enhances data privacy during data-leak detection operations. Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data. Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests, or the content being inspected. Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them.

In our detection procedure, the data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them. To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noise. It is the data owner who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.
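The three-party flow above (owner releases fuzzified digests, provider reports potential leaks, owner post-processes) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the shingle length, the number of fuzzified bits, the use of truncated SHA-256 as the one-way digest, and all function names are choices made only for this example.

```python
import hashlib
import random

SHINGLE = 8      # bytes per shingle (assumed value)
FUZZY_BITS = 8   # low-order bits randomized before release (assumed value)


def fingerprints(data: bytes) -> set:
    """One-way 32-bit digests of every SHINGLE-byte window of the data."""
    out = set()
    for i in range(len(data) - SHINGLE + 1):
        digest = hashlib.sha256(data[i:i + SHINGLE]).digest()
        out.add(int.from_bytes(digest[:4], "big"))
    return out


def fuzzify(fps: set, rng: random.Random) -> set:
    """Owner side: replace the low FUZZY_BITS of each digest with random bits."""
    mask = (1 << FUZZY_BITS) - 1
    return {(f & ~mask) | rng.getrandbits(FUZZY_BITS) for f in fps}


def provider_detect(released: set, traffic: bytes) -> set:
    """Provider side: report traffic digests that agree with a released
    digest on the high-order bits. The result mixes real leaks and noise."""
    prefixes = {f >> FUZZY_BITS for f in released}
    return {f for f in fingerprints(traffic) if (f >> FUZZY_BITS) in prefixes}


def owner_postprocess(real_fps: set, candidates: set) -> set:
    """Owner side: keep only candidates that match an exact sensitive digest."""
    return candidates & real_fps
```

In this sketch the provider only ever sees `released`, so it cannot tell which reported candidates are exact matches; only the owner, who keeps the unfuzzified digests, can separate real leaks from noise.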

Our contributions are summarized as follows.

1) We describe a privacy-preserving data-leak detection model for preventing inadvertent data leak in network traffic. Our model supports detection operation delegation and ISPs can provide data-leak detection as an add-on service to their customers using our model. We design, implement, and evaluate an efficient technique, fuzzy fingerprint, for privacy-preserving data-leak detection. Fuzzy fingerprints are special sensitive data digests prepared by the data owner for release to the DLD provider.

2) We implement our detection system and perform extensive experimental evaluation on the 2.6 GB Enron dataset, Internet surfing traffic of 20 users, and 5 simulated real-world data-leak scenarios to measure its privacy guarantee, detection rate and efficiency. Our results indicate high accuracy achieved by our underlying scheme with a very low false positive rate. Our results also show that the detection accuracy does not degrade much when only partial (sampled) sensitive-data digests are used. In addition, we give an empirical analysis of our fuzzification as well as of the fairness of fingerprint partial disclosure.

SYSTEM ANALYSIS

EXISTING SYSTEM:

  • Detecting and preventing data leaks requires a set of complementary solutions, which may include data-leak detection, data confinement, stealthy malware detection, and policy enforcement.
  • Network data-leak detection (DLD) typically performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique to analyze payloads of IP/TCP packets for inspecting application layer data, e.g., HTTP header/content.
  • Alerts are triggered when the amount of sensitive data found in traffic passes a threshold. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS).
  • Straightforward realizations of data-leak detection require the plaintext sensitive data. However, this requirement is undesirable, as it may threaten the confidentiality of the sensitive information. If a detection system is compromised, then it may expose the plaintext sensitive data (in memory).
  • In addition, the data owner may need to outsource the data-leak detection to providers, but may be unwilling to reveal the plaintext sensitive data to them. Therefore, one needs new data-leak detection solutions that allow the providers to scan content for leaks without learning the sensitive information.

DISADVANTAGES:

  • As the Internet grows and network bandwidth continues to increase, administrators are faced with the task of keeping confidential information from leaving their networks. In response, researchers have created data loss prevention systems that check outgoing traffic for known confidential information.
  • These systems stop naive adversaries from leaking data, but are fundamentally unable to identify encrypted or obfuscated information leaks. What remains is a high-capacity pipe for tunneling data to the Internet.
  • An existing approach quantifies information leak capacity in network traffic: instead of trying to detect the presence of sensitive data, which is an impossible task in the general case, its goal is to measure and constrain the maximum volume of leaked data.
  • This approach takes advantage of the insight that most network traffic is repeated or determined by external information, such as protocol specifications or messages sent by a server. By filtering out this data, it can isolate and quantify the true information flowing from a computer.

PROPOSED SYSTEM:

  • We propose a data-leak detection solution which can be outsourced and be deployed in a semihonest detection environment. We design, implement, and evaluate our fuzzy fingerprint technique that enhances data privacy during data-leak detection operations.
  • Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data.
  • Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests or the content being inspected. Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them.
  • In our detection procedure, the data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them.
  • To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noises. It is the data owner, who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.

ADVANTAGES:

  • We describe a privacy-preserving data-leak detection model for preventing inadvertent data leaks in network traffic. Our model supports detection operation delegation, and ISPs can provide data-leak detection as an add-on service to their customers using our model.
  • We design, implement, and evaluate an efficient technique, fuzzy fingerprint, for privacy-preserving data-leak detection. Fuzzy fingerprints are special sensitive data digests prepared by the data owner for release to the DLD provider.
  • We implement our detection system and perform extensive experimental evaluation on Internet surfing traffic of 20 users, and also 5 simulated real-world data-leak scenarios to measure its privacy guarantee, detection rate and efficiency.
  • Our results indicate high accuracy achieved by our underlying scheme with a very low false positive rate. Our results also show that the detection accuracy does not degrade much when only partial (sampled) sensitive-data digests are used. In addition, we give an empirical analysis of our fuzzification as well as of the fairness of fingerprint partial disclosure.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor       –    Pentium IV
  • Speed       –    1 GHz
  • RAM       –    256 MB (min)
  • Hard Disk       –    20 GB
  • Floppy Drive       –    1.44 MB
  • Keyboard       –    Standard Windows Keyboard
  • Mouse       –    Two or Three Button Mouse
  • Monitor       –    SVGA

SOFTWARE REQUIREMENTS:

  • Operating System          :           Windows XP or Win7
  • Front End        :           Microsoft Visual Studio .NET  
  • Back End :           MS-SQL Server
  • Server :           ASP .NET Web Server
  • Script :           C# Script
  • Document :           MS-Office 2007

PASSIVE IP TRACEBACK: DISCLOSING THE LOCATIONS OF IP SPOOFERS FROM PATH BACKSCATTER

ABSTRACT:

It has long been known that attackers may use forged source IP addresses to conceal their real locations. To capture the spoofers, a number of IP traceback mechanisms have been proposed. However, due to the challenges of deployment, there has not been a widely adopted IP traceback solution, at least at the Internet level. As a result, the mist over the locations of spoofers has never been dissipated.

This paper proposes passive IP traceback (PIT), which bypasses the deployment difficulties of IP traceback techniques. PIT investigates Internet Control Message Protocol error messages (named path backscatter) triggered by spoofing traffic, and tracks the spoofers based on publicly available information (e.g., topology). In this way, PIT can find the spoofers without any deployment requirement.

This paper illustrates the causes, collection, and the statistical results on path backscatter, demonstrates the processes and effectiveness of PIT, and shows the captured locations of spoofers through applying PIT on the path backscatter data set.

These results can help further reveal IP spoofing, which has been studied for a long time but never well understood. Though PIT cannot work in all spoofing attacks, it may be the most useful mechanism to trace spoofers before an Internet-level traceback system has been deployed in practice.

INTRODUCTION

IP spoofing, which means attackers launching attacks with forged source IP addresses, has long been recognized as a serious security problem on the Internet. By using addresses that are assigned to others or not assigned at all, attackers can avoid exposing their real locations, enhance the effect of attacking, or launch reflection-based attacks. A number of notorious attacks rely on IP spoofing, including SYN flooding, SMURF, and DNS amplification. A DNS amplification attack that severely degraded the service of a Top Level Domain (TLD) name server has been reported. Though there has been a popular conventional wisdom that DoS attacks are launched from botnets and spoofing is no longer critical, the report of ARBOR at the NANOG 50th meeting shows that spoofing is still significant in observed DoS attacks. Indeed, based on the captured backscatter messages from UCSD Network Telescopes, spoofing activities are still frequently observed.

Capturing the origins of IP spoofing traffic is of great importance. As long as the real locations of spoofers are not disclosed, they cannot be deterred from launching further attacks. Even just approaching the spoofers, for example by determining the ASes or networks they reside in, allows attackers to be located in a smaller area and filters to be placed closer to the attacker before attacking traffic gets aggregated. Last but not least, identifying the origins of spoofing traffic can help build a reputation system for ASes, which would be helpful in pushing the corresponding ISPs to verify IP source addresses.

Instead of proposing another IP traceback mechanism with improved tracking capability, we propose a novel solution, named Passive IP Traceback (PIT), to bypass the challenges in deployment. Routers may fail to forward a spoofed IP packet for various reasons, e.g., TTL exceeding. In such cases, the routers may generate an ICMP error message (named path backscatter) and send the message to the spoofed source address. Because the routers can be close to the spoofers, the path backscatter messages may potentially disclose the locations of the spoofers. PIT exploits these path backscatter messages to find the location of the spoofers. With the locations of the spoofers known, the victim can seek help from the corresponding ISP to filter out the attacking packets, or take other countermeasures. PIT is especially useful for the victims in reflection-based spoofing attacks, e.g., DNS amplification attacks. The victims can find the locations of the spoofers directly from the attacking traffic.
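To make the notion of a path backscatter message concrete, the sketch below extracts the fields that matter for traceback from an ICMP error message: the reporting router (the source of the outer IP packet), and the spoofed source and original destination found in the embedded IPv4 header. It follows the standard ICMP error layout (8-byte ICMP header followed by the original IP header); the function name and the returned dictionary are assumptions for illustration, not part of PIT itself.

```python
import socket
import struct

ICMP_DEST_UNREACHABLE = 3
ICMP_TIME_EXCEEDED = 11


def parse_path_backscatter(router_ip: str, icmp: bytes):
    """Parse an ICMP error message received from router_ip.

    Returns a dict describing the message, or None if the bytes are not
    a Destination Unreachable / Time Exceeded error carrying an IPv4 header.
    """
    if len(icmp) < 8 + 20:                 # ICMP header + minimal IPv4 header
        return None
    icmp_type, icmp_code = struct.unpack("!BB", icmp[:2])
    if icmp_type not in (ICMP_DEST_UNREACHABLE, ICMP_TIME_EXCEEDED):
        return None
    inner = icmp[8:]                       # header of the packet that failed
    if inner[0] >> 4 != 4:                 # IP version nibble must be 4
        return None
    return {
        "router": router_ip,
        "type": icmp_type,
        "code": icmp_code,
        "spoofed_source": socket.inet_ntoa(inner[12:16]),
        "original_destination": socket.inet_ntoa(inner[16:20]),
    }
```

The host that owns the spoofed address receives such messages without having sent the original packet, which is what makes them observable passively.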

In this article, we first illustrate the generation, types, collection, and security issues of path backscatter messages in section III. Then, in section IV, we present PIT, which tracks the location of the spoofers based on path backscatter messages together with the topology and routing information. We discuss how to apply PIT when both topology and routing are known, when only topology is known, and when neither is known. We also present two effective algorithms to apply PIT in large-scale networks. In the following section, we first show the statistical results on path backscatter messages. Then we evaluate the two key mechanisms of PIT which work without routing information. Finally, we give the tracking result of applying PIT to the path backscatter message dataset: a number of ASes in which spoofers are found.
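As an illustration of how topology can narrow down a spoofer's location, the sketch below takes one path backscatter observation (a reporting router and the original destination) and returns every node from which a shortest path to the destination passes through that router. The shortest-path routing assumption, the unweighted undirected graph, and the function names are simplifications chosen for this example; they are not the two algorithms presented in the paper.

```python
from collections import deque


def bfs_dist(graph: dict, src: str) -> dict:
    """Hop distance from src to every reachable node (unweighted graph)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def candidate_sources(graph: dict, router: str, destination: str) -> set:
    """Nodes s such that some shortest path s -> destination traverses router,
    i.e. dist(s, router) + dist(router, destination) == dist(s, destination)."""
    d_router = bfs_dist(graph, router)
    d_dest = bfs_dist(graph, destination)
    if router not in d_dest:
        return set()
    return {
        s for s in graph
        if s != destination and s in d_router and s in d_dest
        and d_router[s] + d_dest[router] == d_dest[s]
    }
```

For a topology in which hosts A and B attach to router R1, host C attaches to router R2, R1 links to R2, and R2 links to the victim V, a backscatter message from R1 about a packet destined to V leaves A, B, and R1's own network as candidates and rules out C and R2.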

Our work has the following contributions:

1) This is the first known article that investigates path backscatter messages in depth. These messages are valuable for understanding spoofing activities. Though Moore et al. [8] have exploited backscatter messages, which are generated by the targets of spoofing messages, to study Denial of Service (DoS) attacks, path backscatter messages, which are sent by intermediate devices rather than the targets, have not been used in traceback.

2) A practical and effective IP traceback solution based on path backscatter messages, i.e., PIT, is proposed. PIT bypasses the deployment difficulties of existing IP traceback mechanisms and is in fact already in force. Because path backscatter messages are not generated with a stable probability, PIT cannot work in all attacks, but it does work in a number of spoofing activities. At least, it may be the most useful traceback mechanism before an AS-level traceback system has been deployed in practice.

3) By applying PIT to the path backscatter dataset, a number of locations of spoofers are captured and presented. Though this is not a complete list, it is the first known list disclosing the locations of spoofers.

PANDA: PUBLIC AUDITING FOR SHARED DATA WITH EFFICIENT USER REVOCATION IN THE CLOUD

ABSTRACT:

With data storage and sharing services in the cloud, users can easily modify and share data as a group. To ensure shared data integrity can be verified publicly, users in the group need to compute signatures on all the blocks in shared data. Different blocks in shared data are generally signed by different users due to data modifications performed by different users. For security reasons, once a user is revoked from the group, the blocks which were previously signed by this revoked user must be re-signed by an existing user. The straightforward method, which allows an existing user to download the corresponding part of shared data and re-sign it during user revocation, is inefficient due to the large size of shared data in the cloud. In this paper, we propose a novel public auditing mechanism for the integrity of shared data with efficient user revocation in mind. By utilizing the idea of proxy re-signatures, we allow the cloud to re-sign blocks on behalf of existing users during user revocation, so that existing users do not need to download and re-sign blocks by themselves. In addition, a public verifier is always able to audit the integrity of shared data without retrieving the entire data from the cloud, even if some part of shared data has been re-signed by the cloud. Moreover, our mechanism is able to support batch auditing by verifying multiple auditing tasks simultaneously. Experimental results show that our mechanism can significantly improve the efficiency of user revocation.

INTRODUCTION

With data storage and sharing services (such as Dropbox and Google Drive) provided by the cloud, people can easily work together as a group by sharing data with each other. More specifically, once a user creates shared data in the cloud, every user in the group is able to not only access and modify shared data, but also share the latest version of the shared data with the rest of the group. Although cloud providers promise a more secure and reliable environment to the users, the integrity of data in the cloud may still be compromised, due to the existence of hardware/software failures and human errors.

To protect the integrity of data in the cloud, a number of mechanisms have been proposed. In these mechanisms, a signature is attached to each block in data, and the integrity of data relies on the correctness of all the signatures. One of the most significant and common features of these mechanisms is to allow a public verifier to efficiently check data integrity in the cloud without downloading the entire data, referred to as public auditing (or denoted as Provable Data Possession). This public verifier could be a client who would like to utilize cloud data for particular purposes (e.g., search, computation, data mining, etc.) or a third-party auditor (TPA) who is able to provide verification services on data integrity to users. Most of the previous works focus on auditing the integrity of personal data. Different from these works, several recent works focus on how to preserve identity privacy from public verifiers when auditing the integrity of shared data. Unfortunately, none of the above mechanisms considers the efficiency of user revocation when auditing the correctness of shared data in the cloud.

With shared data, once a user modifies a block, she also needs to compute a new signature for the modified block. Due to the modifications from different users, different blocks are signed by different users. For security reasons, when a user leaves the group or misbehaves, this user must be revoked from the group. As a result, this revoked user should no longer be able to access and modify shared data, and the signatures generated by this revoked user are no longer valid to the group. Therefore, although the content of shared data is not changed during user revocation, the blocks, which were previously signed by the revoked user, still need to be re-signed by an existing user in the group. As a result, the integrity of the entire data can still be verified with the public keys of existing users only.
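The bookkeeping implied above can be sketched as follows: each block records its current signer, and revoking a user identifies exactly the blocks whose signatures must be replaced. The dictionary representation and the function names are assumptions made for illustration; the sketch tracks only signer identities and performs no cryptography.

```python
def blocks_to_resign(signer_of: dict, revoked_user: str) -> list:
    """Block identifiers whose current signature was made by the revoked user."""
    return sorted(b for b, user in signer_of.items() if user == revoked_user)


def revoke(signer_of: dict, revoked_user: str, new_signer: str) -> dict:
    """Return a new block -> signer map in which every block signed by the
    revoked user is attributed to new_signer; block content is unchanged."""
    updated = dict(signer_of)
    for block in blocks_to_resign(signer_of, revoked_user):
        updated[block] = new_signer
    return updated
```

After `revoke`, no block is attributed to the revoked user, so the whole data set can be verified with the public keys of existing users only.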

Since shared data is outsourced to the cloud and users no longer store it on local devices, a straightforward method to re-compute these signatures during user revocation is to ask an existing user to first download the blocks previously signed by the revoked user, verify the correctness of these blocks, then re-sign these blocks, and finally upload the new signatures to the cloud. However, this straightforward method may cost the existing user a huge amount of communication and computation resources by downloading and verifying blocks, and by re-computing and uploading signatures, especially when the number of re-signed blocks is quite large or the membership of the group is frequently changing. To make matters even worse, existing users may access their data sharing services provided by the cloud with resource-limited devices, such as mobile phones, which further prevents existing users from maintaining the correctness of shared data efficiently during user revocation.

Clearly, if the cloud could possess each user’s private key, it could easily finish the re-signing task for existing users without asking them to download and re-sign blocks. However, since the cloud is not in the same trusted domain with each user in the group, outsourcing every user’s private key to the cloud would introduce significant security issues. Another important problem we need to consider is that the re-computation of any signature during user revocation should not affect the most attractive property of public auditing: auditing data integrity publicly without retrieving the entire data. Therefore, how to efficiently reduce the significant burden to existing users introduced by user revocation, and still allow a public verifier to check the integrity of shared data without downloading the entire data from the cloud, is a challenging task.

In this paper, we propose Panda, a novel public auditing mechanism for the integrity of shared data with efficient user revocation in the cloud. In our mechanism, by utilizing the idea of proxy re-signatures, once a user in the group is revoked, the cloud is able to re-sign the blocks, which were signed by the revoked user, with a re-signing key. As a result, the efficiency of user revocation can be significantly improved, and computation and communication resources of existing users can be easily saved. Meanwhile, the cloud, which is not in the same trusted domain with each user, is only able to convert a signature of the revoked user into a signature of an existing user on the same block, but it cannot sign arbitrary blocks on behalf of either the revoked user or an existing user. By designing a new proxy re-signature scheme with nice properties, which traditional proxy re-signatures do not have, our mechanism is always able to check the integrity of shared data without retrieving the entire data from the cloud.
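The algebra behind converting one user's signature into another's with a re-signing key can be illustrated with a toy example. This is not Panda's scheme and is not secure: it uses a tiny prime-order group (P = 2039, Q = 1019), signatures of the form H(m)^sk, and a re-signing key sk_existing / sk_revoked mod Q, all chosen here only for illustration. Public verification, which in practice relies on bilinear pairings, and the protocol for deriving the re-signing key without exposing private keys are outside this sketch. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

P = 2039   # toy safe prime, P = 2*Q + 1 (far too small for real use)
Q = 1019   # prime order of the subgroup of squares modulo P


def hash_to_group(block: bytes) -> int:
    """Map a block to a non-trivial element of the order-Q subgroup."""
    h = int.from_bytes(hashlib.sha256(block).digest(), "big") % (P - 3) + 2
    return pow(h, 2, P)          # squaring lands in the subgroup of squares


def sign(block: bytes, sk: int) -> int:
    """Toy signature: H(block)^sk mod P."""
    return pow(hash_to_group(block), sk, P)


def resign_key(sk_revoked: int, sk_existing: int) -> int:
    """Re-signing key rk = sk_existing / sk_revoked (mod Q)."""
    return (sk_existing * pow(sk_revoked, -1, Q)) % Q


def cloud_resign(signature: int, rk: int) -> int:
    """Cloud side: convert a signature using only rk, without the block
    content and without either private key."""
    return pow(signature, rk, P)
```

With this construction, `cloud_resign(sign(m, a), resign_key(a, b))` equals `sign(m, b)` for any block `m`, because (H(m)^a)^(b/a) = H(m)^b in a group of prime order Q. The cloud handles only existing signatures and `rk`, which mirrors the property that it can convert signatures but not sign arbitrary blocks on its own.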