Congestion Aware Routing in Nonlinear Elastic Optical Networks

22/10/201902/07/2019 by admin

CLOUDQUAL A Quality Model for Cloud Services

22/10/201902/07/2019 by admin

Cloud Service Negotiation in Internet of Things Environment A Mixed Approach

22/10/201902/07/2019 by admin

Behavioral Malware Detection in Delay Tolerant Networks

22/10/201902/07/2019 by admin

BEHAVIORAL MALWARE DETECTION IN DELAY TOLERANT NETWORKS

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

MASTER OF TECHNOLOGY

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

BONAFIDE CERTIFICATE

Certified that this project report titled “BEHAVIORAL MALWARE DETECTION IN DELAY TOLERANT NETWORKS” is the bonafide work of Mr. _____________Who carried out the research under my supervision Certified further, that to the best of my knowledge the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide Signature of the H.O.D

Name Name

CHAPTER 1

1.0 ABSTRACT:

The delay-tolerant-network (DTN) model is becoming a viable communication alternative to the traditional infrastructural model for modern mobile consumer electronics equipped with short-range communication technologies such as Bluetooth, NFC, and Wi-Fi Direct. Proximity malware is a class of malware that exploits the opportunistic contacts and distributed nature of DTNs for propagation. Behavioral characterization of malware is an effective alternative to pattern matching in detecting malware, especially when dealing with polymorphic or obfuscated malware.

In this paper, we first propose a general behavioral characterization of proximity malware which based on naive Bayesian model, which has been successfully applied in non-DTN settings such as filtering email spams and detecting botnets. We identify two unique challenges for extending Bayesian malware detection to DTNs (“insufficient evidence versus evidence collection risk” and “filtering false evidence sequentially and distributedly”), and propose a simple yet effective method, look ahead, to address the challenges. Furthermore, we propose two extensions to look ahead, dogmatic filtering, and adaptive look ahead, to address the challenge of “malicious nodes sharing false evidence.” Real mobile network traces are used to verify the effectiveness of the proposed methods.

1.1 INTRODUCTION

MALWARE:

Malware, short for malicious software, is any software used to disrupt computer operation, gather sensitive information, or gain access to private computer systems. It can appear in the form of executable code, scripts, active content, and other software. ‘Malware’ is a general term used to refer to a variety of forms of hostile or intrusive software. The term badware is sometimes used, and applied to both true (malicious) malware and unintentionally harmful software, viruses, worms, trojan horses, ransomware, spyware, adware, scareware and other malicious programs of active malware threats were worms or trojans rather than viruses. In law, malware is sometimes known as a computer contaminant, as in the legal codes of several malware is often disguised as, or embedded in, non-malicious files.

Spyware or other malware is sometimes found embedded in programs supplied officially by companies, e.g., downloadable from websites, that appear useful or attractive, but may have, for example, additional hidden tracking functionality that gathers marketing statistics. An example of such software, which was described as illegitimate, is the Sony rootkit, a Trojan embedded into CDs sold by Sony, which silently installed and concealed itself on purchasers’ computers with the intention of preventing illicit copying; it also reported on users’ listening habits, and created vulnerabilities that were exploited by unrelated malware.

PURPOSES:

Many early infectious programs, including the first Internet Worm, were written as experiments or pranks. Today, malware is used by both black hat hackers and governments, to steal personal, financial, or business information and sometimes for sabotage (e.g., Stuxnet). Malware is sometimes used broadly against government or corporate websites to gather guarded information, or to disrupt their operation in general. However, malware is often used against individuals to gain information such as personal identification numbers or details, bank or credit card numbers, and passwords. Left unguarded, personal and networked computers can be at considerable risk against these threats. (These are most frequently defended against by various types of firewall, anti-virussoftware, and network hardware).

Since the rise of widespread broadband Internet access, malicious software has more frequently been designed for profit. Since 2003, the majority of widespread viruses and worms have been designed to take control of users’ computers for illicit purposes. Infected “zombie computers” are used to send email spam, to host contraband data such as child pornography, or to engage in distributed denial-of-service attacks as a form of extortion.

Programs designed to monitor users’ web browsing, display unsolicited advertisements, or redirect affiliate marketing revenues are called spyware. Spyware programs do not spread like viruses; instead they are generally installed by exploiting security holes. They can also be packaged together with user-installed software, such as peer-to-peer applications.

Ransomware affects an infected computer in some way, and demands payment to reverse the damage. For example, programs such as CryptoLocker encrypt files securely, and only decrypt them on payment of a substantial sum of money.

INFECTIOUS MALWARE: VIRUSES AND WORMS:

The best-known types of malware, viruses and worms, are known for the manner in which they spread, rather than any specific types of behavior. The term computer virus is used for a program that embeds itself in some other executable software (including the operating system itself) on the target system without the users consent and when that is run causes the virus to spread to other executables. On the other hand, a worm is a stand-alone malware program that actively transmits itself over a network to infect other computers. These definitions lead to the observation that a virus requires the user to run an infected program or operating system for the virus to spread, whereas a worm spreads itself.

CONCEALMENT: Viruses, trojan horses, rootkits, and backdoors

TROJAN HORSES

For a malicious program to accomplish its goals, it must be able to run without being detected, shut down, or deleted. When a malicious program is disguised as something normal or desirable, users may unwittingly install it. This is the technique of the Trojan horse or trojan. In broad terms, a Trojan horse is any program that invites the user to run it, concealing harmful or malicious executable code of any description. The code may take effect immediately and can lead to many undesirable effects, such as encrypting the user’s files or downloading and implementing further malicious functionality.[citation needed]

In the case of some spyware, adware, etc. the supplier may require the user to acknowledge or accept its installation, describing its behavior in loose terms that may easily be misunderstood or ignored, with the intention of deceiving the user into installing it without the supplier technically in breach of the law.[citation needed]

ROOTKITS

Once a malicious program is installed on a system, it is essential that it stays concealed, to avoid detection. Software packages known as rootkits allow this concealment, by modifying the host’s operating system so that the malware is hidden from the user. Rootkits can prevent a malicious process from being visible in the system’s list of processes, or keep its files from being read.

Some malicious programs contain routines to defend against removal, not merely to hide them. An early example of this behavior is recorded in the Jargon File tale of a pair of programs infesting a Xerox CP-V time sharing system:

Each ghost-job would detect the fact that the other had been killed, and would start a new copy of the recently-stopped program within a few milliseconds. The only way to kill both ghosts was to kill them simultaneously (very difficult) or to deliberately crash the system.

BACKDOORS

A backdoor is a method of bypassing normal authentication procedures, usually over a connection to a network such as the Internet. Once a system has been compromised, one or more backdoors may be installed in order to allow access in the future,[30] invisibly to the user.

The idea has often been suggested that computer manufacturers preinstall backdoors on their systems to provide technical support for customers, but this has never been reliably verified. It was reported in 2014 that US government agencies had been diverting computers purchased by those considered “targets” to secret workshops where software or hardware permitting remote access by the agency was installed, considered to be among the most productive operations to obtain access to networks around the world. Backdoors may be installed by Trojan horses, worms, implants, or other methods.

Malware authors target bugs, or loopholes, to exploit. A common method is exploitation of vulnerability, where software designed to store data in a specified region of memory does not prevent more data than the buffer can accommodate being supplied. Malware may provide data that overflows the buffer, with malicious executable code or data after the end; when this payload is accessed it does what the attacker, not the legitimate software, determines.

INSECURE DESIGN OR USER ERROR

Early PCs had to be booted from floppy disks; when built-in hard drives became common the operating system was normally started from them, but it was possible to boot from another boot device if available, such as a floppy disk, CD-ROM, DVD-ROM, or USB flash drive. It was common to configure the computer to boot from one of these devices when available. Normally none would be available; the user would intentionally insert, say, a CD into the optical drive to boot the computer in some special way, for example to install an operating system. Even without booting, computers can be configured to execute software on some media as soon as they become available, e.g. to autorun a CD or USB device when inserted.

Malicious software distributors would trick the user into booting or running from an infected device or medium; for example, a virus could make an infected computer add autorunnable code to any USB stick plugged into it; anyone who then attached the stick to another computer set to autorun from USB would in turn become infected, and also pass on the infection in the same way. More generally, any device that plugs into a USB port-—”including gadgets like lights, fans, speakers, toys, even a digital microscope”—can be used to spread malware. Devices can be infected during manufacturing or supply if quality control is inadequate.

This form of infection can largely be avoided by setting up computers by default to boot from the internal hard drive, if available, and not to autorun from devices. Intentional booting from another device is always possible by pressing certain keys during boot.

Older email software would automatically open HTML email containing potentially malicious JavaScript code; users may also execute disguised malicious email attachments and infected executable files supplied in other ways.

1.2 SCOPE OF THE PROJECT:

We present a general behavioral characterization of proximity malware, which captures the functional but imperfect nature in detecting proximity malware characterization, and with a simple cut-off malware containment strategy, we formulate the malware detection process as a distributed decision problem. We analyze the risk associated with the decision, and design a simple, yet effective, strategy, look ahead, which naturally reflects individual nodes’ intrinsic risk inclinations against malware infection. Look ahead extends the naive Bayesian model, and addresses the DTN specific, malware-related, “insufficient evidence versus evidence collection risk” problem.

We consider the benefits of sharing assessments among nodes, and address challenges derived from the DTN model: liars (i.e., bad-mouthing and false praising malicious nodes) and defectors (i.e., good nodes that have turned rogue due to malware infections). We present two alternative techniques, dogmatic filtering and adaptive look ahead, that naturally extend look ahead to consolidate evidence provided by others, while containing the negative effect of false evidence. A nice property of the proposed evidence consolidation methods is that the results will not worsen even if liars are the majority in the neighborhood traces are used to verify the effectiveness of the methods.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Existing worms, spam, and phishing exploit gaps in traditional threat models that usually revolve around preventing unauthorized access and information disclosure. The new threat landscape requires security researchers to consider a wider range of attacks: opportunistic attacks in addition to targeted ones; attacks coming not just from malicious users, but also from subverted (yet otherwise benign) hosts; coordinated/distributed attacks in addition to isolated, single-source methods; and attacks blending flaws across layers, rather than exploiting a single vulnerability. Some of the largest security lapses in the last decade are due to designers ignoring the complexity of the threat landscape.

The increasing penetration of wireless networking, and more specifically wifi, may soon reach critical mass, making it necessary to examine whether the current state of wireless security is adequate for fending off likely attacks.

Three types of threats that seem insufficiently addressed by existing technology and deployment techniques. The first threat is wildfire worms, a class of worms that spreads contagiously between hosts on neighboring APs. We show that such worms can spread to a large fraction of hosts in a dense urban setting, and that the propagation speed can be such that most existing defenses cannot react in a timely fashion. Worse, such worms can penetrate through networks protected by WEP and other security mechanisms. The second threat we discuss is large-scale spoofing attacks that can be used for massive phishing and spam campaigns. We show how an attacker can easily use a botnet by acquiring access to wifi-capable zombie hosts, and can use these zombies to target not just the local wireless LAN, but any LAN within range, greatly increasing his reach across heterogeneous networks.

2.2 DISADVANTAGES:

Viruses can cause many problems on your computer. Usually, they display pop-up ads on your desktop or steal your information. Some of the more nasty ones can even crash your computer or delete your files.

Your computer gets slowed down. Many “hackers” get jobs with software firms by finding and exploiting problems with software.

Some the applications won’t start (ex: I hate mozilla virus won’t let you start the mozilla) you cannot see some of the settings in your OS. (Ex one kind of virus disables hide folder options and you will never be able to set it).

To quantify these threats, we rely on real-world data extracted from wifi maps of large metropolitan areas in the country. Existing results suggest that a carefully crafted wireless worm can infect up to 80% of all wifi connected hosts in some metropolitan areas within 20 minutes, and that an attacker can launch phishing attacks or build a tracking system to monitor the location of 10-50% of wireless users in these metropolitan areas with just 1,000 zombies under his control.

2.3 PROPOSED SYSTEM:

In this paper, we present a simple, yet effective solution, look ahead, which naturally reflects individual nodes’ intrinsic risk inclinations against malware infection, to balance between these two extremes. Essentially, we extend the naive Bayesian model, which has been applied in filtering email, spams detecting botnets, and designing IDSs.

We analyze the risk associated with the decision, and design a simple, yet effective, strategy, look ahead, which naturally reflects individual nodes’ intrinsic risk inclinations against malware infection. Look ahead extends the naive Bayesian model, and addresses the DTN specific, malware-related, “insufficient evidence versus evidence collection risk” Proximity malware is a malicious program that disrupts the host node’s normal function and has a chance of duplicating itself to other nodes during (opportunistic) contact opportunities between nodes in the DTN.

2.4 ADVANTAGES:

Two DTN specific, malware-related:

1. Insufficient evidence versus evidence collection risk. In DTNs, evidence (such as Bluetooth connection or SSH session requests) is collected only when nodes come into contact. But contacting malware-infected nodes carries the risk of being infected. Thus, nodes must make decisions (such as whether to cut off other nodes and, if yes, when) online based on potentially insufficient evidence.

2. Filtering false evidence sequentially and distributedly. Sharing evidence among opportunistic acquaintances helps alleviating the aforementioned insufficient evidence problem; however, false evidence shared by malicious nodes (the liars) may negate the benefits of sharing. In DTNs, nodes must decide whether to accept received evidence sequentially and distributedly.

2.5 HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

SOFTWARE REQUIREMENTS:

Operating System : Windows XP
Front End : JAVA JDK 1.7
Document : MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN

ARCHITECTURE DIAGRAM / DATA FLOW DIAGRAM / UML DIAGRAM:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.

DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.

DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

All processes must have at least one data flow in and one data flow out.
All processes should modify the incoming data, producing new forms of outgoing data.
Each data store must be involved with at least one data flow.
Each external entity must be involved with at least one data flow.
A data flow must be attached to at least one process.

3.0 ARCHITECTURE DIAGRAM

DATAFLOW DIAGRAM:

LEVEL 1

Server Client

LEVEL 2

UML DIAGRAMS:

USE CASE DIAGRAM:

CLASS DIAGRAM:

SEQUENCE DIAGRAM:

ACTIVITY DIAGRAMS:

CHAPTER 4

4.0 IMPLEMENTATION

4.1 MODULES:

DELAY TOLERANT NETWORKS:

PROXIMITY MALWARE:

MALWARE PROPAGATION:

BAYESIAN FILTERING:

4.1 MODULE DESCRIPTION:

DELAY TOLERANT NETWORKS:

Delay-tolerant networking (DTN) is an approach to computer network architecture that seeks to address the technical issues in heterogeneous networks that may lack continuous network connectivity. Examples of such networks are those operating in mobile or extreme terrestrial environments, or planned networks in space.

Recently, the term disruption-tolerant networking has gained currency in the United States due to support from DARPA, which has funded many DTN projects. Disruption may occur because of the limits of wireless radio range, sparsity of mobile nodes, energy resources, attack, and noise.

DTNs, including a growing number of academic conferences on delay and disruption-tolerant networking, and growing interest in combining work from sensor networks and MANETs with the work on DTN. This field saw many optimizations on classic ad hoc and delay-tolerant networking algorithms and began to examine factors such as security, reliability, verifiability, and other areas of research that are well understood in traditional computer networking.

PROXIMITY MALWARE:

Proximity malware is a malicious program that disrupts the host node’s normal function and has a chance of duplicating itself to other nodes during (opportunistic) contact opportunities between nodes in the DTN. When duplication occurs, the other node is infected with the malware. In our model, we assume that each node is capable of assessing the other party for suspicious actions after each encounter, resulting in a binary assessment. For example, a node can assess a Bluetooth connection or an SSH session for potential Cabir or Ikee infection.

The watchdog components in previous works on malicious behavior detection in MANETs and distributed reputation systems are other examples. A node is either evil or good, based on if it is or is not infected by the malware. The suspiciousaction assessment is assumed to be an imperfect but functional indicator of malware infections: It may occasionally assess an evil node’s actions as “nonsuspicious” or a good node’s actions as “suspicious,” but most suspicious actions are correctly attributed to evil nodes. A previous work on distributed IDS presents an example for such imperfect but functional binary classifier on nodes’ behaviors.

MALWARE PROPAGATION:

We analyzed malware propagation through proximity channels in social networks. Akritidis et al. quantified the threat of proximity malware in wide-area wireless networks in optimal malware signature distribution in heterogeneous, resource-constrained mobile networks in traditional, non-DTN, networks and Bayer et al.

We proposed to detect malware with learned behavioral model, in terms of system call and program flow. We extend the Naive Bayesian model, which has been applied in filtering email spams detecting botnets and designing IDSs and address DTN-specific, malware-related, problems. In the context of detecting slowly propagating Internet worm, Dash et al. presented a distributed IDS architecture of local/global detector that resembles the neighborhood-watch model, with the assumption of attested/honest evidence, i.e., without liars.

BAYESIAN FILTERING:

Naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.

The implications are as follows:

. Given enough assessments, honest nodes are likely to obtain a close estimation of a node’s suspiciousness (suppose they have not cut the node off yet), even if they only use their own assessments.

. The liars have to share a significant amount of false evidence to sway the public’s opinion on a node’s suspiciousness.

. The most susceptible victims of liars are the nodes that have little evidence Dogmatic filtering. Dogmatic filtering is based on the observation that one’s own assessments are truthful and, therefore, can be used to bootstrap the evidence consolidation process. A node shall only accept evidence that will not sway its current opinion too much. We call this observation the dogmatic principle.

We extend the Naive Bayesian model, which has been applied in filtering email, spams detecting botnets and designing IDSs and address DTN-specific, malware-related, problems. In the context of detecting slowly propagating Internet worm, Dash et al. presented a distributed IDS architecture of local/global detector that resembles the neighborhood-watch model,

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase and business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis the feasibility study of the proposed system is to be carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

This study is carried out to check the economic impact that the system will have on the organization. The amount of fund that the company can pour into the research and development of the system is limited. The expenditures must be justified. Thus the developed system as well within the budget and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

5.1.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not have a high demand on the available technical resources. This will lead to high demands on the available technical resources. This will lead to high demands being placed on the client. The developed system must have a modest requirement, as only minimal or null changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:

The aspect of study is to check the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, instead must accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system.

5.2 SYSTEM TESTING:

Testing is a process of checking whether the developed system is working according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes a logical assumption that if all the parts of the system are correct, the global will be successfully achieved. In adequate testing if not testing leads to errors that may not appear even many months. This creates two problems, the time lag between the cause and the appearance of the problem and the effect of the system errors on the files and records within the system. A small system error can conceivably explode into a much larger Problem. Effective testing early in the purpose translates directly into long term cost savings from a reduced number of errors. Another reason for system testing is its utility, as a user-oriented vehicle before implementation. The best programs are worthless if it produces the correct outputs.

5.2.1 UNIT TESTING:

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile and test data correctly and tie in properly with other programs. Achieving an error free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logical. Syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error message generated by the computer. For Logic errors the programmer must examine the output carefully.

UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.3 FUNCTIONAL TESTING:

Functional testing of an application is used to prove the application delivers correct results, using enough inputs to give an adequate level of confidence that will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that personalization function work correctly.When a program is tested, the actual output is compared with the expected output. When there is a discrepancy the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.

FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 4 NON-FUNCTIONAL TESTING:

The Non Functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing used to check that an application will work in the operational environment. Non-functional testing includes:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.5 LOAD TESTING:

An important tool for implementing system tests is a Load generator. A Load generator is essential for testing quality requirements such as performance and stress. A load can be a real load, that is, the system can be put under test to real usage by having actual telephone users connected to it. They will generate test input data for system test.

Load Testing

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

Performance tests are utilized in order to determine the widely defined performance of the software system such as execution time associated with various parts of the code, response time and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

The software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time and it is being ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It the portability that a software system will operate without failure under given conditions for a given time interval and it focuses on the behavior of the software element. It forms a part of the software quality control team.

RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

Security testing evaluates system characteristics that relate to the availability, integrity and confidentiality of the system data and services. Users/Clients should be encouraged to make sure their security needs are very clearly known at requirements time, so that the security issues can be addressed by the designers and testers.

SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.7 WHITE BOX TESTING:

White box testing, sometimes called glass-box testing is a test case design method that uses the control structure of the procedural design to derive test cases. Using white box testing method, the software engineer can derive test cases. The White box testing focuses on the inner structure of the software structure to be tested.

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not alternative to white box techniques. Rather it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors which focuses on inputs, outputs, and principle function of a software module. The starting point of the black box testing is either a specification or code. The contents of the box are hidden and the stimulated software should produce the desired results.

5.1.10 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 6

7.0 SOFTWARE DESCRIPTION:

7.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

The Java Programming Language

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

Simple
- Architecture neutral
- Object oriented
- Portable
- Distributed
- High performance
- Interpreted
- Multithreaded
- Robust
- Dynamic
- Secure

With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, first you translate a program into an intermediate language called Java byte codes —the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java byte code instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java byte codes help make “write once, run anywhere” possible. You can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or on an iMac.

7.2 THE JAVA PLATFORM:

A platform is the hardware or software environment in which a program runs. We’ve already mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it’s a software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, What Can Java Technology Do? Highlights what functionality some of the packages in the Java API provide.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Native code is code that after you compile it, the compiled code runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.

7.3 WHAT CAN JAVA TECHNOLOGY DO?

The most common types of programs written in the Java programming language are applets and applications. If you’ve surfed the Web, you’re probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs.

An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet.

A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server.

How does the API support all these kinds of programs? It does so with packages of software components that provides a wide range of functionality. Every full implementation of the Java platform gives you the following features:

The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
Applets: The set of conventions used by applets.
Networking: URLs, TCP (Transmission Control Protocol), UDP (User Data gram Protocol) sockets, and IP (Internet Protocol) addresses.
Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
Software components: Known as JavaBeans^TM, can plug into existing component architectures.
Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
Java Database Connectivity (JDBC^TM): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

7.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

We can’t promise you fame, fortune, or even a job if you learn the Java programming language. Still, it is likely to make your programs better and requires less effort than other languages. We believe that Java technology will help you do the following:

Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
Develop programs more quickly: Your development time may be as much as twice as fast versus writing the same program in C++. Why? You write fewer lines of code and it is a simpler programming language than C++.
Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java^TMProduct Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

7.5 ODBC:

Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application developers and database systems providers. Before ODBC became a de facto standard for Windows programs to interface with database systems, programmers had to use proprietary languages for each database they wanted to connect to. Now, ODBC has made the choice of the database system almost irrelevant from a coding perspective, which is as it should be. Application developers have much more important things to worry about than the syntax that is needed to port their program from one database to another when business needs suddenly change.

Through the ODBC Administrator in Control Panel, you can specify the particular database that is associated with a data source that an ODBC application program is written to use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a particular database. For example, the data source named Sales Figures might be a SQL Server database, whereas the Accounts Payable data source could refer to an Access database. The physical database referred to by a data source can reside anywhere on the LAN.

The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you setup a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program and each maintains a separate list of ODBC data sources.

From a programming perspective, the beauty of ODBC is that the application can be written to use the same set of function calls to interface with any data source, regardless of the database vendor. The source code of the application doesn’t change whether it talks to Oracle or SQL Server. We only mention these two as an example. There are ODBC drivers available for several dozen popular database systems. Even Excel spreadsheets and plain text files can be turned into data sources. The operating system uses the Registry information written by ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the data source (such as the interface to Oracle or SQL Server). The loading of the ODBC drivers is transparent to the ODBC application program. In a client/server environment, the ODBC API even handles many of the network issues for the application programmer.

The advantages of this scheme are so numerous that you are probably thinking there must be some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to the native database interface. ODBC has had many detractors make the charge that it is too slow. Microsoft has always claimed that the critical factor in performance is the quality of the driver software that is used. In our humble opinion, this is true. The availability of good ODBC drivers has improved a great deal recently. And anyway, the criticism about performance is somewhat analogous to those who said that compilers would never match the speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers get faster every year.

7.6 JDBC:

In an effort to set an independent database standard API for Java; Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of “plug-in” database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he or she must provide the driver for each platform that the database and Java run on.

To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you discovered earlier in this chapter, ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster than developing a completely new connectivity solution.

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

The remainder of this section will cover enough information about JDBC for you to know what it is about and how to use it effectively. This is by no means a complete overview of JDBC. That would fill an entire book.

7.7 JDBC Goals:

Few software packages are designed without goals in mind. JDBC is one that, because of its many goals, drove the development of the API. These goals, in conjunction with early reviewer feedback, have finalized the JDBC class library into a solid framework for building database applications in Java.

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although not the lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be created. Conversely, it is at a high enough level for application programmers to use it confidently. Attaining this goal allows for future tool vendors to “generate” JDBC code and to hide many of JDBC’s complexities from the end user.

SQL Conformance

SQL syntax varies as you move from database vendor to database vendor. In an effort to support a wide variety of vendors, JDBC will allow any query statement to be passed through it to the underlying database driver. This allows the connectivity module to handle non-standard functionality in a manner that is suitable for its users.

JDBC must be implemental on top of common database interfaces

The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This interface would translate JDBC calls to ODBC and vice versa.

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that the design of JDBC should be very simple, allowing for only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

Use strong, static typing wherever possible

Strong typing allows for more error checking to be done at compile time; also, less error appear at runtime.

Keep the common cases simple

Because more often than not, the usual SQL calls used by the programmer are simple SELECT’s, INSERT’s, DELETE’s and UPDATE’s, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.

Finally we decided to precede the implementation using Java Networking.

And for dynamically updating the cache table we go for MS Access database.

Java ha two things: a programming language and a platform.

Java is a high-level programming language that is all of the following

Simple Architecture-neutral

Object-oriented Portable

Distributed High-performance

Interpreted Multithreaded

Robust Dynamic Secure

Java is also unusual in that each Java program is both compiled and interpreted. With a compile you translate a Java program into an intermediate language called Java byte codes the platform-independent code instruction is passed and run on the computer.

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

7.7 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

The IP layer provides a connectionless and unreliable delivery system. It considers each datagram independently of the others. Any association between datagram must be supplied by the higher layers. The IP layer supplies a checksum that includes its own header. The header includes the source and destination addresses. The IP layer handles routing through an Internet. It is also responsible for breaking up large datagram into smaller ones for transmission and reassembling them at the other end.

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.

Port addresses

A service exists on a host, and is identified by its port. This is a 16 bit number. To send a message to a server, you send it to the port for that service of the host that it is running on. This is not location transparency! Certain of these ports are “well known”.

Sockets:

A socket is a data structure maintained by the system to handle network connections. A socket is created using the call socket. It returns an integer that is like a file descriptor. In fact, under Windows, this handle can be used with Read File and Write File functions.

#include <sys/types.h>

#include <sys/socket.h>

int socket(int family, int type, int protocol);

Here “family” will be AF_INET for IP communications, protocol will be zero, and type will depend on whether TCP or UDP is used. Two processes wishing to communicate over a network create a socket each. These are similar to two ends of a pipe – but the actual pipe does not yet exist.

7.8 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.

7.8.1. Map Visualizations:

Charts showing values that relate to geographical areas. Some examples include: (a) population density in each state of the United States, (b) income per capita for each country in Europe, (c) life expectancy in each country of the world. The tasks in this project include: Sourcing freely redistributable vector outlines for the countries of the world, states/provinces in particular countries (USA in particular, but also other areas);

Creating an appropriate dataset interface (plus default implementation), a rendered, and integrating this with the existing XYPlot class in JFreeChart; Testing, documenting, testing some more, documenting some more.

7.8.2. Time Series Chart Interactivity

Implement a new (to JFreeChart) feature for interactive time series charts — to display a separate control that shows a small version of ALL the time series data, with a sliding “view” rectangle that allows you to select the subset of the time series data to display in the main chart.

7.8.3. Dashboards

There is currently a lot of interest in dashboard displays. Create a flexible dashboard mechanism that supports a subset of JFreeChart chart types (dials, pies, thermometers, bars, and lines/time series) that can be delivered easily via both Java Web Start and an applet.

7.8.4. Property Editors

The property editor mechanism in JFreeChart only handles a small subset of the properties that can be set for charts. Extend (or reimplement) this mechanism to provide greater end-user control over the appearance of the charts.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION

Behavioral characterization of malware is an effective alternative to pattern matching in detecting malware, especially when dealing with polymorphic or obfuscated malware. Naive Bayesian model has been successfully applied in non-DTN settings, such as filtering email spams and detecting botnets.

We propose a general behavioral characterization of DTN-based proximity malware. We present look ahead, along with dogmatic filtering and adaptive look ahead, to address two unique challenging in extending Bayesian filtering to DTNs: “insufficient evidence versus evidence collection risk” and “filtering false evidence sequentially and distributedly.” In prospect, extension of the behavioral characterization of proximity malware to account for strategic malware detection evasion with game theory is a challenging yet interesting future work.

CHAPTER 9

9.0 REFERENCES:

[1] C. Kolbitsch, P. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and Efficient Malware Detection at the End Host,” Proc. 18th Conf. USENIX Security Symp., 2009.

[2] U. Bayer, P. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, Behavior-Based Malware Clustering,” Proc. 16th Ann. Network and Distributed System Security Symp. (NDSS), 2009.

[3] G. Zyba, G. Voelker, M. Liljenstam, A. Me´hes, and P. Johansson, “Defending Mobile Phones from Proximity Malware,” Proc. IEEE INFOCOM, 2009.

[4] F. Li, Y. Yang, and J. Wu, “CPMC: An Efficient Proximity Malware Coping Scheme in Smartphone-Based Mobile Networks,” Proc. IEEE INFOCOM, 2010.

[6] J. Zdziarski, Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. No Starch Press, 2005.

[7] R. Villamarı´n-Salomo´n and J. Brustoloni, “Bayesian Bot Detection Based on DNS Traffic Similarity,” Proc. ACMymp. Applied Computing (SAC), 2013.

Automatic Scaling of Internet Applications for Cloud Computing Services

22/10/201902/07/2019 by admin

AUTOMATIC SCALING OF INTERNET APPLICATIONS FOR CLOUD

COMPUTING SERVICES

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

MASTER OF TECHNOLOGY

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

CERTIFICATE

Certified that this project report titled “Automatic Scaling of Internet Applications for Cloud Computing Services” is the bonafide work of Mr. _____________Who carried out the research under my supervision Certified further, that to the best of my knowledge the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide Signature of the H.O.D

Name Name

DECLARATION

I hereby declare that the project work entitled “Automatic Scaling of Internet Applications for Cloud Computing Services” Submitted to BHARATHIDASAN UNIVERSITY in partial fulfillment of the requirement for the award of the Degree of MASTER OF SCIENCE IN COMPUTER SCIENCE is a record of original work done by me the guidance of Prof.A.Vinayagam M.Sc., M.Phil., M.E., to the best of my knowledge, the work reported here is not a part of any other thesis or work on the basis of which a degree or award was conferred on an earlier occasion to me or any other candidate.

(Student Name)

(Reg.No)

Place:

Date:

ACKNOWLEDGEMENT

I am extremely glad to present my project “Automatic Scaling of Internet Applications for Cloud Computing Services” which is a part of my curriculum of third semester Master of Science in Computer science. I take this opportunity to express my sincere gratitude to those who helped me in bringing out this project work.

I would like to express my Director, Dr. K. ANANDAN, M.A.(Eco.), M.Ed., M.Phil.,(Edn.), PGDCA., CGT., M.A.(Psy.) of who had given me an opportunity to undertake this project.

I am highly indebted to Co-Ordinator Prof. Muniappan Department of Physics and thank from my deep heart for her valuable comments I received through my project.

I wish to express my deep sense of gratitude to my guide
Prof. A.Vinayagam M.Sc., M.Phil., M.E., for her immense help and encouragement for successful completion of this project.

I also express my sincere thanks to the all the staff members of Computer science for their kind advice.

And last, but not the least, I express my deep gratitude to my parents and friends for their encouragement and support throughout the project.

CHAPTER 1

1.1 ABSTRACT:

Many Internet applications can benefit from an automatic scaling property where their resource usage can be scaled up and down automatically by the cloud service provider. We present a system that provides automatic scaling for Internet applications in the cloud environment. We encapsulate each application instance inside a virtual machine (VM) and use virtualization technology to provide fault isolation. We model it as the Class Constrained Bin Packing (CCBP) problem where each server is a bin and each class represents an application. The class constraint reflects the practical limit on the number of applications a server can run simultaneously.

We develop an efficient semi-online color set algorithm that achieves good demand satisfaction ratio and saves energy by reducing the number of servers used when the load is low. Experiment results demonstrate that our system can improve the throughput an open source implementation of restore the normal QoS five times as fast during flash crowds. Large scale simulations demonstrate that our algorithm is extremely scalable applications. This is an order of magnitude improvement over traditional application placement algorithms in enterprise environments.

1.2 INTRODUCTION

One of the often cited benefits of cloud computing service is the resource elasticity: a business customer can scale up and down its resource usage as needed without upfront capital investment or long term commitment. The Amazon EC2 service, for example, allows users to buy as many virtual machine (VM) instances as they want and operate them much like physical hardware. However, the users still need to decide how much resources are necessary and forhow long. We believe many Internet applications can benefit from an auto scaling property where their resource usage can be scaled up and down automatically by the cloud service provider. A user only needs to upload the application onto a single server in the cloud, and the cloud service will replicate the application onto more or fewer servers as its demand comes and goes. The users are charged only for what they actually use—the so-called “pay as you go” model.

The typical architecture of data center servers for Internet applications it consists of a load balancing switch, a set of application servers, and a set of backend storage servers. The front end switch is typically a Layer 7 switch which parses application level information in Web requests and forwards them to the servers with the corresponding applications running. The switch sometimes runs in a redundant pair for fault tolerance. Each application can run on multiple server machines and the set of their running instances are often managed by some clustering software such as Web Logic.

Each server machine can host multiple applications. The applications store their state information in the backend storage servers. It is important that the applications themselves are stateless so that they can be replicated safely. The storage servers may also become overloaded, but the focus of this work is on the application tier. The Google AppEngine service, for example, requires that the applications be structured in such a two tier architecture and uses the BigTable as its scalable storage solution. A detailed comparison with AppEngine is deferred to Section 7 so that sufficient background can be established. Some distributed data processing applications cannot be mapped into such a tiered architecture easily and thus are not the target of this work. We believe our architecture is representative of a large set of Internet services hosted in the cloud computing environment.

Even though the cloud computing model is sometimes advocated as providing infinite capacity on demand, the capacity of data centers in the real world is finite. The illusion of infinite capacity in the cloud is provided through statistical multiplexing. When a large number of applications experience their peak demand around the same time, the available resources in the cloud can become constrained and some of the demand may not be satisfied. We define the demand satisfaction ratio as the percentage of application demand that is satisfied successfully. The amount of computing capacity available to an application is limited by the placement of its running instances on the servers.

The more instances an application has and the more powerful the underlying servers are, the higher the potential capacity for satisfying the application demand. On the other hand, when the demand of the applications is low, it is important to conserve energy by reducing the number of servers used. Various studies have found that the cost of electricity is a major portion of the operation cost of large data centers. At the same time, the average server utilization in many Internet data centers is very low: real world estimates range from 5% to 20%. Moreover, work has found that the most effective way to conserve energy is to turn the whole server off. The application placement problem is essential to achieving a high demand satisfaction ratio without wasting energy.

In this paper, we present a system that provides automatic scaling for Internet applications in the cloud environment. Our contributions include the following.

We summarize the automatic scaling problem in the cloud environment, and model it as a modified Class Constrained Bin Packing (CCBP) problem where each server is a bin and each class represents an application. We develop an innovative auto scaling algorithm to solve the problem and present a rigorous analysis on the quality of it with provable bounds. Compared to the existing Bin Packing solutions, we creatively support item departure which can effectively avoid the frequent placement changes1 caused by repacking.

We support green computing by adjusting the placement of application instances adaptively and putting idle machines into the standby mode. Experiments and simulations show that our algorithm is highly efficient and scalable which can achieve high demand satisfaction ratio, low placement change frequency, short request response time, and good energy saving.

We build a real cloud computing system which supports our auto scaling algorithm. We compare the performance of our system with an open source implementation of the Amazon EC2 auto scaling system in a testbed of 30 Dell Power Edge blade servers. Experiments show that our system can restore the normal QoS five times as fast when a flash crowd happens.

We use a fast restart technique based on virtual machine (VM) suspend and resume that reduces the application start up time dramatically for Internet services.

1.3 LITRATURE SURVEY

AUTHOR AND PUBLICATION: C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici, “A SCALABLE APPLICATION PLACEMENT CONTROLLER FOR ENTERPRISE DATA CENTERS,” in Proc. Int. World Wide Web Conf. (WWW’07), May 2007, pp. 331–340.

EXPLANATION:

Given a set of machines and a set of Web applications with dynamically changing demands, an online application placement controller decides how many instances to run for each application and where to put them, while observing all kinds of resource constraints. This NP hard problem has real usage in commercial middleware products. Existing approximation algorithms for this problem can scale to at most a few hundred machines, and may produce placement solutions that are far from optimal when system resources are tight. In this paper, we propose a new algorithm that can produce within 30seconds high-quality solutions for hard placement problems with thousands of machines and thousands of applications. This scalability is crucial for dynamic resource provisioning in large-scale enterprise data centers. Our algorithm allows multiple applications to share a single machine, and strivesto maximize the total satisfied application demand, to minimize the number of application starts and stops, and to balance the load across machines. Compared with existing state-of-the-art algorithms, for systems with 100 machines or less, our algorithm is up to 134 times faster, reduces application starts and stops by up to 97%, and produces placement solutions that satisfy up to 25% more application demands. Our algorithm has been implemented and adopted in a leading commercial middleware product for managing the performance of Web applications.

AUTHOR AND PUBLICATION: C. Adam and R. Stadler, “SERVICE MIDDLEWARE FOR SELF-MANAGING LARGE-SCALE SYSTEMS,” IEEE Trans. Netw. Serv. Manage., vol. 4, no. 3, pp. 50–64, Dec. 2007.

EXPLANATION:

Resource management poses particular challenges in large-scale systems, such as server clusters that simultaneously process requests from a large number of clients. A resource management scheme for such systems must scale both in the in the number of cluster nodes and the number of applications the cluster supports. Current solutions do not exhibit both of these properties at the same time. Many are centralized, which limits their scalability in terms of the number of nodes, or they are decentralized but rely on replicated directories, which also reduces their ability to scale. In this paper, we propose novel solutions to request routing and application placement- two key mechanisms in a scalable resource management scheme.

Our solution to request routing is based on selective update propagation, which ensures that the control load on a cluster node is independent of the system size. Application placement is approached in a decentralized manner, by using a distributed algorithm that maximizes resource utilization and allows for service differentiation under overload. The paper demonstrates how the above solutions can be integrated into an overall design for a peer-to-peer management middleware that exhibits properties of self-organization. Through complexity analysis and simulation, we show to which extent the system design is scalable. We have built a prototype using accepted technologies and have evaluated it using a standard benchmark. The testbed measurements show that the implementation, within the parameter range tested, operates efficiently, quickly adapts to a changing environment and allows for effective service differentiation by a system administrator.

AUTHOR AND PUBLICATION: J. Famaey, W. D. Cock, T. Wauters, F. D. Turck, B. Dhoedt, and P. Demeester, “A LATENCY-AWARE ALGORITHM FOR DYNAMIC SERVICE PLACEMENT IN LARGE-SCALE OVERLAYS,” in Proc. IFIP/IEEE Int. Conf. Symp. Integrat. Netw. Manage. (IM’09), 2009, pp. 414–421.

EXPLANATION:

A generic and self-managing service hosting infrastructure, provides a means to offer a large variety of services to users across the Internet. Such an infrastructure provides mechanisms to automatically allocate resources to services, discover the location of these services, and route client requests to a suitable service instance. In this paper we propose a dynamic and latency-aware algorithm for assigning resources to services. Additionally, the proposed service hosting architecture and its protocols to support the service placement algorithm, are described in detail. Extensive simulations were performed to compare the solution of our latency-aware algorithm to the latency-unaware variant, in terms of system efficiency and scalability.

AUTHOR AND PUBLICATION: E. Caron, L. Rodero-Merino, F. Desprez, and A.Muresan, “AUTOSCALING, LOAD BALANCING AND MONITORING IN COMMERCIAL AND OPENSOURCE CLOUDS,” INRIA, Rapport de recherche RR-7857, Feb. 2012.

EXPLANATION:

Now – a – days because of increased use of internet the associate resource are increasing rapidly resulting generation of high work load. To provide the reliable service to client with QOS the load balancing mechanism is necessary in cloud environment, to prevent system from overloading and crash an autoscaling mechanism must also be provided according to the application and incoming user traffic. load balancing mechanism provides the distribution of load among one or more nodes of cloud system, for efficient service model autoscaling feature also enabled with the load balancer to handle the excess load. Auto scaling scaled-up and scaled down the platform dynamically according to the clients incoming traffic this save money and physical resources. Latency based routing is the new concept in cloud computing which provide the load balancing based on DNS latency to global client by mapping domain name system (DNS) through the different hosted zone .provide the load balancing based on geographical service region. To achieve above mentioned we use the public cloud services such as amazons’EC2. ELB. This research is divided in four part such as: i) load balancing ii) auto scaling iii)latency based routing iv) resource monitoring . while discussing each topic in detailed we will implement the individual service and test while providing load from external software tool putty we will produce result for efficient load balancing.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Existing algorithms for the online and offline class-constrained bin packing problem is motivated by applications in the data-placement problem to video-on-demand servers and applications in the cutting and packing area. For the online problem we provide lower bounds for any bounded space algorithm and we also present an algorithm for the unbounded version with approximation factor low value.

For the offline problem we present practical approximation algorithms for two special cases of the problem, with conditions already considered in the literature: when all items have the same size and the parameterized version of the problem. We also perform several tests with these practical algorithms. For the instances we considered representing practical ones, the algorithms optimal solutions an CCBP for the special case where the number of different classes of the input instance is bounded by a constant.

Therefore, in order to solve our problem, we modified the CCBP model to support the “Minimize the placement change frequency” goal and provide a new enhanced semionline approximation algorithm to solve it in the next section. Note that the equations above are just a formal presentation of the goals and constraints of our problem.

2.1.1 DISADVANTAGES:

Automatic scaling problem in the cloud environment, and model it as a modified Class Constrained Bin Packing (CCBP) problem where each server is a bin and each class represents an application.

In the traditional bin packing problem, a series of items of different sizes need to be packed into a minimum number of bins. The class constrained version of this problem divides the items into classes or colors.

Each bin has capacity v and can accommodate items from at most c distinct classes. It is “class constrained” because the class diversity of items packed into the same bin is constrained. The goal is to pack the items into a minimum number of bins.

Existing algorithm is the support for item departure which is essential to maintaining not performance in a cloud computing environment where the resource demands of Internet applications can vary dynamically.

2.2 PROPOSED SYSTEM:

We develop an efficient semi-online color set algorithm that achieves good demand satisfaction ratio and saves energy by reducing the number of servers used each class of items with a color and organize them into color sets as they arrive in the input sequence. The number of distinct colors in a color set is at most c (i.e., the maximum number of distinct classes in a bin). This ensures that items in a color set can always be packed into the same bin without violating the class constraint. The packing is still subject to the capacity constraint of the bin. All color sets contain exactly c colors except the last one which may contain fewer colors. Items from different color sets are packed independently.

A greedy algorithm is used to pack items within each color set: the items are packed into the current bin until the capacity is reached. Then the next bin is opened for packing. Thus each color set has at most one unfilled (i.e., non-full) bin. Note that a full bin may contain fewer than c colors. When a new item from a specific color set arrives, it is packed into the corresponding unfilled bin. If all bins of that color set are full, then a new bin is opened to accommodate the item. The load increase of an application is modeled as the arrival of items with the corresponding color. A naive algorithm is to always pack the item into the unfilled bin if there is one. If the unfilled bin does not contain that color already, then a new color is added into the bin.

We allocate the new colors to the unfilled sets first using the following add_new_colors procedure.

Procedure add_new_colors:

Sort the list of unfilled color sets in descending order of their cardinality. Use a greedy algorithm to add the new colors into those sets according to their positions in the list.

If we run out of the new colors before filling up all but the last unfilled sets, use the consolidate_unfilled_sets procedure below to consolidate the remaining unfilled sets until there is only one left.

If there are still new colors left after filling up all unfilled sets in the system, we partition the remaining new colors into additional color sets using a greedy algorithm.

The consolidate_unfilled_sets procedure below consolidates unfilled sets in the system until there is only one left.

Procedure consolidate_unfilled_sets:

Sort the list of unfilled color sets in descending order of their cardinality Use the last set in the list (with the fewest colors) to fill the first set in the list (with the most colors) through the fill procedure below. Remove the resulting full set or empty set from the list.

2.2.1 ADVANTAGES:

We support green computing by adjusting the placement of application instances adaptively and putting idle machines into the standby mode. Experiments and simulations show that our algorithm is highly efficient and scalable which can achieve high demand satisfaction ratio, low placement change frequency, short request response time, and good energy saving.

We build a real cloud computing system which supports our auto scaling algorithm. We compare the performance of our system with an open source implementation of the internet auto scaling system in a testbed of 30 Dell PowerEdge blade servers.

Experiments show that our system can restore the normal QoS five times as fast when a flash crowd happens. We use a fast restart technique based on virtual machine (VM) suspend and resume that reduces the application start up time dramatically for Internet services.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

2.3.2 SOFTWARE REQUIREMENTS:

Operating System : Windows XP or Win7
Front End : JAVA JDK 1.7
Back End : MYSQL Server
Server : Apache Tomact Server
Script : JSP Script
Document : MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.

DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.

DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data’s in the physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

All processes must have at least one data flow in and one data flow out.
All processes should modify the incoming data, producing new forms of outgoing data.
Each data store must be involved with at least one data flow.
Each external entity must be involved with at least one data flow.
A data flow must be attached to at least one process.

3.1 ARCHITECTURE DIAGRAM

3.2 DATAFLOW DIAGRAM

ADMIN:

USER:

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

ADMIN:

USER:

3.3 CLASS DIAGRAM:

3.4 SEQUENCE DIAGRAM:

ADMIN:

USER:

3.5 ACTIVITY DIAGRAM:

ADMIN:

USER:

CHAPTER 4

4.0 IMPLEMENTATION:

4.1 ALGORITHM:

Our algorithm belongs to the family of color set algorithms with significant modification to adapt to our problem. A detailed comparison with the existing algorithm is deferred to Section 7 so that sufficient background can be established. We label each class of items with a color and organize them into color sets as they arrive in the input sequence. The number of distinct colors in a color set is at most c (i.e., the maximum number of distinct classes in a bin). This ensures that items in a color set can always be packed into the same bin without violating the class constraint. The packing is still subject to the capacity constraint of the bin. All color sets contain exactly c colors except the last one which may contain fewer colors.

Items from different color sets are packed independently. A greedy algorithm is used to pack items within each color set: the items are packed into the current bin until the capacity is reached. Then the next bin is opened for packing. Thus each color set has at most one unfilled (i.e., non-full) bin. Note that a full bin may contain fewer than c colors. When a new item from a specific color set arrives, it is packed into the corresponding unfilled bin. If all bins of that color set are full, then a new bin is opened to accommodate the item.

Our basic idea is to fill up the unfilled sets (except the last one) while minimizing its impact on the existing color assignment. We first check if there are any pending requests to add new colors into the system. If there are, we allocate the new colors to the unfilled sets first using the following add_new_colors procedure.

Procedure add_new_colors:

Sort the list of unfilled color sets in descending order of their cardinality. Use a greedy algorithm to add the new colors into those sets according to their positions in the list. If we run out of the new colors before filling up all but the last unfilled sets, use the consolidate_unfilled_sets procedure below to consolidate the remaining unfilled sets until there is only one left.

If there are still new colors left after filling up all unfilled sets in the system, we partition the remaining new colors into additional color sets using a greedy algorithm. The consolidate_unfilled_sets procedure below consolidates unfilled sets in the system until there is only one left.

Procedure consolidate_unfilled_sets:

Remove the resulting full set or empty set from the list.

Repeat the previous step until there is only one unfilled set left in the list. The fill( , ) procedure below uses the colors in set to fill the set . Procedure fill( , ):

Sort the list of colors in in ascending order of their numbers of items.

Addthe first color in the list (with the fewest items) into. Use “item departure” operation in and “item arrival” operation in to move all items of that color from to . Then remove that color from the list. Repeat the above step until either becomes empty or becomes full.

4.2 MODULES:

USER MODULES:

LOCAL NODE MANAGER (LNM):

APPLICATION LOAD INCREASE:

APPLICATION LOAD DECREASE:

AUTOMATIC SCALING RESOURCES:

APPROXIMATION RATIO:

4.3 MODULE DESCRIPTION:

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

5.1.2 TECHNICAL FEASIBILITY

5.1.3 SOCIAL FEASIBILITY:

5.2 SYSTEM TESTING:

This creates two problems, the time lag between the cause and the appearance of the problem and the effect of the system errors on the files and records within the system. A small system error can conceivably explode into a much larger Problem. Effective testing early in the purpose translates directly into long term cost savings from a reduced number of errors. Another reason for system testing is its utility, as a user-oriented vehicle before implementation. The best programs are worthless if it produces the correct outputs.

5.2.1 UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.2 FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 3 NON-FUNCTIONAL TESTING:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.4 LOAD TESTING:

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

The Java Programming Language

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

Simple
- Architecture neutral
- Object oriented
- Portable
- Distributed
- High performance
- Interpreted
- Multithreaded
- Robust
- Dynamic
- Secure

6.2 THE JAVA PLATFORM:

The Java platform has two components:

The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
Applets: The set of conventions used by applets.
Networking: URLs, TCP (Transmission Control Protocol), UDP (User Data gram Protocol) sockets, and IP (Internet Protocol) addresses.
Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
Software components: Known as JavaBeans^TM, can plug into existing component architectures.
Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
Java Database Connectivity (JDBC^TM): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
Develop programs more quickly: Your development time may be as much as twice as fast versus writing the same program in C++. Why? You write fewer lines of code and it is a simpler programming language than C++.
Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java^TMProduct Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

6.5 ODBC:

6.6 JDBC:

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

6.7 JDBC Goals:

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

SQL Conformance

JDBC must be implemental on top of common database interfaces

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

Use strong, static typing wherever possible

Strong typing allows for more error checking to be done at compile time; also, less error appear at runtime.

Keep the common cases simple

Finally we decided to precede the implementation using Java Networking.

And for dynamically updating the cache table we go for MS Access database.

Java ha two things: a programming language and a platform.

Java is a high-level programming language that is all of the following

Simple Architecture-neutral

Object-oriented Portable

Distributed High-performance

Interpreted Multithreaded

Robust Dynamic Secure

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

6.7 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.

Port addresses

Sockets:

#include <sys/types.h>

#include <sys/socket.h>

int socket(int family, int type, int protocol);

6.8 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.

6.8.1. Map Visualizations:

6.8.2. Time Series Chart Interactivity

6.8.3. Dashboards

6.8.4. Property Editors

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION

We presented the design and implementation of a system that can scale up and down the number of application instances automatically based on demand. We developed a color set algorithm to decide the application placement and the load distribution. Our system achieves high satisfaction ratio of application demand even when the load is very high. It saves energy by reducing the number of running instances when the load is low.

There are several directions for future work. Some cloud service providers may provide multiple levels of services to their customers. When the resources become tight, they may want to give their premium customers a higher demand satisfaction ratio than other customers. In the future, we plan to extend our system to support differentiated services but also consider fairness when allocating the resources across the applications. We mentioned in the paper that we can divide multiple generations of hardware in a data center into “equivalence classes” and run our algorithm within each class.

Our future work is to develop an efficient algorithm to distribute incoming requests among the set of equivalence classes and to balance the load across those server clusters adaptively. As analyzed in the paper, CCBP works well when the aggregate load of applications in a color set is high. Another direction for future work is to extend the algorithm to pack applications with complementary bottleneck resources together, e.g., to co-locate a CPU intensive application with a memory intensive one so that different dimensions of server resources can be adequately utilized.

CHAPTER 9

9.1 REFERENCES

C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici, “A scalable application placement controller for enterprise data centers,” in Proc. Int. World Wide Web Conf. (WWW’07), May 2007, pp. 331–340.

C. Adam and R. Stadler, “Service middleware for self-managing large-scale systems,” IEEE Trans. Netw. Serv. Manage., vol. 4, no. 3, pp. 50–64, Dec. 2007.

J. Famaey, W. D. Cock, T. Wauters, F. D. Turck, B. Dhoedt, and P. Demeester, “A latency-aware algorithm for dynamic service placement in large-scale overlays,” in Proc. IFIP/IEEE Int. Conf. Symp. Integrat. Netw. Manage. (IM’09), 2009, pp. 414–421.

E. Caron, L. Rodero-Merino, F. Desprez, and A.Muresan, “Autoscaling, load balancing and monitoring in commercial and opensource clouds,” INRIA, Rapport de recherche RR-7857, Feb. 2012.

A. Karve, T. Kimbrel, G. Pacifici, M. Spreitzer, M. Steinder, M. Sviridenko, and A. Tantawi, “Dynamic placement for clustered web applications,” in Proc. Int. World Wide Web Conf. (WWW’06), May 2006, pp. 595–604.

D. Magenheimer, “Transcendent memory: A new approach to managingRAMin a virtualized environment,” in Proc. Linux Symp., 2009, pp. 191–200.

E. C. Xavier and F. K. Miyazawa, “The class constrained bin packing problem with applications to video-on-demand,” Theor. Comput.Sci., vol. 393, no. 1–3, pp. 240–259, 2008.

A Study on False Channel Condition Reporting Attacks in Wireless Networks

22/10/201902/07/2019 by admin

A STUDY ON FALSE CHANNEL CONDITION REPORTING ATTACKS IN WIRELESS NETWORKS

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

MASTER OF TECHNOLOGY

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

BONAFIDE CERTIFICATE

Certified that this project report titled “A STUDY ON FALSE CHANNEL CONDITION REPORTING ATTACKS IN WIRELESS NETWORKS” is the bonafide work of Mr. _____________Who carried out the research under my supervision Certified further, that to the best of my knowledge the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide Signature of the H.O.D

Name Name

CHAPTER 1

ABSTRACT:

Wireless networking protocols are increasingly being designed to exploit a user’s measured channel condition; we call such protocols channel-aware. Each user reports the measured channel condition to a manager of wireless resources and a channel-aware protocol uses these reports to determine how resources are allocated to users. In a channel-aware protocol, each user’s reported channel condition affects the performance of every other user. The deployment of channel-aware protocols increases the risks posed by false channel-condition feedback.

We study what happens in the presence of an attacker that falsely reports its channel condition. We perform case studies on channel-aware network protocols to understand how an attack can use false feedback and how much the attack can affect network performance. The results of the case studies show that we need a secure channel condition estimation algorithm to fundamentally defend against the channel-condition misreporting attack. We design such an algorithm and evaluate our algorithm through analysis and simulation. Our evaluation quantifies the effect of our algorithm on system performance as well as the security and the performance of our algorithm.

INTRODUCTION

Many protocols in modern wireless networks treat a link’s channel condition information as a protocol input parameter; we call such protocols channel-aware. Examples include cooperative relaying network architectures, efficient ad hoc network routing metrics, and opportunistic schedulers. While work on channel-aware protocols has mainly focused on how channel condition information can be used to more efficiently utilize wireless resources, security aspects of channel-aware protocols have only recently been studied. These works on security of channel-aware protocols revealed new threats in specific network environments by simulation or measurement.

However, understanding the effect of possible attacks across varied network environments is still an open area for study. In particular, we consider the effect of a user equipment’s reporting false channel condition. This issue is partially addressed in the work of Racic et al. in a limited network setting. They consider a particular scheduler in a cellular network with handover process and propose a secure handover algorithm. In contrast, we reveal the possible effects of false channel condition reporting in various channel-aware network protocols and propose a primitive defense mechanism that provides secure channel condition estimation.

Our contributions are:

• We analyze specific attack mechanisms and evaluate the effects of misreporting channel condition on various channel-aware wireless network protocols including cooperative relaying protocols, routing metrics in wireless ad-hoc network and opportunistic schedulers.

• We propose a secure channel condition estimation algorithm that can be used to construct a secure channel-aware protocol in single-hop settings.

• We analyze our algorithm in the respects of performance and security, and we perform a simulation study to understand the impact of our algorithm on system performance.

The false channel condition reporting attack that we introduce in this paper is difficult to identify by existing mechanisms, since our attack is mostly protocol compliant; only the channel-condition measurement mechanism need to be modified. Our attack can thus be performed using modified user equipment legitimately registered to a network.

1.3 LITRATURE SURVEY

EXPLOITING AND DEFENDING OPPORTUNISTIC SCHEDULING IN CELLULAR DATA NETWORKS

PUBLICATION: R. Racic, D. Ma, H. Chen, and X. Liu, IEEE Trans. Mobile Comput., vol. 9, no. 5, pp. 609–620, May 2010.

Third Generation (3G) cellular networks take advantage of time-varying and location-dependent channel conditions of mobile users to provide broadband services. Under fairness and QoS constraints, they use opportunistic scheduling to efficiently utilize the available spectrum. Opportunistic scheduling algorithms rely on the collaboration among all mobile users to achieve their design objectives. However, we demonstrate that rogue cellular devices can exploit vulnerabilities in popular opportunistic scheduling algorithms, such as Proportional Fair (PF) and Temporal Fair (TF), to usurp the majority of time slots in 3G networks. Our simulations show that under realistic conditions, only five rogue device per 50-user cell can capture up to 95 percent of the time slots, and can cause 2-second end-to-end interpacket transmission delay on VoIP applications for every user in the same cell, rendering VoIP applications useless. To defend against this attack, we propose strengthening the PF and TF schedulers and a robust handoff scheme.

ON THE VULNERABILITY OF THE PROPORTIONAL FAIRNESS SCHEDULER TO RETRANSMISSION ATTACKS

PUBLICATION: U. Ben-Porat, A. Bremler-Barr, H. Levy, and B. Plattner, in Proc. IEEE INFOCOM, Shanghai, China, Apr. 2011, pp. 1431–1439.

Channel aware schedulers of modern wireless networks – such as the popular Proportional Fairness Scheduler (PFS) – improve throughput performance by exploiting channel fluctuations while maintaining fairness among the users. In order to simplify the analysis, PFS was introduced and vastly investigated in a model where frame losses do not occur, which is of course not the case in practical wireless networks. Recent studies focused on the efficiency of various implementations of PFS in a realistic model where frame losses can occur. In this work we show that the common straight forward adaptation of PFS to frame losses exposes the system to a malicious attack (which can alternatively be caused by malfunctioning user equipment) that can drastically degrade the performance of innocent users. We analyze the factors behind the vulnerability of the system and propose a modification of PFS designed for the frame loss model which is resilient to such malicious attack while maintaining the fairness properties of original PFS.

A MEASUREMENT STUDY OF SCHEDULER-BASED ATTACKS IN 3G WIRELESS NETWORKS

PUBLICATION: S. Bali, S. Machiraju, H. Zang, and V. Frost, in Proc. PAM, Berlin, Germany, 2007.

Though high-speed (3G) wide-area wireless networks have been rapidly proliferating, little is known about the robustness and security properties of these networks. In this paper, we make initial steps towards understanding these properties by studying Proportional Fair (PF), the scheduling algorithm used on the downlinks of these networks. We find that the fairness-ensuring mechanism of PF can be easily corrupted by a malicious user to monopolize the wireless channel thereby starving other users. Using extensive experiments on commercial and laboratory-based CDMA networks, we demonstrate this vulnerability and quantify the resulting performance impact. We find that delay jitter can be increased by up to 1 second and TCP throughput can be reduced by as much as 25−30 % by a single malicious user. Based on our results, we argue for the need to use a more robust scheduling algorithm and outline one such algorithm. 1

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Many protocols in modern wireless networks treat a link’s channel condition information as a protocol input parameter; we call such protocols channel-aware. Examples include cooperative relaying network architectures, efficient ad hoc network routing metrics, and opportunistic schedulers. While work on channel-aware protocols has mainly focused on how channel condition information can be used to more efficiently utilize wireless resources, security aspects of channel-aware protocols have only recently been studied. These works on security of channel-aware protocols revealed new threats in specific network environments by simulation or measurement. However, under-standing the effect of possible attacks across varied network environments is still an open area for study.

2.1.1 DISADVANTAGES:

Difficult to guarantee QoS in MANETs due to their unique features including user mobility, channel variance errors, and limited bandwidth.
Although these protocols can increase the QoS of the MANETs to a certain extent, they suffer from invalid reservation and race condition problems.

2.2 PROPOSED SYSTEM:

We introduce our attack concept and perform case studies to quantize the attack effects on specific channel-aware network protocols. Depending on deployed PHY-layer technologies (e.g. OFDM), a system can utilize conditions for subchannels to perform more efficient frequency-selective scheduling. Our work can apply for this case by handling each subchannel condition information separately. However, for clarity of presentation, we consider a single channel between network participants in this paper.

We can easily implement false channel condition reporting attack by modifying only a subcomponent to report channel condition. This subcomponent of user equipment can be implemented in hardware or software. One recent trend of user equipment implementation is to increasingly move hardware part to software part for adaptable configuration of a general hardware. The increasing software control of user equipment makes false channel condition reporting attack an increasingly practical attack.

2.2.1 ADVANTAGES:

We analyze specific attack mechanisms and evaluate the effects of misreporting channel condition on various channel-aware wireless network protocols including cooperative relaying protocols, routing metrics in wireless ad-hoc network and opportunistic schedulers.

We propose a secure channel condition estimation algorithm that can be used to construct a secure channel-aware protocol in single-hop settings.

We analyze our algorithm in the respects of performance and security, and we perform a simulation study to understand the impact of our algorithm on system performance.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

2.3.2 SOFTWARE REQUIREMENTS:

Operating System : Windows XP or Win 7
Front End : Java JDK 1.7
Back End : MS-ACCESS
Tools : Netbeans IDE 7
Document : MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.

DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.

DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

All processes must have at least one data flow in and one data flow out.
All processes should modify the incoming data, producing new forms of outgoing data.
Each data store must be involved with at least one data flow.
Each external entity must be involved with at least one data flow.
A data flow must be attached to at least one process.

3.1 ARCHITECTURE DIAGRAM:

3.2 DATAFLOW DIAGRAM:

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

3.3 CLASS DIAGRAM:

3.4 SEQUENCE DIAGRAM:

3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

We performed a simulation study to evaluate the overclaiming attack’s effect on the normal users’ performance in a cooperative relaying environment. We quantifyWe use the ns-2 simulator patched with EURANE UMTS system simulator. Our simulated network consists of one base station serving four users. The base station sends 11Mbps of Constant Bit Rate (CBR) traffic to each user. There is one attacker who may falsely report its channel condition to the base station. We represent channel condition using the Channel Quality Indicator (CQI) defined in the 3GPP standard.

The same equation and block size are used for a case study of opportunistic scheduler presented in Section 2.2.3. We assume that one victim is close to the attacker so that when the victim experiences poor channel condition, the victim can use the attacker as a relaying node. The other two normal users get packets directly from the base station and do not participate in the cooperative relaying protocol. The victim and the two normal users honestly report their channel condition. The channel for the attacker and two normal users uses a shadowing plus Rayleigh model of a moving node 100m away from the base station with velocity of 3km/h. We vary the victim’s distance to the base station from 100m to 500m to see the effect of the victim’s channel condition on the performance degradation.

The attacker’s goal in this simulation is to reduce the victim’s throughput. The attacker can adopt two approaches. In the conservative approach, the attacker does not forward packets for the victim without falsely reporting its channel condition. In the aggressive approach, the attacker overclaims its channel condition so that the attacker can increase its probability of relaying packets for the victim.

Our simulations do not consider the overhead that an actual relaying protocol might incur in finding a new relaying node due to channel condition variation since such overhead is not related to the effect of attack. We assume that each transmission uses an orthogonal carrier so that transmissions do not interfere with each other.

Our simulations do not implement a relay discovery protocol; rather, we compare the attacker and victim CQI, and use the link with better CQI value to transmit to the victim. This relaying scheme is an example; a system operator may choose a different scheme. However, the scheme that we chose is good for exploiting increased diversity to optimize throughput. We ran each of our simulations for 100 simulated seconds.

4.1 ALGORITHM

In this section, we evaluate the performance and the security of our algorithm. Firstly, we analyze the performance of our algorithm according to algorithm parameters. This analysis can be used for parameter design guidelines. This analysis result is compared to simulation results. Secondly, we analyze the security of our algorithm. In this analysis, we show how much a brute-force attacker can be successful in guessing the value included in a challenge according to algorithm parameters. This analysis can be used for understanding tradeoff between security and system overhead. Thirdly, we integrate our algorithm into a network simulator and evaluate the effect of our algorithm on the system performance. We show that our algorithm securely and effectively estimates channel condition through most of its parameter space.

We used identical parameters, except we replaced the static channel with various variable channel models. We measure the throughput of normal users under scheduling policies of MAX-SINR, PF and MAX-SINR with our algorithm. In MAX-SINR with our algorithm, a base station does not use reported CQI-level to determine a user with the best channel condition in a give time slot. Instead, the base station uses CQI-level estimated by our algorithm. In the case study of opportunistic scheduler, our observation was that PF scheduler prevented attackers from stealing throughput. Hence, we concluded that PF was a good candidate for defending against false channel condition reporting attack. However, our simulation results for the system performance show that MAX-SINR with our algorithm can achieve higher throughput than PF scheduler in most cases. The fact that the performance of our algorithm depends on channel characteristic affects the throughput of normal users in case of MAX-SINR with our algorithm.

4.2 MODULE DESCRIPTION:

SERVER CLIENT MODULE:

NETWORK SECURITY:

ATTACK MODEL:

COOPERATIVE RELAYING:

EFFICIENT ROUTING METRICS:

PERFORMANCE ANALYSIS

4.3 MODULES DESCRIPTION:

SERVER CLIENT MODULE:

Client-server computing or networking is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters, called clients. Often clients and servers operate over a computer network on separate hardware. A server machine is a high-performance host that is running one or more server programs which share its resources with clients. A client also shares any of its resources; Clients therefore initiate communication sessions with servers which await (listen to) incoming requests.

NETWORK SECURITY:

Network-accessible resources may be deployed in a network as surveillance and early-warning tools, as the detection of attackers are not normally accessed for legitimate purposes. Techniques used by the attackers that attempt to compromise these decoy resources are studied during and after an attack to keep an eye on new exploitation techniques. Such analysis may be used to further tighten security of the actual network being protected by the data’s. Data forwarding can also direct an attacker’s attention away from legitimate servers. A user encourages attackers to spend their time and energy on the decoy server while distracting their attention from the data on the real server. Similar to a server, a user is a network set up with intentional vulnerabilities. Its purpose is also to invite attacks so that the attacker’s methods can be studied and that information can be used to increase network security.

ATTACK MODEL:

We introduce our attack concept and perform case studies to quantize the attack effects on specific channel-aware network protocols. We evaluate the effect of falsely reported channel condition under three types of channel-aware protocols: cooperative relaying protocols in hybrid networks, efficient routing metrics in wireless ad hoc networks, and opportunistic schedulers in high-speed wireless networks. For each protocol, we suggest possible attack mechanisms and quantify the effectiveness of the attack. We show that we can defend against some attacks using existing algorithms, and that other attacks require new security mechanisms. Each following case study has the same presentation format. First, we briefly explain the protocols. Then, we discuss effective attack scenarios for each protocol. Finally, we use simulations to evaluate the effect of each attack scenario.

COOPERATIVE RELAYING:

In a mobile wireless network, mobile nodes can experience different channel conditions depending on their different locations. When a node experiences a channel condition that is too poor to receive packets from a source node, a third node may have a good channel condition to both the source and the intended destination. Cooperative relaying network architectures help a node that has poor channel condition to route its packet through a node with a good channel condition, thus improving system throughput.

A cooperative relaying protocol must distribute channel condition information for each candidate path, find the most appropriate relay path, and provide incentives to motivate nodes to forward packets for other nodes. Specifically, in UCAN user equipment has two wireless adaptors, one High Data Rate (HDR) cellular interface and one IEEE 802.11 interface. The HDR interface is used for communication with a base station and the IEEE 802.11 interface is used for peer-to-peer communication with other user equipment in a network.

EFFICIENT ROUTING METRICS:

Routing Protocols in Ad Hoc Networks a wireless ad hoc network supports communication between nodes without need for centralized infrastructure such as base stations or access points. To deliver packets to destinations out of a source node’s transmission range, the source employs the help of intermediate nodes to forward each packet to its destination. Routing protocols in wireless ad hoc network discover routes between nodes. When there are multiple valid routes from a source to a destination, a routing protocol needs to choose among valid routes. A routing metric is a value associated to a route and represents the desirability of a route. A typical metric in the seminal routing protocols is minimum hop count. The rationale behind the metric of minimum hop count is that a route with fewer hops allows a packet to be delivered with the smaller number of transmissions.

PERFORMANCE ANALYSIS:

Our analysis assumes that the channel condition does not change. Though this assumption does not hold in a mobile environment, the purpose of our analysis is not to capture every detail of real world but to verify our simulator. For the evaluation in a realistic environment, we perform simulations with channel models considering variable channel conditions, as described in that the challenge size and Psref (i) are the same for different challenges for easy comparison of performance. The equations in our analysis do not assume the same values of challenge size and Psref (i). However, with different values of challenge size and Psref (i), it is not easy to understand the parameters’ effect on the performance. In this analysis, we assume that the challenges are authenticated.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

5.1.2 TECHNICAL FEASIBILITY

5.1.3 SOCIAL FEASIBILITY:

5.2 SYSTEM TESTING:

5.2.1 UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.2 FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 3 NON-FUNCTIONAL TESTING:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.4 LOAD TESTING:

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

The Java Programming Language

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

Simple
- Architecture neutral
- Object oriented
- Portable
- Distributed
- High performance
- Interpreted
- Multithreaded
- Robust
- Dynamic
- Secure

6.2 THE JAVA PLATFORM:

The Java platform has two components:

The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
Applets: The set of conventions used by applets.
Networking: URLs, TCP (Transmission Control Protocol), UDP (User Data gram Protocol) sockets, and IP (Internet Protocol) addresses.
Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
Software components: Known as JavaBeans^TM, can plug into existing component architectures.
Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
Java Database Connectivity (JDBC^TM): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
Develop programs more quickly: Your development time may be as much as twice as fast versus writing the same program in C++. Why? You write fewer lines of code and it is a simpler programming language than C++.
Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java^TMProduct Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

6.5 ODBC:

6.6 JDBC:

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

6.7 JDBC Goals:

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

SQL Conformance

JDBC must be implemental on top of common database interfaces

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

Use strong, static typing wherever possible

Strong typing allows for more error checking to be done at compile time; also, less error appear at runtime.

Keep the common cases simple

Finally we decided to precede the implementation using Java Networking.

And for dynamically updating the cache table we go for MS Access database.

Java ha two things: a programming language and a platform.

Java is a high-level programming language that is all of the following

Simple Architecture-neutral

Object-oriented Portable

Distributed High-performance

Interpreted Multithreaded

Robust Dynamic Secure

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

6.7 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.

Port addresses

Sockets:

#include <sys/types.h>

#include <sys/socket.h>

int socket(int family, int type, int protocol);

6.8 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.

6.8.1. Map Visualizations:

6.8.2. Time Series Chart Interactivity

6.8.3. Dashboards

6.8.4. Property Editors

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION

In this paper, we have studied the threat imposed by falsely reporting users’ channel condition. Through case studies for three different types of wireless network protocols, we show that in a cooperative relaying network and a network using ETX, a false reporting attack can significantly reduce the performance of other users. Our false channel-feedback attack can arise in any channel-aware protocol where a user reports its own channel condition. To counter such attacks, we propose a secure channel condition estimation algorithm to prevent the overclaiming attack. Through analysis and simulations, we show that with proper parameters, we can prevent the over claiming attack.

CHAPTER 9

9.1 REFERENCES

Dongho Kim and Yih-Chun Hu, “A Study on False Channel Condition Reporting Attacks in Wireless Networks”, IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 13, NO. 5, MAY 2014.

H. Luo, R. Ramjee, P. Sinha, L. E. Li, and S. Lu, “Ucan: A unified cellular and ad-hoc network architecture,” in Proc. ACM MobiCom, San Diego, CA, USA, 2003, pp. 353–367.

D. S. J. De Couto, D. Aguayo, J. Bicket, and R. Morris, “A highthroughput path metric for multi-hop wireless routing,” in Proc. ACM MobiCom, San Diego, CA, USA, 2003, pp. 134–146.

R. Draves, J. Padhye, and B. Zill, “Comparison of routing metrics for static multi-hop wireless networks,” in Proc. ACM SIGCOMM, Portland, OR, USA, 2004, pp. 133–144.

A. Jalali, R. Padovani, and R. Pankaj, “Data throughput of CDMA-HDR a high efficiency-high data rate personal communication wireless system,” in Proc. IEEE VTC, Tokyo, Japan, 2000, pp. 1854–1858.

P. Viswanath, D. N. C. Tse, and R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1277–1294, Jun. 2002.

R. Racic, D. Ma, H. Chen, and X. Liu, “Exploiting and defending opportunistic scheduling in cellular data networks,” IEEE Trans. Mobile Comput., vol. 9, no. 5, pp. 609–620, May 2010.

U. Ben-Porat, A. Bremler-Barr, H. Levy, and B. Plattner, “On the vulnerability of the proportional fairness scheduler to retransmission attacks,” in Proc. IEEE INFOCOM, Shanghai, China, Apr. 2011, pp. 1431–1439.

S. Bali, S. Machiraju, H. Zang, and V. Frost, “A measurement study of scheduler-based attacks in 3G wireless networks,” in Proc. PAM, Berlin, Germany, 2007.

A QoS-Oriented Distributed Routing Protocol for Hybrid Wireless Networks

22/10/201902/07/2019 by admin

A QOS-ORIENTED DISTRIBUTED ROUTING PROTOCOL FOR HYBRID

WIRELESS NETWORKS

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

MASTER OF TECHNOLOGY

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

BONAFIDE CERTIFICATE

Certified that this project report titled “A QOS-ORIENTED DISTRIBUTED ROUTING PROTOCOL FOR HYBRID WIRELESS NETWORKS” is the bonafide work of Mr. _____________Who carried out the research under my supervision Certified further, that to the best of my knowledge the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide Signature of the H.O.D

Name Name

CHAPTER 1

ABSTRACT:

As wireless communication gains popularity, significant research has been devoted to supporting real-time transmission with stringent Quality of Service (QoS) requirements for wireless applications. At the same time, a wireless hybrid network that integrates a mobile wireless ad hoc network (MANET) and a wireless infrastructure network has been proven to be a better alternative for the next generation wireless networks. By directly adopting resource reservation-based QoS routing for MANETs, hybrids networks inherit invalid reservation and race condition problems in MANETs. How to guarantee the QoS in hybrid networks remains an open problem.

In this paper, we propose a QoS-Oriented Distributed routing protocol (QOD) to enhance the QoS support capability of hybrid networks. Taking advantage of fewer transmission hops and anycast transmission features of the hybrid networks, QOD transforms the packet routing problem to a resource scheduling problem.

QOD incorporates five algorithms:

1) A QoS-guaranteed neighbor selection algorithm to meet the transmission delay requirement,

2) A distributed packet scheduling algorithm to further reduce transmission delay,

3) A mobility-based segment resizing algorithm that adaptively adjusts segment size according to node mobility in order to reduce transmission time,

4) A traffic redundant elimination algorithm to increase the transmission throughput, and

5) A data redundancy elimination-based transmission algorithm to eliminate the redundant data to further improve the transmission QoS.

Analytical and simulation results based on the random way-point model and the real human mobility model show that QOD can provide high QoS performance in terms of overhead, transmission delay, mobility-resilience, and scalability.

1.2 INTRODUCTION

The rapid development of wireless networks has stimulated numerous wireless applications that have been used in wide areas such as commerce, emergency services, military, education, and entertainment. The number of WiFi capable mobile devices including laptops and handheld devices (e.g., smartphone and tablet PC) has been increasing rapidly. For example, the number of wireless Internet users has tripled world-wide in the last three years, and the number of smartphone users in US has increased from 92.8 million in 2011 to 121.4 million in 2012, and will reach around 207 million by 2017. Nowadays, people wish to watch videos, play games, watch TV, and make longdistanceconferencing via wireless mobile devices “on the go.” Therefore, video streaming applications such as Qik, Flixwagon, and FaceTime on the infrastructure wireless networks have received increasing attention recently. These applications use an infrastructure to directly connect mobile users for video watching or interaction in real time. The widespread use of wireless and mobile devices and the increasing demand for mobile multimedia streaming services are leading to a promising near future where wireless multimedia services (e.g., mobile gaming, online TV, and online conferences) are widely deployed.

The emergence and the envisioned future of real time andmultimedia applications have stimulated the need of high Quality of Service (QoS) support in wireless and mobile networking environments. The QoS support reduces endto- end transmission delay and enhances throughput to guarantee the seamless communication between mobile devices and wireless infrastructures.

At the same time, hybrid wireless networks (i.e., multihop cellular networks) have been proven to be a better network structure for the next generation wireless networks and can help to tackle the stringent end-to-end QoS requirements of different applications. Hybrid networks synergistically combine infrastructure networks and MANETs to leverage each other. Specifically, infrastructure networks improve the scalability of MANETs, while MANETs automatically establish self-organizing networks, extending the coverage of the infrastructure networks. In a vehicle opportunistic access network (an instance of hybrid networks), people in vehicles need to upload or download videos from remote Internet servers through access points (APs) (i.e., base stations) spreading out in a city.

Since it is unlikely that the base stations cover the entire city to maintain sufficiently strong signal everywhere to support an application requiring high link rates, the vehicles themselves can form a MANET to extend the coverage of the base stations, providing continuous network connections. How to guarantee the QoS in hybrid wireless networks with high mobility and fluctuating bandwidth still remains an open question. In the infrastructure wireless networks, QoS provision (e.g., Intserv, RSVP ) has been proposed for QoS routing, which often requires node negotiation, admission control, resource reservation, and, priority scheduling of packets.

However, it is more difficult to guarantee QoS in MANETs due to their unique features including user mobility, channel variance errors, and limited bandwidth. Thus, attempts to directly adapt the QoS solutions for infrastructure networks to MANETs generally do not have great succes. Numerous reservation-based QoS routing protocols have been proposed for MANETs that create routes formed by nodes and links that reserve their resources to fulfill QoS requirements. Although these protocols can increase the QoS of the MANETs to a certain extent, they suffer from invalid reservation and race condition problems. Invalid reservation problem means that the reserved resources become useless if the data transmission path between a source node and a destination node breaks. Race condition problem means a double allocation of the same resource to two different QoS paths. However, little effort has been devoted to support QoS routing in hybrid networks. Most of the current works in hybrid networks focus on increasing network capacity or routing reliability but cannot provide QoS-guaranteed services. Direct adoption of the reservation-based QoS routing protocols of MANETs into hybrid networks inherits the invalid reservation and race condition problems.

In order to enhance the QoS support capability of hybrid networks, in this paper, we propose a QoS-Oriented Distributed routing protocol (QOD). Usually, a hybrid network has widespread base stations. The data transmission in hybrid networks has two features. First, an AP can be a source or a destination to any mobile node. Second, the number of transmission hops between a mobile node and an AP is small. The first feature allows a stream to have anycast transmission along multiple transmission paths to its destination through base stations, and the second feature enables a source node to connect to an AP through an intermediate node. Taking full advantage of the two features, QOD transforms the packet routing problem into a dynamic resource scheduling problem. Specifically, in QOD, if a source node is not within the transmission range of the AP, a source node selects nearby neighbors that can provide QoS services to forward its packets to base stations in a distributed manner. The source node schedules the packet streams to neighbors based on their queuing condition, channel condition, and mobility, aiming toreduce transmission time and increase network capacity. The neighbors then forward packets to base stations, which further forward packets to the destination. In this paper, we focus on the neighbor node selection for QoS-guaranteed transmission. QOD is the first work for QoS routing in hybrid networks.

LITRATURE SURVEY

QOS MULTICAST ROUTING BY USING MULTIPLE PATHS/TREES IN WIRELESSAD HOC NETWORKS

AUTHOR: H. Wu and X. Jia

PUBLISH: Ad Hoc Networks, vol. 5, pp. 600-612, 2009.

In this paper, we investigate the issues of QoS multicast routing in wireless ad hoc networks. Due to limited bandwidth of a wireless node, a QoS multicast call could often be blocked if there does not exist a single multicast tree that has the requested bandwidth, even though there is enough bandwidth in the system to support the call. In this paper, we propose a new multicast routing scheme by using multiple paths or multiple trees to meet the bandwidth requirement of a call. Three multicast routing strategies are studied, SPT (shortest path tree) based multiple-paths (SPTM), least cost tree based multiple-paths (LCTM) and multiple least cost trees (MLCT). The final routing tree(s) can meet the user’s QoS requirements such that the delay from the source to any destination node shall not exceed the required bound and the aggregate bandwidth of the paths or trees shall meet the bandwidth requirement of the call. Extensive simulations have been conducted to evaluate the performance of our three multicast routing strategies. The simulation results show that the new scheme improves the call success ratio and makes a better use of network resources.

QUALITY OF SERVICE PROVISIONING IN AD HOC WIRELESS NETWORKS: A SURVEY OF ISSUES AND SOLUTIONS

AUTHOR: T. Reddy, I. Karthigeyan, B. Manoj, and C. Murthy

PULISH: Ad Hoc Networks, vol. 4, no. 1, pp. 83-124, 2006.

An ad hoc wireless network (AWN) is a collection of mobile hosts forming a temporary network on the fly, without using any fixed infrastructure. Characteristics of AWNs such as lack of central coordination, mobility of hosts, dynamically varying network topology, and limited availability of resources make QoS provisioning very challenging in such networks. In this paper, we describe the issues and challenges in providing QoS for AWNs and review some of the QoS solutions proposed. We first provide a layer-wise classification of the existing QoS solutions, and then discuss each of these solutions.

QOS ROUTING BASED ON MULTI-CLASS NODES FOR MOBILE AD HOC NETWORKS

AUTHOR: X. Du, Ad Hoc Networks

PUBLISH: vol. 2, pp. 241-254, 2004.

Efficient routing is very important for Mobile Ad hoc Networks (MANETs). Most existing routing protocols consider homogeneous ad hoc networks, in which all nodes are identical, i.e., they have the same communication capabilities and characteristics. Although a homogeneous network model is simple and easy to analyze, it misses important characteristics of many realistic MANETs such as military battlefield networks. In addition, a homogeneous ad hoc network suffers from poor performance limits and scalability. In many ad hoc networks, multiple types of nodes do co-exist; and some nodes have larger transmission power, higher transmission data rate, better processing capability, and are more robust against bit errors and congestion than other nodes. Hence, a heterogeneous network model is more realistic and provides many advantages (e.g., leading to more efficient routing protocol design). In this paper,

We present a new routing protocol called Multi-Class (MC) routing, which is specifically designed for heterogeneous MANETs. Moreover, we also design a new Medium Access Control (MAC) protocol for heterogeneous MANETs, which is more efficient than IEEE 802.11b. Extensive simulation results demonstrate that the MC routing has very good performance, and outperforms a popular routing protocol — Zone Routing Protocol, in terms of reliability, scalability, route discovery latency, overhead, as well as packet delay and throughput.

PROVISIONING OF ADAPTABILITY TO VARIABLE TOPOLOGIES FOR ROUTING SCHEMES IN MANETS

AUTHOR: S. Jiang, Y. Liu, Y. Jiang, and Q. Yin,

PUBLISH: IEEE J. Selected Areas in Comm., vol. 22, no. 7, pp. 1347-1356, Sept. 2004.

Frequent changes in network topologies caused by mobility in mobile ad hoc networks (MANETs) impose great challenges to designing routing schemes for such networks. Various routing schemes each aiming at particular type of MANET (e.g., flat or clustered MANETs) with different mobility degrees (e.g., low, medium, and high mobility) have been proposed in the literature. However, since a mobile node should not be limited to operate in a particular MANET assumed by a routing scheme, an important issue is how to enable a mobile node to achieve routing performance as high as possible when it roams across different types of MANETs. To handle this issue, a quantity that can predict the link status for a time period in the future with the consideration of mobility is required. In this paper, we discuss such a quantity and investigate how well this quantity can be used by the link caching scheme in the dynamic source routing protocol to provide the adaptability to variable topologies caused by mobility through computer simulation in NS-2.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1EXISTING SYSTEM:

Existing approaches for providing guaranteed services in the infrastructure networks are based on two models: integrated services (IntServ) and differentiated service (DiffServ) [42]. IntServ is a stateful model that uses resource reservation for individual flow, and uses admission control and a scheduler to maintain the QoS of traffic flows. In contrast, DiffServ is a stateless model which uses coarsegrained class-based mechanism for traffic management a number of queuing scheduling algorithms. Reservation-based QoS routing protocols have been proposed for MANETs that create routes formed by nodes and links that reserve their resources to fulfill QoS requirements although these protocols can increase the QoS of the MANETs to a certain extent.

2.2 DISADVANTAGES:

Cannot provide QoS-guaranteed services.
Suffer from invalid reservation and race condition problems .
Invalid reservation problem means that the reserved resources become useless and Race condition problem means a double allocation of the same resource to two different QoS paths.

PROPOSED SYSTEM:

We propose a QoS-Oriented Distributed routing protocol (QOD). Usually, a hybrid network has widespread base stations.

The data transmission in hybrid networks has two features.

First, an AP can be a source or a destination to any mobile node. Second, the number of transmission hops between a mobile node and an AP is small. The first feature allows a stream to have anycast transmission along multiple transmission paths to its destination through base stations, and the second feature enables a source node to connect to an AP through an intermediate node. Taking full advantage of the two features, QOD transforms the packet routing problem into a dynamic resource scheduling problem. Specifically, in QOD, if a source node is not within the transmission range of the AP, a source node selects nearby neighbors that can provide QoS services to forward its packets to base stations in a distributed manner. The source node schedules the packet streams to neighbors based on their queuing condition, channel condition, and mobility, aiming to reduce transmission time and increase network capacity. The neighbors then forward packets to base stations, which further forward packets to the destination.

ADVANTAGES:

QoS-guaranteed neighbor selection algorithm. The algorithm selects qualified neighbors and employs deadline-driven scheduling mechanism to guarantee QoS routing.

Distributed packet scheduling algorithm. After qualified neighbors are identified, this algorithm schedules packet routing. It assigns earlier generated packets to forwarders with higher queuing delays, while assigns more recently generated packets to forwarders with lower queuing delays to reduce total transmission delay.

Mobility-based segment resizing algorithm. The source node adaptively resizes each packet in its packet stream for each neighbor node according to the neighbor’s mobility in order to increase the scheduling feasibility of the packets from the source node.

Soft-deadline based forwarding scheduling algorithm. In this algorithm, an intermediate node first forwards the packet with the least time allowed to wait before being forwarded out to achieve fairness in packet forwarding.

Data redundancy elimination based transmission. Due to the broadcasting feature of the wireless networks, the APs and mobile nodes can overhear and cache packets. This algorithm eliminates the redundant data to improve the QoS of the packet transmission.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

SOFTWARE REQUIREMENTS:

Operating System : Windows XP
Front End : Java JDK 1.7
Document : MS-Office 2007

CHAPTER 3

SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.

DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.

DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

All processes must have at least one data flow in and one data flow out.
All processes should modify the incoming data, producing new forms of outgoing data.
Each data store must be involved with at least one data flow.
Each external entity must be involved with at least one data flow.
A data flow must be attached to at least one process.

3.1 SYSTEM ARCHITECTURE:

3.2 DATAFLOW DIAGRAM

SENSOR NODE:

MOBILE RELAY NODE:

SINK:

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

3.3 CLASS DIAGRAM:

3.4 SEQUENCE DIAGRAM:

3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

QOD ROUTING PROTOCOL:

Scheduling feasibility is the ability of a node to guarantee a packet to arrive at its destination within QoS requirements. As mentioned, when the QoS of the direct transmission between a source node and an AP cannot be guaranteed, the source node sends a request message to its neighbor nodes. After receiving a forward request from a source node, a neighbor node ni with space utility less than a threshold replies the source node. The reply message contains information about available resources for checking packet scheduling feasibility (Section 2.4), packet arrival interval Ta, transmission delay TI!D, and packet deadline Dp of the packets in each flow being forwarded by the neighbor for queuing delay estimation and distributed packet scheduling and the node’s mobility speed for determining packet size. Based on this information, the source node chooses the replied neighbors that can guarantee the delay QoS of packet transmission to APs.

The selected neighbor nodes periodically report their statuses to the source node, which ensures their scheduling feasibility and locally schedules the packet stream to them. The individual packets are forwarded to the neighbor nodes that are scheduling feasible in a round-robin fashion from a longer delayed node to a shorter delayed node, aiming to reduce the entire packet transmission delay. Algorithm 1 shows the pseudocode for the QOD routing protocol executed by each node. The QOD distributed routing algorithm is developed based on the assumption that the neighboring nodes in the network have different channel utilities and workloads using IEEE 802.11 protocol. Otherwise, there is no need for packet scheduling in routing, since all neighbors produce comparative delay for packet forwarding. Therefore, we analyze the difference in node channel utilities and workloads in a network with IEEE 802.11 protocol in order to see whether the assumption holds true in practice.

4.1 ALGORITHM

The packets travel from different APs, which may lead to different packet transmission delay, resulting in a jitter at the receiver side. The jitter problem can be solved by using token buckets mechanism at the destination APs to shape the traffic flows. This technique is orthogonal to our study in this paper and its details are beyond the scope of this paper.

4.2 MODULES:

HYBRID WIRELESS NETWORKS:

DISTRIBUTED PACKET SCHEDULING:

NEIGHBOR SELECTION ALGORITHM:

MOBILITY-BASED PACKET RESIZING:

4.3 MODULE DESCRIPTION:

HYBRID WIRELESS NETWORKS:

Hybrid wireless networks (i.e., multihop cellular networks) have been proven to be a better network structure for the next generation wireless networks and can help to tackle the stringent end-to end QoS requirements of different applications. Hybrid networks synergistically combine infrastructure networks and MANETs to leverage each other. Specifically, infrastructure networks improve the scalability of MANETs, while MANETs automatically establish self-organizing networks, extending the coverage of the infrastructure networks.

In a vehicle opportunistic access network (an instance of hybrid networks), people in vehicles need to upload or download videos from remote Internet servers through access points (APs) (i.e., base stations) spreading out in a city. Since it is unlikely that the base stations cover the entire city to maintain sufficiently strong signal everywhere to support an application requiring high link rates, the vehicles themselves can form a MANET to extend the coverage of the base stations, providing continuous network connections.

The QoS requirements mainly include end-to-end delay bound, which is essential for many applications with stringent real-time requirement. While throughput guarantee is also important, it is automatically guaranteed by bounding the transmission delay for a certain amount of packets. The source node conducts admission control to check whether there are enough resources to satisfy the requirements of QoS of the packet stream in the network model of a hybrid network. For example, when a source node n1 wants to upload files to an Internet server through APs, it can choose to send packets to the APs directly by itself or require its neighbor nodes n2, n3, or n4 to assist the packet transmission.

DISTRIBUTED PACKET SCHEDULING:

QoS of the packet transmission and how a source node assigns traffic to the intermediate nodes to ensure their scheduling feasibility in order to further reduce the stream transmission time, a distributed packet scheduling algorithm is proposed for packet routing. This algorithm assigns earlier generated packets to forwarders with higher queuing delays and scheduling feasibility, while assigns more recently generated packets to forwarders with lower queuing delays and scheduling feasibility, so that the transmission delay of an entire packet stream can be reduced.

NEIGHBOR SELECTION ALGORITHM:

Since short delay is the major real-time QoS requirement for traffic transmission, QOD incorporates the Earliest Deadline First scheduling algorithm (EDF), which is a deadline driven scheduling algorithm for data traffic scheduling in intermediate nodes. In this algorithm, an intermediate node assigns the highest priority to the packet with the closest deadline and forwards the packet with the highest priority first. Let us use SpðiÞ to denote the size of the packet steam from node ni, use Wi to denote the bandwidth of node i, and TaðiÞ to denote the packet arrival interval from node ni.

MOBILITY-BASED PACKET RESIZING:

In a highly dynamic mobile wireless network, the transmission link between two nodes is frequently broken down. The delay generated in the packet retransmission degrades the QoS of the transmission of a packet flow. On the other hand, a node in a highly dynamic network has higher probability to meet different mobile nodes and APs, which is beneficial to resource scheduling. As (2) shows, the space utility of an intermediate node that is used for forwarding a packet p is reducing packet size can increase the scheduling feasibility of an intermediate node and reduces packet dropping probability. However, we cannot make the size of the packet too small because it generates more packets to be transmitted, producing higher packet overhead. Based on this rationale and taking advantage of the benefits of node mobility, we propose a mobility-based packet resizing algorithm for QOD in this section. The basic idea is that the larger size packets are assigned to lower mobility intermediate nodes and smaller size packets are assigned to higher mobility intermediate nodes, which increases the QoS-guaranteed packet transmissions. Specifically, in QOD, as the mobility of a node increases, the size of a packet Sp sent from a node to its neighbor nodes i decreases as following:

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

5.1.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not have a high demand on the available technical resources. This will lead to high demands on the available technical resources. This will lead to high demands being placed on the client. The developed system must have a modest requirement, as only minimal or null changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:

5.2 SYSTEM TESTING:

5.2.1 UNIT TESTING:

UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.3 FUNCTIONAL TESTING:

FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 4 NON-FUNCTIONAL TESTING:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.5 LOAD TESTING:

Load Testing

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.7 WHITE BOX TESTING:

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

5.1.10 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

CONCLUSION:

We propose a QoS oriented distributed routing protocol (QOD) for hybrid networks to provide QoS services in a highly dynamic scenario. Taking advantage of the unique features of hybrid networks, i.e., anycast transmission and short transmission hops, QOD transforms the packet routing problem to a packet scheduling problem. In QOD, a source node directly transmits packets to an AP if the direct transmission can guarantee the QoS of the traffic. Otherwise, the source node schedules the packets to a number of qualified neighbor nodes.

Specifically, QOD incorporates five algorithms. The QoS-guaranteed neighbor selection algorithm chooses qualified neighbors for packet forwarding. The distributed packet scheduling algorithm schedules the packet transmission to further reduce the packet transmission time. The mobility-based packet resizing algorithm resizes packets and assigns smaller packets to nodes with faster mobility to guarantee the routing QoS in a highly mobile environment.

The traffic redundant elimination-based transmission algorithm can further increase the transmission throughput. The soft-deadline-based forwarding scheduling achieves fairness in packet forwarding scheduling when some packets are not scheduling feasible. Experimental results show that QOD can achieve high mobility-resilience, scalability, and contention reduction. In the future, we plan to evaluate the performance of QOD based on the real testbed.

CHAPTER 9

REFERENCES:

[1] H. Wu and X. Jia, “QoS Multicast Routing by Using Multiple Paths/Trees in Wireless Ad Hoc Networks,” Ad Hoc Networks, vol. 5, pp. 600-612, 2009.

[2] T. Reddy, I. Karthigeyan, B. Manoj, and C. Murthy, “Quality of Service Provisioning in Ad Hoc Wireless Networks: A Survey of Issues and Solutions,” Ad Hoc Networks, vol. 4, no. 1, pp. 83-124, 2006.

[3] X. Du, “QoS Routing Based on Multi-Class Nodes for Mobile Ad Hoc Networks,” Ad Hoc Networks, vol. 2, pp. 241-254, 2004.

[4] S. Jiang, Y. Liu, Y. Jiang, and Q. Yin, “Provisioning of Adaptability to Variable Topologies for Routing Schemes in MANETs,” IEEE J. Selected Areas in Comm., vol. 22, no. 7, pp. 1347-1356, Sept. 2004.

[5] M. Conti, E. Gregori, and G. Maselli, “Reliable and Efficient Forwarding in Ad Hoc Networks,” Ad Hoc Networks, vol. 4, pp. 398-415, 2006.

[6] G. Chakrabarti and S. Kulkarni, “Load Balancing and Resource Reservation in Mobile Ad Hoc Networks,” Ad Hoc Networks, vol. 4, pp. 186-203, 2006.

[7] Z. Shen and J.P. Thomas, “Security and QoS Self-Optimization in Mobile Ad Hoc Networks,” IEEE Trans. Mobile Computing, vol. 7, pp. 1138-1151, Sept. 2008.

[8] S. Ibrahim, K. Sadek, W. Su, and R. Liu, “Cooperative Communications with Relay-Selection: When to Cooperate and Whom to Cooperate With?” IEEE Trans. Wireless Comm., vol. 7, no. 7, pp. 2814-2827, July 2008.

A Denial of Service Attack to UMTS Networks Using SIM-Less Devices

22/10/201902/07/2019 by admin

A Cocktail Approach for Travel Package Recommendation

22/10/201902/07/2019 by admin

In this paper, propose a cocktail approach to generate the lists for personalized travel package recommendation. Furthermore, we extend the TAST model to the tourist-relation-area-season topic (TRAST) model for capturing the latent relationships among the tourists in each travel group. Finally, we evaluate the TAST model, the TRAST model, and the cocktail recommendation approach on the real-world travel package data.

TRAST model can be easily extended for computing relationships among many more tourists. However, the computation cost will also go up. To simplify the problem, in this paper, each time we only consider two tourists in a travel group as a tourist pair for mining their relationships. By this TRAST model, all the tourists’ travel preferences are represented by relationship distributions.

We can use their relationship distributions as features to cluster them, so as to put them into different travel groups. Thus, in this scenario, many clustering methods can be adopted. Since choosing clustering algorithm is beyond the scope of this paper, in the experiments, we refer to K-means one of the most popular clustering algorithms.

INTRODUCTION:

AS an emerging trend, more and more travel companies provide online services. However, the rapid growth of online travel information imposes an increasing challenge for tourists who have to choose from a large number of available travel packages for satisfying their personalized needs. Moreover, to increase the profit, the travel companies have to understand the preferences from different tourists and serve more attractive packages. Therefore, the demand for intelligent travel services is expected to increase dramatically. Since recommender systems have been successfully applied to enhance the quality of service in a number of fields, it is natural choice to provide travel package recommendations. Actually, recommendations for tourists have been studied before and to the best of our knowledge, the first operative tourism recommender system was introduced by Delgado and DavidsonDespite of the increasing interests in this field, the problem of leveraging unique features to distinguish personalized travel package recommendations from traditional recommender systems remains pretty open.

Indeed, there are many technical and domain challenges inherent in designing and implementing an effective recommender system for personalized travel package recommendation. First, travel data are much fewer and sparser than traditional items, such as movies for recommendation, because the costs for a travel are much more expensive than for watching a movie. Second, every travel package consists of many landscapes (places of interest and attractions), and, thus, has intrinsic complex spatio-temporal relationships. For example, a travel package only includes the landscapes which are geographically colocated together. Also, different travel packages areusually developed for different travel seasons.

Therefore, the landscapes in a travel package usually have spatialtemporal autocorrelations. Third, traditional recommender systems usually rely on user explicit ratings. However, for travel data, the user ratings are usually not conveniently available. Finally, the traditional items for recommendation usually have a long period of stable value, while the values of travel packages can easily depreciate over time and a package usually only lasts for a certain period of time. The travel companies need to actively create new tour packages to replace the old ones based on the interests of the tourists. To address these challenges, in our preliminary work we proposed a cocktail approach on personalized travel package recommendation. Specifically, we first analyze the key characteristics of the existing travel packages. Along this line, travel time and travel destinations are divided into different seasons and areas. Then, wedevelop a tourist-area-season topic (TAST) model, which can represent travel packages and tourists by different topic distributions. In the TAST model, the extraction of topics is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes. As a result, the TAST model can well represent the content of the travel packages and the interests of the tourists. Based on this TAST model, a cocktail approach is developed for personalized travel package recommendation by considering some additional factors including the seasonal behaviors of tourists, the prices of travel packages, and the cold start problem of new packages.

Finally, the experimental results on real-world travel data show that the TAST model can effectively capture the unique characteristics of travel data and the cocktail recommendation approach performs much better than traditional techniques. In this paper, we further study some related topic models of the TAST model, and explain the corresponding travel package recommendation strategies based on them. Also, we propose the tourist-relation-area-season topic (TRAST) model, which helps understand the reasons why tourists form a travel group. This goes beyond personalized package recommendations and is helpful for capturing the latent relationships among the tourists in each travel group.

In addition, we conduct systematic experiments on the realworld data. These experiments not only demonstrate that the TRAST model can be used as an assessment for travel group automatic formation but also provide more insights into the TAST model and the cocktail recommendation

approach. In summary, the contributions of the TAST model, the cocktail approaches, and the TRAST model for travel package recommendations are shown in Fig. 1, whereeach dashed rectangular box in the dashed circle identifies atravel group and the tourists in the same travel group are represented by the same icons.

LITRATURE SURVEY

COST-AWARE TRAVEL TOUR RECOMMENDATION

AUTHOR: Y. Ge et al.,

PUBLISH: Proc. 17th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (SIGKDD ’11), pp. 983-991, 2011.

Recent years have witnessed an increased interest in recommender systems. Despite significant progress in this field, there still remain numerous avenues to explore. Indeed, this paper provides a study of exploiting online travel information for personalized travel package recommendation. A critical challenge along this line is to address the unique characteristics of travel data, which distinguish travel packages from traditional items for recommendation. To that end, in this paper, we first analyze the characteristics of the existing travel packages and develop a tourist-area-season topic (TAST) model. This TAST model can represent travel packages and tourists by different topic distributions, where the topic extraction is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes. Then, based on this topic model representation, we propose a cocktail approach to generate the lists for personalized travel package recommendation. Furthermore, we extend the TAST model to the tourist-relation-area-season topic (TRAST) model for capturing the latent relationships among the tourists in each travel group. Finally, we evaluate the TAST model, the TRAST model, and the cocktail recommendation approach on the real-world travel package data. Experimental results show that the TAST model can effectively capture the unique characteristics of the travel data and the cocktail approach is, thus, much more effective than traditional recommendation techniques for travel package recommendation. Also, by considering tourist relationships, the TRAST model can be used as an effective assessment for travel group formation.

FLDA: MATRIX FACTORIZATION THROUGH LATENT DIRICHLET ALLOCATION

AUTHOR: D. Agarwal and B. Chen

PUBLISH: Proc. Third ACM Int’l Conf. Web Search and Data Mining (WSDM ’10), pp. 91-100, 2010.

We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a “bag-of-words” representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting and web search where items are articles, ads and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. Then, user rating on an item is modeled as user’s affinity to the item’s topics where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors respectively. We show our model is accurate, interpretable and handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explains user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.

MAP-BASED INTERACTION WITH A CONVERSATIONAL MOBILE RECOMMENDER SYSTEM,

AUTHOR: D. Agarwal and B. Chen

PUBLISH: Proc. Second Int’l Conf. Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM ’08), pp. 212-218, 2008.

Recommender systems are information search and decision support tools used when there is an overwhelming set of options to consider or when the user lacks the domain-specific knowledge necessary to take autonomous decisions. They provide users with personalized recommendations adapted to their needs and preferences in a particular usage context. In this paper, we present an approach for integrating recommendation and electronic map technologies to build a map-based conversational mobile recommender system that can effectively and intuitively support users in finding their desired products and services. The results of our real-user study show that integrating map-based visualization and interaction in mobile recommender systems improves the system recommendation effectiveness and increases the user satisfaction.

GENERATING COMPARATIVE DESCRIPTIONS OF PLACES OF INTEREST IN THE TOURISM DOMAIN

AUTHOR: B.D. Carolis, N. Novielli, V.L. Plantamura, and E. Gentile

PUBLISH: Proc. Third ACM Conf. Recommender Systems (RecSys ’09), pp. 277-280, 2009.

When visiting cities as tourists, most of the times people do not make very detailed plans and, when choosing where to go and what to seem they tend to select the area with the major number of interesting facilities. Therefore, it would be useful to support the user choice with contextual information presentation, information clustering and comparative explanations of places of potential interest in a given area. In this paper we illustrate how My Map, a mobile recommender system in the Tourism domain, generates comparative descriptions to support users in making decisions about what to see, among relevant objects of interest.

CHAPTER 2

EXISTING SYSTEM:

We first analyze the characteristics of the existing travel packages and develop a tourist-area-season topic (TAST) model. This TAST model can represent travel packages and tourists by different topic distributions, where the topic extraction is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes a tourist-area-season topic (TAST) model which can represent travel packages and tourists by different topic distributions. In the TAST model, the extraction of topics is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes.

As a result, the TAST model can well represent the content of the travel packages and the interests of the tourists. Based on this TAST model, a cocktail approach is developed for personalized travel package recommendation by considering some additional factors including the seasonal behaviors of tourists, the prices of travel packages, and the cold start problem of new packages. Finally, the experimental results on real-world travel data show that the TAST model can effectively capture the unique characteristics of travel data and the cocktail recommendation approach performs much better than traditional techniques.

DISADAVANTAGES:

The previous cocktail recommendation approach (Cocktail) is mainly based on the TAST model and the collaborative filtering method.

Indeed, another possible cocktail approach is the content-based cocktail, and in the following, we call this method TAST Content.

The main difference between TAST Content and Cocktail is that in TAST Content the content similarity between packages and tourists are used for ranking packages instead of using collaborative filtering interests of the tourists, thus it may also suffer from the overspecialization problem.

PROPOSED SYSTEM:

We propose the tourist-relation-area-season topic (TRAST) model, which helps understand the reasons why tourists form a travel group. This goes beyond personalized package recommendations and is helpful for capturing the latent relationships among the tourists in each travel group. In addition, we conduct systematic experiments on the real world data. These experiments demonstrate that the TRAST model can be used as an assessment for travel group automatic formation but also provide more insights into the TAST model and the cocktail recommendation approach.

Our contributions of the cocktail approaches, and the TRAST model for travel package recommendations are each dashed rectangular box in the dashed circle identifies a travel group and the tourists in the same travel group are represented in this TRAST model, all the tourists’ travel preferences are represented by relationship distributions. For a set of tourists, who want to travel the same package, we can use their relationship distributions as features to cluster them, so as to put them into different travel groups. Thus, in this scenario, many clustering methods can be adopted. Since choosing clustering algorithm is beyond the scope of this paper, in the experiments, we refer to K-means one of the most popular clustering algorithms.

ADVANTAGES:

The results of the season splitting and price segmentation,
The understanding of the extracted topics,
A recommendation performance comparison between Cocktail and benchmark methods,
The evaluation of the TRAST model, and
A brief discussion on recommendations for travel groups.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

2.3.2 SOFTWARE REQUIREMENTS:

Operating System : Windows XP
Front End : JAVA JDK 1.7/JSP
Back End : MYSQL Server

CHAPTER 3

3.0 SYSTEM DESIGN:

SYSTEM DESIGN

SYSTEM ARCHITECTURE:

ARCHITECTURE DIAGRAM:

CHAPTER 4

DATA FLOW DIAGRAM:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of input data to the system, various processing carried out on this data, and the output data is generated by this system.
The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.
DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created by, the Object Management Group.

The goal is for UML to become a common language for creating models of object oriented computer software. In its current form UML is comprised of two major components: a Meta-model and a notation. In the future, some form of method or process may also be added to; or associated with, UML.

The Unified Modeling Language is a standard language for specifying, Visualization, Constructing and documenting the artifacts of software system, as well as for business modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.

The UML is a very important part of developing objects oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects.

GOALS:

The Primary goals in the design of the UML are as follows:

Provide users a ready-to-use, expressive visual modeling Language so that they can develop and exchange meaningful models.
Provide extendibility and specialization mechanisms to extend the core concepts.
Be independent of particular programming languages and development process.
Provide a formal basis for understanding the modeling language.
Encourage the growth of OO tools market.
Support higher level development concepts such as collaborations, frameworks, patterns and components.
Integrate best practices.

USE CASE DIAGRAM:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical overview of the functionality provided by a system in terms of actors, their goals (represented as use cases), and any dependencies between those use cases. The main purpose of a use case diagram is to show what system functions are performed for which actor. Roles of the actors in the system can be depicted.

CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system’s classes, their attributes, operations (or methods), and the relationships among the classes. It explains which class contains information.

SEQUENCE DIAGRAM:

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.

ACTIVITY DIAGRAM:

Activity diagrams are graphical representations of workflows of stepwise activities and actions with support for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams can be used to describe the business and operational step-by-step workflows of components in a system. An activity diagram shows the overall flow of control.

MODULES:

COCKTAIL APPROACH

TRAVEL PACKAGE

TRAST MODEL

RECOMMENDATIONS

MODULES DESCRIPTION:

COCKTAIL APPROACH:

Our cocktail approaches the specific preferences of the tourist while he/she is planning a trip, such as the transportation preference. In other words, we focus on designing the recommendation algorithm to attract the tourists before they make a travel decision rather than providing the travel support in the on-tour stage. Thus, our approach may be only useful in some situations (e.g., email marketing). Also, if we want to deploy this work for real-world services, tourist package and locating in the top of the recommendation list to a lack of interest of those packages, relevant (desirable) travel packages in the test set may be just a small fraction of the entire relevant ones that are actually of interest to each tourist.

TRAVEL PACKAGE:

We aim to make personalized travel package recommendations for the tourists. Thus, the users

are the tourists and the items are the existing packages, and we exploit a real-world travel data set provided by a travel company packages in a way that each tourist has traveled at least two different packages.

First, it is very sparse, and each tourist has only a few travel records. The extreme sparseness of the data leads to difficulties for using traditional recommendation techniques, such as collaborative filtering. For example, it is hard to find the credible nearest neighbors for the tourists because there are very few cotraveling packages.

Second, the travel data has strong time dependence. The travel packages often have a life cycle along with the change to the business demand, i.e., they only last for a certain period. In contrast, most of the landscapes will still be active after the original package has been discarded. These landscapes can be used to form new packages together with some other landscapes. Thus, we can observe that the landscapes are more sustainable and important than the package itself.

Third, landscape has some intrinsic features like the geographic location and the right travel seasons. Only the landscapes with similar spatial-temporal features are suitable for the same packages, i.e., the landscapes in one package have spatial-temporal autocorrelations and follow the first law of geography-everything is related to everything else, but the nearby things are more related than distant things.

Fourth, the tourists will consider both time and financial costs before they accept a package. This is quite different from the traditional recommendations where the cost of an item is usually not a concern. Thus, it is very important to profile the tourists based on their interests as well as the time and the money they can afford. Since the package with a higher price often tends to have more time and vice versa, in this paper we only take the price factor into consideration.

Fifth, people often travel with their friends, family, or colleagues. Even when two tourists in the same travel group are totally strangers, there must be some reasons for the travel company to put them together. For instance, they may be of the same age or have the same travel schedule. Hence, it is also very important to understand the relationships among the tourists in the same travel group. This understanding can help to form the travel group. Last but not least, few tourist ratings are available for travel packages. However, we can see that every choice of a travel package indicates the strong interest of the tourist in the content provided in the package.

TRAST MODEL:

TRAST model, all the tourists’ travel preferences are represented by relationship distributions. For a set of tourists, who want to travel the same package, we can use their relationship distributions as features to cluster them, so as to put them into different travel groups. Thus, in this scenario, many clustering methods can be adopted. Since choosing clustering algorithm is beyond the scope of this paper, in the experiments, we refer to K-means, one of the most popular clustering algorithms. Thus, the TRAST model can be used as an assessment for travel group automatic formation. Indeed, in real applications, when generating a travel group, some more external constraints, such as tourists’ travel date requirements, the travel company’s travel group schedule should also be considered by TRAST1 to represent the latent relationships directly.

TRAST model, the purchases of the tourists in each travel group are summed up as one single expense record and, thus, it has more complex generative process. We can understand this process by a simple example. Assume that two selected tourists in a travel group (U00d) are u1 and u2, who are young and dating with each other. Now, they decide to travel in winter (Sd) and the destination is North America (Ad). To generate a travel landscape (l), we first extract a relationship (r, e.g., lover), and then find a topic (t) for lovers to travel in the winter (e.g., skiing). Finally, based on this skiing topic and the selected travel area (e.g., Northeast America), we draw a landscape (e.g., Stowe, Vermont).

RECOMMENDATIONS:

The evaluations in previous sections are mainly focused on the individual (personalized) recommendations. Since there are tourists who frequently travel together, it is interesting to know whether the latent variables (e.g., the topics of each individual tourist and the relationships of a travel group) as well as the cocktail approaches are useful for making recommendations to a group of tourists. To this end, we performed an experimental study on group recommendations.

We adopt the widely used degree of agreement (DOA) and Top-K [23] as the evaluation metrics. Also, a simple user study was conducted and volunteers were invited to rate the recommendations. For comparison, we recorded the best performance of each algorithm by tuning their parameters, and we also set some general rules for fair comparison. For instance, for collaborative filtering-based methods, we usually consider the contribution of the nearest neighbors with similarity values larger than 0.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

5.1.2 TECHNICAL FEASIBILITY:

5.1.3 SOCIAL FEASIBILITY:

5.2 SYSTEM TESTING:

5.2.1 UNIT TESTING:

UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.3 FUNCTIONAL TESTING:

FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 4 NON-FUNCTIONAL TESTING:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.5 LOAD TESTING:

Load Testing

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.7 WHITE BOX TESTING:

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

5.1.10 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

The Java Programming Language

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

Simple
- Architecture neutral
- Object oriented
- Portable
- Distributed
- High performance
- Interpreted
- Multithreaded
- Robust
- Dynamic
- Secure

6.2 THE JAVA PLATFORM:

The Java platform has two components:

The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
Applets: The set of conventions used by applets.
Networking: URLs, TCP (Transmission Control Protocol), UDP (User Data gram Protocol) sockets, and IP (Internet Protocol) addresses.
Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
Software components: Known as JavaBeans^TM, can plug into existing component architectures.
Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
Java Database Connectivity (JDBC^TM): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

7.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
Develop programs more quickly: Your development time may be as much as twice as fast versus writing the same program in C++. Why? You write fewer lines of code and it is a simpler programming language than C++.
Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java^TMProduct Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

6.5 ODBC:

Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application developers and database systems providers. Before ODBC became a de facto standard for Windows programs to interface with database systems, programmers had to use proprietary languages for each database they wanted to connect to. Now, ODBC has made the choice of the database system almost irrelevant from a coding perspective, which is as it should be. Application developers have much more important things to worry about than the syntax that is needed to port their program from one database to another when business needs suddenly change.

6.6 JDBC:

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

6.7 JDBC Goals:

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

SQL Conformance

JDBC must be implemental on top of common database interfaces

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

Use strong, static typing wherever possible

Strong typing allows for more error checking to be done at compile time; also, less error appear at runtime.

Keep the common cases simple

Finally we decided to precede the implementation using Java Networking.

And for dynamically updating the cache table we go for MS Access database.

Java ha two things: a programming language and a platform.

Java is a high-level programming language that is all of the following

Simple Architecture-neutral

Object-oriented Portable

Distributed High-performance

Interpreted Multithreaded

Robust Dynamic Secure

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

7.7 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.

Port addresses

Sockets:

#include <sys/types.h>

#include <sys/socket.h>

int socket(int family, int type, int protocol);

6.8 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.

6.8.1. Map Visualizations:

6.8.2. Time Series Chart Interactivity

6.8.3. Dashboards

6.8.4. Property Editors

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

CONCLUSION:

We present study on personalized travel package recommendation. Specifically, we first analyzed the unique characteristics of travel packages and developed the TAST model, a Bayesian network for travel package and tourist representation. The TAST model can discover the interests of the tourists and extract the spatial-temporal correlations among landscapes. Then, we exploited the TAST model for developing a cocktail approach on personalized travel package recommendation. This cocktail approach follows a hybrid recommendation strategy and has the ability to combine several constraints existing in the real-world scenario.

Furthermore, we extended the TAST model to the TRAST model, which can capture the relationships among tourists in each travel group. Finally, an empirical study was conducted on real-world travel data. Experimental results demonstrate that the TAST model can capture the unique characteristics of the travel packages, the cocktail approach can lead to better performances of travel package recommendation, and the TRAST model can be used as an effective assessment for travel group automatic formation. We hope these encouraging results could lead to many future works.

Single Image Super-Resolution Based on Gradient Proﬁle Sharpness

22/10/201902/07/2019 by admin

Real-Time Big Data Analytical Architecture for Remote Sensing Application

05/08/201902/07/2019 by admin

In today’s era, there is a great deal added to real-time remote sensing Big Data than it seems at first, and extracting the useful information in an efficient manner leads a system toward a major computational challenges, such as to analyze, aggregate, and store, where data are remotely collected. Keeping in view the above mentioned factors, there is a need for designing a system architecture that welcomes both realtime, as well as offline data processing. In this paper, we propose real-time Big Data analytical architecture for remote sensing satellite application.

The proposed architecture comprises three main units:

1) Remote sensing Big Data acquisition unit (RSDU);

2) Data processing unit (DPU); and

3) Data analysis decision unit (DADU).

First, RSDU acquires data from the satellite and sends this data to the Base Station, where initial processing takes place. Second, DPU plays a vital role in architecture for efficient processing of real-time Big Data by providing filtration, load balancing, and parallel processing. Third, DADU is the upper layer unit of the proposed architecture, which is responsible for compilation, storage of the results, and generation of decision based on the results received from DPU.

1.2 INTRODUCTION:

Recently, a great deal of interest in the field of Big Data and its analysis has risen mainly driven from extensive number of research challenges strappingly related to bonafide applications, such as modeling, processing, querying, mining, and distributing large-scale repositories. The term “Big Data” classifies specific kinds of data sets comprising formless data, which dwell in data layer of technical computing applications and the Web. The data stored in the underlying layer of all these technical computing application scenarios have some precise individualities in common, such as 1) largescale data, which refers to the size and the data warehouse; 2) scalability issues, which refer to the application’s likely to be running on large scale (e.g., Big Data); 3) sustain extraction transformation loading (ETL) method from low, raw data to well thought-out data up to certain extent; and 4) development of uncomplicated interpretable analytical over Big Data warehouses with a view to deliver an intelligent and momentous knowledge for them.

Big Data are usually generated by online transaction, video/audio, email, number of clicks, logs, posts, social network data, scientific data, remote access sensory data, mobile phones, and their applications. These data are accumulated in databases that grow extraordinarily and become complicated to confine, form, store, manage, share, process, analyze, and visualize via typical database software tools. Advancement in Big Data sensing and computer technology revolutionizes the way remote data collected, processed, analyzed, and managed. Particularly, most recently designed sensors used in the earth and planetary observatory system are generating continuous stream of data. Moreover, majority of work have been done in the various fields of remote sensory satellite image data, such as change detection, gradient-based edge detection region similarity based edge detection and intensity gradient technique for efficient intraprediction.

In this paper, we referred the high speed continuous stream of data or high volume offline data to “Big Data,” which is leading us to a new world of challenges. Such consequences of transformation of remotely sensed data to the scientific understanding are a critical task. Hence the rate at which volume of the remote access data is increasing, a number of individual users as well as organizations are now demanding an efficient mechanism to collect, process, and analyze, and store these data and its resources. Big Data analysis is somehow a challenging task than locating, identifying, understanding, and citing data. Having a large-scale data, all of this has to happen in a mechanized manner since it requires diverse data structure as well as semantics to be articulated in forms of computer-readable format.

However, by analyzing simple data having one data set, a mechanism is required of how to design a database. There might be alternative ways to store all of the same information. In such conditions, the mentioned design might have an advantage over others for certain process and possible drawbacks for some other purposes. In order to address these needs, various analytical platforms have been provided by relational databases vendors. These platforms come in various shapes from software only to analytical services that run in third-party hosted environment. In remote access networks, where the data source such as sensors can produce an overwhelming amount of raw data.

We refer it to the first step, i.e., data acquisition, in which much of the data are of no interest that can be filtered or compressed by orders of magnitude. With a view to using such filters, they do not discard useful information. For instance, in consideration of new reports, is it adequate to keep that information that is mentioned with the company name? Alternatively, is it necessary that we may need the entire report, or simply a small piece around the mentioned name? The second challenge is by default generation of accurate metadata that describe the composition of data and the way it was collected and analyzed. Such kind of metadata is hard to analyze since we may need to know the source for each data in remote access.

1.3 LITRATURE SURVEY:

BIG DATA AND CLOUD COMPUTING: CURRENT STATE AND FUTURE OPPORTUNITIES

AUTHOR: D. Agrawal, S. Das, and A. E. Abbadi

PUBLISH: Proc. Int. Conf. Extending Database Technol. (EDBT), 2011, pp. 530–533.

EXPLANATION:

Scalable database management systems (DBMS)—both for update intensive application workloads as well as decision support systems for descriptive and deep analytics—are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from the traditional enterprise infrastructures to next generation cloud infrastructures. Though scalable data management has been a vision for more than three decades and much research has focussed on large scale data management in traditional enterprise setting, cloud computing brings its own set of novel challenges that must be addressed to ensure the success of data management solutions in the cloud environment. This tutorial presents an organized picture of the challenges faced by application developers and DBMS designers in developing and deploying internet scale applications. Our background study encompasses both classes of systems: (i) for supporting update heavy applications, and (ii) for ad-hoc analytics and decision support. We then focus on providing an in-depth analysis of systems for supporting update intensive web-applications and provide a survey of the state-of-theart in this domain. We crystallize the design choices made by some successful systems large scale database management systems, analyze the application demands and access patterns, and enumerate the desiderata for a cloud-bound DBMS.

CHANGE DETECTION IN SYNTHETIC APERTURE RADAR IMAGE BASED ON FUZZY ACTIVE CONTOUR MODELS AND GENETIC ALGORITHMS

AUTHOR: J. Shi, J. Wu, A. Paul, L. Jiao, and M. Gong

PUBLISH: Math. Prob. Eng., vol. 2014, 15 pp., Apr. 2014.

EXPLANATION:

This paper presents an unsupervised change detection approach for synthetic aperture radar images based on a fuzzy active contour model and a genetic algorithm. The aim is to partition the difference image which is generated from multitemporal satellite images into changed and unchanged regions. Fuzzy technique is an appropriate approach to analyze the difference image where regions are not always statistically homogeneous. Since interval type-2 fuzzy sets are well-suited for modeling various uncertainties in comparison to traditional fuzzy sets, they are combined with active contour methodology for properly modeling uncertainties in the difference image. The interval type-2 fuzzy active contour model is designed to provide preliminary analysis of the difference image by generating intermediate change detection masks. Each intermediate change detection mask has a cost value. A genetic algorithm is employed to find the final change detection mask with the minimum cost value by evolving the realization of intermediate change detection masks. Experimental results on real synthetic aperture radar images demonstrate that change detection results obtained by the improved fuzzy active contour model exhibits less error than previous approaches.

A BIG DATA ARCHITECTURE FOR LARGE SCALE SECURITY MONITORING

AUTHOR: S. Marchal, X. Jiang, R. State, and T. Engel

PUBLISH: Proc. IEEE Int. Congr. Big Data, 2014, pp. 56–63.

EXPLANATION:

Network traffic is a rich source of information for security monitoring. However the increasing volume of data to treat raises issues, rendering holistic analysis of network traffic difficult. In this paper we propose a solution to cope with the tremendous amount of data to analyse for security monitoring perspectives. We introduce an architecture dedicated to security monitoring of local enterprise networks. The application domain of such a system is mainly network intrusion detection and prevention, but can be used as well for forensic analysis. This architecture integrates two systems, one dedicated to scalable distributed data storage and management and the other dedicated to data exploitation. DNS data, NetFlow records, HTTP traffic and honeypot data are mined and correlated in a distributed system that leverages state of the art big data solution. Data correlation schemes are proposed and their performance are evaluated against several well-known big data framework including Hadoop and Spark.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Existing methods inapplicable on standard computers it is not desirable or possible to load the entire image into memory before doing any processing. In this situation, it is necessary to load only part of the image and process it before saving the result to the disk and proceeding to the next part. This corresponds to the concept of on-the-flow processing. Remote sensing processing can be seen as a chain of events or steps is generally independent from the following ones and generally focuses on a particular domain. For example, the image can be radio metrically corrected to compensate for the atmospheric effects, indices computed, before an object extraction based on these indexes takes place.

The typical processing chain will process the whole image for each step, returning the final result after everything is done. For some processing chains, iterations between the different steps are required to find the correct set of parameters. Due to the variability of satellite images and the variety of the tasks that need to be performed, fully automated tasks are rare. Humans are still an important part of the loop. These concepts are linked in the sense that both rely on the ability to process only one part of the data.

In the case of simple algorithms, this is quite easy: the input is just split into different non-overlapping pieces that are processed one by one. But most algorithms do consider the neighborhood of each pixel. As a consequence, in most cases, the data will have to be split into partially overlapping pieces. The objective is to obtain the same result as the original algorithm as if the processing was done in one go. Depending on the algorithm, this is unfortunately not always possible.

2.1.1 DISADVANTAGES:

A reader that loads the image, or part of the image in memory from the file on disk;

A filter which carries out a local processing that does not require access to neighboring pixels (a simple threshold for example), the processing can happen on CPU or GPU;

A filter that requires the value of neighboring pixels to compute the value of a given pixel (a convolution filter is a typical example), the processing can happen on CPU or GPU;

A writer to output the resulting image in memory into a file on disk, note that the file could be written in several steps. We will illustrate in this example how it is possible to compute part of the image in the whole pipeline, incurring only minimal computation overhead.

2.2 PROPOSED SYSTEM:

We present a remote sensing Big Data analytical architecture, which is used to analyze real time, as well as offline data. At first, the data are remotely preprocessed, which is then readable by the machines. Afterward, this useful information is transmitted to the Earth Base Station for further data processing. Earth Base Station performs two types of processing, such as processing of real-time and offline data. In case of the offline data, the data are transmitted to offline data-storage device. The incorporation of offline data-storage device helps in later usage of the data, whereas the real-time data is directly transmitted to the filtration and load balancer server, where filtration algorithm is employed, which extracts the useful information from the Big Data.

On the other hand, the load balancer balances the processing power by equal distribution of the real-time data to the servers. The filtration and load-balancing server not only filters and balances the load, but it is also used to enhance the system efficiency. Furthermore, the filtered data are then processed by the parallel servers and are sent to data aggregation unit (if required, they can store the processed data in the result storage device) for comparison purposes by the decision and analyzing server. The proposed architecture welcomes remote access sensory data as well as direct access network data (e.g., GPRS, 3G, xDSL, or WAN). The proposed architecture and the algorithms are implemented in applying remote sensing earth observatory data.

We proposed architecture has the capability of dividing, load balancing, and parallel processing of only useful data. Thus, it results in efficiently analyzing real-time remote sensing Big Data using earth observatory system. Furthermore, the proposed architecture has the capability of storing incoming raw data to perform offline analysis on largely stored dumps, when required. Finally, a detailed analysis of remotely sensed earth observatory Big Data for land and sea area are provided using .NET. In addition, various algorithms are proposed for each level of RSDU, DPU, and DADU to detect land as well as sea area to elaborate the working of architecture.

2.2.1 ADVANTAGES:

Big Data process high-speed, large amount of real-time remote sensory image data using our proposed architecture. It works on both DPU and DADU by taking data from medical application.

Our architecture for offline as well online traffic, we perform a simple analysis on remote sensing earth observatory data. We assume that the data are big in nature and difficult to handle for a single server.

The data are continuously coming from a satellite with high speed. Hence, special algorithms are needed to process, analyze, and make a decision from that Big Data. Here, in this section, we analyze remote sensing data for finding land, sea, or ice area.

We have used the proposed architecture to perform analysis and proposed an algorithm for handling, processing, analyzing, and decision-making for remote sensing Big Data images using our proposed architecture.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

2.3.2 SOFTWARE REQUIREMENTS:

Operating System : Windows XP or Win7
Front End : Microsoft Visual Studio .NET 2008
Script : C# Script
Back End : MS-SQL Server 2005
Document : MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.

DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.

DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data’s in the physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

All processes must have at least one data flow in and one data flow out.
All processes should modify the incoming data, producing new forms of outgoing data.
Each data store must be involved with at least one data flow.
Each external entity must be involved with at least one data flow.
A data flow must be attached to at least one process.

3.1 ARCHITECTURE DIAGRAM:

3.2 DATAFLOW DIAGRAM:

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

3.3 CLASS DIAGRAM:

3.4 SEQUENCE DIAGRAM:

3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

Big Data covers diverse technologies same as cloud computing. The input of Big Data comes from social networks (Facebook, Twitter, LinkedIn, etc.), Web servers, satellite imagery, sensory data, banking transactions, etc. Regardless of very recent emergence of Big Data architecture in scientific applications, numerous efforts toward Big Data analytics architecture can already be found in the literature. Among numerous others, we propose remote sensing Big Data architecture to analyze the Big Data in an efficient manner as shown in Fig. 1. Fig. 1 delineates n number of satellites that obtain the earth observatory Big Data images with sensors or conventional cameras through which sceneries are recorded using radiations. Special techniques are applied to process and interpret remote sensing imagery for the purpose of producing conventional maps, thematic maps, resource surveys, etc. We have divided remote sensing Big Data architecture.

Healthcare scenarios, medical practitioners gather massive volume of data about patients, medical history, medications, and other details. The above-mentioned data are accumulated in drug-manufacturing companies. The nature of these data is very complex, and sometimes the practitioners are unable to show a relationship with other information, which results in missing of important information. With a view in employing advance analytic techniques for organizing and extracting useful information from Big Data results in personalized medication, the advance Big Data analytic techniques give insight into hereditarily causes of the disease.

4.1 ALGORITHM:

This algorithm takes satellite data or product and then filters and divides them into segments and performs load-balancing algorithm.

The processing algorithm calculates results for different parameters against each incoming block and sends them to the next level. In step 1, the calculation of mean, SD, absolute difference, and the number of values, which are greater than the maximum threshold, are performed. Furthermore, in the next step, the results are transmitted to the aggregation server.

ACA collects the results from each processing servers against each Bi and then combines, organizes, and stores these results in RDBMS database.

4.2 MODULES:

DATA ANALYSIS DECISION UNIT (DADU):

DATA PROCESSING UNIT (DPU):

REMOTE SENSING APPLICATION RSDU:

FINDINGS AND DISCUSSION:

ALGORITHM DESIGN AND TESTING:

4.3 MODULE DESCRIPTION:

DATA PROCESSING UNIT (DPU):

In data processing unit (DPU), the filtration and load balancer server have two basic responsibilities, such as filtration of data and load balancing of processing power. Filtration identifies the useful data for analysis since it only allows useful information, whereas the rest of the data are blocked and are discarded. Hence, it results in enhancing the performance of the whole proposed system. Apparently, the load-balancing part of the server provides the facility of dividing the whole filtered data into parts and assign them to various processing servers. The filtration and load-balancing algorithm varies from analysis to analysis; e.g., if there is only a need for analysis of sea wave and temperature data, the measurement of these described data is filtered out, and is segmented into parts.

Each processing server has its algorithm implementation for processing incoming segment of data from FLBS. Each processing server makes statistical calculations, any measurements, and performs other mathematical or logical tasks to generate intermediate results against each segment of data. Since these servers perform tasks independently and in parallel, the performance proposed system is dramatically enhanced, and the results against each segment are generated in real time. The results generated by each server are then sent to the aggregation server for compilation, organization, and storing for further processing.

DATA ANALYSIS DECISION UNIT (DADU):

DADU contains three major portions, such as aggregation and compilation server, results storage server(s), and decision making server. When the results are ready for compilation, the processing servers in DPU send the partial results to the aggregation and compilation server, since the aggregated results are not in organized and compiled form. Therefore, there is a need to aggregate the related results and organized them into a proper form for further processing and to store them. In the proposed architecture, aggregation and compilation server is supported by various algorithms that compile, organize, store, and transmit the results. Again, the algorithm varies from requirement to requirement and depends on the analysis needs. Aggregation server stores the compiled and organized results into the result’s storage with the intention that any server can use it as it can process at any time.

The aggregation server also sends the same copy of that result to the decision-making server to process that result for making decision. The decision-making server is supported by the decision algorithms, which inquire different things from the result, and then make various decisions (e.g., in our analysis, we analyze land, sea, and ice, whereas other finding such as fire, storms, Tsunami, earthquake can also be found). The decision algorithm must be strong and correct enough that efficiently produce results to discover hidden things and make decisions. The decision part of the architecture is significant since any small error in decision-making can degrade the efficiency of the whole analysis. DADU finally displays or broadcasts the decisions, so that any application can utilize those decisions at real time to make their development. The applications can be any business software, general purpose community software, or other social networks that need those findings (i.e., decision-making).

REMOTE SENSING APPLICATION RSDU:

Remote sensing promotes the expansion of earth observatory system as cost-effective parallel data acquisition system to satisfy specific computational requirements. The Earth and Space Science Society originally approved this solution as the standard for parallel processing in this particular qualifications for improved Big Data acquisition, soon it was recognized that traditional data processing technologies could not provide sufficient power for processing such kind of data. Therefore, the need for parallel processing of the massive volume of data was required, which could efficiently analyze the Big Data. For that reason, the proposed RSDU is introduced in the remote sensing Big Data architecture that gathers the data from various satellites around the globe as possible that the received raw data are distorted by scattering and absorption by various atmospheric gasses and dust particles. We assume that the satellite can correct the erroneous data.

However, to make the raw data into image format, the remote sensing satellite uses effective data analysis, remote sensing satellite preprocesses data under many situations to integrate the data from different sources, which not only decreases storage cost, but also improves analysis accuracy. The data must be corrected in different methods to remove distortions caused due to the motion of the platform relative to the earth, platform attitude, earth curvature, nonuniformity of illumination, variations in sensor characteristics, etc. The data is then transmitted to Earth Base Station for further processing using direct communication link. We divided the data processing procedure into two steps, such as real-time Big Data processing and offline Big Data processing. In the case of offline data processing, the Earth Base Station transmits the data to the data center for storage. This data is then used for future analyses. However, in real-time data processing, the data are directly transmitted to the filtration and load balancer server (FLBS), since storing of incoming real-time data degrades the performance of real-time processing.

FINDINGS AND DISCUSSION:

Preprocessed and formatted data from satellite contains all or some of the following parts depending on the product.

1) Main product header (MPH): It includes the products basis information, i.e., id, measurement and sensing time, orbit, information, etc.

2) Special products head (SPH): It contains information specific to each product or product group, i.e., number of data sets descriptors (DSD), directory of remaining data sets in the file, etc.

3) Annotation data sets (ADS): It contains information of quality, time tagged processing parameters, geo location tie points, solar, angles, etc.

4) Global annotation data sets (GADs): It contains calling factors, offsets, calibration information, etc.

5) Measurement data set (MDS): It contains measurements or graphical parameters calculated from the measurement including quality flag and the time tag measurement as well. The image data are also stored in this part and are the main element of our analysis.

The MPH and SPH data are in ASCII format, whereas all the other data sets are in binary format. MDS, ADS, and GADs consist of the sequence of records and one or more fields of the data for each record. In our case, the MDS contains number of records, and each record contains a number of fields. Each record of the MDS corresponds to one row of the satellite image, which is our main focus during analysis.

ALGORITHM DESIGN AND TESTING:

Our algorithms are proposed to process high-speed, large amount of real-time remote sensory image data using our proposed architecture. It works on both DPU and DADU by taking data from satellite as input to identify land and sea area from the data set. The set of algorithms contains four simple algorithms, i.e., algorithm I, algorithm II, algorithm III, and algorithm IV that work on filtrations and load balancer, processing servers, aggregation server, and on decision-making server, respectively. Algorithm I, i.e., filtration and load balancer algorithm (FLBA) works on filtration and load balancer to filter only the require data by discarding all other information. It also provides load balancing by dividing the data into fixed size blocks and sending them to the processing server, i.e., one or more distinct blocks to each server. This filtration, dividing, and load-balancing task speeds up our performance by neglecting unnecessary data and by providing parallel processing. Algorithm II, i.e., processing and calculation algorithm (PCA) processes filtered data and is implemented on each processing server. It provides various parameter calculations that are used in the decision-making process. The parameters calculations results are then sent to aggregation server for further processing. Algorithm III, i.e., aggregation and compilations algorithm (ACA) stores, compiles, and organizes the results, which can be used by decision-making server for land and sea area detection. Algorithm IV, i.e., decision-making algorithm (DMA) identifies land area and sea area by comparing the parameters results, i.e., from aggregation servers, with threshold values.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

5.1.2 TECHNICAL FEASIBILITY:

5.1.3 SOCIAL FEASIBILITY:

5.2 SYSTEM TESTING:

5.2.1 UNIT TESTING:

UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.3 FUNCTIONAL TESTING:

FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 4 NON-FUNCTIONAL TESTING:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.5 LOAD TESTING:

Load Testing

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.7 WHITE BOX TESTING:

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

5.1.10 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 7

7.0 SOFTWARE SPECIFICATION:

7.1 FEATURES OF .NET:

Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET Framework is a language-neutral platform for writing programs that can easily and securely interoperate. There’s no language barrier with .NET: there are numerous languages available to the developer including Managed C++, C#, Visual Basic and Java Script.

The .NET framework provides the foundation for components to interact seamlessly, whether locally or remotely on different platforms. It standardizes common data types and communications protocols so that components created in different languages can easily interoperate.

“.NET” is also the collective name given to various software components built upon the .NET platform. These will be both products (Visual Studio.NET and Windows.NET Server, for instance) and services (like Passport, .NET My Services, and so on).

7.2 THE .NET FRAMEWORK

The .NET Framework has two main parts:

1. The Common Language Runtime (CLR).

2. A hierarchical set of class libraries.

The CLR is described as the “execution engine” of .NET. It provides the environment within which programs run. The most important features are

Conversion from a low-level assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
Memory management, notably including garbage collection.
Checking and enforcing security restrictions on the running code.
Loading and executing programs, with version control and other such features.
The following features of the .NET framework are also worth description:

Managed Code

The code that targets .NET, and which contains certain extra Information – “metadata” – to describe itself. Whilst both managed and unmanaged code can run in the runtime, only managed code contains the information that allows the CLR to guarantee, for instance, safe execution and interoperability.

Managed Data

With Managed Code comes Managed Data. CLR provides memory allocation and Deal location facilities, and garbage collection. Some .NET languages use Managed Data by default, such as C#, Visual Basic.NET and JScript.NET, whereas others, namely C++, do not. Targeting CLR can, depending on the language you’re using, impose certain constraints on the features available. As with managed and unmanaged code, one can have both managed and unmanaged data in .NET applications – data that doesn’t get garbage collected but instead is looked after by unmanaged code.

Common Type System

The CLR uses something called the Common Type System (CTS) to strictly enforce type-safety. This ensures that all classes are compatible with each other, by describing types in a common way. CTS define how types work within the runtime, which enables types in one language to interoperate with types in another language, including cross-language exception handling. As well as ensuring that types are only used in appropriate ways, the runtime also ensures that code doesn’t attempt to access memory that hasn’t been allocated to it.

Common Language Specification

The CLR provides built-in support for language interoperability. To ensure that you can develop managed code that can be fully used by developers using any programming language, a set of language features and rules for using them called the Common Language Specification (CLS) has been defined. Components that follow these rules and expose only CLS features are considered CLS-compliant.

7.3 THE CLASS LIBRARY

.NET provides a single-rooted hierarchy of classes, containing over 7000 types. The root of the namespace is called System; this contains basic types like Byte, Double, Boolean, and String, as well as Object. All objects derive from System. Object. As well as objects, there are value types. Value types can be allocated on the stack, which can provide useful flexibility. There are also efficient means of converting value types to object types if and when necessary.

The set of classes is pretty comprehensive, providing collections, file, screen, and network I/O, threading, and so on, as well as XML and database connectivity.

The class library is subdivided into a number of sets (or namespaces), each providing distinct areas of functionality, with dependencies between the namespaces kept to a minimum.

7.4 LANGUAGES SUPPORTED BY .NET

The multi-language capability of the .NET Framework and Visual Studio .NET enables developers to use their existing programming skills to build all types of applications and XML Web services. The .NET framework supports new versions of Microsoft’s old favorites Visual Basic and C++ (as VB.NET and Managed C++), but there are also a number of new additions to the family.

Visual Basic .NET has been updated to include many new and improved language features that make it a powerful object-oriented programming language. These features include inheritance, interfaces, and overloading, among others. Visual Basic also now supports structured exception handling, custom attributes and also supports multi-threading.

Visual Basic .NET is also CLS compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in Visual Basic .NET.

Managed Extensions for C++ and attributed programming are just some of the enhancements made to the C++ language. Managed Extensions simplify the task of migrating existing C++ applications to the new .NET Framework.

C# is Microsoft’s new language. It’s a C-style language that is essentially “C++ for Rapid Application Development”. Unlike other languages, its specification is just the grammar of the language. It has no standard library of its own, and instead has been designed with the intention of using the .NET libraries as its own.

Microsoft Visual J# .NET provides the easiest transition for Java-language developers into the world of XML Web Services and dramatically improves the interoperability of Java-language programs with existing software written in a variety of other programming languages.

Active State has created Visual Perl and Visual Python, which enable .NET-aware applications to be built in either Perl or Python. Both products can be integrated into the Visual Studio .NET environment. Visual Perl includes support for Active State’s Perl Dev Kit.

Other languages for which .NET compilers are available include

FORTRAN
COBOL
Eiffel

ASP.NET XML WEB SERVICES	Windows Forms
Base Class Libraries
Common Language Runtime
Operating System

Fig1 .Net Framework

C#.NET is also compliant with CLS (Common Language Specification) and supports structured exception handling. CLS is set of rules and constructs that are supported by the CLR (Common Language Runtime). CLR is the runtime environment provided by the .NET Framework; it manages the execution of the code and also makes the development process easier by providing services.

C#.NET is a CLS-compliant language. Any objects, classes, or components that created in C#.NET can be used in any other CLS-compliant language. In addition, we can use objects, classes, and components created in other CLS-compliant languages in C#.NET .The use of CLS ensures complete interoperability among applications, regardless of the languages used to create the application.

CONSTRUCTORS AND DESTRUCTORS:

Constructors are used to initialize objects, whereas destructors are used to destroy them. In other words, destructors are used to release the resources allocated to the object. In C#.NET the sub finalize procedure is available. The sub finalize procedure is used to complete the tasks that must be performed when an object is destroyed. The sub finalize procedure is called automatically when an object is destroyed. In addition, the sub finalize procedure can be called only from the class it belongs to or from derived classes.

GARBAGE COLLECTION

Garbage Collection is another new feature in C#.NET. The .NET Framework monitors allocated resources, such as objects and variables. In addition, the .NET Framework automatically releases memory for reuse by destroying objects that are no longer in use.

In C#.NET, the garbage collector checks for the objects that are not currently in use by applications. When the garbage collector comes across an object that is marked for garbage collection, it releases the memory occupied by the object.

OVERLOADING

Overloading is another feature in C#. Overloading enables us to define multiple procedures with the same name, where each procedure has a different set of arguments. Besides using overloading for procedures, we can use it for constructors and properties in a class.

MULTITHREADING:

C#.NET also supports multithreading. An application that supports multithreading can handle multiple tasks simultaneously, we can use multithreading to decrease the time taken by an application to respond to user interaction.

STRUCTURED EXCEPTION HANDLING

C#.NET supports structured handling, which enables us to detect and remove errors at runtime. In C#.NET, we need to use Try…Catch…Finally statements to create exception handlers. Using Try…Catch…Finally statements, we can create robust and effective exception handlers to improve the performance of our application.

7.5 THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies application development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK

1. To provide a consistent object-oriented programming environment whether object codes is stored and executed locally on Internet-distributed, or executed remotely.

2. To provide a code-execution environment to minimizes software deployment and guarantees safe execution of code.

3. Eliminates the performance problems.

There are different types of application, such as Windows-based applications and Web-based applications.

7.6 FEATURES OF SQL-SERVER

The OLAP Services feature available in SQL Server version 7.0 is now called SQL Server 2000 Analysis Services. The term OLAP Services has been replaced with the term Analysis Services. Analysis Services also includes a new data mining component. The Repository component available in SQL Server version 7.0 is now called Microsoft SQL Server 2000 Meta Data Services. References to the component now use the term Meta Data Services. The term repository is used only in reference to the repository engine within Meta Data Services

SQL-SERVER database consist of six type of objects,

They are,

1. TABLE

2. QUERY

3. FORM

4. REPORT

5. MACRO

7.7 TABLE:

A database is a collection of data about a specific topic.

VIEWS OF TABLE:

We can work with a table in two types,

1. Design View

2. Datasheet View

Design View

To build or modify the structure of a table we work in the table design view. We can specify what kind of data will be hold.

Datasheet View

To add, edit or analyses the data itself we work in tables datasheet view mode.

QUERY:

A query is a question that has to be asked the data. Access gathers data that answers the question from one or more table. The data that make up the answer is either dynaset (if you edit it) or a snapshot (it cannot be edited).Each time we run query, we get latest information in the dynaset. Access either displays the dynaset or snapshot for us to view or perform an action on it, such as deleting or updating.

CHAPTER 7

7.0 APPENDIX

7.1 SAMPLE SCREEN SHOTS:

7.2 SAMPLE SOURCE CODE:

CHAPTER 8

8.1 CONCLUSION AND FUTURE:

In this paper, we proposed architecture for real-time Big Data analysis for remote sensing applications in the architecture efficiently processed and analyzed real-time and offline remote sensing Big Data for decision-making. The proposed architecture is composed of three major units, such as 1) RSDU; 2) DPU; and 3) DADU. These units implement algorithms for each level of the architecture depending on the required analysis. The architecture of real-time Big is generic (application independent) that is used for any type of remote sensing Big Data analysis. Furthermore, the capabilities of filtering, dividing, and parallel processing of only useful information are performed by discarding all other extra data. These processes make a better choice for real-time remote sensing Big Data analysis.

The algorithms proposed in this paper for each unit and subunits are used to analyze remote sensing data sets, which helps in better understanding of land and sea area. The proposed architecture welcomes researchers and organizations for any type of remote sensory Big Data analysis by developing algorithms for each level of the architecture depending on their analysis requirement. For future work, we are planning to extend the proposed architecture to make it compatible for Big Data analysis for all applications, e.g., sensors and social networking. We are also planning to use the proposed architecture to perform complex analysis on earth observatory data for decision making at realtime, such as earthquake prediction, Tsunami prediction, fire detection, etc.

Real-Time Big Data Analytical Architecture for Remote Sensing Application

05/08/201902/07/2019 by admin

Rank-Based Similarity Search Reducing the Dimensional Dependence

05/08/201902/07/2019 by admin

This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values.

1.2 INTRODUCTION

Of the fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection, perhaps the most widely-encountered is that of similarity search. Similarity search is the foundation of k-nearest-neighbor (k-NN) classification, which often produces competitively-low error rates in practice, particularly when the number of classes is large. The error rate of nearest-neighbor classification has been shown to be ‘asymptotically optimal’ as the training set size increases. For clustering, many of the most effective and popular strategies require the determination of neighbor sets based at a substantial proportion of the data set objects: examples include hierarchical (agglomerative) methods such as content-based filtering methods for recommender systems and anomaly detection methods commonly make use of k-NN techniques, either through the direct use of k-NN search, or by means of k-NN cluster analysis.

A very popular density-based measure, the Local Outlier Factor (LOF), relies heavily on k-NN set computation to determine the relative density of the data in the vicinity of the test point [8]. For data mining applications based on similarity search, data objects are typically modeled as feature vectors of attributes for which some measure of similarity is defined Motivated at least in part by the impact of similarity search on problems in data mining, machine learning, pattern recognition, and statistics, the design and analysis of scalable and effective similarity search structures has been the subject of intensive research for many decades. Until relatively recently, most data structures for similarity search targeted low-dimensional real vector space representations and the euclidean or other Lp distance metrics.

However, many public and commercial data sets available today are more naturally represented as vectors spanning many hundreds or thousands of feature attributes that can be real or integer-valued, ordinal or categorical, or even a mixture of these types. This has spurred the development of search structures for more general metric spaces, such as the MultiVantage-Point Tree, the Geometric Near-neighbor Access Tree (GNAT), Spatial Approximation Tree (SAT), the M-tree, and (more recently) the Cover Tree (CT). Despite their various advantages, spatial and metric search structures are both limited by an effect often referred to as the curse of dimensionality.

One way in which the curse may manifest itself is in a tendency of distances to concentrate strongly around their mean values as the dimension increases. Consequently, most pairwise distances become difficult to distinguish, and the triangle inequality can no longer be effectively used to eliminate candidates from consideration along search paths. Evidence suggests that when the representational dimension of feature vectors is high (roughly 20 or more traditional similarity search accesses an unacceptably-high proportion of the data elements, unless the underlying data distribution has special properties. Even though the local neighborhood information employed by data mining applications is useful and meaningful, high data dimensionality tends to make this local information very expensive to obtain.

The performance of similarity search indices depends crucially on the way in which they use similarity information for the identification and selection of objects relevant to the query. Virtually all existing indices make use of numerical constraints for pruning and selection. Such constraints include the triangle inequality (a linear constraint on three distance values), other bounding surfaces defined in terms of distance (such as hypercubes or hyperspheres), range queries involving approximation factors as in Locality-Sensitive Hashing (LSH) or absolute quantities as additive distance terms. One serious drawback of such operations based on numerical constraints such as the triangle inequality or distance ranges is that the number of objects actually examined can be highly variable, so much so that the overall execution time cannot be easily predicted.

Similarity search, researchers and practitioners have investigated practical methods for speeding up the computation of neighborhood information at the expense of accuracy. For data mining applications, the approaches considered have included feature sampling for local outlier detection, data sampling for clustering, and approximate similarity search for k-NN classification. Examples of fast approximate similarity search indices include the BD-Tree, a widely-recognized benchmark for approximate k-NN search; it makes use of splitting rules and early termination to improve upon the performance of the basic KD-Tree. One of the most popular methods for indexing, Locality-Sensitive Hashing can also achieve good practical search performance for range queries by managing parameters that influence a tradeoff between accuracy and time.

1.3 LITRATURE SURVEY

THE RELEVANT SET CORRELATION MODEL FOR DATA CLUSTERING

AUTHOR:

PUBLISH:

EXPLANATION:

AUTHOR:

PUBLISH:

EXPLANATION:

AUTHOR:

PUBLISH:

EXPLANATION:

CHAPTER 2

2.0 SYSTEM ANALYSIS:

2.1 EXISTING SYSTEM:

2.1.1 DISADVANTAGES:

2.2 PROPOSED SYSTEM:

2.2.1 ADVANTAGES:

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

2.3.2 SOFTWARE REQUIREMENTS:

JAVA

Operating System : Windows XP or Win7
Front End : JAVA JDK 1.7
Back End : MYSQL Server
Server : Apache Tomact Server
Script : JSP Script
Document : MS-Office 2007

.NET

Operating System : Windows XP or Win7
Front End : Microsoft Visual Studio .NET 2008
Script : C# Script
Back End : MS-SQL Server 2005
Document : MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.

DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.

DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data’s in the physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

All processes must have at least one data flow in and one data flow out.
All processes should modify the incoming data, producing new forms of outgoing data.
Each data store must be involved with at least one data flow.
Each external entity must be involved with at least one data flow.
A data flow must be attached to at least one process.

3.1 ARCHITECTURE DIAGRAM

3.2 DATAFLOW DIAGRAM

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

3.3 CLASS DIAGRAM:

3.4 SEQUENCE DIAGRAM:

3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

4.1 ALGORITHM

4.2 MODULES:

4.3 MODULE DESCRIPTION:

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

5.1.2 TECHNICAL FEASIBILITY

5.1.3 SOCIAL FEASIBILITY:

5.2 SYSTEM TESTING:

5.2.1 UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.2 FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 3 NON-FUNCTIONAL TESTING:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.4 LOAD TESTING:

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

The Java Programming Language

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

Simple
- Architecture neutral
- Object oriented
- Portable
- Distributed
- High performance
- Interpreted
- Multithreaded
- Robust
- Dynamic
- Secure

6.2 THE JAVA PLATFORM:

The Java platform has two components:

The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
Applets: The set of conventions used by applets.
Networking: URLs, TCP (Transmission Control Protocol), UDP (User Data gram Protocol) sockets, and IP (Internet Protocol) addresses.
Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
Software components: Known as JavaBeans^TM, can plug into existing component architectures.
Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
Java Database Connectivity (JDBC^TM): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
Develop programs more quickly: Your development time may be as much as twice as fast versus writing the same program in C++. Why? You write fewer lines of code and it is a simpler programming language than C++.
Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java^TMProduct Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

6.5 ODBC:

6.6 JDBC:

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

6.7 JDBC Goals:

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

SQL Conformance

JDBC must be implemental on top of common database interfaces

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

Use strong, static typing wherever possible

Strong typing allows for more error checking to be done at compile time; also, less error appear at runtime.

Keep the common cases simple

Finally we decided to precede the implementation using Java Networking.

And for dynamically updating the cache table we go for MS Access database.

Java ha two things: a programming language and a platform.

Java is a high-level programming language that is all of the following

Simple Architecture-neutral

Object-oriented Portable

Distributed High-performance

Interpreted Multithreaded

Robust Dynamic Secure

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

6.7 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.

Port addresses

Sockets:

#include <sys/types.h>

#include <sys/socket.h>

int socket(int family, int type, int protocol);

6.8 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.

6.8.1. Map Visualizations:

6.8.2. Time Series Chart Interactivity

6.8.3. Dashboards

6.8.4. Property Editors

CHAPTER 7

7.0 APPENDIX

7.1 SAMPLE SCREEN SHOTS:

7.2 SAMPLE SOURCE CODE:

CHAPTER 8

8.1 CONCLUSION

8.2 FUTURE ENHANCEMENT:

PSMPA Patient Self-Controllable and Multi-Level Privacy-Preserving Cooperative Authentication in Dist

05/08/201902/07/2019 by admin

The Distributed m-healthcare cloud computing system considerably facilitates secure and efficient patient treatment for medical consultation by sharing personal health information among the healthcare providers. This system should bring about the challenge of keeping both the data confidentiality and patients’ identity privacy simultaneously. Many existing access control and anonymous authentication schemes cannot be straightforwardly exploited. To solve the problem proposed a novel authorized accessible privacy model (AAPM) is established. Patients can authorize physicians by setting an access tree supporting flexible threshold predicates.

Our new technique of attribute based designated verifier signature, a patient self-controllable multi-level privacy preserving cooperative authentication scheme (PSMPA) realizing three levels of security and privacy requirement in distributed m-healthcare cloud computing system is proposed. The directly authorized physicians, the indirectly authorized physicians and the unauthorized persons in medical consultation can respectively decipher the personal health information and/or verify patients’ identities by satisfying the access tree with their own attribute sets.

1.2 INTRODUCTION:

Distributed m-healthcare cloud computing systems have been increasingly adopted worldwide including the European Commission activities, the US Health Insurance Portability and Accountability Act (HIPAA) and many other governments for efficient and high-quality medical treatment. In m-healthcare social networks, the personal health information is always shared among the patients located in respective social communities suffering from the same disease for mutual support, and across distributed healthcare providers (HPs) equipped with their own cloud servers for medical consultant. However, it also brings about a series of challenges, especially how to ensure the security and privacy of the patients’ personal health information from various attacks in the wireless communication channel such as eavesdropping and tampering As to the security facet, one of the main issues is access control of patients’ personal health information, namely it is only the authorized physicians or institutions that can recover the patients’ personal health information during the data sharing in the distributed m-healthcare cloud computing system. In practice, most patients are concerned about the confidentiality of their personal health information since it is likely to make them in trouble for each kind of unauthorized collection and disclosure.

Therefore, in distributed m-healthcare cloud computing systems, which part of the patients’ personal health information should be shared and which physicians their personal health information should be shared with have become two intractable problems demanding urgent solutions. There has emerged various research results focusing on them. A fine-grained distributed data access control scheme is proposed using the technique of attribute based encryption (ABE). A rendezvous-based access control method provides access privilege if and only if the patient and the physician meet in the physical world. Recently, a patient-centric and fine-grained data access control in multi-owner settings is constructed for securing personal health records in cloud computing. However, it mainly focuses on the central cloud computing system which is not sufficient for efficiently processing the increasing volume of personal health information in m-healthcare cloud computing system.

Moreover, it is not enough for to only guarantee the data confidentiality of the patient’s personal health information in the honest-but-curious cloud server model since the frequent communication between a patient and a professional physician can lead the adversary to conclude that the patient is suffering from a specific disease with a high probability. Unfortunately, the problem of how to protect both the patients’ data confidentiality and identity privacy in the distributed m-healthcare cloud computing scenario under the malicious model was left untouched.

In this paper, we consider simultaneously achieving data confidentiality and identity privacy with high efficiency. As is described in Fig. 1, in distributed m-healthcare cloud computing systems, all the members can be classified into three categories: the directly authorized physicians with green labels in the local healthcare provider who are authorized by the patients and can both access the patient’s personal health information and verify the patient’s identity and the indirectly authorized physicians with yellow labels in the remote healthcare providers who are authorized by the directly authorized physicians for medical consultant or some research purposes (i.e., since they are not authorized by the patients, we use the term ‘indirectly authorized’ instead). They can only access the personal health information, but not the patient’s identity. For the unauthorized persons with red labels, nothing could be obtained. By extending the techniques of attribute based access control and designated verifier signatures (DVS) on de-identified health information

1.3 LITRATURE SURVEY

SECURING PERSONAL HEALTH RECORDS IN CLOUD COMPUTING: PATIENT-CENTRIC AND FINE-GRAINED DATA ACCESS CONTROL IN MULTI-OWNER SETTINGS

AUTHOR: M. Li, S. Yu, K. Ren, and W. Lou

PUBLISH: Proc. 6th Int. ICST Conf. Security Privacy Comm. Netw., 2010, pp. 89–106.

EXPLANATION:

Online personal health record (PHR) enables patients to manage their own medical records in a centralized way, which greatly facilitates the storage, access and sharing of personal health data. With the emergence of cloud computing, it is attractive for the PHR service providers to shift their PHR applications and storage into the cloud, in order to enjoy the elastic resources and reduce the operational cost. However, by storing PHRs in the cloud, the patients lose physical control to their personal health data, which makes it necessary for each patient to encrypt her PHR data before uploading to the cloud servers. Under encryption, it is challenging to achieve fine-grained access control to PHR data in a scalable and efficient way. For each patient, the PHR data should be encrypted so that it is scalable with the number of users having access. Also, since there are multiple owners (patients) in a PHR system and every owner would encrypt her PHR files using a different set of cryptographic keys, it is important to reduce the key distribution complexity in such multi-owner settings. Existing cryptographic enforced access control schemes are mostly designed for the single-owner scenarios. In this paper, we propose a novel framework for access control to PHRs within cloud computing environment. To enable fine-grained and scalable access control for PHRs, we leverage attribute based encryption (ABE) techniques to encrypt each patients’ PHR data. To reduce the key distribution complexity, we divide the system into multiple security domains, where each domain manages only a subset of the users. In this way, each patient has full control over her own privacy, and the key management complexity is reduced dramatically.

PRIVACY AND EMERGENCY RESPONSE IN E-HEALTHCARE LEVERAGING WIRELESS BODY SENSOR NETWORKS

AUTHOR: J. Sun, Y. Fang, and X. Zhu

PUBLISH: IEEE Wireless Commun., vol. 17, no. 1, pp. 66–73, Feb. 2010.

EXPLANATION:

Electronic healthcare is becoming a vital part of our living environment and exhibits advantages over paper-based legacy systems. Privacy is the foremost concern of patients and the biggest impediment to e-healthcare deployment. In addressing privacy issues, conflicts from the functional requirements must be taken into account. One such requirement is efficient and effective response to medical emergencies. In this article, we provide detailed discussions on the privacy and security issues in e-healthcare systems and viable techniques for these issues. Furthermore, we demonstrate the design challenge in the fulfillment of conflicting goals through an exemplary scenario, where the wireless body sensor network is leveraged, and a sound solution is proposed to overcome the conflict.

HCPP: CRYPTOGRAPHY BASED SECURE EHR SYSTEM FOR PATIENT PRIVACY AND EMERGENCY HEALTHCARE

AUTHOR: J. Sun, X. Zhu, C. Zhang, and Y. Fang

PUBLISH: Proc. 31st Int. Conf. Distrib. Comput. Syst., 2011, pp. 373–382.

EXPLANATION:

Privacy concern is arguably the major barrier that hinders the deployment of electronic health record (EHR) systems which are considered more efficient, less error-prone, and of higher availability compared to traditional paper record systems. Patients are unwilling to accept the EHR system unless their protected health information (PHI) containing highly confidential data is guaranteed proper use and disclosure, which cannot be easily achieved without patients’ control over their own PHI. However, cautions must be taken to handle emergencies in which the patient may be physically incompetent to retrieve the controlled PHI for emergency treatment. In this paper, we propose a secure EHR system, HCPP (Healthcaresystem for Patient Privacy), based on cryptographic constructions and existing wireless network infrastructures, to provide privacy protection to patients under any circumstances while enabling timelyPHI retrieval for life-saving treatment in emergency situations. Furthermore, our HCPP system restricts PHI access to authorized (not arbitrary) physicians, who can be traced and held accountable if the accessed PHI is found improperly disclosed. Last but not least, HCPP leverages wireless network access to support efficient and private storage/retrieval of PHI, which underlies a secure and feasible EHR system.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Existing system data confidentiality is much important but in existing system framework it is not enough for to only guarantee the data confidentiality of the patient’s personal health information in the honest-but-curious cloud server model since the frequent communication between a patient and a professional physician can lead the adversary to conclude that the patient is suffering from a specific disease with a high probability. Unfortunately, the problem of how to protect both the patients’ data confidentiality and identity privacy in the distributed m-healthcare cloud computing scenario under the malicious model was left untouched.

Patients are unwilling to accept the EHR system unless their protected health information (PHI) containing highly confidential data is guaranteed proper use and disclosure, which cannot be easily achieved without patients’ control over their own PHI. However, cautions must be taken to handle emergencies in which the patient may be physically incompetent to retrieve the controlled PHI for emergency treatment a secure EHR system, HCPP (Health care system for Patient Privacy), based on cryptographic constructions and existing wireless network infrastructures, to provide privacy protection to patients under any circumstances while enabling timelyPHI retrieval for life-saving treatment in emergency situations.

2.1.1 DISADVANTAGES:

Existing applications in e-healthcare scenario can be realized through real-time, continuous vital monitoring to give immediate alerts of changes in patient status. Also, the WBAN operates in environments with open access by various people such as hospital or medical organization, which also accommodates attackers. The open wireless channel makes the data prone to be eavesdropped, modified, and injected. Many kinds of security threats have been existed, such as unauthenticated or unauthorized access, message disclosure, message modification, denial-of-service, node capture and compromised node, and routing attacks, etc. Among which two kinds of threats play the leading role, the threats from device compromise and the threats from network dynamics.

Existing problem of security is rising nowadays. Especially, the privacy of communication through Internet may be at risk of attacking in a number of ways. On-line collecting, transmitting, and processing of personal data cause a severe threat to privacy. Once the utilization of Internet-based services is concerned on-line, the lack of privacy in network communication is the main conversation in the public. This problem is far more significant in modern medical environment, as e-healthcare networks are implemented and developed. According to common standards, the network linked with general practitioners, hospitals, and social centers at a national or international scale. While suffering the risk of leaking the privacy data, such networks’ privacy information is facing great danger.

Data confidentiality is low.
Data redundancy is high.
There is a violation in data security.

2.2 PROPOSED SYSTEM:

We presented a new architecture of pseudonymiaztion for protecting privacy in E-health (PIPE) integrated pseudonymization of medical data, identity management, obfuscation of metadata with anonymous authentication to prevent disclosure attacks and statistical analysis in and suggested a secure mechanism guaranteeing anonymity and privacy in both the personal health information transferring and storage at a central m-healthcare cloud server.

We proposed an anonymous authentication of membership in dynamic groups. However, since the anonymous authentication mentioned above are established based on public key infrastructure (PKI), the need of an online certificate authority (CA) and one unique public key encryption for each symmetric key k for data encryption at the portal of authorized physicians made the overhead of the construction grow linearly with size of the group. Furthermore, the anonymity level depends on the size of the anonymity set making the anonymous authentication impractical in specific surroundings where the patients are sparsely distributed.

In this paper, the security and anonymity level of our proposed construction is significantly enhanced by associating it to the underlying Gap Bilinear Diffie-Hellman (GBDH) problem and the number of patients’ attributes to deal with the privacy leakage in patient sparsely distributed scenarios significantly, without the knowledge of which physician in the healthcare provider is professional in treating his illness, the best way for the patient is to encrypt his own PHI under a specified access policy rather than assign each physician a secret key. As a result, the authorized physicians whose attribute set satisfy the access policy can recover the PHI and the access control management also becomes more efficient.

2.2.1 ADVANTAGES:

Our advantages a patient-centric and fine-grained data access control using ABE to secure personal health records in cloud computing without privacy-preserving authentication. For comparison, to achieve the same functions of PSMPA, it could be considered as the combination of ABE and DVS that the computational complexity of PSMPA remains constant regardless of the number of directly authorized physicians and nearly half of the combination construction of ABE and DVS supporting flexible predicate.

The communication cost of PSMPA also remains constant; almost half of the combination construction and independent of the number of attributes d in that though the storage overhead of PSMPA is slightly more than the combination construction, it is independent of the number of directly authorized physicians and performs significantly better than traditional DVS, all of whose computational, communication and storage overhead increase linearly to the number of directly authorized physicians.

M-healthcare system is fully controlled and secured with encryption standards.
There is no data loss and data redundancy.
System provides full protection for patient’s data and their attributes.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

2.3.2 SOFTWARE REQUIREMENTS:

Operating System : Windows XP or Win7
Front End : Microsoft Visual Studio .NET 2008
Script : C# Script
Back End : MS-SQL Server 2005
Document : MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.

DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.

DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data’s in the physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

All processes must have at least one data flow in and one data flow out.
All processes should modify the incoming data, producing new forms of outgoing data.
Each data store must be involved with at least one data flow.
Each external entity must be involved with at least one data flow.
A data flow must be attached to at least one process.

3.1 ARCHITECTURE DIAGRAM

3.2 DATAFLOW DIAGRAM:

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

3.3 CLASS DIAGRAM:

3.4 SEQUENCE DIAGRAM:

3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

In our implementation, we choose MIRACLE Library for simulating cryptographic operations using Microsoft C/C++ compilers. To achieve a comparable security of 1,024-bit RSA, According to the standards of Paring-based Crypto Librarya patient-centric and fine-grained data access control using ABE to secure personal health records in cloud computing [30] without privacy-preserving authentication. For comparison, to achieve the same functions of PSMPA, it could be considered as the combination of ABE and DVS that the computational complexity of PSMPA remains constant regardless of the number of directly authorized physicians and nearly half of the combination construction of ABE and DVS supporting flexible predicate. Fig. 5 illustrates the communication cost of PSMPA also remains constant, almost half of the combination construction and independent of the number of attributes d in vD. Fig. 6 shows that though the storage overhead of PSMPA is slightly more than the combination construction, it is independent of the number of directly authorized physicians and performs significantly better than traditional DVS, all of whose computational, communication and storage overhead increase linearly to the number of directly authorized physicians. that the computational and communication overhead of the combination construction decrease slightly faster than PSMPA as the threshold k increases, however, even when k reaches the maximum value equaling to d, the overheads are still much more than PSMPA. The comparison between our scheme and the anonymous authentication based on PKI the storage, communication and computational overhead towards N and k is identical to DVS, since to realize the same identity privacy, in all the constructions a pair of public key and private key would be assigned to each directly authorized physician and the number of signature operations is also linear to the number of physicians, independent of the threshold k. The simulation results show our PSMPA better adapts to the distributed m-healthcare cloud computing system than previous schemes, especially for enhancing the energy constrained mobile devices (the data sink’s) efficiency.

4.1 ALGORITHM

Attribute Based Designated Verifier Signature Scheme We propose a patient self-controllable and multi-level privacy-preserving cooperative authentication scheme based on ADVS to realize three levels of security and privacy requirement in distributed m-healthcare cloud computing system which mainly consists of the following five algorithms: Setup, Key Extraction, Sign, Verify and Transcript Simulation Generation. Denote the universe of attributes as U.

4.2 MODULES:

E-HEALTHCARE SYSTEM FRAMEWORK:

AUTHORIZED ACCESSIBLE PRIVACY MODEL:

SECURITY VERIFICATION:

PERFORMANCE EVALUATION:

4.3 MODULE DESCRIPTION:

E-healthcare System Framework:

E-healthcare System consists of three components: body area networks (BANs), wireless transmission networks and the healthcare providers equipped with their own cloud servers. The patient’s personal health information is securely transmitted to the healthcare provider for the authorized physicians to access and perform medical treatment. Illustrate the unique characteristics of distributed m-healthcare cloud computing systems where all the personal health information can be shared among patients suffering from the same disease for mutual support or among the authorized physicians in distributed healthcare providers and medical research institutions for medical consultation.

Authorized accessible privacy model:

Multi-level privacy-preserving cooperative authentication is established to allow the patients to authorize corresponding privileges to different kinds of physicians located in distributed healthcare providers by setting an access tree supporting flexible threshold predicates. Propose a novel authorized accessible privacy model for distributed m-healthcare cloud computing systems which consists of the following two components: an attribute based designated verifier signature scheme (ADVS) and the corresponding adversary model.

Security Verification:

The security and anonymity level of our proposed construction is significantly enhanced by associating it to the underlying Gap Bilinear Diffie-Hellman (GBDH) problem and the number of patients’ attributes to deal with the privacy leakage in patient sparsely distributed scenarios. More significantly, without the knowledge of which physician in the healthcare provider is professional in treating his illness, the best way for the patient is to encrypt his own PHI under a specified access policy rather than assign each physician a secret key. As a result, the authorized physicians whose attribute set satisfy the access policy can recover the PHI and the access control management also becomes more efficient.

Performance Evaluation:

The efficiency of PSMPA in terms of storage overhead, computational complexity and communication cost. a patient-centric and fine-grained data access control using ABE to secure personal health records in cloud computing without privacy-preserving authentication. To achieve the same security, our construction performs more efficiently than the traditional designated verifier signature for all the directly authorized physicians, where the overheads are linear to the number of directly authorized physicians.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

5.1.2 TECHNICAL FEASIBILITY:

5.1.3 SOCIAL FEASIBILITY:

5.2 SYSTEM TESTING:

5.2.1 UNIT TESTING:

UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.3 FUNCTIONAL TESTING:

FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 4 NON-FUNCTIONAL TESTING:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.5 LOAD TESTING:

Load Testing

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.7 WHITE BOX TESTING:

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

5.1.10 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 7

7.0 SOFTWARE SPECIFICATION:

7.1 FEATURES OF .NET:

7.2 THE .NET FRAMEWORK

The .NET Framework has two main parts:

1. The Common Language Runtime (CLR).

2. A hierarchical set of class libraries.

The CLR is described as the “execution engine” of .NET. It provides the environment within which programs run. The most important features are

Conversion from a low-level assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
Memory management, notably including garbage collection.
Checking and enforcing security restrictions on the running code.
Loading and executing programs, with version control and other such features.
The following features of the .NET framework are also worth description:

Managed Code

Managed Data

Common Type System

Common Language Specification

7.3 THE CLASS LIBRARY

The set of classes is pretty comprehensive, providing collections, file, screen, and network I/O, threading, and so on, as well as XML and database connectivity.

The class library is subdivided into a number of sets (or namespaces), each providing distinct areas of functionality, with dependencies between the namespaces kept to a minimum.

7.4 LANGUAGES SUPPORTED BY .NET

Visual Basic .NET is also CLS compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in Visual Basic .NET.

Other languages for which .NET compilers are available include

FORTRAN
COBOL
Eiffel

ASP.NET XML WEB SERVICES	Windows Forms
Base Class Libraries
Common Language Runtime
Operating System

Fig1 .Net Framework

CONSTRUCTORS AND DESTRUCTORS:

GARBAGE COLLECTION

OVERLOADING

MULTITHREADING:

STRUCTURED EXCEPTION HANDLING

7.5 THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies application development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK

1. To provide a consistent object-oriented programming environment whether object codes is stored and executed locally on Internet-distributed, or executed remotely.

2. To provide a code-execution environment to minimizes software deployment and guarantees safe execution of code.

3. Eliminates the performance problems.

There are different types of application, such as Windows-based applications and Web-based applications.

7.6 FEATURES OF SQL-SERVER

SQL-SERVER database consist of six type of objects,

They are,

1. TABLE

2. QUERY

3. FORM

4. REPORT

5. MACRO

7.7 TABLE:

A database is a collection of data about a specific topic.

VIEWS OF TABLE:

We can work with a table in two types,

1. Design View

2. Datasheet View

Design View

To build or modify the structure of a table we work in the table design view. We can specify what kind of data will be hold.

Datasheet View

To add, edit or analyses the data itself we work in tables datasheet view mode.

QUERY:

CHAPTER 7

7.0 APPENDIX

7.1 SAMPLE SCREEN SHOTS:

7.2 SAMPLE SOURCE CODE:

CHAPTER 8

8.1 CONCLUSION AND FUTURE ENHANCEMENT:

In this paper, a novel authorized accessible privacy model and a patient self-controllable multi-level privacy preserving cooperative authentication scheme realizing three different levels of security and privacy requirement in the distributed m-healthcare cloud computing system are proposed, followed by the formal security proof and efficiency evaluations which illustrate our PSMPA can resist various kinds of malicious attacks and far outperforms previous schemes in terms of storage, computational and communication overhead.

Privacy-Preserving Detection of Sensitive Data Exposure

05/08/201902/07/2019 by admin

Statistics from security firms, research institutions and government organizations show that the numbers of data-leak instances have grown rapidly in recent years. Among various data-leak cases, human mistakes are one of the main causes of data loss. There exist solutions detecting inadvertent sensitive data leaks caused by human mistakes and to provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced.

In this paper, we present a privacy preserving data-leak detection (DLD) solution to solve the issue where a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semihonest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method can support accurate detection with very small number of false alarms under various data-leak scenarios.

1.2 INTRODUCTION

According to a report from Risk Based Security (RBS), the number of leaked sensitive data records has increased dramatically during the last few years, i.e., from 412 million in 2012 to 822 million in 2013. Deliberately planned attacks, inadvertent leaks (e.g., forwarding confidential emails to unclassified email accounts), and human mistakes (e.g., assigning the wrong privilege) lead to most of the data-leak incidents. Detecting and preventing data leaks requires a set of complementary solutions, which may include data-leak detection, data confinement, stealthy malware detection and policy enforcement.

Network data-leak detection (DLD) typically performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique to analyze payloads of IP/TCP packets for inspecting application layer data, e.g., HTTP header/content. Alerts are triggered when the amount of sensitive data found in traffic passes a threshold. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS). Straightforward realizations of data-leak detection require the plaintext sensitive data.

However, this requirement is undesirable, as it may threaten the confidentiality of the sensitive information. If a detection system is compromised, then it may expose the plaintext sensitive data (in memory). In addition, the data owner may need to outsource the data-leak detection to providers, but may be unwilling to reveal the plaintext sensitive data to them. Therefore, one needs new data-leak detection solutions that allow the providers to scan content for leaks without learning the sensitive information.

In this paper, we propose a data-leak detection solution which can be outsourced and be deployed in a semihonest detection environment. We design, implement, and evaluate our fuzzy fingerprint technique that enhances data privacy during data-leak detection operations. Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data. Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests, or the content being inspected. Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them.

In our detection procedure, the data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them. To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noises. It is the data owner, who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.

Our contributions are summarized as follows.

1) We describe a privacy-preserving data-leak detection model for preventing inadvertent data leak in network traffic. Our model supports detection operation delegation and ISPs can provide data-leak detection as an add-on service to their customers using our model. We design, implement, and evaluate an efficient technique, fuzzy fingerprint, for privacy-preserving data-leak detection. Fuzzy fingerprints are special sensitive data digests prepared by the data owner for release to the DLD provider.

2) We implement our detection system and perform extensive experimental evaluation on 2.6 GB Enron dataset, Internet surfing traffic of 20 users, and also 5 simulated real-worlds data-leak scenarios to measure its privacy guarantee, detection rate and efficiency. Our results indicate high accuracy achieved by our underlying scheme with very low false positive rate. Our results also show that the detection accuracy does not degrade much when only partial (sampled) sensitive-data digests are used. In addition, we give an empirical analysis of our fuzzification as well as of the fairness of fingerprint partial disclosure.

1.3 LITRATURE SURVEY

PRIVACY-AWARE COLLABORATIVE SPAM FILTERING

AUTHORS: K. Li, Z. Zhong, and L. Ramaswamy

PUBLISH: IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 5, pp. 725–739, May 2009.

EXPLANATION:

While the concept of collaboration provides a natural defense against massive spam e-mails directed at large numbers of recipients, designing effective collaborative anti-spam systems raises several important research challenges. First and foremost, since e-mails may contain confidential information, any collaborative anti-spam approach has to guarantee strong privacy protection to the participating entities. Second, the continuously evolving nature of spam demands the collaborative techniques to be resilient to various kinds of camouflage attacks. Third, the collaboration has to be lightweight, efficient, and scalable. Toward addressing these challenges, this paper presents ALPACAS-a privacy-aware framework for collaborative spam filtering. In designing the ALPACAS framework, we make two unique contributions. The first is a feature-preserving message transformation technique that is highly resilient against the latest kinds of spam attacks. The second is a privacy-preserving protocol that provides enhanced privacy guarantees to the participating entities. Our experimental results conducted on a real e-mail data set shows that the proposed framework provides a 10 fold improvement in the false negative rate over the Bayesian-based Bogofilter when faced with one of the recent kinds of spam attacks. Further, the privacy breaches are extremely rare. This demonstrates the strong privacy protection provided by the ALPACAS system.

DATA LEAK DETECTION AS A SERVICE: CHALLENGES AND SOLUTIONS

AUTHORS: X. Shu and D. Yao

PUBLISH: Proc. 8th Int. Conf. Secur. Privacy Commun. Netw., 2012, pp. 222–240

EXPLANATION:

We describe network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small amount of specialized digests are needed. Our technique – referred to as the fuzzy fingerprint – can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others. We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation on the privacy, efficiency, accuracy and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with very small number of false alarms, even when the presentation of the data has been transformed. It also indicates that the detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework.

QUANTIFYING INFORMATION LEAKS IN OUTBOUND WEB TRAFFIC

AUTHORS: K. Borders and A. Prakash

PUBLISH: Proc. 30th IEEE Symp. Secur. Privacy, May 2009, pp. 129–140.

EXPLANATION:

As the Internet grows and network bandwidth continues to increase, administrators are faced with the task of keeping confidential information from leaving their networks. Todaypsilas network traffic is so voluminous that manual inspection would be unreasonably expensive. In response, researchers have created data loss prevention systems that check outgoing traffic for known confidential information. These systems stop naive adversaries from leaking data, but are fundamentally unable to identify encrypted or obfuscated information leaks. What remains is a high-capacity pipe for tunneling data to the Internet. We present an approach for quantifying information leak capacity in network traffic. Instead of trying to detect the presence of sensitive data-an impossible task in the general case–our goal is to measure and constrain its maximum volume. We take advantage of the insight that most network traffic is repeated or determined by external information, such as protocol specifications or messages sent by a server. By filtering this data, we can isolate and quantify true information flowing from a computer. In this paper, we present measurement algorithms for the Hypertext Transfer Protocol (HTTP), the main protocol for Web browsing. When applied to real Web browsing traffic, the algorithms were able to discount 98.5% of measured bytes and effectively isolate information leaks.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Existing detecting and preventing data leaks requires a set of complementary solutions, which may include data-leak detection, data confinement, stealthy malware detection, and policy enforcement.

Network data-leak detection (DLD) typically performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique to analyze payloads of IP/TCP packets for inspecting application layer data, e.g., HTTP header/content.

Alerts are triggered when the amount of sensitive data found in traffic passes a threshold. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS).

Straightforward realizations of data-leak detection require the plaintext sensitive data. However, this requirement is undesirable, as it may threaten the confidentiality of the sensitive information. If a detection system is compromised, then it may expose the plaintext sensitive data (in memory).

In addition, the data owner may need to outsource the data-leak detection to providers, but may be unwilling to reveal the plaintext sensitive data to them. Therefore, one needs new data-leak detection solutions that allow the providers to scan content for leaks without learning the sensitive information.

2.1.1 DISADVANTAGES:

As the Internet grows and network bandwidth continues to increase, administrators are faced with the task of keeping confidential information from leaving their networks. In response, researchers have created data loss prevention systems that check outgoing traffic for known confidential information.

These systems stop naive adversaries from leaking data, but are fundamentally unable to identify encrypted or obfuscated information leaks. What remains is a high-capacity pipe for tunneling data to the Internet.

Existing approach for quantifying information leak capacity in network traffic instead of trying to detect the presence of sensitive data-an impossible task in the general case–our goal is to measure and constrain its maximum volume.

We take disadvantage of the insight that most network traffic is repeated or determined by external information, such as protocol specifications or messages sent by a server. By filtering this data, we can isolate and quantify true information flowing from a computer.

2.2 PROPOSED SYSTEM:

We propose a data-leak detection solution which can be outsourced and be deployed in a semihonest detection environment. We design, implement, and evaluate our fuzzy fingerprint technique that enhances data privacy during data-leak detection operations.

Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data.

Our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests, or the content being inspected. Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them.

Our detection procedure, the data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them.

To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noises. It is the data owner, who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.

2.2.1 ADVANTAGES:

We describe privacy-preserving data-leak detection model for preventing inadvertent data leak in network traffic. Our model supports detection operation delegation and ISPs can provide data-leak detection as an add-on service to their customers using our model.

We design, implement, and evaluate an efficient technique, fuzzy fingerprint, for privacy-preserving data-leak detection. Fuzzy fingerprints are special sensitive data digests prepared by the data owner for release to the DLD provider.

We implement our detection system and perform extensive experimental evaluation on internet surfing traffic of 20 users, and also 5 simulated real-worlds data-leak scenarios to measure its privacy guarantee, detection rate and efficiency.

Our results indicate high accuracy achieved by our underlying scheme with very low false positive rate. Our results also show that the detection accuracy does not degrade much when only partial (sampled) sensitive-data digests are used an empirical analysis of our fuzzification as well as of the fairness of fingerprint partial disclosure.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

v Processor – Pentium –IV

Speed – 1.1 GHz
- RAM – 256 MB (min)
- Hard Disk – 20 GB
- Floppy Drive – 1.44 MB
- Key Board – Standard Windows Keyboard
- Mouse – Two or Three Button Mouse
- Monitor – SVGA

2.3.2 SOFTWARE REQUIREMENTS:

Operating System : Windows XP or Win7
Front End : Microsoft Visual Studio .NET
Back End : MS-SQL Server
Server : ASP .NET Web Server
Script : C# Script
Document : MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.

DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.

DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data’s in the physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

All processes must have at least one data flow in and one data flow out.
All processes should modify the incoming data, producing new forms of outgoing data.
Each data store must be involved with at least one data flow.
Each external entity must be involved with at least one data flow.
A data flow must be attached to at least one process.

3.2 DATAFLOW DIAGRAM

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

3.3 CLASS DIAGRAM:

3.4 SEQUENCE DIAGRAM:

3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

FUZZY FINGERPRINT METHOD AND PROTOCOL

We describe technical details of our fuzzy fingerprint mechanism in this section. The DLD provider obtains digests of sensitive data from the data owner. The data owner uses a sliding window and Rabin fingerprint algorithm to generate short and hard to-reverse (i.e., one-way) digests through the fast polynomial modulus operation. The sliding window generates small fragments of the processed data (sensitive data or network traffic), which preserves the local features of the data and provides the noise tolerance property. Rabin fingerprints are computed as polynomial modulus operations, and can be implemented with fast XOR, shift, and table look-up operations.

The Rabin fingerprint algorithm has a unique min-wise independence property, which supports fast random fingerprints selection (in uniform distribution) for partial fingerprints disclosure. The shingle-and-fingerprint process is defined as follows. A sliding window is used to generate q-grams on an input binary string first. The fingerprints of q-grams are then computed. A shingle (q-gram) is a fixed-size sequence of contiguous bytes. For example, the 3-gram shingle set of string abcdefgh consists of six elements {abc, bcd, cde, def, efg, fgh}. Local feature preservation is accomplished through the use of shingles. Therefore, our approach can tolerate sensitive data modification to some extent, e.g., inserted tags, small amount of character substitution, and lightly reformatted data.

From the detection perspective, a straightforward method is for the DLD provider to raise an alert if any sensitive fingerprint matches the fingerprints from the traffic.1 However, this approach has a privacy issue. If there is a data leak, there is a match between two fingerprints from sensitive data and network traffic. Then, the DLD provider learns the corresponding shingle, as it knows the content of the packet. Therefore, the central challenge is to prevent the DLD provider from learning the sensitive values even in data-leak scenarios, while allowing the provider to carry out the traffic inspection.

We propose an efficient technique to address this problem. The main idea is to relax the comparison criteria by strategically introducing matching instances on the DLD provider’s side without increasing false alarms for the data owner. Specifically, i) the data owner perturbs the sensitive-data fingerprints before disclosing them to the DLD provider, and ii) the DLD provider detects leaking by a range-based comparison instead of the exact match. The range used in the comparison is pre-defined by the data owner and correlates to the perturbation procedure.

4.2 MODULES:

NETWORK SECURITY PRIVACY:

SECURITY GOAL AND THREAT MODEL:

PRIVACY GOAL AND THREAT MODEL:

PRIVACY-ENHANCING DLD:

EXPERIMENTAL EVALUATION

4.3 MODULE DESCRIPTION:

NETWORK SECURITY PRIVACY:

SECURITY GOAL AND THREAT MODEL:

We categorize three causes for sensitive data to appear on the outbound traffic of an organization, including the legitimate data use by the employees.

• Case I Inadvertent data leak: The sensitive data is accidentally leaked in the outbound traffic by a legitimate user. This paper focuses on detecting this type of accidental data leaks over supervised network channels. Inadvertent data leak may be due to human errors such as forgetting to use encryption, carelessly forwarding an internal email and attachments to outsiders or due to application flaws (such as described in a supervised network channel could be an unencrypted channel or an encrypted channel where the content in it can be extracted and checked by an authority. Such a channel is widely used for advanced NIDS where MITM (man-in-the-middle) SSL sessions are established instead of normal SSL sessions.

• Case II Malicious data leak: A rogue insider or a piece of stealthy software may steal sensitive personal or organizational data from a host. Because the malicious adversary can use strong private encryption, steganography or covert channels to disable content-based traffic inspection, this type of leaks is out of the scope of our network-based solution host-based defenses (such as detecting the infection onset need to be deployed instead.

• Case III Legitimate and intended data transfer: The sensitive data is sent by a legitimate user intended for legitimate purposes. In this paper, we assume that the data owner is aware of legitimate data transfers and permits such transfers. So the data owner can tell whether a piece of sensitive data in the network traffic is a leak using legitimate data transfer policies.

PRIVACY GOAL AND THREAT MODEL:

DLD provider from gaining knowledge of sensitive data during the detection process, we need to set up a privacy goal that is complementary to the security goal above. We model the DLD provider as a semi-honest adversary, who follows our protocol to carry out the operations, but may attempt to gain knowledge about the sensitive data of the data owner. Our privacy goal is defined as follows. The DLD provider is given digests of sensitive data from the data owner and the content of network traffic to be examined. The DLD provider should not find out the exact value of a piece of sensitive data with a probability greater than 1 K, where K is an integer representing the number of all possible sensitive-data candidates that can be inferred by the DLD provider. We present a privacy-preserving DLD model with a new fuzzy fingerprint mechanism to improve the data protection against semi-honest DLD provider. We generate digests of sensitive data through a one-way function, and then hide the sensitive values among other non-sensitive values via fuzzification. The privacy guarantee of such an approach is much higher than 1 K when there is no leak in traffic, because the adversary’s inference can only be gained through brute-force guesses. The traffic content is accessible by the DLD provider in plaintext. Therefore, in the event of a data leak, the DLD provider may learn sensitive information from the traffic, which is inevitable for all deep packet inspection approaches. Our solution confines the amount of maximal information learned during the detection and provides quantitative guarantee for data privacy.

PRIVACY-ENHANCING DLD:

Our privacy-preserving data-leak detection method supports practical data-leak detection as a service and minimizes the knowledge that a DLD provider may gain during the process. Fig. 1 lists the six operations executed by the data owner and the DLD provider in our protocol. They include PREPROCESS run by the data owner to prepare the digests of sensitive data, RELEASE for the data owner to send the digests to the DLD provider, MONITOR and DETECT for the DLD provider to collect outgoing traffic of the organization, compute digests of traffic content, and identify potential leaks, REPORT for the DLD provider to return data-leak alerts to the data owner where there may be false positives (i.e., false alarms), and POSTPROCESS for the data owner to pinpoint true data-leak instances. Details are presented in the next section. The protocol is based on strategically computing data similarity, specifically the quantitative similarity between the sensitive information and the observed network traffic. High similarity indicates potential data leak. For data-leak detection, the ability to tolerate a certain degree of data transformation in traffic is important. We refer to this property as noise tolerance.

Our key idea for fast and noise-tolerant comparison is the design and use of a set of local features that are representatives of local data patterns, e.g., when byte b2 appears in the sensitive data, it is usually surrounded by bytes b1 and b3 forming a local pattern b1, b2, b3. Local features preserve data patterns even when modifications (insertion, deletion, and substitution) are made to parts of the data. For example, if a byte b4 is inserted after b3, the local pattern b1, b2, b3 is retained though the global pattern (e.g., a hash of the entire document) is destroyed. To achieve the privacy goal, the data owner generates a special type of digests, which we call fuzzy fingerprints. Intuitively, the purpose of fuzzy fingerprints is to hide the true sensitive data in a crowd. It prevents the DLD provider from learning its exact value. We describe the technical details next.

EXPERIMENTAL EVALUATION:

Our data-leak detection solution can be outsourced and be deployed in a fuzzy fingerprint technique that enhances data privacy during data-leak detection operations. Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data. Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests, or the content being inspected.

Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. Our fuzzy fingerprint framework in Python, including packet collection, shingling, Rabin fingerprinting, as well as partial disclosure and fingerprint filter extensions Rabin fingerprint is based on cyclic redundancy code (CRC). We use the padding scheme mentioned in to handle small inputs. In all experiments, the shingles are in 8-byte, and the fingerprints are in 32-bit (33-bit irreducible polynomials in Rabin fingerprint).

We set up a networking environment in VirtualBox, and make a scenario where the sensitive data is leaked from a local network to the Internet. Multiple users’ hosts (Windows 7) are put in the local network, which connect to the Internet via a gateway (Fedora). Multiple servers (HTTP, FTP, etc.) and an attacker-controlled host are put on the Internet side. The gateway dumps the network traffic and sends it to a DLD server/provider (Linux). Using the sensitive-data fingerprints defined by the users in the local network, the DLD server performs off-line data-leak detection.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:

5.1.2 TECHNICAL FEASIBILITY:

5.1.3 SOCIAL FEASIBILITY:

5.2 SYSTEM TESTING:

5.2.1 UNIT TESTING:

UNIT TESTING:

Description	Expected result
Test for application window properties.	All the properties of the windows are to be properly aligned and displayed.
Test for mouse operations.	All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.3 FUNCTIONAL TESTING:

FUNCTIONAL TESTING:

Description	Expected result
Test for all modules.	All peers should communicate in the group.
Test for various peer in a distributed network framework as it display all users available in the group.	The result after execution should give the accurate result.

5.1. 4 NON-FUNCTIONAL TESTING:

Load testing
Performance testing
Usability testing
Reliability testing
Security testing

5.1.5 LOAD TESTING:

Load Testing

Description	Expected result
It is necessary to ascertain that the application behaves correctly under loads when ‘Server busy’ response is received.	Should designate another active node as a Server.

5.1.5 PERFORMANCE TESTING:

PERFORMANCE TESTING:

Description	Expected result
This is required to assure that an application perforce adequately, having the capability to handle many peers, delivering its results in expected time and using an acceptable level of resource and it is an aspect of operational management.	Should handle large input values, and produce accurate result in a expected time.

5.1.6 RELIABILITY TESTING:

RELIABILITY TESTING:

Description	Expected result
This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in provide the application.	In case of failure of the server an alternate server should take over the job.

5.1.7 SECURITY TESTING:

SECURITY TESTING:

Description	Expected result
Checking that the user identification is authenticated.	In case failure it should not be connected in the framework.
Check whether group keys in a tree are shared by all peers.	The peers should know group key in the same group.

5.1.7 WHITE BOX TESTING:

5.1.8 WHITE BOX TESTING:

Description	Expected result
Exercise all logical decisions on their true and false sides.	All the logical decisions must be valid.
Execute all loops at their boundaries and within their operational bounds.	All the loops must be finite.
Exercise internal data structures to ensure their validity.	All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

5.1.10 BLACK BOX TESTING:

Description	Expected result
To check for incorrect or missing functions.	All the functions must be valid.
To check for interface errors.	The entire interface must function normally.
To check for errors in a data structures or external data base access.	The database updation and retrieval must be done.
To check for initialization and termination errors.	All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out in as the development, documentation and institutionalization of the proposed goals and related policies is essential.

CHAPTER 7

7.0 SOFTWARE SPECIFICATION:

7.1 FEATURES OF .NET:

7.2 THE .NET FRAMEWORK

The .NET Framework has two main parts:

1. The Common Language Runtime (CLR).

2. A hierarchical set of class libraries.

The CLR is described as the “execution engine” of .NET. It provides the environment within which programs run. The most important features are

Conversion from a low-level assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
Memory management, notably including garbage collection.
Checking and enforcing security restrictions on the running code.
Loading and executing programs, with version control and other such features.
The following features of the .NET framework are also worth description:

Managed Code

Managed Data

Common Type System

Common Language Specification

7.3 THE CLASS LIBRARY

The set of classes is pretty comprehensive, providing collections, file, screen, and network I/O, threading, and so on, as well as XML and database connectivity.

The class library is subdivided into a number of sets (or namespaces), each providing distinct areas of functionality, with dependencies between the namespaces kept to a minimum.

7.4 LANGUAGES SUPPORTED BY .NET

Visual Basic .NET is also CLS compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in Visual Basic .NET.

Other languages for which .NET compilers are available include

FORTRAN
COBOL
Eiffel

ASP.NET XML WEB SERVICES	Windows Forms
Base Class Libraries
Common Language Runtime
Operating System

Fig1 .Net Framework

CONSTRUCTORS AND DESTRUCTORS:

GARBAGE COLLECTION

OVERLOADING

MULTITHREADING:

STRUCTURED EXCEPTION HANDLING

7.5 THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies application development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK

1. To provide a consistent object-oriented programming environment whether object codes is stored and executed locally on Internet-distributed, or executed remotely.

2. To provide a code-execution environment to minimizes software deployment and guarantees safe execution of code.

3. Eliminates the performance problems.

There are different types of application, such as Windows-based applications and Web-based applications.

7.6 FEATURES OF SQL-SERVER

SQL-SERVER database consist of six type of objects,

They are,

1. TABLE

2. QUERY

3. FORM

4. REPORT

5. MACRO

7.7 TABLE:

A database is a collection of data about a specific topic.

VIEWS OF TABLE:

We can work with a table in two types,

1. Design View

2. Datasheet View

Design View

To build or modify the structure of a table we work in the table design view. We can specify what kind of data will be hold.

Datasheet View

To add, edit or analyses the data itself we work in tables datasheet view mode.

QUERY:

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION

Our fuzzy fingerprint method differs from these solutions and enables its adopter to provide dataleak detection as a service. The customer or data owner does not need to fully trust the DLD provider using our approach. Bloom filter is a space-saving data structure for set membership test, and it is used in network security from network layer in the fuzzy Bloom filter invented in constructs a special Bloom filter that probabilistically sets the corresponding filter bits to 1’s. We designed to support a resource-sufficient routing scheme; it is a potential privacy-preserving technique. We do not invent a variant of Bloom filter for our fuzzy fingerprint, and our fuzzification process is separate from membership test. The advantage of separating fingerprint fuzzification from membership test is that it is flexible to test whether the fingerprint is sensitive with or without fuzzification

Our fuzzy fingerprint solution for data-leak detection, there are other privacy-preserving techniques invented for specific processes, e.g., DATA matching or for general purpose use, e.g., secure multi-party computation (SMC). SMC is a cryptographic mechanism, which supports a wide range of fundamental arithmetic, set, and string operations as well as complex functions such as knapsack computation, automated trouble-shooting, network event statistics, private information retrieval, genomic computation, private database query, private join operations and distributed data mining. The provable privacy guarantees offered by SMC comes at a cost in terms of computational complexity and realization difficulty. The advantage of our approach is its concision and efficiency.

8.2 FUTURE ENHANCEMENT:

We proposed fuzzy fingerprint, a privacy-preserving data-leak detection model and present its realization. Using special digests, the exposure of the sensitive data is kept to a minimum during the detection. We have conducted extensive experiments to validate the accuracy, privacy, and efficiency of our solutions. For future work, we plan to focus on designing a host-assisted mechanism for the complete data-leak detection for large-scale organizations.