Hop-by-Hop Message Authentication and Source Privacy in Wireless Sensor Networks

HOP-BY-HOP MESSAGE AUTHENTICATION AND SOURCE PRIVACY IN WIRELESS SENSOR NETWORKS

By

A

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

Of

MASTER OF TECHNOLOGY

IN

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

BONAFIDE CERTIFICATE

Certified that this project report titled “HOP-BY-HOP MESSAGE AUTHENTICATION AND SOURCE PRIVACY IN WIRELESS SENSOR NETWORKS” is the bonafide work of Mr. _____________, who carried out the research under my supervision. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide                                                                             Signature of the H.O.D

Name                                                                                                           Name

CHAPTER 1

ABSTRACT:

Message authentication is one of the most effective ways to thwart unauthorized and corrupted messages from being forwarded in wireless sensor networks (WSNs). For this reason, many message authentication schemes have been developed, based on either symmetric-key cryptosystems or public-key cryptosystems. Most of them, however, have the limitations of high computational and communication overhead in addition to lack of scalability and resilience to node compromise attacks. To address these issues, a polynomial-based scheme was recently introduced. However, this scheme and its extensions all have the weakness of a built-in threshold determined by the degree of the polynomial: when the number of messages transmitted is larger than this threshold, the adversary can fully recover the polynomial.

In this paper, we propose a scalable authentication scheme based on elliptic curve cryptography (ECC). While enabling intermediate node authentication, our proposed scheme allows any node to transmit an unlimited number of messages without suffering the threshold problem. In addition, our scheme can also provide message source privacy. Both theoretical analysis and simulation results demonstrate that our proposed scheme is more efficient than the polynomial-based approach in terms of computational and communication overhead under comparable security levels, while also providing message source privacy.

INTRODUCTION:

MESSAGE authentication plays a key role in thwarting unauthorized and corrupted messages from being forwarded in networks, to save the precious sensor energy. For this reason, many authentication schemes have been proposed in the literature to provide message authenticity and integrity verification for wireless sensor networks (WSNs). These schemes can largely be divided into two categories: public-key based approaches and symmetric-key based approaches. The symmetric-key based approach requires complex key management, lacks scalability, and is not resilient to large numbers of node compromise attacks, since the message sender and the receiver have to share a secret key. The shared key is used by the sender to generate a message authentication code (MAC) for each transmitted message. However, for this method, the authenticity and integrity of the message can only be verified by a node holding the shared secret key, which is generally shared by a group of sensor nodes. An intruder can compromise the key by capturing a single sensor node.
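Before continuing, the following is a minimal Java sketch of this shared-key MAC pattern, using the standard javax.crypto API. The pre-loaded group key, the message text, and the class name are illustrative assumptions only, and the key distribution problem itself is not addressed here.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class SharedKeyMacDemo {
    public static void main(String[] args) throws Exception {
        // Assumed: both sender and receiver already hold this shared secret key.
        byte[] sharedKey = "a-preloaded-group-key".getBytes("UTF-8");
        byte[] message = "sensor reading: 27.5C".getBytes("UTF-8");

        // Sender computes the MAC and appends it to the message.
        byte[] senderTag = hmac(sharedKey, message);

        // Receiver recomputes the MAC and compares. Any node holding the key can do this,
        // which is exactly why capturing a single node compromises the whole group.
        byte[] receiverTag = hmac(sharedKey, message);
        System.out.println("MAC valid: " + Arrays.equals(senderTag, receiverTag));
    }

    static byte[] hmac(byte[] key, byte[] data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data);
    }
}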

In addition, this method does not work in multicast networks. To solve the scalability problem, a secret polynomial-based message authentication scheme was introduced. The idea of this scheme is similar to threshold secret sharing, where the threshold is determined by the degree of the polynomial. This approach offers information-theoretic security of the shared secret key when the number of messages transmitted is less than the threshold. The intermediate nodes verify the authenticity of the message through a polynomial evaluation. However, when the number of messages transmitted is larger than the threshold, the polynomial can be fully recovered and the system is completely broken. An alternative solution was later proposed to thwart the intruder from recovering the polynomial by computing its coefficients. The idea is to add a random noise, also called a perturbation factor, to the polynomial so that the coefficients cannot be easily solved for. However, a recent study shows that the random noise can be completely removed from the polynomial using error-correcting code techniques.

For the public-key based approach, each message is transmitted along with the digital signature of the message generated using the sender’s private key. Every intermediate forwarder and the final receiver can authenticate the message using the sender’s public key. One of the limitations of the public-key based scheme is its high computational overhead. However, the recent progress on elliptic curve cryptography (ECC) shows that public-key schemes can be more advantageous in terms of computational complexity, memory usage, and security resilience, since public-key based approaches have simple and clean key management.
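Returning to the polynomial approach, the following toy Java sketch (not the cited scheme's exact construction) illustrates its threshold weakness: if the tag for a message is the evaluation of a secret degree-d polynomial modulo a prime, then d + 1 intercepted (message, tag) pairs determine the polynomial by interpolation, after which tags can be forged at will. The prime, coefficients, and class name are arbitrary placeholders.

import java.math.BigInteger;

public class PolynomialThresholdDemo {
    static final BigInteger P = BigInteger.valueOf(2147483647L); // a prime modulus (placeholder)

    // Secret polynomial coefficients a0..ad, known only to legitimate nodes.
    // Degree d = 2, so 3 intercepted (message, tag) pairs are enough to break it.
    static final BigInteger[] COEFFS = {
        BigInteger.valueOf(123456), BigInteger.valueOf(98765), BigInteger.valueOf(424242)
    };

    // Authentication tag for a message hash x is simply f(x) mod p.
    static BigInteger tag(BigInteger x) {
        BigInteger acc = BigInteger.ZERO;
        for (int i = COEFFS.length - 1; i >= 0; i--) {
            acc = acc.multiply(x).add(COEFFS[i]).mod(P);  // Horner evaluation
        }
        return acc;
    }

    public static void main(String[] args) {
        // An eavesdropper collecting d+1 = 3 pairs (x, f(x)) can interpolate f
        // (e.g., by Lagrange interpolation mod p) and then forge tags for any message.
        for (long x = 1; x <= 3; x++) {
            System.out.println("x=" + x + "  tag=" + tag(BigInteger.valueOf(x)));
        }
    }
}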

In this paper, we propose an unconditionally secure and efficient source anonymous message authentication (SAMA) scheme based on the optimal modified ElGamal signature (MES) scheme on elliptic curves. This MES scheme is secure against adaptive chosen-message attacks in the random oracle model. Our scheme enables the intermediate nodes to authenticate the message so that all corrupted messages can be detected and dropped to conserve sensor power. While achieving compromise resiliency, flexible-time authentication, and source identity protection, our scheme does not have the threshold problem. Both theoretical analysis and simulation results demonstrate that our proposed scheme is more efficient than the polynomial-based algorithms under comparable security levels.
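For orientation, the minimal sketch below shows the elliptic-curve sign/verify pattern this proposal builds on, using the standard Java security API (SHA256withECDSA, available from JDK 7 via the SunEC provider). It is not the modified ElGamal signature (MES) or the SAMA construction itself, only the underlying public-key idea: sign with the sender's private key, verify with the sender's public key.

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class EccSignDemo {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(256);                        // e.g., a 256-bit curve
        KeyPair senderKeys = kpg.generateKeyPair();

        byte[] message = "event report from node 17".getBytes("UTF-8");

        // Sender: sign the message with the private key.
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(senderKeys.getPrivate());
        signer.update(message);
        byte[] signature = signer.sign();

        // Any forwarder or the sink: verify with the sender's public key.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(senderKeys.getPublic());
        verifier.update(message);
        System.out.println("signature valid: " + verifier.verify(signature));
    }
}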

The major contributions of this paper are the following: 1. We develop a source anonymous message authentication code (SAMAC) on elliptic curves that can provide unconditional source anonymity. 2. We offer an efficient hop-by-hop message authentication mechanism for WSNs without the threshold limitation. 3. We devise network implementation criteria on source node privacy protection in WSNs. 4. We propose an efficient key management framework to ensure isolation of the compromised nodes. 5. We provide extensive simulation results under ns-2 and TelosB on multiple security levels. To the best of our knowledge, this is the first scheme that provides hop-by-hop node authentication without the threshold limitation, and has performance better than the symmetric-key based schemes.

The distributed nature of our algorithm makes the scheme suitable for decentralized networks. The remainder of this paper is organized as follows: Section 2 presents the terminology and the preliminaries that will be used in this paper. Section 3 discusses the related work, with a focus on polynomial-based schemes. Section 4 describes the proposed source anonymous message authentication scheme on elliptic curves. Section 5 discusses the ambiguity set (AS) selection strategies for source privacy. Section 6 describes key management and compromised node detection. Performance analysis and simulation results are provided in Section 7. We conclude in Section 8.

We assume the sensor nodes communicate with the sink node through multi-hop communications. We also assume there is a security server (SS) that is responsible for the generation, storage, and distribution of the security parameters in the network.

This server will never be compromised. However, after deployment, the sensor nodes may be captured and compromised by attackers. Once compromised, all information stored in a sensor node can be accessed by the attackers. The compromised nodes can be reprogrammed and fully controlled by the attackers. However, the compromised nodes will not be able to create new public keys that can be accepted by the SS and other nodes. Based on the above assumptions, this paper considers two types of attacks launched by the adversaries:

  • Passive attacks. Through passive attacks, the adversaries can eavesdrop on messages transmitted in the network and perform traffic analysis.
  • Active attacks. Active attacks can only be launched from compromised sensor nodes. Once sensor nodes are compromised, the adversaries obtain all the information stored in them, including their security parameters. The adversaries can modify the contents of messages and inject their own messages.

LITERATURE SURVEY:

ATTACKING CRYPTOGRAPHIC SCHEMES BASED ON ‘PERTURBATION POLYNOMIALS’

AUTHOR:  M. Albrecht, C. Gentry, S. Halevi, and J. Katz

PUBLISH:  Report 2009/098, http://eprint.iacr.org/, 2009.

We show attacks on several cryptographic schemes that have recently been proposed for achieving various security goals in sensor networks. Roughly speaking, these schemes all use “perturbation polynomials” to add “noise” to polynomial-based systems that offer information-theoretic security, in an attempt to increase the resilience threshold while maintaining efficiency. We show that the heuristic security arguments given for these modified schemes do not hold, and that they can be completely broken once we allow even a slight extension of the parameters beyond those achieved by the underlying information-theoretic schemes. Our attacks apply to the key predistribution scheme of Zhang et al. (MobiHoc 2007), the access-control schemes of Subramanian et al. (PerCom 2007), and the authentication schemes of Zhang et al. (INFOCOM 2008).

CRYPTOGRAPHIC KEY LENGTH RECOMMENDATION

PUBLISH:  http://www.keylength.com/en/3/, 2013.

In most cryptographic functions, the key length is an important security parameter. Both academic and private organizations provide recommendations and mathematical formulas to approximate the minimum key size requirement for security. Despite the availability of these publications, choosing an appropriate key size to protect your system from attacks remains a headache as you need to read and understand all these papers.
This web site implements mathematical formulas and summarizes reports from well-known organizations allowing you to quickly evaluate the minimum security requirements for your system. You can also easily compare all these techniques and find the appropriate key length for your desired level of protection. The lengths provided here are designed to resist mathematic attacks; they do not take algorithmic attacks, hardware flaws, etc. into account.

LIGHTWEIGHT AND COMPROMISE-RESILIENT MESSAGE AUTHENTICATION IN SENSOR NETWORKS

AUTHOR:   W. Zhang, N. Subramanian, and G. Wang

PUBLISH:   Proc. IEEE INFOCOM, Apr. 2008.

Numerous authentication schemes have been proposed in the past for protecting communication authenticity and integrity in wireless sensor networks. Most of them however have following limitations: high computation or communication overhead, no resilience to a large number of node compromises, delayed authentication, lack of scalability, etc. To address these issues, we propose in this paper a novel message authentication approach which adopts a perturbed polynomial-based technique to simultaneously accomplish the goals of lightweight, resilience to a large number of node compromises, immediate authentication, scalability, and non-repudiation. Extensive analysis and experiments have also been conducted to evaluate the scheme in terms of security properties and system overhead.

COMPARING SYMMETRIC-KEY AND PUBLIC-KEY BASED SECURITY SCHEMES IN SENSOR NETWORKS: A CASE STUDY OF USER ACCESS CONTROL

AUTHOR:  H. Wang, S. Sheng, C. Tan, and Q. Li

PUBLISH:  Proc. IEEE 28th Int’l Conf. Distributed Computing Systems (ICDCS), pp. 11-18, 2008.

While symmetric-key schemes are efficient in processing time for sensor networks, they generally require complicated key management, which may introduce large memory and communication overhead. On the contrary, public-key based schemes have simple and clean key management, but cost more computational time. The recent progress of elliptic curve cryptography (ECC) implementation on sensors motivates us to design a public-key scheme and compare its performance with the symmetric-key counterparts. This paper builds the user access control on commercial off-the-shelf sensor devices as a case study to show that the public-key scheme can be more advantageous in terms of the memory usage, message complexity, and security resilience. Meanwhile, our work also provides insights in integrating and designing public-key based security protocols for sensor networks.

CHAPTER 2

EXISTING SYSTEM:

  • In the public-key based approach, each message is transmitted along with the digital signature of the message generated using the sender’s private key. Every intermediate forwarder and the final receiver can authenticate the message using the sender’s public key. One of the limitations of the public-key based scheme is its high computational overhead.
  • The recent progress on elliptic curve cryptography (ECC) shows that public-key schemes can be more advantageous in terms of computational complexity, memory usage, and security resilience, since public-key based approaches have simple and clean key management.

DISADVANTAGES OF EXISTING SYSTEM:

  • High computational and communication overhead.
  • Lack of scalability and resilience to node compromise attacks.
  • The polynomial-based scheme has the weakness of a built-in threshold determined by the degree of the polynomial.

PROPOSED SYSTEM:

  • We propose an unconditionally secure and efficient SAMA. The main idea is that for each message m to be released, the message sender, or the sending node, generates a source anonymous message authenticator for the message m.
  • The generation is based on the MES scheme on elliptic curves.  For a ring signature, each ring member is required to compute a forgery signature for all other members in the AS.
  • In our scheme, the entire SAMA generation requires only three steps, which link all non-senders and the message sender to the SAMA alike. In addition, our design enables the SAMA to be verified through a single equation without individually verifying the signatures.

ADVANTAGES OF PROPOSED SYSTEM:

  • A novel and efficient SAMA based on ECC. While ensuring message sender privacy, SAMA can be applied to any message to provide message content authenticity.
  • To provide hop-by-hop message authentication without the weakness of the built- in threshold of the polynomial-based scheme, we then proposed a hop-by-hop message authentication scheme based on the SAMA.
  • When applied to WSNs with fixed sink nodes, we also discussed possible techniques for compromised node identification.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor                          –    Pentium IV
  • Speed                              –    1.1 GHz
  • RAM                                –    256 MB (min)
  • Hard Disk                          –    20 GB
  • Floppy Drive                       –    1.44 MB
  • Key Board                          –    Standard Windows Keyboard
  • Mouse                              –    Two or Three Button Mouse
  • Monitor                            –    SVGA

 

SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP or Win7
  • Front End                                :           Java JDK 1.7
  • Tools                                       :           Eclipse
  • Document                               :           MS-Office 2007

CHAPTER 3

SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

  • The data flow diagram (DFD) is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
  • The DFD is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.
  • The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • A DFD may be used to represent a system at any level of abstraction and may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people, organizations, or other entities.

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

 

PROCESS:

People, procedures or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.


ARCHITECTURE DIAGRAM:

 

DATAFLOW DIAGRAM:

UML DIAGRAMS:

USE CASE DIAGRAM:

(Use case diagram omitted; actors: Client and Server.)

CLASS DIAGRAM:

SEQUENCE DIAGRAM:

 

(Sequence diagram omitted; messages shown: Files Transmitted, ECC Encrypted Data, SAMA Verification, Checking for Key, Data Received.)

ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

Privacy is sometimes referred to as anonymity. Communication anonymity in information management has been discussed in a number of previous works; it generally refers to the state of being unidentifiable within a set of subjects. This set is called the AS. Sender anonymity means that a particular message is not linkable to any sender, and no message is linkable to a particular sender. We will start with the definition of the unconditionally secure SAMA.


 

4.1 ALGORITHM:



4.3 MODULES:

SERVER CLIENT MODULE:

KEY MANAGEMENT AND NODE DETECTION

SYMMETRIC-KEY CRYPTOSYSTEM

PUBLIC-KEY CRYPTOSYSTEM

HOP-BY-HOP AUTHENTICATION

4.4 MODULES DESCRIPTION:

SERVER CLIENT MODULE:

Client-server computing or networking is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters, called clients. Often clients and servers operate over a computer network on separate hardware. A server machine is a high-performance host that runs one or more server programs which share their resources with clients. A client does not share any of its resources; clients therefore initiate communication sessions with servers, which await (listen for) incoming requests.
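A minimal Java sketch of this pattern is shown below: the server listens on a port and waits for requests, while a client initiates the session. The port number and echo behavior are illustrative assumptions only.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoServer {
    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(5000);   // the server awaits requests here
        System.out.println("Server listening on port 5000...");
        while (true) {
            Socket client = listener.accept();            // block until a client connects
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            PrintWriter out = new PrintWriter(client.getOutputStream(), true);
            out.println("echo: " + in.readLine());        // serve the request
            client.close();
        }
    }
}

A matching client would simply open new Socket("localhost", 5000), write a line, and read the reply.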

KEY MANAGEMENT AND NODE DETECTION:

Recently, message sender anonymity based on ring signatures was introduced. This approach enables the message sender to generate a source anonymous message signature with content authenticity assurance. To generate a ring signature, a ring member randomly selects an AS and forges a message signature for all other members. Then he uses his trap-door information to glue the ring together. The original scheme has very limited flexibility and very high complexity.

We focus on the public-key based cryptographic approach: each message is transmitted along with the digital signature of the message generated using the sender’s private key. Every intermediate forwarder and the final receiver can authenticate the message using the sender’s public key. The recent progress on ECC shows that public-key schemes can be more advantageous in terms of memory usage, message complexity, and security resilience, since public-key based approaches have simple and clean key management.

SYMMETRIC-KEY CRYPTOSYSTEM:

Symmetric-key and hash-based authentication schemes have been proposed for WSNs. In these schemes, each symmetric authentication key is shared by a group of sensor nodes. An intruder can compromise the key by capturing a single sensor node. Therefore, these schemes are not resilient to node compromise attacks. Another type of symmetric-key scheme requires synchronization among nodes. These schemes, including TESLA and its variants, can also provide message sender authentication. However, they require initial time synchronization, which is not easy to implement in large-scale WSNs. In addition, they also introduce a delay in message authentication, and the delay increases as the network scales up.

PUBLIC-KEY CRYPTOSYSTEM:

A secret polynomial-based message authentication scheme offers information-theoretic security with ideas similar to threshold secret sharing, where the threshold is determined by the degree of the polynomial. When the number of messages transmitted is below the threshold, the scheme enables the intermediate nodes to verify the authenticity of the message through polynomial evaluation. However, when the number of messages transmitted is larger than the threshold, the polynomial can be fully recovered and the system is completely broken. To increase the threshold and the complexity for the intruder to reconstruct the secret polynomial, a random noise, also called a perturbation factor, was added to the polynomial to thwart the adversary from computing the coefficients of the polynomial.

However, the added perturbation factor can be completely removed using error-correcting code techniques. In the public-key based approach, each message is transmitted along with the digital signature of the message generated using the sender’s private key. Every intermediate forwarder and the final receiver can authenticate the message using the sender’s public key. The recent progress on ECC shows that public-key schemes can be more advantageous in terms of memory usage, message complexity, and security resilience, since public-key based approaches have simple and clean key management.

HOP-BY-HOP AUTHENTICATION:

Hop-by-hop authentication can be achieved through a public-key encryption system. Public-key based schemes were generally considered not preferable, mainly due to their high computational overhead. However, our research demonstrates that this is not always true, especially for elliptic curve public-key cryptosystems.

In our scheme, each SAMA contains an AS of n randomly selected nodes that dynamically changes for each message. For n = 1, our scheme can provide at least the same security as the bivariate polynomial-based scheme. For n > 1, we can provide extra source privacy benefits. Even if one message is corrupted, other messages transmitted in the network can still be secure.

Therefore, n can be much smaller than the parameters dx and dy. In fact, even a small n may provide adequate source privacy, while ensuring high system performance. In addition, in the bivariate polynomial-based scheme, there is only one base station that can send messages. All the other nodes can only act as intermediate nodes or receivers. This property makes the base station easy to attack, and severely narrows the applicability of this scheme. In fact, the major traffic in WSNs is packet delivery from the sensor nodes to the sink node.

Our scheme enables every node to transmit the message to the sink node as a message initiator.
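The following schematic Java sketch (not the paper's algorithm) captures the forwarding rule implied above: each intermediate node verifies the SAMA attached to a packet and drops the packet immediately on failure, so corrupted traffic never consumes downstream energy. Packet, verifySama(), and forwardToNextHop() are hypothetical placeholders.

import java.util.List;

public class HopByHopForwarder {

    static class Packet {
        byte[] payload;
        byte[] sama;                    // source anonymous message authenticator
        List<byte[]> ambiguitySetKeys;  // public keys of the n nodes in the AS
    }

    // Called by every intermediate node on receipt of a packet.
    void onReceive(Packet p) {
        if (!verifySama(p.payload, p.sama, p.ambiguitySetKeys)) {
            // Authentication failed: drop here instead of forwarding further.
            return;
        }
        forwardToNextHop(p);
    }

    // Placeholder for the single-equation check over the AS members' public keys
    // that the proposed scheme describes; the actual verification is omitted here.
    boolean verifySama(byte[] payload, byte[] sama, List<byte[]> asKeys) {
        return true;
    }

    void forwardToNextHop(Packet p) { /* routing layer, omitted */ }
}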

The recent progress on ECC has demonstrated that the public-key based schemes have more advantages in terms of memory usage, message complexity, and security resilience, since public-key based approaches have a simple and clean key management.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is to be carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are      

  • ECONOMICAL FEASIBILITY
  • TECHNICAL FEASIBILITY
  • SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:                  

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

5.1.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources; otherwise, high demands would be placed on the client. The developed system must have modest requirements, as only minimal or no changes are required to implement this system.

5.1.3 SOCIAL FEASIBILITY:  

This aspect of the study checks the level of acceptance of the system by the users. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to offer some constructive criticism, which is welcomed, as he is the final user of the system.

5.2 SYSTEM TESTING:

Testing is the process of checking whether the developed system is working according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes a logical assumption that if all the parts of the system are correct, the overall goal will be successfully achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until months later. This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best program is worthless if it does not produce correct outputs.

5.2.1 UNIT TESTING:

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile, process test data correctly, and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logical. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.

UNIT TESTING:

Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.

Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.
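As an illustration of the kind of logical-error checking described above, a JUnit-style unit test might look like the following. JUnit 4 is assumed to be on the classpath, and the helper under test and its 160-bit minimum are hypothetical examples, not part of the project code.

import org.junit.Test;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class KeyLengthTest {

    // Hypothetical helper under test: returns true for key sizes the project accepts.
    static boolean isAcceptableKeySize(int bits) {
        return bits >= 160;   // assumed minimum ECC key size, for illustration
    }

    @Test
    public void acceptsRecommendedEccKeySize() {
        assertTrue(isAcceptableKeySize(256));
    }

    @Test
    public void rejectsTooShortKeys() {
        assertFalse(isAcceptableKeySize(80));
    }
}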

5.2.2 FUNCTIONAL TESTING:

Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.

FUNCTIONAL TESTING:

Description: Test for all modules.
Expected result: All peers should communicate in the group.

Description: Test for various peers in a distributed network framework as it displays all users available in the group.
Expected result: The result after execution should give the accurate result.

5.2.3 NON-FUNCTIONAL TESTING:

Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:

  • Load testing
  • Performance testing
  • Usability testing
  • Reliability testing
  • Security testing

5.2.4 LOAD TESTING:

An important tool for implementing system tests is a Load generator. A Load generator is essential for testing quality requirements such as performance and stress. A load can be a real load, that is, the system can be put under test to real usage by having actual telephone users connected to it. They will generate test input data for system test.

LOAD TESTING:

Description: It is necessary to ascertain that the application behaves correctly under loads when a ‘Server busy’ response is received.
Expected result: Should designate another active node as a Server.

5.2.5 PERFORMANCE TESTING:

Performance tests are utilized in order to determine the widely defined performance of the software system such as execution time associated with various parts of the code, response time and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

PERFORMANCE TESTING:

Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time and using an acceptable level of resources; it is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.

5.2.6 RELIABILITY TESTING:

Software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, and it is ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. It forms a part of software quality control.

RELIABILITY TESTING:

Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.

5.2.7 SECURITY TESTING:

Security testing evaluates system characteristics that relate to the availability, integrity and confidentiality of the system data and services. Users/Clients should be encouraged to make sure their security needs are very clearly known at requirements time, so that the security issues can be addressed by the designers and testers.

SECURITY TESTING:

Description: Checking that the user identification is authenticated.
Expected result: In case of failure, it should not be connected in the framework.

Description: Check whether group keys in a tree are shared by all peers.
Expected result: The peers should know the group key in the same group.

5.2.8 WHITE BOX TESTING:

White box testing, sometimes called glass-box testing, is a test case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases. White box testing focuses on the inner structure of the software to be tested.

WHITE BOX TESTING:

Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.

Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.

Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.

5.2.9 BLACK BOX TESTING:

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors by focusing on the inputs, outputs, and principal functions of a software module. The starting point of black box testing is either a specification or the code. The contents of the box are hidden, and the stimulated software should produce the desired results.

BLACK BOX TESTING:

Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.

Description: To check for interface errors.
Expected result: The entire interface must function normally.

Description: To check for errors in data structures or external database access.
Expected result: The database update and retrieval must be done correctly.

Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies were carried out, as the development, documentation, and institutionalization of the proposed goals and related policies are essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

 

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

 

The Java Programming Language

 

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

  • Simple
  • Architecture neutral
  • Object oriented
  • Portable
  • Distributed
  • High performance
  • Interpreted
  • Multithreaded
  • Robust
  • Dynamic
  • Secure

With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, first you translate a program into an intermediate language called Java byte codes —the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java byte code instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java byte codes help make “write once, run anywhere” possible. You can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or on an iMac.
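A trivial example of this compile-once, run-anywhere cycle (the file name and commands are shown in the comments) is:

// HelloWorld.java -- compile once, run anywhere a Java VM exists:
//   javac HelloWorld.java   (produces platform-independent HelloWorld.class byte codes)
//   java HelloWorld         (the VM interprets or JIT-compiles the byte codes)
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello from the Java platform");
    }
}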

6.2 THE JAVA PLATFORM:

A platform is the hardware or software environment in which a program runs. We’ve already mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it’s a software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

  • The Java Virtual Machine (Java VM)
  • The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, What Can Java Technology Do?, highlights what functionality some of the packages in the Java API provide.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, after compilation, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The most common types of programs written in the Java programming language are applets and applications. If you’ve surfed the Web, you’re probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs.

An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet.

A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server.

How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:

  • The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
  • Applets: The set of conventions used by applets.
  • Networking: URLs, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
  • Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
  • Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
  • Software components: Known as JavaBeans™, these can plug into existing component architectures.
  • Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
  • Java Database Connectivity (JDBC™): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

 

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

We can’t promise you fame, fortune, or even a job if you learn the Java programming language. Still, it is likely to make your programs better and requires less effort than other languages. We believe that Java technology will help you do the following:

  • Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
  • Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
  • Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
  • Develop programs more quickly: Your development time may be as much as twice as fast versus writing the same program in C++. Why? You write fewer lines of code and it is a simpler programming language than C++.
  • Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java™ Product Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
  • Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
  • Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

 

6.5 ODBC:

 

Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application developers and database systems providers. Before ODBC became a de facto standard for Windows programs to interface with database systems, programmers had to use proprietary languages for each database they wanted to connect to. Now, ODBC has made the choice of the database system almost irrelevant from a coding perspective, which is as it should be. Application developers have much more important things to worry about than the syntax that is needed to port their program from one database to another when business needs suddenly change.

Through the ODBC Administrator in Control Panel, you can specify the particular database that is associated with a data source that an ODBC application program is written to use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a particular database. For example, the data source named Sales Figures might be a SQL Server database, whereas the Accounts Payable data source could refer to an Access database. The physical database referred to by a data source can reside anywhere on the LAN.

The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you set up a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program, and each maintains a separate list of ODBC data sources.

From a programming perspective, the beauty of ODBC is that the application can be written to use the same set of function calls to interface with any data source, regardless of the database vendor. The source code of the application doesn’t change whether it talks to Oracle or SQL Server. We only mention these two as an example. There are ODBC drivers available for several dozen popular database systems. Even Excel spreadsheets and plain text files can be turned into data sources. The operating system uses the Registry information written by ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the data source (such as the interface to Oracle or SQL Server). The loading of the ODBC drivers is transparent to the ODBC application program. In a client/server environment, the ODBC API even handles many of the network issues for the application programmer.

The advantages of this scheme are so numerous that you are probably thinking there must be some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to the native database interface. ODBC has had many detractors make the charge that it is too slow. Microsoft has always claimed that the critical factor in performance is the quality of the driver software that is used. In our humble opinion, this is true. The availability of good ODBC drivers has improved a great deal recently. And anyway, the criticism about performance is somewhat analogous to those who said that compilers would never match the speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers get faster every year.

6.6 JDBC:

In an effort to set an independent database standard API for Java; Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of “plug-in” database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he or she must provide the driver for each platform that the database and Java run on.

To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you discovered earlier in this chapter, ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster than developing a completely new connectivity solution.

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

The remainder of this section will cover enough information about JDBC for you to know what it is about and how to use it effectively. This is by no means a complete overview of JDBC. That would fill an entire book.

 

6.7 JDBC Goals:

Few software packages are designed without goals in mind. JDBC is no exception: its many goals drove the development of the API. These goals, in conjunction with early reviewer feedback, have finalized the JDBC class library into a solid framework for building database applications in Java.

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although not the lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be created. Conversely, it is at a high enough level for application programmers to use it confidently. Attaining this goal allows for future tool vendors to “generate” JDBC code and to hide many of JDBC’s complexities from the end user.

SQL Conformance

SQL syntax varies as you move from database vendor to database vendor. In an effort to support a wide variety of vendors, JDBC will allow any query statement to be passed through it to the underlying database driver. This allows the connectivity module to handle non-standard functionality in a manner that is suitable for its users.

JDBC must be implementable on top of common database interfaces

The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This interface would translate JDBC calls to ODBC and vice versa.

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that the design of JDBC should be very simple, allowing for only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

Use strong, static typing wherever possible

Strong typing allows more error checking to be done at compile time; consequently, fewer errors appear at runtime.

Keep the common cases simple

Because, more often than not, the usual SQL calls used by the programmer are simple SELECTs, INSERTs, DELETEs, and UPDATEs, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.
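A minimal JDBC sketch of these common cases is shown below. The driver class, connection URL (a legacy JDBC-ODBC data source, in keeping with the MS Access database mentioned later), table name, and columns are illustrative assumptions only.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // Legacy JDBC-ODBC bridge driver (present in JDK 7, removed in Java 8);
        // "CacheTableDSN" is a hypothetical ODBC data source name.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        Connection con = DriverManager.getConnection("jdbc:odbc:CacheTableDSN");
        try {
            // Simple INSERT with a prepared statement.
            PreparedStatement insert =
                    con.prepareStatement("INSERT INTO cache_table(node_id, status) VALUES (?, ?)");
            insert.setInt(1, 17);
            insert.setString(2, "VERIFIED");
            insert.executeUpdate();

            // Simple SELECT.
            PreparedStatement query =
                    con.prepareStatement("SELECT node_id, status FROM cache_table");
            ResultSet rs = query.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getInt("node_id") + " -> " + rs.getString("status"));
            }
        } finally {
            con.close();
        }
    }
}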

Finally, we decided to proceed with the implementation using Java Networking.

For dynamically updating the cache table, we use an MS Access database.

Java has two things: a programming language and a platform.

Java is a high-level programming language that is all of the following: Simple, Architecture-neutral, Object-oriented, Portable, Distributed, High-performance, Interpreted, Multithreaded, Robust, Dynamic, and Secure.

Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes; these platform-independent code instructions are then passed to and run on the computer.

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

6.8 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

The IP layer provides a connectionless and unreliable delivery system. It considers each datagram independently of the others. Any association between datagrams must be supplied by the higher layers. The IP layer supplies a checksum that includes its own header. The header includes the source and destination addresses. The IP layer handles routing through an Internet. It is also responsible for breaking up large datagrams into smaller ones for transmission and reassembling them at the other end.

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.
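In Java, this connectionless model corresponds to DatagramSocket and DatagramPacket; a minimal send-side sketch (the destination address and port below are placeholders) is:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpSendDemo {
    public static void main(String[] args) throws Exception {
        DatagramSocket socket = new DatagramSocket();
        byte[] data = "sensor reading".getBytes("UTF-8");
        DatagramPacket packet = new DatagramPacket(
                data, data.length, InetAddress.getByName("192.168.1.10"), 9000);
        socket.send(packet);   // fire-and-forget: no delivery guarantee, no ordering
        socket.close();
    }
}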

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.

Port addresses

A service exists on a host, and is identified by its port. This is a 16 bit number. To send a message to a server, you send it to the port for that service of the host that it is running on. This is not location transparency! Certain of these ports are “well known”.

Sockets:

A socket is a data structure maintained by the system to handle network connections. A socket is created using the call socket. It returns an integer that is like a file descriptor. In fact, under Windows, this handle can be used with the ReadFile and WriteFile functions.

#include <sys/types.h>
#include <sys/socket.h>

/* Creates an endpoint for communication and returns a descriptor.
   family selects the protocol family, type the communication semantics
   (stream or datagram), and protocol the concrete protocol (usually 0). */
int socket(int family, int type, int protocol);

Here “family” will be AF_INET for IP communications, protocol will be zero, and type will depend on whether TCP or UDP is used. Two processes wishing to communicate over a network create a socket each. These are similar to two ends of a pipe – but the actual pipe does not yet exist.
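Since the rest of the implementation uses Java Networking, an equivalent sketch in Java is shown below. The host name and port numbers are illustrative only; in java.net the choice between TCP and UDP is made by creating either a Socket or a DatagramSocket.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.Socket;

public class SocketSketch {
    public static void main(String[] args) throws Exception {
        // Resolve a host name to its 32-bit IP address (printed in dotted form).
        InetAddress host = InetAddress.getByName("localhost");
        System.out.println("IP address: " + host.getHostAddress());

        // TCP: a connection-oriented stream socket to a well-known port (8080 is illustrative).
        try (Socket tcp = new Socket(host, 8080)) {
            tcp.getOutputStream().write("hello over TCP".getBytes());
        }

        // UDP: connectionless; every datagram carries its own destination address and port.
        try (DatagramSocket udp = new DatagramSocket()) {
            byte[] data = "hello over UDP".getBytes();
            udp.send(new DatagramPacket(data, data.length, host, 9090));
        }
    }
}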

6.8 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.
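As a brief illustration of the API (the chart title, dataset values and output file name are invented for this example, and the ChartUtilities class name corresponds to the JFreeChart 1.0.x series), the following snippet builds a pie chart and saves it as a PNG:

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.data.general.DefaultPieDataset;

public class ChartSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative dataset: share of delivered versus dropped messages.
        DefaultPieDataset dataset = new DefaultPieDataset();
        dataset.setValue("Delivered", 82);
        dataset.setValue("Dropped", 18);

        // Create the chart (title, dataset, legend, tooltips, URLs).
        JFreeChart chart = ChartFactory.createPieChart(
                "Delivery ratio", dataset, true, true, false);

        // One of the supported output types: a PNG image file.
        ChartUtilities.saveChartAsPNG(new File("delivery.png"), chart, 500, 400);
    }
}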

 

6.8.1. Map Visualizations:

Charts showing values that relate to geographical areas. Some examples include: (a) population density in each state of the United States, (b) income per capita for each country in Europe, (c) life expectancy in each country of the world. The tasks in this project include: Sourcing freely redistributable vector outlines for the countries of the world, states/provinces in particular countries (USA in particular, but also other areas);

Creating an appropriate dataset interface (plus default implementation), a renderer, and integrating this with the existing XYPlot class in JFreeChart; testing, documenting, testing some more, documenting some more.

6.8.2. Time Series Chart Interactivity

Implement a new (to JFreeChart) feature for interactive time series charts — to display a separate control that shows a small version of ALL the time series data, with a sliding “view” rectangle that allows you to select the subset of the time series data to display in the main chart.

6.8.3. Dashboards

There is currently a lot of interest in dashboard displays. Create a flexible dashboard mechanism that supports a subset of JFreeChart chart types (dials, pies, thermometers, bars, and lines/time series) that can be delivered easily via both Java Web Start and an applet.

 

6.8.4. Property Editors

The property editor mechanism in JFreeChart only handles a small subset of the properties that can be set for charts. Extend (or reimplement) this mechanism to provide greater end-user control over the appearance of the charts.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION:

In this paper, we first proposed a novel and efficient SAMA based on ECC. While ensuring message sender privacy, SAMA can be applied to any message to provide message content authenticity. To provide hop-by-hop message authentication without the weakness of the built-in threshold of the polynomial-based scheme, we then proposed a hop-by-hop message authentication scheme based on the SAMA. When applied to WSNs with fixed sink nodes, we also discussed possible techniques for compromised node identification.

We compared our proposed scheme with the bivariate polynomial-based scheme through simulations using ns-2 and TelosB. Both theoretical and simulation results show that, in comparable scenarios, our proposed scheme is more efficient than the bivariate polynomial-based scheme in terms of computational overhead, energy consumption, delivery ratio, message delay, and memory consumption.

CHAPTER 9

REFERENCE:

[1] M. Albrecht, C. Gentry, S. Halevi, and J. Katz, “Attacking Cryptographic Schemes Based on ‘Perturbation Polynomials’,” Report 2009/098, http://eprint.iacr.org/, 2009.

[2]  “Cryptographic Key Length Recommendation,” http://www.keylength.com/en/3/, 2013.

[3]  W. Zhang, N. Subramanian, and G. Wang, “Lightweight and Compromise-Resilient Message Authentication in Sensor Networks,” Proc. IEEE INFOCOM, Apr. 2008.

[4]  H. Wang, S. Sheng, C. Tan, and Q. Li, “Comparing Symmetric-Key and Public-Key Based Security Schemes in Sensor Networks: A Case Study of User Access Control,” Proc. IEEE 28th Int’l Conf. Distributed Computing Systems (ICDCS), pp. 11-18, 2008.

[5] Q. Liu, Y. Ge, Z. Li, H. Xiong, and E. Chen, “Personalized Travel Package Recommendation,” Proc. IEEE 11th Int’l Conf. Data Mining (ICDM ’11), pp. 407-416, 2011.

 [6]  Q. Liu, E. Chen, H. Xiong, C. Ding, and J. Chen, “Enhancing Collaborative Filtering by User Interests Expansion via Personalized Ranking,” IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 1, pp. 218-233, Feb. 2012.

DESIGNING AN ARCHITECTURE FOR MONITORING PATIENTS AT HOME ONTOLOGIES AND WEB SERVICES FOR CLINICAL AND TECHNICAL MANAGEMENT INTEGRATION

By

A

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the                                                  FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

Of

MASTER OF TECHNOLOGY

IN

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

CERTIFICATE

Certified that this project report titled “Designing an Architecture for Monitoring Patients at Home Ontologies and Web Services for Clinical and Technical Management Integration” is the bonafide work of Mr. _____________ who carried out the research under my supervision. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide                                                                             Signature of the H.O.D

Name                                                                                                           Name

DECLARATION

I hereby declare that the project work entitled “Designing an Architecture for Monitoring Patients at Home Ontologies and Web Services for Clinical and Technical Management Integration” submitted to BHARATHIDASAN UNIVERSITY in partial fulfillment of the requirement for the award of the Degree of MASTER OF SCIENCE IN COMPUTER SCIENCE is a record of original work done by me under the guidance of Prof. A. Vinayagam M.Sc., M.Phil., M.E. To the best of my knowledge, the work reported here is not a part of any other thesis or work on the basis of which a degree or award was conferred on an earlier occasion to me or any other candidate.

                                                                                                        (Student Name)

                                                                                                             (Reg.No)

Place:

Date:

ACKNOWLEDGEMENT

I am extremely glad to present my project “Designing an Architecture for Monitoring Patients at Home Ontologies and Web Services for Clinical and Technical Management Integration”, which is a part of my curriculum of the third semester of the Master of Science in Computer Science. I take this opportunity to express my sincere gratitude to those who helped me in bringing out this project work.

I would like to express my sincere thanks to my Director, Dr. K. ANANDAN, M.A.(Eco.), M.Ed., M.Phil.,(Edn.), PGDCA., CGT., M.A.(Psy.), who gave me the opportunity to undertake this project.

I am highly indebted to the Co-ordinator, Prof. Muniappan, Department of Physics, and thank her from the bottom of my heart for the valuable comments I received throughout my project.

I wish to express my deep sense of gratitude to my guide,
Prof. A. Vinayagam M.Sc., M.Phil., M.E., for her immense help and encouragement towards the successful completion of this project.

I also express my sincere thanks to all the staff members of the Department of Computer Science for their kind advice.

Last, but not least, I express my deep gratitude to my parents and friends for their encouragement and support throughout the project.

CHAPTER 1

  1. ABSTRACT:

This paper presents the design and implementation of an architecture based on the combination of ontologies, rules, web services, and the autonomic computing paradigm to manage data in home-based telemonitoring scenarios.

The architecture includes two layers: 1) a conceptual layer and 2) a data and communication layer. On the one hand, the conceptual layer based on ontologies is proposed to unify the management procedure and integrate incoming data from all the sources involved in the telemonitoring process. On the other hand, the data and communication layer based on REST web service (WS) technologies is proposed to provide practical backup to the use of the ontology, to provide a real implementation of the tasks it describes and thus to provide a means of exchanging data (support communication tasks).

A case study regarding chronic obstructive pulmonary disease (COPD) data management is presented in order to evaluate the efficiency of the architecture. The proposed ontology-based solution defines a flexible and scalable architecture in order to address the main challenges presented in home-based telemonitoring scenarios, and thus provides a means to integrate, unify, and transfer data supporting both clinical and technical management tasks.

1.2 INTRODUCTION

Patient empowerment is considered a philosophy of health care based on the perspective that better outcomes are achieved when patients become active participants in their own health management. This new paradigm is a central idea in the European Union (EU) health strategy supported by international health organizations including the World Health Organization, and its effectiveness in yielding quality of care is an obvious and essential area of research. This new idea invites us to look for new ways of providing healthcare, e.g., by using information and communications technologies. In this context, home-based telemonitoring systems can be used as self-care management tools, while collaborative processes among healthcare personnel and patients are maintained, so that the patient’s safe control is guaranteed. Telemonitoring systems face the problem of delivering medicine to the growing population with chronic conditions while at the same time covering the dimensions of quality of care and supporting new paradigms such as empowerment.

By periodically collecting patients’ clinical data at their home sites and transferring them to physicians located at remote sites, supervision of the patient’s health status and provision of feedback are possible. This type of telemedicine system guarantees patient control while reducing costs and avoiding hospital overflows. These two sites (home site and healthcare site) comprise a typical home-based telemonitoring system. At the home site, data acquired by using medical devices (MDs), together with the patient’s feedback, are collected in a concentrator device (home gateway, HG) used to evaluate and/or transfer the acquired data outside the patient’s home if necessary. At the healthcare site, a server device is used to manage information from the home site as well as to manage and store the patient’s monitoring guidelines defined by physicians (telemonitoring server, TS). In fact, this telemonitoring process, and consequently the evolution of the patient’s health status, is managed through the indications or monitoring guidelines provided by physicians.

Although significant contributions have been made in this field in recent decades, telemedicine, and e-health scenarios in general, still pose numerous challenges that need to be addressed by researchers in order to take maximum advantage of the benefits that these systems provide and to support their long-term implementation. Interoperability and integration are critical challenges that also need to be addressed when developing monitoring systems in order to provide effective healthcare and to make possible seamless communication among the different heterogeneous health entities that participate in the monitoring process. This integration should be addressed at both end sites of the scenario but also in the communication link, thus integrating the way of transferring and exchanging information efficiently between them.

Providing personalized care services and taking into account the patient’s context have been identified as additional requirements. Furthermore, apart from clinical data aspects, technical issues should also be addressed in this scenario. Technical management of all the devices that comprise the telemonitoring scenario (e.g., the MDs and HG) is an important task that may or may not be integrated under the same architecture as clinical management. Hence, at this technical level, research is still required to address these challenges. Consequently, there is a need for the development of new telemonitoring architectures.

Great efforts have been made in recent years in developing standards to deal with interoperability at different points of the e-health communication infrastructure, such as the ISO/IEEE 11073 (X73) standard for MD interoperability, the openEHR initiative for storage, management and retrieval of electronic health record (EHR) information, or the standardized Health Level Seven (HL7) messages to solve clinical data transfers. Nevertheless, additional efforts are required to enable them to work together and ultimately provide a higher level of integration.

Specifically, in this telemonitoring scenario, there is no unique standard-based solution to address data and management integration. Since several standards can be used (some of them in combination with proprietary protocols or other standards) at different points of this scenario, the interoperability problem remains unsolved unless these standards merge into one, or alignments and combinations of them are defined. According to Berges et al., interoperability does not mean having a unique representation but a semantically acknowledged equivalent one. That is the reason we propose in this study an ontology-based architecture: to provide common knowledge about the exchanged data and the management of such data. This ontology constitutes that semantically equivalent knowledge model. Then, at both ends of the architecture, other standards could be used for other management purposes by relating this model with the specific desired approach. Using this alternative, a knowledge model is first provided that avoids aligning models two by two, while all of them remain related through the main ontology.

Ontology-based solutions have become popular over the past few years. Ontologies provide a higher level of abstraction and have been successfully used in telemonitoring scenarios and other areas to provide knowledge representation and semantic integration, and thus a common understanding of the data exchanged by all the entities. Furthermore, their combination with rules allows personalized management services, and thus personalized care, to be provided. Although there are works that describe the details of an ontology approach in this domain, they do not devote much attention to the architecture implementation and the communication used to exchange the information described. Consequently, few works have given details about this practical implementation of the ontology-based system, which may be of interest for the development of other ontology-based applications in and outside the e-health domain.

This paper presents an ontology-driven architecture to integrate data management and enable its communication in a telemonitoring scenario. The proposed architecture includes two layers: the conceptual layer (the ontology) and the communication and data layer. The conceptual layer uses the HOTMES ontology and the extensions introduced for it. Specifically, the OWL-DL language was selected to define this ontology model. The second layer is based on WS technologies. WSs have been successfully used in network management and also in other works to exchange data modeled by ontologies. However, our proposal, inspired by the representational state transfer (REST) style and based on a generic communication method, provides a different design approach that may be reusable for other systems based on ontologies. Furthermore, security issues have been considered. The aim is to define a flexible and scalable architecture in order to address the main challenges presented in home-based telemonitoring scenarios and thus provide a means to integrate and transfer data supporting both clinical and technical data management.

1.3 LITERATURE SURVEY

AUTHOR AND PUBLICATION: J. D. Trigo, I. Martínez, A. Alesanco, A. Kollmann, J. Escayola, D. Hayn, G. Schreier, and J. García, “AN INTEGRATED HEALTHCARE INFORMATION SYSTEM FOR END-TO-END STANDARDIZED EXCHANGE AND HOMOGENEOUS MANAGEMENT OF DIGITAL ECG FORMATS,” IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 4, pp. 518–529, Jul. 2012.

EXPLANATION:

This paper investigates the application of the enterprise information system (EIS) paradigm to standardized cardiovascular condition monitoring. There are many specifications in cardiology, particularly in the ECG standardization arena. The existence of ECG formats, however, does not guarantee the implementation of homogeneous, standardized solutions for ECG management. In fact, hospital management services need to cope with various ECG formats and, moreover, several different visualization applications. This heterogeneity hampers the normalization of integrated, standardized healthcare information systems, hence the need for finding an appropriate combination of ECG formats and suitable EIS-based software architecture that enables standardized exchange and homogeneous management of ECG formats. Determining such a combination is one objective of this paper.

We develop the integrated healthcare information system that satisfies the requirements posed by the previous determination. The ECG formats selected include ISO/IEEE11073, Standard Communications Protocol for Computer-Assisted Electrocardiography, and an ECG ontology. The EIS-enabling techniques and technologies selected include web services, simple object access protocol, extensible markup language, or business process execution language. Such a selection ensures the standardized exchange of ECGs within, or across, healthcare information systems while providing modularity and accessibility.

AUTHOR AND PUBLICATION: D. Riaño, F. Real, J. A. López-Vallverdú, F. Campana, S. Ercolani, P. Mecocci, R. Annicchiarico, and C. Caltagirone, “AN ONTOLOGY-BASED PERSONALIZATION OF HEALTH-CARE KNOWLEDGE TO SUPPORT CLINICAL DECISIONS FOR CHRONICALLY ILL PATIENTS,” J. Biomed. Informat., vol. 45, no. 3, pp. 429–446, 2012.

EXPLANATION:

Chronically ill patients are complex health care cases that require the coordinated interaction of multiple professionals. A correct intervention for this sort of patient entails the accurate analysis of the conditions of each concrete patient and the adaptation of evidence-based standard intervention plans to these conditions. There are some other clinical circumstances, such as wrong diagnoses, unobserved comorbidities, missing information, unobserved related diseases or prevention, whose detection depends on the deductive capacities of the professionals involved. In this paper, we introduce an ontology for the care of chronically ill patients and implement two personalization processes and a decision support tool. The first personalization process adapts the contents of the ontology to the particularities observed in the health-care record of a given concrete patient, automatically providing a personalized ontology containing only the clinical information that is relevant for health-care professionals to manage that patient. The second personalization process uses the personalized ontology of a patient to automatically transform intervention plans describing health-care general treatments into individual intervention plans. For comorbid patients, this process concludes with the semi-automatic integration of several individual plans into a single personalized plan. Finally, the ontology is also used as the knowledge base of a decision support tool that helps health-care professionals to detect anomalous circumstances such as wrong diagnoses, unobserved comorbidities, missing information, unobserved related diseases, or preventive actions. Seven health-care centers participating in the K4CARE project, together with the group SAGESA and the Local Health System in the town of Pollenza, have served as the validation platform for these two processes and the tool. Health-care professionals participating in the evaluation agree about the average quality 84% (5.9/7.0) and utility 90% (6.3/7.0) of the tools and also about the correct reasoning of the decision support tool, according to clinical standards.

AUTHOR AND PUBLICATION: I. Berges, J. Bermudez, and A. Illarramendi, “TOWARDS SEMANTIC INTEROPERABILITY OF ELECTRONIC HEALTH RECORDS,” IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 3, pp. 424–431, May 2012.

EXPLANATION:

Although the goal of achieving semantic interoperability of electronic health records (EHRs) is pursued by many researchers, it has not been accomplished yet. In this paper, we present a proposal that smoothes out the way toward the achievement of that goal. In particular, our study focuses on medical diagnoses statements. In summary, the main contributions of our ontology-based proposal are the following: first, it includes a canonical ontology whose EHR-related terms focus on semantic aspects. As a result, their descriptions are independent of languages and technology aspects used in different organizations to represent EHRs. Moreover, those terms are related to their corresponding codes in well-known medical terminologies. Second, it deals with modules that allow obtaining rich ontological representations of EHR information managed by proprietary models of health information systems. The features of one specific module are shown as reference. Third, it considers the necessary mapping axioms between ontological terms enhanced with so-called path mappings. This feature smoothes out structural differences between heterogeneous EHR representations, allowing proper alignment of information.

AUTHOR AND PUBLICATION: N. Lasierra, A. Alesanco, J. García, and D. O’Sullivan, “DATA MANAGEMENT IN HOME SCENARIOS USING AN AUTONOMIC ONTOLOGY-BASED APPROACH,” in Proc. of the 9th IEEE Int. Conf. Pervasive Workshop on Manag. Ubiquitous Commun. Services part of PerCom, 2012, pp. 94–99.

EXPLANATION:

An ontology-based approach to deal with data and management procedure integration in home-based scenarios is presented in this paper. The proposed ontology not only provides a means to represent exchanged data but also unifies the way of accessing, controlling, evaluating and transferring information remotely. The structure of this ontology has been inspired by the autonomic computing paradigm; thus, it describes the tasks that comprise the MAPE (Monitor, Analyze, Plan and Execute) process. Furthermore, the use of SPARQL (SPARQL Protocol and RDF Query Language) is proposed in this paper to express conditions and rules that determine the performance of these tasks according to each situation. Finally, two practical application cases of the proposed ontology-based approach are presented.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Telemonitoring systems face the problem of delivering medicine to the growing population with chronic conditions while at the same time covering the dimensions of quality of care and supporting new paradigms such as empowerment. By periodically collecting patients’ clinical data at their home sites and transferring them to physicians located at remote sites, supervision of the patient’s health status and provision of feedback are possible.

This type of telemedicine system guarantees patient control while reducing costs and avoiding hospital overflows. These two sites (home site and healthcare site) comprise a typical home-based telemonitoring system. At the home site, data acquired by using MDs, together with the patient’s feedback, are collected in a concentrator device (HG) used to evaluate and/or transfer the acquired data outside the patient’s home if necessary.

2.1.1 DISADVANTAGES:

  • Existing models for chronic diseases pose several technology-oriented challenges for home-based care, where assistance services rely on a close collaboration among different stakeholders, such as health operators, patient relatives, and social community members.
  • An ontology-based context model and a related context management system providing a configurable and extensible service-oriented framework to ease the development of applications for monitoring and handling patient chronic conditions.
  • The system has been developed in a prototypal version, and integrated with a service platform for supporting operators of home-based care networks in cooperating and sharing patient-related information and coordinating mutual interventions for handling critical and alarm situations.


2.2 PROPOSED SYSTEM:

We present an ontology-driven architecture to integrate data management and enable its communication in a telemonitoring scenario. It enables not only the integration of the patient’s clinical data management but also the technical data management of all the devices included in the scenario. The proposed architecture includes two layers: the conceptual layer (the ontology) and the communication and data layer.

The conceptual layer uses the HOTMES ontology and the extensions introduced for it; specifically, the OWL-DL language was selected to define this ontology model. The second layer is based on WS technologies. WSs have been successfully used in network management and also in other works to exchange data modeled by ontologies. Our proposal, inspired by the representational state transfer (REST) style and based on a generic communication method, provides a different design approach that may be reusable for other systems based on ontologies.

Furthermore, security issues have been considered. The aim is to define a flexible and scalable architecture in order to address main challenges presented in home-based telemonitoring scenarios and thus provide a means to integrate and transfer data supporting both clinical and technical data management.

2.2.1 ADVANTAGES:

Ontologies provide a higher level of abstraction and have been successfully used in telemonitoring scenarios and other areas to provide knowledge representation and semantic integration, and thus a common understanding of the data exchanged by all the entities. Furthermore, their combination with rules allows personalized management services, and thus personalized care, to be provided.

We describe the details of an ontology approach in this domain, whereas other works do not devote much attention to the architecture implementation and the communication used to exchange the information described.

Our implementation of the ontology-based system may be of interest for the development of other ontology-based applications in and outside the e-health domain. The conceptual layer uses the ontology for interpreting the data transferred in the communication between the end sites of the architecture, while the data and communication layer deals with data management and transmission.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

  • Processor                          –    Pentium IV
  • Speed                              –    1.1 GHz
  • RAM                                –    256 MB (min)
  • Hard Disk                          –    20 GB
  • Floppy Drive                       –    1.44 MB
  • Key Board                          –    Standard Windows Keyboard
  • Mouse                              –    Two or Three Button Mouse
  • Monitor                            –    SVGA

 

2.3.2 SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP or Win7
  • Front End                                :           JAVA JDK 1.7
  • Back End                                :           MYSQL Server
  • Server                                      :           Apache Tomcat Server
  • Script                                       :           JSP Script
  • Document                               :           MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

  • The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
  • The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, any external entity that interacts with the system, and the information flows in the system.
  • The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • A DFD may be used to represent a system at any level of abstraction and may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures, or devices that produce data; the physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.


3.1 ARCHITECTURE DIAGRAM

3.2 DATAFLOW DIAGRAM

ADMIN:

USER:

UML DIAGRAMS:

3.3 USE CASE DIAGRAM:


3.4 SEQUENCE DIAGRAM:

3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

ONTOLOGIES:

According to one of the most widely accepted definitions of ontologies in computer science, ontology can be described as “an explicit and formal specification of a shared conceptualization”.  In simple words, ontologies represent concepts and basic relationships for the purpose of comprehension of a common knowledge area. To develop an ontology means to formalize a common view of a certain domain.

1) OWL Language: In computer science, there are plenty of formal languages that can be used to define and construct ontologies. These languages allow the knowledge contained in an ontology to be encoded in a simple and formal way. However, the standardized RDF and OWL have been gaining popularity in the semantic web world. An ontology can be formally described in OWL using the following basic elements: 1) classes; 2) individuals; and 3) properties. These elements are used to describe concepts, instances or members of a class, and relationships between individuals of two classes (object properties) or to link individuals with datatype values (datatype properties), respectively. Apart from these basic elements, OWL provides class descriptors used to precisely describe OWL classes, which include property restrictions (value and cardinality constraints), class axioms, property axioms, and properties over individuals.
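As an illustrative sketch only (the namespace, class, property and individual names below are invented and do not come from the HOTMES ontology), these basic OWL elements can be created programmatically with the Apache Jena ontology API:

import org.apache.jena.ontology.DatatypeProperty;
import org.apache.jena.ontology.Individual;
import org.apache.jena.ontology.ObjectProperty;
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;

public class OntologySketch {
    public static void main(String[] args) {
        String ns = "http://example.org/telemonitoring#";   // hypothetical namespace
        OntModel model = ModelFactory.createOntologyModel();

        // Classes describe the concepts of the domain.
        OntClass patient = model.createClass(ns + "Patient");
        OntClass measurement = model.createClass(ns + "Measurement");

        // An object property relates individuals of two classes;
        // a datatype property links an individual to a literal value.
        ObjectProperty hasMeasurement = model.createObjectProperty(ns + "hasMeasurement");
        DatatypeProperty hasValue = model.createDatatypeProperty(ns + "hasValue");

        // Individuals are members (instances) of a class.
        Individual p1 = model.createIndividual(ns + "patient1", patient);
        Individual m1 = model.createIndividual(ns + "fev1Measurement1", measurement);
        p1.addProperty(hasMeasurement, m1);
        m1.addLiteral(hasValue, 2.1);

        model.write(System.out, "TURTLE");
    }
}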

2) Rules: Generally, ontology-based solutions combine knowledge presented in ontologies with dynamic knowledge presented by the use of rules. A system based on the use of rules usually contains a set of if-then rules (which indicate what should be done according to a situation) and a rule engine used to apply them. By using rules, the behavior of individuals can be expressed inside a domain. Hence, they can be used to generate new knowledge and can also be used to provide personalized services. One of the most popular languages for rules definition is SWRL.

However, in our study, we used SPARQL to define some rules. Although SPARQL is a query language, it can be used as a rule language by combining the CONSTRUCT clause and FILTER restrictions. On the one hand, the CONSTRUCT query form returns a single RDF graph built by taking the specified graph template and instantiating it with the results of matching the graph pattern of the query. On the other hand, the FILTER clause can be used to restrict solutions to those for which the filter expression evaluates to TRUE. Only if the filter function evaluates to true is the solution included in the solution sequence. Note that although this language was good enough for our purpose, its limitations should be studied for other purposes (e.g., recursive tasks), and the adequacy of SWRL could be studied for complex applications.
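A minimal sketch of such a rule, executed with Jena (the namespace, property names and the 1.5 threshold are invented for illustration and do not correspond to the actual clinical rules), is shown below:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

public class SparqlRuleSketch {
    public static void main(String[] args) {
        String ns = "http://example.org/telemonitoring#";   // hypothetical namespace
        Model model = ModelFactory.createDefaultModel();
        Resource m1 = model.createResource(ns + "fev1Measurement1");
        m1.addLiteral(model.createProperty(ns + "hasValue"), 1.2);

        // CONSTRUCT builds new triples from a graph template; FILTER keeps only the
        // solutions for which the expression evaluates to true (the if-part of the rule).
        String rule =
            "PREFIX t: <" + ns + "> " +
            "CONSTRUCT { ?m t:isAbnormal true } " +
            "WHERE { ?m t:hasValue ?v . FILTER (?v < 1.5) }";

        QueryExecution qe = QueryExecutionFactory.create(rule, model);
        Model inferred = qe.execConstruct();
        qe.close();

        model.add(inferred);   // feed the derived facts back into the knowledge base
        model.write(System.out, "TURTLE");
    }
}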

WEB SERVICES

Web services are used in this study as software technology to access and exchange information modeled by the ontology. According to the W3C, a WS is a “software system designed to support interoperable machine-to-machine interaction over a communication network”. Systems may interact with the web services by exchanging SOAP messages serialized in XML for its message format and sent over other application layer protocols, usually HTTP. Although SOAP-based web services are the most popular types of WSs, there are other styles of programming a WS such as the REST style.

1) REST Style for Designing Web Services: REST is a style of software architecture for distributed hypermedia systems such as the World Wide Web, first defined in 2000 by Fielding. This style is based on the idea of transferring representations of resources, a resource being any item of interest. Among the key advantages of the REST architecture are the scalability of components and the generality of interfaces. Although REST was initially described in the context of HTTP, this paradigm can be applied to other protocols or implementations. Web services can also be described using this style. A WS implemented using HTTP and the principles of the REST architecture is designated a REST(ful) WS. Requests made from the client and responses from the WS are used to transfer resource information. Each resource is identified through a URI. Stateless behavior, representation of data using XML and/or JSON, and the explicit use of HTTP methods (PUT, GET, POST, DELETE) to exchange resources are the key characteristics of a REST(ful) WS.
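As a hedged sketch of such an exchange (the resource URI below is a placeholder and not the actual interface of the telemonitoring server), a REST(ful) GET request can be issued from Java with HttpURLConnection:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical URI identifying a management-profile resource.
        URL resource = new URL("http://example.org/ws/profiles/patient1");

        HttpURLConnection get = (HttpURLConnection) resource.openConnection();
        get.setRequestMethod("GET");                       // retrieve a representation of the resource
        get.setRequestProperty("Accept", "application/xml");

        System.out.println("Status: " + get.getResponseCode());
        try (BufferedReader in = new BufferedReader(new InputStreamReader(get.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        get.disconnect();
    }
}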

4.1 MODULES:

MANAGEMENT PROFILE:

DATA AND COMMUNICATION LAYER:

HG AND TS MANAGEMENT MODULES:

COMMUNICATION FLOW AND WORKFLOW:

4.3 MODULE DESCRIPTION:

CLINICAL MANAGEMENT PROFILE:

COPD patients were identified as candidates to be monitored at home sites. From a clinical point of view, it was an interesting case study (some estimations suggest that up to 10% of the European population suffers from COPD). From a technical point of view, the case of the COPD patient led to the definition of a complex technical management profile (because different MDs are required to be used by the patient) and provided an interesting option to test the performance of the agent. Hence, one patient profile was designed according to the clinical HOTMES ontology and one technical management profile was designed according to the technical HOTMES ontology.

The patient profile includes the required tasks to monitor a COPD patient, such as controlling the FEV1 measurement in order to detect the presence and severity of the airway obstruction. It was configured by a primary care physician by means of published clinical guidelines. The patient profile included 15 monitoring tasks, 11 analysis tasks, 9 planning tasks, and 3 execution tasks. This configuration led to the inclusion of 144 new instances and the configuration of 18 rules. The details of this profile and its evaluation to configure other types of profiles can be found elsewhere. The technical management profile was designed to monitor the state of the MDs used by the COPD patient (weighing scale, blood pressure monitor, pulse-oximeter, and glucometer) and the consumption of resources of the corresponding HG. In addition, rules were configured and 83 new instances were required to be configured in the technical management profile, following the application of the HOTMES ontology for technical tasks.

DATA AND COMMUNICATION LAYER:

In the data layer, the communication between the end sites is established using WS technologies. Consequently, a WS has been designed to be placed in the TS, and a web client has been designed to be installed in the HG (to establish communication with the TS). This communication allows the HG to request its associated management profile from the TS and to transmit acquired information from the HG to the TS.

A REST WS was developed in order to enhance the scalability and flexibility of the architecture and improve the performance (efficiency). This WS comprises and defines a set of operations over the following resources: an OWL ontology, the rules (transferred by means of an XML), OWL individuals (sent by the IndividualWS structure), properties datatype values corresponding to an individual (identified by the URI of the individual and the URI of the property sent in a string generic type), and inform messages to provide some control functions to the web pair communication.

Each one of these resources was identified by an URI, and a set of operations was defined for each particular resource using HTTP methods (e.g., GET or PUT). This WS interface allows information described in the ontology to be exchanged in a generic manner. This is one key that contributes to the reusability and easy extension of the architecture. Described communication methods do not depend on the knowledge itself described in the ontology (related to the service) but on the fact of using an ontology to represent such knowledge. A summary of the resources and defined operations is depicted in Table I. As mentioned in the description of the converter module, individuals are exchanged by using a developed structure designated as IndividualWS. Using OWL language, an individual of the ontology can be described as a member of a class with individual axioms or facts as individual property values (datatype and object properties).

HG AND TS MANAGEMENT MODULES:

Two management modules and web technology modules inside the HG and the TS constitute the main parts of the telemedicine system (see Fig. 1). The modules that comprise the architecture have been developed using .NET technologies. Specifically, the .NET framework (version 3.5) has been used to process the ontology and create new instances, data acquisition, and manipulation when the rules are applied. Regarding the web modules the components of the remote management module installed in the TS are depicted in Fig. 1. This management module includes the following three components:

1) Ontology knowledge base module: This module contains the ontology knowledge models and the instances of the registered management profiles. The TDB triple-store has been used to store the ontology model and new instances in this knowledge base module.

2) Converter module: The communication module of this architecture is mainly based on OWL instances exchanged generically by means of a developed object structure named IndividualWS. The converter module is used to wrap and unwrap the individuals structure used to exchange information with web clients. Furthermore, this module incorporates some reasoning tasks. Ontology-based reasoning is used in order to check instances before including new information in the model and to ensure the consistency of the model.

3) Rules module: This module is used to store the rules associated with each management profile. These rules are subsequently transferred by means of an XML file. As shown in Fig. 1, an additional GUI is required in order to make the process of defining the profiles and the rules easier for the EM, whether technical or clinical (physician). We are currently working on the development of this GUI, combining ontology visualization techniques and usability methods. The components of the management module installed in the HG are likewise depicted in Fig. 1. This last management module has been designated the “Semantic Autonomic Agent.” This module plays a key role in the architecture. It is in charge of integrating incoming data and executing the management tasks described in the management profile.
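Regarding the TDB triple-store mentioned for the ontology knowledge base module above, a brief hedged sketch of opening a persistent dataset from Jena is given below (the storage directory and resource URIs are placeholders, not the actual configuration of the system):

import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.tdb.TDBFactory;
import org.apache.jena.vocabulary.RDF;

public class TripleStoreSketch {
    public static void main(String[] args) {
        // Open (or create) a file-backed dataset that persists the ontology model on disk.
        Dataset dataset = TDBFactory.createDataset("data/knowledge-base");

        dataset.begin(ReadWrite.WRITE);
        try {
            Model model = dataset.getDefaultModel();
            // Register a new management-profile instance (illustrative URIs).
            Resource profile = model.createResource("http://example.org/telemonitoring#profile1");
            profile.addProperty(RDF.type,
                    model.createResource("http://example.org/telemonitoring#ManagementProfile"));
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}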

The communication between this agent and the management module installed at the remote site is established through a web client connection to the WS installed in the remote TS. The architecture of the agent comprises the ontology knowledge base module, the rules module, the converter module, and the following modules.

1) MAPE module: This module constitutes the computing core of the agent. It is used to run the tasks specified in each management profile, hence executing the closed MAPE loop process.

2) Integrator module: Information transferred by MDs and also contextual data provided by patients will be acquired in this module, which integrates data coming from different data sources.

3) Reminders and alarms module: This module includes clock functionalities to ask patients about data (reminders) or to collect information from a specific software resource.

4) Actions module: This last module is used to execute actions described within the execution tasks of the management profile if an abnormal finding occurs.

FLOW AND WORKFLOW PERFORMANCE:

Fig. 3 shows the workflow among all the modules and sources involved in the management procedure. The first step (see Fig. 3) consists of downloading the management profile (patient profile or technical profile). First of all, an instance of the management profile should be configured by an EM placed at a remote site. Furthermore, a set of individual rules should be configured for each particular management purpose. As shown in Fig. 3, the designed GUI helps the physician with the ontology instantiation process and the rules definition. The outputs of this interface (which uses selected classes of the ontology as a navigation tool) are a personalized management profile and a set of rules gathered in an XML file. Other functionalities, such as queries over acquired data or crossing data among patients to take some decisions, could be of interest to include in this tool.

The communication is always initiated by the user (the web client at the HG). Through a connection to the web service, the user (the patient in the telemonitoring scenario) situated at the home site acquires the required management profile. As shown in Fig. 3, if the user requests an update of his/her management profile, then the version of the profile available at the TS is requested for evaluation (GET property value). When the user requests a new management profile, it is first checked whether the ontology needed to download it is available (GET ontology). After that, the rules and the management profile are downloaded as required.

The methods involved are 1) GET (rules) and 2) GET (individual). Note that the TLS authentication phase is not depicted in Fig. 3, but it is initially carried out in order to allow the web client connection to the web service. As depicted in Fig. 3, the associated management profile is extracted from the ontology and the instances of the ontology managed by Jena are wrapped into the IndividualWS structure through the converter module. Once the management profile is in the HG, it will be processed into the converter module, unwrapped, and inserted as individuals managed by Jena in the ontology. Once the management profile has been included in the ontology knowledge base module of the HG, it will be evaluated in the MAPE module and the management procedure will be performed by running the tasks specified in the profile.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase and business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis the feasibility study of the proposed system is to be carried out. This is to ensure that the proposed system is not a burden to the company.  For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are 

  • ECONOMICAL FEASIBILITY
  • TECHNICAL FEASIBILITY
  • SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:     

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. Thus the developed system is well within the budget, and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

 

5.1.2 TECHNICAL FEASIBILITY   

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:  

The aspect of study is to check the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, instead must accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system.

5.2 SYSTEM TESTING:

Testing is a process of checking whether the developed system works according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes a logical assumption that if all the parts of the system are correct, the goal will be successfully achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until many months later.

This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best programs are worthless if they do not produce the correct outputs.

5.2.1 UNIT TESTING:

Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.

Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile and test data correctly and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logical. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.

5.2.2 FUNCTIONAL TESTING:

Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.
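As a small hedged sketch of this expected-versus-actual comparison (the method under test and its values are invented for illustration), a JUnit 4 test can automate the check:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class FunctionalTestSketch {

    // Hypothetical unit under test: formats a 32-bit address as dotted decimal.
    static String format(int address) {
        return ((address >>> 24) & 0xFF) + "." + ((address >>> 16) & 0xFF) + "."
             + ((address >>> 8) & 0xFF) + "." + (address & 0xFF);
    }

    @Test
    public void actualOutputMatchesExpectedOutput() {
        // The expected value is desk-calculated by hand and compared with the program output.
        assertEquals("192.0.2.1", format(0xC0000201));
    }
}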

Description: Test for all modules.
Expected result: All peers should communicate in the group.

Description: Test for various peers in a distributed network framework, as it displays all users available in the group.
Expected result: The result after execution should give the accurate result.


5.2.3 NON-FUNCTIONAL TESTING:

Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:

  • Load testing
  • Performance testing
  • Usability testing
  • Reliability testing
  • Security testing

5.2.4 LOAD TESTING:

An important tool for implementing system tests is a Load generator. A Load generator is essential for testing quality requirements such as performance and stress. A load can be a real load, that is, the system can be put under test to real usage by having actual telephone users connected to it. They will generate test input data for system test.

Description: It is necessary to ascertain that the application behaves correctly under loads when the ‘Server busy’ response is received.
Expected result: Should designate another active node as a Server.


5.2.5 PERFORMANCE TESTING:

Performance tests are utilized in order to determine the widely defined performance of the software system such as execution time associated with various parts of the code, response time and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time and using an acceptable level of resources; it is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.


5.2.6 RELIABILITY TESTING:

Software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, and this is ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. It forms a part of the work of the software quality control team.

Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.


5.2.7 SECURITY TESTING:

Security testing evaluates system characteristics that relate to the availability, integrity and confidentiality of the system data and services. Users/Clients should be encouraged to make sure their security needs are very clearly known at requirements time, so that the security issues can be addressed by the designers and testers.

Description: Checking that the user identification is authenticated.
Expected result: In case of failure, it should not be connected in the framework.

Description: Check whether group keys in a tree are shared by all peers.
Expected result: The peers should know the group key in the same group.


5.2.8 WHITE BOX TESTING:

White box testing, sometimes called glass-box testing, is a test case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases. White box testing focuses on the inner structure of the software to be tested.

Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.

Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.

Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.


5.2.9 BLACK BOX TESTING:

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors relating to inputs, outputs, and the principal functions of a software module. The starting point of black box testing is either a specification or code. The contents of the box are hidden, and the stimulated software should produce the desired results.

Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.

Description: To check for interface errors.
Expected result: The entire interface must function normally.

Description: To check for errors in data structures or external database access.
Expected result: Database update and retrieval must be performed correctly.

Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out during development, as documentation and institutionalization of the proposed goals and related policies are essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

 

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

 

The Java Programming Language

 

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

  • Simple
    • Architecture neutral
    • Object oriented
    • Portable
    • Distributed     
    • High performance
    • Interpreted     
    • Multithreaded
    • Robust
    • Dynamic
    • Secure     

With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, first you translate a program into an intermediate language called Java byte codes —the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java byte code instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java byte codes help make “write once, run anywhere” possible. You can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or on an iMac.
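As a trivial illustration of this compile-once, run-anywhere cycle (the class name is arbitrary), the same source file is compiled once to bytecodes and then interpreted on any Java VM:

// Compile once to platform-independent bytecodes:  javac Hello.java
// Run the bytecodes on any Java VM:                java Hello
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello from the Java VM");
    }
}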

6.2 THE JAVA PLATFORM:

A platform is the hardware or software environment in which a program runs. We’ve already mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it’s a software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

  • The Java Virtual Machine (Java VM)
  • The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, What Can Java Technology Do?, highlights the functionality that some of the packages in the Java API provide.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, after compilation, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The most common types of programs written in the Java programming language are applets and applications. If you’ve surfed the Web, you’re probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs.

An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet.

A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server.
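As a rough sketch of what such a servlet looks like, the following class uses the classic javax.servlet API; the class name, URL mapping, and output are illustrative only, and the servlet jar is assumed to be on the classpath:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A minimal servlet: the Web server invokes doGet() for each HTTP GET request
// sent to the URL this servlet is mapped to.
public class HelloServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body><h1>Hello from a servlet</h1></body></html>");
    }
}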

How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:

  • The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
  • Applets: The set of conventions used by applets.
  • Networking: URLs, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
  • Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
  • Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
  • Software components: Known as JavaBeans™, these can plug into existing component architectures.
  • Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
  • Java Database Connectivity (JDBC™): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

 

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

We can’t promise you fame, fortune, or even a job if you learn the Java programming language. Still, it is likely to make your programs better and to require less effort than other languages. We believe that Java technology will help you do the following:

  • Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
  • Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
  • Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
  • Develop programs more quickly: Your development time may be cut roughly in half compared with writing the same program in C++, because you write fewer lines of code and the language itself is simpler than C++.
  • Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java™ Product Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
  • Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
  • Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

 

6.5 ODBC:

 

Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application developers and database systems providers. Before ODBC became a de facto standard for Windows programs to interface with database systems, programmers had to use proprietary languages for each database they wanted to connect to. Now, ODBC has made the choice of the database system almost irrelevant from a coding perspective, which is as it should be. Application developers have much more important things to worry about than the syntax that is needed to port their program from one database to another when business needs suddenly change.

Through the ODBC Administrator in Control Panel, you can specify the particular database that is associated with a data source that an ODBC application program is written to use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a particular database. For example, the data source named Sales Figures might be a SQL Server database, whereas the Accounts Payable data source could refer to an Access database. The physical database referred to by a data source can reside anywhere on the LAN.

The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you set up a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program, and each maintains a separate list of ODBC data sources.

From a programming perspective, the beauty of ODBC is that the application can be written to use the same set of function calls to interface with any data source, regardless of the database vendor. The source code of the application doesn’t change whether it talks to Oracle or SQL Server. We only mention these two as an example. There are ODBC drivers available for several dozen popular database systems. Even Excel spreadsheets and plain text files can be turned into data sources. The operating system uses the Registry information written by ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the data source (such as the interface to Oracle or SQL Server). The loading of the ODBC drivers is transparent to the ODBC application program. In a client/server environment, the ODBC API even handles many of the network issues for the application programmer.
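To make this concrete from the Java side, the historical JDBC-ODBC bridge (shipped with older JDKs and removed in Java 8) let a Java program open an ODBC data source by its DSN. The sketch below assumes a DSN named SalesFigures has been created in the ODBC Administrator and that it exposes a sales table with region and total columns; all of these names are illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OdbcExample {
    public static void main(String[] args) throws Exception {
        // Load the (historical) bridge driver, then connect by DSN name.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        try (Connection con = DriverManager.getConnection("jdbc:odbc:SalesFigures");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT region, total FROM sales")) {
            while (rs.next()) {
                System.out.println(rs.getString("region") + " = " + rs.getInt("total"));
            }
        }
    }
}

The same source code would work against an Oracle or SQL Server data source, because the ODBC driver behind the DSN is resolved at run time.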

The advantages of this scheme are so numerous that you are probably thinking there must be some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to the native database interface. ODBC has had many detractors make the charge that it is too slow. Microsoft has always claimed that the critical factor in performance is the quality of the driver software that is used. In our humble opinion, this is true. The availability of good ODBC drivers has improved a great deal recently. And anyway, the criticism about performance is somewhat analogous to those who said that compilers would never match the speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers get faster every year.

6.6 JDBC:

In an effort to set an independent database standard API for Java; Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of “plug-in” database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he or she must provide the driver for each platform that the database and Java run on.

To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you discovered earlier in this chapter, ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster than developing a completely new connectivity solution.

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

The remainder of this section will cover enough information about JDBC for you to know what it is about and how to use it effectively. This is by no means a complete overview of JDBC. That would fill an entire book.

 

6.7 JDBC Goals:

Few software packages are designed without goals in mind, and JDBC is no exception; its many goals drove the development of the API. These goals, in conjunction with early reviewer feedback, have finalized the JDBC class library into a solid framework for building database applications in Java.

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The design goals for JDBC are as follows:

SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although not the lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be created. Conversely, it is at a high enough level for application programmers to use it confidently. Attaining this goal allows for future tool vendors to “generate” JDBC code and to hide many of JDBC’s complexities from the end user.

SQL Conformance

SQL syntax varies as you move from database vendor to database vendor. In an effort to support a wide variety of vendors, JDBC will allow any query statement to be passed through it to the underlying database driver. This allows the connectivity module to handle non-standard functionality in a manner that is suitable for its users.

JDBC must be implementable on top of common database interfaces

The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This interface would translate JDBC calls to ODBC and vice versa.

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that the design of JDBC should be very simple, allowing for only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

Use strong, static typing wherever possible

Strong typing allows more error checking to be done at compile time; as a result, fewer errors appear at runtime.

Keep the common cases simple

Because more often than not, the usual SQL calls used by the programmer are simple SELECT’s, INSERT’s, DELETE’s and UPDATE’s, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.
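As a hedged sketch of these common cases, the following fragment performs one parameterized INSERT and one SELECT through plain JDBC; the connection URL, table, and columns are placeholders for whatever driver and schema are actually in use:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcCommonCases {
    public static void main(String[] args) throws Exception {
        // Any registered JDBC driver and URL can be used here.
        try (Connection con = DriverManager.getConnection("jdbc:odbc:SalesFigures")) {
            // Simple INSERT with a parameterized statement.
            try (PreparedStatement ins = con.prepareStatement(
                    "INSERT INTO sales (region, total) VALUES (?, ?)")) {
                ins.setString(1, "North");
                ins.setInt(2, 1200);
                ins.executeUpdate();
            }
            // Simple SELECT, iterating over the result set.
            try (PreparedStatement sel = con.prepareStatement("SELECT region, total FROM sales");
                 ResultSet rs = sel.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + ": " + rs.getInt(2));
                }
            }
        }
    }
}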

Finally, we decided to proceed with the implementation using Java Networking.

For dynamically updating the cache table, we use an MS Access database.

Java has two things: a programming language and a platform.

Java is a high-level programming language that is all of the following: simple, architecture-neutral, object-oriented, portable, distributed, high-performance, interpreted, multithreaded, robust, dynamic, and secure.

Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes, the platform-independent code that is parsed and run on the computer.

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

6.8 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

The IP layer provides a connectionless and unreliable delivery system. It considers each datagram independently of the others. Any association between datagrams must be supplied by the higher layers. The IP layer supplies a checksum that includes its own header. The header includes the source and destination addresses. The IP layer handles routing through an Internet. It is also responsible for breaking up large datagrams into smaller ones for transmission and reassembling them at the other end.

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.
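A minimal Java sketch of this datagram model is shown below; the destination host, port, and payload are illustrative only:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpSendExample {
    public static void main(String[] args) throws Exception {
        byte[] payload = "hello".getBytes();
        InetAddress dest = InetAddress.getByName("localhost");
        try (DatagramSocket socket = new DatagramSocket()) {
            // UDP adds a checksum and port numbers on top of IP; delivery is not guaranteed.
            socket.send(new DatagramPacket(payload, payload.length, dest, 9876));
        }
    }
}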

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.
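The following sketch shows the client end of such a virtual circuit in Java; the host name, port, and messages are placeholders, and a matching server must be listening for the code to do anything useful:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class TcpClientExample {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 9876);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            // TCP gives a reliable, ordered byte stream between the two processes.
            out.println("hello over a reliable virtual circuit");
            System.out.println("server replied: " + in.readLine());
        }
    }
}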

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address, with 24 bits left over for other addressing. Class B uses 16-bit network addressing. Class C uses 24-bit network addressing, and class D addresses are reserved for multicast.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.
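A small sketch of this addressing arithmetic follows; the example address is arbitrary, and the class boundaries follow the historical classful ranges described above:

public class Ipv4Example {
    public static void main(String[] args) {
        int address = 0xC0A80A01;                 // 192.168.10.1 held as a 32-bit integer
        int o1 = (address >>> 24) & 0xFF;
        int o2 = (address >>> 16) & 0xFF;
        int o3 = (address >>> 8) & 0xFF;
        int o4 = address & 0xFF;
        System.out.println(o1 + "." + o2 + "." + o3 + "." + o4);   // dotted-quad form

        // Historical class is determined by the first octet's range.
        String ipClass = (o1 < 128) ? "A" : (o1 < 192) ? "B" : (o1 < 224) ? "C" : "D/E";
        System.out.println("class " + ipClass);
    }
}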

Port addresses

A service exists on a host, and is identified by its port. This is a 16 bit number. To send a message to a server, you send it to the port for that service of the host that it is running on. This is not location transparency! Certain of these ports are “well known”.

Sockets:

A socket is a data structure maintained by the system to handle network connections. A socket is created using the call socket. It returns an integer that is like a file descriptor. In fact, under Windows, this handle can be used with the ReadFile and WriteFile functions.

#include <sys/types.h>
#include <sys/socket.h>

/* Creates an endpoint for communication and returns a descriptor. */
int socket(int family, int type, int protocol);

Here “family” will be AF_INET for IP communications, protocol will be zero, and type will depend on whether TCP or UDP is used. Two processes wishing to communicate over a network create a socket each. These are similar to two ends of a pipe – but the actual pipe does not yet exist.

6.9 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.
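A minimal usage sketch, assuming the JFreeChart (1.0.x) and JCommon jars are on the classpath; the chart title, data values, and output file name are illustrative:

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.data.general.DefaultPieDataset;

public class PieChartExample {
    public static void main(String[] args) throws Exception {
        DefaultPieDataset dataset = new DefaultPieDataset();
        dataset.setValue("Java", 60);
        dataset.setValue("Other", 40);
        // Arguments: title, dataset, legend, tooltips, URLs.
        JFreeChart chart = ChartFactory.createPieChart("Sample chart", dataset, true, true, false);
        ChartUtilities.saveChartAsPNG(new File("sample-chart.png"), chart, 500, 300);
    }
}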

 

6.9.1 Map Visualizations:

Charts showing values that relate to geographical areas. Some examples include: (a) population density in each state of the United States, (b) income per capita for each country in Europe, (c) life expectancy in each country of the world. The tasks in this project include: Sourcing freely redistributable vector outlines for the countries of the world, states/provinces in particular countries (USA in particular, but also other areas);

Creating an appropriate dataset interface (plus default implementation), a renderer, and integrating this with the existing XYPlot class in JFreeChart; testing, documenting, testing some more, documenting some more.

6.9.2 Time Series Chart Interactivity

Implement a new (to JFreeChart) feature for interactive time series charts — to display a separate control that shows a small version of ALL the time series data, with a sliding “view” rectangle that allows you to select the subset of the time series data to display in the main chart.

6.9.3 Dashboards

There is currently a lot of interest in dashboard displays. Create a flexible dashboard mechanism that supports a subset of JFreeChart chart types (dials, pies, thermometers, bars, and lines/time series) that can be delivered easily via both Java Web Start and an applet.

 

6.9.4 Property Editors

The property editor mechanism in JFreeChart only handles a small subset of the properties that can be set for charts. Extend (or reimplement) this mechanism to provide greater end-user control over the appearance of the charts.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION:

This study describes architecture to enable data integration and its management in an ontology-driven telemonitoring solution implemented in home-based scenarios. This is an innovative architecture that facilitates the integration of several management services at home sites using the same software engine. The architecture has been specifically studied to support both technical and clinical services in the telemonitoring scenario, thus avoiding installing additional software for technical purposes.

On the one hand, the HOTMES ontology, used at the conceptual layer to describe a management profile, contributes to integrating data and its management, offering benefits in terms of knowledge representation, workflow organization, and self-management capabilities of the system. Its combination with rules allows personalized services to be provided.

This application ontology could be improved in the future by introducing concepts from a domain ontology. On the other hand, the data and communication layer of the architecture, based on REST web services, was oriented toward minimizing the consumption of resources and providing reusable key ideas for future ontology-based architecture developments.

8.2 FUTURE ENHANCEMENT

This solution represents a further step toward establishing more effective home-based telemonitoring systems and thus improving the remote care of patients with chronic diseases. As reported in the literature, good telemedicine implementations are developed after a process in which the dynamic interaction among a combination of socio-technical and clinical factors is optimized. This means that additional work should be done (e.g., to measure the interaction of the patient and doctor using the system, and the truthfulness of the system over a long period of time) before adopting this solution in a real scenario. To complete its development, first, a concordance study should be conducted in order to determine its clinical efficiency. Then, a social impact study should be conducted in order to determine how far the system improves the patient's quality of life. Regarding these last studies, the results presented in the literature evidence the benefits of telemonitoring systems while linking their success to usability design issues and features.

CHAPTER 9

9.1 REFERENCES

[1] I. Martinez et al., “Seamless integration of ISO/IEEE11073 personal health devices and ISO/EN13606 electronic health records into an end-to-end interoperable solution,” Telemed. J. E. Health, vol. 16, no. 10, pp. 993–1004, 2010.

[2] M. Figueredo and J. Dias, “Service oriented architecture to support real-time implementation of artifact detection in critical care monitoring,” in Proc. IEEE Annu. Int. Conf. Eng. Med. Biol. Soc., 2011, pp. 4925–4928.

[3] J. D. Trigo, I. Martínez, A. Alesanco, A. Kollmann, J. Escayola, D. Hayn, G. Schreier, and J. García, “An integrated healthcare information system for end-to-end standardized exchange and homogeneous management of digital ECG formats,” IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 4, pp. 518–529, Jul. 2012.

[4] F. Paganelli and D. Giuli, “An ontology-based system for context-aware and configurable services to support home-based continuous care,” IEEE Trans. Inf. Technol. Biomed., vol. 15, no. 2, pp. 324–333, 2011.

[5] D. Riaño, F. Real, J. A. López-Vallverdú, F. Campana, S. Ercolani, P. Mecocci, R. Annicchiarico, and C. Caltagirone, “An ontology-based personalization of health-care knowledge to support clinical decisions for chronically ill patients,” J. Biomed. Informat., vol. 45, no. 3, pp. 429–446, 2012.

[6] I. Berges, J. Bermudez, and A. Illarramendi, “Towards semantic interoperability of electronic health records,” IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 3, pp. 424–431, May 2012.

[7] G. Mulligan and D. Gracanin, “A comparison of SOAP and REST implementations of a service based interaction independence middleware framework,” in Proc. Winter Simul. Conf., 2009, pp. 1423–1432.


Behavioral Malware Detection in Delay Tolerant Networks

BEHAVIORAL MALWARE DETECTION IN DELAY TOLERANT NETWORKS

By

A

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the                                                  FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

Of

MASTER OF TECHNOLOGY

IN

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

BONAFIDE CERTIFICATE

Certified that this project report titled “BEHAVIORAL MALWARE DETECTION IN DELAY TOLERANT NETWORKS” is the bonafide work of Mr. _____________ who carried out the research under my supervision. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide                                                                             Signature of the H.O.D

Name                                                                                                           Name

CHAPTER 1

1.0 ABSTRACT:

The delay-tolerant-network (DTN) model is becoming a viable communication alternative to the traditional infrastructural model for modern mobile consumer electronics equipped with short-range communication technologies such as Bluetooth, NFC, and Wi-Fi Direct. Proximity malware is a class of malware that exploits the opportunistic contacts and distributed nature of DTNs for propagation. Behavioral characterization of malware is an effective alternative to pattern matching in detecting malware, especially when dealing with polymorphic or obfuscated malware.

In this paper, we first propose a general behavioral characterization of proximity malware, which is based on the naive Bayesian model that has been successfully applied in non-DTN settings such as filtering email spam and detecting botnets. We identify two unique challenges for extending Bayesian malware detection to DTNs (“insufficient evidence versus evidence collection risk” and “filtering false evidence sequentially and distributedly”), and propose a simple yet effective method, look ahead, to address the challenges. Furthermore, we propose two extensions to look ahead, dogmatic filtering and adaptive look ahead, to address the challenge of “malicious nodes sharing false evidence.” Real mobile network traces are used to verify the effectiveness of the proposed methods.

1.1 INTRODUCTION

MALWARE:

Malware, short for malicious software, is any software used to disrupt computer operation, gather sensitive information, or gain access to private computer systems. It can appear in the form of executable code, scripts, active content, and other software. ‘Malware’ is a general term used to refer to a variety of forms of hostile or intrusive software. The term badware is sometimes used, and applied to both true (malicious) malware and unintentionally harmful software. Malware includes viruses, worms, trojan horses, ransomware, spyware, adware, scareware, and other malicious programs; the majority of active malware threats are worms or trojans rather than viruses. In law, malware is sometimes known as a computer contaminant, as in the legal codes of several U.S. states. Malware is often disguised as, or embedded in, non-malicious files.

Spyware or other malware is sometimes found embedded in programs supplied officially by companies, e.g., downloadable from websites, that appear useful or attractive, but may have, for example, additional hidden tracking functionality that gathers marketing statistics. An example of such software, which was described as illegitimate, is the Sony rootkit, a Trojan embedded into CDs sold by Sony, which silently installed and concealed itself on purchasers’ computers with the intention of preventing illicit copying; it also reported on users’ listening habits, and created vulnerabilities that were exploited by unrelated malware.

PURPOSES:

Many early infectious programs, including the first Internet Worm, were written as experiments or pranks. Today, malware is used by both black hat hackers and governments to steal personal, financial, or business information, and sometimes for sabotage (e.g., Stuxnet). Malware is sometimes used broadly against government or corporate websites to gather guarded information, or to disrupt their operation in general. However, malware is often used against individuals to gain information such as personal identification numbers or details, bank or credit card numbers, and passwords. Left unguarded, personal and networked computers can be at considerable risk against these threats. (These are most frequently defended against by various types of firewalls, anti-virus software, and network hardware.)

Since the rise of widespread broadband Internet access, malicious software has more frequently been designed for profit. Since 2003, the majority of widespread viruses and worms have been designed to take control of users’ computers for illicit purposes. Infected “zombie computers” are used to send email spam, to host contraband data such as child pornography, or to engage in distributed denial-of-service attacks as a form of extortion.

Programs designed to monitor users’ web browsing, display unsolicited advertisements, or redirect affiliate marketing revenues are called spyware. Spyware programs do not spread like viruses; instead they are generally installed by exploiting security holes. They can also be packaged together with user-installed software, such as peer-to-peer applications.

Ransomware affects an infected computer in some way, and demands payment to reverse the damage. For example, programs such as CryptoLocker encrypt files securely, and only decrypt them on payment of a substantial sum of money.

INFECTIOUS MALWARE: VIRUSES AND WORMS:

The best-known types of malware, viruses and worms, are known for the manner in which they spread, rather than any specific types of behavior. The term computer virus is used for a program that embeds itself in some other executable software (including the operating system itself) on the target system without the user's consent, and which, when run, causes the virus to spread to other executables. On the other hand, a worm is a stand-alone malware program that actively transmits itself over a network to infect other computers. These definitions lead to the observation that a virus requires the user to run an infected program or operating system for the virus to spread, whereas a worm spreads itself.

CONCEALMENT: Viruses, trojan horses, rootkits, and backdoors

TROJAN HORSES

For a malicious program to accomplish its goals, it must be able to run without being detected, shut down, or deleted. When a malicious program is disguised as something normal or desirable, users may unwittingly install it. This is the technique of the Trojan horse or trojan. In broad terms, a Trojan horse is any program that invites the user to run it, concealing harmful or malicious executable code of any description. The code may take effect immediately and can lead to many undesirable effects, such as encrypting the user’s files or downloading and implementing further malicious functionality.

In the case of some spyware, adware, etc., the supplier may require the user to acknowledge or accept its installation, describing its behavior in loose terms that may easily be misunderstood or ignored, with the intention of deceiving the user into installing it without the supplier technically being in breach of the law.

ROOTKITS

Once a malicious program is installed on a system, it is essential that it stays concealed, to avoid detection. Software packages known as rootkits allow this concealment, by modifying the host’s operating system so that the malware is hidden from the user. Rootkits can prevent a malicious process from being visible in the system’s list of processes, or keep its files from being read.

Some malicious programs contain routines to defend against removal, not merely to hide them. An early example of this behavior is recorded in the Jargon File tale of a pair of programs infesting a Xerox CP-V time sharing system:

Each ghost-job would detect the fact that the other had been killed, and would start a new copy of the recently-stopped program within a few milliseconds. The only way to kill both ghosts was to kill them simultaneously (very difficult) or to deliberately crash the system.

BACKDOORS

A backdoor is a method of bypassing normal authentication procedures, usually over a connection to a network such as the Internet. Once a system has been compromised, one or more backdoors may be installed in order to allow access in the future [30], invisibly to the user.

The idea has often been suggested that computer manufacturers preinstall backdoors on their systems to provide technical support for customers, but this has never been reliably verified. It was reported in 2014 that US government agencies had been diverting computers purchased by those considered “targets” to secret workshops where software or hardware permitting remote access by the agency was installed; this was considered to be among the most productive operations to obtain access to networks around the world. Backdoors may be installed by Trojan horses, worms, implants, or other methods.

Malware authors target bugs, or loopholes, to exploit. A common method is exploitation of a buffer-overflow vulnerability, where software designed to store data in a specified region of memory does not prevent more data than the buffer can accommodate from being supplied. Malware may provide data that overflows the buffer, with malicious executable code or data after the end; when this payload is accessed, it does what the attacker, not the legitimate software, determines.

INSECURE DESIGN OR USER ERROR

Early PCs had to be booted from floppy disks; when built-in hard drives became common the operating system was normally started from them, but it was possible to boot from another boot device if available, such as a floppy disk, CD-ROM, DVD-ROM, or USB flash drive. It was common to configure the computer to boot from one of these devices when available. Normally none would be available; the user would intentionally insert, say, a CD into the optical drive to boot the computer in some special way, for example to install an operating system. Even without booting, computers can be configured to execute software on some media as soon as they become available, e.g. to autorun a CD or USB device when inserted.

Malicious software distributors would trick the user into booting or running from an infected device or medium; for example, a virus could make an infected computer add autorunnable code to any USB stick plugged into it; anyone who then attached the stick to another computer set to autorun from USB would in turn become infected, and also pass on the infection in the same way. More generally, any device that plugs into a USB port, “including gadgets like lights, fans, speakers, toys, even a digital microscope,” can be used to spread malware. Devices can be infected during manufacturing or supply if quality control is inadequate.

This form of infection can largely be avoided by setting up computers by default to boot from the internal hard drive, if available, and not to autorun from devices. Intentional booting from another device is always possible by pressing certain keys during boot.

Older email software would automatically open HTML email containing potentially malicious JavaScript code; users may also execute disguised malicious email attachments and infected executable files supplied in other ways.

1.2 SCOPE OF THE PROJECT:

We present a general behavioral characterization of proximity malware, which captures the functional but imperfect nature of malware detection. Based on this characterization, and with a simple cut-off malware containment strategy, we formulate the malware detection process as a distributed decision problem. We analyze the risk associated with the decision, and design a simple, yet effective, strategy, look ahead, which naturally reflects individual nodes’ intrinsic risk inclinations against malware infection. Look ahead extends the naive Bayesian model, and addresses the DTN-specific, malware-related, “insufficient evidence versus evidence collection risk” problem.

We consider the benefits of sharing assessments among nodes, and address challenges derived from the DTN model: liars (i.e., bad-mouthing and false-praising malicious nodes) and defectors (i.e., good nodes that have turned rogue due to malware infections). We present two alternative techniques, dogmatic filtering and adaptive look ahead, that naturally extend look ahead to consolidate evidence provided by others, while containing the negative effect of false evidence. A nice property of the proposed evidence consolidation methods is that the results will not worsen even if liars are the majority in the neighborhood. Real mobile network traces are used to verify the effectiveness of the methods.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Existing worms, spam, and phishing exploit gaps in traditional threat models that usually revolve around preventing unauthorized access and information disclosure. The new threat landscape requires security researchers to consider a wider range of attacks: opportunistic attacks in addition to targeted ones; attacks coming not just from malicious users, but also from subverted (yet otherwise benign) hosts; coordinated/distributed attacks in addition to isolated, single-source methods; and attacks blending flaws across layers, rather than exploiting a single vulnerability. Some of the largest security lapses in the last decade are due to designers ignoring the complexity of the threat landscape.

The increasing penetration of wireless networking, and more specifically wifi, may soon reach critical mass, making it necessary to examine whether the current state of wireless security is adequate for fending off likely attacks.

Three types of threats that seem insufficiently addressed by existing technology and deployment techniques. The first threat is wildfire worms, a class of worms that spreads contagiously between hosts on neighboring APs. We show that such worms can spread to a large fraction of hosts in a dense urban setting, and that the propagation speed can be such that most existing defenses cannot react in a timely fashion. Worse, such worms can penetrate through networks protected by WEP and other security mechanisms. The second threat we discuss is large-scale spoofing attacks that can be used for massive phishing and spam campaigns. We show how an attacker can easily use a botnet by acquiring access to wifi-capable zombie hosts, and can use these zombies to target not just the local wireless LAN, but any LAN within range, greatly increasing his reach across heterogeneous networks.

2.2 DISADVANTAGES:

  • Viruses can cause many problems on your computer. Usually, they display pop-up ads on your desktop or steal your information. Some of the nastier ones can even crash your computer or delete your files.
  • Your computer gets slowed down. Many “hackers” get jobs with software firms by finding and exploiting problems with software.
  • Some applications won’t start (e.g., the “I hate Mozilla” virus won’t let you start Mozilla), and you cannot see some of the settings in your OS (e.g., one kind of virus disables the hide-folder option, and you will never be able to set it).

To quantify these threats, we rely on real-world data extracted from wifi maps of large metropolitan areas in the country. Existing  results suggest that a carefully crafted wireless worm can infect up to 80% of all wifi connected hosts in some metropolitan areas within 20 minutes, and that an attacker can launch phishing attacks or build a tracking system to monitor the location of 10-50% of wireless users in these metropolitan areas with just 1,000 zombies under his control.

2.3 PROPOSED SYSTEM:

In this paper, we present a simple, yet effective solution, look ahead, which naturally reflects individual nodes’ intrinsic risk inclinations against malware infection, to balance between these two extremes. Essentially, we extend the naive Bayesian model, which has been applied in filtering email spam, detecting botnets, and designing IDSs.

We analyze the risk associated with the decision, and design a simple, yet effective, strategy, look ahead, which naturally reflects individual nodes’ intrinsic risk inclinations against malware infection. Look ahead extends the naive Bayesian model, and addresses the DTN-specific, malware-related, “insufficient evidence versus evidence collection risk” problem. Proximity malware is a malicious program that disrupts the host node’s normal function and has a chance of duplicating itself to other nodes during (opportunistic) contact opportunities between nodes in the DTN.

We consider the benefits of sharing assessments among nodes, and address challenges derived from the DTN model: liars (i.e., bad-mouthing and false-praising malicious nodes) and defectors (i.e., good nodes that have turned rogue due to malware infections). We present two alternative techniques, dogmatic filtering and adaptive look ahead, that naturally extend look ahead to consolidate evidence provided by others, while containing the negative effect of false evidence. A nice property of the proposed evidence consolidation methods is that the results will not worsen even if liars are the majority in the neighborhood. Real mobile network traces are used to verify the effectiveness of the methods.

2.4 ADVANTAGES:

We address two DTN-specific, malware-related challenges:

1. Insufficient evidence versus evidence collection risk. In DTNs, evidence (such as Bluetooth connection or SSH session requests) is collected only when nodes come into contact. But contacting malware-infected nodes carries the risk of being infected. Thus, nodes must make decisions (such as whether to cut off other nodes and, if yes, when) online based on potentially insufficient evidence.

2. Filtering false evidence sequentially and distributedly. Sharing evidence among opportunistic acquaintances helps alleviate the aforementioned insufficient evidence problem; however, false evidence shared by malicious nodes (the liars) may negate the benefits of sharing. In DTNs, nodes must decide whether to accept received evidence sequentially and distributedly.
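The sketch below is only an illustration of the cut-off idea under these constraints, not the paper's exact look-ahead algorithm: a node records its own binary assessments of a neighbor, smooths them with a uniform prior, and cuts the neighbor off once enough evidence has accumulated and the estimated suspiciousness crosses a locally chosen threshold. The class name, prior, and parameters are assumptions made for the example:

public class CutoffDecision {
    private int suspicious = 0;   // assessments judged "suspicious"
    private int total = 0;        // all assessments collected so far

    // Record one binary assessment from a direct encounter.
    public void record(boolean suspiciousAction) {
        total++;
        if (suspiciousAction) suspicious++;
    }

    // Posterior mean of the neighbor's suspiciousness under a Beta(1,1) prior.
    public double estimatedSuspiciousness() {
        return (suspicious + 1.0) / (total + 2.0);
    }

    // Cut off only when enough evidence exists and the estimate is high;
    // both parameters reflect the node's own risk inclination.
    public boolean shouldCutOff(int minEvidence, double threshold) {
        return total >= minEvidence && estimatedSuspiciousness() > threshold;
    }
}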

2.5 HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor                          –    Pentium IV
  • Speed                              –    1.1 GHz
  • RAM                                –    256 MB (min)
  • Hard Disk                          –    20 GB
  • Floppy Drive                       –    1.44 MB
  • Keyboard                           –    Standard Windows Keyboard
  • Mouse                              –    Two or Three Button Mouse
  • Monitor                            –    SVGA

 

SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP
  • Front End                                :           JAVA JDK 1.7
  • Document                               :           MS-Office 2007


CHAPTER 3

3.0 SYSTEM DESIGN

ARCHITECTURE DIAGRAM / DATA FLOW DIAGRAM / UML DIAGRAM:

  • The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
  • The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.
  • DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

 

PROCESS:

People, procedures or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.

3.0 ARCHITECTURE DIAGRAM


DATAFLOW DIAGRAM:

LEVEL 1

        Server                                                                                                             Client

LEVEL 2


 

UML DIAGRAMS:                                                     

USE CASE DIAGRAM:

CLASS DIAGRAM:

SEQUENCE DIAGRAM:


ACTIVITY DIAGRAMS:


CHAPTER 4

4.0 IMPLEMENTATION

4.1 MODULES:

DELAY TOLERANT NETWORKS:

PROXIMITY MALWARE:

MALWARE PROPAGATION:

BAYESIAN FILTERING:

4.1 MODULE DESCRIPTION:

DELAY TOLERANT NETWORKS:

Delay-tolerant networking (DTN) is an approach to computer network architecture that seeks to address the technical issues in heterogeneous networks that may lack continuous network connectivity. Examples of such networks are those operating in mobile or extreme terrestrial environments, or planned networks in space.

Recently, the term disruption-tolerant networking has gained currency in the United States due to support from DARPA, which has funded many DTN projects. Disruption may occur because of the limits of wireless radio range, sparsity of mobile nodes, energy resources, attack, and noise.

Interest in DTNs has grown, including a growing number of academic conferences on delay- and disruption-tolerant networking, and growing interest in combining work from sensor networks and MANETs with work on DTNs. This field has seen many optimizations of classic ad hoc and delay-tolerant networking algorithms and has begun to examine factors such as security, reliability, verifiability, and other areas of research that are well understood in traditional computer networking.

PROXIMITY MALWARE:

Proximity malware is a malicious program that disrupts the host node’s normal function and has a chance of duplicating itself to other nodes during (opportunistic) contact opportunities between nodes in the DTN. When duplication occurs, the other node is infected with the malware. In our model, we assume that each node is capable of assessing the other party for suspicious actions after each encounter, resulting in a binary assessment. For example, a node can assess a Bluetooth connection or an SSH session for potential Cabir or Ikee infection.

The watchdog components in previous works on malicious behavior detection in MANETs and distributed reputation systems are other examples. A node is either evil or good, depending on whether it is infected by the malware. The suspicious-action assessment is assumed to be an imperfect but functional indicator of malware infections: It may occasionally assess an evil node’s actions as “non-suspicious” or a good node’s actions as “suspicious,” but most suspicious actions are correctly attributed to evil nodes. A previous work on distributed IDS presents an example of such an imperfect but functional binary classifier of nodes’ behaviors.

MALWARE PROPAGATION:

Previous work has analyzed malware propagation through proximity channels in social networks. Akritidis et al. quantified the threat of proximity malware in wide-area wireless networks, and related work studied optimal malware signature distribution in heterogeneous, resource-constrained mobile networks and in traditional, non-DTN networks.

Bayer et al. proposed to detect malware with a learned behavioral model, in terms of system calls and program flow. We extend the naive Bayesian model, which has been applied in filtering email spam, detecting botnets, and designing IDSs, and address DTN-specific, malware-related problems. In the context of detecting slowly propagating Internet worms, Dash et al. presented a distributed IDS architecture of local/global detectors that resembles the neighborhood-watch model, with the assumption of attested/honest evidence, i.e., without liars.

BAYESIAN FILTERING:

Naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.

The implications are as follows:

  • Given enough assessments, honest nodes are likely to obtain a close estimation of a node’s suspiciousness (supposing they have not cut the node off yet), even if they only use their own assessments.

  • The liars have to share a significant amount of false evidence to sway the public’s opinion on a node’s suspiciousness.

  • The most susceptible victims of liars are the nodes that have little evidence of their own.

Dogmatic filtering is based on the observation that one’s own assessments are truthful and, therefore, can be used to bootstrap the evidence consolidation process. A node shall only accept evidence that will not sway its current opinion too much. We call this observation the dogmatic principle.

We extend the naive Bayesian model, which has been applied in filtering email spam, detecting botnets, and designing IDSs, and address DTN-specific, malware-related problems. In the context of detecting slowly propagating Internet worms, Dash et al. presented a distributed IDS architecture of local/global detectors that resembles the neighborhood-watch model.
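The following is a simplified sketch of the dogmatic principle described above, not the exact consolidation rule from the paper: shared assessments are merged into the node's consolidated counts only if doing so would not move the current estimate by more than a locally chosen bound. The Beta(1,1) smoothing and the parameter names are assumptions made for the example:

public class DogmaticFilter {
    private double suspicious = 0, total = 0;   // consolidated evidence, bootstrapped from own assessments
    private final double maxShift;              // how far shared evidence may sway the opinion

    public DogmaticFilter(double maxShift) { this.maxShift = maxShift; }

    // A node's own assessments are trusted and always recorded.
    public void recordOwn(boolean suspiciousAction) {
        total++;
        if (suspiciousAction) suspicious++;
    }

    private double estimate(double s, double n) { return (s + 1.0) / (n + 2.0); }

    // Shared evidence is accepted only if it stays within the dogmatic bound.
    public boolean offerShared(int sharedSuspicious, int sharedTotal) {
        double current = estimate(suspicious, total);
        double merged = estimate(suspicious + sharedSuspicious, total + sharedTotal);
        if (Math.abs(merged - current) > maxShift) {
            return false;   // reject evidence that would sway the opinion too much
        }
        suspicious += sharedSuspicious;
        total += sharedTotal;
        return true;
    }
}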

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are      

  • ECONOMICAL FEASIBILITY
  • TECHNICAL FEASIBILITY
  • SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:                  

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. Thus the developed system is well within the budget, and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

5.1.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:  

This aspect of the study is to check the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but instead must accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system.

5.2 SYSTEM TESTING:

Testing is a process of checking whether the developed system works according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes a logical assumption that if all the parts of the system are correct, the goal will be successfully achieved. Inadequate testing, or no testing, leads to errors that may not appear until many months later. This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best program is worthless if it does not produce the correct outputs.

5.2.1 UNIT TESTING:

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile, process test data correctly, and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logical. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.

UNIT TESTING:

Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.

Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.2.2 FUNCTIONAL TESTING:

Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.

FUNCTIONAL TESTING:

Description: Test for all modules.
Expected result: All peers should communicate in the group.

Description: Test for various peers in a distributed network framework as it displays all users available in the group.
Expected result: The result after execution should give the accurate result.

5.2.3 NON-FUNCTIONAL TESTING:

Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:

  • Load testing
  • Performance testing
  • Usability testing
  • Reliability testing
  • Security testing


5.2.4 LOAD TESTING:

An important tool for implementing system tests is a load generator. A load generator is essential for testing quality requirements such as performance and stress. A load can be a real load; that is, the system can be put under test with real usage by having actual telephone users connected to it. They will generate test input data for the system test.

Load Testing

Description: It is necessary to ascertain that the application behaves correctly under load when a ‘Server busy’ response is received.
Expected result: Should designate another active node as the server.

5.2.5 PERFORMANCE TESTING:

Performance tests are utilized in order to determine the widely defined performance of the software system such as execution time associated with various parts of the code, response time and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

PERFORMANCE TESTING:

Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time and using an acceptable level of resources; it is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.

5.1.7 RELIABILITY TESTING:

Software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, and it is this ability that is ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. This activity forms a part of software quality control.

RELIABILITY TESTING:

Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.

5.1.8 SECURITY TESTING:

Security testing evaluates system characteristics that relate to the availability, integrity and confidentiality of the system data and services. Users/Clients should be encouraged to make sure their security needs are very clearly known at requirements time, so that the security issues can be addressed by the designers and testers.

SECURITY TESTING:

Description: Checking that the user identification is authenticated.
Expected result: In case of failure, it should not be connected in the framework.

Description: Check whether group keys in a tree are shared by all peers.
Expected result: The peers should know the group key in the same group.

5.1.9 WHITE BOX TESTING:

White box testing, sometimes called glass-box testing, is a test case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases. White box testing focuses on the inner structure of the software to be tested.

WHITE BOX TESTING:

Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.

Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.

Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.

5.1.10 BLACK BOX TESTING:

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors relating to the inputs, outputs, and principal functions of a software module. The starting point of black box testing is either a specification or code. The contents of the box are hidden, and the stimulated software should produce the desired results.

BLACK BOX TESTING:

Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.

Description: To check for interface errors.
Expected result: The entire interface must function normally.

Description: To check for errors in data structures or external database access.
Expected result: Database update and retrieval must be done correctly.

Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out during development, as the documentation and institutionalization of the proposed goals and related policies are essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

 

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

 

The Java Programming Language

 

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

  • Simple
  • Architecture neutral
  • Object oriented
  • Portable
  • Distributed
  • High performance
  • Interpreted
  • Multithreaded
  • Robust
  • Dynamic
  • Secure

With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, first you translate a program into an intermediate language called Java byte codes —the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java byte code instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java byte codes help make “write once, run anywhere” possible. You can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or on an iMac.
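
A minimal illustration of this compile-once, run-anywhere cycle is given below; the class name and printed text are only an example.

// HelloApp.java -- a minimal illustrative program
public class HelloApp {
    public static void main(String[] args) {
        System.out.println("Hello from the Java platform");
    }
}

// Compile once to platform-independent byte codes, then run on any Java VM:
//   javac HelloApp.java   (produces HelloApp.class byte codes)
//   java  HelloApp        (the Java VM interprets the byte codes)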

6.2 THE JAVA PLATFORM:

A platform is the hardware or software environment in which a program runs. We’ve already mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it’s a software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

  • The Java Virtual Machine (Java VM)
  • The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, What Can Java Technology Do?, highlights the functionality that some of the packages in the Java API provide.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, after compilation, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The most common types of programs written in the Java programming language are applets and applications. If you’ve surfed the Web, you’re probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs.

An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet.

A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server.
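
A minimal servlet sketch is shown below, assuming the standard javax.servlet API (for example, as provided by an Apache Tomcat installation) is on the classpath; the class name and response text are illustrative only.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet sketch: answers HTTP GET requests inside a Java Web server.
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>Hello from a servlet</body></html>");
    }
}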

How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:

  • The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
  • Applets: The set of conventions used by applets.
  • Networking: URLs, TCP (Transmission Control Protocol), UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
  • Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
  • Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
  • Software components: Known as JavaBeans™, these can plug into existing component architectures.
  • Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
  • Java Database Connectivity (JDBC™): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

 

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

We can’t promise you fame, fortune, or even a job if you learn the Java programming language. Still, it is likely to make your programs better, and it requires less effort than other languages. We believe that Java technology will help you do the following:

  • Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
  • Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
  • Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
  • Develop programs more quickly: Your development time may be as much as twice as fast as writing the same program in C++. Why? You write fewer lines of code, and Java is a simpler programming language than C++.
  • Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java™ Product Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
  • Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
  • Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

 

6.5 ODBC:

 

Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application developers and database systems providers. Before ODBC became a de facto standard for Windows programs to interface with database systems, programmers had to use proprietary languages for each database they wanted to connect to. Now, ODBC has made the choice of the database system almost irrelevant from a coding perspective, which is as it should be. Application developers have much more important things to worry about than the syntax that is needed to port their program from one database to another when business needs suddenly change.

Through the ODBC Administrator in Control Panel, you can specify the particular database that is associated with a data source that an ODBC application program is written to use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a particular database. For example, the data source named Sales Figures might be a SQL Server database, whereas the Accounts Payable data source could refer to an Access database. The physical database referred to by a data source can reside anywhere on the LAN.

The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you set up a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program, and each maintains a separate list of ODBC data sources.

From a programming perspective, the beauty of ODBC is that the application can be written to use the same set of function calls to interface with any data source, regardless of the database vendor. The source code of the application doesn’t change whether it talks to Oracle or SQL Server. We only mention these two as an example. There are ODBC drivers available for several dozen popular database systems. Even Excel spreadsheets and plain text files can be turned into data sources. The operating system uses the Registry information written by ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the data source (such as the interface to Oracle or SQL Server). The loading of the ODBC drivers is transparent to the ODBC application program. In a client/server environment, the ODBC API even handles many of the network issues for the application programmer.

The advantages of this scheme are so numerous that you are probably thinking there must be some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to the native database interface. ODBC has had many detractors make the charge that it is too slow. Microsoft has always claimed that the critical factor in performance is the quality of the driver software that is used. In our humble opinion, this is true. The availability of good ODBC drivers has improved a great deal recently. And anyway, the criticism about performance is somewhat analogous to those who said that compilers would never match the speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers get faster every year.

6.6 JDBC:

In an effort to set an independent database standard API for Java; Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of “plug-in” database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he or she must provide the driver for each platform that the database and Java run on.

To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you discovered earlier in this chapter, ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster than developing a completely new connectivity solution.

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

The remainder of this section will cover enough information about JDBC for you to know what it is about and how to use it effectively. This is by no means a complete overview of JDBC. That would fill an entire book.

 

6.7 JDBC Goals:

Few software packages are designed without goals in mind, and JDBC is no exception: its many goals drove the development of the API. These goals, in conjunction with early reviewer feedback, have finalized the JDBC class library into a solid framework for building database applications in Java.

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although not the lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be created. Conversely, it is at a high enough level for application programmers to use it confidently. Attaining this goal allows for future tool vendors to “generate” JDBC code and to hide many of JDBC’s complexities from the end user.

SQL Conformance

SQL syntax varies as you move from database vendor to database vendor. In an effort to support a wide variety of vendors, JDBC will allow any query statement to be passed through it to the underlying database driver. This allows the connectivity module to handle non-standard functionality in a manner that is suitable for its users.

JDBC must be implementable on top of common database interfaces

The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This interface would translate JDBC calls to ODBC and vice versa.

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that the design of JDBC should be very simple, allowing for only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

Use strong, static typing wherever possible

Strong typing allows for more error checking to be done at compile time; also, fewer errors appear at runtime.

Keep the common cases simple

Because more often than not, the usual SQL calls used by the programmer are simple SELECTs, INSERTs, DELETEs, and UPDATEs, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.
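
As a sketch of such a common case, the following shows a simple SELECT issued through JDBC. The connection URL, credentials, and the users table are placeholders assumed only for illustration and must be replaced with your own database details.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Simple JDBC usage: connect, run a SELECT, read the results, clean up.
public class SimpleQuery {
    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders; supply your own URL and credentials.
        String url = "jdbc:mysql://localhost:3306/sampledb";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id, name FROM users WHERE id = ?")) {
            ps.setInt(1, 42);                        // bind the query parameter
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}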

Finally, we decided to proceed with the implementation using Java networking.

And for dynamically updating the cache table, we use an MS Access database.

Java has two things: a programming language and a platform.

Java is a high-level programming language that is all of the following:

  • Simple
  • Architecture-neutral
  • Object-oriented
  • Portable
  • Distributed
  • High-performance
  • Interpreted
  • Multithreaded
  • Robust
  • Dynamic
  • Secure

Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes: platform-independent code instructions that are passed to and run on the computer.

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

6.8 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

The IP layer provides a connectionless and unreliable delivery system. It considers each datagram independently of the others. Any association between datagrams must be supplied by the higher layers. The IP layer supplies a checksum that includes its own header. The header includes the source and destination addresses. The IP layer handles routing through an Internet. It is also responsible for breaking up large datagrams into smaller ones for transmission and reassembling them at the other end.

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.
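
For illustration, the small sketch below prints a 32-bit address as four integers separated by dots; the sample value is assumed only as an example.

// Illustrative: print a 32-bit IP address as four integers separated by dots.
public class DottedQuad {
    public static void main(String[] args) {
        long address = 0xC0A80001L;          // example value, equal to 192.168.0.1
        long byte1 = (address >> 24) & 0xFF; // highest 8 bits (network part for class A)
        long byte2 = (address >> 16) & 0xFF;
        long byte3 = (address >> 8) & 0xFF;
        long byte4 = address & 0xFF;         // lowest 8 bits (host part on our subnet)
        System.out.println(byte1 + "." + byte2 + "." + byte3 + "." + byte4);
    }
}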

Port addresses

A service exists on a host, and is identified by its port. This is a 16 bit number. To send a message to a server, you send it to the port for that service of the host that it is running on. This is not location transparency! Certain of these ports are “well known”.

Sockets:

A socket is a data structure maintained by the system to handle network connections. A socket is created using the call socket. It returns an integer that is like a file descriptor. In fact, under Windows, this handle can be used with the ReadFile and WriteFile functions.

#include <sys/types.h>
#include <sys/socket.h>
int socket(int family, int type, int protocol);

Here “family” will be AF_INET for IP communications, protocol will be zero, and type will depend on whether TCP or UDP is used. Two processes wishing to communicate over a network create a socket each. These are similar to two ends of a pipe – but the actual pipe does not yet exist.
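
The call above shows the C interface. Since this project is implemented in Java, an equivalent client-side sketch using java.net.Socket is given below; the host, port, and message are placeholders used only for illustration.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// TCP client sketch: open a socket to a server, send one line, read the reply.
public class EchoClient {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 7);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("hello");              // send a request over the virtual circuit
            System.out.println(in.readLine()); // read the server's reply
        }
    }
}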

6.9 JFREECHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

  • A consistent and well-documented API, supporting a wide range of chart types;
  • A flexible design that is easy to extend, and targets both server-side and client-side applications;
  • Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.
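
A minimal usage sketch, assuming the JFreeChart 1.0.x API, is shown below: it builds a small dataset, creates a pie chart, and writes it to a PNG file. The chart title and data values are illustrative only.

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.data.general.DefaultPieDataset;

// Build a small dataset, create a pie chart, and save it as a PNG image file.
public class PieChartDemo {
    public static void main(String[] args) throws Exception {
        DefaultPieDataset dataset = new DefaultPieDataset();
        dataset.setValue("Category A", 70);   // illustrative values
        dataset.setValue("Category B", 30);
        JFreeChart chart = ChartFactory.createPieChart(
                "Sample Chart", dataset, true, true, false);
        ChartUtilities.saveChartAsPNG(new File("chart.png"), chart, 500, 300);
    }
}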

 

6.9.1 Map Visualizations:

Charts showing values that relate to geographical areas. Some examples include: (a) population density in each state of the United States, (b) income per capita for each country in Europe, (c) life expectancy in each country of the world. The tasks in this project include: Sourcing freely redistributable vector outlines for the countries of the world, states/provinces in particular countries (USA in particular, but also other areas);

Creating an appropriate dataset interface (plus default implementation), a renderer, and integrating this with the existing XYPlot class in JFreeChart; testing, documenting, testing some more, documenting some more.

6.9.2 Time Series Chart Interactivity

Implement a new (to JFreeChart) feature for interactive time series charts — to display a separate control that shows a small version of ALL the time series data, with a sliding “view” rectangle that allows you to select the subset of the time series data to display in the main chart.

6.9.3 Dashboards

There is currently a lot of interest in dashboard displays. Create a flexible dashboard mechanism that supports a subset of JFreeChart chart types (dials, pies, thermometers, bars, and lines/time series) that can be delivered easily via both Java Web Start and an applet.

 

6.9.4 Property Editors

The property editor mechanism in JFreeChart only handles a small subset of the properties that can be set for charts. Extend (or reimplement) this mechanism to provide greater end-user control over the appearance of the charts.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION

Behavioral characterization of malware is an effective alternative to pattern matching in detecting malware, especially when dealing with polymorphic or obfuscated malware. The naive Bayesian model has been successfully applied in non-DTN settings, such as filtering email spam and detecting botnets.

We propose a general behavioral characterization of DTN-based proximity malware. We present look ahead, along with dogmatic filtering and adaptive look ahead, to address two unique challenges in extending Bayesian filtering to DTNs: “insufficient evidence versus evidence collection risk” and “filtering false evidence sequentially and distributedly.” In prospect, extending the behavioral characterization of proximity malware to account for strategic malware detection evasion with game theory is a challenging yet interesting direction for future work.

CHAPTER 9

9.0 REFERENCES:

[1] C. Kolbitsch, P. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and Efficient Malware Detection at the End Host,” Proc. 18th Conf. USENIX Security Symp., 2009.

[2] U. Bayer, P. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, Behavior-Based Malware Clustering,” Proc. 16th Ann. Network and Distributed System Security Symp. (NDSS), 2009.

[3] G. Zyba, G. Voelker, M. Liljenstam, A. Méhes, and P. Johansson, “Defending Mobile Phones from Proximity Malware,” Proc. IEEE INFOCOM, 2009.

[4] F. Li, Y. Yang, and J. Wu, “CPMC: An Efficient Proximity Malware Coping Scheme in Smartphone-Based Mobile Networks,” Proc. IEEE INFOCOM, 2010.

[6] J. Zdziarski, Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. No Starch Press, 2005.

[7] R. Villamarín-Salomón and J. Brustoloni, “Bayesian Bot Detection Based on DNS Traffic Similarity,” Proc. ACM Symp. Applied Computing (SAC), 2013.

Automatic Scaling of Internet Applications for Cloud Computing Services

AUTOMATIC SCALING OF INTERNET APPLICATIONS FOR CLOUD

COMPUTING SERVICES

By

A

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the                                                  FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

Of

MASTER OF TECHNOLOGY

IN

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

CERTIFICATE

Certified that this project report titled “Automatic Scaling of Internet Applications for Cloud Computing Services” is the bonafide work of Mr. _____________Who carried out the research under my supervision Certified further, that to the best of my knowledge the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide                                                                             Signature of the H.O.D

Name                                                                                                           Name

DECLARATION

I hereby declare that the project work entitled “Automatic Scaling of Internet Applications for Cloud Computing Services” submitted to BHARATHIDASAN UNIVERSITY in partial fulfillment of the requirement for the award of the Degree of MASTER OF SCIENCE IN COMPUTER SCIENCE is a record of original work done by me under the guidance of Prof. A. Vinayagam M.Sc., M.Phil., M.E., and to the best of my knowledge, the work reported here is not a part of any other thesis or work on the basis of which a degree or award was conferred on an earlier occasion to me or any other candidate.

                                                                                                        (Student Name)

                                                                                                             (Reg.No)

Place:

Date:

ACKNOWLEDGEMENT

I am extremely glad to present my project “Automatic Scaling of Internet Applications for Cloud Computing Services” which is a part of my curriculum of third semester Master of Science in Computer science. I take this opportunity to express my sincere gratitude to those who helped me in bringing out this project work.

I would like to express my sincere thanks to my Director, Dr. K. ANANDAN, M.A.(Eco.), M.Ed., M.Phil.,(Edn.), PGDCA., CGT., M.A.(Psy.), who had given me an opportunity to undertake this project.

I am highly indebted to the Co-Ordinator, Prof. Muniappan, Department of Physics, and offer heartfelt thanks for the valuable comments I received throughout my project.

I wish to express my deep sense of gratitude to my guide                                                  
Prof. A.Vinayagam M.Sc., M.Phil., M.E., for her immense help and encouragement for successful completion of this project.

I also express my sincere thanks to all the staff members of the Department of Computer Science for their kind advice.

And last, but not the least, I express my deep gratitude to my parents and friends for their encouragement and support throughout the project.

CHAPTER 1

1.1 ABSTRACT:

Many Internet applications can benefit from an automatic scaling property where their resource usage can be scaled up and down automatically by the cloud service provider. We present a system that provides automatic scaling for Internet applications in the cloud environment. We encapsulate each application instance inside a virtual machine (VM) and use virtualization technology to provide fault isolation. We model it as the Class Constrained Bin Packing (CCBP) problem where each server is a bin and each class represents an application. The class constraint reflects the practical limit on the number of applications a server can run simultaneously.

We develop an efficient semi-online color set algorithm that achieves a good demand satisfaction ratio and saves energy by reducing the number of servers used when the load is low. Experiment results demonstrate that our system can improve throughput over an open source implementation of the Amazon EC2 auto scaling system and restore the normal QoS five times as fast during flash crowds. Large scale simulations demonstrate that our algorithm is extremely scalable, an order of magnitude improvement over traditional application placement algorithms in enterprise environments.

1.2 INTRODUCTION

One of the often cited benefits of a cloud computing service is resource elasticity: a business customer can scale its resource usage up and down as needed without upfront capital investment or long term commitment. The Amazon EC2 service, for example, allows users to buy as many virtual machine (VM) instances as they want and operate them much like physical hardware. However, the users still need to decide how many resources are necessary and for how long. We believe many Internet applications can benefit from an auto scaling property where their resource usage can be scaled up and down automatically by the cloud service provider. A user only needs to upload the application onto a single server in the cloud, and the cloud service will replicate the application onto more or fewer servers as its demand comes and goes. The users are charged only for what they actually use—the so-called “pay as you go” model.

The typical architecture of data center servers for Internet applications consists of a load balancing switch, a set of application servers, and a set of backend storage servers. The front end switch is typically a Layer 7 switch which parses application level information in Web requests and forwards them to the servers running the corresponding applications. The switch sometimes runs in a redundant pair for fault tolerance. Each application can run on multiple server machines, and the set of their running instances is often managed by some clustering software such as WebLogic.

Each server machine can host multiple applications. The applications store their state information in the backend storage servers. It is important that the applications themselves are stateless so that they can be replicated safely. The storage servers may also become overloaded, but the focus of this work is on the application tier. The Google AppEngine service, for example, requires that the applications be structured in such a two tier architecture and uses the BigTable as its scalable storage solution. A detailed comparison with AppEngine is deferred to Section 7 so that sufficient background can be established. Some distributed data processing applications cannot be mapped into such a tiered architecture easily and thus are not the target of this work. We believe our architecture is representative of a large set of Internet services hosted in the cloud computing environment.

Even though the cloud computing model is sometimes advocated as providing infinite capacity on demand, the capacity of data centers in the real world is finite. The illusion of infinite capacity in the cloud is provided through statistical multiplexing. When a large number of applications experience their peak demand around the same time, the available resources in the cloud can become constrained and some of the demand may not be satisfied. We define the demand satisfaction ratio as the percentage of application demand that is satisfied successfully. The amount of computing capacity available to an application is limited by the placement of its running instances on the servers.

The more instances an application has and the more powerful the underlying servers are, the higher the potential capacity for satisfying the application demand. On the other hand, when the demand of the applications is low, it is important to conserve energy by reducing the number of servers used. Various studies have found that the cost of electricity is a major portion of the operation cost of large data centers. At the same time, the average server utilization in many Internet data centers is very low: real world estimates range from 5% to 20%. Moreover, work has found that the most effective way to conserve energy is to turn the whole server off. The application placement problem is essential to achieving a high demand satisfaction ratio without wasting energy.

In this paper, we present a system that provides automatic scaling for Internet applications in the cloud environment. Our contributions include the following.

  • We summarize the automatic scaling problem in the cloud environment, and model it as a modified Class Constrained Bin Packing (CCBP) problem where each server is a bin and each class represents an application. We develop an innovative auto scaling algorithm to solve the problem and present a rigorous analysis of its quality with provable bounds. Compared to the existing Bin Packing solutions, we creatively support item departure, which can effectively avoid the frequent placement changes caused by repacking.
  • We support green computing by adjusting the placement of application instances adaptively and putting idle machines into the standby mode. Experiments and simulations show that our algorithm is highly efficient and scalable which can achieve high demand satisfaction ratio, low placement change frequency, short request response time, and good energy saving.
  • We build a real cloud computing system which supports our auto scaling algorithm. We compare the performance of our system with an open source implementation of the Amazon EC2 auto scaling system in a testbed of 30 Dell Power Edge blade servers. Experiments show that our system can restore the normal QoS five times as fast when a flash crowd happens.
  • We use a fast restart technique based on virtual machine (VM) suspend and resume that reduces the application start up time dramatically for Internet services. 


1.3 LITERATURE SURVEY

AUTHOR AND PUBLICATION: C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici, “A SCALABLE APPLICATION PLACEMENT CONTROLLER FOR ENTERPRISE DATA CENTERS,” in Proc. Int. World Wide Web Conf. (WWW’07), May 2007, pp. 331–340.

EXPLANATION:

Given a set of machines and a set of Web applications with dynamically changing demands, an online application placement controller decides how many instances to run for each application and where to put them, while observing all kinds of resource constraints. This NP hard problem has real usage in commercial middleware products. Existing approximation algorithms for this problem can scale to at most a few hundred machines, and may produce placement solutions that are far from optimal when system resources are tight. In this paper, we propose a new algorithm that can produce within 30 seconds high-quality solutions for hard placement problems with thousands of machines and thousands of applications. This scalability is crucial for dynamic resource provisioning in large-scale enterprise data centers. Our algorithm allows multiple applications to share a single machine, and strives to maximize the total satisfied application demand, to minimize the number of application starts and stops, and to balance the load across machines. Compared with existing state-of-the-art algorithms, for systems with 100 machines or less, our algorithm is up to 134 times faster, reduces application starts and stops by up to 97%, and produces placement solutions that satisfy up to 25% more application demands. Our algorithm has been implemented and adopted in a leading commercial middleware product for managing the performance of Web applications.

AUTHOR AND PUBLICATION: C. Adam and R. Stadler, “SERVICE MIDDLEWARE FOR SELF-MANAGING LARGE-SCALE SYSTEMS,” IEEE Trans. Netw. Serv. Manage., vol. 4, no. 3, pp. 50–64, Dec. 2007.

EXPLANATION:

Resource management poses particular challenges in large-scale systems, such as server clusters that simultaneously process requests from a large number of clients. A resource management scheme for such systems must scale both in the number of cluster nodes and in the number of applications the cluster supports. Current solutions do not exhibit both of these properties at the same time. Many are centralized, which limits their scalability in terms of the number of nodes, or they are decentralized but rely on replicated directories, which also reduces their ability to scale. In this paper, we propose novel solutions to request routing and application placement, two key mechanisms in a scalable resource management scheme.

Our solution to request routing is based on selective update propagation, which ensures that the control load on a cluster node is independent of the system size. Application placement is approached in a decentralized manner, by using a distributed algorithm that maximizes resource utilization and allows for service differentiation under overload. The paper demonstrates how the above solutions can be integrated into an overall design for a peer-to-peer management middleware that exhibits properties of self-organization. Through complexity analysis and simulation, we show to which extent the system design is scalable. We have built a prototype using accepted technologies and have evaluated it using a standard benchmark. The testbed measurements show that the implementation, within the parameter range tested, operates efficiently, quickly adapts to a changing environment and allows for effective service differentiation by a system administrator.

AUTHOR AND PUBLICATION: J. Famaey, W. D. Cock, T. Wauters, F. D. Turck, B. Dhoedt, and P. Demeester, “A LATENCY-AWARE ALGORITHM FOR DYNAMIC SERVICE PLACEMENT IN LARGE-SCALE OVERLAYS,” in Proc. IFIP/IEEE Int. Conf. Symp. Integrat. Netw. Manage. (IM’09), 2009, pp. 414–421.

EXPLANATION:

A generic and self-managing service hosting infrastructure, provides a means to offer a large variety of services to users across the Internet. Such an infrastructure provides mechanisms to automatically allocate resources to services, discover the location of these services, and route client requests to a suitable service instance. In this paper we propose a dynamic and latency-aware algorithm for assigning resources to services. Additionally, the proposed service hosting architecture and its protocols to support the service placement algorithm, are described in detail. Extensive simulations were performed to compare the solution of our latency-aware algorithm to the latency-unaware variant, in terms of system efficiency and scalability.

AUTHOR AND PUBLICATION: E. Caron, L. Rodero-Merino, F. Desprez, and A.Muresan, “AUTOSCALING, LOAD BALANCING AND MONITORING IN COMMERCIAL AND OPENSOURCE CLOUDS,” INRIA, Rapport de recherche RR-7857, Feb. 2012.

EXPLANATION:

Nowadays, because of the increased use of the Internet, the associated resource demands are growing rapidly, resulting in high workloads. To provide reliable service to clients with QoS, a load balancing mechanism is necessary in the cloud environment; to prevent the system from overloading and crashing, an autoscaling mechanism must also be provided according to the application and the incoming user traffic. The load balancing mechanism distributes load among the nodes of the cloud system, and for an efficient service model the autoscaling feature is also enabled with the load balancer to handle excess load. Auto scaling scales the platform up and down dynamically according to the clients' incoming traffic, which saves money and physical resources. Latency based routing is a newer concept in cloud computing that provides load balancing based on DNS latency to global clients by mapping the domain name system (DNS) through different hosted zones, providing load balancing based on the geographical service region. To achieve the above, we use public cloud services such as Amazon's EC2 and ELB. This research is divided into four parts: i) load balancing, ii) auto scaling, iii) latency based routing, and iv) resource monitoring. While discussing each topic in detail, we implement the individual services and test them while providing load from an external software tool (PuTTY), producing results for efficient load balancing.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Existing algorithms for the online and offline class-constrained bin packing problem are motivated by applications in the data-placement problem for video-on-demand servers and applications in the cutting and packing area. For the online problem, lower bounds are provided for any bounded space algorithm, and an algorithm for the unbounded version with a low approximation factor is also presented.

For the offline problem, practical approximation algorithms are presented for two special cases of the problem, with conditions already considered in the literature: when all items have the same size, and the parameterized version of the problem. Several tests were also performed with these practical algorithms. For the instances considered to represent practical ones, the algorithms obtained optimal solutions for CCBP in the special case where the number of different classes of the input instance is bounded by a constant.

Therefore, in order to solve our problem, we modified the CCBP model to support the “minimize the placement change frequency” goal and provide a new enhanced semi-online approximation algorithm to solve it in the next section. Note that the equations above are just a formal presentation of the goals and constraints of our problem.

2.1.1 DISADVANTAGES:

The automatic scaling problem in the cloud environment can be modeled as a modified Class Constrained Bin Packing (CCBP) problem, where each server is a bin and each class represents an application.

In the traditional bin packing problem, a series of items of different sizes need to be packed into a minimum number of bins. The class constrained version of this problem divides the items into classes or colors.

Each bin has capacity v and can accommodate items from at most c distinct classes. It is “class constrained” because the class diversity of items packed into the same bin is constrained. The goal is to pack the items into a minimum number of bins.

Existing algorithms lack support for item departure, which is essential to maintaining performance in a cloud computing environment where the resource demands of Internet applications can vary dynamically.

2.2 PROPOSED SYSTEM:

We develop an efficient semi-online color set algorithm that achieves a good demand satisfaction ratio and saves energy by reducing the number of servers used when the load is low. We label each class of items with a color and organize them into color sets as they arrive in the input sequence. The number of distinct colors in a color set is at most c (i.e., the maximum number of distinct classes in a bin). This ensures that items in a color set can always be packed into the same bin without violating the class constraint. The packing is still subject to the capacity constraint of the bin. All color sets contain exactly c colors except the last one, which may contain fewer colors. Items from different color sets are packed independently.

A greedy algorithm is used to pack items within each color set: the items are packed into the current bin until the capacity is reached. Then the next bin is opened for packing. Thus each color set has at most one unfilled (i.e., non-full) bin. Note that a full bin may contain fewer than c colors. When a new item from a specific color set arrives, it is packed into the corresponding unfilled bin. If all bins of that color set are full, then a new bin is opened to accommodate the item. The load increase of an application is modeled as the arrival of items with the corresponding color. A naive algorithm is to always pack the item into the unfilled bin if there is one. If the unfilled bin does not contain that color already, then a new color is added into the bin.

We allocate the new colors to the unfilled sets first using the following add_new_colors procedure.

Procedure add_new_colors:

Sort the list of unfilled color sets in descending order of their cardinality. Use a greedy algorithm to add the new colors into those sets according to their positions in the list.

If we run out of the new colors before filling up all but the last unfilled sets, use the consolidate_unfilled_sets procedure below to consolidate the remaining unfilled sets until there is only one left.

If there are still new colors left after filling up all unfilled sets in the system, we partition the remaining new colors into additional color sets using a greedy algorithm.

The consolidate_unfilled_sets procedure below consolidates unfilled sets in the system until there is only one left.

Procedure consolidate_unfilled_sets:

Sort the list of unfilled color sets in descending order of their cardinality. Use the last set in the list (with the fewest colors) to fill the first set in the list (with the most colors) through the fill procedure below. Remove the resulting full set or empty set from the list.

2.2.1 ADVANTAGES:

We support green computing by adjusting the placement of application instances adaptively and putting idle machines into the standby mode. Experiments and simulations show that our algorithm is highly efficient and scalable which can achieve high demand satisfaction ratio, low placement change frequency, short request response time, and good energy saving.

We build a real cloud computing system which supports our auto scaling algorithm. We compare the performance of our system with an open source implementation of the internet auto scaling system in a testbed of 30 Dell PowerEdge blade servers.

Experiments show that our system can restore the normal QoS five times as fast when a flash crowd happens. We use a fast restart technique based on virtual machine (VM) suspend and resume that reduces the application start up time dramatically for Internet services.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

  • Processor                          –    Pentium IV
  • Speed                              –    1.1 GHz
  • RAM                                –    256 MB (min)
  • Hard Disk                          –    20 GB
  • Floppy Drive                       –    1.44 MB
  • Key Board                          –    Standard Windows Keyboard
  • Mouse                              –    Two or Three Button Mouse
  • Monitor                            –    SVGA

 

2.3.2 SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP or Win7
  • Front End                                :           JAVA JDK 1.7
  • Back End                                :           MYSQL Server
  • Server                                      :           Apache Tomcat Server
  • Script                                       :           JSP Script
  • Document                               :           MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

  • The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data that is generated by the system.
  • The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.
  • DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures, or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.


3.1 ARCHITECTURE DIAGRAM


3.2 DATAFLOW DIAGRAM

ADMIN:

USER:

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

ADMIN:

USER:

3.3 CLASS DIAGRAM:


3.4 SEQUENCE DIAGRAM:

ADMIN:


USER:

3.5 ACTIVITY DIAGRAM:

ADMIN:


USER:

CHAPTER 4

4.0 IMPLEMENTATION:

4.1 ALGORITHM:

Our algorithm belongs to the family of color set algorithms with significant modification to adapt to our problem. A detailed comparison with the existing algorithm is deferred to Section 7 so that sufficient background can be established. We label each class of items with a color and organize them into color sets as they arrive in the input sequence. The number of distinct colors in a color set is at most c (i.e., the maximum number of distinct classes in a bin). This ensures that items in a color set can always be packed into the same bin without violating the class constraint. The packing is still subject to the capacity constraint of the bin. All color sets contain exactly c colors except the last one which may contain fewer colors.

Items from different color sets are packed independently. A greedy algorithm is used to pack items within each color set: the items are packed into the current bin until the capacity is reached. Then the next bin is opened for packing. Thus each color set has at most one unfilled (i.e., non-full) bin. Note that a full bin may contain fewer than c colors. When a new item from a specific color set arrives, it is packed into the corresponding unfilled bin. If all bins of that color set are full, then a new bin is opened to accommodate the item.
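
The following is a minimal Java sketch of this greedy packing of an arriving item within one color set. The data structures (a bin represented as a color-to-count map) and class names are simplified assumptions for illustration, not the exact implementation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of one color set: items of its colors are packed greedily
// into bins of capacity v; at most one bin (the last opened one) is unfilled.
class ColorSet {
    private final int binCapacity;                 // v: maximum items per bin
    private final List<Map<String, Integer>> bins = new ArrayList<Map<String, Integer>>();
    private int lastBinLoad = 0;                   // items in the current (unfilled) bin

    ColorSet(int binCapacity) { this.binCapacity = binCapacity; }

    // Greedy packing: put the arriving item into the unfilled bin, or open a new bin.
    void itemArrival(String color) {
        if (bins.isEmpty() || lastBinLoad >= binCapacity) {
            bins.add(new HashMap<String, Integer>()); // all bins full: open a new one
            lastBinLoad = 0;
        }
        Map<String, Integer> bin = bins.get(bins.size() - 1);
        Integer count = bin.get(color);
        bin.put(color, count == null ? 1 : count + 1); // add one item of this color
        lastBinLoad++;
    }

    int binCount() { return bins.size(); }
}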

Our basic idea is to fill up the unfilled sets (except the last one) while minimizing its impact on the existing color assignment. We first check if there are any pending requests to add new colors into the system. If there are, we allocate the new colors to the unfilled sets first using the following add_new_colors procedure.

Procedure add_new_colors:

Sort the list of unfilled color sets in descending order of their cardinality. Use a greedy algorithm to add the new colors into those sets according to their positions in the list. If we run out of the new colors before filling up all but the last unfilled sets, use the consolidate_unfilled_sets procedure below to consolidate the remaining unfilled sets until there is only one left.

If there are still new colors left after filling up all unfilled sets in the system, we partition the remaining new colors into additional color sets using a greedy algorithm. The consolidate_unfilled_sets procedure below consolidates unfilled sets in the system until there is only one left.

Procedure consolidate_unfilled_sets:

Sort the list of unfilled color sets in descending order of their cardinality. Use the last set in the list (with the fewest colors) to fill the first set in the list (with the most colors) through the fill procedure below.

Remove the resulting full set or empty set from the list.

Repeat the previous step until there is only one unfilled set left in the list. The fill procedure below uses the colors in a source set to fill a destination set.

Procedure fill(destination, source):

Sort the list of colors in the source set in ascending order of their numbers of items.

Add the first color in the list (with the fewest items) into the destination set. Use the “item departure” operation in the source set and the “item arrival” operation in the destination set to move all items of that color from the source to the destination. Then remove that color from the list. Repeat the above step until either the source set becomes empty or the destination set becomes full.
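
A minimal Java sketch of the consolidate_unfilled_sets and fill procedures is given below. It assumes a color set is represented only by its colors and their item counts, and it abstracts away the bin-level item departure and item arrival operations that actually repack items; the class names are illustrative only.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A color set reduced to its colors and per-color item counts; "unfilled"
// means it holds fewer than c colors (the class constraint).
class UnfilledColorSet {
    final Map<String, Integer> colors = new LinkedHashMap<String, Integer>(); // color -> item count
    final int maxColors;                                                      // c, the class constraint

    UnfilledColorSet(int maxColors) { this.maxColors = maxColors; }
    boolean full()  { return colors.size() >= maxColors; }
    boolean empty() { return colors.isEmpty(); }
}

class ColorSetConsolidator {

    // consolidate_unfilled_sets: merge unfilled sets until only one remains.
    static void consolidateUnfilledSets(List<UnfilledColorSet> unfilled) {
        while (unfilled.size() > 1) {
            // Descending order of cardinality: most colors first, fewest colors last.
            Collections.sort(unfilled, new Comparator<UnfilledColorSet>() {
                public int compare(UnfilledColorSet a, UnfilledColorSet b) {
                    return b.colors.size() - a.colors.size();
                }
            });
            UnfilledColorSet dst = unfilled.get(0);
            UnfilledColorSet src = unfilled.get(unfilled.size() - 1);
            fill(dst, src);
            // After fill, the source is empty or the destination is full; drop them.
            if (dst.full())  unfilled.remove(dst);
            if (src.empty()) unfilled.remove(src);
        }
    }

    // fill(destination, source): move whole colors, fewest items first.
    static void fill(final UnfilledColorSet dst, final UnfilledColorSet src) {
        List<String> order = new ArrayList<String>(src.colors.keySet());
        Collections.sort(order, new Comparator<String>() {
            public int compare(String a, String b) {
                return src.colors.get(a) - src.colors.get(b); // ascending item count
            }
        });
        for (String color : order) {
            if (src.empty() || dst.full()) break;
            int count = src.colors.remove(color); // "item departure" from the source set
            dst.colors.put(color, count);         // "item arrival" at the destination set
        }
    }
}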

4.2 MODULES:

USER MODULES:

LOCAL NODE MANAGER (LNM):

APPLICATION LOAD INCREASE:

APPLICATION LOAD DECREASE:

AUTOMATIC SCALING RESOURCES:

APPROXIMATION RATIO:

4.3 MODULE DESCRIPTION:

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase and business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis the feasibility study of the proposed system is to be carried out. This is to ensure that the proposed system is not a burden to the company.  For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are 

  • ECONOMICAL FEASIBILITY
  • TECHNICAL FEASIBILITY
  • SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:     

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. Thus the developed system is well within the budget, and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

 

5.1.2 TECHNICAL FEASIBILITY   

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not have a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or null changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:  

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system; instead, he must accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system.

5.2 SYSTEM TESTING:

Testing is a process of checking whether the developed system is working according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes the logical assumption that if all the parts of the system are correct, the overall goal will be achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until many months later.

This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of the system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best program is worthless if it does not produce the correct outputs.

5.2.1 UNIT TESTING:

Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.

Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile and test data correctly and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logical. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.
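
To make the notion of an expected result concrete in code, the hedged JUnit 4 sketch below tests a hypothetical checksum routine against hand-calculated values. The Checksum class, its method, and the expected values are invented purely for illustration and are not part of the project code.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical unit under test: a trivial checksum used only for illustration.
class Checksum {
    static int of(byte[] data) {
        int sum = 0;
        for (byte b : data) sum = (sum + (b & 0xFF)) % 256;
        return sum;
    }
}

public class ChecksumTest {

    @Test
    public void checksumOfKnownInputMatchesExpectedValue() {
        byte[] input = {1, 2, 3};
        // desk-calculated expected value: (1 + 2 + 3) % 256 = 6
        assertEquals(6, Checksum.of(input));
    }

    @Test
    public void checksumOfEmptyInputIsZero() {
        assertEquals(0, Checksum.of(new byte[0]));
    }
}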

5.2.2 FUNCTIONAL TESTING:

Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.

Description: Test for all modules.
Expected result: All peers should communicate in the group.

Description: Test for various peers in a distributed network framework, as it displays all users available in the group.
Expected result: The result after execution should be accurate.


5.2.3 NON-FUNCTIONAL TESTING:

Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:

  • Load testing
  • Performance testing
  • Usability testing
  • Reliability testing
  • Security testing

5.2.4 LOAD TESTING:

An important tool for implementing system tests is a load generator. A load generator is essential for testing quality requirements such as performance and stress. A load can be a real load; that is, the system can be put under real usage by having actual users connected to it, who generate test input data for the system test.

Description: It is necessary to ascertain that the application behaves correctly under load, when a ‘Server busy’ response is received.
Expected result: Should designate another active node as a Server.


5.2.5 PERFORMANCE TESTING:

Performance tests are utilized in order to determine the widely defined performance of the software system such as execution time associated with various parts of the code, response time and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time and using an acceptable level of resources; it is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.


5.2.6 RELIABILITY TESTING:

Software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, and it is ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. It forms a part of software quality control.

Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.


5.2.7 SECURITY TESTING:

Security testing evaluates system characteristics that relate to the availability, integrity and confidentiality of the system data and services. Users/Clients should be encouraged to make sure their security needs are very clearly known at requirements time, so that the security issues can be addressed by the designers and testers.

Description: Checking that the user identification is authenticated.
Expected result: In case of failure, it should not be connected in the framework.

Description: Check whether group keys in a tree are shared by all peers.
Expected result: The peers should know the group key of their own group.


5.2.8 WHITE BOX TESTING:

White box testing, sometimes called glass-box testing, is a test case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases that focus on the inner structure of the software to be tested.

Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.

Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.

Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.


5.2.9 BLACK BOX TESTING:

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors by focusing on the inputs, outputs, and principal functions of a software module. The starting point of black box testing is either a specification or code. The contents of the box are hidden, and the stimulated software should produce the desired results.

Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.

Description: To check for interface errors.
Expected result: The entire interface must function normally.

Description: To check for errors in data structures or external database access.
Expected result: Database update and retrieval must be done correctly.

Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out, as the development, documentation, and institutionalization of the proposed goals and related policies are essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

 

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

 

The Java Programming Language

 

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

  • Simple
  • Architecture neutral
  • Object oriented
  • Portable
  • Distributed
  • High performance
  • Interpreted
  • Multithreaded
  • Robust
  • Dynamic
  • Secure

With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, first you translate a program into an intermediate language called Java byte codes —the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java byte code instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java byte codes help make “write once, run anywhere” possible. You can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or on an iMac.
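
As a small, self-contained illustration of this compile-once, run-anywhere cycle, the program below (the file name is chosen arbitrarily) is compiled to byte codes with javac and then interpreted on any Java VM with java.

// HelloPlatform.java -- compile once with:  javac HelloPlatform.java
// then run the resulting byte codes on any Java VM with:  java HelloPlatform
public class HelloPlatform {
    public static void main(String[] args) {
        // Prints the host operating system, showing the same byte codes
        // running unchanged on Windows, Solaris, MacOS, and so on.
        System.out.println("Running on: " + System.getProperty("os.name"));
    }
}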

6.2 THE JAVA PLATFORM:

A platform is the hardware or software environment in which a program runs. We’ve already mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it’s a software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

  • The Java Virtual Machine (Java VM)
  • The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, What Can Java Technology Do?, highlights what functionality some of the packages in the Java API provide.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, after compilation, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The most common types of programs written in the Java programming language are applets and applications. If you’ve surfed the Web, you’re probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs.

An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet.

A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server.

How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:

  • The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
  • Applets: The set of conventions used by applets.
  • Networking: URLs, TCP (Transmission Control Protocol), UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
  • Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
  • Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
  • Software components: Known as JavaBeans™, these can plug into existing component architectures.
  • Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
  • Java Database Connectivity (JDBC™): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

 

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

We can’t promise you fame, fortune, or even a job if you learn the Java programming language. Still, it is likely to make your programs better and requires less effort than other languages. We believe that Java technology will help you do the following:

  • Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
  • Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
  • Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
  • Develop programs more quickly: Your development time may be as much as twice as fast versus writing the same program in C++. Why? You write fewer lines of code and it is a simpler programming language than C++.
  • Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure JavaTM Product Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
  • Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
  • Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

 

6.5 ODBC:

 

Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application developers and database systems providers. Before ODBC became a de facto standard for Windows programs to interface with database systems, programmers had to use proprietary languages for each database they wanted to connect to. Now, ODBC has made the choice of the database system almost irrelevant from a coding perspective, which is as it should be. Application developers have much more important things to worry about than the syntax that is needed to port their program from one database to another when business needs suddenly change.

Through the ODBC Administrator in Control Panel, you can specify the particular database that is associated with a data source that an ODBC application program is written to use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a particular database. For example, the data source named Sales Figures might be a SQL Server database, whereas the Accounts Payable data source could refer to an Access database. The physical database referred to by a data source can reside anywhere on the LAN.

The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you setup a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program and each maintains a separate list of ODBC data sources.

From a programming perspective, the beauty of ODBC is that the application can be written to use the same set of function calls to interface with any data source, regardless of the database vendor. The source code of the application doesn’t change whether it talks to Oracle or SQL Server. We only mention these two as an example. There are ODBC drivers available for several dozen popular database systems. Even Excel spreadsheets and plain text files can be turned into data sources. The operating system uses the Registry information written by ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the data source (such as the interface to Oracle or SQL Server). The loading of the ODBC drivers is transparent to the ODBC application program. In a client/server environment, the ODBC API even handles many of the network issues for the application programmer.

The advantages of this scheme are so numerous that you are probably thinking there must be some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to the native database interface. ODBC has had many detractors make the charge that it is too slow. Microsoft has always claimed that the critical factor in performance is the quality of the driver software that is used. In our humble opinion, this is true. The availability of good ODBC drivers has improved a great deal recently. And anyway, the criticism about performance is somewhat analogous to those who said that compilers would never match the speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers get faster every year.

6.6 JDBC:

In an effort to set an independent database standard API for Java, Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of “plug-in” database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he or she must provide the driver for each platform that the database and Java run on.

To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you discovered earlier in this chapter, ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster than developing a completely new connectivity solution.

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

The remainder of this section will cover enough information about JDBC for you to know what it is about and how to use it effectively. This is by no means a complete overview of JDBC. That would fill an entire book.

 

6.7 JDBC Goals:

Few software packages are designed without goals in mind. JDBC is no exception; its many goals drove the development of the API. These goals, in conjunction with early reviewer feedback, have finalized the JDBC class library into a solid framework for building database applications in Java.

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although not the lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be created. Conversely, it is at a high enough level for application programmers to use it confidently. Attaining this goal allows for future tool vendors to “generate” JDBC code and to hide many of JDBC’s complexities from the end user.

SQL Conformance

SQL syntax varies as you move from database vendor to database vendor. In an effort to support a wide variety of vendors, JDBC will allow any query statement to be passed through it to the underlying database driver. This allows the connectivity module to handle non-standard functionality in a manner that is suitable for its users.

JDBC must be implementable on top of common database interfaces

The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This interface would translate JDBC calls to ODBC and vice versa.

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that the design of JDBC should be very simple, allowing for only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

Use strong, static typing wherever possible

Strong typing allows for more error checking to be done at compile time; also, fewer errors appear at runtime.

Keep the common cases simple

Because more often than not, the usual SQL calls used by the programmer are simple SELECTs, INSERTs, DELETEs, and UPDATEs, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.
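
A minimal sketch of such a common case is shown below. The database URL, table, and column names are placeholders; for instance, with the MS Access back end mentioned later, the URL could name an ODBC data source reached through the JDBC-ODBC bridge. This is only an illustration of the usual SELECT pattern, not the project's actual data access code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SimpleSelect {
    public static void main(String[] args) throws Exception {
        // Placeholder data source: with the JDBC-ODBC bridge this could be an
        // ODBC DSN such as "jdbc:odbc:CacheDSN"; user and password are examples.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");   // may be needed for older bridge drivers
        Connection con = DriverManager.getConnection("jdbc:odbc:CacheDSN", "user", "password");
        try {
            Statement stmt = con.createStatement();
            // Hypothetical table and columns, used only to show the SELECT pattern.
            ResultSet rs = stmt.executeQuery("SELECT id, name FROM cache_table");
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " : " + rs.getString("name"));
            }
            rs.close();
            stmt.close();
        } finally {
            con.close();                                  // always release the connection
        }
    }
}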

Finally, we decided to proceed with the implementation using Java Networking.

For dynamically updating the cache table, we use an MS Access database.


6.8 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

The IP layer provides a connectionless and unreliable delivery system. It considers each datagram independently of the others. Any association between datagrams must be supplied by the higher layers. The IP layer supplies a checksum that includes its own header. The header includes the source and destination addresses. The IP layer handles routing through an Internet. It is also responsible for breaking up large datagrams into smaller ones for transmission and reassembling them at the other end.

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.
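
This dotted form can be observed directly from Java's networking API; the host name in the sketch below is only an example.

import java.net.InetAddress;

public class AddressDemo {
    public static void main(String[] args) throws Exception {
        // Resolve a host name (example only) and print its 32-bit address
        // in the usual dotted form of four integers separated by dots.
        InetAddress addr = InetAddress.getByName("www.example.com");
        System.out.println(addr.getHostAddress());   // e.g. 93.184.216.34
    }
}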

Port addresses

A service exists on a host, and is identified by its port. This is a 16 bit number. To send a message to a server, you send it to the port for that service of the host that it is running on. This is not location transparency! Certain of these ports are “well known”.

Sockets:

A socket is a data structure maintained by the system to handle network connections. A socket is created using the call socket. It returns an integer that is like a file descriptor. In fact, under Windows, this handle can be used with the ReadFile and WriteFile functions.

#include <sys/types.h>
#include <sys/socket.h>
int socket(int family, int type, int protocol);

Here “family” will be AF_INET for IP communications, protocol will be zero, and type will depend on whether TCP or UDP is used. Two processes wishing to communicate over a network create a socket each. These are similar to two ends of a pipe – but the actual pipe does not yet exist.
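
In the Java implementation used for this project, the same two-ends-of-a-pipe idea is expressed with the java.net classes. The sketch below runs both ends on one machine; the port number and the message are arbitrary examples, not part of the project's protocol.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        final int port = 5000;                         // arbitrary example port
        // One end: create the listening socket and accept a single connection.
        final ServerSocket server = new ServerSocket(port);
        Thread serverThread = new Thread(new Runnable() {
            public void run() {
                try {
                    Socket peer = server.accept();
                    PrintWriter out = new PrintWriter(peer.getOutputStream(), true);
                    out.println("hello from the server end");
                    peer.close();
                    server.close();
                } catch (Exception e) { e.printStackTrace(); }
            }
        });
        serverThread.start();

        // Other end: connect to the listening socket and read one line.
        Socket client = new Socket("localhost", port);
        BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
        System.out.println(in.readLine());
        client.close();
    }
}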

6.9 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.
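
A minimal sketch of producing a chart with JFreeChart and writing it to a PNG file is shown below; the data values are invented purely for illustration, and the file name is only an example.

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;

public class ThroughputChart {
    public static void main(String[] args) throws Exception {
        // Invented sample data: throughput (kbps) of three users.
        DefaultCategoryDataset dataset = new DefaultCategoryDataset();
        dataset.addValue(512, "Throughput", "User 1");
        dataset.addValue(430, "Throughput", "User 2");
        dataset.addValue(618, "Throughput", "User 3");

        JFreeChart chart = ChartFactory.createBarChart(
                "Per-user throughput",        // chart title
                "User",                       // category axis label
                "kbps",                       // value axis label
                dataset,
                PlotOrientation.VERTICAL,
                true, false, false);          // legend, tooltips, URLs

        // Write the chart to a PNG image file.
        ChartUtilities.saveChartAsPNG(new File("throughput.png"), chart, 640, 480);
    }
}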

 

6.9.1. Map Visualizations:

Charts showing values that relate to geographical areas. Some examples include: (a) population density in each state of the United States, (b) income per capita for each country in Europe, (c) life expectancy in each country of the world. The tasks in this project include: Sourcing freely redistributable vector outlines for the countries of the world, states/provinces in particular countries (USA in particular, but also other areas);

Creating an appropriate dataset interface (plus default implementation), a renderer, and integrating this with the existing XYPlot class in JFreeChart; testing, documenting, testing some more, documenting some more.

6.9.2. Time Series Chart Interactivity

Implement a new (to JFreeChart) feature for interactive time series charts — to display a separate control that shows a small version of ALL the time series data, with a sliding “view” rectangle that allows you to select the subset of the time series data to display in the main chart.

6.9.3. Dashboards

There is currently a lot of interest in dashboard displays. Create a flexible dashboard mechanism that supports a subset of JFreeChart chart types (dials, pies, thermometers, bars, and lines/time series) that can be delivered easily via both Java Web Start and an applet.

 

6.9.4. Property Editors

The property editor mechanism in JFreeChart only handles a small subset of the properties that can be set for charts. Extend (or reimplement) this mechanism to provide greater end-user control over the appearance of the charts.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION

We presented the design and implementation of a system that can scale up and down the number of application instances automatically based on demand. We developed a color set algorithm to decide the application placement and the load distribution. Our system achieves a high satisfaction ratio of application demand even when the load is very high. It saves energy by reducing the number of running instances when the load is low.

There are several directions for future work. Some cloud service providers may provide multiple levels of services to their customers. When the resources become tight, they may want to give their premium customers a higher demand satisfaction ratio than other customers. In the future, we plan to extend our system to support differentiated services while also considering fairness when allocating the resources across the applications. We mentioned in the paper that we can divide multiple generations of hardware in a data center into “equivalence classes” and run our algorithm within each class.

Our future work is to develop an efficient algorithm to distribute incoming requests among the set of equivalence classes and to balance the load across those server clusters adaptively. As analyzed in the paper, CCBP works well when the aggregate load of applications in a color set is high. Another direction for future work is to extend the algorithm to pack applications with complementary bottleneck resources together, e.g., to co-locate a CPU intensive application with a memory intensive one so that different dimensions of server resources can be adequately utilized.

CHAPTER 9

9.1 REFERENCES

  1. C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici, “A scalable application placement controller for enterprise data centers,” in Proc. Int. World Wide Web Conf. (WWW’07), May 2007, pp. 331–340.
  2. C. Adam and R. Stadler, “Service middleware for self-managing large-scale systems,” IEEE Trans. Netw. Serv. Manage., vol. 4, no. 3, pp. 50–64, Dec. 2007.
  3. J. Famaey, W. D. Cock, T. Wauters, F. D. Turck, B. Dhoedt, and P. Demeester, “A latency-aware algorithm for dynamic service placement in large-scale overlays,” in Proc. IFIP/IEEE Int. Conf. Symp. Integrat. Netw. Manage. (IM’09), 2009, pp. 414–421.
  4. E. Caron, L. Rodero-Merino, F. Desprez, and A. Muresan, “Auto-scaling, load balancing and monitoring in commercial and open-source clouds,” INRIA, Rapport de recherche RR-7857, Feb. 2012.
  5. A. Karve, T. Kimbrel, G. Pacifici, M. Spreitzer, M. Steinder, M. Sviridenko, and A. Tantawi, “Dynamic placement for clustered web applications,” in Proc. Int. World Wide Web Conf. (WWW’06), May 2006, pp. 595–604.
  6. D. Magenheimer, “Transcendent memory: A new approach to managing RAM in a virtualized environment,” in Proc. Linux Symp., 2009, pp. 191–200.
  7. E. C. Xavier and F. K. Miyazawa, “The class constrained bin packing problem with applications to video-on-demand,” Theor. Comput. Sci., vol. 393, no. 1–3, pp. 240–259, 2008.

A Study on False Channel Condition Reporting Attacks in Wireless Networks

A STUDY ON FALSE CHANNEL CONDITION REPORTING ATTACKS IN WIRELESS NETWORKS

By

A

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the                                                  FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

Of

MASTER OF TECHNOLOGY

IN

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

BONAFIDE CERTIFICATE

Certified that this project report titled “A STUDY ON FALSE CHANNEL CONDITION REPORTING ATTACKS IN WIRELESS NETWORKS” is the bonafide work of Mr. _____________ who carried out the research under my supervision. Certified further, that to the best of my knowledge the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide                                                                             Signature of the H.O.D

Name                                                                                                           Name

CHAPTER 1

1.1 ABSTRACT:

Wireless networking protocols are increasingly being designed to exploit a user’s measured channel condition; we call such protocols channel-aware. Each user reports the measured channel condition to a manager of wireless resources and a channel-aware protocol uses these reports to determine how resources are allocated to users. In a channel-aware protocol, each user’s reported channel condition affects the performance of every other user. The deployment of channel-aware protocols increases the risks posed by false channel-condition feedback.

We study what happens in the presence of an attacker that falsely reports its channel condition. We perform case studies on channel-aware network protocols to understand how an attack can use false feedback and how much the attack can affect network performance. The results of the case studies show that we need a secure channel condition estimation algorithm to fundamentally defend against the channel-condition misreporting attack. We design such an algorithm and evaluate our algorithm through analysis and simulation. Our evaluation quantifies the effect of our algorithm on system performance as well as the security and the performance of our algorithm.

1.2 INTRODUCTION:

Many protocols in modern wireless networks treat a link’s channel condition information as a protocol input parameter; we call such protocols channel-aware. Examples include cooperative relaying network architectures, efficient ad hoc network routing metrics, and opportunistic schedulers. While work on channel-aware protocols has mainly focused on how channel condition information can be used to more efficiently utilize wireless resources, security aspects of channel-aware protocols have only recently been studied. These works on security of channel-aware protocols revealed new threats in specific network environments by simulation or measurement.

However, understanding the effect of possible attacks across varied network environments is still an open area for study. In particular, we consider the effect of a user equipment’s reporting false channel condition. This issue is partially addressed in the work of Racic et al.  in a limited network setting. They consider a particular scheduler in a cellular network with handover process and propose a secure handover algorithm. In contrast, we reveal the possible effects of false channel condition reporting in various channel-aware network protocols and propose a primitive defense mechanism that provides secure channel condition estimation.

Our contributions are:

• We analyze specific attack mechanisms and evaluate the effects of misreporting channel condition on various channel-aware wireless network protocols, including cooperative relaying protocols, routing metrics in wireless ad hoc networks, and opportunistic schedulers.

• We propose a secure channel condition estimation algorithm that can be used to construct a secure channel-aware protocol in single-hop settings.

• We analyze our algorithm with respect to performance and security, and we perform a simulation study to understand the impact of our algorithm on system performance.

The false channel condition reporting attack that we introduce in this paper is difficult to identify by existing mechanisms, since our attack is mostly protocol compliant; only the channel-condition measurement mechanism needs to be modified. Our attack can thus be performed using modified user equipment legitimately registered to a network.

1.3 LITERATURE SURVEY

EXPLOITING AND DEFENDING OPPORTUNISTIC SCHEDULING IN CELLULAR DATA NETWORKS

PUBLICATION: R. Racic, D. Ma, H. Chen, and X. Liu, IEEE Trans. Mobile Comput., vol. 9, no. 5, pp. 609–620, May 2010.

Third Generation (3G) cellular networks take advantage of time-varying and location-dependent channel conditions of mobile users to provide broadband services. Under fairness and QoS constraints, they use opportunistic scheduling to efficiently utilize the available spectrum. Opportunistic scheduling algorithms rely on the collaboration among all mobile users to achieve their design objectives. However, we demonstrate that rogue cellular devices can exploit vulnerabilities in popular opportunistic scheduling algorithms, such as Proportional Fair (PF) and Temporal Fair (TF), to usurp the majority of time slots in 3G networks. Our simulations show that under realistic conditions, only five rogue devices per 50-user cell can capture up to 95 percent of the time slots, and can cause 2-second end-to-end interpacket transmission delay on VoIP applications for every user in the same cell, rendering VoIP applications useless. To defend against this attack, we propose strengthening the PF and TF schedulers and a robust handoff scheme.

ON THE VULNERABILITY OF THE PROPORTIONAL FAIRNESS SCHEDULER TO RETRANSMISSION ATTACKS

PUBLICATION: U. Ben-Porat, A. Bremler-Barr, H. Levy, and B. Plattner, in Proc. IEEE INFOCOM, Shanghai, China, Apr. 2011, pp. 1431–1439.

Channel aware schedulers of modern wireless networks – such as the popular Proportional Fairness Scheduler (PFS) – improve throughput performance by exploiting channel fluctuations while maintaining fairness among the users. In order to simplify the analysis, PFS was introduced and vastly investigated in a model where frame losses do not occur, which is of course not the case in practical wireless networks. Recent studies focused on the efficiency of various implementations of PFS in a realistic model where frame losses can occur. In this work we show that the common straightforward adaptation of PFS to frame losses exposes the system to a malicious attack (which can alternatively be caused by malfunctioning user equipment) that can drastically degrade the performance of innocent users. We analyze the factors behind the vulnerability of the system and propose a modification of PFS designed for the frame loss model which is resilient to such malicious attacks while maintaining the fairness properties of the original PFS.

A MEASUREMENT STUDY OF SCHEDULER-BASED ATTACKS IN 3G WIRELESS NETWORKS

PUBLICATION: S. Bali, S. Machiraju, H. Zang, and V. Frost, in Proc. PAM, Berlin, Germany, 2007.

Though high-speed (3G) wide-area wireless networks have been rapidly proliferating, little is known about the robustness and security properties of these networks. In this paper, we make initial steps towards understanding these properties by studying Proportional Fair (PF), the scheduling algorithm used on the downlinks of these networks. We find that the fairness-ensuring mechanism of PF can be easily corrupted by a malicious user to monopolize the wireless channel, thereby starving other users. Using extensive experiments on commercial and laboratory-based CDMA networks, we demonstrate this vulnerability and quantify the resulting performance impact. We find that delay jitter can be increased by up to 1 second and TCP throughput can be reduced by as much as 25-30% by a single malicious user. Based on our results, we argue for the need to use a more robust scheduling algorithm and outline one such algorithm.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Many protocols in modern wireless networks treat a link’s channel condition information as a protocol input parameter; we call such protocols channel-aware. Examples include cooperative relaying network architectures, efficient ad hoc network routing metrics, and opportunistic schedulers. While work on channel-aware protocols has mainly focused on how channel condition information can be used to more efficiently utilize wireless resources, security aspects of channel-aware protocols have only recently been studied. These works on security of channel-aware protocols revealed new threats in specific network environments by simulation or measurement. However, understanding the effect of possible attacks across varied network environments is still an open area for study.

2.1.1 DISADVANTAGES:

  • Difficult to guarantee QoS in MANETs due to their unique features including user mobility, channel variance errors, and limited bandwidth.
  • Although these protocols can increase the QoS of the MANETs to a certain extent, they suffer from invalid reservation and race condition problems.


2.2 PROPOSED SYSTEM:

We introduce our attack concept and perform case studies to quantify the attack effects on specific channel-aware network protocols. Depending on the deployed PHY-layer technologies (e.g., OFDM), a system can utilize conditions for subchannels to perform more efficient frequency-selective scheduling. Our work applies to this case by handling each subchannel’s condition information separately. However, for clarity of presentation, we consider a single channel between network participants in this paper.

We can easily implement the false channel condition reporting attack by modifying only the subcomponent that reports channel condition. This subcomponent of user equipment can be implemented in hardware or software. One recent trend in user equipment implementation is to increasingly move functionality from hardware to software for adaptable configuration of general-purpose hardware. The increasing software control of user equipment makes the false channel condition reporting attack increasingly practical.

2.2.1 ADVANTAGES:

  • We analyze specific attack mechanisms and evaluate the effects of misreporting channel condition on various channel-aware wireless network protocols, including cooperative relaying protocols, routing metrics in wireless ad hoc networks, and opportunistic schedulers.
  • We propose a secure channel condition estimation algorithm that can be used to construct a secure channel-aware protocol in single-hop settings.
  • We analyze our algorithm with respect to performance and security, and we perform a simulation study to understand the impact of our algorithm on system performance.


2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

  • Processor                          –    Pentium IV
  • Speed                              –    1.1 GHz
  • RAM                                –    256 MB (min)
  • Hard Disk                          –    20 GB
  • Floppy Drive                       –    1.44 MB
  • Key Board                          –    Standard Windows Keyboard
  • Mouse                              –    Two or Three Button Mouse
  • Monitor                            –    SVGA

 

2.3.2 SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP or Win 7
  • Front End                                :           Java JDK 1.7
  • Back End                                :           MS-ACCESS
  • Tools                                       :           Netbeans IDE 7
  • Document                               :           MS-Office 2007

CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

  • The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
  • The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.
  • DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

 

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

 

PROCESS:

People, procedures or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.


3.1 ARCHITECTURE DIAGRAM:


3.2 DATAFLOW DIAGRAM:


UML DIAGRAMS:

3.3 USE CASE DIAGRAM:


3.4 CLASS DIAGRAM:

3.5 SEQUENCE DIAGRAM:


3.6 ACTIVITY DIAGRAM:

 

CHAPTER 4

4.0 IMPLEMENTATION:

We performed a simulation study to evaluate the overclaiming attack’s effect on the normal users’ performance in a cooperative relaying environment. We quantify the attack using the ns-2 simulator patched with the EURANE UMTS system simulator. Our simulated network consists of one base station serving four users. The base station sends 11 Mbps of Constant Bit Rate (CBR) traffic to each user. There is one attacker who may falsely report its channel condition to the base station. We represent channel condition using the Channel Quality Indicator (CQI) defined in the 3GPP standard.

The same equation and block size are used for a case study of opportunistic scheduler presented in Section 2.2.3. We assume that one victim is close to the attacker so that when the victim experiences poor channel condition, the victim can use the attacker as a relaying node. The other two normal users get packets directly from the base station and do not participate in the cooperative relaying protocol. The victim and the two normal users honestly report their channel condition. The channel for the attacker and two normal users uses a shadowing plus Rayleigh model of a moving node 100m away from the base station with velocity of 3km/h. We vary the victim’s distance to the base station from 100m to 500m to see the effect of the victim’s channel condition on the performance degradation.

The attacker’s goal in this simulation is to reduce the victim’s throughput. The attacker can adopt two approaches. In the conservative approach, the attacker simply does not forward packets for the victim, without falsely reporting its channel condition. In the aggressive approach, the attacker overclaims its channel condition so that it can increase its probability of relaying packets for the victim.

Our simulations do not consider the overhead that an actual relaying protocol might incur in finding a new relaying node due to channel condition variation since such overhead is not related to the effect of attack. We assume that each transmission uses an orthogonal carrier so that transmissions do not interfere with each other.

Our simulations do not implement a relay discovery protocol; rather, we compare the attacker and victim CQI, and use the link with better CQI value to transmit to the victim. This relaying scheme is an example; a system operator may choose a different scheme. However, the scheme that we chose is good for exploiting increased diversity to optimize throughput. We ran each of our simulations for 100 simulated seconds.
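
The relaying rule used in these simulations reduces to a simple CQI comparison. The sketch below only illustrates that rule with invented values and names; it is not the ns-2/EURANE simulation code.

// Illustrative sketch of the relaying rule: the base station transmits to the
// victim over whichever link reports the better CQI. Values are placeholders.
public class RelaySelection {

    enum Link { DIRECT_TO_VICTIM, VIA_RELAY }

    static Link chooseLink(int victimCqi, int relayCqi) {
        // Higher CQI means a better channel; ties favour the direct link.
        return (relayCqi > victimCqi) ? Link.VIA_RELAY : Link.DIRECT_TO_VICTIM;
    }

    public static void main(String[] args) {
        // An overclaiming attacker inflates its reported CQI and so is chosen
        // as the relay even when its real channel is poor.
        int victimCqi = 12;
        int honestRelayCqi = 10;          // honestly reported value
        int overclaimedRelayCqi = 30;     // falsely reported value
        System.out.println(chooseLink(victimCqi, honestRelayCqi));      // DIRECT_TO_VICTIM
        System.out.println(chooseLink(victimCqi, overclaimedRelayCqi)); // VIA_RELAY
    }
}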

4.1 ALGORITHM

In this section, we evaluate the performance and the security of our algorithm. Firstly, we analyze the performance of our algorithm according to algorithm parameters. This analysis can be used for parameter design guidelines. This analysis result is compared to simulation results.   Secondly, we analyze the security of our algorithm. In this analysis, we show how much a brute-force attacker can be successful in guessing the value included in a challenge according to algorithm parameters. This analysis can be used for understanding tradeoff between security and system overhead. Thirdly, we integrate our algorithm into a network simulator and evaluate the effect of our algorithm on the system performance. We show that our algorithm securely and effectively estimates channel condition through most of its parameter space.

We used identical parameters, except we replaced the static channel with various variable channel models. We measure the throughput of normal users under the scheduling policies of MAX-SINR, PF, and MAX-SINR with our algorithm. In MAX-SINR with our algorithm, the base station does not use the reported CQI level to determine the user with the best channel condition in a given time slot. Instead, the base station uses the CQI level estimated by our algorithm. In the case study of the opportunistic scheduler, our observation was that the PF scheduler prevented attackers from stealing throughput. Hence, we concluded that PF was a good candidate for defending against the false channel condition reporting attack. However, our simulation results for the system performance show that MAX-SINR with our algorithm can achieve higher throughput than the PF scheduler in most cases. The fact that the performance of our algorithm depends on the channel characteristics affects the throughput of normal users in the case of MAX-SINR with our algorithm.
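
For reference, the proportional fair rule compared against MAX-SINR above schedules, in each time slot, the user with the largest ratio of instantaneous rate to smoothed past throughput. The sketch below illustrates this selection with invented numbers and is not the simulator implementation.

// Illustrative proportional fair (PF) selection: pick the user that maximises
// instantaneousRate / averageThroughput. All values here are invented.
public class PfScheduler {

    static int pickUser(double[] instantaneousRate, double[] averageThroughput) {
        int best = 0;
        double bestMetric = -1;
        for (int i = 0; i < instantaneousRate.length; i++) {
            double metric = instantaneousRate[i] / averageThroughput[i];
            if (metric > bestMetric) {
                bestMetric = metric;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] rate = {1200.0, 800.0, 2400.0};   // kbps in this slot (invented)
        double[] avg  = {1000.0, 400.0, 2600.0};   // smoothed past throughput (invented)
        // User 1 wins: 800/400 = 2.0 is the largest ratio, even though user 2
        // has the highest instantaneous rate (which is what MAX-SINR would pick).
        System.out.println("PF schedules user " + pickUser(rate, avg));
    }
}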

4.2 MODULES:

SERVER CLIENT MODULE:

NETWORK SECURITY:

  • ATTACK MODEL:

COOPERATIVE RELAYING:

EFFICIENT ROUTING METRICS:

PERFORMANCE ANALYSIS

4.3 MODULE DESCRIPTION:

SERVER CLIENT MODULE:

Client-server computing or networking is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters, called clients. Often clients and servers operate over a computer network on separate hardware. A server machine is a high-performance host that runs one or more server programs which share their resources with clients. A client does not share any of its resources; clients therefore initiate communication sessions with servers, which await (listen for) incoming requests.
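
A hedged sketch of the server side of this module is given below: the server listens for incoming requests and serves each client on its own thread. The port number and the echo behaviour are illustrative only, not the project's actual protocol, and it complements the two-ends sketch shown in the networking section earlier.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of a server that awaits incoming requests and serves each client on
// its own thread; the port and the echo behaviour are only examples.
public class SimpleServer {
    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(6000);   // example port
        while (true) {
            final Socket client = listener.accept();      // await an incoming request
            new Thread(new Runnable() {
                public void run() {
                    try {
                        BufferedReader in =
                            new BufferedReader(new InputStreamReader(client.getInputStream()));
                        PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                        String request = in.readLine();
                        out.println("echo: " + request);  // share a (trivial) resource
                        client.close();
                    } catch (Exception e) { e.printStackTrace(); }
                }
            }).start();
        }
    }
}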

NETWORK SECURITY:

Network-accessible resources may be deployed in a network as surveillance and early-warning tools, as such decoy resources are not normally accessed for legitimate purposes. Techniques used by attackers that attempt to compromise these decoy resources are studied during and after an attack to keep an eye on new exploitation techniques. Such analysis may be used to further tighten the security of the actual network being protected by the decoys. Data forwarding can also direct an attacker’s attention away from legitimate servers. A user encourages attackers to spend their time and energy on the decoy server while distracting their attention from the data on the real server. Similar to a decoy server, such a setup is a network with intentional vulnerabilities. Its purpose is also to invite attacks so that the attacker’s methods can be studied and that information can be used to increase network security.

ATTACK MODEL:

We introduce our attack concept and perform case studies to quantify the attack effects on specific channel-aware network protocols. We evaluate the effect of falsely reported channel condition under three types of channel-aware protocols: cooperative relaying protocols in hybrid networks, efficient routing metrics in wireless ad hoc networks, and opportunistic schedulers in high-speed wireless networks. For each protocol, we suggest possible attack mechanisms and quantify the effectiveness of the attack. We show that we can defend against some attacks using existing algorithms, and that other attacks require new security mechanisms. Each following case study has the same presentation format. First, we briefly explain the protocol. Then, we discuss effective attack scenarios for the protocol. Finally, we use simulations to evaluate the effect of each attack scenario.

COOPERATIVE RELAYING:

In a mobile wireless network, mobile nodes can experience different channel conditions depending on their different locations. When a node experiences a channel condition that is too poor to receive packets from a source node, a third node may have a good channel condition to both the source and the intended destination. Cooperative relaying network architectures help a node that has poor channel condition to route its packet through a node with a good channel condition, thus improving system throughput.

A cooperative relaying protocol must distribute channel condition information for each candidate path, find the most appropriate relay path, and provide incentives to motivate nodes to forward packets for other nodes. Specifically, in UCAN, user equipment has two wireless adaptors: one High Data Rate (HDR) cellular interface and one IEEE 802.11 interface. The HDR interface is used for communication with a base station, and the IEEE 802.11 interface is used for peer-to-peer communication with other user equipment in the network.

EFFICIENT ROUTING METRICS:

Routing protocols in ad hoc networks: a wireless ad hoc network supports communication between nodes without the need for centralized infrastructure such as base stations or access points. To deliver packets to destinations out of a source node’s transmission range, the source employs the help of intermediate nodes to forward each packet to its destination. Routing protocols in a wireless ad hoc network discover routes between nodes. When there are multiple valid routes from a source to a destination, a routing protocol needs to choose among the valid routes. A routing metric is a value associated with a route and represents the desirability of the route. A typical metric in the seminal routing protocols is minimum hop count. The rationale behind the metric of minimum hop count is that a route with fewer hops allows a packet to be delivered with a smaller number of transmissions.
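
The minimum hop count metric can be computed with a breadth-first search over the network graph. The small sketch below illustrates only the metric itself, on an invented topology, and is not tied to any particular ad hoc routing protocol.

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Illustration of the minimum hop count metric: breadth-first search over an
// adjacency list gives the fewest-hop distance from a source to every node.
public class MinHopCount {

    static int[] hopCounts(int[][] neighbours, int source) {
        int[] hops = new int[neighbours.length];
        Arrays.fill(hops, -1);                 // -1 means "not reached yet"
        hops[source] = 0;
        Queue<Integer> queue = new ArrayDeque<Integer>();
        queue.add(source);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int next : neighbours[node]) {
                if (hops[next] == -1) {
                    hops[next] = hops[node] + 1;
                    queue.add(next);
                }
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        // Invented 4-node topology with links 0-1, 1-2, 2-3, 0-2.
        int[][] neighbours = { {1, 2}, {0, 2}, {0, 1, 3}, {2} };
        System.out.println(Arrays.toString(hopCounts(neighbours, 0))); // [0, 1, 1, 2]
    }
}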

PERFORMANCE ANALYSIS:

Our analysis assumes that the channel condition does not change. Though this assumption does not hold in a mobile environment, the purpose of our analysis is not to capture every detail of the real world but to verify our simulator. For the evaluation in a realistic environment, we perform simulations with channel models considering variable channel conditions, as described above. The challenge size and Psref(i) are taken to be the same for different challenges for easy comparison of performance. The equations in our analysis do not assume the same values of challenge size and Psref(i). However, with different values of challenge size and Psref(i), it is not easy to understand the parameters’ effect on the performance. In this analysis, we assume that the challenges are authenticated.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are 

  • ECONOMICAL FEASIBILITY
  • TECHNICAL FEASIBILITY
  • SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:     

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available; only the customized products had to be purchased.

 

5.1.2 TECHNICAL FEASIBILITY   

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:  

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods that are employed to educate the users about the system and to make them familiar with it. Their level of confidence must be raised so that they can also offer constructive criticism, which is welcome, as they are the final users of the system.

5.2 SYSTEM TESTING:

Testing is a process of checking whether the developed system is working according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes a logical assumption that if all parts of the system are correct, the goal will be successfully achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until many months later.

This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best program is worthless if it does not produce correct outputs.

5.2.1 UNIT TESTING:

Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.

Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile, process test data correctly, and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logical. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.
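
As a concrete illustration of checking a single program unit for logic errors, the following sketch assumes JUnit 4 is available; the Calculator class is a hypothetical unit under test and not part of this project.

import org.junit.Test;
import static org.junit.Assert.*;

// A minimal unit-test sketch (JUnit 4 assumed; Calculator is a hypothetical
// stand-in for a single program unit under test).
public class CalculatorTest {

    // A trivial unit under test, inlined here so the example is self-contained.
    static class Calculator {
        int add(int a, int b) { return a + b; }
    }

    @Test
    public void addReturnsSumOfItsArguments() {
        Calculator calc = new Calculator();
        // Logic errors are caught by comparing actual output with expected output.
        assertEquals(5, calc.add(2, 3));
    }
}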

5.2.2 FUNCTIONAL TESTING:

Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.

Description: Test for all modules.
Expected result: All peers should communicate in the group.

Description: Test for the various peers in a distributed network framework, as it displays all users available in the group.
Expected result: The result after execution should be accurate.


5.2.3 NON-FUNCTIONAL TESTING:

Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:

  • Load testing
  • Performance testing
  • Usability testing
  • Reliability testing
  • Security testing

5.2.4 LOAD TESTING:

An important tool for implementing system tests is a load generator. A load generator is essential for testing quality requirements such as performance and stress. A load can be a real load; that is, the system can be put under real usage by having actual telephone users connected to it, who will generate test input data for the system test. A minimal sketch of such a load generator is given after the test case below.

Description: It is necessary to ascertain that the application behaves correctly under load when a ‘Server busy’ response is received.
Expected result: Should designate another active node as the server.
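
The following is a minimal sketch of the kind of load generator described above, assuming plain Java threads; the client count, request count, and the simulated request are illustrative placeholders rather than part of this project.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A minimal load-generator sketch: a fixed pool of worker threads repeatedly
// issues requests against the system under test. The request itself is a
// placeholder (a short sleep); in a real test it would be a network call.
public class LoadGenerator {
    public static void main(String[] args) throws Exception {
        final int clients = 20;              // simulated concurrent users
        final int requestsPerClient = 50;

        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int c = 0; c < clients; c++) {
            pool.submit(new Runnable() {
                public void run() {
                    for (int r = 0; r < requestsPerClient; r++) {
                        try {
                            Thread.sleep(10);   // stand-in for one request/response cycle
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("Load run complete: " + (clients * requestsPerClient) + " requests issued");
    }
}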


5.2.5 PERFORMANCE TESTING:

Performance tests are utilized in order to determine the widely defined performance of the software system such as execution time associated with various parts of the code, response time and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time, and using an acceptable level of resources; it is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.


5.2.6 RELIABILITY TESTING:

Software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, and it is ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. It forms a part of the work of the software quality control team.

Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.


5.2.7 SECURITY TESTING:

Security testing evaluates system characteristics that relate to the availability, integrity and confidentiality of the system data and services. Users/Clients should be encouraged to make sure their security needs are very clearly known at requirements time, so that the security issues can be addressed by the designers and testers.

Description: Checking that the user identification is authenticated.
Expected result: In case of failure, it should not be connected in the framework.

Description: Check whether group keys in a tree are shared by all peers.
Expected result: The peers should know the group key in the same group.


5.2.8 WHITE BOX TESTING:

White box testing, sometimes called glass-box testing, is a test case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases that exercise the inner structure of the software to be tested.

Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.

Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.

Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.


5.2.9 BLACK BOX TESTING:

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors by focusing on the inputs, outputs, and principal functions of a software module. The starting point of black box testing is either a specification or code. The contents of the box are hidden, and the stimulated software should produce the desired results.

Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.

Description: To check for interface errors.
Expected result: The entire interface must function normally.

Description: To check for errors in data structures or external database access.
Expected result: The database updating and retrieval must be done correctly.

Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.

All the above system testing strategies are carried out during development, as the development, documentation, and institutionalization of the proposed goals and related policies are essential.

CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

 

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

 

The Java Programming Language

 

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

  • Simple
  • Architecture neutral
  • Object oriented
  • Portable
  • Distributed
  • High performance
  • Interpreted
  • Multithreaded
  • Robust
  • Dynamic
  • Secure

With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, first you translate a program into an intermediate language called Java byte codes, the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java byte code instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java byte codes help make “write once, run anywhere” possible. You can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or on an iMac.
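
A minimal example of the compile-once, run-anywhere cycle described above: compiling the class below with javac produces a .class file of platform-independent byte codes, which any Java VM can then interpret.

// HelloWorld.java -- a minimal program to illustrate the compile/interpret cycle.
// Compiling once with "javac HelloWorld.java" produces HelloWorld.class
// (platform-independent byte codes); "java HelloWorld" then runs those byte
// codes on any machine that has a Java VM.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello from the Java platform");
    }
}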

6.2 THE JAVA PLATFORM:

A platform is the hardware or software environment in which a program runs. We’ve already mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it’s a software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

  • The Java Virtual Machine (Java VM)
  • The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, “What Can Java Technology Do?”, highlights what functionality some of the packages in the Java API provide.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, after compilation, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The most common types of programs written in the Java programming language are applets and applications. If you’ve surfed the Web, you’re probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs.

An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet.

A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server.

How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:

  • The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
  • Applets: The set of conventions used by applets.
  • Networking: URLs, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
  • Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
  • Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
  • Software components: Known as JavaBeans™, these can plug into existing component architectures.
  • Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
  • Java Database Connectivity (JDBC™): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

 

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

We can’t promise you fame, fortune, or even a job if you learn the Java programming language. Still, it is likely to make your programs better, and it requires less effort than other languages. We believe that Java technology will help you do the following:

  • Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
  • Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
  • Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
  • Develop programs more quickly: Your development time may be cut roughly in half compared with writing the same program in C++. Why? You write fewer lines of code, and Java is a simpler programming language than C++.
  • Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java™ Product Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
  • Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
  • Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

 

6.5 ODBC:

 

Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application developers and database systems providers. Before ODBC became a de facto standard for Windows programs to interface with database systems, programmers had to use proprietary languages for each database they wanted to connect to. Now, ODBC has made the choice of the database system almost irrelevant from a coding perspective, which is as it should be. Application developers have much more important things to worry about than the syntax that is needed to port their program from one database to another when business needs suddenly change.

Through the ODBC Administrator in Control Panel, you can specify the particular database that is associated with a data source that an ODBC application program is written to use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a particular database. For example, the data source named Sales Figures might be a SQL Server database, whereas the Accounts Payable data source could refer to an Access database. The physical database referred to by a data source can reside anywhere on the LAN.

The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you set up a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program, and each maintains a separate list of ODBC data sources.

From a programming perspective, the beauty of ODBC is that the application can be written to use the same set of function calls to interface with any data source, regardless of the database vendor. The source code of the application doesn’t change whether it talks to Oracle or SQL Server. We only mention these two as an example. There are ODBC drivers available for several dozen popular database systems. Even Excel spreadsheets and plain text files can be turned into data sources. The operating system uses the Registry information written by ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the data source (such as the interface to Oracle or SQL Server). The loading of the ODBC drivers is transparent to the ODBC application program. In a client/server environment, the ODBC API even handles many of the network issues for the application programmer.

The advantages of this scheme are so numerous that you are probably thinking there must be some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to the native database interface. ODBC has had many detractors make the charge that it is too slow. Microsoft has always claimed that the critical factor in performance is the quality of the driver software that is used. In our humble opinion, this is true. The availability of good ODBC drivers has improved a great deal recently. And anyway, the criticism about performance is somewhat analogous to those who said that compilers would never match the speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers get faster every year.

6.6 JDBC:

In an effort to set an independent database standard API for Java; Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of “plug-in” database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he or she must provide the driver for each platform that the database and Java run on.

To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you discovered earlier in this chapter, ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster than developing a completely new connectivity solution.

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

The remainder of this section will cover enough information about JDBC for you to know what it is about and how to use it effectively. This is by no means a complete overview of JDBC. That would fill an entire book.

 

6.7 JDBC Goals:

Few software packages are designed without goals in mind. JDBC is one such package; its many goals drove the development of the API. These goals, in conjunction with early reviewer feedback, have finalized the JDBC class library into a solid framework for building database applications in Java.

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although not the lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be created. Conversely, it is at a high enough level for application programmers to use it confidently. Attaining this goal allows for future tool vendors to “generate” JDBC code and to hide many of JDBC’s complexities from the end user.

SQL Conformance

SQL syntax varies as you move from database vendor to database vendor. In an effort to support a wide variety of vendors, JDBC will allow any query statement to be passed through it to the underlying database driver. This allows the connectivity module to handle non-standard functionality in a manner that is suitable for its users.

JDBC must be implementable on top of common database interfaces

The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This interface would translate JDBC calls to ODBC and vice versa.

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that the design of JDBC should be very simple, allowing for only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

Use strong, static typing wherever possible

Strong typing allows more error checking to be done at compile time; as a result, fewer errors appear at runtime.

Keep the common cases simple

Because, more often than not, the usual SQL calls used by the programmer are simple SELECTs, INSERTs, DELETEs, and UPDATEs, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.
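
As an illustration of such a common case, the sketch below issues a simple parameterized SELECT through JDBC. The data source name, table, and column names are assumptions made for this example (an MS Access data source reached through the JDBC-ODBC bridge on JDK 7 or earlier is assumed); any other JDBC driver and URL could be substituted.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// A minimal JDBC sketch for the "common cases": one parameterized SELECT.
// The JDBC URL, table name, and column names are hypothetical.
public class SimpleQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");   // JDBC-ODBC bridge (JDK 7 or earlier assumed)
        String url = "jdbc:odbc:CacheTable";             // assumed ODBC data source name
        try (Connection con = DriverManager.getConnection(url);
             PreparedStatement ps = con.prepareStatement(
                     "SELECT name FROM peers WHERE group_id = ?")) {
            ps.setInt(1, 1);                             // bind the SELECT parameter
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}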

Finally, we decided to proceed with the implementation using Java networking.

For dynamically updating the cache table, we use an MS Access database.

Java has two things: a programming language and a platform.

Java is a high-level programming language that is all of the following:

  • Simple
  • Architecture-neutral
  • Object-oriented
  • Portable
  • Distributed
  • High-performance
  • Interpreted
  • Multithreaded
  • Robust
  • Dynamic
  • Secure

Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes; these platform-independent code instructions are then passed to the interpreter and run on the computer.

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

6.8 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagrams:

The IP layer provides a connectionless and unreliable delivery system. It considers each datagram independently of the others. Any association between datagrams must be supplied by the higher layers. The IP layer supplies a checksum that includes its own header. The header includes the source and destination addresses. The IP layer handles routing through an internet. It is also responsible for breaking up large datagrams into smaller ones for transmission and reassembling them at the other end.

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model, as described later.
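
A minimal Java sketch of sending a single UDP datagram; the host name and port are illustrative. Because UDP is connectionless and unreliable, the sender gets no indication of whether the datagram arrives.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// A minimal UDP sketch: send one datagram to a port on the local machine.
public class UdpSend {
    public static void main(String[] args) throws Exception {
        byte[] payload = "hello".getBytes("UTF-8");
        DatagramPacket packet = new DatagramPacket(
                payload, payload.length, InetAddress.getByName("localhost"), 4445);
        DatagramSocket socket = new DatagramSocket();
        socket.send(packet);   // fire-and-forget: no connection, no acknowledgment
        socket.close();
    }
}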

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address with 24 bits left over for other addressing. Class B uses 16 bit network addressing. Class C uses 24 bit network addressing and class D uses all 32.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.

Port addresses

A service exists on a host, and is identified by its port. This is a 16 bit number. To send a message to a server, you send it to the port for that service of the host that it is running on. This is not location transparency! Certain of these ports are “well known”.

Sockets:

A socket is a data structure maintained by the system to handle network connections. A socket is created using the socket call. It returns an integer that is like a file descriptor. In fact, under Windows, this handle can be used with the ReadFile and WriteFile functions.

#include <sys/types.h>
#include <sys/socket.h>
int socket(int family, int type, int protocol);

Here “family” will be AF_INET for IP communications, protocol will be zero, and type will depend on whether TCP or UDP is used. Two processes wishing to communicate over a network create a socket each. These are similar to two ends of a pipe – but the actual pipe does not yet exist.
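
A small Java counterpart to the socket call above, sketching the two ends of the “pipe”: one thread accepts a TCP connection on a listening socket while the main thread connects to it and sends a line. The port number is arbitrary and the example is only illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// One process plays both roles here so the sketch is self-contained:
// the acceptor thread is the "server" end, the main thread is the "client" end.
public class TcpEchoSketch {
    public static void main(String[] args) throws Exception {
        final ServerSocket server = new ServerSocket(5000);   // listening end

        Thread acceptor = new Thread(new Runnable() {
            public void run() {
                try {
                    Socket peer = server.accept();            // second end of the "pipe"
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(peer.getInputStream()));
                    System.out.println("received: " + in.readLine());
                    peer.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        acceptor.start();

        Socket client = new Socket("localhost", 5000);        // first end of the "pipe"
        PrintWriter out = new PrintWriter(client.getOutputStream(), true);
        out.println("hello over TCP");
        client.close();
        acceptor.join();
        server.close();
    }
}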

6.9 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.
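
A minimal usage sketch, assuming the JFreeChart 1.0.x API: build a small dataset, create a pie chart, and write it to a PNG file. The category names and values are illustrative only.

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.data.general.DefaultPieDataset;

// A minimal JFreeChart sketch: dataset -> chart -> PNG file.
public class PieChartDemo {
    public static void main(String[] args) throws Exception {
        DefaultPieDataset dataset = new DefaultPieDataset();
        dataset.setValue("Delivered", 75);
        dataset.setValue("Dropped", 25);

        JFreeChart chart = ChartFactory.createPieChart(
                "Packet delivery", dataset, true /*legend*/, true /*tooltips*/, false /*urls*/);
        ChartUtilities.saveChartAsPNG(new File("delivery.png"), chart, 500, 300);
    }
}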

 

6.9.1. Map Visualizations:

Charts showing values that relate to geographical areas. Some examples include: (a) population density in each state of the United States, (b) income per capita for each country in Europe, (c) life expectancy in each country of the world. The tasks in this project include: Sourcing freely redistributable vector outlines for the countries of the world, states/provinces in particular countries (USA in particular, but also other areas);

Creating an appropriate dataset interface (plus a default implementation) and a renderer, and integrating these with the existing XYPlot class in JFreeChart; testing, documenting, testing some more, and documenting some more.

6.9.2. Time Series Chart Interactivity

Implement a new (to JFreeChart) feature for interactive time series charts: display a separate control that shows a small version of all the time series data, with a sliding “view” rectangle that allows you to select the subset of the time series data to display in the main chart.

6.9.3. Dashboards

There is currently a lot of interest in dashboard displays. Create a flexible dashboard mechanism that supports a subset of JFreeChart chart types (dials, pies, thermometers, bars, and lines/time series) that can be delivered easily via both Java Web Start and an applet.

 

6.9.4. Property Editors

The property editor mechanism in JFreeChart only handles a small subset of the properties that can be set for charts. Extend (or reimplement) this mechanism to provide greater end-user control over the appearance of the charts.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION

In this paper, we have studied the threat imposed by falsely reporting users’ channel conditions. Through case studies of three different types of wireless network protocols, we show that in a cooperative relaying network and in a network using ETX, a false reporting attack can significantly reduce the performance of other users. Our false channel-feedback attack can arise in any channel-aware protocol where a user reports its own channel condition. To counter such attacks, we propose a secure channel condition estimation algorithm to prevent the overclaiming attack. Through analysis and simulations, we show that with proper parameters, we can prevent the overclaiming attack.

CHAPTER 9

9.1 REFERENCES

  1. Dongho Kim and Yih-Chun Hu, “A Study on False Channel Condition Reporting Attacks in Wireless Networks,” IEEE Transactions on Mobile Computing, vol. 13, no. 5, May 2014.
  2. H. Luo, R. Ramjee, P. Sinha, L. E. Li, and S. Lu, “UCAN: A unified cellular and ad-hoc network architecture,” in Proc. ACM MobiCom, San Diego, CA, USA, 2003, pp. 353–367.
  3. D. S. J. De Couto, D. Aguayo, J. Bicket, and R. Morris, “A high-throughput path metric for multi-hop wireless routing,” in Proc. ACM MobiCom, San Diego, CA, USA, 2003, pp. 134–146.
  4. R. Draves, J. Padhye, and B. Zill, “Comparison of routing metrics for static multi-hop wireless networks,” in Proc. ACM SIGCOMM, Portland, OR, USA, 2004, pp. 133–144.
  5. A. Jalali, R. Padovani, and R. Pankaj, “Data throughput of CDMA-HDR: A high efficiency-high data rate personal communication wireless system,” in Proc. IEEE VTC, Tokyo, Japan, 2000, pp. 1854–1858.
  6. P. Viswanath, D. N. C. Tse, and R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1277–1294, Jun. 2002.
  7. R. Racic, D. Ma, H. Chen, and X. Liu, “Exploiting and defending opportunistic scheduling in cellular data networks,” IEEE Trans. Mobile Comput., vol. 9, no. 5, pp. 609–620, May 2010.
  8. U. Ben-Porat, A. Bremler-Barr, H. Levy, and B. Plattner, “On the vulnerability of the proportional fairness scheduler to retransmission attacks,” in Proc. IEEE INFOCOM, Shanghai, China, Apr. 2011, pp. 1431–1439.
  9. S. Bali, S. Machiraju, H. Zang, and V. Frost, “A measurement study of scheduler-based attacks in 3G wireless networks,” in Proc. PAM, Berlin, Germany, 2007.

A QoS-Oriented Distributed Routing Protocol for Hybrid Wireless Networks

A QOS-ORIENTED DISTRIBUTED ROUTING PROTOCOL FOR HYBRID

WIRELESS NETWORKS

By

A

PROJECT REPORT

Submitted to the Department of Computer Science & Engineering in the                                                  FACULTY OF ENGINEERING & TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree

Of

MASTER OF TECHNOLOGY

IN

COMPUTER SCIENCE & ENGINEERING

APRIL 2015

BONAFIDE CERTIFICATE

Certified that this project report titled “A QOS-ORIENTED DISTRIBUTED ROUTING PROTOCOL FOR HYBRID WIRELESS NETWORKS” is the bonafide work of Mr. _____________ who carried out the research under my supervision. Certified further, that to the best of my knowledge the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

Signature of the Guide                                                                             Signature of the H.O.D

Name                                                                                                           Name

CHAPTER 1

ABSTRACT:

As wireless communication gains popularity, significant research has been devoted to supporting real-time transmission with stringent Quality of Service (QoS) requirements for wireless applications. At the same time, a wireless hybrid network that integrates a mobile wireless ad hoc network (MANET) and a wireless infrastructure network has been proven to be a better alternative for the next generation wireless networks. By directly adopting resource reservation-based QoS routing for MANETs, hybrid networks inherit the invalid reservation and race condition problems of MANETs. How to guarantee the QoS in hybrid networks remains an open problem.

In this paper, we propose a QoS-Oriented Distributed routing protocol (QOD) to enhance the QoS support capability of hybrid networks. Taking advantage of fewer transmission hops and anycast transmission features of the hybrid networks, QOD transforms the packet routing problem to a resource scheduling problem.

QOD incorporates five algorithms:

1) A QoS-guaranteed neighbor selection algorithm to meet the transmission delay requirement,

2) A distributed packet scheduling algorithm to further reduce transmission delay,

3) A mobility-based segment resizing algorithm that adaptively adjusts segment size according to node mobility in order to reduce transmission time,

4) A traffic redundant elimination algorithm to increase the transmission throughput, and

5) A data redundancy elimination-based transmission algorithm to eliminate the redundant data to further improve the transmission QoS.

Analytical and simulation results based on the random way-point model and the real human mobility model show that QOD can provide high QoS performance in terms of overhead, transmission delay, mobility-resilience, and scalability.

1.2 INTRODUCTION

The rapid development of wireless networks has stimulated numerous wireless applications that are used in wide areas such as commerce, emergency services, military, education, and entertainment. The number of WiFi capable mobile devices, including laptops and handheld devices (e.g., smartphones and tablet PCs), has been increasing rapidly. For example, the number of wireless Internet users has tripled worldwide in the last three years, and the number of smartphone users in the US has increased from 92.8 million in 2011 to 121.4 million in 2012, and will reach around 207 million by 2017. Nowadays, people wish to watch videos, play games, watch TV, and hold long-distance conferences via wireless mobile devices “on the go.” Therefore, video streaming applications such as Qik, Flixwagon, and FaceTime on infrastructure wireless networks have received increasing attention recently. These applications use an infrastructure to directly connect mobile users for video watching or interaction in real time. The widespread use of wireless and mobile devices and the increasing demand for mobile multimedia streaming services are leading to a promising near future where wireless multimedia services (e.g., mobile gaming, online TV, and online conferences) are widely deployed.

The emergence and the envisioned future of real-time and multimedia applications have stimulated the need for high Quality of Service (QoS) support in wireless and mobile networking environments. The QoS support reduces end-to-end transmission delay and enhances throughput to guarantee seamless communication between mobile devices and wireless infrastructures.

At the same time, hybrid wireless networks (i.e., multihop cellular networks) have been proven to be a better network structure for the next generation wireless networks and can help to tackle the stringent end-to-end QoS requirements of different applications. Hybrid networks synergistically combine infrastructure networks and MANETs to leverage each other. Specifically, infrastructure networks improve the scalability of MANETs, while MANETs automatically establish self-organizing networks, extending the coverage of the infrastructure networks. In a vehicle opportunistic access network (an instance of hybrid networks), people in vehicles need to upload or download videos from remote Internet servers through access points (APs) (i.e., base stations) spreading out in a city.

Since it is unlikely that the base stations cover the entire city to maintain sufficiently strong signal everywhere to support an application requiring high link rates, the vehicles themselves can form a MANET to extend the coverage of the base stations, providing continuous network connections. How to guarantee the QoS in hybrid wireless networks with high mobility and fluctuating bandwidth still remains an open question. In infrastructure wireless networks, QoS provision (e.g., IntServ, RSVP) has been proposed for QoS routing, which often requires node negotiation, admission control, resource reservation, and priority scheduling of packets.

However, it is more difficult to guarantee QoS in MANETs due to their unique features, including user mobility, channel variance errors, and limited bandwidth. Thus, attempts to directly adapt the QoS solutions for infrastructure networks to MANETs generally do not have great success. Numerous reservation-based QoS routing protocols have been proposed for MANETs that create routes formed by nodes and links that reserve their resources to fulfill QoS requirements. Although these protocols can increase the QoS of the MANETs to a certain extent, they suffer from invalid reservation and race condition problems. The invalid reservation problem means that the reserved resources become useless if the data transmission path between a source node and a destination node breaks. The race condition problem means a double allocation of the same resource to two different QoS paths. However, little effort has been devoted to supporting QoS routing in hybrid networks. Most of the current works in hybrid networks focus on increasing network capacity or routing reliability but cannot provide QoS-guaranteed services. Direct adoption of the reservation-based QoS routing protocols of MANETs into hybrid networks inherits the invalid reservation and race condition problems.

In order to enhance the QoS support capability of hybrid networks, in this paper, we propose a QoS-Oriented Distributed routing protocol (QOD). Usually, a hybrid network has widespread base stations. The data transmission in hybrid networks has two features. First, an AP can be a source or a destination to any mobile node. Second, the number of transmission hops between a mobile node and an AP is small. The first feature allows a stream to have anycast transmission along multiple transmission paths to its destination through base stations, and the second feature enables a source node to connect to an AP through an intermediate node. Taking full advantage of the two features, QOD transforms the packet routing problem into a dynamic resource scheduling problem. Specifically, in QOD, if a source node is not within the transmission range of the AP, the source node selects nearby neighbors that can provide QoS services to forward its packets to base stations in a distributed manner. The source node schedules the packet streams to neighbors based on their queuing condition, channel condition, and mobility, aiming to reduce transmission time and increase network capacity. The neighbors then forward packets to base stations, which further forward packets to the destination. In this paper, we focus on the neighbor node selection for QoS-guaranteed transmission. QOD is the first work for QoS routing in hybrid networks.

1.3 LITERATURE SURVEY

QOS MULTICAST ROUTING BY USING MULTIPLE PATHS/TREES IN WIRELESS AD HOC NETWORKS

AUTHOR:  H. Wu and X. Jia

PUBLISH:  Ad Hoc Networks, vol. 5, pp. 600-612, 2009.

In this paper, we investigate the issues of QoS multicast routing in wireless ad hoc networks. Due to limited bandwidth of a wireless node, a QoS multicast call could often be blocked if there does not exist a single multicast tree that has the requested bandwidth, even though there is enough bandwidth in the system to support the call. In this paper, we propose a new multicast routing scheme by using multiple paths or multiple trees to meet the bandwidth requirement of a call. Three multicast routing strategies are studied, SPT (shortest path tree) based multiple-paths (SPTM), least cost tree based multiple-paths (LCTM) and multiple least cost trees (MLCT). The final routing tree(s) can meet the user’s QoS requirements such that the delay from the source to any destination node shall not exceed the required bound and the aggregate bandwidth of the paths or trees shall meet the bandwidth requirement of the call. Extensive simulations have been conducted to evaluate the performance of our three multicast routing strategies. The simulation results show that the new scheme improves the call success ratio and makes a better use of network resources.

QUALITY OF SERVICE PROVISIONING IN AD HOC WIRELESS NETWORKS: A SURVEY OF ISSUES AND SOLUTIONS

AUTHOR:  T. Reddy, I. Karthigeyan, B. Manoj, and C. Murthy

PUBLISH:  Ad Hoc Networks, vol. 4, no. 1, pp. 83-124, 2006.

An ad hoc wireless network (AWN) is a collection of mobile hosts forming a temporary network on the fly, without using any fixed infrastructure. Characteristics of AWNs such as lack of central coordination, mobility of hosts, dynamically varying network topology, and limited availability of resources make QoS provisioning very challenging in such networks. In this paper, we describe the issues and challenges in providing QoS for AWNs and review some of the QoS solutions proposed. We first provide a layer-wise classification of the existing QoS solutions, and then discuss each of these solutions.

QOS ROUTING BASED ON MULTI-CLASS NODES FOR MOBILE AD HOC NETWORKS

AUTHOR:  X. Du

PUBLISH:  Ad Hoc Networks, vol. 2, pp. 241-254, 2004.

Efficient routing is very important for Mobile Ad hoc Networks (MANETs). Most existing routing protocols consider homogeneous ad hoc networks, in which all nodes are identical, i.e., they have the same communication capabilities and characteristics. Although a homogeneous network model is simple and easy to analyze, it misses important characteristics of many realistic MANETs such as military battlefield networks. In addition, a homogeneous ad hoc network suffers from poor performance limits and scalability. In many ad hoc networks, multiple types of nodes do co-exist, and some nodes have larger transmission power, higher transmission data rates, better processing capability, and are more robust against bit errors and congestion than other nodes. Hence, a heterogeneous network model is more realistic and provides many advantages (e.g., leading to more efficient routing protocol design).

In this paper, we present a new routing protocol called Multi-Class (MC) routing, which is specifically designed for heterogeneous MANETs. Moreover, we also design a new Medium Access Control (MAC) protocol for heterogeneous MANETs, which is more efficient than IEEE 802.11b. Extensive simulation results demonstrate that MC routing has very good performance and outperforms a popular routing protocol, the Zone Routing Protocol, in terms of reliability, scalability, route discovery latency, overhead, as well as packet delay and throughput.

PROVISIONING OF ADAPTABILITY TO VARIABLE TOPOLOGIES FOR ROUTING SCHEMES IN MANETS

AUTHOR:  S. Jiang, Y. Liu, Y. Jiang, and Q. Yin,

PUBLISH:   IEEE J. Selected Areas in Comm., vol. 22, no. 7, pp. 1347-1356, Sept. 2004.

Frequent changes in network topologies caused by mobility in mobile ad hoc networks (MANETs) impose great challenges to designing routing schemes for such networks. Various routing schemes each aiming at particular type of MANET (e.g., flat or clustered MANETs) with different mobility degrees (e.g., low, medium, and high mobility) have been proposed in the literature. However, since a mobile node should not be limited to operate in a particular MANET assumed by a routing scheme, an important issue is how to enable a mobile node to achieve routing performance as high as possible when it roams across different types of MANETs. To handle this issue, a quantity that can predict the link status for a time period in the future with the consideration of mobility is required. In this paper, we discuss such a quantity and investigate how well this quantity can be used by the link caching scheme in the dynamic source routing protocol to provide the adaptability to variable topologies caused by mobility through computer simulation in NS-2.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

Existing approaches for providing guaranteed services in infrastructure networks are based on two models: integrated services (IntServ) and differentiated services (DiffServ) [42]. IntServ is a stateful model that uses resource reservation for individual flows, and uses admission control and a scheduler to maintain the QoS of traffic flows. In contrast, DiffServ is a stateless model that uses a coarse-grained, class-based mechanism for traffic management together with a number of queuing and scheduling algorithms. Reservation-based QoS routing protocols have been proposed for MANETs that create routes formed by nodes and links that reserve their resources to fulfill QoS requirements, although these protocols can increase the QoS of MANETs only to a certain extent.

2.2 DISADVANTAGES:

  • Cannot provide QoS-guaranteed services.
  • Suffer from invalid reservation and race condition problems.
  • The invalid reservation problem means that the reserved resources become useless, and the race condition problem means a double allocation of the same resource to two different QoS paths.


2.3 PROPOSED SYSTEM:

We propose a QoS-Oriented Distributed routing protocol (QOD). Usually, a hybrid network has widespread base stations.

The data transmission in hybrid networks has two features.

First, an AP can be a source or a destination to any mobile node. Second, the number of transmission hops between a mobile node and an AP is small.  The first feature allows a stream to have anycast transmission along multiple transmission paths to its destination through base stations, and the second feature enables a source node to connect to an AP through an intermediate node. Taking full advantage of the two features, QOD transforms the packet routing problem into a dynamic resource scheduling problem. Specifically, in QOD, if a source node is not within the transmission range of the AP, a source node selects nearby neighbors that can provide QoS services to forward its packets to base stations in a distributed manner. The source node schedules the packet streams to neighbors based on their queuing condition, channel condition, and mobility, aiming to reduce transmission time and increase network capacity. The neighbors then forward packets to base stations, which further forward packets to the destination.

2.4 ADVANTAGES:

  • QoS-guaranteed neighbor selection algorithm. The algorithm selects qualified neighbors and employs deadline-driven scheduling mechanism to guarantee QoS routing.
  • Distributed packet scheduling algorithm. After qualified neighbors are identified, this algorithm schedules packet routing. It assigns earlier generated packets to forwarders with higher queuing delays, while assigns more recently generated packets to forwarders with lower queuing delays to reduce total transmission delay.
  • Mobility-based segment resizing algorithm. The source node adaptively resizes each packet in its packet stream for each neighbor node according to the neighbor’s mobility in order to increase the scheduling feasibility of the packets from the source node.
  • Soft-deadline based forwarding scheduling algorithm. In this algorithm, an intermediate node first forwards the packet with the least time allowed to wait before being forwarded out to achieve fairness in packet forwarding.
  • Data redundancy elimination based transmission. Due to the broadcasting feature of the wireless networks, the APs and mobile nodes can overhear and cache packets. This algorithm eliminates the redundant data to improve the QoS of the packet transmission.

HARDWARE & SOFTWARE REQUIREMENTS:

HARDWARE REQUIREMENT:

  • Processor                         –    Pentium IV
  • Speed                             –    1.1 GHz
  • RAM                               –    256 MB (min)
  • Hard Disk                         –    20 GB
  • Floppy Drive                      –    1.44 MB
  • Key Board                         –    Standard Windows Keyboard
  • Mouse                             –    Two or Three Button Mouse
  • Monitor                           –    SVGA

 

SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP
  • Front End                                :           Java JDK 1.7
  • Document                               :           MS-Office 2007

CHAPTER 3

SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

  • The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
  • The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.
  • The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • A DFD may be used to represent a system at any level of abstraction, and it may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA STORE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.

3.1 SYSTEM ARCHITECTURE:


3.2 DATAFLOW DIAGRAM

SENSOR NODE:


MOBILE RELAY NODE:

SINK:


UML DIAGRAMS:

3.3 USE CASE DIAGRAM:


3.4 CLASS DIAGRAM:


3.5 SEQUENCE DIAGRAM:

3.6 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

QOD ROUTING PROTOCOL:

Scheduling feasibility is the ability of a node to guarantee that a packet arrives at its destination within the QoS requirements. As mentioned, when the QoS of the direct transmission between a source node and an AP cannot be guaranteed, the source node sends a request message to its neighbor nodes. After receiving a forward request from a source node, a neighbor node ni with space utility less than a threshold replies to the source node. The reply message contains information about available resources for checking packet scheduling feasibility (Section 2.4); the packet arrival interval Ta, the transmission delay T_I→D, and the packet deadline Dp of the packets in each flow being forwarded by the neighbor, for queuing delay estimation and distributed packet scheduling; and the node’s mobility speed, for determining packet size. Based on this information, the source node chooses the replying neighbors that can guarantee the delay QoS of packet transmission to the APs.

The selected neighbor nodes periodically report their statuses to the source node, which ensures their scheduling feasibility and locally schedules the packet stream to them. Individual packets are forwarded to the scheduling-feasible neighbor nodes in a round-robin fashion from a longer-delayed node to a shorter-delayed node, aiming to reduce the entire packet transmission delay. Algorithm 1 shows the pseudocode for the QOD routing protocol executed by each node; a simplified sketch is given below. The QOD distributed routing algorithm is developed based on the assumption that neighboring nodes in the network have different channel utilities and workloads under the IEEE 802.11 protocol. Otherwise, there is no need for packet scheduling in routing, since all neighbors would produce comparable delay for packet forwarding. Therefore, we analyze the difference in node channel utilities and workloads in a network with the IEEE 802.11 protocol in order to see whether the assumption holds true in practice.
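
The following Java sketch is a much-simplified illustration of the neighbor selection and round-robin assignment just described; it is not the report’s Algorithm 1, and the Neighbor fields, thresholds, and the feasibility test (estimated delay within the packet deadline) are assumptions made only for this example.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Simplified sketch: keep the neighbors that look scheduling feasible, order
// them from longer delay to shorter delay, then hand packets out round-robin.
public class QodSketch {

    static class Neighbor {
        String id;
        double estimatedDelayMs;   // queuing delay plus transmission delay reported by the neighbor
        double spaceUtility;       // fraction of the neighbor's queue already in use

        Neighbor(String id, double estimatedDelayMs, double spaceUtility) {
            this.id = id;
            this.estimatedDelayMs = estimatedDelayMs;
            this.spaceUtility = spaceUtility;
        }
    }

    static List<Neighbor> selectForwarders(List<Neighbor> replies,
                                           double packetDeadlineMs,
                                           double utilityThreshold) {
        List<Neighbor> feasible = new ArrayList<Neighbor>();
        for (Neighbor n : replies) {
            if (n.spaceUtility < utilityThreshold && n.estimatedDelayMs <= packetDeadlineMs) {
                feasible.add(n);   // assumed feasibility rule for this sketch
            }
        }
        Collections.sort(feasible, new Comparator<Neighbor>() {
            public int compare(Neighbor a, Neighbor b) {
                return Double.compare(b.estimatedDelayMs, a.estimatedDelayMs);  // longer delay first
            }
        });
        return feasible;
    }

    public static void main(String[] args) {
        List<Neighbor> replies = new ArrayList<Neighbor>();
        replies.add(new Neighbor("n2", 40.0, 0.3));
        replies.add(new Neighbor("n3", 15.0, 0.2));
        replies.add(new Neighbor("n4", 70.0, 0.9));   // over the space-utility threshold

        List<Neighbor> forwarders = selectForwarders(replies, 60.0, 0.8);
        for (int i = 0; i < 6; i++) {
            Neighbor chosen = forwarders.get(i % forwarders.size());  // round-robin assignment
            System.out.println("packet " + i + " -> " + chosen.id);
        }
    }
}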

4.1 ALGORITHM

The packets travel from different APs, which may lead to different packet transmission delays, resulting in jitter at the receiver side. The jitter problem can be solved by using a token bucket mechanism at the destination APs to shape the traffic flows. This technique is orthogonal to our study in this paper, and its details are beyond the scope of this paper.
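
For illustration, a minimal token-bucket sketch of the traffic-shaping idea mentioned above; the refill rate and bucket depth are arbitrary, and a real shaper at the destination APs would buffer non-conforming packets rather than merely rejecting them.

// A minimal token-bucket sketch: tokens accumulate at a fixed rate up to a
// maximum burst size; a packet may be released only if enough tokens are available.
public class TokenBucket {
    private final double ratePerMs;    // tokens added per millisecond
    private final double capacity;     // maximum burst size in tokens
    private double tokens;
    private long lastRefillMs;

    TokenBucket(double ratePerMs, double capacity) {
        this.ratePerMs = ratePerMs;
        this.capacity = capacity;
        this.tokens = capacity;
        this.lastRefillMs = System.currentTimeMillis();
    }

    // Returns true if a packet of the given size may be released now.
    synchronized boolean tryConsume(double packetSizeTokens) {
        long now = System.currentTimeMillis();
        tokens = Math.min(capacity, tokens + (now - lastRefillMs) * ratePerMs);
        lastRefillMs = now;
        if (tokens >= packetSizeTokens) {
            tokens -= packetSizeTokens;
            return true;     // conforming packet: forward immediately
        }
        return false;        // non-conforming: hold until enough tokens accumulate
    }

    public static void main(String[] args) throws InterruptedException {
        TokenBucket shaper = new TokenBucket(0.5, 10);   // 0.5 tokens/ms, burst of 10
        for (int i = 0; i < 5; i++) {
            System.out.println("packet " + i + (shaper.tryConsume(4) ? " forwarded" : " delayed"));
            Thread.sleep(5);
        }
    }
}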

4.2 MODULES:

HYBRID WIRELESS NETWORKS:

DISTRIBUTED PACKET SCHEDULING:

NEIGHBOR SELECTION ALGORITHM:

MOBILITY-BASED PACKET RESIZING:

4.3 MODULE DESCRIPTION:

HYBRID WIRELESS NETWORKS:

Hybrid wireless networks (i.e., multihop cellular networks) have been proven to be a better network structure for next-generation wireless networks and can help to tackle the stringent end-to-end QoS requirements of different applications. Hybrid networks synergistically combine infrastructure networks and MANETs so that each leverages the other. Specifically, infrastructure networks improve the scalability of MANETs, while MANETs automatically establish self-organizing networks, extending the coverage of the infrastructure networks.

In a vehicle opportunistic access network (an instance of hybrid networks), people in vehicles need to upload or download videos from remote Internet servers through access points (APs) (i.e., base stations) spreading out in a city. Since it is unlikely that the base stations cover the entire city to maintain sufficiently strong signal everywhere to support an application requiring high link rates, the vehicles themselves can form a MANET to extend the coverage of the base stations, providing continuous network connections.

The QoS requirements mainly include end-to-end delay bound, which is essential for many applications with stringent real-time requirement. While throughput guarantee is also important, it is automatically guaranteed by bounding the transmission delay for a certain amount of packets. The source node conducts admission control to check whether there are enough resources to satisfy the requirements of QoS of the packet stream in the network model of a hybrid network. For example, when a source node n1 wants to upload files to an Internet server through APs, it can choose to send packets to the APs directly by itself or require its neighbor nodes n2, n3, or n4 to assist the packet transmission.

DISTRIBUTED PACKET SCHEDULING:

To guarantee the QoS of the packet transmission and to determine how a source node assigns traffic to the intermediate nodes while ensuring their scheduling feasibility, a distributed packet scheduling algorithm is proposed for packet routing in order to further reduce the stream transmission time. This algorithm assigns earlier generated packets to forwarders with higher queuing delays and scheduling feasibility, and assigns more recently generated packets to forwarders with lower queuing delays and scheduling feasibility, so that the transmission delay of the entire packet stream can be reduced.
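
As an illustration of this delay-matching idea (not the authors' actual algorithm), the Java sketch below pairs packets, ordered from earliest to latest generation time, with scheduling-feasible forwarders ordered from highest to lowest estimated queuing delay. Names such as Packet.generationTime and Forwarder.queuingDelay are assumptions made for the example.

import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the distributed packet scheduling rule:
// earlier generated packets go to forwarders with larger queuing delays,
// later packets go to forwarders with smaller queuing delays.
class Packet {
    final long generationTime;   // when the packet was generated (ms)
    Packet(long generationTime) { this.generationTime = generationTime; }
}

class Forwarder {
    final String id;
    final double queuingDelay;   // estimated queuing delay at this forwarder (s)
    Forwarder(String id, double queuingDelay) { this.id = id; this.queuingDelay = queuingDelay; }
}

class DelayMatchedScheduler {
    static void assign(List<Packet> packets, List<Forwarder> feasibleForwarders) {
        // Oldest packet first.
        Collections.sort(packets, new Comparator<Packet>() {
            public int compare(Packet a, Packet b) {
                return Long.compare(a.generationTime, b.generationTime);
            }
        });
        // Forwarder with the largest queuing delay first.
        Collections.sort(feasibleForwarders, new Comparator<Forwarder>() {
            public int compare(Forwarder a, Forwarder b) {
                return Double.compare(b.queuingDelay, a.queuingDelay);
            }
        });
        for (int i = 0; i < packets.size(); i++) {
            // Cycle through the feasible forwarders in round-robin fashion.
            Forwarder f = feasibleForwarders.get(i % feasibleForwarders.size());
            System.out.println("packet generated at " + packets.get(i).generationTime
                    + " -> forwarder " + f.id);
        }
    }
}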


NEIGHBOR SELECTION ALGORITHM:

Since short delay is the major real-time QoS requirement for traffic transmission, QOD incorporates the Earliest Deadline First (EDF) scheduling algorithm, a deadline-driven scheduling algorithm, for data traffic scheduling in intermediate nodes. In this algorithm, an intermediate node assigns the highest priority to the packet with the closest deadline and forwards the packet with the highest priority first. Let us use Sp(i) to denote the size of the packet stream from node ni, Wi to denote the bandwidth of node ni, and Ta(i) to denote the packet arrival interval from node ni.
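
A minimal sketch of EDF ordering at an intermediate node is shown below, assuming a hypothetical QueuedPacket type with an absolute deadline field; it only illustrates the priority rule (closest deadline first), not the full QOD forwarding logic.

import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical EDF (Earliest Deadline First) queue at an intermediate node:
// the packet with the closest deadline is always forwarded first.
class QueuedPacket {
    final String id;
    final long deadlineMillis;   // absolute deadline of the packet
    QueuedPacket(String id, long deadlineMillis) {
        this.id = id;
        this.deadlineMillis = deadlineMillis;
    }
}

class EdfForwarder {
    private final PriorityQueue<QueuedPacket> queue =
            new PriorityQueue<QueuedPacket>(11, new Comparator<QueuedPacket>() {
                public int compare(QueuedPacket a, QueuedPacket b) {
                    return Long.compare(a.deadlineMillis, b.deadlineMillis);
                }
            });

    void enqueue(QueuedPacket p) { queue.add(p); }

    // Returns the next packet to forward, i.e., the one with the earliest deadline.
    QueuedPacket nextToForward() { return queue.poll(); }
}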


MOBILITY-BASED PACKET RESIZING:

In a highly dynamic mobile wireless network, the transmission link between two nodes is frequently broken. The delay generated by packet retransmission degrades the QoS of the transmission of a packet flow. On the other hand, a node in a highly dynamic network has a higher probability of meeting different mobile nodes and APs, which is beneficial to resource scheduling. As (2) shows, the space utility of an intermediate node that forwards a packet p depends on the packet size; reducing the packet size can increase the scheduling feasibility of an intermediate node and reduce the packet dropping probability. However, we cannot make the packet size too small, because doing so generates more packets to be transmitted, producing higher packet overhead. Based on this rationale, and taking advantage of the benefits of node mobility, we propose a mobility-based packet resizing algorithm for QOD in this section. The basic idea is that larger packets are assigned to lower-mobility intermediate nodes and smaller packets are assigned to higher-mobility intermediate nodes, which increases the number of QoS-guaranteed packet transmissions. Specifically, in QOD, as the mobility of a neighbor node ni increases, the size of a packet Sp sent from a node to ni decreases accordingly.
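
The exact resizing formula is not reproduced here; the following Java sketch simply illustrates the stated monotonic rule (packet size shrinks as the forwarder's mobility speed grows) using an assumed inverse-proportional scaling, bounded by hypothetical minimum and maximum packet sizes.

// Hypothetical mobility-based packet resizing rule: the faster the
// intermediate node moves, the smaller the packet assigned to it.
class PacketResizer {
    static final int MAX_PACKET_BYTES = 1500;  // assumed upper bound (e.g., MTU)
    static final int MIN_PACKET_BYTES = 128;   // assumed lower bound to limit per-packet overhead

    // speedMetersPerSec: reported mobility speed of the neighbor;
    // referenceSpeed: speed at which the full packet size is still used (assumption).
    static int packetSizeFor(double speedMetersPerSec, double referenceSpeed) {
        if (speedMetersPerSec <= referenceSpeed) {
            return MAX_PACKET_BYTES;
        }
        // Shrink inversely with speed beyond the reference speed (illustrative only).
        int size = (int) (MAX_PACKET_BYTES * (referenceSpeed / speedMetersPerSec));
        return Math.max(MIN_PACKET_BYTES, size);
    }

    public static void main(String[] args) {
        System.out.println(packetSizeFor(1.0, 2.0));   // slow node  -> 1500 bytes
        System.out.println(packetSizeFor(10.0, 2.0));  // fast node  -> 300 bytes
        System.out.println(packetSizeFor(40.0, 2.0));  // very fast  -> clamped to 128 bytes
    }
}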

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are      

  • ECONOMICAL FEASIBILITY
  • TECHNICAL FEASIBILITY
  • SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:                  

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

 

5.1.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:  

This aspect of the study is to check the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods that are employed to educate the users about the system and to make them familiar with it. Their level of confidence must be raised so that they are also able to offer constructive criticism, which is welcomed, as they are the final users of the system.

5.2 SYSTEM TESTING:

Testing is the process of checking whether the developed system works according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes the logical assumption that if all the parts of the system are correct, the goal will be successfully achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until months later. This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of the system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best program is worthless if it does not produce the correct outputs.

5.2.1 UNIT TESTING:

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile, process test data correctly, and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logic. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or an omitted keyword are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.

UNIT TESTING:

Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.

Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.2.2 FUNCTIONAL TESTING:

Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.

FUNCTIONAL TESTING:

Description: Test for all modules.
Expected result: All peers should communicate in the group.

Description: Test for various peers in a distributed network framework, as it displays all users available in the group.
Expected result: The result after execution should give the accurate result.

5.2.3 NON-FUNCTIONAL TESTING:

Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:

  • Load testing
  • Performance testing
  • Usability testing
  • Reliability testing
  • Security testing

5.2.4 LOAD TESTING:

An important tool for implementing system tests is a load generator. A load generator is essential for testing quality requirements such as performance and stress. A load can be a real load; that is, the system can be put under real usage by having actual telephone users connected to it, who will generate test input data for the system test.

LOAD TESTING:

Description: It is necessary to ascertain that the application behaves correctly under load when a 'Server busy' response is received.
Expected result: Should designate another active node as the server.

5.2.5 PERFORMANCE TESTING:

Performance tests are used to determine the broadly defined performance of the software system, such as the execution time associated with various parts of the code, the response time, and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

PERFORMANCE TESTING:

Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time and using an acceptable level of resources; it is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.

5.2.6 RELIABILITY TESTING:

Software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, and it is ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. This activity forms a part of software quality control.

RELIABILITY TESTING:

Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.

5.2.7 SECURITY TESTING:

Security testing evaluates system characteristics that relate to the availability, integrity, and confidentiality of the system data and services. Users and clients should be encouraged to make sure their security needs are clearly known at requirements time, so that the security issues can be addressed by the designers and testers.

SECURITY TESTING:

Description: Checking that the user identification is authenticated.
Expected result: In case of failure, it should not be connected in the framework.

Description: Check whether group keys in a tree are shared by all peers.
Expected result: The peers in the same group should know the group key.

5.2.8 WHITE BOX TESTING:

White box testing, sometimes called glass-box testing, is a test case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases. White box testing focuses on the inner structure of the software to be tested.

WHITE BOX TESTING:

Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.

Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.

Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.

5.2.9 BLACK BOX TESTING:

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors by focusing on the inputs, outputs, and principal functions of a software module. The starting point of black box testing is either a specification or the code. The contents of the box are hidden, and the stimulated software should produce the desired results.

BLACK BOX TESTING:

Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.

Description: To check for interface errors.
Expected result: The entire interface must function normally.

Description: To check for errors in data structures or external database access.
Expected result: The database updating and retrieval must be performed correctly.

Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.

All of the above system testing strategies are carried out, as the development, documentation, and institutionalization of the proposed goals and related policies are essential.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

CONCLUSION:

We propose a QoS oriented distributed routing protocol (QOD) for hybrid networks to provide QoS services in a highly dynamic scenario. Taking advantage of the unique features of hybrid networks, i.e., anycast transmission and short transmission hops, QOD transforms the packet routing problem to a packet scheduling problem. In QOD, a source node directly transmits packets to an AP if the direct transmission can guarantee the QoS of the traffic. Otherwise, the source node schedules the packets to a number of qualified neighbor nodes.

Specifically, QOD incorporates five algorithms. The QoS-guaranteed neighbor selection algorithm chooses qualified neighbors for packet forwarding. The distributed packet scheduling algorithm schedules the packet transmission to further reduce the packet transmission time. The mobility-based packet resizing algorithm resizes packets and assigns smaller packets to nodes with faster mobility to guarantee the routing QoS in a highly mobile environment.

The traffic redundancy elimination-based transmission algorithm can further increase the transmission throughput. The soft-deadline-based forwarding scheduling achieves fairness in packet forwarding scheduling when some packets are not scheduling feasible. Experimental results show that QOD can achieve high mobility resilience, scalability, and contention reduction. In the future, we plan to evaluate the performance of QOD on a real testbed.

CHAPTER 9

REFERENCES:

[1] H. Wu and X. Jia, “QoS Multicast Routing by Using Multiple Paths/Trees in Wireless Ad Hoc Networks,” Ad Hoc Networks, vol. 5, pp. 600-612, 2009.

[2] T. Reddy, I. Karthigeyan, B. Manoj, and C. Murthy, “Quality of Service Provisioning in Ad Hoc Wireless Networks: A Survey of Issues and Solutions,” Ad Hoc Networks, vol. 4, no. 1, pp. 83-124, 2006.

[3] X. Du, “QoS Routing Based on Multi-Class Nodes for Mobile Ad Hoc Networks,” Ad Hoc Networks, vol. 2, pp. 241-254, 2004.

[4] S. Jiang, Y. Liu, Y. Jiang, and Q. Yin, “Provisioning of Adaptability to Variable Topologies for Routing Schemes in MANETs,” IEEE J. Selected Areas in Comm., vol. 22, no. 7, pp. 1347-1356, Sept. 2004.

[5] M. Conti, E. Gregori, and G. Maselli, “Reliable and Efficient Forwarding in Ad Hoc Networks,” Ad Hoc Networks, vol. 4, pp. 398-415, 2006.

[6] G. Chakrabarti and S. Kulkarni, “Load Balancing and Resource Reservation in Mobile Ad Hoc Networks,” Ad Hoc Networks, vol. 4, pp. 186-203, 2006.

[7] Z. Shen and J.P. Thomas, “Security and QoS Self-Optimization in Mobile Ad Hoc Networks,” IEEE Trans. Mobile Computing, vol. 7, pp. 1138-1151, Sept. 2008.

[8] S. Ibrahim, K. Sadek, W. Su, and R. Liu, “Cooperative Communications with Relay-Selection: When to Cooperate and Whom to Cooperate With?” IEEE Trans. Wireless Comm., vol. 7, no. 7, pp. 2814-2827, July 2008.


A Cocktail Approach for Travel Package Recommendation

In this paper, we propose a cocktail approach to generate the lists for personalized travel package recommendation. Furthermore, we extend the TAST model to the tourist-relation-area-season topic (TRAST) model for capturing the latent relationships among the tourists in each travel group. Finally, we evaluate the TAST model, the TRAST model, and the cocktail recommendation approach on real-world travel package data.

The TRAST model can be easily extended to compute relationships among more tourists; however, the computation cost will also go up. To simplify the problem, in this paper we only consider two tourists in a travel group at a time as a tourist pair for mining their relationships. In this TRAST model, all the tourists' travel preferences are represented by relationship distributions.

We can use their relationship distributions as features to cluster them, so as to put them into different travel groups. Thus, in this scenario, many clustering methods can be adopted. Since choosing a clustering algorithm is beyond the scope of this paper, in the experiments we refer to K-means, one of the most popular clustering algorithms.

INTRODUCTION:

As an emerging trend, more and more travel companies provide online services. However, the rapid growth of online travel information imposes an increasing challenge for tourists, who have to choose from a large number of available travel packages to satisfy their personalized needs. Moreover, to increase their profit, travel companies have to understand the preferences of different tourists and serve more attractive packages. Therefore, the demand for intelligent travel services is expected to increase dramatically. Since recommender systems have been successfully applied to enhance the quality of service in a number of fields, it is a natural choice to provide travel package recommendations. Actually, recommendations for tourists have been studied before, and to the best of our knowledge, the first operative tourism recommender system was introduced by Delgado and Davidson. Despite the increasing interest in this field, the problem of leveraging unique features to distinguish personalized travel package recommendations from traditional recommender systems remains fairly open.

Indeed, there are many technical and domain challenges inherent in designing and implementing an effective recommender system for personalized travel package recommendation. First, travel data are much fewer and sparser than traditional items, such as movies for recommendation, because the cost of a trip is much higher than that of watching a movie. Second, every travel package consists of many landscapes (places of interest and attractions) and thus has intrinsically complex spatio-temporal relationships. For example, a travel package only includes landscapes that are geographically colocated. Also, different travel packages are usually developed for different travel seasons.

Therefore, the landscapes in a travel package usually have spatial-temporal autocorrelations. Third, traditional recommender systems usually rely on explicit user ratings. However, for travel data, user ratings are usually not conveniently available. Finally, traditional items for recommendation usually have a long period of stable value, while the values of travel packages can easily depreciate over time, and a package usually only lasts for a certain period of time. The travel companies need to actively create new tour packages to replace the old ones based on the interests of the tourists. To address these challenges, in our preliminary work we proposed a cocktail approach for personalized travel package recommendation. Specifically, we first analyze the key characteristics of the existing travel packages. Along this line, travel time and travel destinations are divided into different seasons and areas. Then, we develop a tourist-area-season topic (TAST) model, which can represent travel packages and tourists by different topic distributions. In the TAST model, the extraction of topics is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes. As a result, the TAST model can well represent the content of the travel packages and the interests of the tourists. Based on this TAST model, a cocktail approach is developed for personalized travel package recommendation by considering some additional factors, including the seasonal behaviors of tourists, the prices of travel packages, and the cold start problem of new packages.

Finally, the experimental results on real-world travel data show that the TAST model can effectively capture the unique characteristics of travel data and the cocktail recommendation approach performs much better than traditional techniques. In this paper, we further study some related topic models of the TAST model, and explain the corresponding travel package recommendation strategies based on them. Also, we propose the tourist-relation-area-season topic (TRAST) model, which helps understand the reasons why tourists form a travel group. This goes beyond personalized package recommendations and is helpful for capturing the latent relationships among the tourists in each travel group.

In addition, we conduct systematic experiments on the real-world data. These experiments not only demonstrate that the TRAST model can be used as an assessment for automatic travel group formation but also provide more insights into the TAST model and the cocktail recommendation approach. In summary, the contributions of the TAST model, the cocktail approaches, and the TRAST model for travel package recommendations are shown in Fig. 1, where each dashed rectangular box in the dashed circle identifies a travel group and the tourists in the same travel group are represented by the same icons.

LITERATURE SURVEY

COST-AWARE TRAVEL TOUR RECOMMENDATION

AUTHOR: Y. Ge et al.,

PUBLISH: Proc. 17th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (SIGKDD ’11), pp. 983-991, 2011.

Recent years have witnessed an increased interest in recommender systems. Despite significant progress in this field, there still remain numerous avenues to explore. Indeed, this paper provides a study of exploiting online travel information for personalized travel package recommendation. A critical challenge along this line is to address the unique characteristics of travel data, which distinguish travel packages from traditional items for recommendation. To that end, in this paper, we first analyze the characteristics of the existing travel packages and develop a tourist-area-season topic (TAST) model. This TAST model can represent travel packages and tourists by different topic distributions, where the topic extraction is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes. Then, based on this topic model representation, we propose a cocktail approach to generate the lists for personalized travel package recommendation. Furthermore, we extend the TAST model to the tourist-relation-area-season topic (TRAST) model for capturing the latent relationships among the tourists in each travel group. Finally, we evaluate the TAST model, the TRAST model, and the cocktail recommendation approach on the real-world travel package data. Experimental results show that the TAST model can effectively capture the unique characteristics of the travel data and the cocktail approach is, thus, much more effective than traditional recommendation techniques for travel package recommendation. Also, by considering tourist relationships, the TRAST model can be used as an effective assessment for travel group formation.

FLDA: MATRIX FACTORIZATION THROUGH LATENT DIRICHLET ALLOCATION

AUTHOR: D. Agarwal and B. Chen

PUBLISH: Proc. Third ACM Int’l Conf. Web Search and Data Mining (WSDM ’10), pp. 91-100, 2010.

We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a “bag-of-words” representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting and web search where items are articles, ads and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. Then, user rating on an item is modeled as user’s affinity to the item’s topics where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors respectively. We show our model is accurate, interpretable and handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explains user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.

MAP-BASED INTERACTION WITH A CONVERSATIONAL MOBILE RECOMMENDER SYSTEM,

AUTHOR: D. Agarwal and B. Chen

PUBLISH: Proc. Second Int’l Conf. Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM ’08), pp. 212-218, 2008.

Recommender systems are information search and decision support tools used when there is an overwhelming set of options to consider or when the user lacks the domain-specific knowledge necessary to take autonomous decisions. They provide users with personalized recommendations adapted to their needs and preferences in a particular usage context. In this paper, we present an approach for integrating recommendation and electronic map technologies to build a map-based conversational mobile recommender system that can effectively and intuitively support users in finding their desired products and services. The results of our real-user study show that integrating map-based visualization and interaction in mobile recommender systems improves the system recommendation effectiveness and increases the user satisfaction.

GENERATING COMPARATIVE DESCRIPTIONS OF PLACES OF INTEREST IN THE TOURISM DOMAIN

AUTHOR: B.D. Carolis, N. Novielli, V.L. Plantamura, and E. Gentile

PUBLISH: Proc. Third ACM Conf. Recommender Systems (RecSys ’09), pp. 277-280, 2009.

When visiting cities as tourists, most of the time people do not make very detailed plans and, when choosing where to go and what to see, they tend to select the area with the greatest number of interesting facilities. Therefore, it would be useful to support the user's choice with contextual information presentation, information clustering, and comparative explanations of places of potential interest in a given area. In this paper we illustrate how My Map, a mobile recommender system in the tourism domain, generates comparative descriptions to support users in making decisions about what to see among relevant objects of interest.

CHAPTER 2

EXISTING SYSTEM:

We first analyze the characteristics of the existing travel packages and develop a tourist-area-season topic (TAST) model, which can represent travel packages and tourists by different topic distributions. In the TAST model, the extraction of topics is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes.

As a result, the TAST model can well represent the content of the travel packages and the interests of the tourists. Based on this TAST model, a cocktail approach is developed for personalized travel package recommendation by considering some additional factors including the seasonal behaviors of tourists, the prices of travel packages, and the cold start problem of new packages. Finally, the experimental results on real-world travel data show that the TAST model can effectively capture the unique characteristics of travel data and the cocktail recommendation approach performs much better than traditional techniques.

DISADVANTAGES:

  • The previous cocktail recommendation approach (Cocktail) is mainly based on the TAST model and the collaborative filtering method.
  • Indeed, another possible cocktail approach is the content-based cocktail, and in the following, we call this method TAST Content.
  • The main difference between TAST Content and Cocktail is that in TAST Content the content similarity between packages and tourists is used for ranking packages instead of the collaborative filtering interests of the tourists; thus, it may also suffer from the overspecialization problem.


PROPOSED SYSTEM:

We propose the tourist-relation-area-season topic (TRAST) model, which helps understand the reasons why tourists form a travel group. This goes beyond personalized package recommendations and is helpful for capturing the latent relationships among the tourists in each travel group. In addition, we conduct systematic experiments on the real-world data. These experiments not only demonstrate that the TRAST model can be used as an assessment for automatic travel group formation but also provide more insights into the TAST model and the cocktail recommendation approach.

Our contributions, including the cocktail approaches and the TRAST model for travel package recommendations, are summarized in Fig. 1, where each dashed rectangular box in the dashed circle identifies a travel group and the tourists in the same travel group are represented by the same icons. In the TRAST model, all the tourists' travel preferences are represented by relationship distributions. For a set of tourists who want to travel with the same package, we can use their relationship distributions as features to cluster them, so as to put them into different travel groups. Thus, in this scenario, many clustering methods can be adopted. Since choosing a clustering algorithm is beyond the scope of this paper, in the experiments we refer to K-means, one of the most popular clustering algorithms.

ADVANTAGES:

  • The results of the season splitting and price segmentation,
  • The understanding of the extracted topics,
  • A recommendation performance comparison between Cocktail and benchmark methods,
  • The evaluation of the TRAST model, and
  • A brief discussion on recommendations for travel groups.

HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENTS:

  • Processor          –    Pentium IV
  • Speed              –    1.1 GHz
  • RAM                –    256 MB (minimum)
  • Hard Disk          –    20 GB
  • Floppy Drive       –    1.44 MB
  • Keyboard           –    Standard Windows keyboard
  • Mouse              –    Two- or three-button mouse
  • Monitor            –    SVGA

 

2.3.2 SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP
  • Front End                                :           JAVA JDK 1.7/JSP
  • Back End                                :           MYSQL Server
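
Given the Java/MySQL stack listed above, the back end would typically be reached through JDBC. The snippet below is only a generic illustration; the connection URL, schema name, table, and credentials are placeholders, not values taken from this project.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Generic JDBC access to a MySQL back end; URL, credentials, and table are placeholders.
public class DbCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/travel_db";   // hypothetical schema name
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM packages")) {
            if (rs.next()) {
                System.out.println("rows in packages table: " + rs.getInt(1));
            }
        }
    }
}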

CHAPTER 3

3.0 SYSTEM DESIGN:

SYSTEM DESIGN

SYSTEM ARCHITECTURE:

ARCHITECTURE DIAGRAM:

CHAPTER 4

DATA FLOW DIAGRAM:

  1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system.
  2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.
  3. DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  4. DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created, by the Object Management Group.

The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing, constructing, and documenting the artifacts of a software system, as well as for business modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.

The UML is a very important part of developing object-oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects.

GOALS:

          The Primary goals in the design of the UML are as follows:

  1. Provide users with a ready-to-use, expressive visual modeling language so that they can develop and exchange meaningful models.
  2. Provide extensibility and specialization mechanisms to extend the core concepts.
  3. Be independent of particular programming languages and development processes.
  4. Provide a formal basis for understanding the modeling language.
  5. Encourage the growth of the OO tools market.
  6. Support higher-level development concepts such as collaborations, frameworks, patterns, and components.
  7. Integrate best practices.

USE CASE DIAGRAM:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical overview of the functionality provided by a system in terms of actors, their goals (represented as use cases), and any dependencies between those use cases. The main purpose of a use case diagram is to show what system functions are performed for which actor. Roles of the actors in the system can be depicted.

CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system’s classes, their attributes, operations (or methods), and the relationships among the classes. It explains which class contains information.

SEQUENCE DIAGRAM:

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.

ACTIVITY DIAGRAM:

Activity diagrams are graphical representations of workflows of stepwise activities and actions with support for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams can be used to describe the business and operational step-by-step workflows of components in a system. An activity diagram shows the overall flow of control.

MODULES:

COCKTAIL APPROACH

TRAVEL PACKAGE

TRAST MODEL

RECOMMENDATIONS

MODULES DESCRIPTION:

COCKTAIL APPROACH:

Our cocktail approach addresses the specific preferences of the tourist while he/she is planning a trip, such as the transportation preference. In other words, we focus on designing the recommendation algorithm to attract tourists before they make a travel decision rather than providing travel support in the on-tour stage. Thus, our approach may only be useful in some situations (e.g., email marketing). Also, if we want to deploy this work for real-world services, we should note that a tourist may pass over a package located at the top of the recommendation list due to a lack of interest in that package, and the relevant (desirable) travel packages in the test set may be just a small fraction of the entire set of packages that are actually of interest to each tourist.

TRAVEL PACKAGE:

We aim to make personalized travel package recommendations for the tourists. Thus, the users are the tourists and the items are the existing packages, and we exploit a real-world travel data set provided by a travel company, in which each tourist has traveled at least two different packages.

First, it is very sparse, and each tourist has only a few travel records. The extreme sparseness of the data leads to difficulties for using traditional recommendation techniques, such as collaborative filtering. For example, it is hard to find the credible nearest neighbors for the tourists because there are very few cotraveling packages.

Second, the travel data has strong time dependence. The travel packages often have a life cycle along with the change to the business demand, i.e., they only last for a certain period. In contrast, most of the landscapes will still be active after the original package has been discarded. These landscapes can be used to form new packages together with some other landscapes. Thus, we can observe that the landscapes are more sustainable and important than the package itself.

Third, a landscape has some intrinsic features, such as its geographic location and the right travel seasons. Only the landscapes with similar spatial-temporal features are suitable for the same packages; that is, the landscapes in one package have spatial-temporal autocorrelations and follow the first law of geography: everything is related to everything else, but nearby things are more related than distant things.

Fourth, the tourists will consider both time and financial costs before they accept a package. This is quite different from the traditional recommendations where the cost of an item is usually not a concern. Thus, it is very important to profile the tourists based on their interests as well as the time and the money they can afford. Since the package with a higher price often tends to have more time and vice versa, in this paper we only take the price factor into consideration.

Fifth, people often travel with their friends, family, or colleagues. Even when two tourists in the same travel group are totally strangers, there must be some reasons for the travel company to put them together. For instance, they may be of the same age or have the same travel schedule. Hence, it is also very important to understand the relationships among the tourists in the same travel group. This understanding can help to form the travel group. Last but not least, few tourist ratings are available for travel packages. However, we can see that every choice of a travel package indicates the strong interest of the tourist in the content provided in the package.

TRAST MODEL:

In the TRAST model, all the tourists' travel preferences are represented by relationship distributions. For a set of tourists who want to travel with the same package, we can use their relationship distributions as features to cluster them, so as to put them into different travel groups. Thus, in this scenario, many clustering methods can be adopted. Since choosing a clustering algorithm is beyond the scope of this paper, in the experiments we refer to K-means, one of the most popular clustering algorithms. Thus, the TRAST model can be used as an assessment for automatic travel group formation. Indeed, in real applications, when generating a travel group, some additional external constraints, such as the tourists' travel date requirements and the travel company's travel group schedule, should also be considered along with the latent relationships.
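
As a rough illustration (not the paper's implementation), the Java sketch below clusters tourists by their relationship distributions with a plain K-means; the vector length, the number of groups, and the use of Euclidean distance are assumptions made for the example.

import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: cluster tourists into travel groups using K-means
// over their relationship distributions (one probability vector per tourist).
class TouristGrouping {

    static int[] kmeans(double[][] points, int k, int iterations, long seed) {
        Random rnd = new Random(seed);
        int n = points.length, dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[rnd.nextInt(n)].clone();   // random initial centroids
        }
        int[] assignment = new int[n];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: attach each tourist to the nearest centroid.
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = squaredDistance(points[i], centroids[c]);
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                assignment[i] = best;
            }
            // Update step: recompute each centroid as the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static void main(String[] args) {
        // Four tourists, each described by a 3-dimensional relationship distribution.
        double[][] relationshipDistributions = {
                {0.8, 0.1, 0.1}, {0.7, 0.2, 0.1}, {0.1, 0.1, 0.8}, {0.2, 0.1, 0.7}
        };
        System.out.println(Arrays.toString(
                kmeans(relationshipDistributions, 2, 10, 42L)));
    }
}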

In the TRAST model, the purchases of the tourists in each travel group are summed up as one single expense record, and thus it has a more complex generative process. We can understand this process through a simple example. Assume that two selected tourists in a travel group are u1 and u2, who are young and dating each other. Now, they decide to travel in winter (Sd) and the destination area is North America (Ad). To generate a travel landscape (l), we first draw a relationship (r, e.g., lover), and then find a topic (t) for lovers traveling in the winter (e.g., skiing). Finally, based on this skiing topic and the selected travel area (e.g., Northeast America), we draw a landscape (e.g., Stowe, Vermont).
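
The generative story above can be sketched as a chain of categorical draws. The following Java fragment is only a toy illustration with made-up labels and probabilities; the real TRAST model learns these distributions from data rather than hard-coding them.

import java.util.Random;

// Toy sketch of the TRAST-style generative chain for one landscape:
// draw a relationship, then a topic given the relationship and season,
// then a landscape given the topic and area. All distributions are fabricated.
class TrastToyGenerator {
    static final Random RND = new Random(7);

    static int sampleCategorical(double[] probs) {
        double u = RND.nextDouble(), cumulative = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (u <= cumulative) return i;
        }
        return probs.length - 1;
    }

    public static void main(String[] args) {
        String[] relationships = {"lover", "friend", "family"};
        String[] topics = {"skiing", "museum", "beach"};
        String[] landscapes = {"Stowe, Vermont", "New York City", "Miami Beach"};

        int r = sampleCategorical(new double[]{0.6, 0.3, 0.1});      // relationship
        int t = sampleCategorical(new double[]{0.7, 0.2, 0.1});      // topic given (r, winter)
        int l = sampleCategorical(new double[]{0.8, 0.15, 0.05});    // landscape given (t, area)

        System.out.println(relationships[r] + " -> " + topics[t] + " -> " + landscapes[l]);
    }
}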

RECOMMENDATIONS:

The evaluations in previous sections are mainly focused on the individual (personalized) recommendations. Since there are tourists who frequently travel together, it is interesting to know whether the latent variables (e.g., the topics of each individual tourist and the relationships of a travel group) as well as the cocktail approaches are useful for making recommendations to a group of tourists. To this end, we performed an experimental study on group recommendations.

We adopt the widely used degree of agreement (DOA) and Top-K [23] as the evaluation metrics. Also, a simple user study was conducted and volunteers were invited to rate the recommendations. For comparison, we recorded the best performance of each algorithm by tuning their parameters, and we also set some general rules for fair comparison. For instance, for collaborative filtering-based methods, we usually consider the contribution of the nearest neighbors with similarity values larger than 0.
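
For readers unfamiliar with Top-K style evaluation, the small Java helper below computes a simple hit ratio: the fraction of tourists whose actually purchased package appears in their top-K recommendation list. This is a simplified stand-in for the DOA and Top-K metrics used in the paper, and all names in it are hypothetical.

import java.util.List;
import java.util.Map;

// Simplified Top-K hit-ratio evaluation: for each tourist, check whether the
// package he/she actually bought appears among the first K recommended packages.
class TopKEvaluator {
    static double hitRatioAtK(Map<String, List<String>> rankedRecommendations,
                              Map<String, String> purchasedPackage,
                              int k) {
        int hits = 0, evaluated = 0;
        for (Map.Entry<String, String> entry : purchasedPackage.entrySet()) {
            List<String> ranked = rankedRecommendations.get(entry.getKey());
            if (ranked == null) continue;          // tourist has no recommendation list
            evaluated++;
            int limit = Math.min(k, ranked.size());
            if (ranked.subList(0, limit).contains(entry.getValue())) hits++;
        }
        return evaluated == 0 ? 0.0 : (double) hits / evaluated;
    }
}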


CHAPTER 6

6.0 SOFTWARE DESCRIPTION:

 

6.1 JAVA TECHNOLOGY:

Java technology is both a programming language and a platform.

 

The Java Programming Language

 

The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

  • Simple
  • Architecture neutral
  • Object oriented
  • Portable
  • Distributed
  • High performance
  • Interpreted
  • Multithreaded
  • Robust
  • Dynamic
  • Secure

With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, first you translate a program into an intermediate language called Java byte codes —the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java byte code instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java byte codes help make “write once, run anywhere” possible. You can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or on an iMac.
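As a concrete illustration of this compile-once, interpret-anywhere model, here is a minimal sketch; the class name and file name are chosen purely for illustration and are not part of this project:

// HelloJava.java -- illustrative only.
//
// Compile once into platform-independent byte codes:
//     javac HelloJava.java        (produces HelloJava.class)
// Interpret the byte codes on any Java VM:
//     java HelloJava
public class HelloJava {
    public static void main(String[] args) {
        // The same .class file runs unchanged on Windows, Solaris, or a Mac,
        // as long as a Java VM is installed on that machine.
        System.out.println("Compiled once, interpreted anywhere.");
    }
}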

6.2 THE JAVA PLATFORM:

A platform is the hardware or software environment in which a program runs. We’ve already mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it’s a software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

  • The Java Virtual Machine (Java VM)
  • The Java Application Programming Interface (Java API)

You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, “What Can Java Technology Do?”, highlights the functionality that some of the packages in the Java API provide.

The following figure depicts a program that’s running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, once compiled, runs only on the specific hardware platform it was compiled for. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.

 

6.3 WHAT CAN JAVA TECHNOLOGY DO?

The most common types of programs written in the Java programming language are applets and applications. If you’ve surfed the Web, you’re probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs.

An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet.

A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server.

How does the API support all these kinds of programs? It does so with packages of software components that provides a wide range of functionality. Every full implementation of the Java platform gives you the following features:

  • The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
  • Applets: The set of conventions used by applets.
  • Networking: URLs, TCP (Transmission Control Protocol), UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
  • Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
  • Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
  • Software components: Known as JavaBeansTM, can plug into existing component architectures.
  • Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
  • Java Database Connectivity (JDBCTM): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

6.4 HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?

We can’t promise you fame, fortune, or even a job if you learn the Java programming language. Still, it is likely to make your programs better and to require less effort than other languages. We believe that Java technology will help you do the following:

  • Get started quickly: Although the Java programming language is a powerful object-oriented language, it’s easy to learn, especially for programmers already familiar with C or C++.
  • Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.
  • Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people’s tested code and introduce fewer bugs.
  • Develop programs more quickly: Your development time may be roughly half of what it would take to write the same program in C++, because you write fewer lines of code in a simpler programming language.
  • Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure JavaTM Product Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.
  • Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
  • Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded “on the fly,” without recompiling the entire program.

6.5 ODBC:

 Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application developers and database systems providers. Before ODBC became a de facto standard for Windows programs to interface with database systems, programmers had to use proprietary languages for each database they wanted to connect to. Now, ODBC has made the choice of the database system almost irrelevant from a coding perspective, which is as it should be. Application developers have much more important things to worry about than the syntax that is needed to port their program from one database to another when business needs suddenly change.

Through the ODBC Administrator in Control Panel, you can specify the particular database that is associated with a data source that an ODBC application program is written to use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a particular database. For example, the data source named Sales Figures might be a SQL Server database, whereas the Accounts Payable data source could refer to an Access database. The physical database referred to by a data source can reside anywhere on the LAN.

The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you setup a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program and each maintains a separate list of ODBC data sources.

From a programming perspective, the beauty of ODBC is that the application can be written to use the same set of function calls to interface with any data source, regardless of the database vendor. The source code of the application doesn’t change whether it talks to Oracle or SQL Server. We only mention these two as an example. There are ODBC drivers available for several dozen popular database systems. Even Excel spreadsheets and plain text files can be turned into data sources. The operating system uses the Registry information written by ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the data source (such as the interface to Oracle or SQL Server). The loading of the ODBC drivers is transparent to the ODBC application program. In a client/server environment, the ODBC API even handles many of the network issues for the application programmer.

The advantages of this scheme are so numerous that you are probably thinking there must be some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to the native database interface. ODBC has had many detractors make the charge that it is too slow. Microsoft has always claimed that the critical factor in performance is the quality of the driver software that is used. In our humble opinion, this is true. The availability of good ODBC drivers has improved a great deal recently. And anyway, the criticism about performance is somewhat analogous to those who said that compilers would never match the speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write cleaner programs, which means you finish sooner. Meanwhile, computers get faster every year.

6.6 JDBC:

In an effort to set an independent database standard API for Java; Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of “plug-in” database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he or she must provide the driver for each platform that the database and Java run on.

To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you discovered earlier in this chapter, ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster than developing a completely new connectivity solution.

JDBC was announced in March of 1996. It was released for a 90 day public review that ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.

The remainder of this section will cover enough information about JDBC for you to know what it is about and how to use it effectively. This is by no means a complete overview of JDBC. That would fill an entire book.

 

6.7 JDBC Goals:

Few software packages are designed without goals in mind. JDBC is no exception: its many goals drove the development of the API. These goals, in conjunction with early reviewer feedback, have finalized the JDBC class library into a solid framework for building database applications in Java.

The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although not the lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be created. Conversely, it is at a high enough level for application programmers to use it confidently. Attaining this goal allows for future tool vendors to “generate” JDBC code and to hide many of JDBC’s complexities from the end user.

SQL Conformance

SQL syntax varies as you move from database vendor to database vendor. In an effort to support a wide variety of vendors, JDBC will allow any query statement to be passed through it to the underlying database driver. This allows the connectivity module to handle non-standard functionality in a manner that is suitable for its users.

JDBC must be implementable on top of common database interfaces

The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This interface would translate JDBC calls to ODBC and vice versa.

Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

Keep it simple

This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that the design of JDBC should be very simple, allowing for only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

Use strong, static typing wherever possible

Strong typing allows more error checking to be done at compile time; as a result, fewer errors appear at runtime.

Keep the common cases simple

Because more often than not, the usual SQL calls used by the programmer are simple SELECT’s, INSERT’s, DELETE’s and UPDATE’s, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.
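To make the common case concrete, the following sketch shows a simple SELECT issued through JDBC. The data source name, credentials, table, and column names are placeholders invented for this illustration, not values taken from this project:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Open a connection, run one parameterized SELECT, and print the rows.
public class SimpleJdbcQuery {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:odbc:SalesFigures";          // hypothetical ODBC data source
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id, name FROM customers WHERE id = ?")) {
            ps.setInt(1, 42);                            // bind the query parameter
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}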

Finally, we decided to proceed with the implementation using Java Networking.

For dynamically updating the cache table, we use an MS Access database.

Java has two things: a programming language and a platform.

Java is a high-level programming language that is all of the following

Simple                                     Architecture-neutral

Object-oriented                       Portable

Distributed                              High-performance

Interpreted                              Multithreaded

Robust                                     Dynamic

Secure

Java is also unusual in that each Java program is both compiled and interpreted. With the compiler, you translate a Java program into an intermediate language called Java byte codes; these platform-independent code instructions are then parsed and run on the computer by the Java interpreter.

Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

7.7 NETWORKING TCP/IP STACK:

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless protocol.

IP datagram’s:

The IP layer provides a connectionless and unreliable delivery system. It considers each datagram independently of the others. Any association between datagrams must be supplied by the higher layers. The IP layer supplies a checksum that includes its own header. The header includes the source and destination addresses. The IP layer handles routing through an Internet. It is also responsible for breaking up large datagrams into smaller ones for transmission and reassembling them at the other end.

UDP:

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of the datagram and port numbers. These are used to give a client/server model – see later.
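A short sketch of the sending side in Java follows; the host name and port number are placeholders, and delivery is not guaranteed, which is exactly the UDP behaviour described above:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Send a single datagram to a UDP port; there is no connection and no acknowledgement.
public class UdpSendDemo {
    public static void main(String[] args) throws Exception {
        byte[] payload = "sensor-reading:42".getBytes("UTF-8");
        InetAddress host = InetAddress.getByName("localhost");   // placeholder destination
        try (DatagramSocket socket = new DatagramSocket()) {
            DatagramPacket packet = new DatagramPacket(payload, payload.length, host, 4000);
            socket.send(packet);    // may be lost, duplicated, or reordered in transit
        }
    }
}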

TCP:

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a virtual circuit that two processes can use to communicate.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32 bit integer which gives the IP address.

Network address:

Class A uses 8 bits for the network address, with 24 bits left over for host addressing. Class B uses 16-bit network addressing. Class C uses 24-bit network addressing, and class D addresses are reserved for multicast and are not split into network and host parts.

Subnet address:

Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address:

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Total address:

The 32 bit address is usually written as 4 integers separated by dots.
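As a small worked example of the classful split described above, the sketch below (Java, for illustration only) packs a dotted-quad address into a 32-bit integer and extracts the class A network and host parts; the address 10.1.2.3 is an arbitrary example:

// Pack four octets into a 32-bit address and split it with the class A 8/24 rule.
public class ClassfulAddress {
    public static void main(String[] args) {
        int[] octets = {10, 1, 2, 3};      // example class A address 10.1.2.3

        int address = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3];

        int network = (address >>> 24) & 0xFF;    // first 8 bits  -> network part (10)
        int host    = address & 0x00FFFFFF;       // last 24 bits  -> host part (0x010203)

        System.out.printf("network = %d, host = 0x%06X%n", network, host);
    }
}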

Port addresses

A service exists on a host, and is identified by its port. This is a 16 bit number. To send a message to a server, you send it to the port for that service of the host that it is running on. This is not location transparency! Certain of these ports are “well known”.

Sockets:

A socket is a data structure maintained by the system to handle network connections. A socket is created using the socket call. It returns an integer that is like a file descriptor. In fact, under Windows, this handle can be used with the ReadFile and WriteFile functions.

#include <sys/types.h>
#include <sys/socket.h>
int socket(int family, int type, int protocol);

Here “family” will be AF_INET for IP communications, protocol will be zero, and type will depend on whether TCP or UDP is used. Two processes wishing to communicate over a network create a socket each. These are similar to two ends of a pipe – but the actual pipe does not yet exist.
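Since the implementation described earlier uses Java Networking, a minimal Java counterpart of the two communicating processes is sketched below; the port number and host name are placeholders chosen for illustration:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal TCP echo pair: start with the argument "server" to listen, or with no
// argument to connect. Port 5000 and "localhost" are illustrative placeholders.
public class EchoDemo {
    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("server")) {
            try (ServerSocket listener = new ServerSocket(5000);
                 Socket client = listener.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                out.println("echo: " + in.readLine());    // send one line back to the client
            }
        } else {
            try (Socket socket = new Socket("localhost", 5000);
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
                out.println("hello");
                System.out.println(in.readLine());        // prints "echo: hello"
            }
        }
    }
}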

6.8 JFREE CHART:

JFreeChart is a free 100% Java chart library that makes it easy for developers to display professional quality charts in their applications. JFreeChart’s extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, and targets both server-side and client-side applications;

Support for many output types, including Swing components, image files (including PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is “open source” or, more specifically, free software. It is distributed under the terms of the GNU Lesser General Public Licence (LGPL), which permits use in proprietary applications.
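As a brief illustration of the API, the sketch below builds a small pie chart and saves it as a PNG file. It assumes JFreeChart 1.0.x is on the classpath; the dataset values and file name are invented for the example:

import java.io.File;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.data.general.DefaultPieDataset;

// Build a simple pie chart and write it to chart.png.
public class PieChartDemo {
    public static void main(String[] args) throws Exception {
        DefaultPieDataset dataset = new DefaultPieDataset();
        dataset.setValue("Authenticated messages", 70);   // illustrative values only
        dataset.setValue("Dropped messages", 30);

        JFreeChart chart = ChartFactory.createPieChart(
                "Message Authentication Results",         // chart title
                dataset,                                   // data
                true,                                      // legend
                true,                                      // tooltips
                false);                                    // no URLs

        ChartUtilities.saveChartAsPNG(new File("chart.png"), chart, 500, 400);
    }
}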

6.8.1. Map Visualizations:

Charts showing values that relate to geographical areas. Some examples include: (a) population density in each state of the United States, (b) income per capita for each country in Europe, (c) life expectancy in each country of the world. The tasks in this project include: Sourcing freely redistributable vector outlines for the countries of the world, states/provinces in particular countries (USA in particular, but also other areas);

Creating an appropriate dataset interface (plus default implementation) and a renderer, and integrating these with the existing XYPlot class in JFreeChart; testing, documenting, testing some more, documenting some more.

6.8.2. Time Series Chart Interactivity

Implement a new (to JFreeChart) feature for interactive time series charts — to display a separate control that shows a small version of ALL the time series data, with a sliding “view” rectangle that allows you to select the subset of the time series data to display in the main chart.

 

6.8.3. Dashboards

There is currently a lot of interest in dashboard displays. Create a flexible dashboard mechanism that supports a subset of JFreeChart chart types (dials, pies, thermometers, bars, and lines/time series) that can be delivered easily via both Java Web Start and an applet.

 

6.8.4. Property Editors

The property editor mechanism in JFreeChart only handles a small subset of the properties that can be set for charts. Extend (or reimplement) this mechanism to provide greater end-user control over the appearance of the charts.

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

CONCLUSION:

We presented a study on personalized travel package recommendation. Specifically, we first analyzed the unique characteristics of travel packages and developed the TAST model, a Bayesian network for travel package and tourist representation. The TAST model can discover the interests of the tourists and extract the spatial-temporal correlations among landscapes. Then, we exploited the TAST model for developing a cocktail approach on personalized travel package recommendation. This cocktail approach follows a hybrid recommendation strategy and has the ability to combine several constraints existing in the real-world scenario.

Furthermore, we extended the TAST model to the TRAST model, which can capture the relationships among tourists in each travel group. Finally, an empirical study was conducted on real-world travel data. Experimental results demonstrate that the TAST model can capture the unique characteristics of the travel packages, the cocktail approach can lead to better performances of travel package recommendation, and the TRAST model can be used as an effective assessment for travel group automatic formation. We hope these encouraging results could lead to many future works.


PSMPA: Patient Self-Controllable and Multi-Level Privacy-Preserving Cooperative Authentication in Distributed m-Healthcare Cloud Computing Systems

The distributed m-healthcare cloud computing system considerably facilitates secure and efficient patient treatment for medical consultation by sharing personal health information among healthcare providers. However, it also brings about the challenge of keeping both data confidentiality and patients’ identity privacy simultaneously, and many existing access control and anonymous authentication schemes cannot be straightforwardly exploited. To solve this problem, a novel authorized accessible privacy model (AAPM) is established. Patients can authorize physicians by setting an access tree supporting flexible threshold predicates.

Based on our new technique of attribute-based designated verifier signatures, a patient self-controllable multi-level privacy-preserving cooperative authentication scheme (PSMPA) is proposed, realizing three levels of security and privacy requirements in the distributed m-healthcare cloud computing system. The directly authorized physicians, the indirectly authorized physicians, and the unauthorized persons in medical consultation can respectively decipher the personal health information and/or verify patients’ identities by satisfying the access tree with their own attribute sets.

1.2 INTRODUCTION:

Distributed m-healthcare cloud computing systems have been increasingly adopted worldwide, including in the European Commission activities, the US Health Insurance Portability and Accountability Act (HIPAA), and many other government programs, for efficient and high-quality medical treatment. In m-healthcare social networks, personal health information is always shared among the patients located in respective social communities suffering from the same disease for mutual support, and across distributed healthcare providers (HPs) equipped with their own cloud servers for medical consultation. However, this also brings about a series of challenges, especially how to secure the patients’ personal health information against various attacks in the wireless communication channel, such as eavesdropping and tampering. As to the security facet, one of the main issues is access control of patients’ personal health information, namely that only the authorized physicians or institutions can recover the patients’ personal health information during data sharing in the distributed m-healthcare cloud computing system. In practice, most patients are concerned about the confidentiality of their personal health information, since any kind of unauthorized collection and disclosure is likely to get them into trouble.

Therefore, in distributed m-healthcare cloud computing systems, which part of the patients’ personal health information should be shared and which physicians it should be shared with have become two intractable problems demanding urgent solutions, and various research results focusing on them have emerged. A fine-grained distributed data access control scheme has been proposed using the technique of attribute based encryption (ABE). A rendezvous-based access control method provides access privilege if and only if the patient and the physician meet in the physical world. Recently, a patient-centric and fine-grained data access control scheme in multi-owner settings was constructed for securing personal health records in cloud computing. However, it mainly focuses on the central cloud computing system, which is not sufficient for efficiently processing the increasing volume of personal health information in the m-healthcare cloud computing system.

Moreover, it is not enough to only guarantee the data confidentiality of the patient’s personal health information in the honest-but-curious cloud server model, since the frequent communication between a patient and a professional physician can lead the adversary to conclude that the patient is suffering from a specific disease with high probability. Unfortunately, the problem of how to protect both the patients’ data confidentiality and identity privacy in the distributed m-healthcare cloud computing scenario under the malicious model was left untouched.

In this paper, we consider simultaneously achieving data confidentiality and identity privacy with high efficiency. As described in Fig. 1, in distributed m-healthcare cloud computing systems, all the members can be classified into three categories. The directly authorized physicians (green labels) in the local healthcare provider are authorized by the patients and can both access the patient’s personal health information and verify the patient’s identity. The indirectly authorized physicians (yellow labels) in the remote healthcare providers are authorized by the directly authorized physicians for medical consultation or research purposes (since they are not authorized by the patients, we use the term ‘indirectly authorized’ instead); they can only access the personal health information, but not the patient’s identity. For the unauthorized persons (red labels), nothing can be obtained. These levels are realized by extending the techniques of attribute-based access control and designated verifier signatures (DVS) to de-identified health information.

1.3 LITERATURE SURVEY

SECURING PERSONAL HEALTH RECORDS IN CLOUD COMPUTING: PATIENT-CENTRIC AND FINE-GRAINED DATA ACCESS CONTROL IN MULTI-OWNER SETTINGS

AUTHOR: M. Li, S. Yu, K. Ren, and W. Lou

PUBLISH: Proc. 6th Int. ICST Conf. Security Privacy Comm. Netw., 2010, pp. 89–106.

EXPLANATION:

Online personal health record (PHR) systems enable patients to manage their own medical records in a centralized way, which greatly facilitates the storage, access and sharing of personal health data. With the emergence of cloud computing, it is attractive for PHR service providers to shift their PHR applications and storage into the cloud, in order to enjoy the elastic resources and reduce the operational cost. However, by storing PHRs in the cloud, the patients lose physical control over their personal health data, which makes it necessary for each patient to encrypt her PHR data before uploading it to the cloud servers. Under encryption, it is challenging to achieve fine-grained access control to PHR data in a scalable and efficient way. For each patient, the PHR data should be encrypted so that it is scalable with the number of users having access. Also, since there are multiple owners (patients) in a PHR system and every owner would encrypt her PHR files using a different set of cryptographic keys, it is important to reduce the key distribution complexity in such multi-owner settings. Existing cryptographically enforced access control schemes are mostly designed for single-owner scenarios. In this paper, we propose a novel framework for access control to PHRs within a cloud computing environment. To enable fine-grained and scalable access control for PHRs, we leverage attribute based encryption (ABE) techniques to encrypt each patient’s PHR data. To reduce the key distribution complexity, we divide the system into multiple security domains, where each domain manages only a subset of the users. In this way, each patient has full control over her own privacy, and the key management complexity is reduced dramatically.

PRIVACY AND EMERGENCY RESPONSE IN E-HEALTHCARE LEVERAGING WIRELESS BODY SENSOR NETWORKS

AUTHOR: J. Sun, Y. Fang, and X. Zhu

PUBLISH: IEEE Wireless Commun., vol. 17, no. 1, pp. 66–73, Feb. 2010.

EXPLANATION:

Electronic healthcare is becoming a vital part of our living environment and exhibits advantages over paper-based legacy systems. Privacy is the foremost concern of patients and the biggest impediment to e-healthcare deployment. In addressing privacy issues, conflicts from the functional requirements must be taken into account. One such requirement is efficient and effective response to medical emergencies. In this article, we provide detailed discussions on the privacy and security issues in e-healthcare systems and viable techniques for these issues. Furthermore, we demonstrate the design challenge in the fulfillment of conflicting goals through an exemplary scenario, where the wireless body sensor network is leveraged, and a sound solution is proposed to overcome the conflict.

HCPP: CRYPTOGRAPHY BASED SECURE EHR SYSTEM FOR PATIENT PRIVACY AND EMERGENCY HEALTHCARE

AUTHOR: J. Sun, X. Zhu, C. Zhang, and Y. Fang

PUBLISH: Proc. 31st Int. Conf. Distrib. Comput. Syst., 2011, pp. 373–382.

EXPLANATION:

Privacy concern is arguably the major barrier that hinders the deployment of electronic health record (EHR) systems, which are considered more efficient, less error-prone, and of higher availability compared to traditional paper record systems. Patients are unwilling to accept the EHR system unless their protected health information (PHI) containing highly confidential data is guaranteed proper use and disclosure, which cannot be easily achieved without patients’ control over their own PHI. However, caution must be taken to handle emergencies in which the patient may be physically incapable of retrieving the controlled PHI for emergency treatment. In this paper, we propose a secure EHR system, HCPP (Healthcare system for Patient Privacy), based on cryptographic constructions and existing wireless network infrastructures, to provide privacy protection to patients under any circumstances while enabling timely PHI retrieval for life-saving treatment in emergency situations. Furthermore, our HCPP system restricts PHI access to authorized (not arbitrary) physicians, who can be traced and held accountable if the accessed PHI is found improperly disclosed. Last but not least, HCPP leverages wireless network access to support efficient and private storage/retrieval of PHI, which underlies a secure and feasible EHR system.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

In the existing system, data confidentiality is important, but the existing framework only guarantees the data confidentiality of the patient’s personal health information in the honest-but-curious cloud server model. This is not enough, since the frequent communication between a patient and a professional physician can lead the adversary to conclude that the patient is suffering from a specific disease with high probability. Unfortunately, the problem of how to protect both the patients’ data confidentiality and identity privacy in the distributed m-healthcare cloud computing scenario under the malicious model was left untouched.

Patients are unwilling to accept the EHR system unless their protected health information (PHI) containing highly confidential data is guaranteed proper use and disclosure, which cannot be easily achieved without patients’ control over their own PHI. However, caution must be taken to handle emergencies in which the patient may be physically incapable of retrieving the controlled PHI for emergency treatment. A secure EHR system, HCPP (Healthcare system for Patient Privacy), based on cryptographic constructions and existing wireless network infrastructures, provides privacy protection to patients under any circumstances while enabling timely PHI retrieval for life-saving treatment in emergency situations.

2.1.1 DISADVANTAGES:

Existing applications in the e-healthcare scenario are realized through real-time, continuous vital monitoring to give immediate alerts of changes in patient status. The WBAN operates in environments with open access by various people, such as a hospital or medical organization, which also accommodates attackers. The open wireless channel makes the data prone to being eavesdropped, modified, and injected. Many kinds of security threats exist, such as unauthenticated or unauthorized access, message disclosure, message modification, denial-of-service, node capture and compromised nodes, and routing attacks. Among these, two kinds of threats play the leading role: threats from device compromise and threats from network dynamics.

The problem of security is growing nowadays. In particular, the privacy of communication over the Internet may be at risk of attack in a number of ways. On-line collection, transmission, and processing of personal data pose a severe threat to privacy. Wherever Internet-based services are used on-line, the lack of privacy in network communication is a major public concern. This problem is far more significant in the modern medical environment, as e-healthcare networks are implemented and developed. According to common standards, such networks link general practitioners, hospitals, and social centers at a national or international scale. While running the risk of leaking private data, such networks place patients’ privacy information in great danger.

  • Data confidentiality is low.
  • Data redundancy is high.
  • There is a violation in data security.


2.2 PROPOSED SYSTEM:

We presented a new architecture of pseudonymization for protecting privacy in e-health (PIPE) that integrates pseudonymization of medical data, identity management, and obfuscation of metadata with anonymous authentication to prevent disclosure attacks and statistical analysis, and we suggested a secure mechanism guaranteeing anonymity and privacy both in the transfer of personal health information and in its storage at a central m-healthcare cloud server.

We proposed an anonymous authentication of membership in dynamic groups. However, since the anonymous authentication mentioned above is established on top of a public key infrastructure (PKI), the need for an online certificate authority (CA) and one unique public key encryption for each symmetric key k used for data encryption at the portal of authorized physicians makes the overhead of the construction grow linearly with the size of the group. Furthermore, the anonymity level depends on the size of the anonymity set, making the anonymous authentication impractical in settings where the patients are sparsely distributed.

In this paper, the security and anonymity level of our proposed construction is significantly enhanced by relating it to the underlying Gap Bilinear Diffie-Hellman (GBDH) problem and to the number of patients’ attributes, which deals with the privacy leakage in scenarios where patients are sparsely distributed. More significantly, without knowing which physician in the healthcare provider is professional in treating his illness, the best approach for the patient is to encrypt his own PHI under a specified access policy rather than to assign each physician a secret key. As a result, the authorized physicians whose attribute sets satisfy the access policy can recover the PHI, and the access control management also becomes more efficient.

2.2.1 ADVANTAGES:

Our scheme compares favorably with a patient-centric and fine-grained data access control scheme that uses ABE to secure personal health records in cloud computing without privacy-preserving authentication. To achieve the same functions as PSMPA, that approach would have to be combined with DVS; the computational complexity of PSMPA remains constant regardless of the number of directly authorized physicians and is nearly half that of the combined ABE and DVS construction supporting flexible predicates.

The communication cost of PSMPA also remains constant, almost half that of the combined construction, and is independent of the number of attributes d. Although the storage overhead of PSMPA is slightly higher than that of the combined construction, it is independent of the number of directly authorized physicians and performs significantly better than traditional DVS, whose computational, communication, and storage overheads all increase linearly with the number of directly authorized physicians.

  • M-healthcare system is fully controlled and secured with encryption standards.
  • There is no data loss and data redundancy.
  • System provides full protection for patient’s data and their attributes.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

  • Processor                            –    Pentium IV
  • Speed                                 –    1.1 GHz
  • RAM                                   –    256 MB (minimum)
  • Hard Disk                             –    20 GB
  • Floppy Drive                          –    1.44 MB
  • Keyboard                              –    Standard Windows keyboard
  • Mouse                                 –    Two- or three-button mouse
  • Monitor                               –    SVGA

 

2.3.2 SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP or Win7
  • Front End                                :           Microsoft Visual Studio .NET 2008
  • Script                                       :           C# Script
  • Back End                                :           MS-SQL Server 2005
  • Document                               :           MS-Office 2007


CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

  • The DFD is also called as bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, various processing carried out on these data, and the output data is generated by the system
  • The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system process, the data used by the process, an external entity that interacts with the system and the information flows in the system.
  • DFD shows how the information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • DFD is also known as bubble chart. A DFD may be used to represent a system at any level of abstraction. DFD may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures, or devices that produce data; the physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.


3.1 ARCHITECTURE DIAGRAM

3.2 DATAFLOW DIAGRAM:


UML DIAGRAMS:

3.2 USE CASE DIAGRAM:


3.3 CLASS DIAGRAM:


3.4 SEQUENCE DIAGRAM:


3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

In our implementation, we chose the MIRACL library for simulating cryptographic operations using Microsoft C/C++ compilers, with parameters chosen to achieve security comparable to 1,024-bit RSA according to the standards of the Pairing-Based Crypto Library. For comparison, we consider a patient-centric and fine-grained data access control scheme that uses ABE to secure personal health records in cloud computing [30] without privacy-preserving authentication; to achieve the same functions as PSMPA, it would have to be combined with DVS. The computational complexity of PSMPA remains constant regardless of the number of directly authorized physicians and is nearly half that of the combined ABE and DVS construction supporting flexible predicates. Fig. 5 illustrates that the communication cost of PSMPA also remains constant, almost half that of the combined construction, and is independent of the number of attributes d in vD. Fig. 6 shows that, although the storage overhead of PSMPA is slightly higher than that of the combined construction, it is independent of the number of directly authorized physicians and performs significantly better than traditional DVS, whose computational, communication, and storage overheads all increase linearly with the number of directly authorized physicians. The computational and communication overhead of the combined construction decreases slightly faster than that of PSMPA as the threshold k increases; however, even when k reaches its maximum value d, the overheads are still much higher than those of PSMPA. Comparing our scheme with anonymous authentication based on PKI, the storage, communication, and computational overheads with respect to N and k are identical to those of DVS, since to realize the same identity privacy all of these constructions assign a pair of public and private keys to each directly authorized physician, and the number of signature operations is likewise linear in the number of physicians, independent of the threshold k. The simulation results show that PSMPA better adapts to the distributed m-healthcare cloud computing system than previous schemes, especially in improving the efficiency of energy-constrained mobile devices (the data sinks).

4.1 ALGORITHM

Attribute-Based Designated Verifier Signature Scheme: We propose a patient self-controllable and multi-level privacy-preserving cooperative authentication scheme based on ADVS to realize three levels of security and privacy requirements in the distributed m-healthcare cloud computing system. It mainly consists of the following five algorithms: Setup, Key Extraction, Sign, Verify, and Transcript Simulation Generation. Denote the universe of attributes as U.
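
The skeleton below outlines the shape of these five algorithms as a Java interface. It is illustrative only: the method names, parameter types, and the use of Java (rather than the C/C++ MIRACL-based implementation mentioned in the implementation discussion) are all assumptions made for exposition, not the authors' code.

import java.util.Set;

// Illustrative outline of the five ADVS algorithms named above; all signatures are assumed.
public interface AttributeBasedDVS {

    /** Setup: generate the public parameters and the master secret key. */
    byte[][] setup(int securityParameter);

    /** Key Extraction: derive a private key for a verifier's attribute set. */
    byte[] extractKey(byte[] masterSecretKey, Set<String> attributes);

    /** Sign: the patient signs the PHI under an access tree (threshold predicate). */
    byte[] sign(byte[] patientSecretKey, String accessPolicy, byte[] message);

    /** Verify: a designated verifier whose attributes satisfy the policy checks the signature. */
    boolean verify(byte[] verifierKey, byte[] message, byte[] signature);

    /** Transcript Simulation Generation: the verifier can simulate a valid-looking transcript,
        so the signature convinces no third party (the designated-verifier property). */
    byte[] simulateTranscript(byte[] verifierKey, byte[] message);
}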


4.2 MODULES:

E-HEALTHCARE SYSTEM FRAMEWORK:

AUTHORIZED ACCESSIBLE PRIVACY MODEL:

SECURITY VERIFICATION:

PERFORMANCE EVALUATION:

4.3 MODULE DESCRIPTION:

E-healthcare System Framework:

The e-healthcare system consists of three components: body area networks (BANs), wireless transmission networks, and the healthcare providers equipped with their own cloud servers. The patient’s personal health information is securely transmitted to the healthcare provider for the authorized physicians to access and perform medical treatment. This module illustrates the unique characteristics of distributed m-healthcare cloud computing systems, where all the personal health information can be shared among patients suffering from the same disease for mutual support, or among the authorized physicians in distributed healthcare providers and medical research institutions for medical consultation.

Authorized accessible privacy model:

Multi-level privacy-preserving cooperative authentication is established to allow the patients to authorize corresponding privileges to different kinds of physicians located in distributed healthcare providers by setting an access tree supporting flexible threshold predicates. We propose a novel authorized accessible privacy model for distributed m-healthcare cloud computing systems which consists of the following two components: an attribute based designated verifier signature scheme (ADVS) and the corresponding adversary model.

Security Verification:

The security and anonymity level of our proposed construction is significantly enhanced by associating it with the underlying Gap Bilinear Diffie-Hellman (GBDH) problem and the number of patients’ attributes, which deals with the privacy leakage in scenarios where patients are sparsely distributed. More significantly, without knowing which physician in the healthcare provider is professional in treating his illness, the best approach for the patient is to encrypt his own PHI under a specified access policy rather than to assign each physician a secret key. As a result, the authorized physicians whose attribute sets satisfy the access policy can recover the PHI, and the access control management also becomes more efficient.

Performance Evaluation:

This module evaluates the efficiency of PSMPA in terms of storage overhead, computational complexity, and communication cost, comparing it against a patient-centric and fine-grained data access control scheme that uses ABE to secure personal health records in cloud computing without privacy-preserving authentication. To achieve the same security, our construction performs more efficiently than the traditional designated verifier signature applied to all the directly authorized physicians, whose overheads are linear in the number of directly authorized physicians.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are      

  • ECONOMICAL FEASIBILITY
  • TECHNICAL FEASIBILITY
  • SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:                  

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. Thus the developed system is well within the budget, and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

5.1.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, since this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:  

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to offer constructive criticism, which is welcomed, as he is the final user of the system.

5.2 SYSTEM TESTING:

Testing is a process of checking whether the developed system works according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes a logical assumption that if all the parts of the system are correct, the overall goal will be successfully achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until many months later. This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best program is worthless if it does not produce the correct outputs.

5.2.1 UNIT TESTING:

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile and process test data correctly and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logic. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.

UNIT TESTING:

Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.

Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.3 FUNCTIONAL TESTING:

Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.

FUNCTIONAL TESTING:

Description: Test for all modules.
Expected result: All peers should communicate in the group.

Description: Test for various peers in a distributed network framework, as it displays all users available in the group.
Expected result: The result after execution should give the accurate result.

5.1. 4 NON-FUNCTIONAL TESTING:

Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:

  • Load testing
  • Performance testing
  • Usability testing
  • Reliability testing
  • Security testing


5.1.5 LOAD TESTING:

An important tool for implementing system tests is a load generator. A load generator is essential for testing quality requirements such as performance and stress. A load can be a real load; that is, the system can be put under real usage by having actual telephone users connected to it, and they will generate the test input data for the system test.

Load Testing

Description: It is necessary to ascertain that the application behaves correctly under load when a ‘Server busy’ response is received.
Expected result: The application should designate another active node as the server.

5.1.5 PERFORMANCE TESTING:

Performance tests are utilized in order to determine the widely defined performance of the software system such as execution time associated with various parts of the code, response time and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

PERFORMANCE TESTING:

Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time, and using an acceptable level of resources; it is an aspect of operational management.
Expected result: The application should handle large input values and produce accurate results in the expected time.

5.1.6 RELIABILITY TESTING:

Software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, and this is what is ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. It forms a part of the software quality control effort.

RELIABILITY TESTING:

Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.

5.1.7 SECURITY TESTING:

Security testing evaluates system characteristics that relate to the availability, integrity and confidentiality of the system data and services. Users/Clients should be encouraged to make sure their security needs are very clearly known at requirements time, so that the security issues can be addressed by the designers and testers.

SECURITY TESTING:

Description: Checking that the user identification is authenticated.
Expected result: If authentication fails, the user should not be connected to the framework.

Description: Check whether group keys in a tree are shared by all peers.
Expected result: All peers in the same group should know the group key.

5.1.7 WHITE BOX TESTING:

White box testing, sometimes called glass-box testing, is a test-case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases that exercise the internal structure of the software under test.

5.1.8 WHITE BOX TESTING:

Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.

Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.

Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.

5.1.9 BLACK BOX TESTING:

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors by focusing on the inputs, outputs, and principal functions of a software module. The starting point of black box testing is either a specification or code. The contents of the box are hidden, and the software, when stimulated with test inputs, should produce the desired results.

5.1.10 BLACK BOX TESTING:

Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.

Description: To check for interface errors.
Expected result: The entire interface must function normally.

Description: To check for errors in data structures or external database access.
Expected result: Database updates and retrievals must be performed correctly.

Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.

All of the above system testing strategies are carried out during development, since documentation and institutionalization of the proposed goals and related policies are essential.

CHAPTER 7

7.0 SOFTWARE SPECIFICATION:

7.1 FEATURES OF .NET:

Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET Framework is a language-neutral platform for writing programs that can easily and securely interoperate. There is no language barrier with .NET: there are numerous languages available to the developer, including Managed C++, C#, Visual Basic, and JScript.

The .NET framework provides the foundation for components to interact seamlessly, whether locally or remotely on different platforms. It standardizes common data types and communications protocols so that components created in different languages can easily interoperate.

“.NET” is also the collective name given to various software components built upon the .NET platform. These will be both products (Visual Studio.NET and Windows.NET Server, for instance) and services (like Passport, .NET My Services, and so on).

7.2 THE .NET FRAMEWORK

The .NET Framework has two main parts:

1. The Common Language Runtime (CLR).

2. A hierarchical set of class libraries.

The CLR is described as the “execution engine” of .NET. It provides the environment within which programs run. The most important features are

  • Conversion from a low-level assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
  • Memory management, notably including garbage collection.
  • Checking and enforcing security restrictions on the running code.
  • Loading and executing programs, with version control and other such features.
The following features of the .NET framework are also worth describing:

Managed Code

Managed code is code that targets .NET and contains certain extra information ("metadata") to describe itself. Whilst both managed and unmanaged code can run in the runtime, only managed code contains the information that allows the CLR to guarantee, for instance, safe execution and interoperability.

Managed Data

With managed code comes managed data. The CLR provides memory allocation and deallocation facilities, and garbage collection. Some .NET languages use managed data by default, such as C#, Visual Basic.NET and JScript.NET, whereas others, namely C++, do not. Targeting the CLR can, depending on the language you're using, impose certain constraints on the features available. As with managed and unmanaged code, one can have both managed and unmanaged data in .NET applications: data that doesn't get garbage collected but instead is looked after by unmanaged code.

Common Type System

The CLR uses the Common Type System (CTS) to strictly enforce type-safety. This ensures that all classes are compatible with each other, by describing types in a common way. The CTS defines how types work within the runtime, which enables types in one language to interoperate with types in another language, including cross-language exception handling. As well as ensuring that types are only used in appropriate ways, the runtime also ensures that code doesn't attempt to access memory that hasn't been allocated to it.

Common Language Specification

The CLR provides built-in support for language interoperability. To ensure that you can develop managed code that can be fully used by developers using any programming language, a set of language features and rules for using them called the Common Language Specification (CLS) has been defined. Components that follow these rules and expose only CLS features are considered CLS-compliant.

7.3 THE CLASS LIBRARY

.NET provides a single-rooted hierarchy of classes, containing over 7000 types. The root of the namespace is called System; this contains basic types like Byte, Double, Boolean, and String, as well as Object. All objects derive from System.Object. As well as objects, there are value types. Value types can be allocated on the stack, which can provide useful flexibility. There are also efficient means of converting value types to object types if and when necessary.
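
As an illustration of this value-type/object-type conversion (boxing and unboxing), a minimal C# sketch is given below; the class and variable names are only illustrative.

    // Minimal C# sketch of boxing and unboxing (illustrative names only).
    using System;

    class BoxingDemo
    {
        static void Main()
        {
            int count = 42;            // value type, allocated on the stack
            object boxed = count;      // boxing: the value is copied into an object on the heap
            int unboxed = (int)boxed;  // unboxing: the value is copied back into a value type
            Console.WriteLine("{0} {1} {2}", count, boxed, unboxed);
        }
    }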

The set of classes is pretty comprehensive, providing collections, file, screen, and network I/O, threading, and so on, as well as XML and database connectivity.

The class library is subdivided into a number of sets (or namespaces), each providing distinct areas of functionality, with dependencies between the namespaces kept to a minimum.

7.4 LANGUAGES SUPPORTED BY .NET

The multi-language capability of the .NET Framework and Visual Studio .NET enables developers to use their existing programming skills to build all types of applications and XML Web services. The .NET framework supports new versions of Microsoft’s old favorites Visual Basic and C++ (as VB.NET and Managed C++), but there are also a number of new additions to the family.

Visual Basic .NET has been updated to include many new and improved language features that make it a powerful object-oriented programming language. These features include inheritance, interfaces, and overloading, among others. Visual Basic also now supports structured exception handling, custom attributes and also supports multi-threading.

Visual Basic .NET is also CLS compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in Visual Basic .NET.

Managed Extensions for C++ and attributed programming are just some of the enhancements made to the C++ language. Managed Extensions simplify the task of migrating existing C++ applications to the new .NET Framework.

C# is Microsoft’s new language. It’s a C-style language that is essentially “C++ for Rapid Application Development”. Unlike other languages, its specification is just the grammar of the language. It has no standard library of its own, and instead has been designed with the intention of using the .NET libraries as its own.

Microsoft Visual J# .NET provides the easiest transition for Java-language developers into the world of XML Web Services and dramatically improves the interoperability of Java-language programs with existing software written in a variety of other programming languages.

Active State has created Visual Perl and Visual Python, which enable .NET-aware applications to be built in either Perl or Python. Both products can be integrated into the Visual Studio .NET environment. Visual Perl includes support for Active State’s Perl Dev Kit.

Other languages for which .NET compilers are available include

  • FORTRAN
  • COBOL
  • Eiffel          

Fig. 1: The .NET Framework. Layers from top to bottom: ASP.NET, XML Web Services, and Windows Forms; Base Class Libraries; Common Language Runtime; Operating System.

C#.NET is also compliant with the CLS (Common Language Specification) and supports structured exception handling. The CLS is a set of rules and constructs that are supported by the CLR (Common Language Runtime). The CLR is the runtime environment provided by the .NET Framework; it manages the execution of the code and also makes the development process easier by providing services.

C#.NET is a CLS-compliant language. Any objects, classes, or components that are created in C#.NET can be used in any other CLS-compliant language. In addition, we can use objects, classes, and components created in other CLS-compliant languages in C#.NET. The use of the CLS ensures complete interoperability among applications, regardless of the languages used to create them.

CONSTRUCTORS AND DESTRUCTORS:

Constructors are used to initialize objects, whereas destructors are used to destroy them. In other words, destructors are used to release the resources allocated to the object. In C#.NET this is done by the finalizer (the protected Finalize method, written as a destructor). The finalizer is used to complete the tasks that must be performed when an object is destroyed, and it is called automatically when the object is destroyed. Being protected, it is accessible only from the class it belongs to or from derived classes.
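
A minimal C# sketch of a constructor and a finalizer is given below; the class and member names are hypothetical and only illustrate the behavior described above.

    using System;

    class Connection
    {
        private readonly string name;

        // Constructor: initializes the object when it is created.
        public Connection(string name)
        {
            this.name = name;
            Console.WriteLine("Opened " + name);
        }

        // Finalizer (destructor): called automatically by the garbage collector
        // when the object is destroyed; it cannot be invoked directly.
        ~Connection()
        {
            Console.WriteLine("Released " + name);
        }
    }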

GARBAGE COLLECTION

Garbage Collection is another new feature in C#.NET. The .NET Framework monitors allocated resources, such as objects and variables. In addition, the .NET Framework automatically releases memory for reuse by destroying objects that are no longer in use.

In C#.NET, the garbage collector checks for the objects that are not currently in use by applications. When the garbage collector comes across an object that is marked for garbage collection, it releases the memory occupied by the object.

OVERLOADING

Overloading is another feature in C#. Overloading enables us to define multiple procedures with the same name, where each procedure has a different set of arguments. Besides using overloading for procedures, we can use it for constructors and properties in a class.
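
As a small illustration, the following C# sketch overloads a single method name with different parameter lists; the class and method names are illustrative only.

    class Calculator
    {
        // Overloaded methods: same name, different parameter lists.
        public int Add(int a, int b) { return a + b; }
        public double Add(double a, double b) { return a + b; }
        public int Add(int a, int b, int c) { return a + b + c; }
    }

    // Usage: the compiler picks the overload that matches the arguments.
    // new Calculator().Add(1, 2);      -> calls Add(int, int)
    // new Calculator().Add(1.5, 2.5);  -> calls Add(double, double)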

MULTITHREADING:

C#.NET also supports multithreading. An application that supports multithreading can handle multiple tasks simultaneously; we can use multithreading to decrease the time taken by an application to respond to user interaction.
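
The following minimal C# sketch, using the standard System.Threading namespace, illustrates a background task running alongside the main thread; the names and messages are illustrative.

    using System;
    using System.Threading;

    class ThreadDemo
    {
        static void Main()
        {
            // A worker thread handles a background task while the main thread continues.
            Thread worker = new Thread(() =>
            {
                for (int i = 0; i < 3; i++)
                    Console.WriteLine("Background task step " + i);
            });
            worker.Start();

            Console.WriteLine("Main thread remains responsive.");
            worker.Join();   // wait for the background task to finish
        }
    }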

STRUCTURED EXCEPTION HANDLING

C#.NET supports structured exception handling, which enables us to detect and remove errors at runtime. In C#.NET, we use try…catch…finally statements to create exception handlers. Using try…catch…finally statements, we can create robust and effective exception handlers to improve the reliability of our application.
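
A minimal C# sketch of a try…catch…finally handler is given below; the method name and input values are illustrative only.

    using System;

    class ExceptionDemo
    {
        static int ParsePort(string text)
        {
            try
            {
                return int.Parse(text);          // may throw FormatException
            }
            catch (FormatException ex)
            {
                Console.WriteLine("Invalid input: " + ex.Message);
                return -1;                       // recover with a default value
            }
            finally
            {
                Console.WriteLine("Validation finished.");  // always runs
            }
        }
    }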

7.5 THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies application development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK

1. To provide a consistent object-oriented programming environment, whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.

2. To provide a code-execution environment that minimizes software deployment conflicts and guarantees safe execution of code.

3. To eliminate performance problems.

There are different types of application, such as Windows-based applications and Web-based applications. 

7.6 FEATURES OF SQL-SERVER

The OLAP Services feature available in SQL Server version 7.0 is now called SQL Server 2000 Analysis Services. The term OLAP Services has been replaced with the term Analysis Services. Analysis Services also includes a new data mining component. The Repository component available in SQL Server version 7.0 is now called Microsoft SQL Server 2000 Meta Data Services. References to the component now use the term Meta Data Services. The term repository is used only in reference to the repository engine within Meta Data Services

A SQL Server database consists of several types of objects,

They are,

1. TABLE

2. QUERY

3. FORM

4. REPORT

5. MACRO

7.7 TABLE:

A database is a collection of data about a specific topic.

VIEWS OF TABLE:

We can work with a table in two types,

1. Design View

2. Datasheet View

Design View

To build or modify the structure of a table, we work in the table design view. We can specify what kind of data will be held.

Datasheet View

To add, edit, or analyze the data itself, we work in the table's datasheet view mode.

QUERY:

A query is a question that is asked of the data. Access gathers the data that answers the question from one or more tables. The data that makes up the answer is either a dynaset (if you edit it) or a snapshot (which cannot be edited). Each time we run a query, we get the latest information in the dynaset. Access either displays the dynaset or snapshot for us to view, or performs an action on it, such as deleting or updating.
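
As an illustrative C# (ADO.NET) sketch of running such a query against the SQL Server back end assumed in the software requirements, the connection string, table, and column names below are hypothetical.

    using System;
    using System.Data.SqlClient;

    class QueryDemo
    {
        static void Main()
        {
            // Connection string and schema names are hypothetical examples.
            string connectionString = "Server=.;Database=SensorDB;Integrated Security=true;";
            using (SqlConnection connection = new SqlConnection(connectionString))
            using (SqlCommand command = new SqlCommand(
                "SELECT NodeId, Status FROM Nodes WHERE Status = @status", connection))
            {
                command.Parameters.AddWithValue("@status", "ACTIVE");
                connection.Open();
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine("{0}: {1}", reader["NodeId"], reader["Status"]);
                }
            }
        }
    }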

CHAPTER 7

7.0 APPENDIX

7.1 SAMPLE SCREEN SHOTS:

7.2 SAMPLE SOURCE CODE:

CHAPTER 8

8.1 CONCLUSION AND FUTURE ENHANCEMENT:

In this paper, a novel authorized accessible privacy model and a patient self-controllable multi-level privacy-preserving cooperative authentication scheme (PSMPA), realizing three different levels of security and privacy requirements in the distributed m-healthcare cloud computing system, are proposed. The formal security proof and efficiency evaluations illustrate that our PSMPA can resist various kinds of malicious attacks and far outperforms previous schemes in terms of storage, computational, and communication overhead.

Privacy-Preserving Detection of Sensitive Data Exposure

Statistics from security firms, research institutions, and government organizations show that the number of data-leak instances has grown rapidly in recent years. Among various data-leak cases, human mistakes are one of the main causes of data loss. There exist solutions that detect inadvertent sensitive data leaks caused by human mistakes and provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced.

In this paper, we present a privacy-preserving data-leak detection (DLD) solution to solve this issue, in which a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semihonest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method can support accurate detection with a very small number of false alarms under various data-leak scenarios.

1.2 INTRODUCTION

According to a report from Risk Based Security (RBS), the number of leaked sensitive data records has increased dramatically during the last few years, i.e., from 412 million in 2012 to 822 million in 2013. Deliberately planned attacks, inadvertent leaks (e.g., forwarding confidential emails to unclassified email accounts), and human mistakes (e.g., assigning the wrong privilege) lead to most of the data-leak incidents. Detecting and preventing data leaks requires a set of complementary solutions, which may include data-leak detection, data confinement, stealthy malware detection and policy enforcement.

Network data-leak detection (DLD) typically performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique to analyze payloads of IP/TCP packets for inspecting application layer data, e.g., HTTP header/content. Alerts are triggered when the amount of sensitive data found in traffic passes a threshold. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS). Straightforward realizations of data-leak detection require the plaintext sensitive data.

However, this requirement is undesirable, as it may threaten the confidentiality of the sensitive information. If a detection system is compromised, then it may expose the plaintext sensitive data (in memory). In addition, the data owner may need to outsource the data-leak detection to providers, but may be unwilling to reveal the plaintext sensitive data to them. Therefore, one needs new data-leak detection solutions that allow the providers to scan content for leaks without learning the sensitive information.

In this paper, we propose a data-leak detection solution which can be outsourced and be deployed in a semihonest detection environment. We design, implement, and evaluate our fuzzy fingerprint technique that enhances data privacy during data-leak detection operations. Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data. Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests, or the content being inspected. Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them.

In our detection procedure, the data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them. To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noises. It is the data owner, who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.

Our contributions are summarized as follows.

1) We describe a privacy-preserving data-leak detection model for preventing inadvertent data leak in network traffic. Our model supports detection operation delegation and ISPs can provide data-leak detection as an add-on service to their customers using our model. We design, implement, and evaluate an efficient technique, fuzzy fingerprint, for privacy-preserving data-leak detection. Fuzzy fingerprints are special sensitive data digests prepared by the data owner for release to the DLD provider.

2) We implement our detection system and perform extensive experimental evaluation on a 2.6 GB Enron dataset, Internet surfing traffic of 20 users, and also 5 simulated real-world data-leak scenarios to measure its privacy guarantee, detection rate, and efficiency. Our results indicate high accuracy achieved by our underlying scheme with a very low false positive rate. Our results also show that the detection accuracy does not degrade much when only partial (sampled) sensitive-data digests are used. In addition, we give an empirical analysis of our fuzzification as well as of the fairness of fingerprint partial disclosure.

1.3 LITERATURE SURVEY

PRIVACY-AWARE COLLABORATIVE SPAM FILTERING

AUTHORS: K. Li, Z. Zhong, and L. Ramaswamy

PUBLISH: IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 5, pp. 725–739, May 2009.

EXPLANATION:

While the concept of collaboration provides a natural defense against massive spam e-mails directed at large numbers of recipients, designing effective collaborative anti-spam systems raises several important research challenges. First and foremost, since e-mails may contain confidential information, any collaborative anti-spam approach has to guarantee strong privacy protection to the participating entities. Second, the continuously evolving nature of spam demands the collaborative techniques to be resilient to various kinds of camouflage attacks. Third, the collaboration has to be lightweight, efficient, and scalable. Toward addressing these challenges, this paper presents ALPACAS, a privacy-aware framework for collaborative spam filtering. In designing the ALPACAS framework, we make two unique contributions. The first is a feature-preserving message transformation technique that is highly resilient against the latest kinds of spam attacks. The second is a privacy-preserving protocol that provides enhanced privacy guarantees to the participating entities. Our experimental results conducted on a real e-mail data set show that the proposed framework provides a 10-fold improvement in the false negative rate over the Bayesian-based Bogofilter when faced with one of the recent kinds of spam attacks. Further, privacy breaches are extremely rare. This demonstrates the strong privacy protection provided by the ALPACAS system.

DATA LEAK DETECTION AS A SERVICE: CHALLENGES AND SOLUTIONS

AUTHORS: X. Shu and D. Yao

PUBLISH: Proc. 8th Int. Conf. Secur. Privacy Commun. Netw., 2012, pp. 222–240

EXPLANATION:

We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small amount of specialized digests is needed. Our technique, referred to as the fuzzy fingerprint, can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others. We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation on the privacy, efficiency, accuracy, and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with a very small number of false alarms, even when the presentation of the data has been transformed. It also indicates that the detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework.

QUANTIFYING INFORMATION LEAKS IN OUTBOUND WEB TRAFFIC

AUTHORS: K. Borders and A. Prakash

PUBLISH: Proc. 30th IEEE Symp. Secur. Privacy, May 2009, pp. 129–140.

EXPLANATION:

As the Internet grows and network bandwidth continues to increase, administrators are faced with the task of keeping confidential information from leaving their networks. Today's network traffic is so voluminous that manual inspection would be unreasonably expensive. In response, researchers have created data loss prevention systems that check outgoing traffic for known confidential information. These systems stop naive adversaries from leaking data, but are fundamentally unable to identify encrypted or obfuscated information leaks. What remains is a high-capacity pipe for tunneling data to the Internet. We present an approach for quantifying information leak capacity in network traffic. Instead of trying to detect the presence of sensitive data (an impossible task in the general case), our goal is to measure and constrain its maximum volume. We take advantage of the insight that most network traffic is repeated or determined by external information, such as protocol specifications or messages sent by a server. By filtering this data, we can isolate and quantify true information flowing from a computer. In this paper, we present measurement algorithms for the Hypertext Transfer Protocol (HTTP), the main protocol for Web browsing. When applied to real Web browsing traffic, the algorithms were able to discount 98.5% of measured bytes and effectively isolate information leaks.

CHAPTER 2

2.0 SYSTEM ANALYSIS

2.1 EXISTING SYSTEM:

  • Existing approaches to detecting and preventing data leaks require a set of complementary solutions, which may include data-leak detection, data confinement, stealthy malware detection, and policy enforcement.
  • Network data-leak detection (DLD) typically performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique to analyze payloads of IP/TCP packets for inspecting application layer data, e.g., HTTP header/content.
  • Alerts are triggered when the amount of sensitive data found in traffic passes a threshold. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS).
  • Straightforward realizations of data-leak detection require the plaintext sensitive data. However, this requirement is undesirable, as it may threaten the confidentiality of the sensitive information. If a detection system is compromised, then it may expose the plaintext sensitive data (in memory).
  • In addition, the data owner may need to outsource the data-leak detection to providers, but may be unwilling to reveal the plaintext sensitive data to them. Therefore, one needs new data-leak detection solutions that allow the providers to scan content for leaks without learning the sensitive information.


2.1.1 DISADVANTAGES:

  • As the Internet grows and network bandwidth continues to increase, administrators are faced with the task of keeping confidential information from leaving their networks. In response, researchers have created data loss prevention systems that check outgoing traffic for known confidential information.
  • These systems stop naive adversaries from leaking data, but are fundamentally unable to identify encrypted or obfuscated information leaks. What remains is a high-capacity pipe for tunneling data to the Internet.
  • Existing approaches for quantifying information leak capacity in network traffic do not try to detect the presence of sensitive data (an impossible task in the general case); instead, the goal is to measure and constrain its maximum volume.
  • They rely on the insight that most network traffic is repeated or determined by external information, such as protocol specifications or messages sent by a server. By filtering this data, true information flowing from a computer can be isolated and quantified.

2.2 PROPOSED SYSTEM:

  • We propose a data-leak detection solution which can be outsourced and be deployed in a semihonest detection environment. We design, implement, and evaluate our fuzzy fingerprint technique that enhances data privacy during data-leak detection operations.
  • Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data.
  • Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests or the content being inspected. Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them.
  • In our detection procedure, the data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small amount of them to the DLD provider. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them.
  • To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of real leaks and noises. It is the data owner, who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak.


2.2.1 ADVANTAGES:

  • We describe a privacy-preserving data-leak detection model for preventing inadvertent data leaks in network traffic. Our model supports detection operation delegation, and ISPs can provide data-leak detection as an add-on service to their customers using our model.
  • We design, implement, and evaluate an efficient technique, fuzzy fingerprint, for privacy-preserving data-leak detection. Fuzzy fingerprints are special sensitive data digests prepared by the data owner for release to the DLD provider.
  • We implement our detection system and perform extensive experimental evaluation on Internet surfing traffic of 20 users, and also 5 simulated real-world data-leak scenarios, to measure its privacy guarantee, detection rate, and efficiency.
  • Our results indicate high accuracy achieved by our underlying scheme with very low false positive rate. Our results also show that the detection accuracy does not degrade much when only partial (sampled) sensitive-data digests are used an empirical analysis of our fuzzification as well as of the fairness of fingerprint partial disclosure.

2.3 HARDWARE & SOFTWARE REQUIREMENTS:

2.3.1 HARDWARE REQUIREMENT:

  • Processor                          –    Pentium IV
  • Speed                              –    1.1 GHz
  • RAM                                –    256 MB (min)
  • Hard Disk                          –    20 GB
  • Floppy Drive                       –    1.44 MB
  • Key Board                          –    Standard Windows Keyboard
  • Mouse                              –    Two or Three Button Mouse
  • Monitor                            –    SVGA

2.3.2 SOFTWARE REQUIREMENTS:

  • Operating System                   :           Windows XP or Win7
  • Front End                                :           Microsoft Visual Studio .NET           
  • Back End                                :           MS-SQL Server
  • Server                                      :           ASP .NET Web Server
  • Script                                       :           C# Script
  • Document                               :           MS-Office 2007


CHAPTER 3

3.0 SYSTEM DESIGN:

Data Flow Diagram / Use Case Diagram / Flow Diagram:

  • The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
  • The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.
  • The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • A DFD may be used to represent a system at any level of abstraction, and may be partitioned into levels that represent increasing information flow and functional detail.

NOTATION:

SOURCE OR DESTINATION OF DATA:

External sources or destinations, which may be people or organizations or other entities

DATA SOURCE:

Here the data referenced by a process is stored and retrieved.

PROCESS:

People, procedures, or devices that produce data. The physical component is not identified.

DATA FLOW:

Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.

MODELING RULES:

There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.


3.1 DATAFLOW DIAGRAM

UML DIAGRAMS:

3.2 USE CASE DIAGRAM:

3.3 CLASS DIAGRAM:

3.4 SEQUENCE DIAGRAM:

3.5 ACTIVITY DIAGRAM:

CHAPTER 4

4.0 IMPLEMENTATION:

FUZZY FINGERPRINT METHOD AND PROTOCOL

We describe the technical details of our fuzzy fingerprint mechanism in this section. The DLD provider obtains digests of sensitive data from the data owner. The data owner uses a sliding window and the Rabin fingerprint algorithm to generate short and hard-to-reverse (i.e., one-way) digests through the fast polynomial modulus operation. The sliding window generates small fragments of the processed data (sensitive data or network traffic), which preserves the local features of the data and provides the noise tolerance property. Rabin fingerprints are computed as polynomial modulus operations, and can be implemented with fast XOR, shift, and table look-up operations.

The Rabin fingerprint algorithm has a unique min-wise independence property, which supports fast random fingerprints selection (in uniform distribution) for partial fingerprints disclosure. The shingle-and-fingerprint process is defined as follows. A sliding window is used to generate q-grams on an input binary string first. The fingerprints of q-grams are then computed. A shingle (q-gram) is a fixed-size sequence of contiguous bytes. For example, the 3-gram shingle set of string abcdefgh consists of six elements {abc, bcd, cde, def, efg, fgh}. Local feature preservation is accomplished through the use of shingles. Therefore, our approach can tolerate sensitive data modification to some extent, e.g., inserted tags, small amount of character substitution, and lightly reformatted data.
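
The authors implement their framework in Python; purely to illustrate the shingle-and-fingerprint idea described above (and not as the authors' code), the following C# sketch generates 3-grams with a sliding window and hashes each one with a simple polynomial hash standing in for the Rabin fingerprint.

    using System;
    using System.Collections.Generic;

    class ShingleDemo
    {
        // Generate q-grams (shingles) with a sliding window.
        static IEnumerable<string> Shingles(string data, int q)
        {
            for (int i = 0; i + q <= data.Length; i++)
                yield return data.Substring(i, q);
        }

        // Simple polynomial hash used here as a stand-in for the Rabin fingerprint.
        static uint Fingerprint(string shingle)
        {
            uint h = 0;
            foreach (char c in shingle)
                h = h * 31 + c;          // polynomial accumulation modulo 2^32
            return h;
        }

        static void Main()
        {
            // "abcdefgh" yields the six 3-grams: abc, bcd, cde, def, efg, fgh
            foreach (string s in Shingles("abcdefgh", 3))
                Console.WriteLine("{0} -> {1:X8}", s, Fingerprint(s));
        }
    }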

From the detection perspective, a straightforward method is for the DLD provider to raise an alert if any sensitive fingerprint matches a fingerprint from the traffic. However, this approach has a privacy issue. If there is a data leak, there is a match between two fingerprints from the sensitive data and the network traffic. Then, the DLD provider learns the corresponding shingle, as it knows the content of the packet. Therefore, the central challenge is to prevent the DLD provider from learning the sensitive values even in data-leak scenarios, while allowing the provider to carry out the traffic inspection.

We propose an efficient technique to address this problem. The main idea is to relax the comparison criteria by strategically introducing matching instances on the DLD provider’s side without increasing false alarms for the data owner. Specifically, i) the data owner perturbs the sensitive-data fingerprints before disclosing them to the DLD provider, and ii) the DLD provider detects leaking by a range-based comparison instead of the exact match. The range used in the comparison is pre-defined by the data owner and correlates to the perturbation procedure.
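
A minimal C# sketch of this idea is given below; masking the low-order bits is only one simple way to realize the range-based comparison, and the mask size and names are illustrative rather than the scheme's exact parameters.

    class FuzzyFingerprintDemo
    {
        // Number of low-order bits hidden by fuzzification (illustrative value).
        const int FuzzyBits = 8;
        const uint Mask = ~((1u << FuzzyBits) - 1);   // keeps only the high-order bits

        // Data owner: perturb a sensitive fingerprint before disclosing it.
        static uint Fuzzify(uint sensitiveFingerprint)
        {
            return sensitiveFingerprint & Mask;       // 2^FuzzyBits candidates map to the same digest
        }

        // DLD provider: range-based comparison instead of an exact match.
        static bool PossibleLeak(uint trafficFingerprint, uint fuzzyDigest)
        {
            return (trafficFingerprint & Mask) == fuzzyDigest;
        }
    }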

4.2 MODULES:

NETWORK SECURITY PRIVACY:

SECURITY GOAL AND THREAT MODEL:

PRIVACY GOAL AND THREAT MODEL:

PRIVACY-ENHANCING DLD:

EXPERIMENTAL EVALUATION

4.3 MODULE DESCRIPTION:

NETWORK SECURITY PRIVACY:

Network-accessible decoy resources may be deployed in a network as surveillance and early-warning tools; because such decoys are not normally accessed for legitimate purposes, any access to them likely indicates an attacker. Techniques used by attackers that attempt to compromise these decoy resources are studied during and after an attack to keep an eye on new exploitation techniques. Such analysis may be used to further tighten security of the actual network being protected by the decoys. Data forwarding can also direct an attacker’s attention away from legitimate servers. A user encourages attackers to spend their time and energy on the decoy server while distracting their attention from the data on the real server. Similar to a decoy server, a decoy user is a network resource set up with intentional vulnerabilities. Its purpose is also to invite attacks so that the attacker’s methods can be studied and that information can be used to increase network security.

SECURITY GOAL AND THREAT MODEL:

We categorize three causes for sensitive data to appear on the outbound traffic of an organization, including the legitimate data use by the employees.

• Case I Inadvertent data leak: The sensitive data is accidentally leaked in the outbound traffic by a legitimate user. This paper focuses on detecting this type of accidental data leak over supervised network channels. An inadvertent data leak may be due to human errors, such as forgetting to use encryption or carelessly forwarding an internal email and attachments to outsiders, or due to application flaws. A supervised network channel could be an unencrypted channel, or an encrypted channel where the content in it can be extracted and checked by an authority. Such a channel is widely used for advanced NIDS where MITM (man-in-the-middle) SSL sessions are established instead of normal SSL sessions.

• Case II Malicious data leak: A rogue insider or a piece of stealthy software may steal sensitive personal or organizational data from a host. Because the malicious adversary can use strong private encryption, steganography, or covert channels to disable content-based traffic inspection, this type of leak is out of the scope of our network-based solution; host-based defenses (such as detecting the infection onset) need to be deployed instead.

• Case III Legitimate and intended data transfer: The sensitive data is sent by a legitimate user intended for legitimate purposes. In this paper, we assume that the data owner is aware of legitimate data transfers and permits such transfers. So the data owner can tell whether a piece of sensitive data in the network traffic is a leak using legitimate data transfer policies.

PRIVACY GOAL AND THREAT MODEL:

To prevent the DLD provider from gaining knowledge of sensitive data during the detection process, we need to set up a privacy goal that is complementary to the security goal above. We model the DLD provider as a semi-honest adversary, who follows our protocol to carry out the operations, but may attempt to gain knowledge about the sensitive data of the data owner. Our privacy goal is defined as follows. The DLD provider is given digests of sensitive data from the data owner and the content of network traffic to be examined. The DLD provider should not find out the exact value of a piece of sensitive data with a probability greater than 1/K, where K is an integer representing the number of all possible sensitive-data candidates that can be inferred by the DLD provider. We present a privacy-preserving DLD model with a new fuzzy fingerprint mechanism to improve the data protection against a semi-honest DLD provider. We generate digests of sensitive data through a one-way function, and then hide the sensitive values among other non-sensitive values via fuzzification. The privacy guarantee of such an approach is much higher than 1/K when there is no leak in traffic, because the adversary’s inference can only be gained through brute-force guesses. The traffic content is accessible by the DLD provider in plaintext. Therefore, in the event of a data leak, the DLD provider may learn sensitive information from the traffic, which is inevitable for all deep packet inspection approaches. Our solution confines the amount of maximal information learned during the detection and provides a quantitative guarantee for data privacy.

PRIVACY-ENHANCING DLD:

Our privacy-preserving data-leak detection method supports practical data-leak detection as a service and minimizes the knowledge that a DLD provider may gain during the process. Fig. 1 lists the six operations executed by the data owner and the DLD provider in our protocol:

  • PREPROCESS: run by the data owner to prepare the digests of sensitive data.
  • RELEASE: the data owner sends the digests to the DLD provider.
  • MONITOR and DETECT: the DLD provider collects outgoing traffic of the organization, computes digests of traffic content, and identifies potential leaks.
  • REPORT: the DLD provider returns data-leak alerts to the data owner, where there may be false positives (i.e., false alarms).
  • POSTPROCESS: the data owner pinpoints true data-leak instances.

Details are presented in the next section. The protocol is based on strategically computing data similarity, specifically the quantitative similarity between the sensitive information and the observed network traffic. High similarity indicates a potential data leak. For data-leak detection, the ability to tolerate a certain degree of data transformation in traffic is important. We refer to this property as noise tolerance.
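
The division of labor between the two parties can be sketched as C# interfaces whose method names follow the operations listed above; the signatures and types are illustrative assumptions, not the paper's actual API.

    using System.Collections.Generic;

    // Operations run by the data owner (illustrative signatures).
    interface IDataOwner
    {
        List<uint> Preprocess(byte[] sensitiveData);              // compute fuzzy fingerprints
        void Release(List<uint> digests, IDldProvider provider);  // send digests to the provider
        List<string> Postprocess(List<Alert> alerts);             // keep only the true leaks
    }

    // Operations run by the DLD provider (illustrative signatures).
    interface IDldProvider
    {
        byte[] Monitor();                                          // collect outbound traffic
        List<Alert> Detect(byte[] traffic, List<uint> digests);    // range-compare traffic digests
        void Report(List<Alert> alerts);                           // return candidate leaks (may include noise)
    }

    class Alert
    {
        public uint Fingerprint;
        public string PacketInfo;
    }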

Our key idea for fast and noise-tolerant comparison is the design and use of a set of local features that are representatives of local data patterns, e.g., when byte b2 appears in the sensitive data, it is usually surrounded by bytes b1 and b3 forming a local pattern b1, b2, b3. Local features preserve data patterns even when modifications (insertion, deletion, and substitution) are made to parts of the data. For example, if a byte b4 is inserted after b3, the local pattern b1, b2, b3 is retained though the global pattern (e.g., a hash of the entire document) is destroyed. To achieve the privacy goal, the data owner generates a special type of digests, which we call fuzzy fingerprints. Intuitively, the purpose of fuzzy fingerprints is to hide the true sensitive data in a crowd. It prevents the DLD provider from learning its exact value. We describe the technical details next.

EXPERIMENTAL EVALUATION:

Our data-leak detection solution can be outsourced and deployed in a semihonest detection environment; it relies on our fuzzy fingerprint technique, which enhances data privacy during data-leak detection operations. Our approach is based on a fast and practical one-way computation on the sensitive data (SSN records, classified documents, sensitive emails, etc.). It enables the data owner to securely delegate the content-inspection task to DLD providers without exposing the sensitive data. Using our detection method, the DLD provider, who is modeled as an honest-but-curious (aka semi-honest) adversary, can only gain limited knowledge about the sensitive data from either the released digests or the content being inspected.

Using our techniques, an Internet service provider (ISP) can perform detection on its customers’ traffic securely and provide data-leak detection as an add-on service for its customers. We implemented our fuzzy fingerprint framework in Python, including packet collection, shingling, and Rabin fingerprinting, as well as the partial disclosure and fingerprint filter extensions. The Rabin fingerprint is based on a cyclic redundancy code (CRC), and a padding scheme is used to handle small inputs. In all experiments, the shingles are 8 bytes long and the fingerprints are 32 bits (computed with 33-bit irreducible polynomials).

We set up a networking environment in VirtualBox, and make a scenario where the sensitive data is leaked from a local network to the Internet. Multiple users’ hosts (Windows 7) are put in the local network, which connect to the Internet via a gateway (Fedora). Multiple servers (HTTP, FTP, etc.) and an attacker-controlled host are put on the Internet side. The gateway dumps the network traffic and sends it to a DLD server/provider (Linux). Using the sensitive-data fingerprints defined by the users in the local network, the DLD server performs off-line data-leak detection.

CHAPTER 5

5.0 SYSTEM STUDY:

5.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are      

  • ECONOMICAL FEASIBILITY
  • TECHNICAL FEASIBILITY
  • SOCIAL FEASIBILITY

5.1.1 ECONOMICAL FEASIBILITY:                  

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. Thus the developed system is well within the budget, and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

5.1.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

5.1.3 SOCIAL FEASIBILITY:  

This aspect of the study is to check the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system; instead, he must accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to offer some constructive criticism, which is welcomed, as he is the final user of the system.

5.2 SYSTEM TESTING:

Testing is a process of checking whether the developed system works according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes a logical assumption that if all the parts of the system are correct, the goal will be successfully achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until many months later. This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best programs are worthless if they do not produce the correct outputs.

5.2.1 UNIT TESTING:

A program represents the logical elements of a system. For a program to run satisfactorily, it must compile and process test data correctly and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logical. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.
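
For instance, a logic error (unlike a syntax error, which the compiler reports) only shows up when the actual output is compared with an expected value; a minimal, framework-free C# check might look like the following, where the method and values are illustrative.

    using System;

    class UnitTestDemo
    {
        // Unit under test: contains a logic error (should multiply, not add).
        static int Square(int x) { return x + x; }

        static void Main()
        {
            int expected = 9;
            int actual = Square(3);
            Console.WriteLine(actual == expected
                ? "PASS"
                : "FAIL: expected " + expected + " but got " + actual);
        }
    }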

UNIT TESTING:

Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.
Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.

5.1.3 FUNCTIONAL TESTING:

Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.

FUNCTIONAL TESTING:

Description: Test for all modules.
Expected result: All peers should communicate in the group.
Description: Test for various peers in a distributed network framework; it should display all users available in the group.
Expected result: The result after execution should give the accurate result.

5.1.4 NON-FUNCTIONAL TESTING:

Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:

  • Load testing
  • Performance testing
  • Usability testing
  • Reliability testing
  • Security testing


5.1.5 LOAD TESTING:

An important tool for implementing system tests is a load generator. A load generator is essential for testing quality requirements such as performance and stress. A load can be a real load; that is, the system can be put under real usage during the test by having actual telephone users connected to it. They will generate the test input data for the system test.

LOAD TESTING:

Description: It is necessary to ascertain that the application behaves correctly under load when a ‘Server busy’ response is received.
Expected result: Should designate another active node as the server.

5.1.6 PERFORMANCE TESTING:

Performance tests are utilized in order to determine the widely defined performance of the software system such as execution time associated with various parts of the code, response time and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.

PERFORMANCE TESTING:

Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time and using an acceptable level of resources; this is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.

CHAPTER 7

7.0 SOFTWARE SPECIFICATION:

7.1 FEATURES OF .NET:

Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET Framework is a language-neutral platform for writing programs that can easily and securely interoperate. There’s no language barrier with .NET: there are numerous languages available to the developer including Managed C++, C#, Visual Basic and Java Script.

The .NET framework provides the foundation for components to interact seamlessly, whether locally or remotely on different platforms. It standardizes common data types and communications protocols so that components created in different languages can easily interoperate.

“.NET” is also the collective name given to various software components built upon the .NET platform. These will be both products (Visual Studio.NET and Windows.NET Server, for instance) and services (like Passport, .NET My Services, and so on).

7.2 THE .NET FRAMEWORK

The .NET Framework has two main parts:

1. The Common Language Runtime (CLR).

2. A hierarchical set of class libraries.

The CLR is described as the “execution engine” of .NET. It provides the environment within which programs run. The most important features are

  • Conversion from a low-level assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
  • Memory management, notably including garbage collection.
  • Checking and enforcing security restrictions on the running code.
  • Loading and executing programs, with version control and other such features.
  • The following features of the .NET framework are also worth description:

Managed Code

The code that targets .NET, and which contains certain extra Information – “metadata” – to describe itself. Whilst both managed and unmanaged code can run in the runtime, only managed code contains the information that allows the CLR to guarantee, for instance, safe execution and interoperability.

Managed Data

With Managed Code comes Managed Data. CLR provides memory allocation and Deal location facilities, and garbage collection. Some .NET languages use Managed Data by default, such as C#, Visual Basic.NET and JScript.NET, whereas others, namely C++, do not. Targeting CLR can, depending on the language you’re using, impose certain constraints on the features available. As with managed and unmanaged code, one can have both managed and unmanaged data in .NET applications – data that doesn’t get garbage collected but instead is looked after by unmanaged code.

Common Type System

The CLR uses something called the Common Type System (CTS) to strictly enforce type-safety. This ensures that all classes are compatible with each other, by describing types in a common way. CTS define how types work within the runtime, which enables types in one language to interoperate with types in another language, including cross-language exception handling. As well as ensuring that types are only used in appropriate ways, the runtime also ensures that code doesn’t attempt to access memory that hasn’t been allocated to it.

Common Language Specification

The CLR provides built-in support for language interoperability. To ensure that you can develop managed code that can be fully used by developers using any programming language, a set of language features and rules for using them called the Common Language Specification (CLS) has been defined. Components that follow these rules and expose only CLS features are considered CLS-compliant.

7.3 THE CLASS LIBRARY

.NET provides a single-rooted hierarchy of classes, containing over 7000 types. The root of the namespace is called System; this contains basic types like Byte, Double, Boolean, and String, as well as Object. All objects derive from System.Object. As well as objects, there are value types. Value types can be allocated on the stack, which can provide useful flexibility. There are also efficient means of converting value types to object types if and when necessary.
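The conversion between value types and object types mentioned above is known as boxing and unboxing; a minimal C# sketch:

using System;

class BoxingDemo
{
    static void Main()
    {
        int value = 123;           // a value type (System.Int32), typically stack-allocated
        object boxed = value;      // boxing: the value is copied into a heap object
        int unboxed = (int)boxed;  // unboxing: the value is copied back out

        Console.WriteLine(unboxed);   // 123
    }
}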

The set of classes is pretty comprehensive, providing collections, file, screen, and network I/O, threading, and so on, as well as XML and database connectivity.

The class library is subdivided into a number of sets (or namespaces), each providing distinct areas of functionality, with dependencies between the namespaces kept to a minimum.

7.4 LANGUAGES SUPPORTED BY .NET

The multi-language capability of the .NET Framework and Visual Studio .NET enables developers to use their existing programming skills to build all types of applications and XML Web services. The .NET framework supports new versions of Microsoft’s old favorites Visual Basic and C++ (as VB.NET and Managed C++), but there are also a number of new additions to the family.

Visual Basic .NET has been updated to include many new and improved language features that make it a powerful object-oriented programming language. These features include inheritance, interfaces, and overloading, among others. Visual Basic .NET now also supports structured exception handling, custom attributes, and multithreading.

Visual Basic .NET is also CLS compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in Visual Basic .NET.

Managed Extensions for C++ and attributed programming are just some of the enhancements made to the C++ language. Managed Extensions simplify the task of migrating existing C++ applications to the new .NET Framework.

C# is Microsoft’s new language. It’s a C-style language that is essentially “C++ for Rapid Application Development”. Unlike other languages, its specification is just the grammar of the language. It has no standard library of its own, and instead has been designed with the intention of using the .NET libraries as its own.

Microsoft Visual J# .NET provides the easiest transition for Java-language developers into the world of XML Web Services and dramatically improves the interoperability of Java-language programs with existing software written in a variety of other programming languages.

Active State has created Visual Perl and Visual Python, which enable .NET-aware applications to be built in either Perl or Python. Both products can be integrated into the Visual Studio .NET environment. Visual Perl includes support for Active State’s Perl Dev Kit.

Other languages for which .NET compilers are available include

  • FORTRAN
  • COBOL
  • Eiffel          
[Fig. 1: .NET Framework architecture – ASP.NET, XML Web Services, and Windows Forms sit on top of the Base Class Libraries, which sit on the Common Language Runtime, which runs on the Operating System.]

C#.NET is also compliant with the CLS (Common Language Specification) and supports structured exception handling. The CLS is a set of rules and constructs that are supported by the CLR (Common Language Runtime). The CLR is the runtime environment provided by the .NET Framework; it manages the execution of the code and also makes the development process easier by providing services.

C#.NET is a CLS-compliant language. Any objects, classes, or components created in C#.NET can be used in any other CLS-compliant language. In addition, we can use objects, classes, and components created in other CLS-compliant languages in C#.NET. The use of the CLS ensures complete interoperability among applications, regardless of the languages used to create them.

CONSTRUCTORS AND DESTRUCTORS:

Constructors are used to initialize objects, whereas destructors are used to destroy them. In other words, destructors are used to release the resources allocated to the object. In C#.NET the Finalize method is available. The Finalize method is used to complete the tasks that must be performed when an object is destroyed, and it is called automatically when an object is destroyed. In addition, the Finalize method can be called only from the class it belongs to or from derived classes.
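As a minimal sketch (the Connection class name is illustrative), the C# destructor syntax ~ClassName compiles down to an override of Finalize, which the runtime calls when the object is destroyed:

using System;

class Connection
{
    public Connection()        // constructor: initializes the object
    {
        Console.WriteLine("Connection opened");
    }

    ~Connection()              // destructor/finalizer: releases resources when the object is destroyed
    {
        Console.WriteLine("Connection cleaned up");
    }
}

class Program
{
    static void Main()
    {
        new Connection();
        // Once the object is unreachable, the garbage collector will eventually
        // invoke its finalizer before reclaiming the memory.
    }
}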

GARBAGE COLLECTION

Garbage Collection is another new feature in C#.NET. The .NET Framework monitors allocated resources, such as objects and variables. In addition, the .NET Framework automatically releases memory for reuse by destroying objects that are no longer in use.

In C#.NET, the garbage collector checks for the objects that are not currently in use by applications. When the garbage collector comes across an object that is marked for garbage collection, it releases the memory occupied by the object.
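A minimal C# sketch that makes this behaviour visible (GC.Collect is forced here only for demonstration; real applications normally leave collection to the runtime):

using System;

class GcDemo
{
    static void Main()
    {
        long before = GC.GetTotalMemory(false);

        // Allocate objects that immediately become unreachable.
        for (int i = 0; i < 10000; i++)
        {
            byte[] temp = new byte[1024];
            temp[0] = 1;
        }

        GC.Collect();                      // ask the garbage collector to run
        GC.WaitForPendingFinalizers();
        long after = GC.GetTotalMemory(true);

        Console.WriteLine("Managed heap before: " + before + " bytes, after collection: " + after + " bytes");
    }
}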

OVERLOADING

Overloading is another feature in C#. Overloading enables us to define multiple procedures with the same name, where each procedure has a different set of arguments. Besides using overloading for procedures, we can use it for constructors and properties in a class.
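A minimal C# sketch: three methods share the name Add but differ in their argument lists, and the compiler selects the appropriate one at each call site.

using System;

class OverloadDemo
{
    static int Add(int a, int b) { return a + b; }
    static double Add(double a, double b) { return a + b; }
    static int Add(int a, int b, int c) { return a + b + c; }

    static void Main()
    {
        Console.WriteLine(Add(2, 3));       // calls Add(int, int)
        Console.WriteLine(Add(2.5, 3.5));   // calls Add(double, double)
        Console.WriteLine(Add(1, 2, 3));    // calls Add(int, int, int)
    }
}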

MULTITHREADING:

C#.NET also supports multithreading. An application that supports multithreading can handle multiple tasks simultaneously; we can use multithreading to decrease the time an application takes to respond to user interaction.
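A minimal C# sketch using the standard System.Threading.Thread type: two pieces of work run at the same time, so one does not block the other.

using System;
using System.Threading;

class ThreadDemo
{
    static void Work(object name)
    {
        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine(name + ": step " + i);
            Thread.Sleep(100);   // simulate some work
        }
    }

    static void Main()
    {
        Thread worker = new Thread(Work);
        worker.Start("background task");   // runs on a separate thread
        Work("foreground task");           // runs concurrently on the main thread
        worker.Join();                     // wait for the background thread to finish
    }
}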

STRUCTURED EXCEPTION HANDLING

C#.NET supports structured exception handling, which enables us to detect and handle errors at runtime. In C#.NET, we use Try…Catch…Finally statements to create exception handlers. Using Try…Catch…Finally statements, we can create robust and effective exception handlers to improve the reliability of our application.
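A minimal C# sketch of a Try…Catch…Finally handler: the runtime error is caught and handled, and the Finally block runs in every case.

using System;

class ExceptionDemo
{
    static void Main()
    {
        try
        {
            int[] readings = new int[3];
            Console.WriteLine(readings[5]);   // out-of-range access: throws at runtime
        }
        catch (IndexOutOfRangeException ex)
        {
            Console.WriteLine("Handled: " + ex.Message);
        }
        finally
        {
            Console.WriteLine("Cleanup always runs");
        }
    }
}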

7.5 THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies application development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK

1. To provide a consistent object-oriented programming environment whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.

2. To provide a code-execution environment that minimizes software deployment and versioning conflicts and guarantees the safe execution of code.

3. To eliminate performance problems.

There are different types of applications, such as Windows-based applications and Web-based applications.

7.6 FEATURES OF SQL-SERVER

The OLAP Services feature available in SQL Server version 7.0 is now called SQL Server 2000 Analysis Services. The term OLAP Services has been replaced with the term Analysis Services. Analysis Services also includes a new data mining component. The Repository component available in SQL Server version 7.0 is now called Microsoft SQL Server 2000 Meta Data Services. References to the component now use the term Meta Data Services. The term repository is used only in reference to the repository engine within Meta Data Services.

A SQL Server database consists of the following types of objects:

1. TABLE

2. QUERY

3. FORM

4. REPORT

5. MACRO

7.7 TABLE:

A database is a collection of data about a specific topic.

VIEWS OF TABLE:

We can work with a table in two views:

1. Design View

2. Datasheet View

Design View

To build or modify the structure of a table, we work in the table's Design view. Here we can specify what kind of data each field will hold.

Datasheet View

To add, edit, or analyze the data itself, we work in the table's Datasheet view.

QUERY:

A query is a question asked of the data. Access gathers the data that answers the question from one or more tables. The data that makes up the answer is either a dynaset (if it can be edited) or a snapshot (which cannot be edited). Each time we run a query, we get the latest information in the dynaset. Access either displays the dynaset or snapshot for us to view, or performs an action on it, such as deleting or updating.
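As a minimal C# sketch of issuing a query against SQL Server with ADO.NET (the connection string and the Readings table are hypothetical placeholders, not objects from the project database):

using System;
using System.Data.SqlClient;

class QueryDemo
{
    static void Main()
    {
        // Hypothetical connection string and table, for illustration only.
        string connectionString = "Server=.;Database=SensorDb;Integrated Security=true;";

        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(
            "SELECT NodeId, Reading FROM Readings WHERE Reading > @min", connection))
        {
            command.Parameters.AddWithValue("@min", 10);
            connection.Open();

            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine(reader["NodeId"] + ": " + reader["Reading"]);
                }
            }
        }
    }
}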

CHAPTER 7

APPENDIX

7.1 SAMPLE SOURCE CODE

7.2 SAMPLE OUTPUT

CHAPTER 8

8.1 CONCLUSION

Our fuzzy fingerprint method differs from these solutions and enables its adopter to provide data-leak detection as a service. With our approach, the customer or data owner does not need to fully trust the DLD provider. A Bloom filter is a space-saving data structure for set-membership tests and has been used in network security at the network layer and above; the fuzzy Bloom filter constructs a special Bloom filter that probabilistically sets the corresponding filter bits to 1. It was designed to support a resource-sufficient routing scheme, and it is a potential privacy-preserving technique. We do not invent a variant of the Bloom filter for our fuzzy fingerprint; rather, our fuzzification process is separate from the membership test. The advantage of separating fingerprint fuzzification from the membership test is the flexibility to test whether a fingerprint is sensitive with or without fuzzification.

Besides our fuzzy fingerprint solution for data-leak detection, there are other privacy-preserving techniques invented for specific processes, e.g., data matching, or for general-purpose use, e.g., secure multi-party computation (SMC). SMC is a cryptographic mechanism that supports a wide range of fundamental arithmetic, set, and string operations as well as complex functions such as knapsack computation, automated trouble-shooting, network event statistics, private information retrieval, genomic computation, private database query, private join operations, and distributed data mining. The provable privacy guarantees offered by SMC come at a cost in terms of computational complexity and realization difficulty. The advantage of our approach is its concision and efficiency.

8.2 FUTURE ENHANCEMENT:

We proposed fuzzy fingerprint, a privacy-preserving data-leak detection model, and presented its realization. Using special digests, the exposure of the sensitive data is kept to a minimum during detection. We have conducted extensive experiments to validate the accuracy, privacy, and efficiency of our solution. For future work, we plan to focus on designing a host-assisted mechanism for complete data-leak detection in large-scale organizations.