Improving Web Navigation Usability by Comparing Actual and Anticipated Usage

We present a new method to identify navigation related Web usability problems based on comparing actual and anticipated usage patterns. The actual usage patterns can be extracted from Web server logs routinely recorded for operational websites by first processing the log data to identify users, user sessions, and user task-oriented transactions, and then applying a usage mining algorithm to discover patterns among actual usage paths. The anticipated usage, including information about both the path and time required for user-oriented tasks, is captured by our ideal user interactive path models constructed by cognitive experts based on their cognition of user behavior.

The comparison is performed via the mechanism of a test oracle for checking results and identifying user navigation difficulties. The deviation data produced from this comparison can help us discover usability issues and suggest corrective actions to improve usability. A software tool was developed to automate a significant part of the activities involved. With an experiment on a small service-oriented website, we identified usability problems, which were cross-validated by domain experts, and quantified usability improvement by the higher task success rate and lower time and effort for given tasks after suggested corrections were implemented. This case study provides an initial validation of the applicability and effectiveness of our method.


As the World Wide Web becomes prevalent today, building and ensuring easy-to-use Web systems is becoming a core competency for business survival. Usability is defined as the effectiveness, efficiency, and satisfaction with which specific users can complete specific tasks in a particular environment. Three basic Web design principles, i.e., structural firmness, functional convenience, and presentational delight, were identified to help improve users’ online experience. Structural firmness relates primarily to the characteristics that influence the website security and performance. Functional convenience refers to the availability of convenient characteristics, such as a site’s ease of use and ease of navigation, that help users’ interaction with the interface. Presentational delight refers to the website characteristics that stimulate users’ senses. Usability engineering provides methods for measuring usability and for addressing usability issues. Heuristic evaluation by experts and user-centered testing are typically used to identify usability issues and to ensure satisfactory usability.

However, significant challenges exist, including 1) inaccurate problem identification due to false alarms common in expert evaluation, 2) unrealistic evaluation of usability due to differences between the testing environment and the actual usage environment, and 3) increased cost due to the prolonged evolution and maintenance cycles typical for many Web applications. On the other hand, log data routinely kept at Web servers represent actual usage. Such data have been used for usage-based testing and quality assurance and also for understanding user behavior and guiding user interface design.

Server-side logs can be automatically generated by Web servers, with each entry corresponding to a user request. By analyzing these logs, Web workload was characterized and used to suggest performance enhancements for Internet Web servers. Because of the vastly uneven Web traffic, massive user population, and diverse usage environment, coverage-based testing is insufficient to ensure the quality of Web applications. Therefore, server-side logs have been used to construct Web usage models for usage-based Web testing or to automatically generate test cases accordingly to improve test efficiency.



Usability evaluation of Web sites is still a difficult and time-consuming task, often performed manually. This paper presents a tool that supports remote usability evaluation of Web sites. The tool considers client-side data on user interactions and JavaScript events. In addition, it allows the definition of custom events, giving evaluators the flexibility to add specific events to be detected and considered in the evaluation. The tool supports evaluation of any Web site by exploiting a proxy-based architecture and enables the evaluator to perform a comparison between actual user behavior and an optimal sequence of actions.


We present a new method and tool for activity modelling through qualitative sequential data analysis. In particular, we address the question of constructing a symbolic abstract representation of an activity from an activity trace. We use knowledge engineering techniques to help the analyst build an ontology of the activity, that is, a set of symbols and hierarchical semantics that supports the construction of activity models. The ontology construction is pragmatic, evolutionist and driven by the analyst in accordance with their modelling goals and their research questions. Our tool helps the analyst define transformation rules to process the raw trace into abstract traces based on the ontology. The analyst visualizes the abstract traces and iteratively tests the ontology, the transformation rules and the visualization format to confirm the models of activity. With this tool and this method, we found innovative ways to represent a car-driving activity at different levels of abstraction from activity traces collected from an instrumented vehicle. As examples, we report two new strategies of lane changing on motorways that we have found and modelled with this approach.


The dissemination of Web applications is extensive and still growing. The great penetration of Web sites raises a number of challenges for usability evaluators. Video-based analysis can be rather expensive and may provide limited results. In this article, we discuss what information can be provided by automatic tools able to process the information contained in browser logs and task models. To this end, we present a tool that can be used to compare log files of user behavior with the task model representing the actual Web site design, in order to identify where users’ interactions deviate from those envisioned by the system design.




Usability has long been addressed and discussed in previous studies. When people navigate the Web, they often encounter a number of usability issues. This is also due to the fact that Web surfers often decide on the spur of the moment what to do and whether to continue to navigate in a Web site. Usability evaluation is thus an important phase in the deployment of Web applications. For this purpose, automatic tools are very useful for gathering larger amounts of usability data and supporting their analysis.

Remote evaluation implies that users and evaluators are separated in time and/or space. This is important in order to analyse users in their daily environments and decreases the costs of the evaluation without requiring the use of specific laboratories and asking the users to move. In addition, tools for remote Web usability evaluation should be sufficiently general so that they can be used to analyse user behaviour even when using various browsers or applications developed using different toolkits. We prefer logging on the client-side in order to be able to capture any user-generated events, which can provide useful hints regarding possible usability problems.

Existing approaches have been used to support usability evaluation. One example is WebRemUsine, a tool for remote usability evaluation of Web applications through browser logs and task models. Propp and Forbrig used task models to support usability evaluation of a different type of application: cooperative behaviour of people interacting in smart environments. In a different use of models, other authors discuss how task models can enhance visualization of the usability test log. In our case, we do not require the effort of developing models to apply our tool. We only require that the designer provide an example of optimal use associated with each of the relevant tasks. The tool then compares the logs of actual use with the optimal log in order to identify deviations, which may indicate potential usability problems.


A study of Web navigation used a logger to collect data from a user session test on a Web interface prototype running on a PDA simulator, in order to evaluate different types of Web navigation tools and identify the best one for small display devices.

Users were asked to find the answers to specific questions using different types of navigation tools to move from one page to another. A database was used to store users’ actions, but only the answer given by the user to each specific question was logged. Moreover, every term searched by the user by means of the internal search tool was stored separately.

Client-side logging faces several challenges, including how to identify the elements that users are interacting with, how to manage element identification when the page is changed dynamically, and how to manage data logging when users move from one page to another. The following are some of the solutions we adopted in order to deal with these issues.


We propose a new method to identify navigation related usability problems by comparing Web usage patterns extracted from server logs against anticipated usage represented in some cognitive user models (RQ2). Fig. 1 shows the architecture of our method. It includes three major modules: Usage Pattern Extraction, IUIP Modeling, and Usability Problem Identification. First, we extract actual navigation paths from server logs and discover patterns for some typical events. In parallel, we construct IUIP models for the same events. IUIP models are based on the cognition of user behavior and can represent anticipated paths for specific user-oriented tasks.

Our IUIP models are based on the cognitive models surveyed in Section II, particularly the ACT-R model. Due to the complexity of ACT-R model development and the low-level rule-based programming language it relies on, we constructed our own cognitive architecture and supporting tool based on the ideas from ACT-R. In general, user behavior patterns can be traced with a sequence of states and transitions. An IUIP model therefore consists of a number of states and transitions. For a particular goal, a sequence of related operation rules can be specified for a series of transitions. Our IUIP model specifies both the path and the benchmark interactive time (no more than a maximum time) for some specific states (pages). The benchmark time can first be specified based on general rules for common types of Web pages. Humans usually try to complete their tasks in the most efficient manner by attempting to maximize their returns while minimizing the cost.
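The state-and-transition structure described above can be illustrated with a small data structure. This is only a minimal sketch under our own assumptions: the class names `State` and `IUIPModel`, the method names, and the field `max_seconds` are hypothetical and are not taken from the authors' C++/XML tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class State:
    """One page (state) of an anticipated path."""
    page: str
    max_seconds: Optional[float] = None  # benchmark time; None = no benchmark


@dataclass
class IUIPModel:
    """Hypothetical container for one task's anticipated path."""
    task: str
    states: Dict[str, State] = field(default_factory=dict)
    transitions: Dict[str, Set[str]] = field(default_factory=dict)

    def add_state(self, page: str, max_seconds: Optional[float] = None) -> None:
        self.states[page] = State(page, max_seconds)
        self.transitions.setdefault(page, set())

    def add_transition(self, src: str, dst: str) -> None:
        # Both endpoints must already be declared as states.
        for page in (src, dst):
            if page not in self.states:
                raise KeyError(f"unknown state: {page}")
        self.transitions[src].add(dst)

    def anticipated_next(self, page: str) -> Set[str]:
        """Pages the model anticipates after `page` (empty if none)."""
        return self.transitions.get(page, set())
```

A model for a task would be built by declaring each page as a state, optionally with a maximum time, and then adding the anticipated transitions between them.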

Typically, experts and novices will have different task performance. Novices need to learn task-specific knowledge while performing the task, but experts can complete the task in the most efficient manner. Based on this cognitive mechanism, IUIP models need to be constructed individually for novices and experts.


1) Logical deviation calculation:

a) When the path choice anticipated by the IUIP model is available but not selected, a single deviation is counted.

b) Sum up all the above deviations over all the selected user transactions for each page.

2) Temporal deviation calculation:

a) When a user spends more time at a specific page than the benchmark specified for the corresponding state in the IUIP model, a single deviation is counted.

b) Sum up all the above deviations over all the selected user transactions for each page.
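The two counting rules above can be sketched as follows. This is an illustrative sketch, not the authors' tool: the function name `count_deviations` and the plain-dictionary representation of the model are assumptions. It also assumes that the anticipated choice is always available on a page that belongs to the model, and that pages outside the model (the unanticipated follow-up pages) carry no anticipated transitions or benchmarks, so they are never counted themselves.

```python
from collections import Counter


def count_deviations(transactions, anticipated_next, benchmark_seconds):
    """Count logical and temporal deviations per page.

    transactions: list of paths, each a list of (page, seconds_spent);
        seconds_spent may be None when the time is unknown.
    anticipated_next: dict page -> set of pages the model anticipates next.
    benchmark_seconds: dict page -> maximum anticipated time on that page.
    """
    logical = Counter()
    temporal = Counter()
    for path in transactions:
        for i, (page, seconds) in enumerate(path):
            # Rule 2a: time at the page exceeds the benchmark for its state.
            limit = benchmark_seconds.get(page)
            if limit is not None and seconds is not None and seconds > limit:
                temporal[page] += 1
            # Rule 1a: the anticipated choice was not the one selected.
            expected = anticipated_next.get(page)
            if expected and i + 1 < len(path):
                if path[i + 1][0] not in expected:
                    logical[page] += 1
    # Rules 1b and 2b: the counters hold the per-page sums over all paths.
    return logical, temporal
```

Pages with the highest counts in either result are the first candidates for closer inspection.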




Data Flow Diagram / Use Case Diagram / Flow Diagram:

  • The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
  • The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components. These components are the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.
  • A DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
  • A DFD may be used to represent a system at any level of abstraction. It may be partitioned into levels that represent increasing information flow and functional detail.



External entity: External sources or destinations of data, which may be people, organizations, or other entities.

Data store: Here the data referenced by a process is stored and retrieved.

Process: People, procedures, or devices that produce data. The physical component is not identified.

Data flow: Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of data.


There are several common modeling rules when creating DFDs:

  1. All processes must have at least one data flow in and one data flow out.
  2. All processes should modify the incoming data, producing new forms of outgoing data.
  3. Each data store must be involved with at least one data flow.
  4. Each external entity must be involved with at least one data flow.
  5. A data flow must be attached to at least one process.
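The structural rules above can be checked mechanically. The following is a minimal sketch under our own assumptions: the function name `check_dfd` and the representation of a diagram as three sets of node names plus a list of (source, destination) pairs are hypothetical. Rule 2 concerns what a process does to its data, so it cannot be verified from the diagram structure alone and is not checked here.

```python
def check_dfd(processes, stores, entities, flows):
    """Return a list of rule violations for a data flow diagram.

    processes, stores, entities: sets of node names.
    flows: list of (source, destination) pairs.
    """
    problems = []
    sources = {src for src, _ in flows}
    targets = {dst for _, dst in flows}
    touched = sources | targets

    for p in sorted(processes):
        # Rule 1: at least one flow in and one flow out.
        if p not in targets:
            problems.append(f"process {p!r} has no incoming flow")
        if p not in sources:
            problems.append(f"process {p!r} has no outgoing flow")
    for s in sorted(stores):
        # Rule 3: every data store takes part in some flow.
        if s not in touched:
            problems.append(f"data store {s!r} has no flow")
    for e in sorted(entities):
        # Rule 4: every external entity takes part in some flow.
        if e not in touched:
            problems.append(f"external entity {e!r} has no flow")
    for src, dst in flows:
        # Rule 5: every flow is attached to at least one process.
        if src not in processes and dst not in processes:
            problems.append(f"flow {src!r} -> {dst!r} touches no process")
    return problems
```

An empty result means that none of the four structural rules is violated.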











Our IUIP model specifies both the path and the benchmark interactive time (no more than a maximum time) for some specific states (pages). The benchmark time can first be specified based on general rules for common types of Web pages. For example, human factors guidelines specify the upper bound for the response time to mitigate the risk that users will lose interest in a website. Humans usually try to complete their tasks in the most efficient manner by attempting to maximize their returns while minimizing the cost. Typically, experts and novices will have different task performance. Novices need to learn task-specific knowledge while performing the task, but experts can complete the task in the most efficient manner. Based on this cognitive mechanism, IUIP models need to be constructed individually for novices and experts by cognitive experts, utilizing their domain expertise and their knowledge of different users' interactive behavior.

We can adapt the durations by performing iterative tests with different users. Diagrammatic notation methods and tools are often used to support interaction modeling and task performance evaluation. To support IUIP model construction and reuse, we used C++ and XML to develop our IUIP modeling tool based on the open-source visual diagram software DIA. DIA allows users to draw customized diagrams, such as UML, data flow, and other diagrams. Existing shapes and lines in DIA form part of the graphic notations in our IUIP models. New ones can be easily added by writing simple XML files. The operations, operation rules, and computation rules can be embedded into the graphic notations with an XML schema we defined to form our IUIP symbols. Currently, about 20 IUIP symbols have been created to represent typical Web interactions. IUIP symbols used in subsequent examples are explained at the bottom of the corresponding figure. Cognitive experts can use our IUIP modeling tool to develop various IUIP models for different Web applications.

The actual users’ navigation trails we extracted from the aggregated trail tree are compared against corresponding IUIP models automatically. This comparison will yield a set of deviations between the two. We can identify some common problems of actual users’ interaction with the Web application by focusing on deviations that occur frequently. Combined with expertise in product internal and contextual information, our results can also help identify the root causes of some usability problems existing in the Web design. Based on logical choices made and time spent by users at each page, the calculation of deviations between actual users’ usage patterns and IUIP can be divided into two parts:

The IUIP model for the task “First Selection” is shown on the top. The corresponding user Trail 7, a part of a trail tree extracted from log data, is presented under it. The node in the tree is annotated with the number of users having reached the node across the same trail prefix. The successive pages related to furniture categories are grouped into a dashed box. The pages with deviations and the unanticipated follow up pages below them are marked with solid rectangular boxes. Those unanticipated followup pages will not be used themselves for deviation calculations to avoid double counting.



The transactions identified from each user session form a collection of paths. We use the trie data structure to merge the paths along common prefixes. A trie, or a prefix tree, is an ordered tree used to store an associative array where the keys are usually strings. All the descendants of a node have a common prefix of the string associated with that node. The root is associated with the empty string. We adapted the trie algorithm to construct a tree structure that also captures user visit frequencies, which is called a trail tree in our work. In a trail tree, a complete path from the root to a leaf node is called a trail.

The leaf nodes of the trail tree are also annotated with the trail names. The transaction paths extracted from the Web server log are shown in the table to its left, together with path occurrence frequencies. Paths 1, 4, and 5 have the common first node a; therefore, they were merged together. For the second node of this subtree, Paths 1 and 4 both accessed Page b; therefore, the two paths were combined at Node b. Finally, Paths 1 and 4 were merged into a single trail, Trail 1, although Path 1 terminates at Node e. By the same method, the other paths can be integrated into the trail tree. The number at each edge indicates the number of users reaching the next node across the same trail prefix.
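The merging of paths along common prefixes can be sketched as a trie whose nodes carry visit counts. This is an illustrative sketch, not the authors' implementation: the names `TrailNode`, `build_trail_tree`, and `trails` are hypothetical, and the count is stored on the node reached rather than on the edge leading to it, which carries the same information.

```python
class TrailNode:
    """A node of the trail tree: one page reached via one path prefix."""

    def __init__(self, page):
        self.page = page
        self.count = 0      # users reaching this node across this prefix
        self.children = {}  # page name -> TrailNode


def build_trail_tree(paths):
    """Merge paths along common prefixes.

    paths: iterable of (list_of_pages, occurrence_frequency).
    """
    root = TrailNode(None)  # the root stands for the empty prefix
    for pages, freq in paths:
        node = root
        for page in pages:
            if page not in node.children:
                node.children[page] = TrailNode(page)
            node = node.children[page]
            node.count += freq
    return root


def trails(root):
    """Yield (pages, count) for every complete root-to-leaf path."""
    stack = [(root, [])]
    while stack:
        node, prefix = stack.pop()
        if not node.children and prefix:
            yield prefix, node.count
        for child in node.children.values():
            stack.append((child, prefix + [child.page]))
```

With two hypothetical paths a, b, e and a, b, e, f, the shorter path is absorbed into the trail ending at f, which mirrors how Path 1 is merged into Trail 1 although it terminates at Node e.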

Based on the aggregated trail tree, further mining can be performed for some “interesting” pattern discovery. Typically, good mining results require close interaction with human experts to specify the characteristics that make navigation patterns interesting. In our method, we focus on the paths which are used by a sufficient number of users to finish a specific task. The paths can be initially prioritized by their usage frequencies and selected by using a threshold specified by the experts. Application-domain knowledge and contextual information, such as criticality of specific tasks, user privileges, etc., can also be used to identify “interesting” patterns. For the FG 2009 website, we extracted 30 trails each for Tasks 1, 2, and 3, and 5 trails for Task 4.
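The frequency-based prioritization and threshold selection can be sketched in a few lines. The function name `select_trails` and the parameter `min_users` are hypothetical, and the sketch covers only the frequency criterion, not the domain knowledge that experts may add.

```python
def select_trails(trail_counts, min_users):
    """Keep trails used by at least min_users users, most frequent first.

    trail_counts: iterable of (trail, number_of_users) pairs.
    min_users: threshold specified by the experts.
    """
    kept = [(trail, n) for trail, n in trail_counts if n >= min_users]
    # sorted() is stable, so trails with equal counts keep their input order.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```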








User Models: There is a growing need to incorporate insights from cognitive science about the mechanisms, strengths, and limits of human perception and cognition to understand the human factors involved in user interface design. For example, knowledge of the various constraints on cognition (e.g., system complexity) and of the mechanisms and patterns of strategy selection can help human factor engineers develop solutions and apply technologies that are better suited to human abilities.

Commonly used cognitive models include GOMS, EPIC, and ACT-R. The GOMS model consists of Goals, Operators, Methods, and Selection rules. As the high-level architecture, GOMS describes behavior and defines interactions as a static sequence of human actions. As the low-level cognitive architecture, EPIC (Executive-Process/Interactive Control) and ACT-R (Adaptive Control of Thought-Rational) can be taken as the specific implementation of the high-level architecture.

They provide detailed information about how to simulate human processing and cognition. An important feature of these low-level cognitive architectures is that they are all implemented as computer programming systems, so that cognitive models may be specified, executed, and their outputs (e.g., error rates and response latencies) compared with human performance data.


Server logs have also been used by organizations to learn about the usability of their products. For example, search queries can be extracted from server logs to discover user information needs for usability task analysis. There are many advantages to using server logs for usability studies. Logs can provide insight into real users performing actual tasks in natural working conditions versus in an artificial setting of a lab. Logs also represent the activities of many users over a long period of time versus the small sample of users in a short time span in typical lab testing. Data preparation techniques and algorithms can be used to process the raw Web server logs, and then mining can be performed to discover users’ visitation patterns for further usability analysis.

For example, organizations can mine server-side logs to predict users’ behavior and context to satisfy users’ needs. Users’ revisitation patterns can be discovered by mining server logs to develop guidelines for browser history mechanisms that can be used to reduce users’ cognitive and physical effort. Client-side logs can capture accurate and comprehensive usage data for usability analysis, because they allow low-level user interaction events such as keystrokes and mouse movements to be recorded.

For example, using these client-side data, the evaluator can accurately measure time spent on particular tasks or pages as well as study the use of “back” button and user click streams. Such data are often used with task based approaches and models for usability analysis by comparing discrepancies between the designer’s anticipation and a user’s actual behavior. However, the evaluator must program the UI, modify Web pages, or use an instrumented browser with plug-in tools or a special proxy server to collect such data.


Web server logs are our data source. Each entry in a log contains the IP address of the originating host, the timestamp, the requested Web page, the referrer, the user agent and other data. Typically, the raw data need to be preprocessed and converted into user sessions and transactions to extract usage patterns.
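As an illustration of the fields listed above, the following sketch parses one log entry. It assumes the widely used Combined Log Format, which the text does not name, so the regular expression is an assumption about the log layout rather than a description of the authors' tool. The name `parse_entry` is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout (Combined Log Format):
# host ident user [time] "method url protocol" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Example timestamp: 10/Oct/2009:13:55:36 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry
```

Lines that do not match the assumed layout are returned as None so that the caller can count or inspect them instead of silently losing entries.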

The data preparation and preprocessing include the following domain-dependent tasks.

1) Data cleaning: This task is usually site-specific and involves removing extraneous references to style files, graphics, or sound files that may not be important for the purpose of our analysis.

2) User identification: The remaining entries are grouped by individual users. Because no user authentication and cookie information is available in most server logs, we used the combination of IP, user agent, and referrer fields to identify unique users.

3) User session identification: The activity record of each user is segmented into sessions, with each representing a single visit to a site. Without additional authentication information from users and without the mechanisms such as embedded session IDs, one must rely on heuristics for session identification. For example, we set an elapse time of 15 min between two successive page accesses as a threshold to partition a user activity record into different sessions.

4) Path completion: Client or proxy side caching can often result in missing access references to some pages that have been cached. These missing references can often be heuristically inferred from the knowledge of site topology and referrer information, along with temporal information from server logs.

These tasks are time consuming and computationally intensive, but essential to the successful discovery of usage patterns.
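The first three tasks can be sketched as below. This is a simplified sketch under stated assumptions, not the authors' tool: entries are assumed to be dictionaries with the keys `ip`, `agent`, `url`, and `time` (a `datetime`), the suffix list in `SKIP_SUFFIXES` is a guess at what counts as extraneous, and users are identified by IP and user agent only, whereas the method described above also consults the referrer. Path completion is site-specific and is omitted.

```python
from collections import defaultdict
from datetime import timedelta

# Assumed list of extraneous resource types; adjust per site.
SKIP_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".wav")
SESSION_GAP = timedelta(minutes=15)  # threshold given in the text


def clean(entries):
    """Data cleaning: drop requests for style, graphics, or sound files."""
    return [
        e for e in entries
        if not e["url"].split("?")[0].lower().endswith(SKIP_SUFFIXES)
    ]


def group_by_user(entries):
    """User identification (simplified): one user per (IP, user agent)."""
    users = defaultdict(list)
    for e in sorted(entries, key=lambda e: e["time"]):
        users[(e["ip"], e["agent"])].append(e)
    return users


def split_sessions(user_entries):
    """Session identification: a gap above SESSION_GAP starts a new session.

    user_entries must already be in time order, as group_by_user returns them.
    """
    sessions = []
    for e in user_entries:
        if sessions and e["time"] - sessions[-1][-1]["time"] <= SESSION_GAP:
            sessions[-1].append(e)
        else:
            sessions.append([e])
    return sessions
```

Each resulting session is a time-ordered list of page requests from which task-oriented transactions can then be extracted.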

We developed a tool to automate all these tasks except part of path completion. For path completion, the designers or developers first need to manually discover the rules of missing references based on site structure, referrer, and other heuristic information. Once the repeated patterns are identified, this work can be automatically carried out. Our tool can work with server logs of different Web applications by modifying the related parameters in the configuration file. The processed log data are stored into a database for further use.


We now describe the specific results from applying our method to the FG 2009 website. We collected Web server access log data for the first three days after its deployment. The server log includes more than 500 entries. After preprocessing the raw log data using our tool, we identified 58 unique users and 81 sessions. Then, we constructed four event models for four typical tasks. We extracted 95 trails for these tasks. Meanwhile, a designer with three-year GUI design experience and an expert with five-year experience with human factors practice for the Web constructed four IUIP models for the same tasks based on their cognition of users’ interactive behavior. By checking the extracted usage patterns against the four IUIP models, we obtained logical and temporal deviations shown in Tables I and II and identified 17 usability issues or potential usability problems. Some usability issues were identified by both logical and temporal deviation analyses. Next, we further analyze these deviations for usability problem identification and improvement.

In Table I, 16 deviations took place in the page “index.php.” The unanticipated followup page is the page “login.php,” followed by the page “index.php?f=t” (login failure). Further reviewing the index page, we found that the page design is too simplistic: No instruction was provided to help users to login or register. We inferred that some users with limited online shopping experience were trying to use their regular email addresses and passwords to log in to the FG 2009 website.

We also found some structure design issues. For example, we observed that some users repeatedly visited the page “Selection Rules.” It is likely that when the users were not permitted to select any furniture in some categories (the FG website limited each user to select one piece of furniture under each category), they had to go to the page “Selection Rules” to find the reasons. To reduce these redundant operations and improve user experience, the help function for selection rules should be redesigned to make it more convenient for users to consult.




We have developed a new method for the identification and improvement of navigation-related Web usability problems by checking extracted usage patterns against cognitive user models. As demonstrated by our case study, our method can identify areas with usability issues to help improve the usability of Web systems. Once a website is operational, our method can be continuously applied and drive ongoing refinements. In contrast with traditional software products and systems, Web based applications have shortened development cycles and prolonged maintenance cycles. Our method can contribute significantly to continuous usability improvement over these prolonged maintenance cycles. The usability improvement in successive iterations can be quantified by the progressively better effectiveness (higher task completion rate) and efficiency (less time for given tasks).

Our method is not intended to and cannot replace heuristic usability evaluation by experts and user-centered usability testing. It complements these traditional usability practices and can be incorporated into an integrated strategy for Web usability assurance. With automated tool support for a significant part of the activities involved, our method is cost-effective. It would be particularly valuable in the two common situations, where an adequate number of actual users cannot be involved in testing and cognitive experts are in short supply. Server logs in our method represent real users’ operations in natural working conditions, and our IUIP models injected with human behavior cognition represent part of cognitive experts’ work. We are currently integrating these modeling and analysis tools into a tool suite that supports measurement, analysis, and overall quality improvement for Web applications.

8.2 FUTURE ENHANCEMENT: In the future, we plan to carry out validation studies with large-scale Web applications. We also plan to explore additional approaches to discover Web usage patterns and related usability problems generalizable to other interesting domains. For example, we have already started exploring deviation calculation and analysis at the trail level instead of at the individual page level. Such analyses might be more meaningful and yield more interesting results for Web applications with complex structure and operation sequences. Our IUIP modeling architecture and supporting tools also need to be further enhanced and optimized for more complex tasks. We will also further expand our usability research to cover more usability aspects to improve Web users’ overall satisfaction.