2005, Web Information Systems Engineering and Internet Technologies Book Series
In 1996, Arlitt and Williamson conducted a comprehensive workload characterization study of Internet Web servers. By analyzing access logs from 6 Web sites (3 academic, 2 research, and 1 industrial) in 1994 and 1995, the authors identified 10 invariants: workload characteristics common to all the sites that are likely to persist over time. In the present work, we revisit the 1996 work by Arlitt and Williamson, repeating many of the same analyses on new data sets collected in 2004. In particular, we study access logs from the same 3 academic sites used in the 1996 paper. Despite a 30-fold increase in overall traffic volume from 1994 to 2004, our main conclusion is that there have been no dramatic changes in Web server workload characteristics in the last 10 years. Although there have been many changes in Web technologies (e.g., new protocols, scripting languages, caching infrastructures), most of the 1996 invariants still hold true today. We postulate that these invariants will continue to hold in the future, because they represent fundamental characteristics of how humans organize, store, and access information on the Web.
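As a concrete illustration of the style of analysis involved, the sketch below computes the classic concentration-of-references measure (the fraction of distinct files that accounts for 90% of requests) from a Common Log Format access log. The log path is hypothetical and the script is not the authors' tooling; it only shows how one such invariant can be checked.

```python
# Sketch: checking a concentration-of-references invariant from a Web access log.
# Assumes a Common Log Format file at a hypothetical path ("access.log").
from collections import Counter

def concentration(log_path, traffic_fraction=0.9):
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            parts = line.split('"')
            if len(parts) < 2:
                continue
            request = parts[1].split()        # e.g. ['GET', '/index.html', 'HTTP/1.0']
            if len(request) >= 2:
                counts[request[1]] += 1
    total = sum(counts.values())
    running, files = 0, 0
    for _, c in counts.most_common():         # most popular files first
        running += c
        files += 1
        if running >= traffic_fraction * total:
            break
    return files / len(counts)                # fraction of distinct files serving 90% of requests

if __name__ == "__main__":
    print(concentration("access.log"))        # hypothetical log file
```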
International Journal of Computer Applications, 2011
Use of the Internet and the World Wide Web has grown rapidly over the past few years. People deploying Web servers want to understand how their servers are being used by Internet users, how those patterns of use are changing over time, and what steps they should take to ensure adequate server response to incoming requests today and in the future. This requires an evaluation of the requests presented to the Web server and the characteristics of the server's responses to those requests over a suitably long time interval. In this paper we present the results of a study of the Web server system of http://www.lawetalnews.com over a six-month period. During this period, traffic to the website rose sharply in terms of incoming requests and outgoing bytes. We study the request and response types, and characterize the traffic distribution on the basis of request size, response time, and other factors. We conclude with system performance recommendations and identify future directions for our Web research.
1999
Abstract Performance analysis and capacity planning for e-commerce sites poses an interesting problem: how to best characterize the workload of these sites. Traditional workload characterization methods, based on hits/sec, page views/sec, or visits/sec, are not appropriate for e-commerce sites. In these environments, customers interact with the site through a series of consecutive and related requests, called sessions. Different navigational patterns can be observed for different groups of customers.
2007
Abstract In this paper we present a clustering analysis of session-based Web workloads from eight Web servers, using intra-session characteristics (i.e., number of requests per session, session length in time, and bytes transferred per session) as variables. We use the K-means algorithm with the Mahalanobis distance, and analyze the heavy-tailed behavior of intra-session characteristics and their correlations for each cluster.
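A minimal sketch of this kind of clustering step: whitening the session features by the inverse Cholesky factor of their covariance makes ordinary Euclidean K-means equivalent to K-means under the Mahalanobis distance. The session matrix below is synthetic; the eight server traces analyzed in the paper are not reproduced here.

```python
# Sketch: K-means on session-level features under the Mahalanobis distance,
# implemented by whitening the data and running standard (Euclidean) K-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic sessions; columns: requests per session, session length (s), bytes transferred.
sessions = np.column_stack([
    rng.pareto(2.0, 5000) + 1,          # heavy-tailed request counts
    rng.lognormal(4.0, 1.0, 5000),      # session durations
    rng.lognormal(9.0, 1.5, 5000),      # bytes per session
])

cov = np.cov(sessions, rowvar=False)
L = np.linalg.cholesky(cov)
whitened = sessions @ np.linalg.inv(L).T          # Euclidean distance here = Mahalanobis distance

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(whitened)
for k in range(4):
    print(k, np.median(sessions[labels == k], axis=0))   # per-cluster medians of raw features
```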
Sigmetrics Performance Evaluation Review, 1998
One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator), generates references matching empirical measurements of 1) server file size distribution, 2) request size distribution, 3) relative file popularity, 4) embedded file references, 5) temporal locality of reference, and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
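The sketch below illustrates two of the six constraints Surge matches, Zipf-like file popularity and heavy-tailed file sizes, by drawing a synthetic reference stream. All parameter values are illustrative placeholders, not Surge's fitted empirical distributions, and temporal locality, embedded references, and idle periods are omitted.

```python
# Sketch of a Surge-style reference stream: Zipf-like popularity plus heavy-tailed file sizes.
# Parameters are illustrative, not the empirically fitted values used by Surge.
import numpy as np

rng = np.random.default_rng(1)
n_files = 10_000

# Heavy-tailed file sizes: lognormal body with a Pareto tail (illustrative split and parameters).
sizes = np.where(rng.random(n_files) < 0.93,
                 rng.lognormal(9.0, 1.3, n_files),            # body
                 (rng.pareto(1.1, n_files) + 1) * 10_000)     # tail

# Zipf popularity: probability of file of rank i proportional to 1 / i.
ranks = np.arange(1, n_files + 1)
popularity = (1.0 / ranks) / np.sum(1.0 / ranks)

requests = rng.choice(n_files, size=100_000, p=popularity)    # synthetic reference stream
print("bytes transferred:", int(sizes[requests].sum()))
print("unique files touched:", np.unique(requests).size)
```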
IEEE Communications Surveys & Tutorials, 2018
World Wide Web, 1999
In this paper we develop a general methodology for characterizing the access patterns of Web server requests based on a time-series analysis of finite collections of observed data from real systems. Our approach is used together with the access logs from the IBM Web site for the Olympic Games to demonstrate some of its advantages over previous methods and to construct a particular class of benchmarks for large-scale heavily-accessed Web server environments.
1999
Abstract Understanding the nature of the workloads and system demands created by users of the World Wide Web is crucial to properly designing and provisioning Web services. Previous measurements of Web client workloads have been shown to exhibit a number of characteristic features; however, it is not clear how those features may be changing with time. In this study we compare two measurements of Web client workloads separated in time by three years, both captured from the same computing facility at Boston University.
2011 IEEE International Symposium on Workload Characterization (IISWC), 2011
Search is the most heavily used web application in the world and is still growing at an extraordinary rate. Understanding the behaviors of web search engines, therefore, is becoming increasingly important to the design and deployment of data center systems hosting search engines. In this paper, we study three search query traces collected from real-world web search engines at three different search service providers. The first part of our study uncovers the patterns hidden in the query traces by analyzing the variations, frequencies, and locality of query requests. Our analysis reveals that, in contrast to some previous studies, real-world query traces do not follow well-defined probability models such as the Poisson distribution or the log-normal distribution. The second part of our study deploys the real query traces and three synthetic traces, generated using probability models proposed by other researchers, on a Nutch-based search engine. The measured performance data from these deployments further confirm that synthetic traces do not accurately reflect the real traces. We develop an evaluation tool that can collect performance metrics on-line with negligible overhead. The performance metrics include average response time, CPU utilization, disk accesses, and cycles per instruction. The third part of our study compares the search engine with representative benchmarks, namely Gridmix, SPECweb2005, TPC-C, SPECCPU2006, and HPCC, with respect to basic architecture-level characteristics and performance metrics such as instruction mix, processor pipeline stall breakdown, memory access latency, and disk accesses. The experimental results show that web search engines have a high percentage of load/store instructions, but have good cache/memory performance. We hope the results presented in this paper will enable system designers to gain insights into optimizing systems hosting search engines.
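A small example of the kind of model check reported above: a chi-square test of per-second query counts against a fitted Poisson distribution. The counts are synthetic stand-ins (a Poisson process with a fluctuating rate), so the test is expected to reject the Poisson model, mirroring what the paper observes for real traces.

```python
# Sketch: goodness-of-fit test of per-second query counts against a Poisson model.
# The synthetic counts stand in for a real search query trace.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
rates = rng.lognormal(3.0, 0.8, 3600)              # bursty, time-varying arrival rate
counts = rng.poisson(rates)                        # queries per second over one hour

lam = counts.mean()
observed = np.bincount(counts)                     # empirical histogram of counts
k = np.arange(observed.size)
expected = stats.poisson.pmf(k, lam) * counts.size
mask = expected >= 5                               # standard chi-square validity rule

f_exp = expected[mask] * observed[mask].sum() / expected[mask].sum()   # match totals
chi2, p = stats.chisquare(observed[mask], f_exp)
print(f"chi2={chi2:.1f}, p={p:.3g}")               # tiny p-value: Poisson fit rejected
```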
Lecture Notes in Computer Science, 2001
This paper describes techniques for improving performance at Web sites which receive significant traffic. Poor performance can be caused by dynamic data, insufficient network bandwidth, and poor Web page design. Dynamic data overheads can often be reduced by caching dynamic pages and using fast interfaces to invoke server programs. Web server acceleration can significantly improve performance and reduce the hardware needed at a Web site. We discuss techniques for balancing load among multiple servers at a Web site. We also show how Web pages can be designed to minimize traffic to the site.
Performance Evaluation, 2008
Managing the resources in a large Web serving system requires knowledge of the resource needs of service requests of various types. In order to investigate the properties of Web traffic and its demand, we collected measurements of throughput and CPU utilization and performed several data analyses. First, we present our findings on the time-varying nature of the traffic, the skewness of traffic intensity among the various types of requests, the correlation among traffic streams, and other system-related phenomena. Then, given this nature of Web traffic, we devise and implement an on-line method for the dynamic estimation of CPU demand.
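One common formulation of CPU-demand estimation is a regression of measured utilization on per-class throughput; the sketch below applies non-negative least squares to synthetic measurement intervals. This is only an assumed illustration of the idea, and the paper's on-line estimator may differ in detail.

```python
# Sketch: estimating per-request-type CPU demand from throughput and utilization samples
# via a utilization regression (non-negative least squares). Data are synthetic.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
true_demand = np.array([0.002, 0.010, 0.050])        # seconds of CPU per request, 3 types

# 200 measurement intervals: throughput (req/s) per type, plus noisy CPU utilization.
throughput = rng.uniform(0, [200, 40, 8], size=(200, 3))
utilization = throughput @ true_demand + rng.normal(0, 0.02, 200)

est_demand, _ = nnls(throughput, utilization)        # keep demands non-negative
print("estimated CPU demand per type:", est_demand)
```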
Computer Networks, 2009
Web servers are required to perform millions of transaction requests per day at an acceptable Quality of Service (QoS) level in terms of client response time and server throughput. Consequently, a thorough understanding of the performance capabilities and limitations of web servers is critical. Finding a simple web traffic model, described by a reasonable number of parameters, that enables powerful analysis methods and provides accurate results has been a challenging problem during the last few decades. This paper proposes a discrete statistical description of web traffic that is based on histograms. In order to reflect the second-order statistics (long-range dependence and self-similarity) of the workload, this basic model has been extended using the Hurst parameter. Then, a system performance model based on histogram operators (a histogram calculus) is introduced. The proposed model has been evaluated against real workload traces using a single-site server model. These evaluations show that the model is accurate and improves on the results of classic queueing models. The model provides an excellent basis for a decision support tool to predict the behavior of web servers.
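As a sketch of the second-order extension mentioned above, the aggregated-variance method below estimates the Hurst parameter of an arrival-count series; for the short-range-dependent synthetic input it returns a value near 0.5. This is a standard estimator, not the paper's histogram calculus.

```python
# Sketch: Hurst parameter estimation via the aggregated-variance method.
# Var(X^(m)) ~ m^(2H - 2), so the slope of log-variance vs. log-block-size gives H.
import numpy as np

def hurst_aggregated_variance(series, block_sizes):
    logs_m, logs_v = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        blocks = series[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)   # aggregate in blocks
        logs_m.append(np.log(m))
        logs_v.append(np.log(blocks.var()))
    slope, _ = np.polyfit(logs_m, logs_v, 1)
    return 1 + slope / 2

rng = np.random.default_rng(4)
arrivals = rng.poisson(100, 100_000).astype(float)       # short-range-dependent baseline
print(hurst_aggregated_variance(arrivals, [10, 20, 50, 100, 200, 500]))   # ~0.5
```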
Network Research Workshop, 18th Asian Pacific …, 2004
High Performance Networking, 1997
Server performance has become a crucial issue for improving the overall performance of the World-Wide Web. This paper describes Webmonitor, a tool for evaluating and understanding server performance, and presents new results for a realistic workload. Webmonitor measures activity and resource consumption, both within the kernel and in HTTP processes running in user space. Webmonitor is implemented using an efficient combination of sampling and event-driven techniques that exhibit low overhead. Our initial implementation is for the Apache World-Wide Web server running on the Linux operating system. We demonstrate the utility of Webmonitor by measuring and understanding the performance of a Pentium-based PC acting as a dedicated WWW server. Our workload uses a file size distribution with a heavy tail. This captures the fact that Web servers must concurrently handle some requests for large audio and video files, and a large number of requests for small documents containing text or images. Our results show that in a Web server saturated by client requests, over 90% of the time spent handling HTTP requests is spent in the kernel. Furthermore, keeping TCP connections open, as required by TCP, causes a factor of 2-9 increase in the elapsed time required to service an HTTP request. Data gathered from Webmonitor provide insight into the causes of this performance penalty. Specifically, we observe a significant increase in resource consumption along three dimensions: the number of HTTP processes running at the same time, CPU utilization, and memory utilization. These results emphasize the important role of operating system and network protocol implementation in determining Web server performance.
Proceedings of the 10th IEEE International …
This paper uses trace-driven simulation and synthetic Web workloads to study the request arrival process at each level of a simple Web proxy caching hierarchy. The simulation results show that a Web cache reduces both the peak and the mean request arrival rate for Web traffic ...
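In the same spirit, the sketch below runs a trace-driven LRU proxy simulation on a synthetic Zipf-like request stream and reports what fraction of client requests is forwarded upstream, i.e., how much the cache thins the arrival stream seen by the next level of the hierarchy.

```python
# Sketch: trace-driven LRU proxy simulation measuring how much of the client request
# stream is absorbed before it reaches the next level. The Zipf trace is synthetic.
from collections import OrderedDict
import numpy as np

def lru_miss_stream(requests, capacity):
    cache, misses = OrderedDict(), []
    for r in requests:
        if r in cache:
            cache.move_to_end(r)                 # hit: refresh recency
        else:
            misses.append(r)                     # miss: forwarded upstream
            cache[r] = True
            if len(cache) > capacity:
                cache.popitem(last=False)        # evict least recently used
    return misses

rng = np.random.default_rng(5)
ranks = np.arange(1, 50_001)
p = (1 / ranks) / np.sum(1 / ranks)              # Zipf-like object popularity
trace = rng.choice(50_000, size=500_000, p=p)

forwarded = lru_miss_stream(trace, capacity=5_000)
print(f"proxy forwards {len(forwarded) / len(trace):.1%} of client requests upstream")
```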
This dissertation deals with monitoring, collecting, analyzing, and modeling of World Wide Web (WWW) traffic and client interactions. The rapid growth of WWW usage has not been accompanied by an overall understanding of models of information resources and their deployment strategies. Consequently, the current Web architecture often faces performance and reliability problems. Scalability, latency, bandwidth, and disconnected operations are some of the important issues that should be considered when attempting to adjust for the growth in Web usage. The WWW Consortium launched an effort to design a new protocol that will be able to support future demands. Before doing that, however, we need to characterize current users' interactions with the WWW and understand how it is being used.
Web performance is an emerging issue in the modern Internet. We present a study of Web performance measurement data collected during over a year of operation of MWING (Multi-agent Web-pING tool), our web server probing system. Both periodic and aperiodic factors influencing the goodput level in HTTP transactions are considered and analyzed for their predictive capability. We propose several hypotheses concerning goodput shaping, and verify them for statistical significance using analysis of variance techniques. The presented work provides universal guidelines for designing advanced web traffic engineering applications.
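A toy version of the significance test mentioned above: a one-way analysis of variance asking whether hour of day explains variation in measured goodput. The goodput samples are synthetic placeholders for the MWING probe data, and the real study's factor structure is richer.

```python
# Sketch: one-way ANOVA testing whether hour of day significantly shapes goodput.
# Goodput samples are synthetic placeholders for probe measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
hours = rng.integers(0, 24, 2000)
diurnal = 1.0 + 0.4 * np.sin(2 * np.pi * hours / 24)          # built-in periodic factor
goodput = rng.lognormal(np.log(500 * diurnal), 0.3)           # goodput in kB/s, say

groups = [goodput[hours == h] for h in range(24)]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F={f_stat:.1f}, p={p_value:.3g}")                     # small p: hour of day matters
```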
Vrije Universiteit, Amsterdam, The Netherlands, Tech. Rep. IR-CS-041, Sepember, 2007
We study an access trace containing a sample of Wikipedia's traffic over a 107-day period. We perform a global analysis of the whole trace, and a detailed analysis of the requests directed to the English edition of Wikipedia. In our study, we classify client requests and examine aspects such as the number of read and save operations, flash crowds, and requests for nonexisting pages. We also outline strategies for improving Wikipedia performance in a decentralized hosting environment. Keywords: Workload analysis, ...
IASTED International Multi-Conference on Applied Informatics, 2003
The popularity of the World-Wide-Web has increased dramatically in the past few years. Web proxy servers have an important role in reducing server loads, network traffic, and client request latencies. This paper presents a detailed workload characterization study of a busy Web proxy server. The study aims at identifying the major characteristics that will improve the modelling of Web proxy accesses. A set of log files is processed for workload characterization. Throughout the study, emphasis is given to identifying the criteria for a Web caching model. A statistical analysis, based on these criteria, is presented in order to characterize the major workload parameters. Results of this analysis are presented, and the paper concludes with a discussion of workload characterization and content delivery issues.
ACM Transactions on Internet Technology, 2006
This article provides a detailed implementation study of the behavior of web servers that serve static requests where the load fluctuates over time (transient overload). Various external factors are considered, including WAN delays and losses and different client behavior models. We find that performance can be dramatically improved via a kernel-level modification to the web server that changes the scheduling policy at the server from the standard FAIR (processor-sharing) scheduling to SRPT (shortest-remaining-processing-time) scheduling. We find that SRPT scheduling induces no penalties. In particular, throughput is not sacrificed, and requests for long files experience only negligibly higher response times under SRPT than they did under the original FAIR scheduling.
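The effect described above can be illustrated with a toy time-stepped simulation of the two policies on a single server. This is not the paper's kernel-level implementation, and the arrival and size distributions are merely illustrative.

```python
# Sketch: comparing processor-sharing (FAIR) with SRPT on one server via a
# time-stepped simulation. Workload parameters are illustrative.
import numpy as np

def simulate(arrivals, sizes, policy, dt=0.01):
    jobs, response_times, t, i = [], [], 0.0, 0      # each job: [remaining_work, arrival_time]
    while i < len(arrivals) or jobs:
        while i < len(arrivals) and arrivals[i] <= t:
            jobs.append([sizes[i], arrivals[i]])
            i += 1
        if policy == "SRPT" and jobs:
            min(jobs, key=lambda j: j[0])[0] -= dt   # full capacity to shortest remaining job
        elif jobs:                                   # PS (FAIR): share capacity equally
            share = dt / len(jobs)
            for job in jobs:
                job[0] -= share
        t += dt
        response_times += [t - j[1] for j in jobs if j[0] <= 0]
        jobs = [j for j in jobs if j[0] > 0]
    return np.mean(response_times)

rng = np.random.default_rng(6)
n = 500
arrivals = np.cumsum(rng.exponential(1 / 0.8, n))    # Poisson arrivals, ~0.8 load
sizes = (rng.pareto(2.5, n) + 1) * 0.6               # heavy-tailed service times, mean ~1

print("FAIR (PS) mean response:", simulate(arrivals, sizes, "PS"))
print("SRPT      mean response:", simulate(arrivals, sizes, "SRPT"))
```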
Information Processing & Management, 2014
Several studies of Web server workloads have hypothesized that these workloads are self-similar. The explanation commonly advanced for this phenomenon is that the distribution of Web server requests may be heavy-tailed. However, there is another possible explanation: self-similarity can also arise from deterministic, chaotic processes. To our knowledge, this possibility has not previously been investigated, and so existing studies of Web workloads lack an adequate comparison against this alternative. We conduct an empirical study of workloads from two different Web sites, one public university and one private company, using the largest datasets that have been described in the literature. Our study employs methods from nonlinear time series analysis to search for chaotic behavior in the web logs of these two sites. While we do find that deterministic components (i.e., the well-known "weekend effect") are significant components of these time series, we do not find evidence of chaotic behavior. Predictive modeling experiments contrasting heavy-tailed with deterministic models showed that both approaches were equally effective in modeling our datasets.
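A minimal illustration of the deterministic component discussed above: on a synthetic daily request series with a built-in weekend effect, a simple day-of-week model predicts far better than an overall mean. This reflects only the flavor of a deterministic baseline, not the paper's nonlinear time series machinery.

```python
# Sketch: separating the deterministic weekly component ("weekend effect") from a
# daily request-count series. The series is synthetic; real server logs would replace it.
import numpy as np

rng = np.random.default_rng(7)
days = np.arange(364)
weekday = days % 7
base = np.where(weekday < 5, 100_000, 40_000)            # weekday vs. weekend level
requests = base * rng.lognormal(0, 0.15, days.size)      # multiplicative noise

# Deterministic model: predict each day by the mean of its day-of-week in the training window.
train, test = slice(0, 280), slice(280, 364)
dow_mean = np.array([requests[train][weekday[train] == d].mean() for d in range(7)])
pred = dow_mean[weekday[test]]
naive = np.full(test.stop - test.start, requests[train].mean())

def mape(actual, predicted):
    return np.mean(np.abs(actual - predicted) / actual)

print("day-of-week model MAPE:", mape(requests[test], pred))
print("overall-mean model MAPE:", mape(requests[test], naive))
```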