2008, 2008 10th IEEE International Conference on High Performance Computing and Communications
One characteristic of current web services is that many clients request the same or a similar service from a group of replicated servers, e.g. music or movie downloading in peer-to-peer networks. Most of the time, the servers are heterogeneous in terms of service rate. Much research has been done in the homogeneous setting, but little on the heterogeneous scenario. It is important and urgent to have models of heterogeneous server groups for the design and analysis of current Internet applications. In this paper, we deploy an approximation method to transform a heterogeneous system into a group of homogeneous systems, so that previous results from homogeneous studies can be applied to heterogeneous cases. To test how closely the proposed model approximates real applications, we conducted simulations to measure the degree of similarity, using two common strategies: a random selection algorithm and a First-Come-First-Serve (FCFS) algorithm. The simulations indicate that the approximation model works well.
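The transformation itself is not reproduced in the abstract; as a hedged illustration of the general idea, one first-order way to map a heterogeneous server group onto a homogeneous one is to preserve the aggregate service capacity (the function name and example rates below are ours, not the paper's):

```python
def homogenize(rates):
    """Map a heterogeneous server group onto an equal-rate group of the
    same size that preserves the aggregate service capacity. A sketch of
    the general idea only; the paper's actual transformation may differ."""
    k = len(rates)
    mu = sum(rates) / k  # common per-server rate
    return [mu] * k

# Hypothetical service rates for three heterogeneous servers.
hetero = [1.0, 2.0, 5.0]
homo = homogenize(hetero)
assert abs(sum(homo) - sum(hetero)) < 1e-9  # total capacity preserved
```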
Cybernetics and Systems Analysis, 2020
The mathematical model of a queueing system with heterogeneous servers, without queues, and with two types of requests is investigated. High-priority requests are processed by fast servers, while low-priority requests are processed by slow servers. If all servers in one group are busy, requests may be reassigned to the other group. Reassignment is based on random schemes, and the reassignment probability depends on the number of busy servers in the appropriate group. Exact and approximate methods are developed for the analysis of the characteristics of the system, and explicit formulas for calculating approximate values of these characteristics are proposed.
ACM Transactions on the Web, 2007
Since many Internet applications employ a multitier architecture, in this article, we focus on the problem of analytically modeling the behavior of such applications. We present a model based on a network of queues where the queues represent different tiers of the application. Our model is sufficiently general to capture (i) the behavior of tiers with significantly different performance characteristics and (ii) application idiosyncrasies such as session-based workloads, tier replication, load imbalances across replicas, and caching at intermediate tiers. We validate our model using real multitier applications running on a Linux server cluster. Our experiments indicate that our model faithfully captures the performance of these applications for a number of workloads and configurations. Furthermore, our model successfully handles a comprehensive range of resource utilization-from 0 to near saturation for the CPU-for two separate tiers. For a variety of scenarios, including those with caching at one of the application tiers, the average response times predicted by our model were within the 95% confidence intervals of the observed average response times. Our experiments also demonstrate the utility of the model for dynamic capacity provisioning, performance prediction, bottleneck identification, and session policing. In one scenario, where the request arrival rate increased from less than 1500 to nearly 4200 requests/minute, a dynamic provisioning technique employing our model was able to maintain response time targets by increasing the capacity of two of the tiers by factors of 2 and 3.5, respectively.
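A minimal sketch of the kind of closed queueing-network model this abstract describes, using standard Mean Value Analysis over one queue per tier (the service demands below are illustrative; the authors' actual model also handles sessions, replication, and caching):

```python
def mva(service_demands, n_customers):
    """Exact Mean Value Analysis for a closed product-form network with
    one queue per tier (no think time assumed). Returns system throughput
    and per-tier mean residence times at the given population."""
    q = [0.0] * len(service_demands)  # mean queue length per tier
    for n in range(1, n_customers + 1):
        # Residence time at a tier: service demand times (1 + jobs already there).
        r = [d * (1 + qk) for d, qk in zip(service_demands, q)]
        x = n / sum(r)                 # system throughput
        q = [x * rk for rk in r]       # Little's law applied per tier
    return x, r

# Hypothetical two-tier system: web tier 5 ms, database tier 20 ms per request.
throughput, residence = mva([0.005, 0.020], n_customers=50)
```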
Internet Technologies, Applications and Societal Impact, 2002
We present a model of Web servers in terms of a G/G/m/N service system with autocorrelated (self-similar) input traffic. Diffusion approximation enables the introduction of time-varying input and the analysis of transient states.
Replicated services that can scale dynamically are able to adapt to the request load. Choosing the right number of replicas is fundamental to avoid performance degradation when input spikes occur and to save resources when the load is low. Current mechanisms for automatic scaling are mostly based on fixed thresholds on CPU and memory usage, which are not sufficiently accurate and often entail late countermeasures. We propose Make Your Service Elastic (MYSE), an architecture for automatic scaling of generic replicated services based on queuing models for accurate response time estimation. Request and service-time patterns are analyzed to learn and predict their distributions over time, so as to allow for early scaling. A novel heuristic is proposed to avoid the flipping phenomenon. We carried out simulations that show promising results concerning the effectiveness of our approach.
Proceedings of the 2nd international workshop …, 2000
In this paper, we illustrate a model-based approach to Web server performance evaluation, and present an analytic queueing model of Web servers in distributed environments. Performance predictions from the analytic model match well with the performance observed from simulation. The model forms an excellent basis for a decision support tool to allow system architects to predict the behavior of new systems prior to deployment, or existing systems under new workload scenarios.
Performance Evaluation, 2007
Join the Shortest Queue (JSQ) is a popular routing policy for server farms. However, until now all analysis of JSQ has been limited to First-Come-First-Serve (FCFS) server farms, whereas it is known that web server farms are better modeled as Processor Sharing (PS) server farms. We provide the first approximate analysis of JSQ in the PS server farm model for general job-size distributions, obtaining the distribution of queue length at each queue. To do this, we approximate the queue length of each queue in the server farm by a one-dimensional Markov chain, in a novel fashion. We also discover some interesting insensitivity properties of PS server farms with JSQ routing, and discuss the near-optimality of JSQ.
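As a hedged sketch (not the paper's analysis), the JSQ routing rule itself is simple to state in code; under PS each of the q jobs at a server receives a 1/q share of its capacity, so JSQ needs only the queue lengths, not the job sizes:

```python
import random

def jsq(queue_lengths):
    """Join-the-Shortest-Queue: send an arrival to the server with the
    fewest jobs in service, breaking ties uniformly at random."""
    shortest = min(queue_lengths)
    candidates = [i for i, q in enumerate(queue_lengths) if q == shortest]
    return random.choice(candidates)

# Servers 1 and 3 tie for the shortest queue; one of them is chosen.
choice = jsq([3, 1, 4, 1])
assert choice in (1, 3)
```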
International Journal of System Dynamics Applications, 2014
Load balancing applications introduce delays due to load relocation among web servers; these delays depend on the design of the balancing algorithm and the resources to be shared in large, wide-area applications. The performance of web servers depends on efficient sharing of resources and can be evaluated by the overall completion time of the tasks under the load balancing algorithm. Each load balancing algorithm introduces delay in task allocation among the web servers, yet still improves web server performance dynamically. As a result, the queue length of a web server and the average waiting time of tasks decrease at load balancing instants under zero, deterministic, and random types of delay. In this paper, the effects of delay due to load balancing are analyzed based on two factors: the average queue length and the average waiting time of tasks. In the proposed Ratio Factor Based Delay Model (RFBDM), the above factors are minimized and improved the...
Approximation algorithms have been used to design polynomial-time algorithms for intractable problems, providing solutions within a bounded factor of the optimal solution. The load balancing problem on a Heterogeneous Distributed Computing System (HDCS) deals with the allocation of tasks to computing nodes so that the nodes are evenly loaded. Load-balancing algorithms attempt to compute an assignment with the smallest possible makespan (i.e., the completion time at the most heavily loaded computing node). The load balancing problem is NP-hard. This paper presents an analysis of approximation algorithms based on task and machine heterogeneity, expressed through an ETC matrix, on Heterogeneous Distributed Computing Systems with makespan as the performance metric.
We consider the job assignment problem in a multi-server system consisting of N parallel processor-sharing servers, categorized into M (≪ N) different types according to their processing capacity or speed. Jobs of random sizes arrive at the system according to a Poisson process with rate Nλ. Upon each arrival, a small number of servers from each type are sampled uniformly at random. The job is then assigned to one of the sampled servers based on a selection rule. We propose two schemes, each corresponding to a specific selection rule, that aim at reducing the mean sojourn time of jobs in the system.
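The abstract does not give the two selection rules themselves; the sketch below shows the general shape of such a scheme, sampling d servers per type and picking the sampled server with the smallest load-to-speed ratio as a rough proxy for expected sojourn time. This particular rule is our assumption, not necessarily either of the paper's:

```python
import random

def sample_per_type(servers_by_type, d=2):
    """Sample d servers uniformly at random from each server type."""
    sampled = []
    for servers in servers_by_type.values():
        sampled.extend(random.sample(servers, d))
    return sampled

def select(sampled, loads, speeds):
    """Pick the sampled server minimizing load/speed, a rough proxy for
    expected sojourn time under processor sharing (illustrative rule)."""
    return min(sampled, key=lambda i: loads[i] / speeds[i])

# Hypothetical system: type A = fast servers 0-1, type B = slow servers 2-3.
by_type = {"A": [0, 1], "B": [2, 3]}
loads = [4, 1, 6, 2]
speeds = [2.0, 2.0, 1.0, 1.0]
server = select(sample_per_type(by_type, d=2), loads, speeds)
```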
2004
The accuracy of self-similar processes that are widely used to model Web server systems is evaluated using simulation. Specifically, we consider two processes with self-similarity from a single origin: either with a realistic self-similar arrival process or with a realistic service-time distribution, but not both. A detailed examination and comparison of these processes is presented, together with conclusions regarding the scenarios in which one process outperforms the other. The main results of the simulation are that when the system has medium or low utilization levels, both processes fail to estimate the realistic maximum number of clients in the system and the realistic average response time.
Proceedings 42nd IEEE Symposium on Foundations of Computer Science, 2001
Given the increasing traffic on the World Wide Web, it is difficult for a single popular Web server to handle the demand from its many clients. By clustering a group of Web servers, it is possible to reduce the origin Web server's load significantly and to reduce users' response time when accessing a Web document. A fundamental question is how to allocate Web documents among these servers so as to balance the load. In this paper, we are given a collection of documents to be stored on a cluster of Web servers. Each server has resource limits on its memory and its number of HTTP connections; each document has an associated size and access cost. The problem is to allocate the documents among the servers so that no server's memory size is exceeded and the load is balanced as equally as possible. We show that most simple formulations of this problem are NP-hard, we establish lower bounds on the value of the optimal load, and we show that if the servers have no memory constraints, then there is an allocation algorithm within a factor of 2 of the optimal solution. We also show that if all servers have the same number of HTTP connections and the same memory size, then a feasible allocation is achieved within a factor of 4 of the optimal solution using at most 4 times the optimal memory size, and we provide improved approximation results for the case where documents are relatively small.
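As a hedged sketch of the memory-unconstrained case, the classic greedy "largest cost first to the least-loaded server" (LPT) heuristic achieves a constant-factor load balance; the paper's own algorithm may differ:

```python
import heapq

def allocate(doc_costs, n_servers):
    """Greedy sketch: place each document, largest access cost first, on
    the currently least-loaded server. This is the classic LPT heuristic,
    not necessarily the paper's allocation algorithm."""
    heap = [(0.0, s) for s in range(n_servers)]  # (load, server id)
    heapq.heapify(heap)
    placement = {}
    for doc, cost in sorted(doc_costs.items(), key=lambda kv: -kv[1]):
        load, s = heapq.heappop(heap)
        placement[doc] = s
        heapq.heappush(heap, (load + cost, s))
    return placement

# Hypothetical documents with access costs; both servers end up with load 7.
placement = allocate({"a": 5, "b": 4, "c": 3, "d": 2}, 2)
```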
Proceedings of the 2005 ACM symposium on Applied computing - SAC '05, 2005
Efficient server selection algorithms reduce retrieval time for objects replicated on different servers and are an important component of Internet cache architectures. This paper empirically evaluates six client-side server selection algorithms. The study compares two statistical algorithms, one using median bandwidth and the other median latency, a dynamic probe algorithm, two hybrid algorithms, and random selection. The server pool includes a topologically dispersed set of United States state government web servers. Experiments were run on three clients in different cities and on different regional networks. The study examines the effects of time-of-day, client resources, and server proximity. Differences in performance highlight the degree of algorithm adaptability and the effect that network upgrades can have on statistical estimators. Dynamic network probing performs as well as or better than the statistical bandwidth algorithm and the two probe-bandwidth hybrid algorithms. The statistical latency algorithm is clearly worse, but does outperform random selection.
ACM SIGMETRICS Performance Evaluation Review, 2005
Since many Internet applications employ a multi-tier architecture, in this paper, we focus on the problem of analytically modeling the behavior of such applications. We present a model based on a network of queues, where the queues represent different tiers of the application. Our model is sufficiently general to capture (i) the behavior of tiers with significantly different performance characteristics and (ii) application idiosyncrasies such as session-based workloads, concurrency limits, and caching at intermediate tiers. We validate our model using real multi-tier applications running on a Linux server cluster. Our experiments indicate that our model faithfully captures the performance of these applications for a number of workloads and configurations. For a variety of scenarios, including those with caching at one of the application tiers, the average response times predicted by our model were within the 95% confidence intervals of the observed average response times. Our experiments also demonstrate the utility of the model for dynamic capacity provisioning, performance prediction, bottleneck identification, and session policing. In one scenario, where the request arrival rate increased from less than 1500 to nearly 4200 requests/min, a dynamic provisioning technique employing our model was able to maintain response time targets by increasing the capacity of two of the application tiers by factors of 2 and 3.5, respectively.
International Journal of Electrical and Computer Engineering (IJECE), 2019
With the rising popularity of web-based applications, cluster-based web servers have become a primary and consistent resource in the infrastructure of the World Wide Web. Particularly for dynamic-content and database-driven applications, and especially under heavy load, managing cluster performance is a serious task. Without efficient mechanisms, an overloaded web server cannot deliver good performance. In clusters, this overload condition can be avoided by load balancing mechanisms that share the load among the available web servers. Existing load balancing mechanisms intended for static content suffer substantial performance degradation under database-driven and dynamic content. The most serviceable load balancing approaches are Web Server Queuing (WSQ), Server Content based Queue (QSC), and Remaining Capacity (RC), which provide better results under specific conditions. Considering this, we propose an approximated web server queuing mechanism for web server clusters, together with an analytical model for calculating the load of a web server. Requests are classified based on service time, and the number of outstanding requests at each web server is tracked to achieve better performance. The approximated load of each web server is used for load balancing. The experimental results illustrate the effectiveness of the proposed mechanism in improving the mean response time, throughput, and drop rate of the server cluster.
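A minimal sketch of the kind of tracking this abstract describes: each server's approximate load is the sum of the estimated service times of its outstanding requests, and new requests go to the least-loaded server. The class name and the estimation scheme are illustrative assumptions, not the paper's:

```python
class LoadTracker:
    """Track approximate per-server load as the summed service-time
    estimates of outstanding requests; dispatch to the least loaded."""

    def __init__(self, n_servers):
        self.load = [0.0] * n_servers

    def dispatch(self, est_service_time):
        # Pick the server with the smallest approximate load.
        server = min(range(len(self.load)), key=self.load.__getitem__)
        self.load[server] += est_service_time
        return server

    def complete(self, server, est_service_time):
        # Remove the request's estimate once it finishes.
        self.load[server] -= est_service_time

# Hypothetical request classes: "heavy" = 2.0, "light" = 0.5 time units.
lt = LoadTracker(2)
assert lt.dispatch(2.0) == 0   # both idle; lowest index wins
assert lt.dispatch(0.5) == 1   # server 0 now loaded
assert lt.dispatch(0.5) == 1   # 2.0 vs 0.5 -> server 1 again
```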
Queueing Systems
Multiserver queueing systems are found at the core of a wide variety of practical systems. Many important multiserver models have a previously unexplained similarity: identical mean response time behavior is empirically observed in the heavy-traffic limit. We explain this similarity for the first time. We do so by introducing the work-conserving finite-skip (WCFS) framework, which encompasses a broad class of important models. This class includes the heterogeneous M/G/k, the limited processor sharing policy for the M/G/1, the threshold parallelism model, and the multiserver-job model under a novel scheduling algorithm. We prove that for all WCFS models, the scaled mean response time E[T](1 − ρ) converges to the same value, E[S²]/(2E[S]), in the heavy-traffic limit, which is also the heavy-traffic limit for the M/G/1/FCFS. Moreover, we prove additively tight bounds on mean response time for the WCFS class, which hold for all loads ρ. For each of the four models mentioned above, our bounds are the first known bounds on mean response time.
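The stated limit agrees with the M/G/1/FCFS heavy-traffic behavior, which can be checked numerically from the Pollaczek–Khinchine formula (the moment values below are illustrative):

```python
def scaled_mean_response(rho, ES, ES2):
    """E[T](1 - rho) for M/G/1/FCFS via Pollaczek-Khinchine:
    E[T] = E[S] + lam * E[S^2] / (2 * (1 - rho)), with lam = rho / E[S]."""
    lam = rho / ES
    ET = ES + lam * ES2 / (2 * (1 - rho))
    return ET * (1 - rho)

# Illustrative first two moments of job size S.
ES, ES2 = 1.0, 3.0
limit = ES2 / (2 * ES)  # predicted heavy-traffic value: 1.5
assert abs(scaled_mean_response(0.999, ES, ES2) - limit) < 0.01
```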
Distributed Systems Engineering
The Layered System Generator is used to create synthetic distributed systems of tasks with client-server style (RPC) interactions, representing a wide range of software architectures and workload patterns. A synthetic task system can be used to generate network and workstation traffic which represents the load from a planned software system, so one can observe its probable performance when run on the target network, or its probable impact on other existing applications. It can be used to evaluate the planned software design, or the target network's capability, or both combined. Using LSG, tests were made with systems of up to 39 tasks on a UNIX network, to investigate the performance changes that occur when a small task system is scaled up in size. The performance recorded across the range of experiments was also compared with predictions made by an analytic performance model. The errors were found to be small provided an allowance is made for workstation daemons and similar load components.

1. Distributed Server Systems

The computing world is moving steadily towards distributed systems running on networks, driven by the need to communicate between applications in different places, by the economics of workstations, and by the opportunity to build reliable systems in this way. Technology to support this is becoming available in the form of remote procedure calls (RPCs), Object Request Brokers such as CORBA [1], and other "midware" such as security servers. The Distributed Computing Environment (DCE) [2] provides a collection of these features, and the Advanced Network Services Architecture (ANSA) is another such collection. These distributed processing frameworks support what we will call a "layered software architecture", with applications in the top layer and service requests descending through the layers, as illustrated in Figure 1. Client-server systems have this structure, with the deeper-layered versions being known as "three-tiered" client-server systems.
This notion of layering is not quite the same as the layers of an operating system or a protocol suite, but it has many resemblances, since lower-level servers tend to offer more generic services, for example file service. We will classify systems in Section 2 by their breadth, depth, and the balance of the workload between the layers. As systems are scaled up, typically their breadth increases, with more clients at the "top" and (perhaps) replication of servers in the middle layers. Sometimes there is an increase in depth also, due to a reorganization of services to divide local services, which can be replicated, from global services, which cannot. This paper uses a narrower idea of scalability and measures it by the ability to get satisfactory performance from a set of tasks as the breadth is increased. A scalable system is one which can be adapted as the number of users is increased, by replicating servers, to give a proportional increase in throughput (or to retain the same performance for each user).
International Symposium on Computer and Information Sciences, 2004
The Quality of Service (QoS) perceived by users is the dominant factor for the success of an Internet-based Web service. Thus, capacity planning has become essential for deploying Web servers. The aim of this study is to understand the key elements in Web server performance. We created a controlled test-bed environment, which allows one to analyze Web server performance in a simple and effective way. In this paper we present an analysis of two different Web server architectures and their performance, with particular focus on discovering bottlenecks.
Journal of Computer and Systems Sciences International, 2019
Markov models of systems with heterogeneous servers, various types of requests, and jump priorities are proposed. It is assumed that there are high- and low-priority requests; high-priority requests are assigned to the server with the high service rate, and low-priority requests are assigned to the server with the low service rate. Models of two types are studied: with separate queues and with a shared queue for requests of different types. Jump priorities determine the rules according to which the type of a low-priority request changes depending on the state of the queue. Methods for calculating the probability distribution of the system states are developed, formulas for the calculation of the characteristics of the system are derived, and the problem of optimizing these characteristics is solved. The results of numerical experiments are presented.