2009, ACM Transactions on Algorithms
We study an optimization problem that arises in the context of data placement in multimedia storage systems. We are given a collection of M multimedia data objects that need to be assigned to a storage system consisting of N disks d_1, d_2, ..., d_N. We are also given sets U_1, U_2, ..., U_M such that U_i is the set of clients requesting the ith data object. Each disk d_j is characterized by two parameters, namely, its storage capacity C_j, which indicates the maximum number of data objects that may be assigned to it, and its load capacity L_j, which indicates the maximum number of clients that it can serve. The goal is to find a placement of data objects on disks and an assignment of clients to disks so as to maximize the total number of clients served, subject to the capacity constraints of the storage system. We study this data placement problem for two natural classes of storage systems, namely, homogeneous and uniform-ratio. Our first main result is a tight upper and lower bound on the number of items that can always be packed for any input instance on homogeneous as well as uniform-ratio storage systems. We show that an algorithm given in [11] for data placement achieves this bound. Our second main result is a polynomial-time approximation scheme for the data placement problem in homogeneous and uniform-ratio storage systems, answering an open question of [11]. Finally, we also study the problem from an empirical perspective.
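For concreteness, the placement problem can be written as an integer program; this is a sketch in the abstract's notation, with illustrative variables introduced here: x_{ij} indicates that object i is stored on disk d_j, and y_{ij} counts the clients of U_i served by d_j.

\[
\max \sum_{i=1}^{M}\sum_{j=1}^{N} y_{ij}
\quad \text{s.t.} \quad
\sum_{i=1}^{M} x_{ij} \le C_j,\;\;
\sum_{i=1}^{M} y_{ij} \le L_j \;\;\forall j, \qquad
\sum_{j=1}^{N} y_{ij} \le |U_i|,\;\;
y_{ij} \le |U_i|\, x_{ij} \;\;\forall i,j,
\]

with x_{ij} ∈ {0, 1} and y_{ij} a non-negative integer. In this line of work, a homogeneous system has identical disks, and a uniform-ratio system has the same C_j/L_j ratio across disks.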
SIAM Journal on Computing, 2008
We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where the nodes may both issue requests for data objects and have capacity for storing data objects, so as to minimize the average data-access cost. We introduce the data placement problem to model this problem. We have a set of caches F, a set of clients D, and a set of data objects O. Each cache i can store at most u_i data objects. Each client j ∈ D has demand d_j for a specific data object o(j) ∈ O and has to be assigned to a cache that stores that object. Storing an object o in cache i incurs a storage cost of f_i^o, and assigning client j to cache i incurs an access cost of d_j c_{ij}. The goal is to find a placement of the data objects to caches respecting the capacity constraints, and an assignment of clients to caches, so as to minimize the total storage and client access costs. We present a 10-approximation algorithm for this problem. Our algorithm is based on rounding an optimal solution to a natural LP-relaxation of the problem. One of the main technical challenges encountered during rounding is to preserve the cache capacities while incurring only a constant-factor increase in the solution cost. We also introduce the connected data placement problem, to capture settings where write-requests are also issued for data objects, so that one requires a mechanism to maintain consistency of data. We model this by requiring that all caches containing a given object be connected by a Steiner tree to a root for that object, which issues a multicast-message upon a write to (any copy of) that object. The total cost now includes the cost of these Steiner trees. We devise a 14-approximation algorithm for this problem. We show that our algorithms can be adapted to handle two variants of the problem: (a) a k-median variant, where there is a specified bound on the number of caches that may contain a given object; (b) a generalization where objects have lengths and the total length of the objects stored in any cache must not exceed its capacity.
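The natural LP relaxation mentioned in the abstract can be sketched as follows (the variable names are assumptions: y_i^o is the fractional extent to which object o is stored at cache i, and x_{ij} the extent to which client j is assigned to cache i):

\[
\min \;\sum_{i \in F}\sum_{o \in O} f_i^o\, y_i^o \;+\; \sum_{j \in D}\sum_{i \in F} d_j c_{ij}\, x_{ij}
\]
\[
\text{s.t.} \quad
\sum_{i \in F} x_{ij} = 1 \;\;\forall j \in D, \qquad
x_{ij} \le y_i^{o(j)} \;\;\forall i, j, \qquad
\sum_{o \in O} y_i^o \le u_i \;\;\forall i \in F, \qquad
x, y \ge 0.
\]

The rounding challenge the abstract refers to is turning a fractional y into an integral placement that still respects the u_i bounds while losing only a constant factor in this objective.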
We study approximation algorithms for placing replicated data in arbitrary networks. Consider a network of nodes with individual storage capacities and a metric communication cost function, in which each node periodically issues a request for an object drawn from a collection of uniform-length objects. We consider the problem of placing copies of the objects among the nodes such that the average access cost is minimized. Our main result is a polynomial-time constant-factor approximation algorithm for this placement problem. Our algorithm is based on a careful rounding of a linear programming relaxation of the problem. We also show that the data placement problem is MAXSNP-hard. We extend our approximation result to a generalization of the data placement problem that models additional costs such as the cost of realizing the placement. We also show that when object lengths are non-uniform, a constant-factor approximation is achievable if the capacity at each node in the approximate solution is allowed to exceed that in the optimal solution by the length of the largest object.
2006 Proceedings of the Eighth Workshop on Algorithm Engineering and Experiments (ALENEX), 2006
1999
In this report, we look at the problem of efficiently packing a number of arrays in memory. This is known as the dynamic storage allocation problem (DSA), and it is known to be NP-complete. We develop some simple, polynomial-time approximation algorithms, the best of which achieves a bound of 4 for a subclass of DSA instances. We report on an extensive experimental study of the FirstFit heuristic and show that its average-case performance on random instances is within 7% of the optimal value.
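As a concrete reference point, here is a minimal sketch of a FirstFit heuristic for DSA of the kind studied in the report (the (start, end, size) input format is an assumption; the report's exact variant may differ):

```python
def first_fit_dsa(items):
    """items: list of (start, end, size); an item occupies addresses
    [base, base+size) during the half-open time interval [start, end).
    Each item is placed at the lowest address that is free for its
    entire lifetime."""
    placed = []                     # (start, end, base, size) so far
    bases = []
    for s, e, size in items:
        # address ranges busy at some time overlapping [s, e)
        busy = sorted((b, b + sz) for ps, pe, b, sz in placed
                      if ps < e and s < pe)
        base = 0
        for lo, hi in busy:
            if base + size <= lo:   # fits in the gap below this range
                break
            base = max(base, hi)    # otherwise jump past it
        placed.append((s, e, base, size))
        bases.append(base)
    return bases

# e.g. first_fit_dsa([(0, 4, 2), (1, 3, 3), (2, 6, 2)]) -> [0, 2, 5]
```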
IEEE Transactions on Knowledge and Data Engineering, 2000
The problem of optimally placing data on disks (ODP) to maximize disk-access performance has long been recognized as important. Solutions to this problem have been reported for some widely available disk technologies, such as magnetic CAV and optical CLV disks. However, important new technologies, such as multizoned magnetic disks, have recently been introduced. For such technologies no formal solution to the ODP problem has been reported. In this paper, we first identify the fundamental characteristics of disk-device technologies which influence the solution to the ODP problem. We develop a comprehensive solution to the problem that covers all currently available disk technologies. We show how our comprehensive solution can be reduced to the solutions for existing disk technologies, thus contributing a solution to the ODP problem for multizoned disks. Our analytical solution has been validated through simulations and through its reduction to the known solutions for particular disks. Finally, we study how the solution for multizoned disks is affected by the disk and data characteristics.
2013 IEEE 4th International Conference on Cognitive Infocommunications (CogInfoCom), 2013
We consider the problem of distributing media files for streaming on a distributed storage network, where servers have heterogeneous capacities and bandwidths. On the network side, the servers' bandwidths are the bottleneck for streaming. We present an algorithm that computes an assignment of n files to m servers such that the streaming-speed requirements and capacity constraints are kept. As an additional feature, this assignment algorithm works online, i.e., it can assign each file without knowledge of the files to be stored later. Our algorithm computes the data assignment in time O(nm + m log m), outperforming linear-program solvers.
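The paper's own algorithm is not spelled out in the abstract; as an illustration of an online assignment that respects both constraints, here is a simple greedy sketch (the best-fit rule and all names are assumptions, not the paper's method):

```python
def assign_online(files, servers):
    """files: iterable of (size, bitrate), arriving one at a time.
    servers: list of [capacity, bandwidth] remaining budgets.
    Greedily places each file on the feasible server with the most
    spare bandwidth; returns the chosen server index per file (or None)."""
    assignment = []
    for size, rate in files:
        best = None
        for j, (cap, bw) in enumerate(servers):
            if cap >= size and bw >= rate:            # feasible?
                if best is None or bw > servers[best][1]:
                    best = j
        if best is not None:
            servers[best][0] -= size                  # commit the file
            servers[best][1] -= rate
        assignment.append(best)
    return assignment
```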
Theoretical Computer Science, 2012
Video-on-Demand (VoD) services require frequent updates to the file configuration on the storage subsystem, so as to keep up with the frequent changes in movie popularity. This defines a natural reconfiguration problem in which the goal is to minimize the cost of moving from one file configuration to another, where the cost is incurred by the file replications performed throughout the transition. The problem also shows up in production planning, preemptive scheduling with setup costs, and dynamic placement of Web applications. We show that the reconfiguration problem is NP-hard already on very restricted instances. We then develop algorithms which achieve the optimal cost by using servers whose load capacities are increased by O(1): in particular, by a factor of 1 + δ for any small 0 < δ < 1 when the number of servers is fixed, and by a factor of 2 + ε for an arbitrary number of servers, for some ε ∈ [0, 1). To the best of our knowledge, this particular variant of the data migration problem is studied here for the first time.
2007
In this thesis we address three problems related to self-management of storage networks: data placement, data reconfiguration, and data monitoring. Examples of such storage networks include centrally managed systems like Storage Area Networks and Network Attached Storage devices, or even highly distributed systems like a P2P network or a Sensor Network. One of the crucial functions of a storage system is that of deciding the placement of data within the system. This data placement depends on the demand pattern for the data and is subject to the constraints of the storage system. For instance, if a particular data item is very popular, the storage system might want to host it on a disk with high bandwidth or make multiple copies of the item. We present new results for some of these data placement problems.
Computer Networks, 2002
The delivery of large files to individual users, such as video on demand or application programs for the envisioned network computers, is expected by many to be one of the main tasks of broadband communication networks. This requires high bandwidth capacity as well as fast and dense storage servers, which motivates multimedia service providers to optimize both the delivery network and the electronic content allocation.
Journal of Computer and System Sciences, 2006
The effectiveness of a distributed system hinges on the manner in which tasks and data are assigned to the underlying system resources. Moreover, today's large-scale distributed systems must accommodate heterogeneity in both the offered load and in the makeup of the available storage and compute capacity. The ideal resource assignment must balance the utilization of the underlying system against the loss of locality incurred when individual tasks or data objects are fragmented among several servers. In this paper we describe this locality-maximizing placement problem and show that finding an optimal solution is NP-hard. We then describe a polynomial-time algorithm that generates a placement within an additive constant of two of optimal.
2005
The addition of storage capacity in network nodes for the caching or replication of popular data objects results in reduced end-user delay, reduced network traffic, and improved scalability. The problem of allocating an available storage budget to the nodes of a hierarchical content distribution system is formulated; optimal algorithms, as well as fast/efficient heuristics, are developed for its solution. An innovative aspect of the presented approach is that it combines all relevant subproblems, concerning node locations, node sizes, and object placement, and solves them jointly in a single optimization step. The developed algorithms may be utilized in content distribution networks that employ either replication or caching/replacement.
Very Large Data Bases, 1997
Recently, technological advances have resulted in the wide availability of commercial products offering near-line, robot-based, tertiary storage libraries. Thus, such libraries have become a crucial component of modern large-scale storage servers, given the very large storage requirements of modern applications. Although the subject of optimal data placement (ODP) strategies has received considerable attention for other …
Parallel and Distributed Processing Techniques and Applications, 2000
Media-On-Demand (MOD) servers cater to users' needs for data and information such as news, movies, interactive games, music, and merchandise catalogs. This requires the storage, management, and delivery of huge amounts of multimedia data. One model of a MOD server is a general network of computers. Some of the nodes in the network are storage nodes, which contain data repositories, and the others are interface nodes, which obtain data from storage nodes and deliver them to the users (clients) in a timely fashion. Determining the locations of the storage nodes so as to minimize the data traffic between them and the interface nodes is a global optimization problem. This paper presents an offline heuristic for choosing the locations of storage nodes and for clustering interface nodes with storage nodes so as to minimize the traffic from the storage nodes. The proposed algorithm has been validated by simulations.
Proceedings International Parallel and Distributed Processing Symposium, 2003
As storage systems scale to thousands of disks, data distribution and load balancing become increasingly important. We present an algorithm for allocating data objects to disks as a system grows from a few disks to hundreds or thousands. A client using our algorithm can locate a data object in microseconds without consulting a central server or maintaining a full mapping of objects or buckets to disks. Despite requiring little global configuration data, our algorithm is probabilistically optimal in both distributing data evenly and minimizing data movement when new storage is added to the system. Moreover, our algorithm supports weighted allocation and variable levels of object replication, both of which are needed to permit systems to grow efficiently while accommodating new technology.
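The abstract does not give the mapping itself; weighted rendezvous hashing (used here purely as an illustrative stand-in, not the paper's algorithm) shows the general idea of a client locating an object locally, with weighting and replication, and with limited movement when disks are added:

```python
import hashlib
import math

def locate(obj_id, disks, replicas=2):
    """disks: list of (disk_id, weight). Returns `replicas` distinct disks.
    Every client ranks disks by a deterministic weighted hash score, so
    no central server or full object->disk map is needed; adding a disk
    relocates only the objects that now rank it first."""
    def score(disk_id, weight):
        h = hashlib.sha256(f"{obj_id}:{disk_id}".encode()).digest()
        u = (int.from_bytes(h[:8], "big") + 0.5) / 2**64   # uniform in (0,1)
        return -weight / math.log(u)                        # weighted draw
    ranked = sorted(disks, key=lambda d: score(*d), reverse=True)
    return [disk_id for disk_id, _ in ranked[:replicas]]

# e.g. locate("obj42", [("d1", 1.0), ("d2", 2.0), ("d3", 1.0)])
```

Taking the top-k scores yields k distinct devices, so replication falls out of the ranking for free.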
The Journal of Supercomputing, 2009
In this paper we investigate the composition of cheap network storage resources to meet specific availability and capacity requirements. We show that the problem of finding the optimal composition for availability and price requirements can be reduced to the knapsack problem, and we propose three techniques for efficiently finding approximate solutions. The first algorithm uses a dynamic programming approach to find mirrored storage resources for high availability requirements, and runs in pseudopolynomial O(n^2 c) time, where n is the number of sellers' resources to choose from and c is a capacity function of the requested and minimum availability. The second technique is a heuristic which finds resources to be agglomerated into a larger coherent resource, with complexity O(n log n). The third technique finds a compromise between capacity and availability (which in our phrasing is a complex integer-programming problem) using a genetic algorithm. The algorithms can be implemented on a broker that intermediates between buyers and sellers of storage resources. Finally, we show that a broker in an open storage market, using the combination of the three algorithms, can more frequently meet user requests and can lower the cost of the requests that are met, compared to a broker that simply matches single resources to requests.
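The abstract leaves the DP's details open; as a sketch of the knapsack reduction for the mirroring case (the resource model, the log transform, and the discretization grid are all assumptions), taking logs of the unavailabilities turns the availability target into a covering constraint:

```python
import math

def cheapest_mirror_set(resources, target_avail, grid=1000):
    """resources: list of (price, availability) with availability < 1.
    Mirroring a set gives availability 1 - prod(1 - a_i); requiring this
    to reach target_avail becomes, after taking logs, a min-cost covering
    knapsack over weights -log(1 - a_i), solved by pseudopolynomial DP."""
    need = -math.log(1.0 - target_avail)
    W = int(need * grid) + 1                    # discretized target weight
    INF = float("inf")
    best = [0.0] + [INF] * W                    # best[t]: min price for weight t
    for price, a in resources:
        w = int(-math.log(1.0 - a) * grid)
        for t in range(W, -1, -1):              # 0/1 item, scan downwards
            if best[t] < INF:
                nt = min(W, t + w)
                best[nt] = min(best[nt], best[t] + price)
    return best[W] if best[W] < INF else None   # None: target unreachable
```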
1996
A Video-on-Demand (VOD) server needs to store hundreds of movie titles and to support thousands of concurrent accesses. This, technically and economically, imposes a great challenge on the design of the disk storage subsystem of a VOD server. Due to different demands for different movie titles, the numbers of concurrent accesses to different movie titles can differ a lot. We define the access profile as the number of concurrent accesses to each movie title that should be supported by a VOD server. The access profile is derived from the popularity of each movie title and thus serves as a major design goal for the disk storage subsystem. Since some popular (hot) movie titles may be concurrently accessed by hundreds of users while a current high-end magnetic disk array can only support tens of concurrent accesses, it is necessary to replicate and/or stripe the hot movie files over multiple disk arrays. The consequence of replication and striping for hot movie titles is a potential increase in the required number of disk arrays. Therefore, how to replicate, stripe, and place the movie files over a minimum number of magnetic disk arrays such that a given access profile can be supported is an important problem. In this paper, we formulate the problem of video file allocation over disk arrays, demonstrate that it is an NP-hard problem, and present heuristic algorithms to find near-optimal solutions. The result of this study can be applied to the design of the storage subsystem of a VOD server to economically minimize cost or maximize the utilization of disk arrays.
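As a one-line illustration of the replication arithmetic behind this (numbers hypothetical): a hot title whose access profile demands a_i concurrent streams, on arrays that each sustain s streams, needs at least

\[
\left\lceil \frac{a_i}{s} \right\rceil \;=\; \left\lceil \frac{200}{30} \right\rceil \;=\; 7
\]

replicas or stripe components, which is what drives up the required number of disk arrays.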
2010 IEEE International Symposium on Information Theory, 2010
We consider the problem of distributing a file in a network of storage nodes whose total storage budget is limited but at least equals the size of the file. We first generate T encoded symbols (from the file), which are then distributed among the nodes. We investigate the optimal allocation of the T encoded packets to the storage nodes such that the probability of reconstructing the file using any r out of n nodes is maximized. Since the optimal allocation of encoded packets is difficult to find in general, we find another objective function which well approximates the original problem and yet is easier to optimize. We find the optimal symmetric allocation for all coding-redundancy constraints using the equivalent approximate problem. We also investigate the optimal allocation in random graphs. Finally, we provide simulations to verify the theoretical results.
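Under the model the abstract describes, with the file size normalized to 1 and a symmetric allocation spreading the budget T equally over m of the n nodes (each chosen node stores T/m), recovery from r uniformly random nodes succeeds when at least ⌈m/T⌉ loaded nodes are hit. The objective being maximized is therefore a hypergeometric tail (a sketch, with m introduced here as the support size):

\[
P_{\mathrm{succ}}(m) \;=\; \sum_{i=\lceil m/T \rceil}^{\min(r,\,m)} \frac{\binom{m}{i}\binom{n-m}{r-i}}{\binom{n}{r}},
\]

and the optimal symmetric allocation picks the m that maximizes this.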
2015 IEEE Global Communications Conference (GLOBECOM), 2014
In this paper we consider distributed allocation problems with memory constraints. Firstly, we propose a tractable relaxation to the problem of optimal symmetric allocations from [1]. The approximated problem is based on the Q-error function, and its solution approaches the solution of the initial problem as the number of storage nodes in the network grows. Secondly, exploiting this relaxation, we are able to formulate and solve the storage-allocation problem for memory-limited DSSs with arbitrary memory profiles. Finally, we discuss the extension to the case of multiple data objects stored in the DSS.
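A sketch of the kind of relaxation described, reusing the notation from the previous abstract (the exact form used in the paper may differ): replacing the hypergeometric tail with its Gaussian approximation gives the smooth surrogate

\[
P_{\mathrm{succ}}(m) \;\approx\; Q\!\left(\frac{\lceil m/T \rceil - r\,m/n}{\sqrt{\,r\,\frac{m}{n}\left(1-\frac{m}{n}\right)\frac{n-r}{n-1}\,}}\right),
\]

where Q is the standard normal tail function; the approximation improves as the number of storage nodes grows, and the surrogate is far easier to optimize over memory profiles.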
Journal of Computer and System Sciences, 1989
The design and analysis of algorithms for on-line dynamic storage allocation has been a fundamental problem area in computer science for many years. In this paper we study the stochastic behavior of dynamic allocation algorithms under the natural assumption that files enter and leave the system according to a Poisson process. In particular, we prove that for any dynamic allocation algorithm and any distribution of file sizes, the expected wasted space (or fragmentation) in the system at any time is Ω(√N), where N is the expected number of items (or used space) in the system. This result is known to be tight in the special case when all files have the same size. More importantly, we also construct a dynamic allocation algorithm which for any distribution of file sizes wastes only O(√N log^{3/4} N) space with very high probability. This bound is also shown to be tight for a wide variety of file-size distributions, including, for example, the uniform and normal distributions. The results are significant because they show that the cumulative wasted space in the holes formed by the continual arrival and departure of items is a vanishingly small portion of the used space, at least on the average. This fact is in striking contrast with Knuth's well-known 50% rule, which states that the number of these holes is linear in the used space. Moreover, the proof techniques establish a surprising connection between stochastic processes, such as dynamic allocation, and static problems such as bin packing and planar matching. We suspect that the techniques will also prove useful in analyzing other stochastic processes which might otherwise prove intractable. Lastly, we present experimental data in support of the theoretical proofs, and as a basis for postulating several conjectures. In the dynamic storage allocation setting, items (records, files, etc.) of varying sizes enter and leave a storage device in a sequence not known in advance. The storage device is represented by a set of consecutive locations or addresses. At its time of arrival, an item is allocated storage space consisting of a contiguous sequence of unoccupied locations equal in length to the item's size. Some time later, the item may depart, thereby making its space available to other items yet to arrive. An item cannot be moved prior to departure, and thus wasted space builds up over time in the form of interior holes alternating with regions of occupied space. This phenomenon is commonly known as fragmentation. The basic problem is to design algorithms which minimize, at least on the average, the cumulative wasted space in the interior holes.
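In the spirit of the paper's experiments, a toy simulation sketch (all parameters hypothetical): items arrive and depart at random, each arrival is placed first-fit at the lowest feasible address, and the wasted space is the occupied region's extent minus the space actually in use:

```python
import random

def average_waste(steps=20_000, mean_items=500, max_size=50):
    """Simulates first-fit allocation under random arrivals/departures and
    returns the average wasted space (holes below the topmost item)."""
    live = {}                                   # id -> (base, size)
    next_id, waste = 0, 0.0
    for _ in range(steps):
        if not live or random.random() < mean_items / (mean_items + len(live)):
            size = random.randint(1, max_size)  # arrival: first-fit placement
            base = 0
            for b, s in sorted(live.values()):
                if base + size <= b:
                    break                        # fits below this item
                base = max(base, b + s)
            live[next_id] = (base, size)
            next_id += 1
        else:
            live.pop(random.choice(list(live)))  # departure of a random item
        used = sum(s for _, s in live.values())
        top = max((b + s for b, s in live.values()), default=0)
        waste += top - used
    return waste / steps
```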
2007
We present a randomized block-level storage virtualization for arbitrary heterogeneous storage systems that can distribute data in a fair and redundant way and can adapt this distribution in an efficient way as storage devices enter or leave the system. More precisely, our virtualization strategies can distribute a set of data blocks among a set of storage devices of arbitrary non-uniform capacities so that a storage device representing x% of the capacity in the system will get x% of the data (as long as this is in principle possible), and the different copies of each data block are stored so that no two copies of a data block are located on the same device. Achieving these two properties is not easy, and no virtualization strategy has been presented so far that has been formally shown to satisfy fairness and redundancy while being time- and space-efficient and allowing an efficient adaptation to a changing set of devices.