2009, ACM Transactions on Algorithms
We study an optimization problem that arises in the context of data placement in multimedia storage systems. We are given a collection of M multimedia data objects that need to be assigned to a storage system consisting of N disks d_1, d_2, ..., d_N. We are also given sets U_1, U_2, ..., U_M such that U_i is the set of clients requesting the ith data object. Each disk d_j is characterized by two parameters, namely, its storage capacity C_j, which indicates the maximum number of data objects that may be assigned to it, and its load capacity L_j, which indicates the maximum number of clients that it can serve. The goal is to find a placement of data objects on disks and an assignment of clients to disks so as to maximize the total number of clients served, subject to the capacity constraints of the storage system. We study this data placement problem for two natural classes of storage systems, namely, homogeneous and uniform-ratio. Our first main result is a tight upper and lower bound on the number of items that can always be packed for any input instance on homogeneous as well as uniform-ratio storage systems. We show that an algorithm given in [11] for data placement achieves this bound. Our second main result is a polynomial-time approximation scheme for the data placement problem in homogeneous and uniform-ratio storage systems, answering an open question of [11]. Finally, we also study the problem from an empirical perspective.
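For concreteness, the placement problem can be written as an integer program; this is a sketch in the abstract's notation, with illustrative variables introduced here: x_{ij} indicates that object i is stored on disk d_j, and y_{ij} counts the clients of U_i served by d_j.

\[
\max \sum_{i=1}^{M}\sum_{j=1}^{N} y_{ij}
\quad \text{s.t.} \quad
\sum_{i=1}^{M} x_{ij} \le C_j,\;\;
\sum_{i=1}^{M} y_{ij} \le L_j \;\;\forall j, \qquad
\sum_{j=1}^{N} y_{ij} \le |U_i|,\;\;
y_{ij} \le |U_i|\, x_{ij} \;\;\forall i,j,
\]

with x_{ij} ∈ {0, 1} and y_{ij} a non-negative integer. In this line of work, a homogeneous system has identical disks, and a uniform-ratio system has the same C_j/L_j ratio across disks.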
SIAM Journal on Computing, 2008
We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where the nodes may both issue requests for data objects and have capacity for storing data objects, so as to minimize the average data-access cost. We introduce the data placement problem to model this problem. We have a set of caches F, a set of clients D, and a set of data objects O. Each cache i can store at most u_i data objects. Each client j ∈ D has demand d_j for a specific data object o(j) ∈ O and has to be assigned to a cache that stores that object. Storing an object o in cache i incurs a storage cost of f_i^o, and assigning client j to cache i incurs an access cost of d_j c_{ij}. The goal is to find a placement of the data objects to caches respecting the capacity constraints, and an assignment of clients to caches, so as to minimize the total storage and client access costs. We present a 10-approximation algorithm for this problem. Our algorithm is based on rounding an optimal solution to a natural LP-relaxation of the problem. One of the main technical challenges encountered during rounding is to preserve the cache capacities while incurring only a constant-factor increase in the solution cost. We also introduce the connected data placement problem, to capture settings where write-requests are also issued for data objects, so that one requires a mechanism to maintain consistency of data. We model this by requiring that all caches containing a given object be connected by a Steiner tree to a root for that object, which issues a multicast-message upon a write to (any copy of) that object. The total cost now includes the cost of these Steiner trees. We devise a 14-approximation algorithm for this problem. We show that our algorithms can be adapted to handle two variants of the problem: (a) a k-median variant, where there is a specified bound on the number of caches that may contain a given object; (b) a generalization where objects have lengths and the total length of the objects stored in any cache must not exceed its capacity.
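The natural LP relaxation mentioned in the abstract can be sketched as follows (the variable names are assumptions: y_i^o is the fractional extent to which object o is stored at cache i, and x_{ij} the extent to which client j is assigned to cache i):

\[
\min \;\sum_{i \in F}\sum_{o \in O} f_i^o\, y_i^o \;+\; \sum_{j \in D}\sum_{i \in F} d_j c_{ij}\, x_{ij}
\]
\[
\text{s.t.} \quad
\sum_{i \in F} x_{ij} = 1 \;\;\forall j \in D, \qquad
x_{ij} \le y_i^{o(j)} \;\;\forall i, j, \qquad
\sum_{o \in O} y_i^o \le u_i \;\;\forall i \in F, \qquad
x, y \ge 0.
\]

The rounding challenge the abstract refers to is turning a fractional y into an integral placement that still respects the u_i bounds while losing only a constant factor in this objective.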
We study approximation algorithms for placing replicated data in arbitrary networks. Consider a network of nodes with individual storage capacities and a metric communication cost function, in which each node periodically issues a request for an object drawn from a collection of uniform-length objects. We consider the problem of placing copies of the objects among the nodes such that the average access cost is minimized. Our main result is a polynomial-time constant-factor approximation algorithm for this placement problem. Our algorithm is based on a careful rounding of a linear programming relaxation of the problem. We also show that the data placement problem is MAXSNP-hard. We extend our approximation result to a generalization of the data placement problem that models additional costs such as the cost of realizing the placement. We also show that when object lengths are non-uniform, a constant-factor approximation is achievable if the capacity at each node in the approximate solution is allowed to exceed that in the optimal solution by the length of the largest object.
2006 Proceedings of the Eighth Workshop on Algorithm Engineering and Experiments (ALENEX), 2006
1999
In this report, we look at the problem of efficiently packing a number of arrays in memory. This is known as the dynamic storage allocation problem (DSA), and it is known to be NP-complete. We develop some simple, polynomial-time approximation algorithms, the best of which achieves a bound of 4 for a subclass of DSA instances. We report on an extensive experimental study of the FirstFit heuristic and show that its average-case performance on random instances is within 7% of the optimal value.
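As a concrete reference point, here is a minimal sketch of a FirstFit heuristic for DSA of the kind studied in the report (the (start, end, size) input format is an assumption; the report's exact variant may differ):

```python
def first_fit_dsa(items):
    """items: list of (start, end, size); an item occupies addresses
    [base, base+size) during the half-open time interval [start, end).
    Each item is placed at the lowest address that is free for its
    entire lifetime."""
    placed = []                     # (start, end, base, size) so far
    bases = []
    for s, e, size in items:
        # address ranges busy at some time overlapping [s, e)
        busy = sorted((b, b + sz) for ps, pe, b, sz in placed
                      if ps < e and s < pe)
        base = 0
        for lo, hi in busy:
            if base + size <= lo:   # fits in the gap below this range
                break
            base = max(base, hi)    # otherwise jump past it
        placed.append((s, e, base, size))
        bases.append(base)
    return bases

# e.g. first_fit_dsa([(0, 4, 2), (1, 3, 3), (2, 6, 2)]) -> [0, 2, 5]
```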
IEEE Transactions on Knowledge and Data Engineering, 2000
The problem of optimally placing data on disks (ODP) to maximize disk-access performance has long been recognized as important. Solutions to this problem have been reported for some widely available disk technologies, such as magnetic CAV and optical CLV disks. However, important new technologies, such as multizoned magnetic disks, have recently been introduced. For such technologies no formal solution to the ODP problem has been reported. In this paper, we first identify the fundamental characteristics of disk-device technologies which influence the solution to the ODP problem. We develop a comprehensive solution to the problem that covers all currently available disk technologies. We show how our comprehensive solution can be reduced to the solutions for existing disk technologies, thus contributing a solution to the ODP problem for multizoned disks. Our analytical solution has been validated through simulations and through its reduction to the known solutions for particular disks. Finally, we study how the solution for multizoned disks is affected by the disk and data characteristics.
2013 IEEE 4th International Conference on Cognitive Infocommunications (CogInfoCom), 2013
We consider the problem of distributing media files for streaming on a distributed storage network, where servers have heterogeneous capacities and bandwidths. On the network side, the servers' bandwidths are the bottleneck for streaming. We present an algorithm that computes an assignment of n files to m servers such that the streaming-speed requirements and capacity constraints are kept. As an additional feature, this assignment algorithm works online, i.e., it can assign each file without knowledge of the files to be stored later. Our algorithm computes the data assignment in time O(nm + m log m), outperforming linear-program solvers.
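The paper's own algorithm is not spelled out in the abstract; as an illustration of an online assignment that respects both constraints, here is a simple greedy sketch (the best-fit rule and all names are assumptions, not the paper's method):

```python
def assign_online(files, servers):
    """files: iterable of (size, bitrate), arriving one at a time.
    servers: list of [capacity, bandwidth] remaining budgets.
    Greedily places each file on the feasible server with the most
    spare bandwidth; returns the chosen server index per file (or None)."""
    assignment = []
    for size, rate in files:
        best = None
        for j, (cap, bw) in enumerate(servers):
            if cap >= size and bw >= rate:            # feasible?
                if best is None or bw > servers[best][1]:
                    best = j
        if best is not None:
            servers[best][0] -= size                  # commit the file
            servers[best][1] -= rate
        assignment.append(best)
    return assignment
```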
Theoretical Computer Science, 2012
Video-on-Demand (VoD) services require frequent updates to the file configuration on the storage subsystem, so as to keep up with the frequent changes in movie popularity. This defines a natural reconfiguration problem in which the goal is to minimize the cost of moving from one file configuration to another, where the cost is incurred by the file replications performed throughout the transition. The problem also shows up in production planning, preemptive scheduling with setup costs, and dynamic placement of Web applications. We show that the reconfiguration problem is NP-hard already on very restricted instances. We then develop algorithms which achieve the optimal cost by using servers whose load capacities are increased by O(1): in particular, by a factor of 1 + δ for any small 0 < δ < 1 when the number of servers is fixed, and by a factor of 2 + ε for an arbitrary number of servers, for some ε ∈ [0, 1). To the best of our knowledge, this particular variant of the data migration problem is studied here for the first time.
2007
In this thesis we address three problems related to self-management of storage networks: data placement, data reconfiguration, and data monitoring. Examples of such storage networks include centrally managed systems like Storage Area Networks and Network Attached Storage devices, or even highly distributed systems like a P2P network or a Sensor Network. One of the crucial functions of a storage system is that of deciding the placement of data within the system. This data placement depends on the demand pattern for the data and is subject to the constraints of the storage system. For instance, if a particular data item is very popular, the storage system might want to host it on a disk with high bandwidth or make multiple copies of the item. We present new results for some of these data placement problems.
Computer Networks, 2002
The delivery of large files to individual users, such as video on demand or application programs for the envisioned network computers, is expected by many to be one of the main tasks of broadband communication networks. This requires high bandwidth capacity as well as fast and dense storage servers, which motivates multimedia service providers to optimize both the delivery network and the electronic content allocation.
Journal of Computer and System Sciences, 2006
The effectiveness of a distributed system hinges on the manner in which tasks and data are assigned to the underlying system resources. Moreover, today's large-scale distributed systems must accommodate heterogeneity in both the offered load and in the makeup of the available storage and compute capacity. The ideal resource assignment must balance the utilization of the underlying system against the loss of locality incurred when individual tasks or data objects are fragmented among several servers. In this paper we describe this locality-maximizing placement problem and show that finding an optimal solution is NP-hard. We then describe a polynomial-time algorithm that generates a placement within an additive constant of two of optimal.
2005
The addition of storage capacity in network nodes for the caching or replication of popular data objects results in reduced end-user delay, reduced network traffic, and improved scalability. The problem of allocating an available storage budget to the nodes of a hierarchical content distribution system is formulated; optimal algorithms, as well as fast/efficient heuristics, are developed for its solution. An innovative aspect of the presented approach is that it combines all relevant subproblems, concerning node locations, node sizes, and object placement, and solves them jointly in a single optimization step. The developed algorithms may be utilized in content distribution networks that employ either replication or caching/replacement.
Very Large Data Bases, 1997
Recently, technological advances have resulted in the wide availability of commercial products offering near-line, robot-based, tertiary storage libraries. Thus, such libraries have become a crucial component of modern large-scale storage servers, given the very large storage requirements of modern applications. Although the subject of optimal data placement (ODP) strategies has received considerable attention for other …
Parallel and Distributed Processing Techniques and Applications, 2000
Media-On-Demand (MOD) servers cater to users' needs for data and information such as news, movies, interactive games, music, and merchandise catalogs. This requires the storage, management, and delivery of huge amounts of multimedia data. One model of a MOD server is a general network of computers. Some of the nodes in the network are storage nodes, which contain data repositories, and the others are interface nodes, which obtain data from storage nodes and deliver them to the users (clients) in a timely fashion. Determining the locations of the storage nodes so as to minimize the data traffic between them and the interface nodes is a global optimization problem. This paper presents an offline heuristic for choosing the locations of storage nodes and for clustering interface nodes with storage nodes so as to minimize the traffic from the storage nodes. The proposed algorithm has been validated by simulations.
Proceedings International Parallel and Distributed Processing Symposium, 2003
As storage systems scale to thousands of disks, data distribution and load balancing become increasingly important. We present an algorithm for allocating data objects to disks as a system grows from a few disks to hundreds or thousands. A client using our algorithm can locate a data object in microseconds without consulting a central server or maintaining a full mapping of objects or buckets to disks. Despite requiring little global configuration data, our algorithm is probabilistically optimal in both distributing data evenly and minimizing data movement when new storage is added to the system. Moreover, our algorithm supports weighted allocation and variable levels of object replication, both of which are needed to permit systems to grow efficiently while accommodating new technology.
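The abstract does not give the mapping itself; weighted rendezvous hashing (used here purely as an illustrative stand-in, not the paper's algorithm) shows the general idea of a client locating an object locally, with weighting and replication, and with limited movement when disks are added:

```python
import hashlib
import math

def locate(obj_id, disks, replicas=2):
    """disks: list of (disk_id, weight). Returns `replicas` distinct disks.
    Every client ranks disks by a deterministic weighted hash score, so
    no central server or full object->disk map is needed; adding a disk
    relocates only the objects that now rank it first."""
    def score(disk_id, weight):
        h = hashlib.sha256(f"{obj_id}:{disk_id}".encode()).digest()
        u = (int.from_bytes(h[:8], "big") + 0.5) / 2**64   # uniform in (0,1)
        return -weight / math.log(u)                        # weighted draw
    ranked = sorted(disks, key=lambda d: score(*d), reverse=True)
    return [disk_id for disk_id, _ in ranked[:replicas]]

# e.g. locate("obj42", [("d1", 1.0), ("d2", 2.0), ("d3", 1.0)])
```

Taking the top-k scores yields k distinct devices, so replication falls out of the ranking for free.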
The Journal of Supercomputing, 2009
In this paper we investigate the composition of cheap network storage resources to meet specific availability and capacity requirements. We show that the problem of finding the optimal composition for availability and price requirements can be reduced to the knapsack problem, and we propose three techniques for efficiently finding approximate solutions. The first algorithm uses a dynamic programming approach to find mirrored storage resources for high availability requirements, and runs in pseudopolynomial O(n^2 c) time, where n is the number of sellers' resources to choose from and c is a capacity function of the requested and minimum availability. The second technique is a heuristic which finds resources to be agglomerated into a larger coherent resource, with complexity O(n log n). The third technique finds a compromise between capacity and availability (which in our phrasing is a complex integer-programming problem) using a genetic algorithm. The algorithms can be implemented on a broker that intermediates between buyers and sellers of storage resources. Finally, we show that a broker in an open storage market, using the combination of the three algorithms, can more frequently meet user requests and can lower the cost of the requests that are met, compared to a broker that simply matches single resources to requests.
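The abstract leaves the DP's details open; as a sketch of the knapsack reduction for the mirroring case (the resource model, the log transform, and the discretization grid are all assumptions), taking logs of the unavailabilities turns the availability target into a covering constraint:

```python
import math

def cheapest_mirror_set(resources, target_avail, grid=1000):
    """resources: list of (price, availability) with availability < 1.
    Mirroring a set gives availability 1 - prod(1 - a_i); requiring this
    to reach target_avail becomes, after taking logs, a min-cost covering
    knapsack over weights -log(1 - a_i), solved by pseudopolynomial DP."""
    need = -math.log(1.0 - target_avail)
    W = int(need * grid) + 1                    # discretized target weight
    INF = float("inf")
    best = [0.0] + [INF] * W                    # best[t]: min price for weight t
    for price, a in resources:
        w = int(-math.log(1.0 - a) * grid)
        for t in range(W, -1, -1):              # 0/1 item, scan downwards
            if best[t] < INF:
                nt = min(W, t + w)
                best[nt] = min(best[nt], best[t] + price)
    return best[W] if best[W] < INF else None   # None: target unreachable
```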
1996
A Video-on-Demand (VOD) server needs to store hundreds of movie titles and to support thousands of concurrent accesses. This, technically and economically, imposes a great challenge on the design of the disk storage subsystem of a VOD server. Due to different demands for different movie titles, the numbers of concurrent accesses to different movie titles can differ a lot. We define the access profile as the number of concurrent accesses to each movie title that should be supported by a VOD server. The access profile is derived from the popularity of each movie title and thus serves as a major design goal for the disk storage subsystem. Since some popular (hot) movie titles may be concurrently accessed by hundreds of users while a current high-end magnetic disk array can only support tens of concurrent accesses, it is necessary to replicate and/or stripe the hot movie files over multiple disk arrays. The consequence of replication and striping for hot movie titles is a potential increase in the required number of disk arrays. Therefore, how to replicate, stripe, and place the movie files over a minimum number of magnetic disk arrays such that a given access profile can be supported is an important problem. In this paper, we formulate the problem of video file allocation over disk arrays, demonstrate that it is an NP-hard problem, and present heuristic algorithms to find near-optimal solutions. The result of this study can be applied to the design of the storage subsystem of a VOD server to economically minimize cost or maximize the utilization of disk arrays.
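As a one-line illustration of the replication arithmetic behind this (numbers hypothetical): a hot title whose access profile demands a_i concurrent streams, on arrays that each sustain s streams, needs at least

\[
\left\lceil \frac{a_i}{s} \right\rceil \;=\; \left\lceil \frac{200}{30} \right\rceil \;=\; 7
\]

replicas or stripe components, which is what drives up the required number of disk arrays.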
2010 IEEE International Symposium on Information Theory, 2010
We consider the problem of distributing a file in a network of storage nodes whose total storage budget is limited but at least equals the size of the file. We first generate T encoded symbols (from the file), which are then distributed among the nodes. We investigate the optimal allocation of the T encoded packets to the storage nodes such that the probability of reconstructing the file using any r out of n nodes is maximized. Since the optimal allocation of encoded packets is difficult to find in general, we find another objective function which well approximates the original problem and yet is easier to optimize. We find the optimal symmetric allocation for all coding-redundancy constraints using the equivalent approximate problem. We also investigate the optimal allocation in random graphs. Finally, we provide simulations to verify the theoretical results.
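Under the model the abstract describes, with the file size normalized to 1 and a symmetric allocation spreading the budget T equally over m of the n nodes (each chosen node stores T/m), recovery from r uniformly random nodes succeeds when at least ⌈m/T⌉ loaded nodes are hit. The objective being maximized is therefore a hypergeometric tail (a sketch, with m introduced here as the support size):

\[
P_{\mathrm{succ}}(m) \;=\; \sum_{i=\lceil m/T \rceil}^{\min(r,\,m)} \frac{\binom{m}{i}\binom{n-m}{r-i}}{\binom{n}{r}},
\]

and the optimal symmetric allocation picks the m that maximizes this.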
2015 IEEE Global Communications Conference (GLOBECOM), 2014
In this paper we consider distributed allocation problems with memory constraints. Firstly, we propose a tractable relaxation to the problem of optimal symmetric allocations from [1]. The approximated problem is based on the Q-error function, and its solution approaches the solution of the initial problem as the number of storage nodes in the network grows. Secondly, exploiting this relaxation, we are able to formulate and solve the storage-allocation problem for memory-limited DSSs with arbitrary memory profiles. Finally, we discuss the extension to the case of multiple data objects stored in the DSS.
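A sketch of the kind of relaxation described, reusing the notation from the previous abstract (the exact form used in the paper may differ): replacing the hypergeometric tail with its Gaussian approximation gives the smooth surrogate

\[
P_{\mathrm{succ}}(m) \;\approx\; Q\!\left(\frac{\lceil m/T \rceil - r\,m/n}{\sqrt{\,r\,\frac{m}{n}\left(1-\frac{m}{n}\right)\frac{n-r}{n-1}\,}}\right),
\]

where Q is the standard normal tail function; the approximation improves as the number of storage nodes grows, and the surrogate is far easier to optimize over memory profiles.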
Journal of Computer and System Sciences, 1989
The design and analysis of algorithms for on-line dynamic storage allocation has been a fundamental problem area in computer science for many years. In this paper we study the stochastic behavior of dynamic allocation algorithms under the natural assumption that files enter and leave the system according to a Poisson process. In particular, we prove that for any dynamic allocation algorithm and any distribution of file sizes, the expected wasted space (or fragmentation) in the system at any time is Ω(√N), where N is the expected number of items (or used space) in the system. This result is known to be tight in the special case when all files have the same size. More importantly, we also construct a dynamic allocation algorithm which for any distribution of file sizes wastes only O(√N log^{3/4} N) space with very high probability. This bound is also shown to be tight for a wide variety of file-size distributions, including, for example, the uniform and normal distributions. The results are significant because they show that the cumulative wasted space in the holes formed by the continual arrival and departure of items is a vanishingly small portion of the used space, at least on the average. This fact is in striking contrast with Knuth's well-known 50% rule, which states that the number of these holes is linear in the used space. Moreover, the proof techniques establish a surprising connection between stochastic processes, such as dynamic allocation, and static problems such as bin packing and planar matching. We suspect that the techniques will also prove useful in analyzing other stochastic processes which might otherwise prove intractable. Lastly, we present experimental data in support of the theoretical proofs, and as a basis for postulating several conjectures. In the dynamic storage allocation setting, items (records, files, etc.) of varying sizes enter and leave a storage device in a sequence not known in advance. The storage device is represented by a set of consecutive locations or addresses. At its time of arrival, an item is allocated storage space consisting of a contiguous sequence of unoccupied locations equal in length to the item's size. Some time later, the item may depart, thereby making its space available to other items yet to arrive. An item cannot be moved prior to departure, and thus wasted space builds up over time in the form of interior holes alternating with regions of occupied space. This phenomenon is commonly known as fragmentation. The basic problem is to design algorithms which minimize, at least on the average, the cumulative wasted space in the interior holes.
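In the spirit of the paper's experiments, a toy simulation sketch (all parameters hypothetical): items arrive and depart at random, each arrival is placed first-fit at the lowest feasible address, and the wasted space is the occupied region's extent minus the space actually in use:

```python
import random

def average_waste(steps=20_000, mean_items=500, max_size=50):
    """Simulates first-fit allocation under random arrivals/departures and
    returns the average wasted space (holes below the topmost item)."""
    live = {}                                   # id -> (base, size)
    next_id, waste = 0, 0.0
    for _ in range(steps):
        if not live or random.random() < mean_items / (mean_items + len(live)):
            size = random.randint(1, max_size)  # arrival: first-fit placement
            base = 0
            for b, s in sorted(live.values()):
                if base + size <= b:
                    break                        # fits below this item
                base = max(base, b + s)
            live[next_id] = (base, size)
            next_id += 1
        else:
            live.pop(random.choice(list(live)))  # departure of a random item
        used = sum(s for _, s in live.values())
        top = max((b + s for b, s in live.values()), default=0)
        waste += top - used
    return waste / steps
```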
2007
We present a randomized block-level storage virtualization for arbitrary heterogeneous storage systems that can distribute data in a fair and redundant way and can adapt this distribution in an efficient way as storage devices enter or leave the system. More precisely, our virtualization strategies can distribute a set of data blocks among a set of storage devices of arbitrary non-uniform capacities so that a storage device representing x% of the capacity in the system will get x% of the data (as long as this is in principle possible), and the different copies of each data block are stored so that no two copies of a data block are located on the same device. Achieving these two properties is not easy, and no virtualization strategy has been presented so far that has been formally shown to satisfy fairness and redundancy while being time- and space-efficient and allowing an efficient adaptation to a changing set of devices.