Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008, 2008
Guaranteed I/O performance is needed for a variety of applications, ranging from real-time data collection to desktop multimedia to large-scale scientific simulations. Reservations on throughput, the standard measure of disk performance, fail to effectively manage disk performance due to the orders-of-magnitude difference between best-, average-, and worst-case response times, allowing reservation of less than 0.01% of the achievable bandwidth. We show that by reserving disk resources in terms of utilization it is possible to create a disk scheduler that supports reservation of nearly 100% of the disk resources, provides arbitrarily hard or soft guarantees depending upon application needs, and yields efficiency as good as or better than best-effort disk schedulers tuned for performance. We present the architecture of our scheduler, prove the correctness of its algorithms, and provide results demonstrating its effectiveness.
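A minimal sketch of the admission-control idea the abstract describes: reserve fractions of disk *time* rather than bandwidth, and admit a stream only while total reserved utilization stays at or below 100%. The class and method names are illustrative assumptions, not the paper's actual interface.

```python
class UtilizationReservationManager:
    """Admits I/O streams by reserving fractions of disk time, not bandwidth."""

    def __init__(self, max_utilization: float = 1.0):
        self.max_utilization = max_utilization
        self.reservations: dict[str, float] = {}

    def admit(self, stream_id: str, utilization: float) -> bool:
        """Admit a stream iff total reserved disk-time utilization stays <= 100%."""
        if sum(self.reservations.values()) + utilization <= self.max_utilization:
            self.reservations[stream_id] = utilization
            return True
        return False
```

Because disk time is fully provisionable, reservations can sum to nearly 1.0, unlike throughput reservations that must assume worst-case response times.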
Supporting rate-based processes in an integrated system
As real-time applications become common in general-purpose computing systems, scheduling solutions have been developed to support processes with a variety of different timeliness constraints in an integrated way. In this paper, we identify a separate category of soft real-time processes, called rate-based processes, and investigate ways to add support for this class of processes to an integrated system. We consider rate-based processes a separate class of soft real-time processes because they do not have timing constraints in the form of deadlines. Instead, they have continuous processing requirements in terms of a constant rate. The goal of an efficient scheduling mechanism for rate-based processes as part of an integrated system is to provide good overall system performance while maintaining the required throughput of rate-based processes.
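A hedged sketch of how a constant-rate requirement can be turned into a schedulable quantity: each rate-based process requests a constant fraction of the CPU and receives a budget per scheduling period. The function name and units are illustrative, not taken from the paper.

```python
def per_period_budget(rate: float, period_s: float) -> float:
    """CPU time (seconds) a rate-based process needs each period to sustain
    its requested rate, where `rate` is a fraction of one CPU (e.g., 0.25)."""
    return rate * period_s

# Example: a process needing 25% of the CPU with a 10 ms period gets 2.5 ms.
assert abs(per_period_budget(0.25, 0.010) - 0.0025) < 1e-12
```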
I/O traces are good sources of information about real-world workloads; replaying such traces is often used to reproduce the most realistic system behavior possible. But traces tend to be large, hard to use and share, and inflexible in representing more than the exact system conditions at the point the traces were captured. Often, however, researchers are not interested in the precise details stored in a bulky trace, but rather in some statistical properties found in the traces: properties that affect their system's behavior under load. We designed and built a system that (1) extracts many desired properties from a large block I/O trace, (2) builds a statistical model of the trace's salient characteristics, (3) converts the model into a concise description in the language of one or more synthetic load generators, and (4) can accurately replay the models in these load generators. Our system is modular and extensible. We experimented with several traces of varying types and sizes. Our concise models are 4-6% of the original trace size, and our modeling and replay accuracy are over 90%.
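A minimal sketch of step (2) above: summarizing a block I/O trace into a few statistical properties. The trace record format and the chosen statistics are assumptions for illustration, not the paper's actual model.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class IORecord:
    timestamp: float   # seconds
    lbn: int           # logical block number
    size: int          # bytes
    is_write: bool

def summarize(trace: list[IORecord]) -> dict[str, float]:
    """Extract a few salient statistics from a non-empty, time-ordered trace."""
    gaps = [b.timestamp - a.timestamp for a, b in zip(trace, trace[1:])]
    return {
        "mean_interarrival_s": mean(gaps) if gaps else 0.0,
        "write_fraction": sum(r.is_write for r in trace) / len(trace),
        "mean_request_bytes": mean(r.size for r in trace),
    }
```

A summary like this can then be rendered as a parameter file for a synthetic load generator, replacing the bulky trace itself.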
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010
Data centers often consolidate a variety of workloads to increase storage utilization and reduce management costs. Each workload, however, has its own performance targets that need to be met, requiring isolation from the effects of other workloads sharing the system. Satisfying the global throughput and latency targets of each workload is challenging in fully distributed storage systems because workloads can have different data layouts, and different requests from the same workload can be serviced by different nodes. Quality-of-service schemes that manage individual system resources usually rely on resource reservations, often requiring assumptions about the layout of data. On the other hand, solutions for distributed storage tend to treat the storage system as a black box, metering requests issued to the system and often under-utilizing system resources. We show that our multi-layered approach, which locally manages individual disk resources, can deliver global throughput and latency targets while efficiently utilizing system resources. Our system uses upper-level control mechanisms to assign deadlines to requests based on workload performance targets, and low-level disk I/O schedulers designed to meet request deadlines while maximizing throughput at the disk. We provide a novel disk scheduler called Horizon that meets deadlines while providing efficient request reordering and maintaining efficient disk queues. Our experimental results show that Horizon can meet more than 90% of deadlines while remaining efficient even in the presence of low-latency bursty workloads.
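A hedged sketch of the upper-level mechanism described above: translating a workload's throughput target into per-request deadlines that the low-level scheduler then enforces. The formula is an illustrative assumption, not Horizon's published algorithm.

```python
def assign_deadline(arrival_time_s: float, target_iops: float,
                    pending_requests: int) -> float:
    """Deadline by which this request must complete if the workload is to
    sustain `target_iops`, given `pending_requests` already queued ahead of it."""
    return arrival_time_s + (pending_requests + 1) / target_iops
```

A request arriving at t=1.0 s for a workload targeting 500 IOPS with 9 requests ahead would get a deadline of 1.02 s; the disk scheduler is then free to reorder any requests whose deadlines it can still meet.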
Modern storage systems need to concurrently support applications with different performance requirements ranging from real-time to best-effort. An important aspect of managing performance in such systems is managing disk I/O with the goals of meeting timeliness guarantees of I/O requests and achieving high overall disk efficiency. However, achieving both of these goals simultaneously is hard for two reasons. First, the need to meet deadlines imposes limits on how much I/O requests can be reordered; more pessimistic I/O latency assumptions limit reordering even further. Predicting I/O latencies is a complex task, and real-time schedulers often resort to assuming worst-case latencies or using statistical distributions. Second, it is more efficient to keep large internal disk queues, but hardware queueing is usually disabled or limited in real-time systems to tightly bound the worst-case I/O latencies. This paper presents a real-time disk I/O scheduler that uses an underlying disk latency map to improve both request reordering for efficiency and I/O latency estimation for deadline scheduling. We show that more accurate estimation of disk I/O latencies allows our scheduler to reorder requests more efficiently than traditional LBN-based approaches; this eliminates the need to keep large internal disk queues. We also show that our scheduler can enforce I/O request deadlines while maintaining high disk performance.
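A minimal sketch of how a latency map can drive deadline scheduling: among pending requests, choose one that the map predicts can complete before its deadline. The map interface and request fields are hypothetical; a real latency map models seek distance and rotational position in far more detail.

```python
def pick_request(pending, now_s, latency_map, head_pos):
    """Return the feasible request with the earliest deadline, or None.

    `latency_map(head_pos, lbn, size)` is assumed to predict the service
    time (seconds) of a request given the current head position."""
    feasible = [r for r in pending
                if now_s + latency_map(head_pos, r.lbn, r.size) <= r.deadline]
    return min(feasible, key=lambda r: r.deadline) if feasible else None
```

Because feasibility is checked with accurate per-request latency predictions rather than a single pessimistic worst-case constant, more reorderings remain admissible.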
Proceedings of the 2012 workshop on Management of big data systems, 2012
Even within one popular sub-category of "NoSQL" solutions, key-value (KV) storage systems, no one existing system meets the needs of all applications. We question this poor state of affairs. In this paper, we make the case for a flexible key-value storage system (Flex-KV) that can support both DRAM- and disk-based storage, act as an unreliable cache or a durable store, and operate consistently or inconsistently. The value of such a system goes beyond ease of use: while exploring these dimensions of durability, consistency, and availability, we find new choices for system designs, such as a cache-consistent memcached, that offer some applications a better balance of performance and cost than was previously available.
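A minimal sketch of the configuration space the abstract describes, with each dimension (storage medium, durability, consistency) as an independent knob. All names are illustrative assumptions, not Flex-KV's actual API.

```python
from dataclasses import dataclass
from enum import Enum

class Medium(Enum):
    DRAM = "dram"
    DISK = "disk"

class Durability(Enum):
    CACHE = "unreliable-cache"     # data may be evicted or lost
    DURABLE = "durable-store"      # data survives restarts

class Consistency(Enum):
    CONSISTENT = "consistent"
    INCONSISTENT = "inconsistent"  # e.g., stale reads allowed

@dataclass(frozen=True)
class KVConfig:
    medium: Medium
    durability: Durability
    consistency: Consistency

# A cache-consistent memcached, one of the new design points the paper names:
memcached_like = KVConfig(Medium.DRAM, Durability.CACHE, Consistency.CONSISTENT)
```

Treating the dimensions as orthogonal is what exposes previously unexplored combinations such as the cache-consistent memcached above.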
2006 27th IEEE International Real-Time Systems Symposium (RTSS'06), 2006
The simple notion of soft real-time processing has fractured into a spectrum of diverse soft real-time types with a variety of different resource and time constraints. Schedulers have been developed for each of these types, but these are essentially point solutions in the space of soft real-time, and no single scheduler has previously been offered that can simultaneously manage all types. More generally, no detailed unified definition of soft real-time has been provided that includes all types of soft real-time processing. We present a complete real-time taxonomy covering the spectrum of processes from best-effort to hard real-time. The taxonomy divides processes into nine classes based on their resource and timeliness requirements and includes four soft real-time classes, each of which captures a group of soft real-time applications with similar characteristics. We exploit the different features of each of the soft real-time classes to integrate all of them into a single scheduler together with hard real-time and best-effort processes, and present results showing their performance.
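A hedged sketch of the taxonomy's shape: nine classes formed by crossing resource requirements with timeliness requirements, spanning best-effort to hard real-time. The axis values below are illustrative assumptions; the paper defines its own precise categories.

```python
from enum import Enum
from itertools import product

class Resources(Enum):
    UNKNOWN = "unknown"          # best-effort style, no stated needs
    KNOWN_SOFT = "known-soft"    # stated needs, violations tolerable
    KNOWN_HARD = "known-hard"    # stated needs, must be guaranteed

class Timeliness(Enum):
    NONE = "none"
    SOFT = "soft deadlines"
    HARD = "hard deadlines"

# Nine (resource, timeliness) combinations, from best-effort
# (UNKNOWN, NONE) through hard real-time (KNOWN_HARD, HARD).
TAXONOMY = list(product(Resources, Timeliness))
assert len(TAXONOMY) == 9
```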
Storage management activities, such as reporting, file placement, migration, and archiving, require the ability to discover files that belong to an application workflow by relying only on information from the file server. Some classes of application workflows, such as rendering an animated sequence from its graphics models or building an application from its source files, often exhibit a high degree of repeatability. We describe a system called Autograph that exploits this repeatability to discover files that belong to an application workflow. Our approach examines traces of file accesses, finds repeated and correlated accesses, and infers which files likely belong to the same workflow. Our solution targets server workflows and uses file server traces, which contain less process and file information than the local-machine traces used in prior work. We show that Autograph successfully extracts workflow file signatures, even if the workflows are concurrent or share files.
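A hedged sketch of the core inference step: bucket file accesses into short time windows and report file pairs that co-occur in a large fraction of the windows where they appear. The window size and support threshold are illustrative; Autograph's actual signature extraction is more sophisticated.

```python
from collections import defaultdict
from itertools import combinations

def correlated_files(trace, window_s=1.0, min_support=0.8):
    """trace: list of (timestamp, filename), sorted by time.

    Returns file pairs whose co-occurrence count is at least `min_support`
    of the appearance count of the more frequent file in the pair."""
    windows = defaultdict(set)
    for t, f in trace:
        windows[int(t / window_s)].add(f)
    appear, co = defaultdict(int), defaultdict(int)
    for files in windows.values():
        for f in files:
            appear[f] += 1
        for pair in combinations(sorted(files), 2):
            co[pair] += 1
    return [p for p, n in co.items()
            if n / max(appear[p[0]], appear[p[1]]) >= min_support]
```

Transitively merging the surviving pairs yields candidate workflow file groups, even when several workflows run concurrently.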
2008 IEEE Real-Time and Embedded Technology and Applications Symposium, 2008
Large- and small-scale storage systems frequently serve a mixture of workloads, an increasing number of which require some form of performance guarantee. Providing guaranteed disk performance (the equivalent of a "virtual disk") is challenging because disk requests are non-preemptible and their execution times are stateful, partially non-deterministic, and can vary by orders of magnitude. Guaranteeing throughput, the standard measure of disk performance, requires worst-case I/O time assumptions orders of magnitude greater than average I/O times, with correspondingly low performance and poor control of the resource allocation. We show that disk time utilization (analogous to CPU utilization in CPU scheduling, and the only fully provisionable aspect of disk performance) yields greater control, more efficient use of disk resources, and better isolation between request streams than bandwidth or I/O rate when used as the basis for disk reservation and scheduling.
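A minimal sketch of per-stream disk-time accounting, the quantity the abstract argues should be reserved and scheduled. Class and field names are illustrative assumptions.

```python
import time

class DiskTimeMeter:
    """Tracks what fraction of wall-clock time the disk spent serving each
    request stream since the meter was started."""

    def __init__(self):
        self.busy: dict[str, float] = {}
        self.start = time.monotonic()

    def record(self, stream_id: str, service_time_s: float) -> None:
        """Charge one completed request's measured service time to its stream."""
        self.busy[stream_id] = self.busy.get(stream_id, 0.0) + service_time_s

    def utilization(self, stream_id: str) -> float:
        elapsed = time.monotonic() - self.start
        return self.busy.get(stream_id, 0.0) / elapsed if elapsed > 0 else 0.0
```

Charging streams for measured service time, rather than for bytes or I/Os, is what isolates a stream issuing cheap sequential requests from one issuing expensive random ones.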
Proceedings of the 6th International Systems and Storage Conference on - SYSTOR '13, 2013
While the initial wave of in-memory key-value stores has been optimized for serving relatively fixed content to a very large number of users, an emerging class of enterprise-scale data analytics workloads focuses on capturing, analyzing, and reacting to data in real time. At the same time, advances in network technologies are shifting the performance bottleneck from the network to the memory subsystem. To address these new trends, we present a bottom-up approach to building a high-performance in-memory key-value store, Mercury, for both traditional read-intensive workloads and emerging workloads with high write-to-read ratios. Mercury's architecture is based on two key design principles: (i) economizing the number of DRAM accesses per operation, and (ii) reducing synchronization overheads. We implement these principles with a simple hash table with linked-list-based chaining, and provide high concurrency with a fine-grained, cache-friendly locking scheme. On a commodity single-socket server with 12 cores, Mercury scales with the number of cores and executes 14 times more queries per second than a popular hash-based key-value system, Memcached, for both read- and write-heavy workloads.
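A hedged sketch of the structure named above: a chained hash table with per-bucket locks for fine-grained concurrency. Mercury's real implementation is native code with cache-friendly layouts; this Python version only illustrates the locking scheme, not its performance characteristics.

```python
import threading

class StripedHashTable:
    """Chained hash table where each bucket has its own lock, so operations
    on different buckets never contend with one another."""

    def __init__(self, n_buckets: int = 1024):
        self.buckets = [[] for _ in range(n_buckets)]        # linked-list chains
        self.locks = [threading.Lock() for _ in range(n_buckets)]

    def _index(self, key) -> int:
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        i = self._index(key)
        with self.locks[i]:                                  # lock one bucket only
            chain = self.buckets[i]
            for j, (k, _) in enumerate(chain):
                if k == key:
                    chain[j] = (key, value)
                    return
            chain.append((key, value))

    def get(self, key):
        i = self._index(key)
        with self.locks[i]:
            for k, v in self.buckets[i]:
                if k == key:
                    return v
            return None
```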
Efficient Guaranteed Disk I/O Performance Management
Slide deck: Efficient Guaranteed Disk I/O Performance Management. Anna Povzner, Scott Brandt, Carlos Maltzahn; Richard Golding, Theodore M. Wong (IBM Research); Darren Sawyer (NetApp). SRL Research Symposium, October 19, 2010.
Cloud architectures are moving away from a traditional data center design with SAN- and NAS-attached storage to a more flexible solution based on virtual machines with NAS-attached storage. While VM storage based on NAS is ideal for meeting the high-scale, low-cost, and manageability requirements of the cloud, it significantly alters the I/O profile for which NAS storage is designed. In this paper, we explore the storage stack in a virtualized NAS environment and highlight the corresponding performance implications.