2020, arXiv (Cornell University)
Lossy compression is one of the most important strategies for addressing the big science data issue; however, little work has been done to make it resilient against silent data corruption (SDC). In fact, SDC is becoming non-negligible because exascale computing demands complex scientific simulations that produce vast volumes of data, and because some instruments and devices (such as interplanetary space probes) need to transfer large amounts of data in error-prone environments. In this paper, we propose an SDC-resilient error-bounded lossy compressor built on the SZ compression framework. Specifically, we adopt a new independent-block-wise model that decomposes the entire dataset into many independent sub-blocks for compression. We then design and implement a series of error detection/correction strategies based on SZ. We are the first to extend algorithm-based fault tolerance (ABFT) to lossy compression. Our proposed solution incurs negligible execution overhead in the absence of soft errors, and upon soft errors it still keeps the decompressed data bounded within the user's requirement with only a very limited degradation of compression ratio.
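To make the independent-block-wise idea concrete, here is a minimal Python sketch of per-block compression with detection tags. The uniform-quantization stand-in and the truncated SHA-256 checksums are illustrative assumptions, not the paper's actual SZ pipeline or ABFT scheme:

```python
import hashlib
import numpy as np

def compress_block(block: np.ndarray, eb: float) -> bytes:
    # Stand-in for an error-bounded lossy coder such as SZ: plain
    # uniform quantization, so the example stays self-contained.
    return np.round(block / (2 * eb)).astype(np.int64).tobytes()

def compress_with_tags(data: np.ndarray, block_size: int, eb: float):
    # Compress each sub-block independently and attach a short checksum,
    # so a silent bit flip corrupts, and is detected in, at most one block.
    flat = data.ravel()
    blocks = []
    for s in range(0, flat.size, block_size):
        payload = compress_block(flat[s:s + block_size], eb)
        tag = hashlib.sha256(payload).digest()[:8]
        blocks.append((payload, tag))
    return blocks

def corrupted_blocks(blocks):
    # Indices of blocks whose payload no longer matches its checksum.
    return [i for i, (p, t) in enumerate(blocks)
            if hashlib.sha256(p).digest()[:8] != t]
```

Because blocks are independent, a detected corruption invalidates only one sub-block rather than the whole compressed stream, which is what keeps the ratio degradation limited.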
2006
Lossless data compression systems are typically regarded as very brittle to transmission errors. This limits their applicability to domains such as noisy wireless channels or file systems that can become corrupted. Here we show how a popular lossless data compression scheme, used in the GIF, PDF, and TIFF file formats among others, can be made error-resilient in such a way that the compression performance is minimally affected.
2020 IEEE International Conference on Big Data (Big Data)
Error-bounded lossy compression is becoming more and more important to today's extreme-scale HPC applications because of the ever-increasing volume of data they generate; it has been widely used for in-situ visualization, data-stream intensity reduction, storage reduction, I/O performance improvement, checkpoint/restart acceleration, memory footprint reduction, etc. Although many works have optimized the ratio, quality, and performance of different error-bounded lossy compressors, none of the existing works attempts to systematically understand the impact of lossy compression errors on HPC applications due to error propagation. In this paper, we propose and develop a lossy compression fault injection tool, called LCFI. To the best of our knowledge, this is the first fault injection tool that helps both lossy compressor developers and users systematically and comprehensively understand the impact of lossy compression errors on HPC programs. The contributions of this work are threefold: (1) We propose an efficient approach to inject lossy compression errors according to a statistical analysis of the compression errors of different state-of-the-art compressors. (2) We build a fault injector that is highly applicable, customizable, and easy to use in generating top-down comprehensive results, and we demonstrate the use of LCFI. (3) We evaluate LCFI on four representative HPC benchmarks with different abstracted fault models and make several observations about error propagation and its impact on program outputs.
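A statistically modeled injection step of this kind can be sketched in a few lines of Python. The two error models below (uniform within the bound, and a clipped normal) are illustrative assumptions rather than LCFI's actual models; real compressors have their own empirical error distributions:

```python
import numpy as np

def inject_compression_errors(data, error_bound, model="uniform", seed=None):
    # Perturb each value as if it had passed through an error-bounded
    # lossy compressor, without running the compressor itself.
    rng = np.random.default_rng(seed)
    if model == "uniform":        # quantization-style error
        noise = rng.uniform(-error_bound, error_bound, size=data.shape)
    elif model == "clipped-normal":
        noise = np.clip(rng.normal(0.0, error_bound / 3, size=data.shape),
                        -error_bound, error_bound)
    else:
        raise ValueError(f"unknown error model: {model}")
    return data + noise
```

Injecting synthetic errors this way lets one study error propagation through an application without compressing and decompressing the data at every experiment.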
2018 IEEE International Conference on Cluster Computing (CLUSTER)
Error-controlled lossy compression has been studied for years because of the extremely large volumes of data produced by today's scientific simulations. None of the existing lossy compressors, however, allows users to fix the peak signal-to-noise ratio (PSNR) during compression, although PSNR is considered one of the most significant indicators of compression quality. In this paper, we propose a novel technique providing fixed-PSNR lossy compression for scientific data sets. We implement our proposed method on top of the SZ lossy compression framework and release the code as an open-source toolkit. We evaluate our fixed-PSNR compressor on three real-world high-performance computing data sets. Experiments show that our solution controls PSNR with high accuracy, with an average deviation of 0.1 ∼ 5.0 dB on the tested data sets.
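One plausible way to realize a fixed PSNR on an error-bounded compressor is to translate the PSNR target into an absolute error bound. The sketch below assumes the compression error is roughly uniform within the bound; this is a common modeling assumption, not necessarily the paper's exact rule:

```python
import math

def psnr_to_abs_error_bound(target_psnr_db: float, value_range: float) -> float:
    # For error uniform in [-e, e], MSE = e^2 / 3, and
    # PSNR = 20*log10(value_range / RMSE), so
    # e = sqrt(3) * value_range * 10**(-PSNR/20).
    rmse = value_range / (10.0 ** (target_psnr_db / 20.0))
    return math.sqrt(3.0) * rmse
```

For example, a 60 dB target on a field with value range 1.0 maps to an absolute bound of about 1.7e-3, which can then be handed to an error-bounded compressor such as SZ.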
2002
The coding of highly compressed data streams involves removing as much redundancy from the stream as possible. However, as redundancy is removed, so is the decoder's ability to recover from error conditions caused by wireless channels or other lossy communication links. Standard techniques for protecting the stream against channel errors usually involve adding a controlled amount of redundancy back into the stream. Such redundancy might take the form of resynchronization markers, which enable the decoder to restart the decoding process from a known state in the event of transmission errors.
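A minimal Python sketch of the resynchronization-marker idea follows; the marker value is a hypothetical placeholder, and real bitstreams additionally escape (byte-stuff) the payload so the marker pattern cannot occur inside it:

```python
SYNC = b"\xff\x00\xff\x00"  # hypothetical marker, assumed absent from payloads

def add_sync_markers(chunks):
    # Join independently decodable chunks with a resynchronization
    # marker; after a channel error the decoder scans forward to the
    # next marker and restarts from a known state, losing only the
    # damaged chunk rather than the remainder of the stream.
    return SYNC.join(chunks)

def resynchronize(stream):
    return stream.split(SYNC)
```

The spacing of markers is the redundancy/resilience dial: closer markers cost more bits but confine each error to a smaller region of the stream.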
2021
Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. With the ever-emerging heterogeneous high-performance computing (HPC) architecture, GPU-accelerated error-bounded compressors (such as cuSZ and cuZFP) have been developed. However, they suffer from either low performance or low compression ratios. To this end, we propose cuSZ+ to target both high compression ratios and throughputs. We identify that data sparsity and data smoothness are the key factors for high compression throughput. Our key contributions in this work are fourfold: (1) We propose an efficient compression workflow to adaptively perform run-length encoding and/or variable-length encoding. (2) We derive Lorenzo reconstruction in decompression as multidimensional partial-sum computation and propose a fine-grained Lorenzo reconstruction algorithm for GPU architectures. (3) We carefully optimize each of cuSZ+'s kernels by leveraging state-of-the-art CUDA parallel primitives...
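The partial-sum view of Lorenzo reconstruction can be verified in a few lines of NumPy (shown here for the 2D case on the CPU; the paper's contribution is the fine-grained GPU scan formulation of the same identity):

```python
import numpy as np

def lorenzo_2d_residuals(data):
    # Forward 2D Lorenzo transform: residual of the prediction
    # left + top - top_left, with out-of-range neighbors taken as zero.
    r = data.copy()
    r[1:, :] -= data[:-1, :]
    r[:, 1:] -= data[:, :-1]
    r[1:, 1:] += data[:-1, :-1]
    return r

def lorenzo_2d_reconstruct(residuals):
    # Decompression as a multidimensional partial sum: one inclusive
    # prefix sum per axis exactly inverts the Lorenzo transform.
    return np.cumsum(np.cumsum(residuals, axis=0), axis=1)
```

Because prefix sums map directly onto well-studied GPU scan primitives, this reformulation removes the sequential dependency chain of naive neighbor-by-neighbor reconstruction.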
IEEE Transactions on Parallel and Distributed Systems, 2019
Scientific simulations on high-performance computing (HPC) systems generate vast amounts of floating-point data that need to be reduced in order to lower the storage and I/O cost. Lossy compressors trade data accuracy for reduction performance and have been demonstrated to be effective in reducing data volume. However, a key hurdle to the wide adoption of lossy compressors is that the trade-off between data accuracy and compression performance, particularly the compression ratio, is not well understood. Consequently, domain scientists often need to exhaust many possible error bounds before they can figure out an appropriate setup. The current practice of using lossy compressors to reduce data volume is, therefore, trial and error, which is not efficient for large datasets, which take a tremendous amount of computational resources to compress. This paper aims to analyze and estimate the compression performance of lossy compressors on HPC datasets. In particular, we predict the compression ratios of two modern lossy compressors that achieve superior performance, SZ and ZFP, on HPC scientific datasets at various error bounds, based upon the compressors' intrinsic metrics collected under a given base error bound. We evaluate the estimation scheme using twenty real HPC datasets, and the results confirm the effectiveness of our approach.
2021
Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. With the ever-emerging heterogeneous HPC architecture, GPU-accelerated error-bounded compressors (such as cuSZ and cuZFP) have been developed. However, they suffer from either low performance or low compression ratios. To this end, we propose cuSZ(x) to target both high compression ratio and throughput. We identify that data sparsity and data smoothness are the key factors for high compression throughput. Our key contributions in this work are fourfold: (1) We propose an efficient compression workflow to adaptively perform run-length encoding and/or variable-length encoding. (2) We derive Lorenzo reconstruction in decompression as multidimensional partial-sum computation and propose a fine-grained Lorenzo reconstruction algorithm for GPU architectures. (3) We carefully optimize each of cuSZ's kernels by leveraging state-of-the-art CUDA parallel primitives. (4) We evaluate cuSZ(x) ...
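The adaptive encoding selection in contribution (1) can be illustrated with a tiny heuristic; the 0.5 cutoff below is an assumed placeholder, not a published value:

```python
import numpy as np

def choose_encoder(quant_codes: np.ndarray, zero_threshold: float = 0.5) -> str:
    # A block dominated by zero quantization codes (sparse or smooth
    # data) compresses well with run-length encoding; otherwise fall
    # back to variable-length (e.g., Huffman) encoding.
    zero_fraction = np.count_nonzero(quant_codes == 0) / quant_codes.size
    return "RLE" if zero_fraction >= zero_threshold else "VLE"
```

Making this choice per block is what lets one workflow serve both sparse and dense regions of a dataset without paying the cost of the wrong encoder.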
IEEE Transactions on Parallel and Distributed Systems, 2019
An effective data compressor is becoming increasingly critical to today's scientific research, and many lossy compressors have been developed in the context of absolute error bounds. Based on the physical/chemical definitions of simulation fields or multiresolution demands, however, many scientific applications need to compress data with a pointwise relative error bound (i.e., the smaller the data value, the smaller the compression error to tolerate). To this end, we propose two optimized lossy compression strategies under a state-of-the-art three-stage compression framework (prediction + quantization + entropy encoding). The first strategy (called the block-based strategy) splits the data set into many small blocks and computes an absolute error bound for each block, so it is particularly suitable for data with relatively high consecutiveness in space. The second strategy (called the multi-threshold-based strategy) splits the whole value range into multiple groups with exponentially increasing thresholds and performs the compression in each group separately, which is particularly suitable for data with a relatively large value range and spiky value changes. We implement the two strategies rigorously and evaluate them comprehensively using two scientific applications that both require lossy compression with a pointwise relative error bound. Experiments show that the two strategies exhibit the best compression qualities on different types of data sets, respectively. The compression ratio of our lossy compressor is higher than that of other state-of-the-art compressors by 17.2-618 percent on the climate simulation data and 30-210 percent on the N-body simulation data, with the same relative error bound and without degradation of the overall visualization effect of the entire data.
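The block-based strategy can be sketched as follows; deriving the per-block absolute bound from the smallest magnitude in the block is a conservative assumption that guarantees the pointwise relative bound, though the paper's exact rule may differ:

```python
import numpy as np

def blockwise_abs_bounds(data: np.ndarray, block_size: int, rel_eb: float):
    # For each block, pick an absolute bound small enough that every
    # point still satisfies |error| <= rel_eb * |value|; that is,
    # abs_eb = rel_eb * min(|value|) over the block.
    mags = np.abs(data).ravel()
    return [rel_eb * mags[s:s + block_size].min()
            for s in range(0, mags.size, block_size)]
```

This works well when nearby values have similar magnitudes (high spatial consecutiveness); a block containing values of wildly different magnitudes would force a tiny bound, which is exactly the case the multi-threshold-based strategy addresses.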
Proceedings 2000 IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00871), 2000
Lossless data compression methods based on the Lempel-Ziv (LZ) algorithm are widely used in a variety of applications, especially in data storage and communications. However, since the LZ algorithm involves a considerable amount of parallel comparison, it may be difficult to achieve a very high throughput using software approaches on general-purpose processors. In addition, error propagation due to single-bit transient errors during LZ compression causes a significant data integrity problem. In this paper, we present an implementation of LZ data compression on reconfigurable hardware with concurrent error detection for high performance and reliability. Our approach achieves 100 Mbps throughput using four Xilinx 4036XLA FPGA chips. We also present an inverse comparison technique for LZ compression that guarantees data integrity with less area overhead than traditional systems based on duplication. The resulting execution-time overhead and compression-ratio degradation due to concurrent error detection are also minimized.
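A software analogue of the inverse-comparison idea is easy to state, with zlib standing in for the paper's hardware LZ pipeline; the sketch checks the compressor's output against its input rather than duplicating the compressor:

```python
import zlib

def compress_with_inverse_check(data: bytes) -> bytes:
    # Decompress the freshly produced stream and compare it against the
    # input, so a transient fault during compression cannot silently
    # propagate into the stored output.
    out = zlib.compress(data)
    if zlib.decompress(out) != data:
        raise RuntimeError("compressed output failed inverse comparison")
    return out
```

In hardware, running the (cheaper) decompressor concurrently with the compressor provides the same guarantee at less area cost than duplicating the compression logic outright.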