2018, arXiv (Cornell University)
Compression of floating-point data will play an important role in high-performance computing as data bandwidth and storage become dominant costs. Lossy compression of floating-point data is powerful, but theoretical results are needed to bound its errors when used to store look-up tables, simulation results, or even the solution state during the computation. In this paper, we analyze the round-off error introduced by ZFP, a lossy compression algorithm. The stopping criterion for ZFP depends on the compression mode specified by the user: fixed rate, fixed accuracy, or fixed precision. While most of our discussion focuses on the fixed-precision mode of ZFP, we establish a bound on the error introduced by all three compression modes. To capture the error tightly, we first introduce a vector space that allows us to work with binary representations of components. Within this vector space, we define operators that implement each step of ZFP compression and decompression, and we use them to establish a bound on the error caused by ZFP. To conclude, numerical tests demonstrate the accuracy of the established bounds.
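For concreteness, the three stopping criteria map directly onto zfp's public C API. A minimal sketch (compilable as C++; the function names follow the zfp documentation, though exact signatures vary slightly across zfp releases):

```cpp
#include <cstdlib>
#include "zfp.h"  // public zfp header; link with -lzfp

// Compress a 3D double array under one of zfp's three modes.
// Returns the compressed size in bytes (0 on failure).
size_t compress_block(double* data, size_t nx, size_t ny, size_t nz) {
  zfp_field* field = zfp_field_3d(data, zfp_type_double, nx, ny, nz);
  zfp_stream* zfp = zfp_stream_open(nullptr);

  // Pick exactly one stopping criterion:
  zfp_stream_set_precision(zfp, 32);            // fixed precision: 32 bit planes
  // zfp_stream_set_rate(zfp, 8.0, zfp_type_double, 3, 0); // fixed rate: 8 bits/value
  // zfp_stream_set_accuracy(zfp, 1e-6);        // fixed accuracy: absolute error tolerance

  size_t bufsize = zfp_stream_maximum_size(zfp, field);
  void* buffer = std::malloc(bufsize);
  bitstream* stream = stream_open(buffer, bufsize);
  zfp_stream_set_bit_stream(zfp, stream);
  zfp_stream_rewind(zfp);

  size_t compressed = zfp_compress(zfp, field);  // 0 indicates failure

  zfp_field_free(field);
  zfp_stream_close(zfp);
  stream_close(stream);
  std::free(buffer);
  return compressed;
}
```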
2020
Currently, the dominating constraint in many high-performance computing applications is data capacity and bandwidth, both in inter-node communication and even more so in on-node data motion. A new approach to addressing this limitation is to make use of data compression in the form of a compressed data array. Storing data in a compressed data array and converting to standard IEEE-754 types as needed during a computation can reduce the pressure on bandwidth and storage. However, repeated conversions (lossy compression and decompression) introduce additional approximation errors, which must be shown not to significantly affect the simulation results. We extend recent work [J. Diffenderfer, et al., Error Analysis of ZFP Compression for Floating-Point Data, SIAM Journal on Scientific Computing, 2019] that analyzed the error of a single use of compression and decompression of the ZFP compressed data array representation [P. Lindstrom, Fixed-rate compressed floating-point arrays, IEEE Transactions on Visualization and Computer Graphics, 2014].
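A compressed data array of this kind is what zfp's C++ array classes provide. A minimal sketch, assuming zfp's documented `zfp::array3d` class (header paths differ between zfp releases); the stencil loop and sizes are illustrative:

```cpp
#include "zfp/array3.hpp"  // zfp's compressed-array classes (older releases: "zfparray3.h")

// A 3D double array stored compressed at a fixed rate of 8 bits per value
// (8x reduction versus IEEE-754 doubles). Reads decompress and writes
// recompress 4x4x4 blocks on the fly through a small software cache.
void relax(size_t nx, size_t ny, size_t nz) {
  zfp::array3d u(nx, ny, nz, 8.0 /* bits per value */);
  for (size_t k = 1; k + 1 < nz; k++)
    for (size_t j = 1; j + 1 < ny; j++)
      for (size_t i = 1; i + 1 < nx; i++)
        u(i, j, k) = (u(i - 1, j, k) + u(i + 1, j, k)) / 2;  // each access may (de)compress a block
}
```

Each conversion between the compressed block and IEEE-754 doubles is one of the lossy round trips whose accumulated error the paper analyzes.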
This paper describes FPC, a lossless compression algorithm for linear streams of 64-bit floating-point data. FPC is designed to compress well while at the same time meeting the high throughput demands of scientific computing environments. On our thirteen datasets, it achieves a substantially higher average compression ratio than BZIP2, DFCM, FSD, GZIP, and PLMI. At comparable compression ratios, it compresses and decompresses 8 to 300 times faster than the other five algorithms.
IEEE Transactions on Visualization and Computer Graphics, 2006
Large-scale scientific simulation codes typically run on a cluster of CPUs that write/read time steps to/from a single file system. As data sets are constantly growing in size, this increasingly leads to I/O bottlenecks. When the rate at which data is produced exceeds the available I/O bandwidth, the simulation stalls and the CPUs are idle. Data compression can alleviate this problem by using some CPU cycles to reduce the amount of data that needs to be transferred. Most compression schemes, however, are designed to operate offline and seek to maximize compression, not throughput. Furthermore, they often require quantizing floating-point values onto a uniform integer grid, which disqualifies their use in applications where exact values must be retained. We propose a simple scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of many applications. A plug-in scheme for data-dependent prediction makes our scheme applicable to a wide variety of data used in visualization, such as unstructured meshes, point sets, images, and voxel grids. We achieve state-of-the-art compression rates and speeds, the latter in part due to an improved entropy coder. We demonstrate that this significantly accelerates I/O throughput in real simulation runs. Unlike previous schemes, our method also adapts well to variable-precision floating-point and integer data.
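The common core of such predictive lossless schemes is to view each double's bit pattern as an integer, predict it from previously seen values, and entropy-code the residual. A minimal sketch of that residual step, assuming a plain XOR residual in place of the paper's data-dependent predictors and improved entropy coder:

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double's IEEE-754 bits as an unsigned integer (lossless).
inline uint64_t bits_of(double x) {
  uint64_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}

// Residual between the true value and a prediction. A good predictor makes
// the XOR start with a long run of zero bits, which the entropy coder
// transmits cheaply; decompression XORs the residual back onto the same
// prediction, recovering the value exactly.
inline uint64_t residual(double actual, double predicted) {
  return bits_of(actual) ^ bits_of(predicted);
}
```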
hipc.org
Bandwidth limitations are a bottleneck in several applications. These limitations may involve (i) memory bandwidth, especially on multi-core processors, (ii) network bandwidth in MPI applications, (iii) bandwidth to disk in I/O-intensive applications, or (iv) WAN bandwidth in remote-sensing applications that send observational data to a central site. Fast compression of floating-point numbers can ameliorate bandwidth limitations. Here, it is crucial for the speeds of compression and decompression to exceed the bandwidth; otherwise, it is faster to send the data directly. In this paper, we investigate the effectiveness of a simple stride-based compression algorithm for this problem on the Cell BE processor. We find that our approach is not fast enough to deal with memory bandwidth limitations; however, it can be effective against the other three limitations.
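The break-even condition can be made explicit with a back-of-the-envelope model (mine, not the paper's), assuming non-overlapped compress/transfer/decompress phases:

```cpp
// Sequential model: compress, transfer the smaller payload, decompress.
// B: link bandwidth, C/D: compression/decompression throughput (all bytes/s),
// r: compression ratio (original size / compressed size).
// Compression pays off when the per-byte pipeline time beats sending raw data.
bool compression_pays_off(double B, double C, double D, double r) {
  double t_raw  = 1.0 / B;                          // time per original byte, uncompressed
  double t_comp = 1.0 / C + 1.0 / (r * B) + 1.0 / D; // compress + shrunken transfer + decompress
  return t_comp < t_raw;
}
```

The model makes the abstract's point directly: as C and D approach B, the compression and decompression terms swamp the savings on the transfer term, no matter the ratio r.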
IEEE Transactions on Computers, 2009
Many scientific programs exchange large quantities of double-precision data between processing nodes and with mass storage devices. Data compression can reduce the number of bytes that need to be transferred and stored. However, data compression is only likely to be employed in high-end computing environments if it does not impede the throughput. This paper describes and evaluates FPC, a fast lossless compression algorithm for linear streams of 64-bit floating-point data. FPC works well on hard-to-compress scientific datasets and meets the throughput demands of high-performance systems. A comparison with five lossless compression schemes, BZIP2, DFCM, FSD, GZIP, and PLMI, on four architectures and thirteen datasets shows that FPC compresses and decompresses one to two orders of magnitude faster than the other algorithms at the same geometric-mean compression ratio. Moreover, FPC provides a guaranteed throughput as long as the prediction tables fit into the L1 data cache. For example, on a 1.6 GHz Itanium 2 server, the throughput is 670 megabytes per second regardless of what data are being compressed.
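FPC's speed comes from table-based prediction: an fcm (finite context method) predictor and a dfcm variant each guess the next value, and the better guess is XORed with the truth so only the nonzero tail bytes need encoding. A sketch of the fcm half, with an illustrative table size and hash (FPC's tuned parameters differ):

```cpp
#include <cstdint>

// Sketch of an fcm predictor: a hash table maps a hash of recently seen
// values (the context) to the value that followed that context last time.
struct Fcm {
  static const size_t SIZE = 1 << 16;  // power of two; FPC sizes this to fit L1
  uint64_t table[SIZE] = {};
  uint64_t hash = 0;

  // Prediction for the next value, given the current context.
  uint64_t predict() const { return table[hash]; }

  // After the true value is known: remember it for this context,
  // then fold it into the hash to form the next context.
  void update(uint64_t truth) {
    table[hash] = truth;
    hash = ((hash << 6) ^ (truth >> 48)) & (SIZE - 1);
  }
};

// Per value, the encoder emits 1 selector bit (fcm vs. dfcm), 3 bits for the
// count of leading zero bytes in (prediction XOR truth), then the remaining
// residual bytes verbatim. The decoder runs the same tables in lockstep.
```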
This paper describes and evaluates pFPC, a parallel implementation of the lossless FPC compression algorithm for 64-bit floating-point data. pFPC can trade off compression ratio for throughput. For example, on a 4-core 3 GHz Xeon system, it compresses our nine datasets by 18% at a throughput of 1.36 gigabytes per second and by 41% at a throughput of 570 megabytes per second. Decompression is even faster. Our experiments show that the thread count should match or be a small multiple of the data's dimensionality to maximize the compression ratio and the chunk size should be at least equal to the system's page size to maximize the throughput.
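A sketch of the round-robin chunk layout that finding suggests (the helper `compress_chunk` is a hypothetical stand-in for a per-thread FPC instance):

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Deal fixed-size chunks round-robin to t threads. With t equal to (or a
// small multiple of) the data's dimensionality, each thread sees every t-th
// chunk, i.e., mostly one interleaved component, keeping its predictors well
// trained. chunk_doubles of at least a page (4096 B = 512 doubles) amortizes
// paging and cache effects.
void parallel_compress(const double* data, size_t n, size_t t, size_t chunk_doubles,
                       void (*compress_chunk)(const double*, size_t, size_t thread_id)) {
  std::vector<std::thread> pool;
  for (size_t id = 0; id < t; id++)
    pool.emplace_back([=] {
      for (size_t c = id * chunk_doubles; c < n; c += t * chunk_doubles)
        compress_chunk(data + c, std::min(chunk_doubles, n - c), id);
    });
  for (auto& th : pool) th.join();
}
```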
Fig. 1: Interval volume renderings of compressed double-precision floating-point data on a 384 × 384 × 256 grid: (a) 1 bit/double; (b) 4 bits/double; (c) 64 bits/double (no compression). At 4 bits/double (16x compression) the image is visually indistinguishable from full 64-bit precision.
IEICE Transactions on Information and Systems, 2012
In numerical simulations using massively parallel computers such as GPGPUs (General-Purpose computing on Graphics Processing Units), we often need to transfer computational results from external devices such as GPUs to the main memory or secondary storage of the host machine. Since the computation results are sometimes too large to hold in full, it is desirable to compress the data before storing it. In addition, considering the overhead of transferring data between device and host memories, it is preferable that compression be performed as part of the parallel computation on the devices. Traditional compression methods for floating-point numbers do not always exhibit good parallelism. In this paper, we propose a new compression method for massively parallel simulations running on GPUs, in which we combine a few successive floating-point numbers and interleave them to improve compression efficiency. We also present numerical examples of the compression ratio and throughput obtained from experimental implementations of the proposed method running on CPUs and GPUs.
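One way to realize such interleaving is byte transposition: gather bytes of equal significance from k successive values so a downstream coder sees long runs of correlated bytes. A sketch under that assumption (group size, layout, and tail handling are illustrative, not necessarily the paper's exact scheme):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Interleave the bytes of groups of k successive doubles so that bytes of
// equal significance sit together. Neighboring values tend to share sign,
// exponent, and high mantissa bytes, so the transposed stream compresses
// better, and each group can be transposed independently in parallel.
std::vector<uint8_t> interleave(const double* data, size_t n, size_t k) {
  std::vector<uint8_t> out(n * sizeof(double));
  for (size_t g = 0; g + k <= n; g += k)             // each full group of k values
    for (size_t i = 0; i < k; i++) {
      uint8_t b[sizeof(double)];
      std::memcpy(b, &data[g + i], sizeof b);
      for (size_t j = 0; j < sizeof(double); j++)
        out[g * sizeof(double) + j * k + i] = b[j];  // byte j of value i
    }
  size_t tail = n - n % k;                           // ragged tail: copy verbatim
  if (tail < n)
    std::memcpy(&out[tail * sizeof(double)], &data[tail], (n - tail) * sizeof(double));
  return out;
}
```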
IEEE Transactions on Parallel and Distributed Systems, 2019
Scientific simulations on high-performance computing (HPC) systems generate vast amounts of floating-point data that need to be reduced in order to lower storage and I/O costs. Lossy compressors trade data accuracy for reduction performance and have been demonstrated to be effective in reducing data volume. However, a key hurdle to wide adoption of lossy compressors is that the trade-off between data accuracy and compression performance, particularly the compression ratio, is not well understood. Consequently, domain scientists often need to exhaust many possible error bounds before they can figure out an appropriate setup. The current practice of using lossy compressors to reduce data volume is therefore trial and error, which is inefficient for large datasets that take a tremendous amount of computational resources to compress. This paper aims to analyze and estimate the compression performance of lossy compressors on HPC datasets. In particular, we predict the compression ratios of two modern lossy compressors that achieve superior performance, SZ and ZFP, on HPC scientific datasets at various error bounds, based upon the compressors' intrinsic metrics collected under a given base error bound. We evaluate the estimation scheme using twenty real HPC datasets, and the results confirm the effectiveness of our approach.
The geometric data sets found in scientific and industrial applications are often very detailed. Storing them using standard uncompressed formats results in large files that are expensive to store and slow to load and transmit. Many efficient mesh compression techniques have been proposed, but scientists and engineers often refrain from using them because they modify the mesh data. While connectivity is encoded in a lossless manner, the floating-point coordinates associated with the vertices are quantized onto a uniform integer grid for efficient predictive compression. Although a fine enough grid can usually represent the data with sufficient precision, the original floating-point values will change, regardless of grid resolution. In this paper we describe how to compress floating-point coordinates using predictive coding in a completely lossless manner. The initial quantization step is omitted and predictions are calculated in floating-point. The predicted and the actual floating-point values are then broken up into sign, exponent, and mantissa and their corrections are compressed separately with context-based arithmetic coding. As the quality of the predictions varies with the exponent, we use the exponent to switch between different arithmetic contexts. Although we report compression results using the popular parallelogram predictor, our approach works with any prediction scheme. The achieved bit-rates for lossless floating-point compression nicely complement those resulting from uniformly quantizing with different precisions.
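A minimal sketch of the two ingredients named above: the parallelogram predictor evaluated directly in floating point, and the sign/exponent/mantissa split that drives context selection (single-precision layout assumed for brevity):

```cpp
#include <cstdint>
#include <cstring>

// Parallelogram prediction: for a vertex v4 across an edge shared with the
// triangle (v1, v2, v3), the prediction completes the parallelogram. Computed
// per coordinate in floating point, with no quantization step.
inline float parallelogram(float v1, float v2, float v3) {
  return v1 + v2 - v3;
}

// Split an IEEE-754 single-precision float into sign, exponent, and mantissa.
// The exponent of the predicted value selects among arithmetic-coding
// contexts, since prediction quality varies with magnitude.
struct FloatParts { uint32_t sign, exponent, mantissa; };

inline FloatParts split(float x) {
  uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return { u >> 31, (u >> 23) & 0xffu, u & 0x7fffffu };
}
```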