2009
…
68 pages
Abstract: One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take the memory hierarchy into consideration to design cache-efficient algorithms. CO approaches have the ...
IEEE Transactions on Visualization and Computer Graphics, 2000
One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take the memory hierarchy into consideration to design cache-efficient algorithms. CO approaches have the advantage of adapting to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve on the performance of previous approaches, but they lack theoretical performance guarantees. In this paper we present an $O(N \log N)$ algorithm to compute a CO layout for unstructured but well-shaped meshes. We prove that a coherent traversal of an $N$-size mesh in dimension $d$ induces fewer than $N/B + O(N/M^{1/d})$ cache misses, where $B$ and $M$ are the block size and the cache size, respectively. Experiments show that our layout computation is faster and consumes significantly less memory than the best known CO algorithm. Performance is comparable to that algorithm for the access patterns of classical visualization algorithms, or better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache-oblivious approaches lead to significant performance increases on recent GPU architectures.
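The layout idea above, ordering mesh elements by the leaves of a recursive spatial partition, can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes the mesh is reduced to one representative point per element (for example a vertex position or cell centroid), and the function name `bisection_layout` and the `leaf_size` parameter are hypothetical. Each level splits at the median along the axis of largest extent, so elements that are close in space end up close in the resulting order.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def bisection_layout(points: Sequence[Point], leaf_size: int = 1) -> List[int]:
    """Return element indices ordered by recursive median bisection (hypothetical helper)."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")

    def recurse(idx: List[int]) -> List[int]:
        # Small groups are emitted as-is; they already fit in one leaf.
        if len(idx) <= leaf_size:
            return idx
        dims = len(points[idx[0]])

        # Extent of this group along axis d; we split along the widest axis.
        def extent(d: int) -> float:
            vals = [points[i][d] for i in idx]
            return max(vals) - min(vals)

        axis = max(range(dims), key=extent)
        ordered = sorted(idx, key=lambda i: points[i][axis])
        mid = len(ordered) // 2
        # Concatenating the two halves keeps each spatial region contiguous.
        return recurse(ordered[:mid]) + recurse(ordered[mid:])

    return recurse(list(range(len(points))))


# Usage: a 4 x 4 grid of 2D points, indexed as y * 4 + x.
grid = [(float(x), float(y)) for y in range(4) for x in range(4)]
order = bisection_layout(grid)
print(order[:4])  # the first four indices form one 2 x 2 quadrant
```

Because this sketch sorts at every level, it runs in $O(N \log^2 N)$ rather than the $O(N \log N)$ bound stated in the abstract; a linear-time median selection would be needed to match that bound.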
Current computer architectures employ caching to improve the performance of a wide variety of applications. One of the main characteristics of such cache schemes is the use of block fetching whenever an uncached data element is accessed. To maximize the benefit of the block fetching mechanism, we present novel cache-aware and cache-oblivious layouts of surface and volume meshes that improve the performance of interactive visualization and geometric processing algorithms. Based on a general I/O model, we derive new cache-aware and cache-oblivious metrics that correlate strongly with the number of cache misses when accessing a mesh. In addition to guiding the layout process, our metrics can be used to quantify the quality of a layout, e.g., to compare different layouts of the same mesh and to determine whether a given layout is amenable to significant improvement. We show that layouts of unstructured meshes optimized for our metrics improve the performance of visualization applications such as isosurface extraction and view-dependent rendering compared with conventional layouts. Moreover, we improve upon recent cache-oblivious mesh layouts in terms of performance, applicability, and accuracy.
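To make the two kinds of metric concrete, here is a minimal Python sketch of layout-quality measures in the spirit of the abstract, under stated assumptions: a layout maps each vertex to a position in memory; the cache-aware cost counts edges whose endpoints fall in different blocks of a known size; and the cache-oblivious cost averages the logarithm of edge spans, which needs no block size. Lower is better for both. These formulas and the function names are illustrative assumptions, not the paper's exact metrics.

```python
import math
from typing import Hashable, Iterable, Mapping, Tuple

Edge = Tuple[Hashable, Hashable]


def straddling_edges(layout: Mapping[Hashable, int], edges: Iterable[Edge], block_size: int) -> int:
    """Cache-aware cost (assumed form): edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if layout[u] // block_size != layout[v] // block_size)


def log_mean_edge_span(layout: Mapping[Hashable, int], edges: Iterable[Edge]) -> float:
    """Cache-oblivious cost (assumed form): mean log2 distance between edge endpoints."""
    spans = [abs(layout[u] - layout[v]) for u, v in edges]
    # Spans are >= 1 because a layout assigns distinct positions to distinct vertices.
    return sum(math.log2(s) for s in spans) / len(spans)


# Usage: a path graph 0-1-2-3 under two different layouts.
path = [(0, 1), (1, 2), (2, 3)]
identity = {v: v for v in range(4)}
shuffled = {0: 0, 1: 2, 2: 1, 3: 3}
print(straddling_edges(identity, path, 2), log_mean_edge_span(identity, path))  # 1 0.0
print(straddling_edges(shuffled, path, 2), log_mean_edge_span(shuffled, path))
```

The cache-oblivious form is useful precisely because it can rank two layouts without knowing the block size of the target machine.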
ACM Transactions on …, 2005
We present a novel method for computing cache-oblivious layouts of large meshes that improve the performance of interactive visualization and geometric processing algorithms. Given that the mesh is accessed in a reasonably coherent manner, we assume no particular data access patterns or cache parameters of the memory hierarchy involved in the computation. Furthermore, our formulation extends directly to computing layouts of multi-resolution and bounding volume hierarchies of large meshes. We develop a simple and practical cache-oblivious metric for estimating cache misses. Computing a coherent mesh layout is thereby reduced to a combinatorial optimization problem. We designed and implemented an out-of-core multilevel minimization algorithm and tested its performance on unstructured meshes composed of tens to hundreds of millions of triangles. Our layouts can significantly reduce the number of cache misses. We have observed speedups of 2 to 20 times in view-dependent rendering, collision detection, and isocontour extraction without any modification of the algorithms or runtime applications.
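Whether a layout actually reduces cache misses can be checked empirically. The following Python sketch counts misses for an access sequence under a simple model, which is an assumption and not the paper's metric: a fully associative LRU cache holding `num_blocks` blocks of `block_size` elements, with a whole block fetched on every miss. The function name and parameters are hypothetical.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Mapping


def count_cache_misses(
    accesses: Iterable[Hashable],
    layout: Mapping[Hashable, int],
    block_size: int,
    num_blocks: int,
) -> int:
    """Simulate a fully associative LRU cache of whole blocks and count misses."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    misses = 0
    for elem in accesses:
        block = layout[elem] // block_size  # block holding this element
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


# Usage: the same access sequence under two layouts, one-block cache.
seq = [0, 1, 2, 3, 0, 1, 2, 3]
identity = {v: v for v in range(4)}
shuffled = {0: 0, 1: 2, 2: 1, 3: 3}
print(count_cache_misses(seq, identity, 2, 1), count_cache_misses(seq, shuffled, 2, 1))  # 4 8
```

Running it with several `(block_size, num_blocks)` pairs on the same layout is a quick way to see whether an ordering behaves well across cache parameters, which is the property a cache-oblivious layout aims for.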
2012 13th International Workshop on Cellular Nanoscale Networks and their Applications, 2012
2003 Design, Automation and Test in Europe Conference and Exhibition, 2003
Memory access consumes a significant amount of energy in data-transfer-intensive applications. Selecting a memory location in a CMOS memory cell array involves driving row and column select lines. A switching event on a row select line often consumes significantly more energy than a switching event on a column select line. To exploit this difference in energy consumption between row and column select lines, we propose a novel data layout method that aims to minimize row switching activity by assigning spatially and temporally local data items to the same row. The problem of minimum-row-switching data layout is formulated as a multi-way mesh partitioning problem. The constraints imposed on the formulation ensure that the complexity of the address generator required to implement the optimized data layout is bounded and that the data layout optimization can be applied to all address generator synthesis methods. Our experiments demonstrate that our method can significantly reduce row transition counts compared with a row-major data layout.
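The cost this method minimizes can be sketched in Python as a count of row transitions: consecutive accesses that land in different rows of the memory array. The sketch assumes each row holds `row_width` consecutive addresses and compares a row-major layout with a layout matched to the access order; it only evaluates a given layout and does not implement the paper's mesh-partitioning formulation. The function name is hypothetical.

```python
from typing import Iterable, Optional


def row_transitions(addresses: Iterable[int], row_width: int) -> int:
    """Count consecutive accesses that switch to a different memory row."""
    transitions = 0
    prev_row: Optional[int] = None
    for addr in addresses:
        row = addr // row_width  # each row holds row_width consecutive addresses
        if prev_row is not None and row != prev_row:
            transitions += 1
        prev_row = row
    return transitions


# Usage: column-by-column traversal of an n x n array, memory rows of n words.
n = 4
accesses = [(i, j) for j in range(n) for i in range(n)]
row_major = [i * n + j for i, j in accesses]  # element (i, j) stored at i*n + j
col_major = [j * n + i for i, j in accesses]  # layout matched to the access order
print(row_transitions(row_major, n), row_transitions(col_major, n))  # 15 3
```

Here every access under the row-major layout opens a new row, while the matched layout changes rows only when a full row has been consumed.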
Computer Graphics Forum, 2006
We present a novel algorithm to compute cache-efficient layouts of bounding volume hierarchies (BVHs) of polygonal models. Our approach does not make any assumptions about the cache parameters or block sizes of the memory hierarchy. We introduce a new probabilistic model to predict the runtime access patterns of a BVH. Our layout computation algorithm utilizes parent-child and spatial localities between the accessed nodes to reduce both the number of cache misses and the size of the working set. Our algorithm also works well for spatial partitioning hierarchies, including kd-trees. We use our algorithm to compute layouts of BVHs and spatial partitioning hierarchies of large models composed of millions of triangles. We compare our cache-efficient layouts with other layouts in the context of collision detection and ray tracing. In our benchmarks, our layouts consistently perform better than other layouts and improve the performance of these applications by 26% to 300% without any modification of the underlying algorithms or runtime applications.
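For comparison, a classic cache-oblivious layout for trees is the van Emde Boas (vEB) layout, sketched below in Python for a complete binary tree whose nodes use heap numbering (root 1, children 2i and 2i+1). This is a different, well-known technique swapped in for illustration; it is not the probabilistic-model-based algorithm described above, and the function name is hypothetical. The tree is cut at half its height, and the top subtree and then each bottom subtree are laid out recursively, so a root-to-leaf path touches few contiguous runs of memory.

```python
from typing import List


def veb_order(root: int, height: int) -> List[int]:
    """Heap-numbered nodes of the complete subtree at `root` with `height` levels, in vEB order."""
    if height == 1:
        return [root]
    top_h = height // 2
    bottom_h = height - top_h
    # Lay out the top half-tree first.
    order = veb_order(root, top_h)
    # Leaves of the top half-tree sit top_h - 1 levels below root:
    # heap numbers root * 2**(top_h - 1) ... root * 2**(top_h - 1) + 2**(top_h - 1) - 1.
    first_leaf = root << (top_h - 1)
    for leaf in range(first_leaf, first_leaf + (1 << (top_h - 1))):
        # Each child of a top leaf roots one bottom subtree, laid out contiguously.
        for child in (2 * leaf, 2 * leaf + 1):
            order.extend(veb_order(child, bottom_h))
    return order


# Usage: memory slot of each node in a 4-level tree.
order = veb_order(1, 4)
position = {node: pos for pos, node in enumerate(order)}
print(order)  # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```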
IEEE Transactions on …, 2003
Very large triangle meshes, i.e. meshes composed of millions of faces, are becoming common in many applications. Processing, rendering, transmitting, and archiving these meshes are not simple tasks. Mesh simplification and LOD management are a rather mature technology that in many cases can efficiently manage complex data, but only a few available systems can manage meshes of huge size: RAM size is often a severe bottleneck. In this paper we present a data structure called Octree-based External Memory Mesh (OEMM). It supports external memory management of complex meshes, dynamically loading into main memory only the selected sections and preserving data consistency during local updates. The functionalities implemented on this data structure (simplification, detail preservation, mesh editing, visualization, and inspection) can be applied to huge triangle meshes on low-cost PC platforms. The time overhead due to external memory management is affordable. Results of testing our system on complex meshes are presented.
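The core idea of an octree-based external-memory structure, spatially grouping triangles into bounded-size leaves that can be loaded independently, can be sketched in Python. This is not the OEMM implementation: the sketch assigns each triangle to a leaf by its centroid (an assumption), splits a cell whenever it holds more than `capacity` triangles, and the function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Leaf = Tuple[Vec3, Vec3, List[int]]


def build_octree_buckets(
    centroids: Sequence[Vec3],
    lo: Vec3,
    hi: Vec3,
    capacity: int,
    max_depth: int = 10,
) -> List[Leaf]:
    """Split the box [lo, hi] until each leaf holds at most `capacity` triangles."""
    leaves: List[Leaf] = []

    def split(idx: List[int], lo: Vec3, hi: Vec3, depth: int) -> None:
        # Stop when the cell is small enough, or too deep (bounds recursion on duplicates).
        if len(idx) <= capacity or depth == max_depth:
            leaves.append((lo, hi, idx))
            return
        mid = tuple((a + b) / 2 for a, b in zip(lo, hi))
        children: List[List[int]] = [[] for _ in range(8)]
        for i in idx:
            c = centroids[i]
            # Bit k of the octant index is set if the centroid is in the upper half along axis k.
            octant = sum(1 << k for k in range(3) if c[k] >= mid[k])
            children[octant].append(i)
        for octant, child in enumerate(children):
            if not child:
                continue  # empty octants produce no leaf in this sketch
            clo = tuple(mid[k] if (octant >> k) & 1 else lo[k] for k in range(3))
            chi = tuple(hi[k] if (octant >> k) & 1 else mid[k] for k in range(3))
            split(child, clo, chi, depth + 1)

    split(list(range(len(centroids))), lo, hi, 0)
    return leaves


# Usage: one triangle centroid per octant of the unit cube, at most one per leaf.
pts = [(x, y, z) for z in (0.25, 0.75) for y in (0.25, 0.75) for x in (0.25, 0.75)]
leaves = build_octree_buckets(pts, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), capacity=1)
print(len(leaves))  # 8
```

Each returned leaf could then be stored as a separate chunk on disk and loaded into main memory only when the corresponding region is selected.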
1998
One of the main challenges in global illumination is rendering scenes with millions of polygons and megabytes of textures. Combining the processing power and memory of multiple processors or workstations to render these complex scenes is an attractive proposition, but the complex interactions between data and processing introduce a significant amount of data communication. Data locality methods may improve cache coherence and cache access coherence by finding an optimal data partitioning, by re-ordering computations, and by replacing complex geometry with simplified image-based representations. We review the different data locality methods and focus on local caching of global radiance values. We present the results of an implementation in the ray tracing program Radiance.
Computing Research Repository, 2011
Eurographics Workshop on Parallel Graphics and Visualization, 2012
… on Numerical Grid …, 2007
International Symposium on Code Generation and Optimization (CGO 2003), 2003
Lecture Notes in Computer Science, 2004
Proceedings of the 14th International Symposium on Systems Synthesis (ISSS '01), 2001
Concurrency: Practice and Experience, 1993
IEEE Transactions on Computers, 2000
… of Conference on Parallel and Distributed …, 2002