2011, Proceedings of the …
General-purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community, as it promises large computational power at a very low price. GPGPU is best suited to regular data-parallel algorithms. It is not directly amenable to algorithms with irregular data access patterns, such as convex hull and list ranking. In this paper, we present a GPU-optimized implementation for finding the convex hull of a two-dimensional point set. Our implementation tries to minimize the impact of irregular data access patterns. It can find the convex hull of 10 million random points in less than 0.2 seconds and achieves a speedup of up to 14 over the standard sequential CPU implementation. We also discuss some of the practical issues relating to the implementation of convex hull algorithms on massively multithreaded architectures like that of the GPU.
Cornell University - arXiv, 2022
Abstract: The convex hull is a fundamental geometrical structure for many applications where groups of points must be enclosed or represented by a convex polygon. Although efficient sequential convex hull algorithms exist and are constantly used in applications, their computation time is often considered an issue for time-sensitive tasks such as real-time collision detection, clustering, or image processing for virtual reality, among others, where fast response times are required. In this work we propose a parallel GPU-based adaptation of heaphull, a state-of-the-art CPU algorithm that computes the convex hull by first running an efficient filtering stage and then the actual convex hull computation. More specifically, this work parallelizes the filtering stage, adapting it to the GPU programming model as a series of parallel reductions. Experimental evaluation shows that the proposed implementation significantly improves the performance of the convex hull computation, reaching a speedup of up to 4× over the sequential CPU-based heaphull and between 3× and 4× over existing GPU-based approaches.
2012
We present a novel algorithm to compute the convex hull of a point set in R^3 using the graphics processing unit (GPU). By exploiting the relationship between the Voronoi diagram and the convex hull, we derive the answer from the former. Our algorithm only requires a few simple atomic operations and does not need explicit locking or any other concurrency control mechanism, thus it can maximize the parallelism available on the modern GPU. Our implementation using the CUDA programming model on Nvidia GPUs is robust, exact, and efficient. The experiments show that it is up to an order of magnitude faster than other sequential convex hull implementations running on the CPU for inputs of millions of points. We further extend our GPU approach to obtain the Delaunay triangulation of points in R^3 by computing their 4D convex hull. Our work demonstrates that the GPU can be used to solve non-trivial computational geometry problems with significant performance benefit, without sacrificing accuracy or robustness.
International Journal of Computational Geometry & Applications, 1996
We present a parallel algorithm for finding the convex hull of a sorted point set. The algorithm runs in O(log log n) (doubly logarithmic) time using n / log log n processors on a Common CRCW PRAM. To break the Ω(log n / log log n) time barrier required to output the convex hull in a contiguous array, we introduce a novel data structure for representing the convex hull. The algorithm is optimal in two respects: (1) the time-processor product of the algorithm, which is linear, cannot be improved, and (2) the running time, which is doubly logarithmic, cannot be improved even by using a linear number of processors. The algorithm demonstrates the power of the “divide-and-conquer doubly logarithmic paradigm” by presenting a non-trivial extension to situations that previously were known to have only slower algorithms.
Computers & Graphics, 2012
In this paper, we present a novel parallel algorithm for computing the convex hull of a set of points in 3D using the CUDA programming model. It is based on the QuickHull approach and starts by constructing an initial tetrahedron using four extreme points, discards the internal points, and distributes the external points to the four faces. It then proceeds iteratively. In each iteration, it refines the faces of the polyhedron, discards the internal points, and redistributes the remaining points for each face among its children faces. The refinement of a face is performed by selecting the furthest point from its associated points and generating three children triangles. In each iteration, concave edges are swapped, and concave vertices are removed to maintain convexity. The face refinement procedure is performed on the CPU, because it requires a very small fraction of the execution time (approximately 1%), and the intensive point redistribution is performed in parallel on the GPU. Our implementation outpaced the CPU-based Qhull implementation by 30 times for 10 million points and 40 times for 20 million points.
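The refinement loop described in this abstract (pick extreme points, discard interior points, distribute the remaining points to faces, then split each face at its furthest point) can be illustrated with a two-dimensional, sequential analogue. The sketch below is not the paper's 3D CUDA implementation; the function names `quickhull_2d` and `_refine` are hypothetical, and edges play the role that triangular faces play in 3D.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b lies left of ray o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _refine(a, b, outside):
    """Return the hull vertices strictly between a and b.

    `outside` holds the points lying strictly to the left of edge a->b.
    """
    if not outside:
        return []
    # The furthest point from the edge is guaranteed to be a hull vertex.
    far = max(outside, key=lambda p: cross(a, b, p))
    # Redistribute the points to the two child edges; points inside
    # triangle (a, far, b) fall into neither list and are discarded.
    left_of_af = [p for p in outside if cross(a, far, p) > 0]
    left_of_fb = [p for p in outside if cross(far, b, p) > 0]
    return _refine(a, far, left_of_af) + [far] + _refine(far, b, left_of_fb)


def quickhull_2d(points):
    """Convex hull vertices in clockwise order, starting at the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lo, hi = pts[0], pts[-1]  # two extreme points form the initial "edge pair"
    above = [p for p in pts if cross(lo, hi, p) > 0]
    below = [p for p in pts if cross(hi, lo, p) > 0]
    return [lo] + _refine(lo, hi, above) + [hi] + _refine(hi, lo, below)


print(quickhull_2d([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0), (0.5, 1.5)]))
```

In a GPU setting, the two list comprehensions inside `_refine` correspond to the point redistribution step, which is the part the abstract reports as dominating the running time.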
Journal of Algorithms, 1997
In this paper we present a truly practical and provably optimal O(n log h) time output-sensitive algorithm for the planar convex hull problem. The basic algorithm is similar to the algorithm presented in Chan, Snoeyink and Yap [2] where the median-finding step is replaced by an approximate median. We analyze two such schemes and show that for both methods, the algorithm runs in expected O(n log h) time. The expected number of comparisons can be made smaller than 5n log h for the upper hull. We further show that the probability of deviation from expected running time approaches 0 rapidly with increasing values of n and h for any input. Our experiments suggest that this algorithm is a practical alternative to the worst-case O(n log n) algorithms like Graham's and especially faster for small output sizes. Our approach bears some resemblance to a recent algorithm of Wenger [13] but our analysis is substantially different.
The planar convex hull problem is perhaps the most studied problem in computational geometry and a large body of literature deals with computing convex hulls. Graham [5] was the first to present an O(n log n) worst-case time algorithm. This algorithm is optimal as Yao [14] showed that Ω(n log n) is the lower bound of the convex hull problem for the worst-case input. Some simple algorithms have O(n) expected time for known distributions of points such as uniform in a box, normal, etc. The first output-sensitive algorithm was proposed by Chand and Kapur [3]. The two-dimensional version of their algorithm is known as the rope fence method and was independently reported by Jarvis [6]. The rope fence method takes O(nh) time to compute h extreme edges of the convex hull. Kirkpatrick and Seidel [8] proved an Ω(n log h) lower bound when both input and output sizes are considered, so Yao's lower bound is a special case when log h ∈ Ω(log n). They also proposed an O(n log h) optimal algorithm based on the prune-and-search technique developed by Dyer [4] and Megiddo [9]. However, it has high constants and is considered prohibitively complicated for implementation. Very recently, in [1], two O(n log h) algorithms have been proposed. One uses the linear-time median-finding algorithm and the other uses a clever grouping technique. Although the latter algorithm does not have any expensive median-finding step, it relies on a sophisticated logarithmic-time tangent-finding routine.
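The O(nh) rope fence (Jarvis march) method mentioned above is the simplest output-sensitive baseline, and a short sketch makes the bound concrete: each of the h hull vertices is found with one O(n) scan. This is an illustrative implementation under the assumption that points are coordinate tuples; the name `jarvis_march` is hypothetical, and it is not the O(n log h) algorithm the paper proposes.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b lies left of ray o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a, b):
    """Squared Euclidean distance between a and b."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def jarvis_march(points):
    """Convex hull vertices in counter-clockwise order, starting at the lowest leftmost point."""
    pts = list(set(points))
    if len(pts) < 3:
        return sorted(pts)
    start = min(pts)  # the lexicographic minimum is always a hull vertex
    hull = [start]
    while True:
        cur = hull[-1]
        cand = pts[0] if pts[0] != cur else pts[1]
        for p in pts:  # one O(n) scan per hull vertex, O(nh) overall
            if p == cur:
                continue
            turn = cross(cur, cand, p)
            # Wrap the "rope": prefer points to the right of cur->cand,
            # and among collinear points keep the farthest one.
            if turn < 0 or (turn == 0 and dist2(cur, p) > dist2(cur, cand)):
                cand = p
        if cand == start:
            return hull
        hull.append(cand)


print(jarvis_march([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0), (0.5, 1.5)]))
```

When h is small relative to log n this beats an O(n log n) sort-based method, which is the gap that O(n log h) algorithms close for every h.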
An in-place algorithm is one in which the output is given in the same location as the input and only a small amount of additional memory is used by the algorithm. In this paper we describe three in-place algorithms for computing the convex hull of a planar point set. All three algorithms are optimal, some more so than others. . .
Bit Numerical Mathematics, 1990
We present a parallel algorithm for finding the convex hull of a sorted set of points in the plane. Our algorithm runs in O(log n / log log n) time using O(n log log n / log n) processors in the Common CRCW PRAM computational model, which is shown to be time and cost optimal. The algorithm is based on n^(1/3) divide-and-conquer and uses a simple pointer-based data structure.
Parallel Computing, 2001
We investigate the problem of finding the 2-D convex hull of a set of points on a coarse-grained parallel computer. Goodrich has devised a parallel sorting algorithm for n items on P processors which achieves an optimal number of communication phases for all ranges of P ≤ n. Ferreira et al. have recently introduced a deterministic convex hull algorithm with a constant number of communication phases for n and P satisfying n ≥ P^(1+ε). Here we present a new parallel 2-D convex hull algorithm with an optimal bound on the number of communication phases for all values of P ≤ n while maintaining optimal local computation time.
2011
Figure 1: Some phases of the gHull algorithm: (a) input point set, (b) Voronoi construction, (c) star identification, (d) hull completion.
Applied Mathematical Modelling, 2006
High-performance machines are now available to an increasing number of researchers. Many of us have access to both a supercomputing center and an algorithm that could benefit from these high-performance machines. The aim of the present work is to revisit all existing parallelization alternatives, including emerging technologies like software-only speculative parallelization, to solve the same representative problem on different architectures: the computation of the convex hull of a point set.
IEEE Transactions on Computers, 1988
In this paper, we present parallel algorithms to identify (i.e., detect and enumerate) the extreme points of the convex hull of a set of planar points using a hypercube, pyramid, tree, mesh-of-trees, mesh with reconfigurable bus, EREW PRAM, and a modified AKS network. It is known that the problem of identifying the convex hull for a set of planar points given arbitrarily cannot be solved faster than sorting. For the situation where the input set of n planar points is given ordered (by x-coordinate) one per processor on a machine with Θ(n) processors, we introduce a worst-case hypercube algorithm that finishes in Θ(log n) time, a worst-case algorithm for the pyramid, tree, and mesh-of-trees that finishes in Θ(log^3 n / (log log n)^2) time, and a worst-case algorithm for the mesh with a reconfigurable bus that finishes in Θ(log^2 n) time. Notice that for ordered data the sorting bound does not apply. We also show that our Θ(log n) time hypercube algorithm for ordered data extends to yield an optimal time and processor Θ(log n) worst-case time EREW PRAM algorithm for the case where the set of planar points is distributed arbitrarily one point per processor. We also show that this algorithm can be extended to run in worst-case Θ(log n) time on a modified AKS network, giving the first optimal Θ(log n) time algorithm for solving the convex hull problem for arbitrary planar input on a fixed-degree network.
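The remark that the sorting bound does not apply to ordered data has a simple sequential counterpart: once the points are sorted by x-coordinate, a single stack-based scan (Andrew's monotone chain) finds the hull in O(n) time. The sketch below shows that sequential scan only, not any of the parallel algorithms in the abstract; the name `hull_of_sorted` is hypothetical, and the input is assumed to be lexicographically sorted with no duplicate points.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b lies left of ray o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_of_sorted(pts):
    """Convex hull (counter-clockwise) of points already sorted by (x, y).

    Assumes `pts` is sorted lexicographically and contains no duplicates.
    Each point is pushed and popped at most once per chain, so the scan is O(n).
    """
    if len(pts) < 3:
        return list(pts)

    def chain(seq):
        out = []
        for p in seq:
            # Pop while the last two kept points and p fail to make a left turn.
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out

    lower = chain(pts)            # left to right along the bottom
    upper = chain(reversed(pts))  # right to left along the top
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


ordered = sorted([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)])
print(hull_of_sorted(ordered))
```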
Procedia Computer Science
Computing the convex hull of a set of points is a fundamental issue in many fields, including geometric computing, computer graphics, and computer vision. This problem is computationally challenging, especially when the number of points runs into the millions. In this paper, we propose a fast filtering technique that reduces the computational cost of computing a convex hull for a large set of points. The proposed method preprocesses the input set and filters out all points inside a four-vertex polygon. The experimental results show that the proposed filtering approach achieves speedups of up to 77× and 12× over the standard Graham scan and Jarvis march algorithms, respectively.
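A four-vertex filter of this kind can be sketched as follows. The abstract does not say how the four vertices are chosen, so this example assumes the common choice of the extreme points along each axis (leftmost, bottommost, rightmost, topmost); the function name `four_vertex_filter` is hypothetical. Any point strictly inside the quadrilateral cannot be a hull vertex and is dropped in O(n) time before the hull algorithm proper runs.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b lies left of ray o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def four_vertex_filter(points):
    """Return the points that are NOT strictly inside the extreme-point quadrilateral."""
    if len(points) < 3:
        return list(points)
    left = min(points, key=lambda p: p[0])
    right = max(points, key=lambda p: p[0])
    bottom = min(points, key=lambda p: p[1])
    top = max(points, key=lambda p: p[1])

    # Counter-clockwise order; drop repeated vertices (e.g. leftmost == bottommost).
    quad = []
    for v in (left, bottom, right, top):
        if v not in quad:
            quad.append(v)
    if len(quad) < 3:
        return list(points)  # degenerate polygon, nothing can be filtered

    def strictly_inside(p):
        k = len(quad)
        return all(cross(quad[i], quad[(i + 1) % k], p) > 0 for i in range(k))

    # Points on the boundary are kept, so the filter never removes a hull vertex.
    return [p for p in points if not strictly_inside(p)]


pts = [(0, 1), (1, 0), (2, 1), (1, 2), (1, 1), (0, 0), (2, 2)]
print(four_vertex_filter(pts))
```

The surviving candidates can then be passed to any hull algorithm; the benefit depends on the input distribution, since points spread on a circle leave almost nothing to filter.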
This article explores three basic approaches to parallelize the planar Convex Hull on computer clusters. Methods which are based on sorting the points are found to be inadequate for computer clusters due to the communication costs involved. In this article we discuss several methods for parallelizing the computation of planar Convex Hulls. These methods are based on three sequential Convex Hull algorithms. Two of the algorithms we have considered have optimal time complexity and require a preprocessing step where points are lexicographically ordered. The third one shows the best performance on randomized input sets, where complexity is expected (but not guaranteed) to be optimal. In this paper we show that the high communication cost associated with computer clusters makes the sorted-input algorithms perform badly. The rest of the article is organized as follows. In Section II we define the Convex Hull of a set of points in the plane and introduce three algorithms for its...
Journal of Computational and Applied Mathematics, 2019
This work presents an optimization technique that reduces the computational cost of building the Convex Hull from a set of points. The proposed method pre-processes the input set, filtering all points inside an eight-vertex polygon in O(n) time, and returns a reduced set of candidate points, ordered and distributed across four priority queues. Experimental results show that for a normal distribution of points in two-dimensional space, the filtering approach in conjunction with the Graham scan is up to 10× faster than the qhull library, and between 1.7× and 10× faster than the Convex Hull methods available in the CGAL library. Results on the worst-case scenario (when all points lie on a circumference) show that a slight random radial displacement of the points makes this method the fastest one. Moreover, when increasing the magnitude of this displacement, the performance of the proposed method scales at a faster rate than the other methods. In terms of memory efficiency, the proposed implementation uses 3× to 6× less memory than the other methods. The reason for this memory improvement is that the proposed method stores indices into the input arrays, avoiding duplicates of the original floating-point values. Furthermore, the approach extends the problem size up to n ≤ 2^40 by employing 5-byte indices (instead of 8 bytes) when n ≥ 2^32. The optimization technique presented in this work has been shown to be significantly useful in accelerating the computation of the Convex Hull, and it is not limited to the combination with the Graham scan: it can also be used in conjunction with other Convex Hull algorithms.
Lecture Notes in Computer Science, 2004
Finding the fastest algorithm to solve a problem is one of the main issues in Computational Geometry. Focusing only on worst-case analysis or asymptotic computations leads to the development of complex data structures or hard-to-implement algorithms. Randomized algorithms appear in this scenario as a very useful tool for obtaining easier implementations within a good expected time bound. However, parallel implementations of these algorithms are hard to develop and require an in-depth understanding of the language, the compiler, and the underlying parallel computer architecture. In this paper we show how speculative parallelization techniques can be used to execute in parallel iterative algorithms such as randomized incremental constructions. We focus on the convex hull problem and show that, using our speculative parallelization engine, the sequential algorithm can be automatically executed in parallel, obtaining speedups with as few as four processors and reaching a 5.15x speedup with 28 processors.
Theoretical Computer Science, 2004
A space-efficient algorithm is one in which the output is given in the same location as the input and only a small amount of additional memory is used by the algorithm. We describe four space-efficient algorithms for computing the convex hull of a planar point set.
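The idea of returning the output in the same location as the input can be illustrated with a Graham-style scan that keeps its stack in the prefix of the input array and uses swaps instead of a separate output list. The sketch below handles only the upper hull and is not claimed to be any of the paper's four algorithms; the name `inplace_upper_hull` is hypothetical, the points are assumed distinct, and Python's `list.sort` is used for brevity even though it may allocate temporary memory (a heapsort would be needed for a strictly space-efficient version).

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b lies left of ray o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inplace_upper_hull(pts):
    """Rearrange `pts` so that pts[:h] is the upper hull, left to right; return h.

    The scan itself uses O(1) extra variables: the stack of hull candidates
    lives in pts[0:h], and discarded points are swapped behind it.
    """
    pts.sort()  # assumption: stands in for an in-place sort such as heapsort
    n = len(pts)
    if n < 3:
        return n
    h = 1  # pts[0] is always on the hull
    for i in range(1, n):
        # Pop candidates that would make a non-right turn with pts[i].
        while h >= 2 and cross(pts[h - 2], pts[h - 1], pts[i]) >= 0:
            h -= 1
        # Move pts[i] onto the top of the stack; the displaced point
        # has already been processed, so parking it at index i is safe.
        pts[i], pts[h] = pts[h], pts[i]
        h += 1
    return h


data = [(2, 2), (1, 1), (0, 0), (2, 0), (0, 2), (1, 0)]
h = inplace_upper_hull(data)
print(data[:h], data[h:])
```

After the call, the list still contains every input point; only their order has changed, with the hull chain first.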
IEEE Symposium on Foundations of Computer Science, 1994
We give fast randomized and deterministic parallel methods for constructing convex hulls in R^d, for any fixed d. Our methods are for the weakest shared-memory model, the EREW PRAM, and have optimal work bounds (with high probability for the randomized methods). In particular, we show that the convex hull of n points in R^d can be constructed in O(log n) time using O(n log n + n^⌊d/2⌋) work, with
Theory of Computing Systems, 1997
We present a randomized parallel algorithm for constructing the three-dimensional convex hull on a generic p-processor coarse-grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p ≥ p^(2+ε) (for some arbitrarily small ε > 0). For any given set of n points in 3-space, the algorithm computes the three-dimensional convex hull, with high probability, in O((n log n)/p) local computation time and O(1) communication phases with at most O(n/p) data sent/received by each processor. That is, with high probability, the algorithm computes the three-dimensional convex hull of an arbitrary point set in time O((n log n)/p + Γ_{n,p}), where Γ_{n,p} denotes the time complexity of one communication phase. The assumption n/p ≥ p^(2+ε) implies a coarse-grained, limited-parallelism model which is applicable to most commercially available multiprocessors.
Computing in Science & Engineering, 2009