Large-scale simulation is moving toward sustained teraflop rates. This simulation power enables the most cutting-edge large-scale resources to solve large and challenging problems in science and engineering. For finite element simulations, a significant challenge is how to generate meshes with billions of nodes and elements and how to deliver such meshes to the processors of such large-scale systems. In this work, we discuss strategies ranging from parametric mesh generators, suitable for simple geometries, to general approaches using octree-based meshes for immersed boundary geometries.
Concurrency and Computation: Practice and Experience, 2012
We show a parallel implementation and performance analysis of a linear octree-based mesh generation scheme designed to create reasonable-quality, geometry-adapted unstructured hexahedral meshes automatically from triangulated surface models. We present algorithms for the construction, 2:1 balancing, and meshing of large linear octrees on supercomputers. Our scheme uses efficient computer graphics algorithms for surface detection, allowing us to represent complex geometries. An isogranular analysis demonstrates good scalability. Our implementation is able to execute the 2:1 balancing operation over 3.4 billion octants in less than 10 s per 1.6 million octants per CPU core.
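The linear-octree idea underlying this scheme can be sketched briefly: instead of a pointer-based tree, octants are kept as a flat list sorted by Morton (Z-order) code, which is what makes partitioning them across processors straightforward. The sketch below is a minimal illustration of that concept under our own naming, not the authors' implementation.

```python
# Minimal sketch of a linear octree: octants are stored as a sorted list
# of Morton (Z-order) codes rather than as a pointer-based tree.
# All names here are illustrative, not taken from the paper's code.

def morton_encode(x, y, z, level):
    """Interleave the bits of (x, y, z) to get the octant's Morton code."""
    code = 0
    for i in range(level):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# Octants at level 2 (grid coordinates in [0, 3]); sorting by Morton code
# yields the "linear octree" ordering used for parallel partitioning.
octants = [(3, 1, 0), (0, 0, 0), (1, 1, 1), (2, 0, 3)]
linear = sorted(octants, key=lambda o: morton_encode(*o, level=2))
print(linear[0])  # (0, 0, 0) has Morton code 0 and comes first
```

Because Morton order is a space-filling curve, cutting the sorted list into equal chunks gives each processor a spatially compact set of octants.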
53rd AIAA Aerospace Sciences Meeting, 2015
Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
2011
Scientists commonly turn to supercomputers or Clusters of Workstations with hundreds (even thousands) of nodes to generate meshes for large-scale simulations. Parallel mesh generation software is then used to decompose the original mesh generation problem into smaller sub-problems that can be solved (meshed) in parallel. The size of the final mesh is limited by the amount of aggregate memory of the parallel machine. Also, requesting many compute nodes on a shared computing resource may result in a long wait, far surpassing the time it takes to solve the problem. These two problems (i.e., insufficient memory when computing on a small number of nodes, and long waiting times when using many nodes of a shared computing resource) can be addressed by using out-of-core algorithms. These are algorithms that keep most of the dataset out of core (i.e., outside of memory, on disk) and load only a portion in core (i.e., into memory) at a time. We explored two approaches to out-of-core comp...
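The out-of-core pattern this abstract describes can be sketched in a few lines: keep the mesh data on disk and stream one fixed-size block into memory at a time, so peak memory is bounded by the block size rather than the mesh size. The binary record format and function names below are hypothetical, chosen only to make the idea concrete.

```python
# Sketch of out-of-core processing: node coordinates live on disk and are
# streamed into memory in fixed-size blocks. Format and names are
# illustrative, not from the paper.
import os
import struct
import tempfile

def write_nodes(path, coords):
    """Write (x, y, z) node coordinates as packed binary records."""
    with open(path, "wb") as f:
        for x, y, z in coords:
            f.write(struct.pack("3d", x, y, z))

def stream_nodes(path, block_size=2):
    """Yield blocks of at most `block_size` nodes; never load the whole file."""
    record = struct.calcsize("3d")
    with open(path, "rb") as f:
        while True:
            buf = f.read(record * block_size)
            if not buf:
                break
            yield [struct.unpack_from("3d", buf, i * record)
                   for i in range(len(buf) // record)]

path = os.path.join(tempfile.mkdtemp(), "nodes.bin")
write_nodes(path, [(float(i), 0.0, 0.0) for i in range(5)])
blocks = list(stream_nodes(path))
print(len(blocks))  # 3 blocks: 2 + 2 + 1 nodes
```

A real out-of-core mesher adds a replacement policy deciding which in-core blocks to evict, which is exactly where the two approaches compared in this work (and in the related IDAACS paper below) differ.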
2006
Parallel supercomputing has traditionally focused on the inner kernel of scientific simulations: the solver. The front and back ends of the simulation pipeline - problem description and interpretation of the output - have taken a back seat to the solver when it comes to attention paid to scalability and performance, and are often relegated to offline, sequential computation. As the largest simulations move beyond the realm of the terascale and into the petascale, this decomposition in tasks and platforms becomes increasingly untenable. We propose an end-to-end approach in which all simulation components - meshing, partitioning, solver, and visualization - are tightly coupled and execute in parallel with shared data structures and no intermediate I/O. We present our implementation of this new approach in the context of octree-based finite element simulation of earthquake ground motion. Performance evaluation on up to 2048 processors demonstrates the ability of the end-to-end approach to overcome the scalability bottlenecks of the traditional approach.
2012
Dealing with large simulations is a growing challenge. Ideally, for well-parallelized software prepared for high performance, the problem-solving capability depends only on the available hardware resources. In practice, however, several technical details reduce the scalability of the system and prevent the effective use of such software for large problems. In this work we describe solutions implemented in order to obtain a scalable system to solve and visualize large-scale problems. The present work is based on the Kratos MultiPhysics [1] framework in combination with the GiD [2] pre- and post-processor. The applied techniques are verified by CFD simulation and visualization of a wind tunnel problem with more than 100 million elements on our in-house cluster at CIMNE.
Engineering with Computers, 2006
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible to achieve excellent scalability even for complex applications. Charm++'s message-driven model enables computation/communication overlap, while its run-time load balancing capabilities make it possible to react to the changes in computational load that occur in dynamic physics applications. The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.
Computer Methods in Applied Mechanics and Engineering, 1996
Mesh partitioning is often the preferred approach for solving unstructured computational mechanics problems on massively parallel processors. Research in this area has focused so far on the automatic generation of subdomains with minimum interface points. In this paper, we address this issue and emphasize other aspects of the partitioning problem, including the fast generation of large-scale mesh decompositions on conventional workstations, the optimization of initial decompositions for specific kernels such as parallel frontal solvers and domain decomposition based iterative methods, and parallel adaptive refinement. More specifically, we discuss a two-step partitioning paradigm for tailoring generated mesh partitions to specific applications, and propose a simple mesh contraction procedure for speeding up the optimization of initial mesh decompositions. We discuss what defines a good mesh partition for a given problem, and show that the methodology proposed herein can produce better mesh partitions than the celebrated multilevel Recursive Spectral Bisection algorithm, and yet be an order of magnitude faster. We illustrate the combined two-step partitioning and contraction methodology with several examples from structural mechanics and fluid dynamics problems, and highlight its impact on the total solution time of realistic applications on current massively parallel processors. In particular, we show that the minimum interface size criterion does not have a significant impact on a reasonably well parallelized application, and highlight other criteria that can have a significant impact.
This paper discusses the implementation of a distributed geometry for parallel mesh generation, involving dynamic load-balancing and hence dynamic re-partitioning of the geometry. A novel approach is described for improving the efficiency of the distributed geometry interface when dealing with irregular shaped mesh partitions.
Advances in Engineering Software, 2022
This paper presents an algorithm for highly-parallel loading and processing of unstructured mesh databases in a distributed memory environment of large HPC clusters without collecting data into a single process. The algorithm is proved effective, having linear speedup in the large dataset limit. Demonstrated on Ansys CDB, EnSight, VTK Legacy, and XDMF databases, we show that it is possible to efficiently reconstruct meshes with 800 million nodes and 500 million elements in several seconds on thousands of processors, even from databases that were not designed to be read in parallel. The algorithm is implemented in our MESIO library that can be used as (i) an efficient parallel loader (e.g. for numerical physical solvers) or as (ii) a high performing parallel converter between mesh databases.
Proceedings of the 18th International Meshing Roundtable, 2009
Mesh generation is a critical component for many (bio-)engineering applications. However, parallel mesh generation codes, which are essential for these applications to take the fullest advantage of the high-end computing platforms, belong to the broader class of adaptive and irregular problems, and are among the most complex, challenging, and labor intensive to develop and maintain. As a result, parallel mesh generation is one of the last applications to be installed on new parallel architectures. In this paper we present a way to remedy this problem for new highly-scalable architectures. We present a multi-layered tetrahedral/triangular mesh generation approach capable of delivering and sustaining close to 10^18 concurrent work units. We achieve this by leveraging concurrency at different granularity levels using a hybrid algorithm, and by carefully matching these levels to the hierarchy of the hardware architecture. This paper makes two contributions: (1) a new evolutionary path for developing multi-layered parallel mesh generation codes capable of increasing the concurrency of the state-of-the-art parallel mesh generation methods by at least 10 orders of magnitude and (2) a new abstraction for multi-layered runtime systems that target parallel mesh generation codes, to efficiently orchestrate intra- and inter-layer data movement and load balancing for current and emerging multi-layered architectures with deep memory and network hierarchies.
2008
Generating finite-element meshes is a serious bottleneck for large parallel simulations. When mesh generation is limited to serial machines and element counts approach a billion, this bottleneck becomes a roadblock. pamgen is a parallel mesh generation library that allows on-the-fly scalable generation of hexahedral and quadrilateral finite element meshes for several simple geometries. It has been used to generate more than 1.1 billion elements on 17,576 processors. pamgen generates an unstructured finite element mesh on each processor at the start of a simulation. The mesh is specified by commands passed to the library as a C-language string. The resulting mesh geometry, topology, and communication information can then be queried through an API. pamgen allows specification of boundary condition application regions using sidesets (element faces) and nodesets (collections of nodes). It supports several simple geometry types, offers multiple alternatives for mesh grading, and provides several alternatives for the initial domain decomposition. pamgen makes it easy to change details of the finite element mesh and is very useful for performance studies and scoping calculations.
2013
The ever-growing demand for higher accuracy in scientific simulations based on the discretization of equations given on physical domains is typically coupled with an increase in the number of mesh elements. Conventional mesh generation tools struggle to keep up with the increased workload, as they do not scale with the availability of, for example, multi-core CPUs. We present a parallel mesh generation approach for multi-core and distributed computing environments based on our generic meshing library ViennaMesh and on the Advancing Front mesh generation algorithm. Our approach is discussed in detail and performance results are shown.
Mathematics and Computers in Simulation, 2007
Mesh generation is a critical step in high fidelity computational simulations. High-quality and high-density meshes are required to accurately capture the complex physical phenomena. A robust approach for a parallel framework has been developed to generate large-scale meshes in a short period of time. A coarse tetrahedral mesh is generated first to provide the basis of block interfaces and then is partitioned into a number of sub-domains using METIS partitioning algorithms. A volume mesh is generated on each sub-domain in parallel using an advancing front method. Dynamic load balancing is achieved by evenly distributing work among the processors. All the sub-domains are combined to create a single volume mesh. The combined volume mesh can be smoothed to remove the artifacts in the interfaces between sub-domains. A void region is defined inside each sub-domain to reduce the data points during the smoothing operation. The scalability of the parallel mesh generation is evaluated to quantify the improvement on shared- and distributed-memory computer systems.
2015
In this poster we present our preliminary results pertinent to the integration of multiple parallel Delaunay mesh generation methods into a coherent hierarchical framework. The goal of this project is to study our telescopic approach and to develop Delaunay-based methods to explore concurrency at all hardware layers using abstractions at (a) medium-grain level for many cores within a single chip and (b) coarse-grain level, i.e., sub-domain level, using proper error-metric- and application-specific continuous decomposition methods. © 2015 The Authors. Published by Elsevier Ltd. Peer-review under responsibility of the organizing committee of the 24th International Meshing Roundtable (IMR24).
2015
In this paper, we describe an array-based hierarchical mesh generation capability through uniform refinement of unstructured meshes for efficient solution of PDEs using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial mesh, which can be used for a number of purposes ranging from multi-level methods to generating large meshes. The capability is developed under the parallel mesh framework "Mesh Oriented dAtaBase", a.k.a. MOAB [16]. We describe the underlying data structures and algorithms to generate such hierarchies and present numerical results for computational efficiency and mesh quality. We also present results to demonstrate the applicability of the developed capability to a multigrid finite-element solver.
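The uniform refinement step that produces such a nested hierarchy is simple to state: each triangle is split into four children by inserting edge midpoints, so every coarse element exactly contains its fine-level children. The sketch below illustrates one refinement level under our own naming; it is not the MOAB implementation, which uses array-based storage for scalability.

```python
# Sketch of one level of uniform triangle refinement: split each triangle
# into four by inserting edge midpoints. Illustrative only; MOAB's actual
# hierarchy is array-based and parallel.
def refine(vertices, triangles):
    vertices = list(vertices)
    midpoint = {}  # edge (a, b) -> index of its midpoint vertex

    def mid(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint:
            ax, ay = vertices[a]
            bx, by = vertices[b]
            vertices.append(((ax + bx) / 2, (ay + by) / 2))
            midpoint[key] = len(vertices) - 1
        return midpoint[key]

    out = []
    for a, b, c in triangles:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        # One corner triangle per original vertex, plus the center triangle.
        out += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, out

verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tris = [(0, 1, 2)]
verts, tris = refine(verts, tris)
print(len(tris), len(verts))  # 4 triangles, 6 vertices
```

Because midpoints are shared between neighboring triangles via the edge dictionary, the refined mesh stays conforming, which is what multigrid transfer operators rely on.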
Communications in Numerical Methods in Engineering, 1994
The paper demonstrates an approach to generate three-dimensional boundary-fitted computational meshes efficiently. One basic idea underlying the present study is that often similar geometries have to be meshed, and therefore an efficient mesh-adaptation method, which allows adaptation of the topological mesh to the specific geometry, would be more efficient than generating all new meshes. On the other hand, mesh generation for Cartesian topologies has been shown to be a very simple task: it can be executed by connecting and removing brick elements to a basic cube. In connection with a so-called 'Macro Command Language', a high degree of automation can be reached when adapting topologically defined meshes to a surface. Furthermore, high mesh quality has proved to be the key to good simulation results. During mesh generation it is important to provide the possibility of modifying the mesh quality and also the mesh density at any time of the meshing process. Using this generation method the meshing time is reduced; e.g., a computational mesh for a two-valve cylinder head can be generated within a few hours.
Advances in Engineering Software, 2013
This work describes a technique for generating two-dimensional triangular meshes using distributed memory parallel computers, based on a master/slaves model. This technique uses a coarse quadtree to decompose the domain and a serial advancing front technique to generate the mesh in each subdomain concurrently. In order to advance the front to a neighboring subdomain, each subdomain suffers a shift in a Cartesian direction, and the same advancing front approach is performed on the shifted subdomain. This shift-and-remesh procedure is repeatedly applied until no more mesh can be generated, shifting the subdomains to different directions each turn. A finer quadtree is also employed in this work to help estimate the processing load associated with each subdomain. This load estimation technique produces results that accurately represent the number of elements to be generated in each subdomain, leading to proper runtime prediction and to a well-balanced algorithm. The meshes generated with the parallel technique have the same quality as those generated serially, within acceptable limits. Although the presented approach is two-dimensional, the idea can be easily extended to three dimensions.
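The load-estimation idea here can be sketched concisely: refine a fine quadtree over each subdomain according to the element-sizing function, and use the resulting leaf count as a proxy for the number of elements that subdomain will generate. The code below is our own illustration of that principle, with a hypothetical sizing function; it is not the paper's estimator.

```python
# Sketch of quadtree-based load estimation: count the leaves of a quadtree
# refined wherever the sizing function demands smaller elements. The count
# approximates the number of elements each subdomain will generate.
# Names and the sizing function are illustrative.

def fine_leaves(x0, y0, size, depth, sizing):
    """Count leaves of a quadtree over the square (x0, y0, size)."""
    # Stop refining when the cell already meets the local target size.
    if depth == 0 or size <= sizing(x0 + size / 2, y0 + size / 2):
        return 1
    half = size / 2
    return sum(fine_leaves(x0 + dx * half, y0 + dy * half, half,
                           depth - 1, sizing)
               for dx in (0, 1) for dy in (0, 1))

# Hypothetical sizing: smaller target elements near x = 0.
sizing = lambda x, y: 0.05 if x < 0.5 else 0.5

left = fine_leaves(0.0, 0.0, 0.5, 6, sizing)    # finely meshed subdomain
right = fine_leaves(0.5, 0.0, 0.5, 6, sizing)   # coarsely meshed subdomain
print(left > right)  # True: the finer region gets the larger load estimate
```

The master process can then assign subdomains to slaves in proportion to these leaf counts, which is what yields the balanced runtimes reported in the abstract.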
Advances in Computational Mechanics for Parallel and Distributed Processing, 1997
Parallel mesh generation is an important feature of any large distributed memory parallel computational mechanics code due to the need to ensure that (i) there are no sequential bottlenecks within the code, (ii) there is no parallel overhead incurred in partitioning an existing mesh and (iii) no single processor is required to have enough local memory to be able to store the entire mesh. In recent years numerous algorithms have been proposed for the generation of unstructured finite element and finite volume meshes in parallel. One of the main problems with many of these approaches however is that the final mesh, once generated, cannot generally be guaranteed to be perfectly load-balanced. In this paper we propose a post-processing step for the parallel mesh generator, based upon a cheap and efficient dynamic load-balancing technique. This technique is described and a number of numerical examples are presented in order to demonstrate that the quality of the partition of the mesh can be improved significantly at only a small additional computational cost.
Lecture Notes in Computational Science and Engineering
Parallel mesh generation is a relatively new research area between the boundaries of two scientific computing disciplines: computational geometry and parallel computing. In this chapter we present a survey of parallel unstructured mesh generation methods. Parallel mesh generation methods decompose the original mesh generation problem into smaller subproblems which are meshed in parallel. We organize the parallel mesh generation methods in terms of two basic attributes: (1) the sequential technique used for meshing the individual subproblems and (2) the degree of coupling between the subproblems. This survey shows that without compromising in the stability of parallel mesh generation methods it is possible to develop parallel meshing software using off-the-shelf sequential meshing codes. However, more research is required for the efficient use of the state-of-the-art codes which can scale from emerging chip multiprocessors (CMPs) to clusters built from CMPs.
2005 IEEE Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 2005
In this paper we present two approaches for parallel out-of-core mesh generation. The first approach is based on a traditional prioritized page replacement algorithm, using a prioritized version of the accepted LRU replacement scheme proposed by Salmon et al. for N-body calculations. The second approach is based on the percolation model proposed for the HTMT petaflops design. We evaluate both approaches using the parallel constrained Delaunay mesh generation method. Our preliminary data suggest that for problem sizes up to half a billion elements the traditional approach is very effective. However, for larger problem sizes (on the order of billions of elements) the traditional approach becomes prohibitively expensive, and it appears from our preliminary data that the non-traditional percolation approach is a good alternative.
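A prioritized LRU page cache of the kind this abstract describes can be sketched briefly: pages are evicted least-recently-used first, except that lower-priority pages are evicted before higher-priority ones (e.g. pages holding the active mesh front would be given high priority). This is our own minimal illustration, not the paper's scheme.

```python
# Sketch of a prioritized LRU page cache: evict the least recently used
# page among those with the lowest priority. Illustrative only.
from collections import OrderedDict

class PriorityLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> priority, oldest first

    def touch(self, page_id, priority=0):
        """Access a page; return the evicted page id, or None."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # mark most recently used
            self.pages[page_id] = priority
            return None
        evicted = None
        if len(self.pages) >= self.capacity:
            # Among the lowest-priority pages, evict the least recently used.
            low = min(self.pages.values())
            victim = next(k for k, p in self.pages.items() if p == low)
            evicted = victim
            del self.pages[victim]
        self.pages[page_id] = priority
        return evicted

cache = PriorityLRU(2)
cache.touch("a", priority=1)   # high priority: e.g. on the active front
cache.touch("b", priority=0)
print(cache.touch("c", priority=0))  # evicts "b", not the older "a"
```

Plain LRU would have evicted "a" as the oldest page; the priority tie-break is what keeps front-adjacent pages resident during meshing.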