Astronomy and Computing, 2020
Realizing the next generation of radio telescopes such as the Square Kilometre Array (SKA) requires both more efficient hardware and algorithms than today's technology provides. The image-domain gridding (IDG) algorithm is a novel approach towards solving the most compute-intensive parts of creating sky images: gridding and degridding. It alleviates the performance bottlenecks of traditional AW-projection gridding by applying instrumental and environmental corrections in the image domain instead of in the Fourier domain. In this paper, we present a thorough performance analysis of this algorithm for an Intel Xeon CPU, Intel Xeon Phi, and GPUs from AMD and NVIDIA. We show that, by evaluating trigonometric functions in hardware, GPUs are both much faster and more energy efficient than a CPU or Xeon Phi. Furthermore, on GPUs, IDG is an order of magnitude faster and more energy efficient than traditional AW-projection. IDG on GPUs is the ideal candidate imaging technique for the SKA, as it meets the computational and energy constraints of the SKA Science Data Processor system.
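To make the image-domain idea concrete, the following is a minimal Python sketch, not the authors' implementation, of how one subgrid might be formed: the visibilities of one chunk are accumulated directly per image pixel through complex-exponential evaluations, the direction-dependent correction (A-term) is applied as a per-pixel multiplication, and only then is the subgrid transformed to the Fourier domain for addition to the master grid. All names, shapes and phase conventions are illustrative, and the taper and subgrid phase offset of the real algorithm are omitted.

```python
import numpy as np

def idg_subgrid(uvw, vis, freqs, l, m, aterm):
    """Form one image-domain subgrid from a chunk of visibilities.

    uvw   : (n_vis, 3) baseline coordinates in metres
    vis   : (n_vis, n_chan) complex visibilities
    freqs : (n_chan,) channel frequencies in Hz
    l, m  : (n_pix, n_pix) direction cosines of the subgrid pixels
    aterm : (n_pix, n_pix) complex direction-dependent correction
    """
    c = 299792458.0
    subgrid = np.zeros(l.shape, dtype=np.complex64)
    n = np.sqrt(1.0 - l**2 - m**2) - 1.0              # w-term uses (n - 1)
    for (u, v, w), vis_chan in zip(uvw, vis):
        for freq, sample in zip(freqs, vis_chan):
            scale = freq / c                           # metres -> wavelengths
            phase = 2j * np.pi * scale * (u * l + v * m + w * n)
            subgrid += sample * np.exp(phase)          # the sin/cos work GPUs do in hardware
    subgrid *= aterm                                   # corrections applied in the image domain
    return np.fft.fft2(subgrid)                        # back to the Fourier domain for the master grid
```

The per-pixel sine and cosine evaluations in the inner loop are exactly the operations that GPU special-function units accelerate, which is where the speed and energy advantage claimed above comes from.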
2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2017
Realizing the next generation of radio telescopes such as the Square Kilometre Array (SKA) requires both more efficient hardware and algorithms than today's technology provides. The recently introduced image-domain gridding (IDG) algorithm is a novel approach towards solving the most compute-intensive parts of creating sky images: gridding and degridding. It avoids the performance bottlenecks of traditional AW-projection gridding by applying instrumental and environmental corrections in the image domain instead of in the Fourier domain. In this paper, we present the first implementations of this new algorithm for CPUs and Graphics Processing Units (GPUs). A thorough performance analysis, in which we apply a modified roofline analysis, shows that our parallelization approaches and optimizations lead to nearly optimal performance on these architectures. The analysis also indicates that, by leveraging dedicated hardware to evaluate trigonometric functions, GPUs are both much faster...
2020 XXXIIIrd General Assembly and Scientific Symposium of the International Union of Radio Science, 2020
Modern radio observatories provide astronomers with large data sets. To turn these data sets into science, astronomers require imaging techniques that provide state-of-the-art image quality (e.g. resolution and dynamic range) at high performance (e.g. modest runtime). We integrated an efficient GPU-accelerated implementation of the Image-Domain Gridding (IDG) algorithm into the widely used WSClean imager to fulfill these needs. WSClean features several novel deconvolution techniques, such as auto-masked multi-scale cleaning and parallel cleaning, while IDG is capable of correcting full-polarization direction-dependent effects (DDEs) during imaging. In this paper we give an overview of the current state of our work: we introduce IDG, we present the performance of IDG on contemporary accelerator hardware, we show the impact of DDE correction on image quality, and we discuss its suitability for the demanding EoR science case as an example. All in all, WSClean+IDG is an excellent imager for current ...
IEEE Access
Radio telescopes produce large volumes of data that need to be processed to obtain high-resolution sky images. This is a complex task that requires computing systems that provide both high performance and high energy efficiency. Hardware accelerators such as GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) can provide these two features and are thus an appealing option for this application. Most HPC (High-Performance Computing) systems operate in double precision (64-bit) or in single precision (32-bit), and radio-astronomical imaging is no exception. With reduced-precision computing, smaller data types (e.g., 16-bit) are used to improve energy efficiency and throughput in noise-tolerant applications. We demonstrate that reduced precision can also be used to produce high-quality sky images. To this end, we analyze the gridding component (Image-Domain Gridding) of the widely used WSClean imaging application. Gridding is typically one of the most time-consuming steps in the imaging process and, therefore, an excellent candidate for acceleration. We identify the minimum required exponent and mantissa bits for a custom floating-point data type. Then, we propose the first custom floating-point accelerator on a Xilinx Alveo U50 FPGA using High-Level Synthesis. Our reduced-precision implementation improves throughput and energy efficiency by 1.84x and 2.03x, respectively, compared to the single-precision floating-point baseline on the same FPGA. Our solution is also 2.12x faster and 3.46x more energy efficient than an Intel i9 9900k CPU (Central Processing Unit) and keeps up in throughput with an AMD RX 550 GPU.
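A bit-width study of this kind can be emulated in software before committing to hardware. The sketch below is illustrative only, not the paper's methodology or its chosen bit widths: it rounds values to a configurable mantissa width, clamps the exponent range, and measures the error this introduces on synthetic visibilities.

```python
import numpy as np

def quantize(x, mantissa_bits, exponent_bits):
    """Emulate a custom floating-point type in software: round the mantissa
    to the given width and clamp the exponent range (a rough stand-in for
    an FPGA data type; denormals and rounding modes are ignored)."""
    mant, exp = np.frexp(x)                              # x = mant * 2**exp, mant in [0.5, 1)
    mant = np.round(mant * (1 << mantissa_bits)) / (1 << mantissa_bits)
    e_max = 2 ** (exponent_bits - 1)
    exp = np.clip(exp, -e_max + 1, e_max)
    return np.ldexp(mant, exp)

# Illustrative use: quantize synthetic visibilities and measure the error
# relative to single precision (the bit widths here are examples only).
rng = np.random.default_rng(42)
vis = (rng.normal(size=10000) + 1j * rng.normal(size=10000)).astype(np.complex64)
vis_q = quantize(vis.real, mantissa_bits=8, exponent_bits=5) \
      + 1j * quantize(vis.imag, mantissa_bits=8, exponent_bits=5)
err = np.sqrt(np.mean(np.abs(vis - vis_q) ** 2) / np.mean(np.abs(vis) ** 2))
print(f"relative RMS quantization error: {err:.2e}")
```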
arXiv (Cornell University), 2021
Practical aperture synthesis imaging algorithms work by iterating between estimating the sky brightness distribution and comparing a prediction based on this estimate with the measured data ("visibilities"). Accuracy in the latter step is crucial but is made difficult by the irregular and non-planar sampling of data by the telescope. In this work we present a GPU implementation of 3D degridding which accurately deals with these two difficulties and is designed for distributed operation. We address the load-balancing issues caused by the large variation in the number of visibilities that need to be computed. Using CUDA and NVIDIA GPUs we measure performance of up to 1.2 billion visibilities per second.
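For readers unfamiliar with the prediction step, the following is a minimal serial Python sketch of 2D degridding. The paper's contribution is a distributed 3D GPU version; kernel oversampling, fractional offsets and the w-axis are all omitted here, and the names are illustrative.

```python
import numpy as np

def degrid(uvgrid, kernel, uv, support):
    """Predict ("degrid") visibilities from a uv-plane grid by convolutional
    sampling: each visibility is a kernel-weighted gather of nearby grid cells.

    uvgrid : (N, N) complex grid (Fourier transform of the model image)
    kernel : (2*support+1, 2*support+1) real convolution kernel
    uv     : (n_vis, 2) visibility coordinates in grid pixels,
             assumed to lie well inside the grid
    """
    predicted = np.zeros(len(uv), dtype=np.complex64)
    for i, (u, v) in enumerate(uv):
        iu, iv = int(round(u)), int(round(v))          # nearest grid cell (no oversampling)
        patch = uvgrid[iv - support: iv + support + 1,
                       iu - support: iu + support + 1]
        predicted[i] = np.sum(patch * kernel)          # gather
    return predicted
```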
Publications of the Astronomical Society of the Pacific, 2017
As a dedicated solar radio interferometer, the MingantU SpEctral RadioHeliograph (MUSER) generates massive observational data in the frequency range of 400 MHz-15 GHz. High-performance imaging is a critical aspect of MUSER's massive data-processing requirements. In this study, we implement a practical high-performance imaging pipeline for MUSER data processing. First, the specifications of MUSER are introduced and its imaging requirements are analyzed. Drawing on the most commonly used radio astronomy software, such as CASA and MIRIAD, we then implement a high-performance imaging pipeline based on Graphics Processing Unit technology, tailored to the current operational status of MUSER. A series of critical algorithms and their pseudocode, i.e., detection of the solar disk and sky brightness, automatic centering of the solar disk, and estimation of the number of iterations for CLEAN algorithms, are presented in detail. The preliminary experimental results indicate that the proposed imaging approach significantly increases the processing performance of MUSER and generates high-quality images that meet the requirements of MUSER data processing.
2019
FPGAs excel at performing simple operations on high-speed streaming data with high (energy) efficiency. So far, however, their difficult programming model and poor floating-point support have prevented wide adoption for typical HPC applications. This is changing due to recent FPGA technology developments: support for the high-level OpenCL programming language, hard floating-point units, and tight integration with CPU cores. Combined, these are game changers: they dramatically reduce development times and allow FPGAs to be used for applications that were previously deemed too complex.
Monthly Notices of the Royal Astronomical Society, 2010
Astronomy depends on ever increasing computing power. Processor clock-rates have plateaued, and increased performance is now appearing in the form of additional processor cores on a single chip. This poses significant challenges to the astronomy software community. Graphics Processing Units (GPUs), now capable of general-purpose computation, exemplify both the difficult learning-curve and the significant speedups exhibited by massively-parallel hardware architectures. We present a generalised approach to tackling this paradigm shift, based on the analysis of algorithms. We describe a small collection of foundation algorithms relevant to astronomy and explain how they may be used to ease the transition to massively-parallel computing architectures. We demonstrate the effectiveness of our approach by applying it to four well-known astronomy problems: Högbom clean, inverse ray-shooting for gravitational lensing, pulsar dedispersion and volume rendering. Algorithms with well-defined memory access patterns and high arithmetic intensity stand to receive the greatest performance boost from massively-parallel architectures, while those that involve a significant amount of decision-making may struggle to take advantage of the available processing power.
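As an example of the kind of algorithm the paper analyses, a minimal serial Python version of Högbom CLEAN is sketched below. It is a reference sketch, not the paper's GPU code: find the brightest pixel in the residual image, record a fraction of its flux as a clean component, and subtract a correspondingly scaled and shifted dirty beam.

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, n_iter=1000, threshold=0.0):
    """Minimal Högbom CLEAN: repeatedly locate the peak of the residual
    image and subtract a scaled, shifted dirty beam (PSF) from it.
    dirty and psf are assumed to have the same shape."""
    residual = dirty.copy()
    model = np.zeros_like(dirty)
    psf_peak = np.unravel_index(np.argmax(psf), psf.shape)
    for _ in range(n_iter):
        peak = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        if abs(residual[peak]) < threshold:
            break
        flux = gain * residual[peak]
        model[peak] += flux
        dy, dx = peak[0] - psf_peak[0], peak[1] - psf_peak[1]
        # np.roll wraps at the edges; a production imager would clip instead.
        shifted = np.roll(np.roll(psf, dy, axis=0), dx, axis=1)
        residual -= flux * shifted
    return model, residual
```

The peak search is a reduction and the beam subtraction is a data-parallel update over the whole image, which illustrates the abstract's point: algorithms with regular memory access and high arithmetic intensity map well to massively parallel hardware, while the serial decision of where to clean next does not.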
2006
This chapter discusses experiences with grid computing in generating large astronomical image mosaics from many small images, from the team that developed Montage (http://montage.ipac.caltech.edu/). Generating these mosaics is complex: individual images must be projected into a common coordinate space, overlaps between images calculated, the images processed so that their backgrounds match, and the images composed using a variety of techniques to handle multiple pixels falling in the same output space. To accomplish these tasks, a suite of software tools called Montage has been developed. The modules in this suite can be run on a single-processor computer using a simple shell script, and can additionally be run using a combination of parallel approaches. These include running MPI versions of some modules and using standard grid tools. In the latter case, processing workflows are automatically generated, and appropriate data sources are located and transferred to a variety of parallel processing environments for execution. As a result, it is now possible to generate large-scale mosaics on demand in timescales that support iterative scientific exploration. In this chapter, we describe Montage, how it was modified to execute in the grid environment, the tools used to support its execution, and performance results.
Scientific Programming, 2009
The performance potential of the Cell/B.E., as well as its availability, has attracted a lot of attention from various high-performance computing (HPC) fields. While computation-intensive kernels have proved exceptionally well suited to the Cell, irregular data-intensive applications are usually considered poor matches. In this paper, we present our complete solution for enabling such a data-intensive application to run efficiently on the Cell/B.E. processor. Specifically, we target radio-astronomy data gridding and degridding, two similar imaging filters based on convolutional resampling. Our solution is based on building a high-level application model, used to evaluate parallelization alternatives. We then choose the one with the best performance potential and gradually exploit this potential by applying platform-specific and application-specific optimizations. After several iterations, our target application shows a speed-up factor between 10 and 20 on a dual-Cell blade compared with the original application running on a commodity machine. Given these results, and based on our empirical observations, we pinpoint a set of ten guidelines for parallelizing similar applications on the Cell/B.E. Finally, we conclude that the Cell/B.E. can provide high performance for data-intensive applications, at the price of increased programming effort and with significant aid from aggressive application-specific optimizations.
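Gridding is the scatter counterpart of the gather-style degridder sketched earlier. A minimal serial Python version follows, illustrative only and not the Cell/B.E. code; the scatter pattern, where several visibilities may update the same grid cells, is precisely what makes the parallelization on a processor with small per-core local stores the hard part.

```python
import numpy as np

def grid(vis, uv, kernel, grid_size, support):
    """Convolutional resampling ("gridding"): scatter each visibility onto
    the uv-plane, weighted by a small convolution kernel. Coordinates are
    in grid pixels and assumed to lie well inside the grid."""
    uvgrid = np.zeros((grid_size, grid_size), dtype=np.complex64)
    for sample, (u, v) in zip(vis, uv):
        iu, iv = int(round(u)), int(round(v))
        # Scatter: overlapping updates from different visibilities are the
        # write conflicts a parallel implementation has to manage.
        uvgrid[iv - support: iv + support + 1,
               iu - support: iu + support + 1] += sample * kernel
    return uvgrid
```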
New Astronomy, 2011
We present the status and results of AstroGrid-D, a joint effort of astrophysicists and computer scientists to employ grid technology for scientific applications. AstroGrid-D provides access to a network of distributed machines through a set of commands as well as software interfaces. It allows simple use of compute and storage facilities and makes it possible to schedule and monitor compute tasks and data management. It is based on the Globus Toolkit middleware (GT4).