Astronomy and Computing
For low-frequency radio astronomy, software correlation and beamforming on general-purpose hardware is a viable alternative to custom-designed hardware. LOFAR, a new-generation radio telescope centered in the Netherlands with international stations in Germany, France, Ireland, Poland, Sweden and the UK, has successfully used software real-time processors based on IBM Blue Gene technology since 2004. Since then, developments in technology have allowed us to build a system based on commercial off-the-shelf components that combines the same capabilities with lower operational cost. In this paper we describe the design and implementation of a GPU-based correlator and beamformer with the same capabilities as the Blue Gene-based systems. We focus on the design approach taken, and show the challenges faced in selecting an appropriate system. The design, implementation and verification of the software system show the value of a modern test-driven development approach. Operational experience, based on three years of operations, demonstrates that a general-purpose system is a good alternative to the previous supercomputer-based system or custom-designed hardware.
Proceedings of the 23rd International Conference on Supercomputing (ICS '09), 2009
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require in the order of exaflops, and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware. This is done to increase flexibility and to reduce development efforts. Examples include e-VLBI and LOFAR.
International Journal of Parallel Programming, 2010
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require in the order of exaflops, and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, to increase flexibility and to reduce development efforts.
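To make the quadratic scaling described above concrete, here is a minimal sketch (in Python/NumPy, with illustrative sizes that are not LOFAR's actual parameters) of the core correlation operation: every pair of antenna streams forms a baseline, so N streams yield N(N+1)/2 correlation products.

    import numpy as np

    n_streams = 8        # number of antenna signal streams (illustrative)
    n_samples = 4096     # samples per integration (illustrative)

    rng = np.random.default_rng(0)
    # Complex voltage samples, one row per stream.
    voltages = rng.standard_normal((n_streams, n_samples)) \
             + 1j * rng.standard_normal((n_streams, n_samples))

    # Correlate: multiply each stream by the complex conjugate of every
    # other stream (and itself) and integrate over time.
    visibilities = {}
    for i in range(n_streams):
        for j in range(i, n_streams):    # i == j is the autocorrelation
            visibilities[(i, j)] = np.mean(voltages[i] * np.conj(voltages[j]))

    print(len(visibilities))             # N*(N+1)//2 = 36 products for 8 streams

Doubling the number of streams roughly quadruples both the number of products and the compute cost, which is the scaling problem these papers address.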
2011 XXXth URSI General Assembly and Scientific Symposium, 2011
This paper gives an overview of the LOFAR correlator. Unlike traditional telescopes, the correlator is implemented in software, yielding a very flexible and reconfigurable instrument. The term "correlator" understates its capabilities: it filters, corrects, coherently or incoherently beam forms, dedisperses, and transforms the data as well. It supports several observation modes, even simultaneously. The high data rates and processing requirements compel the use of a supercomputer; we use a Blue Gene/P. The software is highly optimized and achieves extremely good computational performance and bandwidths, increasing the performance of the entire LOFAR telescope.
Proceedings of the ISC, 2010
Driven by the requirements of the PC gaming industry, and shaped by their historically separate development, Graphics Processing Units (GPUs) have evolved into massively parallel processing systems that have entered the area of non-graphics applications. Although a single processing core on the GPU is much slower and provides less functionality than its counterpart on the CPU, the huge number of these small processing entities outperforms the classical processors when the application can be parallelized. Thus, in recent years various radio astronomical projects have started to make use of this technology, either to realize the correlator on this platform or to establish the post-processing pipeline with GPUs. Therefore, the feasibility of GPUs as a choice for a VLBI correlator is being investigated, including the pros and cons of this technology. Additionally, a GPU-based software correlator is reviewed with respect to energy consumption per GFLOP/s and cost per GFLOP/s.
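The two metrics this abstract closes with are simple to compute once sustained throughput, price, and power draw are known. The sketch below uses placeholder numbers, clearly not measurements from the paper:

    # All figures are hypothetical placeholders, for illustration only.
    gpu_peak_gflops = 1000.0  # assumed sustained GFLOP/s on the correlator kernel
    gpu_price_eur   = 400.0   # assumed card price
    gpu_power_w     = 200.0   # assumed board power under load

    print(f"cost:   {gpu_price_eur / gpu_peak_gflops:.2f} EUR per GFLOP/s")
    print(f"energy: {gpu_power_w / gpu_peak_gflops:.2f} W per GFLOP/s")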
Journal of Astronomical Telescopes, Instruments, and Systems, 2021
The MeerKAT radio telescope consists of 64 Gregorian-offset antennas located in the Karoo in the Northern Cape in South Africa. The antenna system consists of multiple subsystems working collaboratively to form a cohesive instrument capable of operating in multiple modes for defined science cases. We focus on the channelizing subsystem (F-engine), the correlation subsystem (X-engine), and the beamforming subsystem (B-engine). In the wideband instrument mode, the channelizer can produce 1024, 4096, or 32,768 channels with correlation up to 64 antennas. Narrowband mode decomposes the sampled bandwidth into 32,768 channels. The F-engine also performs delay compensation, equalization, quantization, and grouping and ordering. The X-engine provides both correlation and beamforming computations (independently). This document is intended to be a stand-alone entity covering the channelizing, correlation, and beamforming processes for the MeerKAT radio telescope. This includes data reception, pre- and post-processing, and data transmission.
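As an illustration of the channelizing (F-engine) step, the sketch below turns a sampled time series into spectral channels with a windowed FFT. MeerKAT's F-engine uses a polyphase filter bank plus delay compensation and quantization; this plain FFT filter bank is a simplified stand-in, and all sizes are illustrative.

    import numpy as np

    n_channels = 1024                # one of the wideband channel counts
    rng = np.random.default_rng(1)
    samples = rng.standard_normal(n_channels * 256)   # stand-in for ADC samples

    # Split the stream into blocks, apply a window, and FFT each block.
    blocks = samples.reshape(-1, n_channels) * np.hanning(n_channels)
    spectra = np.fft.rfft(blocks, axis=1)    # n_channels//2 + 1 channels per block

    print(spectra.shape)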
Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10, 2010
LOFAR is the first of a new generation of radio telescopes. Rather than using expensive dishes, it forms a distributed sensor network that combines the signals from many thousands of simple antennas. Its revolutionary design allows observations in a frequency range that has hardly been studied before.
The design of a real-time Linux application utilizing the Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams, each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non-real-time external computers. The designed computer system, the Correlator Data Processor (CDP), consists of a cluster of 17 SMP computers, 16 of which are compute nodes plus a master controller node, all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw...
2006
Our group seeks to revolutionize the development of radio astronomy signal processing instrumentation by designing and demonstrating a scalable, upgradeable, FPGA-based computing platform and software design methodology that targets a range of real-time radio telescope signal processing applications. This project relies on the development of a small number of modular, connectible, upgradeable hardware components and platform-independent signal processing algorithms and libraries which can be reused and scaled as hardware capabilities expand. We have developed such a hardware platform and many of the necessary signal processing libraries for applications in antenna array correlation, wide-band spectroscopy, and pulsar surveys. We present this platform and two applications we have developed for it as demonstrations of the technology. We also identify future directions for the development of this platform, such as packetization, RFI rejection libraries, and real-time imaging.
2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012
Traditional radio telescopes use large steel dishes to observe radio sources. The largest radio telescope in the world, LOFAR, uses tens of thousands of fixed, omnidirectional antennas instead, a novel design that promises groundbreaking research in astronomy. Where traditional telescopes use custom-built hardware, LOFAR uses software to do signal processing in real time. This leads to an instrument that is inherently more flexible. However, the enormous data rates and processing requirements (tens to hundreds of teraflops) make this extremely challenging. The next-generation telescope, the SKA, will require exaflops. Unlike traditional instruments, LOFAR and SKA can observe in hundreds of directions simultaneously, using beam forming. This is useful, for example, to search the sky for pulsars (i.e. rapidly rotating highly magnetized neutron stars). Beam forming is an important technique in signal processing: it is also used in Wi-Fi and 4G cellular networks, radar systems, and health-care microwave imaging instruments. We propose the use of many-core architectures, such as 48-core CPU systems and Graphics Processing Units (GPUs), to accelerate beam forming. We use two different frameworks for GPUs, CUDA and OpenCL, and present results for hardware from different vendors (i.e. AMD and NVIDIA). Additionally, we implement the LOFAR beam former on multi-core CPUs, using OpenMP with SSE vector instructions. We use autotuning to support different architectures and implementation frameworks, achieving both platform and performance portability. Finally, we compare our results with the production implementation, written in assembly and running on an IBM Blue Gene/P supercomputer. We compare both computational and power efficiency, since power usage is one of the fundamental challenges modern radio telescopes face. Compared to the production implementation, our auto-tuned beam former is 45-50 times faster on GPUs, and 2-8 times more power efficient. Our experimental results lead to the conclusion that GPUs are an attractive solution to accelerate beam forming.
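For readers unfamiliar with the technique, coherent beam forming reduces to a weighted sum: each antenna's signal is multiplied by a complex weight that compensates the geometric delay toward the chosen sky direction, and the weighted signals are added. The sketch below (Python/NumPy, illustrative sizes and random weights, not the LOFAR production code) shows why many beams can be formed from the same input: each beam is just another weight vector applied to the same samples.

    import numpy as np

    n_antennas, n_samples = 64, 2048
    rng = np.random.default_rng(2)
    signals = rng.standard_normal((n_antennas, n_samples)) \
            + 1j * rng.standard_normal((n_antennas, n_samples))

    # Hypothetical per-antenna phase corrections for one beam direction.
    weights = np.exp(1j * rng.uniform(0, 2 * np.pi, n_antennas))

    beam = weights @ signals     # one beam: weighted sum over antennas
    print(beam.shape)            # (2048,) time samples for this beam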
Experimental Astronomy, 2004
Moore's law is best exploited by using consumer market hardware. In particular, the gaming industry pushes the limit of processor performance, thus reducing the cost per raw flop even faster than Moore's law predicts. Next to the cost benefits of Commercial Off-The-Shelf (COTS) processing resources, there is a rapidly growing experience pool in cluster-based processing. Typical Beowulf clusters of PCs are well known, and multiple examples exist of specialised cluster computers based on more advanced server nodes or even gaming stations. All these cluster machines build upon the same knowledge about cluster software management, scheduling, middleware libraries and mathematical libraries. In this study, we have integrated COTS processing resources and cluster nodes into a very high performance processing platform suitable for streaming data applications, in particular to implement a correlator. The required processing power for the correlator in modern radio telescopes is in the range of the larger supercomputers, which motivates the usage of supercomputer technology. Raw processing power is provided by graphics processors, combined with an InfiniBand host bus adapter with integrated data stream handling logic. With this processing platform a scalable correlator can be built with continuously growing processing power at consumer market prices.
2018 Progress in Electromagnetics Research Symposium (PIERS-Toyama), 2018
Very Long Baseline Interferometry (VLBI) is an important radio astronomy technique: it offers high spatial resolution and is widely used for high-precision measurements of deep-space probes. The correlator is the core data pre-processing equipment of VLBI and is a complex high-speed signal processing system. In recent years, with the development of Field Programmable Gate Array (FPGA) technology, many high-performance digital signal processing platforms based on FPGA chips have appeared. At the Shanghai Astronomical Observatory, we have designed a series of hardware correlators based on FPGAs, used in the Chinese lunar missions Chang'E 1, Chang'E 2, Chang'E 3 and Chang'E 5T1. In the following lunar missions and the further Mars mission in China, tracking of multiple orbiting spacecraft will be widely used, and the tracking will be more complex. However, because of the limitations of the hardware platform, the real-time processing speed and precision are limited and cannot meet the requirements of the f...
Publications of the Astronomical Society of the Pacific, 2008
A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to B·M·N², where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second.
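Plugging illustrative numbers into that scaling relation shows how quickly the demand grows; the values below are chosen for round figures, not taken from any specific array in this list.

    bandwidth_hz = 100e6   # B: signal bandwidth
    n_beams      = 10      # M: independent beams
    n_antennas   = 1000    # N: antennas
    ops_per_term = 8       # assumed complex multiply-accumulate cost

    ops_per_s = ops_per_term * bandwidth_hz * n_beams * n_antennas**2
    print(f"{ops_per_s / 1e15:.1f} PetaOps/s")   # 8.0 PetaOps/s for these numbers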
IEEE Signal Processing Magazine, 2000
Radio telescopes typically consist of multiple receivers whose signals are cross-correlated to filter out noise. A recent trend is to correlate in software instead of custom-built hardware, taking advantage of the flexibility that software solutions offer. Examples include e-VLBI and the low frequency array (LOFAR). However, the data rates are usually high and the processing requirements challenging. Many-core processors are promising devices to provide the required processing power. In this article, we explain how to implement and optimize signal-processing applications on multicore CPUs and many-core architectures, such as the Intel Core i7, NVIDIA and ATI graphics processing units (GPUs), and the Cell/BE. We use correlation as a running example. The correlator is a streaming, possibly real-time application, and is much more input/output (I/O) intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P (BG/P) supercomputer. We discuss several important architectural problems which cause architectures to perform suboptimally, and also deal with programmability.
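The I/O-intensity claim can be quantified with a rough arithmetic-intensity estimate: per time step a correlator reads N samples but performs on the order of N²/2 complex multiply-accumulates, so the flops-per-byte ratio grows only linearly with the number of streams handled per node. A sketch with assumed (not LOFAR-specific) figures:

    n_stations = 64
    bytes_per_sample = 8    # one complex float32 voltage sample (assumed format)
    flops_per_pair = 8      # complex multiply-accumulate, rough count

    n_pairs = n_stations * (n_stations + 1) // 2
    flops = flops_per_pair * n_pairs             # work per time step
    bytes_moved = bytes_per_sample * n_stations  # input per time step

    print(f"{flops / bytes_moved:.1f} flops per input byte")

With only a few streams per node the ratio drops sharply, which is why the correlator stresses memory and network bandwidth rather than raw compute.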
Arxiv preprint astro-ph/0702141, 2007
We describe the development of an FX style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of software and the fact that the highly parallel and scalable nature of the correlation task is well suited to a multi-processor computing environment. We suggest scientific applications where such an approach to VLBI correlation is most suited and will give the best returns. We report detailed results from the Distributed FX (DiFX) software correlator, running on the Swinburne supercomputer (a Beowulf cluster of ∼300 commodity processors), including measures of the performance of the system. For example, to correlate all Stokes products for a 10 antenna array, with an aggregate bandwidth of 64 MHz per station and using typical time and frequency resolution presently requires of order 100 desktop-class compute nodes. Due to the effect of Moore's Law on commodity computing performance, the total number and cost of compute nodes required to meet a given correlation task continues to decrease rapidly with time. We show detailed comparisons between DiFX and two existing hardware-based correlators: the Australian Long Baseline Array (LBA) S2 correlator, and the NRAO Very Long Baseline Array (VLBA) correlator. In both cases, excellent agreement was found between the correlators. Finally, we describe plans for the future operation of DiFX on the Swinburne supercomputer, for both astrophysical and geodetic science.
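A minimal sketch of the FX structure described here: first "F" (Fourier-transform each station's stream into frequency channels), then "X" (cross-multiply every station pair per channel and accumulate in time). DiFX itself adds delay models, fringe rotation and much more; the sizes below are illustrative.

    import numpy as np

    n_stations, n_blocks, n_channels = 10, 64, 128
    rng = np.random.default_rng(3)
    voltages = rng.standard_normal((n_stations, n_blocks, n_channels))

    # F step: channelize each time block of each station.
    spectra = np.fft.fft(voltages, axis=2)

    # X step: per-channel visibility for each station pair, integrated in time.
    vis = np.einsum('itc,jtc->ijc', spectra, np.conj(spectra)) / n_blocks
    print(vis.shape)    # (10, 10, 128): all station pairs, all channels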
Astronomy and Computing, 2020
Realizing the next generation of radio telescopes such as the Square Kilometre Array (SKA) requires both more efficient hardware and algorithms than today's technology provides. The image-domain gridding (IDG) algorithm is a novel approach towards solving the most compute-intensive parts of creating sky images: gridding and degridding. It alleviates the performance bottlenecks of traditional AW-projection gridding by applying instrumental and environmental corrections in the image domain instead of in the Fourier domain. In this paper, we present a thorough performance analysis of this algorithm for an Intel Xeon CPU, Intel Xeon Phi, and GPUs from AMD and NVIDIA. We show that, by evaluating trigonometric functions in hardware, GPUs are both much faster and more energy efficient than a CPU or Xeon Phi. Furthermore, on GPUs, IDG is an order of magnitude faster and more energy efficient than traditional AW-projection. IDG on GPUs is the ideal candidate imaging technique for the SKA, as it meets the computational and energy constraints of the SKA Science Data Processor system.
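For context, the sketch below shows plain convolutional gridding, the operation whose AW-projection variant IDG replaces: each visibility is scattered onto a regular uv-grid, which an inverse FFT then turns into a (dirty) sky image. This uses a trivial one-pixel kernel; AW-projection and IDG exist precisely because realistic kernels are large and direction-dependent. All sizes are illustrative.

    import numpy as np

    grid_size, n_vis = 256, 10000
    rng = np.random.default_rng(4)

    u = rng.integers(0, grid_size, n_vis)    # uv-coordinates, pre-quantized
    v = rng.integers(0, grid_size, n_vis)
    vis = rng.standard_normal(n_vis) + 1j * rng.standard_normal(n_vis)

    grid = np.zeros((grid_size, grid_size), dtype=complex)
    np.add.at(grid, (v, u), vis)             # scatter visibilities onto the grid
    image = np.fft.ifft2(grid).real          # dirty image
    print(image.shape)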
2019
FPGAs excel in performing simple operations on high-speed streaming data, at high (energy) efficiency. However, so far, their difficult programming model and poor floating-point support prevented a wide adoption for typical HPC applications. This is changing, due to recent FPGA technology developments: support for the high-level OpenCL programming language, hard floating-point units, and tight integration with CPU cores. Combined, these are game changers: they dramatically reduce development times and allow using FPGAs for applications that were previously deemed too complex.
Journal of Astronomical Instrumentation, 2014
Contemporary wideband radio telescope backends are generally developed on Field Programmable Gate Array (FPGA) or hybrid (FPGA+GPU) platforms. One of the challenges faced while developing such instruments is the functional verification of the signal processing backend at various stages of development. In the case of an interferometer or pulsar backend, the typical requirement is for one independent noise source per input, with provision for a common, correlated signal component across all the inputs, with a controllable level of correlation. This paper describes the design of an FPGA-based variable correlation Digital Noise Source (DNS), and its applications to built-in testing and debugging of correlators and beamformers. This DNS uses a Central Limit Theorem-based approach for the generation of Gaussian noise, and the architecture is optimized for resource requirements and ease of integration with existing signal processing blocks on the FPGA.
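Both ideas in this abstract, Gaussian noise via the Central Limit Theorem and a common component with controllable correlation, fit in a few lines. The sketch below is a software model of the concept, not the paper's FPGA implementation; the parameter names are ours.

    import numpy as np

    def clt_gaussian(n, terms=12, rng=np.random.default_rng()):
        # Sum of 12 uniforms on [0, 1) has variance 1; subtracting the mean
        # (terms/2) gives approximately standard Gaussian samples (CLT).
        return rng.uniform(0.0, 1.0, (terms, n)).sum(axis=0) - terms / 2.0

    n_samples, rho = 100_000, 0.3       # rho: desired correlation level
    common = clt_gaussian(n_samples)    # component shared by all inputs
    out1 = np.sqrt(rho) * common + np.sqrt(1 - rho) * clt_gaussian(n_samples)
    out2 = np.sqrt(rho) * common + np.sqrt(1 - rho) * clt_gaussian(n_samples)

    print(np.corrcoef(out1, out2)[0, 1])   # close to rho = 0.3

Mixing the shared component with weight sqrt(rho) keeps each output at unit variance while setting the pairwise correlation to rho.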
Publications of the Astronomical Society of Australia, 2015
The Murchison Widefield Array (MWA) is a Square Kilometre Array (SKA) Precursor. The telescope is located at the Murchison Radio-astronomy Observatory (MRO) in Western Australia (WA). The MWA consists of 4096 dipoles arranged into 128 dual-polarisation aperture arrays forming a connected element interferometer that cross-correlates signals from all 256 inputs. A hybrid approach to the correlation task is employed, with some processing stages performed by bespoke hardware based on Field Programmable Gate Arrays (FPGAs), and others by Graphics Processing Units (GPUs) housed in general purpose rack-mounted servers. The correlation capability required is approximately 8 TFLOPS (Tera FLoating point Operations Per Second). The MWA has commenced operations and the correlator is generating 8.3 TB/day of correlation products, which are subsequently transferred 700 km from the MRO to Perth (WA) in real time for storage and offline processing. In this paper we outline the correlator design, signal path, and processing elements, and present the data format for the internal and external interfaces.