2010, IEEE Signal Processing Magazine
Radio telescopes typically consist of multiple receivers whose signals are cross-correlated to filter out noise. A recent trend is to correlate in software instead of custom-built hardware, taking advantage of the flexibility that software solutions offer. Examples include e-VLBI and the low-frequency array (LOFAR). However, the data rates are usually high and the processing requirements challenging. Many-core processors are promising devices to provide the required processing power. In this article, we explain how to implement and optimize signal-processing applications on multicore CPUs and many-core architectures, such as the Intel Core i7, NVIDIA and ATI graphics processing units (GPUs), and the Cell/BE, using correlation as a running example. The correlator is a streaming, possibly real-time application that is much more input/output (I/O) intensive than the applications typically implemented on many-core hardware today. We compare our implementations with the LOFAR production correlator on an IBM Blue Gene/P (BG/P) supercomputer. We discuss several important architectural problems that cause these architectures to perform suboptimally, and also address programmability.
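As a minimal sketch of the core operation this paper uses as its running example, a time-domain cross-correlator of two receiver streams might look as follows. The stream lengths, lag range, and noise levels are illustrative assumptions, not values from the paper:

```python
import numpy as np

def cross_correlate(x, y, max_lag):
    """Cross-correlate two complex receiver streams over a range of lags.

    Correlated sky signal adds coherently, while uncorrelated receiver
    noise averages toward zero -- the noise-filtering effect described
    in the abstract.
    """
    n = len(x)
    # <x(t) * conj(y(t + lag))>, averaged over the overlapping samples
    return np.array([
        np.mean(x[max(0, -lag):n - max(0, lag)] *
                np.conj(y[max(0, lag):n - max(0, -lag)]))
        for lag in range(-max_lag, max_lag + 1)
    ])

# Illustrative use: two noisy streams sharing a common signal
rng = np.random.default_rng(42)
common = rng.standard_normal(4096) + 1j * rng.standard_normal(4096)
x = common + rng.standard_normal(4096)
y = common + rng.standard_normal(4096)
print(np.abs(cross_correlate(x, y, max_lag=4)))  # peak at zero lag
```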
Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09, 2009
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require on the order of exaflops, and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware. This is done to increase flexibility and to reduce development effort. Examples include e-VLBI and LOFAR.
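The quadratic growth mentioned here follows directly from the number of correlated pairs: N data streams yield N(N-1)/2 cross-correlations plus N autocorrelations. A back-of-the-envelope sketch, with arbitrary illustrative station counts:

```python
def num_baselines(n_streams: int, include_auto: bool = True) -> int:
    """Number of stream pairs a correlator must process."""
    pairs = n_streams * (n_streams - 1) // 2
    return pairs + n_streams if include_auto else pairs

for n in (64, 256, 1024):          # illustrative station counts
    print(n, num_baselines(n))     # 64 -> 2080, 256 -> 32896, 1024 -> 524800
```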
International Journal of Parallel Programming, 2010
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require on the order of exaflops, and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, to increase flexibility and to reduce development effort.
Proceedings of the ISC
Publications of the Astronomical Society of the Pacific, 2008
A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field of view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second.
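To make the BMN^2 scaling concrete, here is a small worked example. The bandwidth, beam count, antenna count, and per-sample operation constant below are illustrative assumptions, not specifications from the paper:

```python
def correlator_ops_per_sec(bandwidth_hz, n_beams, n_antennas, ops_per_sample=8):
    """Rough operations/s estimate scaling as B * M * N^2.

    ops_per_sample bundles the complex multiply-accumulate cost per
    correlated sample; the exact constant depends on the implementation.
    """
    return ops_per_sample * bandwidth_hz * n_beams * n_antennas ** 2

# e.g. 1 GHz bandwidth, 1 beam, 2048 antennas (assumed values)
print(f"{correlator_ops_per_sec(1e9, 1, 2048):.2e} ops/s")  # ~3.4e16: tens of PetaOps
```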
2018 Progress in Electromagnetics Research Symposium (PIERS-Toyama), 2018
Very Long Baseline Interferometry (VLBI) is an important radio astronomy technique: it offers high spatial resolution and is widely used for high-precision measurements of deep-space probes. The correlator is the core VLBI data pre-processing equipment, a complex high-speed signal-processing system. In recent years, with the development of Field Programmable Gate Array (FPGA) technology, many high-performance digital signal-processing platforms based on FPGA chips have appeared. At Shanghai Astronomical Observatory, we have designed a series of FPGA-based hardware correlators that were used in the Chinese lunar missions Chang'E 1, Chang'E 2, Chang'E 3, and Chang'E 5T1. In the following lunar project and the future Chinese Mars project, tracking of multiple orbiting spacecraft will be widely used, and the tracking will be more complex. However, because of the limitations of the hardware platform, the real-time processing speed and precision are limited and cannot meet the requirements of the f...
2011 IEEE Nuclear Science Symposium Conference Record, 2011
IEEE Access
Radio telescopes produce large volumes of data that need to be processed to obtain high-resolution sky images. This is a complex task that requires computing systems that provide both high performance and high energy efficiency. Hardware accelerators such as GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) can provide these two features and are thus an appealing option for this application. Most HPC (High-Performance Computing) systems operate in double precision (64-bit) or in single precision (32-bit), and radio-astronomical imaging is no exception. With reduced-precision computing, smaller data types (e.g., 16-bit) are used to improve energy efficiency and throughput in noise-tolerant applications. We demonstrate that reduced precision can also be used to produce high-quality sky images. To this end, we analyze the gridding component (Image-Domain Gridding) of the widely-used WSClean imaging application. Gridding is typically one of the most time-consuming steps in the imaging process and, therefore, an excellent candidate for acceleration. We identify the minimum required exponent and mantissa bits for a custom floating-point data type. Then, we propose the first custom floating-point accelerator on a Xilinx Alveo U50 FPGA using High-Level Synthesis. Our reduced-precision implementation improves throughput and energy efficiency by 1.84x and 2.03x, respectively, compared to the single-precision floating-point baseline on the same FPGA. Our solution is also 2.12x faster and 3.46x more energy-efficient than an Intel i9 9900k CPU (Central Processing Unit), and keeps up in throughput with an AMD RX 550 GPU.
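A hedged sketch of the kind of precision analysis the paper describes: emulating a narrower custom floating-point type by rounding the mantissa of standard floats, then measuring the error the reduction introduces. The bit widths and random test data are illustrative, not the paper's actual choices:

```python
import numpy as np

def truncate_mantissa(x, mantissa_bits):
    """Round float32 values to a reduced number of mantissa bits.

    Emulates a narrower custom floating-point type in software by
    splitting each value into mantissa and exponent, rounding the
    mantissa, and recombining (the exponent range is left untouched).
    """
    x = np.asarray(x, dtype=np.float32)
    m, e = np.frexp(x)                      # x = m * 2**e, 0.5 <= |m| < 1
    m = np.round(m * 2 ** mantissa_bits) / 2 ** mantissa_bits
    return np.ldexp(m, e)

values = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
for bits in (23, 10, 7, 4):                 # float32 has 23 explicit mantissa bits
    err = np.abs(truncate_mantissa(values, bits) - values).max()
    print(f"{bits:2d} mantissa bits -> max abs error {err:.2e}")
```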
There are many applications that could benefit from signal processors capable of rapid spectral analysis of noisy temporal signals, implemented in compact, low-cost, low-power devices. Doppler lidar and radar, data-communications clock recovery, and surveillance applications need only detect and locate the largest peak in the spectrum. They do not require precise recovery of spectral signal amplitude, or the preservation of conservative inverse-transform properties. In such cases, the complexity and speed limitations imposed by traditional multi-bit signal-processing techniques based on conventional FFTs, or other frequency estimators, may be circumvented by employing the Binary Correlation Processor (BCP) technique (patent pending) described here. Author's note: see 2015 updates to the technology, theory, and applications, and successful proofs of concept, in Christian J. Grund's unpublished research.
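The abstract gives no implementation details, so the following is a clearly hypothetical illustration of the general idea it hints at, not the patented BCP algorithm: one-bit quantization discards amplitude but preserves enough sign information to locate the dominant spectral peak, and it reduces multiplications to sign comparisons (XNOR in hardware):

```python
import numpy as np

# Hypothetical illustration only; not the patented BCP technique.
rng = np.random.default_rng(1)
t = np.arange(8192)
signal = np.sin(2 * np.pi * 0.1 * t) + 3.0 * rng.standard_normal(t.size)

bits = signal >= 0                      # 1-bit quantization: keep only the sign
spectrum = np.abs(np.fft.rfft(np.where(bits, 1.0, -1.0)))
peak = np.argmax(spectrum[1:]) + 1      # skip the DC bin
print(f"detected frequency ~ {peak / t.size:.3f} cycles/sample (true: 0.100)")
```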
2021
Abstract. On August 22, 2019, the Origins Space Telescope (OST) Study Team delivered the OST Mission Concept Study Report and the OST Technology Development Plan to NASA Headquarters. A key component of this study report is the technology roadmap for detector readout and how new radio-frequency system-on-chip (RFSoC) based technology would be used to advance the far-infrared polarimeter instrument concept for a spaceflight mission. We present our current results as they pertain to the implementation of algorithms, hardware, and architecture for instrument signal processing of this proposed observatory using RFSoC technology. We also present a small case study comparing a more conventional readout system with one based on the RFSoC, and show a trade-off between system complexity and technology readiness level.
2018
Correlators are extensively used in the field of radio interferometry. Two different types are considered for two applications: autocorrelators for spectrometry and cross-correlators for aperture synthesis. We concentrate on satellite-based applications, where power budgets are very restrictive. Several satellites already employ correlators for interferometric measurements, and future projects are targeting even larger systems in terms of spectral channels in the case of spectrometry and baseline counts in the case of aperture synthesis. Thus, it is important to develop correlators with increasing channel count, either using ASIC technology scaling or by constructing larger systems from several ASICs. Building on earlier ASIC designs, we examine how larger correlator systems can be constructed and the implications this has in terms of power dissipation, system complexity, and ASIC count. Our findings indicate that, for large systems, having a very high channel count per ASIC is indeed of interest for keeping system complexity and power dissipation down by reducing both ASIC and I/O count, especially for cross-correlators.
2011 XXXth URSI General Assembly and Scientific Symposium, 2011
This paper gives an overview of the LOFAR correlator. Unlike traditional telescopes, the correlator is implemented in software, yielding a very flexible and reconfigurable instrument. The term "correlator" understates its capabilities: it filters, corrects, coherently or incoherently beam forms, dedisperses, and transforms the data as well. It supports several observation modes, even simultaneously. The high data rates and processing requirements compel the use of a supercomputer; we use a Blue Gene/P. The software is highly optimized and achieves extremely good computational performance and bandwidths, increasing the performance of the entire LOFAR telescope.
Arxiv preprint astro-ph/0702141, 2007
We describe the development of an FX-style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high-performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of software and the fact that the highly parallel and scalable nature of the correlation task is well suited to a multi-processor computing environment. We suggest scientific applications where such an approach to VLBI correlation is most suited and will give the best returns. We report detailed results from the Distributed FX (DiFX) software correlator, running on the Swinburne supercomputer (a Beowulf cluster of ∼300 commodity processors), including measures of the performance of the system. For example, correlating all Stokes products for a 10-antenna array, with an aggregate bandwidth of 64 MHz per station and using typical time and frequency resolution, presently requires on the order of 100 desktop-class compute nodes. Due to the effect of Moore's Law on commodity computing performance, the total number and cost of compute nodes required to meet a given correlation task continues to decrease rapidly with time. We show detailed comparisons between DiFX and two existing hardware-based correlators: the Australian Long Baseline Array (LBA) S2 correlator, and the NRAO Very Long Baseline Array (VLBA) correlator. In both cases, excellent agreement was found between the correlators. Finally, we describe plans for the future operation of DiFX on the Swinburne supercomputer, for both astrophysical and geodetic science.
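For readers unfamiliar with the FX architecture this paper refers to, here is a minimal sketch: Fourier-transform each stream first ("F"), then cross-multiply and accumulate every antenna pair ("X"). The array sizes are illustrative assumptions, not the DiFX configuration described above:

```python
import numpy as np

def fx_correlate(streams, fft_len):
    """Minimal FX correlator.

    streams: complex array of shape (n_antennas, n_samples).
    Returns visibilities of shape (n_baselines, fft_len), autos included.
    """
    n_ant, n_samp = streams.shape
    # Split each stream into fixed-length segments and channelize them
    segs = streams[:, :n_samp - n_samp % fft_len].reshape(n_ant, -1, fft_len)
    spectra = np.fft.fft(segs, axis=2)
    vis = []
    for i in range(n_ant):
        for j in range(i, n_ant):  # upper triangle: each baseline once
            vis.append(np.mean(spectra[i] * np.conj(spectra[j]), axis=0))
    return np.array(vis)

rng = np.random.default_rng(7)
data = rng.standard_normal((4, 65536)) + 1j * rng.standard_normal((4, 65536))
print(fx_correlate(data, fft_len=256).shape)  # (10, 256): 4 antennas -> 10 baselines
```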
Astronomy and Computing
For low-frequency radio astronomy, software correlation and beamforming on general-purpose hardware is a viable alternative to custom-designed hardware. LOFAR, a new-generation radio telescope centered in the Netherlands with international stations in Germany, France, Ireland, Poland, Sweden and the UK, has successfully used software real-time processors based on IBM Blue Gene technology since 2004. Since then, developments in technology have allowed us to build a system based on commercial off-the-shelf components that combines the same capabilities with lower operational cost. In this paper we describe the design and implementation of a GPU-based correlator and beamformer with the same capabilities as the Blue Gene based systems. We focus on the design approach taken, and show the challenges faced in selecting an appropriate system. The design, implementation and verification of the software system show the value of a modern test-driven development approach. Operational experience, based on three years of operations, demonstrates that a general-purpose system is a good alternative to the previous supercomputer-based system or custom-designed hardware.
The design of a real-time Linux application utilizing the Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams, each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non-real-time external computers. The designed computer system, the Correlator Data Processor (CDP), consists of a cluster of 17 SMP computers: 16 compute nodes plus a master controller node, all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw...
Publications of the Astronomical Society of Australia, 2015
The Murchison Widefield Array (MWA) is a Square Kilometre Array (SKA) Precursor. The telescope is located at the Murchison Radio-astronomy Observatory (MRO) in Western Australia (WA). The MWA consists of 4096 dipoles arranged into 128 dual-polarisation aperture arrays forming a connected-element interferometer that cross-correlates signals from all 256 inputs. A hybrid approach to the correlation task is employed, with some processing stages performed by bespoke hardware, based on Field Programmable Gate Arrays (FPGAs), and others by Graphics Processing Units (GPUs) housed in general-purpose rack-mounted servers. The correlation capability required is approximately 8 TFLOPS (Tera FLoating point Operations Per Second). The MWA has commenced operations and the correlator is generating 8.3 TB/day of correlation products, which are subsequently transferred 700 km from the MRO to Perth (WA) in real time for storage and offline processing. In this paper we outline the correlator design, signal path, and processing elements, and present the data format for the internal and external interfaces.
Experimental Astronomy, 2004
Moore's law is best exploited by using consumer-market hardware. In particular, the gaming industry pushes the limits of processor performance, reducing the cost per raw flop even faster than Moore's law predicts. Besides the cost benefits of Commercial Off-The-Shelf (COTS) processing resources, there is a rapidly growing pool of experience in cluster-based processing. The typical Beowulf cluster of PCs is a well-known example, and multiple examples exist of specialised cluster computers based on more advanced server nodes or even gaming stations. All these cluster machines build upon the same knowledge about cluster software management, scheduling, middleware libraries and mathematical libraries. In this study, we have integrated COTS processing resources and cluster nodes into a very high-performance processing platform suitable for streaming-data applications, in particular to implement a correlator. The required processing power for the correlator in modern radio telescopes is in the range of the larger supercomputers, which motivates the usage of supercomputer technology. Raw processing power is provided by graphical processors, combined with an InfiniBand host bus adapter with integrated data-stream-handling logic. With this processing platform a scalable correlator can be built with continuously growing processing power at consumer-market prices.
Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10, 2010
LOFAR is the first of a new generation of radio telescopes. Rather than using expensive dishes, it forms a distributed sensor network that combines the signals from many thousands of simple antennas. Its revolutionary design allows observations in a frequency range that has hardly been studied before.
2010
Shaped by their historical separation from the CPU and driven by the requirements of the PC gaming industry, Graphics Processing Units (GPUs) have evolved into massively parallel processing systems that have entered the area of non-graphics applications. Although a single processing core on the GPU is much slower and provides less functionality than its counterpart on the CPU, the huge number of these small processing entities outperforms classical processors when the application can be parallelized. Thus, in recent years various radio astronomical projects have started to make use of this technology, either to realize the correlator on this platform or to establish the post-processing pipeline with GPUs. We therefore investigate the feasibility of GPUs as a choice for a VLBI correlator, including the pros and cons of this technology. Additionally, a GPU-based software correlator is reviewed with respect to energy consumption and cost per GFLOP/s.
Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy VI, 2012
Two large correlators have been constructed to combine the signals captured by the ALMA antennas deployed on the Atacama Desert in Chile at an elevation of 5050 meters. The Baseline correlator was fabricated by a NRAO/European team to process up to 64 antennas for 16 GHz bandwidth in two polarizations and another correlator, the Atacama Compact Array (ACA) correlator, was fabricated by a Japanese team to process up to 16 antennas. Both correlators meet the same specifications except for the number of processed antennas. The main architectural differences between these two large machines will be underlined. Selected features of the Baseline and ACA correlators as well as the main technical challenges met by the designers will be briefly discussed. The Baseline correlator is the largest correlator ever built for radio astronomy. Its digital hybrid architecture provides a wide variety of observing modes including the ability to divide each input baseband into 32 frequency-mobile sub-bands for high spectral resolution and to be operated as a conventional 'lag' correlator for high time resolution. The various observing modes offered by the ALMA correlators to the science community for 'Early Science' are presented, as well as future observing modes. Coherently phasing the array to provide VLBI maps of extremely compact sources is another feature of the ALMA correlators. Finally, the status and availability of these large machines will be presented.
2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012
Traditional radio telescopes use large steel dishes to observe radio sources. The largest radio telescope in the world, LOFAR, instead uses tens of thousands of fixed, omnidirectional antennas, a novel design that promises groundbreaking research in astronomy. Where traditional telescopes use custom-built hardware, LOFAR uses software to do signal processing in real time. This leads to an instrument that is inherently more flexible. However, the enormous data rates and processing requirements (tens to hundreds of teraflops) make this extremely challenging. The next-generation telescope, the SKA, will require exaflops. Unlike traditional instruments, LOFAR and the SKA can observe in hundreds of directions simultaneously, using beamforming. This is useful, for example, to search the sky for pulsars (i.e., rapidly rotating, highly magnetized neutron stars). Beamforming is an important technique in signal processing: it is also used in Wi-Fi and 4G cellular networks, radar systems, and healthcare microwave-imaging instruments. We propose the use of many-core architectures, such as 48-core CPU systems and Graphics Processing Units (GPUs), to accelerate beamforming. We use two different frameworks for GPUs, CUDA and OpenCL, and present results for hardware from different vendors (AMD and NVIDIA). Additionally, we implement the LOFAR beamformer on multi-core CPUs, using OpenMP with SSE vector instructions. We use auto-tuning to support different architectures and implementation frameworks, achieving both platform and performance portability. Finally, we compare our results with the production implementation, written in assembly and running on an IBM Blue Gene/P supercomputer. We compare both computational and power efficiency, since power usage is one of the fundamental challenges modern radio telescopes face. Compared to the production implementation, our auto-tuned beamformer is 45-50 times faster on GPUs, and 2-8 times more power efficient. Our experimental results lead to the conclusion that GPUs are an attractive solution to accelerate beamforming.
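A hedged sketch of the core beamforming operation the paper accelerates: delay-and-sum, implemented here as per-antenna phase rotation followed by a coherent sum. The antenna count, sample count, and phase model are illustrative assumptions, not LOFAR parameters:

```python
import numpy as np

def beamform(samples, phase_delays):
    """Coherently sum antenna streams after per-antenna phase correction.

    samples:      complex array (n_antennas, n_samples)
    phase_delays: radians, one per antenna, steering the beam
    """
    weights = np.exp(-1j * phase_delays)[:, None]
    return (weights * samples).sum(axis=0)

rng = np.random.default_rng(3)
n_ant, n_samp = 48, 8192
geometric = rng.uniform(0, 2 * np.pi, n_ant)      # assumed per-antenna phases
source = rng.standard_normal(n_samp) + 1j * rng.standard_normal(n_samp)
streams = (np.exp(1j * geometric)[:, None] * source
           + 0.5 * (rng.standard_normal((n_ant, n_samp))
                    + 1j * rng.standard_normal((n_ant, n_samp))))

beam_on = beamform(streams, geometric)            # steered at the source
beam_off = beamform(streams, np.zeros(n_ant))     # unsteered sum
print(np.var(beam_on) / np.var(beam_off))         # roughly n_ant: coherent gain
```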