This paper discusses the implementation and advantages of parallel statistical computing in statistical inference, emphasizing its capability to enhance computational efficiency for large data sets by leveraging multicore machines. It categorizes various hardware architectures for parallel computing, such as pipeline processors, array processors, and concurrent multicomputers, explaining their relevance and applications in statistical methods. Through empirical testing, the effectiveness of parallel algorithms, such as k-means clustering and bootstrap resampling, is demonstrated, illustrating significant improvements in execution times when utilizing multiple processors.
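The bootstrap resampling mentioned in the summary maps naturally onto an embarrassingly parallel loop, since every resample is independent of the others. The sketch below is a minimal illustration in R using the base parallel package, not the paper's own implementation; the data set, replication count, and worker count are placeholders.

    ## Minimal sketch (not the paper's code): bootstrap resampling of the
    ## sample mean, with replications farmed out to worker processes.
    library(parallel)

    set.seed(1)
    x <- rnorm(1e5)                              # toy data set (placeholder)
    boot_mean <- function(i, data) mean(sample(data, replace = TRUE))

    cl <- makeCluster(4)                         # one worker per core
    reps <- parLapply(cl, seq_len(2000), boot_mean, data = x)
    stopCluster(cl)

    sd(unlist(reps))                             # bootstrap standard error of the mean

The parallel k-means runs mentioned above follow the same pattern: independent work units dispatched to workers, with results gathered at the end.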
Statistics: A Series of Textbooks and Monographs, 2005
2012
Abstract: Social researchers use computationally intensive statistical and econometric methods for data analysis. One way to accelerate these computations is to use parallel computing on multi-core platforms. In this paper we parallelize some representative computational kernels from statistics and econometrics on a multi-core platform using programming libraries such as Pthreads, OpenMP, Intel Cilk++, Intel TBB, Intel ArBB, SWARM, and FastFlow.
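The libraries named in this abstract are C/C++-level tools; as a loose R analogue of the same idea, the sketch below forks one worker per core with parallel::mclapply to run an illustrative Monte Carlo kernel. The kernel itself (empirical size of a t-test under the null) is a made-up stand-in, not one of the paper's kernels.

    ## Loose analogue of multi-core kernel parallelization, using fork-based
    ## workers via parallel::mclapply (POSIX systems; use parLapply on Windows).
    library(parallel)

    simulate_once <- function(i, n = 200) {
      # one replication of a hypothetical kernel: does a t-test reject under H0?
      x <- rnorm(n)
      as.integer(abs(t.test(x)$statistic) > qt(0.975, df = n - 1))
    }

    rejections <- mclapply(seq_len(10000), simulate_once, mc.cores = 4)
    mean(unlist(rejections))                     # empirical size, close to 0.05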
Journal of Computational and Graphical Statistics, 2007
Theoretically, many modern statistical procedures are trivial to parallelize. However, practical deployment of a parallelized implementation which is robust and reliably runs on different computational cluster configurations and environments is far from trivial. We present a framework for the R statistical computing language that provides a simple yet powerful programming interface to a computational cluster. This interface allows the development of R functions that distribute independent computations across the nodes of the computational cluster. The resulting framework allows statisticians to obtain significant speedups for some computations at little additional development cost. The particular implementation can be deployed in heterogeneous computing environments.
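As a concrete picture of the kind of interface described, here is a sketch using the snow-style functions from R's base parallel package rather than the authors' framework; the host names, data set, and model formula are hypothetical.

    ## Illustration only (generic snow/parallel interface, not the framework in
    ## the paper): independent model fits dispatched across cluster nodes.
    library(parallel)

    nodes <- c("node01", "node02", "node03", "node04")   # hypothetical hosts
    cl <- makePSOCKcluster(nodes)

    fit_one <- function(df) coef(lm(earnings ~ age + education, data = df))
    splits  <- split(mydata, mydata$region)              # 'mydata' is a placeholder

    results <- clusterApplyLB(cl, splits, fit_one)       # load-balanced dispatch
    stopCluster(cl)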
Concurrency: Practice and Experience, 1992
We examine parallel-processing applications to the analysis of large data sets typically used in social science research. Our research uses a parallel environment which makes it possible to have 1024 processors working simultaneously on a problem. The application is tested using various configurations of number of processors and block size of data reads on the estimation of a linear model of earnings for the California portion of the 15% sample of the 1970 Census. Performance factors assessed include total execution time, speed-up, and efficiency. Execution times are also compared with reference to execution times on an IBM 3081 using SPSS-X. Results indicate that optimal configurations of number of processors and data block size can produce significantly faster execution times for linear-model estimation on relatively large (80,000-case) data sets. We also discuss other applications of parallel processing to statistical analyses commonly found in social science. Data sets referenced: 1940 Census of the Population 1% Sample; 1950 Census of the Population 1% Sample; 1960 Census of the Population 1% Sample; 1970 Census of the Population 1% Sample; 1980 Census of the Population 5% Sample.
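Reading the data in blocks, as in the experiment above, fits naturally with estimating a linear model through the normal equations: each block contributes its own X'X and X'y terms, and in a parallel setting each block's contribution can be computed on a different processor and summed. The sketch below shows the serial skeleton of that accumulation on simulated placeholder data, not the study's code.

    ## Block-wise least squares: accumulate X'X and X'y per block of rows,
    ## then solve the normal equations once at the end.
    set.seed(3)
    n <- 80000; p <- 5
    X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # intercept + predictors
    y <- X %*% c(2, 0.5, -1, 0.3, 0) + rnorm(n)          # toy earnings outcome

    blocks <- split(seq_len(n), ceiling(seq_len(n) / 10000))  # 10,000-row blocks
    XtX <- matrix(0, p, p); Xty <- numeric(p)
    for (idx in blocks) {
      XtX <- XtX + crossprod(X[idx, , drop = FALSE])          # block's X'X term
      Xty <- Xty + crossprod(X[idx, , drop = FALSE], y[idx])  # block's X'y term
    }
    beta_hat <- solve(XtX, Xty)               # agrees with coef(lm(y ~ X - 1))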
2008
A notable visionary from computer science, Alan Kay, famously remarked that "the best way to predict the future is to invent it". Lee's prognostications are assured largely because of his considerable contributions to creating the future. Again following Kay, ". . . a lot of the future that we're going to have to contend with is sitting in someone's research lab right now. And, by simply going around and looking in the right places you can get a tremendous idea of the kind of things that are going to happen." Few, if any, in statistical computing circles are better positioned than Lee to know the right places; as near as I can tell from the present, this paper has properly framed the near future of statistical computing.
2004
Parallel computation has a long history in econometric computing, but is not at all widespread. We believe that a major impediment is the labour cost of coding for parallel architectures. Moreover, programs for specific hardware often become obsolete quite quickly. Our approach is to take a popular matrix programming language (Ox) and implement a message-passing interface using MPI. Next,
Statistical Science
Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale positron emission tomography and ℓ1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC ℓ1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
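For readers who want to try the penalized Cox model at laptop scale, here is a hedged sketch using the CRAN glmnet package on simulated data; it is not the authors' distributed HPC implementation, and the data dimensions are toy placeholders.

    ## Small-scale illustration of l1-regularized (lasso) Cox regression with
    ## glmnet; the UK Biobank analysis above uses a distributed HPC solver.
    library(glmnet)

    set.seed(2)
    n <- 500; p <- 1000
    X <- matrix(rnorm(n * p), n, p)                 # simulated predictors
    y <- cbind(time   = rexp(n, rate = 0.1),        # survival times
               status = rbinom(n, 1, 0.7))          # 1 = event observed

    fit   <- glmnet(X, y, family = "cox")           # lasso path over lambda
    cvfit <- cv.glmnet(X, y, family = "cox")        # pick lambda by cross-validation
    b     <- as.vector(coef(cvfit, s = "lambda.min"))
    sum(b != 0)                                     # number of selected variables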
2006
Abstract: Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity comes a new problem: the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in these data.
With the growing popularity of parallel computation, researchers are looking for various means to reduce problem-solving time by performing computations in parallel. While interested in parallel computation, however, they do not want to deal with the complexities of parallel programming. In this paper, through RScaLAPACK, we demonstrate a means that enables the user to carry out parallel computation without dealing with the intricacies of parallel programming.
April 20-23, 1988. Final Report.
International Journal of Parallel Programming, 2008
Evolutionary Intelligence, 2013
arXiv (Cornell University), 2021
Computational Statistics & Data Analysis, 2015
Proceedings of the 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2013, 2013
Journal of Statistical Software, 2009
Euro-Par 2011 Workshops, Proc. of the 2nd Workshop on High Performance Bioinformatics and Biomedicine (HiBB), Bordeaux, France, pp. 3-12, doi:10.1007/978-3-642-29740-3_2, 2012
ACM Transactions on Modeling and Computer Simulation, 2013
Uncertainty in Artificial Intelligence, 1992