1999, Springer eBooks
The copyright owner's consent does not include copying for general distribution, promotion, new works, or resale. In these cases, specific written permission must first be obtained from the publisher. Production managed by Allan Abrams; manufacturing supervised by Jeffrey Taub. Camera-ready copy prepared by the IMA.
Parallel Computing, 2008
Some meaningful hints about parallelization problems in image processing and analysis are discussed. The operation of the various architectures used to solve vision problems, from pipelines of dedicated operators to general-purpose MIMD machines, passing through specialized SIMD machines and processors with extended instruction sets, is reviewed, along with parallelization tools ranging from parallel libraries to parallel programming languages. In this context, a discussion of open issues and directions for future research is provided.
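As a deliberately simplified illustration of the data-parallel style these architectures exploit, the sketch below splits a purely local image operation across worker processes; the threshold operator, strip count, and use of Python multiprocessing are our assumptions, not anything from the paper.

```python
# Hypothetical sketch: data parallelism over image strips.
from multiprocessing import Pool

import numpy as np

def threshold_strip(strip):
    # Purely local operation: each output pixel depends only on its
    # own input pixel, so strips need no communication at all.
    return ((strip > 128) * 255).astype(np.uint8)

def parallel_threshold(image, workers=4):
    strips = np.array_split(image, workers, axis=0)  # split by rows
    with Pool(workers) as pool:
        return np.vstack(pool.map(threshold_strip, strips))

if __name__ == "__main__":
    img = np.random.randint(0, 256, (1024, 1024), dtype=np.uint8)
    out = parallel_threshold(img)
    print(out.shape, np.unique(out))
```

The same row-wise decomposition generalizes to SIMD machines and MIMD clusters alike; only the mechanism for distributing the strips changes.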
2001
Parallel image processing / Thomas Bräunl ... [et al.]. p. cm. Includes bibliographical references and index.
Parallel Computing, 1997
We describe a parallel computer system for processing media: audio, video, and graphics, among others. The system supports medium- to coarse-grain parallelism, using a dataflow model of execution, on a range of machine architectures scaling from a single von Neumann or general-purpose processor (GPP) up to networks of several hundred heterogeneous processors. A distributed resource manager, extending or subsuming the functionality of a traditional operating system, is an integral and necessary part of the system. While we are building a system for processing a variety of media, in this paper we concentrate on video because it provides an extreme case in terms of both data rates and available parallelism.
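A minimal sketch of the coarse-grain dataflow idea described above, assuming toy stage names and list-of-integers "frames" rather than the paper's actual media formats and resource manager:

```python
# Each dataflow node fires whenever a token arrives on its input
# queue; a None token shuts the pipeline down stage by stage.
from multiprocessing import Process, Queue

def stage(fn, inq, outq):
    while True:
        frame = inq.get()
        if frame is None:          # propagate shutdown downstream
            outq.put(None)
            break
        outq.put(fn(frame))

def decode(frame):                 # stand-in for real decoding
    return [p % 256 for p in frame]

def enhance(frame):                # stand-in brightness boost
    return [min(p + 16, 255) for p in frame]

if __name__ == "__main__":
    q0, q1, q2 = Queue(), Queue(), Queue()
    Process(target=stage, args=(decode, q0, q1)).start()
    Process(target=stage, args=(enhance, q1, q2)).start()
    for i in range(3):             # three toy "frames"
        q0.put([64 * i, 64 * i + 1, 64 * i + 2])
    q0.put(None)
    while (frame := q2.get()) is not None:
        print(frame)
```

Because stages communicate only through queues, the same graph can in principle be mapped onto one GPP or onto many heterogeneous processors, which is the scaling property the paper emphasizes.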
Microelectronics Journal, 2001
Creating this document (i.e., typing this document into MS Word) took approximately 1 staff day by one of the authors. Therefore, while this document is somewhat extensive, we anticipate that the changes to the text will require no more than 0.5 staff days of effort by the publisher, excluding figures. There are some revisions required to several figures, which we anticipate should take an additional 0.5 staff days of effort by the publisher. Therefore, we anticipate that with minimal effort on the part of the publisher, a significantly enhanced version of the text will be available.
IEEE Signal Processing Magazine, 2009
The explosive growth of digital video content from commodity devices and on the Internet has precipitated a renewed interest in video processing technology, which broadly encompasses the compression, enhancement, analysis, and synthesis of digital video. Video processing is computationally intensive and often has accompanying real-time or super-real-time requirements. For example, surveillance and monitoring systems need to robustly analyze video from multiple cameras in real time to automatically detect and signal unusual events. Beyond today's known applications, the continued growth of functionality and speed of video processing systems will likely further enable novel applications. Due to the strong computational locality exhibited by video algorithms, video processing is highly amenable to parallel processing. Video tends to exhibit high degrees of locality in time (what appears on the tenth frame of a video sequence does not strongly affect the contents of the 1,000th frame) and in space (an object on the left side of a single frame does not strongly influence the pixel values on the right). Such locality makes it possible to divide video processing tasks into smaller, weakly interacting pieces amenable to parallel processing. Furthermore, these pieces can share data to economize on memory bandwidth. This article is based on our experiences in the research and development of massively parallel architectures and programming technology, in the construction of parallel video processing components, and in the development of video processing applications. We describe several program transformations necessary to realize the performance benefits of today's multi- and many-core architectures on video processing. We describe program optimizations using three-dimensional (3-D) convolution as a pedagogical example. We compare the relative importance of these transformations on multicore CPUs versus many-core graphics processing units (GPUs). In addition, we relate our efforts in accelerating applications in major areas of video processing using many-core GPUs.
MULTICORE AND MANY-CORE TECHNOLOGIES
The semiconductor industry has shifted from increasing clock speeds to a strategy of growth through increasing core counts.
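Taking the article's pedagogical example, a hedged sketch of tiled 3-D convolution is shown below: the video is split into time slabs with a one-frame halo, exploiting exactly the temporal locality argued above. The box kernel, halo size, and CPU multiprocessing backend are our assumptions; the authors' optimized GPU kernels are not reproduced here.

```python
# Tiled 3-D convolution over (time, height, width) video volumes.
from multiprocessing import Pool

import numpy as np
from scipy.ndimage import convolve

KERNEL = np.ones((3, 3, 3)) / 27.0   # toy 3-D box filter (assumption)
HALO = 1                             # temporal kernel radius

def convolve_chunk(args):
    chunk, lo_pad, hi_pad = args
    # Each worker convolves its own time slab; halo frames shared with
    # neighbors are trimmed afterwards so output ranges stay disjoint.
    out = convolve(chunk, KERNEL, mode="nearest")
    return out[lo_pad: chunk.shape[0] - hi_pad]

def conv3d(volume, workers=4):
    # Split along time: per the locality argument above, distant frames
    # barely interact, so slabs only need a one-frame halo.
    bounds = np.linspace(0, volume.shape[0], workers + 1, dtype=int)
    jobs = []
    for i in range(workers):
        lo, hi = bounds[i], bounds[i + 1]
        lo_pad, hi_pad = min(HALO, lo), min(HALO, volume.shape[0] - hi)
        jobs.append((volume[lo - lo_pad: hi + hi_pad], lo_pad, hi_pad))
    with Pool(workers) as pool:
        return np.concatenate(pool.map(convolve_chunk, jobs))

if __name__ == "__main__":
    video = np.random.rand(64, 32, 32)      # (time, height, width)
    reference = convolve(video, KERNEL, mode="nearest")
    assert np.allclose(conv3d(video), reference)
```

Roughly the same decomposition applies on a GPU, where slabs (or smaller 3-D tiles) map to thread blocks and the halo is staged through on-chip memory to economize on bandwidth.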
Lecture Notes in Computer Science, 2012
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Lecture Notes in Computer Science, 2006
Welcome to the proceedings of the 4th International Symposium on Parallel and Distributed Processing and Applications (ISPA 2006), which was held in Sorrento, Italy, December 4-6, 2006. Parallel computing has become a mainstream research area in computer science and the ISPA conference has become one of the premier forums for the presentation of new and exciting research on all aspects of parallel and distributed computing. We are pleased to present the proceedings for ISPA 2006, which comprise a collection of excellent technical papers and keynote speeches. The accepted papers cover a wide range of exciting topics including architectures, languages, algorithms, software, networking, and applications. The conference continues to grow and this year a record total of 277 manuscripts were submitted for consideration by the Program Committee. From these submissions the Program Committee selected only 79 regular papers for the program, reflecting an acceptance rate of 28%. An additional ten workshops complemented the outstanding paper sessions. The submission and review process worked as follows. Each submission was assigned to at least three Program Committee members for review. Each Program Committee member prepared a single review for each assigned paper or assigned a paper to an outside reviewer for review. In addition, the Program Chairs and Program Vice-Chairs read the papers when conflicting review results occurred. Finally, after much discussion among the Program Chairs and Program Vice-Chairs, based on the review scores, the Program Chairs made the final decision. Given the large number of submissions, each Program Committee member was assigned roughly 7-12 papers. The excellent program required a lot of effort from many people. First, we would like to thank all the authors for their hard work in preparing submissions to the conference. We deeply appreciate the effort and contributions of the Program Committee members who worked very hard to select the very best submissions and to put together an exciting program. We are also very grateful to the keynote speakers for accepting our invitation to present keynote talks. Thanks go to the Workshop Chairs for organizing ten excellent workshops on several important topics related to parallel and distributed computing and applications.
Lecture Notes in Computer Science, 2012
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
1996
These lecture notes, under development and constant revision, like the field itself, have been used at MIT in a graduate course first offered by Alan Edelman and Shang-Hua Teng during the spring of 1994 (MIT 18.337, Parallel Scientific Computing). This first class had about forty students from a variety of disciplines, including Applied Mathematics, Computer Science, Mechanical Engineering, Chemical Engineering, Aeronautics and Aerospace, and Applied Physics. Because of the diverse backgrounds of the students, the course, by necessity, was designed to be of interest to engineers, computer scientists, and applied mathematicians. Our course covers a mixture of material that we feel students should be exposed to. Our primary focus is on modern numerical algorithms for scientific computing, and also on the historical trends in architectures. At the same time, we have always felt that students and the professors must suffer through hands-on experience with modern parallel machines. Some students enjoy fighting new machines; others scream and complain. This is the reality of the subject. In 1995, the course was taught again by Alan Edelman with an additional emphasis on the use of portable parallel software tools. The sad truth was that there were not yet enough fully developed tools to be used. The situation is currently improving. During 1994 and 1995 our students programmed the 128-node Connection Machine CM5. This machine was the 35th most powerful computer in the world in 1994; the very same machine was the 74th most powerful machine in the spring of 1995. At the time of writing, December 1995, this machine has sunk to position 136. The fastest machine in the world is currently in Japan. In the 1996 course we used the IBM SP-2 and Boston University's SGI machines. In addition to coauthors Shang-Hua Teng and Robert Schreiber, I would like to thank our numerous students who have written and commented on these notes and have also prepared many of the diagrams. We also thank the students from Minnesota SCIC 8001 and the summer course held at MIT, Summer 6.50s, also taught by Rob Schreiber, for all of their valuable suggestions. These notes will probably evolve into a book which will eventually be coauthored by Rob Schreiber and Shang-Hua Teng. Meanwhile, we are fully aware that the 1996 notes are incomplete, contain mathematical and grammatical errors, and do not cover everything we wish. They are an improvement over the 1995 notes, but not as good as the 1997 notes will be. I view these notes as a basis on which to improve, not as a completed book. It has been our experience that some students of pure mathematics and theoretical computer science are a bit fearful of programming real parallel machines. Students of engineering and computer science are sometimes intimidated by mathematics. The most successful students understand that computing is not "dirty" and mathematical knowledge is not "scary" or "useless," but both require hard work and maturity to master. The good news is that there are many jobs both in the industrial and academic sectors for experts in the field! A good course should have a good theme. We try to emphasize the fundamental algorithmic ideas and machine design principles. We have seen computer vendors come and go, but we believe that the mathematical, algorithmic, and numerical ideas discussed in these notes provide a solid foundation that will last for many years.
2000
Large-scale parallel computations are more common than ever, due to the increasing availability of multiprocessor systems. However, writing parallel software is often a complicated and error-prone task. To relieve Diffpack users of the tedious and low-level technical details of parallel programming, we have designed a set of new software modules, tools, and programming rules, which will be the topic of ...
PIPS is a parallel image processing system developed at the TU Hamburg-Harburg. The structure of PIPS is highly modular and hierarchical. The scope of the PIPS functionality reaches from basic low-level services up to high-end user interfaces. PIPS is based on the message-passing principle and is therefore portable to most distributed-memory architectures. A wide range of library functions, along with implementations of many typical image processing algorithms, makes it easy to integrate new parallel algorithms into PIPS. Using the high-end interfaces, the user views the parallel system as a powerful coprocessor attached to the host machine. PIPS has been developed as a platform for research in the field of image processing and parallel computing [4, 2, 5]. Image processing algorithms are well suited to parallel systems due to the large amount of data and the possibility of distributing the calculations over many processors in a natural way. ...
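Since PIPS is described as message-passing based, a rough sketch of its host/coprocessor view might look as follows; the pipe protocol, worker count, and invert operator are invented for illustration and are not the actual PIPS API.

```python
# Host/worker message passing over image strips (distributed memory:
# workers share no state and exchange data only through messages).
from multiprocessing import Pipe, Process

import numpy as np

def worker(conn):
    while True:
        msg = conn.recv()
        if msg is None:                 # shutdown message
            break
        conn.send(255 - msg)            # toy operator: invert pixels

if __name__ == "__main__":
    image = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
    conns, procs = [], []
    for strip in np.array_split(image, 4, axis=0):
        parent, child = Pipe()
        p = Process(target=worker, args=(child,))
        p.start()
        parent.send(strip)              # host ships a strip to the node
        conns.append(parent)
        procs.append(p)
    result = np.vstack([c.recv() for c in conns])
    for c, p in zip(conns, procs):
        c.send(None)
        p.join()
    print(result.shape)
```

From the host's perspective the workers together behave like the "powerful coprocessor" described above: data goes out as messages and processed data comes back.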
Parallel processing offers the user enhanced speed of execution and is facilitated by different approaches such as data parallelism and control parallelism. Graphics processing units provide faster execution thanks to dedicated hardware and tools. This paper presents two popular approaches, distributed computing and GPU computing, to assist a novice in parallel computing techniques. The paper discusses the environment that needs to be set up for each approach and, as a case study, demonstrates a matrix multiplication algorithm using a SIMD architecture.
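A hedged sketch of the matrix-multiplication case study in data-parallel form: each worker computes one block of result rows. The multiprocessing backend and block decomposition stand in for the paper's distributed and GPU setups.

```python
# Data-parallel matrix multiplication: A is split into row blocks,
# every worker receives its block plus all of B, and the partial
# results are stacked back together.
from multiprocessing import Pool

import numpy as np

def multiply_block(args):
    a_rows, b = args
    return a_rows @ b                  # each worker multiplies its rows

def parallel_matmul(a, b, workers=4):
    blocks = np.array_split(a, workers, axis=0)
    with Pool(workers) as pool:
        return np.vstack(pool.map(multiply_block,
                                  [(blk, b) for blk in blocks]))

if __name__ == "__main__":
    A = np.random.rand(512, 256)
    B = np.random.rand(256, 128)
    assert np.allclose(parallel_matmul(A, B), A @ B)
```

Replicating B to every worker is wasteful for large matrices (real distributed implementations block both operands), but the row split already shows the SIMD-style "same operation, different data" pattern.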
1988
[Garbled report documentation form; recoverable details: report number ETL-0495; performing organization: Massachusetts Institute ...; approved for public release, distribution unlimited.]
International Journal of Parallel Programming, 2013
This special issue provides a forum for presenting the latest research on algorithms and applications for parallel and distributed systems, including algorithm design and optimization, programming paradigms, algorithm design and programming techniques for heterogeneous computing systems, tools and environments for parallel/distributed software development, petascale and exascale algorithms, novel parallel and distributed applications, and performance simulation, measurement, and evaluation. The success of parallel algorithms, even on problems that at first glance seem inherently serial, suggests that this style of programming will be inherent to any application in the near future. The relevant research has gained momentum with multicore and manycore architectures, and with the expected arrival of exascale computing. As a result, the space of potential ideas and solutions is still far from being widely explored.
Architectural Design, 2006
Today you have interaction with hardware. You also have the opposite, that is, hardware changes according to the human being and the human being is interacting with the hardware: you have design, you have clothes, and we know that in the future expo of the 21st Century, it will be people.
WSEAS Transactions on Signal Processing archive, 2010
The aim of the paper is to validate architectures that allow an image processing researcher to develop parallel applications. A comparative analysis of possible software and hardware solutions for real-time image and video processing is presented, with emphasis on distributed computing. The challenge was to develop algorithms that perform real-time low-level operations on digital images and can be executed on a cluster of desktop PCs. Experiments on a case study show how to use parallelizable patterns and how to optimize the load balancing between workstations.
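To make the load-balancing point concrete, the sketch below hands out small row chunks dynamically, so a slower node simply claims fewer of them; the chunk size and the toy smoothing operator are our assumptions, not the paper's algorithms.

```python
# Dynamic load balancing via a shared work queue of row chunks.
from multiprocessing import Pool

import numpy as np

def process_rows(rows):
    # Toy low-level operation: 3-tap horizontal box smoothing.
    padded = np.pad(rows, ((0, 0), (1, 1)), mode="edge").astype(np.uint16)
    smooth = (padded[:, :-2] + padded[:, 1:-1] + padded[:, 2:]) // 3
    return smooth.astype(np.uint8)

def balanced_filter(image, workers=4, chunk=16):
    chunks = [image[i:i + chunk] for i in range(0, image.shape[0], chunk)]
    with Pool(workers) as pool:
        # imap feeds chunks to whichever worker becomes free next,
        # approximating dynamic balancing across cluster nodes.
        return np.vstack(list(pool.imap(process_rows, chunks)))

if __name__ == "__main__":
    img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    print(balanced_filter(img).shape)
```

A static one-strip-per-node split is simpler but stalls on the slowest machine; the many-small-chunks pattern trades a little scheduling overhead for much better utilization on heterogeneous desktop clusters.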