2006
AI
This guest editorial discusses the necessary evolution of embedded single-chip multicore architectures, highlighting the shift from scaling up clock frequencies to scaling out architectures, driven by increasing transistor counts and emerging challenges such as wire delay, design complexity, and power consumption. It reviews significant research contributions targeting improvements in cache and memory systems, power management, and performance, emphasizing key studies that address these critical issues while maintaining operational efficiency in multithreaded environments.
ACM SIGARCH …, 2005
The exponential increase in uniprocessor performance has begun to slow. Designers have been unable to scale performance while managing thermal, power, and electrical effects. Furthermore, design complexity limits the size of monolithic processors that can be designed while keeping costs reasonable. Industry has responded by moving toward chip multiprocessor (CMP) architectures, which are composed of replicated processors that utilize the die area afforded by newer process technologies. While this approach mitigates the issues of design complexity, power, and electrical effects, it does nothing to directly improve the performance of contemporary or future single-threaded applications.
Journal of Physics: Conference Series, 2007
The past few years have seen a sea change in computer architecture that will impact every facet of our society, as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per unit chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement toward exponentially increasing parallelism, and its ramifications for system design, applications, and programming models.
ACM Transactions on Embedded Computing Systems, 2012
Multicore architectures provide scalable performance with a lower hardware design effort than single-core processors. Our paper presents a design methodology and an embedded multicore architecture, focusing on reducing software design complexity and boosting performance density. First, we analyze the characteristics of Task-Level Parallelism in modern multimedia workloads. These characteristics are used to formulate requirements for the programming model. Then, we translate the programming-model requirements into an architecture specification, including a novel low-complexity implementation of cache coherence and a hardware synchronization unit. Our evaluation demonstrates that the novel coherence mechanism substantially simplifies hardware design while reducing performance by less than 18% relative to a complex snooping technique.

Compared to a single processor core, multicores have already proven to be more area- and energy-efficient. However, multicore architectures in embedded systems still compete with highly efficient function-specific hardware accelerators. In this paper we identify five architectural methods to boost the performance density of multicores: microarchitectural downscaling, asymmetric multicore architectures, multithreading, generic accelerators, and conjoining. Then, we present a novel methodology to explore multicore design spaces, including the architectural methods that improve performance density. The methodology is based on a formula that computes the performance of heterogeneous multicore systems. Using this design-space exploration methodology for HD and QuadHD H.264 video decoding, we estimate that the required areas of the multicores in 45 nm CMOS are 2.5 mm² and 8.6 mm², respectively. These results suggest that heterogeneous multicores are cost-effective for embedded applications and can provide good programmability support.
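The performance-density metric in this abstract (throughput per unit silicon area) can be illustrated with a toy calculation. The core performance and area figures below are hypothetical, not the paper's estimates:

```python
# Toy performance-density estimate for a heterogeneous multicore.
# Core parameters (relative performance, area in mm^2) are hypothetical
# illustrations, not figures from the paper.

def performance_density(cores):
    """Aggregate relative performance per mm^2 of die area."""
    total_perf = sum(perf for perf, area in cores)
    total_area = sum(area for perf, area in cores)
    return total_perf / total_area

# One big out-of-order core plus four small in-order cores.
heterogeneous = [(4.0, 4.0)] + [(1.0, 0.5)] * 4
# Four identical big cores occupying four times the area.
homogeneous = [(4.0, 4.0)] * 4

print(performance_density(heterogeneous))  # 8.0 / 6.0, roughly 1.33
print(performance_density(homogeneous))    # 16.0 / 16.0 = 1.0
```

Under these made-up numbers, the heterogeneous mix delivers more performance per mm², which is the intuition behind pairing a few big cores with many small ones.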
2017
Parallelism has been used since the early days of computing to enhance performance. From the first computers to the most modern sequential processors (also called uniprocessors), the main concepts introduced by von Neumann [20] are still in use. However, the ever-increasing demand for computing performance has pushed computer architects toward implementing different techniques of parallelism. The von Neumann architecture was initially a sequential machine operating on scalar data with bit-serial operations [20]. Word-parallel operations were made possible by more complex logic that could perform binary operations in parallel on all the bits in a computer word, and this was only the start of a long line of innovations in parallel computer architecture.
2015 IEEE 39th Annual Computer Software and Applications Conference, 2015
Thread-Level Parallelism (TLP) exploitation for embedded systems has been a challenge for software developers: while it is necessary to take advantage of the availability of multiple cores, it is also mandatory to consume less energy. To speed up the development process and make it as transparent as possible, software designers use Parallel Programming Interfaces (PPIs). However, as will be shown in this paper, each PPI implements different ways to exchange data through shared memory regions, influencing performance, energy consumption, and Energy-Delay Product (EDP), which vary across different embedded processors. By evaluating four PPIs and three multicore processors (ARM A8, A9, and Intel Atom), we demonstrate that by simply switching PPI it is possible to save up to 59% in energy consumption and achieve EDP improvements of up to 85% in the most significant case. We also show that efficiency (i.e., the best possible use of the available resources) decreases as the number of threads increases in almost all cases, but at distinct rates.
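The Energy-Delay Product used as a metric here is simply energy multiplied by execution time, so it rewards configurations that are both fast and frugal. A minimal sketch with made-up numbers (not the paper's measurements):

```python
def edp(energy_joules, runtime_seconds):
    """Energy-Delay Product: lower is better, penalizing configurations
    that trade too much runtime for energy savings (or vice versa)."""
    return energy_joules * runtime_seconds

# Hypothetical measurements of the same workload under two PPIs.
ppi_a = edp(energy_joules=12.0, runtime_seconds=3.0)  # 36.0 J*s
ppi_b = edp(energy_joules=9.0, runtime_seconds=2.5)   # 22.5 J*s

improvement = 1 - ppi_b / ppi_a
print(f"EDP improvement: {improvement:.0%}")  # 38% in this made-up case
```

Note how PPI B wins on EDP even though its energy saving alone (25%) understates the benefit, because it is also faster.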
ACM Computing Surveys, 2003
Hardware multithreading is becoming a generally applied technique in the next generation of microprocessors. Several multithreaded processors have been announced by industry or are already in production in the areas of high-performance microprocessors, media processors, and network processors.
arXiv preprint arXiv:1110.3535, 2011
Microprocessors have revolutionized the world we live in, and continuous efforts are being made to manufacture not only faster chips but also smarter ones. A number of techniques such as data-level parallelism, instruction-level parallelism, and hyper-threading (Intel's HT) already exist and have dramatically improved the performance of microprocessor cores [1, 2]. This paper outlines the evolution of multi-core processors, then introduces the technology and its advantages in today's world. The paper concludes by detailing the challenges currently faced by multi-core processors and how the industry is trying to address these issues.
arXiv (Cornell University), 2013
In the past, efforts were made to improve the performance of a processor via frequency scaling. However, the industry has reached the limits of increasing frequency, and therefore concurrent execution of instructions on multiple cores seems the only viable option. It is not enough for the hardware to provide concurrent execution; software also has to introduce concurrency in order to exploit the parallelism.
Until recently, we have worked with processors having a single computing/processing unit (CPU), also called a core. The clock frequency of a processor, which determines its speed, cannot be raised beyond a certain limit: as frequency increases, so does power dissipation, and therefore heat. So manufacturers came up with a new processor design, called the multicore processor. A multicore processor has two or more independent computing/processing units (cores) on the same chip. Multiple cores have the advantage that they run at lower frequencies than a comparable single processing unit, which reduces power dissipation and temperature. These cores work together to increase the multitasking capability and performance of the system by operating on multiple instructions simultaneously. This also means that with multithreaded applications, the amount of parallelism is increased. Applications and algorithms must be designed in such a way that their subroutines take full advantage of the multicore technology. Each core has its own independent interface with the system bus.

Along with all these advantages, however, there are certain issues and challenges that must be addressed carefully as more cores are added. In this paper, we discuss multicore processor technology. In addition, we discuss various challenges faced when more cores are added, such as power and temperature (thermal) issues and interconnect issues.
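The point that subroutines must be decomposed so each core gets independent work can be sketched with a chunked sum in Python. The workload and chunking scheme are illustrative assumptions; note that CPython threads share a global interpreter lock, so CPU-bound code would use a process pool with the same decomposition to get true multicore speedup:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """Sum one contiguous slice of the range; each call is independent
    work that a scheduler can place on its own core."""
    lo, hi = bounds
    return sum(range(lo, hi))

def chunked_sum(n, workers=4):
    """Decompose [0, n) into one chunk per worker and combine the results.
    The same decomposition works with a process pool when CPU-bound code
    must sidestep CPython's global interpreter lock."""
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # last chunk absorbs the remainder
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(chunked_sum(1_000_003))  # equals sum(range(1_000_003))
```

The key design point is that the chunks share no state, so no locking is needed until the final combine step.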
Cornell University - arXiv, 2010
The invention of the transistor in 1948 started a new era in technology, called solid-state electronics. Since then, sustained development and advancement in electronics and fabrication techniques has caused devices to shrink in size, paving the quest for ever-increasing density and clock speed. That quest has suddenly come to a halt due to fundamental bounds imposed by physical laws. But demand for more and more computational power is still prevalent in the computing world. As a result, the microprocessor industry has started exploring the technology along a different dimension. The speed of a single work unit (CPU) is no longer the concern; rather, increasing the number of independent processor cores packed in a single package has become the new focus. Such processors are commonly known as multi-core processors. Scaling performance by using multiple cores has gained so much attention from academia and industry that not only desktops, but also laptops, PDAs, cell phones, and even embedded devices today contain these processors. In this paper, we explore state-of-the-art technologies for multi-core processors and existing software tools to support parallelism. We also discuss present and future trends of research in this field. From our survey, we conclude that the next few decades are going to be marked by the success of this "ubiquitous parallel processing".
2008
Over the last few years, processor companies have moved from single-threaded processors to chip multi-threaded processors. This change has been driven by factors such as thermal dissipation and the need to mask memory accesses, given the wide gap between memory and processor speeds. In fact, per Intel's estimates, at the end of 2007 about 80 percent of their desktop processor sales and nearly 100 percent of their server processor sales were chip multi-threaded. Though chip multi-threaded processors have become common, present operating systems are built for single-threaded processors and thus are unable to make full use of these processors. To fully utilize the compute density of these processors, changes are needed in operating system design, especially in the use of limited resources such as the L2 cache.
International Symposium on Parallel Computing in Electrical Engineering (PARELEC'06)
Multi-core processors represent an evolutionary change in conventional computing as well as setting the new trend for high performance computing (HPC), but parallelism is nothing new. Intel has a long history with the concept of parallelism and the development of hardware-enhanced threading capabilities. Intel has been delivering threading-capable products for more than a decade. The move toward chip-level multiprocessing architectures with a large number of cores continues to offer dramatically increased performance and improved power characteristics. Nonetheless, this move also presents significant challenges. This paper describes how far the industry has progressed and evaluates some of the challenges we face with multi-core processors, along with some of the solutions that have been developed.
2008
High-performance single-threaded processors achieve their performance goal partly by relying, among other architectural techniques, on speculation and large on-chip caches. The hardware supporting these techniques usually occupies a large portion of the processor's overall real estate, and therefore consumes a significant amount of power that is not always optimally spent doing useful work. In this work, we study the intuitive claim that architectures with hardware support for threads are more power-efficient than a traditional single-threaded superscalar architecture. Toward this goal, we have created a model of the power, performance, and area of several parallel architectures. This model shows that a parallel architecture can be designed so that (a) it requires less area and power to reach the same performance, or (b) it achieves better power efficiency and less area for the same power budget, or (c) it has higher performance and better power efficiency for the same area constraint, when compared to a single-threaded superscalar architecture.
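A back-of-the-envelope version of such a power/performance/area trade-off can be built from Pollack's rule (single-core performance grows roughly with the square root of core area) combined with Amdahl's law. This heuristic model is an assumption for illustration, not the authors' actual model:

```python
import math

def core_perf(area):
    """Pollack's rule heuristic: single-core performance ~ sqrt(area)."""
    return math.sqrt(area)

def multicore_perf(total_area, cores, parallel_fraction=0.9):
    """Split a fixed die area among identical cores, then apply Amdahl's
    law: serial work runs on one core, parallel work on all of them."""
    per_core = core_perf(total_area / cores)
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial / per_core + parallel_fraction / (per_core * cores))

# Same 16-unit area budget: one big core vs. four smaller ones,
# assuming 90% of the workload is parallelizable.
print(round(multicore_perf(16.0, 1), 2))  # 4.0
print(round(multicore_perf(16.0, 4), 2))  # roughly 6.15
```

Under these assumptions the four smaller cores outperform one big core for the same area, which matches the abstract's intuition; with a mostly serial workload (a small parallel fraction) the comparison flips.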