1998, IEEE Journal of Solid-state Circuits
The organization and circuit design of a 1.0-GHz integer processor built in 0.25-µm CMOS technology are presented.
1998
…paper describes the design methodology used to … experimental 1.0 GigaHertz PowerPC integer …cation tools as well as circuit constraints and …architecture philosophy. The microarchitecture, cir… tools were defined by the high frequency require…
2002
microprocessor provides 1-MB on-chip level-2 cache, 4-Gb/s off-chip memory bandwidth, and a new 200-MHz JBus interface that supports one to four processors. The 87.5-million transistor chip is implemented in a seven-layer-metal copper 0.13-µm CMOS process and dissipates 53 W at 1.3 V and 1.1 GHz. Index Terms: CMOS integrated circuits, computer architecture, high-speed integrated circuits, integrated circuit design, logic design, microprocessors.
IEEE Journal of Solid-State Circuits, 2002
A 32-bit integer execution core containing a Han-Carlson arithmetic-logic unit (ALU), an 8-entry × 2 ALU instruction scheduler loop and a 32-entry × 32-bit register file is described. In a 130 nm, six-metal, dual-VT CMOS technology, the 2.3 mm² prototype contains 160 K transistors. Measurements demonstrate capability for 5-GHz single-cycle integer execution at 25 °C. A single-ended, leakage-tolerant dynamic scheme used in the ALU and scheduler enables up to 9-wide ORs with 23% critical path speed improvement and 40% active leakage power reduction when compared to a conventional Kogge-Stone implementation. On-chip body-bias circuits provide additional performance improvement or leakage tolerance. Stack node preconditioning improves ALU performance by 10%. At 5 GHz, ALU power is 95 mW at 0.95 V and the register file consumes 172 mW at 1.37 V. The ALU performance is scalable to 6.5 GHz at 1.1 V and to 10 GHz at 1.7 V, 25 °C.
2009
The work presented demonstrates the unique ability of Rochester Institute of Technology’s Microelectronic Engineering department to design, simulate, fabricate, and test complex digital integrated circuits. Utilizing the resources available, the author would be the first undergraduate at RIT to successfully drive the creation of a microprocessor from design through fabrication to test. The microprocessor created is the most complex digital circuit ever fabricated at RIT. Fabrication was completed on three lots using the well-established RIT sub-µm CMOS process. Functional CMOS transistors were demonstrated at the Metal 1 level, but complex digital integrated circuits were not realized beyond that.
IEEE Journal of Solid-State Circuits, 1994
A 500 MHz, 32-bit RISC microprocessor has been experimentally developed using an 8-stage pipelined architecture and high-speed circuits, including a 500 MHz 1 kilobyte double-stage pipelined cache, a 1.8 ns register file, a double-stage binary look-ahead carry (BLC) adder circuit, and a 500 MHz phase-locked loop (PLL) frequency multiplier. Newly developed circuit-integrating techniques include a stacked power-line structure, which serves as a noise shield and also provides low bounce, a low voltage-swing interface circuit with on-chip adjustable termination resistors, a small-skew clock distribution method, and a clock synchronization circuit which provides a small-skew clock among LSI chips. About 200 000 transistors are integrated into a 7.90 mm × 8.84 mm die area with 0.4 µm CMOS fabrication technology. Power dissipation is 6 W at 500 MHz operation and a 3.3 V supply voltage.
IEEE Journal of Solid-State Circuits, 1996
This paper describes a 160 MHz 500 mW StrongARM microprocessor designed for low-power, low-cost applications. The chip implements the ARM V4 instruction set and is bus compatible with earlier implementations. The pin interface runs at 3.3 V but the internal power supplies can vary from 1.5 to 2.2 V, providing various options to balance performance and power dissipation. At 160 MHz internal clock speed with a nominal Vdd of 1.65 V, it delivers 185 Dhrystone 2.1 MIPS while dissipating less than 450 mW. The range of operating points runs from 100 MHz at 1.65 V dissipating less than 300 mW to 200 MHz at 2.0 V for less than 900 mW. An on-chip PLL provides the internal clock based on a 3.68 MHz clock input. The chip contains 2.5 million transistors, 90% of which are in the two 16 kB caches. It is fabricated in a 0.35-µm three-metal CMOS process with 0.35 V thresholds and 0.25 µm effective channel lengths. The chip measures 7.8 mm × 6.4 mm and is packaged in a 144-pin plastic thin quad flat pack (TQFP) package.
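The three operating points quoted in the StrongARM abstract can be sanity-checked against the usual first-order model of CMOS dynamic power, in which power scales linearly with clock frequency and quadratically with supply voltage. The sketch below is only an illustration under that assumption: it ignores leakage and the 3.3 V pin interface, it treats the abstract's "less than 450 mW" bound as if it were an exact reference value, and the function name `scaled_dynamic_power_mw` is hypothetical.

```python
def scaled_dynamic_power_mw(p_ref_mw, f_ref_mhz, v_ref, f_mhz, v):
    """Scale a reference power figure using P ~ f * V^2 (first-order dynamic power)."""
    return p_ref_mw * (f_mhz / f_ref_mhz) * (v / v_ref) ** 2


# Reference point from the abstract: 160 MHz at 1.65 V, under 450 mW.
# Using the bound as the reference value is an assumption of this sketch.
ref = dict(p_ref_mw=450.0, f_ref_mhz=160.0, v_ref=1.65)

# Low end of the quoted range: 100 MHz at 1.65 V (abstract: under 300 mW).
low = scaled_dynamic_power_mw(**ref, f_mhz=100.0, v=1.65)

# High end of the quoted range: 200 MHz at 2.0 V (abstract: under 900 mW).
high = scaled_dynamic_power_mw(**ref, f_mhz=200.0, v=2.0)

print(f"100 MHz @ 1.65 V: about {low:.0f} mW")
print(f"200 MHz @ 2.00 V: about {high:.0f} mW")
```

Under these assumptions the scaled estimates fall below the 300 mW and 900 mW bounds stated in the abstract, so the quoted range is at least consistent with simple frequency and voltage scaling.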
IEEE Journal of Solid-state Circuits, 1997
A microprocessor implementing IBM S/390 architecture operates in a 10 + 2 way system at frequencies up to 411 MHz (2.43 ns). The chip is fabricated in a 0.2-µm Leff CMOS technology with five layers of metal and tungsten local interconnect. The chip size is 17.35 mm × 17.30 mm with about 7.8 million transistors. The power supply is 2.5 V and measured power dissipation at 300 MHz is 37 W. The microprocessor features two instruction units (IU's), two fixed point units (FXU's), two floating point units (FPU's), a buffer control element (BCE) with a unified 64-KB L1 cache, and a register unit (RU). The microprocessor dispatches one instruction per cycle. The dual-instruction, fixed, and floating point units are used to check each other to increase reliability and not for improved performance. A phase-locked-loop (PLL) provides a processor clock that runs at 2× the system bus frequency. High-frequency operation was achieved through careful static circuit design and timing optimization, along with limited use of dynamic circuits for highly critical functions, and several different clocking/latching strategies for cycle time reduction. Timing-driven synthesis and placement of the control logic provided the maximum flexibility with minimum turnaround time. Extensive use of self-resetting CMOS (SRCMOS) circuits in the on-chip L1 cache provides a 2.0-ns access time and up to 500 MHz operation.
ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005., 2005
The implementation of a first-generation CELL processor that supports multiple operating systems including Linux consists of a 64b power processor element (PPE) and its L2 cache, multiple synergistic processor elements (SPE) [1] that each has its own local memory (LS) [2], a high-bandwidth internal element interconnect bus (EIB), two configurable non-coherent I/O interfaces, a memory interface controller (MIC), and a pervasive unit that supports extensive test, monitoring, and debug functions. The high level chip diagram is shown in .2.1. The key attributes include hardware content protection, virtualization and real-time support combined with extensive single-precision floating-point capability. By extending the Power architecture with SPE having coherent DMA access to system storage and with multi-operating-system resource-management, CELL supports concurrent real-time and conventional computing. With a dual-threaded PPE and 8 SPEs this implementation is capable of handling 10 simultaneous threads and over 128 outstanding memory requests. .2.7 shows the die micrograph with roughly 234M transistors from 17 physical entities and 580k repeaters and 1.4M nets implemented in 90nm SOI technology with 8 levels of copper interconnects and one local interconnect layer. At the center of the chip is the EIB composed of four 128b data rings plus a 64b tag operated at half the processor clock rate. The wires are arranged in groups of four, interleaved with GND and VDD shields twisted at the center to reduce coupling noise on the two unshielded wires. To ensure signal integrity, over 50% of global nets are engineered with 32k repeaters. The SoC uses 2965 C4s with four regions of different row-column pitches attached to a low-cost organic package. This structure supports 15 separate power domains on the chip, many of which overlap physically on the die.
The processor element design, power and clock grids, global routing, and chip assembly support a modular design in a building-block-like construction.
IEEE Journal of Solid-state Circuits, 1999
This superscalar microprocessor is the first implementation of a 32-bit RISC architecture specification incorporating a single-instruction, multiple-data vector processing engine. Two instructions per cycle plus a branch can be dispatched to two of seven execution units in this microarchitecture designed for high execution performance, high memory bandwidth, and low power for desktop, embedded, and multiprocessing systems. The processor features an enhanced memory subsystem, 128-bit internal data buses for improved bandwidth, and 32-KB eight-way instruction/data caches. The integrated L2 tag and cache controller with a dedicated L2 bus interface supports L2 cache sizes of 512 KB, 1 MB, or 2 MB with two-way set associativity. At 450 MHz, and with a 2-MB L2 cache, this processor is estimated to have a floating-point and integer performance metric of 20 while dissipating only 7 W at 1.8 V. The 10.5 million transistor, 83-mm² die is fabricated in a 1.8-V, 0.20-µm CMOS process with six layers of copper interconnect.
IEEE Micro, 1993
We also wanted to simplify the architecture to ease unnecessary implementation constraints. This flexibility allows optimizations appropriate for specific market targets. In addition, the simplifications permit smaller, faster, and more aggressive superscalar implementations. A third objective was to provide support for a wide range of uniprocessor and multiprocessor system configurations. Recognizing key abstractions of the storage hierarchy and defining the storage control architecture to allow effective management of these abstractions achieved this goal. The architecture also allows storage references to follow either a big-endian or a little-endian byte-ordering convention to support different operating system needs. The final architecture goal was to define 64-bit extensions that allow upward compatibility of 32-bit applications. We defined 64-bit instruction operation and 64-bit memory management as a logical extension to the 32-bit execution model. To allow flexibility, each implementation can either comply with the base 32-bit PowerPC architecture or the extended 64-bit architecture. Autonomous execution units. One fundamental concept of the architecture involves the partitioning of the architecture. The specification divides the execution of instructions into three logically distinct processing units: branch, fixed point, and floating point. The units are loosely coupled so instruction execution can occur concurrently. Note that this is an architectural partitioning that does not impose implementation constraints. For example, the architecture allows implementations to provide multiple copies of any of the units for added performance or to combine any of the units for more efficient silicon area use. We structured the branch processor architecture to allow early handling of branch instructions. Resources architecturally defined as part of the branch processor generate target addresses and evaluate branch conditions.
This logical partitioning lets the branch processor completely remove branch instructions from the execution stream and execute them in parallel with operations occurring in the other functional units. The branch processor architecture defines three user-accessible branch control registers and several forms of branch instructions. The link register in conjunction with certain branch instructions provides efficient subroutine linkage. The count register acts with conditional branch instructions to construct iterating loops. The condition register contains eight 4-bit condition fields, which are set by a wide range of instructions. Branch instructions can be conditional on a bit in the condition register, conditional on the state of the count register, conditional on both registers, or simply unconditional. The branch target address can be absolute, program counter relative, or indirect from either the link register or the count register. We also defined a set of instructions to allow logical operations and movement of fields in the condition register.
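The branch conditions described above can be sketched in a few lines of Python. This is a simplified model for illustration, not the architecture's instruction encoding: the class `BranchState`, the helper names, and the keyword parameters of `branch_taken` are all hypothetical, and the sketch assumes big-endian bit numbering (bit 0 and field 0 are the most significant) and that the count register is decremented before it is tested.

```python
from dataclasses import dataclass


@dataclass
class BranchState:
    cr: int = 0   # condition register: eight 4-bit condition fields (32 bits)
    ctr: int = 0  # count register, used for iterating loops
    lr: int = 0   # link register, used for subroutine linkage


def cr_field(cr: int, n: int) -> int:
    """Return 4-bit condition field n (0..7); field 0 is assumed most significant."""
    return (cr >> (4 * (7 - n))) & 0xF


def cr_bit(cr: int, i: int) -> int:
    """Return condition register bit i (0..31); bit 0 is assumed most significant."""
    return (cr >> (31 - i)) & 1


def branch_taken(state: BranchState, *, use_ctr: bool = False,
                 branch_if_ctr_zero: bool = False, use_cr: bool = False,
                 cr_index: int = 0, cr_expected: int = 1) -> bool:
    """Decide a branch on the count register, a condition bit, both, or neither."""
    ctr_ok = True
    if use_ctr:
        # Assumption of this sketch: decrement first, then test the new value.
        state.ctr = (state.ctr - 1) & 0xFFFFFFFF
        ctr_ok = (state.ctr == 0) == branch_if_ctr_zero
    cond_ok = True
    if use_cr:
        cond_ok = cr_bit(state.cr, cr_index) == cr_expected
    # With neither test enabled the branch is simply unconditional.
    return ctr_ok and cond_ok


# Loop-style branch: taken while the count is nonzero and CR bit 2 is set.
state = BranchState(cr=0x20000000, ctr=2)
first = branch_taken(state, use_ctr=True, use_cr=True, cr_index=2)
second = branch_taken(state, use_ctr=True, use_cr=True, cr_index=2)
always = branch_taken(state)
```

In this model the first call is taken (the count drops to 1), the second is not (the count reaches 0), and a call with no tests enabled behaves as an unconditional branch.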
1974 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, 1974
IEEE Journal of Solid-state Circuits, 1997
International Journal of Modern Trends in Engineering and Research, 2015
IEEE Journal of Solid-state Circuits, 2009
IEEE Journal of Solid-State Circuits, 1980
Proceedings - DSD'2005: 8th Euromicro Conference on Digital System Design - Architectures, Methods and Tools, 2005
International Journal of Innovative Research in Science, Engineering and Technology, 2013
2011 IEEE 20th Symposium on Computer Arithmetic, 2011