Intel Microarchitecture White Paper
Table of Contents
Introducing a New Dynamically and Design-Scalable Microarchitecture that Rewrites the Book on Energy Efficiency and Performance
The Adaptability of the Intel® Xeon® Processor 3500 and 5500 Series
Example: Intel® Xeon® Processor 3500 and 5500 Series

The Adaptability of the Intel® Xeon® Processor 3500 and 5500 Series
Performance Adaptable: Maximizes performance by adapting to the workload through Intel® Turbo Boost Technology and Intel® Hyper-Threading Technology.
Software Adaptable: The processor adapts to the way your application wants to run.
[Figure labels: Intel® Turbo Boost Technology; Integrated Power Gates; Quad-Core with Hyper-Threading]
White Paper First Tick, Now Tock: Intel® Microarchitecture (Nehalem)
• Intel Hyper-Threading Technology, a capability which enables running two simultaneous threads per core, an amazing eight simultaneous threads per quad-core processor and 16 simultaneous threads for dual-processor, quad-core designs. This feature provides an energy-efficient means of increasing performance for multi-threaded workloads.
• Scalable and configurable system interconnects and integrated memory controllers
• High-performance integrated graphics engine for client platforms
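The thread counts quoted for Intel Hyper-Threading Technology follow from a simple product of sockets, cores, and threads per core. The short sketch below only restates that arithmetic; the function name `hardware_threads` and its parameters are illustrative and do not come from the paper.

```python
def hardware_threads(sockets: int, cores_per_socket: int = 4,
                     threads_per_core: int = 2) -> int:
    """Simultaneous hardware threads for a given platform layout.

    Defaults mirror the quad-core, two-threads-per-core case in the text.
    """
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    print(hardware_threads(sockets=1))  # single quad-core processor
    print(hardware_threads(sockets=2))  # dual-processor, quad-core design
```

On a running system, Python's `os.cpu_count()` reports the number of logical CPUs the operating system sees, which is normally this same product when Hyper-Threading is enabled.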
is available headroom within power, current, and temperature specification limits. This enables the Intel Xeon processor 3500 and 5500 series to deliver extra performance when and where it is needed (see Figure 2). This can be particularly advantageous in speeding up the processing of light or lightly threaded workloads.

Figure 2. Intel® Turbo Boost Technology increases performance by increasing processor frequency and enabling faster speeds when conditions allow. Panels: all cores operate at rated frequency; all cores operate at higher frequency; fewer cores may operate at even higher frequencies.
Intel Turbo Boost Technology is activated when the operating system requests the highest processor performance state. Headroom is dynamically assessed by continual measurement of temperature, current draw, and power consumption.

The maximum frequency of Intel Turbo Boost Technology is dependent on the number of active cores. The amount of time the processor spends in the Intel Turbo Boost Technology state depends on the workload and operating environment.

Any of the following can set the upper limit of Intel Turbo Boost Technology on a given workload:

• Number of active cores
• Estimated current consumption
• Estimated power consumption
• Processor temperature

The number of active cores at any given instant dictates the upper limit of Intel Turbo Boost Technology. A core is considered “active” if it is in the “C0” or “C1” state; a core in the “C3” or “C6” state is considered “inactive.” (C-states are the power conservation states of a processor core.) The upper limits vary by processor number. For example, a particular processor may allow up to two frequency steps (266.66 MHz) when just one core is active and one frequency step (133.33 MHz) when two or more cores are active. Therefore, higher C-state residency (“C3” or “C6”) on some cores will generally result in increased core frequency on the active cores.

The upper limits are further constrained by temperature, power, and current. These constraints are managed as a simple closed-loop control system. If measured temperature, power, and current are all below factory-configured limits, and the operating system (OS) is requesting maximum processor performance, the processor automatically steps up core frequency (+133.33 MHz) until it reaches the upper limit dictated by the number of active cores. When temperature, power, or current exceeds factory-configured limits and the processor is above the base operating frequency, the processor automatically steps down core frequency (-133.33 MHz) in order to reduce temperature, power, and current. The processor then monitors temperature, power, and current, and continuously re-evaluates.

Note: When Intel Turbo Boost Technology is requested by the OS, the processor will commonly operate between the maximum Intel Turbo Boost Technology frequency and the base operating frequency. All active cores in the processor will operate at the same frequency. Even at frequencies above the base operating frequency, all active cores will run at the same frequency and voltage. Due to the way the system firmware and OS communicate, the software may never detect core clock frequencies above the operating frequency.6

2. Intel® Hyper-Threading Technology
Many server and workstation applications lend themselves to parallel, multi-threaded execution. Intel Hyper-Threading Technology enables simultaneous multi-threading within each processor core, up to two threads per core or eight threads per quad-core processor (see Figure 3, next page). Hyper-threading reduces computational latency, making optimal use of every clock cycle. For example, while one thread is waiting for a result or event, another thread is executing in that core to maximize the work from each clock cycle.

An Intel® processor and chipset combined with an operating system and system firmware supporting Intel Hyper-Threading Technology enables:

• Running demanding applications simultaneously while maintaining system responsiveness
• Running multi-threaded applications faster to maximize productivity and performance
• Increasing the number of transactions that can be processed simultaneously
• Providing headroom for new solution capabilities and future needs
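The closed-loop behavior described for Intel Turbo Boost Technology can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Intel's implementation: the limit values in `Limits` are invented for illustration, the per-core ceiling in `max_turbo_steps` uses the "particular processor" example from the text (two steps with one active core, one step otherwise), and the function names are hypothetical.

```python
from dataclasses import dataclass

STEP_MHZ = 133.33  # one frequency step, as quoted in the text


@dataclass(frozen=True)
class Limits:
    """Stand-ins for factory-configured limits (illustrative values only)."""
    max_temp_c: float = 90.0
    max_power_w: float = 95.0
    max_current_a: float = 100.0


def max_turbo_steps(active_cores: int) -> int:
    """Ceiling from the text's example processor: two steps with one
    active core (C0/C1), one step with two or more active cores."""
    return 2 if active_cores == 1 else 1


def next_turbo_steps(steps: int, active_cores: int, temp_c: float,
                     power_w: float, current_a: float,
                     os_requests_max: bool, limits: Limits) -> int:
    """Return the number of steps above base frequency for the next interval."""
    if not os_requests_max:
        # Assumption: turbo is engaged only when the OS requests
        # the highest performance state.
        return 0
    over_limit = (temp_c > limits.max_temp_c
                  or power_w > limits.max_power_w
                  or current_a > limits.max_current_a)
    if over_limit:
        # Step down, but never below the base operating frequency.
        return max(steps - 1, 0)
    # Headroom available: step up, capped by the active-core ceiling.
    return min(steps + 1, max_turbo_steps(active_cores))


def frequency_mhz(base_mhz: float, steps: int) -> float:
    """Core frequency for a given number of turbo steps."""
    return base_mhz + steps * STEP_MHZ
```

Calling `next_turbo_steps` once per control interval with fresh measurements reproduces the step-up, step-down, and re-evaluate cycle: with one active core and ample headroom the result climbs from 0 to 2 steps and holds there, and a single over-limit reading drops it by one step.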
– Improved Hardware Prefetch and Better Load-Store Scheduling: Intel microarchitecture (Nehalem) continues the many advances Intel made with the Intel Core microarchitecture (Penryn) family of processors in reducing memory access latencies through prefetch and load-store scheduling improvements.

Enhanced Branch Prediction
Branch prediction attempts to guess whether a conditional branch will be taken or not. Branch predictors are crucial in today’s processors for achieving high performance. They allow processors to fetch and execute instructions without waiting for a branch to be resolved. Processors also use branch target prediction to attempt to guess the target of the branch or unconditional jump before it is computed by parsing the instruction itself. In addition to greater performance, an additional benefit of increased branch prediction accuracy is that it can enable the processor to consume less energy by spending less time executing mispredicted branch paths.

Intel microarchitecture (Nehalem) uses several innovations to reduce branch mispredicts that can hinder performance and to improve the handling of branch mispredicts.

• New Second-Level Branch Target Buffer (BTB). To improve branch predictions in applications that have large code footprints (e.g., database applications), Intel added a second-level branch target buffer. BTBs reduce the performance penalty of branches in pipelined processors by predicting the path of the branch and caching information used by the branch.
• New Renamed Return Stack Buffer (RSB). RSBs store forward and return pointers associated with call and return instructions. Intel’s new renamed RSB helps avoid many common return instruction mispredictions.

New Application Targeted Accelerators and Intel® SSE4
Intel microarchitecture (Nehalem) includes all the additional Intel SSE4 instructions Intel included in Intel Core microarchitecture (Penryn) for faster computation/manipulation of media (graphics, video encoding and processing, 3-D imaging, and gaming). In addition, Intel microarchitecture (Nehalem) adds seven new Application Targeted Accelerators for more efficient accelerated string and text processing of applications like XML.

Application Targeted Accelerators extend the capabilities of Intel® architecture by adding performance-optimized, low-latency, lower power fixed-function accelerators on the processor die to benefit specific applications. Such accelerators are the start of a natural evolution where gradually more and more advantageous implementations of fixed-function capabilities will be developed and added to the processor. Just as the evolution of silicon technology from 65nm to 45nm to 32nm enables more transistors for additional cores and cache, so too will it also enable more of these fixed-function on-die implementations. The benefit will be greater performance, and superior energy efficiency, for these specific applications.

The seven Application Targeted Accelerators included in Intel microarchitecture (Nehalem) provide new string and text processing instructions to improve performance of string and text processing operations. For example, they enable parsing of XML strings and text at a much higher speed. These Application Targeted Accelerators will be useful for lexing, tokenizing, regular expression evaluation, virus scanning, and intrusion detection.

Improved Virtualization Performance
Virtualization partitions a computer so that it can run separate operating systems and software in each partition, allowing one computer to act as many. Virtualization enables computers, particularly servers, to better leverage multi-core processing power and increase efficiency. Intel microarchitecture (Nehalem) adds new features that enable software to further improve its performance in virtualized environments. For example, Intel microarchitecture (Nehalem) includes an Extended Page Table (EPT) for reconciling memory type specification in a guest operating system with memory type specification in the host operating system in virtualization systems that support memory type specification.

4. New System Architecture: Intel® QuickPath Technology
To deliver top performance for bandwidth-intensive applications, the Intel Xeon processor 3500 and 5500 series feature Intel® QuickPath Technology (see Figure 4, next page). As mentioned earlier in this paper, this new scalable, shared memory architecture delivers memory bandwidth leadership at up to 3.5 times the bandwidth of previous-generation processors.

Intel QuickPath Technology is a platform architecture that provides high-speed (up to 25.6 GB/s), point-to-point connections between processors, and between processors and the I/O hub. Each processor has its own dedicated memory that it accesses directly through an Integrated Memory Controller. In cases where a processor needs to access the dedicated memory of another processor, it can do so through a high-speed Intel® QuickPath Interconnect that links all the processors.

Intel microarchitecture (Nehalem) complements the benefits of Intel QuickPath Interconnect by enhancing Intel Smart Cache with an inclusive shared L3 cache that boosts performance while reducing traffic to the processor cores.

Intel® QuickPath Interconnect Performance
Intel QuickPath Interconnect’s throughput clearly demonstrates its best-in-class interconnect performance in the server/workstation market segment.

• Intel QuickPath Interconnect uses up to 6.4 Gigatransfers/second links, delivering up to 25.6 Gigabytes/second (GB/s) of total bandwidth. That’s up to 300 percent greater than any other interconnect solution used previously. (A gigatransfer is one billion data transfers or operations.)
• Intel QuickPath Interconnect’s superior architecture reduces the amount of communication required in the interface of multi-processor systems to deliver faster payloads.
• Intel QuickPath Interconnect Implicit Cyclic Redundancy Check (CRC) with link-level retry ensures data quality and performance by providing CRC without the performance penalty of additional cycles.
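The quoted Intel QuickPath Interconnect figures can be related by a short calculation. The sketch below assumes, beyond what this paper states, that each transfer carries 2 bytes of data in each direction and that a link consists of two unidirectional halves; the function name and parameters are illustrative.

```python
def qpi_bandwidth_gb_s(transfer_rate_gt_s: float,
                       bytes_per_transfer: int = 2,
                       directions: int = 2) -> float:
    """Peak link bandwidth in GB/s.

    bytes_per_transfer and directions are assumptions about the link
    layout, not values taken from this paper.
    """
    return transfer_rate_gt_s * bytes_per_transfer * directions


if __name__ == "__main__":
    per_direction = qpi_bandwidth_gb_s(6.4, directions=1)
    total = qpi_bandwidth_gb_s(6.4)
    print(f"{per_direction:.1f} GB/s per direction, {total:.1f} GB/s total")
```

Under these assumptions a 6.4 GT/s link gives 12.8 GB/s in each direction, which matches the "up to 25.6 GB/s" total quoted for the platform.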
Figure 4. Intel® QuickPath Technology provides dedicated per-processor memory and point-to-point connectivity.
Figure 5. Integrated Power Gates enable idle cores to go to near-zero power independently.
5. Intel® Intelligent Power Technology
Intel Intelligent Power Technology is an innovation that monitors power consumption in servers to identify those that are not being fully utilized. It has two main features:

• Integrated Power Gates (see Figure 5) allow individual idling cores to be reduced to near-zero power independent of other operating cores, reducing idle power consumption to 10 watts, versus 16 or 50 watts in prior generations of Intel quad-core processors.7
• Automated Low-Power States automatically put processor and memory into the lowest available power states that will meet the requirement of the current workload (see Figure 6). Because processors are enhanced with more and lower CPU power states, and the memory and I/O controllers have new power management features, the degree to which power can be minimized is now greatly enhanced.

Processor: Improvements to Intel® Virtualization Technology for IA-32, Intel® 64 and Intel® Architecture (Intel® VT-x) provide hardware-assisted page-table management, allowing the guest

Chipset: Intel® Virtualization Technology for Directed I/O (Intel® VT-d) helps speed data movement and eliminates much of the performance overhead by giving designated virtual machines their own dedicated I/O devices, thus reducing the overhead of the VMM in managing I/O traffic.

Network Adapter: Intel® Virtualization for Connectivity (Intel® VT-c) further enhances server I/O solutions by integrating extensive hardware assists into the I/O devices that are used to connect servers to the data center network, storage infrastructure, and other external devices. By performing routing functions to and from virtual machines in dedicated network silicon, Intel VT-c speeds delivery and reduces the load on the VMM and server processors, providing up to two times the throughput of non-hardware-assisted devices.8
Summary
Intel microarchitecture (Nehalem) represents the next level of multi-core performance, offering the latest in processor innovation.
First appearing as the Intel® Core™ i7 processor and now the foundation for the Intel Xeon processor 3500 and 5500 series, Intel
microarchitecture (Nehalem) intelligently maximizes performance to match workloads. As a microarchitecture for server/workstation
processors, it offers energy-efficient performance that scales energy use per performance demands while unleashing parallel processing
performance. Its new, scalable, shared memory architecture integrates a memory controller into each microprocessor and connects
processors and other components with a new high-speed interconnect that speeds traffic between processors and I/O controllers for
bandwidth-intensive applications. Numerous virtualization technologies enable Intel microarchitecture (Nehalem) to offer best-in-class
virtualization, making it the obvious choice for consolidation projects and—with its energy-efficient performance—server refreshes.
Learn More
For more information on Intel microarchitecture (Nehalem) including animations and podcasts, visit:
www.intel.com/technology/architecture-silicon/next-gen/index.htm
For more on Intel’s 45nm Hi-k metal gate process technology, see: www.intel.com/technology/45nm
For more on Intel QuickPath Technology, download the Intel QuickPath Architecture white paper at:
www.intel.com/technology/quickpath
For more on Intel Xeon processor 5500 series, download the product brief at:
www.intel.com/Assets/PDF/prodbrief/xeon-5500.pdf
For more on Intel SSE4, download the white paper, “Extending the World’s Most Popular Processor Architecture,” at:
download.intel.com/technology/architecture/new-instructions-paper.pdf
www.intel.com
1 Hyper-Threading Technology requires a computer system with a processor supporting Hyper-Threading Technology and an HT Technology enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. See www.intel.com/info/hyperthreading/ for more information including details on which processors support HT Technology.
2 Intel internal measurement. (Feb 2009) Stream-Triad benchmark. Red Hat Enterprise Linux Server 5.3. Intel® Xeon® processor E5472, 3.0 GHz, 2x6 MB L2 cache, 1600 MHz system bus, 16 GB memory (8x2 GB FB DDR2-800) vs. Intel® Xeon® processor X5570, 2.93 GHz, 8 MB L3 cache, 6.4QPI, 24 GB memory (6x4 GB DDR3-1333).
3 Not applicable to Macintosh operating systems. Uses Intel® Turbo Boost Technology which requires a platform with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration. Check with your platform manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see www.intel.com/technology/turboboost.
4 Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and, for some uses, certain platform software enabled for it. Functionality, performance or other benefits will vary depending on hardware and software configurations and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.
5 Performance results on VMmark benchmark. Intel® Xeon processor X5470 data based on published results. Intel® Xeon processor X5570 Intel internal measurement. (Feb 2009): HP Proliant ML370 G5 server platform with Intel Xeon processors X5470 3.33 GHz, 2x6 MB L2 cache, 1333 MHz FSB, 48 GB memory, VMware* ESX* V3.5.0 Update 3 Published at 9.15@ 7 tiles vs. Intel® Xeon® processor X5570, 2.93 GHz, 8 MB L3 cache, 6.4QPI, 72 GB memory (18x4 GB DDR3-800), VMware* ESX* Build 140815. Performance measured at 19.51@ 13 tiles.
6 For a more in-depth discussion of how Intel Turbo Boost technology works, see: http://download.intel.com/design/processor/applnots/320354.pdf
7 Depending on processor SKU.
8 Intel internal measurement. (April 2008) Ixia* IxChariot* 6.4 benchmark. VMWare* ESX* v3.5U1. Intel® Xeon® processor E5355, 2.66 GHz, 8 MB L2 cache, 1333 MHz system bus, 8 GB memory (8x1 GB FB DIMM 667 MHz).
All products, platforms, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
All data is based on comparisons of engineering data sheets or measurements using actual hardware or simulators.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,
TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH
PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL
PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR
INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions
marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to
them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current
characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies
of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel’s Web site
at www.intel.com.
Copyright © 2009 Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Core, Xeon Inside, and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others. Printed in USA 0809/EB/HBD/PDF Please Recycle 319724-002US