The AMD Next-Generation “Zen 3” Core
• Presented by: Group FRI12
• Vijayasri Kristaparapu
• Dhanashree Shinde
Date: 20th Oct 2025
Based on: THEME ARTICLE: HOT CHIPS 33
Background and History
• AMD launched the Zen core architecture in early 2017, marking a complete redesign from prior generations.
• Zen delivered major gains in instructions per cycle (IPC) and introduced an innovative system-on-chip (SoC) design featuring four-core complexes.
• Zen 2 (2019) improved IPC, clock speeds, and cache size while transitioning to a 7 nm process.
• Zen 3 (2020–2021) rearchitected the cache hierarchy into unified eight-core complexes, enabling lower latency and higher single-thread performance.
• Zen 3 also introduced new ISA extensions, expanded security features, and kept compatibility with existing AM4 sockets for easy platform upgrades.
• It supported innovations for servers, datacenters, and supercomputers, including 3D V-Cache integration to boost performance efficiency.
Introduction
• Zen 3 is a major redesign of AMD’s CPU core architecture.
• Focused on higher single-thread performance and better energy efficiency.
• Retains two-way Simultaneous Multithreading (SMT), introduced with the original
“Zen”, to boost throughput with a second thread per core.
• Features a reworked pipeline, enhanced branch prediction, and optimized execution
units.
• Achieves a 19% IPC uplift over Zen 2, the largest improvement since the original
“Zen”.
• Improved fetch, decode, integer, and floating-point execution units contribute to the
performance gain.
• Designed for balanced high performance and power efficiency across diverse
workloads.
Block Diagram
Front End (Fetch & Decode)
• Advanced branch predictor with reduced
misprediction and taken-branch latency.
• 32 KB L1 instruction cache + 4,096-entry op cache
(delivers up to 8 ops/cycle).
Integer Execution
• Distributed integer scheduler for higher efficiency.
• 4 ALUs, 3 AGUs, plus new branch & store data
units.
• 10 integer ops/cycle, larger reorder buffer (256
entries).
Floating-Point Execution
• Dispatches 6 ops/cycle.
• 2 add + 2 multiply units → 2 FMA ops/cycle (high
throughput).
Load/Store System
• 32 KB L1 data cache, 512 KB L2 cache, 3 memory
ops/cycle.
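As a rough illustration of why the op cache matters, the sketch below estimates average front-end delivery when a given share of ops comes from the op cache. Only the 8 ops/cycle op-cache rate is from the slide; the 4-per-cycle decoder rate and the 6-wide dispatch cap are assumptions for illustration, and the model treats each decoded instruction as one op and ignores stalls.

```python
"""Toy model of front-end op delivery (assumed parameters, not measured data)."""


def frontend_ops_per_cycle(op_cache_fraction: float,
                           op_cache_rate: float = 8.0,
                           decode_rate: float = 4.0,
                           dispatch_width: float = 6.0) -> float:
    """Return the average ops/cycle reaching dispatch.

    op_cache_fraction: share of ops supplied by the op cache (0..1).
    op_cache_rate:     peak ops/cycle from the op cache (slide: 8).
    decode_rate:       peak ops/cycle from the x86 decoders (assumed: 4).
    dispatch_width:    downstream cap on ops/cycle (assumed: 6).
    """
    if not 0.0 <= op_cache_fraction <= 1.0:
        raise ValueError("op_cache_fraction must be in [0, 1]")
    # Average cycles per op is a weighted sum of each path's cycles per op,
    # so the combined rate is a weighted harmonic mean, not an arithmetic one.
    cycles_per_op = (op_cache_fraction / op_cache_rate
                     + (1.0 - op_cache_fraction) / decode_rate)
    return min(dispatch_width, 1.0 / cycles_per_op)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.8, 1.0):
        print(f"op-cache share {fraction:.0%}: "
              f"{frontend_ops_per_cycle(fraction):.2f} ops/cycle")
```

Under these assumptions, raising the op-cache share from 0% to 50% lifts delivery from 4 to about 5.3 ops/cycle, after which the assumed dispatch width becomes the limit.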
Branch Prediction and Front End
• Advanced TAGE2 branch predictor optimized for latency and accuracy.
• L1 BTB: 1,024 entries, L2 BTB: 6,656 entries, indirect target table: 1,536
entries.
• 32 KB instruction cache, improved prefetching.
• Faster recovery from mispredictions, reduced branch latency.
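The L1/L2 BTB split can be pictured with the toy model below. It is a simplification under stated assumptions: both levels are fully associative with LRU replacement, an L2 hit promotes the entry into L1, and there is no direction predictor at all; only the entry counts (1,024 and 6,656) come from the slide. Real BTBs are set-associative, and their replacement and promotion policies are not described here.

```python
"""Toy two-level branch target buffer (simplified; not AMD's actual design)."""
from collections import OrderedDict
from typing import Optional, Tuple


class LRUTable:
    """Fully associative table with least-recently-used replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def lookup(self, pc: int) -> Optional[int]:
        if pc not in self.entries:
            return None
        self.entries.move_to_end(pc)  # mark as most recently used
        return self.entries[pc]

    def insert(self, pc: int, target: int) -> None:
        self.entries[pc] = target
        self.entries.move_to_end(pc)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class TwoLevelBTB:
    """Small, fast L1 backed by a larger L2 (default sizes from the slide)."""

    def __init__(self, l1_entries: int = 1024, l2_entries: int = 6656) -> None:
        self.l1 = LRUTable(l1_entries)
        self.l2 = LRUTable(l2_entries)

    def predict(self, pc: int) -> Tuple[str, Optional[int]]:
        """Return which level supplied the target, and the target itself."""
        target = self.l1.lookup(pc)
        if target is not None:
            return "L1", target
        target = self.l2.lookup(pc)
        if target is not None:
            self.l1.insert(pc, target)  # promote so the next lookup hits in L1
            return "L2", target
        return "miss", None

    def update(self, pc: int, target: int) -> None:
        """Train both levels once a taken branch resolves."""
        self.l1.insert(pc, target)
        self.l2.insert(pc, target)


if __name__ == "__main__":
    btb = TwoLevelBTB(l1_entries=2, l2_entries=4)  # tiny sizes for the demo
    for pc, target in ((0x10, 0x100), (0x20, 0x200), (0x30, 0x300)):
        btb.update(pc, target)
    print(btb.predict(0x10))  # evicted from L1, still held in L2
    print(btb.predict(0x10))  # promoted, now an L1 hit
    print(btb.predict(0x40))  # never seen
```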
Integer and Floating-Point Execution
• Integer: Increased issue width: 7 → 10 µops per cycle.
• 4 ALUs, 3 AGUs, plus new branch and store units.
• Larger reorder buffer: 256 entries; Scheduler: 96 entries.
• Floating point: 6 µops per cycle; 2 FMA units (256-bit); latency reduced to 4
cycles.
• FP scheduler entries: 64 (up from 36 in Zen 2).
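The FP figures above imply both a peak rate and a minimum amount of parallelism. A short worked sketch, assuming both units issue a full 256-bit FMA every cycle (counting an FMA as two FLOPs) and ignoring clock speed, memory bandwidth, and the other FP pipes: 2 units × 8 FP32 lanes × 2 = 32 FP32 FLOPs per cycle (16 for FP64), and by Little's law 2 units × 4-cycle latency = 8 independent FMAs must be in flight to sustain that peak.

```python
"""Back-of-the-envelope FP throughput from the slide's FMA figures."""


def peak_flops_per_cycle(fma_units: int, vector_bits: int, element_bits: int) -> int:
    """Peak floating-point operations per cycle for one core."""
    lanes = vector_bits // element_bits  # elements processed per FMA instruction
    return fma_units * lanes * 2  # each FMA lane does a multiply and an add


def fmas_in_flight(fma_units: int, latency_cycles: int) -> int:
    """Independent FMAs needed to keep every unit busy (Little's law)."""
    return fma_units * latency_cycles


if __name__ == "__main__":
    print("FP32 FLOPs/cycle:", peak_flops_per_cycle(2, 256, 32))
    print("FP64 FLOPs/cycle:", peak_flops_per_cycle(2, 256, 64))
    print("Independent FMAs needed:", fmas_in_flight(2, 4))
```

This is why a dot-product loop benefits from several independent accumulators: a single dependency chain of FMAs can issue only one every 4 cycles.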
Load/Store and Memory
• L1 Data Cache: 32 KB, 8-way; L2: 512 KB per core.
• Store queue: 64 entries (was 48 in Zen 2).
• Improved prefetchers, including prefetching across page boundaries and better coordination between cache levels.
• Supports 3 memory ops per cycle (up to 3 loads and up to 2 stores).
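The L1 data cache parameters above determine how an address is split on lookup. The sketch assumes a 64-byte cache line (the usual x86 line size, not stated on the slide): 32 KB / (8 ways × 64 B) = 64 sets, giving 6 offset bits and 6 index bits. Those 12 bits together span exactly 4 KB, so the set index falls entirely within a 4 KB page offset.

```python
"""Split an address into tag / set index / line offset for a set-associative cache."""
from typing import Tuple


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets = capacity / (associativity x line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity must be a multiple of ways * line size")
    return sets


def split_address(addr: int, line_bytes: int, sets: int) -> Tuple[int, int, int]:
    """Return (tag, set index, byte offset within the line)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


if __name__ == "__main__":
    LINE = 64  # assumed line size in bytes
    sets = cache_sets(32 * 1024, 8, LINE)
    print("sets:", sets, "| bytes covered by index + offset:", sets * LINE)
    tag, index, offset = split_address(0x12345, LINE, sets)
    print(f"tag={tag:#x} index={index} offset={offset}")
```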
L3 Cache and Core Complex
• Unified 8-core CCX with shared 32 MB L3 (vs. 4-core/16 MB in Zen 2).
• Reduces latency and improves data sharing among cores.
• Bi-directional ring bus interconnect for low-latency L3 access.
• The L3 acts as a victim cache: it is filled only with lines evicted from L2, improving utilization.
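The victim-only fill policy can be sketched as below. Assumptions, for illustration only: the L2 and L3 are fully associative with LRU replacement, the L3 is strictly exclusive (a line leaves the L3 when it moves back to L2), there is a single core, and coherence is ignored. Zen 3's real L3 is shared by eight cores, and its exact retention policy is not described on this slide.

```python
"""Toy L2 + victim-filled L3 (strictly exclusive; a simplification)."""
from collections import OrderedDict
from typing import Optional


class LRUSet:
    """Holds cache-line addresses with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: "OrderedDict[int, None]" = OrderedDict()

    def __contains__(self, line: int) -> bool:
        return line in self.lines

    def touch(self, line: int) -> None:
        self.lines.move_to_end(line)

    def remove(self, line: int) -> None:
        del self.lines[line]

    def insert(self, line: int) -> Optional[int]:
        """Insert a line; return the evicted line, if any."""
        self.lines[line] = None
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None


class VictimHierarchy:
    def __init__(self, l2_lines: int, l3_lines: int) -> None:
        self.l2 = LRUSet(l2_lines)
        self.l3 = LRUSet(l3_lines)

    def _fill_l2(self, line: int) -> None:
        victim = self.l2.insert(line)
        if victim is not None:
            # L3 is written only with L2 victims; lines L3 evicts are dropped.
            self.l3.insert(victim)

    def access(self, line: int) -> str:
        if line in self.l2:
            self.l2.touch(line)
            return "L2 hit"
        if line in self.l3:
            self.l3.remove(line)  # move the line back up, keeping no L3 copy
            self._fill_l2(line)
            return "L3 hit"
        self._fill_l2(line)  # memory fills go to L2 and bypass L3
        return "memory"


if __name__ == "__main__":
    h = VictimHierarchy(l2_lines=2, l3_lines=2)
    for line in (0xA, 0xB, 0xC, 0xA):
        print(hex(line), h.access(line))
```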
3D V-Cache
• AMD’s 3D V-Cache: vertical stacking of extra L3 cache.
• Adds 64 MB stacked L3 to base 32 MB → 96 MB per CCD.
• Direct copper-to-copper bonding provides high bandwidth at low power.
• ~15% gaming FPS uplift with V-Cache prototype (12-core test).
Security Features
• SEV: Secure Encrypted Virtualization (per-VM memory encryption).
• SEV-ES: Adds encrypted CPU register state.
• SEV-SNP (Secure Nested Paging): Adds memory integrity protection, preventing a hypervisor from remapping or replaying guest memory.
• CET Shadow Stack and Memory Protection Keys for client CPUs.
• 256-bit vector versions of the AES and carry-less multiply instructions (VAES, VPCLMULQDQ).
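CET shadow stacks defend against return-address overwrites by keeping a second, protected copy of each return address. The sketch below models only that idea in software: the class and method names are hypothetical, the shadow stack's memory protection is assumed rather than enforced, and a mismatch raises a Python exception standing in for the hardware control-protection (#CP) fault.

```python
"""Software model of a CET-style shadow stack check (conceptual, not the ISA)."""
from typing import List


class ControlProtectionError(Exception):
    """Stands in for the #CP fault raised on a return-address mismatch."""


class ToyCPU:
    def __init__(self) -> None:
        self.data_stack: List[int] = []    # ordinary stack, writable by program stores
        self.shadow_stack: List[int] = []  # modelled as unreachable by normal stores

    def call(self, return_addr: int) -> None:
        # CALL pushes the return address onto both stacks.
        self.data_stack.append(return_addr)
        self.shadow_stack.append(return_addr)

    def ret(self) -> int:
        # RET pops both copies and faults if they disagree.
        addr = self.data_stack.pop()
        expected = self.shadow_stack.pop()
        if addr != expected:
            raise ControlProtectionError(
                f"return to {addr:#x}, shadow stack expected {expected:#x}")
        return addr


if __name__ == "__main__":
    cpu = ToyCPU()
    cpu.call(0x401000)
    print(hex(cpu.ret()))            # matching copies: return proceeds
    cpu.call(0x401000)
    cpu.data_stack[-1] = 0xDEADBEEF  # simulate a stack-smashing overwrite
    try:
        cpu.ret()
    except ControlProtectionError as err:
        print("blocked:", err)
```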
Performance Highlights
• +19% IPC uplift at fixed frequency vs. Zen 2.
• +26–50% gaming performance boost.
• Better power efficiency, lower effective latency.
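Because the 19% figure is measured at a fixed frequency, it maps directly to single-thread speedup only when clocks are equal. A minimal sketch under the simple model performance ≈ IPC × frequency (ignoring memory-bound effects): gains compound multiplicatively, so a 19% IPC gain combined with a hypothetical 5% clock increase gives 1.19 × 1.05 − 1 ≈ 24.95%, slightly more than the 24% sum.

```python
"""Combine IPC and frequency changes into a relative performance change."""


def relative_speedup(ipc_gain: float, freq_gain: float = 0.0) -> float:
    """Fractional speedup when performance is modelled as IPC x frequency."""
    return (1.0 + ipc_gain) * (1.0 + freq_gain) - 1.0


if __name__ == "__main__":
    print(f"IPC only: {relative_speedup(0.19):.1%}")
    # The 5% frequency gain below is hypothetical, for illustration only.
    print(f"IPC + 5% clock: {relative_speedup(0.19, 0.05):.2%}")
```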
Conclusion and Key Takeaways
• Zen 3’s unified 8-core design greatly reduces latency and boosts IPC.
• The bi-directional ring bus ensures fast, balanced L3 access.
• The victim-only L3 policy maximizes cache efficiency.
• 3D V-Cache shows how die stacking can add cache capacity on top of an existing design.
• New security features strengthen cloud and client protection.
• Zen 3 laid the foundation for Zen 4 and Zen 5, showing that architectural changes alone
can deliver a 19% IPC gain on the same 7 nm process as Zen 2.
THANK YOU!
Questions and Feedback