Clock Tree Synthesis (CTS) Algorithms Metrics for Evaluating CTS
Prepared by Garima Jangid Algorithms
To assess and compare various Clock Tree Synthesis (CTS)
Clock Tree Types algorithms, certain key metrics are used. These metrics help
Before implementing CTS algorithms, it’s important to balance trade-offs between timing, power, and area.
understand the different clock tree architectures used in
digital designs. The selection of tree type directly affects 1. Skew
skew, power, and timing behavior. • Definition: The difference in arrival time of the
clock signal at different flip-flops.
1. Balanced Clock Tree • Goal: Minimize skew to ensure synchronous
• Definition: A tree where the clock signal reaches all operation.
sinks (flip-flops) with equal path lengths and delays. • Types:
• Objective: Achieve zero clock skew across all o Local Skew: Between nearby flip-flops.
endpoints. o Global Skew: Between farthest flip-flops.
• Common Algorithms: H-tree, DME
• Use Case: Regular layouts like SRAM arrays, where 2. Latency (Clock Latency)
uniformity is possible. • Definition: Time taken for the clock signal to travel
from the clock source (e.g., PLL) to the sink (flip-
2. Skew-Tolerant Clock Tree flop).
• Definition: A tree designed with some tolerance to • Importance: Affects setup/hold time margins and
clock skew, often used in asynchronous or mixed- timing closure.
mode designs.
• Why?: Perfect skew is hard to maintain under 3. Power Consumption
process, voltage, temperature (PVT) variations. • Definition: Dynamic power consumed by the clock
• Techniques Used: network due to switching and capacitive loads.
o Adaptive body biasing • Factors Influencing Power:
o CTS ECO to fix late/early clocks o Number of buffers
• Use Case: Designs with tight hold margins or noise- o Wire length
prone environments. o Capacitance of sinks
3. Useful Skew Tree 4. Insertion Delay
• Definition: Intentionally introduces skew to • Definition: Time taken for the clock signal to
improve setup or hold timing by shifting the clock propagate from the root to the first buffer or first
edges. branching point.
• Goal: Exploit skew instead of minimizing it. • Use: Indicates initial response time of the clock
• Example: path.
o Delay clock to ease hold violations
o Advance clock to fix setup issues 5. Jitter
• Technique: Performed using useful skew • Definition: Uncertainty or variation in clock edge
optimization and ILP models after base CTS. arrival time due to noise, voltage fluctuation, or
• Use Case: Timing-critical paths with unbalanced signal integrity issues.
logic. • Relevance: Higher jitter affects stability and timing.
4. Gated Clock Tree 6. Number of Buffers
• Definition: Clock tree with clock gating cells that • Why it Matters: Directly affects:
can shut off the clock to specific blocks. o Power consumption
• Purpose: Reduce dynamic power by disabling idle o Area usage
logic. o Clock latency
• Implementation: • Optimization Goal: Insert the minimum number of
o Clock enable signals generated at RTL buffers required to meet skew/delay constraints.
o Gating logic is inserted upstream of clock
buffers 7. Area Overhead
• Challenges: • Definition: Physical layout area consumed by:
o Glitch prevention o Clock buffers
o Correct timing with enable signal o Gating logic
• Use Case: Power-sensitive designs (mobile SoCs, o Routing metal layers
IoT). • Trade-Off: Some algorithms minimize skew at the
cost of area.
Algorithms D=R⋅C+∑(Ri⋅Ci)
1. H-Tree Algorithm • Accurate modeling ensures timing-aware tree
Concept: construction.
• A deterministic and symmetric clock routing method. Strengths:
• Clock signal is distributed from the center of the chip • Balances both delay and skew.
in an H-like recursive pattern. • Supports buffer insertion, wire sizing.
Key Features:
• Zero skew (theoretically) because of equal path 5. Alpha-DME (A-DME)
lengths. Concept:
• Suitable for memory blocks, SRAM, or regular arrays. • Adds cost functions (e.g., power, area) to traditional
Procedure: DME.
1. Divide chip into quadrants. • Adjusts trade-offs between latency and power using a
2. Draw an "H" structure connecting centers of tuning parameter α.
quadrants. Cost Function Example:
3. Repeat recursively until sinks are covered. Cost=α⋅Delay+(1−α)⋅Power
Limitations: • α close to 1 → delay optimization
• Not effective when sinks are unevenly spread. • α close to 0 → power optimization
• Inefficient in congested or irregular floorplans.
6. Van Ginneken’s Algorithm (Buffer
2. X-Tree Algorithm Insertion)
Concept: Purpose:
• Extension of H-tree; branches diagonally. • Insert buffers to manage fan-out and RC delay.
• Targets better wirelength and skew optimization for Principle:
non-uniform sinks. • Uses dynamic programming to compute optimal
Use Case: buffer placement.
• Applied where layout irregularities or asymmetry exist. Steps:
• Better practical skew performance compared to H- 1. Work from sinks up to the root.
tree. 2. Propagate candidate solutions (capacitance-load
pairs).
3. At each node, prune dominated candidates (worse
3. Steiner Tree-Based CTS
delay/power).
Concept:
• Construct a Minimum Steiner Tree (MST) that
7. Useful Skew Optimization
connects all sinks with minimal total wirelength.
Concept:
Types:
• Instead of targeting zero skew, allows intentional
• Rectilinear Steiner Tree (RST) – Manhattan routing
skew to improve slack.
(horizontal + vertical).
Examples:
• Non-Manhattan Steiner Tree – More flexible angles
(e.g., 45° in advanced nodes).
• Delay clock to flip-flops on short logic paths to ease
hold violations.
Algorithm:
• Advance clock to flip-flops with long logic paths for
• Tools like FLUTE or GeoSteiner used to generate
setup improvement.
optimal trees.
Implementation:
Limitation:
• Often modeled as Integer Linear Programming (ILP).
• May introduce skew unless combined with delay
• Applied after initial CTS in post-CTS optimization.
balancing (via DME or buffers).
8. Clock Gating Tree Insertion
4. Deferred Merge Embedding (DME)
Concept:
Concept:
• Gold standard algorithm for CTS. • Reduces dynamic power by disabling clocks to inactive
blocks.
• Focuses on zero or bounded skew, minimal
Types:
wirelength.
• Static Gating: RTL-based; simple enable signals.
Two-Phase Process: • Dynamic Gating: Data-path driven; complex logic for
fine-grained control.
Bottom-Up Phase:
1. Construct a binary tree of sinks. Key Points:
2. For each subtree, compute a feasible merging region • Gating logic inserted before clock buffers.
(FMR) — region where the parent node can be placed • Must ensure no glitches; controlled by enable latching.
to maintain zero skew.
Top-Down Phase: 9. CTS-Aware Placement Algorithms
1. Choose a point within the FMR to embed the parent. Concept:
2. Recursively place the root node at the driver and • Modify placement to ease CTS complexity.
embed children nodes. Example Strategy:
• Place flip-flops in physically close clusters.
Delay Model: • Reduce clock net length and buffer count.
• Elmore Delay is commonly used: