Clustering is an approach that many microprocessors have been adopting in recent years to mitigate the increasing penalties of wire delays. In this work we propose a novel clustered VLIW architecture in which all resources, including the cache memory, are partitioned among clusters. A modulo scheduling scheme for this architecture is also pro...
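As background for the modulo scheduling scheme mentioned above, the following minimal sketch computes the standard lower bound on the initiation interval (II), MII = max(ResMII, RecMII). It is an illustration under stated assumptions, not the authors' implementation: the resource classes, per-cluster unit counts and dependence-cycle data are hypothetical, and resources are pooled across clusters, so the bound ignores inter-cluster copies and the cache partitioning the abstract describes.

```python
import math


def res_mii(op_counts, units_per_cluster, n_clusters):
    """Resource-constrained lower bound on II.

    op_counts: operations per iteration, keyed by a hypothetical resource class.
    units_per_cluster: units of each class available in one cluster.
    Assumption: units are pooled across clusters, so copies are ignored.
    """
    bound = 1
    for res, count in op_counts.items():
        total_units = units_per_cluster[res] * n_clusters
        bound = max(bound, math.ceil(count / total_units))
    return bound


def rec_mii(cycles):
    """Recurrence-constrained lower bound on II.

    cycles: (total_latency, total_iteration_distance) for each dependence
    cycle of the loop body.
    """
    bound = 1
    for latency, distance in cycles:
        bound = max(bound, math.ceil(latency / distance))
    return bound


def mii(op_counts, units_per_cluster, n_clusters, cycles):
    """Minimum initiation interval: the larger of the two bounds."""
    return max(res_mii(op_counts, units_per_cluster, n_clusters), rec_mii(cycles))
```

For instance, 6 integer, 8 FP and 5 memory operations on 4 clusters with one unit of each class give ResMII = 2, while a recurrence of latency 6 spanning 2 iterations forces RecMII = 3, so the scheduler would start searching at II = 3.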
This work presents a novel scheme to schedule loops for clustered microarchitectures. The scheme is based on a preliminary cluster assignment phase implemented through graph partitioning techniques, followed by a scheduling phase that integrates register allocation and spill ...
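The abstract does not detail the graph partitioning algorithm. As a hedged illustration of what a preliminary cluster assignment phase does, the sketch below uses a simple greedy heuristic, a stand-in and not the paper's method: it visits the nodes of the data dependence graph in a given order and places each one in the cluster that cuts the fewest edges to already-placed neighbors, subject to a balance limit. Each cut edge stands for an inter-cluster copy that the later scheduling phase would have to fit. The function names and the capacity rule are assumptions.

```python
import math
from collections import defaultdict


def greedy_partition(nodes, edges, k):
    """Assign dependence-graph nodes to k clusters, greedily limiting cut edges.

    nodes: node ids in visiting order (e.g. a topological order).
    edges: (src, dst) dependence edges; an edge whose endpoints end up in
           different clusters would need an inter-cluster copy.
    Hypothetical balance rule: at most ceil(len(nodes) / k) nodes per cluster.
    """
    capacity = math.ceil(len(nodes) / k)
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    assign = {}
    load = [0] * k
    for n in nodes:
        best, best_key = None, None
        for c in range(k):
            if load[c] >= capacity:
                continue  # cluster full; capacity * k >= len(nodes) keeps one open
            # Edges to already-placed neighbors that placing n in c would cut.
            cut = sum(1 for m in neighbors[n] if m in assign and assign[m] != c)
            key = (cut, load[c], c)  # fewest cuts, then lightest load, then lowest id
            if best_key is None or key < best_key:
                best, best_key = c, key
        assign[n] = best
        load[best] += 1
    return assign


def cut_edges(edges, assign):
    """Number of edges whose endpoints lie in different clusters."""
    return sum(1 for u, v in edges if assign[u] != assign[v])
```

On two independent three-node chains with k = 2, the heuristic keeps each chain in its own cluster and cuts no edges. In general the result depends on the visiting order and is not guaranteed to be optimal.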
This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches that perform some (or all) of the three steps sequentially, since it optimizes the global code generation problem instead of searching for optimal solutions to each individual step. Moreover, it avoids the iterative nature of traditional approaches, which require repeated applications of the three steps until a valid solution is found. The proposed framework includes a mechanism to insert spill code on the fly and heuristics that evaluate the quality of partial schedules by simultaneously considering inter-cluster communications, memory pressure and register pressure. It also includes transformations that trade pressure on one type of resource for pressure on another. We show that the proposed technique outperforms previously proposed techniques; for instance, the average speed-up for SPECfp95 is 36% for a 4-cluster configuration.
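To make the single-phase idea concrete, the following sketch assigns clusters and schedules operations in one pass: for each operation it tries every cluster, reserves any needed inter-cluster copies on a shared bus, and keeps the cluster that yields the earliest cycle, with resources tracked in modulo reservation tables. It is a simplified model resting on its own assumptions, not the framework described above: there is a single class of functional unit, the dependence graph is acyclic (no loop-carried dependences), and register allocation and spill code, which the paper integrates, are omitted. `Machine`, `try_schedule` and `modulo_schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    clusters: int  # number of clusters
    units: int     # identical functional units per cluster (single hypothetical class)
    buses: int     # inter-cluster copies that can start per cycle
    bus_lat: int   # cycles for a copy to reach the destination cluster


def try_schedule(ops, preds, lat, m, ii):
    """Joint cluster assignment and modulo scheduling at a fixed II.

    ops: op ids in topological order of an acyclic dependence graph.
    preds: op -> list of predecessor ops; lat: op -> latency.
    Returns {op: (cluster, cycle)} or None if some op cannot be placed.
    """
    fu = [[0] * ii for _ in range(m.clusters)]  # per-cluster modulo reservation table
    bus = [0] * ii                              # modulo reservation table of the bus
    copies = {}                                 # (producer, cluster) -> arrival cycle
    placed = {}
    for op in ops:
        best = None
        for c in range(m.clusters):
            trial_bus, new_copies = bus[:], {}
            est, ok = 0, True
            for p in preds.get(op, []):
                pc, pt = placed[p]
                ready = pt + lat[p]
                if pc != c:
                    if (p, c) in copies:          # reuse a copy made earlier
                        ready = copies[(p, c)]
                    elif (p, c) in new_copies:    # same producer listed twice
                        ready = new_copies[(p, c)]
                    else:                         # reserve a free bus slot
                        slot = next((s for s in range(ready, ready + ii)
                                     if trial_bus[s % ii] < m.buses), None)
                        if slot is None:
                            ok = False
                            break
                        trial_bus[slot % ii] += 1
                        ready = slot + m.bus_lat
                        new_copies[(p, c)] = ready
                est = max(est, ready)
            if not ok:
                continue
            cycle = next((t for t in range(est, est + ii)
                          if fu[c][t % ii] < m.units), None)
            if cycle is None:
                continue
            key = (cycle, len(new_copies), c)  # earliest cycle, then fewest copies
            if best is None or key < best[0]:
                best = (key, c, cycle, trial_bus, new_copies)
        if best is None:
            return None
        _, c, cycle, trial_bus, new_copies = best
        fu[c][cycle % ii] += 1
        bus = trial_bus
        copies.update(new_copies)
        placed[op] = (c, cycle)
    return placed


def modulo_schedule(ops, preds, lat, m, max_ii=64):
    """Raise II from the resource bound until the single pass succeeds."""
    lower = -(-len(ops) // (m.clusters * m.units))  # ceiling division
    for ii in range(max(1, lower), max_ii + 1):
        sched = try_schedule(ops, preds, lat, m, ii)
        if sched is not None:
            return ii, sched
    return None
```

On a chain a, b, c plus an independent d, all with latency 1, on two single-unit clusters, the pass succeeds at II = 2. It places c in the second cluster after a bus copy, because both slots of the first cluster's reservation table are already taken by a and b.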
Clustered VLIW organizations are nowadays a common trend in the design of embedded/DSP processors. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs cluster assignment and instruction scheduling in a single pass, which is more effective than doing the assignment first and the scheduling later. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we obtain the best performance with the minimum increase in code size. Performance evaluation for SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture.
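One simple way to see why unrolling can help when the inter-cluster channel is the bottleneck is the rounding in the resource bound: a loop needing 3 copies per iteration on 2 buses needs 2 cycles and leaves one bus slot idle per iteration, which an unrolled body can fill. The sketch below models only that effect, under its own assumptions (pooled units, copies scaling linearly with the unroll factor); the abstract does not say which mechanism the paper relies on, and the parameter names are hypothetical. `pick_unroll` mimics selective unrolling by returning the smallest factor that reaches the best per-iteration bound, since larger factors only add code.

```python
import math


def ii_per_iteration(ops_per_iter, copies_per_iter, total_units, buses, unroll):
    """Resource-bound II of the unrolled body, divided by the unroll factor.

    Assumptions: all units are pooled, and unrolling multiplies both the
    operations and the inter-cluster copies by `unroll`.
    """
    ii = max(math.ceil(ops_per_iter * unroll / total_units),
             math.ceil(copies_per_iter * unroll / buses))
    return ii / unroll


def pick_unroll(ops_per_iter, copies_per_iter, total_units, buses, max_unroll=8):
    """Smallest unroll factor reaching the best per-iteration bound."""
    best_u = 1
    best = ii_per_iteration(ops_per_iter, copies_per_iter, total_units, buses, 1)
    for u in range(2, max_unroll + 1):
        value = ii_per_iteration(ops_per_iter, copies_per_iter, total_units, buses, u)
        if value < best:  # strict: ties keep the smaller (shorter-code) factor
            best_u, best = u, value
    return best_u, best
```

For example, `pick_unroll(6, 3, 8, 2)` returns `(2, 1.5)`: the bus bound drops from 2 cycles per iteration without unrolling to 1.5 with a factor of 2, and factors up to 8 do not improve on it.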
Papers by Jesús Sánchez