Jesús Sánchez

Papers by Jesús Sánchez

Modulo scheduling for a fully-distributed clustered VLIW architecture

Clustering is an approach that many microprocessors are adopting in recent times in order to mitigate the increasing penalties of wire delays. In this work we propose a novel clustered VLIW architecture which has all its resources partitioned among clusters, including the cache memory. A modulo scheduling scheme for this architecture is also pro-

Graph-partitioning based instruction scheduling for clustered processors

This work presents a novel scheme to schedule loops for clustered microarchitectures. The scheme ...

A Unified Modulo Scheduling and Register Allocation Technique for Clustered Processors

This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches based on sequentially performing some (or all) of the three steps, since it allows optimizing the global code generation problem instead of searching for optimal solutions to each individual step. Besides, it avoids the iterative nature of traditional approaches, which require repeated applications of the three steps until a valid solution is found. The proposed framework includes a mechanism to insert spill code on-the-fly and heuristics to evaluate the quality of partial schedules considering simultaneously inter-cluster communications, memory pressure and register pressure. Transformations that allow trading pressure on a type of resource for another resource are also included. We show that the proposed technique outperforms previously proposed techniques. For instance, the average speed-up for the SPECfp95 is 36% for a 4-cluster configuration.

Instruction scheduling for clustered VLIW architectures

Clustered VLIW organizations are nowadays a common trend in the design of embedded DSP processors. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is more effective than doing first the assignment and later the scheduling. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation for the SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture.
