Papers by Karthikeyan Sankaralingam
Efficient execution of memory access phases using dataflow specialization
Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15, 2015
Karthikeyan Sankaralingam, Ramadass Nagarajan, Stephen W. Keckler, and Doug Burger. SimpleScalar Simulation
Comprehensive Circuit Failure Prediction for Logic and SRAM Using Virtual Aging
IEEE Micro, 2015
A wire-delay scalable microprocessor architecture for high performance systems
2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC., 2003
... IBM, and Intel. References [1] MS Hrishikesh, NP Jouppi, KI Farkas, D. Burger, SW Keckler, and P. Shivakumar, “The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays,” ISCA-29, pp. 14-24, May, 2002. [2] R ...
Toward a multicore architecture for real-time ray-tracing
2008 41st IEEE/ACM International Symposium on Microarchitecture, 2008
Significant improvement to visual quality for real-time 3D graphics requires modeling of complex illumination effects like soft-shadows, reflections, and diffuse lighting interactions. The conventional Z-buffer algorithm driven GPU model does not provide sufficient support for this ...
CMOS technology scaling poses challenges in designing dynamically scheduled cores that can sustain both high instruction-level parallelism and aggressive clock frequencies. In this paper, we present a new architecture that maps compiler-scheduled blocks onto a two-dimensional grid of ALUs. For the mapped window of execution, instructions execute in a dataflow-like manner, with each ALU forwarding its result along short wires to the consumers of the result. We describe our studies of program behavior and a preliminary evaluation that show that this architecture has the potential for both high clock speeds and high ILP, and may offer the best of both the VLIW and dynamic superscalar architectures.
Appears in the Proceedings of the 34th Annual International Symposium on Microarchitecture
Appears in the Proceedings of the Annual International Symposium on Computer Architecture
Appears in the 5th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT-5)
Appears in the Proceedings of the 30
Technology constraints and application characteristics are radically changing as we scale to the end of silicon technology. Devices are becoming increasingly brittle, highly varying in their properties, and error-prone, leading to a fundamentally unpredictable hardware substrate. Applications are also changing, and emerging new classes of applications are increasingly relying on probabilistic methods. They have an inherent tolerance for uncertainty and can tolerate hardware errors.
Design and analysis of routed Inter-ALU Networks for ILP scalability and performance
Abstract: Modern processors rely heavily on broadcast networks to bypass instruction results to dependent instructions in the pipeline. However, as architectures get wider and pipelines get deeper, broadcasting becomes more complex, slower, and more difficult to implement.

Deep packet inspection is becoming prevalent for modern network processing systems. They inspect packet payloads for a variety of reasons, including intrusion detection, traffic policing, and load balancing. The focus of this paper is deep packet inspection in intrusion detection/prevention systems (IPSes). The performance critical operation in these systems is signature matching: matching payloads against signatures of vulnerabilities. Increasing network speeds of today's networks and the transition from simple string-based signatures to complex regular expressions has rapidly increased the performance requirement of signature matching. To meet these requirements, solutions range from hardware-centric ASIC/FPGA implementations to software implementations using high-performance microprocessors. In this paper, we propose a programmable SIMD architecture design for IPSes and develop a prototype implementation on an Nvidia G80 GPU. We first present a detailed archi...
Exploring the potential of heterogeneous von neumann/dataflow execution models
Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15, 2015
Optimization and Mathematical Modeling in Computer Architecture
Synthesis Lectures on Computer Architecture, 2013
Architectural Simulators Considered Harmful
IEEE Micro, 2015
Enabling GPGPU Low-Level Hardware Explorations with MIAOW
ACM Transactions on Architecture and Code Optimization, 2015