Papers by Dionisios Pnevmatikatos
Guest Editors' Introduction: Multicore: The View from Europe, by Mateo Valero and Nacho Navarro
ArchExplorer for Automatic Design Space Exploration, by Veerle Desmet, Sylvain Girbal, Alex Ramirez, Augusto Vega, and Olivier Temam
The SARC Architecture, by Alex Ramirez, Felipe Cabarcas, Ben Juurlink, Mauricio Alvarez Mesa, Friman Sanchez, Arnaldo Azevedo, Cor Meenderinck, Cătălin Ciobanu, Sebastian Isaza, and Georgi Gaydadjiev
Explicit Communication and Synchronization in SARC, by Manolis G.H. Katevenis, Vassilis Papaefstathiou, Stamatis ...
Computer Architecture Dept., Polytechnic University of Catalonia (UPC), Barcelona, July 2008
Abstract. Programming models with explicit communication between parallel tasks allow the runtime system to schedule task execution and data transfers ahead of time. Explicit communication is not limited to message passing and streaming applications: recent proposals in parallel programming allow such explicit communication in other task-based scenarios too. Scheduling data transfers enables the overlap of computation and communication, latency hiding, and locality optimization, using programmable data ...
Abstract. We evaluate the effects of guarded (or conditional, or predicated) execution on the performance of an instruction-level parallel processor employing dynamic branch prediction. First, we assess the utility of guarded execution, both qualitatively and quantitatively, using a variety of application programs. Our assessment shows that guarded execution significantly increases the opportunities, for both the compiler and dynamic hardware, to extract and exploit parallelism.
Abstract. In this paper, we consider hardware-based scanning and analysis of packet payloads in order to detect hazardous content. We present two pattern matching techniques to compare incoming packets against intrusion detection search patterns. The first approach, decoded partial CAM (DpCAM), predecodes incoming characters, aligns the decoded data, and performs a logical AND on them to produce the match signal for each pattern.
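The DpCAM idea can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the paper's hardware: one decoder output line per character that occurs in some pattern, one shift register per decoded line standing in for the alignment logic, and a hypothetical function name `dpcam_scan`. Each cycle consumes one input byte, and a pattern of length m matches when the m delayed decoder lines it taps are all high.

```python
from collections import deque


def dpcam_scan(data: bytes, patterns: list[bytes]) -> list[tuple[int, bytes]]:
    """Return (end_offset, pattern) for every match (software model, hypothetical)."""
    max_len = max(len(p) for p in patterns)
    # Only characters that occur in some pattern need a decoder output line.
    alphabet = {c for p in patterns for c in p}
    # One shift register per decoded line; index 0 holds the newest bit.
    shift = {c: deque([False] * max_len, maxlen=max_len) for c in alphabet}
    matches = []
    for pos, byte in enumerate(data):
        # Pre-decode: at most one decoder line is high in this cycle.
        for c in alphabet:
            shift[c].appendleft(byte == c)
        # Each pattern ANDs the delayed lines that align its characters:
        # p[i] arrived (m - 1 - i) cycles before the current byte.
        for p in patterns:
            m = len(p)
            if pos + 1 >= m and all(shift[p[i]][m - 1 - i] for i in range(m)):
                matches.append((pos, p))
    return matches
```

In this model the decoder outputs are shared by all patterns, so each additional pattern adds only an AND over existing delayed lines rather than a new set of character comparators.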
Abstract. SPEC is a new set of benchmark programs designed to measure a computer system's performance. The performance measured by benchmarks is strongly affected by the existence and configuration of cache memory. In this paper we evaluate the cache miss ratio of the Integer SPEC benchmarks. We show that the cache miss ratio depends strongly on the program, and that large caches are not completely exercised by these benchmarks.
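To make "cache miss ratio" concrete, the sketch below computes it for a direct-mapped cache over a trace of byte addresses. The cache organization, the function name `miss_ratio`, and the synthetic trace are assumptions for illustration; the abstract does not state which cache configurations were simulated.

```python
def miss_ratio(addresses: list[int], cache_bytes: int, block_bytes: int) -> float:
    """Miss ratio of a direct-mapped cache over a trace of byte addresses."""
    num_sets = cache_bytes // block_bytes
    tags = [None] * num_sets  # one tag per set (direct-mapped)
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_sets   # which set the block maps to
        tag = block // num_sets    # identifies the block within that set
        if tags[index] != tag:
            misses += 1
            tags[index] = tag      # fill the block on a miss
    return misses / len(addresses)
```

With a 64-byte working set, `miss_ratio(list(range(0, 64, 4)) * 2, 1024, 16)` and the same call with a 65536-byte cache both return 0.125: only the four compulsory misses remain, illustrating how a small working set leaves a larger cache unexercised.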
Abstract. Modern processors are becoming more complex, and as features and application sizes increase, their evaluation becomes more time-consuming. To date, design space exploration relies on extensive use of software simulation, which is slow when highly accurate. In this paper we propose ReSim, a parameterizable ILP processor simulation acceleration engine based on reconfigurable hardware.
Abstract. We present an innovative protocol processor component that combines wire-speed processing for low-level protocols with best-effort processing for higher-level protocols. The component is a System-on-Chip that integrates variable-size packet buffering, specialised cores for header and field processing, generic RISC cores, and scheduling blocks.
Abstract. In this paper we advocate the use of pre-decoding for CAM-based pattern matching. We implement an FPGA-based sub-system for NIDS (Snort) pattern matching using a combination of techniques. First, we reduce the area cost of character matching using (i) character pre-decoding before the characters are compared in the CAM line, and (ii) an efficient shift register implementation using the SRL16 Xilinx cell.
Abstract. We report on the hardware implementation of a local memory system for individual processors inside future chip multiprocessors (CMPs). The system is intended to support both implicit communication, via caches, and explicit communication, via directly accessible local ("scratchpad") memories and remote DMA (RDMA). We provide run-time configurability of the SRAM blocks near each processor, so that part of them operates as a 2nd-level (local) cache, while the rest operates as scratchpad.
Abstract. This paper proposes a novel methodology for improving the reliability of FPGAs without requiring special-purpose hardware. In contrast to related approaches that are applied uniformly over the target architecture, the proposed one inserts redundancy only in the resources that are critical for failure. Such an approach leads to a reasonable performance improvement.
Abstract. The authors consider whether SPECmarks, the figures of merit obtained from running the SPEC benchmarks under certain specified conditions, accurately indicate the performance to be expected from real, live workloads. Miss ratios for the entire set of SPEC92 benchmarks are measured. It is found that instruction cache miss ratios in general, and data cache miss ratios for the integer benchmarks, are quite low.
Abstract. We describe the Slice Processor micro-architecture, which implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operations, or the computation slice, that can be used to calculate forthcoming memory references. This is in contrast to outcome-based predictors, which exploit regularities in the (address) outcome stream.
Abstract. For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the design and evaluation of a fast address generation mechanism capable of eliminating the delays caused by effective address calculation for many loads and stores. Our approach works by predicting early in the pipeline (part of) the effective address of a memory access and using this predicted address to speculatively access the data cache.
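The abstract does not say how the partial address is predicted. The sketch below assumes one plausible scheme: the set-index bits of base + offset are predicted by carry-free addition (bitwise XOR) of the index fields, and the prediction is checked against the full addition. The cache geometry (`BLOCK_BITS`, `INDEX_BITS`) and the function names are hypothetical.

```python
BLOCK_BITS = 5  # hypothetical: 32-byte cache blocks
INDEX_BITS = 8  # hypothetical: 256 sets


def index_of(addr: int) -> int:
    """Extract the set-index field of an address."""
    return (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)


def predict_index(base: int, offset: int) -> int:
    """Carry-free addition of the index fields: XOR ignores every carry."""
    return index_of(base) ^ index_of(offset)


def speculative_access(base: int, offset: int) -> tuple[int, bool]:
    """Return the predicted set index and whether it matches the real one."""
    predicted = predict_index(base, offset)  # available early in the pipeline
    actual = index_of(base + offset)         # full add, finished later
    # The early cache access is valid only if no carry disturbed the index bits.
    return predicted, predicted == actual
```

`speculative_access(0x1000, 0x20)` predicts correctly, while `speculative_access(0x101F, 0x1)` does not, because the carry out of the block-offset field changes the index; in that case the access would have to be replayed with the full address.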
Abstract. As intrusion detection systems (IDS) use more complex syntax to efficiently describe complex attacks, their processing requirements increase rapidly. Hardware and, even more so, software platforms struggle to keep up with the computationally intensive IDS tasks, and face overheads that can substantially diminish performance. In this paper we introduce a packet pre-filtering approach as a means to resolve, or at least alleviate, the increasing needs of current and future intrusion detection systems.
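Packet pre-filtering can be sketched as a cheap first pass that selects candidate rules before full matching. This sketch assumes rules are plain byte strings and uses a hypothetical k-byte prefix of each rule as its filter; only rules whose prefix occurs somewhere in the packet go on to the expensive comparison. The filter criteria actually used in the paper are not described in the abstract, and the function names are illustrative.

```python
def prefilter(packet: bytes, rules: dict[str, bytes], k: int = 2) -> list[str]:
    """Names of rules whose first k content bytes occur in the packet."""
    grams = {packet[i:i + k] for i in range(len(packet) - k + 1)}
    return [name for name, content in rules.items() if content[:k] in grams]


def match_with_prefilter(packet: bytes, rules: dict[str, bytes], k: int = 2) -> list[str]:
    """Run full (expensive) matching only on rules that survive the pre-filter."""
    candidates = prefilter(packet, rules, k)
    return [name for name in candidates if rules[name] in packet]
```

For a packet `b"GET /index.html"` and a rule `b"GET /admin"`, the rule passes the pre-filter (both contain `GE`) but fails full matching; rules whose prefix never appears are skipped without any full comparison.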
Abstract. In this paper, we consider scanning and analyzing packets in order to detect hazardous content using pattern matching. We introduce a hardware perfect-hashing technique to access the memory that contains the matching patterns. A subsequent simple comparison between the incoming data and the memory output determines the match.
Abstract. In this paper we propose combining hashing with memory to achieve low-cost, exact matching of SNORT-like intrusion signatures. The basic idea is to use hashing to generate a distinct address for each candidate pattern, which is stored in memory. Our implementation, hash-mem, uses simple CRC-style polynomials implemented with XOR gates to achieve low-cost hashing of the input patterns.
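The hash-mem idea can be sketched as follows, assuming an 8-bit address space and the CRC-8 polynomial x^8 + x^2 + x + 1 (0x07); both are illustrative choices, not the paper's parameters, and the function names are hypothetical. The bitwise CRC loop uses only shifts and XORs, which is why such hashes map well to XOR gates. A pattern is stored at its hash address, and checking a candidate needs one memory read plus one exact comparison.

```python
POLY = 0x07  # hypothetical choice: CRC-8 polynomial x^8 + x^2 + x + 1


def crc_hash(data: bytes) -> int:
    """Bitwise CRC-8 (MSB first, zero initial value): only shifts and XORs."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_memory(patterns: list[bytes]) -> dict[int, bytes]:
    """Store each pattern at its hash address; require distinct addresses."""
    memory: dict[int, bytes] = {}
    for p in patterns:
        addr = crc_hash(p)
        if addr in memory:
            # This sketch does not resolve collisions; it just reports them.
            raise ValueError(f"hash collision at address {addr}")
        memory[addr] = p
    return memory


def match(candidate: bytes, memory: dict[int, bytes]) -> bool:
    """One memory read at the hashed address, then a single exact comparison."""
    return memory.get(crc_hash(candidate)) == candidate
```

If two patterns hash to the same address, this sketch simply raises an error; how the actual design guarantees distinct addresses is not covered by the abstract.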
Intrusion Detection Systems such as Snort scan incoming packets for evidence of security threats. The computation-intensive part of these systems is a text search of packet data against hundreds of patterns, which must be performed at wire speed. FPGAs are particularly well suited for this task, and several such systems have been proposed. In this paper we expand on previous work in order to achieve and exceed OC192 processing bandwidth (10 Gbps).
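The arithmetic behind the OC192 target is straightforward: 10 Gbps is 1.25 GB/s, so the number of bytes each clock cycle must process depends on the achievable clock rate. The helper below computes it; its name and the clock frequencies used with it are hypothetical examples, not figures from the paper.

```python
import math


def bytes_per_cycle(line_rate_gbps: float, clock_mhz: float) -> int:
    """Input bytes each clock cycle must consume to keep up with the line rate."""
    bytes_per_second = line_rate_gbps * 1e9 / 8
    return math.ceil(bytes_per_second / (clock_mhz * 1e6))
```

At 312.5 MHz the matcher must consume 4 bytes per cycle, and at 200 MHz it needs 7 (6.25 rounded up), which suggests why processing several bytes in parallel becomes necessary at these rates.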
Abstract. The Aho-Corasick (AC) algorithm is a very flexible and efficient but memory-hungry pattern matching algorithm that can detect occurrences of multiple patterns in an input string while looking at each input character exactly once, making it one of the main options for software-based intrusion detection systems such as SNORT. We present the Split-AC algorithm, a reconfigurable variation of the AC algorithm that exploits domain-specific characteristics of intrusion detection to considerably reduce the FSM memory requirements.
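For reference, the classic Aho-Corasick automaton that Split-AC starts from can be sketched in a few dozen lines of Python. This is the standard, unsplit algorithm with dictionary-based transitions and failure links; it does not implement Split-AC's memory reduction, whose details the abstract does not give, and the function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns: list[str]):
    """Build trie edges, failure links, and per-state outputs."""
    goto: list[dict[str, int]] = [{}]  # goto[state][char] -> next state
    fail: list[int] = [0]              # failure link of each state
    out: list[list[str]] = [[]]        # patterns recognised on entering a state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first traversal sets failure links level by level;
    # children of the root keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] += out[fail[nxt]]  # inherit matches of the longest suffix
    return goto, fail, out


def search(text: str, automaton) -> list[tuple[int, str]]:
    """Return (start_offset, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

For example, searching `"ushers"` for `["he", "she", "his", "hers"]` reports `she` at offset 1 and both `he` and `hers` at offset 2. Each input character is read once; the failure-link loop only moves between states and never re-reads input.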
Abstract. Current and future computing systems increasingly require that their functionality remain flexible after the system is operational, in order to cope with changing user requirements and improvements in system features, e.g., changing protocols and data-coding standards, evolving demands for support of different user applications, and newly emerging applications in communication, computing, and consumer electronics.