2004, Proceedings of the 41st annual conference on Design automation - DAC '04
Architectural simulation has achieved a prominent role in the system design cycle by giving designers the ability to quickly examine a wide variety of design choices. However, the recent trend in system design toward architectures that react to circuit-level phenomena has outstripped the capabilities of traditional cycle-based architectural simulators. In this paper, we present an architectural simulator design that incorporates a circuit modeling capability, permitting architectural-level simulations that react to circuit characteristics (such as latency, energy, or current draw) on a cycle-by-cycle basis. While these additional capabilities slow simulation, we show that the careful application of circuit simulation optimizations and simulation sampling techniques permits high levels of detail with sufficient speed to examine entire workloads.
Transactions of The Society for Modeling and Simulation International, 2004
Designing a high-performance microprocessor is extremely time-consuming, taking at least several years. An important part of this design flow is the architectural simulations that define the microarchitecture, or internal organization, of the microprocessor. These simulations are time-consuming for four reasons: (i) the architectural design space is huge, (ii) the number of benchmarks with which the microarchitecture needs to be evaluated is large, (iii) the number of instructions that need to be simulated per benchmark is huge as well, and (iv) simulators are becoming relatively slower due to the increasingly complex designs of current high-performance microprocessors. In this paper, we extensively discuss these issues and propose a solution for each of them. As such, we present a new simulation methodology for designing high-performance microprocessors. This is done by combining several recently proposed techniques, such as statistical simulation, representative workload design, trace sampling, and reduced input sets. The major contribution of this paper is a holistic view on speeding up the architectural design phase, in which the above-mentioned techniques are integrated in a single architectural design framework. In this methodology, we first identify a region of interest in the huge design space through statistical simulation. Subsequently, this region is further explored using detailed simulations. Fortunately, these slow simulations can be sped up: (i) by selecting a limited but representative workload, (ii) by applying trace sampling and reduced input sets to limit the simulation time per benchmark, and (iii) by optimizing the architectural simulators. We conclude that this methodology can reduce the total simulation time considerably. In addition to presenting this new architectural modeling and simulation approach, we present a survey of related work in this important and fast-growing research field.
2005
M-Sim is a multi-threaded microarchitectural simulation environment with a detailed cycle-accurate model for the key pipeline structures. M-Sim extends the SimpleScalar 3.0d toolset with accurate models of the pipeline structures, including explicit register renaming, and support for the concurrent execution of multiple threads according to the Simultaneous Multithreading (SMT) model. For power estimations, M-Sim includes the Wattch framework as applied to SimpleScalar. This technical report provides an overview of M-Sim, including a detailed description of the simulated processor as well as instructions for the installation and use of the M-Sim environment. The description focuses only on the changes made with respect to SimpleScalar.
IEEE Transactions on Computers, 2020
Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this potential is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design (including scheduling, power-thermal management algorithms, and design space exploration studies) plays a crucial role. This paper presents a system-level domain-specific SoC simulation (DS3) framework to address this need. DS3 enables both design space exploration and dynamic resource management for power-performance optimization of domain applications. We showcase DS3 using six real-world applications from the wireless communications and radar processing domains. DS3, as well as the reference applications, is shared as open-source software to stimulate research in this area.
Proceedings of the 2003 Acm Sigmetrics International Conference on Measurement and Modeling of Computer Systems, 2003
While architecture simulation is often treated as a methodology issue, it is at the core of most processor architecture research, and simulation speed is often the bottleneck of the typical trial-and-error research process. To speed up simulation during this research process and see trends faster, researchers usually reduce the trace size. More sophisticated techniques like trace sampling or distributed simulation are scarcely used because they are considered unreliable and complex due to their impact on accuracy and the associated warm-up issues. In this article, we present DiST, a practical distributed simulation scheme where, unlike in other simulation techniques that trade accuracy for speed, the user is relieved from most accuracy issues thanks to an automatic and dynamic mechanism for adjusting the warm-up interval size. Moreover, the mechanism is designed to always favor accuracy over speedup. The speedup scales with the amount of available computing resources, yielding an average speedup of 7.35 on 10 machines with an average IPC error of 1.81% and a maximum IPC error of 5.06%. Besides proposing a solution to the warm-up issues in distributed simulation, we experimentally show that our technique is significantly more accurate than trace size reduction or trace sampling for identical speedups. We also show not only that the error always remains small for IPC and other metrics, but also that a researcher can reliably base research decisions on DiST simulation results. Finally, we explain how the DiST tool is designed to be easily pluggable into existing architecture simulators with very few modifications.
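To put the reported figure in context, a speedup of 7.35 on 10 machines corresponds to a parallel efficiency of roughly 73.5%. The short Python sketch below works through that arithmetic. The function name `parallel_efficiency` is an illustrative assumption and is not part of DiST; only the numbers 7.35 and 10 come from the abstract.

```python
def parallel_efficiency(speedup: float, machines: int) -> float:
    """Return speedup divided by machine count (1.0 would be ideal linear scaling)."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    return speedup / machines


# Figures quoted in the abstract: average speedup of 7.35 on 10 machines.
efficiency = parallel_efficiency(7.35, 10)
print(f"parallel efficiency: {efficiency:.1%}")
```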
VLSI Design, 2011
Designing today's systems-on-chip (SoCs) is extremely challenging because it involves complicated design tradeoffs and heterogeneous design expertise. To explore the large solution space, system architects have to rely on system-level simulators to identify an optimized SoC architecture. In this paper, we propose a system-level simulation framework, System Performance Simulation Implementation Mechanism, or SPSIM. Based on SystemC TLM2.0, the framework consists of an executable SoC model, a simulation tool chain, and a modeling methodology. Compared with the large body of existing research in this area, this work aims to deliver high simulation throughput while guaranteeing high accuracy on real industrial applications. Integrating the leading TLM techniques, our simulator attains a simulation speed that is no more than 35 times slower than hardware execution on a set of real-world applications. SPSIM incorporates effective timing models, ...
2004
Simulators for digital systems operate at a variety of levels of abstraction, ranging from detailed analog and switch-level modeling of the transistor to cycle-based descriptions of entire systems. We propose an even higher-level simulator, called ARCS, based on the abstraction of an asynchronous communication event rather than of a clock cycle. Modeling systems at this level allows architectural-level exploration of the design space before cycle-level details are available, and also allows the same framework to be used to refine architectural-level simulations into more detailed simulations with increasingly fine-grained notions of timing. The ARCS simulation framework uses concurrently operating threads in Java with communicating sequential processes (CSP) semantics as a natural expression of communication between concurrent hardware. To avoid synchronization bottlenecks, ARCS models time using a communication-driven clockwork model, which allows for both user-configurable runtime viewing of the simulation and post-processing of complete simulation timing data.
Proceedings of MASCOTS '96 - 4th International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 1996
Simulation is a powerful tool for studying the behavior of novel architectures and improving their performance. However, the time, effort and resources invested in developing a reliable simulator with the required level of detail may become prohibitively large. In this paper, we present a simulation platform specifically designed to simulate the class of multithreaded architectures. The most important features of this simulator are its flexibility and ease of use. The simulation model provides the user with a wide range of design criteria, architectural parameters and workload characteristics. The simulation platform includes several tools, such as an experiment planner, an interface to Matlab for processing and displaying results, and an interface to PVM for the execution of independent experiments in parallel. The simulation model is validated by comparison of analytical and experimental results.
ACM Transactions on …, 2006
In digital hardware system design, the quality of the product is directly related to the number of meaningful design alternatives properly considered. Unfortunately, existing modeling methodologies and tools have properties which make them less than ideal for rapid and accurate design-space exploration. This article identifies and evaluates the shortcomings of existing methods to motivate the Liberty Simulation Environment (LSE). LSE is a high-level modeling tool engineered to address these limitations, allowing for the rapid construction of accurate high-level simulation models. LSE simplifies model specification with low-overhead component-based reuse techniques and an abstraction for timing control. As part of a detailed description of LSE, this article presents these features, their impact on model specification effort, their implementation, and optimizations created to mitigate their otherwise deleterious impact on simulator execution performance.
The Computer Journal, 2012
In this paper, we describe the integrated power, area and thermal modeling framework in the structural simulation toolkit (SST) for large-scale high-performance computer simulation. It integrates various power and thermal modeling tools and computes run-time energy dissipation for core, network on chip, memory controller and shared cache. It also provides functionality to update the leakage power as temperature changes. We illustrate the utilization of the framework by applying it to explore interconnect options in manycore systems with consideration of temperature variation and leakage feedback. We compare power, energy-delay-area product (EDAP) and energy-delay product (EDP) of four manycore configurations: 1 core, 2 cores, 4 cores and 8 cores per cluster. Results from simulation with or without consideration of temperature variation both show that the 4-core per cluster configuration has the best EDAP and EDP. Even so, considering that temperature variation increases total power dissipation, we demonstrate the importance of considering temperature variation in the design flow. With this power, area and thermal modeling capability, the SST can be used for hardware/software co-design of future exascale systems.
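The two comparison metrics named in this abstract have standard definitions: EDP is energy multiplied by delay, and EDAP is energy multiplied by delay multiplied by area, with lower values being better. The Python sketch below shows how configurations might be ranked by either metric. The class `ConfigResult`, the helper `best_by`, the configuration names, and every numeric value are hypothetical placeholders chosen for illustration; they are not SST APIs and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigResult:
    """Hypothetical per-configuration totals (not an SST data structure)."""
    name: str
    energy_j: float   # total energy in joules
    delay_s: float    # execution time in seconds
    area_mm2: float   # die area in square millimetres

    @property
    def edp(self) -> float:
        # Energy-delay product: lower is better.
        return self.energy_j * self.delay_s

    @property
    def edap(self) -> float:
        # Energy-delay-area product: lower is better.
        return self.edp * self.area_mm2


def best_by(results: list[ConfigResult], metric: str) -> ConfigResult:
    """Return the configuration with the smallest value of the given metric."""
    return min(results, key=lambda r: getattr(r, metric))


# Made-up placeholder values, NOT measurements from the paper.
results = [
    ConfigResult("config_a", energy_j=2.0, delay_s=3.0, area_mm2=4.0),
    ConfigResult("config_b", energy_j=1.0, delay_s=5.0, area_mm2=6.0),
]
print(best_by(results, "edp").name, best_by(results, "edap").name)
```

With these placeholder inputs the two metrics pick different configurations, which is why a study may report both.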
2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, 2009
Accelerating microarchitecture simulation is becoming increasingly urgent as the complexity of workloads and simulated processors increases. This paper presents a novel two-stage sampling (TSS) scheme to accelerate sampling-based simulation. It first selects some large samples from a dynamic instruction stream as candidates for detailed simulation and then samples some small groups from each selected first-stage sample for detailed simulation. Since the distribution of the standard deviation of cycles per instruction (CPI) is insensitive to microarchitecture, TSS can be used to speed up design space exploration by splitting the sampling process into two stages, which removes redundant instruction samples from detailed simulation when the program is in a stable phase (standard deviation of CPI near zero). It also adopts systematic sampling to accelerate the functional warm-up in sampling simulation. Experimental results show that, by combining these two techniques, TSS achieves an average speedup of 1.3 and a maximum speedup of 2.29 over SMARTS, with an average CPI relative error of less than 3%. TSS could significantly accelerate the time-consuming, iterative early design evaluation process.
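The two-stage idea described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the TSS implementation. It assumes a CPI standard deviation is already known for each first-stage sample (for example from an earlier profiling run), it places samples at evenly spaced offsets purely for simplicity, and it keeps one small group for a stable sample. The names `systematic_offsets` and `plan_two_stage` and the `stable_threshold` parameter are all hypothetical.

```python
def systematic_offsets(length: int, count: int) -> list[int]:
    """Return `count` evenly spaced start offsets within [0, length)."""
    if length <= 0 or count <= 0:
        return []
    step = length / count
    return [int(i * step) for i in range(count)]


def plan_two_stage(stream_len: int, large_size: int, n_large: int,
                   cpi_std: list[float], groups_per_large: int,
                   stable_threshold: float = 0.01) -> list[tuple[int, list[int]]]:
    """Plan detailed-simulation points as (large_start, [group_starts]) pairs.

    Stage 1 places `n_large` large samples across the instruction stream.
    Stage 2 places small groups inside each large sample; a large sample whose
    CPI standard deviation is below `stable_threshold` (assumed known) keeps
    only one group, since additional groups would be largely redundant.
    """
    large_starts = systematic_offsets(stream_len - large_size + 1, n_large)
    plan = []
    for start, std in zip(large_starts, cpi_std):
        n_groups = 1 if std < stable_threshold else groups_per_large
        group_starts = [start + o for o in systematic_offsets(large_size, n_groups)]
        plan.append((start, group_starts))
    return plan


# Hypothetical stream of 1000 instructions, two large samples of 100 each.
plan = plan_two_stage(1000, 100, 2, cpi_std=[0.0, 0.5], groups_per_large=4)
print(plan)
```

In this toy plan the first large sample is treated as a stable phase and receives a single small group, while the second receives all four.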
2018 28th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2018
Proceedings of the Fifth International Conference on Simulation Tools and Techniques, 2012
Integrated Circuits and …, 2003
ACM Transactions on Architecture and Code Optimization, 2012
International Journal of Computer Theory and Engineering
2007 IEEE/ACM International Conference on Computer-Aided Design, 2007
Workshop On Computer Architecture Education, 2006
Journal of Low Power …, 2012
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design - ISLPED '12, 2012
Proceedings 2000 International Conference on Computer Design
Proceedings of the 2006 …, 2006
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2000
IEEE Transactions on Parallel and Distributed Systems, 2017