2023
With the end of Moore's law and the breakdown of Dennard scaling, multicore processors are the standard way to continue improving performance while reducing Size, Weight and Power (SWaP). However, this performance is typically achieved at the cost of repeatability and predictability. Precision-timed (PRET) architectures have been shown to deliver high performance without sacrificing predictability. In this paper, we introduce InterPRET: an architecture consisting of FlexPRET cores interconnected via the S4NOC network-on-chip. Both the processor cores and the network-on-chip are time-predictable, yielding an end-to-end time-predictable architecture suitable for real-time systems.
CCS Concepts: • Computer systems organization → Multicore architectures; Real-time system architecture.
21st International Conference on VLSI Design (VLSID 2008), 2008
Worst-case execution time (WCET) analysis and, in general, the predictability of real-time applications implemented on multiprocessor systems has been addressed only in very restrictive and particular contexts. One important aspect that makes the analysis difficult is the estimation of the system's communication behavior. The traffic on the bus does not solely originate from data transfers due to data dependencies between tasks, but is also affected by memory transfers as result of cache misses. As opposed to the analysis performed for a single processor system, where the cache miss penalty is constant, in a multiprocessor system each cache miss has a variable penalty, depending on the bus contention. This affects the tasks' WCET which, however, is needed in order to perform system scheduling. At the same time, the WCET depends on the system schedule due to the bus interference. In this context, we propose, for the first time, an approach to worst-case execution time analysis and system scheduling for real-time applications implemented on multiprocessor SoC architectures.
The demand for predictable timing behavior is characteristic for real-time applications. Experience has shown that this property cannot be achieved by software alone but rather requires support from the processor. This situation is analyzed and mapped to a design rationale for SPEAR (Scalable Processor for Embedded Applications in Real-time Environments), a processor that has been designed to meet the specific temporal demands of real-time systems. At the hardware level, SPEAR guarantees interrupt response with minimum temporal jitter and minimum delay. Furthermore, the processor provides an instruction set that only has constant-time instructions. At the software level, SPEAR supports the implementation of temporally predictable code according to the single-path programming paradigm. Altogether, these features support writing of code with minimal jitter and provide the basis for exact temporal predictability. Experimental results show that SPEAR indeed exhibits the anticipated high...
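The single-path paradigm mentioned above can be illustrated with a small if-conversion sketch in C (illustrative only; SPEAR supports this at the ISA level with predicated, constant-time instructions):

```c
#include <stdint.h>

/* Single-path (if-converted) form of: y = (x > t) ? a : b;
 * Both "branches" are always evaluated and the result is selected with
 * bitwise arithmetic, so there is no data-dependent branch and the
 * execution time is independent of the input values. */
static int32_t select_single_path(int32_t x, int32_t t, int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(x > t);   /* all ones if x > t, else all zeros */
    return (a & mask) | (b & ~mask);
}
```

The same transformation is what single-path compilers apply systematically to eliminate input-dependent control flow.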
2014
The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. Many recent HPC applications require huge amounts of information to be processed within a bounded amount of time while EC systems are increasingly concerned with providing higher performance in real-time. The convergence of these two domains towards systems requiring both high performance and a predictable time-behavior challenges the capabilities of current hardware architectures. Fortunately, the advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictability and high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. However, addressing this mixed set of requirements is not without its own challenges and it is now of paramount importance to develop new techniques to exploit the massively parallel computation capabilities of many-core platforms in a predictable way.
2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), 2018
Multi-core processors for real-time systems need to have a time-predictable way of communicating. The use of a single, external shared memory is the standard for multicore processor communication. However, this solution is hardly time-predictable. This paper presents a time-predictable solution for communication between cores: a distributed shared memory using a network-on-chip. The network-on-chip supports reading and writing data to and from distributed on-chip memory. This paper covers the implementation of time-predictable read requests on a network-on-chip. The network is implemented using statically scheduled time-division multiplexing, enabling predictions of worst-case execution time. The implementation attempts to keep buffering as low as possible to obtain a small footprint. The solution has been implemented and successfully synthesized with a multi-core system on an FPGA. Finally, we show resource and performance measurements.
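With statically scheduled TDM, the worst-case wait for the interconnect is a property of the schedule table alone, independent of other traffic. A C sketch with a hypothetical 3-core all-to-all schedule (not the actual tables of this paper) shows how the bound falls out of the table:

```c
#define CORES 3
#define SLOTS (CORES * (CORES - 1))   /* one slot per ordered (src,dst) pair */

typedef struct { int src, dst; } Slot;

/* Static TDM schedule, repeating every SLOTS cycles (illustrative layout). */
static const Slot schedule[SLOTS] = {
    {0,1},{1,2},{2,0},{0,2},{1,0},{2,1}
};

/* Slots a message from src to dst waits if it becomes ready in slot `now`.
 * The worst case over all `now` is SLOTS - 1, regardless of other traffic. */
static int wait_slots(int src, int dst, int now) {
    for (int s = 0; s < SLOTS; s++)
        if (schedule[s].src == src && schedule[s].dst == dst)
            return (s - now + SLOTS) % SLOTS;
    return -1;   /* pair not in the schedule */
}
```

Because the table is fixed at design time, a WCET analysis can simply charge SLOTS - 1 slot times per network access.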
Embedded Systems Week 2008 - Proceedings of the 2008 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES'08, 2008
In a hard real-time embedded system, the time at which a result is computed is as important as the result itself. Modern processors go to extreme lengths to ensure their function is predictable, but have abandoned predictable timing in favor of average-case performance. Real-time operating systems provide timing-aware scheduling policies, but without precise worst-case execution time bounds they cannot provide guarantees.
IEEE Access, 2020
Multicore processors are gaining popularity in various domains because of their potential for maximizing system throughput of average-case tasks. In real-time systems, where processes and tasks are governed by stringent temporal constraints, the worst-case timings should be considered, and migration to multicore processors leads to additional difficulties. Resource sharing between the cores introduces timing overheads, which affect the worst-case timings and schedulability of the entire system. In this article, we provide new insights into the performance of the real-time extensions of Linux, namely, Xenomai and RT-Preempt, for a homogeneous multicore processor. First, complete details on leveraging both real-time extensions are presented. We identify various multicore deployments and discuss their trade-offs, as established through the experimental evaluation of the scheduling latency. Then, we propose a statistical method based on a variation of chi-square test to determine the best multicore deployment. The unexpected effects of interfering loads, such as CPU, memory, and network operations, on the real-time performance, are considered. Feasibility of the best multicore deployment is verified through the analysis of its periodicity and deterministic response times in a pre-emptive multitasking environment. This research is the first of its kind and will serve as a useful guideline for developing real-time applications on multicore processors.
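The statistical comparison of deployments can be sketched with a plain Pearson chi-square statistic over scheduling-latency histograms (a generic illustration; the paper proposes its own variation of the test, and the bin counts below are invented):

```c
/* Pearson chi-square statistic comparing an observed latency histogram
 * against an expected one; larger values indicate a worse fit between
 * a deployment's measured latencies and the reference distribution. */
static double chi_square(const double *obs, const double *expd, int bins) {
    double x2 = 0.0;
    for (int i = 0; i < bins; i++) {
        double d = obs[i] - expd[i];
        x2 += d * d / expd[i];   /* assumes expd[i] > 0 for every bin */
    }
    return x2;
}
```

Ranking deployments by such a statistic makes "closest to the desired latency distribution" a single comparable number.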
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014, 2014
The requirement of high performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. We illustrate how this problem has been addressed by suitably designing the architecture, implementation, and programming model of the Kalray MPPA-256 single-chip many-core processor. The MPPA-256 (Multi-Purpose Processing Array) processor integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28 nm CMOS chip. These VLIW cores are distributed across 16 compute clusters and 4 I/O subsystems, each with a locally shared memory. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem. Off-chip interfaces include DDR, PCI and Ethernet, and direct access to the NoC for low-latency processing of data streams. The key architectural features that support time-critical applications are timing-compositional cores, independent memory banks inside the compute clusters, and the data NoC, whose guaranteed services are determined by network calculus. The programming model provides communicators that effectively support distributed computing primitives such as remote writes, barrier synchronizations, active messages, and communication by sampling. POSIX time functions expose synchronous clocks inside compute clusters and mesosynchronous clocks across the MPPA-256 processor.
High-Performance and Time-Predictable Embedded Computing, 2018
This chapter motivates the use of the OpenMP (Open Multi-Processing) parallel programming model to develop future critical real-time embedded systems, and analyzes the time-predictable properties of the OpenMP tasking model. Moreover, this chapter presents the set of compiler techniques needed to extract the timing information of an OpenMP program in the form of an OpenMP Directed Acyclic Graph, or OpenMP-DAG. Considering the vast amount of parallel programming models available, there is a noticeable need to unify programming models to exploit the performance benefits of parallel and heterogeneous architectures [9]. In that sense, OpenMP has shown many advantages over its competitors in enhancing productivity. The next sections introduce the main characteristics of the most relevant programming models, and conclude with an analysis of the main benefits of OpenMP.
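The task dependences such a compiler extracts form a DAG. A minimal sketch of one in C with OpenMP tasking (the pragmas are ignored, and the result unchanged, when compiled without OpenMP support):

```c
/* Three OpenMP tasks whose depend clauses form a two-producer,
 * one-consumer DAG: the task computing c may only be scheduled
 * after the tasks producing a and b have completed. */
int dag_result(void) {
    int a = 0, b = 0, c = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)
        a = 1;
        #pragma omp task depend(out: b)
        b = 2;
        #pragma omp task depend(in: a, b)
        c = a + b;   /* consumer node of the DAG */
    }
    return c;        /* implicit barrier ends the parallel region */
}
```

It is exactly this depend-clause structure that a timing-analysis compiler walks to build the OpenMP-DAG.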
Lecture Notes in Computer Science
Simultaneous multithreading (SMT) processors might be good candidates to fulfill the ever increasing performance requirements of embedded applications. However, state-of-the-art SMT architectures do not exhibit enough timing predictability to allow a static analysis of Worst-Case Execution Times. In this paper, we analyze the predictability of various policies implemented in SMT cores to control the sharing of resources by concurrent threads. Then, we propose an SMT architecture designed to run one hard real-time thread so that its execution time is analyzable even when other (non critical) threads are executed concurrently. Experimental results show that this architecture still provides high mean and worst-case performance.
Computer, 2016
Although multicore technology has many benefits for real-time systems (among them, decreased weight and power, fewer cooling requirements, and increased CPU bandwidth per processor), multicore chips pose problems that stem from the cores interfering with one another when accessing shared resources. Interference is compounded in real-time systems, which are based on the assumption that worst-case execution time (WCET) is constant; that is, a software task's measured WCET must be the same whether that task executes alone or with other tasks. This assumption holds for single-core chips, but not for multicore chips unless they have isolation mechanisms between cores. Measurements we performed on a commercial multicore platform (Freescale P4080) revealed that a task's WCET can increase by as much as 600 percent when it runs with logically independent tasks on other cores. Because of the potential for large and random delay spikes, the US Federal Aviation Administration (FAA), European Aviation Safety Agency (EASA), and Transport Canada specify that only single-core chips can be used unless intercore interference is specifically defined and handled [1]. Indeed, DO-178C: Software Considerations in Airborne Systems and Equipment Certification, the primary document by which certification authorities such as the FAA, the EASA, and Transport Canada approve all commercial software-based aerospace systems, was developed for the certification of software in single-core computers [2]. With a single-core chip, architects can assume a constant WCET and can thus schedule tasks and partition resources without unanticipated delays. Hence, the ideal solution is to certifiably bound intercore interference in a multicore chip such that each core can be used as a single-core computer.
As part of studying the feasibility of such a solution, we developed the Single-Core Equivalent (SCE) technology package, which addresses interference problems that arise when cores concurrently access DRAM, the memory bus, shared cache resources, I/O resources, and the on-chip network. With SCE, each core can be used as if it were a single-core chip, allowing the timing analysis and certification of software in one core independently of software in other cores. This has implications for avionics: architects of multicore chips for avionics must define and bound intercore interference, which requires assuming a constant worst-case execution time for tasks executing on the chip. With the Single-Core Equivalent technology package, engineers can treat each core as if it were a single-core chip.
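One of the SCE mechanisms regulates each core's memory-bandwidth usage. A MemGuard-style accounting sketch (simplified, with hypothetical budget numbers, not the SCE implementation itself) illustrates the idea:

```c
/* Per-core bandwidth regulator: each core receives a budget of memory
 * accesses per regulation period; once the budget is exhausted, further
 * accesses stall until the next period. This bounds the interference
 * any one core can impose on the others. */
typedef struct {
    int budget;   /* accesses allowed per period */
    int used;     /* accesses consumed in the current period */
} Regulator;

static int try_access(Regulator *r) {   /* 1 = proceed, 0 = stall */
    if (r->used >= r->budget)
        return 0;
    r->used++;
    return 1;
}

static void new_period(Regulator *r) {  /* invoked by a periodic timer */
    r->used = 0;
}
```

Because each core's worst-case demand on shared DRAM is now capped per period, a per-core WCET can be derived as if the core were alone.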
Microprocessors and Microsystems, 2019
Multi-core processors for real-time systems need to have a time-predictable way of communicating. The use of a single, external shared memory is the standard for multicore processor communication. However, this solution is hardly time-predictable. This paper presents a time-predictable solution for communication between cores: a distributed shared memory using a network-on-chip. The network-on-chip supports reading and writing data to and from distributed on-chip memory. This paper covers the implementation of time-predictable read requests on a network-on-chip. The network is implemented using static schedules and time-division multiplexing, enabling predictions of worst-case execution time. The implementation attempts to keep buffering as low as possible to obtain a small footprint. The solution has been implemented and successfully synthesized with a multi-core system on an FPGA. Finally, we show resource and performance measurements.
Journal of Systems Architecture, 2015
2012
Guaranteeing time-predictable execution in real-time systems involves the management of not only processors but also other supportive components such as cache memory, network-on-chip (NoC), and memory controllers. These three components are designed to improve the system's computational throughput, either by bringing data closer to the processors (e.g., cache memory) or by maximizing concurrency in moving data inside the system (e.g., NoC and memory controllers). We observe that these components can be sources of significant unpredictability in task executions if they are not operated in a deterministic manner. In particular, our analysis and experiments in [6, 35] show that with the standard cache and memory controller sharing mechanism, the execution time of a task may be unpredictably extended by 33 to 44% in a single-core processor. We also show that analysis techniques and scheduling algorithms that have been proposed to account for and/or mitigate this unpredictability often do not adequately address the problem at hand. As a consequence, those techniques and algorithms can only guarantee real-time execution in systems with under-utilized shared resources. In this dissertation, we study the software and hardware infrastructure, optimization techniques, and scheduling algorithms that guarantee predictable execution in real-time systems that use cache memory, a network-on-chip (NoC), and memory controllers. The main challenge is how to guarantee system predictability in a way that maximizes the benefits and the utilization of these components. We achieve that by carefully analyzing both the theoretical and practical assumptions in the use of these components and deriving novel solutions based on this understanding. For cache memory, we propose the use of software-based cache partitioning techniques and a real-time optimization method to minimize the system's real-time utilization.
The proposed solution renders better performance because it fully utilizes the available cache area. For NoC scheduling, we propose novel scheduling algorithms that are designed to cope directly with the unique assumptions on resource sharing in a NoC. As will be shown, in practical systems, these scheduling algorithms can achieve near-optimal performance.
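Software-based cache partitioning is commonly realized by page coloring. A minimal sketch under an assumed cache geometry (illustrative numbers, not the dissertation's platform):

```c
/* Page coloring for an assumed 256 KiB, 8-way set-associative cache with
 * 4 KiB pages. Pages with the same color map to the same cache sets, so
 * assigning disjoint color sets to different cores (or tasks) prevents
 * them from evicting each other's cache lines. */
#define PAGE_SIZE  4096u
#define CACHE_SIZE (256u * 1024u)
#define WAYS       8u
#define COLORS     (CACHE_SIZE / (WAYS * PAGE_SIZE))   /* = 8 colors here */

static unsigned page_color(unsigned long phys_addr) {
    return (unsigned)((phys_addr / PAGE_SIZE) % COLORS);
}
```

An OS allocator that hands each real-time task frames of only its assigned colors achieves the partitioning without any hardware support.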
This work discusses the adaptation of NoCs to real-time requirements, in particular with respect to the fulfillment of task deadlines. It is shown that, for soft real-time systems, the number of missed deadlines can be substantially reduced by the utilization of a routing mechanism based on message priorities. A core placement strategy based on message bandwidth requirements and also on message priorities can also reduce the number of missed deadlines. The paper also discusses the impact of these strategies on the energy consumption of the system and shows that an interesting design space can be explored.
www-users.cs.york.ac.uk
This paper describes the Advanced Real-time Processor Architecture (ARPA) project. The goal of this work is to create an open-source System-on-Chip model for real-time applications. The main component of the SoC is a MIPS32-based RISC processor. It is implemented using a pipelined simultaneous multithreading structure that supports the execution of more than one thread or task at a time. The synergy between pipelining and simultaneous multithreading allows combining the exploration of Instruction-Level Parallelism and Task-Level Parallelism, hiding the context-switching time, and reducing the need to employ complex speculative execution techniques to improve the performance of the pipelined processor. A fundamental component of the ARPA SoC is the Operating System Coprocessor, which implements in hardware some of the operating system's functions, such as task scheduling, switching, communication, and timing. The proposed architecture allows building high-performance, time-predictable, and power-efficient processors optimized for embedded real-time systems.
2011
This thesis also investigates a solution to verify the timing correctness of HRTs without requiring any modification in the core design: we design a hardware unit which is interfaced with the processor and integrated into a functional-safety-aware methodology. This unit monitors the execution time of a block of instructions and detects if it exceeds the WCET. Concretely, we show how to handle timing faults on a real industrial automotive platform.
Designs, 2019
The static resource allocation in time-triggered systems offers significant benefits for the safety arguments of dependable systems. However, adaptation is a key factor for energy efficiency and fault recovery in Cyber-Physical Systems (CPS). This paper introduces the Adaptive Time-Triggered Multi-Core Architecture (ATMA), which supports adaptation using multi-schedule graphs while preserving the key properties of time-triggered systems, including implicit synchronization, temporal predictability, and avoidance of resource conflicts. ATMA is an overall architecture for safety-critical CPS based on a network-on-chip with building blocks for context agreement and adaptation. Context information is established in a globally consistent manner, providing the foundation for the temporally aligned switching of schedules in the network interfaces. A meta-scheduling algorithm computes schedule graphs and avoids state explosion with reconvergence horizons for events. For each tile, the relevan...
… Tech. Rep. UCB/EECS-2009-59 …, 2009
Game, multimedia, consumer and control applications demand low power and high performance computing platforms capable of providing real-time services. Multi-core architectures, supported by on-chip networks, are emerging as scalable solutions to fulfill these requirements. ...
Proc. of 8th International Conference on …, 2000
Time management is an important aspect of real-time computation. Traditional high-performance processors provide little or no support for the management of time. In this report, we propose a time-management unit which can greatly help improve the performance of a real-time system. The proposed unit can be added to any processor architecture without affecting its performance. We also explain how the unit helps to solve clock synchronization problems in a distributed real-time network.
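The core service such a unit offers, releasing an instruction stream no earlier than an absolute point in time, can be approximated in software with POSIX clocks (a portable sketch subject to OS-induced jitter; a hardware time-management unit would bound that jitter):

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Block the calling thread until an absolute deadline on the monotonic
 * clock, retrying if a signal interrupts the sleep. Instructions after
 * the call are guaranteed not to start before the deadline. */
static void delay_until(const struct timespec *deadline) {
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL) != 0)
        ;   /* interrupted: sleep again until the same absolute deadline */
}
```

Using an absolute rather than relative deadline avoids drift accumulation in periodic loops, which is the same property a hardware unit exploits for distributed clock alignment.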
2005
In Simultaneous Multithreaded (SMT) architectures most hardware resources are shared between threads. This provides a good cost/performance trade-off which renders these architectures suitable for use in embedded systems. However, since threads share many resources, they also interfere with each other. As a result, execution times of applications become highly unpredictable and dependent on the context in which an application is executed. Obviously, this poses problems if an SMT is to be used in a real-time system.