2016, Computer
Although multicore technology has many benefits for real-time systems (among them decreased weight and power, fewer cooling requirements, and increased CPU bandwidth per processor), multicore chips pose problems that stem from the cores interfering with one another when accessing shared resources. Interference is compounded in real-time systems, which are built on the assumption that worst-case execution time (WCET) is constant; that is, a software task's measured WCET must be the same whether that task executes alone or with other tasks. This assumption holds for single-core chips, but not for multicore chips unless they have isolation mechanisms between cores. Measurements we performed on a commercial multicore platform (Freescale P4080) revealed that a task's WCET can increase by as much as 600 percent when it runs alongside logically independent tasks on other cores.

Because of the potential for large and random delay spikes, the US Federal Aviation Administration (FAA), the European Aviation Safety Agency (EASA), and Transport Canada specify that only single-core chips can be used unless intercore interference is specifically defined and handled.1 Indeed, DO-178C: Software Considerations in Airborne Systems and Equipment Certification, the primary document by which certification authorities such as the FAA, the EASA, and Transport Canada approve all commercial software-based aerospace systems, was developed for the certification of software in single-core computers.2 With a single-core chip, architects can assume a constant WCET and can thus schedule tasks and partition resources without unanticipated delays. Hence, the ideal solution is to certifiably bound intercore interference in a multicore chip such that each core can be used as a single-core computer.

As part of studying the feasibility of such a solution, we developed the Single-Core Equivalent (SCE) technology package, which addresses interference problems that arise when cores concurrently access DRAM, the memory bus, shared cache resources, I/O resources, and the on-chip network. With SCE, each core can be used as if it were a single-core chip, allowing the timing analysis and certification of software in a core independently of software in other cores. This has direct implications for avionics: architects of multicore chips for avionics must define and bound intercore interference, which requires assuming a constant worst-case execution time for tasks executing on the chip. With the Single-Core Equivalent technology package, engineers can treat each core as if it were a single-core chip.
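To make the kind of bound involved concrete, the following is a minimal sketch (not the SCE analysis itself) of an additive interference model: given a task's WCET measured in isolation, its number of DRAM accesses, and an assumed worst-case stall that each co-running core can add per access, it computes an interference-aware WCET bound and the resulting inflation. All parameter values and the linear model are illustrative assumptions.

```python
# Illustrative sketch (not the SCE analysis itself): an additive model of
# inter-core interference on shared DRAM. All parameters are assumptions.

def interference_aware_wcet(wcet_alone_us, dram_accesses, stall_per_access_us,
                            n_interfering_cores):
    """Upper-bound WCET when each of the other cores can delay every DRAM
    access of the task by at most `stall_per_access_us` (e.g. one worst-case
    DRAM transaction under a fair memory controller)."""
    interference_us = dram_accesses * stall_per_access_us * n_interfering_cores
    return wcet_alone_us + interference_us

if __name__ == "__main__":
    wcet_alone = 1000.0   # measured in isolation (microseconds), invented
    accesses = 20000      # last-level-cache misses reaching DRAM, invented
    stall = 0.04          # assumed worst-case stall per access (us)
    for cores in range(0, 8):
        bound = interference_aware_wcet(wcet_alone, accesses, stall, cores)
        print(f"{cores} interfering cores: bound = {bound:7.1f} us "
              f"({100.0 * (bound - wcet_alone) / wcet_alone:5.1f}% inflation)")
```

Under a model of this shape, bounding how much memory traffic the co-running cores may issue is what keeps the certifiable WCET close to the single-core figure, which is the role played by the OS-level mechanisms in the SCE package.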
2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2016
Many-core processors offer massively parallel computation power, representing a good opportunity for the design of highly integrated avionics systems. Such designs must face several challenges, among which: 1) temporal isolation must be ensured between applications, and 2) WCET bounds must be computed for real-time safety-critical applications. To partially address those issues, we propose an appropriate execution model, which restricts application behaviour, and which has been implemented on the KALRAY MPPA-256. We tested the correctness of the approach through a series of benchmarks and the implementation of a case study.
For the last decades, industries in the safety-critical domain have been using Commercial Off-The-Shelf (COTS) architectures despite their inherent runtime variability. To guarantee hard real-time constraints in such systems, designers have massively relied on resource over-provisioning and on disabling the features responsible for runtime variability.
HAL (Le Centre pour la Communication Scientifique Directe), 2018
We introduce a unified WCET analysis and scheduling framework for real-time applications deployed on multicore architectures. Our method does not follow a particular programming model, meaning that any piece of existing code (in particular legacy code) can be re-used, and it aims at automatically reducing the worst-case number of timing interferences between tasks. Our method is based on the notion of Time Interest Points (TIPs), which are instructions that can generate and/or suffer from timing interferences. We show how such points can be extracted from the binary code of applications and selected prior to performing the WCET analysis. We then represent real-time tasks as sequences of time intervals separated by TIPs, and schedule those tasks so that the overall makespan (including the potential timing penalties incurred by interferences) is minimized. This scheduling phase is performed using an Integer Linear Programming (ILP) solver. Preliminary experiments on state-of-the-art benchmarks show promising results and pave the way for future extensions of the model and optimizations.
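As a rough illustration of the scheduling step, the sketch below formulates a toy makespan-minimization ILP with the third-party pulp package: each task is reduced to a compute/memory-access/compute sequence, and the memory accesses (standing in for TIPs) are ordered on the shared resource with classic big-M disjunctive constraints. This serializes the conflicting points rather than pricing interference penalties as the paper does; all task parameters are made up.

```python
# Hypothetical, simplified sketch of ILP-based makespan minimization.
# Requires the third-party `pulp` package (pip install pulp).
from pulp import LpProblem, LpMinimize, LpVariable, value

# (compute_before, memory_phase, compute_after), one task per core -- invented.
tasks = [(4.0, 3.0, 5.0), (2.0, 3.0, 6.0), (1.0, 2.0, 8.0)]
M = 1000.0  # big-M constant for the disjunctive (ordering) constraints

prob = LpProblem("tip_makespan", LpMinimize)
start_mem = [LpVariable(f"s{i}", lowBound=c1) for i, (c1, _, _) in enumerate(tasks)]
makespan = LpVariable("makespan", lowBound=0)
prob += makespan  # objective: minimize the overall makespan

for i, (c1, m, c2) in enumerate(tasks):
    prob += makespan >= start_mem[i] + m + c2          # task completion time
    for j in range(i + 1, len(tasks)):
        mj = tasks[j][1]
        order = LpVariable(f"y_{i}_{j}", cat="Binary")  # 1: i's access precedes j's
        prob += start_mem[i] + m <= start_mem[j] + M * (1 - order)
        prob += start_mem[j] + mj <= start_mem[i] + M * order

prob.solve()
for i, s in enumerate(start_mem):
    print(f"task {i}: memory access scheduled at t={value(s):.1f}")
print(f"makespan = {value(makespan):.1f}")
```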
2017 IEEE Real-Time Systems Symposium (RTSS), 2017
In this paper we outline a Design Space Exploration methodology aimed at homogeneous multicore architectures, where safety-criticality is the crux of the system design. Multicore architectures provide better computational abilities, but at the same time complicate the computation of timing bounds. Determining suitable architectures that achieve the timing requirements is an important aspect for a system designer. The proposed work conceptualizes ways to automate and explore different design facets of a multicore processor. The intention is to ensure that the particular application meets its deadlines, while optimizing other objectives such as minimizing hardware cost, energy consumption and floor area. The automated exploration builds upon Multicore Response Time Analysis for timing verification and Multicube for heuristic search methods. The aim is to generate an architecture design that can be used directly to build a custom application-specific processor.
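A minimal sketch of the idea, assuming a classical fixed-priority response-time analysis in place of the paper's Multicore Response Time Analysis, and a brute-force sweep over core counts in place of Multicube's heuristic search: WCETs are inflated by an assumed per-core interference factor, tasks are partitioned round-robin, and the cheapest schedulable configuration is reported. All numbers are illustrative.

```python
import math

def response_time(task, higher_prio):
    """Classical fixed-priority response-time analysis:
       R = C + sum_{j in hp} ceil(R / T_j) * C_j, iterated to a fixed point."""
    C, T, D = task
    R = C
    while True:
        new_R = C + sum(math.ceil(R / Tj) * Cj for (Cj, Tj, _) in higher_prio)
        if new_R > D:
            return None              # deadline miss
        if new_R == R:
            return R
        R = new_R

def schedulable(taskset):
    """Deadline-monotonic priorities; tasks are (C, T, D) tuples."""
    ts = sorted(taskset, key=lambda t: t[2])
    return all(response_time(ts[i], ts[:i]) is not None for i in range(len(ts)))

def explore(tasks, max_cores=4, inflation_per_core=0.15, core_cost=1.0):
    """Return (core_count, cost) of the cheapest schedulable configuration."""
    for n in range(1, max_cores + 1):
        factor = 1.0 + inflation_per_core * (n - 1)   # assumed interference model
        parts = [[] for _ in range(n)]
        for i, (C, T, D) in enumerate(tasks):         # naive round-robin partitioning
            parts[i % n].append((C * factor, T, D))
        if all(schedulable(p) for p in parts):
            return n, n * core_cost
    return None

# Invented task set: (WCET, period, deadline)
tasks = [(6, 10, 10), (5, 20, 20), (10, 40, 40), (8, 80, 80), (6, 50, 50)]
print(explore(tasks))   # -> cheapest core count for which all deadlines are met
```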
In modern safety-related application domains, the shift from unicore to multicore processors is becoming inevitable to keep pace with the growing importance of computational capacity and to satisfy the functional-consolidation trend while decreasing energy consumption and thermal hotspots. Nevertheless, typical multicore processors are mostly intended to enhance system performance, whereas safety-critical systems (SCS) have very different demands in terms of safety, reliability, quality of service, predictability and timing correctness. The move towards multicore processors therefore imposes many significant challenges that the computing industry has to tackle. These challenges concern the design of certifiable multicore architectures, the organization of shared resources, and the integration of concurrent software, and they are encountered at all phases of the specification, design, development, testing, and certification processes. Both multicore industrialists and the real-time community thus have to fill the gap to meet the requirements imposed by SCS. The objective of this paper is to initiate such a discussion as an effort to fill the gap between the two domains and to substantially increase the cognizance of the obstacles and issues that need to be handled in the safety-critical domain.
2020 9th Mediterranean Conference on Embedded Computing (MECO), 2020
Multicore processors provide great average-case performance. However, the use of multicore processors for safety-critical applications can lead to catastrophic consequences because of contention on shared resources. The problem has been well studied in the literature, and solutions such as partitioning of shared resources have been proposed. Strict partitioning of memory resources among cores, however, does not allow inter-core communication. In this paper, we propose a Communication Core Model (CCM) that implements inter-core communication while bounding the amount of inter-core interference in a partitioned multicore system. A system-level perspective of how to realize such a CCM, along with the implementation details, is provided. We compare our proposed CCM with the Contention-based Communication (CBC) model, where no private banking is enforced for any core. For evaluation, we consider the San Diego Vision Benchmark Suite (SD-VBS). The results of the evaluation show that the CCM offers a 56 percent improvement in worst-case execution time (WCET) compared with CBC.
ACM Transactions on Architecture and Code Optimization, 2012
Commercial Off-The-Shelf (COTS) processors are now commonly used in real-time embedded systems. The characteristics of these processors fulfill system requirements in terms of time-to-market, low cost, and high performance-per-watt ratio. However, multithreaded (MT) processors are still not widely used in real-time systems because their timing analysis is too complex. In MT processors, simultaneously running tasks share and compete for processor resources, so the timing analysis has to estimate the possible impact that inter-task interference has on the execution time of the applications.
Journal of Systems Architecture, 2021
Multicore processors provide great average-case performance. However, the use of multicore processors for safety-critical applications can lead to catastrophic consequences because of contention on shared resources. The problem has been well studied in the literature, and solutions such as partitioning of shared resources have been proposed. Strict partitioning of memory resources among cores, however, does not allow inter-core communication. This paper proposes a Communication Core Model (CCM) that implements inter-core communication while bounding the amount of inter-core interference in a partitioned multicore system. A system-level perspective of how to realize such a CCM, along with the implementation details, is provided. A formula to derive the WCET of tasks using the CCM is provided. We compare our proposed CCM with Contention-based Communication (CBC), where no private banking is enforced for any core. The analytical results using the San Diego Vision Benchmark Suite (SD-VBS) for the two models indicate that the CCM shows an improvement of up to 65 percent compared to the CBC. Moreover, our experimental results indicate that the measured WCET using SD-VBS is within the bounds calculated using the proposed analysis.
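The paper's formula is not reproduced here; the sketch below only illustrates the kind of comparison involved, under the generic assumption that with a communication core only explicit communication requests can suffer inter-core delay, while under contention-based communication every shared-memory access can be delayed by each interfering core. All parameters (request counts, latencies, the isolated execution time) are made up.

```python
# Hedged illustration of bounded vs. contention-based communication WCET bounds.

def wcet_ccm(c_isolated, n_comm_requests, d_comm_worst):
    """Bound when only explicit communication requests leave the core's
    private memory partition and each is served within d_comm_worst."""
    return c_isolated + n_comm_requests * d_comm_worst

def wcet_cbc(c_isolated, n_shared_accesses, d_access, interfering_cores):
    """Bound when every shared-memory access can be delayed once by each
    interfering core (a simple round-robin style assumption)."""
    return c_isolated + n_shared_accesses * d_access * interfering_cores

if __name__ == "__main__":
    c_iso = 5_000.0      # isolated execution time (us), invented
    ccm = wcet_ccm(c_iso, n_comm_requests=200, d_comm_worst=2.0)
    cbc = wcet_cbc(c_iso, n_shared_accesses=50_000, d_access=0.06,
                   interfering_cores=3)
    print(f"CCM bound: {ccm:.0f} us, CBC bound: {cbc:.0f} us, "
          f"improvement: {100 * (cbc - ccm) / cbc:.0f}%")
```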
IEEE Access, 2020
Multicore processors are gaining popularity in various domains because of their potential for maximizing system throughput of average-case tasks. In real-time systems, where processes and tasks are governed by stringent temporal constraints, the worst-case timings must be considered, and migration to multicore processors leads to additional difficulties. Resource sharing between the cores introduces timing overheads, which affect the worst-case timings and the schedulability of the entire system. In this article, we provide new insights into the performance of the real-time extensions of Linux, namely Xenomai and RT-Preempt, on a homogeneous multicore processor. First, complete details on leveraging both real-time extensions are presented. We identify various multicore deployments and discuss their trade-offs, as established through the experimental evaluation of scheduling latency. Then, we propose a statistical method based on a variation of the chi-square test to determine the best multicore deployment. The unexpected effects of interfering loads, such as CPU, memory, and network operations, on real-time performance are also considered. The feasibility of the best multicore deployment is verified through the analysis of its periodicity and deterministic response times in a pre-emptive multitasking environment. This research is the first of its kind and will serve as a useful guideline for developing real-time applications on multicore processors.
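The exact variation of the chi-square test used in the article is not detailed here; the following sketch applies a plain chi-square test of homogeneity to binned scheduling-latency samples from two hypothetical deployments, which is the general shape of such a comparison. The latency distributions and bin edges are made up.

```python
import random

def chi_square_homogeneity(sample_a, sample_b, bin_edges):
    """Bin two latency samples and compute the chi-square statistic of the
    resulting 2 x k contingency table (test of homogeneity)."""
    def histogram(sample):
        counts = [0] * (len(bin_edges) - 1)
        for x in sample:
            for k in range(len(bin_edges) - 1):
                if bin_edges[k] <= x < bin_edges[k + 1]:
                    counts[k] += 1
                    break
        return counts

    obs = [histogram(sample_a), histogram(sample_b)]
    row_tot = [sum(r) for r in obs]
    col_tot = [sum(obs[r][c] for r in range(2)) for c in range(len(obs[0]))]
    total = sum(row_tot)
    stat = 0.0
    for r in range(2):
        for c in range(len(obs[0])):
            expected = row_tot[r] * col_tot[c] / total
            if expected > 0:
                stat += (obs[r][c] - expected) ** 2 / expected
    return stat

random.seed(0)
# Invented scheduling-latency samples (microseconds) for two deployments.
deployment_a = [random.gauss(55, 5) for _ in range(2000)]
deployment_b = [random.gauss(70, 15) for _ in range(2000)]
edges = [0, 50, 60, 80, 1000]
stat = chi_square_homogeneity(deployment_a, deployment_b, edges)
print(f"chi-square statistic = {stat:.1f} "
      f"(critical value at alpha=0.05, dof=3: 7.815)")
```

A statistic far above the critical value indicates that the two deployments produce significantly different latency distributions, which is the basis for ranking them.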
2015 27th Euromicro Conference on Real-Time Systems, 2015
Multi-core platforms represent the answer of the industry to the increasing demand for computational capabilities. From a real-time perspective, however, the inherent sharing of resources, such as memory subsystem and I/O channels, creates inter-core timing interference among critical tasks and applications deployed on different cores. As a result, modular per-core certification cannot be performed, meaning that: (1) current industrial engineering processes cannot be reused; (2) software developed and certified for single-core chips cannot be deployed on multi-core platforms as is. In this work, we propose the Single Core Equivalence (SCE) technology: a framework of OS-level techniques designed for commercial (COTS) architectures that exports a set of equivalent single-core virtual machines from a multi-core platform. This allows per-core schedulability results to be calculated in isolation and to hold when multiple cores of the system run in parallel. Thus, SCE allows each core of a multi-core chip to be considered as a conventional single-core chip, ultimately enabling industry to reuse existing software, schedulability analysis methodologies and engineering processes.
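One representative OS-level mechanism in this style of framework is per-core memory-bandwidth regulation. The sketch below is a toy discrete-time simulation of the idea, assuming each core receives a fixed budget of memory transactions per regulation period and is throttled once the budget is exhausted; the period, budget, and demand figures are invented and the time model is deliberately simplistic.

```python
# Toy discrete-time sketch of per-core memory-bandwidth regulation:
# a core may issue memory transactions only while its periodic budget lasts.

def simulate(demand_per_tick, budget_per_period, period_ticks, total_ticks):
    """Return (served transactions, throttled ticks) for one core."""
    served = stalled = 0
    budget = budget_per_period
    for tick in range(total_ticks):
        if tick % period_ticks == 0:
            budget = budget_per_period        # periodic replenishment
        if demand_per_tick <= budget:
            budget -= demand_per_tick
            served += demand_per_tick
        else:
            stalled += 1                      # core is throttled this tick
    return served, stalled

for core, demand in enumerate([1, 2, 4, 8]):  # heterogeneous memory demand
    served, stalled = simulate(demand_per_tick=demand, budget_per_period=40,
                               period_ticks=20, total_ticks=1000)
    print(f"core {core}: served {served} transactions, throttled {stalled} ticks")
```

Because each core's memory traffic is capped per period, the interference any core can inject into the others is bounded, which is what allows per-core schedulability results to hold when all cores run in parallel.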
2014
The real-time systems community has over the years devoted considerable attention to the impact on execution timing that arises from contention on access to hardware shared resources. The relevance of this problem has been accentuated with the arrival of multicore processors. From the state of the art on the subject, there appears to be considerable diversity in the understanding of the problem and in the "approach" to solve it. This fragmentation makes it difficult for any reader to form a coherent picture of the problem and solution space. This paper draws a tentative taxonomy in which each known approach to the problem can be categorised based on its specific goals and assumptions.
2015
The use of multicore processors in general-purpose real-time embedded systems has experienced a huge increase in the recent years. Unfortunately, critical applications are not benefiting from this type of processors as one could expect. The major obstacle is that we may not predict and provide any guarantee on real-time properties of software running on such platforms. The shared memory bus is among the most critical resources, which severely degrades the timing predictability of multicore software due to the access contention between cores. To counteract this problem, we present in this paper a new approach that supports mixed-criticality workload execution in a multicore processor-based embedded system. It allows any number of cores to run less-critical tasks concurrently with the critical core, which is running the critical task. The approach is based on the use of a dedicated Deadline Enforcement Checker (DEC) implemented in hardware, which allows the execution of any number of ...
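The sketch below is a hypothetical software model of the enforcement idea (the paper realizes it in hardware): while enough slack remains, the less-critical cores keep running and slow the critical task down; once the time left to the deadline equals the critical task's remaining worst-case isolated demand, the checker suspends the other cores. Tick counts and the slowdown factor are invented.

```python
# Hypothetical tick-based model of a deadline-enforcement check.

def run_with_dec(wcet_isolated, deadline, slowdown):
    """One unit of critical work takes `slowdown` ticks while the less-critical
    cores run, and 1 tick once they are suspended."""
    progress, t = 0, 0
    interference_on, suspended_at = True, None
    while progress < wcet_isolated:
        remaining = wcet_isolated - progress
        if interference_on and deadline - t <= remaining:
            interference_on = False      # checker trips: suspend other cores
            suspended_at = t
        t += slowdown if interference_on else 1
        progress += 1
    return t, suspended_at

def run_without_dec(wcet_isolated, slowdown):
    return wcet_isolated * slowdown      # interference never suspended

finish, suspended_at = run_with_dec(wcet_isolated=10, deadline=16, slowdown=2)
print(f"with DEC: finish at t={finish} (deadline 16), "
      f"non-critical cores suspended at t={suspended_at}")
print(f"without DEC: finish at t={run_without_dec(10, 2)} -> deadline miss")
```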
Multicore technology has been heralded as one of the course-changing computing technologies, providing new levels of energy-efficient performance enabled by advanced parallel processing and miniaturization techniques. This is evident from the fact that every leading chip designer has a multicore processor as part of its product offerings, and also by witnessing the proliferation of this technology across the entire range of embedded devices. Real-time embedded systems are no exception to this trend. By definition, a key requirement for real-time embedded systems is to be able to deliver their functional behaviour within specific time bounds. However, while the computational capabilities of multicores are indisputable, they must be assessed for their predictability before employing them to host real-time applications which have strict timing requirements. While the study of timing analysis for uniprocessors is in its mature stages, given the decades of research dedicated to it, timing analysis in the domain of multicores is still in its nascent stages.

The broader focus of this thesis is to address the timing analysis challenge in multicores: specifically, determining the impact of shared resources like the shared bus (or NoCs in many-core systems) on the execution time of real-time tasks deployed on these multicores. To elaborate, in typical implementations of multicore systems, multiple cores access the main memory via a shared channel (like the front-side bus). This often leads to contention on this shared channel, which results in an increase of the execution time and the response time of the tasks. Computing upper bounds on these timing parameters is a vital prerequisite for the deployment of real-time tasks on these multicores and is a relatively new area of research. The work in this thesis aims at meeting this objective of providing and validating methods for the timing analysis of applications executed on multicore and many-core platforms which inherently do not guarantee predictability. The main contributions include proposing a model to derive the memory profile of tasks and the memory request profile of a core for a given time interval. This is extended further to propose a general framework to model the availability of the shared bus, using the memory profile of the analyzed task at finer granularity, and to be able to deal with different bus arbitration mechanisms. This work has also been extended to the realm of many-core systems, by proposing a method to derive the worst-case traversal time for a mesh-based interconnect network. The thesis also delves into memory controller analysis and, as an interesting case study, provides a temporal analysis of phase-change-memory-based multicore systems, which, unlike DRAM-based systems, have noticeably different read and write latencies.
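As a small illustration of profile-based bus analysis (not the thesis' model), the sketch below bounds the stall a task's memory requests can suffer in an interval under an assumed work-conserving round-robin arbiter: each request waits for at most one request of every other core, and no core can block more often than it actually issues requests in that interval. Request counts and the slot latency are invented.

```python
# Sketch of a request-profile based bus-interference bound under an assumed
# round-robin arbiter serving one request per slot. All figures are invented.

def bus_delay_bound(own_requests, other_core_requests, slot_latency):
    """Extra stall suffered by `own_requests` memory requests in an interval.
    Each other core can block at most once per own request, and never more
    often than the number of requests it issues in the interval."""
    blocked = sum(min(own_requests, other) for other in other_core_requests)
    return blocked * slot_latency

own = 120                   # requests issued by the analysed task (assumed profile)
others = [300, 40, 15]      # request profiles of the other three cores (assumed)
print(f"bus interference bound: "
      f"{bus_delay_bound(own, others, slot_latency=0.04):.2f} us")
```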
2020
Multi-task/parallel software design methods for critical embedded systems often enforce space and/or time isolation properties, which constrain resource sharing to facilitate the design process. For instance, ensuring that computing cores only interfere with each other during dedicated communication phases largely simplifies the timing analysis of parallel code. The downside of isolation is efficiency loss – longer latencies, smaller throughput, increased memory use. We focus on hard real-time applications (or parts thereof) inside which isolation is not a requirement. We show that, in this case, fine-grain parallelism can be more efficiently exploited without isolation, while still providing the levels of safety and hard real-time guarantees required in critical industrial applications. Resulting implementations allow multiple computations and communications to take place at the same time, provided that interferences can be controlled (which is possible on timing compositional plat...
Altreonic is pleased to release a new publication in its Gödel Series, entitled "QoS and Real Time Requirements for Embedded Many- and Multicore Systems". While the first part is mainly a short summary of real-time scheduling, mostly Rate Monotonic Scheduling and Priority Inheritance support, it already establishes the jump to distributed real-time scheduling as supported in OpenComRTOS. The second part takes a closer look at modern advanced many/multi-core architectures, interrupt latency and inter-core communication measurements, and argues that the sharing of on-chip resources, including the caches, makes it very hard to predict the temporal properties of an application. Rather than rejecting such advanced architectures, the argument is made to adapt the programming model to be able to handle the stochastic spread rather than trying to control it, even if a good design will try to minimise the spread. Lastly, the bridge is made from Real-Time scheduling ...
2017 12th IEEE International Symposium on Industrial Embedded Systems (SIES), 2017
The use of Commercial Off-The-Shelf (COTS) multicores in the real-time industry is on the rise due to multicores' potential performance increase and energy reduction. Yet, the unpredictable impact on timing of contention in shared hardware resources challenges certification. Furthermore, most safety certification standards target single-core architectures and do not provide explicit guidance for multicore processors. Recently, however, CAST-32A has been presented, providing guidance for software planning, development and verification in multicores. In this paper, at the theoretical level, we provide a detailed review of the CAST-32A objectives and the difficulty of reaching them under current COTS multicore design trends; at the experimental level, we assess the difficulties of applying CAST-32A to a real multicore processor, the NXP P4080.
In order to get accurate performance predictions, design-time architectural analysis of multicore embedded systems has to consider communication overhead. When communicating tasks execute on the same core, the communication typically happens through the local cache. On the other hand, when they run on separate cores, the communication has to go through the shared memory. As the shared memory has a significantly larger latency than the local cache, we expect a significant difference between intra-core and inter-core task communication. In this paper, we present a series of experiments we ran to identify the size of this difference, and discuss its impact on architectural analysis of multicore embedded systems. In particular, we show that the impact of the difference is much lower than anticipated.
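A back-of-the-envelope model of why a large gap might be expected, with invented latency and bandwidth figures: intra-core communication is charged at local-cache cost and inter-core communication at shared-memory cost, and the transfer time for a given message size is compared. The paper's point is that the measured difference turned out much smaller than such naive models suggest.

```python
# Naive analytical model (invented figures): intra-core communication through
# a local cache vs. inter-core communication through shared memory.

def transfer_time_us(msg_bytes, latency_us, bandwidth_mb_s):
    """Fixed access latency plus size-proportional transfer time."""
    return latency_us + (msg_bytes / (bandwidth_mb_s * 1e6)) * 1e6

CACHE = {"latency_us": 0.01, "bandwidth_mb_s": 20000}   # local cache (assumed)
SHARED = {"latency_us": 0.10, "bandwidth_mb_s": 4000}   # shared memory (assumed)

for size in (64, 1024, 65536):
    intra = transfer_time_us(size, **CACHE)
    inter = transfer_time_us(size, **SHARED)
    print(f"{size:6d} B: intra-core {intra:8.3f} us, inter-core {inter:8.3f} us, "
          f"ratio {inter / intra:4.1f}x")
```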