Sharon Hu

Papers by Sharon Hu

Meshed Bluetree: Time-Predictable Multimemory Interconnect for Multicore Architectures

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020

Fixed-priority scheduling and controller co-design for time-sensitive networks

Proceedings of the 39th International Conference on Computer-Aided Design, 2020

Time-sensitive networking (TSN) is a set of standardised communication protocols developed under the IEEE 802.1 working group. TSN aims to support deterministic communication based on network schedules that are distributively configured. It is widely considered the future in-vehicle network solution for highly automated driving, where timing guarantees are required alongside high communication bandwidth. In this work, we study a setting of periodic control and non-control packets, with implicit and arbitrary deadlines, respectively. As the FIFO (first-in, first-out) queues in the 802.1Qbv switch incur long delay in the worst case, which prevents the control tasks from achieving short sampling periods and thus impedes control performance optimisation, we propose the first fixed-priority scheduling (FPS) approach for TSN by leveraging its gate control features. In this context, we develop a finer-grained frame-level response time analysis, which provides a tighter bound than the conventional packet-level analysis. Building upon FPS and the above analysis, we formulate a co-design optimisation problem to decide the sampling periods and poles of real-time controllers with settling time as the objective to minimise, whilst satisfying the schedulability constraint. CCS Concepts: Computer systems organisation → Embedded and cyber-physical systems; Real-time systems. Networks → Network protocols.
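As a point of reference for the response time analysis mentioned in the abstract, the sketch below shows the classical task-level recurrence for preemptive fixed-priority scheduling, R = C_i + sum over higher-priority j of ceil(R / T_j) * C_j. This is the textbook analysis, not the frame-level TSN analysis the paper develops; the function name and the (C, T) task representation are assumptions made for illustration.

```python
def response_time(tasks, i, max_iterations=10_000):
    """Worst-case response time of tasks[i], or None if it misses its deadline.

    tasks: list of (C, T) integer pairs, ordered from highest to lowest priority,
           where C is the worst-case execution time and T the period.
    The deadline is assumed implicit (equal to the period T).
    NOTE: classical preemptive task-level analysis, used here only as an
    illustration; it is not the frame-level analysis proposed in the paper.
    """
    c_i, t_i = tasks[i]
    r = c_i  # start the fixed-point iteration from the task's own cost
    for _ in range(max_iterations):
        # Interference from every higher-priority task released within [0, r).
        # -(-r // t_j) is an integer ceiling division.
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        nxt = c_i + interference
        if nxt > t_i:
            return None  # exceeds the implicit deadline: unschedulable
        if nxt == r:
            return r     # fixed point reached
        r = nxt
    return None


if __name__ == "__main__":
    task_set = [(1, 4), (2, 6), (3, 12)]
    for idx in range(len(task_set)):
        print(idx, response_time(task_set, idx))
```

A task set is schedulable under this test when every task returns a value rather than None. A tighter analysis, such as the frame-level one the abstract describes, would admit shorter sampling periods for the same network.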

FPGA-based simulation of 3D light propagation

2014 14th International Workshop on Cellular Nanoscale Networks and their Applications (CNNA), 2014

NanoMagnet logic

70th Device Research Conference, 2012

We present recent results on implementing logic using physically coupled nanomagnet arrays. The binary state of a bit is represented by the magnetization state of a single-domain nanomagnet element, and logic is accomplished through direct physical interactions between them. We refer to this approach as nanomagnet logic (NML). We have demonstrated that NML satisfies the requirements for digital logic, and offers performance advantages, primarily low power and non-volatility, as a potential post-CMOS technology.

Reports of Conferences, Institutes, and Seminars

Serials Review, 2011

Reports on two programs from the Association for Library Collections & Technical Services: “Codified Innovations: Data Standards and Their Useful Applications” and “Electronic Resources Management from the Field,” and the Southampton Workshop on UK Institutional Open Access Repositories, all held in January 2005.

Cache-aware task scheduling for maximizing control performance

2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Embedded control applications are widely implemented on small, low-cost and resource-constrained microcontrollers, e.g., in the automotive domain. Conventionally, control algorithms are designed using model-based approaches, without considering the details of the implementation platform. This leads to inefficient utilization of the resources. With the emergence of cyber-physical system (CPS)-oriented thinking, there has lately been a strong interest in co-design of control algorithms and their implementation platforms. Some recent efforts have shown that a schedule for multiple applications with more on-chip cache reuse is able to improve the control performance. However, it has not been studied how the control performance can be maximized for a given schedule and how an optimal schedule can be computed. In this work, we propose a two-stage framework to compute the schedule maximizing the overall control performance of all the applications. First, a holistic controller design taking all the sampling periods and sensing-to-actuation delays in a schedule into account is presented, aiming to maximize the overall control performance. Second, a hybrid search algorithm for discrete decision space is reported to efficiently compute an optimal schedule. Experimental results on a case study with multiple automotive applications show that a significant improvement of 10-20% in control performance can be achieved by the proposed cache-aware scheduling approach.

A Spin-Orbit Torque based Cellular Neural Network (CNN) Architecture

Proceedings of the on Great Lakes Symposium on VLSI 2017, 2017

In this paper, we propose a differential Spin Hall Effect (SHE) assisted domain wall synapse, which can generate either positive or negative synaptic weighting values without the significant cost of multiple power supply voltages, supply rails, or computationally intensive digital hardware. The architecture of the proposed synapse utilizes reading currents flowing through two oppositely oriented devices as weighted by device conductance. The conductance is used to encode synaptic weight and is programmed by domain wall position through a writing current. The ability to set the current as positively or negatively weighted results in highly configurable functionality within a compact synapse design. The synapses are used with a soft-limiting nonlinear neuron to employ the relationship between positions and input current magnitude. We show through micro-magnetic simulation how the non-volatile physical characteristic of the domain wall calibrated synapse is used to implement a numerical integration function to realize a Cellular Neural Network (CNN). The performance of the proposed CNN design for isolated letter denoising at 0 ns to 4 ns demonstrates noise filtering functionality with total energy consumption during sensing of 24 fJ. This compares favorably to existing spin CNN cell designs to provide a promising design approach for intrinsic neural computation.

2004 2nd workshop on embedded systems for real-time multimedia (ESTIMedia 2004)

Evolutionary Codesign

Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018

With pervasive applications of medical imaging in health care, biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. Since manual annotation suffers from limited reproducibility, arduous effort, and excessive time, automatic segmentation is desired to process increasingly larger-scale histopathological data. Recently, deep neural networks (DNNs), particularly fully convolutional networks (FCNs), have been widely applied to biomedical image segmentation, attaining much improved performance. At the same time, quantization of DNNs has become an active research topic, which aims to represent weights with less memory (precision) to considerably reduce memory and computation requirements of DNNs while maintaining acceptable accuracy. In this paper, we apply quantization techniques to FCNs for accurate biomedical image segmentation. Unlike existing literature on quantization, which primarily targets memory and computation complexity reduction, we apply quantization as a method to reduce overfitting in FCNs for better accuracy. Specifically, we focus on a state-of-the-art segmentation framework, suggestive annotation [26], which judiciously extracts representative annotation samples from the original training dataset, obtaining an effective small-sized balanced training dataset. We develop two new quantization processes for this framework: (1) suggestive annotation with quantization for highly representative training samples, and (2) network training with quantization for high accuracy. Extensive experiments on the MICCAI Gland dataset show that both quantization processes can improve the segmentation performance, and our proposed method exceeds the current state-of-the-art performance by up to 1%. In addition, our method has a reduction of up to 6.4x on memory usage.
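To make the idea of representing weights with less precision concrete, the sketch below applies plain uniform quantization to a list of weights. The abstract does not specify the quantization scheme used in the paper, so this is a generic illustration only; the function name `quantize_uniform` and the choice of quantizing over the weights' own min-max range are assumptions.

```python
def quantize_uniform(weights, bits):
    """Uniformly quantize floats to 2**bits levels spanning their own range.

    Returns (codes, dequantized): integer codes in [0, 2**bits - 1] and the
    reconstructed float values. This is a generic scheme chosen for
    illustration; the paper's actual quantization method is not given in
    the abstract.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1          # number of steps between lo and hi
    if hi == lo:                      # constant input: a single level suffices
        return [0] * len(weights), [lo] * len(weights)
    step = (hi - lo) / levels
    codes = [round((w - lo) / step) for w in weights]
    dequantized = [lo + c * step for c in codes]
    return codes, dequantized


if __name__ == "__main__":
    codes, approx = quantize_uniform([0.0, 0.1, 0.4, 0.9, 1.0], bits=2)
    print(codes)
    print(approx)
```

Each weight is then stored as a `bits`-wide integer code plus the shared `lo` and `step` values, and the reconstruction error per weight is at most half a step.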

Performance and Energy Implications for Heterogeneous Computing Systems: A MiniFE Case Study

iCETD: An improved tag generation design for memory data authentication in embedded processor systems

Integration, 2017

Security is becoming increasingly important in computing systems, and data integrity is of utmost importance. One way to protect data integrity is to attach an identifying tag to each data item. The authenticity of the data can then be checked against its tag. If the data is altered by an adversary, the related tag becomes invalid and the attack will be detected. The work presented in this paper studies an existing tag design (CETD) for authenticating memory data in embedded processor systems, where data stored in memory or transferred over the bus can be tampered with. Compared to other designs, this design offers the flexibility of trading off between the implementation cost and tag size (hence the level of security); the design is cost effective and can counter the data integrity attack with random values (namely, the fake values used to replace the valid data in the attack are random). However, we find that the design is vulnerable when the fake data is not randomly selected. For some data, the tags are not distributed over the full tag value space but rather limited to a much reduced set of values. When those values are chosen as the fake value, the data alteration would likely go undetected. In this article, we analytically investigate this problem and propose a low-cost enhancement to ensure the full-range distribution of tag values for each data item, hence effectively removing the vulnerability of the original design.
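The tag-and-check idea described above can be illustrated with a generic keyed tag. The sketch below uses a truncated HMAC-SHA256 from the Python standard library; it is not the CETD design or its enhancement, whose internals the abstract does not give. The helper names, the 8-byte address encoding, and the `tag_bytes` parameter are assumptions made for illustration.

```python
import hashlib
import hmac


def make_tag(key: bytes, address: int, data: bytes, tag_bytes: int = 4) -> bytes:
    """Compute a truncated keyed tag over a memory address and its data.

    NOTE: truncated HMAC-SHA256 is a stand-in chosen for illustration; it is
    not the CETD tag generator discussed in the paper. A smaller tag_bytes
    costs less storage but makes a forged value more likely to pass.
    """
    message = address.to_bytes(8, "big") + data
    return hmac.new(key, message, hashlib.sha256).digest()[:tag_bytes]


def verify(key: bytes, address: int, data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = make_tag(key, address, data, len(tag))
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"k" * 16
    tag = make_tag(key, 0x1000, b"\x01\x02\x03\x04", tag_bytes=16)
    print(verify(key, 0x1000, b"\x01\x02\x03\x04", tag))  # genuine data
    print(verify(key, 0x1000, b"\xff\x02\x03\x04", tag))  # tampered data
```

The vulnerability the abstract describes corresponds to a tag function whose outputs for a given data item cover only part of the tag space, which lets an attacker pick substitute values that are unusually likely to pass verification.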

Welcome to the Second Workshop on Embedded Systems for Real-Time Multimedia!

2nd Workshop on Embedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.

On-Chip Clocking Scheme for Nanomagnet QCA

2007 65th Annual Device Research Conference, 2007

Quantum-dot Cellular Automata (QCA) has been demonstrated using aluminum tunnel junction single-electron transistor technology at mK temperatures [1], and molecular QCA is under development for operation at room temperature (RT) [2]. All of the basic building blocks needed for QCA have been experimentally demonstrated. Our work on nanomagnet-based QCA (NMQCA) holds the most promise for achieving viable RT operation in the near term [3]. One requirement of the QCA architecture is low-power clock structures, which is the subject of this paper. Experiments have shown that a series of nanomagnets placed side by side, with small gaps between them, can be used to carry digital data. The long axis is the "easy" axis of magnetization (i.e. low coercivity), and in a ground state, the magnets polarize in either up or down directions, which can be used to represent digital logic values. Fig. 1 shows such an arrangement where an external magnetic clocking field is applied along the "hard" (shorter, and high coercivity) axes. When the field is increased, polarization is aligned along the hard axes of the nanomagnets, forcing the line of magnets to a "null" state, and when it is released, the line relaxes adiabatically into the ground state set by the end magnet whose coercivity is made higher than the wire magnets via a narrower geometry. Alternatively, an electrical input structure, or some other method can be used. Previously, we have employed external magnetic fields to polarize magnets along their hard axes. Although an external field was used successfully for magnetic bubble technology, it is not a viable scheme for multiple clock phases at high frequencies. Consequently, we have designed and simulated an on-chip clocking scheme that should be able to create a confined, high-speed, and sufficiently strong nulling field.
One clocking structure has been simulated using the Maxwell® 2D electromagnetic simulator. A cross-sectional view of a line, or "wire," of nanomagnets along with the clocking structure is shown in Fig. 2. The magnetic fields generated due to currents in the clocking wires are strengthened in the nanomagnets by a ferrite yoke. The dimensions of the nanomagnets are 60 nm × 90 nm × 20 nm, with 15 nm gaps between them. As we have used a 2D simulator, each nanomagnet in Fig. 2 is 60 nm in the x direction and 20 nm in the y direction. Thus, both x and y directions are hard axes for the nanomagnets. Digital data is propagated through the QCA wire using multiphase clocking. First, assume that there is a fixed input (up or down state of a hard nanomagnet) at the left side of the structure. During the first phase of the clock, both current-carrying wires are excited. As a result, all the nanomagnets experience a high magnetic field in the x direction (Fig. 3) that polarizes the magnets in the x direction, nulling the state of the magnets. The field in the y direction (B_y) is small compared to B_x and, hence, is not shown. In the second clock phase, the right wire and the wire to its right (not shown) are excited, while the left one is de-excited. Now, only the nanomagnets on the right side experience a high magnetic field in the x direction (Fig. 4), and are nulled. The left nanomagnets relax to the ground state, and the state (up or down) of the individual nanomagnets is determined by the input on the left. Thus, the data is propagated from left to right up to the first half of the structure. In the next clocking phases, wires to the right are excited sequentially two at a time, and the data propagates from left to right. A four-phase clock is sufficient to facilitate data propagation. Micromagnetic simulations using OOMMF demonstrate that we should be able to clock the nanomagnets to perform logic functions. For example, Fig. 5 illustrates a short QCA wire segment in three different states. Fig. 5a shows a 3-cell wire segment after a 50 mT clock field is applied to the wire. Fig. 5b illustrates the same wire segment with the polarization of the first cell changed. A 15 mT field is applied along the x-axis of Fig. 5b (analogous to the middle row of Fig. 1) and is then reduced to 0 mT. The result is shown in Fig. 5c. The rest of the wire has changed in accordance with the new input. Fields as small as 3-5 mT have been shown to drive longer lines of similarly sized magnets to the correct ground state. The combined power consumption of the clock circuit and the nanomagnet switching power is very small, in the range of 1.5 µW per clock wire. The clocking circuit proposed in this paper should be applicable to NMQCA majority gate logic as well. Thus, this technology has the potential to allow for non-volatile, ultra-low power digital logic with minimal CMOS support.

SEA: fast power estimation for micro-architectures

2003 5th International Conference on ASIC Proceedings (IEEE Cat No 03TH8690) ICASIC-03, 2003

Various approaches for micro-architectural power/energy estimation have been introduced, mainly driven by the need to obtain fast power/energy estimates during early phases of complex SOC designs. In contrast to previous approaches, we study power/energy estimation for highly optimized synthesizable descriptions of microprocessor cores. Under this real-world design scenario, we found, unlike related previous research, that power can hardly be estimated closer than around 15% using an instruction-level model. However, we can estimate the energy as close as 5%. Our research has resulted in the SEA framework that estimates energy/power consumed by a software program, taking specific micro-architectural features of the underlying programmable hardware core into consideration. With this high accuracy in energy estimation, we achieve around 5 orders of magnitude faster estimations compared to state-of-the-art high-level (RTL) commercial energy/power estimation tool suites. Thus, our framework is capable of reliably estimating the energy/power consumption of future complex SOCs.

On the use of random walks to estimate correlation in fitness landscapes

Computational Statistics & Data Analysis, 1998

It is shown that fitness landscapes for constrained optimization problems are statistically anisotropic. Consequently, conducting a single, long random walk to estimate correlation in the landscape can produce incorrect results. We argue that more accurate estimates can be obtained by forming a composite picture from a set of confined random walks.
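The estimation procedure can be sketched as follows: record the fitness along a random walk, compute the autocorrelation of that series at a given lag, and combine the estimates from several walks. The abstract does not say how the walks are confined or how the composite picture is formed, so averaging the per-walk estimates is an assumption, as are all function names here.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a fitness series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) * (x - mean) for x in series) / n
    if var == 0:
        return 0.0  # a flat series carries no correlation information
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag)) / (n - lag)
    return cov / var


def random_walk_fitness(fitness, start, neighbours, steps, rng):
    """Fitness values seen along a random walk of `steps` moves from `start`."""
    point = start
    values = [fitness(point)]
    for _ in range(steps):
        point = rng.choice(neighbours(point))
        values.append(fitness(point))
    return values


def composite_autocorrelation(walks, lag):
    """Average of per-walk estimates (an assumed way to combine the walks)."""
    return sum(autocorrelation(w, lag) for w in walks) / len(walks)


if __name__ == "__main__":
    # Toy landscape: bit strings, fitness = number of ones, neighbours differ in one bit.
    def flip_neighbours(bits):
        return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]

    rng = random.Random(0)
    starts = [tuple(rng.randint(0, 1) for _ in range(16)) for _ in range(5)]
    walks = [random_walk_fitness(sum, s, flip_neighbours, 200, rng) for s in starts]
    print(composite_autocorrelation(walks, lag=1))
```

In an anisotropic landscape, the per-walk estimates would differ depending on where each walk is confined, which is why a single long walk can mislead.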

Fitness functions for multiple objective optimization problems: Combining preferences with Pareto rankings

... discussion of fitness functions for MOPs. More detailed information can be found in a number ...

RTSS 2008 Program Committee

Tarek Abdelzaher, University of Illinois at Urbana-Champaign, USA; Luis Almeida, Universidade de A...

Scaling for edge inference of deep neural networks

Nature Electronics, 2018

Performance gap. In Fig. 3a, we show the number of operations needed by the leading designs in the ImageNet classification competition against their top-five error rates. The number of operations increases exponentially from 1.4 gigaops per image (AlexNet, 2012) to 38 gigaops per image (VGG-19, 2014) as the top-five error rate drops from 16.4% to 7.32%. In 2014, GoogLeNet was developed, which uses parallel structural optimization [18,19] to concatenate multiple paths of different scales for more effective feature extraction. This innovation dramatically reduces the number of operations required with little drop in performance. However, the number of operations continued to increase exponentially as the top-five error rate further decreased to 3.08% (Inception-v4, 2016). Graphics processing units (GPUs), field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) are popular hardware platforms for accommodating networks for edge inference [19-33]. Figure 3b depicts how the performance

Preference-driven hierarchical hardware/software partitioning

Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040)

In this paper, we present a hierarchical evolutionary approach to hardware/software partitioning for real-time embedded systems. In contrast to most previous approaches, we apply a hierarchical structure and dynamically determine the granularity of tasks and hardware modules to adaptively optimize the solution while keeping the search space as small as possible. Two new search operators are described, which exploit the proposed hierarchical structure. Efficient ranking is another problem addressed in this paper. Imprecisely Specified Multiple Attribute Utility Theory has the advantage of constraining the solution space based on the designer's preference, but suffers from high computation overhead. We propose a new technique to reduce the overhead. Experimental results show that our algorithm is both effective and efficient.
