This paper presents a practical method for reducing the timing uncertainty caused by thermal noise in a ring oscillator. The methodology uses delay elements whose nonlinear behavior depends on event separation, i.e., the period between successive events. Pulse logic gates are shown to have delay-separation dynamics that can affect the statistics of subsequent events in the oscillator. The timing uncertainty is shown to improve linearly with the slope of the delay-separation characteristic. Multiple pulses in a ring are also shown to improve the timing uncertainty linearly.
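The linear improvement described above can be illustrated with a first-order toy model. This is a hedged sketch, not the paper's analysis. It assumes that each period deviation feeds back linearly into the next, e_k = m·e_{k-1} + n_k. Here the hypothetical coefficient m stands in for the linearized delay-separation slope, and n_k is white thermal noise with standard deviation σ. For −1 < m < 0, a late event is partly compensated in the next period. The long-run standard deviation of the accumulated timing error then scales as σ√N/(1 − m), so the improvement factor 1 − m grows linearly with |m|.

```python
import random
import statistics


def accumulated_jitter(slope, sigma=1.0, laps=200, trials=2000, seed=1):
    """Monte Carlo std of the accumulated timing error after `laps` periods.

    Toy model (assumption): e_k = slope * e_{k-1} + n_k, n_k ~ N(0, sigma^2).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        err, total = 0.0, 0.0
        for _ in range(laps):
            err = slope * err + rng.gauss(0.0, sigma)  # period deviation
            total += err                               # accumulated edge error
        totals.append(total)
    return statistics.pstdev(totals)


def predicted_jitter(slope, sigma=1.0, laps=200):
    """Long-run AR(1) result: std ~ sigma * sqrt(laps) / (1 - slope)."""
    return sigma * laps ** 0.5 / (1.0 - slope)


for m in (0.0, -0.5):
    print(m, round(accumulated_jitter(m), 2), round(predicted_jitter(m), 2))
```

With m = −0.5, the simulated accumulated error should come out about 1.5 times smaller than with m = 0, in line with the 1/(1 − m) prediction.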
The 2009 MEMOCODE Co-Design Contest is the third in the series of annual design contests organized by the MEMOCODE Conference. Contestants have one month to create the best-performing design solution to a posted design challenge. The contest is open to all interested participants, and the contest rules are designed neither to exclude nor to favor any one design methodology or platform. The goal of the contest is to invite developers of tools and platforms to showcase their technology in a level competition and to encourage hands-on design activities in the fields of interest of the MEMOCODE Conference. Please see http://www.memocode-conference.com for current information about this contest.
High-level Synthesis or HLS represented an ambitious attempt by the community to provide capabilities for 'algorithms to gates' for a period of almost three decades. The technical challenge in realizing this goal drew researchers from various areas ranging from parallel programming, digital signal processing, and logic synthesis to expert systems. This article takes a journey through the years of research in this domain with a narrative view of the lessons learnt and their implication for future research. As with any retrospective, it is written from a purely personal perspective of our research efforts in the domain, though we have made a reasonable attempt to document important technical developments in the history of high-level synthesis.
IEEE Transactions on Microwave Theory and Techniques, Feb 1, 2005
We have developed 27- and 40-GHz tuned amplifiers and a 52.5-GHz voltage-controlled oscillator using 0.18-µm CMOS. Line-reflect-line calibration with a microstrip-line structure consisting of metal1 and metal6 was quite effective in extracting accurate parameters for the intrinsic transistor on an Si substrate, enabling precise design. Using this technique, we obtained a 17-dB gain and 14-dBm output power at 27 GHz for the tuned amplifier. We also obtained a 7-dB gain and 10.4-dBm output power with good input and output return loss at 40 GHz. Additionally, we obtained an oscillation frequency of 52.5 GHz with a phase noise of −86 dBc/Hz at a 1-MHz offset. These results indicate that our proposed technique is suitable for CMOS millimeter-wave design.
This paper presents a low-phase-noise, low-power, wide-tuning-range, small-area pulse ring oscillator fabricated in inexpensive 130-nm CMOS technology, suitable for the wide-scale Internet of Things market. The ring uses highly nonlinear pulse gates instead of conventional inverters as buffers, substantially reducing the impulse sensitivity function (ISF) and thus the phase noise. The timing signal is rising-edge and ground referenced, allowing the supply to be used as the control voltage. Common-mode supply noise is rejected by double inversion in every stage of the pulse gate, as well as by insensitivity to pulse amplitude and width. Fabricated ring oscillators show a phase noise of −98.41 dBc/Hz at 1-MHz offset for a 1.872-GHz oscillator at 2.94-mW power consumption, and −95.14 dBc/Hz at 1-MHz offset for a 388-MHz oscillator at 216-µW power consumption. The oscillators have a tuning range of 388 MHz to 2.455 GHz.
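One common way to compare the two reported operating points is the standard oscillator figure of merit, FoM = L(Δf) − 20·log10(f0/Δf) + 10·log10(P_DC / 1 mW), where lower is better. The abstract does not quote an FoM. The sketch below simply applies this widely used definition to the numbers stated above.

```python
import math


def oscillator_fom(phase_noise_dbc_hz, f0_hz, offset_hz, power_w):
    """Standard oscillator FoM in dBc/Hz (lower is better)."""
    return (phase_noise_dbc_hz
            - 20.0 * math.log10(f0_hz / offset_hz)
            + 10.0 * math.log10(power_w / 1e-3))


# Operating points taken from the abstract above.
print(round(oscillator_fom(-98.41, 1.872e9, 1e6, 2.94e-3), 1))
print(round(oscillator_fom(-95.14, 388e6, 1e6, 216e-6), 1))
```

These evaluate to roughly −159.2 dBc/Hz for the 1.872-GHz point and −153.6 dBc/Hz for the 388-MHz point.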
This paper presents a low-phase-noise, wide-tuning-range voltage-controlled oscillator. The oscillator uses a rotary wave oscillator topology, with pulse regenerative gates used as amplifiers. Pulse gates confer a particular low-duty-cycle pulse waveform whose arrival time depends only very weakly on amplitude or pulse width. This reduces the root-mean-square value of the impulse sensitivity function (ISF), thereby reducing the phase noise. Further, the oscillator rejects common-mode supply noise by double inversion of signals in every stage. Timing signals are ground referenced, so the supply can be used as a control voltage. The fabricated 2.96-GHz oscillator in GFUS8RF (130 nm) has a phase noise of −132.2 dBc/Hz at 10-MHz offset. The oscillator has a tuning range of 2.64 GHz to 2.96 GHz, with an average phase noise of −130.71 dBc/Hz at 10-MHz offset.
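The link between the ISF and phase noise can be made concrete with the linear time-variant (Hajimiri-Lee) model. In that model, phase noise in the 1/f² region is proportional to Γ_rms², the mean-square value of the ISF, with all else held equal. Changing Γ_rms therefore shifts phase noise by 20·log10 of the ratio. The sketch compares a sinusoidal ISF with a hypothetical pulse-shaped ISF that is nonzero for only 10% of the period. Both shapes are illustrative assumptions, not measured ISFs of this oscillator.

```python
import math


def isf_rms(samples):
    """Root-mean-square value of an ISF sampled uniformly over one period."""
    return math.sqrt(sum(g * g for g in samples) / len(samples))


def phase_noise_change_db(rms_ref, rms_new):
    """Shift in 1/f^2 phase noise when Gamma_rms changes, all else equal."""
    return 20.0 * math.log10(rms_new / rms_ref)


N = 1000
sinusoidal = [math.cos(2.0 * math.pi * k / N) for k in range(N)]
# Hypothetical pulse-like ISF: sensitive only during 10% of the period.
pulse = [1.0 if k < N // 10 else 0.0 for k in range(N)]

print(round(isf_rms(sinusoidal), 3), round(isf_rms(pulse), 3))
print(round(phase_noise_change_db(isf_rms(sinusoidal), isf_rms(pulse)), 2))
```

In this toy comparison, the narrower ISF lowers Γ_rms from about 0.707 to about 0.316, which corresponds to roughly 7 dB less 1/f² phase noise.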
This paper presents a new architecture for a distributed arithmetic (DA) based Least Mean Square (LMS) adaptive filter with low hardware complexity and a short critical path. For DA-based adaptive filters, throughput depends on the critical path and on the number of clock cycles needed to produce each output. The proposed technique keeps the same number of clock cycles while using multiplexed look-up tables (LUTs), which reduces both the hardware complexity and the critical path compared with the best existing scheme. For instance, the hardware complexity can be reduced by α·N, and the critical path can be shortened by T_A + T_M. Here α is the number of reduced hardware elements and N is the number of filter taps, while T_A and T_M are the adder and multiplexer delays. Synthesis results show that, for nearly the same area and power, the proposed scheme achieves a 27.6% gain from clock speedup. This yields higher throughput and allows power to be lowered compared with the best existing scheme.
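The core idea of distributed arithmetic can be sketched as follows. This is the generic, textbook DA inner product, not the paper's multiplexed-LUT architecture. A LUT holds every partial sum of the filter weights. Each bit slice of the two's-complement inputs addresses the LUT, and the shifted LUT outputs are accumulated, with the sign-bit slice subtracted. Function names and the integer word length are illustrative assumptions. In an LMS filter the weights change with each update, so the LUT contents must be refreshed as well.

```python
def build_lut(weights):
    """LUT[addr] = sum of weights[k] for every bit k set in addr."""
    n = len(weights)
    return [sum(w for k, w in enumerate(weights) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(weights, xs, bits):
    """Compute sum(w * x) bit-serially; xs are `bits`-wide two's-complement ints."""
    lut = build_lut(weights)
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one LUT address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> j) & 1) << k
        partial = lut[addr] * (1 << j)
        # The most significant bit carries negative weight in two's complement.
        acc += -partial if j == bits - 1 else partial
    return acc


w = [3, -2, 5]
x = [7, -8, 1]   # all fit in 4-bit two's complement
print(da_inner_product(w, x, bits=4), sum(a * b for a, b in zip(w, x)))
```

Both printed values should match. The multiplications are replaced by one LUT lookup, one shift, and one add per input bit, which is why LUT size and the add/shift path set the cost of DA designs.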
European Design Automation Conference, Sep 20, 1996
Applications implementing complex protocols tax the capabilities of conventional finite state machine synthesis techniques. In this paper, we present sequential optimization techniques whose complexity scales with the number of state bits rather than the number of states. These techniques create designs which are comparable or superior to those synthesized by conventional state-based optimization and assignment. Furthermore, they provide viable synthesis techniques for designs which are too large for synthesis with the conventional method.
This paper presents an efficient encoding and automaton construction that improves the performance of automata-based scheduling techniques. The encoding preserves knowledge of which operations occurred previously but not when they occurred, allowing greater sharing among scheduling traces. The technique inherits all of the features of BDD-based control-dominated scheduling, including systematic speculation. Without conventional pruning, all schedules for several large samples are quickly constructed.
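The benefit of remembering which operations have occurred, but not when, can be illustrated by counting states. The toy below is a hypothetical example, not the paper's BDD-based construction. It explores schedules that issue one ready operation per step and compares two encodings: full traces (ordered sequences) versus sets of completed operations. Distinct traces that reach the same set collapse into one state under the second encoding.

```python
def count_states(preds):
    """Return (#distinct traces, #distinct completed-operation sets).

    preds maps each operation to the frozenset of operations it depends on.
    """
    traces, done_sets = set(), set()
    stack = [()]
    while stack:
        trace = stack.pop()
        traces.add(trace)
        done = frozenset(trace)
        done_sets.add(done)
        # Extend the trace with every operation whose predecessors are done.
        for op, deps in preds.items():
            if op not in done and deps <= done:
                stack.append(trace + (op,))
    return len(traces), len(done_sets)


# Three independent operations plus one that needs all of them.
ops = {"a": frozenset(), "b": frozenset(), "c": frozenset(),
       "d": frozenset("abc")}
print(count_states(ops))
```

For this small graph the trace encoding needs 22 states while the set encoding needs only 9, and the gap widens quickly as more independent operations are added.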
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Jul 1, 1990
This report describes the Chippe system, gives some background on previous work, and describes several sample design runs of the system. Also presented are the sources of the design tradeoffs used by Chippe, an overview of the internal design model, and experiences using the system.
IEEE Transactions on Very Large Scale Integration Systems, Aug 1, 2021
This article presents multiwire phase encoding (MWPE), a transition signaling technique aimed at chip-to-chip communication on silicon interposer technology, where multiple, relatively low-bandwidth transmission lines can be easily routed between high-performance dies. The encoding exploits timing correlation between transitions on multiple band-limited wires to achieve high ensemble bandwidth, potentially exceeding that of parallel conventionally encoded NRZ links. The unambiguous encoding enables instantaneous bit synchronization, resulting in low-power, low-latency, PLL-/DLL-free on-chip data movement. Theoretical and practical bandwidths achievable by phase-encoded links are evaluated as a function of channel properties. Link timing, driver, and receiver circuits are implemented to evaluate the link performance and power costs associated with moving MWPE data. HSPICE simulation-based estimates indicate that a 2-mm-long MWPE link can achieve 126-Gb/s bandwidth on a lossy, dispersive transmission line medium, with an energy cost of 0.24 pJ/bit in 22-nm FDX technology.
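Timing relationships across several wires can carry more than one bit per wire. To illustrate this, consider a deliberately simplified code that is an assumption for illustration only, not the MWPE code itself. Every wire makes exactly one transition per symbol, and data is carried solely by the order in which the n wires transition. The n! possible orders carry floor(log2 n!) bits, and the receiver only needs the relative arrival order, not an absolute clock. The mapping below uses a standard Lehmer (factorial number system) code. The function names are hypothetical.

```python
import math


def encode(value, n):
    """Map an integer in [0, n!) to a transition order of wires 0..n-1."""
    if not 0 <= value < math.factorial(n):
        raise ValueError("value out of range")
    wires = list(range(n))
    order = []
    for i in range(n, 0, -1):
        idx, value = divmod(value, math.factorial(i - 1))
        order.append(wires.pop(idx))  # wire that transitions next
    return order


def decode(order):
    """Recover the integer from the observed order of wire transitions."""
    wires = sorted(order)
    n = len(order)
    value = 0
    for i, w in enumerate(order):
        idx = wires.index(w)
        value += idx * math.factorial(n - 1 - i)
        wires.pop(idx)
    return value


def bits_per_symbol(n):
    """Whole bits carried by one symbol of this toy order-only code."""
    return math.floor(math.log2(math.factorial(n)))


print(encode(13, 4), decode(encode(13, 4)), bits_per_symbol(4))
```

Under these assumptions, four wires carry 4 bits per symbol and eight wires carry 15, compared with one bit per wire per symbol for this one-transition-per-wire scheme if order were ignored.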
This work explores a new ESD (electrostatic discharge) protection design methodology for high-speed off-chip communication ICs (integrated circuits). We propose a novel methodology for predicting the optimized design of an ESD protection device under HBM (Human Body Model) stress conditions. Furthermore, we discuss the ESD-I/O circuit interaction and improve ESD circuit robustness by varying layout parameters and minimizing the parasitic capacitance of the protection device. A GG-NMOS (gate-grounded NMOS) is used as the ESD protection device. An LVDS (Low Voltage Differential Signaling) driver serves as the test circuit, on which we compare the impact of the protection device's capacitance on circuit performance. The second-breakdown triggering current (It2), which can be considered a metric of ESD robustness, depends on the drain-contact-to-gate spacing (DCGS). We show that optimizing this spacing effectively raises It2 by increasing ballasting and current-distribution uniformity, while causing only a marginal increase in parasitic capacitance.
strained problems where early pruning decisions exclude candidates leading to superior solutions. ILP schedulers (e.g., [3][6]) solve scheduling exactly but have difficulties with time complexity and with formulating complex control constraints. Symbolic methods (e.g., [2][4][7][8][11]) are often effective in finding exact solutions in highly constrained problem formulations but may suffer from representation explosion. The technique described in this paper falls in the symbolic methods category. The most closely related previous work is found in [2][11], where system timing and synchronization requirements are encapsulated in finite-state machine (FSM) descriptions. Our work differs in two ways. First, we introduce non-determinism as a preferred representation for protocols; the work described in [9] supports this decision. Second, and more importantly, our formulation is hierarchical and amenable to abstraction. We believe hierarchy and abstraction are key to making symbolic techniques manageable.
Optimization of hardware resources for conditional data-flow graph behavior is particularly important when conditional behavior occurs in cyclic loops and maximization of throughput is desired. In this paper, an exact and efficient conditional resource sharing analysis using a guard-based control representation is presented. The analysis is transparent to the scheduler implementation. The proposed technique systematically handles complex conditional resource sharing in cases where folded (software-pipelined) loops include conditional behavior within the loop body.
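The key test behind guard-based sharing is whether two operations' guards can ever be true together. If not, the operations are mutually exclusive and may share one functional unit in the same control step. The sketch checks this by brute-force enumeration of the condition variables, a simple stand-in for the symbolic (BDD) reasoning such analyses typically use. The guard functions and variable names are hypothetical, and loop folding is not modeled.

```python
from itertools import product


def mutually_exclusive(guard_a, guard_b, conditions):
    """True if no assignment of the condition variables enables both guards."""
    for values in product((False, True), repeat=len(conditions)):
        env = dict(zip(conditions, values))
        if guard_a(env) and guard_b(env):
            return False
    return True


def then_op(env):    # operation on the 'then' branch of c1
    return env["c1"]


def else_op(env):    # operation on the 'else' branch of c1
    return not env["c1"]


def nested_op(env):  # operation nested under both c1 and c2
    return env["c1"] and env["c2"]


conds = ["c1", "c2"]
print(mutually_exclusive(then_op, else_op, conds))    # True: may share a unit
print(mutually_exclusive(then_op, nested_op, conds))  # False: cannot share
```

Enumeration is exponential in the number of conditions, which is why practical tools represent guards symbolically instead.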
This paper describes a symbolic formulation that allows speculative operation execution (pre-execution) to be incorporated in an exact control-dependent scheduling of arbitrary forward-branching control/data paths. The technique provides a closed-form solution set in which all satisfying schedules are encapsulated in a compressed OBDD-based representation. To extract the parallelism implicit in the input specification, Boolean 'guard' functions are used to identify the paths on which operations must be scheduled, and the execution order of the conditionals is dynamically resolved. An efficient and systematic iterative construction method is presented along with benchmark results.
Papers by Forrest Brewer