Papers by Gregory Peterson
final electronic copy of this thesis for form and content and recommend that it be
Proceedings of the Fifth IEEE International Workshop on Behavioral Modeling and Simulation. BMAS 2001 (Cat No.01TH8601)
The understanding of biological systems remains one of our primary scientific activities. A primary goal of biological research is the development of accurate models that can be used to explain biological processes, with predictive models particularly promising for drug development, epidemiology, bio-engineering, and genetic applications. We discuss the use of VHDL-AMS for developing highly predictive, accurate models of cellular processes.

Joint Vision 2010 describes an approach about joint warfare of the future, and it depends on and highlights the contributions of air power. Accordingly, the Air Force’s vision of Global Engagement flows from Joint Vision 2010. The Air Force’s Strategic Plan outlines the core competencies that are necessary for Global Engagement. The capability for attaining the desired capabilities and goals of these visions depends on an adaptive, unified, dynamic aerospace C2 system. The unified C2 system will provide for the capability to dynamically assess, plan, execute, and project aerospace power in a joint environment. However, this capability must overcome the challenges of complexity and ultimately affordability, if it is to become a real system. In addition, future dynamic C2 systems must be integrated into a system of systems to be effective. This paper discusses development approaches and technologies that offer significant savings in overcoming the complexities of fielding a unified C2...

Proceedings of the 2002 IEEE International Workshop on Behavioral Modeling and Simulation, 2002. BMAS 2002.
Accurate, predictive models of biological cellular processes are a key component of the quest to transform the engineering of genetic controls within organisms. Biological systems include a dizzying variety of interacting biochemical pathways, each consisting of numerous reactions and chemical species. Moreover, such classical assumptions of differential equations models as equilibrium (including well-stirred species) and large numbers of reactants do not hold for mesoscalar, intra-cellular modeling. In fact, these systems require the seamless, accurate modeling of interacting discrete and continuous behaviors. Previous research has demonstrated the potential of using portions of biochemical pathways as switches to control behavior, with emergent behaviors such as logic gates created. Accurate, efficient models require the ability to mix discrete and continuous descriptions while providing for the high-fidelity interaction between these domains. Although a number of previous researchers have discussed the use of analog and mixed signal hardware description languages (AMS HDLs) such as VHDL-AMS and Verilog-AMS for representing the behavior of MEMS, microfluidic, optical, and thermal systems, similar benefits can result from the use of AMS HDLs for biological systems modeling. We present preliminary research into the VHDL-AMS representation of biological systems at different levels of abstraction and the capability to support multi-resolution modeling. We discuss the role of this research within the context of the DARPA BioSPICE research program, our modeling and simulation approach, and future plans.

Chairs and Committee Members
Brian Bailey, Mentor Graphics, USA; Gaetano Borriello, University of Washington, USA; Joseph Buck, Synopsys, USA; Raul Camposano, Synopsys, USA; Giovanni De Micheli, Stanford University, USA; Martyn Edwards, University of Manchester (UMIST), UK; Rolf Ernst, University of Braunschweig, Germany; Thomas Fuhrman, General Motors, USA; Daniel Gajski, University of California at Irvine, USA; Rajesh Gupta, University of Illinois, USA; Reiner Hartenstein, University of Kaiserslautern, Germany; Roger Hughes, Abstract Hardware, UK; Ahmed Jerraya, Institut National Polytechnique de Grenoble, France; Kurt Keutzer, Synopsys, USA; Sanjaya Kumar, Honeywell, USA; Philip Koopman, Carnegie-Mellon University, USA; Gregory Peterson, USAF Wright Labs, USA; Wolfgang Rosenstiel, University of Tubingen, Germany; Alberto Sangiovanni-Vincentelli, University of California at Berkeley, USA; Donatella Sciuto, Politecnico di Milano, Italy; Jorgen Staunstrup, Technical University of Denmark, Denmark; Richard Taylor, Hewlett-Packard Laboratories, UK; Don Thomas, Carnegie-Mellon University, USA; Frank Vahid, University of California at Riverside, USA; Wayne Wolf, Princeton University, USA; Hiroto Yasuura, Kyushu University, Japan

CUDA shared memory is fast, on-chip storage. However, the bank conflict issue could cause a performance bottleneck. Current NVIDIA Tesla GPUs support memory bank accesses with configurable bit-widths. While this feature provides an efficient bank mapping scheme for 32-bit and 64-bit data types, it becomes trickier to solve the bank conflict problem through manual code tuning. This paper presents a framework for automatic bank conflict analysis and optimization. Given static array access information, we calculate the conflict degree, and then provide optimized data access patterns. Basically, by searching among different combinations of inter- and intra-array padding, along with bank access bit-width configurations, we can efficiently reduce or eliminate bank conflicts. From RODINIA and the CUDA SDK we selected 13 kernels with bottlenecks due to shared memory bank conflicts. After using our approach, these benchmarks achieve 5%-35% improvement in runtime. Keywords— shared memory; CUDA; ...

Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, 2018
The current landscape of scientific research is widely based on modeling and simulation, typically with complexity in the simulation's flow of execution and parameterization properties. Execution flows are not necessarily straightforward since they may need multiple processing tasks and iterations. Furthermore, parameter and performance studies are common approaches used to characterize a simulation, often requiring traversal of a large parameter space. High-performance computers offer practical resources at the expense of users handling the setup, submission, and management of jobs. This work presents the design of PaPaS, a portable, lightweight, and generic workflow framework for conducting parallel parameter and performance studies. Workflows are defined using parameter files based on keyword-value pairs syntax, thus removing from the user the overhead of creating complex scripts to manage the workflow. A parameter set consists of any combination of environment variables, files, partial file contents, and command line arguments. PaPaS is being developed in Python 3 with support for distributed parallelization using SSH, batch systems, and C++ MPI. The PaPaS framework will run as user processes, and can be used in single/multi-node and multi-tenant computing systems. An example simulation using the BehaviorSpace tool from NetLogo and a matrix multiply using OpenMP are presented as parameter and performance studies, respectively. The results demonstrate that the PaPaS framework offers a simple method for defining and managing parameter studies, while increasing resource utilization.

IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Mixed precision is a promising approach to save energy in iterative refinement algorithms since it obtains speedup without necessitating additional cores and parallelisation. However, conventional mixed precision methods utilise statically defined precision in a loop, thus hindering further speed-up and energy savings. We overcome this problem by proposing novel methods which allow iterative refinement to utilise variable precision arithmetic dynamically in a loop (i.e. a trans-precision approach). Our methods restructure a numeric algorithm dynamically according to runtime numeric behaviour and remove unnecessary accuracy checks. We implemented our methods by extending one conventional mixed precision iterative refinement algorithm on an Intel Xeon E5-2650 2GHz core with MKL 2017 and XBLAS 1.0. Our dynamic precision approach demonstrates 2.0-2.6× speed-up and 1.8-2.4× energy savings compared to mixed precision iterative refinement when double precision solution accuracy is required for forward error and with matrix dimensions ranging from 4K to 32K.
Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing - Co-HPC '15, 2015
In this paper we present an optimized GPU co-design of the Induced Dimension Reduction (IDR) algorithm for solving linear systems. Starting from a baseline implementation based on the generic BLAS routines from the MAGMA software library, we apply optimizations that are based on kernel fusion and kernel overlap. Runtime experiments are used to investigate the benefit of the distinct optimization techniques for different variants of the IDR algorithm. A comparison to the reference implementation reveals that the interplay between them can succeed in cutting the overall runtime by up to about one third.
I would like to thank all those who have helped me achieve my Master of Science degree in Computer Engineering. First, I would like to thank Dr. Don Bouldin for introducing me to FPGA design and for his continual guidance, insight, and support. Second, I would like to thank Ersin Domangue at InfoAssure, Inc. for his explanation of a number of cryptographic concepts. I would also like to thank Adam Miller and Shawn Carrithers for their co-authorship of several cryptographic modules. Finally, I would like to thank Dr. Greg Peterson and Dr. Itamar Elhanany for their many suggestions and for serving on my committee. To my family and friends, thank you for your personal encouragement and support. Without you, this work would not have been possible.
Parallel Algorithm for the Genetic KNN-Impute Algorithm
Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06, 2006
Peterson for his constant support and instructive guidance. Second, I would like to thank Dr. Robert Harrison for his support and great suggestions for my research and study. Third, I would like to thank Dr. G. Lee Warren for his insight, explanation, and instruction during my work. I would also like to thank Dr. Don Bouldin for introducing me to FPGA design and serving as a member of my committee. Special thanks also go to my colleagues, including Junqing Sun, Akila Gothanaraman, Saumil Merchant, Shaoyu Liu, Zhenzhen Liu, and Scott E. Fields. Without their help, this work would not have been possible. I would also like to thank my friends for their constant help and the happiness that they brought during my studies.

Graphics processing simulation and trade-off study for cockpit applications
Cockpit Displays III, 1996
Under the sponsorship of Wright Laboratory (contract F33615-92-C-3802), Honeywell has been involved in the definition of next-generation display processors. This paper describes the top-level design approach, simulation and tradeoff studies, as well as the resulting architectural concepts for the cockpit display generator (CDG) processing system. The CDG architecture provides the graphical and video processing power needed to drive future high-resolution display devices and to generate advanced display formats for improved pilot situation awareness. The foremost objective of the CDG design is to achieve super-graphics workstation performance in a form factor suitable for avionics applications. The CDG design provides multichannel, high-performance 2-D and 3-D graphics and real-time video manipulation. Requirements for the CDG have been defined by the needs of Panoramic Cockpit Control and Display System (PCCADS) 2000 cockpits. Most notable are requirements for low-volume, low-power, real-time performance and tolerance for harsh environmental conditions. These goals have been realized by combining customized graphics pipelines with standard processing elements. The CDG design has been implemented as a software 'prototype' using VHDL performance and functional models. This novel design approach allows architectural tradeoffs to be made within the context of a standard design language, VHDL. Simulations have been developed to specify and evaluate particular system performance and functional and design aspects.
Accelerating Gene Regulatory Network Modeling Using Grid-Based Simulation
SIMULATION, 2004

Digital Signal Processing, 2013
A key challenge to achieve very high positioning accuracy (such as sub-mm accuracy) in Ultra-Wideband (UWB) positioning systems is how to obtain ultra-high resolution UWB echo pulses, which requires ADCs with a prohibitively high sampling rate. The theory of Compressed Sensing (CS) has been applied to UWB systems to acquire UWB pulses below the Nyquist sampling rate. This paper proposes a front-end optimized scheme for the CS-based UWB positioning system. A Space-Time Bayesian Compressed Sensing (STBCS) algorithm is developed for joint signal reconstruction by transferring mutual a priori information, which can dramatically decrease ADC sampling rate and improve noise tolerance. Moreover, the STBCS and time difference of arrival (TDOA) algorithms are integrated in a pipelined mode for fast tracking of the target through an incremental optimization method. Simulation results show the proposed STBCS algorithm can significantly reduce the number of measurements and has better noise tolerance than the traditional BCS, OMP, and multi-task BCS (MBCS) algorithms. The sub-mm accurate CS-based UWB positioning system using the proposed STBCS-TDOA algorithm requires only 15% of the original sampling rate compared with the UWB positioning system using a sequential sampling method.

Dynamics of domain coverage of the protein sequence universe
BMC Genomics, 2012
Background: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results: Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions: Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data.

Designers face the challenge of specifying and implementing complicated mixed-technology systems. In order to better address mixed-signal designs, the VHDL-AMS and Verilog-AMS languages have been developed. These languages provide powerful capabilities to model and simulate behaviors in both the continuous and discrete time domains. Contemporaneously, the control systems community developed the object-oriented Modelica language to support the specification and continuous time modeling of complex control systems. The STEAMS (SUAVE and Tennessee Extensions for Analog and Mixed-Signal Systems) effort strives to provide an object-oriented systems specification and modeling language that supports both discrete and continuous time behaviors. STEAMS enables the modeling of interacting continuous and discrete time components coupled with the modeling productivity benefits associated with object-oriented techniques. This paper presents the requirements and rationale for the STEAMS language development effort, including modeling deficiencies currently facing the VHDL-AMS user community.