Papers by Hoàng Châu Vũ Lê
Computer Vision – ECCV 2018, 2018
Journal of Computational and Graphical Statistics, 2020
Many problems in statistics and machine learning can be formulated as the optimization of a finite sum of non-smooth convex functions. We propose an algorithm to minimize this type of objective function based on the idea of alternating linearization. Our algorithm retains the simplicity of contemporary methods without any restrictive assumptions on the smoothness of the loss function. We apply our proposed method to solve two challenging problems: overlapping group Lasso and convex regression with sharp partitions (CRISP). Numerical experiments show that our method is superior to state-of-the-art algorithms, many of which are based on the accelerated proximal gradient method.
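
To illustrate the general idea of alternating linearization for a sum of two non-smooth convex terms, the sketch below alternates proximal steps in which one term is replaced by a subgradient linearization. The function names (prox_f, prox_g, subgrad_f, subgrad_g) are placeholders and the scheme is a simplification for intuition, not the paper's exact algorithm.

```python
import numpy as np

def alternating_linearization(prox_f, prox_g, subgrad_f, subgrad_g, x0,
                              step=0.1, iters=200):
    """Simplified sketch: minimize f(x) + g(x), both convex and possibly
    non-smooth, by alternately linearizing one term (via a subgradient)
    and taking a proximal step on the other."""
    x_f = x0.copy()
    x_g = x0.copy()
    for _ in range(iters):
        # linearize g at x_g, proximal step on f
        x_f = prox_f(x_g - step * subgrad_g(x_g), step)
        # linearize f at x_f, proximal step on g
        x_g = prox_g(x_f - step * subgrad_f(x_f), step)
    return x_g

# toy instance: f = ||x - b||_1 (data fit), g = lam * ||x||_1 (sparsity)
b, lam = np.array([1.0, -2.0, 0.3]), 0.5
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
x = alternating_linearization(
    prox_f=lambda v, t: b + soft(v - b, t),      # prox of ||x - b||_1
    prox_g=lambda v, t: soft(v, t * lam),        # prox of lam * ||x||_1
    subgrad_f=lambda x: np.sign(x - b),
    subgrad_g=lambda x: lam * np.sign(x),
    x0=np.zeros(3),
)
```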

2011 IEEE International Symposium on Multimedia, 2011
The rule of thirds is one of the most important composition rules used by photographers to create high-quality photos. It states that placing important objects along the imaginary thirds lines or around their intersections often produces highly aesthetic photos. In this paper, we present a method to automatically determine whether a photo respects the rule of thirds. Detecting the rule of thirds in a photo requires semantic content understanding to locate important objects, which is beyond the current state of the art. This paper instead makes use of recent saliency and generic objectness analysis and accordingly designs a range of features. Our experiments with a variety of saliency and generic objectness methods show that encouraging performance can be achieved in detecting the rule of thirds in photos.
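
As a rough illustration of the kind of composition feature such an approach can derive from a saliency map, the sketch below computes the saliency-weighted centroid of an image and its distances to the nearest thirds line and thirds intersection. The feature choice here is an assumption for illustration, not the paper's actual feature set.

```python
import numpy as np

def thirds_features(saliency):
    """saliency: 2-D array of per-pixel saliency scores.
    Returns the normalized distances from the saliency-weighted centroid
    to the nearest thirds line and to the nearest thirds intersection."""
    h, w = saliency.shape
    total = saliency.sum() + 1e-12
    ys, xs = np.mgrid[0:h, 0:w]
    cy = float((saliency * ys).sum() / total) / h   # centroid in [0, 1]
    cx = float((saliency * xs).sum() / total) / w
    thirds = (1.0 / 3.0, 2.0 / 3.0)
    d_line = min(min(abs(cx - t) for t in thirds),
                 min(abs(cy - t) for t in thirds))
    d_point = min(np.hypot(cx - tx, cy - ty) for tx in thirds for ty in thirds)
    return d_line, d_point
```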

Pediatric Neurosurgery, 2002
Objective: Slit ventricle syndrome (SVS) has been described in hydrocephalus patients who continue to have shunt malfunction-like symptoms in the presence of a functioning shunt system and small ventricles on imaging studies. These symptoms usually present years after shunt placement or revision and can consist of headache, nausea and vomiting, lethargy and decreased cognitive skills. Treatments offered range from observation, medical therapy (migraine treatment) and shunt revision to subtemporal decompression or cranial vault expansion. We describe a subset of patients with SVS who were symptomatic with high intracranial pressure (ICP) as measured by sedated lumbar puncture and whose symptoms completely resolved after lumboperitoneal shunt (LPS) placement. Methods: Seven patients with a diagnosis of SVS underwent lumboperitoneal shunting. The age at shunting ranged from 3 to 18 years. Most had undergone recent ventriculoperitoneal shunt (VPS) revisions for presentation of shunt mal...

2013 23rd International Conference on Field programmable Logic and Applications, 2013
Recently, there has been a growing interest within the research community in improving energy efficiency. In this paper, we revisit the classic Fast Fourier Transform (FFT) for energy-efficient designs on FPGAs. A parameterized FFT architecture is proposed to identify design trade-offs in achieving energy efficiency. We first perform design space exploration by varying the algorithm mapping parameters, such as the degree of vertical and horizontal parallelism, that characterize decomposition-based FFT algorithms. After empirically selecting the values of the algorithm mapping parameters, an energy-performance-area trade-off design for energy efficiency is identified by varying the architecture parameters, including the type of memory elements, the type of interconnection network, and the number of pipeline stages. The trade-offs between energy, area and time are analyzed using two performance metrics: the Energy×Area×Time (EAT) composite metric and the energy efficiency (defined as the number of operations per Joule). From the experimental results, a design space is generated to demonstrate the effect of these parameters on the various performance metrics. For N-point FFTs (16 ≤ N ≤ 1024), our designs achieve up to 28% and 38% improvement in energy efficiency and EAT, respectively, compared with a state-of-the-art design.
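
The two figures of merit named in the abstract are easy to state precisely; the sketch below computes them for hypothetical design points. All field names and numbers are illustrative assumptions, not measurements from the paper.

```python
from dataclasses import dataclass

@dataclass
class DesignPoint:
    name: str
    energy_nj: float      # energy per N-point transform, in nanojoules
    area_slices: int      # FPGA resource usage (e.g. slices)
    time_us: float        # latency per transform, in microseconds
    n_ops: int            # arithmetic operations per transform

def eat(d: DesignPoint) -> float:
    """Energy x Area x Time composite metric (lower is better)."""
    return d.energy_nj * d.area_slices * d.time_us

def ops_per_joule(d: DesignPoint) -> float:
    """Energy efficiency: operations per Joule (higher is better)."""
    return d.n_ops / (d.energy_nj * 1e-9)

# comparing two hypothetical 64-point FFT mappings
a = DesignPoint("wide-parallel", energy_nj=90.0, area_slices=4200, time_us=0.4, n_ops=1920)
b = DesignPoint("folded", energy_nj=60.0, area_slices=1500, time_us=1.6, n_ops=1920)
best = min((a, b), key=eat)
```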

Proceedings of the 27th Symposium on Integrated Circuits and Systems Design - SBCCI '14, 2014
Most safety-critical systems today cannot be completely verified by state-of-the-art verification approaches before their deployment to the real world. The rapidly growing complexity of these systems amplifies the strong demand for a disruptive innovation in verification technology. In this invited paper, we propose the concept of self-verification, a fundamental change to the way verification is approached: it is employed as a post-deployment process. This enables a new generation of safety-critical systems that are capable of verifying themselves. Essential for the realization of this idea is the design of a core system carrying self-verification capacities. We outline a possible architecture of the core system and demonstrate two application scenarios of how self-verification could be realized. The first one targets the verification of evolving systems, whereas the second one allows the seamless integration of partially unverified components in safety-critical applications.

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014, 2014
Ensuring the correctness of high-level SystemC designs is an important and challenging problem in today's Electronic System Level (ESL) methodology. Prevalently, a design is checked against a functional specification given, e.g., by a test case with reference output or by a user-defined property. Another research direction takes the view of a SystemC design as a piece of concurrent software. The design is then checked for common concurrency problems and thus a functional specification is not required. Along this line, several methods for deadlock detection and race analysis have been developed. In this work, we propose to consider a new concurrency verification problem, namely input-output determinism, for SystemC designs. That means, for each possible input, the design must produce the same output under any valid process schedule. We argue that determinism verification is stronger than both deadlock detection and race analysis. Besides being an attractive correctness criterion in itself, proven determinism helps to accelerate both simulative and formal verification. We also present a preliminary study to show the feasibility of determinism verification for SystemC designs.
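
For intuition, input-output determinism can be stated as a brute-force check over process schedules: every interleaving of the runnable processes must produce the same observable output for each input. The sketch below assumes a hypothetical run(schedule, x) helper that executes a design under a fixed schedule; actual verification replaces this enumeration with formal reasoning.

```python
from itertools import permutations

def is_io_deterministic(processes, run, inputs):
    """Brute-force sketch of the determinism criterion: for every input,
    all process schedules must yield the same output.
    `run(schedule, x)` is a hypothetical helper executing the design."""
    for x in inputs:
        outputs = {run(schedule, x) for schedule in permutations(processes)}
        if len(outputs) > 1:
            return False      # some schedule changes the observable output
    return True
```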

2010 IEEE International High Level Design Validation and Test Workshop (HLDVT), 2010
For Electronic System Level (ESL) design, SystemC has become the standard language due to its excellent support of Transaction Level Modeling (TLM). But even if the complexity of the systems can be handled using the abstraction levels offered by TLM (the most abstract one is untimed and focuses on functionality), verification remains the major bottleneck. In particular, as untimed TLM models are the reference for the following refinement steps, their correctness has to be ensured. Thus, formal verification approaches have been developed to prove properties for these models. However, even if several properties have been checked, this does not guarantee that the complete functionality of the TLM model has been verified. Thus, in this paper we consider the problem of functional coverage analysis in formal TLM property checking. We present a coverage approach that can analyze whether the property set unambiguously describes all transactions in a SystemC TLM model. The developed coverage analysis method identifies uncovered scenarios and hence makes it possible to close all coverage gaps. As an example, we consider an automated teller machine and show the benefits of the proposed approach.
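
The notion of a coverage gap can be pictured at toy scale: a scenario is uncovered if no property's antecedent applies to it. The sketch below enumerates Boolean input scenarios explicitly; the paper's approach works on formal properties of SystemC TLM models, so this is only an intuition-level analogy with made-up predicates.

```python
from itertools import product

def coverage_gaps(antecedents, input_bits):
    """Enumerate input scenarios not covered by any property antecedent.
    `antecedents` is a list of predicates over a dict of Boolean inputs."""
    gaps = []
    for values in product([False, True], repeat=len(input_bits)):
        scenario = dict(zip(input_bits, values))
        if not any(holds(scenario) for holds in antecedents):
            gaps.append(scenario)
    return gaps

# toy example: two properties that together miss every scenario with req=False
props = [lambda s: s["req"] and not s["ack"], lambda s: s["req"] and s["ack"]]
uncovered = coverage_gaps(props, ["req", "ack"])
```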

Eighth ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2010), 2010
Electronic System Level (ESL) design manages the enormous complexity of today's systems by using abstract models. In this context, Transaction Level Modeling (TLM) is state of the art for describing complex communication without all the details. As the ESL language, SystemC has become the de facto standard. Since SystemC TLM models are used for early software development and as a reference for hardware implementation, their correct functional behavior is crucial. Admittedly, the best possible verification quality can be achieved with formal approaches. However, formal verification of TLM models is a hard task. Existing methods basically consider local properties or have extremely high run-times. In contrast, the approach proposed in this paper can verify "true" TLM properties, i.e., major TLM behavior; for instance, the effect of a transaction, or the fact that a transaction is only started after a certain event, can be proven. Our approach works as follows: after a fully automatic SystemC-to-C transformation, the TLM property is mapped to monitoring logic using C assertions and finite state machines. To detect a violation of the property, the approach uses a BMC-based formulation over the outermost loop of the SystemC scheduler. In addition, we improve this verification method significantly by employing induction on the C model, forming a complete and efficient approach. As shown by experiments, state-of-the-art proof techniques make it possible to prove important non-trivial behavior of SystemC TLM designs.
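
The core loop of bounded model checking is easy to illustrate. The explicit-state sketch below unrolls a transition relation up to a bound and reports a property violation, whereas the paper's approach encodes the unrolling of the C model symbolically and then adds induction to make the proof complete; all function names here are placeholders.

```python
def bounded_model_check(init_states, step, prop, bound):
    """Explicit-state BMC sketch: explore all states reachable within
    `bound` steps and report the first depth at which `prop` fails.
    `step(s)` returns the successor states of s."""
    frontier = set(init_states)
    seen = set(frontier)
    for depth in range(bound + 1):
        for s in frontier:
            if not prop(s):
                return f"property violated at depth {depth}"
        frontier = {t for s in frontier for t in step(s)} - seen
        seen |= frontier
        if not frontier:
            break
    return "no violation found up to the bound"

# toy counter that must stay below 5 -- violated at depth 5
result = bounded_model_check({0}, lambda s: {(s + 1) % 8}, lambda s: s < 5, bound=10)
```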

2012 International Symposium on Electronic System Design (ISED), 2012
In the ESL design flow, the crucial task of developing a golden model that correctly implements the natural-language top-level specification has received little attention so far. The major drawback of current practice is the isolation of design and verification. Motivated by this and by recent advances in verification techniques for SystemC ESL models, we propose a novel methodology to develop a correct SystemC golden model from the top-level specification. The proposed methodology is driven by the requirements and the scenarios in the specification, with design and verification going hand in hand. An early formalization of requirements and scenarios produces a set of properties and a testbench, together with a code skeleton that is successively extended to a full SystemC ESL model. The availability of properties and a testbench beforehand enables verification-driven development of the model. The advantages of the methodology are discussed and demonstrated by a case study.
Induction-Based Formal Verification of SystemC TLM Designs
2009 10th International Workshop on Microprocessor Test and Verification, 2009

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, 2012
The IEEE-1800 SystemVerilog [20] system description and verification language integrates dedicated verification features, like constrained random stimulus generation and functional coverage, which are the building blocks of the Universal Verification Methodology (UVM) [3], the emerging standard for electronic systems verification. In this article, we introduce our System Verification Methodology (SVM) as a SystemC library for advanced Transaction Level Modeling (TLM) testbench implementation. We first present SystemC libraries supporting verification features such as functional coverage and constrained random stimulus generation. Thereafter, we introduce the SVM with advanced TLM support based on SystemC and compare it to UVM and related approaches. Finally, we demonstrate the application of our SVM by means of a testbench for a two-wheel self-balancing electric vehicle.
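
Functional coverage, as provided by such libraries, can be pictured as named bins sampled during simulation; the minimal sketch below illustrates the concept only and is not the SVM or UVM API.

```python
class Covergroup:
    """Minimal functional-coverage sketch: each bin is a named predicate,
    and coverage is the fraction of bins hit at least once."""
    def __init__(self, bins):
        self.bins = bins                        # name -> predicate on a sample
        self.hits = {name: 0 for name in bins}

    def sample(self, value):
        for name, predicate in self.bins.items():
            if predicate(value):
                self.hits[name] += 1

    def coverage_percent(self):
        covered = sum(1 for name in self.bins if self.hits[name] > 0)
        return 100.0 * covered / len(self.bins)

# example: cover small, large, and boundary values of an 8-bit payload
cg = Covergroup({"small": lambda v: v < 16,
                 "large": lambda v: v >= 240,
                 "max":   lambda v: v == 255})
for v in (3, 250, 90):
    cg.sample(v)
pct = cg.coverage_percent()     # ~66.7 -- the "max" bin is still uncovered
```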

2010 Proceedings IEEE INFOCOM, 2010
Dictionary-Based String Matching (DBSM) is used in network Deep Packet Inspection (DPI) applications such as virus scanning [1] and network intrusion detection [2]. We propose the Pipelined Affix Search with Tail Acceleration (PASTA) architecture for solving DBSM with guaranteed worst-case performance. Our PASTA architecture is composed of a Pipelined Affix Search Relay (PASR) followed by a Tail Acceleration Finite Automaton (TAFA). PASR consists of one or more pipelined Binary Search Tree (pBST) modules arranged in a linear array. TAFA is constructed with the Aho-Corasick goto and failure functions [3] in a compact multi-path and multi-stride tree structure. Both PASR and TAFA achieve good memory efficiency of 1.2 and 2 B/ch (bytes per character), respectively, and are pipelined to achieve a high clock rate of 200 MHz on FPGAs. Because PASTA does not depend on the effectiveness of any hash function or on the properties of the input stream, its performance is guaranteed in the worst case. Our prototype implementation of PASTA on an FPGA with 10 Mb of on-chip block RAM achieves 3.2 Gbps matching throughput against a dictionary of over 700K characters. This level of performance surpasses the requirements of next-generation security gateways for deep packet inspection.
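
The Aho-Corasick goto and failure functions referenced above are standard; a compact software reference is sketched below. The paper maps this automaton into a compact multi-path, multi-stride hardware structure, which is not reproduced here.

```python
from collections import deque

def build_aho_corasick(patterns):
    """Build the classic goto/failure/output functions for a set of patterns."""
    goto, fail, out = [{}], [0], [set()]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def match(text, goto, fail, out):
    """Scan the text once, reporting (end_index, pattern) for every match."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend((i, p) for p in out[state])
    return hits

hits = match("hershey", *build_aho_corasick(["he", "she", "hers"]))
```
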
Proceedings of the 50th Annual Design Automation Conference, 2013
Formal verification of SystemC is challenging. Before dealing with symbolic inputs and the concurrency semantics, a front-end is required to translate the design to a formal model. The lack of such front-ends has hampered the development of efficient back-ends so far. In this paper, we propose an isolated approach by using an Intermediate Verification Language (IVL). This enables a SystemC-to-IVL translator (front-end) and an IVL verifier (back-end) to be developed independently. We present a compact but general IVL that, together with an extensive benchmark set, will facilitate future research. Furthermore, we propose an efficient symbolic simulator integrating Partial Order Reduction. Experimental comparison with existing approaches shows its potential.
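
Partial Order Reduction rests on the observation that not all interleavings of independent (commuting) actions need to be explored. The toy search below expands only one representative action when all enabled actions are pairwise independent; a sound reduction needs further conditions (visibility, cycle proviso), so this is only a sketch of the intuition, not the simulator's algorithm.

```python
def explore(state, enabled, independent, step, seen=None):
    """Toy depth-first search with a naive partial-order reduction:
    if every pair of enabled actions is independent, expanding a single
    representative is enough (soundness conditions omitted)."""
    seen = set() if seen is None else seen
    if state in seen:
        return seen
    seen.add(state)
    actions = enabled(state)
    if actions and all(independent(a, b)
                       for a in actions for b in actions if a != b):
        actions = actions[:1]        # explore one representative interleaving
    for a in actions:
        explore(step(state, a), enabled, independent, step, seen)
    return seen
```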

2011 21st International Conference on Field Programmable Logic and Applications, 2011
Most current SRAM-based high-speed Internet Protocol (IP) packet classification implementations use tree traversal and pipelining. However, these approaches result in inefficient memory utilization. Due to the limited amount of on-chip memory of state-of-the-art Field Programmable Gate Arrays (FPGAs), existing designs cannot support the large filter databases arising in backbone routers and intrusion detection systems. Hierarchical search structures for packet classification exhibit good memory performance and support quick rule updates. However, pipelined hardware implementation of these algorithms suffers from inefficient resource and memory usage due to variation in the size of the trie nodes and to backtracking. We propose a memory-efficient organization denoted Clustered Hierarchical Search Structure (CHSS) for packet classification. We present a clustering algorithm that partitions a given filter database to reduce the memory requirement. We show that, using the resulting structure, backtracking is not needed to perform a search. We introduce two parameters (NRtrie, NRtree), which can be chosen based on the given filter database to achieve good memory efficiency. Our algorithm demonstrates a substantial reduction in the memory footprint compared with the state of the art. For all publicly available filter databases, the achieved memory efficiency is between 21.54 and 41.25 bytes per rule. We map the proposed data structure onto a linear pipeline architecture to achieve high throughput. Post-place-and-route results using a state-of-the-art FPGA device show that the design can sustain a throughput of 408 million packets per second, or 130.5 Gbps (for the minimum packet size of 40 bytes).
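
To give a feel for what clustering a filter database by address bits means, the toy partitioner below groups rules by the first k bits of their source prefix and replicates rules whose prefix is shorter than k bits. This is a deliberately simplified illustration under assumed rule fields, not the CHSS clustering algorithm itself.

```python
def cluster_rules(rules, k=3):
    """Partition rules by the first k source-prefix bits so each cluster
    can be searched independently (e.g. in its own pipeline).  A rule
    whose source prefix is shorter than k bits matches several clusters
    and is replicated into each of them."""
    clusters = {}
    for rule in rules:
        prefix = rule["src_prefix"]            # specified bits only, e.g. "10"
        if len(prefix) >= k:
            stems = [prefix[:k]]
        else:
            pad = k - len(prefix)
            stems = [prefix + format(i, f"0{pad}b") for i in range(2 ** pad)]
        for stem in stems:
            clusters.setdefault(stem, []).append(rule)
    return clusters

# toy database: the second rule ("1*") is replicated into clusters 100..111
parts = cluster_rules([{"src_prefix": "1011"}, {"src_prefix": "1"}], k=3)
```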

Compact trie forest: Scalable architecture for IP lookup on FPGAs
2012 International Conference on Reconfigurable Computing and FPGAs, 2012
Memory efficiency with compact data structures for Internet Protocol (IP) lookup has recently regained much interest in the research community. In this paper, we revisit the classic trie-based approach to the longest prefix matching (LPM) problem used in IP lookup. Among existing implementation platforms, the Field Programmable Gate Array (FPGA) is a prevailing platform for SRAM-based pipelined architectures for high-speed IP lookup because of its abundant parallelism and other desirable features. However, due to the available on-chip memory and the number of I/O pins of FPGAs, state-of-the-art designs cannot support large routing tables consisting of over 350K prefixes in backbone routers. We propose a search algorithm and data structure denoted Compact Trie (CT) for IP lookup. Our algorithm demonstrates a substantial reduction in the memory footprint compared with state-of-the-art solutions. A parallel architecture on FPGAs, named Compact Trie Forest (CTF), is introduced to support the data structure. Along with pipelining techniques, our optimized architecture also employs multiple memory banks in each stage to further reduce memory and resource redundancy. Implementation on a state-of-the-art FPGA device shows that the proposed architecture can support large routing tables consisting of up to 703K IPv4 or 418K IPv6 prefixes. Post-place-and-route results show that our architecture can sustain a throughput of 420 million lookups per second (MLPS), or 135 Gbps for the minimum packet size of 40 bytes. This surpasses the worst-case 150 MLPS required by standardized 100GbE line cards.
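
The longest prefix matching problem itself has a compact software reference: a binary trie over prefix bits, walked bit by bit while remembering the deepest next-hop seen. The sketch below shows that baseline; the paper's Compact Trie and its forest organization compress and partition this structure for FPGA pipelines, which is not reproduced here.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = {}        # bit ('0' or '1') -> TrieNode
        self.next_hop = None      # set if a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for bit in prefix_bits:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop

def longest_prefix_match(root, addr_bits):
    """Walk the trie along the address bits, remembering the deepest
    next-hop encountered (the longest matching prefix)."""
    node, best = root, root.next_hop
    for bit in addr_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "A")          # prefix 10*   -> next hop A
insert(root, "1011", "B")        # prefix 1011* -> next hop B
assert longest_prefix_match(root, "10110000") == "B"
```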

Hierarchical hybrid search structure for high performance packet classification
2012 Proceedings IEEE INFOCOM, 2012
Hierarchical search structures for packet classification offer good memory performance and support quick rule updates when implemented on multi-core network processors. However, pipelined hardware implementation of these algorithms has two disadvantages: (1) backtracking, which requires stalling the pipeline, and (2) inefficient memory usage due to variation in the size of the trie nodes. We propose a clustering algorithm that can partition a given rule database into a fixed number of clusters to eliminate backtracking in state-of-the-art hierarchical search structures. Furthermore, we develop a novel ternary trie data structure (T∈). In the T∈ structure, the size of the trie nodes is fixed by utilizing the ∈-branch property, which overcomes the memory inefficiency problems in the pipelined hardware implementation of hierarchical search structures. We design a two-stage hierarchical search structure consisting of binary search trees in Stage 1 and T∈ structures in Stage 2. Our approach demonstrates a substantial reduction in the memory footprint compared with that of the state of the art. For all publicly available databases, the achieved memory efficiency is between 10.37 and 22.81 bytes of memory per rule; state-of-the-art designs only achieve a memory efficiency of over 23 bytes/rule even in the best case. We also propose an SRAM-based linear pipelined architecture for packet classification that achieves high throughput. Using a state-of-the-art FPGA, the proposed design can sustain a throughput of 418 million packets per second, or 134 Gbps (for the minimum packet size of 40 bytes). Additionally, our design maintains packet input order and supports in-place non-blocking rule updates.
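
One way to picture why fixing the trie-node size matters in a pipelined memory layout: every stage can then fetch one constant-width record per lookup. The sketch below packs nodes into fixed 12-byte records; the field layout is an assumption for illustration and not the T∈ encoding.

```python
import struct

NODE = struct.Struct("<IIB3x")    # left index, right index, rule id; 12 bytes

def pack_nodes(nodes):
    """Serialize (left, right, rule_id) trie nodes into fixed-width records
    so each pipeline stage reads exactly one constant-size memory word."""
    blob = bytearray()
    for left, right, rule_id in nodes:
        blob += NODE.pack(left, right, rule_id)
    return bytes(blob)

def read_node(blob, index):
    return NODE.unpack_from(blob, index * NODE.size)

memory = pack_nodes([(1, 2, 0), (0, 0, 7), (0, 0, 9)])
assert read_node(memory, 1) == (0, 0, 7)
```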

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012
Due to steadily increasing complexity, the design of embedded systems faces serious challenges. To meet these challenges, additional abstraction levels have been added to the conventional design flow, resulting in Electronic System Level (ESL) design. Besides abstraction, the focus in ESL during the development of a system moves from design to verification, i.e., checking whether or not the system works as intended becomes more and more important. However, at each abstraction level only the validity of certain properties is checked. Completeness, i.e., checking whether or not the entire behavior of the design has been verified, is usually not continuously checked. As a result, bugs may be found very late, causing expensive iterations across several abstraction levels. This delays the finalization of the embedded system significantly. In this work, we present the concept of Completeness-Driven Development (CDD). Based on suitable completeness measures, CDD ensures that the next step in the design process can only be entered once completeness at the current abstraction level has been achieved. This leads to early detection of bugs and accelerates the whole design process. The application of CDD is illustrated by means of an example.

2012 International Symposium on System on Chip (SoC), 2012
A huge effort is necessary to design and verify complex systems like Systems-on-Chip. Abstraction-based methodologies have been developed, resulting in Electronic System Level (ESL) design. A prominent language for ESL design is SystemC, offering different levels of abstraction, interoperability and the creation of very fast models for early software development. For the verification of SystemC models, Constrained Random Verification (CRV) plays a major role. CRV makes it possible to automatically generate simulation scenarios under the control of a set of constraints; the generated stimuli are thereby much more likely to hit corner cases. However, the existing SystemC Verification library (SCV), which provides CRV for SystemC models, has several deficiencies that limit the advantages of CRV. In this paper we present CRAVE, an advanced constrained random verification environment for SystemC. New dynamic features, enhanced usability and efficient constraint solving reduce the user effort and thus improve verification productivity.
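
The idea of constrained random stimulus generation can be demonstrated with a naive rejection-sampling loop, shown below. CRAVE itself relies on a constraint solver rather than rejection, so this is only a conceptual sketch with made-up field names.

```python
import random

def constrained_random(ranges, constraints, max_tries=10000, seed=None):
    """Naive constrained-random sketch: draw each field uniformly from its
    range and keep the assignment only if all constraints hold."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        stimulus = {name: rng.randint(lo, hi) for name, (lo, hi) in ranges.items()}
        if all(constraint(stimulus) for constraint in constraints):
            return stimulus
    raise RuntimeError("no satisfying stimulus found within max_tries")

# example: an 8-bit address that must be word-aligned and a non-zero length
stimulus = constrained_random(
    ranges={"addr": (0, 255), "len": (0, 255)},
    constraints=[lambda s: s["addr"] % 4 == 0, lambda s: s["len"] > 0],
    seed=1,
)
```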