Papers by Mahantesh Halappanavar

Large-scale L1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. High-performance algorithms and implementations are critical to efficiently solving these problems. Building upon previous work on coordinate descent algorithms for L1-regularized problems, we introduce a novel family of algorithms called block-greedy coordinate descent that includes, as special cases, several existing algorithms such as SCD, Greedy CD, Shotgun, and Thread-Greedy. We give a unified convergence analysis for the family of block-greedy algorithms. The analysis suggests that block-greedy coordinate descent can better exploit parallelism if features are clustered so that the maximum inner product between features in different blocks is small. Our theoretical convergence analysis is supported with experimental results using data from diverse real-world applications. We hope that the algorithmic approaches and convergence analysis we provide will not only advance the field, but will also encourage researchers to systematically explore the design space of algorithms for solving large-scale L1-regularization problems.
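The optimization step underlying all of these coordinate descent variants is a soft-thresholded coordinate update. As a point of reference only, here is a minimal serial sketch for the squared-loss (lasso) special case; it is not the block-greedy parallel scheme from the paper, and the function names, cyclic update order, and fixed iteration count are illustrative choices.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Serial cyclic coordinate descent for
    min_w 0.5 * ||X w - y||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)            # per-coordinate curvature ||X_j||^2
    resid = y - X @ w                        # current residual
    for _ in range(n_iters):
        for j in range(p):                   # a (block-)greedy variant would pick j differently
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid + col_sq[j] * w[j]   # gradient info with feature j removed
            w_new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (w[j] - w_new)          # incremental residual update
            w[j] = w_new
    return w
```

A greedy or block-greedy variant would replace the cyclic loop over j with a rule that selects, within each block, the coordinate promising the largest decrease, which is where the inter-block inner-product condition in the analysis comes into play.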
2015 IEEE International Symposium on Technologies for Homeland Security (HST), 2015
Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science - WORKS '15, 2015

Parallel Computing, 2014
We study multithreaded push-relabel-based algorithms for computing maximum cardinality matching in bipartite graphs. Matching is a fundamental combinatorial problem with applications in a wide variety of problems in science and engineering. We are motivated by its use in the context of sparse linear solvers for computing the maximum transversal of a matrix. Other applications can be found in many fields such as bioinformatics [4], scheduling [27], and chemical structure analysis [14]. We implement and test our algorithms on several multi-socket multicore systems and compare their performance to state-of-the-art augmenting-path-based serial and parallel algorithms using a test set comprising a wide range of real-world instances. Building on several heuristics for enhancing performance, we demonstrate good scaling for the parallel push-relabel algorithm and show that it is comparable to the best augmenting-path-based algorithms for bipartite matching. To the best of our knowledge, this is the first extensive study of multithreaded push-relabel-based algorithms. In addition to a direct impact on the applications using matching, the proposed algorithmic techniques can be extended to preflow-push-based algorithms for computing maximum flow in graphs.
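For context on the comparison drawn above, the sketch below implements the simple serial augmenting-path approach (Kuhn's algorithm) that push-relabel codes are typically benchmarked against, assuming an adjacency-list input; it is not the multithreaded push-relabel algorithm studied in the paper.

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Maximum cardinality bipartite matching via repeated augmenting
    paths (Kuhn's algorithm). adj[u] lists the right-side neighbours
    of left vertex u."""
    match_right = [-1] * n_right             # right vertex -> matched left vertex, or -1

    def try_augment(u, visited):
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                # v is free, or its current partner can be re-matched elsewhere
                if match_right[v] == -1 or try_augment(match_right[v], visited):
                    match_right[v] = u
                    return True
        return False

    size = 0
    for u in range(n_left):
        if try_augment(u, [False] * n_right):
            size += 1
    return size, match_right

# Tiny example: a perfect matching of size 3 exists.
adj = [[0, 1], [0], [1, 2]]
print(max_bipartite_matching(adj, 3, 3))     # (3, [1, 0, 2])
```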

2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Matching is an important combinatorial problem with a number of applications in areas such as community detection, sparse linear algebra, and network alignment. Since computing optimal matchings can be very time consuming, several fast approximation algorithms, both sequential and parallel, have been suggested. Common to the algorithms giving the best solutions is that they tend to be sequential by nature, while algorithms more suitable for parallel computation give solutions of lower quality. We present a new, simple 1/2-approximation algorithm for the weighted matching problem. This algorithm is both faster than any other suggested sequential 1/2-approximation algorithm on almost all inputs and, when parallelized, also scales better than previous multithreaded algorithms. We further extend this to a general scalable multithreaded algorithm that computes matchings of weight comparable to the best sequential deterministic algorithms. The performance of the suggested algorithms is documented through extensive experiments on different multithreaded architectures.
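To make the 1/2-approximation guarantee concrete, the sketch below shows the classic greedy scheme that scans edges in non-increasing weight order and keeps an edge whenever both endpoints are still free; any matching produced this way has at least half the optimal weight. This is only the textbook baseline, not the new algorithm proposed in the paper.

```python
def greedy_half_approx_matching(edges):
    """Classic greedy 1/2-approximation for maximum weight matching:
    scan edges in non-increasing weight order and keep an edge if both
    endpoints are still free. Runs in O(m log m) time for m edges."""
    matched = set()
    matching = []
    for w, u, v in sorted(edges, reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching

# Edges given as (weight, endpoint, endpoint).
edges = [(5.0, "a", "b"), (4.0, "b", "c"), (3.0, "c", "d")]
print(greedy_half_approx_matching(edges))    # [('a', 'b', 5.0), ('c', 'd', 3.0)]
```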
Partitioned global address space (PGAS) languages combine the convenient abstraction of shared memory with the notion of affinity, extending multi-threaded programming to large-scale systems with physically distributed memory. However, in spite of their obvious advantages, PGAS languages still lack appropriate tool support for performance analysis, one of the reasons why their adoption is still in its infancy. Some of the performance problems, for which tool support is needed, occur at the level of the ...
Proceedings of the 12th ACM International Conference on Computing Frontiers - CF '15, 2015
2014 21st International Conference on High Performance Computing (HiPC), 2014

2012 41st International Conference on Parallel Processing, 2012
Chordal graphs are triangulated graphs where any cycle longer than three is bisected by a chord. Many combinatorial optimization problems, such as computing the size of the maximum clique and the chromatic number, are NP-hard on general graphs but have polynomial-time solutions on chordal graphs. In this paper, we present a novel multithreaded algorithm to extract a maximal chordal subgraph from a general graph. We develop an iterative approach where each thread can asynchronously update a subset of edges that are dynamically assigned to it per iteration, and implement our algorithm on two different multithreaded architectures: the Cray XMT, a massively multithreaded platform, and AMD Magny-Cours, a shared-memory multicore platform. In addition to a proof of correctness, we present the performance of our algorithm using a test set of synthetic graphs with up to half a billion edges and real-world networks from gene correlation studies, and demonstrate that our algorithm achieves high scalability for all inputs on both types of architectures.
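As a plain serial illustration of what a maximal chordal subgraph is (and nothing like the asynchronous multithreaded algorithm described above), the sketch below inserts edges one at a time, keeps an edge only if the subgraph stays chordal, and repeats passes until no remaining edge can be added. It assumes a simple undirected graph and uses networkx only for its chordality test.

```python
import networkx as nx   # used only for nx.is_chordal

def maximal_chordal_subgraph(G):
    """Grow a maximal chordal subgraph of a simple undirected graph G:
    try to insert each edge and keep it only if the subgraph stays
    chordal; repeat passes until no remaining edge can be added."""
    H = nx.Graph()
    H.add_nodes_from(G.nodes)
    remaining = list(G.edges)
    changed = True
    while changed:
        changed = False
        still_out = []
        for u, v in remaining:
            H.add_edge(u, v)
            if nx.is_chordal(H):
                changed = True               # the edge stays
            else:
                H.remove_edge(u, v)          # would create a chordless cycle
                still_out.append((u, v))
        remaining = still_out
    return H

# A 4-cycle is not chordal; one of its edges must be dropped.
C4 = nx.cycle_graph(4)
print(maximal_chordal_subgraph(C4).number_of_edges())   # 3
```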
2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012
Journal of Parallel and Distributed Computing, 2014
A graph is chordal if every cycle of length greater than three contains a chord, an edge connecting two vertices that are not adjacent on the cycle. Chordal graphs are of interest both theoretically, since they admit polynomial-time solutions to a range of NP-hard graph problems, and practically, since they arise in many applications including sparse linear algebra, computer vision, and computational biology. A maximal chordal subgraph is a chordal subgraph that is not a proper subgraph of any other chordal subgraph. Existing algorithms for computing maximal chordal subgraphs depend on dynamically ordering the vertices, which is an inherently sequential process and therefore limits the algorithms' parallelizability.
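The snippet below, purely illustrative and assuming networkx, shows the practical payoff of chordality on a tiny hand-built chordal graph: the chordality test and the enumeration of maximal cliques (and hence the maximum clique size) both run in polynomial time.

```python
import networkx as nx

# Two triangles sharing the edge (2, 3); the only 4-cycle, 1-2-4-3-1,
# has the chord (2, 3), so the graph is chordal.
G = nx.Graph([(1, 2), (2, 3), (1, 3), (2, 4), (3, 4)])

print(nx.is_chordal(G))                      # True
cliques = nx.chordal_graph_cliques(G)        # maximal cliques of a chordal graph
print(max(len(c) for c in cliques))          # 3: maximum clique size, in polynomial time
```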
ISCA International Conference on Parallel and Distributed Computing Systems, 2008
2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Proceedings of the 3rd International Workshop on High Performance Computing, Networking and Analytics for the Power Grid - HiPCNA-PG '13, 2013
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and on spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids, we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques, we make a case for exploiting structure inherent in the data, with implications for several domains including power systems.
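As a generic illustration of the spectral techniques mentioned above (not the paper's method or data), the sketch below clusters a synthetic, block-structured bus-affinity matrix with scikit-learn's SpectralClustering; in practice the affinity would come from electrical distance or from correlations of synchrophasor time series.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical affinity between buses: three groups that are strongly
# coupled internally and only weakly coupled across groups.
rng = np.random.default_rng(0)
n_groups, group_size = 3, 4
n_buses = n_groups * group_size
A = rng.uniform(0.0, 0.1, size=(n_buses, n_buses))                   # weak background coupling
for g in range(n_groups):
    s = slice(g * group_size, (g + 1) * group_size)
    A[s, s] = rng.uniform(0.8, 1.0, size=(group_size, group_size))   # strong in-group coupling
A = (A + A.T) / 2.0
np.fill_diagonal(A, 0.0)

model = SpectralClustering(n_clusters=3, affinity="precomputed", random_state=0)
labels = model.fit_predict(A)    # one cluster label per bus
print(labels)
```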
