Papers by Sridhar Radhakrishnan
2012 International Conference on Computing, Networking and Communications (ICNC), 2012
Digital television systems have a clear disadvantage relative to analog systems in users' quality of experience, most notably in the time required to change channels, or zap time. The goal of this research is to improve the performance of a multicasting IPTV network, both in user experience and in resource consumption. We formulate the problem of assigning IPTV clients to servers as an integer programming model, in variants which minimize channel-change times, overall network capacity consumption, or both. This problem is shown to be computationally hard, and the performance of the models is tested on problems of different sizes. Polynomial-time heuristics are presented which address a relaxed version of the problem, and the performance of these heuristics is measured.

Studying properties of graphs is essential to various applications, and the recent growth of online social networks has spurred interest in analyzing their structures using Graphical Processing Units (GPUs). Utilizing the faster shared memory available on GPUs has provided tremendous speed-up for many general-purpose problems. However, when the data required for processing is large and must be stored in global memory instead of shared memory, simultaneous memory accesses by executing threads become the bottleneck for achieving higher throughput. In this paper, we propose and evaluate techniques for storing large graphs that efficiently utilize the different levels of the GPU memory hierarchy, with the focus on the larger global memory. Given a graph G = (V, E), we provide an algorithm to count the number of triangles in G while storing the adjacency information in global memory. Our computation techniques and the data structure for retrieving the adjacency information are derived from processing the breadth-first-search tree of the input graph. Techniques to generate combinations of nodes for testing the properties of the subgraphs they induce are also discussed in detail. Our methods can be extended to solve other combinatorial counting problems on graphs, such as finding the number of connected subgraphs of size k, the number of cliques (resp. independent sets) of size k, and related problems for large data sets. In the context of the triangle counting algorithm, we analyze and utilize primitives such as memory access coalescing and avoiding partition camping, which offset the increased access latency of the slower but larger global memory. Our experimental results for the GPU implementation show at least a 10-fold speedup for triangle counting over the CPU counterpart.
A further 6-8% increase in performance is obtained by utilizing the above-mentioned primitives, compared with the naïve GPU implementation of the program.
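For readers who want a concrete CPU baseline for the triangle-counting problem above, the following is a minimal sketch of the standard node-iterator method over adjacency sets. It assumes an undirected simple graph with nodes numbered 0..n-1 and given as an edge list. It is not the authors' BFS-tree-based GPU storage scheme, and the function name `count_triangles` is hypothetical.

```python
from itertools import combinations


def count_triangles(n, edges):
    """Count triangles in an undirected simple graph with nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    count = 0
    for u in range(n):
        # Only pair up neighbors with larger ids, so each triangle u < v < w
        # is counted exactly once, at its smallest node u.
        higher = sorted(w for w in adj[u] if w > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                count += 1
    return count


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(4, k4))  # 4: the complete graph K4 has four triangles
```

Restricting each node to its higher-numbered neighbors is what avoids counting the same triangle three times; on a GPU, the same per-node loop is a natural unit of parallel work.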
... and Delay Variation Constraints. Yuh-Rong Chen, Sridhar Radhakrishnan, Sudarshan K. Dhall, School of Computer Science, University of Oklahoma, {yrchern, sridhar, sdhall}@ou.edu; Suleyman Karabuk, School of Industrial Engineering, University of Oklahoma ...
Auerbach Publications eBooks, Mar 28, 2003
... 13. Chandran, K. et al., A feedback based scheme for improving TCP performance in ad hoc networks, IEEE Personal Communication Systems Magazine, Special issue on Ad Hoc Networks, 8 (1), 34-39, 2001. 14. Ahuja ...

The availability and utility of large numbers of Graphical Processing Units (GPUs) have enabled parallel computations using extensive multi-threading. Sequential access to global memory and contention at the size-limited shared memory have been the main impediments to fully exploiting the potential performance of architectures with a massive number of GPUs. We propose novel memory storage and retrieval techniques that enable parallel graph computations to overcome these issues. More specifically, given a graph G = (V, E) and an integer k ≤ |V|, we provide both storage techniques and algorithms to count the number of: a) connected subgraphs of size k; b) cliques of size k; and c) independent sets of size k, all of which can be exponential in number. Our storage technique is based on creating a breadth-first search tree and storing it along with the non-tree edges in a novel way. The counting problems mentioned above have many uses, including the analysis of social networks.
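The three counting problems can be made precise with a brute-force enumerator that tests every k-node subset of the graph. The sketch below is exponential in k and only pins down the definitions. It does not reproduce the authors' BFS-tree storage or their GPU parallelization, and the name `count_k_subsets` is hypothetical. It assumes an undirected simple graph with nodes 0..n-1.

```python
from itertools import combinations


def count_k_subsets(n, edges, k):
    """Return (connected, cliques, independent) counts over all k-node subsets.

    Brute force over all C(n, k) subsets of an undirected graph with nodes 0..n-1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def is_connected(nodes):
        # Depth-first search restricted to the subgraph induced by `nodes`.
        members = set(nodes)
        start = nodes[0]
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u] & members:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == len(members)

    connected = cliques = independent = 0
    for subset in combinations(range(n), k):
        pairs = list(combinations(subset, 2))
        edge_count = sum(1 for u, v in pairs if v in adj[u])
        if edge_count == len(pairs):  # every pair adjacent: a k-clique
            cliques += 1
        if edge_count == 0:  # no pair adjacent: an independent set of size k
            independent += 1
        if is_connected(subset):
            connected += 1
    return connected, cliques, independent


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]  # the path 0-1-2-3
    print(count_k_subsets(4, path, 3))  # (2, 0, 0)
```

Each subset is tested independently of the others, which is why this kind of enumeration parallelizes well; the hard part the abstract addresses is generating the subsets and fetching adjacency data efficiently from memory.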
Springer eBooks, 2014
Using Graphics Processing Units (GPUs) to solve general-purpose problems has received significant attention in both academia and industry. Harnessing the power of these devices, however, requires knowledge of the underlying architecture and the programming model. In this paper, we develop analytical models to predict the performance of GPUs for computationally intensive tasks. Our models are based on varying the relevant parameters (including the total number of threads, the number of blocks, and the number of streaming multiprocessors) and predicting the performance of a program for a specified instance of these parameters. The approach can be used in heterogeneous environments where distinct types of GPU devices with different hardware configurations are employed.
Improving latency is the key to a successful online game-playing experience. Using multiple servers along with a well-provisioned network makes it possible to reduce latency. Given a network of servers, game clients, and a desired delay bound, we have designed algorithms to determine a subnetwork of servers of minimal cardinality. We have considered the cases
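The abstract is truncated before the algorithms are described. To illustrate only the underlying selection problem, the sketch below treats it as set cover: a server "covers" a client if the client-to-server delay is within the bound, and servers are chosen greedily by how many still-uncovered clients they cover. This simplification ignores server-to-server links and the structure of the subnetwork, and the names `greedy_server_cover` and `delay` are hypothetical; it is not the authors' method. Greedy set cover is only guaranteed to be within a logarithmic factor of the minimum, not exactly minimal.

```python
def greedy_server_cover(delay, bound):
    """Greedily pick servers until every client has a server within `bound`.

    delay: dict mapping server -> {client: delay}. Returns the list of chosen
    servers, or None if some client cannot be served within the bound.
    """
    clients = {c for row in delay.values() for c in row}
    # For each server, the set of clients it can serve within the delay bound.
    covers = {s: {c for c, d in row.items() if d <= bound} for s, row in delay.items()}

    uncovered = set(clients)
    chosen = []
    while uncovered:
        # Pick the server that serves the most still-uncovered clients
        # (ties go to the first such server in dict order).
        best = max(covers, key=lambda s: len(covers[s] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            return None  # remaining clients are out of reach of every server
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    delay = {
        "s1": {"c1": 10, "c2": 50, "c3": 10},
        "s2": {"c1": 60, "c2": 20, "c3": 70},
        "s3": {"c1": 20, "c2": 25, "c3": 90},
    }
    print(greedy_server_cover(delay, 30))  # ['s1', 's2']
```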

Computer Networks, Oct 1, 2013
Multicasting is an efficient way to deliver multimedia content (for instance, streaming media) to different locations in the network. While end-to-end real-time constraints are important for interactive applications, sustained availability of bandwidth to the destinations is more important for multimedia streaming. In this research, we address the multi-stream multi-source multicast routing problem (MMMRP), where each data stream can be served by multiple sources and each source can serve multiple data streams in a sustained manner. The goal of MMMRP is to construct a routing forest for each data stream and its destinations while maximizing the residual bandwidth, that is, the bandwidth still available after all destinations have been served their desired streams. We show that the problem is NP-hard and provide an Integer Programming formulation together with an efficient heuristic algorithm (MMForests) based on the widest-path algorithm. Our empirical evaluations show that MMForests constructs the multicast routing trees quickly while keeping the residual bandwidth close to the optimum.
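MMForests is described as building on the widest-path algorithm. As a reminder of that building block only, the sketch below computes widest (maximum-bottleneck) paths from one source with a Dijkstra-style search that takes the minimum bandwidth along a path instead of a sum and prefers larger values. It does not build routing forests, handle multiple streams or sources, or subtract consumed bandwidth. The adjacency-dict representation and the name `widest_paths` are assumptions; links are directed, so an undirected link must be listed in both directions.

```python
import heapq


def widest_paths(graph, source):
    """Maximum-bottleneck paths from `source`.

    graph: dict node -> list of (neighbor, bandwidth) with positive bandwidths.
    Returns (width, parent): width[v] is the largest achievable minimum link
    bandwidth on a path source -> v; parent[v] is v's predecessor on that path.
    """
    width = {source: float("inf")}
    parent = {source: None}
    heap = [(-width[source], source)]  # max-heap via negated widths
    done = set()
    while heap:
        neg_w, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, bw in graph.get(u, []):
            cand = min(-neg_w, bw)  # bottleneck if we reach v through u
            if cand > width.get(v, 0):
                width[v] = cand
                parent[v] = u
                heapq.heappush(heap, (-cand, v))
    return width, parent


if __name__ == "__main__":
    g = {"s": [("a", 10), ("b", 5)], "a": [("t", 3)], "b": [("t", 4)]}
    width, parent = widest_paths(g, "s")
    print(width["t"], parent["t"])  # 4 b
```

Here the route through `b` wins even though `s`-`a` is the fattest first hop, because the path bandwidth is limited by its narrowest link.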

The efficiency of a multi-core architecture is directly related to the mechanisms that map threads (processes in execution) to cores. Determining the CPU resource availability of a multi-core architecture based on the characteristics of the threads in execution is the art of system performance prediction. Prediction of CPU resource availability is important for making process assignment, load balancing, and scheduling decisions. In distributed infrastructures, CPU resources are allocated on demand for a chosen set of compute nodes. In this paper, a prediction model is derived for multi-core architectures, and empirical evaluations with real-world benchmark programs in a heterogeneous environment demonstrate the accuracy of the proposed model. This model can be used in various time-sensitive applications, such as resource allocation in a cloud environment and task distribution (determining the order for faster processing time) in distributed systems, among others.

2013 International Conference on Computing, Networking and Communications (ICNC), 2013
Broadcast television viewing over the Internet (IPTV) is becoming commonplace. Multicasting trees serve as an efficient mechanism to deliver streaming data, as each internal node duplicates the packets it receives and sends them along to its children, which eventually deliver them to the clients. Given a set of multicasting trees whose roots are servers capable of broadcasting a set of distinct channels, and a set of clients (which are not part of the multicasting trees), each with a set of requested channels, our goal is to determine, for each client and each of its channel requests, a node (contact node) in the multicast tree that serves the channel. The contact nodes are determined so that certain optimization constraints are taken into consideration and satisfied. We provide Integer Programming (IP) models and heuristics to find these contact nodes so as to optimize constraints on zap time and bandwidth utilization. The proposed IP model is novel, and the polynomial-time heuristics provide fairly good solutions in a short amount of time.
Auerbach Publications eBooks, Mar 28, 2003

Computers & Operations Research, May 1, 2015
Consider a private network of geographically dispersed computers with fast, high-capacity connections, and an Internet application session, such as a massively multiplayer online game, with a server and a set of clients. We refer to the former as a service overlay network (SON) and assume that it can be connected to the Internet. The problem is to decide how to configure and utilize the SON in support of this application such that the clients' speed of communication with the server meets given communication performance requirements. We provide an Integer Programming formulation of this problem and prove that it is NP-hard. To solve the problem within the strict computational time requirements of actual applications, we develop a solution framework based on partitioning and enumerating the solution space into smaller subproblems, one or more of which contains an optimal solution. Within this framework, we develop and successfully test an optimality-seeking exact algorithm and a fast polynomial-time heuristic algorithm. The exact algorithm establishes the problem sizes that can be solved optimally, whereas the heuristic algorithm establishes the size of instances solvable in a real application.

International Journal of Computational Science and Engineering, 2017
Techniques for predicting the efficiency of multi-core processing for a set of tasks with varied CPU and main memory requirements are introduced. Prediction of CPU and memory availability is important for making process assignment, load balancing, and scheduling decisions in distributed systems. Given a set of tasks, each with varied CPU and main memory requirements, and a multi-core system (which generally has fewer cores than the number of tasks), we provide upper- and lower-bound models (formulas) for the efficiency with which the tasks are executed. In addition, a model for average CPU availability is derived from the empirical study for applications that require a single predicted value instead of bounds. To facilitate scientific and controlled empirical evaluation, real-world benchmark programs with dynamic behaviour (CPU and memory requirements that change within short intervals), parameterised by their CPU usage factor and memory requirement, are employed on UNIX systems.

Computer Communications, 2003
Routing in the Internet is based on the best-effort mechanism, wherein routers generally forward packets so as to minimize the number of hops to the destination. Furthermore, all packets of a given type are treated the same regardless of their size. We propose the NetLets framework to enable applications to send data packets to the destination with certain guarantees on end-to-end delay. NetLets employ in situ instruments to measure the effective bandwidth and propagation delays on the links, and compute the paths with the minimum measured end-to-end delay for data packets of various sizes. In experiments over local area networks, the paths selected by NetLets indeed achieved the minimum end-to-end delay, and our method outperformed the best-effort mechanism based on hop count. We also describe an implementation of NetLets over the Internet to illustrate their viability for wide-area networks.
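Why the best path can depend on packet size is easiest to see under a simple per-link delay model. The sketch below assumes that a packet of size s crossing a link with propagation delay d and effective bandwidth b takes d + s/b, summed over hops, and runs Dijkstra's algorithm on that size-dependent weight. This delay model, the dict-of-lists topology, and the name `min_delay_path` are assumptions for illustration, not the NetLets estimator or implementation.

```python
import heapq


def min_delay_path(links, src, dst, size):
    """Minimum-delay path for a packet of `size` bytes under a d + s/b link model.

    links: dict node -> list of (neighbor, prop_delay_seconds, bandwidth_bytes_per_s).
    Returns (delay_seconds, [src, ..., dst]), or (inf, []) if dst is unreachable.
    """
    dist = {src: 0.0}
    parent = {src: None}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # first time dst is popped, its delay is final
        if d > dist[u]:
            continue  # stale heap entry
        for v, prop, bw in links.get(u, []):
            nd = d + prop + size / bw  # propagation + transmission on this link
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return dist[dst], path[::-1]


if __name__ == "__main__":
    # A one-hop low-bandwidth link versus a two-hop high-bandwidth route.
    links = {
        "A": [("D", 0.001, 1e5), ("B", 0.010, 1e7)],
        "B": [("D", 0.010, 1e7)],
    }
    print(min_delay_path(links, "A", "D", 100)[1])     # ['A', 'D']
    print(min_delay_path(links, "A", "D", 100000)[1])  # ['A', 'B', 'D']
```

In this toy topology the one-hop route (what hop-count routing would always pick) is fastest for small packets, while large packets are faster over the two-hop, high-bandwidth route.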

Theoretical Computer Science
The problem of finding a rectilinear minimum bend path (RMBP) between two designated points inside a rectilinear polygon has applications in robotics and motion planning. In this paper, we present efficient algorithms to solve the query version of the RMBP problem for special classes of rectilinear polygons, given their visibility graphs. Specifically, given an unweighted graph G = (V, E) with |V| = N and |E| = M, we present algorithms that preprocess G in linear time and space such that shortest-distance queries (queries asking for the distance between any pair of nodes in the graph) can be answered in constant time and space. For a chordal graph G, our algorithms give a distance that is at most one away from the actual shortest distance. When G is a K-chordal graph, our algorithm produces an exact shortest distance in O(K) time. We also present a non-trivial parallel implementation of the sequential preprocessing algorithm for the CREW-PRAM model, which runs in O(log² N) time using O(N + M) processors. After preprocessing, queries can be answered in constant time using a single processor.

Ad Hoc Networks, 2016
Mobile RFID tag reading on a conveyor belt is a practical scenario widely used in the supply chain industry. Typically, in RFID tag-reading MAC protocols, the time the protocol takes to read all the tags is directly proportional to the number of tags. This creates a natural scalability problem: when the number of mobile tags is large, a protocol initiated and coordinated by a single reader will not be able to read all the tags before they move out of the reader's range. In this paper, we tackle this problem by showing how a large number of tags moving on a conveyor belt can be read using a tandem of communicating readers placed along the axis of the conveyor belt. Tags left unread by one reader can be read by the next reader in the sequence. Rather than restarting the protocol at the next reader to read the unread tags, that reader can use information from the previous reader (which we call information sharing) to improve protocol performance and hence reduce the reading time. This is made possible by advanced RFID technologies offering enhanced tag persistence times: tags can now preserve their states in 'power-off' conditions over significantly longer distances along the conveyor belt. We show that information sharing significantly enhances tag-reading performance (in terms of the number of tag reads) compared with traditional tag-reading protocols. In our experiments, we consider the tandem reader arrangement with information sharing for ALOHA, Tree, and two different combinations of ALOHA and Tree (a.k.a. hybrid) protocols. Our performance evaluation, corroborated by extensive simulation results, shows that these protocols augmented with our novel information-sharing frameworks outperform their respective primitive 'as-is' counterparts.
Optimal Control of Treatment Costs for Internet Worm. Jonghyun Kim, Sridhar Radhakrishnan, Sudarshan K. Dhall, School of Computer Science, University of Oklahoma, Norman, Oklahoma, USA ... For the worm propagation model, we apply the classical SIS model ...
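The snippet names the classical SIS (susceptible-infected-susceptible) model. A minimal forward-Euler sketch of its standard form, dI/dt = βSI/N - γI with S = N - I, is below; γ here simply stands for the rate at which infected hosts are cleaned and become susceptible again. The function name `simulate_sis` and its parameters are illustrative assumptions, and the paper's optimal control of treatment costs is not modeled. For β > γ the infected population approaches the endemic level N(1 - γ/β); for β < γ the worm dies out.

```python
def simulate_sis(N, beta, gamma, I0, t_end, dt=0.01):
    """Forward-Euler integration of the SIS model dI/dt = beta*S*I/N - gamma*I.

    N: total hosts, beta: infection rate, gamma: recovery (cleanup) rate,
    I0: initially infected hosts. Returns the number infected at time t_end.
    """
    I = float(I0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        S = N - I  # recovered hosts return to the susceptible pool
        I += dt * (beta * S * I / N - gamma * I)
    return I


if __name__ == "__main__":
    # beta > gamma: the worm persists near N * (1 - gamma / beta) = 800 hosts.
    print(round(simulate_sis(1000, 0.5, 0.1, 1, 200)))  # 800
```

The step size should stay small relative to 1/β and 1/γ; the explicit Euler update is used only to keep the sketch short.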