Papers by Karthik Bharath

arXiv (Cornell University), Jun 29, 2021
Persistence landscapes are functional summaries of persistence diagrams designed to enable analysis of the diagrams using tools from functional data analysis. They comprise a collection of scalar functions such that birth and death times of topological features in persistence diagrams map to extrema of functions and intervals where they are non-zero. As a consequence, variation in persistence diagrams is encoded in both amplitude and phase components of persistence landscapes. Through functional data analysis of persistence landscapes, under an elastic Riemannian metric, we show how meaningful statistical summaries of persistence landscapes (e.g., mean, dominant directions of variation) can be obtained by decoupling their amplitude and phase variations. This decoupling is achieved via optimal alignment, with respect to the elastic metric, of the persistence landscapes. The estimated phase functions are tied to the resolution parameter that determines the filtration of simplicial complexes used to construct persistence diagrams. For a dataset obtained under geometric, scale and sampling variabilities, the phase function prescribes an optimal rate of increase of the resolution parameter for enhancing the topological signal in a persistence diagram. The proposed approach adds substantially to the statistical analysis of data objects with rich structure compared to past studies. In particular, we focus on two sets of data that have been analyzed in the past, brain artery trees and images of prostate cancer cells, and show that separation of amplitude and phase of persistence landscapes is beneficial in both settings.
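As a rough illustration of the construction described in the abstract above, the sketch below evaluates the standard persistence landscape functions on a grid: each (birth, death) pair contributes a tent function, and the k-th landscape is the k-th largest tent value at each grid point. The function name `persistence_landscape`, its arguments, and the toy diagram in the usage note are assumptions made for illustration and are not taken from the paper; the elastic alignment step is not shown.

```python
import numpy as np

def persistence_landscape(diagram, grid, k_max):
    """Evaluate the first k_max landscape functions on `grid`.

    diagram: iterable of (birth, death) pairs with birth < death.
    Returns an array of shape (k_max, len(grid)).
    """
    diagram = np.asarray(diagram, dtype=float)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    # One tent per feature: zero outside (birth, death), peak at the midpoint.
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births,
                                       deaths - grid[None, :]))
    # At each grid point, order the tent values from largest to smallest.
    tents_sorted = -np.sort(-tents, axis=0)
    out = np.zeros((k_max, grid.size))
    n = min(k_max, tents_sorted.shape[0])
    out[:n] = tents_sorted[:n]  # landscapes beyond the feature count stay zero
    return out
```

For a hypothetical diagram with pairs (0, 2) and (1, 3), `persistence_landscape([(0, 2), (1, 3)], [0, 1, 1.5, 2, 3], k_max=2)` gives a first landscape with peaks at 1 and 2, which shows how birth and death times determine both the height (amplitude) and the location (phase) of the extrema.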
Variograms for kriging and clustering of spatial functional data with phase variation
Spatial Statistics

Word embeddings are commonly obtained as optimizers of a criterion function f of a text corpus, but assessed on word-task performance using a different evaluation function g of the test data. We contend that a possible source of disparity in performance on tasks is the incompatibility between classes of transformations that leave f and g invariant. In particular, word embeddings defined by f are not unique; they are defined only up to a class of transformations to which f is invariant, and this class is larger than the class to which g is invariant. One implication of this is that the apparent superiority of one word embedding over another, as measured by word task performance, may largely be a consequence of the arbitrary elements selected from the respective solution sets. We provide a formal treatment of the above identifiability issue, present some numerical examples, and discuss possible resolutions.

Accurate identification of synergistic treatment combinations and their underlying biological mechanisms is critical across many disease domains, especially cancer. In translational oncology research, preclinical systems such as patient-derived xenografts (PDX) have emerged as a unique study design evaluating multiple treatments administered to samples from the same human tumor implanted into genetically identical mice. In this paper, we propose a novel Bayesian probabilistic tree-based framework for PDX data to investigate the hierarchical relationships between treatments by inferring treatment cluster trees, referred to as treatment trees (Rx-tree). The framework motivates a new metric of mechanistic similarity between two or more treatments accounting for inherent uncertainty in tree estimation; treatments with a high estimated similarity have potentially high mechanistic synergy. Building upon Dirichlet Diffusion Trees, we derive a closed-form marginal likelihood encoding the tre...
A histogram estimate of the Radon-Nikodym derivative of a probability measure with respect to a dominating measure is developed for binary sequences in $\{0, 1\}^N$. A necessary and sufficient condition for the consistency of the estimate in the mean-square sense is given. It is noted that the product topology on $\{0, 1\}^N$ and the corresponding dominating product measure pose considerable restrictions on the rate of sampling required for the requisite convergence.

Biocomputing 2018, 2017
Solid lesions emerge within diverse tissue environments, making their characterization and diagnosis a challenge. With the advent of cancer radiomics, a variety of techniques have been developed to transform images into quantifiable feature sets producing summary statistics that describe the morphology and texture of solid masses. Relying on empirical distribution summaries as well as grey-level co-occurrence statistics, several approaches have been devised to characterize tissue density heterogeneity. This article proposes a novel decision-tree based approach which quantifies the tissue density heterogeneity of a given lesion through its resultant distribution of tree-structured dissimilarity metrics computed with least common ancestor trees under repeated pixel re-sampling. The methodology, based on statistics derived from Galton-Watson trees, produces metrics that are minimally correlated with existing features, adding new information to the feature space and improving quantitative characterization of the extent to which a CT image conveys heterogeneous density distribution. We demonstrate its practical application through a diagnostic study of adrenal lesions. Integrating the proposed features with existing features identifies classifiers of three important lesion types: malignant from benign (AUC = 0.78), functioning from non-functioning (AUC = 0.93) and calcified from non-calcified (AUC = 1).

arXiv: Statistics Theory, 2017
We propose a flexible sampling method for warp maps used in continuous monotone pairwise alignment of open and closed curves, possibly with landmark constraints. Using the point process machinery, we conduct a detailed study of the sampling method and demonstrate that it prescribes a distribution on the set of warp maps of $[0,1]$ and the unit length circle $\mathbb{S}^1$. The distribution (1) possesses the desiderata for decomposition of the alignment problem with landmark constraints into multiple unconstrained ones, and (2) can be centered at a desired warp map. It is based on random partitions of $[0,1]$ and $\mathbb{S}^1$ and contains a global regularization parameter, both of which enable the sampling of a rich class of warp maps. The distribution can be related to the Dirichlet process on the set of probability measures. Practical utility of the sampling method is demonstrated through (1) a novel stochastic variational algorithm, and (2) a Bayesian model for alignment, for cl...
Physics Letters A, 2021
We propose a method to define quasiprobability distributions for general spin-j systems of dimension $n = 2j + 1$, where n is a prime or power of a prime. The method is based on a complete set of orthonormal commuting operators related to Mutually Unbiased Bases which enable (i) a parameterisation of the density matrix and (ii) construction of measurement operators that can be physically realised. As a result we geometrically characterise the set of states for which the quasiprobability distribution is non-negative, and can be viewed as a joint distribution of classical random variables assuming values in a finite set of outcomes. The set is an $(n^2 - 1)$-dimensional convex polytope with $n + 1$ vertices as the only pure states, $n^{n+1}$ higher-dimensional faces, and $n^3(n + 1)/2$ edges.

Handbook of Variational Methods for Nonlinear Geometric Data, 2020
In this chapter, we describe several biomedical applications of geometric functional data analysis methods for modeling probability density functions, amplitude and phase components in functional data, and shapes of curves and surfaces. We begin by reviewing parameterization-invariant Riemannian metrics and corresponding simplifying square-root transforms for each case. These tools allow for computationally efficient implementations of statistical procedures on the appropriate representation spaces, including computation of the Karcher mean and exploration of variability via principal component analysis. We then showcase applications of these tools in multiple biomedical case studies based on various datasets including Glioblastoma Multiforme tumors, Diffusion Tensor Magnetic Resonance Image-based white matter tracts and fractional anisotropy functions, electrocardiogram signals, endometrial tissue surfaces and subcortical surfaces in the brain.

WIREs Computational Statistics, 2020
Proliferation of high-resolution imaging data in recent years has led to substantial improvements in the two popular approaches for analyzing shapes of data objects based on landmarks and/or continuous curves. We provide an expository account of elastic shape analysis of parametric planar curves representing shapes of two-dimensional (2D) objects by discussing its differences, and its commonalities, to the landmark-based approach. Particular attention is accorded to the role of reparameterization of a curve, which in addition to rotation, scaling and translation, represents an important shape-preserving transformation of a curve. The transition to the curve-based approach moves the mathematical setting of shape analysis from finite-dimensional non-Euclidean spaces to infinite-dimensional ones. We discuss some of the challenges associated with the infinite-dimensionality of the shape space, and illustrate the use of geometry-based methods in the computation of intrinsic statistical s...

Journal of the American Statistical Association, 2019
We propose a novel Riemannian geometric framework for variational inference in Bayesian models based on the nonparametric Fisher-Rao metric on the manifold of probability density functions. Under the square-root density representation, the manifold can be identified with the positive orthant of the unit hypersphere $S^\infty$ in $L^2$, and the Fisher-Rao metric reduces to the standard $L^2$ metric. Exploiting such a Riemannian structure, we formulate the task of approximating the posterior distribution as a variational problem on the hypersphere based on the α-divergence. This provides a tighter lower bound on the marginal distribution when compared to, and a corresponding upper bound unavailable with, approaches based on the Kullback-Leibler divergence. We propose a novel gradient-based algorithm for the variational problem based on Fréchet derivative operators motivated by the geometry of $S^\infty$, and examine its properties. Through simulations and real data applications, we demonstrate the utility of the proposed geometric framework and algorithm on several Bayesian models.
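To make the square-root density representation in the abstract above concrete, the following sketch discretises two densities on a common grid, maps them to the unit sphere in $L^2$ by taking square roots, and computes the arc length between them. It assumes the convention in which the Fisher-Rao metric coincides with the $L^2$ metric (some texts include an extra constant factor). The helper names `_trapezoid`, `sqrt_density` and `fisher_rao_distance` are hypothetical and not taken from the paper, and the variational algorithm itself is not shown.

```python
import numpy as np

def _trapezoid(y, x):
    # Trapezoidal rule, written out to avoid depending on a NumPy version.
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def sqrt_density(values, grid):
    """Normalise non-negative values to a density on `grid`, then take the
    square root so that the result has unit L2 norm (up to quadrature error)."""
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    density = values / _trapezoid(values, grid)
    return np.sqrt(density)

def fisher_rao_distance(p_values, q_values, grid):
    """Arc length on the unit sphere between two square-root densities."""
    grid = np.asarray(grid, dtype=float)
    psi_p = sqrt_density(p_values, grid)
    psi_q = sqrt_density(q_values, grid)
    inner = _trapezoid(psi_p * psi_q, grid)
    # Clip guards against rounding pushing the inner product outside [-1, 1].
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))
```

Because square-root densities are non-negative, the inner product lies in [0, 1] and the distance lies in [0, π/2]; densities with disjoint supports sit at the maximal distance π/2.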

Journal of the Royal Statistical Society. Series C, Applied statistics, 2018
We propose a curve-based Riemannian geometric approach for general shape-based statistical analyses of tumours obtained from radiologic images. A key component of the framework is a suitable metric that enables comparisons of tumour shapes, provides tools for computing descriptive statistics and implementing principal component analysis on the space of tumour shapes and allows for a rich class of continuous deformations of a tumour shape. The utility of the framework is illustrated through specific statistical tasks on a data set of radiologic images of patients diagnosed with glioblastoma multiforme, a malignant brain tumour with poor prognosis. In particular, our analysis discovers two patient clusters with very different survival, subtype and genomic characteristics. Furthermore, it is demonstrated that adding tumour shape information to survival models containing clinical and genomic variables results in a significant increase in predictive power.

Physical Review A, 2017
The study of N-qubit mixed symmetric separable states is a long-standing, challenging problem, as there exists no unique separability criterion. In this regard, we take up the N-qubit mixed symmetric separable states for a detailed study, as these states are of experimental importance and offer elegant mathematical analysis since the dimension of the Hilbert space reduces from $2^N$ to $N + 1$. Since there exists a one-to-one correspondence between a spin-j system and an N-qubit symmetric state, we employ Fano statistical tensor parameters for the parametrization of the spin density matrix. Further, we use the geometric multiaxial representation (MAR) of the density matrix to characterize the mixed symmetric separable states. Since the separability problem is NP-hard, we choose to study it in the continuum limit, where mixed symmetric separable states are characterized by the P-distribution function λ(θ, φ). We show that the N-qubit mixed symmetric separable state can be visualized as a uniaxial system if the distribution function is independent of θ and φ. We further choose the distribution function to be the most general positive function on a sphere and observe that the statistical tensor parameters characterizing the N-qubit symmetric system are the expansion coefficients of the distribution function. As an example for the discrete case, we investigate the MAR of a uniformly weighted two-qubit mixed symmetric separable state. We also observe that there exists a correspondence between separability and classicality of states.
Statistics & Probability Letters, 2011
The problem of distinguishing a Brownian bridge from a Brownian motion, both with possible drift, on the closed unit interval, is investigated via a pair of hypothesis tests. The first tests whether observations obtained at n discrete time points arise from a Brownian bridge with drift, by embedding the Brownian bridge into a mixture of Polya trees which represents the non-parametric alternative. The second tests, in an identical manner, whether the observations come from a Brownian motion with drift. The Bayes factors for the two tests are derived and then combined to obtain the Bayes factor for the test to distinguish between the two Gaussian processes. The Tierney-Kadane approximation of the Bayes factor is derived with an error approximation of order $O(n^{-4})$.
Statistica Sinica, 2015
We investigate the utility of employing asymptotic results related to a clustering criterion in the problem of testing for the presence of jumps in financial models. We consider the Jump Diffusion model for option pricing and demonstrate how the testing problem can be reduced to the problem of testing for the presence of clusters in the increments data. The overarching premise behind the proposed approach is the isolation of the increments with considerably larger mean pertaining to the jumps from the ones which arise from the diffusion component. Empirical verification is provided via simulations and the test is applied to financial datasets.
We develop a clustering framework for observations from a population with a smooth probability distribution function and derive its asymptotic properties. A clustering criterion based on a linear combination of order statistics is proposed. The asymptotic behavior of the point at which the observations are split into two clusters is examined. The results obtained can then be utilized to construct an interval estimate of the point which splits the data and develop tests for bimodality and presence of clusters.
arXiv (Cornell University), Mar 25, 2020
We detail an approach to developing Stein's method for bounding integral metrics on probability measures defined on a Riemannian manifold M. Our approach exploits the relationship between the generator of a diffusion on M having a target invariant measure and its characterising Stein operator. We consider a pair of such diffusions with different starting points, and through analysis of the distance process between the pair, derive Stein factors, which bound the solution to the Stein equation and its derivatives. The Stein factors contain curvature-dependent terms and reduce to those currently available for $\mathbb{R}^m$, and moreover imply that the bounds for $\mathbb{R}^m$ remain valid when M is a flat manifold.

arXiv (Cornell University), Nov 3, 2021
In this work we consider the problem of releasing a differentially private statistical summary that resides on a Riemannian manifold. We present an extension of the Laplace or K-norm mechanism that utilizes intrinsic distances and volumes on the manifold. We also consider in detail the specific case where the summary is the Fréchet mean of data residing on a manifold. We demonstrate that our mechanism is rate optimal and depends only on the dimension of the manifold, not on the dimension of any ambient space, while also showing how ignoring the manifold structure can decrease the utility of the sanitized summary. We illustrate our framework in two examples of particular interest in statistics: the space of symmetric positive definite matrices, which is used for covariance matrices, and the sphere, which can be used as a space for modeling discrete distributions.
Asymptotics of Clustering for Smooth Distributions
Measure of polymer performance based on correlated physical parameters
Journal of Applied Polymer Science, 2021