2012
We develop a clustering framework for observations from a population with a smooth probability distribution function and derive its asymptotic properties. We propose a clustering criterion based on a linear combination of order statistics and examine the asymptotic behavior of the point at which the observations are split into two clusters. These results can be used to construct an interval estimate of the split point and to develop tests for bimodality and for the presence of clusters.
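As a rough illustration of splitting one-dimensional observations into two clusters through their order statistics, the following sketch sorts the data and scans every cut between consecutive order statistics. It assumes a plain within-cluster sum-of-squares criterion, which is a stand-in and not necessarily the criterion of the paper; the function name `best_split` and the midpoint convention for the split point are hypothetical choices made for this example.

```python
from typing import List, Tuple


def best_split(data: List[float]) -> Tuple[float, float]:
    """Return (split_point, criterion) for the best two-cluster split of 1-D data.

    Assumption: the criterion is the total within-cluster sum of squares,
    used here only as an illustrative stand-in.
    """
    xs = sorted(data)  # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    # Prefix sums of x and x^2 let each candidate cut be scored in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    best_k, best_w = 1, float("inf")
    for k in range(1, n):  # left cluster holds the k smallest observations
        left = prefix_sq[k] - prefix[k] ** 2 / k
        right = (prefix_sq[n] - prefix_sq[k]) - (prefix[n] - prefix[k]) ** 2 / (n - k)
        w = left + right
        if w < best_w:
            best_k, best_w = k, w

    # Report the midpoint between the two order statistics around the cut.
    split_point = 0.5 * (xs[best_k - 1] + xs[best_k])
    return split_point, best_w
```

Calling `best_split` on a sample returns the estimated split point together with the value of the criterion at that cut; an interval estimate or a test would require the asymptotic results described in the abstract, which this sketch does not implement.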
Electronic Journal of Statistics, 2013
Electronic Journal of Statistics, 2015
Working papers do not reflect the position of CREST but only the views of their authors.
Journal of Multivariate Analysis, 2014
Many clustering techniques aim at optimizing empirical criteria that are of the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the excess of clustering risk of the empirical minimizer is proved to be of the order O_P(1/√n). A lower bound result shows that the rate obtained is optimal in a minimax sense. Based on recent results related to the tail behavior of degenerate U-processes, it is also shown how to establish tighter, and even faster, rate bounds under additional assumptions. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered. Finally, it is explained how the theoretical results developed here can provide statistical guarantees for empirical clustering aggregation.
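To make the "U-statistic of degree two" form concrete, the sketch below computes an empirical within cluster point scatter as an average over pairs of observations, counting only pairs that fall in the same cell of the partition. The normalization 2 / (n (n - 1)) and the function name `within_cluster_scatter` are assumptions chosen for illustration; the paper's exact definition may differ.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def within_cluster_scatter(
    points: Sequence[T],
    labels: Sequence[Hashable],
    dissimilarity: Callable[[T, T], float],
) -> float:
    """Average pairwise dissimilarity over pairs assigned to the same cluster.

    Assumed form: (2 / (n (n - 1))) * sum over i < j of
    D(x_i, x_j) * 1{label_i == label_j}, a U-statistic of degree two.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")
    if len(labels) != n:
        raise ValueError("points and labels must have the same length")

    total = 0.0
    for i, j in combinations(range(n), 2):
        if labels[i] == labels[j]:  # the pair lies in the same cell
            total += dissimilarity(points[i], points[j])
    return 2.0 * total / (n * (n - 1))
```

An empirical minimizer in this setting would evaluate such a criterion over a class of candidate partitions and keep the one with the smallest value; the sketch only scores a single given partition.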
Expert Systems with Applications, 2015
This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.'s) of a continuous random variable, observed in different populations, into the minimum number of homogeneous clusters, making no parametric assumptions about the c.p.d.f.'s. The proposed distance function for clustering c.p.d.f.'s is based on the Kolmogorov-Smirnov two-sample statistic, which is able to detect differences in position, dispersion or shape of the c.p.d.f.'s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether or not more clusters need to be added. In this sense, the proposed algorithm is adaptive: it automatically increases the number of clusters only as necessary, so there is no need to fix the number of clusters in advance. The outputs of the algorithm are, for each cluster, the common c.p.d.f. of all observed data in the cluster (the centroid) and the Kolmogorov-Smirnov statistic between the centroid and the most distant c.p.d.f. The proposed algorithm has been applied to a large data set of solar global irradiation spectra distributions. The results make it possible to reduce the information in more than 270000 c.p.d.f.'s to only 6 clusters, corresponding to 6 different c.p.d.f.'s.
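The Kolmogorov-Smirnov two-sample statistic that serves as the distance here is the largest absolute gap between two empirical distribution functions. The following is a minimal sketch of that statistic alone, not of the adaptive clustering algorithm; the function name `ks_two_sample` is a hypothetical choice for this example.

```python
from bisect import bisect_right
from typing import Sequence


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute difference between the empirical CDFs of a and b."""
    xs, ys = sorted(a), sorted(b)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")

    d = 0.0
    # Both empirical CDFs are right-continuous step functions that only
    # change at observed values, so checking those values is sufficient.
    for v in xs + ys:
        fa = bisect_right(xs, v) / len(xs)  # fraction of a that is <= v
        fb = bisect_right(ys, v) / len(ys)  # fraction of b that is <= v
        d = max(d, abs(fa - fb))
    return d
```

The statistic lies between 0 (identical empirical distributions) and 1 (samples that do not overlap), which makes it usable as a bounded dissimilarity between data sets.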
Econometric Theory, 2012
The two-sample version of the celebrated Pearson goodness-of-fit problem has been a topic of extensive research, and several tests like the Kolmogorov-Smirnov and Cramér-von Mises have been suggested. Although these tests perform fairly well as omnibus tests for comparing two probability density functions (PDFs), they may have poor power against specific departures such as in location, scale, skewness, and kurtosis. We propose a new test for the equality of two PDFs based on a modified version of the Neyman smooth test using empirical distribution functions minimizing size distortion in finite samples. The suggested test can detect the specific directions of departure from the null hypothesis. Specifically, it can identify deviations in the directions of mean, variance, skewness, or tail behavior. In a finite sample, the actual probability of type-I error depends on the relative sizes of the two samples. We propose two different approaches to deal with this problem and show that, un...
2000
This paper deals with nonparametric estimation of the boundary curve of the support of a bivariate density function. This estimation problem arises in various contexts, such as for example scatterpoint image analysis and frontier estimation in econometrics. The setup in this paper is a general one, allowing the bivariate density function to be infinite, bounded away from zero or zero at the boundary. Two estimators for the boundary curve are introduced, both based on order statistics. The asymptotic distribution of the estimators and their rate of convergence are established. Via a comparison of the rates of convergence we recommend which estimator to use in a particular situation. Both estimators can be used as an initial estimator in a two-stage procedure, designed for getting a better estimation. Simulation studies demonstrate the finite-sample behavior of the estimators and the proposed two-stage procedure. We illustrate the procedure on a data set on American electric utility companies.
Journal of open source software, 2018
Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC-BY).
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 1998
Motivated by the need to develop meaningful empirical approximations to a 'typical' data value, we introduce methods for density and mode estimation when data are in the form of random curves. Our approach is based on finite dimensional approximations via generalized Fourier expansions on an empirically chosen basis. The mode estimation problem is reduced to a problem of kernel-type multivariate estimation from vector data and is solved using a new recursive algorithm for finding the empirical mode. The algorithm may be used as an aid to the identification of clusters in a set of data curves. Bootstrap methods are employed to select the bandwidth.
Statistics & Risk Modeling, 1996
Incorporating the Hille theorem, smooth estimators of survival and density functions are considered, and their (asymptotic) properties are studied in a unified manner. A comparative picture of the so-called kernel and nearest neighbor methods and the proposed one is presented, with due emphasis on the (asymptotic) bias and mean square error criteria.
Journal of Statistical Planning and Inference, 2014
Journal of Statistical Planning and Inference, 2002
Advances in Data Analysis and Classification
Journal of Multivariate Analysis, 1992
Lecture Notes in Computer Science, 2009
Statistics and Computing, 2007
Journal of computational & theoretical statistics, 2015
Journal of Multivariate Analysis, 2014
Proceedings of the fifth Berkeley symposium …, 1967
Przegląd Statystyczny. Statistical Review
Electronic Journal of Statistics, 2019
Computational Statistics & Data Analysis, 2015
Journal of Geographical Systems
HAL (Le Centre pour la Communication Scientifique Directe), 2011
Adv. Appl. Prob., 2014
Probability in the Engineering and Informational Sciences, 2013