Papers by Gançarski Pierre
… 545 Unsupervised classification and 3D visualisation of documents, Nicolas Bonnel, Annie Morin and … knowledge for the enrichment of geographic databases, Sami Faïz and Khaoula Mahmoudi … 611 Modelling …

Although agreement between annotators who mark feature locations within images has been studied in the past from a statistical viewpoint, little work has attempted to quantify the extent to which this phenomenon affects the evaluation of foreground-background segmentation algorithms. Many researchers utilise ground truth in experimentation and more often than not this ground truth is derived from one annotator's opinion. How does the difference in opinion affect an algorithm's evaluation? A methodology is applied to four image processing problems to quantify the inter-annotator variance and to offer insight into the mechanisms behind agreement and the use of ground truth. It is found that when detecting linear structures annotator agreement is very low. The agreement in a structure's position can be partially explained through basic image properties. Automatic segmentation algorithms are compared to annotator agreement and it is found that there is a clear relation between the two. Several ground truth estimation methods are used to infer a number of algorithm performances. It is found that the rank of a detector is highly dependent upon the method used to form the ground truth, and that although STAPLE and LSML appear to represent the mean of the performance measured using individual annotations, when there are few annotations, or there is a large variance in them, these estimates tend to degrade. Furthermore, one of the most commonly adopted combination methods, consensus voting, accentuates more obvious features, resulting in an overestimation of performance. It is concluded that in some datasets it is not possible to confidently infer an algorithm ranking when evaluating upon one ground truth.
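The abstract names consensus voting and STAPLE only by reference; the following is a minimal sketch of the consensus-voting idea being critiqued, where the toy masks, the 50% vote threshold and the `f1` helper are illustrative assumptions rather than the paper's actual protocol.

```python
import numpy as np

def consensus_vote(annotations, threshold=0.5):
    """Combine binary annotation masks by majority (consensus) voting."""
    stack = np.stack(annotations).astype(float)     # shape: (n_annotators, H, W)
    return (stack.mean(axis=0) >= threshold).astype(int)

def f1(prediction, ground_truth):
    """Simple F-measure between two binary masks."""
    tp = np.logical_and(prediction == 1, ground_truth == 1).sum()
    fp = np.logical_and(prediction == 1, ground_truth == 0).sum()
    fn = np.logical_and(prediction == 0, ground_truth == 1).sum()
    return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

# Three toy annotators marking a faint linear structure, each with a little extra noise.
rng = np.random.default_rng(0)
truth = np.zeros((32, 32), dtype=int); truth[16, 4:28] = 1
annotations = [np.clip(truth + (rng.random(truth.shape) < 0.02), 0, 1) for _ in range(3)]

prediction = truth.copy()                            # a toy detector output (here, the clean structure)
consensus = consensus_vote(annotations)

per_annotator = [f1(prediction, a) for a in annotations]
print("mean over annotators:", np.mean(per_annotator))
print("against consensus   :", f1(prediction, consensus))
```

Comparing a detector against the consensus rather than against each annotation separately is exactly the step the paper argues can distort rankings.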
Advances in Cartography and GIScience, Jan 1, 2011
A. Ruas (ed.), Advances in Cartography and GIScience. Volume 2: Selection from ICC 2011. Conception of a GIS-Platform to simulate urban … Anne Ruas; Julien Perret; Florence Curie; Annabelle Mas; Anne Puissant; Gregorz Skupinski; Dominique Badariotti; …

While a problem's skew is often assumed to be constant, this paper discusses three settings where this assumption does not hold. Consequently, incorrectly assuming skew to be constant in these cases results in an over- or under-estimation of an algorithm's performance. The area under a precision-recall curve (AUCPR) is a common summary measurement used to report the performance of machine learning algorithms. It is well known that precision is dependent upon class skew, which often varies between datasets. In addition to this, it is demonstrated herein that under certain circumstances the relative ranking of algorithms (as measured by AUCPR) is not constant and is instead also dependent upon skew. The skew at which the performance of two algorithms inverts and the relationship between precision measured at different skews are defined. This is extended to account for temporal skew characteristics and situations in which skew cannot be precisely defined. Formal proofs for these findings are presented, desirable properties are proved and their application demonstrated.
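The abstract does not reproduce its formal results, but the dependence of precision on skew can be illustrated with the standard identity precision = skew·TPR / (skew·TPR + (1−skew)·FPR). A minimal sketch follows, assuming TPR and FPR stay fixed while the class prior changes; this is a textbook identity, not necessarily the paper's exact formulation.

```python
def precision_at_skew(precision, skew_measured, skew_target):
    """
    Re-express a precision value measured at one class skew at another skew.

    Assumes skew = P(positive) and that recall (TPR) and false-positive rate
    are unchanged between the two settings.
    """
    if precision in (0.0, 1.0):
        return precision                       # degenerate cases are skew-invariant
    # Ratio FPR/TPR, recovered from the measured precision and skew.
    f_over_r = skew_measured * (1 - precision) / (precision * (1 - skew_measured))
    return skew_target / (skew_target + f_over_r * (1 - skew_target))

# A detector with precision 0.9 on a balanced test set (skew 0.5)
# looks much weaker once positives make up only 5% of the data.
print(precision_at_skew(0.9, 0.5, 0.05))   # ~0.32
```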
The aim of our research is to analyse the evolution of urbanisation and to simulate it on specific areas. We focus on the evolution between 1950 and the present. We analyse densification by comparing temporal topographic databases created from an existing topographic database and from maps and photographs from 1950. In this paper we present how a simulation works (which input data are used, which functions are used to densify the space, and how the simulation is tuned and run), the densification method for each urban block illustrated with results, and the method used during the project to build the knowledge required for simulation; we then conclude and present the main research perspectives. The methods are implemented in a dedicated open-source software package named GeOpenSim.

In remote sensing data classification, the ability to discriminate different land cover or material types is directly linked with the spectral resolution and sampling provided by the optical sensor. Several previous studies showed that the spectral resolution is a critical issue, especially for discriminating different land covers in urban areas. In spite of the increasing availability of hyperspectral data, multispectral optical sensors on board several satellites still acquire a massive amount of data every day with a relatively poor spectral resolution (usually about 4 to 7 spectral bands). These remotely sensed data are intensively used for Earth observation regardless of their limited spectral resolution. In this paper, we propose to study the discrimination capacity of several of these optical sensors: Pleiades, QuickBird, SPOT5, Ikonos, Landsat, etc. To achieve this goal, we used different spectral libraries which provide spectra of materials and land covers, generally with a fine spectral resolution (from 350 to 2400 nm with a 10 nm bandwidth). These spectra were extracted from the libraries and convolved with the Relative Spectral Responses (RSR) of each sensor to create spectra at the sensors' resolutions. Then, these reduced spectra were evaluated using classical separability indices and machine learning tools. This study focuses on the capacity of each sensor to discriminate different materials according to its spectral resolution.
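A rough sketch of the resampling step described above (convolving library spectra with each sensor band's RSR); the array layout, the triangular toy band and the uniform-grid weighted average are assumptions, not the exact processing chain used in the paper.

```python
import numpy as np

def resample_to_sensor(wavelengths, reflectance, band_rsrs):
    """
    Convolve a library spectrum with each band's Relative Spectral Response (RSR)
    to simulate the reflectance the sensor would measure in that band.

    wavelengths, reflectance : 1-D arrays describing the library spectrum (e.g. 350-2400 nm).
    band_rsrs : list of (band_wavelengths, band_response) pairs, one per sensor band.
    """
    simulated = []
    for band_wl, band_resp in band_rsrs:
        # Interpolate the band's RSR onto the library wavelength grid (0 outside its support).
        rsr = np.interp(wavelengths, band_wl, band_resp, left=0.0, right=0.0)
        # RSR-weighted average of the reflectance (assumes a uniform wavelength step).
        simulated.append(np.sum(rsr * reflectance) / np.sum(rsr))
    return np.array(simulated)

# Toy example: a flat spectrum sampled every 10 nm and one made-up triangular "green" band.
wl = np.arange(350, 2401, 10, dtype=float)
spectrum = np.full(wl.shape, 0.3)
green_band = (np.array([500.0, 550.0, 600.0]), np.array([0.0, 1.0, 0.0]))
print(resample_to_sensor(wl, spectrum, [green_band]))   # -> [0.3]
```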

The Journal of Supercomputing, 2008
The goal of clustering is to identify subsets called clusters which usually correspond to objects that are more similar to each other than they are to objects from other clusters. We have proposed the MACLAW method, a cooperative coevolution algorithm for data clustering, which has shown good results (Blansché and Gançarski, Pattern Recognit. Lett. 27(11), 1299–1306, 2006). However, the complexity of the algorithm increases rapidly with the number of clusters to find. We propose in this article a parallelization of MACLAW based on a message-passing paradigm, as well as an analysis of the application's performance with experimental results. We show that we reach near-optimal speedups when searching for 16 clusters, a typical problem instance for which the sequential execution duration is an obstacle to the MACLAW method. Furthermore, our approach is original because we use the P2P-MPI grid middleware (Genaud and Rattanapoka, Lecture Notes in Comput. Sci., vol. 3666, pp. 276–284, 2005), which provides both the message-passing library and the infrastructure services needed to discover computing resources. We also show that the application can be tightly coupled with the middleware to make the parallel execution nearly transparent for the user.
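The paper relies on the P2P-MPI middleware; purely as an illustration of the master-worker pattern it describes, here is a toy Python sketch using the standard multiprocessing module instead of message passing, with a simple distance computation standing in for the real per-cluster extractors.

```python
from multiprocessing import Pool
import numpy as np

def extract_cluster(args):
    """Toy stand-in for one MACLAW 'extractor': score every point against one centre."""
    data, centre = args
    return np.linalg.norm(data - centre, axis=1)       # distances to this cluster's centre

def parallel_extraction(data, centres, workers=4):
    """Distribute the per-cluster extraction steps over worker processes."""
    with Pool(workers) as pool:
        distances = pool.map(extract_cluster, [(data, c) for c in centres])
    return np.argmin(np.stack(distances), axis=0)       # assign each point to the closest centre

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(1000, 4))
    centres = rng.normal(size=(16, 4))                  # 16 clusters, as in the reported experiments
    print(parallel_extraction(data, centres)[:10])
```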
Satellite Image Time Series (SITS) analysis is an important domain with various applications in land study. In the coming years, both high temporal and high spatial resolution SITS will be available. This article aims at providing both temporal and spatial analysis of SITS. We propose first segmenting each image of the series, and then using these segmentations in order to characterize each pixel of the data with a spatial dimension (i.e. with contextual information). Given such spatially characterized pixels, pixel-based temporal analysis can be performed. Experiments carried out with this methodology show the relevance of this approach and the significance of the resulting extracted patterns in the context of the analysis of SITS.
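A very rough sketch of the two-stage idea (segment each image, describe each pixel by its segment, then stack per-pixel sequences over time); the threshold segmentation and the two segment features used here are placeholders, not the methods used in the article.

```python
import numpy as np
from scipy import ndimage

def contextual_features(image, threshold=0.5):
    """
    Per-image step: segment the image, then describe every pixel by properties of the
    segment it belongs to (here, segment mean intensity and segment size).
    """
    labels, n = ndimage.label(image > threshold)                 # naive threshold segmentation
    means = np.asarray(ndimage.mean(image, labels, index=np.arange(n + 1)))
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    return np.stack([means[labels], sizes[labels]], axis=-1)     # (H, W, 2) per-pixel features

def pixel_sequences(series):
    """Turn a list of images (the SITS) into one contextual feature sequence per pixel."""
    features = [contextual_features(img) for img in series]
    return np.stack(features, axis=2)                            # (H, W, T, 2)

rng = np.random.default_rng(2)
series = [rng.random((64, 64)) for _ in range(5)]
print(pixel_sequences(series).shape)                             # (64, 64, 5, 2)
```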

Pattern Recognition, 2008
Feature weighting is an aspect of increasing importance in clustering because data are becoming more and more complex. In this paper, we propose two new feature weighting methods based on coevolutionary algorithms. The first one is inspired by Lamarckian theory (inheritance of acquired characteristics) and uses the distance-based cost function defined in the LKM algorithm as its fitness function. The second method uses a fitness function based on a new partitioning quality measure and does not need a distance-based measure. We compare classical hill-climbing optimization with these new genetic algorithms on three data sets from UCI. Results show that the proposed methods are better than the hill-climbing based algorithms. We also present a hyperspectral remotely sensed image classification process. The experiments, corroborated by geographers, highlight the benefits of using coevolutionary feature weighting methods to improve the knowledge discovery process.
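As a rough illustration of what the evolutionary search optimizes, the sketch below evaluates candidate feature-weight vectors with a weighted within-cluster cost; the random search stands in for the genetic/coevolutionary operators and the cost is a generic stand-in for the LKM cost function, both assumptions rather than the paper's algorithm.

```python
import numpy as np

def weighted_cost(data, centres, weights):
    """Within-cluster cost under a feature-weighted Euclidean distance (used as the fitness)."""
    w = weights / weights.sum()                                   # normalise the feature weights
    d = np.sqrt((((data[:, None, :] - centres[None, :, :]) ** 2) * w).sum(axis=2))
    return d.min(axis=1).sum()                                    # each point pays its closest centre

def best_random_weights(data, centres, n_candidates=200, seed=0):
    """Stand-in for the evolutionary loop: keep the best of many random weight vectors."""
    rng = np.random.default_rng(seed)
    best_w, best_cost = None, np.inf
    for _ in range(n_candidates):
        w = rng.random(data.shape[1]) + 1e-9                      # strictly positive candidate weights
        cost = weighted_cost(data, centres, w)
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w / best_w.sum(), best_cost

rng = np.random.default_rng(3)
data = rng.normal(size=(200, 4))
centres = data[rng.choice(len(data), 3, replace=False)]
print(best_random_weights(data, centres))
```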

Many machine-learning techniques (either supervised or unsupervised) assume that data present themselves in an attribute-value form. But this formalism is largely insufficient to account for many applications. Therefore, much of the ongoing research now focuses on first-order learning systems. But complex formalisms lead to high computational complexity. On the other hand, most currently installed databases have been designed according to a formalism known as entity-relationship, and are usually implemented on a relational database management system. This formalism is far less complex than first-order logic, but much more expressive than attribute-value lists. In that context, the database schema defines an abstraction space, and learning must occur at each level of abstraction. This paper describes a clustering system able to discover useful groupings in structured databases. It is based on the COBWEB algorithm, to which it adds the ability to cluster structured objects.
Today, video is becoming one of the primary sources of information. Current video mining systems face the problem of the semantic gap (i.e., the difference between the semantic meaning of video contents and the digital information encoded within the video files). This gap can be bridged by relying on the real objects present in videos, because objects carry semantic meaning. But video object mining needs some semantics, both in the object extraction step and in the object mining step. We think that the introduction of semantics during these steps can be ensured by user interaction. We therefore propose a generic framework to deal with video object mining.
Feature weighting is an increasingly important step in clustering because data are becoming more and more complex. An embedded local feature weighting method has been proposed in [1]. In this paper, we present a new method based on the same cost function, but performed through a genetic algorithm. The learning process can be performed through an evolutionary approach or through a cooperative coevolutionary approach. Moreover, the genetic algorithm can be combined with the original Weighting K-means algorithm in a Lamarckian learning paradigm. We compare hill-climbing optimization versus genetic algorithms, evolutionary versus coevolutionary approaches, and Darwinian versus Lamarckian learning on different datasets. The results seem to show that, on the datasets where the original algorithm is efficient, the proposed methods are even better.
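To make the Darwinian/Lamarckian distinction concrete, here is a hedged sketch: in the Lamarckian variant the result of a local refinement (standing in for one Weighting K-means pass) is written back into the genome, while in the Darwinian variant the genome is evaluated as-is. The toy cost function and hill-climbing refinement are illustrative assumptions, not the paper's components.

```python
import numpy as np

def toy_cost(weights, data, centres):
    """A toy feature-weighted clustering cost (lower is better); stands in for the shared cost function."""
    w = np.abs(weights) / np.abs(weights).sum()
    d = (((data[:, None, :] - centres[None, :, :]) ** 2) * w).sum(axis=2)
    return d.min(axis=1).sum()

def local_refinement(weights, data, centres, step=0.05, iters=20, seed=0):
    """Tiny hill-climbing pass standing in for one run of the Weighting K-means update."""
    rng = np.random.default_rng(seed)
    best, best_f = weights.copy(), toy_cost(weights, data, centres)
    for _ in range(iters):
        cand = np.abs(best + rng.normal(scale=step, size=best.shape))
        f = toy_cost(cand, data, centres)
        if f < best_f:
            best, best_f = cand, f
    return best

def darwinian_eval(genome, data, centres):
    """Darwinian learning: the genome is evaluated as-is and stays unchanged."""
    return toy_cost(genome, data, centres), genome

def lamarckian_eval(genome, data, centres):
    """Lamarckian learning: the locally refined weights are written back into the genome."""
    refined = local_refinement(genome, data, centres)
    return toy_cost(refined, data, centres), refined

rng = np.random.default_rng(4)
data = rng.normal(size=(100, 3))
centres = data[:4]
genome = rng.random(3) + 1e-9
print(darwinian_eval(genome, data, centres)[0], lamarckian_eval(genome, data, centres)[0])
```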
This paper describes the use of a collaborative multi-strategy classification applied to image analysis. This system integrates different kinds of unsupervised classification methods and produces, for each classifier, a result built according to the results of all the …

Pattern Recognition, 2011
Mining sequential data is an old topic that has been revived in the last decade, due to the increasing availability of sequential datasets. Most works in this field are centred on the definition and use of a distance (or, at least, a similarity measure) between sequences of elements. A measure called Dynamic Time Warping (DTW) seems to be currently the most relevant for a large panel of applications. This article is about the use of DTW in data mining algorithms, and focuses on the computation of an average of a set of sequences. Averaging is an essential tool for the analysis of data. For example, the K-means clustering algorithm repeatedly computes such an average, and needs to provide a description of the clusters it forms. Averaging is here a crucial step, which must be sound in order to make algorithms work accurately. When dealing with sequences, especially when sequences are compared with DTW, averaging is not a trivial task.
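To show why averaging under DTW is non-trivial, here is a small sketch in the spirit of a DTW-barycentre refinement: each sequence is aligned to the current average and every average point is replaced by the mean of the values aligned to it. This is an assumption-level illustration, not necessarily the averaging method developed in the article.

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic Time Warping between two 1-D sequences: returns the alignment path and the DTW cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = abs(a[i - 1] - b[j - 1]) + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack the optimal warping path from (n, m) to (1, 1).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)], key=lambda ij: cost[ij])
    return path[::-1], cost[n, m]

def averaging_step(sequences, average):
    """One refinement step of a barycentre-style average: align each sequence to the current
    average with DTW, then replace every point of the average by the mean of what aligned to it."""
    buckets = [[] for _ in average]
    for seq in sequences:
        for i, j in dtw_path(average, seq)[0]:
            buckets[i].append(seq[j])
    return np.array([np.mean(b) for b in buckets])

sequences = [np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0]), np.array([0.0, 1.0, 2.0, 2.0, 1.0, 0.0])]
average = sequences[0].copy()
for _ in range(3):
    average = averaging_step(sequences, average)
print(average)
```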
The French intellectual property code of 1 July 1992 expressly prohibits photocopying for collective use without the authorisation of the rights holders. Yet, as this practice becomes widespread, it would cause a sharp drop in book purchases, to the point that the very possibility for authors to create new works and have them properly published would be threatened.
The classification methods applied in the object-oriented image analysis approach are often based on the use of domain knowledge. A key issue in this approach is the acquisition of this knowledge, which is generally implicit and not formalized. In this paper, we examine the possibilities of using genetic programming for the automatic extraction of classification rules from urban remotely sensed data. The proposed method is composed of several steps: segmentation, feature extraction, selection of training sets, acquisition of rules, and classification. Features related to the spectral, spatial and contextual properties of the objects are used in the classification procedure. Experiments are carried out on a QuickBird MS image. The quality of the results shows the effectiveness of the proposed genetic classifier in the object-oriented, knowledge-based approach.
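The evolved classifiers are rules over object features; a minimal sketch of how such a rule might be represented as an expression tree and evaluated is given below, where the feature names, thresholds and the 'building' rule itself are made-up examples rather than rules produced by the method.

```python
def evaluate(rule, obj):
    """Evaluate a GP-style classification rule (an expression tree) on one segmented object."""
    op = rule[0]
    if op == "and":
        return all(evaluate(sub, obj) for sub in rule[1:])
    if op == "or":
        return any(evaluate(sub, obj) for sub in rule[1:])
    if op == "gt":
        return obj[rule[1]] > rule[2]
    if op == "lt":
        return obj[rule[1]] < rule[2]
    raise ValueError(f"unknown operator: {op}")

# A hypothetical evolved rule for the 'building' class, combining spectral,
# spatial and contextual object features (names and thresholds are made up).
building_rule = ("and",
                 ("gt", "ndvi", -0.1), ("lt", "ndvi", 0.2),    # low vegetation response
                 ("gt", "rectangularity", 0.7),                 # compact, rectangular shape
                 ("lt", "distance_to_road", 30.0))              # close to the road network

obj = {"ndvi": 0.05, "rectangularity": 0.85, "distance_to_road": 12.0}
print(evaluate(building_rule, obj))   # True
```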
The multiplication of very high resolution (spatial or spectral) remote sensing images appears to be an opportunity to identify objects in urban and periurban areas. The classification methods applied in the object-oriented image analysis approach can be based on the use of domain knowledge. A major issue in these approaches is domain knowledge formalization and exploitation. In this paper, we propose a recognition method based on an ontology which has been developed by experts of the domain. In order to give objects a semantic meaning, we have developed a matching process between an object and the concepts of the ontology. Experiments are carried out on a QuickBird image. The quality of the results shows the effectiveness of the proposed method.
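One simple way to realise the matching process mentioned above is to score an object against attribute-range constraints attached to each ontology concept; the concepts, attributes and ranges below are invented for illustration and do not come from the experts' ontology.

```python
def matching_score(obj, concept):
    """Fraction of a concept's attribute constraints (min/max ranges) satisfied by an object."""
    satisfied = sum(lo <= obj.get(attr, float("nan")) <= hi for attr, (lo, hi) in concept.items())
    return satisfied / len(concept)

# Hypothetical ontology concepts described by expected attribute ranges (all made up).
ontology = {
    "building":   {"ndvi": (-0.2, 0.2), "area": (50, 5000), "elongation": (0.0, 3.0)},
    "vegetation": {"ndvi": (0.3, 1.0),  "area": (10, 1e6),  "elongation": (0.0, 10.0)},
}

obj = {"ndvi": 0.05, "area": 240, "elongation": 1.4}
best = max(ontology, key=lambda c: matching_score(obj, ontology[c]))
print(best, matching_score(obj, ontology[best]))   # building 1.0
```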
In a context of urban planning, it is necessary to support the identification and the formalization of urban elements. Very often, this requires complementary aspects of a set of images and also ancillary data. However, the lack of methods enabling the combination of several sources is still compelling. In general, the use of several sources of remotely sensed data in a classification …