Feature selection is an essential preprocessing step for classifiers with high dimensional traini... more Feature selection is an essential preprocessing step for classifiers with high dimensional training sets. In pattern recognition, feature selection improves the performance of classification by reducing the feature space but preserving the classification capabilities of the original feature space. Image classification using frequent approximate subgraph mining (FASM) is an example where the benefits of features selections are needed. This is due using frequent approximate subgraphs (FAS) leads to high dimensional representations. In this paper, we explore the use of feature selection algorithms in order to reduce the representation of an image collection represented through FASs. In our results we report a dimensionality reduction of over 50% of the original features and we get similar classification results than those reported by using all the features.
In this paper we present a new algorithm for document clustering called Condensed Star (ACONS). T... more In this paper we present a new algorithm for document clustering called Condensed Star (ACONS). This algorithm is a natural evolution of the Star algorithm proposed by Aslam et al., and improved by them and other researchers. In this method, we introduced a new concept of star allowing a different star-shaped form; in this way we retain the strengths of previous algorithms as well as address previous shortcomings. The evaluation experiments on standard document collections show that the proposed algorithm outperforms previously defined methods and obtains a smaller number of clusters. Since the ACONS algorithm is relatively simple to implement and is also efficient, we advocate its use for tasks that require clustering, such as information organization, browsing, topic tracking, and new topic detection.
Frequent connected subgraph mining (FCSM) is an interesting task with wide applications in real l... more Frequent connected subgraph mining (FCSM) is an interesting task with wide applications in real life. Most of the previous studies are focused on pruning search subspaces or optimizing the subgraph isomorphism (SI) tests. In this paper, a new property to remove all duplicate candidates in FCSM during the enumeration is introduced. Based on this property, a new FCSM algorithm called gdFil is proposed. In our proposal, the candidate space does not contain duplicates; therefore, we can use a fast evaluation strategy for reducing the cost of SI tests without wasting memory resources. Thus, we introduce a data structure to reduce the cost of SI tests. The performance of our algorithm is compared against other reported algorithms.
Most of the Frequent Connected Subgraph Mining (FCSM) algorithms have been focused on detecting d... more Most of the Frequent Connected Subgraph Mining (FCSM) algorithms have been focused on detecting duplicate candidates using canonical form (CF) tests. CF tests have high computational complexity, which affects the efficiency of graph miners. In this paper, we introduce novel properties of the canonical adjacency matrices for reducing the number of CF tests in FCSM. Based on these properties, a new algorithm for frequent connected subgraph mining called grCAM is proposed. The experiments on real world datasets show the impact of the proposed properties in FCSM. Besides, the performance of our algorithm is compared against some other reported algorithms.
Feature selection is an essential preprocessing step for classifiers with high dimensional traini... more Feature selection is an essential preprocessing step for classifiers with high dimensional training sets. In pattern recognition, feature selection improves the performance of classification by reducing the feature space but preserving the classification capabilities of the original feature space. Image classification using frequent approximate subgraph mining (FASM) is an example where the benefits of features selections are needed. This is due using frequent approximate subgraphs (FAS) leads to high dimensional representations. In this paper, we explore the use of feature selection algorithms in order to reduce the representation of an image collection represented through FASs. In our results we report a dimensionality reduction of over 50% of the original features and we get similar classification results than those reported by using all the features.
In this paper we present a new algorithm for document clustering called Condensed Star (ACONS). T... more In this paper we present a new algorithm for document clustering called Condensed Star (ACONS). This algorithm is a natural evolution of the Star algorithm proposed by Aslam et al., and improved by them and other researchers. In this method, we introduced a new concept of star allowing a different star-shaped form; in this way we retain the strengths of previous algorithms as well as address previous shortcomings. The evaluation experiments on standard document collections show that the proposed algorithm outperforms previously defined methods and obtains a smaller number of clusters. Since the ACONS algorithm is relatively simple to implement and is also efficient, we advocate its use for tasks that require clustering, such as information organization, browsing, topic tracking, and new topic detection.
Frequent connected subgraph mining (FCSM) is an interesting task with wide applications in real l... more Frequent connected subgraph mining (FCSM) is an interesting task with wide applications in real life. Most of the previous studies are focused on pruning search subspaces or optimizing the subgraph isomorphism (SI) tests. In this paper, a new property to remove all duplicate candidates in FCSM during the enumeration is introduced. Based on this property, a new FCSM algorithm called gdFil is proposed. In our proposal, the candidate space does not contain duplicates; therefore, we can use a fast evaluation strategy for reducing the cost of SI tests without wasting memory resources. Thus, we introduce a data structure to reduce the cost of SI tests. The performance of our algorithm is compared against other reported algorithms.
Most of the Frequent Connected Subgraph Mining (FCSM) algorithms have been focused on detecting d... more Most of the Frequent Connected Subgraph Mining (FCSM) algorithms have been focused on detecting duplicate candidates using canonical form (CF) tests. CF tests have high computational complexity, which affects the efficiency of graph miners. In this paper, we introduce novel properties of the canonical adjacency matrices for reducing the number of CF tests in FCSM. Based on these properties, a new algorithm for frequent connected subgraph mining called grCAM is proposed. The experiments on real world datasets show the impact of the proposed properties in FCSM. Besides, the performance of our algorithm is compared against some other reported algorithms.
Uploads
Papers by Andrés Alonso