2021, Neurocomputing
Tensor data analysis is the natural evolution of data analysis to more than two dimensions. Dealing with tensor data is often based on tensor decomposition methods. The present paper focuses on unsupervised learning and provides a Python package, TensorClus, that includes novel co-clustering algorithms for three-way data. All proposed algorithms are based on latent block models and are suitable for different types of data, sparse or not. They are successfully evaluated on challenges in text mining, recommender systems, and hyperspectral image clustering. TensorClus is an open-source Python package that allows easy interaction with other Python packages such as NumPy and TensorFlow; it also offers an interface with the tensor decomposition packages Tensorly and TensorD on the one hand, and with the co-clustering package Coclust on the other. Finally, it provides CPU and GPU compatibility. The TensorClus library is available at https://pypi.org/project/TensorClus/.
International Journal of Data Science and Analytics, 2020
With the exponential growth of data collected in fields such as recommender systems (users, items), text mining (documents, terms), and bioinformatics (individuals, genes), co-clustering, the simultaneous clustering of both dimensions of a data matrix, has become a popular technique. Co-clustering aims to obtain homogeneous blocks, leading to a straightforward simultaneous interpretation of row clusters and column clusters. Many approaches exist; in this paper, we rely on the latent block model (LBM), which is flexible enough to model different types of data matrices. We extend its use to tensor (3D matrix) data by proposing a Tensor LBM (TLBM), which allows different relations between entities. To show the interest of TLBM, we consider continuous, binary, and contingency table datasets. To estimate the parameters, a variational EM algorithm is developed. Its performance is evaluated on synthetic and real datasets to highlight different possible applications. Keywords: Co-clustering, Tensor, Data science. This submission is an extended version of the PAKDD 2019 paper 'Co-clustering from Tensor Data'.
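To make the notion of homogeneous blocks in a three-way tensor concrete, here is a minimal NumPy sketch. It is not the paper's variational EM algorithm and not the TensorClus API: the function name block_means, the toy data, and the choice of a plain block average as the block summary are all assumptions made for illustration. It assumes a tensor of shape (rows, columns, slices) and known row and column partitions.

```python
import numpy as np

def block_means(X, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Average X over each (row cluster, column cluster) block, per slice.

    X has shape (n_rows, n_cols, n_slices); the result has shape
    (n_row_clusters, n_col_clusters, n_slices).
    Hypothetical helper for illustration only.
    """
    Z = np.eye(n_row_clusters)[row_labels]   # one-hot row memberships, (n_rows, g)
    W = np.eye(n_col_clusters)[col_labels]   # one-hot column memberships, (n_cols, m)
    sums = np.einsum("ik,ijv,jl->klv", Z, X, W)       # per-block sums for each slice
    counts = np.outer(Z.sum(axis=0), W.sum(axis=0))   # number of cells in each block
    return sums / np.maximum(counts, 1.0)[:, :, None]  # guard against empty blocks

# Toy data (assumed): every cell equals the mean vector of its block.
row_labels = np.array([0, 0, 1, 1])
col_labels = np.array([0, 1, 1, 0])
mu = np.arange(8, dtype=float).reshape(2, 2, 2)   # true block means
X = mu[row_labels][:, col_labels]                 # shape (4, 4, 2)

means = block_means(X, row_labels, col_labels, 2, 2)
```

In a full co-clustering procedure, a summary of this kind would be recomputed each time the row and column partitions are updated; here the partitions are simply given.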
2020 28th European Signal Processing Conference (EUSIPCO), 2021
Co-clustering of tensor data is an unsupervised learning task aiming to identify multidimensional structures hidden in a tensor. These structures are critical for understanding interdependencies across variables belonging to different tensor dimensions, often referred to as modes, which are frequently disregarded when tensor data are represented via one- or two-dimensional data structures. This work proposes a new tensor co-clustering algorithm that uses a class of Bregman divergences to measure the coherence of co-clusters on an individual mode basis, while ensuring that the interactions of their prototyping elements capture the tensor intra-modal structure. A co-clustering algorithm based on the alternating-direction method of multipliers is developed. The proposed algorithm decouples the co-clustering problem into an iterative two-step process whose steps are reminiscent of classical one-way clustering and Tucker decomposition problems. The performance of the proposed method is i...
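As a reminder of what a Bregman divergence is, the sketch below evaluates the generic definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for two standard generators. It is not the algorithm from the abstract above; the helper names and test vectors are assumptions chosen for illustration. With phi(x) = ||x||^2 the divergence is the squared Euclidean distance, and with the negative entropy it is the generalized Kullback-Leibler divergence.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Generic Bregman divergence generated by a convex function phi."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Generator 1: squared Euclidean norm.
def sq_norm(x):
    return np.dot(x, x)

def sq_norm_grad(x):
    return 2.0 * x

# Generator 2: negative entropy (requires strictly positive entries).
def neg_entropy(x):
    return np.sum(x * np.log(x))

def neg_entropy_grad(x):
    return np.log(x) + 1.0

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 1.0])

d_euclid = bregman(sq_norm, sq_norm_grad, x, y)       # equals ||x - y||^2
d_kl = bregman(neg_entropy, neg_entropy_grad, x, y)   # generalized KL divergence
```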
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
Clustering aims to separate observed data into different categories. The performance of popular clustering models relies on the sample-to-sample similarity. However, the pairwise similarity is prone to corruption by noise or outliers, which degrades the subsequent clustering. A high-order relationship among samples may capture the local manifold of the data and thus provide complementary information to guide the clustering. However, few studies have investigated the connection between high-order similarity and the usual pairwise similarity. To fill this gap, we first define a high-order tensor similarity to exploit the affinity relationship among samples. We then establish the connection between tensor similarity and pairwise similarity, proving that the decomposable tensor similarity is the Kronecker product of the usual pairwise similarity, while the non-decomposable tensor similarity is generalized to provide complementary information that pairwise similarity fails to capture. Finally, the high-order tensor similarity and the pairwise similarity are integrated (IPS2) to enhance clustering performance by combining their merits. The proposed IPS2 is shown to be superior or competitive to state-of-the-art methods on synthetic and real-world datasets. Extensive experiments demonstrate that tensor similarity can boost the performance of the classical clustering method.
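The Kronecker-product relation mentioned in the abstract above can be illustrated with NumPy. This is only a sketch: the Gaussian kernel used to build the pairwise similarity, the toy points, and the index layout of the resulting fourth-order tensor are assumptions, and the paper's exact definition may order the indices differently.

```python
import numpy as np

# Pairwise similarity from a Gaussian kernel on three 1-D points (assumed toy data).
points = np.array([0.0, 1.0, 3.0])
S = np.exp(-(points[:, None] - points[None, :]) ** 2)   # shape (n, n), symmetric
n = S.shape[0]

# Kronecker product of the pairwise similarity with itself, shape (n*n, n*n).
K = np.kron(S, S)

# Viewed as a fourth-order tensor: T[i, k, j, l] = S[i, j] * S[k, l],
# i.e. the similarity between the sample pairs (i, k) and (j, l).
T = K.reshape(n, n, n, n)
```

Each entry of T factorizes into two pairwise similarities, which is the sense in which such a tensor similarity is decomposable.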
2021
How can we expand the tensor decomposition to reveal a hierarchical structure of the multi-modal data in a self-adaptive way? Current tensor decomposition provides only a single layer of clusters. We argue that with the abundance of multimodal data and time-evolving networks nowadays, the ability to identify emerging hierarchies is important. To this effect, we propose RecTen, a recursive hierarchical soft clustering approach based on tensor decomposition. Our approach enables us to: (a) recursively decompose clusters identified in the previous step, and (b) identify the right conditions for terminating this process. In the absence of proper ground truth, we evaluate our approach with synthetic data and test its sensitivity to different parameters. We also apply RecTen on five real datasets which involve the activities of users in online discussion platforms, such as security forums. This analysis helps us reveal clusters of users with interesting behaviors, including but not limite...
2016
On the internet, social media is regarded as an important part of the technological landscape. Each user has their own content on social media, which represents their participation and involvement in information system development. Participation leads to involvement, which in turn establishes a relationship between the user and the system. Prediction and analysis of user activities from various domains have become a significant part of such systems, accounting for credibility, security, and other issues in upcoming decisions. Predicting user activity is important, and analyzing social media with current technical means in the blink of an eye is a challenging task. Various techniques and methods have been outlined that predict user activities while maintaining their credibility under every cyber norm. Recommender systems and Tensor Factorization are among the techniques that are used to predict,...
The Annals of Statistics
Recommender systems have been widely adopted by electronic commerce and entertainment industries for individualized prediction and recommendation, which benefit consumers and improve business intelligence. In this article, we propose an innovative method, namely the recommendation engine of multilayers (REM), for tensor recommender systems. The proposed method utilizes the structure of a tensor response to integrate information from multiple modes, and creates an additional layer of nested latent factors to accommodate between-subjects dependency. One major advantage is that the proposed method is able to address the "cold-start" issue in the absence of information from new customers, new products or new contexts. Specifically, it provides more effective recommendations through subgroup information. To achieve scalable computation, we develop a new algorithm for the proposed method, which incorporates a maximum block improvement strategy into the cyclic blockwise coordinate-descent algorithm. In theory, we investigate algorithmic properties for both global and local convergence, along with the asymptotic consistency of estimated parameters. Finally, the proposed method is applied to simulations and to IRI marketing data with 116 million observations of product sales. Numerical studies demonstrate that the proposed method outperforms existing competitors in the literature.
arXiv: Computer Vision and Pattern Recognition, 2016
A new submodule clustering method via sparse and low-rank representation for multi-way data is proposed in this paper. Instead of reshaping multi-way data into vectors, this method maintains their natural orders to preserve the intrinsic structures of the data, e.g., image data kept as matrices. To implement clustering, the multi-way data, viewed as tensors, are represented by the proposed tensor sparse and low-rank model to obtain their submodule representation, called a free module, which is finally used for spectral clustering. The proposed method extends the conventional subspace clustering method based on sparse and low-rank representation to multi-way data submodule clustering by incorporating the t-product operator. The new method is tested on several public datasets, including synthetic data, video sequences and toy images. The experiments show that the new method outperforms the state-of-the-art methods, such as Sparse Subspace Clustering (SSC), Low-Rank Representation (LRR), Ordered Subspace...
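The t-product operator named in the abstract above can be sketched in a few lines of NumPy using its standard Fourier-domain formulation: transform both tensors along the third mode, multiply matching frontal slices, and transform back. This shows only the operator, not the clustering method; the function name t_product and the random test tensors are assumptions for illustration, and real-valued inputs are assumed.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1, n2, n3) and B (n2, n4, n3), returning (n1, n4, n3).

    Computed in the Fourier domain along the third axis; assumes real inputs.
    """
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # Multiply matching frontal slices in the Fourier domain.
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2, 5))
C = t_product(A, B)

# Identity tensor for the t-product: identity matrix in the first frontal slice.
I = np.zeros((4, 4, 5))
I[:, :, 0] = np.eye(4)
A_times_I = t_product(A, I)   # reproduces A up to rounding error
```

Equivalently, the t-product is a circular convolution of frontal slices along the third mode, which is why the fast Fourier transform diagonalizes it.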
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015
Tensors or multiarray data are generalizations of matrices. Tensor clustering has become a very important research topic due to the intrinsically rich structures in real-world multiarray datasets. Subspace clustering based on vectorizing multiarray data has been extensively researched. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model. In contrast to existing techniques, we propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the so-called multinomial manifold, for which we investigate second-order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
Advances in Intelligent Systems and Computing, 2014
On the internet today, an overabundance of information can be accessed, making it difficult for users to process and evaluate options and make appropriate choices. This phenomenon is known as information overload. Over time, various methods of information filtering have been introduced in order to assist users in choosing what may be of interest to them. Recommender Systems (RS) are techniques for information filtering which play an important role in e-commerce, advertising, e-mail filtering, etc. Therefore, RS are an answer, though partial, to the problem of information overload. Recommendation algorithms need to be continuously updated because of a constant increase in both the quantity of information and the ways of accessing that information, which define the different contexts of information use. The search for more effective and more efficient methods than those currently known in the literature is also stimulated by the interests of industrial research in this field, as demonstrated by the Netflix Prize Contest, the open competition for the best algorithm to predict user ratings for films based on previous ratings. The contest showed the superiority of mathematical methods that discover the latent factors that drive user-item similarity over classical collaborative filtering algorithms. With the ever-increasing information available in digital archives and textual databases, the challenge of implementing personalized filters has become the challenge of designing algorithms able to manage huge amounts of data for the elicitation of user needs and preferences. In recent years, matrix factorization techniques have proved to be a quite promising solution to the problem of designing efficient filtering algorithms in the Big Data Era. The main contribution of this paper is an analysis of these methods, which focuses on tensor factorization techniques, as well as the definition of a method for tensor factorization suitable for recommender systems.
2017
The PARAFAC tensor decomposition has enjoyed increasing success in exploratory multi-aspect data mining scenarios. A major challenge remains the estimation of the number of latent factors (i.e., the rank) of the decomposition that yields high-quality, interpretable results. Previously, AutoTen was proposed: an automated tensor mining method that leverages a well-known quality heuristic from the field of Chemometrics, the Core Consistency Diagnostic (CORCONDIA), to automatically determine the rank for the PARAFAC decomposition. In this work, building upon AutoTen, we set out to explore the trade-off between 1) the interpretability of the results (as expressed by CORCONDIA), and 2) the predictive accuracy of the decomposition, towards improving rank estimation quality. Our preliminary results indicate that striking a good balance in that trade-off yields high-quality rank estimation, towards achieving unsupervised tensor mining.
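For readers unfamiliar with the PARAFAC (CP) model whose rank is being estimated, the following NumPy sketch builds a three-way tensor from R latent factors as a sum of R rank-one terms. It does not implement AutoTen or CORCONDIA; the factor sizes, the random factors, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 2                                  # number of latent factors (the rank), assumed
A = rng.standard_normal((4, R))        # mode-1 factor matrix
B = rng.standard_normal((5, R))        # mode-2 factor matrix
C = rng.standard_normal((6, R))        # mode-3 factor matrix

# PARAFAC model: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
X = np.einsum("ir,jr,kr->ijk", A, B, C)

# The mode-1 unfolding of a rank-R PARAFAC tensor has matrix rank at most R.
unfolded = X.reshape(X.shape[0], -1)
rank_mode1 = np.linalg.matrix_rank(unfolded)
```

Rank estimation is the inverse problem: given only X, possibly noisy, decide how many such rank-one terms are needed.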
IEEE Transactions on Signal Processing, 2000
Proceedings of the 13th annual ACM international conference on Multimedia, 2005
Lecture Notes in Computer Science, 2019
arXiv (Cornell University), 2022
Journal of the Royal Statistical Society Series B: Statistical Methodology
Pattern Analysis and Applications, 2020
Fundamenta Informaticae, 2009
Proceedings of the 2008 ACM conference on Recommender systems - RecSys '08, 2008
Lecture Notes in Computer Science, 2017
2014 International Joint Conference on Neural Networks (IJCNN), 2014
Proceedings of the ... AAAI Conference on Artificial Intelligence, 2017
Information Sciences, 2015
arXiv (Cornell University), 2020
PLOS ONE, 2019
Statistics and Computing, 2021
HAL (Le Centre pour la Communication Scientifique Directe), 2017
2011 International Conference on Advances in Social Networks Analysis and Mining, 2011
Advances in Neural Networks – ISNN 2019, 2019