Papers (not up-to-date) by Dmitry Ignatov
Frequent Itemset Mining for Clustering Near Duplicate Web Documents
A vast number of documents on the Web have duplicates, which poses a challenge for developing efficient methods to compute clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes with large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared to other established methods and software, such as biclustering, on the same datasets. The practical efficiency of different algorithms for computing frequent closed sets of attributes is also compared.
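As a minimal sketch of the core idea above, the following brute-force Python fragment computes closed attribute sets with large support over a toy binary document-attribute table; the documents, attributes, and the support threshold are illustrative assumptions, not the paper's experimental pipeline or its efficient closed-set miners.

```python
from itertools import combinations

# Toy binary data: each document is described by a set of attributes (e.g. shingles).
# The documents and the min_support threshold are illustrative assumptions.
docs = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b", "c"},
    "d3": {"a", "b"},
    "d4": {"b", "d"},
}
min_support = 2  # minimal extent size to report a cluster of similar documents

def extent(attrs):
    """Documents containing every attribute in attrs."""
    return {d for d, a in docs.items() if attrs <= a}

def closure(attrs):
    """Attributes shared by all documents in the extent of attrs."""
    ext = extent(attrs)
    return set.intersection(*(docs[d] for d in ext)) if ext else set()

all_attrs = set.union(*docs.values())
closed_sets = set()
for k in range(1, len(all_attrs) + 1):
    for cand in map(set, combinations(sorted(all_attrs), k)):
        if len(extent(cand)) >= min_support and closure(cand) == cand:
            closed_sets.add(frozenset(cand))

for c in sorted(closed_sets, key=len):
    print(sorted(c), "-> documents:", sorted(extent(c)))
```

Each reported closed attribute set plays the role of a cluster: its extent is the group of near-duplicate documents sharing exactly those attributes.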
Computing Research Repository, 2009
The problem of detecting terms that may be of interest to an advertiser is considered. If a company has already bought some advertising terms describing certain services, it is reasonable to find out which terms have been bought by competing companies; a part of them can be recommended to the company as future advertising terms. The goal of this work is to propose more interpretable recommendations based on FCA and association rules.
A novel approach to triclustering of three-way binary data is proposed. A tricluster is defined in terms of Triadic Formal Concept Analysis as a dense triset of a binary relation Y describing the relationship between objects, attributes and conditions. This definition is a relaxation of the triconcept notion and makes it possible to find all triclusters and triconcepts contained in triclusters of large datasets. This approach generalizes the similar study of concept-based biclustering.
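For readability, one common way to state the density behind such a "dense triset" is the following (the exact operators and thresholds of the proposed method may differ): for sets $A$, $B$, $C$ of objects, attributes and conditions,

$$\rho(A, B, C) \;=\; \frac{|\,Y \cap (A \times B \times C)\,|}{|A|\,|B|\,|C|},$$

and $(A, B, C)$ is accepted as a tricluster if $\rho(A, B, C) \ge \rho_{\min}$ for a chosen threshold $\rho_{\min} \in (0, 1]$; a triconcept corresponds to a maximal triple with density 1.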
Computing Research Repository, 2009
Owners of a website are often interested in the analysis of groups of users of their site. Information on these groups can help optimize the structure and contents of the site. In this paper we use an approach based on formal concepts for constructing taxonomies of user groups. To reduce the huge number of concepts that arise in applications, we employ the stability index of a concept, which describes how a group given by a concept extent differs from other such groups. We analyze the resulting taxonomies of user groups for three target websites.
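For reference, the standard stability index from the FCA literature (the paper's variant may differ in details): for a formal concept $(A, B)$ with extent $A$ and intent $B$,

$$\sigma(A, B) \;=\; \frac{|\{\, C \subseteq A \;:\; C' = B \,\}|}{2^{|A|}},$$

i.e. the fraction of subsets of the user group $A$ whose set of common attributes is still exactly $B$. Groups with high stability do not hinge on the presence of a few individual users.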
Papers by Dmitry Ignatov

Research Square, Jun 1, 2023
The article describes an approach to the construction of complex distributed cyber-physical systems with a high level of architectural dynamics built on fog and edge computing platforms. The key idea of the developed approach is to use digital twins as dynamic models of the observed and managed systems, which are kept up to date by processing the event flow received in the form of logs. A reference architecture of a dynamic runtime digital twin is proposed. Possible approaches to the synthesis of models that form the basis of digital twins are discussed. Examples of using the proposed approach to solve practical problems are given. The described approach may be of interest to specialists engaged in research and development of various kinds of information systems realized on IoT platforms, such as smart cities, smart transport, medical information systems, etc.

We show how purpose can be used as a central guiding principle for organizing knowledge about artifacts. It allows the actions in which the artifact participates to be related naturally to other objects. Similarly, the structure or parts of the artifact can also be related to the actions. A knowledge base called PurposeNet has been built using these principles. A comparison with other knowledge bases shows that it is a superior method in terms of coverage. It also makes automatic extraction of simple facts (or information) from text possible for populating a richly structured knowledge base. An experiment in domain-specific question answering from a given passage shows that PurposeNet, used along with scripts (or knowledge of stereotypical situations), can lead to substantially higher accuracy in question answering. In the domain of car racing, individually they produce correct answers to 50% and 37.5% of the questions, respectively, but together they produce 89% correct answers.

arXiv (Cornell University), Feb 23, 2016
Triadic Formal Concept Analysis (3FCA) was introduced by Lehmann and Wille almost two decades ago. Many researchers in Data Mining and Formal Concept Analysis work with the notions of closed sets, Galois and closure operators, and closure systems; however, even though different researchers actively work on mining triadic and n-ary relations, a proper closure operator for the enumeration of triconcepts, i.e. maximal triadic cliques of tripartite hypergraphs, has not been introduced to date. In this paper we show that the previously introduced operators for obtaining triconcepts and maximal connected and complete sets (MCCSs) are not always consistent, and we provide the reader with a definition of a valid closure operator and the associated set system. Moreover, we study the difficulties of related problems from order-theoretic and combinatorial points of view and provide the reader with justifications of the complexity classes of these problems.
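As a point of reference (the standard notion from Lehmann and Wille, not the new operator proposed in the paper): given a triadic context $(G, M, C, Y)$ with $Y \subseteq G \times M \times C$, a triple $(A_1, A_2, A_3)$ with $A_1 \subseteq G$, $A_2 \subseteq M$, $A_3 \subseteq C$ is a triconcept iff

$$A_1 \times A_2 \times A_3 \subseteq Y$$

and $(A_1, A_2, A_3)$ is maximal with respect to component-wise set inclusion among all triples satisfying this containment; these are exactly the maximal triadic cliques of the corresponding tripartite hypergraph.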
arXiv (Cornell University), Feb 23, 2014
This paper presents an analysis of data from a gift-exchange-game experiment. The experiment was described in 'The Impact of Social Comparisons on Reciprocity' by Gächter et al., 2012. Since this paper uses state-of-the-art data science techniques, the results provide a different point of view on the problem. As already shown in the relevant literature from experimental economics, human decisions deviate from rational payoff maximization. The average gift rate was 31%, and the gift rate was not zero under any condition. Further, we derive some specific findings and calculate their significance.

International Joint Conference on Artificial Intelligence, 2021
In this paper we study certain properties of the GreConD algorithm for Boolean matrix factorisation, a popular technique in Data Mining with binary relational data. This greedy algorithm was inspired by the fact that an optimal set of factors for Boolean matrix factorisation can be chosen among the formal concepts of the corresponding formal context. In particular, we consider one of the hardest cases (in terms of the number of possible factors), the so-called contranominal scales, and show that the output of GreConD is not optimal in this case. Moreover, we formally analyse its output by means of recurrences and generating functions and provide the reader with a closed form for the number of returned factors. We also provide an algorithm generating the optimal number of factors and the corresponding product matrices P and Q for the case of contranominal scales.
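The following is a minimal, hedged sketch of a GreConD-style greedy step applied to a small contranominal scale (the complement of an identity matrix); the tie-breaking, data, and output format are illustrative assumptions, and the paper's formal analysis and optimal algorithm are not reproduced here.

```python
def derive_down(I, attrs):
    """Objects (row indices) having all attributes in attrs."""
    return {i for i in range(len(I)) if all(I[i][j] for j in attrs)}

def derive_up(I, objs):
    """Attributes (column indices) shared by all objects in objs."""
    return {j for j in range(len(I[0])) if all(I[i][j] for i in objs)}

def greedy_concept_cover(I):
    """GreConD-style greedy Boolean factorisation: cover the 1s of I by formal concepts."""
    uncovered = {(i, j) for i, row in enumerate(I) for j, v in enumerate(row) if v}
    factors = []
    while uncovered:
        intent, best_gain = set(), 0
        improved = True
        while improved:
            improved = False
            for j in range(len(I[0])):
                if j in intent:
                    continue
                ext = derive_down(I, intent | {j})
                cand = derive_up(I, ext) if ext else set()
                gain = len({(i, k) for i in ext for k in cand} & uncovered)
                if gain > best_gain:  # extend the factor only while coverage grows
                    best_gain, best_intent, best_ext = gain, cand, ext
                    improved = True
            if improved:
                intent = best_intent
        factors.append((best_ext, intent))
        uncovered -= {(i, k) for i in best_ext for k in intent}
    return factors

# Contranominal scale of size 3: complement of the identity matrix.
I = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(greedy_concept_cover(I))  # list of (extent, intent) factors
```

Each factor pairs an extent (a set of objects) with a concept intent; the number of factors such a greedy cover returns on larger contranominal scales is the kind of quantity the paper analyses in closed form.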
Concept Lattices and their Applications, 2016
We propose a new algorithm for consensus clustering, FCA-Consensus, based on Formal Concept Analysis. As input, the algorithm takes T partitions of a certain set of objects obtained by the k-means algorithm after T runs from different initialisations. The resulting consensus partition is extracted from an antichain of the concept lattice built on a formal context objects × classes, where the classes are the set of all cluster labels from each initial k-means partition. We compare the results of the proposed algorithm in terms of the ARI measure with state-of-the-art algorithms on synthetic datasets. Under certain conditions, the best ARI values are demonstrated by FCA-Consensus.
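A hedged sketch of only the context-construction step (the antichain extraction from the concept lattice is omitted; object names and labelings are illustrative assumptions):

```python
# T partitions of the same objects, e.g. from T runs of k-means
# with different initialisations (illustrative labelings, not real data).
partitions = [
    {"o1": 0, "o2": 0, "o3": 1, "o4": 1},
    {"o1": 1, "o2": 1, "o3": 0, "o4": 2},
]

# Formal context objects x classes: attribute (t, c) reads
# "assigned to cluster c in run t".
objects = sorted(partitions[0])
attributes = sorted({(t, c) for t, p in enumerate(partitions) for c in p.values()})
crosses = {(o, (t, c)) for o in objects for (t, c) in attributes if partitions[t][o] == c}
print(len(objects), "objects,", len(attributes), "attributes,", len(crosses), "crosses")
```

The concept lattice of this context groups together objects that share cluster labels across runs; the consensus partition is then read off a suitable antichain of its concepts.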

Springer Proceedings in Mathematics & Statistics, 2020
This paper is related to the problem of finding the maximal quasi-bicliques in a bipartite graph (bigraph). A quasi-biclique in the bigraph is its "almost" complete subgraph. The relaxation of completeness can be understood in various ways; here, we assume that the subgraph is a γ-quasi-biclique if it lacks a certain number of edges to form a biclique such that its density is at least γ ∈ (0, 1]. For a bigraph and fixed γ, the problem of searching for the maximal quasi-biclique consists of finding a subset of vertices of the bigraph such that the induced subgraph is a quasi-biclique and its size is maximal. Several models based on Mixed Integer Programming (MIP) to search for a quasi-biclique are proposed and tested for efficiency. An alternative model inspired by biclustering is formulated and tested; this model simultaneously maximizes both the size of the quasi-biclique and its density, using a least-squares criterion similar to the one exploited by the TriBox triclustering method.
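In symbols, the density relaxation described above reads as follows (how the "size" of the subgraph is measured, e.g. $|A| + |B|$ or $|A|\cdot|B|$, is a modelling choice; the MIP formulations themselves are not reproduced here): for a bigraph $G = (U \cup V, E)$ and vertex subsets $A \subseteq U$, $B \subseteq V$, the induced subgraph is a $\gamma$-quasi-biclique if

$$\rho(A, B) \;=\; \frac{|\,E \cap (A \times B)\,|}{|A|\,|B|} \;\ge\; \gamma, \qquad \gamma \in (0, 1],$$

and the search problem is to maximise the size of $(A, B)$ subject to this density constraint.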
Concept Lattices and their Applications, 2015
In this work we propose and study an approach for collaborative filtering which is based on Boolean matrix factorisation and exploits additional (context) information about users and items. To avoid similarity loss in the case of a Boolean representation, we use an adjusted type of projection of a target user onto the obtained factor space. We have compared the proposed method with an SVD-based approach on the MovieLens dataset. The experiments demonstrate that the proposed method has better MAE and Precision and comparable Recall and F-measure. We also report an increase in quality in the presence of context information.

Applied Soft Computing, Jun 1, 2019
Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large to be implemented in real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks, including Long Short-Term Memory models. We pay particular attention to the high-dimensional output problem caused by the very large vocabulary size. We focus on effective compression methods in the context of their deployment on devices: pruning, quantization, and matrix decomposition approaches (low-rank factorization and tensor train decomposition, in particular). For each model we investigate the trade-off between its size, suitability for fast inference, and perplexity. We propose a general pipeline for applying the most suitable methods to compress recurrent neural networks for language modeling. The experimental study with the Penn Treebank (PTB) dataset shows that the most efficient results in terms of speed and compression-perplexity balance are obtained by matrix decomposition techniques.
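A minimal numpy sketch of the low-rank factorisation idea mentioned above (the matrix shape, rank, and random data are illustrative assumptions; pruning, quantization, and tensor-train decomposition from the paper's pipeline are not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10000, 650))  # e.g. an output projection onto a large vocabulary
rank = 64                              # illustrative target rank

# Truncated SVD: W ~= A @ B with A of shape (10000, rank) and B of shape (rank, 650).
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]
B = Vt[:rank, :]

compression = W.size / (A.size + B.size)
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"parameter compression: {compression:.1f}x, relative error: {rel_error:.3f}")
```

Replacing the single large matrix by the two factors trades a controlled amount of approximation error for a large reduction in parameters, which is the size-perplexity trade-off discussed above.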
Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural network compression. We find this method to be especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying, allowing even better sparsity and performance to be achieved. Our experiments demonstrate that more than 90% of the weights in both the encoder and decoder layers can be removed with a minimal quality loss.
This short paper is related to the problem of finding maximum quasi-bicliques in a bipartite graph (bigraph). A quasi-biclique in a bigraph is its "almost" complete subgraph; here, we assume that the subgraph is a quasi-biclique if it lacks γ·100% of the edges needed to become a biclique. The problem of finding the maximal quasi-biclique(s) consists of finding subset(s) of vertices of an input bigraph such that the subgraph induced by these subsets is a quasi-biclique and its size is maximal. A model based on mixed integer programming (MIP) to search for a quasi-biclique is proposed and tested. Another variant of the model is also tested, one that simultaneously maximizes both the size of the quasi-biclique and its density, using a least-squares criterion similar to the one exploited by the Tri-Box method for tricluster generation. Therefore, the output patterns can be called large dense biclusters as well.

International Joint Conference on Artificial Intelligence, 2018
In this paper, we explain how the Galois connection and related operators between sets of users and items naturally arise in user-item data for forming neighbourhoods of a target user or item for Collaborative Filtering. We compare the properties of these operators and their applicability in simple collaborative user-to-user and item-to-item settings. Moreover, we propose a new neighbourhood-forming operator based on pair-wise similarity ranking of users, which takes an intermediate place between the studied closure operators and their relaxations in terms of neighbourhood size and demonstrates a comparatively good Precision-Recall trade-off. In addition, we compare the studied neighbourhood-forming operators in the collaborative filtering setting against a simple but strong benchmark, the SlopeOne algorithm, using bimodal cross-validation on the MovieLens dataset.
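A small sketch of the derivation ("prime") operators from which such neighbourhoods can be formed (the toy user-item relation is an illustrative assumption; the paper's ranked relaxation is not reproduced):

```python
# Toy binary user-item relation: which items each user has rated (illustrative).
R = {
    "u1": {"i1", "i2", "i3"},
    "u2": {"i1", "i2", "i3", "i4"},
    "u3": {"i2", "i4"},
}

def items_of(users):
    """Items common to all given users (the 'prime' of a user set)."""
    return set.intersection(*(R[u] for u in users)) if users else set()

def users_of(items):
    """Users possessing all given items (the 'prime' of an item set)."""
    return {u for u, its in R.items() if items <= its}

target = "u1"
# Closure-style neighbourhood: users who share all the items of the target user.
neighbourhood = users_of(items_of({target})) - {target}
print(neighbourhood)  # {'u2'}
```

Relaxing the requirement, e.g. to users sharing at least a fixed number of the target's items, enlarges the neighbourhood; the operator proposed in the paper sits between such relaxations and the strict closure shown here.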

Concept Lattices and their Applications, 2020
We propose the use of two power indices from cooperative game theory and public choice theory for ranking attributes of closed sets, namely intents of formal concepts (or closed itemsets). The introduced indices are related to extensional concept stability and are based on counting generators, especially those that contain a selected attribute. The introduction of such indices is motivated by so-called interpretable machine learning, which supposes that we have not only the class membership decision of a trained model for a particular object, but also a set of attributes (in the form of JSM-hypotheses or other patterns) along with the individual importance of their single attributes (or more complex constituent elements). We characterise the computation of the Shapley and Banzhaf values of a formal concept in terms of minimal generators and their order filters, provide the reader with their properties important for computation purposes, and show experimental results.
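For reference, the classical Shapley value on which such attribute rankings build (stated in its standard game-theoretic form; the concept-specific characteristic function $v$, e.g. defined via generators as in the paper, is not reproduced here): for an attribute $m$ in an intent $B$,

$$\varphi_m(v) \;=\; \sum_{S \subseteq B \setminus \{m\}} \frac{|S|!\,\bigl(|B| - |S| - 1\bigr)!}{|B|!}\,\bigl(v(S \cup \{m\}) - v(S)\bigr),$$

i.e. the average marginal contribution of $m$ over all orders in which the attributes of $B$ can be added.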
arXiv (Cornell University), Jul 20, 2015
We propose a new algorithm for recommender systems with numeric ratings which is based on Pattern Structures (RAPS). As input, the algorithm takes a rating matrix, e.g., one containing movies rated by users. For a target user, the algorithm returns a rated list of items (movies) based on the user's previous ratings and the ratings of other users. We compare the results of the proposed algorithm in terms of precision and recall measures with Slope One, one of the state-of-the-art item-based algorithms, on the MovieLens dataset, and RAPS demonstrates the best or comparable quality.
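A hedged sketch of the interval pattern-structure similarity operation commonly used for numeric ratings (component-wise interval meet); the rating vectors are illustrative, and this is not necessarily the exact description operator used in RAPS:

```python
# Ratings of the same items by two users; None marks a missing rating (illustrative).
u = [5, 3, None, 4]
v = [4, 3, 2, None]

def interval_meet(a, b):
    """Component-wise similarity of two rating vectors: the smallest interval
    containing both ratings, skipping items missing for either user."""
    return [
        None if x is None or y is None else (min(x, y), max(x, y))
        for x, y in zip(a, b)
    ]

print(interval_meet(u, v))  # [(4, 5), (3, 3), None, None]
```

Narrow intervals on many shared items indicate users with similar tastes, whose ratings can then be aggregated for the target user.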
arXiv (Cornell University), Feb 23, 2016
Triadic Formal Concept Analysis (3FCA) was introduced by Lehmann and Wille almost two decades ago, but even though different researchers actively work on this branch of FCA, a proper closure operator for the enumeration of triconcepts, i.e. maximal triadic cliques of tripartite hypergraphs, has not been introduced to date. In this paper we show that the previously introduced operators for obtaining triconcepts and maximal connected and complete sets (MCCS) are not always consistent and provide the reader with the definition of a valid closure operator and the associated set system.

arXiv (Cornell University), Sep 25, 2022
In this paper we count set closure systems (also known as Moore families) for the case when all singleton sets are closed. In particular, we give the numbers of such strict (empty set included) and non-strict families for a base set of size n = 6. We also provide the number of such inequivalent Moore families with respect to all permutations of the base set up to n = 6. A search in the OEIS and the existing literature revealed that the numbers found coincide with the entry for D. M. Davis' set union lattice (A235604, up to n = 5) and with |L_n|, the number of atomic lattices on n atoms, obtained by S. Mapes (up to n = 6), respectively. Thus we study all these cases, establish one-to-one correspondences between them via Galois adjunctions and Formal Concept Analysis, and provide the reader with two of our enumerative algorithms as well as the results of these algorithms used for additional tests. Other results include the largest size of intersection-free families for n = 6 together with our conjecture for n = 7, an upper bound for the number of atomic lattices L_n, and some structural properties of L_n based on the theory of extremal lattices.
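A brute-force sanity check for very small n, under one natural reading of the definitions (the family contains the base set and all singletons and is closed under pairwise intersection); this is not one of the paper's enumerative algorithms, and the strict/non-strict distinction is not modelled:

```python
from itertools import chain, combinations

def count_closure_systems_with_singletons(n):
    """Count families on {0,..,n-1} containing the base set and all singletons
    and closed under pairwise intersection (brute force, tiny n only)."""
    subsets = [frozenset(s) for s in chain.from_iterable(
        combinations(range(n), k) for k in range(n + 1))]
    required = {frozenset(range(n))} | {frozenset({i}) for i in range(n)}
    optional = [s for s in subsets if s not in required]

    count = 0
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            fam = required | set(extra)
            if all(a & b in fam for a in fam for b in fam):
                count += 1
    return count

for n in range(1, 5):
    print(n, count_closure_systems_with_singletons(n))
```

Note that for n ≥ 2 the empty set is forced into every such family, since the intersection of two distinct singletons is empty.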