2013, arXiv (Cornell University)
How does one search for a needle in a multi-dimensional haystack without knowing what a needle is and without knowing if there is one in the haystack? This kind of problem requires a paradigm shift, away from hypothesis-driven searches of the data and towards a methodology that lets the data speak for itself. Dynamic Quantum Clustering (DQC) is such a methodology. DQC is a powerful visual method that works with big, high-dimensional data. It exploits variations of the density of the data (in feature space) and unearths subsets of the data that exhibit correlations among all the measured variables. The outcome of a DQC analysis is a movie that shows how and why sets of data-points are eventually classified as members of simple clusters or as members of what we call extended structures. This allows DQC to be successfully used in a non-conventional exploratory mode where one searches data for unexpected information without the need to model the data. We show how this works for big, complex, real-world datasets that come from five distinct fields: x-ray nano-chemistry, condensed matter, biology, seismology and finance. These studies show how DQC excels at uncovering unexpected, small but meaningful subsets of the data that contain important information. We also establish an important new result: namely, that big, complex datasets often contain interesting structures that will be missed by many conventional clustering techniques. Experience shows that these structures appear frequently enough that it is crucial to know they can exist, and that when they do, they encode important hidden information. In short, we not only demonstrate that DQC can be flexibly applied to datasets that present significantly different challenges, we also show how a simple analysis can be used to look for the needle in the haystack, determine what it is, and find what this means.
2008
Nuclear Physics B - Proceedings Supplements, 2010
Last year, in 2008, I gave a talk titled Quantum Calisthenics. This year I am going to tell you about how the work I described then has spun off in a most unlikely direction. What I am going to talk about is how one maps the problem of finding clusters in a given data set into a problem in quantum mechanics. I will then use the tricks I described so that quantum evolution lets the clusters come together on their own.
Physical Review E, 2009
A given set of data-points in some feature space may be associated with a Schrödinger equation whose potential is determined by the data. This is known to lead to good clustering solutions. Here we extend this approach into a full-fledged dynamical scheme using a time-dependent Schrödinger equation. Moreover, we approximate this Hamiltonian formalism by a truncated calculation within a set of Gaussian wave functions (coherent states) centered around the original points. This allows for analytic evaluation of the time evolution of all such states, opening up the possibility of exploration of relationships among data-points through observation of varying dynamical-distances among points and convergence of points into clusters. This formalism may be further supplemented by preprocessing, such as dimensional reduction through singular value decomposition or feature filtering. PACS numbers: 89.75.Fb, 89.90.+n, 95.75.Pq, 89.75.Kd
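The data-determined potential mentioned above can be illustrated with a minimal sketch. It assumes the static quantum-clustering construction, in which a sum of Gaussians centered on the data points is treated as an eigenstate of the Hamiltonian $-\tfrac{\sigma^2}{2}\nabla^2 + V$ and the equation is solved for $V$. It does not implement the time-dependent evolution or the coherent-state truncation described in the abstract. The function name `qc_potential`, the width `sigma`, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def qc_potential(x, data, sigma):
    """Return V(x) - E for the Gaussian-sum wave function built from `data`.

    psi(x) = sum_i exp(-|x - x_i|^2 / (2 sigma^2)); requiring
    (-sigma^2/2 * Laplacian + V) psi = E psi gives
    V(x) - E = -d/2 + sum_i |x - x_i|^2 w_i(x) / (2 sigma^2 psi(x)),
    where w_i(x) is the i-th Gaussian term and d is the dimension.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))       # (m, d) query points
    data = np.asarray(data, dtype=float)                 # (n, d) data points
    sq = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)  # (m, n)
    w = np.exp(-sq / (2.0 * sigma**2))                   # Gaussian weights
    psi = w.sum(axis=1)                                  # wave function values
    d = data.shape[1]
    return -d / 2.0 + (sq * w).sum(axis=1) / (2.0 * sigma**2 * psi)

# Toy data (hypothetical): two tight groups of points in two dimensions.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
v_centre, v_between = qc_potential([[0.0, 0.0], [2.5, 2.5]], points, sigma=1.0)
print(v_centre < v_between)  # the potential is lower inside a group
```

In this construction, minima of the potential mark candidate cluster centers, so points can be grouped by the minimum they move towards. Query points very far from all data can make `psi` underflow to zero, so a practical version would need to guard that division.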
Mathematics, 2017
Data clustering is a vital tool for data analysis. This work shows that some existing useful methods in data clustering are actually based on quantum mechanics and can be assembled into a powerful and accurate data clustering method where the efficiency of computational quantum chemistry eigenvalue methods is therefore applicable. These methods can be applied to scientific data, engineering data and even text.
Companion of the 2023 International Conference on Management of Data
In the last few years, the field of quantum computing has experienced remarkable progress. Prototypes of quantum computers already exist and have been made available to users through cloud services (e.g., IBM Q experience, Google quantum AI, or Xanadu quantum cloud). While fault-tolerant and large-scale quantum computers are not available yet (and may not be for a long time, if ever), the potential of this new technology is undeniable. Quantum algorithms either provably outperform classical approaches for several tasks or cannot be efficiently simulated by classical means under reasonable complexity-theoretic assumptions. Even imperfect current-day technology is speculated to exhibit computational advantages over classical systems. Recent research is using quantum computers to solve machine learning tasks. Meanwhile, the database community has already successfully applied various machine learning algorithms for data management tasks, so combining the fields seems to be a promising endeavour. However, quantum machine learning is a new research field for most database researchers. In this tutorial, we provide a fundamental introduction to quantum computing and quantum machine learning and show the potential benefits and applications for database research. In addition, we demonstrate how to apply quantum machine learning to the join order optimization problem in databases. CCS CONCEPTS • Computer systems organization → Quantum computing; • Computing methodologies → Machine learning; • Information systems → Data management systems.
arXiv: Quantum Physics, 2018
This text aims to present and explain quantum machine learning algorithms to a data scientist in an accessible and consistent way. The algorithms and equations presented are not written in rigorous mathematical fashion; instead, the emphasis is placed on examples and step-by-step explanation of difficult topics. This contribution gives an overview of selected quantum machine learning algorithms; in addition, a method of scores extraction for the quantum PCA algorithm is proposed and a new cost function for feed-forward quantum neural networks is introduced. The text is divided into four parts: the first part explains the basic quantum theory, then quantum computation and quantum computer architecture are explained in section two. The third part presents quantum algorithms which will be used as subroutines in quantum machine learning algorithms. Finally, the fourth section describes quantum machine learning algorithms with the use of knowledge accumulated in previous parts.
2012
In explorative data analysis, the data under consideration often resides in a high-dimensional (HD) data space. Currently many methods are available to analyze this type of data. So far, proposed automatic approaches include dimensionality reduction and cluster analysis, whereby visual-interactive methods aim to provide effective visual mappings to show, relate, and navigate HD data. Furthermore, almost all of these methods conduct the analysis from a singular perspective, meaning that they consider the data in either the original HD data space, or a reduced version thereof. Additionally, HD data spaces often consist of combined features that measure different properties, in which case the particular relationships between the various properties may not be clear to the analysts a priori since it can only be revealed if appropriate feature combinations (subspaces) of the data are taken into consideration. Considering just a single subspace is, however, often not sufficient since different subspaces may show complementary, conjointly, or contradicting relations between data items. Useful information may consequently remain embedded in sets of subspaces of a given HD input data space. Relying on the notion of subspaces, we propose a novel method for the visual analysis of HD data in which we employ an interestingness-guided subspace search algorithm to detect a candidate set of subspaces. Based on appropriately defined subspace similarity functions, we visualize the subspaces and provide navigation facilities to interactively explore large sets of subspaces. Our approach allows users to effectively compare and relate subspaces with respect to involved dimensions and clusters of objects. We apply our approach to synthetic and real data sets. We thereby demonstrate its support for understanding HD data from different perspectives, effectively yielding a more complete view on HD data.
arXiv (Cornell University), 2022
There are several approaches to solving the Quantitative Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining using neural networks. Among the statistical methods, one should consider regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) or partial least squares. These approaches have low explanatory capability or none at all. This paper attempts to establish a new approach to solving QSAR problems using descriptive data mining. This way, the relationship between the chemical properties and the activity of a substance would be comprehensibly modeled.
npj Quantum Information, 2021
Quantum kernel methods show promise for accelerating data analysis by efficiently learning relationships between input data points that have been encoded into an exponentially large Hilbert space. While this technique has been used successfully in small-scale experiments on synthetic datasets, the practical challenges of scaling to large circuits on noisy hardware have not been thoroughly addressed. Here, we present our findings from experimentally implementing a quantum kernel classifier on real high-dimensional data taken from the domain of cosmology using Google’s universal quantum processor, Sycamore. We construct a circuit ansatz that preserves kernel magnitudes that typically otherwise vanish due to an exponentially growing Hilbert space, and implement error mitigation specific to the task of computing quantum kernels on near-term hardware. Our experiment utilizes 17 qubits to classify uncompressed 67 dimensional data resulting in classification accuracy on a test set that is ...
Research Square (Research Square), 2024
This research explores the potential of quantum computing in data analysis, focusing on the efficient analysis of high-dimensional quantum datasets using dimensionality reduction techniques. The study aims to fill the knowledge gap by developing robust quantum dimensionality reduction techniques that can mitigate noise and errors. The research methodology involved a comprehensive review and analysis of existing quantum dimensionality reduction techniques, such as quantum principal component analysis, quantum linear discriminant analysis and quantum generative models. The study also explored the limitations imposed by NISQ devices and proposed strategies to adapt these techniques to work efficiently within these constraints. The key results demonstrate the potential of quantum dimensionality reduction techniques to effectively reduce the dimensionality of high-dimensional quantum datasets while preserving critical quantum information. The evaluation of quantum principal component analysis, quantum linear discriminant analysis and quantum generative models showed their effectiveness in improving quantum data analysis, particularly in improving simulation speed and predicting properties. Despite the challenges posed by noise and errors, robust quantum dimensionality reduction methods showed promise in mitigating these effects and preserving quantum information. Finally, this research contributes to the advancement of quantum data analysis by presenting a comprehensive analysis of quantum dimensionality reduction techniques and their applications. It highlights the importance of developing robust quantum feature learning methods that can operate efficiently in noisy quantum environments, especially in the NISQ era.
2023
Quantum and quantum-inspired machine learning has emerged as a promising and challenging research field due to the increased popularity of quantum computing, especially with near-term devices. Theoretical contributions point toward generative modeling as a promising direction to realize the first examples of real-world quantum advantages from these technologies. A few empirical studies also demonstrate such potential, especially when considering quantum-inspired models based on tensor networks. In this work, we apply tensor-network-based generative models to the problem of molecular discovery. In our approach, we utilize two small molecular datasets: a subset of 4989 molecules from the QM9 dataset and a small in-house dataset of 516 validated antioxidants from TotalEnergies. We compare several tensor network models against a generative adversarial network using different sample-based metrics, which reflect their learning performances on each task, and multi-objective performances using 3 relevant molecular metrics per task. We also combine the output of the models and demonstrate empirically that such a combination can be beneficial, advocating for the unification of classical and quantum(-inspired) generative learning.
International Journal of Knowledge Content Development & Technology , 2021
The study provides a quantitative and qualitative description of global research in the domain of quantum machine learning (QML) as a way to understand the status of research in the subject at the global, national, institutional, and individual author level. The data for the study was sourced from the Scopus database for the period 1999-2020. The study analyzed global research output (1374 publications) and global citations (22434 citations) to measure research productivity and performance on metrics. In addition, the study carried out bibliometric mapping of the literature to visually represent network relationships between key countries, institutions, authors, and significant keywords in QML research. The study finds that the USA and China lead the world ranking in QML research, accounting for 32.46% and 22.56% share respectively in the global output. The top 25 global organizations and authors lead with 35.52% and 16.59% global share respectively. The study also tracks key research areas, key global players, most significant keywords, and most productive source journals. The study observes that QML research is gradually emerging as an interdisciplinary area of research in computer science, but the body of its literature that has appeared so far is very small and insignificant even though 22 years have passed since the appearance of its first publication. Certainly, QML as a research subject at present is at a nascent stage of its development.
arXiv (Cornell University), 2016
International Journal of Information Quality, 2007
The use of tools in data and information quality diagnosis and improvement projects is desirable to automate clerical tasks and in some cases it is mandatory to be effective. In this work we present a toolkit that we developed in the context of the NEAT methodology, that provides a systematic way of assessing Data Quality. The novelty of the proposal is given by the construction of many independent single-purpose tools, which are combined into a powerful kit, and its insertion in the context of a methodology that follows the best practices accepted in the community. We also present the outcome of its use in actual projects.
ArXiv, 2018
Clustering is a complex process in finding the relevant hidden patterns in unlabeled datasets, broadly known as unsupervised learning. Support vector clustering algorithm is a well-known clustering algorithm based on support vector machines and Gaussian kernels. In this paper, we have investigated the support vector clustering algorithm in quantum paradigm. We have developed a quantum algorithm which is based on quantum support vector machine and the quantum kernel (Gaussian kernel and polynomial kernel) formulation. The investigation exhibits approximately exponential speed up in the quantum version with respect to the classical counterpart.
The drug discovery process is a rigorous and time-consuming endeavor, typically requiring several years of extensive research and development. Although classical machine learning (ML) has proven successful in this field, its computational demands in terms of speed and resources are significant. In recent years, researchers have sought to explore the potential benefits of quantum computing (QC) in the context of ML, leading to the emergence of Quantum Machine Learning (QML) as a distinct research field. The objective of the current study is twofold: first, to present a review of the proposed QML algorithms for application in the drug discovery pipeline, and second, to compare QML algorithms with their classical and hybrid counterparts in terms of their efficiency. A query-based search of various databases took place, and five different categories of algorithms were identified in which QML was implemented. The majority of QML applications in drug discovery are primarily focused on the...
2022
One of the most promising areas of research to obtain practical advantage is Quantum Machine Learning which was born as a result of cross-fertilisation of ideas between Quantum Computing and Classical Machine Learning. In this paper, we apply Quantum Machine Learning (QML) frameworks to improve binary classification models for noisy datasets which are prevalent in financial datasets. The metric we use for assessing the performance of our quantum classifiers is the area under the receiver operating characteristic curve (ROC/AUC). By combining such approaches as hybrid-neural networks, parametric circuits, and data re-uploading we create QML inspired architectures and utilise them for the classification of non-convex 2 and 3-dimensional figures. An extensive benchmarking of our new FULL HYBRID classifiers against existing quantum and classical classifier models, reveals that our novel models exhibit better learning characteristics to asymmetrical Gaussian noise in the dataset compared...
ArXiv, 2019
Quantum Clustering is a powerful method to detect clusters in data with mixed density. However, it is very sensitive to a length parameter that is inherent to the Schrödinger equation. In addition, linking data points into clusters requires local estimates of covariance that are also controlled by length parameters. This raises the question of how to adjust the control parameters of the Schrödinger equation for optimal clustering. We propose a probabilistic framework that provides an objective function for the goodness-of-fit to the data, enabling the control parameters to be optimised within a Bayesian framework. This naturally yields probabilities of cluster membership and data partitions with specific numbers of clusters. The proposed framework is tested on real and synthetic data sets, assessing its validity by measuring concordance with known data structure by means of the Jaccard score (JS). This work also proposes an objective way to measure performance in unsupervised learni...
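As an illustration of how concordance between a clustering and a known data structure might be scored, here is a minimal sketch of a pair-counting Jaccard score. The abstract does not say which variant of the Jaccard score is used, so the pair-counting definition below is an assumption, and the function name `pair_jaccard` is hypothetical.

```python
from itertools import combinations

def pair_jaccard(labels_true, labels_pred):
    """Pair-counting Jaccard score between two labelings of the same items.

    Counts pairs of items placed together in both labelings, divided by
    pairs placed together in at least one of them. Label names themselves
    do not matter, only which items share a label.
    """
    both = only_true = only_pred = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            both += 1
        elif same_true:
            only_true += 1
        elif same_pred:
            only_pred += 1
    denom = both + only_true + only_pred
    # If no pair is ever grouped together, the two labelings agree trivially.
    return both / denom if denom else 1.0

# Identical partitions with swapped label names still score 1.0.
print(pair_jaccard([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the score depends only on co-membership of pairs, it can compare partitions with different numbers of clusters. The loop is quadratic in the number of items, which is fine for small examples but would need a contingency-table formulation for large datasets.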
npj Quantum Information, 2021
Tensor Networks, a numerical tool originally designed for simulating quantum many-body systems, have recently been applied to solve Machine Learning problems. Exploiting a tree tensor network, we apply a quantum-inspired machine learning technique to a very important and challenging big data problem in high-energy physics: the analysis and classification of data produced by the Large Hadron Collider at CERN. In particular, we present how to effectively classify so-called b-jets, jets originating from b-quarks from proton–proton collisions in the LHCb experiment, and how to interpret the classification results. We exploit the Tensor Network approach to select important features and adapt the network geometry based on information acquired in the learning process. Finally, we show how to adapt the tree tensor network to achieve optimal precision or fast response in time without the need of repeating the learning process. These results pave the way to the implementation of high-frequenc...