QML Classifier
Research
Abstract
Quantum machine learning (QML) algorithms have demonstrated the power of quantum computing for solving complex problems involving big data in certain tasks. In this study, we explore the capabilities of QML for classifying large real-world biological datasets covering ten different cancer types based on gene expression values. By comparing the classification results of the quantum algorithm with those of classical approaches, we show that the QML algorithm achieves overall comparable and reliable results. Moreover, we identify novel biomarkers that can contribute to the understanding of cancer biology. The expression changes of some of these biomarkers are consistent with changes in promoter DNA methylation. Our findings highlight the potential of QML in cancer classification and biomarker discovery, paving the way for future advancements in other disease research and clinical applications.
Article highlights
• QML can be applied to real-world datasets to classify cancer types and identify biomarkers.
• QSVM outperformed some classical models in classifying ten cancer datasets.
• Novel biomarkers were identified using a quantum machine learning approach.
• Findings demonstrate the potential of QML in medical research and biomarker discovery.
1 Introduction
Quantum machine learning (QML) is an emerging and rapidly growing research field at the intersection of quantum computing and machine learning (ML). In recent years, QML has attracted considerable attention because of its potential to deliver faster problem-solving, higher accuracy, and lower energy consumption in certain tasks. Quantum computing, based on principles of quantum mechanics such as superposition and entanglement, provides enhancements over traditional computing capabilities [1–3]. Owing to superposition, quantum bits (qubits) can exist in multiple states simultaneously, allowing quantum computers to process a vast number of possibilities at once. This capability enables them to explore numerous combinations and find solutions much faster for specific problems. In addition, quantum entanglement allows qubits to be correlated with each other in ways that classical bits cannot be. Such quantum correlations can improve the ability of quantum algorithms to model complex relationships and interactions. Together, these properties may allow quantum computers to handle enormous datasets and high-dimensional spaces more effectively than classical computers, which makes quantum computers and quantum algorithms well suited to applications in machine learning and optimization.
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s42452-024-06220-6.
Research Discover Applied Sciences (2024) 6:513 | https://doi.org/10.1007/s42452-024-06220-6
On the other hand, machine learning algorithms create models that learn and identify patterns from data in diverse
domains, such as scientific research, technical applications, and industrial settings [4–6]. In the healthcare domain,
machine learning algorithms have been widely employed for classifying biological data, identifying novel biomarkers,
and exploring new disease subtypes.
Despite the significant success of traditional machine learning models, these approaches face challenges in handling large datasets and complex models that require extensive computation. Processing large datasets can be time-consuming and expensive, which limits their scalability. They may also struggle with certain types of data that contain complex relationships and patterns, which restricts their effectiveness. Quantum machine learning offers a promising solution by leveraging the unique properties of quantum systems to improve computational efficiency, handle complex data, and enhance scalability [7].
QML has significant potential to improve the analysis of biological and medical data, with the aim of minimizing medical errors [8]. Harnessing QML algorithms could therefore improve the analysis of medical data for the detection of early-stage diseases, which in turn can improve patient management, reduce costs, and ultimately lead to better treatment outcomes [9].
In a study that used QML to classify a particular type of heart disease, quantum classifiers were compared with classical algorithms on a cardiovascular dataset, and the authors reported an overall improvement in accuracy for the QML algorithms [10]. One popular QML algorithm for classification tasks is the Quantum Support Vector Machine (QSVM), which is considered a quantum counterpart of the classical Support Vector Machine (SVM) algorithm. Owing to its use of quantum computation, QSVM has the potential to improve classification metrics and thereby support better identification and diagnosis.
In QSVM, quantum kernels are used for computations that are difficult to perform with related classical methods [11, 12]. Quantum kernels allow QSVM to exploit the power of quantum computing in machine learning, particularly in the evaluation of kernel functions [13]. In addition, a kernel-based support vector machine was reported to achieve higher accuracy than the classical SVM [14].
For real-world biological systems, with their particular complexity and high dimensionality, a support vector machine with quantum kernel functions was employed to investigate the effect of feature engineering on classification accuracy [15]. When implementing quantum algorithms, it is essential to take the current limitations of quantum computation into account. The full potential of quantum methods is not yet universally accessible, and the results achieved so far may not generalize across all systems. Hence, superior performance of quantum models over classical methods has so far been observed only in certain cases. For instance, an evaluation of various quantum machine learning techniques on five distinct datasets found that, on average, the QSVM algorithm outperformed its classical counterparts in accuracy by a few percentage points [16].
In contrast, in another experiment on a diabetes dataset, a comparative analysis of different QML models and a classical algorithm showed that the classical method was slightly superior to the quantum algorithms in overall accuracy [17].
In this research, we implemented the quantum support vector machine algorithm on high-dimensional real-world biological datasets to classify various cancer types and identify novel biomarkers. Comparing the results of the quantum machine learning algorithm with those of the classical ones, the overall classification analysis showed that the quantum model achieved superior results for a noticeable number of datasets. Given the limited number of qubits on current quantum computers and the high dimensionality of gene expression features, we executed our QSVM algorithm on a quantum circuit simulator running on a classical computer.
2 Methods
In this section, we present the quantum machine learning method we employed to classify ten cancers and identify novel biomarkers. We begin with an overview of the fundamental concepts of quantum computing. Next, we introduce quantum machine learning, including the use of quantum kernels. To compare the results of our QSVM algorithm, we applied several classical machine learning models: SVM (kernel: 'linear'), Random Forest (n_estimators: 100, min_samples_split: 2, bootstrap: True), Extreme Gradient Boosting (n_estimators: 100, max_depth: 6, learning_rate: 0.3, gamma: 0), Decision Tree (min_samples_split: 2, min_samples_leaf: 1, criterion: 'gini'), and Logistic Regression (C: 1.0, penalty: 'l2', solver: 'lbfgs', max_iter: 100, fit_intercept: True). Finally, we present and discuss the outcomes.
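The classical baselines listed above could be instantiated roughly as follows. This is a minimal sketch assuming scikit-learn estimators and, optionally, `XGBClassifier` from the separate `xgboost` package; the hyperparameter values are the ones listed in the text, while the helper name `make_classical_models`, `probability=True` on the SVM (so that AUC can be computed from probabilities), and the fixed `random_state` are our assumptions rather than settings reported for this study.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_classical_models(random_state=0):
    """Classical baselines with the hyperparameters listed in the text.

    Assumptions (not from the paper): probability=True on SVC so that AUC
    can be computed from predicted probabilities, and a fixed random_state.
    """
    models = {
        "SVM": SVC(kernel="linear", probability=True, random_state=random_state),
        "RF": RandomForestClassifier(
            n_estimators=100, min_samples_split=2, bootstrap=True, random_state=random_state
        ),
        "DT": DecisionTreeClassifier(
            min_samples_split=2, min_samples_leaf=1, criterion="gini", random_state=random_state
        ),
        "LR": LogisticRegression(
            C=1.0, penalty="l2", solver="lbfgs", max_iter=100, fit_intercept=True
        ),
    }
    try:  # XGBoost is a separate package; skip it if it is not installed
        from xgboost import XGBClassifier

        models["XGBoost"] = XGBClassifier(
            n_estimators=100, max_depth=6, learning_rate=0.3, gamma=0
        )
    except ImportError:
        pass
    return models
```

Each model would then be fitted on the (SMOTE-balanced) training split and scored on the held-out test split.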
2.1 Quantum Computing
In classical computing, information is stored in bits, each either 0 or 1. In quantum computing, however, the basic unit of information is the qubit (quantum bit). Mathematically, a qubit is represented as a vector in a two-dimensional complex Hilbert space with the computational basis $|0\rangle$ and $|1\rangle$ [2]. Unlike a classical bit, a qubit can be in the state $|0\rangle$, the state $|1\rangle$, or a superposition, that is, a linear combination of the basis states, $c_0|0\rangle + c_1|1\rangle$, where the complex amplitudes satisfy $|c_0|^2 + |c_1|^2 = 1$.
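As a small numerical illustration of the state $c_0|0\rangle + c_1|1\rangle$, the sketch below (assuming NumPy; the helper names `qubit` and `measurement_probabilities` are ours) stores a qubit as a normalized length-2 complex vector and applies the Born rule, under which a measurement yields 0 with probability $|c_0|^2$ and 1 with probability $|c_1|^2$.

```python
import numpy as np


def qubit(c0, c1):
    """Return the normalized state c0|0> + c1|1> as a length-2 complex vector."""
    psi = np.array([c0, c1], dtype=complex)
    norm = np.linalg.norm(psi)
    if norm == 0:
        raise ValueError("at least one amplitude must be non-zero")
    return psi / norm  # enforces |c0|^2 + |c1|^2 = 1


def measurement_probabilities(psi):
    """Born rule: P(0) = |c0|^2 and P(1) = |c1|^2."""
    return np.abs(psi) ** 2


basis_zero = qubit(1, 0)              # |0>
equal_superposition = qubit(1, 1j)    # (|0> + i|1>) / sqrt(2)
probs = measurement_probabilities(equal_superposition)
```

For the equal superposition both outcomes have probability 1/2, even though the relative phase $i$ makes it a different state from $(|0\rangle + |1\rangle)/\sqrt{2}$.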
Quantum entanglement, a significant phenomenon in quantum mechanics, arises from superposition in multi-qubit systems. It refers to correlations between two or more quantum systems whose joint state cannot be written as a product of the individual states; these correlations persist even when the systems are physically separated by a great distance. Measuring one entangled particle collapses the joint superposition state, so the outcomes of measurements on the others are correlated with the first result. Describing the joint state of n qubits in general requires $2^n$ complex amplitudes, and this is the sense in which entangled qubit states can carry more information than individual ones. This property gives quantum computing the potential for exponential speedup in solving certain traditionally difficult problems.
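The "cannot be written as a product" criterion can be checked numerically for two qubits. In this sketch (assuming NumPy; helper names are ours), the four amplitudes are reshaped into a 2 × 2 matrix whose singular values are the Schmidt coefficients: one non-zero value means a product state, two mean an entangled state.

```python
import numpy as np


def schmidt_rank(state, tol=1e-10):
    """Number of non-zero Schmidt coefficients of a two-qubit state
    (length-4 amplitude vector ordered |00>, |01>, |10>, |11>).
    Rank 1: product state. Rank 2: entangled state."""
    amplitudes = np.asarray(state, dtype=complex).reshape(2, 2)  # rows: qubit A, columns: qubit B
    singular_values = np.linalg.svd(amplitudes, compute_uv=False)
    return int(np.sum(singular_values > tol))


def joint_probabilities(state):
    """Born-rule probabilities of the outcomes 00, 01, 10, 11."""
    return np.abs(np.asarray(state, dtype=complex)) ** 2


product = np.kron([1, 0], [1, 1]) / np.sqrt(2)  # |0> ⊗ |+>
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)      # (|00> + |11>) / sqrt(2)
```

For the Bell state, the outcomes 01 and 10 have probability zero, so the two measurement results always agree; for n qubits the state vector has `2**n` entries.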
Quantum machine learning combines quantum algorithms with classical machine learning methods to improve computational efficiency. QML algorithms include quantum classification, quantum clustering, quantum regression, quantum optimization, and several other models. These methods have been developed to exploit quantum computing principles in order to potentially outperform their classical counterparts in speed and efficiency. However, these algorithms are still in the early stages of development and require further research and experimentation. In particular, quantum classification algorithms seek to assign data points to different classes or categories. One of the key quantum classification algorithms is QSVM, which uses quantum computing principles to find the optimal hyperplane that separates different classes in the feature space [18]. QSVM has the potential to outperform classical SVMs in certain cases by leveraging quantum parallelism and interference.
Various strategies have been developed to create QML algorithms that serve as quantum counterparts to classical machine learning techniques [19]. Quantum-inspired machine learning techniques use ideas from quantum computing to improve classical machine learning algorithms [20]. Quantum-enhanced machine learning algorithms process classical data on quantum computers [21]. The potential of quantum computers lies in their use of qubits: n qubits span a $2^n$-dimensional state space, which is exponentially larger than the n-bit Boolean space of classical computers. Hybrid classical-quantum machine learning algorithms combine classical and quantum methods to enhance model performance [22]. In this research, we employ a quantum machine learning algorithm based on the support vector machine method to classify high-dimensional cancer data and identify biomarkers. Our approach falls into the category of hybrid classical-quantum machine learning algorithms, and we simulate the quantum algorithm on a classical computer.
The quantum support vector machine algorithm uses quantum computing to improve the performance of the classical SVM, a popular supervised machine learning algorithm commonly used for classification problems [23]. The SVM separates data points into distinct classes by identifying an optimal hyperplane. The optimal hyperplane is determined by maximizing the margin between classes, that is, the distance from the hyperplane to the nearest data points of each class. SVMs use kernels to classify data that are not linearly separable: through a feature map, a kernel function implicitly maps data points into a higher-dimensional space in which non-linearly separable problems become easier to solve [24]. Depending on the characteristics of the data, common kernel functions used in SVM include the linear, polynomial, and radial basis function kernels. Introducing quantum models into classical kernel methods gives rise to quantum kernels, which form the foundation of QSVM. In QML, quantum kernels map classical data into quantum states in a Hilbert space through quantum feature maps [25].
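The role of the kernel can be seen on a toy problem that no straight line can solve. This sketch assumes scikit-learn and uses two concentric circles (`make_circles`), not the cancer data; it fits SVMs with linear, degree-2 polynomial, and RBF kernels and reports held-out accuracy. The helper name `compare_kernels` is ours.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def compare_kernels(random_state=0):
    """Test accuracy of SVMs with different kernels on two concentric circles,
    a toy dataset that no straight line (hyperplane in 2-D) can separate."""
    X, y = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=random_state)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=random_state)
    classifiers = {
        "linear": SVC(kernel="linear"),
        "poly": SVC(kernel="poly", degree=2),  # x1^2 + x2^2 is a degree-2 feature
        "rbf": SVC(kernel="rbf"),
    }
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        scores[name] = clf.score(X_te, y_te)
    return scores


scores = compare_kernels()
```

The linear kernel can do little better than chance here, whereas the non-linear kernels separate the circles because, in their implicit feature space, the radius becomes a linear function of the features. A quantum kernel plays the same role, with the feature space given by quantum states.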
In the QML framework, classical data points $x$ from the input dataset $\chi$ are encoded into quantum states $|\psi(x)\rangle$ within a higher-dimensional Hilbert space $\mathcal{H}$ during the quantum feature mapping process [26, 27]. Mathematically, this can be represented as:
$$\psi : \chi \to \mathcal{H}, \qquad x \in \chi \mapsto |\psi(x)\rangle \in \mathcal{H} \qquad (1)$$
where $\psi$ denotes a quantum feature encoding that employs a mathematical procedure for mapping data.
To build a quantum classifier, quantum counterparts of classical kernels are constructed from a kernel function. The kernel function $\kappa$ computes the inner product between the quantum states that represent two data points $x$ and $x'$ after they are mapped by a quantum feature map. It is defined as:
$$\kappa(x, x') = \langle \psi(x) | \psi(x') \rangle_{\mathcal{H}}, \qquad x, x' \in \chi \qquad (2)$$
Here, $\langle \cdot | \cdot \rangle$ denotes the inner product in $\mathcal{H}$ [28]. With quantum kernels, the transformation of classical input data into quantum states can be represented by an encoding circuit [12]:
$$|\psi(x)\rangle = \mathcal{U}_\psi(x)\,|0^{\otimes n}\rangle, \qquad \mathcal{U}_\psi(x) = \left(U_\psi(x)\, H^{\otimes n}\right)^{d} \qquad (3)$$
In this equation, $\mathcal{U}_\psi(x)$ denotes the quantum feature encoding circuit, which consists of Hadamard gates $H$ and unitary operations $U_\psi(x)$ containing Pauli gates. The variable $n$ denotes the number of qubits used for encoding, while $d$ represents the depth of the circuit that creates the quantum feature map [12]. The initial state for $n$ qubits is $|0^{\otimes n}\rangle = |0 \ldots 0\rangle$.
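To make Eqs. (1) to (3) concrete, here is a small NumPy statevector sketch that prepares $|\psi(x)\rangle = (U_\psi(x) H^{\otimes n})^d |0^{\otimes n}\rangle$ and evaluates $\kappa(x, x')$. The concrete choice of $U_\psi(x)$ is an assumption: a diagonal unitary built from single-qubit $Z$ phases $x_q$ and pairwise $ZZ$ phases $(\pi - x_q)(\pi - x_r)$, a ZZ-type map common in the quantum-kernel literature, not necessarily the circuit generated by skqulacs in this study. All helper names are hypothetical.

```python
from functools import reduce

import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def hadamard_layer(n):
    """H^{⊗n} as a 2^n x 2^n matrix."""
    return reduce(np.kron, [H] * n)


def phase_unitary(x):
    """Assumed U_psi(x): diagonal in the computational basis, with single-qubit
    Z phases x_q and pairwise ZZ phases (pi - x_q)(pi - x_r)."""
    n = len(x)
    diag = np.empty(2**n, dtype=complex)
    for idx in range(2**n):
        z = [1 - 2 * ((idx >> (n - 1 - q)) & 1) for q in range(n)]  # Z eigenvalues (+1/-1)
        phi = sum(x[q] * z[q] for q in range(n))
        phi += sum((np.pi - x[q]) * (np.pi - x[r]) * z[q] * z[r]
                   for q in range(n) for r in range(q + 1, n))
        diag[idx] = np.exp(1j * phi)
    return np.diag(diag)


def feature_state(x, depth=2):
    """|psi(x)> = (U_psi(x) H^{⊗n})^d |0...0>, following Eq. (3)."""
    n = len(x)
    block = phase_unitary(x) @ hadamard_layer(n)  # H acts first, then U_psi(x)
    state = np.zeros(2**n, dtype=complex)
    state[0] = 1.0
    for _ in range(depth):
        state = block @ state
    return state


def quantum_kernel(x1, x2, depth=2):
    """kappa(x, x') = <psi(x)|psi(x')> as in Eq. (2); np.vdot conjugates its first argument."""
    return np.vdot(feature_state(x1, depth), feature_state(x2, depth))


def gram_matrix(X, depth=2):
    """Matrix of kappa(x_i, x_j) over all pairs of data points."""
    states = [feature_state(x, depth) for x in X]
    return np.array([[np.vdot(a, b) for b in states] for a in states])
```

Eq. (2) generally yields a complex number, whereas an SVM needs a real, positive semidefinite kernel matrix; a common practical choice, assumed here, is the fidelity $|\kappa(x, x')|^2$, which `np.abs(gram_matrix(X)) ** 2` returns.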
Quantum computation proceeds by modifying qubits with quantum gates, the quantum analogues of classical logic gates. Quantum gates, such as Hadamard gates, Pauli rotation gates, and controlled Pauli gates, are the fundamental components of quantum circuits. Hadamard gates place qubits into superposition states, Pauli rotation gates rotate qubits around the x, y, and z axes, and controlled Pauli gates such as CNOT (controlled-X) create entanglement among qubits [16]. To estimate the value of an observable in a quantum circuit, the circuit is measured many times, and the averaged result is used as the prediction [29].
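The repeated-measurement step can be illustrated with a single qubit. This NumPy sketch (helper names are ours) prepares $R_Y(\theta)|0\rangle$, computes the exact expectation value of the Pauli-$Z$ observable, $\langle Z \rangle = \cos\theta$, and compares it with an estimate obtained by sampling a finite number of simulated measurements (shots).

```python
import numpy as np


def ry_state(theta):
    """RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)


def exact_z_expectation(state):
    """<Z> = P(0) * (+1) + P(1) * (-1)."""
    probs = np.abs(state) ** 2
    return float(probs[0] - probs[1])


def estimate_z_expectation(state, shots=10_000, seed=0):
    """Estimate <Z> by averaging the +1/-1 outcomes of simulated measurements."""
    rng = np.random.default_rng(seed)
    probs = np.abs(state) ** 2
    probs = probs / probs.sum()  # guard against floating-point drift
    outcomes = rng.choice([1.0, -1.0], size=shots, p=probs)  # Z eigenvalue per shot
    return float(outcomes.mean())


psi = ry_state(1.0)
exact = exact_z_expectation(psi)
estimate = estimate_z_expectation(psi)
```

The statistical error of the estimate shrinks roughly as $1/\sqrt{\text{shots}}$, which is why many measurements are needed per prediction on real hardware; a statevector simulator can instead return the exact value.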
2.4 Our approach
In our research, we used the QML package scikit-qulacs [30] to classify different types of cancer and identify biomarkers. This package implements quantum machine learning algorithms on top of Qulacs [31], one of the fastest quantum circuit simulators. Qulacs is designed to support research on quantum computing and has demonstrated significant speed-ups in executing quantum algorithms with high accuracy. In this QML algorithm, quantum circuits are created with the "LearningCircuit" class from the "skqulacs.circuit" sub-package of the "skqulacs" package [32]. A quantum circuit, represented as an array of quantum gates, is constructed with the "ParametricQuantumCircuit" class. The "create_yzcx_ansatz" function from the "skqulacs.circuit.pre_defined" sub-package is used to generate the quantum gates. The resulting circuit can be visualized with the "circuit_drawer" function from the "qulacsvis" package [33]. Classification is performed by quantum support vector classification, executed with the "QSVM" function from the "skqulacs.qsvm" package [34], which we applied to our data to take advantage of its classification capabilities.
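Because the skqulacs API is not reproduced here, the following is a library-agnostic sketch of the quantum support vector classification step: a quantum kernel is evaluated into Gram matrices that are passed to a classical SVM with a precomputed kernel. It assumes scikit-learn's `SVC(kernel="precomputed")` and a simple product-state angle encoding (each feature rotates its own qubit by $R_Y(x_j)$), chosen because its fidelity kernel has the closed form $\prod_j \cos^2((x_j - x'_j)/2)$; this encoding, the min-max scaling to $[0, \pi]$, and the helper names are our assumptions, not the circuit used by the authors.

```python
import numpy as np
from sklearn.svm import SVC


def angle_encoding_kernel(A, B):
    """Fidelity kernel |<psi(a)|psi(b)>|^2 for |psi(x)> = RY(x_1)|0> ⊗ ... ⊗ RY(x_n)|0>.
    Since <0|RY(a)^dagger RY(b)|0> = cos((b - a) / 2), the kernel factorizes
    into a product of cos^2 terms and needs no circuit simulation."""
    A = np.asarray(A, dtype=float)[:, None, :]
    B = np.asarray(B, dtype=float)[None, :, :]
    return np.prod(np.cos((A - B) / 2.0) ** 2, axis=-1)


def scale_to_angles(X_train, X_test):
    """Min-max scale each feature to [0, pi] using training-set statistics only."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X_train - lo) / span * np.pi, (X_test - lo) / span * np.pi


def fit_kernel_svm(X_train, y_train, C=1.0):
    clf = SVC(kernel="precomputed", C=C)
    clf.fit(angle_encoding_kernel(X_train, X_train), y_train)  # (n_train, n_train) Gram matrix
    return clf


def predict_kernel_svm(clf, X_train, X_test):
    # scikit-learn expects the (n_test, n_train) kernel between test and training points
    return clf.predict(angle_encoding_kernel(X_test, X_train))
```

Swapping `angle_encoding_kernel` for an entangling feature map changes only how the Gram matrices are computed; the SVM optimization itself stays classical, which is what makes the approach a hybrid classical-quantum one.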
Furthermore, we employed the Fisher score algorithm for feature selection. This algorithm is a dimensionality reduction and feature-ranking approach commonly used for selecting genes from expression profile data. The feature selection procedure is as follows. Given $X \in \mathbb{R}^{m \times n}$, a matrix of gene expression data (where $m$ and $n$ represent the number of genes and samples, respectively), and $NG = (U, C, D, \delta)$, a neighborhood decision system for gene expression data, the Fisher score is calculated as:
$$f(Z) = \frac{\mathrm{tr}(A_b)}{\mathrm{tr}(A_w)} \qquad (4)$$
Here, $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $A_w$ represents the within-class scatter matrix, and $A_b$ represents the between-class scatter matrix, computed between cancer samples and their paired normal samples. A heuristic strategy is commonly employed to compute the score of each gene separately using the same criterion. The Fisher score of the $k$-th gene is then obtained as:
$$f(k) = \frac{\sum_{i=1}^{C} n_i \left(\mu_{ki} - \mu_k\right)^2}{\sum_{i=1}^{C} n_i \left(\sigma_{ki}\right)^2} \qquad (5)$$
Here, $n_i$ is the number of samples in the $i$-th class, $\mu_{ki}$ and $\sigma_{ki}$ are the mean and standard deviation of the $k$-th gene over the samples of the $i$-th class, and $\mu_k$ is the mean of the $k$-th gene over all samples [35, 36]. We wrote and executed the code in a Python 3.11.3 environment.
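Eq. (5) translates directly into a few lines of NumPy. This is a minimal sketch, not the authors' code: it takes samples as rows (the transpose of $X$ in the text), uses the population standard deviation for $\sigma_{ki}$ (the text does not specify which), and sets the score of a constant gene (0/0) to 0. The helper names are hypothetical.

```python
import numpy as np


def fisher_scores(X, y):
    """Per-gene Fisher score following Eq. (5).

    X: array of shape (n_samples, n_genes), i.e. the transpose of the
       genes-by-samples matrix in the text (scikit-learn convention).
    y: class label per sample (e.g. 0 = normal, 1 = tumor).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)                   # mu_k
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]                           # n_i
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)              # n_i * sigma_ki^2 (population variance)
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = between / within                   # perfect separation (x / 0) gives +inf
    scores[np.isnan(scores)] = 0.0                  # constant gene (0 / 0) gets score 0
    return scores


def top_k_genes(X, y, k=5):
    """Indices of the k genes with the highest Fisher scores, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

With `k=5`, `top_k_genes` returns the five column indices whose expression values would feed both the classical models and the five-qubit QSVM.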
3 Data collection
We downloaded GDC TCGA RNA-seq (HTSeq) counts for 10 cancers, namely Bladder Cancer (BLCA), Colon Adenocarcinoma (COAD), Head and Neck Squamous Cell Carcinoma (HNSC), Kidney Renal Clear Cell Carcinoma (KIRC), Liver Hepatocellular Carcinoma (LIHC), Lung Adenocarcinoma (LUAD), Prostate Adenocarcinoma (PRAD), Rectal Adenocarcinoma (READ), Stomach Adenocarcinoma (STAD), and Thyroid Cancer (THCA), from the UCSC Xena database (https://xenabrowser.net/). A total of 4785 tumor and healthy samples were used; the details of the samples are given in Table 1. Ensembl IDs were mapped to official gene symbols, and Ensembl IDs without gene names were excluded. Because the database provides log-transformed data, we converted the values back to integer raw counts before gene filtering.
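A minimal sketch of the back-conversion, assuming the Xena values are $\log_2(\text{count} + 1)$, which we believe is the transform Xena applies to its GDC HTSeq-count files; this should be checked against the dataset's metadata. The function name is hypothetical.

```python
import numpy as np


def log2p1_to_counts(log_values):
    """Invert an assumed log2(count + 1) transform and round to integer counts."""
    counts = np.exp2(np.asarray(log_values, dtype=float)) - 1.0
    return np.rint(np.clip(counts, 0, None)).astype(np.int64)  # clip tiny negative round-off
```

Rounding is needed because the stored values have limited precision, and count-based methods such as edgeR expect integer counts.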
In the data preprocessing stage, we applied several methods for gene filtering and for determining differentially expressed genes (DEGs). First, we removed all genes (rows) with a zero sum. Next, we discarded genes whose variance was below the first quartile of the gene variances. Subsequently, we performed TMM normalization and log2 transformation on the data. To identify DEGs, we employed the edgeR and limma-voom methods using the edgeR (version 3.36.0) and limma (version 3.50.0) packages, respectively, in the R programming environment (version 4.3.1). The criteria were a Benjamini-Hochberg adjusted p-value < 0.05 [37] and |logFC| > 3. Finally, we retained the DEGs identified by both methods for further analysis, excluding long non-coding RNAs.
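The filtering and DEG-selection rules above can be sketched in NumPy as follows. TMM normalization, voom, and the statistical model fitting are done by edgeR and limma in R and are not reproduced; the sketch assumes one results table per method with gene names, log fold changes, and unadjusted p-values. Both packages can report BH-adjusted p-values themselves; the adjustment is written out here only to make the criterion explicit. Function names and example values are hypothetical.

```python
import numpy as np


def prefilter_mask(counts):
    """Boolean mask over genes (rows of a genes x samples count matrix):
    drop all-zero genes, then genes whose variance is below the first
    quartile of the variances of the remaining genes."""
    counts = np.asarray(counts, dtype=float)
    nonzero = counts.sum(axis=1) > 0
    variances = counts.var(axis=1)
    q1 = np.quantile(variances[nonzero], 0.25)
    return nonzero & (variances >= q1)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (same values as R's p.adjust(method = "BH"))."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def select_degs(genes, logfc, pvalues, alpha=0.05, min_abs_logfc=3.0):
    """Genes with BH-adjusted p < alpha and |logFC| > min_abs_logfc."""
    keep = (bh_adjust(pvalues) < alpha) & (np.abs(np.asarray(logfc, dtype=float)) > min_abs_logfc)
    return set(np.asarray(genes)[keep].tolist())


# Hypothetical results for four genes from the two methods
genes = ["A", "B", "C", "D"]
edger_degs = select_degs(genes, [4.0, -3.5, 2.0, 5.0], [0.001, 0.002, 0.0001, 0.3])
limma_degs = select_degs(genes, [3.2, -4.0, 3.1, 1.0], [0.01, 0.04, 0.02, 0.01])
common_degs = edger_degs & limma_degs  # DEGs found by both methods
```

In this toy example edgeR keeps A and B, limma keeps A, B, and C, and only A and B pass both, mirroring the strict "shared DEGs" criterion.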
Table 1 Details of the cancer datasets
Cancer | # Tumor samples | # Paired normal samples | # Train samples | # Train samples after SMOTE | # Test samples
To prepare the data for classification, we randomly partitioned each dataset into training and test sets in a 70/30 ratio using Python 3.11.3. To address the imbalance between the normal and cancer classes in the training set, we employed the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE generates synthetic minority-class data points in the feature space by interpolating between a minority-class sample and one of its nearest minority-class neighbors, found with the K-Nearest Neighbor (KNN) algorithm [38]. The numbers of training and test samples are given in Table 1.
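The interpolation idea behind SMOTE can be sketched in a few lines of NumPy. This is a simplified, binary version written for illustration (the function name and defaults are ours); a full implementation is available as `SMOTE` in the imbalanced-learn package (`imblearn.over_sampling`), whose exact sampling details differ from this sketch.

```python
import numpy as np


def smote_oversample(X, y, minority, k=5, seed=0):
    """Simplified binary SMOTE: add synthetic minority samples until both classes
    have the same size. Each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority]
    n_needed = int(np.sum(y != minority)) - len(X_min)
    if n_needed <= 0 or len(X_min) < 2:
        return X, y
    # k nearest neighbours within the minority class (Euclidean distance)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_needed)
    partner = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_needed, minority, dtype=y.dtype)])
    return X_out, y_out
```

As in the study, oversampling should be applied only to the training split after the 70/30 partition, so that no synthetic points derived from training data leak into the test set.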
In our study, we applied several classical models, namely classical SVM (CSVM), RF, DT, XGBoost, and LR, as well as the QSVM algorithm, to classify ten types of cancer and identify biomarkers. All analyses, including the quantum simulation, were run on a classical computer. The following sections explain the analysis steps in detail and present the results.
4.1 Determination of DEGs
Table 2 presents the number of genes after undergoing various filtering steps (as described in the Data Collection
section), as well as the number of DEGs identified by edgeR, limma, and the common DEGs identified by both meth-
ods (Supplementary data file 1). For further analysis, we focused on the shared DEGs as a strict criterion for selecting
genes to be included in the machine learning analysis step.
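Table 3 Top five features ranked by Fisher score for each cancer dataset
Cancer | Features
BLCA | CMTM3, PRKCE, RRM2, EEF1A1P5, RPLP06
COAD | CDH3, CEMIP, KRT80, ETV4, OTOP2
HNSC | MMP11, CA9, ADIPOQ, ADH1B, SH3BGRL2
KIRC | AQP2, UMOD, MUC15, HS6ST2, SLC9A4
LIHC | MARCO, OIT3, FCN3, CRHBP, CLEC4G
LUAD | FHL1, CAV1, FABP4, PYCR1, AGER
PRAD | CLCA2, DLX1, SLC45A2, PCA3, AMACR
READ | CDH3, GRIN2D, FOXQ1, KRT80, ETV4
STAD | MYOC, COL10A1, ACAN, CLEC3B, CST1
THCA | LRP4, GABRB2, KLHDC8A, METTL7B, PRR15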
The Fisher score, a supervised feature selection approach, was used to rank the genes by importance. We selected the top five genes by Fisher score, as listed in Table 3. The expression values of these genes in each sample served as input data for both the classical and quantum machine learning methods. The genes shown in bold in Table 3 are novel biomarkers for cancer diagnosis.
For QSVM, we used five qubits, corresponding to the five top-ranked features by Fisher score. We also tested circuit depths from 1 to 20 layers and selected the depth that yielded the highest accuracy. Figure 1 shows the circuits constructed with 1, 3, and 4 layers as examples.
We used a parameterized quantum circuit (PQC) structure to build the quantum circuit for each cancer dataset. In the PQC, we used YZ-CX constructions consisting of single-qubit Y and Z rotations and two-qubit CNOT (CX) entangling gates. Each quantum circuit is defined on N qubits, which are used to implement the quantum machine learning algorithm. Taking into account the cost of applying each element, we defined every circuit with l layers, where l is the number of repetitions of the circuit's basic block. Table 4 and Fig. 2 present the test accuracy, precision, recall, F1-score, and AUC, while Figure S1 illustrates the receiver operating characteristic (ROC) curves generated by the classical machine learning algorithms and QSVM.
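In the study, skqulacs's `create_yzcx_ansatz` builds the YZ-CX circuit. We do not reproduce its exact gate ordering, parameter layout, or data-encoding gates here, so the NumPy statevector sketch below (hypothetical helper names) only illustrates the layered structure described above: each of the l layers applies $R_Y$ and $R_Z$ to every one of the N qubits and then a chain of CX gates, giving $2Nl$ trainable angles.

```python
import numpy as np


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rz(phi):
    return np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])


def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q (qubit 0 = most significant bit)."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)


def apply_cx(state, control, target, n):
    """CNOT: flip the target qubit on the amplitudes where the control qubit is 1."""
    psi = state.reshape([2] * n).copy()
    sel = [slice(None)] * n
    sel[control] = 1
    axis = target if target < control else target - 1  # control axis is dropped by indexing
    psi[tuple(sel)] = np.flip(psi[tuple(sel)], axis=axis).copy()
    return psi.reshape(-1)


def yzcx_state(params, n_qubits, n_layers):
    """State prepared from |0...0> by n_layers repetitions of the block
    [RY then RZ on every qubit] followed by a chain of CX(q, q + 1) gates.
    Expects 2 * n_qubits * n_layers parameters."""
    params = np.asarray(params, dtype=float).reshape(n_layers, n_qubits, 2)
    state = np.zeros(2**n_qubits, dtype=complex)
    state[0] = 1.0
    for layer in range(n_layers):
        for q in range(n_qubits):
            state = apply_1q(state, ry(params[layer, q, 0]), q, n_qubits)
            state = apply_1q(state, rz(params[layer, q, 1]), q, n_qubits)
        for q in range(n_qubits - 1):
            state = apply_cx(state, q, q + 1, n_qubits)
    return state
```

For the five-qubit circuits in this study, a depth of l layers would correspond to 10l angles in this layout; increasing l makes the circuit more expressive but also more costly to simulate and train, which is why the depth was tuned per dataset.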
The results show that LR and RF achieve the best classification performance among the classical machine learning algorithms. Although logistic regression is a simple linear model, it has been widely used in cancer research because of its interpretability and ease of implementation [39]. Studies have shown that it can achieve results comparable to those of more complex models in specific scenarios, particularly on low-dimensional datasets [40]. However, its limitations become evident in high-dimensional settings, where it may struggle to capture intricate relationships between features effectively [41]. Because we first reduced the feature space with the Fisher score, LR could classify the cancer datasets effectively. The ensemble nature of RF, on the other hand, allows it to handle high feature-to-sample ratios effectively, making it particularly suitable for cancer classification tasks, where the data can be complex and noisy [41].
The classification reports also demonstrate the good classification performance of the QSVM-Kernel method. When applied to identical datasets, our quantum-integrated workflow matches the highest accuracy achieved by any of the classical algorithms for HNSC, KIRC, LIHC, LUAD, READ, and STAD, and outperforms several of them on these datasets. QSVM also attains the highest AUC for COAD, HNSC, LIHC, PRAD, READ, THCA, and STAD (outright for LIHC and THCA, and tied with one or more classical models for the others). While previous studies have provided valuable insights into the application of quantum models to cancer classification, some limitations, such as the number of qubits, still need to be addressed. In one study [42], a hybrid quantum-mechanical system with two qubits was used to encode, process, and classify images of cancerous and non-cancerous pigmented skin lesions. This approach demonstrated the potential of quantum machine learning in image-based cancer classification. Furthermore, researchers used the QSVM model to distinguish malignant breast cancer tumors from non-cancerous benign tumors [43]. They applied principal component analysis for feature reduction, taking into account the limited number of available qubits. In contrast, we provided a stepwise approach in which a feature selection method is applied first, followed by QSVM with a manageable number of qubits. This approach led to the identification of novel biomarkers as well as the confirmation of previously identified ones.

Fig. 1 Circuits constructed with five qubits and (a) 1, (b) 3, and (c) 4 layers

Table 4 The test accuracy, precision, recall, F1-score, and AUC obtained by classical ML algorithms and QSVM. For QSVM, "Ln" denotes the applied depth, where "n" represents the number of layers
Method | Cancer | Accuracy | Precision | Recall | F1-score | AUC
SVM | BLCA | 0.97 | 0.78 | 0.89 | 0.83 | 1.00
SVM | COAD | 0.99 | 0.93 | 0.99 | 0.96 | 0.90
SVM | HNSC | 0.99 | 0.93 | 0.99 | 0.96 | 0.99
SVM | KIRC | 0.96 | 0.90 | 0.98 | 0.94 | 0.98
SVM | LIHC | 0.96 | 0.90 | 0.92 | 0.91 | 0.92
SVM | LUAD | 0.98 | 0.91 | 0.99 | 0.95 | 0.99
SVM | PRAD | 0.94 | 0.83 | 0.97 | 0.88 | 0.97
SVM | READ | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
SVM | STAD | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
SVM | THCA | 0.95 | 0.89 | 0.94 | 0.91 | 0.94
RF | BLCA | 0.99 | 1.00 | 0.90 | 0.94 | 0.90
RF | COAD | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
RF | HNSC | 0.99 | 0.93 | 0.99 | 0.96 | 0.99
RF | KIRC | 0.99 | 0.98 | 1.00 | 0.99 | 1.00
RF | LIHC | 0.97 | 0.92 | 0.92 | 0.92 | 0.92
RF | LUAD | 0.98 | 0.97 | 0.94 | 0.95 | 0.94
RF | PRAD | 0.96 | 0.90 | 0.94 | 0.92 | 0.94
RF | READ | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
RF | STAD | 0.98 | 0.99 | 0.89 | 0.93 | 0.89
RF | THCA | 0.98 | 0.97 | 0.93 | 0.95 | 0.93
XGBoost | BLCA | 0.98 | 0.99 | 0.80 | 0.87 | 0.80
XGBoost | COAD | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
XGBoost | HNSC | 0.98 | 0.92 | 0.95 | 0.94 | 0.95
XGBoost | KIRC | 0.99 | 0.98 | 1.00 | 0.99 | 1.00
XGBoost | LIHC | 0.94 | 0.91 | 0.76 | 0.82 | 0.76
XGBoost | LUAD | 0.98 | 0.99 | 0.92 | 0.95 | 0.92
XGBoost | PRAD | 0.95 | 0.90 | 0.86 | 0.88 | 0.86
XGBoost | READ | 0.98 | 0.83 | 0.99 | 0.90 | 0.99
XGBoost | STAD | 0.98 | 0.99 | 0.89 | 0.93 | 0.89
XGBoost | THCA | 0.97 | 0.95 | 0.93 | 0.94 | 0.93
DT | BLCA | 0.97 | 0.79 | 0.79 | 0.79 | 0.79
DT | COAD | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
DT | HNSC | 0.98 | 0.92 | 0.95 | 0.94 | 0.95
DT | KIRC | 0.97 | 0.90 | 0.98 | 0.94 | 0.98
DT | LIHC | 0.95 | 0.91 | 0.86 | 0.88 | 0.86
DT | LUAD | 0.98 | 0.96 | 0.92 | 0.94 | 0.92
DT | PRAD | 0.89 | 0.75 | 0.83 | 0.78 | 0.83
DT | READ | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
DT | STAD | 0.98 | 0.93 | 0.88 | 0.91 | 0.88
DT | THCA | 0.96 | 0.93 | 0.93 | 0.93 | 0.93
LR | BLCA | 0.99 | 1.00 | 0.90 | 0.94 | 0.90
LR | COAD | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
LR | HNSC | 0.99 | 0.93 | 0.99 | 0.96 | 0.99
LR | KIRC | 0.99 | 0.98 | 1.00 | 0.99 | 1.00
LR | LIHC | 0.97 | 0.92 | 0.92 | 0.92 | 0.92
LR | LUAD | 0.99 | 0.97 | 1.00 | 0.99 | 1.00
LR | PRAD | 0.97 | 0.94 | 0.92 | 0.93 | 0.92
LR | READ | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
LR | STAD | 0.98 | 0.99 | 0.89 | 0.93 | 0.89
LR | THCA | 0.96 | 0.94 | 0.91 | 0.92 | 0.91
QSVM | BLCA | L13: 0.98 | 0.83 | 0.89 | 0.86 | 0.89
QSVM | COAD | L1: 0.99 | 0.93 | 0.99 | 0.96 | 1.00
QSVM | HNSC | L3: 0.99 | 0.93 | 0.99 | 0.96 | 0.99
QSVM | KIRC | L17: 0.99 | 0.96 | 0.99 | 0.97 | 0.99
QSVM | LIHC | L19: 0.97 | 0.91 | 0.95 | 0.93 | 0.95
QSVM | LUAD | L8: 0.99 | 0.95 | 0.99 | 0.97 | 0.99
QSVM | PRAD | L1: 0.95 | 0.84 | 0.97 | 0.89 | 0.97
QSVM | READ | L1: 1.00 | 1.00 | 1.00 | 1.00 | 1.00
QSVM | STAD | L1: 1.00 | 1.00 | 1.00 | 1.00 | 1.00
QSVM | THCA | L4: 0.97 | 0.93 | 0.95 | 0.94 | 0.95

Fig. 2 The test accuracy, precision, recall, and F1-score obtained by classical ML algorithms and QSVM
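Each row of Table 4 combines five standard metrics. The sketch below, assuming scikit-learn's `sklearn.metrics` functions and treating tumor as the positive class (label 1), shows how one such row could be computed; the helper name and the example labels and scores are hypothetical. AUC requires continuous scores (for example `decision_function` or `predict_proba[:, 1]`), not hard labels.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluation_row(y_true, y_pred, y_score, average="binary"):
    """Accuracy, precision, recall, F1-score, and AUC for one model on one dataset.

    y_pred: hard labels; y_score: continuous scores for the positive class,
    needed for AUC.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average=average),
        "recall": recall_score(y_true, y_pred, average=average),
        "f1": f1_score(y_true, y_pred, average=average),
        "auc": roc_auc_score(y_true, y_score),
    }


# Hypothetical labels (1 = tumor, 0 = normal), predictions, and scores
row = evaluation_row(
    y_true=[0, 0, 1, 1, 1, 0],
    y_pred=[0, 1, 1, 1, 0, 0],
    y_score=[0.1, 0.6, 0.8, 0.9, 0.4, 0.2],
)
```

Passing `average="macro"` instead averages precision, recall, and F1 over the normal and tumor classes; the text does not state which averaging was used for Table 4, and the choice matters when the test set is imbalanced.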
Essentially, different types of cancer can be regarded as real-world systems whose structures involve a large number of diverse parameters. Classifying such high-dimensional systems is therefore complicated and time-consuming on classical computers [44–46]. While conventional computers can handle cancer classification tasks, quantum computers have the potential to solve complex high-dimensional problems more efficiently. Owing to unique capabilities such as superposition and entanglement, quantum computers may offer advantages over traditional computers in certain cases, including cancer type classification. One key potential advantage is their ability to handle high-dimensional data more effectively than classical computers: quantum SVMs can represent data in high-dimensional spaces efficiently, which is crucial for analyzing complex cancer genomics datasets [47]. Furthermore, quantum SVMs have shown potential for achieving higher classification accuracy with shorter training times than classical SVMs, especially on non-linearly separable datasets. The quantum kernel methods used in QSVM can outperform classical kernels in certain tasks. As quantum hardware continues to improve, quantum SVMs have increasing potential to outperform classical SVMs in specific cancer classification tasks. Quantum advantage has already been reported both in simulations and on real quantum devices using error mitigation techniques.
Additionally, quantum computers can handle larger datasets more effectively than classical computers due to their
parallelization capabilities. This is particularly relevant for cancer classification, where large genomic and imaging
datasets are becoming increasingly available. QML models are designed to classify different cancer types and identify
biomarkers applicable in various real-life situations [44, 46]. QML techniques can enhance personalized treatment
plans by analyzing individual patient data to predict their cancer type and how they might respond to specific treat-
ments. By identifying unique biomarkers, healthcare providers can tailor therapies that improve treatment efficacy and
minimize side effects [48]. Moreover, effective classification of cancer types aids in early detection. QML models can
analyze imaging data or genetic profiles to identify anomalies indicating cancer [42, 46], facilitating timely interven-
tions that are crucial for successful treatment. Throughout the treatment process, professionals can utilize the analysis
Vol.:(0123456789)
Research Discover Applied Sciences (2024) 6:513 | https://doi.org/10.1007/s42452-024-06220-6
of patient data over time to assess how well a cancer is responding to treatment. Furthermore, healthcare institutions
can leverage QML models to analyze large datasets from populations to identify trends in cancer types and treatment
responses. In drug development, QML can identify biomarkers that predict how different cancers respond to new
drugs. This can streamline the drug discovery process [49], allowing researchers to focus on compounds that are more
likely to be effective for specific cancer types. Additionally, QML models can be integrated with other technologies,
such as electronic health records [10], to provide a holistic view of patient health, resulting in better disease
management.
Promoter DNA methylation is an epigenetic modification that regulates gene expression and often occurs early in
tumorigenesis [50]. In our study, we examined the promoter DNA methylation status of the classifier genes and visualized
the data using box-and-whisker plots. To assess the significance of methylation level differences between normal and
tumor groups, we applied Welch's t-test.
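Welch's t-test compares two group means without assuming equal variances. This suits tumor and normal groups that differ in size and spread. The minimal Python sketch below uses only the standard library to compute two quantities: the statistic t = (m1 − m2) / sqrt(s1²/n1 + s2²/n2), and the Welch–Satterthwaite degrees of freedom. The function name and the sample values are hypothetical. Converting t into a two-sided p-value requires the Student t distribution, which SciPy provides directly through scipy.stats.ttest_ind(a, b, equal_var=False).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom.

    Each group needs at least two observations (statistics.variance requires it).
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # sample variances (n - 1 denominator)
    sa, sb = va / na, vb / nb           # squared standard errors of each group mean
    t = (mean(a) - mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical per-sample values (e.g. promoter methylation beta values) for one gene.
    normal = [0.62, 0.58, 0.65, 0.60, 0.63]
    tumor = [0.41, 0.52, 0.35, 0.48, 0.44, 0.39]
    t, df = welch_t(tumor, normal)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

The degrees of freedom are generally non-integer and never exceed n1 + n2 − 2. This is why the test stays valid when one group, often the normal tissue samples, is much smaller or more variable than the other.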
Figures S2–S11 depict the methylation levels of features whose methylation showed a relationship with their expression
values. The observed changes in promoter DNA methylation may point to mechanisms underlying the downregulation or
upregulation of classifier gene expression identified by QSVM.
Significantly decreased promoter DNA methylation, accompanied by increased gene expression, was observed for PRM2 and
PRKCE in BLCA; KRT80 in COAD; ADH1B, ADIPOQ, and CA9 in HNSC; HS6ST2 in KIRC; CLEC4G, CRHBP, FCN3, and MARCO in LIHC;
CLCA2, AMACR, and SLC45A2 in PRAD; FOXQ1, ETV4, and KRT80 in READ; COL10A1 in STAD; and LRP4, KLHDC8A, METTL7B, and
PRR15 in THCA. Significantly increased promoter DNA methylation was observed for CMTM3 in BLCA; CDH3, ETV4, and OTOP2
in COAD; AQP2, MUC15, and SLC9A4 in KIRC; CAV1 and FHL1 in LUAD; GRIN2D and CDH3 in READ; and DLX1 in PRAD.
5 Conclusion
In the present study, we utilized a quantum machine learning algorithm for the classification of high-dimensional RNA-
seq cancer data. Our findings demonstrate that the results obtained from the quantum machine learning approach
are consistent with those obtained from classical machine learning algorithms, indicating the applicability of quantum
methods for analyzing high-throughput biological data in certain cases. Additionally, our analysis successfully identi-
fied novel biomarkers. These findings highlight the potential of quantum machine learning in medical research and the
discovery of valuable biomarkers.
Author contributions M-ZG performed the bioinformatics analysis. M-ZG and EA performed the QML analysis, interpreted the
results, and wrote the manuscript.
Funding No funding.
Data availability The RNA-seq gene expression data were downloaded from the UCSC Xena database (https://xenabrowser.net/).
The code is available at https://github.com/Mohadesehzarei/skqulacs_QSVM.
Declarations
Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which
permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to
the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You
do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party
material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds
the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.
Discover Applied Sciences (2024) 6:513 | https://doi.org/10.1007/s42452-024-06220-6 Research
References
1. Preskill J. Quantum computing in the NISQ era and beyond. Quantum. 2018;2:79.
2. Preskill J. Lecture notes for physics 229: quantum information and computation. California Inst Technol. 1998;16(1):1–8.
3. Gill SS, Kumar A, Singh H, Singh M, Kaur K, Usman M, Buyya R. Quantum computing: a taxonomy, systematic review and future direc-
tions. Softw Pract Exp. 2022;52(1):66–114.
4. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349(6245):255–60.
5. Sarker I. Machine learning: algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):160.
6. Wuest T, Weimer D, Irgens C, Thoben K-D. Machine learning in manufacturing: advantages, challenges, and applications. Prod Manuf
Res. 2016;4(1):23–45.
7. Mironowicz P, Mandarino A, Yilmaz A, Ankenbrand T. Applications of quantum machine learning for quantitative finance.
arXiv preprint arXiv:2405.10119. 2024.
8. Maheshwari D, Garcia-Zapirain B, Sierra-Sosa D. Quantum machine learning applications in the biomedical domain: a systematic
review. IEEE Access. 2022;10:80463–84.
9. Ullah U, Garcia-Zapirain B. Quantum machine learning revolution in healthcare: a systematic review of emerging perspectives and
applications. IEEE Access. 2024;12:11423–50.
10. Maheshwari D, Ullah U, Marulanda PAO, Jurado AG-O, Gonzalez ID, Merodio JMO, Garcia-Zapirain B. Quantum machine learning
applied to electronic healthcare records for ischemic heart disease classification. Hum-Cent Comput Inf Sci. 2023;13(06).
11. Rebentrost P, Mohseni M, Lloyd S. Quantum support vector machine for big data classification. Phys Rev Lett. 2014;113(13): 130503.
12. Havlíček V, Córcoles AD, Temme K, Harrow AW, Kandala A, Chow JM, Gambetta JM. Supervised learning with quantum-enhanced
feature spaces. Nature. 2019;567(7747):209–12.
13. Park S, Park DK, Rhee JK. Variational quantum approximate support vector machine with inference transfer. Sci Rep. 2023;13(1):3288.
14. Willsch D, Willsch M, De Raedt H, Michielsen K. Support vector machines on the D-Wave quantum annealer. Comput Phys Commun.
2020;248: 107006.
15. Vasques X, Paik H, Cif L. Application of quantum machine learning using quantum kernel algorithms on multiclass neuron M-type
classification. Sci Rep. 2023;13(1):11541.
16. Simões RDM, Huber P, Meier N, Smailov N, Füchslin RM, Stockinger K. Experimental evaluation of quantum machine learning algo-
rithms. IEEE Access. 2023;11:6197–208.
17. Maheshwari D, Sierra-Sosa D, Garcia-Zapirain B. Variational quantum classifier for binary classification: real vs synthetic dataset. IEEE
Access. 2021;10:3705–15.
18. Tychola KA, Kalampokas T, Papakostas GA. Quantum machine learning—an overview. Electronics. 2023;12(11):2379.
19. Zeguendry A, Jarir Z, Quafafou M. Quantum machine learning: a review and case studies. Entropy. 2023;25(2):287.
20. Felser T, Trenti M, Sestini L, Gianelle A, Zuliani D, Lucchesi D, Montangero S. Quantum-inspired machine learning on high-energy
physics data. npj Quantum Inform. 2021;7(1):111.
21. Dunjko V, Taylor JM, Briegel HJ. Quantum-enhanced machine learning. Phys Rev Lett. 2016;117(13): 130501.
22. Adhikary S, Dangwal S, Bhowmik D. Supervised learning with a quantum classifier using multi-level systems. Quantum Inform Process.
2020;19:1–12.
23. Afsaneh E, Sharifdini A, Ghazzaghi H, Ghobadi MZ. Recent applications of machine learning and deep learning models in the predic-
tion, diagnosis, and management of diabetes: a comprehensive review. Diabetol Metabol Syndr. 2022;14(1):1–39.
24. Huang H-Y, Broughton M, Mohseni M, Babbush R, Boixo S, Neven H, McClean JR. Power of data in quantum machine learning. Nat
Commun. 2021;12(1):2631.
25. Hancco-Quispe JK, Borda-Colque JP, Torres-Cruz F. Quantum machine learning applied to the classification of diabetes. arXiv preprint
arXiv:00109. 2022.
26. Goto T, Tran QH, Nakajima K. Universal approximation property of quantum machine learning models in quantum-enhanced feature
spaces. Phys Rev Lett. 2021;127(9): 090506.
27. Jerbi S, Fiderer LJ, Poulsen Nautrup H, Kübler JM, Briegel HJ, Dunjko V. Quantum machine learning beyond kernel methods. Nat
Commun. 2023;14(1):517.
28. Schuld M, Killoran N. Quantum machine learning in feature Hilbert spaces. Phys Rev Lett. 2019;122(4): 040504.
29. Schuld M, Sweke R, Meyer JJ. Effect of data encoding on the expressive power of variational quantum-machine-learning models.
Phys Rev A. 2021;103(3): 032430.
30. https://qulacs-osaka.github.io/scikit-qulacs/index.html
31. Suzuki Y, Kawase Y, Masumura Y, Hiraga Y, Nakadai M, Chen J, Nakanishi KM, Mitarai K, Imai R, Tamiya S. Qulacs: a fast and versatile
quantum circuit simulator for research purpose. Quantum. 2021;5:559.
32. https://qulacs-osaka.github.io/scikit-qulacs/skqulacs.html
33. https://qulacs-osaka.github.io/scikit-qulacs/notebooks/circuit_visualize.html
34. https://qulacs-osaka.github.io/scikit-qulacs/skqulacs.qsvm.qsvc.html
35. Sun L, Zhang X-Y, Qian Y-H, Xu J-C, Zhang S-G, Tian Y. Joint neighborhood entropy-based gene selection method with fisher score
for tumor classification. Appl Intell. 2019;49:1245–59.
36. Li C, Xu J. Feature selection with the Fisher score followed by the maximal clique centrality algorithm can accurately identify the hub
genes of hepatocellular carcinoma. Sci Rep. 2019;9(1):17283.
37. Ghobadi MZ, Afsaneh E, Emamzadeh R, Soroush M. Potential miRNA-gene interactions determining progression of various ATLL
cancer subtypes after infection by HTLV-1 oncovirus. BMC Med Genom. 2023;16(1):1–9.
38. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57.
39. Teixeira M, Silva F, Ferreira RM, Pereira T, Figueiredo C, Oliveira HP. A review of machine learning methods for cancer characterization
from microbiome data. NPJ Precis Oncol. 2024;8(1):123.
40. Houfani D, Slatnia S, Kazar O, Remadna I, Saouli H, Ortiz G, Merizig A. An improved model for breast cancer diagnosis by combining
PCA and logistic regression techniques. Int J Comput Digit Syst. 2023;13(1):701–16.
41. Couronné R, Probst P, Boulesteix A-L. Random forest versus logistic regression: a large-scale benchmark experiment. BMC Bioinfor-
matics. 2018;19(1):270.
42. Iyer V, Ganti B, Hima Vyshnavi AM, Krishnan Namboori PK, Iyer S. Hybrid quantum computing based early detection of skin cancer. J
Interdiscip Math. 2020;23(2):347–55.
43. Vashisth S, Dhall I, Aggarwal G. Design and analysis of quantum powered support vector machines for malignant breast cancer diagnosis.
J Intell Syst. 2021;30(1):998–1013.
44. Flöther FF. The state of quantum computing applications in health and medicine. Res Direct Quantum Technol. 2023;1: e10.
45. Maheshwari D, Garcia-Zapirain B, Sierra-Sosa D. Quantum machine learning applications in the biomedical domain: a systematic review.
IEEE Access. 2022;10:80463–84.
46. Khan MA-Z, Innan N, Galib AAO, Bennai M. Brain tumor diagnosis using quantum convolutional neural networks. arXiv preprint
arXiv:2401.15804. 2024.
47. Park J-E, Quanz B, Wood S, Higgins H, Harishankar R. Practical application improvement to quantum SVM: theory to practice. arXiv preprint
arXiv:2012.07725. 2020.
48. Ghobadi MZ, Emamzadeh R, Teymoori-Rad M, Afsaneh E. Exploration of blood-derived coding and non-coding RNA diagnostic
immunological panels for COVID-19 through a co-expressed-based machine learning procedure. Front Immunol. 2022;13:1001070.
49. Wong YK, Zhou Y, Liang YS, Qiu H, Wu YX, He B. The new answer to drug discovery: quantum machine learning in preclinical drug devel-
opment. In IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML). 2023;557–64.
50. Bouras E, Karakioulaki M, Bougioukas KI, Aivaliotis M, Tzimagiorgis G, Chourdakis M. Gene promoter methylation and cancer: an umbrella
review. Gene. 2019;710:333–40.
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.