Sentiment Analysis Optimization Using Ensemble of Multiple SVM Kernel Functions
JURNAL RESTI (Rekayasa Sistem dan Teknologi Informasi)
Vol. 9 No. 4 (2025) 905 - 914 e-ISSN: 2580-0760
Abstract
This study targets improved sentiment classification by combining the strengths of multiple SVM kernels within an ensemble
framework. We introduce SVM Porlis, which fuses Linear, RBF, Polynomial, and Sigmoid kernels using both hard and soft
voting to boost performance on skewed data. The task is binary sentiment recognition (positive vs. negative). A corpus of 2,248
tweets concerning the debate over the naturalization of Indonesia’s national football players was gathered via the official
X/Twitter API, with a marked dominance of negative tweets. The preprocessing pipeline encompassed cleaning, labeling,
tokenization, stopword removal, stemming, and TF-IDF feature extraction. To counter the imbalance, SMOTE was applied to
synthesize additional minority-class samples. Each kernel was first trained and assessed independently, then aggregated into
the SVM Porlis ensemble. Evaluation used accuracy, precision, recall, F1-score, and confusion-matrix analysis. The soft-
voting SVM Porlis model achieved the best results—98% for accuracy, precision, recall, and F1—outperforming single-kernel
baselines and other ensembles such as SVM + Chi-Square and SVM + PSO. These outcomes indicate that integrating diverse
kernels effectively captures both linear and nonlinear patterns, yielding a robust and adaptive approach for sentiment analysis
on real-world, imbalanced datasets.
Keywords: ensemble learning; kernel function; sentiment analysis; SMOTE; support vector machine
How to Cite: M. Khairul Anam, T. P. Lestari, L. Efrizoni, N. S. Handayani, and I. Andhika, “Sentiment Analysis Optimization Using Ensemble of Multiple SVM Kernel Functions”, J. RESTI (Rekayasa Sist. Teknol. Inf.), vol. 9, no. 4, pp. 905 - 914, Aug. 2025.
Permalink/DOI: [Link]
a hybrid of SVM and Gradient Boosting, which reached a predictive accuracy of 93% [9]. The combination of chi-square and SVM also showed promising results, achieving an accuracy of 95.56% [10]. Other ensemble methods combining SVM with different algorithms have demonstrated strong performance, with accuracy reaching 96.53% [11]. Studies involving Indonesian-language datasets also vary in performance. For instance, a study using Random Forest with Uncertainty Sampling on Indonesian Twitter data achieved 81% accuracy [12], while another used Naïve Bayes to analyze Instagram comments about Anies Baswedan and achieved 85%, though it suffered from overfitting issues [13]. Further, a study incorporating SMOTE improved the performance of algorithms like Logistic Regression, SVM, and Naïve Bayes, reaching up to 89% accuracy [14].

While these studies highlight the effectiveness of ensemble techniques, chi-square, and hyperparameter optimization in enhancing SVM, a critical gap remains: few have explored or reported the specific impact of different SVM kernel functions on sentiment classification. Yet, kernel selection is a vital component in SVM modeling, especially for sentiment analysis tasks that involve non-linear, noisy, and imbalanced data with complex subjective expressions. This study addresses this gap by systematically evaluating four core SVM kernels—RBF, Linear, Polynomial, and Sigmoid—using Indonesian-language tweets collected via the X (Twitter) API.

SVM offers several kernel options such as Linear, Polynomial, RBF, Gaussian, Gaussian-Diagonal, Laplace_rbf, Anova_rbf, and Sigmoid [15]. In this study, only four kernels (RBF, Linear, Polynomial, and Sigmoid) were selected for comparison and integration to improve performance on unstructured data. The RBF (Radial Basis Function) kernel is particularly effective in handling non-linear decision boundaries, which are common in sentiment data, and provides a strong structure for generalization [16]. Its gamma parameter controls the sensitivity to the distance between data points, helping capture subtle emotional nuances [17]. Previous studies have shown that this kernel achieves high accuracy in sentiment analysis, reaching 87.25% [18]. It is also highly effective for structured data and has demonstrated strong performance in prior research with an accuracy of 93.55% [19]. The Polynomial kernel handles complex relationships well, especially when sentiment expressions involve interactions among multiple features. Its degree parameter allows control over model complexity [20], and it has shown good performance in sentiment analysis with an accuracy of 84% [21]. The Sigmoid kernel, which resembles neural network activation functions, is suitable for modeling moderate non-linear patterns. Although not as widely used as other kernels, it remains effective for datasets with intermediate complexity [22]. In sentiment analysis, it has shown exceptional performance, achieving an accuracy of up to 96.26% [23].

To further improve classification accuracy and robustness, this research proposes an ensemble model named SVM Porlis, which integrates the four kernels using both hard and soft voting techniques. This approach is designed to maximize the unique capabilities of each kernel: Linear for linear discrimination [24], RBF for capturing highly non-linear structures [25], Polynomial for multi-way non-linear feature interactions [26], and Sigmoid for added flexibility and adaptability [27]. The combination of these kernels enables the model to generalize across diverse, imbalanced, and noisy sentiment datasets—common characteristics in social media texts. In this ensemble, voting mechanisms play a crucial role in mitigating the limitations of individual kernels. When one or more base classifiers exhibit relatively lower accuracy on certain instances, the final prediction is not determined solely by that weak performer. Instead, hard voting considers the majority prediction from all kernels, while soft voting aggregates the predicted probabilities and selects the class with the highest combined confidence. This ensures that even if a single kernel underperforms in specific scenarios, the ensemble decision still benefits from the strengths of the more accurate kernels, resulting in improved overall performance and stability.

In the sentiment classification process, the study utilized tweet data obtained through the official API of the X/Twitter platform, totaling 2,248 entries related to the issue of naturalization of Indonesian national football players. The data was categorized into two sentiment classes: positive and negative. However, the class distribution in the dataset was imbalanced, with the number of tweets expressing negative sentiment significantly higher than those with positive sentiment. To address this issue, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to strengthen the representation of the minority class, allowing the model to generalize more effectively across both classes. The preprocessing stage included cleaning irrelevant characters or symbols, labeling the data, tokenizing to split sentences into individual words, removing stopwords, and performing stemming to return words to their root form. The processed data was then represented using the Term Frequency–Inverse Document Frequency (TF-IDF) method, which transforms textual data into high-dimensional numerical features that can be processed by the model. For model construction, four kernels from the Support Vector Machine (SVM) algorithm—Linear, RBF, Polynomial, and Sigmoid—were employed and combined into an ensemble model named SVM Porlis, utilizing both soft voting and hard voting techniques to enhance classification performance.

This study is expected to leverage the strengths of each kernel within the SVM algorithm to develop a model that excels in terms of accuracy, precision, recall, and F1-score. The proposed model, named SVM Porlis, is designed to demonstrate strong generalization
These examples demonstrate the conversion of raw tweets into standardized input by removing hashtags, emojis, punctuation, and stopwords, and by reducing each word to its root form using stemming. Such transformations improve the quality of features passed to the classifier, which in turn contributes to the robustness and accuracy of the sentiment analysis model.

After normalization, each review was represented using the Term Frequency–Inverse Document Frequency (TF-IDF) scheme, which converts text into high-dimensional sparse numeric vectors. TF-IDF assigns larger weights to terms that are frequent within a document yet infrequent across the corpus, thereby highlighting tokens most informative for sentiment classification [35]. The formulations of term frequency, inverse document frequency, and their product (TF-IDF) used in this study are provided in Equations 1–3.

Term Frequency (TF):

$TF = \dfrac{\text{number of times the word appears in the document}}{\text{total words in the document}}$   (1)

Inverse Document Frequency (IDF):

$IDF = \dfrac{\text{total number of documents}}{\text{number of documents containing the word}}$   (2)

TF-IDF Score:

$TF\text{-}IDF = TF \times IDF$   (3)

These TF-IDF features serve as input to the classification model. Each unique word becomes a feature column in the resulting matrix, with its TF-IDF weight populating the corresponding cell. Terms with higher weights are prioritized by the model, while common terms such as conjunctions and prepositions are down-weighted, improving the classifier’s focus on sentiment-bearing words.

The complete preprocessing pipeline—spanning from data collection to TF-IDF vectorization—forms a foundational element in this research, ensuring that the SVM Porlis ensemble classifier receives clean, balanced, and discriminative input to optimize its sentiment classification performance.
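The preprocessing and TF-IDF steps above can be sketched in Python. This is a minimal illustration, not the authors' code: the cleaning rules and the tiny stopword list are assumptions, and an Indonesian stemmer such as Sastrawi would replace the stemming stub.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini stopword list for illustration; the study's full
# Indonesian stopword list is not given in the paper.
STOPWORDS = {"yang", "dan", "di", "ke", "ini", "itu", "sangat"}

def preprocess(tweet: str) -> str:
    """Clean a raw tweet: lowercase, strip noise, drop stopwords."""
    text = tweet.lower()
    text = re.sub(r"http\S+|@\w+|#\w+", " ", text)  # URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)           # punctuation, emojis, digits
    tokens = [t for t in text.split() if t not in STOPWORDS]
    # An Indonesian stemmer (e.g., Sastrawi) would reduce each token to its
    # root form here; omitted to keep the sketch dependency-free.
    return " ".join(tokens)

docs = ["Naturalisasi pemain bagus! https://t.co/x", "@user kebijakan ini buruk"]
cleaned = [preprocess(d) for d in docs]

# TF-IDF weighting in the spirit of Equations 1-3. Note that scikit-learn
# applies a smoothed, log-scaled IDF rather than the raw ratio of Equation 2.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)   # sparse document-term matrix
print(X.shape, list(vectorizer.get_feature_names_out()))
```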
2.2 SMOTE

This research addresses the problem of class imbalance in the sentiment dataset, which can cause machine learning models to be biased toward the majority class. Figure 2 presents the initial distribution of sentiment labels prior to balancing. It clearly illustrates that the majority class (label 0) significantly outnumbers the minority class (label 1), a condition that may lead to poor performance in detecting underrepresented but often critical sentiment patterns such as negative opinions.

Figure 2. Data Before Class Balancing

To address the class-imbalance problem, we employed the Synthetic Minority Oversampling Technique (SMOTE). This method synthesizes additional minority-class examples by creating points along the line segments that connect a sample to its nearest minority neighbors, thereby expanding the minority class and mitigating imbalance [30]. After SMOTE is applied, the label distribution becomes balanced, as shown in Figure 3, with both classes containing an equal number of instances.

Figure 3. Data After Class Balancing Process

However, applying SMOTE requires careful consideration, especially regarding its timing in the preprocessing pipeline [36]. If SMOTE is applied before splitting the dataset into training and testing sets, there is a risk of overfitting due to synthetic samples leaking into both subsets. This contamination can lead to overly optimistic performance estimates, as the model may encounter nearly identical synthetic samples during both training and testing [37]. To mitigate this risk, in this study, SMOTE was strictly applied only to the training set after data splitting, ensuring that the test set remained purely representative of real-world data.

Besides balancing, this study also implemented strategies to prevent overfitting and ensure that the model generalizes well to unseen data. The dataset was divided into training and testing sets using a stratified split, preserving the proportion of classes across both subsets to avoid skewed learning. Regularization parameters (C, gamma) were fine-tuned to control the complexity of individual SVM models and prevent them from memorizing training data. Moreover, the use of an ensemble approach—through hard and soft voting in the SVM Porlis model—reduces the risk of overfitting by aggregating predictions from diverse kernels. This ensemble mechanism stabilizes the model’s decision boundary and helps it perform reliably across varying sentiment patterns.
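A minimal sketch of this ordering with imbalanced-learn, assuming X is the TF-IDF matrix and y the binary labels from the preceding steps; the split ratio and random seed are illustrative assumptions, since the paper does not state them here.

```python
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Stratified split first, so synthetic points can never leak into the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the minority class on the training portion only (Section 2.2).
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)
```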
In summary, class balancing through SMOTE and the use of ensemble voting, combined with proper data splitting and regularization, collectively contribute to the robustness and generalizability of the sentiment classification model [38].

2.3 Modeling with SVM

This work trains SVM models with four kernel functions. The linear kernel is appropriate when the classes are linearly (or nearly linearly) separable. It computes similarity as the inner product of two feature vectors [39] using Equation 4.

$K(x_i, x_j) = x_i \cdot x_j$   (4)

Linear models are often adequate for straightforward sentiment tasks where feature relationships are close to linear. The most influential hyperparameter is C, which controls regularization: larger C reduces bias but can raise variance.

Radial Basis Function (RBF) kernel. The RBF kernel is effective for capturing non-linear structure that frequently appears in sentiment data. It measures similarity with a Gaussian function [40], see Equation 5.

$K(x_i, x_j) = \exp(-\gamma \, \|x_i - x_j\|^2)$   (5)

The γ parameter sets the radius of influence of individual training points (a small γ gives a narrow radius). As with the linear kernel, C regulates the balance between fit and generalization.

Polynomial kernel. This kernel extends the linear case by introducing polynomial interactions among input features, enabling the model to represent higher-order non-linear boundaries [41], as expressed in Equation 6. Key hyperparameters include C, the degree of the polynomial, and coef0; higher degrees allow more complex decision surfaces but increase overfitting risk.

$K(x_i, x_j) = (x_i \cdot x_j + c)^d$   (6)

Where c is a constant term (also known as coef0) and d is the degree of the polynomial. The degree controls the flexibility of the decision boundary. In this study, we tested various degrees and selected optimal values during initial experimentation to balance complexity and performance.

Finally, the Sigmoid kernel mimics the behavior of a neural network activation function, making it useful in modeling certain non-linear relationships [22], as shown in Equation 7.

$K(x_i, x_j) = \tanh(\alpha (x_i \cdot x_j) + c)$   (7)

Here, α (alpha) is the scaling parameter and c is the bias term. These hyperparameters affect the curvature of the decision boundary. While less commonly used than RBF, the Sigmoid kernel is included in this study to evaluate its adaptability to moderately complex sentiment patterns.
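For illustration, the four kernels of Equations 4 to 7 map directly onto scikit-learn's SVC. The parameter values below are those reported in Section 2.4, and probability=True anticipates the soft voting described there; this is a sketch, not the authors' exact code.

```python
from sklearn.svm import SVC

# One base learner per kernel, mirroring Equations 4-7.
svm_linear  = SVC(kernel="linear",  C=1.0, probability=True)
svm_rbf     = SVC(kernel="rbf",     C=1.0, gamma="scale", probability=True)
svm_poly    = SVC(kernel="poly",    C=1.0, degree=3, coef0=1, probability=True)
svm_sigmoid = SVC(kernel="sigmoid", C=1.0, coef0=0, probability=True)
```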
2.4 Modeling with SVM Porlis

Upon conclusion of the analysis and comprehension of the data through distinct SVM kernels (Linear, RBF, Polynomial, and Sigmoid), ensemble voting techniques were applied to enhance prediction accuracy. This approach combines the strengths of different kernel perspectives. Voting was implemented in two ways: hard voting and soft voting.

In hard voting, the final class label is determined by the majority class predicted by all base models. The decision rule is shown in Equation 8.

$Y = \arg\max_{c} \left( \sum_{i=1}^{n} I(y_i = c) \right)$   (8)

$I(y_i = c)$ is the indicator that is 1 if model i predicts class c and 0 otherwise; $n$ is the number of models used in voting.

In soft voting, each base model outputs a probability distribution over classes. The final class is selected based on the highest average probability, as in Equation 9.

$Y = \arg\max_{c} \left( \frac{1}{n} \sum_{i=1}^{n} P_i(c) \right)$   (9)

$P_i(c)$ is the predicted probability of class c from model i; $n$ is the number of models used.

In the implementation, soft voting was enabled by setting probability=True in each SVM model to allow probability outputs. This technique enhances model flexibility by considering confidence levels from each base classifier, thereby offering improved robustness compared to hard voting, especially in borderline cases. A small numeric illustration of both rules is given below.
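Equations 8 and 9 reduce to a vote count and a probability average. A minimal NumPy illustration, with made-up outputs for four kernels:

```python
import numpy as np

# Hard voting (Equation 8): count the label each model predicts.
votes = np.array([0, 1, 1, 1])        # labels from the four kernels
y_hard = np.bincount(votes).argmax()  # majority class -> 1

# Soft voting (Equation 9): average P_i(c) over models, take the argmax.
probs = np.array([[0.60, 0.40],       # model 1: P(class 0), P(class 1)
                  [0.30, 0.70],
                  [0.20, 0.80],
                  [0.45, 0.55]])
y_soft = probs.mean(axis=0).argmax()  # highest mean probability -> 1
print(y_hard, y_soft)
```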
Equations 8 and 9 are summarized in Pseudocode 1. In this process, the data that has been balanced with SMOTE is trained using four SVM models with different kernels (Linear, RBF, Polynomial, Sigmoid). After that, the predictions from each model are combined using two voting techniques, namely hard voting and soft voting. In hard voting, the selected class is the class that is most frequently predicted by all models. In contrast, in soft voting, the prediction probabilities from all models are combined, and the class with the highest probability is selected as the final result. This technique allows each kernel to contribute its strengths, thus improving the accuracy and robustness of the model across various data patterns. The end result is an ensemble model that is more robust and flexible than using a single kernel.

To ensure transparency and reproducibility, the specific kernel parameters used in each SVM model were configured as follows: for the Linear kernel, the penalty parameter C was set to 1.0; for the RBF kernel, C = 1.0 and γ = 'scale', which automatically adjusts gamma to the number of features; for the Polynomial kernel, parameters were set to C = 1.0, degree = 3, and coef0 = 1;
for the Sigmoid kernel, C = 1.0 and coef0 = 0 were used. These settings were based on empirical tuning and maintained consistently across all base models in the ensemble.

Pseudocode 1. SVM Porlis

Import required libraries:
- Import `SVC` from [Link] and `VotingClassifier` from [Link].
Preprocess the data:
- Balance the dataset using SMOTE.
- Split the dataset into training and testing sets.
Train individual SVM models with different kernels:
- Define and train SVM with Linear kernel.
- Define and train SVM with RBF kernel.
- Define and train SVM with Polynomial kernel.
- Define and train SVM with Sigmoid kernel.
Combine the models using Voting:
- For hard voting:
  a. Initialize `VotingClassifier` with `voting='hard'`.
  b. Include all trained SVM models as estimators.
- For soft voting:
  a. Initialize `VotingClassifier` with `voting='soft'`.
  b. Ensure each SVM model can output probabilities (`probability=True`).
Train the Voting Classifier:
- Fit the `VotingClassifier` on the training data.
Evaluate the ensemble model:
- Predict on the test set using the Voting Classifier.
- Calculate performance metrics (accuracy, precision, recall, F1-score).
Output the results:
- Print the evaluation metrics for both hard and soft voting.
End.
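A runnable sketch of Pseudocode 1 with scikit-learn, reusing X_train_bal, y_train_bal, X_test, and y_test from the SMOTE sketch in Section 2.2. This is an illustrative reconstruction under those assumptions, not the authors' released code.

```python
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import classification_report

estimators = [
    ("linear",  SVC(kernel="linear",  C=1.0, probability=True)),
    ("rbf",     SVC(kernel="rbf",     C=1.0, gamma="scale", probability=True)),
    ("poly",    SVC(kernel="poly",    C=1.0, degree=3, coef0=1, probability=True)),
    ("sigmoid", SVC(kernel="sigmoid", C=1.0, coef0=0, probability=True)),
]

# SVM Porlis: the same four base kernels under both voting rules.
porlis_hard = VotingClassifier(estimators=estimators, voting="hard")  # Eq. 8
porlis_soft = VotingClassifier(estimators=estimators, voting="soft")  # Eq. 9

for name, model in (("hard", porlis_hard), ("soft", porlis_soft)):
    model.fit(X_train_bal, y_train_bal)   # train on balanced training data
    y_pred = model.predict(X_test)        # evaluate on the untouched test set
    print(name, "voting\n", classification_report(y_test, y_pred))
```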
2.5 Evaluation Model

Model performance was summarized with a confusion matrix, which records four outcomes: true positives (TP)—positive instances correctly identified as positive; true negatives (TN)—negative instances correctly identified as negative; false positives (FP)—negative instances mistakenly labeled as positive; and false negatives (FN)—positive instances incorrectly labeled as negative.

In addition to the performance indicators, the evaluation metrics derived from the confusion matrix are included in Equations 10 through 13.

$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$   (10)

$Precision = \dfrac{TP}{TP + FP}$   (11)

$Recall = \dfrac{TP}{TP + FN}$   (12)

$F1\text{-}Score = 2 \times \dfrac{Precision \times Recall}{Precision + Recall}$   (13)

The evaluation will compare the classification performance of each individual kernel (Linear, RBF, Polynomial, and Sigmoid) against the ensemble model using both hard voting and soft voting mechanisms. This comparison will help determine whether combining multiple kernels provides significant performance improvements over using a single kernel alone.

3. Results and Discussions

This section presents the results obtained from the implementation and evaluation of the proposed sentiment analysis model using SVM Porlis with ensemble soft voting. The discussion includes performance comparisons between individual SVM kernels and ensemble models, along with insights gained from applying the SMOTE technique to address class imbalance. Evaluation metrics such as accuracy, precision, recall, F1-score, and the confusion matrix are analyzed to determine the model’s effectiveness and robustness. In addition, a comparative analysis with previous studies is presented to highlight the improvement achieved by the proposed approach.

3.1 Results

This research applied the Synthetic Minority Over-sampling Technique (SMOTE) to overcome class imbalance within the sentiment dataset. Class imbalance, where one sentiment class (positive or negative) dominates the data, can bias the model toward the majority class and result in poor generalization for minority sentiment. SMOTE addresses this by generating synthetic data points for the minority class through interpolation, resulting in a balanced dataset. After applying SMOTE, the dataset was trained using multiple SVM kernels: Linear, RBF, Polynomial, and Sigmoid, followed by ensemble learning using both soft and hard voting strategies.

Table 2. Classification Report using SVM Porlis Soft Voting

Class          Precision   Recall   F1-score   Support
0              0.98        0.98     0.98       599
1              0.98        0.98     0.98       583
accuracy                            0.98       1182
macro avg      0.98        0.98     0.98       1182
weighted avg   0.98        0.98     0.98       1182

Table 2 presents the classification report for the SVM Porlis Soft Voting model, which combines predictions from all kernels through soft voting. Evaluation metrics such as precision, recall, and F1-score achieved 0.98 for both class 0 and class 1, indicating excellent classification performance. Out of 1,182 total samples, 98% were correctly predicted by the model, resulting in an overall accuracy of 0.98. Moreover, both macro and weighted averages reached 0.98, confirming that the model is not only accurate but also well-balanced across classes.
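For reference, Equations 10 to 13 can be computed directly from the confusion matrix, and scikit-learn's classification_report produces the per-class table shown in Table 2. A sketch, assuming the fitted soft-voting model and test split from the earlier sketches:

```python
from sklearn.metrics import confusion_matrix

y_pred = porlis_soft.predict(X_test)

# Binary case: rows are actual labels, columns are predicted labels.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # Equation 10
precision = tp / (tp + fp)                                  # Equation 11
recall    = tp / (tp + fn)                                  # Equation 12
f1        = 2 * precision * recall / (precision + recall)   # Equation 13
print(f"acc={accuracy:.2f} prec={precision:.2f} rec={recall:.2f} f1={f1:.2f}")
```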
improvement in both accuracy and consistency compared to the methods employed in prior research.

Table 4. Comparison with Previous Research

Researchers                      Algorithm                            Accuracy
Hokijuliandy et al. 2023 [10]    SVM + Chi-Square                     96%
Ramasamy et al. 2021 [42]        SVM + Nature-inspired Optimization   87%
Supriyatna & Putri, 2024 [43]    SVM + PSO                            97%
Hidayat & Wibowo, 2024 [44]      SVM + Information Gain               89%
Imanuddin et al. 2023 [45]       SVM Kernel Linear                    91%
Susanto & Suparwati, 2023 [46]   SVM + PSO                            80%
Anam et al. 2022 [47]            SVM + Adaboost                       92%
This Research                    SVM Porlis Hard Voting               96%
                                 SVM Porlis Soft Voting               98%

Table 4 presents a comparison between the proposed method in this study, namely the SVM Porlis Soft Voting and Hard Voting models, and several previous studies that employed various combinations of SVM algorithms and performance enhancement techniques. The comparison shows that the SVM Porlis Soft Voting model achieved an accuracy of 98%, making it the best-performing approach among the listed methods. This result outperforms the SVM + PSO approach by Supriyatna & Putri (2024), which achieved 97%, and the SVM + Chi-Square method by [10] with 96% accuracy. The model also significantly surpasses the SVM + Adaboost technique used by [47], which only reached 92%, and feature selection-based methods such as SVM + Information Gain by [44], which recorded 89% accuracy.

The SVM Linear Kernel approach tested by [45] achieved an accuracy of 91%, while another SVM + PSO method by [46] recorded only 80%, the lowest among the compared results. Compared to these outcomes, the SVM Porlis Soft Voting model demonstrated a +1% improvement over the previously best-performing method and up to +18% improvement over the lowest-performing method. The SVM Porlis Hard Voting model also delivered competitive performance with 96% accuracy, matching Hokijuliandy et al.’s results, though still slightly below the soft voting version.

This significant performance gain can be attributed to two key factors. First, the integration of multiple kernel types (Linear, RBF, Polynomial, and Sigmoid) enables the model to capture diverse patterns within the data—both linear and non-linear—more comprehensively. Second, the application of the soft voting strategy, which takes into account the prediction probabilities from each individual kernel before determining the final class, allows the model to be more adaptive to data ambiguity. This contrasts with hard voting, which relies solely on majority votes without considering each model’s confidence level.

Furthermore, the inclusion of the SMOTE (Synthetic Minority Over-sampling Technique) method plays a crucial role in enhancing model performance by addressing the issue of class imbalance, which often causes bias toward the majority class. Therefore, the SVM Porlis Soft Voting approach excels not only in terms of accuracy but also demonstrates strong robustness in handling complex and imbalanced real-world datasets.

In conclusion, these results confirm that the multi-kernel ensemble model optimized with soft voting presents a highly promising solution for sentiment classification tasks and outperforms many existing SVM-based approaches reported in prior studies.

4. Conclusions

This study demonstrates the effectiveness of the SVM Porlis model, an ensemble approach that integrates multiple SVM kernels (Linear, RBF, Polynomial, and Sigmoid) using soft voting techniques. Unlike single-kernel models that rely on a singular decision boundary, this model leverages the strengths of each kernel to capture diverse sentiment patterns, ranging from linear relationships to complex non-linear structures. The use of soft voting, which considers prediction probabilities, represents a methodological advantage as it enables the model to make more accurate decisions, particularly in ambiguous cases. This multi-kernel integration is a unique approach that remains relatively unexplored in sentiment analysis of Indonesian-language social media data.

The model achieved an accuracy of 98%, significantly outperforming the individual kernel performances as well as previous SVM-based methods. This high level of performance was attained through a robust preprocessing pipeline and data balancing using SMOTE, which effectively addressed the issue of class imbalance. However, the polynomial kernel underperformed due to its tendency to overfit, highlighting the importance of specific parameter tuning for each kernel within an ensemble framework.

Despite the promising results, several limitations must be acknowledged. The model was tested only within a specific domain—tweets in Bahasa Indonesia related to the naturalization of football players—so its generalizability to other domains remains uncertain. In addition, the model has not yet been evaluated in real-time scenarios, where latency, data quality, and scalability are critical factors. Potential data bias also warrants attention, as social media opinions do not always reflect the broader population and may be influenced by trends, echo chambers, or bot activity.

For future research, several directions can be pursued. First, the model should be evaluated on different domains and data types to assess its generalizability beyond the context of football-related naturalization issues in Indonesia. Second, testing in real-time environments is necessary to measure the model’s
performance in handling streaming data with low latency. Third, exploring more adaptive parameter optimization techniques such as Bayesian Optimization or Optuna could further enhance the performance of each kernel within the ensemble. Additionally, incorporating other ensemble strategies such as stacking or kernel-based boosting may offer alternatives to improve accuracy and classification stability. With these expansions, the SVM Porlis model can serve as a strong foundation for building more resilient and applicable sentiment classification systems across various social and linguistic contexts.

Acknowledgements

The authors report no competing interests. This study received no dedicated funding from governmental, commercial, or nonprofit organizations.

References
[1] M. K. Anam, M. B. Firdaus, F. Suandi, Lathifah, T. Nasution, and S. Fadly, “Performance Improvement of Machine Learning Algorithm Using Ensemble Method on Text Mining,” in ICFTSS 2024 - International Conference on Future Technologies for Smart Society, Kuala Lumpur: Institute of Electrical and Electronics Engineers Inc., Sep. 2024, pp. 90–95, doi: 10.1109/ICFTSS61109.2024.10691363.
[2] R. Guido, S. Ferrisi, D. Lofaro, and D. Conforti, “An Overview on the Advancements of Support Vector Machine Models in Healthcare Applications: A Review,” Information (Switzerland), vol. 15, no. 4, pp. 1–36, Apr. 2024, doi: 10.3390/info15040235.
[3] A. Zamsuri, S. Defit, and G. W. Nurcahyo, “Development and Comparison of Multiple Emotion Classification Models in Indonesia Text Using Machine Learning,” Journal of Advances in Information Technology, vol. 15, no. 4, pp. 519–531, 2024, doi: 10.12720/jait.15.4.519-531.
[4] N. Amaya-Tejera, M. Gamarra, J. I. Vélez, and E. Zurek, “A distance-based kernel for classification via Support Vector Machines,” Front Artif Intell, vol. 7, pp. 1–15, Feb. 2024, doi: 10.3389/frai.2024.1287875.
[5] J. Nalepa and M. Kawulok, “Selecting training sets for support vector machines: a review,” Artif Intell Rev, vol. 52, no. 2, pp. 857–900, Aug. 2019, doi: 10.1007/s10462-017-9611-1.
[6] M. A. Sembiring, H. Saputra, R. A. Yusda, S. Sutarman, and E. B. Nababan, “Performance of Robust Support Vector Machine Classification Model on Balanced, Imbalanced and Outliers Datasets,” JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer), vol. 10, no. 1, pp. 208–215, Aug. 2024, doi: 10.33480/jitk.v10i1.5272.
[7] W. Sholihah and A. Silvia Handayani, “Revolutionizing Healthcare: Comprehensive Evaluation and Optimization of SVM Kernels for Precise General Health Diagnosis,” Scientific Journal of Informatics, vol. 10, no. 4, pp. 445–454, 2023, doi: 10.15294/sji.v10i4.46430.
[8] R. A. Sulthana, A. K. Jaithunbi, H. Harikrishnan, and V. Varadarajan, “Sentiment Analysis on Movie Reviews Dataset Using Support Vector Machines and Ensemble Learning,” International Journal of Information Technology and Web Engineering, vol. 17, no. 1, pp. 1–23, 2022, doi: 10.4018/IJITWE.311428.
[9] M. Khalid, I. Ashraf, A. Mehmood, S. Ullah, M. Ahmad, and G. S. Choi, “GBSVM: Sentiment classification from unstructured reviews using ensemble classifier,” Applied Sciences (Switzerland), vol. 10, no. 8, Apr. 2020, doi: 10.3390/APP10082788.
[10] E. Hokijuliandy, H. Napitupulu, and Firdaniza, “Application of SVM and Chi-Square Feature Selection for Sentiment Analysis of Indonesia’s National Health Insurance Mobile Application,” Mathematics, vol. 11, no. 17, pp. 1–21, Sep. 2023, doi: 10.3390/math11173765.
[11] V. KP, R. AB, G. HL, V. Ravi, and M. Krichen, “A tweet sentiment classification approach using an ensemble classifier,” International Journal of Cognitive Computing in Engineering, vol. 5, pp. 170–177, Jan. 2024, doi: 10.1016/[Link].2024.04.001.
[12] M. Liebenlito, N. Inayah, E. Choerunnisa, T. E. Sutanto, and S. Inna, “Active Learning on Indonesian Twitter Sentiment Analysis Using Uncertainty Sampling,” Journal of Applied Data Sciences, vol. 5, no. 1, pp. 114–121, Jan. 2024, doi: 10.47738/jads.v5i1.144.
[13] N. Mardiah, L. Marlina, K. Khairul, Z. Sitorus, and M. Iqbal, “Analysis Of Indonesian People’s Sentiment Towards 2024 Presidential Candidates On Social Media Using Naïve Bayes Classifier and Support Vector Machine,” Building of Informatics, Technology and Science (BITS), vol. 6, no. 2, pp. 950–960, Sep. 2024, doi: 10.47065/bits.v6i2.5766.
[14] I. G. B. A. Budaya and I. K. P. Suniantara, “Comparison of Sentiment Analysis Algorithms with SMOTE Oversampling and TF-IDF Implementation on Google Reviews for Public Health Centers,” MALCOM: Indonesian Journal of Machine Learning and Computer Science, vol. 4, no. 3, pp. 1077–1086, Jul. 2024, doi: 10.57152/malcom.v4i3.1459.
[15] N. Saha, A. K. Show, P. Das, and S. Nanda, “Performance comparison of different kernel tricks based on SVM approach for parkinson’s disease detection,” in 2021 2nd International Conference for Emerging Technology, INCET 2021, Institute of Electrical and Electronics Engineers Inc., May 2021, pp. 1–4, doi: 10.1109/INCET51464.2021.9456233.
[16] X. Ding, J. Liu, F. Yang, and J. Cao, “Random radial basis function kernel-based support vector machine,” J Franklin Inst, vol. 358, no. 18, pp. 10121–10140, Dec. 2021, doi: 10.1016/[Link].2021.10.005.
[17] S. D. Latif et al., “Improving sea level prediction in coastal areas using machine learning techniques,” Ain Shams Engineering Journal, vol. 15, no. 9, pp. 1–21, Sep. 2024, doi: 10.1016/[Link].2024.102916.
[18] Z. Abidin, W. Destian, and R. Umer, “Combining support vector machine with radial basis function kernel and information gain for sentiment analysis of movie reviews,” in Journal of Physics: Conference Series, IOP Publishing Ltd, Jun. 2021, pp. 1–5, doi: 10.1088/1742-6596/1918/4/042157.
[19] H. Prasetya, Z. Situmorang, and R. Rosnelly, “SVM Optimization with Kernel Function for Sentiment Analysis on Social Media twitter (X) in AFC U23 Asian Cup Case Study,” in 1st Proceeding of International Conference on Science and Technology UISU (ICST), 2024, pp. 227–233, doi: 10.30743/wjxmmr59.
[20] A. F. Rochim, K. Widyaningrum, and D. Eridani, “Performance Comparison of Support Vector Machine Kernel Functions in Classifying COVID-19 Sentiment,” in International Seminar on Research of Information Technology and Intelligent Systems, Institute of Electrical and Electronics Engineers Inc., 2021, pp. 224–228, doi: 10.1109/ISRITI54043.2021.9702845.
[21] F. M. Rizky, J. Jondri, and K. M. Lhaksmana, “Twitter Sentiment Analysis of Kanjuruhan Disaster using Word2Vec and Support Vector Machine,” Building of Informatics, Technology and Science (BITS), vol. 5, no. 1, pp. 219–227, Jun. 2023, doi: 10.47065/bits.v5i1.3612.
[22] I. S. Al-Mejibli, J. K. Alwan, and D. H. Abd, “The effect of gamma value on support vector machine performance with different kernels,” International Journal of Electrical and Computer Engineering, vol. 10, no. 5, pp. 5497–5506, Oct. 2020, doi: 10.11591/IJECE.V10I5.PP5497-5506.
[23] N. K. M. Budayani, I. Slamet, and S. S. Handajani, “A Comparison of SVM Kernel Functions for Sentiment Analysis of UU TPKS,” Sci Educ (Dordr), vol. 2, pp. 761–765, 2023.
[24] C. B. Tan, M. H. A. Hijazi, and P. N. E. Nohuddin, “A comparison of different support vector machine kernels for artificial speech detection,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 21, no. 1, pp. 97–103, Feb. 2023, doi: 10.12928/TELKOMNIKA.v21i1.24259.
[25] A. Nurkholis, D. Alita, and A. Munandar, “Comparison of Kernel Support Vector Machine Multi-Class in PPKM Sentiment Analysis on Twitter,” Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 6, no. 2, pp. 227–233, Apr. 2022, doi: 10.29207/resti.v6i2.3906.
[26] D. Aryo Anggoro and D. Permatasari, “Performance Comparison of the Kernels of Support Vector Machine Algorithm for Diabetes Mellitus Classification,” Int J Adv Comput Sci Appl, vol. 14, no. 2, 2023, doi: 10.14569/IJACSA.2023.0140226.
[27] M. A. Nanda, K. B. Seminar, D. Nandika, and A. Maddu, “A comparison study of kernel functions in the support vector machine and its application for termite detection,” Information (Switzerland), vol. 9, no. 1, pp. 1–14, Jan. 2018, doi: 10.3390/info9010005.
[28] M. K. Anam et al., “Sara Detection on Social Media Using Deep Learning Algorithm Development,” Journal of Applied Engineering and Technological Science, vol. 6, no. 1, pp. 225–237, Dec. 2024, doi: 10.37385/jaets.v6i1.5390.
[29] fikrimln16, “data-crawling-x-tweetharvest.”
[30] M. K. Anam, S. Defit, Haviluddin, L. Efrizoni, and M. B. Firdaus, “Early Stopping on CNN-LSTM Development to Improve Classification Performance,” Journal of Applied Data Sciences, vol. 5, no. 3, pp. 1175–1188, 2024, doi: 10.47738/jads.v5i3.312.
[31] F. Suandi et al., “Enhancing Sentiment Analysis Performance Using SMOTE and Majority Voting in Machine Learning Algorithms,” in International Conference on Applied Engineering, Atlantis Press, 2024, pp. 126–138, doi: 10.2991/978-94-6463-620-8_10.
[32] Hamdani, Randi N.A, and M. K. Anam, “Comparison of Support Vector Machine and Random Forest Algorithms for Analyzing Online Loans on Twitter social media,” JAIA - Journal Of Artificial Intelligence And Applications, vol. 4, no. 1, pp. 8–16, 2024, doi: 10.33372/jaia.v4i1.1087.
[33] A. N. Ulfah, M. K. Anam, N. Y. S. Munti, S. Yaakub, and M. B. Firdaus, “Sentiment Analysis of the Convict Assimilation Program on Handling Covid-19,” JUITA: Jurnal Informatika, vol. 10, no. 2, pp. 209–216, 2022, doi: 10.30595/juita.v10i2.12308.
[34] P. P. Putra, M. K. Anam, S. Defit, and A. Yunianta, “Enhancing the Decision Tree Algorithm to Improve Performance Across Various Datasets,” INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi, vol. 8, no. 2, pp. 200–212, Aug. 2024, doi: 10.29407/intensif.v8i2.22280.
[35] M. K. Anam et al., “Enhancing the Performance of Machine Learning Algorithm for Intent Sentiment Analysis on Village Fund Topic,” Journal of Applied Data Sciences, vol. 6, no. 2, pp. 1102–1115, 2025, doi: 10.47738/jads.v6i2.637.
[36] M. K. Anam et al., “Improved Performance of Hybrid GRU-BiLSTM for Detection Emotion on Twitter Dataset,” Journal of Applied Data Sciences, vol. 6, no. 1, pp. 354–365, Jan. 2025, doi: 10.47738/jads.v6i1.459.
[37] M. K. Anam, T. P. Lestari, H. Yenni, T. Nasution, and M. B. Firdaus, “Enhancement of Machine Learning Algorithm in Fine-grained Sentiment Analysis Using the Ensemble,” ECTI Transactions on Computer and Information Technology (ECTI-CIT), vol. 19, no. 2, pp. 159–167, Mar. 2025, doi: 10.37936/ecti-cit.2025192.257815.
[38] M. K. Anam, T. A. Fitri, Agustin, Lusiana, M. B. Firdaus, and A. T. Nurhuda, “Sentiment Analysis for Online Learning using The Lexicon-Based Method and The Support Vector Machine Algorithm,” ILKOM Jurnal Ilmiah, vol. 15, no. 2, pp. 290–302, 2023, doi: 10.33096/ilkom.v15i2.1590.290-302.
[39] V. V., R. A. C, R. Mohammed, S. K. V, and P. S. Kumthekar, “Support Vector Machine Implementation to Separate Linear and Non-Linear Dataset,” Saudi Journal of Engineering and Technology, vol. 8, no. 1, pp. 4–15, Jan. 2023, doi: 10.36348/sjet.2023.v08i01.002.
[40] A. P. Gopi, R. N. S. Jyothi, V. L. Narayana, and K. S. Sandeep, “Classification of tweets data based on polarity using improved RBF kernel of SVM,” International Journal of Information Technology (Singapore), vol. 15, no. 2, pp. 965–980, Feb. 2023, doi: 10.1007/s41870-019-00409-4.
[41] L. Muflikhah, D. Joko Haryanto, A. Andy Soebroto, and E. Santoso, “High Performance of Polynomial Kernel at SVM Algorithm for Sentiment Analysis,” Journal of Information Technology and Computer Science, vol. 3, no. 2, pp. 194–201, 2018, doi: 10.25126/jitecs.20183260.
[42] L. K. Ramasamy, S. Kadry, Y. Nam, and M. N. Meqdad, “Performance analysis of sentiments in Twitter dataset using SVM models,” International Journal of Electrical and Computer Engineering, vol. 11, no. 3, pp. 2275–2284, Jun. 2021, doi: 10.11591/ijece.v11i3.pp2275-2284.
[43] B. L. Supriyatna and F. P. Putri, “Optimized support vector machine for sentiment analysis of game reviews,” International Journal of Informatics and Communication Technology (IJ-ICT), vol. 13, no. 3, p. 344, Dec. 2024, doi: 10.11591/ijict.v13i3.pp344-353.
[44] M. Hidayat and A. Wibowo, “SVM Optimization With Information Gain Feature Selection to Increase the Accuracy of Sentiment Analysis of Increasing The Cost of the Hajj,” Jurnal Teknik Informatika (Jutif), vol. 5, no. 4, pp. 579–591, Aug. 2024, doi: 10.52436/[Link].2024.5.4.2217.
[45] Shahmirul Hafizullah Imanuddin, Kusworo Adi, and Rahmat Gernowo, “Sentiment Analysis on Satusehat Application Using Support Vector Machine Method,” Journal of Electronics, Electromedical Engineering, and Medical Informatics, vol. 5, no. 3, pp. 143–149, Jul. 2023, doi: 10.35882/jeemi.v5i3.304.
[46] N. W. Susanto and H. Suparwito, “SVM-PSO Algorithm for Tweet Sentiment Analysis #BesokSenin,” Indonesian Journal of Information Systems (IJIS), vol. 6, no. 1, pp. 36–47, 2023, doi: 10.24002/ijis.v6i1.7551.
[47] M. K. Anam, M. I. Mahendra, W. Agustin, Rahmaddeni, and Nurjayadi, “Framework for Analyzing Netizen Opinions on BPJS Using Sentiment Analysis and Social Network Analysis (SNA),” Intensif, vol. 6, no. 1, pp. 2549–6824, 2022, doi: 10.29407/intensif.v6i1.15870.