Comparing RF and LR for Password Strength
To what extent is Random Forest (RF) favourable over Logistic Regression (LR)?
Context: Passwords have become the universal method of authentication due to their simplicity and compatibility across a wide range of systems. However, because of their widespread use, they have become vulnerable to external attacks like password cracking. Users are infamously poor at maintaining entropy in their passwords due to their tendency to include dictionary words, names, places, dates, keyboard patterns and so on, making them predictable. Password strength classifiers developed using machine learning algorithms like Random Forest (RF) and Logistic Regression (LR) can efficiently prevent the use of such weak passwords.
Subjects and Methods: In this study, I trained two machine learning models to detect the strength of different passwords. The two models use Random Forest and Logistic Regression respectively to classify password strength as 0, 1 or 2, with 0 being weakest and 2 being strongest. I tested the models on 669,643 independent passwords retrieved from Kaggle and compared their prediction accuracy and prediction time.
Results: Random Forest has higher prediction accuracy whereas Logistic Regression has lower prediction time.
1. Introduction
Password Strength Classifiers are algorithms used to assess the strength or effectiveness of passwords. They are designed to analyse characteristics of a password like length, use of diverse character classes, and complexity to compute a score indicating the password’s level of security. A higher score means the password is more effective against external attacks.
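As an illustration of the characteristics described above, a minimal rule-based scorer might look like the following sketch; the thresholds and weights here are hypothetical and not taken from any classifier discussed in this paper:

```python
import re

def strength_score(password: str) -> int:
    """Toy rule-based scorer: 0 = weak, 1 = medium, 2 = strong."""
    score = 0
    if len(password) >= 8:
        score += 1  # reward length
    if re.search(r"[a-z]", password) and re.search(r"[A-Z]", password):
        score += 1  # mixed case
    if re.search(r"\d", password):
        score += 1  # digits
    if re.search(r"[^a-zA-Z0-9]", password):
        score += 1  # special characters
    # map the 0-4 raw score onto the 0/1/2 scale used in this paper
    if score <= 1:
        return 0
    if score == 2:
        return 1
    return 2

print(strength_score("abc"))          # 0 (weak)
print(strength_score("Str0ng!Pass"))  # 2 (strong)
```

Real classifiers learn such rules from data rather than hard-coding them, which is the motivation for the machine learning approach below.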
The usefulness of password classifiers, however, lies in their ability to give real-time feedback to users regarding the strength of the password they are creating. Through this, users are encouraged to create stronger passwords.
Password strength classification is used across a variety of websites during the user
registration process for enforcing strong passwords. These are also extended by password
managers like LastPass2 who use it for assessing the safety of users’ saved passwords and for
suggesting strong passwords. Additionally, password classifiers are paving their way into
education with services like Password Monster that are being used to teach users about good
password habits3.
An approach to creating these algorithms is through machine learning. Random Forest (RF) and Logistic Regression (LR) are two machine learning models that can classify password strength. However, it is unclear which model is better based on prediction accuracy and prediction time. Additionally, variation in performance between RF and LR could differ for different performance measures, namely
1
“Real Time Password Strength Analysis on a Web Application Using Multiple Machine Learning Approaches –
IJERT.” International Journal of Engineering Research & Technology, 24 December 2020,
https://www.ijert.org/real-time-password-strength-analysis-on-a-web-application-using-multiple-machine-learning-approaches. Accessed 2 July 2023.
2
LastPass. "How Secure is Your Password?" LastPass, lastpass.com/howsecure.php.
Int. J. Mech. Eng. Res. & Tech 2022
accuracy and prediction time. This research could help developers make comprehensive, informed model choices. It eliminates investing unnecessary time, computer resources and labour attempting to fit data into an ‘unsuitable’ machine learning algorithm when a better alternative exists, especially considering machine learning models can require weeks to develop for real-world accuracy.4
2. Theoretical Background
Password Strength is the measure of a password’s resistance against brute-force attacks and
password cracking. The strength lies in the following characteristics of the password:
complexity, length and unpredictability5. To fulfil these criteria, the following guidelines must be adhered to:
- Avoid dictionary words, keyboard patterns, repetitive characters, and letter sequences
In this study, passwords are classified into weak, medium and strong categories, which are numerically represented as 0, 1 and 2 respectively. The nature of each password strength level is described in the table below.
Although decision trees can be used for password strength classification, they may be deemed unsuitable due to the problem of overfitting, especially when dealing with high-dimensional data.
A decision tree is generated by recursive splitting of data, based on the most informative
feature, into decision nodes that are used to make decisions and leaf nodes that determine the
result.8 Overfitting occurs during this recursive process when the decision tree catches
random fluctuations or noise in the data instead of learning the underlying patterns. This
overfitting issue can be solved by Random Forest.9 For this reason, Random Forest has been
chosen instead.
Linear regression cannot be used because it lacks suitability for password strength classification due to its inherent nature of predicting only numerical continuous outcomes. In Linear Regression, a linear relationship is assumed between the input features and the output. No such linear relationship exists between a password’s features and its strength; therefore, a linear regression model will not accurately capture the relationship11. For this reason, Logistic Regression is used instead, which is better suited to classification tasks.
Random Forest is a supervised machine learning algorithm built on the concept of ensemble learning, where multiple classifiers are combined to solve complex problems12. As the name suggests, a random forest is a collection of multiple decision trees that are trained on various data subsets through a technique known as bootstrapping. Multiple decision trees enhance
10
"Linear Regression in Machine Learning - Javatpoint." Www.javatpoint.com,
www.javatpoint.com/linear-regression-in-machine-learning. Accessed 10 July 2023.
11
"Linear Regression in Machine Learning - Javatpoint." Www.javatpoint.com,
www.javatpoint.com/linear-regression-in-machine-learning. Accessed 10 July 2023.
12
"Random Forest Algorithm." Www.javatpoint.com,
www.javatpoint.com/machine-learning-random-forest-algorithm. Accessed 10 July 2023.
accuracy because, instead of relying on a single decision tree, the algorithm takes the prediction from each tree. In order to compute a result, majority voting is performed over the trees’ predictions13.
Since each tree is generated independently with different data and attributes, Random Forest allows parallelisation, meaning the CPU can be fully utilised while building the forest. Moreover, as majority voting is carried out, the model’s performance is not heavily affected by minor changes in the dataset, improving the stability of the model. The majority vote also reduces the impact of any single tree’s errors.
13
R, Sruthi E. "Random Forest | Introduction to Random Forest Algorithm." Analytics Vidhya, 21 June 2022,
www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/. Accessed 10 July 2023.
During Training: Random data points are selected with replacement from the training data to build individual decision trees. During construction of the trees, at each node the algorithm considers a random subset of features and chooses the most informative split.
During Testing: For a given password instance, every tree individually predicts based on its individual criteria; as seen in the image above, different trees give different predictions. These predictions are then counted as votes, and the class with the most votes is chosen as the final prediction.
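The majority vote described above can be sketched with Python’s collections.Counter; the tree predictions here are made-up values for illustration:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# e.g. 10 trees voting on one password's strength (0, 1 or 2)
votes = [1, 1, 2, 1, 0, 1, 2, 1, 1, 2]
print(majority_vote(votes))  # 1 (six of ten trees voted 'medium')
```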
Logistic Regression is a statistical model that predicts the target variable’s probability based on independent variables. The main purpose of the model is to find the best fitting model to describe a relationship between an independent variable and a dependent variable14. A logistic function is used to model the dependent variable, hence the name logistic regression. This logistic function is represented by the sigmoid function below:
S(x) = 1 / (1 + e^(−x))
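The sigmoid function above can be computed directly, showing how it squashes any real input into the open interval (0, 1):

```python
import math

def sigmoid(x: float) -> float:
    """S(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))   # 0.5 — the decision boundary
print(sigmoid(4))   # ~0.982, confidently positive
print(sigmoid(-4))  # ~0.018, confidently negative
```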
Logistic Regression calculates its output using this equation to return a probability value between 0 and 1. Although typically used for binary classification, it can be extended further to classify three or more classes.
14
"Logistic Regression in Machine Learning." Www.javatpoint.com,
www.javatpoint.com/logistic-regression-in-machine-learning. Accessed 10 July 2023.
15
Pant, Ayush. "Introduction to Logistic Regression." Medium, 22 Jan. 2019, towardsdatascience.com/introduction-to-logistic-regression-66248243c148. Accessed 10 July 2023.
16
"Multinomial Logistic Regression With Python." Machine Learning Mastery, machinelearningmastery.com/multinomial-logistic-regression-with-python/. Accessed 10 July 2023.
A Comparison Study of Random Forest and Logistic Regression for Password Strength Classification
In order to classify the passwords, a One-Vs-Rest classification strategy has been used17.
17
GeeksforGeeks | A Computer Science Portal for Geeks,
www.geeksforgeeks.org/one-vs-rest-strategy-for-multi-class-classification/. Accessed 10 July 2023.
During Training: Three different binary classifiers are trained, one for each output class (Weak, Medium and Strong). All three models are trained on the same training data; however, the labelling of positive and negative examples differs in each case. Through this, a threshold is calculated for ‘Weak’ or ‘Not Weak’, ‘Medium’ or ‘Not Medium’ and ‘Strong’ or ‘Not Strong’, as seen in the image above.
During Testing: The password is passed through all three models, and each model gives an output score indicating the probability of the password belonging to that class. The password is assigned to the class with the highest score.
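The One-Vs-Rest prediction step described above amounts to taking the argmax over the three binary classifiers’ scores; the probabilities below are hypothetical values for a single password:

```python
def ovr_predict(class_scores: dict) -> str:
    """Pick the class whose binary classifier gave the highest probability."""
    return max(class_scores, key=class_scores.get)

# hypothetical outputs of the three 'X vs Not-X' models for one password
scores = {"Weak": 0.12, "Medium": 0.81, "Strong": 0.31}
print(ovr_predict(scores))  # Medium
```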
3. Methodology
The two models were developed with Google Colaboratory18 using Python. To train and test the models, the Password Strength Classification dataset from Kaggle was used19. The dataset contains a total of 669,643 passwords. The CSV file contains each password along with a strength measure equal to 0, 1 or 2, with 0 being weak. There are 496,801 medium (1) passwords, accounting for about 74% of all the passwords in the dataset.
18
“Welcome To Colaboratory - Colaboratory.” Google Research, https://colab.research.google.com. Accessed 20
July 2023.
19
"Password Strength Classifier Dataset." Kaggle: Your Machine Learning and Data Science Community,
www.kaggle.com/datasets/bhavikbb/password-strength-classifier-dataset. Accessed 20 July 2023.
As the dataset is unbalanced with an approximately 1:6:1 ratio, accuracy alone is not enough to establish which algorithm performs better, since a model biased towards the majority class can still score highly; hence, the confusion matrix is further used to calculate the F1-score in order to accurately evaluate the models’ performances. Additionally, each model’s testing time was also noted.
During pre-processing, the CSV file is loaded into a DataFrame. Then, the number of missing values (NaNs) is identified and all rows with missing values are deleted. The remaining data is converted into a numpy array and shuffled to avoid any biases that may exist in the ordering of the dataset. Using a function for tokenisation and the ‘fit_transform’ method, the passwords are converted into numerical feature vectors.
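The pre-processing steps above can be sketched as follows. The study’s exact tokeniser is not reproduced, so this sketch assumes character-level TF-IDF features, a common choice for password data; the tiny inline DataFrame stands in for the 669,643-row Kaggle CSV:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# tiny stand-in for the Kaggle CSV (the real study loads it with pd.read_csv)
df = pd.DataFrame({
    "password": ["abc123", None, "kzde5577", "Str0ng!Pass#2024"],
    "strength": [0, 1, 1, 2],
})
df = df.dropna()            # delete rows with missing values (NaNs)

data = df.to_numpy()        # convert to a numpy array
np.random.shuffle(data)     # shuffle to avoid ordering bias
passwords, strengths = data[:, 0], data[:, 1].astype(int)

# assumed character-level tokenisation: each character is one token
vectorizer = TfidfVectorizer(tokenizer=list, lowercase=False)
X = vectorizer.fit_transform(passwords)
print(X.shape[0])  # 3 rows survive after dropping the NaN row
```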
For this experiment, the training/testing split had to be the same for a fair comparison. Hence, the train/test split was 80/20% for both models, meaning 535,711 passwords were used for training and the remainder for testing.
While programming the Random Forest model, the number of decision trees used in this investigation was 10, specified through the ‘n_estimators’ parameter. The criterion was set as ‘entropy’, meaning that the algorithm uses information gain based on the entropy of class labels to evaluate splits. With ‘random_state = 0’, the algorithm sets the random seed to ensure reproducibility.
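Based on the parameters described above, the Random Forest classifier was presumably instantiated along these lines (the variable names and the tiny synthetic training data are illustrative, not taken from the original code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# 10 trees, entropy-based information gain, fixed seed for reproducibility
rf_class = RandomForestClassifier(n_estimators=10,
                                  criterion="entropy",
                                  random_state=0)

# tiny synthetic stand-in for the vectorised password features and labels
X_train = np.array([[0, 1], [1, 0], [1, 1], [0, 0]] * 5)
y_train = np.array([0, 1, 2, 0] * 5)
rf_class.fit(X_train, y_train)
print(len(rf_class.estimators_))  # 10 fitted decision trees
```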
A similar instance was created for the Logistic Regression model. This instance, named ‘log_class’, is the logistic regression classifier used in this study.
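The corresponding Logistic Regression instance, with the One-Vs-Rest strategy described earlier, might look like this; the explicit OneVsRestClassifier wrapper and the toy data are assumptions, since the original code is not reproduced in full:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# wrap the binary logistic model in a One-Vs-Rest meta-classifier
log_class = OneVsRestClassifier(LogisticRegression(random_state=0))

# same tiny synthetic stand-in data as for the Random Forest sketch
X_train = np.array([[0, 1], [1, 0], [1, 1], [0, 0]] * 5)
y_train = np.array([0, 1, 2, 0] * 5)
log_class.fit(X_train, y_train)
print(len(log_class.estimators_))  # one binary classifier per class -> 3
```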
1) F1 Score: calculated so that a single value can be used for comparison of the algorithms, taking into account both precision and recall21. A high F1 Score indicates a good balance between correctly identifying positive cases and minimising false positives. It is calculated for each password strength class for both models:
F1 score = 2 × (precision × recall) / (precision + recall)
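The F1 formula above can be checked numerically; the precision and recall values here are arbitrary examples:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * (precision * recall) / (precision + recall)

print(round(f1(0.8, 0.6), 4))  # 0.6857
print(round(f1(1.0, 1.0), 4))  # 1.0 — perfect classifier
```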
2) Prediction Time (seconds): Measured as the difference between the end and start time of each model’s program. The values were calculated using Python’s time library, through the time.time() method23. The start time was recorded after training and the end time after testing, so the measured interval covers prediction only. Therefore, the lower the prediction time in seconds, the better the performance. The prediction time was recorded over three trials:
2. Execute the code for Random Forest, noting the resultant prediction time and confusion matrix
3. Clear the Google Compute Engine’s memory to prevent artificially lower time values in later trials
4. Repeat steps 2-3 two more times for trial 2 and trial 3
21
"F1 Score in Machine Learning: Intro & Calculation." V7 - AI Data Platform for Computer Vision, 2023, www.v7labs.com/blog/f1-score-guide. Accessed 20 July 2023.
22
Allwright, Stephen. “What is a good F1 score? Simply explained (2022).” Stephen Allwright, 20 April 2022,
https://stephenallwright.com/good-f1-score/. Accessed 20 July 2023.
23
“time — Time access and conversions — Python 3.11.4 documentation.” Python Docs,
https://docs.python.org/3/library/time.html. Accessed 21 July 2023.
5. Calculate the F1 score for each password strength using the confusion matrix from each trial
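The timing procedure in the steps above can be sketched with Python’s time library; the model and data names are placeholders for the trained classifiers and test set described earlier:

```python
import time

def measure_prediction_time(model, X_test):
    """Return (predictions, elapsed_seconds) for an already-trained model."""
    start = time.time()               # recorded after training
    predictions = model.predict(X_test)
    end = time.time()                 # recorded after testing
    return predictions, end - start

# e.g. preds, seconds = measure_prediction_time(rf_class, X_test)
```

For short intervals, time.perf_counter() offers higher resolution than time.time(), though the paper's cited method is time.time().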
4. Experimental Results
The table below compares the classification performances of the Random Forest and Logistic Regression with respect to password strength. All values observed from the program were reported to 4 decimal places. Three trials were conducted because both models have randomness in their initialisation and training process; conducting three trials therefore ensures that the evaluation is representative and not influenced by a specific randomisation. The averaged result will thus be more indicative of expected real-world performance.
The table below compares the prediction time of the Random Forest and Logistic Regression. Each value has been recorded in seconds elapsed during program execution; lower values are more favourable as they indicate quicker predictions by the model. For the same reasons as stated under ‘Prediction Accuracy’, four decimal places have been used.
My experiment indicates that Random Forest can predict password strength with very good accuracy, with a Macro F1 Score of 0.9684. The accuracy increases with password strength, although beyond medium strength there are diminishing returns; this may be due to the imbalance in the dataset. Nevertheless, the overall accuracy from weak to strong increases from 0.9495 to 0.9695; therefore, the impact of the unbalanced dataset on the model is minor. This may be because Random Forest can effectively handle imbalanced data by assigning higher weights to the minority class24 during training, thus improving classification of weak passwords. This property, combined with its ensemble nature, likely explains its strong overall performance.
24
“Surviving in a Random Forest with Imbalanced Datasets | by Kyoun Huh | SFU Professional Computer Science.” Medium, 13 February 2021, https://medium.com/sfu-cspmp/surviving-in-a-random-forest-with-imbalanced-datasets-b98b963d52eb. Accessed 3 August 2023.
However, the pros of the Random Forest are also likely the reason for its poor time performance relative to Logistic Regression. Its ensemble nature constructs 10 decision trees, each of which requires time to produce a prediction.
Once again, the results indicate the same pattern of increasing and then decreasing accuracy; however, the variations for Logistic Regression are much more significant. With a macro F1 score of 0.6746, the Logistic Regression model has only average accuracy. This may be due to the fact that Logistic Regression handles categorical features by encoding them26; however, in the case of passwords, the categorical variables have complex relationships, making it difficult for the model to encode them. The unbalanced dataset may also be a huge contributor to
25
“Random forest Algorithm in Machine learning: An Overview.” Great Learning,
https://www.mygreatlearning.com/blog/random-forest-algorithm/. Accessed 3 August 2023.
26
Roy, Baijayanta. “All about Categorical Variable Encoding | by Baijayanta Roy.” Towards Data Science,
https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02. Accessed 3 August
2023.
its low accuracy: Logistic Regression is known to be biased towards the majority class27 and to struggle to predict the minority classes, as seen in the lower F1 scores for weak (0.3893) and strong (0.7493) passwords. Despite its accuracy limitations, Logistic Regression took the lead in prediction time with just 0.0439 seconds.
5. Conclusion
In this paper, the performance of Random Forest and Logistic Regression was compared using prediction accuracy and prediction time for the classification of password strength. While the inputted dataset for both models was identical, differences in general trends for both metrics were observed.
Overall, my experiment shows that Random Forest performs with better accuracy whereas Logistic Regression predicts faster; hence, an accuracy-speed tradeoff exists between the two models for password strength classification. The selection of the model therefore depends on the stakeholder’s priority. In the case of lower computational resources and strict time constraints, Logistic Regression would be the better choice, whereas where accuracy is the deciding factor, Random Forest should be used. Hopefully, this paper will prove useful to developers who are looking to incorporate machine learning models for password strength classification.
27
“Issues using logistic regression with class imbalance, with a case study from credit risk modelling.” American
Institute of Mathematical Sciences, https://www.aimsciences.org/article/doi/10.3934/fods.2019016. Accessed 3
August 2023.
6. Bibliography
“Real Time Password Strength Analysis on a Web Application Using Multiple Machine
Learning Approaches – IJERT.” International Journal of Engineering Research &
Technology, 24 December 2020,
https://www.ijert.org/real-time-password-strength-analysis-on-a-web-application-using-multiple-machine-learning-approaches. Accessed 2 July 2023.
"Password Strength Classifier Dataset." Kaggle: Your Machine Learning and Data Science
Community, www.kaggle.com/datasets/bhavikbb/password-strength-classifier-dataset.
Accessed 20 July 2023.
"F1 Score in Machine Learning: Intro & Calculation." V7 - AI Data Platform for Computer
Vision, 1 2023, www.v7labs.com/blog/f1-score-guide. Accessed 20 July 2023.
Allwright, Stephen. “What is a good F1 score? Simply explained (2022).” Stephen Allwright,
20 April 2022, https://stephenallwright.com/good-f1-score/. Accessed 20 July 2023.
“time — Time access and conversions — Python 3.11.4 documentation.” Python Docs,
https://docs.python.org/3/library/time.html. Accessed 21 July 2023.
“Surviving in a Random Forest with Imbalanced Datasets | by Kyoun Huh | SFU Professional
Computer Science.” Medium, 13 February 2021,
https://medium.com/sfu-cspmp/surviving-in-a-random-forest-with-imbalanced-datasets-b98b963d52eb. Accessed 3 August 2023.
Roy, Baijayanta. “All about Categorical Variable Encoding | by Baijayanta Roy.” Towards
Data Science,
https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02.
Accessed 3 August 2023.
“Issues using logistic regression with class imbalance, with a case study from credit risk
modelling.” American Institute of Mathematical Sciences,
https://www.aimsciences.org/article/doi/10.3934/fods.2019016. Accessed 3 August 2023.