2022, International Journal for Research in Applied Science & Engineering Technology (IJRASET)
https://doi.org/10.22214/ijraset.2022.48051…
Email is the most widely used communication application worldwide, because it is easier to use and faster than other communication applications. However, its inability to detect whether mail content is spam or ham degrades its performance. Many cases of personal information theft and phishing via email have been reported. This project discusses how machine learning helps in spam detection. Machine learning is an artificial intelligence application that provides the ability to automatically learn and improve from data without being explicitly programmed. A binary classifier is used to classify the text into two different categories: spam and ham. The algorithm will predict the score more accurately. The objective of developing this model is to detect and score words faster and more accurately.
Indonesian Journal of Electrical Engineering and Computer Science, 2022
Because of its ease of use and speed compared to other communication applications, email is the most commonly used communication application worldwide. However, a major drawback is its inability to detect whether mail content is spam or ham. There is currently an increasing number of cases of personal information theft and phishing via email. This project discusses how machine learning can help in spam detection. Machine learning is an artificial intelligence application that provides the ability to automatically learn and improve from data without being explicitly programmed. A binary classifier is used to classify the text into two different categories: spam and ham. This research shows that the machine learning algorithm on the Azure-based platform predicts the score more accurately than the machine learning algorithm in Visual Studio, Hybrid Analysis and JoeSandbox Cloud.
Computer Engineering and Intelligent Systems, 2020
Emails are essential to communication in the present century; however, spam emails have contributed negatively to the success of such communication. Studies have been conducted to classify messages in an effort to distinguish between ham and spam email by building an efficient and sensitive classification model with high accuracy and a low false positive rate. Regular rule-based classifiers have been overwhelmed and made less effective by the geometric growth in spam messages, hence the need to develop a more reliable and robust model. Classification methods employed include SVM (support vector machine), Bayesian, Naïve Bayes, Bayesian with AdaBoost, and Naïve Bayes with AdaBoost. However, for this project, the Bayesian method was employed, using the Python programming language to develop a classification model.
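As a rough illustration of the kind of Bayesian classifier mentioned above, the following is a minimal sketch of a multinomial Naive Bayes spam filter in plain Python. The abstract does not specify the exact model, so the multinomial variant, the Laplace smoothing parameter `alpha`, the whitespace tokenizer, the class name `NaiveBayesSpamFilter`, and the four toy messages are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; an assumed default
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, messages, labels):
        # Count documents per class and word occurrences per class.
        for text, label in zip(messages, labels):
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def _log_score(self, words, label):
        # log P(label) + sum of log P(word | label) over known words.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for word in words:
            if word in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
        return score

    def predict(self, text):
        words = tokenize(text)
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))


# Toy training data, invented for this sketch.
messages = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "lunch tomorrow with the team",
]
labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(messages, labels)
print(model.predict("claim your free money"))   # spam
print(model.predict("team meeting tomorrow"))   # ham
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a word seen in only one class from forcing the other class's probability to zero.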
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
Spam email is one of the most serious problems in the online world. Nowadays, a large portion of the population relies on available emails or communications from strangers. As a result, the fact that anyone can send an email or a message opens the door for spammers to compose spam messages concerning our various interests. Spam fills up our inboxes with unnecessary messages, slows down our internet connection, and steals valuable information such as our contact information and accurate information. Detecting spammers and spam content is a major research issue and a time-consuming task. Email spam is when someone sends out a large number of emails in a short period of time. The purpose of spam filtering is to determine whether an email is spam or ham. With the proposed system, a specified mail can be detected as spam or ham, and the IP address of the mail can also be identified.
Sakarya Üniversitesi Fen Bilimleri Enstitüsü dergisi/Sakarya Üniversitesi fen bilimleri enstitüsü dergisi, 2023
Electronic messages, i.e. e-mails, are a communication tool frequently used by individuals or organizations. While e-mail is extremely practical to use, it is necessary to consider its vulnerabilities. Spam e-mails are unsolicited messages created to promote a product or service, often sent frequently. It is very important to classify incoming e-mails in order to protect against malware that can be transmitted via e-mail and to reduce possible unwanted consequences. Spam email classification is the process of identifying and distinguishing spam emails from legitimate emails. This classification can be done through various methods such as keyword filtering, machine learning algorithms and image recognition. The goal of spam email classification is to prevent unwanted and potentially harmful emails from reaching the user's inbox. In this study, Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM) and Artificial Neural Network (ANN) algorithms are used to classify spam emails and the results are compared. Algorithms with different approaches were used to determine the best solution for the problem. 5558 spam and non-spam e-mails were analyzed and the performance of the algorithms was reported in terms of accuracy, precision, sensitivity and F1-Score metrics. The most successful result was obtained with the RF algorithm with an accuracy of 98.83%. In this study, high success was achieved by classifying spam emails with machine learning algorithms. In addition, it has been proved by experimental studies that better results are obtained than similar studies in the literature.
1. Introduction
With the widespread use of the Internet, electronic communication has become more preferred. One of the most important tools of electronic communication is electronic messages, which we call e-mail. Today, individuals or organizations have one or more email accounts.
Instant delivery of messages, no cost and ease of use increase the importance and prevalence of e-mail [1]. According to Statista Research Department data, the number of actively used e-mail accounts in 2020 was more than 4 billion. This number is estimated to increase to 4.6 billion in 2025. In 2020, 306 billion e-mails were sent and received every day, and this number is expected to exceed 376 billion in 2025 [2]. E-mail is practical to use, but it also has various vulnerabilities. The e-mail account to be hijacked in various ways, for e-mails containing advertisements etc. to hijack your computer by installing a software on your computer when you click on the advertisement, and for the installed software to disrupt communication by sometimes filling the
Background: As the number of people using social media increases, data generation also increases, and the data generated may be safe or unsafe. Consider applications like Twitter and mail: we get a lot of emails or tweets that include both dangerous and useful things. To stay safe from threats and dangers, we need a filter that separates useful messages from spam and helps us avoid falling into a trap. One approach to doing this is explained in this paper. The algorithm followed in this paper is the Naïve Bayes classifier. The paper also provides a comparison between Naïve Bayes, KNN, and Logistic Regression for solving the same spam-filtering problem, and term frequency-inverse document frequency (TF-IDF).
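To make the TF-IDF weighting mentioned above concrete, here is a minimal sketch in plain Python. It assumes one common variant (term frequency normalized by document length, and an unsmoothed inverse document frequency log(N / df)); the abstract does not say which variant the paper uses, and the function name `tf_idf` and the two toy documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (illustrative variant)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # tf = count / document length, idf = log(N / df)
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents, invented for this sketch.
docs = ["free money free", "meeting money"]
weights = tf_idf(docs)
print(weights[0])
```

In this toy corpus "money" occurs in every document, so its weight is zero, while "free" is frequent in one document and absent from the other, so it receives a positive weight. Terms that separate documents are emphasized and ubiquitous terms are suppressed.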
International Journal of Innovative Research in Computer Science and Technology (IJIRCST), 2023
In today's era, almost everyone uses email on a daily basis. In our proposed research, we suggest a machine learning-based strategy for enhancing the accuracy of email spam filters. Traditional rule-based filters have grown less effective as spam emails have multiplied exponentially. Models can be trained to identify emails as spam or not using machine learning algorithms, particularly supervised learning. We need to create a simple and straightforward machine learning model in order to reach more accurate results when categorizing email spam. We picked the Naive Bayes technique for our model since it is quicker and more accurate than other algorithms. The suggested method can be incorporated into current email systems to enhance spam filtering functionality. This review paper provides an overview of the machine learning model we have suggested.
In this project, we focus on electronic mail, one of the most important means of communication among information professionals. As its use and significance among the general populace grows, so does its importance and utility. It has allowed for more adaptability and convenience in communication, both in the private and professional spheres. The increased use of email has led to a rise in spam as well as legitimate messages. An email that is sent to a large number of people without the recipients' knowledge or consent is considered spam. Millions of internet users, both casual and professional, are currently frustrated by the widespread problem of email spam. The purpose of this study is to provide a hybrid approach to machine learning for identifying spam in email. Bagging and boosting of machine learning-based multinomial Decision Tree, Naive Bayes, KNN, Random Forest, and the SVM method are the proposed hybrid techniques. The bagging method uses a concurrent combination of weak classifiers to boost classification accuracy. The standard deviation of misclassifications is decreased by using bagging. Alternatively, by linking the classifiers in a series fashion, the boosting strategy can construct a robust classifier out of two or more relatively weak classifiers. Improved classification results can be achieved through reduced bias and variance thanks to the use of boosting. In order to detect spam in emails, it is necessary to take into account datasets, pre-process those datasets, extract and select features, and classify the data. In this study, we evaluate the feasibility of conducting experiments using data from the Ling-Spam Corpus and the CSDMC2010 Spam Corpus. According to the stop-word list and lemmatiser, the Ling-Spam Corpus dataset is split into four different directories: bare, lemm, lemm_stop, and stop. In addition, pre-processing consists of converting strings to word vectors (tokenization), stemming words, and removing stop words.
Since the Ling-Spam Corpus is already organised according to the stop-word list and the lemmatiser, only the CSDMC2010 Spam Corpus undergoes the stemming and stop-word removal processes. Features are extracted and selected from the preprocessed data. The feature selection procedure in this work makes use of a correlation-based approach.
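The pre-processing steps named above (tokenization, stop-word removal, stemming) can be sketched as follows. This is only an illustration under stated assumptions: the tiny `STOP_WORDS` set, the regular-expression tokenizer, and the suffix-stripping `crude_stem` function are hypothetical stand-ins, not the stop-word list or stemmer used in the study, and the order of the last two steps is a choice made for this example.

```python
import re

# Tiny illustrative stop-word list; a real list would be far longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "for", "you", "your"}


def tokenize(text):
    # Lowercase the text and keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    # Very rough suffix stripping; a stand-in for a real stemmer such as Porter's.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    # Tokenize, drop stop words, then stem the remaining tokens.
    return [crude_stem(token) for token in tokenize(text) if token not in STOP_WORDS]


print(preprocess("Claiming the prizes is easy!"))  # ['claim', 'prize', 'easy']
```

The resulting token lists are what a feature extraction step would then turn into word vectors.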
In today's digital age, since email is the main form of communication, the identification of email spam is a critical issue. In addition to consuming a lot of time and money, email spam is also a security and privacy risk. In this paper, we provide a means for email spam detection that employs machine learning algorithms. The required features for training the ML models were engineered after analysis of a content-based filtering email dataset obtained from the Kaggle website. We tested several types of machine learning algorithms and analyzed their performance on the dataset. Our findings demonstrate how effective the suggested approach is in identifying email spam, with a highest accuracy of 99.8% and an RMSE of 0.2. Here we applied various ML classifier algorithms, such as Decision Tree, Voting Classifier, Random Forest, Logistic Regression and so on, to our dataset, compared them with each other, and found which suits the dataset best with the highest accuracy. This method can be useful in email clients or servers to detect spam emails automatically and enhanced
International journal for research in applied science and engineering technology, 2024
Email communication has become an essential aspect of modern-day interactions, but the proliferation of spam emails poses significant challenges to users' productivity and security. This research paper presents a comprehensive study on the development and implementation of an efficient email spam detection and categorization system. The project aims to categorize emails into predefined sections by using the Support Vector Machine (SVM) model, Flask, and the Gmail API, ensuring accuracy and efficiency in email classification. The methodology involves data preparation, processing, storage, and management, ensuring robust security and privacy considerations. The system's three-tiered classification strategy enhances the accuracy of spam and ham detection. Future enhancements include integrating advanced machine learning models, user feedback mechanisms, and multi-platform support to adapt to evolving email trends and user preferences. This research contributes to the field of email management by offering a new approach to combat spam effectively and enhance email organization for users in the digital age.
IRJET, 2023
Emails are frequently used by individuals for professional and personal use. Many individuals possess more than one email ID, often provided by the organizations they work with for professional use.[1] This indicates that multiple email accounts can be created, and attackers make use of fake profiles to con people by posing as a genuine person from a legitimate organization. This is known as email phishing, a popular cyber security attack used by attackers to gain sensitive information from users.[4] Nowadays, anyone can send an email to any organization or individual. This provides a golden opportunity to send either spam or malicious emails.[5] The goal of this paper is to identify these spam mails by using machine learning, which through its mechanisms allows models to analyze massive amounts of complex data with the help of various algorithms and alert the user about suspicious and possibly spam mails.
Sci. Program., 2021
Social communication has evolved, with e-mail still being one of the most common means of communication, used in both formal and informal ways. With many languages being digitized for the electronic world, the use of English is still abundant. However, various native languages of different regions are emerging gradually. The Urdu language, coming from South Asia, mostly Pakistan, is also gaining pace as a medium for communication on social media platforms, websites, and emails. With the increased usage of emails, the number and variety of Urdu spam content also increase. Spam emails are inappropriate and unwanted messages usually sent to breach security. These spam emails include phishing URLs, advertisements, commercial segments, and a large number of indiscriminate recipients. Thus, such content is always a hazard for the user, and many studies have taken place to detect such spam content. However, there is a dire need to detect spam emails, which have content written in Urd...
2022
People's communication methods are being transformed by electronic mail because of its affordability, speed, and simplicity. Due to their widespread exposure, spam emails have become a serious roadblock in electronic communication. The amount of time users spend sifting through incoming mail and eliminating spam necessitates the implementation of spam detection software. The main objective is to create suitable filters that can correctly recognise these emails and deliver outstanding performance in the majority of cases. This project makes use of spam detection to tell spam from valid email. SVM, a machine learning method, is employed in this case. Spam detection can benefit greatly from SVMs and other machine learning approaches. This project's classification is based on its features. In the email world, spam is a term that refers to unsolicited commercial communications or emails that deceive the recipient. With the use of artificial intelligence and machine learning, spam messages can be identified. Spam filtering is a popular application of machine learning techniques. Machine learning classifiers are used to identify emails as either ham (legitimate messages) or spam (unwanted messages) using these techniques.
figshare. Conference contribution., 2022
In today's world, email is used in almost every industry, from business to education. Emails can be categorized into two categories: ham and spam. Junk emails, also known as spam messages, are emails that have been designed to harm recipients by wasting their time and computing resources and by stealing their valuable information. It is estimated that spam emails are increasing at a rapid rate. One of the most important and prominent spam prevention techniques is email filtering. Naive Bayes, Decision Trees, Neural Networks, and Random Forests are among the methods used for this purpose by researchers. In this project, I examine the Logistic Regression machine learning model for spam filtering in email by categorizing messages into appropriate groups. This study also compares the techniques based on accuracy, precision, recall, etc. The accuracy level for this project was around 97%. Towards the end, these insights, future research directions, and challenges are outlined.
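The evaluation metrics named above can be computed directly from the counts of true and false positives and negatives. The sketch below is a minimal plain-Python illustration; the function name `classification_metrics`, the choice of "spam" as the positive class, and the eight toy labels are assumptions made for this example and are not data from the study.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1 for a binary spam/ham task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented for this sketch: 3 spam and 5 ham messages.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here two of the three spam messages are caught and one ham message is wrongly flagged, so accuracy is 0.75 while precision, recall and F1 are each 2/3. On imbalanced mail data, accuracy alone can look high even when many spam messages slip through, which is why precision and recall are usually reported alongside it.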
IRJET, 2022
Nowadays, email spam has become a big problem: with the fast growth of internet users, email spam is also increasing. People use it for phishing, fraud, and other illegal and unethical practices, sending malicious links through spam emails that can harm our systems and may also break into them. It is very simple for spammers to create a fake profile and email account and to appear as a real person in their spam emails; these spammers simply target people who are not aware of these frauds. There is therefore a need to identify fraudulent spam mails. This project identifies such spam using machine learning techniques. This paper discusses machine learning algorithms and applies all of them to our dataset. The best algorithm for this project is then chosen based on the best accuracy and precision in email spam detection.
IRJET, 2023
With the fast growth of web users, e-mail spam is increasing alarmingly. People misuse spam mails in several ways, transferring malicious content and unwanted, unsolicited, irrelevant advertisements which can harm and spoof one's system. Spam could contain malware, such as ransomware and spyware. Creating a forged or fake profile and a fake email account is far easier for spammers, and they create spam mail that is difficult to distinguish from real mail. Thus, it is required to differentiate spam mails and prevent their entry into the inbox. This has been attempted using machine learning techniques. Spam detection through various machine learning algorithms has been attempted, and it is found that the Multinomial Naive Bayes algorithm is more efficient and gives the highest spam detection with the best accuracy and exactness.
2023
This paper focuses on the security of electronic mail, using machine learning algorithms. Spam emails are unwanted messages, usually commercial, sent to a large number of recipients. In this work, an algorithm for the detection of spam messages with the aid of machine learning methods is proposed. The algorithm accepts as input text email messages grouped as benevolent ("ham") and malevolent (spam) and produces a text file in csv format. This file is then used to train a set of ten machine learning techniques to classify incoming emails as ham or spam. The following machine learning techniques have been tested: Support Vector Machines, k-Nearest Neighbour, Naïve Bayes, Neural Networks, Recurrent Neural Networks, AdaBoost, Random Forest, Gradient Boosting, Logistic Regression and Decision Trees. Testing was performed using two popular datasets, as well as a publicly available csv file. Our algorithm is written in Python and produces satisfactory results in terms of accuracy, compared to state-of-the-art implementations. In addition, the proposed system generates three output files: a csv file with the spam email IP addresses (of originating email servers), a map with their geolocation, and a csv file with statistics about the countries of origin. These files can be used to update existing organisational filters and blacklists used in other spam filters.
2020
1-4 Student, Department of Computer Engineering, TEC, University of Mumbai, Mumbai, India
5 Faculty, Department of Computer Engineering, TEC, University of Mumbai, Mumbai, India
Abstract: Due to its convenient, economical, fast, and easy-to-use nature, electronic mail is a vital revolution taking place over traditional communication systems. A main obstruction in electronic communications is the vast dissemination of unwanted, harmful emails known as spam emails. Users waste a lot of time sorting incoming mail and deleting undesirable correspondence, so there is a need for spam detection to reduce these effects. The main aim is to develop suitable filters that can appropriately detect those emails and achieve a high performance rate.
International Journal of Scientific & Technology Research, 2020
Electronic mail (E-mail) is used to exchange messages between people via the internet. E-mail protocols like Simple Mail Transfer Protocol (SMTP), Post Office Protocol (POP) and Internet Message Access Protocol (IMAP) are used to transfer messages from sender to receiver. Flaws in E-mail protocols, together with the development of online businesses and advertisement companies, create E-mail based intimidation. E-mail spam is also called junk mail. Today, handling spam mail is one of the major problems in software companies, since spam mail causes traffic problems and bottlenecks that limit memory space, computing power and speed. In addition, a user has to spend more time detecting and deleting spam mails. Machine learning models are used to overcome this problem. Machine learning models are categorized into supervised, unsupervised and semi-supervised learning models. Supervised learning models are used to classify E-mails and to filter and prevent spam mail. The proposed work explores t...
2016
Web spam is one of the major problems of search engines because it reduces the quality of the Web page. Web spam also has an economic effect, because spammers place large amounts of free advertising data or sites on the search engines and so increase web traffic. There are certain ways to distinguish such spam pages, and one of them is using classification techniques. A comparative analysis of web spam detection using machine learning algorithms such as LAD Tree, Random Forest, C4.5 and Naive Bayes is presented in this paper. Experiments were carried out on feature sets of the universally accepted dataset WEB SPAM UK-2007 using WEKA. By observing all the results, we found that Random Forest works well on content-based features, link-based features and transformed link-based features. However, a few techniques were found to be time consuming compared to the other classification techniques used. Keywords—Machine learning, Spamdexing, cloaking, link spam, content spam, C4.5, Naive bayes, LAD tree, de...
2015
Emails are used by a number of users for educational or professional purposes. But spam mails cause serious problems for email users, such as wasting the user's energy and search time. This paper is a survey of some popular classification techniques for identifying whether an email is spam or non-spam. To represent spam mails, we use the vector space model (VSM). Since emails contain so many different words, and not all classifiers can handle such a high dimension, only a few powerful classification terms should be used. Another reason is that some of the terms may not have any standard meaning, which may create confusion for the classifier.
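The vector space model and the idea of keeping only a small set of terms can be sketched as follows. This is a hedged illustration only: ranking terms by document frequency is a crude stand-in chosen for this example (the survey does not state its selection criterion), and the function names `build_vocabulary`, `to_vector` and `cosine_similarity`, along with the three toy documents, are hypothetical.

```python
import math
from collections import Counter


def build_vocabulary(documents, max_terms):
    # Keep the max_terms terms that occur in the most documents.
    # Ties are broken alphabetically so the result is deterministic.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.lower().split()))
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term], term))
    return ranked[:max_terms]


def to_vector(document, vocabulary):
    # One dimension per vocabulary term, holding the raw term count.
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy documents, invented for this sketch.
docs = ["free money now", "free prize now", "meeting at noon"]
vocabulary = build_vocabulary(docs, max_terms=3)
vectors = [to_vector(doc, vocabulary) for doc in docs]

print(vocabulary)  # ['free', 'now', 'at']
print(vectors)     # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

Restricting the vocabulary to three terms turns each message into a three-dimensional vector. The two spam-like messages point in the same direction, while the third is orthogonal to both, which is the property a downstream classifier would exploit.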