Text Categorization Model Based on Linear Support Vector Machine



Spam emails are a nuisance in electronic mailboxes: they occupy storage space that could otherwise hold relevant data. They also slow down network connections, which delays communication over the network. Attackers often use spam as a vehicle for phishing emails, with which they carry out data breach attacks and other forms of cybercrime. Researchers have developed models that use machine learning algorithms and other techniques to separate spam from relevant mail; however, some of these algorithms and classifiers are weak, lack robustness, and offer no visualization that would make their results interpretable to non-technical readers. In this work, a Linear Support Vector Machine (LSVM) was used to develop a text categorization model that assigns email texts to one of two categories: Ham and Spam. The processes involved were dataset import, preprocessing (removal of stop words, vectorization), feature selection (weighing and sele...