Papers by Muigai Samuel

This study proposes a method for detecting violence in surveillance videos, a major challenge in public safety and video analysis. It presents a hybrid model that combines Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Support Vector Machines (SVM) to detect violent incidents in video data. The Convolutional Long Short-Term Memory and Support Vector Machines (Conv-LSTM-SVM) model brings together CNN spatial feature extraction, LSTM temporal dependency modelling, and SVM classification. In the proposed architecture, a pre-trained DenseNet121 model extracts spatial features via transfer learning from large datasets. An LSTM layer captures the temporal dynamics needed to understand how activity develops across a video sequence, while an SVM with a Radial Basis Function (RBF) kernel forms non-linear decision boundaries in the high-dimensional feature space for the final classification. The model was developed, trained, and tested with the Keras library running on TensorFlow, following an experimental research design. It was evaluated on two well-known datasets: the UCF Crime dataset, which contains 1900 surveillance clips of 13 classes of violent situations, and the RWF-2000 dataset, which covers real-world fighting. The proposed model outperformed CNN, LSTM, and Conv-LSTM models, reaching 97.3% accuracy on the UCF Crime dataset. Cross-dataset validation yielded 92.5% accuracy on the RWF-2000 dataset without modification, demonstrating robust generalization. The study also considers how public safety could be improved by processing several video streams in real time and reducing false alarms. It further examines the ethical challenges and limitations of automated surveillance systems, such as privacy, bias, and human supervision.
This research uses advanced video analysis to improve public safety by creating more efficient and adaptive surveillance systems.
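
The RBF kernel named in the abstract above can be illustrated with a minimal sketch. The abstract gives no kernel parameters, so the `gamma` value and the three-element feature vectors below are assumptions chosen only for illustration; in the described model the inputs would be the much longer feature vectors produced by the CNN and LSTM stages.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Standard RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2).

    gamma is a hypothetical value; the abstract does not report one.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical feature vectors standing in for per-clip features.
clip_a = [0.2, 0.9, 0.4]
clip_b = [0.2, 0.9, 0.4]  # identical to clip_a
clip_c = [0.9, 0.1, 0.7]  # far from clip_a

same = rbf_kernel(clip_a, clip_b)       # identical inputs give similarity 1.0
different = rbf_kernel(clip_a, clip_c)  # similarity decays with distance
```

The kernel value is 1 for identical vectors and falls toward 0 as they move apart, which is what lets an SVM separate classes that are not linearly separable in the original feature space.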
Thesis Chapters by Muigai Samuel

Closed-circuit Television (CCTV) surveillance cameras are widely used in both public and private settings to increase security. A Full High Definition (FHD) camera operating at thirty frames per second (30 fps) with moderate compression has a bitrate of about eight megabits per second (8 Mbps). At this bitrate, a single FHD camera recording for twenty-four hours produces approximately eighty-six gigabytes (86 GB) of data per day. Monitoring and analyzing all of this footage is challenging because of its sheer volume. Consequently, machine learning models have been used to automate the analysis of surveillance footage in order to detect violence. While these models have shown promising results, they still face challenges in processing speed and accuracy, particularly in the extraction of spatiotemporal features. This study developed a model based on the Convolutional Long Short-Term Memory and Support Vector Machines (Conv-LSTM-SVM) approach for detecting violence in CCTV surveillance footage. Convolutional Neural Networks (CNNs) are deep neural networks designed to handle grid-structured data such as images. Long Short-Term Memory (LSTM) networks belong to the family of Recurrent Neural Networks (RNNs) and are designed for processing sequential data. Support Vector Machines (SVMs) are supervised machine learning methods used for tasks such as classification and regression. Integrating CNNs, LSTM networks, and SVMs leverages the strengths of each, resulting in a comprehensive approach. The model was developed, trained, and tested with the Keras library running on TensorFlow, following an experimental research design.
The impact of various hyper-parameters on the performance of the hybrid model was investigated, and the results were used to optimize the model. The UCF-Crime dataset was used for model training, validation, and testing, while the RWF-2000 dataset was used for external validation. The training data was augmented so that the model was trained on the wide range of violent and non-violent activities it may encounter in real-world settings. The model's performance was evaluated, and a comparative table was used to compare the speed and recognition accuracy of the hybrid model against those of similar existing state-of-the-art models. With an accuracy of 97.8%, the Conv-LSTM-SVM model demonstrated its effectiveness in identifying violent action in surveillance footage, against 75%, 80%, and 97% for the LSTM, CNN, and Convolutional Long Short-Term Memory (Conv-LSTM) models respectively. Even though the Two-Stream Fusion CNN model demonstrated a marginally greater accuracy of 97.8%, the hybrid model demonstrated relatively higher computational efficiency, with a low inference time of 36 milliseconds and a training time of nine hours. Experimentation revealed that optimal regularization can be achieved with a dropout rate of 0.5, a learning rate of 0.001, and a batch size of 32. The Adam optimizer demonstrated the most rapid convergence, converging in 145 minutes. When tested on the unseen, heterogeneous RWF-2000 dataset, the model confirmed cross-domain viability with 91.3% detection accuracy without retraining. Its performance in accurately identifying violent behaviour makes the hybrid model a feasible tool for enhancing public safety and security in a range of surveillance scenarios.
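
The daily storage figure quoted at the start of this abstract can be checked with a short calculation. This is a sketch of the arithmetic only; the function name is hypothetical, and it assumes decimal units (1 GB = 1000 MB) and a constant bitrate over the whole recording period.

```python
def daily_storage_gb(bitrate_mbps, hours=24.0):
    """Approximate storage in gigabytes for a constant-bitrate stream.

    Assumes decimal units (1 GB = 1000 MB) and 8 bits per byte.
    """
    seconds = hours * 3600              # recording duration in seconds
    megabits = bitrate_mbps * seconds   # total data in megabits
    megabytes = megabits / 8            # convert bits to bytes
    return megabytes / 1000             # convert megabytes to gigabytes


# One FHD camera at 8 Mbps recording for 24 hours.
per_camera_gb = daily_storage_gb(8)
```

At 8 Mbps the result is 86.4 GB per camera per day, which matches the approximately 86 GB stated in the abstract and scales linearly with the number of cameras.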