Face Recognition Attendance System
BACHELOR OF TECHNOLOGY
in
Submitted to the
Department of Computer Science and Engineering
CANDIDATE'S DECLARATION
I hereby declare that the work presented in this project report titled "FACE RECOGNITION BASED ATTENDANCE SYSTEM", submitted by me in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology (B.Tech.) in the Department of Computer Science & Engineering, Uttarakhand Technical University, Dehradun, is an authentic record of my project work carried out under the guidance of Mr. Yogesh Bajpai, Assistant Professor, Department of Computer Science and Engineering, SoEC, Dev Bhoomi Uttarakhand University, Dehradun.
CERTIFICATE
ABSTRACT
The "Face Recognition based Attendance System" is a Python-based project designed to streamline
the attendance tracking process in educational institutions. Leveraging computer vision and machine
learning techniques, the system employs libraries such as Cv2, Face Recognition, scikit, and NumPy
to enable efficient face recognition and attendance recording.
The project is structured around two main features. Firstly, an administrative interface allows the
addition of students' facial data along with their corresponding names to a centralized database. This
serves as the foundation for the second feature, where the system recognizes individual student faces
in real-time, marking their attendance and recording the time of entry.
The project is organized into distinct modules, including face recognition, attendance management,
and database operations. The codebase follows a modular structure to enhance maintainability and
ease of understanding. The system's dependencies are documented in the requirements.txt file,
ensuring easy reproduction of the development environment.
This report details the implementation, challenges faced, and the overall functionality of the "Face
Recognition based Attendance System." The potential applications, benefits, and future improvements
of the system are also discussed, highlighting its significance in automating attendance tracking
processes for educational institutions.
ACKNOWLEDGEMENT
At this joyous time of presenting this report, I first bow to almighty God for blessing me with enough patience and strength to go through this challenging phase of life. I would like to express a deep sense of gratitude and thanks to the people who have helped me in the accomplishment of this B.Tech. project.
First and foremost, I would like to thank my supervisor, Mr. Dhajvir Singh Rai, for his expertise, guidance, enthusiasm, and patience. His insightful guidance contributed greatly to the successful completion of this work, and he spent many hours patiently answering questions and troubleshooting problems.
Beyond all this, I would like to give special thanks to my parents, husband, and daughter for their unbounded affection, sweet love, constant inspiration, and encouragement. Without their support this work would not have been possible.
Finally, I would like to thank all faculty, college management, and administrative and technical staff of the School of Engineering & Computing, Uttarakhand Technical University, Dehradun for their encouragement, assistance, and friendship throughout my candidature.
TABLE OF CONTENTS
Candidate's Declaration
Certificate
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
CHAPTER 6: SNAPSHOTS
LIST OF FIGURES
Figure 1.1 Architecture of highlight generation system
Figure 1.2 Screenshot from a recording with argument
Figure 1.3 Image from a meeting that was conducted virtually
Figure 1.4 Meeting where the sentiment was of delight and celebration
Figure 1.5 Overall architecture of proposed system
Figure 3.1 Extraction Component
Figure 3.2 Naïve Bayes Algorithm Steps
Figure 3.3 Analysis Component
Figure 3.4 Highlighter Engine
Figure 4.1 Screenshot from a recording on which the experiment was conducted
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Overview
The Face Recognition Attendance System represents a cutting-edge solution to
revolutionize the conventional methods of attendance tracking. This innovative system
leverages advanced facial recognition technology to provide an accurate, efficient, and
secure means of recording attendance in various educational and organizational settings.
This section will provide a concise overview of the key components and functionalities of
the Face Recognition Attendance System, setting the stage for an in-depth exploration of its
features, objectives, and underlying technologies.
Key Features:
Facial Enrolment: The system facilitates the secure capture and storage of facial data during
the enrolment process.
Automated Attendance Marking: Utilizing sophisticated facial recognition algorithms to
automatically mark attendance during scheduled sessions.
User-Friendly Interface: Designed with an intuitive and user-friendly interface to ensure
accessibility for both administrators and end-users.
Comprehensive Reporting: The View Attendance Module allows administrators to view,
analyse, and export attendance reports for informed decision-making.
Scalability: The system is adaptable and scalable, catering to the unique requirements of
various educational institutions and organizations.
By embracing modern technology and prioritizing accuracy and efficiency, the Face
Recognition Attendance System aims to redefine the way attendance is managed, paving the
way for a more streamlined and intelligent approach.
1.2 Background
1.3 Objectives
The Face Recognition Attendance System is designed with the following key objectives in
mind:
- Automate the attendance marking process: Replace traditional manual methods with a
sophisticated facial recognition system to eliminate the need for manual attendance taking.
- Improve accuracy and efficiency: Reduce the likelihood of errors associated with manual
data entry and enhance the overall efficiency of attendance tracking.
- Provide a user-friendly interface: Create an intuitive and user-friendly interface for both
students and administrators to interact with the system seamlessly.
- Reduce administrative burden: Alleviate the administrative workload by automating
attendance processes, allowing educators and administrators to focus on more strategic tasks.
1.4 Scope
The scope of the Face Recognition Attendance System encompasses a comprehensive set of
functionalities to cater to the diverse needs of educational institutions and organizations. The
system includes:
- Registration Module: Enabling the enrollment of students by capturing and storing their
facial data securely.
- Attendance Module: Automatically marking attendance based on facial recognition during
scheduled class sessions.
- View Attendance Module: Allowing administrators to view, analyze, and export attendance
reports for further analysis.
- Scalability: The system is designed to be scalable, adaptable, and customizable to meet the
specific requirements of different institutions.
The Face Recognition Attendance System is structured into distinct modules, each serving a
specific purpose:
Registration Module: This module facilitates the enrollment of students into the system by
capturing and storing their facial data securely.
Attendance Module: The core functionality that employs facial recognition algorithms to
mark attendance during class sessions automatically.
View Attendance Module: An interface for administrators to view, analyze, and export
attendance reports, providing valuable insights into attendance patterns.
Technologies Used: An overview of the technologies, frameworks, and tools employed in
the implementation of the system.
This structure ensures a systematic and organized development and implementation process,
resulting in a robust and effective Face Recognition Attendance System.
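To make the Registration Module concrete, below is a minimal sketch of how facial data could be captured and stored, assuming the OpenCV (cv2) and face_recognition libraries named in the abstract; the file name encodings.pkl and the pickle-based storage are illustrative assumptions rather than the project's actual implementation.

```python
import pickle

import cv2
import face_recognition

def register_student(name: str, database_path: str = "encodings.pkl") -> bool:
    """Capture one frame from the webcam, encode the face, and store it by name."""
    camera = cv2.VideoCapture(0)                     # default webcam
    grabbed, frame = camera.read()
    camera.release()
    if not grabbed:
        return False                                 # camera not available

    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)     # face_recognition expects RGB
    encodings = face_recognition.face_encodings(rgb)
    if not encodings:
        return False                                 # no face found in the frame

    try:
        with open(database_path, "rb") as f:
            database = pickle.load(f)
    except FileNotFoundError:
        database = {}

    database[name] = encodings[0]                    # 128-dimensional face embedding
    with open(database_path, "wb") as f:
        pickle.dump(database, f)
    return True
```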
CHAPTER 2 LITERATURE REVIEW
This section focuses on studies that have integrated facial recognition technology into
attendance management systems. The review includes an examination of various
implementations, considering factors such as system architecture, user experience, and
accuracy. Notable case studies and their outcomes are discussed to glean insights into
successful integrations.
The challenges associated with facial recognition in attendance systems are also critically examined; privacy concerns, accuracy issues, and ethical considerations are explored.
Facial recognition technology has undergone significant advancements over the past decades, driven by
developments in:
1. Machine Learning (ML): Early algorithms like Eigenfaces and Fisherfaces laid the groundwork for
facial recognition. With the advent of ML, models such as Support Vector Machines (SVMs) and
Principal Component Analysis (PCA) improved system accuracy.
2. Deep Learning: The introduction of Convolutional Neural Networks (CNNs) revolutionized feature
extraction and recognition, with models like VGGFace, FaceNet, and DeepFace achieving exceptional
accuracy.
3. Real-Time Applications: Recent strides in computer vision and GPU acceleration have made real-
time facial recognition feasible, allowing for integration into daily applications like attendance
systems, surveillance, and user authentication.
Notable breakthroughs, such as the use of neural networks for feature extraction and pre-trained models
for improved recognition accuracy, are discussed. These advancements have significantly increased
system efficiency and robustness, even in challenging conditions such as low light or occluded faces.
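As an illustration of the classical pipeline named above (PCA-based Eigenfaces followed by an SVM classifier), the following sketch uses scikit-learn on placeholder data; the array shapes and labels are assumptions for illustration, not results from this project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder data: 40 flattened 64x64 grayscale face images for 10 students.
X = np.random.rand(40, 64 * 64)
y = np.repeat(np.arange(10), 4)

# Eigenfaces-style pipeline: project onto principal components, then classify.
eigenface_model = make_pipeline(PCA(n_components=20), SVC(kernel="linear"))
eigenface_model.fit(X, y)
predicted_student = eigenface_model.predict(X[:1])
```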
Facial recognition technology has seen remarkable progress over the years, thanks to breakthroughs in
machine learning (ML), deep learning (DL), and computer vision. Key developments include:
• Face Detection Models: Algorithms such as Haar cascades, Multi-task Cascaded
Convolutional Networks (MTCNN), and YOLO (You Only Look Once) have significantly improved the
ability to detect faces in various lighting and environmental conditions.
• Feature Extraction Techniques: Early approaches like Eigenfaces and Fisherfaces have
been replaced with more robust deep learning-based techniques, such as those using
Convolutional Neural Networks (CNNs), which analyze facial features at a granular level.
• Recognition Frameworks: Models like FaceNet, VGGFace, and DeepFace leverage
embeddings to represent faces numerically, ensuring high recognition accuracy even in complex
scenarios like occlusions, varying facial expressions, and extreme angles.
These advancements have enabled facial recognition systems to achieve high precision, real-time
performance, and adaptability, making them suitable for diverse applications like attendance
management, security, and identity verification.
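The embedding idea can be illustrated with a short sketch using the face_recognition library, whose encoder returns 128-dimensional face embeddings; the image file names and the 0.6 distance threshold (the library's customary default) are assumptions for illustration.

```python
import numpy as np
import face_recognition

# Placeholder image paths: one enrolled photo and one camera capture.
known_image = face_recognition.load_image_file("enrolled_student.jpg")
probe_image = face_recognition.load_image_file("camera_capture.jpg")

known_encoding = face_recognition.face_encodings(known_image)[0]   # 128-d embedding
probe_encoding = face_recognition.face_encodings(probe_image)[0]

# Faces match when their embeddings are close in Euclidean distance.
distance = np.linalg.norm(known_encoding - probe_encoding)
print("match" if distance < 0.6 else "no match")
```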
Studies have shown the successful application of facial recognition in attendance systems across various
domains, such as education, corporate offices, and industrial sectors. Key aspects of these integrations
include:
• System Architecture: Centralized vs. decentralized systems, with some leveraging
cloud-based platforms for storage and computation, while others operate locally for
enhanced privacy and security.
• User Experience: User-friendly interfaces, real-time updates, and seamless interactions
are critical for widespread adoption.
• Accuracy Metrics: Facial recognition systems have achieved up to 99% accuracy with
modern algorithms, although conditions like poor lighting or crowded spaces can still
pose challenges.
• Case Studies: Successful implementations include:
• An educational institution reducing proxy attendance by 95% through automated facial
recognition.
• Corporate offices increasing time management efficiency by integrating facial recognition
with employee databases.
These studies provide insights into the practical benefits, challenges, and considerations for deploying
such systems
CHAPTER 3 PROPOSED METHODOLOGY
As concluded in the previous chapter and learnt from the literature review of the work done so far in this space, limited work exists on meeting recordings; most of the focus has been on generating highlights for sports recordings, and mostly based on the expressions detected in the audio of the recording. With that in mind, the proposed methodology is multi-fold and is detailed in the following sections of this chapter. Here we try to create a complete view of what we are trying to do.
We use video recordings of meetings, starting with refinement and pre-processing of the subject video to remove noise from the recording, including the sections of video where no conversation was happening. We also extract the audio from the video recording to be used as a specific attribute for our algorithm later in the process. With the pre-processing out of the way, the next goal of the proposed method is to obtain the transcript of the audio file and run speaker diarization over it. Speaker diarization is explained in detail in the following sections of this chapter; in summary, it is the process of determining the number of speakers in a conversation, and we use this information to divide the audio/video file into segments based on which speaker spoke in each section. This gives us a collection of smaller audio/video sections of the original recording, grouped by when each identified speaker spoke during the meeting.
Our next step is then to use input from the user to understand what highlight context they are looking for in the video. This makes the system more configurable and personalized: the user can define the context in which they intend to generate the highlights from the recordings. Once the context is provided, we use the Naïve Bayes algorithm to compute, for each section of the video recording, the probability that it depicts that sentiment, and we rank the sections by the probability with which they demonstrate the sentiment we are trying to narrow down to. In this step we use this information, together with the output of the previous component, to narrow down even smaller sections within the audio/video segments created earlier and find the required sentiment in those smaller groups.
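A hedged sketch of this Naïve Bayes step follows, assuming each segment has a text transcript and that a few labelled example sentences are available for the sentiment of interest; scikit-learn's MultinomialNB supplies the per-class probabilities used to rank segments, and the training sentences shown are purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Illustrative labelled sentences for the sentiment the user asked for.
train_sentences = ["I strongly disagree with this", "that is unacceptable",
                   "great job everyone", "let us move to the next item"]
train_labels = ["conflict", "conflict", "praise", "neutral"]

vectorizer = CountVectorizer()
classifier = MultinomialNB()
classifier.fit(vectorizer.fit_transform(train_sentences), train_labels)

# Transcripts of the audio/video segments produced by the previous component.
segments = ["this decision is unacceptable", "agenda for the next meeting"]
probabilities = classifier.predict_proba(vectorizer.transform(segments))
conflict_column = list(classifier.classes_).index("conflict")

# Rank segments by the probability that they depict the requested sentiment.
ranked = sorted(zip(segments, probabilities[:, conflict_column]),
                key=lambda pair: pair[1], reverse=True)
```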
This is followed by using our highlighter engine to assign a highlight score to all the audio/video segments generated so far by the previous two components. This is done using two different criteria. First, we use the probability assigned to each video segment based on the sentiment it depicts; this is the first parameter of the highlight formula presented in this proposed methodology. Second, in our experiments we observed that in such recorded meetings the speakers who have spoken more are given lower weightage, as they do not generate as much value from a highlight perspective. As a use case, consider an executive-level meeting where there is a disagreement between parties over a topic: the most important section may be when one of the executives uses strong words to express that emotion and then chooses to stay silent, and in this case that executive is given a higher weightage in our score. We use this to generate a score for all the video segments created so far, and the details of generating this highlight score are given in the following sections of this chapter. We then use this highlight score to order the video segments in decreasing order of their score, to be treated as highlights and returned as results to our users.
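Since the exact formula appears later in the chapter, the sketch below shows only one plausible way of combining the two criteria described above (the segment's sentiment probability and a lower weight for speakers with a larger share of talk time); the specific weighting is an assumption made for illustration.

```python
def highlight_score(sentiment_probability: float,
                    speaker_talk_seconds: float,
                    meeting_seconds: float) -> float:
    """Assumed combination: sentiment probability damped by the speaker's talk share."""
    talk_share = speaker_talk_seconds / meeting_seconds   # between 0.0 and 1.0
    speaker_weight = 1.0 - talk_share                      # quieter speakers weigh more
    return sentiment_probability * speaker_weight

# Placeholder segments from a 30-minute (1800-second) meeting.
segments = [
    {"id": 1, "probability": 0.92, "talk_seconds": 120.0},
    {"id": 2, "probability": 0.75, "talk_seconds": 900.0},
]
ranked = sorted(segments,
                key=lambda s: highlight_score(s["probability"], s["talk_seconds"], 1800.0),
                reverse=True)
```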
The invention can be broken down into the following broader areas of work/implementation. We
shall discuss the role of each individual component and finally talk about how the system comes
together to deliver the objective.
This component deals with taking the video file in a supported format and subjecting it to the extraction algorithm to obtain the audio in a known format that we can use for further processing. Once the audio file is received, it is pre-processed to remove noise, which can be in the form of silent sections or background disturbances. After the pre-processing step we run feature extraction, which helps us train the model and obtain the number of speakers and the sections of audio where each of those speakers was talking. We call these sections of audio speaker conversations; they are mapped to each speaker in the form of start and end times within the whole audio track where that speaker was talking in the meeting.
Background Noise Reduction:
Background noise reduction is the process of improving a noisy audio segment by removing everything other than the main audio content. Background noise elimination is used in almost all areas, including video conferencing systems, software used for editing video and audio files, and headphones with noise-cancellation features. Reducing background noise is still a fast-growing and evolving area of technology, and artificial intelligence has opened up a whole new range of methods for doing it better.
Recurrent neural networks (RNNs) are models capable of recognising and comprehending sequential data. Examples of sequential data include the location of an object over time, music, and text.
RNNs are especially good at eliminating background noise because they can recognise patterns over long periods of time, which is necessary for interpreting audio.
A feed-forward neural network has three primary layers: an input layer, a hidden layer, and an output layer. A recurrent neural network adds a feedback loop: as the model goes through each item in a sequence, a hidden state derived from the hidden layer keeps updating itself.
An audio sample may be divided into a series of equally spaced time segments. As each individual segment of the sequence is fed into the recurrent neural network, the hidden state is updated at every iteration, keeping track of the prior steps. After each cycle, the output is routed through a feed-forward layer to create a new audio stream that is free of background noise.
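A minimal sketch of this kind of recurrent denoiser is shown below, assuming PyTorch; the network is untrained and serves only to illustrate how fixed-length audio frames flow through a GRU and a feed-forward output layer, not to reproduce the actual model used.

```python
import torch
import torch.nn as nn

class RNNDenoiser(nn.Module):
    """Toy recurrent denoiser: GRU over audio frames, feed-forward output layer."""

    def __init__(self, frame_size: int = 256, hidden_size: int = 128):
        super().__init__()
        self.gru = nn.GRU(input_size=frame_size, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, frame_size)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        hidden_states, _ = self.gru(frames)    # hidden state is updated at every frame
        return self.out(hidden_states)         # reconstructed (denoised) frames

frame_size = 256
noisy_waveform = torch.randn(1, frame_size * 40)    # placeholder noisy audio
frames = noisy_waveform.view(1, 40, frame_size)      # (batch, time steps, samples per frame)
denoised_frames = RNNDenoiser(frame_size)(frames)
```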
Speaker Diarization:
When an audio conversation involves multiple human speakers, the task gets complex, and the most relevant solution available is speaker diarization.
When an audio file or recording of a conversation can be broken into segments in which we can uniquely identify who is speaking, it becomes much simpler to follow, both for human understanding and in the field of artificial intelligence. This enables both to comprehend the context and flow of the dialogue in question.
Speaker diarization can be achieved through the following two-step process:
Finding Speakers:
This step is also known as speaker segmentation. It mostly analyses the characteristics and zero-crossing rates of each voice to determine who is speaking and when. The gender of each speaker can be identified from features such as pitch.
Clustering Speakers:
Once the speakers are recognized, the recording is divided into separate segments so that the whole conversation can be correctly tagged and easily understood, and all non-speech sections are skipped. To do this, probabilistic analysis is used to identify the number of people contributing to the dialogue at a particular point in time.
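A hedged sketch of these two steps follows, using MFCC features from librosa and agglomerative clustering from scikit-learn; real diarization additionally performs voice activity detection and estimates the number of speakers automatically, both of which are simplified away here, and the file name and speaker count are assumptions.

```python
import librosa
from sklearn.cluster import AgglomerativeClustering

# Placeholder file; 16 kHz mono is a common rate for speech processing.
audio, sample_rate = librosa.load("meeting.wav", sr=16000)

# Per-frame voice features: 13 MFCC coefficients, one column per time frame.
mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13)
frames = mfcc.T

# Cluster frames into an assumed number of speakers (here, 2).
speaker_labels = AgglomerativeClustering(n_clusters=2).fit_predict(frames)
# Consecutive frames with the same label form one speaker segment,
# giving start and end times for each speaker's turns.
```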
Algorithm:
Step 1: Take the video file as input and extract the audio using the MoviePy library.
Step 2: Take the audio file and subject it to pre-processing to remove noise.
Step 3: The file is then subjected to a voice activity detector that separates the speech sections from the non-speech sections, thus trimming silences from the audio recording.
Step 4: We then break the audio into segments of varied length, created based on the statements made in the conversation. Let's call them audio segments.
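A sketch of Steps 1-3 is shown below, using the MoviePy library named in Step 1 for audio extraction and a simple energy threshold as a stand-in for the voice activity detector; the file names and threshold value are illustrative assumptions.

```python
import numpy as np
from moviepy.editor import VideoFileClip

clip = VideoFileClip("meeting_recording.mp4")         # Step 1: input video file
clip.audio.write_audiofile("meeting_audio.wav")        # extracted audio track

samples = clip.audio.to_soundarray(fps=16000)           # audio as a NumPy array
if samples.ndim > 1:
    samples = samples.mean(axis=1)                      # mix down to mono

# Steps 2-3: crude stand-in for noise removal / voice activity detection,
# dropping half-second frames whose energy falls below a threshold.
frame_len = 16000 // 2
frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
voiced = [f for f in frames if np.sqrt(np.mean(f ** 2)) > 0.01]
trimmed = np.concatenate(voiced) if voiced else np.array([])
```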
CHAPTER 4
o Optional: Send email or SMS notifications for specific scenarios, such as a student’s
absence.
4.2 Non-Functional Requirements
Performance:
o Ensure recognition occurs within 1-2 seconds per face.
o Maintain smooth operation for up to 100 concurrent recognitions.
Scalability:
o Handle increasing numbers of students and attendance sessions without degradation
in performance.
Usability:
o Provide an intuitive graphical user interface (GUI) for administrators.
o Simplify workflows for report generation and student management.
Security:
o Use encryption for storing biometric data and sensitive user details.
o Implement secure login mechanisms for administrators.
Reliability:
o Achieve a minimum recognition accuracy of 95%.
o Ensure consistent performance under diverse environmental conditions.
Compliance:
o Adhere to data privacy laws such as GDPR for biometric data handling.
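To illustrate the encryption requirement listed under Security above, the following sketch uses the cryptography package's Fernet symmetric encryption to protect a stored face encoding; key management is outside the scope of this sketch and the encoding shown is a placeholder.

```python
import pickle

import numpy as np
from cryptography.fernet import Fernet

key = Fernet.generate_key()             # in practice, kept in a secure key store
cipher = Fernet(key)

face_encoding = np.random.rand(128)     # placeholder 128-dimensional encoding
token = cipher.encrypt(pickle.dumps(face_encoding))      # encrypted bytes, safe to persist

restored_encoding = pickle.loads(cipher.decrypt(token))  # decrypted only when needed
```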
CHAPTER 5: SYSTEM DESIGN
5.1 Overview
The system design for the "Face Recognition Based Attendance System" outlines the architectural
layout and interaction among the key components to achieve a seamless and efficient attendance
management solution. The system integrates facial recognition technology with a robust database
to automate attendance processes in educational institutions.
5.4 Data Flow Diagram
Level 0 DFD:
1. Student data and facial images are captured via the camera module.
2. The Face Recognition Module processes the image to identify the student.
3. Attendance is marked in the system database.
4. Reports are generated and made available through the Administrator Interface.
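The following is a minimal sketch of this Level 0 flow, assuming encodings saved during registration (encodings.pkl) and a CSV file as the attendance store; all file names and the CSV layout are illustrative assumptions.

```python
import csv
import pickle
from datetime import datetime

import cv2
import face_recognition

with open("encodings.pkl", "rb") as f:
    known = pickle.load(f)                            # {name: 128-d encoding}

camera = cv2.VideoCapture(0)
grabbed, frame = camera.read()                        # 1. capture a frame
camera.release()

if grabbed:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    for encoding in face_recognition.face_encodings(rgb):            # 2. identify the student
        matches = face_recognition.compare_faces(list(known.values()), encoding)
        if True in matches:
            name = list(known.keys())[matches.index(True)]
            with open("attendance.csv", "a", newline="") as sheet:    # 3. mark attendance
                csv.writer(sheet).writerow([name, datetime.now().isoformat()])
            # 4. the CSV can later be loaded to generate reports for administrators
```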
5.5 Class Diagram
Student:
o Attributes: studentID, name, photo, class.
o Methods: registerStudent(), updateProfile().
Attendance:
o Attributes: attendanceID, date, status.
o Methods: markAttendance(), viewAttendance().
Database:
o Attributes: connectionDetails.
o Methods: executeQuery(), connect().
FaceRecognitionModule:
o Attributes: modelPath.
o Methods: trainModel(), recognizeFace().
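A minimal Python rendering of these classes is sketched below; the method bodies are placeholders that mirror the responsibilities named in the class diagram, and renaming the class attribute (a reserved word in Python) is an implementation detail assumed here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Student:
    studentID: int
    name: str
    photo: str                 # path to the stored face image
    studentClass: str          # "class" is a Python keyword, so renamed here

    def registerStudent(self) -> None: ...
    def updateProfile(self) -> None: ...

@dataclass
class Attendance:
    attendanceID: int
    date: str
    status: str = "absent"

    def markAttendance(self) -> None:
        self.status = "present"

    def viewAttendance(self) -> str:
        return self.status

@dataclass
class Database:
    connectionDetails: str

    def connect(self) -> None: ...
    def executeQuery(self, query: str) -> None: ...

@dataclass
class FaceRecognitionModule:
    modelPath: str

    def trainModel(self) -> None: ...
    def recognizeFace(self, encoding) -> Optional[str]:
        return None            # placeholder: look up the closest enrolled encoding
```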
5.6 Sequence Diagram
3. Facial recognition is performed, and the result is matched with stored data.
Students ↔ AttendanceLogs (1:N relationship).
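This one-to-many relationship can be illustrated as a relational schema; the sketch below uses Python's standard sqlite3 module, with table and column names assumed from the class diagram.

```python
import sqlite3

connection = sqlite3.connect("attendance.db")
connection.executescript("""
CREATE TABLE IF NOT EXISTS Students (
    studentID   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    photo       TEXT,
    class       TEXT
);
CREATE TABLE IF NOT EXISTS AttendanceLogs (
    attendanceID INTEGER PRIMARY KEY AUTOINCREMENT,
    studentID    INTEGER NOT NULL REFERENCES Students(studentID),
    date         TEXT NOT NULL,
    status       TEXT NOT NULL
);
""")
connection.commit()
```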
CHAPTER 6: SNAPSHOTS
USER INTERFACE
NEW REGISTRATION PAGE
DETAILS FILLING
ATTENDANCE CHECK
ATTENDANCE SHEET
CHAPTER 7 EXPERIMENTAL RESULTS
An alternative, sensible way of handling this could be to align the interesting video fragments with those in the original livestream recording, e.g. by recognizing frames in the livestream content that also appear within the highlight video in question. However, while less expensive than human annotation, this again carries a significant data-collection penalty, as the conventions for associating highlights with their livestream recordings are inconsistent and require human pre-processing. Moreover, the frame-matching process is costly from a computation standpoint. This kind of approach needs a corresponding highlight video for every livestream recording and a livestream recording for every highlight video within the dataset. Such constraints and difficulties make automated matching of recorded content to highlights impractical for large-scale datasets.
Instead, the positive-unlabelled approach suggested by (Xiong, Kalantidis, Ghadiyaram, & Grauman, 2019) prescribes collecting two datasets: one containing mixed labels, from livestream recordings, and the other containing positive labels, from suggested highlight recordings, even though there is no relation between the various datasets used. Following the same approach, we collected many meeting video recordings, both self-curated and gathered from various meeting recording sources. The dataset was then divided into training and test sets.
Figure 4.1 Screenshot from a recording on which the experiment was conducted
7.2 Result Analysis
The results above demonstrate the F1 scores for our highlight-generation algorithm. The score reflects how well the generated highlight denoted the required sentiment in the video and how well the extracted segment demonstrated the required expression.
We ran the algorithm on various sets of video recordings, ranging from recordings of meetings that contained many conflicts and arguments to recordings that did not have any specific sentiment highlighted. From this we found that our algorithm does not perform well for videos that do not have any specific highlighted sentiment.
Also, when the algorithm was run on technical training recordings, it was observed that it generated too many video segments that were considered highlights, and it was therefore not very effective at creating a summary. We choose to take this up as a future improvement to our work.
We also ran the experiments with different sets of training/validation and test data and observed our scores with each of these combinations, arriving at the following scores. The figures below show the results observed for the algorithm with various combinations of the dataset subsets.
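For reference, the F1 scores reported here can be computed with scikit-learn as sketched below, assuming binary per-segment labels (1 for a reference highlight, 0 otherwise); the label values shown are placeholders, not experimental data.

```python
from sklearn.metrics import f1_score

# Placeholder labels: 1 = segment belongs to the reference highlights, 0 = it does not.
reference_labels = [1, 0, 1, 1, 0, 0, 1, 0]     # human-marked highlight segments
predicted_labels = [1, 0, 0, 1, 0, 1, 1, 0]     # segments selected by the algorithm

print(f1_score(reference_labels, predicted_labels))
```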
Figure 4.3 Graph demonstrating results with accuracy of 0.85
Figure 4.5 Results executed with the 40/20/40 dataset combination
CHAPTER 8 CONCLUSION & FUTURE SCOPE
8.1 Conclusion
The Face Recognition Attendance System has successfully demonstrated the potential of
integrating advanced facial recognition technology into attendance management processes. By
addressing inefficiencies in traditional methods and leveraging cutting-edge innovations, the
project establishes a framework for automating attendance in educational institutions and
organizational settings. Key accomplishments of the system include:
The system eliminates the need for manual attendance tracking, a process that is often tedious,
error-prone, and time-consuming. By automating attendance through facial recognition, the
system provides an efficient alternative, significantly reducing the workload on administrators
while ensuring timely and accurate record-keeping.
The use of advanced machine learning algorithms ensures that the system achieves high levels
of accuracy, even in challenging conditions such as low lighting, varying facial expressions,
and slight occlusions. This reliability enhances user trust and eliminates common issues like
attendance fraud or duplicate entries.
User-Centric Design
The system’s interface is designed with simplicity and ease of use in mind, making it
accessible to a wide range of users, including non-technical staff. The modular approach
ensures that administrators can easily manage the system, while students and employees find it
intuitive to use.
Scalability and Adaptability
Built with scalability in mind, the system can handle large databases of facial data, making it
suitable for institutions and organizations of varying sizes. Moreover, the system’s modular
and customizable design allows it to adapt to different environments and specific user
requirements.
This project highlights the growing importance of biometric systems in modern applications,
showcasing how facial recognition can redefine traditional processes. By ensuring accuracy,
security, and efficiency, the system positions itself as a vital tool in educational and corporate
ecosystems.
Impact on Stakeholders
For educational institutions, the system ensures fairness and transparency in attendance
tracking, reducing potential conflicts or errors. For organizations, it optimizes resource
allocation, improves time management, and enhances employee monitoring. These benefits
lead to greater operational efficiency and satisfaction among stakeholders.
In summary, the Face Recognition Attendance System exemplifies the transformative power
of technology when applied to practical challenges. While the project has achieved its primary
goals, it also opens the door for future advancements that can make attendance management
even more efficient, secure, and widely applicable across industries. The system is a step
forward in the adoption of biometric technology in daily life, setting a benchmark for future
projects in this domain.
Final Thoughts
The Face Recognition Attendance System is more than a tool for attendance—it is a step
toward modernizing institutional operations, fostering innovation, and embracing the potential
of biometric technology. It reflects the project team’s commitment to addressing critical
challenges and contributing meaningfully to the adoption of advanced technologies in
everyday life. As the system evolves with future enhancements, it is poised to become a
cornerstone for efficient and secure attendance management worldwide.
8.2 Future Scope
While the project achieves its primary goals, there are several avenues for improvement and
expansion:
1. Integration with Other Biometric Systems:
• Combining facial recognition with fingerprint or voice recognition can enhance system
security and reliability.
• Multimodal biometric systems can address edge cases where facial recognition alone might
fail.
2. Advanced AI Features:
• Predictive Analysis: Implementing AI to predict attendance trends and generate insights for
better resource planning.
• Emotion Detection: Adding emotion recognition to analyze engagement levels during
sessions.
3. Privacy and Security Enhancements:
• Data Encryption: Using advanced encryption methods to secure biometric data.
• Compliance with Privacy Laws: Ensuring adherence to global regulations like GDPR or
local data protection acts.
4. Cloud and Edge Integration:
• Cloud-based storage and processing for scalable and centralized solutions.
• Edge computing to enable faster, real-time processing for on-premise deployments.
5. Mobile and IoT Integration:
• Developing mobile apps for administrators and users to view attendance data, make updates,
or take attendance remotely.
• Integration with IoT devices, such as smart cameras or attendance kiosks, for real-time
monitoring.
6. Wider Applications:
• Expanding beyond educational institutions to corporate offices, healthcare facilities, and
government sectors.
• Adapting the system for public use cases like visitor management in public events or
restricted areas.
7. Support for Diverse Environments:
• Enhancing the system’s ability to operate in low-light or high-traffic scenarios.
• Adapting the model to recognize faces with diverse appearances and occlusions, such as
masks or accessories.
By incorporating these enhancements, the system can evolve into a comprehensive and
versatile solution for attendance and beyond, catering to a broader range of industries and use
cases.
REFERENCES
1. Agyeman, R., Muhammad, R., & Choi, G. S. (2019). Soccer Video Summarization using
Deep Learning. IEEE, 270-273.
2. Bertini, M., Bimbo, A. D., & Nunriati, W. (2004). Common Visual Cues for Sports
Highlights Detection. IEEE, 1399-1402.
3. Chakraborty, P. R., Tjondronegoro, D., Zhang, L., & Chandran, V. (n.d.). Using Viewer's Facial Expression and Heart Rate for Sports Video Highlights Detection. 371-378.
4. Chakraborty, R. P., Tjondronegoro, D., Zhang, L., & Chandran, V. (2016). Automatic
Identification of Sports Video Highlights using Viewer Interest Features. 55-62.
5. Ching, W.-S., Toh, P.-S., & Er, M.-H. (n.d.). A New Specular Highlights Detection Algorithm Using Multiple Views. 474-478.
7. Gao, X., Liu, X., Yang, T., Deng, G., Peng, H., Zhang, Q., . . . Liu, J. (2020). Automatic
Key Moment Extraction And Highlights Generation Based On Comprehensive Soccer
Video Understanding. IEEE, 1-6.
8. Gygli, M., Grabner, H., & Gool, L. V. (2015). Video Summarization by Learning
Submodular Mixtures of Objectives. IEEE, 3090-3098.
11. Hanjalic, A. (2005). Adaptive Extraction of Highlights From a Sport Video Based on
Excitement Modeling. IEEE, 1114-1122.
12. Hsieh, J.-T. T., Li, C. E., Liu, W., & Zeng, K.-H. (n.d.). Spotlight: A Smart Video
Highlight Generator. stanford.edu, 1-7.
13. Hu, L., He, W., Zhang, L., Xiong, H., & Chen, E. (2021). Detecting Highlighted Video
Clips Through Emotion-Enhanced Audio-Visual Cues. IEEE.
14. Jiang, K., Chen, X., & Zhao, Q. (2011). Automatic composing soccer video highlights
with core-around event model. IEEE, 183-190.
15. Jiang, R., Qu, C., Wang, J., Wang, C., & Zheng, Y. (2020). Towards Extracting Highlights
From Recorded Live Videos: An Implicit Crowdsourcing Approach. IEEE, 1810-1813.
16. Kostoulas, T., Chanel, G., Muszynski, M., Lombardo, P., & Pun, T. (2015). Identifying
aesthetic highlights in movies from clustering of physiological and behavioral signals.
IEEE.
17. Kudi, S., & Namboodiri, A. M. (2017). Words speak for Actions: Using Text to find Video
Highlights. Asian Conference on Pattern Recognition.
18. Li, Q., Chen, J., Xie, Q., & Han, X. (2020). Detecting boundaries of absolute highlights
for sports videos.
19. Liu, C., Huang, Q., Jiang, S., & Zhang, W. (2006). Extracting Story Units In Sports Video
Based On Unsupervised Video Scene Clustering. IEEE, 1605-1608.
20. Longfei, Z., Yuanda, C., Gangyi, D., & Yong, W. (2008). A Computable Visual Attention
Model for Video Skimming. IEEE, 667-672.
21. Ma, Y.-F., & Zhang, H. J. (2005). Video Snapshot: A Bird View of Video Sequence.
IEEE.
22. Marlow, S., Sadlier, D. A., O’Connor, N., & Murphy, N. (2002). Audio Processing for
Automatic TV Sports Program Highlights Detection. ISSC.
23. Merler, M., Joshi, D., Nguyen, Q.-B., Hammer, S., Kent, J., Smith, J. R., & Feris, R. S.
(2017). Automatic Curation of Golf Highlights using Multimodal Excitement Features.
IEEE, 57-65.
24. Merler, M., Mac, K.-H. C., Joshi, D., Nguyen, Q.-B., Hammer, S., Kent, J., . . . Feris, R. S. (2018). Automatic Curation of Sports Highlights using Multimodal Excitement Features. IEEE Transactions on Multimedia, 1-16.
25. Ngo, C.-W., Ma, Y.-F., & Zhang, H.-J. (2005). Video Summarization and Scene Detection by Graph Modeling. IEEE, 296-305.
26. Pun, H., Beek, P. v., & Sezan, M. I. (2001). Detection of Slow-Motion Replay Segments in Sports Video for Highlights Generation. IEEE, 1649-1652.
27. Ringer, C., Nicolaou, M. A., & Walker, J. A. (2022). Autohighlight: Highlight detection
in League of Legends esports broadcasts via crowd-sourced data. Machine Learning
with Applications, 1-15.
28. Shih, H.-C., & Huang, C.-L. (2004). Detection Of The Highlights In Baseball Video
Program. IEEE, 595-598.
29. Tang, H., Kwatra, V., Sargin, M. E., & Gargi, U. (n.d.). Detecting Highlights in Sports Videos: Cricket as a Test Case.
30. Tang, H., Kwatra, V., Sargin, M., & Gargi, U. (2011). Detecting Highlights In Sports
Videos: Cricket As A Test Case. IEEE.
31. Tang, K., Bao, Y., Zhao, Z., Zhu, L., Lin, Y., & Peng, Y. (2018). AutoHighlight :
Automatic Highlights Detection and Segmentation in Soccer Matches. IEEE, 4619-
4624.
32. Tao, S., Luo, J., Shang, J., & Wang, M. (2020). Extracting Highlights from a
Badminton Video Combine Transfer Learning with Players’ Velocity. International
Conference on Computer Animation and Social Agents, 82-91.
33. Tjondronegoro, D. W., Chen, Y.-P. P., & Pham, B. (2004). Classification of Self-
Consumable Highlights for Soccer Video Summaries. IEEE, 579-582.
34. Wan, K., Yan, X., & Xu, C. (2005). Automatic Mobile Sports Highlights. IEEE.
35. Wang, H., Yu, H., Chen, P., Hua, R., Yan, C., & Zuo, L. (2018). Unsupervised Video
Highlight Extraction via Query-related Deep Transfer. 24th International Conference
on Pattern Recognition, 2971-2976.
36. Wu, P. (2004). A Semi-automatic Approach to Detect Highlights for Home Video
Annotation. IEEE, 957-960.
37. Wung, P., Cui, R., & Yang, S.-Q. (2004). Contextual Browsing For Highlights In Sports
Video. IEEE, 1951-1954.
38. Xiao, B., Yin, X., & Kang, S.-C. (2021). Vision-based method of automatically
detecting construction video highlights by integrating machine tracking and CNN
feature extraction. Automation in Construction, 1-13.
39. Xiong, B., Kalantidis, Y., Ghadiyaram , D., & Grauman, K. (2019). Less Is More:
Learning Highlight Detection From Video Duration. Less Is More: Learning Highlight
Detection From Video Duration, 1258-1267.
40. Xiong, Z., Radhakrishnan, R., Divakaran, A., & Huang, T. S. (2005). Highlights
Extraction From Sports Video Based On An Audio-Visual Marker Detection
Framework. IEEE.
41. Xiong, Z., Radhakrishnan, R., Divakaran, A., & Huang, T. S. (2004). Effective and Efficient Sports Highlights Extraction Using the Minimum Description Length Criterion in Selecting GMM Structures. IEEE, 1947-1950.
42. Yang, H., Wang, B., Lin, S., Wipf, D., Gio, M., & Guo, B. (2015). Unsupervised
Extraction of Video Highlights Via Robust Recurrent Auto-encoders. IEEE
International Conference on Computer Vision, 4633-4641.
43. Yao, T., Mei, T., & Rui, Y. (n.d.). Highlight Detection with Pairwise Deep Ranking for
First-Person Video Summarization. IEEE, 982-990.