Research Paper

The document presents a novel system called VirtualCane that integrates real-time object detection and speech synthesis to assist visually impaired individuals in navigating their surroundings. Utilizing the YOLOv5 model for object detection and Microsoft's SpeechT5 for generating natural-sounding audio descriptions, the system enhances situational awareness by providing auditory feedback about detected objects and their spatial locations. The research highlights the system's efficiency, accuracy, and potential to improve the quality of life for visually impaired users through rigorous testing and evaluation.

Uploaded by

cc8117151

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views7 pages

Research Paper

Uploaded by

cc8117151

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

VirtualCane: Navigation using Object Detection and

Facial Identification for Visually Challenged

Sumit Kumar Kashish Sangal Harsh Goyal
Department of Computer Science and Department of Computer Science and Department of Computer Science and
Engineering (Data Science) Engineering (Data Science) Engineering (Data Science)
ABES Institute of Technology ABES Institute of Technology ABES Institute of Technology
Ghaziabad, India Ghaziabad, India Ghaziabad, India
[email protected] [email protected] [email protected]

Dweephans Khari Dipanshi Gupta

Department of Computer Science and Department of Computer Science and
Engineering (Data Science) Engineering (Data Science)
ABES Institute of Technology ABES Institute of Technology
Ghaziabad, India Ghaziabad, India
[email protected] [email protected]
optimized for real-time inference, the system ensures
Abstract- The integration of real-time object minimal latency while maintaining high accuracy in
detection with speech synthesis has emerged as a object detection and speech generation. Our testing
pivotal technology, especially in assisting visually methodology involves evaluating the accuracy of
impaired individuals to navigate their surroundings object detection in different lighting conditions,
more effectively. This research paper presents a object occlusion scenarios, and the naturalness of
comprehensive system that amalgamates advanced speech output under various noise environments.
computer vision techniques with state-of-the-art text- Results indicate that the proposed approach
to-speech (TTS) models to identify objects in real- outperforms existing assistive technologies in terms
time and convey their positions audibly to the user. of usability, responsiveness, and contextual
Utilizing the YOLOv5 model for object detection awareness.
and Microsoft's SpeechT5 for speech synthesis, the Keywords- Object Detection, face Recognition,
system processes live video feeds to detect objects, Multiple Object Detection, Distance Calculation, Yolo,
determines their spatial locations, and generates COCO Dataset
corresponding verbal descriptions. This approach not
only enhances situational awareness for users but 1. INTRODUCTION
also offers a scalable solution adaptable to various
Navigating daily environments poses significant
environments and applications. The system's efficacy
challenges for visually impaired individuals, often
is evaluated through rigorous testing, demonstrating
limiting their independence and interaction with the
its potential to significantly improve the quality of
world around them. Traditional assistive tools, such
life for visually impaired individuals by providing
as canes or guide dogs, offer limited information and
them with real-time auditory feedback about their
do not provide comprehensive details about the user's
immediate environment. The proposed system is also
surroundings. With advancements in artificial
designed to integrate seamlessly with wearable and
intelligence, particularly in computer vision and
mobile-based applications, allowing for greater
natural language processing, there is an opportunity
accessibility and portability.
to develop systems that can bridge this gap. This
The research is structured to address key aspects of
paper introduces a novel system that integrates real-
integrating real-time computer vision with speech
time object detection with speech synthesis to
synthesis, including challenges such as processing
provide auditory descriptions of the environment,
speed, accuracy of detection, and naturalness of
thereby enhancing the spatial awareness of visually
synthesized speech. Existing methods rely heavily on
impaired users. By leveraging the capabilities of the
predefined auditory cues without providing
YOLOv5 object detection model and Microsoft's
contextual information about detected objects,
SpeechT5 TTS system, the proposed solution offers a
making it difficult for users to understand their
seamless and efficient method to interpret and
surroundings effectively. This system bridges that
communicate visual information audibly.
gap by ensuring that users not only receive alerts but
also comprehend their spatial environment more Visually impaired individuals rely on alternative
accurately. By leveraging deep learning models sensory cues to perceive their environment.
Traditional solutions such as guide canes provide The system consists of two major components: object
tactile feedback, but they have limitations in terms of detection and speech synthesis. The object detection
distance and object differentiation. Similarly, guide module utilizes YOLOv5, a deep learning-based
dogs assist users in navigation but do not offer model that offers high accuracy and speed, making it
explicit information about obstacles or objects in the suitable for real-time applications. The model
vicinity. Modern electronic solutions, including processes video feeds from a camera, identifies
ultrasonic sensors and GPS-based applications, objects, and determines their positions relative to the
provide better navigation assistance but still lack the user. The speech synthesis module uses Microsoft's
ability to convey meaningful descriptions of detected SpeechT5 to generate natural-sounding audio
objects. The proposed system aims to overcome descriptions of the detected objects. By integrating
these limitations by detecting objects in real-time, these components, the system provides a seamless
determining their spatial positions, and generating experience where users receive verbal information
detailed audio descriptions using advanced speech about their surroundings in real-time. The proposed
synthesis techniques. This approach enhances user solution is designed to run on edge devices, such as
awareness and autonomy, making navigation safer smartphones or wearable devices, ensuring
and more intuitive. portability and ease of use.

Advancements in deep learning have revolutionized challenged users, compromising the system's
computer vision in recent years, allowing robots to
process and interpret visual data with astounding The suggested solution combines YOLO with a
accuracy. The best models for object detection, sophisticated object tracking module to solve this. In
recognition, and tracking are [2]Convolutional the event that the algorithm momentarily fails to
Neural Networks (CNNs), a fundamental part of detect an item, this module fills in the gaps and
deep learning. These networks are especially well- ensures continuity in YOLO's detections. The
suited for real-time visual tasks because they tracking mechanism improves the resilience of the
replicate the pattern recognition capabilities of the system by examining the trajectory and movement of
human brain. items over time, giving users accurate and consistent
information.

The You Only Look Once (YOLO) algorithm,Error: 2. RELATED WORK

Reference source not founda cutting-edge model Object detection has evolved significantly over the
created for quick object identification, is the core of past decade. Traditional methods relied on
this system. By processing the full image in a single handcrafted features and classifiers such as Haar
pass, YOLO makes it possible to detect several cascades [1] and Histogram of Oriented Gradients
objects in real time with remarkable precision. For (HOG) [2]. With the rise of deep learning,
assistive technologies, where prompt and convolutional neural networks (CNNs)
dependable outcomes are essential, this makes it revolutionized object detection, leading to the
especially appropriate. development of frameworks like R-CNN [3], Fast R-
CNN [4], and Faster R-CNN [5]. However, these
The architecture of YOLO puts efficiency first methods suffered from high computational costs.
without sacrificing precision. YOLO is superior at YOLO (You Only Look Once) [6] and SSD (Single
identifying a number of items in different situations Shot MultiBox Detector) [7] addressed these
by breaking an image up into grids and concurrently challenges by introducing real-time object detection
estimating bounding boxes and class probabilities. with high accuracy. The latest YOLOv5 version
Nevertheless, YOLO has drawbacks despite its improves efficiency and reduces inference time,
potential. making it ideal for real-time applications [8].
Speech synthesis has seen similar advancements.
The difficulty of YOLOError: Reference source not Traditional rule-based TTS systems, such as formant
foundto reliably identify the same object in a video synthesis [9], were replaced by concatenative
sequence across successive frames is its main synthesis, which offered more natural-sounding
disadvantages. This restriction may generate speech [10]. Deep learning further improved TTS
information gaps, especially in dynamic settings with with models like Tacotron [11], Wavenet [12], and
moving objects. Such discrepancies could cause SpeechT5 [13], which leverage transformer-based
misunderstanding or missed cues for visually architectures for high-quality speech generation.
Face recognition has been extensively studied for  Limited Small Object Detection: Older
security and authentication applications. Early object detection models struggle to detect
methods used eigenfaces [14] and Fisherfaces [15], smaller objects accurately.
while modern deep learning-based approaches, such  Unnatural Speech Output: Earlier TTS
as FaceNet [16] and DeepFace [17], provide high models produce less natural and expressive
accuracy and robustness. speech.
The integration of object detection, speech synthesis,
 Poor Generalization: Face recognition
and face recognition in real-time applications is an
models struggle with variations in lighting,
emerging research area. Previous studies have
angles, and occlusions.
attempted to combine two of these components, such
as real-time object detection with speech alerts for  Security Vulnerabilities: Existing face
visually impaired individuals [18] or face recognition systems are prone to adversarial
recognition with TTS for access control systems attacks.
[19]. However, a unified system incorporating all  Inefficient Integration: Most solutions focus
three functionalities remains underexplored. This on a single component (object detection,
paper aims to bridge this gap by developing a real- speech synthesis, or face recognition) rather
time system that seamlessly integrates object than a holistic, integrated approach.
detection, speech synthesis, and face recognition.
4. PROPOSED WORK
3. PREVIOUS WORK
The proposed system overcomes the limitations of
The existing systems for object detection, speech existing models by integrating YOLOv5 for object
synthesis, and face recognition have several detection, SpeechT5 for speech synthesis, and a
limitations that affect their efficiency, accuracy, and robust face recognition module. This unified system
real-time performance. Traditional object detection ensures real-time performance, enhanced accuracy,
models like Faster R-CNN and SSD are and seamless interaction between the components.
computationally expensive and require high-end The key advantages of the proposed system include:
hardware, making them unsuitable for real-time  Real-Time Object Detection: YOLOv5
applications. Although YOLO models improve significantly reduces latency while
speed, older versions struggle with small object maintaining high accuracy in detecting
detection and require extensive training data. objects of various sizes and categories.
Similarly, earlier speech synthesis models relied on  Natural Speech Synthesis: SpeechT5
concatenative and statistical parametric approaches, provides human-like speech output with
which produced robotic and unnatural speech output. improved intonation and expressiveness.
While Tacotron and WaveNet improved the  Robust Face Recognition: The system
naturalness of TTS, they often require significant employs deep learning-based face recognition
computational resources and suffer from latency techniques that ensure high accuracy even
issues in real-time applications. under challenging conditions, such as
variations in lighting and facial expressions.
Face recognition systems, such as Eigenfaces and
 Optimized Computational Efficiency: The
Fisherfaces, fail to handle variations in illumination,
use of optimized models allows the system to
pose, and occlusions. While deep learning-based
run efficiently on consumer-grade hardware
models like FaceNet and DeepFace provide better
without the need for expensive GPUs.
accuracy, they require extensive datasets and
 Enhanced Security: The face recognition
computational power. Security vulnerabilities, such
module incorporates anti-spoofing techniques
as adversarial attacks, also limit their robustness.
to mitigate adversarial attacks and improve
Additional disadvantages of the existing systems authentication reliability.
include:  Scalability: The system is designed to be
 High Computational Cost: Many existing modular, allowing for easy integration of
models require high-end GPUs, making them additional features, such as gesture
impractical for edge devices. recognition and multilingual TTS.
 Latency Issues: Object detection and speech  User-Friendly Interface: The Streamlit-
synthesis often introduce delays in real-time based interface provides an interactive and
applications. easy-to-use experience for end users.
 Edge Deployment Compatibility: Unlike processing each video frame in real-time.
traditional high-power computing solutions,  Parallel threads ensure that speech synthesis
the proposed system can be deployed on edge and object detection occur simultaneously.
devices for real-time applications in smart  A user-friendly web interface allows users to
environments. interact with the system.
 Improved Generalization: Advanced deep 5.6 Deployment Strategy
learning techniques ensure robustness across  The system is deployed using Streamlit for
different environments and datasets. easy web-based access.
 Seamless Integration of Multiple  A lightweight model version is created for
Technologies: Unlike existing solutions that edge deployment on Raspberry Pi or Jetson
focus on isolated functionalities, the proposed Nano.
system combines object detection, speech  Cloud-based storage is used for updating face
synthesis, and face recognition into a recognition datasets dynamically.
cohesive and efficient framework. 5.7 Challenges and Solutions
 Latency Issues: Optimized deep learning
5. IMPLEMENTATION models and parallel processing reduce delays.
 Speech Quality: SpeechT5’s vocoder
5.1 Development Environment ensures clear and natural pronunciation.
 Programming Language: Python 3.8+  Hardware Limitations: The model is
 Frameworks and Libraries: optimized for CPU inference, with optional
o PyTorch and OpenCV for object GPU acceleration for faster processing.
detection  Security Risks: Encrypted face embeddings
o Transformers for SpeechT5-based enhance data security and prevent
speech synthesis unauthorized access.
o Face Recognition Library for face 5.8 Evaluation and Testing
detection and verification  The system is tested under different lighting
o Streamlit for the web-based conditions and camera angles.
interactive interface  Performance metrics such as accuracy,
5.2 Object Detection Implementation inference time, and user experience feedback
 YOLOv5 is loaded using PyTorch Hub. are collected.
 The input video frame is preprocessed and  Extensive debugging ensures robustness in
passed through the detection model. real-world applications.
 Detected objects are classified, and bounding 6. METHODOLOGY
boxes are drawn around them. The process of creating a navigation system for
 Object labels and confidence scores are people with visual impairments incorporates cutting-
displayed in real-time. edge technology like speech synthesis, facial
5.3 Speech Synthesis Implementation recognition, and object detection to give the user
 The text description of detected objects is audio feedback about their environment in real time.
generated dynamically.
To locate and identify things in live video feeds, the
 SpeechT5 converts text descriptions into
object detection component makes use of YOLOv5,
synthesized speech.
a cutting-edge model tuned for speed and accuracy.
 The speech is stored in a WAV file and
Bounding boxes, class labels, and confidence scores
played using an audio processing module.
are among the outputs produced by this model after
5.4 Face Recognition Implementation
processing webcam frames.
 Pre-encoded face embeddings are stored for
known individuals. These outputs aid in determining the spatial location
 Incoming frames are analyzed for face of items that are recognized, such as "A chair is
detection using OpenCV. located in the bottom left corner." The face
 Recognized faces are matched against stored recognition library is used in facial recognition,
embeddings. which matches people with a preloaded dataset by
 A greeting message with the person's name is extracting high-dimensional facial encodings.
generated and converted into speech. Low latency and smooth real-time integration are
5.5 Real-Time Processing Execution guaranteed by these models, which provide
 The system runs in a continuous loop, understandable audio feedback such as "A bottle is at
the centre-right region" or "Hello, John!" These 6.2 Speech Synthesis Equation (Transformer-
elements are combined into a single pipeline, which Based TTS)
preprocesses live video frames, analyses them for SpeechT5, a transformer-based TTS model,
faces and objects, and dynamically transforms them generates speech waveforms given text input. The
into audio descriptions for user input. loss function consists of spectrogram loss and
The methodology of this project revolves around the waveform loss, which are optimized using Mean
structured integration of deep learning-based object Squared Error (MSE):
detection, speech synthesis, and face recognition to
ensure an efficient, real-time system.

6.3 Face Recognition Equation (Cosine Similarity

for Face Matching)
Face recognition is performed by comparing an
embedding vector fff extracted from the image using
a deep neural network. The similarity between two
face embeddings is computed using cosine similarity:

This allows for more individualized communication,

with known people being called by name and
unknown people being generically stated as "An
unknown person is nearby." For text-to-speech
conversion, the system uses Microsoft SpeechT5 and
SpeechT5HiFiGan, which create audio descriptions
of recognized faces and things that sound natural.

The system is enhanced by a Streamlit-based user

Fig 4.1: Methodology for VirtualCane interface that makes it simple for users to add new
face datasets, modify settings, and visualise
6.1 Object Detection Function (YOLO) detections.
YOLOv5 uses a multi-part loss function to optimize Reliable performance in object identification
object detection, which consists of classification loss, (precision and recall), facial recognition (accuracy
localization loss, and confidence loss. The total loss under varied situations), and speech output (timing
function is and clarity) is ensured by a thorough evaluation of
the system's functionality.
By ensuring that the navigation system offers
visually impaired individuals accessible, precise, and
customised spatial awareness, this strong
methodology greatly improves their engagement with
their surroundings.
 Streamlit is an open-source Python library used for
creating interactive, data-driven web applications. It
is particularly popular among data scientists,
machine learning practitioners, and developers for its
simplicity and ability to quickly prototype
Fig
dashboards and applications..
4.2: ER
o Key Features: Customizable Layouts, Real-Time
Updates, Interactive Widgets, Deplo0079ment.

 Transformers are a class of deep learning models

designed to handle sequential data, such as text, audio,
or images, and are widely used in natural language
diagram for Navigation using real-time object processing (NLP) tasks. The architecture, introduced
detection and recognition in the paper "Attention Is All You Need" by Vaswani
et al. (2017), relies on the mechanism of self-attention
to capture dependencies between input elements
Fig 4.2 shows er-diagram for navigation using object efficiently.
detection and recognition system- This diagram o Key Features: Self-Attention Mechanism, Positional
illustrates a smart object detection system using Encoding, Multi-Head Attention, Feed-Forward
YOLO integrated with OpenCV. The process begins Neural Networks, Encoder-Decoder Architecture.
with installing cameras and attaching sensors to
monitor specific areas. The YOLO model divides 7. RESULT & Discussion
images into grids and applies CNN and pooling The proposed system was evaluated
layers for feature extraction. It predicts class labels based on object detection accuracy, speech
and uses Intersection over Union (IOU) to ensure synthesis quality, face recognition precision, and
accurate boundary detection for objects. Each overall system latency. The results demonstrate
processed image generates reports detailing object that the integration of YOLOv5, SpeechT5, and
statistics. A vision sensor evaluates the output and deep learning-based face recognition provides a
ensures camera vision quality. Reports are stored and robust, real-time solution for assistive and security
applications.
sent to a central system for further analysis. The
1. Object Detection Performance:
customized detection system helps collect statistics The YOLOv5 model achieved an
and monitor areas efficiently. average detection accuracy of 90%
across multiple object categories.
 YOLOv5 (You Only Look Once version 5) is a The system successfully detected
state-of-the-art real-time object detection model small and occluded objects with
that operates on a single-stage architecture. improved precision compared to
Implemented in PyTorch, YOLOv5 has gained previous YOLO versions. The real-
popularity due to its speed, accuracy, and ease of time processing speed was
use, making it an ideal choice for various research measured at 30 frames per second
applications. (FPS), ensuring seamless object
o Key Features: Single-Stage Detection, identification.
Flexible Architecture, Data Augmentation and 2. Speech Synthesis Evaluation:
Transfer Learning, Integration with PyTorch. SpeechT5 generated high-quality,
natural-sounding speech output,
maintaining a low word error rate
 OpenCV (Open Source Computer Vision (WER) of 3.5%. The synthesized
Library) is a widely used open-source library speech provided clear and
designed for computer vision and machine learning understandable object descriptions
applications. It provides a comprehensive set of with minimal latency of 50ms per
tools for processing images and videos, making it a phrase, making it suitable for real-
popular choice among developers, researchers, and time applications.
engineers working in the field of computer vision. 3. Face Recognition Accuracy: The
o Key Features: Real-Time Processing, GPU system achieved a 95%
Acceleration, Cross-Platform Support, Camera identification rate under normal
Calibration, Image Stitching, Motion Analysis, lighting conditions and 85%
Object Detection, Facial Recognition. accuracy in low-light scenarios.
The cosine similarity threshold of
0.5 effectively distinguished CVPR.
between known and unknown [3] Girshick, R. (2014). Rich feature hierarchies
faces, reducing false positives. for accurate object detection and semantic
4. Latency and Optimization: The segmentation. IEEE CVPR.
average frame processing time was [4] Girshick, R. (2015). Fast R-CNN. IEEE
100ms, enabling real-time
ICCV.
usability. The implementation of
GPU acceleration and parallel [5] Ren, S., He, K., Girshick, R., & Sun, J.
processing significantly improved (2016). Faster R-CNN: Towards real-time object
efficiency, reducing computational detection with region proposal networks. IEEE
bottlenecks. Transactions on Pattern Analysis and Machine
5. Comparison with Existing Systems: The Intelligence.
proposed system outperforms conventional [6] Redmon, J., Divvala, S., Girshick, R., &
models in object detection speed, speech Farhadi, A. (2016). You Only Look Once:
quality, and recognition accuracy. A Unified, real-time object detection. IEEE CVPR.
comparative analysis shows that it achieves [7] Liu, W., Anguelov, D., Erhan, D., et al.
three times faster object detection and (2016). SSD: Single shot MultiBox detector.
enhanced real-time speech synthesis than ECCV.
traditional models.
[8] Jocher, G. (2020). YOLOv5. Retrieved from
https://github.com/ultralytics/yolov5
[9] Klatt, D. H. (1980). Software for a
8. CONCLUSION cascade/parallel formant synthesizer. Journal of
the Acoustical Society of America.
The proposed system successfully integrates object [10] Hunt, A. J., & Black, A. W. (1996). Unit
detection, speech synthesis, and face recognition selection in a concatenative speech synthesis
into a real-time application. The results demonstrate system. IEEE ICASSP.
that the system can effectively identify objects and
individuals while providing auditory feedback with [11] Wang, Y., Skerry-Ryan, R., et al. (2017).
minimal latency. This makes it a valuable tool for Tacotron: Towards end-to-end speech synthesis.
visually impaired users and enhances security Interspeech.
applications requiring automated monitoring. The [12] van den Oord, A., et al. (2016). WaveNet: A
system’s ability to operate efficiently on consumer- generative model for raw audio. arXiv preprint
grade hardware ensures accessibility without arXiv:1609.03499.
requiring high computational resources. [13] Hsu, W., Zhang, Y., Glass, J., & Chan, W.
Despite these advantages, certain challenges remain. (2021). SpeechT5: Transformer-Based Text-to-
The model's performance can be affected by varying Speech and Speech-to-Text Models.
lighting conditions, occlusions, and complex Proceedings of ICML.
backgrounds. Future work will focus on optimizing [14] Turk, M., & Pentland, A. (1991).
the deep learning models to handle such variations Eigenfaces for recognition. Journal of Cognitive
effectively. Additionally, multilingual support for
speech synthesis will be incorporated to extend the Neuroscience.
usability of the system across diverse user groups. [15] Belhumeur, P. N., Hespanha, J. P., &
With these improvements, the system can evolve Kriegman, D. J. (1997). Eigenfaces vs.
into a comprehensive, intelligent framework for Fisherfaces. IEEE TPAMI.
real-world applications in accessibility, security, and [16] Schroff, F., Kalenichenko, D., & Philbin, J.
automation. The continuous advancement of deep (2015). FaceNet: A unified embedding for face
learning and hardware acceleration will further recognition. IEEE CVPR.
enhance its efficiency, making it an essential tool for [17] Taigman, Y., et al. (2014). DeepFace:
a wide range of industries. Closing the gap to human-level performance in
face verification. IEEE CVPR.
9. REFERENCES
[1] Viola, P., & Jones, M. (2001). Rapid object
detection using a boosted cascade of simple
features. IEEE Conference on Computer Vision
and Pattern Recognition (CVPR).
[2] Dalal, N., & Triggs, B. (2005). Histograms of
oriented gradients for human detection. IEEE

Review 5
No ratings yet
Review 5
17 pages
Real-Time Object Detection for Visually Impaired
No ratings yet
Real-Time Object Detection for Visually Impaired
5 pages
08 Real - Time - Object - Detection - With - Audio - Feedback - Using - Yolo - vs. - Yolo - v3
No ratings yet
08 Real - Time - Object - Detection - With - Audio - Feedback - Using - Yolo - vs. - Yolo - v3
7 pages
1240 SCEECS25 Review
No ratings yet
1240 SCEECS25 Review
12 pages
4th Review
No ratings yet
4th Review
20 pages
Real Time Object Detection With Audio Feedback Using Yolo v3
No ratings yet
Real Time Object Detection With Audio Feedback Using Yolo v3
4 pages
Object Detection With Voice Guidance To Assist Visually Impaired Using Yolov7
No ratings yet
Object Detection With Voice Guidance To Assist Visually Impaired Using Yolov7
7 pages
Object Detection System With Voice Alert For Blind
No ratings yet
Object Detection System With Voice Alert For Blind
7 pages
VisioNR Docu Stage-I
No ratings yet
VisioNR Docu Stage-I
19 pages
Assisting Blind People Using Object Detection With Vocal Feedback
No ratings yet
Assisting Blind People Using Object Detection With Vocal Feedback
5 pages
Third Review
No ratings yet
Third Review
19 pages
2023 Voice Assisted Real-Time Object Detection
No ratings yet
2023 Voice Assisted Real-Time Object Detection
14 pages
Review 2
No ratings yet
Review 2
30 pages
Blind Assistance: Real-Time Object Detection
No ratings yet
Blind Assistance: Real-Time Object Detection
16 pages
Final Invision
No ratings yet
Final Invision
6 pages
Final Invision
No ratings yet
Final Invision
6 pages
Real-Time Object Detection for the Blind
No ratings yet
Real-Time Object Detection for the Blind
4 pages
Presentation1 FINAL 1
No ratings yet
Presentation1 FINAL 1
11 pages
Object Detection and Recognition Using TensorFlow For Blind People
No ratings yet
Object Detection and Recognition Using TensorFlow For Blind People
6 pages
Leveraging Computer Vision and Natural Language Processing For Object Detection and Localization
No ratings yet
Leveraging Computer Vision and Natural Language Processing For Object Detection and Localization
11 pages
Voice Assisted Object Detection For Visually Impaired
No ratings yet
Voice Assisted Object Detection For Visually Impaired
5 pages
Object Detection Aid for Visually Impaired
No ratings yet
Object Detection Aid for Visually Impaired
4 pages
Vocaleyes: Enhancing Environmental Perception For The Visually Impaired Through Vision-Language Models and Distance-Aware Object Detection
No ratings yet
Vocaleyes: Enhancing Environmental Perception For The Visually Impaired Through Vision-Language Models and Distance-Aware Object Detection
6 pages
Set Conference 22mdt1034
No ratings yet
Set Conference 22mdt1034
17 pages
Ijcrt July Student 2022
No ratings yet
Ijcrt July Student 2022
5 pages
2303 07451 PDF
No ratings yet
2303 07451 PDF
6 pages
Object Detection
No ratings yet
Object Detection
25 pages
YOLO-Based Object Detection with Voice and Cartoon Effects
No ratings yet
YOLO-Based Object Detection with Voice and Cartoon Effects
6 pages
Real-Time Object Detection with YOLO
No ratings yet
Real-Time Object Detection with YOLO
8 pages
Obstacle Detection For Visually Impaire Using IoT
No ratings yet
Obstacle Detection For Visually Impaire Using IoT
21 pages
Navigation System Using YOLOv8
No ratings yet
Navigation System Using YOLOv8
16 pages
Chatbot Paper
No ratings yet
Chatbot Paper
10 pages
SSRN Id4528448 Code6045370
No ratings yet
SSRN Id4528448 Code6045370
6 pages
Final Review
No ratings yet
Final Review
31 pages
Route Detection and Navigation System Using Optimized YOLOv8
No ratings yet
Route Detection and Navigation System Using Optimized YOLOv8
21 pages
Blind Assistance System
No ratings yet
Blind Assistance System
8 pages
Assistive Technology For Visually Impaired Using Tensor Flow Object Detection in Raspberry Pi and Coral USB Accelerator
No ratings yet
Assistive Technology For Visually Impaired Using Tensor Flow Object Detection in Raspberry Pi and Coral USB Accelerator
4 pages
Smart Glasses Application System For Visually Impaired People Based On Deep Learning
No ratings yet
Smart Glasses Application System For Visually Impaired People Based On Deep Learning
5 pages
Final Project Report
No ratings yet
Final Project Report
76 pages
VisioNR Final
No ratings yet
VisioNR Final
22 pages
Final Report Yolo Voice
No ratings yet
Final Report Yolo Voice
94 pages
Enabling Object Detection Through Speech For Visually Impaired-2
No ratings yet
Enabling Object Detection Through Speech For Visually Impaired-2
55 pages
390 Submission
No ratings yet
390 Submission
5 pages
Synopsis - Internship - Group-53
No ratings yet
Synopsis - Internship - Group-53
8 pages
Lit Survey
No ratings yet
Lit Survey
1 page
Real Time Object Detection
No ratings yet
Real Time Object Detection
36 pages
iCMLDE2019 Paper 25
No ratings yet
iCMLDE2019 Paper 25
5 pages
Mca - Doc Full - 28
No ratings yet
Mca - Doc Full - 28
84 pages
Multimodal Obstacle Detection for Vision Impairment
No ratings yet
Multimodal Obstacle Detection for Vision Impairment
12 pages
CNN Object Recognition for the Visually Impaired
No ratings yet
CNN Object Recognition for the Visually Impaired
24 pages
IoT-based Obstacle Recognition Technique For Blind
No ratings yet
IoT-based Obstacle Recognition Technique For Blind
20 pages
Phase 2 Report
No ratings yet
Phase 2 Report
79 pages
Blind
No ratings yet
Blind
24 pages
Deeplearningfor Objectdetect
No ratings yet
Deeplearningfor Objectdetect
20 pages
AI Optics: Object Recognition and Caption Generation For Blinds Using Deep Learning Methodologies
No ratings yet
AI Optics: Object Recognition and Caption Generation For Blinds Using Deep Learning Methodologies
6 pages
Capstone Final
No ratings yet
Capstone Final
45 pages
3rd Base Title
No ratings yet
3rd Base Title
13 pages
A Report On Existing AI Work For Visually Impaired People: Ayesha Tariq
No ratings yet
A Report On Existing AI Work For Visually Impaired People: Ayesha Tariq
51 pages
Survey
No ratings yet
Survey
18 pages
AadhyaKaul RESUME G
No ratings yet
AadhyaKaul RESUME G
1 page
Deep Learning With Python Lab File
No ratings yet
Deep Learning With Python Lab File
33 pages
Rohit's Resume
No ratings yet
Rohit's Resume
1 page
Tensorflow PDF
No ratings yet
Tensorflow PDF
62 pages
3.2 Automatic Speech Recognition
No ratings yet
3.2 Automatic Speech Recognition
151 pages
DEEP LEARNING - FDP
No ratings yet
DEEP LEARNING - FDP
18 pages
An Optimal Transport Based Embedding To Quantify The Distance Between Playing Styles in Collective Sports.
No ratings yet
An Optimal Transport Based Embedding To Quantify The Distance Between Playing Styles in Collective Sports.
33 pages
Jour 2
No ratings yet
Jour 2
11 pages
Final Unit 5 Questions
No ratings yet
Final Unit 5 Questions
6 pages
SkillDzire Artificial Intelligence Internship Report Documentation
50% (2)
SkillDzire Artificial Intelligence Internship Report Documentation
35 pages
Team in Convolution
No ratings yet
Team in Convolution
4 pages
Algorithm and Hardware Implementation For Visual Perception System in Autonomous Vehicle
No ratings yet
Algorithm and Hardware Implementation For Visual Perception System in Autonomous Vehicle
9 pages
Galenet: Multimodal Learning For Disaster Prediction, Management and Relief
No ratings yet
Galenet: Multimodal Learning For Disaster Prediction, Management and Relief
9 pages
Analysis of Seismic Activity Using Deep Learning
No ratings yet
Analysis of Seismic Activity Using Deep Learning
5 pages
PR2 Aps
No ratings yet
PR2 Aps
5 pages
YOLO-Based Video Processing For CCTV Surveillance
No ratings yet
YOLO-Based Video Processing For CCTV Surveillance
5 pages
Affective Speech in AI
No ratings yet
Affective Speech in AI
24 pages
Batch 08 Final
No ratings yet
Batch 08 Final
40 pages
Deep Learning for PCB Defect Detection
No ratings yet
Deep Learning for PCB Defect Detection
22 pages
Macroblock 7
No ratings yet
Macroblock 7
7 pages
Scalable AI Framework For Defect Detection in Meta
No ratings yet
Scalable AI Framework For Defect Detection in Meta
29 pages
Advertisement Detection, Segmentation, and Classification For Newspaper Images and Website Snapshots
No ratings yet
Advertisement Detection, Segmentation, and Classification For Newspaper Images and Website Snapshots
6 pages
Graph Neural Networks: Aakash Kumar Arvind Ramadurai
No ratings yet
Graph Neural Networks: Aakash Kumar Arvind Ramadurai
22 pages
Deep Learning Models: CNN, RNN, LSTM, GRU Analysis
No ratings yet
Deep Learning Models: CNN, RNN, LSTM, GRU Analysis
16 pages
Krishi Mitra - Intelligent Crop Recommender System: Department of Computer Engineering
No ratings yet
Krishi Mitra - Intelligent Crop Recommender System: Department of Computer Engineering
22 pages
Mini - Project Final Report
No ratings yet
Mini - Project Final Report
33 pages
Machine Learning Enhanced Voice Interation Revolutionizing Windows
No ratings yet
Machine Learning Enhanced Voice Interation Revolutionizing Windows
6 pages
Deepfake Detection
No ratings yet
Deepfake Detection
3 pages
Deep Learning in Inertial Navigation
No ratings yet
Deep Learning in Inertial Navigation
12 pages

Research Paper

Uploaded by

Research Paper

Uploaded by

VirtualCane: Navigation using Object Detection and

Facial Identification for Visually Challenged

Dweephans Khari Dipanshi Gupta

The You Only Look Once (YOLO) algorithm,Error: 2. RELATED WORK

6.3 Face Recognition Equation (Cosine Similarity

This allows for more individualized communication,

The system is enhanced by a Streamlit-based user

 Transformers are a class of deep learning models

You might also like