
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“Jnana Sangama”, Machhe, Belagavi, Karnataka-590018

A Project Report
on

“BLIND ASSISTANCE SYSTEM: REAL TIME OBJECT DETECTION WITH DISTANCE AND VOICE ALERTS”
Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Engineering
in
Computer Science & Engineering
Submitted by

RAMITHA SHEKAR C 4GW21CS085


RUCHITHA M S 4GW21CS088
SMITHA K L 4GW21CS103
SRINIDHI V 4GW21CS109

Under the Guidance of


Mr. Rajath A N
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


(Accredited by NBA, New Delhi, Validity 01.07.2023 to 30.06.2026)

GSSS INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN


(Affiliated to VTU, Belagavi, Approved by AICTE, New Delhi & Govt. of Karnataka)
K.R.S ROAD, METAGALLI, MYSURU-570016, KARNATAKA
Accredited with Grade “A” by NAAC
2024-25
Geetha Shishu Shikshana Sangha (R)
GSSS INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN
K.R.S Road, Mysuru-570016, Karnataka
(Affiliated to VTU, Belagavi, Approved by AICTE -New Delhi & Govt. of Karnataka)
Accredited with Grade “A” by NAAC

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


(Accredited by NBA, New Delhi, Validity 01.07.2023 to 30.06.2026)

CERTIFICATE

Certified that the 7th Semester Project titled “BLIND ASSISTANCE SYSTEM: REAL
TIME OBJECT DETECTION WITH DISTANCE AND VOICE ALERTS” is a bonafide
work carried out by RAMITHA SHEKAR C (4GW21CS085), RUCHITHA M S
(4GW21CS088), SMITHA K L (4GW21CS103) and SRINIDHI V (4GW21CS109) in
partial fulfilment for the award of the degree of Bachelor of Engineering in Computer Science &
Engineering of the Visvesvaraya Technological University, Belagavi, during the year
2024-25. The Project report has been approved as it satisfies the academic requirements with
respect to the project work prescribed for Bachelor of Engineering Degree.

Signature of the Guide Signature of the HOD Signature of the Principal


(Mr. Rajath A N) (Dr. Raviraj P) (Dr. Shivakumar M)

External Viva

Name of the Examiners Signature with Date

1.

2.
ACKNOWLEDGEMENT
The joy and satisfaction that accompany the successful completion of any task
would be incomplete without mentioning the people who made it possible.

First and foremost, we offer our sincere thanks to Late Smt. Vanaja B Pandit,
Former Honorary Secretary, GSSS(R), Mysuru, for her blessings and support.

We offer our sincere thanks to Smt. Anupama B Pandit, Secretary, GSSS(R),
and the Management of GSSSIETW, Mysuru, for their invaluable support and
guidance in carrying out this project.

We would like to express our gratitude to our Principal, Dr. Shivakumar M,
for providing us a congenial environment for engineering studies and for
having shown us the way to carry out the project.

We consider it a privilege and an honour to express our sincere thanks to
Dr. Raviraj P, Professor and Head, Department of Computer Science & Engineering,
for his support and invaluable guidance throughout the tenure of this project.

We would like to sincerely thank our guide, Mr. Rajath A N, Assistant
Professor, Department of Computer Science and Engineering, for his support,
guidance, motivation, and encouragement towards the successful completion of
this project.

We would like to thank our Project Co-ordinators, Dr. Rajashekar M B,
Associate Professor, and Ms. Nischitha B S, Assistant Professor, Department of
Computer Science & Engineering, for their constant monitoring, guidance, and
motivation throughout the tenure of this project.

We intend to thank all the teaching and non-teaching staff of our Computer
Science & Engineering department for their immense help and co-operation.

Finally, we would like to express our gratitude to our parents and friends who
always stood with us to complete this work successfully.

Ramitha Shekar C (4GW21CS085)


Ruchitha M S (4GW21CS088)
Smitha K L (4GW21CS103)
Srinidhi V (4GW21CS109)
ABSTRACT

The Blind Assistance System: Real-Time Object Detection with Distance and Voice Alerts
is a transformative web application designed to enhance the mobility and independence of
visually impaired individuals. By employing advanced computer vision technologies and
machine learning, the system detects objects in real-time using a camera feed, estimates
their distances, and provides voice alerts to convey actionable information. This seamless
integration of AI-driven capabilities allows users to navigate their surroundings
confidently, reducing dependency on external assistance.

At the core of the system lies a combination of pre-trained deep learning models, such as
SSD, for object recognition, and depth estimation algorithms to measure distances
accurately. The system processes video input in real-time, ensuring minimal latency for
dynamic scenarios, and delivers precise auditory feedback about the objects’ location and
proximity. Its web-based design ensures accessibility across multiple devices, making it
portable, user-friendly, and cost-effective. This practical solution addresses critical
mobility and safety challenges faced by visually impaired individuals in various
environments.

Beyond its core functionalities, the system has been designed with adaptability in mind,
making it suitable for diverse real-world applications. These include personal navigation in
crowded spaces, assisting users in identifying obstacles in unfamiliar areas, and promoting
safety in public or work environments. The integration of voice alerts enables interaction
without the need for visual cues, ensuring it caters specifically to the needs of its target
audience. The application can also serve as a foundation for future smart living
technologies, with potential integration into smart homes or wearable devices.

TABLE OF CONTENTS

Acknowledgement i
Abstract ii
List of Figures v

1 INTRODUCTION 1
1.1 Overview 1
1.2 Existing System 2
1.3 Scope and Objectives 3
1.4 Limitations of Existing System 3
1.5 Problem Statement 4
1.6 Motivation 4
1.7 Proposed System 5
1.8 Organization of Report 6

2 LITERATURE SURVEY 7
2.1 Survey Findings 7

3 SOFTWARE AND HARDWARE REQUIREMENTS 15


3.1 Functional and Non – Functional Requirements 15
3.2 System Requirements 17
3.2.1 Hardware Requirements 17
3.2.2 Software Requirements 20
3.3 Requirement Analysis 21

4 DESIGN AND IMPLEMENTATION 25


4.1 Architectural Design 27
4.1.1 Introduction 27

4.2 Detailed Design 29
4.2.1 Use Case Diagram 29
4.2.2 Data Flow Diagram 30
4.2.3 Sequence Diagram 32
4.3 Implementation 33
4.3.1 Packages used 33
4.3.2 SSD Model Explanation 35
4.3.3 Methodology 36
4.3.4 Distance and Obstacle Estimation 37

5 TESTING 38
5.1 Purpose of Testing 38
5.2 Test Case 39
5.3 Different Types of Testing 40
5.3.1 Unit Testing 40
5.3.2 Integration Testing 40
5.3.3 System Testing 41
5.3.4 Acceptance Testing 41
RESULTS AND CONCLUSION 42
SNAPSHOTS 47
CONCLUSION 48
FUTURE SCOPE 49
REFERENCES 50

LIST OF FIGURES
FIGURE NUMBER   DESCRIPTION                                                PAGE NUMBER

Figure 4.1      Architecture Design of a System                            27
Figure 4.2      Use Case Diagram of Object Detection with Voice Alerts     29
Figure 4.3      Data Flow Diagram of Object Detection with Voice Alerts    31
Figure 4.4      Sequence Diagram of Object Detection with Voice Alerts     32
Figure 4.5      SSD Architecture of Object Detection with Voice Alerts     36
Figure 4.6      Methodology of Object Detection with Voice Alerts          37
Snapshot 1      Cell phone detected                                        42
Snapshot 2      Water bottle detected                                      42
Snapshot 3      Person detected                                            43
Snapshot 4      Laptop detected                                            43
Snapshot 5      Umbrella detected                                          44
Snapshot 6      TV detected                                                44
Snapshot 7      Book detected                                              45
Snapshot 8      Remote detected                                            45
Snapshot 9      Keyboard detected                                          46
Snapshot 10     Car detected                                               46
Snapshot 11     Chair detected                                             47

LIST OF TABLES
TABLE NUMBER    DESCRIPTION                 PAGE NUMBER

Table 5         Test Cases of each module   36


Chapter 1

INTRODUCTION

1.1 Overview

The Blind Assistance System is a web-based application designed to assist visually impaired individuals by providing real-time object detection and voice-based feedback. This
solution utilizes computer vision and artificial intelligence technologies to detect obstacles
in the user's surroundings, estimate their proximity, and generate corresponding voice alerts
for guidance.

The system is powered by the SSD model, which ensures fast and accurate object detection
while being lightweight enough for deployment on various platforms. The application
captures video feed, processes the frames to detect objects, and computes their distances.
The results are then converted into audible instructions using text-to-speech technology,
offering users actionable insights about nearby obstacles, such as “Obstacle ahead: Chair, 2
meters.”
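
This capture-detect-speak loop can be illustrated with a short Python sketch. It is a minimal outline only, assuming OpenCV's DNN module with a MobileNet-SSD Caffe model; the file names below are placeholders for whichever SSD weights are actually deployed, and distance estimation is omitted here (it is covered in Chapter 4).

import cv2
import pyttsx3

# Placeholder model files; substitute the SSD weights actually deployed.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
           "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]

engine = pyttsx3.init()      # text-to-speech engine for voice alerts
cap = cv2.VideoCapture(0)    # default webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # SSD expects a fixed-size, mean-normalized 300x300 input blob.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()            # shape: (1, 1, N, 7)
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:              # ignore weak detections
            label = CLASSES[int(detections[0, 0, i, 1])]
            engine.say("Obstacle ahead: " + label)
            engine.runAndWait()           # blocks until the alert is spoken

cap.release()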

As a web-based application, the system can be accessed from devices with an internet
connection and a camera, such as laptops or smartphones, eliminating the need for
specialized hardware. This makes the solution highly accessible and scalable for users across
different regions and socio-economic backgrounds. The platform is designed with simplicity
and usability in mind, ensuring that even non-technical users can operate it effectively.

This project focuses on providing a cost-effective, user-friendly alternative to traditional assistive tools, enabling visually impaired individuals to navigate their environments with confidence and independence.

In addition to its core functionality, the Blind Assistance System offers the flexibility to be
integrated into various assistive environments. The web-based nature of the application
ensures compatibility with different devices and allows for easy updates and scalability. This
modular design also opens avenues for future enhancements, such as integrating advanced
features like multi-language support, personalized obstacle detection based on user
preferences, or compatibility with wearable devices like AR glasses. By leveraging modern web technologies, this project creates a platform that can evolve alongside advancements in
computer vision and artificial intelligence, ensuring its long-term relevance and impact in
improving the lives of visually impaired individuals.

1.2 Existing System


Currently, visually impaired individuals rely on a variety of tools and technologies to
navigate their environments. These solutions, while helpful, often come with significant
limitations that hinder their effectiveness in dynamic or complex settings. Below are some
commonly used systems and their associated challenges:

1. Walking Canes

Walking canes are one of the most widely used tools for navigation by visually impaired
individuals. They are lightweight, portable, and easy to use, allowing users to detect objects
or obstacles at ground level. However, their scope is limited to tactile feedback and cannot
identify obstacles at a distance or above the waist level. Additionally, they provide no
information about the type or size of the obstacle, leaving the user to interpret environmental
cues manually.

2. Guide Dogs

Guide dogs are another popular aid, offering companionship and assistance in navigation.
These trained animals can help users avoid obstacles and navigate crowded areas. However,
they come with several limitations, including high training and maintenance costs, limited
availability, and the inability to provide detailed feedback about the environment.
Furthermore, guide dogs cannot adapt to rapidly changing environments or provide
information about distant objects.

3. Ultrasonic and Infrared Devices

Technological advancements have introduced ultrasonic and infrared devices that detect
obstacles using sound or light waves. These tools can identify objects at varying distances
and provide feedback through vibrations or sound signals. While they address some
limitations of canes and guide dogs, these devices often lack precision in identifying the type of obstacle and may produce false positives in noisy or cluttered environments. Moreover,
their high cost makes them inaccessible to a large segment of users.

4. Camera-Based Systems

Some advanced systems use cameras and computer vision algorithms to detect and
recognize obstacles. These systems offer better accuracy and can identify specific objects,
such as furniture or vehicles. However, most camera-based systems are either standalone
devices requiring dedicated hardware or part of expensive IoT-based solutions. They often
suffer from scalability issues and require substantial technical expertise for setup and use.

1.3 Scope and Objectives


Scope:
 To create a lightweight, scalable system deployable on portable devices.
 Integration into smart glasses or mobile apps to provide ubiquitous accessibility.
 Extending the system for multi-object tracking and classification.

Objectives:

 Develop a real-time object detection system capable of identifying obstacles in the user's surroundings.
 Integrate distance measurement functionality to estimate the proximity of detected objects.
 Implement a voice alert system to provide auditory feedback for obstacle identification and distance.
 Ensure the system is cost-effective, portable, and user-friendly for visually impaired individuals.

1.4 Limitations of the Existing System

Despite the availability of these tools, significant gaps remain in providing an affordable,
accessible, and user-friendly solution. Key challenges include:

 Cost: Many advanced systems, such as LiDAR-based tools or high-end camera systems, are prohibitively expensive.
 Real-Time Detection: Few systems provide real-time feedback, which is crucial in dynamic environments.
 Complexity: Some solutions require extensive training or specialized equipment, making them impractical for widespread adoption.
 Limited Feedback: Current tools often fail to provide comprehensive information, such as the type and distance of obstacles, in an intuitive manner.

The lack of an integrated, real-time, and cost-effective solution highlights the need for a
system like the Blind Assistance System, which combines object detection, distance
estimation, and voice feedback into a scalable and accessible web application.

1.5 Problem Statement

 Visually impaired individuals face difficulties in detecting and identifying obstacles in their surroundings, leading to limited mobility and independence.
 Lack of real-time feedback on the proximity of objects increases the risk of accidents or injuries.
 The absence of user-friendly, portable systems with voice-based alerts limits the effectiveness of current assistive technologies.

1.6 Motivation

The motivation behind developing the Blind Assistance System lies in the desire to
improve the quality of life for visually impaired individuals by providing a practical,
accessible, and real-time solution to aid in navigation. Traditional tools like walking canes
and guide dogs, while beneficial, have limitations in detecting dynamic obstacles or offering
detailed feedback, leaving users reliant on others in unfamiliar environments. Advanced
assistive technologies, though more effective, are often expensive and require specialized
hardware, making them inaccessible to a significant portion of the visually impaired
population. This project leverages advancements in computer vision, particularly lightweight
object detection models like SSD, to deliver a fast and accurate system that provides real-
time feedback through voice alerts. By designing a web-based platform, the system ensures
affordability and compatibility with everyday devices such as smartphones and laptops,
making it accessible to a wider audience. Furthermore, this project aims to bridge the gaps in
existing solutions by integrating object detection, distance estimation, and audio feedback into a unified platform, enhancing safety and independence for visually impaired users. The
initiative is also driven by the broader goal of fostering inclusivity and creating technologies
that cater to the diverse needs of society, contributing to the empowerment of individuals
with disabilities.

1.7 Proposed System

Key Features:

1. Real-Time Object Detection: SSD is used due to its balance of speed and accuracy,
allowing detection on low-powered devices.
2. Distance Estimation: Calculating object distance from the camera focal length and the object's apparent pixel size ensures accurate proximity alerts (a worked sketch follows this list).
3. Voice Feedback: Text-to-speech (TTS) modules convert detection results into audio
instructions.
4. User-Friendly Interface: Simple controls and configurations ensure ease of use
without requiring technical expertise.
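
Feature 2 rests on the pinhole-camera similar-triangles relation: an object of known real width W that appears w pixels wide, seen through a lens of focal length F (expressed in pixels), lies at distance D = (W × F) / w. A minimal sketch, assuming the focal length is calibrated once from a reference object photographed at a known distance:

# One-time calibration from a reference object at a known distance.
def focal_length(known_distance, known_width, pixel_width):
    return (pixel_width * known_distance) / known_width

# Distance of a detected object from its apparent width in pixels.
def estimate_distance(focal, known_width, pixel_width):
    return (known_width * focal) / pixel_width

# Example: a 17-inch-wide chair seen 150 px wide by a camera with a
# 600 px focal length is about (17 * 600) / 150 = 68 inches away.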

System Flow:

1. Capture real-time video feed.


2. Apply SSD to identify objects in the frame.
3. Calculate distances for detected objects.
4. Generate and deliver relevant voice alerts.

Advantages:

 Low latency, making it ideal for real-time applications.


 Scalable for multiple environments (indoors, outdoors, crowded areas).
 Compatible with low-power hardware like Raspberry Pi or smartphones.

1.8 Organization of the Report


The report is organized in the following manner:

 Chapter 1 gives the introduction of this project, that is, what the project does and how it is helpful.


 Chapter 2 gives the literature survey of the project. In order to understand the project in a clearer manner, a survey was done to know about existing systems.
 Chapter 3 gives the system requirements of the project and briefs out the requirements needed to fulfill it.
 Chapter 4 gives the implementation of the project, with the help of the flow chart and
algorithms, project flow can be understood clearly and this chapter also includes the
test case description.
 Chapter 5 gives the testing of the project; it measures the quality of the software we are developing.

Chapter 2

LITERATURE SURVEY

A literature survey is an important step in the software development process. Before developing the tool, it is necessary to determine the time factor, the economy, and the organization's strength. Once these things are satisfied, the next step is to determine which operating system and language can be used for developing the tool. Once the programmers start building the tool, they need a lot of external support, which can be obtained from senior programmers, from books, or from websites. These considerations were taken into account before building the proposed system.

2.1 Survey Findings


1. Blind Assistance in Object Detection and Generating Voice Alerts
“Blind Assistance in Object Detection and Generating Voice Alerts”, UGC Care Group I Journal, Vol-08, 2023.

The document details a blind assistance system designed to enhance the mobility and
independence of visually impaired individuals by addressing challenges in object
detection and navigation. The system employs cameras embedded in devices like
walking sticks, sunglasses, or caps to capture visual data from the user’s surroundings.
Using advanced machine learning algorithms, the system detects objects in real time,
estimates their distance, and generates voice alerts to inform the user. Optical Character
Recognition (OCR) is integrated to identify and interpret text content from images,
further extending the utility of the device.

A notable advantage of this system is its ability to function independently of human assistance. Unlike traditional methods, such as walking sticks, which require manual
effort and external aid, this proposed solution minimizes risks, reduces the time needed
for tasks, and enables users to navigate their environment more effectively. The system
relies on Python-based technologies, including TensorFlow and PyTorch, along with
pre-trained datasets, to achieve accurate and efficient object detection. Stereo headphones connected to the device deliver audio feedback, acting as a virtual guide for the user [1].

While the system marks a significant step forward in assistive technology, it does face
certain limitations. The recognition of objects is constrained by the breadth of the
dataset, and performance is hindered in low-light conditions where the camera may fail
to capture clear visuals. Despite these challenges, the project sets a foundation for further
enhancements, such as expanding object databases and optimizing the system for diverse
lighting scenarios. By bridging the gap between dependence and independence, this
innovative approach empowers visually impaired individuals to navigate their
environments confidently and safely.

2. Blind Assistance System using Digital Image Processing

“Blind Assistance System using Digital Image Processing”, 2023 International Conference on Network, Multimedia and Information Technology (NMITCON).

The document discusses a Blind Assistance System that combines YOLOv3 (You Only
Look Once), a state-of-the-art real-time object detection algorithm, with OpenCV's DNN
(Deep Neural Network) module and Google Text-to-Speech (GTTS) technology. The
goal of the system is to enhance the mobility and independence of visually impaired
individuals by providing accurate and real-time object detection with audio feedback in
the user's preferred language. The integration of YOLOv3 ensures high-speed and
precise detection of objects, while GTTS translates the detected objects into auditory
outputs, empowering users to navigate their surroundings confidently.

Key features of the system include the ability to process live video feeds from a
webcam, classify diverse object categories using a pre-trained COCO dataset, and
deliver language-customized voice alerts. By utilizing bounding boxes for object
localization, the system enables users to understand the spatial positions of objects in
real time. The adaptability of the system is further enhanced through the use of transfer
learning and the integration of language translation APIs, ensuring user-friendliness
across different linguistic preferences. Testing and optimization focus on improving
accuracy, speed, and usability to meet the dynamic needs of real-world applications.
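
The language-customized alerts described above can be sketched with the gTTS package; the playback helper (playsound) is an illustrative choice, not one named in the paper:

from gtts import gTTS
from playsound import playsound

def speak(text, lang="en"):
    # Google Text-to-Speech renders the alert in the requested language.
    gTTS(text=text, lang=lang).save("alert.mp3")
    playsound("alert.mp3")

speak("Person ahead")                   # default English alert
# e.g. speak(translated_text, lang="hi") for a Hindi voice alert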


The system represents a significant advancement in assistive technologies, fostering autonomy for visually impaired individuals. It facilitates seamless navigation, precise object recognition, and safe mobility in complex environments. Future enhancements aim to refine model accuracy, expand dataset diversity, and develop interfaces compatible with various devices [2]. By leveraging cutting-edge AI and deep learning
technologies, this research lays the foundation for more inclusive solutions, bridging
gaps in accessibility and enabling greater independence for the visually impaired
community.

3. Blind Assistance System using Image Processing

“Blind Assistance System using Image Processing”, 2023 International Conference on Network, Multimedia and Information Technology (NMITCON).

This paper presents a Blind Assistance System designed to improve the mobility and independence of visually impaired individuals. Leveraging advanced image processing and machine
learning techniques, the system uses TensorFlow's Object Detection API alongside
models like YOLO (You Only Look Once) and MobileNet for real-time object detection
and depth estimation. Through the integration of a Raspberry Pi camera, the system
captures live video, processes it to identify objects and their spatial positions using
bounding boxes, and provides auditory feedback via pyttsx3 text-to-speech libraries. The
use of Optical Character Recognition (OCR) further enables the system to read and
convert textual content from images into speech, enhancing its utility for navigation and
understanding surroundings.

The system’s core functionalities include object recognition, distance calculation, and
voice alerts, which help users navigate complex environments safely and independently.
It supports real-time interaction by employing pre-trained datasets such as COCO, KITTI, and Open Images for training detection models. The use of lightweight models
like MobileNet, with depth-wise separable convolutions, ensures both accuracy and
efficiency in detecting objects and calculating distances. Additionally, the system is
designed to be portable and compatible with Android devices, allowing users to access
these features seamlessly through a user-friendly interface.


This Blind Assistance System represents a significant advancement in assistive technologies, addressing the challenges faced by visually impaired individuals in daily
navigation and object recognition. It combines cutting-edge AI tools with real-world
applicability, providing accurate, real-time, and customizable auditory guidance. Future
improvements aim to expand object datasets, improve low-light performance, and extend
device compatibility, fostering greater independence and inclusion for the visually
impaired community[3].

4. Object Detection System for the Blind with Voice Guidance

Rajeshavree Ravindra Karmarkar and Prof. V. N. Honmane, “Object Detection System for the Blind with Voice Guidance”, published online June 2022 in IJEAST.

The paper discusses an innovative object detection system for visually impaired
individuals using deep learning and voice guidance. The system leverages the YOLO
(You Only Look Once) algorithm for real-time object detection and position estimation,
providing audio feedback via Google Text-to-Speech (TTS). Designed for Android, it
processes images to recognize objects, determines their locations, and communicates this
information audibly, enhancing accessibility and independence for visually impaired
users. The system primarily uses a camera to capture surroundings, YOLO for object
recognition and location estimation, and TTS to convert detected data into speech.

The methodology involves several key steps: preprocessing images (converting to grayscale and standardizing dimensions), employing YOLO for bounding box
generation and classification, and using algorithms like Intersection Over Union (IOU)
for precise object localization. Once identified, the system calculates object distances
using geometric techniques and outputs audible guidance[4]. It supports multiple object
types, utilizing the COCO dataset, which includes diverse categories like animals,
devices, and food items.
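
The Intersection over Union measure mentioned above scores the overlap between a predicted box and a reference box as intersection area divided by union area. A minimal sketch, with boxes given as (x1, y1, x2, y2) corner coordinates:

def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # ~0.143: mostly disjoint boxes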

The research concludes that while the current Android implementation is effective, it
sacrifices some detection accuracy for speed due to its use of tiny YOLO. Future work
aims to enhance accuracy and compatibility with broader platforms and devices. This
system showcases significant potential for empowering visually impaired individuals
through technology, simplifying daily navigation and object recognition.


5. Voice Enable Blind Assistance System - Real Time Object Detection

Jigar Parmar, Vishal Pawar, Babul Rai, and Prof. Siddhesh Khanvilkar, “Voice Enable Blind Assistance System - Real Time Object Detection”, IRJET, Apr. 2022.

The research paper introduces a voice-enabled blind assistance system for real-time
object detection, aimed at enhancing the independence of visually impaired individuals.
It uses a lightweight network model, MobileNet, in conjunction with a Single Shot
Multi-Box Detection (SSD) algorithm to detect household objects efficiently. The
system employs TensorFlow APIs for implementing deep learning frameworks,
leveraging the COCO dataset for training. It integrates object detection, voice output,
and distance-based alerts, enabling visually impaired users to interact with their
surroundings via audio feedback about detected objects and obstacles.

The methodology involves capturing real-time frames through a webcam, processing them using a pre-trained SSD model, and generating audio outputs based on detected
objects. The system is designed to ensure practical utility by alerting users about nearby
obstacles and maintaining efficiency through threading methodologies to optimize frame
processing. The voice generation module, using libraries like pyttsx3, converts detected
object data into verbal cues. Additionally, image processing tools such as OpenCV
facilitate object recognition, and Python-tesseract assists in extracting text from images
for reading purposes.

The study concludes that this technology has potential applications beyond aiding the
visually impaired, including sports tracking, traffic management, and textual analysis.
Despite achieving reliable object detection and voice output functionality, the system
faces challenges like a slight delay in object detection transitions[5]. Future enhancements
could broaden its applicability, improve responsiveness, and address diverse scenarios
like currency detection and language translation.

6. Blind Assistance in Object Detection and Generating Voice Alerts


N. V. N. Vaishnavi, Tummala Navya, Velagapudi Srilekha, Vinnakota Karthik, D. Leela
Dharani “Blind Assistance in Object Detection and Generating Voice Alerts”, in 2023


The article addresses the challenges faced by visually impaired individuals, emphasizing
their difficulties in navigating unknown environments and identifying obstacles. The
proposed solution is an integrated machine learning system that leverages cameras
embedded in everyday accessories like walking sticks or sunglasses. This system detects
objects, estimates their distance, and generates voice alerts, providing real-time feedback
to users. The core objective is to offer a visual aid through an Android smartphone
interface, reducing dependence on others while enhancing safety and efficiency in daily
activities.

The proposed system offers significant advantages over existing methods like traditional
walking sticks, which are slow and rely heavily on human assistance. By using machine
learning algorithms trained on datasets of common objects, the system detects and
recognizes items in the user's surroundings. The captured visuals are converted into
audio signals delivered through headphones, helping users avoid obstacles and complete
tasks independently[6]. This solution minimizes risks, shortens response time, and
ensures users can navigate safely without constant reliance on others.

Future enhancements of the system could focus on expanding the dataset to include a
wider variety of objects and addressing challenges like poor performance in low-light
conditions. Incorporating advanced technologies like night vision or infrared cameras
could further improve usability. Overall, this innovative approach has the potential to
empower visually impaired individuals by increasing their independence and integration
into society.

7. Object Detection System with Voice Alert for Blind

The Single Shot MultiBox Detector (SSD) is a highly efficient object detection
algorithm designed to detect multiple objects in an image in a single pass through the
network, making it suitable for real-time applications. SSD works by using a
convolutional neural network (CNN) to extract feature maps from an input image. It then
applies additional convolutional layers to predict bounding boxes and class labels for
each object. What sets SSD apart is its use of multiple feature maps at different scales,
which helps it detect objects of various sizes within the same image. It also uses
predefined anchor boxes at each position in the feature map to make predictions, with the network outputting the probability of an object being present and the adjustments needed to improve the bounding box location.

One of SSD’s key advantages is its speed. Unlike other models, such as Faster R-CNN,
which rely on generating region proposals before performing detection, SSD predicts
bounding boxes and classifications directly from the feature maps, making it much
faster. Additionally, it uses a technique called Non-Maximum Suppression (NMS) to
eliminate duplicate bounding boxes and keep the most accurate ones, further enhancing
the detection process. While SSD is known for its speed, it can be less accurate in
detecting smaller objects compared to more complex models. Despite this, it performs
well in applications like autonomous vehicles, surveillance systems, robotics, and
augmented reality, where real-time object detection is crucial. Overall, SSD strikes an
excellent balance between speed and accuracy, making it a popular choice for many real-
time computer vision tasks.
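
The Non-Maximum Suppression step described above can be sketched in plain Python: keep the highest-scoring box, discard any remaining box that overlaps it beyond a threshold, and repeat. The overlap test reuses the standard IoU computation:

def iou(a, b):
    # Intersection over Union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Consider boxes in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep   # indices of the boxes that survive suppression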

8. Blind Assist System Using AI And Image Processing

International Advanced Research Journal in Science, Engineering and Technology, “Blind Assist System using AI and Image Processing”, 2023.

The document discusses a Blind Assist System that leverages AI and image processing
to enhance autonomy and safety for visually impaired individuals. The introduction
highlights the challenges faced by over 250 million visually impaired individuals
worldwide, emphasizing the limitations of traditional aids like white canes and guide
dogs. The proposed solution integrates real-time image processing, deep learning
algorithms, and sonar sensors to detect objects, recognize text, and alert users through
audio feedback. Additional features include an SOS button for emergencies and
portability-enhancing components such as a Raspberry Pi 4B, power banks, and
earphones.

A detailed review of existing systems notes advancements like ultrasonic white canes,
guide dogs, and Braille displays, while identifying their limitations in cost, accessibility,
and environmental awareness. The proposed system is built upon prior research and
introduces features like a compact design, text recognition via Tesseract, and real-time
obstacle detection using a COCO-trained deep neural network[8]. The methodology ensures continuous monitoring of the user's environment, with emergency alert mechanisms integrated for added safety.

The results demonstrate the system’s accuracy and reliability in obstacle detection,
thanks to robust AI and sensor technologies. The design is user-friendly and portable,
promoting independence and safety. Regular updates to the AI models and addressing
privacy concerns are essential for long-term functionality and user trust. This innovative
system aims to significantly improve the quality of life for visually impaired individuals
by combining affordability, accessibility, and advanced technological capabilities.

9. Object And Distance Detection System for Visually Impaired People

Prof. Ashwini Phalke, “Object and Distance Detection System for Visually Impaired People”, published in April 2022.

The document discusses a Blind Assist System that leverages AI and image processing
to enhance autonomy and safety for visually impaired individuals. The introduction
highlights the challenges faced by over 250 million visually impaired individuals
worldwide, emphasizing the limitations of traditional aids like white canes and guide
dogs. The proposed solution integrates real-time image processing, deep learning
algorithms, and sonar sensors to detect objects, recognize text, and alert users through
audio feedback. Additional features include an SOS button for emergencies and
portability-enhancing components such as a Raspberry Pi 4B, power banks, and
earphones.

A detailed review of existing systems notes advancements like ultrasonic white canes,
guide dogs, and Braille displays, while identifying their limitations in cost, accessibility,
and environmental awareness. The proposed system is built upon prior research and
introduces features like a compact design, text recognition via Tesseract, and real-time
obstacle detection using a COCO-trained deep neural network[9]. The methodology
ensures continuous monitoring of the user's environment, with emergency alert
mechanisms integrated for added safety.


Chapter 3

SOFTWARE AND HARDWARE REQUIREMENTS

The Software Requirement Specification (SRS) is the starting point of the software development activity. The SRS is a document that completely describes what the proposed software should do; it is a means of translating the ideas in the minds of the users into a formal document.

3.1 Functional and Non – Functional Requirements

3.1.1 Functional Requirements

Functional requirements define the specific behaviours and operations of the system to meet
its intended purpose. For the Blind Assistance System, these include:

1. Object Detection
 The system should detect objects in real-time using a live camera feed.
 It should classify detected objects into categories (e.g., furniture, vehicles,
pedestrians).
2. Distance Estimation
 The system should calculate the approximate distance between the detected object
and the user.
 Distance measurements should be updated dynamically as objects move closer or
further away.
3. Voice Alert System
 The system must provide clear audio feedback about detected objects, including their type and distance (e.g., “Obstacle ahead: Chair, 1.5 meters” or “Obstacle ahead: TV, 2 meters”).
 Voice alerts should be customizable for language and volume (a customization sketch follows this list).
4. Web-Based Interface
 The system should be accessible via a browser, providing a user-friendly interface.
 The interface should allow users to start/stop detection and adjust settings such as
voice alert preferences.
5. Camera Integration
 The system must integrate seamlessly with the user's webcam or connected camera
for capturing the live video feed.


6. Error Handling
 The system should handle cases where no objects are detected by
providing appropriate feedback.
 It should recover gracefully from errors like camera disconnection or low
light conditions.
7. Multi-Device Support
 The system should be compatible with various devices, including laptops and
smartphones, as long as they have a functional camera and browser.
8. Scalability
 The system should allow future features, such as GPS integration for navigation or
multi-language support, without significant architectural changes.
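
The volume and rate customization in requirement 3 maps directly onto pyttsx3 engine properties; which languages and accents are available depends on the voices installed on the host operating system. A minimal sketch:

import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)      # speaking rate, words per minute
engine.setProperty("volume", 0.9)    # volume, 0.0 to 1.0

# Voice (and hence language/accent) choice depends on installed voices.
voices = engine.getProperty("voices")
if voices:
    engine.setProperty("voice", voices[0].id)

engine.say("Obstacle ahead: Chair, 1.5 meters")
engine.runAndWait()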

3.1.2 Non-Functional Requirements

Non-functional requirements define the quality attributes and constraints of the system. For
the Blind Assistance System, these include:

1. Performance
 The system should process video frames and generate alerts within 1 second to
ensure real-time operation.
 It should maintain high detection accuracy, with at least 90% precision for
supported object classes.
2. Usability
 The system must be easy to operate, requiring minimal technical expertise.
 The web interface should be intuitive, with simple controls and clear instructions.
3. Accessibility
 The application should support screen readers and provide an option for audio-based
navigation for visually impaired users.
4. Reliability
 The system must perform consistently across various environments, including
different lighting conditions.
 It should have a low error rate for object detection and distance estimation.


5. Scalability
 The architecture should support additional features, such as advanced obstacle
tracking or integration with wearables, without compromising performance.
6. Portability
 The system should run on multiple platforms, such as Windows, macOS, and
Android, through a web browser.
 It should not require specialized hardware beyond a camera and internet access.
7. Security
 The system should ensure data privacy, particularly when handling sensitive video
feeds.
 Communication between the client and server should be encrypted to prevent
unauthorized access.
8. Maintainability
 The codebase should be modular and well-documented to facilitate updates and
debugging.
 Any dependencies, such as libraries or frameworks, should be easily upgradable.
9. Efficiency
 The application should use system resources optimally, avoiding excessive CPU or
memory usage.
 It should operate smoothly on devices with moderate hardware capabilities (e.g.,
4GB RAM, Intel i3 processor).
10. Availability
 The system should have high uptime, with minimal downtime for updates or
maintenance.
 The web application should load and function effectively even on slower internet
connections.
11. Aesthetic Design
 The interface should be visually appealing and organized, ensuring ease of
navigation for sighted users who may assist in setting up the system.

3.2 System Requirements

3.2.1 Hardware Requirements


 Processor:

The processor handles the computations for object detection and distance estimation.

 Minimum:
 Intel Core i3 (4th generation or newer) or equivalent.
 Suitable for lightweight operations but may experience slight latency in real-time
processing.
 Recommended:
 Intel Core i5/i7 (8th generation or newer) or equivalent (e.g., AMD Ryzen 5 or
higher).
 Ensures faster processing and smoother performance, especially when handling
high-resolution video streams.

 RAM: RAM is crucial for handling real-time video data and running detection algorithms.

 Minimum:
 4GB.
 Allows the system to operate but might limit multitasking capabilities.
 Recommended:
 8GB or higher.
 Ensures smooth real-time detection and simultaneous tasks like audio processing
without system lag.

 Camera: A camera is required to capture the live video feed for object detection.

 Minimum:
 Integrated or external webcam with a resolution of 720p (HD).
 Capable of providing basic video quality for detection.
 Recommended:
 Full HD (1080p) webcam or higher resolution.
 Offers better object clarity and improves detection accuracy in diverse lighting
conditions.
 Examples: Logitech C920, Razer Kiyo, or similar webcams.


 Speakers or Headphones: Audio output devices are necessary for delivering voice
alerts to the user.

 Minimum: Basic built-in speakers on laptops or external speakers.


 Recommended: Noise-cancelling headphones or high-quality external speakers for
clear audio alerts, especially in noisy environments.

 Storage: Storage is required for application files, model weights, and temporary data.

 Minimum:
 500MB free space.
 Sufficient for the application and the pre-trained SSD model.
 Recommended:
 1GB or higher.
 Accommodates additional features, updates, or model expansions in the future.

 Graphics Processing Unit (GPU): A GPU accelerates the processing of deep learning
models like SSD.

 Minimum:
 Integrated graphics (e.g., Intel UHD Graphics or AMD Radeon Vega).
 Adequate for basic functionality but may slow down object detection.
 Recommended:
 Dedicated GPU, such as NVIDIA GeForce GTX 1050 or higher.
 Boosts detection speed and supports smoother real-time operation, especially for
high-resolution video feeds.

 Network: The system relies on web-based access, requiring a stable internet connection.

 Minimum:
 2 Mbps speed.
 Enables basic functionality but may result in slower interface responses or delayed
updates.
 Recommended:
 10 Mbps or higher.
 Ensures smooth interaction with the web interface and quick loading of resources.


3.2.2 Software Requirements

The software components ensure the development, deployment, and functionality of the
web application.

 Programming Languages

The system is built using versatile and efficient programming languages to ensure
seamless functionality.

 Python:
 Core language for implementing object detection, distance estimation, and voice
alerts.
 Easy to integrate with machine learning libraries like TensorFlow and OpenCV.
 HTML, CSS, JavaScript:
 Used for designing the web interface for accessibility and usability.
 JavaScript is also employed for front-end interactivity.

 Frameworks and Libraries

Computer Vision and Machine Learning

 OpenCV:
 For real-time video frame processing.
 Supports camera feed integration and preprocessing of input images.
 OpenCV provides pre-trained models and tools for working with deep learning frameworks such as TensorFlow and PyTorch.

 SSD:
 A lightweight and efficient object detection model, ensuring real-time
performance.
 Implemented using frameworks like TensorFlow, Keras, or Darknet.
 SSD uses a multi-scale feature map approach, allowing it to detect objects of
varying sizes with high accuracy.


Text-to-Speech

 pyttsx3:
 Python-based library for converting text to speech.
 Used to generate audio alerts for detected objects.

Data Manipulation and Analysis

 NumPy: For handling multidimensional arrays and numerical computations.

 Pandas: For data manipulation and analysis, if needed for future enhancements.

Distance Estimation

 Matplotlib (Optional): For visualizing distance estimations during development and debugging.

 Operating System:

The system should support major operating systems to ensure cross-platform compatibility:

 Windows 10/11.
 Linux distributions (e.g., Ubuntu, Fedora).
 macOS.
 Browser: The web application must run on modern browsers for a seamless user
experience.

 Google Chrome (Recommended).


 Mozilla Firefox.
 Microsoft Edge.
 Safari (for macOS/iOS).

3.3 Requirement Analysis

Requirement analysis for carbon footprint analysis involves identifying the goals, stakeholders, data
sources, and methodologies needed to measure and analyze an entity's carbon emissions. It includes:


1. Goal Definition
 Clearly define the purpose of the carbon footprint analysis.
 Objectives could include regulatory compliance, sustainability reporting, internal
optimization, or enhancing corporate social responsibility.
 Align the goals with the organization’s broader environmental and strategic
objectives to ensure relevance and focus.

2. Stakeholder Identification
 Identify and categorize stakeholders, such as:
 Internal: Management, operations, and sustainability teams focused on resource
efficiency.
 External: Investors, regulators, customers, and advocacy groups interested in
transparency and environmental stewardship.
 Understand the unique needs and interests of each stakeholder group to tailor the
analysis outcomes.

3. Data Collection
 Determine the type of data required, including:
 Direct emissions: On-site fuel combustion, vehicle fleet usage, etc.
 Indirect emissions: Electricity consumption, supply chain emissions, business
travel, and purchased goods.
 Identify data sources such as utility bills, fuel receipts, and supplier disclosures.
 Establish data collection protocols to ensure completeness and timeliness.

4. Methodology Selection
 Choose suitable methodologies for calculating the carbon footprint, adhering to
recognized standards like:
 GHG Protocol: Corporate Standard, Scope 1, 2, and 3 guidelines.
 ISO 14064: Standards for greenhouse gas quantification and reporting.
 Define operational boundaries, including:
 Organizational Boundaries: Control-based or equity-share approaches.
 Operational Scope: Direct and indirect emissions across Scopes 1, 2, and 3.
 Select appropriate emission factors and allocation methods for accurate calculations.


5. Software and Tools


 Evaluate and select software or tools for efficient data management and analysis,
considering:
 Integration capabilities with existing systems (e.g., ERP or energy management
systems).
 Support for real-time monitoring and reporting.
 User-friendliness and customization options.
 Examples of tools:
 GHG Protocol Calculation Tools: For specific sectors and activities.
 Sphera, Simapro, or GaBi: Advanced lifecycle assessment software.

6. Reporting Requirements
 Identify applicable reporting frameworks and standards, such as:
 Regulatory Compliance: Government-mandated disclosures (e.g., EU ETS).
 Voluntary Initiatives: CDP, GRI (Global Reporting Initiative), or TCFD (Task
Force on Climate-related Financial Disclosures).
 Ensure reports are tailored to stakeholder needs, emphasizing transparency and
comparability.

7. Quality Assurance
 Implement robust quality assurance mechanisms, including:
 Verification of data sources for accuracy and relevance.
 Cross-checking calculations with recognized benchmarks or third-party reviews.
 Consistency checks across reporting periods to ensure data integrity.
 Develop clear documentation of methodologies, assumptions, and data sources to
enhance traceability.

8. Continuous Improvement
 Establish a framework for ongoing monitoring and refinement, such as:
 Periodic reviews of methodologies and emission factors to reflect the latest
scientific and industrial practices.
 Benchmarking performance against industry peers to identify opportunities for
improvement.
 Incorporating feedback from stakeholders to enhance future analyses.


Chapter 4

DESIGN AND IMPLEMENTATION


The purpose of the design phase is to plan a solution to the problem specified by the requirements document. This phase is the first step in moving from the problem domain to the solution domain. In other words, starting with what is needed, design takes us toward how to satisfy those needs. The design of a system is perhaps the most critical factor affecting the quality of the software; it has a major impact on the later phases, particularly testing and maintenance.

Architecture Design
The architecture of the Blind Assistance System is designed to be modular, ensuring
seamless integration of various components for real-time performance and user-centric
operation.

The input layer comprises a camera and distance sensors. The camera captures live video streams for object detection, while the sensors measure the distance of obstacles, providing an accurate understanding of the user's immediate surroundings.

The processing layer forms the brain of the system, where data from the input layer is processed
using a combination of computer vision and distance measurement algorithms. A pre-trained
object detection model, such as SSD, analyzes the video feed to identify and classify objects
in real time. Simultaneously, the distance measurement module calculates the proximity of
these objects to the user. The data from these two modules are integrated to generate
meaningful insights, such as identifying critical obstacles and prioritizing alerts.

The output layer translates the processed data into actionable information for the user. A text-
to-speech (TTS) system generates real-time voice alerts to inform the user about the objects
detected and their distances. For example, alerts like “Table ahead, 36 inches away,” or
“Person to your right, 60 inches away” provide clear and immediate guidance. These alerts
are delivered through speakers or earphones, ensuring the user is well-informed of their
surroundings.


Workflow
The workflow of the Blind Assistance System is designed to ensure seamless and real-time
operation, enabling visually impaired users to navigate their surroundings safely. The system
follows a series of well-defined steps, as outlined below:

 Data Acquisition via Camera:


The system begins by using a camera to capture live video streams of the user’s
surroundings. This video feed serves as the sole input for detecting objects in real
time.
 Object Detection and Classification:
The captured video is processed by a pre-trained object detection model, such as
SSD. This model analyzes the frames of the video feed to identify and classify
objects in the environment. Examples of detected objects might include tables,
people, bottles, or other obstacles.
 Object Localization and Approximation:
Using the relative size and position of objects in the video frames, the system
estimates their proximity to the user. For instance, larger objects appearing in the
foreground are assumed to be closer, while smaller objects in the background are
considered farther away. These approximations provide users with a general sense of
object distances, though without the precision of sensor-based measurements.
 Voice Alert Generation:
A text-to-speech (TTS) system generates real-time audio alerts based on the detected
objects and their approximate locations. Alerts are descriptive and provide actionable
information, such as “Table ahead”. These messages are tailored to inform the user
about the most relevant objects in their path.
 Alert Delivery:
The audio alerts are delivered through a speaker or earphones, ensuring that the user
can understand their surroundings clearly. Alerts are generated dynamically as the
user moves, offering continuous updates.
 Real-Time Feedback Loop:
The system operates continuously, capturing and analyzing new video frames in real
time. This feedback loop ensures that the user is consistently informed of changes in
their environment.


4.1 Architecture Design

4.1.1 Introduction

The architecture of the Blind Assistance System: Real-Time Object Detection with Distance
and Voice Alerts is designed to empower visually impaired individuals with enhanced
situational awareness. This system leverages computer vision technology to detect objects in
the user’s surroundings, estimate their distances, and provide real-time audio feedback,
enabling safe and independent navigation.

Figure 4.1: Architecture Design of a System

1. Input Layer: The Input Layer serves as the starting point where the system collects data about the user's surroundings.

• Webcam: The system relies on a standard webcam connected to the computer. The webcam continuously captures live video streams of the environment; the feed includes objects such as tables, people, or bottles visible within the camera's field of view.
• Example: If the user is in a room, the webcam captures images of nearby furniture, people, or items in the space. This video feed acts as the raw input data for the system.


2. Processing Layer: This layer handles the heavy lifting by analyzing the video feed and generating actionable insights.

• Object Detection Module (SSD): The video frames captured by the webcam are passed to a pre-trained object detection algorithm, SSD (Single Shot Detector). The algorithm processes the video feed frame by frame to:
  - Detect objects in the video.
  - Classify each object into categories such as "table," "person," or "bottle."
  - Draw bounding boxes around the detected objects in the video frame, labeling them with their respective class names.
  Example: If the webcam captures a scene with a table and a person, the algorithm outputs the labels "Table" and "Person" along with their locations in the video frame.
• Distance Estimation: Larger objects occupying more pixels in the foreground are treated as closer to the camera, while smaller objects in the background are treated as farther away. Using these cues, the system approximates distances (in inches) for each detected object.
  Example: A person appearing large in the frame might be estimated to be 24 inches away, while a bottle appearing smaller might be 60 inches away.
• Prioritization Module: After object detection and distance estimation, the system determines which objects are most important to notify the user about. Objects directly ahead of the user or closer in proximity are given higher priority, while non-critical objects, such as those farther away or off to the sides, are de-emphasized (a sketch of one possible ranking follows this section).
  Example: If the system detects a "Person" 24 inches ahead and a "Bottle" 60 inches away, it prioritizes the person for the voice alert.
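
A minimal sketch of such a ranking is given below, assuming each detection is a dictionary carrying a label, an approximate distance in inches, and a bounding box. The function name prioritize and the 20-inch weighting of the centre offset are illustrative assumptions, as the report does not specify the exact rule.

def prioritize(detections, frame_width):
    """Rank detections so nearer objects, and objects closer to the
    user's path (the frame centre), are announced first."""
    def score(det):
        x, y, w, h = det["box"]
        # Normalised horizontal offset of the box centre from frame centre.
        offset = abs((x + w / 2) - frame_width / 2) / (frame_width / 2)
        # Lower score = higher priority; proximity dominates the offset term.
        return det["distance_in"] + 20 * offset
    return sorted(detections, key=score)

# Example: a person 24 inches ahead outranks a bottle 60 inches away.
dets = [
    {"label": "bottle", "distance_in": 60, "box": (500, 200, 40, 90)},
    {"label": "person", "distance_in": 24, "box": (280, 100, 200, 380)},
]
print([d["label"] for d in prioritize(dets, frame_width=640)])  # ['person', 'bottle']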


3. Output Layer: The Output Layer communicates the processed information to the user in a clear and actionable way.

• Text-to-Speech (TTS) System: The prioritized information from the processing layer is converted into spoken language using a TTS engine. The system generates real-time voice alerts that describe the objects and their approximate distances (a short pyttsx3 sketch follows).
  Example: The system may generate the following alerts:
  - “Person ahead, 24 inches away.”
  - “Bottle to your left, 60 inches away.”

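This layer can be sketched with the pyttsx3 library adopted in section 4.3.1, which works offline and therefore suits real-time use. The rate and volume settings and the announce helper below are illustrative assumptions rather than the project's exact code.

import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)    # speaking speed in words per minute
engine.setProperty("volume", 1.0)  # volume in the range 0.0 to 1.0

def announce(label, direction, distance_in):
    """Speak one alert, e.g. 'Person ahead, 24 inches away.'"""
    engine.say(f"{label.capitalize()} {direction}, {distance_in} inches away.")
    engine.runAndWait()            # blocks until the utterance finishes

announce("person", "ahead", 24)
announce("bottle", "to your left", 60)
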
4.2 Detailed Design

4.2.1 Use Case Diagram

Use case diagrams are used for high-level requirement analysis of a system; they capture the
requirements of a system, including internal and external influences. Once these initial
requirements are gathered, use case diagrams are modelled to present the outside view of the
system.

Figure 4.2: Use Case Diagram of Object Detection with Voice Alerts


4.2.2 Data Flow Diagram

The Object Detection System with Voice Alert for the Blind is a robust assistive solution
designed to help visually impaired individuals navigate their environment with greater
confidence and independence. This system leverages cutting-edge technologies such as SSD
object detection and real-time video processing to identify objects in the user's surroundings.
The primary objective of the system is to provide immediate, accurate, and practical
feedback through voice alerts, which inform users about the presence and type of objects in
their vicinity. The system architecture is built on Python, integrating OpenCV for video
handling, SSD for object detection, and a text-to-speech library for audio output.
Together, these components form a seamless real-time assistive tool.

The software begins by initializing its core components. The SSD model, a pre-trained
convolutional neural network optimized for real-time object detection, is loaded with its
configuration and weights. Additionally, class labels are read from a predefined text file,
ensuring the system can recognize and label a wide variety of objects. For optimal
performance, parameters such as the confidence threshold and Non-Maximum Suppression
(NMS) threshold are fine-tuned to balance detection accuracy and computational efficiency.
Once initialized, the system activates the video stream using a camera device. This stream
serves as the input for the object detection process, capturing live frames for analysis.

When the system processes each video frame, it uses the SSD deep learning model to identify
objects and their locations within the frame. Detected objects are enclosed in bounding
boxes, each accompanied by a label indicating the object’s name and the confidence score of
the detection. These labels are color-coded to enhance clarity in the visual display. The
bounding boxes, along with the original camera feed, are shown side-by-side in a graphical
interface, allowing sighted users or developers to monitor the system’s performance. This
interface also serves as a debugging tool to refine detection parameters or validate the
system in various environments.
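
A sketch of this initialization and per-frame display follows, assuming the SSD MobileNet files from section 4.3.1 and OpenCV's dnn_DetectionModel wrapper; the 0.5 confidence and 0.4 NMS thresholds stand in for the fine-tuned values mentioned above.

import cv2

CONF_THRESHOLD = 0.5   # placeholder for the tuned confidence threshold
NMS_THRESHOLD = 0.4    # placeholder for the tuned NMS threshold

net = cv2.dnn_DetectionModel("frozen_inference_graph.pb",
                             "ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt")
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

with open("coco.names") as f:
    classes = f.read().strip().split("\n")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    ids, confs, boxes = net.detect(frame, confThreshold=CONF_THRESHOLD,
                                   nmsThreshold=NMS_THRESHOLD)
    if len(ids) != 0:
        for cid, conf, (x, y, w, h) in zip(ids.flatten(), confs.flatten(), boxes):
            # Draw the bounding box and a colour-coded label with confidence.
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, f"{classes[cid - 1]} {conf:.2f}", (x, y - 8),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("Detections", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()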

A unique feature of this system is its voice alert functionality, which translates visual
information into auditory feedback for blind users. After detecting objects, the system
generates a voice message announcing each detected object's name. This is achieved through
text-to-speech synthesis, ensuring that alerts are not only accurate but also clear and easily
understandable. For example, when a “chair” is detected, the system announces “Chair
detected,” allowing the user to comprehend the environment without relying on visual cues.
This feature is designed to work continuously, ensuring users receive updated information as
they move through their surroundings.

The system also incorporates user interaction features to enhance its usability. By pressing a
designated key, the user can capture images of the current view, which are stored as
reference images. These images can later be used for various purposes, such as training
customized models or maintaining a log of detected environments. Additionally, the system
allows for graceful termination, enabling users to exit the application with a specific
command while ensuring all resources, such as camera devices and OpenCV windows, are
released properly. This thoughtful design ensures the system is not only functional but also
user-friendly and safe to operate.
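
One plausible shape for these interaction features is sketched below; the 'c' and 'q' key bindings and the captures directory are assumptions, since the report does not name the designated keys.

import os
import time
import cv2

os.makedirs("captures", exist_ok=True)
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Blind Assistance System", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):       # capture and store a reference image
        path = os.path.join("captures", f"ref_{int(time.time())}.jpg")
        cv2.imwrite(path, frame)
    elif key == ord("q"):     # graceful termination
        break

cap.release()                 # release the camera device
cv2.destroyAllWindows()       # close all OpenCV windows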

Figure 4.3: Data Flow Diagram of Object Detection with Voice Alerts


The Object Detection System with Voice Alert for the Blind represents a meaningful
application of artificial intelligence and computer vision in assistive technology. By
combining real-time object detection with an intuitive voice alert system, it bridges the gap
between technology and accessibility. This project not only highlights the potential of AI in
improving quality of life but also serves as a foundation for future advancements in assistive
tools for visually impaired individuals. Whether used in a controlled indoor environment or
adapted for outdoor navigation, this system demonstrates the power of innovation in
addressing real-world challenges.

4.2.3 Sequence Diagram

Figure 4.4: Sequence Diagram of Object Detection with Voice Alerts


The Object Detection System with Voice Alert for the Blind utilizes the SSD (Single Shot
MultiBox Detector) model to identify objects in real-time video captured by the laptop’s
camera. When the user starts the application, the video feed is continuously processed by the
SSD model, which efficiently detects multiple objects within a single frame. SSD works by
predicting object boundaries and classifying the objects in real-time with a high degree of
accuracy, making it suitable for quick detection in dynamic environments. This allows the
system to promptly identify obstacles and other objects that may be in the user’s path.
Once objects are detected, the system sends the relevant information, such as the type and
location of the object, to the Voice Alert Module. The Voice Alert Module then converts this
information into audio cues, announcing the obstacle ahead together with its distance, and plays these alerts
through a speaker or headset. This continuous interaction between the SSD model, object
detection, and voice alerts ensures that the user receives timely feedback about their
environment, improving their ability to navigate safely and independently. The system
operates in real-time, providing essential assistance for the visually impaired.

4.3 Implementation

4.3.1 Packages Used

OpenCV (cv2): OpenCV is an open-source computer vision library that provides functions
for real-time computer vision tasks such as image processing, object detection, and video
analysis. In this project, OpenCV is used to read video frames from the camera, perform
object detection, draw bounding boxes around detected objects, and display the processed
frames. It also helps in capturing real-time camera feed and managing video streams.
The following commands are used to install OpenCV:
“pip install opencv-python”

“pip install opencv-python-headless”


SSD MobileNet (Single Shot Multibox Detector):
SSD MobileNet is a deep learning-based object detection framework optimized for real-time
applications. In this project, the SSD MobileNet model is employed to detect objects in the
camera feed, outputting bounding boxes, class labels, and confidence scores.
Files Required:
Pre-trained model: frozen_inference_graph.pb
Configuration file: ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt


NumPy: NumPy is a fundamental package for scientific computing in Python, primarily
used for working with arrays and matrices. It is utilized in this project to handle numerical
data, such as pixel values in an image, bounding box coordinates, and distance calculations.
It also helps in manipulating large data structures efficiently.
The following command is used to install NumPy:
“pip install numpy”

Pyttsx3: Pyttsx3 is a Python library for text-to-speech conversion. It is used to convert the
text information (such as detected object names and distances) into audible speech, which is
a crucial feature of this project for providing voice alerts to the user (the blind person).
Pyttsx3 works offline, making it suitable for real-time applications.
The following command is used to install Pyttsx3:
“pip install pyttsx3”
Pandas: Pandas is used in object detection systems to handle, process, and manipulate the data
collected from detections. In this project, Pandas can be used to store and manage a list of
detected objects or model outputs.
The following command is used to install Pandas:
“pip install pandas”
Imutils: Imutils is a package that provides convenience functions to handle basic image
processing tasks such as resizing, rotating, and translating images. It simplifies tasks that are
often required when working with computer vision systems like resizing frames or bounding
boxes.
The following command is used to install Imutils:
“pip install imutils”
Math: The math module in Python is used for performing basic mathematical operations. In
this project, it might be used for tasks like calculating the Euclidean distance from the
camera to the detected object based on the size of the bounding box or for any geometry
calculations related to the detected objects.
“No installation required”
OS: The OS module is used for interacting with the operating system. In this project, it
could be used for file handling tasks such as loading model weights, reading the
coco.names file, or managing system paths during execution.
“No installation required”
Time: The time module helps in tracking the time during processing tasks, such as
measuring frame processing time to calculate FPS (frames per second). It can also be used
for introducing delays when needed, like controlling the rate of voice alerts.
“No installation required”
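As an illustration, a simple FPS counter built on the time module might look like the sketch below; the 30-frame reporting interval and the sleep stand-in for per-frame work are assumptions.

import time

start = time.time()
frames = 0
while frames < 300:            # stand-in for the real capture loop
    time.sleep(0.01)           # stand-in for per-frame processing work
    frames += 1
    if frames % 30 == 0:
        fps = frames / (time.time() - start)
        print(f"FPS so far: {fps:.1f}")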
DataSet: The dataset forms the backbone of any object detection system. The project uses a
dataset with annotated images to train the SSD MobileNet model. The pre-trained model
uses the COCO dataset, covering 80 common object categories. The file coco.names
specifies these categories (e.g., "person," "bottle," "car").

The pre-trained weights provided with the model (frozen_inference_graph.pb) come from
training on the large-scale COCO (Common Objects in Context) dataset, which consists of 80
common object categories; the coco.names file in the project specifies these categories, such
as "person," "bottle," or "car." If custom datasets were utilized, they would include objects
relevant to visually impaired navigation, such as "stairs" or "doorways," annotated with
tools like LabelImg. This would enable the model to predict objects that enhance safety and
navigation for the visually impaired.

4.3.2 SSD Model Explanation

SSD (Single Shot Multibox Detector) is a state-of-the-art deep learning-based object
detection framework designed for speed and efficiency, making it ideal for real-time
applications. It performs object detection in a single pass, extracting both object localization
and classification simultaneously. SSD employs feature maps at multiple scales, allowing it
to detect objects of varying sizes effectively. Combined with the lightweight and efficient
MobileNet backbone, SSD MobileNet delivers high-speed performance while maintaining
accuracy, even on resource-constrained devices. In this project, the SSD MobileNet model,
pre-trained on the COCO dataset, is used to identify objects in real-time, outputting
bounding boxes, class labels, and confidence scores for each detected object.


Figure 4.5: SSD Architecture of Object Detection with Voice Alerts

4.3.3 Methodology

• WEB CAMERA
  - Captures live video from the surroundings and breaks it into frames.
  - This is where the system starts gathering the data to analyze.
  - The Capture() function processes the video for further steps.
  - The frames are sent to the object detection system.
• OBJECT DETECTION
  - Identifies objects in the video using the SSD model.
  - The detect() function recognizes objects in each frame.
  - Loads model weights to make detection more accurate.
  - Sends detected objects for classification and tracking.
• FIND OBJECT CLASS
  - Categorizes the detected objects into types such as a bottle, chair, or person.
  - Assigns names to the objects using the find object() function.
  - Helps the system understand what each detected object is.
  - Makes the alerts more meaningful to the user.
• FIND OBJECT VALUES
  - Locates where the objects are in the video frame using coordinates.
  - Tracks the movement of objects by calculating X-Y positions.
  - Provides position data to understand the relevance of each object.
  - Prepares information for the next step of the process.
• COMPARE LOCATION VALUE (see the sketch after this list)
  - Matches the object type with its location in the frame.
  - Checks if the object is close enough to be relevant to the user.
  - Filters out unnecessary objects or those far away.
  - Sends important data to the speech conversion step.
• CONVERT TEXT TO SPEECH
  - Converts the object's name and location into voice alerts.
  - Uses text-to-speech technology to provide audio output.
  - Generates real-time spoken instructions or saves them as MP3.
  - Assists users, especially visually impaired individuals, in understanding their surroundings.
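
The sketch below shows one way the location comparison could map a bounding box's horizontal position to a spoken direction; the thirds-based split of the frame is an assumption for illustration, as the report only states that object type is matched with frame location.

def direction_of(box, frame_width):
    """Map a bounding box's horizontal centre to a spoken direction."""
    x, y, w, h = box
    center_x = x + w / 2
    if center_x < frame_width / 3:
        return "to your left"
    if center_x > 2 * frame_width / 3:
        return "to your right"
    return "ahead"

# A box centred at x = 560 in a 640-pixel frame lies in the right third.
print(direction_of((520, 100, 80, 200), 640))   # -> "to your right"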

Figure 4.6: Methodology of Object Detection with Voice Alerts

4.3.4 Distance and Obstacle Estimation

A critical feature of the system is calculating the distance between the user and detected
objects. This is implemented using:

• The size of bounding boxes in the video feed (larger boxes usually indicate closer objects).
• Camera calibration techniques to map pixel dimensions to real-world distances.
• Scripts like DistanceEstimation.py to perform the calculations and provide accurate spatial information for navigation assistance (a calibration sketch follows).
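
A minimal sketch of the triangle-similarity calibration behind a script like DistanceEstimation.py is given below; the reference width and distance values and the function names are illustrative assumptions.

KNOWN_DISTANCE = 45.0   # inches: distance at which the reference image was taken
KNOWN_WIDTH = 16.0      # inches: real-world width of the reference object

def focal_length(ref_pixel_width):
    """Calibrate once from a reference image of the known object."""
    return (ref_pixel_width * KNOWN_DISTANCE) / KNOWN_WIDTH

def distance_to_camera(focal, pixel_width):
    """Larger boxes (more pixels) yield smaller distances."""
    return (KNOWN_WIDTH * focal) / pixel_width

f = focal_length(ref_pixel_width=160)            # 160 px at 45 in -> f = 450
print(distance_to_camera(f, pixel_width=300))    # -> 24.0 inches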


Chapter 5
TESTING

Software testing can be stated as the process of verifying and validating whether a software
or application is bug-free, meets the technical requirements as guided by its design and
development, and meets the user requirements effectively and efficiently by handling all the
exceptional and boundary cases. The process checks whether the actual software matches
the expected requirements and ensures the software is bug-free. The purpose of software
testing is to identify the errors, faults, or missing requirements in contrast to actual
requirements. It mainly aims at measuring the specification, functionality, and performance
of a software program or application.

5.1 Purpose of Testing

Testing accomplishes a variety of things, but most importantly it measures the quality of the
software we are developing. This view presupposes there are defects in the software waiting
to be discovered and this view is rarely disproved or even disputed. Several factors
contribute to the importance of making testing a high priority of any software development
effort. These include:
• Reducing the cost of developing the program.
• Ensuring that the application behaves exactly as described to the user; for the vast majority of programs, unpredictability is the least desirable consequence of using an application.
• Reducing the total cost of ownership: by providing software that looks and behaves as shown in the documentation, customers require fewer hours of training and less support from product experts.
• Developing customer loyalty and word-of-mouth market share.


5.2 Test Cases

Table 5.1: Test Cases for Each Module

Test Case 01: Object Detection on a Single Object
  Given Input: A frame containing one object (e.g., a person)
  Expected Output: Bounding box drawn around the object; object label matches the predicted class
  Status: Pass

Test Case 02: Object Detection with Multiple Objects
  Given Input: A frame containing several objects (e.g., a person and a bicycle)
  Expected Output: Bounding boxes for each object, with object labels, confidence scores, and a voice alert for each object
  Status: Pass

Test Case 03: Object Detection in a Complex Scene
  Given Input: A frame with several objects in a cluttered environment (e.g., person, bicycle, car)
  Expected Output: Bounding boxes drawn for all objects, accurate class labels for each object, and corresponding confidence scores displayed
  Status: Pass

Test Case 04: Correct Distance Estimation
  Given Input: A frame with an object at a known distance (e.g., a person 1.27 meters away)
  Expected Output: System estimates the correct distance to the object
  Status: Pass

Test Case 05: Distance Calculation for Multiple Objects
  Given Input: A frame containing multiple objects at different distances
  Expected Output: System calculates and announces the distance for each object separately
  Status: Pass

Test Case 06: Voice Alert for Detected Object
  Given Input: A frame with one object (e.g., a person)
  Expected Output: System generates a clear, understandable voice alert
  Status: Pass

Test Case 07: Voice Alert for Multiple Detected Objects
  Given Input: A frame containing multiple objects (e.g., a person and a bicycle)
  Expected Output: System generates a voice alert for each object with its measured distance
  Status: Pass

5.3 Different Types of Testing


5.3.1 Unit Testing
Unit testing focuses verification on the smallest unit of software design, the software
component or module. Using the component level design description as a guide, important
control paths are tested to uncover errors within the boundary of the module. The unit
testing is white-box-oriented testing. First of all, the module interface is tested to ensure
that the information properly flows into and out of the program unit under test. Then the
local data structure is tested to ensure the data stored temporarily maintains its integrity
during all steps in an execution. Boundary conditions are tested to ensure that the module
operates properly at boundaries established to limit or restrict processing. All independent
paths through the control structure are exercised to ensure that all statements in a module
have been executed at least once. Finally, all error-handling paths are tested. In this
project, testing was done according to the bottom-up approach, starting with the smallest and
lowest-level modules and proceeding one at a time. For each module, a driver and
corresponding stubs were also written. If any errors were found, they were corrected
immediately and the unit was tested again.
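
As an illustration of this bottom-up style, a hypothetical unit test for the distance helper sketched in section 4.3.4 might look as follows; the function body is repeated here so the test is self-contained.

import unittest

def distance_to_camera(known_width, focal, pixel_width):
    return (known_width * focal) / pixel_width

class TestDistanceEstimation(unittest.TestCase):
    def test_known_geometry(self):
        # A 16-inch object with focal length 450 and a 300-px box is 24 in away.
        self.assertAlmostEqual(distance_to_camera(16.0, 450.0, 300.0), 24.0)

    def test_boundary_small_box(self):
        # Boundary condition: very small boxes must map to large distances.
        self.assertGreater(distance_to_camera(16.0, 450.0, 10.0), 100.0)

if __name__ == "__main__":
    unittest.main()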

5.3.2 Integration Testing

Integration testing is a logical extension of unit testing. In its simplest form, two units that
have already been tested are combined into a component and the interface between them is
tested. A component, in this sense, refers to an integrated aggregate of more than one unit.


The idea is to test combinations of pieces and eventually expand the process to test your
modules with those of other groups. Eventually all the modules making up a process are
tested together. Any errors discovered when combining units are likely related to the
interface between units. This method reduces the number of possibilities to a far simpler
level of analysis. In this software, the bottom-up integration testing approach has been
used, starting with the smallest and lowest-level modules and proceeding one at a time. For
each module the tests were conducted and the results were noted down.

5.3.3 System Testing


System Testing is a level of testing that validates the complete and fully integrated software
product. The purpose of a system test is to evaluate the end-to-end system specifications.
Usually, the software is only one element of a larger computer-based system. Ultimately, the
software is interfaced with other software/hardware systems. System Testing is defined as a
series of different tests whose sole purpose is to exercise the full computer-based system.

5.3.4 Acceptance Testing


It is formal testing according to user needs, requirements, and business processes conducted
to determine whether a system satisfies the acceptance criteria or not and to enable the users,
customers, or other authorized entities to determine whether to accept the system or not.
Acceptance Testing is the software testing performed after System Testing and before
making the system available for actual use.

Figure 5.2: Types of Testing


RESULTS AND CONCLUSION


Snapshots

Snapshot 1: Cell phone detected

Snapshot 2: Water bottle detected


Snapshot 3: Person detected

Snapshot 4: Laptop detected


Snapshot 5: Umbrella detected

Snapshot 6: TV detected


Snapshot 7: Book Detected

Snapshot 8: Remote Detected


Snapshot 9: Keyboard Detected

Snapshot 10: Car Detected


Snapshot 11: Chair Detected


CONCLUSION

This project successfully addresses the need for enhanced mobility and safety for visually
impaired individuals. By integrating real-time object detection with voice alerts, the system
provides accurate and timely assistance, enabling users to navigate their surroundings with
greater confidence and independence. The project demonstrates the potential of combining
computer vision and auditory feedback technologies to improve accessibility, setting a
foundation for further advancements in assistive devices. Future enhancements could focus
on improving detection accuracy, expanding object recognition capabilities, and ensuring
scalability for practical deployment.

The project showcases the practical application of artificial intelligence, computer vision,
and audio processing in solving real-world challenges, demonstrating its potential for
broader implementation in assistive technology. Throughout the development process,
considerations were made to optimize accuracy, processing speed, and user-friendliness,
ensuring the system meets the needs of its target audience. This project not only addresses a
critical societal need but also serves as a foundation for continued innovation in assistive
technologies, reaffirming the importance of inclusivity and accessibility in modern
advancements.


FUTURE SCOPE
The system can also be enhanced to distinguish between static and moving objects, providing
more precise warnings about immediate dangers, such as approaching vehicles or pedestrians.

Integrating multi-language support and customizable voice alerts is another potential upgrade
to cater to diverse users worldwide. This could include options for varying alert tones,
personalized object prioritization, and region-specific language packs.

The system can evolve into a compact, wearable device such as smart glasses or belt-mounted
systems, making it more user-friendly and less obtrusive.

Integration with GPS and navigation systems to guide users to their destinations while
avoiding obstacles dynamically.



