Voice-Activated OCR Book Reader
Abstract
This project implements a voice-activated OCR-based book reading system using
EasyOCR, OpenCV, and speech processing libraries in Python. The system captures
images of book pages using a webcam, extracts and saves the text using OCR, and
reads it aloud based on voice commands. It is particularly helpful for visually impaired
users or those with reading difficulties. The application supports multiple languages
(English and Hindi for OCR) and responds to basic voice commands for interaction.
1. Introduction
Reading printed books can be challenging for people with visual impairments or
dyslexia. Although screen readers exist for digital content, access to printed material
remains limited. This project addresses this gap by building an offline, voice-
controlled book reader that captures images of printed pages using a webcam,
performs OCR, stores the extracted text, and reads it aloud when instructed via speech
commands.
2. Motivation and Objective
2.1. Why this project?
• Accessibility: Empower visually impaired users to read physical books without
assistance.
• Language support: Enable OCR and text-to-speech in multiple languages.
• Offline processing: Avoid reliance on continuous internet access, using local
OCR and TTS libraries.
• Usability: Use natural language voice commands for hands-free operation.
2.2. Objectives
• Enable capturing images from a live webcam feed.
• Extract text using EasyOCR in English and Hindi.
• Store extracted text for multi-page reading.
• Use voice commands to control reading and navigation.
• Provide spoken feedback using text-to-speech.
3. System Architecture
The system consists of four main Python modules:
3.1. easyocr_main.py
Acts as the main controller. Responsibilities include:
• Managing file paths and setup.
• Listening for voice commands.
• Triggering capture and OCR.
• Saving and reading stored text files.
• Handling termination using external trigger files.
3.2. ocr.py
• Uses EasyOCR to extract text from image frames.
• Preprocesses frames (grayscale conversion and resizing) to improve OCR accuracy.
• Returns extracted text for storage and reading.
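A minimal sketch of this extraction step, assuming the helper is named extract_text and that a simple 2x upscale is used (both assumptions, not the project's exact code):

# ocr.py (sketch): preprocess a frame and run EasyOCR on it
import cv2
import easyocr

# Create the reader once; model loading is slow, so it is reused for every page.
reader = easyocr.Reader(['en', 'hi'], gpu=False)

def extract_text(frame):
    """Return the text recognized in a single webcam frame."""
    # Grayscale removes colour noise; upscaling helps EasyOCR with small print.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # detail=0 returns plain strings instead of (box, text, confidence) tuples.
    lines = reader.readtext(resized, detail=0, paragraph=True)
    return "\n".join(lines)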
3.3. speech.py
• Provides voice interaction.
• speak_text() uses pyttsx3 to convert text to speech.
• listen_for_command() uses speech_recognition to detect voice input from the
user and convert it to text.
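A sketch of the two helpers, assuming recognition goes through the Google Web Speech backend bundled with speech_recognition (which, unlike the OCR and TTS parts, needs an internet connection):

# speech.py (sketch)
import pyttsx3
import speech_recognition as sr

def speak_text(text):
    """Speak a string aloud using the offline pyttsx3 engine."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

def listen_for_command(timeout=5):
    """Record one utterance from the microphone and return it as lowercase text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source, timeout=timeout)
    # recognize_google uses the free web API; an offline recognizer could be swapped in.
    return recognizer.recognize_google(audio).lower()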
3.4. translator.py
• Contains translate_text() using deep_translator to translate extracted text to
another language.
• Although not yet integrated into the main loop, it lays the groundwork for
multilingual support in future updates.
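A sketch of translate_text using deep_translator's GoogleTranslator; the Hindi default target is an assumption for illustration:

# translator.py (sketch)
from deep_translator import GoogleTranslator

def translate_text(text, target="hi"):
    """Translate extracted text into the target language (e.g. 'hi' for Hindi)."""
    return GoogleTranslator(source="auto", target=target).translate(text)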
4. System Workflow
1. The system initializes the webcam and enters a loop.
2. It waits for the user to say "capture" (which causes a trigger file to be created externally).
3. When capture_trigger.txt is detected:
o A frame is captured from the webcam.
o Text is extracted using OCR.
o Text is saved in pages/pageN.txt.
o Feedback is given via TTS.
4. The user can then say "read page X" to hear that page's content.
5. If stop_ocr.txt is detected, the program exits.
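A condensed sketch of how this loop might look; the file names follow the workflow above, while the helper names (extract_text, speak_text) come from the module descriptions in section 3 and are assumptions about the actual code:

# easyocr_main.py (condensed sketch of the trigger-file loop)
import os
import cv2
from ocr import extract_text
from speech import speak_text

os.makedirs("pages", exist_ok=True)
cap = cv2.VideoCapture(0)          # open the webcam
page_num = 1

while True:
    if os.path.exists("stop_ocr.txt"):         # external stop trigger
        speak_text("Stopping the reader.")
        break

    if os.path.exists("capture_trigger.txt"):  # external capture trigger
        ret, frame = cap.read()
        if ret:
            text = extract_text(frame)
            with open(f"pages/page{page_num}.txt", "w", encoding="utf-8") as f:
                f.write(text)
            speak_text(f"Page {page_num} captured.")
            page_num += 1
        os.remove("capture_trigger.txt")       # consume the trigger

    # "read page X" commands would be parsed here via listen_for_command().

cap.release()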
5. How to Use
1. Setup: Place all files in the same project directory structure.
2. Run: Execute easyocr_main.py.
3. Capture: Create a file named capture_trigger.txt in the same folder to trigger
page capture.
4. Read: Say a voice command like "Read page 2" to hear the saved text for that page.
5. Stop: Create a file named stop_ocr.txt to stop the application.
Tip: Use external automation or scripts to create the trigger files (capture_trigger.txt and
stop_ocr.txt) when the voice command "capture" or "stop" is heard, as in the sketch below.
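One possible companion script for that tip, assuming the listen_for_command() helper described in section 3.3:

# voice_trigger.py (sketch): turn spoken commands into trigger files
from pathlib import Path
from speech import listen_for_command

while True:
    try:
        command = listen_for_command()
    except Exception:
        continue                          # ignore noise, timeouts, and API errors
    if "capture" in command:
        Path("capture_trigger.txt").touch()
    elif "stop" in command:
        Path("stop_ocr.txt").touch()
        break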
6. Challenges Faced
6.1. Multipage Management
• Managing multiple page files dynamically required systematic file naming and page
tracking (page1.txt, page2.txt, etc.).
• Implemented checks for missing or empty pages before reading aloud.
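A sketch of such a check, using the pages/pageN.txt layout from section 4 (the function name read_page is illustrative):

import os

def read_page(page_num, pages_dir="pages"):
    """Return the saved text for a page, or a message suitable for speaking aloud."""
    path = os.path.join(pages_dir, f"page{page_num}.txt")
    if not os.path.exists(path):
        return f"Page {page_num} has not been captured yet."
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    if not text:
        return f"Page {page_num} is empty. Please capture it again."
    return text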
6.2. Voice Command Accuracy
• Inconsistent recognition due to ambient noise or accents.
• Added exception handling for UnknownValueError and TimeoutError.
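A sketch of that guard around the recognizer call (in speech_recognition the timeout is raised as sr.WaitTimeoutError):

import speech_recognition as sr

def listen_safely(recognizer, source, timeout=5):
    """Listen for one command; return None instead of raising on common failures."""
    try:
        audio = recognizer.listen(source, timeout=timeout)
        return recognizer.recognize_google(audio).lower()
    except sr.WaitTimeoutError:
        return None        # nothing was said before the timeout
    except sr.UnknownValueError:
        return None        # speech was detected but could not be understood
    except sr.RequestError:
        return None        # the recognition service was unreachable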
6.3. OCR Accuracy
• Text extraction failed in poor lighting or low-quality images.
• Applied grayscale conversion and image resizing to enhance input for EasyOCR.
6.4. Language Support
• OCR supported English and Hindi, but TTS was limited to English in the current
implementation.
• Translation module added for future integration of multilingual reading.
7. Conclusion
This project successfully integrates voice and vision technologies to build an interactive,
accessible, and modular book reading system. It showcases how OCR and speech
processing can work together for accessibility and educational tools. Future
improvements could include:
• Continuous voice command detection (e.g., "capture" spoken directly, not via
trigger file).
• GUI for easier operation.
• Integration of real-time translation into reading flow.
8. Future Work
• Continuous voice activation (removing trigger files).
• Text summarization before reading long pages.
• Multilingual TTS support using tools like gTTS or Azure TTS.
• Cloud synchronization of captured pages.
• Error feedback with more descriptive prompts.
Appendix: Requirements
Python Packages:
pip install easyocr opencv-python pyttsx3 SpeechRecognition deep-translator
Hardware:
• Webcam
• Microphone