Voice-Activated OCR Book Reader
Abstract
This project implements a voice-activated OCR-based book reading system using
EasyOCR, OpenCV, and speech processing libraries in Python. The system captures
images of book pages using a webcam, extracts and saves the text using OCR, and
reads it aloud based on voice commands. It is particularly helpful for visually impaired
users or those with reading difficulties. The application supports multiple languages
(English and Hindi for OCR) and responds to basic voice commands for interaction.
1. Introduction
Reading printed books can be challenging for people with visual impairments or
dyslexia. Although screen readers exist for digital content, access to printed material
remains limited. This project addresses this gap by building an offline, voice-
controlled book reader that captures images of printed pages using a webcam,
performs OCR, stores the extracted text, and reads it aloud when instructed via speech
commands.
2. Motivation and Objective
2.1. Why this project?
• Accessibility: Empower visually impaired users to read physical books without
assistance.
• Language support: Enable OCR and text-to-speech in multiple languages.
• Offline processing: Avoid reliance on continuous internet access, using local
OCR and TTS libraries.
• Usability: Use natural language voice commands for hands-free operation.
2.2. Objectives
• Enable capturing images from a live webcam feed.
• Extract text using EasyOCR in English and Hindi.
• Store extracted text for multi-page reading.
• Use voice commands to control reading and navigation.
• Provide spoken feedback using text-to-speech.
3. System Architecture
The system consists of four main Python modules:
3.1. easyocr_main.py
Acts as the main controller. Responsibilities include:
• Managing file paths and setup.
• Listening for voice commands.
• Triggering capture and OCR.
• Saving and reading stored text files.
• Handling termination using external trigger files.
3.2. ocr.py
• Uses EasyOCR to extract text from image frames.
• Preprocesses frames (grayscale conversion and resizing) to improve OCR accuracy.
• Returns extracted text for storage and reading.
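A minimal sketch of this extraction step, assuming the helper is named extract_text and that a simple 2x upscale is used (both assumptions, not the project's exact code):

# ocr.py (sketch): preprocess a frame and run EasyOCR on it
import cv2
import easyocr

# Create the reader once; model loading is slow, so it is reused for every page.
reader = easyocr.Reader(['en', 'hi'], gpu=False)

def extract_text(frame):
    """Return the text recognized in a single webcam frame."""
    # Grayscale removes colour noise; upscaling helps EasyOCR with small print.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # detail=0 returns plain strings instead of (box, text, confidence) tuples.
    lines = reader.readtext(resized, detail=0, paragraph=True)
    return "\n".join(lines)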
3.3. speech.py
• Provides voice interaction.
• speak_text() uses pyttsx3 to convert text to speech.
• listen_for_command() uses speech_recognition to detect voice input from the
user and convert it to text.
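A sketch of the two helpers, assuming recognition goes through the Google Web Speech backend bundled with speech_recognition (which, unlike the OCR and TTS parts, needs an internet connection):

# speech.py (sketch)
import pyttsx3
import speech_recognition as sr

def speak_text(text):
    """Speak a string aloud using the offline pyttsx3 engine."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

def listen_for_command(timeout=5):
    """Record one utterance from the microphone and return it as lowercase text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source, timeout=timeout)
    # recognize_google uses the free web API; an offline recognizer could be swapped in.
    return recognizer.recognize_google(audio).lower()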
3.4. translator.py
• Contains translate_text() using deep_translator to translate extracted text to
another language.
• Although not yet integrated into the main loop, it lays the groundwork for
multilingual support in future updates.
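A sketch of translate_text using deep_translator's GoogleTranslator; the Hindi default target is an assumption for illustration:

# translator.py (sketch)
from deep_translator import GoogleTranslator

def translate_text(text, target="hi"):
    """Translate extracted text into the target language (e.g. 'hi' for Hindi)."""
    return GoogleTranslator(source="auto", target=target).translate(text)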
4. System Workflow
1. The system initializes the webcam and enters a loop.
2. It waits for the user to say "capture" (which causes a trigger file to be created externally).
3. When capture_trigger.txt is detected:
o A frame is captured from the webcam.
o Text is extracted using OCR.
o Text is saved in pages/pageN.txt.
o Feedback is given via TTS.
4. The user can then say "read page X" to hear that page's content.
5. If stop_ocr.txt is detected, the program exits.
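A condensed sketch of how this loop might look; the file names follow the workflow above, while the helper names (extract_text, speak_text) come from the module descriptions in section 3 and are assumptions about the actual code:

# easyocr_main.py (condensed sketch of the trigger-file loop)
import os
import cv2
from ocr import extract_text
from speech import speak_text

os.makedirs("pages", exist_ok=True)
cap = cv2.VideoCapture(0)          # open the webcam
page_num = 1

while True:
    if os.path.exists("stop_ocr.txt"):         # external stop trigger
        speak_text("Stopping the reader.")
        break

    if os.path.exists("capture_trigger.txt"):  # external capture trigger
        ret, frame = cap.read()
        if ret:
            text = extract_text(frame)
            with open(f"pages/page{page_num}.txt", "w", encoding="utf-8") as f:
                f.write(text)
            speak_text(f"Page {page_num} captured.")
            page_num += 1
        os.remove("capture_trigger.txt")       # consume the trigger

    # "read page X" commands would be parsed here via listen_for_command().

cap.release()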
5. How to Use
1. Setup: Place all files in the same project directory structure.
2. Run: Execute easyocr_main.py.
3. Capture: Create a file named capture_trigger.txt in the same folder to trigger
page capture.
4. Read: Say a voice command like "Read page 2" to hear the saved text for that page.
5. Stop: Create a file named stop_ocr.txt to stop the application.
Tip: Use external automation or scripts to create the trigger files (capture_trigger.txt and
stop_ocr.txt) when the voice command "capture" or "stop" is heard, as in the sketch below.
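One possible companion script for that tip, assuming the listen_for_command() helper described in section 3.3:

# voice_trigger.py (sketch): turn spoken commands into trigger files
from pathlib import Path
from speech import listen_for_command

while True:
    try:
        command = listen_for_command()
    except Exception:
        continue                          # ignore noise, timeouts, and API errors
    if "capture" in command:
        Path("capture_trigger.txt").touch()
    elif "stop" in command:
        Path("stop_ocr.txt").touch()
        break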
6. Challenges Faced
6.1. Multipage Management
• Managing multiple page files dynamically required systematic file naming and page
tracking (page1.txt, page2.txt, etc.).
• Implemented checks for missing or empty pages before reading aloud.
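A sketch of such a check, using the pages/pageN.txt layout from section 4 (the function name read_page is illustrative):

import os

def read_page(page_num, pages_dir="pages"):
    """Return the saved text for a page, or a message suitable for speaking aloud."""
    path = os.path.join(pages_dir, f"page{page_num}.txt")
    if not os.path.exists(path):
        return f"Page {page_num} has not been captured yet."
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    if not text:
        return f"Page {page_num} is empty. Please capture it again."
    return text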
6.2. Voice Command Accuracy
• Inconsistent recognition due to ambient noise or accents.
• Added exception handling for UnknownValueError and TimeoutError.
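A sketch of that guard around the recognizer call (in speech_recognition the timeout is raised as sr.WaitTimeoutError):

import speech_recognition as sr

def listen_safely(recognizer, source, timeout=5):
    """Listen for one command; return None instead of raising on common failures."""
    try:
        audio = recognizer.listen(source, timeout=timeout)
        return recognizer.recognize_google(audio).lower()
    except sr.WaitTimeoutError:
        return None        # nothing was said before the timeout
    except sr.UnknownValueError:
        return None        # speech was detected but could not be understood
    except sr.RequestError:
        return None        # the recognition service was unreachable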
6.3. OCR Accuracy
• Text extraction failed in poor lighting or low-quality images.
• Applied grayscale conversion and image resizing to enhance input for EasyOCR.
6.4. Language Support
• OCR supported English and Hindi, but TTS was limited to English in the current
implementation.
• Translation module added for future integration of multilingual reading.
7. Conclusion
This project successfully integrates voice and vision technologies to build an interactive,
accessible, and modular book reading system. It showcases how OCR and speech
processing can work together for accessibility and educational tools. Future
improvements could include:
• Continuous voice command detection (e.g., "capture" spoken directly, not via
trigger file).
• GUI for easier operation.
• Integration of real-time translation into reading flow.
8. Future Work
• Continuous voice activation (removing trigger files).
• Text summarization before reading long pages.
• Multilingual TTS support using tools like gTTS or Azure TTS.
• Cloud synchronization of captured pages.
• Error feedback with more descriptive prompts.
Appendix: Requirements
Python Packages:
pip install easyocr opencv-python pyttsx3 SpeechRecognition deep-translator
Hardware:
• Webcam
• Microphone