Voice-Activated OCR Book Reader

Abstract

This project implements a voice-activated OCR-based book reading system using EasyOCR, OpenCV, and speech-processing libraries in Python. The system captures images of book pages using a webcam, extracts and saves the text using OCR, and reads it aloud in response to voice commands. It is particularly helpful for visually impaired users and those with reading difficulties. The application supports OCR in multiple languages (English and Hindi) and responds to basic voice commands for interaction.

1. Introduction

Reading printed books can be challenging for people with visual impairments or dyslexia. Although screen readers exist for digital content, access to printed material remains limited. This project addresses that gap by building an offline, voice-controlled book reader that captures images of printed pages using a webcam, performs OCR, stores the extracted text, and reads it aloud when instructed via speech commands.

2. Motivation and Objective

2.1. Why this project?

• Accessibility: Empower visually impaired users to read physical books without assistance.

• Language support: Enable OCR and text-to-speech in multiple languages.

• Offline processing: Avoid reliance on continuous internet access by using local OCR and TTS libraries.

• Usability: Use natural language voice commands for hands-free operation.

2.2. Objectives

• Enable capturing images from a live webcam feed.

• Extract text using EasyOCR in English and Hindi.

• Store extracted text for multi-page reading.

• Use voice commands to control reading and navigation.

• Provide spoken feedback using text-to-speech.


3. System Architecture

The system consists of four main Python modules:

3.1. easyocr_main.py

Acts as the main controller. Responsibilities include:

• Managing file paths and setup.

• Listening for voice commands.

• Triggering capture and OCR.

• Saving and reading stored text files.

• Handling termination using external trigger files.
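The controller logic described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names, polling interval, and page-numbering scheme are assumptions.

```python
import os
import time

PAGES_DIR = "pages"
CAPTURE_TRIGGER = "capture_trigger.txt"
STOP_TRIGGER = "stop_ocr.txt"

def page_path(n):
    """Path where page N's extracted text is stored (pages/pageN.txt)."""
    return os.path.join(PAGES_DIR, f"page{n}.txt")

def next_page_number():
    """Count existing page files to pick the next page index."""
    if not os.path.isdir(PAGES_DIR):
        return 1
    return sum(1 for f in os.listdir(PAGES_DIR)
               if f.startswith("page") and f.endswith(".txt")) + 1

def main_loop(capture_and_ocr, speak, poll_seconds=0.5):
    """Poll for trigger files: capture on capture_trigger.txt, exit on stop_ocr.txt."""
    os.makedirs(PAGES_DIR, exist_ok=True)
    while True:
        if os.path.exists(STOP_TRIGGER):
            os.remove(STOP_TRIGGER)       # consume the trigger and terminate
            break
        if os.path.exists(CAPTURE_TRIGGER):
            os.remove(CAPTURE_TRIGGER)
            n = next_page_number()
            text = capture_and_ocr()      # webcam frame + OCR (see ocr.py)
            with open(page_path(n), "w", encoding="utf-8") as fh:
                fh.write(text)
            speak(f"Saved page {n}")      # spoken feedback (see speech.py)
        time.sleep(poll_seconds)
```

The capture and speech functions are passed in as callables so this loop stays testable without a webcam or speakers attached.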

3.2. ocr.py

• Uses EasyOCR to extract text from image frames.

• Processes frames (grayscale and resized) to improve OCR accuracy.

• Returns extracted text for storage and reading.

3.3. speech.py

• Provides voice interaction.

• speak_text() uses pyttsx3 to convert text to speech.

• listen_for_command() uses speech_recognition to detect voice input from the user and convert it to text.
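A sketch of these two helpers, plus a small command parser added here for illustration (the parser is not named in the document). pyttsx3 and speech_recognition are imported lazily so the pure-text helper works without audio hardware; the Google recognizer backend is an assumption:

```python
import re

def speak_text(text):
    """Offline text-to-speech via pyttsx3."""
    import pyttsx3                      # lazy: needs an audio output device
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

def listen_for_command(timeout=5):
    """Capture microphone audio and return recognised text, or None on failure."""
    import speech_recognition as sr     # lazy: needs a microphone
    recogniser = sr.Recognizer()
    with sr.Microphone() as source:
        recogniser.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recogniser.listen(source, timeout=timeout)
            # Backend choice is an assumption; recognize_google needs internet.
            return recogniser.recognize_google(audio).lower()
        except (sr.UnknownValueError, sr.WaitTimeoutError):
            return None

def parse_page_command(command):
    """Extract N from a command like 'read page 3'; None if it doesn't match."""
    match = re.search(r"read page (\d+)", command or "")
    return int(match.group(1)) if match else None
```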

3.4. translator.py

• Contains translate_text(), which uses deep_translator to translate extracted text into another language.

• Although not yet integrated into the main loop, it lays the groundwork for multilingual support in future updates.
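A sketch of translate_text(), assuming deep_translator's GoogleTranslator API. The chunking helper is an addition to respect the service's per-request length limit; the exact limit used here is an assumption:

```python
def chunk_text(text, limit=4500):
    """Split text on word boundaries so each chunk stays under the limit."""
    chunks, current, length = [], [], 0
    for word in text.split():
        if current and length + len(word) + 1 > limit:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks

def translate_text(text, target="hi", source="auto"):
    """Translate extracted text; unlike OCR and TTS, this step needs internet access."""
    from deep_translator import GoogleTranslator   # lazy: online service
    translator = GoogleTranslator(source=source, target=target)
    return " ".join(translator.translate(chunk) for chunk in chunk_text(text))
```

Note that this module is the one exception to the project's offline design: translation goes through a web service.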

4. System Workflow

1. The system initializes the webcam and enters a loop.

2. It waits for the user to say "capture" (which triggers a file creation externally).

3. When capture_trigger.txt is detected:

o A frame is captured from the webcam.

o Text is extracted using OCR.

o Text is saved in pages/pageN.txt.

o Feedback is given via TTS.

4. The user can then say "read page X" to hear that page's content.

5. If stop_ocr.txt is detected, the program exits.

5. How to Use

1. Setup: Place all files in the same project directory structure.

2. Run: Execute easyocr_main.py.

3. Capture: Create a file named capture_trigger.txt in the same folder to trigger page capture.

4. Read: Use a voice command like “Read page 2” to read saved text.

5. Stop: Create a file named stop_ocr.txt to stop the application.

Tip: Use external automation or scripts to create the trigger files (capture_trigger.txt and stop_ocr.txt) when the voice command "capture" or "stop" is heard.
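The trigger files in steps 3 and 5 can be created from any shell; for example (one possible approach, not a script shipped with the project):

```shell
# Capture the current page, then stop the application.
touch capture_trigger.txt   # picked up by the main loop, then deleted
touch stop_ocr.txt          # causes easyocr_main.py to exit
```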

6. Challenges Faced

6.1. Multipage Management

• Managing multiple page files dynamically required systematic file naming and page tracking (page1.txt, page2.txt, etc.).

• Implemented checks for missing or empty pages before reading aloud.

6.2. Voice Command Accuracy

• Inconsistent recognition due to ambient noise or accents.

• Added exception handling for UnknownValueError and TimeoutError.
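One way to wrap that exception handling in a retry loop; the retry count and prompt wording are assumptions, not taken from the project:

```python
def listen_with_retries(listen_once, speak, retries=3):
    """Retry recognition a few times, prompting the user between failed attempts.

    listen_once should return the recognised text, or None when
    speech_recognition raises UnknownValueError or a timeout occurs.
    """
    for attempt in range(retries):
        command = listen_once()
        if command:
            return command
        if attempt < retries - 1:
            speak("Sorry, I did not catch that. Please repeat.")
    return None
```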

6.3. OCR Accuracy

• Text extraction failed in poor lighting or low-quality images.

• Applied grayscale conversion and image resizing to enhance input for EasyOCR.

6.4. Language Support

• OCR supported English and Hindi, but TTS was limited to English in the current implementation.

• Translation module added for future integration of multilingual reading.

7. Conclusion

This project successfully integrates voice and vision technologies to build an interactive, accessible, and modular book reading system. It showcases how OCR and speech processing can work together for accessibility and educational tools. Future improvements could include:

• Continuous voice command detection (e.g., "capture" spoken directly, not via
trigger file).

• GUI for easier operation.

• Integration of real-time translation into reading flow.

8. Future Work

• Continuous voice activation (removing trigger files).

• Text summarization before reading long pages.

• Multilingual TTS support using tools like gTTS or Azure TTS.

• Cloud synchronization of captured pages.

• Error feedback with more descriptive prompts.

Appendix: Requirements

Python Packages:

pip install easyocr opencv-python pyttsx3 SpeechRecognition deep-translator

Hardware:

• Webcam

• Microphone
