Python Image To Text Using OCR (Simple Examples)

Welcome to a tutorial on how to convert an image to text using OCR in Python. So you are working on a project that needs to “extract” text from an image? The usual solution is Optical Character Recognition (OCR), and here are a few ways to do it in Python. Read on!

 

 


 

DOWNLOAD & NOTES

Here is the download link to the example code, so you don’t have to copy-paste everything.

 

EXAMPLE CODE DOWNLOAD

Source code on GitHub Gist

Just click on “download zip” or do a git clone. I have released it under the MIT license, so feel free to build on top of it or use it in your own project.

 


 

PYTHON IMAGE TO TEXT WITH OCR

All right, let us now get into the examples of converting images to text in Python using OCR.

 

QUICK SETUP

The “usual stuff”:

  • Create a virtual environment with virtualenv venv and activate it – venv\Scripts\activate (Windows) or source venv/bin/activate (Linux/Mac).
  • Install the required library – pip install flask. Flask is only needed for solution 2.
  • For those who are new, the default Flask folders are –
    • static – Public files (JS/CSS/images/videos/audio)
    • templates – HTML pages, this is where 2A-tesseract-js.html goes.

 

 

SOLUTION 1) TESSERACT

1A) DOWNLOAD & INSTALL TESSERACT

There is a popular open-source OCR engine called Tesseract. Unfortunately, it is a standalone program and not a native Python library (wrappers such as pytesseract exist, but they still need Tesseract itself to be installed). Don’t worry though, we can still run it from Python. First, install it:

  • For the experts, here’s the Tesseract GitHub page, if you want to download and compile it yourself.
  • Otherwise, the easy way is to download and install one of their pre-built versions.
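
Before moving on, it may help to confirm that the installation actually worked. The snippet below is a minimal sketch that only uses the standard library. The helper names find_tesseract and tesseract_version are made up for this example, and it assumes the executable is called tesseract and has been added to the system PATH (the Windows installer does not always do this).

```python
import shutil
import subprocess

def find_tesseract(name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)

def tesseract_version(path):
    """Return the first line printed by `tesseract --version`."""
    res = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    # Some older builds print the version to stderr, so check both streams
    output = res.stdout or res.stderr
    return output.splitlines()[0] if output else ""

path = find_tesseract()
if path is None:
    print("Tesseract is not on PATH - use the full path to the executable instead")
else:
    print(path, "-", tesseract_version(path))
```

If nothing is found, Tesseract may still be installed. In that case, point your script at the full path of the executable, as in the example that follows.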

 

1B) PYTHON RUN TESSERACT IN THE COMMAND LINE

1-tesseract.py
# (A) LOAD SUBPROCESS MODULE & SETTINGS - CHANGE TO YOUR OWN!
import subprocess
tes = "C:/Program Files/Tesseract-OCR/tesseract.exe"
img = "demo.png"
lang = "eng"

# (B) RUN TESSERACT COMMAND
# PASS THE ARGUMENTS AS A LIST - WORKS ON WINDOWS/LINUX/MAC & HANDLES SPACES IN PATHS
# "-" AS THE OUTPUT NAME TELLS TESSERACT TO PRINT THE TEXT TO STDOUT
cmd = [tes, img, "-", "-l", lang]
res = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)

# (C) GET TEXT
txt = res.stdout.decode("utf-8")
# @TODO - WHATEVER YOU NEED WITH THE TEXT
print(txt)

How to “gel” Tesseract and Python together:

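  • (B) Run PATH/TO/TESSERACT IMAGE.FILE - -l eng in the command line. The lone - tells Tesseract to print the text to the standard output instead of writing a file, and -l eng sets the language.
  • (C) Get the command line output as a string.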
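
If you need to convert more than one image, the two steps above can be wrapped into a reusable function. This is only a sketch, not the "official" way to do it. The names build_command and image_to_text are made up for this example, and it assumes Tesseract is either on the PATH or that you pass in the full path to the executable.

```python
import subprocess

def build_command(tesseract, image, lang="eng"):
    """Build the argument list for: tesseract IMAGE - -l LANG"""
    # "-" as the output name makes Tesseract print the text to stdout
    return [tesseract, image, "-", "-l", lang]

def image_to_text(image, lang="eng", tesseract="tesseract"):
    """Run Tesseract on one image and return the recognized text."""
    try:
        res = subprocess.run(
            build_command(tesseract, image, lang),
            capture_output=True, check=True
        )
    except FileNotFoundError:
        # The executable itself could not be started
        raise RuntimeError(f"Tesseract executable not found: {tesseract}") from None
    except subprocess.CalledProcessError as err:
        # Tesseract ran but reported an error, e.g. unreadable image
        raise RuntimeError(err.stderr.decode("utf-8", errors="replace").strip()) from None
    return res.stdout.decode("utf-8").strip()
```

For example, image_to_text("demo.png", tesseract="C:/Program Files/Tesseract-OCR/tesseract.exe") should return the same text as the script above, but raise a RuntimeError with a readable message when something goes wrong.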

 

 

SOLUTION 2) TESSERACT JS

2A) HTML PAGE

2A-tesseract-js.html
<!-- (A) FILE SELECTOR -->
<input type="file" id="select" accept="image/png, image/gif, image/webp, image/jpeg">

<!-- (B) LOAD TESSERACT -->
<!-- https://cdnjs.com/libraries/tesseract.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/tesseract.js/4.0.6/tesseract.min.js"></script>

<!-- (C) INIT -->
<script>
window.addEventListener("load", async () => {
  // (C1) GET HTML FILE SELECTOR
  const hSel = document.getElementById("select");

  // (C2) CREATE ENGLISH WORKER
  const worker = await Tesseract.createWorker();
  await worker.loadLanguage("eng");
  await worker.initialize("eng");

  // (C3) ON FILE SELECT
  hSel.onchange = async () => {
    // (C3-1) IMAGE TO TEXT
    const res = await worker.recognize(hSel.files[0]);

    // (C3-2) UPLOAD TO SERVER
    let data = new FormData();
    data.append("text", res.data.text);
    fetch("/save", { method:"post", body:data })
    .then(res => res.text())
    .then(txt => console.log(txt))
    .catch(err => console.error(err));
  };
});
</script>

If you cannot install anything on the server, here’s an alternative – Tesseract.js, a JavaScript port of Tesseract that runs in the browser using WebAssembly. The image is converted to text on the user’s device, and the page then uploads the text to the server.

 

 

2B) FLASK HTTP SERVER

2B-tesseract-js.py
# (A) INIT
# (A1) LOAD MODULES
from flask import Flask, render_template, request
 
# (A2) FLASK SETTINGS + INIT
HOST_NAME = "localhost"
HOST_PORT = 80 # PORTS BELOW 1024 MAY NEED ADMIN RIGHTS - CHANGE IF NECESSARY
app = Flask(__name__)
# app.debug = True
 
# (B) VIEWS
# (B1) "LANDING PAGE"
@app.route("/")
def index():
  return render_template("2A-tesseract-js.html")
 
# (B2) SAVE CONVERTED TEXT
@app.route("/save", methods=["POST"])
def txt():
  data = dict(request.form)
  # @TODO - WHATEVER YOU NEED WITH THE TEXT
  print(data["text"])
  return "OK"
 
# (C) START
if __name__ == "__main__":
  app.run(HOST_NAME, HOST_PORT)

Tesseract.js is client-side, so how does it work with Python? This is unfortunately a little bit roundabout:

  • Create a simple HTTP server with Flask, and serve the above Tesseract page at http://localhost.
  • When the user selects an image, the page converts it to text in the browser and sends the result as a POST request to http://localhost/save, in a form field called text.
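
What to do with the text on the server is left as a @TODO in (B2). As one possibility, assuming you simply want to keep every result as a text file, here is a small sketch. The function name save_text and the ocr-output folder are made up for this example.

```python
from datetime import datetime
from pathlib import Path

def save_text(text, folder="ocr-output"):
    """Write the text into a new timestamped .txt file and return its path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp down to microseconds, to avoid overwriting earlier results
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    out_file = out_dir / f"{stamp}.txt"
    out_file.write_text(text, encoding="utf-8")
    return out_file
```

Inside the /save view, this could be called as save_text(data["text"]) in place of the print. A busy site would probably want a database or unique IDs instead of timestamps.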

 

SOLUTION 3) GOOGLE CLOUD VISION

If all else fails, the final alternative is to use an online image-to-text recognition service – Google Cloud Vision is a good option. At the time of writing, the first 1000 units (images processed) per month are free.

  • Sign up with Google Cloud first.
  • Follow their instructions.
    • Create a new project in Google Cloud console.
    • Enable Vision API.
    • Install Google Cloud CLI.
  • All the code samples are on the instructions page, and also on GitHub.
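
To give a rough idea of what a request looks like, here is a sketch that calls the Vision REST endpoint with only the standard library. Take note that this is not the official sample, which uses Google's client library and the credentials set up by the Cloud CLI. It assumes you have created an API key for a project with the Vision API enabled, and the function names are made up for this example.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_request_body(image_bytes):
    """Build the JSON body for a TEXT_DETECTION request on one image."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
        }]
    }

def extract_text(response):
    """Return the full detected text from a decoded JSON response, or ""."""
    first = (response.get("responses") or [{}])[0]
    if "error" in first:
        raise RuntimeError(first["error"].get("message", "Vision API error"))
    # The first annotation holds the entire text, the rest are single words
    annotations = first.get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""

def image_to_text(image_path, api_key):
    """Send one image to Cloud Vision and return the recognized text."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_request_body(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}", data=body,
        headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as res:
        return extract_text(json.load(res))
```

Usage would be something like image_to_text("demo.png", "YOUR-API-KEY"). Every call counts toward your monthly quota, so test with a small image first.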

 

 


THE END

Thank you for reading, and we have come to the end. I hope that it has helped you to better understand OCR in Python, and if you have anything to share about this guide, please feel free to comment below. Good luck and happy coding!