How to operate OCR engines

This blog is part one of a comprehensive guide to Optical Character Recognition (OCR). We discuss popular open-source tools, Tesseract & EasyOCR, with hands-on tutorials on how to use the tools effectively.


Optical character recognition, or OCR, is not a new topic in the field of document understanding. OCR is a technique (both electronic and mechanical) for transferring non-editable text locked in images into machine-encoded, editable text (i.e., a "string" data type). We usually associate OCR with software; in other words, these are methods that:

  1. Accept input in the form of images, scanned documents, PDF photographs, or computer-generated files.
  2. Automatically detect the text present in pixel form and "read" and "edit" it as a human would.
  3. Convert the text to a machine-readable format so that we can search, edit, index, and further analyze otherwise unstructured data.
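
As a minimal sketch of that image-in, string-out flow, here is what it can look like in Python using the pytesseract wrapper (an assumption on my part; the rest of this post drives Tesseract from the command line, and the file name is only a placeholder):

from PIL import Image
import pytesseract

# 1. Accept an image as input (placeholder file name).
image = Image.open("sample.png")

# 2. Detect and "read" the text present in the pixels.
text = pytesseract.image_to_string(image)

# 3. The result is an ordinary, editable, searchable string.
print(text)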

The skilled practitioner's flow of OCR recognition

The task is to convert image text data to machine-readable text using OCR engines. However, ever since the 1960s, when image interpretation and computer vision were first developed, researchers have struggled to build generalized OCR systems that work across broad and vaguely defined use cases.

For example, if I showed the following image to my OCR engine, I would expect it to detect the text, recognize the text, and then encode the text as editable string data.

Input

Output => CODITATION

Why OCR is challenging

However, despite its simplicity, OCR is exceptionally hard. Although the discipline of computer vision has been around for more than 50 years (with mechanical OCR machines dating back over 100 years), we have yet to "solve" OCR and create an off-the-shelf OCR system that works in almost any situation.
There are too many factors to account for, such as noise, writing style, and image quality. We're still a long way from solving OCR. There is so much complexity in how humans share information through writing that, as a result, we assert computer vision systems will never be able to read image text with 100% reliability.
This blog would not exist if OCR had already been solved: your first Google search would have pointed you to the exact code you needed to apply OCR convincingly and correctly to your task. However, that is not the world we live in. While we're getting better at tackling OCR challenges, knowing how to apply present-day OCR engines still requires a skilled practitioner.

Open-source OCR tools and libraries

  1. Tesseract

Tesseract was created by Hewlett-Packard in the 1980s and made open source in 2005. Google took over sponsorship of the project in 2006 and has supported it ever since. Tesseract supports a wide range of natural languages, from English (initially) to Punjabi to Yiddish. Since the 2015 update it supports over 100 written languages, and it includes tooling so that it can be trained on additional languages as well. Originally a C program, it was ported to C++ in 1998. The software is headless and can only be run from the command line. It does not include a graphical user interface (GUI), but various other software packages wrap Tesseract to offer one.
Tesseract is particularly well suited to document-processing pipelines in which images are scanned and pre-processed, and optical character recognition is applied afterward.
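
For a first taste of that command-line interface, a single invocation like the one below prints the recognized text to the terminal; the -l flag selects an installed language pack (the file name here is only a placeholder, and installation is covered in the hands-on section below):

$ tesseract scanned_page.png stdout -l eng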

  2. EasyOCR

EasyOCR, as the name implies, is a Python package that enables computer vision programmers to accomplish Optical Character Recognition with ease.
The EasyOCR package is created and maintained by Jaided AI, a company that specializes in Optical Character Recognition services. EasyOCR is implemented in Python on top of the PyTorch library. If you have a CUDA-capable GPU, the underlying PyTorch deep learning library can drastically improve text detection and OCR speed. EasyOCR can currently OCR text in 58 languages, including English, German, Hindi, Russian, and others, and the EasyOCR developers intend to add more languages in the coming years. EasyOCR currently only supports OCR of typed text; the developers also intend to release a handwriting recognition system later in 2020.
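
As a minimal usage sketch (the file name is a placeholder; the language codes are EasyOCR's own):

import easyocr

# Build a reader for English; set gpu=False to force CPU-only inference.
reader = easyocr.Reader(['en'], gpu=False)

# readtext returns a list of (bounding_box, text, confidence) tuples.
results = reader.readtext('sample.png')
for bbox, text, confidence in results:
    print(f"{text} ({confidence:.2f})")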

Hands-on OCR

Tesseract

  1. Install Tesseract on the system.
    We must first configure the Tesseract library on the system before we can use it.
    On macOS, Tesseract can be installed using Homebrew.
    $ brew install tesseract

    If you're running Ubuntu, simply use apt-get to install Tesseract OCR.
    $ sudo apt-get install tesseract-ocr

    For Windows, install Tesseract using one of the official Windows installers (for example, the UB Mannheim builds) and make sure the tesseract executable is on your PATH.

  2. Check that Tesseract is installed.
    To make sure Tesseract was properly installed on your system, run the following command:
    $ tesseract -v

    Tesseract's version should be displayed on your screen, as well as a list of image file format libraries with which Tesseract is compatible.
  3. Test out Tesseract OCR, as shown below.
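    To test it, point Tesseract at any image that contains text and print the result to the terminal (the file name below is just a placeholder):
    $ tesseract test_image.png stdout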

How to Improve OCR Results

You can improve OCR accuracy by preprocessing your images with computer vision and image processing libraries like OpenCV and scikit-image. However, the question is: which algorithms and techniques do you employ? Deep learning is credited with near-perfect accuracy in almost every field of computer science, so for OCR, which deep learning models, layer types, and loss functions do you use?
Tesseract options and configurations can also be used to improve OCR accuracy, and machine learning can be used to denoise images before OCR. Tesseract performs several image processing operations internally (via the Leptonica library) before running OCR. It usually does a fine job of this, but there will undoubtedly be cases where it falls short, resulting in a significant drop in accuracy. Image pre-processing techniques such as rescaling, binarisation, noise removal, dilation or erosion, rotation or deskewing, border handling, and transparency/alpha-channel handling all improve the final OCR inference.

In the case of complex images, Tesseract may yield no results at all, or try to OCR the text and fail miserably, returning illogical output. I was annoyed when I couldn't get the correct OCR result: I had no idea when and how to use the various options, and half of them remained a mystery because the documentation was so thin and lacked actual examples.
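
As a rough sketch of the pre-processing steps mentioned above using OpenCV (the scale factor, kernel size, and thresholds are arbitrary assumptions you would tune per document, and the file names are placeholders):

import cv2

# Load the scanned page and convert it to grayscale.
image = cv2.imread("scanned_page.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Rescaling: enlarging small text often helps Tesseract.
gray = cv2.resize(gray, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)

# Noise removal with a light median blur.
gray = cv2.medianBlur(gray, 3)

# Binarisation via Otsu thresholding.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Save the cleaned-up image and feed this file to tesseract instead.
cv2.imwrite("preprocessed.png", binary)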

The lesson I learned, and perhaps one of the most common mistakes I see newcomers to OCR making now, is failing to fully understand how Tesseract's page segmentation modes (PSMs) can strongly impact the correctness of your OCR output.

When working with the Tesseract OCR engine, you must become acquainted with Tesseract's PSMs; without them, you will easily become frustrated and will be unable to achieve high OCR accuracy.

Simply supply the --help-psm argument to tesseract to get a list of the 14 PSMs. Skilled practitioners can then play with Tesseract's page segmentation options depending on the input data. To see the details of the Tesseract PSM options, run: $ tesseract --help-psm

Figure 1: PSM option detail descriptions
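
For quick reference, the 14 modes are roughly as follows (run tesseract --help-psm on your installation for the authoritative wording):

0   Orientation and script detection (OSD) only.
1   Automatic page segmentation with OSD.
2   Automatic page segmentation, but no OSD, or OCR.
3   Fully automatic page segmentation, but no OSD. (Default)
4   Assume a single column of text of variable sizes.
5   Assume a single uniform block of vertically aligned text.
6   Assume a single uniform block of text.
7   Treat the image as a single text line.
8   Treat the image as a single word.
9   Treat the image as a single word in a circle.
10  Treat the image as a single character.
11  Sparse text. Find as much text as possible in no particular order.
12  Sparse text with OSD.
13  Raw line. Treat the image as a single text line, bypassing Tesseract-specific hacks.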

Let's play with the input type and the PSM options. 

CASE 1: We just want to verify the orientation of the text present in the input image below.

Figure 2: Just need orientation of text

It is pretty simple using tesseract PSM option 0, and the execution command is $ tesseract <image path> stdout --psm 0

Figure 3: Output of the PSM 0 option

You can see that the orientation of the input is 0 degrees (it could also be 90, 180, or 270 degrees depending on the input), and Tesseract also returns its confidence in the detected script (i.e., writing system), such as Latin, Han, Cyrillic, etc.
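
If you prefer to stay in Python, the same orientation-and-script report is available through the pytesseract wrapper (again an assumption; this post itself uses the CLI):

import pytesseract
from PIL import Image

# image_to_osd returns the same OSD report as --psm 0:
# orientation, the rotation needed to correct it, the script, and confidences.
print(pytesseract.image_to_osd(Image.open("aboutcoditation_rotated.png")))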

 Figure 4: Just need the orientation of the text

$ tesseract aboutcoditation_rotated.png stdout --psm 0

Figure 5: Just need the orientation of the text

You can see in the output window of Figure 5 that the orientation for Figure 4 is 270 degrees, and if you want to correct it, just rotate by 90 degrees in the reverse direction, which is also given in the output as the 'Rotate' value. You may be wondering where the OCR text is: --psm 0 does not perform OCR at all; it only performs orientation and script detection (OSD). In short, if you only need information about the text rather than the text itself, --psm 0 is the mode to use. Now let's move toward the title of the blog.
CASE 2: This time we want the actual text in the image of Figure 2, and that is not possible with PSM 0. Is there another choice? Yes, the next number, 1:

$ tesseract aboutcoditation_rotated.png stdout --psm 1

Figure 6: OCR text of Figure 2

Awesome, you have taken your first baby steps with the OCR engine. However, if you look at the output, there is no OSD information this time. Now let's take another step.


CASE 3: Tesseract's default PSM is 3, so if I use that one for Figure 2, will it give me some improvement? The answer is yes. So skilled practitioners are supposed to start with PSM 3. Now let's take the simplest case.
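
If you are driving Tesseract from Python, the same PSM experiments can be run through the pytesseract wrapper (an assumption; the post otherwise uses the CLI) by passing the mode in the config string:

import pytesseract
from PIL import Image

image = Image.open("aboutcoditation.png")  # placeholder file name

# The default corresponds to --psm 3; override it per call while experimenting.
print(pytesseract.image_to_string(image, config="--psm 3"))
print(pytesseract.image_to_string(image, config="--psm 6"))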


CASE 4: A single-digit number, depicted in Figure 7. As we said, start with the default option --psm 3; unfortunately, the result is empty! So we need to experiment with other options, and testing with PSM 6, 7, 8, 9, 10, and 13 gives the expected text. However, you are better off going with PSM 10 only, as per its remark: treat the image as a single character.

Figure 7: One-digit number

$ tesseract 4.png stdout --psm 6

$ tesseract 4.png stdout --psm 7

$ tesseract 4.png stdout --psm 8

$ tesseract 4.png stdout --psm 9

$ tesseract 4.png stdout --psm 10

$ tesseract 4.png stdout --psm 13

$ tesseract 4.png stdout --psm 3

Figure 8: Result for Figure 7

These use cases will be discussed in more detail in my next blog. Stay Tuned!

Hi, my name is Kiran Kamble. When I am done analyzing data, I play badminton and cricket, and weekends are meant for hiking.

