How to operate OCR engines - II

This blog explores advanced Optical Character Recognition (OCR) use cases with the Tesseract engine, reviews Tesseract's Page Segmentation Modes (PSMs), and provides guidance on when to use each.

In our previous blog, we covered the basics of OCR and two popular open-source tools, Tesseract and EasyOCR, with hands-on tutorials on how to use them effectively. In this blog, we will talk about some advanced use cases that we may encounter with OCR.

CASE 1:

Take an input in a distinctive italic style (Figure 1). This one is a little more challenging: none of the Tesseract PSM options recognizes it fully, although a few of them give a result that is 80-90% accurate, as depicted in Figure 2. In that output, “i” is read as “t” and “t” as “k”. For now, Tesseract has no option that handles this scenario, so you can either re-train the Tesseract model for this kind of input or use a commercial OCR engine.

Figure 1: Italic style

$ tesseract italic.png stdout --psm 8

Figure 2: Figure 1 output

CASE 2: Consider Figure 3, which is a receipt from the grocery store. Let’s try to OCR this image using the default (--psm 3) mode:

Figure 3: Whole Foods Market receipt we will OCR.

$ tesseract receipt.png stdout --psm 3

Figure 4: Result on Figure 3 with PSM 3

$ tesseract receipt.png stdout --psm 4

Figure 5: Result on Figure 3 with PSM 4

That did not go so well with the default --psm 3 mode (Figure 4). In this mode, Tesseract cannot infer that we are looking at column data and that text within the same row must be associated together.
To address this issue, we can use the --psm 4 mode (Figure 5). As you can see, the results are far superior. Tesseract understands that text should be grouped row-by-row, enabling us to OCR the receipt's items.
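
The PSM 3 versus PSM 4 comparison above can be repeated for any image with a small shell helper. This is only a sketch: `try_psms` is a hypothetical name introduced here, and it assumes the `tesseract` binary is on your PATH and that the image file exists.

```shell
# try_psms IMAGE PSM [PSM...]
# Hypothetical helper: runs Tesseract on IMAGE once per page segmentation
# mode and prints each result under a header line for easy comparison.
try_psms() {
    image=$1
    shift
    for psm in "$@"; do
        printf '===== --psm %s =====\n' "$psm"
        # Same invocation as the commands in this post; diagnostics are hidden.
        tesseract "$image" stdout --psm "$psm" 2>/dev/null ||
            printf '(tesseract failed for --psm %s)\n' "$psm"
    done
}

# Example (assumes receipt.png is in the current directory):
# try_psms receipt.png 3 4
```

Reading the outputs one after another makes it easier to see which mode keeps the items and prices of a row together.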

A note on sparse text (see Figures 9 and 10 in CASE 4): PSM 12 is essentially identical to PSM 11, but it additionally performs orientation and script detection (OSD).

CASE 3: Now we will try an interesting and challenging input from an automatic license/number plate recognition (ANPR) system, shown in Figure 6. Unfortunately, PSM 3 doesn't work for this input, whereas PSM 7, which treats the image as a single text line, gives the correct result. PSM 8 gives the same result. The difference between the two is that PSM 7 assumes a single line and PSM 8 a single word, so you can select either of them based on your input type.

Figure 6: A license plate we will OCR

$ tesseract numberplate1.png stdout --psm 3

$ tesseract numberplate1.png stdout --psm 7

Figure 7: Result on Figure 6.
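
In an ANPR script, the recognized text is usually compared against a plate number, so it can help to normalize it first. The sketch below is an assumption on our part, not something Tesseract provides: `normalize_plate` is a hypothetical helper that upper-cases the OCR output and keeps only the letters A-Z and the digits 0-9.

```shell
# normalize_plate: read OCR text on stdin and print it as one line that
# contains only upper-case A-Z and 0-9.
# Hypothetical post-processing helper; it is not part of Tesseract.
normalize_plate() {
    # Upper-case first, then delete every byte outside A-Z and 0-9
    # (this also removes spaces, punctuation and newlines).
    LC_ALL=C tr 'a-z' 'A-Z' | LC_ALL=C tr -cd 'A-Z0-9'
    # Terminate the cleaned text with a single newline.
    printf '\n'
}

# Example (assumes numberplate1.png exists and tesseract is on PATH):
# tesseract numberplate1.png stdout --psm 7 | normalize_plate
```

Whether dropping spaces and hyphens is appropriate depends on the plate format you are matching against, so treat the character set as a starting point.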

CASE 4: Text scattered across rows and columns, i.e. sparse text, as depicted in Figure 8. For this kind of input we can again start with the default PSM 3, but PSM 11 is best suited because it is specifically designed for sparse text recognition. For the experiment, refer to Figures 9 and 10.

Figure 8: Sparse text

$ tesseract sadhgurubook_chapter.png stdout --psm 3

Figure 9: Figure 8 OCR using PSM 3

$ tesseract sadhgurubook_chapter.png stdout --psm 11

Figure 10: Figure 8 OCR using PSM 11 

Now let's try some bigger hurdles: “CASE 5: Handwritten text” and “CASE 6: Image in table form”, shown in Figures 11 and 13 respectively. For CASE 5, our experimentation shows that Tesseract's PSM 9 option works well (Figure 12). However, handwriting that is a little harder to read is not recognized even with PSM 9. That is why full handwritten OCR is still a research topic.

Figure 11: Handwritten text image

$ tesseract handwriten.png stdout --psm 3

$ tesseract handwriten.png stdout --psm 9

Figure 12: Result of Figure 11

Moving on to the table image, Figure 13 presents the top 10 highest-scoring teams in ODI cricket as a table. When the input is a table, we expect the output to be in table format as well. Unfortunately, neither PSM 3 nor PSM 11 gives us that, and the output is depicted in Figure 14. To handle inputs such as the handwritten text and the table image, some image pre-processing will be necessary. I will address this in an additional blog post in the near future.

Figure 13: Top 10 highest-scoring teams in ODI cricket, in table image format

$ tesseract tabel.png stdout --psm 3

$ tesseract tabel.png stdout --psm 11

Figure 14: Result of Figure 13

Summary 

Tesseract offers many PSM options. Each of Tesseract's fourteen PSMs assumes something about your source image: a block of content (e.g. a scanned book), a single line of text (e.g. a single statement from an article), or a single word (e.g. a license plate). The skill lies in selecting the correct option for the desired output, and here I have presented various cases to guide that choice.

OCR is often used in traffic monitoring and video surveillance applications for number plate recognition. If you want an open-source engine such as Tesseract or EasyOCR for this task, Tesseract with PSM 7 or 8 is currently the preferred choice. In billing receipt digitization, if we need an invoice in Excel or Word format for further accounting, we can use Tesseract with PSM 4. However, a few non-ASCII characters present in an invoice may be missed, and you can ignore them by applying a filter in your script.

Before applying any PSM option, check the mode descriptions with tesseract --help-psm, start with the default PSM 3, and then try the rest as per the PSM descriptions. The more experience you gain with PSMs, the easier it will be to apply OCR to your own tasks.
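
The filter mentioned for receipts can be as small as one `tr` call. The sketch below is a minimal example under our own assumptions: `strip_non_ascii` is a hypothetical name, and it keeps only tab, newline, carriage return and printable ASCII, which may be too strict if your invoices contain currency symbols you need.

```shell
# strip_non_ascii: copy stdin to stdout, keeping only tab, newline,
# carriage return and printable ASCII (space through tilde).
# Hypothetical helper for cleaning OCR output; not part of Tesseract.
strip_non_ascii() {
    # Octal escapes: \11 tab, \12 newline, \15 carriage return,
    # \40-\176 space through tilde. LC_ALL=C makes tr work byte by byte.
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Example (assumes receipt.png exists and tesseract is on PATH):
# tesseract receipt.png stdout --psm 4 | strip_non_ascii > receipt.txt
```

The cleaned text file can then be imported into a spreadsheet or word processor for further accounting.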

Hi, my name is Kiran Kamble. When I am done analyzing data, I play badminton and cricket and weekends are meant for hiking.
