Detect only horizontal text with Tesseract

Question

I have an image with some horizontal and some vertical text, and I'm detecting the text using Tesseract OCR. Here is the array Tesseract returns:

'text': ['', '', '', '', 'Some', 'other', 'text', 'horizontal', '', '', '', 'JEDIY9A', ']xO]', 'WOPUeI', 'BWOS', 'SI', 'SIUL']

As you can see, it only detects the horizontal text correctly. Is there a way to force Tesseract to detect only horizontal text? Later I would rotate the image by 90 degrees and pass it in again to detect the vertical text (which is now horizontal).

Or is there a simpler solution?

Sachin Rajput · Accepted Answer · 2020-12-26T14:29:47

Read about Tesseract's page segmentation modes; you will find it there. One of the valid --psm values does exactly what you want:

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
                        bypassing hacks that are Tesseract-specific.

Try --psm 6 or --psm 12.
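
If you are calling Tesseract through pytesseract (an assumption on my part, since your output looks like what image_to_data returns), a rough sketch of the two-pass idea from the question could look like this. The names confident_words and ocr_both_orientations and the min_conf threshold of 60 are made up for illustration, and whether 90 or 270 degrees is the right rotation depends on which way your vertical text reads. Filtering on confidence is only a heuristic; it may or may not drop the garbled vertical words in your image.

```python
def confident_words(data, min_conf=60.0):
    """Keep non-empty words whose confidence is at least min_conf.

    `data` is a dict with parallel "text" and "conf" lists, the shape
    pytesseract.image_to_data returns with Output.DICT.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may arrive as a string, int or float depending on the version
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


def ocr_both_orientations(image_path, psm=6, angle=90):
    """Run OCR on the image as-is, then again after rotating it."""
    # Imported here so confident_words can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    config = f"--psm {psm}"
    image = Image.open(image_path)

    # Pass 1: the original image, where the horizontal text should read well.
    upright = pytesseract.image_to_data(
        image, config=config, output_type=pytesseract.Output.DICT
    )

    # Pass 2: rotate so the vertical text becomes horizontal.
    # expand=True grows the canvas so nothing is cropped.
    rotated = image.rotate(angle, expand=True)
    turned = pytesseract.image_to_data(
        rotated, config=config, output_type=pytesseract.Output.DICT
    )

    return confident_words(upright), confident_words(turned)
```

You would call it as horizontal, vertical = ocr_both_orientations("page.png"), where page.png stands in for your own file, and then tune psm and min_conf against your images.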

Alternatively, the answer to "How do I detect vertical text with OpenCV for extraction" describes a solution that could work for you.
