I started off writing a simple script to read data from an image. Here is my Ruby code that uses RTesseract to read it:

require 'rtesseract'
require 'mini_magick'

RTesseract.configure do |config|
    config.processor = "mini_magick"
end

image = RTesseract.new("myImage.jpg")
puts image.to_s

I started off with this image:

enter image description here

The results that came back were: 132B 4.

I understand why the 0 came back as a B (I can work around that), but the following 3, 0 and 8 were not returned at all. Tesseract clearly can read a 3 and a 0, because it did so in the first number. I assumed it had trouble with the later digits, so I converted the image to black and white.

This is the second image I tried:

enter image description here

However the results still came back as: 132B 4.

Finally I cut the image and just tried the final 3 numbers.

Here is the image:

enter image description here

But when I ran the script, it returned no result. Any thoughts on why I am not able to read the final numbers?

I'm using Ruby 2.2.2, RTesseract 2.1.0, MiniMagick 4.5.1 and Tesseract 3.04.01.

Turn the image into black text on white and remove image compression artifacts, a la @eric-duminil's suggestion. Anecdotally, for a consistent and known font, I've had better accuracy just doing naive pixel-diff matching per character on my own. - Kache
@Kache: Sounds interesting. Do you have any link? - Eric Duminil
@EricDuminil Ah, I don't have a link. It was a very naive method: (1) modify and cut up the text into normalized black-on-white characters, (2) data-mine all possible character images and variations that may appear for the font, (3) select the character with the fewest pixel-by-pixel differences, using some tricks to avoid counting every pixel of every character (e.g. character pixel height/width, number of black/white pixels, etc.). - Kache
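The matching step described in that last comment could look roughly like the sketch below. It is not Kache's code: the tiny 3x5 bitmaps, the TEMPLATES table and the method names are all made up for illustration, and a real version would first have to cut the image into normalized per-character bitmaps.

```ruby
# Naive per-character pixel-diff matching (illustrative sketch only).
# Glyphs are equal-sized bitmaps: arrays of strings, '#' = ink, '.' = background.
# TEMPLATES is a hypothetical, hand-made reference set for a known font.
TEMPLATES = {
  "0" => ["###",
          "#.#",
          "#.#",
          "#.#",
          "###"],
  "3" => ["###",
          "..#",
          "###",
          "..#",
          "###"],
  "8" => ["###",
          "#.#",
          "###",
          "#.#",
          "###"]
}.freeze

# Count the positions at which two same-sized bitmaps differ.
def pixel_diff(a, b)
  a.zip(b).sum do |row_a, row_b|
    row_a.chars.zip(row_b.chars).count { |pa, pb| pa != pb }
  end
end

# Return the template character with the fewest differing pixels.
def best_match(glyph, templates = TEMPLATES)
  templates.min_by { |_char, bitmap| pixel_diff(glyph, bitmap) }.first
end

# A "3" with one pixel of noise in the bottom-right corner.
noisy_three = ["###",
               "..#",
               "###",
               "..#",
               "##."]
puts best_match(noisy_three)
```

The shortcuts mentioned in the comment (comparing width, height or ink counts first) would only be there to skip templates early; the full pixel count above is the fallback they avoid.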

1 Answer

I tested your script on my Linux Mint 17 machine with Tesseract 3.03, Ruby 2.1.5 and MiniMagick 4.5.1.

It also returns 132B 4.

If you're sure the image contains only digits, you could try:

image = RTesseract.new("myImage.jpg", options: :digits)

It returns 13223 4.

Running tesseract without any parameters prints a list of the available options. "pagesegmode 7" looks interesting: 7 = Treat the image as a single text line.

So:

image = RTesseract.new("myImage.jpg", options: :digits, psm: 7)

It returns 13223 4 3 21 8.

With your second image, it returns 3 21 8.
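To check whether the result depends on RTesseract itself, the same two options can be tried against the tesseract binary directly. The sketch below makes several assumptions: that "-psm" is the right spelling for Tesseract 3.x (later major versions use "--psm"), that "stdout" is accepted as the output base, and that the stock "digits" config file is installed. The method names are made up for illustration.

```ruby
require "open3"

# Build the argument list for the tesseract 3.x command line.
# Assumptions: "-psm" is the 3.x flag, "stdout" is a valid output base,
# and "digits" is the stock config file that restricts output to digit-like characters.
def tesseract_command(image_path, psm: 7, digits: true)
  cmd = ["tesseract", image_path, "stdout", "-psm", psm.to_s]
  cmd << "digits" if digits
  cmd
end

# Run tesseract and return the recognized text without surrounding whitespace.
def read_text(image_path, **opts)
  out, _err, status = Open3.capture3(*tesseract_command(image_path, **opts))
  raise "tesseract failed for #{image_path}" unless status.success?
  out.strip
end

# Only run OCR when an image path is given on the command line.
puts read_text(ARGV[0]) if ARGV[0]
```

If the command line and RTesseract give the same text, the remaining errors come from the image rather than from the wrapper.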

I think the biggest problem now is that the JPG artifacts are pretty strong and the contrast is relatively low between digits and background. A PNG image would probably yield better results.

With GIMP, I resized the image to a height of 200px, cropped close to the digits to remove some artifacts, applied Colors/Threshold at 150, inverted the image and saved it as PNG:

enter image description here

RTesseract returns:

1320 4 3 0 8

With ImageMagick, this command achieved the same result:

convert myImage.jpg -geometry x200 -threshold 13% -negate myImage.png
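If you would rather run that preprocessing step from the Ruby script, one option is to shell out to the same convert command. This is only a sketch: it assumes ImageMagick's convert is on the PATH, and preprocess_command is a hypothetical helper name, with the height and threshold turned into parameters purely for illustration.

```ruby
# Build the ImageMagick argument list for the command shown above.
# The defaults mirror that command; change them to experiment.
def preprocess_command(input, output, height: 200, threshold: "13%")
  ["convert", input,
   "-geometry", "x#{height}",  # resize to the given height, keeping the aspect ratio
   "-threshold", threshold,    # binarize: every pixel becomes pure black or white
   "-negate",                  # invert black and white
   output]
end

# Only run the conversion when input and output paths are given,
# e.g. ruby preprocess.rb myImage.jpg myImage.png
if ARGV.size == 2
  ok = system(*preprocess_command(ARGV[0], ARGV[1]))
  abort "convert failed" unless ok
end
```

Passing the arguments to system as separate strings avoids shell quoting problems with file names. The resulting PNG can then be handed to RTesseract.new in place of the original JPG.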