2
votes

I am trying to use the ABBYY OCR SDK (ABBYY.com) from PHP to recognize business cards. I have the following code just to check how it works. When I execute it, I get blank output. Where could I be going wrong in the code?


$applicationId = "MyBusinessCardReader";
$password = "password";
$filename = "businesscard.jpg";
$localDir = dirname(__FILE__);
$url = "http://cloud.ocrsdk.com/processBusinessCard";

$c = curl_init();
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_USERPWD, "$applicationId:$password");
curl_setopt($c, CURLOPT_POST, 1);

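$post_array = array(
  // Note the "/" between directory and file name; CURLFile (PHP 5.5+) replaces the old "@path" upload syntax
  "my_file" => new CURLFile("$localDir/$filename")
);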

curl_setopt($c, CURLOPT_POSTFIELDS, $post_array);
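$response = curl_exec($c);
if ($response === false) {
  // Show the cURL error instead of failing silently with blank output
  $response = "cURL error: " . curl_error($c);
}
curl_close($c);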

echo "<pre>";
echo $response;
echo "</pre>";

The sample business card image can be seen at http://test.goje87.com/vangal/businesscard.jpg

1 Answer

3
votes

I don't know much about the ABBYY SDK, but before you try any OCR engine on an image, you should always make sure to...

  • ...crop away all borders whose coloring differs from the rest of the image,
  • ...scale the image so that the text reaches a (virtual) size of at least 10 pt at 300 DPI.
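
As a rough way to pick a scale factor for the second point: 10 pt at 300 DPI corresponds to about 10 × 300 / 72 ≈ 42 pixels of text height. The sketch below assumes you have measured the approximate text height in your image yourself; the function name `scale_percent` and the example value of 23 pixels are hypothetical and not taken from the sample card.

```shell
#!/bin/sh
# Sketch: estimate the scale percentage that brings text up to roughly
# 10 pt at 300 DPI. One point is 1/72 inch, so the target height in
# pixels is 10 * 300 / 72 (about 41.7 px).
# The measured height is an assumption you supply; it is not detected here.
scale_percent() {
    measured_px=$1
    awk -v h="$measured_px" 'BEGIN { printf "%.0f\n", (10 * 300 / 72) / h * 100 }'
}

# Example: text measured at about 23 px tall (hypothetical value)
scale_percent 23   # prints 181
```

The result can be passed to ImageMagick's -scale option as a percentage. Treat it as a starting point only, since point size and measured glyph height do not correspond exactly.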

I tried Tesseract v3.01 against your original sample, and it didn't find anything.

Then I applied an ImageMagick command to crop the borders and scale the image to 180% like this:

convert                 \
    businesscard.jpg    \
   -crop 440x200+30+120 \
   -scale 180%          \
    cropped+scaled-businesscard.jpg

to get this picture:

[Image: cropped business card]

This already lets Tesseract's commandline recognize most of the text (it fails on @ and .):

tesseract cropped+scaled-businesscard.jpg bcard && cat bcard.txt

  Tesseract Open Source OCR Engine v3.01 with Leptonica

    Fe/<70"
    MIKE FARAG
    PH 913 284 6455
    EM milzeocreatefervoncom
    Tw 0mil<efarag01
    createfervoncom

I could most likely get Tesseract's recognition rate close to 100% if I'd...

  • ...enhance the picture quality for OCR purposes: increase contrast and convert to pure black and white ('binarization');
  • ...'train' Tesseract on the specific font used in this document.
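
The contrast and binarization step could be sketched with ImageMagick roughly as follows. I have not run this against the sample card; the function name `preprocess_for_ocr`, the `DRY_RUN` switch, and the 50% threshold are illustrative assumptions, and the threshold in particular would need tuning per image.

```shell
#!/bin/sh
# Sketch: convert to grayscale, stretch contrast, then binarize
# (every pixel becomes pure black or pure white) before running OCR.
# The 50% threshold is an assumed starting value, not a tested one.
preprocess_for_ocr() {
    in=$1
    out=$2
    # When DRY_RUN is set and non-empty, the command is printed instead of executed.
    ${DRY_RUN:+echo} convert "$in" -colorspace Gray -normalize -threshold 50% "$out"
}

# Usage (requires ImageMagick):
#   preprocess_for_ocr cropped+scaled-businesscard.jpg bw-businesscard.png
```

Writing the result as PNG avoids JPEG compression artifacts creeping back into the two-color image.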

I assume that you can make ABBYY's life easier with similar measures...