How to use Computer Vision API to recognize runners' bib numbers

Question

I would like to use the Microsoft Cognitive Services Computer Vision API to recognize bib numbers on photos of runners in a race, either single runners or a reasonably small number of individual runners.

Is that a task that the OCR function should be able to handle? I have tried a couple samples with the "getting started" program and the testing console, and it returns an empty array of regions. Am I doing something wrong, or is that beyond its capabilities?

I tried handwritten OCR with a picture and got an acceptable result: i68.tinypic.com/2jduva.png — Maria Ines Parnisari

Semih Korkmaz · Accepted Answer · 2017-05-21T23:57:49

First, check whether your image meets the API's requirements.

Supported image formats: JPEG, PNG, GIF, BMP. Image file size must be less than 4MB. Image dimensions must be between 40 x 40 and 3200 x 3200 pixels, and the image cannot be larger than 10 megapixels.
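
The limits above can be checked locally before uploading. This is only a minimal sketch: the function name check_image and the constants are made up for illustration, and it assumes "4MB" means 4 * 1024 * 1024 bytes and that you already know the image's format, file size and pixel dimensions.

```python
# Limits quoted above; the exact byte interpretation of "4MB" is an assumption.
SUPPORTED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_FILE_BYTES = 4 * 1024 * 1024
MIN_SIDE, MAX_SIDE = 40, 3200
MAX_PIXELS = 10_000_000  # 10 megapixels


def check_image(fmt, size_bytes, width, height):
    """Return a list of violated limits; an empty list means the image fits."""
    problems = []
    if fmt.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_FILE_BYTES:
        problems.append("file is 4MB or larger")
    if not (MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE):
        problems.append("dimensions outside 40x40 to 3200x3200")
    if width * height > MAX_PIXELS:
        problems.append("more than 10 megapixels")
    return problems


print(check_image("JPEG", 1_000_000, 1920, 1080))
print(check_image("PNG", 100, 3200, 3200))
```

Note that a 3200 x 3200 image passes the per-side limit but still exceeds 10 megapixels, so both checks are needed.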

OCR systems generally make a few assumptions:

Images are not rotated by more than a certain angle; in Microsoft's case the limit is 40 degrees.

Text detection is still an active research topic, and detecting text in the wild can be challenging. The image in Maria's comment, for example, is fairly simple: the text is black and white, and the photo is taken from

Here, I share two photos:

A bad one for OCR: http://www.athletico.com/blog2/wp-content/uploads/2012/04/Runners.jpg

Here is the output for this image from the Microsoft Cognitive Services Vision OCR API:

{
  "language": "zh-Hant",
  "textAngle": 6.0999999999999641,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "1441,490,51,41",
      "lines": [
        {
          "boundingBox": "1441,490,51,41",
          "words": [
            {
              "boundingBox": "1441,490,51,41",
              "text": "39"
            }
          ]
        }
      ]
    }
  ]
}

A good one for OCR:

http://running.competitor.com/files/2014/04/HappyRunner-Raleigh14.jpg

And now let's see the output from the same API:

{
  "language": "en",
  "textAngle": -2.900000000000035,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "1597,1824,585,576",
      "lines": [
        {
          "boundingBox": "1654,1824,528,67",
          "words": [
            {
              "boundingBox": "1654,1829,211,62",
              "text": "7?.cek"
            },
            {
              "boundingBox": "2146,1824,36,52",
              "text": "Y'"
            }
          ]
        },
        {
          "boundingBox": "1603,1889,551,98",
          "words": [
            {
              "boundingBox": "1603,1889,551,98",
              "text": "RALEIGH"
            }
          ]
        },
        {
          "boundingBox": "1695,1990,370,37",
          "words": [
            {
              "boundingBox": "1695,1990,79,35",
              "text": "1/2"
            },
            {
              "boundingBox": "1794,1993,271,34",
              "text": "marathon"
            }
          ]
        },
        {
          "boundingBox": "1742,2052,138,26",
          "words": [
            {
              "boundingBox": "1742,2052,105,23",
              "text": "presented"
            },
            {
              "boundingBox": "1856,2053,24,25",
              "text": "by"
            }
          ]
        },
        {
          "boundingBox": "1798,2099,156,21",
          "words": [
            {
              "boundingBox": "1798,2099,65,17",
              "text": "APRIL"
            },
            {
              "boundingBox": "1872,2101,26,19",
              "text": "13,"
            },
            {
              "boundingBox": "1905,2101,49,15",
              "text": "2014"
            }
          ]
        },
        {
          "boundingBox": "1597,2160,536,159",
          "words": [
            {
              "boundingBox": "1597,2160,536,159",
              "text": "19401"
            }
          ]
        },
        {
          "boundingBox": "1749,2368,101,32",
          "words": [
            {
              "boundingBox": "1749,2368,101,32",
              "text": "benefiting"
            }
          ]
        }
      ]
    }
  ]
}
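
Once a response like the one above comes back, the bib number still has to be picked out from the other text on the bib. Here is a rough sketch of one way to do that, assuming bib numbers are purely numeric with 1 to 5 digits and are usually the largest digits printed on the bib. The function name find_bib_candidates and the pattern are my own, not part of the API, and the sample is a trimmed copy of the response above.

```python
import re

# Assumption: a bib number is 1 to 5 digits and nothing else.
BIB_PATTERN = re.compile(r"\d{1,5}")


def find_bib_candidates(response):
    """Return digit-only words from an OCR response, largest bounding box first."""
    candidates = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                text = word["text"]
                if BIB_PATTERN.fullmatch(text):
                    # boundingBox is "left,top,width,height"
                    _, _, width, height = map(int, word["boundingBox"].split(","))
                    candidates.append((text, width * height))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [text for text, _ in candidates]


# Trimmed copy of the second response shown above.
sample = {
    "regions": [{
        "boundingBox": "1597,1824,585,576",
        "lines": [
            {"words": [{"boundingBox": "1603,1889,551,98", "text": "RALEIGH"}]},
            {"words": [{"boundingBox": "1872,2101,26,19", "text": "13,"},
                       {"boundingBox": "1905,2101,49,15", "text": "2014"}]},
            {"words": [{"boundingBox": "1597,2160,536,159", "text": "19401"}]},
        ],
    }]
}

print(find_bib_candidates(sample))  # ['19401', '2014']
```

Sorting by box area puts 19401 ahead of the year 2014, but this is only a heuristic; dates or sponsor text printed in large digits would fool it.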

Far better! One might think the second image is harder to recognize, but here is the difference: geometric image transformations (https://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/geometry/geo-tran.html), especially affine transformations, are still very hard for computers to handle, whereas our brains deal with them at a very high success rate.

Therefore, OCR does well on text that faces the camera, but it can easily fail on text that has undergone such transformations.
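
To make "affine transformation" concrete, here is a small illustration of what a shear and a rotation do to the four corners of a rectangular bib. It is only a sketch for intuition: the 200 x 100 bib size, the helper name affine and the chosen parameters are all made up, and this is not how the API processes images.

```python
import math


def affine(point, a, b, c, d, tx=0.0, ty=0.0):
    """Map (x, y) to (a*x + b*y + tx, c*x + d*y + ty)."""
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


# Corners of a hypothetical 200 x 100 bib facing the camera.
bib = [(0, 0), (200, 0), (200, 100), (0, 100)]

# Horizontal shear: the rectangle becomes a slanted parallelogram.
sheared = [affine(p, 1, 0.5, 0, 1) for p in bib]
print(sheared)  # [(0.0, 0.0), (200.0, 0.0), (250.0, 100.0), (50.0, 100.0)]

# Rotation by 30 degrees about the origin.
theta = math.radians(30)
rotated = [affine(p, math.cos(theta), -math.sin(theta),
                  math.sin(theta), math.cos(theta)) for p in bib]
print([(round(x, 1), round(y, 1)) for x, y in rotated])
```

Digits printed on the sheared or rotated bib are distorted in the same way as its corners, which is the kind of input the OCR struggles with.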
