Different text detection result from Google Vision API

Question

Get different result for text detection from .Net code and demo app for the same image google vision api result and .net result

this is my code:

            var response = vision.Images.Annotate(
            new BatchAnnotateImagesRequest()
            {
                Requests = new[]
                {
                    new AnnotateImageRequest()
                    {
                        Features = new[]
                        {
                            new Feature()
                            {
                                Type =
                                    "TEXT_DETECTION"
                            }
                        },
                        Image = image
                    }
                }
            }).Execute();

I'm facing the same issue with PHP library, result is different between browser and REST API call, why? — Emil Borconi

Jon Skeet Jon Skeet · Accepted Answer · 2018-01-05T16:19:53

As noted in Emil's answer, you want the DOCUMENT_TEXT_DETECTION feature rather than TEXT_DETECTION. However, you can do it all rather more simply than with the current code.

Rather than using Google.Apis.Vision.V1 (which it looks like you're doing, and which uses the REST endpoint), I'd suggest using Google.Cloud.Vision.V1 (which uses the gRPC endpoint, and is hopefully easier to use. Disclaimer: I work on the latter library). You can do most work with the REST endpoint, mind you.

Here's a complete example using the latter library.

using Google.Cloud.Vision.V1;
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var client = ImageAnnotatorClient.Create();
        var image = Image.FromUri("https://i.stack.imgur.com/H21rL.png");
        var annotations = client.DetectDocumentText(image);
        var paragraphs = annotations.Pages
            .SelectMany(page => page.Blocks)
            .SelectMany(block => block.Paragraphs);
        foreach (var para in paragraphs)
        {
            var box = para.BoundingBox;
            Console.WriteLine($"Bounding box: {string.Join(" / ", box.Vertices.Select(v => $"({v.X}, {v.Y})"))}");
            var symbols = string.Join("", para.Words.SelectMany(w => w.Symbols).SelectMany(s => s.Text));
            Console.WriteLine($"Paragraph: {symbols}");
            Console.WriteLine();
        }
    }
}

That loses the spaces between the symbols, but shows that all the text is being detected - and the method call to perform the actual detection is very simple:

var client = ImageAnnotatorClient.Create();
var image = Image.FromUri("https://i.stack.imgur.com/H21rL.png");
var annotations = client.DetectDocumentText(image);

Most of the code above is handling the response.

Different text detection result from Google Vision API

2 Answers