I am using Google Cloud Vision OCR to detect text. The displayed output always contains (1) the full detected text, followed by (2) each detected word on its own line. I only want to display the full detected text.

I am using the sample code from the Google Cloud Platform GitHub repository, where I set the feature type to text detection with labelDetection.setType("TEXT_DETECTION"); in the callCloudVision method.

I also modified the convertResponseToString method to:

private String convertResponseToString(BatchAnnotateImagesResponse response) {
        // getTextAnnotations() can be null when no text was detected,
        // so check before iterating rather than inside the loop.
        List<EntityAnnotation> labels = response.getResponses().get(0).getTextAnnotations();
        if (labels == null) {
            return "nothing";
        }

        // Append every annotation: the full text first, then each word.
        StringBuilder message = new StringBuilder();
        for (EntityAnnotation label : labels) {
            System.out.println(label.getDescription());
            message.append(label.getDescription()).append("\n");
        }
        return message.toString();
    }

This is my build.gradle:

apply plugin: 'com.android.application'

android {
    compileSdkVersion 25
    defaultConfig {
        applicationId "com.example.mhci"
        minSdkVersion 24
        targetSdkVersion 25
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
        multiDexEnabled true
        javaCompileOptions {
            annotationProcessorOptions {
                includeCompileClasspath false
            }
        }
    }
    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
        }
    }
    packagingOptions {
        exclude 'META-INF/LICENSE'
        exclude 'META-INF/io.netty.versions.properties'
        exclude 'META-INF/INDEX.LIST'
        exclude 'META-INF/DEPENDENCIES'
    }
}

dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    implementation 'com.android.support.constraint:constraint-layout:1.0.2'
    testCompile 'junit:junit:4.12'

    androidTestCompile('com.android.support.test.espresso:espresso-core:3.0.1', {
        exclude group: 'com.android.support', module: 'support-annotations'
    })

    compile 'com.android.support:appcompat-v7:25.1.1'
    compile 'com.android.support:design:25.4.0'

    compile 'com.google.api-client:google-api-client-android:1.20.0' exclude module: 'httpclient'
    compile 'com.google.http-client:google-http-client-gson:1.20.0' exclude module: 'httpclient'

    compile 'com.google.apis:google-api-services-vision:v1-rev2-1.21.0'

    compile 'com.android.support:design:25.4.0'

    compile ('com.google.apis:google-api-services-translate:v2-rev47-1.22.0') {
        exclude group: 'com.google.guava'
    }

    compile ('com.google.cloud:google-cloud-translate:0.5.0') {
        exclude group: 'io.grpc', module: 'grpc-all'
        exclude group: 'com.google.protobuf', module: 'protobuf-java'
        exclude group: 'com.google.api-client', module: 'google-api-client-appengine'
    }
}

The text displayed for my test image is:

Hello world

Hello
world

But I want it to display only Hello world.

How can I do it?

Did you try TextAnnotation? – krishank Tripathi
@krishankTripathi You mean replacing EntityAnnotation with TextAnnotation? – k8892
Yes, use TextAnnotation and go through this link: developers.google.com/resources/api-libraries/documentation/… – krishank Tripathi
Once you get the page from that method, you can detect the block of words, or use the block method in Page, described here: developers.google.com/resources/api-libraries/documentation/… – krishank Tripathi
The code I am using does not have TextAnnotation in it, and I do not have the external library for it. – k8892

1 Answer

Text detection should work for this use case. The extra lines appear because the API returns both the full text and each of its parts; that is simply how it is designed. Suppose a picture shows two street signs side by side ("Main Street" and "Park Avenue"): you would want the API to break what it sees into parts so that the result makes more sense. If it returned only the single string "Main Street Park Avenue", that information would not be very useful. That is why it always returns the whole text first and then all of its parts, so that a query over the returned strings still finds the relevant pictures.

So if you trust it to read the text properly and in its entirety, you can simply use the first result in the returned list instead of all three. Alternatively, you can implement some logic that displays only the longest and most trusted results.
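The "longest result" idea could be sketched roughly as follows. This is only an illustration under assumptions: the class LongestAnnotation and the method longest are hypothetical names, and the input is a plain list of strings standing in for the values you would get from label.getDescription().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the Vision client library.
final class LongestAnnotation {

    // Returns the longest description, or "nothing" for a null or empty list.
    static String longest(List<String> descriptions) {
        if (descriptions == null || descriptions.isEmpty()) {
            return "nothing";
        }
        return Collections.max(descriptions, Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        // Stand-in for the three descriptions from the question.
        List<String> descriptions = Arrays.asList("Hello world", "Hello", "world");
        System.out.println(longest(descriptions)); // prints "Hello world"
    }
}
```

Picking by length is only a heuristic; it assumes the full text is never shorter than any single word, which holds for the example in the question.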

In other words, manipulate the returned list, List<EntityAnnotation> labels, and extract the kind of result you want. In your particular case, don't display the entire list; just take the first value in the list and you will have what you want.
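Taking only the first value might look something like the sketch below. It is written against a plain list of strings so that it runs without the Vision library; the class FirstAnnotation and the method fullText are hypothetical names, and the strings stand in for label.getDescription(). The trim() call is an assumption that the full-text entry may carry a trailing newline.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Vision client library.
final class FirstAnnotation {

    // Returns only the first description (the full text), trimmed,
    // or "nothing" when the list is null or empty.
    static String fullText(List<String> descriptions) {
        if (descriptions == null || descriptions.isEmpty()) {
            return "nothing";
        }
        return descriptions.get(0).trim();
    }

    public static void main(String[] args) {
        // Stand-in for the three annotations from the question.
        List<String> descriptions = Arrays.asList("Hello world\n", "Hello", "world");
        System.out.println(fullText(descriptions)); // prints "Hello world"
    }
}
```

In convertResponseToString, the equivalent change would be to return labels.get(0).getDescription() after checking that labels is neither null nor empty, instead of looping over every entry.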