
I am currently developing a commercial application. I need to add Chinese character and word detection, but the Scene Text Detection functions seem to detect only English characters and words. I searched on Google and found nothing relevant.

I will feed a scanned A4 page image to the application, and it should find specific Chinese words based on pre-set conditions. For example, the image contains the word "你好" ("Hello" in Chinese) twice, but the application should extract it only once and save it as a string, namely where it meets the pre-set condition of appearing next to the label 姓名 (Name).
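Once the page has been turned into text by some OCR engine, the "only the 你好 next to 姓名" rule itself is plain string matching. Below is a minimal sketch in Python for brevity, assuming the OCR output is a list of text lines and that the label may be followed by an optional parenthesized gloss such as "(Name)" and an ASCII or full-width colon. `extract_labeled_value` is a hypothetical helper, not part of OpenCV or EmguCV.

```python
import re


def extract_labeled_value(lines, label="姓名"):
    """Return the first value that follows `label` on an OCR'd line, or None.

    Matches e.g. "姓名(Name): 你好" or "姓名：你好". Lines without the label
    (such as "Greeting: 你好") are ignored, so a word that appears several
    times on the page is extracted only once, from the labeled position.
    """
    pattern = re.compile(
        re.escape(label)
        + r"\s*(?:[(（][^)）]*[)）])?"  # optional gloss: "(Name)" or full-width "（Name）"
        + r"\s*[:：]\s*"               # ASCII or full-width colon
        + r"(\S+)"                     # the value: next run of non-space characters
    )
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(1)  # stop at the first hit: extract once
    return None
```

With the illustration below, `extract_labeled_value(["Greeting: 你好", "姓名(Name): 你好"])` returns `"你好"` from the second line only. The value is assumed to be a single space-free token; a multi-word value would need a different capture group.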

Here is a small illustration of the example:

Greeting: 你好

姓名(Name): 你好 <--- detect only this word

Could someone with solid experience in OpenCV or EmguCV please help me out?

If a custom dataset is needed to achieve this, could someone guide me on how to train a word-detection model on it in OpenCV or EmguCV?

I would recommend taking a look at github.com/tesseract-ocr/tesseract. This is an OCR engine that can detect text in scanned documents. The newest version has an already trained neural network. OpenCV has a wrapper for it. youtube.com/watch?v=vtSGSXKggEo - TruckerCat
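To make the Tesseract suggestion concrete: Tesseract has trained models for Chinese (language codes `chi_sim` for Simplified and `chi_tra` for Traditional), usually installed as separate language-data files, and the pytesseract package can return every recognized word together with its bounding box. The sketch below assumes Python with pytesseract and Pillow installed; `ocr_lines` and `group_lines` are hypothetical helper names. It rebuilds the page into text lines so that "next to the 姓名 label" can be checked line by line.

```python
from collections import defaultdict


def group_lines(data):
    """Rebuild text lines from the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).

    Words are grouped by Tesseract's (block_num, par_num, line_num) and
    ordered left to right by their bounding-box `left` coordinate.
    """
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        word = word.strip()
        if not word:  # page/block/paragraph/line rows carry empty text
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], word))
    # Lines in Tesseract's layout order, words sorted by x position.
    return [" ".join(w for _, w in sorted(words)) for _, words in sorted(lines.items())]


def ocr_lines(image_path, lang="chi_sim+eng"):
    """OCR a scanned page with Tesseract.

    Assumes the Tesseract binary, the chi_sim language data, and the
    pytesseract and Pillow packages are installed.
    """
    # Imported here so group_lines() can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), lang=lang, output_type=pytesseract.Output.DICT
    )
    return group_lines(data)
```

Usage would be something like `ocr_lines("scan.png")`, where `"scan.png"` is a placeholder path. `"chi_sim+eng"` asks Tesseract to use both the Chinese and English models, since the example page mixes "姓名" with "(Name)".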

1 Answer


(OpenCV or EmguCV is not your solution.) You need a deep neural network (DNN), built with a framework such as TensorFlow.
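As an illustration of how a DNN reads a line of text: one common design is a CRNN-style recognizer trained with CTC loss, which outputs, for each horizontal slice of a text-line image, a probability for every character in its charset plus a special "blank" class. For Chinese the charset runs to thousands of characters, so any custom training data has to cover every character you expect to read. The sketch below shows only the final greedy CTC decoding step, not the network itself; the blank-at-index-0 convention and the tiny four-character charset are assumptions for illustration.

```python
BLANK = 0  # assumed convention: class 0 is the CTC blank, class k maps to charset[k - 1]


def ctc_greedy_decode(frame_probs, charset):
    """Greedy (best-path) CTC decoding.

    frame_probs: one list of class probabilities per time step,
                 each of length len(charset) + 1 (blank included).
    """
    # Most likely class at each time step (first index wins on ties).
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out = []
    prev = BLANK
    for idx in best:
        # Collapse consecutive repeats and drop blanks; a blank between two
        # equal classes lets the same character be emitted twice.
        if idx != BLANK and idx != prev:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)
```

For example, per-frame best classes `[1, 1, 0, 2, 2, 0, 0]` with charset `"你好姓名"` decode to `"你好"`. TensorFlow offers `tf.nn.ctc_greedy_decoder` for the same step on batched logits; note that its default blank is the last class rather than index 0.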