I want to convert scanned images to black and white in order to reduce their file size before they are transferred over the internet for OCR.
The normal binarisation (black and white conversion) offered by scanners and general image-editing software produces undesirable results.
Lots of random black pixels are left behind that are really just noise from binarisation. This causes the OCR to try to recognise characters where there are none, or to insert full stops, colons, etc. after characters.
What can I use in OpenCV to binarise an image so that lines, characters and dark areas stay solid while pixel noise in white areas is reduced?
I've toyed with cvThreshold and cvAdaptiveThreshold, but the results have not been great so far.
As an example, check out this original image and the desired result.
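To pin down what I mean by the desired result, here is a toy sketch in plain Python (deliberately not OpenCV). The function names binarise and remove_specks, the threshold of 128 and the min_size of 3 are all hypothetical choices made for illustration. It describes the behaviour I'm after, not necessarily how it should be implemented.

```python
from collections import deque

def binarise(gray, threshold=128):
    """Global threshold: pixels darker than `threshold` become 0, others 255."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]

def remove_specks(binary, min_size=3):
    """Turn black 4-connected blobs smaller than `min_size` pixels white."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] != 0 or seen[y][x]:
                continue
            # Flood-fill to collect every pixel of this black blob.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and out[ny][nx] == 0:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small blobs are treated as noise; large ones (lines, text) are kept.
            if len(blob) < min_size:
                for by, bx in blob:
                    out[by][bx] = 255
    return out

# Made-up 4x7 greyscale patch: one dark line (row 2) and one stray dark pixel.
gray = [
    [240, 240, 240, 240, 240,  90, 240],
    [240, 240, 240, 240, 240, 240, 240],
    [ 20,  25,  30,  20,  25,  30,  20],
    [240, 240, 240, 240, 240, 240, 240],
]
noisy = binarise(gray)          # the stray pixel at row 0, column 5 survives as black
cleaned = remove_specks(noisy)  # the stray pixel is gone, the line in row 2 stays solid
```

In other words, the dark line survives untouched while the isolated pixel in the white area is dropped.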