I am working with Tesseract and have created a box file for the characters in an image. When I run tesseract from the command line, it detects the characters and records the position of each detected character in the box file.
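Each line of a Tesseract box file holds one character, then its left, bottom, right and top pixel coordinates and a page number. The origin is the bottom-left corner of the image. Here is a minimal sketch of reading such a file in Python. It assumes that six-field layout, and `BoxEntry`, `parse_box_line` and `read_box_file` are hypothetical names:

```python
from typing import List, NamedTuple


class BoxEntry(NamedTuple):
    """One line of a Tesseract box file (origin at the image's bottom-left)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> BoxEntry:
    # Split from the right so only the five trailing numeric fields are
    # separated off, leaving the character field intact.
    char, left, bottom, right, top, page = line.rstrip("\r\n").rsplit(" ", 5)
    return BoxEntry(char, int(left), int(bottom), int(right), int(top), int(page))


def read_box_file(path: str) -> List[BoxEntry]:
    # Skip blank lines; every other line is expected to describe one box.
    with open(path, encoding="utf-8") as f:
        return [parse_box_line(line) for line in f if line.strip()]
```

For example, `parse_box_line("A 286 1979 325 2002 0")` gives an entry with `left=286`, `bottom=1979`, `right=325` and `top=2002`. The page value 0 is only illustrative.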
Here is the command-line output:
/Desktop $ tesseract spa.arial.first_page.tif spa.arial.box nobatch box.train .stderr
read_params_file: Can't open .stderr
Tesseract Open Source OCR Engine v4.0.0-146-gc39a with Leptonica
Page 1
Detected 74 diacritics
row xheight=2, but median xheight = 17.4815
row xheight=2.5, but median xheight = 17.4815
row xheight=91, but median xheight = 17.4815
row xheight=2.5, but median xheight = 17.4815
row xheight=3, but median xheight = 17.4815
row xheight=61.875, but median xheight = 17.4815
row xheight=23, but median xheight = 17.4815
row xheight=3, but median xheight = 17.4815
row xheight=3, but median xheight = 17.4815
row xheight=12.8333, but median xheight = 17.4815
row xheight=15.1282, but median xheight = 17.4815
row xheight=3.5, but median xheight = 17.4815
row xheight=3.5, but median xheight = 17.4815
row xheight=3.5, but median xheight = 17.4815
row xheight=628, but median xheight = 17.4815
row xheight=415.5, but median xheight = 17.4815
row xheight=4, but median xheight = 17.4815
row xheight=630, but median xheight = 17.4815
FAIL!
APPLY_BOXES: boxfile line 7/A ((286,1979),(325,2002)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 11/U ((199,1943),(239,1967)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 14/R ((298,1943),(323,1967)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 16/M ((325,1943),(360,1967)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1611/a ((849,451),(875,480)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1617/5 ((947,457),(973,480)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1622/. ((1038,457),(1042,460)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1839/a ((679,280),(705,303)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1860/u ((1030,274),(1063,304)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1865/p ((1113,274),(1133,304)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1876/a ((1303,275),(1329,302)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1879/, ((1362,275),(1365,282)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1886/c ((1467,278),(1494,301)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1889/d ((1542,277),(1551,300)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1892/h ((1569,277),(1595,300)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1895/c ((619,245),(645,268)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1910/n ((888,245),(920,262)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1911/l ((941,245),(949,267)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 1913/e ((981,239),(997,267)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: Unlabelled word at :Bounding box=(133,887)->(1631,893)
APPLY_BOXES: Unlabelled word at :Bounding box=(132,569)->(1631,575)
APPLY_BOXES: Unlabelled word at :Bounding box=(132,484)->(1631,491)
APPLY_BOXES: Unlabelled word at :Bounding box=(1408,418)->(1470,479)
APPLY_BOXES: Unlabelled word at :Bounding box=(132,413)->(1630,420)
APPLY_BOXES: Unlabelled word at :Bounding box=(1238,346)->(1415,400)
APPLY_BOXES: Unlabelled word at :Bounding box=(1408,359)->(1476,425)
APPLY_BOXES: Unlabelled word at :Bounding box=(133,341)->(1628,348)
APPLY_BOXES: Unlabelled word at :Bounding box=(133,205)->(137,1461)
APPLY_BOXES: Unlabelled word at :Bounding box=(598,203)->(602,1034)
APPLY_BOXES: Unlabelled word at :Bounding box=(133,200)->(1629,208)
APPLY_BOXES: Unlabelled word at :Bounding box=(1628,200)->(1633,1460)
Found 1698 good blobs.
Leaving 59 unlabelled blobs in 0 words.
21 remaining unlabelled words deleted.
Generated training data for 353 words
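The `APPLY_BOXES ... FAILURE!` lines above can be extracted from the log to show which characters Tesseract could not match. This sketch assumes the output has been captured as a string and that every failure line follows the format shown here. The numbers in parentheses appear to be the box-file coordinates `((left,bottom),(right,top))`. The regex and `failed_boxes` are my own and are not part of Tesseract:

```python
import re
from typing import List, Tuple

# Matches e.g. "APPLY_BOXES: boxfile line 7/A ((286,1979),(325,2002)): FAILURE! ..."
FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile line (\d+)/(.+?) "
    r"\(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)


def failed_boxes(log_text: str) -> List[Tuple[int, str, Tuple[int, ...]]]:
    """Return (box-file line number, character, (left, bottom, right, top)) per failure."""
    results = []
    for m in FAILURE_RE.finditer(log_text):
        line_no, char = int(m.group(1)), m.group(2)
        coords = tuple(int(g) for g in m.group(3, 4, 5, 6))
        results.append((line_no, char, coords))
    return results
```

The non-greedy `(.+?)` also handles punctuation characters such as `,` and `.`, which appear in the failures above.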
I want to draw a box around each detected blob. I have searched but could not find a reference for how to do this. Can anybody help me draw the blobs on the image the box file was created from?
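A minimal sketch of drawing the boxes from the box file itself with OpenCV, instead of re-running OCR. It assumes a single-page image, the standard six-field box format with a bottom-left origin, and 1-based box-file line numbers for the optional `highlight` set (for example, the line numbers reported in the `APPLY_BOXES ... FAILURE!` messages above). `draw_box_file` and `to_top_left` are hypothetical helper names:

```python
from typing import Optional, Set, Tuple


def to_top_left(left: int, bottom: int, right: int, top: int,
                img_height: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Convert bottom-left-origin box coords to OpenCV's top-left-origin corners."""
    return (left, img_height - top), (right, img_height - bottom)


def draw_box_file(image_path: str, box_path: str, out_path: str,
                  highlight: Optional[Set[int]] = None) -> int:
    """Draw every box from a Tesseract box file onto the image and save it.

    Boxes whose (assumed 1-based) line number is in `highlight` are drawn
    in red, all others in green. Returns the number of boxes drawn.
    """
    import cv2  # imported here so to_top_left works without OpenCV installed

    img = cv2.imread(image_path)  # reads only the first page of a multi-page TIFF
    if img is None:
        raise FileNotFoundError(image_path)
    h = img.shape[0]
    highlight = highlight or set()
    drawn = 0
    with open(box_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            # Box line: "<char> <left> <bottom> <right> <top> <page>"
            _, left, bottom, right, top, _page = line.rstrip("\r\n").rsplit(" ", 5)
            p1, p2 = to_top_left(int(left), int(bottom), int(right), int(top), h)
            color = (0, 0, 255) if line_no in highlight else (0, 255, 0)  # BGR
            cv2.rectangle(img, p1, p2, color, 1)
            drawn += 1
    cv2.imwrite(out_path, img)
    return drawn
```

A call such as `draw_box_file(image, box, "boxes.png", highlight={7, 11, 14, 16})` would mark the first few failed boxes from the log in red. Substitute your own image and box-file paths.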
I have tried the Python code below, which uses pytesseract to draw a box around each character:
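import cv2
import pytesseract

file = '/home/Desktop/second_page.png'
img = cv2.imread(file)
h, w, _ = img.shape

# Each line of image_to_boxes output is "<char> <left> <bottom> <right> <top> <page>",
# measured from the bottom-left corner, so y values are flipped with h - y for OpenCV.
boxes = pytesseract.image_to_boxes(img)
for b in boxes.splitlines():
    b = b.split(' ')
    img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)

cv2.imshow(file, img)  # 'filename' was undefined; use the path as the window title
cv2.waitKey(0)
cv2.destroyAllWindows()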
Output I got: