I am trying to detect text part of an image(jpg file) using Tesseract-OCR and OpenCV in Python. The text part of the imageis Turkish, therefore I am using 'Turkish trained data (tur)' which is in Tesseract-OCR file. I have applied dilation and erosion to remove the noise before using tesseract.
The problem is, eventhough some of the characters in particular areas can be detected, the detection is mostly unsuccesful and fails to detect Turkish characters. Do you know any method or do you have any suggestion to get more success. Here are my codes below :
import pytesseract
from PIL import Image
import cv2
img= cv2.imread('C:\Users\gulsa\Desktop\Tesseract-OCR\alm98_2.jpg')
img = Image.open('alm98_2.jpg')
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-
OCR/tesseract'
tex = pytesseract.image_to_string(Image.open('alm98_2.jpg'),lang='tur')
print(tex)
Thank you in advance!