I'm using the pytesseract library to create an OCR translation Discord bot, but the output from Tesseract is about 90% gibberish and I don't understand why.
The image I'm feeding it is already cropped to the area I want to read. I have tried converting the image to grayscale via PIL, but then pytesseract outputs nothing at all.
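In case it helps, here is the kind of preprocessing I could try instead of the `LA` conversion. This is only a sketch on my part: the `preprocess` helper, the 2x upscale, and the threshold of 150 are guesses I haven't verified against this image. My understanding is that `LA` keeps an alpha channel, which may be what makes Tesseract return nothing, while plain `L` is single-channel grayscale.

```python
from PIL import Image

def preprocess(image, threshold=150):
    # Use single-channel grayscale ('L'); the extra alpha channel
    # from 'LA' may be what makes Tesseract return nothing.
    gray = image.convert('L')
    # Upscale 2x -- Tesseract reportedly does better on larger glyphs.
    gray = gray.resize((gray.width * 2, gray.height * 2), Image.LANCZOS)
    # Binarize to pure black and white; 150 is an arbitrary cutoff.
    return gray.point(lambda p: 255 if p > threshold else 0)
```

I would then pass `preprocess(image)` to `pytesseract.image_to_string` instead of the raw image.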
I'm using the latest versions of both pytesseract (0.2.7) and Tesseract (v5 alpha).
I use the following code to fetch the image from the internet, pass it through Tesseract, and later (commented out) translate the text:
from io import BytesIO

import requests
import pytesseract
from PIL import Image
from translate import Translator

translator = Translator(from_lang="autodetect", to_lang="en")

# Fetch the (already cropped) screenshot.
response = requests.get('https://image.prntscr.com/image/acqm3LDeSJOHtUZEMfA9eA.png')
image = Image.open(BytesIO(response.content))
# image = image.convert('LA')   # grayscale attempt -- makes the output empty
# image.save('greyscale.png')

# Run OCR with the French language model.
string = pytesseract.image_to_string(image, lang='fra')
print(string)

# translation = translator.translate(string)
# print(translation)
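I also came across Tesseract's page segmentation modes. I'm not sure this is the fix, but since the screenshot is already cropped, `--psm 6` ("assume a single uniform block of text") supposedly fits better than the default, and pytesseract accepts it through its `config` parameter. The `ocr_options` helper below is my own hypothetical wrapper, untested against this image:

```python
# Hypothetical helper: collect the pytesseract options in one place so
# different page-segmentation modes are easy to try.
def ocr_options(psm=6, lang='fra'):
    # --psm 6 assumes a single uniform block of text, which should
    # match an already-cropped screenshot; --oem 1 selects the LSTM
    # engine that the v5 alpha ships.
    return {'lang': lang, 'config': f'--psm {psm} --oem 1'}

# Usage (needs the tesseract binary installed, so not run here):
# string = pytesseract.image_to_string(image, **ocr_options())
```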
The output I get from Tesseract can be found here: https://pastebin.com/kDYuTE4Q
I'm entirely new to both Tesseract and Python, so I may be doing something fundamentally wrong, or I may be asking something of Tesseract that simply isn't possible at the moment.