I am working on OCR in Python with pytesseract. What I am trying to do is read the text in an image, extract it, and store the extracted text in a txt or csv file using file handling.
I want to read multiple files, store their text, and check whether the text of the image I am about to read and store already exists in the txt file.
Here is my code, which runs without any error. The last lines are what I was trying to do, but they don't seem to work. Can anyone help me out with this?
Thanks in advance.
import cv2
import pytesseract,csv,re,os
from PIL import Image
from ast import literal_eval
img = pytesseract.image_to_string(Image.open("test1.png"), lang="eng")
print(img)
with open('C:\\Users\\Hasan\\Videos\\Captures\\saved.csv', "w") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(img)
string = open('C:\\Users\\Hasan\\Videos\\Captures\\saved.csv').read()
new_str = re.sub('[^a-zA-z0-9\n\.]', ' ', string)
open('C:\\Users\\Hasan\\Videos\\Captures\\saved.csv', "w").write(new_str)
# f = open("saved.csv", "r")
# read = f.readline()
# print("\n" + f.read())
with open('C:\\Users\\Hasan\\Videos\\Captures\\saved.csv') as sv:
    for line in sv:
        if img in line:
            print("Data already exists")
        else:
            print("file saved successfully")
Writing new_str creates a new row whenever it encounters a newline character. So while iterating in the last step, the variable line holds only a single line of the text, whereas img contains the entire extracted text. – Satheesh K
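One possible way around the problem described in the comment is to store each image's text as a single line, so that a line read back from the file can be compared against the whole extracted text. The following is only a minimal sketch under that assumption, not a definitive fix: the helper names normalize and save_if_new, the sample string, and the temporary saved.txt file are hypothetical and stand in for the OCR result and the real output path. It also assumes the character class was meant to be a-zA-Z rather than a-zA-z.

```python
import os
import re
import tempfile


def normalize(text):
    """Collapse OCR output to one line: keep letters, digits and dots, squeeze whitespace."""
    # Assumption: the intent is to keep only letters, digits and dots.
    cleaned = re.sub(r"[^a-zA-Z0-9.]", " ", text)
    return " ".join(cleaned.split())


def save_if_new(path, text):
    """Append the normalized text to `path` unless an identical line is already stored.

    Returns True if the text was written, False if it already existed.
    """
    record = normalize(text)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = {line.rstrip("\n") for line in f}
    else:
        existing = set()
    if record in existing:
        return False
    # Append mode keeps the records saved from earlier images.
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
    return True


# Hypothetical stand-in for the OCR result (the string called img in the question).
sample = "Invoice No. 42\nTotal: 19.99"
demo_path = os.path.join(tempfile.mkdtemp(), "saved.txt")

first = save_if_new(demo_path, sample)
second = save_if_new(demo_path, sample)
print("file saved successfully" if first else "Data already exists")
print("file saved successfully" if second else "Data already exists")
```

In the real script, the text passed to save_if_new would be the result of pytesseract.image_to_string(Image.open("test1.png"), lang="eng"), and the path would be the saved file from the question. Opening the file in append mode rather than "w" matters here, because "w" truncates the file and discards everything stored from earlier images before the check can run.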