I have a .jpg image of a table that I am trying to extract to Excel using Python.
I am following an example from here:
I have hit a problem, though: the horizontal lines separating the rows are not being detected. In the source image (above) you can see that the horizontal lines are much lighter than the vertical ones, but they are visible, and I believe they should still be detected.
I have tried changing the cv2.threshold values in almost every way I can think of, but nothing has any effect on the returned image (see below). For example, both of these:
- `thresh, img_bin = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)`
- `thresh, img_bin = cv2.threshold(img, 0, 256, cv2.THRESH_BINARY | cv2.THRESH_OTSU)`

produce the same image. Here is my full code:
```python
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import csv

try:
    from PIL import Image
except ImportError:
    import Image  # fallback for very old standalone PIL installs
import pytesseract

# read the file as a single-channel (grayscale) image
file = r'venv/images/iiCrop.jpg'
img = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError(f"Could not read {file}")
print(img.shape)

# threshold the image to binary; `thresh` holds the threshold value actually used
thresh, img_bin = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# invert so lines and text are white on a black background
img_bin = 255 - img_bin
cv2.imwrite('venv/images/cv_inverted.png', img_bin)

# plot the image to see the output
plt.imshow(img_bin, cmap='gray')
plt.show()
```
Is there something obvious (or not so obvious) that I am doing wrong?