I'm trying to build a CBIR system and recently wrote a program in Python using OpenCV functions that lets me query a local database of images and return a result (followed this tutorial). I now need to link this up with another web scraping module (used Scrapy) wherein I output ~1000 links to images online. These images are scattered throughout the web and should be input to the first OpenCV module. Is it possible to perform calculations on this online image set without downloading it ?
These are the steps I followed for the OpenCV module
1) Define the region-based color image descriptor
2) Extract features from dataset (Indexing) (dataset to be passed as command line argument)
# import the necessary packages
import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')
from colordescriptor import ColorDescriptor
import argparse
import glob
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required = True,
help = "Path to the directory that contains the images to be indexed")
ap.add_argument("-i", "--index", required = True,
help = "Path to where the computed index will be stored")
args = vars(ap.parse_args())
# initialize the color descriptor
cd = ColorDescriptor((8, 12, 3))
# open the output index file for writing
output = open(args["index"], "w")
# use glob to grab the image paths and loop over them
for imagePath in glob.glob(args["dataset"] + "/*.jpg"):
# extract the image ID (i.e. the unique filename) from the image
# path and load the image itself
imageID = imagePath[imagePath.rfind("/") + 1:]
image = cv2.imread(imagePath)
# describe the image
features = cd.describe(image)
# write the features to file
features = [str(f) for f in features]
output.write("%s,%s\n" % (imageID, ",".join(features)))
# close the index file
output.close()
3) Deifning the similarity metric
# import the necessary packages
import numpy as np
import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')
import csv
class Searcher:
def __init__(self, indexPath):
# store our index path
self.indexPath = indexPath
def search(self, queryFeatures, limit = 5):
# initialize our dictionary of results
results = {}
# open the index file for reading
with open(self.indexPath) as f:
# initialize the CSV reader
reader = csv.reader(f)
# loop over the rows in the index
for row in reader:
# parse out the image ID and features, then compute the
# chi-squared distance between the features in our index
# and our query features
features = [float(x) for x in row[1:]]
d = self.chi2_distance(features, queryFeatures)
# now that we have the distance between the two feature
# vectors, we can udpate the results dictionary -- the
# key is the current image ID in the index and the
# value is the distance we just computed, representing
# how 'similar' the image in the index is to our query
results[row[0]] = d
# close the reader
f.close()
# sort our results, so that the smaller distances (i.e. the
# more relevant images are at the front of the list)
results = sorted([(v, k) for (k, v) in results.items()])
# return our (limited) results
return results[:limit]
def chi2_distance(self, histA, histB, eps = 1e-10):
# compute the chi-squared distance
d = 0.5 * np.sum([((a - b) ** 2) / (a + b + eps)
for (a, b) in zip(histA, histB)])
# return the chi-squared distance
return d
`
4) Perform the actual search
# import the necessary packages
from colordescriptor import ColorDescriptor
from searcher import Searcher
import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')
import argparse
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--index", required = True,
help = "Path to where the computed index will be stored")
ap.add_argument("-q", "--query", required = True,
help = "Path to the query image")
ap.add_argument("-r", "--result-path", required = True,
help = "Path to the result path")
args = vars(ap.parse_args())
# initialize the image descriptor
cd = ColorDescriptor((8, 12, 3))
# load the query image and describe it
query = cv2.imread(args["query"])
features = cd.describe(query)
# perform the search
searcher = Searcher(args["index"])
results = searcher.search(features)
# display the query
cv2.imshow("Query", query)
# loop over the results
for (score, resultID) in results:
# load the result image and display it
result = cv2.imread(args["result_path"] + "/" + resultID)
cv2.imshow("Result", result)
cv2.waitKey(0)
And the final command line command is:
python search.py --index index.csv --query query.png --result-path dataset
where index.csv is the file generated after step 2 on the database of images. query.png is my query image and dataset is the folder containing the ~100 images.
So is it possible to modify the indexing such that I don't need a local dataset and to be querying and indexing can be done directly from the list of URLs ?
downloading procedure
? How to work onunowned
file(s) ? – dsgdfg