I would like to read compressed files directly from Google Cloud Storage and open them with the Python csv package. The code for a local file would be:
    import csv
    import gzip

    def reader(self):
        print "reading local compressed file: ", self._filename
        # Keep a reference to the file object so it can be closed later.
        self._localfile = gzip.open(self._filename, 'rb')
        csvReader = csv.reader(self._localfile, delimiter=',', quotechar='"')
        return csvReader
I have played with several GCS APIs (JSON based, cloud.storage), but none of them seem to give me something that I can stream through gzip. What is more, even if the file were uncompressed, I could not open it and pass it to csv.reader, which expects an iterator that yields lines.
My compressed CSV files are about 500 MB each, and they expand to a few GB uncompressed. I don't think it would be a good idea to (1) download the files locally before opening them (unless I can overlap download and computation), or (2) load them entirely into memory before computing.
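To make the goal concrete, here is a rough sketch of the kind of thing I am after. It uses only the standard library and is written for Python 3. The names `iter_decompressed_lines`, `csv_rows` and `chunks` are made up for illustration: `chunks` stands for any iterable of compressed byte strings, and how to get such an iterable from GCS is exactly the part I am missing. It also assumes a single-member gzip file encoded as UTF-8.

```python
import csv
import zlib


def iter_decompressed_lines(chunks, encoding="utf-8"):
    """Yield text lines from an iterable of gzip-compressed byte chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""
    for chunk in chunks:
        pending += decompressor.decompress(chunk)
        lines = pending.split(b"\n")
        # The last piece may be an incomplete line; keep it for the next chunk.
        pending = lines.pop()
        for line in lines:
            yield line.decode(encoding) + "\n"
    # Emit whatever is left once the compressed stream is exhausted.
    pending += decompressor.flush()
    while pending:
        line, sep, pending = pending.partition(b"\n")
        yield (line + sep).decode(encoding)


def csv_rows(chunks):
    """csv.reader accepts any iterable of strings, so feed it the line generator."""
    return csv.reader(iter_decompressed_lines(chunks), delimiter=",", quotechar='"')
```

With something like this, only one compressed chunk and one partial line would be held in memory at a time, so download and parsing could overlap.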
Finally, I currently run this code on my local machine, but I will ultimately move it to AppEngine, so it must work there too.
Thanks!!