I want to read gzip-compressed files into an RDD[String] using an equivalent of sc.textFile("path/to/file.Z"). However, my file extension is not gz but Z, so the file is not recognised as being gzipped.
I cannot rename the files, as that would break production code. I do not want to copy them, as they are massive and numerous. I guess I could use some kind of symlinks, but I want to see if there is a way to do this with Scala/Spark first (I am on my local Windows machine for now).
How can I read this file efficiently?