I'm using the Apache Commons Compress library to iterate over .tar.gz files. My question: when I iterate over a tar file with getNextTarEntry(), can I always assume that each non-directory TarArchiveEntry belongs to the most recent directory entry returned before it? I'm having trouble explaining this in plain English, so here is a code sample:
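    import java.io.FileInputStream;
    import java.util.zip.GZIPInputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

    // Most recent directory entry seen so far
    TarArchiveEntry currentDirEntry = null;
    try (
        FileInputStream fileInputStream = new FileInputStream(tarFile);
        GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
        TarArchiveInputStream tarArchiveInputStream = new TarArchiveInputStream(gzipInputStream)) {
        TarArchiveEntry tarArchiveEntry;
        while (null != (tarArchiveEntry = tarArchiveInputStream.getNextTarEntry())) {
            if (tarArchiveEntry.isDirectory()) {
                currentDirEntry = tarArchiveEntry;
            } else {
                // Is tarArchiveEntry always a "child" of currentDirEntry?
            }
        }
    }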
My problem is that I'm dealing with huge .tar.gz files (several GB, containing more than 100k files), and I don't want to parse the parent directory name (it contains important information) for every single file. I'd like to parse a directory name once and assume that all following entries are children of that directory. When I hit the next directory entry, the process starts over.
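To make the idea concrete, here is a minimal sketch of the caching I have in mind, written against plain entry names (the strings returned by getName()) so it runs without Commons Compress. Everything in it is hypothetical: CurrentDirTracker, its methods, and the parser function are placeholders for my real directory-name parsing, and it assumes entry names use '/' as the separator. Instead of trusting the ordering blindly, it throws if a file turns out not to be a direct child of the most recent directory entry.

```java
import java.util.function.Function;

// Hypothetical helper, not part of Commons Compress; it works on entry names only.
final class CurrentDirTracker<T> {
    private final Function<String, T> parser; // placeholder for the real directory-name parsing
    private String currentDir;                // name of the most recent directory entry, or null
    private T currentInfo;                    // result of parsing currentDir
    private int parseCount;                   // how many times the parser has run

    CurrentDirTracker(Function<String, T> parser) {
        this.parser = parser;
    }

    // Call for a directory entry: parse its name once and remember the result.
    void onDirectory(String dirName) {
        currentDir = dirName.endsWith("/") ? dirName : dirName + "/";
        currentInfo = parser.apply(currentDir);
        parseCount++;
    }

    // True if fileName is a direct child of the most recent directory entry.
    boolean isChildOfCurrentDir(String fileName) {
        return currentDir != null
                && fileName.startsWith(currentDir)
                && fileName.indexOf('/', currentDir.length()) < 0;
    }

    // Call for a file entry: reuse the cached result, or fail loudly
    // if the ordering assumption does not hold for this archive.
    T onFile(String fileName) {
        if (!isChildOfCurrentDir(fileName)) {
            throw new IllegalStateException(
                    "Entry " + fileName + " is not a direct child of " + currentDir);
        }
        return currentInfo;
    }

    int parseCount() {
        return parseCount;
    }
}
```

In the loop above, I would call onDirectory(tarArchiveEntry.getName()) in the isDirectory() branch and onFile(tarArchiveEntry.getName()) in the else branch. The check costs one startsWith and one indexOf per file, which should be much cheaper than re-parsing the directory name.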
I don't want to simply rely on this assumption on my own, because I'm not sure what determines the entry order when a .tar.gz file is created. However, since the tar format doesn't contain any index (as far as I know), it would make sense for directory entries to be listed before their contents.
Any help appreciated.