
I recently found this solution to less through compressed .gz files in parallel, based on the number of cores available.

find . -name "*.gz" | xargs -n 1 -P 3 zgrep -H '{pattern to search}'

P.S. 3 is the number of cores
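Here is a minimal sketch of the same gzip search that takes the job count from the machine instead of hard-coding 3. It assumes `nproc` (GNU coreutils) is available and falls back to 3 if it is not. The function name `zgrep_parallel` is made up for illustration.

```shell
# Hypothetical helper: grep inside every *.gz file under a directory,
# running one zgrep per file with up to one job per CPU core.
# Assumes nproc (GNU coreutils); falls back to 3 jobs if it is missing.
zgrep_parallel() {
    pattern=$1
    dir=${2:-.}
    njobs=$(nproc 2>/dev/null || echo 3)
    # -print0 / -0: NUL-delimited names, safe for any legal file name.
    # -n 1: one file per zgrep call, so the calls can run side by side.
    # -P: run up to $njobs zgrep processes at once.
    # -H: print "file:line" even though each zgrep sees a single file.
    find "$dir" -type f -name '*.gz' -print0 |
        xargs -0 -n 1 -P "$njobs" zgrep -H "$pattern"
}
```

For example, `zgrep_parallel 'ERROR' /var/log` would search every .gz file under /var/log. Note that xargs exits with status 123 whenever any file had no match, so don't treat a nonzero status as an error here.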

I was wondering if there was a way to do it for bz2 files as well. Currently I am using this command:

find -type f -name '*.bz2' -execdir bzgrep "{text to find}" {} /dev/null \;
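The trailing /dev/null in that command is what makes the output show file names. grep-style tools only prefix a match with its file name when they are given more than one file, and /dev/null is an empty second file that never matches. The following is a small illustration in a throwaway directory. It assumes bzip2 and bzgrep are installed.

```shell
# Compare the same bzgrep search with and without the extra /dev/null.
tmp=$(mktemp -d)
printf 'hello world\n' > "$tmp/a.txt"
bzip2 "$tmp/a.txt"              # replaces a.txt with a.txt.bz2

# One file: only the matching line is printed.
without=$(bzgrep hello "$tmp/a.txt.bz2")
# Two files: each match is prefixed with its file name.
# (bzgrep may exit 1 because /dev/null has no match, so ignore the status.)
with=$(bzgrep hello "$tmp/a.txt.bz2" /dev/null || true)

printf '%s\n' "$without" "$with"
rm -rf "$tmp"
```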
Just substitute bzgrep for zgrep in the xargs? - blm
Eh? It's not the less that's parallelized, it's the grepping. Actually, you don't have less in your question anywhere at all... and you're parallelizing the easy way, multiple files at the same time but only one thread of execution per file, as opposed to the only-sometimes-possible way, decompressing the same file from multiple points in parallel (which requires the compressor to be configured to periodically reset itself and build a new table -- enabling parallel decoding at some cost to performance and output size). - Charles Duffy
Also, your current version of the gzip one won't work for all possible filenames, since it's taking the output from find in line-oriented form, but filenames are allowed to contain literal newlines. To be fully safe, you need to use NUL delimiters (which can't exist in filenames or other content represented by C strings). - Charles Duffy
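To make the newline problem concrete, here is a small demonstration in a throwaway directory. A file whose name contains a literal newline gets split in two by line-oriented xargs, but arrives intact with `find -print0 | xargs -0`. The `test -e` command only stands in for zgrep/bzgrep.

```shell
# Demo: an embedded newline in a file name breaks line-oriented xargs.
tmp=$(mktemp -d)
touch "$tmp/bad
name.gz"

# Line-oriented: xargs splits the name at the newline, so test -e is
# handed "$tmp/bad" and "name.gz", neither of which exists.
if find "$tmp" -name '*.gz' | xargs -n 1 test -e; then
    newline_mode=ok
else
    newline_mode=broken
fi

# NUL-delimited: the whole name arrives as a single argument.
if find "$tmp" -name '*.gz' -print0 | xargs -0 -n 1 test -e; then
    nul_mode=ok
else
    nul_mode=broken
fi

echo "newline-delimited: $newline_mode, NUL-delimited: $nul_mode"
rm -rf "$tmp"
```

On a typical system this should print `newline-delimited: broken, NUL-delimited: ok`.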

1 Answer


Change *.gz to *.bz2; change zgrep to bzgrep, and there you are.

For a bit of extra safety around unusual filenames, use -print0 on the find end and -0 on the xargs:

find . -name "*.bz2" -print0 | xargs -0 -n 1 -P 3 bzgrep -H '{pattern to search}'
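Your bzgrep may not label matches with the file name when it is given -H and a single file. The wrapper script that ships with bzip2 pipes the decompressed data into grep, so matches can show up as "(standard input)". In that case, one option is to keep the /dev/null trick from the question's -execdir command inside the parallel version. This is a sketch, assuming `nproc` is available. `bzgrep_parallel` is a hypothetical name.

```shell
# Hypothetical helper: parallel bzgrep over every *.bz2 file under a directory.
# The extra /dev/null gives bzgrep two files, so each match is prefixed
# with its file name without relying on -H.
bzgrep_parallel() {
    pattern=$1
    dir=${2:-.}
    njobs=$(nproc 2>/dev/null || echo 3)
    # sh -c receives: $0=sh (placeholder), $1=pattern, $2=file from xargs.
    find "$dir" -type f -name '*.bz2' -print0 |
        xargs -0 -n 1 -P "$njobs" sh -c 'bzgrep "$1" "$2" /dev/null' sh "$pattern"
}
```

Using `sh -c` lets each job append /dev/null after the file name that xargs supplies. Passing the pattern as a positional parameter, rather than splicing it into the script string, avoids quoting problems.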