My goal is to run some regexes and do some line-based processing on the data that comes out of a process. Since I've already got a bunch of tools in Perl, I decided to use Perl to solve my problem.
Take, for example, a process that outputs a large file:
cat LARGEFILE.txt | grep "A String"
Obviously the process I want to call is not "cat" but something that outputs a bunch of lines (typically 100 GB of data).
I had doubts about the performance of my Perl program, so I started stripping the code down to the minimum. I realized that my problem might come from the way I read the command's output in Perl.
Here's my Perl script:
#!/usr/bin/perl
use strict;
open my $fh, "cat LARGE.txt |" or die "cannot run cat: $!";
while (<$fh>) {
print $_ if $_ =~ qr/REGEX NOT TO BE FOUND/o;
}
I decided to compare my program with a simple shell pipeline:
cat LARGE.txt | grep "REGEX NOT TO BE FOUND"
Results :
time cat LARGE.txt | grep "REGEX NOT TO BE FOUND"
real 0m0.615s
user 0m0.352s
sys 0m0.873s
time ./test.pl
real 0m37.339s
user 0m36.621s
sys 0m1.766s
In my example, LARGE.txt file is about 1.3GB.
I understand that the Perl solution might be slower than the cat | grep example, but I was not expecting such a large difference.
Is there something wrong with my way of reading the output of a command?
P.S. I'm using Perl v5.10.1 on a Linux box.
Comments:
- Could you replace open my $fh, "cat LARGE.txt |"; with open my $fh, '<', 'LARGE.txt'; and try again? - Lee Duhem
- [...] grep. A hard disk's average read speed rarely exceeds 150MB/s, meaning it should take 15s just to read a 2GB file. - Borodin
- Try my $regex = qr/REGEX NOT TO BE FOUND/o; and in the loop, print $_ if $_ =~ $regex;. - Oktalist
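Combining the commenters' suggestions, a revised script might look like the sketch below. It precompiles the regex once outside the loop (qr// in the loop body rebuilds the pattern object on every iteration) and, for the benchmark, reads the file directly instead of going through a cat pipe. File names and the pattern are placeholders from the question, not a tested fix:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile the regex once, outside the loop, instead of
# evaluating qr// on every iteration.
my $regex = qr/REGEX NOT TO BE FOUND/;

# For the benchmark, open the file directly rather than
# spawning "cat LARGE.txt |" as a child process.
open my $fh, '<', 'LARGE.txt' or die "cannot open LARGE.txt: $!";

while (<$fh>) {
    print if /$regex/;
}

close $fh;
```

For the real use case, the three-argument pipe form open my $fh, '-|', 'some_command' keeps the child process while avoiding shell quoting issues.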