After a long search on the net, I decided to ask here about my problem. I have a set of CSV files (36 files in total) arriving every 5 minutes. Each file contains around 1.5 million lines, and I need to process all of these files within 5 minutes. I have to parse the files and create the required directories from them inside the storage zone. Each unique line is then translated to a file and put inside the related directory, and related lines are written to the related files. As you can see, there are a lot of I/O operations.
Currently I can finish 12 files in around 10 minutes. The target is to finish 36 in 5 minutes. I am using Perl for this. The problem, as far as I can see, is the system calls for I/O operations.
I want to control file handles and I/O buffering in Perl so that I do not have to write to a file every time. This is where I actually got lost. In addition, creating directories also seems to consume too much time.
I searched CPAN and the web for a lead that could shed some light on this, but had no luck. Does anybody have a suggestion on this subject? What should I read, or how should I proceed? I believe Perl is more than capable of solving this issue, but I guess I am not using the right tools.
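    use URI::URL;
    use Digest::MD5 qw(md5_hex);   # presumably where md5_hex comes from

    # $file, $outp, $time and $badf are set up elsewhere in the subroutine
    open(my $data, "<", $file) or die "Cannot open $file: $!";
    while (my $line = <$data>) {   # read line by line instead of slurping the whole file
        chomp $line;
        my @each = split(' ', $line);
        if (@each == 10) {
            my @logt = split('/', $each[3]);
            my $llg = 1;
            if ($logt[1] eq "200") {   # string comparison, so eq rather than ==
                $llg = 9;
            }
            my $urln = URI::URL->new($each[6]);
            my $netl = $urln->netloc;
            my $flnm = md5_hex($netl);
            my $urlm = md5_hex($each[6]);
            if ( ! -d $outp."/".$flnm ) {
                # directories need the execute bit to be entered, so 0755 rather than 0644
                mkdir $outp."/".$flnm, 0755;
            }
            open(my $csvf, ">>", $outp."/".$flnm."/".$time."_".$urlm) or die $!;
            print $csvf int($each[0]).";".$each[2].";".$llg."\n";
            close $csvf; #--->> I want to get rid of this so I can use buffer
        }
        else {
            print $badf $line, "\n";   # chomp removed the newline, so add it back
        }
    }
    close $data;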
Assume that the above code is used inside a subroutine that runs in 12 threads. The parameter for the code is the file name. I want to get rid of the close, because every time I open and close a file it makes a system call for I/O, which causes slowness. This is my assumption, of course, and I am more than open to any suggestion.
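To make the idea concrete, here is a rough sketch of the kind of buffering I have in mind: collect the output lines in a hash keyed by directory and file name, then open, write, and close each target file once instead of once per line. The names `buffer_line` and `flush_buffer` are hypothetical helpers invented for this illustration, not existing module functions, and the sketch assumes the buffered lines for one input file fit in memory.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: remember one output line for a target file
# without touching the disk.
sub buffer_line {
    my ($buffer, $dir, $name, $text) = @_;
    push @{ $buffer->{$dir}{$name} }, $text;
    return;
}

# Hypothetical helper: write everything collected so far.
# Each directory is checked once, and each file is opened,
# written, and closed exactly once per flush.
sub flush_buffer {
    my ($buffer, $root) = @_;
    my $files_written = 0;
    for my $dir (sort keys %$buffer) {
        my $path = File::Spec->catdir($root, $dir);
        make_path($path) unless -d $path;
        for my $name (sort keys %{ $buffer->{$dir} }) {
            my $file = File::Spec->catfile($path, $name);
            open(my $fh, '>>', $file) or die "Cannot open $file: $!";
            print {$fh} @{ $buffer->{$dir}{$name} };
            close($fh) or die "Cannot close $file: $!";
            $files_written++;
        }
    }
    %$buffer = ();   # start with an empty buffer for the next batch
    return $files_written;
}
```

In the loop above, the `mkdir`/`open`/`print`/`close` sequence would become a single `buffer_line(\%buf, $flnm, $time."_".$urlm, ...)` call, with `flush_buffer(\%buf, $outp)` called once after the input file has been read. Whether this is fast enough for 36 files in 5 minutes is something I would still have to measure.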
Thanks in advance.
system() to copy files and make directories? – Filippo Lauria
cat *csv > /dev/null ? – mpapec