1
votes

I have a CSV file with several columns. For example:

"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"

These are sample lines from a huge file I need to parse. I need to select only those lines where the 4th column is in a given list (say 1000, 2000, ...) and the second column falls between two dates (say 2011-11-01 00:00:00 to 2011-11-15 00:00:00).

So, how do I do the date selection and output only those lines in tab-delimited form?

In the example, only the second row would be chosen and saved in tab-delimited form to another file.

4 Answers

2
votes

Using Parse::CSV, here is a way to do the job:

#!/usr/local/bin/perl 
use Modern::Perl;
use Parse::CSV;

my $parser = Parse::CSV->new(
    file => 'text.csv',
);
while ( my $value = $parser->fetch ) {
    if ($value->[3] > 1000 && $value->[3] <= 2000
      && $value->[1] gt '2011-11-01 00:00:00' 
      && $value->[1] lt '2011-11-15 00:00:00' ) {
        say "$value->[0] --> OK";
    }else {
        say "$value->[0] --> KO";
    }
}

output:

00000089-6d83-486d-9ddf-30bbbf722583 --> KO
000004c9-92c6-4764-b320-b1403276321e --> OK

You can also use the filter capability:

my $parser = Parse::CSV->new(
    file => 'text.csv',
    filter => sub{
            if ($_->[3] > 1000 && $_->[3] <= 2000
             && $_->[1] gt '2011-11-01 00:00:00' 
             && $_->[1] lt '2011-11-15 00:00:00' ) {
               return $_;
            }else {
                return undef;
            }
        }
);

while ( my $value = $parser->fetch ) {
    # do what you want with the filtered rows
}
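
Both snippets above test a numeric range (greater than 1000, up to 2000) rather than membership in a list, and neither writes tab-delimited output. A minimal sketch of those two missing pieces follows. The names %wanted, keep_row and to_tsv are made up for illustration, the date bounds are assumed to be inclusive, and the rows are hard-coded stand-ins shaped like the array refs that $parser->fetch returns.

```perl
use strict;
use warnings;

# Assumed list of accepted 4th-column values, stored as hash keys for fast lookup.
my %wanted = map { $_ => 1 } qw(1000 2000);

# Assumed inclusive date window. Plain string comparison is enough here
# because the timestamps are fixed-width and zero-padded.
my ( $from, $to ) = ( '2011-11-01 00:00:00', '2011-11-15 00:00:00' );

# Hypothetical helper: takes one row as an array ref and returns 1 or 0.
sub keep_row {
    my ($row) = @_;
    return ( $wanted{ $row->[3] }
          && $row->[1] ge $from
          && $row->[1] le $to ) ? 1 : 0;
}

# Hypothetical helper: builds one tab-delimited output line from a row.
sub to_tsv {
    my ($row) = @_;
    return join( "\t", @{$row} ) . "\n";
}

# Stand-ins for parsed rows, copied from the question's sample lines.
my @rows = (
    [ '00000089-6d83-486d-9ddf-30bbbf722583', '2011-09-17 16:25:09',
      'INTNAME', '1001', 'https://mobile.mint.com:443' ],
    [ '000004c9-92c6-4764-b320-b1403276321e', '2011-11-09 13:52:30',
      'INTNAME', '2000',
      'http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13' ],
);

print to_tsv($_) for grep { keep_row($_) } @rows;
```

Inside the while loop, the equivalent would be something like `print {$out} to_tsv($value) if keep_row($value);`, with $out opened on the output file beforehand.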
1
votes

You may want to take a look at Time::Piece. Use it like this, for instance:

use Time::Piece;

# strptime() takes strftime()-style format specifiers.
my $time = Time::Piece->strptime($date, "%Y-%m-%d %H:%M:%S");

(Adjust the format string to match your data; the one above fits the sample lines in the question.)
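
Once the dates are Time::Piece objects they can be compared with the usual numeric operators, so the range check might look roughly like this. The helper name in_range and the inclusive bounds are assumptions for the sake of the example, and the format string is the one that fits the question's sample timestamps.

```perl
use strict;
use warnings;
use Time::Piece;

# Assumed timestamp layout, taken from the sample lines in the question.
my $FORMAT = '%Y-%m-%d %H:%M:%S';

# Hypothetical helper: true when $date lies within [$from, $to], bounds included.
sub in_range {
    my ( $date, $from, $to ) = @_;
    my ( $t, $lo, $hi ) =
      map { Time::Piece->strptime( $_, $FORMAT ) } $date, $from, $to;

    # Time::Piece overloads the comparison operators.
    return ( $t >= $lo && $t <= $hi ) ? 1 : 0;
}

for my $date ( '2011-09-17 16:25:09', '2011-11-09 13:52:30' ) {
    my $verdict =
      in_range( $date, '2011-11-01 00:00:00', '2011-11-15 00:00:00' )
      ? 'in range'
      : 'out of range';
    print "$date\t$verdict\n";
}
```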

1
votes

First, that looks like CSV, so you should use Text::CSV_XS (or Text::CSV) to parse it. The "standard" module for handling dates and times in Perl is DateTime, which goes along with DateTime::Format::ISO8601 or similar, but Date::Parse is also a possibility.
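
As a rough sketch of the parsing side, something like the following could work with Text::CSV. To keep it short it does not use DateTime or Date::Parse at all: it compares the timestamps as plain strings, which only works because they are fixed-width and zero-padded. The helper name filter_csv, the %wanted hash and the inclusive bounds are assumptions, and the output is built with join on the assumption that no field contains a tab.

```perl
use strict;
use warnings;
use Text::CSV;

# Assumed list of accepted 4th-column values and an inclusive date window.
my %wanted = map { $_ => 1 } qw(1000 2000);
my ( $from, $to ) = ( '2011-11-01 00:00:00', '2011-11-15 00:00:00' );

# Hypothetical helper: reads CSV rows from the filehandle $in and
# returns the matching rows as tab-delimited lines.
sub filter_csv {
    my ($in) = @_;
    my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );
    my @lines;
    while ( my $row = $csv->getline($in) ) {
        next unless @{$row} >= 4;            # skip short or empty rows
        next unless $wanted{ $row->[3] };    # 4th column must be in the list
        next if $row->[1] lt $from or $row->[1] gt $to;
        push @lines, join( "\t", @{$row} ) . "\n";
    }
    return @lines;
}

# The question's two sample lines, read through an in-memory filehandle.
my $sample = <<'CSV';
"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
CSV

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print filter_csv($in);
close $in;
```

For the real file, $in would be opened on the CSV file and the returned lines printed to a second filehandle opened for writing.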

0
votes
#!/usr/bin/env perl
use strict;
use warnings;

use 5.010;
use utf8;
use Carp;
use Date::Parse;
use English qw(-no_match_vars);

our $VERSION = '0.01';

# Accepted values for the 4th column, stored as hash keys for fast lookup
# (this replaces the smartmatch operator, which is deprecated in recent Perls).
my %wanted = map { $_ => 1 } qw(1000 2000 3000);

# Date window: 2011-11-01 00:00:00 to 2011-11-15 00:00:00, bounds included.
my $start_date = str2time('2011-11-01 00:00:00');
my $end_date   = str2time('2011-11-15 00:00:00');

my $RGX_FOUR_FULL = qr{"([^"]+)","([^"]+)","([^"]+)","([^"]+)","([^"]+)"}sm;
my $RGX_DATE_FULL = qr{.*"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})".*}sm;
my @input_data    = <DATA>;

my @res = grep {
    my $time = extract_time($_);
    $time >= $start_date
      and $time <= $end_date
      and $wanted{ extract_four($_) }
} @input_data;

print @res;

# Returns the epoch time of the quoted timestamp found in the line.
sub extract_time {
    my ($search_str) = @_;
    $search_str =~ s/$RGX_DATE_FULL/$1/sm;
    return str2time($search_str);
}

# Returns the contents of the 4th quoted field of the line.
sub extract_four {
    my ($search_str) = @_;
    $search_str =~ s/$RGX_FOUR_FULL/$4/sm;
    chomp $search_str;
    return $search_str;
}

__DATA__
"00000089-6d83-486d-9ddf-30bbbf722583","2011-08-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-10 14:52:30","INTNAME","4000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","3000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"

and you get

"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","3000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"