I am trying to read a log file whose contents look like this:
2013-03-28T12:19:03.639648-05:00 host1 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
2013-03-28T12:20:33.158823-05:00 host2 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
I have tried using PigStorage with a space delimiter, like so:
cmessages = LOAD 'data.txt' USING PigStorage(' ') AS (date:chararray, host:chararray, message:chararray);
But that truncates the third field to the first word of the message, and I'd like to keep the whole message because it might be useful later:
dump cmessages;
<snip>
(2013-03-28T12:19:03.639648-05:00,host1,rpcbind:)
(2013-03-28T12:20:33.158823-05:00,host2,rpcbind:)
</snip>
Is there a better way to read this log file that doesn't require costly regular expressions or a custom UDF loader? Perhaps Pig has something that says "stop splitting after the second space"? Maybe not.
UPDATE: To clarify what I want: instead of
(2013-03-28T12:19:03.639648-05:00,host1,rpcbind:)
I'd like:
(2013-03-28T12:19:03.639648-05:00,host1,rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w")
Essentially, I want the full log message in the last field of the tuple. I hope that is clearer.
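One way to get "stop after the second space" behaviour is a minimal sketch like the one below. It assumes the input is the same `data.txt`, loads each line whole with the built-in `TextLoader`, and then uses the built-in `STRSPLIT(string, regex, limit)` with a limit of 3, so at most three pieces are produced and everything after the second space stays in the last field. The delimiter argument is treated as a regular expression, but a single literal space is a trivial pattern. This has not been run against a real cluster, so treat it as a starting point rather than a verified solution.

```pig
-- Load every line as a single chararray; no splitting happens here.
raw = LOAD 'data.txt' USING TextLoader() AS (line:chararray);

-- Split on a space, but produce at most 3 pieces (limit = 3):
-- piece 1 = timestamp, piece 2 = host, piece 3 = the rest of the line.
-- FLATTEN turns the resulting tuple into three top-level fields.
cmessages = FOREACH raw GENERATE
    FLATTEN(STRSPLIT(line, ' ', 3)) AS (date:chararray, host:chararray, message:chararray);

dump cmessages;
```

If this works as intended, each tuple should have the shape shown in the update, with the full log message in `message`.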