0
votes

Aim:- To parse and load log data into HIVE using SerDe feature.Facing an issue while retrieving the data using SELECT statement.

We created a table and are able to successfully load data. However, the select statement retrieves only NULL values.

Sample log data:

2013-02-21 00:13:48,916 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5729677439273359430_1495

The RegEx we came up with to parse the above log is :

([^ ]*) ([^ ]{8})[^ ]* ([A-Z]*) ([^ ]*): ([[^ ]*\s]*)

Create Table

CREATE EXTERNAL TABLE log (
dt STRING,
time STRING,
loglevel STRING,
check STRING,
status STRING )
ROW FORMAT
SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex"([^ ]*) ([^ ]{8})[^ ]* ([A-Z]*) ([^ ]*): ([[^ ]*\s]*)",
"output.format.string"="%1$s %2$s %3$s %4$s %5$s")
STORED AS TEXTFILE LOCATION '/tmp/log/';

We added the jar:

add jar /usr/lib/hive/lib/hive-contrib-0.7.1-cdh3u4.jar;

Load data:

load data local inpath "/tmp/logdata.txt" into table log;

Retreive data:

Select * from log LIMIT 1;

Output:

NULL NULL NULL NULL NULL

Sample Log data:

2013-02-21 00:13:48,916 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
Verification succeeded for blk_5729677439273359430_1495

2013-02-21 00:15:39,929 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
Verification succeeded for blk_-4787916211671845946_1464

Thanks in Advance!!

2

2 Answers

0
votes

Please try this, Rubular link:

([^ ]*) ([^ ]{8})[^ ]* ([A-Z]*) ([^ ]*): (.*)
0
votes

Looks like you should add "=" following "input.regex"

and usually, this kind of error is caused by regular expression doesn't FULLY match the input.