0
votes

We have files in HDFS containing raw logs. The logs are line-separated, so each log entry is a single line.

Our requirement is to append some text (for example ' 12345') to the end of every log line in these files, using Pig, a Hadoop command, or any other MapReduce-based tool.

Please advise.

Thanks AJ

2
What have you tried so far? Please post so we can try and help - Raquel Guimarães
You just need a mapper to do this. What have you tried so far? - user238607

2 Answers

0
votes

Load the files so that each log entry lands in a single field (line:chararray), then use CONCAT to append the text to each line and store the result in a new location. If you need one output per input file, you will have to parameterize the script so that it loads each file and stores it separately, instead of using a wildcard load.

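-- TextLoader reads each line as a single chararray; the schema goes in the AS clause
Log = LOAD '/path/wildcard/*.log' USING TextLoader() AS (line:chararray);
-- Append the fixed text to every line
Log_Text = FOREACH Log GENERATE CONCAT(line, 'Your Text') AS newline;
-- Note: STORE writes a directory of part files at this path
STORE Log_Text INTO '/path/NewLog.log';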
0
votes

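If your files aren't extremely large, you can do that with a single shell command. Note that this streams the whole file through the machine running the command rather than processing it in parallel on the cluster.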

hdfs dfs -cat /user/hdfs/logfile.log | sed 's/$/ 12345/' |\
hdfs dfs -put - /user/hdfs/newlogfile.txt
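
To sanity-check the substitution before running it against HDFS, the same kind of `sed` expression can be tried on a couple of local sample lines. This is only a minimal sketch: the `append_suffix` function name and the sample log lines are made up for illustration and are not part of the question or the answer.

```shell
#!/bin/sh
# Hypothetical helper: append a fixed suffix to every line read from stdin.
# Assumption: the suffix contains no '/', '&' or '\' characters, which
# would need escaping before being placed inside the sed expression.
append_suffix() {
  sed "s/\$/$1/"
}

# Try it on two sample lines (invented data, not real logs).
printf 'log line one\nlog line two\n' | append_suffix ' 12345'
```

If the output looks right, the function could take the place of the `sed` stage between `hdfs dfs -cat` and `hdfs dfs -put -` in the pipeline above.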