Below is sample data from input.txt; it has two columns, key and value, separated by '|'. For each record processed by the Mapper, the output of map should be written to:

1) HDFS => a new file needs to be created for each record, named after the key column

2) the Context object

Below is the code. Four files should be created, one per key, but the files are not getting created. The output is incorrect too: I am expecting word-count output, but I am getting character-count output.

input.txt
------------
key         value
HelloWorld1|ID1
HelloWorld2|ID2
HelloWorld3|ID3
HelloWorld4|ID4



    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException {

            String line = value.toString();
            String[] fileContent = line.split("|");
            Path hdfsPath = new Path("/filelocation/" + fileContent[0]);
            System.out.println("FilePath : " +hdfsPath);

            Configuration configuration = con.getConfiguration();
            writeFile(fileContent[1], hdfsPath, configuration); 

            for (String word : fileContent) {
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }       
        }

        static void writeFile(String fileContent, Path hdfsPath, Configuration configuration) throws IOException {

            FileSystem fs = FileSystem.get(configuration);
            FSDataOutputStream fin = fs.create(hdfsPath);
            fin.writeUTF(fileContent);
            fin.close();
        }
    }

1 Answer

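String.split takes a regular expression, not a literal string. An unescaped '|' is the regex alternation operator with an empty pattern on each side, so it matches the empty string between every pair of characters and the line is split into single characters. That is why you get a character count instead of a word count. It is also why the expected files do not appear: fileContent[0] is a single character (or an empty string on older Java versions) rather than the key. You need to escape the '|', like .split("\\|");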

See docs here: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
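
A minimal sketch of the difference, using one record from the question's input.txt. The class and method names (SplitDemo, splitUnescaped, splitEscaped, splitQuoted) are made up for illustration and are not part of the question's code; Pattern.quote is shown as an alternative to escaping by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {

    // Unescaped: "|" is regex alternation between two empty patterns,
    // so it matches the empty string at every position in the line.
    static String[] splitUnescaped(String line) {
        return line.split("|");
    }

    // Escaped: "\\|" is the regex \| and matches a literal pipe character.
    static String[] splitEscaped(String line) {
        return line.split("\\|");
    }

    // Pattern.quote builds a regex that matches the given string literally.
    static String[] splitQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "HelloWorld1|ID1";
        // Breaks the line into single characters
        System.out.println(Arrays.toString(splitUnescaped(line)));
        // Both of these give the two fields: key and value
        System.out.println(Arrays.toString(splitEscaped(line)));
        System.out.println(Arrays.toString(splitQuoted(line)));
    }
}
```

With either of the last two forms, fileContent[0] is HelloWorld1 and fileContent[1] is ID1, so the path passed to writeFile and the keys written to the Context are based on whole fields rather than single characters.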