I have data sources in CSV and plain-text formats, and I want to run Hadoop MapReduce jobs on them.

How do I convert these data sources to the Hadoop SequenceFile format and store them in HDFS?

1 Answer

The simplest way to convert them to SequenceFiles is to run a MapReduce job with the default (identity) Mapper and Reducer, and to set the output format class to SequenceFileOutputFormat. Here is the relevant portion of the driver code:

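    // TextInputFormat reads each line as (LongWritable byte offset, Text line)
    job.setInputFormatClass(TextInputFormat.class);
    // Write the job output as a SequenceFile
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    // Must match the key/value types produced by TextInputFormat
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Default (identity) Mapper, specified just for clarity
    job.setMapperClass(Mapper.class);
    // Default (identity) Reducer
    job.setReducerClass(Reducer.class);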
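
To see why the output key and value classes are LongWritable and Text: TextInputFormat hands the mapper one record per line, keyed by the byte offset at which the line starts, and the identity Mapper and Reducer pass those pairs through unchanged. The following is only a plain-Java sketch of that pairing, with no Hadoop dependency. The class name `OffsetLinePairs`, the method `toRecords`, and the sample CSV are made up for illustration, and it assumes UTF-8 text with `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not part of Hadoop.
class OffsetLinePairs {

    // Returns (byte offset of line start -> line text), in file order.
    // Assumes UTF-8 content and '\n' line terminators.
    static Map<Long, String> toRecords(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0; // byte offset of the current line
        int start = 0;   // char index of the current line
        while (start < fileContents.length()) {
            int end = fileContents.indexOf('\n', start);
            boolean hasNewline = end >= 0;
            if (!hasNewline) {
                end = fileContents.length(); // last line without a terminator
            }
            String line = fileContents.substring(start, end);
            records.put(offset, line);
            // Advance by the line's byte length plus one byte for '\n'
            offset += line.getBytes(StandardCharsets.UTF_8).length + (hasNewline ? 1 : 0);
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,alice\n2,bob\n"; // made-up sample data
        for (Map.Entry<Long, String> e : toRecords(csv).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
        // Prints:
        // 0	id,name
        // 8	1,alice
        // 16	2,bob
    }
}
```

In the real job, each of these pairs would become one LongWritable/Text record in the SequenceFile. If the byte offset is not a useful key for your later jobs, you would need a custom Mapper that emits a key of your choosing instead.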