0
votes

Does Hadoop officially support streaming with binary formats as of 0.21?

The hadoop-streaming.jar accepts an inputFormat that is a Java class name. How do you provide the Hadoop streaming job this Java class? Besides executing hadoop-streaming.jar with inputFormat, outputFormat arguments, what else must be done to run a Python streaming job with SequenceFile input/output format?

1

1 Answers

3
votes

It loos like some muddy documentation.

Try reading these answers

How to use Hadoop Streaming with LZO-compressed Sequence Files?

Python read file as stream from HDFS

and don't worry about Sequencefile. It's not worth it.