I defined my own input format, shown below, which prevents file splitting:
import org.apache.hadoop.fs.*;
import org.apache.hadoop.mapred.TextInputFormat;

public class NSTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}
I compiled this in Eclipse into a class file, NSTextInputFormat.class, and copied that file to the client machine from which the job is launched. I used the following command to launch the job, passing the above class as the input format:
hadoop jar $HADOOP_HOME/hadoop-streaming.jar -Dmapred.job.queue.name=unfunded -input 24222910/framefile -input 24225109/framefile -output Output -inputformat NSTextInputFormat -mapper ExtractHSV -file ExtractHSV -file NSTextInputFormat.class -numReduceTasks 0
This fails with the error: -inputformat : class not found : NSTextInputFormat Streaming Job Failed!
I set the PATH and CLASSPATH variables to the directory containing NSTextInputFormat.class, but it still does not work. Any pointers would be helpful.