
Let's say I have a client script that pulls a large amount of data from Hadoop. Is there functionality in Hadoop that lets me inspect the retrieved data, point out a missing part, and make a specific request to read just that missing part? Would this be handled by the DataNode, or in the map or reduce phase?

Thanks


There is no direct way to achieve this. Once your script has pulled the data and written it to HDFS, it is just another piece of data; Hadoop does not track how it relates to the rest of your data. You have to read it together with the data you want to compare it against and do the comparison yourself, writing comparison logic that suits your needs.
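As a rough illustration of that "comparison logic", here is a minimal plain-Java sketch. It assumes (hypothetically) that every record carries a sequential numeric ID and that the client knows the expected ID span; the class `MissingRangeFinder` and its `Range` type are made up for this example. It only finds the gaps as contiguous ranges. Issuing the targeted re-request for each gap is left to whatever read mechanism your script already uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: finds which IDs in an expected span were never received.
public class MissingRangeFinder {

    /** A half-open range [start, end) of record IDs that was not retrieved. */
    public static final class Range {
        public final long start; // inclusive
        public final long end;   // exclusive

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Compares the received IDs against the expected span [first, last] and
     * returns the gaps as contiguous ranges, so each gap could be re-requested
     * with one targeted read instead of pulling everything again.
     */
    public static List<Range> findMissing(long first, long last, Iterable<Long> receivedIds) {
        // Sort and de-duplicate the IDs that fall inside the expected span.
        SortedSet<Long> seen = new TreeSet<>();
        for (Long id : receivedIds) {
            if (id >= first && id <= last) {
                seen.add(id);
            }
        }

        List<Range> gaps = new ArrayList<>();
        long next = first; // smallest ID not yet accounted for
        for (long id : seen) {
            if (id > next) {
                gaps.add(new Range(next, id)); // IDs next..id-1 are missing
            }
            next = id + 1;
        }
        if (next <= last) {
            gaps.add(new Range(next, last + 1)); // missing tail of the span
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<Long> received = List.of(0L, 1L, 2L, 5L, 6L, 9L);
        System.out.println(findMissing(0, 9, received)); // prints [[3, 5), [7, 9)]
    }
}
```

Returning ranges rather than individual IDs keeps the number of follow-up requests proportional to the number of gaps, not the number of missing records.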

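To start with, you can have a look at MultipleInputs, which lets a single MapReduce job read several input paths, each with its own InputFormat and Mapper.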

P.S.: If you find something that does this for you, please share it with us. It would be of great value. Many thanks.