Related to my question, I have a Hadoop Streaming job written in Python.
I notice that each Reducer gets all the values associated with multiple keys through sys.stdin.
I would prefer that sys.stdin contain only the values associated with a single key. Is this possible with Hadoop? A separate process per key would be ideal, but I can't find a configuration that gives this behavior.
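To illustrate the situation, here is a minimal sketch of the usual workaround: the reducer splits sys.stdin into per-key groups itself. It assumes each input line is a key and a value separated by a tab, and that lines arrive sorted by key, which is the Hadoop Streaming default. The function names `parse` and `group_by_key` are hypothetical, and the per-key work (counting values) is only a placeholder.

```python
import sys
from itertools import groupby


def parse(lines, sep="\t"):
    """Yield (key, value) pairs from 'key<sep>value' lines, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition(sep)
        yield key, value


def group_by_key(lines, sep="\t"):
    """Yield (key, [values]) for each run of consecutive lines sharing a key.

    Relies on the input already being sorted by key, so that all lines
    for one key are adjacent.
    """
    for key, pairs in groupby(parse(lines, sep), key=lambda kv: kv[0]):
        yield key, [value for _, value in pairs]


if __name__ == "__main__":
    # Placeholder per-key work: emit each key with the number of values seen.
    for key, values in group_by_key(sys.stdin):
        print(f"{key}\t{len(values)}")
```

This keeps a single reducer process but gives the per-key code only one key's values at a time. It does not start a separate process per key.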
Can someone point me to information or code that would help with this?