1
votes

I have no problems with Eclipse's remote debugging when running Hadoop in standalone mode. However, it does not work when I run Hadoop in pseudo-distributed mode. Here is how I attempt Eclipse remote debugging in pseudo-distributed mode:

I add a line to my hadoop script like so:

#added this line to enable remote debugging
HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5000"

# run it
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
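
A side effect of the line above is that every command launched through the hadoop script starts suspended and waits for a debugger. A guarded variant could look like the following minimal sketch. `HADOOP_DEBUG_PORT` and `add_debug_opts` are hypothetical names chosen for illustration; Hadoop does not define them.

```shell
# Append the JDWP agent to HADOOP_OPTS only when HADOOP_DEBUG_PORT is set.
# HADOOP_DEBUG_PORT is a hypothetical variable, not one that Hadoop defines.
add_debug_opts() {
  if [ -n "${HADOOP_DEBUG_PORT:-}" ]; then
    HADOOP_OPTS="${HADOOP_OPTS:-} -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${HADOOP_DEBUG_PORT}"
  fi
}

add_debug_opts
```

With this in place, a debug run would be started as `HADOOP_DEBUG_PORT=5000 hadoop jar myjob.jar MyJob input output` (jar, class, and paths are placeholders), while ordinary invocations are left unsuspended.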

And then I create a remote debugging configuration like so:

[Screenshot: creating a remote debugging configuration]

I run the job from the command line, and it prints what it should:

Listening for transport dt_socket at address: 5000

I then go back to Eclipse and run the debug configuration. It steps into my main() function as it should.

However, it doesn't hit any of the breakpoints I set in my mapper or reducer.

What's the problem here? Why does it work with Hadoop in standalone mode but not in pseudo-distributed mode? Is it possible to do remote debugging with Hadoop in pseudo-distributed mode? If not, what's the "right" way to debug my MapReduce code in Eclipse?

2
The problem is that in pseudo-distributed mode, unlike standalone mode, the mappers and reducers (to be more precise, all the daemons) run in their own JVMs, so you can't debug them from an Eclipse session that is attached only to the client JVM. If you have a local Hadoop setup, debug your code in standalone mode. Besides that, you can use custom counters, logging, or MRUnit to find the root of the problem. - Lorand Bendig
Thanks for the advice! Wrote a couple scripts to toggle between standalone and pseudo-distributed, and now everything's working like a charm. Thanks! - sangfroid
How do you run Hadoop in standalone mode? Also, if possible, can you share the scripts that toggle between standalone and pseudo-distributed mode? - Doron Gold
Hi. For me it doesn't hit the breakpoints in standalone mode either. Can you share your method of debugging? - Reddevil
In my use case the code works fine with LocalJobRunner but does not work in cluster mode (pseudo-distributed mode). - sachingupta

2 Answers

2
votes

See Lorand's comment above. Remote debugging of the client JVM, as set up in the question, will only reach the mapper and reducer code in standalone mode.
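
For a single run, one way to get standalone behaviour without editing the configuration files is to override the job tracker and filesystem on the command line, so the local job runner executes the map and reduce code inside the client JVM. This is only a sketch under assumptions: it uses Hadoop 1.x property names (on Hadoop 2 the equivalents are `mapreduce.framework.name=local` and `fs.defaultFS`), it assumes the driver parses generic options through `ToolRunner`, and the jar, class, and path names are placeholders.

```shell
#!/bin/sh
# Print generic options that select the local job runner and the local
# filesystem (Hadoop 1.x property names, assumed to match the setup here).
local_mode_flags() {
  printf '%s' "-D mapred.job.tracker=local -D fs.default.name=file:///"
}

# Hypothetical jar, driver class, and paths; replace with your own.
JOB_JAR="myjob.jar"
JOB_CLASS="MyJob"

# Dry run: print the command instead of executing it.
echo "hadoop jar $JOB_JAR $JOB_CLASS $(local_mode_flags) input output"
```

Run with the debug agent enabled in the client JVM, breakpoints in the mapper and reducer should then be reachable from the same Eclipse session that stops in main().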

2
votes

You can specify the following property in your Hadoop configuration:

<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xdebug -Xrunjdwp:transport=dt_socket,address=5001,server=y,suspend=y</value>
</property>

That will launch each map task's JVM with the debug agent listening on port 5001, suspended until a debugger attaches. Also see "Debugging multiple hadoop jvms with Eclipse".
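
The same property can presumably be passed for one job only instead of being set globally. The sketch below builds such a command as a dry run. It assumes the driver parses generic options through `ToolRunner`; `child_debug_opts`, the jar, the class, and the paths are hypothetical names. Since only one JVM can bind port 5001 at a time, this is likely to behave best with a small input that yields a single map task.

```shell
#!/bin/sh
# Build the JDWP agent string for a child task JVM listening on the given port.
child_debug_opts() {
  printf '%s' "-Xdebug -Xrunjdwp:transport=dt_socket,address=$1,server=y,suspend=y"
}

# Hypothetical jar, driver class, and paths; replace with your own.
JOB_JAR="myjob.jar"
JOB_CLASS="MyJob"

# Dry run: print the command instead of executing it.
echo "hadoop jar $JOB_JAR $JOB_CLASS -D \"mapred.map.child.java.opts=$(child_debug_opts 5001)\" input output"
```

Once the job is submitted, the map task waits for a debugger, so a second Eclipse remote debugging configuration pointing at port 5001 is needed in addition to the one for the client JVM.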