
I'm currently working on a fairly complex multithreaded Python script. There is one main function that is run in about 5 threads at a time. I've been having some issues with it hanging and using 100% of the processor core that it is running on. This hanging occurs after the main function has been run hundreds of times, so it's hard to pinpoint exactly when or where it is happening. Once the program hangs, it never starts running again.

It seems that only one thread hangs at a time, so I didn't really understand why that would hang the entire program. Then I found a Stack Overflow answer that explained, "In some Python implementations, only one Python thread can execute at a time. Threads in CPython are only really useful for multiplexing IO operations, not for putting CPU-intensive tasks in the background." So when one thread hangs with full CPU usage, the entire program understandably comes to a halt.

Below is a screenshot of Process Explorer's view of the python.exe process when the program has hung. As you can see, only one thread is actually doing something.

[Process Explorer screenshot]

I'd like to be able to analyze exactly which lines were executed before the script hung. I don't really know where I could insert a breakpoint with "import pdb; pdb.set_trace()" because I don't know when or where it'll go wrong. I can't manually step through the program, since it takes anywhere from 30 minutes to a few hours of running before it hangs. I looked through my script for anything obvious that could produce an infinite loop or the like, but I can't figure out what is causing the hang.

My question is: how would I go about debugging this? Ideally I'd just like to see what lines were executed right before it hung, but I don't even know how to detect when it hangs. I can't post the full script here, so hopefully someone can still suggest an approach. Thanks in advance.

One piece of advice: a thread in Python can block the entire program only during atomic operations, for example when sorting a very large list. - freakish
Use the trace module (see the sketch after these comments). - Marcin
Have you tried signaling a KeyboardInterrupt and seeing where the traceback originates? - moooeeeep
@Marcin That looks interesting. I'll try it and see if it helps. Thanks! - Nicholas
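
As a concrete starting point, here is a minimal sketch of what running the script under the trace module (Marcin's suggestion) could look like. The simplest route is the command line, python -m trace --trace your_script.py > trace.log, which prints every executed source line; main below is just a hypothetical stand-in for the script's entry point.

import trace

def main():
    # Hypothetical stand-in for the question's main function.
    for i in range(3):
        print("iteration %d" % i)

# trace=1 prints each source line as it executes; count=0 disables the
# coverage counting. Redirect stdout to a file, since the output gets very large.
tracer = trace.Trace(count=0, trace=1)
tracer.run('main()')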

2 Answers

1 vote

This might help (source: https://softwareengineering.stackexchange.com/questions/126940/debug-multiprocessing-in-python):

import multiprocessing, logging

# log_to_stderr() routes multiprocessing's internal log messages to stderr;
# SUBDEBUG is its most verbose level.
logger = multiprocessing.log_to_stderr()
logger.setLevel(multiprocessing.SUBDEBUG)
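
Since the question uses the threading module rather than multiprocessing, a rough equivalent (just a sketch; main_function is a hypothetical stand-in for the real worker) is to put the thread name into ordinary logging output, so the log shows which worker was active last before the hang:

import logging
import threading
import time

# Include the thread name in every record so the log shows which worker
# was doing what right before the hang.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(threadName)s] %(message)s",
)

def main_function(job_id):
    # Hypothetical stand-in for the question's main function.
    logging.debug("starting job %s", job_id)
    time.sleep(0.1)  # placeholder for real work
    logging.debug("finished job %s", job_id)

threads = [threading.Thread(target=main_function, args=(i,), name="worker-%d" % i)
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()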
0 votes

You could try procmon from Sysinternals to see what your process(es) is/are doing at the system-call level.

You could also try attaching with a debugger and getting a backtrace for each thread. I'm not sure how well gdb works on Windows, but that's what I've used in the past on *nix. You can sometimes see the Python call stack even though you're attaching to a C program (the CPython interpreter), using something like http://svn.python.org/projects/python/trunk/Misc/gdbinit

pdb might be a better choice than gdb really, but I've not used pdb for this.
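
If attaching a native debugger on Windows turns out to be awkward, a pure-Python alternative (a related sketch, not part of gdb itself) is a watchdog thread inside the script that periodically dumps every thread's Python stack with sys._current_frames(); the last dump before the hang then shows what each thread was executing. The interval and thread name below are arbitrary.

import sys
import threading
import time
import traceback

def dump_all_stacks():
    # Print the current Python stack of every live thread.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    for ident, frame in sys._current_frames().items():
        print("--- thread %s (%s) ---" % (ident, names.get(ident, "unknown")))
        print("".join(traceback.format_stack(frame)))

def watchdog(interval=600):
    # Dump stacks every `interval` seconds so the last dump before the
    # hang shows where each thread was.
    while True:
        time.sleep(interval)
        dump_all_stacks()

wd = threading.Thread(target=watchdog, name="watchdog")
wd.daemon = True  # don't keep the process alive on its own
wd.start()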