
I'm using Python threads to resolve website IP addresses. This is my worker function for the resolving; it runs as a daemon thread.

def get_ip_worker():
    """Worker (daemon) thread for resolving IPs: takes a domain from
    domains_q, resolves it, and puts (ip, domain) on processed_q."""

    socket.setdefaulttimeout(3)
    while True:
        domain = domains_q.get()
        try:
            addr_info = socket.getaddrinfo(domain, 80, 0, 0, socket.SOL_TCP)
            for family, socktype, proto, canonname, sockaddr in addr_info:
                if family == socket.AF_INET:     # IPv4: sockaddr is (ip, port)
                    ip, port = sockaddr
                    processed_q.put((ip, domain))
                elif family == socket.AF_INET6:  # IPv6: (ip, port, flowinfo, scope_id)
                    ip, port, flowinfo, scope_id = sockaddr
                    processed_q.put((ip, domain))
        except socket.error:
            pass  # resolution failed or timed out; skip this domain

        domains_q.task_done()

EDIT: the line domain = domains_q.get() blocks until an item is available in the queue.

The problem comes when I run this with 300 threads: the load average seems okay, but even a simple ls -la takes 5 seconds and everything is slow. Where did I go wrong? Should I use async I/O or multiprocessing?

Are you sure empty-queue exceptions aren't breaking the loop? - andsoa
domains_q.get() blocks until an item is available; I have added that to the post. - nacholibre

1 Answer


Do you really need to process 300 connections in parallel with 300 threads? I have never tried creating that many threads, but it may well be the problem, and it is definitely not a good way to solve the task. Usually there are better options. First, you do not need 300 threads to service 300 connections. Create only as many threads as work well on your hardware and OS, use a single thread to retrieve requests from the main queue, and hand them to a thread from a thread pool.
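As a minimal sketch of the bounded-pool idea (the `resolve` helper and the pool size of 50 are my own assumptions, not the asker's code), `concurrent.futures.ThreadPoolExecutor` lets a fixed number of threads drain an arbitrarily long list of domains:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def resolve(domain):
    """Resolve one domain; return (domain, sorted list of IPs), or an
    empty list if resolution fails."""
    try:
        infos = socket.getaddrinfo(domain, 80, 0, 0, socket.SOL_TCP)
        # infos entries are (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP string for both IPv4 and IPv6.
        return domain, sorted({info[4][0] for info in infos})
    except socket.error:
        return domain, []

def resolve_all(domains, workers=50):
    # A few dozen threads usually saturate DNS resolution; the executor
    # feeds idle threads from the input without a hand-rolled queue.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(resolve, domains))
```

The pool size is a tuning knob: start small and raise it until throughput stops improving, rather than matching it to the number of domains.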

BTW, check whether your "retrieve from a queue" operation really blocks and waits when the queue is empty. If it does not, the loops may spin constantly, regardless of whether there are incoming requests.
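For the standard library `Queue` this is easy to verify: `get()` blocks by default, while `get_nowait()` raises immediately on an empty queue. A small check (using the Python 3 module name `queue`):

```python
import queue

q = queue.Queue()
q.put("example.com")

# get() with no arguments blocks until an item is available;
# here an item is already queued, so it returns at once.
assert q.get() == "example.com"

# The non-blocking variant raises queue.Empty instead of spinning.
try:
    q.get_nowait()
except queue.Empty:
    print("queue is empty")
```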

What you may really need is non-blocking sockets plus something like select.select() to wait until one of your sockets is ready for reading or writing. You can write that code yourself; if you would rather not, a good asynchronous networking library like gevent (or Twisted) can help improve the architecture of your program.

Utilizing the full power of multicore CPUs is a separate question, but I've heard there are solutions, at least for gevent (they are based on gunicorn, which runs several processes; I have never tried it). But I think your problem is not execution speed; it is the need to wait efficiently for I/O on many objects at a time. If so, avoid massive use of threads for that purpose: it is usually ineffective not only in Python but even in languages without a GIL that are better suited to multithreaded programming. multiprocessing avoids the GIL but adds its own execution costs, so I would suggest not using it here.
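A minimal illustration of readiness-based waiting (a local socketpair stands in for real network peers here; note that select() helps with socket I/O, but the blocking getaddrinfo() call itself is not a socket you can select on):

```python
import select
import socket

# A connected local socket pair stands in for two network peers.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

b.send(b"ping")  # writing on b makes a readable

# One select() call can watch any number of sockets at once and
# returns the subsets that are ready right now (1-second timeout).
readable, writable, _ = select.select([a, b], [], [], 1.0)
# a is in `readable` because b wrote to it; b is not.
```

This is the core of every event loop: one thread multiplexes many sockets instead of parking one thread per connection.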