Asynchronous libpcap: losing packets?

Question

I have a program that sends a set of TCP SYN packets to a host (using raw sockets) and uses libpcap (with a filter) to obtain the responses. I'm trying to implement this in an asynchronous I/O framework, but it seems that libpcap is missing some of the responses (namely the first packets of a series, when less than 100 microseconds pass between the TCP SYN and the response). The pcap handle is set up like this:

pcap_t* pcap = pcap_open_live(NULL, -1, false, -1, errorBuffer);
pcap_setnonblock(pcap, true, errorBuffer);

Then I add a filter (contained in the filterExpression string):

struct bpf_program filter;
pcap_compile(pcap, &filter, filterExpression.c_str(), false, 0);
pcap_setfilter(pcap, &filter);
pcap_freecode(&filter);

And in a loop, after sending each packet, I use select to find out whether I can read from libpcap:

int pcapFd = pcap_get_selectable_fd(pcap);
fd_set fdRead;
FD_ZERO(&fdRead);
FD_SET(pcapFd, &fdRead);
select(pcapFd + 1, &fdRead, NULL, NULL, &selectTimeout);

And read it:

if (FD_ISSET(pcapFd, &fdRead)) {
     struct pcap_pkthdr* pktHeader;
     const u_char* pktData;
     if (pcap_next_ex(pcap, &pktHeader, &pktData) > 0) {
         // Process received response.
     }
     else {
         // Nothing to receive (or error).
     }
}

As I said before, some of the packets are missed (falling into the "nothing to receive" else). I know these packets are there, because I can capture them in a synchronous fashion (using tcpdump or a thread running pcap_loop). Am I missing some detail here? Or is this an issue with libpcap?

It may be the case that you're sending too many requests too quickly, and the server is sending responses faster than you can handle them, thus overloading the OS's network buffer and dropping packets. Or it's possible your receiver socket is not set up in time to handle the initial responses. Can you verify that all the responses you assume you're receiving are actually getting there? To do this, run tcpdump on the same interface as your application simultaneously. If you see all the packets you expect in tcpdump and not in your application, you may have one of the problems above. — ryanbwork
I've already done it (tcpdump on the side, but also pcap_loop in a different thread), and all the packets were there. Thus, I don't believe I'm sending the requests too fast. How can I tell if my receiver socket (i.e. libpcap) is not yet set up? This would make sense, since the lost responses are always the first one or two. — bruno nery
Even if you see the packets in tcpdump, they could still be dropped by the OS if your application can't handle the rate at which they are received. In case your application is starting after responses are already being sent, try adding some significant delay before sending the initial response from your server; if you then receive all responses, you've found your problem. — ryanbwork
Unfortunately, I don't control the server :(. What I'm seeing is pcap_next_ex returning 0 for the first couple packets, even though they are captured by the other thread (and tcpdump). Any other possibility? — bruno nery
Looking closer at the man page (linux.die.net/man/3/pcap_next_ex), pcap_next_ex returns 0 when packets are being read from a live capture and the timeout expires. When you create the pcap object, you specify -1 as the timeout (see linux.die.net/man/3/pcap_open_live for the arguments). Sorry I didn't look closer at this before, but it seems that changing the to_ms argument to something other than -1 when calling pcap_open_live may give you the results you expect. — ryanbwork

Unknown Unknown · Accepted Answer · 2012-07-25T00:49:38

If the FD for the pcap_t is reported as readable by select() (or poll() or whatever call/mechanism you're using), that does not guarantee that only one packet can be read without blocking; several packets may be available.

If you use pcap_next_ex(), you will read only one packet; if there's more than one packet available to be read, then, if you do another select(), it should immediately return, reporting the FD as being readable again, in which case you'll presumably call pcap_next_ex() again, and so on. This means at least one system call per packet (the select()), and possibly more calls, depending on what version of what OS you're using and what version of libpcap you have.
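If you want to stay with pcap_next_ex(), one variation on the above is to keep calling it after a single select() wakeup until it stops returning a packet. The following is only a sketch of that loop, not libpcap code: drainReadable and DrainResult are hypothetical names, and the readNext callable is assumed to wrap pcap_next_ex() on a non-blocking handle and to pass its return value through unchanged (1 for a packet, 0 for nothing available right now, negative for an error or end of capture).

```cpp
#include <cstddef>

// Outcome of draining one "readable" notification.
struct DrainResult {
    std::size_t packets;  // packets handled before the source ran dry
    int lastStatus;       // first non-1 status seen: 0 = nothing left, <0 = error/end
};

// readNext is assumed to follow pcap_next_ex() return conventions:
//   1  -> a packet was read
//   0  -> no packet available right now
//   <0 -> error or end of capture
// onPacket is invoked once for every packet that readNext reports.
template <typename ReadNext, typename OnPacket>
DrainResult drainReadable(ReadNext readNext, OnPacket onPacket) {
    DrainResult result{0, 0};
    for (;;) {
        int status = readNext();
        if (status != 1) {
            // Nothing more to read for now (or an error): go back to select().
            result.lastStatus = status;
            return result;
        }
        onPacket();
        ++result.packets;
    }
}
```

With libpcap, readNext could be a lambda such as [&] { return pcap_next_ex(pcap, &pktHeader, &pktData); } and onPacket a lambda that processes pktHeader and pktData. This still costs one read attempt per packet, so it saves the extra select() calls but not the per-packet overhead.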

If, instead, you were to call pcap_dispatch(), with a packet-count argument of -1, that call will return all the packets that can be obtained with a single read operation and process all of them, so, on most platforms, you may get multiple packets with one or two system calls if there are multiple packets available (which, with high network traffic, as you might get if you're testing your program with a SYN flood, is likely to be the case).

In addition, on Linux systems that support memory-mapped packet capture (I think all 2.6 and later kernels do, and most if not all 2.4 kernels do), and with newer versions of libpcap, pcap_next_ex() has to make a copy of the packet to avoid having the kernel change the packet out from under the code processing the packet and to avoid "locking up" a slot in the ring buffer for an indefinite period of time, so there's an extra copy involved.
