I'm writing a program to reconstruct TCP streams captured by Snort. Most of the examples I've read regarding session reconstruction either:
- load the entire pcap file in to memory to start with (not a solution because of hardware constraints and the fact that some of the capture files are 10 GB in size), or
- cache each packet in memory as it reads through the capture and discards the irrelevant ones as it goes; this presents basically the same problems as reading the entire file in to memory
My current solution was to write my own pcap file parser since the format is simple. I save the offsets of each packet in a vector and can reload each one after I've passed it. This, like libpcap, only streams one packet in to memory at a time; I am only using sequence numbers and flags for ordering, NOT the packet data. Unlike libpcap, it is noticeably slower. processing a 570 MB capture with libpcap takes roughly 0.9 seconds whereas my code takes 3.2 seconds. However, I have the advantage of being able to seek backwards without reloading the entire capture.
If I were to stick with libpcap for speed issues, I was thinking I could just make a currentOffset
variable with an initial value of 24 (the size of the pcap file global header), push it to a vector every time I load a new packet, and increment it every time I call pcap_next_ex
by the size of the packet + 16 (for the size of the pcap record header). Then, whenever I wanted to read an individual packet, I could load it using conventional means and seek to packetOffsets[packetNumber]
.
Is there a better way to do this using libpcap?