I need to parse pcap files and count the packets by type (TCP, UDP, IP). I found many libraries for this, such as pcap and jnetpcap, but I want to do it without any external libraries. I do not need code, just a conceptual explanation.

Question

While parsing a pcap file, how should I distinguish between the frames (TCP, UDP, IP)? I tried reading about the format, but I do not understand how to tell how many bytes to read for a particular frame, or how to tell what type of frame it is. Only once I can extract the packets separately will I be able to filter out other information.

1
"I want to do this without using any external libraries" - then dive into the specs and reinvent the wheel. Asking us how to do so is too broad. – CodeCaster
pcap captures the data as it appears on the wire. To find out whether a packet is IP, TCP, UDP, etc., you have to understand how these protocols are transmitted. You will not find this in the pcap specification but in the IP, TCP, ... specifications. – Steffen Ullrich

1 Answer

1
votes

You'd have to parse each frame separately and keep a counter for each type you are trying to count. Assuming the capture you are examining is in pcap/pcapng format, you might find libpcap helpful.
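
On the "how many bytes should I read" part: in the classic pcap file format (not pcapng, which is block-based and laid out differently), the file starts with a 24-byte global header, and every frame is preceded by a 16-byte record header whose third 32-bit field (incl_len) says how many frame bytes follow. Below is a minimal sketch of walking those records without any library, assuming the whole file has already been read into memory. The names walk_pcap, frame_cb, sum_lengths and rd32 are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PCAP_GLOBAL_HDR_LEN 24   /* magic, version, zone, sigfigs, snaplen, linktype */
#define PCAP_RECORD_HDR_LEN 16   /* ts_sec, ts_usec, incl_len, orig_len */

/* Hypothetical callback type: receives one captured frame at a time. */
typedef void (*frame_cb)(const uint8_t *frame, uint32_t len, void *user);

/* Decode a 32-bit field from 4 raw bytes in the file's byte order. */
static uint32_t rd32(const uint8_t *p, int big_endian) {
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Walk every record in a classic pcap image held in buf[0..len).
 * Returns the number of frames delivered, or -1 if the magic number
 * is not recognised or a record runs past the end of the buffer. */
long walk_pcap(const uint8_t *buf, size_t len, uint32_t *linktype,
               frame_cb cb, void *user) {
    if (len < PCAP_GLOBAL_HDR_LEN) return -1;

    /* The magic number is written in the capturing host's byte order,
     * so it also tells us how to decode every other header field. */
    uint32_t magic = rd32(buf, 0);
    int big;
    if (magic == 0xa1b2c3d4u || magic == 0xa1b23c4du) big = 0;       /* little-endian file */
    else if (magic == 0xd4c3b2a1u || magic == 0x4d3cb2a1u) big = 1;  /* big-endian file */
    else return -1;

    /* Link-layer type of every frame in the file; 1 means Ethernet. */
    if (linktype) *linktype = rd32(buf + 20, big);

    size_t off = PCAP_GLOBAL_HDR_LEN;
    long frames = 0;
    while (len - off >= PCAP_RECORD_HDR_LEN) {
        uint32_t incl_len = rd32(buf + off + 8, big);  /* bytes stored for this frame */
        off += PCAP_RECORD_HDR_LEN;
        if (incl_len > len - off) return -1;           /* truncated record */
        if (cb) cb(buf + off, incl_len, user);
        off += incl_len;
        frames++;
    }
    return frames;
}

/* Example callback: adds up the captured lengths into a uint64_t. */
void sum_lengths(const uint8_t *frame, uint32_t len, void *user) {
    (void)frame;
    *(uint64_t *)user += len;
}
```

A callback with this shape is where the per-frame protocol parsing would go. Checking that the link-layer type is 1 (Ethernet) before assuming Ethernet offsets avoids misreading captures taken on other link types.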

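To give a quick run of what you might have to do (assuming the lower level is Ethernet without VLAN tags):

#include <stdint.h>

uint64_t ip_count, tcp_count, udp_count;

void parse_pkt(const uint8_t *data, uint32_t data_len) {
    // need the 14-byte Ethernet header plus a minimal 20-byte IPv4 header
    if (data_len < 14 + 20) {
        return;
    }
    // EtherType is a 16-bit big-endian field at offset 12
    uint16_t ether_type = (uint16_t) ((data[12] << 8) | data[13]);

    if (ether_type != 0x0800) {   // not IPv4
        return;
    }
    ip_count += 1;

    const uint8_t *ip_hdr = data + 14;
    // the IPv4 protocol field is a single byte at offset 9, so no byte swap
    uint8_t protocol = ip_hdr[9];
    // protocol is either udp/tcp/sctp...etc
    if (protocol == 0x11) {          // 17 = UDP
        udp_count++;
    } else if (protocol == 0x06) {   // 6 = TCP
        tcp_count++;
    }
}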

// for each packet read from the capture (e.g. with libpcap's pcap_open_offline and pcap_next_ex), call parse_pkt with the data and data_len

This code is fragile. Jumping to fixed offsets is not a good idea without thorough length and type checks; this sketch, for instance, does not handle VLAN tags, IPv6, or link types other than Ethernet.
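
As a rough illustration of what such checks could look like, here is a sketch that validates lengths before each read, skips 802.1Q/802.1ad VLAN tags, and checks the IPv4 version and header-length (IHL) fields. It still assumes Ethernet and IPv4 only; classify_frame, be16 and the l4_kind enum are hypothetical names, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* L4_NONE covers both "not IPv4" and "too short or malformed". */
enum l4_kind { L4_NONE, L4_OTHER, L4_TCP, L4_UDP };

/* Read a 16-bit big-endian (network order) value byte by byte,
 * which avoids unaligned pointer casts. */
static uint16_t be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

enum l4_kind classify_frame(const uint8_t *data, size_t len) {
    if (len < 14) return L4_NONE;          /* no room for an Ethernet header */

    size_t off = 12;                       /* EtherType follows the two 6-byte MACs */
    uint16_t type = be16(data + off);
    off += 2;

    /* Each VLAN tag adds 4 bytes: 2 bytes of tag control info,
     * then the EtherType of whatever comes next. */
    while (type == 0x8100 || type == 0x88a8) {
        if (len - off < 4) return L4_NONE;
        type = be16(data + off + 2);
        off += 4;
    }

    if (type != 0x0800) return L4_NONE;    /* not IPv4 */
    if (len - off < 20) return L4_NONE;    /* shorter than a minimal IPv4 header */

    const uint8_t *ip = data + off;
    if ((ip[0] >> 4) != 4) return L4_NONE; /* version field must be 4 */

    /* IHL is the header length in 32-bit words; options make it exceed 5. */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < 20 || len - off < ihl) return L4_NONE;

    switch (ip[9]) {                       /* one-byte protocol field */
    case 6:  return L4_TCP;
    case 17: return L4_UDP;
    default: return L4_OTHER;              /* IPv4, but neither TCP nor UDP */
    }
}
```

Every return value other than L4_NONE would count as an IP packet, with L4_TCP and L4_UDP bumping their own counters as well. The IHL value also gives the offset of the TCP or UDP header (ip + ihl) if ports or payload are needed later.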