Extracting packet information using C++

Question

I have been messing with Wireshark for a while, and I wonder if anyone could help me. I recorded a random browsing session with it and saved it to a pcap file. I would like to create a C/C++ program (I know many exist, but I want to practise) that extracts all the information from the packets, like source and target IP, port used, data, etc. My end goal in learning is to extract an image, a YouTube video or anything else from the stream (I know, I'll have to group the packets, sort them, etc.), but that's a later project, I guess. :)

I am using libpcap (on Linux), and my code so far can read the offline file packet by packet. Since I know they are PPP packets in my case, if I load a self-defined structure with the data starting at the 20th byte of the packet, I can view the MAC addresses and the IP addresses.

My problems:

1) How do I determine, without Wireshark, what kind of data link type is used (Ethernet, Wi-Fi, PPP, etc.)?

2) How do I read further into the packets? If I read even one byte further, my program doesn't do anything; every variable comes back empty.

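I have a ppphdr struct, which contains:

#include <sys/types.h>  /* u_int16_t, u_char */

struct ppphdr {
    u_int16_t htype;   /* hardware type */
    u_int16_t ptype;   /* protocol type */
    u_char hplen;      /* hardware address length */
    u_char plen;       /* protocol address length */
    u_int16_t oper;    /* operation */
    u_char sha[6];     /* sender hardware (MAC) address */
    u_char spa[4];     /* sender protocol (IP) address */
    u_char tha[6];     /* target hardware (MAC) address */
    u_char tpa[4];     /* target protocol (IP) address */
};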

And I call this for every packet:

pppheader = (struct ppphdr*)(packet+20);

I use an offset of 20 because the PPP frame starts at the 20th byte. This gives back the sender and target MAC and IP addresses.

When I continue reading the next few bytes with the same kind of cast, just with a different struct, everything comes back empty, and the program stops after one packet. I'm trying to use this guide: http://www.tcpipguide.com/free/t_PPPGeneralFrameFormat.htm

You will have a much higher chance of getting a reasonable answer if you add a bit of code to the question, and also some information about exactly what capture files you have. Problem 2 especially sounds like a problem with your code. — hyde
(continuing from above) Once you have the frame, you can extract the IP headers, and from there either the TCP or UDP headers; each level exposes some specific bits of information. — Nim
I think I would recommend looking at the source code for tshark, tcpdump and other such tools. Basically, you need to know the structure of each layer of protocol headers/trailers for every type of packet you're interested in, which is, well... quite a lot of information... — twalberg

Vlad Lazarenko · Accepted Answer · 2013-11-20T16:29:01

How do I determine, without Wireshark, what kind of data link type is used (Ethernet, Wi-Fi, PPP, etc.)?

Wireshark itself works with different file formats. Two of them that you are probably interested in are "pcap" and "pcap-ng".

If you have recorded data in "pcap" format, the link type is stored in the "Link-layer header type" field in the pcap file header; see the pcap-savefile man page.
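
If you want to check this without libpcap, the classic pcap file header is small enough to parse by hand. The sketch below assumes the 24-byte layout described in the pcap-savefile man page (magic number, version, time zone, sigfigs, snaplen, then the link-layer header type at offset 20) and uses the magic number to detect the byte order the file was written in. The stored value is a LINKTYPE_ value, which usually, but not always, equals the matching DLT_ value. The function names are hypothetical, and pcap-ng files are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* A few LINKTYPE_ values as stored in capture files. */
#define LINKTYPE_ETHERNET 1
#define LINKTYPE_PPP      9

/* Read a 32-bit field in the byte order the file was written in. */
static uint32_t file_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
               (uint32_t)p[2] << 8 | p[3];
    return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
           (uint32_t)p[1] << 8 | p[0];
}

/* Return the link-layer header type from a classic pcap file header,
   or -1 if the header is too short or the magic number is unknown. */
long pcap_file_linktype(const unsigned char *hdr, size_t len)
{
    if (len < 24)
        return -1;
    /* Read the magic as little-endian; the result reveals the byte order. */
    uint32_t magic = file_u32(hdr, 0);
    int big_endian;
    if (magic == 0xA1B2C3D4u || magic == 0xA1B23C4Du)       /* us / ns timestamps */
        big_endian = 0;
    else if (magic == 0xD4C3B2A1u || magic == 0x4D3CB2A1u)  /* byte-swapped file */
        big_endian = 1;
    else
        return -1;
    /* Offset 20: link-layer header type; the LINKTYPE_ is in the low 16 bits. */
    return (long)(file_u32(hdr + 20, big_endian) & 0xFFFFu);
}
```

Read the first 24 bytes of the file with fread() and pass them in; when you open the file with libpcap instead, pcap_datalink() gives you the same information.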

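If you have recorded data in "pcap-ng" format, then the link type is stored in the Interface Description Block (there is one per capture interface, so interfaces in the same file can have different link types).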
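
A comparable sketch for pcap-ng, assuming the buffer starts with a Section Header Block (block type 0x0A0D0D0A, byte-order magic 0x1A2B3C4D at offset 8) and walking the blocks until the first Interface Description Block (block type 1), whose 16-bit LinkType field follows the block type and block total length. Only the first section is considered and only the first interface is reported; pcapng_first_linktype and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define PCAPNG_SHB_TYPE 0x0A0D0D0Au  /* Section Header Block */
#define PCAPNG_IDB_TYPE 0x00000001u  /* Interface Description Block */

/* Read 32/16-bit fields in the byte order declared by the section. */
static uint32_t ng_u32(const unsigned char *p, int be)
{
    return be ? ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3])
              : ((uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 | (uint32_t)p[1] << 8 | p[0]);
}
static uint16_t ng_u16(const unsigned char *p, int be)
{
    return be ? (uint16_t)(p[0] << 8 | p[1]) : (uint16_t)(p[1] << 8 | p[0]);
}

/* Return the LinkType of the first IDB in buf, or -1 if none is found. */
long pcapng_first_linktype(const unsigned char *buf, size_t len)
{
    if (len < 12)
        return -1;
    /* The SHB block type reads the same in either byte order. */
    if (ng_u32(buf, 0) != PCAPNG_SHB_TYPE)
        return -1;
    /* The byte-order magic decides how every later field is read. */
    int be;
    if (ng_u32(buf + 8, 0) == 0x1A2B3C4Du)
        be = 0;
    else if (ng_u32(buf + 8, 1) == 0x1A2B3C4Du)
        be = 1;
    else
        return -1;

    size_t off = 0;
    while (off + 12 <= len) {
        uint32_t type  = ng_u32(buf + off, be);
        uint32_t total = ng_u32(buf + off + 4, be);   /* whole block, incl. padding */
        if (total < 12 || total > len - off)
            return -1;                                /* truncated or corrupt block */
        if (type == PCAPNG_IDB_TYPE && total >= 20)
            return ng_u16(buf + off + 8, be);         /* LinkType field */
        off += total;
    }
    return -1;                                        /* no IDB in the buffer */
}
```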

You can read more about both formats in their respective format specifications.

If you are reading a pcap or pcap-ng file with libpcap, the pcap_datalink() routine will return a DLT_ value specifying the link-layer header type. See the list of link-layer header types for a description of the DLT_ values and the headers that correspond to them. DLT_EN10MB is for Ethernet (the "10MB" is historical - it's used for all Ethernet speeds); DLT_PPP is the most likely type for PPP. If you have Wi-Fi packets with Wi-Fi headers (if you don't capture in monitor mode, you'll probably get Ethernet headers, and DLT_EN10MB on Wi-Fi adapters), you'll get DLT_IEEE802_11; if you also have "radio metadata" headers before the 802.11 headers, you'll get something such as DLT_IEEE802_11_RADIO or DLT_IEEE802_11_RADIO_AVS or DLT_PRISM_HEADER.

Do NOT assume what the link-layer header type is for the packets you will get from libpcap. ALWAYS call pcap_datalink() to determine the link-layer header type, and use that to parse the packets; if your code doesn't know how to parse packets for a particular DLT_ value, it should report an error and exit.
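
To illustrate that dispatch step, here is a hedged sketch that computes where the network-layer header starts from the link-layer type, instead of hard-coding an offset such as packet+20. It handles only DLT_EN10MB and DLT_PPP; the numeric fallbacks (1 and 9) are assumed to match libpcap's headers. For PPP it assumes, per the LINKTYPE_PPP description, that a packet may start with the HDLC-like 0xFF 0x03 address/control bytes, followed by a 1-byte (compressed) or 2-byte protocol field. link_header_len is a hypothetical helper name.

```c
#include <stddef.h>

/* DLT_ values as defined in libpcap's headers; when <pcap.h> is included
   these macros already exist and the fallbacks below are skipped. */
#ifndef DLT_EN10MB
#define DLT_EN10MB 1   /* Ethernet, any speed */
#endif
#ifndef DLT_PPP
#define DLT_PPP 9      /* PPP, with or without HDLC-like framing */
#endif

/* Return the offset of the network-layer header for one captured packet,
   -1 if this code does not know the link-layer type, or -2 if the packet
   is too short to contain the link-layer header. */
long link_header_len(int dlt, const unsigned char *pkt, size_t caplen)
{
    switch (dlt) {
    case DLT_EN10MB:
        /* dst MAC + src MAC + EtherType; VLAN tags are not handled here */
        return caplen >= 14 ? 14 : -2;
    case DLT_PPP: {
        size_t off = 0;
        /* HDLC-like framing starts with address 0xFF and control 0x03. */
        if (caplen >= 2 && pkt[0] == 0xFF && pkt[1] == 0x03)
            off = 2;
        if (caplen < off + 1)
            return -2;
        /* Protocol field is 1 byte if compressed (low bit set), else 2. */
        size_t proto_len = (pkt[off] & 0x01) ? 1 : 2;
        if (caplen < off + proto_len)
            return -2;
        return (long)(off + proto_len);
    }
    default:
        return -1;   /* unknown DLT_: caller should report an error and exit */
    }
}
```

In a real program you would get the value once with pcap_datalink(handle), then call the helper for each packet returned by pcap_next_ex(), passing hdr->caplen as the length, since bytes beyond caplen are not in the buffer. A return of -1 is the "report an error and exit" case; pcap_datalink_val_to_name() can turn the value into a readable name for the message.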

How do I read further into the packets? If I read even one byte further, my program doesn't do anything; every variable comes back empty.

Assuming that you recorded Ethernet data, you need to parse the data in accordance with the standard specifications. For example, first parse the Ethernet frame. Even at that point, the Ethernet header can be of variable length (for example, when 802.1Q VLAN tags are present). Given that tcpdump/Wireshark does not record the Preamble and Start Frame Delimiter fields, you need to read the first 14 octets (destination MAC, source MAC and the EtherType/length field) to determine how much more you can/should read.
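
A minimal Ethernet sketch along those lines, assuming the frame as libpcap delivers it for DLT_EN10MB (no preamble or Start Frame Delimiter): the fixed part is 14 octets, 6 for the destination MAC, 6 for the source MAC and 2 for the EtherType, which is stored in network (big-endian) byte order. It skips 802.1Q VLAN tags (EtherType 0x8100), one reason the header length varies, and rejects 802.3 frames whose type field is actually a length (below 0x0600). struct eth_info and parse_ethernet are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct eth_info {
    unsigned char dst[6];
    unsigned char src[6];
    uint16_t ethertype;    /* e.g. 0x0800 = IPv4, 0x0806 = ARP, 0x86DD = IPv6 */
    size_t payload_off;    /* where the next header (IP, ARP, ...) starts */
};

/* Parse an Ethernet header of caplen captured bytes.
   Returns 0 on success, -1 if truncated or an 802.3 length frame. */
int parse_ethernet(const unsigned char *pkt, size_t caplen, struct eth_info *out)
{
    if (caplen < 14)
        return -1;
    memcpy(out->dst, pkt, 6);
    memcpy(out->src, pkt + 6, 6);
    size_t off = 12;
    uint16_t type = (uint16_t)(pkt[off] << 8 | pkt[off + 1]);  /* big-endian */
    /* Each 802.1Q tag adds 4 bytes (TPID 0x8100 + TCI) before the real type. */
    while (type == 0x8100) {
        if (caplen < off + 6)
            return -1;
        off += 4;
        type = (uint16_t)(pkt[off] << 8 | pkt[off + 1]);
    }
    if (type < 0x0600)
        return -1;   /* 802.3 length field: LLC parsing not handled here */
    out->ethertype = type;
    out->payload_off = off + 2;
    return 0;
}
```

For an IPv4 packet, ethertype is 0x0800 and the IP header starts at pkt + payload_off.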

After you are done with the Ethernet frame, you need to parse IP, then possibly UDP or TCP. Other data can be in other formats, but in each and every case you have to carefully study the format specification and parse the data accordingly. Reading one byte will not get you anywhere. So I'd recommend starting by learning the basic network layers (Ethernet, IP, UDP) first, and then getting back to the problem of parsing them.
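
Continuing one layer up, the sketch below parses an IPv4 header and, for TCP (protocol 6) or UDP (protocol 17), the source and destination ports. It assumes IPv4 only (no IPv6), uses the IHL and TCP data-offset fields to find the variable-length headers, trims link-layer padding using the IP Total Length field, and skips the ports on non-first fragments. It does not reassemble fragments or TCP streams, which extracting a whole image or video would require. struct ip_info and parse_ipv4 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields pulled from an IPv4 header and, for TCP/UDP, the transport header. */
struct ip_info {
    uint32_t src_addr, dst_addr;  /* host order, e.g. 0xC0A80102 = 192.168.1.2 */
    uint8_t protocol;             /* 6 = TCP, 17 = UDP */
    uint16_t src_port, dst_port;  /* left at 0 unless TCP/UDP */
    size_t payload_off;           /* offset (from ip) of application data */
    size_t payload_len;           /* captured bytes of application data */
};

static uint16_t get16(const unsigned char *p) { return (uint16_t)(p[0] << 8 | p[1]); }
static uint32_t get32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* ip points at the IPv4 header; len is how many captured bytes follow it.
   Returns 0 on success, -1 if the data is not IPv4 or is truncated. */
int parse_ipv4(const unsigned char *ip, size_t len, struct ip_info *out)
{
    if (len < 20 || (ip[0] >> 4) != 4)
        return -1;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;   /* header length incl. options */
    size_t total = get16(ip + 2);              /* IPv4 Total Length field */
    if (ihl < 20 || len < ihl || total < ihl)
        return -1;
    if (total < len)
        len = total;       /* ignore link-layer padding after the IP packet */

    out->protocol = ip[9];
    out->src_addr = get32(ip + 12);
    out->dst_addr = get32(ip + 16);
    out->src_port = out->dst_port = 0;
    out->payload_off = ihl;

    const unsigned char *l4 = ip + ihl;
    size_t l4len = len - ihl;
    int first_fragment = (get16(ip + 6) & 0x1FFF) == 0;  /* fragment offset 0 */

    if (first_fragment && out->protocol == 6) {          /* TCP */
        if (l4len < 20)
            return -1;
        size_t thl = (size_t)(l4[12] >> 4) * 4;          /* TCP data offset */
        if (thl < 20 || l4len < thl)
            return -1;
        out->payload_off = ihl + thl;
    } else if (first_fragment && out->protocol == 17) {  /* UDP */
        if (l4len < 8)
            return -1;
        out->payload_off = ihl + 8;
    }
    if (out->payload_off > ihl) {       /* a TCP or UDP header was parsed */
        out->src_port = get16(l4);
        out->dst_port = get16(l4 + 2);
    }
    out->payload_len = len - out->payload_off;
    return 0;
}
```

The payload bytes at ip + payload_off are what you would later group and reorder (by TCP sequence number) to rebuild a file from the stream.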

At the end of the day, Wireshark is an open source program that does most of what you want to do as an exercise, which means you can always download the source code, see what it does and learn from it.

Hope it helps. Good Luck!
