
I'm using pcap to capture TCP packets whose payload I would like to parse. My strategy is as follows:

  1. Get the Ethernet header and check whether its type is ETHERTYPE_IP (IP packet).
  2. Check whether the IP packet's protocol is IPPROTO_TCP (TCP packet).
  3. Check for payload size > 0 (size = ntohs(ip_header->total_length - ip->header_length*4 - sizeof(struct tcp_header))).
  4. Parse the payload (grab the host URL).
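A minimal sketch of steps 1 and 2, assuming Ethernet II framing without VLAN tags and reading the raw bytes directly instead of going through system header structs. The function name `ipv4_tcp_header` and the macro names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes and codes from the Ethernet II and IPv4 formats. */
#define ETH_HDR_LEN     14
#define ETHERTYPE_IPV4  0x0800
#define IPPROTO_TCP_NUM 6

/* Hypothetical helper: returns a pointer to the IPv4 header if the captured
 * frame is Ethernet II + IPv4 + TCP (steps 1 and 2), otherwise NULL.
 * Assumes no 802.1Q VLAN tag between the MAC addresses and the EtherType. */
const uint8_t *ipv4_tcp_header(const uint8_t *frame, size_t caplen)
{
    if (caplen < ETH_HDR_LEN + 20)          /* Ethernet + minimal IPv4 header */
        return NULL;

    /* Step 1: the EtherType is bytes 12-13, big-endian on the wire. */
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return NULL;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4)                  /* version nibble must be 4 */
        return NULL;

    /* Step 2: the protocol is byte 9 of the IPv4 header. */
    if (ip[9] != IPPROTO_TCP_NUM)
        return NULL;

    return ip;
}
```

Returning a pointer to the IP header lets the next step locate the TCP header relative to it.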

I haven't begun parsing the payload yet because I'm seeing discrepancies. Below is a printout of the payloads of 10 captured TCP packets, using the filter "host = www.google.com".

packet number: 3 : TCP Packet: Source Port: 80 Dest Port: 58723 No Data in packet

packet number: 4 : TCP Packet: Source Port: 58723 Dest Port: 80 No Data in packet

packet number: 5 : TCP Packet: Source Port: 58723 Dest Port: 80 Payload : GET / HTTP/1.1 Host: www.google.com User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4 Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: en-us Accept-Encoding: gzip, deflate Cookie: THICNT=25; SID=DQAAAKIAAAB2ktMrEftADifGm05WkZmlHQsiy1Z2v- Connection: keep-alive

packet number: 6 : TCP Packet: Source Port: 80 Dest Port: 58723 No Data in packet

packet number: 7 : TCP Packet: Source Port: 80 Dest Port: 58723 Payload: \272نu\243\255\375\375}\336H\221\227\206\312~\322\317N\236\255A\343#\226\370֤\245[\327`\306ըnE\263\204\313\356\3268 )p\344\301_Y\255\267\240\222x\364

packet number: 8 : TCP Packet: Source Port: 58723 Dest Port: 80 No Data in packet

packet number: 9 : TCP Packet: Source Port: 80 Dest Port: 58723 Payload: HTTP/1.1 200 OK Date: Mon, 29 Nov 2010 10:11:36 GMT Expires: -1 Cache-Control: private, max-age=0 Content-Type: text/html; charset=UTF-8 Content-Encoding: gzip Server: gws Content-Length: 8806 X-XSS-Protection: 1; mode=block \213

Why is there a discrepancy in the payloads and the ports? Ideally I would like to parse only packets like packet 5. How do I ignore packets like 7 and 9?

It's not clear which part of the packet you want to filter on. I'm assuming it's the source or destination port. - sashang
I would like to grab the host URL. I found that filtering packets by destination port 80 weeds out the unwanted ones, but what happens when someone accesses a URL on a non-standard port? - David
For example, SSH and SSL encrypt their data, so those packets would look like the ones you do not want. FWIW, a substantial amount of traffic may be like that. Use a regular expression (regex.h: regcomp, regexec, etc.) to select packets; look for packets containing blocks of readable characters. - jim mcnamara
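A minimal sketch of the regex suggestion applied to the asker's goal (step 4), using POSIX `regcomp`/`regexec` from `<regex.h>`. The helper name `extract_host` is hypothetical, and matching a `Host:` line is only one possible pattern for picking out HTTP requests:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the value of the first "Host:" header in an
 * HTTP payload into out (NUL-terminated, truncated to fit). Returns 1 on
 * success, 0 if no Host header was found, -1 on error. The payload is copied
 * into a NUL-terminated buffer because regexec() works on C strings; a NUL
 * byte inside binary data simply ends the search early. */
int extract_host(const unsigned char *payload, size_t len, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    /* REG_NEWLINE lets ^ match after each '\n'; [^\r\n]+ stops at the CRLF.
     * REG_ICASE because HTTP header names are case-insensitive. */
    if (regcomp(&re, "^Host:[ \t]*([^\r\n]+)",
                REG_EXTENDED | REG_ICASE | REG_NEWLINE) != 0)
        return -1;

    char *text = malloc(len + 1);
    if (text == NULL) {
        regfree(&re);
        return -1;
    }
    memcpy(text, payload, len);
    text[len] = '\0';

    int found = 0;
    if (regexec(&re, text, 2, m, 0) == 0 && outlen > 0) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n >= outlen)
            n = outlen - 1;                 /* leave room for the NUL */
        memcpy(out, text + m[1].rm_so, n);
        out[n] = '\0';
        found = 1;
    }
    free(text);
    regfree(&re);
    return found;
}
```

Compiling the pattern once at startup instead of per packet would be cheaper; it is compiled inside the function here only to keep the sketch self-contained.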

3 Answers


Only by analyzing the content. Nothing in the IP or TCP header marks a packet as an "HTTP request". Even "the first data packet in a connection" would not work, because with persistent connections several requests are sent over the same connection.
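One way to analyze the content is to check whether the payload starts with an HTTP/1.x request line. This is a hedged heuristic sketch; the function name and the list of methods are assumptions, and encrypted or non-HTTP traffic on any port simply fails the check:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical heuristic: returns 1 if the payload starts like an HTTP/1.x
 * request line ("METHOD SP request-target SP HTTP/1.x"), 0 otherwise.
 * Only the method token and " HTTP/1." on the first line are checked;
 * this is not a full HTTP parser. */
int looks_like_http_request(const unsigned char *p, size_t len)
{
    static const char *methods[] = {
        "GET ", "POST ", "HEAD ", "PUT ", "DELETE ",
        "OPTIONS ", "TRACE ", "CONNECT ", "PATCH "
    };
    int method_ok = 0;
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        size_t n = strlen(methods[i]);
        if (len >= n && memcmp(p, methods[i], n) == 0) {
            method_ok = 1;
            break;
        }
    }
    if (!method_ok)
        return 0;

    /* End of the first line (or of the captured data). */
    size_t eol = 0;
    while (eol < len && p[eol] != '\n')
        eol++;

    /* The version marker must appear on that first line. */
    static const char ver[] = " HTTP/1.";
    size_t vlen = sizeof ver - 1;
    for (size_t i = 0; i + vlen <= eol; i++)
        if (memcmp(p + i, ver, vlen) == 0)
            return 1;
    return 0;
}
```

Packet 5 passes this check, while the response header in packet 9 and the compressed body in packet 7 do not.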

Also, to be completely sure of catching every URI, you need to reassemble the TCP stream and then parse the HTTP request, because a URI can be split across two or more packets.
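A minimal sketch of reassembly for one direction of a single connection, assuming segments arrive in order. `struct stream_buf`, `stream_add`, and the 8192-byte buffer are hypothetical choices; a real implementation also has to key buffers by the connection 4-tuple and handle reordering, retransmission, and teardown:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-direction buffer for ONE TCP flow. Only segments that
 * arrive exactly in sequence are appended; everything else is dropped. */
struct stream_buf {
    uint32_t next_seq;      /* sequence number expected next (host order) */
    int      started;       /* 0 until the first data segment is seen */
    size_t   len;
    char     data[8192];
};

/* Append a segment; seq is the TCP sequence number already converted with
 * ntohl(). Returns 1 once the buffer holds a complete header block
 * (terminated by "\r\n\r\n"), 0 otherwise. */
int stream_add(struct stream_buf *s, uint32_t seq, const char *payload, size_t n)
{
    if (!s->started) {
        s->next_seq = seq;
        s->started = 1;
    }
    if (seq != s->next_seq || n > sizeof s->data - s->len)
        return 0;                          /* out of order, duplicate, or full */
    memcpy(s->data + s->len, payload, n);
    s->len += n;
    s->next_seq += (uint32_t)n;            /* wraps modulo 2^32, as TCP does */

    for (size_t i = 0; i + 4 <= s->len; i++)
        if (memcmp(s->data + i, "\r\n\r\n", 4) == 0)
            return 1;
    return 0;
}
```

For example, feeding `"GET / HTTP/1.1\r\nHo"` at sequence 1000 and then `"st: www.google.com\r\n\r\n"` at 1018 completes the request only after the second call, which is exactly the split a per-packet parser would miss.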


Like the IP header, the TCP header is variable-length, and you are not taking that into account. Rather than blindly subtracting sizeof(struct tcp_header) from the total packet size, you need to locate the TCP header within the IP data (it starts header_length * 4 bytes after the start of the IP header), then use its data offset field (which also has to be multiplied by 4, just like the IP header length field) to know where the actual payload begins.
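A sketch of locating the payload from both length fields, working on raw bytes. `tcp_payload` is a hypothetical name, and the caller still has to check that the payload was fully captured, since the snapshot length may cut it off:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given a pointer to an IPv4 header and the number of
 * captured bytes from that point, find the TCP payload using both header
 * length fields. Returns the payload length (possibly 0) and stores the
 * payload offset relative to ip in *off, or returns -1 on malformed input. */
long tcp_payload(const uint8_t *ip, size_t avail, size_t *off)
{
    if (avail < 20)
        return -1;
    size_t ip_hlen  = (size_t)(ip[0] & 0x0F) * 4;      /* IHL, 32-bit words */
    size_t ip_total = (size_t)((ip[2] << 8) | ip[3]);  /* total length, big-endian */
    if (ip_hlen < 20 || ip_total < ip_hlen || avail < ip_hlen + 20)
        return -1;

    const uint8_t *tcp = ip + ip_hlen;                 /* skip any IP options */
    size_t tcp_hlen = (size_t)(tcp[12] >> 4) * 4;      /* data offset, 32-bit words */
    if (tcp_hlen < 20 || ip_total < ip_hlen + tcp_hlen)
        return -1;

    *off = ip_hlen + tcp_hlen;
    return (long)(ip_total - ip_hlen - tcp_hlen);
}
```

With a 32-byte TCP header (12 bytes of options, common when timestamps are enabled), subtracting a fixed 20-byte header size would count those 12 option bytes as payload.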


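Your size calculation is incorrect: you can't do the subtraction in network byte order; you have to convert each multi-byte field to host byte order first. (The IP header length is a 4-bit field inside a single byte, so it needs no conversion, only the multiplication by 4.)

size = ntohs(ip_header->total_length) - ip->header_length * 4 - sizeof(struct tcp_header);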
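A small demonstration of the byte-order point using `ntohs()` from `<arpa/inet.h>`. The function names are hypothetical and the values are illustrative (a 300-byte total length minus 40 bytes of headers):

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Both functions take the 2-byte total-length field exactly as it appears
 * on the wire (big-endian) and subtract hdrs bytes of headers. */

/* Wrong: subtracts while the value is still in network byte order. */
uint16_t wrong_len(const uint8_t wire[2], uint16_t hdrs)
{
    uint16_t raw;
    memcpy(&raw, wire, 2);                 /* network byte order */
    return ntohs((uint16_t)(raw - hdrs));
}

/* Right: converts to host byte order first, then subtracts. */
uint16_t right_len(const uint8_t wire[2], uint16_t hdrs)
{
    uint16_t raw;
    memcpy(&raw, wire, 2);
    return (uint16_t)(ntohs(raw) - hdrs);
}
```

With the wire bytes `0x01 0x2C` (300), `right_len` gives 260 on any host. On a little-endian host (x86, for example) `wrong_len` returns a garbage value; on a big-endian host `ntohs()` is a no-op and both agree, which can hide the bug.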

However, as Remy Lebeau points out, you actually need to examine the data offset field in the TCP header to know where the payload starts.

The difference between packet 5 and packet 7 is that the former goes from the client to the server, while the latter is a response from the server to the client. That is why the ports are swapped; the source and destination addresses are swapped as well.

If you want to only look at packets coming from the client, check that the source address is equal to the client's address.
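A minimal sketch of that check, comparing the IPv4 source address (bytes 12 to 15 of the IP header) with a known client address parsed by `inet_pton()`. `is_from_client` is a hypothetical name, and the address used in the example is illustrative:

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical check: returns 1 if the IPv4 packet at ip was sent by the
 * client whose dotted-quad address is given (e.g. "192.168.1.10"),
 * 0 otherwise or if the address string is invalid. */
int is_from_client(const uint8_t *ip, const char *client)
{
    struct in_addr want;
    if (inet_pton(AF_INET, client, &want) != 1)
        return 0;
    uint32_t src;
    memcpy(&src, ip + 12, 4);          /* source address as it is on the wire */
    return src == want.s_addr;         /* both in network order: no ntohl needed */
}
```

Because it does not look at ports, this check also keeps requests sent to non-standard ports, the concern raised in the comments.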