I'm trying to set up a pipeline that uses awk to extract certain fields and the ASCII data (source IP, target IP, and payload) from each packet in a stream captured by tcpdump, but I'm having difficulty. I think the problem is that the payloads are arbitrary, so it's hard to find a fixed structure that awk can use to split the stream into records. Here's my current command:
sudo tcpdump -i en1 -A -q -l | awk '{ print "fields are", $3, $5, $8 }'
Here is a single packet from the output I'm trying to filter:
12:45:23.890302 IP 10.0.1.3.52695 > weblnb.fogcreek.com.http: tcp 739
E....M@.@...
T.........P-.....&.....
2U......GET /default.asp?pg=pgRss&ixDiscussGroup=5 HTTP/1.1
Host: discuss.joelonsoftware.com
User-Agent: Vienna/2.6.0.2601
Accept: */*
Accept-Encoding: gzip
Accept-Language: en-us
Cookie: __utma=261409944.1875583.1351297139.1362842383.1362868129.78; __utmz=261409944.1358134504.43.4.utmcsr=joelonsoftware.com|utmccn=(referral)|utmcmd=referral|utmcct=/; fb_SessionId=qc48cvnjvacl3jeo76l8qv69emn119; DBID=LTOJIXRXTFAPXDGFBKCAYLVCILYFCA; fbToken=lqdf3avvfodabtfvd5c4drt18107B8; sUniqueID=20121026230417-66.117.217.10-slb5btkgb5; __utma=131697940.47826445.1351869116.1360335377.1361680499.5; __utmz=131697940.1361680499.5.2.utmccn=(referral)|utmcsr=statcounter.com|utmcct=/p8568424/exit_link_activity/|utmcmd=referral
Connection: keep-alive
The desired output from this filter is
10.0.1.3.52695 weblnb.fogcreek.com.http: { E....M@.@...
T.........P-.....&.....
2U......GET /default.asp?pg=pgRss&ixDiscussGroup=5 HTTP/1.1
Host: discuss.joelonsoftware.com
User-Agent: Vienna/2.6.0.2601
Accept: */*
Accept-Encoding: gzip
Accept-Language: en-us
Cookie: __utma=261409944.1875583.1351297139.1362842383.1362868129.78; __utmz=261409944.1358134504.43.4.utmcsr=joelonsoftware.com|utmccn=(referral)|utmcmd=referral|utmcct=/; fb_SessionId=qc48cvnjvacl3jeo76l8qv69emn119; DBID=LTOJIXRXTFAPXDGFBKCAYLVCILYFCA; fbToken=lqdf3avvfodabtfvd5c4drt18107B8; sUniqueID=20121026230417-66.117.217.10-slb5btkgb5; __utma=131697940.47826445.1351869116.1360335377.1361680499.5; __utmz=131697940.1361680499.5.2.utmccn=(referral)|utmcsr=statcounter.com|utmcct=/p8568424/exit_link_activity/|utmcmd=referral
Connection: keep-alive}
Note: the filter should not be tied to the specific example above. The general structure of the filtered output should look like this:
$sourceip $targetip {$raw_packet_data/payload,_could_be_http_stream_or_just_plain_gibberish}
The payload field should end where the next packet begins, i.e. at the next header line containing $sourceip.
The awk filter should also handle every packet in the tcpdump output stream this way, not just a single one.
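For reference, here is a rough sketch of the kind of filter I have in mind. It assumes that every packet header line starts with a timestamp followed by IP, the source, > and the target, and that no payload line happens to look like that. The names packet_records and in_packet are just placeholders I made up, and I haven't verified it against live traffic.

```shell
# Hypothetical helper: turn tcpdump -A -q output into one record per packet.
# Assumption: a header line looks like "HH:MM:SS.ffffff IP src > dst: ...".
packet_records() {
  awk '
    # Header line: $1 is the timestamp, $3 the source, $5 the target.
    $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+$/ && $2 == "IP" && $4 == ">" {
      if (in_packet) print "}"          # the next header closes the previous record
      printf "%s %s {", $3, $5          # open a new record: source, target, "{"
      in_packet = 1
      n = 0                             # payload lines seen so far in this packet
      next
    }
    # Any other line is payload for the currently open record.
    in_packet {
      # First payload line follows "{ "; later ones start on a new line.
      printf "%s%s", (n++ ? "\n" : " "), $0
    }
    # Close the last record when the stream ends.
    END { if (in_packet) print "}" }
  '
}
```

It would be used as sudo tcpdump -i en1 -A -q -l | packet_records. The newline before each payload line is delayed so that the closing } lands directly after the last payload line, as in the desired output above.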
Any suggestions on how to implement this?