
I am trying to analyse a pcap file in Python using dpkt. I want to 1) count the unique IP addresses, 2) calculate the total number of bytes per flow, 3) count the packets per flow, and 4) compute the average packet size per flow.

I would appreciate it if anyone could help me with the Python code for this. Thanks.

This is what I have done so far:

import socket

import dpkt

if __name__ == "__main__":

    # Packet Counters
    counter=0
    ipcounter=0
    nonipcounter=0    
    tcpcounter=0
    udpcounter=0
    httpcounter=0
    httpscounter=0
    ipv4counter=0
    ipv6counter=0

    # Subnet Dictionary
    subnets = {}

    # Open file

    # Packet processing loop
    for ts,pkt in dpkt.pcap.Reader(open('tesst.pcap','rb')):
        counter+=1

        # Parse ethernet packet
        eth=dpkt.ethernet.Ethernet(pkt)
        ip=eth.data       

        #check if IP packet or non-ip packet
        if eth.type == dpkt.ethernet.ETH_TYPE_IP or eth.type == dpkt.ethernet.ETH_TYPE_IP6:
            ipcounter = ipcounter + 1
        else:
            nonipcounter = nonipcounter + 1    

        # IPV6 packets
        if eth.type==dpkt.ethernet.ETH_TYPE_IP6: 
            ipv6counter+=1     


        # IPV4 packets
        elif eth.type==dpkt.ethernet.ETH_TYPE_IP:
            ipv4counter+=1

            # Extract destination
            string = socket.inet_ntoa(ip.dst)
            address = '.'.join(string.split(".")[:]) 
            if address in subnets: #increase count in dict
                subnets[address] = subnets[address] + 1
            else: #insert key, value in dict
                subnets[address] = 1            

            # TCP packets
            if ip.p==dpkt.ip.IP_PROTO_TCP: #ip.p == 6: 
                tcpcounter+=1
                tcp=ip.data

                # HTTP uses port 80
                if tcp.dport == 80 or tcp.sport == 80:
                    httpcounter+=1

                # HTTPS uses port 443
                elif tcp.dport == 443 or tcp.sport == 443:
                    httpscounter+=1


            # UDP packets
            elif ip.p==dpkt.ip.IP_PROTO_UDP: #ip.p==17:
                udpcounter+=1
                udp=ip.data


    # Print packet totals
    print ("Total number of ETHERNET packets in the PCAP file :", counter)
    print ("\tTotal number of IP packets :", ipcounter)
    print ("\t\tTotal number of TCP packets :", tcpcounter)
    print ("\t\t\tTotal number of HTTP packets :", httpcounter)
    print ("\t\t\tTotal number of HTTPS packets :", httpscounter)
    print ("\t\t\tTotal number of IPV6 packets :", ipv6counter)
    print ("\t\tTotal number of UDP packets :", udpcounter)    
    print ("\t\tTotal number of IPV4 packets :", ipv4counter)
    print ("\tTotal number of NON-IP packets :", nonipcounter)
    print ("--------------------------------------------------------------")
    # Remaining packets: not non-IP, HTTP, HTTPS or IPv6
    other = counter-(nonipcounter+httpcounter+httpscounter+ipv6counter)



    # Print addresses
    print ("Address \t \t Occurrences")
    for key, value in sorted(subnets.items(), key=lambda t: int(t[0].split(".")[0])):
        print ("%s/16 \t = \t %s" %(key, value))
What have you tried so far? Where have you looked? - wobr
What have you tried so far? Please post your code and possibly a data sample. - EricMPastore
I was able to output the total number of IP, TCP, UDP, IPv4 and IPv6 packets. - TeeGee
@EricMPastore I have posted the code of what I have done so far. - TeeGee
@wobr The code is working perfectly. I am having trouble with the code to output 1) the number of unique IP addresses, 2) the total number of bytes per flow, 3) the total number of packets per flow and 4) the average packet size per flow. I will post the output right away. - TeeGee

1 Answer


I've taken part of your initial code and added a first piece of functionality that gathers IPv4 TCP packets by flow and then prints the total bytes per flow. You would need to add handling for IPv6 and UDP, and for gathering the remaining stats, but hopefully this gets you most of the way there. This assumes a flow is defined wholly by the 4-tuple (src IP, src port, dst IP, dst port). In real life those values can eventually be reused by a later, unrelated flow, so that assumption doesn't quite hold, but again, I hope this gets you going.
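To illustrate the caveat about 4-tuples being reused, here is a minimal sketch that starts a new flow whenever the same 4-tuple has been silent for longer than an idle timeout. The 60-second timeout, the function names `split_flows` and `summarize`, and the sample records are all assumptions made for illustration; none of them come from dpkt, and the records would in practice be built from the fields extracted in the packet loop.

```python
from collections import defaultdict

IDLE_TIMEOUT = 60.0  # seconds; an assumed value, not something dpkt defines


def split_flows(packets, idle_timeout=IDLE_TIMEOUT):
    """Group (ts, src_ip, src_port, dst_ip, dst_port, size) records into flows.

    A 4-tuple starts a new flow whenever the gap since its previous
    packet exceeds idle_timeout.
    """
    last_seen = {}                 # 4-tuple -> timestamp of its latest packet
    generation = defaultdict(int)  # 4-tuple -> number of restarts so far
    flows = defaultdict(list)      # (4-tuple, generation) -> packet sizes

    for ts, src_ip, src_port, dst_ip, dst_port, size in sorted(packets):
        key = (src_ip, src_port, dst_ip, dst_port)
        if key in last_seen and ts - last_seen[key] > idle_timeout:
            generation[key] += 1
        last_seen[key] = ts
        flows[(key, generation[key])].append(size)
    return flows


def summarize(flows):
    """Return {flow_id: (packet_count, total_bytes, average_packet_size)}."""
    return {
        flow_id: (len(sizes), sum(sizes), sum(sizes) / len(sizes))
        for flow_id, sizes in flows.items()
    }


# Hypothetical records; the third packet arrives long after the first two.
packets = [
    (0.0, "10.0.0.1", 40000, "10.0.0.2", 80, 100),
    (1.0, "10.0.0.1", 40000, "10.0.0.2", 80, 300),
    (500.0, "10.0.0.1", 40000, "10.0.0.2", 80, 60),
]
stats = summarize(split_flows(packets))
for flow_id, (count, total, average) in stats.items():
    print(flow_id, count, total, average)
```

With these sample records the single 4-tuple is reported as two flows, because the last packet falls outside the assumed timeout. A real capture may call for a different timeout, or for watching TCP SYN/FIN flags instead.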

[Edit]:

Hopefully, this meets your requirements:

import dpkt
import socket

tflows = {}
uflows = {}
ips = set()

def dumpFlow(flows, flow):
    print(f'Data for flow: {flow}:')
    packets = flows[flow]
    # total bytes across every packet recorded for this flow
    total_bytes = sum(e['byte_count'] for e in packets)
    # duration is the gap between the earliest and latest timestamps
    timestamps = sorted(e['ts'] for e in packets)
    duration = timestamps[-1] - timestamps[0]
    print(f"\tTotal Packets: {len(packets)}")
    print(f"\tTotal Bytes: {total_bytes}")
    print(f"\tAverage Bytes: {total_bytes / len(packets)}")
    print(f"\tTotal Duration: {duration}")


for ts,pkt in dpkt.pcap.Reader(open('/tmp/tcpdump.pcap','rb')):
    eth=dpkt.ethernet.Ethernet(pkt)

    if eth.type==dpkt.ethernet.ETH_TYPE_IP:

        ip=eth.data

        # determine transport layer type; skip anything that is not TCP or UDP
        if ip.p==dpkt.ip.IP_PROTO_TCP:
            flows = tflows
        elif ip.p==dpkt.ip.IP_PROTO_UDP:
            flows = uflows
        else:
            continue

        # extract IP and transport layer data
        src_ip = socket.inet_ntoa(ip.src)
        src_port = ip.data.sport
        dst_ip = socket.inet_ntoa(ip.dst)
        dst_port = ip.data.dport

        # keeping set of unique IPs
        ips.add(src_ip)
        ips.add(dst_ip)

        # store flow data; sorting the two endpoints puts both directions
        # of a conversation under the same flow key
        flow = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        flow = (flow[0], flow[1])
        flow_data = {
            'byte_count': len(eth),
            'ts': ts
        }

        if flows.get(flow):
            flows[flow].append(flow_data)
        else:
            flows[flow] = [flow_data]


print(f'Total TCP flows: {len(tflows.keys())}')
print(f'Total UDP flows: {len(uflows.keys())}')
print(f'Total IPs: {len(ips)}')

for k in tflows.keys():
    dumpFlow(tflows, k)
for k in uflows.keys():
    dumpFlow(uflows, k)
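
The loop above only looks at `ETH_TYPE_IP` frames, and `socket.inet_ntoa` only accepts 4-byte IPv4 addresses, so IPv6 is still left to add. As a starting point, here is a small sketch of an address formatter that accepts either family. The helper name `addr_to_str` is hypothetical, and it assumes the address arrives as packed bytes, the way `ip.src` and `ip.dst` do in the code above.

```python
import socket


def addr_to_str(raw):
    """Format a packed IP address (4 bytes for IPv4, 16 for IPv6) as text."""
    if len(raw) == 4:
        return socket.inet_ntop(socket.AF_INET, raw)
    if len(raw) == 16:
        return socket.inet_ntop(socket.AF_INET6, raw)
    raise ValueError(f"unexpected address length: {len(raw)}")


print(addr_to_str(b"\xc0\xa8\x00\x01"))     # 192.168.0.1
print(addr_to_str(b"\x00" * 15 + b"\x01"))  # ::1
```

Replacing the `socket.inet_ntoa(...)` calls with this helper and widening the EtherType check to include `dpkt.ethernet.ETH_TYPE_IP6` would be one way to bring IPv6 packets into the same flow tables. The protocol check would likely need adapting for IPv6 packets as well.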