Team, I would like to extract features such as No_of_ARP_Request, No_of_TCP_SYN, Number_UDP_138, NBNS, MDNS, IGMP and ICMP data, as well as Src_MAC_Address, Dest_MAC_Address, Src_Port, Dest_Port, etc., from a Wireshark pcap file.
I have already extracted the ARP features with DPKT and saved them as CSV (code below). Does anyone have a better suggestion, or code, for extracting all of these features with DPKT and saving them as CSV? Thank you.

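import datetime
import socket

import dpkt

from src.arpbasic import mac_addr  # project-local helper (not part of dpkt)


def arp_analys(filename):
    with open("../data/" + filename + ".pcap", 'rb') as f:
        pcap = dpkt.pcap.Reader(f)

        requests = []
        replies = []

        for ts, buf in pcap:
            eth = dpkt.ethernet.Ethernet(buf)

            # Skip anything that is not ARP (EtherType 0x0806 == 2054)
            if eth.type != dpkt.ethernet.ETH_TYPE_ARP:
                continue
            try:
                arp = eth.arp
            except AttributeError:
                # Payload could not be decoded as ARP
                continue

            # Capture timestamp, formatted in UTC
            packet_time = datetime.datetime.fromtimestamp(
                ts, tz=datetime.timezone.utc).strftime("%m/%d/%Y,%H:%M:%S")

            # Sender and target IPv4 addresses from the ARP payload
            src = socket.inet_ntoa(arp.spa)
            tgt = socket.inet_ntoa(arp.tpa)

            # Src and Dest MAC
            s_mac = mac_addr(eth.src)
            d_mac = mac_addr(eth.dst)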

1 Answer


You can easily extract features (fields) from a dump using tshark's -e option:

-e Add a field to the list of fields to display if -T ek|fields|json|pdml is selected. This option can be used multiple times on the command line. At least one field must be provided if the -T fields option is selected. Column names may be used prefixed with "_ws.col."

Example: tshark -e frame.number -e ip.addr -e udp -e _ws.col.Info

Giving a protocol rather than a single field will print multiple items of data about the protocol as a single field. Fields are separated by tab characters by default. -E controls the format of the printed fields.
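
As a rough illustration of the -T fields / -e / -E options quoted above, here is a minimal Python sketch that runs tshark, splits its tab-separated output, and writes a CSV. The field selection in FIELDS and the helper names (build_tshark_command, parse_fields_output, pcap_to_csv) are my own assumptions for this example, not anything tshark prescribes, and it assumes tshark is on your PATH.

```python
import csv
import subprocess

# Assumed selection of Wireshark display-filter fields; adjust to your needs.
FIELDS = (
    "frame.time_epoch",
    "eth.src", "eth.dst",
    "tcp.srcport", "tcp.dstport",
    "udp.srcport", "udp.dstport",
)


def build_tshark_command(pcap_path, fields=FIELDS):
    """Build the tshark argument list for tab-separated field output."""
    # -E occurrence=f keeps only the first occurrence of a repeated field.
    cmd = ["tshark", "-r", pcap_path, "-T", "fields", "-E", "occurrence=f"]
    for name in fields:
        cmd += ["-e", name]
    return cmd


def parse_fields_output(text, fields=FIELDS):
    """Split tshark's tab-separated lines into one dict per packet."""
    rows = []
    for line in text.splitlines():
        if not line:
            continue
        values = line.split("\t")
        # Pad short lines so every requested field gets a value.
        values += [""] * (len(fields) - len(values))
        rows.append(dict(zip(fields, values)))
    return rows


def pcap_to_csv(pcap_path, csv_path, fields=FIELDS):
    """Run tshark on pcap_path and write the selected fields to csv_path."""
    result = subprocess.run(
        build_tshark_command(pcap_path, fields),
        capture_output=True, text=True, check=True,
    )
    rows = parse_fields_output(result.stdout, fields)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(fields))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling pcap_to_csv("dump", "features.csv") would then produce one CSV row per packet; fields a packet does not carry (for example tcp.srcport on a UDP packet) stay empty.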

$ tshark -r dump -e tcp.srcport -Tjson
[
  {
    "_index": "packets-2019-04-14",
    "_type": "pcap_file",
    "_score": null,
    "_source": {
      "layers": {
        "tcp.srcport": [
          "42130"
        ]
      }
    }
  }
]
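
To show how little code is needed to consume that JSON, here is a small sketch that flattens it into plain dicts. It assumes tshark was run with -e, so that each entry under "layers" is a list of strings as in the output above; the function name flatten_tshark_json is just an illustrative choice.

```python
import json


def flatten_tshark_json(text):
    """Turn `tshark -T json -e ...` output into a list of {field: value} dicts."""
    rows = []
    for packet in json.loads(text):
        layers = packet.get("_source", {}).get("layers", {})
        row = {}
        for name, values in layers.items():
            if isinstance(values, list):
                # Keep the first occurrence; an empty list becomes "".
                row[name] = values[0] if values else ""
            else:
                row[name] = values
        rows.append(row)
    return rows


sample = """
[
  {
    "_index": "packets-2019-04-14",
    "_type": "pcap_file",
    "_score": null,
    "_source": {
      "layers": {
        "tcp.srcport": ["42130"]
      }
    }
  }
]
"""
rows = flatten_tshark_json(sample)
print(rows)
```

Rows in this shape can be handed straight to csv.DictWriter.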

If you'd like to post-process the data in Python, I'd recommend running tshark with one of the -T output formats (such as -T json, as above) and then parsing that output in your code.
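
For the counters named in the question, the post-processing could look roughly like the sketch below. It works on per-packet dicts keyed by tshark field names. Several things here are my assumptions rather than facts from the question: that No_of_TCP_SYN means SYN set without ACK, that Number_UDP_138 means UDP packets with source or destination port 138, that tcp.flags is printed as a hex string, and that the Protocol column was requested as _ws.col.Protocol.

```python
from collections import Counter


def count_features(rows):
    """Count per-capture features from per-packet dicts of tshark fields."""
    counts = Counter()
    for row in rows:
        # ARP opcode 1 is a request, 2 is a reply.
        if row.get("arp.opcode") == "1":
            counts["No_of_ARP_Request"] += 1

        # Assumption: tcp.flags is a hex string such as "0x0002".
        # SYN is bit 0x02 and ACK is bit 0x10; count SYN without ACK.
        flags = row.get("tcp.flags") or ""
        if flags and (int(flags, 16) & 0x12) == 0x02:
            counts["No_of_TCP_SYN"] += 1

        # Assumption: either UDP port being 138 counts.
        if "138" in (row.get("udp.srcport"), row.get("udp.dstport")):
            counts["Number_UDP_138"] += 1

        # Assumption: the Protocol column was requested as _ws.col.Protocol.
        proto = row.get("_ws.col.Protocol") or ""
        if proto.startswith("IGMP"):
            counts["IGMP"] += 1
        elif proto in ("NBNS", "MDNS", "ICMP"):
            counts[proto] += 1
    return counts


example_rows = [
    {"arp.opcode": "1", "_ws.col.Protocol": "ARP"},
    {"arp.opcode": "2", "_ws.col.Protocol": "ARP"},
    {"tcp.flags": "0x0002", "_ws.col.Protocol": "TCP"},
    {"tcp.flags": "0x0012", "_ws.col.Protocol": "TCP"},
    {"udp.srcport": "138", "udp.dstport": "138", "_ws.col.Protocol": "BROWSER"},
    {"_ws.col.Protocol": "IGMPv2"},
    {"_ws.col.Protocol": "MDNS"},
]
feature_counts = count_features(example_rows)
```

The resulting Counter behaves like a dict, so it can be written out as a single CSV row per capture file.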

Something like pdml2frame can help you with the parsing. It should be simple to write a new plugin which does what you want.

Disclosure: I wrote pdml2flow.