I would like to use gnuplot to create a histogram of how often each text value occurs in a single-column dataset. Any help would be appreciated. An example of the dataset:

UDP
TCP
TCP
UDP
ICMP
ICMP
ICMP
TCP

Single column or single row? How large is the dataset, approximately how many items? - theozh
Sorry, single column. I couldn't get it to display in column form. Forgive the formatting. There are 10 unique items that make up a total of 3000 records. - kbb

1 Answer

There are similar questions, e.g. "gnuplot automatic stack bar graph", but they are still a bit different. The following example creates some test data. If you already know the keywords and want them in a certain order, skip the step of creating a unique list and define Uniques = '...' yourself. It can be advantageous to enclose the items in double quotes in case some keywords contain spaces.

  • create a unique list of your keywords.
  • define a lookup function via (mis)using the sum function (check help sum)
  • use the plot option smooth (check help smooth frequency) by taking the lookup index as x
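If the keywords are known in advance, the hand-written list mentioned above might look like the following minimal sketch. The keyword names, including the one containing a space, are purely illustrative and not taken from the asker's data. It relies on the same behaviour the lookup in this answer depends on, namely that word() and words() treat a double-quoted substring as a single item.

```gnuplot
# Hypothetical hand-written keyword list (names are illustrative only).
# The double quotes are meant to keep "NET BIOS" together as one item.
Uniques = '"UDP" "TCP" "ICMP" "NET BIOS"'
N = words(Uniques)

# list each item with its index; this index becomes the x position of the bar
do for [i=1:N] {
    print sprintf("%d: %s", i, word(Uniques,i))
}
```

The order of the items in the string determines the order of the bars, so rearranging the list is enough to reorder the histogram.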

Code:

### histogram: occurrences of keywords
reset session

# create some random test data
myKeywords = 'UDP TCP ICMP ABC WWW NET COM FTP HTTP HTTPS'
set print $Data
    do for [i=1:3000] {
        print word(myKeywords,int(rand(0)*10)+1)
    }
set print

# create a unique list of strings from a column
addToList(list,col) = list.( strstrt(list,'"'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
set table $Dummy
    plot Uniques='' $Data u (Uniques=addToList(Uniques,1),'') w table
unset table

N = words(Uniques)
# map a keyword to its index in Uniques; sum serves as a loop and _idx stores the match
Lookup(s) = (sum [_i=1:N] (s eq word(Uniques,_i) ? _idx=_i : 0), _idx)

set xrange [1:N]
set xtics out
set ylabel "Counts"
set grid x,y
set offsets 0.5,0.5,0.5,0
set boxwidth 0.8

set style fill transparent solid 0.5 border
set key noautotitle

plot $Data u (Lookup(strcol(1))):(1):xtic(1) smooth freq w boxes
### end of code

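If the counts themselves are needed as numbers rather than as a plot, the same smooth freq step can be redirected into a datablock. This is only a sketch under assumptions: it uses the eight sample lines from the question as inline data, a hand-written Uniques list, a hypothetical datablock name $Counts, and a gnuplot 5.x version that can print datablocks.

```gnuplot
# sample data copied from the question
$Data <<EOD
UDP
TCP
TCP
UDP
ICMP
ICMP
ICMP
TCP
EOD

# hand-written keyword list (assumed here; it can also be built automatically)
Uniques = '"UDP" "TCP" "ICMP"'
N = words(Uniques)
Lookup(s) = (sum [_i=1:N] (s eq word(Uniques,_i) ? _idx=_i : 0), _idx)

# write the summed frequencies into $Counts instead of drawing them
set table $Counts
    plot $Data u (Lookup(strcol(1))):(1) smooth freq
unset table

# each data line holds the keyword index and the number of occurrences
print $Counts
```

The first column of the table is the index into Uniques, so word(Uniques,index) recovers the keyword belonging to each count.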