
(Edit: Histogram removed, not relevant and confusing.)

I want a boxplot to visualize statistical data. I have two data files, one for each type of data. In each file the first column holds the level (the x value) and the second column holds the value, with one line per data point and several points per level. I want the same levels from the two files shown next to each other. I came up with the following code:

Tournament5 = "#99ffff"; Sigmascaling = "#4671d5"
set terminal pngcairo
set output "generations_dev.png"
set yrange [0:17.5]
set ylabel "Maximum Compactness of Best Solutions"
set xlabel "Number of Generations"

set autoscale fix

set style fill solid 0.25 border -1
set style boxplot nooutliers pointtype 7 separation 3
set style data boxplot
set boxwidth 1

plot "generation_tour.data" using (1.0):2:(0):1, "generation_sig.data" using (2.0):2:(0):1

which gives me the following picture: (image: boxplot)

Now my problems/questions are:

  • Why do the x values only extend to the middle of the plot instead of all the way to the right? This happens once I add the second file; with just one file the plot uses the full width of the graphic.
  • I only need one x label for each pair, but I get it twice. How can I suppress one of them?

Thank you for any help!

Data File generation_sig.data

Data File generation_tour.data


1 Answer


Ok, that seems to be a bit tricky.

Two things. First, gnuplot seems to fail to produce a correct autoscale for the x values in your plotting case, so you would need to set an explicit xrange, as you already do for the yrange. Second, gnuplot seems to always use the values given in the levels column as xticlabel, without giving you the chance to suppress them.

Here is a possible solution. It relies on the data file keeping lines with equal values in the first column together in one block, with each block separated from the next by two empty lines, so that you can access each block via the index keyword and iterate over them:

...
"0" 14.49786677484523
"0" 14.691225516174955


"20" 10.28997920528754
"20" 8.764312035687594
...

Then you can use the following script to plot all those boxplots in the positions you want:

Tournament5 = "#99ffff"; Sigmascaling = "#4671d5"
set terminal pngcairo
set output "generations_dev.png"
set yrange [0:17.5]
set ylabel "Maximum Compactness of Best Solutions"
set xlabel "Number of Generations"

set autoscale xfix

set style fill solid 0.25 border -1
set style boxplot nooutliers pointtype 7
set style data boxplot
set boxwidth 1

stats "generation_sig.data" using 2 nooutput

plot for [i=0:STATS_blocks-1] "generation_sig.data" using (3*i):2 index i lt 1 title (i==0 ? 'Sigmascaling' : ''),\
     for [i=0:STATS_blocks-1] "generation_tour.data" using (3*i+1):2 index i lt 2 title (i==0 ? 'Tournament 5' : ''),\
     for [i=0:STATS_blocks-1] "generation_sig.data" using (3*i+0.5):(-1):xticlabel(1) index i w l notitle

The stats call is used to count the number of blocks contained in the data file. The third plot is deliberately placed outside the defined yrange (at y = -1), so it draws nothing visible; it only produces the xtics, centered between the two boxplots of each pair. You could also have used

plot for [i=0:STATS_blocks-1] "generation_sig.data" using (3*i):2:(0):1 index i lt 1,\
     for [i=0:STATS_blocks-1] "generation_tour.data" using (3*i+1):2 index i lt 2

that would give you xtics centered under the first of the two boxplots.

The output is: (image)

If you don't want to change the data files and you can use awk, then you could also add the empty lines on-the-fly with

cmd(file) = '< awk ''{if (NR != 1 && $1 != prev) print "\n"; prev=$1; print}'' '.file
plot for [i=0:STATS_blocks-1] cmd("generation_sig.data") using (3*i):2 index i lt 1 # ....
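To check what that awk filter emits before handing it to gnuplot, it can be run on its own in a shell. This is only a minimal sketch: the function name `add_block_separators` and the sample values are made up for illustration, and the awk program is the same one as in `cmd(file)` above.

```shell
# Hypothetical helper wrapping the awk filter from the answer.
# Reads the named files, or stdin when no file is given.
add_block_separators() {
  # When the first column changes, print "\n": together with awk's own
  # record separator this yields two empty lines, which is what gnuplot's
  # index keyword uses to tell blocks apart.
  awk '{ if (NR != 1 && $1 != prev) print "\n"; prev = $1; print }' "$@"
}

# Made-up sample input: two levels ("0" and "20") with two points each.
# The two groups come out separated by two empty lines.
printf '"0" 14.5\n"0" 14.7\n"20" 10.3\n"20" 8.8\n' | add_block_separators
```

Note that the filter only starts a new block when the level changes from one line to the next, so all lines of a level have to be adjacent in the file. The stats call would presumably also have to read the filtered input, for example via cmd("generation_sig.data"), so that STATS_blocks reflects the inserted separators.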