I have been learning Gnuplot for about a day now, and I would like to use a boxplot to spot outliers in a data set at a glance.
So let us say I am conducting an experiment:
- On 10 subjects
- I make the 10 subjects repeat a task 100 times, to reach 3 specific targets.
- I collect how many times they reach Target1, Target2, Target3.
These results are gathered in the file data_File_new.dat, shown below:
Name Target1 Target2 Target3
subject1 10 30 50
subject2 11 31 51
subject3 9 29 49
subject4 12 32 52
subject5 8 28 48
subject6 13 33 53
subject7 7 27 47
subject8 50 34 54
subject9 6 50 46
subject10 15 35 20
Now I create a boxplot from this data:
file = 'data_File_new.dat'
header = system('head -1 '.file);
N=words(header)
set title 'BoxPlot Subject Success'
set ylabel 'Number Of Success'
set xtics border in scale 0,0 nomirror norotate offset character 0, 0, 0 autojustify
set xtics norangelimit
set xtics rotate -45
set xtics ('' 2)
set for [i=2:N] xtics add (word(header, i) i)
set style data boxplot
plot for [i=2:N] file using (i):i
The result is a boxplot with the outliers plotted as solid points (I wanted to post the picture, but I need 10 reputation to post an image). It tells me whether there are outliers or not. However, I want to know more: I want to know who the outliers are, that is:
- Subject 8 is an outlier for Target 1
- Subject 9 is an outlier for Target 2
- Subject 10 is an outlier for Target 3
Since Gnuplot knows these points are outliers, I expect it to store them in some kind of list. I would like to tell Gnuplot: 'plot the outliers and label each one with the word in the first column (subjectX) of the line it belongs to'.
Then, when I open the boxplot, I can see at a glance not only that there are outliers but also who they are.
Does anyone know how to do this? I looked on the forum and saw some people doing this in R but not in Gnuplot.
labels plotting style for some labelling. – Christoph
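One way the labels plotting style could be applied here is sketched below. This is only a sketch under some assumptions, not a built-in Gnuplot feature: it assumes gnuplot 5.2 or later (for arrays and the skip keyword), and it recomputes the outlier fences itself with the stats command, using the default boxplot rule of 1.5 times the interquartile range beyond the quartiles. The array names lo and hi are made up for illustration, and the quartiles reported by stats may differ slightly from the ones the boxplot style computes internally, so borderline points could be classified differently.

```gnuplot
file   = 'data_File_new.dat'
header = system('head -n 1 '.file)
N      = words(header)

set style data boxplot
set style boxplot range 1.5      # the default whisker factor, stated explicitly

# Hypothetical helper arrays: lower and upper outlier fence for each column.
array lo[N]
array hi[N]
do for [i=2:N] {
    # skip 1 ignores the header line; quartiles land in STATS_* variables
    stats file using i skip 1 nooutput
    iqr   = STATS_up_quartile - STATS_lo_quartile
    lo[i] = STATS_lo_quartile - 1.5*iqr
    hi[i] = STATS_up_quartile + 1.5*iqr
}

# First the boxplots, then a label (taken from column 1) next to every point
# that lies outside the fences; all other points evaluate to NaN and are skipped.
plot for [i=2:N] file using (i):i notitle, \
     for [i=2:N] file skip 1 \
         using (i):(column(i) < lo[i] || column(i) > hi[i] ? column(i) : NaN):1 \
         with labels offset character 5,0 notitle
```

The xtics and title settings from the question can be kept as they are in front of the plot command. The offset only moves each label to the right of its point so that the two do not overlap.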