Gnuplot: dealing with anomalous results

Question

I am using gnuplot version 4.6, patch level 3, on a Windows 8 machine, with the terminal set to wxt.

The file results.csv has a list of energies varying with radius of a sphere. I am using Gnuplot to produce a graph in order to show the trend.

Unfortunately, due to what can only be described as 'numerical instabilities' within the program that was used to compute these energies, results.csv includes anomalous results. Thus plotting results.csv with:

set datafile separator ","
set title "Oxygen3 Point Vacancy Defect Energy Variation with \n Radius of Region I in Mott-Littleton Defect Model"
set xlabel 'Radius of Region I (Å)'
set ylabel 'Defect Energy (eV)'
set grid
unset key
set border 3
set xtics border nomirror outwards
set ytics border nomirror outwards
set format y '%.2f'
plot 'results.csv' using 2:4 smooth unique with linespoints lw 3 lc rgb 'black'

gives the following graph: [image]

[N.B. I have reduced the number of datalines for this example]

As I want the overall trend, I want to skip the point at radius = 16. However, changing my plot command to:

plot 'results.csv' using 2: ($4 > 20 ? $4 : 1/0) smooth unique with linespoints lw 3 lc rgb 'black'

results in: [image]

Does anyone have any suggestions as to what is making gnuplot connect the x=9 point to x=17, and how to overcome this problem?

Also, how do I skip the anomalous data point when I try and fit a 'line of best fit'?

Any help would be much appreciated

Christoph · Accepted Answer · 2013-10-22T07:17:06

Ignoring anomalous points when plotting

In principle, gnuplot knows two kinds of special points:

missing points, which can be declared with set datafile missing. These points are skipped during plotting and don't affect the line plot.
undefined points, like 1/0. These points are also skipped, but the line plot is interrupted at them.
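
The difference between the two kinds of points can be seen with a small toy example. This is only a sketch: the inline data and the '?' marker are made up for illustration, and the details of missing-value handling have changed between gnuplot versions, so check the result against your own installation.

```gnuplot
# '?' is a hypothetical marker for missing values in this toy data
set datafile missing '?'

# Column given by number: the '?' row is treated as missing,
# so the line runs straight from x=1 to x=3.
plot '-' using 1:2 with linespoints
1 10
2 ?
3 12
e

# Column given as an expression that yields 1/0 for the x=2 row:
# that point is undefined, so the line is broken between x=1 and x=3.
plot '-' using 1:($1 == 2 ? 1/0 : $2) with linespoints
1 10
2 11
3 12
e
```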

Unfortunately, missing data points can't be used when calculations (like the filtering in your case) are involved; see e.g. the question 'In gnuplot, with "set datafile missing", how to ignore both "nan" and "-nan"?'.

So, the best way is to use an external tool to filter the data:

plot '< awk -F, ''{ if ($4 > 20) print }'' results.csv' using 2:4 smooth unique

But this requires that every point that enters the averaging fulfills the criterion.

Ignoring anomalous points when fitting

As for ignoring anomalous points during fitting, I recently gave an answer to the question "Ignore points far away from mean gnuplot".

I think you can do the fitting without smoothing; possibly it's even better than smoothing first and then fitting. Fitting the smoothed data would ignore the distribution of the individual data points and the fact that some of them are more reliable than others.
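
A minimal sketch of fitting the raw data directly, assuming the same threshold of 20 used in the question. The straight-line model f(x) and the starting values of a and b are assumptions for illustration only; substitute whatever trend you actually expect. Points that evaluate to 1/0 are undefined and should be left out of the fit.

```gnuplot
set datafile separator ","

# Hypothetical model: a straight line (an assumption, not taken from the data)
f(x) = a*x + b
a = 1; b = 1   # rough starting values, also an assumption

# Rows whose column 4 is not above 20 become undefined (1/0) and are skipped
fit f(x) 'results.csv' using 2:($4 > 20 ? $4 : 1/0) via a, b

# Show the accepted points together with the fitted line
plot 'results.csv' using 2:($4 > 20 ? $4 : 1/0) with points, f(x) with lines
```

Plotting with points rather than linespoints sidesteps the interrupted-line problem, since there is no line to break.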

If you still want to use the smoothed data, you must first write them to a temporary file, because smooth unique can't be used with fit:

set output '| head -n -2 > results.tmp'
set table
plot '< awk -F, ''{ if ($4 > 20) print }'' results.csv' using 2:4 smooth unique
unset table

Concerning the set output part, see the question "Why does the 'set table' option in Gnuplot re-write the first entry in the last line?".
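
Once the temporary file exists, the fit itself might look like the sketch below. The linear model f(x) and the starting values are assumptions for illustration. Note that files written by set table are whitespace-separated, so the comma separator used for results.csv has to be switched off before reading results.tmp.

```gnuplot
# Close the pipe opened by the earlier 'set output' so results.tmp is complete
set output

# Tables written by 'set table' use whitespace between columns, not commas
set datafile separator whitespace

# Hypothetical model: a straight line (an assumption)
f(x) = a*x + b
a = 1; b = 1   # rough starting values, also an assumption

# Column 1 is the radius, column 2 the averaged energy
fit f(x) 'results.tmp' using 1:2 via a, b
```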
