It's not entirely clear to me what exactly you want to achieve. Nevertheless, I also have the feeling that, as mentioned by @KevinBoone in the comments, you are trying to compute some kind of binned statistic on the data. If this is the case, then Gnuplot is unfortunately not the right tool for the task. In my opinion, it would be much more practical to delegate this processing to something more appropriate.
As an example, let's say that the strategy would indeed be:
- load all the csv files in the current directory
- divide the x-range into M bins and calculate the average of the y-values that fall into each of the bins
- plot this "averaged" data
To this end, one might prepare a short Python script (implementing the steps outlined above) based on the binned_statistic function provided by SciPy (scipy.stats). The required number of bins is passed as the first argument, while the remaining arguments are interpreted as csv files to process:
#!/usr/bin/env python
import sys

import numpy as np
from scipy.stats import binned_statistic

# first argument: number of bins; remaining arguments: csv files
num_of_bins = int(sys.argv[1])

data = []
for fname in sys.argv[2:]:
    with open(fname, 'r') as F:
        for line_id, line in enumerate(F):
            # skip the first three lines of each file
            if line_id < 3:
                continue
            cols = line.strip().split(',')
            # x and y are taken from the 3rd and 4th columns
            x, y = float(cols[2]), float(cols[3])
            data.append((x, y))
data = np.array(data)

# mean of y in each of num_of_bins equal-width bins spanning the x-range
stat, bin_edges, _ = binned_statistic(data[:, 0], data[:, 1], 'mean', bins=num_of_bins, range=None)

# print "bin center,mean" for each bin
for val, (lb, ub) in zip(stat, zip(bin_edges, bin_edges[1:])):
    print('%E,%E' % ((lb + ub) / 2, val))
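To make it concrete what the script above computes, here is a minimal sketch on toy data. The six points are made up purely for illustration; only the call to binned_statistic and the bin-center formula are taken from the script.

```python
import numpy as np
from scipy.stats import binned_statistic

# Toy data (made up for illustration): six (x, y) points.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])

# Three equal-width bins over [min(x), max(x)] = [0, 5].
stat, bin_edges, _ = binned_statistic(x, y, 'mean', bins=3)

# Bin centers, computed the same way as in the script.
centres = (bin_edges[:-1] + bin_edges[1:]) / 2

for c, v in zip(centres, stat):
    print('%E,%E' % (c, v))
```

Each bin here contains two points, so the three printed averages are 2, 6 and 10, located at the bin centers 0.8333, 2.5 and 4.1667.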
Now, in Gnuplot, we can invoke this script externally (let's say it is stored in the current working directory as stat.py) and plot its output together with the individual files:
set terminal pngcairo enhanced
set output 'fig.png'
#get all csv files in current directory as a space-delimited string
files = system("ls *.csv | xargs")
#construct a "pretty" label from the file name
getLabel(fname)=system(sprintf('echo "%s" | gawk -F"-" "BEGIN{OFS=\"-\"} {NF=NF-2;print}"', fname))
set datafile separator ","
set key spacing 1.5
LINE_WIDTH = 1.25
plot \
    for [filename in files] filename u 3:4 w l lw LINE_WIDTH t getLabel(filename), \
    sprintf('<python ./stat.py 20 %s', files) w l lw 3*LINE_WIDTH lc rgb 'red' t 'average'
With some of the sample data you provided in the comments, this produces:
However, as pointed out by @KevinBoone, whether this "average" has a justifiable mathematical meaning in your specific setting is another question on its own...
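One simple sanity check, sketched here under the assumption that the same binned_statistic approach is used, is to look at how many points actually contribute to each bin. The data below is made up for illustration: the points cluster at both ends of the x-range, so the middle bins are empty and their mean comes out as NaN.

```python
import numpy as np
from scipy.stats import binned_statistic

# Toy data (made up): points at both ends of the range, none in the middle.
x = np.array([0.0, 0.1, 0.2, 0.9, 1.0])
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0])

# Per-bin mean and per-bin number of points, using the same four bins.
mean, edges, _ = binned_statistic(x, y, 'mean', bins=4)
count, _, _ = binned_statistic(x, y, 'count', bins=4)

centres = (edges[:-1] + edges[1:]) / 2
for c, m, n in zip(centres, mean, count):
    note = '' if n > 0 else '  <- empty bin, mean is NaN'
    print('%.3f  mean=%s  n=%d%s' % (c, m, n, note))
```

Bins with very few (or zero) points are a hint that the plotted "average" should be read with care in that region, or that fewer bins should be requested.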
fit you can fit any function to your data. – Christoph