Python: average and standard deviation of specific columns among multiple files and plot the average with standard deviation bar

Question

I have input data that looks like the sample below. For each row, I want the average and the standard deviation of the 6th column across the files. I also need a plot with the 1st column on the x-axis and the average on the y-axis, with the standard deviation shown as error bars.

I have attached my script, which can only plot the 1st column against the 6th column. I do not know how to compute the average and plot it together with the data. Any help would be appreciated.

Script:

import numpy as np
import matplotlib.pyplot as plt

x1, x2, y1, y2 = [], [], [], []

# Read the 1st and 6th columns of the first file
with open("1.txt") as f:
    for line in f:
        cols = line.split()
        x1.append(float(cols[0]))
        y1.append(float(cols[5]))

# Plot once, after the whole file has been read
plt.plot(x1, y1, 'r-', label="300_temp")

# Same for the second file
with open("2.txt") as f:
    for line in f:
        cols = line.split()
        x2.append(float(cols[0]))
        y2.append(float(cols[5]))

plt.plot(x2, y2, 'g-', label="800_temp")

plt.title('final_output')
plt.xlabel('time_fs')
plt.ylabel('interstitial')
plt.legend()
plt.tight_layout()
plt.savefig("final_interstitial.jpeg", dpi=100)

Input Data structure:

1.txt

40.1 -970181.423308824 25086.8510704775 1030.68868052956 2.98863069261149 34845
40.2 -969291.275241766 24578.0340950803 1002.86354474784 3.27434173388944 40208.5
40.3 -968489.350679405 24160.1307947391 977.795055894274 3.55155208480988 45345
40.4 -967676.040718834 23644.7886925808 952.370742000842 3.81838293934396 50205
40.5 -966981.971290069 23225.0631104031 930.672470146222 4.07354498687891 55854.5
40.6 -966254.82735723 22651.1303668863 907.940243789837 4.31555138493202 62278.5
40.7 -965668.239087129 22190.7422743739 889.603544698553 4.54318654063522 67333
.
.
.

2.txt

40.1 -955398.198359867 33344.4512324167 1408.73933784128 3.12396891367147 36796.5
40.2 -954229.783369542 32683.9304617525 1372.22031719846 3.42945308560201 42943.5
40.3 -953191.590417265 32095.1208511191 1339.76973308475 3.73344595502824 49085
40.4 -952117.587463572 31487.7205358262 1306.19919339234 4.03307586152993 56499.5
40.5 -951132.223115772 30875.4404971051 1275.39557738745 4.32525826680283 64040.5
40.6 -950246.534420928 30277.6289073256 1247.7121918422 4.60798342893888 71283
40.7 -949410.920964954 29712.2289807824 1221.59019340933 4.8790799203458 78363.5
.
.
.

How do you plan to plot the average? The average will be a single number. You want the whole 1st column on x-axis and just a single number on y-axis? Do you see my point? — Sheldore
The average will not be a single number. Row 1 of column 6 in the 1st file is averaged with row 1 of column 6 in the 2nd file, then row 2 with row 2, and so on. In other words, column 6 of the 1st file is averaged element-wise with column 6 of the 2nd file. — Alex

Sharu Sharu · Accepted Answer · 2018-12-26T06:02:50

If I understand correctly, you want to plot the observations from the first file, the corresponding observations from the second file, and the average of the two at each point. A good way to do this is to first define a function that reads a file and converts its contents to numerical values with numpy:

import numpy as np
import matplotlib.pyplot as plt 

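def read_data(filename):
    # Read a whitespace-separated text file into a 2-D float array
    with open(filename) as f:
        # split() handles repeated spaces or tabs; blank lines are skipped
        rows = [line.split() for line in f if line.strip()]
    return np.array([[float(col) for col in row] for row in rows])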
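
As an alternative, NumPy's own `np.loadtxt` parses whitespace-separated numeric text directly. A small sketch, using an in-memory buffer in place of a file (the two rows are copied from 1.txt in the question; the name `sample` is only for illustration):

```python
import io

import numpy as np

# Stand-in for a file on disk; passing a filename such as '1.txt' works the same way
sample = io.StringIO(
    "40.1 -970181.423308824 25086.8510704775 1030.68868052956 2.98863069261149 34845\n"
    "40.2 -969291.275241766 24578.0340950803 1002.86354474784 3.27434173388944 40208.5\n"
)

data = np.loadtxt(sample)  # 2-D float array, one row per line
print(data.shape)
print(data[:, 5])          # the 6th column
```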

Then we read the files and plot the data with the average.

data_1 = read_data('1.txt')
data_2 = read_data('2.txt')

plt.plot(data_1[:, 0], data_1[:, 5], 'r-', label="300_temp")
plt.plot(data_2[:, 0], data_2[:, 5], 'g-', label="800_temp")

# Element-wise average of the 6th column (index 5) from both files
plt.plot(data_2[:, 0], (data_1[:, 5] + data_2[:, 5]) / 2, 'b-', label="300/800 temp avg")
plt.legend()
plt.show()

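Here we take advantage of how numpy indexing works. The array is 2-dimensional, so data[:, n] selects all rows of the n-th column (counting from 0), and data[n, :] selects all columns of the n-th row. Adding two columns and dividing by 2 operates element by element, which gives one average per row.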
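
The question also asks for the standard deviation and an error-bar plot. A minimal sketch of one way to do that, assuming every file has the same number of rows with matching x values in the first column. The helper names `mean_and_std` and `plot_mean_with_error_bars` and the default of the population standard deviation (`ddof=0`) are assumptions made for illustration, not something stated in the question:

```python
import numpy as np


def mean_and_std(datasets, col=5, ddof=0):
    """Row-wise mean and standard deviation of one column across several arrays.

    datasets: list of 2-D arrays with identical shapes (one per file).
    col: zero-based column index (5 is the 6th column).
    ddof: 0 gives the population std, 1 the sample std (an assumed default).
    """
    # Shape (n_files, n_rows): one row per file, one column per observation
    stacked = np.stack([np.asarray(d)[:, col] for d in datasets])
    return stacked.mean(axis=0), stacked.std(axis=0, ddof=ddof)


def plot_mean_with_error_bars(x, mean, std, filename="final_interstitial.jpeg"):
    # Imported here so the numerical helper above works without matplotlib
    import matplotlib.pyplot as plt

    plt.errorbar(x, mean, yerr=std, fmt='b-', ecolor='gray', capsize=3, label="average")
    plt.xlabel('time_fs')
    plt.ylabel('interstitial')
    plt.legend()
    plt.tight_layout()
    plt.savefig(filename, dpi=100)


# First two rows of 1.txt and 2.txt from the question
data_1 = np.array([
    [40.1, -970181.423308824, 25086.8510704775, 1030.68868052956, 2.98863069261149, 34845],
    [40.2, -969291.275241766, 24578.0340950803, 1002.86354474784, 3.27434173388944, 40208.5],
])
data_2 = np.array([
    [40.1, -955398.198359867, 33344.4512324167, 1408.73933784128, 3.12396891367147, 36796.5],
    [40.2, -954229.783369542, 32683.9304617525, 1372.22031719846, 3.42945308560201, 42943.5],
])

mean, std = mean_and_std([data_1, data_2])
print(mean)
print(std)
```

With the files on disk, the list could instead be `[read_data('1.txt'), read_data('2.txt')]`, followed by `plot_mean_with_error_bars(data_1[:, 0], mean, std)`. With only two files and `ddof=0`, the standard deviation is half the absolute difference between the two values, so the error bars become more informative as more files are added to the list.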
