I have a dataset like the one below. It contains thermometers of different colors and, for each 'True' (reference) temperature, the readings they give under two measurement methods, 'Method 1' and 'Method 2'.
I am having trouble calculating two parameters that I need: Mean Absolute Error (MAE) and Mean Signed Error (MSE). I want to use only the non-NaN values for each method and print the result.
I was able to get to the point where I can return a two-column series of index and sum. The problem is that I need to divide by the number of method values summed, which changes depending on how many NaNs there are in a row. I do NOT want to skip an entire row just because it contains a NaN.
| number | date | Thermometer | True Temperature | Method 1 | Method 2 |
|---|---|---|---|---|---|
| 0 | 1/1/2021 | red | 0.2 | 0.2 | 0.5 |
| 1 | 1/1/2021 | red | 0.6 | 0.6 | 0.3 |
| 2 | 1/1/2021 | red | 0.4 | 0.6 | 0.23 |
| 3 | 1/1/2021 | green | 0.2 | 0.4 | NaN |
| 4 | 1/1/2021 | green | 1 | 1 | 0.23 |
| 5 | 1/1/2021 | yellow | 0.4 | 0.4 | 0.32 |
| 6 | 1/1/2021 | yellow | 0.1 | NaN | 0.4 |
| 7 | 1/1/2021 | yellow | 1.3 | 0.5 | 0.54 |
| 8 | 1/1/2021 | yellow | 1.5 | 0.5 | 0.43 |
| 9 | 1/1/2021 | yellow | 1.5 | 0.5 | 0.43 |
| 10 | 1/1/2021 | blue | 0.4 | 0.3 | NaN |
| 11 | 1/1/2021 | blue | 0.8 | 0.2 | 0.11 |
My Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('default')
data = pd.read_csv('data.txt', index_col=0)
data
data["M1_ABS_Error"]= abs(data["True_Temperature"]-data["Method_1"])
data["M2_ABS_Error"]= abs(data["True_Temperature"]-data["Method_2"])
MAE_Series=data[['Name', 'M1_ABS_Error', 'M2_ABS_Error' ]]
MAE_Series.sum(axis=1, skipna=True)
The output currently looks something like the following. It doesn't show which color thermometer each value belongs to, and I would like it printed in a way that makes that association easy. Also, as I mentioned, it does not yet divide by the number of non-NaN values/methods in each row to account for NaN:
0 4.94
1 3.03
2 11.88
3 3.28
4 8.14
5 7.80
6 2.76
7 2.71
I would appreciate your help on this. Thanks!
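One possible approach, as a minimal sketch rather than a definitive solution: pandas `mean()` skips NaN by default, so it divides by the number of non-NaN values automatically, and grouping by the thermometer column keeps the color attached to each result. The column names (`Thermometer`, `True_Temperature`, `Method_1`, `Method_2`) and the small inline frame are assumptions based on the table above, and the signed error is taken here as measured minus true, which is only one possible convention.

```python
import numpy as np
import pandas as pd

# Small frame mirroring a few rows of the table above.
# Column names are assumed; adjust them to match the real file.
data = pd.DataFrame({
    "Thermometer": ["red", "red", "green", "green", "yellow", "yellow"],
    "True_Temperature": [0.2, 0.6, 0.2, 1.0, 0.4, 0.1],
    "Method_1": [0.2, 0.6, 0.4, 1.0, 0.4, np.nan],
    "Method_2": [0.5, 0.3, np.nan, 0.23, 0.32, 0.4],
})

methods = ["Method_1", "Method_2"]

# Signed error per method (measured minus true, an assumed convention).
# A NaN reading stays NaN, so it is excluded from the means below.
signed = data[methods].sub(data["True_Temperature"], axis=0)

# Row-wise mean absolute error across methods: mean() skips NaN,
# so each row is divided by its own count of non-NaN methods.
row_mae = signed.abs().mean(axis=1)

# Per-thermometer, per-method MAE and mean signed error.
# The result is indexed by thermometer color.
mae = signed.abs().groupby(data["Thermometer"]).mean()
mse = signed.groupby(data["Thermometer"]).mean()

print(mae)
print(mse)
```

If a single number per thermometer is wanted instead of one per method, `row_mae.groupby(data["Thermometer"]).mean()` would average the row-wise values, though that weights rows equally rather than individual readings.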