Numpy Array fill empty data to "uniformity"

Question

I have a 2D NumPy array with missing values, and I want to fill them in a way that gives the array a mathematical "uniformity". I have something like this:

[[72829],
 [nan],
 [73196],
 [73087],
 [nan],
 [nan],
 [72294.5]]

I want to fill those empty cells with the mean of the closest cells, to get something like this:

[[72829],
 [73012.5],
 [73196],
 [73087],
 [72888.875],
 [72492.625],
 [72294.5]]

I tried SimpleImputer and KNNImputer from scikit-learn, but all I got was the same value for every missing entry, not the mean of the neighbouring cells as described above. This is the code:

# data is a pandas DataFrame; imputer is a SimpleImputer or KNNImputer instance
for label, column in data.items():  # items() replaces iteritems(), which was removed in pandas 2.0
    reshaped = np.array(column.values)  # create a NumPy array to use with scikit-learn
    reshaped = reshaped.reshape(-1, 1)  # reshape the data into a 2D array
    normalized = imputer.fit_transform(reshaped)  # impute the missing values
    data[label] = normalized  # replace the column with the imputed values

With KNNImputer, I got something like this (which is not what I want):

[[72829],
 [68088.71106114],
 [73196],
 [73087],
 [68088.71106114],
 [68088.71106114],
 [72294.5]]

Does anyone know an idea or algorithm that could give this kind of "uniformity" to the array values? The goal is to be able to plot graphs without missing data. A solution based on pandas/NumPy/scikit-learn would be preferable. Thanks.

DSteman · Accepted Answer · 2022-07-28T12:57:20

Convert the data to a DataFrame and average the results of bfill (backward fill) and ffill (forward fill):

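import numpy as np
import pandas as pd

x = [[72829],
     [np.nan],
     [73196],
     [73087],
     [np.nan],
     [np.nan],
     [72294.5]]
df = pd.DataFrame(x)
# Average the next known value (bfill) and the previous known value (ffill)
df = (df[0].bfill() + df[0].ffill()) / 2
df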
>>>
0    72829.00
1    73012.50
2    73196.00
3    73087.00
4    72690.75
5    72690.75
6    72294.50
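
Note that with this approach consecutive gaps (rows 4 and 5) receive the same value. If evenly spaced values between the two known neighbours are preferred, linear interpolation is one possible alternative. The following is only a minimal sketch and not part of the original answer: it assumes pandas' Series.interpolate and NumPy's np.interp, and the variable names (col, filled_pd, filled_np, filled) are illustrative. Its results for rows 4 and 5 (about 72822.83 and 72558.67) also differ slightly from the values requested in the question.

```python
import numpy as np
import pandas as pd

x = np.array([[72829], [np.nan], [73196], [73087],
              [np.nan], [np.nan], [72294.5]])
col = x[:, 0]  # work on the single column as a 1D array

# pandas: fill each run of NaNs with evenly spaced values
# between the known values on either side of the run
filled_pd = pd.Series(col).interpolate(method="linear").to_numpy()

# NumPy only: interpolate over the row positions of the known values
idx = np.arange(len(col))
known = ~np.isnan(col)
filled_np = np.interp(idx, idx[known], col[known])

# Restore the original (n, 1) shape
filled = filled_pd.reshape(-1, 1)
print(filled)
```

Both variants give the same result for gaps that lie between known values. They can differ for NaNs at the very start or end of the column, which this example does not contain, so that case should be checked separately before relying on either one.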
