I have a 2D NumPy array with missing data, and I want to fill the gaps so that the values vary smoothly along the array. I have something like this:
[[72829],
[nan],
[73196],
[73087],
[nan],
[nan],
[72294.5]]
I want to fill those empty cells with the mean of the closest cells, which should return something like this:
[[72829],
[73012.5],
[73196],
[73087],
[72888.875],
[72492.625],
[72294.5]]
I tried SimpleImputer and KNNImputer from scikit-learn, but all I got was the same value in every missing cell, not the mean of the neighboring cells as described above. This is the code:
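import numpy as np
from sklearn.impute import KNNImputer

imputer = KNNImputer()  # default settings assumed here; SimpleImputer was tried the same way
for label, column in data.items():  # items() replaces iteritems(), which was removed in pandas 2.0
    reshaped = np.array(column.values)  # create a NumPy array to use with scikit-learn
    reshaped = reshaped.reshape(-1, 1)  # reshape the data to a 2D array with one column
    normalized = imputer.fit_transform(reshaped)  # impute the missing values
    data[label] = normalized.ravel()  # replace the column with the filled values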
With KNNImputer, I got something like this (which is not what I want):
[[72829],
[68088.71106114],
[73196],
[73087],
[68088.71106114],
[68088.71106114],
[72294.5]]
Does anyone know an idea or algorithm that could give this kind of "uniformity" to the array values? The goal is to be able to plot graphs without missing data. A solution based on pandas, NumPy, or scikit-learn would be preferable. Thanks.
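
For reference, one possible reading of "the mean between the closest cells" is plain linear interpolation along the column. Below is a minimal sketch under that assumption, using only NumPy's `np.interp`. The helper name `fill_gaps_linear` is hypothetical, and the function assumes a single column with at least one non-missing value. For a single missing cell it gives exactly the mean of its two neighbors (73012.5 here). For the two-cell gap it spaces the values evenly (about 72822.83 and 72558.67), which is close to, but not the same as, the values listed in the expected output above.

```python
import numpy as np


def fill_gaps_linear(values):
    """Fill NaN cells of a single column by linear interpolation.

    Hypothetical helper: assumes `values` holds one column, shaped (n,) or
    (n, 1), with at least one non-NaN entry. Leading or trailing NaNs take
    the nearest known value, because np.interp holds the edge values constant.
    """
    arr = np.asarray(values, dtype=float)
    flat = arr.ravel().copy()          # work on a 1D copy of the column
    missing = np.isnan(flat)           # True where data is missing
    positions = np.arange(flat.size)   # row index used as the x coordinate
    # Interpolate the missing rows from the known rows.
    flat[missing] = np.interp(positions[missing],
                              positions[~missing],
                              flat[~missing])
    return flat.reshape(arr.shape)


column = np.array([[72829], [np.nan], [73196], [73087],
                   [np.nan], [np.nan], [72294.5]])
filled = fill_gaps_linear(column)
print(filled)
```

If the data is already in a pandas DataFrame, `data.interpolate(method="linear")` should do the same kind of fill for every column at once, so the loop over columns would not be needed.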