Replace last non NaN value in row

Question

I'd like to replace the last non-NaN value in each row of a data frame with NaN. The data frame has 300 rows and 1068 columns, and each row has a different number of valid values, padded with NaNs. Here is an example of a row:

row in data frame = [1 2 3 NaN NaN NaN]
output            = [1 2 NaN NaN NaN NaN]

How can I replace the last non-NaN value in each row of a CSV file?

I assume you mean the last nan in a sequence ? that said update your question with the advice from the links above and you'll get a solution lickety split. — Umar.H
@yatu sorry for the mistake in question. I have edited it now. — mathque33

yatu · Accepted Answer · 2020-07-15T11:35:38

Here's a NumPy-based one:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, np.nan, np.nan, np.nan], [1, 2, 3, np.nan, np.nan, 2]])

You can reverse the column order of the underlying array, look for the first valid value in each row, convert that position back to a column index in the original order, and use np.put_along_axis to set those positions to NaN:

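a = df.to_numpy()
# column index of the last non-NaN value in each row
m = a.shape[1]-1 - np.argmax(~np.isnan(a[:,::-1]), axis=1)
# overwrite those positions with NaN, one per row
np.put_along_axis(a, m[:,None], np.nan, axis=1)
# write the modified values back into the data frame
df[:] = a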

print(df)

     0    1    2   3   4   5
0  1.0  2.0  NaN NaN NaN NaN
1  1.0  2.0  3.0 NaN NaN NaN
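
For reuse, the steps above can be wrapped in a small function. This is a minimal sketch rather than part of the original answer: the name mask_last_valid is hypothetical, and it assumes every column is numeric so the values can be cast to float (NaN needs a float dtype). It works on a copy, so the input frame is left untouched.

```python
import numpy as np
import pandas as pd


def mask_last_valid(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the last non-NaN value of each row set to NaN."""
    # Copy as float so the caller's data is never modified in place.
    a = df.to_numpy(dtype=float, copy=True)
    # Column index of the last valid value per row, found on the reversed columns.
    last = a.shape[1] - 1 - np.argmax(~np.isnan(a[:, ::-1]), axis=1)
    # Overwrite exactly one position per row.
    np.put_along_axis(a, last[:, None], np.nan, axis=1)
    return pd.DataFrame(a, index=df.index, columns=df.columns)


example = pd.DataFrame([[1, 2, 3, np.nan, np.nan, np.nan],
                        [1, 2, 3, np.nan, np.nan, 2]])
result = mask_last_valid(example)
print(result)
```

If the data comes from a CSV file, the frame returned by pd.read_csv could presumably be passed in the same way, provided its columns are all numeric.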

Further details:

The first step is to find where the NaNs are. Since we want the last valid value in each row, we should search from the end, so reverse the columns with a slice and apply np.isnan:

np.isnan(a[:,::-1])
array([[ True,  True,  True, False, False, False],
       [False,  True,  True, False, False, False]])

Now we can find the first False in each row, i.e. the last valid value, by negating the mask and using np.argmax, which returns the position of the first True:

np.argmax(~np.isnan(a[:,::-1]), axis=1)
# array([3, 0], dtype=int64)
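
As a small aside on why np.argmax works here: on a boolean array it returns the position of the first True in each row, because True is the maximum and ties resolve to the first occurrence. The sketch below uses a made-up mask (not the data from this answer) and also shows the corner case of a row with no True at all, where np.argmax returns 0. In the approach above that corresponds to an all-NaN row, which ends up pointing at the last column; that cell is already NaN, so overwriting it changes nothing.

```python
import numpy as np

# Illustrative mask, not taken from the data frame in the answer.
mask = np.array([[False, False, True, True],     # first True at position 2
                 [True, False, False, True],     # first True at position 0
                 [False, False, False, False]])  # no True at all

first_true = np.argmax(mask, axis=1)   # position of the first True per row
has_true = mask.any(axis=1)            # tells a real 0 apart from "no True"

print(first_true)
print(has_true)
```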

Subtracting these positions from the index of the last column (a.shape[1]-1) gives the actual column indices in the original order:

a.shape[1]-1 - np.argmax(~np.isnan(a[:,::-1]), axis=1)
# array([2, 5], dtype=int64)

Now we can set the values at those indices to NaN, where m holds the indices computed above:

np.put_along_axis(a, m[:,None], np.nan, axis=1)
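
For comparison, here is a sketch of a different technique that builds a boolean mask instead of computing indices. It is not the method from this answer: it counts valid values from the right with a reversed cumulative sum and marks the valid cell where that count equals 1. The variable names are illustrative, and the third row is added only to show that a row with no valid values is left alone.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, np.nan, np.nan, np.nan],
                   [1, 2, 3, np.nan, np.nan, 2],
                   [np.nan] * 6])  # row with no valid values

valid = df.notna().to_numpy()
# Number of valid values at or to the right of each cell.
from_right = valid[:, ::-1].cumsum(axis=1)[:, ::-1]
# The last valid cell is the only valid cell whose count is exactly 1.
is_last = valid & (from_right == 1)

# DataFrame.mask replaces the cells where the condition is True with NaN.
out = df.mask(is_last)
print(out)
```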
