0
votes

I'd like to replace the last non-NaN value in each row of a DataFrame with NaN. My DataFrame has 300 rows and 1068 columns, and each row has a different number of valid values, padded with NaNs. Here is an example row:

row in DataFrame = [1 2 3 NaN NaN NaN]
output           = [1 2 NaN NaN NaN NaN]

How can I replace the last non-NaN value in each row of a CSV file?

2
I assume you mean the last NaN in a sequence? That said, update your question with the advice from the links above and you'll get a solution lickety-split. - Umar.H
@yatu Sorry for the mistake in the question. I have edited it now. - mathque33

2 Answers

1
votes

Here's a NumPy-based one:

import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2, 3, np.nan, np.nan, np.nan], [1, 2, 3, np.nan, np.nan, 2]])

You can reverse the columns of the underlying array by slicing and look for the first valid value in each row. Then convert those positions back to column indices and use np.put_along_axis to set them to NaN:

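# writable copy of the values
a = df.to_numpy(copy=True)
# column index of the last non-NaN value in each row
m = a.shape[1]-1 - np.argmax(~np.isnan(a[:,::-1]), axis=1)
# write NaN at those positions, one per row
np.put_along_axis(a, m[:,None], np.nan, axis=1)
df[:] = a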

print(df)

     0    1    2   3   4   5
0  1.0  2.0  NaN NaN NaN NaN
1  1.0  2.0  3.0 NaN NaN NaN

Further details:

The first step is to find where the NaNs are. Since we want the last valid value, we start from the end: slice the array to reverse its columns, then apply np.isnan:

np.isnan(a[:,::-1])
array([[ True,  True,  True, False, False, False],
       [False,  True,  True, False, False, False]])

Now we can find the first False in each row, i.e. the last valid value, by negating the mask and taking np.argmax:

np.argmax(~np.isnan(a[:,::-1]), axis=1)
# array([3, 0], dtype=int64)
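
One detail about this step is worth spelling out: on a boolean array, np.argmax returns the position of the first True, and it also returns 0 when a row contains no True at all. A small sketch, where the array named mask is made up for illustration:

```python
import numpy as np

# Hypothetical mask: True marks a valid (non-NaN) entry in a reversed row.
mask = np.array([
    [False, False, True, True],    # first True at position 2
    [True, False, False, True],    # first True at position 0
    [False, False, False, False],  # no True at all (an all-NaN row)
])

first_true = np.argmax(mask, axis=1)  # position of the first True per row
has_valid = mask.any(axis=1)          # whether the row has any True

print(first_true)  # [2 0 0]
```

So an all-NaN row looks the same as a row whose last column is valid. In the approach here that only means NaN gets written over a value that is already NaN, which is harmless.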

Subtracting these positions from the last column index (a.shape[1]-1) gives the actual column indices:

a.shape[1]-1 - np.argmax(~np.isnan(a[:,::-1]), axis=1)
# array([2, 5], dtype=int64)

Now we can just set the values at those indices to NaN:

np.put_along_axis(a, m[:,None], np.nan, axis=1)
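
The steps above can be packaged into a small helper. This is a sketch rather than part of the original answer: the name drop_last_valid is made up, and it assumes the data is numeric and can be cast to float. It returns a new DataFrame instead of modifying in place, and it skips all-NaN rows explicitly.

```python
import numpy as np
import pandas as pd


def drop_last_valid(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the last non-NaN value of each row set to NaN."""
    # Cast to float so NaN can be stored, and copy so the input stays untouched.
    a = df.to_numpy(dtype=float, copy=True)
    valid = ~np.isnan(a)
    # Column position of the last valid entry per row
    # (not meaningful for all-NaN rows, which are filtered out below).
    last = a.shape[1] - 1 - np.argmax(valid[:, ::-1], axis=1)
    # Only touch rows that contain at least one valid value.
    rows = np.flatnonzero(valid.any(axis=1))
    a[rows, last[rows]] = np.nan
    return pd.DataFrame(a, index=df.index, columns=df.columns)


example = pd.DataFrame([
    [1, 2, 3, np.nan, np.nan, np.nan],
    [1, 2, 3, np.nan, np.nan, 2],
    [np.nan] * 6,
])
result = drop_last_valid(example)
print(result)
```

For a 300 x 1068 frame this stays fully vectorized, so there is no Python-level loop over rows.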
0
votes

Enumerate the columns and check in a loop whether the values are NaN:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    [[1, 2, 3, np.nan, np.nan, 2]],
    columns=["a", "b", "c", "d", "e", "f"],
)

j = 0
for idx, c in enumerate(df.columns):
    if df[c].isna().any():
        while df.iloc[:, idx - j].isna().any():
            j += 1
        df.iloc[:, idx - j] = np.nan

print(df)

Returns:

   a   b   c   d   e  f
0  1 NaN NaN NaN NaN  2

EDIT: this inserts one NaN too many. I will try to fix it; otherwise I'll delete the answer.
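
A loop that avoids the extra NaN could go row by row instead of column by column. The following is only a sketch, not the fix the answer's author had in mind: it relies on the pandas method Series.last_valid_index and casts the frame to float up front, on the assumption that all columns are numeric.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, np.nan, np.nan, 2]],
    columns=["a", "b", "c", "d", "e", "f"],
).astype(float)  # float columns can hold NaN without a dtype change

for row_label in df.index:
    # Label of the last non-NaN column in this row, or None if the row is all NaN.
    col = df.loc[row_label].last_valid_index()
    if col is not None:
        df.loc[row_label, col] = np.nan

print(df)
```

Here only column f is cleared, and a, b and c keep their values. The loop assumes unique row labels and will be slower than the NumPy approach on large frames.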