1
votes

Let's say I have a DataFrame with NaNs in some of its groups, like

df = pd.DataFrame({'data':[0,1,2,0,np.nan,2,np.nan,0,1],'group':[1,1,1,2,2,2,3,3,3]})

and a numpy array like

x = np.array([0,1,2])

Now, group by group, how can I fill each NaN with the value from the numpy array that is missing from that group? That is, the result should be

df = pd.DataFrame({'data':[0,1,2,0,1,2,2,0,1],'group':[1,1,1,2,2,2,3,3,3]})
      data   group
0     0      1
1     1      1
2     2      1
3     0      2
4     1      2
5     2      2
6     2      3
7     0      3
8     1      3

Let me explain how the data should be filled. Consider group 2. Its data values are 0, np.nan, 2. The np.nan stands for the value from the array [0,1,2] that is missing from the group, so the value to fill in place of the NaN is 1.

For multiple NaN values, take for example a group that has data [np.nan,0,np.nan]. The values to fill in place of the NaNs are 1 and 2, resulting in [1,0,2].

1
Fill how? By the mean of the group or something? Or does every group have exactly three numbers - 0,1,2? - Divakar
Yes, I will have at most three rows. The NaNs should be filled with the values that are present in the numpy array but missing from the group's data. If 2 values are missing, they should be filled in randomly. - Bharath
Why do we need groupby here? - BENY
Could there be fewer than 3 elements in a group? - Divakar
You came from stackoverflow.com/questions/46937010/… , right? I posted an answer there. - BENY

1 Answer

4
votes

First find the value that is missing from each group, then pass it to fillna:

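def f(y):
    # values of x that do not occur in this group
    a = list(set(x)-set(y))
    # placeholder if nothing is missing (such a group is expected to have no NaN)
    a = 1 if len(a) == 0 else a[0]
    y = y.fillna(a)
    return (y)

# group_keys=False keeps the original index, so the result aligns with df on newer pandas
df['data'] = df.groupby('group', group_keys=False)['data'].apply(f).astype(int)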
print (df)
   data  group
0     0      1
1     1      1
2     2      1
3     0      2
4     1      2
5     2      2
6     2      3
7     0      3
8     1      3

EDIT:

df = pd.DataFrame({'data':[0,1,2,0,np.nan,2,np.nan,np.nan,1, np.nan, np.nan, np.nan],
                   'group':[1,1,1,2,2,2,3,3,3,4,4,4]})
x = np.array([0,1,2])
print (df)
    data  group
0    0.0      1
1    1.0      1
2    2.0      1
3    0.0      2
4    NaN      2
5    2.0      2
6    NaN      3
7    NaN      3
8    1.0      3
9    NaN      4
10   NaN      4
11   NaN      4

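def f(y):
    # values of x that do not occur in this group
    a = list(set(x)-set(y))
    if len(a) == 1:
        # one value missing: fill the single NaN with it
        return y.fillna(a[0])
    elif len(a) == 2:
        # two values missing: first NaN gets a[0], the remaining NaN gets a[1]
        return y.fillna(a[0], limit=1).fillna(a[1])
    elif len(a) == 3:
        # whole group is NaN: use x itself
        y = pd.Series(x, index=y.index)
        return y
    else:
        # nothing missing: return the group unchanged
        return y

# group_keys=False keeps the original index, so the result aligns with df on newer pandas
df['data'] = df.groupby('group', group_keys=False)['data'].apply(f).astype(int)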
print (df)
    data  group
0      0      1
1      1      1
2      2      1
3      0      2
4      1      2
5      2      2
6      0      3
7      2      3
8      1      3
9      0      4
10     1      4
11     2      4
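
The branches on len(a) above can probably be collapsed into one rule: collect the values of x that the group lacks and write them into the NaN positions in order. Below is a minimal sketch of that idea, not the code from the answer above. The names fill_from_pool and pool are made up for illustration, and it uses groupby(...).transform instead of apply, on the assumption that the function always returns a Series with the same index as the group.

```python
import numpy as np
import pandas as pd

x = np.array([0, 1, 2])

def fill_from_pool(y, pool=x):
    """Hypothetical helper: fill NaNs in y with the pool values y lacks."""
    present = set(y.dropna())
    # keep the pool's order so the fill is deterministic
    missing = [v for v in pool if v not in present]
    out = y.to_numpy(dtype=float, copy=True)
    nan_pos = np.flatnonzero(np.isnan(out))
    # fill only as many NaNs as there are missing values
    n = min(len(nan_pos), len(missing))
    out[nan_pos[:n]] = missing[:n]
    return pd.Series(out, index=y.index)

df = pd.DataFrame({
    'data': [0, 1, 2, 0, np.nan, 2, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
    'group': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

# transform returns a result aligned with df's original index
df['data'] = df.groupby('group')['data'].transform(fill_from_pool).astype(int)
print(df)
```

The missing values are filled in the order they appear in x rather than randomly. If a random order is wanted, shuffling missing before the assignment (for example with np.random.permutation) should be enough. Note that astype(int) still requires every NaN to have been filled, so a group with more NaNs than missing values would raise.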