
I have a data frame like this:

df

col1    col2
 1        A 
 3        B
 6        A
 10       C

I want to create a new data frame from the df above such that, wherever col1 values are not consecutive, a row is added for each missing col1 value, with col2 copied from the row above it.

The data frame I am looking for is:

df
col1    col2
 1        A
 2        A
 3        B
 4        B
 5        B
 6        A
 7        A
 8        A
 9        A
 10       C

I could do it with a simple for loop, but is there a more Pythonic and efficient way to do it with pandas?
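The question mentions a for loop but doesn't show it. For comparison with the pandas answers, here is a minimal sketch of what such a loop might look like. It assumes col1 is sorted ascending, and the names `rows`, `next_starts` and `filled` are illustrative only.

```python
import pandas as pd

# Sample input from the question; col1 is assumed sorted ascending.
df = pd.DataFrame({'col1': [1, 3, 6, 10], 'col2': ['A', 'B', 'A', 'C']})

rows = []
# Pair each row with the next col1 value; the last row gets "its own value + 1",
# so it is emitted exactly once.
next_starts = list(df['col1'].iloc[1:]) + [df['col1'].iloc[-1] + 1]
for (start, label), stop in zip(zip(df['col1'], df['col2']), next_starts):
    # Emit every integer from this row's col1 up to (not including) the next one,
    # all carrying this row's col2 label.
    for value in range(start, stop):
        rows.append((value, label))

filled = pd.DataFrame(rows, columns=['col1', 'col2'])
print(filled)
```

This works, but it builds the result one Python tuple at a time, which is what the vectorized answers avoid.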


2 Answers


Here is one way, using set_index(), reindex() and ffill():

df.set_index('col1').reindex(range(df.col1.min(),df.col1.max()+1)).ffill().reset_index()

# Or let reindex do the forward fill itself:
# df.set_index('col1').reindex(range(df.col1.min(), df.col1.max()+1), method='ffill').reset_index()

   col1 col2
0     1    A
1     2    A
2     3    B
3     4    B
4     5    B
5     6    A
6     7    A
7     8    A
8     9    A
9    10    C
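A self-contained version of this answer, with both the ffill() form and the commented reindex(method='ffill') form side by side, might look like the sketch below. The sample frame is rebuilt from the question; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 6, 10], 'col2': ['A', 'B', 'A', 'C']})

# Every integer from the smallest to the largest col1 value, gaps included.
full_range = range(df['col1'].min(), df['col1'].max() + 1)

# Form 1: reindex inserts NaN rows for the missing keys, then ffill fills them
# downward from the last known col2 value.
out_ffill = df.set_index('col1').reindex(full_range).ffill().reset_index()

# Form 2: reindex forward-fills during the lookup itself
# (method='ffill' requires the existing index to be monotonic).
out_method = (df.set_index('col1')
                .reindex(full_range, method='ffill')
                .reset_index())

print(out_ffill)
print(out_method)
```

Both forms rely on col1 being unique, since set_index('col1') followed by reindex fails on duplicate index labels.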

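One way is to use reindex with ffill (this version takes the range bounds from the first and last rows, so it assumes col1 is sorted ascending):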

(df.set_index('col1')
   .reindex(range(df.col1.iloc[0], df.col1.iloc[-1]+1))
   .ffill()
   .reset_index())

   col1 col2
0     1    A
1     2    A
2     3    B
3     4    B
4     5    B
5     6    A
6     7    A
7     8    A
8     9    A
9    10    C
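If col1 might not be sorted, iloc[0] and iloc[-1] no longer give the bounds, and ffill would carry values in the wrong order. A minimal sketch that sorts first, using a deliberately shuffled copy of the question's data (the shuffled order is an assumption for illustration):

```python
import pandas as pd

# The question's rows, shuffled to show why sorting matters.
df = pd.DataFrame({'col1': [6, 1, 10, 3], 'col2': ['A', 'A', 'C', 'B']})

# Sort so that iloc[0] / iloc[-1] are the true min / max and ffill
# carries each col2 value forward in ascending col1 order.
ordered = df.sort_values('col1')

out = (ordered.set_index('col1')
              .reindex(range(ordered['col1'].iloc[0], ordered['col1'].iloc[-1] + 1))
              .ffill()
              .reset_index())
print(out)
```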

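Or another way, using Series.repeat: repeat each col2 value as many times as the gap to the next col1 value (the last row, which has no successor, is kept once). This builds the filled col2 column:

# gaps are [2, 3, 4, 1] -> A, A, B, B, B, A, A, A, A, C
df.col2.repeat(df.col1.diff().shift(-1).fillna(1).astype(int)).reset_index(drop=True)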
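The Series.repeat idea yields only the col2 column. A minimal sketch that completes it into the full frame, assuming col1 is sorted ascending and has no duplicates (`counts` and `out` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 6, 10], 'col2': ['A', 'B', 'A', 'C']})

# Gap from each col1 value to the next one. The last row has no successor,
# so its NaN becomes 1 (emit it once). repeat() needs integer counts.
counts = df['col1'].diff().shift(-1).fillna(1).astype(int)

out = pd.DataFrame({
    # col1 is simply the full consecutive range.
    'col1': range(df['col1'].iloc[0], df['col1'].iloc[-1] + 1),
    # Repeated labels get a fresh 0..n-1 index so they line up with col1.
    'col2': df['col2'].repeat(counts).reset_index(drop=True),
})
print(out)
```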