Create pandas duplicate rows based on the number of items in a list type column

Question

I have a data frame like this,

  df
  col1     col2
   A        [1]
   B        [1,2]
   A        [2,3,4]
   C        [1,2]
   B        [4]

Now I want to create one row per value in each col2 list, repeating the col1 value for each new row, so that the final data frame looks like this:

  df
  col1    col2
   A       [1]
   B       [1]
   B       [2]
   A       [2]
   A       [3]
   A       [4]
   C       [1]
   C       [2]
   B       [4]

I am looking for a pandas shortcut to do this more efficiently.

jezrael · Accepted Answer · 2021-02-18T07:09:40

Use DataFrame.explode and then wrap each value in a one-element list:

df2 = df.explode('col2')

df2['col2'] = df2['col2'].apply(lambda x: [x])
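For reference, here is a minimal self-contained sketch of the same idea, with the sample frame rebuilt from the question. It assumes pandas 1.1 or later, where explode accepts ignore_index=True; without that argument the exploded rows keep their original index labels (0, 1, 1, 2, 2, 2, ...).

```python
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame({
    "col1": ["A", "B", "A", "C", "B"],
    "col2": [[1], [1, 2], [2, 3, 4], [1, 2], [4]],
})

# One row per list element; ignore_index=True renumbers the rows 0..n-1
df2 = df.explode("col2", ignore_index=True)

# Wrap each scalar back into a one-element list
df2["col2"] = df2["col2"].apply(lambda x: [x])

print(df2)
```

On older pandas versions, df.explode("col2").reset_index(drop=True) should give the same renumbered index.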

Another idea, which I expect to be faster on large data, is to use NumPy's np.repeat together with chain.from_iterable to flatten the values:

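from itertools import chain

import numpy as np
import pandas as pd

# Repeat col1 once per list element; flatten col2 into one-element lists
df2 = pd.DataFrame({
        "col1": np.repeat(df.col1.to_numpy(), df.col2.str.len()),
        "col2": [[x] for x in chain.from_iterable(df.col2)]})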

print(df2)
  col1 col2
0    A  [1]
1    B  [1]
2    B  [2]
3    A  [2]
4    A  [3]
5    A  [4]
6    C  [1]
7    C  [2]
8    B  [4]
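One difference between the two approaches is worth checking if col2 can contain empty lists. The sketch below uses a small hypothetical frame (not from the question) to show it: explode keeps a row for an empty list and fills it with NaN, while np.repeat repeats that row zero times, so it drops out.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical frame with an empty list in the second row
df = pd.DataFrame({"col1": ["A", "B", "C"], "col2": [[1, 2], [], [3]]})

# explode keeps the empty-list row, with NaN as its value
exploded = df.explode("col2", ignore_index=True)

# np.repeat repeats that row zero times, so it disappears
repeated = pd.DataFrame({
    "col1": np.repeat(df["col1"].to_numpy(), df["col2"].str.len()),
    "col2": [[x] for x in chain.from_iterable(df["col2"])],
})

print(exploded["col1"].tolist())  # ['A', 'A', 'B', 'C']
print(repeated["col1"].tolist())  # ['A', 'A', 'C']
```

Which behaviour is preferable depends on whether rows with empty lists should survive; with explode, wrapping the result in one-element lists would turn that row into [nan].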
