4
votes

I've got a pandas dataframe in which one of the columns contains lists with different lengths. The solutions to explode lists in pandas all assume that the lists to be exploded are all the same length.

This is my df:

    Dep     Exp     Fl-No   Shared Codes
0   20:58   20:55   LX 736  [No shared codes]
1   21:23   20:55   LX 818  [Dummy, LH 5809]
2   21:27   21:00   JU 375  [No shared codes]
4   21:28   21:00   LX 770  [Dummy, SN 5102]
7   21:31   21:10   LX 1842 [Dummy, LH 5880, TP 8184, A3 1985]

And this what I am looking for:

    Dep     Exp     Fl-No   Shared Codes
0   20:58   20:55   LX 736  No shared codes
1   21:23   20:55   LX 818  Dummy
1   21:23   20:55   LX 818  LH 5809
2   21:27   21:00   JU 375  No shared codes
4   21:28   21:00   LX 770  Dummy
4   21:28   21:00   LX 770  SN 5102
7   21:31   21:10   LX 1842 Dummy
7   21:31   21:10   LX 1842 LH 5880
7   21:31   21:10   LX 1842 TP 8184
7   21:31   21:10   LX 1842 A3 1985

Has anybody got any suggestions?

4
@Wen I don't think OP can use wide_to_long here. The data isn't in the right format. - cs95
@Wen Actually I'm wrong. I found a way with wide_to_long but it is so bad that it isn't worth putting. - cs95
@cᴏʟᴅsᴘᴇᴇᴅ how about PiR 's answer ? - BENY
@Wen I upvoted it. Why? - cs95
@cᴏʟᴅsᴘᴇᴇᴅ Just saw this type of question multiple times..I really hope pandas can documented one of the solution and name it unlistify - BENY

4 Answers

6
votes

Very similar to @coldspeed. I took a few different steps.

s = df['Shared Codes']
i = np.arange(len(df)).repeat(s.str.len())
df.iloc[i, :-1].assign(**{'Shared Codes': np.concatenate(s.values)})

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
1  21:23  20:55   LX 818          LH 5809
2  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
4  21:28  21:00   LX 770          SN 5102
7  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
7  21:31  21:10  LX 1842          TP 8184
7  21:31  21:10  LX 1842          A3 1985
4
votes

One possibility using np.repeat and np.hstack:

print(df)

     Dep    Exp    Fl-No                        Shared Codes
0  20:58  20:55   LX 736                   [No shared codes]
1  21:23  20:55   LX 818                    [Dummy, LH 5809]
2  21:27  21:00   JU 375                   [No shared codes]
4  21:28  21:00   LX 770                    [Dummy, SN 5102]
7  21:31  21:10  LX 1842  [Dummy, LH 5880, TP 8184, A3 1985]


x = df.iloc[:, :-1].values.repeat(df['Shared Codes'].apply(len), 0)
y = df['Shared Codes'].apply(pd.Series).stack().values.reshape(-1, 1)

out = pd.DataFrame(np.hstack((x, y)), columns=df.columns)
print(out)

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985
2
votes

Ok, I will post it again for more information and other genius solutions please check link1 and link2

df.set_index(['Dep','Exp','Fl-No'])['Shared Codes'].apply(pd.Series).stack().reset_index().drop('level_3‌​',1)

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985

Also, using pd.wide_to_long However, personally do not recommended cause overkill.

df1=df['Shared Codes'].apply(pd.Series)
df1.columns='sur'+df1.columns.astype(str)
df=pd.concat([df,df1],axis=1)
pd.wide_to_long(df,['sur'],['Dep','Exp','Fl-No'],'lol').reset_index().drop(['lol','Shared Codes'],axis=1).dropna()

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985
1
votes

Pandas >=0.25

df:

    Name    Data
0   Bar [Product, Item, X]
1   Foo [Product, Misc]

Using Explode:

df = df.explode('Data')

df:

    Name    Data
0   Bar Product
0   Bar Item
0   Bar X
1   Foo Product
1   Foo Misc