1
votes

I am novice in python. I have below dataframe.I want to pivot\transpose some of the columns and saved the dataframe into individual dataframes based on the column 2.

data={'col1':[1,101,201,301,2,102,202,302,3,103,203,303], 
                    'col2':[1,1,1,1,2,2,2,2,3,3,3,3],
                    'col3':["2015-01-15","2015-01-15","2015-01-15","2015-01-15","2015-01-15",
                                "2015-01-15","2015-01-15","2015-01-15","2015-01-15","2015-01-15","2015-01-15","2015-01-15"],
                    'col4':["2015-01-15","2015-01-16","2015-01-17","2015-01-18","2015-01-15","2015-01-16",
                                "2015-01-17","2015-01-18","2015-01-15","2015-01-16","2015-01-17",
                                "2015-01-18"], 
                    'col5':[0,1,2,3,0,1,2,3,0,1,2,3],
                    'col6':[273.2,275.9,343,235,273.2,275.9,343,235,273.2,275.9,343,235],
                    'col7':[2.8,3.2,7.9,7.2,2.8,3.2,7.9,7.2,2.8,3.2,7.9,7.2]}
df= pd.DataFrame(data)
print(df)

    col1  col2        col3        col4  col5   col6  col7
0      1     1  2015-01-15  2015-01-15     0  273.2   2.8
1    101     1  2015-01-15  2015-01-16     1  275.9   3.2
2    201     1  2015-01-15  2015-01-17     2  343.0   7.9
3    301     1  2015-01-15  2015-01-18     3  235.0   7.2
4      2     2  2015-01-15  2015-01-15     0  273.2   2.8
5    102     2  2015-01-15  2015-01-16     1  275.9   3.2
6    202     2  2015-01-15  2015-01-17     2  343.0   7.9
7    302     2  2015-01-15  2015-01-18     3  235.0   7.2
8      3     3  2015-01-15  2015-01-15     0  273.2   2.8
9    103     3  2015-01-15  2015-01-16     1  275.9   3.2
10   203     3  2015-01-15  2015-01-17     2  343.0   7.9
11   303     3  2015-01-15  2015-01-18     3  235.0   7.2

So,I want to break the above dataframe in the below format. Saved the dataframe name based on column2 name. For example, col2 =1 so dataframe name df[1], col=2: df[2]

df['1']
col2 col3            1_0_col6  101_1_col6  201_1_col6 301_1_col6 1_0_col7  101_1_col7  201_1_col7 301_1_col7
1    2015-01-15        273.2        275.9       343.0     235.0      2.8      3.2          7.9         7.2       



    df['2']
    col2 col3        1_0_col6  101_1_col6  201_1_col6 301_1_col6 1_0_col7  101_1_col7  201_1_col7 301_1_col7
    2    2015-01-15      273.2        275.9       343.0     235.0      2.8      3.2          7.9         7.2    

    df['3']
    col2 col3          1_0_col6  101_1_col6  201_1_col6 301_1_col6 1_0_col7  101_1_col7  201_1_col7 301_1_col7
    2    2015-01-15    273.2        275.9       343.0     235.0      2.8      3.2          7.9         7.2       
1

1 Answers

1
votes

IIUC, I think you want something like this. However, I don't quite understand your column naming in your results.

df_out = df.rename_axis('colnum',1).set_index(['col1','col2','col3','col4','col5'])\
           .stack()\
           .groupby(['col1','col5','colnum','col2','col3'])\
           .apply(lambda x: pd.Series(x.values.tolist())).unstack([0,1,2])\
           .reset_index(level=-1, drop=True)

df_out.columns = df_out.columns.map('{0[0]}_{0[1]}_{0[2]}'.format)

Print df[1] where col2 = 1

df_out.loc[[1]].dropna(1).reset_index()

Output:

   col2        col3  1_0_col6  1_0_col7  101_1_col6  101_1_col7  201_2_col6  201_2_col7  301_3_col6  301_3_col7
0     1  2015-01-15     273.2       2.8       275.9         3.2       343.0         7.9       235.0         7.2

Print df[2] where col2 = 2

df_out.loc[[2]].dropna(1).reset_index()

Output:

   col2        col3  2_0_col6  2_0_col7  102_1_col6  102_1_col7  202_2_col6  202_2_col7  302_3_col6  302_3_col7
0     2  2015-01-15     273.2       2.8       275.9         3.2       343.0         7.9       235.0         7.2

Print df[2] where col2 = 3

df_out.loc[[3]].dropna(1).reset_index()

Output:

   col2        col3  3_0_col6  3_0_col7  103_1_col6  103_1_col7  203_2_col6  203_2_col7  303_3_col6  303_3_col7
0     3  2015-01-15     273.2       2.8       275.9         3.2       343.0         7.9       235.0         7.2