1
votes

I have a dataframe like this,

df
col1    col2    col3
1907    CD       49
1907    FR       33
1907    SA       34
1908    PR        1
1908    SA       37
1909    PR       16
1909    SA       38

Now CD is not present with col1 1908 and 1909 values, FR not present with 1908 and 1909 values and PR not present wth 1907.

Now I want to create rows with col2 values which are not with all col1 values with col3 values as 0.

So final dataframe will look like,

df
col1    col2    col3
1907    CD       49
1907    FR       33
1907    SA       34
1907    PR        0
1908    CD        0
1908    FR        0
1908    PR        1
1908    SA       37
1908    CD        0
1908    FR        0
1909    PR       16
1909    SA       38

I could do this using a for loop with every possible col2 values and comparing with every col1 group. But I am looking for shortcuts to do it most efficiently.

2
are you sure your result is correct? last 4 rows should be 1909 i guessluigigi

2 Answers

2
votes

Use DataFrame.unstack with DataFrame.stack for all combinations filled by 0:

df = df.set_index(['col1','col2']).unstack(fill_value=0).stack().reset_index()
print (df)
    col1 col2  col3
0   1907   CD    49
1   1907   FR    33
2   1907   PR     0
3   1907   SA    34
4   1908   CD     0
5   1908   FR     0
6   1908   PR     1
7   1908   SA    37
8   1909   CD     0
9   1909   FR     0
10  1909   PR    16
11  1909   SA    38

Another idea is use DataFrame.reindex with MultiIndex.from_product:

mux = pd.MultiIndex.from_product([df['col1'].unique(), 
                                  df['col2'].unique()], names=['col1','col2'])
df = df.set_index(['col1','col2']).reindex(mux, fill_value=0).reset_index()
0
votes

We can also do DataFrame.pivot with DataFrame.stack:

df.pivot(*df).stack(dropna = False).fillna(0).rename('col3').reset_index()

or DataFrame.pivot_table

df.pivot_table(*df.iloc[:,::-1],fill_value = 0).unstack().rename('col3').reset_index()

Output

    col1 col2  col3
0   1907   CD  49.0
1   1907   FR  33.0
2   1907   PR   0.0
3   1907   SA  34.0
4   1908   CD   0.0
5   1908   FR   0.0
6   1908   PR   1.0
7   1908   SA  37.0
8   1909   CD   0.0
9   1909   FR   0.0
10  1909   PR  16.0
11  1909   SA  38.0