1
votes

I need to achieve several things:

  • Group columns by Country and Products

  • Perform aggregation to get:

     - percentage of my Products column for each country
     - Calculate the sum of columns Volume and Profit and UnrealizedProfit (2 columns 1st=Volume, 2nd= Profit + UnrealizedProfit)
    
  • Display other columns as well

My DataFrame:

   Country       Sector       Products     Volume    Profit   UnrealizedProfit      
0  Country_1     Sector1      Product_1     50         5            4
1  Country_1     Sector2      Product_2     100        6            3
2  Country_2     Sector1      Product_1     150        3            -1
3  Country_2     Sector2      Product_2     200        -1           5
4  Country_1     Sector1      Product_2     100        7            10
5  Country_2     Sector2      Product_2     200        -3           -1
6  Country_2     Sector1      Product_1     150        2            -1
7  Country_1     Sector2      Product_1     50         5            -3

Note: I have a couple out thousand rows in my actual dataframe.

Desired output would look like this:

   Country       Sector       Products     Product%   Volume    ExpectedProfit        
0  Country_1     Sector1      Product_1     0.138      100        11
1  Country_1     Sector2      Product_2     0.861      200        26
2  Country_2     Sector1      Product_1     0.667      300        3
3  Country_2     Sector2      Product_2     0.333      400        0

I only managed to do one aggregation at a time but not two. So far:

df = (data1.groupby('Country')['Products']
           .value_counts(normalize=True,sort=False)
           .reset_index(name='Product%'))
print (df)

This code gives me:

   Country       Products     Product%
0  Country 1     Product 1     0.138
1  Country 1     Product 2     0.861
2  Country 2     Product 1     0.667
3  Country 2     Product 2     0.333

Each frequency of product is based on the related country --> sum(Country1) =100%, sum(Country2)=100%...

And for the Volume I manage to replicate it:

df = (data1.groupby(['Country','Product'])['Volume']
      .sum()
      .reset_index(name='Volume'))

I added the Product in the groupby() because I want to see the volume for each product and country.

The Goal is to combine the Products% and Volume and adding the ExpectedProfit I don't know how to combine that and to do the aggregation for the profit, as mentioned (Profit+UnrealizedProfit) And as well displaying the Sector (I guess the sector could be included in the Groupby() since each the Sector has several products.

Thank you for your help!

1
Please share a sample of your data frame to work on.Mayank Porwal
I have added a sampleKilli Mandjaro

1 Answers

0
votes

All summed up:

new_df = pd.DataFrame()
grouper = df.groupby(["Country", "Products"])
new_df["ExpectedProfit"] = grouper.Profit.sum() + grouper.UnrealizedProfit.sum()
new_df["Volume"] =  grouper.Volume.sum()
new_df["%"] = df.groupby(['Country']).Products.value_counts(normalize=True,sort=False)
new_df

Outputs:

                     ExpectedProfit  Volume    %
Country   Products                              
Country_1 Product_1              11     100  0.5
          Product_2              26     200  0.5
Country_2 Product_1               3     300  0.5
          Product_2               0     400  0.5

With Sectors

grouper = df.groupby(["Country", "Products", "Sector"])

Without sectors:

grouper = df.groupby(["Country", "Products"])
result = grouper.Profit.sum() + grouper.UnrealizedProfit.sum()
result = result.reset_index(name="ExpectedProfit")

result sectors:

     Country   Products   Sector  ExpectedProfit
0  Country_1  Product_1  Sector1               9
1  Country_1  Product_1  Sector2               2
2  Country_1  Product_2  Sector1              17
3  Country_1  Product_2  Sector2               9
4  Country_2  Product_1  Sector1               3
5  Country_2  Product_2  Sector2               0

result no sectors:

     Country   Products  ExpectedProfit
0  Country_1  Product_1              11
1  Country_1  Product_2              26
2  Country_2  Product_1               3
3  Country_2  Product_2               0