1
votes

I am an experienced SAS programmer and am converting to Python/Pandas. I frequently use PROC SUMMARY in SAS to create summarized data files that I can then combine with other files in later steps of a SAS program. PROC SUMMARY is very powerful, easy to use, and straightforward to code. I have not yet found a comparable method in Pandas that is as powerful, easy to use, and straightforward to code. As I am new to Python/Pandas, I was wondering if there is a method that does this.

The following SAS code creates a simple output file with 9 columns and one row for every unique combination of age_category and gender.

proc summary data='input file' nway;
 class age_category gender;
 var weight_kg height_cm;
 output out='output file'
   mean(weight_kg) = weight_avge
   max(weight_kg) = weight_max
   min(weight_kg) = weight_min
   mean(height_cm) = height_avge
   max(height_cm) = height_max
   min(height_cm) = height_min
   n(height_cm) = n_of_cases
  ; 
run; 

I am trying to do the same thing in Pandas with the summarized data being output to a data frame.

Please edit your question instead of adding comments. Code is unreadable in comments. - Robert
Just add 4 spaces in front of lines of code. You can highlight and use ctrl-K to have the editor do it for you. - Tom
I copied your sample code from comment into question. Removed the improper commas in the variable lists. - Tom
Thanks for your help in making the SAS code more legible. - wbm

1 Answer

1
votes

In Pandas, first group by age_category and gender, then aggregate the numeric columns with statistical functions, for example:

dt = df.groupby(['age_category', 'gender'])[['weight_kg', 'height_cm']].agg(['mean', 'max', 'min', 'count'])
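
The call above returns a frame with the group keys in the index and two-level column names. To get closer to the PROC SUMMARY output in the question (one flat, named column per statistic), named aggregation may be a better fit. The following is a minimal sketch, assuming a data frame `df` with the column names from the SAS code; the sample rows are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "age_category": ["adult", "adult", "adult", "child", "child"],
    "gender": ["F", "F", "M", "F", "M"],
    "weight_kg": [60.0, 70.0, 80.0, 30.0, 35.0],
    "height_cm": [160.0, 170.0, 180.0, 120.0, 125.0],
})

# Named aggregation: output_column=(input_column, function).
# "count" counts non-missing values, similar to the N statistic in SAS.
summary = (
    df.groupby(["age_category", "gender"])
      .agg(
          weight_avge=("weight_kg", "mean"),
          weight_max=("weight_kg", "max"),
          weight_min=("weight_kg", "min"),
          height_avge=("height_cm", "mean"),
          height_max=("height_cm", "max"),
          height_min=("height_cm", "min"),
          n_of_cases=("height_cm", "count"),
      )
      .reset_index()  # turn the group keys back into ordinary columns
)

print(summary)
```

The result is an ordinary data frame with one row per age_category/gender combination, so it can be combined with other frames later using `summary.merge(...)`. Only combinations that actually occur in the data appear in the output.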