I'm relatively new to Python but made some solid progress however i've come unstuck with converting my process in SAS utilising Proc Freq one way frequency tables into python.
A scaled down version of my dataset is as follows:
import pandas as pd
data = {'Code': [719, 719,719,719,719,719],
'National': [1001, 1001,1001,1001,1001,1001],
'Regional': [3005,3005,3005,3005,3005,3005],
'Local': [2002,2002,2002,2002,2002,2002],
'Collection Variable': [1,1,2,3,3,3]}
df = pd.DataFrame (data, columns = ['Code','National', 'Regional', 'Local','Collection Variable'])
Outputs in SAS would take this and add a frequency and percentage column for the collection variable columns, results as below:
National Regional Local Code Collection Count Percent
Variable
1001 3005 2002 719 1 2 0.333333
1001 3005 2002 719 2 1 0.166667
1001 3005 2002 719 3 3 0.5
Is anything like this possible in python? I've managed to get the percentages, but running in separate columns but not able to get the additional columns.
Furthermore, this would happen for a number of different Collection Variables. In SAS it creates a new table for each variable, is that possible in python with some kind of loop?
Thanks for any help.