I have survey data with 100s of columns and I need to get the frequencies of all the values in each of the columns. The column values are 0, 1 etc, and I want to map that to "No", "Yes". Other columns may be 0 = "No", 1 = "Yes", 3 = "Maybe" etc. These are just examples, there are many columns (e.g. survey questions) with many different maps.
I am wanting to loop through the columns and apply the map to the column, perform frequencies, and output the results to a dataframe. Thanks in advance.
Here is some dummy survey data.
# Main Survey Data
data = {'Name':['Tom', 'Sam', 'Trish', 'Sharon', 'Tony'], 'Age':[20, 21, 19, 18, 20], 'MH' : [0,0,1,2, 1]}
df = pd.DataFrame(data)
I have code that has all the maps for the survey questions. Similar to this.
age_map = {18 : 'Very young', 19 : 'Young', 20 : 'Old', 21 : 'Very old'}
mh_map = {0 : 'No', 1 : 'Yes', 2 : 'Maybe'}
I also have a dataframe used as a look up table that shows the column names and which map the column is linked to. Similar to this data.
# Dataframe used as a look up table showing all the column names and the related map
lkup = {'col' : ['Age', 'MH'], 'map' : ['age_map', 'mh_map']}
df_lkup = pd.DataFrame(lkup)
Obviously it is easy to apply the map on a few columns and perform the counts and output the results, something similar to this.
mh_res = pd.DataFrame(df['MH'].map(mh_map).value_counts().reset_index())
mh_res['Colname'] = 'MH'
mh_res.columns=['Value', 'Count', 'Colname']
print(mh_res)
But I have 100s of columns to loop through. Here is the code I started with, but I don't know how to get the j value in .map(j) from the list where it is a variable name that doesn't have quotes.
# Get the list of columns and maps from the column/map lookup table
collist = df_lkup['col'].tolist()
maplist = df_lkup['map'].tolist()
# Loop through columns and maps
final_results = pd.DataFrame()
for i, j in zip(collist, maplist):
result = pd.DataFrame(df[i].map(j).value_counts().reset_index())
result['Colname'] = i
result.columns=['Value', 'Count', 'Colname']
final_results = final_results.append(result)
print(final_results)