1
votes

I have survey data with 100s of columns and I need to get the frequencies of all the values in each of the columns. The column values are 0, 1 etc, and I want to map that to "No", "Yes". Other columns may be 0 = "No", 1 = "Yes", 3 = "Maybe" etc. These are just examples, there are many columns (e.g. survey questions) with many different maps.

I am wanting to loop through the columns and apply the map to the column, perform frequencies, and output the results to a dataframe. Thanks in advance.

Here is some dummy survey data.

# Main Survey Data
data = {'Name':['Tom', 'Sam', 'Trish', 'Sharon', 'Tony'], 'Age':[20, 21, 19, 18, 20], 'MH' : [0,0,1,2, 1]}
df = pd.DataFrame(data)

I have code that has all the maps for the survey questions. Similar to this.

age_map = {18 : 'Very young', 19 : 'Young',  20 : 'Old',  21 : 'Very old'}
mh_map = {0 : 'No', 1 : 'Yes', 2 : 'Maybe'}

I also have a dataframe used as a look up table that shows the column names and which map the column is linked to. Similar to this data.

# Dataframe used as a look up table showing all the column names and the related map
lkup = {'col' : ['Age', 'MH'], 'map' : ['age_map', 'mh_map']}
df_lkup = pd.DataFrame(lkup)

Obviously it is easy to apply the map on a few columns and perform the counts and output the results, something similar to this.

mh_res = pd.DataFrame(df['MH'].map(mh_map).value_counts().reset_index())
mh_res['Colname'] = 'MH'
mh_res.columns=['Value', 'Count', 'Colname']
print(mh_res)

But I have 100s of columns to loop through. Here is the code I started with, but I don't know how to get the j value in .map(j) from the list where it is a variable name that doesn't have quotes.

# Get the list of columns and maps from the column/map lookup table
collist = df_lkup['col'].tolist()
maplist = df_lkup['map'].tolist()

# Loop through columns and maps

final_results = pd.DataFrame()

for i, j in zip(collist, maplist):
    result = pd.DataFrame(df[i].map(j).value_counts().reset_index())
    result['Colname'] = i
    result.columns=['Value', 'Count', 'Colname']
    final_results = final_results.append(result)

print(final_results)
2

2 Answers

1
votes

Text variables are not reccommened in python, but possible solution is use globals: like

for i, j in zip(collist, maplist):
    result = df[i].map(globals()[j]).value_counts().reset_index()
    result['Colname'] = i
    result.columns=['Value', 'Count', 'Colname']
    final_results = final_results.append(result)

I suggest create dictionary instead df_lkup for avoid it:

age_map = {18 : 'Very young', 19 : 'Young',  20 : 'Old',  21 : 'Very old'}
mh_map = {0 : 'No', 1 : 'Yes', 2 : 'Maybe'}

lkup = {'Age':age_map,'MH': mh_map}
print (lkup)
{'Age': {18: 'Very young', 19: 'Young', 20: 'Old', 21: 'Very old'}, 
 'MH': {0: 'No', 1: 'Yes', 2: 'Maybe'}}

final_results = pd.DataFrame()

for i, j in lkup.items():
    result = df[i].map(j).value_counts().reset_index()
    result['Colname'] = i
    result.columns=['Value', 'Count', 'Colname']
    final_results = final_results.append(result)

print(final_results)
        Value  Count Colname
0         Old      2     Age
1       Young      1     Age
2    Very old      1     Age
3  Very young      1     Age
0         Yes      2      MH
1          No      2      MH
2       Maybe      1      MH
0
votes

You can use np.where(), see: https://numpy.org/doc/1.20/reference/generated/numpy.where.html

df = pd.DataFrame(np.where(df == 0, "No", df))
df = pd.DataFrame(np.where(df == 1, "Yes", df))
df = pd.DataFrame(np.where(df == 3, "Maybe", df))

Edit: If you wanted to do this for a particular column:

df["col_name"] = pd.DataFrame(np.where(df["col_name"] == 0, "No", df["col_name"]))