I would like some ideas on how to approach this interesting problem (to me at least). Let's say that I have a population with 3 different feature variables and some quantitative ratings with the population. An example is like the following:
df
income expense education gender residence
1 153 2989 NoCollege F Own
2 289 872 College F Rent
3 551 98 NoCollege M Rent
4 286 320 College M Rent
5 259 372 NoCollege M Rent
6 631 221 NoCollege M Own
7 729 105 College M Rent
8 582 450 NoCollege M Own
9 570 253 College F Rent
10 1380 635 NoCollege F Rent
11 409 425 NoCollege M Rent
12 569 232 NoCollege F Own
13 317 856 College M Rent
14 199 283 College F Own
15 624 564 NoCollege M Own
16 1064 504 NoCollege M Own
17 821 169 NoCollege F Rent
18 402 175 College M Own
19 602 285 College M Rent
20 433 264 College M Rent
21 670 985 NoCollege F Own
I can do a calculation of spending-to-income ratio (SIR) over the segments defined by the 3 feature variables: education, gender and residence. So at the first level, no segmentation is done and the SIR is:
df %>% summarise(count=n(), spending_ratio=sum(expense)/sum(income)*100)
>> count spending_ratio
1 21 95.8
Then I break up the population into male and female groups, to get:
df %>% group_by(gender) %>% summarise(count=n(), spending_ratio=sum(expense)/sum(income)*100)
>> gender count spending_ratio
1 F 8 138.0
2 M 13 67.3
We continue this process by introducing education:
df %>% group_by(gender, education) %>% summarise(count=n(), spending_ratio=sum(expense)/sum(income)*100)
>> gender education count spending_ratio
1 F College 3 133.1
2 F NoCollege 5 139.4
3 M College 6 72.4
4 M NoCollege 7 63.9
and finally adding residence:
df %>% group_by(gender, education, residence) %>% summarise(count=n(), spending_ratio=sum(expense)/sum(income)*100)
>> gender education residence count spending_ratio
1 F College Own 1 142.2
2 F College Rent 2 131.0
3 F NoCollege Own 3 302.2
4 F NoCollege Rent 2 36.5
5 M College Own 1 43.5
6 M College Rent 5 77.3
7 M NoCollege Own 4 59.9
8 M NoCollege Rent 3 73.4
What I would like to achieve is to generate a treemap-like plot with all the above information included. But as you can see, the treemap plot is ways away from I want. What I want to get is a map that is similar to the image at the top, where the size of each rectangle represents the count and the color represent the SIR and all the levels of the tree are included.
Any help is deeply appreciated.


