I am trying to replicate the very useful tabulate twoway
function in Stata, where on the same output you can see the frequency, the corresponding row percentage and the corresponding column percentage for each possible combination of two categorical variables in the data.
That is, each cell in the resulting table contains three stacked values corresponding to the above numbers.
Example output in Stata: this is the tabulation of a single variable car type, where, in the data, i have 52 observations where car type is domestic
and 22 observations where car type equals foreign
.
Domestic Foreign Total
52 22 74
70.27 29.73 100.00
100.00 100.00 100.00
Is it possible to do that in Pandas? Example:
df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
'B' : ['A', 'B', 'C'] * 8,
'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
'D' : np.random.randn(24),
'E' : np.random.randn(24)})
something that would combine
pd.crosstab(df.A,df.B)
with pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)
and pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=0)
on the same output. What do you think? I am thinking about aggfunc
but here I need to get the whole dataframe, not just a Series...
Thanks!