I have a dataframe of 11 variables x 60 rows; the entries are two-valued factor ('Male'/'Female') and contain NAs. (Each row is a single household, with up to 11 members. Hence the column names 'A4M1'...'A4M11').
What is the simplest paradigm to tabulate the total counts of 'Male','Female' so they can be barplot'ted? Ideally my output would be a length-2 named vector of numeric, i.e. directly operate on the factor as a factor, so we keep the labels.
I've been trying reshape,melt,cast,stack,tabulate,table,colwise,sum,aggregate,summarise,by,plyr...
My current working code is:
> tabulate(stack(colwise(as.numeric)(myData), na.rm=TRUE)$values)
[1] 162 151
which is bad since I don't want to coerce to numeric and lose the labels.
See my answer below
table(as.matrix(myData))
is also not ideal since it loses the order of factor levels, and returns us alphabetical order of labels.
tabulate()
seems to be the most suitable fn. To convert the dataframe to a list I use stack(..., na.rm=TRUE)$values
, which is clunky but does the job.
However stack()
is not ideal since it won't work on factors, so I have to convert it with colwise(as.numeric)(myData)
, which throws away the labels.
I could reapply factor(..., labels=c('Male','Female'))
to the output from tabulate()
, but that's just clunky.
So, can you do better?
(Using a standard paradigm, other than writing case-specific code)
> head(myData)
A4M1 A4M2 A4M3 A4M4 A4M5 A4M6 A4M7 A4M8 A4M9 A4M10 A4M11
1 Female Male Male <NA> Male Male Male <NA> <NA> <NA> <NA>
2 Female Male Male <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Female Male Male <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Female Male Female <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Female Male Male Female Female Male <NA> <NA> <NA> <NA> <NA>
6 Female Male Male <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 ...
> str(myData)
'data.frame': 60 obs. of 11 variables:
$ A4M1 : Factor w/ 2 levels "Male","Female": 2 2 2 2 2 2 1 2 2 2 ...
$ A4M2 : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
$ A4M3 : Factor w/ 2 levels "Male","Female": 1 1 1 2 1 1 2 1 1 2 ...
$ A4M4 : Factor w/ 2 levels "Male","Female": NA NA NA NA 2 NA NA 1 NA 2 ...
$ A4M5 : Factor w/ 2 levels "Male","Female": 1 NA NA NA 2 NA NA 1 NA 2 ...
$ A4M6 : Factor w/ 2 levels "Male","Female": 1 NA NA NA 1 NA NA NA NA NA ...
$ A4M7 : Factor w/ 2 levels "Male","Female": 1 NA NA NA NA NA NA NA NA NA ...
$ A4M8 : Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
$ A4M9 : Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
$ A4M10: Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
$ A4M11: Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
table(as.matrix(test))
do what you require? It will keep the labels when barplotted as inbarplot(table(as.matrix(test)))
- thelatemailtable(as.matrix(test))[levels(test$A4M3)]
- thelatemail