I am using dplyr to summarise some data, grouping by two factors. The problem is that not every level of the second factor occurs within each level of the first, and my summarised data frame has no rows for the combinations with no data.
I think I need to include an na.rm=FALSE argument, but this isn't working.
I've also tried using mutate() to include all levels of the factor, but that isn't working either.
Here is my code, with the mutate() included:

    Dataframe <- UKData %>%
      filter(!is.na(REGION)) %>%
      group_by(REGION, EMPSIZE) %>%
      summarise(NumberofEmployers = length(Employers)) %>%
      mutate(EMPSIZE = factor(EMPSIZE, levels = z)) %>%
      arrange(REGION, EMPSIZE)

So the issue is that not every region has every employer size. The employer size band has 7 levels. I want a table that shows NA where a region doesn't have a particular size band. Is this possible?
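A note on the `na.rm` idea: `na.rm` only controls whether `NA` values are dropped inside summary functions such as `sum()` or `mean()`; it does not decide which groups exist. A minimal sketch of one dplyr route, assuming dplyr 1.0 or later and using the sample data from the update below (the names `emp`, `sizes` and `result` are illustrative, not from the original code): make both grouping columns factors before grouping, then pass `.drop = FALSE` to `group_by()` so that empty level combinations are kept.

```r
library(dplyr)

# Sample data transcribed from the question's UPDATE (assumed representative of UKData)
emp <- data.frame(
  Employers = paste("Number", 1:20),
  REGION = rep(c("Scotland", "North West", "Yorkshire", "London", "East"),
               times = c(7, 2, 3, 6, 2)),
  EMPSIZE = c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+", "50-499",
              "5-49", "1000-4999",
              "5000+", "50-499", "5-49",
              "1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+",
              "50-499", "1000-4999"),
  stringsAsFactors = FALSE
)
# The six size bands in the sample (the real data reportedly has 7)
sizes <- c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+")

result <- emp %>%
  filter(!is.na(REGION)) %>%
  # Convert to factors *before* grouping so group_by() knows every level
  mutate(REGION  = factor(REGION, levels = unique(REGION)),
         EMPSIZE = factor(EMPSIZE, levels = sizes)) %>%
  # .drop = FALSE keeps groups for level combinations that have no rows
  group_by(REGION, EMPSIZE, .drop = FALSE) %>%
  summarise(NumberofEmployers = length(Employers), .groups = "drop") %>%
  arrange(REGION, EMPSIZE)

print(result, n = Inf)
```

Empty groups are summarised over zero rows, so `length(Employers)` gives 0 rather than NA for them.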
UPDATE:
The data looks something like this:

| Employers | REGION | EMPSIZE |
|---|---|---|
| Number 1 | Scotland | 1-4 |
| Number 2 | Scotland | 5-49 |
| Number 3 | Scotland | 50-499 |
| Number 4 | Scotland | 500-999 |
| Number 5 | Scotland | 1000-4999 |
| Number 6 | Scotland | 5000+ |
| Number 7 | Scotland | 50-499 |
| Number 8 | North West | 5-49 |
| Number 9 | North West | 1000-4999 |
| Number 10 | Yorkshire | 5000+ |
| Number 11 | Yorkshire | 50-499 |
| Number 12 | Yorkshire | 5-49 |
| Number 13 | London | 1-4 |
| Number 14 | London | 5-49 |
| Number 15 | London | 50-499 |
| Number 16 | London | 500-999 |
| Number 17 | London | 1000-4999 |
| Number 18 | London | 5000+ |
| Number 19 | East | 50-499 |
| Number 20 | East | 1000-4999 |

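The sample above can be written as a reproducible R data frame. As a sketch, assuming `z` in the original code is the vector of the six size bands shown here (`emp` stands in for `UKData`), running the question's pipeline on it shows the problem: `summarise()` only returns combinations that actually occur, and the later `mutate(factor(...))` just relabels existing rows, so it cannot add the missing ones.

```r
library(dplyr)

# Sample data transcribed from the question's UPDATE (stand-in for UKData)
emp <- data.frame(
  Employers = paste("Number", 1:20),
  REGION = rep(c("Scotland", "North West", "Yorkshire", "London", "East"),
               times = c(7, 2, 3, 6, 2)),
  EMPSIZE = c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+", "50-499",
              "5-49", "1000-4999",
              "5000+", "50-499", "5-49",
              "1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+",
              "50-499", "1000-4999"),
  stringsAsFactors = FALSE
)
# Assumption: `z` in the question holds the size-band levels
z <- c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+")

# The question's pipeline, unchanged apart from the data source
Dataframe <- emp %>%
  filter(!is.na(REGION)) %>%
  group_by(REGION, EMPSIZE) %>%
  summarise(NumberofEmployers = length(Employers)) %>%
  mutate(EMPSIZE = factor(EMPSIZE, levels = z)) %>%
  arrange(REGION, EMPSIZE)

# Only the observed REGION/EMPSIZE combinations come back, not 5 x 6 = 30
nrow(Dataframe)  # 19
```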
Only Scotland and London have all 6 possible size bands; the other regions do not. So the table I want should look like this:

| REGION | EMPSIZE | number |
|---|---|---|
| Scotland | 1-4 | 1 |
| Scotland | 5-49 | 1 |
| Scotland | 50-499 | 2 |
| Scotland | 500-999 | 1 |
| Scotland | 1000-4999 | 1 |
| Scotland | 5000+ | 1 |
| North West | 1-4 | NA |
| North West | 5-49 | 1 |
| North West | 50-499 | NA |
| North West | 500-999 | NA |
| North West | 1000-4999 | 1 |
| North West | 5000+ | NA |
| Yorkshire | 1-4 | NA |
| Yorkshire | 5-49 | 1 |
| Yorkshire | 50-499 | 1 |
| Yorkshire | 500-999 | NA |
| Yorkshire | 1000-4999 | NA |
| Yorkshire | 5000+ | 1 |
| London | 1-4 | 1 |
| London | 5-49 | 1 |
| London | 50-499 | 1 |
| London | 500-999 | 1 |
| London | 1000-4999 | 1 |
| London | 5000+ | 1 |
| East | 1-4 | NA |
| East | 5-49 | NA |
| East | 50-499 | 1 |
| East | 500-999 | NA |
| East | 1000-4999 | 1 |
| East | 5000+ | NA |

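One way to produce exactly this table, sketched with tidyr's `complete()` (assumptions: tidyr is installed, dplyr 1.0 or later; `emp`, `sizes` and `wanted` are illustrative names): count the observed combinations, then let `complete()` add a row for every `REGION` x `EMPSIZE` level pair. Because both columns are factors, `complete()` uses all of their levels, and the added rows get `NA` in `number`.

```r
library(dplyr)
library(tidyr)

# Sample data transcribed from the question's UPDATE
emp <- data.frame(
  Employers = paste("Number", 1:20),
  REGION = rep(c("Scotland", "North West", "Yorkshire", "London", "East"),
               times = c(7, 2, 3, 6, 2)),
  EMPSIZE = c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+", "50-499",
              "5-49", "1000-4999",
              "5000+", "50-499", "5-49",
              "1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+",
              "50-499", "1000-4999"),
  stringsAsFactors = FALSE
)
sizes <- c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+")

wanted <- emp %>%
  filter(!is.na(REGION)) %>%
  # Factor levels fix both the set of combinations and the sort order
  mutate(REGION  = factor(REGION, levels = unique(REGION)),
         EMPSIZE = factor(EMPSIZE, levels = sizes)) %>%
  # Counts for the combinations that actually occur
  count(REGION, EMPSIZE, name = "number") %>%
  # Add every missing level combination; their `number` is NA
  complete(REGION, EMPSIZE) %>%
  arrange(REGION, EMPSIZE)

print(wanted, n = Inf)
```

Passing `fill = list(number = 0)` to `complete()` would put 0 in the added rows instead of NA.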
In hindsight, I don't mind whether the missing combinations show NA or 0; I do want every level shown in the table, though.
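If 0 is acceptable, base R's `table()` already behaves this way: it cross-tabulates every level of one factor against every level of the other and reports empty combinations as 0. A minimal sketch without dplyr, again on the transcribed sample (`emp`, `sizes` and `counts` are illustrative names):

```r
# Sample data transcribed from the question's UPDATE
emp <- data.frame(
  Employers = paste("Number", 1:20),
  REGION = rep(c("Scotland", "North West", "Yorkshire", "London", "East"),
               times = c(7, 2, 3, 6, 2)),
  EMPSIZE = c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+", "50-499",
              "5-49", "1000-4999",
              "5000+", "50-499", "5-49",
              "1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+",
              "50-499", "1000-4999"),
  stringsAsFactors = FALSE
)
sizes <- c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+")

# Factors carry the full set of levels, including any that never occur
emp$REGION  <- factor(emp$REGION, levels = unique(emp$REGION))
emp$EMPSIZE <- factor(emp$EMPSIZE, levels = sizes)

# table() counts every REGION x EMPSIZE level pair; empty pairs become 0
counts <- as.data.frame(table(REGION = emp$REGION, EMPSIZE = emp$EMPSIZE),
                        responseName = "number")

# as.data.frame() varies the first factor fastest; reorder to group by region
counts <- counts[order(counts$REGION, counts$EMPSIZE), ]
rownames(counts) <- NULL
counts
```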
`NA` is implicitly allowed in any collection of `factor`s (I believe `rbind(data.frame(b=letters[1:3]), data.frame(b=NA_character_))` works), but since we don't know what `UKData` looks like, it's hard to do much. Can you provide a representative sample of it? Refs: stackoverflow.com/questions/5963269, stackoverflow.com/help/mcve, and stackoverflow.com/tags/r/info. – r2evans

… `REGION` and `EMPSIZE`, even if it doesn't appear in the data, right? Take a look at the answers to these questions: stackoverflow.com/questions/32247211/fill-in-missing-rows-in-r, stackoverflow.com/questions/43233682/… – divibisan
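A general "fill in missing rows" pattern is to build the full grid of combinations independently of the data and join the observed counts onto it. A sketch of that approach with base R's `expand.grid()` and dplyr's `left_join()` (assuming dplyr 1.0 or later; `emp`, `sizes`, `observed`, `grid` and `filled` are illustrative names, and this is not necessarily what the linked answers do). Unlike `.drop = FALSE` or `complete()` on factors, it works on plain character columns:

```r
library(dplyr)

# Sample data transcribed from the question's UPDATE
emp <- data.frame(
  Employers = paste("Number", 1:20),
  REGION = rep(c("Scotland", "North West", "Yorkshire", "London", "East"),
               times = c(7, 2, 3, 6, 2)),
  EMPSIZE = c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+", "50-499",
              "5-49", "1000-4999",
              "5000+", "50-499", "5-49",
              "1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+",
              "50-499", "1000-4999"),
  stringsAsFactors = FALSE
)
sizes <- c("1-4", "5-49", "50-499", "500-999", "1000-4999", "5000+")

# Counts for the combinations that actually occur
observed <- emp %>% count(REGION, EMPSIZE, name = "number")

# Every REGION x EMPSIZE pair; EMPSIZE is listed first so it varies fastest,
# which keeps each region's rows together
grid <- expand.grid(EMPSIZE = sizes, REGION = unique(emp$REGION),
                    stringsAsFactors = FALSE)[, c("REGION", "EMPSIZE")]

# left_join() keeps every grid row; unmatched pairs get NA, replaced here by 0
filled <- left_join(grid, observed, by = c("REGION", "EMPSIZE")) %>%
  mutate(number = coalesce(number, 0L))

filled
```

Dropping the `mutate(coalesce(...))` step leaves NA in the unmatched rows instead.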