
Suppose I have a dataframe with some missing values:

df <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  x = runif(9),
  y = c(runif(6), NA, NA, NA)
)

And I want to plot it using an aesthetic mapping and/or a facet:

library(ggplot2)

ggplot(df, aes(x, y)) +
  geom_point(aes(color = group)) +
  facet_grid(. ~ group, drop = TRUE) +
  theme_bw()

Which produces the following warning message and graph:

Warning message:
Removed 3 rows containing missing values (geom_point). 

[Image: scatter plot faceted by group, with an empty C facet and a C legend entry]

As you can see, there are no observations for y in the C group, which means no data for the group can be plotted. However, ggplot still creates an empty C facet and a C legend entry. Is there a way to get ggplot to realize that it is removing all of the data from group C, and to remove the corresponding facet and legend entry?

One solution is obviously to remove these rows from the underlying data. However, my non-simplified dataframe has dozens of columns that could be used as group or axis variables, with blocks of NA values scattered throughout. This means I would need to subset the data differently for every graph I want to create. I'm hoping for a simpler solution.

I've seen related questions dealing with unused factors in single facets and subsets of data, but the solutions presented there don't seem to work with missing data due to NA values.

EDIT to clarify additional complexity: the data at the top are simplified, and suggest a simple solution such as na.omit(). However, my real data look something more like this (still simplified, obviously):

df <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  v = c(runif(3), rep(NA, 6)),
  w = c(NA, NA, NA, runif(6)),
  x = runif(9),
  y = c(runif(6), NA, NA, NA),
  z = runif(9)
)

I want to create many different graphs, showing the relationships between different variables. So, if I want to graph x vs. z, I would show all three facets and legend entries, while if I graph w vs. y I would show only B. Running na.omit() on this dataframe will delete every row.
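
A quick base R check makes this concrete. This is only a sketch on the example data above; the set.seed() call is an addition for reproducibility, and the names wy and xz are illustrative:

```r
set.seed(1)  # arbitrary seed, added only so the example is reproducible

df <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  v = c(runif(3), rep(NA, 6)),
  w = c(NA, NA, NA, runif(6)),
  x = runif(9),
  y = c(runif(6), NA, NA, NA),
  z = runif(9)
)

# Every row has an NA in at least one of v, w, or y, so nothing survives
nrow(na.omit(df))

# Restricting the NA check to the plotted columns keeps the usable rows
wy <- df[complete.cases(df[, c("w", "y")]), ]  # rows from group B only
xz <- df[complete.cases(df[, c("x", "z")]), ]  # all nine rows

unique(wy$group)
nrow(xz)
```

So the rows to drop depend on which pair of columns is being plotted, not on the dataframe as a whole.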

Obviously I could subset the dataframe to only the columns I will use for graphing and then remove NA rows. However, this will require me to create a new dataframe for every graph, which seems tedious and inelegant. For this reason, I'm hoping for a more specific ggplot-based solution. (Of course I will accept an answer such as "there is no ggplot solution, you must create a new dataframe for each graph" if this is indeed the case.)

Use na.omit(df) instead of df in your ggplot call: ggplot(na.omit(df), aes(x, y)) + ... will give you the desired result. - Jaap
@ProcrastinatusMaximus see edit above for why your suggestion will not work (at least in current form). While the question you linked to may contain the basis for one workable solution, I do not believe it is a duplicate, since I am asking about distinctly different behavior which may have an alternative, simpler solution. - Joe
Ok, I reopened and posted a solution. HTH. - Jaap

1 Answer


With the updated example, you can use either na.omit or complete.cases to get the desired result, as long as you apply them only to the columns you are plotting. With:

ggplot(df[complete.cases(df[,c('w','y')]),], aes(w, y)) +
  geom_point(aes(color = group)) +
  facet_grid(.~group, drop = TRUE) +
  theme_bw()

or:

ggplot(na.omit(df[,c('group','w','y')]), aes(w, y)) +
  geom_point(aes(color = group)) +
  facet_grid(.~group, drop = TRUE) +
  theme_bw()

you get:

[Image: plot of w vs. y showing only the B facet and legend entry]
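
If you need to do this for many variable pairs, the subsetting can be wrapped in a small helper so each plot call stays short. The following is only a sketch: complete_rows is a hypothetical name, not a ggplot2 or base R function, and the seed is arbitrary.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2 or base R): keep only the rows
# that are complete for the columns named in `vars`. NAs in any other
# column are ignored.
complete_rows <- function(data, vars) {
  data[complete.cases(data[, vars, drop = FALSE]), , drop = FALSE]
}

set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  v = c(runif(3), rep(NA, 6)),
  w = c(NA, NA, NA, runif(6)),
  x = runif(9),
  y = c(runif(6), NA, NA, NA),
  z = runif(9)
)

# w vs. y: only the group B rows are passed to ggplot
p_wy <- ggplot(complete_rows(df, c("w", "y")), aes(w, y)) +
  geom_point(aes(color = group)) +
  facet_grid(. ~ group, drop = TRUE) +
  theme_bw()

# x vs. z: no NAs in these columns, so all three groups are kept
p_xz <- ggplot(complete_rows(df, c("x", "z")), aes(x, z)) +
  geom_point(aes(color = group)) +
  facet_grid(. ~ group, drop = TRUE) +
  theme_bw()
```

Print p_wy or p_xz to draw the plots. This still builds a filtered dataframe for each graph, but it does so inline, so nothing extra has to be stored or named.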


Old answer: Use na.omit(df) instead of df in your ggplot call:

ggplot(na.omit(df), aes(x, y)) +
  geom_point(aes(color = group)) +
  facet_grid(. ~ group) +
  theme_bw()

will give you the desired result:

[Image: plot of x vs. y showing only the A and B facets and legend entries]