Suppose I have a dataframe with some missing values:
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  x = runif(9),
  y = c(runif(6), NA, NA, NA)
)
And I want to plot it using an aesthetic mapping and/or a facet:
library(ggplot2)

ggplot(df, aes(x, y)) +
  geom_point(aes(color = group)) +
  facet_grid(. ~ group, drop = TRUE) +
  theme_bw()
Which produces the following warning message and graph:
Warning message:
Removed 3 rows containing missing values (geom_point).
As you can see, there are no observations for y in the C group, which means no data for the group can be plotted. However, ggplot still creates an empty C facet and a C legend entry. Is there a way to get ggplot to realize that it is removing all of the data from group C, and to remove the corresponding facet and legend entry?
One solution is obviously to remove these rows from the underlying data. However, my non-simplified dataframe has dozens of columns that could be used as group or axis variables, with blocks of NA values scattered throughout. This means I would need to subset the data differently for every graph I want to create. I'm hoping for a simpler solution.
I've seen related questions dealing with unused factors in single facets and subsets of data, but the solutions presented there don't seem to work with missing data due to NA values.
EDIT to clarify additional complexity: the data at the top are simplified, and suggest a simple solution such as na.omit(). However, my real data look something more like this (still simplified, obviously):
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  v = c(runif(3), rep(NA, 6)),
  w = c(NA, NA, NA, runif(6)),
  x = runif(9),
  y = c(runif(6), NA, NA, NA),
  z = runif(9)
)
I want to create many different graphs, showing the relationships between different variables. So, if I want to graph x vs. z, I would show all three facets and legend entries, while if I graph w vs. y I would show only B. Running na.omit() on this dataframe will delete every row.
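To make the NA pattern concrete, here is a minimal sketch in base R. The helper groups_with_data() is a hypothetical name made up for illustration, not part of ggplot2 or any package; it only reports which groups have at least one row where all of the requested columns are non-missing.

```r
# Rebuild the example data (values are random; only the NA pattern matters).
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  v = c(runif(3), rep(NA, 6)),
  w = c(NA, NA, NA, runif(6)),
  x = runif(9),
  y = c(runif(6), NA, NA, NA),
  z = runif(9)
)

# groups_with_data() is a hypothetical helper (an assumption for this sketch):
# it returns the groups with at least one row where every column in `cols`
# is non-missing.
groups_with_data <- function(data, cols, group_col = "group") {
  keep <- complete.cases(data[, cols, drop = FALSE])
  unique(as.character(data[[group_col]][keep]))
}

nrow(na.omit(df))                  # 0: every row has an NA in v or w
groups_with_data(df, c("x", "z"))  # "A" "B" "C"
groups_with_data(df, c("w", "y"))  # "B"
```

This is why na.omit() on the whole dataframe is too aggressive here, while checking only the plotted columns keeps the groups I actually want to see.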
Obviously I could subset the dataframe to only the columns I will use for graphing and then remove NA rows. However, this will require me to create a new dataframe for every graph, which seems tedious and inelegant. For this reason, I'm hoping for a more specific ggplot-based solution. (Of course I will accept an answer such as "there is no ggplot solution, you must create a new dataframe for each graph" if this is indeed the case.)
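For reference, the subsetting workaround described above could be wrapped in a function so that it does not have to be repeated by hand for each graph. This is only a sketch under assumptions: plot_pair() is a hypothetical name, not a ggplot2 function, and it relies on the .data pronoun and vars(), which as far as I know require ggplot2 3.0.0 or later. It still builds a filtered copy of the data internally, so it is not the pure ggplot solution I am asking about.

```r
library(ggplot2)

# Example data with the same NA pattern as above.
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  v = c(runif(3), rep(NA, 6)),
  w = c(NA, NA, NA, runif(6)),
  x = runif(9),
  y = c(runif(6), NA, NA, NA),
  z = runif(9)
)

# plot_pair() is a hypothetical helper (an assumption for this sketch):
# it keeps only rows where both plotted variables are non-missing,
# drops unused factor levels, and then builds the faceted plot.
plot_pair <- function(data, xvar, yvar, group_col = "group") {
  keep <- complete.cases(data[, c(xvar, yvar), drop = FALSE])
  plot_data <- droplevels(data[keep, , drop = FALSE])

  ggplot(plot_data, aes(.data[[xvar]], .data[[yvar]])) +
    geom_point(aes(color = .data[[group_col]])) +
    facet_grid(cols = vars(.data[[group_col]])) +
    labs(x = xvar, y = yvar, color = group_col) +
    theme_bw()
}

p_xz <- plot_pair(df, "x", "z")  # data for A, B and C is kept
p_wy <- plot_pair(df, "w", "y")  # only rows from group B are kept
```

Because the filtering happens before ggplot sees the data, groups with no complete rows never reach the facet or the color scale.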
Using na.omit(df) instead of df in your ggplot call: ggplot(na.omit(df), aes(x, y)) + ... will give you the desired result. – Jaap