0
votes

I have the following data frame which contains 4 columns of data in addition to the vector of labels c.

Time <-c(1:4)

d<-data.frame(Time,
x1= rpois(n = 4, lambda = 10),
x2= runif(n = 4, min = 1, max = 10),
x3= rpois(n = 4, lambda = 5),
x4= runif(n = 4, min = 1, max = 5),
c=c(1,1,2,3))

I would like to use ggplot to plot the 4 curves x1, ..., x4 on top of each other, with each curve colored according to its label. Curves x1 and x2 should share a color because they have the same label, whereas x3 and x4 should each get a different color.

I did the following

d %>% pivot_longer(-c(Time,x1,x2,x3,x4))%>%
   rename(class=value) %>% select(-name) %>%
   pivot_longer(-c(Time,class)) %>%
   mutate(Label=ifelse(Time==max(Time,na.rm = T),name,NA),
          Label=ifelse(duplicated(Label),NA,Label)) %>%

  ggplot(aes(x=Time,y=value,color=factor(class),group=name))+
  geom_line()+
  labs(color='class')+
  scale_color_manual(values=c('red','blue','green'))+
  geom_label_repel(aes(label = Label),
                   nudge_x = 1.5,
                   na.rm = TRUE,show.legend = F,color='black')

but I don't get the plot I need: the resulting curves are not colored according to the label. I want x1 and x2 in red, x3 in blue and x4 in green.


To add: I would like to obtain the same plot in the following general case, where I can't add the vector c to the data frame because length(c) is not equal to length(x1)=...=length(x4)

Time <-c(1:5)
d<-data.frame(Time,
x1= rpois(n = 5, lambda = 10),
x2= runif(n = 5, min = 1, max = 10),
x3= rpois(n = 5, lambda = 5),
x4= runif(n = 5, min = 1, max = 5))

and c=c(1,1,2,3)


3 Answers

1
votes

As you point out in your comments, it is only possible to put the vector of colors in the original data.frame as a column because it happens to have as many rows as variables. This is a dangerous way to store the information, because the colors really belong to the columns rather than the rows. It's better to assign the colors separately and then join them into the long-format data by variable name prior to plotting.

Below is an example of how I'd do this with your data.

First, prepare the data without the color mapping for each variable (that comes next):

# load necessary packages
library(tidyverse)
library(ggrepel)

# set seed to make simulated data reproducible
set.seed(1)

# simulate data
Time <-c(1:4)

d <- data.frame(Time,
              x1 = rpois(n = 4, lambda = 10),
              x2 = runif(n = 4, min = 1, max = 10),
              x3 = rpois(n = 4, lambda = 5),
              x4 = runif(n = 4, min = 1, max = 5))

Next, make a separate data.frame that maps each variable name to its color group. The group has to be a factor (i.e. discrete rather than continuous) before it is mapped to color; I do that here, but it can also be done later in the ggplot call if you prefer. Per your request, this scales with your dataset without setting each level manually, but it requires the vector of color groups to have the same length and order as the variable names in d, unless you have some other way to establish that relationship.

# create separate df with color groupings for variable in d
color_grouping <- data.frame(var = names(d)[-1],
                             color_group = factor(c(1, 1, 2, 3)))
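Because this mapping relies purely on position, a length check before building it can catch a mismatch early. A minimal sketch for the general 5-row case from the question, assuming the label vector is stored under the hypothetical name `labels` and that `Time` is the only non-plotted column:

```r
# Toy data with the same shape as the question's general case
set.seed(1)
d <- data.frame(Time = 1:5,
                x1 = rpois(5, lambda = 10),
                x2 = runif(5, min = 1, max = 10),
                x3 = rpois(5, lambda = 5),
                x4 = runif(5, min = 1, max = 5))

labels <- c(1, 1, 2, 3)               # hypothetical name for the label vector
vars   <- setdiff(names(d), "Time")   # columns that will become lines

# Fail fast if there is not exactly one label per plotted column
stopifnot(length(labels) == length(vars))

color_grouping <- data.frame(var = vars,
                             color_group = factor(labels))
color_grouping
```

The number of rows in `d` no longer matters here; only the number of plotted columns has to match the number of labels.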

Then you pivot_longer and do a join to merge the color mapping with the data for plotting.

# pivot d to long and merge in color codes
d_long <- d %>%
  pivot_longer(cols = -Time, names_to = "var", values_to = "value") %>%
  left_join(color_grouping, by = "var")  # explicit key avoids the "Joining, by" message

# inspect final table prior to plotting to confirm color mappings
head(d_long, 4)

# # A tibble: 4 x 4
#   Time var   value color_group
#   <int> <chr> <dbl> <fct>
# 1     1 x1     8    1
# 2     1 x2     1.56 1
# 3     1 x3     4    2
# 4     1 x4     4.97 3
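If a name in the mapping table does not match a column of the data, `left_join` leaves `color_group` as `NA` for that variable instead of raising an error. A small check along these lines, sketched here on made-up values rather than the simulated data, makes that visible before plotting:

```r
library(dplyr)

# Made-up long data; "x3" is deliberately missing from the mapping
d_long <- data.frame(Time = rep(1:2, each = 3),
                     var = rep(c("x1", "x2", "x3"), times = 2),
                     value = c(8, 1.5, 4, 10, 3.2, 6))
color_grouping <- data.frame(var = c("x1", "x2"),
                             color_group = factor(c(1, 1)))

joined <- left_join(d_long, color_grouping, by = "var")

# Variables that did not receive a color group
unmatched <- joined %>%
  filter(is.na(color_group)) %>%
  distinct(var)
unmatched
```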

Finally, generate the line plot, with color mapped to the color_group variable. To ensure you get one line per original variable, you also need to set group = var. For more info on this, check the documentation on grouping.

# plot data adding labels for each line
p <- d_long %>%
  ggplot(aes(x = Time, y = value, group = var, color = color_group)) +
  geom_line() +
  labs(color='class') +
  scale_color_manual(values=c('red','blue','green')) +
  geom_label_repel(aes(label = var),
                   data = d_long %>% slice_max(order_by = Time, n = 1),
                   nudge_x = 1.5,
                   na.rm = TRUE,
                   show.legend = F,
                   color='black')

p

This produces the following plot:

grouped line plot

In your comment you suggested that you want to separate the plots and stack them. I'm not sure I fully understood, but one way to accomplish this is with faceting.

For example if you wanted to facet out separate panels by color_group, you could add this line to the plot above:

p + facet_grid(rows = "color_group")

Which gives this plot:

faceted plot

Note that with this form the faceting variable must be put in quotes.
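As an unquoted alternative, `facet_grid` also accepts variables wrapped in `vars()`. A minimal sketch, using a small made-up table in place of `d_long`:

```r
library(ggplot2)

# Small long-format table standing in for d_long (values are made up)
d_long <- data.frame(
  Time = rep(1:3, times = 4),
  var = rep(c("x1", "x2", "x3", "x4"), each = 3),
  value = c(8, 10, 7, 2, 5, 9, 4, 6, 3, 1, 2, 4),
  color_group = factor(rep(c(1, 1, 2, 3), each = 3))
)

p <- ggplot(d_long, aes(x = Time, y = value, group = var, color = color_group)) +
  geom_line()

# Same faceting as rows = "color_group", written with vars()
p_facet <- p + facet_grid(rows = vars(color_group))
p_facet
```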

1
votes

You were on the right path, but ggplot needs a slightly different data structure:

# delete old color column
d$c <- NULL 

# reshape df
plot.d <- reshape2::melt(d, id.vars = c("Time"))

# create new, correct color column
plot.d$c <- NA
plot.d$c[plot.d$variable == "x1"] <- 1
plot.d$c[plot.d$variable == "x2"] <- 1
plot.d$c[plot.d$variable == "x3"] <- 2
plot.d$c[plot.d$variable == "x4"] <- 3

# plot
ggplot(plot.d, aes(x=Time, y=value, color=as.factor(c), group = variable))+
  geom_line() +
  labs(color='class')+
  scale_color_manual(values=c('red','blue','green'))

Note that I omitted the labels for brevity, but you can add them back in using the same logic. The code above gives the following result:

ggplot2 result
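For reference, the labels mentioned above could be added back roughly as follows. This is a sketch rather than part of the original answer: it assumes ggrepel and reshape2 are installed, rebuilds plot.d from seeded data, and uses a hypothetical named vector `class_lookup` in place of the four manual assignments.

```r
library(ggplot2)
library(ggrepel)

set.seed(1)
d <- data.frame(Time = 1:4,
                x1 = rpois(4, lambda = 10),
                x2 = runif(4, min = 1, max = 10),
                x3 = rpois(4, lambda = 5),
                x4 = runif(4, min = 1, max = 5))

plot.d <- reshape2::melt(d, id.vars = "Time")

# Hypothetical lookup: names are column names, values are class labels
class_lookup <- c(x1 = 1, x2 = 1, x3 = 2, x4 = 3)
plot.d$c <- unname(class_lookup[as.character(plot.d$variable)])

# Keep only the last time point so each line gets exactly one label
label.d <- plot.d[plot.d$Time == max(plot.d$Time), ]

p <- ggplot(plot.d, aes(x = Time, y = value, color = as.factor(c), group = variable)) +
  geom_line() +
  labs(color = "class") +
  scale_color_manual(values = c("red", "blue", "green")) +
  geom_label_repel(data = label.d, aes(label = variable),
                   nudge_x = 1.5, show.legend = FALSE, color = "black")
p
```

Passing a separate `data` argument to the label layer avoids creating a `Label` column padded with `NA` values.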

1
votes

Here is a solution based on how I understood your question. The data frame is converted to long format, and the variable c is then overwritten using mutate and case_when with the number codes you used.

I have set a seed for better reproducibility.

library(tidyverse)
library(ggrepel)

set.seed(1)
# YOUR DATA
Time <- c(1:4)
d <- data.frame(Time,
  x1 = rpois(n = 4, lambda = 10),
  x2 = runif(n = 4, min = 1, max = 10),
  x3 = rpois(n = 4, lambda = 5),
  x4 = runif(n = 4, min = 1, max = 5),
  c = c(1, 1, 2, 3)
)


d %>%
  pivot_longer(cols = x1:x4) %>%     # make it long
  mutate(c = as.factor(case_when(    # replace consistently
    name == "x1" | name == "x2" ~ 1, # according to YOUR DATA
    name == "x3" ~ 2,
    name == "x4" ~ 3
  ))) %>%
  mutate(
    Label = ifelse(Time == max(Time, na.rm = T), name, NA),
    Label = ifelse(duplicated(Label), NA, Label)
  ) %>%
  ggplot(aes(x = Time, y = value, color = c, group = name)) +
  geom_line() +
  labs(color = "class") +
  scale_color_manual(values = c("red", "blue", "green")) + # YOUR CHOICE
  geom_label_repel(aes(label = Label),
    nudge_x = 1.5,
    na.rm = TRUE, show.legend = F, color = "black"
  )

ADDED

You could leave c out and color according to name. The color code was only necessary because you wanted two names to share the same color. If that is not needed, the following code does the job.

d %>%
  pivot_longer(cols = x1:x4) %>%     # make it long
  mutate(
    Label = ifelse(Time == max(Time, na.rm = T), name, NA),
    Label = ifelse(duplicated(Label), NA, Label)
  ) %>%
  ggplot(aes(x = Time, y = value, color = name, group = name)) +
  geom_line() +
  geom_label_repel(aes(label = Label),
    nudge_x = 1.5,
    na.rm = TRUE, show.legend = F, color = "black"
  )