206
votes

With this data frame ("df"):

year pollution
1 1999 346.82000
2 2002 134.30882
3 2005 130.43038
4 2008  88.27546

I try to create a line chart like this:

  plot5 <- ggplot(df, aes(year, pollution)) +
           geom_point() +
           geom_line() +
           labs(x = "Year", y = "Particulate matter emissions (tons)", title = "Motor vehicle emissions in Baltimore")

The error I get is:

geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?

The chart appears as a scatter plot even though I want a line chart. I tried to replace geom_line() with geom_line(aes(group = year)) but that didn't work.

In an answer I was told to convert year to a factor variable. I did and the problem persists. This is the output of str(df) and dput(df):

'data.frame':   4 obs. of  2 variables:
 $ year     : num  1 2 3 4
 $ pollution: num [1:4(1d)] 346.8 134.3 130.4 88.3
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr  "1999" "2002" "2005" "2008"

structure(list(year = c(1, 2, 3, 4), pollution = structure(c(346.82, 
134.308821199349, 130.430379885892, 88.275457392443), .Dim = 4L, .Dimnames = list(
    c("1999", "2002", "2005", "2008")))), .Names = c("year", 
"pollution"), row.names = c(NA, -4L), class = "data.frame")
6
It gives no error when I run it. Its likely that df is not what you think it is. Please state your question in reproducible form, i.e. show the output of dput(df).G. Grothendieck
could be that your variables are factors, then you'd need to convert them to numericerc
@G.Grothendieck I posted what you said. I also converted to numeric and still have the problem.megashigger
You really should state questions in reproducible form. It's hard to help you if we can't recreate the error.Mario Becerra
is it possible to rank the line point in descending order of "pollution"?AndyIan

6 Answers

416
votes

You only have to add group = 1 into the ggplot or geom_line aes().

For line graphs, the data points must be grouped so that it knows which points to connect. In this case, it is simple -- all points should be connected, so group=1. When more variables are used and multiple lines are drawn, the grouping for lines is usually done by variable.

Reference: Cookbook for R, Chapter: Graphs Bar_and_line_graphs_(ggplot2), Line graphs.

Try this:

plot5 <- ggplot(df, aes(year, pollution, group = 1)) +
         geom_point() +
         geom_line() +
         labs(x = "Year", y = "Particulate matter emissions (tons)", 
              title = "Motor vehicle emissions in Baltimore")
38
votes

You get this error because one of your variables is actually a factor variable . Execute

str(df) 

to check this. Then do this double variable change to keep the year numbers instead of transforming into "1,2,3,4" level numbers:

df$year <- as.numeric(as.character(df$year))

EDIT: it appears that your data.frame has a variable of class "array" which might cause the pb. Try then:

df <- data.frame(apply(df, 2, unclass))

and plot again?

4
votes

I had similar problem with the data frame:

group time weight.loss
1 Control  wl1    4.500000
2    Diet  wl1    5.333333
3  DietEx  wl1    6.200000
4 Control  wl2    3.333333
5    Diet  wl2    3.916667
6  DietEx  wl2    6.100000
7 Control  wl3    2.083333
8    Diet  wl3    2.250000
9  DietEx  wl3    2.200000

I think the variable for x axis should be numeric, so that geom_line knows how to connect the points to draw the line.

after I change the 2nd column to numeric:

 group time weight.loss
1 Control    1    4.500000
2    Diet    1    5.333333
3  DietEx    1    6.200000
4 Control    2    3.333333
5    Diet    2    3.916667
6  DietEx    2    6.100000
7 Control    3    2.083333
8    Diet    3    2.250000
9  DietEx    3    2.200000

then it works.

1
votes

Start up R in a fresh session and paste this in:

library(ggplot2)

df <- structure(list(year = c(1, 2, 3, 4), pollution = structure(c(346.82, 
134.308821199349, 130.430379885892, 88.275457392443), .Dim = 4L, .Dimnames = list(
    c("1999", "2002", "2005", "2008")))), .Names = c("year", 
"pollution"), row.names = c(NA, -4L), class = "data.frame")

df[] <- lapply(df, as.numeric) # make all columns numeric

ggplot(df, aes(year, pollution)) +
           geom_point() +
           geom_line() +
           labs(x = "Year", 
                y = "Particulate matter emissions (tons)", 
                title = "Motor vehicle emissions in Baltimore")
1
votes

I got a similar prompt. It was because I had specified the x-axis in terms of some percentage (for example: 10%A, 20%B,....). So an alternate approach could be that you multiply these values and write them in the simplest form.

1
votes

I found this can also occur if the most of the data plotted is outside of the axis limits. In that case, adjust the axis scales accordingly.