0
votes

I'm fairly new to R and I've been having trouble with a plot.

I'm trying to create a line plot with $YEAR on the X axis, $METRIC on the Y axis, and a different-colored line for each country (meaning a total of 3 lines on the same plot).

$COUNTRY is a factor with 3 levels

COUNTRY YEAR    METRIC
USA     2000    14.874
USA     2001    15.492
USA     2002    13.091
USA     2003    14.717
CAN     1999    15.031
CAN     2000    14.343
CAN     2001    12.972
CAN     2002    13.216
SWE     1999    14.771
SWE     2000    17.033
SWE     2001    15.932
SWE     2002    14.516
SWE     2003    15.655

When I create the plot with

plot(df$YEAR, df$METRIC, col=df$COUNTRY, type="p")

I get a plot with a point for each (x,y) combination and a different color for each level of the factor $COUNTRY.

However, when I try to get a line for each country, with

plot(df$YEAR, df$METRIC, col=df$COUNTRY, type="l")

I get one continuous line that starts with the 4 observations for "USA" and then jumps back to the first year of the next country ("CAN").

screenshot attached

Can anyone explain why this is happening? Is it possible to create this plot using only the built-in functions?

Thank you in advance for any assistance.

2
Base R plot doesn't work like ggplot2: passing col=df$COUNTRY does not group the lines, separate their endpoints, or apply colors per group. I think your immediate options are: (1) make a single base plot, then add each country's data with an individual call to lines; (2) use segments and place NAs between each country (fragile and typically too much work); or (3) switch to ggplot2 or lattice, where grouping/faceting like what you want is a bit more natural. - r2evans
Something like library(ggplot2); ggplot(df) + geom_line(aes(YEAR, METRIC, color=COUNTRY)) is more likely what you're looking for, and is relatively easy to read once you start migrating your thought process from base R to the grammar of graphics. - r2evans
If you stay with base R, realize that lines (and therefore plot(..., type="l")) will only use the first color unless type="h" (which is not what you are trying to do here). - r2evans
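Option (2) from the comments can be sketched roughly as follows. This is a minimal sketch rather than the commenter's own code: it rebuilds the question's table as df, assumes the rows are already sorted by country and then by year, and simply drops the segments that would cross from one country to the next instead of inserting NA rows. The helper name same is made up for illustration. Unlike lines, segments accepts one color per segment.

```r
# Data from the question; COUNTRY is a factor with 3 levels.
df <- data.frame(
  COUNTRY = factor(rep(c("USA", "CAN", "SWE"), times = c(4, 4, 5))),
  YEAR    = c(2000:2003, 1999:2002, 1999:2003),
  METRIC  = c(14.874, 15.492, 13.091, 14.717,
              15.031, 14.343, 12.972, 13.216,
              14.771, 17.033, 15.932, 14.516, 15.655)
)

n <- nrow(df)
# Segment i joins row i to row i + 1; keep it only if both rows
# belong to the same country (assumes rows are grouped by country).
same <- df$COUNTRY[-n] == df$COUNTRY[-1]

# Empty canvas with the right axis ranges, then a single segments() call
# with one color index per segment.
plot(df$YEAR, df$METRIC, type = "n", xlab = "YEAR", ylab = "METRIC")
segments(x0 = df$YEAR[-n][same], y0 = df$METRIC[-n][same],
         x1 = df$YEAR[-1][same], y1 = df$METRIC[-1][same],
         col = as.integer(df$COUNTRY[-n])[same])
```

This depends entirely on the row order, which is why the comment calls the approach fragile.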

2 Answers

1
votes

In addition to my comments above, here is a basic base-R implementation. If your $COUNTRY is already a factor (check with is.factor(df$COUNTRY)), you can skip creating ctryfctr and change the lines call to lines(..., col=x$COUNTRY[1]):

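# Make sure there is a factor to map onto color indices
df$ctryfctr <- factor(df$COUNTRY)
# Empty plot spanning the full range of the data
plot(NA, xlim=range(df$YEAR), ylim=range(df$METRIC), xlab="YEAR", ylab="METRIC")
# One lines() call per country, each with its own color
for (x in split(df, df$COUNTRY)) lines(x$YEAR, x$METRIC, col=x$ctryfctr[1])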

sample per-country plot
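For a version that can be pasted and run as-is, here is a hedged sketch of the same idea with a legend added. The data frame is rebuilt from the table in the question, and the names cols and by_country are illustrative, not part of the answer above. It assumes the rows within each country are already in year order.

```r
# Data from the question; COUNTRY is a factor with 3 levels.
df <- data.frame(
  COUNTRY = factor(rep(c("USA", "CAN", "SWE"), times = c(4, 4, 5))),
  YEAR    = c(2000:2003, 1999:2002, 1999:2003),
  METRIC  = c(14.874, 15.492, 13.091, 14.717,
              15.031, 14.343, 12.972, 13.216,
              14.771, 17.033, 15.932, 14.516, 15.655)
)

# One palette index per factor level (1, 2, 3 in the default palette).
cols <- seq_len(nlevels(df$COUNTRY))

# split() returns one data frame per level, in level order.
by_country <- split(df, df$COUNTRY)

# Empty plot spanning all the data, then one line per country.
plot(NA, xlim = range(df$YEAR), ylim = range(df$METRIC),
     xlab = "YEAR", ylab = "METRIC")
for (i in seq_along(by_country)) {
  x <- by_country[[i]]
  lines(x$YEAR, x$METRIC, col = cols[i])
}

# Legend entries follow the same order as the colors.
legend("topright", legend = names(by_country), col = cols, lty = 1)
```

Indexing the colors by position in by_country keeps the legend and the lines in sync, since both follow the order of the factor levels.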

0
votes

Since you seem to mix up some concepts, I thought it would be helpful to clarify things a bit.

R's base graphics are great for quick sketches without much prior knowledge, but more complicated plots are easier to define with the ggplot2 package. You can install it with install.packages("ggplot2"). With ggplot2 you can group the lines by a variable, which is what you were trying to do, and as r2evans already pointed out.

library(ggplot2)
ggplot(df) + geom_line(aes(YEAR, METRIC, group=COUNTRY, color=COUNTRY))

So, you tell ggplot to use df as your data. You define the x and y axes for geom_line inside aes(). With group= you define the grouping variable, and with color= you specify that each group's line gets a different color.
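As a self-contained sketch of the above, assuming ggplot2 is installed and rebuilding df from the table in the question: mapping a discrete variable such as COUNTRY to color already makes ggplot2 draw one line per level, so the explicit group= is harmless but, as far as I know, not required here. The geom_point layer is an optional addition, not part of the original call.

```r
library(ggplot2)  # assumes ggplot2 is installed

# Data from the question; COUNTRY is a factor with 3 levels.
df <- data.frame(
  COUNTRY = factor(rep(c("USA", "CAN", "SWE"), times = c(4, 4, 5))),
  YEAR    = c(2000:2003, 1999:2002, 1999:2003),
  METRIC  = c(14.874, 15.492, 13.091, 14.717,
              15.031, 14.343, 12.972, 13.216,
              14.771, 17.033, 15.932, 14.516, 15.655)
)

# color = COUNTRY is a discrete mapping, so each country gets its own
# line and its own color; the aesthetics are shared by both layers.
p <- ggplot(df, aes(YEAR, METRIC, color = COUNTRY)) +
  geom_line() +
  geom_point()

print(p)
```

Putting aes() in the ggplot() call rather than in geom_line() lets the point layer reuse the same mappings.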

Hope you have a great time with R and ggplot2!