1
votes

I am trying to make a bar plot in base R and overlay a linear fit using abline, but the line does not come out right. Comparing the regression line with lines drawn from the model's predictions, it is way off:

df <- data.frame(year = c(2018,2019,2020), PWI = c(64.7,71.3,75.2))
barplot(PWI~year, data = df, ylim = c(0,100))
text(x = c(0.7,1.9,3.1), y = df$PWI+2, labels = round(df$PWI,1))  # x positions of the bars

abline(lm(PWI~I(year-2018), data = df), lty = "dashed", col = "red")

Image showing the base R approach where abline does not align

How do I get abline to align with the barplot?

For the record, I'm interested in a base R approach with a line behaving like abline. It can be done in ggplot by:

library(ggplot2)
coeff <- coefficients(lm(PWI~year, data = df))
ggplot(df,aes(year,PWI)) +
  geom_bar(stat = "identity") +
  geom_abline(intercept = coeff[1], slope = coeff[2])

2 Answers

2
votes

Save the output of barplot in a variable. It contains the x-coordinate of the center of every bar, so fit the model against those values instead of against year:

out <- barplot(PWI~year, data = df, ylim = c(0,100))
abline(lm(PWI~I(out), data = df), lty = "dashed", col = "red")
1
votes

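What's happening here is that barplot places the bars at its own x-coordinates rather than at the year values, and abline draws in those same plot coordinates. A line fitted on year-2018 (x = 0, 1, 2) therefore does not line up with the bars.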

df <- data.frame(year = c(2018,2019,2020), PWI = c(64.7,71.3,75.2))

bp <- barplot(PWI~year, data = df, ylim = c(0,100))
print(bp)
     [,1]
[1,]  0.7
[2,]  1.9
[3,]  3.1

The values here are the actual x-coordinates (the bar midpoints) used by barplot. We can draw the fitted line at those coordinates like so:

df <- data.frame(year = c(2018,2019,2020), PWI = c(64.7,71.3,75.2))
bp <- barplot(PWI~year, data = df, ylim = c(0,100))
text(bp, df$PWI+2, labels = round(df$PWI,1))

fit <- lm(PWI~I(year-2018), data = df)

# fitted values, one per bar
ycoords <- predict(fit)

# draw them at the bar midpoints returned by barplot
lines(bp, ycoords, col = 3, lty = 3)
points(bp, ycoords, col = 3)

In the resulting plot the green line is where it should be, passing through the fitted values at the bar midpoints.
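If you want a line that behaves like abline (spanning the whole plot rather than just connecting the bars), one option is to re-express the fitted coefficients in bar coordinates. The following is only a sketch and assumes the years are evenly spaced and sorted, so that the bar midpoints are a linear function of the elapsed years. The names t, offset, step, slope_bar and intercept_bar are introduced here for illustration.

```r
df <- data.frame(year = c(2018, 2019, 2020), PWI = c(64.7, 71.3, 75.2))
bp <- as.vector(barplot(PWI ~ year, data = df, ylim = c(0, 100)))

# Fit on years elapsed since the first year (t = 0, 1, 2)
df$t <- df$year - min(df$year)
fit <- lm(PWI ~ t, data = df)

# Assumption: evenly spaced bars, so midpoints are linear in t: x = offset + step * t
offset <- bp[1]
step <- (bp[2] - bp[1]) / (df$t[2] - df$t[1])

# Re-express PWI = a + b * t as a line in bar coordinates
slope_bar <- unname(coef(fit)["t"]) / step
intercept_bar <- unname(coef(fit)["(Intercept)"]) - slope_bar * offset

abline(a = intercept_bar, b = slope_bar, lty = "dashed", col = "red")
```

This should give the same line as fitting lm directly on the barplot output, but it keeps the model itself in year units, which can be handy if you also want to report the slope per year.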

edit: Note that barplot treats year as a categorical variable, so the bars are evenly spaced whatever the numeric gaps between years are (just change 2020 to 3000 in your example to see this). The figure can therefore distort the relationship in the data if you're plotting observations at varying intervals.
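If the intervals do vary, a possible workaround is to skip barplot and draw the bars yourself at the true year positions with rect, so that abline on a plain PWI~year fit lines up without any coordinate conversion. This is a rough sketch, not a drop-in replacement for barplot: the year 2022 and the bar half-width half = 0.4 are made-up values for illustration.

```r
# Hypothetical data with an uneven gap between years
df <- data.frame(year = c(2018, 2019, 2022), PWI = c(64.7, 71.3, 75.2))
half <- 0.4  # assumed half-width of each bar, in years

# Empty plot whose x-axis is in real year units
plot(df$year, df$PWI, type = "n",
     xlim = range(df$year) + c(-1, 1), ylim = c(0, 100),
     xlab = "year", ylab = "PWI", xaxt = "n")
axis(1, at = df$year)

# Draw each bar centered on its actual year
rect(df$year - half, 0, df$year + half, df$PWI, col = "grey")

# The x-axis is now in years, so abline can use the fit directly
fit <- lm(PWI ~ year, data = df)
abline(fit, lty = "dashed", col = "red")
```

Here the spacing between bars reflects the real gaps between years, at the cost of losing barplot's conveniences such as automatic bar labels.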