13
votes

I'm new to R and statistics and haven't been able to figure out how to plot predicted vs. actual values after running a multiple linear regression. I have come across similar questions but haven't been able to understand the code. I would greatly appreciate it if you could explain the code. This is what I have done so far:

# Attach file containing variables and responses
q <- read.csv("C:/Users/A/Documents/Design.csv")
attach(q)
# Run a linear regression
model <- lm(qo~P+P1+P4+I)
# Summary of linear regression results
summary(model)

I want the predicted vs. actual plot so I can see graphically how well the regression fits my actual data.

4
Yours is a linear regression model, so your R-squared should give the model accuracy. It's not a typical classification problem where you would have a predicted vs. actual plot. - amrrs
Can you include what you have come across? - Sotos
Just as @Ben Bolker just posted, you can for instance use abline and plot. Further reading: stat.ethz.ch/pipermail/r-help//2013-February/347479.html - nilsole
@amrrs yes that's one method but I would like to see a graphical representation of it. - John
@nilsole thanks for the link - John

4 Answers

20
votes

It would be better if you provided a reproducible example, but here's an example I made up:

set.seed(101)
dd <- data.frame(x=rnorm(100),y=rnorm(100),
                 z=rnorm(100))
dd$w <- with(dd,
     rnorm(100,mean=x+2*y+z,sd=0.5))

It's (much) better to use the data argument; you should almost never use attach().

m <- lm(w ~ x + y + z, data = dd)
plot(predict(m), dd$w,
     xlab = "predicted", ylab = "actual")
abline(a = 0, b = 1)
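
To connect this plot with the comment about R-squared, here is a minimal sketch that reuses the made-up data above. For a linear model fitted with an intercept, the squared correlation between the predicted and actual values should match the R-squared reported by summary(). The names pred, r2_from_plot and r2_from_summary are only illustrative, not part of the original answer.

```r
# Rebuild the made-up data and model from the answer above
set.seed(101)
dd <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
dd$w <- with(dd, rnorm(100, mean = x + 2 * y + z, sd = 0.5))
m <- lm(w ~ x + y + z, data = dd)

# Predicted values for the rows used in the fit
pred <- predict(m)

# Squared correlation between predicted and actual values ...
r2_from_plot <- cor(pred, dd$w)^2
# ... compared with the R-squared stored in the model summary
r2_from_summary <- summary(m)$r.squared

all.equal(r2_from_plot, r2_from_summary)
```

In other words, how tightly the points cluster around the 1:1 line is a picture of the same quantity that R-squared summarises as a single number.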


9
votes

Besides the predicted vs. actual plot, you can get an additional set of diagnostic plots that help you visually assess the goodness of fit.

# --- first run the previous code by Ben Bolker to create m ---

par(mfrow = c(2, 2))
plot(m)
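
If only one of these diagnostic panels is of interest, plot() on an lm object accepts a which argument. The sketch below, again assuming the made-up data from Ben Bolker's answer, draws just the residuals vs. fitted panel and then an approximate hand-built version of it with fitted() and resid(), to show what that panel contains.

```r
# Rebuild the made-up data and model from the answer above
set.seed(101)
dd <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
dd$w <- with(dd, rnorm(100, mean = x + 2 * y + z, sd = 0.5))
m <- lm(w ~ x + y + z, data = dd)

# Built-in diagnostic: residuals vs. fitted values only
plot(m, which = 1)

# Roughly the same panel drawn by hand (without the smoother)
plot(fitted(m), resid(m),
     xlab = "fitted", ylab = "residuals")
abline(h = 0, lty = 2)  # residuals should scatter evenly around zero
```

A visible curve or funnel shape in this panel would suggest a missing term or non-constant variance, which a predicted vs. actual plot alone can hide.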


2
votes

A tidy way of doing this would be to use broom::augment():

library(tidyverse)
library(cowplot)
library(broom)  # provides augment(); not attached by library(tidyverse)

set.seed(101)
# Using Ben's data above:
dd <- data.frame(x=rnorm(100),y=rnorm(100),
                 z=rnorm(100))
dd$w <- with(dd,rnorm(100,mean=x+2*y+z,sd=0.5))

m <- lm(w~x+y+z,dd)

# .fitted is mapped to the x-axis and the actual response w to the y-axis
m %>% augment() %>% 
  ggplot()  + 
  geom_point(aes(.fitted, w)) + 
  geom_smooth(aes(.fitted, w), method = "lm", se = FALSE, color = "lightgrey") + 
  labs(x = "Fitted", y = "Actual") + 
  theme_bw()

This works especially nicely for deeply nested lists of regressions.

To illustrate this, consider some nested list of regressions:

Reglist <- list()

Reglist$Reg1 <- dd %>% do(reg = lm(as.formula("w~x*y*z"), data = .)) %>% mutate( Name = "Type 1")
Reglist$Reg2 <- dd %>% do(reg = lm(as.formula("w~x+y*z"), data = .)) %>% mutate( Name = "Type 2")
Reglist$Reg3 <- dd %>% do(reg = lm(as.formula("w~x"), data = .)) %>% mutate( Name = "Type 3")
Reglist$Reg4 <- dd %>% do(reg = lm(as.formula("w~x+z"), data = .)) %>% mutate( Name = "Type 4")

This is where the power of the tidy plotting framework above comes to life:

Graph_Creator <- function(Reglist){

  Reglist %>% pull(reg) %>% .[[1]] %>% augment() %>% 
    ggplot()  + 
    geom_point(aes(.fitted, w)) + 
    geom_smooth(aes(.fitted, w), method = "lm", se = FALSE, color = "lightgrey") + 
    labs(x = "Fitted", y = "Actual", 
         title =  paste0("Regression Type: ", Reglist$Name) ) + 
    theme_bw()
}

Reglist %>% map(~Graph_Creator(.)) %>% 
  cowplot::plot_grid(plotlist = ., ncol = 1)


0
votes

Same as @Ben Bolker's solution, but producing a ggplot object instead of using base R graphics:

#first generate the dd data set using the code in Ben's solution, then... 

library(ggpubr)
m <- lm(w~x+y+z,dd)

ggscatter(x = "prediction",
          y = "actual",
          data = data.frame(prediction = predict(m),
                            actual = dd$w)) +
  geom_abline(intercept = 0,
              slope = 1)