I'm using the iris dataset in R. I filtered the dataset so iris$Species == setosa or versicolor. I then created a scatter plot where the x axis is Sepal.Length and the y axis is Sepal.Width. The points were highlighted based on Species, and 2 different linear regression lines were added to the scatter plot based on species.
Here are my questions:
- Is it possible to get the slope equations/ slope values for the 2 lines (setosa or versicolor) from the scatter plot? If so, how?
- Is it possible to use a statistical test to see if the slope equations/ slope values for the 2 lines (setosa or versicolor) were significantly different from one another?
Please let me know if/when you can.
Thanks ahead of time.
-P.S.
Here is the figure:
Here is the R code to generate the plot:
# creates data for scatter plot
## dataset of interest
iris
## for iris
colnames (iris)
### creates dataset with just cases where iris$Species == setosa or versicolor
#### unique values for iris$Species
unique(iris$Species)
#### loads tidyverse package
library(tidyverse)
##### filters dataset with just cases where iris$Species == setosa or versicolor
iris__setosa_or_versicolor <- iris %>% filter(iris$Species != "virginica")
##### turns iris__setosa_or_versicolor to dataframe
iris__setosa_or_versicolor <- data.frame(iris__setosa_or_versicolor)
##### unique values for iris__setosa_or_versicolor$Species
unique(iris__setosa_or_versicolor$Species)
## creates scatter plot
### loads ggplot2
library(ggplot2)
### Basic scatter plot
scatter_plot__sepal_length_x_sepal_width__points_is_species <- ggplot(iris__setosa_or_versicolor, aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()
scatter_plot__sepal_length_x_sepal_width__points_is_species
### Basic scatter plot with regression line added
scatter_plot__sepal_length_x_sepal_width__points_is_species <- ggplot(iris__setosa_or_versicolor, aes(x=Sepal.Length, y=Sepal.Width)) + geom_point() + geom_smooth(method=lm, se=FALSE, color="green")
scatter_plot__sepal_length_x_sepal_width__points_is_species
### Basic scatter plot separated by Species
scatter_plot__sepal_length_x_sepal_width__points_is_species <- ggplot(iris__setosa_or_versicolor, aes(x=Sepal.Length, y=Sepal.Width, color=Species, shape=Species)) + geom_point() + geom_smooth(method=lm, se=FALSE, fullrange=TRUE) + labs(title="Scatter plot of Sepal.Length X Sepal.Width with dots as Species where Species is setosa or versicolor", x="Sepal.Length", y = "Sepal.Width") + scale_colour_manual(values = c("#ff0000","#0000ff"))
scatter_plot__sepal_length_x_sepal_width__points_is_species
scatter_plot__sepal_length_x_sepal_width__points_is_species <-
scatter_plot__sepal_length_x_sepal_width__points_is_species + theme(panel.background = element_rect(fill = "white", colour = "white", size = 0.5, linetype = "solid"), panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "lightblue"), panel.grid.minor = element_line(size = 0.25, linetype = 'solid', colour = "lightblue"))
scatter_plot__sepal_length_x_sepal_width__points_is_species
scatter_plot__sepal_length_x_sepal_width__points_is_species <-
scatter_plot__sepal_length_x_sepal_width__points_is_species + geom_point(size=3)
scatter_plot__sepal_length_x_sepal_width__points_is_species
### displays scatter plot
scatter_plot__sepal_length_x_sepal_width__points_is_species
EDIT 1:
Response to comment:
What do you mean in 2.? Do you want to also add the result of the test as an annotation on the figure? Or just compare the slopes independently of the figure? Please, edit your question. I will answer once this is clear. (As a general comment, try to avoid including in the example code snippet details that are irrelevant to your question, such as change in background color and size of points.)
I'm interested in comparing the slopes independent of the figure. I want to see if there are differences between the regression lines and how to interpret those differences.
Response to answer:
Run a regression using lm.
Then use ANCOVA on those regressions to see a slope difference.
Thanks. I think I've tried to do what you said. The anova table comparing the model with v. without the interaction was significant. I think this means that there is a difference between the slopes of the regression based on the grouping variable species. Is this interpretation correct?
The code is below. was the code done properly?
Follow up question for this: How do I find the slopes of the 2 regression lines (iris$Species = setosa v. versicolor) based on the figures?
Here is the code with ANCOVA comparing the 2 regressions:
## comparing the slopes of the regression lines using ANCOVA
# ---- NOTE: DV - Sepal.Width
# ---- NOTE: IV - Sepal.Length
# ---- NOTE: grouping variable: Species
# ---- NOTE: dataset: iris__setosa_or_versicolor
# ---- NOTE: based on this site: https://stats.stackexchange.com/questions/51780/how-to-perform-an-ancova-in-r
### create interaction_regression_model
interaction_regression_model <- aov(Sepal.Width~Sepal.Length*Species,data=iris__setosa_or_versicolor)
#### gives summary of interaction_regression_model
summary(interaction_regression_model)
### create no_interaction_regression_model
no_interaction_regression_model <- aov(Sepal.Width~Sepal.Length+Species,data=iris__setosa_or_versicolor)
#### gives summary of no_interaction_regression_model
summary(no_interaction_regression_model)
### compare 2 regression models, using ancova through anova command
anova(no_interaction_regression_model,interaction_regression_model)
stat_poly_eq()
from package 'ggpmisc'. See the answers to this old and popular question. There are also other answers using other approaches, but I prefer my approach as described in my answer to this earlier question. – Pedro Aphalo