I'm three weeks into my R class (please be patient with me even if it seems obvious where I went wrong!), and I am struggling with a homework problem that uses the ggplot2 library. Using the built-in diamonds data frame, the task is to make a scatter plot with regression lines for log(carat) against log(price), but plotting only the Fair and Ideal cut diamonds.
This is what the plot is supposed to look like
Some quick background: the three variables in question are carat (num), cut (Fair, Good, Very Good, Premium, Ideal), and price (int).
I start with the following code:
library(ggplot2)
set.seed(123)
d <- ggplot(diamonds[sample(nrow(diamonds), 5000), ]) # this was provided to us in the homework
d + geom_point(aes(x = log(carat), y = log(price), colour = cut)) +
  labs(title = 'Regression line for Fair and Ideal Cut Diamonds') +
  stat_smooth(aes(x = log(carat), y = log(price), colour = cut), method = "gam")
Now, I know this is incorrect, because "colour = cut" shows ALL the cuts, but I only want "Fair" and "Ideal". The professor hinted that we should try diamonds$cut %in% c(...), so I tried it in many different ways. One of my latest (wrong) attempts is:
d + geom_point(aes(x = log(carat), y = log(price), colour = diamonds[diamonds$cut%in%c("Fair","Ideal")]), alpha = 0.5) +
labs(title = 'Regression line for Fair and Ideal Cut Diamonds') +
stat_smooth(aes(x = log(carat), y = log(price), colour = diamonds[diamonds$cut%in%c("Fair","Ideal")]), method = "gam")
I continue to get error messages regardless of where I try to subset diamonds$cut (e.g., "Length of logical index vector for '[' must equal number of columns" and "Aesthetics must be either length 1 or the same as the data (5000): colour").
How do I extract just the Fair and Ideal cut to make this graph?
Any help is appreciated!
filter your data first. See this tutorial: suzan.rbind.io/2018/02/dplyr-tutorial-3 – Tung
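To make the filter-first suggestion concrete, here is a minimal sketch rather than a definitive answer. It assumes the same 5000-row sample as the homework code, stored under the hypothetical names samp and fair_ideal, and it keeps method = "gam" from the question. The subset is done on the rows of the data (note the comma after the condition) before plotting, instead of inside aes(). Writing diamonds[condition] without the comma asks for columns, which appears to be what triggers the "must equal number of columns" error.

```r
library(ggplot2)

set.seed(123)
# Same 5000-row sample as in the homework (samp is a hypothetical name)
samp <- diamonds[sample(nrow(diamonds), 5000), ]

# Keep only the rows whose cut is Fair or Ideal.
# The comma after the condition makes this a row subset, not a column subset.
fair_ideal <- samp[samp$cut %in% c("Fair", "Ideal"), ]
# dplyr equivalent, if dplyr is installed:
# fair_ideal <- dplyr::filter(samp, cut %in% c("Fair", "Ideal"))

# Put the shared aesthetics in ggplot() so both layers inherit them
p <- ggplot(fair_ideal, aes(x = log(carat), y = log(price), colour = cut)) +
  geom_point(alpha = 0.5) +
  stat_smooth(method = "gam") +
  labs(title = "Regression line for Fair and Ideal Cut Diamonds")

print(p)
```

Because the data passed to ggplot() now contains only the two cuts, colour = cut can stay as a plain column mapping, and the legend should list only Fair and Ideal. If the assignment expects straight regression lines, method = "lm" may be closer to the target plot than "gam".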