I'm working with the mpg data set. I am trying to make a bar graph with cylinders (cyl) on the x-axis and highway miles per gallon (hwy) on the y-axis, using the code below.

ggplot(data= mpg) +
geom_bar(mapping = aes(x =cyl, y= hwy), stat = "identity")

The hwy values in the data set are roughly between 20 and 30 mpg, but on my graph the y-axis ranges from 0 to 2000.

Why are the y-values in the graph different from those in the data?

1 Answer

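The large values come from accumulation: with stat = "identity", geom_bar() stacks the hwy value of every row that shares the same cyl, so each bar's height is the sum of hwy over all cars in that group rather than a single value. If you add another variable and dodge the bars, like this, the y-axis falls in the range you expect: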

library(tidyverse)
#Code
ggplot(data= mpg,aes(x =factor(cyl), y= hwy,fill=manufacturer)) +
  geom_bar(stat = "identity",position = position_dodge(0.9))

Output:

[Figure: dodged bar chart of hwy by factor(cyl), filled by manufacturer]

The y-axis now stays within the actual range of hwy, because dodged bars overlap instead of stacking. You can check that range with summary():

#Code
summary(mpg$hwy)

Output:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.00   18.00   24.00   23.44   27.00   44.00 
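
To see where the 0-2000 scale in the question comes from, a quick check is to sum hwy within each cyl. This is only a sketch, assuming the mpg data set shipped with ggplot2; the name hwy_totals is made up for illustration. The totals should correspond to the heights of the stacked bars in the original plot.

```r
library(ggplot2)  # provides the mpg data set

# Sum of hwy for each cylinder count: this is the height that
# geom_bar(stat = "identity") reaches when it stacks every row
hwy_totals <- aggregate(hwy ~ cyl, data = mpg, FUN = sum)
print(hwy_totals)

# Number of cars per cylinder count, to show how many values are stacked
print(table(mpg$cyl))
```

Each total is far larger than any single hwy value because it adds up one value per car.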

Another option, which keeps only the two variables and shows their relationship, is geom_point():

#Code 2
ggplot(data= mpg,aes(x =cyl, y= hwy)) +
  geom_point()

Output:

[Figure: scatter plot of hwy against cyl]
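
If a bar chart with one bar per cylinder count is still what you want, one possible approach (not part of the code above, and only a sketch) is to summarise hwy before plotting, for example with its mean, and draw the result with geom_col(). The name hwy_means and the choice of the mean as the summary are assumptions made for illustration.

```r
library(ggplot2)  # provides the mpg data set

# Average hwy for each cylinder count (one row per cyl)
hwy_means <- aggregate(hwy ~ cyl, data = mpg, FUN = mean)

# geom_col() draws each bar at the y value it is given,
# so the bar heights here are the per-cyl means, not sums
p <- ggplot(data = hwy_means, aes(x = factor(cyl), y = hwy)) +
  geom_col() +
  labs(x = "cyl", y = "mean hwy (mpg)")
print(p)
```

Because there is only one row per cyl after summarising, nothing is stacked and the y-axis stays within the range of hwy.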