0
votes

I'm making a boxplot from the below df (I'm sorry if this is the wrong way to post a dataframe. I just copied and pasted the output from the dput function). I've used this code to make the boxplot:

IPC_15 <- tidyr::pivot_longer(Income_percap_15, -c("State", "Counties"), names_to = "Income_Per_Capita", values_to = "num") %>% 
  ggplot(aes(x="", y = Income_percap_15)) + 
  geom_boxplot() + coord_cartesian(ylim = c(0, 52))
IPC_15 + labs(x = "State",
                y = "Income per Capita",
                title = "US Income per capita per state")

However I keep getting the error "Aesthetics must be either length 1 or the same as the data (52): y".

Any ideas how to fix this?

structure(list(State = structure(1:52, .Label = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"), class = "factor"), Counties = c(67L, 29L, 15L, 75L, 58L, 64L, 8L, 3L, 1L, 67L, 159L, 5L, 44L, 102L, 92L, 99L, 105L, 120L, 64L, 16L, 24L, 14L, 83L, 87L, 82L, 115L, 56L, 93L, 17L, 10L, 21L, 33L, 62L, 100L, 53L, 88L, 77L, 36L, 67L, 78L, 5L, 46L, 66L, 95L, 254L, 29L, 14L, 133L, 39L, 55L, 72L, 23L), Income_15 = c(20780.9402985075, 30332.9655172414, 21052.5333333333, 20072.0266666667, 27902.6034482759, 27747.25, 37025.125, 28952, 47675, 23501.8507462687, 20566.0062893082, 31892.6, 21451.1136363636, 25485.7156862745, 23977.0652173913, 26555.8686868687, 24953.0476190476, 20663.6083333333, 22064.609375, 25792.3125, 33073.2083333333, 35554.4285714286, 23662.2048192771, 27610.4252873563, 18805.0487804878, 21504.7826086957, 25020.6785714286, 26336.8494623656, 26317.7058823529, 31810.4, 36084.5238095238, 21789.4545454545, 28189.7580645161, 22514.36, 31900.5094339623, 24467.7727272727, 22811.8701298701, 24311.9166666667, 25952.223880597, 9617.66666666667, 35670.6, 21411.9565217391, 25334.8939393939, 21442.4210526316, 23551.7992125984, 22552.2413793103, 28487.2142857143, 27065.3909774436, 25734.4102564103, 21710.4181818182, 26250.7222222222, 29223.652173913)), row.names = c(NA, -52L), class = "data.frame")

1
"I'm sorry if this is the wrong way to post a dataframe" - OP. Nope, that was perfect and is the preferred way to post your dataframe. It tends to look nicer when formatted as code (highlight, then CTRL+K on Windows), but it's still very functional. - chemdork123

1 Answer

1
votes

This answer has several parts, because there are a few separate issues that I hope are helpful to point out. I'll try to arrange the points accordingly:

Error Message Text and Meaning

Your error message, "Aesthetics must be either length 1 or the same as the data (52): y", indicates that one of the aes() mappings does not supply a value for every row of your data. The number in parentheses is the length it "should" be (52), which is the number of observations in your dataset. You have 52 rows in your dataframe, so one of the aesthetics is not mapped correctly. Using "" for an aesthetic is fine: it is a constant of length 1, which basically means "map the entire dataframe as one". The error is specifically with y = Income_percap_15. After your pivot_longer call there is no column with that name, so ggplot2 most likely picks up the original Income_percap_15 data frame from your workspace instead, and that is not a vector of 52 values. I think you want to use y = num there.
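A quick way to check which columns exist after the pivot, sketched on a three-row stand-in for the question's data (the demo and long names and the rounded values are my own placeholders):

```r
library(tidyr)

# Three-row stand-in for Income_percap_15 (values rounded from the dput output)
demo <- data.frame(
  State     = c("Alabama", "Alaska", "Arizona"),
  Counties  = c(67L, 29L, 15L),
  Income_15 = c(20780.94, 30332.97, 21052.53)
)

# Same pivot as in the question
long <- pivot_longer(demo, -c("State", "Counties"),
                     names_to = "Income_Per_Capita", values_to = "num")

# Columns that aes() can refer to after the pivot;
# "Income_percap_15" is not among them, but "num" is
names(long)
```

Running names() on your own pivoted data frame should show the same four columns, with the income values in num.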

Intended Aesthetics and your intended plot

Your code has aesthetics indicated for x="" and y=Income_percap_15, which would indicate you want to show one boxplot for the entire dataset. However, your labs() call indicates you wish to show a boxplot for every state. While you can show the "single boxplot" for the entire dataset (aes(x="",...)), your data cannot give you a boxplot for every state. A boxplot represents the distribution of data, so you need multiple "y" values for every "x" value. In your dataframe, you only have one "y" value (Income per capita) for each "x" (State).
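One way to confirm the "one value per state" point is to count the rows per state. This is a minimal sketch on a stand-in vector (states is my own placeholder for the State column):

```r
# Stand-in for the State column of the question's data (one row per state)
states <- c("Alabama", "Alaska", "Arizona")

# Number of observations available for each state
n_per_state <- table(states)
n_per_state

# TRUE when every state has exactly one value, in which case
# each per-state "box" collapses to a single line
all(n_per_state == 1)
```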

Kinda problematic limits

The limits you set (0 to 52) are applied to the y aesthetic. The y aesthetic appears to be intended to be mapped to Income per capita. In your dataframe after the pivot_longer call, that would be the "num" column, which has a minimum value of 9618 and a maximum of 47675, both far outside the limits you set. That means you'll see an empty plot. If you wanted this to apply to the x aesthetic (52 States), which I believe is your intention, it's not needed here: you only need to specify the correct aesthetic. Since you applied this limit to the y axis, I'm assuming you are looking to have horizontally arranged boxplots. For that, you need to "flip" the axes, which is what coord_flip() does.
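To see the mismatch between the data and the limits, here is a small sketch using a handful of values rounded from the dput output, including the smallest (Puerto Rico) and largest (District of Columbia); the num vector is only a stand-in for your full column:

```r
# A few Income_15 values rounded from the question's dput output
num <- c(20780.94, 9617.67, 47675, 30332.97)

# Smallest and largest values in the sample
range(num)

# Share of values that fall inside the ylim = c(0, 52) window
mean(num >= 0 & num <= 52)
```

If that share is zero for your real data as well, nothing can be drawn inside the window set by coord_cartesian(ylim = c(0, 52)).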

The Final plot?

Well, I wish I had better news, but as mentioned above, your intended boxplot does not appear to be possible with the data you have. Below is your code "fixed" so that it runs and draws something, even though a true per-state boxplot isn't possible. Note that the resulting "boxplot" shows lines for every state, because for every state, n=1. The "distribution" is therefore not really a distribution. Note: assume here that df is your dataframe after the pivot_longer call:

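library(ggplot2)

# df is assumed to be the dataframe after the pivot_longer call
ggplot(data=df, aes(x=State, y = num)) +
    geom_boxplot() +
    coord_flip() +   # states run down the vertical axis
    labs(y='Income per capita', title="US Income per capita per state") +
    theme(
        axis.text.y=element_text(size=7, vjust=0.3),
        plot.title=element_text(size=9)
    )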


It actually doesn't look too bad to show "lines" instead of a "box" here, but you can certainly make the same plot and use geom_point or even geom_segment to give you the "line" look, albeit cleaner. Some other notes about the plot:

  • theme() is applied to the y axis as displayed, whereas labs() is applied to what appears to be the opposite axis. With coord_flip(), labels set in labs() follow the aesthetic (y = num, now drawn horizontally), while theme() elements such as axis.text.y refer to the axis as it appears on the final plot (the vertical "State" axis).

  • The other theme elements (smaller axis text and title) are only there to make the plot look appropriate at this size.

  • When you save or view the plot, since it is "long", you should use an aspect ratio that supports this, or your y-axis values are going to appear "squished". I think the width/height aspect ratio was 1:2 here.

  • The vjust in theme() is there to adjust the vertical positioning of the labels on the "State" axis relative to the tick marks. By default, the tick marks are positioned to be vertically centered with the text, but when you have capital and lowercase letters, the vertical center is actually a bit too high based on how we would want it to look. This nudges all the labels up a bit to correct the effect of appearing un-centered even though the labels are actually vertically centered.
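As a rough sketch of the geom_point/geom_segment alternative mentioned above, and of saving at a tall aspect ratio: the three-row demo data frame, the file name, and the 4 x 8 inch size are my own placeholders, not values from the question. State is mapped to y directly here, so coord_flip() isn't needed in this variant.

```r
library(ggplot2)

# Small stand-in for the pivoted data (values rounded from the dput output)
demo <- data.frame(
  State = c("Alabama", "Alaska", "Arizona"),
  num   = c(20780.94, 30332.97, 21052.53)
)

p <- ggplot(demo, aes(x = num, y = State)) +
  # thin line from zero to each state's value
  geom_segment(aes(x = 0, xend = num, yend = State), colour = "grey60") +
  # one point per state at its value
  geom_point() +
  labs(x = "Income per capita", y = NULL,
       title = "US Income per capita per state") +
  theme(axis.text.y = element_text(size = 7),
        plot.title = element_text(size = 9))

# Save with a 1:2 width/height ratio so the state labels are not squished
out_file <- file.path(tempdir(), "income_per_capita.png")
ggsave(out_file, plot = p, width = 4, height = 8, units = "in", dpi = 150)
```

With all 52 states you may want to reorder State by num (for example with reorder()) so the points read as a ranking, but that is a design choice rather than a fix.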