I am using the following dataset (downloadable here) and the code below to try to plot several lines in a single ggplot. I know there are plenty of explanations out there, but I still cannot get it to work because I am confused about where to put the commands so that ggplot understands what I want.

In addition, I know that raw data can come in two layouts: wide format or long format. When I keep the data in wide format, I have to write a lot of code to get the plot (see the code and graph below), but when I convert it to long format, ggplot throws an error (see the code and error message below).

This is my minimal code example:

library(ggplot2) # for professional graphs
library(reshape2) # to convert data to long format

WDI_GDP_annual <- WDI[which(WDI$Series.Name=='GDP growth (annual %)'),] # extract data I need from dataset
WDI_GDP_annual_short <- WDI_GDP_annual[c(-1,-2,-4)] # wide format
test_data_long <- melt(WDI_GDP_annual_short, id = "Time") # long format

# the only plot that works so far (wide-format data)
ggplot(WDI_GDP_annual_short, aes(x = Time)) +
 geom_line(aes(y = Brazil..BRA., colour = "Brazil..BRA.", group=1)) +
 geom_line(aes(y = China..CHN., colour = "China..CHN.", group=1)) +
 theme(legend.title = element_blank())

# several attempts to plot the long-format data with less code (all of them fail)
ggplot(data=test_data_long, aes(x = time, y = value, colour = variable)) +
 geom_line() +
 theme(legend.title = element_blank())

ggplot(data=test_data_long, aes(x = time, y = value, color = factor(variable))) +
 geom_line() +
 theme(legend.title = element_blank())

ggplot(test_data_long, aes(x = time, y = value, colour = variable, group = variable)) +       
 geom_line()

This is the only successful plot I have produced so far, but I do not want to write this much code (since I want to add 6 more lines to this ggplot):

[Image: the resulting line plot for Brazil and China]

I know that using the long format would be a more elegant way to draw this plot, but whichever command I use (see above), I always get the following error:

Error: Aesthetics must either be length one, or the same length as the dataProblems:time

Does somebody know the answer to my question?

1 Answer

To start with: your data contain the string ".." in columns that are supposed to be numeric, which causes those entire columns to be read as class character (or factor, depending on your stringsAsFactors setting).

If you want to treat ".." as NA, add na.strings = ".." to your read.xxx call. This ensures that the columns are read as numeric. Make a habit of calling str() right after reading any data set.
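The effect of na.strings can be checked without downloading the file. The following is only a minimal sketch: the inline CSV text, its column names, and its values are made up for illustration and are not taken from the WDI data.

```r
# Toy CSV text standing in for the real file; names and values are invented.
csv_text <- "Time,Brazil,China
2000,4.1,8.4
2001,..,8.3
2002,3.1,9.1"

# Default read: the single ".." entry turns the whole Brazil column into
# character (or factor in R versions before 4.0.0).
raw <- read.csv(text = csv_text)
str(raw)

# With na.strings = "..", that entry becomes NA and the column stays numeric.
clean <- read.csv(text = csv_text, na.strings = "..")
str(clean)
```

Comparing the two str() outputs shows which columns were affected by the placeholder string.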

library(reshape2)
library(ggplot2)

df <- read.csv(file = "https://dl.dropboxusercontent.com/u/109495328/WDI_Data.csv",
               na.strings = "..")
str(df)

# compare with
# df <- read.csv(file = "https://dl.dropboxusercontent.com/u/109495328/WDI_Data.csv")
# str(df)


# melt relevant part of the data
df2 <- melt(subset(df,
                   subset = Series.Name == "GDP growth (annual %)",
                   select = -c(Time.Code, Series.Code)),
        id.vars = c("Series.Name", "Time"))

ggplot(df2, aes(x = Time, y = value, colour = variable, group = variable)) +       
  geom_line()

[Image: the resulting plot, with one line per country]
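One further detail may be worth checking: the long-format attempts in the question map x to `time`, while the column is named `Time`. R is case-sensitive, so this mismatch is a likely source of the "Problems:time" part of the error message. The sketch below reproduces the wide-to-long workflow on a toy data frame; the country columns and values are invented for illustration.

```r
library(reshape2)
library(ggplot2)

# Toy wide-format data; column names and values are invented.
wide <- data.frame(Time   = 2000:2003,
                   Brazil = c(4.1, 1.4, 3.1, 1.1),
                   China  = c(8.5, 8.3, 9.1, 10.0))

# Long format: one row per (Time, country) pair,
# with columns Time, variable and value.
long <- melt(wide, id.vars = "Time")

# Use the exact column name `Time`. A lower-case `time` is not a column of
# `long`, so it would be looked up outside the data, where it matches the
# function stats::time rather than a vector of values.
p <- ggplot(long, aes(x = Time, y = value,
                      colour = variable, group = variable)) +
  geom_line()
print(p)
```

With the data in this shape, adding more countries only requires more columns in the wide data frame; the plotting call stays the same.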