
I have a dataset containing the population of every country in the world, grouped into seven regions.

China and India are outliers in the dataset, since both have populations above 1 billion.

I have tried plotting population by region with a log scale on the population axis, but when I do so, China is not shown as an outlier in the ggplot output. Here is the code I am using:

library(ggplot2)

ggplot(nationsCombImputed, aes(y = population, x = region)) +
  geom_boxplot() +
  scale_y_continuous(trans = "log10")

This produces the boxplot below. I want to keep the log scale for population.

[Image: boxplot of population by region, log10 y-axis]

The East Asia and Pacific box shows no outliers. An outlier is a value lying more than 1.5 * interquartile range (IQR) beyond the quartiles, and the IQR for this region is 29338577.25, which means the following countries should be outliers, yet none of them is drawn as one:

China; Indonesia; Japan; Korea, Rep.; Myanmar; Malaysia; Philippines; Thailand; Vietnam
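To illustrate why a log scale can hide an outlier, here is a minimal base R sketch using made-up populations for a single region (the vector `pop` and the helper `tukey_outliers()` are hypothetical, not from the dataset). It applies the 1.5 * IQR rule with R's default `quantile()` once to the raw values and once to their log10 values.

```r
# Hypothetical, made-up populations for one region (not the real data)
pop <- c(1e6, 2e6, 5e6, 1e7, 2e7, 3e7, 5e7, 1e8, 1.4e9)

# Tukey's rule: flag values more than coef * IQR below Q1 or above Q3.
# Quartiles use quantile()'s default (type 7).
tukey_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - coef * iqr | x > q[2] + coef * iqr]
}

# Fences computed on the raw populations: the 1.4e9 value is flagged
raw_out <- tukey_outliers(pop)

# Fences computed on log10(population), then mapped back: nothing is flagged,
# because on the log scale 1.4e9 is only about 1.4 IQRs above Q3
log_out <- 10^tukey_outliers(log10(pop))

print(raw_out)
print(log_out)
```

Taking logs compresses the upper tail, so a value that is far outside the raw-scale fences can fall inside the log-scale ones.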

The data I am using can be found here in CSV format. If anyone can explain how to get these outliers to show with a log scale in ggplot2, I would be very grateful.

Maybe, instead of transforming the scale, you want to use coord_trans(y = "log10")? - Robin Gertenbach

1 Answer


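As Robin mentioned, coord_trans(y = "log10") should do what you want. With scale_y_continuous(trans = "log10"), the data are transformed before stat_boxplot() computes the box statistics, so the quartiles, whiskers and 1.5 * IQR outlier limits are all calculated on log10(population). On that scale China is no longer far enough from the box to be flagged. coord_trans() instead computes the statistics on the untransformed populations and applies the log transformation only when drawing, so outliers are determined from the raw values, as in your manual calculation. More information on the differences between coord_trans() and scale transformations can be found at the link below.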

what is the difference between scale transformation and coordinate system transformation
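A sketch comparing the two approaches, assuming ggplot2 is installed and using a small made-up data frame `toy` in place of `nationsCombImputed` (the data and object names are hypothetical). `layer_data()` returns the computed boxplot statistics, so counting the entries in its `outliers` column shows which version flags what.

```r
library(ggplot2)

# Hypothetical, made-up data standing in for nationsCombImputed
toy <- data.frame(
  region = rep(c("A", "B"), each = 9),
  population = c(1e6, 2e6, 5e6, 1e7, 2e7, 3e7, 5e7, 1e8, 1.4e9,
                 2e6, 3e6, 4e6, 6e6, 8e6, 1e7, 1.2e7, 1.5e7, 2e7)
)

# Scale transformation (scale_y_log10() is equivalent to
# scale_y_continuous(trans = "log10")): statistics use log10(population)
p_scale <- ggplot(toy, aes(x = region, y = population)) +
  geom_boxplot() +
  scale_y_log10()

# Coordinate transformation: statistics use the raw population,
# only the drawn y-axis is log10
p_coord <- ggplot(toy, aes(x = region, y = population)) +
  geom_boxplot() +
  coord_trans(y = "log10")

# Number of outliers in each box (region A first, then region B)
n_out_scale <- lengths(layer_data(p_scale)$outliers)
n_out_coord <- lengths(layer_data(p_coord)$outliers)

print(n_out_scale)
print(n_out_coord)
```

With the scale transformation, region A's largest value sits inside the log-scale fences and is not flagged; with coord_trans() it is flagged, while the y-axis is still drawn on a log10 scale.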