First, please download my data set from http://alexandervanloon.nl/survey_oss.csv and then run the following script to produce a few scatter plots:
# read data and attach it
survey <- read.table("survey_oss.csv", header=TRUE)
attach(survey)
# plot for inhabitants
png("scatterINHABT.png")
plot(INHABT, OSSADP, xlab="Inhabitants", ylab="Adoption of OSS", las=1)
abline(lm(OSSADP~INHABT)) # regression line (y~x)
dev.off()
# plot for inhabitants divided by 1000
png("scatterINHABT_divided.png")
plot(INHABT/1000, OSSADP, xlab="Inhabitants", ylab="Adoption of OSS", las=1)
abline(lm(OSSADP~INHABT)) # regression line (y~x)
dev.off()
# plot for inhabitants in logarithmic scale
png("scatterINHABT_log.png")
plot(INHABT, OSSADP, xlab="Inhabitants", ylab="Adoption of OSS", las=1, log="x")
abline(lm(OSSADP~INHABT)) # regression line (y~x)
dev.off()
# plot for inhabitants in logarithmic scale and divided by 1000
png("scatterINHABT_log_divided.png")
plot(INHABT/1000, OSSADP, xlab="Inhabitants", ylab="Adoption of OSS", las=1, log="x")
abline(lm(OSSADP~INHABT)) # regression line (y~x)
dev.off()
As you can see, in the first scatterplot R decides to use scientific notation on the x-axis, and the data look odd because of outliers. That's why I'd like to show the inhabitants on the x-axis in thousands and also have the x-axis use a logarithmic scale.
The problem is twofold. First, I can get rid of the scientific notation by simply dividing the inhabitants by 1000, but this produces a flat, horizontal regression line, unlike in the first plot. I know there are other ways to fix this, such as the one in the question "Do not want scientific notation on plot axis", but I couldn't adapt the code there to my situation.
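For reference, here is a minimal sketch of the kind of axis relabelling I understand that question to describe. It uses made-up numbers rather than my survey data (x and y are hypothetical stand-ins for INHABT and OSSADP). As far as I can tell it leaves the data in their original units and only changes how the tick labels are printed, so the regression line should not need to change:

```r
# Hypothetical stand-in data, NOT the survey file
set.seed(42)
x <- round(runif(40, 1000, 800000))          # pretend "inhabitants"
y <- 1 + x / 400000 + rnorm(40, sd = 0.3)    # pretend "adoption"

# Draw the plot without the default x-axis ...
plot(x, y, xaxt = "n", xlab = "Inhabitants", ylab = "Adoption of OSS", las = 1)

# ... then add an x-axis whose labels are formatted without scientific notation
ticks <- axTicks(1)                          # tick positions R chose for the x-axis
tick_labels <- format(ticks, scientific = FALSE, big.mark = ",", trim = TRUE)
axis(1, at = ticks, labels = tick_labels)

# The model is fitted on the same x that is plotted
abline(lm(y ~ x))
```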
Second, switching the x-axis to a logarithmic scale also makes the regression line flat. Google points to https://stat.ethz.ch/pipermail/r-help/2006-January/086500.html as the first result for a possible solution, and I tried using abline(lm(OSSADP~log10(INHABT))), which is suggested there, but that produces a vertical regression line. And if I both divide by 1000 and use a logarithmic scale, the line is also horizontal.
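To make the symptom reproducible without my CSV, here is a small sketch with made-up data (x and y are again hypothetical stand-ins for INHABT and OSSADP). It rests on my reading of the abline() documentation, which says that on a log-transformed axis the line is drawn in the transformed coordinate system unless untf=TRUE. If that reading is right, the model would have to be fitted on exactly the values that are plotted, which my script above does not do:

```r
# Hypothetical stand-in data, NOT the survey file
set.seed(1)
x <- round(10^runif(50, 3, 6))                    # 1,000 to 1,000,000 "inhabitants"
y <- 2 + 0.5 * log10(x) + rnorm(50, sd = 0.3)

# Case 1: x-axis in thousands. Fit on the rescaled predictor that is plotted.
x_k <- x / 1000
fit_raw <- lm(y ~ x)      # slope per inhabitant
fit_k   <- lm(y ~ x_k)    # slope per 1000 inhabitants
plot(x_k, y, xlab = "Inhabitants (thousands)", ylab = "Adoption of OSS", las = 1)
abline(fit_k)

# Case 2: logarithmic x-axis in thousands. Fit on log10 of the plotted values,
# assuming abline() interprets the coefficients in log10 units of the x-axis.
fit_log <- lm(y ~ log10(x_k))
plot(x_k, y, log = "x", xlab = "Inhabitants (thousands)", ylab = "Adoption of OSS", las = 1)
abline(fit_log)
```

In this sketch the slope of fit_k is 1000 times the slope of fit_raw, which would explain why a line fitted on the undivided values looks flat on the divided axis.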
I'm a social scientist without any background in mathematics or statistics, so I fear I might have missed something obvious; if so, my apologies. Thank you all very much for any help.