
I am attempting to produce a scatter plot using the ggplot2 library. My data frame (called scatterPlotData) is in this form:

115 2.3
120 1.6
.
.
.
132 4.3

(The ... stands for many more rows of similar values.) Essentially, it is a two-column data frame. I also have a label to go with each of those points. First, I'm having trouble with the scatter plot itself. I'm using the following code:

p <- ggplot(scatterPlotData, aes("Distance (bp)", "Intensity"))
p + geom_point()

However, with the above code I get a plot that is obviously not a scatter plot, so I'd be very grateful if someone could point out what I'm doing wrong.

Secondly, the labels. I will have many data points, so there is a risk of the labels overlapping. How should I go about attaching a label to each point using ggplot? I have also read that the directlabels package can produce a labelled scatter plot without overlaps by using different colours, but I'm not sure how to use it with ggplot, as I haven't found any documentation on combining directlabels with ggplot.

Any help with either (or both) of these questions is greatly appreciated. Thanks.


2 Answers

3
votes

Lose the inverted commas: at the moment you're plotting the literal text values rather than your columns. Having looked again, you will also have problems with the brackets in your variable name (Distance (bp)). Rename that column to something without brackets, then make the ggplot call without the inverted commas:

library(ggplot2)
# Assuming "Distance (bp)" is the first column
names(scatterPlotData)[1] <- "Distance"
p <- ggplot(scatterPlotData, aes(Distance, Intensity)) + geom_point()
p  # print the plot

As for non-overlapping labels, this is a vexed issue with lots of existing discussion on SO; I don't think you'll get great responses to such a broad question here.

3
votes

First, it would be much more helpful if you provided a reproducible example that precisely describes your data.
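For example, a self-contained version built only from the three rows visible in the question might look like this. The column names Distance (bp) and Intensity are assumed from the question's aes() call:

```r
# Rebuild the question's data from the rows it shows; check.names = FALSE
# keeps the non-syntactic name "Distance (bp)" exactly as written.
scatterPlotData <- data.frame(
  `Distance (bp)` = c(115, 120, 132),
  Intensity = c(2.3, 1.6, 4.3),
  check.names = FALSE
)

# dput() prints R code that recreates the object, ready to paste into a question
dput(scatterPlotData)
```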

You should not be passing variable names to aes in quotes. Quoted strings are treated as constant values rather than column names. I'm not sure where you got that from; I can't think of a single example of anyone doing that (unless they were using aes_string, which exists specifically for that case).
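Note that aes_string() has since been deprecated in ggplot2. If your column names really are held in strings, the .data pronoun is the current way to use them inside aes(). Here is a minimal sketch, with the data rebuilt from the rows shown in the question; x_col and y_col are illustrative variable names:

```r
library(ggplot2)

scatterPlotData <- data.frame(
  `Distance (bp)` = c(115, 120, 132),
  Intensity = c(2.3, 1.6, 4.3),
  check.names = FALSE
)

# Column names stored as strings, as in the question's original call
x_col <- "Distance (bp)"
y_col <- "Intensity"

# .data[[...]] looks a column up by name, so even non-syntactic names work
p <- ggplot(scatterPlotData, aes(x = .data[[x_col]], y = .data[[y_col]])) +
  geom_point() +
  labs(x = x_col, y = y_col)  # explicit axis titles
print(p)
```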

However, it appears that you have an awkward variable name: Distance (bp). This is not a syntactically valid R name and is not recommended, because names should not contain spaces or brackets. The best thing to do would be to rename that column to something sensible and then do something like:

p <- ggplot(scatterPlotData, aes(x = Distance_bp, y = Intensity))
p + geom_point()
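Here is one way to do the rename by matching the old name rather than relying on column position, while keeping the unit in the axis title. This is a sketch that again rebuilds the data from the rows shown in the question:

```r
library(ggplot2)

scatterPlotData <- data.frame(
  `Distance (bp)` = c(115, 120, 132),
  Intensity = c(2.3, 1.6, 4.3),
  check.names = FALSE
)

# Rename by matching the old name, so column order doesn't matter
names(scatterPlotData)[names(scatterPlotData) == "Distance (bp)"] <- "Distance_bp"

# labs() puts the human-readable title (with units) back on the x axis
p <- ggplot(scatterPlotData, aes(x = Distance_bp, y = Intensity)) +
  geom_point() +
  labs(x = "Distance (bp)")
print(p)
```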

If you do not rename the column, quoting the name with backticks should work:

p <- ggplot(scatterPlotData, aes(x = `Distance (bp)`, y = Intensity))
p + geom_point()

Note that those are backticks, not single quotes.

As for the overlapping data, I would recommend reading here and here.
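As a starting point for the labels (not necessarily what the linked posts recommend), ggplot2's own geom_text() accepts check_overlap = TRUE, which skips any label that would overlap one already drawn. The ggrepel package's geom_text_repel() instead nudges overlapping labels apart. The Label column below is hypothetical and stands in for the per-point labels mentioned in the question:

```r
library(ggplot2)

# Hypothetical data: the question's visible rows plus a made-up Label column
scatterPlotData <- data.frame(
  Distance_bp = c(115, 120, 132),
  Intensity = c(2.3, 1.6, 4.3),
  Label = c("a", "b", "c")
)

p <- ggplot(scatterPlotData, aes(x = Distance_bp, y = Intensity, label = Label)) +
  geom_point() +
  # Place text just above each point; drop labels that would overlap earlier ones
  geom_text(vjust = -0.8, check_overlap = TRUE) +
  labs(x = "Distance (bp)")
print(p)

# With ggrepel installed, swapping geom_text() for ggrepel::geom_text_repel()
# pushes overlapping labels apart instead of dropping them.
```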