
How would you handle the breaks for a dotPlot when the data contain serious outliers?

I cannot transform the data to log or anything like that.

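library(mosaic)
n = 300
r = seq(1, 15, 1)                    # bulk of the data: the integers 1..15
binwidth = 1
outliers = c(100, 400, 800, 700)
#outliers = c(15, 14, 3, 5)          # alternative run without real outliers
# sample() has no n argument, so only size and replace are passed
dat = c(sample(r, size = n, replace = TRUE), outliers)
quantile(dat)[4] + 1.5 * IQR(dat)    # upper outlier fence: Q3 + 1.5 * IQR
n = n + 4                            # total count including the 4 outliers
# attempted custom breaks (computed but not used below)
brks = c(seq(0, sd(dat) * 2, binwidth),
         tail(seq(0, sd(dat) * 2, binwidth), 1) + binwidth,
         tail(seq(0, max(dat), binwidth), 1) + binwidth)
d = data.frame(x = dat, color = c(rep("red", n/2), rep("green", n/2)))
dotPlot(d$x, breaks = seq(min(d$x) - binwidth, max(d$x) + binwidth, binwidth), cex = .5)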

If you run that code, you will see that the 4 outliers stretch the x-axis and make the plot unreadable. How would you deal with that?
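
A first step is to identify the outliers programmatically instead of by eye. The sketch below reuses the rule already computed in the question (Q3 + 1.5 * IQR) and splits the data at that fence. The helper name split_upper_outliers and its k argument are hypothetical, not part of mosaic.

```r
# Hypothetical helper: split a numeric vector at Tukey's upper fence,
# Q3 + k * IQR (k = 1.5 is the conventional choice, as used in the question).
split_upper_outliers <- function(x, k = 1.5) {
  fence <- unname(quantile(x, 0.75)) + k * IQR(x)   # same value as quantile(x)[4] + 1.5 * IQR(x)
  list(
    fence    = fence,
    bulk     = x[x <= fence],   # values plotted normally
    outliers = x[x > fence]     # values that stretch the axis
  )
}

s <- split_upper_outliers(c(1:15, 100, 400, 800, 700))
s$fence      # upper fence
s$outliers   # values above the fence
```

One option is then to draw s$bulk with dotPlot() and report the values in s$outliers in a caption or axis label; this is a judgment call, since it hides the outliers' positions rather than rescaling them.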

Right now the breaks run from the min to the max of d$x in steps of binwidth, but I think some of those empty bins should be removed. What logic would you use to remove them? For example, remove any bin that is empty and lies more than 2 standard deviations from the mean? Can you give example code?
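
The rule proposed above (drop a bin only if it is empty and more than 2 standard deviations from the mean) can be sketched as follows. The function keep_bins is hypothetical; it builds the same equal-width breaks as the question and returns only the bins that survive the rule.

```r
# Hypothetical helper (not part of mosaic): build equal-width breaks over the
# whole range, then drop a bin only if it is BOTH empty AND its midpoint lies
# more than k standard deviations from the mean.
keep_bins <- function(x, binwidth = 1, k = 2) {
  brks   <- seq(min(x) - binwidth, max(x) + binwidth, by = binwidth)
  counts <- as.vector(table(cut(x, brks)))   # cut() bins are (lower, upper]
  lower  <- head(brks, -1)
  upper  <- brks[-1]
  mids   <- lower + binwidth / 2
  far    <- abs(mids - mean(x)) > k * sd(x)
  keep   <- counts > 0 | !far                # non-empty bins are always kept
  data.frame(lower = lower[keep], upper = upper[keep], count = counts[keep])
}

kept <- keep_bins(c(1, 1, 2, 3, 50), binwidth = 1, k = 1)
nrow(kept)
```

Two caveats: sd() is itself inflated by the outliers, so with the question's data the 2-SD window still covers many empty bins, and the Q3 + 1.5 * IQR fence is a more robust cutoff. Also, the kept bins are no longer contiguous, so they cannot be passed directly as breaks to dotPlot(); they suit a hand-drawn dot plot instead.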

Any idea how to create my own dot plot without using dotPlot() or dotplot()?

I have the binned data in the dat data frame built below:

##### HERE CAN I CREATE MY OWN DOT PLOT?
library(qdapRegex)
binwidth = 1
# count observations per bin; cut() labels each bin as "(lower,upper]"
t = table(cut(dat, seq(0, max(dat) + 1, binwidth)))
# keep only the non-empty bins and extract their upper edges from the labels
r_names = rownames(t)[t > 0]
r_names = as.numeric(unlist(rm_between(r_names, ",", "]", extract = TRUE)))
dat = data.frame(bin = r_names, count = as.vector(t[t > 0]))
dat  # can you turn this into a dot plot where the x-axis ONLY consists of the bin column, i.e. no space between 15 and 100?
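
A dot plot without dotPlot() can be sketched in base R graphics by giving each occupied bin its own slot on the x-axis and labelling the slots with the bin values, so empty ranges such as 15 to 100 take up no space. The helpers dot_positions and plot_bin_dots are hypothetical; they assume a data frame with columns bin and count (adjust the column name if the counts are stored under a different name).

```r
# Hypothetical helpers (base R graphics only): one x-axis slot per occupied bin.
dot_positions <- function(bins) {
  data.frame(
    pos = rep(seq_along(bins$bin), times = bins$count),  # x slot of each dot
    y   = sequence(bins$count)                            # stack dots 1..count
  )
}

plot_bin_dots <- function(bins, ...) {
  pts <- dot_positions(bins)
  plot(pts$pos, pts$y, pch = 19, xaxt = "n",             # suppress default x-axis
       xlim = c(0.5, nrow(bins) + 0.5), ylim = c(0.5, max(bins$count) + 0.5),
       xlab = "bin", ylab = "count", ...)
  axis(1, at = seq_along(bins$bin), labels = bins$bin)   # label slots with bin values
}

# Example with made-up counts: the axis reads 1, 2, 3, 15, 100, 400, 700, 800
example_bins <- data.frame(bin = c(1, 2, 3, 15, 100, 400, 700, 800),
                           count = c(4, 2, 3, 1, 1, 1, 1, 1))
plot_bin_dots(example_bins)
```

Because the x positions are categorical, the distance between neighbouring slots no longer reflects the numeric gap between bins; that is exactly what removes the empty space, but it should be made clear to readers of the plot.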

Thank you.

1 Answer


This is a more complete answer than my comment:

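png()  # write the plot to a PNG file in the working directory
# plot log10(x), but place ticks at whole powers of ten and label them in the original units
dotPlot(log10(d$x), xlab = expression(Log[10](X)),
        scales = list(x = list(at = 0:3, labels = 10^(0:3))))
dev.off()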
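
Hard-coding the tick positions only works for one data range. A small sketch that derives them from the data is below; log10_axis is a hypothetical helper whose return value is meant for lattice's scales = list(x = ...) argument, and it assumes all values are positive.

```r
# Hypothetical helper: ticks at whole powers of ten spanning the data, labelled
# in the original units. Requires x > 0, since log10 is undefined otherwise.
log10_axis <- function(x) {
  at <- seq(floor(log10(min(x))), ceiling(log10(max(x))))
  list(at = at, labels = 10^at)
}

ax <- log10_axis(c(1, 15, 800))
ax$at       # tick positions on the log10 scale
ax$labels   # labels shown in original units
# Possible usage with the answer's call:
# dotPlot(log10(d$x), xlab = expression(Log[10](X)), scales = list(x = log10_axis(d$x)))
```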
