4
votes

I am drawing a histogram in R, and the first two frequencies are being combined into one bin which I do not want. There are seven possible values in the data and I would like 7 bins, not 6.

The info returned by hist for this plot is:

$breaks
[1]  9 10 11 12 13 14 15

$counts
[1] 27  6  5  4  1 11

$density
[1] 0.50000000 0.11111111 0.09259259 0.07407407 0.01851852 0.20370370

$mids
[1]  9.5 10.5 11.5 12.5 13.5 14.5

$xname
[1] "data$hour"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

The issue is in $counts. The counts per value should be:

9: 20
10: 7
11: 6
12: 5
13: 4
14: 1
15: 11

The data and command for histogram and info generation is:

temp <- c(9, 9, 9, 11, 12, 14, 15, 9, 9, 9, 10, 9, 13, 13, 15, 15, 9, 
9, 9, 11, 12, 13, 15, 15, 15, 9, 9, 10, 11, 12, 9, 10, 10, 12, 
15, 9, 9, 9, 9, 10, 11, 15, 9, 10, 10, 11, 11, 12, 13, 15, 15, 
9, 9, 15)

hist(temp)
histinfo = hist(temp)
histinfo

How can this be corrected? My thought is to count the occurrences and draw a barplot, but that seems like overkill since hist is already built in. I have tried changing breaks to no avail.

Thank you.

6 Answers

3
votes

When using hist, the breaks must bracket both ends of the data if you want each value in its own bin. The following will work:

hist(temp, breaks = 8:15)

If you don't like the 8 on the x-axis, suppress the default axis and draw your own, with each label centred under its bar:

hist(temp, breaks = 8:15, xaxt = 'n')
axis(1, 8:14+0.5, 9:15)

The built-in hist function is probably best used for exploration rather than publication. If you know the bins are right-closed (right = TRUE, the default), the first plot is easy to interpret: the bar between 8 and 9 counts the 9s, since no value in the data is below 9.
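
Another option, offered as a sketch rather than the only approach, is to put the breaks at half-integers so that each integer value sits in the middle of its own bin and the default axis labels already line up. `temp` is the data from the question; `half_breaks` and `h` are names introduced here for illustration.

```r
# Data from the question
temp <- c(9, 9, 9, 11, 12, 14, 15, 9, 9, 9, 10, 9, 13, 13, 15, 15, 9,
          9, 9, 11, 12, 13, 15, 15, 15, 9, 9, 10, 11, 12, 9, 10, 10, 12,
          15, 9, 9, 9, 9, 10, 11, 15, 9, 10, 10, 11, 11, 12, 13, 15, 15,
          9, 9, 15)

# Breaks at 8.5, 9.5, ..., 15.5: seven bins, one per integer value
half_breaks <- seq(min(temp) - 0.5, max(temp) + 0.5, by = 1)

# plot = FALSE returns the histogram object without drawing it
h <- hist(temp, breaks = half_breaks, plot = FALSE)
h$counts  # 20 7 6 5 4 1 11
h$mids    # 9 10 11 12 13 14 15

# Draw the histogram with the same breaks
hist(temp, breaks = half_breaks)
```

Because no data value can fall exactly on a half-integer break, the right/left-closed convention no longer matters for this data.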

3
votes

Use the table function with barplot

barplot(table(temp))
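
To confirm that this produces the seven counts the question expects, the table can be inspected before plotting. The sketch below also converts the data to a factor with explicit levels, so that a value missing from the data would still get an empty bar. The `levels = 9:15` range and the axis labels are assumptions based on the question's data, and `counts` is a name introduced here.

```r
# Data from the question
temp <- c(9, 9, 9, 11, 12, 14, 15, 9, 9, 9, 10, 9, 13, 13, 15, 15, 9,
          9, 9, 11, 12, 13, 15, 15, 15, 9, 9, 10, 11, 12, 9, 10, 10, 12,
          15, 9, 9, 9, 9, 10, 11, 15, 9, 10, 10, 11, 11, 12, 13, 15, 15,
          9, 9, 15)

# Tabulate with explicit levels so every value from 9 to 15 gets a slot,
# even if it never occurs in the data
counts <- table(factor(temp, levels = 9:15))
counts

# One bar per value; labels are illustrative
barplot(counts, xlab = "hour", ylab = "Frequency")
```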

2
votes

You have to set breaks when drawing a histogram.

Personally, I would tabulate the data myself and draw a barplot, which may or may not be what you're really after.

library(reshape)
temp.melt <- melt(table(temp))

library(ggplot2)
ggplot(temp.melt, aes(x = temp, y = value)) +
  theme_bw() +
  geom_bar(stat = "identity")

1
votes

Though this has been answered, I find this to be the simplest while also producing the best looking default chart:

library(ggplot2)
qplot(factor(temp))

1
votes

I contend this is a bug. Under the default arguments, the intervals are supposed to be right-closed and left-open. Based on the documentation, for breaks = c(9, 10, 11, 12, 13, 14, 15) the intervals should be (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], which would mean that the 9s are not plotted at all. It seems that hist applies include.lowest = TRUE (even though the documentation says that argument is ignored unless breaks is supplied as a vector), so the first interval is actually [9, 10].
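
The interval behaviour described here can be inspected with `cut`, which uses the same right-closed convention but defaults to `include.lowest = FALSE`. This is only a sketch for comparison, not a statement about how `hist` is implemented internally; `strict`, `closed` and `h` are names introduced here.

```r
# Data from the question
temp <- c(9, 9, 9, 11, 12, 14, 15, 9, 9, 9, 10, 9, 13, 13, 15, 15, 9,
          9, 9, 11, 12, 13, 15, 15, 15, 9, 9, 10, 11, 12, 9, 10, 10, 12,
          15, 9, 9, 9, 9, 10, 11, 15, 9, 10, 10, 11, 11, 12, 13, 15, 15,
          9, 9, 15)

# Strictly right-closed, left-open intervals: the 9s fall outside (9, 10]
# and are reported as NA
strict <- table(cut(temp, breaks = 9:15), useNA = "ifany")
strict

# include.lowest = TRUE closes the first interval to [9, 10],
# so the 9s and 10s are counted together
closed <- table(cut(temp, breaks = 9:15, include.lowest = TRUE))
closed

# hist with the same breaks and its default arguments, for comparison
h <- hist(temp, breaks = 9:15, plot = FALSE)
h$counts  # 27 6 5 4 1 11
```

The `closed` counts match `h$counts`, which is consistent with `include.lowest = TRUE` being the default that `hist` documents.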

-1
votes

I had the same issue recently and had no option other than hist. My data started at zero, but hist kept combining the first two values as described above. After trying for quite a while to set the breaks manually with no effect, I finally got it to work by starting the breaks at a negative value: breaks = c(-1:9) ended up working for me. I hope that helps with your issue too.

Gary