
I posted the following question on https://stats.stackexchange.com/questions/117578/density-distribution-of-outcomes-of-2-dice-rolled but did not get any response.

This question is related to: A histogram with a bar for each frequency value

Two dice are rolled and the sum is plotted. The histogram is as expected, but the density graph shows different densities for 2 and 12, and the plot is asymmetric. Why is this so? Among the outcomes of rolling two dice, the chance of a 2 equals the chance of a 12. Why does the density graph show unequal values?

library(ggplot2)  # needed for the plots below

num.dices <- 2L
num.rolls <- 100000L
outcomes <- matrix(sample(1:6, num.dices * num.rolls, replace = TRUE),
                   nrow = num.rolls, ncol = num.dices)
sums <- rowSums(outcomes)
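
As a sanity check on the symmetry claim, the exact distribution of the sum can be tabulated directly. This is only a minimal sketch in base R; the names `exact` and `probs` are introduced here for illustration and are not part of the code above.

```r
# Exact distribution of the sum of two fair dice (illustrative names).
exact <- outer(1:6, 1:6, "+")          # all 36 equally likely sums
probs <- table(exact) / length(exact)  # probability of each sum, 2..12

# The distribution is symmetric: P(sum = k) equals P(sum = 14 - k).
all.equal(as.numeric(probs), rev(as.numeric(probs)))

# P(2) and P(12) are both 1/36.
probs[c("2", "12")]
```

Comparing `table(sums) / num.rolls` against `probs` should show the simulated frequencies close to these values, which is why the bar heights for 2 and 12 are expected to match.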

Histogram:

ggplot(data.frame(sums), aes(x=factor(sums)))+geom_histogram()


Density plot:

ggplot(data.frame(sums), aes(x=factor(sums), fill=factor(sums)))+geom_density()


I also tried:

ggplot(data.frame(sums), aes(x=factor(sums), fill=factor(sums)))+geom_density(aes(y = ..count..))


It looks like a bandwidth problem in your kernel density estimate. But why do you want to use density() in the first place when the distribution is discrete? - J.R.
Should we not get a symmetrical graph, with the value for 2 being the same as that for 12? How can we confirm and correct the bandwidth problem? - rnso
We should, but it seems to me the bandwidth increases with the sum. I don't know how to adjust for it in ggplot(), and I don't think it makes much sense either; you are probably looking for the ggplot() equivalent of plot(density(sums)). You should be able to adjust it somehow. - J.R.
I tried adding y = ..count.. which makes it better but still not correct. - rnso
It seems to me that you are doing 11 different kernel estimations with 11 different bandwidths. Why not just: ggplot(data.frame(sums), aes(x=sums, fill=2))+geom_density()? I'm not quite sure what you are trying to achieve here. - J.R.

1 Answer


It seems to me that you are doing 11 different kernel estimations with 11 different bandwidths. Instead, do:

ggplot(data.frame(sums), aes(x=sums, fill=2))+geom_density()

Or you could add group=1 if you insist on keeping the extra arguments:

ggplot(data.frame(sums), aes(x=sums, fill=factor(sums)))+geom_density(aes(group=1))
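
One possible way to see why the per-level bandwidths differ is sketched below. It rests on two assumptions: that geom_density() uses R's default bandwidth rule bw.nrd0() (its default is bw = "nrd0"), and that a factor on the x axis is mapped to the integer positions 1 to 11 before the density is computed. Within each group all x values are then identical, so the standard deviation and the IQR are both zero, and bw.nrd0() falls back to scaling by the absolute value of the first observation. The vectors here are made up for illustration.

```r
# bw.nrd0() computes 0.9 * lo * n^(-1/5), where lo is normally
# min(sd, IQR / 1.34). For constant data both are 0, and lo falls
# back to abs(x[1]), so the bandwidth grows with the value itself.
n <- 1000L
bw_low  <- bw.nrd0(rep(1, n))    # assumed position of the first level (sum 2)
bw_high <- bw.nrd0(rep(11, n))   # assumed position of the last level (sum 12)

c(low = bw_low, high = bw_high)
bw_high / bw_low                 # ratio of the two bandwidths

# With a numeric x and a single group there is one shared bandwidth.
set.seed(1)
sums <- rowSums(matrix(sample(1:6, 2 * 10000, replace = TRUE), ncol = 2))
bw.nrd0(sums)
```

Under these assumptions the peak for 12 is smoothed far more widely than the peak for 2, which would account for the unequal heights; the group sizes also differ between levels, which changes the bandwidths further. Using a numeric x with a single group, as in the two calls above, avoids the per-level estimates.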