
I posted the following question on https://stats.stackexchange.com/questions/117578/density-distribution-of-outcomes-of-2-dice-rolled but did not get any response.

This question is related to: A histogram with a bar for each frequency value

Two dice are rolled and the sum is plotted. The histogram is as expected, but the density plot shows different densities for 2 and 12, and the plot is asymmetric. Why is this so? When two dice are rolled, a sum of 2 is exactly as likely as a sum of 12. Why does the density plot show unequal values for them?

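library(ggplot2)  # needed for the plots below

num.dices <- 2L        # dice per roll
num.rolls <- 100000L   # number of simulated rolls
# One row per roll, one column per die
outcomes <- matrix(sample(1:6, num.dices * num.rolls, replace = TRUE),
                   nrow = num.rolls, ncol = num.dices)
sums <- rowSums(outcomes)  # sum of the dice in each roll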
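As a sanity check on the claim that 2 and 12 are equally likely, the exact distribution of the sum can be tabulated by enumerating all 36 equally likely outcomes. This is only a minimal base-R sketch; the names exact.sums and exact.pmf are introduced here for illustration and are not part of the simulation above.

```r
# Enumerate all 36 equally likely (die1, die2) pairs and add them up.
exact.sums <- as.vector(outer(1:6, 1:6, "+"))

# Exact probability of each sum from 2 to 12.
exact.pmf <- table(exact.sums) / length(exact.sums)

# The distribution is symmetric around 7: P(2) = P(12), P(3) = P(11), ...
print(round(exact.pmf, 4))
print(all.equal(as.vector(exact.pmf), rev(as.vector(exact.pmf))))
```

Any correct plot of the simulated sums should therefore look symmetric around 7, up to sampling noise.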

Histogram:

ggplot(data.frame(sums), aes(x=factor(sums)))+geom_bar()  # current ggplot2 rejects geom_histogram() on a discrete x


Density plot:

ggplot(data.frame(sums), aes(x=factor(sums), fill=factor(sums)))+geom_density()


I also tried:

ggplot(data.frame(sums), aes(x=factor(sums), fill=factor(sums)))+geom_density(aes(y = ..count..))


It looks like a bandwidth problem in the kernel density estimate. But why do you want to use density() in the first place when the distribution is discrete? - J.R.
Shouldn't we get a symmetrical graph, with the plot for 2 the same as that for 12? How can we confirm and correct the bandwidth problem? - rnso
We should, but it seems to me that the bandwidth increases with the sum. I don't know how to adjust for that in ggplot(), and I don't think the plot makes much sense anyway. You are probably looking for the ggplot() equivalent of plot(density(sums)). You should be able to adjust it somehow. - J.R.
I tried adding y = ..count.., which makes it better but still not correct. - rnso
It seems to me that you are doing 11 different kernel estimations with 11 different bandwidths. Why not just use ggplot(data.frame(sums), aes(x=sums, fill=2))+geom_density()? I'm not quite sure what you are trying to achieve here. - J.R.

1 Answer


It seems to me that you are computing 11 separate kernel density estimates with 11 different bandwidths, because x=factor(sums) puts every sum in its own group. Instead, do:

ggplot(data.frame(sums), aes(x=sums, fill=2))+geom_density()

Or, if you insist on keeping the extra aesthetics, you could add group=1:

ggplot(data.frame(sums), aes(x=sums, fill=factor(sums)))+geom_density(aes(group=1))
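To see why the per-group densities come out with different heights, it helps to look at the bandwidth each group would get. The base-R sketch below rests on two assumptions about ggplot2 internals that are not stated in this thread: that geom_density() applies the default bandwidth rule bw.nrd0() separately to each group, and that a factor on the x axis is converted to its integer codes 1 to 11 before the density is computed. Under those assumptions every group holds one repeated value, its spread is zero, and bw.nrd0() falls back to a bandwidth proportional to that value. The names codes, bw.per.group and bw.pooled are illustrative.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
sums <- rowSums(matrix(sample(1:6, 2 * 100000, replace = TRUE), ncol = 2))

# Integer codes of factor(sums): sum 2 -> 1, ..., sum 12 -> 11 (assumed x values).
codes <- as.integer(factor(sums))

# Default bandwidth computed separately for each group of identical values.
bw.per.group <- tapply(codes, sums, bw.nrd0)
print(round(bw.per.group, 3))

# One bandwidth for the pooled data, as with a single group (group = 1).
bw.pooled <- bw.nrd0(sums)
print(bw.pooled)
```

If these assumptions hold, the bandwidth for sum 12 is roughly 11 times that for sum 2, so the curve for 2 is much narrower and taller even though both groups contain about the same number of rolls. A single pooled bandwidth removes that distortion.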