0
votes

I'm working with a dataset where the values of my variable of interest are hidden. I have the range (min, max), mean, and sd of this variable, and for each observation I know which decile its value lies in. Is there any way I can impute values for this variable using Stata's random-number functions, such as rnormal()? Something along the lines of:

set seed 1

gen imputed_var=rnormal(mean,sd,decile) if decile==1

Appreciate any help on this, thanks!


2 Answers

1
votes

I am not familiar with Stata, but the following may get you in the right direction.

In general, to generate a random number in a certain decile:

  • Generate a uniform random number in [(decile-1)/10, decile/10], where decile is the desired decile, from 1 through 10.
  • Treat that number as a cumulative probability and find the corresponding quantile of the target distribution (that is, apply its inverse CDF).
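In symbols, the two steps look like this. The notation is introduced here for illustration: d is the decile class, μ and σ are the reported mean and sd, and Φ is the standard normal CDF. This assumes the hidden variable is roughly normal, which the question does not confirm.

```latex
U \sim \mathrm{Uniform}\!\left(\frac{d-1}{10},\ \frac{d}{10}\right),
\qquad
X = \mu + \sigma\,\Phi^{-1}(U)
```

Because Φ⁻¹ is increasing, X always falls between the (d-1)/10 and d/10 quantiles of a normal distribution with mean μ and standard deviation σ.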

Thus, in pseudocode, the following will achieve what you want (I'm not sure about the exact names of the corresponding functions in Stata, though, which is why it's pseudocode):

decile = 4 # 4th decile
# Generate a random number in the decile (here, [0.3, 0.4]).
v = runiform((decile-1)/10, decile/10)
# Convert the number to a normal random number
q = qnormal(v) # Quantile of the standard normal distribution
# Scale and shift the number to the desired mean
# and standard deviation
q = q * sd + mean
1
votes

This is precisely the suggestion just made by @Peter O., written in Stata. I make the same assumption he did: that by a common abuse of terminology, "decile" is your shorthand for decile class, bin or interval. Historically, deciles are the values corresponding to cumulative probabilities 0.1(0.1)0.9 (that is, 0.1 to 0.9 in steps of 0.1), not any bins those values delimit.

. clear

. set obs 100
number of observations (_N) was 0, now 100

. set seed 1506

. gen foo = invnormal(runiform(0, 0.1))

. su foo

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         foo |        100   -1.739382    .3795648  -3.073447  -1.285071

and (closer to your variable names), giving standard normal values that you would still multiply by your sd and shift by your mean:

 gen wanted = invnormal(runiform(0.1 * (decile - 1), 0.1 * decile))
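Putting the pieces together, here is a minimal sketch rather than a definitive recipe. The values of mu and sigma and the simulated decile variable are hypothetical stand-ins for your reported mean, sd and decile indicator. It assumes normality and Stata 14 or later, where runiform() accepts two arguments.

```stata
clear
set obs 1000
set seed 1506

* Hypothetical decile-class indicator (1 through 10); use your own variable instead
gen decile = ceil(10 * _n / _N)

* Hypothetical summary statistics; replace with the reported mean and sd
scalar mu = 50
scalar sigma = 10

* Uniform draw inside each observation's decile class
gen double u = runiform(0.1 * (decile - 1), 0.1 * decile)

* Standard normal quantile of that draw, then scale and shift
gen double imputed_var = scalar(mu) + scalar(sigma) * invnormal(u)

* Check that the imputed ranges do not overlap across decile classes
tabstat imputed_var, by(decile) statistics(min max)
```

Draws in the first and last classes are unbounded under the normal assumption, so some imputed values may fall outside the known min and max. How to handle that (for example by clipping) is a separate choice not covered here.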