I am trying to create indicator variables from quantile categories. I first create a variable whose categories correspond to quantiles and then generate one indicator per category. For one variable, the code I am using is

xtile PH_scale = PH, nq(4)
tab PH_scale, gen(PH_scale_)

I also know that, instead of the default quantiles (e.g., nq(4)), I can supply my own cutpoints from a variable by using

input class
xtile PH_scale = PH, cutpoint(class)

But there are several variables for which I want to define the cutpoints differently.

Normally, the cutpoints define the intervals

(−∞, x[25] ], (x[25], x[50] ], (x[50], x[75] ], (x[75], +∞)
where x[25], x[50], and x[75] are the 25th, 50th (median), and 75th percentiles, respectively,

and Stata automatically assigns a number to each of those intervals (e.g., 1 to (−∞, x[25] ], 2 to (x[25], x[50] ], and so on).

However, what I want is

Assign category 1 to values located in (−∞, x[25] ] or (x[75], +∞)
Assign category 2 to values located in (x[25], x[50] ] or (x[50], x[75] ]

I hope I have explained my problem clearly enough. I am not sure whether I can do this with the xtile command alone. Any other methods that solve this problem are welcome.

1 Answer

After

xtile PH_scale = PH, nq(4)

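this is an easy replace (the if qualifier keeps missing values missing rather than sending them to category 2):

replace PH_scale = cond(inlist(PH_scale, 1, 4), 1, 2) if PH_scale < .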
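As a quick check of this recoding, here is a minimal sketch on Stata's shipped auto dataset. The variable price stands in for PH, and the names price_q and price_grp are made up for illustration; it creates a new variable instead of overwriting the quartile variable so that the two can be cross-tabulated.

```stata
* Sketch only: price from the auto dataset stands in for PH (an assumption)
sysuse auto, clear

* Quartile categories 1-4, as in the question
xtile price_q = price, nq(4)

* Outer quartiles (1 and 4) -> 1, inner quartiles (2 and 3) -> 2;
* observations with a missing quartile stay missing
gen byte price_grp = cond(inlist(price_q, 1, 4), 1, 2) if price_q < .

* Each quartile should fall into exactly one of the two groups
tab price_q price_grp, missing
```

The cross-tabulation should show quartiles 1 and 4 only in group 1 and quartiles 2 and 3 only in group 2.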

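Alternatively, compute the quartiles directly, using <= at the lower cutpoint so that values equal to x[25] land in category 1, as in the intervals in the question:

_pctile PH, nq(4)
gen PH_scale = cond(PH <= r(r1) | PH > r(r3), 1, 2) if PH < .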
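Since the question mentions several variables, the same idea might be wrapped in a loop. This is only a sketch under assumptions: price, mpg, and weight from the auto dataset stand in for the real variables, the _grp suffix is an invented naming scheme, and every variable gets the same outer-versus-inner rule. The comparison uses <= at the lower cutpoint so that the groups line up with xtile's intervals.

```stata
* Sketch only: variable names and the _grp suffix are assumptions
sysuse auto, clear

foreach v of varlist price mpg weight {
    * _pctile leaves the quartiles in r(r1), r(r2), r(r3)
    _pctile `v', nq(4)

    * At or below the 25th percentile, or above the 75th -> 1; otherwise 2;
    * missing values of the source variable stay missing
    gen byte `v'_grp = cond(`v' <= r(r1) | `v' > r(r3), 1, 2) if `v' < .
}
```

If different variables need different rules, the body of the loop would have to branch on the variable name, or each variable would be handled separately.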

Note that indicator variables are widely defined as those with values 1 and 0, but the principles are the same either way.
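For a 0/1 coding, a logical expression can be used directly, because true evaluates to 1 and false to 0 in Stata. A minimal sketch, again with price from the auto dataset standing in for PH and price_outer as an invented name:

```stata
* Sketch only: price stands in for PH; price_outer is a hypothetical name
sysuse auto, clear
xtile price_q = price, nq(4)

* 1 if in the lowest or highest quartile, 0 if in the middle two;
* the if qualifier keeps missing values missing rather than coding them 0
gen byte price_outer = inlist(price_q, 1, 4) if price_q < .
```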