1
votes

I want to calculate the proportions of a variable, but the error "factor variables may not contain negative values" always comes up. When I check the label list, it contains the following:

label list w38_E1a:

w38_E1a:
-99 Refused
-98 Don't know
1 Yes
2 No

How do I remove the -99 and -98 values?

Thank you.

4
Counting proportions, removing data and what factor variables may be are three distinct issues. What is the syntax you are trying? - Nick Cox

4 Answers

1
votes

Assuming that the data are stored as a numeric type, I would simply recode them to be positive, because if they are categorical their sign shouldn't matter:

recode w38_E1a (-99 = 99) (-98 = 98)

0
votes

I think you should drop those outliers. You can use drop if w38_E1a < 0

0
votes

It seems that -99 and -98 are intended to code missing values, so these are not outliers. If this is the case, you should recode the values -99 and -98 to missing in every variable that uses the value label w38_E1a. To find the variables whose values are labeled with a specific value label, you can use -findname- from SSC.

cap which findname
if _rc ssc install findname // install -findname- if necessary

findname, vallabelname(w38_E1a)
foreach v of varlist `r(varlist)' {     
   recode `v' (-99 = .a ) (-98 = .b)
}
label def w38_E1a .a "Refused" .b "Don't know" -99 "" -98 "", modify

0
votes

I could not find a way to respond to https://stackoverflow.com/users/15742435/jesse-kaczmarski or https://stackoverflow.com/users/15819003/bing, and because I have not "earned" enough reputation I can't comment on their answers directly. However, one should note that their advice can go wrong in two ways:

  1. puput0808 only showed us the contents of a value label. You, however, are recoding a variable with the same name, or dropping cases if a variable with the same name has the values -99 or -98. But what if the variable name is not identical to the name of the value label? It could be that (a) no variable is attached to this value label (in which case an error message would occur), or (b) several variables are attached to this value label and only one of them shares its name (in which case the problem would persist for the others).
  2. puput0808 showed us the labels of -99 and -98, which indicate that these values are intended to be treated as missing. In that case, recoding the values to positive numbers would certainly be a mistake.
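
A minimal sketch of how both points could be handled with built-in commands only, assuming the value label really is called w38_E1a and that -99 and -98 are meant as missing codes. The local macro name labelled is just an illustrative choice, and nothing here assumes that a variable named w38_E1a exists:

```stata
* Find every variable that has the value label w38_E1a attached;
* the variable names need not match the label name.
ds, has(vallabel w38_E1a)
local labelled `r(varlist)'

* Only proceed if at least one variable uses the label.
if "`labelled'" != "" {
    * Turn the negative codes into extended missing values.
    mvdecode `labelled', mv(-99 = .a \ -98 = .b)

    * Move the label text to the missing codes and remove the old entries.
    label define w38_E1a .a "Refused" .b "Don't know" -99 "" -98 "", modify

    * Proportions are now based on the non-missing answers only.
    foreach v of local labelled {
        proportion `v'
    }
}
```

Using extended missing values (.a, .b) rather than plain missing keeps "Refused" and "Don't know" distinguishable in later tabulations.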