
I want to display (list) the value of a string variable DE15_WHY in Stata only when it is not missing (e.g. some subjects did not provide comments). I thought this would be easy:

list DE15_WHY if DE15_WHY != ""

This displays DE15_WHY for all subjects even if they do not have anything in DE15_WHY...

Is the string formatted wrongly? For example, does Stata think that all subjects have a valid observation for DE15_WHY? How do I fix this? I checked, and it is formatted as a string variable.

Stata also allows me to tabulate DE15_WHY, similar to R. This is a great option but does not display the entire contents of the string variable in the table. How do I get Stata to display the entire string?


2 Answers

3
votes

@Metrics' answer has several good details, but I will add more here.

With string variables, Stata has only one definition of missing, namely that a string is empty, and contains precisely no characters.

One or more spaces, despite usually conveying nothing to people, do not qualify as missing so far as Stata is concerned.

The term "blank" is perhaps unclear here and thus better avoided.

If spaces somehow get into your string variables, a condition such as

   if trim(mystring) == "" 

selects values that are empty or that contain only spaces, and correspondingly a condition such as

   if trim(mystring) != "" 

selects values with other content. To replace values that consist only of spaces with empty strings, we thus go

   replace mystring = "" if trim(mystring) == "" 
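To see the difference in practice, here is a minimal sketch on a made-up four-observation dataset. The variable name mystring and its values are just assumptions for illustration, not the asker's data.

```stata
* Toy data: one real empty string and one value made only of spaces
clear
set obs 4
generate str10 mystring = ""
replace mystring = "ab" in 1
replace mystring = "  " in 2
replace mystring = "cd" in 4

* Counts only the truly empty value (observation 3)
count if mystring == ""

* Counts the empty value and the spaces-only value
count if trim(mystring) == ""

* Turn spaces-only values into genuinely empty strings
replace mystring = "" if trim(mystring) == ""

* Now the simple condition behaves as expected
list mystring if mystring != ""
```

After the replace, the original list ... if ... != "" condition from the question should show only the observations with real content.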

In general, if you have rather long strings, Stata necessarily has a problem of where to display them. One tip is that list will show more than tabulate. If you want a tabulate and list hybrid, check out groups from SSC, using ssc inst groups.
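As a rough sketch of those suggestions, assuming DE15_WHY is the string variable from the question: as I understand it, the notrim option of list asks it not to trim long strings, although very long values may still wrap or be cut at the line size. groups is community-contributed, so its output depends on the version installed from SSC.

```stata
* List only the non-empty comments, asking list not to trim long strings
list DE15_WHY if trim(DE15_WHY) != "", notrim

* Install groups from SSC once, then use it as a tabulate/list hybrid
ssc install groups
groups DE15_WHY if trim(DE15_WHY) != ""
```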

Although the period . is the default or system missing value for numeric variables (or numeric scalars or matrix elements) in Stata, Stata does not attach any special meaning to the string ".".
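A small check of this point, again on made-up data (the variable s is hypothetical): the missing() function treats only the empty string as missing for string arguments, so a value of "." counts as ordinary content.

```stata
* Toy data: an empty string, a literal period, and ordinary text
clear
set obs 3
generate str5 s = ""
replace s = "." in 2
replace s = "x" in 3

* Only the empty string in observation 1 is missing
count if missing(s)

* The period is just a one-character string
count if s == "."
```

If "." was used as a missing-value code when the data were entered, something like replace s = "" if s == "." would convert it to a real string missing.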

2
votes
sysuse auto
* list only non-missing values
list rep78 in 1/10 if rep78 != .
* by default, tab reports only non-missing values
tab rep78
* add the missing option to include missing values as well
tab rep78, missing

If the variable is a string with missing values indicated by "."

list yourvariable if yourvariable !="."

If the variable is a string with missing values indicated by an empty string

list yourvariable if yourvariable !=""

Example:

my  my1
ab  1
cd  2
    3
ef  4

list  my if  my !=""

     +----+
     | my |
     |----|
  1. | ab |
  2. | cd |
  4. | ef |
     +----+

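By default, tab omits missing values: the empty string "" for string variables and . for numeric variables. A string "." is not missing and is tabulated like any other value.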

tab my

         my |      Freq.     Percent        Cum.
------------+-----------------------------------
         ab |          1       33.33       33.33
         cd |          1       33.33       66.67
         ef |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00


tab my, missing

         my |      Freq.     Percent        Cum.
------------+-----------------------------------
            |          1       25.00       25.00
         ab |          1       25.00       50.00
         cd |          1       25.00       75.00
         ef |          1       25.00      100.00
------------+-----------------------------------
      Total |          4      100.00