Type mismatch when replacing missing observations with previous values using time-series operators in Stata

Question

Consider the following example. I begin with an str6 'name' variable, and a year for two entities observed every other year.

clear
input str6 nameStr year
"A" 2002
"A" 2004
"A" 2006
"B" 2002
"B" 2004
"B" 2006
end

Then I use tsfill to balance the panel:

egen id = group(nameStr)
xtset id year
tsfill

The dataset is now:

input str6 nameStr year id
"A" 2002 1
""  2003 1 
"A" 2004 1
""  2005 1
"A" 2006 1
"B" 2002 2
""  2003 2 
"B" 2004 2
""  2005 2 
"B" 2006 2
end

Now I could use something like xfill to fill in the missing string identifier. Alternatively, based on the related Stata FAQ and the documentation for time-series varlists (help tsvarlist), I expect something like the following to fill in the values of nameStr:

sort id year // not required because the data are still sorted from xtset and tsfill
replace nameStr = nameStr[_n-1] if mi(nameStr) & id[_n-1] == id

and it does.

However, I also expect the following to produce the same behavior, and it does not.

replace nameStr = l.nameStr if mi(nameStr)

Instead Stata returns:

type mismatch
r(109);

While there are several ways to work around this (I've listed two), I'm interested in understanding why this happens. Most similar discussions address cases where two variables of differing types are involved; that obviously isn't the case here, since only one variable is involved.

Because the l.nameStr notation does not work with strings. — user8682794
To see this, replace A and B in nameStr with 1 and 2 and run the code again. — user8682794
Interesting, so then the problem is trying to do numeric things to a string variable. Surprised that I've never tried (and failed) to do this before. — Arthur Morris
@PearlySpencer is this limitation documented (I don't see it in stata.com/manuals/u11.pdf#u11.4.4)? — Arthur Morris
This makes sense in a data manipulation context, which is why I think StataCorp programmers made a value judgement here. — user8682794

Nick Cox · Accepted Answer · 2020-08-06T10:38:54

Stata does not allow time series operators to be applied to string variables. If you think about it you will see that previous (lagging) and following (leading) string values make sense but differences don't, at least not so much. The only simple interpretation of differences would be binary, namely strings at two times are the same or different.
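
A minimal sketch of that binary reading, using subscripts rather than time-series operators. The indicator name changed is hypothetical, and the data are assumed to be the panel from the question, identified by id and year:

* changed is 1 if the string differs from the previous observation in the same panel, 0 if it is the same
* it is left missing for the first observation of each panel, which has no predecessor
bysort id (year): generate byte changed = (nameStr != nameStr[_n-1]) if _n > 1

If this is run after tsfill but before the blanks are filled, it also flags the blank rows and the rows right after them, because an empty string differs from "A" or "B".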

So, Stata is not implying that you can't work with other string values for any panel; it just doesn't support calculations on strings using time series operators.
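
One way to see the distinction is to route the string through a numeric copy, on which the lag operator is accepted. This is only a sketch under assumptions: nameNum and nameFilled are hypothetical names, the data are still xtset by id and year, and each gap is a single year as in the question's example (longer runs of blanks are not handled here).

* encode maps each distinct string to a labeled integer; empty strings become numeric missing
encode nameStr, generate(nameNum)
* the lag operator is allowed here because nameNum is numeric
replace nameNum = L.nameNum if missing(nameNum)
* decode turns the value labels back into strings
decode nameNum, generate(nameFilled)

The subscript approach from the question avoids the round trip; this version just shows that the type mismatch comes from applying L. to a string, not from the panel setup.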

In addition to the syntax you mention, stripolate from SSC supports string interpolation: see this Statalist thread.
