This continues discussion of Stata solutions. The solution by @Pearly Spencer looks backward and forward from observations with missing values and so is fine for the example with just two observations per group, and possibly fine for some other situations.
An alternative approach makes use, as appropriate, of the community-contributed commands mipolate
and stripolate
from SSC as explained also at https://www.statalist.org/forums/forum/general-stata-discussion/general/1308786-mipolate-now-available-from-ssc-new-program-for-interpolation
Examples first, then commentary:
clear
input ID dum1a dum2a dum3a str3 var1a
1 0 1 . "hi"
1 0 . 0 "hi"
2 1 . . "bye"
2 0 0 1 ""
2 0 1 . ""
end
gen long obsno = _n
foreach v of var dum*a {
quietly count if missing(`v')
if r(N) > 0 capture noisily mipolate `v' obsno, groupwise by(ID) generate(`v'_2)
}
foreach v of var var*a {
quietly count if missing(`v')
if r(N) > 0 capture noisily stripolate `v' obsno, groupwise by(ID) generate(`v'_2)
}
list
+----------------------------------------------------------------+
| ID dum1a dum2a dum3a var1a obsno dum3a_2 var1a_2 |
|----------------------------------------------------------------|
1. | 1 0 1 . hi 1 0 hi |
2. | 1 0 . 0 hi 2 0 hi |
3. | 2 1 . . bye 3 1 bye |
4. | 2 0 0 1 4 1 bye |
5. | 2 0 1 . 5 1 bye |
+----------------------------------------------------------------+
Notes:
The groupwise
option of mipolate
and stripolate
uses the rule: replace missing values within groups with the non-missing value in that group if and only if there is only one distinct non-missing value in that group. Thus if the non-missing values in a group are all 1, or all 42, or whatever it is, then interpolation uses 1 or 42 or whatever it is. If the non-missing values in a group are 0 and 1, then no go.
The variable obsno
created here plays no role in that interpolation and is needed solely to match the general syntax of mipolate
.
There is no assumption here that groups consist of just two observations or have the same number of observations. A common playground for these problems is data on families whenever some variables were recorded only for certain family members but it is desired to spread the values recorded to other family members. Naturally, in real data families often have more than two members and the number of family members will vary.
This question exposed a small bug in mipolate, groupwise
and stripolate, groupwise
: it doesn't exit as appropriate if there is nothing to do, as in dum1a
where there are no missing values. In the code above, this is trapped by asking for interpolation if and only if missing values are counted. At some future date, the bug will be fixed and the code in this answer simplified accordingly, or so I intend as program author.
mipolate, groupwise
and stripolate, groupwise
both exit with an error message if any group is found with two or more distinct non-missing values; no interpolation is then done for any groups, even if some groups are fine. That is the point of the code capture noisily
: the error message for dum2a
is not echoed above. As program author I am thinking of adding an option whereby such groups will be ignored but that interpolation will take place for groups with just one distinct non-missing value.