0
votes

I have two large datasets (more than 1000 variables in each), one of which has all the variables of the second, plus additional variables. I would like to get a list of all these additional variables, and then drop them and append one dataset to another. I have tried the command dta_equal, but got the same problem found here: http://www.stata.com/statalist/archive/2011-08/msg00308.html

I guess append, keep() cannot realize what I want to do directly, i.e., cannot append dataset while drop additional variables since I have to manually type in variables one by one in the keep() option, which is not realistic given my large dataset.

Are there any ways to deal with this?

1
Check ssc describe cfvars.Roberto Ferrer

1 Answers

1
votes

There are several Stata commands that can be useful here.

The unab command is used in the first example to make a list of variable in the dataset with fewer variables. The second and third example use the describe command to obtain the list of variables in a dataset not currently in memory.

The final part the the example shows how to use extended macro list functions to obtain a list of common variables and the set of variables not common to both datasets.

* simulate 2 datasets, one has more variables than the other
sysuse auto, clear
save "data1.dta", replace
gen x = _n
gen y = -_n
save "data2.dta", replace

* example 1: drop after append
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta"
replace source = 2 if mi(source)
keep `vcommon' source

* example 2: drop first then append
clear
describe using "data1.dta", varlist short
local vcommon `r(varlist)'
use `vcommon' using "data2.dta", clear
gen source = 2
append using "data1.dta"
replace source = 1 if mi(source)

* example 3: append and keep on the fly
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta", keep(`vcommon')
replace source = 2 if mi(source)

* use extended macro list functions to manipulate variable list
clear
describe using "data1.dta", varlist short
local vlist1 `r(varlist)'
describe using "data2.dta", varlist short
local vlist2 `r(varlist)'
local vcommon : list vlist1 & vlist2
local vinonly1 : list vlist1 - vlist2
local vinonly2 : list vlist2 - vlist1
dis "common variables = `vcommon'"
dis "variables in data1 not found in data2 = `vinonly1'"
dis "variables in data2 not found in data1 = `vinonly2'"