1
votes

I have a few different datasets, and for each I need to keep only a list of variables. However, when I run a simple keep command, I end up with an error since not all of the variables in the list I supply are present in each dataset. Is there a simple way of solving this, say, an option for keep that I'm missing? If not, is there a way that for each dataset I can define the varlist in keep as being only those from this master list that are present in the current dataset?

Thanks all. I'm just starting to use Stata (previously R), so sometimes I'm still stuck in the R way of looking at things. I'd appreciate any tips.

3
I would also like to know this. -- Frank
isvar (SSC) filters a namelist into a list of variables and other names. -- Nick Cox

3 Answers

2
votes

Expanding on ander2ed's answer, you can define a "master list" of variables to keep, and then only keep the matching variables from different data sets -- e.g.:

local keepvars make weight mpg length

sysuse auto, clear // contains the four variables above (and others)
qui ds 
local dsvars `r(varlist)'
local keeplist : list keepvars & dsvars
di "`keeplist'"
keep `keeplist'

sysuse autornd, clear // contains only make, weight, & mpg
qui ds 
local dsvars `r(varlist)'
local keeplist : list keepvars & dsvars
di "`keeplist'"
keep `keeplist'

If desired, this could be fairly easily made into a loop:

// loop approach
local keepvars make weight mpg length
local dslist auto autornd
foreach ds of local dslist {
    qui sysuse `ds', clear
    qui ds
    local dsvars `r(varlist)'
    local keeplist : list keepvars & dsvars
    keep `keeplist'
    di as input ">>> `ds'"
    ds
    // save
}
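
If the same filtering is needed in many do-files, the intersection step can be wrapped in a small program. The following is only a sketch: the program name keepexisting is made up for illustration and is not a built-in or SSC command, and the choice to stop with an error when nothing matches is an assumption about the desired behavior.

```stata
* Hypothetical helper (not an official command): keep only those names
* from the supplied list that exist in the dataset in memory.
capture program drop keepexisting
program define keepexisting
    syntax anything(name=wanted)
    quietly ds
    local present `r(varlist)'
    * Names that are both requested and present
    local found : list wanted & present
    if "`found'" == "" {
        display as error "none of the requested variables were found"
        exit 111
    }
    keep `found'
end

sysuse auto, clear
* notavar does not exist in auto and is silently skipped
keepexisting make weight mpg length notavar
describe, simple
```

Because the list intersection compares full names, abbreviations in the master list are not expanded; a name must match a variable exactly to be kept.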

2
votes

I would suggest using a combination of describe and some extended macro functions. Suppose you have two .dta files you want to compare, set1 and set2.

You could then do something along the lines of:

describe using set1, varlist
local set1vars `r(varlist)'

describe using set2, varlist
local set2vars `r(varlist)'

local both : list set1vars & set2vars

This will create a local macro, both, which contains a string with the variable names that exist in both data sets. Use this macro inside of a keep command to keep only variables which exist in both sets.

A more thorough example would look something like:

local keeplist "make mpg foreign price"

/* Describe auto dataset */
describe using "C:/Program Files (x86)/Stata13/ado/base/a/auto.dta", varlist
local setlist1 `r(varlist)'

local keep : list keeplist & setlist1

tempfile auto
use `keep' using "C:/Program Files (x86)/Stata13/ado/base/a/auto.dta"
save `auto'

describe using "C:/Program Files (x86)/Stata13/ado/base/a/autornd.dta", varlist
local setlist2 `r(varlist)'

local keep : list keeplist & setlist2

use `keep' using "C:/Program Files (x86)/Stata13/ado/base/a/autornd.dta", clear

/* Do whatever you want with now similar datasets */
* i.e., 
merge 1:1 make using `auto'

Note in the example above that you can issue describe on the data without having it read into memory. Following from this logic, it is then quite easy to incorporate this into a loop as @Brendan Cox illustrates.
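
A loop over files on disk might look like the sketch below. The file names set1 and set2 are placeholders; the first four lines only create them from the shipped example data in the current working directory so that the loop has something to read.

```stata
* Assumption: set1.dta and set2.dta are placeholder files created here
* in the working directory purely for illustration.
sysuse auto, clear
save set1, replace
sysuse autornd, clear
save set2, replace

local keeplist make mpg foreign price
foreach f in set1 set2 {
    * Read the variable names without loading the data
    quietly describe using "`f'.dta", varlist
    local filevars `r(varlist)'
    local found : list keeplist & filevars
    * Load only the matching variables
    use `found' using "`f'.dta", clear
    save "`f'_kept.dta", replace
}
```

One caveat: if none of the names match, the local found is empty and use loads every variable, so a check on "`found'" may be worth adding before the use line.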

Other options involve unab and cfvars (available from SSC).

See a similar question here for more discussion on the topic.

1
votes

Similar to Brendan's answer, you can use a foreach loop to build up a local. You can use the command isvar, but I prefer just to use capture des and rely on the return code. For the example below, suppose the variables we are interested in keeping are a, b, c, and d:

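    forval n = 1/2 {
      use data_set_`n', clear
      // reset the list so names from the previous dataset do not carry over
      local keep_list
      foreach potential_var in a b c d {
        // _rc is 0 only if the variable exists in the current dataset
        capture qui des `potential_var'
        if _rc == 0 {
          local keep_list `keep_list' `potential_var'
        }
      }
      keep `keep_list'
      save data_set_`n'_kept, replace
    }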
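
A variant of the same idea uses confirm variable with the exact option instead of describe. This is a sketch on the shipped auto data rather than the data_set_ files above, and notavar is a deliberately nonexistent name. The motivation is that describe accepts unambiguous abbreviations, so a short name such as a could match a longer variable name, whereas exact requires a full match.

```stata
sysuse auto, clear
local wanted make weight mpg length notavar

* Start from an empty list, then append each name that exists exactly
local keep_list
foreach v of local wanted {
    capture confirm variable `v', exact
    if _rc == 0 local keep_list `keep_list' `v'
}
keep `keep_list'
describe, simple
```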