ORIGINAL QUESTION
While importing a large dataset in work, I noticed some unexpected behaviour where Stata appears to "forget" a local macro, when using the append
command.
This seems all the more strange as it appears to be a phenomenon specific to this one command (I tested with save
and the code worked as expected).
*****************
** SET UP
******************
local datasets "auto.dta auto2.dta"
global data "/Users/Seansmac/Desktop/stata_question"
save "$data/test_data.dta", replace emptyok
local datasets "auto.dta auto2.dta"
** Save Stata data in two Excel sheets. This replicates the status of my raw data at work.
foreach dataset in `datasets'{
di "`dataset' loaded"
sysuse `dataset', clear
gen data_name = "`dataset'"
tab data_name
export excel using "$data/auto_excel.xlsx", sheet("`dataset'") first(variables) sheetreplace
di "`dataset' saved in excel"
}
*****************
** Demo of Problem
******************
import excel "$data/auto_excel.xlsx", desc
local worksheets `r(N_worksheet)'
di `worksheets'
forvalues i = 1/`worksheets' {
di " Sheet number `i'"
local shtname`i' `r(worksheet_`i')'
di "loading database: `shtname`i''"
import excel "$data/auto_excel.xlsx", sheet("`shtname`i''") clear firstrow
di "database: `shtname`i'' loaded"
append using "$data/test_data.dta", force
di "database: `shtname`i'' appended"
}
***** Show only the same data was appended twice
use "$data/test_data.dta", clear
tab data_name
** I include this tab to demonstrate that only one of the two data sets is appended.
*****************
** END
*****************
Apologies if the example is a little cluttered but I often find it helpful to use display
when working with locals. To run the code, all that is required is that you change the global data.
To maintain fidelity with my problem in work I include the import excel
section; I don't understand the problem sufficiently to make the example any more minimal.
ORIGINAL QUESTION EDITED
Below are two code chunks. The first demonstrates that the append
command appears to work as I would expect it to (I note that I had forgot to use save
in my original question). This chunk demonstrates that although appending a dataset with an empty dataset may not be entirely intuitive, it still works fine. The advantage of this method is that it eliminates the need for a conditional statement when loading files.
In the second code chunk, I try to use the append
command in the same basic way, but this time in a loop. This code chunk is copy and pasted from Pearly Spencer with three minor changes:
- I save an empty dataset at the beginning
- I comment-out the logical
if
statements - I
append
with the empty dataset instead (and thensave
it, so the second time round it shouldn't be empty).
The local macro which Stata "forgets" is shtname
. If you examine the display
statements, nothing is printed after the first loop. This is the location of my question. To further demonstrate this, the tab
command at the end of the script shows that the variable data_name
has 148 observations of auto.dta
and none of auto2.dta
. This shows the same (ie the first) data set was appended twice. This suggests (to me) that the append
part of the script works fine, but there is a problem with the local marco, shtname
.
* DEMONSTRATE APPEND APPEARS TO WORK *
clear all
cd "[**INSERT CD**]"
*Create empty data set to append later
save "test_data_noloop.dta", replace emptyok
* load first dataset
sysuse auto.dta, clear
* Gen a variable indicating what dataset it is
gen dataset = "auto_1"
* append data with empty dataset
append using "test_data_noloop.dta"
save "test_data_noloop.dta", replace
clear
*load second dataset
sysuse auto2.dta, clear
* gen dataset variable again
gen dataset = "auto_2"
* append with the previously saved dataset
append using "test_data_noloop.dta"
* Demonstrate both datasets have been appended
tab dataset
* FOLLOW PEARLY SPENCER, WITH SMALL ADJUSTMENTS *
clear all
cd "[**INSERT CD**]"
local datasets "auto.dta auto2.dta"
** Added the following line
save "test_data.dta", replace emptyok
foreach dataset in `datasets'{
display "`dataset' loaded"
sysuse `dataset', clear
generate data_name = "`dataset'"
tab data_name
export excel using "auto_excel.xlsx", sheet("`dataset'") first(variables) sheetreplace
display "`dataset' saved in excel"
}
import excel "auto_excel.xlsx", desc
local worksheets `r(N_worksheet)'
display `worksheets'
forvalues i = 1 / `worksheets' {
display " Sheet number `i'"
local shtname`i' `r(worksheet_`i')'
display "loading database: `shtname`i''"
import excel "auto_excel.xlsx", sheet("`shtname`i''") clear firstrow
*if `i' == 1 save "test_data.dta", replace
display "database: `shtname`i'' loaded"
*if `i' > 1 {
append using "test_data.dta", force
save "test_data.dta", replace
*}
display "database: `shtname`i'' appended"
}
use "test_data.dta", clear
tab data_name
To address some remarks in the comments, auto2.dta can be found when typing sysuse dir
into the console. It is thus a dataset available to all. I have tried my best to keep my code replicable, and unless I am mistaken all that needs to be done is to set the cd
for the above code to work.
Secondly, I have tried hard to ensure I haven't made a stupid logical error (as mentioned above, I realise I omitted saving my file in my original question, which would indeed mean I am appending an empty dataset each time). That said, it may be a case that I've looked at this problem for so long I can no longer see the wood from the trees; so please go easy if it's still a one liner type issue!
Finally, I never said Stata is forgetting the local macro, merely that it appears to do so. Hence I ask the question to understand what is going on (or, more likely, where I've made the mistake).
SCREEN SHOT OF MY OUTPUT See red marks where locals are not being displayed.
***EDIT #3
This image appears to show the (as yet) unexplained behaviour stems from append
and not import
.
append
conflicts with the loop. With that said, I think i will call it a day and focus on helping others who find my answers helpful. – user8682794return list
to see what r-class results remain in memory before you copy them to locals. – Nick Cox