0
votes

ORIGINAL QUESTION

While importing a large dataset in work, I noticed some unexpected behaviour where Stata appears to "forget" a local macro, when using the append command.

This seems all the more strange as it appears to be a phenomenon specific to this one command (I tested with save and the code worked as expected).

*****************
** SET UP
******************
local datasets "auto.dta auto2.dta"
global data "/Users/Seansmac/Desktop/stata_question"
save "$data/test_data.dta", replace emptyok
local datasets "auto.dta auto2.dta"


** Save Stata data in two Excel sheets. This replicates the status of my raw data at work.
foreach dataset in `datasets'{


    di "`dataset' loaded"
     sysuse `dataset', clear
     gen data_name = "`dataset'"
     tab data_name
     export excel using "$data/auto_excel.xlsx", sheet("`dataset'") first(variables) sheetreplace
     di "`dataset' saved in excel"
     } 

*****************
** Demo of Problem
******************

import excel "$data/auto_excel.xlsx", desc

local worksheets `r(N_worksheet)'
di `worksheets'

  forvalues i = 1/`worksheets' { 
    di " Sheet number  `i'"


     local shtname`i' `r(worksheet_`i')'
     di "loading database: `shtname`i''"

     import excel "$data/auto_excel.xlsx", sheet("`shtname`i''")  clear firstrow

          di "database: `shtname`i'' loaded"

      append using "$data/test_data.dta", force

      di "database: `shtname`i'' appended"        
}


***** Show only the same data  was appended twice
     use "$data/test_data.dta", clear
     tab data_name

** I include this tab to demonstrate that only one of the two data sets is appended. 

*****************
** END
*****************

Apologies if the example is a little cluttered but I often find it helpful to use display when working with locals. To run the code, all that is required is that you change the global data.

To maintain fidelity with my problem in work I include the import excel section; I don't understand the problem sufficiently to make the example any more minimal.

ORIGINAL QUESTION EDITED

Below are two code chunks. The first demonstrates that the append command appears to work as I would expect it to (I note that I had forgot to use save in my original question). This chunk demonstrates that although appending a dataset with an empty dataset may not be entirely intuitive, it still works fine. The advantage of this method is that it eliminates the need for a conditional statement when loading files.

In the second code chunk, I try to use the append command in the same basic way, but this time in a loop. This code chunk is copy and pasted from Pearly Spencer with three minor changes:

  1. I save an empty dataset at the beginning
  2. I comment-out the logical if statements
  3. I append with the empty dataset instead (and then save it, so the second time round it shouldn't be empty).

The local macro which Stata "forgets" is shtname. If you examine the display statements, nothing is printed after the first loop. This is the location of my question. To further demonstrate this, the tab command at the end of the script shows that the variable data_name has 148 observations of auto.dta and none of auto2.dta. This shows the same (ie the first) data set was appended twice. This suggests (to me) that the append part of the script works fine, but there is a problem with the local marco, shtname.

* DEMONSTRATE APPEND APPEARS TO WORK *

clear all
cd "[**INSERT CD**]"

*Create empty data set to append later
save "test_data_noloop.dta", replace emptyok

* load first dataset
sysuse auto.dta, clear
* Gen a variable indicating what dataset it is
gen dataset = "auto_1"
* append data with empty dataset
append using "test_data_noloop.dta"
save "test_data_noloop.dta", replace

clear
*load second dataset
sysuse auto2.dta, clear
* gen dataset variable again
gen dataset = "auto_2"
* append with the previously saved dataset
append using "test_data_noloop.dta"

* Demonstrate both datasets have been appended
tab dataset

* FOLLOW PEARLY SPENCER, WITH SMALL ADJUSTMENTS *

clear all
cd "[**INSERT CD**]"


local datasets "auto.dta auto2.dta"

** Added the following line
save "test_data.dta", replace emptyok 

foreach dataset in `datasets'{
    display "`dataset' loaded"
    sysuse `dataset', clear
    generate data_name = "`dataset'"
    tab data_name
    export excel using "auto_excel.xlsx", sheet("`dataset'") first(variables) sheetreplace
    display "`dataset' saved in excel"
} 


import excel "auto_excel.xlsx", desc

local worksheets `r(N_worksheet)'
display `worksheets'

forvalues i = 1 / `worksheets' { 
    display " Sheet number  `i'"
    local shtname`i' `r(worksheet_`i')'
    display "loading database: `shtname`i''"
    import excel "auto_excel.xlsx", sheet("`shtname`i''")  clear firstrow
    *if `i' == 1 save "test_data.dta", replace
    display "database: `shtname`i'' loaded"
    *if `i' > 1 {
        append using "test_data.dta", force
        save "test_data.dta", replace
    *}
    display "database: `shtname`i'' appended"        
}


use "test_data.dta", clear
tab data_name

To address some remarks in the comments, auto2.dta can be found when typing sysuse dir into the console. It is thus a dataset available to all. I have tried my best to keep my code replicable, and unless I am mistaken all that needs to be done is to set the cd for the above code to work.

Secondly, I have tried hard to ensure I haven't made a stupid logical error (as mentioned above, I realise I omitted saving my file in my original question, which would indeed mean I am appending an empty dataset each time). That said, it may be a case that I've looked at this problem for so long I can no longer see the wood from the trees; so please go easy if it's still a one liner type issue!

Finally, I never said Stata is forgetting the local macro, merely that it appears to do so. Hence I ask the question to understand what is going on (or, more likely, where I've made the mistake).

SCREEN SHOT OF MY OUTPUT See red marks where locals are not being displayed.enter image description here

***EDIT #3 This image appears to show the (as yet) unexplained behaviour stems from append and not import.

enter image description here

1
What exactly are you trying to do? Append sheet 2 in a Stata dataset that already has sheet 1? It looks that you are using append in the wrong wayuser8682794
@SeánMcK you are not doing anything different from what i proposed. Appending an empty dataset in the first iteration of the loop does not make sense. The fact remains that there is a logical error in your thinking. In addition, your local macro question does not make any sense whatsoever either. Which macro does Stata (appear) to forget? You have yet to tell us.user8682794
All local macros are being displayed fine on my computer. And just because something 'works' does not mean it is logically correct or good programming practice.user8682794
If after all this discussion you still cannot see why your approach is illogical, this has nothing to do with me. The point of my previous comment was that hacks have unintended consequences. Chances are that the way you use append conflicts with the loop. With that said, I think i will call it a day and focus on helping others who find my answers helpful.user8682794
@SeánMcK issue return list to see what r-class results remain in memory before you copy them to locals.Nick Cox

1 Answers

1
votes

My understanding is that you want to append successive Excel sheets into one Stata dataset.

Below is a working version of your toy example.

Set up:

clear all
local datasets "auto.dta auto2.dta"

Save Stata data in two excel sheets:

foreach dataset in `datasets'{
    display "`dataset' loaded"
    sysuse `dataset', clear
    generate data_name = "`dataset'"
    tab data_name
    export excel using "auto_excel.xlsx", sheet("`dataset'") first(variables) sheetreplace
    display "`dataset' saved in excel"
} 

Demonstrate the solution to the problem:

import excel "auto_excel.xlsx", desc

local worksheets `r(N_worksheet)'
display `worksheets'

forvalues i = 1 / `worksheets' { 
    display " Sheet number  `i'"
    local shtname`i' `r(worksheet_`i')'
    display "loading database: `shtname`i''"
    import excel "auto_excel.xlsx", sheet("`shtname`i''")  clear firstrow
    if `i' == 1 save "test_data.dta", replace
    display "database: `shtname`i'' loaded"
    if `i' > 1 {
        append using "test_data.dta", force
        save "test_data.dta", replace
    }
    display "database: `shtname`i'' appended"        
}

Show the results:

use "test_data.dta", clear
tab data_name

(I broke the code in different snippets for better legibility.)


EDIT:

Stata can use only one dataset at a time. The way the append command works is by 'attaching' the data from the specified external dataset to those already loaded into memory (i.e used). The reason the version of your example is not giving you the desired outcome is because you are trying to append an empty dataset every time after you import an Excel sheet. This is an error in logic.


EDIT 2:

The output generated:

. clear all

. local datasets "auto.dta auto2.dta"

. 
. foreach dataset in `datasets'{
  2.     display "`dataset' loaded"
  3.     sysuse `dataset', clear
  4.     generate data_name = "`dataset'"
  5.     tab data_name
  6.     export excel using "auto_excel.xlsx", sheet("`dataset'") first(variables) sheetreplace
  7.     display "`dataset' saved in excel"
  8. } 
auto.dta loaded
(1978 Automobile Data)

  data_name |      Freq.     Percent        Cum.
------------+-----------------------------------
   auto.dta |         74      100.00      100.00
------------+-----------------------------------
      Total |         74      100.00
file auto_excel.xlsx saved
auto.dta saved in excel
auto2.dta loaded
(1978 Automobile Data)

  data_name |      Freq.     Percent        Cum.
------------+-----------------------------------
  auto2.dta |         74      100.00      100.00
------------+-----------------------------------
      Total |         74      100.00
file auto_excel.xlsx saved
auto2.dta saved in excel

. 
. import excel "auto_excel.xlsx", desc

      Sheet | Range
  ----------+----------
   auto.dta | A1:M75
  auto2.dta | A1:M75

. 
. local worksheets `r(N_worksheet)'

. display `worksheets'
2

. 
. forvalues i = 1 / `worksheets' { 
  2.     display " Sheet number  `i'"
  3.     local shtname`i' `r(worksheet_`i')'
  4.     display "loading database: `shtname`i''"
  5.     import excel "auto_excel.xlsx", sheet("`shtname`i''")  clear firstrow
  6.     if `i' == 1 save "test_data.dta", replace
  7.     display "database: `shtname`i'' loaded"
  8.     if `i' > 1 {
  9.         append using "test_data.dta", force
 10.         save "test_data.dta", replace
 11.     }
 12.     display "database: `shtname`i'' appended"        
 13. }
 Sheet number  1
loading database: auto.dta
file test_data.dta saved
database: auto.dta loaded
database: auto.dta appended
 Sheet number  2
loading database: auto2.dta
database: auto2.dta loaded
(note: variable rep78 was byte in the using data, but will be str9 now)
file test_data.dta saved
database: auto2.dta appended

. 
. use "test_data.dta", clear

. tab data_name

  data_name |      Freq.     Percent        Cum.
------------+-----------------------------------
   auto.dta |         74       50.00       50.00
  auto2.dta |         74       50.00      100.00
------------+-----------------------------------
      Total |        148      100.00

. 
end of do-file