
My question's title may be a little bit ambiguous.

Previously, I wanted to "acquire a complete list of subdirs" and then read the files in those subdirs into Stata (see this post and this post).

Thanks to @Roberto Ferrer's great suggestion, I almost managed to do this. But then I encountered another problem: because I have so many separate files, the local macro holding the list seems to hit its upper bound on length. After the command local n: word count `filelist', Stata sends an error message:

macro substitution results in line that is too long.

The line resulting from substituting macros would be longer than allowed. The maximum allowed length is 645,216 characters, which is calculated on the basis of set maxvar. You can change that in Stata/SE and Stata/MP. What follows is relevant only if you are using Stata/SE or Stata/MP.

The maximum line length is defined as 16 more than the maximum macro length, which is currently 645,200 characters. Each unit increase in set maxvar increases the length maximums by 129. The maximum value of set maxvar is 32,767. Thus, the maximum line length may be set up to 4,227,159 characters if you set maxvar to its largest value.

r(920);
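From the message, I gather the limit can be raised under Stata/SE or Stata/MP. A minimal sketch of that, assuming no data are in memory (set maxvar cannot be changed otherwise):

clear                 // set maxvar requires that no data be in memory
display c(macrolen)   // current maximum macro length
set maxvar 32767      // each unit increase adds 129 characters to the maximum
display c(macrolen)   // confirm the new ceiling

Still, raising the cap only postpones the problem.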

When I reduce the number of subdirs to 5, Stata works fine. Since I have roughly 100 subdirs, I would have to repeat the process about 20 times. That is manageable, but I still want to know whether I can fully automate it: more specifically, "exhaust" the maximum allowable macro length, import those files, and then move on to the next group of subdirs.

Below you can find my code:

//====================================
//=== read and clean projects data ===
//====================================
version 14
set linesize 80
set more off

clear
macro drop _all
set linesize 200
cd G:\Data_backup\Soufang_data


*----------------------------------
* Read all files within dictionary
*----------------------------------


* Import the first of the worksheets 1:"项目首页" (project home page) 2:"项目概况" (project overview) 3:"成交详情" (transaction details)
* worksheet1

filelist, directory("G:\Data_backup\Soufang_data") pattern(*.xlsx)

* pattern(*.xlsx) prevents importing other file types (.doc or .dta)

* tag only the directories whose last path component is "house"
gen tag = substr(reverse(dirname),1,6) == "esuoh/"
keep if tag==1
gen path = dirname+"\"+filename
qui valuesof path if tag==1
local filelist = r(values)

* the fourth path component is the city name
split dirname, parse("\" "/")
ren dirname4 citylist
drop dirname1-dirname3 dirname5
qui valuesof citylist if tag==1
local city = r(values)

local count = 1
local n:word count `filelist'

  forval i = 1/`n' {
      local file    : word `i' of `filelist'
      local cityname: word `i' of `city'

      ** don't add .xlsx after `file'; the suffix is already part of the name
      ** quote "`file'" rather than `file' so that paths containing spaces still work
      qui import excel using "`file'", clear
      cap qui sxpose,clear
      cap qui drop in 1/1
      gen city = "`cityname'"


      if `count'==1 {
          save house.dta,replace emptyok
         }
      else          {
          qui append using house
          qui save house.dta,replace emptyok
          }

      local ++count
      } 
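(filelist, valuesof, and sxpose above are user-written commands; as far as I know, all three can be installed from SSC.)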

Thank you.


1 Answer


You do not need to store the whole list of files in a macro. filelist creates a dataset of the files you want to work with; just save it and reload it, one observation at a time, for each file you want to process. You are also using a very inefficient way to append datasets: as the appended dataset grows, the cost of reloading and saving it becomes very high and can slow the whole process to a crawl.

Here's a sketch of how to process your Excel files:

* build the dataset of target files and save it for reuse
filelist, directory(".") pattern(*.xlsx)
save "myfiles.dta", replace

* number of files to process
local n = _N

forval i = 1/`n' {

    * reload only observation `i' to recover this file's path
    use in `i' using "myfiles.dta", clear

    local f = dirname + "/" + filename

    qui import excel using "`f'", clear

    * park the imported data in a tempfile; no growing master file yet
    tempfile res`i'
    save "`res`i''"

} 

* combine all the pieces in a single pass
clear
forval i = 1/`n' {

    append using "`res`i''"

} 

save "final.dta", replace