I'm currently importing Stata-created .dta files into SAS with the following:
proc import datafile='myfile.dta' out=test dbms=dta replace;
run;
To save space and bandwidth when backing up files, I'd like to keep only compressed versions of the .dta files around. Can I read a compressed .dta file "on-the-fly" with SAS?
I've tried:
filename foo pipe 'gunzip -c myfile.dta.gz';
proc import datafile=foo out=test dbms=dta replace;
run;
but SAS says ERROR: Random access not allowed.
I also tried proc cimport, but that doesn't seem to support .dta files. I'm sure I could use x commands to unzip the file and then delete it at the bottom of the program, but I was hoping for a cleaner solution, since I'll be asking about 50 other SAS/Stata/R programmers to implement this.
We are running SAS 9.2 TS2M3 on 64-bit Linux.
UPDATE
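@Joe provided some nice insight into why proc import doesn't work with a pipe for .dta files (importing a .dta file needs random access to it, which a sequential pipe can't provide, hence the error above), and suggested a "temporary unzipping".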
SAS
I plan on putting this in a macro so users can import a .dta.gz file with a simple macro call.
* import file ;
x gunzip -c /home/banjer/data/myfile.dta.gz > /home/banjer/data/myfile.dta ;
proc import datafile="/home/banjer/data/myfile.dta" out=mydata dbms=dta replace;
run;
* delete temp uncompressed file ;
x rm /home/banjer/data/myfile.dta ;
* save file ;
proc export data=mydata dbms=dta
outfile="/home/banjer/data/jtest.dta"
replace;
run;
* -f lets gzip overwrite a jtest.dta.gz left by an earlier run ;
x gzip -f /home/banjer/data/jtest.dta ;
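Since the plan is to wrap this in a macro, here is a minimal sketch of what that might look like. It assumes the XCMD system option is enabled (the x commands above already require it) and that gzip/gunzip are on the PATH. The macro names %import_dtagz and %export_dtagz, their parameters, and the temporary file name _import_tmp.dta are hypothetical, chosen for illustration. Writing the uncompressed copy to the WORK directory (via %sysfunc(pathname(work))) keeps it out of the shared data directory, so two users importing the same file in separate sessions don't overwrite each other's temporary copy.

```sas
/* Hypothetical helper macros (illustrative names, not part of SAS).    */
/* Assumes XCMD is enabled and gzip/gunzip are available on the PATH.  */

/* Import a gzipped Stata file: decompress into this session's WORK    */
/* directory, run PROC IMPORT on the copy, then delete the copy.       */
%macro import_dtagz(gzfile=, out=);
  %local tmpdta;
  %let tmpdta=%sysfunc(pathname(work))/_import_tmp.dta;

  /* Decompress to the temporary .dta file */
  x "gunzip -c '&gzfile' > '&tmpdta'";

  proc import datafile="&tmpdta" out=&out dbms=dta replace;
  run;

  /* Remove the temporary uncompressed copy once the import has run */
  x "rm -f '&tmpdta'";
%mend import_dtagz;

/* Export a SAS data set to Stata format, then gzip it in place.       */
/* -f lets gzip overwrite an existing .gz from a previous run.          */
%macro export_dtagz(data=, dtafile=);
  proc export data=&data outfile="&dtafile" dbms=dta replace;
  run;

  x "gzip -f '&dtafile'";
%mend export_dtagz;
```

With the paths from the question, a call would look like %import_dtagz(gzfile=/home/banjer/data/myfile.dta.gz, out=mydata); and %export_dtagz(data=mydata, dtafile=/home/banjer/data/jtest.dta); the latter leaving only jtest.dta.gz behind.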
Stata
I found two Stata modules here for using and saving gzipped files. The commands are guse and gsave. Note that the trailing ".gz" needs to be left off, which is a little annoying. The plus side is that if myfile.dta is not compressed, guse will still read it in. This allows our analysts to replace any existing use and save commands with guse/gsave.
// import
guse "/home/banjer/data/myfile.dta"
// save
gsave "/home/banjer/data/myfile.dta"
r tag yet. I may be asking the same question re: R later, so I didn't want to ask too many questions in one. – Banjer
COMPRESS option in Stata? Would that help sufficiently, and would it be readable by SAS? – Joe
The compress command [NB] within Stata may compress a dataset to produce another. Once saved, such a dataset is just as readable as any other dataset; the same file format is used for a .dta file. compress in Stata has nothing to do with any compression using any kind of zip. – Nick Cox