
I'm currently importing Stata-created .dta files into SAS with the following:

proc import datafile='myfile.dta' out=test dbms=dta replace;
run;

To save space and bandwidth when backing up files, I'd like to keep only compressed versions of the .dta files. Can I read a compressed .dta file "on the fly" with SAS?

I've tried:

filename foo pipe 'gunzip -c myfile.dta.gz';

proc import datafile=foo  out=test dbms=dta replace;
run;

but SAS says ERROR: Random access not allowed.

I also tried PROC CIMPORT, but it reads transport files created by PROC CPORT, not .dta files. I'm sure I could use X commands to unzip the file at the top of the program and delete it at the bottom, but I was hoping for a cleaner solution, since I'll be asking about 50 other SAS/Stata/R programmers to implement this.

We are running SAS 9.2 TS2M3 on 64-bit Linux.

UPDATE

@Joe provided some nice insight on why proc import doesn't work with a pipe for .dta files, and suggested a "temporary unzipping".

SAS

I plan on putting this in a macro so users can import a dta.gz with a simple macro call.

* import file ;
x gunzip -c /home/banjer/data/myfile.dta.gz > /home/banjer/data/myfile.dta ;

proc import datafile="/home/banjer/data/myfile.dta" out=mydata dbms=dta replace;
run;

* delete temp uncompressed file ;
x rm /home/banjer/data/myfile.dta ;


* save file ;
proc export data=mydata
  outfile="/home/banjer/data/jtest.dta"
  dbms=dta replace;
run;

x gzip /home/banjer/data/jtest.dta ;
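The macro plan mentioned above might look something like the following minimal sketch. The macro names %gz_import and %gz_export, their parameters, and the choice of the WORK library's directory for the temporary .dta file are all hypothetical. Like the code above, it assumes X commands are allowed (the XCMD system option) and that gzip/gunzip are on the PATH.

```sas
/* Hypothetical helper macros: names, parameters, and the temp-file location
   are illustrative assumptions. They need XCMD enabled and gzip on the PATH. */

/* Decompress a .dta.gz to a scratch file, import it, then delete the scratch copy */
%macro gz_import(gzfile=, out=);
  %local tmpdta;
  /* Per-session scratch location inside the WORK library directory */
  %let tmpdta=%sysfunc(pathname(work))/_gz_import_tmp.dta;

  /* gunzip -c writes to stdout, so the .gz file itself is left untouched */
  x "gunzip -c '&gzfile' > '&tmpdta'";

  proc import datafile="&tmpdta" out=&out dbms=dta replace;
  run;

  /* Remove the uncompressed scratch copy */
  x "rm -f '&tmpdta'";
%mend gz_import;

/* Export a dataset to .dta, then compress it in place to a .dta.gz file */
%macro gz_export(data=, dtafile=);
  proc export data=&data outfile="&dtafile" dbms=dta replace;
  run;

  /* gzip replaces the .dta with a .dta.gz; -f overwrites an older .gz */
  x "gzip -f '&dtafile'";
%mend gz_export;
```

A call would then look like %gz_import(gzfile=/home/banjer/data/myfile.dta.gz, out=mydata); or %gz_export(data=mydata, dtafile=/home/banjer/data/jtest.dta);. Putting the scratch file in WORK keeps it out of the data directory, and SAS normally deletes WORK when the session ends. Paths containing single quotes would break the shell quoting used here.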

Stata

I found two Stata modules for using and saving gzipped files: guse and gsave. Note that the trailing ".gz" must be left off the filename, which is a little annoying. The plus side is that if myfile.dta is not compressed, guse will still read it in. This allows our analysts to replace any existing use and save commands with guse/gsave.

// import
guse "/home/banjer/data/myfile.dta"  

// save
gsave "/home/banjer/data/myfile.dta"  
This doesn't have anything to do with R. (Hong Ooi)
It's unclear to me whether he intends to ask the same question for R or already has a different solution in mind for R (see the last full paragraph). (Joe)
Thanks; I should not have included the r tag yet. I may ask the same question about R later, and I didn't want to pack too many questions into one. (Banjer)
Curious. Can you use the COMPRESS option in Stata? Would that help sufficiently, and would the result be readable by SAS? (Joe)
Note that compress is a command, not an option, in Stata: it may compress a dataset to produce another. Once saved, such a dataset is just as readable as any other dataset, because the same .dta file format is used. compress in Stata has nothing to do with any kind of zip compression. (Nick Cox)

Answer

I don't believe there is a way to do this directly. If you had a text file, the pipe method you tried would work easily. However, PROC IMPORT with any DBMS other than CSV or TAB uses random access (i.e., it moves back and forth within the file rather than reading it sequentially), so it won't work with a byte stream coming from a pipe.
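For comparison, the sequential case does work. The following is a minimal sketch of reading a gzipped CSV through the same kind of pipe with a DATA step. The file name example.csv.gz, its two columns (id and score), and building the file in the WORK directory are assumptions made only so the example runs on its own; it also assumes X commands are allowed (XCMD).

```sas
/* Build a small gzipped CSV in the WORK directory so the example is
   self-contained; in practice this would be an existing .csv.gz file */
%let csvgz=%sysfunc(pathname(work))/example.csv.gz;
x "printf 'id,score\n1,10\n2,20\n' | gzip -c > '&csvgz'";

/* The pipe delivers the decompressed bytes as a sequential stream */
filename gzcsv pipe "gunzip -c '&csvgz'";

data work.fromcsv;
  /* DSD handles comma-delimited fields; FIRSTOBS=2 skips the header row */
  infile gzcsv dlm=',' dsd firstobs=2 truncover;
  input id score;
run;

filename gzcsv clear;
```

Because INFILE consumes the pipe strictly front to back, it never needs to seek, which is exactly what PROC IMPORT with DBMS=DTA cannot avoid.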

You could write your own Stata file reader, but that sounds beyond the scope of your project. (The .dta format is not that hard to parse, so you could probably read it as a byte stream, but it would still likely be weeks of work.) If you do want to attempt this, I can point you to the documentation you would need.

The simplest option, IMO, is to gunzip to a temporary location, read it, and then delete the temporary file.