
SAS is my primary software for this project, and I am using it to call 7-Zip. Both SAS and 7-Zip are 64-bit versions. The objective is to read compressed US NOAA weather station data for 4 years: about 4,000 stations, so approximately 16,000 files.

Each file contains multiple variable-length, undelimited records, with each record containing date, time, and weather information for a given station (e.g., temperature, visibility, precipitation, and so on). This problem is not about reading the records. It's about reading the files.

These files are not stored using an extension of any type, e.g., there is no *.txt, *.dat, *.gz, *.tar. Nothing defining a file type is used in their naming. I have checked this by turning the ‘File Name Extension’ option on and off in Windows File Explorer. File extensions appear for other information but not for the NOAA files. Each file name has 3 fields. The first two fields define a unique NOAA weather station (USAF and WBAN respectively) and the last field is the year. Here are some representative file names:

702120-26646-2011

702120-26646-2012

702120-26646-2013

etc.

Since the files are compressed, I am using 7zip to uncompress them with a macro call from SAS. Here is the SAS macro syntax for these calls:

%do year=2011 %to 2014;
filename in pipe "c:/7-zip/7z.exe x
""C:\data\stuff\weather\data\extracted.zip\extracted\&&usaf-&&wban-&&year\"" -so" lrecl=3000;
run;
%end;

And here is resolved code for 1 file:

MLOGIC(LOOPS): %DO loop beginning; index variable YEAR; start value is 2011; stop value is 2014; by value is 1.

SYMBOLGEN: Macro variable USAF resolves to 702120

SYMBOLGEN: Macro variable WBAN resolves to 26646

SYMBOLGEN: Macro variable YEAR resolves to 2011

MPRINT(LOOPS): filename in pipe "c:/7-zip/7z.exe x ""C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011\"" -so" lrecl=3000;

MPRINT(LOOPS): run;

NOTE: The infile IN is:

Unnamed Pipe Access Device,

PROCESS=c:/7-zip/7z.exe x

"C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011\" -so, RECFM=V,LRECL=3000

Stderr output:

ERROR: The system cannot find the path specified.

C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011

System ERROR:

The system cannot find the path specified.

The log shows no syntax errors, but the file cannot be found.

What confuses me is that the file exists in the folder exactly as specified by the resolved path.

Is there a mistake in the 7-Zip call that keeps it from finding the file? For instance, should a 7-Zip option other than “-so” be used?

What else could be going wrong here? Any suggestions are most welcome!

Did you try running those commands directly from the prompt to see if it provided any hints as to the issue? - Robert Penridge

Also, while SAS may not care whether you use / or \ the operating system will definitely care. Not sure how the pipe command resolves them but you may just want to change all path references to use \ to be sure. - Robert Penridge

Assuming Windows 10... Click on the Start button in Windows. Type in cmd and select the "Command Prompt" option. From the prompt that appears, try to run the command you are trying to run within SAS. If it works in the prompt, it should work in SAS. If it doesn't work, it should provide you additional feedback. It won't work from the prompt unless you correct the / chars to \ so I'd definitely recommend doing that to start with. - Robert Penridge

@RobertPenridge Interesting suggestion. The first attempt returned 'System ERROR: The directory name is invalid.' for this command:
c:\7-zip\7z.exe x ""C:\data\stuff\weather\data\extracted.zip\extracted\"" -so "
The second attempt:
c:\7-zip\7z.exe x ""C:\data\stuff\weather\data\extracted.zip\extracted\423630-99999-2013\"" -so "
returned this: Command Line Error: Empty file path. To me this suggests that 7zip thinks the NOAA file name is another folder? - DJohnson

Yeah, basically all we're trying to do is remove SAS from the equation and get the command working from the operating system first. Once that works, it's simply a case of substituting it into your SAS code. Try removing one set of double quotes:
c:\7-zip\7z.exe x "C:\data\stuff\weather\data\extracted.zip\extracted\" -so
- Robert Penridge

1 Answer


SAS (9.4 and later) supports reading ZIP files directly through the FILENAME ZIP access method, so no external call to 7-Zip is needed. Try something like this:

filename in zip "C:\data\stuff\weather\data\extracted.zip"
   member="&usaf.-&wban.-&year." 
   lrecl=3000
;
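
To fill this in a little, here is a minimal sketch of how that fileref might be used, assuming SAS 9.4 or later, where the FILENAME ZIP access method is available. The first step lists the member names stored in the archive. The path in the question suggests the files may sit under an extracted/ folder inside the ZIP, and if the listing confirms that, the member= value would need that prefix. The macro name read_years, the data set names work.zip_members and work.wx_<year>, and the single record variable are illustrative assumptions, not part of the original code.

```sas
/* Step 1: list the members of the archive to see their exact names. */
filename inzip zip "C:\data\stuff\weather\data\extracted.zip";

data work.zip_members(keep=memname);
  length memname $200;
  fid = dopen("inzip");            /* open the ZIP as if it were a directory */
  if fid = 0 then stop;            /* archive could not be opened */
  do i = 1 to dnum(fid);           /* one iteration per archive member */
    memname = dread(fid, i);
    output;
  end;
  rc = dclose(fid);
run;

filename inzip clear;

/* Step 2: read one member per year as raw text lines. */
/* Hypothetical macro; parameter names are for illustration only. */
%macro read_years(usaf=, wban=, first=2011, last=2014);
  %local year;
  %do year = &first %to &last;
    /* Assumption: if Step 1 shows names like extracted/702120-26646-2011, */
    /* add that folder prefix to the member= value below.                  */
    filename in zip "C:\data\stuff\weather\data\extracted.zip"
      member="&usaf.-&wban.-&year."
      lrecl=3000;

    data work.wx_&year;
      infile in truncover;         /* tolerate variable-length records */
      input record $char3000.;     /* whole line into one variable */
    run;

    filename in clear;
  %end;
%mend read_years;

%read_years(usaf=702120, wban=26646)
```

The record variable only holds each raw line. The actual INPUT statement for the NOAA record layout would replace it once the members are confirmed to be readable.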