0
votes

I'm trying to find a way to capture into a variable the name of a file what I'm importing in SAS as I want to use it to build the name of an exported file. For exemple, I have the file TEST.xlsx in the directory c:\TEMP, what I want to import in SAS and after some manipulation of data to export the result as TEST-01.xlsx. Can someone, please, help me to do that?

Thank you, Dan

PROC IMPORT DATAFILE= "c:\TEMP\*.xlsx" DBMS=xlsx out=TABLE_START  REPLACE;          
RUN;
3
How are you importing your file? What does that code look like?Reeza
@Reeza: My code for import is: PROC IMPORT DATAFILE= "c:\TEMP\*.xlsx" DBMS=xlsx out=TABLE_START REPLACE; RUN;DanL
Can you first identify the file you're importing before import, instead of using the wildcard there? Like with a DIR command or something?Joe
In the future please edit the question with your code, do not include it in comments.Reeza
Is there one and only one file that matches the wildcard in the subdirectory? How should the code work if there are multiple files that match the wildcard?Len Greski

3 Answers

1
votes

Just find the name first.

data _null_;
  length fname $300 ;
  infile "c:\TEMP\*.xlsx" filename=fname;
  input @;
  call symputx('fname',fname);
  stop;
run;

Then you can use the filename in your IMPORT or other steps.

PROC IMPORT DATAFILE= "&fname" DBMS=xlsx out=TABLE_START  REPLACE;          
RUN;
0
votes

I didn't know you could use wildcards in PROC IMPORT. As far as I can see, there isn't an easy way to capture that information when using this method of importing data. Wild cards would typically be used in an INFILE statement and you're trying to read multiple files at once. If you have multiple XLSX files in the folder, I have no idea of which it would read either, it seems to be the first I think. And unfortunately XLSX doesn't add any more information to the log either.

I think this means you need to change your approach entirely. I would recommend this approach if you need the filename later on:

%let myfile = 'path to your excel file';

proc import out=want datafile=&myFile. dbms=xlsx replace; run;

data want2;
   set want;
   source = &myfile;
run;

If this was not Excel files there would be other options but sadly it is XLSX. If you were importing multiple CSV files for example there's the FILEVAR and FILENAME options.

If it was in the log, you could capture that in a roundabout way, but that doesn't seem to work either. This is a bit strange to me, so I'm going to repost it over on communities.sas.com to see if SAS or other power users have a suggestion.

0
votes

The OP isn't clear as to whether one or more Excel files must be processed.

Another approach to answer this question is to reference the file by using the PIPE option in the FILENAME statement to pipe the results of an operating system command, such as ls in Linux or dir in Windows.

If there are multiple files matching the wildcard as listed in the OP, and each file must be processed, one must use a technique that assigns multiple file names to SAS macro variables.

As example data I'll use Excel versions of the data from the Kaggle Pokémon with Stats database from Alberto Barradas. I saved three Excel files containing Generation 1, Generation 2, and all Pokémon to a subdirectory on a Windows-based computer.

To import the set of files, we begin by generating the list of file names, and reading it into a SAS data set.

%let dirname = /folders/myshortcuts/sf_gitrepos/pokemonData;
filename DIRLIST pipe "dir /B &dirname\*.xlsx";

data dirlist;
   length fname $256;
   infile dirlist length = reclen;
   input fname $varying256. reclen;
run;

Note, if your SAS installation is in LOCKDOWN mode, it is restricted from issuing operating system commands. In this case, you'll have to go out to the operating system and issue a command to generate the directory listing and save it to a file. In Windows it looks like:

dir /b *.xlsx > excelfiles.txt

Note that the /b option means "bare", and prints the directory listing without summary information. Since I'm using the SAS University version for this answer, I had to use the following code to read the file names into SAS.

data dirlist;
   length fname $256;
   infile "/folders/myshortcuts/sf_gitrepos/pokemonData/excelfiles.txt" length=reclen;
   input fname $varying256. reclen;
run;

Output from the SAS Log:

1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         data dirlist;
 74            length fname $256;
 75            infile "/folders/myshortcuts/sf_gitrepos/pokemonData/excelfiles.txt" length=reclen;
 76            input fname $varying256. reclen;
 77         run;

 NOTE: The infile "/folders/myshortcuts/sf_gitrepos/pokemonData/excelfiles.txt" is:
       Filename=/folders/myshortcuts/sf_gitrepos/pokemonData/excelfiles.txt,
       Owner Name=root,Group Name=vboxsf,
       Access Permission=-rwxrwx---,
       Last Modified=20May2018:03:56:41,
       File Size (bytes)=38

 NOTE: 3 records were read from the infile "/folders/myshortcuts/sf_gitrepos/pokemonData/excelfiles.txt".
       The minimum record length was 10.
       The maximum record length was 12.
 NOTE: The data set WORK.DIRLIST has 3 observations and 1 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.01 seconds
       cpu time            0.01 seconds

Next, we'll generate one SAS macro variable for each file name using PROC SQL's INTO feature as explained in William Murphy's Changing Data Set Variables to Macro Variables article from the 2007 SAS Users Conference.

proc sql noprint;
   select count(*) into :NObs from dirlist;
   select fname into :Name1-:Name%left(&NObs) from dirlist;
run;

...and the output from the SAS log:

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         proc sql;
 74            select count(*) into :NObs from dirlist;
 75            select fname into :Name1-:Name%left(&NObs) from dirlist;
 MPRINT(LEFT):  Name3
 76         run;
 NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
 77         
 78         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 91

Finally, we'll write and execute a SAS macro to iteratively run PROC IMPORT to import the Excel files.

%macro genimport;
    %local i;
    %do i = 1 %to &NObs;
      proc import out=want&i datafile="&dirname/&&Name&i" dbms=xlsx replace; 
      run;
   %end;
%mend;
%genimport;

...and the output:

1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         %macro genimport;
 74             %local i;
 75             %do i = 1 %to &NObs;
 76               proc import out=want&i datafile="&dirname/&&Name&i" dbms=xlsx replace;
 77               run;
 78            %end;
 79         %mend;
 80         %genimport;
 MPRINT(GENIMPORT):   proc import out=want1 datafile="/folders/myshortcuts/sf_gitrepos/pokemonData/gen01.xlsx" dbms=xlsx replace;
 MPRINT(GENIMPORT):   RXLX;
 MPRINT(GENIMPORT):   run;

 NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with 
       options MSGLEVEL=I.
 NOTE: The import data set has 165 observations and 13 variables.
 NOTE: WORK.WANT1 data set was successfully created.
 NOTE: PROCEDURE IMPORT used (Total process time):
       real time           0.06 seconds
       cpu time            0.03 seconds


 MPRINT(GENIMPORT):   proc import out=want2 datafile="/folders/myshortcuts/sf_gitrepos/pokemonData/gen02.xlsx" dbms=xlsx replace;
 MPRINT(GENIMPORT):   RXLX;
 MPRINT(GENIMPORT):   run;

 NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with 
       options MSGLEVEL=I.
 NOTE: The import data set has 106 observations and 13 variables.
 NOTE: WORK.WANT2 data set was successfully created.
 NOTE: PROCEDURE IMPORT used (Total process time):
       real time           0.05 seconds
       cpu time            0.04 seconds


 MPRINT(GENIMPORT):   proc import out=want3 datafile="/folders/myshortcuts/sf_gitrepos/pokemonData/Pokemon.xlsx" dbms=xlsx replace;
 MPRINT(GENIMPORT):   RXLX;
 MPRINT(GENIMPORT):   run;

 NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with 
       options MSGLEVEL=I.
 NOTE: The import data set has 800 observations and 13 variables.
 NOTE: WORK.WANT3 data set was successfully created.
 NOTE: PROCEDURE IMPORT used (Total process time):
       real time           0.16 seconds
       cpu time            0.13 seconds

To confirm that we've read the files into SAS, we can view one of the resulting SAS data sets in the SAS Studio Output Data Viewer.

enter image description here

At this point, the macro variables &Name1 etc. can be parsed with %scan() to obtain the name of the file and used to write output files once they have been further manipulated via a DATA step.