0
votes

I have a number of csv files in a folder. They all have the same structure (3 columns). The SAS code below imports all of them into one dataset. It includes the 3 columns plus their file name.

My challenge is that the filename variable includes the directories and drive letter (e.g. 'Z:\DIRECTORYA\DIRECTORYB\file1.csv'). How can I just list the file name and not the path (e.g. file1.csv)? Thank you

data WORK.mydata;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
length FNAME $80.; 
infile 'Z:\DIRECTORYA\DIRECTORYB\*2012*.csv' delimiter = ',' MISSOVER DSD lrecl=32767 filename=fname firstobs=2;
informat A $26. ;
informat B $6. ;
informat C 8. ;
format A $26. ;
format B $6. ;
format C 8. ;
input
    A $
    B $
    C;
if _ERROR_ then call symputx('_EFIERR_',1);
filename=fname;
run;
2

2 Answers

0
votes

I think, your best bet is to use regular expressions. Add to your DATA stel:

reg1=prxparse("/\\(\w+\.csv)/");
if prxmatch(reg1, filename) then
                          filename=prxposn(reg1,1,filename);
0
votes

We can try break this into two data steps. We'll extract the filenames into one data set in the first data step. In the second data step, we'll slap on the filenames (incl. the .txt or .csv) to their respective observations in the combined data set.

We'll use the PIPEing method or PIPE statement, DIR command and /b.

For example, if I have three .txt files: example.txt, example2.txt and example3.txt

%let path = C:\sasdata;
filename my pipe 'dir "C:\sasdata\*.txt"/b ';
data example;
    length filename $256;
    infile my length=reclen;
    input filename $varying256. reclen;
run;

data mydata;
    length filename $100;
    set example;
    location=cat("&path\",filename);
    infile dummy filevar=location length=reclen end=done missover;
    do while (not done);
        input A B C D;
        output;
    end;
run;

Output of first first data step:

                                     filename

                                   example.txt
                                   example2.txt
                                   example3.txt

Output of second data step:

                     filename       A      B       C        D

                   example.txt     171    255    61300    79077
                   example.txt     123    150    10300    13287
                   example2.txt    250    255    24800    31992
                   example2.txt    132    207    48200    62178
                   example2.txt    243    267    25600    33024
                   example3.txt    171    255    61300    79077
                   example3.txt    123    150    10300    13287
                   example3.txt    138    207    47400    61146

In Windows, this would read all the .txt files in the folder. It should work for .csv files as well as long as you add the delimiter=',' in the infile statement in the second data step and change the extension in filename statement to *.csv. Cheers.