0
votes

I have the following problem, and I don't really know where to begin. I have a folder called "ALL" and inside that folder there are sub-folders with titles equal to the date they were created in the format DD-MM-YYYY. There is a folder for every day, ie no missing days. Inside each of those folders there are numerous txt files. I would like to read one of these text files from each of the date folders. That file will have a naming convention of "thedata_" followed by a random series of numbers.

So for example, if there are 3 date folders in the ALL folder, then I would like to read 3 separate "thedata_" text files into 1 final SAS file. And subsequently each day a new folder is added, I want to append the "thedata_" file from that folder to the existing SAS file rather than rerun the script from scratch.

1
First create a list of the files. Search on here or communities.sas.com for programs that will crawl through a directory and list all the files. Then you filter that list for the files you want to read based on your rules using a data step. This will leave you with the list of files to import. The filevar option on the INFILE statement allows you to change input files dynamically and read from all the files at once. There's a rough example in the documentation.Reeza

1 Answers

1
votes

Here's one solution. This uses SAS functions to read and populate a dataset that reads every file in every folder so that you do not need to turn on x commands. You can save each one to a macro variable, then loop through and read each file however you'd like. You can modify this to work with the filevar option.

filename all "Directory/ALL";

data myfiles;
    length folder_name 
           file_name  
           file
           folder_path $5000.
    ;

    /* Folder delimiter */
    if("&sysscp." = "WIN") then SLASH = '\';
        else SLASH = '/';

    /* Open the ALL directory */
    did = dopen("all");

    /* If it was successful, continue */
    if(did) then do;  

        /* Iterate through all subfolders in ALL */
        do i = 1 to dnum(did);

            /* Get the subfolder name and full path */
            folder_name = dread(did, i);
            folder_path = cats(pathname('all'), SLASH, folder_name);

            /* Assign a filename statement to the subfolder */
            rc = filename('sub', folder_path);
            
            /* Give the sub-folder a a directory ID */
            did2 = dopen('sub');

            /* Open the subfolder and read all the .txt files within it */
            if(did2) then do;
                do j = 1 to dnum(did2);

                    file_name = dread(did2, j);
                    file_ext  = scan(file_name, -1, '.');
                    file      = cats(folder_path, SLASH, file_name);
                    
                    /* Save file name only if the expected value is found */
                    if(upcase(file_name) =: "THEDATA_" AND upcase(file_ext) = "TXT") then do;
                        nfiles+1;
                        call symputx(cats('file', nfiles), file); /* Save each file to a macro variable named file1, file2, etc. */
                        output;
                    end;
                end;
            end;

            /* Close the subfolder and move on to the next one */
            rc = dclose(did2);
        end;

    end;

    rc = dclose(did);

    /* Save the total number of files we found to a macro variable */
    call symputx('nFiles', nFiles);

    keep file file_name folder_name folder_path;
run;

/* Read all the files */
%macro readFiles;
    %do i = 1 %to &nFiles.;
        proc import 
            file = "&&file&i."
            out  =  _thedata_&i.
            dbms =  csv
            replace;
            guessingrows=max;
        run;
    %end;

    /* Put all the files together */
    data thedata;
        set _thedata_:;
    run;

    proc datasets lib=work nolist;
        delete _thedata_:;
    quit;
%mend;
%readFiles;