1
votes

I found a macro and have been using it to import datasets that are given to me in csv format. Now I need to edit it because I have datasets that have an id number in them and I want sas datasets with the same name.

THE csvs are named things like IDSTUDY233_first.csv So I want the sas dataset to be IDSTUDY233_first. It should appear in my work folder.

I thought it would just create a sas dataset for each csv named IDSTUDY233_first or something like that. (and so on and so forth for each additional study). However it's naming this way. IDSTUDY_FIRST and over rights itself for every ID. I am newer to macros and have been trying to figure out WHY it does this and how to fix it. Suggestions?

%let subdir=Y:\filepath\; *MACRO VARIABLE FOR FILEPATH;
filename dir  "&subdir.*.csv "; *give the file the name from the path that your at whatever the csv is named;
data new; *create the dataset new it has all those filepath names  csv names;
length filename  fname $ 200;
infile dir  eof=last filename=fname;
input ;
last: filename=fname;
run;

proc sort data=new nodupkey; *sort but don't keep duplicate files;
by filename;
run;

data null;  *create the dataset null;
set new;
call symputx(cats('filename',_n_),filename); *call the file name for this observation n;
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka')); *call the dataset for this file compress then read the file;
call symputx('nobs',_n_); *call for the number of observations;
run;
%put &nobs.; *but each observation in;

%macro import; *start the macro import;
%do i=1 %to &nobs;  *Do for each fie to number of observations; 
proc import datafile="&&filename&i" out=&&dsn&i dbms=csv replace;
                                              getnames=yes;
                                          run;

%end;

%mend import;
%import

*call import macro;

As you can see I added my comments of my understanding. Like I said macros are new to me. I may be incorrect in my understanding. I am guessing the problem is either in

    call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));

or it is in the import statement probably out=&&dsn&i since it rapidly over writes the previous SAS files until it does every one. It's just I need all the sas files not just the last 1.

1

1 Answers

1
votes

My guess is that you are right, it is to do with this line:

call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));

The gotcha is in the arguments passed to compress. Compress can be used to remove or keep certain characters in a string. In the above example, they are using it to just keep alphabetic characters by passing in the 'ka' modifiers. This is effectively causing files with different names (because they have different numbers) to be treated as the same file.

You can modify this behaviour to keep alphabetic characters, digits, and the underscore character by changing the parameters from ka to kn.

This change does mean that you also need to make sure that none of your file names begin with a number (as SAS datasets can't begin with a number).

The documentation for the compress function is here:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm

An easy way to debug this would be to take the dataset with all of the call symput statements, and in addition to storing these values in macro variables, write them to variables in the dataset. Also change it from a data _null_ to a data tmp statement. You can then see for each file what the destination table name will be.