0
votes

I am unsure if this is possible (or stupid question), as I just started looking at SAS last week. I've managed to import my .CSV file to a SAS data set using the:

proc import

Specifying the guessingrows= to limit my out=.

My problem is now that my CSV files to import are not of same structure, which I noticed after writing some code using the obsnum= to specify start and x-lines to read.

So my question is wether or not SAS is capable of either look for a specific string/empty variable, and use as end observation?

My Data looks like (but number of Var_x varies for each file):

Structure of data

First I tried looking at the slice= but is only useful if I know the exact Places of interest, as the empty Space between the Groups can vary.

Is it possible to use the set function to specify to start at line 1 and read till encounting a blank field? Or can you redirect me to some function (that I couldn't find myself)?

I would like to look at each "block" separately and process.

Thank you in advance

1
Depending on your data it may be better to write your own import step using an in file statement instead. Perhaps a better sample of what your initial data looks like to start with and what you want as output is a better question to answer.Reeza
PROC IMPORT can't do this. You'd definitely have to write your own import code (not that that is hard).Joe
I see... I just picked up SAS last week, so my knowledge is very limited. So thank you for pointing out. As I am having a CSV file I assumed proc import would be beneficial. Will do some research on the documentation for in file then!user1865820

1 Answers

0
votes

I think you can do this in a relatively straightforward way if you are comfortable doing some processing after all the data has been inputted.

So do proc import on the whole dataset with no restriction.

Then use a data step and a counter to process through the data and output as necessary. Something like:

data output1 output2 output3;
     set imported_data;
     if _n_ = 1 then counter = 1;
     var1lag = lag(var1);
     if var1 = '' and var1lag ne '' then counter=counter+1;
     if counter = 1 then output output1;
     else if counter = 2 then output output2;
     else output output3;
run;

data output1;
     set output1;
     if var1 = '' and var2 = . and var3 = . then delete;
run;
data output2;
     set output2;
     if var1 = '' and var2 = . and var3 = . then delete;
run;
data output3;
     set output3;
     if var1 = '' and var2 = . and var3 = . then delete;
run;

The above code outputs to three datasets based on the value of counter. The lag function lets us look up a row to ensure its the first time we see no data and updates the counter as we see no data.

Then we go back and remove any fully blank data for our datasets.

You could easily use some arrays to make this work more scaleably if you have many outputs instead of the if/else statements to output the data.