0
votes

I'm new to SAS, and need some assistance with a very basic problem.

  • I have hundreds of individual datasets and I want to extract rows that match a certain diagnostic ID.
  • The diagnostic ID's are listed in columns DX1-DX11. However some datasets have only DX1-DX4, and others have diagnostic IDs within DX1-DX09.
  • So I need a way to determine the number of DX columns in a particular dataset, so that I can match my diagnostic ID's of interest to the ID's presented in the dataset across all appropriate columns.

For example, I wanted to do the following, but I cannot do this across all datasets because if a dataset does not have column DX11 for example, the code breaks, since DX11 is not present in the dataset.

DATA DX_CODES (KEEP = ID DX1 DX2 DX3 DX4 DX5 DX6 DX7 DX8 DX9 DX10 DX11);
                                    SET LIB.&SAS_FILE_N;    
                                        WHERE DX1 IN ('123', '1234', '12345', '123456') OR
                                        DX2 IN ('123', '1234', '12345', '123456') OR
                                        DX3 IN ('123', '1234', '12345', '123456') OR 
                                        DX4 IN ('123', '1234', '12345', '123456') OR
                                        DX5 IN ('123', '1234', '12345', '123456') OR
                                        DX6 IN ('123', '1234', '12345', '123456') OR
                                        DX7 IN ('123', '1234', '12345', '123456') OR
                                        DX8 IN ('123', '1234', '12345', '123456') OR
                                        DX9 IN ('123', '1234', '12345', '123456') OR
                                        DX10 IN ('123', '1234', '12345', '123456') OR
                                        DX11 IN ('123', '1234', '12345', '123456');
                                    RUN;
2
I wrote a macro that lets you search an entire library for a particular string, it's available here: github.com/Boemska/macrocore/blob/master/base/mp_searchdata.sasAllan Bowe

2 Answers

1
votes

Use variable short cut lists instead and restructure your code. DX: will refer to all variable starting with DX.

IF whichc('123', of dx:) or whichc('1234', of dx:)...

You need to use IF instead of WHERE because WHERE does not support the use of short cuts.

Long term, it's faster and easier to work with your data in a long format rather than a wide format then it also doesn't matter where the diagnosis is in the list of diagnoses.

1
votes

SAS will happily create new variables for you. For example if you list them in an array. Just use a subsetting IF instead of WHERE. WHERE needs to operate on the data BEFORE it is loaded into the data step. IF operates on the data in the data step and so can reference variables not included in the source dataset.

data dx_codes;
  set lib.&sas_file_n;
  array dx $5 dx1-dx11;
  keep id dx1-dx11 ;
  if  dx1  in ('123', '1234', '12345', '123456') 
   or dx2  in ('123', '1234', '12345', '123456') 
   or dx3  in ('123', '1234', '12345', '123456') 
   or dx4  in ('123', '1234', '12345', '123456') 
   or dx5  in ('123', '1234', '12345', '123456') 
   or dx6  in ('123', '1234', '12345', '123456') 
   or dx7  in ('123', '1234', '12345', '123456') 
   or dx8  in ('123', '1234', '12345', '123456') 
   or dx9  in ('123', '1234', '12345', '123456') 
   or dx10 in ('123', '1234', '12345', '123456') 
   or dx11 in ('123', '1234', '12345', '123456')
  ;
run;