0
votes

I have a SAS data step statement –

Data work.CABGothers2;  
  set work.CABGothers1;  

  IF proc_p in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
    and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;

  IF proc2 in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
    and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;

  IF proc3 in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
    and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;

 ...
run;

This IF-THEN section goes on 21 times, so you can imagine how HUGE and cumbersome this sas code file gets, especially when it comes to any modifications to the ICD10 code list. It would have to be changed individually in all the proc1,proc2... columns.

Also, the ICD10 lists are very huge with over 7000 codes, I was wondering if someone could show me a better SAS code that might take as input a column of data (ICD10 codes) from a file.

I would like a proc sql or Data step procedure. Whichever is more efficient.

Current code-

Data work.CABGothers2;  
  set work.CABGothers1;  

  IF proc_p in (a HUGE LIST OF ICD10 CODES) and PDDCABG = 1
     and TypeofCABG_PDDTemp = . then TypeofCABG_PDDTemp = 4;
run;

UPDATE--

I got this to work if the list is small...however I have a column with 8000 unique ICD10 codes. So I get an error message as shown below.

proc sql;
select quote(icd10) into :cabgvalexcl separated by ',' 
from newlink.cabgvalexcl2019;
quit;

Data work.test1;
    set WORK.cabgpddcol;

    IF proc_p in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;
    IF oproc1 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1 ;
    IF oproc2 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;
    IF oproc3 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1 ;
    IF oproc4 in (&cabgvalexcl.) and PDDCABG = 1 then CABGVAL_Excl = 1;

run;

**> ERROR message- ERROR: The length of the value of the macro variable

CABGVALEXCL (65540) exceeds the maximum length (65534). The value has been truncated to 65534 characters.**

UPDATE -- eXAMPLE (JUST FEW ROWS) of ONLY 1 column (I do not have multiple columns. I did that in the macro example because macro variable was running out of max space.) containing ICD10 codes and the data file in which I have to tag rows that have any of the ICD10 codes -

ICD10 Code list - single COLUMN

Input data table

OUTPUT table- LOgic - If any of the ICD10 codes listed in cabgvalexcl2019 (shown here in RED) is found in the table CABGOTHERS1, create a column called - EXCLUDE - and put a value of 1 for that record.

Desired output table

1
rather than building out a single huge string of ICD10 codes, why not use a PROC SQL subquery? i.e. "case when proc_p in (select distinct icd10 from icd10col) and PDDCABG....' etc. - Craig Srsen
Why are you using a wide formatted dataset? This format is rarely ever useful in analytical needs. Use long format instead. - Parfait
@ Craig Srsen - I have no duplicates. These are medical diagnosis codes and there are 7000 unique codes. @ Parfait - I do not know what you mean. Please explain what is wide format and long format. Basically, I have a file with 1000s of codes which change very frequently. I would like to be able to read in this file for multiple columns during the search shown in the if-then statement. thanks - Mray
Note that it is impossible to write code using pictures as data. Please post the sample data as text. Preferable as fully formed data steps. - Tom
Read this page, it's an introduction to macros and will help you understand how to simplify your logic: stats.idre.ucla.edu/sas/seminars/sas-arrays - Reeza

1 Answers

0
votes

Here's a hash-based example. It doesn't use macro variables, so it should work for any number of ICD10 codes:

data cabgvalexcl2019;
input (icd1-icd3) (:$2.);
datalines;
1 2 3
4 5 6
7 8 9
;
run;

/*Generate some dummy data*/
data cabgpddcol;
    array keys[*] $2 proc_p oproc1-oproc20;
    call streaminit(1); /*Set random number seed*/
    do i = 1 to 20;
        do j = 1 to dim(keys);
            keys[j] = put(int(rand('uniform') * 11 + 9), 2.); /*Chosen so we get a few rows with no exclusion codes*/
        end;
        PDDCABG = rand('uniform') < 0.75;
        output;
    end;
    drop i j;
run;

/*  CABGval_Excl = Identify CABG+VALVE exclusions which are "CABG OTHERS".  This is the 2019 CABG+VALVE exclusion list. */

/*  If the RECORD IN following table has CABGVAL_Excl = 1 then it is a CABG+valve WITH EXCLUSION*/
Data work.CABGval_Excl;  /* CABG OTHERS prior to refinement into non-iso CABG WITH Valve and non-iso CABG WITHOUT Valve */

    /*Create hash object to hold list of ICD codes*/
    length icd $ 2;
    if _n_ = 1 then do;
        declare hash h();
        rc = h.definekey('icd');
        rc = h.definedone();
        do until(eof);
            set cabgvalexcl2019 end = eof;
            /*Consider using an array here if you have lots of ICD columns*/
            do icd = icd1, icd2, icd3;
                rc = h.add();
            end;
        end;
    end;

    set cabgpddcol;

    /*Loop through all the keys and stop if we find one in the hash*/    
    array keys[*] proc_p oproc1-oproc20;
    rc = -1;
    do i = 1 to dim(keys) until(rc = 0);
        rc = h.find(key:keys[i]); /*This sets rc = 0 if a match is found*/
    end;

    drop i rc icd:;

    CABGVAL_Excl = rc ne 0 and PDDCABG = 1;
run;

Constructing the hash object is a little bit fiddly if you have multiple columns holding all the distinct ICD10 codes you care about - if they're all in one column then there's a simpler way of doing this:

        declare hash h(dataset:'cabgvalexcl2019');
        rc = h.definekey('icd');
        rc = h.definedone();