4
votes

Questions: How do I define the variable type of variables being imported from a .xlsx file when using PROC IMPORT?


My work

I am using SAS v9.4. So far as I'm aware, it is vanilla SAS. I do not have SAS/ACCESS etc.

My data looks like this:

ID1        ID2  MONTH   YEAR    QTR VAR1    VAR2
ABC_1234   1    1       2010    1   869     3988
ABC_1235   12   2       2010    1   639     3144
ABC_1236   13   3       2010    2   698     3714
ABC_1237   45   4       2010    2   630     3213

The procedure I am running is:

proc import out=rawdata
    datafile = "c:\rawdata.xlsx"
        dbms = xlsx replace;

    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;
run;

When I run this step, I get the following log output:

ERROR: You are trying to use the character format $ with the numeric variable ID2 in data set WORK.RAWDATA.

What this seems to tell me is that SAS automatically assigns the variable type. I want to be able to control it manually. I cannot find documentation which explains how to do this. INFORMAT, LENGTH, and INPUT statements do not seem to work for PROC IMPORT.

I am using PROC IMPORT because it has yielded the greatest success with .xlsx files overall. Two possible solutions I can think of are 1) convert .xlsx to .csv and use INFILE in a DATA step and 2) bring the data in as numeric and convert it to character in a later step. I dislike the first solution because it requires me to manually manipulate the data, a potential source of error (such as leading zeros being removed). I dislike the second because it may unintentionally introduce errors (again, such as with leading zeros) and introduces extraneous work.

3
Are you sure you don't have SAS/ACCESS to PC FILES licensed? I was under the impression that DBMS=XLSX required it as well.Joe
Also, I had no idea you could use format statements (And, it turns out, other similar attribute statements) in proc import directly!Joe
@Joe I think the dbms=xlxs is now part of BASE. The EXCEL and the others still require PC Files. What the difference is, I have no idea.DomPazz
@DomPazz Not according to Chris Hemedinger in 2015, anyway (TS1M2). I don't see any mention of it changing in TS1M3 but perhaps I am missing something.Joe
Too bad dbms=xlsx doesn't work in proc export. =(Robert Penridge

3 Answers

5
votes

You can try to set the columns type as "Text" in Excel to see if SAS will determine it from that. Worth a shot.

If that doesn't work, unless you use PC Files Server, or have Excel of the same bitness installed on the same SAS server for direct access to the file, you will need to use a separate data step to convert the columns.

proc import 
    file = "c:\rawdata.xlsx"
    out=_rawdata(rename=(ID2 = _ID2) )
    dbms = xlsx replace;
run;

data rawdata;
    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;

    set _rawdata;

    ID2 = cats(_ID2);

    drop _:;
run;

If you do have SAS/Access to Excel, you can control these variables directly with the DBDSOPTS data set option. For example:

libname myxlsx Excel 'C:\rawdata.xlsx';

data rawdata;
    set myxlsx.'Sheet1$'n(DBDSOPTS="DBTYPE=(ID2='CHAR(3)')");
run;

The reason why the problem is occurring is because the xlsx engine in proc import is internal to SAS, and is separate from the Excel engine. The Excel engine uses Microsoft Jet or Ace, whereas the xlsx engine uses a proprietary system that does not have as much control as Microsoft's. Why this is the case, I have no idea.

When proc import is run, SAS will try to guess what format it should be (which you do have control over with xls files using the guessingrows option). If it detects all numbers, it will assume a numeric variable. Unfortunately, without SAS/ACCESS to Excel or PC Files Server installed, you cannot control the variable type directly.

0
votes

Define the type in Excel.

If you want to convert it later then use a data step to convert the column.

data want ;
  length id1 $9 id2 $3 ;
  set rawdata(rename=(id2=numeric_id2));
  id2=cats(numeric_id2);
  drop numeric_id2;
run;
0
votes

I solved this by not using PROC IMPORT. It isn't a solution for everyone, but it worked great for my purposes (i.e. not "big data"). If you're reading from an Excel spreadsheet, it should work for you.

ImportDataFile is a macro1 which automates a data step import. A data step import requires a LENGTH statement to define variable names and types, an INPUT statement to read raw data from an external file, and an INFILE statement to specify which file.

data &dataset.;
  &infileStatement.;

  length &lengthStatement. ;

  input (_all_) (:) ;
run;

The macro is composed of three primary steps:

  • Establish a DDE link, if necessary (i.e. connect to Excel)
  • Get data variables by reading in the header
  • Read in the remaining data

Notice how each of those corresponds to the three lines in the data step. Everything in the macro is in support of that data step.

In my experience, it was best to import the data as fixed width characters and then convert to whatever type was needed in a separate step. Yeah, it's redundant, but I never ran into issues with memory or space. The benefits far outweighed any hypothetical concerns. It made the data flow identical for each analysis which aided verification and saved time overall by avoiding the need to correct SAS's guess at the type (and inevitable silent truncation).

Because SAS is a horribly verbose language, this answer runs up against the StackOverflow answer character limit. A fully documented copy is here: https://pastebin.com/raw/RsXz3juJ Put the code in a file named something like ImportDataFile.sas and make sure it runs (probably using %include) before the macro is called. The call form is:

%ImportDataFile(   
       dirData=    
  ,   fileName=    
  ,    dataset=    
  ,  delimiter=    
  , overOption=    
  ,  headerRow=    
  ,      sheet=    
  ,      range=    
  ,     prefix=    
  ,       case=    
  ,  defLength=    
);                       

where

Output(s)     : SAS dataset, macro variable &listHeader                 
Inputs        :    dirData= Directory containing data file.             
                  fileName= Filename including file extension. Must be  
                            .csv, .txt, .tsv, .xls, or .xlsx.           
                   dataset= Name of dataset output to WORK library.     
                 delimiter= (optional) Delimiting string given in       
                            quotes. Default for CSV is a comma, for     
                            TXT/TSV a tab. This parameter may not be    
                            set for Excel files. Doing so generates a   
                            warning.                                    
                overOption= (optional) INFILE option. Default is        
                             MISSOVER.  Other choices are FLOWOVER,     
                             STOPOVER, TRUNCOVER, or SCANOVER.          
                 headerRow= (optional) Row corresponding to header in   
                            an Excel file. Accepts R#C#:R#C#, but       
                            should be given as R#. Default is R1.       
                     sheet= Name of worksheet. Required for XLS or XLSX.
                     range= Range of spreadsheet to be imported.        
                            Required for XLS and XLSX. Use form         
                            R#C#:R#C#.  See example below.              
                    prefix= (optional) String to append to beginning of 
                            each variable name. Default is no prefix.   
                      case= (optional) Toggle mix case variable naming. 
                            Must be lower/upper/mixed. Default is       
                            lower.                                      
                 defLength= (optional) Character field length.  Default 
                            value is 100.                               

For example, the following creates a dataset named xl_import of character type with width 100 from a my_xl_file.xlsx located in C:\Path\To\File. The columns are prefixed with the string "raw_". The overOption corresponds to those defined in the INFILE statement.

%ImportDataFile(              
       dirData= C:\Path\To\File
  ,   fileName= my_xl_file.xlsx      
  ,    dataset= xl_import     
  ,     prefix= raw_          
  ,      sheet= Sheet1     
  ,      range= R2C1:R13C18   
  ,  defLength= 100           
  , overOption= MISSOVER      
);                            

Here is the code for the macro. Enjoy.

********************************************************************
** Utilities / Sub Macros
********************************************************************;
%macro ClearFileRef(fileRef);
  filename &fileRef. clear;
%mend;

%macro CompareVariablesToDDERange();
  %local columnIndex numberOfDDEColumns;

  %let columnIndex        = %eval(%sysfunc(findc(&range., 'C', ib)) + 1);
  %let numberOfDDEColumns = %sysfunc(substr(&range., &columnIndex));
  %if %ListLength(&listHeader) ^= &numberOfDDEColumns %then
    %put WARNING: [MACRO] Data file contains %ListLength(&listHeader) variables. RANGE argument has &numberOfDDEColumns columns.;
%mend;

%macro EstablishSystemLink(fileRef);
  filename &fileRef. dde 'excel|system';
%mend;

%macro EstablishWorkbookLink(fileRef, dirData, fileName, sheetName, range);
  filename &fileRef. dde "excel|&dirData.\[&fileName.]&sheetName.!&range.";
%mend;

%macro IsEmpty(macroVariable);
  %sysevalf(%superq(&macroVariable)=, boolean)
%mend;

%macro IsFileRef(reference);
  %local fileRefExists externalFileExists returnValue;

  %let fileRefExists      = %sysfunc(fexist(&reference.));
  %let externalFileExists = %sysfunc(fileexist(&reference.));
  %if &fileRefExists. = 1 and &externalFileExists. = 0 %then %let returnValue = 1;
  %else %let returnValue = 0;
  &returnValue
%mend;

%macro IsFilePath(reference);
  %local fileRefExists externalFileExists returnValue;

  %let fileRefExists      = %sysfunc(fexist(&reference.));
  %let externalFileExists = %sysfunc(fileexist(&reference.));
  %if &fileRefExists. = 0 and &externalFileExists. = 1 %then %let returnValue = 1;
  %else %let returnValue = 0;
  &returnValue
%mend;

%macro GetObsCount(dataset);
  %local exists returnValue closed;

  %let exists = %sysfunc(open(&dataset));
  %if &exists. %then %do;
    %let returnValue  = %sysfunc(attrn(&exists, nobs));
    %let closed       = %sysfunc(close(&exists));
    %end;
  %else %do;
    %put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
    %abort cancel;
    %end;
  &returnValue
%mend;

%macro GetVarCount(dataset);
  %local exists varCount closed;

  %let exists = %sysfunc(open(&dataset));
  %if &exists. %then %do;
    %let varCount = %sysfunc(attrn(&exists, nvars));
    %let closed   = %sysfunc(close(&exists));
    %end;
  %else %do;
    %put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
    %abort cancel;
    %end;
  &varCount
%mend;

%macro ListLength(list);
  %local count;

  %if %sysevalf(%superq(list)=, boolean) %then %let count = 0;
  %else %let count = %eval(%sysfunc(countc(&list., |)) + 1);
  &count
%mend;

%macro ListElement(list, n);
  %local nthElement;

  %let nthElement = %sysfunc(scan(%superq(&list.), &n., |, m));
  &nthElement
%mend;

%macro RemoveAllFormattingFromSheet(fileRef, sheet);
  data _null_;
    file &fileRef.;
    /* Select sheet of interest */
    put "[WORKBOOK.ACTIVATE(""&sheet."")]";
    /* Select first cell */
    put '[FORMULA.GOTO("R1C1")]';
    /* Apply dummy filter of ">2" to first column */
    put '[FILTER(1, ">2")]';
    /* Disable filters */
    put '[FILTER()]';
    /* Select all */
    put '[SELECT("R[0]C[0]:R[1048575]C[16383]", "R[0]C[0]")]';
    /* Unhide rows */
    put '[ROW.HEIGHT(,,TRUE, 2)]';
    /* Unhide columns */
    put '[COLUMN.WIDTH(,,TRUE, 2)]';
    /* Remove all formatting */
    put '[CLEAR(2)]';
    /* Autofit column width */
    put '[COLUMN.WIDTH(,,TRUE, 3)]';
  run;
%mend;

%macro SetSystemOptions(opt1, opt2, opt3);
  options &opt1. &opt2. &opt3.;
%mend;

%macro ImportDataFile(dirData=, fileName=, dataset=, delimiter=, overOption=MISSOVER, headerRow=R1, sheet=, range=, prefix=, case=lower, defLength=100) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: ImportDataFile(dirData=&dirData, fileName=&fileName, dataset=&dataset, delimiter=&delimiter, overOption=&overOption, headerRow=&headerRow, sheet=&sheet, range=&range, prefix=&prefix, case=&case, defLength=&defLength);

  %local
    macroStart
    case
    extension
    HeaderRef
    lengthStatement
    delimiter
    InfileRef
    infileStatement
    numberOfRecords
    numberOfVars
    duration
   ;

  %global
    listHeader
    originalNOTES
    originalQUOTELENMAX
  ;

  %let macroStart           = %sysfunc(datetime());
  %let originalNOTES        = %sysfunc(getoption(notes));
  %let originalQUOTELENMAX  = %sysfunc(getoption(noquotelenmax));

  %SetSystemOptions(nonotes);

********************************************************************
** Validation
********************************************************************;
  %if %IsEmpty(dirData) %then %do;
    %put ERROR: [&SYSMACRONAME.] DIRDATA argument is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if %IsEmpty(fileName) %then %do;
    %put ERROR: [&SYSMACRONAME.] FILENAME argument is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if %IsEmpty(dataset) %then %do;
    %put ERROR: [&SYSMACRONAME.] DATASET argument is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if not(%IsEmpty(prefix)) and not(%sysfunc(nvalid(&prefix, v7))) %then %do;
    %put ERROR: [&SYSMACRONAME.] Invalid PREFIX="&prefix.";
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %let case = %upcase(&case.);

  %if not(&case. in (LOWER, UPPER, MIXED)) %then %do;
    %put ERROR: [&SYSMACRONAME.] Invalid case option: &case. Must be LOWER, UPPER, MIX.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %let extension  = %upcase(%scan(&fileName., 1, '.', b));

  %if not(&extension. in (TXT, TSV, CSV, XLS, XLSX)) %then %do;
    %put ERROR: [&SYSMACRONAME.] Invalid file type: &extension. Must be TXT, TSV, CSV, XLS, XLSX.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if &extension. in (XLS, XLSX) and %IsEmpty(sheet) %then %do;
    %put ERROR: [&SYSMACRONAME.] SHEET argument undefined.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if &extension. in (XLS, XLSX) and %IsEmpty(range) %then %do;
    %put ERROR: [&SYSMACRONAME.] RANGE argument undefined.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(sheet)) %then %do;
    %put ERROR: [&SYSMACRONAME.] SHEET argument only valid for XLS or XLSX files.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(range)) %then %do;
    %put ERROR: [&SYSMACRONAME.] RANGE argument only valid for XLS or XLSX files.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

**********************************
*** Define delimiter
**********************************;
 %if %IsEmpty(delimiter) %then %do;
    %if       &extension. in (XLS, XLSX)  %then %let delimiter = '09'x;
    %else %if &extension. = CSV           %then %let delimiter = ',';
    %else %if &extension. in (TXT, TSV)   %then %let delimiter = '09'x;
    %else %do;
      %put ERROR: [&SYSMACRONAME.] Delimiter error.;
      %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
      %abort cancel;
      %end;
    %end;

  %if &extension. in (XLS, XLSX) and &delimiter ^= '09'x %then %do;
    %put WARNING: [&SYSMACRONAME.] Delimiter for Excel file must be '09'x.;
    %put WARNING: [&SYSMACRONAME.] Delimiter set to '09'x.;
    %let delimiter = '09'x;
    %end;

********************************************************************
** Prep Excel Worksheet
********************************************************************;
  %if &extension. in (XLS, XLSX) %then %do;
    %let DDECommandRef = DDEcmd;
    %EstablishDDELink(fileRef=&DDECommandRef.);
    %RemoveAllFormattingFromSheet(fileRef=&DDECommandRef., sheet=&sheet.);
    %end;

********************************************************************
** Get header
********************************************************************;

**********************************
*** Define file reference
**********************************;
  %if &extension. in (XLS, XLSX) %then %do;
    %let HeaderRef = DDEHead;
    %EstablishDDELink(
      fileRef= &HeaderRef.
      ,   dirData= &dirData.
      ,  fileName= &fileName.
      , sheetName= &sheet.
      ,     range= &headerRow.
    );
    %end;
  %else %if &extension. in (CSV, TXT, TSV) %then
    %let HeaderRef = %sysfunc(dequote(&dirData.))\&fileName.;

  %ReadHeaderIntoList(reference=&HeaderRef., delimiter=&delimiter., prefix=&prefix., case=&case.);

********************************************************************
** Create length statement
********************************************************************;
  %let lengthStatement = %CreateLengthStatement(&listHeader., &defLength.);

********************************************************************
** Import data
********************************************************************;

**********************************
*** Define infile statement
**********************************;
  %if &extension. in (XLS, XLSX) %then %do;
    %let InfileRef = DDESheet;
    %EstablishDDELink(
      fileRef= &InfileRef.
      ,   dirData= &dirData.
      ,  fileName= &fileName.
      , sheetName= &sheet.
      ,     range= &range.
    );
    %let infileStatement = infile &InfileRef. dlmstr=&delimiter. dsd notab &overOption.;
    %CompareVariablesToDDERange();
    %end;
  %else %if &extension. in (CSV, TXT, TSV) %then %do;
    %let InfileRef       = %sysfunc(dequote(&dirData.))\&fileName.;
    %let infileStatement = infile "&InfileRef." dlmstr=&delimiter. dsd &overOption. firstobs = 2 end=last_record;
    %end;

**********************************
*** Perform import
**********************************;
  data &dataset.;
    &infileStatement.;

    length &lengthStatement. ;

    input (_all_) (:) ;

  run;

********************************************************************
** Housekeeping
********************************************************************;
  %let numberOfRecords = %GetObsCount(&dataset.);
  %let numberOfVars    = %GetVarCount(&dataset.);

  %SetSystemOptions(notes);

  %put;
  %put NOTE: [MACRO] The dataset WORK.%upcase(&dataset.) has &numberOfRecords. observations and &numberOfVars. variables.;
  %put NOTE: [MACRO] IMPORTDATAFILE macro used (Total process time):;

  %let duration = %sysfunc(putn(%sysevalf(%sysfunc(datetime()) - &macroStart.), time12.3));
  %if %sysfunc(minute("&duration."t)) > 0 %then %do;
    %put NO%str(TE-)         real time            %substr(&duration., 3, 8);
    %end;
  %else %do;
    %put NO%str(TE-)         real time            %substr(&duration., 6, 5) seconds;
    %end;

  %put;

  %SetSystemOptions(&originalNotes., &originalQUOTELENMAX.);

%mend;

%macro  EstablishDDELink(fileRef, dirData, fileName, sheetName, range);
%put NOTE: [&SYSMACRONAME] Executing: EstablishDDELink(fileRef=&fileRef, dirData=&dirData, fileName=&fileName, sheetName=&sheetName, range=&range);

  %local dirData linkConnection stopTime closeReturnCode;

********************************************************************
** Validate arguments
********************************************************************;
  %if %IsEmpty(fileRef) %then %do;
    %put ERROR: [&SYSMACRONAME] fileRef is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if %length(&fileRef.) > 8 %then %do;
    %put ERROR: [&SYSMACRONAME] Fileref &fileRef exceeds 8 character limit.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

  %if not %IsEmpty(dirData) %then %let dirData = %sysfunc(dequote(&dirData.));

********************************************************************
** Assign fileref according to link type
********************************************************************;
  %if     %IsEmpty(dirData)
      and %IsEmpty(fileName)
      and %IsEmpty(sheetName)
      and %IsEmpty(range) %then %EstablishSystemLink(&fileRef.);
  %else %EstablishWorkbookLink(&fileRef., &dirData., &fileName., &sheetName., &range.);

********************************************************************
** Check that link has been established
********************************************************************;
  %let linkConnection = %sysfunc(fopen(&fileRef, S));

  %if not (&linkConnection. > 0) %then %do;

    /*Run until either Excel opens (linkConnection > 0)
      or until 10 seconds have passed.*/
    %let stopTime = %sysevalf(%sysfunc(datetime()) + 10);

    %do %until (&linkConnection. > 0);
      %if (%sysfunc(datetime()) >= &stopTime.) %then %do;
    %put ERROR: [&SYSMACRONAME] DDE system link was not established. Operation timed out.;
    %ClearFileRef(fileRef.);
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

      %let linkConnection = %sysfunc(fopen(&fileRef, S));
      %end;
    %end;

********************************************************************
** Housekeeping
********************************************************************;
  %let closeReturnCode = %sysfunc(fclose(&linkConnection));

%mend;

%macro  ReadHeaderIntoList(reference, delimiter, prefix, case) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: ReadHeaderIntoList(reference=&reference, delimiter=&delimiter, prefix=&prefix, case=&case);

  %local  fileSpecification notab delimiter;
  %global listHeader;

  %SetSystemOptions(nonotes);

  %if %IsEmpty(reference) %then %do;
    %put ERROR: [&SYSMACRONAME.] REFERENCE argument is blank.;
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

********************************************************************
** Determine infile statement options
********************************************************************;
  /*SAS filerefs exist only for Excel files*/
  %if       %IsFileRef(&reference.)  %then %do;
    %let fileSpecification  = &reference.;
    %let notab              = notab;
    %end;
  /*Absolute references only for CSV,TXT,TSV files*/
  %else %if %IsFilePath(&reference.) %then %do;
    %let fileSpecification  = "&reference.";
    %let notab              = ;
    %let extension          = %upcase(%scan(&reference., 1, '.', b));
    %end;
  %else %do;
    %put ERROR: [&SYSMACRONAME.] Invalid input REFERENCE: [&reference.];
    %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
    %abort cancel;
    %end;

********************************************************************
** Read in header
********************************************************************;
  data _null_;
    infile &fileSpecification. dlmstr = '```#@' &notab. obs = 1 lrecl = 32767 ;
    length
      raw_header_line   $ 32767
      raw_with_pipes    $ 32767
    ;
    input raw_header_line;

    raw_with_pipes  = tranwrd(raw_header_line, &delimiter., '|');
    call symput('rawListHeader', strip(raw_with_pipes));
  run;

********************************************************************
** Transform headers into valid variable names
********************************************************************;
  %SetSystemOptions(noquotelenmax);
  data _null_;
    length
      i           8
      listLength  8
      header_i    $ 32767
      temp_i      $ 32767
      listValid   $ 32767
    ;
    listLength = %ListLength(%superq(rawListHeader));

    do i = 1 to listLength;
      header_i = scan("%superq(rawListHeader)", i, '|', 'm');

**********************************
*** Apply prefix
**********************************;
      if not missing(header_i) then prefixed_i = cats("&prefix.", header_i);
      else                          prefixed_i = header_i;

**********************************
*** Apply case
**********************************;
      if      "&case." = "LOWER" then cased_i = lowcase(prefixed_i);
      else if "&case." = "UPPER" then cased_i = upcase(prefixed_i);
      else                            cased_i = prefixed_i;

**********************************
*** Keep valid otherwise correct
**********************************;
      if nvalid(cased_i, 'v7') then do;
    if i = 1 then listValid = cased_i;
    else          listValid = catx('|', listValid, cased_i);
    end;
      else do;

**********************************
*** Fill in blank headers
**********************************;
      if missing(cased_i) and "&case." = "UPPER" then temp_i = "%upcase(&prefix.)NO_HEADER";
      else if missing(cased_i)                   then temp_i = "&prefix.no_header";

**********************************
*** Replace blanks with _ and
*** Remove invalid characters
**********************************;
      else do;
    replaced_space_with_underscore = tranwrd(strip(cased_i), ' ', '_');
    temp_i = compress(replaced_space_with_underscore, '_', 'kin');
    end;

**********************************
*** Make first char _ if digit
**********************************;
    if anydigit(temp_i) = 1 then temp_i = cats('_', temp_i);

**********************************
*** Trim length to 32
**********************************;
    if length(temp_i) > 32 then temp_i = substr(temp_i, 1, 32);

**********************************
*** Verify valid V7 name
**********************************;
    if not nvalid(temp_i, 'v7') then do;
      put 'ERROR: [&SYSMACRONAME.] Error cleaning header ' i +(-1) '. Invalid SAS name.';
      call execute('
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        data _null_;
          abort cancel nolist;
        run;');
      stop;
      end;

    if i = 1 then listValid = temp_i;
    else          listValid = catx('|', listValid, temp_i);
    end;

      output;
    end;
    call symput('listValid', strip(listValid));
  run;

********************************************************************
** Append repeated headers with incremented value
********************************************************************;
  /*Use hash table with key being each header and value
    corresponding to the number of occurences.  Create new
    header list as follows: If first occurence of a header,
    add to list.  If not first occurence, ruthlessly append
    occurence number (ensuring validity) and add to list.
    Beware: SAS documentation for hashes contains syntax
    errors.*/
  data _null_;
    length
      element_i   $ 32
      item        $ 32
      occurrences 8
      new_list    $ 32767
    ;

    declare hash h();
    h.defineKey('item');
    h.defineData('item', 'occurrences');
    h.defineDone();
    call missing(item, occurrences);

    listLength = input("%ListLength(&listValid.)", 8.);
    do i = 1 to listLength;
      element_i = scan("&listValid.", i, '|');

      if not (h.find(key: element_i) = 0) then do;
    h.add(key: element_i, data: element_i, data: 1);
    new_list = catx('|', new_list, element_i);
    end;
      else do;
    occurrences + 1;
    h.replace(key: element_i, data: element_i, data: occurrences);

    len     = length(element_i);
    digits  = ceil(log10(occurrences + 1));

    if (len + digits) > 32 then
      new_element = cats(substr(element_i, 1, len - digits), occurrences);
    else new_element = cats(element_i, occurrences);

    new_list = catx('|', new_list, new_element);
    end;
    end;

    call symput('listHeader', strip(new_list));
  run;
%mend;

%macro  CreateLengthStatement(listHeader, defLength);
  %local lengthStatement header_h;

  %let lengthStatement=;
  %do h = 1 %to %ListLength(&listHeader.);
  %let header_h = %ListElement(listHeader, &h);
    %if &h. = 1 %then %let lengthStatement = &header_h. $ &defLength. ;
    %else %let lengthStatement = &lengthStatement. &header_h. $ &defLength. ;
  %end;
  %let lengthStatement = &lengthStatement;
  &lengthStatement
%mend;

1 The solution uses macros extensively. In my experience, people advised me to avoid macros. I found it best, with all respect, to ignore that advice. SAS doesn't have functions, which makes developing abstractions hard. Macros allow you to imitate functions. A common fear of macros is debugging. Stick to the Single Responsibility Principle and you'll find they're not hard to debug at all. Document them with a %put statement and you'll know who is being called and when. If you're not familiar with macros, they're really just text replacement. The code goes through a pre-processor and replaces macro code with text. That text, and the rest of your code, is then executed. The best resource for learning about macros is the manual.