0
votes

My client uses SAS 9.3 running on an AIX (IBM Unix) server. The client interface is SAS Enterprise Guide 5.1.

I ran into this really puzzling problem: when using PROC IMPORT in combination with dbms=xlsx, it seems impossible to filter rows based on the value of a character variable (at least, when we look for an exact match).

With an .xls file, the following import works perfectly well; the expected subset of rows is written to myTable:

proc import out = myTable(where=(myString EQ "ABC"))
  datafile ="myfile.xls"
  dbms = xls replace;
run;

However, using the same data but this time in an .xlsx file, an empty dataset is created (having the right number of variables and adequate column types).

proc import out = myTable(where=(myString EQ "ABC"))
  datafile ="myfile.xlsx"
  dbms = xlsx replace;
run;

Moreover, if we exclude the where from the PROC IMPORT, the data is seemingly imported correctly. However, filtering is still not possible. For instance, this will create an empty dataset:

data myFilteredTable;
  set myTable;
  where myString EQ "ABC";
run;

The following will work, but is obviously not satisfactory:

data myFilteredTable;
  set myTable;
  where myString LIKE "ABC%";
run;

Also note that:

  • Using compress or other string functions does not help
  • Filtering using numerical columns works fine for both xls and xlsx files.
  • My preferred method to read spreadsheets is to use excel libnames, but this is technically not possible at this time.

I wonder if this is a known issue, I couldn't find anything about it so far. Any help appreciated.

1
Sounds like you have an extra character at the end of myString. Have you checked to verify that isn't the case? (put myString $HEX.; would be a good start)Joe
I checked and no extra characters are present.Dominic Comtois
I would check again, then, and inspect carefully that hex dump. I could see an XLSX import engine bug causing the WHERE to not work properly in the dataset options, but I couldn't see it working that way in the SAS data step itself. Something's fishy there. Maybe it's Unicode-related or something like that (more generally, codepage-related), but something must be causing a character to be showing up after ABC, if where mystring eq "ABC" fails all rows and where mystring like "ABC%" doesn't.Joe
ok I see what's happening. With dbms=xls, blanks (difference between number of characters and the variable length) are filled with hex 20s (spaces), as they should. With dbms=xlsx, it's rather 00s (NULs). Not sure yet how we'll handle this, but at least we're a step closer... Thanks for your help Joe.Dominic Comtois
That makes sense - and sounds like a bug in the XLSX import routine. It may be fixed in a hotfix or in 9.4; I would file a ticket with SAS tech support.Joe

1 Answers

0
votes

It sounds like your strings have extra values on the end not being picked up by compress. Try using the countc function on MyString to see if any extra characters exist on the end. You can then figure out what characters to remove with compress once they're determined.