The quick answer
Change a Windows registry entry using RegEdit. On my system, Windows 10, x64, Office 2016, the entry
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRow
would change to
0
The explanation
The Proc IMPORT DBMS=EXCEL
is inferring column var2
is numeric content because it has not scanned enough Excel rows to discover there is some non-numeric content. So on that inference when non-numeric content is reached the procedure replaces it with a missing value.
You will have to
- Force IMPORT to scan enough rows to find non-numerics in a column (if present).
- Causes IMPORT to define those columns (variables) as character type
- Post-process the imported data set in a DATA step
- Performs transformations that convert those expected numeric variables into actual numeric variable.
Scanning more rows - Excel
There is no Proc IMPORT
option for changing the number of Excel rows that are scanned.
Proc IMPORT DBMS=Excel
in Windows uses Microsoft technology to read Excel files. In SAS version <9.2 Phase 2 the technology is Jet and new versions use ACE. Each of these technologies uses the Windows registry to obtain the parameter TypeGuessingRows
which is how many rows should be scanned before inferring a column is character, numeric, or date. When the parameter value is 0, all rows are scanned before inferring.
SAS Documentation "SAS/ACCESS® 9.4 Interface to PC Files: Reference, Fourth Edition" chapter "Microsoft Excel Workbook Files" details the Windows registry entry that will need to be changed based on system and Office installation. As mentioned in the the quick my system has
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRow
The default of 8
rows was changed to 0
.
Bonus: Scanning more rows - Delimited files
These would be text files with fields delimited by commas, tabs, or some other character.
Proc IMPORT DBMS=CSV
can be forced to scan all rows by using GUESSINGROWS statement
Proc IMPORT --all my options-- out=import_raw;
GUESSINGROWS=MAX; * statement;
run;
Step 2 - Post processing
Here is some sample code showing how a known 'tricky' column is converted to a same named numeric column. The rule is that a value <####
will be converted to ####
.
data import;
set import_raw (rename=var2=var2_raw);
if var2_raw =: '<' then
var2 = input(substr(var2_raw,2), best12.);
else
var2 = input(var2_raw,best12.);
drop var2_raw;
run;
Your actual rules for transformation may vary -- for instance you might need to add a $1
column var2_relation
that takes on values =
, <
, or >
. Or you might compress the value, removing all non-number characters, before doing the input() conversion.
It would be another question if you have problems coding a solution that does the same transform across all variables.
XLSX
engine instead of theEXCEL
engine? – Tom