1
votes

I have a dataset in excel, variables contain values with both character and numeric formats.

var1 -------- var2

352,45 ----- <34,5

when I import them into SAS, var2 becomes missing, how can I keep or impute "<34,5" for var2?

I use the following code to import:

PROC IMPORT OUT= data
DATAFILE= "data1.xlsx" 
DBMS=EXCEL REPLACE;
RANGE="Sheet1$"; 
GETNAMES=YES;
MIXED=YES;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
3
How are you importing the data ? What is the log showing ?Richard
I added the code to my postPari T
Did you try using the XLSX engine instead of the EXCEL engine?Tom
Also see this question.Joe

3 Answers

2
votes

The quick answer

Change a Windows registry entry using RegEdit. On my system, Windows 10, x64, Office 2016, the entry

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRow

would change to

0

The explanation

The Proc IMPORT DBMS=EXCEL is inferring column var2 is numeric content because it has not scanned enough Excel rows to discover there is some non-numeric content. So on that inference when non-numeric content is reached the procedure replaces it with a missing value.

You will have to

  1. Force IMPORT to scan enough rows to find non-numerics in a column (if present).
    • Causes IMPORT to define those columns (variables) as character type
  2. Post-process the imported data set in a DATA step
    • Performs transformations that convert those expected numeric variables into actual numeric variable.

Scanning more rows - Excel

There is no Proc IMPORT option for changing the number of Excel rows that are scanned.

Proc IMPORT DBMS=Excel in Windows uses Microsoft technology to read Excel files. In SAS version <9.2 Phase 2 the technology is Jet and new versions use ACE. Each of these technologies uses the Windows registry to obtain the parameter TypeGuessingRows which is how many rows should be scanned before inferring a column is character, numeric, or date. When the parameter value is 0, all rows are scanned before inferring.

SAS Documentation "SAS/ACCESS® 9.4 Interface to PC Files: Reference, Fourth Edition" chapter "Microsoft Excel Workbook Files" details the Windows registry entry that will need to be changed based on system and Office installation. As mentioned in the the quick my system has

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRow

The default of 8 rows was changed to 0.

Bonus: Scanning more rows - Delimited files

These would be text files with fields delimited by commas, tabs, or some other character.

Proc IMPORT DBMS=CSV can be forced to scan all rows by using GUESSINGROWS statement

Proc IMPORT --all my options-- out=import_raw;
  GUESSINGROWS=MAX; * statement;
run;

Step 2 - Post processing

Here is some sample code showing how a known 'tricky' column is converted to a same named numeric column. The rule is that a value <#### will be converted to ####.

data import;
  set import_raw (rename=var2=var2_raw);

  if var2_raw =: '<' then 
    var2 = input(substr(var2_raw,2), best12.);
  else
    var2 = input(var2_raw,best12.);

  drop var2_raw;
run;

Your actual rules for transformation may vary -- for instance you might need to add a $1 column var2_relation that takes on values =, <, or >. Or you might compress the value, removing all non-number characters, before doing the input() conversion.

It would be another question if you have problems coding a solution that does the same transform across all variables.

0
votes

As others have noted, likely the Excel engine is not scanning far enough down to find the character values.

The XLSX engine, new I believe in 9.3 and generally fully functional for both import and export in 9.4, will do a better job scanning for variable types. It's fairly easy to use, just swap DBMS=EXCEL to DBMS=XLSX. It won't be perfectly identical if you have (for example) GETNAMES=NO, but for most uses it's identical.

PROC IMPORT OUT= data
DATAFILE= "data1.xlsx" 
DBMS=XLSX REPLACE;
SHEET="Sheet1"; 
GETNAMES=YES;
RUN;
-1
votes

Have you tought about using infile instead of proc import? Also if the number and characters are all in the same colomn you will need to force it to characters.

http://www2.sas.com/proceedings/forum2008/166-2008.pdf

The code below is a sample I have from a loop but you should be able to figure it out. Note that when a colomn is followed with $ in the infile statement it forces the colomn to be characters!

 filename file&i "&fdir";   /*THIS ASSIGN FILE NAME RELATED WITH THE DIRECTORY IN PREVIOUS FILE*/
      data &name; /*USE THE FULL PATH OF THE FILE NEEDED AS PER 1ST TABLE*/
LENGTH  BAN $10.;
    LENGTH  SUBSCRIBER_NO   $10.;
    LENGTH  TRANSACTION_DATE    $18.;
    LENGTH  OPT1    $18.;
        INFILE file&i delimiter = ',' MISSOVER DSD LRECL=32767 FIRSTOBS=2 flowover;
        input SUBSCRIBER_NO $   BAN $   OPT1 $  TRANSACTION_DATE $  TRANSACTION_TYPE $  ITEM_ID $   MSID $  NIN1 $  ACTIVATION_TYPE $   STORE_CODE $    OPT8 $  OPT10 $     OPT9 $  WES $   BILL_CYCLE $    LANGUAGE_CODE $     REGION $    COMPANY_CODE $  PRICE_PLAN $    COMMIT_START_DATE $     SYS_CREATION_DATE $     RENEWAL_DATE $  ESN_TYPE $  ACCOUNT_TYPE $  EFFECTIVE_DATE $    INIT_ACTIVATION_DATE $  TENURE $    DATA $  PRICE_PLAN_DATA $   OPT3 $  PRICE_PLAN_DESC $   MSF $   PRICE_PLAN_SERIES $     ACTIVATION_DATE $   OPT5 $  TERM_STATUS $   OPT4 $  FIRST_NAME $    LAST_BUSINESS_NAME $    ADDRESS_ATTENTION $     USER_NAME $     ADDRESS_NAME_1 $    ADDRESS_NAME_2 $    ADDRESS_NAME_3 $    CITY $  province $  POSTAL_CODE $   home_no $   work_no $   MKT_ACCOUNT_TYPE $  ESN_EFFECTIVE_DATE $    CABLE_FOOTPRINT $   COMMON_IND $    CS_VIP_CLASS $  OPT2 $  OPT6 $  OPT7 $  KEYCODE $   CAMPAIGN_CODE $     CAMPAIGN_CYCLE $    CAMPAIGN_DATE $     CAMPAIGN_DESCRIPTION $  CAMPAIGN_TYPE $     EMAIL $     MOP $   SERIAL_NUMBER $     ACTIVATION_SUB_TYPE $   SALES_REP $ ;
      run;

      data import;
        set %if ne 1 %then import;
            &name;
      run;
      %let i = %eval(&i+1);
      filename  file&i clear;