1
votes

I am trying to import several .txt files in to SAS. The values are separated by semi colons and each file has several million observations. The problem I have is when importing the file one of the variables gets the wrong formatting. The original value per observation is a random number (though all of approximately 20-25 digits) and for one obs might be something like 301185964728506014850593. When imported SAS then, for some reason, reads it as (numeric, correctly but not importantly) 3.0118596E23. How can i make SAS read exactly what is in the .txt file?

Searching for the documentation has not been fruitful at all, with very few additional options found. The code below is the standard code I use to import the files (using mixed=yes does not seem to help and expanding the guessingrows changes nothing exept when i have it set to max which just makes the whole system crash).

proc import DATAFILE="W:FILE1.txt" OUT=FILE1 DBMS=dlm replace; delimiter=';'; guessingrows=100; run;

When this is read i get the error message "ERROR: Import unsuccessful. See SAS Log for details." However the file is imported and i can see that all rows and variables are included. I simply want to change the format or possibly the variable length (call the variable ID) in the proc import stage. Is this possible?

2
You can't read 301185964728506014850593 as numeric into SAS without loss of InformationLee
What is your SAS client ? Display manager, Enterprise guide or SAS Studio ?Richard
I am using Enterprise guide.Jonas Lundin

2 Answers

1
votes

To have a greater level of control when reading in a file of raw data, you should consider using a data step, e.g.:

data want;
   infile datalines4 dlm = ';' dsd;

   input
      id : $char25.
      v1 : $char10.
      v2 :       8.
   ;

   datalines4;
301185964728506014850593;abc;123
30118596472850601485059;abcd;1234
3011859647285060148505;abcde;12345
;;;;

This way you can specify the id to be character and so data will be held as presented in the file.

0
votes

There is a SAS limit to the how large a numeric can be and still retain exact integer representation. (Based on binary limits of mantissa and exponent features in double precision floating point values)

From the "SAS Companion for Windows"

Significant Digits and Largest Integer by Length for SAS Variables under Windows 

Length   Largest                     
in       Integer                                 Significant
Bytes    Represented              Exponential   Digits 
         Exactly                  Notation      Retained
------   -----------------------  -----------   -----------
 3                        8,192    213            3
 4                    2,097,152    221            6
 5                  536,870,912    229            8
 6              137,438,953,472    237           11
 7           35,184,372,088,832    245           13
 8        9,007,199,254,740,992    253           15

There is no option for directly specifying the column formats with the IMPORT Procedure.

You can recall the DATA Step source code that the procedure creates and modify that.

From the documentation File "Format-Specific Reference for the IMPORT and EXPORT Procedures" Delimited Files

Processing Delimited Files in SAS

When you use PROC IMPORT to read a comma-separated file, a tab-separated file, or other delimited file, the procedure does the following actions by default:

  • creates a DATA step with an INPUT statement
  • submits all of the code to the DATA step compiler, which, in turn, executes the code.

If you need to revise your code after the procedure runs, issue the RECALL command (or press F4) to the generated DATA step. At this point, you can add or remove options from the INFILE statement and customize the INFORMAT, FORMAT, and INPUT statements to your data.

So the steps will be

  1. Submit a Proc IMPORT that outputs zero rows.
    The only thing wanted from this step is the source code generated by the procedure
  2. Open a new edit window (i.e. menu View/Enhanced Editor)
  3. Issue the RECALL command in the command bar (or menu Run/Recall Last Submit)
  4. Edit the recalled source code
    - Remove the limiter where=(1=0))
    - Change the INFORMAT from best32. to something like $32.
    - Remove the corresponding FORMAT statement
  5. Submit the edited code

Example:

Create sample data set with too big integers and IMPORT it

filename myfile temp;

data _null_;
  file myfile;
  put "one;two;three";
  put "1;2;3";
  put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
  put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
  put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
  put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
  put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
  put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
run;


proc import 
  file=myfile 
  dbms=dlm
  replace
  out=myimport(where=(1=0)    /* output limiter */
;
  delimiter=';';
run;

Recall the SAS source code, edit it and resubmit

 /**********************************************************************
 *   PRODUCT:   SAS
 *   VERSION:   9.4
 *   CREATOR:   External File Interface
 *   DATE:      07NOV19
 *   DESC:      Generated SAS Datastep Code
 *   TEMPLATE SOURCE:  (None Specified.)
 ***********************************************************************/
    data WORK.MYIMPORT 
/*(where=(1=0))   */        /* <------ remove limiter */
;
    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    infile MYFILE delimiter = ';' MISSOVER DSD lrecl=32767 firstobs=2 ;
       informat one $32. ;        /* <-------- change informats */
       informat two $32. ;
       informat three $32. ;
/*       format one best12. ;*/   /* <--------- remove format statements */
/*       format two best12. ;*/
/*       format three best12. ;*/
    input
                one
                two
                three
    ;
    if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
    run;

Yields the data set

Obs    one                         two                         three

 1     1                           2                           3
 2     301185964728506014850593    301185964728506014850594    301185964728506014850595
 3     301185964728506014850593    301185964728506014850594    301185964728506014850595
 4     301185964728506014850593    301185964728506014850594    301185964728506014850595
 5     301185964728506014850593    301185964728506014850594    301185964728506014850595
 6     301185964728506014850593    301185964728506014850594    301185964728506014850595
 7     301185964728506014850593    301185964728506014850594    301185964728506014850595