6
votes

I`ve got (and will receive in the future) many CSV files that use the semicolon as delimiter and the comma as decimal separator. So far I could not find out how to import these files into SAS using proc import -- or in any other automated fashion without the need for messing around with the variable names manually.

Create some sample data:

%let filename = %sysfunc(pathname(work))\sap.csv;

data _null_;
  file "&filename";
  put 'a;b';
  put '12345,11;67890,66';
run;

The import code:

proc import out = sap01 
datafile= "&filename"
dbms = dlm; 
delimiter = ";";
GETNAMES = YES; 
run;

After the import a value for the variable "AMOUNT" such as 350,58 (which corresponds to 350.58 in the US format) would look like 35,058 (meaning thirtyfivethousand...) in SAS (and after re-export to the German EXCEL it would look like 35.058,00). A simple but dirty workaround would be the following:

data sap02; set sap01;
AMOUNT = AMOUNT/100;
format AMOUNT best15.2;
run;

I wonder if there is a simple way to define the decimal separator for the CVS-import (similar to the specification of the delimiter). ..or any other "cleaner" solution compared to my workaround. Many thanks in advance!

2
35.358,00, this looks like a string variable. Is it 35,058,00?DaJun Tian
thanks, I've edited my post!Joz

2 Answers

6
votes

You technically should use dbms=dlm not dbms=csv, though it does figure things out. CSV means "Comma separated values", while DLM means "delimited", which is correct here.

I don't think there's a direct way to make SAS read in with the comma via PROC IMPORT. You need to tell SAS to use the NUMXw.d informat when reading in the data, and I don't see a way to force that setting in SAS. (There's an option for output with a comma, NLDECSEPARATOR, but I don't think that works here.)

Your best bet is either to write data step code yourself, or to run the PROC IMPORT, go to the log, and copy/paste the read in code into your program; then for each of the read-in records add :NUMX10. or whatever the appropriate maximum width of the field is. It will end up looking something like this:

data want;
  infile "whatever.txt" dlm=';' lrecl=32767 missover;
  input
    firstnumvar :NUMX10.
    secondnumvar :NUMX10.
    thirdnumvar :NUMX10.
    fourthnumvar :NUMX10.
    charvar :$15.
    charvar2 :$15.
  ;
run;

It will also generate lots of informat and format code; you can alternately convert the informats to NUMX10. instead of BEST. instead of adding the informat to the read-in. You can also just remove the informats, unless you have date fields.

data want;
  infile "whatever.txt" dlm=';' lrecl=32767 missover;
  informat firstnumvar secondnumvar thirdnumvar fourthnumvar NUMX10.;
  informat charvar $15.;
  format  firstnumvar secondnumvar thirdnumvar fourthnumvar BEST12.;
  format charvar $15.;
  input
    firstnumvar
    secondnumvar
    thirdnumvar
    fourthnumvar
    charvar $
  ;
run;
0
votes

Your best bet is either to write data step code yourself, or to run the PROC IMPORT, go to the log, and copy/paste the read in code into your program

This has a drawback. If there is a change in the stucture of the csv file, for example a changed column order, then one has to change the code in the SAS programm.
So it is safer to change the input, substituting in the numeric fields the comma with dot and passing SAS the modified input.

The first idea was to use a perl program for this, and then use in SAS a filename with a pipe to read the modified input.
Unfortunately there is a SAS restriction in the proc import: The IMPORT procedure does not support device types or access methods for the FILENAME statement except for DISK.
So one has to create a workfile on disk with the adjusted input.

I used the CVS_PP package to read the csv file.
testdata.csv contains the csv data to read.
substitute_commasep.perl is the name of the perl program

perl code:

# use lib "/........";    # specifiy, if Text::CSV_PP is locally installed. Otherwise error message: Can't locate Text/CSV_PP.pm in ....;
use Text::CSV_PP;
use strict;
   my $csv = Text::CSV_PP->new({ binary => 1
                                ,sep_char   => ';'
                             }) or die "Error creating CSV object: ".Text::CSV_PP->error_diag ();
   open my $fhi, "<", "$ARGV[0]" or die "Error reading CSV file: $!";
   while ( my $colref = $csv->getline( $fhi) ) {
      foreach (@$colref) {              # analyze each column value
         s/,/\./ if /^\s*[\d,]*\s*$/;   # substitute,  if the field contains only numbers and ,
      }
      $csv->print(\*STDOUT, $colref);
      print "\n";
   }
   $csv->eof or $csv->error_diag();
   close $fhi;

SAS code:

filename readcsv pipe "perl substitute_commasep.perl testdata.csv";
filename dummy "dummy.csv";
data _null_;
     infile readcsv;
     file dummy;
     input;
     put _infile_;
run;
proc import datafile=dummy
     out=data1
     dbms=dlm
     replace;
     delimiter=';';
     getnames=yes;
     guessingrows=32767;
run;