
I have a big .csv file (~2 GB) to read into SAS. Unfortunately, there is a non-ASCII character somewhere in the file, and SAS stops importing when it reaches the field containing this character.

In order to specify the format of the fields, I use a data step with INFILE to do the import. I wonder whether there is any way to read the complete data while ignoring non-ASCII characters.

Note: The only fix for me right now is to import the file in SAS, read the error message to learn the exact location of the problem, open the file (which takes about 10 minutes), manually locate the place, and delete the character. Obviously this is cumbersome and hard to repeat.

I would write a Perl or shell script to pre-process the file to remove those characters. I'm not an expert in either, so I won't attempt an answer. – DomPazz
@DomPazz Thanks anyway. I'm thinking of writing a Python script to do something similar. It needs some extra effort :) – Nip
Can you read it in with a different character set? – Joe
@Joe Would you please share more details? – Nip
The ENCODING option on INFILE. – Joe
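As the comments suggest, the file can be cleaned up before SAS ever sees it. A minimal Python sketch (file names and chunk size are placeholders) that streams the large file in binary mode and deletes every byte outside the 7-bit ASCII range:

```python
def strip_non_ascii(src, dst, chunk_size=1 << 20):
    """Copy src to dst in binary chunks, deleting every byte
    outside the 7-bit ASCII range (0x00-0x7F)."""
    delete = bytes(range(128, 256))  # bytes 0x80-0xFF
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            # bytes.translate with a deletion table drops the bad bytes
            fout.write(chunk.translate(None, delete))
```

Working in binary mode means the script never decodes the file, so it cannot trip over the bad bytes itself, and memory use stays bounded regardless of file size. One caveat: deleting bytes (rather than replacing them with, say, a space) silently shortens the affected field.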

2 Answers

0
votes

Just a quick thought: if you compress the non-writable characters out of _infile_, it might work:

data _null_;
  infile file;
  input;
  /* keep (k) only the writable (w) characters */
  _infile_ = compress(_infile_, "", "kw");
run;

You would need to create the variables from the _infile_ variable afterwards.
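For comparison, the effect of COMPRESS with the "kw" modifiers can be sketched in Python (a rough analogue only; Python's notion of "printable" does not exactly match SAS's writable character class):

```python
import string

def compress_kw(line):
    """Rough Python analogue of SAS compress(line, "", "kw"):
    keep only characters that Python classifies as printable
    (letters, digits, punctuation, and whitespace)."""
    return "".join(ch for ch in line if ch in string.printable)
```

Control characters and other non-writable bytes are simply dropped from the line before any field parsing happens, which is the same idea as cleaning _infile_ before reading variables from it.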

0
votes

You would need to switch to a UTF-8 encoded SAS session and then do something like:

  data txt;
    infile intxt truncover encoding="UTF-8" lrecl=10000;
    input line $10000.;
  run;

Obviously, you can do more clever things to load the CSV properly, but I don't know a way around using UTF-8. SAS stops reading the file at those special characters before it looks at any of the data step statements.
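The same tolerance for undecodable bytes can be had outside SAS. A small Python sketch (the encoding is an assumption) that reads the file while substituting the Unicode replacement character for any byte sequence that is invalid in the chosen encoding, instead of aborting:

```python
def read_lossy(path, encoding="utf-8"):
    """Yield lines from path, replacing any byte sequence that is
    not valid in the given encoding with U+FFFD instead of raising."""
    with open(path, encoding=encoding, errors="replace") as fin:
        for line in fin:
            yield line
```

With errors="replace" the read always completes, and the replacement characters mark exactly where the offending bytes were, which can help locate them without opening the 2 GB file by hand.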