0
votes
"6/23/2015 8:55:36 AM,6/23/2015 7:57:55 AM,test,A B,""C, D "",E-MA,F,Personal G,G one,test - TWO THREE,""I LIKE APPLE"",""ONE, TWO"",FCB,6/27/2015 - 6/27/2016,6/23/2015,BIZ,Personal,MA,NY,Personal,Group,""NYC Ins. Companies"",,,""$NYC NY-MA, (Group)"",""$NYC NY-MA, (Group)"",,,,"

The data looks like this but with 10k rows. I was first trying to use PROC IMPORT but it didn't recognize the comma(,) as a delimiter in the file. Then I tried DATA STEP INFILE but still didn't work out the problem.

Does anyone have any experience importing files with text qualifiers?

Thank you.


update

  • Not sure how to insert a table here but if i use excel to import the file, specifying the comma as the delimiter and double quote as the text qualifier, it would have 30 fields. Looks like the following and sorry for the crappy table...

VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 VAR16 VAR17 VAR18 VAR19 VAR20 VAR21 VAR22 VAR23 VAR24 VAR25 VAR26 VAR27 VAR28 VAR29 VAR30

6/23/2015 8:55|6/23/2015 7:57| test| A |B C, D | E-MA |F |Personal G |G one| test - TWO THREE| I LIKE APPLE |ONE, TWO |FCB 6/27/2015 - 6/27/2016 |6/23/2015| BIZ| Personal |MA |NY| Personal |Group NYC Ins. Companies | | |$NYC NY-MA, (Group)| $NYC NY-MA, (Group) ||||

  • It not a copy paste issue. I have the raw csv file and txt file.
1
Can you edit your question to show what the correct fields are after input?Joe
It looks like the whole line is wrapped in double-quotes. Is the file actually like that or is that some kind of copy/paste issue when you created your question? Looking at it a little more, I'm guessing the file is actually like that... I think the person that created the file made a mistake and nested their quotes 1 level too deep. I'd probably just ask them to fix it and resend.Robert Penridge
I think you'd be better of just listing what of the first ten or fifteen variables in your example should be separate variables; the way you listed it here is unfortunately not very useful. In particular - what is supposed to happen with ""C, D"" - is that supposed to be "C,D" in the final variable (one quote set around it, and one field)?Joe
C, D are just some strings....Mike

1 Answers

0
votes

I think you might have better luck with proc import if you make an intial pass through the file to strip out the extraneous double quotes:

data _null_;
  infile "original_file.csv" lrecl = 32000;
  file "new_file.csv";
  input;
  _infile_ = tranwrd(substr(_infile_,2,length(_infile_) - 1),'""','"');
  put _infile_;
run;

The idea is pretty simple - read in the whole line, remove the first and last characters (assuming that these are always double quotes), and then replace double double quotes with double quotes.

This may cause some further problems if any of your text fields are actually supposed to contain double quotes, but otherwise it should generate a file that should be slightly easier to import directly, either via proc import or by using the dsd option on an appropriate infile statement.