3
votes

I am reading a .txt file into SAS, that uses "|" as the delimiter. The issue is there is one column that is using "|" as a word separator as well instead of acting like delimiter, this needs to be in one column.

For example the txt file looks like:

apple|fruit|Healthy|choices|of|food|12|2012|chart

needs to look like this in the SAS dataset:

apple | fruit | Healthy choices of Food | 12 | 2012 | chart

How do I eliminate "|" between "Healthy choices of Food"?

4
How do you know the delimiter between fruit and Healthy is correct but the one between Healthy and choices is not? objectively, how do you know? - Jay Corbett
I think OP means that the first 2, and last 3, variables can't contain the extra delimiter. My solution assumes that. - itzy

4 Answers

2
votes

I think this will do what you want:

data tmp1;
  length tmp $100;
  input tmp $;
  cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
apple|fruit|Healthy|choices|of|food|and|lots|of|other|stuff|12|2012|chart
;
run;

data tmp2;
  set tmp1;
  num_delims=length(tmp)-length(compress(tmp,"|"));
  expected_delims=5;
  extra_delims=num_delims-expected_delims;
  length new_var $100;
  i=1;
  do while(scan(tmp,i,"|") ne "");
    if i<=2 or (extra_delims+2)<i<=num_delims then new_var=trim(new_var)||scan(tmp,i,"|")||"|";
    else new_var=trim(new_var)||scan(tmp,i,"|")||"#";
    i+1;
  end;
  new_var=left(tranwrd(new_var,"#"," "));
run;
0
votes

This isn't particularly elegant, but it will work:

data tmp;
 input tmp $50.;
 cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
;
run;

data tmp;
 set tmp;
 var1 = scan(tmp,1,'|');
 var2 = scan(tmp,2,'|');
 var4 = scan(tmp,-3,'|');
 var5 = scan(tmp,-2,'|');
 var6 = scan(tmp,-1,'|');

 var3 = tranwrd(tmp,trim(var1)||"|"||trim(var2),"");
 var3 = tranwrd(var3,trim(var4)||"|"||trim(var5)||"|"||trim(var6),"");
 var3 = tranwrd(var3,"|"," ");
 run;
0
votes

Expanding a little on Itzy's answer, here is another possible solution:

data want;
   /* Define variables */
   attrib item    length=$10 label='Item';
   attrib class   length=$10 label='Family';
   attrib desc    length=$80 label='Item Description';
   attrib count   length=8   label='Some number';
   attrib year    length=$4  label='Year';
   attrib somevar length=$10 label='Some variable';

   length countc $8; /* A temp variable */

   infile 'c:\temp\delimited_temp.txt' lrecl=1000 truncover;
   input;
   item    = scan(_infile_,1,'|','mo');
   class   = scan(_infile_,2,'|','mo');
   countc  = scan(_infile_,-3,'|','mo');  /* Temp var for numeric field */
   count   = inputn(countc,'8.');         /* Re-read the numeric field */
   year    = scan(_infile_,-2,'|','mo');
   somevar = scan(_infile_,-1,'|','mo');

   desc = tranwrd(
             substr(_infile_
                 ,length(item)+length(class)+3
                 ,length(_infile_) 
                    - ( length(item)+length(class)+length(countc)
                       +length(year)+length(somevar)+5))
            ,'|',' ');
   drop countc;
run;

The key in this case it to read your file directly and handle the delimiters yourself. This can be tricky and requires that your data file is exactly as described. A much better solution would be to go back to whoever gave this this data and ask them to deliver it to you in a more appropriate form. Good luck!

0
votes

Another possible workaround.

data tmp;
infile '/path/to/textfile'; 
input tmp :$100.;
array varlst (*) $30 v1-v6;
a=countw(tmp,'|');
do i=1 to dim(varlst);
 if i<=2 then
   varlst(i) = scan(tmp,i,'|');
 else if i>=4 then
   varlst(i) = scan(tmp,a-(dim(varlst)-i),'|');
 else do j=3 to a-(dim(varlst)-i)-1;
   varlst(i)=catx(' ', varlst(i),scan(tmp,j,'|'));
   end;
 end;
drop tmp a i j; 
run;