I have an XML file containing a single line of text longer than 32767 characters. Currently SAS truncates the line at the 32767th character and reads no further. The task is to split the input line into separate variables or separate observations. The code I use to read the file is:
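data out (drop=v_length);
  length xml_text $32767;
  retain xml_text v_length gr_split;
  infile tempxml encoding='utf-8' end=last;
  input;
  /* v_length tracks the running length of the current group of lines */
  if _n_ = 1 then do;
    v_length = length(left(_infile_));
    gr_split = 1;
  end;
  else v_length = v_length + length(left(_infile_));
  /* start a new group once the accumulated text would exceed 32767 */
  if v_length gt 32767 then do;
    gr_split + 1;
    v_length = length(left(_infile_));
  end;
  /* first line of a group: start over; otherwise append to the group */
  if _n_ = 1 or v_length = length(left(_infile_)) then do;
    xml_text = compress(left(_infile_),,'c');
  end;
  else xml_text = trim(xml_text)||compress(left(_infile_),,'c');
  if last then do;
    /* symputx stores the numeric values without a conversion note or padding */
    call symputx('NumOfTextGroups', gr_split);
    call symputx('LastRow', _n_);
  end;
run;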
When the whole XML is no longer than 32767 characters, the code produces a single cell; otherwise it outputs n rows. In the first case I can parse it in Oracle directly (once the data is delivered there). In the second, I first bring the data into Oracle and assemble the cell there before parsing. However, this works only when each line of the XML file is shorter than 32767 characters.
LIBNAME with an XML map - that works very well in most cases. The other I used delimited by ">" if I remember correctly to read it in. You can use @"<tag>" as well. – Joe
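For illustration, the ">"-delimited idea from the comment might look roughly like the sketch below. It is untested and rests on several assumptions: it reuses the tempxml fileref from the question, while the dataset name xml_pieces and the variables piece and piece_no are made up here. RECFM=N asks SAS to treat the file as a byte stream with no record boundaries, so the input should no longer depend on where (or whether) line breaks occur. The sketch leaves out encoding='utf-8' because whether that option combines with RECFM=N would need checking, and any single piece longer than 32767 bytes would still be truncated.

```sas
/* Sketch only: split the XML stream into one observation per '>'-terminated piece */
data xml_pieces;
  length piece $32767;
  /* recfm=n: no record boundaries; dlm='>' ends each value at the next '>' */
  infile tempxml recfm=n lrecl=32767 dlm='>';
  /* list input; the '>' delimiter itself is consumed and not stored in piece */
  input piece @@;
  /* hypothetical sequence number so the pieces can be reassembled in order */
  piece_no + 1;
run;
```

Each observation then holds one tag or one tag plus the text preceding it, minus the closing ">", so the pieces can be carried to Oracle in piece_no order and reassembled there by appending ">" to each.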