We tried reading 3-4 GB of XML file using SAS XML mapper .but when we PROC COPY the data from the XML engine to SAS Dataset its taking almost 5 to 6 mins which is too much time for us since we have to process 3000 files a day .We are running almost 10 files in parallel.One table almost have 230 columns.
Is there any other faster way to process the XML ? can we use PROC GROOVY ? will it be efficient? if yes can any one provide me a sample code? i tried searching online but not able to get one.
The XML has PII data and its huge of 3 GB .
The Code being run is very simple and straight forward:
filename NHL "/path/ODM.xml";
filename map "/path/odm_map.map";
libname NHL xmlv2 xmlmap=map;
proc copy in=nhl out=work;
run;
Total Table created : 54 in which more than 14 tables have ~18000 records and remaining tables have ~1000 records
The Log window shows
NOTE: PROCEDURE COPY used (Total process time):
real time 4:03.72
user cpu time 4:00.68
system cpu time 1.17 seconds
memory 32842.37k
OS Memory 52888.00k
Timestamp 19/05/2020 03:14:43 PM
Step Count 4 Switch Count 802
Page Faults 3
Page Reclaims 17172
Page Swaps 0
Voluntary Context Switches 3662
Involuntary Context Switches 27536
Block Input Operations 504
Block Output Operations 56512
SAS Version : 9.4_M2
total memsize is MEMSIZE=3221225472
in our server
3000 files total out of which 1000 will be 3 to 4 GB and some of which will be 1 GB and 1000 files will be in KB .The smaller files are getting processed quickly the problem is only with big files .it uses almost the entire CPU.
The copy time from XML engine varies when we reduce the number of file,but for that to happen we have to change the map file or the input xml.
Already raised SAS tracks and have questioned the same in SAS communities still no luck.looks like its parser limitation itself.
Any idea about the shredder in Teradata ? will it be efficient?
MEMMAXSZ
option in aconfig.sas
. Also, theXML
engine might be bottlenecked in the resources available to the SAS private runtime JRE. Finally, you can use communities.sas.com for topics requiring a more threaded conversation. – Richard