3
votes

I have to produce an XML file out of a SAS dataset. The format of the XML file is defined very strictly and I need to match it exactly. I'm using SAS 9.4 (note: and stuck with it!) and utilizing XMLMAPs and libname xmlv2. I feel like I'm very close to a solution, but there's a final hurdle that I cannot seem to pass!

The XML file has a 3-level structure, with a single lvl 2 element holding all the lvl 3 elements. Whatever I try, all my lvl 3 elements seem to spawn their own lvl 2 elements. It seems that the SAS xmlv2 libname engine even works differently when importing or exporting the exact same data! Examples & steps to reproduce below - help me out if you can!

Example data

The data is a list of files and some attributes related to those files. The attributes are common to all files, only filenames in the list differ. This creates a test dataset into SAS work:

proc sql;

create table input_data
  (col1 char(1),
  col2 char(1),
  file char(20));

insert into input_data
  values ('1', '2', 'file1.txt')
  values ('1', '2', 'file2.txt');

quit;

Desired output XML

Note that all the filenames are listed together in their own FILE elements, nested inside a single FILES element. The common attributes are elements inside the main FILE_INFO element. This is the structure I need to be able to output.

<?xml version="1.0" encoding="windows-1252" ?>

<FILE_INFO>
  <COL1>1</COL1>
  <COL2>2</COL2>
  <FILES>
    <FILE>file1.txt</FILE>
    <FILE>file2.txt</FILE>
  </FILES>
</FILE_INFO>

SAS XMLMAP I've created

<?xml version="1.0" encoding="windows-1252"?>
<!-- ############################################################ -->
<!--  this is a map file for SAS-XML conversion                   -->
<!-- ############################################################ -->
<SXLEMAP name="file_test" version="2.1">

  <!-- ############################################################ -->
  <OUTPUT>
    <TABLEREF name="FILE_INFO"/>
  </OUTPUT>

  <NAMESPACES count="0"/>

  <!-- ############################################################ -->
  <TABLE name="FILE_INFO">
    <TABLE-PATH syntax="XPath">/FILE_INFO/FILES/FILE</TABLE-PATH>

    <COLUMN name="col1" retain="YES">
      <PATH syntax="XPath">/FILE_INFO/COL1</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>1</LENGTH>
    </COLUMN>

    <COLUMN name="col2" retain="YES">
      <PATH syntax="XPath">/FILE_INFO/COL2</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>1</LENGTH>
    </COLUMN>

    <COLUMN name="file">
      <PATH syntax="XPath">/FILE_INFO/FILES/FILE</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>20</LENGTH>
    </COLUMN>

  </TABLE>
</SXLEMAP>

SAS code to output the XML using the XMLMAP

filename out "C:\myfolder\test_out.xml";
libname out xmlv2 xmltype=xmlmap xmlmap="C:\myfolder\file_test.map";

data out.FILE_INFO;
   set work.input_data;
run;

Actual result XML

<?xml version="1.0" encoding="windows-1252" ?>

<FILE_INFO>
  <FILES>
    <FILE>file1.txt</FILE>
  </FILES>
  <COL1>1</COL1>
  <COL2>2</COL2>
  <FILES>
    <FILE>file2.txt</FILE>
  </FILES>
</FILE_INFO>

Steps to reproduce

Generate test data set using the code above. Save XMLMAP into file_test.map. Run SAS Code, compare resulting XML to the desired result.

The Problem

See what happens there? All the FILE-elements go inside their own FILES-elements. This happens no matter how many rows with separate filenames I have in my data: each and every one of them has its own FILES-element.

The fun part is, if I take the desired output XML file above, and feed it back to SAS using the very same XMLMAP, the resulting SAS dataset is exactly the same as my original input dataset!

I've tried fiddling with RETAIN-options in the XMLMAP, I've tried defining FILES as its own column in the input dataset and defining it in the XMLMAP, I've tried all kinds of random stuff but to no avail. Any ideas?

1
Sounds like you want it to do BY group processing. Did you try adding BY statement to your data step?Tom
I think that the issue may be with the fact that the FILES/FILE is not a parent of COL1 / COL2; I think you have to do the XPATH a bit differently. But I'm not sure exactly how. The map needs to indicate that the COL1 and COL2 are "siblings" of "file", not of "files", basically.Joe
Looking at this some more, I don't think SAS will produce the file you want through the XML map pathway. There's no good way for it to do so in a sequential manner, which is how SAS operates in this sort of thing. You can get it to work more or less as you expect if you make COL1 and COL2 into attributes of FILES, but not otherwise.Joe
@Tom, I don't think BY groups in the datastep help? The XMLMAP directs the whole process...Juha K

1 Answers

5
votes

Because your desired XML involves a somewhat complex grouping, consider XSLT, the special-purpose language designed to transform XML files. SAS 9.4 maintains an XSLT processor using Saxon-EE version 9.3 engine with proc xsl which allows both XSLT 1.0 or 2.0 scripts.

Specifically, export your data into a raw xml file (no mapping) and use XSLT 1.0's Muenchian Grouping or the simpler XSLT 2.0's xsl:for-each-group. I include both since for portability, XSLT 1.0 is used more extensively as the default specification among other language libraries (Java, Python, PHP, R) in case you need to run outside of SAS or future readers use earlier versions.

Do note you will see cols are hard-coded in the XSLT inside concat() and as <COL> nodes in the specified templates. For additional cols, add to these sections accordingly. The normalize-space() is used since SAS pads spaces before/after the text values.

XSLT 1.0 (save as .xsl file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="colkeys" match="INPUT_DATA" use="concat(col1, col2)" />

  <xsl:template match="/TABLE">
    <FILE_INFO>
      <xsl:apply-templates select="INPUT_DATA[generate-id() =
                                       generate-id(key('colkeys', concat(col1, col2)))]"/>
    </FILE_INFO>
  </xsl:template>

  <xsl:template match="INPUT_DATA">

    <COL1><xsl:value-of select="normalize-space(col1)"/></COL1>
    <COL2><xsl:value-of select="normalize-space(col2)"/></COL2>

    <FILES>
        <xsl:for-each select="key('colkeys', concat(col1, col2))">
            <FILE><xsl:value-of select="normalize-space(file)"/></FILE>
        </xsl:for-each>
    </FILES>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.0 (save as .xsl file)

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="colkeys" match="INPUT_DATA" use="concat(col1, col2)" />

  <xsl:template match="/TABLE">
    <FILE_INFO>
      <xsl:for-each-group select="INPUT_DATA" group-by="concat(col1, col2)">          
        <COL1><xsl:value-of select="normalize-space(col1)"/></COL1>
        <COL2><xsl:value-of select="normalize-space(col2)"/></COL2>

        <FILES>
            <xsl:for-each select="current-group()">
                <FILE><xsl:value-of select="normalize-space(file)"/></FILE>
            </xsl:for-each>
        </FILES>
      </xsl:for-each-group>
    </FILE_INFO>

  </xsl:template>

</xsl:stylesheet>

SAS

** EXPORT DATASET TO XML FILE;
filename out "C:\Path\Raw_Output.xml";

libname out xml;

data out.input_data;
 set Work.input_data;
run; 

libname out clear;

proc xsl 
    in="C:\Path\Raw_Output.xml"
    out="C:\Path\Final_Output.xml"
    xsl="C:\Path\XSLT_Script.xsl";
run;

Output

<?xml version="1.0" encoding="UTF-8"?>
<FILE_INFO>
   <COL1>1</COL1>
   <COL2>2</COL2>
   <FILES>
      <FILE>file1.txt</FILE>
      <FILE>file2.txt</FILE>
   </FILES>
</FILE_INFO>