1
votes

I'm trying to use Excel's get&transform functionality (previously known as powerquery) to import an XML data source. The data source has a list of b tags, each with a variable number of d tags in a c2 child, such as the following:

<a>
  <b>
    <c1>foo</c1>
    <c2>
      <d>bar</d>
    </c2>
  </b>
  <b>
    <c1>fuz</c1>
    <c2>
      <d>baz</d>
      <d>quz</d>
    </c2>
  </b>
</a>

When I import this data with the following query the data type for column c2.d is different for the two different rows representing the b items, for the first row it is a general spreadsheet cell type, for the second row it is a Table type.

let
    Source = Xml.Tables(File.Contents("C:\Localdata\excel-powerquery-test2.xml")),
    Table0 = Source{0}[Table],
    #"Changed Type" = Table.TransformColumnTypes(Table0,{{"c1", type text}}),
    #"Expanded c2" = Table.ExpandTableColumn(#"Changed Type", "c2", {"d"}, {"c2.d"})
in
    #"Expanded c2"

It seems that for the first row it automatically converts the d tag into a simple spreadsheet cell as there is only one and it only contains text. However for the second row it sees there are two d tags and hence keeps it as a table. The problem now is that I can neither load the data as is as the Table in the second row is loaded into the spreadsheet as the literal string "Table" leaving me without the actual data, nor can I further expand the Table using Table.ExpandTableColumn as it (rightly) complains that bar in the first row is not a table.

I presume the automatic conversion of a single tag containing text to a simple cell rather than a table happens either in the Xml.Tables or ExpandTableColumn functions. The tooltip for Xml.Tables shows that it has an options parameter, unfortunately the documentation for Xml.Tables does not give any details on this options parameter.

How can I get this second row expanded out to two rows, one each for the two d tags contained in the second b tag having the same "fuz" string in the first column? Such an expansion works fine if the contents of the d tags are further XML tags, but apparently not if the d tags only contain text.

1

1 Answers

1
votes

Let's add a step to make sure everything is at the same level:

let
    Source = Xml.Tables(File.Contents("C:\Localdata\excel-powerquery-test2.xml")),
    Table0 = Source{0}[Table],
    Expandc2 = Table.ExpandTableColumn(Table0, "c2", {"d"}, {"d"}),
    ToLists = Table.TransformColumns(Expandc2,
                  {"d", each if _ is table then Table.ToList(_) else {_}}),
    ExpandLists = Table.ExpandListColumn(ToLists, "d")
in
    ExpandLists

The ToLists step turns this:

Screenshot

Into a more consistent list format:

c1       d
-----------------------
foo     {"bar"}
fuz     {"baz", "quz"}

Then you can expand to rows without mixed data types.