5
votes

After creating an xlsx file with the following custom query (loading data from the jsonplaceholder service from typicode.com)

let
    Source = Json.Document(Web.Contents("https://jsonplaceholder.typicode.com/posts")),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"userId", "id", "title", "body"}, {"userId", "id", "title", "body"})
in
    #"Expanded Column1"

and unzipping it I would expect the above string (the query definition) to be found somewhere in the resulting folder structure in some form. The goal is to programatically replace the URL, but somehow the only query definition I can find is

<connection id="1" keepAlive="1" name="Query - posts" description="Connection to the 'posts' query in the workbook." type="5" refreshedVersion="6" background="1" saveData="1">
    <dbPr connection="Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=posts;Extended Properties=&quot;&quot;" command="SELECT * FROM [posts]"/>
</connection>

Although I wouldn't recommend anybody to download & run office files from strangers, I did upload the xlsx to nofile.io.

I would expect some formula property on the queryTable, but the query table definition just looks like

<queryTable xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" name="ExternalData_1" connectionId="1" autoFormatId="16" applyNumberFormats="0" applyBorderFormats="0" applyFontFormats="0" applyPatternFormats="0" applyAlignmentFormats="0" applyWidthHeightFormats="0">
   <queryTableRefresh nextId="5">
      <queryTableFields count="4">
         <queryTableField id="1" name="userId" tableColumnId="5" />
         <queryTableField id="2" name="id" tableColumnId="2" />
         <queryTableField id="3" name="title" tableColumnId="3" />
         <queryTableField id="4" name="body" tableColumnId="4" />
      </queryTableFields>
   </queryTableRefresh>
</queryTable>

Ideally I would like just the information about how the spec stores this information (as I want to manually edit this on the frontend in the browser), although any solution which allows me to generate this would be perfect.

1
In *.xlsx/customXml/ you will find a item1.xml which contains a DataMashup element which contains a base64Binary which is the binary query definition file. I have no clue how to work with that. That's why only a comment and not a answer.Axel Richter
Based on this website it looks like that used to be a normal xml file... fourmoo.com/2017/05/02/… (powerBI and Excel files are somewhat similar :) working with both of them now)David Mulder
@VipinKumar: Tagged it [javascript] because that's the language I am doing the template generation in. Tagging it with Javascript both prevents someone coming with some official Office dll component for editing this information and/or directs users to javascript libraries. It's similar to asking an [html]/[css] question and tagging it javascript because you're fine with a [javascript] solution.David Mulder

1 Answers

1
votes

Although this is an old post, I'm adding references to the Open Specifications that detail exactly how the query information can be found in Excel (.xlsx) Office Open XML file format. As was already noted, the customXml\item[x].xml parts will contain DataMashup format information. The specifications for those packaging and part data can be found in the following two links:

[MS-QDEFF]: Query Definition File Format

[MS-QDEIF]: Query Definition Interoperability Format

One thing to note is that the top level binary stream (after base64 decoding the value from the DataMashup element) has a 4 byte version field at the beginning (which is 0) and a 4 byte size field before each of the Package, Permissions and Metadata pieces. Once each piece is extracted, it can be used as an independent PK Zip package (actually an OPC - Open Packaging Convention).