3
votes

I recently switched to PowerQuery to fetch data from various sources. I have loaded my existing data to a table called "masterEntries".

The query I have calls a function to check the last record for each source in "masterEntries" and fetches only newer records.

let
    Source = Excel.CurrentWorkbook(){[Name="formsMaster"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"FormName", type text}, {"Form", type text}, {"LastEntry", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each formEntries([FormName],[LastEntry])),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"EntryId", "Field1", "Field2", "Field3", "Field5", "DateCreated"}, {"EntryId", "Field1", "Field2", "Field3", "Field5", "DateCreated"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"Form", "LastEntry"}),
in
    #"Removed Columns"

This query loads the data to a new table. Instead I want to append the data to "masterEntries".

I am trying to do this with PowerQuery and not VBA. PowerQuery has Append Query feature where two or more queries/results can be combined to a new table.

Even a new query to append the resulting table from above query ("latestEntries") to existing table ("masterEntries") will do.

Any ideas on how it can be done with PowerQuery?

EDIT

  1. My original data ("masterEntries") was loaded manually. It is a big table with 400K+ records. I can load it using a query if that is going to help.
  2. Every run of "latestEntries" checks what records are already in "masterEntries" and fetches only newer entries from different sources.
  3. The Append Query method in Power Query is just a connection. It does not append records permanently. That is, when "latestEntries" brings a new set of records, the "masterEntries" loses the records that were in the earlier run of "latestEntries".
3
You must create connection to 1st data source and load it. After that you should create 2nd connection and when you finish your edits just go to Append Queries from Home ribbon and select 1st connection. Then you will most likely have to sort the data. Very easy process, they've done a great job with it.zipa
The second connection loads data to a new table. It does not append data to my existing table. I don't want a new table with consolidated data. Correct me if I got you wrong.eshwar
It does, but then you apply Append Queries and it does exactly as it name tells and gives you appended solution.zipa
By "appended solution" do you mean that I get the main table is appended with additional data? What I get is a new table which I don't want.eshwar
You first get new table, and then you append that new table to the last one. Or you can choose Append Queries as New and it will create a new table that is a combination of two.zipa

3 Answers

1
votes

This sounds a bit like a request for "incremental load". This is currently not supported by Power Query in Excel. The workaround is to go via a "linkback"-table like described here: http://ms-olap.blogspot.de/2015/05/incremental-data-loads-in-microsoft.html

If your linkback-table exceeds 1,1 Mio rows, you can use JSON-compression like described here: http://www.thebiccountant.com/2016/12/06/how-to-store-tables-longer-than-11-mio-rows-in-excel/ But be aware, that this costs performance.

Both methods "cost" performance, so this technique only makes sense, if you "save" repetitive execution of really heavy transformations (or long loads from the web).

0
votes

You should add something like this, just change name of Your_Table into table you want to use:

    #"Append Query" = Table.Combine({#"Removed Columns", Your_Table})
in
    #"Append Query"
0
votes

Assuming you have some kind of ID, and it is integer, here is the query for the masterEntries table (this is important!):

let
    Source = Excel.CurrentWorkbook(){[Name="masterEntries"]}[Content],
    Types = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Value", type number}}),

    //Assuming you have integer-type IDs. 
    //Otherwise you have to order and index records in a view, and query that view.
    MaxID = List.Max(Types[ID]), 
    //if you have ordered index, List.Max() can be substituted with Table.LastN(Types, 1)[ID]{0}
    //it may perform better.

    TableFromDB = Excel.CurrentWorkbook(){[Name="source"]}[Content], //Replace with database table
    GetNewRows = Table.SelectRows(TableFromDB, each [ID] > MaxID),
    MergeTables = Table.Combine({Types, GetNewRows})
in
    MergeTables