2
votes

I am trying to create a summary table from data spread across a couple hundred CSV files. I have built some Power Query queries that combine the data into one table with five columns (Name_Age, Events, MS, Name, Age) and load it into an Excel table called RawData. [Image: RawData example]

On a second tab, a VBA script copies RawData into a second table called Prep. The second table exists only because it has a sixth column that calculates Duration, and I couldn't figure out how to do that calculation in Power Query. [Image: Prep example]

MS stands for milliseconds; the field records the moment in time at which the Event happens. I need to calculate the duration of each event, i.e. the MS of the next event minus the MS of the current event. Using the event AR as an example, my formula takes the MS of F (32635) and subtracts the MS of AR (32301) to give a duration of 334. ER stands for End of Record, the end of the timeline in question for the subject. How can I do that in Power Query?

I can do it in Excel, but only by using a macro to copy RawData into Prep, and I would prefer to have it all done in Power Query. I also have a PivotTable summarizing the whole thing, so if I can get the Duration calculated in Power Query, I can skip the macro.

3 Answers

4
votes

To subtract the current row's value from the value in the following row, you need relative references to other rows, as described in this article: http://excel-inside.pro/blog/2015/11/05/absolute-and-relative-references-in-power-query/
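
In case the link goes stale, here is a minimal sketch of the general idea (my own, not copied from the article): add an index column, then look up the MS value at index + 1. The inline #table only stands in for RawData so the query runs on its own. The AR and F values come from the question; the subject name, the ER timestamp and the column name Event are assumptions.

```powerquery
let
    // Stand-in for RawData. AR = 32301 and F = 32635 come from the question;
    // the subject name and the ER timestamp are made-up sample values.
    Source = #table(
        type table [Name_Age = text, Event = text, MS = number],
        {
            {"Subject1_30", "AR", 32301},
            {"Subject1_30", "F", 32635},
            {"Subject1_30", "ER", 33000}
        }
    ),
    // Row position, starting at 0, so each row can address its neighbour.
    Indexed = Table.AddIndexColumn(Source, "Index", 0, 1),
    // Buffer the MS column once so it is not re-evaluated for every row.
    Times = List.Buffer(Indexed[MS]),
    // Next MS minus current MS; ER rows and the very last row get null.
    WithDuration = Table.AddColumn(
        Indexed,
        "Duration",
        each if [Event] = "ER" then null
             else try Times{[Index] + 1} - [MS] otherwise null,
        type nullable number
    ),
    Result = Table.RemoveColumns(WithDuration, {"Index"})
in
    Result
```

For the AR row this yields 32635 - 32301 = 334, matching the example in the question. To run it against the real data, replace the #table step with Excel.CurrentWorkbook(){[Name="RawData"]}[Content].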

0
votes

Since you want to evaluate the data with a PivotTable, I would recommend loading the data into the data model and using Power Pivot to perform the calculation. Add a calculated column with the formula:

=CALCULATE(Min(Table1[MS]),FILTER(Table1,EARLIER(Table1[MS]) < Table1[MS]))

This gives you the smallest MS value that is greater than the one in the current row. You can then add another column with the difference between the new column and the MS column to get the Duration. Include an IF clause to set the Duration of the row with the highest MS value to 0 (or however you want to handle that case):

=If([CalculatedColumn1]=BLANK(),0,[CalculatedColumn1]-[MS])

You can also add additional filters in the first formula to compare only those MS values related to the same "Name_Age":

=CALCULATE(Min(Table1[MS]),FILTER(Table1,EARLIER(Table1[MS]) < Table1[MS] && EARLIER(Table1[Name_Age]) = Table1[Name_Age]))

For more information about EARLIER() see https://msdn.microsoft.com/en-us/library/ee634551(v=sql.110).aspx

0
votes

Try this (Power Query code):

let
    Input = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],

    // Build two lists (Time and Type) from the source table columns.
    // Adjust "Event" if the column in your table is named "Events".
    Time = Input[MS],
    Type = Input[Event],

    // List of row positions to loop over: 0 to (number of rows - 1).
    Loop = {0..(List.Count(Time)-1)},

    // Build a new list (Delta) of durations: Time{position + 1} - Time{position}.
    // Rows where Type{position} = "ER" get "NA" instead.
    Delta = List.Transform(Loop, each if Type{_}="ER" then "NA" else Time{_+1}-Time{_}),

    // Add an index column named "Duration" (0, 1, 2, ...),
    // then replace each index with the matching Delta value.
    Indexed = Table.AddIndexColumn(Input,"Duration"),
    Output = Table.TransformColumns(Indexed, {"Duration", each Delta{_}})
in
    Output
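
If RawData holds several subjects and you would rather not rely on an ER row closing each one, a possible variant is to group by Name_Age and compute the differences inside each group. This is only a rough sketch: the sample rows are made up (only 32301 and 32635 are from the question), AddDuration is a hypothetical helper name, and the column name Event is an assumption.

```powerquery
let
    // Stand-in for RawData with two subjects; replace with the real table.
    Source = #table(
        type table [Name_Age = text, Event = text, MS = number],
        {
            {"Subject1_30", "AR", 32301},
            {"Subject1_30", "F", 32635},
            {"Subject1_30", "ER", 33000},
            {"Subject2_41", "AR", 100},
            {"Subject2_41", "ER", 250}
        }
    ),
    // Hypothetical helper: adds a Duration column to the rows of one subject.
    AddDuration = (rows as table) as table =>
        let
            Sorted = Table.Sort(rows, {{"MS", Order.Ascending}}),
            Times = List.Buffer(Sorted[MS]),
            Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1),
            WithDuration = Table.AddColumn(
                Indexed,
                "Duration",
                // The last row of the group has no successor, so it gets null.
                each try Times{[Index] + 1} - [MS] otherwise null,
                type nullable number
            )
        in
            Table.RemoveColumns(WithDuration, {"Index"}),
    // Run the helper once per subject, then stack the results back together.
    Grouped = Table.Group(Source, {"Name_Age"}, {{"Rows", AddDuration, type table}}),
    Output = Table.Combine(Grouped[Rows])
in
    Output
```

Because each group is processed separately, a duration is never computed across two different subjects. The last row of each group gets null rather than "NA", which keeps the Duration column numeric for the PivotTable.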