0
votes

I am using Power Query in Excel and I need to calculate the duration on each "Door_side" using the Time column on a daily level for each individual user.

The data comes from a card based access system and is formatted as follows:

Date  Time   User_No   Door_side
03/12  08:59   User_05   Outside
03/12  09:00   User_33   Inside
03/12  09:01   User_10   Outside
03/12  09:01   User_04   Outside
03/12  09:02   User_26   Outside
03/12  09:03   User_19   Outside
03/12  09:03   User_15   Inside
03/12  09:04   User_31   Inside
03/12  09:05   User_31   Outside
03/12  09:06   User_15   Outside
03/12  09:06   User_06   Inside
03/12  09:06   User_06   Inside
03/12  09:06   User_06   Inside
03/12  09:08   User_32   Outside
03/12  09:09   User_10   Inside
03/12  09:09   User_13   Inside
03/12  09:10   User_10   Outside

I tried the following:

  1. Sorted the Rows by Date, User and Time;
  2. Added Index column;
  3. Created Custom column named PreviousTime;
  4. Calculated Duration (Time - PreviousTime).

The full code for the above mentioned steps is:

    let
    Source = Table,
     #"Sorted Rows" = Table.Sort(Source,{{"Date", Order.Ascending}, {"User_No", Order.Ascending}, {"Time", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index", "PreviousTime", each try 
if List.AllTrue(
{[User_No]=#"Added Index"[User_No]{[Index]-1},[Date]=#"Added Index"[Date]{[Index]-1}
}
)
then try #"Added Index"[Time]{[Index]-1} otherwise [Time]
else [Time]
otherwise [Time]),
    Duration = Table.AddColumn(#"Added Custom", "Duration", each [Time] - [PreviousTime], type duration)
in
    Duration

This works on small data sets but causes functionality issues and completely fails on a larger amount of data. I am fairly new to Power Query and M so I just can't figure out what exactly from the custom column formula causes issues or how to approach this in another way.

I tried to keep the above code as part of my query and also to use it as a function but there is not much difference functionality wise between these two approaches. The processed table will be sent to the Data Model but I was hoping to obtain the duration in Power Query rather than in Power Pivot. A big thank you in advance!


To detail on the task a bit more I uploaded a reduced version of the data, for 3 users for the month of December. You can find it here: https://1drv.ms/x/s!AocQlL_KAzymgwhqiKxSL5JMZheL.

What I want to achieve is to calculate the duration between the timestamps based on user and date. As a plus I do not have users working past midnight so all timestamps for a specific shift will be within the same date.

An example of the desired outcome can be found within the workbook as well and looks like this (calculated in Excel):

Date    Time    User    Door_side    Duration
03/12   06:54   User_1  Outside 
03/12   07:26   User_1  Inside    00:32:00
03/12   07:27   User_1  Outside   00:01:00
03/12   07:44   User_1  Inside    00:17:00
03/12   07:52   User_1  Outside   00:08:00
03/12   08:35   User_1  Inside    00:43:00
03/12   08:36   User_1  Outside   00:01:00
03/12   11:50   User_1  Inside    03:14:00
03/12   12:01   User_1  Outside   00:11:00
03/12   13:27   User_1  Inside    01:26:00
03/12   13:43   User_1  Outside   00:16:00
03/12   14:57   User_1  Inside    01:14:00
03/12   15:20   User_1  Inside    00:23:00
03/12   15:26   User_1  Outside   00:06:00
03/12   15:34   User_1  Inside    00:08:00

Because the data contains all users and multiple days I am attempting to do the calculations within tables grouped by Date and User.


I spent some time testing all 3 approaches presented below (List.Min, Table.FirstN & nested tables) and on a limited data set all of them do a great job.

However, when applied to a larger dataset (I have around 20000 rows for 1 month) the nested tables approach seems to be the fastest.

Thank you Eugene and Marc for helping and, more important, for teaching me something new.

2
This other question may be useful to you.Alexis Olson

2 Answers

0
votes

Here's a different approach. It relies on working in nested tables.

I started with your data from your spreadsheet, in a table named Table1: enter image description here

In Power Query, using Table1 as the source, I split the Booking Time column, renamed the resulting date and time columns, filtered, out the - Doorside entries, and sorted per your guidance: enter image description here

Then I grouped by Booking Date and User: enter image description here enter image description here

Then I added an index column within each of the nested tables, in a new custom column: enter image description here enter image description here

Then I added a new column with the previous time within each of the nested tables, in a new custom column: enter image description here enter image description here (The error you see here is because there is no previous time.)

Then I added a new column with the corrections for the errors from when I added the previous date, in each of the nested tables, in a new custom column. I figured I would "correct" caused by no previous times, by replacing the error with the "current" Booking Time, which would result in a duration of zero: enter image description here enter image description here

Then I added a new column with the durations calculated in each of the nested tables, in a new custom column: enter image description here enter image description here

Then I removed all columns except the last one I had added, which I had called AddDuration: enter image description here

Then I expanded the AddDuration column: enter image description here

Here's my M code:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"Booking time", type text}}, "en-US"), "Booking time", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Booking time.1", "Booking time.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"Booking time.1", "Booking Date"}, {"Booking time.2", "Booking Time"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Booking Date", type date}, {"Booking Time", type time}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Doorside] <> "-")),
#"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Booking Date", Order.Ascending}, {"User", Order.Ascending}, {"Booking Time", Order.Ascending}}),
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"Booking Date", "User"}, {{"AllData", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "AddIndex", each Table.AddIndexColumn([AllData],"Index",0,1)),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "AddPreviousTime", each let tblName = [AddIndex] in Table.AddColumn([AddIndex],"Previous Time",each tblName{[Index]-1}[Booking Time], type time)),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "CorrectErrors", each Table.ReplaceErrorValues([AddPreviousTime], {{"Previous Time", [AddPreviousTime][Booking Time]{0}}})),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "AddDuration", each Table.AddColumn([CorrectErrors],"Duration", each [Booking Time] - [Previous Time], type duration)),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"AddDuration"}),
#"Expanded AddDuration" = Table.ExpandTableColumn(#"Removed Other Columns", "AddDuration", {"Booking Date", "Booking Time", "User", "Doorside", "Index", "Previous Time", "Duration"}, {"Booking Date", "Booking Time", "User", "Doorside", "Index", "Previous Time", "Duration"})
in
#"Expanded AddDuration"
0
votes

If I got your task correctly, you need time when next event occured, presuming this is the time door was closed. In this case I strongly recommend you avoid using index. Instead, I suggest you to think out how to apply row selection procedure to gt what you need for each row.

Here is what I think should work if my understanding of your task was correct:

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    SplitDateTime = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"Booking time", type text}}, "en-GB"), "Booking time", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Date", "Time"}),
    FilteredDoorside = Table.SelectRows(SplitDateTime, each ([Doorside] <> "-")),
    ChangedType = Table.Buffer(Table.TransformColumnTypes(FilteredDoorside,{{"Date", type date}, {"Time", type time}, {"User", type text}, {"Doorside", type text}})),
    GetCloseTime = Table.AddColumn(ChangedType, "Duration", (row)=>List.Min(Table.SelectRows(ChangedType, each [Date]=row[Date] and [Time]>row[Time])[Time]) - row[Time]),
    SetType = Table.TransformColumnTypes(GetCloseTime,{{"Duration", type duration}})
in
    SetType

In GetCloseTime step I add function column, which selects rows from the table self, with the same date and later in time, and then selects minimal time. This will be next event time. You can add additional criteria if you need.

Another way is instead using List.Min make a sorted derived table and take its 1st row and value in column Time: {0}[Time]

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    SplitDateTime = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"Booking time", type text}}, "en-GB"), "Booking time", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Date", "Time"}),
    FilteredDoorside = Table.SelectRows(SplitDateTime, each ([Doorside] <> "-")),
    ChangedType = Table.Buffer(Table.TransformColumnTypes(FilteredDoorside,{{"Date", type date}, {"Time", type time}, {"User", type text}, {"Doorside", type text}})),
    GetCloseTime = Table.AddColumn(ChangedType, "Duration", (row)=>Table.FirstN(Table.Sort(Table.SelectRows(ChangedType, each [Date]=row[Date] and [Time]>row[Time]),{{"Time", Order.Ascending}}),1){0}[Time] - row[Time]),
    SetType = Table.TransformColumnTypes(GetCloseTime,{{"Duration", type duration}})
in
    SetType