I am using Power Query in Excel and I need to calculate the duration on each "Door_side" using the Time column on a daily level for each individual user.
The data comes from a card based access system and is formatted as follows:
Date Time User_No Door_side
03/12 08:59 User_05 Outside
03/12 09:00 User_33 Inside
03/12 09:01 User_10 Outside
03/12 09:01 User_04 Outside
03/12 09:02 User_26 Outside
03/12 09:03 User_19 Outside
03/12 09:03 User_15 Inside
03/12 09:04 User_31 Inside
03/12 09:05 User_31 Outside
03/12 09:06 User_15 Outside
03/12 09:06 User_06 Inside
03/12 09:06 User_06 Inside
03/12 09:06 User_06 Inside
03/12 09:08 User_32 Outside
03/12 09:09 User_10 Inside
03/12 09:09 User_13 Inside
03/12 09:10 User_10 Outside
I tried the following:
- Sorted the Rows by Date, User and Time;
- Added Index column;
- Created Custom column named PreviousTime;
- Calculated Duration (Time - PreviousTime).
The full code for the above mentioned steps is:
let
Source = Table,
#"Sorted Rows" = Table.Sort(Source,{{"Date", Order.Ascending}, {"User_No", Order.Ascending}, {"Time", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "PreviousTime", each try
if List.AllTrue(
{[User_No]=#"Added Index"[User_No]{[Index]-1},[Date]=#"Added Index"[Date]{[Index]-1}
}
)
then try #"Added Index"[Time]{[Index]-1} otherwise [Time]
else [Time]
otherwise [Time]),
Duration = Table.AddColumn(#"Added Custom", "Duration", each [Time] - [PreviousTime], type duration)
in
Duration
This works on small data sets but causes functionality issues and completely fails on a larger amount of data. I am fairly new to Power Query and M so I just can't figure out what exactly from the custom column formula causes issues or how to approach this in another way.
I tried to keep the above code as part of my query and also to use it as a function but there is not much difference functionality wise between these two approaches. The processed table will be sent to the Data Model but I was hoping to obtain the duration in Power Query rather than in Power Pivot. A big thank you in advance!
To detail on the task a bit more I uploaded a reduced version of the data, for 3 users for the month of December. You can find it here: https://1drv.ms/x/s!AocQlL_KAzymgwhqiKxSL5JMZheL.
What I want to achieve is to calculate the duration between the timestamps based on user and date. As a plus I do not have users working past midnight so all timestamps for a specific shift will be within the same date.
An example of the desired outcome can be found within the workbook as well and looks like this (calculated in Excel):
Date Time User Door_side Duration
03/12 06:54 User_1 Outside
03/12 07:26 User_1 Inside 00:32:00
03/12 07:27 User_1 Outside 00:01:00
03/12 07:44 User_1 Inside 00:17:00
03/12 07:52 User_1 Outside 00:08:00
03/12 08:35 User_1 Inside 00:43:00
03/12 08:36 User_1 Outside 00:01:00
03/12 11:50 User_1 Inside 03:14:00
03/12 12:01 User_1 Outside 00:11:00
03/12 13:27 User_1 Inside 01:26:00
03/12 13:43 User_1 Outside 00:16:00
03/12 14:57 User_1 Inside 01:14:00
03/12 15:20 User_1 Inside 00:23:00
03/12 15:26 User_1 Outside 00:06:00
03/12 15:34 User_1 Inside 00:08:00
Because the data contains all users and multiple days I am attempting to do the calculations within tables grouped by Date and User.
I spent some time testing all 3 approaches presented below (List.Min, Table.FirstN & nested tables) and on a limited data set all of them do a great job.
However, when applied to a larger dataset (I have around 20000 rows for 1 month) the nested tables approach seems to be the fastest.
Thank you Eugene and Marc for helping and, more important, for teaching me something new.