In powerBI I can import a table of form records. The form is from a web application and each time the form is updated, a new row is added to the table. Each form has a unique identifier. So the table has multiple rows for each unique identifier and a datestamp for each update.
Using group by in powerBI I can filter the table on the first or the last record for each unique identifier using min() or max() on the datestamp. However I want to filter on the second datestamp of each unique identifier.
I can't work out how to do it in powerBI so I thought import with a python script would be the solution and got halfway before getting stuck.
The process I have so far is : use pandas, create a dataframe from the table, group the table by unique identifier with a lambda function on the date column that selects the 2nd record.
import pandas as pd
df=pd.read_excel (r'C:\..\form_records.xlsx')
df=df.groupby('ID')['datestamp'].apply(lambda x: x.nsmallest(2).max())
print(df)
In jupyter, this outputs the ID column and the correct datestamp next to each ID in a column that doesn't have a header. Like so :
ID
REC0162259 2020-06-24 07:43:05
REC0162260 2020-06-24 07:44:00
REC0162261 2020-06-24 09:41:05
REC0162262 2020-06-24 08:13:29
REC0162263 2020-06-24 09:40:24
...
REC0162409 2020-06-24 16:23:30
Name: datestamp, Length: 149, dtype: datetime64[ns]
This looks promising but when I use the script as a data source in powerBI, no data tables are present :
DAX :
let
Source = Python.Execute("import pandas as pd#(lf)df=pd.read_excel (r'r'C:\..\form_records.xlsx'')#(lf)df=df.groupby('ID')['datestamp'].apply(lambda x: x.nsmallest(2).max())#(lf)print(df)"),
df1 = Source{[Name="df"]}[Value]
in
df1
I don't know how to transform the new grouped data into a table in python so that it is available for import. If I don't do the groupby step, the original data will import perfectly as if I had done a powerBI excel import.
Note that the original data contains other columns apart from ID and datestamp but I don't necesarily need those for the analyisis.