0
votes

In powerBI I can import a table of form records. The form is from a web application and each time the form is updated, a new row is added to the table. Each form has a unique identifier. So the table has multiple rows for each unique identifier and a datestamp for each update.

Using group by in powerBI I can filter the table on the first or the last record for each unique identifier using min() or max() on the datestamp. However I want to filter on the second datestamp of each unique identifier.

I can't work out how to do it in powerBI so I thought import with a python script would be the solution and got halfway before getting stuck.

The process I have so far is : use pandas, create a dataframe from the table, group the table by unique identifier with a lambda function on the date column that selects the 2nd record.

import pandas as pd
df=pd.read_excel (r'C:\..\form_records.xlsx')
df=df.groupby('ID')['datestamp'].apply(lambda x: x.nsmallest(2).max())
print(df)

In jupyter, this outputs the ID column and the correct datestamp next to each ID in a column that doesn't have a header. Like so :

output_1

ID
REC0162259   2020-06-24 07:43:05
REC0162260   2020-06-24 07:44:00
REC0162261   2020-06-24 09:41:05
REC0162262   2020-06-24 08:13:29
REC0162263   2020-06-24 09:40:24
                     ...        
REC0162409   2020-06-24 16:23:30
Name: datestamp, Length: 149, dtype: datetime64[ns]

This looks promising but when I use the script as a data source in powerBI, no data tables are present :

Source Step

Navigation Step

DAX :

let
    Source = Python.Execute("import pandas as pd#(lf)df=pd.read_excel (r'r'C:\..\form_records.xlsx'')#(lf)df=df.groupby('ID')['datestamp'].apply(lambda x: x.nsmallest(2).max())#(lf)print(df)"),
    df1 = Source{[Name="df"]}[Value]
in
    df1

I don't know how to transform the new grouped data into a table in python so that it is available for import. If I don't do the groupby step, the original data will import perfectly as if I had done a powerBI excel import.

Note that the original data contains other columns apart from ID and datestamp but I don't necesarily need those for the analyisis.

1

1 Answers

0
votes

solved with ...to_frame() with help from the below posts :

https://technology.amis.nl/2019/10/11/convert-groupby-result-on-pandas-data-frame-into-a-data-frame-using-to_frame/

Converting a Pandas GroupBy output from Series to DataFrame

import pandas as pd
df=pd.read_excel (r'C:\..\form_records.xlsx')
grouped_df=df.groupby('ID')['datestamp'].apply(lambda x: x.nsmallest(2).max()).to_frame(name="second_update").reset_index()
print(grouped_df)