1
votes

I have a table called 'Table1' which has two columns being 'Name' and 'Date'.

In Power Query Editor, I want to create a custom column called 'Last Date' to find the most recent date when the given Name appeared last time.

For instance, B firstly appeared on 5/7/2019, then on 11/8/2019, and last on 17/9/2019. So the Last Date for B when the Date is 11/8/2019 is 5/7/2019, and the Last Date for B when the Date is 17/9/2019 is 11/8/2019. Please see below example.

enter image description here

I did some research online but got confused with custom function, VAR, Measure, MAXX, CALCULATE, FILTER, etc...

I am not familiar with DAX nor Advanced DAX Editor so if possible please provide detailed answer or clear steps of how to apply your solution.

Let me know if I did not make myself clear. Otherwise your prompt help is greatly appreciated!!

3

3 Answers

1
votes

A custom function might look something like:

let
    AddLastDateColumn = (someTable as table) as table =>
        let
            initialHeaders = Table.ColumnNames(someTable),
            sorted = Table.Sort(someTable, {{"Date", Order.Ascending}, {"Name", Order.Ascending}}),
            merged = Table.NestedJoin(sorted, {"Name"}, sorted, {"Name"}, "$joined", JoinKind.LeftOuter),
            lastDateColumn = Table.AddColumn(merged, "Last Date", each
                let
                    maxDate = [Date],
                    filtered = Table.SelectRows([#"$joined"], each [Date] < maxDate),
                    lastRow = if not Table.IsEmpty(filtered) then Table.Last(filtered)[Date] else null // Could use Table.Max, but data is already sorted.
                in lastRow,
                type nullable date),
            dropColumns = Table.SelectColumns(lastDateColumn, initialHeaders & {"Last Date"})
        in dropColumns
in
    AddLastDateColumn

If you save the above as its own query, you can then access it in other queries. For example, if you save the above as a query named AddLastDateColumn, you can then access it in other queries (as below):

let
    sourceTable =
        let
            nameColumn = {"A", "B", "C", "D", "E", "F", "G", "A", "B", "C", "D", "E", "F", "G", "A", "B", "C", "D", "E", "F", "G"},
            dateColumn = {#date(2019,7,1), #date(2019,7,5), #date(2019,7,14), #date(2019,7,23), #date(2019,7,24), #date(2019,8,1), #date(2019,8,5), #date(2019,8,10), #date(2019,8,11), #date(2019,8,17), #date(2019,8,23), #date(2019,8,25), #date(2019,9,3), #date(2019,9,4), #date(2019,9,13), #date(2019,9,17), #date(2019,9,23), #date(2019,9,27), #date(2019,9,28), #date(2019,10,6), #date(2019,10,9)},
            toTable = Table.FromColumns({nameColumn, dateColumn}, type table [Name = text, Date = date])
        in toTable,
    invokeFunction = AddLastDateColumn(sourceTable)
in
    invokeFunction

I was looking at the documentation for Table.NestedJoin (https://docs.microsoft.com/en-us/powerquery-m/table-nestedjoin). It seems to have a parameter named keyEqualityComparers:

An optional set of keyEqualityComparers may be included to specify how to compare the key columns.

I don't have time to look into its capabilities and the syntax it requires, but perhaps it could be used to more elegantly specify the JOIN criteria: "Name must match exactly. Right Date must be the greatest date that is less than Left Date".

In any case, I think the above function should do what you were after.

1
votes

Power Query and Power BI are two different tools.

Power Query is designed to process data. It uses language called "m". For example, if you need to import and merge files, fix bad data, etc - that's what you use.

Power BI is designed to analyze data. It uses language called "DAX". Most of the time, Power BI/DAX are useful to design interactive analytics - reports that respond to slicers, filters, etc.

Some people use DAX to enhance their data instead of Power Query - like adding a calculated column. Personally, I think it's a bad idea, but perhaps Power Query/m is too non-intuitive for them and DAX is easier. I will explain how to add a calculated column using DAX. If for some reason you prefer Power Query, please mention that in your question.

First, you must be in a main Power BI window, not in a Power Query window. Go to your data model, and select your table. On tab "model", click "add column". Enter DAX formula:

Last Date = 
  VAR Current_Date = Table1[Date]
  VAR Current_Name = Table1[Name]
RETURN
  CALCULATE( MAX(Table1[Date]), 
             Table1[Date] < Current_Date,
             Table1[Name] = Current_Name )

The formula will generate a new column with the previous date for each Name.

How it works:

  • When calculating the column, Power BI iterates the table record by record. For each record, we store its date in a variable "Current_Date", and its name in "Current_Name";
  • Then, we need to find a date that is: a) smaller than the current_date, and b) only for the current names. This is accomplished by calculating MAX date with 2 filters: date must be < then Current_Date, and name must be = Current_Name.

For example, for name "B", first highlight: it will first filter the table leaving only records where name = "B" (3 records). Then, it will further filter these 3 records to find dates < 11/08/2019, which is one record: 5/07/2019.

As a side note, I recommend to read at least one good book on Power BI/DAX, or take an online training class. This tool is not simple, and you will waste a lot of time if you try to learn it by trial and error.

0
votes

After being directed to the right direction by RADO, I searched a couple more similar Power Query cases using M language and built-in GUI, I have found a way to solve my question.

1) firstly sort my table ascending in the following orders: Name, Date;

2) add two index columns, one starts with 0 and the other starts with 1, and then merge the table itself matching index column 0 with index column 1;

3) expand the merged columns [Name.1] and [Date.1], and then add a custom column called "Last Date" with the following if function

if [Name]=[Name.1] then [Date.1] else null

4) Remove other columns just leave [Name], [Date] and [Last Date] column.

The above method is a bit tedious if I have a few tables that need the same operation. If anyone can provide a 'Custom Function' solution using a single query which can be applied to other tables that will be greatly appreciated!