3
votes

I have two datasets, each in 3 separate columns, within the same sheet.

The 1st data set contains data on invoices, and the 2nd contains data on the individual items that are included on the invoices.

  • 1st data set (Invoices): CustomerNumber (long), InvoiceDate (date), InvoiceAmount (double)
  • 2nd data set (Items): CustomerNumber (long), ItemPurchaseDate (date), ItemCost (double)

There is no unique identifier to relate the two data sets.

General rule is that:

  • Items where CustomerNumber=X, and ItemPurchaseDate < InvoiceDate should be related to Invoice where CustomerNumber=X and InvoiceDate > ItemPurchaseDate, but InvoiceDate < nextInvoiceDate for the same CustomerNumber (if there are multiple invoices)

(Try to imagine a situation where you're trying to match the items to nearest invoices.)

For example like this

enter image description here

So, not all items can be related to an invoice, and sometimes we will not be able to find any items for an invoice.

I wrote code, but it is inefficient as it will go through all the items for all the invoices. In my case, there are 9000 invoices and 260k items.

But to demonstrate (I've left out the bits that are not important):

For Each InvCustNo In InvList 'look into every Invoice
    For Each ItemCustNo In ItemList 'look into every Item
        'check if we're looking at the new client,
        ' if yes - reset the date holder
        If prevInvCustNo <> InvCustNo Then
            prevInvDate = 0
        End If
        InvDate = InvCustNo.Offset(0, 1).Value 'define the InvoiceDate
        ItemDate = ItemCustNo.Offset(0, 1).Value 'define ItemDate
        If InvCustNo = ItemCustNo And InvDate > ItemDate And _
            InvDate >= prevInvDate And _
            prevInvDate <= ItemDate Then 'perform comparison
            TempSum = TempSum + ItemCustNo.Offset(0, 2)
        End If
    Next ItemCustNo
    prevInvDate = InvDate 'update the date-1
    prevInvCustNo = InvCustNo 'update the client-1
    InvCustNo.Offset(0, 7).Value = TempSum 'print values
    TempSum = 0 'reset sum
Next InvCustNo

The above code should do the work, but not on the amount of data I have, for two main reasons:

  • it will go through 9.000 * 260.000 rows = 2.34 billion operations
  • it will write 9000 results, one at a time

Two questions came to mind

  • can I write all the results in one go? (Arrays?)
  • can I limit my matching somehow, so that I don't do 2.34 billion operations (I guess with sorting I can)

Any thoughts & examples (which I could use as starting points) would be very helpful.

1
no unique identifier that could relate two data sets What about customer number? - findwindow
have you thought about MS Access and queries to solve this issue? as @findwindow points out you do have a unique identifier - Scott Holtzman
@ScottHoltzman Sorry, maybe my question is not clear enough. I don't have an unique identifier, as CustomerNumber is repeated on many invoices and many items. Solely by using the CustomerNumber I can't relate the Items with Invoices. - Matej Jurkić
I see your point, but have you thought about Access and the power of SQL? Since you do have unique customer identifiers and you can relate the dates together given your logic? - Scott Holtzman
@ScottHolzman I could use Access and SQL but the idea is to do it in Excel as some additional calculations & formatting will have to be made for which SQL would not be suitable. - Matej Jurkić

1 Answers

2
votes

I have a non-vba solution that I think may work for you.

Given the data in the screenshot (which mirrors your example)

enter image description here

Ensure that the data in columns A:C only is sorted

  • A to Z by CustomerNumber
  • Then Newest to Oldest by InvoiceDate

Then place the following formula in G1 (and drag down):

=IF(E2>MAX(IF(A:A=D2,B:B)),"No Invoice Yet",INDEX(B:B,(MATCH(E2,IF(A:A=D2,B:B),-1))))

as an Array (Press Ctrl + Shft + Enter) instead of just Enter.

The formula works in the following way

  1. Compares the ItemDate to the Max InvoiceDate for the given customer number. If the ItemDate is newer than the latest invoice date, it returns "No Invoice Yet"
  2. Otherwise It looks for the InvoiceDate that is closest to the ItemDate (think next possible InvoiceDate after ItemDate), for the respective customer number, and returns this date.