I have 2 datasets, one large and one small. While functionality is most important here, efficiency matters as well, because the large dataset could be potentially tens of millions of records. Let's call my large dataset "Transactions" and my small one "Prices." Here's what I'm trying to do. In the transactions file, there are a bunch of values for 'Store'. For each 'Store', I would like to create a hash table of relevant 'Product', and for each unique 'Saledate', pull all product 'Price' into the output table, not just for the associated 'Product' but for ALL 'Products' found in the Transactions dataset for that 'Store'.
Here's a sample of the "Transactions" dataset:
Store Product SaleDate Price
A apple 1/1/2011 1.05
A apple 1/3/2011 1.02
A apple 1/4/2011 1.07
A pepper 1/2/2011 0.73
A pepper 1/3/2011 0.75
A pepper 1/6/2011 0.79
And here's a sample of the "Prices" dataset:
Product Saledate Price
apple 1/1/2011 1.05
apple 1/2/2011 1.06
apple 1/3/2011 1.02
apple 1/4/2011 1.07
...
pepper 1/1/2011 0.74
pepper 1/2/2011 0.73
pepper 1/3/2011 0.75
pepper 1/4/2011 0.75
pepper 1/5/2011 0.75
pepper 1/6/2011 0.79
...
mango 1/1/2011 2.40
mango 1/2/2011 2.42
...
And here's the desired output for this scenario(the asterisk denotes that the price was inserted from "Prices" into "Transactions" via lookup).
Store Product Saledate Price
A apple 1/1/2011 1.05
A pepper 1/1/2011 0.74 *
A apple 1/2/2011 1.06 *
A pepper 1/2/2011 0.73
A apple 1/3/2011 1.02
A pepper 1/3/2011 0.75
A apple 1/4/2011 1.07
A pepper 1/4/2011 0.75 *
A apple 1/6/2011 1.10 *
A pepper 1/6/2011 0.79
Basically, in pseudo-code, the idea is do loop through each store, look-up its list of distinct products and saledates, then find corresponding prices for those products and saledates in the prices data and insert them if they are not already there. In the example, since apple has a saledate of 1/1 in "Transactions", I need the saledate for pepper as well (from "Prices"). The same applies for 1/2, except vice-versa since the price is already there for pepper, but not for apple. 1/5 has a record in "Prices", but it's not needed, since it doesn't occur for either apple/pepper in "Transactions." For another store, different products exist, so pepper is not relevant at all, but mango is.
I've tried several approaches and can't get around the "double" look-up hangup that I think is necessary.
Here is a link to a previous question/answer related to this new question. The answer provides sample code to create the dummy tables. https://stackoverflow.com/a/36961795/214994