1
votes

EDIT

Workable File: https://docs.google.com/spreadsheets/d/1IhxKqnj62vssttZA73hF-S_TvynQNPdtSyb7wWq3Kpo/edit?usp=sharing

Screenshot: http://i.imgur.com/d25R8lc.png

OBJECTIVE

I am currently creating a pricing model for CUSTOMERS, who order specific PRODUCTS, and CONFIGURATIONS of said product. The end result is to SUM SALES and LOOKUP a table to place the customer into a category.

APPROACH

  • 500,000+ Rows of data
  • SUMIF: Sum Range = SALES; Criteria = (CUSTOMER ID, PRODUCT ID, CONFIGURATION)
  • VLOOKUP: Take SUMIF value and partially match to PRICING TIER table, return Pricing Tier (e.g "Silver Tier", "Gold Tier", "Platinum Tier")
  • Once pricing tiers have been defined, create a pivot table for easy filtering

PROBLEM(S): Processing time for rows is 30-60 minutes. I've tried breaking out the SUMIFS statements from the VLOOKUP to calculate them independently, but had no measurable success. I've also tried making SQL queries to the server (using DSUM) to reduce the local workload, but this also had no positive effect in processing time.

QUESTIONS

  1. How can I efficiently process large amounts of information and properly categorize CUSTOMERS based off of their SALES, PRODUCT ID, and CONFIGURATION criteria?
  2. I've been considering doing this in VBA, but am not sure whether or not VBA is actually more efficient than a cell formula.
  3. Would add-in utilities offer any solution (e.g power pivot or power query)?
1
With 500,000 rows of data, array formulas are not the way to go. VBA would probably be best. But it is very difficult to give you a more specific answer without seeing a sample of the data, the expected outcome and the actual formula that is currently working. - Scott Craner
Noted. Would a file upload be preferred, or would a screenshot suffice? - jonplaca
Preferably a mock up of the data as text, barring that a screenshot. - Scott Craner
@ScottCraner: please see attached: Workable File: docs.google.com/spreadsheets/d/… Screenshot: i.imgur.com/d25R8lc.png - jonplaca
My guess is you have other formulas in the table that are causing the time. Even with 500,000 data points, this should not take 30 minutes to calculate. - Scott Craner

1 Answers

2
votes

With 500 thousand rows I would consider using Access or another database application. It's just a lot of data for VLOOKUPs and SUMIFs and even Excel VBA is going to be slow if you do tons of comparisons.

SUMIFs and VLOOKUPs are very easy to replicate in queries, and will be much much faster.

If using ACCESS, try using this SQL Query:

SELECT [Customer ID],[Product ID],[Configuration],[Pricing Tier] 
FROM 

    (SELECT [Data Table].[Customer ID], [Data Table].[Product ID], 
    [Data Table].[Configuration], Sum([Data Table].Sales) AS [Sales Sum]
    FROM [Data Table]
    GROUP BY [Data Table].[Customer ID], [Data Table].[Product ID], 
    [Data Table].[Configuration])

AS T,
[Pricing Table] P
WHERE [Annual Sales] = (SELECT Max([Annual Sales]) 
                       FROM [Pricing Table] 
                       WHERE [Annual Sales] <= [Sales Sum]);

This should (hopefully) do the trick. Let me know!