Note: this isn't a regular "how does data validation work" post, so bear with me.
I'm trying to get dynamic cascading data validation working for columns in a table -- i.e. let each cell in e.g. column D have a range of (data validation list dropdown) options, the user selects one of them, then the cell next to it in column E shows all possible dependant options (looked up from some lookup table using the selection in column D as lookup key), the user selects one of those as well, etc..
I know how to set up such cascading data validation with named ranges and such, using formulas to create the cascading option lists. However, these rely on pre-configured known named ranges or tables per input cell which are used to deliver the available options, and thus only work for single (or a handfull of) input cells and not for all input cells in a whole table column which needs to be able to expand (add rows etc).
I already experimented with preceding each dropdown with a column showing the applicable options on that row (using formulas -- works fine, except for sheer size when more than x options need to be shown), but the only real user-friendly solution is to just filter the options in the actual dropdown list itself.
I could reserve x helper columns per dropdown column to store all available options for these lists (x being the max. nr. of options I'd ever expect to be available for all lists combined), and populate each of these with some RANK formula to extract all possible options in a contiguous range, and then use a formula to determine how many of these columns to use in the dropdown's list formula. Some of these lists are however quite long, and I have multiple of these constructs to add to my table, so this will fast become a big performance hog and a mess. I'm already using up to column 400 in this table (with lots of complex formulas in them), and Excel already starts moaning about it; adding an extra few hundred for these monster helper lookup columns just isn't an option.
So I already gave up using pure formulas, and went to trusty old VBA. I already have a nicely working system in place where:
- I intercept the sheet's OnChange and filter out changes to table cells,
- look if the changed cell forms the basis for one of the dropdowns, and if so
- just collect all options for each dependant dropdown in VBA itself, and
- replace the existing data validation lists of all follow-up dropdowns with the hardcoded list I just collected.
And when all was fine and done and I was quite pleased with the result, I replaced the test data in the config with all possible options, and the document just got corrupted on saving. After recovery on open, some of the dropdowns now had "N/A" as their dropdown list config. After some digging in the sheet's XML (you gotta love this .xlsm zip file structure instead of .xls), I found out that these hardcoded dropdown lists are limited to 255 chars max... making this option instantly useless for me.
I could use hidden helper option columns 2.0; I now do not need the complex lookup formulas in them but can let VBA list all available options in there, and then let it base the dropdown formula on the used column range. But like I said I'm already stressing Excel enough with my table's size, so adding a few hundred extra columns really isn't an option, however simple they are. I could also have a separate 'scratch pad' sheet per dropdown column, and let VBA write the options in there (sheet X for dropdown column X, write options for cell X2 on row 2 of sheet X in the columns), but that's just too ugly of a hack for me to accept right now.
Another option is to ditch Excel's dropdown validation lists and dynamically overlay them with Forms or ActiveX combobox controls (cell select event = VBA overlays a combobox control over it, adjusts it's available options, sets it's linked cell to the source table cell, and sets the focus to that). These are far superior in user friendlyness anyway and seem perfect, but I already used such a system in another extensive Excel application I made in years past, and it only caused strange intermittant Excel hangups, crashes and file corruptions, after which I had to scrap the system and revert to regular data validation again.
A last option is to add a VBA data entry userform which pops up if the user e.g. tries to alter the values, but then why have an Excel table in the first place... I could then better just make a regular Windows app for it in .Net or such.
Any insight into other out-of-the-box options is highly appreciated! (Siddharth, are you there? :) )
Update
There just isn't any satisfying option it seems, so I've created a convoluted VBA solution instead. In case anyone else needs to add such a solution as well, I've added the general design below (no code since that would just be too much to list here);
- There's one main table holding the (for now) 9 dropdown columns, divided over 3 tiered cascading branches.
- There's 3 lookup tables which mirror these 3 paths, and which hold the configuration of which option combinations are available (for now unnormalized lookup tables, but this is my next challenge).
- I've added one config table which describes these three paths;
- column 1 lists the main table's dropdown columns,
- column 2 lists the lookup table to use for that path, and
- column 3 lists that lookup table's corresponding columns.
- I've added a "Dropdown cache" scratchpad sheet to dynamically store available dropdown options.
- I've handled OnChange for the main table's cells, looking if dropdown cells change that are listed in one of the cascading paths. If so, it:
- Evaluates which values are entered into the dropdown columns preceding (and including) this changed dropdown column,
- Goes through all the rows in the relevant lookup tables, finding unique matches given these preselected values using a matching algorithm (still a W.I.P.),
- Alters all follow-up dropdowns to use these available options.
- All available options for a single dropdown cell are listed on a single row assigned to that dropdown on the dropdown cache sheet. They are handed out on a first-come-first-serve basis.
- When a dropdown's option list needs to change:
- If the dropdown's data validation is already set I find it's cache row by looking at the data validation formula; if not, I take the next available cache row (basically UsedRange.Row + 1). By looking at the data validation formula I'm free to delete rows from the cache sheet, since Excel will patch up the dropdown data validation formulas to now point to the correct rows.
- The first cell on that row is an "in use" indicator formula linking back to the dropdown cell itself. If the original dropdown cell (table row) is deleted, the formula will automatically transform into a #REF!, indicating that cache line is now stale.
- All options are listed from column 2 onwards, discarding any excess old options if there's less options now than previously.
- The dropdown's data validation formula is set to use the newly set option range.
- If there's only one option in the list, this one is pre-filled into the dropdown as well (since we're using VBA anyway, let's add some user friendlyness as well).
- I've handled OnSave so that I can tidy up the dropdown cache sheet;
- I pre-scan every cell in the first column (link back column) of the cache sheet, and look if it has become an #REF! error, which happens if that dropdown's table row has been deleted.
- All to-be-deleted cache rows are batched in as large as possible row ranges, which I delete in batches.
All in all a nicely working solution, but a pity it had to come to this, and I wonder how it'll perform once the file will be used for real with lots of rows added to the table...
And then I found that when sorting the table's rows, the data is sorted nicely, but the now unique data validation formulas don't follow the data... and no Change event is fired either to be able to rectify anything... and the cell references from the cache sheet back into the table also don't follow the cells they represent... thank you Microsoft!!!
So the design is now updated to:
- The range to use for each dropdown is now listed in the cell preceding the dropdown instead of in the dropdown's own validation formula. This does sort along with the data.
- The dropdown data validation formula is now set to
=INDIRECT(<cell before this column>). - The linkback cell used on the cache sheet... I don't know how to fix that...
And using this volatile formula this has as a consequence that Excel asks to save changes whenever someone even only hints at touching these dropdowns...
Gotta love that Excel.