1
votes

I've been working on a large data set and ran into what looks to be a very odd Excel bug.

I have isolated the problem down to a CountIf formula with a partially fixed range that is used to check whether an item is a duplicate in the rage above it e.g. =COUNTIF(C$7:C17,C17) / =COUNTIF(C$7:C18,C18) copied down the full data set.

When copied down to a large number of rows (say 10,000) the formulas paste and and calculate fine, it is when you try and do anything else with them that the bug occurs. e.g. if you try and Delete / Clear all / copy and paste values on all the cells, the cell progress bar comes up and moves incredibly slowly for what should be a simple process.

If you use a CountIf for the whole range, clearing the value of the cells is pretty much instant. A SumIf for the same partial range also deletes immediately.

You should be able to replicate this yourself by using =RANDBETWEEN(0,10000) for 10k rows, copying and pasting values for the random numbers to isolate them as being the issue and applying the partial range countif vs full range countif, then selecting the partial range cells and hitting delete. Note that the issue still arises if calculation is set to manual.

For reference, I am using Excel 2013 64bit on Windows 7 64 bit.

If anyone is up for confirming the bug or knows about an existing bug and potential solutions, it would be much appreciated.

Edit: Steps to reproduce the issue

  1. Create a new file (save and exit other Excel files!)
  2. To create the data to count, in A2 enter =RANDBETWEEN(0,10000) if you have a slow CPU or RANDBETWEEN(0,20000) if you have a faster CPU
  3. Copy A2 down to row A10k or A20k.
  4. Copy and paste values for column A
  5. In B2, enter =COUNTIF($A$1:A2,A2) and copy down
  6. Once sheet is fully calculated, select all the CountIf cells in column B and use the Clear All function. This is the stage where Excel seems to freeze for 2 minutes for 10k rows for me (have an i7 processor). If it is instant, then let me know what version of Excel and windows you are using.
3
are you actually asking something here? in any case, please report a bug to MicrosoftAprillion
Can you be extremely specific in how to reproduce this? I tried to follow your steps and I cannot reproduce anything that is "bug" note worthy. Took my cp no time at all to copy all 10k rowschancea
Aprillion - I'm trying to identify if this is reproducible by others and if it is, whether there other formulas that can achieve the same thing without creating the issue.jimwinz
chancea - I have edited in the specific steps to recreate the issue. Let me know if you need anything else.jimwinz
Alright I got my Excel to freeze for 30s~1m following those steps...I used 20k everythingchancea

3 Answers

4
votes

As Byron suspected the time is taken by Excel updating the dependency trees.

If you switch off the dependency tree maintenance by setting Workbook.ForceFullCalculation to true then clearing the 20000 COUNTIF cells is virtually instantaneous.

Note that using this option makes every calculation a full calculation: every formula becomes volatile and the status bar always shows calculate.

0
votes

Edit:

Following Charles' excellent answer, which forces all formulas in a sheet to be volatile for the session, I thought I would try rebuilding the CountIf with a volatile offset function to define the range, and sure enough it worked. So the formula I used was: =COUNTIF(OFFSET(A2,-ROW(A2)+2,0):A2,A2)

Whether it is better to have a non volatile function that you need to turn on ForceFullCalculation to alter (delete, past values etc.) vs having a large number of volatile formulas will be up to you.

Original post:

OK, so I have been trying a few other formulas to see if I could overcome the issue.

Strangely, using =SUMIF($A$1:A2,A2,$A$1:A2)/A2 would occasionally work, clearing all the data in column B instantly, but was not consistent and would often freeze like the =COUNTIF($A$1:A2,A2) formula used in the original post.

I also tried some other formulas, such as =SUMPRODUCT($A$1:A2*($A$1:A2=A2))/A2 as well as the same configuration using Sum in an array formula but never got an instant clear.

I also tried copying blank cells and posting them over the formula cells, but that still froze for the same 30s - 60s that clearing them did. You could cut and paste and that would be fine, but trying to clear would result in the freeze. Deleting the entire column also did not speed up the process. I also tested Excel in safe mode, to no avail.

Given the lack of success, I am guessing that Byron's comment on Excel updating the internal model after each cell is cleared due to the different ranges for each formula is causing the freeze - though I still don't really get why pasting the formulas in the first instance would take a fraction of the time as clearing / deleting them!

As such, I can only conclude that irrespective of the reason, there does not appear to be a formulaic solution that does not cause the freeze, so I will have to use pivot tables / remove duplicates.

I appreciate everyone who tried to replicate the issue - for whatever it's worth, it looked like 3 of use had the issue and 1 did not.

-2
votes

This is caused by your use of the RANDBETWEEN function. I was unable to replicate your results after I used static data in place of this formula for 1 million rows.

The problem is that when you clear the data from the cells it does not happen for all of the cells at the same time. Excel has to step through each one and update a few things(Value of the cell, formatting, and formula calculation chain) and if anything in the formula chain has changed then all of the formulas recalculate and upon every recalculation all of the RANDBETWEEN values also get updated which causes another calculation of all of the COUNIF formulas that are affected.

So, for 20k RANDBETWEEN values and 20k COUNTIF formulas you get about 6 million calculations and up to 18 million writes to disk(depending on caching) when you try to delete all of those formulas.

So, no this is not a bug it is simply an unintended consequence of using the RANDBETWEEN function.