6
votes

This question is about how Excel's COUNTIF function treats different data types when used as an array formula.

There are lots of good posts out there detailing how to use COUNTIF for tasks such as extracting unique values from a list, for example this post. I've managed to use examples from this and other posts to solve specific problems, but I'm trying to get a deeper understanding of array formulas in order to adapt my formulas to new needs.

I came across a peculiar behavior of COUNTIF. In general, Excel seems to treat strings as "larger than" numbers, so that the following examples are valid:

Cell Formula      Returns
=1<2              TRUE
="a"<"b"          TRUE
="a">"b"          FALSE
=1<"b"            TRUE

Now, suppose range A1:A6 contains the following data set:

1
2
3
A
B
C   

For each cell in this set, I want to check how many of all the cells in the set that are smaller than or equal to that cell (a useful technique in more complex formulas). I enter the following array formula in range B1:B6:

{=COUNTIF($A$1:$A$6,"<="&$A$1:$A$6)} (CTRL + SHIFT + ENTER)

Based on the examples above comparing numbers and strings (also illustrated in Column D below), I would expect the output shown below to look like Column C. However, the array formula returns the result shown in Column B, which suggests that strings and number elements are counted separately by arraywise COUNTIF.

Column A     Column B     Column C     Column D
1            1            1            A1<"C" = TRUE
2            2            2            A2<"C" = TRUE
3            3            3            A3<"C" = TRUE
A            1            4            A4<"C" = TRUE
B            2            5            A5<"C" = TRUE
C            3            6            A6<"C" = FALSE

So the question is how to produce the output in Column C? (EDIT: Just to clarify, I'm specifically looking for solutions that make use of COUNTIF's array properties.)

Any insight into why arraywise COUNTIF apparently behaves differently than the single-cell examples would also be much appreciated.

NOTE: I've translated the examples from a non-English version of Excel, so I apologize in advance for any typos.

PS. For a background, I ran into this problem when I tried to build a formula that would both extract unique values from a list with possible duplicates, and sort the unique values in numerical/alphabetical order. My current solution is to do this in two steps. One solution for how to do it in one step is proposed here.

4
I believe you may be using COUNTIF incorrectly.Tim Biegeleisen
The syntax {=COUNTIF($A$1:$A$6,"<="&$A$1:$A$6)} is actually extracted from a much more complex formula that behaves precisely as expected, and which as been used by others to solve similar problems. This post illustrates an intermediate step in that formula.Egalth

4 Answers

2
votes

The different behavior can easily shown if you compare

=COUNTIF($A$1:$A$6,"<=A")

with

{=COUNT(IF($A$1:$A$6<="A",1))}

The first will only get text values from $A$1:$A$6 because it is clearly text to compare and it is faster ignoring other values then. =COUNTIF($A$1:$A$6,"<=3") will only get numeric values from $A$1:$A$6 because of the same reasons. Even if the criterion would be a concatination with a cell reference, then the concatination would be the first process and would lead either to "<=A" or "<=3". So it is ever clear what to compare, text or numbers.

The second first needs an array of the comparisons, then performs the IF, gets so an array of 1 or FALSE and counts then. But the "A" could also be a cell reference. So it is not clear what to compare at the beginning and the first array has to compare all values in $A$1:$A$6.

So COUNTIF(S) and SUMIF(S) cant be used comparing mixed text and numeric data.

The solution is shown already by XOR LX.

Btw.: with your PS. For a background you should consider the following solution from an German Excel site: http://www.excelformeln.de/formeln.html?welcher=236.

In your linked example:

Formula in B2 downwards

{=INDEX($A$2:$A$99,MATCH(LARGE(COUNTIF(A$2:A$99,">="&A$2:A$99)+99*ISNUMBER(A$2:A$99),ROWS($1:1)),COUNTIF(A$2:A$99,">="&A$2:A$99)+99*ISNUMBER(A$2:A$99),0))&""}

In this solution the COUNTIF compares with >= so the biggest text or number will count lowest and so get the lowest position. All number positions are added with 99. So they are ever greater than all possible text positions. So we have a descended sorted array. Then, using LARGE, the list is created from the highest to the lowest position.

3
votes

First of all, excellently laid-out question, and on an interesting topic to boot.

I also raised an eyebrow when I first came across this behaviour of the COUNTIF(S)/SUMIF(S) functions. In their defence, I suppose we could construct situations in which we actually want strings and numerics to be considered separately.

In order to construct your required in-formula array, you will need something like:

MMULT(0+(TRANSPOSE($A$1:$A$6)<=$A$1:$A$6),ROW($A$1:$A$6)^0)

though note that the necessary transposition will mean that any set-up which includes this construction will require committing with CSE.

Regards

0
votes

I doubt that countif is the right function for what you want to achieve here.

try this (ctrl+shift+enter):

={SUM(IF(A1>=$A$1:$A$6,1,0))}

You will get

1
2
3
4
5
6

PS: CountIf is an basically an array function internally. Using it in another array function results into multiple array functions and their behaviour becomes complex. Array functions are best used with clear logical path.

As tested in Excel 2013, you will only get 1 in all results instead of what was proposed in Column B.

Currently, in the function provided by you, countif cannot figure out which cell to compare to which cell. Array functions expand ranges and then perform the provided action. Therefore, it is comparing each cell to same cell and resulting into 1.

0
votes

Try this FormulaArray in B1 then copy till B6:

=SUM(($A$1:$A$6<=$A1)*1)