25
votes

I am trying to count the number of times a sub-string appears within a column of string data in Excel. Please see the below example.

The column of string data (tweets) looks like this:

   A
1  An example string with @username in it
2  RT @AwesomeUser says @username is awesome

The column with "substrings" (Twitter screen names) looks like this:

   B
1  username
2  AwesomeUser

I want to use a formula to count the number of times that a substring from B1, B2, etc. appears in the strings in column A. For example: a formula searching for B1 would return "2" and a search for B2 would return "1".

I can't do it this way:

=COUNTIF(A:A, "username")

because COUNTIF only looks for strings, not substrings. This formula would always return "0".

Here's a formula I thought might do it:

=SUMPRODUCT((LEN(A:A)-(LEN(SUBSTITUTE(A:A,"username",""))))/LEN("username"))

Unfortunately, I have 16,000 entries in column B and tens of thousands in A, so counting characters won't work even on a high power PC (also, the result returned by the function is suspect).

I thought about using:

=COUNTIF(A:A, "*username*")

but COUNTIF requires a string with the star operators; I need to use cell references due to the volume of data.

My question: does anyone know how I can use a formula for this? If using COUNTIF, how do I get a cell reference in the conditional part of the statement (or use a function to substitute the string in the cell referenced within the conditional part of a COUNTIF statement)?

I know that I could parse the data, but I would like to know how to do it in Excel.

3
Note that the SUMPRODUCT formula you quote does a slightly different thing to Chris' COUNTIF function. COUNTIF will only count each cell once at most, so each cell either "scores" 1 or zero. The SUMPRODUCT formula will count ALL instances or "username" even if it appears more than once in any individual cell - if you want to use that formula you should restrict the range rather than using the whole columnbarry houdini

3 Answers

43
votes

You are nearly there, use

=COUNTIF(A:A, "*"&B1&"*")

Caveat:
This counts the number of cells in A:A that contain the string one or more times. It does not count the total number of instances of the string.

3
votes

A very Simple Excel Solution for occurrence of x-string ("ABC") into y-string ("ABC 123 ABC 23dfABC"):

  1. Identify length of y-string. Ly = len(y)
  2. Substitute x-string by x1-string (append only 1 char of any supported character to x-string. e.g. "ABC"=> "ABC_" ) and calculate length of new string y1-string. Ly1 = len(substitute(y,x,x1)).
  3. No of occurrence of x-string) in y-string = Ly1-Ly

Number of Occurrence: len(substitute(y,x,x1)) - len(y)

Nomenclature for Understanding: x-string: "ABC" x1-string: "ABC_" y-string: "ABC 123 ABC 23dfABC" y1-string: ""ABC_ 123 ABC_ 23dfABC_"

0
votes

Finding actual occurrencies via TEXTJOIN function

Using TEXTJOIN (available since version 2019 or MS 365) allows to get the actual number of any occurrencies in one or more columns (column A:A or e.g. even columns A:B).

Thus it avoids the Caveat of the COUNTIF function

"to count only the number of cells (in A:A) that contain the string one or more times. It does not count the total number of instances of the string."

Three Methodical steps

The formula shown (assuming a named search term MySearch) below executes only three logical steps to get the actual number of findings:

  • 1 join all filled cells in given column(s) via TEXTJOIN,
  • 2 replace the search terms in string 1 with a term length reduced by minus 1 and
  • 3 subtract the length of string 2 from length of string 1.
=LEN(TEXTJOIN("|",TRUE,A:B))-LEN(SUBSTITUTE(TEXTJOIN("|",TRUE,A:B),MySearch,LEFT(MySearch,LEN(MySearch)-1)))

MS Help TEXTJOIN function