0
votes

I am trying to extract all cell/range addresses appear with in a formula in a Google Sheets cell.

Formulas by nature might be very complicated. I tried many patterns, that works in web testers, but not on google sheets re2.

The following example shows two issues. Maybe I misreading the matching results, but as I understand the are 4 Matches.

Formula (ignore the logic):

=A$13:B4+$BC$12+$DE2+F2:G2

Regex:

((\$?[A-Z]+\$?\d+)(:(\$?[A-Z]+\$?\d+))?)

Expected result:

[A$13:B4,$BC$12,$DE2,F2:G2]

Here (if I am not misreading the results) it looks OK. I am not sure if the group matching displayed are also considered as matches as it is stated "4 matches, 287 steps"

However in google sheets returns all Match 1 results

[A$13:B4,A$13,:B4,B4]    

The other matches are ignored So I guess the question is how to convert the regex to re2 syntax?

Update: Following player0 comments, maybe I was not clear. This is only a simple example, to isolate other issues I have. This one is just a string containing addresses in few relative and absolute formats. However, I am looking for a wider general solution that will fit any possible formulas that might contain formulas and references to other sheets. For example:

=(STDEVA(Sheet1!B2:B5)+sum($A$1:$A$2))*B2

Expected results here is Sheet1!B2:B5,$A$1:$A$2,B2

This formula contains two formulas and reference to another sheet. Still ignoring here from Named Ranges and other formula possible references that I am currently can not think of. Also, the square brackets [] are irrelevant, it was just way to display the results, and actually is copied from Logs as it is all done within a script.

3
share a copy of your sheet with example of desired outputplayer0
What if you don't use any capturing groups? \$?[A-Z]+\$?\d+(?::(?:\$?[A-Z]+\$?\d+))? regex101.com/r/A5yKb5/1The fourth bird
@player0 Guess I was not clear enough. Please see my edit.OJNSim
@Thefourthbird As I am not a regex master, I don't fully understand the grouping usage, however I do understand it will prevent the group matching. Anyway, in google sheets still returns only the first match A$13:B4OJNSim

3 Answers

2
votes

try:

=INDEX(SUBSTITUTE(TEXTJOIN(",", 1, 
 IFNA(REGEXEXTRACT(SPLIT(SUBSTITUTE(FORMULATEXT(A3), "'", "♥"), 
 "+-*/^()=<>&"), 
 "(?:.+!)?[A-Z$]+\d+(?::[A-Z$](?:\d+)?)?|(?:.+!)?[A-Z$]:[A-Z$]+"))), "♥", "'"))

enter image description here


or longer:

=INDEX(SUBSTITUTE(TEXTJOIN(",", 1, 
 IFNA(IFNA(REGEXEXTRACT(SPLIT(SUBSTITUTE(FORMULATEXT(A3), "'", "♥"), 
 "+-*/^()=<>"), "(?:.+!)?[A-Z$]+\d+(?::[A-Z$](?:\d+)?)?"), 
 REGEXEXTRACT(SPLIT(SUBSTITUTE(FORMULATEXT(A3), "'", "♥"), 
 "+-*/^()=<>"), "(?:.+!)?[A-Z$]:[A-Z$]+")))), "♥", "'"))

enter image description here

0
votes

Seems, you can use

[A-Z$]+\d+(?::[A-Z$]\d+)?

See a demo on regex101.com.

0
votes

I figured out a better way without splitting, by using the /g flag. However, this works in a script and not by using Sheets internal regex function (ie REGEXEXTRACT), as I couldn't figure out how to format the regex string within a cell that will contain the /g flag and REGEXEXTRACT will accept as a valid regex.

Here's the code:

/* Find all predessesor cells of input range 
*/
function findPredecessor(rng){
 
  var formualaText = rng.getFormula();
  
  /* addMatchesRegex
  * supports all A1Notation addresses 
  * the 2nd regex after the | operator will match all column addresses (A:A, Sheet2!b:B, etc)
  * some NamedRanges with names like NameRange1 
  * Does not support - NamedRange with names including dot, not ending with digits 
  */
  var addMatchesRegex = /(([\w .'!]+)?(\$?[A-Z]+\$?\d+\b)(:(\$?[A-Z]+\$?\d+))?)|([\w .'!]+)?[A-Z]+:[A-Z]+/gi; 
     
  var addMatches = formualaText.match(addMatchesRegex);
  
  Logger.log("%s add matched: %s",addMatches.length,addMatches);
  
  /* fullMatchRegex
  *  modify addMatches to return also strings like
  * 1. SUM, IFERROR, etc - internal sheets functions.
  * 2. NamedRanges
  * 
  */
  var fullMatchRegex = /(([\w .'!]+)?([\$A-Z.\d]*)(:(\$?[A-Z]+\$?\d*))?)/gi; 
  
  // match regex with formula
  var fullMatches =  formualaText.match(fullMatchRegex);
    
  Logger.log("Full matches list: %s",fullMatches);
  
  var namedRangesAdd = analyzeMatch(addMatches,fullMatches);
    
  Logger.log("%s total predecessors: %s",namedRangesAdd.length,namedRangesAdd);
}



/* This function accepts the two regex matches list
*  and returns one unique list of all predecessor addresses
*  @param {Array} addMatches - All A1 notation addresses 
*                              plus some of NamedRanges 
*  @param {Array} fullMatches - All A1 notation addresses,All NamedRanges,
*                               Other irrelevent matches
*/
function analyzeMatch(addMatches,fullMatches){

  /*Expected 
    First parameter - holds all A1Notation addresses as well as NamedRanges that
    their name in the form of /[A-Z]+/d+
    NamedRange with name including dot(.) or does not contain digits will not
    be on the list
    Second Parameter - contains all first list matches, as well as all NamedRanges
    names and also irrelevant matches to be filtered like function names and empty string 
  */
  
  //Full Matched Addresses to be returned
  var mAddresses = [];
  
  //Remove duplicate addresses
  var uniqueMatches = 
      addMatches.filter((item,index)=>addMatches.indexOf(item)===index); 
  
  //Get all named Ranges in spread sheet
  var nr = SpreadsheetApp.getActive().getNamedRanges();
  
  // Loop Named Ranges arr 
  nr.forEach(function(item){
  
    /* Check if the name of the current Named Range
    * is included in matches
    * 1. first in addMatches list
    * 2. only if not found in the wider list */
    
    var name = item.getName();
    
    //Check if in addmatches array
    var i = uniqueMatches.indexOf(name);
    
    //Build A1Notation address of current NamedRange 
    var rng = item.getRange();
    var add = "'" + rng.getSheet().getName() + "'!" + rng.getA1Notation();    
    
    if (i > -1){
      
      //Add the address of curr NamedRange to final list 
      mAddresses.push(add);
      //Remove curr NamedRange from list
      uniqueMatches.splice(i,1);
      
    }else if (fullMatches.includes(name)){
      // Name found - add the address of the 
      //              Named Range to matched Addresses list
      
      mAddresses.push(add);    
    }
    
  });
  
  //Add all left matched addresses to final list  
  mAddresses.push(...uniqueMatches);
  
  return mAddresses;
   
}

What makes it a bit complicated are NamedRanges. This code will match and analyze and returns one list with all predecessors addresses including the addresses of the NamedRanges.