0
votes

Question: looking for Google Sheets regex that captures all instances of a string between [t- ] and outputs to the neighboring column cell as an array, or some other delimiter between matches.

For the following string, I'm trying to extract all instances of text between [t- ].

A1:

Lorem Ipsum [t- is simply] dummy text of the [t- printing] and typesetting [c- industry], so [d- it would make] sense that

Expected Output is an array of all occurrences in a single column:

B1:

[is simply, printing]

Or output could be any delimiter of match occurrences

is simply | printing

Trying the following with a single text within [t- ] works fine, but for multiple instances it extracts everything between open [t- of first occurrence and ] of last occurrence:

=REGEXEXTRACT(A1,"\[t- (.*)\]")

Leading to:

is simply]! dummy text of the [t- printing

I've also tried multiple capture groups, but this only works if I'm sure there's only two instances of text between [t- ]— there could be n instances per row. Also it doesn't output results to an arrary in one column, but spreads across multiple columns:

=regexextract(A1, "(\[t- (.*)\]).*(\[t- (.*)\])" )


EDIT: I've received a couple answers with Regex that works for other tools/languages (e.g., PHP or Javascript) but not Google Sheets. Here's the Google Sheets Regex Syntax.

EDIT 2: The above sample string has other text inside brackets marked with other letters, e.g., [c- industry] and [d- it would make]. These should not be included. Only texted in [t- ] (with a "t-") should be returned.

2
Try \[t-([^]]*)\]. You can check and get an explanation here regex101.com/r/iemQDb/1. - Andrei Odegov
@AndreiOdegov this only gets the first occurrence - Growler
what about REGEXREPLACE and the (?:^|\])[^\[]*(?:\[t-|$) regex? check here regex101.com/r/WwFjcy/1. - Andrei Odegov
@AndreiOdegov this works better, but this also gets text inside brackets with other characters, e.g., [c- ], or [d- ]. It needs to only get texts within [t- ] and ignore replacing text when there's other brackets (like [c- ] or [d- ]) - Growler

2 Answers

4
votes

Similar to the technique used here,

  • REGEXREPLACE all [t-.*] to (.*)
  • Provide the resulting expression from above as the regex to REGEXEXTRACT
  • \Q..\E is used to escape other characters

    =REGEXEXTRACT(A1, "\Q"&REGEXREPLACE(A1,"\[t-[^]]+\]","\\E(.*)\\Q")&"\E")
    

Alternatively, With just REGEXREPLACE to delimit,

=REGEXREPLACE(A1,"(^|\])(.*?)(\[t-|$)","$1|$3")
=SPLIT(REGEXREPLACE(A1,"(^|\])(.*?)(\[t-|$)","$1|$3"),"|")

Replace all characters .* that

  • Start with ] or start of string ^
  • End with [t- or end of string $

with ]|[t-

2
votes

bulletproof solution:

="["&JOIN(", ", ARRAYFORMULA(REGEXREPLACE(
 QUERY(SPLIT(TRANSPOSE(SPLIT(A1, "[")), "]"), 
 "select Col1 where Col1 contains 't- '", 0), 
 "t- ", "")))&"]"

__________________________________________________________

=JOIN(" | ", ARRAYFORMULA(REGEXREPLACE(
 QUERY(SPLIT(TRANSPOSE(SPLIT(A1, "[")), "]"), 
 "select Col1 where Col1 contains 't- '", 0), 
 "t- ", "")))