1
votes

I am using Google Datastudio to make a CASE statement to take a multi-words string and split it out into categories. I was asked to use REGEXP_MATCH (nothing else, I know contains function would be easier).

I need a solution to match the following words:

HouseBrochure 
home brochure 
HomeBrochure 
house brochure 
Bathroom brochure 
Bathroombrochure 
FloorBrochure 
floor brochure

To complicate matters, these words come in via a website request system, meaning people can request a house, bathroom and floor brochure in one request. When such requests reach my server, it compiles into a list(string) which looks like this:

# (with the pipes included)
HouseBrochure|Bathroom brochure|floor brochure 

This is just an example of 1 request, there are many variations and multiple requests that come through (I've also only included a few of these brochures, there are many more)

I need to separate out all the house brochures, all the bathroom brochures and all the floor brochures etc, so I can count how many requests have been made for each brochure.

Being new to Regex, I have a basic understanding but nowhere near advanced.
My current attempt in Data studio looks like this:

CASE
WHEN REGEXP_MATCH(Event Label,'^.*(HouseBrochure.*|home brochure.*|HomeBrochure.*|house brochure.*).*$') THEN 'Home Brochure'
END

This is just for the home brochure, yet it's not working, can someone help?
Also, as an FYI Datastudio uses REG2

1
Thanks for the reply, I thought i might not have explained it perfectly well so I can clarify a few things hopefully. the string i gave above was an example of a request type - the user may request multiple brochures at the same time. when i extract it will look like this: Home brochure|HomeBrochure|house brochure - 10 HomeBrochure - 3 house brochure|BathroomBrochure|HomeBrochure - 4Newbie_Regex
Meaning 4 people requested both a house and bathroom brochure, and 10 people requested a home, home and house brochure. I need to be able to run a count of people requesting a bathroom brochure a house brochure etc. In the above example, I would count 10 bathroom brochures and 17 HomeBrochures. Hopefully this makes sense and hopefully there is a solution to split out each brochure and run a count.Newbie_Regex

1 Answers

0
votes

My approach would be:

  1. convert everything to lower case (avoid messing with upper/lower case differences)
  2. Use regex to replace variations with base form:

e.g.

(house|home)\s*brochure

replace with

HomeBrochure

Test here.

  1. Do some counting as needed, using just the base keywords.