2
votes

I'm trying to extract the product name from the URL between the 2 slashes using REGEXP_EXTRACT. For example, I want to extraxt ace-5 from the URLs below:

www.abc.com/products/phones/ace-5/
www.abc.com/products/phones/ace-5/?cid=dm66363&bid
www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7
www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130

I have a RegEx to extract the Domain Name but it is not something I'm actually looking for. Below is the RegEx:

REGEXP_EXTRACT(page,'^[^.]+.([^.]+)')

It gives the following result: abc

3

3 Answers

0
votes

Assuming that the product name would always be the fixed fourth path element, we can try:

REGEXP_EXTRACT(page, '(?:[^\/]+\/){3}([^\/]+).*')

or, if the above would not work:

REGEXP_EXTRACT(page, '[^\/]+\/[^\/]+\/[^\/]+\/([^\/]+).*')

Here is a demo for the above:

Demo

0
votes

Since I do not have the Same Page with my GDS, but I tried to recreate with my set of data source i.e pages from the google analytics.

Use may use the below which will get you all the records after two slash as per your requirement.

REGEXP_EXTRACT(Page,'[^/]+/[^/]+/([^/]+)')

You need to create a calculated column with this formula, once you have created this calculated column you might need to add an additional filter to remove those with the null value.

example Page: "/products/phones/ace-5/" The Calculated Column value will be "ace-5"

Just make sure this regex will only give you the extracted word after phones/, if you do not have any record after that it will give you null in return.

0
votes

The REGEXP_EXTRACT Calculated Field below does the trick, extracting all characters after the 3rd / till the next instance of /:

REGEXP_EXTRACT(Page, "^(?:[^/]+/){3}([^/]+)")

Google Data Studio Report and a GIF to elaborate