1
votes

I have a website with thousands of directories which I am trying to find out which are the most popular through using the following field

REGEXP_EXTRACT(Page,'(/[^/]+)')

This is working great following my previous question here, however there is a random extra string in the URL which is impacting the figures.

e.g

  • sitename.com/directoryA returns the same page as
  • sitename.com/randomstring/directoryA

This means that when I run my field some of the views that should be attributed to the /directoryA/ are being attributed to /randomstring and my table reports as follows:

  1. /randomstring = 1,000 views
  2. directoryA = 800 views
  3. directoryB = 850 views
  4. ...

Does anyone know how to clean the data in Google Data studio so that /randomstring is removed from the URLs before I run my REGEXP_EXTRACT code?

1
Iā€™m voting to close this question because its not programming related may be better suited for webapps.stackexchange.com ā€“ DaImTo
If it is random, how can you differentiate it from directoryA and directoryB? What makes directoryA not random? ā€“ Diego Queiroz
Bad choice of words. The string isn't random, is the same every time ā€“ Benjamin Heap

1 Answers

0
votes

Based on the additional details provided by the Original Poster (Benjamin Heap) in the Comments:

The string isn't random, is the same every time

One approach is to use a REGEXP_REPLACE function before the REGEXP_EXTRACT, thus, the below will do the trick (replace /randomstring with the actual string):

REGEXP_EXTRACT(REGEXP_REPLACE(Page, "(/randomstring)", ""), "(/[^/]+)")

Editable Google Data Studio Report and a GIF to elaborate: