Google Analytics / Data Studio - Remove string from the middle of a URL before processing data

Question

I have a website with thousands of directories which I am trying to find out which are the most popular through using the following field

REGEXP_EXTRACT(Page,'(/[^/]+)')

This is working great following my previous question here, however there is a random extra string in the URL which is impacting the figures.

e.g

sitename.com/directoryA returns the same page as
sitename.com/randomstring/directoryA

This means that when I run my field some of the views that should be attributed to the /directoryA/ are being attributed to /randomstring and my table reports as follows:

/randomstring = 1,000 views
directoryA = 800 views
directoryB = 850 views
...

Does anyone know how to clean the data in Google Data studio so that /randomstring is removed from the URLs before I run my REGEXP_EXTRACT code?

I’m voting to close this question because its not programming related may be better suited for webapps.stackexchange.com — DaImTo
If it is random, how can you differentiate it from directoryA and directoryB? What makes directoryA not random? — Diego Queiroz
Bad choice of words. The string isn't random, is the same every time — Benjamin Heap

Nimantha Nimantha · Accepted Answer · 2021-03-02T07:34:26

Based on the additional details provided by the Original Poster (Benjamin Heap) in the Comments:

The string isn't random, is the same every time

One approach is to use a REGEXP_REPLACE function before the REGEXP_EXTRACT, thus, the below will do the trick (replace /randomstring with the actual string):

REGEXP_EXTRACT(REGEXP_REPLACE(Page, "(/randomstring)", ""), "(/[^/]+)")

Editable Google Data Studio Report and a GIF to elaborate:

Google Analytics / Data Studio - Remove string from the middle of a URL before processing data

1 Answers