I am trying to replicate the Google Analytics UTM builder in Google Sheets. I have a formula I use elsewhere that works on just under 50% of the use cases but needs some additional regex adding to it.
=ARRAYFORMULA(IF(ISBLANK(A2:A),"",CONCAT(REGEXREPLACE(A2:A, "(/[^/]+)(?:/(?:\?|ref=)[^/]*)?$", "$1/"),"&utm_source=cpc&utm_medium=mysite")))
Here is a Google Sheet showing the examples and result in column D. The formula with regex in it is partially working on the supplied data. A copy can be made with File > Make a Copy.
Having unminified the JS file used on the Google Dev Tools page there appears to be a regex function that I think Google is using to perform the string builder, but the format is different to how it is used in Google Sheets.
f = function(e) {
var t = e.replace(".", "\\.");
return h("^(?:https?://)?(:?" + t + ")(?:$|[/?#])")
},
If you have a look at rows 5 to 8 in the Google Sheet, it might just be a tweak needed to the existing formula to cope with adding the string to cells with file extensions and performing no action if a UTM string is already present.
EDIT The existing formula containing regex adds the string "&utm_source=cpc&utm_medium=mysite" to the end of the cells The list below shows the ORIGINAL url, with the REQUIRED result afterward
https://www.aol.com/?utm_source=cpc&utm_medium=mysite
(Standard URL with no file extensions, so a / is added and the UTM string appending. STATUS: WORKING)
https://aol.com/?utm_source=cpc&utm_medium=mysite
(Standard URL with no www or file extensions, so a / is added and the UTM string appending. STATUS: WORKING)
https://www.aol.com/?utm_source=cpc&utm_medium=mysite
(Standard URL with no file extensions, but a ending / is present, so this is not added and the UTM string appended. STATUS: WORKING)
https://www.aol.com/index.html
https://www.aol.com/index.html?utm_source=cpc&utm_medium=mysite
(File extensions exist, so by adding a / to the end along with the UTM string appended it breaks. STATUS: FAILING. It required no / on file extensions)
https://www.aol.com/index.php?utm_source=cpc&utm_medium=mysite
(Another type of file extension exists(of which there will be hundreds), so by adding a / to the end along with the UTM string appended it breaks. STATUS: FAILING. It requires no / on file extensions)
http://www.asos.com/puma/puma-suede-classic-black-trainers/prd/3272954
(This string uses a product ID, so by adding a / to the end along with the UTM string appended it breaks. STATUS: FAILING. It required no / on Product IDs)
(The UTM string already exists, so it gets added on again. STATUS: FAILING. If a UTM exists, we need to ignore it and leave the cell as it is)
h("^(?:https?://)?(?:" + t + ")(?:$|[/?#])"), noth("^(?:https?://)?(:?" + t + ")(?:$|[/?#])"). The non-capturing group syntax is(?:....). Well, I do not get it why that code can help. Please put some strings with expexcted output in the question itself. - Wiktor Stribiżew