0
votes

I am having an issue in Google data studio. I am creating a case statements that looks for keywords in a few different dimensions and categorises them accordingly. This all works fine until I add a custom dimensions. The see statement is valid as far as Google is concerned but it returns nothing.

CASE 
WHEN CONTAINS_TEXT(PAGE, "Heart") THEN "Cardiology"
WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
End

The above works fine

CASE 
    WHEN CONTAINS_TEXT(PAGE, "Heart") THEN "Cardiology"
    WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
WHEN WEB_SUBJECT = "Hypertension" THEN "Cardiology"
END

The above is valid but is blank when used. Web_subject is a custom dimensions defined in Google analytics.

Any help greatly appreciate. This one has me stumped.

UPDATE: So changing the format of the custom dimension expression to be REGEX worked in returning results for that particular condition but it now does not show results for any others. I have a feeling it may because of the data sitting behind it all.

A little more context, the data source in this case is a roll up google analytics account, So it contains analytics from 10+ domains. What I am attempting to do is group together page views and users from different domains into subjects using various different dimensions. For example if a USER selects the filter 'Diabetes' I want to return all pageviews for: Domain A where page title contains x Domain B where customDimesnion 1 = Y etc

This is my current mess of a CASE statement:

    case
when contains_text(Page, 'biosimilars-confidence') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Oncologie).*') OR CONTAINS_TEXT(Page, 'oncology') OR contains_text(Page, 'rrmm-challenges') OR  contains_text(Page, 'oncologyhighlights2020') OR CONTAINS_TEXT(Page, 'ASCO') OR CONTAINS_TEXT(Page Title, 'cancer') OR contains_text(Page Title, 'Oncology') OR  contains_text(Page Title, 'ASCO') THEN "Oncology"
when CONTAINS_TEXT(Page, 'DOMAINB') OR contains_text(Page, 'eprint') THEN "Eprint"
when CONTAINS_TEXT(Page, 'diabetes') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Diabetes).*') OR contains_text(Page, 'hypoglycemia') OR contains_text(Page, 'glp1ras') OR contains_text(Page Title, 'Diabetes') OR contains_text(Page Title, 'hyperglycemia') OR contains_text(Page Title, 'diabetes') OR contains_text(Page Title, 'ADA') OR contains_text(Page Title, 'GLP-1') then "Diabetes/Endocrinology"
when CONTAINS_TEXT(Page, 'cardiology') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Cardiologie).*') OR contains_text(Page Title, 'Cardiology') OR contains_text(Page Title, 'Heart failure') then'Cardiology'
when contains_text(Page, 'gastro') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Gastro-enterologie).*') OR contains_text(Page Title, 'Gastroenterology') then "Gastroenterology"
when contains_text(Page, 'rheumatology') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Reumatologie).*') OR contains_text(Page Title, 'Rheumatology') OR contains_text(Page Title, 'adalimumab') OR CONTAINS_TEXT(Page Title, 'arthritis') OR CONTAINS_TEXT(Page Title, 'EULAR') OR contains_text(Page Title, "joint")  then "Rheumatology"
when CONTAINS_TEXT(Page, 'migraine') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Neurologie).*') OR contains_text(Page Title, 'Neurology') OR contains_text(Page Title, "Ataxia") OR contains_text(Page Title, "EAN 2020") then "Neurology"
when Page = 'DOMAINA.com' OR contains_text(Page, 'training') OR Page = 'ime.DOMAINA.com' then "Corporate site traffic"
when contains_text(Page Title, 'Gynaecology ') then "Gynaecology "
when CONTAINS_TEXT(Page Title, 'Dermatology') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Dermatologie).*') OR contains_text(Page Title, 'eczema') OR contains_text(Page Title, 'Laser hair removal') then "Dermatology"
when CONTAINS_TEXT(Page Title, 'COVID-19') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Coronavirus (COVID-19)).*') then "COVID-19"
WHEN CONTAINS_TEXT(Page Title, 'General practice') then "General practice"
when contains_text(Page Title, 'Haematology') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Hematologie).*') OR contains_text(Page Title, 'EHA25') then "Haematology"
when contains_text(Page Title, 'Mental health') OR CONTAINS_TEXT(Page Title, 'Psychology') then 'Mental health'
when contains_text(Page Title, 'penile length') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Urologie).*') then "Urology"
when contains_text(Page Title, 'ERS 2019') OR REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Longziekten).*') THEN "Lung diseases"
when contains_text(Page, 'noonan')Then "Genetics"
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Farmacie).*') then 'Pharmacy'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Heelkunde).*') then 'Surgery'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Huisartsgeneeskunde).*') then 'Family medicine'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Infectieziekten).*') then 'Infectious Diseases'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Overig).*') then 'Other/Uncategorized'
when REGEXP_MATCH(WEBSITE subject,  '.*(?i)(Voeding).*') then 'Nutrition'
else 'Other/Uncategorized'
end

I have anonymised some bits so if it looks strange that is why. It may well be I am going about this in completely the wrong way. One other thing to note, the domain names are stored in the PAGE dimension.

2

2 Answers

1
votes

Update (Expanded CASE)

0) Original Expanded CASE

It may be due to:

contains_text(Page Title, 'ADA')

Adding a Word Boundary around ADA may do the trick, ensuring that words that simply contain ADA are not captured into the respective WHEN clause

REGEXP_MATCH(Page Title, ".*(?i)(\\bADA\\b).*")

An alternative is to follow the two step process below:

1) Field_CONCAT

Create the following Data Source-level Calculated Field to CONCAT the fields into one single consolidated field:

CONCAT(Page, ", ", WEBSITE subject, ", ", Page Title)

2) New_CASE

Create the CASE statement below:

  • Field: Where Field_CONCAT represents the field created above;
  • | OR |: The | Pipe operators | serve as the RegEx version of OR
  • Escape Sequence: \\ is used to escape special RegEx characters such as (, |, ., etc;
  • Word Boundaries: Added a word boundary on either side of ADA (as well as ASCO) to ensure that text that simply contain ada are not part of the Diabetes/Endocrinology WHEN clause:
CASE
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(biosimilars-confidence|Oncologie|oncology|rrmm-challenges|oncologyhighlights2020|\\bASCO\\b|cancer).*") THEN "Oncology"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(DOMAINB|eprint).*") THEN "Eprint"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(diabetes|hypoglycemia|glp1ras|hyperglycemia|\\bADA\\b|GLP-1).*") THEN "Diabetes/Endocrinology"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(cardiology|Cardiologie|Heart failure).*") THEN 'Cardiology'
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(gastro|Gastro-enterologie|Gastroenterology).*") THEN "Gastroenterology"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(rheumatology|Reumatologie|adalimumab|arthritis|EULAR|joint).*") THEN "Rheumatology"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(migraine|Neurologie|Neurology|Ataxia|EAN 2020).*") THEN "Neurology"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(DOMAINA\\.com|training|ime\\.DOMAINA\\.com).*") THEN "Corporate site traffic"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Gynaecology).*") THEN "Gynaecology "
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Dermatology|Dermatologie|eczema|Laser hair removal).*") THEN "Dermatology"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(COVID-19|Coronavirus \\(COVID-19\\)).*") THEN "COVID-19"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(General practice).*") THEN "General practice"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Haematology|Hematologie|EHA25).*") THEN "Haematology"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Mental health|Psychology).*") THEN 'Mental health'
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(penile length|Urologie).*") THEN "Urology"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(ERS 2019|Longziekten).*") THEN "Lung diseases"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(noonan).*") THEN "Genetics"
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Farmacie).*") THEN 'Pharmacy'
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Heelkunde).*") THEN 'Surgery'
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Huisartsgeneeskunde).*") THEN 'Family medicine'
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Infectieziekten).*") THEN 'Infectious Diseases'
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Overig).*") THEN 'Other/Uncategorized'
  WHEN REGEXP_MATCH(Field_CONCAT, ".*(?i)(Voeding).*") THEN 'Nutrition'
  ELSE 'Other/Uncategorized'
END

Added a New Page to the Google Data Studio Report and a GIF to demonstrate:

Original POST

0) ELSE NULL

Note that the CASE statements in the question and this suggestion have not explicitly stated the ELSE clause, thus by default, ELSE values are treated as NULL.

1) Tweaked CASE

Tried out the CASE statement and it works as expected! One tweak to the initial CASE statement is the inclusion of a Logical Operator, OR:

CASE
  WHEN CONTAINS_TEXT(PAGE, "Heart") OR WEB_SUBJECT = "Hypertension" THEN "Cardiology"
  WHEN PAGE = "Diabetes.website.com" THEN "Diabetes"
END

2) Alternative CASE

In addition, you could also have a look to see whether the CASE statement below resolves the issue; it uses the REGEXP_MATCH function and ensures that fields Contain .* the respective values as well as adding a Case Insensitive Flag (?i) thus matching Hypertension, hypertension, HYPerTENsion, etc:

CASE
  WHEN REGEXP_MATCH(PAGE, ".*(?i)(Heart).*") OR REGEXP_MATCH(WEB_SUBJECT, ".*(?i)(Hypertension).*") THEN "Cardiology"
  WHEN REGEXP_MATCH(PAGE, ".*(?i)(Diabetes\\.website\\.com).*") THEN "Diabetes"
END

3) Next Steps and a GIF

If the above doesn't resolve the issue, could you elaborate with a screenshot(s) and provide some sample data (removing or replacing any sensitive information); feel free to add data to This Editable Google Sheet (also used in the Report and GIF below).

Google Data Studio Report and a GIF to elaborate on the above:

0
votes

So first of all @Nimantha thank you for getting me much further down the road on solving this and giving me a huge lesson in case statements that is going to help me massively going forward.

The problem as it turns out was that I have the 'Universal' analytics tag sitting on multiple sites implemented by different tag manager containers. I was defining custom dimensions in the tag containers to pass datalayer information through to google analytics. The problem is that i was passing no value through to the custom dimension to on any of the other domains. This caused the CASE statement to fall over as when a custom dimension value only existed on one domain, it only returned pages from that domain.

To fix this I went to all GA tags across all containers and matched up the custom dimensions, passing in (not set) if I have no actual value to give it. Now in using the CASE statement above by Nimantha I can see all of the pages starting to appear. Just unfortunate they are populating from now forward but at least it works!