2
votes

I'm trying to build a REGEXP_EXTRACT query inside a CASE statement which isn't quite working out the way I want to:

Example data:

1234-ABC
1234-abc
1234-ABC-1a
1234
1234-abc-test-this

What I would like to to is catch the ABC|abc out of the query if present. If it's not present I would like to return a default value.

I found a page where I can see the regex possibilities for Google-RE2 RegEx.

I know how to build the case:

CASE
    WHEN REGEXP_MATCH(Eventcategory, '^([0-9])*-([a-zA-Z0-9])*$') THEN 'it matches 1234-ABC and 1234-abc'
    WHEN REGEXP_MATCH(Eventcategory, '^([0-9])*-([a-zA-Z0-9])*-([a-zA-Z0-9])*') THEN 'it matches 1234-ABC-1a and 1234-abc-test-this'
    ELSE "it matches 1234"
END

The problem comes in extracting the right part of Eventcategory.

What I've tried so far:

REGEXP_EXTRACT(Eventcategory, '-([a-zA-Z0-9])*-') // Then it was only still needed to remove the - at the beginning and end
REGEXP_EXTRACT(Eventcategory, '-([a-zA-Z0-9])*$') // if it was 1234-Abc
REGEXP_EXTRACT(Eventcategory, '(?=^([0-9])*-)((-[a-zA-Z])*)')
REGEXP_EXTRACT(Eventcategory, '((-[a-zA-Z])*)')
REGEXP_EXTRACT(Eventcategory, '(?=^([0-9])*-)((-[a-zA-Z])*)')
REGEXP_EXTRACT(Eventcategory, '(-([a-zA-Z])+)')
REGEXP_EXTRACT(Eventcategory, '(?=(^([0-9])*-))(-([a-zA-Z])+)')

Really hope someone can help me out since I don't have any options left at the moment.

1

1 Answers

2
votes

To make your pattern work, you need to use a single capturing group around the pattern part you need to extract, put the * after the character class and remove the $ anchor:

'^[0-9]*-([a-zA-Z0-9]*)'

Or, you may also make sure you match at least 1 char of each subpattern by replacing * (0 or more) with + (1 or more):

'^[0-9]+-([a-zA-Z0-9]+)'

See the regex demo.

Alternatively, you may use

^[^-]+-([^-]+)

It will match

  • ^ - start of string
  • [^-]+ - 1 or more chars other than -
  • - - a hyphen
  • ([^-]+) - Group 1 (your extracted value): 1+ chars other than -

See this regex demo.