
I'm trying to extract in Tableau the first occurrence of a part-of-speech name (e.g. subst, adj, fin), located between { and :, from every line of the column below:

{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}
{subst:sg:gen:m3=5, subst:sg:inst:m3=1, subst:sg:gen:f=1, subst:sg:nom:m3=1}
{subst:sg:nom:f=3, subst:sg:loc:f=2, subst:sg:inst:f=1, subst:sg:nom:m3=1}
{adj:sg:nom:m3:pos=2, adj:sg:acc:m3:pos=1, adj:sg:acc:n1.n2:pos=3, adj:pl:acc:m1.p1:pos=3, adj:sg:nom:f:pos=1}
{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}
{fin:sg:ter:imperf=5}

To do this I use the following regular expression: {(\w+):(?:.*?)}$. Unfortunately, my calculated field returns only Nulls:

[Screenshot from Tableau]

I checked my regular expression in a regex tester, and it is valid:

[Screenshot from regex101.com]

I don't know what I'm doing wrong, so if anybody has any suggestions I would be grateful.

Try ^\{(\w+):.*\}$ - Wiktor Stribiżew
Also doesn't work... - alb108
^\{(\w+):.*\}$ should work: it has the capturing group, and the braces are escaped as ICU regex rules require. Please check. Otherwise, maybe the matches are not at the start of the string? Then remove the ^. - Wiktor Stribiżew
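A rough illustration of the anchoring point, assuming a hypothetical value with a leading space before the opening brace. Python's re module stands in for ICU here; it is not the engine Tableau uses, but ^ behaves the same way in both for this case:

```python
import re

# Hypothetical value: a leading space precedes the opening brace.
value = " {fin:sg:ter:imperf=5}"

anchored = re.search(r"^\{(\w+):.*\}$", value)
unanchored = re.search(r"\{(\w+):.*\}$", value)

print(anchored)             # None: ^ requires { at position 0
print(unanchored.group(1))  # fin
```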
Works great! Thank you, Wiktor:) - alb108
If my solution works, you may accept the answer below by ticking the grey mark to the left of the answer. - Wiktor Stribiżew

1 Answer


Tableau's regex engine is ICU, and there are some differences between it and PCRE.

One of them is that braces meant to be matched as literal characters must be escaped.

Your regex also contains a redundant non-capturing group ((?:.*?) is the same as .*?) and a lazy quantifier. Since you want to check for a } at the end of the string, the lazy quantifier only slows down matching and should be changed to a greedy .*.
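A small sketch showing that the three spellings capture the same part-of-speech name. It uses Python's re module as a stand-in for ICU (an assumption for illustration only), with the braces escaped in every variant as ICU requires:

```python
import re

# A few lines from the question's column.
lines = [
    "{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}",
    "{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}",
    "{fin:sg:ter:imperf=5}",
]

# The same pattern spelled three ways; braces escaped in all of them.
variants = {
    "non-capturing group, lazy": r"\{(\w+):(?:.*?)\}$",
    "lazy": r"\{(\w+):.*?\}$",
    "greedy": r"\{(\w+):.*\}$",
}

def captures(pattern):
    """Return group 1 of the first match in each line (None if no match)."""
    results = []
    for line in lines:
        m = re.search(pattern, line)
        results.append(m.group(1) if m else None)
    return results

for name, pattern in variants.items():
    print(f"{name}: {captures(pattern)}")
```

Every variant yields ['subst', 'adj', 'fin']. The difference is only in how the engine gets there: the lazy .*? retries \}$ after each extra character, while the greedy .* runs to the end of the string and backs up a single character to find the closing brace.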

You can use

REGEXP_EXTRACT([col], '^\{(\w+):.*\}$')
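To sanity-check the pattern outside Tableau, here is a minimal sketch with a hypothetical regexp_extract helper that mimics what REGEXP_EXTRACT does here: it returns the single capturing group of the match, or None in place of Tableau's Null. Python's re module is only a stand-in for ICU, so it will not reproduce ICU-specific errors such as unescaped braces:

```python
import re
from typing import Optional

def regexp_extract(value: Optional[str], pattern: str) -> Optional[str]:
    """Hypothetical stand-in for Tableau's REGEXP_EXTRACT: return group 1
    of the first match, or None (Tableau's Null) when nothing matches."""
    if value is None:
        return None
    match = re.search(pattern, value)
    return match.group(1) if match else None

# The column from the question.
column = [
    "{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}",
    "{subst:sg:gen:m3=5, subst:sg:inst:m3=1, subst:sg:gen:f=1, subst:sg:nom:m3=1}",
    "{subst:sg:nom:f=3, subst:sg:loc:f=2, subst:sg:inst:f=1, subst:sg:nom:m3=1}",
    "{adj:sg:nom:m3:pos=2, adj:sg:acc:m3:pos=1, adj:sg:acc:n1.n2:pos=3, adj:pl:acc:m1.p1:pos=3, adj:sg:nom:f:pos=1}",
    "{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}",
    "{fin:sg:ter:imperf=5}",
]

PATTERN = r"^\{(\w+):.*\}$"

for line in column:
    print(regexp_extract(line, PATTERN))
```

This prints subst, subst, subst, adj, adj and fin, one per line.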