Sorry if I'm being really stupid (I try to avoid Java-ish regex, and mainly use Perl for that sort of stuff), but I've hit a problem that's really bugging me.

I've got an XQuery resource in an OSB pipeline that uses a function I've written, the meat of which is extracting TLV-style data (NAME;[number];VALUE;NAME;[number];VALUE;...), like so:

if ($arg != '' and $name != '' and matches($arg, concat('.*;', $name, ';[0-9]+;')))
  then substring(normalize-space(substring-before(replace($arg, concat('.*;', $name, ';[0-9]+;'), ''), ';')), 1, 64)
  else ''

It works about 50% of the time. When it fails, I get a massive recursive stack trace through org.apache.xmlbeans.impl.regex.RegularExpression.matchString, so presumably it's the bigger strings that fail.

However, it sometimes fails on an $arg input that it's previously been happy with, so I guess it's just running out of memory depending on what else is happening at the time. That points to an inefficient expression rather than one that doesn't work.

The thing is, I can't see a better way to define it than matching .*;NAME;\d+; (especially given that XQuery and/or OSB seem to be fairly limited: \d didn't actually work, hence [0-9] in my code), unless I'm missing something obvious.

Any ideas?

Have you tried tokenize($input,';')? - Cylian
I guess this would still run multiple matches on the same string, so may also run into problems. It also then means I have to traverse the tokenized output to get the item two places after the matching $name (or one item after, if I do tokenize($input,';[0-9]+;')). May give it a go though. - Marc
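
For reference, the tokenize idea discussed in these comments might look roughly like the sketch below. The function name local:tlv-value and the sample input are hypothetical, not from the original post. It splits on ';', looks for tokens equal to $name that are followed by an all-digit token, and returns the item two places after the last such hit. Unlike the original expression it does not require a ';' before the name, and whether it avoids the deep recursion in OSB's regex engine is an assumption that has not been tested.

```xquery
(: Hypothetical helper; the name and the sample data are illustrative only. :)
declare function local:tlv-value($arg as xs:string, $name as xs:string) as xs:string
{
  (: Split once on the separator; ';' is a trivial pattern with nothing to backtrack over. :)
  let $tokens := tokenize($arg, ';')
  (: Positions where a token equals $name and the next token is all digits. :)
  let $hits :=
    for $i in index-of($tokens, $name)
    where matches($tokens[$i + 1], '^[0-9]+$')
    return $i
  return
    if ($name != '' and exists($hits))
    (: Use the last hit, mirroring the greedy '.*' in the original, and take the token two places on. :)
    then substring(normalize-space($tokens[$hits[last()] + 2]), 1, 64)
    else ''
};

local:tlv-value(';FOO;3;abc;BAR;12;hello world;', 'BAR')
```

With the sample call, the tokens are ('', 'FOO', '3', 'abc', 'BAR', '12', 'hello world', ''), so the result should be hello world. The only regex applied to the full input is the single-character separator; the digit check runs on one short token at a time.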

1 Answer

Do you really need .* in there? Won't that match a lot of stuff? How about [^;]* or the equivalent, i.e. match anything .* matches except a semicolon?
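
One way to act on this, as a sketch only: XPath's matches() is unanchored, so the leading .* is not needed for the test at all. For the extraction, simply swapping .* for a prefix that cannot cross a semicolon would stop replace() from stripping everything before the name, so the version below drops replace() and splits on the pattern with tokenize() instead. The function name local:tlv-value-no-dotstar and the sample input are hypothetical. It assumes $name contains no regex metacharacters, and it can differ from the original when two matches overlap on a shared ';'. It has not been tried on OSB.

```xquery
(: Hypothetical helper; the name and the sample data are illustrative only. :)
declare function local:tlv-value-no-dotstar($arg as xs:string, $name as xs:string) as xs:string
{
  (: Same pattern as the original, minus the leading '.*'. :)
  let $pattern := concat(';', $name, ';[0-9]+;')
  return
    if ($arg != '' and $name != '' and matches($arg, $pattern))
    then
      (: Splitting on the pattern leaves the text after the last match as the final piece. :)
      let $tail := tokenize($arg, $pattern)[last()]
      return substring(normalize-space(substring-before($tail, ';')), 1, 64)
    else ''
};

local:tlv-value-no-dotstar(';FOO;3;abc;BAR;12;hello world;', 'BAR')
```

For the sample call, tokenize() yields (';FOO;3;abc', 'hello world;'), so the result should be hello world. As in the original, a value with no ';' after it comes back as the empty string, because substring-before() finds no delimiter.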