For an input string like:

10,14,"1011 testing 1",10,"1022 testing 2",10,"1033, 234, testing 3"

where those double quote characters are a part of the string, I need to set a pattern that recognizes the numbers and commas following them, but not when they're inside the quotes.

I'm using it in Groovy code, so I'm doing a replaceAll call where each regex match gets replaced with an empty string ("").

That input string needs to become:

"1011 testing 1","1022 testing 2","1033, 234, testing 3"

This:

[0-9]+,

matches the numbers followed by commas. But how do I express the last part, that a match should not count when it is inside double quotes? Is there a way to say "as long as there is an even number of double quotes before the match"?

I see other posts that are somewhat similar, but they're not quite the same.

It seems all you want is to extract all strings in between double quotes, so why not use s.findAll(/"[^"]*"/).join(",")? - Wiktor Stribiżew
I'd use a CSV parser off the shelf. They are built to deal with all this nonsense. - cfrick

1 Answer

If you are parsing CSV, you should use a CSV parser that does all the hard work for you.

If this is a standalone string and you just want a string made of the double-quoted substrings joined with commas, you can use

String result = text.findAll(/"[^"]*"/).join(",")
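
Applied to the input from the question, this could look like the sketch below. The variable names are only illustrative, and it assumes the quoted fields never contain an escaped double quote.

```groovy
// Sample input taken from the question; the double quotes are part of the string.
def text = '10,14,"1011 testing 1",10,"1022 testing 2",10,"1033, 234, testing 3"'

// Collect every "..." substring (quotes included) and join them with commas.
String quotedOnly = text.findAll(/"[^"]*"/).join(",")

assert quotedOnly == '"1011 testing 1","1022 testing 2","1033, 234, testing 3"'
```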

See a Groovy demo online.
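
If you do want to stay with replaceAll, the "even number of double quotes" idea from the question can be approximated with a lookahead. Java's regex engine (which Groovy uses) does not allow the unbounded lookbehind needed to count the quotes before a match, so the sketch below instead checks that an even number of quotes follows the match up to the end of the string. That is equivalent only under the assumption that all quotes are balanced and none are escaped. The pattern is in a single-quoted string so that the `$` anchor is not subject to interpolation.

```groovy
def text = '10,14,"1011 testing 1",10,"1022 testing 2",10,"1033, 234, testing 3"'

// [0-9]+,                  digits followed by a comma
// (?=(?:[^"]*"[^"]*")*     then zero or more PAIRS of double quotes ...
//    [^"]*$)               ... and no further quotes up to the end of the string
def outsideQuotes = '[0-9]+,(?=(?:[^"]*"[^"]*")*[^"]*$)'

def result = text.replaceAll(outsideQuotes, '')

assert result == '"1011 testing 1","1022 testing 2","1033, 234, testing 3"'
```

The lookahead rescans the rest of the string for every candidate match, so this can get slow on long inputs; for real CSV data a parser is still the safer choice.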