You may use
.replaceAll("\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}")
Or, if the matches may span multiple lines, add the (?s) modifier:
.replaceAll("(?s)\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}")
See the regex demo.
Details
\B"\b - a " that is either at the start of the string or preceded by a non-word char, and that is followed by a word char
(.*?) - Group 1: any zero or more chars other than line break chars (with (?s), line break chars are matched too), as few as possible
\b"\B - a " that is either at the end of the string or followed by a non-word char, and that is preceded by a word char.
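To make the boundary rules above concrete, here is a minimal sketch that lists the Group 1 values the pattern finds in a few inputs. The class name `QuoteBoundaryDemo` and the helper `quotedParts` are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteBoundaryDemo {
    // Same pattern as in the answer: the opening quote must be followed by a
    // word char, and the closing quote must be preceded by a word char.
    private static final Pattern QUOTED = Pattern.compile("\\B\"\\b(.*?)\\b\"\\B");

    // Hypothetical helper: collects the Group 1 value of every match.
    static List<String> quotedParts(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        // The quote in a"b is preceded by a word char, so it cannot open a match.
        System.out.println(quotedParts("a\"b \"Test\""));   // [Test]
        System.out.println(quotedParts("say \"hi\" now"));  // [hi]
        // Quotes followed/preceded by spaces fail the \b checks.
        System.out.println(quotedParts("\" padded \""));    // []
    }
}
```

Note that the last input yields no match because the text between the quotes starts and ends with a space rather than a word char.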
The replacement is a backslash ("\\\\"; note that the doubled literal backslash is necessary in the regex replacement part to insert a real, literal backslash, since a backslash is a special char in the replacement pattern), q{, the Group 1 value ($1) and a }.
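As a sketch of the escaping rule just described, the literal part of the replacement can also be built with `Matcher.quoteReplacement`, which escapes backslashes and dollar signs for you. The class and method names below are hypothetical and exist only for this illustration; both methods should produce the same result.

```java
import java.util.regex.Matcher;

class ReplacementEscapeDemo {
    static final String REGEX = "\\B\"\\b(.*?)\\b\"\\B";

    // Manual escaping: the Java literal "\\\\" is two backslash chars,
    // which the replacement syntax reads as one literal backslash.
    static String withManualEscape(String s) {
        return s.replaceAll(REGEX, "\\\\q{$1}");
    }

    // Alternative: let quoteReplacement escape the literal prefix \q{
    // and append the group reference and closing brace unescaped.
    static String withQuoteReplacement(String s) {
        String literalPrefix = Matcher.quoteReplacement("\\q{");
        return s.replaceAll(REGEX, literalPrefix + "$1}");
    }

    public static void main(String[] args) {
        String s = "He said \"hello\"";
        System.out.println(withManualEscape(s));      // He said \q{hello}
        System.out.println(withQuoteReplacement(s));  // He said \q{hello}
    }
}
```

The second form is mostly useful when the literal prefix comes from a variable and may itself contain `\` or `$`.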
See the Java demo:
String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\nBut as he said \"It could be an arbitrary number of words\"";
System.out.println(s.replaceAll("\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}"));
Output:
This is my "te

st" case
with lines for \q{tes"t"ing} with regex
But as he said \q{It could be an arbitrary number of words}
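For comparison, here is a small sketch that runs the (?s) variant on the same input. The class name `DotallDemo` and the method `wrapQuotes` are hypothetical names used only for this illustration. With the flag, the quoted text spanning the blank line is matched as well.

```java
class DotallDemo {
    // Same replacement as above, but (?s) lets . match line break chars.
    static String wrapQuotes(String s) {
        return s.replaceAll("(?s)\\B\"\\b(.*?)\\b\"\\B", "\\\\q{$1}");
    }

    public static void main(String[] args) {
        String s = "This is my \"te\n\nst\" case\nwith lines for \"tes\"t\"ing\" with regex\n"
                 + "But as he said \"It could be an arbitrary number of words\"";
        // The first line now starts with: This is my \q{te
        System.out.println(wrapQuotes(s));
    }
}
```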
NOTE:
If you also need to match two consecutive double quotes that are neither preceded nor followed by word characters, you can modify the regular expression above as follows:
.replaceAll("(?s)\\B(\"\\b(.*?)\\b\"|\"\")\\B", "\\\\q{$2}")
See the regex demo.
Details
(?s) - an embedded flag option (equal to Pattern.DOTALL) that makes . match line break chars, too
\B - a non-word boundary; here, it means that immediately to the left there must be a non-word char or the start of the string (because after \B there is a non-word char, ")
( - start of the first capturing group:
  "\b(.*?)\b" - a " followed by a word char, then Group 2 capturing any zero or more chars, as few as possible, and then a " that is preceded by a word char (that is why this alternative can't match "": after the first " and before the second ", there must be a letter, digit or _)
  | - or
  "" - a "" substring
) - end of the first capturing group
\B - a non-word boundary; here, it means that immediately to the right there must be a non-word char or the end of the string (because before \B there is a non-word char, ").
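The following sketch exercises this variant on an empty pair of quotes and on a quote glued to word characters. The class name `EmptyQuotesDemo` and the method `wrapQuotes` are hypothetical names used only for this illustration. When the "" alternative matches, Group 2 does not participate, and Java substitutes an empty string for $2.

```java
class EmptyQuotesDemo {
    // Group 1 is either a quoted run that starts and ends with a word char
    // (its content is captured in Group 2) or an empty pair of quotes.
    static String wrapQuotes(String s) {
        return s.replaceAll("(?s)\\B(\"\\b(.*?)\\b\"|\"\")\\B", "\\\\q{$2}");
    }

    public static void main(String[] args) {
        System.out.println(wrapQuotes("say \"\" and \"hi\""));  // say \q{} and \q{hi}
        // The quote inside a"b is left alone; only "Test" is rewritten.
        System.out.println(wrapQuotes("a\"b \"Test\""));        // a"b \q{Test}
    }
}
```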
”te st” but that is not a word (is not comprised entirely of word characters). What about ”te x st”? Do you wish to match b in a”b”? – Cary Swoveland
te x ting should be matched while a"b test should be ignored, i.e. a"b "Test" should match "Test" only. Matching empty would be great as well. Honestly, I didn't test all the cases. But I definitely need to have it for Java. – LeO