3
votes

Given a string enclosed in double quotes, such as "Hello",

the following regular expression matches it and captures the string without the double quotes:

/"([^"]+)"/

I don't understand how it captures the characters. I would expect it to capture only the initial double quote. As I read it, the expression says: find text that starts and ends with a double quote and that has one or more double quotes at the beginning, then capture those leading double quotes. How does [^"]+ end up matching the string itself?

1
I assume you're just confusing the ^ inside the [], which negates the character set, with the non-bracketed use of ^ to mean the beginning of the string. Which is a pretty confusing reuse of a character, for sure. - glenn mcdonald

1 Answer

8
votes

The expression [^"]+ matches one or more consecutive characters that are not the double quote ". So when it is placed inside (), everything after the first " and up to, but not including, the next " is captured. This is because a ^ at the start of a [] character class negates the class, rather than anchoring to the start of the string as it does outside the []. So [^"] means any single character except ".
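To see the two meanings of ^ side by side, here is a minimal sketch in Python (the question doesn't name a language, so Python's re module is an assumption; the variable names are just for illustration):

```python
import re

text = '"Hello"'

# Inside brackets, ^ negates the class: one or more characters that are not ".
negated = re.search(r'[^"]+', text).group(0)

# Outside brackets, ^ anchors to the start: one or more " at the beginning.
anchored = re.search(r'^"+', text).group(0)

print(negated)   # Hello
print(anchored)  # "
```

The second pattern is roughly what the question expected the first one to do.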

The () is the capture group, and only the part of the match inside the () is captured. Depending on the programming language you use, the entire match "Hello" (quotes included) for the whole expression /"([^"]+)"/ may also be recorded separately, but the purpose of () is to capture its contents.
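As an illustration of the difference between the whole match and the captured group, here is a small sketch assuming Python's re module (other languages expose the same two pieces under different names; the sample sentence is made up):

```python
import re

# The pattern from the question, without the /.../ delimiters.
pattern = re.compile(r'"([^"]+)"')

match = pattern.search('She said "Hello" to me')

whole_match = match.group(0)  # everything the pattern matched, quotes included
captured = match.group(1)     # only what the parentheses captured

print(whole_match)  # "Hello"
print(captured)     # Hello
```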

Full breakdown of the expression:

  • " - first literal quote
  • ( - begin capture group
  • [^"]+ - one or more characters that are not "
  • ) - end capture group
  • " - final closing quote literal
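
The same breakdown can be written directly into the pattern. This is only a sketch, assuming Python and its re.VERBOSE flag, which lets whitespace and # comments sit inside the pattern; the sample text is made up:

```python
import re

annotated = re.compile(r'''
    "        # first literal quote
    (        # begin capture group
    [^"]+    # one or more characters that are not "
    )        # end capture group
    "        # closing literal quote
''', re.VERBOSE)

# With a single capture group, findall returns just the captured parts.
found = annotated.findall('say "Hello" and then "Goodbye"')
print(found)  # ['Hello', 'Goodbye']

# The + requires at least one character, so an empty pair of quotes does not match.
empty = annotated.search('""')
print(empty)  # None
```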