I am working on a Node.js project in which we search a set of PHP view files and replace some of their attributes. I am trying to get the attribute values from HTML opening tags and replace them.

Basically, given this tag:

<tag attr1="[capture ANYTHING inside single/double quotes]" attr2='[CAPTURE ANYTHING]'></tag>

I want to capture anything inside the attribute quotes, and by [ANYTHING] I really mean anything!

example2: attr="with HTML <br/><b>also been captured</b>"
example3: attr="with line break style \n or \n\r this is still part of what should be captured and this line too!"
example4: attr="a PHP code <?php echo $ThisPHPcodeisInsideTheQoutes?> should be captured as well!"
example5: title="{{angular?'if inside the attribute': 'it should be captured as well' }}"

I wrote the following regex:

/<\w+\s+(:?[\w-]+=(:?"|')(.|[\r\n])*?\2\s*?)>?/g

This regex catches only the first attribute.

Here is a fiddle with some demo data

Regex breakdown:

< - tag start
\w+ - a word, mainly the tag name; this avoids matching PHP tags such as <?php
\s+ - one or more spaces, as in <tag attr
(:? - non-capturing group 1; I want to match multiple attributes but capture only the content
[\w-]+ - a word or -, for example attr or ng-attr
= - the attribute's equal sign
(:?"|') - non-capturing group 2: an opening single or double quote
(.|[\r\n])*? - the actual data I am trying to capture: everything (.) or a line break ([\r\n])
\2 - back-reference to (:?"|'), so we'll have "[data]" or '[data]'
\s*? - zero or more spaces before the next tag, not greedy
) - close of non-capturing group 1
>? - end of the opening tag, not greedy

I don't understand why multiple attributes are not being captured. Thanks in advance for the help.

(:? is a non-capturing group? \w will match the ? in <?php? Are you not allowing spaces before and after the =? How are you trying to use this regexp (show code)? >? is a non-greedy match (hint: no, it's an optional >). - user663031
@torazaburo please run it in a regex editor and you will see that your comment is wrong; you can see it here: refiddle.com/refiddles/57c80c5275622d7947c11600 - Wazime
Which comment do you mean? I don't need to use a regexp editor to know that (:? is not a non-capturing group; it's a group starting with an optional :. You probably meant (?:. This could possibly be the reason for your regexp not capturing multiple attributes. - user663031
Where is your closing quote? What is \2 supposed to refer to, since you're (trying) not to capture the group containing the quotes, right? - user663031
By definition, a back-reference does NOT work with a non-capturing group. It works for you only because you are writing the non-capturing group INCORRECTLY as (:?, which, as I said an hour ago, is NOT a non-capturing group, but rather a capturing group starting with an optional colon. If you love the regexp editors so much, please review CAREFULLY their narrative description of your (:? construct. - user663031
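
The difference discussed in these comments can be checked directly in Node.js. This is a minimal sketch; the test strings are made up for illustration and are not from the question's fiddle. It compares (:? with (?: and shows that \2 in the question's pattern only works because (:?"|') is in fact a capturing group.

```javascript
// (:?a) is a capturing group whose content starts with an optional colon.
const capturing = /(:?a)/.exec("a");
// (?:a) is a true non-capturing group.
const nonCapturing = /(?:a)/.exec("a");

console.log(capturing.length);    // 2: the whole match plus one captured group
console.log(nonCapturing.length); // 1: the whole match only

// Because (:?"|') captures, \2 has an opening quote to refer back to.
// The input string is an illustrative assumption.
const quoted = /(:?[\w-]+=(:?"|')(.|[\r\n])*?\2)/.exec(`attr1="a 'b' c"`);
console.log(quoted[2]); // the opening double quote captured by group 2
```

Replacing (:? with (?: in the second group would remove the capture that \2 depends on, so the group numbering has to be adjusted at the same time.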

1 Answer

I don't see how this is possible with a single regex match. As far as I am aware, a single match cannot return a separate capture for each repetition of a subpattern that ends in a backreference.
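
One way to see the limitation: in JavaScript, a capturing group that repeats inside a single match keeps only the values from its last iteration. The sketch below uses a hypothetical tag string and a hypothetical pattern (neither is from the question) in which the attribute group repeats.

```javascript
// One match attempt over a whole opening tag; the attribute group repeats.
// Both the tag and the pattern are illustrative assumptions.
const tag = `<a href="x.php" title='hello'>`;
const single = /<\w+(?:\s+([\w-]+)=("|')([\s\S]*?)\2)+\s*>/.exec(tag);

console.log(single[1]); // the attribute name from the last iteration only
console.log(single[3]); // the value from the last iteration; href's value is not retained
```

The whole tag is matched, but only the final attribute is available through the capture groups, which is why a second pass over each tag is needed.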

Instead, I would recommend processing the HTML in two steps. First, extract the opening tag string using

/<\w+\s+[\w-]+=("|')(?:.|[\r\n])*?\1\s+.*?>/g

and then go back through the matches and extract each of the attribute/value pairs using

/([\w-]+=("|')(?:.|[\r\n])*?\2)/g

At that point, you can split on the first "=" to break apart each attribute from its value.
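
The two steps above can be sketched in Node.js roughly as follows. This is an illustration under stated assumptions, not the code from the fiddle: TAG_RE is a variant of the tag pattern rather than the exact regex given above, the function name extractAttributes and the sample string are hypothetical, and the name and value are captured as separate groups instead of splitting on "=". It assumes a Node.js version that supports String.prototype.matchAll (12 or later).

```javascript
// Step 1: find opening tags that carry at least one quoted attribute.
// Illustrative variant of the tag pattern, not the answer's exact regex.
const TAG_RE = /<\w+(?:\s+[\w-]+=("|')[\s\S]*?\1)+\s*\/?>/g;
// Step 2: match each name=value pair inside one tag, capturing name and value.
const ATTR_RE = /([\w-]+)=("|')([\s\S]*?)\2/g;

// Hypothetical helper: returns every attribute found in every opening tag.
function extractAttributes(source) {
  const results = [];
  for (const tagMatch of source.matchAll(TAG_RE)) {
    for (const attr of tagMatch[0].matchAll(ATTR_RE)) {
      results.push({ name: attr[1], value: attr[3] });
    }
  }
  return results;
}

// Sample input made up for this sketch.
const sample = `<div title="with HTML <br/><b>bold</b>" data-x='<?php echo $v ?>'></div>`;
console.log(extractAttributes(sample));
```

Because each attribute is matched by its own regex match in step 2, every value gets its own capture, including values that contain HTML, PHP tags, or line breaks. Unquoted attributes and valueless attributes are not handled by this sketch.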

Here is a fiddle implementing what I recommend. Your sample text should parse out the way you want it.