0
votes

I have a regular expression that runs on a string of HTML, but I need to keep anything inside a <p>...</p> element from matching. Is there a way to do this within my current regex?

My regex (matches: $, %, decimal, and whole number values in a string): /(?:\$?)(?:\d{1,3}(?:,\d{3})*(?:\%?)|\d+)(?:\.\d+(?:\%?))?/g

Basically, given the following HTML, the regex should behave like this:

<div>$50</div>
<p>$40</p>
<div>$30</div>

matches: $50 & $30
ignores: $40

2 Answers

0
votes

You could use DOMParser to parse your string into an HTML document, remove the p elements with querySelectorAll and forEach, and then run your regex on what is left:

const htmlString = "<div>$50</div><p>$40</p><div>$30</div>";
const doc = new DOMParser().parseFromString(htmlString, "text/html");
// Remove every <p> element (and its contents) from the parsed document
doc.querySelectorAll('p').forEach((a) => a.remove());
console.log(doc.body.innerHTML);
// Run the regex against the remaining markup in doc.body.innerHTML
const matches = doc.body.innerHTML.match(/(?:\$?)(?:\d{1,3}(?:,\d{3})*(?:\%?)|\d+)(?:\.\d+(?:\%?))?/g);
console.log(matches);
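
If DOMParser is not available (for example when the code runs outside a browser), the same "remove the p elements first, then match" idea can be sketched with plain string replacement. This is a different technique from the parser approach above and only a rough sketch: it assumes the p elements are not nested and the markup is well formed, and the names html, numberPattern and withoutParagraphs are made up for the example.

```javascript
// Sample input from the question, on a single line.
const html = "<div>$50</div><p>$40</p><div>$30</div>";

// The original regex from the question, unchanged.
const numberPattern = /(?:\$?)(?:\d{1,3}(?:,\d{3})*(?:\%?)|\d+)(?:\.\d+(?:\%?))?/g;

// Assumption: <p> elements are not nested, so a lazy match from an opening
// <p ...> to the next </p> covers one whole element. \b keeps <pre> etc. intact.
const withoutParagraphs = html.replace(/<p\b[^>]*>[\s\S]*?<\/p>/gi, "");

// match() returns null when nothing matches, so fall back to an empty array.
const found = withoutParagraphs.match(numberPattern) || [];
console.log(found); // [ '$50', '$30' ]
```

A real HTML parser is still the safer choice whenever one is available, since a regex cannot handle arbitrary markup.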
0
votes

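/(?:(?<!<p>.*)(?:\$?)(?:\d{1,3}(?:,\d{3})*(?:\%?)|\d+)(?:\.\d+(?:\%?))?(?!.*<\/p>))/g will work in most browsers, provided each element sits on its own line as in the question's sample. The .* in the lookarounds does not cross newlines, so on a single-line string it would also reject $50 and $30. See https://regex101.com/r/YMoe12/1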

I wrote "most" because negative lookbehind is still not officially supported in all browsers, but it is supported in Chrome, Edge, and Firefox on all supported versions.

full list Here
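
Since support varies, one option is to test for lookbehind at runtime before using it. The sketch below is only an illustration: supportsLookbehind is a made-up helper name, and it relies on the RegExp constructor throwing a SyntaxError when the engine does not understand the pattern.

```javascript
// Hypothetical helper: returns true if this engine accepts lookbehind syntax.
// The pattern is passed as a string so that an unsupported engine fails here,
// at runtime, instead of failing to parse the whole script.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a)b");
    return true;
  } catch (err) {
    return false;
  }
}

console.log(supportsLookbehind());
```

Note that a regex literal containing lookbehind would still be a parse error in an unsupported engine, so the lookbehind pattern itself would also have to be built with new RegExp(...) from a string.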

/(?:(?:\$?)(?:\d{1,3}(?:,\d{3})*(?:\%?)|\d+)(?:\.\d+(?:\%?))?(?![^>]*<\/p>))/g is a workaround that avoids lookbehind, so it will also work in Safari, as seen here
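
As a quick check, the lookahead-only pattern can be run against the sample from the question. This is a minimal sketch; the variable names sample and safariSafe are made up for the example, and it assumes the number is followed directly by the closing </p> with no other tags in between.

```javascript
// Sample input from the question, on a single line.
const sample = "<div>$50</div><p>$40</p><div>$30</div>";

// Lookahead-only pattern: a match is rejected when the text after it reaches
// a closing </p> without passing any ">" first.
const safariSafe = /(?:(?:\$?)(?:\d{1,3}(?:,\d{3})*(?:\%?)|\d+)(?:\.\d+(?:\%?))?(?![^>]*<\/p>))/g;

const matches = sample.match(safariSafe);
console.log(matches); // [ '$50', '$30' ]
```

Because the lookahead stops at the first ">", a value inside a p element that is followed by another tag before </p> (for example <p>$40<b>x</b></p>) would still match.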