
I'd like some help with regular expressions because I'm not really familiar with them. So far, I have created the following regex:

/\b(?<![\#\-\/\>])literal(?![\<\'\"])\b/i

As https://regex101.com/ states:

\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)

Negative Lookbehind (?<![\#\-\/\>])

Assert that the Regex below does not match

Match a single character present in the list below [\#\-\/\>]

\# matches the character # literally (case insensitive)

\- matches the character - literally (case insensitive)

\/ matches the character / literally (case insensitive)

\> matches the character > literally (case insensitive)

literal matches the characters literal literally (case insensitive)

Negative Lookahead (?![\<\'\"])

Assert that the Regex below does not match

Match a single character present in the list below [\<\'\"]

\< matches the character < literally (case insensitive)

\' matches the character ' literally (case insensitive)

\" matches the character " literally (case insensitive)

\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)

Global pattern flags

i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])

I want to add two exceptions to this matching rule. 1) If the ">" is preceded by "p", that is, for example, a <p> starting tag, the literal should still match. 2) The literal should also match when the "<" is followed by "/p", that is, for example, a </p> closing tag. How can I achieve this?

Example: only the bold ones should match.

<p>
    **Literal** in computer science is a
    <a href='http://www.google.com/something/literal#literal'>literal</a>
    for representing a fixed value in source code. Almost all programming 
    <a href='http://www.google.com/something/else-literal#literal'>languages</a>
    have notations for atomic values such as integers, floating-point 
    numbers, and strings, and usually for booleans and characters; some
    also have notations for elements of enumerated types and compound
    values such as arrays, records, and objects. An anonymous function
    is a **literal** for the function type which is **LITERAL**
</p>

I know I have over-complicated things, but the situation itself is complicated and I think I have no other way.

Can you give an example of input and output of what you're trying to do with it? And what programming language are you using the regex with? - 4castle
@4castle I have added an example. Would you mind editing it again as before? No clue how to add actual HTML. - dpesios
What programming language is this in? It looks like you need an HTML parser, and not a regex. Please read about the XY Problem. - 4castle
They never learn, no matter how hard we try, they keep coming back again and again. Please do not parse HTML with regex; use an HTML parser: stackoverflow.com/a/1732454/460557 - Jorge Campos
If this simplifies things, there can be only two kinds of tags in the text: the p tag and the a tag. - dpesios

1 Answer


If the text you're searching is just text mixed with some <a> tags, then you can simplify the < and > parts of the lookarounds, and give a specific string that it shouldn't be followed by: </a>.

/\b(?<![-#\/])literal(?!<\/a>)\b/i

Regex101 Demo
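
As a quick sanity check, here is a minimal sketch that runs the pattern above against a shortened copy of the sample from the question. The question never says which language or regex engine is in use, so Python's `re` module is an assumption here, and the names `PATTERN` and `SAMPLE` are only illustrative. The `**` bold markers from the question are left out because they are just the "should match" annotation.

```python
import re

# The answer's pattern in Python syntax (assumed engine; the question does
# not name one). The lookbehind rejects "literal" right after -, # or /,
# and the lookahead rejects it when it is immediately followed by </a>.
PATTERN = re.compile(r"\b(?<![-#/])literal(?!</a>)\b", re.IGNORECASE)

# Shortened version of the sample HTML from the question.
SAMPLE = """<p>
    Literal in computer science is a
    <a href='http://www.google.com/something/literal#literal'>literal</a>
    for representing a fixed value in source code. Almost all programming
    <a href='http://www.google.com/something/else-literal#literal'>languages</a>
    have notations for atomic values. An anonymous function
    is a literal for the function type which is LITERAL
</p>"""

matches = [m.group(0) for m in PATTERN.finditer(SAMPLE)]
print(matches)  # ['Literal', 'literal', 'LITERAL']
```

Only the three occurrences in plain paragraph text are reported; the ones inside the `href` values and the link text are skipped. This relies on the text containing only `<p>` and `<a>` tags, as stated in the comments; for arbitrary HTML a parser is the safer route.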