I need a logical AND in regex, something like
jack AND james
that matches both of the following strings:
'hi jack here is james'
'hi james here is jack'
You can do checks using positive lookaheads. Here is a summary from the indispensable regular-expressions.info site:
Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions...lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not.
It then goes on to explain that positive lookaheads are used to assert that what follows matches a certain expression without taking up characters in that matching expression.
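To make the "zero-length assertion" idea concrete, here is a small sketch in JavaScript (the variable names are only illustrative). It suggests that a lookahead can succeed without consuming anything, so a pattern placed after it still starts at the same position.

```javascript
const text = 'hi jack here is james';

// The lookahead alone succeeds at position 0 and matches the empty string.
const m = /(?=.*\bjames\b)/.exec(text);
const lookaheadIndex = m.index;
const lookaheadLength = m[0].length;

// Because nothing was consumed, the literal after the lookahead
// is still matched from the very start of the string.
const consumed = /(?=.*\bjames\b)hi jack/.exec(text)[0];

console.log(lookaheadIndex, lookaheadLength, consumed); // 0 0 hi jack
```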
So here is an expression using two successive positive lookaheads to assert that the phrase contains jack and james in either order:
^(?=.*\bjack\b)(?=.*\bjames\b).*$
The expressions in parentheses starting with ?= are the positive lookaheads. I'll break down the pattern:
^ asserts the start of the expression to be matched.
(?=.*\bjack\b) is the first positive lookahead, saying that what follows must match .*\bjack\b.
.* means any character zero or more times.
\b means any word boundary (white space, start of expression, end of expression, etc.).
jack is literally those four characters in a row (the same goes for james in the next positive lookahead).
$ asserts the end of the expression to be matched.
So the first lookahead says "what follows (and is not itself a lookahead or lookbehind) must be an expression that starts with zero or more of any characters followed by a word boundary and then jack and another word boundary," and the second lookahead says "what follows must be an expression that starts with zero or more of any characters followed by a word boundary and then james and another word boundary." After the two lookaheads is .* which simply matches any characters zero or more times, and $ which matches the end of the expression.
"start with anything then jack or james then end with anything" satisfies the first lookahead because there are a number of characters then the word jack, and it satisfies the second lookahead because there are a number of characters (which just so happen to include jack, but that is not necessary to satisfy the second lookahead) then the word james. Neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".
I think you get the idea, but just to be absolutely clear, here it is with jack and james reversed, i.e. "start with anything then james or jack then end with anything": it satisfies the first lookahead because there are a number of characters (which just so happen to include james, but that is not necessary to satisfy the first lookahead) then the word jack, and it satisfies the second lookahead because there are a number of characters then the word james. As before, neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".
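A quick way to check this reasoning is to run the two-lookahead expression against a few strings. This is a minimal sketch in JavaScript. The first two samples come from the question, while the last two are made up here to show a missing word and a non-whole-word case.

```javascript
const both = /^(?=.*\bjack\b)(?=.*\bjames\b).*$/;

const samples = [
  'hi jack here is james',    // both words, jack first
  'hi james here is jack',    // both words, james first
  'hi jack',                  // james is missing
  'hi jackson here is james'  // "jackson" is not the whole word "jack"
];

// Test each sample; no g flag is used, so test() keeps no state between calls.
const results = samples.map((s) => both.test(s));
console.log(results); // [ true, true, false, false ]
```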
This approach has the advantage that you can easily specify multiple conditions.
^(?=.*\bjack\b)(?=.*\bjames\b)(?=.*\bjason\b)(?=.*\bjules\b).*$
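Since each required word just adds one more lookahead, the pattern can be generated from a list. The helper below is a hypothetical sketch: the name allWordsRegex is not from any library, and it assumes every word starts and ends with a word character so that \b behaves as expected.

```javascript
// Hypothetical helper: build a regex that requires every word in `words`.
function allWordsRegex(words) {
  // Escape regex metacharacters so each word is matched literally.
  const escape = (w) => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // One positive lookahead per required word.
  const lookaheads = words.map((w) => `(?=.*\\b${escape(w)}\\b)`).join('');
  return new RegExp(`^${lookaheads}.*$`);
}

const re = allWordsRegex(['jack', 'james', 'jason', 'jules']);
const source = re.source;
const allPresent = re.test('jules, jason, james and jack');
const oneMissing = re.test('jack, james and jason');

console.log(source);
console.log(allPresent, oneMissing); // true false
```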
Explanation of the command I am going to write:
. means any single character (a letter, digit, or anything else can appear in its place).
* means zero or more occurrences of whatever precedes it.
| means 'or'.
So james.*jack searches for james, then any number of characters, until jack comes.
Since you want either jack.*james or james.*jack, the command is:
jack.*james|james.*jack
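Here is a small JavaScript sketch of the alternation approach. The last sample string is made up to illustrate one caveat: without word boundaries, substrings such as "jackson" also count.

```javascript
const either = /jack.*james|james.*jack/;

const jackFirst = either.test('hi jack here is james');
const jamesFirst = either.test('hi james here is jack');
const onlyJack = either.test('hi jack');
// No \b in the pattern, so "jack" inside "jackson" is accepted too.
const substrings = either.test('jackson met jameson');

console.log(jackFirst, jamesFirst, onlyJack, substrings); // true true false true
```

Note that every ordering has to be listed explicitly, so the number of alternatives grows factorially with the number of required words (2 for two words, 6 for three).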
The expression in this answer does that for one jack and one james in any order.
Here, we explore other scenarios.
One jack and one james
If two jack or two james should not be allowed, so that only one jack and one james is valid, we can likely design an expression similar to:
^(?!.*\bjack\b.*\bjack\b)(?!.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$
Here, we would exclude those instances using these statements:
(?!.*\bjack\b.*\bjack\b)
and,
(?!.*\bjames\b.*\bjames\b)
We can also merge the two negative lookaheads into one alternation. The two positive lookaheads must stay separate, because an alternation inside a single lookahead would mean "james OR jack" rather than AND:
^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$
If you wish to simplify, update, or explore the expression, it is explained on the top right panel of regex101.com. You can watch the matching steps or modify them in this debugger link, if you're interested. The debugger demonstrates how a regex engine might consume some sample input strings step by step and perform the matching process.
jex.im visualizes regular expressions.
// Only the first two sample lines contain exactly one jack and one james.
const regex = /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$/gm;
const str = `hi jack here is james
hi james here is jack
hi james jack here is jack james
hi jack james here is james jack
hi jack jack here is jack james
hi james james here is james jack
hi jack jack jack here is james
`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
One jack and one james in a specific order
The expression can also be designed for first a james then a jack, similar to the following one:
^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b.*\bjack\b).*$
and vice versa:
^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjack\b.*\bjames\b).*$
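The two order-specific expressions can be compared side by side. This is only a sketch: the constant names are illustrative, and the third sample string is made up to show the duplicate check.

```javascript
const jamesThenJack = /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b.*\bjack\b).*$/;
const jackThenJames = /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjack\b.*\bjames\b).*$/;

const samples = [
  'hi james here is jack',          // james before jack
  'hi jack here is james',          // jack before james
  'hi james here is jack and jack'  // jack appears twice
];

// For each sample, record [matches james-then-jack, matches jack-then-james].
const table = samples.map((s) => [jamesThenJack.test(s), jackThenJames.test(s)]);
console.log(table); // [ [ true, false ], [ false, true ], [ false, false ] ]
```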
Vim has a branch operator \& that is useful when searching for a line containing a set of words, in any order. Moreover, extending the set of required words is trivial.
For example,
/.*jack\&.*james
will match a line containing jack and james, in any order.
See this answer for more information on usage. I am not aware of any other regex flavor that implements branching; the operator is not even documented on the Regular Expression Wikipedia entry.