Regex: match a url but not an email domain

Question

I have a very loose regex to match any kind of URL inside a string: [a-z]+[:.].*?(?=\s|$) The only problem is that this regex also matches the domain of an email address, whereas I want to exclude email addresses from the match entirely.

To be precise, these are the matches I want:

test example.com test (should match example.com)

test [email protected] (should match nothing)

Every solution I tried just excludes emailstring and still matches myemail.com.

Here's a more complete test case: https://regex101.com/r/NsxzCM/3/
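The behaviour described in the question can be reproduced with a short sketch. The address user@myemail.com is a hypothetical sample (it is not taken from the post), and the global flag is added only so that match returns every hit:

```javascript
// The loose pattern from the question, with the global flag added so that
// String.prototype.match returns all matches instead of only the first.
const looseUrlRegex = /[a-z]+[:.].*?(?=\s|$)/g;

// Hypothetical inputs; the email address is a made-up sample.
const samples = ["test example.com test", "test user@myemail.com"];

const looseMatches = samples.map((s) => s.match(looseUrlRegex));

console.log(looseMatches);
// → [ [ 'example.com' ], [ 'myemail.com' ] ]
```

The second result is the unwanted one: [a-z]+ cannot cross the "@", so the engine skips the local part and starts a fresh match at the domain.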

Is it really worth trying to construct a monstrous, error-prone regex that matches all URLs but excludes emails? Wouldn't it be much easier to first find all URL-like strings, and then check in a second step that they are not email addresses? — Andrey Tyukin
It's a good point, but I need to parse a text and replace each URL with markup on the fly, and I'm not sure how to do this in multiple steps. I could split the whole text on spaces, replace, and then rejoin, but keeping track of which part was an email and which wasn't (so I can process only the URLs) would make it messy. — Bolza

Andrey Tyukin · Accepted Answer · 2018-05-30T12:53:34

Here is a two-step proposal that uses regex replace with lambdas. The first regex finds everything that looks like an ordinary URL or an email, and the second regex then filters out the strings that look like email addresses:

const input =
  "test\n" +
  "example.com\n" +
  "www.example.com\n" +
  "test sub.example.com test\n" +
  "http://example.com\n" +
  "test http://www.example.com test\n" +
  "http://sub.example.com\n" +
  "https://example.com\n" +
  "https://www.example.com\n" +
  "https://sub.example.com\n" +
  "\n" +
  "test [email protected] <- i don't want to match this\n" +
  "[email protected]    <- i don't want to match this\n" +
  "\n" +
  "git://github.com/user/project-name.git\n" +
  "irc://irc.undernet.org:6667/mIRC jhasbdjkbasd\n";

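// Step 1: match anything that looks like a URL or an email address.
const includeRegex = /(?:[\w/:@-]+\.[\w/:@.-]*)+(?=\s|$)/g;
// Step 2: anything containing "@" is treated as an email address.
const excludeRegex = /.*@.*/;

const result = input.replace(includeRegex, function (s) {
  if (excludeRegex.test(s)) {
    return s; // looks like an email: leave as-is
  } else {
    return "(that's a non-email url: " + s + ")";
  }
});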

console.log(result);
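
Since the goal mentioned in the comments is to replace URLs with markup on the fly, the same two-step idea can be sketched with a callback that emits a link instead of a marker string. This is only an illustration: linkify is a hypothetical helper, the anchor-tag output and the "http://" default for schemeless tokens are assumptions, and the sample email address is made up.

```javascript
// Same pattern as includeRegex in the answer: URL-like or email-like tokens.
const urlOrEmail = /(?:[\w/:@-]+\.[\w/:@.-]*)+(?=\s|$)/g;

// Hypothetical helper: wraps URL-like tokens in an anchor tag and leaves
// every token that contains "@" untouched.
function linkify(text) {
  return text.replace(urlOrEmail, (token) => {
    if (token.includes("@")) {
      return token; // treated as an email address, left as-is
    }
    // Assumption: tokens without a scheme (e.g. "example.com") get "http://".
    const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(token);
    const href = hasScheme ? token : "http://" + token;
    return '<a href="' + href + '">' + token + "</a>";
  });
}

console.log(linkify("test example.com test"));
// → test <a href="http://example.com">example.com</a> test
console.log(linkify("mail user@myemail.com now"));
// → mail user@myemail.com now
```

Because the decision is made inside the replace callback, there is no need to split the text and rejoin it. Note that this sketch does not HTML-escape the token, and a URL that legitimately contains "@" would also be left unlinked.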
