
Let's say we have some text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus cursus vestibulum quam, et tristique nisi tristique ac. Nam ac risus vehicula tortor facilisis tincidunt. Aliquam at nisi vel arcu aliquet dignissim nec et massa. Curabitur vel magna eros, accumsan rutrum augue. Lorem ipsum http://subdomain-1.example.com/dir1 dolor sit amet, consectetur adipiscing elit. Nunc ut vehicula purus. Phasellus nunc diam, hendrerit in ultrices vitae, adipiscing ut odio. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Cras molestie felis nec diam sollicitudin placerat pellentesque metus dapibus. Aliquam ipsum ante, lacinia porta http://subdomain-2.example.com/dir2 faucibus non, porttitor at nunc. Quisque suscipit, urna sit amet rhoncus bibendum, elit mi rhoncus lorem, ac luctus lectus nunc in velit.

I need a C# function that finds all URLs and replaces the domain name with a given one, for example example.com with stackoverflow.com, while everything else (the subdomain and the rest of the URL) stays the same.

For example, after the replacement the text should look like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus cursus vestibulum quam, et tristique nisi tristique ac. Nam ac risus vehicula tortor facilisis tincidunt. Aliquam at nisi vel arcu aliquet dignissim nec et massa. Curabitur vel magna eros, accumsan rutrum augue. Lorem ipsum http://subdomain-1.stackoverflow.com/dir1 dolor sit amet, consectetur adipiscing elit. Nunc ut vehicula purus. Phasellus nunc diam, hendrerit in ultrices vitae, adipiscing ut odio. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Cras molestie felis nec diam sollicitudin placerat pellentesque metus dapibus. Aliquam ipsum ante, lacinia porta http://subdomain-2.stackoverflow.com/dir2 faucibus non, porttitor at nunc. Quisque suscipit, urna sit amet rhoncus bibendum, elit mi rhoncus lorem, ac luctus lectus nunc in velit.

This initially seems like a pretty easy problem to solve, possibly even a homework assignment. What code do you already have, and what problems are you having with it? – atk
Of course, in the real world it would not be quite as easy, since you would want subdomain-1.example.com replaced with subdomain-1.stackoverflow.com and subdomain-1.example.co.uk replaced with subdomain-1.stackoverflow.co.uk, but you would not want example.google.com replaced with stackoverflow.google.com. – BlueRaja - Danny Pflughoeft
And you can't just check the third-level domain of anything that ends in .uk, because a handful of domains registered as just something.uk are still around from before the UK required every domain to be registered at the third level. – BlueRaja - Danny Pflughoeft
Well, does it even make sense to match every theoretical case? Generally you will know which subdomains you have to deal with and which URL you are replacing before you design the regex (I am assuming this is needed for a specific replacement). – ternaryOperator
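To illustrate the point made in these comments, here is a minimal sketch that only rewrites a label when it sits directly in front of one of an explicitly listed set of suffixes, so example.com and example.co.uk are replaced but example.google.com is not. Following ternaryOperator's suggestion, it assumes you already know which suffixes you deal with: the suffix list is something you supply, not a real public-suffix lookup. SuffixAwareReplacer and ReplaceLabel are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SuffixAwareReplacer
{
    // Hypothetical helper: replaces the label oldLabel (e.g. "example") with newLabel,
    // but only when it sits directly in front of one of the listed suffixes
    // (e.g. "com", "co.uk") at the end of a URL's host name.
    public static string ReplaceLabel(string text, string oldLabel, string newLabel, string[] suffixes)
    {
        // Builds e.g. "com|co\.uk": each suffix is escaped so its dots are literal.
        string suffixAlternation = string.Join("|", suffixes.Select(Regex.Escape));

        string pattern =
            // Scheme plus zero or more whole labels such as "subdomain-1.".
            @"(?<PREFIX>https?://(?:[\w-]+\.)*)" +
            Regex.Escape(oldLabel) +
            @"\.(?<SUFFIX>" + suffixAlternation + ")" +
            // The suffix must be the end of the host name.
            @"(?![\w-]|\.[\w-])";

        // Scheme, subdomains, suffix and path are all kept; only the label changes.
        return Regex.Replace(text, pattern,
            "${PREFIX}" + newLabel + ".${SUFFIX}",
            RegexOptions.IgnoreCase);
    }
}
```

For example, with the suffixes { "com", "co.uk" }, http://subdomain-1.example.co.uk/x becomes http://subdomain-1.stackoverflow.co.uk/x, while http://example.google.com/y is left unchanged because "example" is not followed by a listed suffix that ends the host.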

2 Answers

1 vote

I think this works:

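using System.Text.RegularExpressions;
// Verbatim string (@"...") so that \. reaches the regex engine; \S* ends the path at whitespace.
Regex r = new Regex(@"(?<SCHEME>https?://)(?<SUBDOMAIN>([^./\s]+\.)*)example\.com(?<PATH>/\S*)?");
string newText = r.Replace(text, "${SCHEME}${SUBDOMAIN}stackoverflow.com${PATH}");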

I use named groups because they're easier to keep track of and read. The first captures the scheme (http:// or https://), the second captures the subdomains, and the last captures an optional path (you might have http://foo.example.com, http://foo.example.com/ or http://foo.example.com/bar).
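The same idea can be wrapped in a reusable function that takes the old and new domains as parameters, which is what the question asks for. This is a minimal sketch, assuming only http/https URLs need handling; DomainReplacer and ReplaceDomain are hypothetical names. It escapes the domain with Regex.Escape, uses a lookahead so that hosts such as example.community or example.com.other.net are not touched, and never consumes the path, so the rest of the URL stays in the text as-is.

```csharp
using System.Text.RegularExpressions;

public static class DomainReplacer
{
    // Hypothetical helper: rewrites every http(s) URL whose host is oldDomain or a
    // subdomain of it, keeping the scheme, the subdomains and the rest of the URL.
    public static string ReplaceDomain(string text, string oldDomain, string newDomain)
    {
        string pattern =
            @"(?<SCHEME>https?://)" +
            // Zero or more whole labels such as "subdomain-1." in front of the domain.
            @"(?<SUBDOMAIN>(?:[\w-]+\.)*)" +
            // Regex.Escape turns "example.com" into "example\.com" so the dot is literal.
            Regex.Escape(oldDomain) +
            // The host must end here: rejects example.community and example.com.other.net,
            // but still allows a sentence-ending period right after the URL.
            @"(?![\w-]|\.[\w-])";

        // The path is never part of the match, so it stays in the text unchanged.
        string replacement = "${SCHEME}${SUBDOMAIN}" + newDomain;
        return Regex.Replace(text, pattern, replacement, RegexOptions.IgnoreCase);
    }
}
```

Called as DomainReplacer.ReplaceDomain(text, "example.com", "stackoverflow.com") on the question's text, both http://subdomain-1.example.com/dir1 and http://subdomain-2.example.com/dir2 have their domain rewritten, even though they appear on the same line.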

0 votes

The regular expression you use should look something like:

s!(https?://[\w\-]+)\.domain\.com([\w/]*)!$1.newdomain.org$2!gi

Note: this is Perl substitution syntax; you will have to rewrite it using C#'s Regex.Replace.
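A possible C# translation of that substitution, offered as a sketch: Regex.Replace already replaces every match (Perl's g flag), RegexOptions.IgnoreCase stands in for the i flag, and $1/$2 work the same way in .NET replacement strings. The path group uses * rather than + so that a URL with no path still matches, and like the Perl version it only handles a single subdomain label. domain.com and newdomain.org are the answer's placeholders; PerlStyleTranslation and Rewrite are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PerlStyleTranslation
{
    // C# version of the Perl substitution, with the path group made optional:
    //   Perl g flag  -> Regex.Replace already replaces every match
    //   Perl i flag  -> RegexOptions.IgnoreCase
    //   $1, $2       -> numbered groups in the .NET replacement string
    public static string Rewrite(string text)
    {
        return Regex.Replace(
            text,
            @"(https?://[\w\-]+)\.domain\.com([\w/]*)",
            "$1.newdomain.org$2",
            RegexOptions.IgnoreCase);
    }
}
```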