0
votes

My code could be given any of the three URLs below, and I'd like to extract the user name 'mrsmith' from it. The user name can be anything, depending on the URL being passed, while 'somewebsite', '/artist/' and '/portfolio/' remain constant:

http://www.somewebsite.com/artist/mrsmith
http://mrsmith.somewebsite.com/
http://mrsmith.somewebsite.com/portfolio/variablenames

Is there an elegant way to do this using regex?

2
Do you want to extract it if it exists, or do you want to remove it? Which bit are you going to use - the mrsmith, or the rest of the URL? – slugster
I was just looking to grab the user name mrsmith, I will edit the original post to be clearer. – Chris L
If you are looking to extract the subdomain (which just happens to be a name, but that in itself is irrelevant) then that is what you need to say. – slugster
@ChrisL as slugster said, it's so hard to differentiate between the user name and the subdomain names. – Avinash Raj
I guess what I was wondering is: is it possible to have something that says, "if it's NOT 'www.', then it's the user name; otherwise, look for the user name at the end of the URL"? – Chris L

2 Answers

3
votes

The regex below looks for mrsmith immediately after a / and immediately before either a . or the end of the line ($):

(?<=\/)mrsmith(?=\.|$)


Explanation:

  • (?<=\/) A positive lookbehind. It asserts that the current position is immediately after a / without including the / in the match.
  • mrsmith(?=\.|$) This matches the literal string mrsmith, then a positive lookahead checks that the next character is a dot or that the line ends there. Only when both conditions hold is mrsmith matched.
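As a rough illustration, the lookaround pattern above can be checked against the three sample URLs. Python is an assumption here (the question does not name a language), and the helper name `find_known_name` is hypothetical. Python's `re` module does not need `/` escaped, so it is written plainly.

```python
import re

def find_known_name(url, name="mrsmith"):
    """Return the name if it appears right after a '/' and right before
    a '.' or the end of the string; otherwise return None."""
    # (?<=/)    lookbehind: the previous character must be '/'
    # (?=\.|$)  lookahead: the next character is '.' or the string ends
    pattern = rf"(?<=/){re.escape(name)}(?=\.|$)"
    match = re.search(pattern, url)
    return match.group(0) if match else None

urls = [
    "http://www.somewebsite.com/artist/mrsmith",
    "http://mrsmith.somewebsite.com/",
    "http://mrsmith.somewebsite.com/portfolio/variablenames",
]
for url in urls:
    print(url, "->", find_known_name(url))
```

Note that this only confirms a name you already know; it cannot discover an arbitrary user name, which is what the update below addresses.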

Update:

Your regex would be,

(?=www\.).*\/\K.*|(?<=http:\/\/)[^\.]*

OR

(?=www\.).*\/\K.*|(?!www\.)(?<=http:\/\/)[^\.]*

It matches the string after the last / when the line contains www., or it matches the string after http:// up to the first . if www. is not present on that line.
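The patterns above rely on \K, which PCRE-style engines support but Python's built-in `re` module does not. As a hedged sketch of the same idea in Python, the two alternatives can be written as two capture-group patterns tried in order. This is a substitute technique, not the answer's exact regex, and the function name `extract_username` is hypothetical.

```python
import re

def extract_username(url):
    """Return the user name from one of the three URL shapes, or None."""
    # Case 1: the host starts with "www.", so take whatever follows the last "/".
    match = re.match(r"http://www\..*/(.*)", url)
    if match:
        return match.group(1)
    # Case 2: no "www.", so take the text after "http://" up to the first ".".
    match = re.match(r"http://([^.]*)", url)
    return match.group(1) if match else None

print(extract_username("http://www.somewebsite.com/artist/mrsmith"))
print(extract_username("http://mrsmith.somewebsite.com/"))
print(extract_username("http://mrsmith.somewebsite.com/portfolio/variablenames"))
```

Each of the three calls should print mrsmith. Like the original pattern, this assumes an http:// prefix and, for the www. form, no trailing slash after the user name.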


0
votes

The simplest regex would be:

http:\/\/(.*?)\..*\/(.*)

This captures two groups: the subdomain and the last path segment.

These are the matches for the three URLs above:

MATCH 1

  1. [7-10] www
  2. [34-41] mrsmith

MATCH 2

  1. [49-56] mrsmith
  2. [73-73] ``

MATCH 3

  1. [81-88] mrsmith
  2. [115-128] variablenames

Now you can choose which group is the user name: if the first group is www, then the second group is the name; otherwise the first group is.
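The group-selection rule can be sketched in a few lines. Python is an assumption here (the question does not name a language), and `pick_username` is a hypothetical helper name; the regex is the one from this answer with the unnecessary `/` escapes dropped.

```python
import re

# Group 1: the subdomain (text between "http://" and the first ".").
# Group 2: whatever follows the last "/" (may be empty).
URL_RE = re.compile(r"http://(.*?)\..*/(.*)")

def pick_username(url):
    """Return the user name, choosing the group based on the subdomain."""
    match = URL_RE.match(url)
    if match is None:
        return None
    subdomain, last_segment = match.groups()
    # If the subdomain is "www", the name is the last path segment;
    # otherwise the subdomain itself is the name.
    return last_segment if subdomain == "www" else subdomain

for url in (
    "http://www.somewebsite.com/artist/mrsmith",
    "http://mrsmith.somewebsite.com/",
    "http://mrsmith.somewebsite.com/portfolio/variablenames",
):
    print(pick_username(url))
```

All three sample URLs should yield mrsmith. A URL with no / after the host would not match this pattern at all, so the helper returns None in that case.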

Try it here: http://regex101.com/r/kE9bB4/1