Regex for getting the host of a website
I ran into a problem where we wanted to strip the URLs of different sites down to their root URL. On top of that, I also wanted to remove anything before the domain name: for example, from staging.domain.com I only want domain.com.
I ended up solving this with a regular expression:
/[^.]*\.([^.]{2,}|[^.]{2,3}\.[^.]{2})$/
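As a minimal sketch of how the pattern could be applied in JavaScript: the helper name rootDomainFromUrl is hypothetical, and it assumes the input is an absolute URL that the built-in URL class can parse (it throws otherwise).

```javascript
// The pattern from this post, unchanged.
const ROOT_DOMAIN = /[^.]*\.([^.]{2,}|[^.]{2,3}\.[^.]{2})$/;

// Hypothetical helper: takes a full URL and returns the root domain,
// or null when the hostname does not match the pattern.
function rootDomainFromUrl(url) {
  // hostname drops the scheme, port, path and query string.
  const host = new URL(url).hostname;
  const match = host.match(ROOT_DOMAIN);
  return match ? match[0] : null;
}

console.log(rootDomainFromUrl("https://staging.domain.com/some/path?x=1")); // domain.com
console.log(rootDomainFromUrl("http://www.domain.co.uk:8080/"));            // domain.co.uk
```

A hostname without a dot, such as localhost, does not match, so the sketch returns null in that case.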
Explaining from the last character of the regex to the first one:
$
: the end of the string, e.g. 'www.domain.co.uk'<-(here)
Inside the () brackets there are two alternatives; (a|b) matches either a or b:
[^.]{2,}
: two or more non-dot characters, so it matches anything like ‘com’, ‘uk’, ‘london’, etc.
[^.]{2,3}\.[^.]{2}
: two or three non-dot characters, a dot, then exactly two non-dot characters, so it matches anything like ‘co.uk’ or ‘com.us’, etc.
[^.]*\.
: zero or more non-dot characters followed by a dot, so it matches ‘domain.’ or ‘google.’
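To make the breakdown concrete, here is a small sketch that assembles the same pattern from named pieces. The variable names are purely illustrative; only the pattern itself comes from this post.

```javascript
// Each piece is written as a string, so backslashes are doubled.
const label = "[^.]*\\.";                 // e.g. 'domain.' or 'google.'
const simpleTld = "[^.]{2,}";             // e.g. 'com', 'uk', 'london'
const twoPartTld = "[^.]{2,3}\\.[^.]{2}"; // e.g. 'co.uk', 'com.us'

// label, then (simpleTld OR twoPartTld), then end of string.
const pattern = new RegExp(`${label}(${simpleTld}|${twoPartTld})$`);

// Prints the same source as the regex literal used in this post.
console.log(pattern.source);

const m = "www.domain.co.uk".match(pattern);
console.log(m[0]); // whole match: domain.co.uk
console.log(m[1]); // the bracketed group only: co.uk
```

The whole match (index 0) is the root domain; the bracketed group (index 1) holds just the suffix that was chosen.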
Some test domains are listed below; you can try them out on regex101.com (make sure you select JavaScript rather than PCRE on the right-hand side):
www.domain.com
www.domain.co.uk
www.domain.us
www.url2s.com
www.domain.london
www.url2s.travel
www.bbc.com
andigital.com
bbc.co.uk
www.domain.com.us
staging.su.ik.il.domain.org.uk
www.domain.xn----china
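If you would rather check the list in code than on regex101, a rough sketch like the following runs the pattern over every test domain above. The expected values are what I would want back for each one, and the script reports any mismatch; the names cases and results are just for illustration.

```javascript
const rootDomainPattern = /[^.]*\.([^.]{2,}|[^.]{2,3}\.[^.]{2})$/;

// [input, expected root domain]
const cases = [
  ["www.domain.com", "domain.com"],
  ["www.domain.co.uk", "domain.co.uk"],
  ["www.domain.us", "domain.us"],
  ["www.url2s.com", "url2s.com"],
  ["www.domain.london", "domain.london"],
  ["www.url2s.travel", "url2s.travel"],
  ["www.bbc.com", "bbc.com"],
  ["andigital.com", "andigital.com"],
  ["bbc.co.uk", "bbc.co.uk"],
  ["www.domain.com.us", "domain.com.us"],
  ["staging.su.ik.il.domain.org.uk", "domain.org.uk"],
  ["www.domain.xn----china", "domain.xn----china"],
];

// Collect the actual result for each input and flag mismatches.
const results = cases.map(([host, expected]) => {
  const match = host.match(rootDomainPattern);
  const actual = match ? match[0] : null;
  return { host, expected, actual, ok: actual === expected };
});

for (const r of results) {
  console.log(`${r.ok ? "ok  " : "FAIL"} ${r.host} -> ${r.actual}`);
}
```

One caveat worth knowing: the pattern is a heuristic, so a short name under a two-letter TLD is ambiguous. For instance 'www.bbc.uk' is returned whole, because 'bbc.uk' also fits the two-part suffix alternative.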