In addition to Adam Davis's correct answer, I would like to post my own solution for this operation.
As list is something big, there is three of many differents tested solutions...
First prepare your TLD List in that way:
wget -O - https://publicsuffix.org/list/public_suffix_list.dat |
grep '^[^/]' |
tac > tld-list.txt
Note: tac
will reverse list to ensure testing .co.uk
before .uk
.
posix shell version
splitDom() {
local tld
while read tld;do
[ -z "${1##*.$tld}" ] &&
printf "%s : %s\n" $tld ${1%.$tld} && return
done <tld-list.txt
}
Tests:
splitDom super.duper.domain.co.uk
co.uk : super.duper.domain
splitDom super.duper.domain.com
com : super.duper.domain
In order to reduce forks (avoid myvar=$(function..)
syntax), I prefer to set variables instead of dump output to stdout, in bash functions:
tlds=($(<tld-list.txt))
splitDom() {
local tld
local -n result=${2:-domsplit}
for tld in ${tlds[@]};do
[ -z "${1##*.$tld}" ] &&
result=($tld ${1%.$tld}) && return
done
}
Then:
splitDom super.duper.domain.co.uk myvar
declare -p myvar
declare -a myvar=([0]="co.uk" [1]="super.duper.domain")
splitDom super.duper.domain.com
declare -p domsplit
declare -a domsplit=([0]="com" [1]="super.duper.domain")
Quicker bash version:
With same preparation, then:
declare -A TLDS='()'
while read tld ;do
if [ "${tld##*.}" = "$tld" ];then
TLDS[${tld##*.}]+="$tld"
else
TLDS[${tld##*.}]+="$tld|"
fi
done <tld-list.txt
This step is a significantly slower, but splitDom
function will become a lot quicker:
shopt -s extglob
splitDom() {
local domsub=${1%%.*(${TLDS[${1##*.}]%\|})}
local -n result=${2:-domsplit}
result=(${1#$domsub.} $domsub)
}
Tests on my raspberry-pi:
Both bash scripts was tested with:
for dom in dom.sub.example.{,{co,adm,com}.}{com,ac,de,uk};do
splitDom $dom myvar
printf "%-40s %-12s %s\n" $dom ${myvar[@]}
done
posix version was tested with a detailed for
loop, but
All test script produce same output:
dom.sub.example.com com dom.sub.example
dom.sub.example.ac ac dom.sub.example
dom.sub.example.de de dom.sub.example
dom.sub.example.uk uk dom.sub.example
dom.sub.example.co.com co.com dom.sub.example
dom.sub.example.co.ac ac dom.sub.example.co
dom.sub.example.co.de de dom.sub.example.co
dom.sub.example.co.uk co.uk dom.sub.example
dom.sub.example.adm.com com dom.sub.example.adm
dom.sub.example.adm.ac ac dom.sub.example.adm
dom.sub.example.adm.de de dom.sub.example.adm
dom.sub.example.adm.uk uk dom.sub.example.adm
dom.sub.example.com.com com dom.sub.example.com
dom.sub.example.com.ac com.ac dom.sub.example
dom.sub.example.com.de com.de dom.sub.example
dom.sub.example.com.uk uk dom.sub.example.com
Full script containing file read and splitDom
loop take ~2m with posix version, ~1m29s with first bash script based on $tlds
array, but ~22s
with last bash script based on $TLDS
associative array.
Posix version $tldS (array) $TLDS (associative array)
File read : 0.04164 0.55507 18.65262
Split loop : 114.34360 88.33438 3.38366
Total : 114.34360 88.88945 22.03628
So if populating associative array is a stonger job, splitDom
function become lot quicker!