2
votes

I have fetched an HTML page using cURL into a string and loaded it up in a DOMDocument. There I can get all the img tags and their source attributes. My problem now is... how can I make these URLs absolute?

The list of URLs can contain all kinds of variants, for example:

  • foobar.jpg
  • http://example.com/foobar.jpg
  • /foobar.jpg
  • ../foobar.jpg
  • folder/foobar.jpg

If the HTML is fetched from an arbitrary URL, what is a safe way of converting these image URLs into absolute ones? Is there a way you can take the base tag into consideration too?


2 Answers

1
votes

Here is a great PHP example of how to do this.

function rel2abs($rel, $base) { 
// something
}
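As a rough sketch of how such a function could work (this is not the linked code; `resolveUrl` is a hypothetical name, and it assumes `$base` is an absolute http(s) URL with a host), the steps loosely follow the reference-resolution rules in RFC 3986: keep absolute URLs as they are, borrow the scheme for `//host/...` references, merge relative paths with the base directory, then remove `.` and `..` segments.

```php
<?php
// Hypothetical helper: resolve $rel against an absolute $base URL.
function resolveUrl(string $rel, string $base): string
{
    // A reference that starts with a scheme is already absolute.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $rel)) {
        return $rel;
    }

    $b = parse_url($base);
    $scheme = $b['scheme']; // assumption: $base is absolute
    $authority = $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';

    // Protocol-relative reference: only the scheme comes from the base.
    if (strpos($rel, '//') === 0) {
        return $scheme . ':' . $rel;
    }

    // Split the reference into its path and its query/fragment suffix.
    $cut = strcspn($rel, '?#');
    $relPath = (string) substr($rel, 0, $cut);
    $suffix = (string) substr($rel, $cut);

    if ($relPath === '') {
        // Query and/or fragment only: keep the base path.
        $path = $basePath;
        if (($suffix === '' || $suffix[0] === '#') && isset($b['query'])) {
            $suffix = '?' . $b['query'] . $suffix; // keep the base query too
        }
    } elseif ($relPath[0] === '/') {
        // Root-relative: replaces the base path entirely.
        $path = $relPath;
    } else {
        // Relative: append to the directory part of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relPath;
    }

    // Remove "." and ".." segments without climbing above the root.
    $out = [];
    $seg = '';
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    if ($seg === '.' || $seg === '..') {
        $out[] = ''; // a trailing dot segment leaves a trailing slash
    }

    return $scheme . '://' . $authority . implode('/', $out) . $suffix;
}
```

With `http://example.com/dir/page.html` as the base, this should cover all five variants from the question. User info (`user:pass@`) in the base URL is ignored in this sketch.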

More good examples:

1
votes

Here is a handy function found on this page:

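function absUrl($rel, $base) {
    // Already absolute: return unchanged.
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    // Empty reference, fragment, or query string: append to the base URL.
    if ($rel === '' || $rel[0] == '#' || $rel[0] == '?') return $base.$rel;
    $parts = parse_url($base);
    $scheme = $parts['scheme'];
    $host = $parts['host'] . (isset($parts['port']) ? ':'.$parts['port'] : '');
    // Protocol-relative URL (//host/path): only the scheme comes from the base.
    if (substr($rel, 0, 2) == '//') return $scheme.':'.$rel;
    // Drop the file name from the base path, keeping its directory.
    $path = preg_replace('#/[^/]*$#', '', isset($parts['path']) ? $parts['path'] : '');
    // A root-relative path replaces the base path entirely.
    if ($rel[0] == '/') $path = '';
    $abs = "$host$path/$rel";
    // Collapse "//" and "/./", then resolve "dir/../" until nothing changes.
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}
    return $scheme.'://'.$abs;
}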

$rel is the relative URL you want to resolve and $base is the absolute URL it is relative to, usually the URL of the page you fetched.
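To take the base tag into account as well, one possible approach is to look for a `<base href>` in the DOMDocument first and use it instead of the page URL. The sketch below is only an illustration: `absoluteImageUrls` is a hypothetical helper, and `$resolve` is assumed to be any callable with the `($rel, $base)` signature, such as `'absUrl'` above.

```php
<?php
// Hypothetical helper: collect absolute <img> URLs from an HTML string.
// $resolve is any callable taking ($rel, $base), for example 'absUrl'.
function absoluteImageUrls(string $html, string $pageUrl, callable $resolve): array
{
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // A <base href> overrides the page URL as the base for relative URLs.
    $base = $pageUrl;
    foreach ($doc->getElementsByTagName('base') as $tag) {
        if ($tag->hasAttribute('href')) {
            // The href may itself be relative to the page URL.
            $base = $resolve($tag->getAttribute('href'), $pageUrl);
            break; // only the first <base href> is used
        }
    }

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = $resolve($src, $base);
        }
    }
    return $urls;
}
```

Called as `absoluteImageUrls($html, $pageUrl, 'absUrl')`, where `$pageUrl` is the URL you passed to cURL. If cURL followed redirects, the final URL is probably the better choice for `$pageUrl`.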