2
votes

I have some HTML and want to replace the "src" attributes of all the img tags so that they point to copies of the identical images (although with different file names) on another host.

So for instance, given these three tags

<IMG SRC="../graphics/pumpkin.gif" ALT="pumpkin">
<IMG BORDER="5" SRC="redball.gif" ALT="*"> 
<img alt="cool image" src="http://www.crunch.com/pic.jpg"/>

I would like them replaced with

<IMG SRC="http://myhost.com/cache/img001.gif" ALT="pumpkin">
<IMG BORDER="5" SRC="http://myhost.com/cache/img002.gif" ALT="*"> 
<img alt="cool image" src="http://myhost.com/cache/img003.jpg"/>

I know there is some regexp magic to this, just not sure what it should look like (or if this is in fact the best way).

6
You should use an XML parser, not regex, for this :) - Colin Hebert
Friends don't let friends parse HTML with regular expressions. (I can't believe how many times a day I have to paste this link.) - Ether

6 Answers

5
votes

I tried doing this with SimpleHTMLDOM, and it seems to work:

$html = str_get_html( ... ); // what you have done

$map = array(
  "../graphics/pumpkin.gif"       => "http://myhost.com/cache/img001.gif",
  "redball.gif"                   => "http://myhost.com/cache/img002.gif",
  "http://www.crunch.com/pic.jpg" => "http://myhost.com/cache/img003.jpg",
);

foreach ($html->find("img") as $element) {
  if (isset($map[$element->src])) {
    $element->src = $map[$element->src];
  }
}

echo $html;

PS: If you need to clarify your question, you should edit your original question instead of opening a new, identical question.

4
votes

Since this is being asked on SO, you will most likely get a lot of answers telling you to use a parser instead. Guess what: I think that's the right answer. In PHP, you can use DOMDocument's loadHTML method to build a DOM tree from a given HTML document, then walk over it and modify the img tags as you go along.
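
A minimal sketch of that approach, assuming the three tags from the question and a hypothetical $map array (old src => new src) that you would build yourself:

```php
<?php
// Hypothetical lookup table: original src value => cached copy.
$map = [
    '../graphics/pumpkin.gif'       => 'http://myhost.com/cache/img001.gif',
    'redball.gif'                   => 'http://myhost.com/cache/img002.gif',
    'http://www.crunch.com/pic.jpg' => 'http://myhost.com/cache/img003.jpg',
];

// Sample input taken from the question.
$html = '<p><IMG SRC="../graphics/pumpkin.gif" ALT="pumpkin">'
      . '<IMG BORDER="5" SRC="redball.gif" ALT="*">'
      . '<img alt="cool image" src="http://www.crunch.com/pic.jpg"/></p>';

// Real-world HTML is rarely valid, so collect parser warnings
// instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// The HTML parser normalizes names to lowercase, so 'img' and 'src'
// also match <IMG SRC="...">.
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    if (isset($map[$src])) {
        $img->setAttribute('src', $map[$src]);
    }
}

$result = $doc->saveHTML();
echo $result;
```

Note that saveHTML() serializes a complete document, so the output gains a doctype and html/body wrappers, and tag and attribute names come out in lowercase.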

0
votes

You will need case-insensitive regex matching, and you'll also need to consider " vs ' quotes.

Hmm. I think I'd use System.Text.RegularExpressions.Regex.Replace with a MatchEvaluator delegate.

You'd need to make sure the quote matched, so you'd need an ORed check. Roughly:

<img\b[^>]*\bsrc='[^']*'|<img\b[^>]*\bsrc="[^"]*"
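
The answer above is phrased in .NET terms. For comparison, here is a rough sketch of the same replace-with-callback idea in PHP, the language most of the other answers use, via preg_replace_callback. The $map array is a hypothetical lookup table, and the pattern uses a backreference instead of an ORed alternative to make the closing quote match the opening one. It is still a regex, so it will miss unquoted attributes and can be fooled by attributes such as data-src or by markup inside comments.

```php
<?php
// Hypothetical lookup table: original src value => cached copy.
$map = [
    '../graphics/pumpkin.gif'       => 'http://myhost.com/cache/img001.gif',
    'redball.gif'                   => 'http://myhost.com/cache/img002.gif',
    'http://www.crunch.com/pic.jpg' => 'http://myhost.com/cache/img003.jpg',
];

// Sample input taken from the question.
$html = '<IMG SRC="../graphics/pumpkin.gif" ALT="pumpkin">' . "\n"
      . '<IMG BORDER="5" SRC="redball.gif" ALT="*">' . "\n"
      . '<img alt="cool image" src="http://www.crunch.com/pic.jpg"/>';

// Group 1: everything from "<img" up to the opening quote.
// Group 2: the quote character, reused as \2 so the closing quote matches.
// Group 3: the current src value.
// The /i flag makes the match case-insensitive.
$pattern = '/(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2/i';

$result = preg_replace_callback($pattern, function (array $m) use ($map) {
    $src = $m[3];
    // Leave unknown images untouched.
    $new = isset($map[$src]) ? $map[$src] : $src;
    return $m[1] . $m[2] . $new . $m[2];
}, $html);

echo $result;
```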

0
votes

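Using jQuery, you can loop over all the images like this:

$("img").each(function () {
  var $this = $(this); // wrap the current element so .attr() is available
  if ($this.attr('src') == "../graphics/pumpkin.gif") {
    $this.attr('src', 'http://myhost.com/cache/img001.gif');
  } // else if ... (handle the remaining images the same way)
});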

0
votes

Just run over all images in the document and get/set the src attribute.

var images = document.getElementsByTagName('img');
for (var i = 0; i < images.length; i++)
{
   images[i].getAttribute("src"); // do something with it
   images[i].setAttribute("src", some_new_value); // set new src (some_new_value is a placeholder)
}

As many have already said, you don't need RegExp for this.

0
votes

You can use phpQuery to do this.

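foreach (pq("img") as $img) {
  // insert regexp magic here to work out $newurl
  // iterating yields plain DOM nodes, so wrap each one in pq() before calling attr()
  pq($img)->attr('src', $newurl);
}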

Quite possibly overkill, but it works, and it will feel familiar to people used to working with jQuery.