9
votes

I'm trying to parse an RSS feed from this link: http://www.gazetaexpress.com/rss.php?cid=1,13&part=rss but when I try displaying the results it gives me the following error:

Warning: DOMDocument::load() [domdocument.load]: Opening and ending tag mismatch: strong line 208 and description in http://www.gazetaexpress.com/rss.php?cid=1,13&part=rss, line: 209 in C:\wamp\www\gazetaExpress\scripts\reader.php on line 17

as well as

Warning: DOMDocument::load() [domdocument.load]: Premature end of data in tag rss line 2 in http://www.gazetaexpress.com/rss.php?cid=1,13&part=rss, line: 226 in C:\wamp\www\gazetaExpress\scripts\reader.php on line 17

The script that I'm using for parsing is:

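// The feed URL from the warnings above; load() accepts a filename or URL.
$xml = 'http://www.gazetaexpress.com/rss.php?cid=1,13&part=rss';
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);

$x = $xmlDoc->getElementsByTagName('item');

// Read at most 6 items, but never more than the feed actually contains.
for ($i = 0; $i < min(6, $x->length); $i++) {
    $item_title = $x->item($i)->getElementsByTagName('title')->item(0)->childNodes->item(0)->nodeValue;
    $item_link = $x->item($i)->getElementsByTagName('link')->item(0)->childNodes->item(0)->nodeValue;
    $item_desc = $x->item($i)->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;

    // and echo statements

}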

When I try some other RSS feed from this site (like sports: http://www.gazetaexpress.com/rss.php?cid=1,24&part=rss), it works fine. It's only the RSS feed above that won't work. Is there any way to get around this? Any help would be hugely appreciated.

2
The error is in the construction/authoring of the feed itself. There's nothing you can really do about it (unless you're the author of the feed). - Brian Driscoll
The best way would be to contact the site and inform them that their RSS feed is broken. Opera gives this error: XML parsing failed XML parsing failed: syntax error (Line: 209, Character: 159) Error: mismatched end-tag - h00ligan

2 Answers

9
votes

This is caused by <br> and other void tags that are written without a closing slash. The DOM parser expects such a tag to be closed within itself, as in <br/>, where <br opens the tag and /> closes it. Modern browsers have no problem with a bare <tag>, but PHP's DOM functions require well-formed XML, so you need to find all the <singletags> and replace them with <singletags />. Then it works just fine.
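
The replacement described above could be sketched roughly like this. The helper name closeVoidTags and its tag list (br, hr, img) are assumptions made for illustration, not part of any library, and the sample string merely stands in for the downloaded feed. Note that link is deliberately left out of the list, because <link> is a regular element in RSS. This only repairs void tags; a tag that is genuinely left open, like the <strong> mentioned in the warning, would need a different fix.

```php
<?php
// Hypothetical helper: rewrite unclosed void tags such as <br> as <br />.
function closeVoidTags(string $markup): string
{
    // Assumed tag list; adjust it to what the feed actually contains.
    // "link" is intentionally excluded: in RSS it is a normal element.
    $voidTags = 'br|hr|img';

    // Match <tag ...> only when the character before '>' is not '/'.
    return preg_replace(
        '#<(' . $voidTags . ')\b([^>]*?)\s*(?<!/)>#i',
        '<$1$2 />',
        $markup
    );
}

// Sample markup standing in for the feed body
// (in practice it could be fetched with file_get_contents($url)).
$raw = '<rss><channel><item><title>T</title>'
     . '<description>one<br>two</description></item></channel></rss>';

$xmlDoc = new DOMDocument();
$ok = $xmlDoc->loadXML(closeVoidTags($raw)); // loadXML() takes a string, load() a URL

$items = $xmlDoc->getElementsByTagName('item');
```

A regular expression is a blunt tool for markup, so this is best treated as a stopgap until the feed itself is fixed.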

6
votes

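When the fragment you want to parse does not conform to the XML spec (e.g. self-closing tags without '/' or unclosed tags) and it doesn't contain duplicate ids, you can try loadHTML, which is more permissive. Note that loadHTML expects the markup itself as a string; if $xml holds a URL, as it does with load(), use loadHTMLFile instead.

$xmlDoc->loadHTMLFile($xml);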
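
A minimal sketch of the permissive approach, assuming a small sample string in place of the real feed (the string and the variable names are illustrative only). It turns on libxml's internal error handling so the parser's complaints are collected instead of being printed as warnings. Because the HTML parser applies HTML rules to tag names it recognises (link, for example, is a void element in HTML), it is worth checking the values you get back rather than assuming they match what the XML parser would return.

```php
<?php
// Sample markup standing in for the feed: <br> is not self-closed
// and <strong> is never closed.
$markup = '<rss><channel><item><title>Sample title</title>'
        . '<description>one<br>two <strong>bold</description>'
        . '</item></channel></rss>';

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$ok = $doc->loadHTML($markup); // loadHTMLFile($url) reads from a file or URL instead

// Keep the list of problems the parser reported, then clear the buffer.
$problems = libxml_get_errors();
libxml_clear_errors();

$items = $doc->getElementsByTagName('item');
foreach ($items as $item) {
    $titles = $item->getElementsByTagName('title');
    // Guard against a missing element rather than calling item(0) blindly.
    if ($titles->length > 0) {
        echo $titles->item(0)->textContent, "\n";
    }
}
```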