1
votes

I am trying to get the title, description, link, image and date of each item from this RSS feed: http://www.autoexpress.co.uk/car-news/feed/. I don't understand why, but the link tag and the src of the image tag are impossible to get, while the rest work fine. This is what I tried:

<?php
    include "testing3/lib/simple_html_dom.php";
    $url = 'http://www.autoexpress.co.uk/car-news/feed';
    $rss= file_get_html($url);
    $items = $rss->find('item');
    foreach ($items as $article) {
        $title[] = $article->find('title',0)->plaintext;
        $description[] = $article->find('description',0)->plaintext;
        $link[] = $article->find('link', 0)->plaintext;
        $image[] = $article->find('img', 0);
        $date[] = $article->find('pubDate', 0)->plaintext;
    }
    echo 'Title is '.$title[0].'<br>';
    echo 'Description is '.strip_tags(html_entity_decode($description[0])).'<br>';
    echo 'Link is '.$link[1].'<br>';
    echo 'Date is '.$date[1].'<br>';
    echo 'Image Source is '.$image[1];
?>

This is the output:

Title is Fiat Panda 4x4 Antarctica review - pictures
Description is Pictures See all 8 pictures 24 May, 2014
Link is
Date is Fri, 23 May 2014 16:29:39 +0000
Image Source is

With var_dump($link); I get an array of empty strings:

array(40) { [0]=> string(0) "" [1]=> string(0) "" [2]=> string(0) "" etc

var_dump($image) gives the same thing, except that the values are NULL. What am I doing wrong?


2 Answers

1
votes

Straight off the bat, that's a pretty nasty-looking RSS feed. My guess is your library isn't capable of dealing with nested/escaped RSS tags. Since no-one's got back to you in 40-odd minutes, here's the bog-standard approach:

            $rssfeed = simplexml_load_file('http://www.autoexpress.co.uk/car-news/feed');
            foreach ($rssfeed->channel as $channel) {

                echo '<ul>';
                foreach ($channel->item as $item) {
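                    // Close the opening <a> tag properly and wrap the title in the link.
                    echo '<li><a href="' . htmlentities($item->link) . '">';
                    echo htmlentities($item->title);
                    echo '</a>';
                    echo htmlentities($item->description);
                    // Likely prints nothing: the escaped <img> is nested inside the description, not a child of <item>.
                    echo htmlentities($item->img);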
                    echo htmlentities($item->pubDate);
                    echo '</li>';
                }
                echo '</ul>';
            }

Yup, that doesn't even use the library you've cited at the top of your excerpt, but it grabs the required content, escaped img tag included, even if it needs some serious clean-up afterwards.

Actually, I think this script fails on the img tag, but that's because the escaped img tag is nested inside the description rather than being a child of the item.
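Since the image only exists as escaped markup inside the description, one way to get at its src is to let SimpleXML decode the description and then parse that fragment as HTML. The following is a minimal sketch, not a tested solution for the real feed: the inline sample XML and the helper name first_image_src() are made up for illustration, and the actual feed's markup may differ.

```php
<?php
// Hypothetical sample item; the real feed's markup may differ.
$xml = <<<'XML'
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Example title</title>
      <link>http://example.com/article</link>
      <description>&lt;a href="http://example.com/article"&gt;&lt;img src="http://example.com/pic.jpg" alt=""&gt;&lt;/a&gt; Some text</description>
      <pubDate>Fri, 23 May 2014 16:29:39 +0000</pubDate>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: return the src of the first <img> in an HTML fragment, or null.
function first_image_src(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Feed descriptions are rarely valid HTML, so silence libxml warnings while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = $doc->getElementsByTagName('img');
    return $images->length > 0 ? $images->item(0)->getAttribute('src') : null;
}

$feed = simplexml_load_string($xml);
$items = [];
foreach ($feed->channel->item as $item) {
    // Casting to string yields the description with its entities decoded.
    $description = (string) $item->description;
    $items[] = [
        'title'       => (string) $item->title,
        'link'        => (string) $item->link,
        'image'       => first_image_src($description),
        'date'        => (string) $item->pubDate,
        'description' => trim(strip_tags($description)),
    ];
}

print_r($items[0]);
```

For the live feed, simplexml_load_string($xml) would be swapped for simplexml_load_file() with the feed URL, plus a check that it did not return false.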

2
votes

You simply can't parse "link" tags with PHP Simple HTML DOM Parser, for reasons unknown to me. I used this library too, and it never parsed those elements. You can make a simple HTML file with <link> elements, and they won't be parsed. However, if you change them to <link2> (or similar), the parser will start working instantly. I guess that "link" is a "reserved" word in this parser, or something.
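The renaming trick can be applied to the feed text before it reaches the parser. Here is a rough sketch of that idea; rename_link_tags() is a hypothetical helper, not part of the library, and the sample markup is invented. A plausible explanation for the behaviour, which I have not checked against every version of the library, is that it handles link as a self-closing HTML tag, so the URL ends up outside the element.

```php
<?php
// Hypothetical helper: rename bare <link> elements so the parser sees an ordinary tag.
function rename_link_tags(string $xml, string $newName = 'link2'): string
{
    // Matches only bare <link> and </link>; tags such as <atom:link href="..."/> are left alone.
    return preg_replace('#<(/?)link>#i', '<${1}' . $newName . '>', $xml);
}

// Invented sample markup standing in for one feed item.
$sample = '<item><title>Example</title><link>http://example.com/article</link></item>';
$renamed = rename_link_tags($sample);
echo $renamed, PHP_EOL;

// Only runs if simple_html_dom.php has been included beforehand.
if (function_exists('str_get_html')) {
    $dom = str_get_html($renamed);
    echo $dom->find('link2', 0)->plaintext, PHP_EOL;
}
```

For the real feed, the same helper would be applied to the result of file_get_contents() on the feed URL before handing the string to str_get_html(), in place of calling file_get_html() directly.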