I am parsing RSS feeds with the following procedure:
The parser runs once, fetches all the RSS items, and stores the time of that run.
Then, every time it runs again, it checks whether each RSS item has a pubDate later than the last run time and, if so, stores the item in the database.
My problem is that for one specific website's feed, some items are added later than the previous ones but with the same pubDate, so my parser doesn't store them.
For example, at 9pm the feed has one item with <pubDate>Fri, 01 Mar 2013 05:00:00 Z</pubDate>,
and later, at 12pm, it adds another item with the same pubDate.
The feed does not offer a guid.
Is there any way to get the actual latest items?
Here is the code that I am using now:
function getLatest($lastTimeRun, $data, $pubDates)
{
    $latestData = array();
    for ($i = 0; $i < count($data); $i++)
    {
        // convert the item's pubDate string to a Unix timestamp
        $pubDates[$i] = strtotime($pubDates[$i]);
        // compare the last time the script ran with each feed item's publish date
        if ($lastTimeRun < $pubDates[$i])
        {
            // keep only items published after the last run
            array_push($latestData, $data[$i]);
        }
    }
    return $latestData;
}