0
votes

I need to clean up Google News links in pages dynamically and extract the actual links to the content.

A Google News link looks like this:

http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGkF58EwDE7aA742GfVP9aE8azmhg&url=http://www.reuters.com/article/2012/01/15/us-obama-mlk-idUSTRE80E0PD20120115

I want to keep only the actual link, which is everything after &url=:

http://www.reuters.com/article/2012/01/15/us-obama-mlk-idUSTRE80E0PD20120115

I need to use preg_match/preg_replace to strip the "non-essential" part of the URL, in essence everything starting with http://news.google.com and ending with &url=:

http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGkF58EwDE7aA742GfVP9aE8azmhg&url=

As you can probably tell, I'm no regex expert. :)

Thanks a lot!


2 Answers

2
votes

You could use preg_replace with ~http://news\.google\.com.*?&url=~, replacing with ''.

Or, you could use preg_match with ~&url=(.*)$~ and pull out the first capture group (index 1 of the matches array).
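
Both approaches could be sketched roughly like this. It is a minimal example, not a drop-in solution: the sample link is the one from the question, and the variable names ($link, $clean, $target) are just placeholders for illustration.

```php
<?php
// Sample link copied from the question.
$link = 'http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGkF58EwDE7aA742GfVP9aE8azmhg&url=http://www.reuters.com/article/2012/01/15/us-obama-mlk-idUSTRE80E0PD20120115';

// Option 1: remove the Google News prefix, up to and including "&url=".
// The lazy ".*?" stops at the first "&url=" it finds.
$clean = preg_replace('~http://news\.google\.com.*?&url=~', '', $link);

// Option 2: capture everything after "&url=" and read capture group 1.
$target = null;
if (preg_match('~&url=(.*)$~', $link, $matches)) {
    $target = $matches[1];
}

echo $clean, "\n";
echo $target, "\n";
```

Because the preg_replace pattern is not anchored, it should also work when run over a whole page of HTML containing several such links, as long as each link sits on one line.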

1
votes

If I've understood you correctly, you just want the part after &url=, so this can be solved with a simple regex like &url=(.*)$. If other GET parameters can follow url, use &url=([^&]*) instead, which stops at the next &; a greedy &url=(.*)& would run on to the last & in the link.
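
For the case where more parameters follow url, one possible sketch uses a negated character class so the match stops at the next &. The helper name extractTargetUrl and the trailing cid parameter are made up for illustration, and the rawurldecode call assumes the target may arrive percent-encoded, which the question does not confirm.

```php
<?php
/**
 * Hypothetical helper: returns the value of the "url" query parameter,
 * or null when the link has no such parameter.
 */
function extractTargetUrl(string $link): ?string
{
    // [^&]* matches up to (not including) the next "&",
    // so any parameters after url are left out of the capture.
    if (preg_match('~[?&]url=([^&]*)~', $link, $matches)) {
        // Assumption: the target might be percent-encoded; decoding
        // leaves an unencoded URL like the one in the question unchanged.
        return rawurldecode($matches[1]);
    }
    return null;
}

$plain = 'http://news.google.com/news/url?sa=t&fd=R&url=http://www.reuters.com/article/2012/01/15/us-obama-mlk-idUSTRE80E0PD20120115';
$withExtra = $plain . '&cid=123'; // "cid" is an invented trailing parameter

echo extractTargetUrl($plain), "\n";
echo extractTargetUrl($withExtra), "\n";
```

Both calls should print the same Reuters link. Note that this only behaves well if the target URL itself contains no unencoded &.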

I recommend Rubular for trying out and experimenting with regexes, although it is Ruby-based.