5
votes

I need to extract data from a given URL, such as the title, the description, and any videos or images on the page, the way the Facebook share button does.

Like this: http://www.facebook.com/sharer.php?u=http://www.wired.com&t=Test

regards

5 Answers

5
votes

Embed.ly has a nice API for exactly this purpose. Their API returns the site's oEmbed data if available; otherwise, it attempts to extract a summary of the page the way Facebook does.

4
votes

Use something like cURL to get the page and then something like Simple HTML DOM to parse it and extract the elements you want.
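To make the fetch-then-parse idea concrete, here is a minimal sketch. It swaps in Python's standard library (`urllib` and `html.parser`) for cURL and Simple HTML DOM, so it is not the exact tooling named above. The names `PreviewParser`, `extract_preview` and `fetch_preview` are made up for illustration, and real pages will need more care (relative image URLs, odd encodings, redirects).

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PreviewParser(HTMLParser):
    """Collects the <title> text, the meta description and <img> sources."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def extract_preview(html):
    """Parse an HTML string and return the pieces a link preview needs."""
    parser = PreviewParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "images": parser.images,
    }


def fetch_preview(url):
    """Download the page (network call) and hand the markup to the parser."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_preview(html)
```

Calling `fetch_preview("http://www.wired.com")` would then return a dictionary with `title`, `description` and `images` keys, assuming the site answers a plain request.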

2
votes

If the web site has support for oEmbed, that's easier and more robust than scraping HTML:

oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.

oEmbed is supported by sites like YouTube and Flickr.
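As a rough sketch of what a consumer request looks like: you send the page URL to the provider's oEmbed endpoint and get JSON back. The `url`, `format` and `maxwidth` parameters and the `type`, `title`, `thumbnail_url` and `html` response fields come from the oEmbed spec. The endpoint constant below is YouTube's as far as I know; check each provider's documentation for theirs. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: YouTube's oEmbed endpoint; other providers publish their own.
YOUTUBE_OEMBED = "https://www.youtube.com/oembed"


def build_oembed_url(endpoint, page_url, max_width=None):
    """Build the consumer request URL for a provider endpoint."""
    params = {"url": page_url, "format": "json"}
    if max_width is not None:
        params["maxwidth"] = max_width
    return endpoint + "?" + urlencode(params)


def parse_oembed(body):
    """Keep only the fields a link preview typically needs."""
    data = json.loads(body)
    return {key: data.get(key) for key in ("type", "title", "thumbnail_url", "html")}


def fetch_oembed(endpoint, page_url):
    """Perform the request (network call) and parse the JSON response."""
    with urlopen(build_oembed_url(endpoint, page_url), timeout=10) as response:
        return parse_oembed(response.read().decode("utf-8"))
```

For video and rich types, the `html` field carries the embed markup, so there is no need to hunt for player parameters in the page itself.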

1
votes

I am working on a project that addresses this problem. It is not as easy as writing an HTML parser and expecting sites to use semantic markup. Extracting videos and detecting auto-play parameters is especially painful. You can check out the project at http://www.embedify.me, which also includes a Facebook-style URL preview script. As far as I can tell, embed.ly and oEmbed are passive parsers: they need the sites (so-called providers) to support them, which is quite a different approach from what Facebook does.

-1
votes

While looking for similar functionality, I came across a jQuery + PHP demo of the URL extraction feature of Facebook messages: http://www.99points.info/2010/07/facebook-like-extracting-url-data-with-jquery-ajax-php/

Instead of using an HTML DOM parser, it works with simple regular expressions. It looks for the title, description and img tags. As a result, image extraction performs poorly on the many websites that set their images through CSS. Also, Facebook looks first at its own meta tags and only then at the classic HTML description tag, but the demo illustrates the principle well.
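The lookup order described above can be sketched as follows. This assumes that Facebook's "own meta tags" are the Open Graph properties (`og:title`, `og:description`, `og:image`) and falls back to `<title>`, the classic description tag and the first `<img>`. It uses Python's `html.parser` rather than regular expressions, and the names `MetaCollector` and `preview_with_fallback` are made up for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Gathers meta tags, the <title> text and the first <img> source."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self.first_image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph uses property="og:..."; classic tags use name="...".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta.setdefault(key.lower(), attrs["content"])
        elif tag == "title":
            self._in_title = True
        elif tag == "img" and self.first_image is None and attrs.get("src"):
            self.first_image = attrs["src"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def preview_with_fallback(html):
    """Prefer Open Graph values, then fall back to plain HTML tags."""
    collector = MetaCollector()
    collector.feed(html)
    meta = collector.meta
    return {
        "title": meta.get("og:title") or collector.title.strip(),
        "description": meta.get("og:description") or meta.get("description", ""),
        "image": meta.get("og:image") or collector.first_image,
    }
```

Images set through CSS are still invisible to this approach; it only narrows the gap on pages that publish Open Graph tags.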