0
votes

Here is a web page that contains a Flash video: http://nptel.iitm.ac.in/courses/106101007/1. Parsing the HTML shows that there is a span with id player whose href attribute contains the link to the video source. Once we have that link, it is trivial to automate the download with a tool such as wget.
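For illustration, the extraction step described above could look roughly like the following. This is only a minimal sketch using Python's standard `html.parser`; the sample markup and the `example.com` URL are made up, and the names `PlayerLinkParser` and `find_player_href` are hypothetical helpers, not anything the site provides.

```python
from html.parser import HTMLParser


class PlayerLinkParser(HTMLParser):
    """Remember the href attribute of the first element whose id is 'player'."""

    def __init__(self):
        super().__init__()
        self.video_url = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        attr_map = dict(attrs)
        if self.video_url is None and attr_map.get("id") == "player":
            self.video_url = attr_map.get("href")


def find_player_href(html_text):
    """Return the href of the 'player' element, or None if there is none."""
    parser = PlayerLinkParser()
    parser.feed(html_text)
    return parser.video_url


# Made-up sample markup (assumption); the real page may be structured differently.
sample = (
    '<html><body>'
    '<span id="player" href="http://example.com/lecture1.flv"></span>'
    '</body></html>'
)
print(find_player_href(sample))
```

The returned URL can then be handed to wget or any other HTTP downloader.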

However, most Flash-based video sharing sites, such as YouTube, Metacafe, and Dailymotion, do not expose such a link. Instead, there is a series of meta tags, and I guess the link is hidden somewhere within them. How do I download such videos? What process does software like a YouTube downloader follow to download a Flash-based video when the link to the actual resource is not mentioned anywhere?


1 Answer

2
votes

General video downloading can get complicated, as you're noticing. The issue is that web pages can embed JavaScript, a general-purpose programming language. That gives a site a practically unlimited number of ways to obscure the true video download URL.

You have observed that video download tools exist. Such tools need to be updated from time to time because they are engaged in an arms race with the maintainers of the video sites who generally wish to prevent people from downloading and saving content.

Sometimes, the video isn't even available via straight HTTP. Instead, it may require another protocol to stream (e.g., RTMP, RTSP, MMS). In those cases, another tool needs to be invoked.
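As a rough sketch of that dispatch step, a script might pick an external program based on the URL scheme. The mapping below is an assumption for illustration: it supposes that wget, rtmpdump, ffmpeg, and mplayer are installed, and `build_download_command` is a hypothetical helper name. It only builds the command line and does not run anything.

```python
from urllib.parse import urlparse


def build_download_command(url, output_path):
    """Return an argument list for a tool that can fetch the given URL.

    The choice of tool per protocol is an assumption, not a fixed rule.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme in ("http", "https"):
        # Plain HTTP(S): an ordinary downloader is enough.
        return ["wget", "-O", output_path, url]
    if scheme.startswith("rtmp"):
        # RTMP and its variants (rtmpe, rtmps, ...).
        return ["rtmpdump", "-r", url, "-o", output_path]
    if scheme == "rtsp":
        # Copy the stream without re-encoding.
        return ["ffmpeg", "-i", url, "-c", "copy", output_path]
    if scheme == "mms":
        # Dump the raw stream to a file.
        return ["mplayer", "-dumpstream", "-dumpfile", output_path, url]
    raise ValueError("no downloader known for scheme: " + repr(scheme))


# Made-up URL (assumption), shown only to illustrate the dispatch.
print(build_download_command("rtmp://media.example.com/app/clip", "clip.flv"))
```

The resulting list could be passed to `subprocess.run` once the stream URL is known.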

In order to develop such a tool, it often takes a bit of JavaScript reverse engineering as well as a protocol analyzer or web browser network analysis tool. A decade ago, I wrote a tool to download WMV videos from a now-defunct music video website. I did it by using a network protocol analyzer to watch the various URLs that the browser would send to the site. Then I wrote a tool that mimicked the same conversation. Modern tools operate similarly. When my tool derived the true streaming URL, it passed the URL to a separate tool that was able to download and save MMS streams. Any time that the site would update its little conversation protocol (typically every few months), my script would break and I would have to expend the effort to upgrade it if I cared enough.
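To make one step of that kind of conversation concrete, here is a minimal sketch that pulls a stream URL out of a configuration object embedded in a page's script. Everything site-specific here is invented for illustration: the `playerConfig` variable, the `streamUrl` key, and the sample page are assumptions, and a real site would need its own pattern, found by watching the browser's network traffic.

```python
import json
import re

# Hypothetical pattern: a script that assigns a JSON object to 'playerConfig'.
CONFIG_PATTERN = re.compile(r"var\s+playerConfig\s*=\s*(\{.*?\})\s*;", re.DOTALL)


def extract_stream_url(page_html):
    """Return the 'streamUrl' value from the embedded config, or None."""
    match = CONFIG_PATTERN.search(page_html)
    if match is None:
        return None
    config = json.loads(match.group(1))
    return config.get("streamUrl")


# Made-up page content (assumption) standing in for a downloaded HTML page.
sample_page = """
<html><head><script>
var playerConfig = {"title": "Demo", "streamUrl": "mms://media.example.com/clip.wmv"};
</script></head><body></body></html>
"""
print(extract_stream_url(sample_page))
```

When the site changes its markup or variable names, a pattern like this stops matching, which is the kind of breakage described above.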