2
votes

I'm trying to create a job that downloads several files via HTTP. The list of these files is in a MySQL table. I created a main job with these entries in sequence: start, set variables, FILELIST (a transformation that I created), DOWNLOAD (a job that I created) and success.

The transformation FILELIST contains the following steps: table input and copy rows to result (this transformation queries the database and passes the list of URLs to the main job). The job DOWNLOAD contains the following entries: start, http, success (this job should download the files to my computer).

All this doesn't work, why? Does anybody know a better way to do the same thing?

1
Unfortunately there's not enough detail here to help you. At minimum, screenshots of your jobs and transformations would help make it clearer what you are doing. Also, you say "all this doesn't work". In what way? Do you get an error message? If so, what is it? - G Gordon Worley III

1 Answer

5
votes

I assume that you have basic knowledge of Kettle, so getting a list of something from a DB is probably not the issue. I guess you are stuck at having Kettle download and save all of those files, which effectively means running a loop.

The step for downloading a file is "HTTP" and it is only available in Jobs. So the trick is to have a Job (containing the HTTP step for the download) executed once for every file or, to use Kettle lingo, "executed for every row". The URL is passed down into the download Job as a parameter whose value is set from a field of the row.
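
To make the "executed for every row" idea concrete, here is a rough sketch of the same logic in plain Python rather than in Kettle. It is only an illustration of the loop, not what Kettle runs internally. The names `fetch_over_http`, `download_for_every_row` and the `"url"` field are made up for this example; the function stands in for the download Job and the `url` variable for the parameter set from the row's field.

```python
import os
import urllib.request
from urllib.parse import urlparse


def fetch_over_http(url):
    """Return the body of `url` as bytes (stands in for the HTTP download)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_for_every_row(rows, target_dir, fetch=fetch_over_http):
    """Run one download per row, the way the inner Job runs once per result row.

    `rows` is a list of dicts with a hypothetical "url" field, e.g. the
    result of the query against the table that holds the file list.
    """
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for row in rows:
        url = row["url"]  # the field whose value becomes the Job parameter
        # Derive a local file name from the last path segment of the URL.
        name = os.path.basename(urlparse(url).path) or "index.html"
        path = os.path.join(target_dir, name)
        with open(path, "wb") as out:
            out.write(fetch(url))
        saved.append(path)
    return saved
```

Called as `download_for_every_row([{"url": "http://example.com/a.txt"}], "downloads")`, it would save one file per row into `downloads`. The `fetch` argument is only there so the loop can be tried without network access.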

If this didn't help you, then check out the following link where I go into more detail how to accomplish that feat (it is kind of a feat - it shouldn't be one though):

http://www.joyofdata.de/blog/batch-downloading-files-with-pentaho-kettle/