4
votes

I have a list of URLs which I would like to feed into wget using --input-file.

However, I can't work out how to set --output-document for each URL at the same time, which is simple when you issue the commands one by one. I would like to save each document under the MD5 of its URL.

 cat url-list.txt | xargs -P 4 wget

xargs is there because I also want to use its --max-procs option (-P) for parallel downloads.


4 Answers

4
votes

Don't use cat. You can have xargs read from a file. From the man page:

       --arg-file=file
       -a file
              Read items from file instead of standard input.  If you use this
              option, stdin remains unchanged when commands are  run.   Other‐
              wise, stdin is redirected from /dev/null.
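
To also get the per-URL output name, one option is to have xargs start a small shell for each URL. This is only a sketch, assuming GNU xargs (for -a and -d) and GNU md5sum; fetch_all is a made-up helper name, not part of any tool.

```shell
# fetch_all is a hypothetical helper name; it takes the URL list file as $1.
fetch_all() {
    # -a FILE      read URLs from the file instead of stdin (GNU xargs)
    # -d '\n'      one item per line, no quote or backslash processing (GNU xargs)
    # -n 1 -P 4    one URL per wget, up to four running at once
    xargs -a "$1" -d '\n' -n 1 -P 4 sh -c '
        url=$1
        # md5sum prints "<digest>  -"; keep only the 32 hex digits
        name=$(printf "%s" "$url" | md5sum | cut -c1-32)
        wget "$url" --output-document "$name"
    ' _
}
```

It would be called as fetch_all url-list.txt. The trailing _ only fills $0 of the inner shell so that the URL arrives as $1.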
2
votes

How about using a loop?

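while read -r line
do
   # hash the URL without a trailing newline and keep only the digest,
   # not the "  -" that md5sum appends when reading stdin
   md5=$(printf '%s' "$line" | md5sum | cut -d' ' -f1)
   wget ... "$line" ... --output-document "$md5" ......
done < url-list.txt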
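
The hashing step is easy to get subtly wrong, so it may help to isolate it. A minimal sketch, assuming GNU md5sum; url_md5 is a hypothetical helper name introduced here for illustration.

```shell
# url_md5 is a hypothetical helper: print the MD5 hex digest of its first argument.
url_md5() {
    # printf '%s' writes the URL with no trailing newline (echo would add one,
    # which changes the digest); cut drops the "  -" field md5sum prints for stdin.
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}
```

Inside the loop it could be used as wget "$line" --output-document "$(url_md5 "$line")". Because the result is a single word with no spaces, it is safe to pass as a file name.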
2
votes

In your question you use -P 4, which suggests you want your solution to run in parallel. GNU Parallel (http://www.gnu.org/software/parallel/) may help you:

cat url-list.txt | parallel 'wget {} --output-document "$(printf %s {} | md5sum | cut -c1-32)"'
1
votes

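You can do it like this (md5 is the BSD/macOS command; on Linux, use md5sum and strip its trailing file-name field):

cat url-list.txt | while read -r url; do wget "$url" -O "$(printf '%s' "$url" | md5)"; done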

Good luck.