wget recurses to the second-to-bottom level and goes no further. If I specify a bottom-level HTML file as the source, it parses it and follows the links in it. I think this may be caused by the PDF files linked from the HTML documents being under a different root path on the server. I need it to retrieve all the PDF files at the leaves of this hierarchy, since I am going to promote them together as part of a campaign for depression awareness.
I am using GNU Wget 1.19.4 built on linux-gnu.
I have tried --exclude, --exclude-directory, -l2, -l10, --continue and many other switches. I need to use the --include options, or wget grabs the entire site. If I use -np, it won't go "up" into /docs.
This command gets me the HTML files but does not follow links in the "bottom-most" HTML files.
wget --mirror --include docs/default-source/research-project-files --include about-us/research-projects/research-projects/ https://www.beyondblue.org.au/about-us/research-projects/research-projects/
This command, where I manually specify the HTML file, gets the PDF files I want that are linked from it.
wget --mirror --include docs/default-source/research-project-files --include about-us/research-projects/research-projects https://www.beyondblue.org.au/about-us/research-projects/research-projects/online-forums-user-research
I want it to visit all the HTML files in this branch, extract all the PDF links from them, and retrieve all the PDF files from /docs. Here is one of the HTML pages:
https://www.beyondblue.org.au/about-us/research-projects/research-projects/online-forums-user-research
Here is one of the PDFs. The /docs directory does not have a listing.
https://www.beyondblue.org.au/docs/default-source/research-project-files/online-forums-2015-report.pdf?sfvrsn=3d00adea_2
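For reference, the behaviour described above (visit each HTML page in the branch, pull out the PDF links, fetch them from /docs) could also be approximated in two passes, as a rough sketch rather than a fix for the recursion itself. The function name extract_pdf_links is hypothetical, and the sketch assumes the pages reference the PDFs in double-quoted href attributes, either root-relative or absolute; that has not been checked against the site.

```shell
# Hypothetical helper: print, one per line, every link into
# /docs/default-source/research-project-files/ found in the files under
# the directory given as $1 (for example, a local mirror of the HTML pages).
# Assumes double-quoted href="..." attributes; that is a guess about the markup.
extract_pdf_links() {
    grep -rhoE 'href="[^"]*/docs/default-source/research-project-files/[^"]*"' "$1" \
        | sed -e 's/^href="//' -e 's/"$//' \
        | sed -e 's/&amp;/\&/g' \
        | sed -e 's#^/#https://www.beyondblue.org.au/#' \
        | sort -u
}
```

Assuming the first command above has already mirrored the HTML pages into ./www.beyondblue.org.au, the second pass might look like this: extract_pdf_links www.beyondblue.org.au > pdf-urls.txt, followed by wget --input-file=pdf-urls.txt. Because the URL list is explicit, no recursion or --include filtering is needed for the PDF pass.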
The best I can get wget to do is walk the site and get HTML files down to this level:
https://www.beyondblue.org.au/about-us/research-projects/research-projects/online-forums-user-research
https://www.beyondblue.org.au/about-us/research-projects/research-projects/networks-of-advocacy-and-influence-peer-mentors-in-beyond-blue-s-mental-health-forums
...
There are 150 of them.
It seems like a depth-limiting setting or a path traversal limitation or something. I suspect it's an easy one to spot. Thanks again!