
I'm attempting to download the html using wget for this website:

https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html#Footprints|filterText%3D%24filterTypes%3D|query_string=&posfilename=&poslocalname=&inst=ACIS-S&inst=ACIS-I&inst=HRC-S&inst=HRC-I&RA=210.905648&Dec=39.609177&Radius=0.0006&Obsids=&preview=1&output_size=256&cutout_size=12.8|ra=&dec=&sr=&level=&image=&inst=ACIS-S%2CACIS-I%2CHRC-S%2CHRC-I&ds=

It is a version of the main website:

https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html

The only difference is that the first link takes you to a version of the page that has already searched a database and displays the results in a table. But when I use wget to download the HTML for the longer link, it gives me exactly the same text as for the main (short) link. I'm confused, but maybe I just don't understand enough about HTML. I thought the two should be slightly different, with the longer link's HTML including the database results.

I also used the --mirror option to download all the necessary files, but they all look the same, too. I tried cURL as well, with the same result. Can someone please explain why this is happening and whether it's fixable?


1 Answer


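The problem is that the results table is not part of the HTML that the server sends. Everything after the # in the longer link is a URL fragment, which is never sent to the server: wget and curl request only https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html, so both links return the same file. In a browser, the page's JavaScript almost certainly reads the fragment, runs the search, and builds the table after the page has loaded. wget and curl do not run JavaScript, so they never see that table. The --mirror option will download the static files the page links to, but it's not going to give you what you want, and grepping the downloaded HTML for the table will not find it. To fix this, you need either a tool that runs the page's JavaScript (a headless browser, for example) or the request that the page itself makes to fetch the results, which you can look for in the Network tab of your browser's developer tools and then call directly with wget or curl.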
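
To see why both links return the same file, here is a minimal Python sketch that splits the longer link into the part a client sends to the server and the fragment that stays on the client. It makes no network request. The function names are my own, and treating the third '|'-separated section of the fragment as the search form is only a guess based on how the link looks, not something taken from the site's code.

```python
from urllib.parse import urlsplit, parse_qs

# The longer link from the question, split across lines for readability.
LONG_URL = (
    "https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html"
    "#Footprints|filterText%3D%24filterTypes%3D"
    "|query_string=&posfilename=&poslocalname="
    "&inst=ACIS-S&inst=ACIS-I&inst=HRC-S&inst=HRC-I"
    "&RA=210.905648&Dec=39.609177&Radius=0.0006&Obsids="
    "&preview=1&output_size=256&cutout_size=12.8"
    "|ra=&dec=&sr=&level=&image=&inst=ACIS-S%2CACIS-I%2CHRC-S%2CHRC-I&ds="
)


def request_target(url):
    """Return the URL without its fragment, i.e. what an HTTP client requests."""
    parts = urlsplit(url)
    return parts._replace(fragment="").geturl()


def fragment_params(url):
    """Parse the search parameters carried in the fragment.

    Assumption: the fragment is a '|'-separated list and its third
    section holds the search form as key=value pairs.
    """
    sections = urlsplit(url).fragment.split("|")
    return parse_qs(sections[2], keep_blank_values=True)


if __name__ == "__main__":
    print("Sent to the server:", request_target(LONG_URL))
    params = fragment_params(LONG_URL)
    print("Kept on the client: RA =", params["RA"], "Dec =", params["Dec"])
```

The first value printed is the short link, which is why wget and curl get identical HTML for both. The search position only exists in the fragment, so it has to be passed to whatever request actually returns the results.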