0
votes

I am re-framing an existing question for simplicity. I have the following code to download Excel files from a company Share Point site.

import requests
import pandas as pd

def download_file(url):

    filename = url.split('/')[-1]

    r = requests.get(url)

    with open(filename, 'wb') as output_file:
    
    output_file.write(r.content)

df = pd.read_excel(r'O:\Procurement Planning\QA\VSAF_test_macro.xlsm')
df['Name'] = 'share_point_file_path_documentName' #i'm appending the sp file path to the document name
file = df['Name'] #I only need the file path column, I don't need the rest of the dataframe

# for loop for download
for url in file:
   download_file(url)

The downloads happen and I don't get any errors in Python, however when I try to open them I get an error from Excel saying Excel cannot open the file because the file format or extension is not valid. If I print the link in Jupyter Notebooks it does open correctly, the issue appears to be with the download.

1
I don't understand what this code is doing. You download an excel file from sharepoint. You convert it to a dataframe. You overwrite the Name column of the dataframe with a path. Then you loop over the column, downloading each path. But in the previous step, you set every value in that column to the same path, so why the loop?Nick ODell
Sorry @NickODell to be clear, I am exporting the Share Point list to Excel (which contains a hyper link to the form I want). The df[Name] column I am adding the share point file path to the front of the file name, not making it all the same, I know that is confusing I have added some clarification to my question.Alex Dowd

1 Answers

0
votes
  1. Check r.status_code. This must be 200 or you have the wrong url or no permission.
  2. Open the downloaded file in a text editor. It might be a HTML file (Office Online)
  3. If the URL contains a web=1 query parameter, remove it or replace it by web=0.