0
votes

I am requesting a URL and getting a return in bytes. I want to store this in a data frame and then to CSV.

#Get Data from the CSV 
url = "someURL" 
req = requests.get(URL)
url_content = req.content
csv_file = open('test.txt', 'wb')
print(type(url_content))
print(url_content)
csv_file.write(url_content)
csv_file.close()

I tried many approaches, but couldn't find the solution. The above code is storing the output in CSV, but getting the below error. My end objective is to store this in CSV then send it to google cloud. And create a google big query table.

enter image description here

Output:

<class 'bytes'>

b'PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00[Content_Types].xml\xb5S\xcbn\xc20\x10\xfc\x95\xc8\xd76\xf4PU\x15\x81C\x1f\xc7\x16\xa9\xf4\x03\{\x93X\xf8%\xaf\xa1\xf0\xf7]\x078\x94R\x89\nq\xf2cfgfW\xf6d\xb6q\xb6ZCB\x13|\xc3\xc6|\xc4\xf0h\xe3\xbb\x86},^\xea{Va\x96^K\x1b<4\xcc\x076\x9bN\x16\xdb\x08XQ\xa9\xc7\x86\xf59\xc7\x07!P\xf5\xe0$\xf2\x10\xc1\x13\xd2\x86\xe4d\xa6c\xeaD\x94j);\x10\xb7\xa3\xd1\x9dP\xc1g\xf0\xb9\xceE\x83M'O\xd0\xca\x95\xcd\xd5\xe3\xee\xbeH7L\xc6h\x8d\x92\x99R\x89\xb5\xd7G\xa2\xf5^\x90'\xb0\x03\x07{\x13\xf1\x86\x08\xacz\xde\x90\xca\xae\x1bB\x91\x893\x1c\x8e\x0b\xcb\x99\xea\xdeh.\xc9h\xf8W\xb4\xd0\xb6F\x81\x0ej\xe5\xa8\x84CQ\xd5\xa0\xeb\x98\x88\x98\xb2\x81}\xce\xb9L\xf9U:\x12\x14D\x9e\x13\x8a\x82\xa4\xf9%\xde\x87\xb1\xa8\x90\xe0,\xc3B\xbc\xc8\xf1\xa8[\x8c\t\xa4\xc6\x1e ;\xcb\xb1\x97\t\xf4{N\xf4\x98~\x87\xd8X\xf1\x83p\xc5\x1cykOL\xa1\x04\x18\x90kN\x80V\xee\xa4\xf1\xa7\xdc\xbfBZ~\x86\xb0\xbc\x9e\x7fq\x18\xf6\x7f\xd9\x0f \x8aa\x19\x1fr\x88\xe1{O\xbf\x01PK\x07\x08z\x94\xcaq;\x01\x00\x00\x1c\x04\x00\x00PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00_rels/.rels\xad\x92\xc1j\xc30\x0c\x86_\xc5\xe8\xde8\xed`\x8cQ\xb7\x972\xe8m\x8c\xee\x014[ILb\xcb\xd8\xda\x96\xbd\xfd\xcc.[K\n\x1b\xec($}\xff\x07\xd2v?\x87I\xbdQ.\x9e\xa3\x81u\xd3\x82\xa2h\xd9\xf9\xd8\x1bx>=\xac\xee@\x15\xc1\xe8p\xe2H\x06"\xc3~\xb7}\xa2\t\xa5n\x94\xc1\xa7\xa2"\x16\x03\x83H\xba\xd7\xba\xd8\x81\x02\x96\x86\x13\xc5\xda\xe98\x07\x94Z\xe6^'\xb4#\xf6\xa47m{\xab\xf3O\x06\x9c3\xd5\xd1\x19\xc8G\xb7\x06u\xc2\xdc\x93\x18\x98'\xfd\xcey|a\x1e\x9b\x8a\xad\x8d\x8fD\xbf\t\xe5\xae\xf3\x96\x0el_\x03EY\xc8\xbe\x98\x00\xbd\xec\xb2\xf9vql\x1f3\xd7ML\xe9\xbfeh\x16\x8a\x8e\xdc*\xd5\x04\xca\xe2\xa9\3\xbaY0\xb2\x9c\xe9oJ\xd7\x8f\xa2\x03\t:\x14\xfc\xa2^\x08\xe9\xb3\x1f\xd8}\x02PK\x07\x08\xa7\x8cz\xbd\xe3\x00\x00\x00I\x02\x00\x00PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00docProps/app.xmlM\x8e\xc1\n\xc20\x10D\xef~E\xc8\xbd\xdd\xeaAD\xd2\x94\x82\x08\x9e\xecA? \xa4\xdb6\xd0lB\xb2J?

2

2 Answers

2
votes

The original URL (now edited out of the question) suggests that the downloaded file is in .xlsx format. The .xlsx format is essentially one or more xml files in a zip archive (iBug's answer is correct in this respect).

Therefore if you want to get the file's data in a dataframe, tell Pandas to read it as an excel file.

import pandas as pd

url = "someURL" 
req = requests.get(URL)
url_content = req.content

# Load into a dataframe
df = pd.read_excel(url_content)

# Write to csv
df.to_csv('data.csv')
2
votes

The initial bytes PK\x03\x04 suggest that it's PK Zip format. Try unzipping it first, either with unzip x <filename> or with Python builtin zipfile module.