0
votes

I am having an issue trying to use a code for converting a file into csv. I am using the code below as a start

directory = 'C:\OI Data'

filename = 'OpenInterest08-24-16'

data_xls = pd.read_excel(os.path.join(directory,filename), 'Sheet1',    index_col=None)
    data_xls.to_csv(os.path.join(directory,filename +'.csv'), encoding='utf-8')

and I am getting the following error:

Traceback (most recent call last):

File "", line 1, in

File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile execfile(filename, namespace)

File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile exec(compile(scripttext, filename, 'exec'), glob, loc)

File "C:/Users/Public/Documents/Python Scripts/work.py", line 26, in data_xls = pd.read_excel(os.path.join(directory,filename), 'Sheet1', index_col=None)

File "C:\Anaconda2\lib\site-packages\pandas\io\excel.py", line 170, in read_excel io = ExcelFile(io, engine=engine)

File "C:\Anaconda2\lib\site-packages\pandas\io\excel.py", line 227, in init self.book = xlrd.open_workbook(io)

File "C:\Anaconda2\lib\site-packages\xlrd__init__.py", line 441, in open_workbook ragged_rows=ragged_rows,

File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 91, in open_workbook_xls biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)

File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 1230, in getbof bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])

File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 1224, in bof_error raise XLRDError('Unsupported format, or corrupt file: ' + msg) xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\n\n\n\n\n '

I am struggling to figure out the file format I am using https://www.theice.com/marketdata/reports/icefuturesus/PreliminaryOpenInterest.shtml?futuresExcel=&tradeDate=8%2F24%2F16

opening the file myself I get the following enter image description here

I am still a beginner at python and some help would be much appreciated.

Thanks

1
Is your file an xls or xlsx file? pandas will choose the appropriate parser depending on the extension and there doesn't appear to be one... - Jon Clements♦
This is problematic: data_xls.to_csv(os.path.join(directory,filename,'.csv'), encoding='utf-8')... - mechanical_meat
Try instead: data_xls.to_csv(os.path.join(directory,filename+'.csv'), encoding='utf-8') - mechanical_meat

1 Answers

1
votes

You can start by fixing this part:

data_xls.to_csv(os.path.join(directory,filename,'.csv'), encoding='utf-8')

What happens when you do that is:

'C:\OI Data\\OpenInterest08-24-16\\.csv'

Which is not what you want. Instead do:

os.path.join(directory,filename+'.csv')

Which will give you:

'C:\OI Data\\OpenInterest08-24-16.csv'

Also, this is not a problem here, but in general be careful with this because a single backslash and a character can indicate an escape sequence, e.g. \n is a newline:

directory = 'C:\OI Data'

Instead escape the backslash like so:

directory = 'C:\\OI Data'