0
votes

For some reason pandas is throwing an error when looking through some .csv stock data I have. Here is the error:

Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc return self._engine.get_loc(key) File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'Date'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "./python-for-finance-7.py", line 75, in compile_data() File "./python-for-finance-7.py", line 59, in compile_data df.set_index('Date', inplace=True) File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", >line 3909, in set_index level = frame[col]._values File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", >line 2688, in getitem return self._getitem_column(key) File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", >line 2695, in _getitem_column return self._get_item_cache(key) File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache values = self._data.get(item) File "/usr/local/lib/python3.7/site->packages/pandas/core/internals.py", line 4115, in get loc = self.items.get_loc(item) File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc> return self._engine.get_loc(self._maybe_cast_indexer(key)) File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'Date'

to this code:

import bs4 as bs
import datetime as dt
import os 
import pandas as pd
import pandas_datareader.data as web
import pickle
import requests

def compile_data():
    with open("sp500tickers.pickle","rb") as f:
        tickers = pickle.load(f)

    main_df = pd.DataFrame()

    for count,ticker in enumerate(tickers):
        df = pd.read_csv('stock_dfs/{}.csv'.format(ticker), 
delimiter=',', encoding="utf-8-sig")

        df.set_index('Date', inplace=True)

        df.rename(columns = {'Adj Close':ticker}, inplace=True)
        df.drop(['High','Low','Open','Close','Volume'], 1, inplace=True)

        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')

        print(count)

    print(main_df.head())
    main_df.to_csv('sp500_joined_closes.csv')

compile_data()

The data in the CSV files is arranged like this:

Date         High   Low  Open   Close   Volume   Adj. Close

yyyy-mm-dd   $$      $$   $$      $$      $$       $$

I tried changing the casing of Date (ie changing Date to date) but it just moves on to throw another

KeyError:"['High', 'Low', 'Open', 'Close', 'Volume'] not found in axis

Can someone please help??

2
If you add print(df.head()) right after df = pd.read_csv('stock_dfs/{}.csv'.format(ticker), delimiter=',', encoding="utf-8-sig"), what is the exact output?Peter Leimbigler
@PeterLeimbigler It seems like there is another column at the begining simply marking numbers 0,1,2, etc.... the output is: date open high low close volume` 0 2014-01-02 122.5646 123.4081 122.1740 122.6357 3053521 1 2014-01-03 122.8488 123.3015 122.5469 122.9198 2133336 2 2014-01-06 123.1861 123.6656 122.0586 122.1918 2712055 3 2014-01-07 122.0319 122.6978 121.7478 122.2095 2525529 4 2014-01-08 122.0586 122.2983 121.0465 121.3039 3122674majorshimo
Also sorry for the formatting, Im having trouble with it when writing commentsmajorshimo
Ok I see what the issue is, in some of the csv files the column names are capitalized while in others they are not. Is there any way to define the columns as either/or uppercase/lowercase and then standardize it to all lowercase or all uppercase?majorshimo
I just solved it! ended up printing the ticker at the end of each loop in order to see which tickers are the ones with lowercase instead of uppercase names. Deleted those csv files and re downloaded the data properly formatted :) thanks for your help though it gave me the idea that led me to thismajorshimo

2 Answers

0
votes

It looks like you're using the wrong delineator. The file is white-space delineated, not comma delineated.

Try using a whitespace delineator:

df = pd.read_csv('stock_dfs/{}.csv'.format(ticker), 
delimiter=r'\s+', encoding="utf-8-sig")
0
votes

In my case, I didn't have any entries when setting the index, the data frame was empty. It's worth checking

if len(df) > 0:

before setting the index