
I'm trying to split my time series data into train and test sets, but I'm getting KeyError: 1 while running this code:

def prepare_data(data, lags=1):
    X, y = [], []
    for row in range(len(data) - lags - 1):
        a = data[row:(row + lags), 0]
        X.append(a)
        y.append(data[row + lags, 0])

    return np.array(X), np.array(y)     

# prepare the data
lags = 1
X_train, y_train = prepare_data(train, lags)
X_test, y_test = prepare_data(test, lags)
y_true = y_test     # due to naming convention

Error msg:

KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anacondaa3\envs\tf\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 1

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
      1 # prepare the data
      2 lags = 1
----> 3 X_train, y_train = prepare_data(train, lags)
      4 X_test, y_test = prepare_data(test, lags)
      5 y_true = y_test     # due to naming convention

in prepare_data(data, lags)
      4         a = data[row:(row + lags)]
      5         X.append(a)
----> 6         y.append(data[row + lags])
      7     return np.array(X), np.array(y)
      8

C:\ProgramData\Anacondaa3\envs\tf\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anacondaa3\envs\tf\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 1


1 Answer


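I guess the KeyError is raised because plain [] indexing on a DataFrame is label-based: data[row + lags] looks up the integer row + lags among the column labels (your traceback goes through self.columns.get_loc(key)), and there is no column labelled 1. It seems what you really intended was position-based access.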
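A minimal reproduction of the difference, assuming a toy DataFrame with a single column named "value" (a hypothetical stand-in for your train data): plain [] with an integer fails on the column lookup, while .iloc selects by row and column position.

```python
import pandas as pd

# Hypothetical stand-in for `train`: one column labelled "value"
df = pd.DataFrame({"value": [10.0, 11.0, 12.0, 13.0]})

# Plain [] on a DataFrame looks up *column labels*, so the integer 1
# is searched among the column names and is not found.
try:
    df[1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc is purely position-based: second row, first column
second_value = df.iloc[1, 0]
print(label_lookup_failed, second_value)
```

With a DatetimeIndex, which is typical for time series, .loc would expect dates as row labels, so .iloc is the natural choice for integer positions.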

Please check whether the following change to your code fixes the issue:

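import numpy as np

def prepare_data(data, lags=1):
    X, y = [], []
    index_of_y_column = 0    # change this if necessary
    # all other columns are used as inputs; if the target column is your only
    # column (a plain lagged series), use [index_of_y_column] here instead
    indexes_of_x_columns = [i for i in range(data.shape[1]) if i != index_of_y_column]
    for row in range(len(data) - lags - 1):
        # .iloc is position-based: rows row .. row + lags - 1 of the input columns
        a = data.iloc[row:(row + lags), indexes_of_x_columns].to_numpy()
        X.append(a)
        # target: the value in the row right after the window
        y.append(data.iloc[row + lags, index_of_y_column])
    return np.array(X), np.array(y)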
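Alternatively, the slicing data[row:(row + lags), 0] in the question looks like it was written for a 2-D NumPy array (rows, then column 0), so the original function should work unchanged if you convert the DataFrame first. A sketch, assuming a hypothetical one-column train frame:

```python
import numpy as np
import pandas as pd

def prepare_data(data, lags=1):
    # Function from the question: data[row:(row + lags), 0] is NumPy
    # 2-D indexing (row slice, column 0), not DataFrame indexing.
    X, y = [], []
    for row in range(len(data) - lags - 1):
        a = data[row:(row + lags), 0]
        X.append(a)
        y.append(data[row + lags, 0])
    return np.array(X), np.array(y)

# Hypothetical stand-in for the question's `train` DataFrame
train = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Convert to a 2-D NumPy array so the positional indexing works
X_train, y_train = prepare_data(train.to_numpy(), lags=1)
print(X_train.shape, y_train)
```

Converting once up front also avoids the per-row pandas indexing overhead inside the loop.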

By the way, if you just want to shift one or more columns, or the whole dataframe (in the latter case all data is shifted relative to the index), you can also use the shift() method.

E.g. data.shift(1) shifts your whole dataframe down by one row, so each row holds the values of the previous row (the first row becomes NaN).

data['targetcol_shifted']= data['targetcol'].shift(-1)

This fills a new column targetcol_shifted with the value of targetcol from the next row (the last row gets NaN). If you can use shift(), I would recommend it, because it is vectorized and much faster than looping over the rows by hand.
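A sketch of how the whole lag-window construction could be done with shift() instead of the loop, assuming a single pandas Series as input; make_lag_features is a hypothetical helper name, not a pandas function. Column lag_k holds the value k rows earlier, ordered oldest first so each row matches the window data[row:row + lags].

```python
import pandas as pd

def make_lag_features(series, lags=1):
    """Hypothetical helper: build lagged inputs X and targets y via shift()."""
    frame = pd.DataFrame({"y": series})
    lag_columns = [f"lag_{k}" for k in range(lags, 0, -1)]  # oldest lag first
    for k in range(lags, 0, -1):
        # value k rows earlier; the first k rows become NaN
        frame[f"lag_{k}"] = series.shift(k)
    # drop the leading rows that have incomplete lag windows
    frame = frame.dropna()
    X = frame[lag_columns].to_numpy()
    y = frame["y"].to_numpy()
    return X, y

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
X, y = make_lag_features(s, lags=2)
print(X)
print(y)
```

Unlike the loop, which stops at len(data) - lags - 1, this keeps the last possible window as well, so it returns one more sample.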