
I'm looking for some help here; I'm stuck. Below is my code and the error I'm getting. Thanks for your help.

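import pandas as pd
from numpy import concatenate
from pandas import DataFrame, concat
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):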
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# load dataset
dataset = pd.read_csv('newdf2.csv', header=0, index_col=0)
dataset = dataset.drop('Monthday.Key', axis = 1)
dataset.head()

values = dataset.values
# integer encode direction
encoder = LabelEncoder()
values[:,4] = encoder.fit_transform(values[:,4])
# ensure all data is float
values = values.astype('float32')

# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# frame as supervised learning
reframed = series_to_supervised(scaled, 1, 1)
# drop columns we don't want to predict
reframed.drop(reframed.columns[list(range(2, 25))], axis=1, inplace=True)
print(reframed.head())

# split into train and test sets
values = reframed.values
n_train_hours = round(len(dataset) *.7)
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

# Output: (799, 1, 22) (799,) (342, 1, 22) (342,)

# make a prediction with the trained model (its definition and training are not shown in this post)
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
inv_yhat = concatenate((yhat, test_X[:, 1:]), axis=1)

inv_yhat = scaler.inverse_transform(inv_yhat)
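The column counts in the script above can be traced without the data. This sketch assumes 23 feature columns remain after `Monthday.Key` is dropped (the scaler in the error message has 23 entries). It rebuilds the column names the way `series_to_supervised(scaled, 1, 1)` does, then applies the same drop and input/output split:

```python
# Assumption: 23 features remain after dropping 'Monthday.Key'
# (the scaler's min_ has shape (23,) in the traceback).
n_vars = 23

# Column names as series_to_supervised(scaled, 1, 1) builds them:
# one lagged copy (t-1) followed by the unshifted copy (t).
names = ['var%d(t-1)' % (j + 1) for j in range(n_vars)]
names += ['var%d(t)' % (j + 1) for j in range(n_vars)]

# The drop() call removes positions 2 through 24.
dropped = set(range(2, 25))
kept = [name for i, name in enumerate(names) if i not in dropped]

n_inputs = len(kept) - 1        # train_X / test_X: every column but the last
target = kept[-1]               # train_y / test_y: the last column
n_inverse = 1 + (n_inputs - 1)  # width of concatenate((yhat, test_X[:, 1:]))

print(len(names), len(kept), n_inputs, target, n_inverse)  # 46 23 22 var23(t) 22
```

Under that assumption, `concatenate((yhat, test_X[:, 1:]), axis=1)` produces 22 columns for a scaler fitted on 23. As written, the target is also `var23(t)`, which is the scaler's last column, so placing `yhat` in column 0 would apply the first feature's min and scale to it even if the widths matched.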

This is the error I'm getting:

ValueError                                Traceback (most recent call last)
in
----> 1 inv_yhat = scaler.inverse_transform(inv_yhat)

~\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in inverse_transform(self, X)
    383         X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
    384
--> 385         X -= self.min_
    386         X /= self.scale_
    387         return X

ValueError: operands could not be broadcast together with shapes (342,22) (23,) (342,22)
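The traceback shows what `inverse_transform` does: `X -= self.min_` followed by `X /= self.scale_`, where `min_` and `scale_` hold one entry per column the scaler was fitted on. A 22-column array cannot broadcast against 23-entry vectors, hence the ValueError. The sketch below reproduces this with random data, then inverts only the predicted column by applying those same two steps to that column alone. The data, the 23-column width, the target column index, and the helper `invert_column` are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative data: 100 rows, 23 features (matching the (23,) in the error).
rng = np.random.RandomState(0)
raw = rng.rand(100, 23) * 50.0

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(raw)
print(scaler.min_.shape, scaler.scale_.shape)  # one entry per fitted column

# Passing 22 columns to a scaler fitted on 23 fails, as in the question.
try:
    scaler.inverse_transform(scaled[:, :22])
except ValueError as exc:
    print('ValueError:', exc)


def invert_column(scaler, values, col):
    """Undo MinMax scaling for a single column (hypothetical helper).

    Mirrors inverse_transform's X -= min_; X /= scale_, restricted to `col`.
    """
    return (np.asarray(values, dtype=float) - scaler.min_[col]) / scaler.scale_[col]


# Pretend the model predicted the last feature perfectly.
target_col = 22
yhat = scaled[:, target_col:target_col + 1]  # shape (100, 1), like model.predict
inv_yhat = invert_column(scaler, yhat.ravel(), target_col)
print(np.allclose(inv_yhat, raw[:, target_col]))  # True
```

With the question's variables this would be `invert_column(scaler, yhat.ravel(), col)`, where `col` is the scaler column of the predicted variable. An alternative with the same effect is to write `yhat` into that column of a zero array with 23 columns, call `scaler.inverse_transform` on it, and read the column back.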

1 Answer

Though this may sound weird, it did help me fix the error.

If you are using Excel to edit the CSV training data file and you delete a column there, you can end up with a blank ",," value in your CSV data. That is what caused the issue for me. Hope it helps.

Removing the blank value, or making sure you don't edit the CSV data file manually, fixed the issue for me.
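One quick way to check for the stray empty column described above is to look, after loading, for columns that pandas names `Unnamed: N` or that contain no data at all. This is a minimal sketch that uses an inline CSV string in place of the real `newdf2.csv`. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV saved by a spreadsheet after a column was deleted:
# the trailing comma leaves an empty, unnamed column behind.
csv_text = "date,temp,wind,\n2019-01-01,3.5,12,\n2019-01-02,4.1,9,\n"

df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)
print(df.columns.tolist())

# Flag columns that have no header or contain no data at all.
suspect = [c for c in df.columns
           if str(c).startswith('Unnamed') or df[c].isna().all()]
print(suspect)

df = df.drop(columns=suspect)
print(df.shape)
```

An extra column like this changes `dataset.shape[1]`, and therefore the number of columns the scaler is fitted on, so checking the loaded shape against the expected feature count is a cheap sanity test.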