
I am trying to forecast my target variable using Quantile Regression in Python.

The data I am using for training and validation covers October 2015 through 31 December 2017.

Now that the model has been developed, I'm trying to forecast values for January 2018, which throws the following error:

ValueError: operands could not be broadcast together with shapes (34,) (33,)

mod = smf.quantreg('ASBCU_SUM~Month+Year+WeekofMonth+DayNum+isHoliday+PCOP_CS+PCOP_LS+PCOP_IFS+PCOP_LSS+PCOP_FSS+PCOP_FS+DayOfWeek_6+DayOfWeek_5+DayOfWeek_2+DayOfWeek_7+DayOfWeek_3+DayOfWeek_4',dfTrainingData)

res = mod.fit(q=0.8)

If I check, the error comes from the quantile_regression.py file inside statsmodels:

diff = np.max(np.abs(beta - beta0))

I have gone through similar posts on Stack Overflow, which recommend checking whether the target variable's data type is numeric. These are the dtypes of the variables:

ASBCU_SUM: int64

Month: category

Year: category

WeekofMonth: category

isHoliday: int64

DayNum: int32

PCOP_SUM: int64

PCOP_CS: int64

PCOP_LS: int64

PCOP_IFS: int64

PCOP_LSS: int64

PCOP_FS: int64

PCOP_FSS: int64

DayOfWeek_3: float64

DayOfWeek_2: float64

DayOfWeek_5: float64

DayOfWeek_7: float64

DayOfWeek_4: float64

DayOfWeek_6: float64

The data types are the same as when developing the model using the 2015-2017 data.

I would really appreciate any help.

Please provide sample rows of dfTrainingData that reproduce the error, and include a full code block with all import lines so we can run the code and data to help. - Parfait
Most likely this is github.com/statsmodels/statsmodels/issues/2597. Check whether the design matrix has full rank. - Josef
I'm running into a similar problem; any help on this would be great. - njBernstein

1 Answer


I encountered the same error before. Following @Josef's comment, I found that the input matrix was not full rank; after fixing the rank issue, the error went away. For example, if you run the code below:

import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
data = {'col_1': [3, 2, 1, 0], 'col_2': [0, 0, 0, 0],
        'y': [1,2,3,4]}
data = pd.DataFrame.from_dict(data)
data.head()
model =smf.quantreg("y ~ col_1 + col_2", data).fit()
print(model.summary())

The error will appear:

diff = np.max(np.abs(beta - beta0))
ValueError: operands could not be broadcast together with shapes (3,) (2,)

If you delete 'col_2', which is all zeros and therefore makes the design matrix rank deficient, the error goes away.
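To find which columns cause the problem before fitting, a rank check along the following lines may help. This is only a sketch: the helper name redundant_columns is hypothetical (not part of statsmodels), and X is built by hand here to mimic the design matrix for "y ~ col_1 + col_2" (an intercept plus the two predictors). For a real formula with categorical terms, the matrix could instead be obtained with patsy.dmatrices(formula, df, return_type="dataframe").

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"col_1": [3, 2, 1, 0],
                     "col_2": [0, 0, 0, 0],
                     "y": [1, 2, 3, 4]})

# Hand-built stand-in for the design matrix of "y ~ col_1 + col_2":
# an intercept column followed by the predictors (assumption for illustration).
X = data[["col_1", "col_2"]].copy()
X.insert(0, "Intercept", 1.0)

rank = np.linalg.matrix_rank(X.to_numpy(dtype=float))
print("rank:", rank, "columns:", X.shape[1])  # full rank requires rank == number of columns


def redundant_columns(X):
    """Hypothetical helper: list columns that do not increase the rank.

    Columns are added one at a time; a column is flagged when it is a
    linear combination of the columns kept so far.
    """
    kept, dropped = [], []
    for name in X.columns:
        candidate = kept + [name]
        sub_rank = np.linalg.matrix_rank(X[candidate].to_numpy(dtype=float))
        if sub_rank == len(candidate):
            kept.append(name)
        else:
            dropped.append(name)
    return dropped


print("redundant:", redundant_columns(X))
```

Which column gets flagged depends on column order: the helper reports the later member of a dependent set, so it is worth inspecting the flagged columns rather than dropping them blindly.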