I'm trying to fit a logistic regression. I want to split the data into training and testing sets by account (a variable that plays no role in the fitting), and each account can have lots of variables. For example, 80% of the accounts should be training and the other 20% of the accounts should be testing.
I've tried the following, but this code just gives me a random 80% training / 20% testing split. The training data then contains some account, and the testing data also contains that exact same account, just with different variables. That's not what I want.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
Please advise. Thank you!
"each account can have lots of variables" - what does this mean? – Supratim Haldar
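One way to read the question is that every row carries an account ID and no account should end up on both sides of the split. A minimal sketch of that idea, using only the standard library: shuffle the unique accounts, hold out 20% of them, and assign each row by its account. The function name `split_by_account` and the sample `accounts` list are hypothetical, made up for illustration.

```python
import random

def split_by_account(accounts, test_size=0.20, seed=0):
    """Return (train_idx, test_idx) row indices such that no account
    appears in both sets. `accounts` holds one account ID per row."""
    unique_accounts = sorted(set(accounts))
    rng = random.Random(seed)          # seeded for a reproducible split
    rng.shuffle(unique_accounts)
    # The fraction applies to accounts, not to rows.
    n_test = max(1, round(len(unique_accounts) * test_size))
    test_accounts = set(unique_accounts[:n_test])
    train_idx = [i for i, a in enumerate(accounts) if a not in test_accounts]
    test_idx = [i for i, a in enumerate(accounts) if a in test_accounts]
    return train_idx, test_idx

# Hypothetical data: 10 accounts with 1 to 3 rows each (19 rows in total).
accounts = [a for a in range(10) for _ in range(a % 3 + 1)]
train_idx, test_idx = split_by_account(accounts, test_size=0.20, seed=0)

train_accounts = {accounts[i] for i in train_idx}
test_accounts = {accounts[i] for i in test_idx}
print(sorted(train_accounts), sorted(test_accounts))
```

If X and y are pandas objects, the index lists could then be used as `X.iloc[train_idx]`, `y.iloc[train_idx]`, and so on. Note that the row counts will only be roughly 80/20, because accounts differ in size. scikit-learn also ships `GroupShuffleSplit`, whose `split` method takes a `groups` argument and should give the same kind of account-level split without hand-written code.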