
Train the model

import lightgbm as lgb

lgb_train = lgb.Dataset(x_train, y_train)
lgb_val = lgb.Dataset(x_test, y_test)

parameters = {
    'application': 'binary',
    'objective': 'binary',
    'metric': 'auc',
    'is_unbalance': 'true',
    'boosting': 'gbdt',
    'num_leaves': 31,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.5,
    'bagging_freq': 20,
    'learning_rate': 0.05,
    'verbose': 0
}

model = lgb.train(parameters,
                  lgb_train,
                  valid_sets=lgb_val,
                  num_boost_round=5000,
                  early_stopping_rounds=100)

y_pred = model.predict(x_test)


3 Answers


If you used the cut or qcut functions for binning and did not encode the result afterwards (one-hot encoding, label encoding, ...), this may be the cause of the error. Try applying an encoding.
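A minimal sketch of what this means, using a hypothetical `age` column: pd.cut produces a categorical column of Interval objects, which LightGBM cannot consume directly, so the bins are label-encoded to plain integers via cat.codes.

```python
import pandas as pd

# Hypothetical numeric column binned with pd.cut.
# The result is a categorical dtype of Interval objects.
df = pd.DataFrame({"age": [12, 25, 37, 48, 63]})
df["age_bin"] = pd.cut(df["age"], bins=[0, 18, 40, 100])

# Label-encode the bins so the column becomes plain integers
# that LightGBM can train on.
df["age_bin_encoded"] = df["age_bin"].cat.codes

print(df["age_bin_encoded"].tolist())  # [0, 1, 1, 2, 2]
```

One-hot encoding with pd.get_dummies would work as well; the point is that the Interval-typed column itself must not be passed to the model.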

I hope it works.


I had what might be the same problem.

Post the whole traceback to make sure.

For me it was a problem serializing to JSON, which LightGBM does under the hood to save the booster for later use.

Check your dataset for any date/datetime columns, or anything that remotely looks like a date, and either drop it or convert to something JSON can handle.
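A rough sketch of that check, with a hypothetical `signup` column: select the datetime-typed columns and convert each to an integer (days since the Unix epoch), which JSON can handle.

```python
import pandas as pd

# Hypothetical frame with a datetime column that would break
# JSON serialization of the trained booster.
df = pd.DataFrame({
    "signup": pd.to_datetime(["2020-01-01", "2020-06-15"]),
    "amount": [10.0, 20.0],
})

# Find datetime-like columns and convert each to an integer
# number of days since the epoch (or drop them instead).
for col in df.select_dtypes(include=["datetime64[ns]"]).columns:
    df[col] = (df[col] - pd.Timestamp("1970-01-01")).dt.days

print(df.dtypes)
```

Dropping the columns outright is equally valid if the dates carry no signal for the model.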

Mine had all been converted to categorical dtype by some poorly written Pandas code of mine, and I usually do the initial GBM run fairly quick-and-dirty to see which variables show up as important. LightGBM let me build the Datasets for training (it would have thrown an error before running anything if they had still been datetime or timedelta dtypes), ran the training just fine, and reported an AUC, then failed after the last training step while dumping the categoricals to JSON. It was maddening, with a cryptic traceback.

Hope this helps.


If you have a timedelta variable in the dataset, convert it to an int using the dt.days attribute. I faced the same issue; it is reported in LightGBM's GitHub issues.
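A small sketch of that conversion, using a hypothetical `duration` column produced by subtracting two date columns:

```python
import pandas as pd

# Hypothetical timedelta column from subtracting two dates.
df = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01", "2021-01-10"]),
    "end":   pd.to_datetime(["2021-01-05", "2021-01-20"]),
})
df["duration"] = df["end"] - df["start"]   # timedelta64 dtype

# Convert to a plain integer number of days before training.
df["duration"] = df["duration"].dt.days

print(df["duration"].tolist())  # [4, 10]
```

If sub-day resolution matters, dt.total_seconds() is an alternative that also yields a numeric column.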