0
votes

While working on a Kaggle churn prediction problem using Google BigQuery, I am running into the following issue.

Error seen after running the query:

"Logistic regression requires at least 2 unique labels and the label column had only 1 unique label".

BigQuery statement being used:

CREATE OR REPLACE MODEL `churndataset.mymodel`
OPTIONS(model_type = 'logistic_reg',
        labels = ['Churn'])
AS
SELECT
    * EXCEPT(customerID)
FROM `churndataset.Churn_table`
LIMIT 1000

The Churn column (the label) in the dataset contains only the values "Yes" and "No".

  1. Do I have to change the values to "0" or "1" instead of "Yes" or "No"?

  2. How can I make BigQuery understand that Churn has 2 unique labels, not one?

In the BigQuery table schema I can see that Churn is detected as a Boolean column.

Please help.

1
You have LIMIT 1000, so most likely the Churn column has only one value within those 1000 selected rows. - Mikhail Berlyant
Is this a public dataset? I guess it is because it's Kaggle. If you uploaded it to BigQuery, can you make it public? - Felipe Hoffa
Eliminating the LIMIT 1000 has solved the issue. Thanks Mikhail. - pubg fever

1 Answer

0
votes

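You'll have to eliminate the LIMIT 1000, which should resolve your issue. With the limit in place, the 1000 selected rows most likely contain only one value in the Churn column, so the training data ends up with a single label.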
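
As a rough sketch of how this could be checked and fixed, the two statements below first count the rows per Churn value across the whole table, then retrain without the LIMIT. The table, model, and column names are taken from the question. The `input_label_cols` option is an assumption on my part: it is the name used in the current BigQuery ML documentation, whereas the question used `labels`.

```sql
-- Step 1: check how many rows carry each label in the full table.
-- If this returns two rows, the table itself is fine and the
-- single-label error came from the subset picked by LIMIT 1000.
SELECT
  Churn,
  COUNT(*) AS row_count
FROM `churndataset.Churn_table`
GROUP BY Churn;

-- Step 2: retrain on all rows so both labels are present.
-- Assumption: input_label_cols is used here in place of the
-- question's labels option.
CREATE OR REPLACE MODEL `churndataset.mymodel`
OPTIONS(model_type = 'logistic_reg',
        input_label_cols = ['Churn'])
AS
SELECT
  * EXCEPT(customerID)
FROM `churndataset.Churn_table`;
```

LIMIT without ORDER BY returns an arbitrary subset of rows, so a limited query is not guaranteed to contain both labels even when the full table does.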