I am putting together a framework for classification with logistic regression. Can someone help me validate it and suggest the main library functions to use (from sklearn, for instance)? Here is what I came up with:
Run logistic regression from sklearn for N observations and M variables (M < N)
train set - about 80% of the total dataset
test set - the remaining 20%
Q: Is there a function that selects the test set as an extrapolation of the train set rather than as a random selection? (train_test_split does not do this by default.)
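To make this question concrete, here is a minimal sketch of what I mean by a non-random split. It assumes that "extrapolation" means the rows are already ordered (by time, for example) and that the test set should be the last 20%. The toy arrays are made up for illustration; `shuffle=False` and `TimeSeriesSplit` are existing sklearn options, but whether they fit my case is part of the question.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Toy data that is already in order (an assumption for illustration).
X = np.arange(100).reshape(-1, 1)
y = X.ravel() % 2

# shuffle=False keeps the row order: first 80% is train, last 20% is test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# TimeSeriesSplit yields several "train on earlier rows, test on later rows" folds.
tscv = TimeSeriesSplit(n_splits=4)
folds = [(train_idx, test_idx) for train_idx, test_idx in tscv.split(X)]
```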
Q: Is there a function that runs logistic regression with regularization? StandardScaler, maybe?
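From what I can tell, StandardScaler only rescales the features, while the regularization itself is controlled by the `C` parameter of LogisticRegression (smaller `C` means a stronger penalty). The sketch below is my attempt at combining the two. The synthetic data from `make_classification` and the particular `C` values are assumptions chosen only to show the effect.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Scale the features first, then fit; C is the inverse regularization strength.
weak = make_pipeline(StandardScaler(), LogisticRegression(C=100.0)).fit(X, y)
strong = make_pipeline(StandardScaler(), LogisticRegression(C=0.01)).fit(X, y)

# A stronger penalty should shrink the coefficients towards zero.
weak_norm = np.linalg.norm(weak.named_steps["logisticregression"].coef_)
strong_norm = np.linalg.norm(strong.named_steps["logisticregression"].coef_)
print(weak_norm, strong_norm)
```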
Once the logistic regression is fitted, how do we use the results?
Do we just use a decision boundary plot and decide about a new data point based on which side of the boundary it falls on?
I can get the coefficients, but what is the formula to calculate the target? Is it a linear polynomial passed through the sigmoid function? Is this the way to go?
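Here is a minimal sketch of the formula as I understand it for the binary case: a linear combination of the features plus the intercept, passed through the sigmoid, then thresholded at 0.5. The synthetic data is an assumption for illustration. The comparison against `predict_proba` is there to check that my reading of `coef_` and `intercept_` is right.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem with labels 0 and 1 (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

# Linear part: z = w . x + b
z = X @ clf.coef_.ravel() + clf.intercept_[0]

# Sigmoid turns z into the estimated probability of class 1.
p_manual = 1.0 / (1.0 + np.exp(-z))

# Predicted label: class 1 when the probability is at least 0.5.
labels_manual = (p_manual >= 0.5).astype(int)

# The same quantity as reported by sklearn, for comparison.
p_sklearn = clf.predict_proba(X)[:, 1]
```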
Is there a function to calculate the probability that our decision (Yes or No) is correct? I can get the accuracy, and hence the error rate, using the score method (as with KNeighborsClassifier). There is also the predict_proba method, but I am not sure how to interpret it. There is also a confusion matrix, and a probability can be calculated from its numbers. What is the right way?
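To show the three options side by side, here is a minimal sketch on synthetic data (an assumption for illustration). As far as I understand, `predict_proba` gives a per-observation probability for each class, while `score` and the confusion matrix summarize how often the predictions are right over a whole test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
clf = LogisticRegression().fit(X_train, y_train)

# One row per test point, one column per class (order given by clf.classes_).
proba = clf.predict_proba(X_test)

# Fraction of test points classified correctly.
accuracy = clf.score(X_test, y_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf.predict(X_test))

# Accuracy recovered from the confusion matrix: correct counts over all counts.
accuracy_from_cm = np.trace(cm) / cm.sum()
```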
Aside from logistic regression, there are other classifiers in use, such as:
KNeighborsClassifier, LDA, and others
What role do they play compared with logistic regression, and how should they be used?
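My current guess is that these are alternative classifiers to be compared on the same data, along the lines of the sketch below. It assumes that "LDA" refers to sklearn's LinearDiscriminantAnalysis, and it uses synthetic data and 5-fold cross-validation purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Candidate classifiers sharing the same fit / predict / score interface.
models = {
    "logistic": LogisticRegression(),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "lda": LinearDiscriminantAnalysis(),
}

# Mean 5-fold cross-validated accuracy for each model.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
print(mean_accuracy)
```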
Thank you