Binary logistic regression
For binary logistic regression the interpretation is straightforward: each input feature has a learned weight which, after training, increases or decreases the predicted probability.
Let's say you have 4 features and the output is either 0 or 1. Assume that after training, the coefficients for those features are, respectively:

[0.0, -2.2, 1.3, -0.45]
Here you can easily see that the 2nd feature (numbering from 0), with weight 1.3, contributes to a higher probability whenever that feature's value is greater than zero for a given example (in other words, feature 2 is positively correlated with the probability). On the other hand, the first feature (weight -2.2) is negatively correlated with the probability, while the zeroth feature, no matter its value, has no effect on the outcome.
You can get those coefficients/weights from clf.coef_, provided your LogisticRegression instance is named clf.
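As a minimal sketch of the idea (plain Python instead of scikit-learn, and assuming a bias of zero), you can verify how the signs of these weights move the probability:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x, bias=0.0):
    # P(y=1 | x) = sigmoid(w . x + bias)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

weights = [0.0, -2.2, 1.3, -0.45]

base = predict_proba(weights, [1.0, 1.0, 1.0, 1.0])
# Increasing feature 2 (weight 1.3) raises the probability of label 1,
# while increasing feature 1 (weight -2.2) lowers it; changing
# feature 0 (weight 0.0) leaves it unchanged.
more_f2 = predict_proba(weights, [1.0, 1.0, 2.0, 1.0])
more_f1 = predict_proba(weights, [1.0, 2.0, 1.0, 1.0])
```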
Multinomial logistic regression
In general, multinomial logistic regression has a matrix of weights, one row per class. Once again, let's assume you want to classify the input into one of 5 classes using 4 input features, and that the learned weight matrix looks like this:
[
[0.1, 2.2, -0.1, 0.133], # Features of class 0
[-2, -1.1, 0, 4.56],
[-0.1, 0, 0.3, 0.4],
[3.3, -2, 15, -9.4],
[0.45, 0.5, 0.66, 5.5],
]
Now you can apply the same ideas as above. Let's take how each of those 4 features contributes to the probability of outputting label 3, so we take that class's row:

[3.3, -2, 15, -9.4]

Here you can see that features 0 and 2 are positively correlated with the probability of outputting label 3, while features 1 and 3 are negatively correlated.
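A small sketch of this (plain Python, scores passed through a softmax, biases assumed zero) shows the effect of the class-3 row:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(W, x):
    # One score per class: score_k = W[k] . x, then softmax over classes.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return softmax(scores)

W = [
    [0.1, 2.2, -0.1, 0.133],  # Features of class 0
    [-2, -1.1, 0, 4.56],
    [-0.1, 0, 0.3, 0.4],
    [3.3, -2, 15, -9.4],
    [0.45, 0.5, 0.66, 5.5],
]

base = class_probs(W, [1.0, 1.0, 1.0, 1.0])
# Increasing feature 2 (weight 15 in row 3) raises P(label 3),
# while increasing feature 3 (weight -9.4) lowers it.
more_f2 = class_probs(W, [1.0, 1.0, 2.0, 1.0])
more_f3 = class_probs(W, [1.0, 1.0, 1.0, 2.0])
```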
Bias
Bias contributes prior knowledge. Suppose all weights are zero; then the prediction depends only on the bias. In the binary case there is a single bias term, so the model outputs a probability below 0.5 for a negative bias and above 0.5 for a positive one.
In the multinomial case there is one bias per class, but it works similarly: a class's bias shifts its probability up or down before any features are taken into account.
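A quick sketch of the binary case with all weights at zero (plain Python):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# With all weights zero, the prediction reduces to sigmoid(bias).
p_zero = sigmoid(0.0)   # exactly 0.5: no prior preference
p_neg = sigmoid(-2.0)   # below 0.5: negative bias favors label 0
p_pos = sigmoid(2.0)    # above 0.5: positive bias favors label 1
```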
Contribution of coefficients
You could normalize the weights into the [-1, 1] range: the largest-magnitude negative weight then has the biggest impact on pushing the probability toward zero, and the largest positive weight has the biggest impact on pushing it toward one.
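For instance, dividing by the largest absolute weight (a sketch, using the binary coefficients from above):

```python
weights = [0.0, -2.2, 1.3, -0.45]

# Scale by the largest absolute weight so every value lands in [-1, 1].
scale = max(abs(w) for w in weights)
normalized = [w / scale for w in weights]
# -2.2 maps to -1.0, the strongest pull toward probability zero;
# 1.3 maps to roughly 0.59, the strongest pull toward probability one.
```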