
I have a problem with adding my own features to sklearn.linear_model.LogisticRegression. But first, let's look at some example code:

from sklearn.linear_model import LogisticRegression, LinearRegression
import numpy as np

#Numbers are the class labels of the tags
resultsNER = np.array([1,2,3,4,5])

#According to resultsNER, each row belongs to a different class, so each
#could have its own features; but in this setup every row shares the same
#feature set
xNER = np.array([[1.,0.,0.,0.,-1.,1.],
                 [1.,0.,1.,0.,0.,1.],
                 [1.,1.,1.,1.,1.,1.],
                 [0.,0.,0.,0.,0.,0.],
                 [1.,1.,1.,0.,0.,0.]])

#Assign resultsNER to y
y = resultsNER
#Create LogReg
logit = LogisticRegression(C=1.0)
#Learn LogReg
logit.fit(xNER,y)

#Some test vector (one row) to check which class will be predicted
xPP = np.array([[1.,1.,1.,0.,0.,1.]])

#linear = LinearRegression()
#linear.fit(xNER, y)

print "expected: ", y
print "predicted:", logit.predict(xPP)
print "decision: ",logit.decision_function(xNER)
print logit.coef_
#print linear.predict(x)
print "params: ",logit.get_params(deep=True)

The code above is clear and easy. So I have some classes, which I call 1, 2, 3, 4, 5 (resultsNER); they correspond to classes like "date", "person", "organization", etc. For each class I build custom feature functions that return true or false, here encoded as ones and zeros. Example: if a token matches "(S|s)unday", it belongs to the date class. Mathematically it is clear: I take a token and test it against each class's features, then I look at which class has the maximum value of the sum of its features (that's why the features return numbers, not booleans) and pick that class. In other words, I use an argmax function. Of course, each feature in the sum has an alpha coefficient. Since this is multiclass classification, I need to know how to add multiclass features to sklearn.LogisticRegression.
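
To show what I mean, here is a rough sketch of my current argmax approach (the feature functions and alpha values below are simplified placeholders, not my real ones):

import re

#Placeholder per-class feature functions; each returns 1 or 0 for a token
featureFuncs = {
    1: [lambda tok: 1 if re.match(r"(S|s)unday$", tok) else 0],  #"date"-like class
    2: [lambda tok: 1 if tok.istitle() else 0],                  #"person"-like class
    3: [lambda tok: 1 if tok.isupper() else 0],                  #"organization"-like class
}

#Placeholder alpha coefficients, one per feature of each class
alphas = {1: [1.0], 2: [0.5], 3: [0.8]}

def classify(token):
    #Score each class by the alpha-weighted sum of its features, then argmax
    scores = {c: sum(a * f(token) for a, f in zip(alphas[c], funcs))
              for c, funcs in featureFuncs.items()}
    return max(scores, key=scores.get)

print(classify("Sunday"))  #picks class 1 here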

I need two things: the alpha coefficients, and a way to add my own features to Logistic Regression. The most important part for me is how to add my own feature functions for each class to sklearn.LogisticRegression.

I know I can compute the coefficients by gradient descent. But I think that when I call fit(x, y), LogisticRegression uses some algorithm to compute the coefficients, which I can then read from the .coef_ attribute.
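
For example, with my 5 classes and 6 features, the fitted model already exposes those learned weights:

print(logit.coef_.shape)  #(5, 6): one weight vector per class
print(logit.intercept_)   #one intercept per class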

So in the end my main question is: how do I add custom features for different classes, in my example the classes 1, 2, 3, 4, 5 (resultsNER)?

Your question is very unclear. Try rephrasing it, as it doesn't make sense the way it is. The learning algorithm (regression in this case) has little to do with feature extraction; read up on scikit-learn.org/stable/modules/feature_extraction.html. Your code as it stands learns a fit on a dataset containing 5 points, each representing one class, meaning it's quite over-fitted. The test vector you give logically resolves to class no. 3, which is what you'd expect, and not to the label array y. - Viktor Vojnovski

1 Answer


Not quite sure about your question, but here are a few things that might help you:

  • You can use the predict_proba function to estimate the probability of each class:

    >>> logit.predict_proba(xPP)
    array([[ 0.1756304 ,  0.22633999,  0.25149571,  0.10134168,  0.24519222]])
    
  • If you want features to have some weights (is this the thing you're calling alpha?), you do it not in the learning algorithm but in the preprocessing phase. In your case you can use an array of coefficients:

    >>> logit = LogisticRegression(C=1.0).fit(xNER,y)
    >>> logit.predict(xPP)
    array([3])
    >>> alpha = np.array([[0.2, 0.2, 1, 1, 0.3, 1]])
    >>> logit = LogisticRegression(C=1.0).fit(alpha*xNER,y)
    >>> logit.predict(alpha*xPP)
    array([2])
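
  • More generally, if by "adding your own features" you mean your per-class indicator functions, the usual sklearn pattern is to apply them in a preprocessing step that builds the feature matrix, and then let LogisticRegression learn one weight per class/feature pair; those learned weights in .coef_ play the role of your alpha coefficients. A rough sketch (the feature functions and data below are invented for illustration):

        import re
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Hypothetical 0/1 feature functions
        def is_sunday(tok): return 1. if re.match(r"(S|s)unday$", tok) else 0.
        def is_titlecase(tok): return 1. if tok.istitle() else 0.
        def is_all_caps(tok): return 1. if tok.isupper() else 0.

        features = [is_sunday, is_titlecase, is_all_caps]

        def featurize(tokens):
            # Apply every feature function to every token: one row per token
            return np.array([[f(t) for f in features] for t in tokens])

        tokens = ["Sunday", "ACME", "john", "Smith", "monday"]  # invented data
        labels = [1, 3, 2, 2, 1]                                # invented classes

        clf = LogisticRegression(C=1.0).fit(featurize(tokens), labels)
        print(clf.predict(featurize(["sunday"])))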