I am experimenting with a sample DataFrame:
import pandas as pd
data = [['Alex','USA',0],['Bob','India',1],['Clarke','SriLanka',0]]
df = pd.DataFrame(data, columns=['Name','Country','Target'])
From here, I used get_dummies to convert the string columns to indicator (0/1) columns:
column_names = ['Name','Country']
one_hot = pd.get_dummies(df[column_names])
After conversion the columns are: Name_Alex, Name_Bob, Name_Clarke, Country_India, Country_SriLanka, Country_USA
Slicing the data:
x = one_hot[["Name_Alex","Name_Bob","Name_Clarke","Country_India","Country_SriLanka","Country_USA"]].values
y = df['Target'].values
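A quick way to confirm the encoding is to print the generated column names and the array shapes. This is only a minimal sketch that rebuilds the sample data: it assumes the label column is named 'Target', and it passes dtype=int (my addition, not in the post) so the dummies come out as 0/1 integers instead of booleans on recent pandas versions.

```python
import pandas as pd

data = [['Alex', 'USA', 0], ['Bob', 'India', 1], ['Clarke', 'SriLanka', 0]]
df = pd.DataFrame(data, columns=['Name', 'Country', 'Target'])

# dtype=int is an assumption added here so the output is 0/1 rather than True/False
one_hot = pd.get_dummies(df[['Name', 'Country']], dtype=int)

encoded_columns = list(one_hot.columns)
x = one_hot.values          # feature matrix, one row per person
y = df['Target'].values     # labels taken from the original frame

print(encoded_columns)
print(x.shape, y.shape)
```

The features come from one_hot and the labels from df, since get_dummies returns a new frame and leaves df unchanged.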
Splitting the dataset into train and test sets:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.5, random_state=0)
Logistic Regression
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(x_train, y_train)
Now the model is trained.
For prediction, let's say I want to predict the "target" by giving a "Name" and a "Country", like ["Alex","USA"].
Prediction:
If I use this:
logreg.predict([["Alex","USA"]])
obviously it will not work, because the model was trained on the one-hot encoded columns and not on the raw strings.
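One possible way to pass raw strings at prediction time is to let scikit-learn do the encoding inside a pipeline, so the same categories are reused for training and prediction. Note that this swaps get_dummies for OneHotEncoder, which is a different technique from the one in the post. The sketch below is an illustration under assumptions: the label column is named 'Target', and the model is fitted on all three sample rows without a train/test split because the sample is so small.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

data = [['Alex', 'USA', 0], ['Bob', 'India', 1], ['Clarke', 'SriLanka', 0]]
df = pd.DataFrame(data, columns=['Name', 'Country', 'Target'])
feature_cols = ['Name', 'Country']

# The encoder learns the categories during fit and reuses them at predict time;
# handle_unknown='ignore' encodes unseen names or countries as all zeros.
model = make_pipeline(OneHotEncoder(handle_unknown='ignore'), LogisticRegression())

# Fitted on every row here (no split), which is an assumption for this tiny sample.
model.fit(df[feature_cols], df['Target'])

# A new sample uses the same raw column names as the training frame.
new_sample = pd.DataFrame([['Alex', 'USA']], columns=feature_cols)
prediction = model.predict(new_sample)
print(prediction)
```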
The model expects one value per encoded column, in this order: ["Name_Alex","Name_Bob","Name_Clarke","Country_India","Country_SriLanka","Country_USA"]. You will have to read your sample CSV file and shape it into an array with these columns, then call logreg.predict(my_array).
– Karl
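The suggestion in the comment can be sketched roughly as follows: encode the new sample with get_dummies, then reindex it to the training columns so that missing dummy columns are filled with 0. The helper encode_sample is hypothetical, the label column is assumed to be named 'Target', and the model is fitted on all three rows only to keep the example short.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = [['Alex', 'USA', 0], ['Bob', 'India', 1], ['Clarke', 'SriLanka', 0]]
df = pd.DataFrame(data, columns=['Name', 'Country', 'Target'])
feature_cols = ['Name', 'Country']

# dtype=int keeps every dummy column as 0/1 integers (an assumption added here)
x_train = pd.get_dummies(df[feature_cols], dtype=int)
train_columns = x_train.columns   # remember the exact columns and their order

logreg = LogisticRegression()
logreg.fit(x_train.values, df['Target'].values)

def encode_sample(name, country):
    """Hypothetical helper: one-hot encode one raw sample like the training data."""
    raw = pd.DataFrame([[name, country]], columns=feature_cols)
    # reindex adds the dummy columns this sample lacks (as 0) and drops unseen ones
    return pd.get_dummies(raw, dtype=int).reindex(columns=train_columns, fill_value=0)

my_array = encode_sample('Alex', 'USA').values
prediction = logreg.predict(my_array)
print(my_array, prediction)
```

With this approach a name or country that never appeared in training ends up as all zeros for that group of columns, so the prediction for such a sample should be treated with caution.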