I already have some boolean features (1 or 0), but I also have some categorical variables that need one-hot encoding (OHE) and some numeric variables that need to be imputed and scaled. I can add the categorical and numeric variables to a ColumnTransformer inside a pipeline, but how do I add the boolean features so that they are included in the model as well? I can't find any examples, or even a good phrase to search for. Any ideas?
Here is an example from sklearn that combines a numeric and a categorical pipeline. But what if some of my features are already in boolean form (1/0) and don't need preprocessing or one-hot encoding? How do I keep those features, i.e. add them to the pipeline alongside the numeric and categorical variables?
source: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html
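import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

titanic_url = ('https://raw.githubusercontent.com/amueller/scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')
data = pd.read_csv(titanic_url)

# Numeric columns: fill missing values with the median, then standardize.
numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

# Categorical columns: fill missing values with a constant, then one-hot encode.
categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Route each group of columns to its own transformer.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# Chain preprocessing and the classifier into one estimator.
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])

X = data.drop('survived', axis=1)
y = data['survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf.fit(X_train, y_train)
print("model score: %.3f" % clf.score(X_test, y_test))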
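
One possible way to do this, as a minimal sketch rather than a definitive answer: ColumnTransformer accepts the string 'passthrough' in place of a transformer, which forwards the listed columns unchanged. The example below uses a tiny made-up DataFrame instead of the Titanic data, and the boolean column names `has_cabin` and `is_alone` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data; 'has_cabin' and 'is_alone' are hypothetical 0/1 columns.
df = pd.DataFrame({
    'age': [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    'fare': [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
    'embarked': ['S', 'C', 'S', 'Q', 'S', 'Q'],
    'has_cabin': [0, 1, 0, 1, 1, 0],
    'is_alone': [0, 0, 1, 0, 1, 0],
    'survived': [0, 1, 1, 1, 0, 0],
})

numeric_features = ['age', 'fare']
categorical_features = ['embarked']
boolean_features = ['has_cabin', 'is_alone']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features),
        # 'passthrough' forwards these columns to the output untouched.
        ('bool', 'passthrough', boolean_features)])

X = df.drop('survived', axis=1)
y = df['survived']

# Inspect the preprocessed matrix: 2 numeric + 3 one-hot + 2 boolean columns.
Xt = preprocessor.fit_transform(X)
print(Xt.shape)

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])
clf.fit(X, y)
```

An alternative is to pass remainder='passthrough' to ColumnTransformer, which keeps every column not named in a transformer. That is shorter, but it also forwards any other unlisted columns (for the Titanic data, free-text columns such as the passenger name), so listing the boolean columns explicitly is usually the safer choice.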