This example on the sklearn website and this answer about sklearn pipelines on SO use and talk only about the .fit() or .fit_transform() methods in Pipelines.
But how do I use the .predict or .transform methods in Pipelines? Let's say I have pre-processed my training data, searched for the best hyper-parameters and trained a LightGBM model. I would now like to predict on new data. Instead of doing all of the aforementioned steps manually, I want to run them one after another, according to the definition:
Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.
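To make that definition concrete, here is a rough pure-Python sketch of how I read it. This is not scikit-learn's actual implementation; TinyPipeline, MeanCenter and SignClassifier are made-up names used only for illustration. The point is that fit calls fit and transform on every intermediate step, while predict only calls transform on them before handing the data to the final estimator.

```python
class MeanCenter:
    """Hypothetical transformer: learns the mean in fit, subtracts it in transform."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class SignClassifier:
    """Hypothetical final estimator: predicts 1 for positive inputs, else 0."""

    def fit(self, X, y=None):
        return self  # nothing to learn in this toy example

    def predict(self, X):
        return [1 if x > 0 else 0 for x in X]


class TinyPipeline:
    """Toy stand-in for a pipeline: intermediate transformers plus a final estimator."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        # Training: each intermediate step is fitted, then used to transform the data.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        # Prediction: intermediate steps are only applied, never refitted.
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


pipe = TinyPipeline([MeanCenter(), SignClassifier()])
pipe.fit([1.0, 2.0, 3.0])                       # learns mean_ = 2.0
new_predictions = pipe.predict([0.0, 2.5, 10.0])
```

Under this reading, the mean learned from the training data is reused unchanged when predicting on new data.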
But I only want to call the .transform methods on my validation (or test) data, along with some more functions (or classes) that take a pandas Series (or DataFrame, or numpy array) and return a processed one, and then finally call the .predict method of my LightGBM model, which would use the hyper-parameters I already have.
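A minimal sketch of that workflow with scikit-learn might look like the following. It rests on several assumptions: the data is synthetic, clip_outliers is a hypothetical stand-in for the custom preprocessing functions, and LogisticRegression stands in for the LightGBM model so the example needs only scikit-learn and numpy (something like lightgbm.LGBMClassifier(**best_params), with best_params being the hyper-parameters already found, could presumably take its place).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def clip_outliers(X):
    """Hypothetical stateless preprocessing step: clip values to [-3, 3]."""
    return np.clip(X, -3.0, 3.0)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_new = rng.normal(size=(5, 3))

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clip", FunctionTransformer(clip_outliers)),  # wraps a plain function as a step
    ("model", LogisticRegression(C=1.0)),          # stand-in for the LightGBM model
])

# Fit once on the training data: every step is fitted here.
pipe.fit(X_train, y_train)

# On new data, predict() applies transform() of each earlier step, then the
# final estimator's predict(); nothing is refitted.
preds = pipe.predict(X_new)

# Slicing off the final step gives the transform-only part of the pipeline.
X_new_processed = pipe[:-1].transform(X_new)

# The same result obtained by calling each fitted step manually.
manual = pipe.named_steps["model"].predict(
    clip_outliers(pipe.named_steps["scale"].transform(X_new))
)
```

The manual chain at the end is only there to show what pipe.predict is doing; in practice the single pipe.predict(X_new) call should be enough.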
I currently have nothing, since I don't know how to properly include methods of classes (like StandardScaler_instance.transform()) and other such methods.
How do I do this or what have I missed?