I want to avoid using a SageMaker notebook and still preprocess my data before training, for example by simply converting it from CSV to protobuf format for the built-in models, as shown in the first link below.
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-preprocess-data-transform.html
The following example explains preprocessing with sklearn pipelines, using the SageMaker Python SDK.
What are the best practices if you only need format-type changes and don't need sklearn-style processing?
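
For concreteness, the kind of format-only step I have in mind looks roughly like the sketch below. It is plain Python with no sklearn, and it only splits a CSV into feature rows and labels. The function name `csv_to_features_and_labels`, the `label_column` and `has_header` parameters, and the sample data are all made up for illustration. I am assuming the resulting lists would then be passed to a protobuf writer such as `write_numpy_to_dense_tensor` from `sagemaker.amazon.common`, which is not called here.

```python
import csv
import io


def csv_to_features_and_labels(csv_text, label_column=0, has_header=True):
    """Parse CSV text into (features, labels), both as lists of floats.

    Hypothetical helper: it assumes every cell is numeric and that one
    column (label_column) holds the training label.
    """
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the header row if there is one
    features, labels = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row]
        labels.append(values.pop(label_column))  # pull the label out
        features.append(values)  # remaining cells are the features
    return features, labels


# Made-up sample data, only to show the shape of the result.
sample = "label,f1,f2\n1,0.5,2.0\n0,1.5,3.0\n"
features, labels = csv_to_features_and_labels(sample)

# Assumption: from here the lists would be turned into arrays and written
# out in protobuf format, e.g. with sagemaker.amazon.common's
# write_numpy_to_dense_tensor. That step is intentionally left out.
```

What I am unsure about is where a step like this should run if not in a notebook and not inside an sklearn pipeline.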