When should you use StringIndexer alone vs. StringIndexer + OneHotEncoder?
Looking at the docs for Spark ML's StringIndexer (https://spark.apache.org/docs/latest/ml-features#stringindexer) and OneHotEncoder (https://spark.apache.org/docs/latest/ml-features#onehotencoder), it's not obvious to me when to use just a StringIndexer and when to use a StringIndexer followed by a OneHotEncoder. I've been using only a StringIndexer on a benchmarking dataset and getting pretty good results as is, but I suppose that doesn't mean this approach is necessarily "correct". The OneHotEncoder docs refer to a StringIndexer > OneHotEncoder > VectorAssembler staging pipeline, but the way it is worded makes that seem optional (vs. just doing StringIndexer > VectorAssembler).
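To make the comparison concrete, here is a rough plain-Python sketch of what I understand each option hands to VectorAssembler. This is not Spark code and not Spark's implementation. The `colors` values and the helper names `string_index` and `one_hot` are made up for illustration, and the sketch only mimics the documented defaults as I read them (StringIndexer ordering labels by descending frequency, OneHotEncoder dropping the last category).

```python
from collections import Counter


def string_index(values):
    """Mimic StringIndexer's default ordering: most frequent label gets 0.0.

    Ties are broken alphabetically here (an assumption for determinism).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: float(i) for i, label in enumerate(ordered)}
    return [mapping[v] for v in values], mapping


def one_hot(indices, num_categories, drop_last=True):
    """Mimic OneHotEncoder: one slot per category, minus the last if drop_last."""
    size = num_categories - 1 if drop_last else num_categories
    vectors = []
    for idx in indices:
        vec = [0.0] * size
        if int(idx) < size:  # the dropped last category stays all zeros
            vec[int(idx)] = 1.0
        vectors.append(vec)
    return vectors


# Hypothetical categorical column
colors = ["red", "blue", "red", "green", "blue", "red"]

indexed, mapping = string_index(colors)    # option 1: index only
encoded = one_hot(indexed, len(mapping))   # option 2: index, then one-hot

for color, idx, vec in zip(colors, indexed, encoded):
    print(color, idx, vec)
```

With the index alone, the feature is a single number whose ordering only reflects how often each label occurs. With the one-hot step, each category gets its own slot in the vector instead.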
Can anyone clarify this for me?