I want to use cross-validation instead of the usual validation-set approach, purely as a means of getting a better estimate of the test error rate. I am using the DataFrame-based API of Spark MLlib. However, when I run the following code:

from pyspark.ml import tuning

cv = tuning.CrossValidator(estimator=randomForestRegressor, evaluator=evaluator, numFolds=5)
cv_model = cv.fit(vsdf)

I get this error:

KeyError                                  Traceback (most recent call last)
<ipython-input-44-d4e7a9d3602e> in <module>
----> 1 cv_model = cv.fit(vsdf)

C:\Spark\spark-3.1.2-bin-hadoop3.2\python\pyspark\ml\base.py in fit(self, dataset, params)
    159                 return self.copy(params)._fit(dataset)
    160             else:
--> 161                 return self._fit(dataset)
    162         else:
    163             raise ValueError("Params must be either a param map or a list/tuple of param maps, "

C:\Spark\spark-3.1.2-bin-hadoop3.2\python\pyspark\ml\tuning.py in _fit(self, dataset)
    667     def _fit(self, dataset):
    668         est = self.getOrDefault(self.estimator)
--> 669         epm = self.getOrDefault(self.estimatorParamMaps)
    670         numModels = len(epm)
    671         eva = self.getOrDefault(self.evaluator)

C:\Spark\spark-3.1.2-bin-hadoop3.2\python\pyspark\ml\param\__init__.py in getOrDefault(self, param)
    344             return self._paramMap[param]
    345         else:
--> 346             return self._defaultParamMap[param]
    347 
    348     def extractParamMap(self, extra=None):

KeyError: Param(parent='CrossValidator_a9121a59fda3', name='estimatorParamMaps', doc='estimator param maps')

I guess this is because I have not provided any parameter maps (`estimatorParamMaps`) to search over. Is there no way to do cross-validation in Spark ML without a parameter grid?
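
To illustrate that guess: the last frame of the traceback shows `getOrDefault` returning `self._paramMap[param]` when the param was set explicitly and otherwise falling through to `self._defaultParamMap[param]`. Below is a minimal pure-Python stand-in for that two-dictionary lookup. It is not PySpark's code: the class `TinyParams`, its methods, the helper `lookup_fails`, and the values used are all hypothetical and only mirror the shape of the lookup visible in the traceback. It assumes that `estimatorParamMaps` has no default value, which is what the `KeyError` suggests.

```python
class TinyParams:
    """Hypothetical stand-in for the lookup seen in the traceback (not PySpark)."""

    def __init__(self):
        self._paramMap = {}         # values set explicitly by the caller
        self._defaultParamMap = {}  # values that come with a default

    def set(self, name, value):
        self._paramMap[name] = value
        return self

    def set_default(self, name, value):
        self._defaultParamMap[name] = value
        return self

    def get_or_default(self, name):
        # Same shape as the frame in param/__init__.py shown in the traceback:
        # explicit value first, otherwise the default map.
        if name in self._paramMap:
            return self._paramMap[name]
        # Raises KeyError when the param was never set and has no default.
        return self._defaultParamMap[name]


def lookup_fails(params, name):
    """Return True if looking up `name` raises KeyError."""
    try:
        params.get_or_default(name)
    except KeyError:
        return True
    return False


# Illustrative values only: one explicit param, one param with a default.
cv_like = TinyParams().set("estimator", "rf").set_default("numFolds", 3)

explicit_value = cv_like.get_or_default("estimator")          # explicitly set
default_value = cv_like.get_or_default("numFolds")            # falls back to default
missing_fails = lookup_fails(cv_like, "estimatorParamMaps")   # neither set nor defaulted

# Once the param is set to anything, even a single empty map, the lookup succeeds.
cv_like.set("estimatorParamMaps", [{}])
after_set = cv_like.get_or_default("estimatorParamMaps")
```

If this reading is right, the error comes from the missing `estimatorParamMaps` value itself rather than from the estimator or the evaluator.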