Spark: Load Scala ML model to PySpark

Question

I trained an LDA model in Scala Spark.

val lda = new LDA().setK(k).setMaxIter(iter).setFeaturesCol(colnames).fit(data)

lda.save(path)

I checked my saved model and it contains two folders: metadata and data.

However, when I tried to load this model into PySpark, I got the following error:

model = LDAModel.load(sc, path = path) 


File "/Users/hongbowang/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o33.loadLDAModel.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:file:/Users/hongbowang/Personal/Spark%20Program/Spark%20Project/T1/output_K20_topic/lda/metadata

Does anyone know how I can fix it? Thanks a lot!

Alper t. Turker · Accepted Answer · 2017-12-02T23:08:36

You saved an ml.clustering.LDAModel, but you are trying to read it with mllib.clustering.LDAModel. Import the matching class from pyspark.ml.clustering instead. For a local model:

from pyspark.ml.clustering import LocalLDAModel

LocalLDAModel.load(path)

For a distributed model:

from pyspark.ml.clustering import DistributedLDAModel

DistributedLDAModel.load(path)
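
If you are not sure which of the two was saved, the metadata folder can tell you. The sketch below assumes the layout written by spark.ml's default writer, where the metadata is a single JSON line with a "class" field holding the JVM class name. The helper names python_class_for and load_lda are made up for illustration and are not part of PySpark; load_lda also assumes you already have a SparkSession.

```python
import json

# JVM class name recorded in the saved metadata -> class name in
# pyspark.ml.clustering that can load it.
_JVM_TO_PYTHON = {
    "org.apache.spark.ml.clustering.LocalLDAModel": "LocalLDAModel",
    "org.apache.spark.ml.clustering.DistributedLDAModel": "DistributedLDAModel",
}


def python_class_for(metadata_json):
    """Return the pyspark.ml.clustering class name for one metadata JSON line.

    Hypothetical helper: assumes the JSON has a "class" field.
    """
    jvm_class = json.loads(metadata_json)["class"]
    if jvm_class not in _JVM_TO_PYTHON:
        raise ValueError("not a spark.ml LDA model: " + jvm_class)
    return _JVM_TO_PYTHON[jvm_class]


def load_lda(spark, path):
    """Load a saved spark.ml LDA model, picking the class from its metadata.

    Hypothetical helper: `spark` is an existing SparkSession and `path` is
    the directory passed to save() on the Scala side.
    """
    from pyspark.ml import clustering  # imported lazily; needs PySpark installed

    # The metadata folder is a text file whose first line is the JSON record.
    first_line = spark.sparkContext.textFile(path + "/metadata").first()
    return getattr(clustering, python_class_for(first_line)).load(path)
```

With that in place, model = load_lda(spark, path) should return whichever of the two model types was written, without you having to guess.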
