I created a Dataproc cluster with Anaconda as an optional component and created a virtual environment in it. Now, when running a PySpark .py file on the master node, I get this error:
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
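From the message, I understand that the driver picks up Python 3.6 from my conda environment while the workers still run the system Python 2.7. I assume both variables need to point at the same interpreter; on the master I can do something like the following (the env path is my assumption, based on where the Anaconda optional component installs conda), but I don't know how to make the worker nodes use the same environment:

# Point both driver and worker interpreters at the venv's python
# (path assumes the Dataproc Anaconda component's default prefix)
export PYSPARK_PYTHON=/opt/conda/anaconda/envs/my-venv/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/conda/anaconda/envs/my-venv/bin/python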
I need the RDKit package inside the virtual environment, and installing it pulls in Python 3.x. I run the following commands on the master node, after which the Python version changes:
conda create -n my-venv -c rdkit rdkit=2019.*
conda activate my-venv
conda install -c conda-forge rdkit
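Alternatively, I assume the interpreter could be passed per job through Spark's standard spark.pyspark.* properties, along these lines (my_job.py is a placeholder for my script, and the env path is again my assumption):

# Submit the job with the venv's python for both driver and executors
spark-submit \
  --conf spark.pyspark.python=/opt/conda/anaconda/envs/my-venv/bin/python \
  --conf spark.pyspark.driver.python=/opt/conda/anaconda/envs/my-venv/bin/python \
  my_job.py

But since I only created the environment on the master node, I expect the workers would still fail to find that interpreter.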
How can I fix this version mismatch so that my PySpark job runs against the RDKit environment on all nodes?