3
votes

I have an Airflow DAG, "example_ml.py", with a task "train_ml_model" that runs a Python script, "training.py".

- Dags/example_ml.py
- Dags/training.py

When I run the DAG, the task fails because it cannot import the modules the training script needs. The error is raised on the sklearn import.

Code snippet for DAG task:

    train_model = PythonOperator(
        task_id='train_model',
        python_callable=training,
        dag=dag
    )

PS: I'm using a k8s cluster. Airflow runs inside the cluster with the executor set to KubernetesExecutor, so whenever a DAG is triggered, a new pod is assigned to complete the task.


2 Answers

0
votes

Could you give more details? Are you running this on your local computer or in a container? Are you sure the package is installed? As you noted, the error seems to come from a missing package. Creating a task to install it may not solve the issue. Ideally, just install the requirements in whatever environment Airflow actually runs in.
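To confirm whether the package is really missing in the environment that executes the task, something like the following could be run inside the task itself (for example as a temporary python_callable). This is only a minimal sketch: check_imports is a hypothetical helper name, not part of Airflow, and it only handles top-level module names.

```python
import importlib.util
import sys


def check_imports(module_names):
    """Report which top-level modules the current interpreter cannot find.

    Hypothetical helper for debugging; pass top-level names such as
    "sklearn", not dotted submodule paths.
    """
    missing = [
        name for name in module_names
        if importlib.util.find_spec(name) is None  # None means "not importable here"
    ]
    return {
        "executable": sys.executable,  # which Python is running this code
        "missing": missing,
    }


if __name__ == "__main__":
    # Print the report so it shows up in the task log.
    print(check_imports(["sklearn"]))
```

If "sklearn" shows up under "missing" in the task log, the interpreter running the task does not have it installed, regardless of what is installed on the machine where the DAG was written.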

0
votes

I had the same issue, and this is how I solved it.

Running the following Python code:

>>> import sys
>>> from pprint import pprint
>>> pprint(sys.path)

I get these paths:

 '/home/.local/lib/python3.6/site-packages',
 '/usr/local/lib/python3.6/dist-packages',
 '/usr/lib/python3/dist-packages'

For me, airflow is installed under

'/usr/local/lib/python3.6/dist-packages'

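Therefore, for the package to be found, it must be installed in that same directory. I used this command to install my package (-t $(pwd) installs into the current working directory, so it has to be run from that directory):

sudo python3 -m pip install --system [package-name] -t $(pwd)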
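To see which sys.path entry a given package actually lives under (the manual check described above), a rough sketch like this may help. site_dir_of is a hypothetical name, and the function returns None for built-in, frozen, or namespace packages, which have no file location to compare.

```python
import importlib.util
import os
import sys


def site_dir_of(module_name):
    """Return the sys.path entry that contains module_name, or None.

    Hypothetical helper; only handles top-level modules backed by a file.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin or not os.path.isabs(spec.origin):
        return None  # not installed, or no file on disk (built-in/frozen/namespace)

    origin = os.path.realpath(spec.origin)
    best = None
    best_len = -1
    for entry in sys.path:
        if not entry:
            continue  # skip the empty entry that stands for the current directory
        root = os.path.realpath(entry)
        # Prefer the longest matching entry in case sys.path entries are nested.
        if origin.startswith(root + os.sep) and len(root) > best_len:
            best, best_len = entry, len(root)
    return best


if __name__ == "__main__":
    print(site_dir_of("airflow"))
```

If this prints None for the package the task needs, that package is not visible to the interpreter being used.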