I am currently trying to get automatic offloading working using Intel Python 2019 and a Xeon Phi X100 KNC (SC3120A) card. For testing the offloading I am trying this benchmark: https://github.com/accre/Intel-Xeon-Phi/blob/master/Python/automatic-offloading/bmark.py
However, I cannot get it to work: the code simply runs on the host CPU. I am using MPSS 3.8.6 and Intel Parallel Studio 2017 (the last version with X100 support) on CentOS. miccheck passes, and I can also use SSH to run cross-compiled code directly on the card. I am using Intel Python 3.6. My .bashrc file looks like this:
export PATH=$PATH:/opt/intel/intelpython3/bin/libfabric/
source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/intelpython3/bin/activate root
source /opt/intel/intelpython3/bin/mklmicvars.sh
export USE_DAAL4PY_SKLEARN=YES
export OFFLOAD_DEVICES=0
export MKL_MIC_DISABLE_HOST_FALLBACK=1
I also followed this page and installed mkl-mic: https://software.intel.com/en-us/articles/using-intel-python-with-coprocessor-cards
It seems that I am missing something fundamental here. Does automatic offloading work with Python 3.6 at all? The micperf benchmark packages provided by Intel, for example, are written for Python 2, so I cannot try them at the moment. I would really like to leverage the computational power of the card for my Python code. Do you have any idea what could help here, or what I could check?
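One thing that could be checked is whether the offload-related variables actually reach the Python process. The following is only a minimal sketch: the first three names come from the .bashrc above, while MKL_MIC_ENABLE and OFFLOAD_REPORT are not set there and are included on the assumption that they are the relevant MKL automatic offload switches. The helper name offload_env_report is hypothetical.

```python
import os

# Variables that are exported in the .bashrc shown above.
CONFIGURED = [
    "OFFLOAD_DEVICES",
    "MKL_MIC_DISABLE_HOST_FALLBACK",
    "USE_DAAL4PY_SKLEARN",
]

# Assumption: these are NOT in the .bashrc above. They are listed here
# because MKL automatic offload is, as far as I know, controlled by them.
ASSUMED = ["MKL_MIC_ENABLE", "OFFLOAD_REPORT"]


def offload_env_report(environ=None):
    """Return {name: value or None} for the offload-related variables.

    `environ` defaults to the environment of the running Python process.
    """
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in CONFIGURED + ASSUMED}


if __name__ == "__main__":
    # Print each variable as seen from inside Python.
    for name, value in offload_env_report().items():
        shown = value if value is not None else "<not set>"
        print(f"{name} = {shown}")
```

Running this with the same interpreter that runs bmark.py shows which of the settings are visible to Python; anything reported as not set never made it from the shell into the process.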