If the answer is yes, what would be a simple example to test this capability?
I have tried to use the multiprocessing capabilities of SFrame and implicit, but CPU utilization always stays below 10% on an n1-highmem-32 (32 vCPUs, 208 GB memory) instance.
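import os
# Set the OpenMP thread count before importing libraries that use OpenMP
os.environ['OMP_NUM_THREADS'] = "25"
import sframe
# Number of worker processes SFrame uses for Python lambdas
sframe.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 25)
import implicit
# train is the training matrix, built earlier (not shown)
item_factors, user_factors = implicit.alternating_least_squares(train, 2)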
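
As a baseline for the kind of test asked about above, a minimal sketch like the following might help separate "the instance cannot use all cores" from "SFrame/implicit are not using them". It does not touch SFrame or implicit at all; it uses only the standard library. The helper names (usable_cpus, burn, timed_run) and the workload size N are illustrative assumptions, not part of any library.

```python
import multiprocessing as mp
import os
import time

# Hypothetical workload size; increase it to keep cores busy long enough
# to watch utilization in top/htop.
N = 1_000_000


def usable_cpus():
    """Number of CPUs this process may run on (respects affinity masks on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def burn(n):
    """Pure-Python CPU-bound work: sum of squares of the integers below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(workers, tasks, n):
    """Run `tasks` copies of burn(n) on `workers` processes; return elapsed seconds."""
    start = time.perf_counter()
    with mp.Pool(processes=workers) as pool:
        pool.map(burn, [n] * tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    cpus = usable_cpus()
    print("usable CPUs:", cpus)
    # Same total amount of work, first on one process, then on one per CPU.
    serial = timed_run(1, cpus, N)
    parallel = timed_run(cpus, cpus, N)
    print("1 worker: %.2fs, %d workers: %.2fs" % (serial, cpus, parallel))
```

If the second timing is much shorter than the first and utilization climbs toward 100% while it runs, the instance and the Python interpreter can use all cores, and the low utilization is more likely specific to how the libraries are built or configured. If usable_cpus() reports far fewer than 32, the process may be restricted by an affinity mask or container limit.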