I need to upload files to a server, and it takes too long because there are many small files. I tried multiprocessing, but it doesn't seem to work: 'result' is not changed when run() is called through the Pool. If I call the function directly on a single object, the result does change. If I remove __repr__ and do print(self), I can see that the process is working on a different object (a copy?). How do I fix this?

import time
from builtins import range, enumerate, str
from multiprocessing import Pool

class UploadJob():
    def __init__(self, value):
        self.value = value
        self.result = None

    def run(self):
        print("start", self.value)
        time.sleep(1)  # simulate uploading
        print("end", self.value)
        self.result = str(self.value) + "_fromServer"  # save some ID for the file

    def __repr__(self):
        return str(self.value)+"-"+str(self.result)


job = UploadJob(99)
print(job)
job.run()
print(job)
print()


arr = [x for x in range(0,5)]
for idx,val in enumerate(arr):
    arr[idx] = UploadJob(val)

print(arr)


def func(val:UploadJob):
    val.run()


pool = Pool()
for val in arr:
    res = pool.apply_async(func, args=(val,))

pool.close()
pool.join()

print(arr)

output:

99-None
start 99
end 99
99-99_fromServer

[0-None, 1-None, 2-None, 3-None, 4-None]
start 0
start 1
start 2
start 3
start 4
end 0
end 1
end 2
end 3
end 4
[0-None, 1-None, 2-None, 3-None, 4-None]

EDIT: If I change func to return the value and use pool.map, it works correctly: the original array is not changed, but the returned copy is correct. If UploadJob holds a file as a bytearray, will the process make a copy of it?

def func(val:UploadJob):
    val.run()
    return val

with Pool() as pool:
    arr1 = pool.map(func, arr)

print(arr)   # prints the original objects, result is still None
print(arr1)  # prints the correct values

The result is not changed because processes don't share memory. That means that each process you create works on its own copy of the object, with its own "result". - akhavro
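
The copy described in this comment can be reproduced without a Pool. This is only a minimal sketch, assuming the job reaches the worker through pickle (which is how multiprocessing passes arguments); the payload field and the roundtrip helper are hypothetical names added for illustration:

```python
import pickle


def roundtrip(obj):
    # Serialize and deserialize, roughly what happens to each argument
    # passed to pool.apply_async or pool.map.
    return pickle.loads(pickle.dumps(obj))


class UploadJob:
    def __init__(self, value, payload=b""):
        self.value = value
        self.payload = bytearray(payload)  # hypothetical: file contents
        self.result = None


if __name__ == "__main__":
    original = UploadJob(1, b"file bytes")
    copy = roundtrip(original)

    # Mutate only the copy, as a worker process would.
    copy.result = "1_fromServer"
    copy.payload[0:4] = b"FILE"

    print(copy is original)         # False: a distinct object
    print(original.result)          # None: the original is untouched
    print(bytes(original.payload))  # b'file bytes': the bytearray was copied too
```

So a bytearray stored on the job would be serialized and copied along with the rest of the object each time the job is sent to a worker.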

1 Answer

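With pool.apply_async(func, args=(val,)), val is sent to the child process by pickling it in the parent and unpickling it in the child. The child therefore works on a different object, even though it has the same value. State changes made in the child process cannot affect the parent process, because the two have separate memory spaces. To get data back, return it from the function and collect the return value in the parent, as pool.map does in the EDIT above. The same applies to a bytearray attribute: it is pickled along with the object, so the child gets its own copy.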
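
If you want to keep apply_async and still end up with the original objects updated, one option is to return only the result from the worker and assign it in the parent. The following is a sketch under that assumption, not the only way to do it; run_job and upload_all are hypothetical helper names, and the sleep stands in for the real upload:

```python
import time
from multiprocessing import Pool


class UploadJob:
    def __init__(self, value):
        self.value = value
        self.result = None

    def run(self):
        time.sleep(0.1)  # simulate uploading
        self.result = str(self.value) + "_fromServer"
        return self.result


def run_job(job):
    # Runs in the child process on an unpickled copy of the job;
    # only the returned string travels back to the parent.
    return job.run()


def upload_all(jobs):
    with Pool() as pool:
        # Submit everything first so the uploads overlap.
        handles = [pool.apply_async(run_job, args=(job,)) for job in jobs]
        # get() blocks until that task finishes; store the value
        # on the parent's own object.
        for job, handle in zip(jobs, handles):
            job.result = handle.get()
    return jobs


if __name__ == "__main__":
    jobs = upload_all([UploadJob(v) for v in range(5)])
    print([job.result for job in jobs])
```

Returning just the result string keeps the data sent back from the child small, which may matter if the job also carries the file contents. The pool.map version from the EDIT is shorter when getting back a new list of copies is acceptable.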