I have some troubles with exchange of the object (dataframe) between 2 processes through the Queue.
First process get the data from a queue, second put data into a queue. The put-process is faster, so the get-process should clear the queue with reading all object.
I've got strange behaviour, because my code works perfectly and as expected but only for 100 rows in dataframe, for 1000row the get-process takes always only 1 object.
import multiprocessing, time, sys
import pandas as pd
NR_ROWS = 1000
i = 0
def getDf():
global i, NR_ROWS
myheader = ["name", "test2", "test3"]
myrow1 = [ i, i+400, i+250]
df = pd.DataFrame([myrow1]*NR_ROWS, columns = myheader)
i = i+1
return df
def f_put(q):
print "f_put start"
while(1):
data = getDf()
q.put(data)
print "P:", data["name"].iloc[0]
sys.stdout.flush()
time.sleep(1.55)
def f_get(q):
print "f_get start"
while(1):
data = pd.DataFrame()
while not q.empty():
data = q.get()
print "get"
if not data.empty:
print "G:", data["name"].iloc[0]
else:
print "nothing new"
time.sleep(5.9)
if __name__ == "__main__":
q = multiprocessing.Queue()
p = multiprocessing.Process(target=f_put, args=(q,))
p.start()
while(1):
f_get(q)
p.join()
Output for 100rows dataframe, get-process takes all objects
f_get start
nothing new
f_put start
P: 0 # put 1.object into the queue
P: 1 # put 2.object into the queue
P: 2 # put 3.object into the queue
P: 3 # put 4.object into the queue
get # get-process takes all 4 objects from the queue
get
get
get
G: 3
P: 4
P: 5
P: 6
get
get
get
G: 6
P: 7
P: 8
Output for 1000rows dataframe, get-process takes only one object.
f_get start
nothing new
f_put start
P: 0 # put 1.object into the queue
P: 1 # put 2.object into the queue
P: 2 # put 3.object into the queue
P: 3 # put 4.object into the queue
get <-- #!!! get-process takes ONLY 1 object from the queue!!!
G: 1
P: 4
P: 5
P: 6
get
G: 2
P: 7
P: 8
P: 9
P: 10
get
G: 3
P: 11
Any idea what I am doing wrong and how to pass also the bigger dataframe through?
__version__
: pandas 0.16.2, multiprocessing 0.70a1, python 2.7.10) - chris-scDataFrame
per se, but to all Python objects above some size threshold that will be system dependent. - chris-sc