
As a learning exercise, and because I'd like to do something similar with my own data, I'm trying to reproduce the answer to this example exactly, but implemented in Python via rpy2.

This is turning out to be trickier than I thought, because plyr uses a lot of convenient syntax (e.g. as.quoted variables, summarize, functions) that I haven't found easy to port to rpy2. Without even getting to the ggplot2 part, this is what I've managed so far, using **{} to pass the arguments whose names start with '.' (they are not valid Python keyword names):

import rpy2.robjects as ro
from rpy2.robjects.packages import importr
stats = importr('stats')
plyr = importr('plyr')
bs = importr('base')
r = ro.r
df = ro.DataFrame

mms = df( {'delicious': stats.rnorm(100), 
           'type':bs.sample(bs.as_factor(ro.StrVector(['peanut','regular'])), 100, replace=True),
           'color':bs.sample(bs.as_factor(ro.StrVector(['r','g','y','b'])), 100, replace=True)} )

# first define a function, then use it in ddply call
myfunc  = r('''myfunc <- function(var) {paste('n =', length(var))} ''')
mms_cor = plyr.ddply(**{'.data':mms, 
                        '.variables':ro.StrVector(['type','color']), 
                        '.fun':myfunc})
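The `**{...}` trick works because Python only requires keyword arguments written literally in a call to be valid identifiers; keys supplied through `**` unpacking can be any string, and rpy2 forwards them to R as argument names. A minimal pure-Python sketch, using a hypothetical stand-in `fake_ddply` in place of plyr's real `ddply`:

```python
def fake_ddply(*args, **kwargs):
    """Hypothetical stand-in for plyr.ddply: just reports what it received."""
    return args, kwargs


# Writing fake_ddply(.data=...) would be a SyntaxError, because '.data'
# is not a valid Python identifier; unpacking a dict avoids that restriction.
args, kwargs = fake_ddply(**{'.data': 'mms',
                             '.variables': ['type', 'color'],
                             '.fun': 'myfunc'})

print(sorted(kwargs))  # ['.data', '.fun', '.variables']
```

Since `.data` and `.variables` are the first two parameters of ddply, they could also be passed positionally, leaving only the remaining dot-prefixed arguments for the dict.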

The ddply call above runs without error, but printing the resulting mms_cor gives the following, which suggests the function isn't working correctly inside the ddply call. I think it is calculating the length of the mms data.frame, which is 3, because passing other inputs to myfunc returns different values:

     type color    V1
1  peanut     b n = 3
2  peanut     g n = 3
3  peanut     r n = 3
4  peanut     y n = 3
5 regular     b n = 3
6 regular     g n = 3
7 regular     r n = 3
8 regular     y n = 3 

Ideally I would get this to work with summarize, as in the example answer, so that I can do several calculations and label the output, but I couldn't get that to work either, and the syntax becomes really awkward:

mms_cor = plyr.ddply(plyr.summarize, n=bs.paste('n =', bs.length('delicious')), 
                     **{'.data':mms,'.variables':ro.StrVector(['type','color'])})

This gives the same output as above, but with 'n = 1' in every row. I know this reflects the length of the one-element vector 'delicious', but I can't figure out how to make it refer to the variable instead of the string, or which variable that would be (which is why I moved to the function above). It would also be useful to know how to get the as.quoted variable syntax (e.g. ddply(.data=mms, .(type, color), ...)) to work with rpy2. I know plyr has several as_quoted methods, but I can't figure out how to use them, because documentation and examples are hard to find.
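The 'n = 1' result follows from Python evaluating every argument before the call is made: `bs.length('delicious')` runs once, in Python, on a one-element character vector, so summarize only ever receives the finished string. plyr's `.()` and summarize rely on R evaluating expressions lazily, once per group. A pure-Python sketch of that difference, using a hypothetical helper `summarize_groups` (not part of plyr or rpy2) in which a plain value is copied into every group while a callable is evaluated per group:

```python
from collections import defaultdict


def summarize_groups(rows, by, **summaries):
    """Hypothetical split-apply-combine helper, for illustration only.

    rows: list of dicts; by: list of grouping keys.
    A plain value is copied into every group (like an argument Python
    evaluated before the call); a callable is evaluated once per group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in by)].append(row)
    out = []
    for key in sorted(groups):
        result = dict(zip(by, key))
        for name, value in summaries.items():
            result[name] = value(groups[key]) if callable(value) else value
        out.append(result)
    return out


rows = [
    {'type': 'peanut', 'color': 'r', 'delicious': 0.1},
    {'type': 'peanut', 'color': 'r', 'delicious': -0.4},
    {'type': 'regular', 'color': 'b', 'delicious': 1.2},
]

# Eager: computed once, before the call, from a one-element "vector".
eager = summarize_groups(rows, ['type', 'color'], n='n = %i' % len(['delicious']))
# Deferred: the lambda runs on each group's rows.
lazy = summarize_groups(rows, ['type', 'color'], n=lambda g: 'n = %i' % len(g))

print([row['n'] for row in eager])  # ['n = 1', 'n = 1']
print([row['n'] for row in lazy])   # ['n = 2', 'n = 1']
```

In the rpy2 setting, the counterpart of the lambda is handing ddply a function as `.fun`, as in the myfunc version, so the computation happens on each group rather than once up front.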

Any help is greatly appreciated. Thanks.

Edit:

lgautier's fix for myfunc: use nrow instead of length.

myfunc = r('''myfunc <- function(var) {paste('n =', nrow(var))} ''')
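The fix works because an R data.frame is a list of equal-length columns: `length()` counts the columns (3 here: delicious, type, color), while `nrow()` counts the rows. A plain-Python sketch of the same distinction, modelling one ddply group as a hypothetical dict of column lists:

```python
# Hypothetical stand-in for one ddply group: a column-oriented table,
# laid out like an R data.frame (a collection of equal-length columns).
group = {
    'delicious': [0.3, -1.1, 0.8, 0.05],
    'type': ['peanut'] * 4,
    'color': ['b'] * 4,
}


def length_label(df):
    # Like R's length(df): counts the columns, so always 'n = 3' here.
    return 'n = %i' % len(df)


def nrow_label(df):
    # Like R's nrow(df): counts the entries in one column.
    first_column = next(iter(df.values()))
    return 'n = %i' % len(first_column)


print(length_label(group))  # n = 3
print(nrow_label(group))    # n = 4
```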

Solution for ggplot2, in case it is useful to others. Note that I had to add x and y columns to mms_cor as a workaround so I could use aes_string (I can't get aes to work in the Python environment):

rggplot2 = importr('ggplot2')  # note: the ggplot2 import above doesn't take the 'mapping' kwarg
p = rggplot2.ggplot(data=mms, mapping=rggplot2.aes_string(x='delicious')) + \
    rggplot2.geom_density() + \
    rggplot2.facet_grid('type ~ color') + \
    rggplot2.geom_text(data=mms_cor, mapping=rggplot2.aes_string(x='x', y='y', label='V1'), colour='black', inherit_aes=False)

p.plot()
In the first part, the function is working correctly and "n = 3" is what one would expect. length(var) is the length of the R data.frame, i.e. its number of columns, which is 3. nrows(var) is the number of rows. - lgautier
nrow (minus the 's') did the trick, thanks! I will add the ggplot2 solution as an edit. - williaster
In general, I would avoid writing R code in Python as much as you can. I would create one big R function called, e.g., plot_stuff, source that into your rpy session, and call that function with the appropriate parameters. This also makes debugging the R code easier. - Paul Hiemstra
That's a fantastic, simple tip! I can't say how much easier that will make things. - williaster
@PaulHiemstra I'd recommend the exact opposite to someone writing an application in Python from existing bits of R (the exception being when the one developer, or the mixed group of Python and R developers, is more proficient in R than in Python for a given task), and precisely for the purpose of debugging. - lgautier
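Paul Hiemstra's suggestion can be sketched as follows, assuming R, plyr, and rpy2 are installed; the R function `count_labels` and its Python wrapper are hypothetical names. The R logic lives in one string (it could equally be an .R file loaded with R's `source()`), and Python only calls the finished function:

```python
# Hypothetical R helper, kept entirely on the R side.
R_CODE = '''
count_labels <- function(df) {
  plyr::ddply(df, c("type", "color"),
              function(d) data.frame(V1 = paste("n =", nrow(d))))
}
'''


def count_labels(df):
    """Call the R-side helper on an rpy2 data.frame (requires R and plyr)."""
    import rpy2.robjects as ro  # imported lazily so this module loads without R
    ro.r(R_CODE)  # defines count_labels in R's global environment
    return ro.globalenv['count_labels'](df)
```

With the question's objects, usage would be `mms_cor = count_labels(mms)`. The trade-off is the one debated in the comments: the R side can be debugged in R directly, while lgautier's preference keeps the logic where a Python developer can step through it.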

1 Answer


Since you are using a callback, I can't resist showing one of the unexpected things rpy2 can do: writing the callback in Python (note: the code is untested, so there might be typos):

from rpy2.rinterface import rternalize, StrSexpVector

def myfunc(var):
    # var is a data.frame (a list of columns), so the length of
    # the first vector is the number of rows
    if len(var) == 0:
        nr = 0
    else:
        nr = len(var[0])
    # any string format feature in Python could be used here;
    # the result is wrapped as an R character vector for R
    return StrSexpVector(['n = %i' % nr])

# create R function from the Python function
myfunc_r = rternalize(myfunc)

mms_cor = plyr.ddply(**{'.data':mms,
                        '.variables':ro.StrVector(['type','color']),
                        '.fun':myfunc_r})
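Because the callback's logic only relies on `len()` and indexing, it can be checked in plain Python before wrapping it with `rternalize`, using lists as hypothetical stand-ins for what ddply would pass (a list of columns for each group's data.frame). A sketch of that idea, with the core factored into a hypothetical `count_label` function that is not part of rpy2:

```python
def count_label(columns):
    """Pure-Python core of myfunc: `columns` is any sequence of columns."""
    nr = len(columns[0]) if len(columns) > 0 else 0
    return 'n = %i' % nr


# Stand-ins for ddply groups: lists of equal-length columns.
one_group = [[0.3, -1.1], ['peanut', 'peanut'], ['b', 'b']]
empty = []

print(count_label(one_group))  # n = 2
print(count_label(empty))      # n = 0
```

With the core separated out, the rternalized callback can simply delegate to `count_label(var)`, keeping the R-facing part small.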