0
votes

I am stuck plotting observed vs predicted values using rpy2. The R code below works:

                                                 #==============================#
                                                 # Set up the data.             #
                                                 #==============================#
yTmp <- seq(1,20)
y <- NULL
x1Tmp <- seq(1,20)
x1 <- NULL
x2Tmp <- seq(1,20)
x2 <- NULL
for (i in 1:20)
{
    y[i] = yTmp[i] + runif(1, 0, 1)
    x1[i] = x1Tmp[i] + runif(1, 0, 1)
    x2[i] = x2Tmp[i] + runif(1, 0, 1)
}
                                                #==============================#
                                                # Fit the model.               #
                                                #==============================#
fittedModel <- lm(y ~ x1 + x2)
                                                #==============================#
                                                # Plot the observed vs predicted
                                                #==============================#
png(filename="fittedModel_R.png")
plot(fittedModel$fitted.values, y)
dev.off()

Which generates a nice observed vs predicted plot (which I would post but I need at least 10 reputation to post images).

I have tried to reproduce this using rpy2, but I'm unable to figure out how to get the fitted values to play nicely. The code below is as equivalent to the R code above as I can make it, but does not work:

#!/usr/bin/env python
                                                #==============================#
                                                # Set up packages.             #
                                                #==============================#
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
import random
import string
stats = importr('stats')
from rpy2.robjects import Formula
lattice = importr('lattice')
rprint = robjects.globalenv.get("print")
grdevices = importr('grDevices')
                                                #==============================#
                                                # Set up the data.             #
                                                #==============================#
y = robjects.FloatVector(())
x1 = robjects.FloatVector(())
x2 = robjects.FloatVector(())
for i in range(1,20):
    yValue = i + random.random()
    y.rx[i] = yValue
    x1Value = i + random.random()
    x1.rx[i] = x1Value
    x2Value = i + random.random()
    x2.rx[i] = x2Value
robjects.globalenv["y"] = y
robjects.globalenv["x1"] = x1
robjects.globalenv["x2"] = x2
                                                #==============================#
                                                # Fit the model.               #
                                                #==============================#
fittedModel = stats.lm("y ~ x1 + x2")
                                                #==============================#
                                                # Attempt to extract the fitted 
                                                # values from the model and put
                                                # on a vector.                 #
                                                #==============================#
robjects.globalenv['predicted'] = robjects.Vector(fittedModel.rx('fitted.values'))
                                                #==============================#
                                                # Plot the observed vs predicted
                                                #==============================#
grdevices.png(file = "fittedModel_RPY2.png", width = 512, height = 512)
formula = Formula('y ~ predicted')
p = lattice.xyplot(formula)
rprint(p)
grdevices.dev_off()

This generates an error:

/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py:106: UserWarning: Error in order(as.numeric(x)) : (list) object cannot be coerced to type 'double'

res = super(Function, self).call(*new_args, **new_kwargs) Traceback (most recent call last): File "./testRPY2.py", line 45, in p = lattice.xyplot(formula) File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 178, in call return super(SignatureTranslatedFunction, self).call(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 106, in call res = super(Function, self).call(*new_args, **new_kwargs) rpy2.rinterface.RRuntimeError: Error in order(as.numeric(x)) :
(list) object cannot be coerced to type 'double'

The problem is definitely with the predicted values because changing the formula to plot y vs y generates a nice plot:

formula = Formula('y ~ y')

I have made many attempts to coerce the data in python into a plottable format, including converting to a string in python, manipulating, and sending back to rpy2 as a floating vector. But I don't really understand why it's not working and there must be a better way. Any insights and help into my problem is very much appreciated.

1

1 Answers

0
votes

If any doubt about what is predicted, you can try

print(type(robjects.globalenv['predicted']))

or for a longer output:

print(robjects.globalenv['predicted'])

You'll see that you do not have what is called an atomic vector in R, and this is almost certain coming from the one line where predicted is created:

robjects.globalenv['predicted'] = robjects.Vector(fittedModel.rx('fitted.values'))

The method .rx() corresponds to R's [ while .rx2() corresponds to R's [[. You'll want the latter.

The introduction in the documentation has a short example with R's lm(): http://rpy.sourceforge.net/rpy2/doc-2.6/html/introduction.html#linear-models