20
votes

Recently I've been experimenting with Rcpp (inline) to generate DLLs that perform various tasks on supplied R inputs. I'd like to be able to debug the code in these DLLs line by line, given a specific set of R inputs. (I'm working under Windows.)

To illustrate, let's consider a specific example that anybody should be able to run...

The code below is a really simple cxxfunction which simply doubles the input vector. Note however that there's an additional variable myvar that changes value a few times but doesn't affect the output - this has been added so that we'll be able to see when the debugging process is running correctly.

library(inline)
library(Rcpp)

f0 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
    Rcpp::NumericVector xa(a);
    int myvar = 19;
    int na = xa.size();
    myvar = 27;
    Rcpp::NumericVector out1(na);
    for(int i=0; i < na; i++) {
        out1[i] = 2*xa[i];
        myvar++;
    }
    myvar = 101;
    return(Rcpp::List::create( _["out1"] = out1));
')

After we run the above, typing the command

getLoadedDLLs()

brings up a list of DLLs in the R session. The last one listed should be the DLL created by the above process - it has a random temporary name, which in my case is

file7e61645c

The "Filename" column shows that cxxfunction has put this DLL in the location tempdir(), which for me is currently

C:/Users/TimP/AppData/Local/Temp/RtmpXuxtpa/file7e61645c.dll

Now, the obvious way to call the DLL is via f0, as follows

> f0(c(-7,0.7,77))

$out1
[1] -14.0   1.4 154.0

But we can of course also call the DLL directly by name using the .Call command:

> .Call("file7e61645c",c(-7,0.7,77))

$out1
[1] -14.0   1.4 154.0

So I've reached the point where I'm calling a standalone DLL directly with R input (here, the vector c(-7,0.7,77)), and having it return the answer correctly to R.

What I really need, though, is a facility for line-by-line debugging (using gdb, I presume) that will allow me to observe the value of myvar being set to 19, 27, 28, 29, 30, and finally 101 as the code progresses. The example above is deliberately set up so that calling the DLL tells us nothing about myvar.

To clarify, the "win condition" here is being able to observe myvar changing (seeing the value myvar=19 would be the first step!) without adding anything else to the body of the code. This obviously may require changes to the way in which the code is compiled (are there debugging mode settings to turn on?), or the way R is called - but I don't know where to begin. As noted above, all of this is Windows-based.

Final note: In my experiments, I actually made some minor modifications to a copy of cxxfunction so that the output DLL - and the code within it - receives a user-defined name and sits in a user-defined directory, rather than a temporary name and location. But this doesn't affect the essence of the question. I mention this just to emphasise that it should be fairly easy to alter the compilation settings if someone gives me a nudge :)

For completeness, setting verbose=TRUE in the original cxxfunction call above shows the compilation argument to be of the following form:

C:/R/R-2.13.2/bin/i386/R CMD SHLIB file7e61645c.cpp 2> file7e61645c.cpp.err.txt 
g++ -I"C:/R/R-213~1.2/include"    -I"C:/R/R-2.13.2/library/Rcpp/include"      -O2 -Wall  -c file7e61645c.cpp -o file7e61645c.o
g++ -shared -s -static-libgcc -o file7e61645c.dll tmp.def file7e61645c.o C:/R/R-2.13.2/library/Rcpp/lib/i386/libRcpp.a -LC:/R/R-213~1.2/bin/i386 -lR

My adapted version has a compilation argument identical to the above, except that the string "file7e61645c" is replaced everywhere by the user's choice of name (e.g. "testdll") and the relevant files copied over to a more permanent location.

Thanks in advance for your help guys :)

1
I can't help directly, but I know Dirk etc are always helpful. However they generally do business on the Rcpp email listJase_
Have made some mild progress on this, so a brief update. Playing around with inline:::compileCode, which gets called within cxxfunction, I found that adding --debug at the end of R CMD SHLIB allowed me to examine what's going on inside the DLL through combining gdb and R. HOWEVER, this isn't a full solution as some variables were inaccessible (such as i, during the loop over i); the message came up that they had been "optimized out". I think I therefore need to replace the "-O2" with "-O0" in the compilation argument... but I've no notion of how to make this happen...Tim P
I've gathered a good bit of evidence that all I need to do is change the R CMD SHLIB compiler flags from -O2 to something like -g -O0 ...e.g. see the post at stat.ethz.ch/pipermail/r-devel/2008-November/051390.html - but I'm lacking a precise statement of what needs specifying and how. Some online sources mention creating a file at <RHOME>/.R/Makevars.win but they don't describe the file, and this isn't a valid location in Windows due to the dot before the letter R...Tim P
@Jase: Thanks, but actually my last post to that list received no replies at all! I posted here as it's obviously a question of general interest (seeing as the fix for the bug seems to be related to changing the compiler flags).Tim P
Of course ~/.R/Makevars is valid in Windoze -- below my $HOME I have a ton of files starting with a dot, as well as directories (frequently created by Unix-y programs ported to Windoze). And yes, you need -g if you want debugging output.Dirk Eddelbuettel

1 Answers

19
votes

I am a little stunned by the obsession some Rcpp users have with the inline package and its cxxfunction(). Yes, it is indeed very helpful and it has surely has driven adoption of Rcpp further as it makes quick experimentation so much easier. Yes, it allowed us to use 700+ unit tests in the sources. Yes, I use it all the time to demonstrate examples here, on the rcpp-devel list or even live in presentations.

But does that mean we should use it for each and every task? Does it mean that it does not have "costs" such as randomized filenames in a temporary directory etc pp? Romain and I argued otherwise in our documentation.

Lastly, debugging of dynamically loaded R modules is difficult as it stands. There is an entire section in the (mandatory) Writing R Extensions about it, and Doug Bates once or twice posted a tutorial about how to do this via ESS and Emacs (though I always forget where he posted it; once was IIRC on the rcpp-devel list).

Edit 2012-Jul-07:

Here is your step by step:

  • (Preamble: I've used gcc and g++ for many years, and even when I add -g I don't always turn -O2 into -O0. I am really not sure you need that, but as you ask for it...)

  • Set your environment variable CXXFLAGS to "-g -O0 -Wall". There numerous ways to do it, some are platform-dependent (eg Windows control panel) and therefore less universal and interesting. I use ~/.R/Makevars on Windows and Unix. You could use that, or you could override R's system-wide $RHOME/etc/Makeconf or you could use Makeconf.site or ... See the full docs---but as I said, ~/.R/Makevars is my preferred way as it does NOT interfere with compilation outside of R.

  • Now every compilation R does via R CMD SHLIB, R CMD COMPILE, R CMD INSTALL, ... will use. So it no longer matters you use inline or a local package. Continuing with inline...

  • For the rest, we mostly follow 'Section 4.4.1 Finding entry points in dynamically loaded code' of "Writing R Extensions":

  • Start another R session with R -d gdb.

  • Compile your code. For

fun <- cxxfunction(signature(), plugin="Rcpp", verbose=TRUE, body='
   int theAnswer = 42;
   return wrap(theAnswer);
')

I get

[...]
Compilation argument:
 /usr/lib/R/bin/R CMD SHLIB file11673f928501.cpp 2> file11673f928501.cpp.err.txt 
 ccache g++-4.6 -I/usr/share/R/include -DNDEBUG   -I"/usr/local/lib/R/site- library/Rcpp/include"   -fpic  -g -O0 -Wall -c file11673f928501.cpp -o file11673f928501.o
g++-4.6 -shared -o file11673f928501.so file11673f928501.o -L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib -L/usr/lib/R/lib -lR
  • Invoke eg tempdir() to see the temporary directory, cd to this temporary directory used above and dyn.load() the file built above:
 dyn.load("file11673f928501.so")
  • Now suspend R by sending a break signal (in Emacs, a simple choice from a drop-down).

  • In gdb, set a breakpoint. The single assignment above became line 32 for me, so

break file11673f928501.cpp 32
cont
  • Back in R, call the function:

    fun()

  • Presto, in the debugger at the break point we wanted:

R> fun()

Breakpoint 1, file11673f928501 () at file11673f928501.cpp:32
32      int theAnswer = 42;
(gdb) 
  • Now it is "just" up to you to work gdb to its magic

Now, as I said in my first attempt, all this would be easier (in my eyes) via a simple package which Rcpp.package.skeleton() can write for you as you don't have to deal with randomized directories and filenames. But each to their own...