The == operator in Rcpp works as expected when comparing a numeric vector against a single value: each element of the vector is compared to the value and a logical vector is returned. For example, the following behaves as expected:

library(Rcpp)
cppFunction('
CharacterVector test_vals(NumericVector x) {
  if (is_true(any(x == 3))) return ("Values include 3");
  return ("3 not found");
}')
test_vals(1:2)
# [1] "3 not found"
test_vals(1:5)
# [1] "Values include 3"

However, if I try to compare a character vector against a character scalar, it only seems to test the first element of the vector:

cppFunction('
CharacterVector test_names(NumericVector x) {
  CharacterVector y = x.attr("names");
  if (is_true(any(y == CharacterVector::create("foo")))) return ("Names include foo");
  return ("foo not found");
}')
test_names(c(a=1, b=2, foo=3))
# [1] "foo not found"
test_names(c(foo=3, a=1, b=2))
# [1] "Names include foo"

I know that comparing two character vectors of the same length appears to work in a vectorised manner, as expected:

cppFunction('
CharacterVector test_names(NumericVector x) {
  CharacterVector y = x.attr("names");
  CharacterVector foo(x.size());
  foo.fill("foo");
  if (is_true(any(y == foo))) return ("Names include foo");
  return ("foo not found");
}')
test_names(c(a=1, b=2, foo=3))
# [1] "Names include foo"
test_names(c(foo=3, a=1, b=2))
# [1] "Names include foo"
test_names(c(a=1, b=2))
# [1] "foo not found"

Does this mean that comparison of a character vector against a single value has not been implemented in Rcpp, or am I just missing how to do it?

Good question. It does look like NumericVector has the appropriate operator==() but CharacterVector may not (as characters are generally a different kettle of fish). We could probably add it; in the meantime you could probably write yourself a little helper that does it 'by hand' for two vectors. - Dirk Eddelbuettel
So to be plain, you desire a 'contains()' operator taking a vector of strings and a single string, and returning a boolean? Phrasing it as == is a little "off" to my reading because of the many-vs-one mapping here. You really are looking at a set operator here, right? (I am with you in that std::vector<> is the single best container...) - Dirk Eddelbuettel
yes - that's exactly what I'm looking for - dww
I think we should look at something else then -- think std::vector<std::string> and hand it off to the STL which will already do this... - Dirk Eddelbuettel
Glad this helps. Thinking more about the lack of == support: we generally do not 'recycle' as R does (which some languages call 'broadcast'). So 'many-to-one' comparisons are somewhat uncharted territory. - Dirk Eddelbuettel
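
The 'by hand' many-to-one comparison discussed in these comments could look roughly like the sketch below. It is plain C++ on std::vector<std::string> rather than Rcpp types, and the name equals_each is made up for illustration; it is not an Rcpp function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): compare every element of sv
// against a single string, i.e. the element-wise result that R's
// recycling would give for  sv == "foo".
std::vector<bool> equals_each(const std::vector<std::string>& sv,
                              const std::string& txt) {
    std::vector<bool> out(sv.size());
    for (std::size_t i = 0; i < sv.size(); ++i) {
        out[i] = (sv[i] == txt);   // one comparison per element
    }
    return out;
}
```

Reducing that result with "any" gives the yes/no answer the question is after, which is why a direct contains() check is the simpler route.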

1 Answer

Following up on our quick discussion, here is a very simple solution as the problem (as posed) is simple -- no regular expressions, no fanciness. Just loop over all elements and return true as soon as a match is found, else bail with false.

Code

#include <Rcpp.h>

// [[Rcpp::export]]
bool contains(std::vector<std::string> sv, std::string txt) {
    for (auto s: sv) {
        if (s == txt) return true;
    }
    return false;
}

/*** R
sv <- c("a", "b", "c")
contains(sv, "foo")
sv[2] <- "foo"
contains(sv, "foo")
*/

Demo

> Rcpp::sourceCpp("~/git/stackoverflow/66895973/answer.cpp")

> sv <- c("a", "b", "c")

> contains(sv, "foo")
[1] FALSE

> sv[2] <- "foo"

> contains(sv, "foo")
[1] TRUE

And that is really just shooting from the hip before looking for either what we may already have in the (roughly) 100k lines of Rcpp, or what the STL may have...
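
As for what the STL may have: std::find from <algorithm> already performs this linear search. A minimal sketch, assuming the same std::vector<std::string> input as contains() above (the name contains_stl is made up here; to call it from R you would add the #include <Rcpp.h> and // [[Rcpp::export]] lines as in the code above):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical name; same contract as contains() in the answer.
// std::find returns sv.end() when no element equals txt.
bool contains_stl(const std::vector<std::string>& sv, const std::string& txt) {
    return std::find(sv.begin(), sv.end(), txt) != sv.end();
}
```

Taking the arguments by const reference avoids copying them inside the function; the search itself stops at the first match, just like the explicit loop.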

The same will work for your earlier example of named attributes, as you can of course do the same with a CharacterVector, and/or use the conversion from it to std::vector<std::string> we used here, or... If you have an older compiler, switch the for loop from the C++11 range-based style to a classic index-based (K&R-style) loop.
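
For a pre-C++11 compiler, the index-based variant might look like the sketch below (the name contains_indexed is made up for illustration, and the Rcpp export lines are left out so the snippet stands alone):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical name; same logic as contains(), written without
// the C++11 range-based for so it also compiles as C++98.
bool contains_indexed(const std::vector<std::string>& sv, const std::string& txt) {
    for (std::size_t i = 0; i < sv.size(); ++i) {
        if (sv[i] == txt) return true;   // stop at the first match
    }
    return false;
}
```

For the names example from the question, passing names(x) from R as the first argument should be enough, since it is an ordinary character vector.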