47
votes

Unless I am missing something, this regex seems pretty straightforward:

grepl("Processor\.[0-9]+\..*Processor\.Time", names(web02))

However, R doesn't like the escaped periods (\.), which I intend to be literal periods:

Error: '\.' is an unrecognized escape in character string starting "Processor\."

What am I misunderstanding about this regex syntax?

3
I don't know R but have you tried \\.? - mu is too short
@mu: Ya, that fixes it. But I wonder exactly why I need the double \\ to escape it. - Kyle Brandt
You need one to escape the other so that you get one past the string mangler and through to the regex engine. - mu is too short
@Mu: Okay, that makes sense now that I have read stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html - Kyle Brandt

3 Answers

67
votes

My R-Fu is weak to the point of being non-existent but I think I know what's up.

The string handling part of the R processor has to peek inside the strings to convert \n and related escape sequences into their character equivalents. R doesn't know what \. means so it complains. You want to get the escaped dot down into the regex engine so you need to get a single \ past the string mangler. The usual way of doing that sort of thing is to escape the escape:

grepl("Processor\\.[0-9]+\\..*Processor\\.Time", names(web02))

Embedding one language (regular expressions) inside another language (R) is usually a bit messy and more so when both languages use the same escaping syntax.
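
To make the two layers of escaping concrete, here is a minimal sketch. The column names in cols are hypothetical stand-ins for names(web02), which the question does not show; the last one is included only to illustrate what an unescaped dot would let through.

```r
# Hypothetical column names standing in for names(web02)
cols <- c("Processor.0..Processor.Time",
          "Processor.12..Processor.Time",
          "Memory.Available.Bytes",
          "ProcessorX0XXProcessorXTime")

pattern <- "Processor\\.[0-9]+\\..*Processor\\.Time"

# The R string literal "\\." holds two characters: a backslash and a dot
n_chars <- nchar("\\.")

# writeLines shows the string as the regex engine receives it:
# Processor\.[0-9]+\..*Processor\.Time
writeLines(pattern)

# With escaped dots, only the names containing literal periods match
matches <- grepl(pattern, cols)

# With bare dots, "." matches any character, so the last name matches too
loose <- grepl("Processor.[0-9]+..*Processor.Time", cols)
```

Here matches is TRUE for the first two names only, while loose is also TRUE for the last one, which is why the dots need to reach the regex engine escaped.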

9
votes

Instead of

\.

Try

\\.

You need to escape the backslash first.

4
votes

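Another way to do this, without any backslashes, is to put the period inside a bracket expression, where it loses its special meaning. The [[:punct:]] class also matches a period, along with every other punctuation character. For example:

grepl("[.]", ".")
# [1] TRUE
grepl("[.]", "a")
# [1] FALSE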

From the docs (?regex):

The metacharacters in extended regular expressions are . \ | ( ) [ { ^ $ * + ?, but note that whether these have a special meaning depends on the context.

[:punct:] Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
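
As a rough comparison of the ways to match a literal period, here is a small sketch. The test vector x is made up for illustration; the calls use only grepl and its fixed argument from base R.

```r
# Made-up inputs: a period, a letter, a colon, and a string containing a period
x <- c(".", "a", ":", "a.b")

# Escaped dot: the backslash is doubled so one reaches the regex engine
escaped <- grepl("\\.", x)

# Bracket expression: inside [ ] the dot is an ordinary character
bracket <- grepl("[.]", x)

# fixed = TRUE: the pattern is treated as a plain string, not a regex
literal <- grepl(".", x, fixed = TRUE)

# POSIX class: matches any punctuation character, so ":" matches as well
punct <- grepl("[[:punct:]]", x)
```

The first three give TRUE, FALSE, FALSE, TRUE. The [[:punct:]] version gives TRUE, FALSE, TRUE, TRUE, so it is only appropriate when any punctuation character should match.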