103
votes

I have some data that looks like this:

john, dave, chris
rick, sam, bob
joe, milt, paul

I'm using this regex to match the names:

/(\w.+?)(\r\n|\n|,)/

This works for the most part, but the file ends abruptly after the last word, so the last value doesn't end in \r\n, \n or a comma; it ends with EOF. Is there a way to match EOF in a regex so I can put it right in that second group?

9
Are you trying to capture all the names in one group or one capture group per name? - Andrew Hare
One thing to do when having trouble with a regex is to try elements of your pattern in isolation. If you are concerned about the token at the end, test your expression without it. - akf
just wanted to add a great regex testing site: regexplanet.com/simple - northpole
@Sinan - I agree; merged - Marc Gravell♦

9 Answers

177
votes

The answer to this question is \Z. It took me a while to figure out, but it works now. Note that, conversely, \A matches the beginning of the whole string (as opposed to ^ and $, which in multi-line mode match the beginning and end of each line).

26
votes

EOF is not actually a character. If you have a multi-line string, then in multi-line mode '$' will match the end of each line as well as the end of the string.

In Perl and its brethren, \A and \Z match the beginning and end of the string regardless of any line breaks inside it. (\Z also matches just before a trailing newline; \z matches only at the very end.)

GNU extensions to POSIX regexes use \` and \' for the same things.

19
votes

In Visual Studio, you can find EOF like so: $(?![\r\n]). This works whether your line endings are CR, CRLF, or just LF.

As a bonus, you can ensure all your code files have a final newline marker like so:

               Find What: (?<![\r\n])$(?![\r\n])
            Replace With: \r\n
 Use Regular Expressions: checked
Look at these file types: *.cs, *.cshtml, *.js

How this works:

Find any line end (a zero-width match) that is not preceded by CR or LF, and is also not followed by CR or LF. The only position that satisfies both conditions is the end of a file that lacks a final newline.

Note that you should Replace With your desired line-ending character, be it CR, LF, or CRLF.
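
For anyone who wants to apply the same idea outside the Visual Studio dialog, here is a minimal sketch in JavaScript. The function name ensureFinalNewline and its eol parameter are made up for illustration, and it assumes an engine with lookbehind support (a recent Node.js, for example).

```javascript
// Hypothetical helper: appends a line ending only when the text lacks one.
// Same pattern as in the "Find What" box above, with the `m` flag so that `$`
// can match at line ends as well as at the end of the input.
function ensureFinalNewline(text, eol = "\n") {
  return text.replace(/(?<![\r\n])$(?![\r\n])/gm, eol);
}

const withoutNewline = ensureFinalNewline("a\nb");       // newline is appended
const alreadyUnix = ensureFinalNewline("a\nb\n");        // left unchanged
const alreadyWindows = ensureFinalNewline("a\r\nb\r\n"); // left unchanged
console.log(JSON.stringify([withoutNewline, alreadyUnix, alreadyWindows]));
```

Pass "\r\n" as the second argument if your files use CRLF line endings.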

11
votes

Contrast the behavior of Ryan's suggested \Z with \z:

$ perl -we 'my $corpus = "hello\n"; $corpus =~ s/\Z/world/g; print(":$corpus:\n")'
:helloworld
world:
$ perl -we 'my $corpus = "hello\n"; $corpus =~ s/\z/world/g; print(":$corpus:\n")'
:hello
world:
$ 

perlre sez:

\Z  Match only at end of string, or before newline at the end
\z  Match only at end of string

A translation of the test case into Ruby (1.8.7, 1.9.2) behaves the same. In a comment above, @mmdemirbas adds that Java is the same.
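
JavaScript has neither \Z nor \z, but the same contrast can be reproduced there. This is only a sketch: it relies on the fact that, without the m flag, a bare $ in JavaScript matches only at the very end of the input (like \z), and the lookahead (?=\n?$) is my own stand-in for \Z, not a built-in.

```javascript
const corpus = "hello\n";

// Stand-in for \Z: the end of the input, or just before a final newline.
const likeBigZ = corpus.replace(/(?=\n?$)/g, "world");

// Stand-in for \z: without the `m` flag, `$` matches only at the very end.
const likeSmallZ = corpus.replace(/$/g, "world");

console.log(JSON.stringify(likeBigZ));   // "helloworld\nworld"
console.log(JSON.stringify(likeSmallZ)); // "hello\nworld"
```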

6
votes

Recently I was looking for something like this, but for JavaScript.

Putting this here so that anyone with the same issue can benefit:

var matchEndOfInput = /$(?![\r\n])/gm;

Basically, this matches an end of line that is not followed by a carriage return or newline character, which leaves only the very end of the input. In essence it behaves like Perl's \z (end of string only) rather than \Z, but for JavaScript.
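
A quick way to check where that pattern matches, as a sketch (the sample text and the "<EOF>" marker are just placeholders I picked):

```javascript
const matchEndOfInput = /$(?![\r\n])/gm;
const text = "line one\nline two\n";

// Collect the index of every match; only the end of the input should qualify,
// because every other `$` position is followed by a newline.
const positions = [...text.matchAll(matchEndOfInput)].map((m) => m.index);
console.log(positions, text.length);

// Inserting a marker shows where the match lands.
const marked = text.replace(matchEndOfInput, "<EOF>");
console.log(JSON.stringify(marked));
```

The single match index equals text.length, so the marker ends up after the final newline.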

2
votes

Do you really have to capture the line separators? If not, this regex should be all you need:

/\w+/

That's assuming all the substrings you want to match consist entirely of word characters, like in your example.
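
For example, in JavaScript (a sketch using the sample data from the question; the g flag is my addition so that every name is returned, not just the first):

```javascript
// Sample data from the question; note there is no trailing newline.
const data = "john, dave, chris\nrick, sam, bob\njoe, milt, paul";

// With the `g` flag, match() returns every run of word characters,
// so the separators and the end of the input never need to be matched.
const names = data.match(/\w+/g);
console.log(names);
```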

2
votes

Maybe try $ (EOL/EOF) instead of (\r\n|\n)?

/\"(.+?)\".+?(\w.+?)$/

1
votes

Assuming you are using a modifier that makes the regex treat the string as a whole rather than line by line (and if \n works for you, you are), just add another alternative for the end of the string: (\r\n|\n|,|$)
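
Here is a sketch of that in JavaScript, run against the sample data from the question. The g flag and the matchAll loop are my assumptions about how the matches get collected; the question doesn't say which language or flags are in use.

```javascript
// Sample data from the question; the last name is followed directly by EOF.
const data = "john, dave, chris\nrick, sam, bob\njoe, milt, paul";

// The question's pattern with `$` added as a fourth delimiter alternative.
// No `m` flag here, so `$` matches only at the very end of the input.
const namePattern = /(\w.+?)(\r\n|\n|,|$)/g;

const names = [];
for (const match of data.matchAll(namePattern)) {
  names.push(match[1]); // group 1 is the name, group 2 the delimiter
}
console.log(names);
```

The last match captures "paul" with an empty second group, since $ is zero-width.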

0
votes

/(\w.+?)(\r\n|\n|,|$)/