
I need the MatchData for each occurrence of a regular expression in a string. This is different from the scan method suggested in "Match All Occurrences of a Regex", since scan only gives me an array of strings (I need the full MatchData to get begin and end positions, etc.).

input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/

numbers.match input # #<MatchData "12"> (only the first match)
input.scan numbers  # ["12", "34", "567"] (all matches, but only the strings)

I suspect there is some method that I've overlooked. Suggestions?

I want the begin and end positions for each match. But that is irrelevant to my question. MatchData exists for a reason, doesn't it? If I can get it for the first match, it follows that it would be useful for all matches. - Joshua Flanagan
OK, I want more than one thing, in a convenient package, for each match. - Joshua Flanagan
You have the convenient package, as you call it, in the solution I gave below (from which you can get begin, end, or whatever match data you need). Or is there something else you are looking for? - i-blis

5 Answers

Answer (71 votes)

You want

"abc12def34ghijklmno567pqrs".to_enum(:scan, /\d+/).map { Regexp.last_match }

which gives you

[#<MatchData "12">, #<MatchData "34">, #<MatchData "567">] 

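The "trick" is to build an enumerator over scan: scan sets Regexp.last_match (that is, $~) before each step, so map can read the full MatchData for every match instead of just the matched string.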
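Since the question is ultimately about begin and end positions, here is a small sketch (not part of the original answer) that pulls the text and offsets out of each MatchData produced by the enumerator one-liner above. The names `spans` and `m` are only for illustration.

```ruby
input = "abc12def34ghijklmno567pqrs"

# to_enum(:scan, ...) wraps String#scan in an Enumerator. scan sets
# Regexp.last_match before each step, so map can read the MatchData
# for the current match and extract its text and character offsets.
spans = input.to_enum(:scan, /\d+/).map do
  m = Regexp.last_match
  [m[0], m.begin(0), m.end(0)]
end

p spans # => [["12", 3, 5], ["34", 8, 10], ["567", 19, 22]]
```

end(0) is exclusive: it is the offset just past the last character of the match.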

Answer (9 votes)

My current solution is to add an each_match method to Regexp:

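class Regexp
  # Yields a MatchData for every non-overlapping match of self in str.
  def each_match(str)
    return enum_for(:each_match, str) unless block_given?

    start = 0
    while (matchdata = match(str, start))
      yield matchdata
      # Resume after this match; step one character past a zero-length
      # match so patterns like /x*/ cannot loop forever at one position.
      start = matchdata.end(0)
      start += 1 if matchdata.begin(0) == matchdata.end(0)
    end
  end
end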

Now I can do:

numbers.each_match input do |match|
  puts "Found #{match[0]} at #{match.begin(0)} until #{match.end(0)}"
end

Tell me there is a better way.

Answer (8 votes)

I’ll put it here to make the code available via a search:

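input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/
# gsub is called only for its side effect: inside the block, $~ holds the
# MatchData for the current match, and p prints it.
input.gsub(numbers) { |m| p $~ }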

This prints the requested MatchData objects:

⇒ #<MatchData "12">
⇒ #<MatchData "34">
⇒ #<MatchData "567">

See "Matching data in Ruby for all occurrences in a string" for more information.
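If you want the MatchData objects collected rather than printed, the same gsub approach can push $~ into an array. This is a sketch; the names `matches` and `unchanged` are illustrative. Returning `$&` from the block replaces each match with itself, so the string gsub builds is identical to the input and can simply be ignored.

```ruby
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/
matches = []

# In gsub's block form Ruby sets $~ (and $&) for the current match.
# The block stores the MatchData and returns $&, so every match is
# "replaced" by itself and the returned string equals input.
unchanged = input.gsub(numbers) do
  matches << $~
  $&
end

matches.each do |m|
  puts "Found #{m[0]} at #{m.begin(0)} until #{m.end(0)}"
end
```

It still allocates a throwaway copy of the string, which is why the enumerator-over-scan version is usually the cleaner choice.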

Answer (2 votes)

I'm surprised nobody mentioned the amazing StringScanner class included in Ruby's standard library:

require 'strscan'

s = StringScanner.new('abc12def34ghijklmno567pqrs')

while s.skip_until(/\d+/)
  # offset holds the inclusive [first, last] byte positions of the match
  num, offset = s.matched.to_i, [s.pos - s.matched_size, s.pos - 1]

  # ... use num and offset here
end

No, it doesn't give you the MatchData objects, but it does give you an index-based interface into the string.
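For comparison, here is a self-contained sketch of the same StringScanner loop that collects each matched string together with its start and (exclusive) end position. The `found` array and the exclusive-end convention are choices made for this illustration. StringScanner positions are byte offsets, which coincide with character offsets only for ASCII input like this string.

```ruby
require 'strscan'

s = StringScanner.new('abc12def34ghijklmno567pqrs')
found = []

# skip_until advances the scan pointer past the next match and returns
# the number of bytes skipped, or nil once there are no more matches.
while s.skip_until(/\d+/)
  start = s.pos - s.matched_size # byte offset where this match began
  found << [s.matched, start, s.pos]
end

p found # => [["12", 3, 5], ["34", 8, 10], ["567", 19, 22]]
```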

Answer (0 votes)

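input = "abc12def34ghijklmno567pqrs"
n = Regexp.new("\\d+")
# Keep matching from where the previous match ended until match returns nil,
# then drop that trailing nil. Assumes n never matches the empty string.
[n.match(input)].tap { |a| a << n.match(input, a.last.end(0)) until a.last.nil? }[0..-2]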

=> [#<MatchData "12">, #<MatchData "34">, #<MatchData "567">]