Is there a way to get ruby 1.9 unicode regex behaviour in ruby 1.8 (with rails 2.3)?

Question

I'm trying to strip non-word characters from the beginning and end of a string. The function I've got so far is:

$KCODE='UTF-8'

...

def clean_string str
  str && str.gsub(/\s+/msiu, ' ').gsub(/\A\W*|\W*\Z/msiu,'')
end

It works in most cases, but it's failing on pound signs.

>> puts clean_string('£5.00')
£5.00

I've read that in ruby 1.8 this is by design: in UTF-8 mode, all non-ASCII characters are treated as word characters. That's not the behaviour I want; I want only actual word characters to be treated as word characters, as they are in ruby 1.9.
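One gem-free way to sidestep that 1.8 behaviour is to avoid `\W` and spell out the word characters explicitly. The sketch below assumes it is acceptable to count only ASCII letters, digits and underscore as word characters, so it will also strip accented letters such as é at the edges of the string. The name `clean_string_ascii` is hypothetical. With the `/u` flag, Ruby 1.8 treats each multibyte UTF-8 character as a single character in the negated class.

```ruby
# encoding: utf-8
# Hypothetical gem-free variant of clean_string. Assumption: only ASCII
# [a-zA-Z0-9_] count as word characters, so e.g. a leading "é" is stripped too.
ASCII_EDGE_NON_WORD = /\A[^a-zA-Z0-9_]+|[^a-zA-Z0-9_]+\z/u

def clean_string_ascii(str)
  # Collapse whitespace runs to one space, then trim non-word runs at both ends.
  str && str.gsub(/\s+/u, ' ').gsub(ASCII_EDGE_NON_WORD, '')
end
```

For example, `clean_string_ascii('£5.00')` returns `"5.00"`, but `clean_string_ascii('é!')` returns an empty string, which is the trade-off of this approach.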

Is there a way to get ruby 1.9 unicode regex behaviour in ruby 1.8 (with rails 2.3.10)?

Simon · Accepted Answer · 2011-03-10T16:35:45

I eventually found that you can do it using the oniguruma gem:

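require 'oniguruma'

def clean_string str
  # Compiled with 'utf8', the gem's \W matches non-ASCII symbols such as £,
  # which Ruby 1.8's own /u regexps treat as word characters.
  squishy_regexp = Oniguruma::ORegexp.new('\s+',       'msi', 'utf8')
  # ^ and $ are line anchors, but no newlines remain after squishing.
  clean_regexp   = Oniguruma::ORegexp.new('^\W*|\W*$', 'msi', 'utf8')

  if str
    str = squishy_regexp.gsub(str, ' ')  # collapse runs of whitespace
    str = clean_regexp.gsub(str, '')     # strip leading/trailing non-word chars
  end

  str
end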

>> puts clean_string('£5.00')
5.00
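For comparison, once an upgrade to ruby 1.9 or later is possible, no gem should be needed. The sketch below (the name `clean_string_19` is hypothetical) uses the `\P{Word}` property escape, i.e. anything outside Unicode letters, marks, numbers and connector punctuation. Property escapes are not available in ruby 1.8, so this only applies after upgrading.

```ruby
# encoding: utf-8
# Hypothetical ruby 1.9+ version: \P{Word} does not exist in ruby 1.8.
def clean_string_19(str)
  # Collapse whitespace, then trim runs of non-(Unicode)-word characters
  # from the start and end of the string.
  str && str.gsub(/\s+/, ' ').gsub(/\A\P{Word}+|\P{Word}+\z/, '')
end
```

Here `clean_string_19('£5.00')` gives `"5.00"`, while `clean_string_19('«café»')` keeps the accented letter and gives `"café"`.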
