1
votes

I recently upgraded to Ruby 1.9.2, and one of my monkey patches now fails with an encoding error. I have the following method:

  def strip_noise()
    return if (!self) || (self.size == 0)

    self.delete(160.chr+194.chr).gsub(/[,]/, "").strip
  end

That now gives me the following error:

incompatible character encodings: UTF-8 and ASCII-8BIT

Has anyone else come across this?

4
Welcome to Ruby 1.9's Brave New World of character encodings! You might want to read up on the topic. - ewall

4 Answers

1
votes

This is working for me at the moment anyway:

class String
  def strip_noise()
    return if empty? 
    self.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'')
  end
end

I need to do more testing, but I can make progress for now.

1
votes

class String
  def strip_noise
    return if empty? 
    ActiveSupport::Inflector.transliterate self, ''
  end
end

"#{160.chr}#{197.chr} string with noises" # => "\xA0\xC5 string with noises"
"#{160.chr}#{197.chr} string with noises".strip_noise # => "A string with noises"

0
votes

This might not be exactly what you want:

  def strip_noise
    return if empty?
    sub = 160.chr.force_encoding(encoding) + 194.chr.force_encoding(encoding)
    delete(sub).gsub(/[,]/, "").strip
  end

Read more on the topic here: http://yehudakatz.com/2010/05/17/encodings-unabridged/

0
votes

It's not entirely clear what you're trying to do here, but 160.chr+194.chr is not valid UTF-8: byte 160 (0xA0) is a continuation byte, and byte 194 (0xC2) is the first byte of a 2-byte character. In the reverse order (194, then 160) they form the UTF-8 encoding of the Unicode character U+00A0, "non-breaking space".
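
To illustrate the byte-order point, here is a minimal sketch, assuming the goal is to remove non-breaking spaces and commas from UTF-8 text. The names NBSP and strip_noise_sketch are made up for this example and are not from the question:

```ruby
# encoding: utf-8

# Bytes 194, 160 (0xC2 0xA0) in that order are the UTF-8 encoding of U+00A0.
nbsp = [194, 160].pack("C*").force_encoding("UTF-8")
puts nbsp.valid_encoding?   # true
puts nbsp == "\u00A0"       # true

# The order used in the question (160, 194) is not valid UTF-8.
reversed = [160, 194].pack("C*").force_encoding("UTF-8")
puts reversed.valid_encoding?   # false

# Hypothetical helper: delete the actual character rather than its raw bytes.
NBSP = "\u00A0"

def strip_noise_sketch(text)
  return text if text.nil? || text.empty?
  text.delete(NBSP).delete(",").strip
end

puts strip_noise_sketch("1,234\u00A0 ")   # 1234
```

Because the argument to delete is a UTF-8 string in the same encoding as the receiver, no encoding-compatibility error should arise here.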

If you want to remove every character outside the 7-bit ASCII range, try this:

s.delete!("^\u{0000}-\u{007F}")
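
One thing to watch with the line above: delete! modifies the string in place and returns nil when nothing was removed, so its return value should not be chained. A small sketch of a non-mutating variant, where ascii_only is a hypothetical helper name:

```ruby
# encoding: utf-8

# Hypothetical helper: returns a new string with every character outside
# U+0000..U+007F removed, leaving the original untouched.
def ascii_only(text)
  text.delete("^\u{0000}-\u{007F}")
end

puts ascii_only("caf\u00E9 1\u00A0000")   # caf 1000

# delete! returns nil when the string was already pure ASCII.
p "plain".dup.delete!("^\u{0000}-\u{007F}")   # nil
```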