3
votes

How can I get a list of compatible encodings for a Ruby String? (MRI 1.9.3)

Use case: I have some user-provided strings, encoded in UTF-8. Ideally I need to convert them to ISO/IEC 8859-1 (8-bit), but I also need to fall back to Unicode when some special characters are present.

Also, is there a better way to accomplish this? Maybe I am testing the wrong thing.


EDIT- adding more details
Thanks for the answers, I should probably add some context. I know how to perform encoding conversion.
I'm looking for a way to quickly find out if a string can be safely encoded to another encoding or, to put it in another (and quite wrong) way, what is the minimum encoding to support all the characters in that string.

Just converting the strings to a 16-bit encoding is not an option, because they will be sent as SMS messages, and a 16-bit encoding cuts the number of available characters from 160 down to 70.

I need to convert them to a 16-bit encoding only when they contain a special character that is not supported in ISO/IEC 8859-1.

5
This might help a little bit: yehudakatz.com/2010/05/05/… - Casper
That's not a use case... just using a Unicode encoding in the first place will be effectively the same but simpler - Esailija
@Esailija Thanks, if I could I'd do it. I can't use Unicode in the first place because I need to minimize the size of the strings. I'm working with SMPP and need to build a byte buffer from the strings. I should use a 16-bit encoding as a last resort. - tompave
I see.. I thought you were talking about using different internal encodings.. converting to any encoding on output is ok - Esailija

5 Answers

4
votes

Unluckily, Ruby’s ideas of encoding compatibility are not fully congruent with your use case. However, trying to encode your UTF-8 string in ISO-8859-1 and catching the error that is thrown when a conversion is not possible will achieve what you are after:

begin
  'your UTF-8 string'.encode!('ISO-8859-1')
rescue Encoding::UndefinedConversionError
  # some character has no ISO-8859-1 equivalent: the string stays UTF-8
end

will convert your string to ISO-8859-1 if possible and leave it as UTF-8 if not.
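If the goal is only to find out whether a conversion would succeed, the same rescue can be wrapped in a predicate. The following is a minimal sketch; `encodable_as?` is a hypothetical helper name, not part of Ruby, and it assumes the input is valid in its current encoding (invalid bytes would raise `Encoding::InvalidByteSequenceError`, which is deliberately not rescued here).

```ruby
# Hypothetical helper (not a Ruby built-in): true if every character of str
# has a representation in the target encoding.
def encodable_as?(str, encoding)
  str.encode(encoding) # non-destructive: returns a new string, str is untouched
  true
rescue Encoding::UndefinedConversionError
  false
end

encodable_as?("caf\u00E9", "ISO-8859-1") # => true  (U+00E9 exists in ISO-8859-1)
encodable_as?("5 \u20AC", "ISO-8859-1")  # => false (U+20AC, the euro sign, does not)
```

Because it uses `encode` rather than `encode!`, the original string is never modified, so the check can run before deciding which encoding to send.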

Note this uses encode, which actually transcodes the string using Encoding::Converter (i.e. reassigns the correct encoding byte pattern to the character representations of the string), unlike force_encoding, which just changes the encoding flag (i.e. tells Ruby to interpret the string’s byte stream according to the set encoding).
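The difference can be seen by looking at the bytes. This is a small illustrative sketch; the variable names are made up for the example.

```ruby
utf8 = "\u00E9" # "é" as UTF-8: two bytes, 0xC3 0xA9

# encode transcodes: the character stays the same, the bytes change.
transcoded = utf8.encode("ISO-8859-1")
transcoded.bytes.to_a # => [233] (0xE9, a single byte)

# force_encoding only relabels: the bytes stay the same, the characters change.
relabeled = utf8.dup.force_encoding("ISO-8859-1")
relabeled.bytes.to_a      # => [195, 169]
relabeled.length          # => 2 (read as two one-byte characters)
relabeled.valid_encoding? # => true
```

Since every byte value is a valid ISO-8859-1 character, `valid_encoding?` on a relabeled string returns true here even though the text has been misread.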

2
votes

Ruby's core library provides the Encoding class and the nested class Encoding::Converter; they are probably your best friends in this case.

#!/usr/bin/env ruby
# encoding: utf-8

converter = Encoding::Converter.new("UTF-8", "ISO-8859-1")
converted = converter.convert("é")

puts converted.encoding
# => ISO-8859-1

puts converted.dump
# => "\xE9"
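
For a character that ISO-8859-1 cannot represent, `convert` raises `Encoding::UndefinedConversionError`, which can drive the fallback described in the question. The sketch below is one possible shape, not a definitive implementation: `sms_payload` is a hypothetical helper name, and UTF-16BE is only an assumed stand-in for whichever 16-bit encoding the SMPP setup actually expects.

```ruby
# Hypothetical helper: returns [converted_string, encoding_name].
# Tries ISO-8859-1 first and falls back to a 16-bit encoding.
def sms_payload(text)
  latin1 = Encoding::Converter.new("UTF-8", "ISO-8859-1")
  [latin1.convert(text), "ISO-8859-1"]
rescue Encoding::UndefinedConversionError
  # Assumption: UTF-16BE as the 16-bit fallback encoding.
  [text.encode("UTF-16BE"), "UTF-16BE"]
end

payload, name = sms_payload("caf\u00E9")
payload.bytesize # => 4 (one byte per character)

payload, name = sms_payload("5 \u20AC")
name             # => "UTF-16BE"
payload.bytesize # => 6 (two bytes per character)
```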
2
votes

Is valid_encoding? (instance method of String) useful? That is:

# dup so that force_encoding (which only relabels the bytes) leaves str untouched
try_str = str.dup.force_encoding("ISO-8859-1")
str = try_str if try_str.valid_encoding?

-1
votes

To convert to ISO-8859-1, you can use code like the following.

1.9.3p194 :002 > puts "é".force_encoding("ISO-8859-1").encode("UTF-8")
é
 => nil 

Linked Answer

-1
votes

"Some String".force_encoding("ISO-8859-1")

You can also refer to the Rails encoding link.