4
votes

I've been with Ruby for about a year now and have a language question: are symbols necessary because Ruby strings are mutable and not interned?

In, say, Java, strings are immutable and interned. So "foo" is always equal to "foo" in value and reference and its value cannot change. In Ruby, strings are mutable and not interned, so "a".object_id == "a".object_id will be false.

If Ruby had implemented strings like Java, symbols wouldn't be necessary, right?

4
Sidenote: As of 2.3 you can supply a flag for immutable String literals, RUBYOPT=--enable-frozen-string-literal, which will make all literal strings (e.g. "This") frozen and immutable. This change is currently planned to be the default for Ruby 3.0, but it does not dispose of symbols. Symbols have their own place in Ruby beyond "string" functionality; take Symbol#to_proc for example, which is extremely popular syntax. How would one deal with this as a string? – engineersmnky
This comment by @engineersmnky is the best answer here so far and I believe it should be converted into an answer. Symbol#to_proc is worth pretty much everything in Ruby. – Aleksei Matiushkin
@mudasobwa adapted to an answer with a bit more context. – engineersmnky

4 Answers

7
votes

As of Ruby 2.3, immutable Strings have been implemented optionally via the RUBYOPT flag --enable-frozen-string-literal, i.e.

RUBYOPT=--enable-frozen-string-literal ruby /some/file

This will cause all static String literals (strings created using "", %q{}, or %Q{} styles) to become immutable. This feature is currently being considered as the default for Ruby 3.0; follow along with Feature #11473. The feature is also available on a file level rather than a global level as a "magic comment":

# frozen_string_literal: true

This has the same impact as the RUBYOPT version but applies only to that specific file. (One other way is to interact with the VM directly: RubyVM::InstructionSequence.compile_option = { frozen_string_literal: true }.)

Since this is optional, it can be turned on and off, and it will still be an option in 3.0, just defaulting to on instead of off. Mutable Strings can still be created using String.new, and immutable Strings can be duped to get a mutable counterpart. (Note: a literal built with interpolation, e.g. "#{}", is constructed at runtime and therefore remains mutable even with this option enabled.)
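
A minimal sketch of these behaviors, assuming Ruby 2.5+ for the FrozenError class (older versions raise RuntimeError instead):

# frozen_string_literal: true

literal = "hello"
puts literal.frozen?              #=> true

begin
  literal << " world"             # mutating a frozen literal raises
rescue => e
  puts e.class                    #=> FrozenError (RuntimeError before Ruby 2.5)
end

puts String.new("hello").frozen?  #=> false -- String.new always yields a mutable string
puts "hello".dup.frozen?          #=> false -- dup of a frozen literal is mutable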

All that being said, it does not replace the need for Symbols in Ruby. First of all, the underlying C that powers Ruby leverages Symbols heavily via rb_intern to handle references for things like method definitions (these have been titled "Immortal Symbols" and will never be GCed).

Additionally, Symbols, like all things in Ruby, are their own Object and have their own useful set of functionality. Take Symbol#to_proc for example. This originated as a monkey-patch solution for syntactic ease and was absorbed into core in 1.8.7. This style is highly encouraged and regularly leveraged by the Ruby community as a whole. Consider how you would replicate this feature with a String instead of a Symbol.
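
To make the Symbol#to_proc point concrete, here is a small sketch of the shorthand (nothing below is specific to this answer beyond the standard &: syntax):

words = %w[foo bar baz]

# Explicit block form
words.map { |w| w.upcase }   #=> ["FOO", "BAR", "BAZ"]

# Symbol#to_proc shorthand: &:upcase calls :upcase.to_proc, which behaves
# like ->(obj) { obj.upcase }
words.map(&:upcase)          #=> ["FOO", "BAR", "BAZ"]

# String defines no #to_proc, so there is no direct String equivalent:
# words.map(&"upcase")       # raises TypeError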

Symbols used to be considered somewhat "dangerous" (for lack of a better word) due to their internment and memory consumption in combination with the dynamic nature of Ruby. However, as of Ruby 2.2 most Symbols can be garbage collected, i.e. symbols created inside of Ruby through String internment (#intern, #to_sym, etc.). (These have been coined "Mortal Symbols".)

Minor caveats include things like

 define_method(param[:meth].to_sym) {}

Because it calls to_sym, this looks like it should create a "Mortal Symbol", but since define_method calls rb_intern to keep the method reference, it actually creates an "Immortal Symbol".
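
A rough way to observe the mortal/immortal distinction (a sketch only; exact counts vary by Ruby version, GC timing, and whatever else is loaded):

before = Symbol.all_symbols.size

# Dynamic symbols with no remaining references are "mortal" and eligible for GC
1_000.times { |i| "temp_#{i}".to_sym }
GC.start
puts Symbol.all_symbols.size - before   # far fewer than 1_000 once the mortal symbols are reclaimed

# A symbol pinned by rb_intern through define_method is "immortal" and never collected
class Example
  define_method("generated_method".to_sym) { :ok }
end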

Hopefully this rundown helps explain the necessity of Symbol in Ruby, not only from a developer standpoint but also in light of its heavy usage in the C internals of Ruby's implementation.

2
votes

Pretty much, yes. My understanding is that Ruby keeps a table of symbols, which can grow dynamically. This is why you must never accept user input and convert it unchecked into symbols: before symbols became collectable, every unique string interned this way stayed in memory forever, enabling what's called a symbol overflow attack.
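
For example, a defensive sketch (the whitelist below is hypothetical; the point is simply to avoid interning arbitrary input):

ALLOWED_SORT_KEYS = %w[name created_at updated_at].freeze  # hypothetical whitelist

def sort_key(user_input)
  # Only intern values we already know about, so an attacker cannot
  # grow the symbol table with arbitrary strings.
  raise ArgumentError, "unknown sort key" unless ALLOWED_SORT_KEYS.include?(user_input)

  user_input.to_sym
end

sort_key("name")          #=> :name
# sort_key("a" * 1_000)   # raises ArgumentError instead of creating a new symbol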

I believe the symbol overflow vulnerability was patched in Ruby 2.2.

see Getting warning : Denial of Service

2
votes

Java-like strings would replace the functionality of symbols, so in that sense you are correct. However, I don't think Matz would be happy with a language which only has immutable and interned strings.

With strings and symbols, Ruby offers the best of both worlds. Symbols provide memory-efficiency for read-only strings like hash keys, whereas mutable strings are memory-efficient for string operations like concatenation.
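
A small sketch of the two memory profiles described above:

# Every occurrence of a symbol literal is the same object, so symbol
# hash keys don't allocate a fresh string for each lookup.
:user_id.object_id == :user_id.object_id   #=> true

h = { user_id: 42 }
h[:user_id]                                #=> 42

# Mutable strings allow in-place concatenation: << appends to the existing
# buffer instead of allocating a new string the way + does.
buffer = String.new("log: ")
buffer << "step 1; " << "step 2"
buffer                                     #=> "log: step 1; step 2"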

So maybe "if Ruby had implemented strings like Java" isn't the right train of thought. Ruby did implement strings like Java. And they're called "symbols." Then Ruby implemented a second type of string, which it calls a "string." The naming is purely aesthetic, but I think it makes sense.

2
votes

I've been with Ruby for about a year now and have a language question: are symbols necessary because Ruby strings are mutable and not interned?

No.

Symbol and String are simply two different data types. String is for text, Symbol is for labels.
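
A small illustration of the label/text distinction:

# A label: every mention of :pending is the same immutable object.
:pending.equal?(:pending)   #=> true
:pending.frozen?            #=> true

# Text: data that gets built up, sliced, and mutated.
status = "pending"
status.upcase!              # in-place mutation makes sense for text
status                      #=> "PENDING"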

In, say, Java, strings are immutable and interned.

No, they are not. They are immutable, and sometimes interned, sometimes not. If Strings were always interned, then why is there a method java.lang.String.intern() which interns a String? Strings in Java are only interned if

  • you call java.lang.String.intern() or
  • the String is the result of a String literal expression or
  • the String is the result of a String-typed constant value expression

Otherwise, they are not.

So "foo" is always equal to "foo" in value and reference and its value cannot change.

Again, this is not true:

class Test {
  public static void main(String... args) {
    System.out.println("foo".equals(args[0]));
    System.out.println("foo" == args[0]);
  }
}

Call it with

java Test foo
# true
# false

In Ruby, strings are mutable and not interned, so "a".object_id == "a".object_id will be false.

In modern Ruby, that is not necessarily true either:

# frozen_string_literal: true
"a".object_id == "a".object_id
#=> true

If Ruby had implemented strings like Java, symbols wouldn't be necessary, right?

No. Like I said, they are different types for different use cases.

Take a look at Scala, for example, which implements "strings like Java" (in fact, in the JVM implementation of Scala there is no separate String type; Scala's String simply is java.lang.String). Yet it also has a Symbol class.

Likewise, Clojure has not one but two datatypes like Ruby's Symbol: keywords are exactly equivalent to Ruby's Symbols; they evaluate to themselves and stand only for themselves. Symbols, on the other hand, may stand for something else.

Erlang has immutable strings and atoms, which are like Clojure/Lisp symbols.

ECMAScript has immutable strings and recently added a Symbol datatype. They are not 100% equivalent to Ruby Symbols, though, since they have an additional guarantee: not only do they evaluate only to themselves and stand only for themselves, but they are also unforgeable (meaning it is impossible to create a Symbol which is equal to another Symbol).

Note that Ruby is moving away from mutable strings:

  • Ruby 2.1 optimizes the pattern 'literal string'.freeze to return a frozen string from a global string pool (see the sketch after this list).
  • Ruby 2.3 introduces the # frozen_string_literal: true pragma and --enable=frozen-string-literal feature toggle switch to make all string literals frozen (and pooled) by default on a per-script (pragma) or per-process (feature toggle) basis.
  • Ruby 3 is planned to switch the default for both of those to true, so that you have to explicitly say # frozen_string_literal: false or --disable=frozen-string-literal in order to get the current behavior.
  • Some later version will remove support for mutable strings altogether.
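
As a sketch of the first step in that list, the 'literal'.freeze deduplication (the path string below is arbitrary):

# Ruby 2.1+: the 'literal'.freeze pattern returns a frozen string from a
# process-wide pool, so both expressions below yield the very same object.
'config/path'.freeze.equal?('config/path'.freeze)   #=> true

# Without .freeze (and without the frozen_string_literal pragma),
# each literal still allocates a fresh mutable string.
'config/path'.equal?('config/path')                 #=> false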