Why is 'pure polymorphism' preferable over using RTTI?

Question

Almost every C++ resource I've seen that discusses this kind of thing tells me that I should prefer polymorphic approaches to using RTTI (run-time type identification). In general, I take this kind of advice seriously, and will try and understand the rationale -- after all, C++ is a mighty beast and hard to understand in its full depth. However, for this particular question, I'm drawing a blank and would like to see what kind of advice the internet can offer. First, let me summarize what I've learned so far, by listing the common reasons that are quoted why RTTI is "considered harmful":

Some compilers don't use it / RTTI is not always enabled

I really don't buy this argument. It's like saying I shouldn't use C++14 features, because there are compilers out there that don't support it. And yet, no one would discourage me from using C++14 features. The majority of projects will have influence over the compiler they're using, and how it's configured. Even quoting the gcc manpage:

-fno-rtti

Disable generation of information about every class with virtual functions for use by the C++ run-time type identification features (dynamic_cast and typeid). If you don't use those parts of the language, you can save some space by using this flag. Note that exception handling uses the same information, but G++ generates it as needed. The dynamic_cast operator can still be used for casts that do not require run-time type information, i.e. casts to "void *" or to unambiguous base classes.

What this tells me is that if I'm not using RTTI, I can disable it. That's like saying, if you're not using Boost, you don't have to link to it. I don't have to plan for the case where someone is compiling with -fno-rtti. Plus, the compiler will fail loud and clear in this case.

It costs extra memory / Can be slow

Whenever I'm tempted to use RTTI, that means I need to access some kind of type information or trait of my class. If I implement a solution that does not use RTTI, this usually means I will have to add some fields to my classes to store this information, so the memory argument is kind of void (I'll give an example of this further down).

A dynamic_cast can be slow, indeed. There's usually ways to avoid having to use it speed-critical situations, though. And I don't quite see the alternative. This SO answer suggests using an enum, defined in the base class, to store the type. That only works if you know all your derived classes a-priori. That's quite a big "if"!

From that answer, it seems also that the cost of RTTI is not clear, either. Different people measure different stuff.

Elegant polymorphic designs will make RTTI unnecessary

This is the kind of advice I take seriously. In this case, I simply can't come up with good non-RTTI solutions that cover my RTTI use case. Let me provide an example:

Say I'm writing a library to handle graphs of some kind of objects. I want to allow users to generate their own types when using my library (so the enum method is not available). I have a base class for my node:

class node_base
{
  public:
    node_base();
    virtual ~node_base();

    std::vector< std::shared_ptr<node_base> > get_adjacent_nodes();
};

Now, my nodes can be of different types. How about these:

class red_node : virtual public node_base
{
  public:
    red_node();
    virtual ~red_node();

    void get_redness();
};

class yellow_node : virtual public node_base
{
  public:
    yellow_node();
    virtual ~yellow_node();

    void set_yellowness(int);
};

Hell, why not even one of these:

class orange_node : public red_node, public yellow_node
{
  public:
    orange_node();
    virtual ~orange_node();

    void poke();
    void poke_adjacent_oranges();
};

The last function is interesting. Here's a way to write it:

void orange_node::poke_adjacent_oranges()
{
    auto adj_nodes = get_adjacent_nodes();
    foreach(auto node, adj_nodes) {
        // In this case, typeid() and static_cast might be faster
        std::shared_ptr<orange_node> o_node = dynamic_cast<orange_node>(node);
        if (o_node) {
             o_node->poke();
        }
    }
}

This all seems clear and clean. I don't have to define attributes or methods where I don't need them, the base node class can stay lean and mean. Without RTTI, where do I start? Maybe I can add a node_type attribute to the base class:

class node_base
{
  public:
    node_base();
    virtual ~node_base();

    std::vector< std::shared_ptr<node_base> > get_adjacent_nodes();

  private:
    std::string my_type;
};

Is std::string a good idea for a type? Maybe not, but what else can I use? Make up a number and hope no one else is using it yet? Also, in the case of my orange_node, what if I want to use the methods from red_node and yellow_node? Would I have to store multiple types per node? That seems complicated.

Conclusion

This examples doesn't seem overly complex or unusual (I'm working on something similar in my day job, where the nodes represent actual hardware that gets controlled through the software, and which do very different thing depending on what they are). Yet I wouldn't know a clean way of doing this with templates or other methods. Please note that I'm trying to understand the problem, not defend my example. My reading of pages such as the SO answer I linked above and this page on Wikibooks seem to suggest I'm misusing RTTI, but I would like to learn why.

So, back to my original question: Why is 'pure polymorphism' preferable over using RTTI?

What you're "missing" (as a language feature) to solve your poke oranges example would be multiple dispatch ("multimethods"). Thus, looking for ways to emulate that could be an alternative. Usually, the visitor pattern is used therefore. — Daniel Jour
Using a string as type isn't very helpful. Using an pointer to an instance of some "type" class would make this faster. But then you're basically doing manually what RTTI would be doing. — Daniel Jour
@MargaretBloom No, it won't, RTTI stands for Runtime Type Information while CRTP is only for templates -- static types, so. — edmz
@mbr0wn : all engineering processes are bound by some rules; programming is not an exception. The rules can be split into two buckets: soft rules (SHOULD) and hard rules (MUST). (There's also an advice/option bucket (COULD), so to speak.) Read how the C/C++ standard (or any other eng. standard, for the fact) defines those. I guess your problem comes from the fact you've mistaken "don't use RTTI" as a hard rule ("you MUSN'T use RTTI"). It's actually a soft rule ("you SHOULDN'T use RTTI"), meaning that you should avoid it whenever possible - and just use when you can't avoid doing so — user719662
I note lots of answers don't note the idea that your example suggests node_base is part of a library and users will make their own node types. Then they can't modify node_base to allow another solution, so maybe RTTI becomes their best option then. On the other hand, there are other ways to design such a library so that new node types can fit in much more elegantly without needing to use RTTI (and other ways to design the new node types, too). — Matthew Walton

Yakk - Adam Nevraumont Yakk - Adam Nevraumont · Accepted Answer · 2016-03-03T09:22:38

An interface describes what one needs to know in order to interact in a given situation in code. Once you extend the interface with "your entire type hierarchy", your interface "surface area" becomes huge, which makes reasoning about it harder.

As an example, your "poke adjacent oranges" means that I, as a 3rd party, cannot emulate being an orange! You privately declared an orange type, then use RTTI to make your code behave special when interacting with that type. If I want to "be orange", I must be within your private garden.

Now everyone who couples with "orangeness" couples with your entire orange type, and implicitly with your entire private garden, instead of with a defined interface.

While at first glance this looks like a great way to extend the limited interface without having to change all clients (adding am_I_orange), what tends to happen instead is it ossifies the code base, and prevents further extension. The special orangeness becomes inherent to the functioning of the system, and prevents you from creating a "tangerine" replacement for orange that is implemented differently and maybe removes a dependency or solves some other problem elegantly.

This does mean your interface has to be sufficient to solve your problem. From that perspective, why do you need to only poke oranges, and if so why was orangeness unavailable in the interface? If you need some fuzzy set of tags that can be added ad-hoc, you could add that to your type:

class node_base {
  public:
    bool has_tag(tag_name);

This provides a similar massive broadening of your interface from narrowly specified to broad tag-based. Except instead of doing it through RTTI and implementation details (aka, "how are you implemented? With the orange type? Ok you pass."), it does so with something easily emulated through a completely different implementation.

This can even be extended to dynamic methods, if you need that. "Do you support being Foo'd with arguments Baz, Tom and Alice? Ok, Fooing you." In a big sense, this is less intrusive than a dynamic cast to get at the fact the other object is a type you know.

Now tangerine objects can have the orange tag and play along, while being implementation-decoupled.

It can still lead to a huge mess, but it is at least a mess of messages and data, not implementation hierarchies.

Abstraction is a game of decoupling and hiding irrelevancies. It makes code easier to reason about locally. RTTI is boring a hole straight through the abstraction into implementation details. This can make solving a problem easier, but it has the cost of locking you into one specific implementation really easily.