Recursive data types in Python

Question

What might be the thing in Python that is closest to the to the recursive data types in Haskell? (i.e. using the type's own definition while defining itself.)

Edit:

To give a more concrete definition of a recursive type, below is a binary tree in Haskell:

data Tree a             = Leaf a | Branch (Tree a) (Tree a)

How I read this is like the following: A binary tree can either be a leaf, or can contain two sub-trees which are again the type tree itself.

For further information about recursive types in Haskell, you can refer to here: https://www.haskell.org/tutorial/goodies.html

What I actually had in mind was converting a word tree definition in Haskell to Python. This is the definition of the WordTree from an old project of mine:

data WordTree = Word String | Subword String [WordTree] | Root [WordTree]

A WordTree is a n-arytree structure where common prefixes of words are stored at the parents and remaining parts are stored at the leaf of the trees in a sorted manner. I believe this type definition is somewhat similar to a Trie. Yet, as Haskell is a functional programming language, it allows this type definition to be recursive. What might be the closest thing in Python (or maybe, in object-oriented programming, in general) for this kind of a definition of a type?

Can you perhaps give an example of such a thing in Haskell and maybe someone can show you how it's done in python — smac89
There isn't really anything terribly similar in Python, given the wildly different type system used. One could probably implement Haskell-style algebraic types with a suitable metaclass, but I don't think it's trivial. — chepner
Does this answer your question? Defining a recursive type hint in Python? — Elazar

Elazar Elazar · Accepted Answer · 2020-09-01T20:19:45

Since Python is dynamically typed, there is no issue defining whatever classes you need.

class Tree:
    left = None
    right = None
    def __init__(self, left, right):
        self.left = left
        self.right = right

Even if you are interested in typing these definitions, you can do that like in any other class-based object oriented language:

from typing import Union

class Tree:
    left: Union['Tree', int]
    right: Union['Tree', int]
    def __init__(self, left: Union['Tree', int], right: Union['Tree', int]) -> None:
        self.left = left
        self.right = right

Note the use of strings for the name of the type (which you can avoid in more recent Python versions).

See this open issue in mypy for direct recursive algebraic types such as

Tree = Union[Tuple['Tree', 'Tree'], int]

The most common (though not necessarily recommended) way of defining the WordTree you describe is using a superclass and a shallow hierarchy:

from typing import List, final

class WordTree: pass

@final
class Word(WordTree):
    word: str

@final
class Subword(WordTree):
    subword: str
    children: List[WordTree]

@final
class Root(WordTree):
    children: List[WordTree]

Using such an implementation might require using isinstance checks (though Python3.10 gives you nice sugar for those). Constructors are omitted in this example to avoid clutter; you might want to use dataclass to get them, and other kinds of behavior, easily.

To date, Python gives you no way to disallow unrelated classes from inheriting from WordTree, thus breaking some of the ability to statically reason about such programs.

Some other OOP languages, such as Scala and Kotlin and (soon) Java, can take such a definition (using sealed classes) and give you type checks and syntactic constructs that are similar to the ones given by functional languages such as Haskell.

For all I know, this kind of design is usually recommended only for pure-data classes, such as ASTs. It is less suited for defining user-facing container such as trie, since it exposes the inner workings of the data structure. So even if you go with that design, you might want to use it as an implementation detail, and use another class, Trie, to be used by client code through a well-defined API. That class can have a WordTree field, or any other way of implementing the same logic.

IMO this is essential to how object-oriented design differs from functional design. The latter focuses on data flow and on static reasoning, whereas the former focuses on APIs, extensibility and decoupling. I think this is helpful to note, when porting between languages and environments - though as noted above, some languages try to enable both design approaches.

Recursive data types in Python

1 Answers