Splitting type-classes and their instances to the different submodules in Haskell

Question

I am currently writing a small helper library and have run into the problem of one module's source code becoming really huge. Basically, I am declaring a new parametric type class and want to implement it for two different monad stacks.

I've decided to split the declaration of the type class and its implementations into different modules, but I keep getting warnings about orphan instances.

As far as I know, that can happen when it is possible to import a datatype without its instance, i.e. when they live in different modules. But I have both the type declaration and the instance implementation inside each module.

To simplify the whole example, here is what I have now. First is the module where I define the type class:

-- File ~/library/src/Lib/API.hs 
module Lib.API where

-- Lots of imports

class (Monad m) => MyClass m where
  foo :: String -> m () 
  -- More functions are declared

Then the module with the instance implementation:

-- File ~/library/src/Lib/FirstImpl.hs
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
module Lib.FirstImpl where

import Lib.API
import Data.IORef
import Control.Monad.Reader

type FirstMonad = ReaderT (IORef String) IO

instance MyClass FirstMonad where
  foo = undefined

Both of them are listed in my project's .cabal file. It is also impossible to use FirstMonad without the instance, because they are defined in the same file.

However, when I launch ghci using stack ghci lib, I get the following warning:

~/library/src/Lib/FirstImpl.hs:11:1: warning: [-Worphans]
    Orphan instance: instance MyClass FirstMonad
    To avoid this
        move the instance declaration to the module of the class or of the type, or
        wrap the type with a newtype and declare the instance on the new type.
   |
11 | instance MyClass FirstMonad where
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
Ok, two modules loaded

What am I missing, and is there any way to split type-class declarations and their implementations into different submodules?

Akihito KIRISAKI · Accepted Answer · 2020-12-22T11:53:55

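To avoid this, you can wrap the type in a newtype. FirstMonad is only a type synonym, so the instance is really declared for ReaderT (IORef String) IO, and neither that type nor MyClass is defined in Lib.FirstImpl: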

newtype FirstMonad a = FirstMonad (ReaderT (IORef String) IO a)
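
A minimal sketch of how the newtype version might look. The class is inlined here only to keep the example self-contained (in the real project it would stay in Lib.API). The body of foo and the helper runFirstMonad are hypothetical, and GeneralizedNewtypeDeriving is just one assumed way to recover the monad instances.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Reader (ReaderT, ask, runReaderT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Stand-in for the class from Lib.API, inlined to keep the sketch self-contained.
class Monad m => MyClass m where
  foo :: String -> m ()

-- A real newtype rather than a synonym: the type is now defined in this
-- module, so an instance declared next to it is not an orphan.
newtype FirstMonad a = FirstMonad (ReaderT (IORef String) IO a)
  deriving (Functor, Applicative, Monad, MonadIO)

-- Hypothetical behaviour for foo: append the argument to the shared string.
instance MyClass FirstMonad where
  foo s = FirstMonad $ do
    ref <- ask
    liftIO (modifyIORef' ref (++ s))

-- Hypothetical runner: unwrap the newtype, run the action against a fresh
-- IORef, and return the result together with the final string.
runFirstMonad :: FirstMonad a -> String -> IO (a, String)
runFirstMonad (FirstMonad m) initial = do
  ref <- newIORef initial
  result <- runReaderT m ref
  final <- readIORef ref
  pure (result, final)
```

The cost of the newtype is the explicit wrapping and unwrapping, which the derived instances and a small runner keep mostly out of sight.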

If, after careful consideration, you still feel you need orphan instances, you can suppress the warning:

{-# OPTIONS_GHC -fno-warn-orphans #-}
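
As a rough illustration of where the pragma goes, here is a small sketch with a deliberately orphaned instance. It uses -Wno-orphans, the newer spelling of the same flag, and the instance and the helper describe are made up for this example, not taken from the question.

```haskell
{-# OPTIONS_GHC -Wno-orphans #-}

-- Neither the class Show nor the function type (->) is defined in this
-- module, so this instance is an orphan. The pragma above silences the
-- -Worphans warning for this module only.
instance Show (a -> b) where
  show _ = "<function>"

-- Hypothetical helper that uses the orphan instance.
describe :: (Int -> Int) -> String
describe = show
```

Because the pragma sits at the top of a single file, other modules in the project still get the warning.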

Detail

Coherence

For example, consider the following definition:

data A = A

instance Eq A where
   ...

A type class can be regarded as type-based overloading. Given the above, the equality check (==) can be used at various types:

f :: Eq a => a -> a -> a -> Bool
f x y z = x == y && y == z

g :: A -> A -> A -> Bool
g x y z = x == y && y == z

In the definition of f, the type a is abstract and constrained by Eq, whereas in g the type A is concrete. In the former, the method comes from the constraint, but Haskell can also find the method in the latter. It does so by elaborating the program into a language that has no type classes. This technique is called dictionary passing.

class C a where
  m1 :: a -> a

instance C A where
  m1 x = x

f :: C a => a -> a
f = m1 . m1

It is converted into:

data DictC a = DictC
  { m1 :: a -> a
  }

instDictC_A :: DictC A
instDictC_A = DictC
  { m1 = \x -> x
  }

f :: DictC a -> a -> a
f d = m1 d . m1 d

As shown above, a data type called a dictionary is created for each type class, and a value of that type is passed around.
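
The class version and its dictionary-passing translation can be put side by side in one file, as a sketch. The primed names (m1', f') and the extra Int instance are hypothetical, added only so that both versions compile together and do something observable.

```haskell
data A = A deriving (Eq, Show)

-- Class-based version, as written by the programmer.
class C a where
  m1 :: a -> a

instance C A where
  m1 x = x

-- Hypothetical extra instance, so the two versions can be compared on numbers.
instance C Int where
  m1 x = x + 1

f :: C a => a -> a
f = m1 . m1

-- Hand-written dictionary-passing version of the same program.
data DictC a = DictC
  { m1' :: a -> a
  }

instDictC_A :: DictC A
instDictC_A = DictC { m1' = \x -> x }

instDictC_Int :: DictC Int
instDictC_Int = DictC { m1' = \x -> x + 1 }

-- The constraint `C a` has become an ordinary argument.
f' :: DictC a -> a -> a
f' d = m1' d . m1' d
```

Calling f at type Int and calling f' with instDictC_Int compute the same thing. The only difference is who chooses the dictionary: the compiler or the caller.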

Haskell has a restriction that a type may not be declared as an instance of a particular class more than once in the program. Without it, various problems would arise.

class C1 a where
  m1 :: a

class C1 a => C2 a where
  m2 :: a -> a

instance C1 Int where
  m1 = 0

instance C2 Int where
  m2 x = x + 1

f :: (C1 a, C2 a) => a
f = m2 m1

g :: Int
g = f

This code uses type-class inheritance (a superclass). It is elaborated into the following code.

data DictC1 a = DictC1
  { m1 :: a
  }

data DictC2 a = DictC2
  { superC1 :: DictC1 a
  , m2 :: a -> a
  }

instDictC1_Int :: DictC1 Int
instDictC1_Int = DictC1
  { m1 = 0
  }

instDictC2_Int :: DictC2 Int
instDictC2_Int = DictC2
  { superC1 = instDictC1_Int
  , m2 = \x -> x + 1
  }

f :: DictC1 a -> DictC2 a -> a
f d1 d2 = ???

g :: Int
g = f instDictC1_Int instDictC2_Int

Well, what should the definition of f be? Actually, there are two possible definitions:

f :: DictC1 a -> DictC2 a -> a
f d1 d2 = m2 d2 (m1 d1)

f :: DictC1 a -> DictC2 a -> a
f _ d2 = m2 d2 (m1 d1)
  where
    d1 = superC1 d2

Can you confirm that both type-check without problems? If Haskell allowed Int to be declared as an instance of C1 more than once, superC1 in DictC2 would be filled in during elaboration, and its value might well differ from the DictC1 a passed to f when g is called.
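
The disagreement between the two elaborations can be made concrete with explicit dictionaries. In this sketch the second dictionary instDictC1_Int' is hypothetical: it simulates a duplicate instance C1 Int, which real Haskell would reject. The names fDirect and fSuper are likewise invented to keep both elaborations of f in one file.

```haskell
data DictC1 a = DictC1
  { m1 :: a
  }

data DictC2 a = DictC2
  { superC1 :: DictC1 a
  , m2 :: a -> a
  }

instDictC1_Int :: DictC1 Int
instDictC1_Int = DictC1 { m1 = 0 }

-- Hypothetical second, conflicting "instance" of C1 for Int.
instDictC1_Int' :: DictC1 Int
instDictC1_Int' = DictC1 { m1 = 100 }

-- The C2 dictionary was built against the first C1 dictionary.
instDictC2_Int :: DictC2 Int
instDictC2_Int = DictC2
  { superC1 = instDictC1_Int
  , m2 = \x -> x + 1
  }

-- Elaboration 1: use the C1 dictionary that was passed in directly.
fDirect :: DictC1 a -> DictC2 a -> a
fDirect d1 d2 = m2 d2 (m1 d1)

-- Elaboration 2: ignore it and use the superclass dictionary stored in C2.
fSuper :: DictC1 a -> DictC2 a -> a
fSuper _ d2 = m2 d2 (m1 (superC1 d2))

-- Both calls receive the same arguments, yet the results differ.
gDirect, gSuper :: Int
gDirect = fDirect instDictC1_Int' instDictC2_Int
gSuper = fSuper instDictC1_Int' instDictC2_Int
```

Both versions of f are well typed, so the type checker cannot tell them apart. They only agree when there is exactly one dictionary per class and type.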

Let's look at another example:

h :: (Int, Int)
h = (m1, m1)

Of course, there is only one elaboration:

h :: (Int, Int)
h = (m1 instDictC1_Int, m1 instDictC1_Int)

But if an instance could be defined more than once, the following elaboration would also be possible:

h :: (Int, Int)
h = (m1 instDictC1_Int, m1 instDictC1_Int')

Here two different instances are applied at the same type. For example, calling the same function twice could return different values computed by different algorithms.

That example is a little exaggerated, but how about the next one?

instance C1 Int where
  m1 = 0

h1 :: Int
h1 = m1

instance C1 Int where
  m1 = 1

h2 :: (Int, Int)
h2 = (m1, h1)

In this case, it is quite possible that m1 in h1 and m1 in h2 use different instances. Haskell code is often transformed by equational reasoning, so it is a problem if h1 cannot simply be replaced by m1.
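
A sketch of why the substitution breaks, again using explicit dictionaries in place of the duplicate instances, which real Haskell would reject. The names instFirst, instSecond and h2Inlined are hypothetical. The sketch assumes h1 was resolved against the first instance and everything after it against the second.

```haskell
data DictC1 a = DictC1
  { m1 :: a
  }

-- Stand-ins for the two conflicting `instance C1 Int` declarations.
instFirst, instSecond :: DictC1 Int
instFirst = DictC1 { m1 = 0 }
instSecond = DictC1 { m1 = 1 }

-- h1 was elaborated while only the first instance was in view.
h1 :: Int
h1 = m1 instFirst

-- h2 is elaborated after the second instance appeared.
h2 :: (Int, Int)
h2 = (m1 instSecond, h1)

-- h2 after replacing h1 by its source definition `m1` and resolving the
-- instance again at this point in the program.
h2Inlined :: (Int, Int)
h2Inlined = (m1 instSecond, m1 instSecond)
```

Under these assumptions h2 and h2Inlined evaluate to different pairs, so a rewrite that looks harmless in the source changes the meaning of the program.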

In general, resolving type-class instances is part of the type system: instances are resolved during type checking, and the code is elaborated from the derivation tree built while checking types. This kind of transformation is also used for features other than type classes, such as implicit type conversion and record types, and those cases can suffer from the same problem. The problem can be stated as follows:

When typing derivations are translated into the target language, two different derivations of the same typing may yield translations that are not semantically equivalent.

As stated, whichever matching instance is applied, the program generally still passes type checking. However, the result of elaboration with one instance may differ from the result of elaboration with another. Conversely, if this problem cannot occur, the type system gives us a certain guarantee. That guarantee, i.e. the property of a type system together with its elaboration that the problem formalized above does not arise, is generally called coherence. There are several ways to guarantee coherence; Haskell does it by limiting the number of instance definitions per type and class to one.

Orphan Instance

What Haskell does is easy to state, but it has some issues. A well-known one is the orphan instance. When a type T is declared as an instance of a class C, GHC treats the instance differently depending on whether the declaration is in the same module as the declaration of T or of C. An instance that is in neither module is called an orphan instance, and GHC warns about it. Why does it work this way?

First, in Haskell, instances propagate implicitly between modules. This is stipulated as follows:

All instances in scope within a module are always exported and any import brings all instances in from the imported module. Thus, an instance declaration is in scope if and only if a chain of import declarations leads to the module containing the instance declaration. --5 Modules

We can neither stop nor control this. Since Haskell decided that a type can be an instance of a class only once, normally we don't need to worry about it. Still, because this rule exists, a Haskell compiler must resolve instances according to it. The compiler does not know in advance which modules contain instances, so in the worst case it must check all modules. The rule also troubles us as programmers: if two important modules each define an instance for the same type, every module whose import chain includes both of them becomes unusable because of the conflict.

Now, to use a type as an instance of a class we need information about both, so the compiler has to look at the modules that declare them anyway. Moreover, a third party is not going to tamper with those modules. Therefore, if one of those two modules contains the instance declaration, the compiler sees the instance together with the information it needs, and we get the pleasant guarantee that if the modules can be loaded, there are no conflicts. For that reason, it is recommended to place an instance in the same module as the declaration of the type or of the class, and conversely to avoid orphan instances as much as possible. Hence, if you want a type to have an independent instance, the recommended approach is to create a new type with newtype, solely to change the semantics of the instance, and to declare the instance for that new type.
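
The newtype approach can be sketched with a wrapper that only changes how values are ordered. The names Descending and sortDescending are hypothetical. The base library ships a general-purpose version of this idea as Data.Ord.Down.

```haskell
import Data.List (sort)

-- Same representation as Int, but a type of our own, so we are free to give
-- it a different Ord instance without creating an orphan or a conflict.
newtype Descending = Descending Int
  deriving (Eq, Show)

-- The arguments are flipped: larger numbers compare as "smaller".
instance Ord Descending where
  compare (Descending x) (Descending y) = compare y x

-- Wrap, sort with the new instance, and unwrap again.
sortDescending :: [Int] -> [Int]
sortDescending xs = [x | Descending x <- sort (map Descending xs)]
```

The existing Ord Int instance is untouched, so code that sorts plain Int values keeps its usual ascending order.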

In addition, GHC internally marks modules that contain orphan instances, and such modules are listed in the interface files of the modules that depend on them. The compiler then has to consult every module on that list. So once an orphan instance exists, the interface file of the module containing it is reloaded whenever any module that depends on it is recompiled, whatever the change. Orphan instances are therefore bad for compile time.

Detail is under CC BY-SA 4.0 (C) Mizunashi Mana

Original is 続くといいな日記 – 型クラスの Coherence と Orphan Instance

2020-12-22 revised and translated by Akihito Kirisaki
