I'm trying out the attached code for linear regression with automatic differentiation. It specifies a datatype [Dual][2] made of two Doubles, and declares it to be an instance of Num, Fractional and Floating. As in all fitting/regression tasks, there's a scalar cost function, parametrized by the fitting parameters c and m, and an optimizer which improves estimates of these two parameters by gradient descent.
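For context, here is my mental model of the representation (just a sketch of my own, not from the source; demoSquare is my name): a value Dual x x' behaves like the dual number x + x'·ε with ε² = 0, so the (*) of the Num instance is exactly the product rule, (x + x'ε)(y + y'ε) = xy + (xy' + yx')ε:

    -- Sketch (assumes the Dual type, idD and the instances from the
    -- listing below): differentiate x*x at x = 3 by seeding the
    -- derivative field with 1.
    demoSquare :: (Double, Double)
    demoSquare = let Dual v d = idD 3 * idD 3
                 in (v, d)   -- expect (9.0, 6.0), since d/dx (x^2) = 2x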
**Question:** I'm using GHC 7.8.3, and the authors explicitly mention that this is H98 code (I mention it in the title because it's the only substantial difference I can think of between my setup and the authors'; please correct me if I'm wrong). Why does it choke within the definition of the cost function?

My understanding is: the functions idD and constD map Doubles to Duals, g is polymorphic (it can perform algebraic operations on Dual inputs because Dual has instances of Num, Fractional and Floating), and deriv maps Duals to Doubles. The type signature for g (the cost function eta-reduced with respect to the data) was inferred. I tried omitting it, and I tried making it more general by substituting a Floating constraint for the Fractional one. Moreover, I tried converting the numeric types of c and m inline with (fromIntegral c :: Double), to no avail.
Specifically, this code gives the following error:
    No instance for (Integral Dual) arising from a use of ‘g’
    In the first argument of ‘flip’, namely ‘g’
    In the expression: flip g (constD c)
    In the second argument of ‘($)’, namely ‘flip g (constD c) $ idD m’
Any hints, please? I'm sure it's a very noob question, but I just don't get it.
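For reference, this is the kind of check that convinced me the plumbing works on the simple example (my own sketch; sanityCheck is my name, f is the example from the listing below and the second component is the hand derivative f' given in its comment):

    -- AD derivative vs. hand derivative of f = sqrt . (* 3) . sin;
    -- the two components should agree for any valid x, e.g. sanityCheck 2.
    sanityCheck :: Double -> (Double, Double)
    sanityCheck x = ( deriv (f (idD x))                     -- via Dual
                    , 3 * cos x / (2 * sqrt (3 * sin x)) )  -- f' by hand
      where f = sqrt . (* 3) . sin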
The complete code is the following:
    {-# LANGUAGE NoMonomorphismRestriction #-}

    module ADfw (Dual(..), f, idD, cost) where

    data Dual = Dual Double Double deriving (Eq, Show)

    constD :: Double -> Dual
    constD x = Dual x 0

    idD :: Double -> Dual
    idD x = Dual x 1.0

    instance Num Dual where
      fromInteger n = constD $ fromInteger n
      (Dual x x') + (Dual y y') = Dual (x + y) (x' + y')
      (Dual x x') * (Dual y y') = Dual (x * y) (x * y' + y * x')
      negate (Dual x x') = Dual (negate x) (negate x')
      signum _ = undefined
      abs _ = undefined

    instance Fractional Dual where
      fromRational p = constD $ fromRational p
      recip (Dual x x') = Dual (1.0 / x) (- x' / (x * x))

    instance Floating Dual where
      pi = constD pi
      exp (Dual x x') = Dual (exp x) (x' * exp x)
      log (Dual x x') = Dual (log x) (x' / x)
      sqrt (Dual x x') = Dual (sqrt x) (x' / (2 * sqrt x))
      sin (Dual x x') = Dual (sin x) (x' * cos x)
      cos (Dual x x') = Dual (cos x) (x' * (- sin x))
      sinh (Dual x x') = Dual (sinh x) (x' * cosh x)
      cosh (Dual x x') = Dual (cosh x) (x' * sinh x)
      asin (Dual x x') = Dual (asin x) (x' / sqrt (1 - x*x))
      acos (Dual x x') = Dual (acos x) (x' / (- sqrt (1 - x*x)))
      atan (Dual x x') = Dual (atan x) (x' / (1 + x*x))
      asinh (Dual x x') = Dual (asinh x) (x' / sqrt (1 + x*x))
      acosh (Dual x x') = Dual (acosh x) (x' / sqrt (x*x - 1))
      atanh (Dual x x') = Dual (atanh x) (x' / (1 - x*x))

    -- example
    f = sqrt . (* 3) . sin
    -- f' x = 3 * cos x / (2 * sqrt (3 * sin x))

    -- linear fit sum-of-squares cost (see the note after this listing)
    -- cost :: Fractional s => s -> s -> [s] -> [s] -> s
    cost m c x y = (/ (2 * (fromIntegral $ length x))) $
                   sum $ zipWith errSq x y
      where
        errSq xi yi = zi * zi
          where
            zi = yi - (m * xi + c)

    -- test data
    x_ = [1..10]
    y_ = [a | a <- [1..20], a `mod` 2 /= 0]

    -- learning rate
    gamma = 0.04

    -- eta-reduced cost over the test data (this signature was inferred)
    g :: (Integral s, Fractional s) => s -> s -> s
    g m c = cost m c x_ y_

    -- extract the derivative field of a Dual
    deriv (Dual _ x') = x'

    -- infinite stream of gradient-descent iterates (c, m)
    z_ = (0.1, 0.1) : map h z_

    h (c, m) = (c - gamma * cd, m - gamma * md)
      where
        cd = deriv $ g (constD m) $ idD c
        md = deriv $ flip g (constD c) $ idD m

    -- check for convergence
    main = do
      print $ take 2 $ drop 1000 $ map (\(c, m) -> cost m c x_ y_) z_
      print $ take 2 $ drop 1000 z_
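For what it's worth, here is how I read the cost function, restated (my own restatement; costExplicit is my name for it); per the commented-out signature, a Fractional constraint is all it should need:

    -- cost m c xs ys = (1 / (2N)) * sum_i (y_i - (m*x_i + c))^2
    costExplicit :: Fractional s => s -> s -> [s] -> [s] -> s
    costExplicit m c xs ys =
      sum [ (yi - (m * xi + c)) ^ (2 :: Int) | (xi, yi) <- zip xs ys ]
        / (2 * fromIntegral (length xs))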
In the listing, the test data x_ and y_ are lists and the learning rate gamma is a scalar.
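And my reading of h is one step of plain gradient descent, c ← c − γ·∂cost/∂c and m ← m − γ·∂cost/∂m, where each partial is obtained by seeding the differentiated argument with idD (derivative field 1) and freezing the other with constD (derivative field 0). For sanity-checking the AD result I derived the analytic gradient by hand (gradCost is my own helper, not part of the attached code):

    -- Analytic gradient of the same cost, returned as (dcost/dm, dcost/dc).
    gradCost :: Double -> Double -> [Double] -> [Double] -> (Double, Double)
    gradCost m c xs ys = (gm / n, gc / n)
      where
        n  = fromIntegral (length xs)
        rs = [ m * xi + c - yi | (xi, yi) <- zip xs ys ]   -- residuals
        gm = sum (zipWith (*) rs xs)   -- dcost/dm
        gc = sum rs                    -- dcost/dc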
[2]: The two fields of a Dual object are in fact adjoint to one another, if we view the derivative as an operator.