2 votes

I am trying to write a machine learning library in Haskell, to work on my Haskell skills. I thought about a general design involving a class which is like so:

  class Classifier classifier where
    train :: X -> y -> trainingData
    classify :: trainingData -> x -> y

For example, given a set of examples X, and their true labels y, train returns trainingData which is used in the classify function.

So, if I want to implement KNN, I would do it like so:

data KNN = KNN Int (Int -> Int -> Float) 

where the first Int is the number of neighbors and the function is the metric that calculates the distance between two vectors.

  instance Classifier KNN where
    -- This is where I am stuck

How can I implement the Classifier type class functions so they are generic over all of the classifiers I will create? I feel like I am treating Haskell too much like an imperative, OOP-style language, and I'd like to do this the Haskell way.

2
It sounds like you're starting at the problem from the wrong end. Can you post some type signatures of some actual classify and train functions for the various classifiers you plan on creating? Then it will probably be obvious if, why and how to abstract things. – jberryman

2 Answers

4 votes

I would say you need multi-parameter type classes (with optional functional dependencies, or type families; I omit those).

 {-# LANGUAGE MultiParamTypeClasses #-}

 class Classifier c s l k where
   train    :: c -> [(s, l)] -> k
   classify :: c -> k -> s -> l
   combine  :: c -> k -> k -> k

There is a four-way relationship between the classifier, sample, label and knowledge types.

The train method derives some knowledge (k) from a set of sample (s) / label (l) pairs. The classify method uses that knowledge to infer a label for a sample. (The combine method joins two pieces of knowledge together; it may not apply to every classifier.)
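To make this concrete, here is a minimal sketch of an instance of the four-parameter class. The classifier (hypothetically named Majority here, not from the question) ignores the samples entirely: its "knowledge" is just the list of labels it has seen, and it always predicts the most frequent one.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}

import Data.List (group, sort, sortOn)

class Classifier c s l k where
  train    :: c -> [(s, l)] -> k
  classify :: c -> k -> s -> l
  combine  :: c -> k -> k -> k

-- Hypothetical toy classifier: knowledge is the bag of labels seen so far.
data Majority = Majority

instance Ord l => Classifier Majority s l [l] where
  -- Training just collects the labels and discards the samples.
  train _ = map snd
  -- Classification ignores the sample and returns the most common label.
  classify _ ls _ = head (last (sortOn length (group (sort ls))))
  -- Two bags of labels combine by concatenation.
  combine _ = (++)
```

Note that the caller may need a type annotation on the result of train, since nothing else pins down the knowledge type k.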

3 votes

Assuming your type class has no knowledge of what a classifier provides, you could do something like

class Classifier c where
  train :: [x] -> [y] -> c -> [(x,y)]
  classify :: [(x,y)] -> c -> x -> y

Here, train takes a list of samples of type x, a list of labels of type y, and a classifier of some type c, and returns a list of sample/label pairs.

classify takes a list of sample/label pairs (such as that produced by train), the classifier, and a sample, and produces a new label.

(At the very least, though, I'd probably replace [(x,y)] with something like Map x y.)

The key is that the classifier itself needs to be used by both train and classify, although you don't need to know what that would look like at this time.

Your instance for KNN could then look like

instance Classifier KNN where
  train samples labels (KNN n f) = ...
  classify td (KNN n f) sample = ...

Here, n and f can be used both to create the training data, and to help pick the closest member of the training data for a sample point.
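One caveat: in the class above, x and y are quantified afresh in each method signature, so an instance would have to work for every sample type and could never apply KNN's Int-specific metric. A sketch of one way to pin the sample and label types to the classifier, while keeping the same shape, is an associated type family; the String label type, the k = 3 setting, and the absolute-difference metric below are illustrative assumptions, not from the question.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.List (group, sort, sortOn)

class Classifier c where
  type Sample c
  type Label c
  train    :: [Sample c] -> [Label c] -> c -> [(Sample c, Label c)]
  classify :: [(Sample c, Label c)] -> c -> Sample c -> Label c

-- Number of neighbors, plus a distance metric on samples.
data KNN = KNN Int (Int -> Int -> Float)

instance Classifier KNN where
  type Sample KNN = Int
  type Label KNN  = String  -- assumption: labels are strings

  -- Training data is just the paired-up samples and labels.
  train xs ys _ = zip xs ys

  -- Find the n nearest neighbors under the metric f,
  -- then return their most common label.
  classify td (KNN n f) x =
    let nearest = take n (sortOn (f x . fst) td)
    in head (last (sortOn length (group (sort (map snd nearest)))))
```

A usage sketch: `classify (train [1,2,3] ["a","a","b"] knn) knn 2` picks the majority label among the 2's nearest neighbors. The type-family design means each classifier fixes its own sample and label types, which is usually what you want when the metric is type-specific.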