Creating a Semigroup data type instance in Haskell

Question

May goal concerned with creating a new Semigroup type class instance for newly defined datatype in Haskell (For those who knows the "Get programming with Haskell" book by Will Kurt, I may refer you to the page 428,i.e. the end of the capstone project 5 with exercise extension).

There is a newly defined datatype:

data HINQ m a b = HINQ (m a -> m b) (m a) (m a -> m a)
                | HINQ_ (m a -> m b) (m a)

This datatype designates the SQL-like query, where m defines the context (Monad or Alternative), (m a -> m b) is the function whose goal is similar to SQL function SELECT,i.e. defines the property kind one wants to see in a database, (m a) is a "table" to which applies the previous function (similar to SQL's table_name) and finally (m a -> m a) filters out the property one is seeking for (similar to SQL's WHERE).

My goal is to make this datatype an instance of a Semigroup (and finally a Monoid). It is worth mentioning that all needed Semigroup instances for a, b etc are assumed.

instance (Semigroup a, Semigroup (m a), Semigroup b,...) => 
          Semigroup (HINQ m a b) where
  (<>) (HINQ func1 start1 test1)
       (HINQ func2 start2 test2) =

So the rough idea (on the background to see it clearer) of it is to make it possible to compose several different queries to a database into a single query, but I couldn't come up with an idea how to merge two different functions of type (m a -> m b) into one at the same time merging two tables (m a)... The first idea was to combine them into lists but then the type signature changes I haven't found yet the solution to this.

I don't really understand the context - particularly the two different data constructors for HINQ - so this might be leading you towards the wrong solution, but if m is Applicative (which it will be if as you say it's a Monad or Alternative) and a is a semigroup then m a can be automatically made a semigroup via liftA2 (<>). (Not sure how to prove that will necessarily be associative but I'm sure it must fall out of the Applicative laws in some way.) — Robin Zigmond
It's certainly possible to write a Semigroup instance for this type, and it may even be law-abiding. (in fact, there's a trivial one - lift the existing Semigroup instances for each component of the 3-tuple). But this instance is not unique, so the Semigroup laws won't guide you here. You should first start by writing out a couple of example values for HINQ and decided how you want the result of their semigroup addition to look like. — user2407038
@user2407038, think I get you, and the main problem was to establish the Semigroup instance for (m a -> m b), which (on the background purpose) should take a "database", e.g. a List of some data a and extract needed information and return this data in a form List b for example. The problem is if I have two different functions f1 and f2 of type (m a -> m b) and two datasets of type m a (let us forget for a moment of the third one) and then I would like to merge these functions and datasets to one object for each in order to represent two queries as one. — A. Gonus
@RobinZigmond, thanks for the idea of lifting, now I should get how to align this lifting with creating a Semigroup instance for the first function. Regarding the data contractors, let us think firstly as if there is only the second one so that there is no last function. — A. Gonus
Forget about the generic types for a second. What does this operation look like for two functions of type [String] -> [String]? e.g what's the result of drop 10 <> take 20? What about reverse <> id or reverse <> reverse? As it stands your question is underspecified - there are many implementations of <> which would satisfy the semigroup law and all do different things for the examples above. What behavior do you actually want? — user2407038

Daniel Wagner Daniel Wagner · Accepted Answer · 2021-09-15T19:14:19

I think you do not want a Semigroup. It doesn't make sense to compose all pairs of queries -- only ones where the output type of the one matches up sensibly with the input type of the other! Luckily, we have a concept that corresponds to a "typed" variant of Semigroup (actually, a typed Monoid, but close enough): Category.

Also, I think it's a design mistake to couple the query with the table you are querying. They are notionally separate concepts; indeed, when composing two queries, you still have just one table, not two. So:

data HINQ m a b = HINQ (m a -> m b) (m a -> m a)

instance Category (HINQ m) where
    id = HINQ id id
    HINQ slct whr . HINQ slct' whr' = HINQ (slct . whr . slct') whr'

The identity laws are pretty clear, but the asymmetry between the uses of the right and left WHERE clauses looks a bit suspicious, so we should check the associativity law carefully:

(HINQ s0 w0 . HINQ s1 w1) . HINQ s2 w2
= HINQ (s0 . w0 . s1) w1 . HINQ s2 w2
= HINQ (s0 . w0 . s1 . w1 . s2) w2
= HINQ s0 w0 . HINQ (s1 . w1 . s2) w2
= HINQ s0 w0 . (HINQ s1 w1 . HINQ s2 w2)

Looks good!

EDIT

Err, hmm... maybe the x . id = x law isn't so clear after all. Yikes! This probably isn't fixable, unless you only consider equality up to the composition of the contained functions, in which case why not just use the Category instance for functions directly? The other option you have, of course, is not to demand the identity laws hold. That's a bit unusual, but I guess it depends a lot on your use case whether this is reasonable or not.

Composition might be easier if you represent your filtering more explicitly, e.g. as a -> Bool or a -> m Bool instead of m a -> m a. This gives you more of a chance to combine the two filters in your (.) implementation, rather than rolling one of the filters into the select operation as the instance above does.

Creating a Semigroup data type instance in Haskell

1 Answers