Pick the most correct FromJSON instance with overlapping definitions

Question

I have a kind of unusual use case for supporting multiple versions of a record that is communicated via JSON and has a large number of Maybe values.

data VersionedThing = V1 Thing1 | V2 Thing2 

data Thing1 = Thing { 
  name :: Maybe String,
  val1 :: Maybe String,
  val2 :: Maybe String,
}

data Thing2 = Thing { 
  name :: Maybe String,
  val3 :: Maybe String,
  val4 :: Maybe String,
} 

instance FromJSON Thing1 where 
  parseJSON (Object v) = Thing <$> v.: "name" <*> v.:? "val1" <*> v .:? "val2"

instance FromJSON Thing2 where 
  parseJSON (Object v) = Thing <$> v.: "name" <*> v.:? "val3" <*> v .:? "val4"

instance FromJSON (VersionedThing) where
  parseJSON v = (V1 <$> parseJSON v)
        `mplus` (V2 <$> parseJSON v)

My problem is that because these records share a name field with no other requirements, JSON that represents a specific version will always be able to be parsed as another version.

For example decoding the JSON

{
  "name":"Foo",
  "val3":"Bar",
  "val4":"Baz"
}

Could yield the haskell values:

Thing1 (Just "Foo") Nothing Nothing

or

Thing2 (Just "Foo") (Just "Bar") (Just "Baz)

Is there a way to write my FromJSON instance of VersionedThing in such a way that it always parses whichever is the "most correct" value, rather than simply using the first one to succeed?

What if the object contained both a "val1" and a "val3"? What if it contained none of them? I think "most correct" needs to be better defined. If "val1" and "val3" are not allowed to co-exist, make parsing fail if you see "val3" when parsing Thing1. — Rahul Manne
You'll only ever get JSON that looks like Thing1 or Thing2. I can pick an arbitrary default case for only having a "name". The real issue is this could grow to much more than two versions of data and I'd like something more extensible than just looking for fields that shouldn't be in some cases. So "most correct" would be whichever datatype decodes with the least number of Nothing's. — jkeuhlen
What if, you made parsing fail if all of the fields were Nothing? I think that would resolve your issue. (eg: for Thing1, parseJSON (Object v) = do { n <- v.: "name"; v1 <- v .:? "val1"; v2 <- v .:? "val2"; if v1 == Nothing && v2 == Nothing then fail "At least one must be present" else ...}. — Rahul Manne
That might work... But I'd prefer to not have to change the parseJSON instances for Thing1 or Thing2. My general goal is to allow multiple versions of the data to coexist so I'd prefer to not change any previous definitions. I was kind of thinking I could decode them, then re-encode into Object's and look at the length of the HashMap to see the number of keys that succeeded. But that felt pretty hacky too. — jkeuhlen
Write a scoring function that tells how good a parse is (eg, number of Just). Then try all parses and pick the one with the highest score. — augustss

Daniel Wagner Daniel Wagner · Accepted Answer · 2018-05-18T14:22:43

Here's my plan: while parsing, we'll track which keys we've looked at. Parsers that don't consume all the keys of the object will fail. Here's your preamble, fleshed out to be complete and compilable:

{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import Control.Monad.Writer
import Data.Aeson
import Data.Aeson.Types (Parser)
import Data.Functor.Compose
import Data.HashMap.Lazy (HashMap)
import Data.Text (Text)
import qualified Data.HashMap.Lazy as HM

data VersionedThing = V1 Thing1 | V2 Thing2 deriving (Eq, Ord, Read, Show)

data Thing1 = Thing1
    { name :: Maybe String
    , val1 :: Maybe String
    , val2 :: Maybe String
    } deriving (Eq, Ord, Read, Show)

data Thing2 = Thing2
    { name :: Maybe String
    , val3 :: Maybe String
    , val4 :: Maybe String
    } deriving (Eq, Ord, Read, Show)

Now we'll add a type for parsing and tracking at the same time, together with embeddings for "just parse without tracking" and "just track without parsing".

type ParseAndTrack = Compose Parser (Writer (HashMap Text ()))

parse :: Parser a -> ParseAndTrack a
track :: Text -> ParseAndTrack ()

parse p = Compose (pure <$> p)
track t = Compose . pure . tell $ HM.singleton t ()

We can use these two primitives to lift (.:) and (.:?) to tracked versions of themselves. We'll use the suffix & for things that parse and track.

(.:&) :: FromJSON a => Object -> Text -> ParseAndTrack a
o .:& t = track t *> parse (o .: t)

(.:?&) :: FromJSON a => Object -> Text -> ParseAndTrack (Maybe a)
o .:?& t = (Just <$> (o .:& t)) <|> pure Nothing

Finally, we'll give a top-level way to drop back down from "parse-and-track" mode to "parse-only" mode, failing if we haven't consumed all the keys available.

consumeAllOf :: Object -> ParseAndTrack a -> Parser a
consumeAllOf o p = do
    (result, keys) <- runWriter <$> getCompose p
    let unusedKeys = HM.difference o keys
    unless (null unusedKeys) . fail $
        "unrecognized keys " ++ show (HM.keys unusedKeys)
    return result

Now we can write your two parsers with the above additional tools, and things should pretty much just work. The only difference in the parsers for Thing1 and Thing2 is that we throw a consumeAllOf on front and use the tracking versions of .: and .:? in the middle.

instance FromJSON Thing1 where
    parseJSON (Object v) = consumeAllOf v $ Thing1 <$> v.:& "name" <*> v.:?& "val1" <*> v .:?& "val2"

instance FromJSON Thing2 where
    parseJSON (Object v) = consumeAllOf v $ Thing2 <$> v.:& "name" <*> v.:?& "val3" <*> v .:?& "val4"

instance FromJSON (VersionedThing) where
    parseJSON v = (V1 <$> parseJSON v)
          `mplus` (V2 <$> parseJSON v)

Try it out in ghci:

> decode "{\"name\": \"foo\", \"val1\": \"bar\"}" :: Maybe VersionedThing
Just (V1 (Thing1 {name = Just "foo", val1 = Just "bar", val2 = Nothing}))
> decode "{\"name\": \"foo\", \"val3\": \"bar\"}" :: Maybe VersionedThing
Just (V2 (Thing2 {name = Just "foo", val3 = Just "bar", val4 = Nothing}))

Pick the most correct FromJSON instance with overlapping definitions

1 Answers