Actually, at its core it is pretty easy to do that, though it comes with a caveat I was not able to solve in a satisfying manner.
There is something called schema validation. We are going to use that, but instead of using an actual JsonSchema, we use a query expression instead.
> db.runCommand({
// We modify an existing collection named "validate"...
collMod:"validate",
// ...adding a validator...
"validator":{
// ...ensuring that documents are only considered valid if
// the value of field A is not equal the value of field B
$expr:{$ne:["$A","$B"]}}
}
)
{ "ok" : 1 }
Let us put that to a test:
> db.validate.insert({"A":"foo","B":"bar"})
WriteResult({ "nInserted" : 1 })
> db.validate.insert({"A":"foo","B":"foo"})
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
Yay, looks good so far, doesn't it? Except, there is a caveat:
> db.validate.insert({"A":"foo","B":"foo "})
WriteResult({ "nInserted" : 1 })
A simple whitespace can actually beat our little equality check. So obviously we need to expand our validator:
db.runCommand({collMod:"validate","validator":{$and:[
// Both A and B must not have leading or trailing white space.
{"A":{$regex:/^\w+$/}},
{"B":{$regex:/^\w+$/}},
// As before.
{$expr:{$ne:["$A","$B"]}}
]}})
So, let us recheck:
db.validate.deleteMany({})
{ "acknowledged" : true, "deletedCount" : 2 }
> db.validate.insert({"A":"foo","B":"bar"})
WriteResult({ "nInserted" : 1 })
> db.validate.insert({"A":"foo","B":"bar "})
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
> db.validate.insert({"A":"foo","B":"foo"})
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
Seems to work. However, there is no way for the user to find out why the document validation failed. If you can live with that, you have your solution.
Personally, I'd rather implement a dedicated CLI tool doing the modifications, validating the input beforehand. Data modifications for production databases should not use the DBMS CLI.
Furthermore, there are use case related questions open: is "foo" equal "Foo" and "FOO"? And if that is the case, are A and B equal when A holds "fOo" and B holds "foO"? And so on and so forth.