I am new to spark-mongo connector 2.0. Please correct me if my understanding is not correct.

I need to access a sharded collection in MongoDB and want to set a compound shard key in the read config. I read the documentation, which says:

shard key : The field by which to split the collection data. The field should be indexed and contain unique values.

https://docs.mongodb.com/spark-connector/v2.0/configuration/#conf-mongoshardedpartitioner

Can I use a compound shard key when partitioner is MongoShardedPartitioner? How can I do that?

In MongoDB, you can use a JSON document to define a compound shard key, but I am not sure how to set a compound shard key in the spark-mongo read config.

I have tried adding `"partitioner" -> "MongoShardedPartitioner", "shardKey" -> """{"id": 1, "timestamp": 1}"""` to the read config. However, I got `java.lang.IllegalArgumentException: The value for key $gte can not be null`. I guess this exception is caused by the improper shard key.
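For reference, this is roughly how the read config was built (the URI, database, and collection names below are placeholders, not my real values):

```scala
import com.mongodb.spark.config.ReadConfig

// Placeholder uri/database/collection; the shardKey value is the
// compound-key JSON that triggered the exception.
val readConfig = ReadConfig(Map(
  "uri"         -> "mongodb://localhost:27017",
  "database"    -> "someDatabase",
  "collection"  -> "myCollection",
  "partitioner" -> "MongoShardedPartitioner",
  "shardKey"    -> """{"id": 1, "timestamp": 1}"""
))
```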

Can anyone give me some hints? Thanks.

1 Answer

You can use a compound shard key, but the mongo-spark-connector's `shardKey` option takes just one field name. Therefore, the compound key must be "wrapped" into a single field.

For example, let's say you have a document:

{
  someField: 1.0,
  myShardKey: {
    id: 1.0,
    timestamp: 1
  }
}

The "myShardKey" can be used as the shard key in mongo-spark-connector. But the other requirement is that the collection must use the same field as the shard key. So the collection should be created as follows:

db.createCollection('myCollection');
sh.shardCollection('someDatabase.myCollection', {
    "myShardKey" : 1.0
});
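With the collection sharded on the wrapped field, the read config can reference it by name. A minimal sketch, assuming an existing `SparkSession` named `spark` and placeholder URI/database/collection values:

```scala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig

// shardKey is now a single field name, which is what the
// MongoShardedPartitioner expects.
val readConfig = ReadConfig(Map(
  "uri"         -> "mongodb://localhost:27017",
  "database"    -> "someDatabase",
  "collection"  -> "myCollection",
  "partitioner" -> "MongoShardedPartitioner",
  "shardKey"    -> "myShardKey"
))

val df = MongoSpark.load(spark, readConfig)
```

Queries on the original compound fields would then address them as subfields, e.g. `myShardKey.id` and `myShardKey.timestamp`.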