I have set up a sharded MongoDB cluster with hashed sharding in Kubernetes. I first created the config server replica set, then 2 shard replica sets, and finally a mongos router to connect to the sharded cluster.
I followed this tutorial to set up the sharded cluster: https://docs.mongodb.com/manual/tutorial/deploy-sharded-cluster-hashed-sharding/
After creating mongos, I enabled sharding for the database and sharded the collections using the hashed sharding strategy.
With this setup, I can connect to mongos, insert data into some of the collections in the database, and check how the data is distributed across the shards.
The problem is that when my Java Spring Boot application accesses MongoDB, the connection stalls at random. Once a connection has been established for a particular query, that query does not stall for the next few attempts. But after some idle time, the next request to MongoDB starts stalling again.
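One hypothesis worth ruling out (an assumption, not something the logs below prove): stalls that appear only after idle time match what happens when pooled TCP connections are silently dropped by something in between, such as the Azure load balancer created for each `type: LoadBalancer` Service (its TCP idle timeout defaults to 4 minutes), so the driver then waits on a dead socket. A minimal sketch of a connection string that closes pooled connections before they sit idle that long and bounds the connect, socket and server-selection waits. `maxIdleTimeMS`, `connectTimeoutMS`, `socketTimeoutMS` and `serverSelectionTimeoutMS` are standard connection-string options; the host `mongos-service`, database `mydb` and all values are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: builds a mongodb:// URI with pool/timeout options so that idle,
// silently dropped connections are recycled or fail fast instead of hanging.
// Host, database and values are placeholders, not tuned recommendations.
class MongoUriSketch {

    static String build(String host, int port, String database, Map<String, String> options) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : options.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        String base = "mongodb://" + host + ":" + port + "/" + database;
        return options.isEmpty() ? base : base + "?" + query;
    }

    static Map<String, String> idleSafeOptions() {
        Map<String, String> opts = new LinkedHashMap<>(); // keeps option order predictable
        opts.put("maxIdleTimeMS", "120000");           // drop pooled connections idle > 2 min
        opts.put("connectTimeoutMS", "10000");         // give up on a TCP connect after 10 s
        opts.put("socketTimeoutMS", "60000");          // give up on a silent socket after 60 s
        opts.put("serverSelectionTimeoutMS", "15000"); // fail fast if mongos is unreachable
        return opts;
    }

    public static void main(String[] args) {
        // e.g. use the printed value as spring.data.mongodb.uri
        System.out.println(build("mongos-service", 27017, "mydb", idleSafeOptions()));
    }
}
```

Note that `socketTimeoutMS` also cuts off legitimately long queries, so it should sit above the slowest expected operation. If the hypothesis holds, the same idle drop could also affect mongos-to-shard connections, which a client-side setting does not cover.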
Note: MongoDB is hosted on "DS2 v2" VMs, and the cluster has 4 nodes: 1 for the config server, 2 for the shards and 1 for mongos.
One suggestion I found was to choose a proper shard key for every collection, since the shard key affects MongoDB performance. There are several factors to consider when selecting a shard key, and I took all of them into account, following this post: https://www.mongodb.com/blog/post/on-selecting-a-shard-key-for-mongodb
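To make the hashed strategy concrete, here is a toy illustration of why a hashed shard key spreads monotonically increasing values (such as sequential IDs) evenly across shards. MongoDB computes its own 64-bit hash of the BSON value and splits that hash space into chunks; in this sketch, MD5 over a string and two fixed halves of the 64-bit space stand in for that, purely as an illustrative assumption. The key format `order-N` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Toy illustration of hashed partitioning across 2 shards.
// NOT MongoDB's exact hash function: MD5 is an assumed stand-in.
class HashedPartitionSketch {

    // Takes the first 8 bytes of the MD5 digest as a signed 64-bit value.
    static long hash64(String key) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (md5[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every JVM must support MD5", e);
        }
    }

    // Negative hashes go to shard 0, non-negative ones to shard 1 (two equal ranges).
    static int shardFor(String key) {
        return hash64(key) < 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        int[] counts = new int[2];
        for (int i = 0; i < 10_000; i++) {
            counts[shardFor("order-" + i)]++; // sequential keys, scattered by the hash
        }
        System.out.println("shard0=" + counts[0] + " shard1=" + counts[1]);
    }
}
```

With a range-based key, the same sequential keys would all land in the last chunk on one shard; hashing trades that write hotspot for scatter-gather on range queries.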
Another suggested solution was to set ShardingTaskExecutorPoolMaxConnecting to limit the rate at which mongos nodes add connections to their connection pools (https://jira.mongodb.org/browse/SERVER-29237). I tried values of 20, 5, 100 and 150, and none of them resolved the stalling.
I also tried tweaking other parameters such as ShardingTaskExecutorPoolMinSize and taskExecutorPoolSize. This did not resolve the stalling either.
I also set --serviceExecutor to adaptive.
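For reference, a sketch of how those server parameters and the service executor would be passed to mongos, mirroring the `command:` list of the mongos Deployment below: each parameter becomes a separate `--setParameter name=value` flag. The parameter values in `main` are examples only, not recommendations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: builds the mongos argument list as used in a Kubernetes "command:" list,
// passing each server parameter as its own --setParameter name=value flag.
class MongosArgsSketch {

    static List<String> mongosCommand(String configDb, Map<String, String> serverParams,
                                      String serviceExecutor) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "numactl", "--interleave=all", "mongos",
                "--port", "27017",
                "--bind_ip", "0.0.0.0",
                "--configdb", configDb));
        for (Map.Entry<String, String> p : serverParams.entrySet()) {
            cmd.add("--setParameter");
            cmd.add(p.getKey() + "=" + p.getValue());
        }
        if (serviceExecutor != null) {
            // MongoDB 4.0 accepts "synchronous" (the default) or "adaptive"
            cmd.add("--serviceExecutor");
            cmd.add(serviceExecutor);
        }
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps flag order stable
        params.put("ShardingTaskExecutorPoolMaxConnecting", "20");
        params.put("ShardingTaskExecutorPoolMinSize", "10");
        params.put("taskExecutorPoolSize", "4");
        List<String> cmd = mongosCommand("ConfigDBRepSet/mongo-conf-service:27017", params, "adaptive");
        cmd.forEach(arg -> System.out.println("- \"" + arg + "\"")); // print as YAML list items
    }
}
```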
Finally, I increased wiredTigerCacheSizeGB from 0.25 to 2. This made no difference to the stalling either.
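As a point of comparison, the documented default WiredTiger internal cache (MongoDB 3.4 and later) is the larger of 50% of (RAM − 1 GB) or 256 MB. Assuming a DS2 v2 VM has 7 GiB of RAM and the pod has no memory limit (its `resources: {}` sets none), the default would already be about 3 GB, so an explicit 2 is actually below it. A worked sketch of that formula:

```java
// Worked example of the documented default WiredTiger cache size:
// max(50% of (RAM - 1 GB), 256 MB), with sizes expressed in GB.
class WiredTigerCacheSketch {

    static double defaultCacheGB(double ramGB) {
        return Math.max(0.5 * (ramGB - 1.0), 0.25); // 256 MB = 0.25 GB
    }

    public static void main(String[] args) {
        System.out.println(defaultCacheGB(7.0)); // assumed DS2 v2 RAM of 7 GiB -> 3.0
        System.out.println(defaultCacheGB(1.0)); // small host falls back to 0.25
    }
}
```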
1) Service and Deployment YAML for the config server:
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      kompose.cmd: kompose convert -d -f docker-compose.yml -o azure-deployment.yaml
      kompose.version: 1.12.0 (0ab07be)
    creationTimestamp: null
    labels:
      io.kompose.service: mongo-conf-service
    name: mongo-conf-service
  spec:
    type: LoadBalancer
    ports:
    - name: "27017"
      port: 27017
      targetPort: 27017
    selector:
      io.kompose.service: mongo-conf-service
  status:
    loadBalancer: {}
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    annotations:
      kompose.cmd: kompose convert -d -f docker-compose.yml -o azure-deployment.yaml
      kompose.version: 1.12.0 (0ab07be)
    creationTimestamp: null
    labels:
      io.kompose.service: mongo-conf-service
    name: mongo-conf-service
  spec:
    replicas: 1
    strategy: {}
    template:
      metadata:
        creationTimestamp: null
        labels:
          io.kompose.service: mongo-conf-service
      spec:
        containers:
        - env:
          - name: MONGO_INITDB_ROOT_USERNAME
            value: "<Username>"
          - name: MONGO_INITDB_ROOT_PASSWORD
            value: "<Password>"
          command:
          - "mongod"
          - "--storageEngine"
          - "wiredTiger"
          - "--port"
          - "27017"
          - "--bind_ip"
          - "0.0.0.0"
          - "--wiredTigerCacheSizeGB"
          - "2"
          - "--configsvr"
          - "--replSet"
          - "ConfigDBRepSet"
          image: "<MongoImageName>"
          name: mongo-conf-service
          ports:
          - containerPort: 27017
          resources: {}
          volumeMounts:
          - name: mongo-conf
            mountPath: /data/db
        restartPolicy: Always
        volumes:
        - name: mongo-conf
          persistentVolumeClaim:
            claimName: mongo-conf
2) Service and Deployment YAML for the shard:
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      kompose.cmd: kompose convert -d -f docker-compose.yml -o azure-deployment.yaml
      kompose.version: 1.12.0 (0ab07be)
    creationTimestamp: null
    labels:
      io.kompose.service: mongo-shard
    name: mongo-shard
  spec:
    type: LoadBalancer
    ports:
    - name: "27017"
      port: 27017
      targetPort: 27017
    selector:
      io.kompose.service: mongo-shard
  status:
    loadBalancer: {}
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    annotations:
      kompose.cmd: kompose convert -d -f docker-compose.yml -o azure-deployment.yaml
      kompose.version: 1.12.0 (0ab07be)
    creationTimestamp: null
    labels:
      io.kompose.service: mongo-shard
    name: mongo-shard
  spec:
    replicas: 1
    strategy: {}
    template:
      metadata:
        creationTimestamp: null
        labels:
          io.kompose.service: mongo-shard
      spec:
        containers:
        - env:
          - name: MONGO_INITDB_ROOT_USERNAME
            value: "<Username>"
          - name: MONGO_INITDB_ROOT_PASSWORD
            value: "<Password>"
          command:
          - "mongod"
          - "--storageEngine"
          - "wiredTiger"
          - "--port"
          - "27017"
          - "--bind_ip"
          - "0.0.0.0"
          - "--wiredTigerCacheSizeGB"
          - "2"
          - "--shardsvr"
          - "--replSet"
          - "Shard1RepSet"
          image: "<MongoImage>"
          name: mongo-shard
          ports:
          - containerPort: 27017
          resources: {}
3) Service and Deployment YAML for mongos:
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      kompose.cmd: kompose convert -d -f docker-compose.yml -o azure-deployment.yaml
      kompose.version: 1.12.0 (0ab07be)
    creationTimestamp: null
    labels:
      io.kompose.service: mongos-service
    name: mongos-service
  spec:
    type: LoadBalancer
    ports:
    - name: "27017"
      port: 27017
      targetPort: 27017
    selector:
      io.kompose.service: mongos-service
  status:
    loadBalancer: {}
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    annotations:
      kompose.cmd: kompose convert -d -f docker-compose.yml -o azure-deployment.yaml
      kompose.version: 1.12.0 (0ab07be)
    creationTimestamp: null
    labels:
      io.kompose.service: mongos-service
    name: mongos-service
  spec:
    replicas: 1
    strategy: {}
    template:
      metadata:
        creationTimestamp: null
        labels:
          io.kompose.service: mongos-service
      spec:
        containers:
        - env:
          - name: MONGO_INITDB_ROOT_USERNAME
            value: "<Username>"
          - name: MONGO_INITDB_ROOT_PASSWORD
            value: "<Password>"
          command:
          - "numactl"
          - "--interleave=all"
          - "mongos"
          - "--port"
          - "27017"
          - "--bind_ip"
          - "0.0.0.0"
          - "--configdb"
          - "ConfigDBRepSet/mongo-conf-service:27017"
          image: "<MongoImageName>"
          name: mongos-service
          ports:
          - containerPort: 27017
          resources: {}
The mongos logs are:
2019-08-05T05:27:52.942+0000 I NETWORK [listener] connection accepted from 10.0.0.0:5058 #308807 (79 connections now open)
2019-08-05T05:27:52.964+0000 I ACCESS [conn308807] Successfully authenticated as principal Assist_Random_Workspace on Random_Workspace from client 10.0.0.0:5058
2019-08-05T05:27:54.267+0000 I NETWORK [worker-3] end connection 10.0.0.0:52954 (78 connections now open)
2019-08-05T05:27:54.269+0000 I NETWORK [listener] connection accepted from 10.0.0.0:52988 #308808 (79 connections now open)
2019-08-05T05:27:54.275+0000 I NETWORK [listener] connection accepted from 10.0.0.0:7174 #308809 (80 connections now open)
2019-08-05T05:27:54.279+0000 I ACCESS [conn308809] SASL SCRAM-SHA-1 authentication failed for Assist_Refactored_Code_DB on Refactored_Code_DB from client 10.0.0.:7174 ; UserNotFound: User "Assist_Refactored_Code_DB@Refactored_Code_DB" not found
2019-08-05T05:27:54.281+0000 I NETWORK [worker-1] end connection 10.0.0.5:7174 (79 connections now open)
2019-08-05T05:27:54.342+0000 I NETWORK [worker-1] end connection 10.0.0.6:57391 (78 connections now open)
2019-08-05T05:27:54.343+0000 I NETWORK [listener] connection accepted from 10.0.0.0:57527 #308810 (79 connections now open)
2019-08-05T05:27:55.080+0000 I NETWORK [worker-3] end connection 10.0.0.0:56021 (78 connections now open)
2019-08-05T05:27:55.081+0000 I NETWORK [listener] connection accepted from 10.0.0.0:56057 #308811 (79 connections now open)
2019-08-05T05:27:56.054+0000 I NETWORK [worker-1] end connection 10.0.0.0:59137 (78 connections now open)
2019-08-05T05:27:56.055+0000 I NETWORK [listener] connection accepted from 10.0.0.0:59184 #308812 (79 connections now open)
2019-08-05T05:27:59.268+0000 I NETWORK [worker-1] end connection 10.0.0.5:52988 (78 connections now open)
2019-08-05T05:27:59.270+0000 I NETWORK [listener] connection accepted from 10.0.0.0:53047 #308813 (79 connections now open)
2019-08-05T05:27:59.343+0000 I NETWORK [worker-3] end connection 10.0.0.6:57527 (78 connections now open)
2019-08-05T05:27:59.344+0000 I NETWORK [listener] connection accepted from 10.0.0.0:57672 #308814 (79 connections now open)
2019-08-05T05:28:00.080+0000 I NETWORK [worker-3] end connection 10.0.1.1:56057 (78 connections now open)
2019-08-05T05:28:00.081+0000 I NETWORK [listener] connection accepted from 10.0.0.0:56116 #308815 (79 connections now open)
2019-08-05T05:28:01.054+0000 I NETWORK [worker-3] end connection 10.0.0.0:59184 (78 connections now open)
2019-08-05T05:28:01.058+0000 I NETWORK [listener] connection accepted from 10.0.0.0:59225 #308816 (79 connections now open)
2019-08-05T05:28:01.763+0000 I NETWORK [listener] connection accepted from 10.0.0.0:7173 #308817 (80 connections now open)
2019-08-05T05:28:01.768+0000 I ACCESS [conn308817] SASL SCRAM-SHA-1 authentication failed for Assist_Sharded_Database on Sharded_Database from client 10.0.0.0:7173 ; UserNotFound: User "Assist_Sharded_Database@Sharded_Database" not found
2019-08-05T05:28:01.770+0000 I NETWORK [worker-3] end connection 10.0.0.0:7173 (79 connections now open)
2019-08-05T05:28:04.271+0000 I NETWORK [worker-3] end connection 10.0.0.0:53047 (78 connections now open)
2019-08-05T05:28:04.272+0000 I NETWORK [listener] connection accepted from 10.0.0.0:53083 #308818 (79 connections now open)
2019-08-05T05:28:04.283+0000 I NETWORK [listener] connection accepted from 10.0.0.0:7105 #308819 (80 connections now open)
2019-08-05T05:28:04.287+0000 I ACCESS [conn308819] SASL SCRAM-SHA-1 authentication failed for Assist_Refactored_Code_DB on Refactored_Code_DB from client 10.0.0.0:7105 ; UserNotFound: User "Assist_Refactored_Code_DB@Refactored_Code_DB" not found
In the logs above, authentication fails for Assist_Refactored_Code_DB (and once for Assist_Sharded_Database); I did not create the Refactored_Code_DB database. I'm not sure why this authentication is failing or in which MongoDB URI the username and password should be specified, and I'm also not sure whether this is one of the causes of the stalling. These are the only errors I could find in the mongos logs; the config server and shard logs contain no errors.
Shard1RepSet logs:
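The failing lines show the server looking the user up in the database named after `on` (for example `Refactored_Code_DB`). In a connection string, credentials are checked against the `authSource` database; when `authSource` is absent, the database in the URI path is used, and `admin` only when the path is empty as well. So if the application user was created in `admin`, `authSource=admin` must be set explicitly. A minimal sketch that builds such a URI and percent-encodes the credentials (needed for characters like `@`, `:` or `/`); the user, password, host and database names are hypothetical placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: builds a mongodb:// URI whose credentials are verified against an
// explicit authSource rather than the database in the URI path.
// All names used in main are hypothetical placeholders.
class AuthSourceUriSketch {

    // Percent-encodes a user name or password for the userinfo part of the URI.
    // URLEncoder does form encoding (space -> '+'), so '+' is rewritten to %20;
    // a literal '+' in the input is already encoded as %2B and is unaffected.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String build(String user, String password, String host, int port,
                        String database, String authSource) {
        return "mongodb://" + encode(user) + ":" + encode(password) + "@"
                + host + ":" + port + "/" + database
                + "?authSource=" + authSource;
    }

    public static void main(String[] args) {
        // e.g. the value of spring.data.mongodb.uri
        System.out.println(build("appUser", "p@ss:word", "mongos-service", 27017, "mydb", "admin"));
    }
}
```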
2019-08-06T10:48:08.926+0000 I NETWORK [listener] connection accepted from 10.0.0.4:58010 #782186 (10 connections now open)
2019-08-06T10:48:11.585+0000 I NETWORK [conn782183] end connection 10.0.0.0:64938 (9 connections now open)
2019-08-06T10:48:11.586+0000 I NETWORK [listener] connection accepted from 10.0.0.7:64989 #782187 (10 connections now open)
2019-08-06T10:48:11.765+0000 I NETWORK [conn782184] end connection 10.0.0.0:62126 (9 connections now open)
2019-08-06T10:48:11.766+0000 I NETWORK [listener] connection accepted from 10.0.0.6:62302 #782188 (10 connections now open)
2019-08-06T10:48:13.763+0000 I NETWORK [conn782185] end connection 10.0.0.0:52907 (9 connections now open)
2019-08-06T10:48:13.763+0000 I NETWORK [listener] connection accepted from 10.0.0.1:52947 #782189 (10 connections now open)
2019-08-06T10:48:13.926+0000 I NETWORK [conn782186] end connection 10.0.0.0:58010 (9 connections now open)
2019-08-06T10:48:13.927+0000 I NETWORK [listener] connection accepted from 10.0.0.0:58051 #782190 (10 connections now open)
2019-08-06T10:48:16.586+0000 I NETWORK [conn782187] end connection 10.0.0.0:64989 (9 connections now open)
2019-08-06T10:48:16.587+0000 I NETWORK [listener] connection accepted from 10.0.0.0:65054 #782191 (10 connections now open)
2019-08-06T10:48:16.766+0000 I NETWORK [conn782188] end connection 10.0.0.6:62302 (9 connections now open)
2019-08-06T10:48:16.767+0000 I NETWORK [listener] connection accepted from 10.0.0.6:62445 #782192 (10 connections now open)
2019-08-06T10:48:18.765+0000 I NETWORK [conn782189] end connection 10.0.2.1:52947 (9 connections now open)
2019-08-06T10:48:18.765+0000 I NETWORK [listener] connection accepted from 10.0.2.1:52989 #782193 (10 connections now open)
2019-08-06T10:48:18.927+0000 I NETWORK [conn782190] end connection 10.0.0.4:58051 (9 connections now open)
2019-08-06T10:48:18.929+0000 I NETWORK [listener] connection accepted from 10.0.0.4:58100 #782194 (10 connections now open)
2019-08-06T10:48:21.588+0000 I NETWORK [conn782191] end connection 10.0.0.7:65054 (9 connections now open)
2019-08-06T10:48:21.589+0000 I NETWORK [listener] connection accepted from 10.0.0.7:65105 #782195 (10 connections now open)
2019-08-06T10:48:21.767+0000 I NETWORK [conn782192] end connection 10.0.0.6:62445 (9 connections now open)
2019-08-06T10:48:21.768+0000 I NETWORK [listener] connection accepted from 10.0.0.6:62581 #782196 (10 connections now open)
2019-08-06T10:48:23.766+0000 I NETWORK [conn782193] end connection 10.0.2.1:52989 (9 connections now open)
2019-08-06T10:48:23.766+0000 I NETWORK [listener] connection accepted from 10.0.2.1:53030 #782197 (10 connections now open)
2019-08-06T10:48:23.928+0000 I NETWORK [conn782194] end connection 10.0.0.4:58100 (9 connections now open)
2019-08-06T10:48:23.930+0000 I NETWORK [listener] connection accepted from 10.0.0.4:58145 #782198 (10 connections now open)
2019-08-06T10:48:26.589+0000 I NETWORK [conn782195] end connection 10.0.0.7:65105 (9 connections now open)
2019-08-06T10:48:26.590+0000 I NETWORK [listener] connection accepted from 10.0.0.7:65148 #782199 (10 connections now open)
2019-08-06T10:48:26.768+0000 I NETWORK [conn782196] end connection 10.0.0.6:62581 (9 connections now open)
2019-08-06T10:48:26.770+0000 I NETWORK [listener] connection accepted from 10.0.0.6:62746 #782200 (10 connections now open)
2019-08-06T10:48:28.766+0000 I NETWORK [conn782197] end connection 10.0.2.1:53030 (9 connections now open)
2019-08-06T10:48:28.767+0000 I NETWORK [listener] connection accepted from 10.0.2.1:53081 #782201 (10 connections now open)
2019-08-06T10:48:28.930+0000 I NETWORK [conn782198] end connection 10.0.0.4:58145 (9 connections now open)
2019-08-06T10:48:28.931+0000 I NETWORK [listener] connection accepted from 10.0.0.4:58217 #782202 (10 connections now open)
2019-08-06T10:48:31.590+0000 I NETWORK [conn782199] end connection 10.0.0.7:65148 (9 connections now open)
ConfigDBRepSet logs:
2019-08-06T10:52:18.962+0000 I NETWORK [conn781553] end connection 10.0.0.4:60257 (10 connections now open)
2019-08-06T10:52:18.963+0000 I NETWORK [listener] connection accepted from 10.0.0.4:60306 #781557 (11 connections now open)
2019-08-06T10:52:21.296+0000 I NETWORK [conn781554] end connection 10.0.0.7:50910 (10 connections now open)
2019-08-06T10:52:21.297+0000 I NETWORK [listener] connection accepted from 10.0.0.7:50956 #781558 (11 connections now open)
2019-08-06T10:52:22.380+0000 I NETWORK [conn781555] end connection 10.0.0.5:54999 (10 connections now open)
2019-08-06T10:52:22.381+0000 I NETWORK [listener] connection accepted from 10.0.0.5:55043 #781559 (11 connections now open)
2019-08-06T10:52:22.554+0000 I NETWORK [conn781556] end connection 10.0.3.1:57125 (10 connections now open)
2019-08-06T10:52:22.555+0000 I NETWORK [listener] connection accepted from 10.0.3.1:57258 #781560 (11 connections now open)
2019-08-06T10:52:23.963+0000 I NETWORK [conn781557] end connection 10.0.0.4:60306 (10 connections now open)
2019-08-06T10:52:23.964+0000 I NETWORK [listener] connection accepted from 10.0.0.4:60341 #781561 (11 connections now open)
2019-08-06T10:52:26.298+0000 I NETWORK [conn781558] end connection 10.0.0.7:50956 (10 connections now open)
2019-08-06T10:52:26.299+0000 I NETWORK [listener] connection accepted from 10.0.0.7:50998 #781562 (11 connections now open)
2019-08-06T10:52:27.382+0000 I NETWORK [conn781559] end connection 10.0.0.5:55043 (10 connections now open)
2019-08-06T10:52:27.383+0000 I NETWORK [listener] connection accepted from 10.0.0.5:55086 #781563 (11 connections now open)
2019-08-06T10:52:27.555+0000 I NETWORK [conn781560] end connection 10.0.3.1:57258 (10 connections now open)
2019-08-06T10:52:27.556+0000 I NETWORK [listener] connection accepted from 10.0.3.1:57415 #781564 (11 connections now open)
2019-08-06T10:52:28.964+0000 I NETWORK [conn781561] end connection 10.0.0.4:60341 (10 connections now open)
2019-08-06T10:52:28.965+0000 I NETWORK [listener] connection accepted from 10.0.0.4:60406 #781565 (11 connections now open)
2019-08-06T10:52:31.299+0000 I NETWORK [conn781562] end connection 10.0.0.7:50998 (10 connections now open)
2019-08-06T10:52:31.300+0000 I NETWORK [listener] connection accepted from 10.0.0.7:51043 #781566 (11 connections now open)
2019-08-06T10:52:32.383+0000 I NETWORK [conn781563] end connection 10.0.0.5:55086 (10 connections now open)
2019-08-06T10:52:32.384+0000 I NETWORK [listener] connection accepted from 10.0.0.5:55136 #781567 (11 connections now open)
2019-08-06T10:52:32.556+0000 I NETWORK [conn781564] end connection 10.0.3.1:57415 (10 connections now open)
2019-08-06T10:52:32.556+0000 I NETWORK [listener] connection accepted from 10.0.3.1:57535 #781568 (11 connections now open)
2019-08-06T10:52:33.966+0000 I NETWORK [conn781565] end connection 10.0.0.4:60406 (10 connections now open)
2019-08-06T10:52:33.967+0000 I NETWORK [listener] connection accepted from 10.0.0.4:60461 #781569 (11 connections now open)
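In both the shard and config server logs above, each connection lives almost exactly five seconds (for example, #782186 opens at 10:48:08.926 and closes at 10:48:13.926), which looks like periodic traffic rather than application queries. A small stdlib-only sketch that pairs `connection accepted ... #N` with `[connN] end connection` and reports each connection's lifetime, so that cadence can be checked over a longer log. It assumes the mongod 4.0 plain-text log format shown above and that all lines share the same `+0000` offset; the mongos log closes connections on `[worker-N]` threads, so its lines cannot be paired this way.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: measures how long each connection lives in a mongod 4.0 log excerpt.
class ConnLifetimeSketch {
    private static final Pattern ACCEPTED = Pattern.compile("connection accepted from \\S+ #(\\d+)");
    private static final Pattern ENDED = Pattern.compile("\\[conn(\\d+)\\] end connection");

    // The first 23 characters are the timestamp, e.g. 2019-08-06T10:48:08.926;
    // the trailing +0000 offset is ignored.
    private static LocalDateTime timestamp(String line) {
        return LocalDateTime.parse(line.substring(0, 23));
    }

    // Returns connection id -> lifetime, for connections whose open and close both appear.
    static Map<String, Duration> lifetimes(List<String> lines) {
        Map<String, LocalDateTime> opened = new HashMap<>();
        Map<String, Duration> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher a = ACCEPTED.matcher(line);
            if (a.find()) {
                opened.put(a.group(1), timestamp(line));
                continue;
            }
            Matcher e = ENDED.matcher(line);
            if (e.find() && opened.containsKey(e.group(1))) {
                result.put(e.group(1), Duration.between(opened.remove(e.group(1)), timestamp(line)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2019-08-06T10:48:08.926+0000 I NETWORK [listener] connection accepted from 10.0.0.4:58010 #782186 (10 connections now open)",
                "2019-08-06T10:48:13.926+0000 I NETWORK [conn782186] end connection 10.0.0.0:58010 (9 connections now open)");
        lifetimes(sample).forEach((id, d) -> System.out.println("conn" + id + " lived " + d.toMillis() + " ms"));
    }
}
```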
Output of sh.status():
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("5d3a7c7d035b4525a7de5eaa")
  }
  shards:
    { "_id" : "Shard1RepSet", "host" : "Shard1RepSet/94.245.111.162:27017", "state" : 1 }
    { "_id" : "Shard2RepSet", "host" : "Shard2RepSet/13.74.42.35:27017", "state" : 1 }
  active mongoses:
    "4.0.10" : 1
  autosplit:
    Currently enabled: yes
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
      2 : Success
  databases:
    #Databases sharding Information
I expect the sharded cluster never to stall and to behave like a standalone MongoDB instance.
Can anyone guide me in resolving this stalling issue with the sharded cluster?