
My mongos servers are not starting; they write this error to the logs:

SHARDING [Balancer] caught exception while doing balance: Server's sharding metadata manager failed to initialize and will remain in this state until the instance is manually reset :: caused by :: HostNotFound: unable to resolve DNS for host confserv_1.xyz.com

2016-05-02T17:57:06.612+0530 I SHARDING [Balancer] about to log metadata event into actionlog: { _id: "DB2255-2016-05-02T17:57:06.611+0530-5727479aa1051c5fb04fcc49", server: "mongoS1", clientAddr: "", time: new Date(1462192026611), what: "balancer.round", ns: "", details: { executionTimeMillis: 35, errorOccured: true, errmsg: "Server's sharding metadata manager failed to initialize and will remain in this state until the instance is manually reset :: caused by :: HostNotFoun..." } }  

When I connect to the config server using its host name, it works fine.
I tried restarting the mongos server, but it does not come up.

I checked the MongoDB source code and found this error in
https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/sharding_state.cpp

// TODO: remove after v3.4.
// This is for backwards compatibility with old style initialization through metadata
// commands/setShardVersion. As well as all assignments to _initializationStatus and
// _setInitializationState_inlock in this method.
if (_getInitializationState() == InitializationState::kInitializing) {
    auto waitStatus = _waitForInitialization_inlock(deadline, lk);
    if (!waitStatus.isOK()) {
        return waitStatus;
    }
}

if (_getInitializationState() == InitializationState::kError) {
    return {ErrorCodes::ManualInterventionRequired,
            str::stream() << "Server's sharding metadata manager failed to initialize and will "
                             "remain in this state until the instance is manually reset"
                          << causedBy(_initializationStatus)};
}  

But it does not say what manual intervention is required. The current MongoDB version is 3.2.6.

'unable to resolve DNS' sounds like an administrative problem. Solutions include ping, traceroute, nmap and the like, which seems OT. - mnemosyn
Try connecting to the config server from the mongos instance. It could be that the config server port is not open. - titogeo
I checked: connectivity was lost once and then restored, but mongos did not pick it up. We restarted the mongos server and also stopped and restarted the balancer; nothing worked. - viren
As I mentioned earlier, I tried connecting to the config server from the mongos host and it worked, but it kept asking for a manual reset. - viren

1 Answer


I just ran into this problem while trying to harden the security configuration. As in your case, I was able to connect to the config servers from all mongos instances.

In my case I was also testing a setup with replica set members in different datacenters, and the problem appeared only after stepping down some primaries.

In the end I found that, contrary to what the error message suggests, the problem was on some primaries in one datacenter, which could not route back to the config server. After fixing the routing problem (via /etc/hosts in the end), no more problems occurred on the mongo side.
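
Since the failing name lookup may happen on a shard member rather than on the mongos host, it can help to run the same resolution check on every node. Below is a minimal sketch in Python, not anything that ships with MongoDB: `check_hosts` is a hypothetical helper name, and the port 27019 in the usage note is an assumption (the default for config servers), so substitute whatever your cluster uses.

```python
import socket


def check_hosts(hosts, resolver=socket.getaddrinfo):
    """Resolve each (host, port) pair on the local machine.

    Returns a dict mapping host name to a sorted list of resolved
    addresses, or to None when name resolution fails on this node.
    The resolver argument is only there so the lookup can be swapped
    out; by default the system resolver is used.
    """
    results = {}
    for host, port in hosts:
        try:
            infos = resolver(host, port)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # Same class of failure as "unable to resolve DNS for host".
            results[host] = None
    return results
```

Running `print(check_hosts([("confserv_1.xyz.com", 27019)]))` on each mongos and on each shard member (including the primaries that were stepped up) should show which machines get `None` for the config server name. Those are the ones whose DNS or /etc/hosts entries need fixing.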