A Microsoft Azure Cloud Service has a web role which is defined like this in the service definition:

<ServiceDefinition name="Magic" schemaVersion="" xmlns="[WHATEVER]">
   <WebRole name="MagicRole">
      <Sites>
        <Site name="Web" >
          <Bindings>
            <Binding name="HttpIn" endpointName="HttpIn" />
            <Binding name="HttpsIn" endpointName="HttpsIn" />
          </Bindings>
        </Site>
      </Sites>
      <Endpoints>
        <InputEndpoint name="HttpIn" protocol="http" port="80" />
        <InputEndpoint name="HttpsIn" protocol="https"
           port="443" certificate="ServiceCert"/>
      </Endpoints>
      <Certificates>
        <Certificate name="ServiceCert"
           storeLocation="LocalMachine" storeName="My" />
      </Certificates>
   </WebRole>
</ServiceDefinition>

The service has been working just fine for months. Recently users started reporting some obscure problems when establishing an SSL connection to the service.

Safari on iOS is reported to say it's unable to verify the server identity, cURL is reported to say it's unable to get the local issuer certificate, and third-party SSL validation tools such as this and this are reported to say the certificate is improperly installed.

The problem is not reproduced consistently. Sometimes requests succeed and sometimes they fail. Third-party tools sometimes report that the service is properly configured and sometimes report that it's misconfigured.

Nothing was changed in the service for two weeks before users started reporting those problems.

What could cause this problem?

1 Answer

Your service definition is broken. The Azure documentation has recently been updated to show the proper configuration. The Certificates element must list all the intermediate certificates from the service certificate's trust chain. The intermediate certificates are not bound to any endpoint; they just have to be listed. Here's how:

<Certificates>
  <Certificate name="IntermediateCAForServiceCert"
       storeLocation="LocalMachine" storeName="CA" />
  <!-- List all intermediate certificates from the chain
     when the chain contains more than one intermediate-->
  <Certificate name="ServiceCert"
       storeLocation="LocalMachine" storeName="My" />
</Certificates>

With this configuration the intermediate certificate is installed into the local certificate store, where IIS can find it and serve it to clients (along with the service certificate) so that clients can validate the service certificate chain.
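
Each certificate listed in the service definition also needs a matching entry in the service configuration (.cscfg) file, and the certificate itself has to be uploaded to the cloud service. A minimal sketch of that fragment follows, assuming the certificate names from the snippet above; the thumbprint values are placeholders to be replaced with the thumbprints of your own certificates:

```xml
<!-- Fragment of the service configuration (.cscfg), not a complete file. -->
<Role name="MagicRole">
  <Certificates>
    <!-- Placeholder: thumbprint of the intermediate CA certificate -->
    <Certificate name="IntermediateCAForServiceCert"
         thumbprint="[INTERMEDIATE CA CERT THUMBPRINT]"
         thumbprintAlgorithm="sha1" />
    <!-- Placeholder: thumbprint of the service certificate -->
    <Certificate name="ServiceCert"
         thumbprint="[SERVICE CERT THUMBPRINT]"
         thumbprintAlgorithm="sha1" />
  </Certificates>
</Role>
```

The name attributes here must match the names used under Certificates in the service definition.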

With your configuration you likely see "it works" anyway because of undocumented behavior somewhere deep in Windows/IIS. This answer shows proofs. In a nutshell, when the service certificate is installed into the role instance, something in the middleware attempts to fetch the missing intermediate certificates from the CA infrastructure and store them somewhere locally (not in the certificate store). If fetching succeeds, IIS has the intermediate certificate and can serve it to clients. If fetching fails (network problems, CA infrastructure temporarily unavailable, whatever), IIS serves only the service certificate.

Remember there's a load balancer in front of your web role. Different requests may arrive at different instances. If your service scales out and new instances are started, they may be unable to fetch the intermediates, and requests arriving at them will yield responses without intermediates, which makes users unhappy. Some instances may be relocated or re-imaged and fail to re-fetch the intermediates, which will cause the same problem. This is why requests sometimes succeed and sometimes fail.

The bottom line: your service definition causes unreliable service behavior. List the intermediates under the Certificates element to fix this.