I have installed the Jetstack cert-manager within my private GKE cluster. That all went well, but I can't get a certificate issued. The error that I get is:
E1101 03:45:15.754642 1 sync.go:184] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="[snip]" "resource_kind"="Challenge" "resource_name"="[snip]-certificate-2096248848-189663135-2951658629" "resource_namespace"="default" "type"="http-01"
I1101 03:45:15.755017 1 controller.go:135] cert-manager/controller/challenges "level"=0 "msg"="finished processing work item" "key"="default/[snip]-certificate-2096248848-189663135-2951658629"
I1101 03:45:25.755400 1 controller.go:129] cert-manager/controller/challenges "level"=0 "msg"="syncing item" "key"="default/[snip]-certificate-2096248848-189663135-2951658629"
I1101 03:45:25.755810 1 pod.go:58] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "level"=0 "msg"="found one existing HTTP01 solver pod" "dnsName"="[snip]" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-b6k59" "related_resource_namespace"="default" "resource_kind"="Challenge" "resource_name"="[snip]-certificate-2096248848-189663135-2951658629" "resource_namespace"="default" "type"="http-01"
I1101 03:45:25.755897 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "level"=0 "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="[snip]" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-qsvbv" "related_resource_namespace"="default" "resource_kind"="Challenge" "resource_name"="[snip]-certificate-2096248848-189663135-2951658629" "resource_namespace"="default" "type"="http-01"
I1101 03:45:25.755960 1 ingress.go:91] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "level"=0 "msg"="found one existing HTTP01 solver ingress" "dnsName"="[snip]" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-br7d2" "related_resource_namespace"="default" "resource_kind"="Challenge" "resource_name"="[snip]-certificate-2096248848-189663135-2951658629" "resource_namespace"="default" "type"="http-01"
This corresponds with an error event in the ClusterIssuer that I deployed:
Warning ErrVerifyACMEAccount 27m (x4 over 28m) cert-manager Failed to verify ACME account: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
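Before concluding anything from that, a quick way to check whether pods in the cluster can reach the ACME endpoint at all would be something like the following (the pod name and image are just my choices, not anything cert-manager requires):

# Run a throwaway pod and fetch the ACME directory from inside the cluster
kubectl run -it --rm acme-check --image=curlimages/curl --restart=Never \
    --command -- curl -sv https://acme-v02.api.letsencrypt.org/directory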
Because of that account-verification failure, my CertificateRequest and Certificate resources stay perpetually in a "pending" state.
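For completeness, this is roughly how I have been walking the chain of resources to see where things stop (assuming the cert-manager CRDs are registered under their usual names; the challenge name is the long generated one from the log above):

kubectl describe certificate [snip]-uat-certificate
kubectl get certificaterequests,orders,challenges
kubectl describe challenge [snip]-certificate-2096248848-189663135-2951658629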
This is happening during initial cluster creation. My configuration for cert-manager and the ingress is as follows:
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-uat
spec:
  acme:
    email: cert-manager+uat@[snip]
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-uat-private-key
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: [snip]-uat-certificate
spec:
  secretName: [snip]-uat-tls-cert
  duration: 2160h
  renewBefore: 360h
  commonName: [snip]
  dnsNames:
  - [snip]
  issuerRef:
    name: letsencrypt-uat
    kind: ClusterIssuer
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: [snip]-uat-tls-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: letsencrypt-uat
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/affinity: "cookie"
spec:
  rules:
  - host: [snip]
    http:
      paths:
      - backend:
          serviceName: [snip]-uat-webapp-service
          servicePort: 80
  tls:
  - hosts:
    - [snip]
    secretName: [snip]-uat-tls-cert
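Given the above, a rough sketch of how I intend to inspect the failing self check (the solver names are taken from the log at the top; the token in the URL is a placeholder for whatever the Challenge's spec shows):

# Fetch the same URL that the propagation check is hitting
curl -v http://[snip]/.well-known/acme-challenge/<token-from-challenge-spec>

# Inspect the temporary solver resources that cert-manager created
kubectl describe ingress cm-acme-http-solver-br7d2
kubectl describe service cm-acme-http-solver-qsvbv
kubectl describe pod cm-acme-http-solver-b6k59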
I am on a GKE private cluster and have therefore also been unable to run the webhook component. The documentation seems to imply that it's OK, though not recommended, to run this way.
I also note that the documentation mentions adding a firewall rule so that the webhook can work, and I wonder whether that is relevant here: the error above seems to indicate some kind of networking (firewall?) issue.
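If a firewall rule does turn out to be needed, I imagine it would look roughly like this (the rule name, network, node tag, master CIDR and port are all guesses on my part, to be replaced with the cluster's real values and whatever port the webhook actually listens on):

gcloud compute firewall-rules create gke-master-to-cert-manager-webhook \
    --network <cluster-network> \
    --direction INGRESS \
    --source-ranges <master-ipv4-cidr> \
    --target-tags <gke-node-tag> \
    --allow tcp:10250   # assuming the webhook serves on 10250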
Environment details:
- GKE: 1.14.7-gke.10
- Kubernetes: v1.16.2 (I think)
- cert-manager: 0.11.0
- Installed with kubectl
Do I need to configure a firewall rule, perhaps?
Many thanks, Ben
Edit 1:
The "dial tcp: i/o timeout" is a red herring. That error persists only as long as the DNS takes to initialise with my cluster. I am also coming closer to the conclusion that the propagation error is simply LetsEncrypt DNS not seeing my domain associated with my IP address (yet).
Is it correct that I should use an A record here? I made the DNS update around an hour ago; is there any way I can see what Let's Encrypt's DNS sees?
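For reference, this is roughly how I have been checking propagation from the public resolvers and from the zone's authoritative nameservers ([snip] stands for my redacted domain; as far as I understand, Let's Encrypt does its own recursive lookups rather than relying on a public resolver's cache, so the authoritative answer is the one that matters):

# What the big public resolvers currently return
dig +short A [snip] @8.8.8.8
dig +short A [snip] @1.1.1.1

# What the zone's own nameservers return (closest to what Let's Encrypt will see)
dig +short NS [snip]
dig +short A [snip] @<one-of-the-ns-hosts-from-above>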