Attempted to install: JFrog Artifactory HA
Platform: GCE Kubernetes cluster on CoreOS; 1 master, 2 workers
Installation method: Helm chart
Helm steps taken:

  1. Add the JFrog repo to local Helm: helm repo add jfrog https://charts.jfrog.io
  2. Install the license as a Kubernetes secret in the cluster: kubectl create secret generic artifactory-cluster-license --from-file=./art.lic
  3. Install via Helm: helm install --name artifactory-ha jfrog/artifactory-ha --set artifactory.masterKey=,artifactory.license.secret=artifactory-cluster-license,artifactory.license.dataKey=art.lic

Result:

The Helm installation completed without complaint. I checked the services and they seemed fine; the LoadBalancer was pending and then came online.

Checked PVs and PVCs, seemed to be fine and bound:

    NAME                                             STATUS
    artifactory-ha-postgresql                        Bound
    volume-artifactory-ha-artifactory-ha-member-0    Bound
    volume-artifactory-ha-artifactory-ha-primary-0   Bound

Checked the pods and only postgres was ready:

    NAME                                         READY   STATUS     RESTARTS   AGE
    artifactory-ha-artifactory-ha-member-0       0/1     Running    0          3m
    artifactory-ha-artifactory-ha-primary-0      0/1     Running    0          3m
    artifactory-ha-nginx-697844f76-jt24s         0/1     Init:0/1   0          3m
    artifactory-ha-postgresql-676999df46-bchq9   1/1     Running    0          3m

I waited a few minutes with no change, then 2 hours, and the pods were still in the same state. I checked the logs of the artifactory-ha-artifactory-ha-primary-0 pod (they are quite long, but I can post them if that would help anyone diagnose the problem) and noticed this error:

    SEVERE: One or more listeners failed to start. Full details will be found in the appropriate container log file

I couldn't think of where else to check for logs. The services were running, and the other pods seemed to be waiting on this primary pod.

The log continues with:

    SEVERE: Context [/artifactory] startup failed due to previous errors

and then, after the "ACCESS" ASCII art, starts spewing Java stack dumps with messages like:

    WARNING: The web application [artifactory] appears to have started a thread named [Thread-5] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
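On the question of where else to look: besides `kubectl logs`, the pod's Kubernetes events and the log files Artifactory writes inside its container often say more than the console output. The sketch below is a minimal, hypothetical helper (`collect_artifactory_diagnostics` and `run` are illustrative names, not part of any tool). The in-container directory `/var/opt/jfrog/artifactory/logs` is an assumption based on the Artifactory 6.x Docker image and may differ for other versions. With `DRY_RUN=1` it only prints the commands.

```shell
#!/bin/sh
# Print each command; only execute it when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Hypothetical helper: gather the usual diagnostics for one pod.
collect_artifactory_diagnostics() {
  pod="$1"
  # Container state, probe failures and recent events for the pod.
  run kubectl describe pod "$pod"
  # stdout/stderr of the container (where the SEVERE lines above came from).
  run kubectl logs "$pod"
  # All events in the namespace, oldest first.
  run kubectl get events --sort-by=.metadata.creationTimestamp
  # ASSUMPTION: Artifactory 6.x writes its own log files (including the
  # Tomcat/catalina logs the SEVERE message refers to) under this directory.
  run kubectl exec "$pod" -- ls -R /var/opt/jfrog/artifactory/logs
}
```

For example, `collect_artifactory_diagnostics artifactory-ha-artifactory-ha-primary-0`; once a log file of interest shows up in the listing, `kubectl exec` with `cat` on that path prints it.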

I ended up leaving the cluster up overnight, and now, about 12 hours later, I'm very surprised to see that the "primary" pod did actually come online:

    NAME                                         READY   STATUS        RESTARTS   AGE
    artifactory-ha-artifactory-ha-member-0       1/1     Terminating   0          19m
    artifactory-ha-artifactory-ha-member-1       0/1     Terminating   0          17m
    artifactory-ha-artifactory-ha-primary-0      1/1     Running       0          3h
    artifactory-ha-nginx-697844f76-vsmzq         0/1     Running       38         3h
    artifactory-ha-postgresql-676999df46-gzbpm   1/1     Running       0          3h

The nginx pod, however, did not come up. Its init container command (until nc -z -w 2 artifactory-ha 8081 && echo artifactory ok; do) eventually succeeded, but the pod cannot pass its readiness probe:

    Warning Unhealthy 1m (x428 over 3h) kubelet, spczufvthh-worker-1 Readiness probe failed: Get http://10.2.2.45:80/artifactory/webapp/#/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Did I miss a required step in the setup, or a Helm installation flag? This is my first attempt at setting up JFrog Artifactory HA, and most of the instructions I found seem to be for bare-metal clusters, so perhaps I confused something.

Any help is appreciated!

Answer:

It turned out we had gotten a couple of things wrong and misunderstood how the install process works. Hopefully this helps someone in the future.

1) The masterKey value needs to be at least 16 characters long. We had initially used a key that was too short. We then tried installing again with a new masterKey stored in a secret instead, but...

2) The values in the secrets appear to be read only once, on the initial install attempt. They are then written to the persistent volume, and updating the secret afterwards seems to have no effect.

3) We also didn't understand the license key format and constraints. You need a license for every node that will run Artifactory, and all the licenses go into a single file, separated from each other by a blank line (two newlines).
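Points 1 and 3 can both be handled before running `helm install`. A minimal sketch, assuming each node's license key has been saved to its own file (file names like `license-node1.txt` are hypothetical) and that a 32-character hex key is acceptable; that comfortably clears the 16-character minimum described above, but the chart's own README is the authority on the exact requirement. `generate_master_key` and `build_license_file` are illustrative helper names.

```shell
#!/bin/sh
# Hypothetical helpers for preparing the Helm inputs.

# Print a random master key: 16 random bytes as 32 lowercase hex characters.
# (`openssl rand -hex 16` gives the same format if OpenSSL is installed.)
generate_master_key() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Combine several single-license files into one file, with a blank line
# (two newlines) between consecutive licenses.
# Usage: build_license_file OUTPUT LICENSE_FILE...
build_license_file() {
  out="$1"
  shift
  : > "$out"                  # start with an empty output file
  first=1
  for lic in "$@"; do
    if [ "$first" -eq 0 ]; then
      printf '\n' >> "$out"   # extra newline -> blank separator line
    fi
    # $(cat ...) drops trailing newlines, so each license ends with exactly one.
    printf '%s\n' "$(cat "$lic")" >> "$out"
    first=0
  done
}
```

For example, `MASTER_KEY=$(generate_master_key)` and `build_license_file art.lic license-node1.txt license-node2.txt`, after which the `artifactory-cluster-license` secret can be created from `art.lic` exactly as in step 2 of the question. Normalizing trailing newlines keeps the separator a single blank line even if some license files lack a final newline.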

The error logs were not much help in diagnosing any of these problems. We eventually wiped out the install, including the PVs, reinstalled, and this time everything went fine.
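Because the secret values are persisted on the first install, changing the master key or license effectively means starting over. A minimal sketch of that wipe-and-reinstall, assuming Helm 2 (consistent with `helm install --name` in the question), the release, secret and PVC names shown in the question, a combined license file at `./art.lic`, and a sufficiently long key already exported as `MASTER_KEY`. `reinstall_artifactory_ha` and `run` are hypothetical helper names. It deletes the persisted Artifactory and PostgreSQL data, so only use it on an install you are prepared to lose, and run it with `DRY_RUN=1` first to review the commands.

```shell
#!/bin/sh
# Print each command; only execute it when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Hypothetical helper: wipe the Helm 2 release and its persisted state,
# then install again. Expects MASTER_KEY to be set and ./art.lic to hold
# the combined license file.
reinstall_artifactory_ha() {
  : "${MASTER_KEY:?MASTER_KEY must be set}"

  # Remove the release and its history (Helm 2 syntax).
  run helm delete --purge artifactory-ha

  # Delete the claims holding the old configuration (names from the PVC
  # listing in the question; add any others `kubectl get pvc` shows, and any
  # Helm already removed will just report NotFound). With dynamically
  # provisioned disks the reclaim policy is usually Delete, so the
  # underlying PVs go away with their claims.
  run kubectl delete pvc artifactory-ha-postgresql \
    volume-artifactory-ha-artifactory-ha-member-0 \
    volume-artifactory-ha-artifactory-ha-primary-0

  # Recreate the license secret from the rebuilt license file.
  run kubectl delete secret artifactory-cluster-license
  run kubectl create secret generic artifactory-cluster-license --from-file=./art.lic

  # Install again with the new master key.
  run helm install --name artifactory-ha jfrog/artifactory-ha \
    --set artifactory.masterKey="$MASTER_KEY",artifactory.license.secret=artifactory-cluster-license,artifactory.license.dataKey=art.lic
}
```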