I've been deploying clusters in GCP via Ansible scripts for more than a year now, but all of a sudden one of my scripts keeps giving me this error:
    libcloud.common.google.GoogleBaseError: u"The zone 'projects/[project]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later."
The obvious reason would be that I don't have enough resources, but not a whole lot has changed and my quotas look good.
The Ansible script itself doesn't ask for a lot. I'm creating 3 n1-standard-4 instances, each with a 100 GB SSD boot disk. See the snippet of the script below:
    tasks:
      - name: create boot disks
        gce_pd:
          disk_type: pd-ssd
          image: "debian-9-stretch-v20171025"
          name: "{{ item.node }}-disk"
          size_gb: 100
          state: present
          zone: "europe-west1-d"
          service_account_email: "{{ service_account_email }}"
          credentials_file: "{{ credentials_file }}"
          project_id: "{{ project_id }}"
        with_items: "{{nodes}}"
        async: 3600
        poll: 2
      - name: create instances
        gce:
          instance_names: "{{item.node}}"
          zone: "europe-west1-d"
          machine_type: "n1-standard-4"
          preemptible: "{{ false if item.num == '0' else true }}"
          disk_auto_delete: true
          disks:
            - name: "{{ item.node }}-disk"
              mode: READ_WRITE
          state: present
          service_account_email: "{{ service_account_email }}"
          service_account_permissions: "compute-rw"
          credentials_file: "{{ credentials_file }}"
          project_id: "{{ project_id }}"
          tags: "elasticsearch"
        register: gce_raw_results
        with_items: "{{nodes}}"
        async: 3600
        poll: 2
Update 1:
- The service account is an editor of the entire project, so a permissions issue seems unlikely.
- It started happening on March 24, 2018, and has happened every night since then. If it's an 'out of stock' issue, that would be very coincidental, right? Besides, I have been running this script the entire day so far and it fails most of the time (see below for success).
- I've tested a few times and it might have something to do with the 'preemptible' flag on the instance. I start 3 nodes, and at least the first one has to stay up for the cluster to work, so only the first is non-preemptible:
preemptible: "{{ false if item.num == '0' else true }}"
If I turn off preemptible (set it to false), the script runs without a hitch. The 'workaround' seems to be simply not using preemptible instances, but this worked for a year without failing once. Did something change? Did GCP's API change? Has the Ansible gce module not picked up such a change?
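To make the preemptible expression concrete, here is a minimal Python sketch of how it resolves for the three nodes that appear in the log output. The helper name `is_preemptible` and the trimmed `nodes` list are illustrative only and not part of the playbook, and the sketch assumes the Jinja2 conditional behaves like the equivalent Python expression for this string comparison.

```python
# Hypothetical, trimmed-down copy of the playbook's `nodes` variable.
# Only the fields used by the preemptible expression are kept.
nodes = [
    {"node": "elasticsearch-link-0", "num": "0"},
    {"node": "elasticsearch-link-1", "num": "1"},
    {"node": "elasticsearch-link-2", "num": "2"},
]


def is_preemptible(item):
    """Python mirror of the template: false if item.num == '0' else true."""
    # `num` is a string in the playbook data, so it is compared to '0'.
    return False if item["num"] == "0" else True


# Map each node name to the preemptible flag it would be created with.
preemptible_by_node = {item["node"]: is_preemptible(item) for item in nodes}

for name, flag in preemptible_by_node.items():
    print(f"{name}: preemptible={flag}")
```

So only elasticsearch-link-0 is requested as a regular instance. In the run shown in the full error, the node that fails (elasticsearch-link-2) is one of the two requested as preemptible, while elasticsearch-link-1, also preemptible, was created successfully.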
The full error is:
    TASK [Gathering Facts] ********************************************************
    ok: [localhost]

    TASK [create boot disks] ******************************************************
    changed: [localhost] => (item={u'node': u'elasticsearch-link-0', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'0', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
    changed: [localhost] => (item={u'node': u'elasticsearch-link-1', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'1', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
    ok: [localhost] => (item={u'node': u'elasticsearch-link-2', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'2', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})

    TASK [create instances] *******************************************************
    changed: [localhost] => (item={u'node': u'elasticsearch-link-0', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'0', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
    changed: [localhost] => (item={u'node': u'elasticsearch-link-1', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'1', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
    failed: [localhost] (item={u'node': u'elasticsearch-link-2', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'2', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'}) => {"ansible_job_id": "371957735383.2688", "changed": false, "cmd": "/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/gce.py", "data": "", "failed": 1, "finished": 1, "item": {"cluster_name": "elasticsearch-link", "ip_field": "private_ip", "machine_type": "n1-standard-4", "node": "elasticsearch-link-2", "num": "2", "project_id": "[projectid]", "zone": "europe-west1-d"}, "msg": "Traceback (most recent call last):\n File \"/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/async_wrapper.py\", line 158, in _run_module\n (filtered_outdata, json_warnings) = _filter_non_json_lines(outdata)\n File \"/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/async_wrapper.py\", line 99, in _filter_non_json_lines\n raise ValueError('No start of json char found')\nValueError: No start of json char found\n", "stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 750, in <module>\n main()\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 712, in main\n module, gce, inames, number)\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 524, in create_instances\n instance, lc_machine_type, lc_image(), **gce_args\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 3874, in create_node\n self.connection.async_request(request, method='POST', data=node_data)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 784, in async_request\n response = request(**kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 121, in request\n response = super(GCEConnection, self).request(*args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 806, in request\n *args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 641, in request\n response = responseCls(**kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 163, in __init__\n self.object = self.parse_body()\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 268, in parse_body\n raise GoogleBaseError(message, self.status, code)\nlibcloud.common.google.GoogleBaseError: u\"The zone 'projects/[projectid]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.\"\n", "stderr_lines": ["Traceback (most recent call last):", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 750, in <module>", " main()", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 712, in main", " module, gce, inames, number)", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 524, in create_instances", " instance, lc_machine_type, lc_image(), **gce_args", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 3874, in create_node", " self.connection.async_request(request, method='POST', data=node_data)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 784, in async_request", " response = request(**kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 121, in request", " response = super(GCEConnection, self).request(*args, **kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 806, in request", " *args, **kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 641, in request", " response = responseCls(**kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 163, in __init__", " self.object = self.parse_body()", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 268, in parse_body", " raise GoogleBaseError(message, self.status, code)", "libcloud.common.google.GoogleBaseError: u\"The zone 'projects/[projectid]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.\""]}
            to retry, use: --limit @/usr/local/airflow/ansible/playbooks/elasticsearch-link-cluster-create.retry