I've been deploying clusters in GCP via Ansible scripts for more than a year now, but all of a sudden one of my scripts keeps giving me this error:

libcloud.common.google.GoogleBaseError: u"The zone 'projects/[project]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later."

The obvious reason would be that I don't have enough resources, but not a whole lot has changed and quotas look good: quotas

The ansible script itself doesn't ask for a lot. I'm creating 3 instances of n1-standard-4 with 100GB SSD. See snippet of script below:

tasks:
    - name: create boot disks
      gce_pd:
          disk_type: pd-ssd
          image: "debian-9-stretch-v20171025"
          name: "{{ item.node }}-disk"
          size_gb: 100
          state: present
          zone: "europe-west1-d"
          service_account_email: "{{ service_account_email }}"          
          credentials_file: "{{ credentials_file }}"
          project_id: "{{ project_id }}"          
      with_items: "{{nodes}}"
      async: 3600
      poll: 2

    - name: create instances
      gce:        
        instance_names: "{{item.node}}"
        zone: "europe-west1-d"
        machine_type: "n1-standard-4"        
        preemptible: "{{ false if item.num == '0' else true }}"        
        disk_auto_delete: true
        disks:
          - name: "{{ item.node }}-disk"
            mode: READ_WRITE
        state: present
        service_account_email: "{{ service_account_email }}"
        service_account_permissions: "compute-rw"
        credentials_file: "{{ credentials_file }}"
        project_id: "{{ project_id }}"
        tags: "elasticsearch"        
      register: gce_raw_results
      with_items: "{{nodes}}"
      async: 3600
      poll: 2

Update 1:

  • The service account is an editor of the entire project, so a permissions issue seems unlikely.
  • It started happening on March 24, 2018, and has happened every night since then. If it's an 'out of stock' issue, that would be very coincidental, right? Besides, I have been running this script all day today and it fails most of the time (see below for a success).
  • I've tested a few times, and it might have something to do with the 'preemptible' flag on the instance. (I start 3 nodes, but at least the first one has to stay up for the cluster to work) => preemptible: "{{ false if item.num == '0' else true }}" If I turn preemptible off (false), it runs without a hitch. The 'workaround' seems to be to simply not use preemptible instances, but this worked for a year without failing once. Did something change? Did GCP's API change? Did Ansible's gce module not implement these changes?

The full error is:

TASK [Gathering Facts] ****************************************************************************************************************************************************************************************************************************************************************************************************** ok: [localhost]

TASK [create boot disks] **************************************************************************************************************************************************************************************************************************************************************************************************** changed: [localhost] => (item={u'node': u'elasticsearch-link-0', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'0', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'}) changed: [localhost] => (item={u'node': u'elasticsearch-link-1', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'1', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'}) ok: [localhost] => (item={u'node': u'elasticsearch-link-2', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'2', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})

TASK [create instances] ***************************************************************************************************************************************************************************************************************************************************************************************************** changed: [localhost] => (item={u'node': u'elasticsearch-link-0', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'0', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'}) changed: [localhost] => (item={u'node': u'elasticsearch-link-1', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'1', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'}) failed: [localhost] (item={u'node': u'elasticsearch-link-2', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'2', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'}) => {"ansible_job_id": "371957735383.2688", "changed": false, "cmd": "/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/gce.py", "data": "", "failed": 1, "finished": 1, "item": {"cluster_name": "elasticsearch-link", "ip_field": "private_ip", "machine_type": "n1-standard-4", "node": "elasticsearch-link-2", "num": "2", "project_id": "[projectid]", "zone": "europe-west1-d"}, "msg": "Traceback (most recent call last):\n File \"/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/async_wrapper.py\", line 158, in _run_module\n (filtered_outdata, json_warnings) = _filter_non_json_lines(outdata)\n File \"/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/async_wrapper.py\", line 99, in _filter_non_json_lines\n raise ValueError('No start of json char found')\nValueError: No start of json char found\n", "stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 750, in \n 
main()\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 712, in main\n module, gce, inames, number)\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 524, in create_instances\n instance, lc_machine_type, lc_image(), **gce_args\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 3874, in create_node\n self.connection.async_request(request, method='POST', data=node_data)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 784, in async_request\n response = request(**kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 121, in request\n response = super(GCEConnection, self).request(*args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 806, in request\n *args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 641, in request\n response = responseCls(**kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 163, in init\n self.object = self.parse_body()\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 268, in parse_body\n raise GoogleBaseError(message, self.status, code)\nlibcloud.common.google.GoogleBaseError: u\"The zone 'projects/[projectid]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.\"\n", "stderr_lines": ["Traceback (most recent call last):", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 750, in ", " main()", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 712, in main", "
module, gce, inames, number)", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 524, in create_instances", " instance, lc_machine_type, lc_image(), **gce_args", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 3874, in create_node", "
self.connection.async_request(request, method='POST', data=node_data)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 784, in async_request", " response = request(**kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 121, in request", " response = super(GCEConnection, self).request(*args, **kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 806, in request", " *args, **kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 641, in request", " response = responseCls(**kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 163, in init", " self.object = self.parse_body()", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 268, in parse_body", " raise GoogleBaseError(message, self.status, code)", "libcloud.common.google.GoogleBaseError: u\"The zone 'projects/[projectid]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.\""]} to retry, use: --limit @/usr/local/airflow/ansible/playbooks/elasticsearch-link-cluster-create.retry

There may be a temporary stockout for that zone on one of the resources you request. As stated by the error message, you can either try in another zone or try again later. - LundinCast
Well, it has happened every night since last week or so, so it seems persistent. - Tom Lous

1 Answer

The error message does not indicate a quota problem but rather a shortage of resources in the zone. I would advise you to try a different zone.

Quoting from the documentation:

Even if you have a regional quota, it is possible that a resource might not be available in a specific zone. For example, you might have quota in region us-central1 to create VM instances, but might not be able to create VM instances in the zone us-central1-a if the zone is depleted. In such cases, try creating the same resource in another zone, such as us-central1-f.

Therefore, your script should take this possibility into account, even though it is not very common.

This issue is even more pronounced for preemptible instances, since:

Preemptible instances are finite Compute Engine resources, so they might not always be available. [...] these instances if it requires access to those resources for other tasks. Preemptible instances are excess Compute Engine capacity so their availability varies with usage.

UPDATE

To double-check this, keep the preemptible flag and change the zone. That will confirm that the script itself works properly and that a stockout is happening during the evening (since it works during the day, this is likely the case).

  • If the issue really is availability, you might consider spinning up a preemptible instance and, if none is available, catching the error and then falling back either to a normal instance or to a different zone.
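
A minimal sketch of that fallback idea in Python, not tied to the Ansible gce module. Everything here is an assumption for illustration: `create_instance` is a hypothetical callable standing in for whatever actually creates the VM, and `ZoneExhaustedError` is a hypothetical stand-in for the "does not have enough resources" error (with libcloud you would presumably catch `GoogleBaseError` and inspect its message instead).

```python
class ZoneExhaustedError(Exception):
    """Hypothetical stand-in for the zone 'not enough resources' error."""


def create_with_fallback(create_instance, name, zones, allow_standard=True):
    """Create one instance, preferring preemptible capacity.

    create_instance(name, zone, preemptible) is a hypothetical callable that
    raises ZoneExhaustedError when the zone has no capacity.
    Returns the (zone, preemptible) pair of the attempt that succeeded.
    """
    # First try a preemptible instance in every candidate zone ...
    attempts = [(zone, True) for zone in zones]
    # ... then, if allowed, fall back to a normal instance in the same zones.
    if allow_standard:
        attempts += [(zone, False) for zone in zones]

    for zone, preemptible in attempts:
        try:
            create_instance(name, zone, preemptible)
            return zone, preemptible
        except ZoneExhaustedError:
            continue  # stockout: move on to the next zone / instance type

    raise ZoneExhaustedError("no capacity for %s in any of %s" % (name, zones))


if __name__ == "__main__":
    # Fake creator simulating a preemptible stockout in europe-west1-d only.
    def fake_create(name, zone, preemptible):
        if preemptible and zone == "europe-west1-d":
            raise ZoneExhaustedError(zone)

    zones = ["europe-west1-d", "europe-west1-b"]
    print(create_with_fallback(fake_create, "elasticsearch-link-2", zones))
```

The order of `attempts` is a design choice: this version prefers staying preemptible in another zone over paying for a normal instance in the original zone. Swap the order if keeping all nodes in one zone matters more than cost.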

UPDATE2

As promised, I created the feature request on your behalf; you can follow the updates on the public tracker. I advise you to star it in order to receive updates by email: