
I have an all-in-one setup with my controller and compute services running on the same node. All my nova and other dependent services are up and running. However, when I try to launch an instance, the state of the nova-compute service changes to down. Because of this, the instance is stuck in the spawning state.

> [root@localhost nova(keystone_admin)]# nova service-list
> +----+------------------+-----------------------+----------+---------+-------+----------------------------+-----------------+
> | Id | Binary           | Host                  | Zone     | Status  | State | Updated_at                 | Disabled Reason |
> +----+------------------+-----------------------+----------+---------+-------+----------------------------+-----------------+
> | 6  | nova-cert        | localhost.localdomain | internal | enabled | up    | 2016-11-04T07:24:32.000000 | -               |
> | 7  | nova-consoleauth | localhost.localdomain | internal | enabled | up    | 2016-11-04T07:24:32.000000 | -               |
> | 8  | nova-scheduler   | localhost.localdomain | internal | enabled | up    | 2016-11-04T07:24:33.000000 | -               |
> | 9  | nova-conductor   | localhost.localdomain | internal | enabled | up    | 2016-11-04T07:24:33.000000 | -               |
> | 11 | nova-compute     | localhost.localdomain | nova     | enabled | **down** | 2016-11-04T06:43:03.000000 | -               |
> | 12 | nova-console     | localhost.localdomain | internal | enabled | up    | 2016-11-04T07:24:32.000000 | -               |

====

[root@localhost nova(keystone_admin)]# systemctl status openstack-nova-compute.service -l
● openstack-nova-compute.service - OpenStack Nova Compute Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-11-04 12:08:54 IST; 49min ago
 Main PID: 37586 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
           └─37586 /usr/bin/python2 /usr/bin/nova-compute

Nov 04 12:08:46 localhost.localdomain systemd[1]: Starting OpenStack Nova Compute Server...
Nov 04 12:08:53 localhost.localdomain nova-compute[37586]: Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
Nov 04 12:08:53 localhost.localdomain nova-compute[37586]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Nov 04 12:08:54 localhost.localdomain systemd[1]: Started OpenStack Nova Compute Server.

========

The systemd status of the nova-compute process looks perfectly fine. My RabbitMQ service is also running.

FYI,

[root@localhost nova(keystone_admin)]# systemctl status rabbitmq-server
● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/rabbitmq-server.service.d
           └─limits.conf
   Active: active (running) since Thu 2016-11-03 12:32:08 IST; 24h ago
 Main PID: 1835 (beam.smp)
   CGroup: /system.slice/rabbitmq-server.service
           ├─1835 /usr/lib64/erlang/erts-5.10.4/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq --...
           ├─1964 /usr/lib64/erlang/erts-5.10.4/bin/epmd -daemon
           ├─5873 inet_gethost 4
           └─5875 inet_gethost 4

Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: {user,<<"guest">>,
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: [administrator],
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: rabbit_auth_backend_internal,...},
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: <<"/">>,
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: [{<<...>>,...},{...}],
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: <0.14812.0>,<0.14816.0>]}},
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: {restart_type,intrinsic},
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: {shutdown,4294967295},
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: {child_type,worker}]}]}}
Nov 04 12:13:12 localhost.localdomain rabbitmq-server[1835]: function_clause

=======

[root@localhost nova(keystone_admin)]# netstat -anp | grep 5672 | grep 37586
tcp 0 0 10.1.10.22:55628 10.1.10.22:5672 ESTABLISHED 37586/python2
tcp 0 0 10.1.10.22:56204 10.1.10.22:5672 ESTABLISHED 37586/python2
tcp 0 0 10.1.10.22:56959 10.1.10.22:5672 ESTABLISHED 37586/python2

=====

37586 is the nova-compute process ID.

I have checked the logs for nova-compute, nova-api and nova-conductor, and there are no errors.

The nova-scheduler logs, however, contain errors saying that connections to RabbitMQ and to the database service were refused:

2016-11-03 12:24:50.930 2092 ERROR nova.servicegroup.drivers.db DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '10.1.10.22' ([Errno 111] ECONNREFUSED)")
2016-11-03 12:24:53.811 2092 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 10.1.10.22:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 16 seconds.

=======

Can someone suggest how I should handle this? Since everything is on the same node, why are these services not reachable?

What configuration do you use for the database connection? (Check /etc/nova/nova.conf.) - RichArt
Are you sure that you can have the controller and the compute on the same node? Maybe DevStack would be a better solution for you? - RichArt
Yes, controller and compute can be on the same node. I have been using this setup for the last 3 months and it has worked like a charm, so I am also sure that the configuration is correct. - gaurav parashar
I suspect the cause is some iptables rules I had configured earlier to accept traffic on ports 5672 and 3306. After the restart I ran iptables-restore again, but the issue is still not solved. - gaurav parashar

1 Answer


If nova-compute is reported as down, there are two possible reasons:

a. nova-compute is actually down
b. it cannot communicate with rabbit, or nova-conductor cannot communicate with rabbit.
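Some background on what "down" means here: as far as I know, the State column is derived from the Updated_at heartbeat, and a service is shown as down once that heartbeat is older than service_down_time. A rough sketch of that check, assuming the default service_down_time of 60 seconds (is_up below is an illustrative helper, not nova's actual function):

```python
from datetime import datetime, timedelta

# Assumption: default service_down_time of 60 seconds; nova.conf may override it.
SERVICE_DOWN_TIME = timedelta(seconds=60)
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def is_up(last_heartbeat, now, threshold=SERVICE_DOWN_TIME):
    """Return True if the last heartbeat is recent enough to count as "up"."""
    elapsed = datetime.strptime(now, TIME_FORMAT) - datetime.strptime(
        last_heartbeat, TIME_FORMAT
    )
    return abs(elapsed) <= threshold


# Updated_at values copied from the `nova service-list` output in the question.
# The newest heartbeat in the table stands in for the current time.
now = "2016-11-04T07:24:33.000000"
print("nova-cert   ", is_up("2016-11-04T07:24:32.000000", now))
print("nova-compute", is_up("2016-11-04T06:43:03.000000", now))
```

With the values from your table, nova-compute's last heartbeat is about 41 minutes old, so it is reported as down even though the process itself is running.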

As far as I can see in your logs, you have an issue with rabbit: "10.1.10.22:5672 is unreachable". Check that rabbit is listening on this IP/port, and check that you can connect to rabbit from the compute host. I usually use nc 10.1.10.22 5672 to see whether a connection can be established.
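If nc is not installed, the same check can be done with a few lines of Python. This is only a sketch: can_connect is a helper name I made up, and it only tells you whether the TCP port accepts connections, not whether AMQP authentication works.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or routing errors
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For your setup you would call can_connect("10.1.10.22", 5672) for rabbit and can_connect("10.1.10.22", 3306) for the database. A False result on the same node usually points at the service not listening on that address or at a firewall rule rejecting the connection.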

Check that the nova settings for rabbit are correct. Example of correct settings (adjust the host, credentials, and virtual host to your deployment):

[DEFAULT]
rpc_backend=rabbit
rabbit_host=rabbitmq-ip-here
rabbit_port=5672
rabbit_hosts=$rabbit_host:$rabbit_port
rabbit_use_ssl=false
rabbit_userid=guest
rabbit_password=guest
rabbit_login_method=AMQPLAIN
rabbit_virtual_host=/compute
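To see which of these values nova.conf really contains, something like the following can help. It is a sketch under assumptions: rabbit_settings is a hypothetical helper, and besides [DEFAULT] it also looks in [oslo_messaging_rabbit], where I believe newer releases expect the rabbit_* options. The password is deliberately not read.

```python
import configparser

# Assumption: rabbit options may be in [DEFAULT] or in [oslo_messaging_rabbit].
RABBIT_SECTIONS = ("DEFAULT", "oslo_messaging_rabbit")
RABBIT_OPTIONS = ("rabbit_host", "rabbit_port", "rabbit_userid", "rabbit_virtual_host")


def rabbit_settings(conf_text):
    """Return the rabbit options found in the given nova.conf text."""
    # A dummy default_section makes [DEFAULT] behave like an ordinary section,
    # so its values are not silently inherited by every other section.
    parser = configparser.RawConfigParser(default_section="__unused__", strict=False)
    parser.read_string(conf_text)
    found = {}
    for section in RABBIT_SECTIONS:
        if not parser.has_section(section):
            continue
        for option in RABBIT_OPTIONS:
            if parser.has_option(section, option):
                found[section + "/" + option] = parser.get(section, option)
    return found


# Sample text for illustration only; the address is the one from the question's logs.
sample = """
[DEFAULT]
rpc_backend=rabbit
rabbit_host=10.1.10.22
rabbit_port=5672
"""
print(rabbit_settings(sample))
```

On the real host you would pass in the contents of /etc/nova/nova.conf and compare the host and port with what rabbit is actually listening on.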

Check the logs in /var/log/nova/*.log.
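When going through the logs, it helps to pull out just the connection-refused errors with their timestamps. A small sketch, assuming the log line layout shown in the question (timestamp, PID, level, logger name, message); refused_errors is an illustrative name:

```python
import re

# Assumed layout: "<date> <time> <pid> ERROR <logger> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \d+ ERROR "
    r"(?P<logger>\S+) (?P<msg>.*)$"
)


def refused_errors(lines):
    """Return (timestamp, logger) for every ERROR line mentioning ECONNREFUSED."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "ECONNREFUSED" in match.group("msg"):
            hits.append((match.group("ts"), match.group("logger")))
    return hits


# The two scheduler log lines quoted in the question.
SAMPLE = """
2016-11-03 12:24:50.930 2092 ERROR nova.servicegroup.drivers.db DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '10.1.10.22' ([Errno 111] ECONNREFUSED)")
2016-11-03 12:24:53.811 2092 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 10.1.10.22:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 16 seconds.
"""
for timestamp, logger in refused_errors(SAMPLE.splitlines()):
    print(timestamp, logger)
```

Comparing the timestamps is worthwhile: both errors you quoted are from 2016-11-03 12:24, while systemctl shows rabbitmq-server active since 12:32:08 that day, so they may only come from nova starting before the broker at boot. Look for entries closer to the time nova-compute stopped reporting.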

Set debug=true in the [DEFAULT] section of nova.conf and restart the nova services to get more detailed logs.