My company is transitioning our operations to Google Cloud, and we have several instances running in Google Compute Engine. I have now had three instances (running Ubuntu 14.04) where I lose the ability to SSH in after weeks of everything working fine. Here is the output from several attempts to connect:
SSH from another instance on the same network:
ssh: connect to host 130.211.137.231 port 22: Connection refused
SSH from the Google Dev Console:
We are unable to connect to the VM on port 22. Learn more about possible causes of this issue.
SSH from PuTTY client: Network error: Connection refused
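If it matters, the serial console output is still retrievable without SSH with something like the following (the instance name and zone here are placeholders for ours):

# pull the serial console output for the affected instance
gcloud compute instances get-serial-port-output broken-instance --zone us-central1-a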
The most recent time this happened, the instance was still running. I have an NFS-shared directory that FTP'd files get written to, and those files are still being updated, so NFS is still mounted and exported, and cron jobs are still running.
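For reference, the export can still be queried from another instance with something like this (the internal address below is a placeholder for the broken instance's):

# confirm the broken instance is still serving its NFS export
showmount -e 10.240.0.5
rpcinfo -p 10.240.0.5 | grep -w nfs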
Running nmap from another instance on the same network gives the following:
vwadmin@vw-server:~$ nmap -Pn 130.211.137.231
Starting Nmap 6.40 ( http://nmap.org ) at 2015-03-09 15:41 UTC
Nmap scan report for 231.137.211.130.bc.googleusercontent.com (130.211.137.231)
Host is up (0.0019s latency).
Not shown: 997 filtered ports
PORT STATE SERVICE
22/tcp closed ssh
3389/tcp closed ms-wbt-server
8008/tcp closed http
Nmap done: 1 IP address (1 host up) scanned in 4.18 seconds
vwadmin@vw-server:~$
SSH was lost sometime late Friday evening. On Saturday evening I created a snapshot of the drive for troubleshooting. Looking at the log files, syslog and auth.log both stopped being written to on Friday evening (I'm guessing around the time we lost SSH).

Where/what should I be looking for in the system logs that could stop logs from being written and close all ports, yet leave NFS working and cron jobs running fine? This particular instance, which has broken twice, is currently only running a handful of lftp-type cron jobs. Please keep in mind that this is a cloud environment, so SSH is my only way into the instance itself; all I can do right now is look through the logs from the snapshot.
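In case it's useful, this is roughly how I'm getting at those logs from the snapshot (the disk/snapshot names, zone, and device path below are placeholders): the snapshot becomes a new disk, that disk gets attached to a healthy instance, and it is mounted read-only.

# turn the troubleshooting snapshot into a disk and attach it to a working instance
gcloud compute disks create debug-disk --source-snapshot broken-instance-snap --zone us-central1-a
gcloud compute instances attach-disk vw-server --disk debug-disk --zone us-central1-a

# on vw-server: find the new device (often /dev/sdb) and mount it read-only
lsblk
sudo mkdir -p /mnt/debug
sudo mount -o ro /dev/sdb1 /mnt/debug
less /mnt/debug/var/log/syslog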