2 votes

My company is transitioning our operations to Google Cloud and we have several instances running in Google Compute Engine. I have now had 3 instances (running Ubuntu 14.04) lose SSH access after weeks of everything working fine. Here is the output from multiple methods of trying to connect:

SSH from one session to another (same internal IP):

ssh: connect to host 130.211.137.231 port 22: Connection refused

SSH from Google Dev Console:

We are unable to connect to the VM on port 22. Learn more about possible causes of this issue.

SSH from PuTTY client : Network error: Connection refused

The most recent time this issue has happened, the instance is still running. I have an NFS shared directory that FTP'd files get written to, and they are still being updated, so NFS is still mounted and exported, and cron jobs are still running.

Running nmap from another instance on the same network gives the following:

vwadmin@vw-server:~$ nmap -Pn 130.211.137.231

Starting Nmap 6.40 ( http://nmap.org ) at 2015-03-09 15:41 UTC
Nmap scan report for 231.137.211.130.bc.googleusercontent.com (130.211.137.231)
Host is up (0.0019s latency).
Not shown: 997 filtered ports
PORT     STATE  SERVICE
22/tcp   closed ssh
3389/tcp closed ms-wbt-server
8008/tcp closed http

Nmap done: 1 IP address (1 host up) scanned in 4.18 seconds
vwadmin@vw-server:~$

SSH was lost sometime late Friday evening. On Saturday evening I created a snapshot of the drive for troubleshooting. Looking at the log files, syslog and auth.log both stopped being written to on Friday evening (I'm guessing around the time we lost SSH). Where/what should I be looking for in the system logs that could stop logs from being written and close all ports, yet still allow NFS to keep working and cron jobs to run fine? Please keep in mind that this is a cloud environment, so SSH is my only way into the instance itself; all I can do right now is look through the logs from the snapshot. This particular instance, which has broken twice, is currently only running a handful of lftp-type cron jobs.

Does rebooting the instance bring up SSH again? One option to troubleshoot would be creating a snapshot of the disk and attaching it to another instance. - Paolo P.
Have a look at this question's comments. As you are using Ubuntu 14.04, this is very likely to be due to fail2ban- or sshguard-like software rendering SSH unusable through poor iptables management. - Antxon
I have the same issue; has anyone found a solution? - SamK
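In case it helps anyone hitting this: below is a rough sketch of the snapshot-and-inspect approach suggested in the comments above, using the gcloud CLI. The disk, snapshot, instance, and zone names are placeholders, and the attached disk's device path varies (check lsblk first); this is only an outline, not a verified recovery procedure.

# Snapshot the broken instance's boot disk and attach a copy to a healthy instance
gcloud compute disks snapshot broken-disk --snapshot-names=debug-snap --zone=us-central1-a
gcloud compute disks create debug-disk --source-snapshot=debug-snap --zone=us-central1-a
gcloud compute instances attach-disk healthy-instance --disk=debug-disk --zone=us-central1-a

# On the healthy instance, mount the copy and look for what broke SSH
sudo mkdir -p /mnt/debug && sudo mount /dev/sdb1 /mnt/debug    # confirm the device with lsblk first
sudo tail -n 50 /mnt/debug/var/log/auth.log /mnt/debug/var/log/fail2ban.log
sudo cat /mnt/debug/etc/hosts.deny                             # denyhosts/sshguard-style blocks
sudo ls /mnt/debug/etc/iptables/ 2>/dev/null                   # persisted iptables rules, if any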

2 Answers

0 votes

I have faced this issue. I found two reasons that cause the connection refused error.

  1. Improper firewall rule: check whether the firewall rule for port 22 actually applies to your Compute Engine instance. Try giving a unique tag to your instance, then add that tag to the port 22 firewall rule in the "Target tags" field and save it (see the sketch below).
  2. Private key expired (I don't know why this occurred): try generating new keys using PuTTYgen, then copy the newly generated public key and paste it at Console -> Compute Engine -> VM instances -> (your instance name) -> Edit -> SSH keys. Make sure you have unchecked the "Block project-wide SSH keys" field and save it. Now save your private key and use that private key to SSH via PuTTY.

First try method 1 and check; if that doesn't work, then try method 2.
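For method 1, here is a rough sketch of the same firewall fix done with the gcloud CLI instead of the console; the tag name allow-ssh and the instance/zone names are just example placeholders:

# Tag the instance, then open tcp:22 only to instances carrying that tag
gcloud compute instances add-tags my-instance --tags=allow-ssh --zone=us-central1-a
gcloud compute firewall-rules create allow-ssh --allow=tcp:22 --target-tags=allow-ssh --source-ranges=0.0.0.0/0

# Confirm the rule exists and points at the right tag
gcloud compute firewall-rules list --filter="name=allow-ssh"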

Hope this solves your problem.

0 votes

Today I found an issue when tuning sysctl.conf (on a VM in Google Cloud (GCE)).

My edit

kernel.sem = 250 32000 100 128

kernel.shmmax = 17179869184
kernel.shmall = 4194304

GCloud default

kernel.sem = 32000 1024000000 500 32000

kernel.shmall = 18446744073692774399

kernel.shmmax = 18446744073692774399

After running

sysctl -p

it is OK. But if you reboot the system, the VM reads the tuned values from sysctl.conf at boot, and then you may not be able to connect to the VM at all (sometimes you cannot even ping or SSH to it).

Please be careful with these tuning values in sysctl.conf on Google Cloud (GCE) as well.
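If you suspect these settings on your own VM, here is a small sketch for checking and rolling them back (the file paths are the standard ones; the backup filename is arbitrary, and the default values are the ones quoted in this answer):

# See what the running kernel is currently using
sysctl kernel.sem kernel.shmmax kernel.shmall

# Back up sysctl.conf and comment out the custom lines so the next reboot keeps the GCE defaults
sudo cp /etc/sysctl.conf /etc/sysctl.conf.bak
sudo sed -i -E 's/^kernel\.(sem|shmmax|shmall)/# &/' /etc/sysctl.conf

# Or push the defaults from this answer straight back into the running kernel
sudo sysctl -w kernel.sem="32000 1024000000 500 32000"
sudo sysctl -w kernel.shmall=18446744073692774399
sudo sysctl -w kernel.shmmax=18446744073692774399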