1 vote

I installed a Hadoop cluster using the click-to-deploy mechanism in the Developers Console. I made some modifications to the default setup, e.g. the machine type and the number of machines. The cluster deploys fine.

But now when I log into the master and run the following command:

sudo gcloud compute firewall-rules list

I get this error: Insufficient Permission

I checked the permissions of the master node and I see this:

Permissions

  • User info: Disabled
  • Compute: Disabled
  • Storage: Full
  • Task queue: Disabled
  • BigQuery: Disabled
  • Cloud SQL: Disabled
  • Cloud Datastore: Disabled
  • Cloud Platform: Disabled

When I launch an individual VM I can enable these permissions, but when I launch a cluster I am not able to. Is this why I am seeing the permission error on the Hadoop master?

How can it be fixed?

More background: I need to open the firewall port so that I can see the status of a job at http://<ip>:50030/


3 Answers

2 votes

Your GCE instance needs read permission to Compute Engine, granted via its service account, to be able to list resources such as firewall rules with the Cloud SDK (i.e. gcloud compute) within your project. Typically, instances are only granted read permission to Google Cloud Storage by default. You can find more information about using the Cloud SDK tools with service accounts here: https://cloud.google.com/compute/docs/authentication#tools
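
If you want to check which scopes the instance's service account was actually granted, you can query the metadata server from the instance itself:

# Run on the instance: lists the OAuth scopes granted to the
# instance's default service account.
curl -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"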

Once an instance has been created, the service account scopes associated with it cannot be modified; these scopes can only be granted at creation time.
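
For example, to grant read access to Compute Engine at creation time you would pass the scopes explicitly. This is a hypothetical sketch; the instance name, zone, and scope aliases below are illustrative, not taken from your deployment:

# "compute-ro" grants read access to Compute Engine; "storage-full"
# keeps full access to Google Cloud Storage.
gcloud compute instances create my-new-instance \
    --zone us-central1-a \
    --scopes compute-ro,storage-full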

Alternatively, you can authenticate the Cloud SDK with your own user credentials (as opposed to the service account) by typing the following on the instance and then following the instructions:

gcloud auth login --no-launch-browser
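
A quick way to confirm which account is now active:

gcloud auth list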

None of this is directly related to modifying firewall rules. There is a comprehensive guide to manipulating firewall rules using the Cloud SDK here:

https://cloud.google.com/sdk/gcloud/reference/compute/firewall-rules/create
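
As a rough sketch of such a rule (the rule name, project ID, source range, and target tag below are placeholders, not values from your deployment), you could open the JobTracker UI port only to a trusted address range:

# Allow TCP port 50030 from a trusted range to tagged instances only.
gcloud compute firewall-rules create allow-jobtracker-ui \
    --project "my-project" \
    --allow tcp:50030 \
    --source-ranges "203.0.113.0/24" \
    --target-tags "hadoop-master"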

2 votes

The problem with opening insecure ports

Note: this is a general problem, not one limited to Hadoop specifically.

Opening up ports is not a good idea here because the Hadoop status pages are served via HTTP (not HTTPS), which means they are sent in plain text. Worse, anyone on the Internet could then reach your instance and view or take control of your Hadoop jobs, your cluster, or the data they contain.

Solution alternatives

Instead, you should send all traffic over an encrypted channel: either HTTPS/SSL, or an SSH tunnel that carries your browser traffic.

To my knowledge, Hadoop does not serve HTTPS at this time, so what you can do is create an SSH tunnel and browse via that secure tunnel.

The benefits of this solution are:

  • it's secure: all communication between your browser and the VM instance is over your SSH connection, so even if the connection is using HTTP instead of HTTPS, it's still secure from external users
  • you can connect to hostnames (i.e., your VM names directly), as if they were on your local network, e.g., http://my-host:5392
  • you can connect to any port on any host, without having to open each and every port individually

Complete guide to connecting securely to GCE VMs

See the "Securely Connecting to VM Instances" guide for more details beyond SOCKS proxy guide below, including firewalls, HTTPS and SSL, port forwarding over SSH, SOCKS proxy over SSH, bastion hosts, VPNs, NATs, etc.

Connecting securely via SSH tunnel + SOCKS proxy

The way to do this is to set up a SOCKS proxy which will use an SSH tunnel to secure your communication with the Hadoop cluster on GCE. You can either use the full script or create your own as follows:

#!/bin/bash

# Modify these variables to match your deployment.
export PROJECT="curious-lemming-42"  # Google Cloud Platform project ID
export ZONE="us-central1-a"          # zone of the Hadoop cluster
export PORT="9000"                   # port on the local machine to run the
                                     # proxy; just choose an open port
export SERVER="my-instance"          # any VM instance in the cluster

# This command starts the SOCKS proxy on $PORT.
gcloud compute ssh \
    --project="${PROJECT}" \
    --zone="${ZONE}" \
    --ssh-flag="-D" \
    --ssh-flag="${PORT}" \
    --ssh-flag="-N" \
    "${SERVER}"

Open a new terminal on your local machine (not on a GCE VM) and run this script there. While this script is running, you will have a secure proxy set up to your Hadoop cluster over SSH.

Then, assuming you're using Google Chrome, you can use the following script (again on your local machine, not on a GCE VM) to connect securely to your Hadoop cluster:

#!/bin/bash

# This port must match the port in the other script above.
declare -r PORT="9000"

# Create a directory for the proxy profile to separate it from the others.
# You can change this directory if you wish.
declare -r CHROME_PROXY_PROFILE="${HOME}/chrome-proxy-profile"
if ! [ -d "${CHROME_PROXY_PROFILE}" ]; then
  mkdir -p "${CHROME_PROXY_PROFILE}"
fi

# Run a new instance of Chrome using the custom proxy profile.
declare -r OS_NAME="$(uname -s)"
if [[ "${OS_NAME}" == "Linux" ]]; then
  /usr/bin/google-chrome \
      --user-data-dir="${CHROME_PROXY_PROFILE}" \
      --proxy-server="socks5://localhost:${PORT}"
elif [[ "${OS_NAME}" == "Darwin" ]]; then
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
      --user-data-dir="${CHROME_PROXY_PROFILE}" \
      --proxy-server="socks5://localhost:${PORT}"
else
  echo "Unrecognized OS: ${OS_NAME}" >&2
  exit 1
fi

If you would like to use Firefox instead, see these directions; that setup cannot be scripted at this time.

Background and details on how and why this works

You can read more elsewhere about what SSH tunneling is and how it works.

0 votes

Firewall rules on the GCE platform are defined at the network level, rather than per VM. Within each firewall rule you can specify, among other things, the instances it applies to. A very handy way to do this is with tags: assign the same tag to a VM or a group of VMs (or a cluster), and the rule will apply to all VMs labeled with that tag.
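
For example (a sketch; the instance name, tag, and zone are placeholders), tagging an existing VM so that rules targeting that tag apply to it looks like this:

# Add a tag to a running instance; firewall rules with
# --target-tags hadoop-cluster will then cover it.
gcloud compute instances add-tags my-worker-instance \
    --tags hadoop-cluster \
    --zone us-central1-a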

By default, traffic between instances within the same network is unfiltered, and only a select few ports are filtered from the VMs to the open internet. Not so for incoming connections: for those you have to define firewall rules that open ports to incoming traffic and specify the destinations of those connections, as mentioned above.

The permissions message you are receiving is due to the fact that you are trying to access your project from a system (the cluster master) that neither has the machine permissions (defined by the VM's service account) to modify your project (in this case, the firewall ruleset), nor is currently logged in with your user/owner credentials, which would be entitled to perform the same task. It is also not required: you can define the ruleset from the comfort of your own workstation, using the SDK / gcutil commands, as long as you are logged into your user/owner account there.
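
In other words, the command that failed on the master should work from your workstation, where the Cloud SDK holds your owner credentials (the project ID below is a placeholder):

gcloud compute firewall-rules list --project "my-project"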

In your specific case, TCP port 50030 is accessible from all VMs within the same network by default. To access that port from the open internet, you would have to define a project-wide firewall rule that allows the incoming connection. Also be aware that the cluster deployment system already adds several rules for the cluster.

The most convenient way to view and administer firewall rules is through the Developers Console.