Using OpenStack to manage Big Data virtual machines

Question

We installed some Big Data components such as Apache Hadoop, Spark, and Kafka on different virtual machines. To manage those VMs in a production environment (a few physical servers with local storage and no SAN storage), I want to use OpenStack. From reading the OpenStack documentation, I understand that it is made up of many different components, each with a specific purpose. In addition, OpenStack seems to need some mandatory separate nodes such as controller, compute and network (I'm not sure about the network node!). My questions are:

Which OpenStack components are needed for a Big Data deployment?
How many separate physical nodes (controller, compute and network) does OpenStack need to run in production (not counting the resource nodes)?
Can we run OpenStack in virtual machines, for example in VirtualBox, just for testing?

eandersson · Accepted Answer · 2018-09-16T08:34:25

This question is probably best asked over at ask.openstack.org, as it's a little off-topic for StackOverflow.

For OpenStack to work you basically need a handful of core components: Keystone (identity), Nova (compute), Neutron (networking) and Glance (images). These can all run on the same two or three boxes.
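As a rough illustration of that list, the Python sketch below checks a set of deployed service types against the four core components. The project-to-service-type mapping (identity, compute, network, image) is the standard one; the function name `missing_core_services` and the sample input are hypothetical and only there for the example.

```python
# Core OpenStack projects named in the answer, keyed by project name,
# with the service type each one provides.
CORE_SERVICES = {
    "keystone": "identity",
    "nova": "compute",
    "neutron": "network",
    "glance": "image",
}


def missing_core_services(deployed_types):
    """Return the core project names whose service type is not deployed."""
    deployed = set(deployed_types)
    return sorted(
        name for name, stype in CORE_SERVICES.items() if stype not in deployed
    )


if __name__ == "__main__":
    # Hypothetical list of service types from a partially installed test cloud.
    print(missing_core_services(["identity", "compute", "image"]))
```

On a real cloud the list of service types would come from the Keystone service catalog rather than a hard-coded list.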

Beyond the OpenStack components you will need RabbitMQ and MySQL. For production deployments these should ideally be clustered so that they can maintain quorum.
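To make the quorum point concrete: a majority-based cluster of n members needs floor(n/2) + 1 of them reachable, so it can lose n minus that many. The small Python sketch below only does that arithmetic. It describes majority quorums in general, not the configuration of any particular RabbitMQ or MySQL clustering setup, and the function names are made up for illustration.

```python
def quorum_size(nodes):
    """Smallest number of members that forms a strict majority."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many members can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    # Print cluster size, quorum size and tolerated failures side by side.
    for n in (1, 2, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Note that two nodes tolerate no failures at all, which is why three is the usual minimum when quorum matters.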

As for networking, there are many possible network layouts, and in general you don't need any extra network nodes. If you only need a flat network, this is relatively easy; if you need something more advanced, you may want to consult a network specialist.

You can indeed run OpenStack in a virtual machine using DevStack, but keep in mind that such a setup is intended for testing base functionality, not for testing the deployment of complicated services like Hadoop or Spark.
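For a throwaway test VM, DevStack reads its settings from a `local.conf` file in the DevStack checkout. The Python sketch below renders a minimal one. The four password variables follow the minimal configuration shown in the DevStack documentation; the helper name `render_local_conf` and the placeholder password are assumptions for illustration only.

```python
def render_local_conf(admin_password):
    """Build the text of a minimal DevStack local.conf."""
    lines = [
        "[[local|localrc]]",
        f"ADMIN_PASSWORD={admin_password}",
        # Reuse the admin password for the other services (test setups only).
        "DATABASE_PASSWORD=$ADMIN_PASSWORD",
        "RABBIT_PASSWORD=$ADMIN_PASSWORD",
        "SERVICE_PASSWORD=$ADMIN_PASSWORD",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "secret" is a placeholder; pick your own value for a real test VM.
    print(render_local_conf("secret"), end="")
```

With the output saved as `local.conf` in the DevStack directory, running `./stack.sh` starts the installation.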

It might be easier to use something like Kolla to set up a basic test environment. For testing purposes you only need a single node to host the control plane. Another alternative for deploying is Packstack.
