We need a python script that orchestrates Hadoop instances in OpenStack. The script shall be integrated into OpenStack Dashboard.
It works as follow:
1. In the existing OpenStack dashboard the user can submit his MapReduce Algorithm (jar file), specify a link to it's data, and chose the number of worker nodes.
2. After the users hits the start button, a prepared Hadoop master image will be launched by OpenStack.
3. After the master node is up, the specified number of hadoop worker nodes will be fired up.
4. Once the worker nodes are up and connected to the master the specified jar. file and data to process will be loaded to the master.
5. Hadoop will be executed.
6. The output file(s) can be downloaded for the OpenStack Dashboard.
OpenStack Version: Folsom
Hadoop (Cloudera): Latest
OS Instances: Ubuntu 12.04