TechNewSources: Build a Hadoop Cluster in AWS in Minutes

Thursday, December 1, 2016

I use Apache Hadoop to process huge data loads. Setting up Hadoop in a cloud provider, such as AWS, involves spinning up a bunch of EC2 instances, configuring nodes to talk to each other, installing software, configuring the master and data nodes' config files, and starting services. This was a good candidate for automation, considering the problems I wanted to solve:

1. How do I build the cluster in minutes (as opposed to hours and maybe even days for a large number of data nodes)?
2. How do I save money? With AWS, I need the ability to tear the cluster down when I'm not using it. If the build is not automated, I have to spend extra time rebuilding manually after each teardown.
3. Again, how do I save money? AWS provides some managed services to build a Hadoop cluster, but they offer limited options for the EC2 instance type (for example, a t2.micro instance is not an option).

So, I decided to build a solution that would allow me to quickly set up a Hadoop cluster in AWS with any number of nodes in a matter of minutes (as opposed to days if I were to build it manually). A fully tested, Python-based solution can be found here.

The solution can be summarized in two steps:

1. Creating AWS resources using CloudFormation.
2. Provisioning Hadoop on EC2 resources.

Creating AWS Resources

AWS CloudFormation provides an easy way to create and manage a pool of AWS resources. Simply upload a JSON file that describes your AWS resources, and the CloudFormation stack (collection of resources) is created. This is a subset of the full JSON to create the Hadoop cluster.
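To make the idea of a JSON resource description concrete, here is a minimal sketch in Python that generates a small CloudFormation template with one master node and a configurable number of data nodes. It is not the template from the linked solution: the function name `build_hadoop_template`, the logical IDs (`MasterNode`, `DataNode1`, ...), the AMI ID, the key pair name, and the SSH CIDR range are all hypothetical placeholders, and a real cluster would likely need more resources (for example, ingress rules for the Hadoop ports).

```python
import json


def build_hadoop_template(num_data_nodes, instance_type, image_id, key_name,
                          ssh_cidr="203.0.113.0/24"):
    """Return a CloudFormation template (as a dict) with one master and N data nodes.

    All names and values here are illustrative assumptions, not the
    author's actual template.
    """
    if num_data_nodes < 1:
        raise ValueError("need at least one data node")

    def instance(role):
        # One EC2 instance resource; every node shares the same security group.
        return {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": instance_type,
                "ImageId": image_id,
                "KeyName": key_name,
                "SecurityGroupIds": [
                    {"Fn::GetAtt": ["HadoopSecurityGroup", "GroupId"]}
                ],
                "Tags": [{"Key": "Role", "Value": role}],
            },
        }

    resources = {
        # Security group that only allows SSH from the given CIDR range.
        "HadoopSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "SSH access to Hadoop nodes",
                "SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                     "CidrIp": ssh_cidr}
                ],
            },
        },
        "MasterNode": instance("master"),
    }
    # Add one resource per data node: DataNode1, DataNode2, ...
    for i in range(1, num_data_nodes + 1):
        resources[f"DataNode{i}"] = instance("data")

    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical Hadoop cluster sketch",
        "Resources": resources,
    }


# Placeholder values; substitute a real AMI ID and key pair name.
template = build_hadoop_template(2, "t2.micro", "ami-00000000", "my-key")
template_json = json.dumps(template, indent=2)
print(template_json)
```

The resulting JSON could be uploaded through the CloudFormation console, or, assuming boto3 is installed and credentials are configured, passed to the CloudFormation client's `create_stack(StackName=..., TemplateBody=template_json)`. Calling `delete_stack(StackName=...)` on the same client tears the whole stack down, which is the step that saves money when the cluster is idle.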

https://dzone.com/articles/automate-building-hadoop-cluster-in-aws