What is Hadoop?
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
What is AWS?
Amazon Web Services (AWS) is a subsidiary of Amazon providing on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. These cloud computing web services provide a variety of basic abstract technical infrastructure and distributed computing building blocks and tools. One of these services is Amazon Elastic Compute Cloud (EC2), which allows users to have at their disposal a virtual cluster of computers, available all the time, through the Internet.
Let’s create a cluster:
✅ Launch one instance for the NameNode and as many as you want for DataNodes.
✅ Transfer the software to the instances.
✅ Install the software: Java and Hadoop.
✅ Configure one instance as the NameNode and the others as DataNodes.
Step 1: Launch instances for the cluster
Here I am launching instances from a RHEL AMI and allowing all traffic in the security group.
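If you prefer the command line to the web console, the same launch can be sketched with the AWS CLI. The AMI ID, key pair name, and security group name below are placeholders, not values from this walkthrough:

```shell
# Launch 1 NameNode + 3 DataNode instances from a RHEL AMI.
# ami-xxxxxxxx, my-key, and hadoop-sg are assumptions -- substitute your own.
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --count 4 \
    --instance-type t2.micro \
    --key-name my-key \
    --security-groups hadoop-sg
```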
Step 2: Transfer the software to the instances
Hadoop is built with Java, so Java must be installed before Hadoop. Here I am transferring both Java 1.8 and Hadoop to the instances using WinSCP.
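On Linux or macOS the same transfer can be done with scp instead of WinSCP. The key file, package file names, and instance address below are placeholders:

```shell
# Copy the Java and Hadoop packages to an instance (placeholder names/IP).
scp -i my-key.pem jdk-8u171-linux-x64.rpm hadoop-1.2.1-1.x86_64.rpm \
    ec2-user@<instance-public-ip>:/home/ec2-user/
```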
Step 3: Install the software: Java and Hadoop
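With the packages on the instance, the installation can be sketched as below. The exact package file names are assumptions; use whichever versions you transferred:

```shell
# Install Java first, then Hadoop (placeholder file names).
rpm -ivh jdk-8u171-linux-x64.rpm
rpm -ivh hadoop-1.2.1-1.x86_64.rpm

# Verify both are installed and on the PATH.
java -version
hadoop version
```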
Step 4: Configure one instance as the NameNode and the others as DataNodes
Create a new directory on the NameNode and on each DataNode where each will store its data.
Created on the NameNode
Created on the DataNode
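As a sketch, the storage directories can be created like this. The paths /nn and /dn are example names, not values fixed by this walkthrough:

```shell
# On the NameNode: directory for filesystem metadata (example path).
mkdir -p /nn

# On each DataNode: directory for the actual data blocks (example path).
mkdir -p /dn
```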
Configure the hdfs-site.xml and core-site.xml files on both the NameNode and the DataNodes. In these files, we have to set the required properties.
In core-site.xml, we have to add hdfs://<NameNodeIP>:<port> as the value of the fs.default.name property.
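Putting this together, a minimal Hadoop 1.x configuration might look like the fragments below. The IP address, port, and directory paths are example values; use your NameNode's address and the directories you created:

```xml
<!-- core-site.xml : same on NameNode and DataNodes; IP/port are examples -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>

<!-- hdfs-site.xml on the NameNode: where metadata is stored (example path) -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>

<!-- hdfs-site.xml on each DataNode: where blocks are stored (example path) -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
```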
# Run these commands to format the NameNode and start the services
hadoop namenode -format
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
# This command shows you the number of DataNodes connected to the NameNode, along with other cluster information
hadoop dfsadmin -report
Here we have successfully set up the Hadoop cluster.
Thank you for reading!!😇😇