

Hadoop Distributed Cluster Deployment
source link: https://blog.51cto.com/liangww/5087643

I. Cluster Environment Overview
1. Host Plan
IP Address    Role     HDFS                          YARN
10.6.2.237    master   NameNode, DataNode            NodeManager
10.6.2.239    slave    DataNode, SecondaryNameNode   NodeManager
10.6.2.241    slave    DataNode                      ResourceManager, NodeManager
2. Notes on the Plan
- Three servers is the minimum for a distributed-cluster deployment exercise.
- A "Hadoop cluster" in the broad sense actually comprises two clusters, HDFS and YARN. When "Hadoop cluster" appears below it mostly means this broad sense; keep the distinction in mind.
- Some of the roles in the table above can be moved to other hosts, and some are optional; nor can every big-data role be covered here. What you deploy ultimately depends on the requirements of your environment.
- An HDFS master node normally stores only metadata, not business data, so 10.6.2.237 does not have to be a DataNode. Likewise, the SecondaryNameNode does not have to run on 10.6.2.239; it could be placed on 10.6.2.237 or 10.6.2.241, or omitted entirely.
- By the same token, the YARN ResourceManager does not have to run on 10.6.2.241; it could run on 10.6.2.237 or 10.6.2.239. Unlike the SecondaryNameNode, however, this role is mandatory: it is a key component of the YARN cluster.
- The plan lists only the Hadoop and YARN roles. The ZooKeeper coordination service is not shown, but it must be deployed on all three nodes, forming a ZooKeeper cluster.
- The plan above tries to "balance" resources, with three service processes running on every server. A real environment would not look like this; in particular, a master node would not store business data, so adapt the layout to your actual environment.
II. Pre-installation Preparation
1. Installation Steps
The installation is actually quite simple and breaks down into three major steps:
- Step 1: prepare the environment
- Step 2: install the ZooKeeper cluster
- Step 3: install the Hadoop cluster
2. System Initialization
Perform the following operations on all three nodes.
1) Install basic system packages
[root@hadoop01 ~]# yum -y install epel-release
[root@hadoop01 ~]# yum -y install net-tools gcc gcc-c++ lrzsz vim wget curl git zip unzip ntp telnet
2) Disable SELinux and the firewall
[root@hadoop01 ~]# setenforce 0
[root@hadoop01 ~]# sed -i 's/enforcing/disabled/' /etc/selinux/config
[root@hadoop01 ~]# systemctl stop firewalld && systemctl disable firewalld
3) Set hostnames and add name resolution
[root@hadoop01 ~]# hostnamectl set-hostname hadoop01
[root@hadoop02 ~]# hostnamectl set-hostname hadoop02
[root@hadoop03 ~]# hostnamectl set-hostname hadoop03
[root@hadoop01 ~]# cat << EOF >> /etc/hosts
> 10.6.2.237 hadoop01
> 10.6.2.239 hadoop02
> 10.6.2.241 hadoop03
> EOF
### Repeat the same operation on hadoop02 and hadoop03
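The hosts entries only need to be written once. A small sketch that collects them in a temporary file; the commented loop distributes them, and assumes the passwordless SSH configured in a later step:

```shell
# collect the three resolution entries in one file
cat > /tmp/hadoop-hosts <<'EOF'
10.6.2.237 hadoop01
10.6.2.239 hadoop02
10.6.2.241 hadoop03
EOF
# on the real cluster, once SSH trust is in place:
# for h in hadoop02 hadoop03; do
#   ssh "$h" 'cat >> /etc/hosts' < /tmp/hadoop-hosts
# done
wc -l < /tmp/hadoop-hosts
```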
4) Configure time synchronization
[root@hadoop01 ~]# \cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@hadoop01 ~]# ntpdate ntp.aliyun.com
[root@hadoop01 ~]# systemctl start ntpdate && systemctl enable ntpdate
3. Install and Configure the JDK
1) Install the JDK
[root@hadoop01 ~]# tar -zxvf jdk-8u131-linux-x64.tar.gz -C /usr/local/
[root@hadoop01 ~]# cd /usr/local/jdk1.8.0_131/bin
[root@hadoop01 ~]# ./java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
2) Configure environment variables
[root@hadoop01 ~]# cp /etc/profile /etc/profile.bak
[root@hadoop01 ~]# vim /etc/profile # append the two lines below at the end of the file
export JAVA_HOME=/usr/local/jdk1.8.0_131
export PATH=.:$PATH:$JAVA_HOME/bin
[root@hadoop01 ~]# source /etc/profile
[root@hadoop01 ~]# java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
4. Configure SSH Mutual Trust
1) Set up passwordless SSH login
[root@hadoop01 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@hadoop01 ~]# ssh-copy-id hadoop01
[root@hadoop01 ~]# ssh-copy-id hadoop02
[root@hadoop01 ~]# ssh-copy-id hadoop03
[root@hadoop02 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@hadoop02 ~]# ssh-copy-id hadoop01
[root@hadoop02 ~]# ssh-copy-id hadoop02
[root@hadoop02 ~]# ssh-copy-id hadoop03
[root@hadoop03 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@hadoop03 ~]# ssh-copy-id hadoop01
[root@hadoop03 ~]# ssh-copy-id hadoop02
[root@hadoop03 ~]# ssh-copy-id hadoop03
2) Test passwordless login
[root@hadoop01 ~]# ssh hadoop01
[root@hadoop01 ~]# ssh hadoop02
[root@hadoop01 ~]# ssh hadoop03
[root@hadoop02 ~]# ssh hadoop01
[root@hadoop02 ~]# ssh hadoop02
[root@hadoop02 ~]# ssh hadoop03
[root@hadoop03 ~]# ssh hadoop01
[root@hadoop03 ~]# ssh hadoop02
[root@hadoop03 ~]# ssh hadoop03
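The nine checks above can be collapsed into a loop run from each node in turn. This is a sketch: the ssh line is commented out, and BatchMode=yes would make a broken key setup fail fast instead of falling back to a password prompt:

```shell
NODES="hadoop01 hadoop02 hadoop03"
for h in $NODES; do
  echo "checking $h"
  # ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" hostname   # run on the cluster
done
```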
5. Install the ZooKeeper Cluster
1) Install the cluster
[root@hadoop01 ~]# tar -zxvf zookeeper-3.4.14.tar.gz -C /usr/local/
[root@hadoop01 ~]# mv /usr/local/zookeeper-3.4.14 /usr/local/zookeeper
[root@hadoop01 ~]# cd /usr/local/zookeeper/conf
[root@hadoop01 ~]# cp zoo_sample.cfg zoo.cfg
[root@hadoop01 ~]# vim zoo.cfg
......
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
[root@hadoop01 ~]# scp -r /usr/local/zookeeper hadoop02:/usr/local
[root@hadoop01 ~]# scp -r /usr/local/zookeeper hadoop03:/usr/local
[root@hadoop01 ~]# mkdir -p /data/zookeeper/{data,logs}
[root@hadoop01 ~]# scp -r /data hadoop02:/
[root@hadoop01 ~]# scp -r /data hadoop03:/
[root@hadoop01 ~]# echo "1" > /data/zookeeper/data/myid
[root@hadoop02 ~]# echo "2" > /data/zookeeper/data/myid
[root@hadoop03 ~]# echo "3" > /data/zookeeper/data/myid
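Each myid must match that node's server.N line in zoo.cfg (server.1=hadoop01, and so on). A loop sketch that makes the mapping explicit; the ssh line is commented and assumes passwordless root SSH from hadoop01:

```shell
i=1
for h in hadoop01 hadoop02 hadoop03; do
  echo "$h -> myid $i"
  # ssh "$h" "echo $i > /data/zookeeper/data/myid"   # run on the real cluster
  i=$((i+1))
done
```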
2) Start the cluster
[root@hadoop01 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop01 ~]# zkServer.sh start
[root@hadoop02 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop02 ~]# zkServer.sh start
[root@hadoop03 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop03 ~]# zkServer.sh start
3) Verify the cluster startup
[root@hadoop01 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop01 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop02 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop02 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop03 ~]# cd /usr/local/zookeeper/bin/
[root@hadoop03 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
4) Configure environment variables
[root@hadoop01 ~]# cat << 'EOF' >> /etc/profile
> export ZOOKEEPER_HOME=/usr/local/zookeeper
> export PATH=.:$PATH:$ZOOKEEPER_HOME/bin
> EOF
[root@hadoop01 ~]# source /etc/profile
[root@hadoop01 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
### Repeat the same operation on hadoop02 and hadoop03
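A healthy three-node ensemble reports exactly one leader and two followers. The check below counts leaders in the collected Mode: lines; here the lines reported above are pasted in directly, and the commented command shows how they would be collected over SSH:

```shell
modes='Mode: follower
Mode: follower
Mode: leader'
# on the real cluster:
# modes=$(for h in hadoop01 hadoop02 hadoop03; do
#   ssh "$h" '/usr/local/zookeeper/bin/zkServer.sh status 2>/dev/null | grep ^Mode:'
# done)
echo "$modes" | grep -c leader   # must be exactly 1
```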
III. Install and Deploy Hadoop
1. Upload and Extract the Package
[root@hadoop01 ~]# tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local/
[root@hadoop01 ~]# mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
[root@hadoop01 ~]# cd /usr/local/hadoop/etc/hadoop/
2. Edit the Configuration Files
[root@hadoop01 hadoop]# cp hadoop-env.sh hadoop-env.sh.default
[root@hadoop01 hadoop]# vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_131
[root@hadoop01 hadoop]# cp yarn-env.sh yarn-env.sh.default
[root@hadoop01 hadoop]# vim yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_131
[root@hadoop01 hadoop]# cp hdfs-site.xml hdfs-site.xml.default
[root@hadoop01 hadoop]# vim hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/hadoop/datanode</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop01:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop02:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
[root@hadoop01 hadoop]# cp yarn-site.xml yarn-site.xml.default
[root@hadoop01 hadoop]# vim yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop03:8025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop03:8030</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop03:8050</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop03:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop03:8088</value>
</property>
</configuration>
[root@hadoop01 hadoop]# cp core-site.xml core-site.xml.default
[root@hadoop01 hadoop]# vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/tmpdata</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>10240</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>1000</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
[root@hadoop01 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop01 hadoop]# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[root@hadoop01 hadoop]# cp slaves slaves.default
[root@hadoop01 hadoop]# vim slaves
hadoop01
hadoop02
hadoop03
3. Create the Data Directories
[root@hadoop01 ~]# cd /data # the data path defined earlier (ZooKeeper uses it too)
[root@hadoop01 data]# mkdir -p hadoop/{namenode,datanode,tmpdata}
4. Copy the Hadoop Directory to the Slave Nodes
[root@hadoop01 ~]# scp -r /usr/local/hadoop hadoop02:/usr/local
[root@hadoop01 ~]# scp -r /usr/local/hadoop hadoop03:/usr/local
[root@hadoop01 ~]# scp -r /data/hadoop hadoop02:/data
[root@hadoop01 ~]# scp -r /data/hadoop hadoop03:/data
5. Configure Hadoop Environment Variables
[root@hadoop01 ~]# cat << 'EOF' >> /etc/profile
> export HADOOP_HOME=/usr/local/hadoop
> export PATH=.:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
> EOF
[root@hadoop01 ~]# source /etc/profile
### All nodes need this configuration; repeat the same operation on the other two nodes
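One shell pitfall in this step: with an unquoted heredoc delimiter (<< EOF), $PATH and $HADOOP_HOME are expanded when the heredoc is written, and HADOOP_HOME is empty at that point, which corrupts the PATH line in /etc/profile. Quoting the delimiter (<< 'EOF') writes the text literally, so expansion happens only when the profile is sourced. A demonstration against scratch files:

```shell
unset HADOOP_HOME
# unquoted delimiter: $PATH and $HADOOP_HOME expand immediately (HADOOP_HOME to "")
cat > /tmp/unquoted <<EOF
export PATH=.:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
# quoted delimiter: the line is written verbatim
cat > /tmp/quoted <<'EOF'
export PATH=.:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
cat /tmp/quoted     # keeps the literal $HADOOP_HOME references
cat /tmp/unquoted   # HADOOP_HOME is already gone
```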
IV. Initialize and Start the Cluster
1. Initialize the Cluster (run on the designated master node)
[root@hadoop01 ~]# hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
......
21/06/02 08:06:47 INFO common.Storage: Storage directory /data/hadoop/namenode has been successfully formatted.
21/06/02 08:06:47 INFO namenode.FSImageFormatProtobuf: Saving image file /data/hadoop/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
21/06/02 08:06:47 INFO namenode.FSImageFormatProtobuf: Image file /data/hadoop/namenode/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
21/06/02 08:06:47 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/06/02 08:06:47 INFO util.ExitUtil: Exiting with status 0
21/06/02 08:06:47 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop01/10.6.2.237
************************************************************/
2. Start the Cluster
1) Start the HDFS cluster # (run on hadoop01)
[root@hadoop01 ~]# start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop01.out
hadoop01: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop01.out
hadoop03: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop03.out
hadoop02: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop02.out
Starting secondary namenodes [hadoop02]
hadoop02: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-hadoop02.out
2) Start the YARN cluster # (run on hadoop03)
[root@hadoop03 ~]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-hadoop03.out
hadoop01: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop01.out
hadoop03: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop03.out
hadoop02: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop02.out
3. Notes on the Commands
1) Start and stop commands
start-dfs.sh / stop-dfs.sh start and stop all HDFS daemons
start-yarn.sh / stop-yarn.sh start and stop all YARN daemons
2) Combined commands
start-all.sh / stop-all.sh are equivalent to running both pairs of commands above
3) Single-daemon commands (rarely used)
hadoop-daemon.sh start/stop XXX and yarn-daemon.sh start/stop XXX
4) Executable directories
Unlike a typical service, which installs a single bin directory holding its executables, a Hadoop deployment has two executable directories: $HADOOP_HOME/bin and $HADOOP_HOME/sbin. The cluster start/stop scripts live under sbin, while the Hadoop shell commands (hadoop, hdfs, yarn, and so on) live under bin.
Both paths were added to PATH when the Hadoop environment variables were configured (as shown in the previous section), so once that is done you no longer need to care about the exact locations.
4. Check the Cluster Processes
[root@hadoop01 ~]# jps
18720 DataNode
19089 Jps
18346 QuorumPeerMain
18587 NameNode
18892 NodeManager
[root@hadoop02 ~]# jps
15121 DataNode
15410 Jps
15219 SecondaryNameNode
15063 QuorumPeerMain
15272 NodeManager
[root@hadoop03 ~]# jps
17696 DataNode
18219 Jps
17808 ResourceManager
17625 QuorumPeerMain
17906 NodeManager
V. Verify via Browser
1. HDFS cluster UIs: http://10.6.2.237:50070/ (NameNode) and http://10.6.2.239:50090/ (SecondaryNameNode)
2. YARN cluster UI: http://10.6.2.241:8088/
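Beyond the web UIs, a short command-line smoke test confirms that HDFS accepts reads and writes. The HDFS path /smoke is illustrative; the hdfs commands are commented so the sketch only prepares the local file, and they should be run on a node where the Hadoop environment variables are loaded:

```shell
echo "hello hadoop" > /tmp/smoke.txt
# hdfs dfs -mkdir -p /smoke
# hdfs dfs -put -f /tmp/smoke.txt /smoke/
# hdfs dfs -cat /smoke/smoke.txt   # round-trips the file through HDFS
cat /tmp/smoke.txt
```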