Deploying a Hadoop 2.7.7 Cluster on Linux


程序员欣宸 2022-08-15

Welcome to my GitHub

All of Xinchen's original posts (with companion source code) are categorized and indexed here: https://github.com/zq2599/blog_demos

  • The steps for deploying a hadoop 2.7.7 cluster on CentOS 7 are as follows:
  1. Machine planning;
  2. Linux settings;
  3. Creating the user and group;
  4. Passwordless SSH setup;
  5. File downloads;
  6. JDK setup;
  7. Creating the folders hadoop will use;
  8. hadoop configuration;
  9. Formatting hdfs;
  10. Starting hadoop;
  11. Verifying hadoop;
  • Let's work through them step by step;
  • This walkthrough uses three CentOS 7 machines, with the following roles:

IP address       hostname  role
192.168.119.163  node0     NameNode, ResourceManager, HistoryServer
192.168.119.164  node1     DataNode, NodeManager
192.168.119.165  node2     DataNode, NodeManager, SecondaryNameNode

Linux settings (do this on all three machines)

  • Edit /etc/hostname and set its content on the three machines to node0, node1, and node2 respectively;
  • Edit /etc/hosts and append the following three lines:
192.168.119.163 node0
192.168.119.164 node1
192.168.119.165 node2
  • Stop the firewall and disable it at boot:
systemctl stop firewalld.service && systemctl disable firewalld.service
  • Disable SELinux: open /etc/selinux/config, find the SELINUX entry, and change it to SELINUX=disabled (a non-interactive alternative is sketched below);
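
A non-interactive way to make the same edit (a sketch, assuming the file currently reads SELINUX=enforcing), run as root:
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config   # takes effect after reboot
setenforce 0                                                          # stop enforcement for the current session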

Creating the user and group

  • Run the following command to create the user and group:
groupadd hadoop && useradd -d /home/hadoop -g hadoop -m hadoop
  • After creating the account, remember to initialize the hadoop account's password with the passwd command, as shown below;
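
For example, as root:
passwd hadoop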

Passwordless SSH setup
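
  • The exact passwordless-SSH commands are not preserved in this copy of the article; a minimal sketch of the usual approach, run as the hadoop user on node0 (and repeated on node1 and node2 if you want passwordless login in every direction):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # generate a key pair with no passphrase
ssh-copy-id hadoop@node0                   # authorize the key on all three nodes, including the local one
ssh-copy-id hadoop@node1
ssh-copy-id hadoop@node2
ssh node1 hostname                         # verify: prints "node1" without a password prompt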

Switching to the hadoop account

  • From this point on, all operations on the three machines are performed as the hadoop account; the root account is no longer used;
  • Download the JDK archive jdk-8u191-linux-x64.tar.gz into the hadoop account's home directory;
  • Download the hadoop archive hadoop-2.7.7.tar.gz into the hadoop account's home directory;
  • Once the downloads finish, the home directory looks like this (a possible download source is noted after the listing):
[hadoop@node0 ~]$ ls ~
hadoop-2.7.7.tar.gz  jdk-8u191-linux-x64.tar.gz
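
If you need a download source: the hadoop tarball can be fetched from the Apache archive as below, while the JDK tarball has to come from Oracle's site, which requires a login:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz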

JDK setup (do this on all three machines)

  • Unpack the jdk-8u191-linux-x64.tar.gz file:
tar -zxvf ~/jdk-8u191-linux-x64.tar.gz
  • Open ~/.bash_profile and append the following:
export JAVA_HOME=/home/hadoop/jdk1.8.0_191
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
  • Run source ~/.bash_profile to make the JDK settings take effect;
  • Run java -version to confirm the setup succeeded:
[hadoop@node0 ~]$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

Creating the folders hadoop will use (do this on all three machines)

  • Create the folders that hadoop will need later:
mkdir -p ~/work/tmp/dfs/name && mkdir -p ~/work/tmp/dfs/data

hadoop configuration

  • Log in to node0 as the hadoop account;
  • Unpack the hadoop archive:
tar -zxvf hadoop-2.7.7.tar.gz
  • Go to the directory ~/hadoop-2.7.7/etc/hadoop;
  • Edit hadoop-env.sh, mapred-env.sh, and yarn-env.sh in turn, making sure each of them sets JAVA_HOME correctly, as follows:
export JAVA_HOME=/home/hadoop/jdk1.8.0_191
  • Edit core-site.xml: find the configuration node and change it to the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node0:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/work/tmp</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
  </property>
</configuration>
  • Edit hdfs-site.xml: find the configuration node and change it to the following, which makes node2 the secondary namenode:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node2:50090</value>
  </property>
</configuration>
  • Edit the slaves file: delete the "localhost" entry and add these two lines:
node1
node2
  • Edit yarn-site.xml: find the configuration node and change it to the following:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node0</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
  </property>
</configuration>
  • Rename mapred-site.xml.template to mapred-site.xml:
mv mapred-site.xml.template mapred-site.xml
  • Edit mapred-site.xml: find the configuration node and change it to the following:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node0:19888</value>
  </property>
</configuration>
  • Sync the whole hadoop-2.7.7 directory to node1's home directory:
scp -r ~/hadoop-2.7.7 hadoop@node1:~/
  • Sync the whole hadoop-2.7.7 directory to node2's home directory:
scp -r ~/hadoop-2.7.7 hadoop@node2:~/

Formatting hdfs

  • On node0, run the following command to format hdfs:
~/hadoop-2.7.7/bin/hdfs namenode -format

Starting hadoop

  • On node0, run the following command to start hdfs:
~/hadoop-2.7.7/sbin/start-dfs.sh
  • On node0, run the following command to start yarn:
~/hadoop-2.7.7/sbin/start-yarn.sh
  • On node0, run the following command to start the ResourceManager:
~/hadoop-2.7.7/sbin/yarn-daemon.sh start resourcemanager
  • On node0, run the following command to start the job history (log) service:
~/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver
  • Once startup succeeds, run jps on node0 to check the java processes:
[hadoop@node0 ~]$ jps
3253 JobHistoryServer
2647 NameNode
3449 Jps
2941 ResourceManager
  • Run jps on node1 to check the java processes:
[hadoop@node1 ~]$ jps
2176 DataNode
2292 NodeManager
2516 Jps
  • Run jps on node2 to check the java processes:
[hadoop@node2 ~]$ jps
1991 DataNode
2439 Jps
2090 SecondaryNameNode
2174 NodeManager
  • At this point hadoop has started successfully; an optional extra check is sketched below;
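
As an extra check (not part of the original steps), the dfsadmin report should show both DataNodes as live:
~/hadoop-2.7.7/bin/hdfs dfsadmin -report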

Verifying hadoop

  • Now run the classic WordCount job once to check that hadoop is working properly;
  • Log in to node0 as the hadoop account and create a file test.txt in the home directory with the following content (one way to create it is sketched after the listing):
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
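
One way to create the file is with a heredoc:
cat > ~/test.txt <<'EOF'
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
EOF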
  • Create a folder on hdfs:
~/hadoop-2.7.7/bin/hdfs dfs -mkdir /input
  • Upload test.txt to the /input directory on hdfs:
~/hadoop-2.7.7/bin/hdfs dfs -put ~/test.txt /input
  • Run the wordcount program that ships with the hadoop distribution:
~/hadoop-2.7.7/bin/yarn \
jar ~/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar \
wordcount \
/input/test.txt \
/output
  • The console output is as follows:
[hadoop@node0 ~]$ ~/hadoop-2.7.7/bin/yarn \
> jar ~/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar \
> wordcount \
> /input/test.txt \
> /output
19/02/08 14:34:28 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.119.164:8032
19/02/08 14:34:29 INFO input.FileInputFormat: Total input paths to process : 1
19/02/08 14:34:29 INFO mapreduce.JobSubmitter: number of splits:1
19/02/08 14:34:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1549606965916_0001
19/02/08 14:34:30 INFO impl.YarnClientImpl: Submitted application application_1549606965916_0001
19/02/08 14:34:30 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1549606965916_0001/
19/02/08 14:34:30 INFO mapreduce.Job: Running job: job_1549606965916_0001
19/02/08 14:34:36 INFO mapreduce.Job: Job job_1549606965916_0001 running in uber mode : false
19/02/08 14:34:36 INFO mapreduce.Job:  map 0% reduce 0%
19/02/08 14:34:41 INFO mapreduce.Job:  map 100% reduce 0%
19/02/08 14:34:46 INFO mapreduce.Job:  map 100% reduce 100%
19/02/08 14:34:46 INFO mapreduce.Job: Job job_1549606965916_0001 completed successfully
19/02/08 14:34:46 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=94
		FILE: Number of bytes written=245525
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=168
		HDFS: Number of bytes written=60
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2958
		Total time spent by all reduces in occupied slots (ms)=1953
		Total time spent by all map tasks (ms)=2958
		Total time spent by all reduce tasks (ms)=1953
		Total vcore-milliseconds taken by all map tasks=2958
		Total vcore-milliseconds taken by all reduce tasks=1953
		Total megabyte-milliseconds taken by all map tasks=3028992
		Total megabyte-milliseconds taken by all reduce tasks=1999872
	Map-Reduce Framework
		Map input records=4
		Map output records=11
		Map output bytes=115
		Map output materialized bytes=94
		Input split bytes=97
		Combine input records=11
		Combine output records=7
		Reduce input groups=7
		Reduce shuffle bytes=94
		Reduce input records=7
		Reduce output records=7
		Spilled Records=14
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=93
		CPU time spent (ms)=1060
		Physical memory (bytes) snapshot=430956544
		Virtual memory (bytes) snapshot=4203192320
		Total committed heap usage (bytes)=285212672
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=71
	File Output Format Counters 
		Bytes Written=60
  • Check the output:
~/hadoop-2.7.7/bin/hdfs dfs -ls /output
  • There are two files under hdfs's /output directory:
[hadoop@node0 ~]$ ~/hadoop-2.7.7/bin/hdfs dfs -ls /output
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2019-02-08 14:34 /output/_SUCCESS
-rw-r--r--   3 hadoop supergroup         60 2019-02-08 14:34 /output/part-r-00000
  • Take a look at the content of part-r-00000:
[hadoop@node0 ~]$ ~/hadoop-2.7.7/bin/hdfs dfs -cat /output/part-r-00000
hadoop	3
hbase	1
hive	2
mapreduce	1
spark	2
sqoop	1
storm	1
  • WordCount ran successfully and the result matches expectations; see the re-run note below;
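
To re-run the job, the /output directory must not already exist (otherwise the job fails with an "output directory already exists" error), so remove it first:
~/hadoop-2.7.7/bin/hdfs dfs -rm -r /output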

  • The hdfs web UI, shown below, lets you see file information; address: http://192.168.119.163:50070

    [screenshot: hdfs web UI]
  • The yarn web UI, shown below, lets you see job information; address: http://192.168.119.163:8088

    [screenshot: yarn web UI]
  • That completes the setup and verification of the hadoop 2.7.7 cluster; I hope it offers some useful reference when you build your own environment;

Welcome to follow my 51CTO blog: 程序员欣宸

On the road of learning you are not alone; Xinchen's originals will keep you company all the way…
