1. Word count with the bundled example program
(1) The wordcount program
    The wordcount program ships with Hadoop under the share directory:
[root@leaf mapreduce]# pwd
/usr/local/hadoop/share/hadoop/mapreduce
[root@leaf mapreduce]# ls
hadoop-mapreduce-client-app-2.6.5.jar         hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
hadoop-mapreduce-client-common-2.6.5.jar      hadoop-mapreduce-client-shuffle-2.6.5.jar
hadoop-mapreduce-client-core-2.6.5.jar        hadoop-mapreduce-examples-2.6.5.jar
hadoop-mapreduce-client-hs-2.6.5.jar          lib
hadoop-mapreduce-client-hs-plugins-2.6.5.jar  lib-examples
hadoop-mapreduce-client-jobclient-2.6.5.jar   sources
The one we want is hadoop-mapreduce-examples-2.6.5.jar.
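The jar bundles several example drivers; wordcount is essentially the classic Hadoop WordCount from the official MapReduce tutorial. For reference, a minimal sketch of that program (the class actually shipped in the jar may differ in detail):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for every token on an input line, emit the pair (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum the 1s for each word. The same class doubles as the combiner.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregates counts map-side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Invoking the jar with wordcount <in> <out> dispatches to a driver like this; the combiner is worth noting, because it explains the counter values we will see in step (4).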
(2) Create the HDFS data directories
    Create a directory to hold the MapReduce job's input files:
[root@leaf ~]# hadoop fs -mkdir -p /data/wordcount
    Create a directory to hold the MapReduce job's output files:
[root@leaf ~]# hadoop fs -mkdir /output
    Verify the two directories just created:
[root@leaf ~]# hadoop fs -ls /
drwxr-xr-x   - root supergroup          0 2017-09-01 20:34 /data
drwxr-xr-x   - root supergroup          0 2017-09-01 20:35 /output
(3) Create a word file and upload it to HDFS
    The word file looks like this:
[root@leaf ~]# cat myword.txt 
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
    Upload the file to HDFS:
[root@leaf ~]# hadoop fs -put myword.txt /data/wordcount
    Check the uploaded file and its contents in HDFS:
[root@leaf ~]# hadoop fs -ls /data/wordcount
-rw-r--r--   1 root supergroup         57 2017-09-01 20:40 /data/wordcount/myword.txt
[root@leaf ~]# hadoop fs -cat /data/wordcount/myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
(4) Run the wordcount program
    Execute the following command (the log output is abridged to the final job report):
[root@leaf ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount
...
17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully
17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=585940
                FILE: Number of bytes written=1099502
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=114
                HDFS: Number of bytes written=48
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=97
                Map output materialized bytes=78
                Input split bytes=112
                Combine input records=10
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=78
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=92
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=241049600
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=57
        File Output Format Counters 
                Bytes Written=48
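The counters tally with our input: Map input records=5 is the five lines of myword.txt, and Map output records=10 the ten (word, 1) pairs the mapper emitted. Because the reducer doubles as a combiner, Combine input records=10 collapses to Combine output records=6 before the shuffle, and the reducer finally writes the six distinct words (Reduce output records=6). The job id job_local1719603087_0001 also shows this run used the LocalJobRunner rather than YARN.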
    
(5) View the word-count results
    As follows:
[root@leaf ~]# hadoop fs -cat /output/wordcount/part-r-00000
katy    2
leaf    2
ling    1
xpleaf  2
yeyonghao       1
yyh     2
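Each reduce task writes one part-r-NNNNN file into the output directory; with a single reducer there is only part-r-00000, and the default TextOutputFormat separates word and count with a tab. The words also come out sorted, a side effect of the shuffle sorting keys before they reach the reducer. If you need the result outside HDFS, hadoop fs -get /output/wordcount/part-r-00000 . copies it to the local working directory.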
Reposted from xpleaf's 51CTO blog; original link: http://blog.51cto.com/xpleaf/1962271. Please contact the original author for reprint permission.