1. Word count with the bundled example program
(1) The wordcount program
    The wordcount program ships with Hadoop under the share directory:
[root@leaf mapreduce]# pwd
/usr/local/hadoop/share/hadoop/mapreduce
[root@leaf mapreduce]# ls
hadoop-mapreduce-client-app-2.6.5.jar         hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
hadoop-mapreduce-client-common-2.6.5.jar      hadoop-mapreduce-client-shuffle-2.6.5.jar
hadoop-mapreduce-client-core-2.6.5.jar        hadoop-mapreduce-examples-2.6.5.jar
hadoop-mapreduce-client-hs-2.6.5.jar          lib
hadoop-mapreduce-client-hs-plugins-2.6.5.jar  lib-examples
hadoop-mapreduce-client-jobclient-2.6.5.jar   sources
The one we want is hadoop-mapreduce-examples-2.6.5.jar.
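The jar bundles several example drivers; wordcount is essentially the classic Hadoop WordCount from the official MapReduce tutorial. For reference, a minimal sketch of that program (the class actually shipped in the jar may differ in detail):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for every token on an input line, emit the pair (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum the 1s for each word. The same class doubles as the combiner.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregates counts map-side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Invoking the jar with wordcount <in> <out> dispatches to a driver like this; the combiner is worth noting, because it explains the counter values we will see in step (4).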
(2) Create the HDFS data directories
    Create a directory to hold the MapReduce job's input files:
[root@leaf ~]# hadoop fs -mkdir -p /data/wordcount
    Create a directory to hold the MapReduce job's output files:
[root@leaf ~]# hadoop fs -mkdir /output
    Verify the two directories just created:
[root@leaf ~]# hadoop fs -ls /
drwxr-xr-x   - root supergroup          0 2017-09-01 20:34 /data
drwxr-xr-x   - root supergroup          0 2017-09-01 20:35 /output
(3) Create a word file and upload it to HDFS
    The word file looks like this:
[root@leaf ~]# cat myword.txt 
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
    Upload the file to HDFS:
[root@leaf ~]# hadoop fs -put myword.txt /data/wordcount
    Check the uploaded file and its contents in HDFS:
[root@leaf ~]# hadoop fs -ls /data/wordcount
-rw-r--r--   1 root supergroup         57 2017-09-01 20:40 /data/wordcount/myword.txt
[root@leaf ~]# hadoop fs -cat /data/wordcount/myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
(4) Run the wordcount program
    Execute the following command (the log output is abridged to the final job report):
[root@leaf ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount
...
17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully
17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=585940
                FILE: Number of bytes written=1099502
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=114
                HDFS: Number of bytes written=48
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=97
                Map output materialized bytes=78
                Input split bytes=112
                Combine input records=10
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=78
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=92
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=241049600
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=57
        File Output Format Counters 
                Bytes Written=48
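The counters tally with our input: Map input records=5 is the five lines of myword.txt, and Map output records=10 the ten (word, 1) pairs the mapper emitted. Because the reducer doubles as a combiner, Combine input records=10 collapses to Combine output records=6 before the shuffle, and the reducer finally writes the six distinct words (Reduce output records=6). The job id job_local1719603087_0001 also shows this run used the LocalJobRunner rather than YARN.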
    
(5) View the word-count results
    As follows:
[root@leaf ~]# hadoop fs -cat /output/wordcount/part-r-00000
katy    2
leaf    2
ling    1
xpleaf  2
yeyonghao       1
yyh     2
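Each reduce task writes one part-r-NNNNN file into the output directory; with a single reducer there is only part-r-00000, and the default TextOutputFormat separates word and count with a tab. The words also come out sorted, a side effect of the shuffle sorting keys before they reach the reducer. If you need the result outside HDFS, hadoop fs -get /output/wordcount/part-r-00000 . copies it to the local working directory.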
Reposted from xpleaf's 51CTO blog; original link: http://blog.51cto.com/xpleaf/1962271. Please contact the original author for reprint permission.