1. Implementing word count with the bundled example program
(1) The wordcount program
The wordcount program ships in Hadoop's share directory:
[root@leaf mapreduce]# pwd
/usr/local/hadoop/share/hadoop/mapreduce
[root@leaf mapreduce]# ls
hadoop-mapreduce-client-app-2.6.5.jar         hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
hadoop-mapreduce-client-common-2.6.5.jar      hadoop-mapreduce-client-shuffle-2.6.5.jar
hadoop-mapreduce-client-core-2.6.5.jar        hadoop-mapreduce-examples-2.6.5.jar
hadoop-mapreduce-client-hs-2.6.5.jar          lib
hadoop-mapreduce-client-hs-plugins-2.6.5.jar  lib-examples
hadoop-mapreduce-client-jobclient-2.6.5.jar   sources
The one we need is hadoop-mapreduce-examples-2.6.5.jar.
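If you are unsure which example programs this jar contains, running it with no arguments makes it print its list of valid program names; a quick check (the exact wording of the listing varies slightly across Hadoop versions):

# With no arguments, the examples jar prints its valid program names
# (grep, pi, wordcount, ...) with a one-line description of each.
[root@leaf ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar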
(2) Create the HDFS data directories
Create a directory to hold the MapReduce job's input files:
[root@leaf ~]# hadoop fs -mkdir -p /data/wordcount
Create a directory to hold the MapReduce job's output files:
[root@leaf ~]# hadoop fs -mkdir /output
Check the two directories just created:
[root@leaf ~]# hadoop fs -ls /
drwxr-xr-x   - root supergroup          0 2017-09-01 20:34 /data
drwxr-xr-x   - root supergroup          0 2017-09-01 20:35 /output
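Since -ls / shows only the top level, a recursive listing is a handy way to confirm that -mkdir -p also created the nested wordcount directory (illustrative output; the timestamp will differ on your machine):

# -R lists the directory tree recursively.
[root@leaf ~]# hadoop fs -ls -R /data
drwxr-xr-x   - root supergroup          0 2017-09-01 20:34 /data/wordcount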
(3) Create a word file and upload it to HDFS
The word file is as follows:
[root@leaf ~]# cat myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
Upload the file to HDFS:
[root@leaf ~]# hadoop fs -put myword.txt /data/wordcount
View the uploaded file and its contents in HDFS:
[root@leaf ~]# hadoop fs -ls /data/wordcount
-rw-r--r--   1 root supergroup         57 2017-09-01 20:40 /data/wordcount/myword.txt
[root@leaf ~]# hadoop fs -cat /data/wordcount/myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
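Before submitting the job, it helps to know what result to expect. A plain shell pipeline computes the same per-word counts locally, giving us a reference to compare the MapReduce output against (a sanity check only; nothing Hadoop-specific here):

# Split on spaces to one word per line, sort, then count duplicates.
[root@leaf ~]# tr ' ' '\n' < myword.txt | sort | uniq -c
      2 katy
      2 leaf
      1 ling
      2 xpleaf
      1 yeyonghao
      2 yyh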
(4) Run the wordcount program
Execute the following command:
[root@leaf ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount
...
17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully
17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=585940
                FILE: Number of bytes written=1099502
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=114
                HDFS: Number of bytes written=48
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=97
                Map output materialized bytes=78
                Input split bytes=112
                Combine input records=10
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=78
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=92
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=241049600
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=57
        File Output Format Counters
                Bytes Written=48
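Two things are worth noting in this log. First, the job ID job_local1719603087_0001 shows the job ran with the LocalJobRunner rather than on a YARN cluster. Second, MapReduce refuses to write into an existing output directory: rerunning the same command fails with an "output directory already exists" error until the old results are removed (assuming you no longer need them):

# Delete the previous run's output recursively before resubmitting.
[root@leaf ~]# hadoop fs -rm -r /output/wordcount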
(5) View the word-count results
The results are as follows:
[root@leaf ~]# hadoop fs -cat /output/wordcount/part-r-00000
katy    2
leaf    2
ling    1
xpleaf  2
yeyonghao       1
yyh     2
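These counts match the local uniq -c cross-check from step (3). To work with the results outside HDFS, hadoop fs -getmerge concatenates every part-* file in the output directory into one local file (with a single reducer there is only one part file, but the same command covers the multi-reducer case; the local file name here is just an example):

# Merge all part files under /output/wordcount into one local file.
[root@leaf ~]# hadoop fs -getmerge /output/wordcount ./wordcount-result.txt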
Reprinted from xpleaf's 51CTO blog. Original link: http://blog.51cto.com/xpleaf/1962271. Please contact the original author before reposting.