Hadoop MapReduce: wordcount (word frequency counting)

1. Create test.log

  [root@sht-sgmhadoopnn-01 mapreduce]# more /tmp/test.log
  1
  2
  3
  a
  b
  a
  v
  a a a
  abc
  我是谁
  %……
  %
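
If you want to reproduce this input file, one way (a sketch; any editor works just as well) is a quoted heredoc, so the shell does not expand anything in the contents:

  # Recreate the test input exactly as shown above
  [root@sht-sgmhadoopnn-01 mapreduce]# cat > /tmp/test.log <<'EOF'
  1
  2
  3
  a
  b
  a
  v
  a a a
  abc
  我是谁
  %……
  %
  EOF
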
2. Create an HDFS directory and upload the file

  [root@sht-sgmhadoopnn-01 ~]# hadoop fs -mkdir /testdir
  16/02/28 19:40:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  [root@sht-sgmhadoopnn-01 ~]# hadoop fs -put /tmp/test.log /testdir/
  16/02/28 19:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
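
The NativeCodeLoader warning is harmless here: it only means the native Hadoop libraries are not available for this platform, so the built-in Java implementations are used instead; the commands still succeed. A quick way to confirm the upload (listing output omitted):

  # Sanity check: the file should now be visible in HDFS
  [root@sht-sgmhadoopnn-01 ~]# hadoop fs -ls /testdir
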
3. List the example programs bundled with Hadoop; we will use wordcount

  [root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/share/hadoop/mapreduce
  [root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar
  An example program must be given as the first argument.
  Valid program names are:
    aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
    aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
    bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
    dbcount: An example job that count the pageview counts from a database.
    distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
    grep: A map/reduce program that counts the matches of a regex in the input.
    join: A job that effects a join over sorted, equally partitioned datasets
    multifilewc: A job that counts words from several files.
    pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
    pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
    randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
    randomwriter: A map/reduce program that writes 10GB of random data per node.
    secondarysort: An example defining a secondary sort to the reduce.
    sort: A map/reduce program that sorts the data written by the random writer.
    sudoku: A sudoku solver.
    teragen: Generate data for the terasort
    terasort: Run the terasort
    teravalidate: Checking results of terasort
    wordcount: A map/reduce program that counts the words in the input files.
    wordmean: A map/reduce program that counts the average length of the words in the input files.
    wordmedian: A map/reduce program that counts the median length of the words in the input files.
    wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
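
Invoking wordcount with no further arguments shows the parameters it expects; in this version it prints roughly the following (a sketch, not captured from this session):

  [root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount
  Usage: wordcount <in> [<in>...] <out>
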
4. Run wordcount

# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
#            (examples jar)                      (program)  (input)  (output dir; must not exist yet)
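
MapReduce refuses to write into an existing output directory, so /out1 must not exist before the run. If you repeat the job, remove the old output first (a small sketch):

  # Clear the previous output so the job can create /out1 afresh
  [root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -rm -r /out1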

  [root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
  16/02/28 19:40:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  16/02/28 19:40:53 INFO input.FileInputFormat: Total input paths to process : 1
  16/02/28 19:40:53 INFO mapreduce.JobSubmitter: number of splits:1
  16/02/28 19:40:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456590271264_0002
  16/02/28 19:40:54 INFO impl.YarnClientImpl: Submitted application application_1456590271264_0002
  16/02/28 19:40:54 INFO mapreduce.Job: The url to track the job: http://sht-sgmhadoopnn-01:8088/proxy/application_1456590271264_0002/
  16/02/28 19:40:54 INFO mapreduce.Job: Running job: job_1456590271264_0002
  16/02/28 19:41:04 INFO mapreduce.Job: Job job_1456590271264_0002 running in uber mode : false
  16/02/28 19:41:04 INFO mapreduce.Job: map 0% reduce 0%
  16/02/28 19:41:12 INFO mapreduce.Job: map 100% reduce 0%
  16/02/28 19:41:21 INFO mapreduce.Job: map 100% reduce 100%
  16/02/28 19:41:22 INFO mapreduce.Job: Job job_1456590271264_0002 completed successfully
  16/02/28 19:41:22 INFO mapreduce.Job: Counters: 49
          File System Counters
                  FILE: Number of bytes read=102
                  FILE: Number of bytes written=244621
                  FILE: Number of read operations=0
                  FILE: Number of large read operations=0
                  FILE: Number of write operations=0
                  HDFS: Number of bytes read=142
                  HDFS: Number of bytes written=56
                  HDFS: Number of read operations=6
                  HDFS: Number of large read operations=0
                  HDFS: Number of write operations=2
          Job Counters
                  Launched map tasks=1
                  Launched reduce tasks=1
                  Data-local map tasks=1
                  Total time spent by all maps in occupied slots (ms)=5537
                  Total time spent by all reduces in occupied slots (ms)=6555
                  Total time spent by all map tasks (ms)=5537
                  Total time spent by all reduce tasks (ms)=6555
                  Total vcore-milliseconds taken by all map tasks=5537
                  Total vcore-milliseconds taken by all reduce tasks=6555
                  Total megabyte-milliseconds taken by all map tasks=5669888
                  Total megabyte-milliseconds taken by all reduce tasks=6712320
          Map-Reduce Framework
                  Map input records=12
                  Map output records=14
                  Map output bytes=100
                  Map output materialized bytes=102
                  Input split bytes=98
                  Combine input records=14
                  Combine output records=10
                  Reduce input groups=10
                  Reduce shuffle bytes=102
                  Reduce input records=10
                  Reduce output records=10
                  Spilled Records=20
                  Shuffled Maps =1
                  Failed Shuffles=0
                  Merged Map outputs=1
                  GC time elapsed (ms)=79
                  CPU time spent (ms)=2560
                  Physical memory (bytes) snapshot=445992960
                  Virtual memory (bytes) snapshot=1775263744
                  Total committed heap usage (bytes)=306184192
          Shuffle Errors
                  BAD_ID=0
                  CONNECTION=0
                  IO_ERROR=0
                  WRONG_LENGTH=0
                  WRONG_MAP=0
                  WRONG_REDUCE=0
          File Input Format Counters
                  Bytes Read=44
          File Output Format Counters
                  Bytes Written=56
  You have mail in /var/spool/mail/root
  [root@sht-sgmhadoopnn-01 mapreduce]#
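
The counters line up with the input: Map input records=12 matches the 12 lines of test.log, Map output records=14 is the total token count (the line "a a a" contributes three tokens), and Reduce output records=10 is the number of distinct words. The example's mapper splits each line on whitespace, so you can approximate the same tally locally with standard shell tools (a sketch; sort order may differ from the job output):

  # Split space-separated tokens onto their own lines, then count duplicates
  [root@sht-sgmhadoopnn-01 mapreduce]# tr -s ' ' '\n' < /tmp/test.log | sort | uniq -c
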
5. Verify the wordcount result (the word frequencies)

  [root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -ls /out1
  16/02/28 19:43:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Found 2 items
  -rw-r--r--   3 root supergroup          0 2016-02-28 19:41 /out1/_SUCCESS
  -rw-r--r--   3 root supergroup         56 2016-02-28 19:41 /out1/part-r-00000
  [root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -text /out1/part-r-00000
  16/02/28 19:43:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  %       1
  %……     1
  1       1
  2       1
  3       1
  a       5
  abc     1
  b       1
  v       1
  我是谁  1
  You have mail in /var/spool/mail/root
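
hadoop fs -text also decodes compressed output; for plain text like this, hadoop fs -cat is equivalent. When a job runs with several reducers, -getmerge collects all the part files into one local file (a sketch; the local path /tmp/wordcount.out is arbitrary):

  # Read the single part file directly
  [root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -cat /out1/part-r-00000
  # Or merge every part file in /out1 into one local file
  [root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -getmerge /out1 /tmp/wordcount.out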
