1. Create test.log
2. Create the HDFS directory and upload the file
3. List the bundled example programs; we pick wordcount
4. Run wordcount
# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
# bundled examples jar | program | input directory | output directory (must not exist beforehand; see the note after this list)
5. Verify the wordcount result (word-frequency count)
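A note on the output directory: MapReduce refuses to start a job whose output path already exists, which is why /out1 is left uncreated above. If you re-run the example, remove the old output first. A minimal sketch using only standard hadoop fs commands:

# remove the previous output directory before a re-run (skip this on the first run)
hadoop fs -rm -r /out1
# run the bundled wordcount example again against the same input
hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1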
[root@sht-sgmhadoopnn-01 mapreduce]# more /tmp/test.log
1
2
3
a
b
a
v
a a a
abc
我是谁
%……
%
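The post does not show how test.log was created; any small plain-text file will do. One way to reproduce a comparable file (only a few of the sample lines are repeated here):

# write a small sample file for the word count; the contents are arbitrary test data
cat > /tmp/test.log <<'EOF'
a
b
a
a a a
abc
EOF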
[root@sht-sgmhadoopnn-01 ~]# hadoop fs -mkdir /testdir
16/02/28 19:40:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@sht-sgmhadoopnn-01 ~]# hadoop fs -put /tmp/test.log /testdir/
16/02/28 19:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
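The NativeCodeLoader warning is harmless here; Hadoop simply falls back to its built-in Java classes. To confirm that the directory and the upload succeeded before running the job, the standard HDFS listing and cat commands can be used (not shown in the original transcript):

# list the HDFS input directory and print the uploaded file back
hadoop fs -ls /testdir
hadoop fs -cat /testdir/test.log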
[root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/share/hadoop/mapreduce
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
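Each example also prints its own usage line when invoked without the arguments it needs, so you can inspect one before committing to a full run. Two hedged examples (output omitted; exact wording may differ slightly between versions):

# running wordcount with no arguments only prints its usage string
hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount
# a quick end-to-end sanity check of the cluster with the pi example: 2 map tasks, 10 samples each
hadoop jar hadoop-mapreduce-examples-2.7.2.jar pi 2 10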
# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
# bundled examples jar | program | input directory | output directory (not created yet)
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
16/02/28 19:40:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/28 19:40:53 INFO input.FileInputFormat: Total input paths to process : 1
16/02/28 19:40:53 INFO mapreduce.JobSubmitter: number of splits:1
16/02/28 19:40:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456590271264_0002
16/02/28 19:40:54 INFO impl.YarnClientImpl: Submitted application application_1456590271264_0002
16/02/28 19:40:54 INFO mapreduce.Job: The url to track the job: http://sht-sgmhadoopnn-01:8088/proxy/application_1456590271264_0002/
16/02/28 19:40:54 INFO mapreduce.Job: Running job: job_1456590271264_0002
16/02/28 19:41:04 INFO mapreduce.Job: Job job_1456590271264_0002 running in uber mode : false
16/02/28 19:41:04 INFO mapreduce.Job: map 0% reduce 0%
16/02/28 19:41:12 INFO mapreduce.Job: map 100% reduce 0%
16/02/28 19:41:21 INFO mapreduce.Job: map 100% reduce 100%
16/02/28 19:41:22 INFO mapreduce.Job: Job job_1456590271264_0002 completed successfully
16/02/28 19:41:22 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=102
        FILE: Number of bytes written=244621
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=142
        HDFS: Number of bytes written=56
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5537
        Total time spent by all reduces in occupied slots (ms)=6555
        Total time spent by all map tasks (ms)=5537
        Total time spent by all reduce tasks (ms)=6555
        Total vcore-milliseconds taken by all map tasks=5537
        Total vcore-milliseconds taken by all reduce tasks=6555
        Total megabyte-milliseconds taken by all map tasks=5669888
        Total megabyte-milliseconds taken by all reduce tasks=6712320
    Map-Reduce Framework
        Map input records=12
        Map output records=14
        Map output bytes=100
        Map output materialized bytes=102
        Input split bytes=98
        Combine input records=14
        Combine output records=10
        Reduce input groups=10
        Reduce shuffle bytes=102
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=79
        CPU time spent (ms)=2560
        Physical memory (bytes) snapshot=445992960
        Virtual memory (bytes) snapshot=1775263744
        Total committed heap usage (bytes)=306184192
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=44
    File Output Format Counters
        Bytes Written=56
You have mail in /var/spool/mail/root
[root@sht-sgmhadoopnn-01 mapreduce]#
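The tracking URL in the log points at the ResourceManager web UI on port 8088. The finished job can also be inspected from the command line; the logs command assumes YARN log aggregation is enabled on this cluster:

# list finished applications and their final status
yarn application -list -appStates FINISHED
# pull the aggregated container logs for this particular run
yarn logs -applicationId application_1456590271264_0002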
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -ls /out1
16/02/28 19:43:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 root supergroup          0 2016-02-28 19:41 /out1/_SUCCESS
-rw-r--r--   3 root supergroup         56 2016-02-28 19:41 /out1/part-r-00000
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -text /out1/part-r-00000
16/02/28 19:43:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
%       1
%……     1
1       1
2       1
3       1
a       5
abc     1
b       1
v       1
我是谁  1
You have mail in /var/spool/mail/root
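The _SUCCESS marker confirms the job finished cleanly, and part-r-00000 holds the word counts produced by the single reducer. If the result needs to leave HDFS, it can be copied to the local filesystem; the local target paths below are just examples:

# copy the single result file out of HDFS
hadoop fs -get /out1/part-r-00000 /tmp/wordcount_result.txt
# or merge every part-* file in the output directory into one local file
hadoop fs -getmerge /out1 /tmp/wordcount_merged.txt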