2. Requirement 2: Sort by Date
The complete code is as follows:
```java
package com.shaonaiyi.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class accessTimesSort {

    public static class MyMapper extends Mapper<Object, Text, IntWritable, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String lines = value.toString();
            // Split on the tab character
            String[] array = lines.split("\t");
            // Use the access count as the key
            int keyOutput = Integer.parseInt(array[1]);
            // Use the date as the value
            String valueOutput = array[0];
            context.write(new IntWritable(keyOutput), new Text(valueOutput));
        }
    }

    public static class MyReducer extends Reducer<IntWritable, Text, Text, IntWritable> {
        @Override
        public void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // For IntWritable keys, MapReduce sorts in ascending order by default
                context.write(value, key);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: accessTimesSort <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Access Time Sort");
        job.setJarByClass(accessTimesSort.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
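To see what the job above actually does, here is a minimal plain-Java sketch of the same map/sort/reduce flow that runs without a Hadoop cluster. The sample dates and counts are made up for illustration; a `TreeMap` stands in for the shuffle phase's ascending key sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AccessTimesSortSketch {
    // Takes "date\tcount" lines and returns them re-emitted in ascending count order,
    // mimicking the mapper (swap key/value) and reducer (swap back) above.
    public static List<String> sortByCount(List<String> lines) {
        // TreeMap keeps integer keys ascending, like the default IntWritable key sort
        TreeMap<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : lines) {
            String[] array = line.split("\t");          // date \t count, as in the mapper
            int count = Integer.parseInt(array[1]);     // count becomes the key
            byCount.computeIfAbsent(count, k -> new ArrayList<>()).add(array[0]);
        }
        // The "reducer" writes (date, count) back out in key order
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byCount.entrySet()) {
            for (String date : e.getValue()) {
                out.add(date + "\t" + e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("2019-01-02\t3", "2019-01-01\t5", "2019-01-03\t1");
        for (String line : sortByCount(input)) {
            System.out.println(line);   // lines come out with counts ascending
        }
    }
}
```

This is only a sketch of the sort semantics; the real job, of course, runs distributed across mappers and reducers.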
Notes:
1. If the key is of type IntWritable, MapReduce sorts it in ascending order by default;
2. If the key is of type Text, MapReduce sorts the strings in dictionary (lexicographic) order by default.
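The difference between the two orderings matters in practice: as strings, "10" sorts before "2" because '1' < '2', while as integers 2 < 10. A small sketch (sample keys are made up) using plain `Arrays.sort` to mirror the two behaviors:

```java
import java.util.Arrays;

public class KeyOrderDemo {
    public static void main(String[] args) {
        // Dictionary order, like Text keys: compares character by character
        String[] textKeys = {"9", "10", "2"};
        Arrays.sort(textKeys);
        System.out.println(Arrays.toString(textKeys));  // [10, 2, 9]

        // Numeric ascending order, like IntWritable keys
        Integer[] intKeys = {9, 10, 2};
        Arrays.sort(intKeys);
        System.out.println(Arrays.toString(intKeys));   // [2, 9, 10]
    }
}
```

This is why the job parses the count with `Integer.parseInt` and emits it as an `IntWritable` rather than leaving it as text.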
0x03 Run the Code and Observe the Results
1. Requirement 1: Count by Date
(1) Pass parameters for Requirement 1
Then enter the two parameters:
(2) Result
2. Requirement 2: Sort by Date
(1) Pass parameters for Requirement 2
(2) Result
0x04 Bonus
1. Package and Run the Count on HDFS
(1) Put the data under the / path on HDFS
(2) Package the project into a jar, here named hadoop-1.0.jar
(3) Execute the command
The format is:
hadoop jar xxx.jar <class containing the main method> <input file path> <output path>
The command to run is:
hadoop jar target/hadoop-1.0.jar com.shaonaiyi.mapreduce.dailyAccessCount /user_login.txt /output
The count results are now available:
0xFF Summary
- This article covers the basics of MapReduce.
- To learn more about big data, follow me!