MapReduce Programming (2): Merging and Deduplicating Files

2017-03-30


1. Problem Description

Merge several input files, remove the duplicate lines, and write the deduplicated content to a single output file.

Contents of file1.txt:

20150101     x
20150102     y
20150103     x
20150104     y

Contents of file2.txt:

20150105     z
20150106     x
20150101     y
20150102     y

Contents of file3.txt:


20150103     x
20150104     z
20150105     y
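To make the expected result concrete before moving to Hadoop, here is a minimal in-memory sketch in plain Java. It is not part of the original program: the class name `MergeDedupSketch` and the method `mergeAndDedup` are made up for illustration, and it assumes the data fits in memory, which is exactly the assumption MapReduce removes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory reference for the task; it does not use Hadoop.
class MergeDedupSketch {

    // Merge the lines of all files and keep each distinct line once, in sorted order.
    static List<String> mergeAndDedup(List<List<String>> files) {
        TreeSet<String> unique = new TreeSet<>();
        for (List<String> file : files) {
            unique.addAll(file); // a set silently drops lines it already contains
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList(
                "20150101     x", "20150102     y", "20150103     x", "20150104     y");
        List<String> file2 = Arrays.asList(
                "20150105     z", "20150106     x", "20150101     y", "20150102     y");
        List<String> file3 = Arrays.asList(
                "20150103     x", "20150104     z", "20150105     y");

        // Print the merged, deduplicated lines, one per line.
        for (String line : mergeAndDedup(Arrays.asList(file1, file2, file3))) {
            System.out.println(line);
        }
    }
}
```

A line counts as a duplicate only when the whole line matches, so two lines with the same date but different letters are both kept.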

2. MapReduce Program

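Write the MapReduce program. For the runtime environment, see my previous post, "Configuring a MapReduce Programming Environment in IntelliJ IDEA". The mapper emits every input line as a key with an empty value. The shuffle phase groups identical keys, so the reducer sees each distinct line exactly once and writes it once.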

package com.javacore.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import java.io.IOException;


/**
 * Created by bee on 17/3/25.
 */
public class FileMerge {

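    public static class Map extends Mapper<Object, Text, Text, Text> {
        // Reused empty value: only the key (the whole input line) matters for deduplication.
        private static final Text EMPTY = new Text("");

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Emit the line itself as the key so that identical lines are grouped by the shuffle.
            context.write(value, EMPTY);
        }
    }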

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // Each distinct line arrives here once as a key; write it once and ignore the values.
            context.write(key, new Text(""));
        }
    }


    public static void main(String[] args) throws Exception {

        // Delete the output directory left by a previous run; the job fails if it already exists.
        // FileUtil is the author's own helper class (not shown in this post).
        FileUtil.deleteDir("output");
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        // The input glob and the output directory are hard-coded here rather than read from args.
        String[] otherArgs = new String[]{"input/filemerge/f*.txt",
                "output"};
        if (otherArgs.length != 2) {
            System.err.println("Usage: Merge and duplicate removal <in> <out>");
            System.exit(2);
        }

        // Pass conf to the job so that the fs.defaultFS setting above takes effect.
        Job job = Job.getInstance(conf, "file merge and duplicate removal");
        job.setJarByClass(FileMerge.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }
}
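The deduplication in the job above comes from the shuffle, not from any explicit comparison in the code. The following plain-Java sketch imitates that data flow locally so the grouping is visible. It is an illustration under assumptions, not how Hadoop is implemented: `ShuffleSketch`, `mapAndShuffle`, and `reduce` are hypothetical names, and a `TreeMap` stands in for the framework's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local imitation of the job's data flow; it does not use Hadoop.
class ShuffleSketch {

    // "Map" plus "shuffle": every line becomes a key with an empty value,
    // and the TreeMap groups the values by key while keeping keys sorted.
    static TreeMap<String, List<String>> mapAndShuffle(List<String> lines) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            grouped.computeIfAbsent(line, k -> new ArrayList<>()).add("");
        }
        return grouped;
    }

    // "Reduce": visited once per distinct key; emit the key and ignore its values.
    static List<String> reduce(TreeMap<String, List<String>> grouped) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            output.add(entry.getKey());
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "20150103     x", "20150101     x", "20150103     x");
        TreeMap<String, List<String>> grouped = mapAndShuffle(lines);
        // Show how many empty values were grouped under each distinct line.
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().size() + " value(s)");
        }
        System.out.println(reduce(grouped));
    }
}
```

Because the reducer is driven by keys, the number of times a line was repeated in the input has no effect on the output: the repeats only lengthen the value list, which is never read.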

3. Output

20150101     x
20150101     y
20150102     y
20150103     x
20150104     y
20150104     z
20150105     y
20150105     z
20150106     x
