MapReduce Programming (5): Single-Table Join

2017-03-31


I. Problem Description

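Given the child-parent table below, mine the parent-child relationships it contains and produce a table of grandchild-grandparent relationships.

The input file contains: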

child    parent
Steven   Lucy
Steven   Jack
Jone     Lucy
Jone     Jack
Lucy     Mary
Lucy     Frank
Jack     Alice
Jack     Jesse
David    Alice
David    Jesse
Philip   David
Philip   Alma
Mark     David
Mark     Alma

The grandparent relationships are derived from the parent-child pairs. For example:

Steven   Jack
Jack     Alice
Jack     Jesse

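From these three records we can infer that Jack is Steven's parent and that Alice and Jesse are Jack's parents, so Steven is clearly a grandchild of Alice and Jesse. The derived result is: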

grandson    grandparent
Steven      Jesse
Steven      Alice

The task is to find all grandchild-grandparent relationships with MapReduce.
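As a hedged cross-check (not part of the original post), the expected answer can be computed on a single machine: index each child's parents, then follow two parent links from every child. The class `GrandparentReference` and its `SAMPLE` array are hypothetical names; `SAMPLE` is simply the input table above without its header line.

```java
import java.util.*;

/** Hypothetical single-machine reference, used only to check the MapReduce output. */
public class GrandparentReference {

    // The sample input from the post, header line omitted; each row is {child, parent}
    static final String[][] SAMPLE = {
            {"Steven", "Lucy"}, {"Steven", "Jack"}, {"Jone", "Lucy"}, {"Jone", "Jack"},
            {"Lucy", "Mary"}, {"Lucy", "Frank"}, {"Jack", "Alice"}, {"Jack", "Jesse"},
            {"David", "Alice"}, {"David", "Jesse"}, {"Philip", "David"}, {"Philip", "Alma"},
            {"Mark", "David"}, {"Mark", "Alma"}
    };

    /** Returns sorted "grandchild\tgrandparent" strings. */
    public static List<String> grandparents(String[][] rows) {
        // Index: child -> that child's parents
        Map<String, List<String>> parentsOf = new HashMap<>();
        for (String[] r : rows) {
            parentsOf.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        List<String> result = new ArrayList<>();
        for (String[] r : rows) {
            String child = r[0], parent = r[1];
            // Every parent of my parent is one of my grandparents
            for (String gp : parentsOf.getOrDefault(parent, Collections.emptyList())) {
                result.add(child + "\t" + gp);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        grandparents(SAMPLE).forEach(System.out::println);
    }
}
```

On the sample table this yields the same 12 pairs as the job's output, which makes it a convenient baseline when changing the MapReduce code.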

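II. Analysis

The trick for solving this problem is a single-table join, that is, joining the table with itself. Concretely, in the Map phase every input line is emitted as key-value and, in addition, reversed as value-key. Take two of the lines as an example: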

Steven   Jack
Jack     Alice

Emitting both the key-value and the value-key form turns them into four lines:

Steven   Jack
Jack     Alice
Jack     Steven  
Alice    Jack
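A minimal in-memory sketch of this doubling, in plain Java with no Hadoop dependency, is below. `DoubledEmitSketch`, `mapBothWays` and `shuffle` are hypothetical names; the `TreeMap` only imitates Hadoop's sorted grouping by key, and Hadoop itself does not guarantee the order of values inside a group.

```java
import java.util.*;

/** Hypothetical illustration of emitting each record in both directions, then grouping by key. */
public class DoubledEmitSketch {

    /** Map: each {child, parent} row yields (child, parent) and (parent, child). */
    public static List<String[]> mapBothWays(String[][] rows) {
        List<String[]> out = new ArrayList<>();
        for (String[] r : rows) {
            out.add(new String[]{r[0], r[1]}); // key-value
            out.add(new String[]{r[1], r[0]}); // value-key
        }
        return out;
    }

    /** Shuffle: collect all values emitted under the same key. */
    public static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] p : pairs) {
            groups.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] rows = {{"Steven", "Jack"}, {"Jack", "Alice"}};
        Map<String, List<String>> groups = shuffle(mapBothWays(rows));
        System.out.println(groups.get("Jack")); // [Steven, Alice]
    }
}
```

Without any tags, the group under Jack mixes his child (Steven) and his parent (Alice), which is exactly why the next step adds prefixes.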

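After the shuffle, Jack is a key and serves as the bridge between the two generations: the values grouped under Jack include Alice and Steven, so Alice and Steven must be in a grandparent-grandchild relationship. To tell the grandchild generation from the grandparent generation, the Map phase attaches a prefix to each value, for example "-" for the younger generation and "+" for the older one. With the prefixes in place, the Reduce phase can sort the values into the two groups by prefix.

III. The MapReduce Program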
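The tagging and the per-key classification can be sketched as two pure functions, again assuming plain Java types rather than Hadoop's `Text`. `TaggedReduceSketch`, `tag` and `reduce` are illustrative names; only the "-" and "+" prefixes come from the text. The values used in `main` are the ones that reach key Jack for the full input table.

```java
import java.util.*;

/** Hypothetical illustration of prefix tagging (map side) and per-key pairing (reduce side). */
public class TaggedReduceSketch {

    /** Map side for one {child, parent} row. */
    public static List<String[]> tag(String child, String parent) {
        return Arrays.asList(
                new String[]{parent, "-" + child},  // under the parent's key, the child is younger
                new String[]{child, "+" + parent}); // under the child's key, the parent is older
    }

    /** Reduce side for one key: every "-" value is a grandchild of every "+" value. */
    public static List<String> reduce(Iterable<String> values) {
        List<String> grandChildren = new ArrayList<>();
        List<String> grandParents = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("-")) {
                grandChildren.add(v.substring(1));
            } else if (v.startsWith("+")) {
                grandParents.add(v.substring(1));
            }
        }
        // Cross product of the two lists
        List<String> out = new ArrayList<>();
        for (String c : grandChildren) {
            for (String g : grandParents) {
                out.add(c + "\t" + g);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> jack = Arrays.asList("-Steven", "-Jone", "+Alice", "+Jesse");
        reduce(jack).forEach(System.out::println);
    }
}
```

A key with values of only one kind (for example Alice, who has children but no parents in the table) produces no output, so no special case is needed for the top or bottom generation.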

package com.javacore.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;


/**
 * Created by bee on 3/29/17.
 * Single-table join: derives grandson-grandparent pairs from a child-parent table.
 */
public class RelationShip {

    public static class RsMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // TextInputFormat passes one line per call; the key is the line's byte offset.
            // Offset 0 is the "child parent" header line of each file, so skip it.
            // (A static line counter would be wrong: it is shared per JVM, not per file.)
            if (key.get() == 0) {
                return;
            }
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            if (tokenizer.countTokens() < 2) {
                return; // skip blank or malformed lines
            }
            String son = tokenizer.nextToken();
            String parent = tokenizer.nextToken();
            // Under the parent's key, the child is the younger generation: prefix "-"
            context.write(new Text(parent), new Text("-" + son));
            // Under the child's key, the parent is the older generation: prefix "+"
            context.write(new Text(son), new Text("+" + parent));
        }
    }

    public static class RsReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Write the header once per reduce task (once in total with the default single reducer)
            context.write(new Text("grandson"), new Text("grandparent"));
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            List<String> grandChild = new ArrayList<>();
            List<String> grandParent = new ArrayList<>();

            // Hadoop reuses one Text object while iterating, so copy each value out as a String
            for (Text val : values) {
                String s = val.toString();
                if (s.startsWith("-")) {
                    grandChild.add(s.substring(1));   // child of key: grandchild candidate
                } else if (s.startsWith("+")) {
                    grandParent.add(s.substring(1));  // parent of key: grandparent candidate
                }
            }

            // Every child of key is a grandchild of every parent of key
            for (String child : grandChild) {
                for (String gp : grandParent) {
                    context.write(new Text(child), new Text(gp));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();

        // Use <input> <output> from the command line if given, otherwise the post's default paths
        String[] otherArgs = args.length == 2 ? args
                : new String[]{"input/relations/table.txt", "output"};

        // The job fails if the output directory already exists, so delete it recursively first
        Path outputPath = new Path(otherArgs[1]);
        FileSystem fs = outputPath.getFileSystem(conf);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }

        Job job = Job.getInstance(conf, "relationship");
        job.setJarByClass(RelationShip.class);
        job.setMapperClass(RsMapper.class);
        job.setReducerClass(RsReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

IV. Output

grandson    grandparent
Mark    Jesse
Mark    Alice
Philip  Jesse
Philip  Alice
Jone    Jesse
Jone    Alice
Steven  Jesse
Steven  Alice
Steven  Frank
Steven  Mary
Jone    Frank
Jone    Mary
