MapReduce Programming Example: Combiner and Partitioner

2022-06-16

This article was contributed by a registered Alibaba Cloud user; copyright belongs to the original author.

Summary: MapReduce programming example: Combiner and Partitioner

0x00 Tutorial Contents

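This tutorial is an upgrade of the earlier tutorial "MapReduce Introductory Example: Word Count"; please read that one first.
It covers implementing a Combiner and a custom Partitioner, both of which are useful MapReduce programming techniques.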

0x01 The Combiner

1. Advantages

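a. A Combiner is essentially a local reducer: it aggregates the map output once on the map side before anything is sent to the reducers.

b. It reduces the amount of data output by the Map Tasks and therefore the amount of data transferred over the network.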
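To make the benefit concrete, the plain-Java sketch below simulates two map tasks of a word count without any Hadoop dependency; the class name and the input lines are hypothetical. It counts how many (word, count) records would be shuffled with and without a local combine step, and checks that the final totals are identical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // One map task: emit (word, 1) for every word, like context.write(new Text(word), one)
    static List<Map.Entry<String, Long>> mapTask(String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            out.add(Map.entry(word, 1L));
        }
        return out;
    }

    // Local "reducer": sums counts per word within a single map task's output
    static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> mapOutput) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, Long> e : mapOutput) {
            sums.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    // Final reducer: sums everything that arrives after the shuffle
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> shuffled) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> e : shuffled) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] splits = {"a b a a", "b b a"};  // hypothetical input, one line per map task
        List<Map.Entry<String, Long>> withoutCombiner = new ArrayList<>();
        List<Map.Entry<String, Long>> withCombiner = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Long>> out = mapTask(split);
            withoutCombiner.addAll(out);
            withCombiner.addAll(combine(out));
        }
        System.out.println("records shuffled without combiner: " + withoutCombiner.size()); // 7
        System.out.println("records shuffled with combiner: " + withCombiner.size());       // 4
        System.out.println(reduce(withoutCombiner).equals(reduce(withCombiner)));           // true
    }
}
```

On real data with many repeated words per split, the saving grows with the number of repetitions inside each map task.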

2. Use Cases

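a. Suitable for accumulations such as sums and counts.

b. Not suitable for calculations such as averages, because the average of partial averages is generally not the overall average.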
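Hadoop does not guarantee how many times a combiner runs (zero, once, or several times), so the job must produce the same result however the values are grouped. Partial sums can be added in any grouping; partial averages cannot. A plain-Java check with made-up values for a single key follows; the (sum, count) workaround at the end is a common general technique, not something this tutorial implements.

```java
public class AverageCombinerSketch {
    static double average(long[] values) {
        long sum = 0;
        for (long v : values) sum += v;
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        // Hypothetical values for one key, split across two map tasks
        long[] task1 = {1, 2, 3};
        long[] task2 = {10};
        long[] all = {1, 2, 3, 10};

        double trueAverage = average(all);                                 // 16 / 4 = 4.0
        // "Combining" by averaging locally, then averaging the averages
        double averageOfAverages = (average(task1) + average(task2)) / 2;  // (2.0 + 10.0) / 2
        System.out.println(trueAverage + " vs " + averageOfAverages);      // 4.0 vs 6.0

        // Workaround: carry (sum, count) pairs, which can be added in any grouping
        long sum = 0, count = 0;
        for (long[] task : new long[][] {task1, task2}) {
            for (long v : task) sum += v;   // partial sum from this task
            count += task.length;           // partial count from this task
        }
        System.out.println((double) sum / count);                          // 4.0
    }
}
```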

0x02 The Partitioner

1. Purpose

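a. It decides which ReduceTask processes each record output by a MapTask.

b. By default, the hash value of the key modulo the number of Reduce Tasks determines which one processes the record.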
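For reference, Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The mask clears the sign bit: Java's `%` keeps the sign of the left operand, so a negative hash code would otherwise give a negative, invalid partition number. The plain-Java sketch below (no Hadoop dependency; the class name and the sample hash value -7 are made up) compares the formula with a naive `%`. Note that Hadoop applies it to the key object's own `hashCode()`, and for a `Text` key that is not the same number as `String.hashCode()` of the same word.

```java
public class DefaultPartitionSketch {
    // The formula used by Hadoop's default HashPartitioner
    static int hashPartition(int hashCode, int numReduceTasks) {
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Naive version: Java's % keeps the sign of the left operand
    static int naivePartition(int hashCode, int numReduceTasks) {
        return hashCode % numReduceTasks;
    }

    public static void main(String[] args) {
        int negativeHash = -7;  // hypothetical negative hash code
        System.out.println(naivePartition(negativeHash, 2)); // -1, not a valid reducer index
        System.out.println(hashPartition(negativeHash, 2));  // 1

        // Math.abs overflows for Integer.MIN_VALUE and stays negative
        System.out.println(Math.abs(Integer.MIN_VALUE) % 3);      // -2
        System.out.println(hashPartition(Integer.MIN_VALUE, 3));  // 0
    }
}
```

The last two lines show why masking is safer than `Math.abs` for this purpose.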

2. Testing Word Hash Values

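a. For WordCount, we can write a small test program that computes the hash value of each word and then check which node (reducer) the word actually ends up on.

b. If the job is configured with 2 Reducers, the hash value is taken modulo 2. The test code is as follows: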

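import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class HashCodeTest {
    public static void main(String[] args) {
        // The default partitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
        // A plain "word".hashCode() % 2 can be negative, and String.hashCode() differs from
        // Text.hashCode(), so test with the real key type and partitioner instead.
        HashPartitioner<Text, LongWritable> partitioner = new HashPartitioner<>();
        String[] words = {"an", "name", "you", "are", "example", "friend",
                          "how", "is", "my", "this", "twq", "what"};
        for (String word : words) {
            // 2 = number of Reduce tasks; HashPartitioner ignores the value argument
            int partition = partitioner.getPartition(new Text(word), null, 2);
            System.out.println(word + " -> reducer " + partition);
        }
    }
}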

0x03 Hands-on Programming

1. Implementing the Combiner

a. Its logic is the same as the reducer's, because a Combiner is just local aggregation, so it is enough to add this line to the main method:

job.setCombinerClass(MyReducer.class);

2. Custom Partitioner

a. Prepare the data to be aggregated:

student 1500
teacher 200
student 2000
teacher 300
student 2000
teacher 300
doctor 100
doctor 200
artist 55

b. In the MyMapper class, change this code in the map method:

for (String word : words) {
  context.write(new Text(word), one);
}

to:

// words[0] is the occupation (the key), words[1] the number to add up
context.write(new Text(words[0]), new LongWritable(Long.parseLong(words[1])));

c. Add a Partitioner class:

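// Nested in the job's main class; needs org.apache.hadoop.io.Text,
// org.apache.hadoop.io.LongWritable and org.apache.hadoop.mapreduce.Partitioner
public static class MyPartitioner extends Partitioner<Text, LongWritable> {
  @Override
  public int getPartition(Text key, LongWritable value, int numPartitions) {
    String occupation = key.toString();
    if (occupation.equals("student")) {
      return 0;   // reducer 0 handles student
    }
    if (occupation.equals("teacher")) {
      return 1;   // reducer 1 handles teacher
    }
    if (occupation.equals("doctor")) {
      return 2;   // reducer 2 handles doctor
    }
    return 3;     // every other key (artist in the sample data) goes to reducer 3
  }
}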

d. In the main method, register the custom Partitioner class and set the number of Reducers:

// Set the job's partitioner
job.setPartitionerClass(MyPartitioner.class);
// Use 4 reducers, one per partition number returned by MyPartitioner
job.setNumReduceTasks(4);
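Before submitting the job, the routing can be checked offline. The plain-Java sketch below (no Hadoop dependency; the class name is hypothetical) parses the sample data the way the modified map method does, assuming the two fields are separated by a single space, applies the same rules as MyPartitioner, and sums each key per partition as the reducers would. The range check is an illustrative stand-in for the requirement that every partition number be smaller than the number of reduce tasks.

```java
import java.util.Map;
import java.util.TreeMap;

public class PartitionSimulation {
    // Same rules as MyPartitioner
    static int getPartition(String key) {
        switch (key) {
            case "student": return 0;
            case "teacher": return 1;
            case "doctor":  return 2;
            default:        return 3;
        }
    }

    // Returns partition -> (key -> total), i.e. what each reducer would write out
    static Map<Integer, Map<String, Long>> run(String[] lines, int numReduceTasks) {
        Map<Integer, Map<String, Long>> partitions = new TreeMap<>();
        for (String line : lines) {
            String[] words = line.split(" ");  // assumes one space between the two fields
            String key = words[0];
            long value = Long.parseLong(words[1]);
            int p = getPartition(key);
            if (p < 0 || p >= numReduceTasks) {
                throw new IllegalStateException("partition " + p + " out of range for key " + key);
            }
            partitions.computeIfAbsent(p, k -> new TreeMap<>()).merge(key, value, Long::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        String[] data = {
            "student 1500", "teacher 200", "student 2000", "teacher 300", "student 2000",
            "teacher 300", "doctor 100", "doctor 200", "artist 55"
        };
        System.out.println(run(data, 4));
        // {0={student=5500}, 1={teacher=800}, 2={doctor=300}, 3={artist=55}}
    }
}
```

With 4 reduce tasks each reducer receives exactly one occupation; `run(data, 3)` throws because artist maps to partition 3.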

0xFF Summary

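Make sure the number of reducers matches the partition numbers your Partitioner returns: MyPartitioner returns 0 to 3 for student, teacher, doctor, and everything else (artist in this data), so the number of reducers is set to 4.
For how to run the job, see the earlier tutorial.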
