
What is E-MapReduce Hadoop Streaming?



Writing a Hadoop Streaming job in Python


The mapper code is as follows:

    #!/usr/bin/env python
    import sys

    # Emit "<word>\t1" for every whitespace-separated word read from stdin.
    for line in sys.stdin:
        line = line.strip()
        words = line.split()
        for word in words:
            print('%s\t%s' % (word, 1))
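Because a Streaming mapper only reads stdin and writes stdout, its logic can be checked locally without a cluster. The sketch below is a minimal illustration, not part of the original job: it wraps the same per-line logic in a hypothetical `map_words` generator and feeds it from `io.StringIO`, which stands in for `sys.stdin`.

```python
import io
from typing import Iterable, Iterator


def map_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 'word<TAB>1' record per whitespace-separated word, like mapper.py."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t%s" % (word, 1)


# io.StringIO stands in for sys.stdin so the logic can be checked without a cluster.
fake_stdin = io.StringIO("hello world\nhello hadoop\n")
for record in map_words(fake_stdin):
    print(record)
```

In the real job the script keeps reading `sys.stdin` directly; the generator form only makes the per-line logic easy to unit-test.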

The reducer code is as follows:

    #!/usr/bin/env python
    import sys

    # Input arrives sorted by key, so all counts for one word are consecutive.
    current_word = None
    current_count = 0
    for line in sys.stdin:
        line = line.strip()
        try:
            word, count = line.split('\t', 1)
            count = int(count)
        except ValueError:
            # Skip lines without a tab or with a non-integer count.
            continue
        if current_word == word:
            current_count += count
        else:
            if current_word is not None:
                print('%s\t%s' % (current_word, current_count))
            current_count = count
            current_word = word
    # Emit the last word; the guard avoids printing "None\t0" on empty input.
    if current_word is not None:
        print('%s\t%s' % (current_word, current_count))
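The reducer only produces correct totals because Hadoop Streaming sorts mapper output by key before the reduce phase, so every record for one word arrives consecutively. A minimal in-process sketch of that sort-then-reduce step, assuming a single reducer and ASCII keys (Python's string order then matches Hadoop's byte order); `reduce_counts` is a hypothetical name and `sorted()` merely stands in for Hadoop's shuffle/sort:

```python
from typing import Iterable, Iterator, Optional


def reduce_counts(records: Iterable[str]) -> Iterator[str]:
    """Sum counts for consecutive identical keys (input must be sorted by key)."""
    current_word: Optional[str] = None
    current_count = 0
    for record in records:
        try:
            word, count_text = record.rstrip("\n").split("\t", 1)
            count = int(count_text)
        except ValueError:
            continue  # skip records without a tab or with a non-integer count
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                yield "%s\t%d" % (current_word, current_count)
            current_word, current_count = word, count
    if current_word is not None:  # flush the last key; nothing is emitted for empty input
        yield "%s\t%d" % (current_word, current_count)


# Records as the mapper emits them, in arbitrary order.
mapped = ["hello\t1", "world\t1", "hello\t1", "hadoop\t1"]

# sorted() stands in for Hadoop's shuffle/sort step.
print(list(reduce_counts(sorted(mapped))))

# Without sorting, the two "hello" records are not adjacent and are counted separately.
print(list(reduce_counts(mapped)))
```

The second call shows why the reducer must not be run on unsorted input, for example when testing it by hand without a `sort` step.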

Assume the mapper code is saved at /home/hadoop/mapper.py, the reducer code at /home/hadoop/reducer.py, the input path is /tmp/input on HDFS, and the output path is /tmp/output on HDFS. Then submit the following hadoop command on the E-MapReduce cluster:

    hadoop jar /usr/lib/hadoop-current/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -file /home/hadoop/mapper.py -mapper mapper.py \
        -file /home/hadoop/reducer.py -reducer reducer.py \
        -input /tmp/input -output /tmp/output
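If the job is launched from a Python script instead of a shell, the same command can be passed to `subprocess.run` as an argument list. This is a minimal sketch, assuming `hadoop` is on the PATH of the E-MapReduce node, the jar directory from the command exists, and the scripts and HDFS paths are the ones described above; `JAR_PATTERN`, `build_streaming_cmd` and `submit` are hypothetical names. Because no shell is involved, the `*` wildcard is resolved with `glob` first.

```python
import glob
import os
import subprocess
from typing import List

# Assumed jar location, copied from the shell command; subprocess does not expand "*".
JAR_PATTERN = "/usr/lib/hadoop-current/share/hadoop/tools/lib/hadoop-streaming-*.jar"


def build_streaming_cmd(jar: str, mapper: str, reducer: str,
                        input_path: str, output_path: str) -> List[str]:
    """Return the argument list for the hadoop streaming command."""
    return [
        "hadoop", "jar", jar,
        "-file", mapper, "-mapper", os.path.basename(mapper),
        "-file", reducer, "-reducer", os.path.basename(reducer),
        "-input", input_path,
        "-output", output_path,
    ]


def submit() -> int:
    """Resolve the jar, run the job, and return hadoop's exit code."""
    jars = sorted(glob.glob(JAR_PATTERN))
    if not jars:
        raise FileNotFoundError("no jar matches " + JAR_PATTERN)
    cmd = build_streaming_cmd(jars[0], "/home/hadoop/mapper.py",
                              "/home/hadoop/reducer.py", "/tmp/input", "/tmp/output")
    return subprocess.run(cmd, check=False).returncode
```

Calling `submit()` on the cluster runs the job; a non-zero return value means it failed, for example because /tmp/output already exists, which Hadoop refuses to overwrite.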

nicenelly 2017-10-27 16:14:34