"我想写一些数据集给hive。我试过hive jdbc,但它不支持batchExecute。所以我改为将其写入hdfs,然后生成hive表。
我尝试使用以下代码来编写hdfs:
package test;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.io.IntWritable;
import org.apache.flink.fs.s3presto.shaded.org.apache.hadoop.io.Text;
import org.apache.flink.fs.s3presto.shaded.org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.flink.util.Collector;
import org.apache.hadoop.mapreduce.Job;

public class Test {

    public static void main(String[] args) {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> text = env.fromElements(
                "Who's there?",
                "I think I hear them. Stand, ho! Who's there?");

        DataSet<Tuple2<String, Integer>> hadoopResult = text
                .flatMap(new LineSplitter())
                .groupBy(0)
                .sum(1);

        // job and jobConf are null; I do not know how to initialize them
        Job job = null;
        Job jobConf = null;

        HadoopOutputFormat<String, Integer> hadoopOF =
                new HadoopOutputFormat<String, Integer>(
                        new TextOutputFormat<String, Integer>(), job
                );
        hadoopOF.getConfiguration().set("mapreduce.output.textoutputformat.separator", " ");
        TextOutputFormat.setOutputPath(jobConf, new Path("hdfs://somewhere/"));

        hadoopResult.output(hadoopOF);
    }

    public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        private static final long serialVersionUID = 3100297611484689639L;

        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.split(" ")) {
                out.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }
}
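But there are many compile errors. All of the code was copied from the official website and pieced together.
My question: how do I create the Job and JobConf objects, and then write the DataSet to HDFS?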
Create the Job:
Job job = Job.getInstance();
I don't think you need a JobConf object; it looks like you can use the Job object in both places.
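Putting that together, here is a minimal sketch of how the program from the question might look with a single Job object. It assumes the flink-hadoop-compatibility module and the regular Hadoop client libraries are on the classpath, so the unused shaded `org.apache.flink.fs.s3hadoop` / `s3presto` imports are dropped. The class name `WordCountToHdfs` is made up for illustration, `hdfs://somewhere/` is the placeholder path from the question, and the `env.execute()` call is an addition, since a DataSet sink is only run once the job is executed.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical class name; adjust package and name to your project.
public class WordCountToHdfs {

    // Job.getInstance() can throw IOException and env.execute() can throw Exception.
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> text = env.fromElements(
                "Who's there?",
                "I think I hear them. Stand, ho! Who's there?");

        DataSet<Tuple2<String, Integer>> counts = text
                .flatMap(new LineSplitter())
                .groupBy(0)
                .sum(1);

        // A single Job carries the Hadoop configuration; no separate JobConf is used.
        Job job = Job.getInstance();

        // Wrap Hadoop's TextOutputFormat so Flink can use it as a sink.
        HadoopOutputFormat<String, Integer> hadoopOF =
                new HadoopOutputFormat<String, Integer>(
                        new TextOutputFormat<String, Integer>(), job);
        hadoopOF.getConfiguration().set("mapreduce.output.textoutputformat.separator", " ");

        // Placeholder output path taken from the question.
        TextOutputFormat.setOutputPath(job, new Path("hdfs://somewhere/"));

        counts.output(hadoopOF);

        // Assumption: the sink is lazy, so the job has to be triggered explicitly.
        env.execute("Word count to HDFS");
    }

    // Splits each line on single spaces and emits (word, 1) pairs.
    public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        private static final long serialVersionUID = 3100297611484689639L;

        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.split(" ")) {
                out.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }
}
```

With the separator set to a single space, each output line should hold a word and its count separated by a space, which a space-delimited Hive table could then be defined over.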
"
版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。