在将Flink数据集写入hdfs时如何创建Job对象

"我想写一些数据集给hive。我试过hive jdbc，但它不支持batchExecute。所以我改为将其写入hdfs，然后生成hive表。

我尝试使用以下代码来编写hdfs：

package test;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.io.IntWritable;
import org.apache.flink.fs.s3presto.shaded.org.apache.hadoop.io.Text;
import org.apache.flink.fs.s3presto.shaded.org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.flink.util.Collector;
import org.apache.hadoop.mapreduce.Job;

public class Test {

public static void main(String[] args) {

    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<String> text = env.fromElements(
            ""Who's there?"",
            ""I think I hear them. Stand, ho! Who's there?"");

        DataSet<Tuple2<String, Integer>> hadoopResult = text
            .flatMap(new LineSplitter())
            .groupBy(0)
            .sum(1);
        //job and jobConf is null,I do not know how to initialize them (new)
    Job job = null;
    Job jobConf = null;
    HadoopOutputFormat<String, Integer> hadoopOF =
              new HadoopOutputFormat<String, Integer>(
                new TextOutputFormat<String, Integer>(), job
              );
            hadoopOF.getConfiguration().set(""mapreduce.output.textoutputformat.separator"", "" "");
            TextOutputFormat.setOutputPath(jobConf, new Path(""hdfs://somewhere/""));
            hadoopResult.output(hadoopOF);
}

 public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
    private static final long serialVersionUID = 3100297611484689639L;
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.split("" "")) {
                out.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }

}
但是有很多编译错误。所有代码都从官方网站复制并拼接这些代码。

我的问题：如何创建Job和Jobconf对象，然后将数据集写入hdfs？"

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

训练营

直播

乘风者计划

下载

镜像站

技术资料

在将Flink数据集写入hdfs时如何创建Job对象

相关文章