Flink shaded Hadoop S3 file system still requires hdfs-default and hdfs-site configuration paths

社区小助手 2018-12-11 16:17:53

I am trying to configure S3 as my state backend with Flink 1.6.0.

flink-conf.yaml
state.backend: filesystem
state.checkpoints.dir: s3://*/flink-checkpoints
state.savepoints.dir: s3://*/flink-savepoints

s3.access-key: *
s3.secret-key: *

I followed https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#shaded-hadooppresto-s3-file-systems-recommended

I have moved flink-s3-fs-hadoop-1.6.0.jar into the lib directory. The documentation does not say that any Hadoop configuration files are needed for this particular approach, yet I am seeing the error below complaining about missing Hadoop configuration paths.

2018-08-24 23:25:17,829 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - State backend is set to heap memory (checkpoints to filesystem "s3://*/flink-checkpoints")
2018-08-24 23:25:17,831 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory - Creating Hadoop file system (backed by Hadoop s3a file system)
2018-08-24 23:25:17,831 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory - Loading Hadoop configuration for Hadoop s3a file system
2018-08-24 23:25:17,872 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-default configuration-file path in Flink config.
2018-08-24 23:25:17,873 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-site configuration-file path in Flink config.
2018-08-24 23:25:17,873 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
2018-08-24 23:25:17,878 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> Map -> Sink: Print to Std. Out (1/1) (ee0eeb00ea0f01043d90f6b8d3c0cc2e) switched from RUNNING to FAILED.
javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created

    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
    at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
    at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
    at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2565)
    at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2541)
    at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
    at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.set(Configuration.java:1149)
    at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.set(Configuration.java:1121)
    at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.HadoopConfigLoader.loadHadoopConfigFromFlink(HadoopConfigLoader.java:101)
    at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.HadoopConfigLoader.getOrLoadHadoopConfig(HadoopConfigLoader.java:80)
    at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory.create(AbstractFileSystemFactory.java:55)
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:395)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:318)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
    at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:61)
    at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:443)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:257)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:748)

Am I missing something?

Distributed Computing · Hadoop · Stream Computing
All Answers (1)
  • 社区小助手
    2019-07-17 23:19:51

    I had messed up my dependencies, and that is what caused this unrelated exception. I was trying out the Bucketing and Rolling Sink connectors, which require Hadoop dependencies. I initially added those in Maven's provided scope, but then could not run the job from IntelliJ IDEA, so I switched them to compile scope and left them that way. They got packaged into part of the artifact jar and caused this problem.

    Lesson learned: never add Hadoop dependencies in the default (compile) scope. IntelliJ IDEA has an option in the Run Configuration to include dependencies declared in provided scope; use that instead (see the sketch below).
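
    For illustration, here is a minimal sketch of that lesson in the pom.xml. The artifact and version are my assumptions (the original answer does not name the exact Hadoop dependency); any Hadoop dependency left in compile scope ends up inside the artifact jar and can cause the same kind of classpath conflict:

        <!-- Hypothetical Hadoop dependency needed by the Bucketing/Rolling sinks. -->
        <!-- "provided" keeps it on the compile classpath but out of the fat jar,  -->
        <!-- so its transitive classes (presumably including a conflicting         -->
        <!-- javax.xml.parsers provider) stay off the job's runtime classpath.     -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.8.3</version>
            <scope>provided</scope>
        </dependency>

    To still run the job from the IDE with provided-scope dependencies, enable "Include dependencies with 'Provided' scope" in the IntelliJ IDEA Run Configuration rather than switching the scope back to compile.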
