MaxCompute Error Collection: How to Resolve an Error When Binding the MaxCompute Engine in DataWorks

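Summary: MaxCompute is Alibaba Cloud's large-scale offline data processing service, used for big data analysis, data mining, and report generation. Various errors can occur when processing data with MaxCompute. This post collects some common MaxCompute errors together with their possible causes and fixes.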

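Question 1: DataWorks reports this error when binding the MaxCompute engine. What could be the problem?

DataWorks reports this error when binding the MaxCompute engine. What could be the problem?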



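Reference answer:

Did you choose this name yourself when you created the MaxCompute data source? Under the old logic, binding the engine generates a default data source named odps_first.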



For more answers to this question, see:

https://developer.aliyun.com/ask/574807



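Question 2: DataWorks uses Alibaba's Flink to write to MaxCompute, and pulling dependencies fails?

DataWorks uses Alibaba's Flink to write to MaxCompute, and an error occurs when pulling the dependencies.

What is causing this?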



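Reference answer:

This error means that the dependency 'com.aliyun.odps:flink-connector-odps:iar:113.0' could not be found while pulling dependencies. Maven prints coordinates as groupId:artifactId:packaging:version, so the third field (shown here as "iar", most likely a misread "jar") is the packaging and does not belong in the <version> element. Check the following:

  1. Make sure the dependency is declared correctly in your project's pom.xml, with only the version number in <version>. The value 113.0 is the version as quoted in the error; confirm it against the repository:
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>flink-connector-odps</artifactId>
    <version>113.0</version>
</dependency>
  2. If you use Maven, make sure the dependency can be resolved into your local repository. Running mvn clean install builds the project and downloads any missing dependencies.
  3. If the problem persists, change the version to one that is actually published, for example the latest release. Here odps.connector.version is a placeholder property that you define yourself:
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>flink-connector-odps</artifactId>
    <version>${odps.connector.version}</version>
</dependency>
  4. If none of the above solves the problem, check your network connection and firewall settings, and make sure you can reach the Alibaba Cloud Maven repository.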
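
To see how the coordinate string in the error maps onto pom.xml fields, here is a minimal Python sketch. It assumes the colon-separated layouts that Maven prints (groupId:artifactId:version, groupId:artifactId:packaging:version, and groupId:artifactId:packaging:classifier:version). The function names parse_maven_coordinate and to_pom_dependency are hypothetical helpers written for this illustration, not part of Maven or any Alibaba Cloud tool.

```python
def parse_maven_coordinate(coordinate):
    """Split a Maven coordinate string into named fields.

    Hypothetical helper for illustration. Accepts 3, 4 or 5
    colon-separated parts, as printed in Maven error messages.
    """
    parts = coordinate.strip().strip("'\"").split(":")
    if any(not part for part in parts):
        raise ValueError(f"empty field in coordinate: {coordinate!r}")

    classifier = None
    if len(parts) == 3:
        group_id, artifact_id, version = parts
        packaging = "jar"  # Maven's default packaging
    elif len(parts) == 4:
        group_id, artifact_id, packaging, version = parts
    elif len(parts) == 5:
        group_id, artifact_id, packaging, classifier, version = parts
    else:
        raise ValueError(f"unexpected number of fields: {coordinate!r}")

    return {
        "groupId": group_id,
        "artifactId": artifact_id,
        "packaging": packaging,
        "classifier": classifier,
        "version": version,
    }


def to_pom_dependency(coordinate):
    """Render a <dependency> element; only the version goes into <version>."""
    c = parse_maven_coordinate(coordinate)
    return (
        "<dependency>\n"
        f"    <groupId>{c['groupId']}</groupId>\n"
        f"    <artifactId>{c['artifactId']}</artifactId>\n"
        f"    <version>{c['version']}</version>\n"
        "</dependency>"
    )


if __name__ == "__main__":
    # Coordinate copied from the error message quoted in the answer.
    quoted = "com.aliyun.odps:flink-connector-odps:iar:113.0"
    print(parse_maven_coordinate(quoted))
    print(to_pom_dependency(quoted))
```

Parsed this way, "iar" lands in the packaging slot and only "113.0" reaches the version field, which is why a <version> value of iar:113.0 can never match a published artifact.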



For more answers to this question, see:

https://developer.aliyun.com/ask/574773



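Question 3: Open-source Spark 3.1.3 Structured Streaming reports an error when writing to MaxCompute

When I use the open-source Spark connector from https://github.com/aliyun/aliyun-maxcompute-data-collectors/tree/master/spark-datasource-v3.1 to write data to MaxCompute, errors occur during a fixed time window. Writes succeed during the day, but in the early morning hours they fail with some probability. The error is as follows: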

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 32.0 failed 4 times, most recent failure: Lost task 1.3 in stage 32.0 (TID 130) (10.233.122.167 executor 1): java.net.SocketException: Unexpected end of file from server

at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)

at java.base/sun.net.www.http.HttpClient.parseHTTP(Unknown Source)

at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)

at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)

at java.base/java.net.HttpURLConnection.getResponseCode(Unknown Source)

at com.aliyun.odps.commons.transport.DefaultConnection.getResponse(DefaultConnection.java:132)

at com.aliyun.odps.tunnel.io.TunnelRecordWriter.write(TunnelRecordWriter.java:75)

at com.aliyun.odps.cupid.table.v1.tunnel.impl.TunnelWriter.write(TunnelWriter.java:62)

at com.aliyun.odps.cupid.table.v1.tunnel.impl.TunnelWriter.write(TunnelWriter.java:19)

at org.apache.spark.sql.odps.writer.DynamicPartitionWriter.write(DynamicPartitionWriter.scala:47)

at org.apache.spark.sql.odps.writer.DynamicPartitionWriter.write(DynamicPartitionWriter.scala:30)

at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)

at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)

at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)

at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)

at org.apache.spark.scheduler.Task.run(Task.scala:131)

at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)

at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)

at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

at java.base/java.lang.Thread.run(Unknown Source)

Suppressed: java.io.IOException: Stream is closed

at java.base/sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(Unknown Source)

at java.base/sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(Unknown Source)

at java.base/java.util.zip.DeflaterOutputStream.deflate(Unknown Source)

at java.base/java.util.zip.DeflaterOutputStream.write(Unknown Source)

at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)

at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)

at com.google.protobuf.CodedOutputStream.writeRawByte(CodedOutputStream.java:892)

at com.google.protobuf.CodedOutputStream.writeRawByte(CodedOutputStream.java:900)

at com.google.protobuf.CodedOutputStream.writeRawVarint32(CodedOutputStream.java:1012)

at com.google.protobuf.CodedOutputStream.writeTag(CodedOutputStream.java:994)

at com.google.protobuf.CodedOutputStream.writeSInt64(CodedOutputStream.java:273)

at com.aliyun.odps.commons.proto.ProtobufRecordStreamWriter.close(ProtobufRecordStreamWriter.java:371)

at com.aliyun.odps.tunnel.io.TunnelRecordWriter.close(TunnelRecordWriter.java:85)

at com.aliyun.odps.cupid.table.v1.tunnel.impl.TunnelWriter.close(TunnelWriter.java:71)

at org.apache.spark.sql.odps.writer.DynamicPartitionWriter.abort(DynamicPartitionWriter.scala:62)

at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$6(WriteToDataSourceV2Exec.scala:448)

at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1484)

... 10 more

Suppressed: java.lang.NullPointerException: Deflater has been closed

at java.base/java.util.zip.Deflater.ensureOpen(Unknown Source)

at java.base/java.util.zip.Deflater.deflate(Unknown Source)

at java.base/java.util.zip.Deflater.deflate(Unknown Source)

at java.base/java.util.zip.DeflaterOutputStream.deflate(Unknown Source)

at java.base/java.util.zip.DeflaterOutputStream.write(Unknown Source)

at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)

at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)

at com.google.protobuf.CodedOutputStream.writeRawByte(CodedOutputStream.java:892)

at com.google.protobuf.CodedOutputStream.writeRawByte(CodedOutputStream.java:900)

at com.google.protobuf.CodedOutputStream.writeRawVarint32(CodedOutputStream.java:1012)

at com.google.protobuf.CodedOutputStream.writeTag(CodedOutputStream.java:994)

at com.google.protobuf.CodedOutputStream.writeSInt64(CodedOutputStream.java:273)

at com.aliyun.odps.commons.proto.ProtobufRecordStreamWriter.close(ProtobufRecordStreamWriter.java:371)

at com.aliyun.odps.tunnel.io.TunnelRecordWriter.close(TunnelRecordWriter.java:85)

at com.aliyun.odps.cupid.table.v1.tunnel.impl.TunnelWriter.close(TunnelWriter.java:71)

at org.apache.spark.sql.odps.writer.DynamicPartitionWriter.close(DynamicPartitionWriter.scala:68)

at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$9(WriteToDataSourceV2Exec.scala:452)

at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1495)

... 10 more

Driver stacktrace:

at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2303)

at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2252)

at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2251)

at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)

at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)

at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2251)

at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)

at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)

at scala.Option.foreach(Option.scala:407)

at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2490)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2432)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2421)

at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:902)

at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)

at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:357)

... 49 more

Caused by: java.net.SocketException: Unexpected end of file from server

at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)

at java.base/sun.net.www.http.HttpClient.parseHTTP(Unknown Source)

at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)

at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)

at java.base/java.net.HttpURLConnection.getResponseCode(Unknown Source)

at com.aliyun.odps.commons.transport.DefaultConnection.getResponse(DefaultConnection.java:132)

at com.aliyun.odps.tunnel.io.TunnelRecordWriter.write(TunnelRecordWriter.java:75)

at com.aliyun.odps.cupid.table.v1.tunnel.impl.TunnelWriter.write(TunnelWriter.java:62)

at com.aliyun.odps.cupid.table.v1.tunnel.impl.TunnelWriter.write(TunnelWriter.java:19)

at org.apache.spark.sql.odps.writer.DynamicPartitionWriter.write(DynamicPartitionWriter.scala:47)

at org.apache.spark.sql.odps.writer.DynamicPartitionWriter.write(DynamicPartitionWriter.scala:30)

at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)

at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)

at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)

at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)

at org.apache.spark.scheduler.Task.run(Task.scala:131)

at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)

at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)

at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

at java.base/java.lang.Thread.run(Unknown Source)

Suppressed: java.io.IOException: Stream is closed

at java.base/sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(Unknown Source)

at java.base/sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(Unknown Source)

at java.base/java.util.zip.DeflaterOutputStream.deflate(Unknown Source)

at java.base/java.util.zip.DeflaterOutputStream.write(Unknown Source)

at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)

at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)

at com.google.protobuf.CodedOutputStream.writeRawByte(CodedOutputStream.java:892)

at com.google.protobuf.CodedOutputStream.writeRawByte(CodedOutputStream.java:900)

at com.google.protobuf.CodedOutputStream.writeRawVarint32(CodedOutputStream.java:1012)

at com.google.protobuf.CodedOutputStream.writeTag(CodedOutputStream.java:994)

at com.google.protobuf.CodedOutputStream.writeSInt64(CodedOutputStream.java:273)

at com.aliyun.odps.commons.proto.ProtobufRecordStreamWriter.close(ProtobufRecordStreamWriter.java:371)

at com.aliyun.odps.tunnel.io.TunnelRecordWriter.close(TunnelRecordWriter.java:85)

at com.aliyun.odps.cupid.table.v1.tunnel.impl.TunnelWriter.close(TunnelWriter.java:71)

at org.apache.spark.sql.odps.writer.DynamicPartitionWriter.abort(DynamicPartitionWriter.scala:62)

at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$6(WriteToDataSourceV2Exec.scala:448)

at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1484)

... 10 more

Suppressed: java.lang.NullPointerException: Deflater has been closed

at java.base/java.util.zip.Deflater.ensureOpen(Unknown Source)

at java.base/java.util.zip.Deflater.deflate(Unknown Source)

at java.base/java.util.zip.Deflater.deflate(Unknown Source)

at java.base/java.util.zip.DeflaterOutputStream.deflate(Unknown Source)

at java.base/java.util.zip.DeflaterOutputStream.write(Unknown Source)

at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)

at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)

at com.google.protobuf.CodedOutputStream.writeRawByte(CodedOutputStream.java:892)

at com.google.protobuf.CodedOutputStream.writeRawByte(CodedOutputStream.java:900)

at com.google.protobuf.CodedOutputStream.writeRawVarint32(CodedOutputStream.java:1012)

at com.google.protobuf.CodedOutputStream.writeTag(CodedOutputStream.java:994)

at com.google.protobuf.CodedOutputStream.writeSInt64(CodedOutputStream.java:273)

at com.aliyun.odps.commons.proto.ProtobufRecordStreamWriter.close(ProtobufRecordStreamWriter.java:371)

at com.aliyun.odps.tunnel.io.TunnelRecordWriter.close(TunnelRecordWriter.java:85)

at com.aliyun.odps.cupid.table.v1.tunnel.impl.TunnelWriter.close(TunnelWriter.java:71)

at org.apache.spark.sql.odps.writer.DynamicPartitionWriter.close(DynamicPartitionWriter.scala:68)

at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$9(WriteToDataSourceV2Exec.scala:452)

at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1495)

... 10 more

23/11/06 05:49:05 INFO ShutdownHookManager: Shutdown hook called

23/11/06 05:49:05 INFO ShutdownHookManager: Deleting directory /var/data/spark-d92fd15e-9117-485c-a426-29bb36269af6/spark-b2b68550-ac67-4daa-9ace-1796efe27dc2

23/11/06 05:49:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-16859bbb-6a2a-43c1-aa11-32ca5ee840a2

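My current suspicion is that many DataWorks scheduled jobs run in the early morning and cause network congestion. Does anyone know about this problem, and is there a way to fix it?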



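Reference answer:

This problem may be caused by an unstable network or by a server-side issue. You can try the following:

  1. Increase the retry count: when writing data, set a larger retry count so that a failed write is retried automatically until the maximum number of retries is reached.
  2. Increase the timeout: when connecting to the server, set a longer timeout so that the connection has enough time to complete on an unstable network.
  3. Check the server status: make sure the server is running normally and is not experiencing an outage or undergoing maintenance.
  4. Use another data source: if the problem persists, consider a different data source, such as MaxCompute's ODPS Connector, and see whether it works.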
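
The retry advice in item 1 can be sketched in Python as a generic retry wrapper with exponential backoff. This is an illustration of the technique only, not the Spark connector's or the Tunnel SDK's API: retry_with_backoff and all of its parameters are hypothetical names. Within Spark itself, the "failed 4 times" in the trace matches the default value of spark.task.maxFailures, so raising that setting is one existing knob for task-level retries.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (base_delay, 2x, 4x, ...),
    is capped at max_delay, and gets up to 10% random jitter so that
    many clients do not retry at the same instant.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_write():
        # Stand-in for a write that fails twice on a transient error.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("Unexpected end of file from server")
        return "ok"

    print(retry_with_backoff(flaky_write, base_delay=0.01))
    print("attempts:", state["calls"])
```

Backoff with jitter fits the suspected cause: if the failures come from congestion during a busy scheduling window, immediate retries add load at the worst moment, while spaced retries give the server time to recover.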



For more answers to this question, see:

https://developer.aliyun.com/ask/574238



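Question 4: In DataWorks, the type mapping can be changed manually, but that is error-prone?

The goal is to use DataWorks to sync data from MaxCompute back to ADB 3.0 (MySQL). At present, one-click automatic table creation maps MaxCompute decimal(38,18) to a plain MySQL decimal, which loses precision.

It can be changed manually, but that is error-prone. Is there another way?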



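Reference answer:

Changing the default type mapping is not currently supported. For now, either correct the generated table manually or create the target table yourself in advance.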
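
One way to make manual table creation less error-prone is to generate the DDL from the source column list, so that precision and scale are copied rather than retyped. The Python sketch below rests on several assumptions: the helper names (to_mysql_type, build_create_table), the table and column names, and the non-decimal type mappings are all hypothetical and only illustrative. In MySQL, a bare DECIMAL defaults to DECIMAL(10,0); whether ADB 3.0 applies the same default is assumed here, not confirmed by the source.

```python
import re
from decimal import Decimal

_DECIMAL = re.compile(r"decimal\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)", re.IGNORECASE)

# Illustrative mapping only; adjust to the types your tables use.
_SIMPLE_TYPES = {
    "bigint": "BIGINT",
    "double": "DOUBLE",
    "string": "VARCHAR(255)",
    "datetime": "DATETIME",
}


def to_mysql_type(mc_type):
    """Map a MaxCompute type name to a MySQL type, keeping decimal(p,s)."""
    mc_type = mc_type.strip()
    match = _DECIMAL.fullmatch(mc_type)
    if match:
        precision, scale = int(match.group(1)), int(match.group(2))
        return f"DECIMAL({precision},{scale})"
    try:
        return _SIMPLE_TYPES[mc_type.lower()]
    except KeyError:
        raise ValueError(f"no mapping defined for type: {mc_type!r}") from None


def build_create_table(table, columns):
    """Build a CREATE TABLE statement from (name, MaxCompute type) pairs."""
    lines = [f"  `{name}` {to_mysql_type(mc_type)}" for name, mc_type in columns]
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(lines) + "\n);"


def truncate_to_scale_zero(value):
    """Show what storing a value in a scale-0 decimal column does to it."""
    return value.quantize(Decimal("1"))


if __name__ == "__main__":
    columns = [("order_id", "bigint"), ("amount", "decimal(38,18)")]
    print(build_create_table("orders", columns))
    print(truncate_to_scale_zero(Decimal("12.345678901234567890")))
```

Generating the statement and reviewing it once is usually safer than editing each auto-created table by hand, because a mistyped precision or scale would otherwise go unnoticed until data is written.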



For more answers to this question, see:

https://developer.aliyun.com/ask/573625



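Question 5: DataWorks syncs from MaxCompute to MySQL, and the MySQL table's id is an auto-increment id. Is the error related to the configuration?

DataWorks syncs from MaxCompute to MySQL, and the MySQL table's id is an auto-increment id. How should the field mapping be configured? I configured the same number of fields on both sides and left id out, and got an error. Is it related to the configuration? Both tables exist. com.aliyun.odps.tunnel.tunnelexception: RequestId=20231121182159c4e4ef0a054202be, ErrorCode=InvalidProjectTable, ErrorMessage=The specified project or table name is not valid or missing.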



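Reference answer:

Judging from the error, the project name or the table name is configured incorrectly. You can follow the MaxCompute Reader documentation to configure it: https://help.aliyun.com/zh/dataworks/user-guide/maxcompute-data-source?spm=a2c4g.11186623.0.i0#task-2308965

Also run desc project_name.table_name to confirm that the table actually still exists.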



For more answers to this question, see:

https://developer.aliyun.com/ask/573461
