[Spark] Spark Pitfalls We Hit Over the Years

Copyright notice: This is an original article by the author; reproduction without permission is prohibited. https://blog.csdn.net/SunnyYoona/article/details/72922155

1. java.lang.NoClassDefFoundError: org/apache/spark/Logging

1.1 Problem

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.streaming.twitter.TwitterUtils$.createStream(TwitterUtils.scala:44)
    at TwitterStreamingApp$.main(TwitterStreamingApp.scala:42)
    at TwitterStreamingApp.main(TwitterStreamingApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 23 more

1.2 Solution

First, let's take a look at our Maven dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.1.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.11</artifactId>
    <version>1.6.3</version>
</dependency>

Notice that our Spark version is 2.1.0, yet org/apache/spark/Logging was removed after Spark 1.5.2. Since spark-streaming-kafka 1.6.3 still uses that Logging class, we swap in the Kafka integration built for Spark 2.x:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
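
Mismatches like this are easiest to spot in the dependency tree. As a quick check (a standard Maven command, nothing project-specific), list every Spark artifact and confirm the versions and Scala suffixes line up:

mvn dependency:tree -Dincludes=org.apache.spark

Every org.apache.spark entry should report the same version (here 2.1.0) and the same _2.11 Scala suffix; a stray 1.x artifact is exactly what drags the missing org/apache/spark/Logging reference onto the classpath.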

2. Task not serializable

2.1 Problem description

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:546)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:546)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
	at org.apache.spark.streaming.StreamingContext.withScope(StreamingContext.scala:264)
	at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:545)
	at org.apache.spark.streaming.api.java.JavaDStreamLike$class.map(JavaDStreamLike.scala:157)
	at org.apache.spark.streaming.api.java.AbstractJavaDStreamLike.map(JavaDStreamLike.scala:42)
	at com.sjf.open.spark.stream.SparkStreamTest.receiveKafkaData(SparkStreamTest.java:76)
	at com.sjf.open.spark.stream.SparkStreamTest.startStream(SparkStreamTest.java:71)
	at com.sjf.open.spark.stream.SparkStreamTest.main(SparkStreamTest.java:48)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.io.NotSerializableException: com.sjf.open.spark.stream.SparkStreamTest
Serialization stack:
	- object not serializable (class: com.sjf.open.spark.stream.SparkStreamTest, value: com.sjf.open.spark.stream.SparkStreamTest@1e6cc850)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1)
	- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
	- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class com.sjf.open.spark.stream.SparkStreamTest, functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeSpecial com/sjf/open/spark/stream/SparkStreamTest.handleMessage:(Lscala/Tuple2;)Ljava/lang/String;, instantiatedMethodType=(Lscala/Tuple2;)Ljava/lang/String;, numCaptured=1])
	- writeReplace data (class: java.lang.invoke.SerializedLambda)
	- object (class com.sjf.open.spark.stream.SparkStreamTest$$Lambda$8/1868987089, com.sjf.open.spark.stream.SparkStreamTest$$Lambda$8/1868987089@3c0036b)
	- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
	- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
	... 20 more

2.2 Solution

It turns out our SparkStreamTest class does not implement Serializable:

public class SparkStreamTest{
    ...
}

Change it to:

public class SparkStreamTest implements Serializable{
    ...
}
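
Implementing Serializable works, but it ships the whole driver class to the executors and breaks again the moment the class gains a non-serializable field. An alternative sketch (the class and method names follow the stack trace above; the surrounding Kafka stream setup is assumed) is to make the mapping function static so the lambda no longer captures this:

import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

public class SparkStreamTest {

    // Static: the method reference below captures no enclosing instance,
    // so Spark never needs to serialize SparkStreamTest itself.
    private static String handleMessage(Tuple2<String, String> message) {
        return message._2();
    }

    private JavaDStream<String> receiveKafkaData(JavaPairDStream<String, String> messages) {
        // A method reference to a static method is serializable on its own.
        return messages.map(SparkStreamTest::handleMessage);
    }
}

Spark serializes every closure it sends to executors; a lambda that calls an instance method pulls the enclosing object into the closure, which is exactly what the serialization stack above shows (capturedArgs containing SparkStreamTest@1e6cc850).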

3. java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z

3.1 Problem

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:200)
	at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
	at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
	at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
	at com.sjf.open.spark.JavaWordCount.test1(JavaWordCount.java:37)
	at com.sjf.open.spark.JavaWordCount.main(JavaWordCount.java:96)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
	at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:537)
	at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:448)
	at org.apache.spark.metrics.sink.MetricsServlet.<init>(MetricsServlet.scala:48)
	... 26 more

3.2 Solution

The Elasticsearch package we import also contains com.fasterxml.jackson.core, at the higher version 2.8.1; with Spark and ES resolving to different jackson versions, the method cannot be found at runtime. So we need to explicitly pin the com.fasterxml.jackson.core version:

<properties>
    <jackson.version>2.6.5</jackson.version>
</properties>

<!-- jackson conflict between Spark SQL and ES; specify the version manually -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>${jackson.version}</version>
</dependency>
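
Pinning jackson-databind alone can still leave jackson-core (where requiresPropertyOrdering lives) at a mismatched version if another dependency drags it in. A minimal sketch of pinning both artifacts together via dependencyManagement, reusing the 2.6.5 version chosen above:

<dependencyManagement>
    <dependencies>
        <!-- Force one consistent jackson version across all transitive dependencies -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>${jackson.version}</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>${jackson.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>

Entries in dependencyManagement override versions chosen by transitive resolution without adding new dependencies, so the 2.8.1 jackson pulled in by Elasticsearch is forced down to 2.6.5.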