[Spark]那些年我们遇到的Spark的坑

简介: 版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/SunnyYoona/article/details/72922155 1.
版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/SunnyYoona/article/details/72922155

1. java.lang.NoClassDefFoundError: org/apache/spark/Logging

1.1 问题

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)I
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.streaming.twitter.TwitterUtils$.createStream(TwitterUtils.scala:44)
    at TwitterStreamingApp$.main(TwitterStreamingApp.scala:42)
    at TwitterStreamingApp.main(TwitterStreamingApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 23 more

1.2 解决方案

先看一下我们的Maven依赖:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.1.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.11</artifactId>
    <version>1.6.3</version>
</dependency>

我们看到我们Spark版本为2.1.0版本,然而在spark 1.5.2版本之后 org/apache/spark/Logging 已经被移除了。 由于spark-streaming-kafka 1.6.3版本中使用到了logging,所以我们替换一个依赖:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.1.0</version>
</dependency>

2. Task not serializable

2.1 问题描述

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:546)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:546)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
	at org.apache.spark.streaming.StreamingContext.withScope(StreamingContext.scala:264)
	at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:545)
	at org.apache.spark.streaming.api.java.JavaDStreamLike$class.map(JavaDStreamLike.scala:157)
	at org.apache.spark.streaming.api.java.AbstractJavaDStreamLike.map(JavaDStreamLike.scala:42)
	at com.sjf.open.spark.stream.SparkStreamTest.receiveKafkaData(SparkStreamTest.java:76)
	at com.sjf.open.spark.stream.SparkStreamTest.startStream(SparkStreamTest.java:71)
	at com.sjf.open.spark.stream.SparkStreamTest.main(SparkStreamTest.java:48)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.io.NotSerializableException: com.sjf.open.spark.stream.SparkStreamTest
Serialization stack:
	- object not serializable (class: com.sjf.open.spark.stream.SparkStreamTest, value: com.sjf.open.spark.stream.SparkStreamTest@1e6cc850)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1)
	- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
	- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class com.sjf.open.spark.stream.SparkStreamTest, functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeSpecial com/sjf/open/spark/stream/SparkStreamTest.handleMessage:(Lscala/Tuple2;)Ljava/lang/String;, instantiatedMethodType=(Lscala/Tuple2;)Ljava/lang/String;, numCaptured=1])
	- writeReplace data (class: java.lang.invoke.SerializedLambda)
	- object (class com.sjf.open.spark.stream.SparkStreamTest$$Lambda$8/1868987089, com.sjf.open.spark.stream.SparkStreamTest$$Lambda$8/1868987089@3c0036b)
	- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
	- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
	... 20 more

2.2 解决方案

发现我们的SparkStreamTest类没有进行序列化:

public class SparkStreamTest{
    ...
}

修改为:

public class SparkStreamTest implements Serializable{
    ...
}

3.java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z

3.1 问题

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:200)
	at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
	at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
	at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
	at com.sjf.open.spark.JavaWordCount.test1(JavaWordCount.java:37)
	at com.sjf.open.spark.JavaWordCount.main(JavaWordCount.java:96)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
	at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:537)
	at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:448)
	at org.apache.spark.metrics.sink.MetricsServlet.<init>(MetricsServlet.scala:48)
	... 26 more

3.2 解决方案

引入的ES包中也包含com.fasterxml.jackson.core版本为2.8.1 ,版本较高,找不到那个方法. 所以需要显示指明com.fasterxml.jackson.core版本:

<jackson.version>2.6.5</jackson.version>
<!-- jackson spark sql 与 es 冲突 手动指定-->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>${jackson.version}</version>
</dependency>
目录
相关文章
|
JavaScript 前端开发 开发者
正则表达式深度解析:斜杠的妙用
【2月更文挑战第29天】
2005 0
正则表达式深度解析:斜杠的妙用
|
Python
Python 判断字符串是否包含子字符串
Python 判断字符串是否包含子字符串
1103 0
|
SQL 关系型数据库 MySQL
使用关系型数据库事务的例子
【5月更文挑战第12天】本文介绍了设置MySQL事务的三种方式:全局、当前session和下一个事务,并提供了示例代码展示如何开始事务和设置隔离级别。还简述了引擎设置和数据源DSN格式。最后,讨论了SQL优化策略,包括选择合适的存储引擎、优化字段数据类型、建立索引、避免全表扫描等。
382 4
使用关系型数据库事务的例子
|
10月前
|
存储 缓存 前端开发
CacheStorage详解
CacheStorage 是 Web API 的一部分,用于管理缓存对象的存储。它允许开发者在客户端存储和检索请求-响应对,实现离线访问和性能优化。通过 CacheStorage 接口,可以创建、删除和查询缓存,以及控制资源的缓存策略。
|
10月前
|
存储 Kubernetes 调度
深入理解Kubernetes中的Pod与Container
深入理解Kubernetes中的Pod与Container
580 0
|
10月前
|
安全 Java 编译器
什么是AOP面向切面编程?怎么简单理解?
本文介绍了面向切面编程(AOP)的基本概念和原理,解释了如何通过分离横切关注点(如日志、事务管理等)来增强代码的模块化和可维护性。AOP的核心概念包括切面、连接点、切入点、通知和织入。文章还提供了一个使用Spring AOP的简单示例,展示了如何定义和应用切面。
1206 1
什么是AOP面向切面编程?怎么简单理解?
|
Java jenkins 测试技术
常见自动化测试框架和工具
常见自动化测试框架和工具
329 0
|
缓存 测试技术 内存技术
npx 有什么作用跟意义?为什么要有 npx?什么场景使用?
npx 有什么作用跟意义?为什么要有 npx?什么场景使用?
1392 0
|
编译器 C语言 C++
在C/C++直接插入汇编代码的方法-内联汇编
在C/C++直接插入汇编代码的方法-内联汇编
906 0
在C/C++直接插入汇编代码的方法-内联汇编