开发者社区> 问答> 正文

Spark:Scala模拟,Task不可序列化

我试图使用mockito进行单元测试一些scala代码。我想在本地运行spark,即在我的IntelliJ IDE中。这是一个样本

class MyScalaSparkTests extends FunSuite with BeforeAndAfter with MockitoSugar with java.io.Serializable{

val configuration:SparkConf = new SparkConf()

.setAppName("Your Application Name")
.setMaster("local");

val sc = new SparkContext(configuration);
lazy val testSess = SparkSession.builder.appName("local_test").getOrCreate()
test ("test service") {

import testSess.implicits._
// (1) init
val testObject = spy(new MyScalaClass(<some args>))
val testDf = testSess.emptyDataset[MyCaseClass1].toDF()
testDf.union(Seq(MyCaseClass(<some args>)).toDF())
testObject.testDataFrame = testDf
val testSource = testSess.emptyDataset[MyCaseClass2].toDF()
testSource.union(Seq(MyCaseClass2(<some args>)).toDF())
testObject.setSourceDf(testSource)
val testRes = testObject.someMethod()

val r = testRes.take(1)
println(r)

}

}
所以基本上,这是我想要做的

MyScalaClass具有someMethod()其称为两个数据帧之间进行比较的数据testDataFrame和testSource。然后它返回另一个包含结果的数据框。现在,在我的单元测试,我刺探MyScalaClass创建testObject。然后我创建testDataFrame并testSource分配它们testObject。最后,我调试testObject.someMethod()。

现在在调试器中,在这一行

val r = testRes.take(1)
我看到这testRes是一个Dataset因此该方法返回的东西。但是,当我尝试从中获取take某些东西以验证我得到的结果时

Task not serializable
org.apache.spark.SparkException: Task not serializable
并进一步向下堆栈跟踪

Caused by: java.io.NotSerializableException: org.mockito.internal.creation.DelegatingMethod
Serialization stack:

- object not serializable (class: org.mockito.internal.creation.DelegatingMethod, value: org.mockito.internal.creation.DelegatingMethod@a97f2bff)
- field (class: org.mockito.internal.invocation.InterceptedInvocation, name: mockitoMethod, type: interface org.mockito.internal.invocation.MockitoMethod)
- object (class org.mockito.internal.invocation.InterceptedInvocation, bSV2PartValidator.toString();)
- field (class: org.mockito.internal.invocation.InvocationMatcher, name: invocation, type: interface org.mockito.invocation.Invocation)
- object (class org.mockito.internal.invocation.InvocationMatcher, bSV2PartValidator.toString();)
- field (class: org.mockito.internal.stubbing.InvocationContainerImpl, name: invocationForStubbing, type: interface org.mockito.invocation.MatchableInvocation)
- object (class org.mockito.internal.stubbing.InvocationContainerImpl, invocationForStubbing: bSV2PartValidator.toString();)
- field (class: org.mockito.internal.handler.MockHandlerImpl, name: invocationContainer, type: class org.mockito.internal.stubbing.InvocationContainerImpl)
- object (class org.mockito.internal.handler.MockHandlerImpl, org.mockito.internal.handler.MockHandlerImpl@47c019d7)
- field (class: org.mockito.internal.handler.NullResultGuardian, name: delegate, type: interface org.mockito.invocation.MockHandler)
- object (class org.mockito.internal.handler.NullResultGuardian, org.mockito.internal.handler.NullResultGuardian@7222e168)
- field (class: org.mockito.internal.handler.InvocationNotifierHandler, name: mockHandler, type: interface org.mockito.invocation.MockHandler)
- object (class org.mockito.internal.handler.InvocationNotifierHandler, org.mockito.internal.handler.InvocationNotifierHandler@1e4f8430)
- field (class: org.mockito.internal.creation.bytebuddy.MockMethodInterceptor, name: handler, type: interface org.mockito.invocation.MockHandler)
- object (class org.mockito.internal.creation.bytebuddy.MockMethodInterceptor, org.mockito.internal.creation.bytebuddy.MockMethodInterceptor@34d08905)
- field (class: com.walmart.labs.search.signals.validators.BSV2PartValidator$MockitoMock$213785213, name: mockitoInterceptor, type: class org.mockito.internal.creation.bytebuddy.MockMethodInterceptor)
- object (class com.walmart.labs.search.signals.validators.BSV2PartValidator$MockitoMock$213785213, com.walmart.labs.search.signals.validators.BSV2PartValidator$MockitoMock$213785213@7f289126)
- field (class: com.walmart.labs.search.signals.validators.BSV2PartValidator

$$ anonfun$1, name: $outer, type: class com.walmart.labs.search.signals.validators.BSV2PartValidator) - object (class com.walmart.labs.search.signals.validators.BSV2PartValidator $$

anonfun$1, )

- element of array (index: 1)
- array (class [Ljava.lang.Object;, size 7)
- field (class: org.apache.spark.sql.execution.WholeStageCodegenExec

$$ anonfun$8, name: references$1, type: class [Ljava.lang.Object;) - object (class org.apache.spark.sql.execution.WholeStageCodegenExec $$

anonfun$8, )

at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 78 more

我究竟做错了什么?是否有可能在IDE中窥探或模拟saprk行为?

展开
收起
社区小助手 2018-12-19 16:57:51 3022 0
1 条回答
写回答
取消 提交回答
  • 社区小助手是spark中国社区的管理员,我会定期更新直播回顾等资料和文章干货,还整合了大家在钉群提出的有关spark的问题及回答。

    默认情况下,模拟不可序列化,因为它通常是单元测试中的代码味道

    您可以尝试通过创建类似的模拟启用序列化,mockMyType.serializable())并查看当spark尝试使用它时会发生什么。

    顺便说一句,我建议你使用mockito-scala而不是传统的mockito,因为它可以为你解决一些其他问题

    2019-07-17 23:23:03
    赞同 展开评论 打赏
问答排行榜
最热
最新

相关电子书

更多
Hybrid Cloud and Apache Spark 立即下载
Scalable Deep Learning on Spark 立即下载
Comparison of Spark SQL with Hive 立即下载