Spark unit testing and mocking the SparkSession
I have a method that converts a DataFrame into a Dataset. The method is as follows:
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

def dataFrameToDataSet[T](sourceName: String, df: DataFrame)
                         (implicit spark: SparkSession): Dataset[T] = {
import spark.implicits._
sourceName match {
case "oracle_grc_asset" =>
val ds = df.map(row => grc.Asset(row)).as[grc.Asset]
ds.asInstanceOf[Dataset[T]]
case "oracle_grc_asset_host" =>
val ds = df.map(row => grc.AssetHost(row)).as[grc.AssetHost]
ds.asInstanceOf[Dataset[T]]
case "oracle_grc_asset_tag" =>
val ds = df.map(row => grc.AssetTag(row)).as[grc.AssetTag]
ds.asInstanceOf[Dataset[T]]
case "oracle_grc_asset_tag_asset" =>
val ds = df.map(row => grc.AssetTagAsset(row)).as[grc.AssetTagAsset]
ds.asInstanceOf[Dataset[T]]
case "oracle_grc_qg_subscription" =>
val ds = df.map(row => grc.QgSubscription(row)).as[grc.QgSubscription]
ds.asInstanceOf[Dataset[T]]
case "oracle_grc_host_instance_vuln" =>
val ds = df.map(row => grc.HostInstanceVuln(row)).as[grc.HostInstanceVuln]
ds.asInstanceOf[Dataset[T]]
case _ => throw new RuntimeException("Function dataFrameToDataSet doesn't support provided case class type!")
}
}
Now I want to test this method. To do that, I created a test class that looks like this:
"A dataFrameToDataSet function" should "return DataSet from dataframe" in {
val master = "local[*]"
val appName = "MyApp"
val conf: SparkConf = new SparkConf()
.setMaster(master)
.setAppName(appName)
implicit val ss: SparkSession = SparkSession.builder().config(conf).getOrCreate()
import ss.implicits._
//val sourceName = List("oracle_grc_asset", "oracle_grc_asset_host", "oracle_grc_asset_tag", "oracle_grc_asset_tag_asset", "oracle_grc_qg_subscription", "oracle_grc_host_instance_vuln")
val sourceName1 = "oracle_grc_asset"
val df = Seq(grc.Asset(123,"bat", Some("abc"), "cat", Some("abc"), Some(1), java.math.BigDecimal.valueOf(3.4) , Some(2), Some(2),Some("abc"), Some(2), Some("abc"), Some(java.sql.Timestamp.valueOf("2011-10-02 18:48:05.123456")), Some(6), Some(4), java.sql.Timestamp.valueOf("2011-10-02 18:48:05.123456"), java.sql.Timestamp.valueOf("2011-10-02 18:48:05.123456"), "India", "Test","Pod01")).toDF()
val ds = Seq(grc.Asset(123,"bat", Some("abc"), "cat", Some("abc"), Some(1), java.math.BigDecimal.valueOf(3.4) , Some(2), Some(2),Some("abc"), Some(2), Some("abc"), Some(java.sql.Timestamp.valueOf("2011-10-02 18:48:05.123456")), Some(6), Some(4), java.sql.Timestamp.valueOf("2011-10-02 18:48:05.123456"), java.sql.Timestamp.valueOf("2011-10-02 18:48:05.123456"), "India", "Test","Pod01")).toDS()
// Dataset does not define structural equality, so compare the collected contents instead
assert(dataFrameToDataSet[grc.Asset](sourceName1, df).collect().toSeq == ds.collect().toSeq)
}}
This test case fails and I get a FileNotFoundException: HADOOP_HOME not found, even though I have already set HADOOP_HOME (pointing at winutils.exe) in my system environment variables.
Just separate the SparkContext startup from the actual code, as described in https://medium.com/@mrpowers/testing-spark-applications-8c590d3215fa, and it should work.
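The idea in that article is to move SparkSession creation out of the code under test and into a shared test fixture that each suite mixes in, so the suites never build (or tear down) a session themselves. A minimal sketch of that pattern, assuming ScalaTest's AnyFlatSpec and a local-mode session (the SparkSessionTestWrapper trait name and the sample test body below are illustrative, not lifted from the article):

import org.apache.spark.sql.SparkSession
import org.scalatest.flatspec.AnyFlatSpec

// Reusable test fixture: one lazily initialised local SparkSession shared by all suites.
trait SparkSessionTestWrapper {
  implicit lazy val spark: SparkSession =
    SparkSession
      .builder()
      .master("local[*]")
      .appName("spark-unit-tests")
      .getOrCreate()
}

// A suite mixes the trait in instead of building its own SparkConf/SparkSession.
class DataFrameToDataSetSpec extends AnyFlatSpec with SparkSessionTestWrapper {
  import spark.implicits._

  "A dataFrameToDataSet function" should "return a Dataset from a DataFrame" in {
    // Build the input DataFrame and expected Dataset here (e.g. from grc.Asset rows),
    // call dataFrameToDataSet[grc.Asset]("oracle_grc_asset", df), and compare collected results.
    val df = Seq((123, "bat")).toDF("id", "name")
    assert(df.count() == 1)
  }
}

Because the session is a lazy val on the shared trait, it is created only once per JVM run, which keeps the suites fast and keeps SparkSession wiring out of the method under test.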