Spark Scala: class not found scala.Any - Q&A - Alibaba Cloud Developer Community


社区小助手 2018-12-21 13:13:25 3221

val schema = df.schema
val x = df.flatMap(r =>
  (0 until schema.length).map { idx =>
    ((idx, r.get(idx)), 1L)
  }
)
This produces the error:

java.lang.ClassNotFoundException: scala.Any

All answers (1)
  • 社区小助手
    2019-07-17 23:23:22

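    One approach is to cast all columns to String. Row.get returns Any, and Spark has no Encoder for Any, which is most likely why the original code fails. Note that I changed r.get(idx) in your code to r.getString(idx). The following works: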

    scala> val df = Seq(("ServiceCent4","AP-1-IOO-PPP","241.206.155.172","06-12-18:17:42:34",162,53,1544098354885L)).toDF("COL1","COL2","COL3","EventTime","COL4","COL5","COL6")
    df: org.apache.spark.sql.DataFrame = [COL1: string, COL2: string ... 5 more fields]

    scala> df.show(1,false)
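    +------------+------------+---------------+-----------------+----+----+-------------+
    |COL1        |COL2        |COL3           |EventTime        |COL4|COL5|COL6         |
    +------------+------------+---------------+-----------------+----+----+-------------+
    |ServiceCent4|AP-1-IOO-PPP|241.206.155.172|06-12-18:17:42:34|162 |53  |1544098354885|
    +------------+------------+---------------+-----------------+----+----+-------------+
    only showing top 1 row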

    scala> df.printSchema
    root
    |-- COL1: string (nullable = true)
    |-- COL2: string (nullable = true)
    |-- COL3: string (nullable = true)
    |-- EventTime: string (nullable = true)
    |-- COL4: integer (nullable = false)
    |-- COL5: integer (nullable = false)
    |-- COL6: long (nullable = false)

    scala> val schema = df.schema
    schema: org.apache.spark.sql.types.StructType = StructType(StructField(COL1,StringType,true), StructField(COL2,StringType,true), StructField(COL3,StringType,true), StructField(EventTime,StringType,true), StructField(COL4,IntegerType,false), StructField(COL5,IntegerType,false), StructField(COL6,LongType,false))

    scala> val df2 = df.columns.foldLeft(df){ (acc,r) => acc.withColumn(r,col(r).cast("string")) }
    df2: org.apache.spark.sql.DataFrame = [COL1: string, COL2: string ... 5 more fields]

    scala> df2.printSchema
    root
    |-- COL1: string (nullable = true)
    |-- COL2: string (nullable = true)
    |-- COL3: string (nullable = true)
    |-- EventTime: string (nullable = true)
    |-- COL4: string (nullable = false)
    |-- COL5: string (nullable = false)
    |-- COL6: string (nullable = false)

    scala> val x = df2.flatMap(r => (0 until schema.length).map { idx => ((idx, r.getString(idx)), 1L) } )
    x: org.apache.spark.sql.Dataset[((Int, String), Long)] = [_1: struct<_1: int, _2: string>, _2: bigint]

    scala> x.show(5,false)
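    +---------------------+---+
    |_1                   |_2 |
    +---------------------+---+
    |[0,ServiceCent4]     |1  |
    |[1,AP-1-IOO-PPP]     |1  |
    |[2,241.206.155.172]  |1  |
    |[3,06-12-18:17:42:34]|1  |
    |[4,162]              |1  |
    +---------------------+---+
    only showing top 5 rows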
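
The session above runs in spark-shell, which imports spark.implicits._ and org.apache.spark.sql.functions._ automatically. As a rough sketch of an alternative that skips the column cast, each value can be turned into a String inside the flatMap itself, so the result type becomes ((Int, String), Long) and an Encoder can be found. The object name ColumnValueCounts, the helper indexedValues, the sample columns, and the local[*] master are all assumptions made for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical helper object, named for illustration only.
object ColumnValueCounts {

  // Pairs every cell with its column index and a count of 1.
  // Values are converted to String so the element type is no longer Any.
  def indexedValues(spark: SparkSession, df: DataFrame): Dataset[((Int, String), Long)] = {
    import spark.implicits._ // provides the Encoder for ((Int, String), Long)

    val width = df.schema.length // number of columns, computed once on the driver

    df.flatMap { r =>
      (0 until width).map { idx =>
        // r.get(idx) is typed Any; keep nulls as null, stringify everything else
        val value: String = Option(r.get(idx)).map(_.toString).orNull
        ((idx, value), 1L)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Local session for a quick experiment; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-value-counts")
      .getOrCreate()
    import spark.implicits._

    // Sample data loosely based on the row used in the answer.
    val df = Seq(
      ("ServiceCent4", "AP-1-IOO-PPP", 162, 1544098354885L)
    ).toDF("COL1", "COL2", "COL4", "COL6")

    indexedValues(spark, df).show(false)
    spark.stop()
  }
}
```

Compared with casting every column up front, this leaves the DataFrame's schema untouched, at the cost of relying on each value's toString rather than Spark's own cast rules, so the string forms of some types (timestamps, decimals) may differ between the two approaches.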

