
How to count the number of missing values in each row of a DataFrame in Spark Scala?

社区小助手 2018-12-12 14:06:02

I want to count the number of missing values in each row of a DataFrame in Spark Scala.

Code:

val samplesqlDF = spark.sql("SELECT * FROM sampletable")

samplesqlDF.show()
Input DataFrame:

| name | age | degree | Place     |
|------|-----|--------|-----------|
| Ram  |     | MCA    | Bangalore |
|      | 25  |        |           |
|      | 26  | BE     |           |
| Raju | 21  | Btech  | Chennai   |

The output DataFrame (with the row-level count) should look like this:

| name | age | degree | Place     | rowcount |
|------|-----|--------|-----------|----------|
| Ram  |     | MCA    | Bangalore | 1        |
|      | 25  |        |           | 3        |
|      | 26  | BE     |           | 2        |
| Raju | 21  | Btech  | Chennai   | 0        |
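In plain Scala terms, the desired rowcount is simply the number of null cells in each row. A minimal, Spark-free sketch of that mapping (sample data taken from the table above, with null standing in for the blank cells):

```scala
// Sample rows mirroring the input table; null marks a missing value.
val rows: Seq[Seq[Any]] = Seq(
  Seq("Ram", null, "MCA", "Bangalore"),
  Seq(null, "25", null, null),
  Seq(null, "26", "BE", null),
  Seq("Raju", "21", "Btech", "Chennai")
)

// rowcount = number of nulls in each row.
val rowcounts = rows.map(_.count(_ == null))
// rowcounts: Seq(1, 3, 2, 0)
```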

Distributed Computing Scala Spark
All Answers (1)
  • 社区小助手
    2019-07-17 23:20:10

    It looks like you want to get the null counts dynamically. Take a look at this:

    import spark.implicits._
    import org.apache.spark.sql.functions.{col, when}

    val df = Seq(
      ("Ram", null, "MCA", "Bangalore"),
      (null, "25", null, null),
      (null, "26", "BE", null),
      ("Raju", "21", "Btech", "Chennai")
    ).toDF("name", "age", "degree", "Place")
    df.show(false)

    // Add one indicator column per source column: 1 when the value is null, else 0.
    val df2 = df.columns.foldLeft(df)((acc, c) =>
      acc.withColumn(c + "_null", when(col(c).isNull, 1).otherwise(0)))
    df2.createOrReplaceTempView("student")

    // Build "name_null+age_null+... as null_count" and select it alongside the original columns.
    val sql_str_null = df.columns.map(x => x + "_null").mkString(" ", "+", " as null_count ")
    val sql_str_full = df.columns.mkString("select ", ",", " , " + sql_str_null + " from student")
    spark.sql(sql_str_full).show(false)
    Output:

    name  age   degree  Place      null_count
    Ram   null  MCA     Bangalore  1
    null  25    null    null       3
    null  26    BE      null       2
    Raju  21    Btech   Chennai    0
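    For reference, the same result can also be obtained without a temp view or SQL string building, by summing the when-expressions directly into a single column expression. A self-contained sketch of that variant (the local SparkSession setup and the column name `null_count` are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder.master("local[1]").appName("nullcount").getOrCreate()
import spark.implicits._

val df = Seq(
  ("Ram", null, "MCA", "Bangalore"),
  (null, "25", null, null),
  (null, "26", "BE", null),
  ("Raju", "21", "Btech", "Chennai")
).toDF("name", "age", "degree", "Place")

// One indicator expression per column (1 when null, else 0), summed into one expression.
val nullCount = df.columns
  .map(c => when(col(c).isNull, 1).otherwise(0))
  .reduce(_ + _)

df.withColumn("null_count", nullCount).show(false)
```

    Because `reduce(_ + _)` combines the indicators into one `Column`, this avoids materializing the intermediate `*_null` columns entirely.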

