
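How do I count the number of missing values in each row of a DataFrame in Spark Scala?

I want to count the number of missing values in each row of a DataFrame in Spark Scala.

Code: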

val samplesqlDF = spark.sql("SELECT * FROM sampletable")

samplesqlDF.show()
Input DataFrame:

| name | age | degree | Place     |
|------|-----|--------|-----------|
| Ram  |     | MCA    | Bangalore |
|      | 25  |        |           |
|      | 26  | BE     |           |
| Raju | 21  | Btech  | Chennai   |

The output DataFrame, with a per-row count of missing values, should look like this:

| name | age | degree | Place     | rowcount |
|------|-----|--------|-----------|----------|
| Ram  |     | MCA    | Bangalore | 1        |
|      | 25  |        |           | 3        |
|      | 26  | BE     |           | 2        |
| Raju | 21  | Btech  | Chennai   | 0        |

Asked by 社区小助手 on 2018-12-12 14:06:02
1 answer
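  • 社区小助手 is an administrator of the Spark China community who regularly posts livestream recaps and articles, and compiles Spark questions and answers raised in the DingTalk group.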

    It looks like you want to compute the null count dynamically, without hard-coding the column names. Take a look at this:

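    import org.apache.spark.sql.functions.{col, when}
    import spark.implicits._  // required for toDF

    val df = Seq(("Ram",null,"MCA","Bangalore"),(null,"25",null,null),(null,"26","BE",null),("Raju","21","Btech","Chennai")).toDF("name","age","degree","Place")
    df.show(false)
    // Add a 0/1 indicator column "<column>_null" for every original column
    val df2 = df.columns.foldLeft(df)( (acc,c) => acc.withColumn(c+"_null", when(col(c).isNull,1).otherwise(0) ) )
    df2.createOrReplaceTempView("student")
    // Builds " name_null+age_null+degree_null+Place_null as null_count "
    val sql_str_null = df.columns.map( x => x+"_null").mkString(" ","+"," as null_count ")
    // Selects the original columns plus the summed indicators
    val sql_str_full = df.columns.mkString( "select ", ",", " , " + sql_str_null + " from student")
    spark.sql(sql_str_full).show(false)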
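    Output:

    | name | age  | degree | Place     | null_count |
    |------|------|--------|-----------|------------|
    | Ram  | null | MCA    | Bangalore | 1          |
    | null | 25   | null   | null      | 3          |
    | null | 26   | BE     | null      | 2          |
    | Raju | 21   | Btech  | Chennai   | 0          |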
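
The same count can probably be computed without a temp view or a hand-built SQL string, by summing the per-column indicators directly as a Column expression. The following is a minimal sketch of that alternative, not the answer's method. The names `NullCountSketch` and `withNullCount` are hypothetical, a local SparkSession is assumed, and "missing" is assumed to mean null (empty strings are not counted).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper object, for illustration only.
object NullCountSketch {
  // Appends a "null_count" column holding the number of null cells per row.
  // Assumes the DataFrame has at least one column (reduce fails on empty input).
  def withNullCount(df: DataFrame): DataFrame = {
    val nullCount = df.columns
      .map(c => when(col(c).isNull, 1).otherwise(0)) // 1 if the cell is null, else 0
      .reduce(_ + _)                                 // sum the indicators across columns
    df.withColumn("null_count", nullCount)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration purposes.
    val spark = SparkSession.builder().master("local[*]").appName("null-count-sketch").getOrCreate()
    import spark.implicits._

    // Same sample data as in the answer above.
    val df = Seq[(String, String, String, String)](
      ("Ram", null, "MCA", "Bangalore"),
      (null, "25", null, null),
      (null, "26", "BE", null),
      ("Raju", "21", "Btech", "Chennai")
    ).toDF("name", "age", "degree", "Place")

    withNullCount(df).show(false)
    spark.stop()
  }
}
```

This keeps the original columns untouched and adds no intermediate `_null` columns. If blank strings should also count as missing, the condition could be extended to `col(c).isNull || col(c) === ""`, which is again an assumption about the data rather than something the question states.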
    2019-07-17 23:20:10