
How to read nested JSON in Spark Scala?

Here is my nested JSON file.

{
  "dc_id": "dc-101",
  "source": {
    "sensor-igauge": {
      "id": 10,
      "ip": "68.28.91.22",
      "description": "Sensor attached to the container ceilings",
      "temp": 35,
      "c02_level": 1475,
      "geo": {"lat": 38.00, "long": 97.00}
    },
    "sensor-ipad": {
      "id": 13,
      "ip": "67.185.72.1",
      "description": "Sensor ipad attached to carbon cylinders",
      "temp": 34,
      "c02_level": 1370,
      "geo": {"lat": 47.41, "long": -122.00}
    },
    "sensor-inest": {
      "id": 8,
      "ip": "208.109.163.218",
      "description": "Sensor attached to the factory ceilings",
      "temp": 40,
      "c02_level": 1346,
      "geo": {"lat": 33.61, "long": -111.89}
    },
    "sensor-istick": {
      "id": 5,
      "ip": "204.116.105.67",
      "description": "Sensor embedded in exhaust pipes in the ceilings",
      "temp": 40,
      "c02_level": 1574,
      "geo": {"lat": 35.93, "long": -85.46}
    }
  }
}
How can I read this JSON file into a DataFrame using Spark Scala? The JSON file contains no array objects, so I can't use explode.

社区小助手 2018-12-21 13:46:23
1 Answer
  • 社区小助手 is an administrator of the Spark China community. I regularly post live-stream recaps and other useful articles, and I also compile the Spark questions and answers raised in the DingTalk group.

    import org.apache.spark.sql.functions.{array, col, explode}

    // "multiline" is required because the JSON record spans multiple lines
    val df = spark.read.option("multiline", true).json("data/test.json")

    df
      // wrap the four sensor structs in an array so explode can turn them into rows
      .select(col("dc_id"), explode(array("source.*")) as "level1")
      .withColumn("id", col("level1.id"))
      .withColumn("ip", col("level1.ip"))
      .withColumn("temp", col("level1.temp"))
      .withColumn("description", col("level1.description"))
      .withColumn("c02_level", col("level1.c02_level"))
      .withColumn("lat", col("level1.geo.lat"))
      .withColumn("long", col("level1.geo.long"))
      .drop("level1")
      .show(false)
    Sample output:

    +------+--+---------------+----+------------------------------------------------+---------+-----+-------+
    |dc_id |id|ip             |temp|description                                     |c02_level|lat  |long   |
    +------+--+---------------+----+------------------------------------------------+---------+-----+-------+
    |dc-101|10|68.28.91.22    |35  |Sensor attached to the container ceilings       |1475     |38.0 |97.0   |
    |dc-101|8 |208.109.163.218|40  |Sensor attached to the factory ceilings         |1346     |33.61|-111.89|
    |dc-101|13|67.185.72.1    |34  |Sensor ipad attached to carbon cylinders        |1370     |47.41|-122.0 |
    |dc-101|5 |204.116.105.67 |40  |Sensor embedded in exhaust pipes in the ceilings|1574     |35.93|-85.46 |
    +------+--+---------------+----+------------------------------------------------+---------+-----+-------+
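
    If it helps to see why explode(array(...)) is used here: Spark infers each JSON object as a struct, so source comes back as a single struct with one field per sensor rather than as an array. A quick check on the same df as above:

    // The inferred schema shows "source" as a struct whose fields are
    // sensor-igauge, sensor-inest, sensor-ipad and sensor-istick (each itself a
    // struct). There is no array column anywhere, which is why the struct fields
    // are wrapped with array("source.*") before calling explode.
    df.printSchema()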

    Instead of selecting each column one by one, you could also try writing a generic UDF that pulls out all of the individual columns; a rough sketch of one generic alternative is shown below.
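
    For example, here is a minimal sketch of one such generic alternative (not a UDF, but it avoids listing every field by hand): expand all fields of the exploded struct with level1.*, then flatten the nested geo struct. The variable name flat is just for illustration.

    import org.apache.spark.sql.functions.{array, col, explode}

    // Explode the four sensor structs into one row per sensor, as in the answer above.
    val flat = df
      .select(col("dc_id"), explode(array("source.*")) as "level1")
      // level1.* expands to every struct field (c02_level, description, geo, id, ip, temp)
      // without naming them one by one.
      .select(col("dc_id"), col("level1.*"))
      // Flatten the remaining nested geo struct, then drop it.
      .withColumn("lat", col("geo.lat"))
      .withColumn("long", col("geo.long"))
      .drop("geo")

    flat.show(false)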

    Note: tested with Spark 2.3.

    2019-07-17 23:23:24