
How to read nested JSON in Spark Scala?

Here is my nested JSON file.

{
  "dc_id": "dc-101",
  "source": {
    "sensor-igauge": {
      "id": 10,
      "ip": "68.28.91.22",
      "description": "Sensor attached to the container ceilings",
      "temp": 35,
      "c02_level": 1475,
      "geo": {"lat": 38.00, "long": 97.00}
    },
    "sensor-ipad": {
      "id": 13,
      "ip": "67.185.72.1",
      "description": "Sensor ipad attached to carbon cylinders",
      "temp": 34,
      "c02_level": 1370,
      "geo": {"lat": 47.41, "long": -122.00}
    },
    "sensor-inest": {
      "id": 8,
      "ip": "208.109.163.218",
      "description": "Sensor attached to the factory ceilings",
      "temp": 40,
      "c02_level": 1346,
      "geo": {"lat": 33.61, "long": -111.89}
    },
    "sensor-istick": {
      "id": 5,
      "ip": "204.116.105.67",
      "description": "Sensor embedded in exhaust pipes in the ceilings",
      "temp": 40,
      "c02_level": 1574,
      "geo": {"lat": 35.93, "long": -85.46}
    }
  }
}
How can I read this JSON file into a DataFrame using Spark Scala? The JSON file contains no array objects, so I can't use explode.

社区小助手 2018-12-21 13:46:23
1 Answer
  • 社区小助手 is an administrator of the Spark China community. I regularly post live-stream recaps and other useful articles, and I also compile the Spark questions and answers raised in the DingTalk group.

    import org.apache.spark.sql.functions.{array, col, explode}

    // "multiline" is required because the JSON record spans multiple lines
    val df = spark.read.option("multiline", true).json("data/test.json")

    df
      // wrap the four sensor structs in an array so explode can turn them into rows
      .select(col("dc_id"), explode(array("source.*")) as "level1")
      .withColumn("id", col("level1.id"))
      .withColumn("ip", col("level1.ip"))
      .withColumn("temp", col("level1.temp"))
      .withColumn("description", col("level1.description"))
      .withColumn("c02_level", col("level1.c02_level"))
      .withColumn("lat", col("level1.geo.lat"))
      .withColumn("long", col("level1.geo.long"))
      .drop("level1")
      .show(false)
    Sample output:

    +------+--+---------------+----+------------------------------------------------+---------+-----+-------+
    |dc_id |id|ip             |temp|description                                     |c02_level|lat  |long   |
    +------+--+---------------+----+------------------------------------------------+---------+-----+-------+
    |dc-101|10|68.28.91.22    |35  |Sensor attached to the container ceilings       |1475     |38.0 |97.0   |
    |dc-101|8 |208.109.163.218|40  |Sensor attached to the factory ceilings         |1346     |33.61|-111.89|
    |dc-101|13|67.185.72.1    |34  |Sensor ipad attached to carbon cylinders        |1370     |47.41|-122.0 |
    |dc-101|5 |204.116.105.67 |40  |Sensor embedded in exhaust pipes in the ceilings|1574     |35.93|-85.46 |
    +------+--+---------------+----+------------------------------------------------+---------+-----+-------+
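
    If it helps to see why explode(array(...)) is used here: Spark infers each JSON object as a struct, so source comes back as a single struct with one field per sensor rather than as an array. A quick check on the same df as above:

    // The inferred schema shows "source" as a struct whose fields are
    // sensor-igauge, sensor-inest, sensor-ipad and sensor-istick (each itself a
    // struct). There is no array column anywhere, which is why the struct fields
    // are wrapped with array("source.*") before calling explode.
    df.printSchema()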

    Instead of selecting each column one by one, you could also try writing a generic UDF that pulls out all of the individual columns; a rough sketch of one generic alternative is shown below.
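
    For example, here is a minimal sketch of one such generic alternative (not a UDF, but it avoids listing every field by hand): expand all fields of the exploded struct with level1.*, then flatten the nested geo struct. The variable name flat is just for illustration.

    import org.apache.spark.sql.functions.{array, col, explode}

    // Explode the four sensor structs into one row per sensor, as in the answer above.
    val flat = df
      .select(col("dc_id"), explode(array("source.*")) as "level1")
      // level1.* expands to every struct field (c02_level, description, geo, id, ip, temp)
      // without naming them one by one.
      .select(col("dc_id"), col("level1.*"))
      // Flatten the remaining nested geo struct, then drop it.
      .withColumn("lat", col("geo.lat"))
      .withColumn("long", col("geo.long"))
      .drop("geo")

    flat.show(false)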

    Note: tested with Spark 2.3.

    2019-07-17 23:23:24