Pandas Tutorial: JSON - Alibaba Cloud Developer Community

2024-08-15

Pandas makes it easy to work with JSON data. The first examples use a flat array of records:

demo.json

[
    {
        "name":"张三",
        "age":23,
        "gender":true
    },
    {
        "name":"李四",
        "age":24,
        "gender":true
    },
    {
        "name":"王五",
        "age":25,
        "gender":false
    }
]

Converting JSON to CSV

This takes two steps: read the JSON data with pd.read_json, then write it out as CSV with df.to_csv.

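import pandas as pd
json_path = 'data/demo.json'
# Load the JSON data
with open(json_path, 'r', encoding='utf8') as f:
    # read_json parses the JSON array of records into a DataFrame.
    # Pass the file object itself: recent pandas versions warn when given a raw JSON string.
    df = pd.read_json(f)
    print(df.to_string())  # to_string() renders the whole DataFrame as text, without truncation
    print('-' * 10)
    # Rename the columns (name, age, gender)
    df.columns = ['姓名', '年龄', '性别']
    print(df)
    # GB2312 is a legacy Chinese encoding; it lets tools that expect it display the headers correctly
    df.to_csv('data/result.csv', index=False, encoding='GB2312')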
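
The same conversion also works on a JSON string held in memory. The sketch below is an illustration and not part of the original example: it wraps the string in io.StringIO, since recent pandas versions warn when a literal JSON string is passed to read_json, and it collects the CSV as text instead of writing a file. The names json_text and csv_text, and the sample records, are made up for this example.

```python
import io

import pandas as pd

# Hypothetical in-memory JSON with the same shape as demo.json
json_text = (
    '[{"name": "Tom", "age": 23, "gender": true},'
    ' {"name": "Jenny", "age": 25, "gender": false}]'
)

# StringIO makes the string look like a file to read_json
df = pd.read_json(io.StringIO(json_text))

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

Keeping the CSV in memory like this is a convenient way to check the result before choosing an output path and encoding.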

Simple JSON

Reading JSON data from a URL:

import pandas as pd
URL = 'https://static.runoob.com/download/sites.json'
df = pd.read_json(URL)  # same call as for a local file
print(df)

Output:

     id    name             url  likes
0  A001    菜鸟教程  www.runoob.com     61
1  A002  Google  www.google.com    124
2  A003      淘宝  www.taobao.com     45

Converting a dictionary to a DataFrame

import pandas as pd
s = {
    "col1": {"row1": 1, "row2": 2, "row3": 3},
    "col2": {"row1": "x", "row2": "y", "row4": "z"}
}
df = pd.DataFrame(s)
print(df)
print('-' * 10)
new_df = df.dropna()  # data cleaning: drop every row that contains a missing value
print(new_df.to_string())
print('-' * 10)
df.fillna(99, inplace=True)  # fillna() replaces the missing values, here with 99
print(df.to_string())

Output: where a row label is missing from one of the columns, that cell is filled with NaN.

      col1 col2
row1   1.0    x
row2   2.0    y
row3   3.0  NaN
row4   NaN    z
----------
      col1 col2
row1   1.0    x
row2   2.0    y
----------
      col1 col2
row1   1.0    x
row2   2.0    y
row3   3.0   99
row4  99.0    z
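
A note on why col1 prints as 1.0 rather than 1: NaN is a floating-point value, so a numeric column with missing entries is stored as float64. The sketch below reuses the dictionary from the example and also shows that fillna accepts a per-column mapping, which avoids putting the number 99 into a text column. The fill values 0 and "unknown" are arbitrary choices for illustration.

```python
import pandas as pd

s = {
    "col1": {"row1": 1, "row2": 2, "row3": 3},
    "col2": {"row1": "x", "row2": "y", "row4": "z"},
}
df = pd.DataFrame(s)

# col1 holds a NaN for row4, so pandas stores the whole column as float64
print(df.dtypes)

# Fill each column with a value that matches its type; returns a new DataFrame
filled = df.fillna({"col1": 0, "col2": "unknown"})
print(filled)
```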

Nested JSON data

nested_list.json contains a nested list of records:

{
  "school_name": "ABC primary school",
  "class": "Year 1",
  "students": [
    {
      "id": "A001",
      "name": "Tom",
      "math": 60,
      "physics": 66,
      "chemistry": 61
    },
    {
      "id": "A002",
      "name": "James",
      "math": 89,
      "physics": 76,
      "chemistry": 51
    },
    {
      "id": "A003",
      "name": "Jenny",
      "math": 79,
      "physics": 90,
      "chemistry": 78
    }
  ]
}

Running the code

data = json.loads(f.read()) loads the data with Python's json module.

json_normalize() is called with record_path set to ['students'], which expands the nested students list into one row per student.

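import pandas as pd
import json
# Print the structure that read_json produces without flattening
with open('data/nested_list.json', 'r') as f:
    # Pass the file object itself: recent pandas versions warn when given a raw JSON string
    data = pd.read_json(f)
    print(data)
# Load the data with Python's json module
with open('data/nested_list.json', 'r') as f:
    data = json.loads(f.read())
# Flatten the data: json_normalize() expands the nested records under "students"
df_nested_list = pd.json_normalize(data, record_path=['students'])
print(df_nested_list)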

import pandas as pd
import json
data_path = 'data/nested_list.json'
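print(('-' * 10) + ' shown together with the parent-level JSON values')
# Load the data with Python's json module
with open(data_path, 'r') as f:
    data = json.loads(f.read())
# Flatten the data, carrying school_name and class onto every student row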
df_nested_list = pd.json_normalize(
    data,
    record_path=['students'],
    meta=['school_name', 'class']
)
print(df_nested_list)
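
For readers without the data file, the following sketch runs the same call on an inline dictionary. The dictionary is a shortened, two-student copy of nested_list.json made up for this illustration. It shows which columns record_path and meta produce: the record fields come first, followed by the meta fields repeated on every row.

```python
import pandas as pd

# Shortened stand-in for nested_list.json (illustrative data)
data = {
    "school_name": "ABC primary school",
    "class": "Year 1",
    "students": [
        {"id": "A001", "name": "Tom", "math": 60},
        {"id": "A002", "name": "James", "math": 89},
    ],
}

df = pd.json_normalize(
    data,
    record_path=["students"],        # one output row per element of this list
    meta=["school_name", "class"],   # top-level values copied onto each row
)
print(df.columns.tolist())
print(df)
```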

Complex JSON

This data mixes nested lists and dictionaries. The data file nested_mix.json is shown below.

nested_mix.json

{
    "school_name": "local primary school",
    "class": "Year 1",
    "info": {
      "president": "John Kasich",
      "address": "ABC road, London, UK",
      "contacts": {
        "email": "admin@e.com",
        "tel": "123456789"
      }
    },
    "students": [
    {
        "id": "A001",
        "name": "Tom",
        "math": 60,
        "physics": 66,
        "chemistry": 61
    },
    {
        "id": "A002",
        "name": "James",
        "math": 89,
        "physics": 76,
        "chemistry": 51
    },
    {
        "id": "A003",
        "name": "Jenny",
        "math": 79,
        "physics": 90,
        "chemistry": 78
    }]
}

import pandas as pd
import json
# Load the data with Python's json module
with open('data/nested_mix.json', 'r') as f:
    data = json.loads(f.read())
df = pd.json_normalize(
    data,
    record_path=['students'],
    meta=[
        'class',
        ['info', 'president'],  # a path, equivalent to info.president
        ['info', 'contacts', 'tel']
    ]
)
print(df)

Output:

     id   name  math  ...   class  info.president info.contacts.tel
0  A001    Tom    60  ...  Year 1     John Kasich         123456789
1  A002  James    89  ...  Year 1     John Kasich         123456789
2  A003  Jenny    79  ...  Year 1     John Kasich         123456789

[3 rows x 8 columns]
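
The dotted column names come from joining each meta path with the default separator ".". As a small sketch on a cut-down, illustrative copy of nested_mix.json, the sep parameter of json_normalize can be used to choose a different separator, which may be convenient when the column names are later used as identifiers.

```python
import pandas as pd

# Cut-down stand-in for nested_mix.json (illustrative data)
data = {
    "class": "Year 1",
    "info": {"president": "John Kasich", "contacts": {"tel": "123456789"}},
    "students": [
        {"id": "A001", "name": "Tom"},
        {"id": "A002", "name": "James"},
    ],
}

df = pd.json_normalize(
    data,
    record_path=["students"],
    meta=["class", ["info", "president"], ["info", "contacts", "tel"]],
    sep="_",  # join nested meta keys with "_" instead of the default "."
)
print(df.columns.tolist())
```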

Reading one set of values from nested data

nested_deep.json

{
    "school_name": "local primary school",
    "class": "Year 1",
    "students": [
    {
        "id": "A001",
        "name": "Tom",
        "grade": {
            "math": 60,
            "physics": 66,
            "chemistry": 61
        }
    },
    {
        "id": "A002",
        "name": "James",
        "grade": {
            "math": 89,
            "physics": 76,
            "chemistry": 51
        }
    },
    {
        "id": "A003",
        "name": "Jenny",
        "grade": {
            "math": 79,
            "physics": 90,
            "chemistry": 78
        }
    }]
}

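Here we use the glom module to handle the nesting. glom lets us reach values inside nested objects with a dotted path such as grade.math.

glom must be installed before first use: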

pip3 install glom -i https://pypi.tuna.tsinghua.edu.cn/simple

import pandas as pd
from glom import glom
df = pd.read_json('nested_deep.json')
data = df['students'].apply(lambda row: glom(row, 'grade.math'))
print(data)

Output:

0    60
1    89
2    79
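
If installing an extra package is not desirable, a similar result can be obtained with pandas alone. This is an alternative sketch, not the glom approach described here: json_normalize flattens the nested grade dictionary into dotted columns, and the math scores are then an ordinary column. The inline students list is a shortened, illustrative copy of the records in nested_deep.json.

```python
import pandas as pd

# Shortened stand-in for the "students" list in nested_deep.json (illustrative data)
students = [
    {"id": "A001", "name": "Tom", "grade": {"math": 60, "physics": 66, "chemistry": 61}},
    {"id": "A002", "name": "James", "grade": {"math": 89, "physics": 76, "chemistry": 51}},
]

# Nested dictionaries become columns named "<outer>.<inner>"
flat = pd.json_normalize(students)
print(flat.columns.tolist())
print(flat["grade.math"])
```

glom remains handy when only a single deep value is needed from each record; flattening is more convenient when several nested fields are needed as columns.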
