The Data Analysis Trio [AIoT Stage 1 (Part 2)] (A 100,000-Word Step-by-Step Guide): Pandas, Advanced pandas, Practice Ground (2) (Part 12)

2022-09-01

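Summary: Hello, and thank you for opening this post. Please don't leave just yet: if you have some Python background and want to learn the three main Python data analysis libraries (numpy, pandas, matplotlib), this article will not disappoint you. This post is part of [AIoT Stage 1 (Part 2)]: Python data analysis.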

2.4.3 Converting the fitness test scoring table

First, download an Excel file:

Link: https://pan.baidu.com/s/1wxeENf0tjx5bWxTGxkZGeg?pwd=szmk

Extraction code: szmk

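Once the download finishes, put the file in the same folder as our code. We have covered this step several times in earlier posts, so we will not demonstrate it again here.

Let's first load the data the usual way and take a look: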

import numpy as np
import pandas as pd
score = pd.read_excel('./体侧成绩评分表.xls')
score

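Doesn't it look awkward? That is because each cell in the top row of our Excel file spans two columns, so only one column of each pair gets a name. The Unnamed: 1, Unnamed: 3 labels in the output are just placeholders that pandas fills in.

To deal with this, we can load the file as follows: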

import numpy as np
import pandas as pd
# header: tell pandas to use the first and second rows as the column index
score = pd.read_excel('./体侧成绩评分表.xls',header = [0, 1])
score
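
To see what header = [0, 1] does without the Excel file at hand, here is a minimal sketch that uses pd.read_csv on in-memory text as a stand-in for pd.read_excel. The column names (run_1000m, run_50m, result, score) and the numbers are made up for illustration and are not taken from the scoring table.

```python
import io
import pandas as pd

# Hypothetical two-row header: event names on top, result/score underneath
text = (
    "run_1000m,run_1000m,run_50m,run_50m\n"
    "result,score,result,score\n"
    "3.17,100,6.7,100\n"
    "3.22,95,6.8,95\n"
)

# header=[0, 1] tells pandas to build the column index from the first two rows
demo = pd.read_csv(io.StringIO(text), header=[0, 1])

# Each column is now identified by a (top, bottom) pair
print(demo.columns.tolist())
# Selecting a top-level name returns the sub-table for that event
print(demo['run_1000m'])
# A (top, bottom) tuple selects a single column
print(demo[('run_50m', 'score')].tolist())
```

This is why the later code can write score[col]['成绩'] and score[col]['分数']: the first lookup picks the event, the second picks the column underneath it.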

Let's look at the data:

The times are all written in the form xx'xx" (minutes and seconds). We now want to convert them to numbers:

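def convert(x):
    if isinstance(x, str):
        # Remove the " character, then replace ' with a decimal point
        return float(x.replace('"', '').replace("'", '.'))
    # Values that are not strings are returned unchanged
    return x
x = '''3'30"'''
convert(x)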
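
Note that the float produced here is an encoding rather than decimal minutes: 3'30" becomes 3.3, which still compares correctly against other times as long as the seconds are always written with two digits. If true durations are ever needed, one alternative is to convert to total seconds. The sketch below is a hypothetical helper (to_seconds is not part of the original code) and assumes every string has the form minutes'seconds".

```python
def to_seconds(x):
    """Convert a string such as 3'30" to total seconds; pass other values through."""
    if isinstance(x, str):
        # Drop the trailing ", then split minutes and seconds on '
        minutes, seconds = x.replace('"', '').split("'")
        return int(minutes) * 60 + int(seconds)
    return x

print(to_seconds('''3'30"'''))  # 210
print(to_seconds('''4'05"'''))  # 245
print(to_seconds(0))            # 0
```

The rest of this post keeps the original float encoding, since the scoring table and the students' results use the same convention.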

We apply this function to the boys' scoring standard:

def convert(x):
    if isinstance(x, str):
        return float(x.replace('"', '').replace("'", '.'))
    # Values that are not strings are returned unchanged
    return x
# Convert the boys' scoring standard
score.iloc[:, -4] = score.iloc[:, -4].apply(convert)
score

In the same way, we convert the girls' standard:

score = pd.read_excel('./体侧成绩评分表.xls',header=[0, 1])
def convert(x):
    if isinstance(x, str):
        return float(x.replace('"', '').replace("'", '.'))
    # Values that are not strings are returned unchanged
    return x
# Convert the boys' scoring standard
score.iloc[:, -4] = score.iloc[:, -4].apply(convert)
# Convert the girls' scoring standard
score.iloc[:, -2] = score.iloc[: , -2].apply(convert)
score

Finally, we save the file:

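score = pd.read_excel('./体侧成绩评分表.xls',header=[0, 1])
def convert(x):
    if isinstance(x, str):
        return float(x.replace('"', '').replace("'", '.'))
    # Values that are not strings are returned unchanged
    return x
# Convert the boys' scoring standard
score.iloc[:, -4] = score.iloc[:, -4].apply(convert)
# Convert the girls' scoring standard
score.iloc[:, -2] = score.iloc[: , -2].apply(convert)
# The two-level column index is written as two header rows by default;
# the row index is written as the first column
score.to_excel('./体侧成绩评分表_处理.xlsx')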

Now let's take another look at our data:

pd.read_excel('./体侧成绩评分表_处理.xlsx', header = [0, 1])

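The first column now contains something odd: it is the row index that to_excel wrote into the file. Passing index_col = 0 tells pandas to read it back as the row index instead of as data:

# index_col = 0: use the first column as the row index
pd.read_excel('./体侧成绩评分表_处理.xlsx', header = [0, 1], index_col = 0)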
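
The extra first column appears because the row index is written to the file on save and then read back as ordinary data. The round trip below reproduces this with to_csv and read_csv on in-memory text, used here only as a stand-in for the Excel calls; the small table is invented for illustration.

```python
import io
import pandas as pd

demo = pd.DataFrame({'result': [3.17, 3.22], 'score': [100, 95]})

# With no path, to_csv returns the text; the row index is written as the first column
text = demo.to_csv()

# Read back naively: the saved index shows up as a data column named 'Unnamed: 0'
plain = pd.read_csv(io.StringIO(text))
print(plain.columns.tolist())

# index_col=0 uses that first column as the row index instead
clean = pd.read_csv(io.StringIO(text), index_col=0)
print(clean.columns.tolist())
```

An alternative is to pass index = False when saving, but pandas does not support that for Excel output when the columns are a MultiIndex, which is the case for our scoring table.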

2.4.4 Converting boys' fitness test results to scores

2.4.4.1 Converting boys' 1000 m run results to scores

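Note: the code below may take anywhere from ten-odd seconds to a few minutes to run, and that is normal. While it is running, the notebook marks the cell as busy; just wait patiently for the result.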

%%time
import pandas as pd
# Load the cleaned boys' fitness test results
df_boy = pd.read_excel('./男生体测成绩.xlsx')
# Load the scoring table
score = pd.read_excel('./体侧成绩评分表_处理.xlsx', header = [0,1],index_col = 0)
# Define the conversion function
def convert(x):
    if x == 0: # did not take the test, so the score is 0
        return 0
    for i in range(20): # results are divided into 20 grades
        if x <= score['男1000米跑']['成绩'][i]:
            return score['男1000米跑']['分数'][i]
    return 0 # slower than the lowest grade, so the score is 0
df_boy['男1000米跑'].apply(convert)
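
apply calls convert once per row, and each call walks the scoring table with chained indexing, which is why the cell can take a while. If the result thresholds are sorted in ascending order (an assumption about the scoring table, not something verified here), the same rule can be evaluated for all rows at once with np.searchsorted. The function name score_times and the sample thresholds below are hypothetical.

```python
import numpy as np

def score_times(times, thresholds, points):
    """Vectorized 'lower is better' scoring.

    thresholds must be sorted ascending; points[i] is awarded when a time is
    less than or equal to thresholds[i] (first match wins).
    """
    times = np.asarray(times, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # side='left' finds the first index i with thresholds[i] >= time
    idx = np.searchsorted(thresholds, times, side='left')
    # Times slower than every threshold land on the extra trailing 0
    padded = np.append(np.asarray(points), 0)
    result = padded[idx]
    # A time of 0 means the test was not taken
    result[times == 0] = 0
    return result

# Made-up thresholds in the same minutes.seconds encoding as above
thresholds = [3.17, 3.22, 3.27]
points = [100, 95, 90]
print(score_times([0, 3.10, 3.17, 3.20, 3.30], thresholds, points).tolist())
```

With the real data, thresholds and points would presumably come from score['男1000米跑']['成绩'] and score['男1000米跑']['分数'], and times from df_boy['男1000米跑'].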

2.4.4.2 Converting boys' 1000 m run results to scores and assigning them to a new column

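%%time
import pandas as pd
# Load the cleaned boys' fitness test results
df_boy = pd.read_excel('./男生体测成绩.xlsx')
# Load the scoring table (index_col = 0 keeps the saved row index out of the data)
score = pd.read_excel('./体侧成绩评分表_处理.xlsx', header = [0, 1], index_col = 0)
# Define the conversion function
def convert(x):
    if x == 0: # did not take the test, so the score is 0
        return 0
    for i in range(20): # results are divided into 20 grades
        if x <= score['男1000米跑']['成绩'][i]:
            return score['男1000米跑']['分数'][i]
    return 0 # slower than the lowest grade, so the score is 0
df_boy['男1000米跑' + '分数'] = df_boy['男1000米跑'].apply(convert)
df_boy.head(10)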

2.4.4.3 Batch-converting boys' speed event results to scores

%%time
import numpy as np
import pandas as pd
df_boy = pd.read_excel('./男生体测成绩.xlsx')
score = pd.read_excel('./体侧成绩评分表_处理.xlsx', header = [0,1], index_col = 0)
cols = ['男1000米跑', '男50米跑']
def convert(x, col):
    if x == 0: # did not take the test, so the score is 0
        return 0
    for i in range(20): # results are divided into 20 grades
        if x <= score[col]['成绩'][i]:
            return score[col]['分数'][i]
    return 0 # slower than the lowest grade, so the score is 0
for col in cols:
    # args: extra arguments passed to convert, given as the tuple (col,)
    s = df_boy[col].apply(convert, args = (col,))
    columns = df_boy.columns.to_list()
    index = columns.index(col) + 1 # the position right after this column
    # Insert a new score column right after this column
    df_boy.insert(loc = index, column = col + '分数', value = s)
df_boy.head()
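
The two pandas calls doing the work in this loop are Series.apply with args = (col,), which forwards extra positional arguments to the function, and DataFrame.insert, which places a new column at a given position. A small sketch with an invented table and a hypothetical passed function:

```python
import pandas as pd

demo = pd.DataFrame({'name': ['a', 'b'], 'run': [3.2, 3.6], 'jump': [250, 230]})

# Hypothetical per-column limits, looked up by column name
limits = {'run': 3.5, 'jump': 240}

def passed(x, col):
    # x is one value from the column; col arrives through args=(col,)
    return x <= limits[col]

s = demo['run'].apply(passed, args=('run',))

# Insert the new column directly after 'run'
position = demo.columns.to_list().index('run') + 1
demo.insert(loc=position, column='run_passed', value=s)

print(demo.columns.to_list())
print(demo['run_passed'].tolist())
```

Computing the position from the current column list on every iteration matters, because each insert shifts the columns that follow it.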

2.4.4.4 Batch-converting boys' strength event results to scores and saving

%%time
# convert is user-defined; any name works
def convert(x, col):
    for i in range(20): # results are divided into 20 grades
        if x >= score[col]['成绩'][i]:
            return score[col]['分数'][i]
    return 0 # below the lowest grade, so the score is 0
cols = ['男跳远', '男体前屈', '男引体', '男肺活量']
for col in cols:
    # apply with args = (col,) passes the column name to convert
    s = df_boy[col].apply(convert, args = (col,))
    columns = df_boy.columns.to_list()
    # Insert the score column right after this column
    index = columns.index(col) + 1
    df_boy.insert(loc = index, column = col + '分数', value = s)
display(df_boy.head())
df_boy.to_excel('./男生体测成绩-分数.xlsx', index = False)
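
For these events a larger value is better, so the comparison flips from <= to >= and the thresholds are walked from the best grade down. The rule in isolation, as a plain-Python sketch with made-up thresholds (the real ones live in the scoring table, and score_higher_is_better is a hypothetical name):

```python
def score_higher_is_better(value, thresholds, points):
    """Return the points of the first threshold that value reaches, else 0.

    thresholds must be sorted from best (largest) to worst (smallest).
    """
    for threshold, point in zip(thresholds, points):
        if value >= threshold:
            return point
    return 0

# Hypothetical long-jump thresholds in centimetres
thresholds = [273, 268, 263]
points = [100, 95, 90]

print(score_higher_is_better(280, thresholds, points))  # 100
print(score_higher_is_better(265, thresholds, points))  # 90
print(score_higher_is_better(200, thresholds, points))  # 0
```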

2.4.5 Converting girls' fitness test results to scores

%%time
import numpy as np
import pandas as pd
df_girl = pd.read_excel('./女生体测成绩.xlsx')
score = pd.read_excel('./体侧成绩评分表_处理.xlsx', header = [0,1],index_col = 0)
# Batch-convert the speed events
cols = ['女800米跑', '女50米跑']
def convert(x, col):
    if x == 0: # did not take the test, so the score is 0
        return 0
    for i in range(20): # results are divided into 20 grades
        if x <= score[col]['成绩'][i]:
            return score[col]['分数'][i]
    return 0 # slower than the lowest grade, so the score is 0
for col in cols:
    s = df_girl[col].apply(convert, args = (col,))
    columns = df_girl.columns.to_list()
    index = columns.index(col) + 1
    df_girl.insert(loc = index, column = col + '分数', value = s)
# Batch-convert the strength events
cols = ['女跳远', '女体前屈', '女仰卧', '女肺活量']
def convert(x, col):
    for i in range(20): # results are divided into 20 grades
        if x >= score[col]['成绩'][i]:
            return score[col]['分数'][i]
    return 0 # below the lowest grade, so the score is 0
for col in cols:
    s = df_girl[col].apply(convert, args = (col,))
    columns = df_girl.columns.to_list()
    index = columns.index(col) + 1
    df_girl.insert(loc = index, column = col + '分数', value = s)
df_girl.to_excel('./女生体测成绩-分数.xlsx', index = False)
df_girl.head(10)

2.4.6 Saving the scores

Finally, we combine the boys' and girls' scores into a single file, one sheet each:

with pd.ExcelWriter('./分数汇总.xlsx') as writer:
    df_boy.to_excel(writer, sheet_name = '男生', index = False)
    df_girl.to_excel(writer, sheet_name = '女生', index = False)
