Pandas_Dataframe
1、按元素获取数组值的幂
编写一个 Pandas 程序,以从元素上获取数组值的幂。
- 注意:第一个数组元素从第二个数组提升为幂
- 示例数据:
{'X':[78,85,96,80,86], 'Y':[84,94,89,83,86], 'Z':[86,97,96,72,83]}
- 预期输出:
X Y Z 0 78 84 86 1 85 94 97 2 96 89 96 3 80 83 72 4 86 86 83
解答:
import pandas as pd df = pd.DataFrame({'X':[78,85,96,80,86], 'Y':[84,94,89,83,86], 'Z':[86,97,96,72,83]}); print(df)
2、从具有索引标签的字典数据创建数据帧
编写一个 Pandas 程序,以从具有索引标签的指定字典数据创建和显示 DataFrame。
示例数据帧:
exam_data =
{‘name’: [‘Anastasia’, ‘Dima’, ‘Katherine’, ‘James’, ‘Emily’, ‘Michael’, ‘Matthew’, ‘Laura’, ‘Kevin’, ‘Jonas’],
‘score’: [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
‘trys’: [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
‘qualify’: [‘yes’, ‘no’, ‘no’, ‘yes’, ‘yes’, ‘no’, ‘yes’, ‘no’, ‘yes’]}
labels = [‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’]
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) print(df)
3、显示有关指定数据帧及其数据的基本信息摘要
编写 Pandas 程序以显示有关指定数据帧及其数据的基本信息的摘要。转到编辑器
示例Python字典数据和列表标签:
exam_data = {‘name’: [‘Anastasia’, ‘Dima’, ‘Katherine’, ‘James’, ‘Emily’, ‘Michael’, ‘Matthew’, ‘Laura’, ‘Kevin’, ‘Jonas’],
‘score’: [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
‘trys’: [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
‘qualify’: [‘yes’, ‘no’, ‘no’, ‘no’, ‘yes’, ‘no’, ‘yes’, ‘no’, ‘no’, ‘’ ‘yes’]}
标签 = [‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’]
预期输出:
有关此 DataFrame 及其数据的基本信息摘要:
<类 ‘pandas.core.frame.DataFrame’>
索引: 10 个条目, a to j
数据列 (总共 4 列):
…dtypes: float64(1), int64(1), object(2)
内存使用情况: 400.0+ 字节
无
4、获取给定数据帧的前 3 行
- 示例Python字典数据和列表标签:
exam_data = {‘name’: [‘Anastasia’, ‘Dima’, ‘Katherine’, ‘James’, ‘Emily’, ‘Michael’, ‘Matthew’, ‘Laura’, ‘Kevin’, ‘Jonas’],
‘score’: [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
‘trys’: [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
‘qualify’: [‘yes’, ‘no’, ‘no’, ‘no’, ‘yes’, ‘no’, ‘yes’, ‘no’, ‘no’, ‘’ ‘yes’]}
标签 = [‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’]
预期输出:
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) print(df) print("\n") print(df.iloc[:3]) print("\n") print(df.head(3)) print("\n") print(df.iloc[0:1,1:3])
5、从给定的数据帧中选择两个指定的列
编写一个 Pandas 程序,从以下数据帧中选择“名称”和“评分”列。
- 示例Python字典数据和列表标签:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'trys': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', '' 'yes']} 标签 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
- 预期输出:
a Anastasia 12.5 b Dima 9.0 c Katherine 16.5 ...h Laura NaN i Kevin 8.0 j Jonas 19.0
解答:
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] print(df) print("\n") df = pd.DataFrame(exam_data , index=labels) print(df[['name', 'score']])
6、从给定的数据帧中选择指定的列和行
编写一个 Pandas 程序,从给定的数据帧中选择指定的列和行。转到编辑器
- 示例 Python 字典数据并列出标签:
从以下数据框中选择第 1、3、5、6 行中的“name”和“score”列。
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'trys': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
- 预期输出:
b 9.0 no d NaN no f 20.0 yes g 14.5 yes
解答:
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) print(df) print("\n") print(df.iloc[[1, 3, 5, 6], [1, 3]])
7、选择检查中尝试次数大于 2 的行
编写 Pandas 程序,选择考试中尝试次数大于 2 的行。
- 示例Python字典数据和列表标签:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'trys': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', '' 'yes']} 标签 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
- 预期输出:
考试中的尝试次数大于 2,且名称分数尝试次数合格。
b Dima 9.0 3 no d James NaN 3 no f Michael 20.0 3 是
解答:
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts' : [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) print("Number of attempts in the examination is greater than 2:") print(df[df['attempts'] > 2])
关键:df[df['attempts'] > 2]
:df['attempts'] > 2
判断每行是否满足该条件,若满足,则是True,我的理解是布尔索引。
8、计算数据帧的行数和列数
编写一个 Pandas 程序来计算 DataFrame 的行数和列数。转到编辑器
- 示例Python字典数据和列表标签:
exam_data = {‘name’: [‘Anastasia’, ‘Dima’, ‘Katherine’, ‘James’, ‘Emily’, ‘Michael’, ‘Matthew’, ‘Laura’, ‘Kevin’, ‘Jonas’],
‘score’: [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
‘trys’: [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
‘qualify’: [‘yes’, ‘no’, ‘no’, ‘no’, ‘yes’, ‘no’, ‘yes’, ‘no’, ‘no’, ‘’ ‘yes’]}
标签 = [‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’]
预期输出:
行数: 10
列数: 4
关键:
【1】计算数据帧的行数:len(df.axes[0])
【2】计算数据帧的列数:len(df.axes[1])
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) total_rows=len(df.axes[0]) total_cols=len(df.axes[1]) print(df) print("\n") print("Number of Rows: "+str(total_rows)) print("Number of Columns: "+str(total_cols))
9、打印数据缺失的行
打印分数列中含有缺失数值的行
关键点:df[df['score'].isnull()]
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) print("Rows where score is missing:") print(df[df['score'].isnull()])
10、打印行,数据在15到20之间
打印分数列中,数据在15到20之间的(含)
关键:df[df['score'].between(15, 20)]
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) print(df) print("\n") print(df[df['score'].between(15, 20)])
11、打印某一列的平均值
打印分数列的平均值
关键:df['score'].mean()
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) print(df) print("\n") print(df['score'].mean())
注:如果是求总和的话就把mean()
改成sum()
即可
12、打印数据帧中多条件的行
编写一个 Pandas 程序,以选择考试中尝试次数小于 2 且分数大于 15 的行。
关键:df[(df[‘attempts’] < 2) & (df[‘score’] > 10)]
注:多条件则必须要在条件外添加括号
import pandas as pd import numpy as np exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'], 'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19], 'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1], 'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']} labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] df = pd.DataFrame(exam_data , index=labels) print("attempts列小于2并且score列大于10的行:") print(df[(df['attempts'] < 2) & (df['score'] > 10)])