Comprehensive Python Data Analysis and Visualization Examples


2023-08-14



This article walks through a set of worked examples of data analysis and visualization in Python.

The given data is as follows:

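import numpy as np
import pandas as pd

sex = ["男", "女"]  # male, female
# 31 students, each with a random sex and a random integer score per subject
df1 = pd.DataFrame({
    "names": ["student" + str(i) for i in range(1, 32)],
    "sex": [sex[np.random.randint(2)] for i in range(31)],
    "python": np.random.randint(60, 101, 31),  # 60..100
    "spark": np.random.randint(60, 90, 31),    # 60..89
    "linux": np.random.randint(60, 98, 31)     # 60..97
})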

Line chart

1. Use a line chart to show the top 5 scores in python, spark, and linux.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
# Same shape and score ranges as the given data: 31 students
df1 = pd.DataFrame({
    'names': [f'student{i}' for i in range(1, 32)],
    'sex': np.random.choice(['男', '女'], 31),
    'python': np.random.randint(60, 101, 31),
    'spark': np.random.randint(60, 90, 31),
    'linux': np.random.randint(60, 98, 31),
})
print(df1)
# Top 5 scores in each of python, spark, and linux
score1 = df1['python'].sort_values(ascending=False).head(5)
score2 = df1['spark'].sort_values(ascending=False).head(5)
score3 = df1['linux'].sort_values(ascending=False).head(5)
nums = [1, 2, 3, 4, 5]
labels = [f'第{i}名' for i in range(1, 6)]  # rank labels: 1st place .. 5th place
# One line per subject
plt.plot(nums, score1, color='red', marker='o', linestyle=':')
plt.plot(nums, score2, color='blue', marker='p', linestyle=':')
plt.plot(nums, score3, color='green', marker='x', linestyle=':')
plt.xticks(nums, labels)  # x-axis ticks and their labels
plt.ylabel('成绩')  # y-axis label: score
plt.xlabel('名次')  # x-axis label: rank
plt.title('学生成绩分析')  # title: student score analysis
plt.legend(['python', 'spark', 'linux'])  # legend
plt.show()
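
As a complement to the sorting step above, the sketch below collects the top 5 scores of every subject into one table with `Series.nlargest`. The helper name `top_scores` and the fixed random seed are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np
import pandas as pd


def top_scores(df, subjects, n=5):
    """Return a DataFrame whose rows are ranks 1..n and whose columns are subjects."""
    ranked = {
        # nlargest returns the n highest values in descending order;
        # to_numpy drops the original row index so the columns align by rank
        subject: df[subject].nlargest(n).to_numpy()
        for subject in subjects
    }
    return pd.DataFrame(ranked, index=pd.RangeIndex(1, n + 1, name='rank'))


rng = np.random.default_rng(0)  # seeded so repeated runs give the same data (assumption)
df = pd.DataFrame({
    'python': rng.integers(60, 101, 31),
    'spark': rng.integers(60, 90, 31),
    'linux': rng.integers(60, 98, 31),
})
top5 = top_scores(df, ['python', 'spark', 'linux'])
print(top5)
```

With the scores in one table, each column can be passed to `plt.plot` in a loop instead of repeating the call per subject.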


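Grouped bar chart

2. Use a grouped bar chart to show the per-subject scores of the 5 students with the highest total score. The x-axis shows the student names, the y-axis shows the scores, and the legend has 3 entries.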

# Total score per student, then the 5 students with the highest totals
df1['total'] = df1['python'] + df1['spark'] + df1['linux']
result1 = df1.sort_values('total', ascending=False).head(5)
width = 0.2
nums = np.arange(5)
# Three bars per student, shifted by one bar width each
plt.bar(nums, result1['python'], color='blue', width=width)
plt.bar(nums + width, result1['spark'], color='red', width=width)
plt.bar(nums + width * 2, result1['linux'], color='green', width=width)
plt.xticks(nums + width, result1['names'])  # center each name under its group of bars
plt.ylim(0, 120)
# Label every bar with its score
for x, y in zip(nums, result1['python']):
    plt.text(x, y, y, ha='center')
for x, y in zip(nums + width, result1['spark']):
    plt.text(x, y, y, ha='center')
for x, y in zip(nums + width * 2, result1['linux']):
    plt.text(x, y, y, ha='center')
plt.xlabel('学生')  # student
plt.ylabel('成绩')  # score
plt.title('学生总分top5')  # top 5 students by total score
plt.legend(['python', 'spark', 'linux'])
plt.show()
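
The bar positions in a grouped bar chart are plain arithmetic on the tick positions. The sketch below shows one way to compute offsets that keep each group centered on its tick; the helper name `bar_offsets` is hypothetical and only meant to illustrate the idea.

```python
import numpy as np


def bar_offsets(n_series, width):
    """Offsets that center n_series adjacent bars of the given width on a tick."""
    # Bar k sits at (k - (n_series - 1) / 2) * width relative to the tick
    return (np.arange(n_series) - (n_series - 1) / 2) * width


ticks = np.arange(5)           # one tick per student
offsets = bar_offsets(3, 0.2)  # three subjects, bar width 0.2
print(offsets)
# x positions of every bar: one row per subject, one column per student
positions = ticks + offsets[:, None]
print(positions.shape)
```

Each row of `positions` could be passed as the x argument of one `plt.bar` call, with the tick labels placed at `ticks`.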


3. Use a grouped bar chart to show the average python, spark, and linux scores of male and female students.

# Mean score per subject for each sex; selecting several columns needs a list (double brackets)
result1 = df1.groupby('sex')[['python', 'spark', 'linux']].mean()
print(result1)
w = 0.2
nums = np.arange(len(result1))
plt.bar(nums, result1['python'], color='red', width=w)
plt.bar(nums + w, result1['spark'], color='blue', width=w)
plt.bar(nums + w * 2, result1['linux'], color='green', width=w)
plt.xticks(nums + w, result1.index)  # center each label under its group of bars
plt.ylim(0, 120)
# Label every bar with its mean, rounded to one decimal ('分' means points)
for x, y in zip(nums, result1['python']):
    plt.text(x, y, f'{round(y, 1)}分', ha='center')
for x, y in zip(nums, result1['spark']):
    plt.text(x + w, y, f'{round(y, 1)}分', ha='center')
for x, y in zip(nums, result1['linux']):
    plt.text(x + w * 2, y, f'{round(y, 1)}分', ha='center')
plt.legend(['python', 'spark', 'linux'])
plt.title('男女生各科成绩平均分')  # average score per subject by sex
plt.show()
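
The grouping step can be checked on a few rows by hand. The sketch below uses a small made-up table (not the random data above) and selects the score columns with a list, which is the form recent pandas releases expect after `groupby`.

```python
import pandas as pd

# Four made-up rows so the means are easy to verify by hand
df = pd.DataFrame({
    'sex': ['男', '女', '男', '女'],
    'python': [60, 80, 100, 90],
    'spark': [70, 75, 80, 85],
})
# A list of column names (double brackets) selects several columns at once
means = df.groupby('sex')[['python', 'spark']].mean()
print(means)
```

The result has one row per sex and one column per subject, which is the shape the bar-drawing code indexes into.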


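Show the three-day box office of the first 5 movies and its trend (a grouped bar chart of the three days, with each day's box office labeled on its bar).

The given data is 豆瓣电影数据1.csv, in which r1, r2, and r3 are the box office figures for the first 3 days.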

df1 = pd.read_csv('data/豆瓣电影数据1.csv')
result1 = df1.head(5)  # first 5 movies in the file
print(result1.index)
width = 0.2
nums = np.arange(result1.shape[0])
# One bar per day for each movie
plt.bar(nums, result1['r1'], color='blue', width=width)
plt.bar(nums + width, result1['r2'], color='red', width=width)
plt.bar(nums + width * 2, result1['r3'], color='green', width=width)
plt.xticks(nums + width, result1['name'])  # center each title under its group of bars
# Label every bar with that day's box office
for x, y in zip(nums, result1['r1']):
    plt.text(x, y, y, ha='center')
for x, y in zip(nums + width, result1['r2']):
    plt.text(x, y, y, ha='center')
for x, y in zip(nums + width * 2, result1['r3']):
    plt.text(x, y, y, ha='center')
plt.xlabel('电影')  # movie
plt.ylabel('票房')  # box office
plt.title('前5部电影前三天票房')  # box office of the first 5 movies over the first three days
plt.legend(['第一天', '第二天', '第三天'])  # day 1, day 2, day 3
plt.show()
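
The task also asks for the total box office, which the chart itself does not compute. A minimal sketch of that step follows; the movie names and figures are made up for illustration and are not taken from the CSV file, and only the column names `name`, `r1`, `r2`, `r3` follow the code above.

```python
import pandas as pd

# Made-up figures for illustration only
movies = pd.DataFrame({
    'name': ['movie_a', 'movie_b', 'movie_c'],
    'r1': [10.0, 20.0, 5.0],
    'r2': [12.0, 18.0, 6.0],
    'r3': [9.0, 25.0, 4.0],
})
# Sum the three daily columns row by row to get each movie's total
movies['total'] = movies[['r1', 'r2', 'r3']].sum(axis=1)
# Day-over-day change shows whether the box office is rising or falling
movies['r2_vs_r1'] = movies['r2'] - movies['r1']
movies['r3_vs_r2'] = movies['r3'] - movies['r2']
print(movies)
```

The `total` column could be drawn as a fourth bar or written above each group with `plt.text`.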


Stacked line chart

Show the total sales and the sales trend of each phone brand across the months, using a stacked line chart.

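# Columns used below: 品牌 (brand) and 6月..9月 (June..September)
df1 = pd.read_csv('data/手机年销量.txt')
print(df1)
plt.xlabel('品牌')  # brand
plt.ylabel('销量')  # sales
plt.ylim(0, 1500)
plt.title('手机总销量趋势图')  # total phone sales trend
# Each month is one layer; the top edge of the stack is the total for that brand
plt.stackplot(df1['品牌'], df1['6月'], df1['7月'], df1['8月'], df1['9月'])
plt.legend(['6月', '7月', '8月', '9月'])
plt.show()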
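
In a stacked chart, the upper edge of each layer is the running sum of the layers below it, so the top of the stack is the total. The sketch below reproduces that arithmetic with NumPy; the sales figures are hypothetical, since the contents of the data file are not shown in the article.

```python
import numpy as np

# Hypothetical monthly sales: rows are months (June..September), columns are brands
sales = np.array([
    [100, 200, 150],
    [120, 180, 160],
    [130, 210, 140],
    [110, 220, 170],
])
# Upper boundary of each layer = running sum over the months up to that layer
layer_tops = np.cumsum(sales, axis=0)
totals = layer_tops[-1]  # top of the stack = total sales per brand
print(layer_tops)
print(totals)
```

Comparing `totals.max()` with the upper limit passed to `plt.ylim` is a quick way to check that the stack fits inside the axes.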


Pie chart

Use a pie chart to show the number of students in each city.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = 'SimHei'  # font that can render the Chinese labels
plt.rcParams["font.size"] = 10  # font size
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly
sex = ["男", "女"]  # male, female
address = ["北京", "上海", "襄阳"]  # Beijing, Shanghai, Xiangyang
df1 = pd.DataFrame({
    "names": ["student" + str(i) for i in range(1, 32)],
    "sex": [sex[np.random.randint(2)] for i in range(31)],
    "address": [address[np.random.randint(3)] for i in range(31)],
    "python": np.random.randint(60, 101, 31),
    "spark": np.random.randint(60, 90, 31),
    "linux": np.random.randint(60, 98, 31)
})
# Count the students in each city
data1 = df1.groupby('address')['sex'].count()
plt.pie(data1.values, labels=data1.index,
        explode=[0, 0.1, 0],  # pull the second slice out; assumes all 3 cities occur
        autopct='%.2f%%')  # print each share with two decimals
plt.legend(data1.index)
plt.title('各城市学生人数占比')  # share of students per city
plt.show()
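
The percentages that `autopct` prints are each city's count divided by the total. The sketch below checks that arithmetic with `Series.value_counts` on a small fixed sample, which is an assumption made so the numbers can be verified by hand.

```python
import pandas as pd

# Small fixed sample so the shares are easy to check by hand
address = pd.Series(['北京', '上海', '襄阳', '北京', '上海', '北京', '北京', '襄阳'])
counts = address.value_counts()                        # students per city
percent = address.value_counts(normalize=True) * 100   # share of each city in percent
print(counts)
print(percent.round(2))
```

`value_counts` gives the same counts as grouping by the column and counting rows, so either form can feed `plt.pie`.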

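Scatter plot

Randomly generate the sex, name, height, and weight of 200 students in the big data program, then draw a scatter plot of height against weight, with female students in red and male students in blue.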

size = 200
df1 = pd.DataFrame({
    'name': [f'student{i}' for i in range(1, size + 1)],
    'sex': np.random.choice(['男', '女'], size),
    'height': np.random.randint(150, 200, size),  # cm
    'weight': np.random.randint(40, 100, size)  # kg
})
print(df1.head())
men = df1[df1['sex'] == '男']
women = df1[df1['sex'] == '女']
# Female students in red, male students in blue, as the task requires
plt.scatter(men['height'], men['weight'], color='blue', label='男')
plt.scatter(women['height'], women['weight'], color='red', label='女')
plt.xlabel('身高（单位：CM）')  # height (cm)
plt.ylabel('体重（单位：KG）')  # weight (kg)
plt.title('男女生身高体重分布图')  # height and weight distribution by sex
plt.legend()  # entries come from the label= arguments above
plt.show()
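
An alternative to filtering each group by hand is to loop over `groupby` with a color lookup, which keeps the color and the legend entry tied to the group being drawn. This is a sketch under assumptions: the `colors` dictionary, the seeded generator, and the non-interactive `Agg` backend are choices made here, not part of the original example.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seeded so repeated runs give the same data
size = 200
df = pd.DataFrame({
    'sex': rng.choice(['男', '女'], size),
    'height': rng.integers(150, 200, size),
    'weight': rng.integers(40, 100, size),
})
colors = {'女': 'red', '男': 'blue'}  # female red, male blue
fig, ax = plt.subplots()
for sex, group in df.groupby('sex'):
    # label= ties each legend entry to the group that was actually drawn
    ax.scatter(group['height'], group['weight'], color=colors[sex], label=sex)
ax.legend()
```

Adding a third category would then only need one more entry in `colors`.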

Radar chart

Use a radar chart to plot Xiao Ming's scores in each subject.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
# Load a Chinese font from a local file
font = FontProperties(fname="./data/SimHei.ttf", size=14)
# Data: subjects (English, math, Chinese, chemistry, physics, PE) and their scores
labels = np.array(["英语", "数学", "语文", "化学", "物理", "体育"])
stats = [90, 80, 76, 70, 75, 88]
# One angle per subject; repeat the first point at the end to close the polygon
angles = np.linspace(0, 2*np.pi, len(labels), endpoint=False)
stats = np.concatenate((stats, [stats[0]]))
angles = np.concatenate((angles, [angles[0]]))
# Draw the radar (spider) chart
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, polar=True)
ax.plot(angles, stats, 'o-', linewidth=2)
ax.fill(angles, stats, alpha=0.25)
# Subject names and title; drop the repeated closing angle so the 6 ticks match the 6 labels
ax.set_thetagrids(angles[:-1] * 180/np.pi, labels, fontproperties=font)
ax.set_title("小明各科成绩", fontproperties=font, size=20)  # Xiao Ming's scores by subject
plt.show()
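
The angle bookkeeping is the part of a radar chart that is easiest to get wrong: the line needs one extra point to close the polygon, while the tick labels need exactly one angle per subject. The sketch below isolates that step; the helper name `close_polygon` is hypothetical.

```python
import numpy as np


def close_polygon(values):
    """Return (angles, values) with the first point repeated at the end."""
    values = np.asarray(values, dtype=float)
    # endpoint=False spreads n points evenly without duplicating 0 and 2*pi
    angles = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
    return np.append(angles, angles[0]), np.append(values, values[0])


angles, stats = close_polygon([90, 80, 76, 70, 75, 88])
print(len(angles), len(stats))
tick_degrees = np.degrees(angles[:-1])  # one tick per subject, closing point dropped
print(tick_degrees)
```

The closed arrays go to `ax.plot` and `ax.fill`, while `tick_degrees` is what a call such as `ax.set_thetagrids` would pair with the subject names.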

Heatmap

Use a heatmap to plot the students' scores in each subject.

import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = ['sans-serif']
plt.rcParams['font.size'] = 20
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels
# Subjects: English, math, Chinese, chemistry, physics, PE
labels = np.array(["英语", "数学", "语文", "化学", "物理", "体育"])
# Students: Li Lei, Han Meimei, Tom, Ann
names = ["李雷", "韩梅梅", "汤姆", "安"]
# One row of scores per student, one column per subject
scores = np.array([[90, 80, 76, 70, 75, 88], [70, 60, 73, 80, 95, 55],
                   [70, 60, 56, 30, 65, 95], [50, 40, 66, 75, 74, 98]])
fig = plt.figure(0, figsize=(10, 6))
plt.matshow(scores, fignum=0)  # draw the matrix into figure 0
plt.xticks(ticks=range(len(labels)), labels=labels)
plt.yticks(ticks=range(len(names)), labels=names)
# Write each score into its cell
for i in range(len(names)):
    for j in range(len(labels)):
        plt.text(j, i, round(scores[i, j], 1), ha="center", va="center", color='r')
plt.colorbar()
plt.show()
