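As is well known, the price of a wine is tied to its quality. Below, we analyze and process a dataset describing the quality of white wines.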
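The columns of the white wine dataset have the following meanings:
fixed acidity: fixed (non-volatile) acidity
volatile acidity: volatile acidity
citric acid: citric acid
residual sugar: residual sugar
chlorides: chlorides
free sulfur dioxide: free sulfur dioxide
total sulfur dioxide: total sulfur dioxide
density: density
pH: pH value
sulphates: sulfates
alcohol: alcohol content
quality: quality grade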
First, read the data and print the first few rows:
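import csv

# Read the whole CSV file (header row included) into a list of rows.
with open("data/white_wine.csv", "r", newline="") as f:
    reader = csv.reader(f)
    content = [row for row in reader]

# Print the header row and the first four data rows.
for i in range(5):
    print(content[i])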
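Because the first row of the file is a header, the columns can also be accessed by name instead of by position. The following is a minimal sketch using the standard library's csv.DictReader. The in-memory sample and the helper name read_rows are assumptions made for illustration; the values are made up and not taken from the real dataset.

```python
import csv
import io

# Hypothetical three-column sample standing in for data/white_wine.csv.
sample_csv = io.StringIO(
    "fixed acidity,alcohol,quality\n"
    "7.0,8.8,6\n"
    "6.3,9.5,6\n"
    "8.1,10.1,5\n"
)

def read_rows(file_obj):
    """Return the CSV rows as a list of dicts keyed by column name."""
    return list(csv.DictReader(file_obj))

rows = read_rows(sample_csv)
# Every value is still a string and must be converted before doing arithmetic.
print(rows[0]["fixed acidity"], rows[0]["quality"])
```

With a real file, the same helper could be called on the object returned by open(..., newline="").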
Next, process the data. Start by checking how many quality grades the white wines are divided into:
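# Collect the quality grade (last column) of every data row, skipping the header.
quality_list = []
for row in content[1:]:
    quality_list.append(int(row[-1]))

# A set keeps only the distinct grades.
quality_count = set(quality_list)
print("White wine has %d quality grades: %r" % (len(quality_count), quality_count))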
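The distinct grades and the number of rows per grade can also be obtained in a single pass. The sketch below uses collections.Counter; the rows in sample_content and the helper name count_grades are hypothetical, chosen only to mirror the layout of content (a header row followed by data rows whose last column is the grade).

```python
from collections import Counter

# Made-up rows in the same layout as `content`: header first, grade last.
sample_content = [
    ["fixed acidity", "alcohol", "quality"],
    ["7.0", "8.8", "6"],
    ["6.3", "9.5", "6"],
    ["8.1", "10.1", "5"],
    ["7.2", "9.9", "7"],
]

def count_grades(rows):
    """Count how many data rows fall into each quality grade."""
    return Counter(int(row[-1]) for row in rows)

grade_counts = count_grades(sample_content[1:])  # skip the header row
print("Number of grades:", len(grade_counts))
for grade in sorted(grade_counts):
    print(grade, ":", grade_counts[grade])
```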
Then split the dataset into seven subsets by quality grade and count the number of rows in each grade:
# Map each quality grade to the list of rows that carry it.
content_dict = {}
for row in content[1:]:
    quality = int(row[-1])
    if quality not in content_dict.keys():
        content_dict[quality] = [row]
    else:
        content_dict[quality].append(row)

# Print the number of rows in each grade.
for key in content_dict:
    print(key, ":", len(content_dict[key]))
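The grouping step above can be written more compactly with collections.defaultdict, which creates the empty list for a new grade automatically. This is only a sketch under the same row layout; sample_rows and group_by_quality are illustrative names and the values are made up.

```python
from collections import defaultdict

# Made-up data rows (no header); the last column is the quality grade.
sample_rows = [
    ["7.0", "8.8", "6"],
    ["6.3", "9.5", "6"],
    ["8.1", "10.1", "5"],
    ["7.2", "9.9", "7"],
]

def group_by_quality(rows):
    """Return a dict mapping each quality grade to its list of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[int(row[-1])].append(row)
    return dict(groups)

groups = group_by_quality(sample_rows)
for grade in sorted(groups):
    print(grade, ":", len(groups[grade]))
```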
Finally, compute the mean of fixed acidity within each subset:
mean_list = []
for key, value in content_dict.items():
    total = 0  # named total so the built-in sum() is not shadowed
    for row in value:
        total += float(row[0])  # fixed acidity is the first column
    mean_list.append((key, total / len(value)))

for item in mean_list:
    print(item[0], ":", item[1])
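The same calculation generalizes to any numeric column. Below is a minimal sketch that assumes the subsets are stored as a dict from grade to rows, as in content_dict; the helper column_means and the sample values are hypothetical, and statistics.fmean requires Python 3.8 or later.

```python
from statistics import fmean

# Made-up subsets in the same shape as `content_dict`: grade -> list of rows.
sample_groups = {
    6: [["7.0", "8.8", "6"], ["6.3", "9.5", "6"]],
    5: [["8.1", "10.1", "5"]],
}

def column_means(groups, column_index):
    """Return {grade: mean of the column at column_index} for each subset."""
    return {
        grade: fmean(float(row[column_index]) for row in rows)
        for grade, rows in groups.items()
    }

means = column_means(sample_groups, 0)  # column 0 is fixed acidity
for grade in sorted(means):
    print(grade, ":", means[grade])
```

Passing a different column_index, for example 1 for the alcohol column of this toy sample, gives the per-grade mean of that column without changing the function.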