④ Series: ranking within groups (very important)
df = pd.DataFrame({
    "部门": ["A", "A", "A", "B", "B", "B"],
    "利润": [10, 32, 20, 15, 28, 10],
    "销售量": [20, 15, 33, 18, 30, 22],
})
display(df)

# Rank 销售量 (sales volume) within each 部门 (department) and store it in 排名 (rank)
df["排名"] = df["销售量"].groupby(df["部门"]).rank()
df
The result is as follows:
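As a cross-check on the ranking above, here is a minimal, self-contained sketch that uses only the 部门 and 销售量 columns from the example. The variable names `rank_asc` and `rank_desc` are introduced here for illustration, and the descending variant is an addition that the original example does not use.

```python
import pandas as pd

df = pd.DataFrame({
    "部门": ["A", "A", "A", "B", "B", "B"],
    "销售量": [20, 15, 33, 18, 30, 22],
})

# Rank sales volume within each department; the smallest value gets rank 1 (the default).
rank_asc = df["销售量"].groupby(df["部门"]).rank()

# Same grouping, but the largest value in each department gets rank 1.
rank_desc = df["销售量"].groupby(df["部门"]).rank(ascending=False)

print(rank_asc.tolist())   # [2.0, 1.0, 3.0, 1.0, 3.0, 2.0]
print(rank_desc.tolist())  # [2.0, 3.0, 1.0, 3.0, 1.0, 2.0]
```

The ranks restart at 1 for each department, which is what distinguishes a grouped rank from calling rank() on the whole column.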
⑤ Custom function: put departments A and B in one group and C in a group of its own (an unusual requirement)
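df = pd.DataFrame({
    "部门": ["A", "A", "B", "B", "C", "C"],
    "小组": ["g1", "g2", "g1", "g2", "g1", "g2"],
    "利润": [10, 20, 15, 28, 12, 14],
    "人员": ["a", "b", "c", "d", "e", "f"],
    "年龄": [20, 15, 18, 30, 23, 34],
})
df = df.set_index("部门")
display(df)

# groupby() calls func on each index label: A and B map to group 0, everything else to group 1
def func(x):
    if x == "A" or x == "B":
        return 0
    else:
        return 1

g = df.groupby(func)
display(g)

# Iterating over the grouped object yields (group key, sub-DataFrame) pairs
for (x, y) in g:
    display(x, y)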
The result is as follows:
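To confirm how the custom function splits the rows, the following sketch aggregates the two resulting groups. It assumes the same data as above, trimmed to the 利润 column. The names `group_sizes` and `profit_sum` are illustrative, and `func` is written in a shorter but equivalent form.

```python
import pandas as pd

df = pd.DataFrame({
    "部门": ["A", "A", "B", "B", "C", "C"],
    "利润": [10, 20, 15, 28, 12, 14],
}).set_index("部门")

def func(x):
    # x is one index label at a time; A and B map to key 0, anything else to key 1.
    return 0 if x in ("A", "B") else 1

g = df.groupby(func)

group_sizes = g.size()        # number of rows under each key (0, then 1)
profit_sum = g["利润"].sum()   # total profit under each key (0, then 1)

print(group_sizes.tolist())  # [4, 2]
print(profit_sum.tolist())   # [73, 26]
```

Because the function receives index labels, the column to group on has to be moved into the index first, which is why the example calls set_index("部门").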
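4. Notes on aggregation with agg()
Calling groupby() returns a grouped object. Without groupby(), the whole table can be treated as a single group, so it behaves much like a grouped object.
On a grouped object we can either call an aggregation function directly, such as sum(), mean(), count(), max(), or min(), or call the object's agg() method and pass it the desired arguments.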
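The two styles described above can be compared side by side. This is a minimal sketch that assumes a trimmed version of the sample data used in this section. The variable names `direct`, `multi`, and `per_column` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "部门": ["A", "A", "B", "B", "C", "C"],
    "利润": [10, 20, 15, 28, 12, 14],
    "年龄": [20, 15, 18, 30, 23, 34],
})

g = df.groupby("部门")

# Direct call: one aggregation applied to one column.
direct = g["利润"].sum()

# agg() with a list: several aggregations on the same column.
multi = g["利润"].agg(["sum", "max"])

# agg() with a dict: a different aggregation for each column.
per_column = g.agg({"利润": "sum", "年龄": "max"})

print(direct.tolist())              # [30, 43, 26]
print(multi["max"].tolist())        # [20, 28, 14]
print(per_column["年龄"].tolist())   # [20, 30, 34]
```

A direct call is the shortest form when one statistic is enough, while agg() is useful when several statistics, or different statistics per column, are needed in one pass.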
1) Calling aggregation functions directly on a grouped object
① Calling aggregation functions directly on the whole df
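df = pd.DataFrame({
    "部门": ["A", "A", "B", "B", "C", "C"],
    "小组": ["g1", "g2", "g1", "g2", "g1", "g2"],
    "利润": [10, 20, 15, 28, 12, 14],
    "人员": ["a", "b", "c", "d", "e", "f"],
    "年龄": [20, 15, 18, 30, 23, 34],
})
display(df)

# A notebook cell only shows its last expression, so display() each result explicitly
display(df["利润"].mean())           # mean of one column: a scalar
display(df[["年龄", "利润"]].mean())  # mean of several columns: a Series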
The result is as follows:
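The shape of the result depends on what is being aggregated. The sketch below, which assumes only the two numeric columns from the example, shows the difference. The names `profit_mean` and `column_means` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "利润": [10, 20, 15, 28, 12, 14],
    "年龄": [20, 15, 18, 30, 23, 34],
})

# Aggregating a single column (a Series) returns one scalar.
profit_mean = df["利润"].mean()

# Aggregating several columns (a DataFrame) returns a Series with one value per column.
column_means = df[["年龄", "利润"]].mean()

print(profit_mean)                # 16.5
print(list(column_means.index))  # ['年龄', '利润']
```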
② Calling aggregation functions directly on the grouped df
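df = pd.DataFrame({
    "部门": ["A", "A", "B", "B", "C", "C"],
    "小组": ["g1", "g2", "g1", "g2", "g1", "g2"],
    "利润": [10, 20, 15, 28, 12, 14],
    "人员": ["a", "b", "c", "d", "e", "f"],
    "年龄": [20, 15, 18, 30, 23, 34],
})
display(df)

# Mean of one column per department
display(df.groupby("部门")["利润"].mean())

# Mean of every numeric column per department; numeric_only=True skips the
# string columns, which raise a TypeError in pandas 2.0 and later
display(df.groupby("部门").mean(numeric_only=True))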
The result is as follows:
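For reference, the grouped means can be checked with a short standalone sketch. It assumes the sample data above with one string column kept, and passes `numeric_only=True` so that the string column is left out of the result. The names `profit_mean` and `all_means` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "部门": ["A", "A", "B", "B", "C", "C"],
    "小组": ["g1", "g2", "g1", "g2", "g1", "g2"],
    "利润": [10, 20, 15, 28, 12, 14],
    "年龄": [20, 15, 18, 30, 23, 34],
})

# Selecting a column first gives one mean per department.
profit_mean = df.groupby("部门")["利润"].mean()

# Aggregating the whole grouped object; numeric_only=True leaves out the string column 小组.
all_means = df.groupby("部门").mean(numeric_only=True)

print(profit_mean.tolist())       # [15.0, 21.5, 13.0]
print(all_means["年龄"].tolist())  # [17.5, 24.0, 28.5]
print(list(all_means.columns))   # ['利润', '年龄']
```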