Example 1: Concatenating strings within each group
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 3],
    'content_id': [1, 1, 2, 2, 2],
    'tag': ['cool', 'nice', 'clever', 'clever', 'not-bad']
})
df
Group df by content_id, then join the tags in each group with commas:
df.groupby('content_id')['tag'].apply(lambda x:','.join(x)).to_frame()
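To make the result of the grouped join easier to check, here is a minimal sketch that rebuilds the same df and prints the joined strings as a dictionary. It uses agg with ','.join instead of apply with a lambda; for this data the two should produce the same values. The name joined is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 3],
    'content_id': [1, 1, 2, 2, 2],
    'tag': ['cool', 'nice', 'clever', 'clever', 'not-bad'],
})

# Join the tags of each content_id into one comma-separated string.
joined = df.groupby('content_id')['tag'].agg(','.join)

# Show the result as a plain dict: content_id -> joined tags.
print(joined.to_dict())
```

Note that duplicate tags are kept: content_id 2 contains 'clever' twice, so it appears twice in the joined string.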
Example 2: Counting the distinct users for each content_id
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 3],
    'content_id': [1, 1, 2, 2, 2],
    'tag': ['cool', 'nice', 'clever', 'clever', 'not-bad']
})
# nunique() counts distinct user_id values in each group
df.groupby("content_id")["user_id"].nunique().to_frame()
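The difference between counting rows and counting distinct users is easy to miss, so the sketch below puts both side by side for the same data. The name per_content is an assumption made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 3],
    'content_id': [1, 1, 2, 2, 2],
    'tag': ['cool', 'nice', 'clever', 'clever', 'not-bad'],
})

# 'count' is the number of rows per group,
# 'nunique' is the number of distinct user_id values per group.
per_content = df.groupby('content_id')['user_id'].agg(['count', 'nunique'])
print(per_content)
```

For content_id 2, user 3 appears twice, so the row count is 3 while the number of distinct users is 2.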
Example 3: Sorting grouped results
import pandas as pd

df = pd.DataFrame({
    'value': [20.45, 22.89, 32.12, 111.22, 33.22, 100.00, 99.99],
    'product': ['table', 'chair', 'chair', 'mobile phone', 'table', 'mobile phone', 'table']
})
df
Group by product, then sum value:
df1 = df.groupby('product')['value'].sum().to_frame().reset_index()
df1
The same grouped sum, sorted by value (ascending by default):
df2 = df.groupby('product')['value'].sum().to_frame().reset_index().sort_values(by='value')
df2
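If the largest totals should come first, sort_values accepts ascending=False. The following is a small sketch on the same data; the name df3 is hypothetical and simply continues the df1/df2 naming.

```python
import pandas as pd

df = pd.DataFrame({
    'value': [20.45, 22.89, 32.12, 111.22, 33.22, 100.00, 99.99],
    'product': ['table', 'chair', 'chair', 'mobile phone',
                'table', 'mobile phone', 'table'],
})

# Sum value per product, then sort from largest to smallest total.
df3 = (
    df.groupby('product')['value']
      .sum()
      .reset_index()
      .sort_values(by='value', ascending=False)
)
print(df3)
```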
Example 4: Plotting group sizes
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'value': [20.45, 22.89, 32.12, 111.22, 33.22, 100.00, 99.99],
    'product': ['table', 'chair', 'chair', 'mobile phone', 'table', 'mobile phone', 'table']
})
df
plt.clf()
# size() gives the number of rows in each group
df.groupby('product').size().plot(kind='bar')
plt.show()
Example 5: Plotting group sums
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'value': [20.45, 22.89, 32.12, 111.22, 33.22, 100.00, 99.99],
    'product': ['table', 'chair', 'chair', 'mobile phone', 'table', 'mobile phone', 'table']
})
df
plt.clf()
# sum() adds up the value column within each product group
df.groupby('product').sum().plot(kind='bar')
plt.show()
Example 6: Using the agg function
import pandas as pd

df = pd.DataFrame({
    'value': [20.45, 22.89, 32.12, 111.22, 33.22, 100.00, 99.99],
    'product': ['table', 'chair', 'chair', 'mobile phone', 'table', 'mobile phone', 'table']
})
# Compute several aggregates of value at once; the result has two-level column names
grouped_df = df.groupby('product').agg({'value': ['min', 'max', 'mean']})
grouped_df
# Flatten the two-level column names into value_min, value_max, value_mean
grouped_df.columns = ['_'.join(col).strip() for col in grouped_df.columns.values]
grouped_df = grouped_df.reset_index()
grouped_df
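As an alternative to flattening the column names by hand, pandas also supports named aggregation, where each output column is given a name up front. This is a sketch under the assumption that pandas 0.25 or later is installed; the name named_df is introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'value': [20.45, 22.89, 32.12, 111.22, 33.22, 100.00, 99.99],
    'product': ['table', 'chair', 'chair', 'mobile phone',
                'table', 'mobile phone', 'table'],
})

# Named aggregation: output_column=(input_column, aggregation).
# Assumes pandas >= 0.25.
named_df = (
    df.groupby('product')
      .agg(
          value_min=('value', 'min'),
          value_max=('value', 'max'),
          value_mean=('value', 'mean'),
      )
      .reset_index()
)
print(named_df)
```

The resulting columns are product, value_min, value_max and value_mean, matching the names produced by the join-based flattening.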
Example 7: Iterating over groups
# Iterating over a GroupBy yields (key, sub-DataFrame) pairs
for key, group_df in df.groupby('product'):
    print("the group for product '{}' has {} rows".format(key, len(group_df)))
the group for product 'chair' has 2 rows
the group for product 'mobile phone' has 2 rows
the group for product 'table' has 3 rows
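The same loop can collect results instead of printing them. The sketch below gathers the row count of each group into a dictionary and compares it with what size() reports; the name sizes is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'value': [20.45, 22.89, 32.12, 111.22, 33.22, 100.00, 99.99],
    'product': ['table', 'chair', 'chair', 'mobile phone',
                'table', 'mobile phone', 'table'],
})

# Build {group key: number of rows} by iterating over the groups.
sizes = {key: len(group_df) for key, group_df in df.groupby('product')}
print(sizes)

# size() should report the same counts without an explicit loop.
print(sizes == df.groupby('product').size().to_dict())
```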
Source code: Python008-Pandas GroupBy 使用教程.ipynb