3)left outer join左连接
在SQL中:
SELECT * FROM df1 LEFT OUTER JOIN df2 ON df1.key = df2.key;
在Dataframe中:
pd.merge(df1, df2, on='key', how='left')
结果如下:
4)right join右连接
在SQL中:
SELECT * FROM df1 RIGHT OUTER JOIN df2 ON df1.key = df2.key;
在Dataframe中:
pd.merge(df1, df2, on='key', how='right')
结果如下:
5)full join全连接
注意在MySQL中是不支持全连接的,一般是使用union完成这个操作的,这将在下面一个知识点中讲述。
在Dataframe中:
pd.merge(df1, df2, on='key', how='outer')
结果如下:
6、union数据合并
UNION (ALL)操作在Dataframe中可以使用concat()来执行。
1)数据准备
df1 = pd.DataFrame({'city': ['Chicago', 'San Francisco', 'New York City'], 'rank': range(1, 4)}) df2 = pd.DataFrame({'city': ['Chicago', 'Boston', 'Los Angeles'], 'rank': [1, 4, 5]})
结果如下:
2)union all不去重合并
在SQL中:
SELECT city, rank FROM df1 UNION ALL SELECT city, rank FROM df2; """ city rank Chicago 1 San Francisco 2 New York City 3 Chicago 1 Boston 4 Los Angeles 5 """
在Dataframe中:
# 默认就是axis=0 pd.concat([df1, df2],axis=0)
结果如下:
3)union去重合并
在SQL中:
SELECT city, rank FROM df1 UNION SELECT city, rank FROM df2; -- notice that there is only one Chicago record this time """ city rank Chicago 1 San Francisco 2 New York City 3 Boston 4 Los Angeles 5 """
在Dataframe中:
pd.concat([df1, df2]).drop_duplicates()
结果如下:
7.取group分组后的Topn
在MySQL8.0以前的版本,可能是不支持窗口函数,因此求Topn可能有些费劲,以前的文章中已经讲述过,这里也就不在赘述。
有下面一堆数据,怎么求出Topn呢?
df = pd.DataFrame({"name":["张三","王五","李四","张三","王五","张三","李四","李四","王五"], "subject":["语文","英语","数学","数学","语文","英语","语文","英语","数学"], "score":[95,80,83,80,90,71,88,70,78]}) df
结果如下:
在Dataframe中:
df.groupby(["subject"]).apply(lambda df:df.sort_values("score",ascending=True))
结果如下: