3 多表操作Join
3.1 数据准备
先构建两个DataFrame
scala> val df1 = spark.createDataset(Seq(("a", 1,2), ("b",2,3) )).toDF("k1","k2","k3") df1: org.apache.spark.sql.DataFrame = [k1: string, k2: int ... 1 more field] scala> val df2 = spark.createDataset(Seq(("a", 2,2), ("b",3,3), ("b", 2,1), ("c", 1,1)) ).toDF("k1","k2","k4") df2: org.apache.spark.sql.DataFrame = [k1: string, k2: int ... 1 more field] scala> df1.show +---+---+---+ | k1| k2| k3| +---+---+---+ | a| 1| 2| | b| 2| 3| +---+---+---+ scala> df2.show +---+---+---+ | k1| k2| k4| +---+---+---+ | a| 2| 2| | b| 3| 3| | b| 2| 1| | c| 1| 1| +---+---+---+
3.2 Join算子说明
join比较通用两种调用方式,注意在usingColumns里的字段必须在两个DF中都存在
joinType:默认是 inner
. 必须是以下类型的一种:inner
, cross
, outer
, full
, full_outer
, left
, left_outer
,right
, right_outer
, left_semi
, left_anti
.
def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame def join(right: Dataset[_], joinExprs: Column, joinType: String): DataFrame
3.2.1 join内连接
//select * from df1 join df2 on df1.key1=df2.key1 //方法一 scala> df1.join(df2,"k1").show +---+---+---+---+---+ | k1| k2| k3| k2| k4| +---+---+---+---+---+ | a| 1| 2| 2| 2| | b| 2| 3| 2| 1| | b| 2| 3| 3| 3| +---+---+---+---+---+ //方法二 scala> df1.join(df2,df1("k1") === df2("k1")).show +---+---+---+---+---+---+ | k1| k2| k3| k1| k2| k4| +---+---+---+---+---+---+ | a| 1| 2| a| 2| 2| | b| 2| 3| b| 2| 1| | b| 2| 3| b| 3| 3| +---+---+---+---+---+---+ //方法三 scala> df1.join(df2,df1("k1") === df2("k1"),"inner").show +---+---+---+---+---+---+ | k1| k2| k3| k1| k2| k4| +---+---+---+---+---+---+ | a| 1| 2| a| 2| 2| | b| 2| 3| b| 2| 1| | b| 2| 3| b| 3| 3| +---+---+---+---+---+---+ //不同字段比较 //select * from df1 join df2 on df1.key2=df2.key4 scala> df1.join(df2,df1("k2") === df2("k4"),"inner").show +---+---+---+---+---+---+ | k1| k2| k3| k1| k2| k4| +---+---+---+---+---+---+ | a| 1| 2| c| 1| 1| | a| 1| 2| b| 2| 1| | b| 2| 3| a| 2| 2| +---+---+---+---+---+---+ //多个字段比较 //select * from df1 join df2 on df1.key1=df2.key1 and df1.key2=df2.key2 scala> df1.join(df2,Seq("k1","k2"),"inner").show +---+---+---+---+ | k1| k2| k3| k4| +---+---+---+---+ | b| 2| 3| 1| +---+---+---+---+