1. List collections
List operations in Scala:
package collectionDemo

/**
 * @author : 蔡政洁
 * @email  : caizhengjie888@icloud.com
 * @date   : 2020/11/18
 * @time   : 9:22 PM
 *
 * List operations in Scala:
 * A Scala List is a subtype of Seq. It is either the empty list (Nil) or a
 * head element followed by a tail sublist holding the remaining elements.
 * head, tail and isEmpty are the basic List operations.
 */
object ListOps1 {
  def main(args: Array[String]): Unit = {
    val list = List(3, -6, 7, 1, 3, 4, -8, 2)

    // head: the first element
    val headEle = list.head
    println("head:" + headEle)

    // tail: every element except the first
    val tailList = list.tail
    println("tailList:" + tailList.mkString("[", ",", "]"))

    // isEmpty
    val isNull = list.isEmpty
    println("isNull:" + isNull)

    // Build an empty list
    val newList = Nil

    // Sum the list elements recursively
    val sum = getTotal(list)
    println("sum:" + sum)
  }

  // Recursive sum written with pattern matching.
  // Equivalent if/else form:
  //   if (list.isEmpty) 0 else list.head + getTotal(list.tail)
  def getTotal(list: List[Int]): Int = list match {
    case Nil          => 0
    case head :: tail => head + getTotal(tail)
  }
}
Output:
head:3
tailList:[-6,7,1,3,4,-8,2]
isNull:false
sum:6
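The recursive sum above adds the head after the recursive call returns, so each element takes a stack frame. A minimal sketch of a tail-recursive variant follows, assuming a hypothetical object name `ListSumSketch` and helper names `sumTailRec` and `sumFold` that are not part of the original code:

```scala
import scala.annotation.tailrec

object ListSumSketch {
  // Tail-recursive sum: the running total travels in `acc`, so the
  // recursive call is the last action and is compiled into a loop.
  def sumTailRec(list: List[Int]): Int = {
    @tailrec
    def loop(rest: List[Int], acc: Int): Int = rest match {
      case Nil          => acc
      case head :: tail => loop(tail, acc + head)
    }
    loop(list, 0)
  }

  // The same accumulation expressed with the standard library.
  def sumFold(list: List[Int]): Int = list.foldLeft(0)(_ + _)

  def main(args: Array[String]): Unit = {
    val list = List(3, -6, 7, 1, 3, 4, -8, 2)
    println("sumTailRec:" + sumTailRec(list))
    println("sumFold:" + sumFold(list))
  }
}
```

Both helpers return 6 for the sample list. The accumulator form is worth considering for long lists, where the plain recursion can overflow the stack.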
Basic List API:
package collectionDemo

/**
 * @author : 蔡政洁
 * @email  : caizhengjie888@icloud.com
 * @date   : 2020/11/18
 * @time   : 9:39 PM
 * Basic List API
 */
object ListOps2 {
  def main(args: Array[String]): Unit = {
    val left = List(1, 2, 3, 4)
    val right = List(4, 5, 6, 7, 8)

    // CRUD operations
    /*
      Adding one element:
        list.+:(A)   prepends element A to list; returns a new list, the original is unchanged
        list.::(A)   prepends element A to list; returns a new list, the original is unchanged
        list.:+(A)   appends element A to list; returns a new list, the original is unchanged
      Adding a collection:
        list.:::(A)  prepends collection A to list; returns a new list, the original is unchanged
        list.++:(A)  prepends collection A to list; returns a new list, the original is unchanged
        list.++(A)   appends collection A to list; returns a new list, the original is unchanged
     */
    // Prepend an element
    var newList = left.+:(5)
    println("left.+:(5): " + newList)

    // Prepend an element
    newList = left.::(5)
    println("left.::(5): " + newList)

    // Append an element
    newList = left.:+(5)
    println("left.:+(5): " + newList)

    // Prepend a collection
    newList = left.:::(right)
    println("left.:::(right): " + newList)

    // Prepend a collection
    newList = left.++:(right)
    println("left.++:(right): " + newList)

    // Append a collection
    newList = left.++(right)
    println("left.++(right): " + newList)

    // Read: list(index)
    val ret = newList(3)
    println("newList(3) = " + ret)

    // Update: List is immutable, so its elements cannot be changed in place

    // Delete: drop the last element (returns a new list)
    newList = newList.dropRight(1)
    println("newList.dropRight(1): " + newList)

    // Membership test
    println("newList.contains(3): " + newList.contains(3))

    // union: concatenates and keeps duplicates
    // (deprecated since Scala 2.13 in favor of concat / ++)
    newList = left.union(right)
    println("left.union(right): " + newList)

    // intersect: intersection
    newList = left.intersect(right)
    println("left.intersect(right): " + newList)

    // diff: difference
    newList = left.diff(right)
    println("left.diff(right): " + newList)

    // take: the first n elements; on a sorted collection this is the top N
    val take2 = left.take(2)
    println("left.take(2): " + take2)
  }
}
Output:
left.+:(5): List(5, 1, 2, 3, 4)
left.::(5): List(5, 1, 2, 3, 4)
left.:+(5): List(1, 2, 3, 4, 5)
left.:::(right): List(4, 5, 6, 7, 8, 1, 2, 3, 4)
left.++:(right): List(4, 5, 6, 7, 8, 1, 2, 3, 4)
left.++(right): List(1, 2, 3, 4, 4, 5, 6, 7, 8)
newList(3) = 4
newList.dropRight(1): List(1, 2, 3, 4, 4, 5, 6, 7)
newList.contains(3): true
left.union(right): List(1, 2, 3, 4, 4, 5, 6, 7, 8)
left.intersect(right): List(4)
left.diff(right): List(1, 2, 3)
left.take(2): List(1, 2)
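The calls above use method syntax such as `left.::(5)`. In everyday Scala the same operations are usually written infix, and any operator whose name ends in `:` binds to its right operand, which is why the new element appears on the left. The sketch below restates a few of the calls that way; the object name `ListOperatorSketch` and the value names are hypothetical, chosen only for illustration:

```scala
object ListOperatorSketch {
  val left: List[Int]  = List(1, 2, 3, 4)
  val right: List[Int] = List(4, 5, 6, 7, 8)

  // Operators ending in ':' are right-associative: 5 :: left is left.::(5)
  val consed: List[Int]      = 5 :: left
  val prepended: List[Int]   = 5 +: left
  val appended: List[Int]    = left :+ 5
  val listInFront: List[Int] = right ::: left // same as left.:::(right)
  val concatenated: List[Int] = left ++ right

  // "Updating" an immutable list builds a new list; `left` itself is unchanged
  val updated: List[Int] = left.updated(0, 10)

  // Concatenation keeps duplicates; distinct gives a set-style union
  val distinctUnion: List[Int] = (left ++ right).distinct

  def main(args: Array[String]): Unit = {
    println("5 :: left: " + consed)
    println("right ::: left: " + listInFront)
    println("left.updated(0, 10): " + updated)
    println("(left ++ right).distinct: " + distinctUnion)
  }
}
```

Prepending with `::` is constant time on a `List`, while `:+` has to copy the whole list, so building a list front to back and reversing it once is often the cheaper pattern.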
2. Set collections
Set operations in Scala:
package collectionDemo

import scala.collection.SortedSet

/**
 * @author : 蔡政洁
 * @email  : caizhengjie888@icloud.com
 * @date   : 2020/11/19
 * @time   : 4:42 PM
 * Set operations in Scala
 */
object SetOps1 {
  def main(args: Array[String]): Unit = {
    // Duplicates are dropped; a plain Set has no guaranteed iteration order
    val set = Set(2, 5, 1, 3, 4, 2, 5)
    println(set)

    println("--------Sorted set--------")
    // Strings are sorted in lexicographic order
    val sorted = SortedSet(
      "abc",
      "aa",
      "acb",
      "abdc"
    )
    println(sorted)

    println("--------Sorted set of a custom type--------")
    /*
      To sort a custom type, either the elements must be comparable
      or the container must be given a comparator.
      Option 1: make the elements comparable
        Java:  the type implements the Comparable interface
        Scala: the type mixes in the Ordered trait
        Relationship between the two:
          trait Ordered[A] extends Any with java.lang.Comparable[A]
      Option 2: give the container a comparator
        Java:  pass a Comparator to the container
        Scala: pass an Ordering to the container
          trait Ordering[T] extends Comparator[T] with PartialOrdering[T] with Serializable
      Note: when both are present, the explicitly passed comparator wins.
     */
    val persons = SortedSet(
      Person("alex", 23, 180.5),
      Person("jack", 17, 178.5),
      Person("jone", 17, 189.5),
      Person("cherry", 29, 181.5),
      Person("lili", 20, 180.5)
    )(new Ordering[Person]() {
      /*
        Comparison rule:
        height ascending; when heights are equal, age descending
       */
      override def compare(x: Person, y: Person): Int = {
        var ret = x.height.compareTo(y.height)
        if (ret == 0) {
          ret = y.age.compareTo(x.age)
        }
        ret
      }
    })
    persons.foreach(println)
  }
}

case class Person(name: String, age: Int, height: Double) extends Ordered[Person] {
  /*
    Comparison rule:
    age ascending; when ages are equal, height descending
   */
  override def compare(that: Person): Int = {
    var ret = this.age.compareTo(that.age)
    if (ret == 0) {
      ret = that.height.compareTo(this.height)
    }
    ret
  }
}
Output:
Set(5, 1, 2, 3, 4)
--------Sorted set--------
TreeSet(aa, abc, abdc, acb)
--------Sorted set of a custom type--------
Person(jack,17,178.5)
Person(alex,23,180.5)
Person(lili,20,180.5)
Person(cherry,29,181.5)
Person(jone,17,189.5)
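The run above only exercises the explicitly passed `Ordering`, so the rule defined through `Ordered` never takes effect. As a rough sketch of the other case, the code below builds a `SortedSet` without passing a comparator, so the element's own `compare` decides the order. `Member` is a hypothetical stand-in for `Person` with the same rule (age ascending, then height descending), and `NaturalOrderSketch` is likewise an assumed name:

```scala
import scala.collection.SortedSet

// Hypothetical stand-in for Person: age ascending, then height descending.
final case class Member(name: String, age: Int, height: Double) extends Ordered[Member] {
  override def compare(that: Member): Int = {
    val byAge = Integer.compare(this.age, that.age)
    if (byAge != 0) byAge
    else java.lang.Double.compare(that.height, this.height)
  }
}

object NaturalOrderSketch {
  // No Ordering is passed, so the implicit one derived from Ordered is used.
  val members: SortedSet[Member] = SortedSet(
    Member("alex", 23, 180.5),
    Member("jack", 17, 178.5),
    Member("jone", 17, 189.5),
    Member("cherry", 29, 181.5),
    Member("lili", 20, 180.5)
  )

  def main(args: Array[String]): Unit =
    members.foreach(println)
}
```

With this rule the iteration order is jone, jack, lili, alex, cherry. One caveat: a sorted set treats two elements as duplicates when `compare` returns 0, so two members with the same age and height would collapse into one even if their names differ.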
3. Example: word count in Scala
Input data:
java python java java hadoop php hive sqoop spark linux shell
package collectionDemo

import scala.collection.mutable
import scala.io.Source

/**
 * @author : 蔡政洁
 * @email  : caizhengjie888@icloud.com
 * @date   : 2020/11/19
 * @time   : 6:57 PM
 * Word count in Scala,
 * using collections and a functional programming style
 */
object CollectionOps {
  def main(args: Array[String]): Unit = {
    // Load the data and close the file once all lines are read
    val source = Source.fromFile("data/wordcount.txt")
    val lines = try source.getLines().toList finally source.close()
    lines.foreach(println)

    println("------------Imperative approach 1: count each word------------")
    /*
      Imperative approach:
      create a map that stores how often each word occurs (key = word, value = count)
     */
    val map = mutable.Map[String, Int]()
    for (line <- lines) {
      val words = line.split(" ")
      for (word <- words) {
        val countOption = map.get(word)
        if (countOption.isDefined) {
          map.put(word, 1 + countOption.get)
        } else {
          map.put(word, 1)
        }
      }
    }
    for ((word, count) <- map) {
      println(s"word: ${word}\tcount=${count}")
    }

    println("------------Imperative approach 2 (simplified): count each word------------")
    val newMap = mutable.Map[String, Int]()
    for (line <- lines) {
      val words = line.split(" ")
      for (word <- words) {
        // getOrElse replaces the isDefined/get branching
        newMap.put(word, newMap.getOrElse(word, 0) + 1)
      }
    }
    // Print newMap (the original printed the first map again)
    for ((word, count) <- newMap) {
      println(s"word: ${word}\tcount=${count}")
    }

    println("------------Functional approach: count each word------------")
    val words = lines.flatMap(line => line.split(" "))
    // Group identical words, then count the size of each group
    val word2Pairs: Map[String, List[String]] = words.groupBy(word => word)
    val word2Count = word2Pairs.map(t => (t._1, t._2.size))
    for ((word, count) <- word2Count) {
      println(s"word: ${word}\tcount=${count}")
    }

    println("------------Functional approach (simplified): count each word------------")
    lines.flatMap(line => line.split(" "))
      .groupBy(word => word)
      .foreach { case (word, words) =>
        println(s"word: ${word}\tcount=${words.size}")
      }
  }
}
Output:
java python java java hadoop php hive sqoop spark linux shell
------------Imperative approach 1: count each word------------
word: spark count=1
word: hadoop count=1
word: sqoop count=1
word: php count=1
word: java count=3
word: hive count=1
word: shell count=1
word: linux count=1
word: python count=1
------------Imperative approach 2 (simplified): count each word------------
word: spark count=1
word: hadoop count=1
word: sqoop count=1
word: php count=1
word: java count=3
word: hive count=1
word: shell count=1
word: linux count=1
word: python count=1
------------Functional approach: count each word------------
word: shell count=1
word: sqoop count=1
word: java count=3
word: hadoop count=1
word: spark count=1
word: hive count=1
word: php count=1
word: linux count=1
word: python count=1
------------Functional approach (simplified): count each word------------
word: shell count=1
word: sqoop count=1
word: java count=3
word: hadoop count=1
word: spark count=1
word: hive count=1
word: php count=1
word: linux count=1
word: python count=1
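The results above come out in hash order, which differs between the mutable and immutable maps. As one possible variation, the sketch below counts with `foldLeft` over an immutable `Map` and then sorts by frequency. The names `WordCountSketch`, `countWords` and `ranked` are hypothetical, and the in-memory lines are an assumed split of the sample words, since the line breaks of the real file are not shown:

```scala
object WordCountSketch {
  // Count words by threading an immutable Map through foldLeft.
  // Splitting on \s+ and dropping empty tokens tolerates repeated spaces.
  def countWords(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+").toList)
      .filter(_.nonEmpty)
      .foldLeft(Map.empty[String, Int]) { (counts, word) =>
        counts.updated(word, counts.getOrElse(word, 0) + 1)
      }

  // Most frequent first; ties are broken alphabetically for a stable order.
  def ranked(counts: Map[String, Int]): List[(String, Int)] =
    counts.toList.sortBy { case (word, count) => (-count, word) }

  def main(args: Array[String]): Unit = {
    // Assumed line layout, for illustration only
    val lines = List("java python java", "java hadoop php", "hive sqoop spark", "linux shell")
    ranked(countWords(lines)).foreach { case (word, count) =>
      println(s"word: $word\tcount=$count")
    }
  }
}
```

Sorting makes the report deterministic, which is convenient when comparing runs or writing tests against the output.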