Groupby - collection processing

简介:

Iterator and Iterable have most of the most useful methods when dealing with collections. Fold, Map, Filter are probably the most common. But other very useful methods include grouped/groupBy, sliding, find, forall, foreach, and many more. I want to cover Iterable's groupBy method in this topic.

This is a Scala 2.8 and later method. It is similar to partition in that it allows the collection to be divided (or partitioned). Partition takes a method with returns a boolean and partitions the collection into two depending on a result. GroupBy takes a function that returns an object and returns a Map with the key being the return value. This allows an arbitrary number of partitions to be made from the collection.

Here is the method signature:

def groupBy[K](f : (A) => K) : Map[K, This]

A bit of context is require to understand the three Type parameters A, K and This. This method is defined in a super class of collections called TraversableLike (I will briefly discuss this in the next topic.) TraversableLike takes two type parameters: the type of the collection and the type contained in the collection. Therefore in this method definition, 'This' refers to the collection type (List for example) and A refers to contained type (perhaps Int). Finally K refers to the type returned by the function and are the keys of the groups formed by the method.

scala> val groups = (1 to 20).toList groupBy {
      case i if(i<5) => "g1"
      case i if(i<10) => "g2"
      case i if(i<15) => "g3"
      case _ => "g4"
      }

      res4: scala.collection.Map[java.lang.String,List[Int]] = Map(g1 -> List(1, 2, 3, 4), 
      g2 -> List(5, 6, 7, 8, 9), g3 -> List(10, 11, 12, 13, 14), g4 -> List(15, 16, 17, 18, 19, 20))

scala> val mods = (1 to 20).toList groupBy ( _ % 4 )

mods: scala.collection.Map[Int,List[Int]] = Map(1 -> List(1, 5, 9, 13, 17), 2 -> List(2, 6, 10, 14, 18), 
3 -> List(3, 7,11, 15, 19), 0 -> List(4, 8, 12, 16, 20))
目录
相关文章
|
C#
C#—Collection was modified;enumeration operation may not execute
错误 Collection was modified; enumeration operation may not execute翻译是 集合已修改;枚举操作可能无法执行。也就是说我们在遍历集合等可迭代元素时,进行了集合的修改导致的错误。本质上因为Collection返回的IEnumerator把当前的属性暴露为只读属性,所以对其的修改会导致运行时错误,只需要把foreach改为for来遍...
527 0
Stream方法使用-filter、sorted、distinct、limit
Stream方法使用-filter、sorted、distinct、limit
132 0
|
数据可视化 数据处理
解决 FAILED: UDFArgumentException explode() takes an array or a map as a parameter 并理解炸裂函数和侧视图
解决 FAILED: UDFArgumentException explode() takes an array or a map as a parameter 并理解炸裂函数和侧视图
107 0
|
SQL 分布式计算 Spark
SPARK Expand问题的解决(由count distinct、group sets、cube、rollup引起的)
SPARK Expand问题的解决(由count distinct、group sets、cube、rollup引起的)
756 0
SPARK Expand问题的解决(由count distinct、group sets、cube、rollup引起的)
解决Mapped Statements collection already contains value for experiment4.UserMapper.listUser错误~
解决Mapped Statements collection already contains value for experiment4.UserMapper.listUser错误~
107 0
|
NoSQL MongoDB 数据库
DeprecationWarning: count is deprecated. Use Collection.count_documents instead
当我使用pymongo查询出对应的cursor(find出的document的迭代器),然后查看查询出数据的数量时使用如下代码: ```python db = MongoClient(host='192.168.1.3', port=27017) # dbname为操作的数据库名称,collectionname为操作的集合名称
357 0
1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause
1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause
211 0
1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause
Pandas报错AttributeError: Cannot access callable attribute 'sort_values' of 'DataFrameGroupBy' objects
Pandas报错AttributeError: Cannot access callable attribute 'sort_values' of 'DataFrameGroupBy' objects
Neither Quantity object nor its magnitude supports indexing
Neither Quantity object nor its magnitude supports indexing
成功解决AttributeError: ‘DataFrame‘ object has no attribute ‘tolist‘
成功解决AttributeError: ‘DataFrame‘ object has no attribute ‘tolist‘