Groupby - collection processing

简介:

Iterator and Iterable have most of the most useful methods when dealing with collections. Fold, Map, Filter are probably the most common. But other very useful methods include grouped/groupBy, sliding, find, forall, foreach, and many more. I want to cover Iterable's groupBy method in this topic.

This is a Scala 2.8 and later method. It is similar to partition in that it allows the collection to be divided (or partitioned). Partition takes a method with returns a boolean and partitions the collection into two depending on a result. GroupBy takes a function that returns an object and returns a Map with the key being the return value. This allows an arbitrary number of partitions to be made from the collection.

Here is the method signature:

def groupBy[K](f : (A) => K) : Map[K, This]

A bit of context is require to understand the three Type parameters A, K and This. This method is defined in a super class of collections called TraversableLike (I will briefly discuss this in the next topic.) TraversableLike takes two type parameters: the type of the collection and the type contained in the collection. Therefore in this method definition, 'This' refers to the collection type (List for example) and A refers to contained type (perhaps Int). Finally K refers to the type returned by the function and are the keys of the groups formed by the method.

scala> val groups = (1 to 20).toList groupBy {
      case i if(i<5) => "g1"
      case i if(i<10) => "g2"
      case i if(i<15) => "g3"
      case _ => "g4"
      }

      res4: scala.collection.Map[java.lang.String,List[Int]] = Map(g1 -> List(1, 2, 3, 4), 
      g2 -> List(5, 6, 7, 8, 9), g3 -> List(10, 11, 12, 13, 14), g4 -> List(15, 16, 17, 18, 19, 20))

scala> val mods = (1 to 20).toList groupBy ( _ % 4 )

mods: scala.collection.Map[Int,List[Int]] = Map(1 -> List(1, 5, 9, 13, 17), 2 -> List(2, 6, 10, 14, 18), 
3 -> List(3, 7,11, 15, 19), 0 -> List(4, 8, 12, 16, 20))
目录
相关文章
Stream方法使用-filter、sorted、distinct、limit
Stream方法使用-filter、sorted、distinct、limit
114 0
|
数据可视化 数据处理
解决 FAILED: UDFArgumentException explode() takes an array or a map as a parameter 并理解炸裂函数和侧视图
解决 FAILED: UDFArgumentException explode() takes an array or a map as a parameter 并理解炸裂函数和侧视图
89 0
解决Mapped Statements collection already contains value for experiment4.UserMapper.listUser错误~
解决Mapped Statements collection already contains value for experiment4.UserMapper.listUser错误~
|
NoSQL MongoDB 数据库
DeprecationWarning: count is deprecated. Use Collection.count_documents instead
当我使用pymongo查询出对应的cursor(find出的document的迭代器),然后查看查询出数据的数量时使用如下代码: ```python db = MongoClient(host='192.168.1.3', port=27017) # dbname为操作的数据库名称,collectionname为操作的集合名称
347 0
Pandas报错AttributeError: Cannot access callable attribute 'sort_values' of 'DataFrameGroupBy' objects
Pandas报错AttributeError: Cannot access callable attribute 'sort_values' of 'DataFrameGroupBy' objects
Neither Quantity object nor its magnitude supports indexing
Neither Quantity object nor its magnitude supports indexing
成功解决AttributeError: ‘DataFrame‘ object has no attribute ‘tolist‘
成功解决AttributeError: ‘DataFrame‘ object has no attribute ‘tolist‘
Data Structures and Algorithms (English) - 6-10 Sort Three Distinct Keys(30 分)
Data Structures and Algorithms (English) - 6-10 Sort Three Distinct Keys(30 分)
104 0
|
C++
Data Structures and Algorithms (English) - 6-9 Sort Three Distinct Keys(20 分)
Data Structures and Algorithms (English) - 6-9 Sort Three Distinct Keys(20 分)
113 0
|
SQL 关系型数据库 MySQL
Accelerating Queries with Group-By and Join By Groupjoin
这篇paper介绍了HyPer中引入的groupjoin算子,针对 join + group by这种query,可以在某些前提条件下,在join的过程中同时完成grouping+agg的计算。 比如用hash table来实现hash join和group by,就可以避免再创建一个hash table,尤其当join的数据量很大,产生的group结果又较少时,可以很好的提升执行效率。
350 0
Accelerating Queries with Group-By and Join By Groupjoin