Groupby - collection processing

简介:

Groupby - collection processing

Iterator and Iterable have most of the most useful methods when dealing with collections. Fold, Map, Filter are probably the most common. But other very useful methods include grouped/groupBy, sliding, find, forall, foreach, and many more. I want to cover Iterable's groupBy method in this topic.

This is a Scala 2.8 and later method. It is similar to partition in that it allows the collection to be divided (or partitioned). Partition takes a method with returns a boolean and partitions the collection into two depending on a result. GroupBy takes a function that returns an object and returns a Map with the key being the return value. This allows an arbitrary number of partitions to be made from the collection.

Here is the method signature:

def groupBy[K](f : (A) => K) : Map[K, This]

A bit of context is require to understand the three Type parameters A, K and This. This method is defined in a super class of collections called TraversableLike (I will briefly discuss this in the next topic.) TraversableLike takes two type parameters: the type of the collection and the type contained in the collection. Therefore in this method definition, 'This' refers to the collection type (List for example) and A refers to contained type (perhaps Int). Finally K refers to the type returned by the function and are the keys of the groups formed by the method.

scala> val groups = (1 to 20).toList groupBy {
      case i if(i<5) => "g1"
      case i if(i<10) => "g2"
      case i if(i<15) => "g3"
      case _ => "g4"
      }

      res4: scala.collection.Map[java.lang.String,List[Int]] = Map(g1 -> List(1, 2, 3, 4), 
      g2 -> List(5, 6, 7, 8, 9), g3 -> List(10, 11, 12, 13, 14), g4 -> List(15, 16, 17, 18, 19, 20))

scala> val mods = (1 to 20).toList groupBy ( _ % 4 )

mods: scala.collection.Map[Int,List[Int]] = Map(1 -> List(1, 5, 9, 13, 17), 2 -> List(2, 6, 10, 14, 18), 
3 -> List(3, 7,11, 15, 19), 0 -> List(4, 8, 12, 16, 20))

==============================================================================
本文转自被遗忘的博客园博客,原文链接:http://www.cnblogs.com/rollenholt/p/4170820.html,如需转载请自行联系原作者

相关文章
|
数据可视化 数据处理
解决 FAILED: UDFArgumentException explode() takes an array or a map as a parameter 并理解炸裂函数和侧视图
解决 FAILED: UDFArgumentException explode() takes an array or a map as a parameter 并理解炸裂函数和侧视图
86 0
解决Mapped Statements collection already contains value for experiment4.UserMapper.listUser错误~
解决Mapped Statements collection already contains value for experiment4.UserMapper.listUser错误~
成功解决AttributeError: ‘Series‘ object has no attribute ‘columns‘
成功解决AttributeError: ‘Series‘ object has no attribute ‘columns‘
|
NoSQL MongoDB 数据库
DeprecationWarning: count is deprecated. Use Collection.count_documents instead
当我使用pymongo查询出对应的cursor(find出的document的迭代器),然后查看查询出数据的数量时使用如下代码: ```python db = MongoClient(host='192.168.1.3', port=27017) # dbname为操作的数据库名称,collectionname为操作的集合名称
346 0
1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause
1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause
202 0
1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause
Pandas报错AttributeError: Cannot access callable attribute 'sort_values' of 'DataFrameGroupBy' objects
Pandas报错AttributeError: Cannot access callable attribute 'sort_values' of 'DataFrameGroupBy' objects
Neither Quantity object nor its magnitude supports indexing
Neither Quantity object nor its magnitude supports indexing
成功解决AttributeError: ‘DataFrame‘ object has no attribute ‘tolist‘
成功解决AttributeError: ‘DataFrame‘ object has no attribute ‘tolist‘
成功解决KeyError: “Passing list-likes to .loc or [] with any missing labels is no longer supported. The
成功解决KeyError: “Passing list-likes to .loc or [] with any missing labels is no longer supported. The
成功解决AttributeError: ‘DataFrame‘ object has no attribute ‘ix‘
成功解决AttributeError: ‘DataFrame‘ object has no attribute ‘ix‘