def main(args: Array[String]) {
  Logger.getLogger("org").setLevel(Level.ERROR)
  val sc = new SparkContext("local[*]", "WordCountRe")
  val input = sc.textFile("data/book.txt")
  // With regexp
  val words = input.flatMap(x => x.split("\\W+"))
  // Lower case
  val lowerCaseWords = words.map(x => x.toLowerCase())
  val wordCounts = lowerCaseWords.map(x => (x, 1)).reduceByKey((x, y) => x + y)
  val sortedWordCounts = wordCounts.sortBy(-_._2)
  val commonEnglishStopWords = List("you","to","your","the","a","of","and","that","it","in","is","for","on","are","if","s","i","with","t","this","or","but","they","will","what","at","my","re","do","not","about","more","an","up","need","them","from","how","there","out","new","work","so","just","don","","get","their","by","some","ll","self","make","may","even","when","one","than","also","much","job","who","was","these","find","into","only")
  val filteredWordCounts = sortedWordCounts.filter { x =>
    val inspectVariable = commonEnglishStopWords.contains(x._1)
  } // Error here
  filteredWordCounts.collect().foreach(println)
}
}
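When I try to use this code, I get a compile error:
type mismatch; found : Unit required: Boolean WordCountRe.scala /SparkScalaCourse/src/com/sundogsoftware/spark line 29 Scala Problem
I have already found one thing that was wrong with my code (the ._1 had to go inside contains, to pick the word out of the (word, count) tuple), but I still don't know how to debug or inspect the value in this case.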
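The problem is that you assign the Boolean result of the contains method to val inspectVariable. A val definition is a statement, not an expression, so a block that ends with it evaluates to Unit. The filter method, however, needs a function that returns a Boolean.
Just remove val inspectVariable = and that should fix it.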
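To see where the Unit comes from, here is a minimal sketch on a plain Scala List instead of an RDD. The object name and the sample data are made up for illustration; only the shape of the two blocks mirrors the question.

```scala
object BlockTypeDemo {
  // Hypothetical sample data; not taken from the question's book.txt.
  val stopWords: List[String] = List("the", "a", "of")
  val pairs: List[(String, Int)] = List(("the", 10), ("spark", 3), ("a", 7))

  // A block whose last statement is a definition evaluates to Unit, i.e. ().
  val endsWithVal: Unit = { val inspectVariable = stopWords.contains("the") }

  // Without the val, the contains call is the last expression,
  // so the block has type Boolean and filter accepts it.
  val matching: List[(String, Int)] =
    pairs.filter { x => stopWords.contains(x._1) }

  def main(args: Array[String]): Unit = println(matching)
}
```

The explicit Unit annotation on endsWithVal compiles, which is the same type the compiler found inside the filter block in the question.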
Alternatively, keep the assignment and return the value by adding a new line that contains just inspectVariable after it.
For example:
val filteredWordCounts = sortedWordCounts.filter { x =>
  val inspectVariable = commonEnglishStopWords.contains(x._1) // put your breakpoint here
  inspectVariable
}
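If a debugger is not at hand, the named value can also be printed before it is returned. The following is a minimal sketch on a plain List with made-up data, not the asker's RDD; with a real RDD in local[*] mode the println output would show up on the console, while on a cluster it would go to the executor logs. It also assumes the goal is to drop stop words, so the predicate is negated, whereas the filter above keeps only the pairs whose word is in the list.

```scala
object InspectFilterDemo {
  // Hypothetical sample data standing in for the (word, count) pairs.
  val stopWords: Set[String] = Set("the", "a", "of")
  val wordCounts: List[(String, Int)] =
    List(("the", 10), ("spark", 3), ("a", 7), ("scala", 2))

  // Name the intermediate value, print it, then end on a Boolean expression.
  def keep(pair: (String, Int)): Boolean = {
    val isStopWord = stopWords.contains(pair._1)
    println(s"${pair._1} -> isStopWord = $isStopWord")
    !isStopWord // assumption: stop words should be removed, not kept
  }

  def main(args: Array[String]): Unit =
    wordCounts.filter(keep).foreach(println)
}
```

Pulling the predicate out into a named method like keep (a hypothetical name) also gives a single place to set a breakpoint or to call it directly from a test.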