2018 Spark Technical Q&A Roundup: Some Help for Everyone Who Likes Spark

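The editor noticed that many people in the Q&A section are asking about Spark, so those questions are collected here in the hope that they offer some ideas and help to everyone who likes Spark.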

This post is updated from time to time, so bookmark it if you find it useful.

How to implement something like Spark's zipWithIndex in Apache Beam?
https://yq.aliyun.com/ask/489799

Spark, Scala: how to remove empty rows from an RDD or DataFrame?
https://yq.aliyun.com/ask/489763

For the output files computed by a Spark ShuffleMapTask: how are identical keys merged and sorted when the input data (from an HDFS source) is larger than memory?
https://yq.aliyun.com/ask/479091

How to find the install directory of the Apache Spark package in Homebrew?
https://yq.aliyun.com/ask/479064

Dynamic query preparation and execution in Spark
https://yq.aliyun.com/ask/471278

Is Spark SQL case-sensitive?
https://yq.aliyun.com/ask/471249

Create a Spark UDF that iterates over a byte array and converts it to a number
https://yq.aliyun.com/ask/471263

Monitoring and alerting for Spark Streaming
https://yq.aliyun.com/ask/448677

Spark cannot find org/apache/hadoop/fs/FSDataInputStream when reading Parquet
https://yq.aliyun.com/ask/457733

Spark Streaming connected to Kafka shows latency. How should it be handled?
https://yq.aliyun.com/ask/450143

How can Spark analyze HBase data?
https://yq.aliyun.com/ask/450092

Reading from Redshift into a Spark DataFrame (Spark-Redshift module)
https://yq.aliyun.com/ask/493215

Parsing data in Apache Spark Scala: org.apache.spark.SparkException "Task not serializable" error when trying to use textinputformat.record.delimiter
https://yq.aliyun.com/ask/493232

Querying YARN and Spark
https://yq.aliyun.com/ask/493218

Practical applications and integration of MongoDB in Spark and big data
https://yq.aliyun.com/ask/447402

What was the biggest difference between Flink and Spark again? The part about the double groupBy error
https://yq.aliyun.com/ask/426774

Scala, Spark-shell: groupBy fails
https://yq.aliyun.com/ask/489760

How to supply a schema to Spark / Scala from outside the code
https://yq.aliyun.com/ask/489738

Is Spark usually run as a standalone cluster or as Spark on YARN, and which is better?
https://yq.aliyun.com/ask/484069

Apache Spark to_json options parameter
https://yq.aliyun.com/ask/479058

Spark IN / EXISTS predicates in a SELECT statement
https://yq.aliyun.com/ask/479081

How to count the number of missing values in each row of a DataFrame (Spark, Scala)?
https://yq.aliyun.com/ask/479094

How to pass each value of a Spark DataFrame column as a string to a Python UDF?
https://yq.aliyun.com/ask/479097

Spark SVD is not reproducible
https://yq.aliyun.com/ask/472378

Spark primer: what is Spark, and how do you use it?
https://yq.aliyun.com/ask/124780

Difference between spark_session and sqlContext when loading a local file
https://yq.aliyun.com/ask/471248

How Spark uses Akka for inter-process and inter-node communication: a brief…
https://yq.aliyun.com/ask/208464

Why does a TimeoutException occur when setting up a Spark cluster?
https://yq.aliyun.com/ask/208474

How can Spark analyze HBase data?
https://yq.aliyun.com/ask/438642

Spark - Python - Getting the year/month from an RDD
https://yq.aliyun.com/ask/489798

Spark cannot deserialize records when creating a Dataset
https://yq.aliyun.com/ask/487615

How to filter a Spark DataFrame by the occurrence of values in a column, where the condition is on a date column?
https://yq.aliyun.com/ask/478013

How to extract only the JSON data from a row in Spark
https://yq.aliyun.com/ask/471259

How to write Spark programs with PyCharm
https://yq.aliyun.com/ask/208481

Conditionally parsing a fixed-width text file with pyspark
https://yq.aliyun.com/ask/487564

Force Maven to use local dependencies
https://yq.aliyun.com/ask/471258

How can Spark compute statistics over data from multiple MySQL databases?
https://yq.aliyun.com/ask/64832

Flatten a Scala array-type column into multiple columns
https://yq.aliyun.com/ask/487597

SparkContext fails to start with master set to "Yarn"
https://yq.aliyun.com/ask/487610
