Windows上搭建Standalone模式的Spark环境

简介: 在Windows上搭建Standalone Spark环境

Java

安装Java8,设置JAVA_HOME,并添加 %JAVA\_HOME%\bin 到环境变量PATH中

E:\java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

Scala

下载解压Scala 2.11,设置SCALA_HOME,并添加 %SCALA\_HOME%\bin 到PATH中

E:\ scala -verion
Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL

Spark

下载解压Spark 2.1, 设置SPARK_HOME,并添加 %SPARK\_HOME%\bin 到PATH中,此时尝试在控制台运行spark-shell,出现如下错误提示无法定位winutils.exe

E:\>spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/05 21:34:43 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2327)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:365)
        at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
        at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:991)
        at org.apache.spark.repl.Main$.createSparkSession(Main.scala:92)
        at $line3.$read$$iw$$iw.<init>(<console>:15)
        at $line3.$read$$iw.<init>(<console>:42)
        at $line3.$read.<init>(<console>:44)
        at $line3.$read$.<init>(<console>:48)
        at $line3.$read$.<clinit>(<console>)
        at $line3.$eval$.$print$lzycompute(<console>:7)
        at $line3.$eval$.$print(<console>:6)
        at $line3.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
        at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
        at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
        at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
        at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
        at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
        at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
        at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
        at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
        at org.apache.spark.repl.Main$.doMain(Main.scala:69)
        at org.apache.spark.repl.Main$.main(Main.scala:52)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

从错误消息中可以看出Spark需要用到Hadoop中的一些类库(通过HADOOP_HOME环境变量,因为我们之前并未设置过,所以文件路径null\bin\winutils.exe里面出现了null),但这并不意味这我们一定要安装Hadoop,我们可以直接下载所需要的winutils.exe到磁盘上的任何位置,比如C:\winutils\bin\winutils.exe,同时设置 HADOOP_HOME=C:\winutils

现在我们再次运行spark-shell,又有一个新的错误:

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
  ... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
  ... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
  at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
  at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
  ... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
  ... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
  at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
  ... 76 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
  ... 84 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 85 more
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

错误消息中提示零时目录 /tmp/hive 没有写的权限:

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------

所以我们需要更新E:/tmp/hive的权限(我在E盘下运行的spark-shell命令,如果在其他盘运行,就改成对应的盘符+/tmp/hive)。运行如下命令:

E:\>C:\winutils\bin\winutils.exe chmod 777 E:\tmp\hive

再次运行spark-shell,spark启动成功。

相关文章
|
5月前
|
XML 存储 搜索推荐
Omnissa Dynamic Environment Manager 2503 - 个性化动态 Windows 桌面环境管理
Omnissa Dynamic Environment Manager 2503 - 个性化动态 Windows 桌面环境管理
90 7
Omnissa Dynamic Environment Manager 2503 - 个性化动态 Windows 桌面环境管理
|
5月前
|
Ubuntu 数据库 虚拟化
Windows 环境下 Odoo 安装保姆级教程
本教程详细介绍了在 Windows 系统上通过虚拟机部署 Odoo 的完整流程。首先确认硬件需求,确保 CPU、内存和磁盘空间满足最低配置;接着安装 VMware Workstation Pro 并创建 Ubuntu 虚拟机,配置桥接网络以实现主机与虚拟机的通信;随后借助微聚云快速安装预配置好的 Odoo 环境,简化复杂环境搭建;最后通过浏览器访问虚拟机 IP,完成 Odoo 数据库初始化及基础设置。整个过程清晰易懂,适合新手快速上手 Odoo 部署。
596 4
|
11月前
|
分布式计算 Kubernetes Hadoop
大数据-82 Spark 集群模式启动、集群架构、集群管理器 Spark的HelloWorld + Hadoop + HDFS
大数据-82 Spark 集群模式启动、集群架构、集群管理器 Spark的HelloWorld + Hadoop + HDFS
418 6
|
11月前
|
分布式计算 资源调度 Hadoop
大数据-80 Spark 简要概述 系统架构 部署模式 与Hadoop MapReduce对比
大数据-80 Spark 简要概述 系统架构 部署模式 与Hadoop MapReduce对比
213 2
|
11月前
|
SQL 机器学习/深度学习 分布式计算
大数据-81 Spark 安装配置环境 集群环境配置 超详细 三台云服务器
大数据-81 Spark 安装配置环境 集群环境配置 超详细 三台云服务器
503 1
|
6月前
|
存储 运维 监控
提升Windows Server环境安全性:ADAudit Plus的五大关键优势
在Windows Server环境中,内置的安全审计工具虽有用,但存在专业门槛高、耗时及功能缺失等问题。第三方工具ADAudit Plus应运而生,其五大优势包括:日志聚合、关键活动检测、定制化报告、灵活安全配置和长期日志保留,有效提升系统监控与合规能力。选择ADAudit Plus,助力企业更高效应对审计挑战,强化安全性。
141 2
|
9月前
|
弹性计算 开发框架 安全
基于云效 Windows 构建环境和 Nuget 制品仓库进行 .Net 应用开发
本文将基于云效 Flow 流水线 Windows 构建环境和云效 Packages Nuget 制品仓库手把手教你如何开发并部署一个 .NET 应用,从环境搭建到实战应用发布的详细教程,帮助你掌握 .NET 开发的核心技能。
|
10月前
|
Dart 搜索推荐 IDE
Windows下Zed编辑器配置Dart环境
本文介绍了Dart编程语言及其主要框架Flutter的优势,并推荐使用轻量级编辑器Zed进行Dart开发。详细步骤包括Dart环境的安装与配置,Zed编辑器的安装与个性化设置,以及如何在Zed中编写并运行Dart的HelloWorld程序。通过自定义任务实现Dart文件的快速运行,提高了开发效率。
|
10月前
|
分布式计算 资源调度 Hadoop
Spark Standalone与YARN的区别?
本文详细解析了 Apache Spark 的两种常见部署模式:Standalone 和 YARN。Standalone 模式自带轻量级集群管理服务,适合小规模集群;YARN 模式与 Hadoop 生态系统集成,适合大规模生产环境。文章通过示例代码展示了如何在两种模式下运行 Spark 应用程序,并总结了两者的优缺点,帮助读者根据需求选择合适的部署模式。
410 3
|
11月前
|
应用服务中间件 Shell PHP
windows系统配置nginx环境运行pbootcms访问首页直接404的问题
windows系统配置nginx环境运行pbootcms访问首页直接404的问题