IntelliJ-IDEA-Maven-Scala-Spark Development Environment Setup

Summary: Setting up a Spark development environment with IntelliJ IDEA, Maven, and Scala

Background


  • The first program in almost every programming language is Hello World.

Download and install the JDK, Scala, and Maven


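  • The JDK and Scala were already installed in the earlier articles on Hadoop HA and the Spark cluster. Installing Maven is equally simple, so it is omitted here.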

Download IDEA and install the Scala plugin


  • Installing the plugin online is somewhat slow, but many workarounds are available on the web, so they are omitted here.

Create a maven-scala project


Follow the wizard, filling in each page and clicking Next.

Update the version numbers in pom.xml


  • Change scala.version to the Scala version installed on your machine, and add the dependencies required for Hadoop and Spark. The complete file is as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.gemantic.bigdata</groupId>
  <artifactId>bigdata-spark</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <scala.version>2.11.4</scala.version>
    <spark.version>2.0.0</spark.version>
    <spark.artifact>2.11</spark.artifact>
    <hbase.version>1.2.2</hbase.version>
    <hadoop.version>2.6.0</hadoop.version>
    <dependency.scope>compile</dependency.scope>
  </properties>
  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <version>3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${spark.artifact}</artifactId>
      <version>${spark.version}</version>
      <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${spark.artifact}</artifactId>
      <version>${spark.version}</version>
      <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_${spark.artifact}</artifactId>
      <version>${spark.version}</version>
      <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_${spark.artifact}</artifactId>
      <version>${spark.version}</version>
      <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-8_${spark.artifact}</artifactId>
      <version>${spark.version}</version>
      <scope>${dependency.scope}</scope>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-${maven.compiler.target}</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptors>
            <descriptor>src/main/assembly/distribution.xml</descriptor>
          </descriptors>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
        <includes>
          <include>**/*</include>
        </includes>
      </resource>
    </resources>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>

Delete the auto-generated code and create your own HelloWorld



  • What the program does: it selects the lines of README.md (in the spark directory) that contain "Python", runs a word count over them, and saves the result to HDFS.
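
The following is only a minimal sketch of a job matching the description above, not the original listing. The object name com.gemantic.bigdata.WordCount and the output directory /user/root/outputFile are taken from the commands shown later in this article. The input path /user/root/README.md is an assumption (README.md would have to be uploaded to HDFS first), and the trailing argument passed on the command line is simply ignored here. The sketch uses the SparkContext API, which is available in both Spark 1.6 and 2.0.

```scala
package com.gemantic.bigdata

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The master is deliberately not set here, so that the --master
    // option of spark-submit decides where the job runs.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    // Assumption: README.md has been copied to this HDFS location.
    val inputPath = "/user/root/README.md"
    // Output directory as listed by the hdfs commands in this article.
    val outputPath = "/user/root/outputFile"

    try {
      val counts = sc.textFile(inputPath)
        .filter(line => line.contains("Python")) // keep only lines mentioning Python
        .flatMap(line => line.split(" "))        // split on spaces; punctuation stays attached
        .map(word => (word, 1))
        .reduceByKey(_ + _)                      // sum the 1s per word

      // Writes one part-NNNNN file per partition plus a _SUCCESS marker.
      // This call fails if the output directory already exists.
      counts.saveAsTextFile(outputPath)
    } finally {
      sc.stop()
    }
  }
}
```

Because saveAsTextFile refuses to overwrite an existing directory, the output directory has to be removed before the job is run a second time.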

Packaging command

mvn clean package

Or run the same package step from inside IDEA.

  • Either way, bigdata-spark-1.0-SNAPSHOT.jar is generated in the target directory.

Upload and test


  • Upload the bigdata-spark-1.0-SNAPSHOT.jar built above to the server and submit the job to the cluster with the following command:
root@ubuntu238:/usr/local/spark-1.6.0-bin-hadoop2.6# ./bin/spark-submit --class com.gemantic.bigdata.WordCount --master yarn-cluster --executor-memory 512m /data/bigdata/spark/lib/bigdata-spark-1.0-SNAPSHOT.jar 10
  • Log output during execution:
17/12/28 11:42:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/28 11:42:08 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
17/12/28 11:42:08 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/12/28 11:42:08 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/12/28 11:42:08 INFO yarn.Client: Setting up container launch context for our AM
17/12/28 11:42:08 INFO yarn.Client: Setting up the launch environment for our AM container
17/12/28 11:42:08 INFO yarn.Client: Preparing resources for our AM container
17/12/28 11:42:09 INFO yarn.Client: Uploading resource file:/usr/local/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar -> hdfs://masters/user/root/.sparkStaging/application_1514254657629_0009/spark-assembly-1.6.0-hadoop2.6.0.jar
17/12/28 11:42:12 INFO yarn.Client: Uploading resource file:/data/bigdata/spark/lib/bigdata-spark-1.0-SNAPSHOT.jar -> hdfs://masters/user/root/.sparkStaging/application_1514254657629_0009/bigdata-spark-1.0-SNAPSHOT.jar
17/12/28 11:42:12 INFO yarn.Client: Uploading resource file:/tmp/spark-add007da-644d-47f5-99be-2ce1ddf89a4f/__spark_conf__5606044700861845297.zip -> hdfs://masters/user/root/.sparkStaging/application_1514254657629_0009/__spark_conf__5606044700861845297.zip
17/12/28 11:42:12 INFO spark.SecurityManager: Changing view acls to: root
17/12/28 11:42:12 INFO spark.SecurityManager: Changing modify acls to: root
17/12/28 11:42:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/12/28 11:42:12 INFO yarn.Client: Submitting application 9 to ResourceManager
17/12/28 11:42:12 INFO impl.YarnClientImpl: Submitted application application_1514254657629_0009
17/12/28 11:42:13 INFO yarn.Client: Application report for application_1514254657629_0009 (state: ACCEPTED)
17/12/28 11:42:13 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1514432532552
     final status: UNDEFINED
     tracking URL: http://master:8088/proxy/application_1514254657629_0009/
     user: root
17/12/28 11:42:14 INFO yarn.Client: Application report for application_1514254657629_0009 (state: ACCEPTED)

...

17/12/28 11:42:22 INFO yarn.Client: Application report for application_1514254657629_0009 (state: RUNNING)
17/12/28 11:42:22 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.111.239
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1514432532552
     final status: UNDEFINED
     tracking URL: http://master:8088/proxy/application_1514254657629_0009/
     user: root
17/12/28 11:42:23 INFO yarn.Client: Application report for application_1514254657629_0009 (state: RUNNING)

...

17/12/28 11:42:39 INFO yarn.Client: Application report for application_1514254657629_0009 (state: FINISHED)
17/12/28 11:42:39 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.111.239
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1514432532552
     final status: SUCCEEDED
     tracking URL: http://master:8088/proxy/application_1514254657629_0009/
     user: root
17/12/28 11:42:39 INFO util.ShutdownHookManager: Shutdown hook called
17/12/28 11:42:39 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-add007da-644d-47f5-99be-2ce1ddf89a4f
  • Check the output:
root@ubuntu238:/usr/local/hadoop-2.6.1# ./bin/hdfs dfs -ls /user/root/outputFile
17/12/28 13:09:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   3 root supergroup          0 2017-12-28 11:42 /user/root/outputFile/_SUCCESS
-rw-r--r--   3 root supergroup        144 2017-12-28 11:42 /user/root/outputFile/part-00000
-rw-r--r--   3 root supergroup        100 2017-12-28 11:42 /user/root/outputFile/part-00001

root@ubuntu238:/usr/local/hadoop-2.6.1# ./bin/hdfs dfs -text /user/root/outputFile/part-00000
17/12/28 13:10:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(Python,2)
(Interactive,1)
(R,,1)
(can,1)
(Java,,1)
(Shell,1)
(Alternatively,,1)
(shell:,1)
(Scala,,1)
(Python,,2)
(prefer,1)
(engine,1)
(##,1)
root@ubuntu238:/usr/local/hadoop-2.6.1# ./bin/hdfs dfs -text /user/root/outputFile/part-00001
17/12/28 13:10:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(you,2)
(if,1)
(APIs,1)
(that,1)
(high-level,1)
(optimized,1)
(in,1)
(an,1)
(and,2)
(use,1)
(the,1)
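
The output lists Python and Python, as separate keys. That is what one would expect if the job splits each line on spaces without stripping punctuation. As a rough way to check this without a cluster, the same filter-and-count steps can be applied to an in-memory collection using only the Scala standard library. The object name LocalWordCount and the sample lines are hypothetical; the first sample line is modeled on the tokens seen in the output above.

```scala
object LocalWordCount {
  // Same steps as the job, on a plain Scala collection:
  // keep lines containing "Python", split on spaces, count each token.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .filter(_.contains("Python"))
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input; the second line is dropped by the filter.
    val sample = Seq(
      "Alternatively, if you prefer Python, you can use the Python shell:",
      "This line does not mention the language"
    )
    // Print the (word, count) pairs sorted by word.
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

For this sample, Python and Python, are each counted once and you is counted twice, the same kind of split that appears in the cluster output.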