Running a Scala Program That Calls Spark on a Local Windows Machine

Use Case

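Spark is an extremely powerful compute engine written in Scala. It computes in memory and provides convenient tools for graph processing, stream processing, machine learning, and interactive queries, so it is well worth learning to program Spark in Scala. Below is a simple example in which Spark connects to MySQL and reads a table.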

Procedure

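Set up a Scala development environment by following the post on setting up Scala on Windows. Doing so already deploys the Spark environment as well, so you can start writing Spark programs in Scala right away!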

package epoint.com.cn.test001

import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object SparkConnMysql {
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
    val conf = new SparkConf()
    conf.setAppName("wow,my first spark app")
    conf.setMaster("local") // run Spark locally in this JVM, no cluster needed
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val url = "jdbc:mysql://192.168.114.67:3306/user" // MySQL host, port, and database
    val table = "user" // table to read
    val reader = sqlContext.read.format("jdbc")
    reader.option("url", url)
    reader.option("dbtable", table)
    reader.option("driver", "com.mysql.jdbc.Driver")
    reader.option("user", "root")
    reader.option("password", "11111")
    val df = reader.load() // DataFrame backed by the MySQL table
    df.show() // print the rows as a text table
  }
}

Output:

Hello, world!
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/spark1.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/spark1.6/lib/spark-examples-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/kettle7.1/inceptor-driver.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/11/21 11:43:53 INFO SparkContext: Running Spark version 1.6.1
17/11/21 11:43:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/21 11:43:56 INFO SecurityManager: Changing view acls to: lenovo
17/11/21 11:43:56 INFO SecurityManager: Changing modify acls to: lenovo
17/11/21 11:43:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lenovo); users with modify permissions: Set(lenovo)
17/11/21 11:43:59 INFO Utils: Successfully started service 'sparkDriver' on port 55824.
17/11/21 11:43:59 INFO Slf4jLogger: Slf4jLogger started
17/11/21 11:43:59 INFO Remoting: Starting remoting
17/11/21 11:43:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.114.67:55837]
17/11/21 11:43:59 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 55837.
17/11/21 11:43:59 INFO SparkEnv: Registering MapOutputTracker
17/11/21 11:43:59 INFO SparkEnv: Registering BlockManagerMaster
17/11/21 11:43:59 INFO DiskBlockManager: Created local directory at C:\Users\lenovo\AppData\Local\Temp\blockmgr-16383e3c-7cb6-43c7-b300-ccc1a1561bb4
17/11/21 11:43:59 INFO MemoryStore: MemoryStore started with capacity 1129.9 MB
17/11/21 11:44:00 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/21 11:44:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/21 11:44:00 INFO SparkUI: Started SparkUI at http://192.168.114.67:4040
17/11/21 11:44:00 INFO Executor: Starting executor ID driver on host localhost
17/11/21 11:44:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55844.
17/11/21 11:44:00 INFO NettyBlockTransferService: Server created on 55844
17/11/21 11:44:00 INFO BlockManagerMaster: Trying to register BlockManager
17/11/21 11:44:00 INFO BlockManagerMasterEndpoint: Registering block manager localhost:55844 with 1129.9 MB RAM, BlockManagerId(driver, localhost, 55844)
17/11/21 11:44:00 INFO BlockManagerMaster: Registered BlockManager
17/11/21 11:44:05 INFO SparkContext: Starting job: show at SparkConnMysql.scala:25
17/11/21 11:44:05 INFO DAGScheduler: Got job 0 (show at SparkConnMysql.scala:25) with 1 output partitions
17/11/21 11:44:05 INFO DAGScheduler: Final stage: ResultStage 0 (show at SparkConnMysql.scala:25)
17/11/21 11:44:05 INFO DAGScheduler: Parents of final stage: List()
17/11/21 11:44:05 INFO DAGScheduler: Missing parents: List()
17/11/21 11:44:05 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at show at SparkConnMysql.scala:25), which has no missing parents
17/11/21 11:44:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 5.2 KB, free 5.2 KB)
17/11/21 11:44:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.5 KB, free 7.7 KB)
17/11/21 11:44:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:55844 (size: 2.5 KB, free: 1129.9 MB)
17/11/21 11:44:06 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/11/21 11:44:06 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at show at SparkConnMysql.scala:25)
17/11/21 11:44:06 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/11/21 11:44:06 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 1922 bytes)
17/11/21 11:44:06 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/11/21 11:44:06 INFO JDBCRDD: closed connection
17/11/21 11:44:06 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 3472 bytes result sent to driver
17/11/21 11:44:06 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 224 ms on localhost (1/1)
17/11/21 11:44:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/11/21 11:44:06 INFO DAGScheduler: ResultStage 0 (show at SparkConnMysql.scala:25) finished in 0.261 s
17/11/21 11:44:06 INFO DAGScheduler: Job 0 finished: show at SparkConnMysql.scala:25, took 1.467252 s
+---+----+----+------------+------------------+---------+-------+
| id|name| age|       phone|             email|startdate|enddate|
+---+----+----+------------+------------------+---------+-------+
| 11| 徐心三|  24|     2423424|    2423424@qq.com|     null|   null|
| 33| 徐心七|  23|    23232323|          13131@qe|     null|   null|
| 55|  徐彬|  22| 15262301036|徐彬757661238@ww.com|     null|   null|
| 44|  徐成|3333| 23423424332|    2342423@qq.com|     null|   null|
| 66| 徐心四|  23|242342342423|    徐彬23424@qq.com|     null|   null|
| 11| 徐心三|  24|     2423424|    2423424@qq.com|     null|   null|
| 33| 徐心七|  23|    23232323|          13131@qe|     null|   null|
| 55|  徐彬|  22| 15262301036|徐彬757661238@ww.com|     null|   null|
| 44|  徐成|3333| 23423424332|    2342423@qq.com|     null|   null|
| 66| 徐心四|  23|242342342423|    徐彬23424@qq.com|     null|   null|
| 88| 徐心八| 123|   131231312|       123123@qeqe|     null|   null|
| 99| 徐心二|  23|    13131313|   1313133@qeq.com|     null|   null|
|121| 徐心五|  13|   123131231|    1231312@qq.com|     null|   null|
|143| 徐心九|  23|      234234|        徐彬234@wrwr|     null|   null|
+---+----+----+------------+------------------+---------+-------+
only showing top 14 rows

17/11/21 11:44:06 INFO SparkContext: Invoking stop() from shutdown hook
17/11/21 11:44:06 INFO SparkUI: Stopped Spark web UI at http://192.168.114.67:4040
17/11/21 11:44:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/11/21 11:44:06 INFO MemoryStore: MemoryStore cleared
17/11/21 11:44:06 INFO BlockManager: BlockManager stopped
17/11/21 11:44:06 INFO BlockManagerMaster: BlockManagerMaster stopped
17/11/21 11:44:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/11/21 11:44:06 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/11/21 11:44:06 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/11/21 11:44:06 INFO SparkContext: Successfully stopped SparkContext
17/11/21 11:44:07 INFO ShutdownHookManager: Shutdown hook called
17/11/21 11:44:07 INFO ShutdownHookManager: Deleting directory C:\Users\lenovo\AppData\Local\Temp\spark-7877d903-f8f7-4efb-9e0c-7a11ac147153
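
The program above hard-codes its connection settings, including the password, in a series of reader.option calls. As a rough sketch of one alternative, the settings could be gathered into a single Map and handed to Spark in one call. The helper name JdbcSettings and the environment variable MYSQL_PASSWORD are assumptions made up for this illustration; they are not part of the original program.

```scala
// Hypothetical helper (not in the original program): gathers the JDBC
// settings into one immutable Map instead of repeated reader.option calls.
object JdbcSettings {

  // Builds the MySQL JDBC URL from its parts.
  def url(host: String, port: Int, database: String): String =
    s"jdbc:mysql://$host:$port/$database"

  // Assumption: the password comes from an environment variable named
  // MYSQL_PASSWORD (an invented name). `env` is a parameter so the lookup
  // can be exercised without touching the real environment.
  def options(
      host: String,
      port: Int,
      database: String,
      table: String,
      user: String,
      env: Map[String, String] = sys.env
  ): Map[String, String] = {
    val password = env.getOrElse(
      "MYSQL_PASSWORD",
      throw new IllegalArgumentException("MYSQL_PASSWORD is not set")
    )
    Map(
      "url"      -> url(host, port, database),
      "dbtable"  -> table,
      "driver"   -> "com.mysql.jdbc.Driver", // same driver class as above
      "user"     -> user,
      "password" -> password
    )
  }
}
```

With the sqlContext from the program above, the map could then be passed as sqlContext.read.format("jdbc").options(JdbcSettings.options("192.168.114.67", 3306, "user", "user", "root")).load(), since DataFrameReader.options accepts a Map of string keys and values. The helper itself depends only on the Scala standard library.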