开发者社区> 问答> 正文

再emapreduce中使用spark访问java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver

已解决

16/04/28 16:46:19 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, emr-worker-1.cluster-18938): java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver

at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

展开
收起
数屏 2016-04-28 18:29:43 9406 0
2 条回答
写回答
取消 提交回答
  • 专注在大数据分布式计算、数据库及存储领域,拥有13+年大数据引擎、数据仓库、宽表引擎、平台研发经验,6年云智能大数据产品技术一号位经验,10年技术团队管理经验;云智能技术架构/云布道师; 研发阿里历代的大数据技术产品包括ODPS、DLA、ADB,最近五年主导宽表引擎研发、DLA、ADB湖仓研发;
    采纳回答

    具体的执行命令为:

    /opt/apps/spark-1.6.1-bin-hadoop2.6/bin/spark-submit --master yarn --deploy-mode client --driver-memory 4g --num-executors 2 --executor-memory 2g --executor-cores 2 --jars mysql-connector-java-5.1.38-bin.jar  --class xx.xxx.test xx-1.0-SNAPSHOT-jar-with-dependencies.jar 
    

    这个问题是spark1.6.1的一个bug,在1.6.0下是可以访问的。
    具体的issue为:https://issues.apache.org/jira/browse/SPARK-14162
    类似的问题在stackoverflow也出现了:http://stackoverflow.com/questions/36326066/working-with-jdbc-jar-in-pyspark

    目前修复的办法
    1、使用1.6.0的版本
    2、等待新的版本修复
    3、按照下面这么写

    df = sqlContext.read.format("jdbc").options(url="jdbc:postgresql://ip_address:port/db_name?user=myuser&password=mypasswd", dbtable="table_name",driver="com.mysql.jdbc.Driver").load()
    df.count()
    2019-07-17 18:49:58
    赞同 2 展开评论 打赏
  • 亲测,mysql-connector 6.0.4不好使。。。换5.1.38好了 。建议大家不要用高版本的,用封神说的版本好使!

    2019-07-17 18:49:58
    赞同 1 展开评论 打赏
问答排行榜
最热
最新

相关电子书

更多
Spring Cloud Alibaba - 重新定义 Java Cloud-Native 立即下载
The Reactive Cloud Native Arch 立即下载
JAVA开发手册1.5.0 立即下载

相关镜像