
Using Spark on E-MapReduce fails with java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver

数屏 · 2016-04-28 18:29:43 · 5475 views

16/04/28 16:46:19 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, emr-worker-1.cluster-18938): java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver

at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Tags: Distributed Computing · Spark
All answers (2)
  • 封神
    2019-07-17 18:49:58
    Accepted answer

    The exact submit command was:

    /opt/apps/spark-1.6.1-bin-hadoop2.6/bin/spark-submit --master yarn --deploy-mode client --driver-memory 4g --num-executors 2 --executor-memory 2g --executor-cores 2 --jars mysql-connector-java-5.1.38-bin.jar  --class xx.xxx.test xx-1.0-SNAPSHOT-jar-with-dependencies.jar 
    

    This is a bug in Spark 1.6.1; the same JDBC access works on 1.6.0.
    The JIRA issue is https://issues.apache.org/jira/browse/SPARK-14162
    A similar problem was reported on Stack Overflow: http://stackoverflow.com/questions/36326066/working-with-jdbc-jar-in-pyspark
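    For context, a plain JDBC read without an explicit driver option is enough to trigger the executor-side failure on 1.6.1. A minimal reproduction sketch in PySpark follows (host, database, and table names are placeholders, not from the original post; sqlContext is the one predefined in the pyspark shell):

    # Minimal reproduction sketch for SPARK-14162 (placeholder connection details).
    # Without the driver option, executors on Spark 1.6.1 fail with
    # "Did not find registered driver with class com.mysql.jdbc.Driver".
    df = sqlContext.read.format("jdbc").options(
        url="jdbc:mysql://ip_address:port/db_name?user=myuser&password=mypasswd",
        dbtable="table_name",
    ).load()
    df.count()  # the action that launches executor tasks and surfaces the error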

    Current workarounds:
    1. Use Spark 1.6.0.
    2. Wait for a release that includes the fix.
    3. Pass the driver class explicitly, as shown below:

    df = sqlContext.read.format("jdbc").options(url="jdbc:mysql://ip_address:port/db_name?user=myuser&password=mypasswd", dbtable="table_name", driver="com.mysql.jdbc.Driver").load()
    df.count()
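    The same explicit driver option also helps on the write path; a minimal sketch, assuming a hypothetical output table (URL and names are placeholders, not from the original answer):

    # Hedged sketch (hypothetical URL/table names): passing "driver" in the
    # properties dict avoids the same executor-side driver lookup failure
    # when writing a DataFrame back to MySQL.
    props = {"user": "myuser", "password": "mypasswd", "driver": "com.mysql.jdbc.Driver"}
    df.write.jdbc(url="jdbc:mysql://ip_address:port/db_name", table="table_name_out", mode="append", properties=props)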
  • nox1234
    2019-07-17 18:49:58

    Tested it myself: mysql-connector 6.0.4 does not work, but switching to 5.1.38 fixed it. I'd suggest avoiding the newer connector versions; the version 封神 recommended works.
