Environment
CentOS 7
Spark 3.3
Hive 3.3
Hadoop 3.2
Creating the PySpark job
An example job that accesses Hive metadata through the metastore:
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType


def convertCase(str):
    # Capitalize the first letter of each space-separated word
    # (defined here but not used by main below)
    resStr = ""
    arr = str.split(" ")
    for x in arr:
        resStr = resStr + x[0:1].upper() + x[1:len(x)] + " "
    return resStr


def main():
    # hive.metastore.uris: address of the Hive metastore service
    spark = SparkSession.builder \
        .appName('SparkByTry') \
        .config("hive.metastore.uris", "thrift://192.168.10.100:9083") \
        .enableHiveSupport() \
        .getOrCreate()
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", 'true')

    df = spark.createDataFrame([("Scala", 25000), ("Spark", 35000), ("PHP", 21000)])
    df.show()

    # Spark SQL over a temporary view
    df.createOrReplaceTempView("sample_table")
    df2 = spark.sql("SELECT _1,_2 FROM sample_table")
    df2.show()

    # Create a Hive table & query it.
    spark.table("sample_table").write.saveAsTable("sample_hive_table")
    df3 = spark.sql("SELECT _1,_2 FROM sample_hive_table")
    df3.show()

    dbs = spark.catalog.listDatabases()
    print(dbs)


if __name__ == '__main__':
    main()

# 1. Copy the .py file from the host machine to the Hadoop cluster VM:
#    scp try_pyspark.py root@192.168.10.100:/opt/spark-3.3.0-bin-hadoop3/app/
# 2. On the Hadoop cluster VM, submit the job to YARN:
#    spark-submit --master yarn --deploy-mode client --executor-cores 1 try_pyspark.py
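After a successful run you can confirm the table really landed in the metastore rather than only in the session's temporary catalog. A minimal follow-up sketch, assuming the same metastore URI as above:

from pyspark.sql import SparkSession

# Reconnect to the same Hive metastore and read the saved table back
spark = SparkSession.builder \
    .appName('VerifyHiveTable') \
    .config("hive.metastore.uris", "thrift://192.168.10.100:9083") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SHOW TABLES").show()          # sample_hive_table should be listed
spark.table("sample_hive_table").show()  # readable from a fresh session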
Problems encountered while submitting the job
Problem 1
[hadoop@dev app]$ spark-submit --master yarn --deploy-mode cluster --executor-cores 1 try_pyspark.py
22/07/25 02:08:59 WARN Utils: Your hostname, dev resolves to a loopback address: 127.0.0.1; using 192.168.10.100 instead (on interface ens33)
22/07/25 02:08:59 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/07/25 02:08:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/07/25 02:08:59 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
22/07/25 02:09:00 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
22/07/25 02:09:00 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
22/07/25 02:09:00 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/07/25 02:09:02 INFO Client: Uploading resource file:/tmp/spark-dcc409bb-6353-4726-b1dd-c9b4950f0c26/__spark_libs__7142235299947056098.zip -> hdfs://localhost:9000/user/hadoop/.sparkStaging/application_1658714171526_0005/__spark_libs__7142235299947056098.zip
...
22/07/25 02:09:05 INFO Client: Submitting application application_1658714171526_0005 to ResourceManager
22/07/25 02:09:06 INFO Client: Application report for application_1658714171526_0005 (state: ACCEPTED)
...
22/07/25 02:09:11 INFO Client: Application report for application_1658714171526_0005 (state: FAILED)
22/07/25 02:09:11 INFO Client:
	 client token: N/A
	 diagnostics: Application application_1658714171526_0005 failed 2 times due to AM Container for appattempt_1658714171526_0005_000002 exited with exitCode: 127
Failing this attempt. Diagnostics: [2022-07-25 02:09:10.245] Exception from container-launch.
Container id: container_1658714171526_0005_02_000001
Exit code: 127
[2022-07-25 02:09:10.247] Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of stderr:
/bin/bash: /bin/java: No such file or directory
For more detailed output, check the application tracking page: http://localhost:8088/cluster/app/application_1658714171526_0005 Then click on links to logs of each attempt. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1658729345928
	 final status: FAILED
	 tracking URL: http://localhost:8088/cluster/app/application_1658714171526_0005
	 user: hadoop
22/07/25 02:09:11 INFO Client: Deleted staging directory hdfs://localhost:9000/user/hadoop/.sparkStaging/application_1658714171526_0005
22/07/25 02:09:11 ERROR Client: Application diagnostics message: Application application_1658714171526_0005 failed 2 times due to AM Container for appattempt_1658714171526_0005_000002 exited with exitCode: 127 ...
Exception in thread "main" org.apache.spark.SparkException: Application application_1658714171526_0005 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1342)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
	...
22/07/25 02:09:11 INFO ShutdownHookManager: Shutdown hook called
Cause:
JAVA_HOME is not configured in $HADOOP_HOME/etc/hadoop/yarn-env.sh, so the container launch script cannot find java (hence the /bin/bash: /bin/java: No such file or directory above).
Fix:
Edit yarn-env.sh and add the JAVA_HOME setting shown below.
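For example (the JDK path below is a placeholder; point it at your actual installation), append this line to $HADOOP_HOME/etc/hadoop/yarn-env.sh and restart the YARN daemons so it takes effect:

# Hypothetical JDK location; adjust to where your JDK is installed
export JAVA_HOME=/usr/local/jdk1.8.0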
Problem 2
[hadoop@dev app]$ spark-submit --master yarn --deploy-mode cluster --executor-cores 1 try_pyspark.py
22/07/25 02:34:13 WARN Utils: Your hostname, dev resolves to a loopback address: 127.0.0.1; using 192.168.10.100 instead (on interface ens33)
...
22/07/25 02:34:18 INFO Client: Submitting application application_1658730416274_0003 to ResourceManager
22/07/25 02:34:19 INFO Client: Application report for application_1658730416274_0003 (state: ACCEPTED)
...
22/07/25 02:34:35 INFO Client: Application report for application_1658730416274_0003 (state: FAILED)
22/07/25 02:34:35 INFO Client:
	 client token: N/A
	 diagnostics: Application application_1658730416274_0003 failed 2 times due to AM Container for appattempt_1658730416274_0003_000002 exited with exitCode: 13
Failing this attempt. Diagnostics: [2022-07-25 02:34:35.030] Exception from container-launch.
Container id: container_1658730416274_0003_02_000001
Exit code: 13
[2022-07-25 02:34:35.032] Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of stderr:
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
For more detailed output, check the application tracking page: http://localhost:8088/cluster/app/application_1658730416274_0003 Then click on links to logs of each attempt. Failing the application.
	 final status: FAILED
	 tracking URL: http://localhost:8088/cluster/app/application_1658730416274_0003
	 user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1658730416274_0003 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1342)
	...
22/07/25 02:34:35 INFO ShutdownHookManager: Shutdown hook called
Main problem
Application application_1658730416274_0003 failed 2 times due to AM Container for appattempt_1658730416274_0003_000002 exited with exitCode: 13
Fix
Change
spark-submit --master yarn --deploy-mode cluster --executor-cores 1 try_pyspark.py
to:
spark-submit --master yarn --deploy-mode client --executor-cores 1 try_pyspark.py
In client mode the driver runs on the submitting machine, so failures in the Python code surface directly in the console instead of being buried in the AM container logs.
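If you do want to keep cluster mode, the AM container's full output usually reveals why it exited with code 13. A quick way to pull it (assuming YARN log aggregation is enabled; the application ID is the one from the run above):

# Fetch all container logs for the failed attempt
yarn logs -applicationId application_1658730416274_0003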
Problem 3
[hadoop@dev app]$ spark-submit --master yarn --deploy-mode client --executor-cores 1 try_pyspark.py
22/07/25 02:34:53 WARN Utils: Your hostname, dev resolves to a loopback address: 127.0.0.1; using 192.168.10.100 instead (on interface ens33)
22/07/25 02:34:54 INFO SparkContext: Running Spark version 3.3.0
...
22/07/25 02:35:00 INFO Client: Submitting application application_1658730416274_0004 to ResourceManager
22/07/25 02:35:01 INFO Client: Application report for application_1658730416274_0004 (state: ACCEPTED)
...
22/07/25 02:35:08 INFO Client: Application report for application_1658730416274_0004 (state: RUNNING)
22/07/25 02:35:08 INFO YarnClientSchedulerBackend: Application application_1658730416274_0004 has started running.
...
22/07/25 02:35:08 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: /tmp/spark-events/application_1658730416274_0004.inprogress (Permission denied)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
	at org.apache.spark.deploy.history.EventLogFileWriter.initLogFile(EventLogFileWriters.scala:95)
	at org.apache.spark.deploy.history.SingleEventLogFileWriter.start(EventLogFileWriters.scala:223)
	at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:83)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:612)
	...
22/07/25 02:35:08 INFO SparkUI: Stopped Spark web UI at http://dev:4040
22/07/25 02:35:08 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/opt/spark-3.3.0-bin-hadoop3/app/try_pyspark.py", line 57, in <module>
    main()
  File "/opt/spark-3.3.0-bin-hadoop3/app/try_pyspark.py", line 19, in main
    .config("hive.metastore.uris", "thrift:192.168.10.100:9083") \
  File "/opt/spark-3.3.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/session.py", line 269, in getOrCreate
  ...
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: /tmp/spark-events/application_1658730416274_0004.inprogress (Permission denied)
	...
22/07/25 02:35:09 INFO ShutdownHookManager: Shutdown hook called
Core problem
ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: /tmp/spark-events/application_1658730416274_0004.inprogress (Permission denied)
Fix
As root, change the owner and group of the /tmp/spark-events/ directory to the hadoop user:
chown -R hadoop:hadoop /tmp/spark-events
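An alternative sketch, if you would rather avoid local /tmp permissions entirely: point Spark event logging at an HDFS directory owned by the submitting user (the hdfs://localhost:9000 namenode address is taken from the logs above) in $SPARK_HOME/conf/spark-defaults.conf:

# spark-defaults.conf: write event logs to HDFS instead of local /tmp
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://localhost:9000/spark-events

# Create the directory first, as the hadoop user:
#   hdfs dfs -mkdir -p /spark-events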
Problem 4
22/07/25 03:04:22 INFO HiveConf: Found configuration file file:/opt/hive-3.3.3/conf/hive-site.xml
22/07/25 03:04:22 INFO HiveUtils: Initializing HiveMetastoreConnection version 2.3.9 using Spark classes.
22/07/25 03:04:23 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.9) is file:/opt/spark-3.3.0-bin-hadoop3/app/spark-warehouse
22/07/25 03:04:26 WARN HiveClientImpl: HiveClient got thrift exception, destroying client and retrying (0 tries remaining)
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1567)
	at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1552)
	at org.apache.spark.sql.hive.client.Shim_v0_12.databaseExists(HiveShim.scala:609)
	...
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1742)
	...
Caused by: MetaException(message:Could not connect to metastore using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Cannot open null host.
	at org.apache.thrift.transport.TSocket.open(TSocket.java:210)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:245)
	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
	...
) ... 46 more
22/07/25 03:04:27 WARN HiveClientImpl: Deadline exceeded
Traceback (most recent call last):
  File "/opt/spark-3.3.0-bin-hadoop3/app/try_pyspark.py", line 59, in <module>
    main()
  File "/opt/spark-3.3.0-bin-hadoop3/app/try_pyspark.py", line 34, in main
    spark.table("sample_table").write.saveAsTable("sample_hive_table")
  File "/opt/spark-3.3.0-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1041, in saveAsTable
  ...
pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
22/07/25 03:04:27 INFO SparkContext: Invoking stop() from shutdown hook
...
22/07/25 03:04:27 INFO ShutdownHookManager: Shutdown hook called
Main problem:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Fix
1. In the PySpark program, configure access to the Hive metastore service, and make sure the URI is well formed: the traceback above shows thrift:192.168.10.100:9083 (missing the //), which is what produces the "Cannot open null host" thrift error.
Note: 192.168.10.100 is the IP address of the server where Hive runs.
# hive.metastore.uris: address of the Hive metastore service
spark = SparkSession.builder \
.appName('SparkByTry') \
.config("hive.metastore.uris", "thrift://192.168.10.100:9083") \
.enableHiveSupport() \
.getOrCreate()
2. In the hive-site.xml file under $HIVE_HOME/conf, configure hive.metastore.uris:
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
3. Restart the hiveserver2 and metastore services:
# Start the Hive metastore
nohup /opt/hive-3.3.3/bin/hive --service metastore 1>/dev/null 2>&1 &
# Start hiveserver2 in the background
nohup /opt/hive-3.3.3/bin/hiveserver2 1>/dev/null 2>&1 &
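Before resubmitting, a quick sanity check (a sketch, assuming the default metastore port) confirms the metastore is actually listening on 9083:

# Verify the metastore thrift port is open
ss -lnt | grep 9083
# Then rerun the job
spark-submit --master yarn --deploy-mode client --executor-cores 1 try_pyspark.py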