## 1. Versions

hudi-0.12.0, spark-3.1.2, hadoop-3.3.0
## 2. Problem Description

### Goal

Using spark-shell, create some data and write it into a Hudi table, syncing to Hive at the same time so the data is mapped to a Hive table; in other words, a dual write to Hudi and Hive.

### Result

The Hive table is created successfully, but it is empty: queries return no rows. Querying through Spark SQL, however, works as expected.
## 3. spark-shell
- Hudi table name and data storage path:

```scala
// imports used below: quickstart write configs, write option keys, and SaveMode.Overwrite
import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.spark.sql.SaveMode._

val tableName = "hudi_trips_cow_spark_hudi_hive"
val basePath = "hdfs://node1:8020/tmp/hudi_trips_cow_spark_shell"
```
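- Create sample data (the original does not show how `df` is built; a minimal sketch using the Hudi quickstart `DataGenerator`, where the name `dataGen` and the record count of 10 are illustrative assumptions):

```scala
import scala.collection.JavaConversions._

// generate a handful of sample trip records and load them into a DataFrame
val dataGen = new DataGenerator
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
```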
- Write to the Hudi table and sync to Hive at the same time:

```scala
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option("hoodie.table.name", tableName).
  // Hive sync: register the table in the Hive metastore (HMS) at node1:9083
  option("hoodie.datasource.hive_sync.enable", "true").
  option("hoodie.datasource.hive_sync.mode", "hms").
  option("hoodie.datasource.hive_sync.metastore.uris", "thrift://node1:9083").
  option("hoodie.datasource.hive_sync.database", "default").
  option("hoodie.datasource.hive_sync.table", "spark_hudi_hive").
  // the synced partition fields should match how the Hudi table is actually partitioned
  option("hoodie.datasource.hive_sync.partition_fields", "a,b,c").
  option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.MultiPartKeysValueExtractor").
  mode(Overwrite).
  save(basePath)
```
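- As a sanity check (not in the original), the freshly written table can be read back through Spark's Hudi datasource; if this shows rows, the Hudi side of the dual write worked and the problem is on the Hive side:

```scala
// read the table back directly from its storage path via the Hudi datasource
val readBack = spark.read.format("hudi").load(basePath)
readBack.show(false)
```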
## 4. Querying Hive from IDEA

1. `show tables;` succeeds: the synced Hive table `spark_hudi_hive` is listed.
2. `desc formatted spark_hudi_hive;` succeeds: the table structure is shown, including `... Location: hdfs://node1:8020/tmp/hudi_trips_cow_spark_shell ...`.
3. `select * from spark_hudi_hive;` returns nothing: the table is empty from Hive's point of view.
## 5. Querying through Spark SQL from PyCharm

Running the same query through Spark SQL, `select * from spark_hudi_hive;`, does return the data. Spark resolves the table through Hudi's own datasource, whereas Hive relies on the Hudi input format being available on its side, so data that is visible in Spark but not in Hive usually points at the Hive-side Hudi integration rather than at the write itself.
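For completeness, the same check from the Scala side; this sketch assumes the Spark session was built with Hive metastore support (`enableHiveSupport`), as in the PyCharm setup:

```scala
// query the Hive-synced table through Spark SQL; Spark resolves the Hudi
// table with its own datasource, so the rows that Hive cannot see come back
spark.sql("select * from spark_hudi_hive").show(false)
```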