Retrieving Metadata from the Hive Metastore via the Java API - Alibaba Cloud Developer Community

Published 2022-01-09

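In the article "Introduction to Hive Metastore 3.0", we noted that starting with Hive 3.0.0 the metastore is also shipped as a standalone service, so that it can act as the metadata management center for processing engines such as Presto.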

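Using the Java API as an example, this article shows how to retrieve catalog, database, and table information from a Hive standalone metastore (HMS). With this approach, the metadata center can be monitored and managed programmatically.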

First, add the following dependency to the Maven project (Hive 3.1.2 is used here as an example):

    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-standalone-metastore</artifactId>
      <version>3.1.2</version>
    </dependency>

Next, create an IMetaStoreClient to connect to the HMS as follows:

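    /**
     * Initializes a connection to the HMS.
     * @param conf org.apache.hadoop.conf.Configuration carrying the HMS connection settings
     * @return IMetaStoreClient
     * @throws MetaException if the client cannot be created
     */
    public static IMetaStoreClient init(Configuration conf) throws MetaException {
        try {
            // The second argument is allowEmbedded; false means a remote metastore is required.
            return RetryingMetaStoreClient.getProxy(conf, false);
        } catch (MetaException e) {
            LOGGER.error("Failed to connect to the HMS", e);
            throw e;
        }
    }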
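
`RetryingMetaStoreClient.getProxy` returns a dynamic proxy that implements `IMetaStoreClient` and retries calls that fail. The following is only a toy illustration of that pattern built on `java.lang.reflect.Proxy`. It is not Hive's implementation, and `Greeter`, `RetryingProxy`, `RetryDemo`, and the attempt count are all invented for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Toy service interface, invented for this example.
interface Greeter {
    String greet(String name);
}

// Wraps an interface implementation and retries failed calls a fixed number of times.
class RetryingProxy implements InvocationHandler {
    private final Object target;
    private final int maxAttempts;

    private RetryingProxy(Object target, int maxAttempts) {
        this.target = target;
        this.maxAttempts = maxAttempts;
    }

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new RetryingProxy(target, maxAttempts));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Throwable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                last = e.getCause(); // the exception thrown by the wrapped method
            }
        }
        throw last; // every attempt failed: surface the last error
    }
}

class RetryDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds, to mimic a flaky connection.
        Greeter flaky = name -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("connection lost");
            }
            return "hello " + name;
        };
        Greeter greeter = RetryingProxy.wrap(Greeter.class, flaky, 5);
        System.out.println(greeter.greet("hms") + " after " + calls[0] + " calls");
    }
}
```

The caller only sees the interface, which is why the rest of this article can treat the returned object as a plain `IMetaStoreClient`.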

The HMS connection information can be supplied in two ways: through a hive-site.xml configuration file, or by setting the "hive.metastore.uris" parameter directly, as shown below:

        Configuration conf = new Configuration();
        // Option 1: supply the HMS connection information through "hive.metastore.uris"
        conf.set("hive.metastore.uris", "thrift://192.168.1.3:9083");

        // Option 2: supply the HMS connection information through hive-site.xml (looked up on the classpath)
        // conf.addResource("hive-site.xml");
        IMetaStoreClient client = HMSClient.init(conf);
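
`hive.metastore.uris` accepts a comma-separated list of Thrift URIs, which is how several metastore instances are usually listed. Below is a minimal sketch for sanity-checking such a value before passing it to `Configuration`, using only the JDK. The class `MetastoreUris` and its `parse` method are hypothetical helpers, not part of Hive, and the second address is made up.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hive): validates a hive.metastore.uris value.
class MetastoreUris {

    /** Splits a comma-separated list and checks that each entry looks like thrift://host:port. */
    static List<URI> parse(String value) {
        List<URI> uris = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            URI uri;
            try {
                uri = new URI(trimmed);
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Malformed metastore URI: " + trimmed, e);
            }
            boolean isThrift = "thrift".equalsIgnoreCase(uri.getScheme());
            if (!isThrift || uri.getHost() == null || uri.getPort() == -1) {
                throw new IllegalArgumentException("Expected thrift://host:port but got: " + trimmed);
            }
            uris.add(uri);
        }
        if (uris.isEmpty()) {
            throw new IllegalArgumentException("No metastore URI given");
        }
        return uris;
    }

    public static void main(String[] args) {
        // 192.168.1.4 is a made-up second instance, for illustration only.
        for (URI uri : parse("thrift://192.168.1.3:9083, thrift://192.168.1.4:9083")) {
            System.out.println(uri.getHost() + ":" + uri.getPort());
        }
    }
}
```

Failing fast on a malformed value tends to give a clearer error than a connection failure surfacing later from the client.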

Once the client is connected to the HMS, the following calls retrieve catalog, database, and table information:

        System.out.println("---------- All catalogs ----------");
        client.getCatalogs().forEach(System.out::println);

        System.out.println("---------- Description of catalog hive ----------");
        System.out.println(client.getCatalog("hive").toString());

        System.out.println("---------- All databases in catalog hive ----------");
        client.getAllDatabases("hive").forEach(System.out::println);

        System.out.println("---------- Description of database hive_storage in catalog hive ----------");
        System.out.println(client.getDatabase("hive", "hive_storage"));

        System.out.println("---------- All tables in database hive_storage of catalog hive ----------");
        client.getTables("hive", "hive_storage", "*").forEach(System.out::println);

        System.out.println("---------- Description of table sample_table_1 in database hive_storage of catalog hive ----------");
        System.out.println(client.getTable("hive", "hive_storage", "sample_table_1").toString());
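
For monitoring, these calls are typically combined into a walk over the whole hierarchy. The sketch below shows such a walk against a small hypothetical interface, `MetadataSource`, which mirrors the `getAllDatabases(catalog)` and `getTables(catalog, database, pattern)` calls used above so that the example runs without a metastore. In a real program the two methods would presumably delegate to `IMetaStoreClient` and deal with the checked exceptions its methods declare. The in-memory data reuses names from this article; which tables the `default` and `hive` databases contain is invented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface mirroring two of the IMetaStoreClient calls shown above.
interface MetadataSource {
    List<String> getAllDatabases(String catalog);
    List<String> getTables(String catalog, String database, String pattern);
}

class MetadataInventory {

    /** Returns database name -> table names for one catalog, in listing order. */
    static Map<String, List<String>> collect(MetadataSource source, String catalog) {
        Map<String, List<String>> inventory = new LinkedHashMap<>();
        for (String database : source.getAllDatabases(catalog)) {
            inventory.put(database, source.getTables(catalog, database, "*"));
        }
        return inventory;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a metastore; the empty databases are invented data.
        MetadataSource stub = new MetadataSource() {
            @Override
            public List<String> getAllDatabases(String catalog) {
                return Arrays.asList("default", "hive", "hive_storage");
            }

            @Override
            public List<String> getTables(String catalog, String database, String pattern) {
                return "hive_storage".equals(database)
                        ? Collections.singletonList("sample_table_1")
                        : Collections.<String>emptyList();
            }
        };
        collect(stub, "hive").forEach((db, tables) -> System.out.println(db + " -> " + tables));
    }
}
```

Keeping the traversal behind a narrow interface also makes it straightforward to unit-test the monitoring logic without a running HMS.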

For more of the available methods, see the HiveMetaStoreClient.java class.

The complete implementation follows.

pom.xml of the Maven project

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.zh.ch.bigdata.hms</groupId>
  <artifactId>hms-client</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>hms-client</name>
  <!-- FIXME change it to the project's website -->
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-standalone-metastore</artifactId>
      <version>3.1.2</version>
    </dependency>
  </dependencies>

  <build>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>8</source>
          <target>8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

HMSClient.java test code

package com.zh.ch.bigdata.hms;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.RetryingMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


public class HMSClient {

    private static final Logger LOGGER = LoggerFactory.getLogger(HMSClient.class);

    /**
     * Initializes a connection to the HMS.
     * @param conf org.apache.hadoop.conf.Configuration carrying the HMS connection settings
     * @return IMetaStoreClient
     * @throws MetaException if the client cannot be created
     */
    public static IMetaStoreClient init(Configuration conf) throws MetaException {
        try {
            // The second argument is allowEmbedded; false means a remote metastore is required.
            return RetryingMetaStoreClient.getProxy(conf, false);
        } catch (MetaException e) {
            LOGGER.error("Failed to connect to the HMS", e);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        // Option 1: supply the HMS connection information through "hive.metastore.uris"
        conf.set("hive.metastore.uris", "thrift://192.168.1.3:9083");

        // Option 2: supply the HMS connection information through hive-site.xml (looked up on the classpath)
        // conf.addResource("hive-site.xml");
        IMetaStoreClient client = HMSClient.init(conf);

        try {
            System.out.println("---------- All catalogs ----------");
            client.getCatalogs().forEach(System.out::println);

            System.out.println("---------- Description of catalog hive ----------");
            System.out.println(client.getCatalog("hive").toString());

            System.out.println("---------- All databases in catalog hive ----------");
            client.getAllDatabases("hive").forEach(System.out::println);

            System.out.println("---------- Description of database hive_storage in catalog hive ----------");
            System.out.println(client.getDatabase("hive", "hive_storage"));

            System.out.println("---------- All tables in database hive_storage of catalog hive ----------");
            client.getTables("hive", "hive_storage", "*").forEach(System.out::println);

            System.out.println("---------- Description of table sample_table_1 in database hive_storage of catalog hive ----------");
            System.out.println(client.getTable("hive", "hive_storage", "sample_table_1").toString());
        } finally {
            // Release the connection to the metastore
            client.close();
        }
    }
}

Output

---------- All catalogs ----------
hive
---------- Description of catalog hive ----------
Catalog(name:hive, description:Default catalog for Hive, locationUri:file:/user/hive/warehouse)
---------- All databases in catalog hive ----------
default
hive
hive_storage
---------- Description of database hive_storage in catalog hive ----------
Database(name:hive_storage, description:null, locationUri:s3a://hive-storage/, parameters:{}, ownerName:root, ownerType:USER, catalogName:hive)
---------- All tables in database hive_storage of catalog hive ----------
sample_table_1
---------- Description of table sample_table_1 in database hive_storage of catalog hive ----------
Table(tableName:sample_table_1, dbName:hive_storage, owner:root, createTime:1641540923, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:string, comment:null)], location:s3a://hive-storage/sample_table_1, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:sample_table_1, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{presto_query_id=20220107_073521_00018_favj9, totalSize=366, numRows=1, rawDataSize=22, COLUMN_STATS_ACCURATE={"COLUMN_STATS":{"col1":"true","col2":"true"}}, numFiles=1, transient_lastDdlTime=1641540923, auto.purge=false, STATS_GENERATED_VIA_STATS_TASK=workaround for potential lack of HIVE-12730, presto_version=366}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
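
The `createTime` and `transient_lastDdlTime` values in this output are Unix timestamps in seconds. A small sketch for turning one into a readable instant with `java.time` follows; the class name `HmsTime` is made up for the example.

```java
import java.time.Instant;

// Hypothetical helper (not part of Hive) for reading metastore timestamps.
class HmsTime {

    /** Converts a metastore timestamp (seconds since the Unix epoch) to an Instant. */
    static Instant toInstant(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds);
    }

    public static void main(String[] args) {
        // createTime of sample_table_1, taken from the output above
        System.out.println(toInstant(1641540923L)); // prints 2022-01-07T07:35:23Z
    }
}
```

The result lines up with the `presto_query_id` prefix `20220107_073521` in the table parameters, which was issued about two seconds earlier.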
