
Using HBase on Microsoft Azure (HDInsight)



What is HBase?

HBase is a low-latency NoSQL database that allows online transactional processing of big data. HBase is offered as a managed cluster integrated into the Azure environment. The clusters are configured to store data directly in Azure Blob storage, which provides low latency and increased elasticity in performance/cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of end points, and to analyze this data with Hadoop jobs. For more information on HBase and the scenarios it can be used for, see HDInsight HBase overview.

Prerequisites

Before you begin this tutorial, you must have an Azure subscription; the final section also requires a workstation with Visual Studio installed.

Provision an HBase cluster on the Azure portal

This section describes how to provision an HBase cluster using the Azure Management portal.

To provision an HDInsight cluster in the Azure Management portal

  1. Sign in to the Azure Management Portal.
  2. Click NEW on the lower left, and then click DATA SERVICES, HDINSIGHT, HBASE.
  3. Enter CLUSTER NAME, CLUSTER SIZE, CLUSTER USER PASSWORD, and STORAGE ACCOUNT.

    (Screenshot: choosing an HBase cluster type and entering the cluster login credentials.)

  4. Click on the check icon on the lower left to create the HBase cluster.

Create an HBase sample table from the HBase shell

This section describes how to enable Remote Desktop Protocol (RDP) access to the cluster, open the HBase shell, and then use the shell to create an HBase sample table, add rows, and list the rows in the table.

It assumes you have completed the procedure outlined in the first section, and so have already successfully created an HBase cluster.

To enable the RDP connection to the HBase cluster

  1. From the Management portal, click HDINSIGHT from the left to view the list of the existing clusters.
  2. Click the HBase cluster where you want to open HBase Shell.
  3. Click CONFIGURATION from the top.
  4. Click ENABLE REMOTE from the bottom.
  5. Enter the RDP user name and password. The user name must be different from the cluster user name you used when provisioning the cluster. The EXPIRES ON date can be up to seven days from today.
  6. Click the check on the lower right to enable remote desktop.
  7. After RDP is enabled, click CONNECT at the bottom of the CONFIGURATION tab, and follow the instructions.

To open the HBase Shell

  1. Within your RDP session, click on the Hadoop Command Line shortcut located on the desktop.

  2. Change the folder to the HBase home directory:

    cd %HBASE_HOME%\bin
  3. Open the HBase shell:

    hbase shell

To create a sample table, add data and retrieve the data

  1. Create a sample table:

    create 'sampletable', 'cf1'
  2. Add a row to the sample table:

    put 'sampletable', 'row1', 'cf1:col1', 'value1'
  3. List the rows in the sample table:

    scan 'sampletable'
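To make the shell commands above concrete, here is a minimal Python sketch of HBase's data model: a table maps row keys to cells addressed by column family and qualifier, and scans return rows in sorted row-key order. The class and its behavior are an illustration only; real HBase also versions every cell by timestamp.

```python
# A toy, in-memory stand-in for an HBase table with declared column families.
# Mirrors: create 'sampletable', 'cf1' / put ... / scan 'sampletable'

class ToyHBaseTable:
    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        self.rows.setdefault(row, {})[column] = value

    def scan(self):
        # HBase scans return rows ordered lexicographically by row key.
        return sorted(self.rows.items())

table = ToyHBaseTable("sampletable", ["cf1"])
table.put("row1", "cf1:col1", "value1")
print(table.scan())  # [('row1', {'cf1:col1': 'value1'})]
```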

Check cluster status in the HBase WebUI

HBase also ships with a WebUI that helps you monitor your cluster, for example by providing request statistics and information about regions. On the HBase cluster, you can find the WebUI at the address of the ZooKeeper node:

http://zookeepernode:60010/master-status

In a high-availability (HA) cluster, this page links to the currently active HBase master node, which hosts the WebUI.
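As a small sketch, the WebUI address above can be assembled from the ZooKeeper node's host name; the helper below is hypothetical, and port 60010 is the classic HBase master info port assumed from the URL in the text.

```python
# Hypothetical helper: build the HBase master-status WebUI address
# from a host name, matching the URL shown above.

def master_status_url(zookeeper_host, port=60010):
    return "http://{0}:{1}/master-status".format(zookeeper_host, port)

print(master_status_url("zookeepernode"))
# http://zookeepernode:60010/master-status
```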

Bulk load a sample table

  1. Create samplefile1.txt containing the following data, and upload it to Azure Blob storage as /tmp/samplefile1.txt:

    row1    c1  c2
    row2    c1  c2
    row3    c1  c2
    row4    c1  c2
    row5    c1  c2
    row6    c1  c2
    row7    c1  c2
    row8    c1  c2
    row9    c1  c2
    row10   c1  c2
  2. Change the folder to the HBase home directory:

    cd %HBASE_HOME%\bin
  3. Execute ImportTsv:

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,a:b,a:c" -Dimporttsv.bulk.output=/tmpOutput sampletable2 /tmp/samplefile1.txt
  4. Load the output from the prior command into HBase:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmpOutput sampletable2
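ImportTsv expects one tab-separated line per row: the row key first, then one field per column named in -Dimporttsv.columns (here a:b and a:c). The following sketch generates samplefile1.txt in that layout; the file name and row count match the sample above, but the generator itself is just an illustration.

```python
# Generate the tab-separated input file in the layout ImportTsv expects:
# <row key> \t <value for a:b> \t <value for a:c>

def make_sample_tsv(path, n_rows=10):
    with open(path, "w") as f:
        for i in range(1, n_rows + 1):
            f.write("row{0}\tc1\tc2\n".format(i))

make_sample_tsv("samplefile1.txt")
with open("samplefile1.txt") as f:
    lines = f.read().splitlines()
print(len(lines), lines[0].split("\t"))  # 10 ['row1', 'c1', 'c2']
```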

Use Hive to query an HBase table

Now that you have provisioned an HBase cluster and created an HBase table, you can query it by using Hive. This section creates a Hive table that maps to the HBase table, and then uses it to query the data in your HBase table.

To open cluster dashboard

  1. Sign in to the Azure Management Portal.
  2. Click HDINSIGHT in the left pane. You will see a list of existing clusters, including the one you created in the last section.
  3. Click the cluster name where you want to run the Hive job.
  4. Click QUERY CONSOLE at the bottom of the page to open the cluster dashboard in a new browser tab.
  5. Enter the Hadoop user account username and password. The default username is admin; the password is the one you entered during the provisioning process.
  6. Click Hive Editor at the top. The Hive Editor looks like the following:

    (Screenshot: HDInsight cluster dashboard.)

To run Hive queries

  1. Enter the HiveQL script below into the Hive Editor, and then click SUBMIT to create a Hive table that maps to the HBase table. Make sure you have already created the sampletable table in HBase by using the HBase shell, as described earlier, before executing this statement.

    CREATE EXTERNAL TABLE hbasesampletable(rowkey STRING, col1 STRING, col2 STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:col1,cf1:col2')
    TBLPROPERTIES ('hbase.table.name' = 'sampletable');

    Wait until the Status is updated to Completed.

  2. Enter the HiveQL script below into the Hive Editor, and then click SUBMIT. The query counts the rows in the HBase table:

    SELECT count(*) FROM hbasesampletable;
  3. To retrieve the results of the Hive query, click the View Details link in the Job Session window when the job finishes. The job output will be 1, because you put only one record into the HBase table.
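The hbase.columns.mapping property in the CREATE EXTERNAL TABLE statement pairs Hive columns with HBase columns positionally, with ':key' denoting the row key. The parser below is a sketch of that pairing rule, not Hive's actual implementation.

```python
# Sketch: pair Hive column names with the comma-separated entries of
# hbase.columns.mapping, matched by position (':key' is the row key).

def pair_columns(hive_columns, mapping):
    hbase_columns = mapping.split(",")
    if len(hive_columns) != len(hbase_columns):
        raise ValueError("column counts must match")
    return list(zip(hive_columns, hbase_columns))

pairs = pair_columns(
    ["rowkey", "col1", "col2"],
    ":key,cf1:col1,cf1:col2",
)
print(pairs)
# [('rowkey', ':key'), ('col1', 'cf1:col1'), ('col2', 'cf1:col2')]
```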

To browse the output file

  1. From Query Console, click File Browser from the top.
  2. Click the Azure Storage account used as the default file system for the HBase cluster.
  3. Click the HBase cluster name. The default Azure storage account container uses the cluster name.
  4. Click user.
  5. Click admin. This is the Hadoop user name.
  6. Click the job name with the Last Modified time matching the time when the SELECT Hive query ran.
  7. Click stdout. Save the file and open it with Notepad. The output will be 1.

    (Screenshot: the File Browser in the HDInsight Query Console.)

Use the HBase REST Client Library for .NET (C#) to create an HBase table and retrieve data from the table

To use the HBase .NET SDK, you need the Microsoft HBase REST Client Library for .NET, which is developed on GitHub and distributed as a NuGet package. The following procedure installs it into a new project.

  1. Create a new C# Visual Studio Windows Desktop Console application.
  2. Open the NuGet Package Manager Console by clicking the TOOLS menu, then NuGet Package Manager, then Package Manager Console.
  3. Run the following NuGet command in the console:

    Install-Package Microsoft.HBase.Client

  4. Add the following using statements at the top of the file (System and System.Text, if not already present, are needed for the Uri, BitConverter, Console, and Encoding types used below):

    using System;
    using System.Text;
    using Microsoft.HBase.Client;
    using org.apache.hadoop.hbase.rest.protobuf.generated;
  5. Replace the Main function with the following:

    static void Main(string[] args)
    {
        string clusterURL = "https://<yourHBaseClusterName>.azurehdinsight.net";
        string hadoopUsername= "<yourHadoopUsername>";
        string hadoopUserPassword = "<yourHadoopUserPassword>";
    
        string hbaseTableName = "sampleHbaseTable";
    
        // Create a new instance of an HBase client.
        ClusterCredentials creds = new ClusterCredentials(new Uri(clusterURL), hadoopUsername, hadoopUserPassword);
        HBaseClient hbaseClient = new HBaseClient(creds);
    
        // Retrieve the cluster version
        var version = hbaseClient.GetVersion();
        Console.WriteLine("The HBase cluster version is " + version);
    
        // Create a new HBase table.
        TableSchema testTableSchema = new TableSchema();
        testTableSchema.name = hbaseTableName;
        testTableSchema.columns.Add(new ColumnSchema() { name = "d" });
        testTableSchema.columns.Add(new ColumnSchema() { name = "f" });
        hbaseClient.CreateTable(testTableSchema);
    
        // Insert data into the HBase table.
        string testKey = "content";
        string testValue = "the force is strong in this column";
        CellSet cellSet = new CellSet();
        CellSet.Row cellSetRow = new CellSet.Row { key = Encoding.UTF8.GetBytes(testKey) };
        cellSet.rows.Add(cellSetRow);
    
        Cell value = new Cell { column = Encoding.UTF8.GetBytes("d:starwars"), data = Encoding.UTF8.GetBytes(testValue) };
        cellSetRow.values.Add(value);
        hbaseClient.StoreCells(hbaseTableName, cellSet);
    
        // Retrieve a cell by its key.
        cellSet = hbaseClient.GetCells(hbaseTableName, testKey);
        Console.WriteLine("The data with the key '" + testKey + "' is: " + Encoding.UTF8.GetString(cellSet.rows[0].values[0].data));
        // with the previous insert, it should yield: "the force is strong in this column"
    
        //Scan over rows in a table. Assume the table has integer keys and you want data between keys 25 and 35. 
        Scanner scanSettings = new Scanner()
        {
            batch = 10,
            startRow = BitConverter.GetBytes(25),
            endRow = BitConverter.GetBytes(35)
        };
    
        ScannerInformation scannerInfo = hbaseClient.CreateScanner(hbaseTableName, scanSettings);
        CellSet next = null;
        Console.WriteLine("Scan results");
    
        while ((next = hbaseClient.ScannerGetNext(scannerInfo)) != null)
        {
            foreach (CellSet.Row row in next.rows)
            {
                Console.WriteLine(row.key + " : " + Encoding.UTF8.GetString(row.values[0].data));
            }
        }
    
        Console.WriteLine("Press ENTER to continue ...");
        Console.ReadLine();
    }
  6. Set the first three variables in the Main function.

  7. Press F5 to run the application.
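Under the hood, the client library talks to HBase's REST interface, which base64-encodes row keys, column names, and cell values in its JSON bodies. The sketch below builds the payload for the single put in the C# sample. The wire format shown is an assumption based on the open-source HBase REST documentation, not on the Azure client itself.

```python
import base64
import json

# Sketch of an HBase REST CellSet payload: every key, column, and value
# is base64-encoded, and cells are grouped under their row.

def b64(text):
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def cell_set_json(row_key, column, value):
    payload = {"Row": [{"key": b64(row_key),
                        "Cell": [{"column": b64(column), "$": b64(value)}]}]}
    return json.dumps(payload)

body = cell_set_json("content", "d:starwars",
                     "the force is strong in this column")
print(json.loads(body)["Row"][0]["key"])  # Y29udGVudA==
```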

What's Next?

In this tutorial, you learned how to provision an HBase cluster, how to create tables, and how to view the data in those tables from the HBase shell. You also learned how to use Hive to query the data in HBase tables, and how to use the HBase REST Client Library for .NET to create an HBase table and retrieve data from it.

