A Hive Application Example: WordCount
We now work through a concrete example, word-frequency counting (WordCount), to get a deeper look at how Hive is used in practice.
First, create the input data files to be analyzed, then write HiveQL statements that implement the WordCount algorithm. On a Linux system, the steps are as follows:
(1) Create an input directory to hold the input data, using the following commands:
cd /usr/local/hadoop
sudo mkdir input
(2) Create two test files, file1.txt and file2.txt, in the input folder, using the following commands:
cd /usr/local/hadoop/input
sudo sh -c "echo hello world >> file1.txt"
sudo sh -c "echo hello hadoop >> file2.txt"
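Optionally, before moving on, you can confirm that the two files contain the expected lines (this check is an extra step, not required by the example):

cat /usr/local/hadoop/input/file1.txt /usr/local/hadoop/input/file2.txt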
(3) Start the Hive command-line interface and write the HiveQL statements that implement the WordCount algorithm, as follows:
use hive;
create table docs(line string);
load data local inpath '/usr/local/hadoop/input' overwrite into table docs;
create table word_count as
  select word, count(1) as count from
  (select explode(split(line, ' ')) as word from docs) w
  group by word
  order by word;
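In this statement, the inner query uses split to break each line of the docs table into an array of words and explode to turn that array into one row per word; the outer query then groups the rows by word, counts the occurrences, and stores the result in the new table word_count. Because the query contains a group by, Hive translates it into MapReduce jobs behind the scenes (with the default execution engine). Once it finishes, the result can be inspected with a simple query; with the two test files created above, the output should look roughly like the following (a sketch of the expected result, assuming the data loaded correctly):

select * from word_count;

hadoop	1
hello	2
world	1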