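When you create a table in Hive, the CREATE TABLE statement accepts the following clauses:
[ROW FORMAT row_format]
[STORED AS file_format]
The storage format is chosen with STORED AS, and file_format can be one of the following: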
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
The default is the text format, for example:
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/93a77615a82c4760bf26c478ee83d4df.webp?x-oss-process=image/resize,w_1400/format,webp)
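As a minimal sketch of the default behaviour, the statements below create a table without a STORED AS clause and then inspect it. The table name `demo_default_format` and its columns are hypothetical and only serve as an illustration.

```sql
-- Show the currently configured default file format (TextFile unless overridden).
SET hive.default.fileformat;

-- Hypothetical table: no STORED AS clause, so the default format is used.
CREATE TABLE IF NOT EXISTS demo_default_format (
  id   INT,
  name STRING
);

-- The InputFormat/OutputFormat rows in the output show which format was applied.
DESCRIBE FORMATTED demo_default_format;
```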
Classification by storage layout
Official documentation:
https://cwiki.apache.org/confluence/display/Hive/SerDe
Row-oriented storage
SEQUENCEFILE
TEXTFILE
Column-oriented storage
RCFILE
ORC
PARQUET
Note: Parquet is now a top-level Apache project and is widely used with Hive, Spark, and other tools in the Hadoop ecosystem.
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/f8cc6af110394ec3b42e5415bbe63460.webp?x-oss-process=image/resize,w_1400/format,webp)
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/48dfc74f42354e26ac3a71ae99fc673e.webp?x-oss-process=image/resize,w_1400/format,webp)
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/cd2b0d0937854cca923ca81df2430045.webp?x-oss-process=image/resize,w_1400/format,webp)
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/a0cf3cee6e194a4b8cc0dff239d325ec.webp?x-oss-process=image/resize,w_1400/format,webp)
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/2e395669984e41a99ebfba0657334f36.webp?x-oss-process=image/resize,w_1400/format,webp)
Compression comparison of file formats
Reference: http://zh.hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/9a6acbcd5bbb4eec9c7dda85eae33c22.webp?x-oss-process=image/resize,w_1400/format,webp)
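For ORC, the compression codec can be set per table through the `orc.compress` table property (NONE, ZLIB, or SNAPPY). The sketch below assumes a hypothetical table `page_views_orc_snappy` with illustrative columns; it is not taken from the original screenshots.

```sql
-- Hypothetical ORC table that uses Snappy instead of the default ZLIB codec.
CREATE TABLE IF NOT EXISTS page_views_orc_snappy (
  track_time STRING,
  url        STRING,
  session_id STRING,
  ip         STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");
```

ZLIB usually yields smaller files, while Snappy trades some compression ratio for speed; which one fits better depends on the workload.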
In practice, our business queries analyze and use data column by column.
1. Create a text table
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/52b9f464a9e04acfb807d4a8522eeb41.webp?x-oss-process=image/resize,w_1400/format,webp)
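A text table for this kind of comparison might look like the following sketch. The table name `page_views_text`, the columns, and the tab delimiter are assumptions chosen for illustration; adjust them to the actual data file.

```sql
-- Hypothetical text table; fields in the source file are assumed to be tab-separated.
CREATE TABLE IF NOT EXISTS page_views_text (
  track_time STRING,
  url        STRING,
  session_id STRING,
  ip         STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```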
2. Load the data
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/796f5686334e4831ba387acbe580a267.webp?x-oss-process=image/resize,w_1400/format,webp)
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/957b1f0c848144c4be7f0dfa5c5e02a4.webp?x-oss-process=image/resize,w_1400/format,webp)
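Loading a local file into the text table can be sketched as follows. The path `/path/to/page_views.data` and the table `page_views_text` are placeholders, not values from the original post.

```sql
-- Copy a local file into the table's warehouse directory (placeholder path).
LOAD DATA LOCAL INPATH '/path/to/page_views.data'
INTO TABLE page_views_text;

-- Quick sanity check that rows were loaded.
SELECT COUNT(*) FROM page_views_text;
```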
3. Check the syntax for creating an ORC table
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/3df1da30c9df435998b627c18ccf6c42.webp?x-oss-process=image/resize,w_1400/format,webp)
4. Create the ORC table
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/d2c7f456d745459dba27fbe0310582c0.webp?x-oss-process=image/resize,w_1400/format,webp)
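An ORC table with the same schema could be declared as in this sketch. The name `page_views_orc` and the columns are hypothetical and simply mirror the illustrative text table used for comparison.

```sql
-- Hypothetical ORC table (requires Hive 0.11.0 or later).
-- No ROW FORMAT clause is needed: ORC manages its own serialization.
CREATE TABLE IF NOT EXISTS page_views_orc (
  track_time STRING,
  url        STRING,
  session_id STRING,
  ip         STRING
)
STORED AS ORC;
```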
5. Insert the data from the text table into the ORC table
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/6706fbaa1cbc483eaabf4454adf09851.webp?x-oss-process=image/resize,w_1400/format,webp)
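LOAD DATA only moves files and does not convert them, so an ORC table is normally populated with an INSERT ... SELECT, which rewrites the rows in the target format. The sketch assumes two hypothetical tables with identical schemas, `page_views_text` and `page_views_orc`, already exist.

```sql
-- Assumes page_views_text (TEXTFILE) and page_views_orc (ORC) exist with the same columns.
-- Hive runs a job that reads the text rows and writes them out as ORC files.
INSERT INTO TABLE page_views_orc
SELECT * FROM page_views_text;
```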
6. Create a Parquet table and insert the data
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/b9dcc6e2aef34b9fa2dead26175537de.webp?x-oss-process=image/resize,w_1400/format,webp)
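Creating and filling a Parquet table can be done in one step with CREATE TABLE ... AS SELECT. This is a sketch under the assumption that a hypothetical source table `page_views_text` exists; `page_views_parquet` is likewise an illustrative name.

```sql
-- Requires Hive 0.13.0 or later for STORED AS PARQUET.
-- The new table takes its columns from the SELECT and is written as Parquet.
CREATE TABLE page_views_parquet
STORED AS PARQUET
AS
SELECT * FROM page_views_text;
```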
7. Check the file sizes
Size of the original text file
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/b56b5f06aa6b4d76a2637178dd3cdce4.webp?x-oss-process=image/resize,w_1400/format,webp)
Size of the ORC files
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/bc539216e32f44a3852bba11213fd75c.webp?x-oss-process=image/resize,w_1400/format,webp)
Size of the Parquet files
![](https://ucc.alicdn.com/qotwlfg67zs74/developer-article534632/20241019/1bd03a1594c342d8bd61112b5097a615.webp?x-oss-process=image/resize,w_1400/format,webp)
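The sizes can be checked from the Hive CLI roughly as sketched below. The table names are hypothetical, and the paths assume the default warehouse directory `/user/hive/warehouse` and the `default` database; the actual location is shown in the `Location` row of DESCRIBE FORMATTED.

```sql
-- Find the table's storage location (look for the "Location" row).
DESCRIBE FORMATTED page_views_orc;

-- Summarize on-disk size per table directory (paths are assumptions, see above).
dfs -du -h /user/hive/warehouse/page_views_text;
dfs -du -h /user/hive/warehouse/page_views_orc;
dfs -du -h /user/hive/warehouse/page_views_parquet;
```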