Kylin-基本知识-阿里云开发者社区

Kylin-基本知识

2016-05-26 1405

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介： CUBE Table - This is definition of hive tables as source of cubes, which must be synced before building cubes.

CUBE

Table - This is definition of hive tables as source of cubes, which must be synced before building cubes.
Data Model - This describes a STAR SCHEMA data model, which defines fact/lookup tables and filter condition.
Cube Descriptor - This describes definition and settings for a cube instance, defining which data model to use, what dimensions and measures to have, how to partition to segments and how to handle auto-merge etc.
Cube Instance - This is instance of cube, built from one cube descriptor, and consist of one or more cube segments according partition settings.
Partition - User can define a DATE/STRING column as partition column on cube descriptor, to separate one cube into several segments with different date periods.
Cube Segment - This is actual carrier of cube data, and maps to a HTable in HBase. One building job creates one new segment for the cube instance. Once data change on specified data period, we can refresh related segments to avoid rebuilding whole cube.
Aggregation Group - Each aggregation group is subset of dimensions, and build cuboid with combinations inside. It aims at pruning for optimization.

DIMENSION & MEASURE

Mandotary - This dimension type is used for cuboid pruning, if a dimension is specified as “mandatory”, then those combinations without such dimension are pruned.
Hierarchy - This dimension type is used for cuboid pruning, if dimension A,B,C forms a “hierarchy” relation, then only combinations with A, AB or ABC shall be remained.
Derived - On lookup tables, some dimensions could be generated from its PK, so there’s specific mapping between them and FK from fact table. So those dimensions are DERIVED and don’t participate in cuboid generation.
Count Distinct(HyperLogLog) - Immediate COUNT DISTINCT is hard to calculate, a approximate algorithm - HyperLogLog is introduced, and keep error rate in a lower level.
Count Distinct(Precise) - Precise COUNT DISTINCT will be pre-calculated basing on RoaringBitmap, currently only int or bigint are supported.
Top N - For example, with this measure type, user can easily get specified numbers of top sellers/buyers etc.

CUBE ACTIONS

BUILD - Given an interval of partition column, this action is to build a new cube segment.
REFRESH - This action will rebuilt cube segment in some partition period, which is used in case of source table increasing.
MERGE - This action will merge multiple continuous cube segments into single one. This can be automated with auto-merge settings in cube descriptor.
PURGE - Clear segments under a cube instance. This will only update metadata, and won’t delete cube data from HBase.

JOB STATUS

NEW - This denotes one job has been just created.
PENDING - This denotes one job is paused by job scheduler and waiting for resources.
RUNNING - This denotes one job is running in progress.
FINISHED - This denotes one job is successfully finished.
ERROR - This denotes one job is aborted with errors.
DISCARDED - This denotes one job is cancelled by end users.

JOB ACTION

RESUME - Once a job in ERROR status, this action will try to restore it from latest successful point.
DISCARD - No matter status of a job is, user can end it and release resources with DISCARD action.

Kylin-基本知识

CUBE

DIMENSION & MEASURE

CUBE ACTIONS

JOB STATUS

JOB ACTION

热门文章

最新文章

相关课程

相关电子书

相关实验场景

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

Kylin-基本知识

CUBE

DIMENSION & MEASURE

CUBE ACTIONS

JOB STATUS

JOB ACTION

热门文章

最新文章

相关课程

相关电子书

相关实验场景