Columnar Storage


You’re going to hear a lot about columnar storage formats in the next few months, as a variety of distributed execution engines are beginning to consider them for their IO efficiency, and the optimisations that they open up for query execution. In this post, I’ll explain why we care so much about IO efficiency and show how columnar storage – which is a simple idea – can drastically improve performance for certain workloads.


所以disk-based query engines需要解决的最大问题, 仍然是如何提高从磁盘读取数据的效率. 
思路, 可以从优化磁盘寻道时间和效率入手, 这个比较困难,进展缓慢 
更多的, 应该考虑怎样尽量从磁盘中读取有效的数据, 即减少查询所需要的磁盘读取量 

Disks are still the major bottleneck in query execution over large datasets. 
Therefore, the best thing an engineer can do to improve the performance of disk-based query engines (like RDBMs and Impala) usually is to improve the performance of reading bytes from disk.

1. The traditional way to improve disk bandwidth has been to wait, and allow disks to get faster
However, disks are not getting faster very quickly (having settled at roughly 100 MB/s, with ~12 disks per server), and SSDs can’t yet achieve the storage density to be directly competitive with HDDs on a per-server basis.


2. The other way to improve disk performance is to maximise the ratio of ‘useful’ bytes read to total bytes read. The idea is not to read more data than is absolutely necessary to serve a query, so the useful bandwidth realised is increased without actually improving the performance of the IO subsystem.


Columns VS Rows


Traditional database file format store data in rows, where each row is comprised of a contiguous collection of column values.

On disk, that looks roughly like the following:


This row-major layout usually has a header for each row that describes, for example, which columns in the row are NULL. Each column value is then stored contiguously after the header, followed by another row with its own header, and so on.



requires returning every single column of every single row in the table

SELECT <col_set> FROM table WHERE <predicate_set>

the set of rows returned by the evaluation of the predicates over the table is a large proportion of the total set of rows (i.e. the selectivity is high)

the predicate_set requires reading a large subset of the set of columns or b) col_set is a large subset of the set of columns (i.e. the projectivity is high)


More simply, a query is going to be efficient if it requires reading most of the columns of most of the rows. 
In these cases, row-major formats allow the query execution engine to achieve good IO efficiency. 
总结的很简单, 对于row-based, 无论实际需要多少column, 总是需要读出整行 
所以如果确实需要读出大部分column (无论在select部分或where部分), 那么row-major是高效的方式



However, there is a general consensus that these SELECT * kinds of queries are not representative of typical analytical workloads; instead either a large number of columns are not projected, or they are projected only for a small subset of rows where only a few columns are required to decide which rows to return.

Coupled with a general trend towards very wide tables with high column counts, the total number of bytes that are required to satisfy a query are often a relatively small fraction of the size on disk of the target table. In these cases, row-major formats often are quite wasteful in the amount of IO they require to execute a query.

但是在现实中, 很多场景往往只需要很少一部分的column, 那么row-major就显得非常低效 
自然的思路是, 既然我只需要某一个column, 那么把每个column分开存放并读取就显得更合理

Instead of a format that makes it efficient to read entire rows, it’s advantageous for analytical workloads to make it efficient to read entire columns at once. 
Based on our understanding of what makes disks efficient, we can see that the obvious approach is to store columns values densely and contiguously on disk. 
This is the basic idea behind columnar file formats. The following diagram shows what this looks like on disk:




The diagram below shows what a simple query plan for SELECT col_b FROM table WHERE col_a > 5

Reading from a traditional row-major file format

对于row-major, 需要读出所有的row来完成这个查询, 明显是比较低效的, 如果row中包含大量的columns


Reading from columnar storage



This, then, is the basic idea of columnar storage: we recognise that analytical workloads rarely require full scans of all table data, but do often require full scans of a small subset of the columns, and so we arrange to make column scans cheap at the expense of extra cost reading individual rows.



Is this a free lunch? Should every analytical database go out and change every file format to be column-major? 
Obviously the story is more complicated than that. There are some query archetypes that suffer when data is stored in a columnar format.

天下没有免费的午餐, 在bigdata领域, 没有one thing fits all 

1. It is expensive to reassemble a row, since the separate values that comprise it are spread far across the disk. Every column included in a projection implies an extra disk seek, and this can add up when the projectivity of a query is high. 所以columnar只适用于读取少量column的case, 否则reassemble的耗费会很高

2. Row-major formats have good in-memory spatial locality, and there are common operations that benefit enormously from this. 比如, 做两个column的相加, 对row-major就很简单, 而column的逻辑就复杂许多

所以对这样的情况, 比如HBase, 就采取column family的策略, 把经常一起使用的column放在一起, 达到很好的优化效果


存储 安全 文件存储
Network Attached Storage
Network Attached Storage
85 0
存储 Web App开发 移动开发
在HTML5出现之前,如果开发者需要在客户端存储少量的数据,只能通过cookie来实现,但是cookie存在几个不足点: 每个域名下cookie的大小限制在4KB。 cookie会包含在每个http请求中,这样会导致发送重复的数据。 cookie在网络传输过程中没有加密,存在安全隐患。 在HTML5新增了Web storage功能,Web Storage官方建议为每个网站是5MB,能存储比cookie更多的数据,并且具有比cookie更强大的功能。Web Storage现在已经得到了Firefox、Opera、Chrome、Safari各主流浏览器的支持。
272 0
JavaScript 前端开发
JavaScript 前端开发 内存技术
关系型数据库 Oracle
存储 数据库 文件存储