理解I/O:随机和顺序

简介: 转自: http://www.violin-memory.com/blog/understanding-io-random-vs-sequential/Storage for DBAs: Ever been to one of those sushi restaurants where the food comes round in dishes on a conveyo

转自: http://www.violin-memory.com/blog/understanding-io-random-vs-sequential/

Storage for DBAs: Ever been to one of those sushi restaurants where the food comes round in dishes on a conveyor belt? As each dish travels around the loop you eye it up and, as long as you can make your mind up in time, grab it. However, if you are as indecisive as me, there’s a chance it will be out of range before you come to your senses – in which case you have to wait for it to complete a further full revolution before getting another chance. And that’s assuming someone else doesn’t get to it first.

曾经去过寿司店吗,那里的食物都是放在一个传送带上。随着每份食品在带上的传送,你瞄准了一些食物,等它们来到跟前时,立刻拿走它。然而如果你像我一样那么迟疑,那就有可能食物已经超出了你能够着的范围了。这时候你就得再等上一圈才可能拿到,前提还是别人没取走

Let’s assume that it takes a dish exactly 4 minutes to complete a whole lap of the conveyor belt. And just for simplicity’s sake let’s also assume that no two dishes on the belt are identical. As a hungry diner you look in the little menu and see a particular dish which you decide you want. It’s somewhere on the belt, so how long will it take to arrive?

我们假定一道食品在传送带上走完一圈需要4分钟,为简单起见还假定传送带上的食品互不相同。作为一个吃货,你看了看菜单,找到了几样你想要的食物,它就在带上的某个地方,那么需要多久才会到达你的旁边呢?

Probability dictates that it could be anywhere on the belt. It could be passing by right now, requiring no wait time – or it could have just passed out of reach, thus requiring 4 minutes of wait time to go all the way round again. As you follow this random method (choose from the menu then look at the belt) it makes sense that the average wait time will tend towards halfway between the min and max wait times, i.e. 2 minutes in this case. So every time you pick a dish you wait an average of 2 minutes: if you have eight dishes the odds say that you will spend (8 x 2) = 16 minutes waiting for your food. Welcome to the disk data diet, I hope you weren’t too hungry?

我们指定它可以在带上的任何地方。可能正在经过,你不需要等待就可以拿到手,或者它刚刚过了你的范围,那么需要4分钟的等待转完一圈。当你遵循这套随机规则(菜单中选择食品然后看传送带),就会意识到平均等待时间将会倾向于最大和最小等待时间的中间位置,也就是2分钟。于是乎你每次取食物时都需要等待2分钟,如果你有8个盘子,那很可能你需要等上16分钟才能取完。欢迎来到磁盘数据饮食,希望你不会太饿?

Now let’s consider an alternative option, where you order eight dishes from the chef and he or she places all of them sequentially (i.e. next to each other) somewhere on the conveyor belt. That location is random, so again you might have to wait anywhere between 0 and 4 minutes (an average of 2 minutes) for the first dish to pass… but the next seven will follow one after the other with no wait time. So now, in this scenario, you only had to wait 2 minutes for all eight dishes. Much better.

现在让我们来考虑另一种方案,你订了8道菜,厨师依次把他们放在了传送带的某个地方,位置是随机的,所以你需要等上平均时间2分钟取得第一份菜。然而剩下的7份菜都不需要等待。所以在这种场景下,取8道菜你只需等2分钟,比刚才好多了。

I’m sure you will have seen through my analogy right from the start. The conveyor belt is a hard disk and the sushi dishes are blocks which are being eaten / read. I haven’t yet worked out how to factor a bottle Asahi Super Dry into this story, but I’ll have one all the same thanks.

我确定你能看懂我在文章开头所作的类比了。传送带就是磁盘,食品就好比要吃/读的块。【最后一句翻译不来。555……】

Random versus Sequential I/O

I have another article planned for later in this series which describes the inescapable mechanics of disk. For now though, I’ll outline the basics: every time you need to access a block on a disk drive, the disk actuator arm has to move the head to the correct track (the seek time), then the disk platter has to rotate to locate the correct sector (the rotational latency). This mechanical action takes time, just like the sushi travelling around the conveyor belt.

我改日会有另外一篇文章来谈磁盘原理。但现在,我大概说一下基本内容:每次访问磁盘的一个块时,磁臂就需移动到正确的磁道上(这段时间为寻址时间),然后盘片就需旋转到正确的扇区上(这叫旋转时延)。这套动作需要时间,正如寿司在传送带上传送需要时间一样。

Obviously the amount of time depends on where the head was previously located and how fortunate you are with the location of the sector on the platter: if it’s directly under the head you do not need to wait, but if it just passed the head you have to wait for a complete revolution. Even on the fastest 15k RPM disk that takes 4 milliseconds (15,000 rotations per minute = 250 rotations per second, which means one rotation is 1/250th of a second or 4ms). Admittedly that’s faster than the sushi in my earlier analogy, but the chances are you will need to read or write a far larger number of blocks than I can eat sushi dishes (and trust me, on a good day I can pack a fair few away).

很明显总共的时间依赖于磁头的初使位置,还有要访问的扇区的位置。如果它刚好就在磁头下方,那不需要等待;如果刚刚经过磁头,那就不得不等上一个周期时间。哪怕对于最快的15k RPM磁盘,每分钟15000转,每秒250转,那么一转需要4ms。很明显比刚才寿司的情况要快得多,但是很多时候需要读上大量的数据块,远远超过我要吃的寿司量。相信我,这种时候的时间我都可以打包好几份了。

What about the next block? Well, if that next block is somewhere else on the disk, you will need to incur the same penalties of seek time and rotational latency. We call this type of operation a random I/O. But if the next block happened to be located directly after the previous one on the same track, the disk head would encounter it immediately afterwards, incurring no wait time (i.e. no latency). This, of course, is a sequential I/O.

那下一个磁盘块又是如何呢?如果它在磁盘的某个地方,访问它会有同样的寻道和旋转时延,我们就把这种方式的IO叫做随机IO;但是如果它刚好就在你刚才访问的那一个磁盘块的后面,磁头就能立刻遇到,不需等待,这种IO就叫顺序IO

Size Matters

In my last post I described the Fundamental Characteristics of Storage: Latency, IOPS and Bandwidth (or Throughput). As a reminder, IOPS stands for I/Os Per Second and indicates the number of distinct Input/Output operations (i.e. reads or writes) that can take place within one second. You might use an IOPS figure to describe the amount of I/O created by a database, or you might use it when defining the maximum performance of a storage system. One is a real-world value and the other a theoretical maximum, but they both use the term IOPS.

在我上一篇博文中讲到了磁盘的基本特征:延时、IOPS和带宽(或叫吞吐量)。这里再说一次,IOPS是每秒I/O数的简称,表示一秒中输入输出操作(比如读和写)的次数。可以用IOPS数值来描述一个数据库的IO操作量,或者在定义一个存储系统的最大性能时采用这个词。前者是一种真实世界的值,后者是一个理论最大值,它们都IOPS这个术语。

When describing volumes of data, things are slightly different. Bandwidth is usually used to describe the maximum theoretical limit of data transfer, while throughput is used to describe a real-world measurement. You might say that the bandwidth is the maximum possible throughput. Bandwidth and throughput figures are usually given in units of size over units of time, e.g. Mb/sec or GB/sec. It pays to look carefully at whether the unit is using bits (b) or bytes (B), otherwise you are likely to end up looking a bit silly (sadly, I speak from experience).

目录
相关文章
|
8天前
|
数据挖掘 索引 Python
如何在处理重复值时保持数据的原始顺序?
可以在处理数据重复值时有效地保持数据的原始顺序,确保数据在清洗和预处理过程中不会因为重复值的处理而导致顺序混乱,从而保证了数据分析结果的准确性和可靠性。
30 8
|
28天前
|
算法 搜索推荐 索引
多种实现随机排序的方法
【10月更文挑战第19天】除了直接交换法和随机索引法,还有多种方法可以实现随机排序,每一种方法都有其独特的特点和适用场景。
|
6月前
最短代码实现随机打乱数组各个元素的顺序
最短代码实现随机打乱数组各个元素的顺序
|
11月前
|
算法 测试技术 C#
C++单调向量算法:得到山形数组的最少删除次数
C++单调向量算法:得到山形数组的最少删除次数
随机生成十个不重复的数组元素
随机生成十个不重复的数组元素
133 0