Ogg logical bitstream framing
Ogg bitstreams
The Ogg transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the Vorbis audio codec or the Theora video codec.
Application example: Vorbis
Vorbis encodes short-time blocks of PCM data into raw packets of bit-packed data. These raw packets may be used directly by transport mechanisms that provide their own framing and packet-separation mechanisms (such as UDP datagrams). For stream-based storage (such as files) and transport (such as TCP streams or pipes), Vorbis uses the Ogg bitstream format to provide framing/sync, sync recapture after error, landmarks during seeking, and enough information to properly separate data back into packets at the original packet boundaries without relying on decoding to find packet boundaries.
Design constraints for Ogg bitstreams
- True streaming; we must not need to seek to build a 100% complete bitstream.
- Use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking.
- Specification of absolute position within the original sample stream.
- Simple mechanism to ease limited editing, such as a simplified concatenation mechanism.
- Detection of corruption, recapture after error, and direct, random access to data at arbitrary positions in the bitstream.
Logical and Physical Bitstreams
A logical Ogg bitstream is a contiguous stream of sequential pages belonging only to the logical bitstream. A physical Ogg bitstream is constructed from one or more logical Ogg bitstreams (the simplest physical bitstream is simply a single logical bitstream). We describe below the exact formatting of an Ogg logical bitstream. Combining logical bitstreams into more complex physical bitstreams is described in the Ogg bitstream overview. The exact mapping of raw Vorbis packets into a valid Ogg Vorbis physical bitstream is described in the Vorbis I Specification.
Bitstream structure
An Ogg stream is structured by dividing incoming packets into segments of up to 255 bytes and then wrapping a group of contiguous packet segments into a variable-length page preceded by a page header. Both the header size and page size are variable; the page header contains sizing information and checksum data to determine header/page size and data integrity.
The bitstream is captured (or recaptured) by looking for the beginning of a page, specifically the capture pattern. Once the capture pattern is found, the decoder verifies page sync and integrity by computing and comparing the checksum. At that point, the decoder can extract the packets themselves.
Packet segmentation
Packets are logically divided into multiple segments before encoding into a page. Note that the segmentation and fragmentation process is a logical one; it's used to compute page header values and the original page data need not be disturbed, even when a packet spans page boundaries.
The raw packet is logically divided into [n] 255-byte segments and a last fractional segment of < 255 bytes. A packet size may well consist only of the trailing fractional segment, and a fractional segment may be zero length. These values, called "lacing values", are then saved and placed into the header segment table.
An example should make the basic concept clear:
raw packet:
|______________packet data______________| 753 bytes
lacing values for page header segment table: 255,255,243
We simply add the lacing values for the total size; the last lacing value for a packet is always the value that is less than 255. Note that this encoding both avoids imposing a maximum packet size and imposes minimal overhead on small packets (as opposed to, e.g., simply using two bytes at the head of every packet and having a max packet size of 32k, where small packets (<255, the typical case) are penalized with twice the segmentation overhead). Using the lacing values as suggested, small packets see the minimum possible byte-aligned overhead (1 byte) and large packets, over 512 bytes or so, see a fairly constant ~.5% overhead on encoding space.
Note that a lacing value of 255 implies that a second lacing value follows in the packet, and a value of < 255 marks the end of the packet after that many additional bytes. A packet of 255 bytes (or a multiple of 255 bytes) is terminated by a lacing value of 0:
raw packet:
|______________packet data______________| 255 bytes
lacing values: 255, 0
Note also that a 'nil' (zero length) packet is not an error; it consists of nothing more than a lacing value of zero in the header.
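The lacing rule above can be sketched in a few lines of Python (3.9+). This is an illustrative sketch, not code from any Ogg library; `lacing_values` and `packet_sizes` are hypothetical helper names. It reproduces the worked examples: 753 bytes, an exact multiple of 255, and a nil packet.

```python
def lacing_values(packet_len: int) -> list[int]:
    """Return the lacing values for one packet of `packet_len` bytes.

    The packet is split into full 255-byte segments plus a final
    fractional segment of 0..254 bytes. A final value below 255 marks
    the end of the packet, so exact multiples of 255 end with a 0.
    """
    if packet_len < 0:
        raise ValueError("packet length must be non-negative")
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]


def packet_sizes(lacing: list[int]) -> list[int]:
    """Recover packet sizes by summing a complete run of lacing values."""
    sizes, current = [], 0
    for value in lacing:
        current += value
        if value < 255:          # a value below 255 terminates the packet
            sizes.append(current)
            current = 0
    if current:                  # trailing 255s: the packet continues elsewhere
        raise ValueError("lacing values end mid-packet")
    return sizes
```

For example, `lacing_values(753)` gives `[255, 255, 243]`, `lacing_values(255)` gives `[255, 0]`, and `packet_sizes([255, 255, 243, 255, 0, 0])` gives `[753, 255, 0]`, with the final `0` being a nil packet.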
Packets spanning pages
Packets are not restricted to beginning and ending within a page, although individual segments are, by definition, required to do so. Packets are not restricted to a maximum size, although excessively large packets in the data stream are discouraged.
After segmenting a packet, the encoder may decide not to place all the resulting segments into the current page; to do so, the encoder places the lacing values of the segments it wishes to belong to the current page into the current segment table, then finishes the page. The next page begins with the first value in the segment table belonging to the next packet segment, thus continuing the packet. (Data in the packet body must also correspond properly to the lacing values in the spanned pages: the segment data corresponding to the lacing values of the first page belong in that page, and packet segments listed in the segment table of the following page must begin the page body of the subsequent page.)
The last step in spanning a page boundary is to set the header flag in the new page to indicate that the first lacing value in the segment table continues rather than begins a packet; a header flag of 0x01 is set to indicate a continued packet. Although mandatory, this flag is not actually algorithmically necessary; one could inspect the preceding segment table to determine whether the packet is new or continued. Adding the information to the packet_header flag allows a simpler design (with no overhead) that needs only to inspect the current page header after frame capture. This also allows faster error recovery in the event that the packet originates in a corrupt preceding page, implying that the previous page's segment table cannot be trusted.
Note that a packet can span an arbitrary number of pages; the above spanning process is repeated for each spanned page boundary.
Also, a 'zero termination' on a packet size that is an exact multiple of 255 must appear even if the lacing value appears in the next page as a zero-length continuation of the current packet. The header flag should be set to 0x01 to indicate that the packet spanned, even though the span is a nil case as far as data is concerned.
The encoding looks odd, but is properly optimized for speed and for the expected case of the majority of packets being between 50 and 200 bytes (note that it is designed such that packets of wildly different sizes can be handled within the model; placing packet size restrictions on the encoder would have only slightly simplified page generation while increasing overall encoder complexity).
The main point behind tracking individual packets (and packet segments) is to allow more flexible encoding tricks that require explicit knowledge of packet size. An example is simple bandwidth limiting, implemented by simply truncating packets in the nominal case if the packet is arranged so that the least sensitive portion of the data comes last.
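A minimal Python (3.9+) sketch of how lacing values might be distributed across pages, including the continued-packet flag and the zero-termination case. The greedy "fill each page up to `max_segments` lacing values" policy and the function name `paginate` are assumptions made for illustration. A real encoder also chooses page boundaries for other reasons (page size, granule positions) and carries the segment data alongside the lacing values.

```python
def paginate(packet_lens: list[int], max_segments: int = 255):
    """Split the packets' lacing values across pages (greedy sketch).

    Returns (continued, lacing) tuples. `continued` is True when the
    page's first lacing value continues a packet from the previous page,
    i.e. when header_type_flag bit 0x01 must be set.
    """
    if not 1 <= max_segments <= 255:
        raise ValueError("a page holds 1..255 segments")
    # Flatten to (lacing value, begins-a-packet) pairs.
    segments = []
    for n in packet_lens:
        full, rest = divmod(n, 255)
        values = [255] * full + [rest]
        segments += [(v, i == 0) for i, v in enumerate(values)]
    pages = []
    for start in range(0, len(segments), max_segments):
        chunk = segments[start:start + max_segments]
        continued = not chunk[0][1]   # first value does not begin a packet
        pages.append((continued, [v for v, _ in chunk]))
    return pages
```

With `paginate([510], max_segments=2)` the result is `[(False, [255, 255]), (True, [0])]`. The terminating zero lands alone on the second page, and that page is still flagged as continued, as the zero-termination rule above requires.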
Page header
The headering mechanism is designed to avoid copying and re-assembly of the packet data (i.e., making the packet segmentation process a logical one); the header can be generated directly from incoming packet data. The encoder buffers packet data until it finishes a complete page, at which point it writes the header followed by the buffered packet segments.
capture_pattern
A header begins with a capture pattern that simplifies identifying pages; once the decoder has found the capture pattern it can do a more intensive job of verifying that it has in fact found a page boundary (as opposed to an inadvertent coincidence in the byte stream).
byte  value
0     0x4f 'O'
1     0x67 'g'
2     0x67 'g'
3     0x53 'S'
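A minimal Python sketch of the first step of capture: scanning a byte buffer for the four capture-pattern bytes. The name `candidate_page_offsets` is hypothetical. As the text above stresses, each hit is only a candidate page start; the version, sizes and CRC still have to be checked before it is trusted.

```python
CAPTURE_PATTERN = b"OggS"  # bytes 0x4f 0x67 0x67 0x53


def candidate_page_offsets(data: bytes) -> list[int]:
    """Return every offset at which the capture pattern occurs.

    A match may be a coincidence inside packet data, so callers must
    verify the rest of the header (and the CRC) before using it.
    """
    offsets = []
    pos = data.find(CAPTURE_PATTERN)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(CAPTURE_PATTERN, pos + 1)
    return offsets
```

For instance, `candidate_page_offsets(b"xxOggSyyOggS")` returns `[2, 8]`.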
stream_structure_version
The capture pattern is followed by the stream structure revision:
byte  value
4     0x00
header_type_flag
The header type flag identifies this page's context in the bitstream:
byte  value
5     bitflags: 0x01: unset = fresh packet
                      set = continued packet
                0x02: unset = not first page of logical bitstream
                      set = first page of logical bitstream (bos)
                0x04: unset = not last page of logical bitstream
                      set = last page of logical bitstream (eos)
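Decoding byte 5 is a matter of testing three bits. The following Python (3.9+) sketch uses hypothetical constant and function names; only the bit values come from the table above.

```python
CONTINUED = 0x01  # first packet on the page continues one from the previous page
BOS = 0x02        # first page of the logical bitstream
EOS = 0x04        # last page of the logical bitstream


def describe_flags(header_type: int) -> dict[str, bool]:
    """Decode the header_type_flag byte (byte 5 of the page header)."""
    return {
        "continued": bool(header_type & CONTINUED),
        "bos": bool(header_type & BOS),
        "eos": bool(header_type & EOS),
    }
```

For example, `describe_flags(0x02)` reports a beginning-of-stream page that starts with a fresh packet, and `describe_flags(0x05)` reports a final page whose first lacing value continues an earlier packet.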
absolute granule position
This is packed in the same way the rest of Ogg data is packed: LSb of LSB first. Note that the 'position' data specifies a 'sample' number (e.g., in CD-quality audio a sample is four octets, 16 bits for left and 16 bits for right; in video it would likely be the frame number). It is up to the specific codec in use to define the semantic meaning of the granule position value. The position specified is the total samples encoded after including all packets finished on this page (packets begun on this page but continuing on to the next page do not count). The rationale here is that the position specified in the frame header of the last page tells how long the data coded by the bitstream is. A truncated stream will still return the proper number of samples that can be decoded fully.
A special value of '-1' (in two's complement) indicates that no packets finish on this page.
byte  value
6     0xXX LSB
7     0xXX
8     0xXX
9     0xXX
10    0xXX
11    0xXX
12    0xXX
13    0xXX MSB
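Packing bytes 6..13 in Python is a single `struct` call. This sketch treats the field as a signed 64-bit integer, which is an assumption consistent with the two's-complement '-1' special value. Under that reading, '-1' encodes as eight 0xff bytes. The function names are hypothetical.

```python
import struct

NO_PACKET_FINISHES = -1  # special granule position: no packet ends on this page


def pack_granule(position: int) -> bytes:
    """Encode a granule position as 8 bytes, LSB first (bytes 6..13)."""
    return struct.pack("<q", position)   # signed 64-bit, two's complement


def unpack_granule(field: bytes) -> int:
    """Decode the 8-byte little-endian granule position field."""
    (position,) = struct.unpack("<q", field)
    return position
```

For example, a page ending after 44100 samples (0xAC44) stores the bytes `44 ac 00 00 00 00 00 00`, and `unpack_granule(b"\xff" * 8)` returns `-1`.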
stream serial number
Ogg allows for separate logical bitstreams to be mixed at page granularity in a physical bitstream. The most common case would be sequential arrangement, but it is possible to interleave pages for two separate bitstreams to be decoded concurrently. The serial number is the means by which physical pages are associated with a particular logical stream. Each logical stream must have a unique serial number within a physical stream:
byte  value
14    0xXX LSB
15    0xXX
16    0xXX
17    0xXX MSB
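A sketch of demultiplexing by serial number in Python (3.9+). It assumes each element of `pages` is one complete, already-verified page as raw bytes; both helper names are hypothetical. The serial number is read from bytes 14..17, LSB first.

```python
import struct
from collections import defaultdict


def serial_of(page: bytes) -> int:
    """Read the stream serial number: bytes 14..17, LSB first."""
    return struct.unpack_from("<I", page, 14)[0]


def demultiplex(pages: list[bytes]) -> dict[int, list[bytes]]:
    """Group the pages of a physical bitstream by logical stream.

    Pages keep their original relative order within each stream, so
    interleaved streams can be handed to separate decoders.
    """
    streams = defaultdict(list)
    for page in pages:
        streams[serial_of(page)].append(page)
    return dict(streams)
```

With three pages carrying serial numbers 7, 9 and 7 in that order, the result maps 7 to the first and third pages and 9 to the second.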
page sequence no
Page counter; lets us know if a page is lost (useful where packets span page boundaries).
byte  value
18    0xXX LSB
19    0xXX
20    0xXX
21    0xXX MSB
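A minimal Python (3.9+) sketch of loss detection from the page counter. It assumes pages of one logical stream arrive in order. It also assumes the 32-bit counter wraps modulo 2**32, which is a modeling choice rather than something stated above. `sequence_gaps` is a hypothetical name.

```python
def sequence_gaps(sequence_numbers: list[int]) -> list[tuple[int, int]]:
    """Report gaps in one logical stream's page sequence numbers.

    Returns (last_seen, lost_count) pairs. Differences are taken modulo
    2**32 (an assumed wraparound of the 32-bit field); duplicates are
    ignored.
    """
    gaps = []
    for prev, cur in zip(sequence_numbers, sequence_numbers[1:]):
        lost = ((cur - prev) & 0xFFFFFFFF) - 1
        if lost > 0:
            gaps.append((prev, lost))
    return gaps
```

For example, `sequence_gaps([0, 1, 2, 5, 6])` returns `[(2, 2)]`: pages 3 and 4 were lost. That tells the decoder that any packet continued onto page 5 cannot be trusted.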
page checksum
32-bit CRC value (direct algorithm, initial value and final XOR = 0, generator polynomial = 0x04c11db7). The value is computed over the entire header (with the CRC field in the header set to zero) and then continued over the page. The CRC field is then filled with the computed value. (A thorough discussion of CRC algorithms can be found in "A Painless Guide to CRC Error Detection Algorithms" by Ross Williams, ross@ross.net.)
byte  value
22    0xXX LSB
23    0xXX
24    0xXX
25    0xXX MSB
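The checksum parameters above (MSB-first, polynomial 0x04c11db7, initial value 0, no final XOR) translate into a table-driven CRC. The following Python (3.9+) sketch also fills bytes 22..25 the way the text describes: zero the field, checksum the whole page, store the result LSB first. The helper names are hypothetical. A received page verifies when recomputing this way leaves it unchanged.

```python
def _make_table() -> list[int]:
    """Precompute the CRC of every byte value (MSB-first, poly 0x04c11db7)."""
    table = []
    for byte in range(256):
        crc = byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
        table.append(crc)
    return table


CRC_TABLE = _make_table()


def ogg_crc(data: bytes) -> int:
    """CRC-32 with initial value 0, no bit reflection and no final XOR."""
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[(crc >> 24) ^ byte]
    return crc


def set_page_crc(page: bytes) -> bytes:
    """Return a copy of `page` with bytes 22..25 holding its checksum."""
    buf = bytearray(page)
    buf[22:26] = b"\x00\x00\x00\x00"            # CRC field counts as zero
    buf[22:26] = ogg_crc(bytes(buf)).to_bytes(4, "little")
    return bytes(buf)


def page_crc_ok(page: bytes) -> bool:
    """True when the stored checksum matches the recomputed one."""
    return set_page_crc(page) == page
```

Because both the initial value and the final XOR are zero, leading zero bytes do not change the result. For example, `ogg_crc(b"\x00\x01")` equals the generator polynomial itself, `0x04C11DB7`.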
page_segments
The number of segment entries to appear in the segment table. The maximum number of 255 segments (255 bytes each) sets the maximum possible physical page size at 65307 bytes, or just under 64kB. Thus we know that a header corrupted so as to destroy sizing/alignment information will not cause a runaway bitstream: we'll read in the page according to the corrupted size information, which is guaranteed to be a reasonable size regardless, notice the checksum mismatch, drop sync and then look for recapture.
byte  value
26    0x00-0xff (0-255)
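The 65307-byte bound follows from the layout: 27 fixed header bytes (offsets 0 through 26), at most 255 segment-table bytes, and at most 255 segments of 255 bytes each:

```latex
\underbrace{27}_{\text{fixed header}}
+ \underbrace{255}_{\text{segment table}}
+ \underbrace{255 \times 255}_{\text{page body}}
= 27 + 255 + 65025 = 65307 \text{ bytes} < 2^{16} = 65536
```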
segment_table (containing packet lacing values)
The lacing values for each packet segment physically appearing in this page are listed in contiguous order.
byte  value
27    0x00-0xff (0-255)
[...]
n     0x00-0xff (0-255, n=page_segments+26)
Total page size is calculated directly from the known header size and lacing values in the segment table. Packet data segments follow immediately after the header.
Page headers typically impose a flat .25-.5% space overhead assuming nominal ~8k page sizes. The segmentation table needed for exact packet recovery in the streaming layer adds approximately .5-1% nominal, assuming expected encoder behavior in 44.1kHz, 128kbps stereo encodings.
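Putting the header fields together, the following Python (3.9+) sketch parses one page: it unpacks the 27 fixed bytes, reads `page_segments` lacing values, computes the total page size, and slices the body into packets. This is an illustrative sketch with hypothetical names. It does not verify the CRC, and it reports a last packet whose lacing runs off the page as incomplete rather than joining it with the next page.

```python
import struct

# Bytes 0..26: capture pattern, version, flags, granule position,
# serial number, page sequence number, CRC, page_segments.
HEADER = struct.Struct("<4sBBqIIIB")


def parse_page(data: bytes, offset: int = 0):
    """Parse the page at `offset`; return (fields, packets, page_size).

    `packets` holds (bytes, complete) pairs; `complete` is False for a
    final packet whose lacing values run off the end of this page.
    """
    (pattern, version, flags, granule, serial,
     seq, crc, n_segs) = HEADER.unpack_from(data, offset)
    if pattern != b"OggS" or version != 0:
        raise ValueError("no valid page at this offset")
    table_start = offset + HEADER.size
    lacing = data[table_start:table_start + n_segs]
    page_size = HEADER.size + n_segs + sum(lacing)
    packets, start, length = [], table_start + n_segs, 0
    for value in lacing:
        length += value
        if value < 255:                  # packet ends on this page
            packets.append((data[start:start + length], True))
            start, length = start + length, 0
    if length:                           # continues on the next page
        packets.append((data[start:start + length], False))
    fields = {"flags": flags, "granule": granule, "serial": serial,
              "sequence": seq, "crc": crc}
    return fields, packets, page_size
```

For a page whose segment table is `3, 255, 10`, the parser returns a 3-byte packet and a 265-byte packet, and a total page size of 27 + 3 + 268 = 298 bytes.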