The libspeex library contains all the functions for encoding and decoding speech with the Speex codec. When linking on a UNIX system, one must add -lspeex -lm to the compiler command line. One important thing to know is that libspeex calls are reentrant, but not thread-safe. That means that it is fine to use calls from many threads, but calls using the same state from multiple threads must be protected by mutexes. Code examples can also be found in Appendix A, and the complete API documentation is included in the Documentation section of the Speex website (http://www.speex.org/).
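The rule about sharing one state between threads can be sketched with a plain POSIX mutex. This is only an illustration under stated assumptions: SharedCodec, shared_codec_init, shared_codec_run, shared_codec_destroy and count_call are hypothetical names, not part of libspeex, and count_call merely stands in for a real call such as speex_encode_int on the shared state.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper (not part of libspeex): one codec state plus
 * the mutex that guards every call made on that state. */
typedef struct {
    pthread_mutex_t lock;
    void *state; /* e.g. the pointer returned by speex_encoder_init() */
} SharedCodec;

/* Any operation that touches the shared state. */
typedef int (*codec_call)(void *state, void *arg);

static int shared_codec_init(SharedCodec *c, void *state)
{
    assert(c != NULL);
    c->state = state;
    return pthread_mutex_init(&c->lock, NULL); /* 0 on success */
}

/* Run fn on the shared state while holding the lock, so two threads
 * never operate on the same state at the same time. */
static int shared_codec_run(SharedCodec *c, codec_call fn, void *arg)
{
    int result;
    pthread_mutex_lock(&c->lock);
    result = fn(c->state, arg);
    pthread_mutex_unlock(&c->lock);
    return result;
}

static void shared_codec_destroy(SharedCodec *c)
{
    pthread_mutex_destroy(&c->lock);
}

/* Stand-in for a real codec call: treats the state as an int counter
 * and returns its incremented value. */
static int count_call(void *state, void *arg)
{
    (void)arg;
    return ++*(int *)state;
}
```

States that are used by a single thread only need no such wrapper; giving each thread its own encoder or decoder state is usually the simpler design.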




5.1 Encoding

In order to encode speech using Speex, one first needs to:
#include <speex/speex.h>
Then in the code, a Speex bit-packing struct must be declared, along with a Speex encoder state:
SpeexBits bits;
void *enc_state;
The two are initialized by:
speex_bits_init(&bits);
enc_state = speex_encoder_init(&speex_nb_mode);
For wideband coding, speex_nb_mode will be replaced by speex_wb_mode. In most cases, you will need to know the frame size used at the sampling rate you are using. You can get that value in the frame_size variable (expressed in samples, not bytes) with:
speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);
In practice, frame_size will correspond to 20 ms when using 8, 16, or 32 kHz sampling rate. There are many parameters that can be set for the Speex encoder, but the most useful one is the quality parameter that controls the quality vs bit-rate tradeoff.
This is set by:
speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &quality);
where quality is an integer value ranging from 0 to 10 (inclusive). The mapping between quality and bit-rate is described in Table 9.2 for narrowband.
Once the initialization is done, for every input frame:
speex_bits_reset(&bits);
speex_encode_int(enc_state, input_frame, &bits);
nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);
where input_frame is a (short *) pointing to the beginning of a speech frame, byte_ptr is a (char *) where the encoded frame will be written, MAX_NB_BYTES is the maximum number of bytes that can be written to byte_ptr without causing an overflow, and nbBytes is the number of bytes actually written to byte_ptr (the encoded size in bytes). Before calling speex_bits_write, it is possible to find the number of bytes that need to be written by calling speex_bits_nbytes(&bits), which returns a number of bytes.
It is still possible to use the speex_encode() function, which takes a (float *) for the audio. However, this would make an eventual port to an FPU-less platform (like ARM) more complicated. Internally, speex_encode() and speex_encode_int() are processed in the same way. Whether the encoder uses the fixed-point version is only decided by the compile-time flags, not at the API level.
After you’re done with the encoding, free all resources with:
speex_bits_destroy(&bits);
speex_encoder_destroy(enc_state);
That’s about it for the encoder.
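The relation between sampling rate and frame size stated above (20 ms frames at 8, 16 and 32 kHz) can be checked with a little arithmetic. The sketch below is an illustration only: frame_samples and frame_pcm_bytes are hypothetical helper names, and in real code the value returned by SPEEX_GET_FRAME_SIZE should be treated as authoritative.

```c
#include <assert.h>
#include <stddef.h>

/* Frame duration taken from the text: 20 ms at 8, 16 and 32 kHz. */
#define FRAME_DURATION_MS 20

/* Number of samples in one frame at the given sampling rate. */
static int frame_samples(int sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return sample_rate_hz * FRAME_DURATION_MS / 1000;
}

/* Size in bytes of the (short *) buffer that holds one input frame
 * for speex_encode_int(). */
static size_t frame_pcm_bytes(int sample_rate_hz)
{
    return (size_t)frame_samples(sample_rate_hz) * sizeof(short);
}
```

For example, frame_samples(8000) gives 160 samples, so a narrowband input_frame needs room for 160 short values.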



5.2 Decoding

In order to decode speech using Speex, you first need to:
#include <speex/speex.h>
You also need to declare a Speex bit-packing struct
SpeexBits bits;
and a Speex decoder state
void *dec_state;
The two are initialized by:
speex_bits_init(&bits);
dec_state = speex_decoder_init(&speex_nb_mode);
For wideband decoding, speex_nb_mode will be replaced by speex_wb_mode. If you need to obtain the size of the frames that will be used by the decoder, you can get that value in the frame_size variable (expressed in samples, not bytes) with:
speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);
There is also a parameter that can be set for the decoder: whether or not to use a perceptual enhancer. This can be set by:
speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
where enh is an int with value 0 to have the enhancer disabled and 1 to have it enabled. As of 1.2-beta1, the default is now to enable the enhancer.
Again, once the decoder initialization is done, for every input frame:
speex_bits_read_from(&bits, input_bytes, nbBytes);
speex_decode_int(dec_state, &bits, output_frame);
where input_bytes is a (char *) containing the bit-stream data received for a frame, nbBytes is the size (in bytes) of that bit-stream, and output_frame is a (short *) that points to the area where the decoded speech frame will be written. A NULL value as the second argument of speex_decode_int indicates that we don’t have the bits for the current frame. When a frame is lost, the Speex decoder will do its best to "guess" the correct signal.
As for the encoder, the speex_decode() function can still be used, with a (float *) as the output for the audio. After you’re done with the decoding, free all resources with:
speex_bits_destroy(&bits);
speex_decoder_destroy(dec_state);


5.3 Codec Options (speex_*_ctl)

The Speex encoder and decoder support many options and requests that can be accessed through the speex_encoder_ctl and speex_decoder_ctl functions. These functions are similar to the ioctl system call and their prototypes are:
void speex_encoder_ctl(void *encoder, int request, void *ptr);
void speex_decoder_ctl(void *decoder, int request, void *ptr);
Although these functions exist, the defaults are usually good for many applications, and optional settings should only be used when one understands them and knows that they are needed. A common error is to attempt to set many unnecessary settings.
Here is a list of the values allowed for the requests. Some only apply to the encoder or the decoder. Because the last argument is of type void *, the _ctl() functions are not type safe, and should thus be used with care. The type spx_int32_t is the same as the C99 int32_t type.
SPEEX_SET_ENH‡ Set perceptual enhancer to on (1) or off (0) (spx_int32_t, default is on)
SPEEX_GET_ENH‡ Get perceptual enhancer status (spx_int32_t)
SPEEX_GET_FRAME_SIZE Get the number of samples per frame for the current mode (spx_int32_t)
SPEEX_SET_QUALITY† Set the encoder speech quality (spx_int32_t from 0 to 10, default is 8)
SPEEX_GET_QUALITY† Get the current encoder speech quality (spx_int32_t from 0 to 10)
SPEEX_SET_MODE† Set the mode number, as specified in the RTP spec (spx_int32_t)
SPEEX_GET_MODE† Get the current mode number, as specified in the RTP spec (spx_int32_t)
SPEEX_SET_VBR† Set variable bit-rate (VBR) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_VBR† Get variable bit-rate (VBR) status (spx_int32_t)
SPEEX_SET_VBR_QUALITY† Set the encoder VBR speech quality (float 0.0 to 10.0, default is 8.0)
SPEEX_GET_VBR_QUALITY† Get the current encoder VBR speech quality (float 0 to 10)
SPEEX_SET_COMPLEXITY† Set the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)
SPEEX_GET_COMPLEXITY† Get the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)
SPEEX_SET_BITRATE† Set the bit-rate to use the closest value not exceeding the parameter (spx_int32_t in bits per second)
SPEEX_GET_BITRATE Get the current bit-rate in use (spx_int32_t in bits per second)
SPEEX_SET_SAMPLING_RATE Set real sampling rate (spx_int32_t in Hz)
SPEEX_GET_SAMPLING_RATE Get real sampling rate (spx_int32_t in Hz)
SPEEX_RESET_STATE Reset the encoder/decoder state to its original state, clearing all memories (no argument)
SPEEX_SET_VAD† Set voice activity detection (VAD) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_VAD† Get voice activity detection (VAD) status (spx_int32_t)
SPEEX_SET_DTX† Set discontinuous transmission (DTX) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_DTX† Get discontinuous transmission (DTX) status (spx_int32_t)
SPEEX_SET_ABR† Set average bit-rate (ABR) to a value n in bits per second (spx_int32_t in bits per second)
SPEEX_GET_ABR† Get average bit-rate (ABR) setting (spx_int32_t in bits per second)
SPEEX_SET_PLC_TUNING† Tell the encoder to optimize encoding for a certain percentage of packet loss (spx_int32_t in percent)
SPEEX_GET_PLC_TUNING† Get the current tuning of the encoder for PLC (spx_int32_t in percent)
SPEEX_SET_VBR_MAX_BITRATE† Set the maximum bit-rate allowed in VBR operation (spx_int32_t in bits per second)
SPEEX_GET_VBR_MAX_BITRATE† Get the current maximum bit-rate allowed in VBR operation (spx_int32_t in bits per second)
SPEEX_SET_HIGHPASS Set the high-pass filter on (1) or off (0) (spx_int32_t, default is on)
SPEEX_GET_HIGHPASS Get the current high-pass filter status (spx_int32_t)
† applies only to the encoder
‡ applies only to the decoder
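The bit-rate requests in the list above work in bits per second, while speex_bits_write works in bytes per packet. The sketch below shows the arithmetic that links the two, assuming constant bit-rate operation and the 20 ms frame duration mentioned in section 5.1. The helper names bits_per_frame and bytes_per_frame are hypothetical, and with VBR or DTX enabled the real size varies from frame to frame, so speex_bits_nbytes remains the authoritative source.

```c
#include <assert.h>

/* Frame duration assumed from section 5.1 (20 ms per frame). */
#define FRAME_DURATION_MS 20

/* Bits produced per frame at a constant bit-rate. */
static int bits_per_frame(int bitrate_bps)
{
    assert(bitrate_bps >= 0);
    return bitrate_bps * FRAME_DURATION_MS / 1000;
}

/* Bytes needed to hold one frame, rounded up to a whole byte. */
static int bytes_per_frame(int bitrate_bps)
{
    return (bits_per_frame(bitrate_bps) + 7) / 8;
}
```

For instance, 15000 bits per second gives 300 bits per frame, which occupies 38 bytes once rounded up; an estimate like this is useful for choosing a safe MAX_NB_BYTES.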





5.4 Mode queries

Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Since modes are read-only, it is only possible to get information about a particular mode. The function used to do that is:
void speex_mode_query(SpeexMode *mode, int request, void *ptr);
The admissible values for request are (unless otherwise noted, the values are returned through ptr):
SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode
SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through ptr (integer in bps).





5.5 Packing and in-band signalling

Sometimes it is desirable to pack more than one frame per packet (or other basic unit of storage). The proper way to do it is to call speex_encode N times before writing the stream with speex_bits_write. In cases where the number of frames is not determined by an out-of-band mechanism, it is possible to include a terminator code. That terminator consists of the code 15 (decimal) encoded with 5 bits, as shown in Table 9.2. Note that as of version 1.0.2, calling speex_bits_write automatically inserts the terminator so as to fill the last byte. This doesn’t involve any overhead and makes sure Speex can always detect when there are no more frames in a packet.
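How a terminator can fill the last byte at no extra cost is easier to see at the bit level. The sketch below does not call libspeex; put_bits and pad_with_terminator are hypothetical helpers. It assumes bits are packed most significant bit first and that the padding is a single 0 bit followed by 1 bits up to the byte boundary, so that whenever five or more padding bits are present, the first five read 01111, which is 15 in decimal. Both points are assumptions about the library's internal layout made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Append the low nbits bits of value to buf, most significant bit
 * first. buf must be zero-initialized; *bitpos counts bits written. */
static void put_bits(unsigned char *buf, size_t *bitpos,
                     unsigned value, int nbits)
{
    int i;
    assert(nbits >= 0 && nbits <= 16);
    for (i = nbits - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            buf[*bitpos / 8] |= (unsigned char)(0x80u >> (*bitpos % 8));
        (*bitpos)++;
    }
}

/* Pad to the next byte boundary with a 0 bit followed by 1 bits
 * (assumed layout). Does nothing if already byte-aligned. */
static void pad_with_terminator(unsigned char *buf, size_t *bitpos)
{
    if (*bitpos % 8 != 0)
        put_bits(buf, bitpos, 0, 1);
    while (*bitpos % 8 != 0)
        put_bits(buf, bitpos, 1, 1);
}
```

Because the padding only uses bits that would otherwise be wasted in the last byte, the packet size in bytes does not change.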



It is also possible to send in-band “messages” to the other side. All these messages are encoded as “pseudo-frames” of mode 14 which contain a 4-bit message type code, followed by the message. Table 5.1 lists the available codes, their meaning and the size of the message that follows. Most of these messages are requests that are sent to the encoder or decoder on the other end, which is free to comply or ignore them. By default, all in-band messages are ignored.
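The layout of such a pseudo-frame (5-bit mode identifier 14, 4-bit message type code, then the message itself) can be sketched without libspeex. In the code below, put_bits and write_inband_message are hypothetical helpers, bits are assumed to be packed most significant bit first, and the message code and payload width passed by the caller must come from Table 5.1; any values used to try it out are placeholders rather than entries of that table.

```c
#include <assert.h>
#include <stddef.h>

#define INBAND_MODE_ID 14 /* pseudo-frame mode for in-band messages */

/* Append the low nbits bits of value to buf, most significant bit
 * first. buf must be zero-initialized; *bitpos counts bits written. */
static void put_bits(unsigned char *buf, size_t *bitpos,
                     unsigned value, int nbits)
{
    int i;
    assert(nbits >= 0 && nbits <= 16);
    for (i = nbits - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            buf[*bitpos / 8] |= (unsigned char)(0x80u >> (*bitpos % 8));
        (*bitpos)++;
    }
}

/* Write one in-band message: mode 14 in 5 bits, the message type
 * code in 4 bits, then payload_bits bits of payload. */
static void write_inband_message(unsigned char *buf, size_t *bitpos,
                                 unsigned code, unsigned payload,
                                 int payload_bits)
{
    assert(code < 16);
    put_bits(buf, bitpos, INBAND_MODE_ID, 5);
    put_bits(buf, bitpos, code, 4);
    put_bits(buf, bitpos, payload, payload_bits);
}
```

When working on a real SpeexBits struct, the same three fields would be appended with speex_bits_pack(&bits, value, nbits) instead of the hand-written put_bits.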


Table 5.1: In-band signalling codes

Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so that the decoder can skip it if it doesn’t know how to interpret it.
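The skipping behaviour follows directly from the size field. The sketch below follows the layout as described in the paragraph above (mode 13 in 5 bits, then the message size in bytes in 5 bits, then the message); it has not been checked against the library's own handler. get_bits and skip_custom_message are hypothetical helpers, and bits are assumed to be packed most significant bit first.

```c
#include <assert.h>
#include <stddef.h>

/* Read nbits bits from buf starting at *bitpos, most significant
 * bit first, and advance *bitpos. */
static unsigned get_bits(const unsigned char *buf, size_t *bitpos,
                         int nbits)
{
    unsigned value = 0;
    int i;
    assert(nbits >= 0 && nbits <= 16);
    for (i = 0; i < nbits; i++) {
        unsigned bit = (buf[*bitpos / 8] >> (7 - *bitpos % 8)) & 1u;
        value = (value << 1) | bit;
        (*bitpos)++;
    }
    return value;
}

/* Called after the 5-bit mode identifier 13 has been read: read the
 * 5-bit size (in bytes) and skip the message body.
 * Returns 0 on success, -1 if the message would run past total_bits. */
static int skip_custom_message(const unsigned char *buf, size_t *bitpos,
                               size_t total_bits)
{
    size_t size_bytes;
    if (*bitpos + 5 > total_bits)
        return -1;
    size_bytes = get_bits(buf, bitpos, 5);
    if (*bitpos + 8 * size_bytes > total_bits)
        return -1;
    *bitpos += 8 * size_bytes;
    return 0;
}
```

The bounds checks matter here because the size field comes from the network and should not be trusted to stay inside the received packet.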





Figure 9.2: Analysis-by-synthesis closed-loop optimization on a sub-frame.


Table 9.2: Quality versus bit-rate




