FFmpeg Filters: Theory and Practice - Alibaba Cloud Developer Community

Preface

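FFmpeg filters are also called FFmpeg "滤镜" (lens effects) by some people. That name makes it sound as if they only apply to video, which is misleading, because audio can be filtered too. The FFmpeg source tree contains a directory named libavfilter, which can be built as a standalone library. What is it for? Filtering audio and video.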

For example, if I have an MP4 file and want to scale it down by half and write out a new MP4, the component that does the scaling is libavfilter.

This article covers the theory behind FFmpeg filters and then puts it into practice with code.

I. DirectShow

Before studying FFmpeg filters, it is worth getting to know DirectShow first, because it makes the FFmpeg filter model easier to understand.

1. Introduction

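DirectShow (DShow for short) is a streaming media framework for the Windows platform that provides high-quality capture and playback of multimedia streams. It supports a wide variety of media file formats, including ASF, MPEG, AVI, MP3 and WAV, and can capture multimedia streams through WDM drivers or the older VfW drivers.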

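DirectShow greatly simplifies media playback, format conversion and capture. At the same time, it provides a low-level stream-control framework for custom solutions, so users can create their own DirectShow components, for example to support new file formats.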

DirectShow is designed for C++. Microsoft does not provide a managed API for DirectShow.

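DirectShow is based on the Component Object Model (COM), so writing a DirectShow application requires knowing how to write COM client code. Most applications do not need to implement their own COM objects, because DirectShow supplies most of the components you need; but if you want to extend it with your own DirectShow component, you must implement that component as a COM object.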

Typical applications written with DirectShow include DVD players, video editors, AVI-to-ASF converters, MP3 players and digital video capture applications.

2. Basic program structure

The basic structure of a DirectShow program is shown in the figure below:

3. Architecture

The DirectShow architecture is shown in the figure below:

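DirectShow sits in the application layer. It uses a model called the Filter Graph to manage the whole data-processing flow; each functional module that takes part in processing the data is called a Filter, and the Filters are connected in a fixed order inside the Filter Graph to form a "pipeline" that works together. (In other words, the Filter Graph is a container for Filters.)
By function, Filters fall roughly into three categories: Source Filters, Transform Filters and Rendering (sink) Filters.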

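Source Filters obtain the data, which can come from a file, the Internet, or a capture card or digital camera attached to the computer, and pass it downstream;
Transform Filters convert the data format and pass the data on;
Rendering Filters handle the data's final destination: the data can be sent to the sound card or graphics card for presentation, or written to a file for storage.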
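The three roles can be sketched in C as a toy push pipeline. This is not DirectShow's COM API (real filters implement interfaces such as IBaseFilter and IPin); the Filter struct, deliver() and the other names below are hypothetical, chosen only to show a source pushing samples through a transform into a renderer.

```c
#include <stdio.h>

/* Hypothetical toy model: a "sample" is just an int. */
typedef struct Filter Filter;
struct Filter {
    const char *name;
    /* Called when an upstream filter delivers a sample to this filter. */
    void (*receive)(Filter *self, int sample);
    Filter *downstream;  /* filter connected to our output pin, or NULL */
    int last_rendered;   /* used only by the renderer */
};

/* Push a sample out of our output pin to whatever is connected to it. */
static void deliver(Filter *self, int sample)
{
    if (self->downstream)
        self->downstream->receive(self->downstream, sample);
}

/* Transform filter: doubles each sample, then passes it on. */
static void transform_receive(Filter *self, int sample)
{
    deliver(self, sample * 2);
}

/* Rendering (sink) filter: end of the pipeline, it just consumes the sample. */
static void renderer_receive(Filter *self, int sample)
{
    self->last_rendered = sample;
    printf("%s rendered %d\n", self->name, sample);
}

/* Source filter: produces samples 1..count and pushes them downstream. */
static void run_source(Filter *source, int count)
{
    for (int i = 1; i <= count; i++)
        deliver(source, i);
}
```

Wiring source -> transform -> renderer and calling run_source(&src, 3) makes the renderer receive 2, 4 and 6; no filter knows more than the one it is connected to.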

A Filter has one or more Pins, and Filters are connected to each other through their Pins, as shown below:

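In a DirectShow system, the part we see is our own application. The application builds the Filter Graph it needs for its purpose and then controls the whole processing flow through the Filter Graph Manager. While the Filter Graph is running, DirectShow receives various events and forwards them to the application as messages. This is how the application and DirectShow interact.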

A Filter is the most basic software building block; it usually performs one operation on the multimedia stream. In graph-theory terms, a filter graph is a directed, acyclic, disconnected graph. It is directed because data flows between filters in a predetermined direction; acyclic because no path leads from a filter back to itself; and disconnected in the sense that not every filter can reach every other filter.
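The "acyclic" property can be checked with an ordinary depth-first search. A minimal sketch, assuming the graph is stored as a hypothetical adjacency matrix in which edges[a][b] is non-zero when an output pin of filter a is connected to an input pin of filter b:

```c
#include <stdbool.h>

#define MAX_FILTERS 8

/* DFS colours: unvisited, on the current path, fully explored. */
enum { WHITE, GRAY, BLACK };

/* Returns true if a cycle is reachable from filter u. */
static bool dfs_has_cycle(int n, int edges[][MAX_FILTERS], int color[], int u)
{
    color[u] = GRAY;  /* u is on the current DFS path */
    for (int v = 0; v < n; v++) {
        if (!edges[u][v])
            continue;
        if (color[v] == GRAY)
            return true;  /* back edge: data could flow back to a filter on this path */
        if (color[v] == WHITE && dfs_has_cycle(n, edges, color, v))
            return true;
    }
    color[u] = BLACK;  /* fully explored */
    return false;
}

/* A valid filter graph must not contain any cycle. */
static bool graph_is_acyclic(int n, int edges[][MAX_FILTERS])
{
    int color[MAX_FILTERS] = {WHITE};
    for (int u = 0; u < n; u++)
        if (color[u] == WHITE && dfs_has_cycle(n, edges, color, u))
            return false;
    return true;
}
```

A source -> transform -> renderer chain passes the check; adding a connection from the renderer back to the source makes it fail.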

II. Filters

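In multimedia processing, a filter is a software tool that modifies the content of the input before it is encoded into the output file, for example flipping, rotating or scaling the video.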

Syntax: [input_link_label1]… filter_name=parameters [output_link_label1]…

1. Video filters: -vf

For example, rotate input.mp4 90 degrees clockwise:

ffplay -i input.mp4 -vf transpose=1
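What transpose=1 does to each plane can be sketched as a coordinate mapping. This assumes a single 8-bit plane stored row by row with no padding (real AVFrame planes carry a linesize); rotate_clockwise() is a hypothetical helper, not an FFmpeg function.

```c
#include <stdint.h>

/* Rotate a w x h single-channel plane 90 degrees clockwise.
 * The result is h x w: source pixel (x, y) lands at (h - 1 - y, x). */
static void rotate_clockwise(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int dst_w = h;  /* output width equals input height */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[x * dst_w + (h - 1 - y)] = src[y * w + x];
}
```

With the 3x2 input {1,2,3 / 4,5,6} the output is the 2x3 plane {4,1 / 5,2 / 6,3}: the bottom-left pixel ends up in the top-left corner.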

Flip input.mp4 horizontally (left to right):

ffplay -i input.mp4 -vf hflip
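Per plane, hflip is an even simpler mapping. A minimal sketch under the same assumption of one unpadded 8-bit plane (hflip_plane() is hypothetical):

```c
#include <stdint.h>

/* Mirror a w x h single-channel plane left to right:
 * output pixel (x, y) takes source pixel (w - 1 - x, y); the size is unchanged. */
static void hflip_plane(const uint8_t *src, int w, int h, uint8_t *dst)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * w + x] = src[y * w + (w - 1 - x)];
}
```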

2. Audio filters: -af

Slow playback down so the audio plays at 50% of its original speed:

ffplay input.mp3 -af atempo=0.5
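atempo changes the tempo without changing the pitch, so at 0.5 the audio lasts twice as long. It does not accept values below 0.5, so going slower means chaining several instances. The sketch below builds such a chain; build_atempo_chain() is hypothetical, and capping each factor at 2.0 is our own choice (the FFmpeg documentation notes that tempos above 2 skip samples rather than blend them).

```c
#include <stdio.h>
#include <string.h>

/* Build an atempo filterchain for an arbitrary positive tempo factor,
 * keeping every single atempo instance within [0.5, 2.0].
 * Returns 0 on success, -1 for a non-positive factor or a too-small buffer. */
static int build_atempo_chain(double tempo, char *buf, size_t size)
{
    size_t used = 0;
    if (tempo <= 0.0 || size == 0)
        return -1;
    buf[0] = '\0';
    while (tempo < 0.5 || tempo > 2.0) {
        double step = tempo < 0.5 ? 0.5 : 2.0;
        int n = snprintf(buf + used, size - used, "atempo=%g,", step);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
        tempo /= step;  /* factor still left to apply */
    }
    int n = snprintf(buf + used, size - used, "atempo=%g", tempo);
    return (n < 0 || (size_t)n >= size - used) ? -1 : 0;
}
```

For a quarter-speed playback, build_atempo_chain(0.25, ...) produces "atempo=0.5,atempo=0.5", which can be passed to -af.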

3. Filterchain

Filterchain = a comma-separated group of filters

Syntax: "filter1,filter2,filter3,…filterN-2,filterN-1,filterN"

Rotate 90 degrees clockwise, then flip horizontally:

ffplay -i input.mp4 -vf transpose=1,hflip
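Filters in a chain run left to right, each output feeding the next filter, so the order matters. The sketch below runs transpose=1 followed by hflip using hypothetical helpers on one unpadded 8-bit plane; by this arithmetic the pair amounts to a plain matrix transpose, the same mapping as transpose=0 (cclock_flip).

```c
#include <stdint.h>
#include <string.h>

/* A toy "filter": maps a w x h plane into dst and updates the size. */
typedef void (*plane_filter)(const uint8_t *src, int *w, int *h, uint8_t *dst);

/* transpose=1: 90 degrees clockwise; output is h x w. */
static void f_transpose_clock(const uint8_t *src, int *w, int *h, uint8_t *dst)
{
    int W = *w, H = *h;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            dst[x * H + (H - 1 - y)] = src[y * W + x];
    *w = H;
    *h = W;
}

/* hflip: mirror left to right; size unchanged. */
static void f_hflip(const uint8_t *src, int *w, int *h, uint8_t *dst)
{
    int W = *w, H = *h;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            dst[y * W + x] = src[y * W + (W - 1 - x)];
}

/* Run the filters in order, like "filter1,filter2,...".
 * buf_a holds the input; both buffers must fit every intermediate plane.
 * Returns whichever buffer holds the final result. */
static uint8_t *run_chain(const plane_filter *chain, int n,
                          uint8_t *buf_a, uint8_t *buf_b, int *w, int *h)
{
    uint8_t *in = buf_a, *out = buf_b;
    for (int i = 0; i < n; i++) {
        chain[i](in, w, h, out);
        uint8_t *tmp = in;  /* this filter's output is the next one's input */
        in = out;
        out = tmp;
    }
    return in;
}
```

Feeding the 3x2 plane {1,2,3 / 4,5,6} through {f_transpose_clock, f_hflip} yields the 2x3 plane {1,4 / 2,5 / 3,6}.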

4. Filtergraph

Let's start with an example that makes a mirror-symmetric video; the final result looks like this:

Step 1: double the width of the source video

ffmpeg -i input.mp4 -t 10 -vf pad=2*iw output.mp4

Step 2: flip the source video horizontally

ffmpeg -i input.mp4 -t 10 -vf hflip output2.mp4

Step 3: overlay the flipped video on output.mp4

ffmpeg -i output.mp4 -i output2.mp4 -filter_complex overlay=w compare.mp4
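Per row of pixels, the three steps amount to keeping the original on the left and putting its mirror image in the right half. A minimal sketch on one unpadded 8-bit plane (mirror_side_by_side() is hypothetical; real frames have several planes and a linesize):

```c
#include <stdint.h>
#include <string.h>

/* dst must hold (2 * w) * h bytes. */
static void mirror_side_by_side(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int out_w = 2 * w;  /* step 1: pad=2*iw doubles the width, height stays ih */
    for (int y = 0; y < h; y++) {
        const uint8_t *in_row = src + y * w;
        uint8_t *out_row = dst + y * out_w;
        /* step 1: pad places the original at x=0, y=0 */
        memcpy(out_row, in_row, (size_t)w);
        /* steps 2 and 3: hflip the source and overlay it at x = w (overlay=w).
         * The overlay covers the whole padded area, so the pad colour never shows. */
        for (int x = 0; x < w; x++)
            out_row[w + x] = in_row[w - 1 - x];
    }
}
```

A row {1, 2, 3} becomes {1, 2, 3, 3, 2, 1}.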

Next, we use a filtergraph to get the same result as the three commands above.

① Basic syntax

Filtergraph = a semicolon-separated group of filterchains
“filterchain1;filterchain2;…filterchainN-1;filterchainN”

With a filtergraph that uses link labels, the three steps above take just one command:

ffmpeg -i test.mp4 -t 10 -vf "split[a][b];[a]pad=2*iw[1];[b]hflip[2];[1][2]overlay=w" output.mp4

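The split filter makes two copies of its input stream and labels them [a] and [b]
[a] is the input of the pad filter, which doubles the width and outputs to [1]
[b] is the input of the hflip filter, which flips the video horizontally and outputs to [2]
The overlay filter places [2] on top of [1] at x = w (the width of the overlaid video), i.e. right next to the original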
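The description string is just chains joined by ";", with the filters inside each chain joined by ","; a small hypothetical join_parts() helper can assemble either level:

```c
#include <stdio.h>
#include <string.h>

/* Join n parts with sep into buf: "," between filters of a filterchain,
 * ";" between filterchains of a filtergraph.
 * Returns 0 on success, -1 if buf is too small. */
static int join_parts(const char *const *parts, int n, const char *sep,
                      char *buf, size_t size)
{
    size_t used = 0;
    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int k = snprintf(buf + used, size - used, "%s%s", i ? sep : "", parts[i]);
        if (k < 0 || (size_t)k >= size - used)
            return -1;
        used += (size_t)k;
    }
    return 0;
}
```

Joining "split[a][b]", "[a]pad=2*iw[1]", "[b]hflip[2]" and "[1][2]overlay=w" with ";" reproduces the string passed to -vf in the command above.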

② Filtergraph types

Simple: one to one (exactly one input and one output, of the same type)
Complex: many to one, one to many, many to many
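The dividing line, as the ffmpeg documentation states it, fits in a one-line predicate. The MediaType enum and is_simple_filtergraph() are hypothetical:

```c
#include <stdbool.h>

typedef enum { MEDIA_VIDEO, MEDIA_AUDIO } MediaType;

/* Simple: exactly one input and one output, both of the same type (usable with -vf/-af).
 * Anything else is complex and needs -filter_complex. */
static bool is_simple_filtergraph(int nb_inputs, MediaType in_type,
                                  int nb_outputs, MediaType out_type)
{
    return nb_inputs == 1 && nb_outputs == 1 && in_type == out_type;
}
```

The mirror graph above has one video input and one video output, so it counts as simple and works with -vf even though it contains split and overlay internally; the step-3 command has two inputs and therefore needs -filter_complex.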

Processing flow of a simple filtergraph:

Processing flow of a complex filtergraph:

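The diagrams show the complex filtergraph flow with two fewer steps than the simple one, so we consider it more efficient and suggest using complex filtergraphs where possible.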

5. Relationships between the structures

The main structures involved in filtering are:

FilterGraph, AVFilterGraph
InputFilter, InputStream, OutputFilter, OutputStream
AVFilter，AVFilterContext
AVFilterLink
AVFilterPad

Their relationships are shown in the figure below:

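As the figure shows, FFmpeg's filter-related structures are organized into three layers. (FilterGraph, InputFilter, OutputFilter, InputStream and OutputStream are internal to the ffmpeg command-line tool in fftools; the AV-prefixed structures belong to libavfilter.)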

The filtergraph layer

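It consists of the FilterGraph and AVFilterGraph structures.
FilterGraph:

contains the InputFilter(s), which identify the first filter(s) of the whole graph and the corresponding InputStream, serving as the input of the graph;
contains the OutputFilter(s), which identify the last filter(s) of the whole graph and the corresponding OutputStream, serving as the output of the graph;
contains an AVFilterGraph instance, which holds the filters that make up this graph.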

The filterchain layer

It consists of AVFilter, AVFilterContext, AVFilterLink and AVFilterPad, where an AVFilterContext is an instance of an AVFilter.

Filters are connected through AVFilterLink: they are not linked to each other directly, but always through an AVFilterLink.
Once AVFilterContexts are connected by AVFilterLinks, they form a filterchain.
The AVFilterPads between an AVFilterContext and an AVFilterLink are connected directly: a filter's output pad is the srcpad of its downstream AVFilterLink, and its input pad is the dstpad of its upstream AVFilterLink.

The filter layer

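It consists of AVFilterContext and AVFilterPad, where AVFilterContext is the filter entity that actually processes the data.
AVFilterPad carries the callbacks used between AVFilterContexts:

The outputs[0] pointer of the first AVFilterContext points to the first AVFilterLink, and that AVFilterLink's dst pointer points to the second AVFilterContext.
When the first filter calls outputs[0]->dstpad->filter_frame(outputs[0], frame), it hands a frame it has finished processing to the filter_frame callback on the second filter's input pad, and the second filter's own filter_frame() implementation then processes the data. (This is a simplified picture: inside libavfilter, filters call ff_filter_frame(link, frame), and current versions queue the frame on the link instead of invoking the callback directly.)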
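The callback chain can be modeled in a few lines of C. The Ctx, Link and Pad types below are hypothetical, heavily simplified stand-ins for AVFilterContext, AVFilterLink and AVFilterPad (a frame is just an int), and the direct call follows the simplified description rather than libavfilter's actual frame queueing.

```c
#include <stddef.h>

typedef struct Ctx Ctx;
typedef struct Link Link;

typedef struct Pad {
    /* Input-pad callback: receives a frame arriving over link. */
    int (*filter_frame)(Link *link, int frame);
} Pad;

struct Link {
    Ctx *src, *dst;  /* upstream and downstream filter instances */
    Pad *dstpad;     /* input pad of dst that receives the frames */
};

struct Ctx {
    const char *name;
    Pad input_pad;
    Link *outputs[1];  /* outputs[0]: link to the next filter, or NULL */
    int last_frame;    /* what this filter last produced */
};

/* Like outputs[0]->dstpad->filter_frame(outputs[0], frame). */
static int send_downstream(Ctx *ctx, int frame)
{
    Link *out = ctx->outputs[0];
    return out ? out->dstpad->filter_frame(out, frame) : 0;
}

/* A filter that adds one to each frame and forwards it. */
static int addone_filter_frame(Link *inlink, int frame)
{
    Ctx *self = inlink->dst;
    self->last_frame = frame + 1;
    return send_downstream(self, self->last_frame);
}

/* A filter that doubles each frame and forwards it. */
static int double_filter_frame(Link *inlink, int frame)
{
    Ctx *self = inlink->dst;
    self->last_frame = frame * 2;
    return send_downstream(self, self->last_frame);
}
```

Linking src -> first -> second and sending frame 5 from src gives 6 in the first filter and 12 in the second.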

III. Hands-on filter example

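The code below decodes video frames, feeds them into a filter graph, and writes the filtered frames to a file. The filter description string filter_descr specifies the filtering; this example uses the scale and hflip filters to resize the video and flip it horizontally. The program writes the processed frames to the file in YUV420P format.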

1. Example source code

/**
 * @file
 * API example for decoding and filtering
 * @example filtering_video.c
 */
#define _XOPEN_SOURCE 600 /* for usleep */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavfilter/buffersink.h>
#include <libavfilter/buffersrc.h>
#include <libavutil/opt.h>
const char *filter_descr = "scale=640:480,hflip";
/* other way:
   scale=78:24 [scl]; [scl] transpose=cclock // assumes "[in]" and "[out]" to be input output pads respectively
 */
static AVFormatContext *fmt_ctx;
static AVCodecContext *dec_ctx;
AVFilterContext *buffersink_ctx;
AVFilterContext *buffersrc_ctx;
AVFilterGraph *filter_graph;
static int video_stream_index = -1;
static int64_t last_pts = AV_NOPTS_VALUE;
static int open_input_file(const char *filename)
{
    int ret;
    const AVCodec *dec;  /* const: av_find_best_stream() takes const AVCodec ** since FFmpeg 5.0 */
    if ((ret = avformat_open_input(&fmt_ctx, filename, NULL, NULL)) < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot open input file\n");
        return ret;
    }
    if ((ret = avformat_find_stream_info(fmt_ctx, NULL)) < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot find stream information\n");
        return ret;
    }
    /* select the video stream */
    ret = av_find_best_stream(fmt_ctx, AVMEDIA_TYPE_VIDEO, -1, -1, &dec, 0);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot find a video stream in the input file\n");
        return ret;
    }
    video_stream_index = ret;
    /* create decoding context */
    dec_ctx = avcodec_alloc_context3(dec);
    if (!dec_ctx)
        return AVERROR(ENOMEM);
    if ((ret = avcodec_parameters_to_context(dec_ctx, fmt_ctx->streams[video_stream_index]->codecpar)) < 0)
        return ret;
    /* init the video decoder */
    if ((ret = avcodec_open2(dec_ctx, dec, NULL)) < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot open video decoder\n");
        return ret;
    }
    return 0;
}
static int init_filters(const char *filters_descr)
{
    char args[512];
    int ret = 0;
    const AVFilter *buffersrc  = avfilter_get_by_name("buffer");
    const AVFilter *buffersink = avfilter_get_by_name("buffersink");
    AVFilterInOut *outputs = avfilter_inout_alloc();
    AVFilterInOut *inputs  = avfilter_inout_alloc();
    AVRational time_base = fmt_ctx->streams[video_stream_index]->time_base;
    /* Frames are written out as raw yuv420p, so require exactly that format
     * at the sink; libavfilter inserts a conversion if the decoder differs. */
    enum AVPixelFormat pix_fmts[] = { AV_PIX_FMT_YUV420P, AV_PIX_FMT_NONE };
    filter_graph = avfilter_graph_alloc();
    if (!outputs || !inputs || !filter_graph) {
        ret = AVERROR(ENOMEM);
        goto end;
    }
    /* buffer video source: the decoded frames from the decoder will be inserted here. */
    snprintf(args, sizeof(args),
            "video_size=%dx%d:pix_fmt=%d:time_base=%d/%d:pixel_aspect=%d/%d",
            dec_ctx->width, dec_ctx->height, dec_ctx->pix_fmt,
            time_base.num, time_base.den,
            dec_ctx->sample_aspect_ratio.num, dec_ctx->sample_aspect_ratio.den);
    ret = avfilter_graph_create_filter(&buffersrc_ctx, buffersrc, "in",
                                       args, NULL, filter_graph);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot create buffer source\n");
        goto end;
    }
    /* buffer video sink: to terminate the filter chain. */
    ret = avfilter_graph_create_filter(&buffersink_ctx, buffersink, "out",
                                       NULL, NULL, filter_graph);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot create buffer sink\n");
        goto end;
    }
    ret = av_opt_set_int_list(buffersink_ctx, "pix_fmts", pix_fmts,
                              AV_PIX_FMT_NONE, AV_OPT_SEARCH_CHILDREN);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot set output pixel format\n");
        goto end;
    }
    /*
     * Set the endpoints for the filter graph. The filter_graph will
     * be linked to the graph described by filters_descr.
     */
    /*
     * The buffer source output must be connected to the input pad of
     * the first filter described by filters_descr; since the first
     * filter input label is not specified, it is set to "in" by
     * default.
     */
    outputs->name       = av_strdup("in");
    outputs->filter_ctx = buffersrc_ctx;
    outputs->pad_idx    = 0;
    outputs->next       = NULL;
    /*
     * The buffer sink input must be connected to the output pad of
     * the last filter described by filters_descr; since the last
     * filter output label is not specified, it is set to "out" by
     * default.
     */
    inputs->name       = av_strdup("out");
    inputs->filter_ctx = buffersink_ctx;
    inputs->pad_idx    = 0;
    inputs->next       = NULL;
    if ((ret = avfilter_graph_parse_ptr(filter_graph, filters_descr,
                                    &inputs, &outputs, NULL)) < 0)
        goto end;
    if ((ret = avfilter_graph_config(filter_graph, NULL)) < 0)
        goto end;
end:
    avfilter_inout_free(&inputs);
    avfilter_inout_free(&outputs);
    return ret;
}
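/// Write one frame to the file as raw yuv420p.
/// Planes are copied row by row because frame->linesize[i] may be larger
/// than the visible width (FFmpeg pads lines for alignment).
FILE * g__file_fd;
static void write_frame(const AVFrame *frame)
{
    static int printf_flag = 0;
    if(!printf_flag){
        printf_flag = 1;
        printf("frame widht=%d,frame height=%d\n",frame->width,frame->height);
        if(frame->format==AV_PIX_FMT_YUV420P){
            printf("format is yuv420p\n");
        }
        else{
            printf("format is = %d \n",frame->format);
        }
    }
    /* Y plane: width x height; U and V planes: ceil(width/2) x ceil(height/2) */
    for (int plane = 0; plane < 3; plane++) {
        int w = plane ? (frame->width + 1) / 2 : frame->width;
        int h = plane ? (frame->height + 1) / 2 : frame->height;
        for (int y = 0; y < h; y++)
            fwrite(frame->data[plane] + y * frame->linesize[plane], 1, w, g__file_fd);
    }
}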
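/* Pull every frame the filter graph can currently produce and write it out.
 * EAGAIN (needs more input) and EOF (graph fully flushed) both end the loop. */
static int drain_filtergraph(AVFrame *filt_frame)
{
    while (1) {
        int ret = av_buffersink_get_frame(buffersink_ctx, filt_frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return 0;
        if (ret < 0)
            return ret;
        write_frame(filt_frame);
        av_frame_unref(filt_frame);
    }
}
/* Send one packet to the decoder (NULL flushes it), push every decoded
 * frame into the filter graph and write out whatever comes out of it. */
static int decode_filter_write(AVPacket *pkt, AVFrame *frame, AVFrame *filt_frame)
{
    int ret = avcodec_send_packet(dec_ctx, pkt);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Error while sending a packet to the decoder\n");
        return ret;
    }
    while (1) {
        ret = avcodec_receive_frame(dec_ctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return 0;
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Error while receiving a frame from the decoder\n");
            return ret;
        }
        frame->pts = frame->best_effort_timestamp;  // use the best-effort timestamp as the frame's presentation timestamp
        /* push the decoded frame into the filtergraph */
        ret = av_buffersrc_add_frame_flags(buffersrc_ctx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);
        av_frame_unref(frame);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Error while feeding the filtergraph\n");
            return ret;
        }
        /* pull filtered frames from the filtergraph */
        if ((ret = drain_filtergraph(filt_frame)) < 0)
            return ret;
    }
}
int main(int argc, char **argv)
{
    int ret;
    AVPacket *packet = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    AVFrame *filt_frame = av_frame_alloc();
    if (!packet || !frame || !filt_frame) {
        fprintf(stderr, "Could not allocate frame or packet\n");
        exit(1);
    }
    /* "wb": the output is raw binary YUV data */
    g__file_fd = fopen("./debug/test.yuv", "wb");
    if (!g__file_fd) {
        perror("Could not open ./debug/test.yuv");
        exit(1);
    }
    if ((ret = open_input_file("./debug/test.flv")) < 0)
        goto end;
    if ((ret = init_filters(filter_descr)) < 0)
        goto end;
    /* read all packets */
    while ((ret = av_read_frame(fmt_ctx, packet)) >= 0) {
        if (packet->stream_index == video_stream_index)
            ret = decode_filter_write(packet, frame, filt_frame);
        av_packet_unref(packet);
        if (ret < 0)
            goto end;
    }
    /* end of input: flush the decoder, then signal EOF to the filtergraph and drain it */
    if ((ret = decode_filter_write(NULL, frame, filt_frame)) < 0)
        goto end;
    if ((ret = av_buffersrc_add_frame_flags(buffersrc_ctx, NULL, 0)) < 0)
        goto end;
    ret = drain_filtergraph(filt_frame);
end:
    avfilter_graph_free(&filter_graph);
    avcodec_free_context(&dec_ctx);
    avformat_close_input(&fmt_ctx);
    av_frame_free(&frame);
    av_frame_free(&filt_frame);
    av_packet_free(&packet);
    fclose(g__file_fd);
    if (ret < 0 && ret != AVERROR_EOF) {
        fprintf(stderr, "Error occurred: %s\n", av_err2str(ret));
        exit(1);
    }
    exit(0);
}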

2. Results

The program prints:

frame widht=640,frame height=480
format is yuv420p

The generated raw video file test.yuv is much larger than the compressed test.flv it was decoded from.
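The size difference is easy to estimate: a raw yuv420p frame stores a full-resolution Y plane plus U and V planes at half width and half height. A minimal sketch (yuv420p_frame_size() is a hypothetical helper):

```c
#include <stdint.h>

/* Bytes in one raw yuv420p frame: Y at full resolution, U and V subsampled
 * by 2 in each direction (rounded up for odd sizes). */
static int64_t yuv420p_frame_size(int width, int height)
{
    int64_t luma = (int64_t)width * height;
    int64_t chroma = (int64_t)((width + 1) / 2) * ((height + 1) / 2);
    return luma + 2 * chroma;
}
```

For the 640x480 output this gives 460,800 bytes per frame; at an assumed 25 fps, every second of video takes 11,520,000 bytes (about 11.5 MB), whereas the FLV stores compressed packets.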

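Playing the generated test.yuv with yuvplayer.exe gives the result below. Raw YUV has no header, so the player must be told the frame size (640x480) and the pixel format (YUV420P):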
