RTMP-Based Streaming for Smart Digital Humans and AI Digital Humans: A Technical Approach


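Technical Background

With the rise of smart digital humans and AI digital humans, more and more companies are building virtual avatars of simulated humans, synthesized with technologies such as holography and photorealistic digital characters. By combining "virtual avatar + voice interaction (TTS, ASR) + natural language understanding (NLU) + deep learning", they target scenarios such as digital customer service, virtual exhibition hall guides, smart cities, smart healthcare, and smart education. Visual, voice-driven human-computer interaction frees staff from basic, repetitive work, lowers operating costs, and improves the interaction experience.

A smart digital human with "warmth" is built from several capabilities, such as image recognition, speech recognition, and semantic understanding. This article focuses on how to encode and transmit such a digital human so that it reaches users with low latency and a good viewing experience.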

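Technical Implementation

Taking the Windows platform as an example, this article discusses real-time encoding and transmission of a smart digital human from a technical perspective. The overall architecture is shown in the figure below: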

[Figure: overall architecture (b0667adf2f4d4b39a1e7ab9e51864ae2.png)]

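On the left, Unity captures the video Texture and AudioClip data, which is encoded, packaged, and pushed to the server over RTMP. On the lower right, a player pulls the RTMP stream and plays it in real time. The author reports overall latency at the millisecond level.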


Video capture covers three source types: Texture data obtained from Unity, camera capture, and screen capture:

public void SelVideoPushType(int type)
{
    switch (type)
    {
        case 0:
            video_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_VIDEO_OPTION.NT_PB_E_VIDEO_OPTION_LAYER;    // capture the Unity window
            break;
        case 1:
            video_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_VIDEO_OPTION.NT_PB_E_VIDEO_OPTION_CAMERA;   // capture the camera
            break;
        case 2:
            video_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_VIDEO_OPTION.NT_PB_E_VIDEO_OPTION_SCREEN;   // capture the screen
            break;
        case 3:
            video_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_VIDEO_OPTION.NT_PB_E_VIDEO_OPTION_NO_VIDEO; // no video capture
            break;
    }
    Debug.Log("SelVideoPushType type: " + type + " video_push_type: " + video_push_type_);
}

For audio capture, the main supported sources are AudioClip audio, the microphone, the speaker, and a mix of two AudioClip streams:

public void SelAudioPushType(int type)
{
    switch (type)
    {
        case 0:
            audio_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_AUDIO_OPTION.NT_PB_E_AUDIO_OPTION_EXTERNAL_PCM_DATA;    // capture Unity audio
            break;
        case 1:
            audio_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_AUDIO_OPTION.NT_PB_E_AUDIO_OPTION_CAPTURE_MIC;  // capture the microphone
            break;
        case 2:
            audio_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_AUDIO_OPTION.NT_PB_E_AUDIO_OPTION_CAPTURE_SPEAKER;  // capture the speaker
            break;
        case 3:
            audio_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_AUDIO_OPTION.NT_PB_E_AUDIO_OPTION_TWO_EXTERNAL_PCM_MIXER;  // mix two Unity AudioClip streams
            break;
        case 4:
            audio_push_type_ = (uint)NTSmartPublisherDefine.NT_PB_E_AUDIO_OPTION.NT_PB_E_AUDIO_OPTION_NO_AUDIO;   // no audio capture
            break;
    }
    Debug.Log("SelAudioPushType type: " + type + " audio_push_type: " + audio_push_type_);
}

To make latency easy to measure, a simple date-and-time label is refreshed on the page:

// Read the current time once so that every field comes from the same instant
var now = DateTime.Now;
// Milliseconds use D3 so the field always has three digits
GameObject.Find("Canvas/Panel/LableText").GetComponent<Text>().text = string.Format(
    "{0:D2}:{1:D2}:{2:D2}:{3:D3} {4:D4}/{5:D2}/{6:D2}",
    now.Hour, now.Minute, now.Second, now.Millisecond,
    now.Year, now.Month, now.Day);
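One way to use this label is to photograph the publisher window and the player window side by side and subtract the two displayed times. The sketch below is only an illustration of that arithmetic. It assumes the label text has the form "HH:mm:ss:fff yyyy/MM/dd" with milliseconds zero-padded to three digits; the class and method names are hypothetical and not part of the SDK.

```csharp
using System;
using System.Globalization;

public static class LatencyProbe
{
    // Assumed label layout: hours:minutes:seconds:milliseconds, then the date.
    private const string Format = "HH:mm:ss:fff yyyy/MM/dd";

    // Parses one on-screen label into a DateTime.
    public static DateTime Parse(string text)
    {
        return DateTime.ParseExact(text, Format, CultureInfo.InvariantCulture);
    }

    // Latency = time shown by the publisher minus time shown by the player
    // in the same screenshot.
    public static double LatencyMs(string publisherText, string playerText)
    {
        return (Parse(publisherText) - Parse(playerText)).TotalMilliseconds;
    }
}
```

For example, `LatencyProbe.LatencyMs("12:00:01:250 2024/01/15", "12:00:01:130 2024/01/15")` gives the difference between the two readings in milliseconds. The result is only as precise as the label refresh rate and the display refresh rate allow.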

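For Unity window or Camera capture, the data can be read from the Texture as RGB pixel data and delivered to the wrapper layer, which handles encoding and transmission.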

if (texture_ == null || video_width_ != Screen.width || video_height_ != Screen.height)
  {
      Debug.Log("OnPostRender screen changed++ scr_width: " + Screen.width + " scr_height: " + Screen.height);
      if (screen_image_ != IntPtr.Zero)
      {
          Marshal.FreeHGlobal(screen_image_);
          screen_image_ = IntPtr.Zero;
      }
      if (texture_ != null)
      {
          UnityEngine.Object.Destroy(texture_);
          texture_ = null;
      }
      video_width_ = Screen.width;
      video_height_ = Screen.height;
      texture_ = new Texture2D(video_width_, video_height_, TextureFormat.BGRA32, false);
      screen_image_ = Marshal.AllocHGlobal(video_width_ * 4 * video_height_);
      Debug.Log("OnPostRender screen changed--");
      return;
  }
  texture_.ReadPixels(new Rect(0, 0, video_width_, video_height_), 0, 0, false);
  texture_.Apply();
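Unity stores Texture2D pixel rows bottom to top, while many encoders expect the top row first. The original does not say whether the wrapper flips the image itself, so the following is only a sketch of what a flip would look like if one were needed. It assumes a tightly packed 32-bit frame (stride = width * 4); `FrameUtil` is a hypothetical helper, not an SDK type.

```csharp
using System;

public static class FrameUtil
{
    // Returns a copy of a tightly packed 32-bit-per-pixel frame
    // with the row order reversed (bottom-up <-> top-down).
    public static byte[] FlipVertically(byte[] src, int width, int height)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        int stride = width * 4; // 4 bytes per pixel, no row padding assumed
        if (width < 0 || height < 0 || src.Length != stride * height)
            throw new ArgumentException("buffer size does not match width * height * 4");

        var dst = new byte[src.Length];
        for (int y = 0; y < height; y++)
        {
            // Row y of the source becomes row (height - 1 - y) of the result.
            Buffer.BlockCopy(src, y * stride, dst, (height - 1 - y) * stride, stride);
        }
        return dst;
    }
}
```

If the wrapper already exposes a flip option, using that would avoid the extra copy on every frame.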

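Camera and screen capture can be implemented directly in the wrapper layer. If a preview is needed, pass the data back to Unity and refresh a Texture on a RawImage in real time.

Preview through the wrapper layer: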

public bool StartPreview()
    {
        if(CheckPublisherHandleAvailable() == false)
            return false;
        video_preview_image_callback_ = new NT_PB_SDKVideoPreviewImageCallBack(SDKVideoPreviewImageCallBack);
        NTSmartPublisherSDK.NT_PB_SetVideoPreviewImageCallBack(publisher_handle_, (int)NTSmartPublisherDefine.NT_PB_E_IMAGE_FORMAT.NT_PB_E_IMAGE_FORMAT_RGB32, IntPtr.Zero, video_preview_image_callback_);
        if (NTBaseCodeDefine.NT_ERC_OK != NTSmartPublisherSDK.NT_PB_StartPreview(publisher_handle_, 0, IntPtr.Zero))
        {
            if (0 == publisher_handle_count_)
            {
                NTSmartPublisherSDK.NT_PB_Close(publisher_handle_);
                publisher_handle_ = IntPtr.Zero;
            }
            return false;
        }
        publisher_handle_count_++;
        is_previewing_ = true;
        return true;
    }
    public void StopPreview()
    {
        if (is_previewing_ == false) return;
        is_previewing_ = false;
        publisher_handle_count_--;
        NTSmartPublisherSDK.NT_PB_StopPreview(publisher_handle_);
        if (0 == publisher_handle_count_)
        {
            NTSmartPublisherSDK.NT_PB_Close(publisher_handle_);
            publisher_handle_ = IntPtr.Zero;
        }
    }

Preview data callback:

// Preview data callback
  public void SDKVideoPreviewImageCallBack(IntPtr handle, IntPtr user_data, IntPtr image)
  {
      NT_PB_Image pb_image = (NT_PB_Image)Marshal.PtrToStructure(image, typeof(NT_PB_Image));
      NT_VideoFrame pVideoFrame = new NT_VideoFrame();
      pVideoFrame.width_ = pb_image.width_;
      pVideoFrame.height_ = pb_image.height_;
      pVideoFrame.stride_ = pb_image.stride_[0];
      Int32 argb_size = pb_image.stride_[0] * pb_image.height_;
      pVideoFrame.plane_data_ = new byte[argb_size];
      if (argb_size > 0)
      {
          Marshal.Copy(pb_image.plane_[0],pVideoFrame.plane_data_,0, argb_size);
      }
      {
          cur_image_ = pVideoFrame;
      }
  }

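For audio in the Unity environment, the main task is capturing Unity AudioClip data. Pay attention to the PCM send interval: data is sent once every 10 milliseconds. Because an AudioClip may be only a dozen seconds or a few minutes long, you also need to decide what happens once the clip has been fully captured and played: either loop the clip, or switch to silent frames so that effectively only video is carried.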

var pcm_data = new PCMData();
pcm_data.sample_rate_ = audio_clip_info_.audio_clip_.frequency;
pcm_data.channels_ = audio_clip_info_.audio_clip_.channels;
pcm_data.per_channel_sample_number_ = pcm_data.sample_rate_ / 100;
var pcm_sample = new float[pcm_data.sample_rate_ * pcm_data.channels_ / 100];
audio_clip_info_.audio_clip_.GetData(pcm_sample, audio_clip_info_.audio_clip_offset_);
var sample_length = sizeof(float) * pcm_sample.Length;
pcm_data.data_ = Marshal.AllocHGlobal(sample_length);
Marshal.Copy(pcm_sample, 0, pcm_data.data_, pcm_sample.Length);
pcm_data.size_ = (uint)sample_length;
publisher_wrapper_.OnPostAudioPCMFloatData(pcm_data.data_,
    pcm_data.size_,
    pcm_time_stamp_,
    pcm_data.sample_rate_,
    pcm_data.channels_,
    pcm_data.per_channel_sample_number_);
Marshal.FreeHGlobal(pcm_data.data_);
pcm_data.data_ = IntPtr.Zero;
pcm_data = null;
pcm_time_stamp_ += 10;  // advance the timestamp by 10 ms
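The choice between looping and silence at the end of a clip can be sketched without Unity. The class below is a hypothetical illustration, not SDK code: it takes interleaved float samples (as `AudioClip.GetData` would fill them), cuts them into 10 ms chunks of `sampleRate / 100 * channels` values, and at the end of the data either wraps around or pads with zeros (silence).

```csharp
using System;

public sealed class PcmChunker
{
    private readonly float[] samples_;  // interleaved PCM samples
    private readonly int chunk_len_;    // values per 10 ms chunk (all channels)
    private readonly bool loop_;        // true: wrap around, false: pad with silence
    private int offset_;                // read position in samples_

    public PcmChunker(float[] interleaved, int sampleRate, int channels, bool loop)
    {
        samples_ = interleaved ?? throw new ArgumentNullException(nameof(interleaved));
        // Integer division, as in the original snippet; rates that are not
        // multiples of 100 lose the fractional sample.
        chunk_len_ = sampleRate / 100 * channels;
        if (chunk_len_ <= 0) throw new ArgumentException("invalid sample rate or channel count");
        loop_ = loop;
    }

    // Returns the next 10 ms of audio; a new float[] is zero-filled, i.e. silent.
    public float[] Next()
    {
        var chunk = new float[chunk_len_];
        int filled = 0;
        while (filled < chunk_len_)
        {
            int remaining = samples_.Length - offset_;
            if (remaining == 0)
            {
                if (!loop_ || samples_.Length == 0) break; // leave the rest silent
                offset_ = 0;                               // wrap to the clip start
                continue;
            }
            int n = Math.Min(remaining, chunk_len_ - filled);
            Array.Copy(samples_, offset_, chunk, filled, n);
            offset_ += n;
            filled += n;
        }
        return chunk;
    }
}
```

Each chunk returned by `Next()` would be posted with a timestamp that advances by 10 ms, matching the snippet above.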

To mix two streams, load a second AudioClip from Resources and deliver it as well:

audio_clip_info_mix_ = new AudioClipInfo();
audio_clip_info_mix_.audio_clip_ = Resources.Load("AudioData/music") as AudioClip;

Deliver the data with the following interface:

publisher_wrapper_.OnPostAudioExternalPCMFloatMixerData(pcm_data_mix.data_,
      pcm_data_mix.size_,
      pcm_time_stamp_mix_,
      pcm_data_mix.sample_rate_,
      pcm_data_mix.channels_,
      pcm_data_mix.per_channel_sample_number_);
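The mixing itself happens inside the SDK, and the original does not describe how. As a conceptual illustration only, a common way to mix two float PCM buffers of the same format is to add them sample by sample and clamp the result to [-1, 1]. `PcmMixer` below is a hypothetical name and makes no claim about the SDK's actual algorithm.

```csharp
using System;

public static class PcmMixer
{
    // Adds two equally sized float PCM buffers and clamps each sum to [-1, 1].
    public static float[] Mix(float[] a, float[] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length) throw new ArgumentException("buffers must have the same length");

        var mixed = new float[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            float sum = a[i] + b[i];
            mixed[i] = Math.Max(-1f, Math.Min(1f, sum)); // hard clamp to avoid overflow
        }
        return mixed;
    }
}
```

Hard clamping distorts loud passages, so a real mixer may scale the inputs or apply a limiter instead.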

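Once the captured data has been delivered, the video is submitted as layers. The audio and video encoding parameters are then set, and the underlying SDK performs the encoding: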

/*
 * nt_publisher_wrapper.cs
 * nt_publisher_wrapper
 * 
 * Github: https://github.com/daniulive/SmarterStreaming
 * 
 * Created by DaniuLive on 2017/11/14.
 */  
private void SetCommonOptionToPublisherSDK()
  {
      if (!IsPublisherHandleAvailable())
      {
          Debug.Log("SetCommonOptionToPublisherSDK, publisher handle with null..");
          return;
      }
      NTSmartPublisherSDK.NT_PB_ClearLayersConfig(publisher_handle_, 0,
                      0, IntPtr.Zero);
      if (video_option_ == (uint)NTSmartPublisherDefine.NT_PB_E_VIDEO_OPTION.NT_PB_E_VIDEO_OPTION_LAYER)
      {
          // Layer 0 is filled with an RGBA rectangle to guarantee the frame rate; the fill color is solid black
          int red = 0;
          int green = 0;
          int blue = 0;
          int alpha = 255;
          NT_PB_RGBARectangleLayerConfig rgba_layer_c0 = new NT_PB_RGBARectangleLayerConfig();
          rgba_layer_c0.base_.type_ = (Int32)NTSmartPublisherDefine.NT_PB_E_LAYER_TYPE.NT_PB_E_LAYER_TYPE_RGBA_RECTANGLE;
          rgba_layer_c0.base_.index_ = 0;
          rgba_layer_c0.base_.enable_ = 1;
          rgba_layer_c0.base_.region_.x_ = 0;
          rgba_layer_c0.base_.region_.y_ = 0;
          rgba_layer_c0.base_.region_.width_ = video_width_;
          rgba_layer_c0.base_.region_.height_ = video_height_;
          rgba_layer_c0.base_.offset_ = Marshal.OffsetOf(rgba_layer_c0.GetType(), "base_").ToInt32();
          rgba_layer_c0.base_.cb_size_ = (uint)Marshal.SizeOf(rgba_layer_c0);
          rgba_layer_c0.red_ = System.BitConverter.GetBytes(red)[0];
          rgba_layer_c0.green_ = System.BitConverter.GetBytes(green)[0];
          rgba_layer_c0.blue_ = System.BitConverter.GetBytes(blue)[0];
          rgba_layer_c0.alpha_ = System.BitConverter.GetBytes(alpha)[0];
          IntPtr rgba_conf = Marshal.AllocHGlobal(Marshal.SizeOf(rgba_layer_c0));
          Marshal.StructureToPtr(rgba_layer_c0, rgba_conf, true);
          UInt32 rgba_r = NTSmartPublisherSDK.NT_PB_AddLayerConfig(publisher_handle_, 0,
                          rgba_conf, (int)NTSmartPublisherDefine.NT_PB_E_LAYER_TYPE.NT_PB_E_LAYER_TYPE_RGBA_RECTANGLE,
                          0, IntPtr.Zero);
          Marshal.FreeHGlobal(rgba_conf);
          NT_PB_ExternalVideoFrameLayerConfig external_layer_c1 = new NT_PB_ExternalVideoFrameLayerConfig();
          external_layer_c1.base_.type_ = (Int32)NTSmartPublisherDefine.NT_PB_E_LAYER_TYPE.NT_PB_E_LAYER_TYPE_EXTERNAL_VIDEO_FRAME;
          external_layer_c1.base_.index_ = 1;
          external_layer_c1.base_.enable_ = 1;
          external_layer_c1.base_.region_.x_ = 0;
          external_layer_c1.base_.region_.y_ = 0;
          external_layer_c1.base_.region_.width_ = video_width_;
          external_layer_c1.base_.region_.height_ = video_height_;
          external_layer_c1.base_.offset_ = Marshal.OffsetOf(external_layer_c1.GetType(), "base_").ToInt32();
          external_layer_c1.base_.cb_size_ = (uint)Marshal.SizeOf(external_layer_c1);
          IntPtr external_layer_conf = Marshal.AllocHGlobal(Marshal.SizeOf(external_layer_c1));
          Marshal.StructureToPtr(external_layer_c1, external_layer_conf, true);
          UInt32 external_r = NTSmartPublisherSDK.NT_PB_AddLayerConfig(publisher_handle_, 0,
                          external_layer_conf, (int)NTSmartPublisherDefine.NT_PB_E_LAYER_TYPE.NT_PB_E_LAYER_TYPE_EXTERNAL_VIDEO_FRAME,
                          0, IntPtr.Zero);
          Marshal.FreeHGlobal(external_layer_conf);
      }
      else if (video_option_ == (uint)NTSmartPublisherDefine.NT_PB_E_VIDEO_OPTION.NT_PB_E_VIDEO_OPTION_CAMERA)
      {
          CameraInfo camera = cameras_[cur_sel_camera_index_];
          NT_PB_VideoCaptureCapability cap = camera.capabilities_[cur_sel_camera_resolutions_index_];
          SetVideoCaptureDeviceBaseParameter(camera.id_.ToString(), (UInt32)cap.width_, (UInt32)cap.height_);
      }
      SetFrameRate((uint)video_fps_);
      Int32 type = 0;   // software encoding
      Int32 encoder_id = 1;
      UInt32 codec_id = (UInt32)NTCommonMediaDefine.NT_MEDIA_CODEC_ID.NT_MEDIA_CODEC_ID_H264;
      Int32 param1 = 0;
      SetVideoEncoder(type, encoder_id, codec_id, param1);
      SetVideoQualityV2(CalVideoQuality(video_width_, video_height_, is_h264_encoder_));
      SetVideoBitRate(CalBitRate(video_fps_, video_width_, video_height_));
      SetVideoMaxBitRate((CalMaxKBitRate(video_fps_, video_width_, video_height_, false)));
      SetVideoKeyFrameInterval((key_frame_interval_));
      if (is_h264_encoder_)
      {
          SetVideoEncoderProfile(1);
      }
      SetVideoEncoderSpeed(CalVideoEncoderSpeed(video_width_, video_height_, is_h264_encoder_));
      // Audio settings
      SetAuidoInputDeviceId(0);
      SetPublisherAudioCodecType(1);
      SetPublisherMute(is_mute_);
      SetEchoCancellation(0, 0);
      SetNoiseSuppression(0);
      SetAGC(0);
      SetVAD(0);
      SetInputAudioVolume(Convert.ToSingle(audio_input_volume_));
  }

After encoding and packaging, call the publish interface to send the packaged data to the RTMP server in real time:

public bool StartPublisher(String url)
    {
        if (CheckPublisherHandleAvailable() == false) return false;
        if (publisher_handle_ == IntPtr.Zero)
        {
            return false;
        }
        if (!String.IsNullOrEmpty(url))
        {
            NTSmartPublisherSDK.NT_PB_SetURL(publisher_handle_, url, IntPtr.Zero);
        }
        if (NTBaseCodeDefine.NT_ERC_OK != NTSmartPublisherSDK.NT_PB_StartPublisher(publisher_handle_, IntPtr.Zero))
        {
            if (0 == publisher_handle_count_)
            {
                NTSmartPublisherSDK.NT_PB_Close(publisher_handle_);
                publisher_handle_ = IntPtr.Zero;
            }
            is_publishing_ = false;
            return false;
        }
        publisher_handle_count_++;
        is_publishing_ = true;
        return true;
    }
    public void StopPublisher()
    {
        if (is_publishing_ == false) return;
        publisher_handle_count_--;
        NTSmartPublisherSDK.NT_PB_StopPublisher(publisher_handle_);
        if (0 == publisher_handle_count_)
        {
            NTSmartPublisherSDK.NT_PB_Close(publisher_handle_);
            publisher_handle_ = IntPtr.Zero;
        }
        is_publishing_ = false;
    }
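Preview and publishing share one publisher handle, and `publisher_handle_count_` decides when it may be closed. The pattern can be isolated as a small reference counter. This is a sketch under the assumption that the handle should be closed exactly once, when the last user releases it; `SharedHandle` is a hypothetical class, not an SDK type.

```csharp
using System;

public sealed class SharedHandle
{
    private readonly Action close_;  // called once when the last user releases
    private int users_;

    public bool IsOpen { get; private set; } = true;

    public SharedHandle(Action close)
    {
        close_ = close ?? throw new ArgumentNullException(nameof(close));
    }

    // Called when preview or publishing starts successfully.
    public void Acquire()
    {
        if (!IsOpen) throw new InvalidOperationException("handle already closed");
        users_++;
    }

    // Called when preview or publishing stops; extra calls are ignored.
    public void Release()
    {
        if (users_ == 0) return;
        if (--users_ == 0)
        {
            close_();
            IsOpen = false;
        }
    }
}
```

With this shape, stopping the preview while publishing is still active leaves the handle open, and the close action (for example a call to `NT_PB_Close`) runs only after both have stopped.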

For RTMP transmission, event status must be called back to Unity so that Unity can respond to network errors promptly:

Unity layer handling:

public event Action<uint,string> OnLogEventMsg;
publisher_wrapper_.OnLogEventMsg += OnLogHandle;
private void OnLogHandle(uint arg1, string arg2)
{
    Debug.Log(arg2);
}

Wrapper layer handling:

private void PbEventCallBack(IntPtr handle, IntPtr user_data, 
      UInt32 event_id,
      Int64 param1,
      Int64 param2,
      UInt64 param3,
      UInt64 param4,
      [MarshalAs(UnmanagedType.LPStr)] String param5,
      [MarshalAs(UnmanagedType.LPStr)] String param6,
      IntPtr param7)
  {
      String event_log = "";
      switch (event_id)
      {
          case (uint)NTSmartPublisherDefine.NT_PB_E_EVENT_ID.NT_PB_E_EVENT_ID_CONNECTING:
              event_log = "Connecting";
              if (!String.IsNullOrEmpty(param5))
              {
                  event_log = event_log + " url:" + param5;
              }
              break;
          case (uint)NTSmartPublisherDefine.NT_PB_E_EVENT_ID.NT_PB_E_EVENT_ID_CONNECTION_FAILED:
              event_log = "Connection failed";
              if (!String.IsNullOrEmpty(param5))
              {
                  event_log = event_log + " url:" + param5;
              }
              break;
          case (uint)NTSmartPublisherDefine.NT_PB_E_EVENT_ID.NT_PB_E_EVENT_ID_CONNECTED:
              event_log = "Connected";
              if (!String.IsNullOrEmpty(param5))
              {
                  event_log = event_log + " url:" + param5;
              }
              break;
          case (uint)NTSmartPublisherDefine.NT_PB_E_EVENT_ID.NT_PB_E_EVENT_ID_DISCONNECTED:
              event_log = "Disconnected";
              if (!String.IsNullOrEmpty(param5))
              {
                  event_log = event_log + " url:" + param5;
              }
              break;
          default:
              break;
      }
      if(OnLogEventMsg != null) OnLogEventMsg.Invoke(event_id, event_log);
  }
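Most Unity APIs may only be called from the main thread. The original does not state which thread the SDK uses for this callback; assuming it is a native worker thread, one common approach is to queue the event and drain the queue from `Update()`. The following is a minimal sketch with a hypothetical `EventPump` class.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class EventPump
{
    // Thread-safe FIFO of actions waiting to run on the consumer thread.
    private readonly ConcurrentQueue<Action> pending_ = new ConcurrentQueue<Action>();

    // Safe to call from any thread, e.g. from inside the SDK event callback.
    public void Post(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        pending_.Enqueue(action);
    }

    // Call from the Unity main thread (for example in Update()).
    // Runs the queued actions in order and returns how many ran.
    public int Drain()
    {
        int count = 0;
        while (pending_.TryDequeue(out var action))
        {
            action();
            count++;
        }
        return count;
    }
}
```

In the callback, the last line would then become something like `pump.Post(() => OnLogEventMsg?.Invoke(event_id, event_log));`, with a MonoBehaviour calling `pump.Drain()` once per frame. A reconnect or UI update on `NT_PB_E_EVENT_ID_DISCONNECTED` could be triggered from the same place.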

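Summary

The above is the general flow: capture Unity audio and video data, encode and package it, and send it to an RTMP server, from which clients pull the RTMP stream directly. The author reports millisecond-level latency, which suits interactive scenarios such as smart digital humans.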
