💥1 Overview
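Vision and hearing are the two primary modes of human perception. Sound is an important medium for conveying information and a basic element of the auditory system. With the rapid development of artificial intelligence, machine vision and computer vision have become relatively mature, whereas machine hearing has focused mainly on speech and voiceprints, and research on classifying and detecting complex sound events remains comparatively scarce. Sound event classification is currently applied mainly to intelligent public-security monitoring, abnormal sound detection, and urban noise detection. Existing models for sound event classification and for localization and detection still suffer from insufficient classification accuracy, low localization precision, and high detection error rates.

This article presents an effective method for smoothing the a priori signal-to-noise ratio (SNR). Starting from the definition of the a priori SNR, the method uses wavelet-thresholded multi-window (multitaper) power spectrum estimation to reduce the variance of the speech power spectrum and the noise spectrum, which in turn smooths the a priori SNR. It is combined with an unsupervised system that uses an enhanced Multiple Window Savitzky-Golay (MWSG) spectrogram for robust bird sound detection.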
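The a priori SNR in frequency bin k is conventionally defined as the ratio of the clean-speech power spectrum to the noise power spectrum, ξ(k) = λx(k) / λd(k), so any variance in the two spectral estimates carries over into the ratio. The sketch below illustrates only the variance reduction obtained by averaging over several windows. It is not the cited method: sine tapers are an assumption standing in for whatever tapers the original work uses, the wavelet-thresholding stage is omitted, and the values of N, K, and nTrials are hypothetical.

```matlab
% Sketch: variance reduction from multi-window (multitaper) PSD estimation.
% Assumptions: sine tapers, white-noise input, no wavelet thresholding.
rng(0);                          % reproducible noise
N = 256;                         % frame length (hypothetical)
K = 8;                           % number of tapers (hypothetical)
nTrials = 200;                   % number of independent noise frames
n = (1:N)';

% Sine tapers: orthonormal windows h_k(n) = sqrt(2/(N+1))*sin(pi*k*n/(N+1))
tapers = zeros(N, K);
for k = 1:K
    tapers(:, k) = sqrt(2/(N+1)) * sin(pi*k*n/(N+1));
end

singlePSD = zeros(N, nTrials);   % estimate from one window
multiPSD  = zeros(N, nTrials);   % estimate averaged over K windows
for t = 1:nTrials
    x = randn(N, 1);             % unit-variance white noise frame
    eigSpec = zeros(N, K);
    for k = 1:K
        eigSpec(:, k) = abs(fft(x .* tapers(:, k))).^2;
    end
    singlePSD(:, t) = eigSpec(:, 1);
    multiPSD(:, t)  = mean(eigSpec, 2);
end

% Variance across trials, averaged over frequency bins
varSingle = mean(var(singlePSD, 0, 2));
varMulti  = mean(var(multiPSD, 0, 2));
fprintf('single-window variance: %.3f\n', varSingle);
fprintf('multi-window variance:  %.3f\n', varMulti);
```

Averaging K roughly independent spectral estimates lowers the variance by about a factor of K, which is why a ratio built from multi-window estimates fluctuates less from frame to frame than one built from single-window periodograms.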
📚2 Results
Partial code:
%%
% This file demonstrates the steps of the algorithm used to detect bird
% sounds with the Multiple Window Savitzky-Golay (MWSG) filter.
close all; clear all; clc;

%% Step 1: MWSG filter
% Compute the MWSG spectrogram for the given audio signal
disp('Reading wav File');
[signal,fs]=audioread('./PC5_20090606_050000_0010.wav'); % MLSP audio file

% Parameters
M=21;          % Matrix length required to calculate SG coefficients
P=3;           % Order required to calculate SG coefficients
nfft=512;      % FFT order
shift=256;     % Shift
winlength=512; % Window length

disp('Computing MWSG Spectrogram');
MWSG=compute_MWSG_Spec(signal,fs,M,P);

%% Step 2: Directionality
% Calculate the directional spectrograms based on the MWSG spectrogram

% Parameters
len=11; % Number of array values to be summed up in the required direction

disp('Computing directional Spectrograms on MWSG Spectrogram');
[x_D1,x_D2,x_D3,x_D4,DAll]=compute_Dir_Spec_From_MWSG(MWSG,len);

%% Step 3: Segmentation
% Compute the predicted frames for each directional spectrogram
disp('Computing Predicted Frames');
d1=segment(x_D1); % Predicted frames for directional spectrogram x_D1
d2=segment(x_D2); % Predicted frames for directional spectrogram x_D2
d3=segment(x_D3); % Predicted frames for directional spectrogram x_D3
d4=segment(x_D4); % Predicted frames for directional spectrogram x_D4

% Final predicted frames: a frame is marked if any direction marks it
d=(d1+d2+d3+d4);
d(d>0)=1;

%% Figures
% Only used to obtain the frequency and time points
[~,F,T,~]=spectrogram(signal,winlength,shift,nfft,fs);

% Load the ground-truth frames
disp('Reading GroundTruth');
load('./GroundTruth.txt');

disp('Saving Figures');
h=figure;
set(h,'Visible','off');

subplot(3,1,1);
time=(1:length(signal))/fs;
plot(time,signal); % Signal
title('Signal');
xlabel('Time in sec');
ylabel('Amplitude');

subplot(3,1,2);
surf(T,F,10*log10(MWSG),'EdgeColor','none'); % MWSG spectrogram
view(0,90);
axis tight;
title('MWSG Spectrogram');
xlabel('Time in sec');
ylabel('Frequency in Hz');

subplot(3,1,3);
plot(T,GroundTruth,'r'); % Ground-truth frames
hold on;
plot(T,d,'b'); % Predicted frames
hold off;
ylim([0 2]);
xlabel('Time in sec');
legend('GroundTruth','Predicted Frames');

saveas(h,'./MWSG_Spectrogram.png');
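The script above passes M and P to compute_MWSG_Spec, whose body is not shown here. As a rough illustration of what Savitzky-Golay coefficients are, the sketch below derives smoothing coefficients from a least-squares polynomial fit using only base MATLAB. It assumes M is an odd window length and P the polynomial order; that reading of the parameters, and the test signal, are assumptions and may differ from how the original helper uses them.

```matlab
% Sketch: Savitzky-Golay smoothing coefficients from a least-squares fit.
% Assumption: M is the (odd) window length and P the polynomial order.
M = 21;
P = 3;
half = (M - 1) / 2;
n = (-half:half)';               % sample positions centered on zero

% Design matrix with columns n.^0, n.^1, ..., n.^P
A = zeros(M, P + 1);
for p = 0:P
    A(:, p + 1) = n.^p;
end

% Row 1 of pinv(A) maps the M samples to the fitted polynomial's
% constant term, i.e. its value at the window center n = 0.
H = pinv(A);
sgCoeffs = H(1, :);              % 1-by-M smoothing filter

% Check: a cubic is reproduced exactly at the window center
y = 0.01*n.^3 - 0.5*n + 5;
centerValue = sgCoeffs * y;      % 5 up to rounding error

% Smooth a noisy sequence by sliding the window over it. The
% coefficients are symmetric, so convolution equals correlation.
rng(0);
noisy = sin(2*pi*(0:199)'/100) + 0.2*randn(200, 1);
smoothed = conv(noisy, sgCoeffs(:), 'same');
```

The coefficients sum to one, so a constant signal passes through unchanged, and any polynomial up to order P is preserved at the window center. This is the property that lets the filter suppress noise while keeping narrow spectral peaks better than a plain moving average does.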
🎉3 References
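[1] Nithin Rao, Nisha G Meenakshi, Prasanta Kumar Ghosh (2017). Spectrogram enhancement using multiple window Savitzky-Golay (MWSG) filter for robust bird sound detection [Source Code].
[2] Jiao Renjie, Hou Limin. Speech enhancement based on wavelet-thresholded multi-window power spectrum estimation [J]. Journal of Shanghai University (Natural Science Edition), 2008(03): 230-235.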