Traitement des signaux audio de musique -- debruitage: suppression du bruit de fond (De-hissing)
Contents
Introduction: utilisation de LTFAT et preparation
Demarrage de la toolbox ltfat
clear 'variables'; close all; clc; ltfatstart;
LTFAT version 2.1.1. Copyright 2005-2015 Peter L. S��ndergaard. For help, please type "ltfathelp". LTFAT is using the MEX backend. (Your global and persistent variables have just been cleared. Sorry.)
%Lire le fichier audio jazz.wav fourni, ou celui de votre choixa l'aide de %la fonction audioread, puis l'afficher avec une echelle temporelle adaptee [x,fs] = audioread('./jazz.wav'); x = x(:); T = length(x); TimeAxis = linspace(0,T/fs,T); figure; plot(TimeAxis,x); title('Samples representation');
On s'appuiera sur la toolbox LTFAT pour toutes les transformees temps-frequence. En particulier, on utilisera les fonctions dgtreal} et idgtreal:
help dgtreal help idgtreal
DGTREAL Discrete Gabor transform for real-valued signals Usage: c=dgtreal(f,g,a,M); c=dgtreal(f,g,a,M,L); [c,Ls]=dgtreal(f,g,a,M); [c,Ls]=dgtreal(f,g,a,M,L); Input parameters: f : Input data g : Window function. a : Length of time shift. M : Number of modulations. L : Length of transform to do. Output parameters: c : M*N array of coefficients. Ls : Length of input signal. DGTREAL(f,g,a,M) computes the Gabor coefficients (also known as a windowed Fourier transform) of the real-valued input signal f with respect to the real-valued window g and parameters a and M. The output is a vector/matrix in a rectangular layout. As opposed to DGT only the coefficients of the positive frequencies of the output are returned. DGTREAL will refuse to work for complex valued input signals. The length of the transform will be the smallest multiple of a and M* that is larger than the signal. f will be zero-extended to the length of the transform. If f is a matrix, the transformation is applied to each column. The length of the transform done can be obtained by L=size(c,2)*a. The window g may be a vector of numerical values, a text string or a cell array. See the help of GABWIN for more details. DGTREAL(f,g,a,M,L) computes the Gabor coefficients as above, but does a transform of length L. f will be cut or zero-extended to length L before the transform is done. [c,Ls]=DGTREAL(f,g,a,M) or [c,Ls]=DGTREAL(f,g,a,M,L) additionally returns the length of the input signal f. This is handy for reconstruction: [c,Ls]=dgtreal(f,g,a,M); fr=idgtreal(c,gd,a,M,Ls); will reconstruct the signal f no matter what the length of f is, provided that gd is a dual window of g. [c,Ls,g]=DGTREAL(...) additionally outputs the window used in the transform. This is useful if the window was generated from a description in a string or cell array. See the help on DGT for the definition of the discrete Gabor transform. This routine will return the coefficients for channel frequencies from 0 to floor(M/2). DGTREAL takes the following flags at the end of the line of input arguments: 'freqinv' Compute a DGTREAL using a frequency-invariant phase. This is the default convention described in the help for DGT. 'timeinv' Compute a DGTREAL using a time-invariant phase. This convention is typically used in filter bank algorithms. DGTREAL can be used to manually compute a spectrogram, if you want full control over the parameters and want to capture the output : f=greasy; % Input test signal fs=16000; % The sampling rate of this particular test signal a=10; % Downsampling factor in time M=200; % Total number of channels, only 101 will be computed % Compute the coefficients using a 20 ms long Hann window c=dgtreal(f,{'hann',0.02*fs'},a,M); % Visualize the coefficients as a spectrogram dynrange=90; % 90 dB dynamical range for the plotting plotdgtreal(c,a,M,fs,dynrange); See also: dgt, idgtreal, gabwin, dwilt, gabtight, plotdgtreal References: K. Gr��chenig. Foundations of Time-Frequency Analysis. Birkh��user, 2001. H. G. Feichtinger and T. Strohmer, editors. Gabor Analysis and Algorithms. Birkh��user, Boston, 1998. Url: http://ltfat.github.io/doc/gabor/dgtreal.html IDGTREAL Inverse discrete Gabor transform for real-valued signals Usage: f=idgtreal(c,g,a,M); f=idgtreal(c,g,a,M,Ls); Input parameters: c : Array of coefficients. g : Window function. a : Length of time shift. M : Number of channels. Ls : length of signal. Output parameters: f : Signal. IDGTREAL(c,g,a,M) computes the Gabor expansion of the input coefficients c with respect to the real-valued window g, time shift a and number of channels M. c is assumed to be the positive frequencies of the Gabor expansion of a real-valued signal. It must hold that size(c,1)==floor(M/2)+1. Note that since the correct number of channels cannot be deduced from the input, IDGTREAL takes an additional parameter as opposed to IDGT. The window g may be a vector of numerical values, a text string or a cell array. See the help of GABWIN for more details. IDGTREAL(c,g,a,M,Ls) does as above but cuts or extends f to length Ls. [f,g]=IDGTREAL(...) additionally outputs the window used in the transform. This is usefull if the window was generated from a description in a string or cell array. For perfect reconstruction, the window used must be a dual window of the one used to generate the coefficients. If g is a row vector, then the output will also be a row vector. If c is 3-dimensional, then IDGTREAL will return a matrix consisting of one column vector for each of the TF-planes in c. See the help on IDGT for the precise definition of the inverse Gabor transform. IDGTREAL takes the following flags at the end of the line of input arguments: 'freqinv' Use a frequency-invariant phase. This is the default convention described in the help for DGT. 'timeinv' Use a time-invariant phase. This convention is typically used in filter bank algorithms. Examples: --------- The following example demostrates the basic pricinples for getting perfect reconstruction (short version): f=greasy; % test signal a=32; % time shift M=64; % frequency shift gs={'blackman',128}; % synthesis window ga={'dual',gs}; % analysis window [c,Ls]=dgtreal(f,ga,a,M); % analysis % ... do interesting stuff to c at this point ... r=idgtreal(c,gs,a,M,Ls); % synthesis norm(f-r) % test The following example does the same as the previous one, with an explicit construction of the analysis and synthesis windows: f=greasy; % test signal a=32; % time shift M=64; % frequency shift Ls=length(f); % signal length % Length of transform to do L=dgtlength(Ls,a,M); % Analysis and synthesis window gs=firwin('blackman',128); ga=gabdual(gs,a,M,L); c=dgtreal(f,ga,a,M); % analysis % ... do interesting stuff to c at this point ... r=idgtreal(c,gs,a,M,Ls); % synthesis norm(f-r) % test See also: idgt, gabwin, gabdual, dwilt Url: http://ltfat.github.io/doc/gabor/idgtreal.html
il pourra etre utile d'utiliser les "pointeurs de fonctions" matlab, qui permettent de simplifier le code. En pratique, cela evitera de passer tous les parametres necessaires de dgtreal a une fonction: on passera simplement le nom de la fonction qui l'encapsule. Par exemple, on cree une fonction {\tt op.analysis} qui permet d'implementer l'operateur d'analyse d'une transformee de Gabor avec une fenetre de Hann
%M de taille 1024, avec 50\% de recouvrement: M = 1024; a = M/4; T = length(x); g = gabwin({'tight', 'hann'}, a, M); op.analysis = @(x) dgtreal(x,g,a,M); op.synthesis = @(x) idgtreal(x,g,a,M,T);
Ainsi, il suffit d'appeler op.analysis(x) pour faire l'analyse temps-frequence du signal x
X = op.analysis(x); [NbFreq, NbTime] = size(X); FreqGaborAxis = linspace(0,fs/2,NbFreq); TimeGaborAxis = linspace(0,T,NbTime); figure; imagesc(TimeGaborAxis, FreqGaborAxis, 20*log(abs(X)+eps)); set(gca,'Ydir','Normal');
et op.synthesis pour faire la synthese:
x_reconstruit = op.synthesis(X);
err = norm(x(:)-x_reconstruit(:))^2;
fprintf('erreur de reconstruction: %f\n',err);
erreur de reconstruction: 0.000000
Dans une fonction, on pourra passer simplement la variable op pour beneficier de op.analysis et op.synthesis. Exemple:
function [X,x_rec] = func_ex(x,op)
T = length(x); X = op.analysis(x); x_rec = op.synthesis(X); end
[X, x_rec] = func_ex(x,op); [NbFreq, NbTime] = size(X); FreqGaborAxis = linspace(0,fs/2,NbFreq); TimeGaborAxis = linspace(0,T,NbTime); figure; imagesc(TimeGaborAxis, FreqGaborAxis, 20*log(abs(X)+eps)); set(gca,'Ydir','Normal'); err = norm(x(:)-x_rec(:))^2; fprintf('erreur de reconstruction: %f\n',err);
erreur de reconstruction: 0.000000
Generation de bruits blanc ou colores et estimation spectrale
La fonction noise_gen permet de generer un bruit blanc ou colore
help noise_gen noise_type = 'white'; [noise,spectrum] = noise_gen(T,1,fs,noise_type);
NOISE Stochastic noise generator Usage: outsig = noise(siglen,nsigs,type); Input parameters: siglen : Length of the noise (samples) nsigs : Number of signals (default is 1) type : type of noise. See below. Output parameters: outsig : siglen xnsigs signal vector NOISE(siglen,nsigs) generates nsigs channels containing white noise of the given type with the length of siglen. The signals are arranged as columns in the output. If only siglen is given, a column vector is returned. NOISE takes the following optional parameters: 'white' Generate white (gaussian) noise. This is the default. 'pink' Generate pink noise. 'brown' Generate brown noise. 'red' This is the same as brown noise. By default, the noise is normalized to have a unit energy, but this can be changed by passing a flag to NORMALIZE. Examples: --------- White noise in the time-frequency domain: sgram(noise(5000,'white'),'dynrange',70); Pink noise in the time-frequency domain: sgram(noise(5000,'pink'),'dynrange',70); Brown/red noise in the time-frequency domain: sgram(noise(5000,'brown'),'dynrange',70); See also: normalize Url: http://ltfat.sourceforge.net/doc/signals/noise.php
La methode de Welch permet d'estimer la densite spectrale de puissance du bruit. Pour cela, il suffit de faire la transformer de Gabor du bruit, puis de moyenner les densit?s spectrales obtenues dans chaque fenetre
Exercice 1: implementer l'estimateur de Welch
[dsp_noise] = welch(noise,op,g)
Entrees: - noise: le vecteur contenant le bruit - op: l'operateur permettant de faire l'analyse de Gabor voulue - g: fenetre d'analyse utilisee dans l'operateur (necessaire pour normalisation)
Sortie: - dsp_nosise: l'estimation de la densite spectrale de puissance du bruit
[dsp_noise] = welch(noise,op,g);
On compare la dsp estimee et la dsp theorique
figure; hold on FreqAxisTrue = linspace(0,fs/2,length(spectrum)); plot(FreqAxisTrue,spectrum,'r'); FreqAxisEstim = linspace(0,fs/2,length(dsp_noise)); plot(FreqAxisEstim,dsp_noise,'b'); hold off
On va maintenant simuler l'ajout d'un bruit de fond. La fonction noise_gen generant un bruit de variance 1, on va ajuster l'energie de bruit pour atteindre un SNR en entree fixee
Input_SNR = 0 ; sigma_bruit = sqrt(var(x)*10^(-Input_SNR/10)); noise = sigma_bruit*noise; y = x + noise;
On verifie le SNR en entree:
fprintf('Input SNR = %f\n',SNR(x,y));
Input SNR = 0.000004
On va simuler l'hypothese: on observe une partie du signal qui ne contient que du bruit sur 1 s, qui permet d'estimer la dsp du bruit
Tnoise = fs; noiseObserved = noise(1:Tnoise); [dsp_noise] = welch(noiseObserved,op,g); figure; hold on FreqAxisTrue = linspace(0,fs/2,length(spectrum)); plot(FreqAxisTrue,sigma_bruit^2*spectrum,'r'); FreqAxisEstim = linspace(0,fs/2,length(dsp_noise)); plot(FreqAxisEstim,dsp_noise,'b'); hold off
Debruitage par estimation diagonale
La premiere methode vue est le filtrage "a la Wiener" empirique, sans parametre
Y = op.analysis(y); HWienerNum = bsxfun(@minus,abs(Y).^2,dsp_noise); %Noter l'utilisation de bsxfun + dspNoise est deja une puissance HWienerNum(HWienerNum < 0) = 0; HWienerDenum = abs(Y).^2; HWiener = HWienerNum./(HWienerDenum+eps); % +eps pour eviter les divisions par 0 X_estim = Y.*HWiener ; x_estim = op.synthesis(X_estim);
On evalue le SNR en sortie
Output_SNR = SNR(x,x_estim);
fprintf('Output SNR = %f\n',Output_SNR);
Output SNR = 12.903996
Exercice 2: implementer une methode generale de debruitage, permettant d'effectuer Wiener empirique ou soustraction spectrale, avec plusieurs parametres. On pourra fixer certains parametres par defaut
[x_denoise] = DeHiss(x,noise,op,lambda,alpha,beta,gamma)
De Hissing par estimation diagonale dans le domaine T.F. XDenoise = X .* H avec H = max(gamma x DspN ; (|X|alpha-lambda|DspN|^(alpha/2))./|X|^alpha) ) ^ beta) ou DspN est une estimation de la densit? spectrale de puissance du bruit
Entree: - x: signal a debruiter - noise: bruit isole - op: operateurs d'analyse/synthese - lambda: valeur du seuillage - alpha: puissance du module (defaut alpha = 2) - beta: puissance du filtre (defaut beta = 1) - gamma: bruit residuel (defaut gamma = 1e-4) Sortie: - x_denoise: signal debruite
On testera cette methode avec les valeurs par defaut pour commencer, en faisant varier le parametre lambda.
Enfin, on dessinera la courbe SNR en sortie VS lambda. On choisira une echelle logarithmique pour les lambda
faire apparaitre sur cette courbe les resultats obtenus par le Wiener Empirique sans ponderation par lambda
Exo2
Exercice 3: reprendre l'exercice precedent, en supperposant les courbes obtenues pour differents gamma = [1e-1,1e-2,..,1e-6,0]
Exo3
Exercice 4: comparer les courbes pour differents gamma = [1e-1,1e-2,..,1e-6,0] , beta = [1,2] et alpha = [1,2] . dessiner les cources pour le meilleur gamma, et faire apparaitre les resultats obtenus pour le Wiener Empirique, la soustraction spectrale d'amplitude et la soustraction spectrale de puissance ( lambda = 1 )
Exo4