Ghost update 20110604: Death By Graphs

The process of chirp estimation fits a set of modulated sinusoids to a given audio signal. We pass in initial chirp parameter estimates and the estimator then improves these approximations to a desired level of fit precision. The parameters describing a chirp are the zeroth-order parameters P (phase) and A (amplitude), the first-order parameters W (frequency) and dA (amplitude modulation), and the second-order parameters dW (chirp rate or frequency modulation) and ddA (amplitude modulation squared).

At present, Ghost research uses a specific chirp estimation algorithm we published in 2007, and we'll likely continue using and improving this algorithm until something clearly better comes along. For this reason, it's essential we know what convergence and error to expect from it.

As alluded to in demo 2, the original 2007 paper had little space to devote to detailed algorithm characterization. The few graphs covered specific disjoint cases that were primarily concerned with efficiency as compared to other techniques. This data does not lead to a comprehensive, intuitive undestanding of how the chirp estimation performs given varied input data and operating parameters.

In this update, we concern ourselves with documenting the expected behavior of the chirp estimator, as well as what to expect from several optional improvements alluded to but left unexplored by the 2007 paper. Because chirp estimation is of general use in the field of signal processing, we do not focus only on expected use cases for Ghost. This documentation also serves as a benchmark archive for testing. Future improvments and optimizations can be compared against these known good results.

Notes on Graph Generation

code

The reference chirp estimator algorithm as well as code used to generate the graphs can be found in Xiph.Org's SVN at svn.xiph.org/trunk/chirptest. The module builds a standalone C application (using libcairo for drawing) named 'chirp' that when run produces all the graphs seen here. The application requires several hours of run time on a 32 core Opteron server to generate the graphs as seen here, so you probably don't want to try it on a netbook.

errata

Except where noted, graphs are generated using the algorithms as described in the paper linked above. The paper contains an algebraic sign error on Page 11 in Algorithm 2; the 12th line has a '+' which should be a minus. Specifically it should read:

precision

The algorithms are implemented in single precision float arithmetic with the exception of the instantaneous phase calculation in the basis chirp generation. A naive calculation such as cosf(P+(W+dW*i)*i) runs out of bits in the mantissa of the single-precision argument to cosf() as W and i increase. For the sake of simplicity and clarity, the naive calculation is used but implemented and passed in double precision.

termination and convergence

The termination criteria for convergence is a simple magnitude threshold. The algorithm is declared to have converged when the sum of the squares of the basis projections for each chirp are all less than 1e-12. This value is chosen to be large enough to sit comfortably above numerical noise (between 1e-16 and 1e-14), but small enough to show a substantial amount of the algorithm's depth of precision. The convergence graphs plot the number of iterations required before the final uncounted iteration results in a change below the termination threshold. The error graphs give the worst-case performance for our specific termination criteria across all values of any swept input.

Keep in mind that 'convergence' simply means that the estimation algorithm arrived at a stable solution and halted. It does not mean that the solution is correct or sufficiently accurate. Consult both the convergence and error graphs to determine the trustworthy domain of operation.

Linear vs Non-linear Operation

Algorithms in the 2007 paper

The paper offers two variants of the chirp estimation algorithm, linear and non-linear. The linear and non-linear algorithms differ primarily in that the non-linear algorithm 'recenters' the W parameter (frequency) each iteration, that is, it regenerates the basis functions with an updated W estimate each iteration as well as restarting the residual error calculation. Basis generation is a relatively expensive operation, and the nonlinear estimator repeats basis generation for each chirp in each iteration. Thus, although the conceptual difference between the linear and non-linear algorithms is small, the computational cost is potentially substantial.

Recentering dW

The non-linear algorithm in the paper does not recenter the dW (chirp rate) parameter. In addition, it also assumes an initial dW estimate of 0, thus avoiding the dW term in basis generation completely:

This is an intentional optimization to reduce the basis generation and projection complexity; if unchirped, the basis functions show symmetry/antisymmetry about their centers.

This basis modification also requires modifying the dW update step in Algorithm 2:

Using and recentering the dW parameter has benefits, improving both convergence range and dW accuracy. We will document the estimator performance both of 'partial' non-linear operation (recentering only W with an unchirped basis as described in the paper) as well as 'full' non-linear operation (recentering both W and dW with chirped basis functions).

data

The graphs below show behavior of the linear, partial-nonlinear and full-nonlinear estimators across sets of single-chirp input signals. Estimation was performed with a sine input window and other parameters as in the original 2007 paper.

Near DC, convergence and error behavior most strongly depend on chirp center frequency (W); the following graphs show behavior from DC extending to a few bins above DC. In addition, the phase of an input chirp affects behavior at the edge of convergence, so all graphs show the worst-case behavior run over a swept-phase set of inputs. Estimator behavior is exactly anti-symmetric about the center of the spectrum.

Frequency (W) convergence and error behavior

Graph Selection convergence
RMS fit error
P error
A error
W error
dA error
dW error

Iteration type full-nonlinear
partial-nonlinear
linear

As expected, the linear algorithm shows reliable convergence behavior but as it does not restart the basis functions or error estimates, error increases steadily above and below Y=0 where the input W estimate already matches the chirp W before iteration. Both the partial- and full-nonlinear algorithms show reduced convergence range but high precision across the domain of frequency convergence making the total domain of useful output much larger than the linear estimator. This comes at the cost of a more expensive algorithm.

Chirp rate (dW) convergence and error behavior

Graph Selection convergence
RMS fit error
P error
A error
W error
dA error
dW error

Iteration type full-nonlinear
partial-nonlinear
linear

The linear algorithm shows dW (chirp rate) convergence and error behavior similar to frequency (W) behavior.

The partial-nonlinear algorithm shows error behavior similar to the linear algorithm once there's a significant dW (chirp) component to the input signal. This is not surprising as the partial-nonlinear algorithm is not recentering the basis functions for dW changes and the calculated residual error is also an approximation computed from the basis functions.

The full-nonlinear algorithm shows vastly improved dW behavior similar to W behavior. It gives accurate results across the nearly-complete domain of convergence, which is greatly expanded over the partial-nonlinear algorithm. With the proper window, we can in fact expand the domain of convergence to an arbitrarily large portion of the solution space. More about this later under the section on windowing functions.

A chirp's actual W does not strongly influence the convergence or accuracy of the algorithm once it is more than 5-10 bins above DC (or below Nyquist). W can still affect numerical noise; depending on how the basis and reconstruction chirps are generated, an approach such as cos(W*t) shows a linear loss of precision as |t| or W increases. As such, single-precision basis/reconstruction chirp generation can show noticably more noise near Nyquist than at DC.

Influence of Windowing Functions

FFT techniques tend not to be particularly sensitive to exact window choice; there's a set of mostly interchangable 'good' windows that give similar performance for most uses. Unless you have a very specific need, the Hanning window covers 95% of what any DSP engineer would ever need for use with Fourier transforms. This is not the case with the chirp estimator. The window function used for input and basis windowing exerts considerable influence on algorithm accuracy and convergence behavior.

As with the FFT, window mainlobe width and shape influences discrimination ability (the ability to resolve closely spaced sinusoids/chirps).
Mainlobe slope partially determines convergence speed.
Mainlobe width determines the convergence range of the nonlinear iterator; it cannot converge to values beyond the first null in the window transfer response. Convergence can also 'hang up' in local maxima in the mainlobe response.
In that same vein, sidelobes create local maxima away from the mainlobe. The estimator essentially marches 'up' to peaks in the input signal's windowed response, and so local maxima created by sidelobes will also capture convergence.
Sidelobe leakage, as with the FFT, spreads noise through the spectrum into the results for other sinusoids. The exact behavior is different due to use of Gauss-Seidel fitting, which removes the energy of each fit sinusoid from the error term of all sinusoids. However, unfit energy still exhibits full sidelobe leakage.

This all points to the possibility that a unimodal window without sidelobes could render the nonlinear estimator's solution space substantially convex for single-chirps and indeed this turns out to be the case. The W and dW terms can be made to converge from initial estimates arbitrarily far away from the correct value. This comes at the cost of typically slower convergence (unimodal windows tend to be broader and roll-off more slowly) and some reduction in convergence domain near DC.

A few window functions for testing purposes

Window Selection maxwell
triangular/gaussian
hanning
sine
rectangular

Window functions and linear estimation

Window functions affect linear estimation less profoundly than non-linear estimation. W and dW are not being recentered and as such can't become stuck in local maxima caused by sidelobe leakage. Nonetheless, we expect a window function to influence overall error behavior.

Window effect on linear estimation of W

Graph Selection convergence
RMS fit error
P error
A error
W error
dA error
dW error

Window triangular/gaussian
hanning
sine
rectangular

In the above graphs we do indeed see improvement in both absolute error performance and convergence speed depending on window selection. The useful range of linear estimation can be enhanced by at least a factor of two with an appropriate choice of window.

Window functions and non-linear estimation

Non-linear estimation is quite sensitive to the window function in use as the W and dW parameters track along the frequency domain energy contours created by the window's mainlobe and sidelobe. Where the linear estimator is ~guaranteed to converge (once we're away from DC/Nyquist), we've already noticed that the nonlinear estimator does not have an inherently convex solution space. A unimodal window, however, can render the solution space similarly convex.

Also of interest are optimized windows that attempt to deliver wider convergence without loss of convergence speed. Machine optimization has shown a few promising window research directions.

NOTE: the scales on the nonlinear graphs cover a considerably wider range than on the linear graphs.

Window effect on nonlinear estimation of W

Graph Selection convergence
RMS fit error
P error
A error
W error
dA error
dW error

Window maxwell (optimized)
triangular/gaussian
hanning
sine
rectangular

First- vs Second-Order Operation

The paper states that second-order parameter fits (dW and ddA) are optional to both the linear and nonlinear algorithms. The nonlinear algorithms are also capable of fitting only one of the second order parameters (while clamping the other to zero), such as we've been doing in the nonlinear graphs above, where we fit dW but not ddA.

(Strictly speaking, the same is true of the first-order parameters; we could choose to fit W and not dA for example. However, because of the alternating even/odd orthogonality of the basis functions, fitting only W and not dA has almost no effect on algorithm behavior. This is visible in the first-order nonlinear graphs below. The same is not true of fitting dW but not ddA.)

First and second order estimation behavior

Graph Selection convergence
RMS fit error
P error
A error
W error
dA error
dW error

Algorithm full nonlinear
linear

Order second (dW and ddA)
second (dW only)
first (W and dA)
first (W only)

The linear algorithm does not fit each of the paired parameters separately the way the nonlinear algorithms do. As such we do not plot the 'half orders' for the linear estimator.

Symmetric Normalization

The chirp estimation algorithm does not use complex math; the four (or six) basis functions used by chirp estimation are all real sequences. However, because the basis functions are quadrature (sine/cosine) pairs, they behave in some ways similarly to the real/imaginary halves of a complex basis.

The linear estimator calculates basis energies once at the beginning of fitting. The nonlinear estimators must recalculate basis and basis energy in each iteration.

\begin{matrix} φ_{k, n} = Θ_{k} n (linear / partially non-linear iteration), Θ_{k} n + {\dot{Θ}}_{k} n^{2} (fully non-linear iteration) \\ \begin{aligned} {energy}_{k}^{c} \leftarrow & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} {[h (n) cos φ_{k, n}]}^{2}, & {energy}_{k}^{s} \leftarrow & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} {[h (n) sin φ_{k, n}]}^{2}, \\ {energy}_{k}^{d} \leftarrow & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} {[h (n) n cos φ_{k, n}]}^{2}, & {energy}_{k}^{t} \leftarrow & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} {[h (n) n sin φ_{k, n}]}^{2}, \\ {energy}_{k}^{f} \leftarrow & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} {[h (n) n^{2} cos φ_{k, n}]}^{2}, & {energy}_{k}^{u} \leftarrow & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} {[h (n) n^{2} sin φ_{k, n}]}^{2} \end{aligned} \end{matrix}

If we treat the sine/cosine basis pairs as if they were a single complex basis, the energy of the three complex basis functions is described by:

\begin{array}{rccl} {energy}_{k}^{c, s} \leftarrow & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} {[h (n) cos φ_{k, n}]}^{2} + {[h (n) sin φ_{k, n}]}^{2} = & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} h^{2} (n) [{cos}^{2} φ_{k, n} + {sin}^{2} φ_{k, n}] = & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} h^{2} (n) \\ {energy}_{k}^{d, t} \leftarrow & ... & ... & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} h^{2} (n) n^{2} \\ {energy}_{k}^{f, u} \leftarrow & ... & ... & \sum_{n = - \frac{N - 1}{2}}^{\frac{N - 1}{2}} h^{2} (n) n^{4} \end{array}

Rather than normalizing each of the six real basis projections by the basis energy, we instead normalize by half the 'complex' basis energy. This approximation results in basis energy being determined solely by the window and blocksize, and we can both simplify and move the calculation out of the nonlinear loop.

Effect of symmetric normalization on W (frequency) fit

Parameter fit dW (chirp rate)
W (frequency)

Graph Selection convergence
RMS fit error
P error
A error
W error
dA error
dW error

Algorithm full nonlinear
partial nonlinear
linear

Normalization symmetric
regular

Symmetric normalization approximately halves the number of adds and multiplies needed during the basis projection step of the nonlinear algorithms, as normalization energy is not explicitly calculated. That said, the time spent generating new, recentered basis functions each iteration typically swamps the cost of the actual basis projection onto the error vector. As such the time saved by symmetric normalization may be insignificant, depending on the relative expense of basis generation.

The linear algorithm generates basis functions only once and so must compute normalization factors only once. Symmetric normalization may become useful in the event that the linear algorithm is run repeatedly for only a few iterations each time, thus increasing the relative amount of time the linear algorithm would spend calculating normalization factors during initialization.

dW alpha

Page 10 of the 2007 paper discusses the quantity 'alpha' (α) which is described as an update rate used to scale the update added to the W parameter each iteration:

Θ_{k} \leftarrow Θ_{k} + α Δ Θ_{k}

The paper presents a graph that shows the optimal value for alpha is about 1.0. The exact relationship between W convergence speed and alpha is slightly more complex, but 1.0 does hold up as a good value for alpha upon closer inspection. The optimal value of alpha is affected somewhat by window choice.

Effect of alpha on W (frequency) fit

Graph Selection convergence
RMS fit error
P error
A error
W error

Ghost update 20110604

Chirp Estimator Behavior: Overview

Notes on Graph Generation

code

errata

precision

termination and convergence

Linear vs Non-linear Operation

Algorithms in the 2007 paper

Recentering dW

data

Influence of Windowing Functions

Window functions and linear estimation

Window functions and non-linear estimation

First- vs Second-Order Operation

Symmetric Normalization

dW alpha