Klaus Elk Books

ANC using FxLMS

Fan ANC

Introduction

We want to implement ANC - Active Noise Control - for an HVAC - Heating, Ventilation and Air Conditioning - system. The (over-)simplified concept is to measure the noise with a microphone close to the source. The measured noise is then emitted from a loudspeaker in counter-phase.

In reality it gets more complicated because:

The above means that we need an adaptive system - typically FxLMS, which means Filtered-x LMS (Least-Mean-Square). The LMS part is responsible for a feedback that trims a simple FIR-filter, while the Fx means that we do not ignore the acoustic phenomena in the duct, but try to model these with another FIR.

Thus our block-diagram will contain two FIR-filters and an LMS.

See also FxLMS for a nice walk-through of the math.

Setup

The setup in the duct looks as follows:


                        Air → → →
                        Channel (duct)
┌───────────────────────────────────────────────────────────┐
│                                                           │
│   FAN        REF MIC        SPEAKER        ERROR MIC      │
│   ||||         (•)           [  ]            (•)          │
│                                                           │
└───────────────────────────────────────────────────────────┘
    x=0        x=5–10 cm     x=30–50 cm      x=45–65 cm

The figure above shows:

The ANC-components are optimally flush-mounted (embedded in the wall of the duct).

Note that the distance between the Ref-Mic and the speaker defines the maximum allowable processing latency. If sound travels 330 m/s, then 33 cm corresponds to 1 ms. This does not allow for long buffers or FFTs. We need to work in the time-domain.
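The latency budget can be worked out directly from the geometry. The following is a minimal sketch; the function names are my own (hypothetical), and the speed of sound is passed in so that the 330 m/s from the text is just one possible value.

```c
/* Hypothetical helper: time in seconds that the sound needs to travel
 * from the reference mic to the speaker. This is the upper bound for
 * the total processing latency (ADC + DSP + DAC). */
double acoustic_delay_s(double distance_m, double speed_m_per_s)
{
    return distance_m / speed_m_per_s;
}

/* Hypothetical helper: the same budget in whole sample periods
 * (rounded down, since a partial sample period cannot be used). */
int acoustic_delay_samples(double distance_m, double speed_m_per_s,
                           double fs_hz)
{
    return (int)(acoustic_delay_s(distance_m, speed_m_per_s) * fs_hz);
}
```

With 25 cm between Ref-Mic and speaker and a 16 kHz sample rate, this gives roughly 0.76 ms, or 12 whole sample periods, for everything from ADC to DAC.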

High-level FxLMS

We measure the noise with the ref.mic close to the fan. We need to subtract this further downstream with help from the speaker. When "anti-noise" is emitted from the speaker, it is - like the noise - subjected to the acoustic environment in the duct.

We cannot ignore this influence, so we want to model the changes to the sound in a digital "secondary path", by applying a FIR filter on the digitized signal from the ref. mic. This filter should also take the effects of the digitization, amplifier and loudspeaker into account.
Thus the secondary path models: DSP → DAC → amplifier → speaker → air → microphone

This is sometimes loosely referred to as echo cancellation, but in ANC literature it is usually called secondary path modeling.


        x[n]
        │
        ├──────────────┐
        │              │
        ▼              ▼
    ┌────────┐   ┌─────────┐
    │ W(z)   │   │ Ŝ(z)    │
    │ ANC    │   │ SecPath │
    └────────┘   └─────────┘
        │              │
        ▼              ▼
        y[n]           x_f[n]
        │              │
        ▼              │
    [Speaker]          │
        │              │
        ▼              │
        S(z)           │
        │              │
        ▼              │
        e[n] ◄─────────┘
        │
        ▼
    FxLMS update
        │
        └───────► W(z)

And the LMS update:


e[n]  (from error MIC)
        │
        ▼
    ┌────────────────┐
    │  LMS / FxLMS   │
    │  adaptation    │
    └────────────────┘
        │
        ▼
        W(z)

The meaning of the signals in the above figures:

Signal    Meaning
x[n]      Reference: Fan noise
y[n]      Anti-noise for speaker
S(z)      Real acoustic path
Ŝ(z)      Model of the above
e[n]      Error (residual noise)
x_f[n]    Filtered-x (FxLMS-key)

Note that the "S-hat" above in code becomes "Shat". It holds the coefficients of an FIR filter. This is not your usual symmetric FIR filter, but rather a measured impulse-response.

The figure below is from the article FxLMS.

Principle diagram from article

The figure and the article named above give some background, which can be helpful. I like the way the figure shows the physical setup with paths and flush-mounted mics and speaker - as well as the block diagram. However, be aware that:

Implementation thoughts

Total loop-delay must be less than 1/6 period, and less than 1/10 period if we want to be comfortable. At 200 Hz, T = 5 ms. This means less than 0.5 ms.
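The numbers above follow from simple arithmetic on the period. A small sketch, with a hypothetical helper name of my own choosing:

```c
/* Hypothetical helper: maximum loop delay in seconds for a tone at
 * freq_hz, when the delay may use at most 1/divisor of one period.
 * divisor = 6 is the hard limit from the text, 10 the comfortable one. */
double max_loop_delay_s(double freq_hz, double divisor)
{
    double period_s = 1.0 / freq_hz;
    return period_s / divisor;
}
```

For 200 Hz this gives about 0.83 ms for the 1/6 limit and 0.5 ms for the 1/10 limit. At a 16 kHz sample rate, 0.5 ms is only 8 sample periods.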

We Prioritize

Over:

Use e.g.

Expect:

As the latency is extremely important, I suggest not using Embedded Linux or a similar OS (which does not run on MCUs anyway). I would suggest either Zephyr, FreeRTOS (discussed in Microcontrollers with C) or a simple concept without RTOS - based on a main function and interrupts.

Most ARM microcontrollers allow for USB-based debugging without the need for an OS. STM32CubeIDE can directly implement the necessary FreeRTOS primitives, whereas with Zephyr you start with Zephyr's own build system.

You may want to use CMSIS-DSP. As demonstrated in Microcontrollers with C, the HW-based float in ARM M4 competes very well in speed with the integer versions. It has less precision, but then again a larger dynamic range. I would definitely go with the floating-point version, given that we know that precision is not as important as latency.

Note that we have two ADCs and we want all signals to be sample-synchronous - like e.g., below.


Reference Mic ┐
              ├─► Audio Codec (2x ADC + DAC)
Error Mic ────┘       │
                      │ I²S / TDM
                      ▼
                     MCU

Apart from requiring that the ADCs are in sync, the algorithm given here assumes that DAC latency + acoustic delay are fully absorbed into Shat. If DAC buffering is later introduced, we need to adjust the xfilt_buf (see code) correspondingly.
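One way to get sample-synchronous signals is to take both mic samples from the same stereo frame of the codec. The sketch below is only an illustration under loud assumptions: the frame layout (two signed 32-bit words, slot 0 = reference mic, slot 1 = error mic), the type name and the function name are all hypothetical, and a real codec may use another word length or slot order.

```c
#include <stdint.h>

/* Hypothetical container for one synchronous pair of mic samples */
typedef struct {
    float ref;   /* reference mic, scaled to [-1, 1) */
    float err;   /* error mic, scaled to [-1, 1)     */
} mic_pair_t;

/* Assumption: one stereo frame per sample period, full-scale signed
 * 32-bit words, slot 0 = reference mic, slot 1 = error mic. */
mic_pair_t split_frame(const int32_t frame[2])
{
    const float scale = 1.0f / 2147483648.0f;   /* 1 / 2^31 */
    mic_pair_t m;
    m.ref = (float)frame[0] * scale;
    m.err = (float)frame[1] * scale;
    return m;
}
```

Because both values come out of one frame, x[n] and e[n] always belong to the same sample instant, which is what the algorithm assumes.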

Pseudo code

Intro

The following happens per sample, when doing the ADC:

  1. ADC1 samples reference mic → x[n], and ADC2 samples error mic → e[n]
  2. Store reference samples
  3. Compute $Y[n] = W \ast X$
  4. Output clipping
  5. Compute filtered-x: $X_f[n] = \hat{S} \ast X$
  6. Store filtered-x
  7. Output Y[n] to DAC
  8. Compute updated FxLMS, calc W using e[n], μ step-size and filtered-x: $W = W - \mu e X_f$

In the above we have the following:

Boot

BOOT
│
├─▶ Secondary path identification
│      - Fan OFF (or as quiet as possible)
│      - Speaker plays white noise (sweep is an alternative, but stop should be aligned to full sweep-time)
│      - Error mic records response
│      - Estimate Ŝ(z)
│
├─▶ Stop sweep/white noise (speaker quiet)
│
├─▶ Start fan
│
├─▶ Enable FxLMS
│      - W[k] = 0
│      - μ small
│      - No injected noise
│
└─▶ Normal ANC operation

Closer to C

Constants and definitions

#include <string.h>            // memset() is used in the init functions

#define FS        16000        // sample rate [Hz] needed for phase accuracy
#define N         128          // ANC filter length - and secondary path model length. NB! Must be power of 2
#define MASK      (N - 1)      // Fast wrap in arrays based on mask of index into power-of-2 arrays. Parentheses keep the macro safe in any expression
#define MU        1e-5f        // step size (example)
#define Y_LIMIT   0.8f         // output limiter
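The power-of-2 requirement can be checked by the compiler instead of by a comment. A minimal sketch, assuming a C11 compiler; wrap_index is a hypothetical helper name used only for illustration, and the wrap of negative indices relies on two's complement integers, which is what ARM compilers use.

```c
#define N     128
#define MASK  (N - 1)

/* Fails at compile time if N is not a power of two (C11) */
_Static_assert((N & (N - 1)) == 0, "N must be a power of 2");

/* Hypothetical helper: wrap any index into 0..N-1.
 * Works for -1 as well, which is what the head index relies on
 * when it steps backwards from 0. */
int wrap_index(int i)
{
    return i & MASK;
}
```

Stepping the head back from 0 thus lands on N-1, and stepping forward from N-1 lands on 0, without any branch or modulo operation.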

Global state

// In the following, signal-buffers are circular.
// The NEWEST sample sits at the head index, and older samples follow at increasing indices (wrapping at N).
// This makes it simple to step through the arrays when doing convolution.

// ANC filter
float W[N];                   // adaptive filter coefficients
float x_buf[N];               // reference history for input
int   x_head = 0;

// Secondary path model
float Shat[N];                // identified offline or at startup

// FxLMS
float xfilt_buf[N];           // filtered-x history (for LMS update)
int   xfilt_head = 0;

// Signals
float x;                      // reference mic sample
float e;                      // error mic sample (the SUM of noise and anti-noise)
float y;                      // speaker output

Initialization

void anc_init(void)
{
    // Clear ANC filter
    for (int i = 0; i < N; i++)
        W[i] = 0.0f;

    // Clear buffers
    memset(x_buf, 0, sizeof(x_buf));
    memset(xfilt_buf,0, sizeof(xfilt_buf));

    x_head = 0;
    xfilt_head = 0;

}

Secondary Path identification (during startup)

The following generates the S-hat - the secondary path model - before enabling ANC.
As stated earlier S-hat is an array with the coefficients of a FIR-filter.
Note that this is not a clean calculated symmetric FIR filter, but rather a measured version.

void identify_secondary_path(void)
{
    // Fan OFF (or minimal noise)
    // ID_LENGTH is the number of test samples, e.g. 0.5 to 2 seconds of audio at FS

    for (int n = 0; n < ID_LENGTH; n++)
    {
        float u = white_noise();       // test signal
        dac_write(u);

        // Consider gating the below or band-limit...
        float mic = adc_error_read();

        // LMS to estimate Shat[]
        secondary_lms_update(u, mic);
    }

    dac_write(0.0f);   // silence speaker
}
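The function above calls white_noise(), which is not shown. A minimal sketch of one possible version follows. It is an assumption, not the generator actually used: a 32-bit xorshift generator with an arbitrary non-zero seed, scaled to [-1, 1).

```c
#include <stdint.h>

static uint32_t noise_state = 0x12345678u;   /* any non-zero seed */

/* Hypothetical white_noise(): 32-bit xorshift, scaled to [-1, 1) */
float white_noise(void)
{
    noise_state ^= noise_state << 13;
    noise_state ^= noise_state >> 17;
    noise_state ^= noise_state << 5;

    /* Keep the upper 24 bits, so the value fits exactly in a float */
    int32_t v = (int32_t)(noise_state >> 8) - 0x800000;   /* -2^23 .. 2^23-1 */
    return (float)v / 8388608.0f;                          /* divide by 2^23 */
}
```

In practice the result would probably be scaled down before it is sent to dac_write(), so that the test signal stays well below Y_LIMIT.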

Main ANC-loop - running once per sample

void anc_process_sample(void)
{
    // --- 1. Acquire inputs (synchronous ADCs)

    x = adc_reference_read();
    e = adc_error_read();

    // --- 2. Update reference buffers

    x_head = (x_head - 1) & MASK;      // Move the head-index one step back - circularly
    x_buf[x_head] = x;                 // Newest sample at the head index

    // --- 3. Compute ANC output y[n]

    y = 0.0f;                          // FIR-calc does not use older outputs - like IIR does
    int idx = x_head;                  // Local copy of head for indexing locally

    for (int i = 0; i < N; i++)
    {
        y += W[i] * x_buf[idx];        // Here x is accessed with newest first - convolution becomes dot-product
        idx = (idx+1) & MASK;          // Next index is dT older
    }

    // --- 4. Output limiting (safety clipping)

    if (y >  Y_LIMIT) y =  Y_LIMIT;
    if (y < -Y_LIMIT) y = -Y_LIMIT;

    // --- 5. Compute filtered-x -

    float x_f = 0.0f;
    idx = x_head;

    for (int i = 0; i < N; i++)
    {
        x_f += Shat[i] * x_buf[idx];
        idx = (idx+1) & MASK;          // Next index is dT older
    }

    // --- 6. store filtered-x

    xfilt_head = (xfilt_head-1) & MASK;
    xfilt_buf[xfilt_head] = x_f;

    // --- 7. Output to DAC

    dac_write(y);                      // Time to write to the speaker

    // --- 8. FxLMS update (System A)  // Consider only updating if NOT clipped

    idx = xfilt_head;
    for (int i = 0; i < N; i++)
    {
        W[i] -= MU * e * xfilt_buf[idx]; // Note that we DECREMENT because in ANC the error is the sum of noise and antinoise
        idx = (idx+1) & MASK;
    }
}
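The comment at step 8 suggests updating only when the output was not clipped. One way to sketch this is a limiter that also reports whether it clipped. The helper name clip_output is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: limit y to +/-limit and report if it clipped */
float clip_output(float y, float limit, bool *clipped)
{
    *clipped = false;
    if (y >  limit) { y =  limit; *clipped = true; }
    if (y < -limit) { y = -limit; *clipped = true; }
    return y;
}
```

Step 4 would then become y = clip_output(y, Y_LIMIT, &clipped), and the loop in step 8 would be wrapped in if (!clipped). The idea is that W should not adapt on samples where the speaker did not play what the filter calculated.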

The above code executes 3*N MACs (Multiply-Accumulate) per input-sample. When you read ARM's documentation on M4/M7, you see that they claim to do 1 MAC per cycle with floats. This does not include delays due to memory-access, loop-control etc. However, my experiments with CMSIS-DSP on a small M4 show that the float version is not fully pipelined, and that it takes at least 5 cycles per MAC - including memory load - using the "fused-MAC" that avoids rounding between the multiply and the accumulate. See more in my book: Microcontrollers with C.

If you include the overhead of the memory, simple loop-control and array indexing, you may end up with as much as 8 cycles per MAC. It may be faster if M7 is used, and loops are unrolled, but I use the number 8 as a worst-case estimate for the number of cycles per MAC.

For N = 128 @ 16 kHz we have:

Need: 3 × 128 × 16000 = 6.14M MAC/s
Allocate: With 8 cycles/MAC, the above fully consumes an M4 @ 49.2 MHz (6.144M × 8 = 49.15M cycles/s).

Given the above, the smallest M4 with 72 MHz clock should be able to handle the load, but I would recommend using a faster clock for better performance and headroom.
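The budget above is easy to redo for other filter lengths, sample rates or cycle counts. A small sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: CPU cycles per second needed by the ANC loop.
 * loops          = number of N-long MAC loops per sample (3 in the code above)
 * taps           = filter length N
 * fs_hz          = sample rate
 * cycles_per_mac = measured or estimated cost of one MAC */
uint64_t required_cycles_per_s(uint32_t loops, uint32_t taps,
                               uint32_t fs_hz, uint32_t cycles_per_mac)
{
    return (uint64_t)loops * taps * fs_hz * cycles_per_mac;
}
```

With 3 loops, N = 128, 16 kHz and 8 cycles per MAC, this returns 49,152,000 cycles per second, which is about 68 % of a 72 MHz clock.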

Secondary path identification

#define MU_S     1e-4f     // step size for secondary path LMS (~10 × MU)

// Shat[N], the secondary path estimate, is already declared under Global state
float u_buf[N];            // speaker excitation buffer

Secondary Path initialization

void secondary_init(void)
{
    for (int i = 0; i < N; i++)
        Shat[i] = 0.0f;

    memset(u_buf, 0, sizeof(u_buf));
}

LMS Update

The inner part of FxLMS is a simple LMS. It is based on white noise or a sweep from the speaker over 0.5 to 2 seconds - with the fan turned off. The algorithm stops when the Shat[] coefficients stabilize and the error-signal stops decreasing. The function below runs once per sample of white noise.

While LMS means Least-Mean-Square, we don't take any mean - we simply use the latest measurement as an estimate of the mean, and we do not square, but instead use the gradient descent step. As the derivative of $x^2$ is $2x$ - and thereby linear, we can use this incredible simplification!
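Written out for the secondary path, the gradient step looks as follows. This is a sketch in my own notation: $u[n-i]$ corresponds to u_buf[i], $\mu_s$ to MU_S, and $e_s$ to the error between the measured and the predicted mic signal. The factor 2 from the derivative is absorbed into the step size.

```latex
\begin{aligned}
\hat{y}[n] &= \sum_{i=0}^{N-1} \hat{S}_i \, u[n-i] \\
e_s[n] &= \mathrm{mic}[n] - \hat{y}[n] \\
\frac{\partial \, e_s^2[n]}{\partial \hat{S}_i}
  &= 2 \, e_s[n] \, \frac{\partial e_s[n]}{\partial \hat{S}_i}
   = -2 \, e_s[n] \, u[n-i] \\
\hat{S}_i &\leftarrow \hat{S}_i - \frac{\mu_s}{2} \cdot \frac{\partial \, e_s^2[n]}{\partial \hat{S}_i}
   = \hat{S}_i + \mu_s \, e_s[n] \, u[n-i]
\end{aligned}
```

The last line is the update that the function below performs for each coefficient. The sign is positive here because the error is a difference, whereas in the ANC loop the error is a sum, which gives the decrement.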

void secondary_lms_update(float u, float mic)
{
    // --- 1. Update excitation buffer
    shift_right(u_buf, N);
    u_buf[0] = u;

    // --- 2. Predict mic signal by applying the S-hat FIR
    float y_hat = 0.0f;
    for (int i = 0; i < N; i++)
        y_hat += Shat[i] * u_buf[i];

    // --- 3. Error between measured and predicted
    float e_s = mic - y_hat;

    // --- 4. LMS coefficient update
    //     Each index in S-hat is grown by the error scaled by the step, multiplied by sample from same index
    for (int i = 0; i < N; i++)
        Shat[i] += MU_S * e_s * u_buf[i];
}
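The function above relies on shift_right(), which is not shown. Below is a sketch of one possible version, together with a small offline check that runs the same LMS steps against a simulated, known path. Everything in it is an assumption made for the demo: the short length NID, the step size MU_ID, the three-tap "true" path and the simple LCG noise source are mine and not measured values.

```c
#include <stdint.h>
#include <string.h>

#define NID    8        /* short demo filter, not the N from the text */
#define MU_ID  0.05f    /* demo step size, not the MU_S from the text */

static float shat_demo[NID];  /* estimate of the path                  */
static float u_demo[NID];     /* excitation history, newest at index 0 */

/* One possible shift_right(): move all samples one index up.
 * The oldest sample is dropped and index 0 is free afterwards. */
void shift_right(float *buf, int n)
{
    memmove(&buf[1], &buf[0], (size_t)(n - 1) * sizeof(float));
}

/* Same steps as secondary_lms_update(), but on the demo arrays */
void lms_step(float u, float mic)
{
    shift_right(u_demo, NID);
    u_demo[0] = u;

    float y_hat = 0.0f;
    for (int i = 0; i < NID; i++)
        y_hat += shat_demo[i] * u_demo[i];

    float e_s = mic - y_hat;
    for (int i = 0; i < NID; i++)
        shat_demo[i] += MU_ID * e_s * u_demo[i];
}

/* Identify a simulated path; return the largest coefficient error */
float identify_known_path(int samples)
{
    static const float s_true[NID] =
        { 0.0f, 0.0f, 0.5f, -0.3f, 0.1f, 0.0f, 0.0f, 0.0f };
    float hist[NID] = { 0.0f };
    uint32_t lcg = 1u;

    memset(shat_demo, 0, sizeof(shat_demo));
    memset(u_demo, 0, sizeof(u_demo));

    for (int n = 0; n < samples; n++)
    {
        /* Simple LCG as noise source, scaled to [-1, 1) */
        lcg = lcg * 1664525u + 1013904223u;
        float u = (float)((int32_t)(lcg >> 8) - 0x800000) / 8388608.0f;

        /* Simulated error mic: the true path applied to the excitation */
        shift_right(hist, NID);
        hist[0] = u;
        float mic = 0.0f;
        for (int i = 0; i < NID; i++)
            mic += s_true[i] * hist[i];

        lms_step(u, mic);
    }

    float worst = 0.0f;
    for (int i = 0; i < NID; i++)
    {
        float d = shat_demo[i] - s_true[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Shifting the whole buffer costs N moves per sample. That is acceptable during identification, but the circular indexing from the main ANC-loop would be the faster choice if the cycles are needed.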

© 2026 KlausElk.com & ElkTronic.dk