Fan ANC
Introduction
We want to implement ANC - Active Noise Control - for an HVAC - Heating, Ventilation and Air Conditioning - system. The (over-)simplified concept is to measure the noise with a microphone close to the source. The measured noise is then emitted from a loudspeaker in counter-phase.
In reality it gets more complicated because:
- We need to account for the sound travelling over the distances in the setup
- We need to account for turbulence and reflections in the channel/duct
- Wear and tear may change the sound over time
The above means that we need an adaptive system - typically FxLMS, which means Filtered-x LMS - Least-Mean-Square. The LMS part is responsible for a feedback that trims a simple FIR-filter, while the Fx means that we do not ignore the acoustic phenomena in the duct, but try to model these with another FIR.
Thus our block-diagram will contain two FIR-filters and an LMS.
See also FxLMS for a nice walk-through of the math.
Setup
The setup in the duct looks as follows:
Air → → →
Channel (duct)
┌───────────────────────────────────────────────────────────┐
│ │
│ FAN REF MIC SPEAKER ERROR MIC │
│ |||| (•) [ ] (•) │
│ │
└───────────────────────────────────────────────────────────┘
x=0 x=5–10 cm x=30–50 cm x=45–65 cm
The figure above shows:
- Fan: The source of the noise
- Reference Microphone: Mounted as close as possible to - and pointing towards - the fan.
- Speaker: Mounted downstream in the channel.
- Error Mic: Mounted even further downstream. Omnidirectional - picks up the sound - including both residual fan noise and anti-noise from the speaker. We consider this signal to be the error we want to eliminate.
The ANC-components are ideally flush-mounted (embedded in the wall of the duct).
Note that the distance between the Ref-Mic and the speaker defines the maximum allowable processing latency. If sound travels at 330 m/s, then 33 cm corresponds to 1 ms. This does not allow for long buffers or FFTs. We need to work in the time-domain.
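The latency budget above is plain arithmetic, and a minimal sketch can make it easy to redo for other duct lengths. The helper names below are hypothetical and not part of the ANC code on this page. The speed of sound is passed in, as it varies with temperature.

```c
#include <stdint.h>

/* Hypothetical helper: time [s] the sound needs to travel distance_m
 * at the speed c_m_per_s. */
static float acoustic_delay_s(float distance_m, float c_m_per_s)
{
    return distance_m / c_m_per_s;
}

/* Whole sample periods that fit inside delay_s at sample rate fs_hz
 * (rounded down, as a partial sample period cannot be used). */
static uint32_t delay_in_samples(float delay_s, float fs_hz)
{
    return (uint32_t)(delay_s * fs_hz);
}
```

With 33 cm and 330 m/s this gives roughly 1 ms, which at 16 kHz is in the order of 16 sample periods for the complete chain of ADC, processing, DAC and loudspeaker.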
High-level FxLMS
We measure the noise with the ref.mic close to the fan. We need to subtract this further downstream with help from the speaker. When "anti-noise" is emitted from the speaker, it is - like the noise - subjected to the acoustic environment in the duct.
We cannot ignore this influence, so we want to model the changes to the sound in a digital "secondary path", by applying a FIR filter on the digitized signal from the ref. mic. This filter should also take the effects of the digitization, amplifier and loudspeaker into account.
Thus the secondary path models: DSP → DAC → amplifier → speaker → air → microphone
This is sometimes loosely referred to as echo cancellation, but in ANC literature it is usually called secondary path modeling.
x[n]
│
├──────────────┐
│ │
▼ ▼
┌────────┐ ┌─────────┐
│ W(z) │ │ Ŝ(z) │
│ ANC │ │ SecPath │
└────────┘ └─────────┘
│ │
▼ ▼
y[n] x_f[n]
│ │
▼ │
[Speaker] │
│ │
▼ │
S(z) │
│ │
▼ │
e[n] ◄─────────┘
│
▼
FxLMS update
│
└───────► W(z)
And the LMS update:
e[n] (from error MIC)
│
▼
┌────────────────┐
│ LMS / FxLMS │
│ adaptation │
└────────────────┘
│
▼
W(z)
The meaning of the signals in the above figures:
| Signal | Meaning |
|---|---|
| x[n] | Reference: Fan noise |
| y[n] | Anti-noise for speaker |
| S(z) | Real acoustic path |
| Ŝ(z) | Model of the above |
| e[n] | Error (residual noise) |
| x_f[n] | Filtered-x (FxLMS-key) |
Note that the "S-hat" above in code becomes "Shat". It is used as a calculated FIR-filter. This is not your usual symmetric FIR filter, but rather a measured impulse-response.
The figure below is from the article FxLMS.

The figure and the article named above give some background, which can be a help. I like the way the figure shows the physical setup with paths and flush-mounted mics and speaker - as well as the block diagram. However, be aware that:
- The article uses standard math array-indexing of the signals, where the newest sample is at the highest index. On this page we use a concept known from Python DSP, where the latest sample is at index 0. This makes the implementation of convolution a direct dot product without index reversals, as we can step in the same direction through both the array with the filter constants and the buffer-array. You find the dot-product in DSP-libraries like CMSIS-DSP (even though the code is simple, lib-functions may improve e.g., loop-control).
- On this page we use S-hat (shat). The article uses "C(z)".
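The newest-sample-at-index-0 convention can be illustrated with a small sketch. The function name is hypothetical, and a linear (non-circular) history array is assumed here to keep the example short. One FIR output is then simply a dot product, with both arrays walked in the same direction.

```c
#include <stddef.h>

/* One FIR output sample.
 * h[0] weights the newest sample. hist[0] is the newest sample,
 * hist[1] is one sample period older, and so on.
 * No index reversal is needed. */
static float fir_newest_first(const float *h, const float *hist, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += h[i] * hist[i];
    return acc;
}
```

On a Cortex-M target the loop could probably be swapped for the dot-product function in CMSIS-DSP (arm_dot_prod_f32 for floats), as long as the history is stored linearly and not in a circular buffer.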
Implementation thoughts
Total loop-delay must be less than 1/6 period, and less than 1/10 period if we want to be comfortable. At 200 Hz, T = 5 ms. This means less than 0.5 ms.
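The rule of thumb above can be sketched as a one-liner, so that the budget is easy to recalculate for another upper frequency. The function name is hypothetical.

```c
/* Largest tolerable loop delay [s] for the highest frequency we want to
 * cancel, given as a fraction of one period (e.g. 1/6 or 1/10). */
static float loop_delay_budget_s(float f_max_hz, float period_fraction)
{
    return period_fraction / f_max_hz;
}
```

For 200 Hz this gives about 0.83 ms with 1/6 period and 0.5 ms with 1/10 period.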
We prioritize:
- Latency
- Linearity
Over:
- Dynamic range
- Sampling frequency (above 16 kHz)
Use e.g.
- ARM Cortex M4 or M7 with hardware floating point, using C-style float (single) precision. The M4 FPU does not handle double in hardware, while an M7 with FPU always handles at least float. double is probably overkill - wasting performance.
- Synchronized, external A/D and D/A with at least 16 bits (my experience says that internal A/D and D/A are not good enough)
- Sample freq. at least 16 kHz
- 64–256 taps FIR
- Slow update of filter with LMS - or at start with bandlimited (to ANC band) white-noise
Expect:
- 10 to 20 dB reduction
As the latency is extremely important, I suggest not using Embedded Linux or a similar OS (which does not run on MCUs anyway). I would suggest either Zephyr, FreeRTOS (discussed in Microcontrollers with C) or a simple concept without RTOS - based on a main function and interrupts.
Most ARM Microcontrollers allow for USB-based debugging without the need for an OS. STM32CubeIDE can directly implement the necessary FreeRTOS primitives, whereas with Zephyr you start with Zephyr's build system.
You may want to use CMSIS-DSP. As demonstrated in Microcontrollers with C, the HW-based float in ARM M4 competes very well in speed with integer versions, but has lower accuracy (in exchange for a larger dynamic range). I would definitely go with the floating-point version, given that we know that precision is not as important as latency.
Note that we have two ADCs and we want all signals to be sample-synchronous - like e.g., below.
Reference Mic ┐
├─► Audio Codec (2x ADC + DAC)
Error Mic ────┘ │
│ I²S / TDM
▼
MCU
Apart from requiring that the ADCs are in sync, the algorithm given here assumes that DAC latency + acoustic delay are fully absorbed into Shat. If DAC buffering is later introduced, we need to adjust the xfilt_buf (see code) correspondingly.
Pseudo code
Intro
The following happens per sample, when doing the ADC:
- ADC1 samples reference mic → x[n], and ADC2 samples error mic → e[n]
- Store reference samples
- Compute $y[n] = w * x[n]$
- Output clipping
- Compute filtered-x: $x_f[n] = \hat{s} * x[n]$
- Store filtered-x
- Output y[n] to DAC
- Compute updated FxLMS, calc W using e[n], step-size and filtered-x: $w_k \leftarrow w_k - \mu \, e[n] \, x_f[n-k]$
In the above we have the following:
- $*$ is convolution
- $x$ is an array with data in
- $y$ is an array with data out
- $x_f$ is an array with filtered input - modelling the secondary path
- $w$ and $\hat{s}$ are arrays with coefficients. In source they are W and Shat.
- $\mu$ and $e[n]$ are scalar-values.
Boot
BOOT
│
├─▶ Secondary path identification
│ - Fan OFF (or as quiet as possible)
│ - Speaker plays white noise (sweep is an alternative, but stop should be aligned to full sweep-time)
│ - Error mic records response
│ - Estimate Ŝ(z)
│
├─▶ Stop sweep/white noise (speaker quiet)
│
├─▶ Start fan
│
├─▶ Enable FxLMS
│ - W[k] = 0
│ - μ small
│ - No injected noise
│
└─▶ Normal ANC operation
Closer to C
Constants and definitions
#include <string.h> // memset()
#define FS 16000 // sample rate [Hz] needed for phase accuracy
#define N 128 // ANC filter length - and secondary path model length. NB! Must be power of 2
#define MASK (N-1) // Fast wrap in arrays based on mask of index into power-of-2 arrays. Parentheses keep the macro safe in any expression
#define MU 1e-5f // step size (example)
#define Y_LIMIT 0.8f // output limiter
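The mask-based wrap only works when the buffer length is a power of two, and a mask macro is safest when its expansion is parenthesized. The sketch below uses hypothetical names (BUF_LEN, BUF_MASK and the two helpers) to show how the head moves backwards and how older samples are found at increasing indices. Unsigned arithmetic is assumed, so that stepping back from 0 wraps in a well-defined way.

```c
#define BUF_LEN  128u             /* must be a power of two */
#define BUF_MASK (BUF_LEN - 1u)   /* parenthesized: safe inside any expression */

/* Step the head one slot back (towards index 0), wrapping from 0 to BUF_LEN-1.
 * The newest sample is always written at the returned index. */
static unsigned head_step_back(unsigned head)
{
    return (head - 1u) & BUF_MASK;
}

/* Index of the sample that is `age` sample periods older than the newest. */
static unsigned index_of_age(unsigned head, unsigned age)
{
    return (head + age) & BUF_MASK;
}
```

Without the parentheses, an expression like `MASK * 2` would expand to `N-1 * 2`, which is not the intended value.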
Global state
// In the following, signal-buffers have the NEWEST sample at the LOWEST index.
// This makes it simple to step through the arrays when doing convolution.
// Buffers are circular
// ANC filter
float W[N]; // adaptive filter coefficients
float x_buf[N]; // reference history for input
int x_head = 0;
// Secondary path model
float Shat[N]; // identified offline or at startup
// FxLMS
float xfilt_buf[N]; // filtered-x history (for LMS update)
int xfilt_head = 0;
// Signals
float x; // reference mic sample
float e; // error mic sample (the SUM of noise and anti-noise)
float y; // speaker output
Initialization
void anc_init(void)
{
// Clear ANC filter
for (int i = 0; i < N; i++)
W[i] = 0.0f;
// Clear buffers
memset(x_buf, 0, sizeof(x_buf));
memset(xfilt_buf,0, sizeof(xfilt_buf));
x_head = 0;
xfilt_head = 0;
}
Secondary Path identification (during startup)
The following generates the S-hat - the secondary path model - before enabling ANC.
As stated earlier S-hat is an array with the coefficients of a FIR-filter.
Note that this is not a clean calculated symmetric FIR filter, but rather a measured version.
void identify_secondary_path(void)
{
// Fan OFF (or minimal noise)
// ID_LENGTH is the number of identification samples and must be defined elsewhere
for (int n = 0; n < ID_LENGTH; n++)
{
float u = white_noise(); // test signal
dac_write(u);
// Consider gating the below or band-limit...
float mic = adc_error_read();
// LMS to estimate Shat[]
secondary_lms_update(u, mic);
}
dac_write(0.0f); // silence speaker
}
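The white_noise() call above is not defined on this page. A minimal sketch of such a generator could be based on a xorshift32 pseudo-random generator. Note the assumptions: the output is uniformly distributed (not Gaussian) in [-1, 1), the seed is arbitrary, and no band-limiting or output scaling is done here.

```c
#include <stdint.h>

static uint32_t noise_state = 0x12345678u;   /* any non-zero seed (assumption) */

/* Hypothetical white_noise(): xorshift32 mapped to a uniform value in [-1, 1). */
static float white_noise(void)
{
    noise_state ^= noise_state << 13;
    noise_state ^= noise_state >> 17;
    noise_state ^= noise_state << 5;
    /* Use the top 24 bits, which a float can represent exactly,
     * scale to [0, 2) and shift to [-1, 1). */
    return (float)(noise_state >> 8) * (1.0f / 8388608.0f) - 1.0f;
}
```

In a real setup the level should be scaled down to stay well inside the output limiter, and the signal may be band-limited to the ANC band before it is sent to the DAC.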
Main ANC-loop - running once per sample
void anc_process_sample(void)
{
// --- 1. Acquire inputs (synchronous ADCs)
x = adc_reference_read();
e = adc_error_read();
// --- 2. Update reference buffers
x_head = (x_head - 1) & MASK; // Move the head-index one step back - circularly
x_buf[x_head] = x; // Newest sample at lowest index
// --- 3. Compute ANC output y[n]
y = 0.0f; // FIR-calc does not use older output - like IIR does
int idx = x_head; // Local copy of head for indexing locally
for (int i = 0; i < N; i++)
{
y += W[i] * x_buf[idx]; // Here x is accessed with newest first - convolution becomes dot-product
idx = (idx+1) & MASK; // Next index is dT older
}
// --- 4. Output limiting (safety clipping)
if (y > Y_LIMIT) y = Y_LIMIT;
if (y < -Y_LIMIT) y = -Y_LIMIT;
// --- 5. Compute filtered-x (reference filtered through the secondary path model)
float x_f = 0.0f;
idx = x_head;
for (int i = 0; i < N; i++)
{
x_f += Shat[i] * x_buf[idx];
idx = (idx+1) & MASK; // Next index is dT older
}
// --- 6. store filtered-x
xfilt_head = (xfilt_head-1) & MASK;
xfilt_buf[xfilt_head] = x_f;
// --- 7. Output to DAC
dac_write(y); // Time to write to the speaker
// --- 8. FxLMS update (System A) // Consider only updating if NOT clipped
idx = xfilt_head;
for (int i = 0; i < N; i++)
{
W[i] -= MU * e * xfilt_buf[idx]; // Note that we DECREMENT because in ANC the error is the sum of noise and antinoise
idx = (idx+1) & MASK;
}
}
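The comment in step 8 suggests only updating W when the output was not clipped. One way to sketch this is a limiter that also reports whether it was active. The function name and the flag are hypothetical and not part of the loop above.

```c
#include <stdbool.h>

/* Clamp y to [-limit, +limit].
 * *clipped tells the caller whether the limiter was active for this sample. */
static float limit_output(float y, float limit, bool *clipped)
{
    *clipped = false;
    if (y > limit)  { y = limit;  *clipped = true; }
    if (y < -limit) { y = -limit; *clipped = true; }
    return y;
}
```

Step 4 would then call limit_output(y, Y_LIMIT, &clipped), and the update loop in step 8 could be wrapped in `if (!clipped) { ... }`. The idea is that the adaptation should not learn from samples where the speaker did not play what W asked for.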
The above code executes 3*N MACs (Multiply-Accumulate) per input-sample. When you read ARM's documentation on M4/M7, you see that they claim to do 1 MAC per cycle with floats. This does not include delays due to memory-access, loop-control etc. However, my experiments with CMSIS-DSP on a small M4 show that the float version is not fully pipelined, and that it takes at least 5 cycles per MAC - including memory load - using the "fused-MAC" that avoids rounding between the multiply and the accumulate. See more in my book: Microcontrollers with C.
If you include the overhead of the memory, simple loop-control and array indexing, you may end up with as much as 8 cycles per MAC. It may be faster if M7 is used, and loops are unrolled, but I use the number 8 as a worst-case estimate for the number of cycles per MAC.
For N = 128 @ 16 kHz we have:
Need: 3 × 128 × 16000 = 6.14M MAC/s
Allocate: With 8 cycles/MAC, the above fully consumes an M4 @ 49.1 MHz.
Given the above, the smallest M4 with 72 MHz clock should be able to handle the load, but I would recommend using a faster clock for better performance and headroom.
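The load estimate above can be redone for other filter lengths and sample rates with a small helper. The function name is hypothetical, and the factor 3 reflects the three N-tap loops in the per-sample code (ANC output, filtered-x and the W update).

```c
#include <stdint.h>

/* Core clock [Hz] consumed by the three N-tap loops per sample,
 * given an assumed cost in cycles per MAC. */
static uint64_t required_clock_hz(uint32_t taps, uint32_t fs_hz,
                                  uint32_t cycles_per_mac)
{
    const uint64_t macs_per_second = 3u * (uint64_t)taps * fs_hz;
    return macs_per_second * cycles_per_mac;
}
```

With 128 taps, 16 kHz and 8 cycles per MAC this lands just above 49 MHz, while 256 taps would need roughly twice that.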
Secondary path identification
#define MU_S 1e-4f // step size for secondary path LMS (~10* MU)
float Shat[N]; // secondary path estimate
float u_buf[N]; // speaker excitation buffer
Secondary Path initialization
void secondary_init(void)
{
for (int i = 0; i < N; i++)
Shat[i] = 0.0f;
memset(u_buf, 0, sizeof(u_buf));
}
LMS Update
The inner part of FxLMS is a simple LMS. It is based on white noise or a sweep from the speaker over 0.5 to 2 seconds - with the fan turned off. The algorithm stops when the Shat[] coefficients stabilize and the error-signal stops decreasing. The function below runs once per sample of white noise.
While LMS means Least-Mean-Square, we don't take any mean - we simply use the latest measurement as an estimate of the mean, and we do not square, but instead use the gradient descent step. As the derivative of x² is 2x - and thereby linear, we can use this incredible simplification!
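The simplification can be written out as a short derivation. The symbols are my notation and not from the article: m[n] is the error-mic sample, u[n] the excitation sent to the speaker, J[n] the instantaneous squared error, and the constant factor 2 is absorbed into the step size.

```latex
\begin{aligned}
\hat{y}[n] &= \sum_{i=0}^{N-1} \hat{s}_i \, u[n-i] \\
e_s[n] &= m[n] - \hat{y}[n] \\
J[n] &= e_s^2[n] \\
\frac{\partial J[n]}{\partial \hat{s}_i}
  &= 2 \, e_s[n] \, \frac{\partial e_s[n]}{\partial \hat{s}_i}
   = -2 \, e_s[n] \, u[n-i] \\
\hat{s}_i &\leftarrow \hat{s}_i + \mu_s \, e_s[n] \, u[n-i]
\end{aligned}
```

The last line is a step against the gradient, and it matches the coefficient update in the function below, where the sign is positive because the error is defined as measured minus predicted.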
void secondary_lms_update(float u, float mic)
{
// --- 1. Update excitation buffer
shift_right(u_buf, N); // helper (not shown): moves every sample one index up, dropping the oldest
u_buf[0] = u;
// --- 2. Predict mic signal by applying the S-hat FIR
float y_hat = 0.0f;
for (int i = 0; i < N; i++)
y_hat += Shat[i] * u_buf[i];
// --- 3. Error between measured and predicted
float e_s = mic - y_hat;
// --- 4. LMS coefficient update
// Each index in S-hat is grown by the error scaled by the step, multiplied by sample from same index
for (int i = 0; i < N; i++)
Shat[i] += MU_S * e_s * u_buf[i];
}
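To get a feeling for the convergence before going to hardware, the update can be tried on a PC against a known, simulated path. The sketch below is a toy setup under stated assumptions: all names with the sim_ prefix are hypothetical, the "true" path is an invented 8-tap impulse response, the step size is picked for this example only, and there is no fan noise or measurement noise. The shift of the history is done with memmove, which is presumably what shift_right() does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_TAPS 8        /* short filter, for the simulation only */
#define SIM_MU   0.05f    /* step size chosen for this toy example */

static float sim_shat[SIM_TAPS];   /* estimate being adapted */
static float sim_u[SIM_TAPS];      /* excitation history, newest at index 0 */

/* Invented stand-in for the real speaker-to-mic path */
static const float sim_true_path[SIM_TAPS] =
    { 0.0f, 0.0f, 0.6f, 0.3f, -0.2f, 0.1f, 0.0f, 0.0f };

static uint32_t sim_rng = 0x2545F491u;

/* xorshift32 mapped to a uniform value in [-1, 1) */
static float sim_noise(void)
{
    sim_rng ^= sim_rng << 13;
    sim_rng ^= sim_rng >> 17;
    sim_rng ^= sim_rng << 5;
    return (float)(sim_rng >> 8) * (1.0f / 8388608.0f) - 1.0f;
}

/* One LMS step with the same structure as secondary_lms_update() */
static void sim_lms_step(float u, float mic)
{
    memmove(&sim_u[1], &sim_u[0], (SIM_TAPS - 1) * sizeof(float));
    sim_u[0] = u;

    float y_hat = 0.0f;                      /* predicted mic signal */
    for (size_t i = 0; i < SIM_TAPS; i++)
        y_hat += sim_shat[i] * sim_u[i];

    const float e_s = mic - y_hat;           /* measured minus predicted */
    for (size_t i = 0; i < SIM_TAPS; i++)
        sim_shat[i] += SIM_MU * e_s * sim_u[i];
}

/* Run the identification and return the largest coefficient error. */
static float sim_identify(int samples)
{
    float hist[SIM_TAPS] = { 0 };            /* history feeding the "true" path */
    memset(sim_shat, 0, sizeof(sim_shat));
    memset(sim_u, 0, sizeof(sim_u));

    for (int n = 0; n < samples; n++) {
        const float u = sim_noise();
        memmove(&hist[1], &hist[0], (SIM_TAPS - 1) * sizeof(float));
        hist[0] = u;

        float mic = 0.0f;                    /* what the error mic would pick up */
        for (size_t i = 0; i < SIM_TAPS; i++)
            mic += sim_true_path[i] * hist[i];

        sim_lms_step(u, mic);
    }

    float worst = 0.0f;
    for (size_t i = 0; i < SIM_TAPS; i++) {
        float d = sim_shat[i] - sim_true_path[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling sim_identify(5000) should return a very small number, as the estimate ends up close to the invented path. In the real duct the result will be less clean, because of noise and because the real impulse response may be longer than the filter.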