Tracked in #2974 (item TBD-E from the 1-4KB band audit).
Current state
`fl::Channel::showPixels(PixelController<RGB, 1, -1>&)` is 1,884 B on ESP32-S3 Blink. The bloat breakdown by inlined callee:
| Inlined callee | Bytes | Notes |
|---|---|---|
| `fl::Channel::resolveDynamicDriver()` | 927 B | called x43 from inside `showPixels` |
| `fl::(anonymous namespace)::ReorderingPixelIteratorAny::ReorderingPixelIteratorAny(...)` | 743 B | XYMap reordering + iterator ctor |
| `fl::(anonymous namespace)::writeUCS7604(fl::vector_psram<u8>*, fl::PixelIterator&, ...)` | 507 B | gated by `FASTLED_DISABLE_UCS7604` |
| `fl::(anonymous namespace)::emitDisabledDriverError(fl::string const&, fl::string const&, ...)` | 490 B | already `FL_NO_INLINE` (#2773 follow-up to #2832) |
| `fl::PixelIterator::writeWS2812<...>(...)` | inlined | clockless dispatch |
| `fl::PixelIterator::writeSK9822<...>(...)` | inlined | SPI dispatch |
| `fl::PixelIterator::writeAPA102<...>(...)` | inlined | SPI dispatch |
So a single `showPixels` symbol carries the pre-bound vs. dynamic-driver dispatch, three chipset writer template specializations, the dynamic-driver resolution chain, the disabled-driver diagnostic, and the XYMap-reordering iterator construction, all welded together.
Where the body lives
`src/fl/channels/channel.cpp.hpp`, lines 449-673 (`Channel::showPixels`). The per-chipset dispatch lives in the inner `switch` blocks:
- Lines 523-547: `ClocklessChipset` switch on `clockless->encoder` -> `pixelIterator.writeWS2812(&data)` / `writeUCS7604(...)`
- Lines 549-619: `SpiChipsetConfig` switch on `config.chipset` (11 cases) -> `writeAPA102` / `writeSK9822` / `writeWS2801` / `writeP9813` / `writeLPD8806` / `writeLPD6803` / `writeSM16716` / `writeHD108`
All `writeXXX` methods live in `src/fl/chipsets/encoders/pixel_iterator.h` (lines 202, 223, 255, ...) and are header-defined templates that the compiler inlines fully into `showPixels` via the call sites above.
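To make the shape of the problem concrete, here is a heavily simplified model of today's dispatch. `Chipset`, `PixelIterator`, and `showPixelsInlineSwitch` are hypothetical stand-ins (not FastLED's real types or signatures), and the writers use placeholder encodings rather than real wire formats. The point is only structural: one function owns the switch, and every writer it can reach is a header template the optimizer may fold into that single symbol.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; the real writers are templates over richer parameters.
enum class Chipset : std::uint8_t { WS2812, APA102, SK9822 };

struct PixelIterator {
    std::vector<std::uint8_t> rgb;  // flattened pixel bytes

    // Placeholder encodings (marker byte + raw pixels), not real wire formats.
    // Header-defined templates: each call site is an inlining candidate.
    template <typename Out> void writeWS2812(Out* out) {
        out->insert(out->end(), rgb.begin(), rgb.end());
    }
    template <typename Out> void writeAPA102(Out* out) {
        out->push_back(0xA1);
        out->insert(out->end(), rgb.begin(), rgb.end());
    }
    template <typename Out> void writeSK9822(Out* out) {
        out->push_back(0x5C);
        out->insert(out->end(), rgb.begin(), rgb.end());
    }
};

// Shape of the current dispatch: one function owns the switch, so every
// reachable writer body can end up folded into this one symbol.
inline void showPixelsInlineSwitch(Chipset chipset, PixelIterator& it,
                                   std::vector<std::uint8_t>* data) {
    switch (chipset) {
        case Chipset::WS2812: it.writeWS2812(data); break;
        case Chipset::APA102: it.writeAPA102(data); break;
        case Chipset::SK9822: it.writeSK9822(data); break;
    }
}
```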
The x43 `resolveDynamicDriver` call count: diagnosis
`resolveDynamicDriver()` itself is already `FL_NO_INLINE` (`channel.cpp.hpp:390`), and it is statically called from exactly one site in `showPixels` (line 493).
The x43 figure in the backref graph is almost certainly a per-instruction caller count from the disassembly, not 43 distinct call sites in C++: it counts every machine-level branch to `resolveDynamicDriver`, plus all the per-edge counts the symbol graph implies once the switch tables are flattened. The symptom: even though `FL_NO_INLINE` keeps the body out of line, the compiler duplicates the call-site setup (build the arguments, save caller-saved registers, branch, restore) at every branch fan-in inside the dispatch switches. With ~12-14 `writeXXX` cases, each with its own restore path, the short instruction sequence around the call multiplies.
The actionable read: `FL_NO_INLINE` is doing its job on the body, but the call site itself is being duplicated by the switch dispatch.
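One way to check this diagnosis is to count, in the disassembly of `showPixels`, the instructions that reference `resolveDynamicDriver`. The sketch below is a hypothetical helper (`countCallSites` is not part of FastLED) and assumes GNU `objdump -d` layout: each function starts with a header line ending in `>:`, and direct call targets are annotated as `<symbol>`. Calls made through a literal-pool load followed by an indirect `callx` are not annotated with the callee name and would be missed, so treat the result as a lower bound.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts instruction lines inside any function whose
// header mentions `caller` that reference a symbol containing `callee`.
// Assumed objdump -d layout:
//   42001234 <_ZN2fl7Channel10showPixels...>:            (header ends in ">:")
//   42001240:  ...  call8  42005678 <..resolveDynamicDriver..>
std::size_t countCallSites(const std::string& disasm,
                           const std::string& caller,
                           const std::string& callee) {
    std::istringstream in(disasm);
    std::string line;
    bool inCaller = false;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        const bool isHeader =
            line.size() >= 2 && line.compare(line.size() - 2, 2, ">:") == 0;
        if (isHeader) {
            // A new function starts; remember whether it is the caller.
            inCaller = line.find(caller) != std::string::npos;
            continue;
        }
        // Count annotated references to the callee inside the caller's body.
        if (inCaller && line.find('<') != std::string::npos &&
            line.find(callee) != std::string::npos) {
            ++count;
        }
    }
    return count;
}
```

A count of one would point at the graph's edge accounting rather than the generated code; a count well above one supports the duplicated-call-site reading.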
Proposed fix
Move per-chipset writer selection off the inline switch and onto the `IChannelDriver` interface (`src/fl/channels/driver.h`), or into a function-pointer table held by `ChannelData`:
Option A: virtual writer on IChannelDriver
```cpp
class IChannelDriver {
public:
    // ... existing enqueue/show/poll ...

    // Default: dispatch via the current inline switch (back-compat).
    // Override per-driver to call exactly the writeXXX the driver supports.
    virtual void encodePixels(PixelIterator& it,
                              fl::vector_psram<u8>* out,
                              const ChipsetVariant& chipset) FL_NOEXCEPT;
};
```
Each concrete driver implements just the encoders it needs. `Channel::showPixels` becomes:
```cpp
driver->encodePixels(pixelIterator, &data, mChipset);
```
The 3+ writer templates each become one out-of-line symbol (the virtual override body), not 3 specializations inlined into one ~700 B blob.
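A minimal, self-contained sketch of Option A, assuming simplified stand-ins: `std::vector` replaces `fl::vector_psram<u8>`, `noexcept` replaces `FL_NOEXCEPT`, and `ClocklessDriver` and `showPixelsSketch` are hypothetical names. The base class keeps the old switch as the back-compat default, the clockless driver overrides it to call only the writer it supports, and the dispatch left in `showPixels` is one virtual call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not FastLED's real declarations.
enum class Chipset : std::uint8_t { WS2812, APA102 };

struct ChipsetVariant {
    Chipset chipset;
};

struct PixelIterator {
    std::vector<std::uint8_t> rgb;  // flattened pixel bytes

    void writeWS2812(std::vector<std::uint8_t>* out) const {
        out->insert(out->end(), rgb.begin(), rgb.end());
    }
    void writeAPA102(std::vector<std::uint8_t>* out) const {
        out->push_back(0xA1);  // placeholder marker, not a real APA102 frame
        out->insert(out->end(), rgb.begin(), rgb.end());
    }
};

class IChannelDriver {
public:
    virtual ~IChannelDriver() = default;

    // Back-compat default: the old all-chipset switch, now outside showPixels.
    virtual void encodePixels(PixelIterator& it, std::vector<std::uint8_t>* out,
                              const ChipsetVariant& chipset) noexcept {
        switch (chipset.chipset) {
            case Chipset::WS2812: it.writeWS2812(out); break;
            case Chipset::APA102: it.writeAPA102(out); break;
        }
    }
};

// Hypothetical clockless driver: references only the writer it supports.
class ClocklessDriver : public IChannelDriver {
public:
    void encodePixels(PixelIterator& it, std::vector<std::uint8_t>* out,
                      const ChipsetVariant& /*chipset*/) noexcept override {
        it.writeWS2812(out);
    }
};

// What remains of the dispatch in showPixels: a single virtual call.
inline void showPixelsSketch(IChannelDriver& driver, PixelIterator& it,
                             const ChipsetVariant& chipset,
                             std::vector<std::uint8_t>* data) {
    driver.encodePixels(it, data, chipset);
}
```

One caveat worth checking in the real build: a non-pure default in the base is reachable from the base vtable, which derived constructors typically reference, so that default (and every writer it calls) may survive `--gc-sections`. Making the base method pure virtual, or moving the switch into a dedicated fallback driver, would avoid that.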
Option B: function-pointer table on ChipsetVariant / ChannelData
A `static constexpr` table indexed by the `ClocklessEncoder` / `SpiChipset` enum, with entries pointing at `&PixelIterator::writeXXX`. This avoids the vtable cost and still reduces the call site to one indirect branch.
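A comparable sketch of Option B, again with hypothetical simplified types (`SpiChipset`, `WriterFn`, `kSpiWriters`, and `encodeSpi` are illustrative names): a `constexpr` array of pointers-to-member-function, indexed by the enum's underlying value. The real writers are templates, so a real table would point at specific instantiations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; `Count` sizes the table and must stay last.
enum class SpiChipset : std::uint8_t { APA102, SK9822, Count };

struct PixelIterator {
    std::vector<std::uint8_t> rgb;  // flattened pixel bytes

    // Placeholder encodings (marker byte + raw pixels), not real wire formats.
    void writeAPA102(std::vector<std::uint8_t>* out) {
        out->push_back(0xA1);
        out->insert(out->end(), rgb.begin(), rgb.end());
    }
    void writeSK9822(std::vector<std::uint8_t>* out) {
        out->push_back(0x5C);
        out->insert(out->end(), rgb.begin(), rgb.end());
    }
};

// Every writer in the table shares this pointer-to-member signature.
using WriterFn = void (PixelIterator::*)(std::vector<std::uint8_t>*);

// Entry order must match the enum order.
constexpr std::array<WriterFn, static_cast<std::size_t>(SpiChipset::Count)>
    kSpiWriters = {&PixelIterator::writeAPA102, &PixelIterator::writeSK9822};

inline void encodeSpi(SpiChipset c, PixelIterator& it,
                      std::vector<std::uint8_t>* out) {
    const auto idx = static_cast<std::size_t>(c);
    assert(idx < kSpiWriters.size());  // reject Count / out-of-range values
    (it.*kSpiWriters[idx])(out);       // one indirect call replaces the switch
}
```

Because a single table names every writer, the linker sees all of them as referenced and `--gc-sections` cannot drop any entry. Per-driver tables, or compile-time gates around individual entries in the spirit of the existing `FASTLED_DISABLE_*` macros, would keep the dead-code-elimination benefit.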
Either way, the goal is to make the writer dispatch a single indirect call out of `showPixels` and to split the writer bodies into their own symbols, where dead-code elimination (`--gc-sections`) can drop the unused ones.
Estimated savings
~600-1000 B off this symbol (`showPixels` itself), with a small portion of that re-spent in the out-of-line writer specializations. Net savings: ~400-800 B, because writer code that was previously inlined into `showPixels` now lives once per chipset rather than once per instantiation site.
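Pairing the low and high ends of the two ranges (an assumption; the estimates are stated independently), the implied re-spend and the gross cut needed to bring the 1,884 B symbol under the ~1,000 B acceptance target work out as:

```math
\begin{aligned}
\text{re-spent} &\approx \text{gross} - \text{net} \approx (600 - 400)\ \text{to}\ (1000 - 800) = 200\ \text{B} \\
\text{gross cut needed} &= 1884 - 1000 = 884\ \text{B}
\end{aligned}
```

So meeting the target requires a gross reduction in the upper half of the 600-1000 B estimate.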
This is additive with the gates already shipped: `FASTLED_DISABLE_UCS7604` (#2920), `FASTLED_DISABLE_SPI_CHIPSETS` (#2913), `FASTLED_DISABLE_DYNAMIC_DRIVER` (#2926).
Perf trade-off
One virtual call per `showPixels` invocation (once per channel per `FastLED.show()`), a sub-microsecond cost on every supported target:
- ESP32-S3 @ 240 MHz (~4.17 ns per cycle): one indirect call is ~3-5 cycles, i.e. ~12-21 ns; call it ~20 ns
- At 60 Hz (16.7 ms per frame), that is 20 ns / 16.7 ms ≈ 1.2 ppm of frame time
- WS2812 wire time for 100 LEDs is ~3 ms (100 LEDs × 24 bits × 1.25 µs); the virtual call is ~0.00067% of one frame's transmit time
Verdict: free. The encode loop processes `numLeds * bytes_per_led * 8` bits; one extra branch in the prologue is invisible.
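The perf figures can be re-derived as a compile-time back-of-envelope check. This sketch assumes the pessimistic 5-cycle end of the 3-5 cycle estimate and the standard WS2812 figures (800 kHz bit rate, i.e. 1.25 µs per bit, and 24 bits per LED); the constant and function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Inputs (assumptions, see lead-in).
constexpr double kCpuHz        = 240e6;       // ESP32-S3 core clock
constexpr double kCallCycles   = 5.0;         // pessimistic end of "3-5 cycles"
constexpr double kFrameSeconds = 1.0 / 60.0;  // 60 Hz refresh
constexpr double kBitSeconds   = 1.25e-6;     // WS2812 bit period at 800 kHz
constexpr double kBitsPerLed   = 24.0;        // 8 bits each for G, R, B
constexpr double kNumLeds      = 100.0;

// Cost of one indirect call: cycles / clock rate (about 20.8 ns).
constexpr double callSeconds() { return kCallCycles / kCpuHz; }

// Wire time for one WS2812 frame: LEDs x bits x bit period (3 ms).
constexpr double wireSeconds() { return kNumLeds * kBitsPerLed * kBitSeconds; }

// Call cost in parts per million of a 60 Hz frame (about 1.25 ppm).
constexpr double callPpmOfFrame() { return callSeconds() / kFrameSeconds * 1e6; }

// Call cost as a percentage of the WS2812 wire time (about 0.00069 %).
constexpr double callPercentOfWire() { return callSeconds() / wireSeconds() * 100.0; }

static_assert(callPpmOfFrame() < 2.0, "one virtual call stays below 2 ppm of a 60 Hz frame");
```

With 5 cycles the results are ~20.8 ns, ~1.25 ppm, and ~0.00069%, in line with the rounded 20 ns figures above.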
Constraint preservation
Per #2974, logging stays enabled. This fix is purely a dispatch-shape change: every existing `FL_ERROR` / `FL_WARN` site continues to fire as before. The `emitDisabledDriverError` cold helper (already `FL_NO_INLINE` per #2773) remains untouched.
Acceptance criteria
- `Channel::showPixels` symbol drops below ~1,000 B on ESP32-S3 Blink
- `writeWS2812`, `writeSK9822`, `writeAPA102` appear as distinct symbols in the bloat report (not folded into `showPixels`)
- … `--gc-sections` when their driver isn't linked
- `bash test --cpp`
Refs
- … (`emitDisabledDriverError` out-of-line)
- #2832 refactor(channels): extract showPixels cold-path errors + gate addDriver capStr build (#2773 iter 3)
- #2830 feat(stl): FL_NO_INLINE portable macro + cold-helper split for showPixels (#2773 item 2.1) (`resolveDynamicDriver` `FL_NO_INLINE`)
- `src/fl/channels/channel.cpp.hpp` (lines 449-673)
- `src/fl/channels/channel.h`
- `src/fl/channels/driver.h`
- `src/fl/chipsets/encoders/pixel_iterator.h`