Tracked in #2974.
Current size
fl::ChannelManager::addDriver(int, fl::shared_ptr<fl::IChannelDriver>) is 1,756 B on ESP32-S3. The function is called only 6x total (all from BusTraits<(fl::Bus)1>::registerWithManager() during static init), so paying ~1.7 KB at every call site is poor density for a function that almost never runs in steady state.
Breakdown - hot vs cold
Source: src/fl/channels/manager.cpp.hpp lines 51-139.
HOT (always-taken on the happy path)
- Null-pointer check on
driver (predictable: never taken in real callers)
driver->getName() retrieval into engineName
- Empty-name check (predictable: never taken - built-in drivers always return non-empty)
- Linear scan of
mDrivers for duplicate name (~1-4 entries, all miss on the happy path)
mDrivers.push_back({priority, driver, engineName, enabled})
fl::sort_small(...) to maintain priority order
- Exclusive-driver enable check
COLD (only triggered on validation failure / replacement / debug logging)
FL_WARN("... Null driver provided") - log + early return
FL_WARN("... Engine has empty name ...") - log + early return
FL_DBG("... already registered ... idempotent no-op") - log + early return
FL_WARN("... Replacing existing driver ...") - log
FL_DBG("Waiting for all drivers to become READY ...") + waitForReady() call (490 B callee) + erase loop + FL_DBG("Removing old driver ...")
FL_DBG("Added driver ... caps: ...") block (already gated behind #if FASTLED_HAS_DBG, but the caps-string construction still emits some code)
Top callees inflating the symbol
fl::ChannelManager::waitForReady(unsigned long) - 490 B (only reached on replace path)
fl::vector_basic::push_back_move_impl(void*) - 234 B (hot)
fl::detail::log_emit - 222 B (cold, multiple call sites)
- Several
fl::string ops - ~250 B combined (mostly for log formatting)
Proposed change
Split into a hot core and a [[gnu::cold]] out-of-line helper:
[[gnu::cold]] [[gnu::noinline]]
static void addDriver_slow(ChannelManager* self,
int priority,
fl::shared_ptr<IChannelDriver> driver,
AddDriverError err,
const fl::string& engineName);
void ChannelManager::addDriver(int priority, fl::shared_ptr<IChannelDriver> driver) FL_NOEXCEPT {
if (FL_UNLIKELY(!driver)) {
return addDriver_slow(this, priority, driver, AddDriverError::NullDriver, {});
}
fl::string engineName = driver->getName();
if (FL_UNLIKELY(engineName.empty())) {
return addDriver_slow(this, priority, driver, AddDriverError::EmptyName, engineName);
}
// Duplicate scan - if hit, jump to slow path for replace/no-op/warn handling
for (const auto& entry : mDrivers) {
if (FL_UNLIKELY(entry.name == engineName)) {
return addDriver_slow(this, priority, driver,
(entry.driver == driver && entry.priority == priority)
? AddDriverError::IdempotentNoOp
: AddDriverError::Replacing,
engineName);
}
}
// Happy path: no duplicate, just append + sort
bool enabled = mExclusiveDriver.empty() || (engineName == mExclusiveDriver);
mDrivers.push_back({priority, driver, engineName, enabled});
fl::sort_small(mDrivers.begin(), mDrivers.end());
}
addDriver_slow owns:
- All
FL_WARN / FL_DBG emissions (log_emit ~222 B x N sites)
- The
waitForReady() call (490 B callee) used during the replace flow
- The erase loop for the old duplicate
- The post-replace
push_back + sort_small (or factor those back through the hot path after _slow returns the cleaned-up state)
- The
#if FASTLED_HAS_DBG caps-string construction (already gated, but the strings/log_emit still pull weight on debug builds)
Estimated savings
- Hot path shrinks to ~200 B (precondition checks +
push_back + sort_small).
- Cold helper carries the ~1.5 KB of validation strings, log_emit calls,
waitForReady() glue, and erase logic.
- Net icache savings on the hot path: ~800-1000 B per binary.
- The cold helper ships once and lives in a
.text.cold section, out of the hot icache region. [[gnu::cold]] tells GCC/clang to optimize it for size and place it away from the hot path.
Additional benefit
Once the hot path is ~200 B, the 6 call sites in BusTraits<...>::registerWithManager() become inlining candidates - currently the 1.7 KB body blocks inlining outright. Inlining the slim hot path may further reduce per-call overhead during static init.
Constraint
Logging stays enabled (per #2974). This change does NOT remove any log site - it moves them out of the hot icache path so the happy-path code stays dense.
Tracked in #2974.
Current size
fl::ChannelManager::addDriver(int, fl::shared_ptr<fl::IChannelDriver>)is 1,756 B on ESP32-S3. The function is called only 6x total (all fromBusTraits<(fl::Bus)1>::registerWithManager()during static init), so paying ~1.7 KB at every call site is poor density for a function that almost never runs in steady state.Breakdown - hot vs cold
Source:
src/fl/channels/manager.cpp.hpplines 51-139.HOT (always-taken on the happy path)
driver(predictable: never taken in real callers)driver->getName()retrieval intoengineNamemDriversfor duplicate name (~1-4 entries, all miss on the happy path)mDrivers.push_back({priority, driver, engineName, enabled})fl::sort_small(...)to maintain priority orderCOLD (only triggered on validation failure / replacement / debug logging)
FL_WARN("... Null driver provided")- log + early returnFL_WARN("... Engine has empty name ...")- log + early returnFL_DBG("... already registered ... idempotent no-op")- log + early returnFL_WARN("... Replacing existing driver ...")- logFL_DBG("Waiting for all drivers to become READY ...")+waitForReady()call (490 B callee) + erase loop +FL_DBG("Removing old driver ...")FL_DBG("Added driver ... caps: ...")block (already gated behind#if FASTLED_HAS_DBG, but the caps-string construction still emits some code)Top callees inflating the symbol
fl::ChannelManager::waitForReady(unsigned long)- 490 B (only reached on replace path)fl::vector_basic::push_back_move_impl(void*)- 234 B (hot)fl::detail::log_emit- 222 B (cold, multiple call sites)fl::stringops - ~250 B combined (mostly for log formatting)Proposed change
Split into a hot core and a
[[gnu::cold]]out-of-line helper:addDriver_slowowns:FL_WARN/FL_DBGemissions (log_emit~222 B x N sites)waitForReady()call (490 B callee) used during the replace flowpush_back+sort_small(or factor those back through the hot path after_slowreturns the cleaned-up state)#if FASTLED_HAS_DBGcaps-string construction (already gated, but the strings/log_emit still pull weight on debug builds)Estimated savings
push_back+sort_small).waitForReady()glue, and erase logic..text.coldsection, out of the hot icache region.[[gnu::cold]]tells GCC/clang to optimize it for size and place it away from the hot path.Additional benefit
Once the hot path is ~200 B, the 6 call sites in
BusTraits<...>::registerWithManager()become inlining candidates - currently the 1.7 KB body blocks inlining outright. Inlining the slim hot path may further reduce per-call overhead during static init.Constraint
Logging stays enabled (per #2974). This change does NOT remove any log site - it moves them out of the hot icache path so the happy-path code stays dense.